Computer Science
New submissions
New submissions for Tue, 25 Feb 20
 [1] arXiv:2002.09477 [pdf]

Title: Graph Computing based Distributed State Estimation with PMUs
Comments: 5 pages, 3 figures, 3 tables, 2020 IEEE Power and Energy Society General Meeting. arXiv admin note: substantial text overlap with arXiv:1902.06893
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Performance (cs.PF); Signal Processing (eess.SP); Numerical Analysis (math.NA)
Power system state estimation plays a fundamental and critical role in the energy management system (EMS). To achieve high-performance and accurate state estimation, a graph-computing-based distributed state estimation approach is proposed in this paper. Firstly, a power system network is divided into multiple areas. Reference buses are selected for each area, with PMUs installed at these buses. Then, the system network is converted into multiple independent areas. In this way, the power system state estimation can be conducted in parallel for each area, and the estimated system states are obtained without compromising accuracy. The IEEE 118-bus system and the MP 10790-bus system are employed to verify the accuracy of the results and to demonstrate the promising computational performance.
 [2] arXiv:2002.09478 [pdf, other]

Title: On the Search for Feedback in Reinforcement Learning
Comments: arXiv admin note: substantial text overlap with arXiv:1904.08361
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
This paper addresses the problem of learning the optimal feedback policy for a nonlinear stochastic dynamical system with continuous state space, continuous action space and unknown dynamics. Feedback policies are complex objects that typically need a high-dimensional parametrization, which makes Reinforcement Learning algorithms that search for an optimum in this large parameter space sample-inefficient and subject to high variance. We propose a "decoupling" principle that drastically reduces the feedback parameter space while still remaining near-optimal to the fourth order in a small noise parameter. Based on this principle, we propose a decoupled data-based control (D2C) algorithm that addresses the stochastic control problem: first, an open-loop deterministic trajectory optimization problem is solved using a black-box simulation model of the dynamical system. Then, a linear closed-loop control is developed around this nominal trajectory using only a simulation model. Empirical evidence suggests a significant reduction in training time, as well as in training variance, compared to other state-of-the-art Reinforcement Learning algorithms.
 [3] arXiv:2002.09479 [pdf, ps, other]

Title: Kullback-Leibler Divergence-Based Fuzzy $C$-Means Clustering Incorporating Morphological Reconstruction and Wavelet Frames for Image Segmentation
Comments: 13 pages, 13 figures, 5 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Although spatial information of images usually enhances the robustness of the Fuzzy C-Means (FCM) algorithm, it greatly increases the computational cost of image segmentation. To achieve a sound tradeoff between segmentation performance and clustering speed, we propose a Kullback-Leibler (KL) divergence-based FCM algorithm that incorporates a tight wavelet frame transform and a morphological reconstruction operation. To enhance FCM's robustness, an observed image is first filtered by morphological reconstruction. A tight wavelet frame system is employed to decompose the observed and filtered images so as to form their feature sets. Treating these feature sets as the data to be clustered, a modified FCM algorithm is proposed, which introduces a KL divergence term on the partition matrix into its objective function. The KL divergence term aims to make the membership degrees of each image pixel closer to those of its neighbors, so that the membership partition becomes more suitable and the parameter setting of FCM is simplified. On the basis of the obtained partition matrix and prototypes, the segmented feature set is reconstructed by minimizing the inverse process of the modified objective function. To correct abnormal features produced in the reconstruction process, each reconstructed feature is reassigned to the closest prototype. As a result, the segmentation accuracy of KL divergence-based FCM is further improved. In addition, the segmented image is reconstructed by a tight wavelet frame reconstruction operation. Finally, supporting experiments on synthetic, medical and color images are reported. Experimental results show that the proposed algorithm works well and achieves better segmentation performance than other comparative algorithms. Moreover, the proposed algorithm requires less time than most of the FCM-related algorithms.
 [4] arXiv:2002.09481 [pdf, other]

Title: TFApprox: Towards a Fast Emulation of DNN Approximate Hardware Accelerators on GPU
Comments: To appear at the 23rd Design, Automation and Test in Europe (DATE 2020). Grenoble, France
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG)
Energy efficiency of hardware accelerators of deep neural networks (DNN) can be improved by introducing approximate arithmetic circuits. In order to quantify the error introduced by using these circuits and avoid expensive hardware prototyping, a software emulator of the DNN accelerator is usually executed on CPU or GPU. However, this emulation is typically two or three orders of magnitude slower than a software DNN implementation running on CPU or GPU and operating with standard floating point arithmetic instructions and common DNN libraries. The reason is that there is no hardware support for approximate arithmetic operations on common CPUs and GPUs, so these operations have to be expensively emulated. In order to address this issue, we propose an efficient emulation method for approximate circuits utilized in a given DNN accelerator which is emulated on GPU. All relevant approximate circuits are implemented as lookup tables and accessed through the texture memory mechanism of CUDA-capable GPUs. We exploit the fact that the texture memory is optimized for irregular read-only access and in some GPU architectures is even implemented as a dedicated cache. This technique allowed us to reduce the inference time of the emulated DNN accelerator approximately 200 times with respect to an optimized CPU version on complex DNNs such as ResNet. The proposed approach extends the TensorFlow library and is available online at https://github.com/ehwfit/tfapproximate.
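The lookup-table idea described in this abstract can be illustrated with a minimal CPU-side sketch in Python. It is not the TFApprox implementation (which uses CUDA texture memory); `approx_mul8` is a hypothetical approximate multiplier invented here for illustration, and the names `LUT`, `lut_mul` and `approx_dot` are likewise assumptions.

```python
# Hypothetical approximate 8-bit multiplier: it drops the two least
# significant bits of each operand before multiplying. It stands in for a
# real approximate circuit and is NOT taken from the paper.
def approx_mul8(a: int, b: int) -> int:
    return ((a >> 2) * (b >> 2)) << 4

# Precompute all 256 x 256 results once, flattened into a single table.
LUT = [approx_mul8(a, b) for a in range(256) for b in range(256)]

def lut_mul(a: int, b: int) -> int:
    """Emulate the approximate multiplier with a single table read."""
    return LUT[(a << 8) | b]

def approx_dot(xs, ws):
    """Dot product of two uint8 vectors using the emulated multiplier."""
    return sum(lut_mul(x, w) for x, w in zip(xs, ws))
```

Once the table is built, the cost of one emulated multiplication no longer depends on how complex the circuit model is, which is the property the abstract relies on when it moves the table into read-only GPU memory.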
 [5] arXiv:2002.09485 [pdf, other]

Title: The Four Dimensions of Social Network Analysis: An Overview of Research Methods, Applications, and Software Tools
Comments: This paper is currently under evaluation in Information Fusion journal
Subjects: Social and Information Networks (cs.SI); Computers and Society (cs.CY); Machine Learning (cs.LG)
Social-network-based applications have experienced exponential growth in recent years. One of the reasons for this rise is that this application domain offers a particularly fertile place to test and develop the most advanced computational techniques to extract valuable information from the Web. The main contribution of this work is threefold: (1) we provide an up-to-date literature review of the state of the art on social network analysis (SNA); (2) we propose a set of new metrics based on four essential features (or dimensions) in SNA; (3) finally, we provide a quantitative analysis of a set of popular SNA tools and frameworks. We have also performed a scientometric study to detect the most active research areas and application domains in this area. This work proposes the definition of four different dimensions, namely Pattern & Knowledge discovery, Information Fusion & Integration, Scalability, and Visualization, which are used to define a set of new metrics (termed degrees) in order to evaluate the different software tools and frameworks of SNA (a set of 20 SNA software tools is analyzed and ranked following these metrics). These dimensions, together with the defined degrees, allow evaluating and measuring the maturity of social network technologies, seeking both to assess them quantitatively and to shed light on the challenges and future trends in this active area.
 [6] arXiv:2002.09504 [pdf, ps, other]

Title: Robust Numerical Tracking of One Path of a Polynomial Homotopy on Parallel Shared Memory Computers
Subjects: Numerical Analysis (math.NA); Distributed, Parallel, and Cluster Computing (cs.DC); Symbolic Computation (cs.SC)
We consider the problem of tracking one solution path defined by a polynomial homotopy on a parallel shared memory computer. Our robust path tracker applies Newton's method on power series to locate the closest singular parameter value. On top of that, it computes singular values of the Hessians of the polynomials in the homotopy to estimate the distance to the nearest different path. Together, these estimates are used to compute an appropriate adaptive stepsize. For n-dimensional problems, the cost overhead of our robust path tracker is O(n), compared to the commonly used predictor-corrector methods. This cost overhead can be reduced by a multithreaded program on a parallel shared memory computer.
 [7] arXiv:2002.09505 [pdf, other]

Title: Estimating Q(s,s') with Deep Deterministic Dynamics Gradients
Authors: Ashley D. Edwards, Himanshu Sahni, Rosanne Liu, Jane Hung, Ankit Jain, Rui Wang, Adrien Ecoffet, Thomas Miconi, Charles Isbell, Jason Yosinski
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
In this paper, we introduce a novel form of value function, $Q(s, s')$, that expresses the utility of transitioning from a state $s$ to a neighboring state $s'$ and then acting optimally thereafter. In order to derive an optimal policy, we develop a forward dynamics model that learns to make next-state predictions that maximize this value. This formulation decouples actions from values while still learning off-policy. We highlight the benefits of this approach in terms of value function transfer, learning within redundant action spaces, and learning off-policy from state observations generated by suboptimal or completely random policies. Code and videos are available at sites.google.com/view/qsspaper.
 [8] arXiv:2002.09511 [pdf, other]

Title: Chronofold: a data structure for versioned text
Subjects: Data Structures and Algorithms (cs.DS)
Collaborative text editing and versioning is known to be a tough topic. Diffs, OT and CRDT are three relevant classes of algorithms which all have their issues. CRDT is the only one that works correctly and deterministically in a distributed environment, at the unfortunate cost of data structure complexity and metadata overheads.
A chronofold is a data structure for editable linear collections based on the Causal Tree CRDT model. A chronofold maintains time-ordering and space-ordering of its elements. Simply put, it is both a log and a text at the same time, which makes it very convenient for text versioning and synchronization. Being a simple array-based data structure with O(1) insertions, chronofold makes CRDT overheads acceptable for many practical applications.
 [9] arXiv:2002.09516 [pdf, other]

Title: Minimax-Optimal Off-Policy Evaluation with Linear Function Approximation
Subjects: Machine Learning (cs.LG); Statistics Theory (math.ST); Machine Learning (stat.ML)
This paper studies the statistical theory of batch data reinforcement learning with function approximation. Consider the off-policy evaluation problem, which is to estimate the cumulative value of a new target policy from logged history generated by unknown behavioral policies. We study a regression-based fitted Q iteration method, and show that it is equivalent to a model-based method that estimates a conditional mean embedding of the transition operator. We prove that this method is information-theoretically optimal and has nearly minimal estimation error. In particular, by leveraging the contraction property of Markov processes and martingale concentration, we establish a finite-sample instance-dependent error upper bound and a nearly-matching minimax lower bound. The policy evaluation error depends sharply on a restricted $\chi^2$-divergence over the function class between the long-term distribution of the target policy and the distribution of past data. This restricted $\chi^2$-divergence is both instance-dependent and function-class-dependent. It characterizes the statistical limit of off-policy evaluation. Further, we provide an easily computable confidence bound for the policy evaluator, which may be useful for optimistic planning and safe policy improvement.
 [10] arXiv:2002.09518 [pdf, other]

Title: Memory-Based Graph Networks
Comments: ICLR 2020
Subjects: Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
Graph neural networks (GNNs) are a class of deep models that operate on data with arbitrary topology represented as graphs. We introduce an efficient memory layer for GNNs that can jointly learn node representations and coarsen the graph. We also introduce two new networks based on this layer: memory-based GNN (MemGNN) and graph memory network (GMN), which can learn hierarchical graph representations. The experimental results show that the proposed models achieve state-of-the-art results in eight out of nine graph classification and regression benchmarks. We also show that the learned representations could correspond to chemical features in the molecule data. Code and reference implementations are released at: https://github.com/amirkhas/GraphMemoryNet
 [11] arXiv:2002.09519 [pdf, ps, other]

Title: Exponential Amortized Resource Analysis
Subjects: Programming Languages (cs.PL)
Automatic amortized resource analysis (AARA) is a type-based technique for inferring concrete (non-asymptotic) bounds on a program's resource usage. Existing work on AARA has focused on bounds that are polynomial in the sizes of the inputs. This paper presents an extension of AARA to exponential bounds that preserves the benefits of the technique, such as compositionality and efficient type inference based on linear constraint solving. A key idea is the use of the Stirling numbers of the second kind as the basis of potential functions, which play the same role as the binomial coefficients in polynomial AARA. To formalize the similarities with the existing analyses, the paper presents a general methodology for AARA that is instantiated to the polynomial version, the exponential version, and a combined system with potential functions that are formed by products of Stirling numbers and binomial coefficients. The soundness of exponential AARA is proved with respect to an operational cost semantics, and the analysis of representative example programs demonstrates the effectiveness of the new analysis.
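For readers unfamiliar with the basis functions named in this abstract, the following short Python sketch computes Stirling numbers of the second kind with the standard recurrence and contrasts their growth with binomial coefficients. The function name `stirling2` is an assumption made here; the sketch shows only the numbers themselves, not the paper's type system or potential functions.

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def stirling2(n: int, k: int) -> int:
    """Number of ways to partition an n-element set into k non-empty blocks."""
    if n == k:
        return 1          # includes S(0, 0) = 1
    if k == 0 or k > n:
        return 0
    # Element n either joins one of the k existing blocks or opens a new one.
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)

# For fixed k the Stirling numbers grow exponentially in n, whereas the
# binomial coefficients grow only polynomially.
for n in (5, 10, 20):
    print(n, stirling2(n, 2), comb(n, 2))
```

A well-known closed form makes the exponential growth explicit for k = 2: S(n, 2) = 2^(n-1) - 1 for n >= 1, compared with C(n, 2) = n(n-1)/2.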
 [12] arXiv:2002.09523 [pdf, other]

Title: Struct-MMSB: Mixed Membership Stochastic Blockmodels with Interpretable Structured Priors
Comments: ECAI 2020
Subjects: Machine Learning (cs.LG); Social and Information Networks (cs.SI); Machine Learning (stat.ML)
The mixed membership stochastic blockmodel (MMSB) is a popular framework for community detection and network generation. It learns a low-rank mixed membership representation for each node across communities by exploiting the underlying graph structure. MMSB assumes that the membership distributions of the nodes are independently drawn from a Dirichlet distribution, which limits its capability to model highly correlated graph structures that exist in real-world networks. In this paper, we present a flexible richly structured MMSB model, Struct-MMSB, that uses a recently developed statistical relational learning model, hinge-loss Markov random fields (HL-MRFs), as a structured prior to model complex dependencies among node attributes, multi-relational links, and their relationship with mixed-membership distributions. Our model is specified using a probabilistic programming templating language that uses weighted first-order logic rules, which enhances the model's interpretability. Further, our model is capable of learning latent characteristics in real-world networks via meaningful latent variables encoded as a complex combination of observed features and membership distributions. We present an expectation-maximization based inference algorithm that learns latent variables and parameters iteratively, a scalable stochastic variation of the inference algorithm, and a method to learn the weights of HL-MRF structured priors. We evaluate our model on six datasets across three different types of networks and corresponding modeling scenarios and demonstrate that our models are able to achieve an improvement of 15% on average in test log-likelihood and faster convergence when compared to state-of-the-art network models.
 [13] arXiv:2002.09533 [pdf, other]

Title: Real-Time Visualization in Non-Isotropic Geometries
Subjects: Graphics (cs.GR); Differential Geometry (math.DG)
Non-isotropic geometries are of interest to low-dimensional topologists, physicists and cosmologists. However, they are challenging to comprehend and visualize. We present novel methods for real-time native geodesic rendering of non-isotropic geometries. Our methods can be applied not only to visualization, but are also essential for potential applications in machine learning and video games.
 [14] arXiv:2002.09534 [pdf, other]

Title: Hyperbolic Minesweeper is in P
Authors: Eryk Kopczyński
Subjects: Computational Complexity (cs.CC); Artificial Intelligence (cs.AI)
We show that, while Minesweeper is NP-complete, its hyperbolic variant is in P. Our proof does not rely on the rules of Minesweeper, but is valid for any puzzle based on satisfying local constraints on a graph embedded in the hyperbolic plane.
 [15] arXiv:2002.09535 [pdf, other]

Title: RobustPeriod: Time-Frequency Mining for Robust Multiple Periodicities Detection
Comments: 9 pages, 7 figures, and 4 tables
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP); Applications (stat.AP); Machine Learning (stat.ML)
Periodicity detection is an important task in time series analysis as it plays a crucial role in many time series tasks such as classification, clustering, compression, anomaly detection, and forecasting. It is challenging for the following reasons: (1) complicated non-stationary time series; (2) dynamic and complicated periodic patterns, including multiple interlaced periodic components; (3) outliers and noise. In this paper, we propose a robust periodicity detection algorithm to address these challenges. Our algorithm applies the maximal overlap discrete wavelet transform to transform the time series into multiple temporal-frequency scales such that different periodicities can be isolated. We rank the scales by wavelet variance, and then at each scale we propose the Huber-periodogram, which formulates the periodogram as the solution to an M-estimator in order to introduce robustness. We rigorously prove the theoretical properties of the Huber-periodogram and justify the use of Fisher's test on the Huber-periodogram for periodicity detection. To further refine the detected periods, we compute the unbiased autocorrelation function from the Huber-periodogram, based on the Wiener-Khinchin theorem, for improved robustness and efficiency. Experiments on synthetic and real-world datasets show that our algorithm outperforms other popular ones for both single and multiple periodicity detection. It is now implemented and provided as a public online service at Alibaba Group and has been used extensively in different business lines.
 [16] arXiv:2002.09536 [pdf, other]

Title: Image to Language Understanding: Captioning approach
Comments: 8 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Extracting context from visual representations is of utmost importance in the advancement of Computer Science. Representing such content in natural language has a huge variety of applications, such as helping the visually impaired. Such an approach combines Computer Vision and Natural Language techniques, which is a hard problem to solve. This project aims to compare different approaches for solving the image captioning problem. Specifically, the focus was on comparing two different types of models: an encoder-decoder approach and a multimodal approach. In the encoder-decoder approach, inject and merge architectures were compared against a multimodal image captioning approach based primarily on object detection. These approaches have been compared on the basis of state-of-the-art sentence comparison metrics such as BLEU, GLEU, Meteor, and Rouge on a subset of the Google Conceptual Captions dataset which contains 100k images. On the basis of this comparison, we observed that the best model was the Inception injected encoder model. This best approach has been deployed as a web-based system. On uploading an image, the system outputs the best caption associated with the image.
 [17] arXiv:2002.09539 [pdf, other]

Title: Overlap Local-SGD: An Algorithmic Approach to Hide Communication Delays in Distributed SGD
Comments: Accepted to ICASSP 2020
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Optimization and Control (math.OC); Machine Learning (stat.ML)
Distributed stochastic gradient descent (SGD) is essential for scaling machine learning algorithms to a large number of computing nodes. However, infrastructure variability such as high communication delay or random node slowdown greatly impedes the performance of distributed SGD algorithms, especially in wireless systems or sensor networks. In this paper, we propose an algorithmic approach named Overlap-Local-SGD (and its momentum variant) to overlap communication and computation so as to speed up the distributed training procedure. The approach can help to mitigate straggler effects as well. We achieve this by adding an anchor model on each node. After multiple local updates, locally trained models are pulled back towards the synchronized anchor model rather than communicating with others. Experimental results of training a deep neural network on the CIFAR-10 dataset demonstrate the effectiveness of Overlap-Local-SGD. We also provide a convergence guarantee for the proposed algorithm under non-convex objective functions.
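The anchor-model mechanism described in this abstract can be sketched as a toy single-process simulation in Python. Everything specific below is an assumption made for illustration and is not taken from the paper: the scalar quadratic losses, noise-free gradients, the pull-back rule `x <- x - alpha * (x - z)`, the one-round-stale anchor, and the names `overlap_local_sgd`, `tau`, `lr` and `alpha`.

```python
def overlap_local_sgd(targets, rounds=50, tau=4, lr=0.1, alpha=0.5):
    """Toy simulation: worker i minimizes 0.5 * (x - targets[i]) ** 2.

    Returns the final local models and the final anchor value.
    """
    m = len(targets)
    x = [0.0] * m   # one local model per worker
    z = 0.0         # anchor model, identical on every worker
    for _ in range(rounds):
        # 1) tau local gradient steps per worker, with no communication.
        for i, c in enumerate(targets):
            for _ in range(tau):
                x[i] -= lr * (x[i] - c)
        # 2) Pull each local model back toward the anchor. The anchor in use
        #    was computed from the previous round's models, so its averaging
        #    could have run in the background during step 1.
        x = [xi - alpha * (xi - z) for xi in x]
        # 3) Launch the next averaging; its result is consumed one round later.
        z = sum(x) / m
    return x, z

models, anchor = overlap_local_sgd([1.0, 3.0])
print(models, anchor)
```

In this toy setting the anchor settles near the mean of the workers' targets, while the local models stay spread around it because each is still drawn toward its own objective.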
 [18] arXiv:2002.09541 [pdf]

Title: Evaluation of Automatic FPGA Offloading for Loop Statements of Applications
Authors: Yoji Yamato
Comments: 7 pages, 4 figures, in Japanese, IEICE Technical Report, SWIM201925
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
In recent years, with Moore's law predicted to slow down, the use of energy-efficient hardware other than CPUs, such as FPGAs, has been increasing. However, using heterogeneous hardware other than CPUs requires technical skills such as OpenCL, which poses a high barrier. Based on that, I have proposed environment-adaptive software that enables automatic conversion, configuration, and high-performance operation of once-written code, according to the hardware on which it is placed. Part of the offloading to the GPU was automated previously. In this paper, I propose and evaluate a method for automatically extracting appropriate offload-target loop statements from source code as the first step of offloading to FPGA. I evaluate the effectiveness of the proposed method using an existing application.
 [19] arXiv:2002.09543 [pdf, other]

Title: Modelling Latent Skills for Multitask Language Generation
Subjects: Computation and Language (cs.CL)
We present a generative model for multitask conditional language generation. Our guiding hypothesis is that a shared set of latent skills underlies many disparate language generation tasks, and that explicitly modelling these skills in a task embedding space can help with both positive transfer across tasks and with efficient adaptation to new tasks. We instantiate this task embedding space as a latent variable in a latent variable sequence-to-sequence model. We evaluate this hypothesis by curating a series of monolingual text-to-text language generation datasets, covering a broad range of tasks and domains, and comparing the performance of models both in the multitask and few-shot regimes. We show that our latent task variable model outperforms other sequence-to-sequence baselines on average across tasks in the multitask setting. In the few-shot learning setting on an unseen test dataset (i.e., a new task), we demonstrate that model adaptation based on inference in the latent task space is more robust than standard fine-tuning based parameter adaptation and performs comparably in terms of overall performance. Finally, we examine the latent task representations learnt by our model and show that they cluster tasks in a natural way.
 [20] arXiv:2002.09545 [pdf, other]

Title: RobustTAD: Robust Time Series Anomaly Detection via Decomposition and Convolutional Neural Networks
Comments: 9 pages, 5 figures, and 2 tables
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP); Applications (stat.AP); Machine Learning (stat.ML)
The monitoring and management of numerous and diverse time series data at Alibaba Group calls for an effective and scalable time series anomaly detection service. In this paper, we propose RobustTAD, a Robust Time series Anomaly Detection framework that integrates robust seasonal-trend decomposition and a convolutional neural network for time series data. The seasonal-trend decomposition can effectively handle complicated patterns in time series, and meanwhile significantly simplifies the architecture of the neural network, which is an encoder-decoder architecture with skip connections. This architecture can effectively capture the multi-scale information from time series, which is very useful in anomaly detection. Due to the limited labeled data in time series anomaly detection, we systematically investigate data augmentation methods in both time and frequency domains. We also introduce label-based weight and value-based weight in the loss function by utilizing the unbalanced nature of the time series anomaly detection problem. Compared with the widely used forecasting-based anomaly detection algorithms, decomposition-based algorithms, traditional statistical algorithms, as well as recent neural network based algorithms, RobustTAD performs significantly better on public benchmark datasets. It is deployed as a public online service and widely adopted in different business scenarios at Alibaba Group.
 [21] arXiv:2002.09546 [pdf, other]

Title: IMDfence: Architecting a Secure Protocol for Implantable Medical Devices
Subjects: Cryptography and Security (cs.CR); Computers and Society (cs.CY)
Over the past decade, focus on the security and privacy aspects of implantable medical devices (IMDs) has intensified, driven by the multitude of cybersecurity vulnerabilities found in various existing devices. However, due to their strict computational, energy and physical constraints, conventional security protocols are not directly applicable to IMDs. Custom-tailored schemes have been proposed instead which, however, fail to cover the full spectrum of security features that modern IMDs and their ecosystems so critically require. In this paper we propose IMDfence, a security protocol for IMD ecosystems that provides a comprehensive yet practical security portfolio, which includes availability, non-repudiation, access control, emergency access, entity authentication, remote monitoring and system scalability. The performance of the security protocol as well as its feasibility and impact on modern IMDs are extensively analyzed and evaluated. We find that IMDfence achieves the above security requirements at a mere 4.64% increase in total IMD energy consumption, and less than 14 ms and 6 kB increase in system delay and memory footprint respectively.
 [22] arXiv:2002.09553 [pdf, ps, other]

Title: Sequential decomposition of discrete memoryless channel with noisy feedback
Authors: Deepanshu Vasal
Subjects: Systems and Control (eess.SY); Information Theory (cs.IT)
In this paper, we consider a discrete memoryless point-to-point channel with noisy feedback, where there is a sender with a private message that she wants to communicate to a receiver by sequentially transmitting symbols over a noisy channel. After each transmission, she receives noisy feedback of the symbol received by the receiver. The goal is to design a transmission control strategy for the sender that minimizes the average probability of error. This is an instance of decentralized control of information where the two controllers, the sender and the receiver, have no common information. There exists no methodology in the literature that provides a notion of "state" and a dynamic program to find optimal policies for this problem. In this paper, we introduce a notion of state, based on which we provide a sequential decomposition methodology that finds optimum policies within the class of Markov strategies with respect to this state (which need not be globally optimum). This allows the problem to be decomposed across time and reduces the dependence of the complexity on time from double exponential to linear.
 [23] arXiv:2002.09554 [pdf, other]

Title: Particle Filter Based Monocular Human Tracking with a 3D Cardbox Model and a Novel Deterministic Resampling Strategy
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
The challenge of markerless human motion tracking is the high dimensionality of the search space. Thus, efficient exploration of the search space is of great significance. In this paper, a motion capturing algorithm is proposed for upper body motion tracking. The proposed system tracks human motion based on monocular silhouette-matching, and it is built on top of a hierarchical particle filter, within which a novel deterministic resampling strategy (DRS) is applied. The proposed system is evaluated quantitatively against ground truth data measured by an inertial sensor system. In addition, we compare the DRS with the stratified resampling strategy (SRS). Experiments show that DRS outperforms SRS with the same number of particles. Moreover, a new 3D articulated human upper body model, named the 3D cardbox model, is created and is shown to work successfully for motion tracking. Experiments show that the proposed system can robustly track upper body motion without self-occlusion. Motions towards the camera can also be tracked well.
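As background for the comparison mentioned in this abstract, the baseline stratified resampling step of a particle filter can be sketched in Python as follows. This is the standard textbook scheme, not the paper's deterministic resampling strategy (whose details the abstract does not give); the function name `stratified_resample` and its `rng` parameter are assumptions made here.

```python
import random

def stratified_resample(weights, rng=random):
    """Return the indices of the particles kept by stratified resampling.

    weights: non-negative particle weights (they need not sum to 1).
    rng: any object with a random() method returning a float in [0, 1).
    """
    n = len(weights)
    total = sum(weights)
    # One uniform draw inside each of n equal-width strata of [0, 1).
    positions = [(i + rng.random()) / n for i in range(n)]
    indices = []
    j = 0
    cumulative = weights[0] / total
    for p in positions:
        # Advance to the first particle whose cumulative weight exceeds p.
        while cumulative <= p and j < n - 1:
            j += 1
            cumulative += weights[j] / total
        indices.append(j)
    return indices

print(stratified_resample([0.1, 0.2, 0.6, 0.1], random.Random(0)))
```

Because each stratum contributes exactly one sample, a particle with normalized weight w is selected close to n * w times, which gives lower variance than drawing n independent samples.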
 [24] arXiv:2002.09560 [pdf, other]

Title: Practical Verification of MapReduce Computation Integrity via Partial Re-execution
Comments: 12 pages
Subjects: Cryptography and Security (cs.CR); Distributed, Parallel, and Cluster Computing (cs.DC)
Big data processing is often outsourced to powerful but untrusted cloud service providers that provide agile and scalable computing resources to weaker clients. However, untrusted cloud services do not ensure the integrity of data and computations, while clients have no control over the outsourced computation and no means to check the correctness of the execution. Despite growing interest and recent progress in verifiable computation, the existing techniques are still not practical enough for big data processing due to high verification overhead. In this paper, we present a solution called V-MR (Verifiable MapReduce), a framework that verifies the integrity of MapReduce computation outsourced to the untrusted cloud via partial re-execution. V-MR is practically effective and efficient in that (1) it can detect violations of MapReduce computation integrity and identify the malicious workers that produced the incorrect computation, and (2) it can reduce the overhead of verification via partial re-execution with carefully selected input data and program code using program analysis. The experimental results of a prototype of V-MR show that V-MR can verify the integrity of MapReduce computation effectively with small overhead for partial re-execution.
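The general idea of checking outsourced work by re-running part of it can be sketched in Python as below. This is only an illustration under stated assumptions: the verifier here samples map tasks uniformly at random, whereas the abstract says the framework selects input data and code through program analysis; `map_word_count`, `verify_by_partial_reexecution` and the data layout are hypothetical names invented for this sketch.

```python
import random

def map_word_count(chunk: str) -> dict:
    """Trusted reference implementation of the map function."""
    counts = {}
    for word in chunk.split():
        counts[word] = counts.get(word, 0) + 1
    return counts

def verify_by_partial_reexecution(chunks, reported, sample_size, rng):
    """Re-execute a random sample of map tasks and compare the outputs.

    chunks: list of input chunks.
    reported: list of (worker_id, output) pairs aligned with chunks.
    Returns the set of worker ids whose sampled outputs were incorrect.
    """
    suspects = set()
    for i in rng.sample(range(len(chunks)), sample_size):
        worker_id, output = reported[i]
        if map_word_count(chunks[i]) != output:
            suspects.add(worker_id)
    return suspects

chunks = ["a b a", "b c", "c c c"]
reported = [
    ("w1", {"a": 2, "b": 1}),
    ("w2", {"b": 1, "c": 5}),   # tampered result
    ("w3", {"c": 3}),
]
print(verify_by_partial_reexecution(chunks, reported, 3, random.Random(1)))
```

With a sample smaller than the full task list, detection becomes probabilistic; the trade-off between sample size and verification overhead is what a partial re-execution scheme has to tune.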
 [25] arXiv:2002.09561 [pdf, other]

Title: Performance / Complexity Tradeoffs of the Sphere Decoder Algorithm for Massive MIMO Systems
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Signal Processing (eess.SP)
Massive MIMO systems are seen by many researchers as a paramount technology toward next generation networks. This technology consists of hundreds of antennas that are capable of sending and receiving simultaneously a huge amount of data. One of the main challenges when using this technology is the necessity of an efficient decoding framework. The latter must guarantee both a low complexity and a good signal detection accuracy. The Sphere Decoder (SD) algorithm represents one of the promising decoding algorithms in terms of detection accuracy. However, it is inefficient for dealing with large MIMO systems due to its prohibitive complexity. To overcome this drawback, we propose to revisit the sequential SD algorithm and implement several variants that aim at finding appropriate tradeoffs between complexity and performance. Then, we propose an efficient high-level parallel SD scheme based on the master/worker paradigm, which permits multiple SD instances to simultaneously explore the search space, while mitigating the overheads from load imbalance. The results of our parallel SD implementation outperform the state of the art by more than 5x using similar MIMO configuration systems, and show a superlinear speedup on multicore platforms. Moreover, this paper presents a new hybrid implementation that combines the strengths of SD and K-best algorithms, i.e., maintaining the detection accuracy of SD, while reducing the complexity using the K-best way of pruning the search space. The hybrid approach extends our parallel SD implementation: the master contains the SD search tree, and the workers use the K-best algorithm to accelerate its exploration. The resulting hybrid approach enhances the diversification gain, and therefore, lowers the overall complexity. Our synergistic hybrid approach can handle large MIMO configurations up to 100x100, without sacrificing accuracy or complexity.
 [26] arXiv:2002.09563 [pdf, ps, other]

Title: Structure-Preserving and Efficient Numerical Methods for Ion Transport
Subjects: Numerical Analysis (math.NA)
Ion transport, often described by the Poisson-Nernst-Planck (PNP) equations, is ubiquitous in electrochemical devices and many biological processes of significance. In this work, we develop conservative, positivity-preserving, energy-dissipating, and implicit finite difference schemes for solving the multi-dimensional PNP equations with multiple ionic species. A central-differencing discretization based on harmonic-mean approximations is employed for the Nernst-Planck (NP) equations. The backward Euler discretization in time is employed to derive a fully implicit nonlinear system, which is efficiently solved by a newly proposed Newton's method. The improved computational efficiency of the Newton's method originates from the usage of the electrostatic potential as the iteration variable, rather than the unknowns of the nonlinear system that involves both the potential and concentration of multiple ionic species. Numerical analysis proves that the numerical schemes respect three desired analytical properties (conservation, positivity preservation, and energy dissipation) fully discretely. Based on advantages brought by the harmonic-mean approximations, we are able to establish an estimate of the upper bound of the condition numbers of coefficient matrices in linear systems that are solved iteratively. The solvability and stability of the linearized problem in the Newton's method are rigorously established as well. Numerical tests are performed to confirm the anticipated numerical accuracy, computational efficiency, and structure-preserving properties of the developed schemes. Adaptive time stepping is implemented for further efficiency improvement. Finally, the proposed numerical approaches are applied to characterize ion transport subject to a sinusoidal applied potential.
 [27] arXiv:2002.09564 [pdf, other]

Title: Towards Robust and Reproducible Active Learning Using Neural Networks
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Active learning (AL) is a promising ML paradigm that has the potential to parse through large unlabeled data and help reduce annotation cost in domains where labeling entire data can be prohibitive. Recently proposed neural network based AL methods use different heuristics to accomplish this goal. In this study, we show that recent AL methods offer a gain over random baseline under a brittle combination of experimental conditions. We demonstrate that such marginal gains vanish when experimental factors are changed, leading to reproducibility issues and suggesting that AL methods lack robustness. We also observe that with a properly tuned model, which employs recently proposed regularization techniques, the performance significantly improves for all AL methods including the random sampling baseline, and performance differences among the AL methods become negligible. Based on these observations, we suggest a set of experiments that are critical to assess the true effectiveness of an AL method. To facilitate these experiments we also present an open source toolkit. We believe our findings and recommendations will help advance reproducible research in robust AL using neural networks.
 [28] arXiv:2002.09565 [pdf, other]

Title: Adversarial Attacks on Machine Learning Systems for High-Frequency Trading
Authors: Micah Goldblum, Avi Schwarzschild, Naftali Cohen, Tucker Balch, Ankit B. Patel, Tom Goldstein
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Statistical Finance (q-fin.ST)
Algorithmic trading systems are often completely automated, and deep learning is increasingly receiving attention in this domain. Nonetheless, little is known about the robustness properties of these models. We study valuation models for algorithmic trading from the perspective of adversarial machine learning. We introduce new attacks specific to this domain with size constraints that minimize attack costs. We further discuss how these attacks can be used as an analysis tool to study and evaluate the robustness properties of financial models. Finally, we investigate the feasibility of realistic adversarial attacks in which an adversarial trader fools automated trading systems into making inaccurate predictions.
 [29] arXiv:2002.09570 [pdf, ps, other]

Title: Feedback game on Eulerian graphs
Comments: 16 pages, 12 figures
Subjects: Computer Science and Game Theory (cs.GT); Discrete Mathematics (cs.DM); Combinatorics (math.CO)
In this paper, we introduce a two-player impartial game on graphs, called a feedback game, which is a variant of generalized geography. We study the feedback game on Eulerian graphs. In particular, we show the PSPACE-completeness of the game and determine the winner of the game on several classes of Eulerian graphs.
 [30] arXiv:2002.09571 [pdf, other]

Title: Learning to Continually Learn
Authors: Shawn Beaulieu, Lapo Frati, Thomas Miconi, Joel Lehman, Kenneth O. Stanley, Jeff Clune, Nick Cheney
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
Continual lifelong learning requires an agent or model to learn many sequentially ordered tasks, building on previous knowledge without catastrophically forgetting it. Much work has gone towards preventing the default tendency of machine learning models to catastrophically forget, yet virtually all such work involves manually designed solutions to the problem. We instead advocate meta-learning a solution to catastrophic forgetting, allowing AI to learn to continually learn. Inspired by neuromodulatory processes in the brain, we propose A Neuromodulated Meta-Learning Algorithm (ANML). It differentiates through a sequential learning process to meta-learn an activation-gating function that enables context-dependent selective activation within a deep neural network. Specifically, a neuromodulatory (NM) neural network gates the forward pass of another (otherwise normal) neural network called the prediction learning network (PLN). The NM network also thus indirectly controls selective plasticity (i.e. the backward pass of) the PLN. ANML enables continual learning without catastrophic forgetting at scale: it produces state-of-the-art continual learning performance, sequentially learning as many as 600 classes (over 9,000 SGD updates).
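The gating idea can be illustrated with a small sketch. It assumes, purely for illustration, that each network is a single dense layer, that the gate is a sigmoid of the NM output, and that gating is an element-wise product with the PLN hidden activations; the layer sizes, helper names, and parameter layout are hypothetical and are not taken from the paper.

```python
import math


def sigmoid(v):
    """Numerically stable logistic function."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def dense(x, weights, bias):
    """Affine layer: weights is a list of rows, one row per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def gated_forward(x, nm_layer, pln_hidden, pln_out):
    """Forward pass of a prediction network whose hidden units are gated.

    nm_layer, pln_hidden and pln_out are (weights, bias) pairs
    (hypothetical parameter layout chosen for this sketch).
    """
    gate = [sigmoid(v) for v in dense(x, *nm_layer)]        # values in (0, 1)
    hidden = [max(0.0, v) for v in dense(x, *pln_hidden)]   # ReLU features
    gated = [g * h for g, h in zip(gate, hidden)]           # selective activation
    return dense(gated, *pln_out)
```

Because a hidden unit whose gate is near zero contributes almost nothing to the output, its gradient is also scaled towards zero, which is one way to read the abstract's remark that the NM network indirectly controls plasticity.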
 [31] arXiv:2002.09572 [pdf, other]

Title: The Break-Even Point on Optimization Trajectories of Deep Neural Networks
Authors: Stanislaw Jastrzebski, Maciej Szymczak, Stanislav Fort, Devansh Arpit, Jacek Tabor, Kyunghyun Cho, Krzysztof Geras
Comments: Accepted as a spotlight at ICLR 2020. The last two authors contributed equally
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
The early phase of training of deep neural networks is critical for their final performance. In this work, we study how the hyperparameters of stochastic gradient descent (SGD) used in the early phase of training affect the rest of the optimization trajectory. We argue for the existence of the "break-even" point on this trajectory, beyond which the curvature of the loss surface and noise in the gradient are implicitly regularized by SGD. In particular, we demonstrate on multiple classification tasks that using a large learning rate in the initial phase of training reduces the variance of the gradient, and improves the conditioning of the covariance of gradients. These effects are beneficial from the optimization perspective and become visible after the break-even point. Complementing prior work, we also show that using a low learning rate results in bad conditioning of the loss surface even for a neural network with batch normalization layers. In short, our work shows that key properties of the loss surface are strongly influenced by SGD in the early phase of training. We argue that studying the impact of the identified effects on generalization is a promising future direction.
 [32] arXiv:2002.09574 [pdf, other]

Title: Coded Federated Learning
Comments: Presented at the Wireless Edge Intelligence Workshop, IEEE GLOBECOM 2019
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Federated learning is a method of training a global model from decentralized data distributed across client devices. Here, model parameters are computed locally by each client device and exchanged with a central server, which aggregates the local models for a global view, without requiring sharing of training data. The convergence performance of federated learning is severely impacted in heterogeneous computing platforms such as those at the wireless edge, where straggling computations and communication links can significantly limit timely model parameter updates. This paper develops a novel coded computing technique for federated learning to mitigate the impact of stragglers. In the proposed Coded Federated Learning (CFL) scheme, each client device privately generates parity training data and shares it with the central server only once at the start of the training phase. The central server can then preemptively perform redundant gradient computations on the composite parity data to compensate for the erased or delayed parameter updates. Our results show that CFL allows the global model to converge nearly four times faster when compared to an uncoded approach.
 [33] arXiv:2002.09575 [pdf, other]

Title: A Multi-Channel Neural Graphical Event Model with Negative Evidence
Authors: Tian Gao, Dharmashankar Subramanian, Karthikeyan Shanmugam, Debarun Bhattacharjya, Nicholas Mattei
Comments: AAAI 2020
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Event datasets are sequences of events of various types occurring irregularly over the timeline, and they are increasingly prevalent in numerous domains. Existing work for modeling events using conditional intensities relies either on some underlying parametric form to capture historical dependencies, or on non-parametric models that focus primarily on tasks such as prediction. We propose a non-parametric deep neural network approach in order to estimate the underlying intensity functions. We use a novel multi-channel RNN that optimally reinforces the negative evidence of no observable events with the introduction of fake event epochs within each consecutive inter-event interval. We evaluate our method against state-of-the-art baselines on model fitting tasks as gauged by log-likelihood. Through experiments on both synthetic and real-world datasets, we find that our proposed approach outperforms existing baselines on most of the datasets studied.
 [34] arXiv:2002.09576 [pdf, other]

Title: UnMask: Adversarial Detection and Defense Through Robust Feature Alignment
Subjects: Computer Vision and Pattern Recognition (cs.CV); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Deep learning models are being integrated into a wide range of high-impact, security-critical systems, from self-driving cars to medical diagnosis. However, recent research has demonstrated that many of these deep learning architectures are vulnerable to adversarial attacks, highlighting the vital need for defensive techniques to detect and mitigate these attacks before they occur. To combat these adversarial attacks, we developed UnMask, an adversarial detection and defense framework based on robust feature alignment. The core idea behind UnMask is to protect these models by verifying that an image's predicted class ("bird") contains the expected robust features (e.g., beak, wings, eyes). For example, if an image is classified as "bird", but the extracted features are wheel, saddle and frame, the model may be under attack. UnMask detects such attacks and defends the model by rectifying the misclassification, reclassifying the image based on its robust features. Our extensive evaluation shows that UnMask (1) detects up to 96.75% of attacks, with a false positive rate of 9.66% and (2) defends the model by correctly classifying up to 93% of adversarial images produced by the current strongest attack, Projected Gradient Descent, in the gray-box setting. UnMask provides significantly better protection than adversarial training across 8 attack vectors, averaging 31.18% higher accuracy. Our proposed method is architecture agnostic and fast. We open source the code repository and data with this paper: https://github.com/unmaskd/unmask.
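The bird/bicycle example in the abstract can be sketched as a set comparison. The sketch below assumes that features arrive as sets of labels, that agreement is measured with Jaccard similarity, and that a fixed threshold of 0.5 separates benign from suspicious predictions; the `EXPECTED_FEATURES` table, the function names, and the threshold are all illustrative assumptions rather than details confirmed by the abstract.

```python
# Hypothetical table of robust features per class, used only for illustration.
EXPECTED_FEATURES = {
    "bird": {"beak", "wings", "eyes"},
    "bicycle": {"wheel", "saddle", "frame"},
}


def jaccard(a, b):
    """Jaccard similarity of two sets (1.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def check_prediction(predicted, extracted, expected=EXPECTED_FEATURES,
                     threshold=0.5):
    """Return (final_class, attack_suspected).

    The prediction is kept when the extracted features align with the
    features expected for the predicted class; otherwise the input is
    flagged and reassigned to the class whose features match best.
    """
    if jaccard(extracted, expected[predicted]) >= threshold:
        return predicted, False
    best = max(expected, key=lambda c: jaccard(extracted, expected[c]))
    return best, True
```

Calling `check_prediction("bird", {"wheel", "saddle", "frame"})` flags the input and reassigns it to `"bicycle"`, mirroring the example in the abstract.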
 [35] arXiv:2002.09577 [pdf, other]

Title: Emulating duration and curvature of coral snake anti-predator thrashing behaviors using a soft-robotic platform
Authors: Shannon M. Danforth, Margaret Kohler, Daniel Bruder, Alison R. Davis Rabosky, Sridhar Kota, Ram Vasudevan, Talia Y. Moore
Comments: 6 pages, 7 figures
Subjects: Robotics (cs.RO)
This paper presents a soft-robotic platform for exploring the ecological relevance of non-locomotory movements via animal-robot interactions. Coral snakes (genus Micrurus) and their mimics use vigorous, non-locomotory, and arrhythmic thrashing to deter predation. There is variation across snake species in the duration and curvature of anti-predator thrashes, and it is unclear how these aspects of motion interact to contribute to snake survival. In this work, soft robots composed of fiber-reinforced elastomeric enclosures (FREEs) are developed to emulate the anti-predator behaviors of three genera of snake. Curvature and duration of motion are estimated for both live snakes and robots, providing a quantitative assessment of the robots' ability to emulate snake poses. The curvature values of the fabricated soft-robotic head, midsection, and tail segments are found to overlap with those exhibited by live snakes. Soft robot motion durations were less than or equal to those of snakes for all three genera. Additionally, combinations of segments were selected to emulate three specific snake genera with distinct anti-predatory behavior, producing curvature values that aligned well with live snake observations.
 [36] arXiv:2002.09579 [pdf, other]

Title: Robustness to Programmable String Transformations via Augmented Abstract Training
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Deep neural networks for natural language processing tasks are vulnerable to adversarial input perturbations. In this paper, we present a versatile language for programmatically specifying string transformations (e.g., insertions, deletions, substitutions, and swaps) that are relevant to the task at hand. We then present an approach to adversarially training models that are robust to such user-defined string transformations. Our approach combines the advantages of search-based techniques for adversarial training with abstraction-based techniques. Specifically, we show how to decompose a set of user-defined string transformations into two component specifications, one that benefits from search and another from abstraction. We use our technique to train models on the AG and SST2 datasets and show that the resulting models are robust to combinations of user-defined transformations mimicking spelling mistakes and other meaning-preserving transformations.
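To make the notion of a user-defined perturbation space concrete, here is a minimal sketch that treats each transformation as a plain Python generator and enumerates every string reachable within a given number of applications. This is an illustration of the search-based side only; it is not the specification language from the paper, and the names `swaps`, `deletions`, and `perturbations` are hypothetical.

```python
def swaps(s):
    """Yield every string obtained by swapping two adjacent, distinct characters."""
    for i in range(len(s) - 1):
        if s[i] != s[i + 1]:
            yield s[:i] + s[i + 1] + s[i] + s[i + 2:]


def deletions(s):
    """Yield every string obtained by deleting one character."""
    for i in range(len(s)):
        yield s[:i] + s[i + 1:]


def perturbations(s, transformations, depth):
    """Return all strings reachable from s with at most `depth` transformations."""
    seen = {s}
    frontier = {s}
    for _ in range(depth):
        # Apply every transformation to every string found in the last round.
        frontier = {t for cur in frontier
                    for transform in transformations
                    for t in transform(cur)} - seen
        seen |= frontier
    return seen
```

The size of the returned set grows quickly with `depth` and with string length, which is the usual reason exhaustive search is paired with a cheaper over-approximation.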
 [37] arXiv:2002.09581 [pdf]

Title: Extracting and Validating Explanatory Word Archipelagoes using Dual Entropy
Comments: 7 pages, 2 figures, 2 columns
Subjects: Computation and Language (cs.CL)
The logical connectivity of text is represented by the connectivity of words that form archipelagoes. Here, each archipelago is a sequence of islands of the occurrences of a certain word. An island here means the local sequence of sentences where the word is emphasized, and an archipelago of a length comparable to the target text is extracted using the co-variation of entropy A (the window-based entropy) on the distribution of the word's occurrences with the width of each time window. Then, the logical connectivity of text is evaluated on entropy B (the graph-based entropy) computed on the distribution of sentences to connected word-clusters obtained on the co-occurrence of words. The results show the parts of the target text with words forming archipelagoes extracted on entropy A, without learned or prepared knowledge, form an explanatory part of the text that is of smaller entropy B than the parts extracted by the baseline methods.
 [38] arXiv:2002.09587 [pdf, ps, other]

Title: The Sample Complexity of Meta Sparse Regression
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
This paper addresses the meta-learning problem in sparse linear regression with infinite tasks. We assume that the learner can access several similar tasks. The goal of the learner is to transfer knowledge from the prior tasks to a similar but novel task. For $p$ parameters, support set size $k$, and $l$ samples per task, we show that $T \in O((k \log p)/l)$ tasks are sufficient in order to recover the common support of all tasks. With the recovered support, we can greatly reduce the sample complexity for estimating the parameter of the novel task, i.e., $l \in O(1)$ with respect to $T$ and $p$. We also prove that our rates are minimax optimal. A key difference between meta-learning and the classical multi-task learning is that meta-learning focuses only on the recovery of the parameters of the novel task, while multi-task learning estimates the parameter of all tasks, which requires $l$ to grow with $T$. Instead, our efficient meta-learning estimator allows for $l$ to be constant with respect to $T$ (i.e., few-shot learning).
 [39] arXiv:2002.09591 [pdf, other]

Title: Dynamics of large scale networks following a merger
Comments: 8 pages, 17 figures
Journal-ref: T. Ozyer and R. Alhajj (eds), Machine Learning Techniques for Online Social Networks, Lecture Notes in Social Networks, pages 173-193. Springer, 2018
Subjects: Social and Information Networks (cs.SI); Physics and Society (physics.soc-ph)
We study the dynamic network of relationships among avatars in the massively multiplayer online game Planetside 2. In the spring of 2014, two separate servers of this game were merged, and as a result, two previously distinct networks were combined into one. We observed the evolution of this network in the seven-month period following the merger and report our observations. We found that some structures of original networks persist in the combined network for a long time after the merger. As the original avatars are gradually removed, these structures slowly dissolve, but they remain observable for a surprisingly long time. We present a number of visualizations illustrating the post-merger dynamics and discuss time evolution of selected quantities characterizing the topology of the network.
 [40] arXiv:2002.09594 [pdf, other]

Title: OCGNN: One-class Classification with Graph Neural Networks
Comments: 7 pages, 2 figures
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Nowadays, graph-structured data are increasingly used to model complex systems. Meanwhile, detecting anomalies in graphs has become a vital research problem of pressing societal concern. Anomaly detection is an unsupervised learning task of identifying rare data that differ from the majority. As one of the dominant anomaly detection algorithms, One Class Support Vector Machine has been widely used to detect outliers. However, those traditional anomaly detection methods lose their effectiveness on graph data. Since traditional anomaly detection methods are stable, robust and easy to use, it is vitally important to generalize them to graph data. In this work, we propose One Class Graph Neural Network (OCGNN), a one-class classification framework for graph anomaly detection. OCGNN is designed to combine the powerful representation ability of Graph Neural Networks with the classical one-class objective. Compared with other baselines, OCGNN achieves significant improvements in extensive experiments.
 [41] arXiv:2002.09595 [pdf]

Title: The Pragmatic Turn in Explainable Artificial Intelligence (XAI)
Authors: Andrés Páez
Journal-ref: Minds and Machines, 29(3), 441-459, 2019
Subjects: Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
In this paper I argue that the search for explainable models and interpretable decisions in AI must be reformulated in terms of the broader project of offering a pragmatic and naturalistic account of understanding in AI. Intuitively, the purpose of providing an explanation of a model or a decision is to make it understandable to its stakeholders. But without a previous grasp of what it means to say that an agent understands a model or a decision, the explanatory strategies will lack a well-defined goal. Aside from providing a clearer objective for XAI, focusing on understanding also allows us to relax the factivity condition on explanation, which is impossible to fulfill in many machine learning models, and to focus instead on the pragmatic conditions that determine the best fit between a model and the methods and devices deployed to understand it. After an examination of the different types of understanding discussed in the philosophical and psychological literature, I conclude that interpretative or approximation models not only provide the best way to achieve the objectual understanding of a machine learning model, but are also a necessary condition to achieve post-hoc interpretability. This conclusion is partly based on the shortcomings of the purely functionalist approach to post-hoc interpretability that seems to be predominant in most recent literature.
 [42] arXiv:2002.09597 [pdf, other]

Title: On Layered Fan-Planar Graph Drawings
Authors: Therese Biedl, Steven Chaplick, Jiři Fiala, Michael Kaufmann, Fabrizio Montecchiani, Martin Nöllenburg, Chrysanthi Raftopoulou
Subjects: Computational Geometry (cs.CG); Discrete Mathematics (cs.DM)
In this paper, we study fan-planar drawings that use $h$ layers and are proper, i.e., edges connect adjacent layers. We show that if the embedding of the graph is fixed, then testing the existence of such drawings is fixed-parameter tractable in $h$, via a reduction to a similar result for planar graphs by Dujmović et al. If the embedding is not fixed, then we give partial results for $h=2$: It was already known how to test existence of fan-planar proper 2-layer drawings for 2-connected graphs, and we show here how to test this for trees. Along the way, we exhibit other interesting results for graphs with a fan-planar proper $h$-layer drawing; in particular we bound their pathwidth and show that they have a bar-1-visibility representation.
 [43] arXiv:2002.09598 [pdf, ps, other]

Title: A characterization of proportionally representative committees
Subjects: Computer Science and Game Theory (cs.GT); Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA); Theoretical Economics (econ.TH)
A well-known axiom for proportional representation is Proportionality of Solid Coalitions (PSC). We characterize committees satisfying PSC as possible outcomes of the Minimal Demand rule, which generalizes an approach pioneered by Michael Dummett.
 [44] arXiv:2002.09599 [pdf, other]

Title: Training Question Answering Models From Synthetic Data
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Question and answer generation is a data augmentation method that aims to improve question answering (QA) models given the limited amount of human labeled data. However, a considerable gap remains between synthetic and human-generated question-answer pairs. This work aims to narrow this gap by taking advantage of large language models and explores several factors such as model size, quality of pretrained models, scale of data synthesized, and algorithmic choices. On the SQuAD1.1 question answering task, we achieve higher accuracy using solely synthetic questions and answers than when using the SQuAD1.1 training set questions alone. Removing access to real Wikipedia data, we synthesize questions and answers from a synthetic corpus generated by an 8.3 billion parameter GPT-2 model. With no access to human supervision and only access to other models, we are able to train state of the art question answering networks on entirely model-generated data that achieve 88.4 Exact Match (EM) and 93.9 F1 score on the SQuAD1.1 dev set. We further apply our methodology to SQuAD2.0 and show a 2.8 absolute gain on EM score compared to prior work using synthetic data.
 [45] arXiv:2002.09600 [pdf, other]

Title: Convex Shape Representation with Binary Labels for Image Segmentation: Models and Fast Algorithms
Subjects: Numerical Analysis (math.NA); Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
We present a novel and effective binary representation for convex shapes. We show the equivalence between the shape convexity and some properties of the associated indicator function. The proposed method has two advantages. Firstly, the representation is based on a simple inequality constraint on the binary function rather than the definition of convex shapes, which allows us to obtain efficient algorithms for various applications with convexity prior. Secondly, this method is independent of the dimension of the concerned shape. In order to show the effectiveness of the proposed representation approach, we incorporate it into a probability-based model for object segmentation with convexity prior. Efficient algorithms are given to solve the proposed models using Lagrange multiplier methods and linear approximations. Various experiments are given to show the superiority of the proposed methods.
 [46] arXiv:2002.09603 [pdf, other]

Title: Efficient solvers for hybridized three-field mixed finite element coupled poromechanics
Subjects: Numerical Analysis (math.NA)
We consider a mixed hybrid finite element formulation for coupled poromechanics. A stabilization strategy based on a macro-element approach is advanced to eliminate the spurious pressure modes appearing in undrained/incompressible conditions. The efficient solution of the stabilized mixed hybrid block system is addressed by developing a class of block triangular preconditioners based on a Schur-complement approximation strategy. Robustness, computational efficiency and scalability of the proposed approach are theoretically discussed and tested using challenging benchmark problems on massively parallel architectures.
 [47] arXiv:2002.09604 [pdf, other]

Title: Emergent Communication with World Models
Comments: NeurIPS Workshop on Emergent Communication
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
We introduce Language World Models, a class of language-conditional generative models that interpret natural language messages by predicting latent codes of future observations. This provides a visual grounding of the message, similar to an enhanced observation of the world, which may include objects outside of the listening agent's field of view. We incorporate this "observation" into a persistent memory state, and allow the listening agent's policy to condition on it, akin to the relationship between memory and controller in a World Model. We show this improves effective communication and task success in 2D gridworld speaker-listener navigation tasks. In addition, we develop two losses framed specifically for our model-based formulation to promote positive signalling and positive listening. Finally, because messages are interpreted in a generative model, we can visualize the model beliefs to gain insight into how the communication channel is utilized.
 [48] arXiv:2002.09605 [pdf, ps, other]

Title: Error estimation of the Relaxation Finite Difference Scheme for the nonlinear Schrödinger Equation
Authors: Georgios E. Zouraris
Subjects: Numerical Analysis (math.NA)
We consider an initial and boundary value problem for the nonlinear Schrödinger equation with homogeneous Dirichlet boundary conditions in the one space dimension case. We discretize the problem in space by a central finite difference method and in time by the Relaxation Scheme proposed by C. Besse [C. R. Acad. Sci. Paris Sér. I 326 (1998), 1427-1432]. We provide optimal order error estimates, in the discrete $L_t^{\infty}(H_x^1)$ norm, for the approximation error at the time nodes and at the intermediate time nodes. In the context of the nonlinear Schrödinger equation, it is the first time that the derivation of an error estimate, for a fully discrete method based on the Relaxation Scheme, is completely addressed.
 [49] arXiv:2002.09607 [pdf, other]

Title: Multi-Representation Knowledge Distillation For Audio Classification
Subjects: Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)
As an important component of multimedia analysis tasks, audio classification aims to discriminate between different audio signal types and has received intensive attention due to its wide applications. Generally speaking, the raw signal can be transformed into various representations (such as Short Time Fourier Transform and Mel Frequency Cepstral Coefficients), and information implied in different representations can be complementary. Ensembling the models trained on different representations can greatly boost the classification performance; however, making inference using a large number of models is cumbersome and computationally expensive. In this paper, we propose a novel end-to-end collaborative learning framework for the audio classification task. The framework takes multiple representations as the input to train the models in parallel. The complementary information provided by different representations is shared by knowledge distillation. Consequently, the performance of each model can be significantly improved without increasing the computational overhead in the inference stage. Extensive experimental results demonstrate that the proposed approach can improve the classification performance and achieve state-of-the-art results on both acoustic scene classification tasks and general audio tagging tasks.
 [50] arXiv:2002.09609 [pdf, ps, other]

Title: Private Stochastic Convex Optimization: Efficient Algorithms for Nonsmooth Objectives
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
In this paper, we revisit the problem of private stochastic convex optimization. We propose an algorithm, based on noisy mirror descent, which achieves optimal rates up to a logarithmic factor, both in terms of statistical complexity and number of queries to a first-order stochastic oracle. Unlike prior work, we do not require Lipschitz continuity of stochastic gradients to achieve optimal rates. Our algorithm generalizes beyond the Euclidean setting and yields anytime utility and privacy guarantees.
 [51] arXiv:2002.09610 [pdf, ps, other]

Title: Improved MPC Algorithms for MIS, Matching, and Coloring on Trees and Beyond
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS)
We present $O(\log\log n)$-round scalable Massively Parallel Computation algorithms for maximal independent set and maximal matching, in trees and more generally graphs of bounded arboricity, as well as for constant-coloring trees. Following the standards, by a scalable MPC algorithm, we mean that these algorithms can work on machines that have capacity/memory as small as $n^{\delta}$ for any positive constant $\delta<1$. Our results improve over the $O(\log^2\log n)$-round algorithms of Behnezhad et al. [PODC'19]. Moreover, our matching algorithm is presumably optimal as its bound matches an $\Omega(\log\log n)$ conditional lower bound of Ghaffari, Kuhn, and Uitto [FOCS'19].
 [52] arXiv:2002.09616 [pdf, other]

Title: "Wait, I'm Still Talking!" Predicting the Dialogue Interaction Behavior Using Imagine-Then-Arbitrate Model
Subjects: Computation and Language (cs.CL)
Producing natural and accurate responses like human beings is the ultimate goal of intelligent dialogue agents. So far, most past work has concentrated on selecting or generating one pertinent and fluent response according to the current query and its context. These models work in a one-to-one environment, making one response to one utterance each round. However, in real human-human conversations, humans often send several short messages in sequence for readability instead of one long message in a single turn. Messages therefore do not end with an explicit ending signal, which is crucial for agents to decide when to reply. So the first step for an intelligent dialogue agent is not replying but deciding whether it should reply at the moment. To address this issue, in this paper, we propose a novel Imagine-then-Arbitrate (ITA) neural dialogue model to help the agent decide whether to wait or to make a response directly. Our method has two imaginator modules and an arbitrator module. The two imaginators learn the agent's and user's speaking styles respectively and generate possible utterances that, combined with the dialogue history, serve as the input of the arbitrator. The arbitrator then decides whether to wait or to make a response to the user directly. To verify the performance and effectiveness of our method, we prepared two dialogue datasets and compared our approach with several popular models. Experimental results show that our model performs well at addressing the ending-prediction issue and outperforms baseline models.
 [53] arXiv:2002.09617 [pdf, other]

Title: Power-Constrained Trajectory Optimization for Wireless UAV Relays with Random Requests
Comments: Accepted and to appear at IEEE ICC 2020
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
This paper studies the adaptive trajectory design of a rotary-wing UAV serving as a relay between a central base station and ground nodes that are dispersed in a circular cell and generate uplink data transmissions randomly according to a Poisson process. We seek to minimize the expected average communication delay to service the data transmission requests, subject to an average power constraint on the mobility of the UAV. The problem is cast as a semi-Markov decision process, and it is shown that the policy exhibits a two-scale structure, which can be efficiently optimized: in the outer decision, upon starting a communication phase, and given its current radius, the UAV selects a target end radius position so as to optimally balance a tradeoff between average long-term communication delay and power consumption; in the inner decision, the UAV selects its trajectory between the start radius and the selected end radius, so as to greedily minimize the delay and energy consumption to serve the current request. Numerical evaluations show that, during waiting phases, the UAV circles at some optimal radius at the most energy efficient speed, until a new request is received. Lastly, the expected average communication delay and power consumption of the optimal policy are compared to those of several heuristics, demonstrating a reduction in latency by over 50% and 20%, respectively, compared to static and mobile heuristic schemes.
 [54] arXiv:2002.09620 [pdf, other]

Title: Efficient Sentence Embedding via Semantic Subspace Analysis
Comments: 7 pages, 2 figures
Subjects: Computation and Language (cs.CL)
A novel sentence embedding method built upon semantic subspace analysis, called semantic subspace sentence embedding (S3E), is proposed in this work. Given the fact that word embeddings can capture semantic relationships while semantically similar words tend to form semantic groups in a high-dimensional embedding space, we develop a sentence representation scheme by analyzing semantic subspaces of its constituent words. Specifically, we construct a sentence model from two aspects. First, we represent words that lie in the same semantic group using the intra-group descriptor. Second, we characterize the interaction between multiple semantic groups with the inter-group descriptor. The proposed S3E method is evaluated on both textual similarity tasks and supervised tasks. Experimental results show that it offers comparable or better performance than the state-of-the-art. The complexity of our S3E method is also much lower than other parameterized models.
 [55] arXiv:2002.09623 [pdf, other]

Title: Anypath Routing Protocol Design via Q-Learning for Underwater Sensor Networks
Subjects: Networking and Internet Architecture (cs.NI); Machine Learning (cs.LG)
As a promising technology in the Internet of Underwater Things, underwater sensor networks have drawn widespread attention from both academia and industry. However, designing a routing protocol for underwater sensor networks is a great challenge due to high energy consumption and large latency in the underwater environment. This paper proposes a Q-learning-based localization-free anypath routing (QLFR) protocol to prolong the lifetime as well as reduce the end-to-end delay for underwater sensor networks. Aiming at optimal routing policies, the Q-value is calculated by jointly considering the residual energy and depth information of sensor nodes throughout the routing process. More specifically, we define two reward functions (i.e., depth-related and energy-related rewards) for Q-learning with the objective of reducing latency and extending network lifetime. In addition, a new holding time mechanism for packet forwarding is designed according to the priority of forwarding candidate nodes. Furthermore, a mathematical analysis is presented to analyze the performance of the proposed routing protocol. Extensive simulation results demonstrate the superior performance of the proposed routing protocol in terms of the end-to-end delay and the network lifetime.
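To make the idea of a Q-value driven by depth and energy rewards concrete, here is a minimal tabular Q-learning sketch. The reward shape, the weights, the learning rate, the discount factor, and the node names are hypothetical and are not taken from the paper; only the standard Q-learning update rule is assumed.

```python
from collections import defaultdict

ALPHA, GAMMA = 0.5, 0.9  # learning rate and discount factor (assumed values)

def reward(depth_gain, residual_energy, w_depth=0.5, w_energy=0.5):
    """Hypothetical reward: favour shallower next hops with more energy left.

    Both inputs are assumed to be normalised to [0, 1].
    """
    return w_depth * depth_gain + w_energy * residual_energy

def q_update(q, state, action, r, next_state, next_actions):
    """One tabular Q-learning step: Q <- Q + alpha * (TD target - Q)."""
    best_next = max((q[(next_state, a)] for a in next_actions), default=0.0)
    q[(state, action)] += ALPHA * (r + GAMMA * best_next - q[(state, action)])
    return q[(state, action)]

q = defaultdict(float)
# Node "n1" forwards to "n2", whose only candidate next hop is the sink.
r = reward(depth_gain=0.8, residual_energy=0.6)
value = q_update(q, "n1", "n2", r, "n2", ["sink"])
```

A forwarding node could then rank its candidate next hops by their Q-values, which is the kind of priority a holding-time mechanism can build on.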
 [56] arXiv:2002.09626 [pdf, other]

Title: Feedback Identification of conductance-based models
Comments: 14 pages, 10 figures
Subjects: Systems and Control (eess.SY)
This paper applies the classical prediction error method (PEM) to the estimation of nonlinear models of neuronal systems subject to input-additive noise. While the nonlinear system exhibits excitability, bifurcations, and limit-cycle oscillations, we prove consistency of the parameter estimation procedure under output feedback. Hence, this paper provides a rigorous framework for the application of conventional nonlinear system identification methods to stochastic neuronal systems. The main result exploits the elementary property that conductance-based models of neurons have exponentially contracting inverse dynamics. This property is implied by the voltage-clamp experiment, which has been the fundamental modeling experiment of neurons ever since the pioneering work of Hodgkin and Huxley.
 [57] arXiv:2002.09627 [pdf, ps, other]

Title: Feedback for nonlinear system identification
Comments: 18th European Control Conference (ECC), Napoli, Italy, June 25-28 2019
Journal-ref: 18th European Control Conference (ECC), Naples, Italy, 2019, pp. 1344-1349
Subjects: Systems and Control (eess.SY)
Motivated by neuronal models from neuroscience, we consider the system identification of simple feedback structures whose behaviors include nonlinear phenomena such as excitability, limit cycles and chaos. We show that output feedback is sufficient to solve the identification problem in a two-step procedure. First, the nonlinear static characteristic of the system is extracted, and second, using a feedback linearizing law, a mildly nonlinear system with an approximately-finite memory is identified. In an ideal setting, the second step boils down to the identification of an LTI system. To illustrate the method in a realistic setting, we present numerical simulations of the identification of two classical systems that fit the assumed model structure.
 [58] arXiv:2002.09629 [pdf, other]

Title: An Empirical Study of Android Security Bulletins in Different Vendors
Subjects: Cryptography and Security (cs.CR)
Mobile devices encroach on almost every part of our lives, including work and leisure, and contain a wealth of personal and sensitive information. It is, therefore, imperative that these devices uphold high security standards. A key aspect is the security of the underlying operating system. In particular, Android plays a critical role due to being the most dominant platform in the mobile ecosystem with more than one billion active devices and due to its openness, which allows vendors to adopt and customize it. Similar to other platforms, Android maintains security by providing monthly security patches and announcing them via the Android security bulletin. To absorb this information successfully across the Android ecosystem, impeccable coordination by many different vendors is required.
In this paper, we perform a comprehensive study of 3,171 Android-related vulnerabilities and examine the degree to which they are reflected in the Android security bulletin, as well as in the security bulletins of three leading vendors: Samsung, LG, and Huawei. In our analysis, we focus on the metadata of these security bulletins (e.g., timing, affected layers, severity, and CWE data) to better understand the similarities and differences among vendors. We find that (i) the studied vendors in the Android ecosystem have adopted different structures for vulnerability reporting, (ii) vendors are less likely to react with delay for CVEs with Android Git repository references, and (iii) vendors handle Qualcomm-related CVEs differently from the rest of the external-layer CVEs.
 [59] arXiv:2002.09632 [pdf, other]

Title: Using Single-Step Adversarial Training to Defend Iterative Adversarial Examples
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Machine Learning (stat.ML)
Adversarial examples have become one of the largest challenges that machine learning models, especially neural network classifiers, face. These adversarial examples break the assumption of an attack-free scenario and fool state-of-the-art (SOTA) classifiers with perturbations that are insignificant to humans. So far, researchers have achieved great progress in utilizing adversarial training as a defense. However, the overwhelming computational cost degrades its applicability and little has been done to overcome this issue. Single-step adversarial training methods have been proposed as computationally viable solutions; however, they still fail to defend against iterative adversarial examples. In this work, we first experimentally analyze several different SOTA defense methods against adversarial examples. Then, based on observations from experiments, we propose a novel single-step adversarial training method which can defend against both single-step and iterative adversarial examples. Lastly, through extensive evaluations, we demonstrate that our proposed method outperforms the SOTA single-step and iterative adversarial training defense. Compared with ATDA (single-step method) on the CIFAR-10 dataset, our proposed method achieves 35.67% enhancement in test accuracy and 19.14% reduction in training time. When compared with methods that use BIM or Madry examples (iterative methods) on the CIFAR-10 dataset, it saves up to 76.03% in training time with less than 3.78% degeneration in test accuracy.
 [60] arXiv:2002.09634 [pdf, other]

Title: Data Augmentation for Copy-Mechanism in Dialogue State Tracking
Subjects: Computation and Language (cs.CL)
While several state-of-the-art approaches to dialogue state tracking (DST) have shown promising performances on several benchmarks, there is still a significant performance gap between seen slot values (i.e., values that occur in both training set and test set) and unseen ones (values that occur in the test set but not in the training set). Recently, the copy-mechanism has been widely used in DST models to handle unseen slot values, which copies slot values from user utterance directly. In this paper, we aim to find out the factors that influence the generalization ability of a common copy-mechanism model for DST. Our key observations include: 1) the copy-mechanism tends to memorize values rather than infer them from contexts, which is the primary reason for unsatisfactory generalization performance; 2) greater diversity of slot values in the training set increases the performance on unseen values but slightly decreases the performance on seen values. Moreover, we propose a simple but effective algorithm of data augmentation to train copy-mechanism models, which augments the input dataset by copying user utterances and replacing the real slot values with randomly generated strings. Users could use two hyperparameters to realize a tradeoff between the performances on seen values and unseen ones, as well as a tradeoff between overall performance and computational cost. Experimental results on three widely used datasets (WoZ 2.0, DSTC2, and MultiWoZ 2.0) show the effectiveness of our approach.
 [61] arXiv:2002.09636 [pdf, other]

Title: Conceptual Game Expansion
Comments: 14 pages, 5 figures
Subjects: Artificial Intelligence (cs.AI)
Automated game design is the problem of automatically producing games through computational processes. Traditionally these methods have relied on the authoring of search spaces by a designer, defining the space of all possible games for the system to author. In this paper we instead learn representations of existing games and use these to approximate a search space of novel games. In a human subject study we demonstrate that these novel games are indistinguishable from human games for certain measures.
 [62] arXiv:2002.09637 [pdf, other]

Title: Markov Chain Monte-Carlo Phylogenetic Inference Construction in Computational Historical Linguistics
Authors: Tianyi Ni
Subjects: Computation and Language (cs.CL)
More and more of the world's languages are under study nowadays; as a result, the traditional approach to historical linguistics is facing some challenges. For example, comparative linguistic research across languages needs manual annotation, which becomes less and less feasible as the amount of language data grows around the world. Although they can hardly replace linguists' work, automatic computational methods have been taken into consideration and can help reduce the workload. One of the most important tasks in historical linguistics is comparing words from different languages and finding their cognates, which helps to determine whether two languages are related to each other. In this paper, I use a computational method to cluster the languages and use a Markov Chain Monte Carlo (MCMC) method to build the language typology relationship tree based on the clusters.
 [63] arXiv:2002.09646 [pdf, other]

Title: Machine Translation System Selection from Bandit Feedback
Subjects: Computation and Language (cs.CL)
Adapting machine translation systems in the real world is a difficult problem. In contrast to offline training, users cannot provide the type of fine-grained feedback typically used for improving the system. Moreover, users have different translation needs, and even a single user's needs may change over time.
In this work we take a different approach, treating the problem of adapting as one of selection. Instead of adapting a single system, we train many translation systems using different architectures and data partitions. Using bandit learning techniques on simulated user feedback, we learn a policy to choose which system to use for a particular translation task. We show that our approach can (1) quickly adapt to address domain changes in translation tasks, (2) outperform the single best system in mixed-domain translation tasks, and (3) make effective instance-specific decisions when using contextual bandit strategies.
 [64] arXiv:2002.09650 [pdf, other]

Title: Learning Cost Functions for Optimal Transport
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Learning the cost function for optimal transport from an observed transport plan or its samples has been cast as a bilevel optimization problem. In this paper, we derive an unconstrained convex optimization formulation for the problem which can be further augmented by any customizable regularization. This novel framework avoids repeatedly solving a forward optimal transport problem in each iteration, which has been a thorny computational bottleneck for the bilevel optimization approach. To validate the effectiveness of this framework, we develop two numerical algorithms: one is a fast matrix scaling method based on the Sinkhorn-Knopp algorithm for the discrete case, and the other is a supervised learning algorithm that realizes the cost function as a deep neural network in the continuous case. Numerical results demonstrate promising efficiency and accuracy advantages of the proposed algorithms over existing state-of-the-art methods.
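For readers unfamiliar with Sinkhorn-Knopp matrix scaling, the sketch below shows the classical forward version: given a cost matrix and two marginals, it alternately rescales rows and columns of a Gibbs kernel to obtain an entropy-regularised transport plan. This is the standard forward solver, not the paper's cost-learning algorithm; the regularisation strength `eps`, the iteration count, and the toy inputs are assumptions.

```python
import math

def sinkhorn(cost, mu, nu, eps=0.5, iters=500):
    """Entropy-regularised transport plan via Sinkhorn-Knopp scaling."""
    n, m = len(mu), len(nu)
    # Gibbs kernel K_ij = exp(-C_ij / eps)
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        # Alternately rescale rows and columns to match the marginals.
        u = [mu[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [nu[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]

# Toy problem: two sources, two targets, cost 1 for moving across.
cost = [[0.0, 1.0], [1.0, 0.0]]
plan = sinkhorn(cost, mu=[0.5, 0.5], nu=[0.3, 0.7])
```

The returned plan has row sums close to `mu` and column sums close to `nu`; the cost-learning problem described in the abstract runs in the opposite direction, from an observed plan back to a cost.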
 [65] arXiv:2002.09663 [pdf, other]

Title: Active Lighting Recurrence by Parallel Lighting Analogy for Fine-Grained Change Detection
Subjects: Computer Vision and Pattern Recognition (cs.CV)
This paper studies a new problem, namely active lighting recurrence (ALR), which physically relocalizes a light source to reproduce the lighting condition from a single reference image of the same scene, which may undergo fine-grained changes between the two observations. ALR is of great importance for fine-grained visual inspection and change detection, because some phenomena or minute changes can only be clearly observed under particular lighting conditions. Therefore, effective ALR should be able to navigate a light source online toward the target pose, which is challenging due to the complexity and diversity of real-world lighting and imaging processes. To this end, we propose to use simple parallel lighting as an analogy model and, based on the Lambertian law, to compose an instant navigation ball for this purpose. We theoretically prove the feasibility, i.e., equivalence and convergence, of this ALR approach for a realistic near point light source and a small near surface light source. Besides, we also theoretically prove the invariance of our ALR approach to the ambiguity of normal and lighting decomposition. The effectiveness and superiority of the proposed approach have been verified by both extensive quantitative experiments and challenging real-world tasks on fine-grained change detection of cultural heritages. We also validate the generality of our approach to non-Lambertian scenes.
 [66] arXiv:2002.09664 [pdf, other]

Title: Book-Ahead & Supply Management for Ridesourcing Platforms
Subjects: Systems and Control (eess.SY)
Ridesourcing platforms recently introduced the "schedule a ride" service where passengers may reserve (book-ahead) a ride in advance of their trip. Reservations give platforms precise information that describes the start time and location of anticipated future trips; in turn, platforms can use this information to adjust the availability and spatial distribution of the driver supply. In this article, we propose a framework for modeling/analyzing reservations in time-varying stochastic ridesourcing systems. We consider that the driver supply is distributed over a network of geographic regions and that book-ahead rides have reach time priority over non-reserved rides. First, we propose a state-dependent admission control policy that assigns drivers to passengers; this policy ensures that the reach time service requirement would be attained for book-ahead rides. Second, given the admission control policy and reservations information in each region, we predict the "target" number of drivers that is required (in the future) to probabilistically guarantee the reach time service requirement for stochastic non-reserved rides. Third, we propose a reactive dispatching/rebalancing mechanism that determines the adjustments to the driver supply that are needed to maintain the targets across regions. For a specific reach time quality of service, simulation results using data from Lyft rides in Manhattan exhibit how the number of idle drivers decreases with the fraction of book-ahead rides. We also observe that the nonstationary demand (ride request) rate varies significantly across time; this rapid variation further illustrates that time-dependent models are needed for operational analysis of ridesourcing systems.
 [67] arXiv:2002.09666 [pdf, ps, other]

Title: String stable integral control of vehicle platoons with disturbances
Comments: 7 pages, 4 figures, submitted to Automatica
Subjects: Systems and Control (eess.SY)
This paper presents string stable controllers with disturbance rejection properties for vehicle platoons. Through the addition of integral action and a coordinate change, sufficient smoothness conditions on the closed-loop system are established that ensure the proposed controller is string stable in the presence of time-varying disturbances, and is able to reject constant disturbances. Error bounds from desired platoon configuration are also developed. Further, a suitable controller structure is introduced, and an example is provided that achieves the required smoothness conditions and is examined in simulation studies.
 [68] arXiv:2002.09668 [pdf, other]

Title: Communication-Efficient Edge AI: Algorithms and Systems
Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible
Subjects: Information Theory (cs.IT); Machine Learning (cs.LG); Signal Processing (eess.SP)
Artificial intelligence (AI) has achieved remarkable breakthroughs in a wide range of fields, from speech processing and image classification to drug discovery. This is driven by the explosive growth of data, advances in machine learning (especially deep learning), and easy access to vastly powerful computing resources. Particularly, the wide scale deployment of edge devices (e.g., IoT devices) generates an unprecedented scale of data, which provides the opportunity to derive accurate models and develop various intelligent applications at the network edge. However, such enormous data cannot all be sent from end devices to the cloud for processing, due to the varying channel quality, traffic congestion and/or privacy concerns. By pushing inference and training processes of AI models to edge nodes, edge AI has emerged as a promising alternative. AI at the edge requires close cooperation among edge devices, such as smart phones and smart vehicles, and edge servers at the wireless access points and base stations, which, however, results in heavy communication overheads. In this paper, we present a comprehensive survey of the recent developments in various techniques for overcoming these communication challenges. Specifically, we first identify key communication challenges in edge AI systems. We then introduce communication-efficient techniques, from both algorithmic and system perspectives, for training and inference tasks at the network edge. Potential future research directions are also highlighted.
 [69] arXiv:2002.09670 [pdf, other]

Title: Nonmyopic Gaussian Process Optimization with Macro-Actions
Comments: 23rd International Conference on Artificial Intelligence and Statistics (AISTATS 2020), Extended version with proofs, 32 pages
Subjects: Machine Learning (cs.LG); Robotics (cs.RO); Machine Learning (stat.ML)
This paper presents a multi-staged approach to nonmyopic adaptive Gaussian process optimization (GPO) for Bayesian optimization (BO) of unknown, highly complex objective functions that, in contrast to existing nonmyopic adaptive BO algorithms, exploits the notion of macro-actions for scaling up to a further lookahead to match up to a larger available budget. To achieve this, we generalize GP upper confidence bound to a new acquisition function defined w.r.t. a nonmyopic adaptive macro-action policy, which is intractable to be optimized exactly due to an uncountable set of candidate outputs. The contribution of our work here is thus to derive a nonmyopic adaptive epsilon-Bayes-optimal macro-action GPO (epsilon-Macro-GPO) policy. To perform nonmyopic adaptive BO in real time, we then propose an asymptotically optimal anytime variant of our epsilon-Macro-GPO policy with a performance guarantee. We empirically evaluate the performance of our epsilon-Macro-GPO policy and its anytime variant in BO with synthetic and real-world datasets.
 [70] arXiv:2002.09671 [pdf, ps, other]

Title: Vehicle Tracking in Wireless Sensor Networks via Deep Reinforcement Learning
Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG); Machine Learning (stat.ML)
Vehicle tracking has become one of the key applications of wireless sensor networks (WSNs) in the fields of rescue, surveillance, traffic monitoring, etc. However, increased tracking accuracy requires more energy consumption. In this letter, a decentralized vehicle tracking strategy is conceived for improving both tracking accuracy and energy saving, which is based on adjusting the intersection area between the fixed sensing area and the dynamic activation area. Then, two deep reinforcement learning (DRL)-aided solutions are proposed relying on the dynamic selection of the activation area radius. Finally, simulation results show the superiority of our DRL-aided design.
 [71] arXiv:2002.09673 [pdf, other]

Title: Incorporating Effective Global Information via Adaptive Gate Attention for Text Classification
Subjects: Computation and Language (cs.CL)
The dominant text classification studies focus on training classifiers using textual instances only or introducing external knowledge (e.g., handcrafted features and domain expert knowledge). In contrast, some corpus-level statistical features, like word frequency and distribution, are not well exploited. Our work shows that such simple statistical information can enhance classification performance both efficiently and significantly compared with several baseline models. In this paper, we propose a classifier with a gate mechanism named Adaptive Gate Attention model with Global Information (AGA+GI), in which the adaptive gate mechanism incorporates global statistical features into latent semantic features and the attention layer captures dependency relationships within the sentence. To alleviate the overfitting issue, we propose a novel Leaky Dropout mechanism to improve generalization ability and performance stability. Our experiments show that the proposed method can achieve better accuracy than CNN-based and RNN-based approaches without global information on several benchmarks.
 [72] arXiv:2002.09674 [pdf, other]

Title: Temporal Sparse Adversarial Attack on Gait Recognition
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Gait recognition has a broad application in social security due to its advantages in long-distance human identification. Despite the high accuracy of gait recognition systems, their adversarial robustness has not been explored. In this paper, we demonstrate that the state-of-the-art gait recognition model is vulnerable to adversarial attacks. A novel temporal sparse adversarial attack under a newly defined distortion measurement is proposed. A GAN-based architecture is employed to semantically generate high-quality adversarial gait silhouettes. By sparsely substituting or inserting a few adversarial gait silhouettes, our proposed method can achieve a high attack success rate. The imperceptibility and the attack success rate of the adversarial examples are well balanced. Experimental results show that even when only one-fortieth of the frames are attacked, the attack success rate still reaches 76.8%.
 [73] arXiv:2002.09676 [pdf, other]

Title: Guided Constrained Policy Optimization for Dynamic Quadrupedal Robot Locomotion
Comments: 8 pages, 8 figures, 5 tables, 1 algorithm, accepted to IEEE Robotics and Automation Letters (RAL), January 2020 with presentation at International Conference on Robotics and Automation (ICRA) 2020
Subjects: Robotics (cs.RO); Machine Learning (cs.LG); Systems and Control (eess.SY)
Deep reinforcement learning (RL) uses model-free techniques to optimize task-specific control policies. Despite having emerged as a promising approach for complex problems, RL is still hard to use reliably for real-world applications. Apart from challenges such as precise reward function tuning, inaccurate sensing and actuation, and nondeterministic response, existing RL methods do not guarantee behavior within required safety constraints that are crucial for real robot scenarios. In this regard, we introduce guided constrained policy optimization (GCPO), an RL framework based upon our implementation of constrained proximal policy optimization (CPPO) for tracking base velocity commands while following the defined constraints. We also introduce schemes which encourage state recovery into constrained regions in case of constraint violations. We present experimental results of our training method and test it on the real ANYmal quadruped robot. We compare our approach against the unconstrained RL method and show that guided constrained RL offers faster convergence close to the desired optimum resulting in an optimal, yet physically feasible, robotic control behavior without the need for precise reward function tuning.
 [74] arXiv:2002.09680 [pdf, other]

Title: On Boolean gates in fungal colony
Authors: Andrew Adamatzky, Martin Tegelaar, Han A. B. Wosten, Anna L. Powell, Alexander E. Beasley, Richard Mayne
Subjects: Emerging Technologies (cs.ET)
A fungal colony maintains its integrity via flow of cytoplasm along the mycelium network. This flow, together with possible coordination of mycelium tip propagation, is controlled by calcium waves and associated waves of electrical potential changes. We propose that these excitation waves can be employed to implement computation in mycelium networks. We use the FitzHugh-Nagumo model to imitate propagation of excitation in a single colony of Aspergillus niger. Boolean values are encoded by spikes of extracellular potential. We represent binary inputs by electrical impulses on a pair of selected electrodes and we record responses of the colony from sixteen electrodes. We derive sets of two-inputs-one-output logical gates implementable in the fungal colony and analyse distributions of the gates.
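As a small illustration of the excitable dynamics behind the FitzHugh-Nagumo model, the sketch below integrates a single FitzHugh-Nagumo unit with forward Euler and counts spikes as upward threshold crossings. It models one isolated unit with textbook parameter values (a = 0.7, b = 0.8, eps = 0.08), not the spatial colony simulation of the paper; the input currents, time step, and spike threshold are assumptions.

```python
def simulate_fhn(current, steps=20000, dt=0.01, a=0.7, b=0.8, eps=0.08):
    """Forward-Euler integration of a single FitzHugh-Nagumo unit."""
    v, w = -1.2, -0.625  # near the resting state for zero input current
    trace = []
    for _ in range(steps):
        dv = v - v ** 3 / 3.0 - w + current  # fast (voltage-like) variable
        dw = eps * (v + a - b * w)           # slow recovery variable
        v, w = v + dt * dv, w + dt * dw
        trace.append(v)
    return trace

def count_spikes(trace, threshold=1.0):
    """Count upward crossings of the threshold."""
    return sum(1 for p, q in zip(trace, trace[1:]) if p < threshold <= q)

# A sustained input drives repeated spiking; without input the unit rests.
spikes_driven = count_spikes(simulate_fhn(current=0.5))
spikes_rest = count_spikes(simulate_fhn(current=0.0))
```

Reading the presence or absence of a spike at a recording site as True or False is the kind of encoding the abstract describes for Boolean values.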
 [75] arXiv:2002.09681 [pdf]

Title: Towards field-programmable photonic gate arrays
Comments: A version of this paper was presented at the OPTO conference in Photonics West 2020
Subjects: Emerging Technologies (cs.ET); Optics (physics.optics)
We review some of the basic principles, fundamentals, technologies, architectures and recent advances leading to the implementation of Field Programmable Photonic Gate Arrays (FPPGAs).
 [76] arXiv:2002.09682 [pdf, ps, other]

Title: Concurrent Kleene Algebra with Observations: from Hypotheses to Completeness
Subjects: Logic in Computer Science (cs.LO)
Concurrent Kleene Algebra (CKA) extends basic Kleene algebra with a parallel composition operator, which enables reasoning about concurrent programs. However, CKA fundamentally misses tests, which are needed to model standard programming constructs such as conditionals and $\mathsf{while}$-loops. It turns out that integrating tests in CKA is subtle, due to their interaction with parallelism. In this paper we provide a solution in the form of Concurrent Kleene Algebra with Observations (CKAO). Our main contribution is a completeness theorem for CKAO. Our result relies on a more general study of CKA "with hypotheses", of which CKAO turns out to be an instance: this analysis is of independent interest, as it can be applied to extensions of CKA other than CKAO.
 [77] arXiv:2002.09685 [pdf, other]

Title: Exploiting Typed Syntactic Dependencies for Targeted Sentiment Classification Using Graph Attention Neural Network
Subjects: Computation and Language (cs.CL)
Targeted sentiment classification predicts the sentiment polarity on given target mentions in input texts. Dominant methods employ neural networks for encoding the input sentence and extracting relations between target mentions and their contexts. Recently, graph neural networks have been investigated for integrating dependency syntax for the task, achieving state-of-the-art results. However, existing methods do not consider dependency label information, which can be intuitively useful. To solve the problem, we investigate a novel relational graph attention network that integrates typed syntactic dependency information. Results on standard benchmarks show that our method can effectively leverage label information for improving targeted sentiment classification performance. Our final model significantly outperforms state-of-the-art syntax-based approaches.
 [78] arXiv:2002.09689 [pdf, ps, other]

Title: Fair and Decentralized Exchange of Digital Goods
Comments: 10 pages
Subjects: Cryptography and Security (cs.CR); Computers and Society (cs.CY); Social and Information Networks (cs.SI)
We construct a privacy-preserving, distributed and decentralized marketplace where parties can exchange data for tokens. In this market, buyers and sellers make transactions in a blockchain and interact with a third party, called notary, who has the ability to vouch for the authenticity and integrity of the data.
We introduce a protocol for the data-token exchange where neither party gains more information than what it is paying for, and the exchange is fair: either both parties get the other's item or neither does. No third party involvement is required after setup, and no dispute resolution is needed.
 [79] arXiv:2002.09690 [pdf, other]

Title: A Positive and Energy Stable Numerical Scheme for the Poisson-Nernst-Planck-Cahn-Hilliard Equations with Steric Interactions
Subjects: Numerical Analysis (math.NA); Chemical Physics (physics.chem-ph); Computational Physics (physics.comp-ph)
We consider numerical methods for the Poisson-Nernst-Planck-Cahn-Hilliard (PNP-CH) equations with steric interactions. We propose a novel energy stable numerical scheme that respects mass conservation and positivity at the discrete level. Existence and uniqueness of the solution to the proposed nonlinear scheme are established by showing that the solution is a unique minimizer of a convex functional over a closed, convex domain. The positivity of numerical solutions is further theoretically justified by the singularity of the entropy terms, which prevents the minimizer from approaching zero concentrations. A further numerical analysis proves discrete free-energy dissipation. Extensive numerical tests are performed to validate that the numerical scheme is first-order accurate in time and second-order accurate in space, and is capable of preserving the desired properties, such as mass conservation, positivity, and free energy dissipation, at the discrete level. Moreover, the PNP-CH equations and the proposed scheme are applied to study charge dynamics and self-assembled nanopatterns in highly concentrated electrolytes that are widely used in electrochemical energy devices. Numerical results demonstrate that the PNP-CH equations and our numerical scheme are able to capture nanostructures, such as lamellar patterns and labyrinthine patterns in electric double layers and the bulk, and multiple time relaxation with multiple time scales. In addition, we numerically characterize the interplay between cross steric interactions of short range and the concentration gradient regularization, and their impact on the development of nanostructures in the equilibrium state.
 [80] arXiv:2002.09692 [pdf, other]

Title: Communication-Efficient Decentralized Learning with Sparsification and Adaptive Peer Selection
Subjects: Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (stat.ML)
Distributed learning techniques such as federated learning have enabled multiple workers to train machine learning models together to reduce the overall training time. However, current distributed training algorithms (centralized or decentralized) suffer from the communication bottleneck on multiple low-bandwidth workers (also on the server under the centralized architecture). Although decentralized algorithms generally have lower communication complexity than their centralized counterparts, they still suffer from the communication bottleneck for workers with low network bandwidth. To deal with the communication problem while preserving the convergence performance, we introduce a novel decentralized training algorithm with the following key features: 1) It does not require a parameter server to maintain the model during training, which avoids the communication pressure on any single peer. 2) Each worker only needs to communicate with a single peer at each communication round with a highly compressed model, which can significantly reduce the communication traffic on the worker. We theoretically prove that our sparsification algorithm still preserves convergence properties. 3) Each worker dynamically selects its peer at different communication rounds to better utilize the bandwidth resources. We conduct experiments with convolutional neural networks on 32 workers to verify the effectiveness of our proposed algorithm compared to seven existing methods. Experimental results show that our algorithm significantly reduces the communication traffic and generally selects relatively high-bandwidth peers.
 [81] arXiv:2002.09693 [pdf, other]

Title: Interpretable Crowd Flow Prediction with Spatial-Temporal Self-Attention
Comments: 7 pages
Subjects: Machine Learning (cs.LG); Information Retrieval (cs.IR); Image and Video Processing (eess.IV)
Crowd flow prediction has been increasingly investigated in the intelligent urban computing field as a fundamental component of urban management systems. The most challenging part of predicting crowd flow is to measure the complicated spatial-temporal dependencies. A prevalent solution employed in current methods is to divide and conquer the spatial and temporal information by various architectures (e.g., CNN/GCN, LSTM). However, this strategy has two disadvantages: (1) the sophisticated dependencies are also divided and therefore partially isolated; (2) the spatial-temporal features are transformed into latent representations when passing through different architectures, making it hard to interpret the predicted crowd flow. To address these issues, we propose a Spatial-Temporal Self-Attention Network (STSAN) with an ST encoding gate that calculates the entire spatial-temporal representation with positional and time encodings and therefore avoids dividing the dependencies. Furthermore, we develop a Multi-aspect attention mechanism that applies scaled dot-product attention over spatial-temporal information and measures the attention weights that explicitly indicate the dependencies. Experimental results on traffic and mobile data demonstrate that the proposed method reduces inflow and outflow RMSE by 16% and 8% on the Taxi-NYC dataset compared to the SOTA baselines.
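The scaled dot-product attention named in the abstract computes weights softmax(QK^T / sqrt(d_k)) and applies them to the values; the weights themselves are what can be inspected for interpretability. The following is a minimal pure-Python sketch of that generic operation only. The function names and the toy inputs are assumptions for illustration, and the sketch does not include the ST encoding gate or the multi-aspect structure of STSAN.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Return (outputs, weights) for softmax(Q K^T / sqrt(d_k)) V.

    queries: list of d_k-dimensional vectors
    keys:    list of d_k-dimensional vectors
    values:  list of vectors, one per key
    """
    d_k = len(keys[0])
    weights = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights.append(softmax(scores))
    outputs = [
        [sum(w * v[j] for w, v in zip(row, values)) for j in range(len(values[0]))]
        for row in weights
    ]
    return outputs, weights


if __name__ == "__main__":
    # Toy example: one query that aligns with the first key.
    out, attn = scaled_dot_product_attention(
        [[10.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]
    )
    print(attn[0])
```

Each row of the returned weights sums to one, so a row can be read directly as how strongly one position attends to every other position.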
 [82] arXiv:2002.09699 [pdf, other]

Title: FMore: An Incentive Scheme of Multidimensional Auction for Federated Learning in MEC
Subjects: Machine Learning (cs.LG); Computer Science and Game Theory (cs.GT); Machine Learning (stat.ML)
Federated learning coupled with Mobile Edge Computing (MEC) is considered one of the most promising solutions to AI-driven service provision. Plenty of studies focus on federated learning from the performance and security aspects, but they neglect the incentive mechanism. In MEC, edge nodes are reluctant to participate in learning voluntarily, and they differ in the provision of multidimensional resources, both of which might deteriorate the performance of federated learning. Also, lightweight schemes appeal to edge nodes in MEC. These features require the incentive mechanism to be well designed for MEC. In this paper, we present an incentive mechanism, FMore, with a multidimensional procurement auction of K winners. Our proposal FMore not only is lightweight and incentive compatible, but also encourages more high-quality edge nodes with low cost to participate in learning and eventually improves the performance of federated learning. We also present theoretical results of the Nash equilibrium strategy for edge nodes and employ expected utility theory to provide guidance to the aggregator. Both extensive simulations and real-world experiments demonstrate that the proposed scheme can effectively reduce the training rounds and drastically improve the model accuracy for challenging AI tasks.
 [83] arXiv:2002.09705 [pdf, other]

Title: The candy wrapper problem - a temporal multiscale approach for pde/pde systems
Subjects: Numerical Analysis (math.NA)
We describe a temporal multiscale approach for the simulation of long-term processes with short-term influences involving partial differential equations. The specific problem under consideration is a growth process in blood vessels. The \emph{Candy Wrapper Process} describes a restenosis in a vessel that has previously been widened by inserting a stent. The development of a new stenosis takes place on a long time horizon (months) while the acting forces are mainly given by the pulsating blood flow. We describe a coupled pde model and a finite element simulation that is used as the basis for our multiscale approach, which is based on averaging the long scale equation and approximating the fast scale impact by localized periodic-in-time problems. Numerical test cases in prototypical 3d configurations demonstrate the power of the approach.
 [84] arXiv:2002.09706 [pdf]

Title: Structural Combinatorial of Network Information System of Systems based on Evolutionary Optimization Method
Subjects: Neural and Evolutionary Computing (cs.NE)
The network information system is a military information network system with evolution characteristics. Evolution is a process of replacement between disorder and order, chaos and equilibrium. Given that the concept of evolution originates from biological systems, in this article, the evolution of network information architecture is analyzed by genetic algorithms, and the network information architecture is represented by chromosomes. Besides, the genetic algorithm is also applied to find the optimal chromosome in the architecture space. The evolutionary simulation is used to predict the optimal scheme of the network information architecture and provide a reference for system construction.
 [85] arXiv:2002.09707 [pdf, other]

Title: Compression with wildcards: All spanning trees
Authors: Marcel Wild
Comments: 14 pages
Subjects: Data Structures and Algorithms (cs.DS)
By processing all minimal cutsets of a graph G, and by using novel wildcards, all spanning trees of G can be compactly encoded. Thus, different from all previous enumeration schemes, the spanning trees are not generated one-by-one. The Mathematica implementation of one of our algorithms generated for a random (11,50)-graph its 819'603'181 spanning trees, in bundles of size about 400, within 52 seconds.
 [86] arXiv:2002.09708 [pdf, other]

Title: Robust Multimodal Brain Tumor Segmentation via Feature Disentanglement and Gated Fusion
Comments: MICCAI 2019
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Accurate medical image segmentation commonly requires effective learning of the complementary information from multimodal data. However, in clinical practice, we often encounter the problem of missing imaging modalities. We tackle this challenge and propose a novel multimodal segmentation framework which is robust to the absence of imaging modalities. Our network uses feature disentanglement to decompose the input modalities into the modality-specific appearance code, which uniquely sticks to each modality, and the modality-invariant content code, which absorbs multimodal information for the segmentation task. With enhanced modality-invariance, the disentangled content code from each modality is fused into a shared representation which gains robustness to missing data. The fusion is achieved via a learning-based strategy to gate the contribution of different modalities at different locations. We validate our method on the important yet challenging multimodal brain tumor segmentation task with the BRATS challenge dataset. With competitive performance to the state-of-the-art approaches for full modality, our method achieves outstanding robustness under various missing modality(ies) situations, significantly exceeding the state-of-the-art method by over 16% on average for Dice on whole tumor segmentation.
 [87] arXiv:2002.09710 [pdf, other]

Title: Actively Mapping Industrial Structures with Information Gain-Based Planning on a Quadruped Robot
Subjects: Robotics (cs.RO)
In this paper, we develop an online active mapping system to enable a quadruped robot to autonomously survey large physical structures. We describe the perception, planning and control modules needed to scan and reconstruct an object of interest, without requiring a prior model. The system builds a voxel representation of the object, and iteratively determines the Next-Best-View (NBV) to extend the representation, according to both the reconstruction itself and to avoid collisions with the environment. By computing the expected information gain of a set of candidate scan locations sampled on the as-sensed terrain map, as well as the cost of reaching these candidates, the robot decides the NBV for further exploration. The robot plans an optimal path towards the NBV, avoiding obstacles and untraversable terrain. Experimental results on both simulated and real-world environments show the capability and efficiency of our system. Finally, we present a full system demonstration on the real robot, the ANYbotics ANYmal, autonomously reconstructing a building facade and an industrial structure.
 [88] arXiv:2002.09718 [pdf, other]

Title: Safe Screening for the Generalized Conditional Gradient Method
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
The conditional gradient method (CGM) has been widely used for fast sparse approximation, having a low per-iteration computational cost for structured sparse regularizers. We explore the sparsity-acquiring properties of a generalized CGM (gCGM), where the constraint is replaced by a penalty function based on a gauge penalty; this can be done without significantly increasing the per-iteration computation, and applies to general notions of sparsity. Without assuming bounded iterates, we show $O(1/t)$ convergence of the function values and gap of gCGM. We couple this with a safe screening rule, and show that at a rate $O(1/(t\delta^2))$, the screened support matches the support at the solution, where $\delta \geq 0$ measures how close the problem is to being degenerate. In our experiments, we show that the gCGM for these modified penalties has similar feature selection properties to common penalties, but with potentially more stability over the choice of hyperparameter.
 [89] arXiv:2002.09719 [pdf, ps, other]

Title: Joint Transmission and Computing Scheduling for Status Update with Mobile Edge Computing
Comments: 6 pages, 6 figures, accepted by IEEE ICC'20
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Age of Information (AoI), defined as the time elapsed since the generation of the latest received update, is a promising performance metric to measure data freshness for real-time status monitoring. In many applications, status information needs to be extracted through computing, which can be processed at an edge server enabled by mobile edge computing (MEC). In this paper, we aim to minimize the average AoI within a given deadline by jointly scheduling the transmissions and computations of a series of update packets with deterministic transmission and computing times. The main analytical results are summarized as follows. Firstly, the minimum deadline to guarantee the successful transmission and computing of all packets is given. Secondly, a \emph{no-wait computing} policy which intuitively attains the minimum AoI is introduced, and the feasibility condition of the policy is derived. Finally, a closed-form optimal scheduling policy is obtained on the condition that the deadline exceeds a certain threshold. The behavior of the optimal transmission and computing policy is illustrated by numerical results with different values of the deadline, which validates the analytical results.
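The AoI definition in the abstract can be made concrete by integrating the age curve over a time horizon. The sketch below computes the time-average AoI from a list of (generation time, delivery time) pairs. It is an illustration of the metric only: the function name, the input format, and the assumption that a fresh update generated at time 0 is already available at the start are all hypothetical, and the sketch does not implement the scheduling policies derived in the paper.

```python
def average_aoi(updates, deadline, initial_generation=0.0):
    """Time-average Age of Information over [0, deadline].

    updates: list of (generation_time, delivery_time) pairs, where delivery_time
             is when the update becomes usable (e.g. after transmission and computing).
    The age at time t is t minus the generation time of the freshest delivered update.
    """
    area = 0.0
    now = 0.0
    latest_generation = initial_generation  # assumed: an update generated at t = 0 is known
    for generation, delivery in sorted(updates, key=lambda u: u[1]):
        if delivery > deadline:
            break
        # Age grows linearly between deliveries, so each segment is a trapezoid.
        start_age = now - latest_generation
        end_age = delivery - latest_generation
        area += (start_age + end_age) * (delivery - now) / 2.0
        now = delivery
        latest_generation = max(latest_generation, generation)
    # Final segment from the last delivery up to the deadline.
    start_age = now - latest_generation
    end_age = deadline - latest_generation
    area += (start_age + end_age) * (deadline - now) / 2.0
    return area / deadline


if __name__ == "__main__":
    # Two updates generated at t = 1 and t = 3, delivered at t = 2 and t = 4.
    print(average_aoi([(1.0, 2.0), (3.0, 4.0)], deadline=6.0))
```

In the toy call, the age rises from 0 to 2, drops to 1 at each delivery, and rises to 3 before the next drop or the deadline, giving a total area of 10 over a horizon of 6.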
 [90] arXiv:2002.09721 [pdf, other]

Title: General theory of interpolation error estimates on anisotropic meshes
Comments: 22 pages, 2 figures
Subjects: Numerical Analysis (math.NA)
We propose a general theory of estimating interpolation error for smooth functions in two and three dimensions. In our theory, the error of interpolation is bounded in terms of the diameter of a simplex and a geometric parameter. In the two-dimensional case, our geometric parameter is equivalent to the circumradius of a triangle. In the three-dimensional case, our geometric parameter also represents the flatness of a tetrahedron. Through the introduction of the geometric parameter, the newly obtained error estimates can be applied to cases that violate the maximum-angle condition.
 [91] arXiv:2002.09722 [pdf, ps, other]

Title: Checking Phylogenetic Decisiveness in Theory and in Practice
Subjects: Data Structures and Algorithms (cs.DS)
Suppose we have a set $X$ consisting of $n$ taxa and we are given information from $k$ loci from which to construct a phylogeny for $X$. Each locus offers information for only a fraction of the taxa. The question is whether this data suffices to construct a reliable phylogeny. The decisiveness problem expresses this question combinatorially. Although a precise characterization of decisiveness is known, the complexity of the problem is open. Here we relate decisiveness to a hypergraph coloring problem. We use this idea to (1) obtain lower bounds on the amount of coverage needed to achieve decisiveness, (2) devise an exact algorithm for decisiveness, (3) develop problem reduction rules, and use them to obtain efficient algorithms for inputs with few loci, and (4) devise an integer linear programming formulation of the decisiveness problem, which allows us to analyze data sets that arise in practice.
 [92] arXiv:2002.09723 [pdf, other]

Title: Constructing fast approximate eigenspaces with application to the fast graph Fourier transforms
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP); Numerical Analysis (math.NA); Machine Learning (stat.ML)
We investigate numerically efficient approximations of eigenspaces associated to symmetric and general matrices. The eigenspaces are factored into a fixed number of fundamental components that can be efficiently manipulated (we consider extended orthogonal Givens or scaling and shear transformations). The number of these components controls the trade-off between approximation accuracy and the computational complexity of projecting on the eigenspaces. We write minimization problems for the single fundamental components and provide closed-form solutions. Then we propose algorithms that iteratively update all these components until convergence. We show results on random matrices and an application on the approximation of graph Fourier transforms for directed and undirected graphs.
 [93] arXiv:2002.09725 [pdf, other]

Title: Testing the Agreement of Trees with Internal Labels
Subjects: Data Structures and Algorithms (cs.DS)
The input to the agreement problem is a collection $P = \{T_1, T_2, \dots , T_k\}$ of phylogenetic trees, called input trees, over partially overlapping sets of taxa. The question is whether there exists a tree $T$, called an agreement tree, whose taxon set is the union of the taxon sets of the input trees, such that for each $i \in \{1, 2, \dots , k\}$, the restriction of $T$ to the taxon set of $T_i$ is isomorphic to $T_i$. We give an $O(n k (\sum_{i \in [k]} d_i + \log^2(nk)))$ algorithm for a generalization of the agreement problem in which the input trees may have internal labels, where $n$ is the total number of distinct taxa in $P$, $k$ is the number of trees in $P$, and $d_i$ is the maximum number of children of a node in $T_i$.
 [94] arXiv:2002.09726 [pdf, other]

Title: Operator inference for non-intrusive model reduction of systems with non-polynomial nonlinear terms
Subjects: Numerical Analysis (math.NA); Machine Learning (cs.LG); Dynamical Systems (math.DS); Machine Learning (stat.ML)
This work presents a non-intrusive model reduction method to learn low-dimensional models of dynamical systems with non-polynomial nonlinear terms that are spatially local and that are given in analytic form. In contrast to state-of-the-art model reduction methods that are intrusive and thus require full knowledge of the governing equations and the operators of a full model of the discretized dynamical system, the proposed approach requires only the non-polynomial terms in analytic form and learns the rest of the dynamics from snapshots computed with a potentially black-box full-model solver. The proposed method learns operators for the linear and polynomially nonlinear dynamics via a least-squares problem, where the given non-polynomial terms are incorporated in the right-hand side. The least-squares problem is linear and thus can be solved efficiently in practice. The proposed method is demonstrated on three problems governed by partial differential equations, namely the diffusion-reaction Chafee-Infante model, a tubular reactor model for reactive flows, and a batch-chromatography model that describes a chemical separation process. The numerical results provide evidence that the proposed approach learns reduced models that achieve accuracy comparable to that of models constructed with state-of-the-art intrusive model reduction methods that require full knowledge of the governing equations.
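The idea of moving a known non-polynomial term to the right-hand side of a linear least-squares problem can be shown on a scalar toy system. The sketch below assumes dynamics of the form dx/dt = a*x + b*x^2 + g(x), with g known analytically, and recovers a and b from state and derivative samples via the 2x2 normal equations. The model form, the choice g(x) = exp(-x), and all names are assumptions for illustration; the sketch omits the projection onto a reduced basis and is not the authors' implementation.

```python
import math


def known_term(x):
    """Hypothetical non-polynomial term, assumed to be given in analytic form."""
    return math.exp(-x)


def infer_operators(states, derivatives):
    """Fit a, b in dx/dt = a*x + b*x^2 + known_term(x) by linear least squares.

    The known term is subtracted from the derivative data, so the unknowns
    enter linearly and the 2x2 normal equations can be solved directly.
    """
    rhs = [xd - known_term(x) for x, xd in zip(states, derivatives)]
    s11 = sum(x ** 2 for x in states)
    s12 = sum(x ** 3 for x in states)
    s22 = sum(x ** 4 for x in states)
    t1 = sum(x * r for x, r in zip(states, rhs))
    t2 = sum(x * x * r for x, r in zip(states, rhs))
    det = s11 * s22 - s12 * s12
    a = (t1 * s22 - t2 * s12) / det
    b = (s11 * t2 - s12 * t1) / det
    return a, b


if __name__ == "__main__":
    # Synthetic snapshots from a system with a = -2.0 and b = 0.5.
    xs = [0.1 * i for i in range(1, 11)]
    xdots = [-2.0 * x + 0.5 * x * x + known_term(x) for x in xs]
    print(infer_operators(xs, xdots))
```

Because the unknown coefficients appear only linearly once the known term is subtracted, no nonlinear optimization is needed, which mirrors the efficiency argument in the abstract.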
 [95] arXiv:2002.09733 [pdf, ps, other]

Title: Numerical Analysis of a High-Order Scheme for Nonlinear Fractional Differential Equations with Uniform Accuracy
Subjects: Numerical Analysis (math.NA)
We introduce a high-order numerical scheme for fractional ordinary differential equations with the Caputo derivative. The method is developed by dividing the domain into a number of subintervals, and applying the quadratic interpolation on each subinterval. The method is shown to be unconditionally stable, and for general nonlinear equations, the uniform sharp numerical order $3-\nu$ can be rigorously proven for sufficiently smooth solutions at all time steps. The proof provides a general guide for proving the sharp order for higher-order schemes in the nonlinear case. Some numerical examples are given to validate our theoretical results.
 [96] arXiv:2002.09740 [pdf, other]

Title: (Faster) Multi-Sided Boundary Labelling
Comments: 16 pages, 12 figures
Subjects: Computational Geometry (cs.CG)
A 1-bend boundary labelling problem consists of an axis-aligned rectangle $B$, $n$ points (called sites) in the interior, and $n$ points (called ports) on the labels along the boundary of $B$. The goal is to find a set of $n$ axis-aligned curves (called leaders), each having at most one bend and connecting one site to one port, such that the leaders are pairwise disjoint. A 1-bend boundary labelling problem is $k$-sided ($1\leq k\leq 4$) if the ports appear on $k$ different sides of $B$. Kindermann et al. ["Multi-Sided Boundary Labeling", Algorithmica, 76(1): 225-258, 2016] showed that the 1-bend three-sided and four-sided boundary labelling problems can be solved in $O(n^4)$ and $O(n^9)$ time, respectively. Bose et al. [SWAT, 12:1-12:14, 2018] improved the latter running time to $O(n^6)$ by reducing the problem to computing maximum independent set in an outerstring graph. In this paper, we improve both previous results by giving new algorithms with running times $O(n^3\log n)$ and $O(n^5)$ to solve the 1-bend three-sided and four-sided boundary labelling problems, respectively.
 [97] arXiv:2002.09745 [pdf, other]

Title: Differentially Private Set Union
Authors: Sivakanth Gopi, Pankaj Gulhane, Janardhan Kulkarni, Judy Hanwen Shen, Milad Shokouhi, Sergey Yekhanin
Comments: 23 pages, 7 figures
Subjects: Cryptography and Security (cs.CR); Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG); Machine Learning (stat.ML)
We study the basic operation of set union in the global model of differential privacy. In this problem, we are given a universe $U$ of items, possibly of infinite size, and a database $D$ of users. Each user $i$ contributes a subset $W_i \subseteq U$ of items. We want an ($\epsilon$,$\delta$)-differentially private algorithm which outputs a subset $S \subset \cup_i W_i$ such that the size of $S$ is as large as possible. The problem arises in countless real-world applications; it is particularly ubiquitous in natural language processing (NLP) applications as vocabulary extraction. For example, discovering words, sentences, $n$-grams etc., from private text data belonging to users is an instance of the set union problem.
Known algorithms for this problem proceed by collecting a subset of items from each user, taking the union of such subsets, and disclosing the items whose noisy counts fall above a certain threshold. Crucially, in the above process, the contribution of each individual user is always independent of the items held by other users, resulting in a wasteful aggregation process, where some item counts happen to be way above the threshold. We deviate from the above paradigm by allowing users to contribute their items in a $\textit{dependent fashion}$, guided by a $\textit{policy}$. In this new setting, ensuring privacy is significantly more delicate. We prove that any policy which has certain $\textit{contractive}$ properties would result in a differentially private algorithm. We design two new algorithms, one using Laplace noise and the other Gaussian noise, as specific instances of policies satisfying the contractive properties. Our experiments show that the new algorithms significantly outperform previously known mechanisms for the problem.
 [98] arXiv:2002.09748 [pdf, other]

Title: DECIBEL: Improving Audio Chord Estimation for Popular Music by Alignment and Integration of Crowd-Sourced Symbolic Representations
Comments: 81 pages, 47 figures
Subjects: Sound (cs.SD); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
Automatic Chord Estimation (ACE) is a fundamental task in Music Information Retrieval (MIR) and has applications in both music performance and MIR research. The task consists of segmenting a music recording or score and assigning a chord label to each segment. Although it has been a task in the annual benchmarking evaluation MIREX for over 10 years, ACE is not yet a solved problem, since performance has stagnated and modern systems have started to tune themselves to subjective training data. We propose DECIBEL, a new ACE system that exploits widely available MIDI and tab representations to improve ACE from audio only. From an audio file and a set of MIDI and tab files corresponding to the same popular music song, DECIBEL first estimates chord sequences. For audio, state-of-the-art audio ACE methods are used. MIDI files are aligned to the audio, followed by a MIDI chord estimation step. Tab files are transformed into untimed chord sequences and then aligned to the audio. Next, DECIBEL uses data fusion to integrate all estimated chord sequences into one final output sequence. DECIBEL improves all tested state-of-the-art ACE methods by over 3 percent on average. This result shows that the integration of musical knowledge from heterogeneous symbolic music representations is a suitable strategy for addressing challenging MIR tasks such as ACE.
 [99] arXiv:2002.09750 [pdf, other]

Title: On some neural network architectures that can represent viscosity solutions of certain high dimensional Hamilton-Jacobi partial differential equations
Subjects: Numerical Analysis (math.NA)
We propose a novel connection between several neural network architectures and viscosity solutions of some Hamilton-Jacobi (HJ) partial differential equations (PDEs) whose Hamiltonian is convex and only depends on the spatial gradient of the solution. To be specific, we prove that under certain assumptions, the two neural network architectures we propose represent viscosity solutions to two sets of HJ PDEs with zero error. We also implement our proposed neural network architectures using TensorFlow and provide several examples and illustrations. Note that these neural network representations can avoid the curse of dimensionality for certain HJ PDEs, since they do not involve grids or discretization. Our results suggest that efficient dedicated hardware implementation for neural networks can be leveraged to evaluate viscosity solutions of certain HJ PDEs.
 [100] arXiv:2002.09751 [pdf, ps, other]

Title: Automatic Decoupling and Index-aware Model-Order Reduction for Nonlinear Differential-Algebraic Equations
Subjects: Numerical Analysis (math.NA)
We extend the index-aware model-order reduction method to systems of nonlinear differential-algebraic equations with a special nonlinear term f(Ex), where E is a singular matrix. Such nonlinear differential-algebraic equations arise, for example, in the spatial discretization of the gas flow in pipeline networks. In practice, mathematical models of real-life processes pose challenges when used in numerical simulations, due to complexity and system size. Model-order reduction aims to eliminate this problem by generating reduced-order models that have lower computational cost to simulate, yet accurately represent the original large-scale system behavior. However, direct reduction and simulation of nonlinear differential-algebraic equations is difficult due to hidden constraints which affect the choice of numerical integration methods and model-order reduction techniques. We propose an extension of index-aware model-order reduction methods to nonlinear differential-algebraic equations without any kind of linearization. The proposed model-order reduction approach involves automatic decoupling of nonlinear differential-algebraic equations into nonlinear ordinary differential equations and algebraic equations. This allows applying standard model-order reduction techniques to both parts without worrying about the index. The same procedure can also be used to simulate nonlinear differential-algebraic equations using standard integration schemes. We illustrate the performance of our proposed method for nonlinear differential-algebraic equations arising from gas flow models in pipeline networks.
 [101] arXiv:2002.09754 [pdf, other]

Title: Sampling for Deep Learning Model Diagnosis (Technical Report)Subjects: Machine Learning (cs.LG); Databases (cs.DB)
Deep learning (DL) models have achieved paradigm-changing performance in many fields with high dimensional data, such as images, audio, and text. However, the black-box nature of deep neural networks not only hinders adoption in applications such as medical diagnosis, where interpretability is essential, but also impedes diagnosis of underperforming models. The task of diagnosing or explaining DL models requires the computation of additional artifacts, such as activation values and gradients. These artifacts are large in volume, and their computation, storage, and querying raise significant data management challenges.
In this paper, we articulate DL diagnosis as a data management problem, and we propose a general, yet representative, set of queries to evaluate systems that strive to support this new workload. We further develop a novel data sampling technique that produces approximate but accurate results for these model debugging queries. Our sampling technique utilizes the lower dimension representation learned by the DL model and focuses on model decision boundaries for the data in this lower dimensional space. We evaluate our techniques on one standard computer vision and one scientific data set and demonstrate that our sampling technique outperforms a variety of state-of-the-art alternatives in terms of query accuracy.
 [102] arXiv:2002.09755 [pdf, other]

Title: BAD to the Bone: Big Active Data at its CoreComments: 27 pages. Submitted to VLDBJSubjects: Databases (cs.DB); Distributed, Parallel, and Cluster Computing (cs.DC)
Virtually all of today's Big Data systems are passive in nature, responding to queries posted by their users. Instead, we are working to shift Big Data platforms from passive to active. In our view, a Big Active Data (BAD) system should continuously and reliably capture Big Data while enabling timely and automatic delivery of relevant information to a large pool of interested users, as well as supporting retrospective analyses of historical information. While various scalable streaming query engines have been created, their active behavior is limited to a (relatively) small window of the incoming data. To this end, we have created a BAD platform that combines ideas and capabilities from both Big Data and Active Data (e.g., Publish/Subscribe, Streaming Engines). It supports complex subscriptions that consider not only newly arrived items but also their relationships to past, stored data. Further, it can provide actionable notifications by enriching the subscription results with other useful data. Our platform extends an existing open-source Big Data Management System, Apache AsterixDB, with an active toolkit. The toolkit contains features to rapidly ingest semi-structured data, share execution pipelines among users, manage scaled user data subscriptions, and actively monitor the state of the data to produce individualized information for each user. This paper describes the features and design of our current BAD data platform and demonstrates its ability to scale without sacrificing query capabilities or result individualization.
 [103] arXiv:2002.09758 [pdf, other]

Title: Unsupervised Question Decomposition for Question AnsweringSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
We aim to improve question answering (QA) by decomposing hard questions into easier sub-questions that existing QA systems can answer. Since collecting labeled decompositions is cumbersome, we propose an unsupervised approach to produce sub-questions. Specifically, by leveraging >10M questions from Common Crawl, we learn to map from the distribution of multi-hop questions to the distribution of single-hop sub-questions. We answer sub-questions with an off-the-shelf QA model and incorporate the resulting answers in a downstream, multi-hop QA system. On a popular multi-hop QA dataset, HotpotQA, we show large improvements over a strong baseline, especially on adversarial and out-of-domain questions. Our method is generally applicable and automatically learns to decompose questions of different classes, while matching the performance of decomposition methods that rely heavily on hand-engineering and annotation.
 [104] arXiv:2002.09759 [pdf, ps, other]

Title: BlockTerm Tensor Decomposition: Model Selection and ComputationSubjects: Numerical Analysis (math.NA)
The so-called block-term decomposition (BTD) tensor model has been recently receiving increasing attention due to its enhanced representation ability in numerous applications involving mixing of signals of rank higher than one (blocks). Its uniqueness and approximation have thus been thoroughly studied. Nevertheless, the problem of estimating the BTD model structure, namely the number of block terms and their individual ranks, has only recently started to attract significant attention, as it is more challenging compared to more classical tensor models such as canonical polyadic decomposition (CPD) and Tucker decomposition (TD). This article briefly reports our recent results on this topic, which are based on an appropriate extension to the BTD model of our earlier rank-revealing work on low-rank matrix and tensor approximation. The idea is to impose column sparsity jointly on the factors and successively estimate the ranks as the numbers of factor columns of non-negligible magnitude, with the aid of alternating iteratively reweighted least squares (IRLS). Simulation results are reported that demonstrate the effectiveness of our method in accurately estimating both the ranks and the factors of the least squares BTD approximation, and in a computationally efficient manner.
 [105] arXiv:2002.09763 [pdf, other]

Title: Longitudinal Support Vector Machines for High Dimensional Time SeriesSubjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
We consider the problem of learning a classifier from observed functional data. Here, each datapoint takes the form of a single time series and contains numerous features. Assuming that each such series comes with a binary label, we consider the problem of learning to predict the label of a newly arriving time series. To this end, the notion of margin underlying the classical support vector machine is extended to a continuous version for such data. The longitudinal support vector machine is also a convex optimization problem, and its dual form is derived as well. Empirical results for specified cases with significance tests indicate the efficacy of this innovative algorithm for analyzing such long-term multivariate data.
 [106] arXiv:2002.09765 [pdf, other]

Title: Predictive refinement methodology for compressed sensing imagingAuthors: Alfredo NavaTudelaComments: 33 pages, 9 figures, 1 tableSubjects: Information Theory (cs.IT)
The weak-$\ell^p$ norm can be used to define a measure $s$ of sparsity. When we compute $s$ for the discrete cosine transform coefficients of a signal, the value of $s$ is related to the information content of said signal. We use this value of $s$ to define a reference-free index $\mathcal{E}$, called the sparsity index, that we can use to predict with high accuracy the quality of signal reconstruction in the setting of compressed sensing imaging. That way, when compressed sensing is framed in the context of sampling theory, we can use $\mathcal{E}$ to decide when to further partition the sampling space and increase the sampling rate to optimize the recovery of an image when we use compressed sensing techniques.
 [107] arXiv:2002.09766 [pdf, other]

Title: Improving the Tightness of Convex Relaxation Bounds for Training Certifiably Robust ClassifiersSubjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Convex relaxations are effective for training and certifying neural networks against norm-bounded adversarial attacks, but they leave a large gap between certifiable and empirical robustness. In principle, convex relaxation can provide tight bounds if the solution to the relaxed problem is feasible for the original non-convex problem. We propose two regularizers that can be used to train neural networks that yield tighter convex relaxation bounds for robustness. In all of our experiments, the proposed regularizers result in higher certified accuracy than non-regularized baselines.
 [108] arXiv:2002.09772 [pdf, other]

Title: NonIntrusive Detection of Adversarial Deep Learning Attacks via Observer NetworksComments: 5 pages, 2 figures, 4 tablesSubjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Machine Learning (stat.ML)
Recent studies have shown that deep learning models are vulnerable to specifically crafted adversarial inputs that are quasi-imperceptible to humans. In this letter, we propose a novel method to detect adversarial inputs by augmenting the main classification network with multiple binary detectors (observer networks), which take inputs from the hidden layers of the original network (convolutional kernel outputs) and classify the input as clean or adversarial. During inference, the detectors are treated as part of an ensemble network, and the input is deemed adversarial if at least half of the detectors classify it as such. The proposed method addresses the trade-off between accuracy of classification on clean and adversarial samples, as the original classification network is not modified during the detection process. The use of multiple observer networks makes attacking the detection mechanism non-trivial even when the attacker is aware of the victim classifier. We achieve a 99.5% detection accuracy on the MNIST dataset and 97.5% on the CIFAR-10 dataset using the Fast Gradient Sign Attack in a semi-white-box setup. The false positive rate is a mere 0.12% in the worst-case scenario.
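The decision rule stated in this abstract (flag the input when at least half of the observer networks vote "adversarial") can be sketched in a few lines. This is only an illustration of the voting step: the function name `is_adversarial` and the representation of detector outputs as booleans are assumptions, and the observer networks themselves are not modeled.

```python
from typing import Sequence


def is_adversarial(detector_votes: Sequence[bool]) -> bool:
    """Return True when at least half of the observer detectors flag the input.

    `detector_votes` is a hypothetical stand-in for the binary outputs of the
    observer networks (True = "adversarial", False = "clean").
    """
    if not detector_votes:
        raise ValueError("at least one detector vote is required")
    flagged = sum(1 for vote in detector_votes if vote)
    # "At least half" means a tie counts as adversarial.
    return 2 * flagged >= len(detector_votes)
```

With three detectors, two positive votes flag the input while a single positive vote does not; with two detectors, a single positive vote is already enough.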
 [109] arXiv:2002.09773 [pdf, ps, other]

Title: Convex Duality of Deep Neural NetworksSubjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
We study regularized deep neural networks and introduce an analytic framework to characterize the structure of the hidden layers. We show that a set of optimal hidden layer weight matrices for a norm regularized deep neural network training problem can be explicitly found as the extreme points of a convex set. For two-layer linear networks, we first formulate a convex dual program and prove that strong duality holds. We then extend our derivations to prove that strong duality also holds for certain deep networks. In particular, for linear deep networks, we show that each optimal layer weight matrix is rank-one and aligns with the previous layers when the network output is scalar. We also extend our analysis to the vector outputs and other convex loss functions. More importantly, we show that the same characterization can also be applied to deep ReLU networks with rank-one inputs, where we prove that strong duality still holds and optimal layer weight matrices are rank-one for scalar output networks. As a corollary, we prove that norm regularized deep ReLU networks yield spline interpolation for one-dimensional datasets, which was previously known only for two-layer networks. We then verify our theoretical results via several numerical experiments.
 [110] arXiv:2002.09779 [pdf, other]

Title: Stochasticity in Neural ODEs: An Empirical StudySubjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Stochastic regularization of neural networks (e.g. dropout) is a widespread technique in deep learning that allows for better generalization. Despite its success, continuous-time models, such as neural ordinary differential equations (ODEs), usually rely on a completely deterministic feed-forward operation. This work provides an empirical study of stochastically regularized neural ODEs on several image classification tasks (CIFAR-10, CIFAR-100, TinyImageNet). Building upon the formalism of stochastic differential equations (SDEs), we demonstrate that a neural SDE is able to outperform its deterministic counterpart. Further, we show that data augmentation during training improves the performance of both deterministic and stochastic versions of the same model. However, the improvements obtained by data augmentation completely eliminate the empirical gains of the stochastic regularization, making the difference in the performance of neural ODE and neural SDE negligible.
 [111] arXiv:2002.09781 [pdf, other]

Title: On the Inductive Bias of a CNN for Orthogonal Patterns DistributionsSubjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Training overparameterized convolutional neural networks with gradient-based methods is the most successful learning method for image classification. However, its theoretical properties are far from understood even for very simple learning tasks. In this work, we consider a simplified image classification task where images contain orthogonal patches and are learned with a 3-layer overparameterized convolutional network and stochastic gradient descent. We empirically identify a novel phenomenon where the dot product between the learned pattern detectors and their detected patterns is governed by the pattern statistics in the training set. We call this phenomenon Pattern Statistics Inductive Bias (PSI) and prove that PSI holds for a simple setup with two points in the training set. Furthermore, we prove that if PSI holds, stochastic gradient descent has sample complexity $O(d^2\log(d))$ where $d$ is the filter dimension. In contrast, we show a VC dimension lower bound in our setting which is exponential in $d$. Taken together, our results provide strong evidence that PSI is a unique inductive bias of stochastic gradient descent that guarantees good generalization properties.
 [112] arXiv:2002.09784 [pdf, ps, other]

Title: Compactly Representing Uniform Interpolants for EUF using (conditional) DAGSSubjects: Logic in Computer Science (cs.LO)
The concept of a uniform interpolant for a quantifier-free formula from a given formula with a list of symbols, while well-known in the logic literature, has been unknown to the formal methods and automated reasoning community. This concept is precisely defined. Two algorithms for computing the uniform interpolant of a quantifier-free formula in EUF endowed with a list of symbols to be eliminated are proposed. The first algorithm is non-deterministic and generates a uniform interpolant expressed as a disjunction of conjunctions of literals, whereas the second algorithm gives a compact representation of a uniform interpolant as a conjunction of Horn clauses. Both algorithms exploit an efficient dedicated DAG representation of terms. Correctness and completeness proofs are supplied, using arguments combining rewrite techniques with model-theoretic tools.
 [113] arXiv:2002.09786 [pdf, other]

Title: HarDNN: Feature Map Vulnerability Evaluation in CNNsAuthors: Abdulrahman Mahmoud, Siva Kumar Sastry Hari, Christopher W. Fletcher, Sarita V. Adve, Charbel Sakr, Naresh Shanbhag, Pavlo Molchanov, Michael B. Sullivan, Timothy Tsai, Stephen W. KecklerComments: 14 pages, 5 figures, a short version accepted for publication in First Workshop on Secure and Resilient Autonomy (SARA) colocated with MLSys2020Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
As Convolutional Neural Networks (CNNs) are increasingly being employed in safety-critical applications, it is important that they behave reliably in the face of hardware errors. Transient hardware errors may percolate undesirable state during execution, resulting in software-manifested errors which can adversely affect high-level decision making. This paper presents HarDNN, a software-directed approach to identify vulnerable computations during a CNN inference and selectively protect them based on their propensity towards corrupting the inference output in the presence of a hardware error. We show that HarDNN can accurately estimate relative vulnerability of a feature map (fmap) in CNNs using a statistical error injection campaign, and explore heuristics for fast vulnerability assessment. Based on these results, we analyze the trade-off between error coverage and computational overhead that the system designers can use to employ selective protection. Results show that the improvement in resilience for the added computation is superlinear with HarDNN. For example, HarDNN improves SqueezeNet's resilience by 10x with just 30% additional computations.
 [114] arXiv:2002.09790 [pdf, other]

Title: Shallow2Deep: Indoor Scene Modeling by Single Image UnderstandingComments: Accepted by Pattern RecognitionJournalref: Pattern Recognition. 2020 Feb 12:107271Subjects: Computer Vision and Pattern Recognition (cs.CV)
Dense indoor scene modeling from 2D images has been bottlenecked by the absence of depth information and by cluttered occlusions. We present an automatic indoor scene modeling approach using deep features from neural networks. Given a single RGB image, our method simultaneously recovers semantic contents, 3D geometry and object relationships by reasoning about indoor environment context. In particular, we design a shallow-to-deep architecture on the basis of convolutional networks for semantic scene understanding and modeling. It involves multi-level convolutional networks to parse indoor semantics/geometry into non-relational and relational knowledge. Non-relational knowledge extracted from shallow-end networks (e.g. room layout, object geometry) is fed forward into deeper levels to parse relational semantics (e.g. support relationship). A Relation Network is proposed to infer the support relationship between objects. All the structured semantics and geometry above are assembled to guide a global optimization for 3D scene modeling. Qualitative and quantitative analysis demonstrates the feasibility of our method in understanding and modeling semantics-enriched indoor scenes by evaluating reconstruction accuracy, computation performance and scene complexity.
 [115] arXiv:2002.09792 [pdf, other]

Title: VisionGuard: Runtime Detection of Adversarial Inputs to Perception SystemsAuthors: Yiannis Kantaros, Taylor Carpenter, Sangdon Park, Radoslav Ivanov, Sooyong Jang, Insup Lee, James WeimerSubjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO); Image and Video Processing (eess.IV)
Deep neural network (DNN) models have proven to be vulnerable to adversarial attacks. In this paper, we propose VisionGuard, a novel attack- and dataset-agnostic, computationally light defense mechanism for adversarial inputs to DNN-based perception systems. In particular, VisionGuard relies on the observation that adversarial images are sensitive to lossy compression transformations. Specifically, to determine if an image is adversarial, VisionGuard checks if the output of the target classifier on a given input image changes significantly after feeding it a transformed version of the image under investigation. Moreover, we show that VisionGuard is computationally light both at runtime and design time, which makes it suitable for real-time applications that may also involve large-scale image domains. To highlight this, we demonstrate the efficiency of VisionGuard on ImageNet, a task that is computationally challenging for the majority of relevant defenses. Finally, we include extensive comparative experiments on the MNIST, CIFAR-10, and ImageNet datasets that show that VisionGuard outperforms existing defenses in terms of scalability and detection performance.
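The check described in this abstract compares the classifier's output on an image with its output on a lossily compressed copy. A minimal sketch of that comparison follows. The abstract does not say how "changes significantly" is measured, so the Jensen-Shannon divergence, the threshold value of 0.1, and the names `js_divergence` and `looks_adversarial` are all assumptions made for illustration; the classifier and the compression step are left out, and the two probability vectors are taken as given.

```python
import math
from typing import Sequence


def js_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence (natural log) between two probability vectors."""
    if len(p) != len(q):
        raise ValueError("probability vectors must have the same length")
    mixture = [(pi + qi) / 2.0 for pi, qi in zip(p, q)]

    def kl(a: Sequence[float], b: Sequence[float]) -> float:
        # Terms with a_i == 0 contribute nothing; b_i > 0 whenever a_i > 0.
        return sum(ai * math.log(ai / bi) for ai, bi in zip(a, b) if ai > 0.0)

    return 0.5 * kl(p, mixture) + 0.5 * kl(q, mixture)


def looks_adversarial(
    probs_original: Sequence[float],
    probs_compressed: Sequence[float],
    threshold: float = 0.1,  # hypothetical value, would need tuning on clean data
) -> bool:
    """Flag the input if the two classifier outputs diverge beyond the threshold."""
    return js_divergence(probs_original, probs_compressed) > threshold
```

For example, softmax outputs of `[0.9, 0.1]` before compression and `[0.88, 0.12]` after would not be flagged, whereas a flip to `[0.1, 0.9]` would.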
 [116] arXiv:2002.09794 [pdf, other]

Title: PoETBiN: Power Efficient Tiny Binary NeuronsComments: Accepted in MLSys 2020 conferenceSubjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
The success of neural networks in image classification has inspired various hardware implementations on embedded platforms such as Field Programmable Gate Arrays, embedded processors and Graphical Processing Units. These embedded platforms are constrained in terms of power, which is mainly consumed by the Multiply Accumulate operations and the memory accesses for weight fetching. Quantization and pruning have been proposed to address this issue. Though effective, these techniques do not take into account the underlying architecture of the embedded hardware. In this work, we propose PoETBiN, a Look-Up Table (LUT) based power-efficient implementation on resource-constrained embedded devices. A modified Decision Tree approach forms the backbone of the proposed implementation in the binary domain. A LUT access consumes far less power than the equivalent Multiply Accumulate operation it replaces, and the modified Decision Tree algorithm eliminates the need for memory accesses. We applied the PoETBiN architecture to implement the classification layers of networks trained on the MNIST, SVHN and CIFAR-10 datasets, with near state-of-the-art results. The energy reduction for the classifier portion reaches up to six orders of magnitude compared to floating-point implementations and up to three orders of magnitude compared to recent binary quantized neural networks.
 [117] arXiv:2002.09795 [pdf, ps, other]

Title: Periodic QLearningSubjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
The use of target networks is a common practice in deep reinforcement learning for stabilizing the training; however, theoretical understanding of this technique is still limited. In this paper, we study the so-called periodic Q-learning algorithm (PQ-learning for short), which resembles the technique used in deep Q-learning for solving infinite-horizon discounted Markov decision processes (DMDP) in the tabular setting. PQ-learning maintains two separate Q-value estimates: the online estimate and the target estimate. The online estimate follows the standard Q-learning update, while the target estimate is updated periodically. In contrast to standard Q-learning, PQ-learning enjoys a simple finite-time analysis and achieves better sample complexity for finding an epsilon-optimal policy. Our result provides a preliminary justification of the effectiveness of utilizing target estimates or networks in Q-learning algorithms.
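The two-estimate scheme in this abstract can be illustrated with a small tabular sketch. The abstract does not give the exact update, so several details here are assumptions borrowed from common deep Q-learning practice: the online update bootstraps from the frozen target estimate, state-action pairs are sampled uniformly from a simulator, and transitions are deterministic. The toy two-state MDP, the function name `periodic_q_learning`, and all hyperparameter values are hypothetical.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical deterministic MDP: (state, action) -> (next_state, reward).
TOY_MDP: Dict[Tuple[int, int], Tuple[int, float]] = {
    (0, 0): (0, 0.0),
    (0, 1): (1, 0.0),
    (1, 0): (0, 0.0),
    (1, 1): (1, 1.0),
}


def periodic_q_learning(
    transitions: Dict[Tuple[int, int], Tuple[int, float]],
    n_states: int,
    n_actions: int,
    gamma: float = 0.9,
    step_size: float = 0.5,
    period: int = 20,
    n_periods: int = 250,
    seed: int = 0,
) -> List[List[float]]:
    """Tabular Q-learning with a target table that is refreshed once per period."""
    rng = random.Random(seed)
    online = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(n_periods):
        # Periodic step: freeze a copy of the online estimate as the target.
        target = [row[:] for row in online]
        for _ in range(period):
            s = rng.randrange(n_states)
            a = rng.randrange(n_actions)
            next_s, reward = transitions[(s, a)]
            # Assumption: the bootstrap value comes from the frozen target table.
            bootstrap = reward + gamma * max(target[next_s])
            online[s][a] += step_size * (bootstrap - online[s][a])
    return online
```

On the toy MDP, always choosing action 1 is optimal, and the optimal value of staying in state 1 is 1 / (1 - 0.9) = 10, which the online table approaches as the number of periods grows.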
 [118] arXiv:2002.09797 [pdf, other]

Title: Reliable Fidelity and Diversity Metrics for Generative ModelsComments: First two authors have contributed equallySubjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Machine Learning (stat.ML)
Devising indicative evaluation metrics for the image generation task remains an open problem. The most widely used metric for measuring the similarity between real and generated images has been the Fréchet Inception Distance (FID) score. Because it does not differentiate the fidelity and diversity aspects of the generated images, recent papers have introduced variants of precision and recall metrics to diagnose those properties separately. In this paper, we show that even the latest versions of the precision and recall metrics are not yet reliable. For example, they fail to detect the match between two identical distributions, they are not robust against outliers, and the evaluation hyperparameters are selected arbitrarily. We propose density and coverage metrics that solve the above issues. We analytically and experimentally show that density and coverage provide more interpretable and reliable signals for practitioners than the existing metrics. Code: https://github.com/clovaai/generativeevaluationprdc.
 [119] arXiv:2002.09799 [pdf, other]

Title: Sample Debiasing in the Themis Open World Database System (Extended Version)Comments: SIGMOD 2020Subjects: Databases (cs.DB)
Open world database management systems, which assume that tuples not in the database still exist, are becoming an increasingly important area of research. We present Themis, the first open world database that automatically rebalances arbitrarily biased samples to approximately answer queries as if they were issued over the entire population. We leverage a priori population aggregate information to develop and combine two different approaches for automatic debiasing: sample reweighting and Bayesian network probabilistic modeling. We build a prototype of Themis and demonstrate that Themis achieves higher query accuracy than the default AQP approach, an alternative sample reweighting technique, and a variety of Bayesian network models while maintaining interactive query response times. We also show that Themis is robust to differences in the support between the sample and population, a key use case when using social media samples.
 [120] arXiv:2002.09801 [pdf, other]

Title: Highorder Methods for a Pressure Poisson Equation Reformulation of the NavierStokes Equations with Electric Boundary ConditionsSubjects: Numerical Analysis (math.NA)
Pressure Poisson equation (PPE) reformulations of the incompressible Navier-Stokes equations (NSE) replace the incompressibility constraint by a Poisson equation for the pressure and a suitable choice of boundary conditions. This yields a time-evolution equation for the velocity field only, with the pressure gradient acting as a nonlocal operator. Thus, numerical methods based on PPE reformulations, in principle, have no limitations in achieving high order. In this paper, it is studied to what extent high-order methods for the NSE can be obtained from a specific PPE reformulation with electric boundary conditions (EBC). To that end, implicit-explicit (IMEX) time-stepping is used to decouple the pressure solve from the velocity update, while avoiding a parabolic time-step restriction; and mixed finite elements are used in space, to capture the structure imposed by the EBC. Via numerical examples, it is demonstrated that the methodology can yield at least third-order accuracy in space and time.
 [121] arXiv:2002.09803 [pdf, other]

Title: Author Name Disambiguation on Heterogeneous Information Network with Adversarial Representation LearningComments: AAAI 2020Subjects: Social and Information Networks (cs.SI); Digital Libraries (cs.DL)
Author name ambiguity causes inadequacy and inconvenience in academic information retrieval, which raises the necessity of author name disambiguation (AND). Existing AND methods can be divided into two categories: models focusing on content information to distinguish whether two papers are written by the same author, and models focusing on relation information to represent information as edges on the network and to quantify the similarity among papers. However, the former require adequate labeled samples and informative negative samples, and are also ineffective in measuring the high-order connections among papers, while the latter need complicated feature engineering or supervision to construct the network. We propose a novel generative adversarial framework to grow the two categories of models together: (i) the discriminative module distinguishes whether two papers are from the same author, and (ii) the generative module selects possibly homogeneous papers directly from the heterogeneous information network, which eliminates the complicated feature engineering. In this way, the discriminative module guides the generative module to select homogeneous papers, and the generative module generates high-quality negative samples to train the discriminative module to make it aware of high-order connections among papers. Furthermore, a self-training strategy for the discriminative module and a random walk based generating algorithm are designed to make the training stable and efficient. Extensive experiments on two real-world AND benchmarks demonstrate that our model provides significant performance improvement over the state-of-the-art methods.
 [122] arXiv:2002.09805 [pdf, ps, other]

Title: RiskAware Optimization of Age of Information in the Internet of ThingsComments: 6 pages. Accepted to IEEE International Conference on Communications (ICC) 2020, Dublin, IrelandSubjects: Information Theory (cs.IT); Networking and Internet Architecture (cs.NI); Systems and Control (eess.SY)
For time-sensitive Internet of Things (IoT) applications, a risk-neutral approach for age of information (AoI) optimization, which focuses only on minimizing the expected value of the AoI-based cost function, cannot capture rare yet critical events with potentially very large AoI. Thus, in this paper, in order to quantify such rare events, an effective coherent risk measure, called the conditional value-at-risk (CVaR), is studied for the purpose of minimizing the AoI of real-time IoT status updates. In particular, a real-time IoT monitoring system is considered in which an IoT device monitors a physical process and sends the status updates to a remote receiver with an updating cost. The optimal status updating process is designed to jointly minimize the AoI at the receiver, the CVaR of the AoI at the receiver, and the energy cost. This stochastic optimization problem is formulated as an infinite-horizon discounted risk-aware Markov decision process (MDP), which is computationally intractable due to the time inconsistency of the CVaR. By exploiting the special properties of coherent risk measures, the risk-aware MDP is reduced to a standard MDP with an augmented state space, for which a dynamic programming based solution is proposed to derive the optimal stationary policy. In particular, the optimal history-dependent policy of the risk-aware MDP is shown to depend on the history only through the augmented system states and can be readily constructed using the optimal stationary policy of the augmented MDP. The proposed solution is computationally tractable and minimizes the AoI in real-time IoT monitoring systems in a risk-aware manner.
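To make the contrast between the risk-neutral expected value and the CVaR concrete, the sketch below computes a simple empirical CVaR: the average of the worst (1 - alpha) fraction of observed costs. This is a common sample-based estimator and not the paper's MDP formulation; the function name `empirical_cvar`, the rounding rule for the tail size, and the sample AoI values are assumptions chosen for illustration.

```python
import math
from typing import Sequence


def empirical_cvar(costs: Sequence[float], alpha: float) -> float:
    """Average of the worst (1 - alpha) fraction of the observed costs."""
    if not costs:
        raise ValueError("costs must not be empty")
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1)")
    n = len(costs)
    # Small epsilon guards against floating-point noise in (1 - alpha) * n.
    tail_size = max(1, math.ceil((1.0 - alpha) * n - 1e-9))
    worst = sorted(costs, reverse=True)[:tail_size]
    return sum(worst) / tail_size
```

For hypothetical AoI samples `[1, 2, 3, 4, 10]`, the mean is 4, while `empirical_cvar(samples, 0.8)` is 10: the single rare large-AoI event dominates the risk measure even though it barely moves the average.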
 [123] arXiv:2002.09807 [pdf, other]

Title: Online Stochastic MaxWeight Matching: prophet inequality for vertex and edge arrival modelsComments: 29 pages, 2 figuresSubjects: Data Structures and Algorithms (cs.DS); Computer Science and Game Theory (cs.GT)
We provide prophet inequality algorithms for online weighted matching in general (non-bipartite) graphs, under two well-studied arrival models, namely edge arrival and vertex arrival. The weight of each edge is drawn independently from an a priori known probability distribution. Under edge arrival, the weight of each edge is revealed upon arrival, and the algorithm decides whether to include it in the matching or not. Under vertex arrival, the weights of all edges from the newly arriving vertex to all previously arrived vertices are revealed, and the algorithm decides which of these edges, if any, to include in the matching. To study these settings, we introduce a novel unified framework of batched prophet inequalities that captures online settings where elements arrive in batches; in particular, it captures matching under the two aforementioned arrival models. Our algorithms rely on the construction of suitable online contention resolution schemes (OCRSs). We first extend the framework of OCRS to batched OCRS, then establish a reduction from batched prophet inequality to batched OCRS, and finally construct batched OCRSs with selectable ratios of 0.337 and 0.5 for edge and vertex arrival models, respectively. Both results improve the state of the art for the corresponding settings. For vertex arrival, our result is tight. Interestingly, a pricing-based prophet inequality with comparable competitive ratios is unknown.
 [124] arXiv:2002.09808 [pdf, other]

Title: My Fair Bandit: Distributed Learning of MaxMin Fairness with Multiplayer BanditsSubjects: Computer Science and Game Theory (cs.GT); Multiagent Systems (cs.MA)
Consider N cooperative but non-communicating players where each plays one out of M arms for T turns. Players have different utilities for each arm, representable as an N x M matrix. These utilities are unknown to the players. In each turn players receive noisy observations of their utility for their selected arm. However, if any other players selected the same arm that turn, they will all receive zero utility due to the conflict. No other communication or coordination between the players is possible. Our goal is to design a distributed algorithm that learns the matching between players and arms that achieves max-min fairness while minimizing the regret. We present an algorithm and prove that it is regret optimal up to a $\log\log T$ factor. This is the first max-min fairness multi-player bandit algorithm with (near) order optimal regret.
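The target of the learning problem in this abstract is the matching of players to distinct arms that maximizes the smallest utility any player receives. The sketch below computes that matching by brute force for a known utility matrix, which is only practical for tiny instances. It is not the distributed bandit algorithm of the paper, which must learn the utilities from noisy observations; the function name `max_min_matching` and the example matrix are hypothetical.

```python
from itertools import permutations
from typing import List, Optional, Tuple


def max_min_matching(
    utilities: List[List[float]],
) -> Tuple[Optional[Tuple[int, ...]], float]:
    """Brute-force search for the collision-free assignment with the best worst-off player.

    utilities[p][a] is the utility of player p for arm a (N x M, with N <= M).
    Returns (arms, value), where arms[p] is the arm assigned to player p.
    """
    n_players = len(utilities)
    n_arms = len(utilities[0])
    if n_players > n_arms:
        raise ValueError("need at least as many arms as players")
    best_arms: Optional[Tuple[int, ...]] = None
    best_value = float("-inf")
    # Each permutation assigns distinct arms, so no two players collide.
    for arms in permutations(range(n_arms), n_players):
        value = min(utilities[p][arm] for p, arm in enumerate(arms))
        if value > best_value:
            best_arms, best_value = arms, value
    return best_arms, best_value
```

For the matrix `[[1.0, 0.5], [0.4, 0.0]]`, the max-min matching gives player 0 arm 1 and player 1 arm 0 (minimum utility 0.4), even though the other assignment has the larger total utility (1.0 versus 0.9); this is the sense in which max-min fairness differs from maximizing the sum.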
 [125] arXiv:2002.09809 [pdf, other]

Title: Random Bundle: Brain Metastases Segmentation Ensembling through Annotation Randomization
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We introduce a novel ensembling method, Random Bundle (RB), that improves performance for brain metastases segmentation. We create our ensemble by training each network on our dataset with 50% of our annotated lesions censored out. We also apply a lopsided bootstrap loss to recover performance after inducing an in silico 50% false negative rate and make our networks more sensitive. We improve the mAP of our network's lesion detection by 39% and more than triple the sensitivity at 80% precision. We also show slight improvements in segmentation quality through DICE score. Further, RB ensembling improves performance over baseline by a larger margin than a variety of popular ensembling strategies. Finally, we show that RB ensembling is computationally efficient by comparing its performance to a single network when both systems are constrained to have the same compute.
 [126] arXiv:2002.09811 [pdf, other]

Title: Automatic Cost Function Learning with Interpretable Compositional Networks
Subjects: Artificial Intelligence (cs.AI)
Cost Function Networks (CFN) are a formalism in Constraint Programming to model combinatorial satisfaction or optimization problems. By associating a function to each constraint type to evaluate the quality of an assignment, they extend the expressivity of regular CSP/COP formalisms, but at the price of making problem modeling harder. Indeed, in addition to regular variables/domains/constraints sets, one must provide a set of cost functions that are not always easy to define. Here we propose a method to automatically learn a cost function of a constraint, given a function deciding if assignments are valid or not. This is, to the best of our knowledge, the first attempt to automatically learn cost functions. Our method aims to learn cost functions in a supervised fashion, trying to reproduce the Hamming distance, by using a variation of neural networks we named Interpretable Compositional Networks, allowing us to get explainable results, unlike regular artificial neural networks. We test it on 5 different constraints to show its versatility. Experiments show that functions learned on small dimensions scale to high dimensions, outputting a perfect or near-perfect Hamming distance for most constraints. Our system can be used to automatically generate cost functions, thereby obtaining the expressivity of CFN with the same modeling effort as for CSP/COP.
 [127] arXiv:2002.09812 [pdf, ps, other]

Title: Sketching Transformed Matrices with Applications to Natural Language Processing
Comments: AISTATS 2020
Subjects: Data Structures and Algorithms (cs.DS); Computation and Language (cs.CL); Machine Learning (cs.LG)
Suppose we are given a large matrix $A=(a_{i,j})$ that cannot be stored in memory but is on disk or is presented in a data stream. However, we need to compute a matrix decomposition of the entrywise-transformed matrix, $f(A):=(f(a_{i,j}))$ for some function $f$. Is it possible to do it in a space-efficient way? Many machine learning applications indeed need to deal with such large transformed matrices; for example, word embedding methods in NLP need to work with the pointwise mutual information (PMI) matrix, while the entrywise transformation makes it difficult to apply known linear algebraic tools. Existing approaches for this problem either need to store the whole matrix and perform the entrywise transformation afterwards, which is space consuming or infeasible, or need to redesign the learning method, which is application specific and requires substantial remodeling.
In this paper, we first propose a space-efficient sketching algorithm for computing the product of a given small matrix with the transformed matrix. It works for a general family of transformations with provable small error bounds and thus can be used as a primitive in downstream learning tasks. We then apply this primitive to a concrete application: low-rank approximation. We show that our approach obtains small error and is efficient in both space and time. We complement our theoretical results with experiments on synthetic and real data.
 [128] arXiv:2002.09814 [pdf, other]

Title: Survey Bandits with Regret Guarantees
Comments: 17 pages, 10 figures
Subjects: Machine Learning (cs.LG); Econometrics (econ.EM); Machine Learning (stat.ML)
We consider a variant of the contextual bandit problem. In standard contextual bandits, when a user arrives we get the user's complete feature vector and then assign a treatment (arm) to that user. In a number of applications (like healthcare), collecting features from users can be costly. To address this issue, we propose algorithms that avoid needless feature collection while maintaining strong regret guarantees.
 [129] arXiv:2002.09818 [pdf, other]

Title: Assembling Semantically-Disentangled Representations for Predictive-Generative Models via Adaptation from Synthetic Domain
Comments: 8 pages, 18 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
Deep neural networks can form high-level hierarchical representations of input data. Various researchers have demonstrated that these representations can be used to enable a variety of useful applications. However, such representations are typically based on the statistics within the data, and may not conform with the semantic representation that may be necessitated by the application. Conditional models are typically used to overcome this challenge, but they require large annotated datasets which are difficult to come by and costly to create. In this paper, we show that semantically-aligned representations can be generated instead with the help of a physics-based engine. This is accomplished by creating a synthetic dataset with decoupled attributes, learning an encoder for the synthetic dataset, and augmenting prescribed attributes from the synthetic domain with attributes from the real domain. It is shown that the proposed (SYNTH-VAE-GAN) method can construct a conditional predictive-generative model of human face attributes without relying on real data labels.
 [130] arXiv:2002.09820 [pdf, other]

Title: Deep Reinforcement Learning with Linear Quadratic Regulator Regions
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Robotics (cs.RO); Systems and Control (eess.SY)
Practitioners often rely on compute-intensive domain randomization to ensure reinforcement learning policies trained in simulation can robustly transfer to the real world. Due to unmodeled nonlinearities in the real system, however, even such simulated policies can still fail to perform stably enough to acquire experience in real environments. In this paper we propose a novel method that guarantees a stable region of attraction for the output of a policy trained in simulation, even for highly nonlinear systems. Our core technique is to use "bias-shifted" neural networks for constructing the controller and training the network in the simulator. The modified neural networks not only capture the nonlinearities of the system but also provably preserve linearity in a certain region of the state space and thus can be tuned to resemble a linear quadratic regulator that is known to be stable for the real system. We have tested our new method by transferring simulated policies for a swing-up inverted pendulum to real systems and demonstrated its efficacy.
 [131] arXiv:2002.09825 [pdf, other]

Title: Model Predictive Congestion Control for TCP Endpoints
Comments: 13 pages, 13 figures
Subjects: Networking and Internet Architecture (cs.NI)
A common problem in science networks and private wide area networks (WANs) is that of achieving predictable data transfers of multiple concurrent flows by maintaining specific pacing rates for each. We address this problem by developing a control algorithm based on concepts from model predictive control (MPC) to produce flows with smooth pacing rates and round trip times (RTTs). In the proposed approach, we model the bottleneck link as a queue and derive a model relating the pacing rate and the RTT. An MPC-based control algorithm built on this model is shown to avoid the extreme window (which translates to rate) reduction that exists in current control algorithms when facing network congestion. We have implemented our algorithm as a Linux kernel module. Through simulation and experimental analysis, we show that our algorithm achieves the goals of a low standard deviation of RTT and pacing rate, even when the bottleneck link is fully utilized. In the case of multiple flows, we can assign different rates to each flow and, as long as the sum of rates is less than the bottleneck rate, they can maintain their assigned pacing rates with low standard deviation. This is achieved even when the flows have different RTTs.
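The abstract above does not give the derived model, so the following is only a minimal sketch of one common way to relate pacing rate and RTT, assuming a single fluid queue at the bottleneck: the backlog grows at the difference between the pacing rate and the link capacity, and the RTT is the base RTT plus the queueing delay. The function `simulate_rtt` and all parameter values are hypothetical.

```python
def simulate_rtt(pacing_rates, capacity, base_rtt, dt):
    """Fluid-queue sketch: return the RTT after each time step.

    pacing_rates: sending rate in each step (packets per second).
    capacity:     bottleneck service rate (packets per second).
    base_rtt:     propagation RTT with an empty queue (seconds).
    dt:           step length (seconds).
    """
    queue = 0.0  # backlog at the bottleneck, in packets
    rtts = []
    for rate in pacing_rates:
        # The backlog changes by (arrivals - departures) and cannot go negative.
        queue = max(0.0, queue + (rate - capacity) * dt)
        # Queueing delay is the time needed to drain the backlog at link capacity.
        rtts.append(base_rtt + queue / capacity)
    return rtts


# Pacing below capacity keeps the RTT at its base value; pacing above it inflates the RTT.
rtts = simulate_rtt([80.0, 120.0, 120.0, 90.0], capacity=100.0, base_rtt=0.05, dt=0.1)
print(rtts)
```

A predictive controller could use such a model to choose pacing rates that keep the predicted RTT smooth, which is the behaviour the abstract reports; the actual controller and model in the paper may differ.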
 [132] arXiv:2002.09827 [pdf, ps, other]

Title: Signature in Counterparts, a Formal Treatment
Authors: Ron van der Meyden
Subjects: Artificial Intelligence (cs.AI); Logic in Computer Science (cs.LO); Multiagent Systems (cs.MA)
"Signature in counterparts" is a legal process that permits a contract between two or more parties to be brought into force by having the parties independently (possibly, remotely) sign different copies of the contract, rather than placing their signatures on a common copy at a physical meeting. The paper develops a logical understanding of this process, developing a number of axioms that can be used to justify the validity of a contract from the assumption that separate copies have been signed. It is argued that a satisfactory account benefits from a logic with syntactic self-reference. The axioms used are supported by a formal semantics, and a number of further properties of this semantics are investigated. In particular, it is shown that the semantics implies that when a contract is valid, the parties do not just agree, but are in mutual agreement (a common-knowledge-like notion) about the validity of the contract.
 [133] arXiv:2002.09831 [pdf, other]

Title: On the Role of Dataset Quality and Heterogeneity in Model Confidence
Comments: 25 pages, 14 figures
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Safety-critical applications require machine learning models that output accurate and calibrated probabilities. While uncalibrated deep networks are known to make overconfident predictions, it is unclear how model confidence is impacted by variations in the data, such as label noise or class size. In this paper, we investigate the role of dataset quality by studying the impact of dataset size and label noise on model confidence. We theoretically explain and experimentally demonstrate that, surprisingly, label noise in the training data leads to underconfident networks, while reduced dataset size leads to overconfident models. We then study the impact of dataset heterogeneity, where data quality varies across classes, on model confidence. We demonstrate that this leads to heterogeneous confidence/accuracy behavior in the test data and is poorly handled by the standard calibration algorithms. To overcome this, we propose an intuitive heterogeneous calibration technique and show that the proposed approach leads to improved calibration metrics (both average and worst-case errors) on the CIFAR datasets.
 [134] arXiv:2002.09832 [pdf, other]

Title: Sequence Preserving Network Traffic Generation
Authors: Sigal Shaked, Amos Zamir, Roman Vainshtein, Moshe Unger, Lior Rokach, Rami Puzis, Bracha Shapira
Subjects: Networking and Internet Architecture (cs.NI); Machine Learning (cs.LG)
We present the Network Traffic Generator (NTG), a framework for perturbing recorded network traffic with the purpose of generating diverse but realistic background traffic for network simulation and what-if analysis in enterprise environments. The framework preserves many characteristics of the original traffic recorded in an enterprise, as well as sequences of network activities. Using the proposed framework, the original traffic flows are profiled using 200 cross-protocol features. The traffic is aggregated into flows of packets between IP pairs and clustered into groups of similar network activities. Sequences of network activities are then extracted. We examined two methods for extracting sequences of activities: a Markov model and a neural language model. Finally, new traffic is generated using the extracted model. We developed a prototype of the framework and conducted extensive experiments based on two real network traffic collections. Hypothesis testing was used to examine the difference between the distribution of original and generated features, showing that 30-100\% of the extracted features were preserved. Small differences between n-gram perplexities in sequences of network activities in the original and generated traffic indicate that sequences of network activities were well preserved.
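To make the Markov-model option mentioned in the abstract above concrete, here is a minimal sketch that assumes a first-order model over activity-cluster labels. The abstract does not state the model order, and the labels, function names, and sequences below are hypothetical.

```python
import random
from collections import Counter, defaultdict


def fit_markov(sequences):
    """Count first-order transitions between consecutive activity labels."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    return counts


def generate(counts, start, length, rng):
    """Sample a sequence by following the empirical transition frequencies.

    Stops early when the current label has no observed successor.
    """
    seq = [start]
    while len(seq) < length and counts[seq[-1]]:
        successors = counts[seq[-1]]
        labels = list(successors)
        weights = [successors[label] for label in labels]
        seq.append(rng.choices(labels, weights=weights, k=1)[0])
    return seq


# Hypothetical recorded sequences of clustered network activities.
recorded = [["dns", "http", "http", "smtp"], ["dns", "http", "smtp"]]
model = fit_markov(recorded)
sample = generate(model, "dns", 6, random.Random(0))
print(sample)
```

Every transition in a generated sequence was observed in the recorded data, which is the sense in which such a model preserves sequences of activities.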
 [135] arXiv:2002.09834 [pdf, other]

Title: PrivGen: Preserving Privacy of Sequences Through Data Generation
Subjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Sequential data is everywhere, and it can serve as a basis for research that will lead to improved processes. For example, road infrastructure can be improved by identifying bottlenecks in GPS data, or early diagnosis can be improved by analyzing patterns of disease progression in medical data. The main obstacle is that access to and use of such data is usually limited or not permitted at all due to concerns about violating user privacy, and rightly so. Anonymizing sequence data is not a simple task, since a user creates an almost unique signature over time. Existing anonymization methods reduce the quality of information in order to maintain the level of anonymity required. Damage to quality may disrupt patterns that appear in the original data and impair the preservation of various characteristics. Since in many cases the researcher does not need the data as is and instead is only interested in the patterns that exist in the data, we propose PrivGen, an innovative method for generating data that maintains patterns and characteristics of the source data. We demonstrate that the data generation mechanism significantly limits the risk of privacy infringement. Evaluating our method with real-world datasets shows that its generated data preserves many characteristics of the data, including the sequential model, as trained based on the source data. This suggests that the data generated by our method could be used in place of actual data for various types of analysis, maintaining user privacy and the data's integrity at the same time.
 [136] arXiv:2002.09836 [pdf, other]

Title: Fill in the BLANC: Human-free quality estimation of document summaries
Comments: 12 pages, 9 figures, 2 tables
Subjects: Computation and Language (cs.CL)
We present BLANC, a new approach to the automatic estimation of document summary quality. Our goal is to measure the functional performance of a summary with an objective, reproducible, and fully automated method. Our approach achieves this by measuring the performance boost gained by a pre-trained language model with access to a document summary while carrying out its language understanding task on the document's text. We present evidence that BLANC scores have at least as good correlation with human evaluations as do the ROUGE family of summary quality measurements. And unlike ROUGE, the BLANC method does not require human-written reference summaries, allowing for fully human-free summary quality estimation.
 [137] arXiv:2002.09841 [pdf, other]

Title: SetRank: A Setwise Bayesian Approach for Collaborative Ranking from Implicit Feedback
Comments: This paper has been accepted in AAAI'20
Journal-ref: The Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI'20), New York, New York, USA, 2020
Subjects: Information Retrieval (cs.IR); Machine Learning (cs.LG); Machine Learning (stat.ML)
The recent development of online recommender systems has a focus on collaborative ranking from implicit feedback, such as user clicks and purchases. Different from explicit ratings, which reflect graded user preferences, the implicit feedback only generates positive and unobserved labels. While considerable efforts have been made in this direction, the well-known pairwise and listwise approaches have still been limited by various challenges. Specifically, for the pairwise approaches, the assumption of independent pairwise preference does not always hold in practice. Also, the listwise approaches cannot efficiently accommodate "ties" due to the precondition of the entire list permutation. To this end, in this paper, we propose a novel setwise Bayesian approach for collaborative ranking, namely SetRank, to inherently accommodate the characteristics of implicit feedback in recommender systems. Specifically, SetRank aims at maximizing the posterior probability of novel setwise preference comparisons and can be implemented with matrix factorization and neural networks. Meanwhile, we also present the theoretical analysis of SetRank to show that the bound of excess risk can be proportional to $\sqrt{M/N}$, where $M$ and $N$ are the numbers of items and users, respectively. Finally, extensive experiments on four real-world datasets clearly validate the superiority of SetRank compared with various state-of-the-art baselines.
 [138] arXiv:2002.09843 [pdf, other]

Title: Practical and Bilateral Privacy-preserving Federated Learning
Comments: Submitted to ICML 2020
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Federated learning, as an emerging distributed training model of neural networks without collecting raw data, has attracted widespread attention. However, almost all existing research on federated learning considers only protecting the privacy of clients, and not preventing model iterates and final model parameters from leaking to untrusted clients and external attackers. In this paper, we present the first bilateral privacy-preserving federated learning scheme, which protects not only the raw training data of clients, but also model iterates during the training phase as well as final model parameters. Specifically, we present an efficient privacy-preserving technique to mask or encrypt the global model, which not only allows clients to train over the noisy global model, but also ensures only the server can obtain the exact updated model. Detailed security analysis shows that clients can access neither model iterates nor the final global model; meanwhile, the server cannot obtain raw training data of clients from additional information used for recovering the exact updated model. Finally, extensive experiments demonstrate that the proposed scheme has model accuracy comparable to traditional federated learning without bringing much extra communication overhead.
 [139] arXiv:2002.09846 [pdf, other]

Title: Tree++: Truncated Tree Based Graph Kernels
Subjects: Machine Learning (cs.LG); Social and Information Networks (cs.SI); Machine Learning (stat.ML)
Graph-structured data arise ubiquitously in many application domains. A fundamental problem is to quantify their similarities. Graph kernels are often used for this purpose, which decompose graphs into substructures and compare these substructures. However, most of the existing graph kernels do not have the property of scale-adaptivity, i.e., they cannot compare graphs at multiple levels of granularities. Many real-world graphs such as molecules exhibit structure at varying levels of granularities. To tackle this problem, we propose a new graph kernel called Tree++ in this paper. At the heart of Tree++ is a graph kernel called the path-pattern graph kernel. The path-pattern graph kernel first builds a truncated BFS tree rooted at each vertex and then uses paths from the root to every vertex in the truncated BFS tree as features to represent graphs. The path-pattern graph kernel can only capture graph similarity at fine granularities. In order to capture graph similarity at coarse granularities, we incorporate a new concept called super path into it. The super path contains truncated BFS trees rooted at the vertices in a path. Our evaluation on a variety of real-world graphs demonstrates that Tree++ achieves the best classification accuracy compared with previous graph kernels.
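The path-pattern idea in the abstract above can be sketched as follows, under simplifying assumptions: features are the label sequences along root-to-vertex paths of a depth-limited BFS tree, and the kernel is the dot product of feature counts. Super paths and any normalization used in the paper are not reproduced; the function names and toy graphs are hypothetical.

```python
from collections import Counter, deque


def path_features(adjacency, labels, depth):
    """Count label sequences on root-to-vertex paths of truncated BFS trees.

    adjacency: dict mapping each vertex to a list of its neighbours.
    labels:    dict mapping each vertex to its label.
    depth:     maximum depth of the BFS tree built at every root.
    """
    features = Counter()
    for root in adjacency:
        paths = {root: (labels[root],)}  # vertex -> label path from the root
        level = {root: 0}
        queue = deque([root])
        while queue:
            v = queue.popleft()
            if level[v] == depth:
                continue  # truncate the tree at the requested depth
            for w in sorted(adjacency[v]):
                if w not in paths:  # first visit fixes the BFS-tree parent
                    paths[w] = paths[v] + (labels[w],)
                    level[w] = level[v] + 1
                    queue.append(w)
        features.update(paths.values())
    return features


def kernel(f1, f2):
    """Dot product of two path-count feature maps."""
    return sum(count * f2[path] for path, count in f1.items())


# Toy labelled graphs: a path C-O-C and a single edge C-O.
g1 = path_features({0: [1], 1: [0, 2], 2: [1]}, {0: "C", 1: "O", 2: "C"}, depth=1)
g2 = path_features({0: [1], 1: [0]}, {0: "C", 1: "O"}, depth=1)
print(kernel(g1, g2))
```

Increasing `depth` lets longer paths contribute, but each feature still describes a single path, which is why the abstract introduces super paths to capture coarser structure.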
 [140] arXiv:2002.09848 [pdf, ps, other]

Title: A new regularization method for a parameter identification problem in a nonlinear partial differential equation
Subjects: Numerical Analysis (math.NA); Functional Analysis (math.FA)
We consider a parameter identification problem related to a quasilinear elliptic Neumann boundary value problem involving a parameter function $a(\cdot)$ and the solution $u(\cdot)$, where the problem is to identify $a(\cdot)$ on an interval $I:= g(\Gamma)$ from the knowledge of the solution $u(\cdot)$ as $g$ on $\Gamma$, where $\Gamma$ is a given curve on the boundary of the domain $\Omega \subseteq \mathbb{R}^3$ of the problem and $g$ is a continuous function. For obtaining stable approximate solutions, we consider a new regularization method which gives error estimates similar to, and in certain cases better than, the classical Tikhonov regularization considered in the literature in the recent past.
 [141] arXiv:2002.09849 [pdf, other]

Title: Multi-Antenna UAV Data Harvesting: Joint Trajectory and Communication Optimization
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Unmanned aerial vehicle (UAV)-enabled communication is a promising technology to extend coverage and enhance throughput for traditional terrestrial wireless communication systems. In this paper, we consider a UAV-enabled wireless sensor network (WSN), where a multi-antenna UAV is dispatched to collect data from a group of sensor nodes (SNs). The objective is to maximize the minimum data collection rate from all SNs via jointly optimizing their transmission scheduling and power allocations as well as the trajectory of the UAV, subject to the practical constraints on the maximum transmit power of the SNs and the maximum speed of the UAV. The formulated optimization problem is challenging to solve as it involves non-convex constraints and discrete-value variables. To draw useful insight, we first consider the special case of the formulated problem by ignoring the UAV speed constraint and optimally solve it based on the Lagrange duality method. It is shown that for this relaxed problem, the UAV should hover above a finite number of optimal locations with different durations in general. Next, we address the general case of the formulated problem where the UAV speed constraint is considered and propose a traveling salesman problem (TSP)-based trajectory initialization, where the UAV sequentially visits the locations obtained in the relaxed problem with minimum flying time. Given this initial trajectory, we then find the corresponding transmission scheduling and power allocations of the SNs and further optimize the UAV trajectory by applying the block coordinate descent (BCD) and successive convex approximation (SCA) techniques. Finally, numerical results are provided to illustrate the spectrum and energy efficiency gains of the proposed scheme for multi-antenna UAV data harvesting, as compared to benchmark schemes.
 [142] arXiv:2002.09850 [pdf, other]

Title: Active localization of multiple targets using noisy relative measurements
Comments: 8 pages, 5 figures
Subjects: Robotics (cs.RO)
Consider a mobile robot tasked with localizing targets at unknown locations by obtaining relative measurements. The observations can be bearing or range measurements. How should the robot move so as to localize the targets and minimize the uncertainty in their locations as quickly as possible? Most existing approaches are either greedy in nature or rely on accurate initial estimates.
We formulate this path planning problem as an unsupervised learning problem where the measurements are aggregated using a Bayesian histogram filter. The robot learns to minimize the total uncertainty of each target in the shortest amount of time using the current measurement and an aggregate representation of the current belief state. We analyze our method in a series of experiments where we show that our method outperforms a standard greedy approach. In addition, its performance is also comparable to an offline algorithm which has access to the true location of the targets.
 [143] arXiv:2002.09853 [pdf, other]

Title: Optimizing Traffic Lights with Multi-agent Deep Reinforcement Learning and V2X communication
Comments: 7 Figure, Table 1
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Multiagent Systems (cs.MA)
We consider a system to optimize the duration of traffic signals using multi-agent deep reinforcement learning and Vehicle-to-Everything (V2X) communication. This system aims at analyzing independent and shared rewards for multiple agents to control the duration of traffic lights. A learning agent traffic light gets information along its lanes within a circular V2X coverage. The duration cycles of a traffic light are modeled as Markov decision processes. We investigate four variations of reward functions. The first two are unshared rewards, based on the number of waiting vehicles and on the waiting time of vehicles between two cycles of a traffic light. The third and fourth functions are shared rewards, based on waiting cars and on waiting time for all agents. Each agent has a memory for optimization through a target network and prioritized experience replay. We evaluate the agents through the Simulation of Urban MObility (SUMO) simulator. The results demonstrate the effectiveness of the proposed system in optimizing traffic signals and reducing the average number of waiting cars to 41.5% as compared to the traditional periodic traffic control system.
 [144] arXiv:2002.09854 [pdf, ps, other]

Title: Crossing the Reality Gap with Evolved Plastic Neurocontrollers
Comments: Submitted to GECCO2020
Subjects: Robotics (cs.RO); Neural and Evolutionary Computing (cs.NE)
A critical issue in evolutionary robotics is the transfer of controllers learned in simulation to reality. This is especially the case for small Unmanned Aerial Vehicles (UAVs), as the platforms are highly dynamic and susceptible to breakage. Previous approaches often require simulation models with a high level of accuracy, otherwise significant errors may arise when the well-designed controller is being deployed onto the targeted platform. Here we try to overcome the transfer problem from a different perspective, by designing a spiking neurocontroller which uses synaptic plasticity to cross the reality gap via online adaptation. Through a set of experiments we show that the evolved plastic spiking controller can maintain its functionality by self-adapting to model changes that take place after evolutionary training, and consequently exhibit better performance than its non-plastic counterpart.
 [145] arXiv:2002.09857 [pdf, ps, other]

Title: Verifying Array Manipulating Programs with Full-Program Induction
Subjects: Software Engineering (cs.SE); Programming Languages (cs.PL)
We present a full-program induction technique for proving (a sub-class of) quantified as well as quantifier-free properties of programs manipulating arrays of parametric size N. Instead of inducting over individual loops, our technique inducts over the entire program (possibly containing multiple loops) directly via the program parameter N. Significantly, this does not require generation or use of loop-specific invariants. We have developed a prototype tool Vajra to assess the efficacy of our technique. We demonstrate the performance of Vajra vis-a-vis several state-of-the-art tools on a set of array manipulating benchmarks.
 [146] arXiv:2002.09858 [pdf, other]

Title: Deep Learning Based FDD Non-Stationary Massive MIMO Downlink Channel Reconstruction
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
This paper proposes a model-driven deep learning-based downlink channel reconstruction scheme for frequency division duplexing (FDD) massive multi-input multi-output (MIMO) systems. The spatial non-stationarity, which is the key feature of the future extremely large aperture massive MIMO system, is considered. Instead of the channel matrix, the channel model parameters are learned by neural networks to save the overhead and improve the accuracy of channel reconstruction. By viewing the channel as an image, we introduce You Only Look Once (YOLO), a powerful neural network for object detection, to enable a rapid estimation process of the model parameters, including the detection of angles and delays of the paths and the identification of visibility regions of the scatterers. The deep learning-based scheme avoids the complicated iterative process introduced by the algorithm-based parameter extraction methods. A low-complexity algorithm-based refiner further refines the YOLO estimates toward high accuracy. Given the efficiency of model-driven deep learning and the combination of neural network and algorithm, the proposed scheme can rapidly and accurately reconstruct the non-stationary downlink channel. Moreover, the proposed scheme is also applicable to widely studied stationary systems and achieves reconstruction accuracy comparable to that of an algorithm-based method with greatly reduced time consumption.
 [147] arXiv:2002.09859 [pdf, other]

Title: DotFAN: A Domain-transferred Face Augmentation Network for Pose and Illumination Invariant Face Recognition
Comments: 12 pages, 10 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
The performance of a convolutional neural network (CNN) based face recognition model largely relies on the richness of labelled training data. Collecting a training set with large variations of a face identity under different poses and illumination changes, however, is very expensive, making the diversity of within-class face images a critical issue in practice. In this paper, we propose a 3D model-assisted domain-transferred face augmentation network (DotFAN) that can generate a series of variants of an input face based on the knowledge distilled from existing rich face datasets collected from other domains. DotFAN is structurally a conditional CycleGAN but has two additional subnetworks, namely face expert network (FEM) and face shape regressor (FSR), for latent code control. While FSR aims to extract face attributes, FEM is designed to capture a face identity. With their aid, DotFAN can learn a disentangled face representation and effectively generate face images of various facial attributes while preserving the identity of augmented faces. Experiments show that DotFAN is beneficial for augmenting small face datasets to improve their within-class diversity so that a better face recognition model can be learned from the augmented dataset.
 [148] arXiv:2002.09860 [pdf, other]

Title: Variance Loss in Variational Autoencoders
Authors: Andrea Asperti
Subjects: Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
In this article, we highlight what appears to be a major issue of Variational Autoencoders, evinced from extensive experimentation with different network architectures and datasets: the variance of generated data is appreciably lower than that of training data. Since generative models are usually evaluated with metrics such as the Frechet Inception Distance (FID) that compare the distributions of (features of) real versus generated images, the variance loss typically results in degraded scores. This problem is particularly relevant in a two-stage setting, where we use a second VAE to sample in the latent space of the first VAE. The reduced variance creates a mismatch between the actual distribution of latent variables and those generated by the second VAE, which hinders the beneficial effects of the second stage. Renormalizing the output of the second VAE towards the expected normal spherical distribution, we obtain a sudden burst in the quality of generated samples, as also confirmed in terms of FID.
 [149] arXiv:2002.09864 [pdf, other]

Title: Stealing Black-Box Functionality Using The Deep Neural Tree Architecture
Comments: 8 pages, 7 figures, 1 table
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Machine Learning (stat.ML)
This paper makes a substantial step towards cloning the functionality of black-box models by introducing a machine learning (ML) architecture named Deep Neural Trees (DNTs). This new architecture can learn to separate different tasks of the black-box model, and clone its task-specific behavior. We propose to train the DNT using an active learning algorithm to obtain faster and more sample-efficient training. In contrast to prior work, we study a complex "victim" black-box model based solely on input-output interactions, while at the same time the attacker and the victim model may have completely different internal architectures. The attacker is an ML-based algorithm whereas the victim is a generally unknown module, such as a multi-purpose digital chip, complex analog circuit, mechanical system, software logic or a hybrid of these. The trained DNT module not only can function as the attacked module, but also provides some level of explainability to the cloned model due to the tree-like nature of the proposed architecture.
 [150] arXiv:2002.09866 [pdf, other]

Title: On the generalization of Bayesian deep nets for multi-class classification
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Generalization bounds which assess the difference between the true risk and the empirical risk have been studied extensively. However, to obtain bounds, current techniques use strict assumptions such as a uniformly bounded or a Lipschitz loss function. To avoid these assumptions, in this paper, we propose a new generalization bound for Bayesian deep nets by exploiting the contractivity of the Log-Sobolev inequalities. Using these inequalities adds a loss-gradient norm term to the generalization bound, which is intuitively a surrogate of the model complexity. Empirically, we analyze the effect of this loss-gradient norm term using different deep nets.
 [151] arXiv:2002.09869 [pdf, ps, other]

Title: Near-optimal Regret Bounds for Stochastic Shortest Path
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Stochastic shortest path (SSP) is a well-known problem in planning and control, in which an agent has to reach a goal state in minimum total expected cost. In the learning formulation of the problem, the agent is unaware of the environment dynamics (i.e., the transition function) and has to repeatedly play for a given number of episodes while reasoning about the problem's optimal solution. Unlike other well-studied models in reinforcement learning (RL), the length of an episode is not predetermined (or bounded) and is influenced by the agent's actions. Recently, Tarbouriech et al. (2019) studied this problem in the context of regret minimization and provided an algorithm whose regret bound is inversely proportional to the square root of the minimum instantaneous cost. In this work we remove this dependence on the minimum cost: we give an algorithm that guarantees a regret bound of $\widetilde{O}(B_\star S \sqrt{A K})$, where $B_\star$ is an upper bound on the expected cost of the optimal policy, $S$ is the set of states, $A$ is the set of actions and $K$ is the number of episodes. We additionally show that any learning algorithm must have at least $\Omega(B_\star \sqrt{S A K})$ regret in the worst case.
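As a reading aid, the gap between the two bounds stated in this abstract can be worked out directly. The sketch below assumes that $S$ and $A$ inside the bounds denote the numbers of states and actions, and it ignores the logarithmic factors hidden by $\widetilde{O}$:

```latex
\[
\frac{B_\star S \sqrt{A K}}{B_\star \sqrt{S A K}}
  = \frac{S}{\sqrt{S}}
  = \sqrt{S}
\]
```

Under that reading, the upper and lower bounds agree in their dependence on $B_\star$, $A$ and $K$ and differ by a factor of $\sqrt{S}$.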
 [152] arXiv:2002.09877 [pdf, other]

Title: Automata for Hyperlanguages
Comments: 12 pages of main paper and another 10 pages appendix
Subjects: Formal Languages and Automata Theory (cs.FL); Computation and Language (cs.CL)
Hyperproperties lift conventional trace properties from a set of execution traces to a set of sets of execution traces. Hyperproperties have been shown to be a powerful formalism for expressing and reasoning about information-flow security policies and important properties of cyber-physical systems such as sensitivity and robustness, as well as consistency conditions in distributed computing such as linearizability. Although there is an extensive body of work on automata-based representation of trace properties, we currently lack such characterization for hyperproperties. We introduce hyperautomata for hyperlanguages, which are languages over sets of words. Essentially, hyperautomata allow running multiple quantified words over an automaton. We propose a specific type of hyperautomata called nondeterministic finite hyperautomata (NFH), which accept regular hyperlanguages. We demonstrate the ability of regular hyperlanguages to express hyperproperties for finite traces. We then explore the fundamental properties of NFH and show their closure under the Boolean operations. We show that while nonemptiness is undecidable in general, it is decidable for several fragments of NFH. We further show the decidability of the membership problem for finite sets and regular languages for NFH, as well as the containment problem for several fragments of NFH. Finally, we introduce learning algorithms based on Angluin's L-star algorithm for the fragments of NFH in which the quantification is either strictly universal or strictly existential.
 [153] arXiv:2002.09880 [pdf, other]

Title: Mixed Integer Programming for Searching Maximum Quasi-Bicliques
Comments: This paper draft is stored here for self-archiving purposes
Journal-ref: Springer Proceedings in Mathematics & Statistics, vol 315. Springer, Cham (2020)
Subjects: Data Structures and Algorithms (cs.DS); Artificial Intelligence (cs.AI); Discrete Mathematics (cs.DM); Social and Information Networks (cs.SI); Optimization and Control (math.OC)
This paper is related to the problem of finding the maximal quasi-bicliques in a bipartite graph (bigraph). A quasi-biclique in the bigraph is its "almost" complete subgraph. The relaxation of completeness can be understood variously; here, we assume that the subgraph is a $\gamma$-quasi-biclique if it lacks a certain number of edges to form a biclique such that its density is at least $\gamma \in (0,1]$. For a bigraph and fixed $\gamma$, the problem of searching for the maximal quasi-biclique consists of finding a subset of vertices of the bigraph such that the induced subgraph is a quasi-biclique and its size is maximal for a given graph. Several models based on Mixed Integer Programming (MIP) to search for a quasi-biclique are proposed and tested for working efficiency. An alternative model inspired by biclustering is formulated and tested; this model simultaneously maximizes both the size of the quasi-biclique and its density, using the least-square criterion similar to the one exploited by triclustering TriBox.
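The density condition in this abstract can be illustrated with a small check. This is a minimal sketch, not the paper's MIP models: it assumes density means the number of edges present divided by the number of edges a complete biclique on the same vertex sets would have, and the function names and toy graph are hypothetical.

```python
from itertools import product


def biclique_density(edges, left, right):
    """Fraction of the len(left) * len(right) possible edges that are present.

    Assumption: this ratio is the density notion meant in the abstract.
    """
    if not left or not right:
        raise ValueError("both vertex sets must be non-empty")
    present = sum((u, v) in edges for u, v in product(left, right))
    return present / (len(left) * len(right))


def is_gamma_quasi_biclique(edges, left, right, gamma):
    """True if the induced bipartite subgraph has density at least gamma."""
    if not 0 < gamma <= 1:
        raise ValueError("gamma must lie in (0, 1]")
    return biclique_density(edges, left, right) >= gamma


# Hypothetical toy bigraph: the 3 x 2 vertex sets miss one edge, ("c", 2).
edges = {("a", 1), ("a", 2), ("b", 1), ("b", 2), ("c", 1)}
left, right = {"a", "b", "c"}, {1, 2}
density = biclique_density(edges, left, right)
```

With one of six possible edges missing, the toy subgraph passes the check for $\gamma = 0.8$ but not for $\gamma = 0.9$; the optimization problem in the abstract then searches for the largest vertex subset that passes it.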
 [154] arXiv:2002.09884 [pdf, other]

Title: Discriminative Particle Filter Reinforcement Learning for Complex Partial Observations
Comments: Accepted to ICLR 2020
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
Deep reinforcement learning is successful in decision making for sophisticated games, such as Atari and Go. However, real-world decision making often requires reasoning with partial information extracted from complex visual observations. This paper presents Discriminative Particle Filter Reinforcement Learning (DPFRL), a new reinforcement learning framework for complex partial observations. DPFRL encodes a differentiable particle filter in the neural network policy for explicit reasoning with partial observations over time. The particle filter maintains a belief using a learned discriminative update, which is trained end-to-end for decision making. We show that using the discriminative update instead of standard generative models results in significantly improved performance, especially for tasks with complex visual observations, because it circumvents the difficulty of modeling complex observations that are irrelevant to decision making. In addition, to extract features from the particle belief, we propose a new type of belief feature based on the moment generating function. DPFRL outperforms state-of-the-art POMDP RL models in Flickering Atari Games, an existing POMDP RL benchmark, and in Natural Flickering Atari Games, a new, more challenging POMDP RL benchmark introduced in this paper. Further, DPFRL performs well for visual navigation with real-world data in the Habitat environment.
 [155] arXiv:2002.09885 [pdf, other]

Title: Speeding up the AIFV-$2$ dynamic programs by two orders of magnitude using Range Minimum Queries
Subjects: Data Structures and Algorithms (cs.DS); Information Theory (cs.IT)
AIFV-$2$ codes are a new method for constructing lossless codes for memoryless sources that provide better worst-case redundancy than Huffman codes. They do this by using two code trees instead of one and also allowing some bounded delay in the decoding process. Known algorithms for constructing AIFV codes are iterative; at each step they replace the current code tree pair with a "better" one. The current state of the art for performing this replacement is a pair of Dynamic Programming (DP) algorithms that use $O(n^5)$ time to fill in two tables, each of size $O(n^3)$ (where $n$ is the number of different characters in the source).
This paper describes how to reduce the time for filling in the DP tables by two orders of magnitude, down to $O(n^3)$. It does this by introducing a grouping technique that permits separating the $\Theta(n^3)$-space tables into $\Theta(n)$ groups, each of size $O(n^2)$, and then using Two-Dimensional Range-Minimum Queries (RMQs) to fill in that group's table entries in $O(n^2)$ time. This RMQ speedup technique seems to be new and might be of independent interest.
 [156] arXiv:2002.09891 [pdf, other]

Title: End-To-End Graph-based Deep Semi-Supervised Learning
Comments: 5 figures, 6 tables
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
The quality of a graph is determined jointly by three key factors of the graph: nodes, edges and similarity measure (or edge weights), and is crucial to the success of graph-based semi-supervised learning (SSL) approaches. Recently, dynamic graphs, in which some or all of these factors are dynamically updated during the training process, have been shown to be promising for graph-based semi-supervised learning. However, existing approaches only update part of the three factors and keep the rest manually specified during the learning stage. In this paper, we propose a novel graph-based semi-supervised learning approach to optimize all three factors simultaneously in an end-to-end learning fashion. To this end, we concatenate two neural networks (feature network and similarity network) together to learn the categorical label and semantic similarity, respectively, and train the networks to minimize a unified SSL objective function. We also introduce an extended graph Laplacian regularization term to increase training efficiency. Extensive experiments on several benchmark datasets demonstrate the effectiveness of our approach.
 [157] arXiv:2002.09893 [pdf, other]

Title: Efficient Compression of Long Arbitrary Sequences with No Reference at the EncoderSubjects: Information Theory (cs.IT)
In a distributed information application an encoder compresses an arbitrary vector while a similar reference vector is available to the decoder as side information. For the Hamming-distance similarity measure, and when guaranteed perfect reconstruction is required, we present two contributions to the solution of this problem. One result shows that when a set of potential reference vectors is available to the encoder, lower compression rates can be achieved when the set satisfies a certain clustering property. Another result reduces the best known decoding complexity from exponential in the vector length $n$ to $O(n^{1.5})$ by generalized concatenation of inner coset codes and outer error-correcting codes. One potential application of the results is the compression of DNA sequences, where similar (but not identical) reference vectors are shared among senders and receivers.
 [158] arXiv:2002.09895 [pdf, other]

Title: Treeplication: An Erasure Code for Distributed Full Recovery under the Random Multiset ChannelSubjects: Information Theory (cs.IT)
This paper presents a new erasure code called Treeplication designed for distributed recovery of the full information word, while most prior work in coding for distributed storage only supports distributed repair of individual symbols. A Treeplication code for $k$ information symbols is defined on a binary tree with $2k-1$ vertices, along with a distribution for selecting code symbols from the tree layers. We analyze and optimize the code under a random-multiset model, which captures the system property that the nodes available for recovery are drawn randomly from the nodes storing the code symbols. Treeplication codes are shown to have full-recovery communication cost comparable to replication, while offering much better recoverability.
 [159] arXiv:2002.09896 [pdf, other]

Title: Adversarial Attack on DL-based Massive MIMO CSI Feedback
Comments: 12 pages, 5 figures, 1 table. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
With the increasing application of deep learning (DL) algorithms in wireless communications, the physical layer faces new challenges caused by adversarial attacks. Such attacks have significantly affected neural networks in computer vision. We chose DL-based analog channel state information (CSI) feedback to show the effect of adversarial attacks on DL-based communication systems. We present a practical method to craft white-box adversarial attacks on the DL-based CSI feedback process. Our simulation results show the destructive effect of adversarial attacks on DL-based CSI feedback, measured by the normalized mean square error. We also launched a jamming attack for comparison and found that the jamming attack could be prevented with certain precautions. As DL algorithms become the trend in developing wireless communication, this work raises concerns regarding the security of DL-based algorithms.
 [160] arXiv:2002.09898 [pdf, other]

Title: Efficient numerical methods for computing the stationary states of phase field crystal models
Comments: 26 pages, 5 figures
Subjects: Numerical Analysis (math.NA)
Finding the stationary states of a free energy functional is an important problem in phase field crystal (PFC) models. Many efforts have been devoted to designing numerical schemes with energy dissipation and mass conservation properties. However, most existing approaches are time-consuming due to the requirement of small effective time steps. In this paper, we discretize the energy functional and propose efficient numerical algorithms for solving the constrained non-convex minimization problem. A class of first-order approaches, the so-called adaptive accelerated Bregman proximal gradient (AA-BPG) methods, is proposed and the convergence property is established without the global Lipschitz constant requirements. Moreover, we design a hybrid approach that applies an inexact Newton method to further accelerate the local convergence. One key feature of our algorithm is that the energy dissipation and mass conservation properties hold during the iteration process. Extensive numerical experiments, including two three-dimensional periodic crystals in the Landau-Brazovskii (LB) model and a two-dimensional quasicrystal in the Lifshitz-Petrich (LP) model, demonstrate that our approaches have adaptive time steps which lead to a significant acceleration over many existing methods when computing complex structures.
 [161] arXiv:2002.09901 [pdf]

Title: A Nepali Rule Based Stemmer and its performance on different NLP applications
Comments: 5 pages, 2 figures, 3 tables
Journal-ref: Proceedings of the 4th International IT Conference on ICT with Smart Computing and 9th National Students' Conference on Information Technology, (NaSCoIT 2018), Kathmandu, Nepal, ISSN No 2505-1075, pp. 1-6 (December 2018)
Subjects: Computation and Language (cs.CL); Information Retrieval (cs.IR)
Stemming is an integral part of Natural Language Processing (NLP). It is a preprocessing step in almost every NLP application. Arguably, the most important usage of stemming is in Information Retrieval (IR). While a lot of work has been done on stemming in languages like English, there are only a few works on Nepali stemming. This study focuses on creating a Rule Based stemmer for Nepali text. Specifically, it is an affix stripping system that identifies two different classes of suffixes in Nepali grammar and strips them separately. Only a single negativity prefix (Na) is identified and stripped. This study focuses on a number of techniques like exception word identification, morphological normalization and word transformation to increase stemming performance. The stemmer is tested intrinsically using Paice's method and extrinsically on a basic tf-idf based IR system and an elementary news topic classifier using Multinomial Naive Bayes Classifier. The difference in performance of these systems with and without using the stemmer is analysed.
 [162] arXiv:2002.09905 [pdf, other]

Title: Exploring Spatial-Temporal Multi-Frequency Analysis for High-Fidelity and Temporal-Consistency Video Prediction
Comments: Submitted to a conference, under review
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Video prediction is a pixel-wise dense prediction task to infer future frames based on past frames. Missing appearance details and motion blur are still two major problems for current predictive models, which lead to image distortion and temporal inconsistency. In this paper, we point out the necessity of exploring multi-frequency analysis to deal with the two problems. Inspired by the frequency band decomposition characteristic of the Human Vision System (HVS), we propose a video prediction network based on multi-level wavelet analysis to deal with spatial and temporal information in a unified manner. Specifically, the multi-level spatial discrete wavelet transform decomposes each video frame into anisotropic sub-bands with multiple frequencies, helping to enrich structural information and preserve fine details. On the other hand, the multi-level temporal discrete wavelet transform, which operates on the time axis, decomposes the frame sequence into sub-band groups of different frequencies to accurately capture multi-frequency motions under a fixed frame rate. Extensive experiments on diverse datasets demonstrate that our model shows significant improvements on fidelity and temporal consistency over state-of-the-art works.
 [163] arXiv:2002.09907 [pdf, ps, other]

Title: Performance Analysis of Intelligent Reflecting Surface Assisted NOMA Networks
Comments: 13 pages, 11 figures
Subjects: Information Theory (cs.IT)
Intelligent reflecting surface (IRS) is a promising technology to enhance the coverage and performance of wireless networks. We consider the application of IRS to non-orthogonal multiple access (NOMA), where a base station transmits superposed signals to multiple users by virtue of an IRS. The performance of IRS-assisted NOMA networks with imperfect successive interference cancellation (ipSIC) and perfect successive interference cancellation (pSIC) is investigated by invoking a 1-bit coding scheme. In particular, we derive new exact and asymptotic expressions for both outage probability and ergodic rate of the $m$-th user with ipSIC/pSIC. Based on analytical results, the diversity order of the $m$-th user with pSIC is in connection with the number of reflecting elements and channel ordering. The high signal-to-noise ratio (SNR) slope of ergodic rate for the $m$-th user is obtained. The throughput and energy efficiency of non-orthogonal users for IRS-NOMA are discussed both in delay-limited and delay-tolerant transmission modes. Additionally, we derive new exact expressions of outage probability and ergodic rate for IRS-assisted orthogonal multiple access (IRS-OMA). Numerical results are presented to substantiate our analyses and demonstrate that: i) The outage behaviors of IRS-NOMA are superior to that of IRS-OMA and relaying schemes; ii) With increasing the number of reflecting elements, IRS-NOMA is capable of achieving enhanced outage performance; and iii) The $M$-th user has a larger ergodic rate compared to IRS-OMA and benchmarks. However, the ergodic performance of the $m$-th user exceeds relaying schemes in the low SNR regime.
 [164] arXiv:2002.09917 [pdf, other]

Title: Improve SGD Training via Aligning Mini-batches
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Deep neural networks (DNNs) for supervised learning can be viewed as a pipeline of a feature extractor (i.e. last hidden layer) and a linear classifier (i.e. output layer) that is trained jointly with stochastic gradient descent (SGD). In each iteration of SGD, a mini-batch from the training data is sampled and the true gradient of the loss function is estimated as the noisy gradient calculated on this mini-batch. From the feature learning perspective, the feature extractor should be updated to learn meaningful features with respect to the entire data, and reduce the accommodation to noise in the mini-batch. With this motivation, we propose In-Training Distribution Matching (ITDM) to improve DNN training and reduce overfitting. Specifically, along with the loss function, ITDM regularizes the feature extractor by matching the moments of distributions of different mini-batches in each iteration of SGD, which is fulfilled by minimizing the maximum mean discrepancy. As such, ITDM does not assume any explicit parametric form of data distribution in the latent feature space. Extensive experiments are conducted to demonstrate the effectiveness of our proposed strategy.
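The quantity this abstract minimizes, the maximum mean discrepancy between the features of two batches, can be sketched in a few lines. This is an illustration only, not the paper's training procedure: the Gaussian kernel, the bandwidth, the biased estimator and the helper `sample_batch` are all assumptions made here, and the "features" are synthetic Gaussian vectors rather than outputs of a network.

```python
import math
import random


def gaussian_kernel(u, v, bandwidth=1.0):
    """Gaussian (RBF) kernel between two feature vectors (assumed kernel choice)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mean_kernel(xs, ys, bandwidth=1.0):
    """Average kernel value over all pairs (x, y) with x in xs and y in ys."""
    total = sum(gaussian_kernel(x, y, bandwidth) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased estimate of the squared maximum mean discrepancy between two batches."""
    return (mean_kernel(xs, xs, bandwidth)
            + mean_kernel(ys, ys, bandwidth)
            - 2.0 * mean_kernel(xs, ys, bandwidth))


def sample_batch(rng, mean, size=32, dim=4):
    """Hypothetical stand-in for the latent features of one mini-batch."""
    return [[rng.gauss(mean, 1.0) for _ in range(dim)] for _ in range(size)]


rng = random.Random(0)
batch_a = sample_batch(rng, 0.0)
batch_b = sample_batch(rng, 0.0)  # same distribution as batch_a
batch_c = sample_batch(rng, 3.0)  # shifted distribution
same_dist = mmd_squared(batch_a, batch_b)
shifted = mmd_squared(batch_a, batch_c)
```

Two batches drawn from the same distribution give a smaller value than a batch compared with a shifted one, which is the signal a regularizer of this kind would penalize. No parametric form of the feature distribution is needed, consistent with the abstract's remark.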
 [165] arXiv:2002.09919 [pdf, other]

Title: Do Multi-Hop Question Answering Systems Know How to Answer the Single-Hop Sub-Questions?
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Multi-hop question answering (QA) requires a model to retrieve and integrate information from different parts of a long text to answer a question. Humans answer this kind of complex question via a divide-and-conquer approach. In this paper, we investigate whether top-performing models for multi-hop questions understand the underlying sub-questions like humans. We adopt a neural decomposition model to generate sub-questions for a multi-hop complex question, followed by extracting the corresponding sub-answers. We show that multiple state-of-the-art multi-hop QA models fail to correctly answer a large portion of sub-questions, although their corresponding multi-hop questions are correctly answered. This indicates that these models manage to answer the multi-hop questions using some partial clues, instead of truly understanding the reasoning paths. We also propose a new model which significantly improves the performance on answering the sub-questions. Our work takes a step forward towards building a more explainable multi-hop QA system.
 [166] arXiv:2002.09923 [pdf, other]

Title: Monocular Direct Sparse Localization in a Prior 3D Surfel Map
Comments: 7 pages, 6 figures, to appear in ICRA 2020
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
In this paper, we introduce an approach to tracking the pose of a monocular camera in a prior surfel map. By rendering vertex and normal maps from the prior surfel map, the global planar information for the sparse tracked points in the image frame is obtained. The tracked points with and without the global planar information contribute both global and local constraints on the frames to the system. Our approach formulates all constraints in the form of direct photometric errors within a local window of the frames. The final optimization utilizes these constraints to provide an accurate estimation of global 6-DoF camera poses with the absolute scale. Extensive simulation and real-world experiments demonstrate that our monocular method can provide accurate camera localization results under various conditions.
 [167] arXiv:2002.09925 [pdf, other]

Title: ORCSolver: An Efficient Solver for Adaptive GUI Layout with OR-Constraints
Comments: Published at CHI2020
Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Data Structures and Algorithms (cs.DS)
OR-constrained (ORC) graphical user interface layouts unify conventional constraint-based layouts with flow layouts, which enables the definition of flexible layouts that adapt to screens with different sizes, orientations, or aspect ratios with only a single layout specification. Unfortunately, solving ORC layouts with current solvers is time-consuming and the needed time increases exponentially with the number of widgets and constraints. To address this challenge, we propose ORCSolver, a novel solving technique for adaptive ORC layouts, based on a branch-and-bound approach with heuristic preprocessing. We demonstrate that ORCSolver simplifies ORC specifications at runtime and our approach can solve ORC layout specifications efficiently at near-interactive rates.
 [168] arXiv:2002.09927 [pdf, other]

Title: Weighting Is Worth the Wait: Bayesian Optimization with Importance SamplingSubjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Many contemporary machine learning models require extensive tuning of hyperparameters to perform well. A variety of methods, such as Bayesian optimization, have been developed to automate and expedite this process. However, tuning remains extremely costly as it typically requires repeatedly fully training models. We propose to accelerate the Bayesian optimization approach to hyperparameter tuning for neural networks by taking into account the relative amount of information contributed by each training example. To do so, we leverage importance sampling (IS); this significantly increases the quality of the black-box function evaluations, but also their runtime, and so must be done carefully. Casting hyperparameter search as a multi-task Bayesian optimization problem over both hyperparameters and importance sampling design achieves the best of both worlds: by learning a parameterization of IS that trades off evaluation complexity and quality, we improve upon Bayesian optimization state-of-the-art runtime and final validation error across a variety of datasets and complex neural architectures.
 [169] arXiv:2002.09928 [pdf, other]

Title: Predictive Sampling with Forecasting Autoregressive Models
Comments: 13 pages, 16 figures
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Autoregressive models (ARMs) currently hold state-of-the-art performance in likelihood-based modeling of image and audio data. Generally, neural network based ARMs are designed to allow fast inference, but sampling from these models is impractically slow. In this paper, we introduce the predictive sampling algorithm: a procedure that exploits the fast inference property of ARMs in order to speed up sampling, while keeping the model intact. We propose two variations of predictive sampling, namely sampling with ARM fixed-point iteration and learned forecasting modules. Their effectiveness is demonstrated in two settings: i) explicit likelihood modeling on binary MNIST, SVHN and CIFAR-10, and ii) discrete latent modeling in an autoencoder trained on SVHN, CIFAR-10 and Imagenet32. Empirically, we show considerable improvements over baselines in number of ARM inference calls and sampling speed.
 [170] arXiv:2002.09931 [pdf, other]

Title: The Value of Big Data for Credit Scoring: Enhancing Financial Inclusion using Mobile Phone Data and Social Network Analytics
Journal-ref: Applied Soft Computing, Volume 74, January 2019, Pages 26-39
Subjects: Social and Information Networks (cs.SI); Computers and Society (cs.CY); Machine Learning (cs.LG); Machine Learning (stat.ML)
Credit scoring is without a doubt one of the oldest applications of analytics. In recent years, a multitude of sophisticated classification techniques have been developed to improve the statistical performance of credit scoring models. Instead of focusing on the techniques themselves, this paper leverages alternative data sources to enhance both statistical and economic model performance. The study demonstrates how including call networks, in the context of positive credit information, as a new Big Data source has added value in terms of profit by applying a profit measure and profit-based feature selection. A unique combination of datasets, including call-detail records, credit and debit account information of customers is used to create scorecards for credit card applicants. Call-detail records are used to build call networks and advanced social network analytics techniques are applied to propagate influence from prior defaulters throughout the network to produce influence scores. The results show that combining call-detail records with traditional data in credit scoring models significantly increases their performance when measured in AUC. In terms of profit, the best model is the one built with only calling behavior features. In addition, the calling behavior features are the most predictive in other models, both in terms of statistical and economic performance. The results have an impact in terms of ethical use of call-detail records, regulatory implications, financial inclusion, as well as data sharing and privacy.
 [171] arXiv:2002.09941 [pdf, other]

Title: A Bridge between Polynomial Optimization and Games with Imperfect RecallSubjects: Computer Science and Game Theory (cs.GT); Logic in Computer Science (cs.LO)
We provide several positive and negative complexity results for solving games with imperfect recall. Using a one-to-one correspondence between these games on one side and multivariate polynomials on the other side, we show that solving games with imperfect recall is as hard as solving certain problems of the first-order theory of reals. We establish square root sum hardness even for the specific class of A-loss games. On the positive side, we find restrictions on games and strategies motivated by Bridge bidding that give polynomial-time complexity.
 [172] arXiv:2002.09942 [pdf, ps, other]

Title: How Good Is a Strategy in a Game With Nature?
Journal-ref: ACM Transactions on Computational Logic, Vol. 21, No 3, Article 21, pp. 1-39, February 2020
Subjects: Formal Languages and Automata Theory (cs.FL); Computer Science and Game Theory (cs.GT); Logic in Computer Science (cs.LO)
We consider games with two antagonistic players, Éloïse (modelling a program) and Abélard (modelling a byzantine environment), and a third, unpredictable and uncontrollable player, that we call Nature. Motivated by the fact that the usual probabilistic semantics very quickly leads to undecidability when considering either infinite game graphs or imperfect information, we propose two alternative semantics that lead to decidability where the probabilistic one fails: one based on counting and one based on topology.
 [173] arXiv:2002.09943 [pdf, other]

Title: Network Clustering Via Kernel-ARMA Modeling and the Grassmannian: The Brain-Network Case
Authors: Cong Ye, Konstantinos Slavakis, Pratik V. Patil, Johan Nakuci, Sarah F. Muldoon, John Medaglia
Comments: arXiv admin note: substantial text overlap with arXiv:1906.02292
Subjects: Machine Learning (cs.LG); Social and Information Networks (cs.SI); Machine Learning (stat.ML)
This paper introduces a clustering framework for networks with nodes annotated with time-series data. The framework addresses all types of network-clustering problems: State clustering, node clustering within states (a.k.a. topology identification or community detection), and even subnetwork-state-sequence identification/tracking. Via a bottom-up approach, features are first extracted from the raw nodal time-series data by kernel autoregressive-moving-average modeling to reveal non-linear dependencies and low-rank representations, and then mapped onto the Grassmann manifold (Grassmannian). All clustering tasks are performed by leveraging the underlying Riemannian geometry of the Grassmannian in a novel way. To validate the proposed framework, brain-network clustering is considered, where extensive numerical tests on synthetic and real functional magnetic resonance imaging (fMRI) data demonstrate that the advocated learning framework compares favorably versus several state-of-the-art clustering schemes.
 [174] arXiv:2002.09945 [pdf, other]

Title: On the Estimation of Complex Circuits Functional Failure Rate by Machine Learning Techniques
Comments: arXiv admin note: text overlap with arXiv:2002.08882
Subjects: Machine Learning (cs.LG); Signal Processing (eess.SP)
De-Rating or Vulnerability Factors are a major feature of failure analysis efforts mandated by today's Functional Safety requirements. Determining the Functional De-Rating of sequential logic cells typically requires computationally intensive fault-injection simulation campaigns. In this paper, a new approach is proposed which uses Machine Learning to estimate the Functional De-Rating of individual flip-flops, thus optimising and enhancing fault injection efforts. First, a set of per-instance features is described and extracted through an analysis approach combining static elements (cell properties, circuit structure, synthesis attributes) and dynamic elements (signal activity). Second, reference data is obtained through first-principles fault simulation approaches. Finally, one part of the reference dataset is used to train the Machine Learning algorithm and the remainder is used to validate and benchmark the accuracy of the trained tool. The intended goal is to obtain a trained model able to provide accurate per-instance Functional De-Rating data for the full list of circuit instances, an objective that is difficult to reach using classical methods. The presented methodology is accompanied by a practical example to determine the performance of various Machine Learning models for different training sizes.
 [175] arXiv:2002.09949 [pdf, other]

Title: Path Outlines: Browsing Path-Based Summaries of Linked Open Datasets
Comments: 13 pages, 9 figures
Subjects: Human-Computer Interaction (cs.HC)
Linked Data (LD) are structured sources of information, such as DBpedia or Geonames, that can be linked together and queried. The information they contain is atomized into triples, each triple being a simple statement composed of a subject, a predicate and an object. Triples can then be combined to form higher-level statements following information needs. This granularity makes it difficult to produce overviews of LD content. We therefore introduce the concept of path-based summaries, which carry a higher level of semantics for data producers. We also introduce the tool Path Outlines to support LD producers in browsing path-based summaries of their datasets. We present its interface based on the broken (out)lines layout algorithm and the path browser visualisation.
Our approach, reifying chains of statements into path outlines, was informed by the observation of LD producers, and we report a characterisation of their needs. We compare Path Outlines with the current baseline technique (Virtuoso SPARQL query editor) in an experiment with 36 participants. We show that participants prefer Path Outlines, find it easier to understand, easier to use and faster, and that it lowers the number of tasks that users give up before completing them.
 [176] arXiv:2002.09951 [pdf]

Title: Multi-Stream Networks and Ground-Truth Generation for Crowd Counting
Journal-ref: The International Journal of Electrical and Computer Engineering Systems 2020
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Crowd scene analysis has received a lot of attention recently due to its wide variety of applications, for instance, forensic science, urban planning, surveillance and security. In this context, a challenging task is crowd counting, whose main purpose is to estimate the number of people present in a single image. A Multi-Stream Convolutional Neural Network is developed and evaluated in this work, which receives an image as input and produces a density map that represents the spatial distribution of people in an end-to-end fashion. In order to address complex crowd counting issues, such as extremely unconstrained scale and perspective changes, the network architecture utilizes receptive fields with different size filters for each stream. In addition, we investigate the influence of the two most common approaches to generating ground truths and propose a hybrid method based on tiny face detection and scale interpolation. Experiments conducted on two challenging datasets, UCF-CC-50 and ShanghaiTech, demonstrate that using our ground truth generation methods achieves superior results.
 [177] arXiv:2002.09956 [pdf, other]

Title: De-randomized PAC-Bayes Margin Bounds: Applications to Non-convex and Non-smooth Predictors
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
In spite of several notable efforts, explaining the generalization of deterministic deep nets, e.g., ReLU-nets, has remained challenging. Existing approaches usually need to bound the Lipschitz constant of such deep nets, but such bounds have been shown to increase substantially with the number of training samples, yielding vacuous generalization bounds [Nagarajan and Kolter, 2019a]. In this paper, we present new de-randomized PAC-Bayes margin bounds for deterministic non-convex and non-smooth predictors, e.g., ReLU-nets. The bounds depend on a trade-off between the $L_2$-norm of the weights and the effective curvature (`flatness') of the predictor, avoid any dependency on the Lipschitz constant, and yield meaningful (decreasing) bounds with increase in training set size. Our analysis first develops a de-randomization argument for non-convex but smooth predictors, e.g., linear deep networks (LDNs). We then consider non-smooth predictors which for any given input realize as a smooth predictor, e.g., ReLU-nets become some LDN for a given input, but the realized smooth predictor can be different for different inputs.
For such non-smooth predictors, we introduce a new PAC-Bayes analysis that maintains distributions over the structure as well as parameters of smooth predictors, e.g., LDNs corresponding to ReLU-nets, which after de-randomization yields a bound for the deterministic non-smooth predictor. We present empirical results to illustrate the efficacy of our bounds over changing training set size and randomness in labels.
 [178] arXiv:2002.09958 [pdf, other]

Title: Gradual Channel Pruning while Training using Feature Relevance Scores for Convolutional Neural Networks
Comments: 10 pages, 6 figures, 5 tables
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
The enormous inference cost of deep neural networks can be scaled down by network compression. Pruning is one of the predominant approaches used for deep network compression. However, existing pruning techniques have one or more of the following limitations: 1) additional energy cost on top of the compute-heavy training stage due to pruning and fine-tuning stages, 2) layer-wise pruning based on the statistics of a particular layer, ignoring the effect of error propagation in the network, 3) lack of an efficient estimate for determining the important channels globally, 4) unstructured pruning, which requires specialized hardware for effective use. To address all the above issues, we present a simple-yet-effective methodology for gradual channel pruning while training, using a novel data-driven metric referred to as the feature relevance score. The proposed technique gets rid of the additional retraining cycles by pruning the least important channels in a structured fashion at fixed intervals during the actual training phase. Feature relevance scores help in efficiently evaluating the contribution of each channel towards the discriminative power of the network. We demonstrate the effectiveness of the proposed methodology on architectures such as VGG and ResNet using datasets such as CIFAR-10, CIFAR-100 and ImageNet, and successfully achieve significant model compression while trading off less than $1\%$ accuracy. Notably, for ResNet-110 trained on the CIFAR-10 dataset, our approach achieves $2.4\times$ compression and a $56\%$ reduction in FLOPs with an accuracy drop of $0.01\%$ compared to the unpruned network.
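The pruning schedule described in this abstract (drop the least important channels at fixed intervals during training) can be sketched as follows. This is only an illustration under stated assumptions: `score_fn`, `prune_step`, `gradual_pruning` and all parameters are hypothetical names, and the scores are supplied by the caller rather than computed as the paper's feature relevance score, whose definition is not given in the abstract.

```python
def prune_step(scores, active, n_prune):
    """Return the active channels minus the n_prune lowest-scoring ones."""
    ranked = sorted(active, key=lambda c: scores[c])
    to_drop = set(ranked[:n_prune])
    return [c for c in active if c not in to_drop]


def gradual_pruning(score_fn, n_channels, total_steps, interval, n_prune, min_channels):
    """Prune channels at fixed intervals while (hypothetical) training runs.

    score_fn(step) is an assumed callback returning one importance score
    per channel; it stands in for the paper's feature relevance score.
    """
    active = list(range(n_channels))
    for step in range(1, total_steps + 1):
        # One ordinary training step on the current network would run here.
        if step % interval == 0 and len(active) > min_channels:
            scores = score_fn(step)
            # Never prune below the configured minimum number of channels.
            k = min(n_prune, len(active) - min_channels)
            active = prune_step(scores, active, k)
    return active


if __name__ == "__main__":
    fixed_scores = [0.9, 0.1, 0.5, 0.3, 0.7, 0.2]
    kept = gradual_pruning(lambda step: fixed_scores, 6, 10, 5, 2, 3)
    print(kept)
```

Because whole channels are removed, the remaining network stays dense, which is the sense in which the pruning is structured.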
 [179] arXiv:2002.09963 [pdf, other]

Title: Mitigating Class Boundary Label Uncertainty to Reduce Both Model Bias and Variance
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
The study of model bias and variance with respect to decision boundaries is critically important in supervised classification. There is generally a trade-off between the two, as fine-tuning of the decision boundary of a classification model to accommodate more boundary training samples (i.e., higher model complexity) may improve training accuracy (i.e., lower bias) but hurt generalization against unseen data (i.e., higher variance). By focusing on just classification boundary fine-tuning and model complexity, it is difficult to reduce both bias and variance. To overcome this dilemma, we take a different perspective and investigate a new approach to handle inaccuracy and uncertainty in the training data labels, which are inevitable in many applications where labels are conceptual and labeling is performed by human annotators. The process of classification can be undermined by uncertainty in the labels of the training data; extending a boundary to accommodate an inaccurately labeled point will increase both bias and variance. Our novel method can reduce both bias and variance by estimating the pointwise label uncertainty of the training set and accordingly adjusting the training sample weights such that those samples with high uncertainty are weighted down and those with low uncertainty are weighted up. In this way, uncertain samples have a smaller contribution to the objective function of the model's learning algorithm and exert less pull on the decision boundary. In a real-world physical activity recognition case study, the data presents many labeling challenges, and we show that this new approach improves model performance and reduces model variance.
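The reweighting idea in this abstract can be illustrated with a minimal sketch. The linear mapping from uncertainty to weight and the function names below are assumptions made for illustration; the abstract does not say how the paper estimates pointwise uncertainty or converts it into weights.

```python
import math


def uncertainty_to_weights(uncertainty):
    """Map per-sample label uncertainty in [0, 1] to weights with mean 1.

    Assumed mapping: weight proportional to (1 - uncertainty).
    """
    raw = [1.0 - u for u in uncertainty]
    mean = sum(raw) / len(raw)
    if mean == 0.0:
        raise ValueError("all samples are fully uncertain")
    return [r / mean for r in raw]


def weighted_log_loss(probs, labels, weights):
    """Weighted binary cross-entropy; probs are predicted P(y = 1)."""
    total = 0.0
    for p, y, w in zip(probs, labels, weights):
        total += -w * (y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / sum(weights)


if __name__ == "__main__":
    weights = uncertainty_to_weights([0.0, 0.5, 0.5, 1.0])
    probs = [0.9, 0.6, 0.4, 0.1]
    labels = [1, 1, 0, 1]  # the last label is treated as fully uncertain
    print(weights, weighted_log_loss(probs, labels, weights))
```

With these weights, the fully uncertain last sample contributes nothing to the objective, so it exerts no pull on the decision boundary.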
 [180] arXiv:2002.09964 [pdf, other]

Title: Quantized Push-sum for Gossip and Decentralized Optimization over Directed Graphs
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Machine Learning (cs.LG); Multiagent Systems (cs.MA); Signal Processing (eess.SP); Systems and Control (eess.SY)
We consider a decentralized stochastic learning problem where data points are distributed among computing nodes communicating over a directed graph. As the model size gets large, decentralized learning faces a major bottleneck: the heavy communication load due to each node transmitting large messages (model updates) to its neighbors. To tackle this bottleneck, we propose a quantized decentralized stochastic learning algorithm over directed graphs that is based on the push-sum algorithm in decentralized consensus optimization. More importantly, we prove that our algorithm achieves the same convergence rates as the decentralized stochastic learning algorithm with exact communication for both convex and non-convex losses. A key technical challenge of the work is to prove exact convergence of the proposed decentralized learning algorithm in the presence of quantization noise with unbounded variance over directed graphs. We provide numerical evaluations that corroborate our main theoretical results and illustrate significant speed-up compared to the exact-communication methods.
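For readers unfamiliar with push-sum, the following is a minimal sketch of the classical, unquantized push-sum averaging protocol on a directed graph. It is not the quantized algorithm proposed in the paper; the function name, the equal-split rule and the example graph are assumptions chosen for illustration.

```python
def push_sum(values, out_neighbors, rounds):
    """Classical push-sum averaging over a directed graph (no quantization).

    values[i] is node i's initial value; out_neighbors[i] lists the nodes
    that node i can send to. Each node splits its mass equally among its
    out-neighbors and itself, so total mass is conserved.
    """
    n = len(values)
    x = list(values)   # running numerators
    w = [1.0] * n      # running push-sum weights
    for _ in range(rounds):
        new_x = [0.0] * n
        new_w = [0.0] * n
        for i in range(n):
            targets = out_neighbors[i] + [i]  # every node keeps one share
            share = 1.0 / len(targets)
            for j in targets:
                new_x[j] += share * x[i]
                new_w[j] += share * w[i]
        x, w = new_x, new_w
    # The ratio x_i / w_i is node i's estimate of the global average.
    return [xi / wi for xi, wi in zip(x, w)]


if __name__ == "__main__":
    # A strongly connected directed graph on four nodes.
    graph = [[1], [2], [0, 3], [0]]
    print(push_sum([1.0, 2.0, 3.0, 6.0], graph, 200))
```

The division by the weight `w` is what corrects for the unequal in-degrees of a directed graph, where plain neighbor averaging would converge to a biased value.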
 [181] arXiv:2002.09971 [pdf, other]

Title: Rapidly Personalizing Mobile Health Treatment Policies with Limited Data
Subjects: Machine Learning (cs.LG); Computers and Society (cs.CY); Machine Learning (stat.ML)
In mobile health (mHealth), reinforcement learning algorithms that adapt to one's context without learning personalized policies might fail to distinguish between the needs of individuals. Yet the high amount of noise due to the in situ delivery of mHealth interventions can cripple the ability of an algorithm to learn when given access to only a single user's data, making personalization challenging. We present IntelligentPooling, which learns personalized policies via an adaptive, principled use of other users' data. We show that IntelligentPooling achieves an average of 26% lower regret than the state of the art across all generative models. Additionally, we inspect the behavior of this approach in a live clinical trial, demonstrating its ability to learn from even a small group of users.
 [182] arXiv:2002.09972 [pdf, other]

Title: Structural Parameterizations with Modulator Oblivion
Subjects: Data Structures and Algorithms (cs.DS)
It is known that problems like Vertex Cover, Feedback Vertex Set and Odd Cycle Transversal are polynomial time solvable in the class of chordal graphs. We consider these problems in a graph that has at most $k$ vertices whose deletion results in a chordal graph, when parameterized by $k$. While this investigation fits naturally into the recent trend of what are called `structural parameterizations', here we assume that the deletion set is not given.
One method to solve them is to compute a $k$-sized or an approximate ($f(k)$-sized, for a function $f$) chordal vertex deletion set and then use the structural properties of the graph to design an algorithm. This method leads to at least $k^{\mathcal{O}(k)}n^{\mathcal{O}(1)}$ running time when we use the known parameterized or approximation algorithms for finding a $k$-sized chordal deletion set on an $n$-vertex graph.
In this work, we design $2^{\mathcal{O}(k)}n^{\mathcal{O}(1)}$ time algorithms for these problems. Our algorithms do not compute a chordal vertex deletion set (or even an approximate solution). Instead, we construct a tree decomposition of the given graph in time $2^{\mathcal{O}(k)}n^{\mathcal{O}(1)}$ where each bag is a union of four cliques and $\mathcal{O}(k)$ vertices. We then apply standard dynamic programming algorithms over this special tree decomposition. This special tree decomposition can be of independent interest.
Our algorithms are adaptive (robust) in the sense that given an integer $k$, they detect whether the graph has a chordal vertex deletion set of size at most $k$ or output the special tree decomposition and solve the problem.
We also show lower bounds for the problems we deal with under the Strong Exponential Time Hypothesis (SETH).
 [183] arXiv:2002.09979 [pdf, other]

Title: Gaussian-Process-based Robot Learning from Demonstration
Comments: 7 pages, 10 figures
Subjects: Robotics (cs.RO)
Endowed with higher levels of autonomy, robots are required to perform increasingly complex manipulation tasks. Learning from demonstration is emerging as a promising paradigm for easily extending robot capabilities so that they adapt to unseen scenarios. We present a novel Gaussian-Process-based approach for learning manipulation skills from observations of a human teacher. This probabilistic representation makes it possible to generalize over multiple demonstrations and to encode uncertainty variability along the different phases of the task. In this paper, we address how Gaussian Processes can be used to effectively learn a policy from trajectories in task space. We also present a method to efficiently adapt the policy to fulfill new requirements, and to modulate the robot behavior as a function of task uncertainty. This approach is illustrated through a real-world application using the TIAGo robot.
 [184] arXiv:2002.09989 [pdf, other]

Title: Deriving a Usage-Independent Software Quality Metric
Subjects: Software Engineering (cs.SE)
Context: The extent of post-release use of software affects the number of faults, thus biasing quality metrics and adversely affecting associated decisions. The proprietary nature of usage data limited deeper exploration of this subject in the past. Objective: To determine how software faults and software use are related and how an accurate quality measure can be designed. Method: New users, usage intensity, usage frequency, exceptions, and release date and duration were measured for complex proprietary mobile applications for Android and iOS. We utilized Bayesian Network and Random Forest models to explain the interrelationships and to derive the usage-independent release quality measure. We investigated the interrelationship among various code complexity measures, usage (downloads), and number of issues for 520 NPM packages, derived a usage-independent quality measure from these analyses, and applied it to 4430 popular NPM packages to construct timelines for comparing the perceived quality (issues) and our derived measure of quality for these packages. Results: We found the number of new users to be the primary factor determining the number of exceptions, and found no direct link between the intensity and frequency of software usage and software faults. Release quality expressed as crashes per user was independent of other usage-related predictors, thus serving as a usage-independent measure of software quality. Usage also affected quality in NPM, where downloads were strongly associated with numbers of issues, even after taking the other code complexity measures into consideration. Conclusions: We expect our result and our proposed quality measure will help gauge the release quality of software more accurately and inspire further research in this area.
 [185] arXiv:2002.10002 [pdf, other]

Title: On Thompson Sampling with Langevin Algorithms
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Thompson sampling is a methodology for multi-armed bandit problems that is known to enjoy favorable performance in both theory and practice. It does, however, have a significant limitation computationally, arising from the need for samples from posterior distributions at every iteration. We propose two Markov Chain Monte Carlo (MCMC) methods tailored to Thompson sampling to address this issue. We construct quickly converging Langevin algorithms to generate approximate samples that have accuracy guarantees, and we leverage novel posterior concentration rates to analyze the regret of the resulting approximate Thompson sampling algorithm. Further, we specify the necessary hyperparameters for the MCMC procedure to guarantee optimal instance-dependent frequentist regret while having low computational complexity. In particular, our algorithms take advantage of both posterior concentration and a sample reuse mechanism to ensure that only a constant number of iterations and a constant amount of data is needed in each round. The resulting approximate Thompson sampling algorithm has logarithmic regret and its computational complexity does not scale with the time horizon of the algorithm.
 [186] arXiv:2002.10003 [pdf, ps, other]

Title: NeurIPS 2019 Disentanglement Challenge: Improved Disentanglement through Aggregated Convolutional Feature Maps
Authors: Maximilian Seitzer
Comments: Disentanglement Challenge, 33rd Conference on Neural Information Processing Systems (NeurIPS), NeurIPS 2019
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
This report on our stage 1 submission to the NeurIPS 2019 disentanglement challenge presents a simple image preprocessing method for training VAEs that leads to improved disentanglement compared to directly using the images. In particular, we propose to use regionally aggregated feature maps extracted from CNNs pretrained on ImageNet. Our method achieved 2nd place in stage 1 of the challenge. Code is available at https://github.com/mseitzer/neurips2019disentanglementchallenge.
 [187] arXiv:2002.10006 [pdf, other]

Title: Comparing the Parameter Complexity of Hypernetworks and the Embedding-Based Alternative
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
In the context of learning to map an input $I$ to a function $h_I:\mathcal{X}\to \mathbb{R}$, we compare two alternative methods: (i) an embedding-based method, which learns a fixed function in which $I$ is encoded as a conditioning signal $e(I)$ and the learned function takes the form $h_I(x) = q(x,e(I))$, and (ii) hypernetworks, in which the weights $\theta_I$ of the function $h_I(x) = g(x;\theta_I)$ are given by a hypernetwork $f$ as $\theta_I=f(I)$.
We extend the theory of \cite{devore} and provide a lower bound on the complexity of neural networks as function approximators, i.e., the number of trainable parameters. This extension eliminates the requirement for the approximation method to be robust. Our results are then used to compare the complexities of $q$ and $g$, showing that under certain conditions and when letting the functions $e$ and $f$ be as large as we wish, $g$ can be smaller than $q$ by orders of magnitude. In addition, we show that for typical assumptions on the function to be approximated, the overall number of trainable parameters in a hypernetwork is smaller by orders of magnitude than the number of trainable parameters of a standard neural network and an embedding method.
 [188] arXiv:2002.10007 [pdf, other]

Title: A Critical View of the Structural Causal Model
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
In the univariate case, we show that by comparing the individual complexities of univariate cause and effect, one can identify the cause and the effect, without considering their interaction at all. In our framework, complexities are captured by the reconstruction error of an autoencoder that operates on the quantiles of the distribution. Comparing the reconstruction errors of the two autoencoders, one for each variable, is shown to perform surprisingly well on the accepted causality directionality benchmarks. Hence, the decision as to which of the two is the cause and which is the effect may not be based on causality but on complexity.
In the multivariate case, where one can ensure that the complexities of the cause and effect are balanced, we propose a new adversarial training method that mimics the disentangled structure of the causal model. We prove that in the multidimensional case, such modeling is likely to fit the data only in the direction of causality. Furthermore, a uniqueness result shows that the learned model is able to identify the underlying causal and residual (noise) components. Our multidimensional method outperforms the literature methods on both synthetic and real-world datasets.
 [189] arXiv:2002.10009 [pdf, other]

Title: Fighting Fire with Light: A Case for Defending DDoS Attacks Using the Optical Layer
Comments: 6 pages, 4 figures
Subjects: Cryptography and Security (cs.CR)
The DDoS attack landscape is growing at an unprecedented pace. Inspired by the recent advances in optical networking, we make a case for optical layer-aware DDoS defense (OLAD) in this paper. Our approach leverages the optical layer to isolate attack traffic rapidly via dynamic reconfiguration of (backup) wavelengths using ROADMs, bridging the gap between (a) evolution of the DDoS attack landscape and (b) innovations in the optical layer (e.g., reconfigurable optics). We show that the physical separation of traffic profiles allows finer-grained handling of suspicious flows and offers better performance for benign traffic in the face of an attack. We present preliminary results modeling throughput and latency for legitimate flows while scaling the strength of attacks. We also identify a number of open problems for the security, optical, and systems communities: modeling diverse DDoS attacks (e.g., fixed vs. variable rate, detectable vs. undetectable), building a full-fledged defense system with optical advancements (e.g., OpenConfig), and optical layer-aware defenses for a broader class of attacks (e.g., network reconnaissance).
 [190] arXiv:2002.10010 [pdf, other]

Title: Driving with Data in the Motor City: Mining and Modeling Vehicle Fleet Maintenance Data
Authors: Josh Gardner, Jawad Mroueh, Natalia Jenuwine, Noah Weaverdyck, Samuel Krassenstein, Arya Farahi, Danai Koutra
Subjects: Computers and Society (cs.CY)
The City of Detroit maintains an active fleet of over 2500 vehicles, spending an annual average of over \$5 million on purchases and over \$7.7 million on maintenance. Modeling patterns and trends in this data is of particular importance to a variety of stakeholders, particularly as Detroit emerges from Chapter 9 bankruptcy, but the structure in such data is complex, and the city lacks dedicated resources for in-depth analysis. The City of Detroit's Operations and Infrastructure Group and the University of Michigan initiated a collaboration which seeks to address this unmet need by analyzing data from the City of Detroit's vehicle fleet. This work presents a case study and provides the first data-driven benchmark, demonstrating a suite of methods to aid in data understanding and prediction for large vehicle maintenance datasets. We present analyses to address three key questions raised by the stakeholders, related to discovering multivariate maintenance patterns over time; predicting maintenance; and predicting vehicle- and fleet-level costs. We present a novel algorithm, PRISM, for automating multivariate sequential data analyses using tensor decomposition. This work is a first of its kind that presents both methodologies and insights to guide future civic data research.
 [191] arXiv:2002.10011 [pdf, other]

Title: Geometric Algebra Power Theory (GAPoT): Revisiting Apparent Power under Non-Sinusoidal Conditions
Authors: Francisco Gil Montoya, Alfredo Alcayde, Francisco Arrabal-Campos, Raul Baños, Javier Roldán-Pérez
Subjects: Systems and Control (eess.SY)
Traditional power theories and one of their most important concepts, apparent power, are still a source of debate and, as shown in the literature, they present several flaws that misinterpret the power-transfer phenomena under distorted grid conditions. In recent years, advanced mathematical tools such as geometric algebra (GA) have been applied to address these issues. However, the application of GA to electrical circuits requires more consensus, improvements and refinement. In this paper, power theories based on GA are revisited. Several drawbacks and inconsistencies of previous works are identified and modifications to the so-called geometric algebra power theory (GAPoT) are presented. This theory takes into account power components generated by cross-products between current and voltage harmonics in the frequency domain. Compared to other theories based on GA, it is compatible with the traditional definition of apparent power calculated as the product of RMS voltage and current. Also, mathematical developments are done in a multi-dimensional Euclidean space where the energy conservation principle is satisfied. The paper includes a basic example and experimental results in which measurements from a utility supply are analysed. Finally, suggestions for the extension to three-phase systems are drawn.
 [192] arXiv:2002.10016 [pdf, other]

Title: Deep Multimodal Image-Text Embeddings for Automatic Cross-Media Retrieval
Authors: Hadi Abdi Khojasteh (1), Ebrahim Ansari (1 and 2), Parvin Razzaghi (1 and 3), Akbar Karimi (4) ((1) Institute for Advanced Studies in Basic Sciences (IASBS), Zanjan, Iran, (2) Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics, Charles University, Czechia, (3) Institute for Research in Fundamental Sciences (IPM), Tehran, Iran, (4) IMP Lab, Department of Engineering and Architecture, University of Parma, Parma, Italy)
Comments: 6 pages and 2 figures
Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
This paper considers the task of matching images and sentences by learning a visual-textual embedding space for cross-modal retrieval. Finding such a space is a challenging task since the features and representations of text and image are not comparable. In this work, we introduce an end-to-end deep multimodal convolutional-recurrent network for learning both vision and language representations simultaneously to infer image-text similarity. The model learns which pairs are a match (positive) and which ones are a mismatch (negative) using a hinge-based triplet ranking. To learn the joint representations, we leverage our newly extracted collection of tweets from Twitter. The main characteristic of our dataset is that the images and tweets are not standardized in the same way as the benchmarks. Furthermore, there can be a higher semantic correlation between the pictures and tweets, in contrast to benchmarks in which the descriptions are well-organized. Experimental results on the MS-COCO benchmark dataset show that our model outperforms certain methods presented previously and has competitive performance compared to the state of the art. The code and dataset have been made publicly available.
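A hinge-based triplet ranking objective of the kind mentioned in this abstract can be sketched as below. This is the common sum-over-negatives form used for image-sentence embeddings, assumed here for illustration; the margin value, the use of cosine similarity and the function names are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def triplet_ranking_loss(image_vecs, text_vecs, margin=0.2):
    """Bidirectional hinge loss; image_vecs[i] and text_vecs[i] are a matching pair."""
    n = len(image_vecs)
    sim = [[cosine(image_vecs[i], text_vecs[j]) for j in range(n)] for i in range(n)]
    loss = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # Image i ranked against a non-matching sentence j.
            loss += max(0.0, margin - sim[i][i] + sim[i][j])
            # Sentence i ranked against a non-matching image j.
            loss += max(0.0, margin - sim[i][i] + sim[j][i])
    return loss


if __name__ == "__main__":
    images = [[1.0, 0.0], [0.0, 1.0]]
    print(triplet_ranking_loss(images, [[1.0, 0.0], [0.0, 1.0]]))  # aligned pairs
    print(triplet_ranking_loss(images, [[0.0, 1.0], [1.0, 0.0]]))  # swapped pairs
```

The loss is zero once every matching pair beats every mismatching pair by at least the margin, so training effort concentrates on pairs that are still confused.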
 [193] arXiv:2002.10018 [pdf, ps, other]

Title: Distributed Quantum Proofs for Replicated Data
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Quantum Physics (quant-ph)
The paper tackles the issue of checking that all copies of a large data set replicated at several nodes of a network are identical. The fact that the replicas may be located at distant nodes prevents the system from verifying their equality locally, i.e., by having each node consult only nodes in its vicinity. On the other hand, it remains possible to assign certificates to the nodes, so that verifying the consistency of the replicas can be achieved locally. However, we show that, as the data set is large, classical certification mechanisms, including distributed Merlin-Arthur protocols, cannot guarantee good completeness and soundness simultaneously, unless they use very large certificates. The main result of this paper is a distributed quantum Merlin-Arthur protocol enabling the nodes to collectively check the consistency of the replicas, based on small certificates, and in a single round of message exchange between neighbors, with short messages. In particular, the certificate size is logarithmic in the size of the data set, which gives an exponential advantage over classical certification mechanisms.
 [194] arXiv:2002.10021 [pdf, other]

Title: How Transferable are the Representations Learned by Deep Q Agents?
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
In this paper, we consider the source of Deep Reinforcement Learning (DRL)'s sample complexity, asking how much derives from the requirement of learning useful representations of environment states and how much is due to the sample complexity of learning a policy. While for DRL agents the distinction between representation and policy may not be clear, we seek new insight through a set of transfer learning experiments. In each experiment, we retain some fraction of layers trained on either the same game or a related game, comparing the benefits of transfer learning to learning a policy from scratch. Interestingly, we find that benefits due to transfer are highly variable in general and non-symmetric across pairs of tasks. Our experiments suggest that transfer from simpler environments may boost performance on more complex downstream tasks and that the requirements of learning a useful representation can range from negligible to the majority of the sample complexity, depending on the environment. Furthermore, we find that fine-tuning generally outperforms training with the transferred layers frozen, confirming an insight first noted in the classification setting.
 [195] arXiv:2002.10025 [pdf, other]

Title: Triple Wins: Boosting Accuracy, Robustness and Efficiency Together by Enabling Input-Adaptive Inference
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Deep networks were recently suggested to face a trade-off between accuracy (on clean natural images) and robustness (on adversarially perturbed images) (Tsipras et al., 2019). Such a dilemma is shown to be rooted in the inherently higher sample complexity (Schmidt et al., 2018) and/or model capacity (Nakkiran, 2019) required for learning a high-accuracy and robust classifier. In view of that, given a classification task, growing the model capacity appears to help draw a win-win between accuracy and robustness, yet at the expense of model size and latency, therefore posing challenges for resource-constrained applications. Is it possible to co-design model accuracy, robustness and efficiency to achieve their triple wins? This paper studies multi-exit networks associated with input-adaptive efficient inference, showing their strong promise in achieving a "sweet point" in co-optimizing model accuracy, robustness and efficiency. Our proposed solution, dubbed Robust Dynamic Inference Networks (RDI-Nets), allows each input (either clean or adversarial) to adaptively choose one of the multiple output layers (early branches or the final one) to output its prediction. That multi-loss adaptivity adds new variations and flexibility to adversarial attacks and defenses, on which we present a systematic investigation. We show experimentally that by equipping existing backbones with such robust adaptive inference, the resulting RDI-Nets can achieve better accuracy and robustness, yet with over 30% computational savings, compared to the defended original models.
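Input-adaptive inference in a multi-exit network can be illustrated with a small sketch. The exit rule below (stop at the first exit whose top softmax probability reaches a threshold) is a common choice assumed here for illustration; the abstract does not state which rule RDI-Nets use. The names `adaptive_predict` and `threshold` are hypothetical, and the per-exit logits are passed in precomputed for simplicity, whereas a real network would evaluate later exits only when needed.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_predict(exit_logits, threshold=0.9):
    """Return (predicted_class, exit_index) for one input.

    exit_logits holds one logit vector per exit, ordered from the earliest
    branch to the final layer. The final exit always answers.
    """
    last = len(exit_logits) - 1
    for idx, logits in enumerate(exit_logits):
        probs = softmax(logits)
        confidence = max(probs)
        if confidence >= threshold or idx == last:
            return probs.index(confidence), idx


if __name__ == "__main__":
    easy_input = [[0.1, 0.2, 0.0], [5.0, 0.0, 0.0], [0.0, 9.0, 0.0]]
    hard_input = [[0.1, 0.2, 0.0], [0.0, 0.3, 0.0]]
    print(adaptive_predict(easy_input))
    print(adaptive_predict(hard_input))
```

Inputs that leave at an early exit skip the remaining layers, which is where the computational savings of such a scheme come from.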
 [196] arXiv:2002.10029 [pdf, other]

Title: Symbolic Querying of Vector Spaces: Probabilistic Databases Meets Relational Embeddings
Subjects: Artificial Intelligence (cs.AI); Databases (cs.DB)
To deal with increasing amounts of uncertainty and incompleteness in relational data, we propose unifying techniques from probabilistic databases and relational embedding models. We use probabilistic databases as our formalism to define the probabilistic model with respect to which all queries are done. This allows us to leverage the rich literature of theory and algorithms from probabilistic databases for solving problems. While this formalization can be used with any relational embedding model, the lack of a well defined joint probability distribution causes simple problems to become provably hard. With this in mind, we introduce \TO, a relational embedding model designed in terms of probabilistic databases to exploit typical embedding assumptions within the probabilistic framework. Using principled, efficient inference algorithms that can be derived from its definition, we empirically demonstrate that \TOs is an effective and general model for these tasks.
 [197] arXiv:2002.10033 [pdf]

Title: Citations Systematically Misrepresent the Quality and Impact of Research Articles: Survey and Experimental Evidence from Thousands of Citers
Subjects: Social and Information Networks (cs.SI)
Citations are ubiquitous in evaluating research, but how exactly they relate to what they are thought to measure (quality and intellectual impact) is unclear. We investigate the relationships between citations, quality, and impact using a survey with an embedded experiment in which 12,670 authors in 15 academic fields describe about 25K specific referencing decisions. Results suggest that citation counts, when equated with quality and impact, are biased in opposite directions. First, experimentally exposing papers' actual citation counts during the survey causes respondents to perceive all but the top 10% cited papers as of lower quality. Because perceptions of quality are a key factor in citing decisions, citation counts are likely to endogenously cause more citing of top papers and equating them with quality overestimates the actual quality of those papers. Conversely, 54% of references had either zero or minor influence on authors who cite them, but references to highly cited papers were about 200% more likely to denote substantial impact. Equating citations with impact thus underestimates the impact of highly cited papers. Real citation practices thus reveal that citations are biased measures of quality and impact.
 [198] arXiv:2002.10035 [pdf, ps, other]

Title: A Note on Echelon-Ferrers Construction
Subjects: Information Theory (cs.IT)
The echelon-Ferrers construction is one of the important techniques for improving lower bounds for constant-dimension codes. Fagang Li [6] combined the linkage construction and echelon-Ferrers to obtain some new lower bounds for constant-dimension codes. However, this method seems incorrect, since we found a counterexample.
 [199] arXiv:2002.10039 [pdf, other]

Title: Computing Bi-Lipschitz Outlier Embeddings into the Line
Subjects: Data Structures and Algorithms (cs.DS)
The problem of computing a bi-Lipschitz embedding of a graphical metric into the line with minimum distortion has received a lot of attention. The best-known approximation algorithm computes an embedding with distortion $O(c^2)$, where $c$ denotes the optimal distortion [Bădoiu et al. 2005]. We present a bi-criteria approximation algorithm that extends the above results to the setting of outliers.
Specifically, we say that a metric space $(X,\rho)$ admits a $(k,c)$-embedding if there exists $K\subset X$, with $|K|=k$, such that $(X\setminus K, \rho)$ admits an embedding into the line with distortion at most $c$. Given $k\geq 0$, and a metric space that admits a $(k,c)$-embedding, for some $c\geq 1$, our algorithm computes a $(\mathsf{poly}(k, c, \log n), \mathsf{poly}(c))$-embedding in polynomial time. This is the first algorithmic result for outlier bi-Lipschitz embeddings. Prior to our work, comparable outlier embeddings were known only for the case of additive distortion.
 [200] arXiv:2002.10043 [pdf, other]

Title: Complete Dictionary Learning via $\ell_p$-norm Maximization
Subjects: Machine Learning (cs.LG); Information Theory (cs.IT); Signal Processing (eess.SP); Machine Learning (stat.ML)
Dictionary learning is a classic representation learning method that has been widely applied in signal processing and data analytics. In this paper, we investigate a family of $\ell_p$-norm ($p>2,p \in \mathbb{N}$) maximization approaches for the complete dictionary learning problem from theoretical and algorithmic aspects. Specifically, we prove that the global maximizers of these formulations are very close to the true dictionary with high probability, even when Gaussian noise is present. Based on the generalized power method (GPM), an efficient algorithm is then developed for the $\ell_p$-based formulations. We further show the efficacy of the developed algorithm: for the population GPM algorithm over the sphere constraint, it first quickly enters the neighborhood of a global maximizer, and then converges linearly in this region. Extensive experiments will demonstrate that the $\ell_p$-based approaches enjoy a higher computational efficiency and better robustness than conventional approaches and $p=3$ performs the best.
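The generalized power method mentioned in this abstract can be illustrated with a small sketch. Everything below is an assumption made for illustration and is not taken from the paper: the dictionary is taken to be orthogonal, the objective is $\|AY\|_p^p$ maximized over orthogonal $A$, each step replaces $A$ by the polar factor of the gradient, and the helper names (`lp_objective`, `gpm_step`, `run_gpm`) and the synthetic data are hypothetical.

```python
import numpy as np

def lp_objective(A, Y, p=3):
    """Return ||A Y||_p^p, the quantity being maximized."""
    return float(np.sum(np.abs(A @ Y) ** p))

def gpm_step(A, Y, p=3):
    """One power-method step: take the gradient, then map it back to the orthogonal group."""
    Z = A @ Y
    # Gradient of ||A Y||_p^p with respect to A, up to the constant factor p.
    grad = (np.abs(Z) ** (p - 1) * np.sign(Z)) @ Y.T
    U, _, Vt = np.linalg.svd(grad)
    return U @ Vt  # polar factor: the orthogonal matrix best aligned with grad

def run_gpm(Y, p=3, iters=50, seed=0):
    """Run the iteration from a random orthogonal start and record the objective."""
    rng = np.random.default_rng(seed)
    n = Y.shape[0]
    A, _ = np.linalg.qr(rng.standard_normal((n, n)))
    history = [lp_objective(A, Y, p)]
    for _ in range(iters):
        A = gpm_step(A, Y, p)
        history.append(lp_objective(A, Y, p))
    return A, history

# Synthetic data (assumed model): Y = D_true @ X with a sparse coefficient matrix X.
rng = np.random.default_rng(1)
n, m = 8, 2000
D_true, _ = np.linalg.qr(rng.standard_normal((n, n)))
X = rng.standard_normal((n, m)) * (rng.random((n, m)) < 0.2)
Y = D_true @ X

A_hat, history = run_gpm(Y, p=3)
```

Because the objective is convex in $A$ and the polar factor maximizes the linearized objective over orthogonal matrices, the recorded values in `history` should never decrease; whether `A_hat` matches the true dictionary up to sign and permutation depends on the data model and is not checked here.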
 [201] arXiv:2002.10045 [pdf, ps, other]

Title: Optimal Advertising for Information Products
Subjects: Computer Science and Game Theory (cs.GT); Theoretical Economics (econ.TH)
When selling information, sometimes the seller can increase the revenue by giving away some partial information to change the buyer's belief about the information product, so the buyer may be more willing to purchase. This work studies the general problem of advertising information products by revealing some partial information. We consider a buyer who needs to make a decision, the outcome of which depends on the state of the world that is unknown to the buyer. There is an information seller who has access to information about the state of the world. The seller can advertise the information by revealing some partial information. We consider a seller who chooses an advertising strategy and then commits to it. The buyer decides whether to purchase the full information product after seeing the partial information. The seller's goal is to maximize the expected revenue. We prove that finding the optimal advertising strategy is hard, even in the simple case that the buyer type is known. Nevertheless, we show that when the buyer type is known, the problem is equivalent to finding the concave closure of a function. Based on this observation, we prove some properties of the optimal mechanism, which allow us to solve the optimal mechanism by a convex program (with exponential size in general, polynomial size for special cases). We also prove some interesting characterizations of the optimal mechanisms based on these properties. For the general problem when the seller only knows the type distribution of the buyer, it is NP-hard to find a constant factor approximation. We thus look at special cases and provide an approximation algorithm that finds an $\varepsilon$-suboptimal mechanism when it is not too hard to predict the possible type of buyer who will make the purchase.
 [202] arXiv:2002.10047 [pdf, other]

Title: Parallel Clique Counting and Peeling Algorithms
Subjects: Data Structures and Algorithms (cs.DS); Distributed, Parallel, and Cluster Computing (cs.DC)
Dense subgraphs capture strong communities in social networks and entities possessing strong interactions in biological networks. In particular, $k$-clique counting and listing have applications in identifying important actors in a graph. However, finding $k$-cliques is computationally expensive, and thus it is important to have fast parallel algorithms.
We present a new parallel algorithm for $k$-clique counting that has polylogarithmic span and is work-efficient with respect to the well-known sequential algorithm for $k$-clique listing by Chiba and Nishizeki. Our algorithm can be extended to support listing and enumeration, and is based on computing low out-degree orientations. We present a new linear-work and polylogarithmic span algorithm for computing such orientations, and new parallel algorithms for producing unbiased estimations of clique counts. Finally, we design new parallel work-efficient algorithms for approximating the $k$-clique densest subgraph. Our first algorithm gives a $1/k$-approximation and is based on iteratively peeling vertices with the lowest clique counts; our algorithm is work-efficient, but we prove that this process is P-complete and hence does not have polylogarithmic span. Our second algorithm gives a $1/(k(1+\epsilon))$-approximation, is work-efficient, and has polylogarithmic span.
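The idea of counting cliques over an out-degree orientation can be sketched sequentially. This is only an illustration under stated assumptions, not the parallel algorithm of the paper: the orientation here simply ranks vertices by (degree, id), which is one simple acyclic orientation and not necessarily the low out-degree orientation the authors compute, the graph is assumed simple (no self-loops or duplicate edges), and the function names are hypothetical.

```python
from itertools import combinations

def orient_by_degree(n, edges):
    """Direct each undirected edge from the lower to the higher (degree, id) rank."""
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    out = [set() for _ in range(n)]
    for u, v in edges:
        if (deg[u], u) < (deg[v], v):
            out[u].add(v)
        else:
            out[v].add(u)
    return out

def count_k_cliques(n, edges, k):
    """Count k-cliques by repeatedly intersecting out-neighborhoods."""
    out = orient_by_degree(n, edges)

    def extend(candidates, remaining):
        # Every vertex in `candidates` is an out-neighbor of all vertices chosen so far.
        if remaining == 0:
            return 1
        return sum(extend(candidates & out[v], remaining - 1) for v in candidates)

    return extend(set(range(n)), k)

# Small checks on complete graphs, where the counts are binomial coefficients.
k4_edges = list(combinations(range(4), 2))
k5_edges = list(combinations(range(5), 2))
triangles_in_k4 = count_k_cliques(4, k4_edges, 3)
triangles_in_k5 = count_k_cliques(5, k5_edges, 3)
```

Since the orientation is acyclic, each clique is discovered exactly once, starting from its lowest-ranked vertex. The independent recursive calls are the natural place where a parallel implementation would branch.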
In addition, we implement these algorithms and propose optimizations. On a 60-core machine, we achieve 13.23-38.99x and 1.19-13.76x self-relative parallel speedup for $k$-clique counting and $k$-clique densest subgraph, respectively. Compared to the state-of-the-art parallel $k$-clique counting algorithms, we achieve a 1.31-9.88x speedup, and compared to existing implementations of $k$-clique densest subgraph, we achieve a 1.01-11.83x speedup. We are able to compute the $4$-clique counts on the largest publicly-available graph with over two hundred billion edges.
 [203] arXiv:2002.10055 [pdf, other]

Title: Ensuring Privacy in Location-Based Services: A Model-based Approach
Subjects: Cryptography and Security (cs.CR)
In recent years, the widespread adoption of mobile devices equipped with GPS and communication chips has led to the growing use of location-based services (LBS), in which a user receives a service based on his current location. The disclosure of a user's location, however, can raise serious concerns about user privacy in general, and location privacy in particular, which has led to the development of various location privacy-preserving mechanisms (LPPMs) that aim to enhance location privacy while using LBS applications. In this paper, we propose to model the user mobility pattern and the utility of the LBS as a Markov decision process (MDP), and, inspired by the notion of probabilistic current-state opacity, we introduce a new location privacy metric, namely $\epsilon$-privacy, that quantifies the adversary's belief over the user's current location. We exploit this dynamic model to design an LPPM that ensures the utility of the service is fully preserved while guaranteeing, independently of the adversary's prior knowledge about the user, that a user-specified privacy level can be achieved over an infinite time horizon. The overall privacy-preserving framework, including the construction of the user mobility model as an MDP and the design of the proposed LPPM, is demonstrated and validated with real-world experimental data.
 [204] arXiv:2002.10059 [pdf, other]

Title: Cooperative Adaptive Learning Control for A Group of Nonholonomic UGVs by Output Feedback
Subjects: Systems and Control (eess.SY)
A high-gain observer-based cooperative deterministic learning (CDL) control algorithm is proposed in this chapter for a group of identical unicycle-type unmanned ground vehicles (UGVs) to track desired reference trajectories. For the vehicle states, the positions of the vehicles can be measured, while the velocities are estimated using the high-gain observer. For the trajectory tracking controller, a radial basis function (RBF) neural network (NN) is used to estimate the unknown dynamics of the vehicle online, and the NN weight convergence and estimation accuracy are guaranteed by CDL. The major challenge and novelty of this chapter is to track the reference trajectory using this observer-based CDL algorithm without full knowledge of the vehicle state and vehicle model. In addition, any vehicle in the system is able to learn the knowledge of unmodeled dynamics along the union of trajectories experienced by all vehicle agents, such that the learned knowledge can be reused to follow any reference trajectory defined in the learning phase. The learning-based tracking convergence and consensus learning results, as well as the use of learned knowledge for tracking experienced trajectories, are shown using the Lyapunov method. Simulation is given to show the effectiveness of this algorithm.
 [205] arXiv:2002.10061 [pdf, other]

Title: Rethinking 1D-CNN for Time Series Classification: A Stronger Baseline
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
For time series classification with a 1D-CNN, the selection of kernel size is critically important to ensure the model can capture salient signals at the right scale from a long time series. Most of the existing work on 1D-CNNs treats the kernel size as a hyper-parameter and tries to find the proper kernel size through a grid search, which is time-consuming and inefficient. This paper theoretically analyses how kernel size impacts the performance of 1D-CNNs. Considering the importance of kernel size, we propose a novel Omni-Scale 1D-CNN (OS-CNN) architecture to capture the proper kernel size during the model learning period. A specific design for kernel size configuration is developed which enables us to assemble very few kernel-size options to represent more receptive fields. The proposed OS-CNN method is evaluated using the UCR archive with 85 datasets. The experiment results demonstrate that our method is a stronger baseline in multiple performance indicators, including the critical difference diagram, counts of wins, and average accuracy. We also published the experimental source code on GitHub (https://github.com/WensiTang/OSCNN/).
 [206] arXiv:2002.10064 [pdf, other]

Title: Exploring the Connection Between Binary and Spiking Neural Networks
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
On-chip edge intelligence has necessitated the exploration of algorithmic techniques to reduce the compute requirements of current machine learning frameworks. This work aims to bridge the recent algorithmic progress in training Binary Neural Networks and Spiking Neural Networks, both of which are driven by the same motivation and yet synergies between the two have not been fully explored. We show that training Spiking Neural Networks in the extreme quantization regime results in near full precision accuracies on large-scale datasets like CIFAR-100 and ImageNet. An important implication of this work is that Binary Spiking Neural Networks can be enabled by "In-Memory" hardware accelerators catered for Binary Neural Networks without suffering any accuracy degradation due to binarization. We utilize standard training techniques for non-spiking networks to generate our spiking networks by conversion process and also perform an extensive empirical analysis and explore simple design-time and run-time optimization techniques for reducing inference latency of spiking networks (both for binary and full-precision models) by an order of magnitude over prior work.
 [207] arXiv:2002.10066 [pdf, other]

Title: Learning From Strategic Agents: Accuracy, Improvement, and Causality
Comments: 18 pages
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
In many predictive decision-making scenarios, such as credit scoring and academic testing, a decision-maker must construct a model (predicting some outcome) that accounts for agents' incentives to "game" their features in order to receive better decisions. Whereas the strategic classification literature generally assumes that agents' outcomes are not causally dependent on their features (and thus strategic behavior is a form of lying), we join concurrent work in modeling agents' outcomes as a function of their changeable attributes. Our formulation is the first to incorporate a crucial phenomenon: when agents act to change observable features, they may as a side effect perturb hidden features that causally affect their true outcomes.
We consider three distinct desiderata for a decision-maker's model: accurately predicting agents' post-gaming outcomes (accuracy), incentivizing agents to improve these outcomes (improvement), and, in the linear setting, estimating the visible coefficients of the true causal model (causal precision). As our main contribution, we provide the first algorithms for learning accuracy-optimizing, improvement-optimizing, and causal-precision-optimizing linear regression models directly from data, without prior knowledge of agents' possible actions. These algorithms circumvent the hardness result of Miller et al. (2019) by allowing the decision-maker to observe agents' responses to a sequence of decision rules, in effect inducing agents to perform causal interventions for free.
 [208] arXiv:2002.10069 [pdf, other]

Title: Robust Learning-Based Control via Bootstrapped Multiplicative Noise
Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY); Dynamical Systems (math.DS); Optimization and Control (math.OC); Machine Learning (stat.ML)
Despite decades of research and recent progress in adaptive control and reinforcement learning, there remains a fundamental lack of understanding in designing controllers that provide robustness to inherent non-asymptotic uncertainties arising from models estimated with finite, noisy data. We propose a robust adaptive control algorithm that explicitly incorporates such non-asymptotic uncertainties into the control design. The algorithm has three components: (1) a least-squares nominal model estimator; (2) a bootstrap resampling method that quantifies non-asymptotic variance of the nominal model estimate; and (3) a non-conventional robust control design method using an optimal linear quadratic regulator (LQR) with multiplicative noise. A key advantage of the proposed approach is that the system identification and robust control design procedures both use stochastic uncertainty representations, so that the actual inherent statistical estimation uncertainty directly aligns with the uncertainty the robust controller is being designed against. We show through numerical experiments that the proposed robust adaptive controller can significantly outperform the certainty equivalent controller on both expected regret and measures of regret risk.
 [209] arXiv:2002.10070 [pdf, other]

Title: An Overlapping Domain Decomposition Framework without Dual Formulation for Variational Imaging Problems
Authors: Jongho Park
Comments: 24 pages, 7 figures
Subjects: Numerical Analysis (math.NA)
In this paper, we propose a novel overlapping domain decomposition method that can be applied to various problems in variational imaging such as total variation minimization. Most recent domain decomposition methods for total variation minimization adopt the Fenchel-Rockafellar duality, whereas the proposed method is based on the primal formulation. Thus, the proposed method can be applied not only to total variation minimization but also to those with complex dual problems such as higher order models. In the proposed method, an equivalent formulation of the model problem with parallel structure is constructed using a custom overlapping domain decomposition scheme with the notion of essential domains. As a solver for the constructed formulation, we propose a decoupled augmented Lagrangian method for untying the coupling of adjacent subdomains. Convergence analysis of the decoupled augmented Lagrangian method is provided. We present implementation details and numerical examples for various model problems including total variation minimizations and higher order models.
 [210] arXiv:2002.10072 [pdf, other]

Title: Reconfigurable Intelligent Surface Assisted Multiuser MISO Systems Exploiting Deep Reinforcement Learning
Comments: 12 pages. Accepted by IEEE JSAC special issue on Multiple Antenna Technologies for Beyond 5G
Subjects: Information Theory (cs.IT); Machine Learning (cs.LG)
Recently, the reconfigurable intelligent surface (RIS), which has benefited from the breakthrough in the fabrication of programmable meta-material, has been speculated to be one of the key enabling technologies for future sixth-generation (6G) wireless communication systems, scaled up beyond massive multiple input multiple output (Massive-MIMO) technology to achieve smart radio environments. Employed as reflecting arrays, RIS is able to assist MIMO transmissions without the need for radio frequency chains, resulting in a considerable reduction in power consumption. In this paper, we investigate the joint design of the transmit beamforming matrix at the base station and the phase shift matrix at the RIS, by leveraging recent advances in deep reinforcement learning (DRL). We first develop a DRL-based algorithm, in which the joint design is obtained through trial-and-error interactions with the environment by observing predefined rewards, in the context of continuous state and action. Unlike most reported works, which use alternating optimization techniques to obtain the transmit beamforming and phase shifts alternately, the proposed DRL-based algorithm obtains the joint design simultaneously as the output of the DRL neural network. Simulation results show that the proposed algorithm is not only able to learn from the environment and gradually improve its behavior, but also achieves performance comparable to two state-of-the-art benchmarks. It is also observed that appropriate neural network parameter settings significantly improve the performance and convergence rate of the proposed algorithm.
 [211] arXiv:2002.10077 [pdf, other]

Title: Approximate Data Deletion from Machine Learning Models: Algorithms and Evaluations
Comments: 20 pages, 2 figures, under review by ICML
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Deleting data from a trained machine learning (ML) model is a critical task in many applications. For example, we may want to remove the influence of training points that might be out of date or outliers. Regulations such as EU's General Data Protection Regulation also stipulate that individuals can request to have their data deleted. The naive approach to data deletion is to retrain the ML model on the remaining data, but this is too time consuming. Moreover, there is no known efficient algorithm that exactly deletes data from most ML models. In this work, we evaluate several approaches for approximate data deletion from trained models. For the case of linear regression, we propose a new method with linear dependence on the feature dimension $d$, a significant gain over all existing methods which all have superlinear time dependence on the dimension. We also provide a new test for evaluating data deletion from linear models.
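To make the contrast with retraining concrete, the sketch below shows a classical exact deletion update for ridge-regularized linear regression based on the Sherman-Morrison formula. It is not the method proposed in the paper: it costs on the order of $d^2$ per deleted point, so it is an example of the superlinear dependence on dimension that the abstract refers to. The ridge parameter `lam`, the stored state `(H_inv, b)`, and the function names are assumptions made for illustration.

```python
import numpy as np

def fit(X, y, lam):
    """Return the state (inverse of X^T X + lam*I, and X^T y) of a ridge regression fit."""
    d = X.shape[1]
    H_inv = np.linalg.inv(X.T @ X + lam * np.eye(d))
    b = X.T @ y
    return H_inv, b

def delete_point(H_inv, b, x, y_val):
    """Remove one training pair (x, y_val) without touching the remaining data."""
    Hx = H_inv @ x
    # Sherman-Morrison downdate for (H - x x^T)^{-1}; H_inv is symmetric.
    H_inv_new = H_inv + np.outer(Hx, Hx) / (1.0 - x @ Hx)
    b_new = b - y_val * x
    return H_inv_new, b_new

# Synthetic check: the updated weights should match a full retrain on the remaining data.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
y = X @ rng.standard_normal(5) + 0.1 * rng.standard_normal(200)
lam = 1e-3

H_inv, b = fit(X, y, lam)
H_inv_del, b_del = delete_point(H_inv, b, X[0], y[0])
w_updated = H_inv_del @ b_del

H_inv_re, b_re = fit(X[1:], y[1:], lam)
w_retrained = H_inv_re @ b_re
```

The update touches only the deleted point and the stored $d \times d$ state, which is why its cost is independent of the number of training points but quadratic in $d$.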
 [212] arXiv:2002.10078 [pdf, other]

Title: TrojanNet: Embedding Hidden Trojan Horse Models in Neural Networks
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
The complexity of large-scale neural networks can lead to poor understanding of their internal details. We show that this opaqueness provides an opportunity for adversaries to embed unintended functionalities into the network in the form of Trojan horses. Our novel framework hides the existence of a Trojan network with arbitrary desired functionality within a benign transport network. We prove theoretically that the Trojan network's detection is computationally infeasible and demonstrate empirically that the transport network does not compromise its disguise. Our paper exposes an important, previously unknown loophole that could potentially undermine the security and trustworthiness of machine learning.
 [213] arXiv:2002.10080 [pdf, other]

Title: Sparse Optimization for Green Edge AI Inference
Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible
Subjects: Information Theory (cs.IT); Machine Learning (cs.LG); Signal Processing (eess.SP)
With the rapid upsurge of deep learning tasks at the network edge, effective edge artificial intelligence (AI) inference becomes critical to provide low-latency intelligent services for mobile users by leveraging the edge computing capability. In such scenarios, energy efficiency becomes a primary concern. In this paper, we present a joint inference task selection and downlink beamforming strategy to achieve energy-efficient edge AI inference through minimizing the overall power consumption consisting of both computation and transmission power consumption, yielding a mixed combinatorial optimization problem. By exploiting the inherent connections between the set of task selection and group sparsity structural transmit beamforming vector, we reformulate the optimization as a group sparse beamforming problem. To solve this challenging problem, we propose a log-sum function based three-stage approach. By adopting the log-sum function to enhance the group sparsity, a proximal iteratively reweighted algorithm is developed. Furthermore, we establish the global convergence analysis and provide the ergodic worst-case convergence rate for this algorithm. Simulation results will demonstrate the effectiveness of the proposed approach for improving energy efficiency in edge AI inference systems.
 [214] arXiv:2002.10081 [pdf, other]

Title: Toward a mathematical theory of the crystallographic phase retrieval problem
Subjects: Information Theory (cs.IT); Algebraic Geometry (math.AG)
Motivated by the X-ray crystallography technology to determine the atomic structure of biological molecules, we study the crystallographic phase retrieval problem, arguably the leading and hardest phase retrieval setup. This problem entails recovering a K-sparse signal of length N from its Fourier magnitude or, equivalently, from its periodic auto-correlation. Specifically, this work focuses on the fundamental question of uniqueness: what is the maximal sparsity level K/N that allows unique mapping between a signal and its Fourier magnitude, up to intrinsic symmetries. We design a systemic computational technique to affirm uniqueness for any specific pair (K,N), and establish the following conjecture: the Fourier magnitude determines a generic signal uniquely, up to intrinsic symmetries, as long as K<=N/2. Based on group-theoretic considerations and an additional computational technique, we formulate a second conjecture: if K<N/2, then for any signal the set of solutions to the crystallographic phase retrieval problem has measure zero in the set of all signals with a given Fourier magnitude. Together, these conjectures constitute the first attempt to establish a mathematical theory for the crystallographic phase retrieval problem.
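The equivalence between the Fourier magnitude and the periodic auto-correlation, and the intrinsic symmetries mentioned in this abstract, can be checked numerically on a small example. The sketch below is an illustration only: it assumes a real-valued signal, takes the symmetries to be circular shift, reflection, and sign flip, and uses hypothetical helper names and sizes (`N = 16`, `K = 5`).

```python
import numpy as np

def fourier_magnitude(x):
    """Magnitude of the discrete Fourier transform of x."""
    return np.abs(np.fft.fft(x))

def periodic_autocorrelation(x):
    """a[k] = sum_i x[i] * x[(i + k) mod N] for a real signal x."""
    n = len(x)
    return np.array([np.sum(x * np.roll(x, -k)) for k in range(n)])

# A K-sparse real signal of length N with a random support.
rng = np.random.default_rng(0)
N, K = 16, 5
x = np.zeros(N)
support = rng.choice(N, size=K, replace=False)
x[support] = rng.standard_normal(K)

mag = fourier_magnitude(x)

# The squared magnitude and the auto-correlation are a DFT pair,
# so knowing one is the same as knowing the other.
autocorr_from_mag = np.fft.ifft(mag ** 2).real

# Transformations that leave the Fourier magnitude unchanged.
shifted = np.roll(x, 3)
reflected = x[::-1]
negated = -x
```

Because these transformations all produce the same magnitude, uniqueness in this problem can only ever hold up to them, which is the sense in which the abstract speaks of recovery "up to intrinsic symmetries".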
 [215] arXiv:2002.10083 [pdf, other]

Title: Optimizing High Performance Markov Clustering for Pre-Exascale Architectures
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
HipMCL is a high-performance distributed memory implementation of the popular Markov Cluster Algorithm (MCL) and can cluster large-scale networks within hours using a few thousand CPU-equipped nodes. It relies on sparse matrix computations and heavily makes use of the sparse matrix-sparse matrix multiplication kernel (SpGEMM). The existing parallel algorithms in HipMCL are not scalable to Exascale architectures, both due to their communication costs dominating the runtime at large concurrencies and also due to their inability to take advantage of accelerators that are increasingly popular.
In this work, we systematically remove scalability and performance bottlenecks of HipMCL. We enable GPUs by performing the expensive expansion phase of the MCL algorithm on GPU. We propose a CPU-GPU joint distributed SpGEMM algorithm called pipelined Sparse SUMMA and integrate a probabilistic memory requirement estimator that is fast and accurate. We develop a new merging algorithm for the incremental processing of partial results produced by the GPUs, which improves the overlap efficiency and the peak memory usage. We also integrate a recent and faster algorithm for performing SpGEMM on CPUs. We validate our new algorithms and optimizations with extensive evaluations. With the enabling of the GPUs and integration of new algorithms, HipMCL is up to 12.4x faster, being able to cluster a network with 70 million proteins and 68 billion connections in just under 15 minutes using 1024 nodes of ORNL's Summit supercomputer.
 [216] arXiv:2002.10084 [pdf, other]

Title: Utilizing a null class to restrict decision spaces and defend against neural network adversarial attacks
Authors: Matthew J. Roos
Comments: 15 pages, 19 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Despite recent progress, deep neural networks generally continue to be vulnerable to so-called adversarial examples: input images with small perturbations that can result in changes in the output classifications, despite no such change in the semantic meaning to human viewers. This is true even for seemingly simple challenges such as the MNIST digit classification task. In part, this suggests that these networks are not relying on the same set of object features as humans use to make these classifications. In this paper we examine an additional, and largely unexplored, cause behind this phenomenon, namely the use of the conventional training paradigm in which the entire input space is parcellated among the training classes. Owing to this paradigm, learned decision spaces for individual classes span excessively large regions of the input space and include images that have no semantic similarity to images in the training set. In this study, we train models that include a null class. That is, models may "opt out" of classifying an input image as one of the digit classes. During training, null images are created through a variety of methods, in an attempt to create tighter and more semantically meaningful decision spaces for the digit classes. The best performing models classify nearly all adversarial examples as nulls, rather than mistaking them as a member of an incorrect digit class, while simultaneously maintaining high accuracy on the unperturbed test set. The use of a null class and the training paradigm presented herein may provide an effective defense against adversarial attacks for some applications. Code for replicating this study will be made available at https://github.com/mattroos/null_class_adversarial_defense .
 [217] arXiv:2002.10085 [pdf, other]

Title: Temporal Spike Sequence Learning via Backpropagation for Deep Spiking Neural Networks
Subjects: Neural and Evolutionary Computing (cs.NE); Machine Learning (cs.LG)
Spiking neural networks (SNNs) are well suited for spatio-temporal learning and implementations on energy-efficient event-driven neuromorphic processors. However, existing error backpropagation (BP) methods for SNNs lack proper handling of spiking discontinuities and suffer from low performance compared to BP methods for traditional artificial neural networks. In addition, a large number of time steps are typically required for SNNs to achieve decent performance, leading to high latency and rendering spike-based computation unscalable to deep architectures. We present a novel Temporal Spike Sequence Learning Backpropagation (TSSL-BP) method for training deep SNNs, which breaks down error backpropagation across two types of inter-neuron and intra-neuron dependencies. It considers the all-or-none characteristics of firing activities, capturing inter-neuron dependencies through presynaptic firing times, and the internal evolution of each neuronal state through time, capturing intra-neuron dependencies. For various image classification datasets, TSSL-BP efficiently trains deep SNNs within a short temporal time window of a few steps with improved accuracy and runtime efficiency, including achieving more than 2% accuracy improvement over the previously reported SNN work on CIFAR10.
 [218] arXiv:2002.10096 [pdf, other]

Title: Emosaic: Visualizing Affective Content of Text at Varying Granularity
Comments: 9 pages, 7 figures
Subjects: Human-Computer Interaction (cs.HC); Computation and Language (cs.CL); Computers and Society (cs.CY)
This paper presents Emosaic, a tool for visualizing the emotional tone of text documents, considering multiple dimensions of emotion and varying levels of semantic granularity. Emosaic is grounded in psychological research on the relationship between language, affect, and color perception. We capitalize on an established three-dimensional model of human emotion: valence (good, nice vs. bad, awful), arousal (calm, passive vs. exciting, active) and dominance (weak, controlled vs. strong, in control). Previously, multi-dimensional models of emotion have rarely been used in visualizations of textual data, due to the perceptual challenges involved. Furthermore, until recently most text visualizations remained at a high level, precluding closer engagement with the deep semantic content of the text. Informed by empirical studies, we introduce a color mapping that translates any point in three-dimensional affective space into a unique color. Emosaic uses affective dictionaries of words annotated with the three emotional parameters of the valence-arousal-dominance model to extract emotional meanings from texts and then assigns to them corresponding color parameters of the hue-saturation-brightness color space. This approach of mapping emotion to color is aimed at helping readers to more easily grasp the emotional tone of the text. Several features of Emosaic allow readers to interactively explore the affective content of the text in more detail; e.g., in aggregated form as histograms, in sequential form following the order of text, and in detail embedded into the text display itself. Interaction techniques have been included to allow for filtering and navigating of text and visualizations.
 [219] arXiv:2002.10097 [pdf, other]

Title: Fast and Stable Adversarial Training through Noise Injection
Comments: 7 pages, 3 figures
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Adversarial training is the most successful empirical method to date for increasing the robustness of neural networks against adversarial attacks. Unfortunately, this higher robustness is accompanied by considerably higher computational complexity. To date, only adversarial training with expensive multi-step adversarial attacks like Projected Gradient Descent (PGD) has proved effective against equally strong attacks. In this paper, we present two ideas that, combined, enable adversarial training with the computationally less expensive Fast Gradient Sign Method (FGSM). First, we add uniform noise to the initial data point of the FGSM attack, which creates a wider variety of stronger adversaries. Further, we add a learnable regularization step prior to the neural network called the Stochastic Augmentation Layer (SAL). Inputs propagated through the SAL are resampled from a Gaussian distribution. The randomness of the resampling at inference time makes it more complicated for the attacker to construct an adversarial example since the outcome of the model is not known in advance. We show that noise injection in conjunction with FGSM adversarial training achieves comparable results to adversarial training with PGD while being orders of magnitude faster. Moreover, we show superior results in comparison to PGD-based training when combining noise injection and SAL.
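The first idea above, an FGSM step started from a uniformly perturbed copy of the input, can be illustrated with a minimal NumPy sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the toy logistic-regression model, the function names `logistic_loss`, `loss_grad_wrt_input` and `noisy_fgsm`, the noise range of [-eps, eps], the step size eps, and the final projection back onto the eps-ball.

```python
import numpy as np

def logistic_loss(w, b, x, y):
    """Binary cross-entropy of a logistic-regression model for one input (y in {0, 1})."""
    p = 1.0 / (1.0 + np.exp(-(x @ w + b)))
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def loss_grad_wrt_input(w, b, x, y):
    """Gradient of the loss above with respect to the input x."""
    p = 1.0 / (1.0 + np.exp(-(x @ w + b)))
    return (p - y) * w

def noisy_fgsm(w, b, x, y, eps, rng):
    """One FGSM step from a uniformly perturbed start, clipped to the eps-ball around x."""
    x_start = x + rng.uniform(-eps, eps, size=x.shape)   # assumed noise range
    x_adv = x_start + eps * np.sign(loss_grad_wrt_input(w, b, x_start, y))
    return np.clip(x_adv, x - eps, x + eps)              # assumed projection step

rng = np.random.default_rng(0)
w, b = np.array([1.5, -2.0, 0.5]), 0.1   # hypothetical fixed model parameters
x, y = np.array([0.2, -0.4, 0.9]), 1     # hypothetical clean input and label
eps = 0.1
x_adv = noisy_fgsm(w, b, x, y, eps, rng)
```

In adversarial training, `x_adv` would replace or accompany `x` in the training batch; the random start means repeated calls yield different adversaries for the same input.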
 [220] arXiv:2002.10098 [pdf, other]

Title: An RLS-Based Instantaneous Velocity Estimator for Extended Radar Tracking
Comments: 8 pages, 11 figures
Subjects: Robotics (cs.RO)
Radar sensors have become an important part of the perception sensor suite due to their long range and their ability to work in adverse weather conditions. However, several shortcomings, such as large amounts of noise and extreme sparsity of the point cloud, result in them not being used to their full potential. In this paper, we present a novel Recursive Least Squares (RLS) based approach to estimate the instantaneous velocity of dynamic objects in real time that is capable of handling large amounts of noise in the input data stream. We also present an end-to-end pipeline to track extended objects in real time that uses the computed velocity estimates for data association and track initialisation. The approaches are evaluated using several real-world-inspired driving scenarios that test the limits of these algorithms. It is also shown experimentally that our approaches run in real time, with frame execution time not exceeding 30 ms even in dense traffic scenarios, thus allowing for their direct implementation on autonomous vehicles.
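The abstract does not spell out the estimator, so the following is only a generic textbook RLS sketch, not the authors' method. It assumes a Doppler-style measurement model in which each radar detection at azimuth `theta` reports the radial speed `vx*cos(theta) + vy*sin(theta)` of a rigidly moving object; the class name, the forgetting factor, and the initial covariance `p0` are illustrative choices.

```python
import numpy as np

class RecursiveLeastSquares:
    """Textbook RLS for y = phi . theta, with an optional forgetting factor."""

    def __init__(self, dim, forgetting=1.0, p0=1e6):
        self.theta = np.zeros(dim)      # current parameter estimate
        self.P = p0 * np.eye(dim)       # large initial covariance = weak prior
        self.lam = forgetting

    def update(self, phi, y):
        P_phi = self.P @ phi
        gain = P_phi / (self.lam + phi @ P_phi)
        self.theta = self.theta + gain * (y - phi @ self.theta)
        self.P = (self.P - np.outer(gain, P_phi)) / self.lam
        return self.theta

# Hypothetical scene: one object moving at (3, -1) m/s, seen at 20 azimuths.
true_velocity = np.array([3.0, -1.0])
azimuths = np.linspace(-0.5, 0.5, 20)

rls = RecursiveLeastSquares(dim=2)
for theta in azimuths:
    direction = np.array([np.cos(theta), np.sin(theta)])
    radial_speed = direction @ true_velocity   # noiseless for clarity
    estimate = rls.update(direction, radial_speed)
```

Each update costs a fixed number of small matrix operations, which is what makes recursive estimation attractive for per-frame processing; a forgetting factor below 1 would down-weight older detections.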
 [221] arXiv:2002.10099 [pdf, other]

Title: Implicit Geometric Regularization for Learning Shapes
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR); Machine Learning (stat.ML)
Representing shapes as level sets of neural networks has recently proved useful for different shape analysis and reconstruction tasks. So far, such representations were computed using either: (i) pre-computed implicit shape representations; or (ii) loss functions explicitly defined over the neural level sets. In this paper we offer a new paradigm for computing high-fidelity implicit neural representations directly from raw data (i.e., point clouds, with or without normal information). We observe that a rather simple loss function, encouraging the neural network to vanish on the input point cloud and to have a unit norm gradient, possesses an implicit geometric regularization property that favors smooth and natural zero level set surfaces, avoiding bad zero-loss solutions. We provide a theoretical analysis of this property for the linear case, and show that, in practice, our method leads to state-of-the-art implicit neural representations with a higher level of detail and fidelity compared to previous methods.
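The two ingredients of the loss described above (vanish on the point cloud, unit-norm gradient) can be made concrete for a linear model f(x) = w . x + b, whose gradient is the constant vector w. This is a sketch under assumptions: the absolute-value data term, the squared penalty on the gradient norm, the weight `lam`, and the function name are illustrative and not taken from the abstract.

```python
import numpy as np

def igr_style_loss(w, b, points, lam=0.1):
    """Data term + unit-gradient-norm term for the linear model f(x) = w . x + b."""
    values = points @ w + b                          # f(x_i) for every input point
    data_term = np.mean(np.abs(values))              # pushes f to vanish on the points
    eikonal_term = (np.linalg.norm(w) - 1.0) ** 2    # pushes ||grad f|| toward 1
    return data_term + lam * eikonal_term

# Three points on the line y = 1 in the plane.
points = np.array([[0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])

# Signed distance to that line: zero on the points, unit gradient.
loss_plane = igr_style_loss(np.array([0.0, 1.0]), -1.0, points)

# The trivial function f = 0 also vanishes on the points but has zero gradient.
loss_trivial = igr_style_loss(np.array([0.0, 0.0]), 0.0, points)
```

In this toy case the unit-norm term is what separates the signed-distance solution from the degenerate all-zero function; for a neural network the gradient would be computed by automatic differentiation at sampled points rather than read off as `w`.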
 [222] arXiv:2002.10100 [pdf, other]

Title: LeafGAN: An Effective Data Augmentation Method for Practical Plant Disease Diagnosis
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Many applications for the automated diagnosis of plant disease have been developed based on the success of deep learning techniques. However, these applications often suffer from overfitting, and the diagnostic performance is drastically decreased when used on test datasets from new environments. The typical reasons for this are that the symptoms to be detected are unclear or faint, and there are limitations related to data diversity. In this paper, we propose LeafGAN, a novel image-to-image translation system with its own attention mechanism. LeafGAN generates a wide variety of diseased images via transformation from healthy images, as a data augmentation tool for improving the performance of plant disease diagnosis. Thanks to its own attention mechanism, our model can transform only relevant areas from images with a variety of backgrounds, thus enriching the versatility of the training images. Experiments with five-class cucumber disease classification show that data augmentation with vanilla CycleGAN does little to improve generalization, i.e., disease diagnostic performance increased by only 0.7% from the baseline. In contrast, LeafGAN boosted the diagnostic performance by 7.4%. We also visually confirmed that the images generated by LeafGAN were of much better quality and more convincing than those generated by vanilla CycleGAN.
 [223] arXiv:2002.10101 [pdf, other]

Title: GRET: Global Representation Enhanced Transformer
Comments: Accepted by AAAI 2020
Subjects: Computation and Language (cs.CL)
The Transformer, based on the encoder-decoder framework, has achieved state-of-the-art performance on several natural language generation tasks. The encoder maps the words in the input sentence into a sequence of hidden states, which are then fed into the decoder to generate the output sentence. These hidden states usually correspond to the input words and focus on capturing local information. However, the global (sentence-level) information is seldom explored, leaving room for the improvement of generation quality. In this paper, we propose a novel global representation enhanced Transformer (GRET) to explicitly model global representation in the Transformer network. Specifically, in the proposed model, an external state is generated for the global representation from the encoder. The global representation is then fused into the decoder during the decoding process to improve generation quality. We conduct experiments on two text generation tasks: machine translation and text summarization. Experimental results on four WMT machine translation tasks and the LCSTS text summarization task demonstrate the effectiveness of the proposed approach on natural language generation.
 [224] arXiv:2002.10102 [pdf, other]

Title: GANHopper: Multi-Hop GAN for Unsupervised Image-to-Image Translation
Comments: 9 pages, 9 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
We introduce GANHOPPER, an unsupervised image-to-image translation network that transforms images gradually between two domains, through multiple hops. Instead of executing translation directly, we steer the translation by requiring the network to produce in-between images which resemble weighted hybrids between images from the two input domains. Our network is trained on unpaired images from the two domains only, without any in-between images. All hops are produced using a single generator along each direction. In addition to the standard cycle-consistency and adversarial losses, we introduce a new hybrid discriminator, which is trained to classify the intermediate images produced by the generator as weighted hybrids, with weights based on a predetermined hop count. We also introduce a smoothness term to constrain the magnitude of each hop, further regularizing the translation. Compared to previous methods, GANHOPPER excels at image translations involving domain-specific image features and geometric variations while also preserving non-domain-specific features such as backgrounds and general color schemes.
 [225] arXiv:2002.10105 [pdf, other]

Title: Communication Contention Aware Scheduling of Multiple Deep Learning Training Jobs
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Distributed Deep Learning (DDL) has rapidly grown in popularity since it helps boost the training performance on high-performance GPU clusters. Efficient job scheduling is indispensable to maximize the overall performance of the cluster when training multiple jobs simultaneously. However, existing schedulers do not consider the communication contention of multiple communication tasks from different distributed training jobs, which could deteriorate the system performance and prolong the job completion time. In this paper, we first establish a new DDL job scheduling framework which organizes DDL jobs as Directed Acyclic Graphs (DAGs) and considers communication contention between nodes. We then propose an efficient algorithm, LWF-$\kappa$, to balance the GPU utilization and consolidate the allocated GPUs for each job. When scheduling those communication tasks, we observe that neither avoiding all the contention nor blindly accepting it is optimal for minimizing the job completion time. We thus propose a provable algorithm, AdaDUAL, to efficiently schedule those communication tasks. Based on AdaDUAL, we finally propose Ada-SRSF for the DDL job scheduling problem. Simulations on a 64-GPU cluster connected with 10 Gbps Ethernet show that LWF-$\kappa$ achieves up to $1.59\times$ improvement over the classical first-fit algorithms. More importantly, Ada-SRSF reduces the average job completion time by $20.1\%$ and $36.7\%$, as compared to the SRSF(1) scheme (avoiding all the contention) and the SRSF(2) scheme (blindly accepting all two-way communication contention), respectively.
 [226] arXiv:2002.10107 [pdf]

Title: Predicting Subjective Features from Questions on QA Websites using BERT
Comments: 5 pages, 4 figures, 2 tables
Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Modern Question-Answering websites, such as StackOverflow and Quora, have specific user rules to maintain their content quality. These systems rely on user reports for assessing new content, which has serious problems including the slow handling of violations, the loss of normal and experienced users' time, the low quality of some reports, and discouraging feedback to new users. Therefore, with the overall goal of providing solutions for automating moderation actions in Q&A websites, we aim to provide a model to predict 20 quality or subjective aspects of questions in Q&A websites. To this end, we used data gathered by the CrowdSource team at Google Research in 2019 and fine-tuned a pre-trained BERT model on our problem. The model achieves 95.4% accuracy after 2 epochs of training and does not improve substantially in subsequent epochs. The results confirm that, by simple fine-tuning, we can achieve accurate models in little time and with less data.
 [227] arXiv:2002.10110 [pdf, ps, other]

Title: Revisiting EXTRA for Smooth Distributed Optimization
Subjects: Numerical Analysis (math.NA); Machine Learning (cs.LG); Optimization and Control (math.OC)
EXTRA is a popular method for decentralized distributed optimization and has broad applications. This paper revisits EXTRA. First, we give a sharp complexity analysis for EXTRA with the improved $O\left(\left(\frac{L}{\mu}+\frac{1}{1-\sigma_2(W)}\right)\log\frac{1}{\epsilon(1-\sigma_2(W))}\right)$ communication and computation complexities for $\mu$-strongly convex and $L$-smooth problems, where $\sigma_2(W)$ is the second largest singular value of the weight matrix $W$. When strong convexity is absent, we prove the $O\left(\left(\frac{L}{\epsilon}+\frac{1}{1-\sigma_2(W)}\right)\log\frac{1}{1-\sigma_2(W)}\right)$ complexities. Then, we use the Catalyst framework to accelerate EXTRA and obtain the $O\left(\sqrt{\frac{L}{\mu(1-\sigma_2(W))}}\log\frac{L}{\mu(1-\sigma_2(W))}\log\frac{1}{\epsilon}\right)$ communication and computation complexities for strongly convex and smooth problems and the $O\left(\sqrt{\frac{L}{\epsilon(1-\sigma_2(W))}}\log\frac{1}{\epsilon(1-\sigma_2(W))}\right)$ complexities for non-strongly convex ones. Our communication complexities of the accelerated EXTRA are only worse by the factors of $\left(\log\frac{L}{\mu(1-\sigma_2(W))}\right)$ and $\left(\log\frac{1}{\epsilon(1-\sigma_2(W))}\right)$ from the lower complexity bounds for strongly convex and non-strongly convex problems, respectively.
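For readers unfamiliar with the method being analyzed, here is a minimal NumPy sketch of the plain (unaccelerated) EXTRA recursion as it is commonly stated: x^1 = W x^0 - alpha * grad(x^0), then x^{k+2} = (I + W) x^{k+1} - W~ x^k - alpha * (grad(x^{k+1}) - grad(x^k)) with W~ = (I + W)/2. The toy problem (five nodes on a ring, scalar quadratics f_i(x) = (x - b_i)^2 / 2), the mixing weights, and the step size are assumptions chosen for illustration; they do not come from the abstract.

```python
import numpy as np

def extra(grad, W, x0, alpha, iters):
    """Plain EXTRA; entry i of x is node i's local copy of the decision variable."""
    n = W.shape[0]
    identity = np.eye(n)
    W_tilde = 0.5 * (identity + W)
    x_prev = x0
    g_prev = grad(x_prev)
    x = W @ x_prev - alpha * g_prev                      # first step
    for _ in range(iters):
        g = grad(x)
        x_next = (identity + W) @ x - W_tilde @ x_prev - alpha * (g - g_prev)
        x_prev, g_prev, x = x, g, x_next
    return x

# Toy problem: node i holds f_i(x) = 0.5 * (x - b_i)^2, so the minimizer of
# sum_i f_i is the mean of b.
n = 5
b = np.array([1.0, 2.0, 3.0, 4.0, 10.0])

# Symmetric, doubly stochastic mixing matrix for a ring (assumed weights).
shift = np.roll(np.eye(n), 1, axis=1)
W = 0.5 * np.eye(n) + 0.25 * (shift + shift.T)

x_final = extra(lambda x: x - b, W, np.zeros(n), alpha=0.5, iters=200)
```

Each iteration uses one multiplication by W (one round of neighbor communication) and one local gradient per node, which is why the abstract reports communication and computation complexities together.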
 [228] arXiv:2002.10111 [pdf, other]

Title: SMOKE: Single-Stage Monocular 3D Object Detection via Keypoint Estimation
Comments: 8 pages, 6 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Estimating 3D orientation and translation of objects is essential for infrastructure-less autonomous navigation and driving. In the case of monocular vision, successful methods have been mainly based on two ingredients: (i) a network generating 2D region proposals, (ii) an R-CNN structure predicting 3D object pose by utilizing the acquired regions of interest. We argue that the 2D detection network is redundant and introduces non-negligible noise for 3D detection. Hence, we propose in this paper a novel 3D object detection method, named SMOKE, that predicts a 3D bounding box for each detected object by combining a single keypoint estimate with regressed 3D variables. As a second contribution, we propose a multi-step disentangling approach for constructing the 3D bounding box, which significantly improves both training convergence and detection accuracy. In contrast to previous 3D detection techniques, our method does not require complicated pre/post-processing, extra data, or a refinement stage. Despite its structural simplicity, our proposed SMOKE network outperforms all existing monocular 3D detection methods on the KITTI dataset, giving the best state-of-the-art result on both 3D object detection and bird's-eye-view evaluation. The code will be made publicly available.
 [229] arXiv:2002.10112 [pdf, ps, other]

Title: Intelligent Reflecting Surface: Practical Phase Shift Model and Beamforming Optimization
Comments: submitted for possible journal publication (Part of this work will be presented in the IEEE International Conference on Communications (ICC), Dublin, Ireland, 2020. Available: arXiv:1907.06002)
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Intelligent reflecting surface (IRS), which enables control of the wireless propagation environment, has recently emerged as a promising cost-effective technology for boosting the spectrum and energy efficiency of future wireless communication systems. Prior works on IRS are mainly based on the ideal phase shift model, which assumes full signal reflection by each of the elements regardless of its phase shift; this, however, is practically difficult to realize. In contrast, we propose in this paper a practical phase shift model that captures the phase-dependent amplitude variation in the element-wise reflection coefficient. Based on the proposed model and considering an IRS-aided multiuser system with an IRS deployed to assist in the downlink communications from a multi-antenna access point (AP) to multiple single-antenna users, we formulate an optimization problem to minimize the total transmit power at the AP by jointly designing the AP transmit beamforming and the IRS reflect beamforming, subject to the users' individual signal-to-interference-plus-noise ratio (SINR) constraints. Iterative algorithms are proposed to find suboptimal solutions to this problem efficiently by utilizing the alternating optimization (AO) or penalty-based optimization technique. Moreover, we analyze the asymptotic performance loss of the IRS-aided system that employs practical phase shifters but assumes the ideal phase shift model for beamforming optimization, as the number of IRS elements goes to infinity. Simulation results unveil substantial performance gains achieved by the proposed beamforming optimization based on the practical phase shift model as compared to the conventional ideal model.
 [230] arXiv:2002.10113 [pdf, other]

Title: APAC-Net: Alternating the Population and Agent Control via Two Neural Networks to Solve High-Dimensional Stochastic Mean Field Games
Subjects: Machine Learning (cs.LG); Multiagent Systems (cs.MA); Optimization and Control (math.OC); Machine Learning (stat.ML)
We present APAC-Net, an alternating population and agent control neural network for solving stochastic mean field games (MFGs). Our algorithm is geared toward high-dimensional instances of MFGs that are beyond reach with existing solution methods. We achieve this in two steps. First, we take advantage of the underlying variational primal-dual structure that MFGs exhibit and phrase it as a convex-concave saddle point problem. Second, we parameterize the value and density functions by two neural networks, respectively. By phrasing the problem in this manner, solving the MFG can be interpreted as a special case of training a generative adversarial network (GAN). We show the potential of our method on up to 50-dimensional MFG problems.
 [231] arXiv:2002.10116 [pdf, other]

Title: A Hybrid Approach to Dependency Parsing: Combining Rules and Morphology with Deep Learning
Authors: Şaziye Betül Özateş (1), Arzucan Özgür (1), Tunga Güngör (1), Balkız Öztürk (2) ((1) Department of Computer Engineering, Boğaziçi University, (2) Department of Linguistics, Boğaziçi University)
Comments: 25 pages, 7 figures
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
Fully data-driven, deep learning-based models are usually designed to be language-independent and have been shown to be successful for many natural language processing tasks. However, when the studied language is low-resourced and the amount of training data is insufficient, these models can benefit from the integration of natural language grammar-based information. We propose two approaches to dependency parsing, especially for languages with a restricted amount of training data. Our first approach combines a state-of-the-art deep learning-based parser with a rule-based approach and the second one incorporates morphological information into the parser. In the rule-based approach, the parsing decisions made by the rules are encoded and concatenated with the vector representations of the input words as additional information to the deep network. The morphology-based approach proposes different methods to include the morphological structure of words into the parser network. Experiments are conducted on the IMST-UD Treebank and the results suggest that integration of explicit knowledge about the target language into a neural parser through a rule-based parsing system and morphological analysis leads to more accurate annotations and hence increases the parsing performance in terms of attachment scores. The proposed methods are developed for Turkish, but can be adapted to other languages as well.
 [232] arXiv:2002.10119 [pdf, other]

Title: DeepSign: Deep On-Line Signature Verification
Subjects: Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC)
Deep learning has become a breathtaking technology in recent years, surpassing traditional handcrafted approaches and even humans on many different tasks. However, in some tasks, such as the verification of handwritten signatures, the amount of publicly available data is scarce, which makes it difficult to test the real limits of deep learning. In addition to the lack of public data, it is not easy to evaluate the improvements of novel proposed approaches, as different databases and experimental protocols are usually considered.
The main contributions of this study are: i) we provide an in-depth analysis of state-of-the-art deep learning approaches for on-line signature verification, ii) we present and describe the new DeepSignDB on-line handwritten signature biometric public database, iii) we propose a standard experimental protocol and benchmark to be used by the research community in order to perform a fair comparison of novel approaches with the state of the art, and iv) we adapt and evaluate our recent deep learning approach named Time-Aligned Recurrent Neural Networks (TA-RNNs) for the task of on-line handwritten signature verification. This approach combines the potential of Dynamic Time Warping and Recurrent Neural Networks to train more robust systems against forgeries. Our proposed TA-RNN system outperforms the state of the art, achieving results even below 2.0% EER when considering skilled forgery impostors and just one training signature per user.
 [233] arXiv:2002.10120 [pdf, other]

Title: Semantic Flow for Fast and Accurate Scene Parsing
Subjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
In this paper, we focus on effective methods for fast and accurate scene parsing. A common practice to improve the performance is to attain high-resolution feature maps with strong semantic representation. Two widely used strategies, atrous convolutions and feature pyramid fusion, are either computation intensive or ineffective. Inspired by Optical Flow for motion alignment between adjacent video frames, we propose a Flow Alignment Module (FAM) to learn Semantic Flow between feature maps of adjacent levels and broadcast high-level features to high-resolution features effectively and efficiently. Furthermore, integrating our module into a common feature pyramid structure exhibits superior performance over other real-time methods even on very lightweight backbone networks, such as ResNet-18. Extensive experiments are conducted on several challenging datasets, including Cityscapes, PASCAL Context, ADE20K and CamVid. In particular, our network is the first to achieve 80.4% mIoU on Cityscapes with a frame rate of 26 FPS. The code will be available at https://github.com/donnyyou/torchcv.
 [234] arXiv:2002.10121 [pdf, other]

Title: Optimal and Greedy Algorithms for Multi-Armed Bandits with Many Arms
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
We characterize Bayesian regret in a stochastic multi-armed bandit problem with a large but finite number of arms. In particular, we assume the number of arms $k$ is $T^{\alpha}$, where $T$ is the time horizon and $\alpha$ is in $(0,1)$. We consider a Bayesian setting where the reward distribution of each arm is drawn independently from a common prior, and provide a complete analysis of expected regret with respect to this prior. Our results exhibit a sharp distinction around $\alpha = 1/2$. When $\alpha < 1/2$, the fundamental lower bound on regret is $\Omega(k)$, and it is achieved by a standard UCB algorithm. When $\alpha > 1/2$, the fundamental lower bound on regret is $\Omega(\sqrt{T})$, and it is achieved by an algorithm that first subsamples $\sqrt{T}$ arms uniformly at random, then runs UCB on just this subset. Interestingly, we also find that a sufficiently large number of arms allows the decision-maker to benefit from "free" exploration if she simply uses a greedy algorithm. In particular, this greedy algorithm exhibits a regret of $\tilde{O}(\max(k,T/\sqrt{k}))$, which translates to a {\em sublinear} (though not optimal) regret in the time horizon. We show empirically that this is because the greedy algorithm rapidly disposes of underperforming arms, a beneficial trait in the many-armed regime. Technically, our analysis of the greedy algorithm involves a novel application of the Lundberg inequality, an upper bound for the ruin probability of a random walk; this approach may be of independent interest.
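The subsample-then-UCB procedure described for the regime $\alpha > 1/2$ can be sketched as follows. The abstract only says "a standard UCB algorithm", so the specific UCB1-style index, the Bernoulli rewards, the uniform prior on arm means, and the function name `subsampled_ucb` are all assumptions made for this illustration.

```python
import math
import numpy as np

def subsampled_ucb(means, horizon, rng):
    """Pick ceil(sqrt(T)) arms uniformly at random, then run UCB1 on that subset only."""
    k = len(means)
    m = min(k, math.ceil(math.sqrt(horizon)))
    arms = rng.choice(k, size=m, replace=False)   # the random subsample
    counts = np.zeros(m)
    sums = np.zeros(m)
    for t in range(horizon):
        if t < m:
            j = t                                  # pull each sampled arm once
        else:
            index = sums / counts + np.sqrt(2.0 * math.log(t + 1) / counts)
            j = int(np.argmax(index))
        reward = float(rng.random() < means[arms[j]])   # assumed Bernoulli reward
        counts[j] += 1
        sums[j] += reward
    return arms, counts

rng = np.random.default_rng(0)
T = 10_000
k = 1_000                        # k = T^0.75, i.e. alpha = 0.75 > 1/2
means = rng.uniform(size=k)      # arm means drawn i.i.d. from an assumed uniform prior
arms, counts = subsampled_ucb(means, T, rng)
```

With these numbers only 100 of the 1,000 arms are ever pulled; the remaining 900 are discarded before the first round, which is the point of subsampling when there are too many arms to explore within the horizon.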
 [235] arXiv:2002.10126 [pdf, other]

Title: Safe reinforcement learning for probabilistic reachability and safety specifications: A Lyapunov-based approach
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Systems and Control (eess.SY)
Emerging applications in robotics and autonomous systems, such as autonomous driving and robotic surgery, often involve critical safety constraints that must be satisfied even when information about system models is limited. In this regard, we propose a model-free safety specification method that learns the maximal probability of safe operation by carefully combining probabilistic reachability analysis and safe reinforcement learning (RL). Our approach constructs a Lyapunov function with respect to a safe policy to restrain each policy improvement stage. As a result, it yields a sequence of safe policies that determine the range of safe operation, called the safe set, which monotonically expands and gradually converges. We also develop an efficient safe exploration scheme that accelerates the process of identifying the safety of unexamined states. Exploiting the Lyapunov shielding, our method regulates the exploratory policy to avoid dangerous states with high confidence. To handle high-dimensional systems, we further extend our approach to deep RL by introducing a Lagrangian relaxation technique to establish a tractable actor-critic algorithm. The empirical performance of our method is demonstrated through continuous control benchmark problems, such as a reaching task on a planar robot arm.
 [236] arXiv:2002.10127 [pdf, other]

Title: FONDUE: A Framework for Node Disambiguation Using Network Embeddings
Comments: 11 pages, 3 figures
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Machine Learning (stat.ML)
Real-world data often presents itself in the form of a network. Examples include social networks, citation networks, biological networks, and knowledge graphs. In their simplest form, networks represent real-life entities (e.g. people, papers, proteins, concepts) as nodes, and describe them in terms of their relations with other entities by means of edges between these nodes. This can be valuable for a range of purposes from the study of information diffusion to bibliographic analysis, bioinformatics research, and question-answering.
The quality of networks is often problematic though, affecting downstream tasks. This paper focuses on the common problem where a node in the network in fact corresponds to multiple real-life entities. In particular, we introduce FONDUE, an algorithm based on network embedding for node disambiguation. Given a network, FONDUE identifies nodes that correspond to multiple entities, for subsequent splitting. Extensive experiments on twelve benchmark datasets demonstrate that FONDUE is substantially and uniformly more accurate for ambiguous node identification than the existing state of the art, at a comparable computational cost, while being less optimal for determining the best way to split ambiguous nodes.
 [237] arXiv:2002.10131 [pdf, other]

Title: Angry Birds Flock Together: Aggression Propagation on Social Media
Authors: Chrysoula Terizi, Despoina Chatzakou, Evaggelia Pitoura, Panayiotis Tsaparas, Nicolas Kourtellis
Comments: 10 pages, 4 figures
Subjects: Social and Information Networks (cs.SI); Computers and Society (cs.CY)
Cyberaggression has been found in various contexts and online social platforms, and modeled on different data using state-of-the-art machine and deep learning algorithms to enable automatic detection and blocking of this behavior. Users can be influenced to act aggressively or even bully others because of elevated toxicity and aggression in their own (online) social circle. In effect, this behavior can propagate from one user and neighborhood to another, and therefore spread in the network. Interestingly, to our knowledge, no work has modeled the network dynamics of aggressive behavior. In this paper, we take a first step in this direction by studying propagation of aggression on social media. We look into various opinion dynamics models widely used to model how opinions propagate through a network. We propose ways to enhance these classical models to accommodate how aggression may propagate from one user to another, depending on how each user is connected to other aggressive or regular users. Through extensive simulations on Twitter data, we study how aggressive behavior could propagate in the network, and validate our models with ground truth from crawled data and crowdsourced annotations. We discuss the results and implications of our work.
 [238] arXiv:2002.10137 [pdf, other]

Title: Audio-driven Talking Face Video Generation with Natural Head Pose
Comments: 12 pages, 7 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
Real-world talking faces are often accompanied by natural head movement. However, most existing talking face video generation methods only consider facial animation with fixed head pose. In this paper, we address this problem by proposing a deep neural network model that takes an audio signal A of a source person and a very short video V of a target person as input, and outputs a synthesized high-quality talking face video with natural head pose (making use of the visual information in V), expression and lip synchronization (by considering both A and V). The most challenging issue in our work is that natural poses often cause in-plane and out-of-plane head rotations, which make synthesized talking face video far from realistic. To address this challenge, we reconstruct 3D face animation and re-render it into synthesized frames. To fine-tune these frames into realistic ones with smooth background transition, we propose a novel memory-augmented GAN module. Extensive experiments and three user studies show that our method can generate high-quality (i.e., natural head movements, expressions and good lip synchronization) personalized talking face videos, outperforming the state-of-the-art methods.
 [239] arXiv:2002.10142 [pdf, other]

Title: Explicit and Implicit Dynamic Coloring of Graphs with Bounded Arboricity
Subjects: Data Structures and Algorithms (cs.DS)
Graph coloring is a fundamental problem in computer science. We study the fully dynamic version of the problem in which the graph is undergoing edge insertions and deletions and we wish to maintain a vertexcoloring with small update time after each insertion and deletion.
We show how to maintain an $O(\alpha \lg n)$coloring with polylogarithmic update time, where $n$ is the number of vertices in the graph and $\alpha$ is the current arboricity of the graph. This improves upon a result by Solomon and Wein (ESA'18) who maintained an $O(\alpha_{\max}\lg^2 n)$coloring, where $\alpha_{\max}$ is the maximum arboricity of the graph over all updates.
Furthermore, motivated by a lower bound by Barba et al. (Algorithmica'19), we initiate the study of implicit dynamic colorings. Barba et al. showed that dynamic algorithms with polylogarithmic update time cannot maintain an $f(\alpha)$coloring for any function $f$ when the vertex colors are stored explicitly, i.e., for each vertex the color is stored explicitly in the memory. Previously, all dynamic algorithms maintained explicit colorings. Therefore, we propose to study implicit colorings, i.e., the data structure only needs to offer an efficient query procedure to return the color of a vertex (instead of storing its color explicitly). We provide an algorithm which breaks the lower bound and maintains an implicit $2^{O(\alpha)}$coloring with polylogarithmic update time. In particular, this yields the first dynamic $O(1)$coloring for graphs with constant arboricity such as planar graphs or graphs with bounded treewidth, which is impossible using explicit colorings.
We also show how to dynamically maintain a partition of the graph's edges into $O(\alpha)$ forests with polylogarithmic update time. We believe this data structure is of independent interest and might have more applications in the future.  [240] arXiv:2002.10143 [pdf, other]

Title: Snitch: A 10 kGE Pseudo Dual-Issue Processor for Area and Energy Efficient Execution of Floating-Point Intensive Workloads
Subjects: Hardware Architecture (cs.AR)
Data-parallel applications, such as data analytics, machine learning, and scientific computing, are placing an ever-growing demand on floating-point operations per second on emerging systems. With increasing integration density, the quest for energy efficiency becomes the number one design concern. While dedicated accelerators provide high energy efficiency, they are over-specialized and hard to adjust to algorithmic changes. We propose an architectural concept that tackles the issues of achieving extreme energy efficiency while still maintaining high flexibility as a general-purpose compute engine. The key idea is to pair a tiny 10 kGE control core, called Snitch, with a double-precision FPU to adjust the compute-to-control ratio. While traditionally minimizing non-FPU area and achieving high floating-point utilization has been a trade-off, with Snitch, we achieve them both, by enhancing the ISA with two minimally intrusive extensions: stream semantic registers (SSR) and a floating-point repetition instruction (FREP). SSRs allow the core to implicitly encode load/store instructions as register reads/writes, eliding many explicit memory instructions. The FREP extension decouples the floating-point and integer pipeline by sequencing instructions from a micro-loop buffer. These ISA extensions significantly reduce the pressure on the core and free it up for other tasks, making Snitch and FPU effectively dual-issue at a minimal incremental cost of 3.2%. The two low-overhead ISA extensions make Snitch more flexible than a contemporary vector processor lane, achieving a $2\times$ energy-efficiency improvement. We have evaluated the proposed core and ISA extensions on an octa-core cluster in 22 nm technology. We achieve more than $5\times$ multi-core speed-up and a $3.5\times$ gain in energy efficiency on several parallel micro-kernels.
 [241] arXiv:2002.10145 [pdf, other]

Title: Hardness of equations over finite solvable groups under the exponential time hypothesis
Authors: Armin Weiß
Subjects: Computational Complexity (cs.CC); Group Theory (math.GR)
Goldmann and Russell (2002) initiated the study of the complexity of the equation satisfiability problem in finite groups by showing that it is in P for nilpotent groups while it is NP-complete for non-solvable groups. Since then, several results have appeared showing that the problem can be solved in polynomial time in certain solvable groups of Fitting length two. In this work, we present the first lower bounds for the equation satisfiability problem in finite solvable groups: under the assumption of the exponential time hypothesis, we show that it cannot be in P for any group of Fitting length at least four and for certain groups of Fitting length three. Moreover, the same hardness result applies to the equation identity problem.
 [242] arXiv:2002.10148 [pdf, other]

Title: Embedded-physics machine learning for coarse-graining and collective variable discovery without data
Subjects: Machine Learning (cs.LG); Chemical Physics (physics.chem-ph); Computational Physics (physics.comp-ph); Machine Learning (stat.ML)
We present a novel learning framework that consistently embeds underlying physics while bypassing a significant drawback of most modern, data-driven coarse-grained approaches in the context of molecular dynamics (MD), i.e., the availability of big data. The generation of a sufficiently large training dataset poses a computationally demanding task, while complete coverage of the atomistic configuration space is not guaranteed. As a result, the explorative capabilities of data-driven coarse-grained models are limited and may yield biased "predictive" tools. We propose a novel objective based on reverse Kullback-Leibler divergence that fully incorporates the available physics in the form of the atomistic force field. Rather than separating model learning from the data-generation procedure (the latter relies on simulating atomistic motions governed by force fields), we query the atomistic force field at sample configurations proposed by the predictive coarse-grained model. Thus, learning relies on the evaluation of the force field but does not require any MD simulation. The resulting generative coarse-grained model serves as an efficient surrogate model for predicting atomistic configurations and estimating relevant observables. Beyond obtaining a predictive coarse-grained model, we demonstrate that in the discovered lower-dimensional representation, the collective variables (CVs) are related to physicochemical properties, which are essential for gaining understanding of unexplored complex systems. We demonstrate the algorithmic advances in terms of predictive ability and the physical meaning of the revealed CVs for a bimodal potential energy function and the alanine dipeptide.
 [243] arXiv:2002.10149 [pdf, other]

Title: Cognitive Argumentation and the Suppression Task
Subjects: Artificial Intelligence (cs.AI)
This paper addresses the challenge of modeling human reasoning, within a new framework called Cognitive Argumentation. This framework rests on the assumption that human logical reasoning is inherently a process of dialectic argumentation and aims to develop a cognitive model for human reasoning that is computational and implementable. To give logical reasoning a human cognitive form, the framework relies on cognitive principles, based on empirical and theoretical work in Cognitive Science, to suitably adapt a general and abstract framework of computational argumentation from AI. The approach of Cognitive Argumentation is evaluated with respect to Byrne's suppression task, where the aim is not only to capture the suppression effect between different groups of people but also to account for the variation of reasoning within each group. Two main cognitive principles are particularly important to capture human conditional reasoning that explain the participants' responses: (i) the interpretation of a condition within a conditional as sufficient and/or necessary and (ii) the mode of reasoning either as predictive or explanatory. We argue that Cognitive Argumentation provides a coherent and cognitively adequate model for human conditional reasoning that allows a natural distinction between definite and plausible conclusions, exhibiting the important characteristics of context-sensitive and defeasible reasoning.
 [244] arXiv:2002.10151 [pdf, ps, other]

Title: Vizing-Goldberg type bounds for the equitable chromatic number of block graphs
Comments: 21 pages, 12 figures
Subjects: Discrete Mathematics (cs.DM); Combinatorics (math.CO)
An equitable coloring of a graph $G$ is a proper vertex coloring of $G$ such that the sizes of any two color classes differ by at most one. In the paper, we pose a conjecture that offers a gap-one bound for the smallest number of colors needed to equitably color every block graph. In other words, the difference between the upper and the lower bounds of our conjecture is at most one. Thus, in some sense, the situation is similar to that of chromatic index, where we have the classical theorem of Vizing and the Goldberg conjecture for multigraphs. The results obtained in the paper support our conjecture. More precisely, we verify it in the class of well-covered block graphs, which are block graphs in which each vertex belongs to a maximum independent set. We also show that the conjecture is true for block graphs which contain a vertex that does not lie in an independent set of size larger than two. Finally, we verify the conjecture for some symmetric-like block graphs. In order to derive our results we obtain structural characterizations of block graphs from these classes.
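The definition of an equitable coloring given in the abstract can be illustrated with a small checker. This is only a sketch of the definition, not anything taken from the paper: the function name `is_equitable_coloring`, the edge-list and dictionary representation, and the example star graph are all assumptions made for illustration, and the coloring is assumed to be non-empty.

```python
from collections import Counter


def is_equitable_coloring(edges, coloring):
    """Check the two conditions of an equitable coloring.

    `edges` is a list of (u, v) pairs and `coloring` maps every vertex
    to a color (hypothetical representation, assumed non-empty).
    """
    # Proper: the endpoints of every edge receive different colors.
    if any(coloring[u] == coloring[v] for u, v in edges):
        return False
    # Equitable: sizes of the color classes in use differ by at most one.
    sizes = Counter(coloring.values()).values()
    return max(sizes) - min(sizes) <= 1


# A star with center 0 and leaves 1, 2, 3 (a tree, hence a block graph).
star = [(0, 1), (0, 2), (0, 3)]
balanced = {0: "a", 1: "b", 2: "b", 3: "c"}    # class sizes 1, 2, 1
unbalanced = {0: "a", 1: "b", 2: "b", 3: "b"}  # class sizes 1, 3

print(is_equitable_coloring(star, balanced))
print(is_equitable_coloring(star, unbalanced))
```

The second coloring is proper but uses only two colors with class sizes 1 and 3, so it fails the equitability condition even though it uses fewer colors.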
 [245] arXiv:2002.10152 [pdf, other]

Title: Real-time Kinematic Ground Truth for the Oxford RobotCar Dataset
Comments: Dataset website: this https URL
Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV)
We describe the release of reference data towards a challenging long-term localisation and mapping benchmark based on the large-scale Oxford RobotCar Dataset. The release includes 72 traversals of a route through Oxford, UK, gathered in all illumination, weather and traffic conditions, and is representative of the conditions an autonomous vehicle would be expected to operate reliably in. Using post-processed raw GPS, IMU, and static GNSS base station recordings, we have produced a globally-consistent centimetre-accurate ground truth for the entire year-long duration of the dataset. Coupled with a planned online benchmarking service, we hope to enable quantitative evaluation and comparison of different localisation and mapping approaches focusing on long-term autonomy for road vehicles in urban environments challenged by changing weather.
 [246] arXiv:2002.10158 [pdf, other]

Title: Robot Perception of Static and Dynamic Objects with an Autonomous Floor Scrubber
Authors: Zhi Yan, Simon Schreiberhuber, Georg Halmetschlager, Tom Duckett, Markus Vincze, Nicola Bellotto
Comments: 15 pages, 16 figures, submitted to Intelligent Service Robotics
Subjects: Robotics (cs.RO)
This paper presents the perception system of a new professional cleaning robot for large public places. The proposed system is based on multiple sensors including 3D and 2D lidar, two RGB-D cameras and a stereo camera. The two lidars together with an RGB-D camera are used for dynamic object (human) detection and tracking, while the second RGB-D and stereo camera are used for detection of static objects (dirt and ground objects). A learning and reasoning module for spatial-temporal representation of the environment based on the perception pipeline is also introduced. Furthermore, a new dataset collected with the robot in several public places, including a supermarket, a warehouse and an airport, is released. Baseline results on this dataset for further research and comparison are provided. The proposed system has been fully implemented into the Robot Operating System (ROS) with high modularity, also publicly available to the community.
 [247] arXiv:2002.10163 [pdf, other]

Title: Software Engineering Timeline: major areas of interest and multidisciplinary trends
Comments: Technical report University of Almería
Subjects: Software Engineering (cs.SE)
Society today cannot run without software and by extension, without Software Engineering. Since this discipline emerged in 1968, practitioners have learned valuable lessons that have contributed to current practices. Some have become outdated but many are still relevant and widely used. From the personal and incomplete perspective of the authors, this paper not only reviews the major milestones and areas of interest in the Software Engineering timeline helping software engineers to appreciate the state of things, but also tries to give some insights into the trends that this complex engineering will see in the near future.
 [248] arXiv:2002.10171 [pdf, ps, other]

Title: A Probabilistic Approach to Voting, Allocation, Matching, and Coalition Formation
Authors: Haris Aziz
Comments: Preprint for book chapter in "The Future of Economic Design: The Continuing Development of a Field as Envisioned by Its Researchers"
Subjects: Computer Science and Game Theory (cs.GT)
Randomisation and time-sharing are some of the oldest methods to achieve fairness. I make a case that applying these approaches to social choice settings constitutes a powerful paradigm that deserves an extensive and thorough examination. I discuss challenges and opportunities in applying these approaches to settings including voting, allocation, matching, and coalition formation.
 [249] arXiv:2002.10172 [pdf, other]

Title: Optimal strategies in the Fighting Fantasy gaming system: influencing stochastic dynamics by gambling with limited resource
Authors: Iain G. Johnston
Comments: Keyword: stochastic game; Markov decision problem; stochastic simulation; dynamic programming; resource allocation; stochastic optimal control; Bellman equation
Subjects: Multiagent Systems (cs.MA); Artificial Intelligence (cs.AI); Optimization and Control (math.OC)
Fighting Fantasy is a popular recreational fantasy gaming system worldwide. Combat in this system progresses through a stochastic game involving a series of rounds, each of which may be won or lost. Each round, a limited resource ("luck") may be spent on a gamble to amplify the benefit from a win or mitigate the deficit from a loss. However, the success of this gamble depends on the amount of remaining resource, and if the gamble is unsuccessful, benefits are reduced and deficits increased. Players thus dynamically choose to expend resource to attempt to influence the stochastic dynamics of the game, with diminishing probability of positive return. The identification of the optimal strategy for victory is a Markov decision problem that has not yet been solved. Here, we combine stochastic analysis and simulation with dynamic programming to characterise the dynamical behaviour of the system in the absence and presence of gambling policy. We derive a simple expression for the victory probability without luck-based strategy. We use a backward induction approach to solve the Bellman equation for the system and identify the optimal strategy for any given state during the game. The optimal control strategies can dramatically enhance success probabilities, but take detailed forms; we use stochastic simulation to approximate these optimal strategies with simple heuristics that can be practically employed. Our findings provide a roadmap to improving success in the games that millions of people play worldwide, and inform a class of resource allocation problems with diminishing returns in stochastic games.
 [250] arXiv:2002.10174 [pdf, other]

Title: When Relation Networks meet GANs: Relation GANs with Triplet Loss
Comments: 8 pages
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
Though recent research has achieved remarkable progress in generating realistic images with generative adversarial networks (GANs), the lack of training stability is still a lingering concern of most GANs, especially on high-resolution inputs and complex datasets. Since the randomly generated distribution can hardly overlap with the real distribution, training GANs often suffers from the gradient vanishing problem. A number of approaches have been proposed to address this issue by constraining the discriminator's capabilities using empirical techniques, like weight clipping, gradient penalty, spectral normalization etc. In this paper, we provide a more principled approach as an alternative solution to this issue. Instead of training the discriminator to distinguish real and fake input samples, we investigate the relationship between paired samples by training the discriminator to separate paired samples from the same distribution and those from different distributions. To this end, we explore a relation network architecture for the discriminator and design a triplet loss which achieves better generalization and stability. Extensive experiments on benchmark datasets show that the proposed relation discriminator and new loss can provide significant improvement on various vision tasks including unconditional and conditional image generation and image translation. Our source code is available at https://github.com/JosephineRabbit/RelationGAN
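For readers unfamiliar with the term, a triplet loss in its common margin-based form can be sketched as follows. The abstract does not state the exact loss the authors design, so this is the generic textbook formulation rather than the paper's; the function names, the Euclidean distance, and the default margin of 1.0 are assumptions made for illustration.

```python
import math


def euclidean(x, y):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Generic triplet margin loss (not necessarily the paper's variant).

    The loss is zero once the anchor is closer to the positive than to
    the negative by at least `margin`, and grows linearly otherwise.
    """
    gap = euclidean(anchor, positive) - euclidean(anchor, negative)
    return max(0.0, gap + margin)


# The positive is far (distance 5) and the negative is near (distance 1):
# the triplet is violated, so the loss is positive.
violated = triplet_margin_loss([0.0, 0.0], [3.0, 4.0], [0.0, 1.0])

# Roles swapped: the positive is near and the negative is far, so the
# margin is already satisfied and the loss is zero.
satisfied = triplet_margin_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0])
```

In a relation-style discriminator, the three inputs would presumably be embeddings of sample pairs rather than raw vectors; that wiring is not described in the abstract and is left out here.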
 [251] arXiv:2002.10177 [pdf, other]

Title: Improving STDP-based Visual Feature Learning with Whitening
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
In recent years, spiking neural networks (SNNs) have emerged as an alternative to deep neural networks (DNNs). SNNs present a higher computational efficiency using low-power neuromorphic hardware and require less labeled data for training using local and unsupervised learning rules such as spike timing-dependent plasticity (STDP). SNNs have proven their effectiveness in image classification on simple datasets such as MNIST. However, to process natural images, a pre-processing step is required. Difference-of-Gaussians (DoG) filtering is typically used together with on-center/off-center coding, but it results in a loss of information that is detrimental to the classification performance. In this paper, we propose to use whitening as a pre-processing step before learning features with STDP. Experiments on CIFAR-10 show that whitening allows STDP to learn visual features that are closer to the ones learned with standard neural networks, with a significantly increased classification performance as compared to DoG filtering. We also propose an approximation of whitening as convolution kernels that is computationally cheaper to learn and more suited to be implemented on neuromorphic hardware. Experiments on CIFAR-10 show that it performs similarly to regular whitening. Cross-dataset experiments on CIFAR-10 and STL-10 also show that it is fairly stable across datasets, making it possible to learn a single whitening transformation to process different datasets.
 [252] arXiv:2002.10179 [pdf, other]

Title: HRank: Filter Pruning using High-Rank Feature Map
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Neural network pruning offers a promising prospect to facilitate deploying deep neural networks on resource-limited devices. However, existing methods are still challenged by the training inefficiency and labor cost in pruning designs, due to missing theoretical guidance of non-salient network components. In this paper, we propose a novel filter pruning method by exploring the High Rank of feature maps (HRank). Our HRank is inspired by the discovery that the average rank of multiple feature maps generated by a single filter is always the same, regardless of the number of image batches CNNs receive. Based on HRank, we develop a method that is mathematically formulated to prune filters with low-rank feature maps. The principle behind our pruning is that low-rank feature maps contain less information, and thus pruned results can be easily reproduced. Besides, we experimentally show that weights with high-rank feature maps contain more important information, such that even when a portion is not updated, very little damage would be done to the model performance. Without introducing any additional constraints, HRank leads to significant improvements over the state of the art in terms of FLOPs and parameters reduction, with similar accuracies. For example, with ResNet-110, we achieve a 58.2% FLOPs reduction by removing 59.2% of the parameters, with only a small loss of 0.14% in top-1 accuracy on CIFAR-10. With ResNet-50, we achieve a 43.8% FLOPs reduction by removing 36.7% of the parameters, with only a loss of 1.17% in the top-1 accuracy on ImageNet. The code is available at https://github.com/lmbxmu/HRank.
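The idea of scoring filters by the average rank of their feature maps and pruning the lowest-scoring ones can be sketched in a few lines. This is an illustrative toy under stated assumptions, not the authors' implementation (which is at the repository above): feature maps are plain nested lists, rank is computed by Gaussian elimination with a hypothetical tolerance `tol`, and the helper names `matrix_rank`, `average_ranks` and `filters_to_keep` are invented for this example.

```python
def matrix_rank(matrix, tol=1e-9):
    """Rank of a small dense matrix via Gaussian elimination with pivoting."""
    rows = [[float(x) for x in row] for row in matrix]
    n_cols = len(rows[0]) if rows else 0
    rank = 0
    for col in range(n_cols):
        if rank == len(rows):
            break
        # Pick the remaining row with the largest entry in this column.
        pivot = max(range(rank, len(rows)), key=lambda r: abs(rows[r][col]))
        if abs(rows[pivot][col]) <= tol:
            continue  # no usable pivot in this column
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for r in range(rank + 1, len(rows)):
            factor = rows[r][col] / rows[rank][col]
            for c in range(col, n_cols):
                rows[r][c] -= factor * rows[rank][c]
        rank += 1
    return rank


def average_ranks(feature_maps):
    """feature_maps[i] holds the maps filter i produced for several inputs."""
    return [sum(matrix_rank(m) for m in maps) / len(maps) for maps in feature_maps]


def filters_to_keep(feature_maps, keep):
    """Indices of the `keep` filters with the highest average rank."""
    scores = average_ranks(feature_maps)
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(order[:keep])


# Three toy filters, each with two 2x2 feature maps.
maps = [
    [[[1, 0], [0, 1]], [[1, 2], [3, 4]]],  # ranks 2 and 2
    [[[1, 1], [1, 1]], [[0, 0], [0, 0]]],  # ranks 1 and 0
    [[[1, 2], [2, 4]], [[1, 0], [0, 1]]],  # ranks 1 and 2
]
kept = filters_to_keep(maps, keep=2)
```

Here filter 1 has the lowest average rank and is the one dropped. A practical version would compute ranks on real activations with a numerical library and prune layer by layer, details the abstract does not specify.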
 [253] arXiv:2002.10181 [pdf, other]

Title: Relaxing Relationship Queries on Graph Data
Comments: 16 pages, accepted to JoWS
Subjects: Information Retrieval (cs.IR); Databases (cs.DB)
In many domains we have witnessed the need to search a large entity-relation graph for direct and indirect relationships between a set of entities specified in a query. A search result, called a semantic association (SA), is typically a compact (e.g., diameter-constrained) connected subgraph containing all the query entities. For this problem of SA search, efficient algorithms exist but will return empty results if some query entities are distant in the graph. To reduce the occurrence of failing queries and provide alternative results, we study the problem of query relaxation in the context of SA search. Simply relaxing the compactness constraint will sacrifice the compactness of an SA, and more importantly, may lead to performance issues and be impracticable. Instead, we focus on removing the smallest number of entities from the original failing query, to form a maximum successful sub-query which minimizes the loss of result quality caused by relaxation. We prove that verifying the success of a sub-query turns into finding an entity (called a certificate) that satisfies a distance-based condition about the query entities. To efficiently find a certificate of the success of a maximum sub-query, we propose a best-first search algorithm that leverages distance-based estimation to effectively prune the search space. We further improve its performance by adding two fine-grained heuristics: one based on degree and the other based on distance. Extensive experiments over popular RDF datasets demonstrate the efficiency of our algorithm, which is more scalable than baselines.
 [254] arXiv:2002.10185 [pdf, other]

Title: iLQGames.jl: Rapidly Designing and Solving Differential Games in Julia
Subjects: Multiagent Systems (cs.MA)
In many problems that involve multiple decision-making agents, optimal choices for each agent depend on the choices of others. Differential game theory provides a principled formalism for expressing these coupled interactions and recent work offers efficient approximations to solve these problems to non-cooperative equilibria. iLQGames.jl is a framework for designing and solving differential games, built around the iterative linear-quadratic method. It is written in the Julia programming language to allow flexible prototyping and integration with other research software, while leveraging the high-performance nature of the language to allow real-time execution. The open-source software package can be found at https://github.com/lassepe/iLQGames.jl.
 [255] arXiv:2002.10187 [pdf, other]

Title: 3DSSD: Point-based 3D Single Stage Object Detector
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Currently, there have been many kinds of voxel-based 3D single stage detectors, while point-based single stage methods are still underexplored. In this paper, we first present a lightweight and effective point-based 3D single stage object detector, named 3DSSD, achieving a good balance between accuracy and efficiency. In this paradigm, all upsampling layers and the refinement stage, which are indispensable in all existing point-based methods, are abandoned to reduce the large computation cost. We propose a novel fusion sampling strategy in the downsampling process to make detection on less representative points feasible. A delicate box prediction network, including a candidate generation layer and an anchor-free regression head with a 3D center-ness assignment strategy, is designed to meet our demands of accuracy and speed. Our paradigm is an elegant single stage anchor-free framework, showing great superiority to other existing methods. We evaluate 3DSSD on the widely used KITTI dataset and the more challenging nuScenes dataset. Our method outperforms all state-of-the-art voxel-based single stage methods by a large margin, and has comparable performance to two stage point-based methods as well, with inference speed more than 25 FPS, 2x faster than former state-of-the-art point-based methods.
 [256] arXiv:2002.10191 [pdf, other]

Title: Learning Attentive Pairwise Interaction for Fine-Grained Classification
Comments: Accepted at AAAI-2020
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Fine-grained classification is a challenging problem, due to subtle differences among highly-confused categories. Most approaches address this difficulty by learning a discriminative representation of each individual input image. On the other hand, humans can effectively identify contrastive clues by comparing image pairs. Inspired by this fact, this paper proposes a simple but effective Attentive Pairwise Interaction Network (API-Net), which can progressively recognize a pair of fine-grained images by interaction. Specifically, API-Net first learns a mutual feature vector to capture semantic differences in the input pair. It then compares this mutual vector with individual vectors to generate gates for each input image. These distinct gate vectors inherit mutual context on semantic differences, which allow API-Net to attentively capture contrastive clues by pairwise interaction between two images. Additionally, we train API-Net in an end-to-end manner with a score ranking regularization, which can further generalize API-Net by taking feature priorities into account. We conduct extensive experiments on five popular benchmarks in fine-grained classification. API-Net outperforms the recent SOTA methods, i.e., CUB-200-2011 (90.0%), Aircraft (93.9%), Stanford Cars (95.3%), Stanford Dogs (90.3%), and NABirds (88.1%).
 [257] arXiv:2002.10198 [pdf, other]

Title: Leveraging Code Generation to Improve Code Retrieval and Summarization via Dual Learning
Comments: Published at The Web Conference (WWW) 2020, full paper
Subjects: Information Retrieval (cs.IR); Computation and Language (cs.CL); Software Engineering (cs.SE)
Code summarization generates a brief natural language description given a source code snippet, while code retrieval fetches relevant source code given a natural language query. Since both tasks aim to model the association between natural language and programming language, recent studies have combined these two tasks to improve their performance. However, researchers have not yet been able to effectively leverage the intrinsic connection between the two tasks as they train these tasks in a separate or pipeline manner, which means their performance cannot be well balanced. In this paper, we propose a novel end-to-end model for the two tasks by introducing an additional code generation task. More specifically, we explicitly exploit the probabilistic correlation between code summarization and code generation with dual learning, and utilize the two encoders for code summarization and code generation to train the code retrieval task via multi-task learning. We have carried out extensive experiments on an existing dataset of SQL and Python, and results show that our model can significantly improve the results of the code retrieval task over the state-of-the-art models, as well as achieve competitive performance in terms of BLEU score for the code summarization task.
 [258] arXiv:2002.10199 [pdf, ps, other]

Title: Better Classifier Calibration for Small Data Sets
Comments: Accepted for publication in ACM Transactions on Knowledge Discovery from Data
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Classifier calibration does not always go hand in hand with the classifier's ability to separate the classes. There are applications where good classifier calibration, i.e. the ability to produce accurate probability estimates, is more important than class separation. When the amount of data for training is limited, the traditional approach to improve calibration starts to crumble. In this article we show how generating more data for calibration is able to improve calibration algorithm performance in many cases where a classifier is not naturally producing well-calibrated outputs and the traditional approach fails. The proposed approach adds computational cost, but considering that the main use case is with small data sets, this extra computational cost stays insignificant and is comparable to other methods in prediction time. Of the tested classifiers, the largest improvement was detected with the random forest and naive Bayes classifiers. Therefore, the proposed approach can be recommended at least for those classifiers when the amount of data available for training is limited and good calibration is essential.
 [259] arXiv:2002.10200 [pdf, other]

Title: ABCNet: Real-time Scene Text Spotting with Adaptive Bezier-Curve Network
Comments: Accepted to Proc. IEEE Conf. Comp. Vis. Pattern Recogn. (CVPR) 2020
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Scene text detection and recognition has received increasing research attention. Existing methods can be roughly categorized into two groups: character-based and segmentation-based. These methods either are costly for character annotation or need to maintain a complex pipeline, which is often not suitable for real-time applications. Here we address the problem by proposing the Adaptive Bezier-Curve Network (ABCNet). Our contributions are three-fold: 1) For the first time, we adaptively fit arbitrarily-shaped text by a parameterized Bezier curve. 2) We design a novel BezierAlign layer for extracting accurate convolution features of a text instance with arbitrary shapes, significantly improving the precision compared with previous methods. 3) Compared with standard bounding box detection, our Bezier curve detection introduces negligible computation overhead, resulting in superiority of our method in both efficiency and accuracy. Experiments on arbitrarily-shaped benchmark datasets, namely Total-Text and CTW1500, demonstrate that ABCNet achieves state-of-the-art accuracy, meanwhile significantly improving the speed. In particular, on Total-Text, our real-time version is over 10 times faster than recent state-of-the-art methods with a competitive recognition accuracy. Code is available at https://tinyurl.com/AdelaiDet
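To make the notion of a parameterized Bezier curve concrete, the sketch below evaluates a curve from its control points with de Casteljau's algorithm. It is a generic illustration, not the ABCNet code: the abstract does not state the curve degree or how control points are predicted, so the four control points (a cubic curve), the function names, and the sampling helper are assumptions.

```python
def bezier_point(control_points, t):
    """Evaluate a Bezier curve at parameter t in [0, 1] (de Casteljau)."""
    points = [tuple(p) for p in control_points]
    # Repeatedly interpolate between neighbours until one point remains.
    while len(points) > 1:
        points = [
            tuple((1 - t) * a + t * b for a, b in zip(p, q))
            for p, q in zip(points, points[1:])
        ]
    return points[0]


def sample_curve(control_points, n):
    """Return n evenly spaced (in t) points on the curve; assumes n >= 2."""
    return [bezier_point(control_points, i / (n - 1)) for i in range(n)]


# Hypothetical control points for one curved edge of a text region.
edge = [(0.0, 0.0), (1.0, 2.0), (3.0, 2.0), (4.0, 0.0)]
midpoint = bezier_point(edge, 0.5)
outline = sample_curve(edge, 5)
```

The curve starts at the first control point and ends at the last, so a curved text boundary is described by a handful of numbers instead of a dense polygon, which is consistent with the low overhead the abstract reports.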
 [260] arXiv:2002.10210 [pdf, other]

Title: Learning to Select Bi-Aspect Information for Document-Scale Text Content Manipulation
Comments: accepted by AAAI-2020
Subjects: Computation and Language (cs.CL)
In this paper, we focus on a new practical task, document-scale text content manipulation, which is the opposite of text style transfer and aims to preserve text styles while altering the content. In detail, the input is a set of structured records and a reference text for describing another recordset. The output is a summary that accurately describes the partial content in the source recordset with the same writing style of the reference. The task is unsupervised due to lack of parallel data, and it is challenging to select suitable records and style words from bi-aspect inputs respectively and generate a high-fidelity long document. To tackle those problems, we first build a dataset based on a basketball game report corpus as our testbed, and present an unsupervised neural model with interactive attention mechanism, which is used for learning the semantic relationship between records and reference texts to achieve better content transfer and better style preservation. In addition, we also explore the effectiveness of back-translation in our task for constructing some pseudo-training pairs. Empirical results show superiority of our approaches over competitive methods, and the models also yield a new state-of-the-art result on a sentence-level dataset.
 [261] arXiv:2002.10211 [pdf, other]

Title: Mnemonics Training: MultiClass Incremental Learning without ForgettingComments: Accepted by CVPR 2020Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
MultiClass Incremental Learning (MCIL) aims to learn new concepts by incrementally updating a model trained on previous concepts. However, there is an inherent tradeoff to effectively learning new concepts without forgetting previous ones, potentially leading to catastrophic forgetting of previous concepts. To alleviate this issue, it has been proposed to keep around a few examples of the previous concepts but the effectiveness of this approach heavily depends on the representativeness of these examples. This paper proposes a novel and automatic framework we call mnemonics, where we parameterize exemplars and make them optimizable in an endtoend manner. We train the framework through bilevel optimizations, i.e., modellevel and exemplarlevel. We conduct extensive experiments on three MCIL benchmarks, CIFAR100, ImageNetSubset and ImageNet, and show that using mnemonics exemplars can surpass the stateoftheart by a large margin. Interestingly and quite intriguingly, the mnemonics exemplars tend to be on the boundaries between classes.
 [262] arXiv:2002.10212 [pdf, ps, other]

Title: A Mechanised Semantics for HOL with Adhoc OverloadingComments: 18 pages, submitted to LPAR 2020Subjects: Logic in Computer Science (cs.LO)
Isabelle/HOL augments classical higher-order logic with ad-hoc overloading of constant definitions; that is, one constant may have several definitions for non-overlapping types. In this paper, we present a mechanised proof that HOL with ad-hoc overloading is consistent. All our results have been formalised in the HOL4 theorem prover.
 [263] arXiv:2002.10213 [pdf, other]

Title: Superoptimization of WebAssembly BytecodeAuthors: Javier Cabrera Arteaga, Shrinish Donde, Jian Gu, Orestis Floros, Lucas Satabin, Benoit Baudry, Martin MonperrusComments: 4 pages, 3 figures, MoreVMs 2020Subjects: Programming Languages (cs.PL)
Motivated by the fast adoption of WebAssembly, we propose the first functional pipeline to support the superoptimization of WebAssembly bytecode. Our pipeline works over LLVM and Souper. We evaluate our superoptimization pipeline with 12 programs from the Rosetta code project. Our pipeline improves the code section size of 8 out of 12 programs. We discuss the challenges faced in superoptimization of WebAssembly with two case studies.
 [264] arXiv:2002.10214 [pdf, other]

Title: Injective Domain Knowledge in Neural Networks for Transprecision ComputingSubjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Machine Learning (ML) models are very effective in many learning tasks, due to the capability to extract meaningful information from large data sets. Nevertheless, there are learning problems that cannot be easily solved relying on pure data, e.g. scarce data or very complex functions to be approximated. Fortunately, in many contexts domain knowledge is explicitly available and can be used to train better ML models. This paper studies the improvements that can be obtained by integrating prior knowledge when dealing with a nontrivial learning task, namely precision tuning of transprecision computing applications. The domain information is injected in the ML models in different ways: I) additional features, II) adhoc graphbased network topology, III) regularization schemes. The results clearly show that ML models exploiting problemspecific information outperform the purely datadriven ones, with an average accuracy improvement around 38%.
 [265] arXiv:2002.10215 [pdf, other]

Title: On the General Value of Evidence, and Bilingual SceneText Visual Question AnsweringAuthors: Xinyu Wang, Yuliang Liu, Chunhua Shen, Chun Chet Ng, Canjie Luo, Lianwen Jin, Chee Seng Chan, Anton van den Hengel, Liangwei WangComments: Accepted to Proc. IEEE Conf. Computer Vision and Pattern Recognition 2020Subjects: Computer Vision and Pattern Recognition (cs.CV)
Visual Question Answering (VQA) methods have made incredible progress, but suffer from a failure to generalize. This is visible in the fact that they are vulnerable to learning coincidental correlations in the data rather than deeper relations between image content and ideas expressed in language. We present a dataset that takes a step towards addressing this problem in that it contains questions expressed in two languages, and an evaluation process that coopts a well understood imagebased metric to reflect the method's ability to reason. Measuring reasoning directly encourages generalization by penalizing answers that are coincidentally correct. The dataset reflects the scenetext version of the VQA problem, and the reasoning evaluation can be seen as a textbased version of a referring expression challenge. Experiments and analysis are provided that show the value of the dataset.
 [266] arXiv:2002.10217 [pdf, other]

Title: Automatic Estimation of Sphere Centers from Images of Calibrated CamerasSubjects: Computer Vision and Pattern Recognition (cs.CV)
Calibration of devices with different modalities is a key problem in robotic vision. Regular spatial objects, such as planes, are frequently used for this task. This paper deals with the automatic detection of ellipses in camera images, as well as the estimation of the 3D positions of the spheres corresponding to the detected 2D ellipses. We propose two novel methods to (i) detect an ellipse in camera images and (ii) estimate the spatial location of the corresponding sphere if its size is known. The algorithms are tested both quantitatively and qualitatively. They are applied for calibrating the sensor system of autonomous cars equipped with digital cameras, depth sensors and LiDAR devices.
 [267] arXiv:2002.10220 [pdf, ps, other]

Title: On the use of the Infinity Computer architecture to set up a dynamic precision floatingpoint arithmeticComments: 11 pages, 2 figures, 6 tablesSubjects: Numerical Analysis (math.NA)
We devise a variable-precision floating-point arithmetic by exploiting the framework provided by the Infinity Computer. This is a computational platform implementing the Infinity Arithmetic system, a positional numeral system which can handle both infinite and infinitesimal quantities symbolized by the positive and negative finite powers of the radix grossone. The computational features offered by the Infinity Computer allow us to dynamically change the accuracy of representation and floating-point operations during the flow of a computation. When suitably implemented, this possibility turns out to be particularly advantageous when solving ill-conditioned problems. In fact, compared with a standard multi-precision arithmetic, here the accuracy is improved only when needed, so the overall computational effort is not greatly affected. An illustrative example about the solution of a nonlinear equation is also presented.
 [268] arXiv:2002.10221 [pdf, ps, other]

Title: The Archimedean trap: Why traditional reinforcement learning will probably not yield AGIAuthors: Samuel Allen AlexanderComments: 16 pagesSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
After generalizing the Archimedean property of real numbers in such a way as to make it adaptable to nonnumeric structures, we demonstrate that the real numbers cannot be used to accurately measure nonArchimedean structures. We argue that, since an agent with Artificial General Intelligence (AGI) should have no problem engaging in tasks that inherently involve nonArchimedean rewards, and since traditional reinforcement learning rewards are real numbers, therefore traditional reinforcement learning cannot lead to AGI. We indicate two possible ways traditional reinforcement learning could be altered to remove this roadblock.
 [269] arXiv:2002.10228 [pdf]

Title: Dynamic Systems Simulation and Control Using Consecutive Recurrent Neural NetworksComments: 14 pages, granted for publication in Communications in Computer and Information Science (CCIS) proceedings by Springer Nature, presented in the International Conference on Modelling, Machine Learning and Astronomy (MMLA 2019)Subjects: Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Signal Processing (eess.SP)
In this paper, we introduce a novel architecture for connecting adaptive learning and neural networks into an arbitrary machine's control system paradigm. Two consecutive Recurrent Neural Networks (RNNs) are used together to accurately model the dynamic characteristics of electromechanical systems that include controllers, actuators and motors. The age-old method of achieving control with the use of the Proportional, Integral and Derivative constants is well understood as a simplified method that does not capture the complexities of the inherent nonlinearities of complex control systems. In the context of controlling and simulating electromechanical systems, we propose an alternative to PID controllers, employing a sequence of two Recurrent Neural Networks. The first RNN emulates the behavior of the controller, and the second the actuator/motor. The second RNN, when used in isolation, potentially serves as an advantageous alternative to extant testing methods of electromechanical systems.
 [270] arXiv:2002.10231 [pdf, other]

Title: Linearfrictional contact model for 3D discrete element simulations of granular systemsComments: 6 figuresJournalref: Int J Numer Methods Eng. 2020, 121(3), 560569Subjects: Computational Engineering, Finance, and Science (cs.CE); Soft Condensed Matter (condmat.soft)
The linear-frictional contact model is the most commonly used contact mechanism for discrete element (DEM) simulations of granular materials. Linear springs with a frictional slider are used for modeling interactions in directions normal and tangential to the contact surface. Although the model is simple in two dimensions, its implementation in 3D faces certain subtle challenges, and the particle interactions that occur within a single time-step require careful modeling with a robust algorithm. The paper details a 3D algorithm that accounts for the changing direction of the tangential force within a time-step, the transition from elastic to slip behavior within a time-step, possible contact sliding during only part of a time-step, and twirling and rotation of the tangential force during a time-step. Without three of these adjustments, errors are introduced in the incremental stiffness of an assembly. Without the fourth adjustment, the resulting stress tensor is not only incorrect, it is no longer a tensor. The algorithm also computes the work increments during a time-step, both elastic and dissipative.
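As a rough illustration of the basic model described in this abstract (linear normal and tangential springs with a Coulomb slider), the following sketch performs one force update for a single contact. It is a textbook-style simplification and not the paper's algorithm: it ignores the within-time-step effects the abstract lists. The function name `update_contact`, its parameters, and the sign convention (spring force grows with the displacement increment) are all assumptions made for illustration.

```python
import math

def update_contact(f_t, du_t, overlap, k_n, k_t, mu):
    """One increment of a basic linear-frictional contact (hypothetical helper).

    f_t     : current tangential force, a 3-tuple assumed to lie in the contact plane
    du_t    : tangential relative displacement increment, a 3-tuple
    overlap : normal overlap; a non-positive value means the particles do not touch
    k_n, k_t: normal and tangential spring stiffnesses
    mu      : friction coefficient of the slider
    Returns (normal force magnitude, new tangential force, slipped flag).
    """
    if overlap <= 0.0:
        # No contact: both springs carry no force.
        return 0.0, (0.0, 0.0, 0.0), False

    f_n = k_n * overlap  # linear normal spring

    # Elastic trial step of the tangential spring.
    trial = tuple(f + k_t * du for f, du in zip(f_t, du_t))
    magnitude = math.sqrt(sum(c * c for c in trial))

    # Coulomb slider: cap the tangential force at mu * f_n, keeping its direction.
    limit = mu * f_n
    if magnitude > limit:
        scale = limit / magnitude
        return f_n, tuple(c * scale for c in trial), True
    return f_n, trial, False
```

For example, `update_contact((0.0, 0.0, 0.0), (0.01, 0.0, 0.0), 0.001, 1000.0, 1000.0, 0.5)` gives a trial tangential force larger than the friction limit, so the force is scaled back onto the limit and the slip flag is set. Applying the cap only at the end of the step is exactly the kind of shortcut the abstract argues needs refinement in 3D.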
 [271] arXiv:2002.10233 [pdf, ps, other]

Title: ArcText: An Unified Text Approach to Describing Convolutional Neural Network ArchitecturesAuthors: Yanan SunSubjects: Information Retrieval (cs.IR); Machine Learning (cs.LG); Image and Video Processing (eess.IV)
Numerous Convolutional Neural Network (CNN) models have demonstrated promising performance, mostly in computer vision. The superiority of CNNs mainly relies on their complex architectures, which are often manually designed with extensive human expertise. Data mining on CNN architectures can discover useful patterns and fundamental subcomponents from existing CNN architectures, providing researchers who have no expertise in CNNs with strong prior knowledge for designing CNN architectures. Various state-of-the-art data mining algorithms are at hand, yet little work has applied them for this purpose. The main reason behind this is the barrier between CNN architectures and data mining algorithms. Specifically, current CNN architecture descriptions cannot be exactly vectorized as input to data mining algorithms. In this paper, we propose a unified approach, named ArcText, to describing CNN architectures based on text. In particular, three different units of ArcText and an ordering method have been carefully designed to uniquely describe an architecture with sufficient information. The resulting description can also be exactly converted back to the corresponding CNN architecture. ArcText bridges the gap between CNN and data mining researchers, and has the potential to be used in wider scenarios.
 [272] arXiv:2002.10234 [pdf, other]

Title: FRTrain: A mutual informationbased approach to fair and robust trainingSubjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Trustworthy AI is a critical issue in machine learning where, in addition to training a model that is accurate, one must consider both fair and robust training in the presence of data bias and poisoning. However, the existing model fairness techniques mistakenly view poisoned data as an additional bias, resulting in severe performance degradation. To fix this problem, we propose FRTrain, which holistically performs fair and robust model training. We provide a mutual informationbased interpretation of an existing adversarial trainingbased fairnessonly method, and apply this idea to architect an additional discriminator that can identify poisoned data using a clean validation set and reduce its influence. In our experiments, FRTrain shows almost no decrease in fairness and accuracy in the presence of data poisoning by both mitigating the bias and defending against poisoning. We also demonstrate how to construct clean validation sets using crowdsourcing, and release new benchmark datasets.
 [273] arXiv:2002.10235 [pdf, other]

Title: Recurrent Dirichlet Belief Networks for Interpretable Dynamic Relational Data ModellingComments: 7 pages, 3 figuresSubjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
The Dirichlet Belief Network (DirBN) has been recently proposed as a promising approach in learning interpretable deep latent representations for objects. In this work, we leverage its interpretable modelling architecture and propose a deep dynamic probabilistic framework, the Recurrent Dirichlet Belief Network (Recurrent-DBN), to study interpretable hidden structures from dynamic relational data. The proposed Recurrent-DBN has the following merits: (1) it infers interpretable and organised hierarchical latent structures for objects within and across time steps; (2) it enables recurrent long-term temporal dependence modelling, which outperforms the one-order Markov descriptions in most of the dynamic probabilistic frameworks. In addition, we develop a new inference strategy, which first upward-and-backward propagates latent counts and then downward-and-forward samples variables, to enable efficient Gibbs sampling for the Recurrent-DBN. We apply the Recurrent-DBN to dynamic relational data problems. The extensive experiment results on real-world data validate the advantages of the Recurrent-DBN over the state-of-the-art models in interpretable latent structure discovery and improved link prediction performance.
 [274] arXiv:2002.10241 [pdf, other]

Title: Multiobjective Consensus Clustering Framework for Flight Search RecommendationSubjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Machine Learning (stat.ML)
In the travel industry, online customers book their travel itinerary according to several features, like cost and duration of the travel or the quality of amenities. To provide personalized recommendations for travel searches, an appropriate segmentation of customers is required. Clustering ensemble approaches were developed to overcome wellknown problems of classical clustering approaches, that each rely on a different theoretical model and can thus identify in the data space only clusters corresponding to this model. Clustering ensemble approaches combine multiple clustering results, each from a different algorithmic configuration, for generating more robust consensus clusters corresponding to agreements between initial clusters. We present a new clustering ensemble multiobjective optimizationbased framework developed for analyzing Amadeus customer search data and improve personalized recommendations. This framework optimizes diversity in the clustering ensemble search space and automatically determines an appropriate number of clusters without requiring user's input. Experimental results compare the efficiency of this approach with other existing approaches on Amadeus customer search data in terms of internal (Adjusted Rand Index) and external (Amadeus business metric) validations.
 [275] arXiv:2002.10242 [pdf, other]

Title: Age of Information Optimized MAC in V2X Sidelink via PiggybackBased CollaborationComments: Submitted to IEEE TWC for possible publicationSubjects: Information Theory (cs.IT); Networking and Internet Architecture (cs.NI)
Real-time status update in future vehicular networks is vital to enable control-level cooperative autonomous driving. Cellular Vehicle-to-Everything (C-V2X), as one of the most promising vehicular wireless technologies, adopts a Semi-Persistent Scheduling (SPS) based Medium-Access-Control (MAC) layer protocol for its sidelink communications. Despite the recent and ongoing efforts to optimize SPS, very little work has considered the status update performance of SPS. In this paper, Age of Information (AoI) is first leveraged to evaluate the MAC layer performance of C-V2X sidelink. Critical issues of SPS, i.e., persistent packet collisions and Half-Duplex (HD) effects, are identified to hinder its AoI performance. Therefore, a piggyback-based collaboration method is proposed accordingly, whereby vehicles collaborate to inform each other of potential collisions and collectively afford HD errors, while entailing only a small signaling overhead. Closed-form AoI performance is derived for the proposed scheme, optimal configurations for key parameters are hence calculated, and the convergence property is proved for decentralized implementation. Simulation results show that compared with the standardized SPS and its state-of-the-art enhancement schemes, the proposed scheme shows significantly better performance, not only in terms of AoI, but also of conventional metrics such as transmission reliability.
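To make the Age of Information metric used in this abstract concrete, here is a minimal sketch of the usual time-average AoI for a single receiver: the age grows linearly with time and drops to the delivery delay of each freshly received update. This is the generic definition of the metric, not the paper's closed-form analysis of SPS. The function name `average_aoi` and its input format are assumptions made for illustration.

```python
def average_aoi(updates, horizon, initial_age=0.0):
    """Time-average Age of Information over [0, horizon] (hypothetical helper).

    updates     : list of (generated_at, received_at) pairs, assumed sorted so that
                  both generation and reception times increase
    horizon     : end of the observation window
    initial_age : age of the receiver's information at time 0
    """
    area = 0.0          # integral of the sawtooth age curve so far
    now = 0.0           # time up to which the area has been accumulated
    age = initial_age   # age at time `now`

    for generated_at, received_at in updates:
        if received_at > horizon:
            break
        dt = received_at - now
        # Age grows with slope 1, so each segment is a trapezoid.
        area += age * dt + 0.5 * dt * dt
        now = received_at
        # A fresh update resets the age to its own delivery delay.
        age = received_at - generated_at

    # Remaining stretch after the last received update.
    dt = horizon - now
    area += age * dt + 0.5 * dt * dt
    return area / horizon
```

With this definition, a lost update (for example, one that collides persistently) simply never appears in `updates`, so the age keeps growing until the next successful reception. This is why collisions and half-duplex misses hurt AoI more than they hurt a plain throughput figure.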
 [276] arXiv:2002.10244 [pdf, other]

Title: FractionalOrder Models for the Static and Dynamic Analysis of Nonlocal PlatesComments: 26 pages, 3 figures, 13 Tables. arXiv admin note: text overlap with arXiv:2001.06885, arXiv:2002.07148Subjects: Computational Engineering, Finance, and Science (cs.CE); Analysis of PDEs (math.AP); Dynamical Systems (math.DS); Numerical Analysis (math.NA)
This study presents the analytical formulation and the finite element solution of fractional-order nonlocal plates under both Mindlin and Kirchhoff formulations. By employing consistent definitions for fractional-order kinematic relations, the governing equations and the associated boundary conditions are derived based on variational principles. Remarkably, the fractional-order nonlocal model gives rise to a self-adjoint and positive-definite system that admits a unique solution. Further, owing to the difficulty in obtaining analytical solutions to this fractional-order differ-integral problem, a 2D finite element model for the fractional-order governing equations is presented. Following a thorough validation with benchmark problems, the 2D fractional finite element model is used to study the static as well as the free dynamic response of fractional-order plates subject to various loading and boundary conditions. It is established that the fractional-order nonlocality leads to a reduction in the stiffness of the plate structure, thereby increasing the displacements and reducing the natural frequency of vibration of the plates. Further, it is seen that the effect of nonlocality is stronger on the higher modes of vibration when compared to the fundamental mode. These effects of the fractional-order nonlocality are noted irrespective of the nature of the boundary conditions. More specifically, the fractional-order model of nonlocal plates is free from boundary effects that lead to paradoxical predictions such as hardening and absence of nonlocal effects in classical integral approaches to nonlocal elasticity. This consistency in the predictions is a result of the well-posed nature of the fractional-order governing equations that admit a unique solution.
 [277] arXiv:2002.10245 [pdf, other]

Title: Specializing Coherence, Consistency, and Push/Pull for GPU Graph AnalyticsAuthors: Giordano Salvador, Wesley H. Darvin, Muhammad Huzaifa, Johnathan Alsop, Matthew D. Sinclair, Sarita V. AdveSubjects: Distributed, Parallel, and Cluster Computing (cs.DC)
This work provides the first study to explore the interaction of update propagation with and without fine-grained synchronization (push vs. pull), emerging coherence protocols (GPU vs. DeNovo coherence), and software-centric consistency models (DRF0, DRF1, and DRFrlx) for graph workloads on emerging integrated GPU-CPU systems with native unified shared memory. We study 6 graph applications with 6 graph inputs for a total of 36 workloads running on 12 system (hardware+software) configurations reflecting the above design space of update propagation, coherence, and memory consistency. We make three key contributions. First, we show that there is no single best system configuration for all workloads, motivating systems with flexible coherence and consistency support. Second, we develop a model to accurately predict the best system configuration; this model can be used by software designers to decide on push vs. pull and the consistency model, and by flexible hardware to invoke the appropriate coherence and consistency configuration for the given workload. Third, we show that the design dimensions explored here are interdependent, reinforcing the need for software-hardware co-design in the above design dimensions. For example, software designers deciding on push vs. pull must consider the consistency model supported by hardware: in some cases, push may be better if hardware supports DRFrlx, while pull may be better if hardware does not support DRFrlx.
 [278] arXiv:2002.10246 [pdf, other]

Title: A subtractive manufacturing constraint for level set topology optimizationJournalref: Structural and Multidisciplinary Optimization (2020)Subjects: Computational Engineering, Finance, and Science (cs.CE); Optimization and Control (math.OC)
We present a method for enforcing manufacturability constraints in generated parts such that they will be automatically ready for fabrication using a subtractive approach. We primarily target multiaxis CNC milling approaches but the method should generalize to other subtractive methods as well. To this end, we take as user input: the radius of curvature of the tool bit, a coarse model of the tool head and optionally a set of milling directions. This allows us to enforce the following manufacturability conditions: 1) surface smoothness such that the radius of curvature of the part does not exceed the milling bit radius, 2) orientation such that every part of the surface to be milled is visible from at least one milling direction, 3) accessibility such that every surface patch can be reached by the tool bit without interference with the tool or head mount. We will show how to efficiently enforce the constraint during level setbased topology optimization modifying the advection velocity such that at each iteration the topology optimization maintains a descent optimization direction and does not violate any of the manufacturability conditions. This approach models the actual subtractive process by carving away material accessible to the machine at each iteration until a local optimum is achieved.
 [279] arXiv:2002.10248 [pdf, other]

Title: BayesProbe: DistributionGuided Sampling for Prediction Level SetsComments: Significantly expanded version of arXiv:2001.03076, with new problem formulation and experimentsSubjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Building machine learning models requires a suite of tools for interpretation, understanding, and debugging. Many existing methods have been proposed, but it can still be difficult to probe for examples which communicate model behaviour. We introduce BayesProbe, a model inspection method for analyzing neural networks by generating distributionconforming examples of known prediction confidence. By selecting appropriate distributions and confidence prediction values, BayesProbe can be used to synthesize ambivalent predictions, uncover indistribution adversarial examples, and understand novelclass extrapolation and domain adaptation behaviours. BayesProbe is model agnostic, requiring only a data generator and classifier prediction. We use BayesProbe to analyze models trained on both procedurallygenerated data (CLEVR) and organic data (MNIST and FashionMNIST). Code is available at https://github.com/serenabooth/BayesProbe.
 [280] arXiv:2002.10251 [pdf, ps, other]

Title: Identifying stochastic governing equations from data of the most probable transition trajectoriesSubjects: Numerical Analysis (math.NA); Computational Physics (physics.compph); Methodology (stat.ME)
Extracting the governing stochastic differential equation model from elusive data is crucial to understand and forecast dynamics for various systems. We devise a method to extract the drift term and estimate the diffusion coefficient of a governing stochastic dynamical system, from its timeseries data for the most probable transition trajectory. By the OnsagerMachlup theory, the most probable transition trajectory satisfies the corresponding EulerLagrange equation, which is a second order deterministic ordinary differential equation involving the drift term and diffusion coefficient. We first estimate the coefficients of the EulerLagrange equation based on the data of the most probable trajectory, and then we calculate the drift and diffusion coefficient of the governing stochastic dynamical system. These two steps involve sparse regression and optimization. We finally illustrate our method with an example.
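As a worked instance of the Euler-Lagrange equation mentioned in this abstract, consider the simplest case, which is an assumption made here for illustration and is not taken from the paper: a scalar system $dX_t = f(X_t)\,dt + \sigma\,dB_t$ with constant noise intensity $\sigma$. The symbols $f$, $\sigma$ and $L$ are introduced only for this sketch.

```latex
% Onsager-Machlup function for dX_t = f(X_t) dt + \sigma dB_t (scalar, constant \sigma)
L(x,\dot{x}) = \frac{1}{2}\left[\frac{(\dot{x}-f(x))^{2}}{\sigma^{2}} + f'(x)\right]

% Euler-Lagrange equation: \frac{d}{dt}\frac{\partial L}{\partial \dot{x}} = \frac{\partial L}{\partial x}
\frac{\ddot{x}-f'(x)\,\dot{x}}{\sigma^{2}}
  = -\frac{(\dot{x}-f(x))\,f'(x)}{\sigma^{2}} + \frac{1}{2}f''(x)

% The \dot{x} terms cancel, leaving a second-order deterministic ODE
\ddot{x} = f(x)\,f'(x) + \frac{\sigma^{2}}{2}\,f''(x)
```

In this special case the right-hand side depends only on $f$ and $\sigma$. Fitting its coefficients from trajectory data and then solving for the drift and the diffusion coefficient therefore follows the two-step structure the abstract describes.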
 [281] arXiv:2002.10252 [pdf, other]

Title: TensorShield: Tensorbased Defense Against Adversarial Attacks on ImagesSubjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Recent studies have demonstrated that machine learning approaches like deep neural networks (DNNs) are easily fooled by adversarial attacks. Subtle and imperceptible perturbations of the data are able to change the result of deep neural networks. Leveraging vulnerable machine learning methods raises many concerns, especially in domains where security is an important factor. Therefore, it is crucial to design defense mechanisms against adversarial attacks. For the task of image classification, unnoticeable perturbations mostly occur in the high-frequency spectrum of the image. In this paper, we utilize tensor decomposition techniques as a preprocessing step to find a low-rank approximation of images which can significantly discard high-frequency perturbations. A recent defense framework called Shield "vaccinates" Convolutional Neural Networks (CNNs) against adversarial examples by performing random-quality JPEG compressions on local patches of images on the ImageNet dataset. Our tensor-based defense mechanism outperforms the SLQ method from Shield by 14% against Fast Gradient Sign Method (FGSM) adversarial attacks, while maintaining comparable speed.
 [282] arXiv:2002.10253 [pdf, other]

Title: PhysicsInformed MultiLSTM Networks for Metamodeling of Nonlinear StructuresComments: 21 pages, 13 figuresSubjects: Computational Engineering, Finance, and Science (cs.CE); Signal Processing (eess.SP)
This paper introduces an innovative physicsinformed deep learning framework for metamodeling of nonlinear structural systems with scarce data. The basic concept is to incorporate physics knowledge (e.g., laws of physics, scientific principles) into deep long shortterm memory (LSTM) networks, which boosts the learning within a feasible solution space. The physics constraints are embedded in the loss function to enforce the model training which can accurately capture latent system nonlinearity even with very limited available training datasets. Specifically for dynamic structures, physical laws of equation of motion, state dependency and hysteretic constitutive relationship are considered to construct the physics loss. In particular, two physicsinformed multiLSTM network architectures are proposed for structural metamodeling. The satisfactory performance of the proposed framework is successfully demonstrated through two illustrative examples (e.g., nonlinear structures subjected to ground motion excitation). It turns out that the embedded physics can alleviate overfitting issues, reduce the need of big training datasets, and improve the robustness of the trained model for more reliable prediction. As a result, the physicsinformed deep learning paradigm outperforms classical nonphysicsguided datadriven neural networks.
 [283] arXiv:2002.10254 [pdf, other]

Title: Empirical Study on Airline Delay Analysis and PredictionComments: Figure 13Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Big Data analytics is the logical analysis of very large-scale datasets. Such data analysis benefits an organization and improves its decision-making process. In this article, we present Airline Delay Analysis and Prediction, which analyzes airline datasets in combination with a weather dataset. We consider various attributes to analyze flight delay, for example, day-wise, airline-wise, cloud cover, temperature, etc. Moreover, we present rigorous experiments on various machine learning models to correctly predict the delay of a flight, namely, logistic regression with L2 regularization, Gaussian Naive Bayes, K-Nearest Neighbors, Decision Tree classifier and Random Forest. The accuracy of the Random Forest model is 82% with a flight delay threshold of 15 minutes. The analysis is carried out using data from 1987 to 2008; training is conducted with data from 2000 to 2007, and predictions are validated using 2008 data. Moreover, the Random Forest model achieves a recall of 99%.
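The abstract reports both accuracy and recall for a 15-minute delay threshold. The sketch below shows how those two figures are computed from binary labels and why they can differ widely. The helper names, the choice of "delayed" as the positive class, and the use of "at least 15 minutes" as the cut-off are assumptions made here, because the abstract does not specify them. The numbers in the usage note are made up and are not the paper's data.

```python
def label_delays(delays_min, threshold_min=15):
    """Turn delays in minutes into binary labels: 1 = delayed, 0 = on time.

    Treating a delay of exactly `threshold_min` as delayed is an assumption.
    """
    return [1 if d >= threshold_min else 0 for d in delays_min]


def accuracy_and_recall(y_true, y_pred):
    """Return (accuracy, recall), with label 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # delayed, predicted delayed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # on time, predicted on time
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # delayed, but missed
    accuracy = (tp + tn) / len(pairs)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, recall
```

For instance, with true delays `[0, 20, 5, 30, 14, 15]` and predictions `[1, 1, 0, 1, 0, 1]`, every delayed flight is caught (recall 1.0) while one on-time flight is misclassified (accuracy 5/6). A high recall paired with a lower accuracy therefore points to false alarms rather than missed delays, under the positive-class assumption stated above.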
 [284] arXiv:2002.10255 [pdf, other]

Title: Ambiguous phase assignment of discretized 3D geometries in topology optimizationSubjects: Computational Engineering, Finance, and Science (cs.CE); Optimization and Control (math.OC)
Level setbased immersed boundary techniques operate on nonconforming meshes while providing a crisp definition of interface and external boundaries. In such techniques, an isocontour of a level set field interpolated from nodal level set values defines a problem's geometry. If the interface is explicitly tracked, the intersected elements are typically divided into subelements to which a phase needs to be assigned. Due to loss of information in the discretization of the level set field, certain geometrical configurations allow for ambiguous phase assignment of subelements, and thus ambiguous definition of the interface. The study presented here focuses on analyzing these topological ambiguities in embedded geometries constructed from discretized level set fields on hexahedral meshes. The analysis is performed on threedimensional problems where several intersection configurations can significantly affect the problem's topology. This is in contrast to twodimensional problems where ambiguous topological features exist only in one intersection configuration and identifying and resolving them is straightforward. A set of rules that resolve these ambiguities for twophase problems is proposed, and algorithms for their implementations are provided. The influence of these rules on the evolution of the geometry in the optimization process is investigated with linear elastic topology optimization problems. These problems are solved by an explicit level set topology optimization framework that uses the extended finite element method to predict physical responses. This study shows that the choice of a rule to resolve topological features can result in drastically different final geometries. However, for the problems studied in this paper, the performances of the optimized design do not differ.
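The single ambiguous two-dimensional configuration mentioned in this abstract can be illustrated on one quadrilateral element: the ambiguity arises when diagonally opposite nodes share a phase while adjacent nodes differ. The sketch below detects that case and resolves it with the saddle value of the bilinear interpolant. That decider is a well-known device from isosurface extraction, used here as a stand-in; it is not claimed to be one of the rules proposed in the paper. Function names and the corner ordering are assumptions.

```python
def is_ambiguous_quad(phi):
    """True if the four nodal level-set values give the ambiguous 2D case.

    phi: values at the quad corners listed counter-clockwise, so corners 0 and 2
         (and corners 1 and 3) are diagonally opposite. Positive values are taken
         as one phase, non-positive values as the other (an assumed convention).
    """
    positive = [v > 0.0 for v in phi]
    return (positive[0] == positive[2]
            and positive[1] == positive[3]
            and positive[0] != positive[1])


def saddle_value(phi):
    """Value of the bilinear interpolant at its saddle point.

    Only meaningful when is_ambiguous_quad(phi) is True; in that case the
    denominator cannot vanish. If the result has the same sign as corners
    0 and 2, those two corners are connected through the element interior;
    otherwise corners 1 and 3 are.
    """
    a, b, c, d = phi
    return (a * c - b * d) / (a + c - b - d)
```

For nodal values `[2.0, -1.0, 2.0, -1.0]`, the element is ambiguous and the saddle value is positive, so under this stand-in rule the positive phase is connected across the element. The nodal signs alone allow either connectivity, which is the loss of information the abstract refers to.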
 [285] arXiv:2002.10258 [pdf, ps, other]

Title: Computation Rate Maximization in Wireless Powered MEC with Spread Spectrum Multiple Access
Comments: The paper has been accepted for publication by Proc. IEEE ITOEC 2020
Subjects: Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)
The integration of mobile edge computing (MEC) and wireless power transfer (WPT) technologies has recently emerged as an effective solution for extending battery life and increasing the computing power of wireless devices. In this paper, we study the resource allocation problem of a multiuser wireless powered MEC system, where the users share the wireless channel via direct sequence code division multiple access (DS-CDMA). In particular, we are interested in jointly optimizing the task offloading decisions and resource allocation, to maximize the weighted sum computation rate of all the users in the network. The optimization problem is formulated as a mixed-integer nonlinear programming (MINLP) problem. For a given offloading user set, we implement an efficient Fractional Programming (FP) approach to mitigate the multiuser interference in the uplink task offloading. On top of that, we propose a Stochastic Local Search algorithm to optimize the offloading decisions. Simulation results show that the proposed method can effectively enhance the computing performance of a wireless powered MEC with spread spectrum multiple access compared to other representative benchmark methods.
 [286] arXiv:2002.10259 [pdf, other]

Title: Markov Logic Networks with Complex Weights: Expressivity, Liftability and Fourier Transforms
Authors: Ondrej Kuzelka
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Logic in Computer Science (cs.LO)
We study the expressivity of Markov logic networks (MLNs). We introduce complex MLNs, which use complex-valued weights, and we show that, unlike standard MLNs with real-valued weights, complex MLNs are fully expressive. We then observe that the discrete Fourier transform can be computed using weighted first-order model counting (WFOMC) with complex weights and use this observation to design an algorithm for computing relational marginal polytopes which needs substantially fewer calls to a WFOMC oracle than a recent algorithm.
 [287] arXiv:2002.10260 [pdf, other]

Title: Fixed Encoder Self-Attention Patterns in Transformer-Based Machine Translation
Subjects: Computation and Language (cs.CL)
Transformer-based models have brought a radical change to neural machine translation. A key feature of the Transformer architecture is the so-called multi-head attention mechanism, which allows the model to focus simultaneously on different parts of the input. However, recent works have shown that attention heads learn simple positional patterns which are often redundant. In this paper, we propose to replace all but one attention head of each encoder layer with fixed, non-learnable attentive patterns that are solely based on position and do not require any external knowledge. Our experiments show that fixing the attention heads on the encoder side of the Transformer at training time does not impact the translation quality and even increases BLEU scores by up to 3 points in low-resource scenarios.
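To make the idea of a fixed, position-only attention head concrete, here is a minimal sketch in plain Python. The specific pattern (each position attends to the position at a constant offset) and the names `fixed_attention` and `apply_attention` are assumptions for illustration; the abstract does not specify which positional patterns the paper uses.

```python
def fixed_attention(seq_len, offset):
    """Build a non-learnable attention matrix (hypothetical pattern).

    Row i puts all of its weight on position i + offset, clamped to the
    sequence boundaries, so the weights depend only on position.
    """
    weights = [[0.0] * seq_len for _ in range(seq_len)]
    for i in range(seq_len):
        j = min(max(i + offset, 0), seq_len - 1)
        weights[i][j] = 1.0
    return weights


def apply_attention(weights, values):
    """Return the attention-weighted sum of value vectors for each position."""
    dim = len(values[0])
    return [
        [sum(w * v[d] for w, v in zip(row, values)) for d in range(dim)]
        for row in weights
    ]


# A "previous token" head over a three-token sequence.
previous_token_head = fixed_attention(3, offset=-1)
outputs = apply_attention(previous_token_head, [[1.0], [2.0], [3.0]])
```

Because the matrix is built once from positions alone, such a head adds no trainable parameters, which is the property the abstract relies on.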
 [288] arXiv:2002.10261 [pdf, other]

Title: Learning from Positive and Unlabeled Data with Arbitrary Positive Shift
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Positive-unlabeled (PU) learning trains a binary classifier using only positive and unlabeled data. A common simplifying assumption is that the positive data is representative of the target positive class. This assumption is often violated in practice due to time variation, domain shift, or adversarial concept drift. This paper shows that PU learning is possible even with arbitrarily non-representative positive data when provided unlabeled datasets from the source and target distributions. Our key insight is that only the negative class's distribution need be fixed. We propose two methods to learn under such arbitrary positive bias. The first couples negative-unlabeled (NU) learning with unlabeled-unlabeled (UU) learning while the other uses a novel recursive risk estimator robust to positive shift. Experimental results demonstrate our methods' effectiveness across numerous real-world datasets and forms of positive data bias, including disjoint positive class-conditional supports.
 [289] arXiv:2002.10266 [pdf, other]

Title: Rhythm, Chord and Melody Generation for Lead Sheets using Recurrent Neural Networks
Comments: 8 pages, 2 figures, 3 tables, 2 appendices
Subjects: Sound (cs.SD); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG); Machine Learning (stat.ML)
Music that is generated by recurrent neural networks often lacks a sense of direction and coherence. We therefore propose a two-stage LSTM-based model for lead sheet generation, in which the harmonic and rhythmic templates of the song are produced first, after which, in a second stage, a sequence of melody notes is generated conditioned on these templates. A subjective listening test shows that our approach outperforms the baselines and increases perceived musical coherence.
 [290] arXiv:2002.10268 [pdf, other]

Title: Using Machine Learning to predict extreme events in the Hénon map
Comments: 9 pages, 12 figures
Journal-ref: Chaos: An Interdisciplinary Journal of Nonlinear Science 30.1 (2020): 013113
Subjects: Machine Learning (cs.LG); Chaotic Dynamics (nlin.CD); Machine Learning (stat.ML)
Machine Learning (ML) inspired algorithms provide a flexible set of tools for analyzing and forecasting chaotic dynamical systems. We here analyze the performance of one algorithm for the prediction of extreme events in the two-dimensional Hénon map at the classical parameters. The task is to determine whether a trajectory will exceed a threshold after a set number of time steps into the future. This task has a geometric interpretation within the dynamics of the Hénon map, which we use to gauge the performance of the neural networks that are used in this work. We analyze the dependence of the success rate of the ML models on the prediction time $T$, the number of training samples $N_T$ and the size of the network $N_p$. We observe that in order to maintain a certain accuracy, $N_T \propto \exp(2hT)$ and $N_p \propto \exp(hT)$, where $h$ is the topological entropy. Similar relations between the intrinsic chaotic properties of the dynamics and ML parameters might be observable in other systems as well.
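The prediction task described above can be sketched as a labeling function: iterate the Hénon map for a fixed number of steps and check whether a threshold is exceeded. The classical parameters are $a = 1.4$, $b = 0.3$; which coordinate is compared against the threshold is not stated in the abstract, so using $x$ here is an assumption, as are the function names.

```python
def henon_step(x, y, a=1.4, b=0.3):
    """One iteration of the Hénon map: (x, y) -> (1 - a*x^2 + y, b*x)."""
    return 1.0 - a * x * x + y, b * x


def exceeds_after(x0, y0, steps, threshold):
    """Label an initial condition: does x exceed `threshold` after `steps` iterations?

    Assumption: the event is defined on the x coordinate at exactly the
    final step, not at any intermediate step.
    """
    x, y = x0, y0
    for _ in range(steps):
        x, y = henon_step(x, y)
    return x > threshold


# Build (initial condition, label) pairs that a classifier could be trained on.
samples = [(-0.5 + 0.1 * i, 0.0) for i in range(11)]
labels = [exceeds_after(x0, y0, steps=3, threshold=1.0) for x0, y0 in samples]
```

A classifier trained on such pairs has to learn the preimage of the threshold region under the iterated map, which is the geometric interpretation the abstract refers to.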
 [291] arXiv:2002.10269 [pdf, other]

Title: A Graph-Based Platform for Customer Behavior Analysis using Applications' Clickstream Data
Authors: Mojgan Mohajer
Comments: Technical Report
Subjects: Databases (cs.DB); Machine Learning (cs.LG)
Clickstream analysis is receiving more attention as the use of e-commerce and applications grows. Besides the analysis of customers' purchase behavior, there are also attempts to analyze customer behavior in relation to the quality of web or application design. In general, clickstream data can be considered as a sequence of log events collected at different levels of web/app usage. The analysis of clickstream data can be performed directly as sequence analysis or by extracting features from sequences. In this work, we show how representing and saving the sequences with their underlying graph structures can induce a platform for customer behavior analysis. Our main idea is that clickstream data containing sequences of actions of an application are walks of the corresponding finite state automaton (FSA) of that application. Our hypothesis is that the customers of an application normally do not use all possible walks through that FSA and that the number of actual walks is much smaller than the total number of possible walks through the FSA. Sequences of such a walk normally consist of a finite number of cycles on FSA graphs. Identifying and matching these cycles in classical sequence analysis is not straightforward. We show that representing the sequences through their underlying graph structures not only groups the sequences automatically but also provides a compressed data representation of the original sequences.
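The grouping effect described above can be illustrated with a small sketch: treat each session as a walk, reduce it to the set of directed transitions it visits, and group sessions that share the same set. The function names and the choice of an edge set as the "underlying graph" are assumptions for illustration, not the report's actual data model.

```python
from collections import defaultdict


def underlying_graph(sequence):
    """Return the set of directed edges (consecutive action pairs) visited by a walk."""
    return frozenset(zip(sequence, sequence[1:]))


def group_by_graph(sequences):
    """Group sequences whose walks cover exactly the same set of edges."""
    groups = defaultdict(list)
    for seq in sequences:
        groups[underlying_graph(seq)].append(seq)
    return dict(groups)


sessions = [
    ["home", "search", "item", "search", "item", "cart"],
    ["home", "search", "item", "search", "item", "search", "item", "cart"],
    ["home", "cart"],
]
groups = group_by_graph(sessions)
```

The first two sessions differ only in how often the search/item cycle is repeated, so they collapse onto one graph of four edges, which also shows why the graph form is a compressed representation.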
 [292] arXiv:2002.10277 [pdf, other]

Title: PU-GeoNet: A Geometry-centric Network for 3D Point Cloud Upsampling
Subjects: Computer Vision and Pattern Recognition (cs.CV)
This paper addresses the problem of generating uniform dense point clouds to describe the underlying geometric structures from given sparse point clouds. Due to the irregular and unordered nature, point cloud densification as a generative task is challenging. To tackle the challenge, we propose a novel deep neural network based method, called PU-GeoNet, that learns a $3\times 3$ linear transformation matrix $\mathbf{T}$ for each input point. Matrix $\mathbf{T}$ approximates the augmented Jacobian matrix of a local parameterization and builds a one-to-one correspondence between the 2D parametric domain and the 3D tangent plane so that we can lift the adaptively distributed 2D samples (which are also learned from data) to 3D space. After that, we project the samples to the curved surface by computing a displacement along the normal of the tangent plane. PU-GeoNet is fundamentally different from the existing deep learning methods that are largely motivated by the image super-resolution techniques and generate new points in the abstract feature space. Thanks to its geometry-centric nature, PU-GeoNet works well for both CAD models with sharp features and scanned models with rich geometric details. Moreover, PU-GeoNet can compute the normal for the original and generated points, which is highly desired by the surface reconstruction algorithms. Computational results show that PU-GeoNet, the first neural network that can jointly generate vertex coordinates and normals, consistently outperforms the state-of-the-art in terms of accuracy and efficiency for upsampling factor $4\sim 16$.
 [293] arXiv:2002.10283 [pdf, other]

Title: The Knowledge Graph Track at OAEI - Gold Standards, Baselines, and the Golden Hammer Bias
Subjects: Databases (cs.DB); Artificial Intelligence (cs.AI)
The Ontology Alignment Evaluation Initiative (OAEI) is an annual evaluation of ontology matching tools. In 2018, we started the Knowledge Graph track, whose goal is to evaluate the simultaneous matching of entities and schemas of large-scale knowledge graphs. In this paper, we discuss the design of the track and two different strategies of gold standard creation. We analyze results and experiences obtained in the first editions of the track, and, by revealing a hidden task, we show that all tools submitted to the track (and probably also to other tracks) suffer from a bias which we name the golden hammer bias.
 [294] arXiv:2002.10284 [pdf]

Title: Word Embeddings Inherently Recover the Conceptual Organization of the Human Mind
Authors: Victor Swift
Comments: 12 pages, 4 figures
Subjects: Computation and Language (cs.CL); Neurons and Cognition (q-bio.NC)
Machine learning is a means to uncover deep patterns from rich sources of data. Here, we find that machine learning can recover the conceptual organization of the human mind when applied to the natural language use of millions of people. Utilizing text from billions of webpages, we recover most of the concepts contained in English, Dutch, and Japanese, as represented in large-scale Word Association networks. Our results justify machine learning as a means to probe the human mind, at a depth and scale that has been unattainable using self-report and observational methods. Beyond direct psychological applications, our methods may prove useful for projects concerned with defining, assessing, relating, or uncovering concepts in any scientific field.
 [295] arXiv:2002.10286 [pdf, other]

Title: Prediction with Corrupted Expert Advice
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
We revisit the fundamental problem of prediction with expert advice, in a setting where the environment is benign and generates losses stochastically, but the feedback observed by the learner is subject to a moderate adversarial corruption. We prove that a variant of the classical Multiplicative Weights algorithm with decreasing step sizes achieves constant regret in this setting and performs optimally in a wide range of environments, regardless of the magnitude of the injected corruption. Our results reveal a surprising disparity between the often comparable Follow the Regularized Leader (FTRL) and Online Mirror Descent (OMD) frameworks: we show that for experts in the corrupted stochastic regime, the regret performance of OMD is in fact strictly inferior to that of FTRL.
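For readers unfamiliar with the algorithm family, the following is a minimal sketch of Multiplicative Weights (Hedge) with a decreasing step size, written in the FTRL style where weights are recomputed each round from cumulative losses. The schedule $\eta_t = \sqrt{\ln(n)/t}$ and the function name are assumptions chosen for illustration; the abstract does not state the exact variant analyzed in the paper, and no corruption model is simulated here.

```python
import math


def hedge_decreasing(loss_rounds, n_experts):
    """Run Hedge with step size eta_t = sqrt(ln(n) / t) (assumed schedule).

    loss_rounds: list of per-round loss lists, one loss in [0, 1] per expert.
    Returns the learner's total expected loss and the final distribution.
    """
    cumulative = [0.0] * n_experts
    probs = [1.0 / n_experts] * n_experts
    learner_loss = 0.0
    for t, losses in enumerate(loss_rounds, start=1):
        eta = math.sqrt(math.log(n_experts) / t)
        # Subtract the smallest cumulative loss for numerical stability;
        # this does not change the normalized distribution.
        base = min(cumulative)
        weights = [math.exp(-eta * (c - base)) for c in cumulative]
        total = sum(weights)
        probs = [w / total for w in weights]
        learner_loss += sum(p * l for p, l in zip(probs, losses))
        for i, l in enumerate(losses):
            cumulative[i] += l
    return learner_loss, probs


# Two experts, the first of which never incurs loss.
total_loss, final_probs = hedge_decreasing([[0.0, 1.0]] * 50, n_experts=2)
```

In this toy run the distribution concentrates on the better expert, so the learner's loss stays far below the 25 that uniform guessing would incur over 50 rounds.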
 [296] arXiv:2002.10289 [pdf, other]

Title: EL PASSO: Privacy-preserving, Asynchronous Single Sign-On
Subjects: Cryptography and Security (cs.CR)
We introduce EL PASSO, a privacy-preserving, asynchronous Single Sign-On (SSO) system. It enables personal authentication while protecting users' privacy against both identity providers and relying parties, and allows selective attribute disclosure. EL PASSO is based on anonymous credentials, yet it supports users' accountability. Selected authorities may recover the identity of allegedly misbehaving users, and users can prove properties about their identity without revealing it in the clear. EL PASSO does not require specific secure hardware or a third party (other than existing participants in SSO). The generation and use of authentication credentials are asynchronous, allowing users to sign on when identity providers are temporarily unavailable. We evaluate EL PASSO in a distributed environment and demonstrate its low computational cost, yielding faster sign-on operations than OIDC from a regular laptop, one-second user-perceived latency from a low-power device, and scaling to more than 50 sign-on operations per second at a relying party using a single 4-core server in the cloud.
 [297] arXiv:2002.10294 [pdf, other]

Title: Semantic, Efficient, and Secure Search over Encrypted Cloud Data
Authors: Fateh Boucenna
Comments: 180 pages, PhD Thesis, University of Sciences and Technology Houari Boumediene (USTHB) Algiers Algeria, searchable encryption, cloud computing, semantic search, homomorphic encryption, data privacy, weighting formula
Subjects: Cryptography and Security (cs.CR); Information Retrieval (cs.IR)
Companies and individuals demand more and more storage space and computing power. For this purpose, several new technologies have been designed and implemented, such as cloud computing. This technology provides its users with storage space and computing power according to their needs in a flexible and personalized way. However, outsourced data such as emails, electronic health records, and company reports are sensitive and confidential. Therefore, it is essential to protect the outsourced data against possible external attacks and against the cloud server itself. That is why it is highly recommended to encrypt sensitive data before outsourcing it to a remote server. Because these data are encrypted, traditional search engines can no longer be used to search over them. Consequently, many searchable encryption (SE) schemes have been proposed in the literature. Three major research axes of the searchable encryption area have been studied. The first axis consists in ensuring the security of the search approach: the search process should be performed without decrypting any data and without leaking any sensitive information. The second axis consists in studying search performance. Encrypted indexes are less efficient than plaintext indexes, which makes searchable encryption schemes very slow in practice. The more secure an approach is, the less efficient it is; thus, the challenge consists in finding the best compromise between security and performance. Finally, the third research axis concerns the quality of the returned results in terms of relevance and recall. The problem is that encrypting the index degrades recall and precision. Therefore, the goal is to propose a technique that obtains almost the same results as a traditional search.
 [298] arXiv:2002.10295 [pdf, other]

Title: SupRB: A Supervised Rule-based Learning System for Continuous Problems
Comments: Submitted to the Genetic and Evolutionary Computation Conference 2020 (GECCO 2020)
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
We propose the SupRB learning system, a new Pittsburgh-style learning classifier system (LCS) for supervised learning on multi-dimensional continuous decision problems. SupRB learns an approximation of a quality function from examples (consisting of situations, choices and associated qualities) and is then able to make an optimal choice as well as predict the quality of a choice in a given situation. One area of application for SupRB is parametrization of industrial machinery. In this field, acceptance of the recommendations of machine learning systems is highly reliant on operators' trust. While an essential and much-researched ingredient for that trust is prediction quality, it seems that this alone is not enough. At least as important is a human-understandable explanation of the reasoning behind a recommendation. While many state-of-the-art methods such as artificial neural networks fall short of this, LCSs such as SupRB provide human-readable rules that can be understood very easily. The prevalent LCSs are not directly applicable to this problem as they lack support for continuous choices. This paper lays the foundations for SupRB and shows its general applicability on a simplified model of an additive manufacturing problem.
 [299] arXiv:2002.10301 [pdf, other]

Title: Q-learning with Uniformly Bounded Variance: Large Discounting is Not a Barrier to Fast Learning
Comments: 30 pages, 2 figures
Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY); Machine Learning (stat.ML)
It has been a trend in the Reinforcement Learning literature to derive sample complexity bounds: a bound on how many experiences with the environment are required to obtain an $\varepsilon$-optimal policy. In the discounted cost, infinite horizon setting, all of the known bounds have a factor that is a polynomial in $1/(1-\beta)$, where $\beta < 1$ is the discount factor. For a large discount factor, these bounds seem to imply that a very large number of samples is required to achieve an $\varepsilon$-optimal policy. The objective of the present work is to introduce a new class of algorithms that have sample complexity uniformly bounded for all $\beta < 1$. One may argue that this is impossible, due to a recent min-max lower bound. The explanation is that this previous lower bound is for a specific problem, which we modify, without compromising the ultimate objective of obtaining an $\varepsilon$-optimal policy.
Specifically, we show that the asymptotic variance of the Q-learning algorithm, with an optimized step-size sequence, is a quadratic function of $1/(1-\beta)$; an expected, and essentially known result. The new relative Q-learning algorithm proposed here is shown to have asymptotic variance that is a quadratic in $1/(1-\rho\beta)$, where $1-\rho > 0$ is the spectral gap of an optimal transition matrix.
 [300] arXiv:2002.10303 [pdf, ps, other]

Title: Wheeler Languages
Subjects: Formal Languages and Automata Theory (cs.FL)
The recently introduced class of Wheeler graphs, inspired by the Burrows-Wheeler Transform (BWT) of a given string, admits an efficient index data structure for searching for subpaths with a given path label, and lifts the applicability of the Burrows-Wheeler transform from strings to languages. In this paper we study the regular languages accepted by automata having a Wheeler graph as transition function, and prove results on determination, Myhill-Nerode characterization, decidability, and closure properties for this class of languages.
 [301] arXiv:2002.10304 [pdf, ps, other]

Title: Fast in-place algorithms for polynomial operations: division, evaluation, interpolation
Subjects: Symbolic Computation (cs.SC); Computational Complexity (cs.CC)
We consider space-saving versions of several important operations on univariate polynomials, namely power series inversion and division, division with remainder, multi-point evaluation, and interpolation. Now-classical results show that such problems can be solved in (nearly) the same asymptotic time as fast polynomial multiplication. However, these reductions, even when applied to an in-place variant of fast polynomial multiplication, yield algorithms which require at least a linear amount of extra space for intermediate results. We demonstrate new in-place algorithms for the aforementioned polynomial computations which require only constant extra space and achieve the same asymptotic running time as their out-of-place counterparts. We also provide a precise complexity analysis so that all constants are made explicit, parameterized by the space usage of the underlying multiplication algorithms.
 [302] arXiv:2002.10306 [pdf, other]

Title: Adaptive Propagation Graph Convolutional Network
Comments: Preprint submitted to IEEE Transactions on Neural Networks and Learning Systems
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Graph convolutional networks (GCNs) are a family of neural network models that perform inference on graph data by interleaving vertex-wise operations and message-passing exchanges across nodes. Concerning the latter, two key questions arise: (i) how to design a differentiable exchange protocol (e.g., a 1-hop Laplacian smoothing in the original GCN), and (ii) how to characterize the trade-off in complexity with respect to the local updates. In this paper, we show that state-of-the-art results can be achieved by adapting the number of communication steps independently at every node. In particular, we endow each node with a halting unit (inspired by Graves' adaptive computation time) that after every exchange decides whether to continue communicating or not. We show that the proposed adaptive propagation GCN (AP-GCN) achieves superior or similar results to the best proposed models so far on a number of benchmarks, while requiring a small overhead in terms of additional parameters. We also investigate a regularization term to enforce an explicit trade-off between communication and accuracy. The code for the AP-GCN experiments is released as an open-source library.
 [303] arXiv:2002.10309 [pdf, other]

Title: Uncertainty based Class Activation Maps for Visual Question Answering
Comments: This work is an extension of our ICCV-2019 work. arXiv admin note: text overlap with arXiv:1908.06306
Subjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL); Machine Learning (cs.LG)
Understanding and explaining deep learning models is an imperative task. Towards this, we propose a method that obtains gradient-based certainty estimates that also provide visual attention maps. In particular, we address the visual question answering task. We incorporate modern probabilistic deep learning methods that we further improve by using the gradients for these estimates. These have two-fold benefits: a) improvement in obtaining the certainty estimates that correlate better with misclassified samples and b) improved attention maps that provide state-of-the-art results in terms of correlation with human attention regions. The improved attention maps result in consistent improvement for various methods for visual question answering. Therefore, the proposed technique can be thought of as a recipe for obtaining improved certainty estimates and explanations for deep learning models. We provide detailed empirical analysis for the visual question answering task on all standard benchmarks and comparison with state-of-the-art methods.
 [304] arXiv:2002.10310 [pdf, other]

Title: Sketch Less for More: On-the-Fly Fine-Grained Sketch Based Image Retrieval
Comments: IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2020
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Fine-grained sketch-based image retrieval (FG-SBIR) addresses the problem of retrieving a particular photo instance given a user's query sketch. Its widespread applicability is however hindered by the fact that drawing a sketch takes time, and most people struggle to draw a complete and faithful sketch. In this paper, we reformulate the conventional FG-SBIR framework to tackle these challenges, with the ultimate goal of retrieving the target photo with the least number of strokes possible. We further propose an on-the-fly design that starts retrieving as soon as the user starts drawing. To accomplish this, we devise a reinforcement learning-based cross-modal retrieval framework that directly optimizes rank of the ground-truth photo over a complete sketch drawing episode. Additionally, we introduce a novel reward scheme that circumvents the problems related to irrelevant sketch strokes, and thus provides us with a more consistent rank list during the retrieval. We achieve superior early-retrieval efficiency over state-of-the-art methods and alternative baselines on two publicly available fine-grained sketch retrieval datasets.
 [305] arXiv:2002.10312 [pdf, ps, other]

Title: Learning Certified Individually Fair Representations
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
To effectively enforce fairness constraints one needs to define an appropriate notion of fairness and employ representation learning in order to impose this notion without compromising downstream utility for the data consumer. A desirable notion is individual fairness as it guarantees similar treatment for similar individuals. In this work, we introduce the first method which generalizes individual fairness to rich similarity notions via logical constraints while also enabling data consumers to obtain fairness certificates for their models. The key idea is to learn a representation that provably maps similar individuals to latent representations at most $\epsilon$ apart in $\ell_{\infty}$-distance, enabling data consumers to certify individual fairness by proving $\epsilon$-robustness of their classifier. Our experimental evaluation on six real-world datasets and a wide range of fairness constraints demonstrates that our approach is expressive enough to capture similarity notions beyond existing distance metrics while scaling to realistic use cases.
 [306] arXiv:2002.10313 [pdf, other]

Title: Imagining Data-Objects for Reflective Self-Tracking
Subjects: Human-Computer Interaction (cs.HC)
While self-tracking data is typically captured in real time in a lived experience, the data is often stored in a manner detached from the context where it belongs. Research has shown that there is a potential to enhance people's lived experiences with data-objects (artifacts representing contextually relevant data), for individual and collective reflections through a physical portrayal of data. This paper expands that research by studying how to design contextually relevant data-objects based on people's needs. We conducted a participatory research project with five households using object theater as a core method to encourage participants to speculate upon combinations of meaningful objects and personal data archives. In this paper, we detail three aspects that seem relevant for designing data-objects: social sharing, contextual ambiguity and interaction with the body. We show how an experience-centric view on data-objects can contribute to the contextual, social and bodily interplay between people, data and objects.
 [307] arXiv:2002.10316 [pdf, other]

Title: Fair Bandit Learning with Delayed Impact of Actions
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Algorithmic fairness has been studied mostly in a static setting where the implicit assumption is that the frequencies of historically made decisions do not impact the problem structure in the subsequent future. However, for example, the capability to pay back a loan for people in a certain group might depend on how frequently that group has historically had its loan applications approved. If banks keep rejecting loan applications from people in a disadvantaged group, it could create a feedback loop and further damage the chance of getting loans for people in that group. This challenge has been noted in several recent works but is under-explored in a more generic sequential learning setting. In this paper, we formulate this delayed and long-term impact of actions within the context of multi-armed bandits (MAB). We generalize the classical bandit setting to encode the dependency of this action "bias" due to the history of the learning. Our goal is to learn to maximize the collected utilities over time while satisfying fairness constraints imposed over arms' utilities, which again depend on the decision they have received. We propose an algorithm that achieves a regret of $\tilde{\mathcal{O}}(KT^{2/3})$ and show a matching regret lower bound of $\Omega(KT^{2/3})$, where $K$ is the number of arms and $T$ denotes the learning horizon. Our results complement the bandit literature by adding techniques to deal with actions with long-term impacts and have implications in designing fair algorithms.
 [308] arXiv:2002.10319 [pdf, other]

Title: Self-Adaptive Training: beyond Empirical Risk Minimization
Comments: 13 pages, 7 figures, 5 tables
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
We propose self-adaptive training, a new training algorithm that dynamically corrects problematic training labels by model predictions without incurring extra computational cost, to improve generalization of deep learning for potentially corrupted training data. This problem is crucial towards robustly learning from data that are corrupted by, e.g., label noise and out-of-distribution samples. The standard empirical risk minimization (ERM) for such data, however, may easily overfit noise and thus suffers from sub-optimal performance. In this paper, we observe that model predictions can substantially benefit the training process: self-adaptive training significantly improves generalization over ERM under various levels of noise, and mitigates the overfitting issue in both natural and adversarial training. We evaluate the error-capacity curve of self-adaptive training: the test error is monotonically decreasing w.r.t. model capacity. This is in sharp contrast to the recently discovered double-descent phenomenon in ERM, which might be a result of overfitting of noise. Experiments on CIFAR and ImageNet datasets verify the effectiveness of our approach in two applications: classification with label noise and selective classification. We release our code at \url{https://github.com/LayneH/selfadaptivetraining}.
 [309] arXiv:2002.10322 [pdf, other]

Title: Anatomy-aware 3D Human Pose Estimation in Videos
Subjects: Computer Vision and Pattern Recognition (cs.CV)
In this work, we propose a new solution for 3D human pose estimation in videos. Instead of directly regressing the 3D joint locations, we draw inspiration from the human skeleton anatomy and decompose the task into bone direction prediction and bone length prediction, from which the 3D joint locations can be completely derived. Our motivation is the fact that the bone lengths of a human skeleton remain consistent across time. This prompts us to develop effective techniques to utilize global information across all the frames in a video for high-accuracy bone length prediction. Moreover, for the bone direction prediction network, we propose a fully convolutional propagating architecture with long skip connections. Essentially, it predicts the directions of different bones hierarchically without using any time-consuming memory units (e.g. LSTM). A novel joint shift loss is further introduced to bridge the training of the bone length and bone direction prediction networks. Finally, we employ an implicit attention mechanism to feed the 2D keypoint visibility scores into the model as extra guidance, which significantly mitigates the depth ambiguity in many challenging poses. Our full model outperforms the previous best results on Human3.6M and MPI-INF-3DHP datasets, where comprehensive evaluation validates the effectiveness of our model.
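The claim that joint locations can be completely derived from bone directions and bone lengths can be sketched as a walk down the kinematic tree: each joint is its parent's position plus the unit bone direction scaled by the bone length. The skeleton layout, the parent-index encoding, and the function name below are assumptions for illustration and are not taken from the paper.

```python
import math


def joints_from_bones(root, parents, directions, lengths):
    """Recover 3D joint positions from per-bone directions and lengths.

    parents[j] is the index of joint j's parent (-1 for the root), and
    joints are assumed to be ordered so that parents precede children.
    directions[j] and lengths[j] describe the bone from parents[j] to j;
    the entries at the root index are ignored.
    """
    joints = []
    for j, parent in enumerate(parents):
        if parent < 0:
            joints.append(tuple(root))
            continue
        dx, dy, dz = directions[j]
        norm = math.sqrt(dx * dx + dy * dy + dz * dz)
        scale = lengths[j] / norm  # normalize the direction, then apply the length
        px, py, pz = joints[parent]
        joints.append((px + dx * scale, py + dy * scale, pz + dz * scale))
    return joints


# A toy three-joint chain: root -> joint 1 (straight up) -> joint 2 (sideways).
pose = joints_from_bones(
    root=(0.0, 0.0, 0.0),
    parents=[-1, 0, 1],
    directions=[(0.0, 0.0, 0.0), (0.0, 0.0, 2.0), (1.0, 0.0, 0.0)],
    lengths=[0.0, 1.0, 0.5],
)
```

Since the lengths enter only as per-bone scalars, a single set of lengths can be reused for every frame of a video while only the directions change, which matches the motivation given in the abstract.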
 [310] arXiv:2002.10327 [pdf, ps, other]

Title: Angle Aware User Cooperation for Secure Massive MIMO in Rician Fading Channel
Comments: 14 pages, 12 figures, accepted by IEEE Journal on Selected Areas in Communications
Subjects: Information Theory (cs.IT)
Massive multiple-input multiple-output communications can achieve high-level security by concentrating radio frequency signals towards the legitimate users. However, this system is vulnerable in a Rician fading environment if the eavesdropper positions itself such that its channel is highly "similar" to the channel of a legitimate user. To address this problem, this paper proposes an angle aware user cooperation (AAUC) scheme, which avoids direct transmission to the attacked user and relies on other users for cooperative relaying. The proposed scheme only requires the eavesdropper's angle information, and adopts an angular secrecy model to represent the average secrecy rate of the attacked system. With this angular model, the AAUC problem turns out to be non-convex, and a successive convex optimization algorithm, which converges to a Karush-Kuhn-Tucker solution, is proposed. Furthermore, a closed-form solution and a Bregman first-order method are derived for the cases of large-scale antennas and large-scale users, respectively. Extension to the intelligent reflecting surfaces based scheme is also discussed. Simulation results demonstrate the effectiveness of the proposed successive convex optimization based AAUC scheme, and also validate the low-complexity nature of the proposed large-scale optimization algorithms.
 [311] arXiv:2002.10329 [pdf, other]

Title: KBSET - Knowledge-Based Support for Scholarly Editing and Text Processing with Declarative LaTeX Markup and a Core Written in SWI-Prolog
Comments: To appear in DECLARE 2019 Revised Selected Papers
Subjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
KBSET is an environment that provides support for scholarly editing in two flavors: first, as a practical tool, KBSET/Letters, that accompanies the development of editions of correspondences (in particular from the 18th and 19th century), all the way from source documents to PDF and HTML presentations; second, as a prototypical tool, KBSET/NER, for experimentally investigating novel forms of working on editions that are centered around automated named entity recognition. KBSET can process declarative application-specific markup that is expressed in LaTeX notation and incorporate large external fact bases that are typically provided in RDF. KBSET includes specially developed LaTeX styles and a core system written in SWI-Prolog, which is used there in many roles, exploiting its ability to realize the potential of Prolog as a unifying language.
 [312] arXiv:2002.10330 [pdf, other]

Title: FSinR: an exhaustive package for feature selection
Comments: 17 pages, 6 figures, 2 tables
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Feature Selection (FS) is a key task in Machine Learning. It consists of selecting a number of relevant variables for model construction or data analysis. We present the R package FSinR, which implements a variety of widely known filter and wrapper methods, as well as search algorithms. The package thus makes it possible to perform the feature selection process, which combines a guided search over subsets of features with filter or wrapper methods that return an evaluation measure for those subsets. In this article, we also present some examples of the usage of the package and a comparison with other packages available in R that contain methods for feature selection.
 [313] arXiv:2002.10333 [pdf]

Title: A New Approach for Improvement Security against DoS Attacks in Vehicular Ad-hoc Network
Comments: 7 pages, 12 figures, 2 tables, 4 equations, journal
Journal-ref: Int J Adv Comput Sci Appl, 7(7), 1016 (2016)
Subjects: Cryptography and Security (cs.CR); Performance (cs.PF)
Vehicular Ad-Hoc Networks (VANETs) are a proper subset of mobile wireless networks in which the nodes are vehicles equipped with a special on-board electronic device, the OBU (On Board Unit), which enables them to transmit and receive messages from other vehicles in the VANET. In addition to communication between vehicles, the VANET interfaces with the road infrastructure through contact points. VANETs are a subgroup of MANETs. Unlike MANET nodes, VANET nodes move very fast. Establishing a permanent route for the dissemination of emergency messages and alerts from a danger zone is a very challenging task. Therefore, routing plays a significant role in VANETs. Network overhead, network congestion, traffic congestion, and packet delivery ratio are the most important issues associated with routing in VANETs. In addition, VANETs are subject to various security attacks. In base VANET systems, an algorithm is used to detect attacks at confirmation time, which incurs an overhead delay. This paper proposes the P-Secure approach, which detects DoS attacks before the confirmation time. This reduces the processing overhead delay and increases security in VANETs. Simulation results show that the P-Secure approach is more efficient than the OBUmodelVaNET approach in terms of PDR, e2e_delay, throughput and drop packet rate.
 [314] arXiv:2002.10336 [pdf, other]

Title: Semi-Supervised Speech Recognition via Local Prior Matching
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
For sequence transduction tasks like speech recognition, a strong structured prior model encodes rich information about the target space, implicitly ruling out invalid sequences by assigning them low probability. In this work, we propose local prior matching (LPM), a semi-supervised objective that distills knowledge from a strong prior (e.g. a language model) to provide a learning signal to a discriminative model trained on unlabeled speech. We demonstrate that LPM is theoretically well-motivated, simple to implement, and superior to existing knowledge distillation techniques under comparable settings. Starting from a baseline trained on 100 hours of labeled speech, with an additional 360 hours of unlabeled data, LPM recovers 54% and 73% of the word error rate on clean and noisy test sets relative to a fully supervised model on the same data.
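The "recovers 54% and 73% of the word error rate" figures are easier to read with the arithmetic spelled out. The sketch below assumes the usual definition of WER recovery, namely the share of the gap between the labeled-only baseline and the fully supervised model that the semi-supervised model closes. The abstract does not state the formula, and the numbers in the example are hypothetical, not results from the paper.

```python
def wer_recovery(baseline_wer, semi_supervised_wer, supervised_wer):
    """Fraction of the baseline-to-supervised WER gap that is closed.

    Assumed definition: (baseline - semi) / (baseline - supervised).
    """
    gap = baseline_wer - supervised_wer
    if gap <= 0:
        raise ValueError("baseline WER must exceed the fully supervised WER")
    return (baseline_wer - semi_supervised_wer) / gap


# Hypothetical numbers for illustration only (not from the paper):
# baseline 20.0% WER, semi-supervised 14.0% WER, fully supervised 10.0% WER.
recovered = wer_recovery(20.0, 14.0, 10.0)
```

Under this definition a value of 0 means no improvement over the baseline and a value of 1 means the semi-supervised model matches the fully supervised one.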
 [315] arXiv:2002.10340 [pdf, other]

Title: Guessing State Tracking for Visual Dialogue
Comments: 9 pages, 5 figures, Nov. 2019, this https URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
The Guesser plays an important role in GuessWhat?!-like visual dialogues. It locates the target object in an image, chosen by the Oracle, through a question-answer based dialogue between a Questioner and the Oracle. Most existing guessers make one and only one guess after receiving all question-answer pairs in a dialogue with a predefined number of rounds. This paper proposes the guessing state for the guesser, and regards guessing as a process in which the guessing state changes over the course of a dialogue. A guess model based on guessing state tracking is therefore proposed. The guessing state is defined as a distribution over candidate objects in the image. A state update algorithm comprising three modules is given: UoVR updates the representation of the image according to the current guessing state, QAEncoder encodes the question-answer pairs, and UoGS updates the guessing state by combining information from the image and the dialogue history. With the guessing state in hand, two loss functions are defined as supervision for model training. Early supervision brings supervision to the guesser at early rounds, and incremental supervision brings monotonicity to the guessing state. Experimental results on the GuessWhat?! dataset show that our model significantly outperforms previous models and achieves a new state of the art; in particular, its guessing success rate of 83.3% approaches the human-level performance of 84.4%.
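The notion of a guessing state that is a distribution over candidate objects and changes round by round can be illustrated with a generic multiplicative belief update. This is a simplified stand-in, not the UoVR, QAEncoder, or UoGS modules of the paper: the function `update_guessing_state` and the per-round compatibility scores are assumptions made for illustration.

```python
def update_guessing_state(state, scores):
    """Reweight a distribution over candidates by non-negative scores.

    state  : current probabilities over candidate objects (sums to 1)
    scores : assumed per-round compatibility of each candidate with the
             latest question-answer pair
    """
    weighted = [p * s for p, s in zip(state, scores)]
    total = sum(weighted)
    if total == 0.0:
        return list(state)  # no usable evidence this round: keep the state
    return [w / total for w in weighted]


# Four candidate objects, uniform initial state, two dialogue rounds.
state = [0.25, 0.25, 0.25, 0.25]
rounds = [[1.0, 1.0, 0.2, 0.2], [0.1, 1.0, 1.0, 0.1]]
history = [state]
for scores in rounds:
    state = update_guessing_state(state, scores)
    history.append(state)

# A guess can be read off at any round, not only after the last one.
guess = max(range(len(state)), key=state.__getitem__)
```

Keeping the whole `history` is what makes round-level supervision possible, since a loss can be attached to the state after every round rather than only to the final guess.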
 [316] arXiv:2002.10342 [pdf, other]

Title: Comparing View-Based and Map-Based Semantic Labelling in Real-Time SLAM
Comments: ICRA 2020
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Generally capable Spatial AI systems must build persistent scene representations where geometric models are combined with meaningful semantic labels. The many approaches to labelling scenes can be divided into two clear groups: view-based methods, which estimate labels from the input view-wise data and then incrementally fuse them into the scene model as it is built; and map-based methods, which label the generated scene model. However, there has so far been no attempt to quantitatively compare view-based and map-based labelling. Here, we present an experimental framework and comparison which uses real-time height map fusion as an accessible platform for a fair comparison, opening up the route to further systematic research in this area.
 [317] arXiv:2002.10344 [pdf, other]

Title: On the Forward and Backward Motion of Milli-Bristle-Bots
Subjects: Systems and Control (eess.SY); Robotics (cs.RO)
This work presents the theoretical analysis and experimental observations of bidirectional motion of a millimeter-scale bristle robot (milli-bristle-bot) with an on-board piezoelectric actuator. First, the theory of the motion, based on the dry-friction model, is developed, and the frequency regions of the forward and backward motion, along with the resonant frequencies of the system, are predicted. Second, milli-bristle-bots with two different bristle tilt angles are fabricated, and their bidirectional motions are experimentally investigated. The dependency of the robot speed on the actuation frequency is studied, which reveals two distinct frequency regions for the forward and backward motions that match our theoretical predictions well. Furthermore, the dependencies of the resonance frequency and robot speed on the bristle tilt angle are experimentally studied and tied to the theoretical model. This work marks the first demonstration of bidirectional motion at millimeter scales, achieved for bristle-bots with a single on-board actuator.
 [318] arXiv:2002.10345 [pdf, other]

Title: Improving BERT Fine-Tuning via Self-Ensemble and Self-Distillation
Comments: 7 pages, 6 figures
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
Fine-tuning pre-trained language models like BERT has become an effective approach in NLP and yields state-of-the-art results on many downstream tasks. Recent studies on adapting BERT to new tasks mainly focus on modifying the model structure, re-designing the pre-training tasks, and leveraging external data and knowledge. The fine-tuning strategy itself has yet to be fully explored. In this paper, we improve the fine-tuning of BERT with two effective mechanisms: self-ensemble and self-distillation. Experiments on text classification and natural language inference tasks show that our proposed methods can significantly improve the adaptation of BERT without any external data or knowledge.
 [319] arXiv:2002.10347 [pdf, other]

Title: MilliCar - An ns-3 Module for mmWave NR V2X Networks
Comments: 8 pages, 5 figures. Submitted to WNS3 2020. The code related to this paper can be found at this https URL
Subjects: Networking and Internet Architecture (cs.NI)
Vehicle-to-vehicle (V2V) communications have opened the way towards cooperative automated driving as a means to guarantee improved road safety and traffic efficiency. The use of the millimeter wave (mmWave) spectrum for V2V, in particular, holds great promise since the large bandwidth available offers the possibility of realizing high-data-rate connections. However, this potential is hindered by the significant path and penetration loss experienced at these frequencies. It then becomes fundamental to practically evaluate the feasibility of installing mmWave-based technologies in the vehicular scenario, in view of the strict latency and throughput requirements of future automotive applications. To do so, in this paper we present MilliCar, the first ns-3 module for V2V mmWave networks, which features a detailed implementation of the sidelink Physical (PHY) and Medium Access Control (MAC) layers based on the latest NR V2X specifications, the 3GPP standard for next-generation vehicular systems. Our module is open-source and enables researchers to compare possible design options and their relative performance through an end-to-end full-stack approach, thereby stimulating further research on this topic.
 [320] arXiv:2002.10348 [pdf, other]

Title: Low-Resource Knowledge-Grounded Dialogue Generation
Comments: Published in ICLR 2020
Subjects: Computation and Language (cs.CL)
Responding with knowledge has been recognized as an important capability for an intelligent conversational agent. Yet knowledge-grounded dialogues, as training data for learning such a response generation model, are difficult to obtain. Motivated by this practical challenge, we consider knowledge-grounded dialogue generation under the natural assumption that only limited training examples are available. In such a low-resource setting, we devise a disentangled response decoder in order to isolate parameters that depend on knowledge-grounded dialogues from the entire generation model. By this means, the major part of the model can be learned from a large number of ungrounded dialogues and unstructured documents, while the small number of remaining parameters can be well fitted using the limited training examples. Evaluation results on two benchmarks indicate that with only 1/8 of the training data, our model can achieve state-of-the-art performance and generalize well on out-of-domain knowledge.
 [321] arXiv:2002.10349 [pdf, other]

Title: A Model-Based Derivative-Free Approach to Black-Box Adversarial Examples: BOBYQA
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
We demonstrate that model-based derivative-free optimisation algorithms can generate adversarial targeted misclassification of deep networks using fewer network queries than non-model-based methods. Specifically, we consider the black-box setting, and show that the number of network queries is less impacted by making the task more challenging, either through reducing the allowed $\ell^{\infty}$ perturbation energy or through training the network with defences against adversarial misclassification. We illustrate this by contrasting the BOBYQA algorithm with the state-of-the-art model-free adversarial targeted misclassification approaches based on genetic, combinatorial, and direct-search algorithms. We observe that for high $\ell^{\infty}$ energy perturbations on networks, the aforementioned simpler model-free methods require the fewest queries. In contrast, the proposed BOBYQA based method achieves state-of-the-art results when the perturbation energy decreases, or if the network is trained against adversarial perturbations.
 [322] arXiv:2002.10361 [pdf, other]

Title: Multilingual Twitter Corpus and Baselines for Evaluating Demographic Bias in Hate Speech Recognition
Comments: Accepted at LREC 2020
Subjects: Computation and Language (cs.CL)
Existing research on fairness evaluation of document classification models mainly uses synthetic monolingual data without ground truth for author demographic attributes. In this work, we assemble and publish a multilingual Twitter corpus for the task of hate speech detection with four inferred author demographic factors: age, country, gender and race/ethnicity. The corpus covers five languages: English, Italian, Polish, Portuguese and Spanish. We evaluate the inferred demographic labels with a crowdsourcing platform, Figure Eight. To examine factors that can cause biases, we conduct an empirical analysis of demographic predictability on the English corpus. We measure the performance of four popular document classifiers and evaluate the fairness and bias of the baseline classifiers on the author-level demographic attributes.
 [323] arXiv:2002.10362 [pdf, other]

Title: Group Membership Verification with Privacy: Sparse or Dense?
Subjects: Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV)
Group membership verification checks if a biometric trait corresponds to one member of a group without revealing the identity of that member. Recent contributions provide privacy for group membership protocols through the joint use of two mechanisms: quantizing templates into discrete embeddings and aggregating several templates into one group representation. However, this scheme has one drawback: the data structure representing the group has a limited size and cannot recognize noisy queries when many templates are aggregated. Moreover, the sparsity of the embeddings seemingly plays a crucial role in the verification performance. This paper proposes a mathematical model for group membership verification that reveals the impact of sparsity on security, compactness, and verification performance. This model bridges the gap towards a Bloom filter robust to noisy queries. It shows that a dense solution is more competitive unless the queries are almost noiseless.
 [324] arXiv:2002.10363 [pdf, other]

Title: Joint Learning of Assignment and Representation for Biometric Group Membership
Subjects: Computer Vision and Pattern Recognition (cs.CV); Cryptography and Security (cs.CR)
This paper proposes a framework for group membership protocols that prevents a curious-but-honest server from reconstructing the enrolled biometric signatures and inferring the identity of querying clients. This framework learns the embedding parameters, group representations and assignments simultaneously. Experiments show the trade-off between security/privacy and verification/identification performance.
 [325] arXiv:2002.10365 [pdf, other]

Title: The Early Phase of Neural Network Training
Comments: ICLR 2020 Camera Ready. Available on OpenReview at this https URL
Subjects: Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)
Recent studies have shown that many important aspects of neural network learning take place within the very earliest iterations or epochs of training. For example, sparse, trainable sub-networks emerge (Frankle et al., 2019), gradient descent moves into a small subspace (Gur-Ari et al., 2018), and the network undergoes a critical period (Achille et al., 2019). Here, we examine the changes that deep neural networks undergo during this early phase of training. We perform extensive measurements of the network state during these early iterations of training and leverage the framework of Frankle et al. (2019) to quantitatively probe the weight distribution and its reliance on various aspects of the dataset. We find that, within this framework, deep networks are not robust to reinitializing with random weights while maintaining signs, and that weight distributions are highly non-independent even after only a few hundred iterations. Despite this behavior, pre-training with blurred inputs or an auxiliary self-supervised task can approximate the changes in supervised networks, suggesting that these changes are not inherently label-dependent, though labels significantly accelerate this process. Together, these results help to elucidate the network changes occurring during this pivotal initial period of learning.
 [326] arXiv:2002.10371 [pdf, ps, other]

Title: A Hardware Architecture for Reconfigurable Intelligent Surfaces with Minimal Active Elements for Explicit Channel Estimation
Comments: 5 pages, 2 figures, invited/accepted to IEEE ICASSP 2020
Subjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Intelligent surfaces comprising cost-effective, nearly passive, and reconfigurable unit elements have lately been gaining increasing interest due to their potential in enabling fully programmable wireless environments. They are envisioned to offer environmental intelligence for diverse communication objectives when coated on various objects in the deployment area of interest. To achieve this overarching goal, the channels in which the Reconfigurable Intelligent Surfaces (RISs) are involved need, in principle, to be estimated. However, this is a challenging task with the currently available hardware RIS architectures, which require lengthy training periods among the network nodes utilizing RIS-assisted wireless communication. In this paper, we present a novel RIS architecture comprising any number of passive reflecting elements, a simple controller for their adjustable configuration, and a single Radio Frequency (RF) chain for baseband measurements. Capitalizing on this architecture and assuming sparse wireless channels in the beamspace domain, we present an alternating optimization approach for explicit estimation of the channel gains at the RIS elements attached to the single RF chain. Representative simulation results demonstrate the channel estimation accuracy and achievable end-to-end performance for various training lengths and numbers of reflecting unit elements.
 [327] arXiv:2002.10373 [pdf, other]

Title: Symbolic Learning and Reasoning with Noisy Data for Probabilistic Anchoring
Subjects: Artificial Intelligence (cs.AI)
Robotic agents should be able to learn from sub-symbolic sensor data and, at the same time, be able to reason about objects and communicate with humans on a symbolic level. This raises the question of how to overcome the gap between symbolic and sub-symbolic artificial intelligence. We propose a semantic world modeling approach based on bottom-up object anchoring using an object-centered representation of the world. Perceptual anchoring processes continuous perceptual sensor data and maintains a correspondence to a symbolic representation. We extend the definitions of anchoring to handle multi-modal probability distributions and we couple the resulting symbol anchoring system to a probabilistic logic reasoner for performing inference. Furthermore, we use statistical relational learning to enable the anchoring framework to learn symbolic knowledge in the form of a set of probabilistic logic rules of the world from noisy and sub-symbolic sensor input. The resulting framework, which combines perceptual anchoring and statistical relational learning, is able to maintain a semantic world model of all the objects that have been perceived over time, while still exploiting the expressiveness of logical rules to reason about the state of objects which are not directly observed through sensory input data. To validate our approach we demonstrate, on the one hand, the ability of our system to perform probabilistic reasoning over multi-modal probability distributions, and on the other hand, the learning of probabilistic logical rules from anchored objects produced by perceptual observations. The learned logical rules are, subsequently, used to assess our proposed probabilistic anchoring procedure. We demonstrate our system in a setting involving object interactions where object occlusions arise and where probabilistic inference is needed to correctly anchor objects.
 [328] arXiv:2002.10375 [pdf, other]

Title: Discriminative Adversarial Search for Abstractive Summarization
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
We introduce a novel approach for sequence decoding, Discriminative Adversarial Search (DAS), which has the desirable properties of alleviating the effects of exposure bias without requiring external metrics. Inspired by Generative Adversarial Networks (GANs), wherein a discriminator is used to improve the generator, our method differs from GANs in that the generator parameters are not updated at training time and the discriminator is only used to drive sequence generation at inference time.
We investigate the effectiveness of the proposed approach on the task of Abstractive Summarization: the results obtained show that a naive application of DAS improves over the state-of-the-art methods, with further gains obtained via discriminator retraining. Moreover, we show how DAS can be effective for cross-domain adaptation. Finally, all results reported are obtained without additional rule-based filtering strategies, commonly used by the best performing systems available: this indicates that DAS can effectively be deployed without relying on post-hoc modifications of the generated outputs.
 [329] arXiv:2002.10376 [pdf, other]

Title: The Two Regimes of Deep Network Training
Comments: 14 pages (5 of appendix), 14 figures
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
The learning rate schedule has a major impact on the performance of deep learning models. Still, the choice of a schedule is often heuristic. We aim to develop a precise understanding of the effects of different learning rate schedules and the appropriate way to select them. To this end, we isolate two distinct phases of training: the first, which we refer to as the "large-step" regime, exhibits rather poor performance from an optimization point of view but is the primary contributor to model generalization; the latter, "small-step" regime exhibits much more "convex-like" optimization behavior but, used in isolation, produces models that generalize poorly. We find that by treating these regimes separately, and specializing our training algorithm to each one of them, we can significantly simplify learning rate schedules.
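A schedule built from just these two regimes can be sketched as a single switch from a large to a small step size. The function name `two_regime_lr`, the switch point, and both learning rate values below are placeholder assumptions; the abstract does not give concrete settings.

```python
def two_regime_lr(step, switch_step, large_lr=0.1, small_lr=0.001):
    """Piecewise-constant schedule: large steps first, small steps after.

    All numeric values are illustrative placeholders.
    """
    if step < 0 or switch_step < 0:
        raise ValueError("step and switch_step must be non-negative")
    return large_lr if step < switch_step else small_lr


# Learning rate for the first five steps with a switch at step 3.
schedule = [two_regime_lr(s, switch_step=3) for s in range(5)]
```

Compared with multi-stage decay schedules, this form leaves only three quantities to choose: the two step sizes and the point at which training moves from one regime to the other.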
 [330] arXiv:2002.10378 [pdf, other]

Title: Supervised Deep Similarity Matching
Subjects: Machine Learning (cs.LG); Neurons and Cognition (q-bio.NC); Machine Learning (stat.ML)
We propose a novel biologically-plausible solution to the credit assignment problem, motivated by observations in the ventral visual pathway and in trained deep neural networks. In both, representations of objects in the same category become progressively more similar, while objects belonging to different categories become less similar. We use this observation to motivate a layer-specific learning goal in a deep network: each layer aims to learn a representational similarity matrix that interpolates between previous and later layers. We formulate this idea using a supervised deep similarity matching cost function and derive from it deep neural networks with feedforward, lateral and feedback connections, and neurons that exhibit biologically-plausible Hebbian and anti-Hebbian plasticity. Supervised deep similarity matching can be interpreted as an energy-based learning algorithm, but with significant differences from others in how a contrastive function is constructed.
 [331] arXiv:2002.10381 [pdf, other]

Title: Sketchformer: Transformer-based Representation for Sketched Structure
Comments: Accepted for publication at CVPR 2020
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Sketchformer is a novel transformer-based representation for encoding free-hand sketches input in a vector form, i.e. as a sequence of strokes. Sketchformer effectively addresses multiple tasks: sketch classification, sketch based image retrieval (SBIR), and the reconstruction and interpolation of sketches. We report several variants exploring continuous and tokenized input representations, and contrast their performance. Our learned embedding, driven by a dictionary learning tokenization scheme, yields state-of-the-art performance in classification and image retrieval tasks when compared against baseline representations driven by LSTM sequence-to-sequence architectures: SketchRNN and derivatives. We show that sketch reconstruction and interpolation are improved significantly by the Sketchformer embedding for complex sketches with longer stroke sequences.
 [332] arXiv:2002.10384 [pdf, ps, other]

Title: On the Sample Complexity of Adversarial Multi-Source PAC Learning
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
We study the problem of learning from multiple untrusted data sources, a scenario of increasing practical relevance given the recent emergence of crowdsourcing and collaborative learning paradigms. Specifically, we analyze the situation in which a learning system obtains datasets from multiple sources, some of which might be biased or even adversarially perturbed. It is known that in the single-source case, an adversary with the power to corrupt a fixed fraction of the training data can prevent PAC-learnability; that is, even in the limit of infinitely much training data, no learning system can approach the optimal test error. In this work we show that, surprisingly, the same is not true in the multi-source setting, where the adversary can arbitrarily corrupt a fixed fraction of the data sources. Our main results are a generalization bound that provides finite-sample guarantees for this learning setting, as well as corresponding lower bounds. Besides establishing PAC-learnability, our results also show that in a cooperative learning setting, sharing data with other parties has provable benefits, even if some participants are malicious.
 [333] arXiv:2002.10387 [pdf, other]

Title: Achievable Information Rates for Probabilistic Amplitude Shaping: A Minimum-Randomness Approach via Random Sign-Coding Arguments
Comments: 10 pages, 4 figures
Subjects: Information Theory (cs.IT)
Probabilistic amplitude shaping (PAS) is a coded modulation strategy in which constellation shaping and channel coding are combined. PAS has attracted considerable attention in both wireless and optical communications. Achievable information rates (AIRs) of PAS have been investigated in the literature using Gallager's error exponent approach. In particular, it has been shown that PAS achieves the capacity of a memoryless channel. In this work, we revisit the capacity-achieving property of PAS, and derive AIRs using weak typicality. We provide alternative proofs based on random sign-coding arguments. Our objective is to minimize the randomness in the random coding experiment. Accordingly, in our proofs, only some signs of the channel inputs are drawn from a random code, while the remaining signs and the amplitudes are produced constructively. We consider both symbol-metric and bit-metric decoding.
 [334] arXiv:2002.10389 [pdf, other]

Title: Semi-Supervised Neural Architecture Search
Comments: Code available at this https URL
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Neural architecture search (NAS) relies on a good controller to generate better architectures or predict the accuracy of given architectures. However, training the controller requires both abundant and high-quality pairs of architectures and their accuracy, while it is costly to evaluate an architecture and obtain its accuracy. In this paper, we propose SemiNAS, a semi-supervised NAS approach that leverages numerous unlabeled architectures (without evaluation and thus at nearly no cost) to improve the controller. Specifically, SemiNAS 1) trains an initial controller with a small set of architecture-accuracy data pairs; 2) uses the trained controller to predict the accuracy of a large number of architectures (without evaluation); and 3) adds the generated data pairs to the original data to further improve the controller. SemiNAS has two advantages: 1) it reduces the computational cost under the same accuracy guarantee; 2) it achieves higher accuracy under the same computational cost. On the NASBench-101 benchmark dataset, it discovers a top 0.01% architecture after evaluating roughly 300 architectures, with only 1/7 of the computational cost of regularized evolution and gradient-based methods. On ImageNet, it achieves a 24.2% top-1 error rate (under the mobile setting) using 4 GPU-days for search. We further apply it to the LJSpeech text-to-speech task, where it achieves a 97% intelligibility rate in the low-resource setting and a 15% test error rate in the robustness setting, improvements of 9% and 7% over the baseline respectively. Our code is available at https://github.com/renqianluo/SemiNAS.
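The three numbered steps can be sketched as a pseudo-labeling loop. The sketch is a toy under strong assumptions: each architecture is reduced to a single hypothetical numeric feature, the controller is replaced by a one-dimensional least-squares predictor (`fit_linear`), and all accuracies are made-up values. None of this reflects the actual SemiNAS controller or code.

```python
def fit_linear(pairs):
    """Least-squares fit of accuracy against a 1-D architecture feature."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


# Hypothetical data: (architecture feature, measured accuracy).
labeled = [(1.0, 0.60), (2.0, 0.70), (4.0, 0.80)]
# Architectures that were never trained or evaluated.
unlabeled = [0.5, 1.5, 2.5, 3.0, 3.5]

# Step 1: train an initial predictor on the small labeled set.
predictor = fit_linear(labeled)
# Step 2: predict accuracies for the unlabeled architectures.
pseudo_labeled = [(x, predictor(x)) for x in unlabeled]
# Step 3: retrain on the union of measured and predicted pairs.
predictor = fit_linear(labeled + pseudo_labeled)
```

With a linear predictor the refit in step 3 cannot add information, because the pseudo-labels lie exactly on the fitted line, so the sketch only shows the data flow. Any benefit in practice would have to come from a richer controller than this stand-in.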
 [335] arXiv:2002.10390 [pdf, other]

Title: Spatial-Temporal Moving Target Defense: A Markov Stackelberg Game Model
Comments: accepted by AAMAS 2020
Subjects: Computer Science and Game Theory (cs.GT); Cryptography and Security (cs.CR)
Moving target defense has emerged as a critical paradigm for protecting a vulnerable system against persistent and stealthy attacks. To protect a system, a defender proactively changes the system configurations to limit the exposure of security vulnerabilities to potential attackers. In doing so, the defender creates asymmetric uncertainty and complexity for the attackers, making it much harder for them to compromise the system. In practice, the defender incurs a switching cost for each migration of the system configurations. The switching cost usually depends on both the current configuration and the following configuration. Besides, different system configurations typically require a different amount of time for an attacker to exploit and attack. Therefore, a defender must simultaneously decide both the optimal sequences of system configurations and the optimal timing for switching. In this paper, we propose a Markov Stackelberg Game framework to precisely characterize the defender's spatial and temporal decision-making in the face of advanced attackers. We introduce a relative value iteration algorithm that computes the defender's optimal moving target defense strategies. Empirical evaluation on real-world problems demonstrates the advantages of the Markov Stackelberg game model for spatial-temporal moving target defense.
 [336] arXiv:2002.10392 [pdf, other]

Title: Suppressing Uncertainties for Large-Scale Facial Expression Recognition
Comments: This manuscript has been accepted by CVPR 2020
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Annotating a high-quality large-scale facial expression dataset is extremely difficult due to the uncertainties caused by ambiguous facial expressions, low-quality facial images, and the subjectiveness of annotators. These uncertainties lead to a key challenge of large-scale Facial Expression Recognition (FER) in the deep learning era. To address this problem, this paper proposes a simple yet efficient Self-Cure Network (SCN) which suppresses the uncertainties efficiently and prevents deep networks from overfitting uncertain facial images. Specifically, SCN suppresses the uncertainty from two different aspects: 1) a self-attention mechanism over each mini-batch to weight each training sample with a ranking regularization, and 2) a careful relabeling mechanism to modify the labels of the samples in the lowest-ranked group. Experiments on synthetic FER datasets and our collected WebEmotion dataset validate the effectiveness of our method. Results on public benchmarks demonstrate that our SCN outperforms current state-of-the-art methods with 88.14% on RAF-DB, 60.23% on AffectNet, and 89.35% on FERPlus. The code will be available at https://github.com/kaiwang960112/SelfCureNetwork.
 [337] arXiv:2002.10394 [pdf, other]

Title: DeepPlume: Very High Resolution Real-Time Air Quality Mapping
Comments: 8 pages, 8 figures
Subjects: Machine Learning (cs.LG); Computers and Society (cs.CY); Machine Learning (stat.ML)
This paper presents an engine able to jointly predict the real-time concentration of the main pollutants harming people's health: nitrogen dioxide (NO2), ozone (O3) and particulate matter (PM2.5 and PM10, the particles whose size is below 2.5 µm and 10 µm, respectively).
The engine covers a large part of the world and is fed with real-time official station measurements, atmospheric models' forecasts, land cover data, road networks and traffic estimates to produce predictions with a very high resolution in the range of a few dozen meters. This resolution makes the engine suited to innovative applications such as street-level air quality mapping or air-quality-adjusted routing.
Plume Labs has deployed a similar prediction engine to build several products aiming at providing air quality data to individuals and businesses. For the sake of clarity and reproducibility, the engine presented here has been built specifically for this paper and differs quite significantly from the one used in Plume Labs' products. A major difference is in the data sources feeding the engine: in particular, this prediction engine does not include mobile sensor measurements.
 [338] arXiv:2002.10400 [pdf, other]

Title: Closing the convergence gap of SGD without replacement
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Stochastic gradient descent without replacement sampling is widely used in practice for model training. However, the vast majority of SGD analyses assume data sampled with replacement, and when the function minimized is strongly convex, an $\mathcal{O}\left(\frac{1}{T}\right)$ rate can be established when SGD is run for $T$ iterations. A recent line of breakthrough work on SGD without replacement (SGDo) established an $\mathcal{O}\left(\frac{n}{T^2}\right)$ convergence rate when the function minimized is strongly convex and is a sum of $n$ smooth functions, and an $\mathcal{O}\left(\frac{1}{T^2}+\frac{n^3}{T^3}\right)$ rate for sums of quadratics. On the other hand, the tightest known lower bound postulates an $\Omega\left(\frac{1}{T^2}+\frac{n^2}{T^3}\right)$ rate, leaving open the possibility of better SGDo convergence rates in the general case. In this paper, we close this gap and show that SGD without replacement achieves a rate of $\mathcal{O}\left(\frac{1}{T^2}+\frac{n^2}{T^3}\right)$ when the sum of the functions is a quadratic, and offer a new lower bound of $\Omega\left(\frac{n}{T^2}\right)$ for strongly convex functions that are sums of smooth functions.
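The difference between the two sampling schemes compared in this abstract comes down to how the index order is drawn in each pass over the $n$ component functions. A minimal sketch on a sum of one-dimensional quadratics follows; the objective, the constant step size, and the function name `sgd` are illustrative assumptions, and the code does not attempt to reproduce the rates stated above.

```python
import numpy as np

def sgd(a, b, epochs, lr, with_replacement, seed=0):
    """Minimize F(x) = (1/n) * sum_i 0.5 * a[i] * (x - b[i])**2 over scalar x."""
    rng = np.random.default_rng(seed)
    n = len(a)
    x = 0.0
    for _ in range(epochs):
        if with_replacement:
            order = rng.integers(0, n, size=n)   # i.i.d. draws, repeats allowed
        else:
            order = rng.permutation(n)           # every index exactly once per epoch
        for i in order:
            x -= lr * a[i] * (x - b[i])          # gradient step on component i
    return x

rng = np.random.default_rng(1)
a = rng.uniform(0.5, 1.5, size=50)               # hypothetical curvatures
b = rng.normal(size=50)                          # hypothetical component minimizers
x_star = np.sum(a * b) / np.sum(a)               # exact minimizer of F
x_with = sgd(a, b, epochs=200, lr=0.01, with_replacement=True)
x_without = sgd(a, b, epochs=200, lr=0.01, with_replacement=False)
```

Here the total number of iterations is $T$ = `epochs * n`; comparing `abs(x_with - x_star)` and `abs(x_without - x_star)` over several seeds and step-size schedules is one way to explore the gap the paper analyzes.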
 [339] arXiv:2002.10401 [pdf]

Title: BLAST: Bridging Length/time scales via Atomistic Simulation Toolkit
Authors: Henry Chan, Badri Narayanan, Mathew Cherukara, Troy D. Loeffler, Michael G. Sternberg, Anthony Avarca, Subramanian K. R. S. Sankaranarayanan
Subjects: Computational Engineering, Finance, and Science (cs.CE); Mesoscale and Nanoscale Physics (cond-mat.mes-hall); Materials Science (cond-mat.mtrl-sci)
The ever-increasing power of supercomputers, coupled with highly scalable simulation codes, has made molecular dynamics an indispensable tool in applications ranging from predictive modeling of materials to computational design and discovery of new materials. Multi-fidelity scale bridging between the various flavors of molecular dynamics, i.e., ab initio, classical and coarse-grained models, has remained a long-standing challenge. Here, we introduce our framework BLAST (Bridging Length/time scales via Atomistic Simulation Toolkit) that leverages machine learning principles to address this challenge. BLAST is a multi-fidelity scale bridging framework that provides users with the capabilities to train and develop their own classical atomistic and coarse-grained interatomic potentials (force fields) for molecular simulations. BLAST is designed to address several long-standing problems in the molecular simulations community, such as unintended misuse of existing force fields due to the knowledge gap between developers and users, bottlenecks in traditional force field development approaches, and other issues relating to the accuracy, efficiency, and transferability of force fields. Here, we discuss several important aspects in force field development and highlight features in BLAST that enable its functionalities and ease of use.
 [340] arXiv:2002.10410 [pdf, other]

Title: Lagrangian Decomposition for Neural Network Verification
Authors: Rudy Bunel, Alessandro De Palma, Alban Desmaison, Krishnamurthy Dvijotham, Pushmeet Kohli, Philip H.S. Torr, M. Pawan Kumar
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
A fundamental component of neural network verification is the computation of bounds on the values their outputs can take. Previous methods have either used off-the-shelf solvers, discarding the problem structure, or relaxed the problem even further, making the bounds unnecessarily loose. We propose a novel approach based on Lagrangian Decomposition. Our formulation admits an efficient supergradient ascent algorithm, as well as an improved proximal algorithm. Both algorithms offer three advantages: (i) they yield bounds that are provably at least as tight as previous dual algorithms relying on Lagrangian relaxations; (ii) they are based on operations analogous to the forward/backward pass of neural network layers and are therefore easily parallelizable, amenable to GPU implementation and able to take advantage of the convolutional structure of problems; and (iii) they allow for anytime stopping while still providing valid bounds. Empirically, we show that we obtain bounds comparable with off-the-shelf solvers in a fraction of their running time, and obtain tighter bounds in the same time as previous dual algorithms. This results in an overall speed-up when employing the bounds for formal verification.
 [341] arXiv:2002.10411 [pdf]

Title: Clustering and Classification with Non-Existence Attributes: A Sentenced Discrepancy Measure Based Technique
Comments: 30 pages, 16 figures
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
A number of independent-world clustering problems suffer from incomplete data characterization, for some or all of the data instances, due to lost or absent attributes. Typical clustering approaches cannot be applied directly to such data without preprocessing by techniques such as imputation or marginalization. We overcome this drawback by utilizing a Sentenced Discrepancy Measure, which we refer to as the Attribute Weighted Penalty based Discrepancy (AWPD). Using the AWPD measure, we modify the KMEANS++ and Scalable KMEANS++ clustering algorithms and the k Nearest Neighbor (kNN) classification algorithm so as to make them directly applicable to datasets with non-existence attributes. We present a detailed theoretical analysis which shows that the new AWPD-based KMEANS++, Scalable KMEANS++ and kNN algorithms converge to a local optimum within a finite number of iterations. We report in-depth experiments on numerous benchmark datasets for various forms of non-existence, showing that the proposed clustering and classification techniques usually perform better than some of the renowned imputation methods that are generally used to process such insufficient data. This technique is designed to trace invaluable data, to directly apply our method to datasets which have non-existence attributes, and to establish a method for detecting unstructured non-existence attributes with the best accuracy rate and minimum cost.
 [342] arXiv:2002.10413 [pdf, other]

Title: Neural Message Passing on High Order Paths
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Graph neural networks have achieved impressive results in predicting molecular properties, but they do not directly account for local and hidden structures in the graph such as functional groups and molecular geometry. At each propagation step, GNNs aggregate only over first-order neighbours, ignoring important information contained in subsequent neighbours as well as the relationships between those higher-order connections. In this work, we generalize graph neural nets to pass messages and aggregate across higher-order paths. This allows information to propagate over various levels and substructures of the graph. We demonstrate our model on a few tasks in molecular property prediction.
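To make the idea of aggregating beyond first-order neighbours concrete, here is a minimal sketch of one propagation step that sums messages over edges and over length-2 paths. It is a generic illustration under assumptions made here (dense adjacency matrix, linear message functions `W1` and `W2`, a residual ReLU update); the paper's model may use different, path-specific message functions.

```python
import numpy as np

def path_message_passing(A, H, W1, W2):
    """One propagation step aggregating over 1-hop edges and 2-hop paths."""
    A2 = A @ A                    # A2[i, j] counts length-2 walks i -> k -> j
    np.fill_diagonal(A2, 0)       # drop walks that return to their start node
    m1 = A @ H @ W1               # messages from first-order neighbours
    m2 = A2 @ H @ W2              # messages along second-order paths
    return np.maximum(H + m1 + m2, 0.0)   # residual update followed by ReLU

# Path graph 0 - 1 - 2: nodes 0 and 2 are connected only through node 1.
A = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
H = np.eye(3)                     # one-hot node features
out = path_message_passing(A, H, np.eye(3), np.eye(3))
```

With one-hot features and identity weights, every entry of `out` is 1: node 0 receives node 2's feature in a single step, which first-order aggregation alone would need two steps to deliver.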
 [343] arXiv:2002.10416 [pdf, other]

Title: Resources for Turkish Dependency Parsing: Introducing the BOUN Treebank and the BoAT Annotation Tool
Authors: Utku Türk (1), Furkan Atmaca (1), Şaziye Betül Özateş (2), Gözde Berk (2), Seyyit Talha Bedir (1), Abdullatif Köksal (2), Balkız Öztürk Başaran (1), Tunga Güngör (2), Arzucan Özgür (2) ((1) Department of Linguistics, Boğaziçi University, (2) Department of Computer Engineering, Boğaziçi University)
Comments: 29 pages, 5 figures, 10 tables, submitted to Language Resources and Evaluation
Subjects: Computation and Language (cs.CL)
In this paper, we describe our contributions and efforts to develop Turkish resources, which include a new treebank (BOUN Treebank) with novel sentences, along with the guidelines we adopted and a new annotation tool we developed (BoAT). The manual annotation process we employed was shaped and implemented by a team of four linguists and five NLP specialists. Decisions regarding the annotation of the BOUN Treebank were made in line with the Universal Dependencies framework, which originated from the works of De Marneffe et al. (2014) and Nivre et al. (2016). We took into account the recent unifying efforts based on the re-annotation of other Turkish treebanks in the UD framework (Türk et al., 2019). Through the BOUN Treebank, we introduced a total of 9,757 sentences from various topics including biographical texts, national newspapers, instructional texts, popular culture articles, and essays. In addition, we report the parsing results of a graph-based dependency parser obtained over each text type, the total of the BOUN Treebank, and all Turkish treebanks that we either re-annotated or introduced. We show that a state-of-the-art dependency parser has improved scores for identifying the proper head and the syntactic relationships between the heads and the dependents. In light of these results, we have observed that unifying the Turkish annotation scheme and introducing a more comprehensive treebank improve performance with regard to dependency parsing.
 [344] arXiv:2002.10420 [pdf, other]

Title: Boosting rare benthic macroinvertebrates taxa identification with one-class classification
Comments: 5 pages, 1 figure, 2 tables
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Image and Video Processing (eess.IV); Machine Learning (stat.ML)
Insect monitoring is crucial for understanding the consequences of rapid ecological changes, but taxa identification currently requires tedious manual expert work and cannot be scaled up efficiently. Deep convolutional neural networks (CNNs) provide a viable way to significantly increase the biomonitoring volumes. However, taxa abundances are typically very imbalanced and the amounts of training images for the rarest classes are simply too low for deep CNNs. As a result, the samples from the rare classes are often completely missed, while detecting them has biological importance. In this paper, we propose combining the trained deep CNN with one-class classifiers to improve the rare species identification. One-class classification models are traditionally trained with much fewer samples and they can provide a mechanism to indicate samples potentially belonging to the rare classes for human inspection. Our experiments confirm that the proposed approach may indeed support moving towards partial automation of the taxa identification task.
 [345] arXiv:2002.10429 [pdf]

Title: Distributed Frequency Emergency Control with Coordinated Edge Intelligence
Subjects: Systems and Control (eess.SY); Signal Processing (eess.SP)
Developing effective strategies to rapidly support grid frequency while minimizing loss in case of severe contingencies is an important requirement in power systems. While distributed responsive load demands are commonly adopted for frequency regulation, it is difficult to achieve both rapid response and global accuracy in a practical and cost-effective manner. In this paper, the cyber-physical design of an Internet-of-Things (IoT)-enabled system, called Grid Sense, is presented. Grid Sense utilizes a large number of distributed appliances for frequency emergency support. It features a local power loss $\Delta P$ estimation approach for frequency emergency control based on coordinated edge intelligence. The specifically designed smart outlets of Grid Sense detect the frequency disturbance event locally using the parameters sent from the control center to estimate active power loss in the system and to make rapid and accurate switching decisions soon after a severe contingency. Based on a modified IEEE 24-bus system, numerical simulations and hardware experiments are conducted to demonstrate the frequency support performance of Grid Sense in terms of accuracy and speed. It is shown that Grid Sense equipped with its local $\Delta P$-estimation frequency control approach can accurately and rapidly prevent the drop of frequency after a major power loss.
 [346] arXiv:2002.10433 [pdf, other]

Title: From Chess and Atari to StarCraft and Beyond: How Game AI is Driving the World of AI
Journal-ref: KI - Kuenstliche Intelligenz (2020)
Subjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
This paper reviews the field of Game AI, which not only deals with creating agents that can play a certain game, but also with areas as diverse as creating game content automatically, game analytics, or player modelling. While Game AI was for a long time not very well recognized by the larger scientific community, it has established itself as a research area for developing and testing the most advanced forms of AI algorithms, and articles covering advances in mastering video games such as StarCraft 2 and Quake III appear in the most prestigious journals. Because of the growth of the field, a single review cannot cover it completely. Therefore, we focus on important recent developments, including the fact that advances in Game AI are starting to be extended to areas outside of games, such as robotics or the synthesis of chemicals. In this article, we review the algorithms and methods that have paved the way for these breakthroughs, report on the other important areas of Game AI research, and also point out exciting directions for the future of Game AI.
 [347] arXiv:2002.10434 [pdf, other]

Title: Maximum Entropy on the Mean: A Paradigm Shift for Regularization in Image Deblurring
Comments: 15 pages, 7 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Image deblurring is a notoriously challenging ill-posed inverse problem. In recent years, a wide variety of approaches have been proposed based upon regularization at the level of the image or on techniques from machine learning. We propose an alternative approach, shifting the paradigm towards regularization at the level of the probability distribution on the space of images. Our method is based upon the idea of maximum entropy on the mean, wherein we work at the level of the probability density function of the image whose expectation is our estimate of the ground truth. Using techniques from convex analysis and probability theory, we show that the method is computationally feasible and amenable to very large blurs. Moreover, when images are embedded with symbology (a known pattern), we show how our method can be applied to approximate the unknown blur kernel with remarkable effects. While our method is stable with respect to small amounts of noise, it does not actively denoise. However, for moderate to large amounts of noise, it performs well by preconditioned denoising with a state-of-the-art method.
 [348] arXiv:2002.10435 [pdf, ps, other]

Title: Learning Structured Distributions From Untrusted Batches: Faster and Simpler
Comments: 34 pages
Subjects: Machine Learning (cs.LG); Data Structures and Algorithms (cs.DS); Machine Learning (stat.ML)
We revisit the problem of learning from untrusted batches introduced by Qiao and Valiant [QV17]. Recently, Jain and Orlitsky [JO19] gave a simple semidefinite programming approach based on the cut-norm that achieves essentially information-theoretically optimal error in polynomial time. Concurrently, Chen et al. [CLM19] considered a variant of the problem where $\mu$ is assumed to be structured, e.g. log-concave, monotone hazard rate, $t$-modal, etc. In this case, it is possible to achieve the same error with sample complexity sublinear in $n$, and they exhibited a quasi-polynomial time algorithm for doing so using Haar wavelets.
In this paper, we find an appealing way to synthesize the techniques of [JO19] and [CLM19] to give the best of both worlds: an algorithm which runs in polynomial time and can exploit structure in the underlying distribution to achieve sublinear sample complexity. Along the way, we simplify the approach of [JO19] by avoiding the need for SDP rounding and giving a more direct interpretation of it through the lens of soft filtering, a powerful recent technique in high-dimensional robust estimation.
 [349] arXiv:2002.10438 [pdf, other]

Title: LogicGAN: Logic-guided Generative Adversarial Networks
Comments: 6 pages (+ 1 page for reference). Vineel Nagisetty and Laura Graves are joint first authors
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Generative Adversarial Networks (GANs) are a revolutionary class of Deep Neural Networks (DNNs) that have been successfully used to generate realistic images, music, text, and other data. However, it is well known that GAN training can be notoriously resource-intensive and presents many challenges. Further, a potential weakness in GANs is that discriminator DNNs typically provide only one value (loss) of corrective feedback to generator DNNs (namely, the discriminator's assessment of the generated example). By contrast, we propose a new class of GAN, which we refer to as LogicGAN, that leverages recent advances in (logic-based) explainable AI (xAI) systems to provide a "richer" form of corrective feedback from discriminators to generators. Specifically, we modify the gradient descent process using xAI systems that specify the reason why the discriminator made the classification it did, thus providing the richer corrective feedback that helps the generator to better fool the discriminator. Using our approach, we show that LogicGANs learn much faster on MNIST data, achieving an improvement in data efficiency of 45% in the single-class and 12.73% in the multi-class setting over standard GANs while maintaining the same quality as measured by Fréchet Inception Distance. Further, we argue that LogicGAN gives users greater control over how models learn than standard GAN systems.
 [350] arXiv:2002.10444 [pdf, other]

Title: Batch Normalization Biases Deep Residual Networks Towards Shallow Paths
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Batch normalization has multiple benefits. It improves the conditioning of the loss landscape, and is a surprisingly effective regularizer. However, the most important benefit of batch normalization arises in residual networks, where it dramatically increases the largest trainable depth. We identify the origin of this benefit: at initialization, batch normalization downscales the residual branch relative to the skip connection, by a normalizing factor proportional to the square root of the network depth. This ensures that, early in training, the function computed by deep normalized residual networks is dominated by shallow paths with well-behaved gradients. We use this insight to develop a simple initialization scheme which can train very deep residual networks without normalization. We also clarify that, although batch normalization does enable stable training with larger learning rates, this benefit is only useful when one wishes to parallelize training over large batch sizes. Our results help isolate the distinct benefits of batch normalization in different architectures.
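The square-root-of-depth claim can be checked numerically on a toy residual stack. The sketch below is an assumption-laden simplification: each residual branch is a single random linear layer (no nonlinearity, no learned scale or shift), and the function name `mean_variance_after_blocks` is hypothetical. A normalized branch adds roughly unit variance per block, so the skip path's variance grows about linearly with depth and the branch's relative scale shrinks like one over the square root of depth.

```python
import numpy as np

def mean_variance_after_blocks(depth, width=128, batch=2048, normalize=True, seed=0):
    """Average per-feature variance of the residual stream after `depth` blocks."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(batch, width))               # unit-variance input
    for _ in range(depth):
        W = rng.normal(scale=1.0 / np.sqrt(width), size=(width, width))
        branch = x @ W                                # toy residual branch
        if normalize:
            # Batch statistics only, as at initialization (no learned scale/shift).
            branch = (branch - branch.mean(axis=0)) / branch.std(axis=0)
        x = x + branch                                # skip connection
    return x.var(axis=0).mean()

v_norm = mean_variance_after_blocks(16, normalize=True)    # grows about linearly in depth
v_plain = mean_variance_after_blocks(16, normalize=False)  # roughly doubles per block
```

In this setup `v_norm` should land near `depth + 1`, while `v_plain` grows exponentially, which is one way to see why the unnormalized stack becomes hard to train at large depth.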
 [351] arXiv:2002.10445 [pdf, other]

Title: Deep Nearest Neighbor Anomaly Detection
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
Nearest neighbors is a successful and long-standing technique for anomaly detection. Significant progress has been recently achieved by self-supervised deep methods (e.g. RotNet). Self-supervised features, however, typically underperform ImageNet pre-trained features. In this work, we investigate whether the recent progress can indeed outperform nearest-neighbor methods operating on an ImageNet pre-trained feature space. The simple nearest-neighbor-based approach is experimentally shown to outperform self-supervised methods in accuracy, few-shot generalization, training time and noise robustness, while making fewer assumptions on image distributions.
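A nearest-neighbor anomaly score of the kind discussed here can be sketched in a few lines once features are available. The sketch assumes the features have already been extracted by some pre-trained network; random vectors stand in for them, and the choice of the mean squared distance to the `k` nearest training features as the score is an illustrative assumption.

```python
import numpy as np

def knn_anomaly_scores(train_feats, test_feats, k=2):
    """Score each test feature by its mean squared distance to the k nearest training features."""
    # Pairwise squared Euclidean distances, shape (n_test, n_train).
    diff = test_feats[:, None, :] - train_feats[None, :, :]
    d2 = (diff ** 2).sum(axis=-1)
    nearest = np.sort(d2, axis=1)[:, :k]      # k smallest distances per test point
    return nearest.mean(axis=1)               # higher score = more anomalous

rng = np.random.default_rng(0)
train_feats = rng.normal(size=(100, 16))      # placeholder features of normal data
test_feats = np.stack([np.zeros(16),          # close to the training cluster
                       10.0 * np.ones(16)])   # far from every training point
scores = knn_anomaly_scores(train_feats, test_feats)
```

The second test point receives a much larger score than the first; thresholding the scores turns them into anomaly decisions.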
Cross-lists for Tue, 25 Feb 20
 [352] arXiv:1908.08783 (cross-list from cs.NE) [pdf, other]

Title: Learning Fitness Functions for Genetic Algorithms
Authors: Shantanu Mandal, Todd A. Anderson, Javier S. Turek, Justin Gottschlich, Shengtian Zhou, Abdullah Muzahid
Subjects: Neural and Evolutionary Computing (cs.NE); Machine Learning (cs.LG); Machine Learning (stat.ML)
The problem of automatic software generation is known as Machine Programming. In this work, we propose a framework based on genetic algorithms to solve this problem. Although genetic algorithms have been used successfully for many problems, one criticism is that hand-crafting the fitness function, the test that aims to effectively guide the evolution, can be notably challenging. Our framework presents a novel approach to learning the fitness function, using neural networks to predict the values of an ideal fitness function. We also augment the evolutionary process with a minimally intrusive search heuristic. This heuristic improves the framework's ability to discover correct programs from ones that are approximately correct and does so with negligible computational overhead. We compare our approach with two state-of-the-art program synthesis methods and demonstrate that it finds more correct programs with fewer candidate program generations.
 [353] arXiv:2002.09487 (cross-list from physics.flu-dyn) [pdf, other]

Title: Quasi-periodic traveling gravity-capillary waves
Comments: 25 pages, 7 figures. arXiv admin note: substantial text overlap with arXiv:2001.10745
Subjects: Fluid Dynamics (physics.flu-dyn); Numerical Analysis (math.NA)
We present a numerical study of spatially quasi-periodic traveling waves on the surface of an ideal fluid of infinite depth. This is a generalization of the classic Wilton ripple problem to the case when the ratio of wave numbers satisfying the dispersion relation is irrational. We develop a conformal mapping formulation of the water wave equations that employs a quasi-periodic variant of the Hilbert transform to compute the normal velocity of the fluid from its velocity potential on the free surface. We develop a Fourier pseudo-spectral discretization of the traveling water wave equations in which one-dimensional quasi-periodic functions are represented by two-dimensional periodic functions on the torus. This leads to an overdetermined nonlinear least squares problem that we solve using a variant of the Levenberg-Marquardt method. We investigate various properties of quasi-periodic traveling waves, including Fourier resonances and the dependence of wave speed and surface tension on the amplitude parameters that describe a two-parameter family of waves.
 [354] arXiv:2002.09488 (cross-list from math.OC) [pdf, other]

Title: Optimal Randomized First-Order Methods for Least-Squares Problems
Comments: arXiv admin note: text overlap with arXiv:2002.00864
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG)
We provide an exact analysis of a class of randomized algorithms for solving overdetermined least-squares problems. We consider first-order methods, where the gradients are pre-conditioned by an approximation of the Hessian, based on a subspace embedding of the data matrix. This class of algorithms encompasses several randomized methods among the fastest solvers for least-squares problems. We focus on two classical embeddings, namely, Gaussian projections and subsampled randomized Hadamard transforms (SRHT). Our key technical innovation is the derivation of the limiting spectral density of SRHT embeddings. Leveraging this novel result, we derive the family of normalized orthogonal polynomials of the SRHT density and we find the optimal pre-conditioned first-order method along with its rate of convergence. Our analysis of Gaussian embeddings proceeds similarly, and leverages classical random matrix theory results. In particular, we show that for a given sketch size, SRHT embeddings exhibit a faster rate of convergence than Gaussian embeddings. Then, we propose a new algorithm by optimizing the computational complexity over the choice of the sketching dimension. To our knowledge, our resulting algorithm yields the best known complexity for solving least-squares problems with no condition number dependence.
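The class of methods described here, gradient steps pre-conditioned by a sketched Hessian, can be illustrated with a Gaussian embedding. The sketch below is a plain fixed-step iteration: the step size 0.5, the sketch size of ten times the dimension, and the function name are conservative assumptions made for this example, not the optimal parameters or the optimal method derived in the paper.

```python
import numpy as np

def sketched_preconditioned_gd(A, b, sketch_rows, step=0.5, iters=100, seed=0):
    """Gradient descent on 0.5 * ||A x - b||^2, pre-conditioned by a sketched Hessian."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    S = rng.normal(size=(sketch_rows, n)) / np.sqrt(sketch_rows)  # Gaussian embedding
    SA = S @ A
    H_s = SA.T @ SA                        # approximates the Hessian A^T A
    x = np.zeros(d)
    for _ in range(iters):
        grad = A.T @ (A @ x - b)           # exact gradient of the least-squares loss
        x = x - step * np.linalg.solve(H_s, grad)
    return x

rng = np.random.default_rng(1)
A = rng.normal(size=(500, 20))             # overdetermined system, n >> d
b = rng.normal(size=500)
x_sketch = sketched_preconditioned_gd(A, b, sketch_rows=200)
x_exact = np.linalg.lstsq(A, b, rcond=None)[0]
```

Because the sketched Hessian is close to the true one, the iteration contracts at a rate that does not depend on the conditioning of `A`, which is the property the abstract highlights.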
 [355] arXiv:2002.09509 (cross-list from math.NT) [pdf, ps, other]

Title: Gowers norms for automatic sequences
Comments: 50 pages
Subjects: Number Theory (math.NT); Formal Languages and Automata Theory (cs.FL); Combinatorics (math.CO); Dynamical Systems (math.DS)
We show that any automatic sequence can be separated into a structured part and a Gowers uniform part in a way that is considerably more efficient than guaranteed by the Arithmetic Regularity Lemma. For sequences produced by strongly connected and prolongable automata, the structured part is rationally almost periodic, while for general sequences the description is marginally more complicated. In particular, we show that all automatic sequences orthogonal to periodic sequences are Gowers uniform. As an application, we obtain for any $l \geq 2$ and any automatic set $A \subset \mathbb{N}_0$ lower bounds on the number of $l$-term arithmetic progressions contained in $A$ with a given difference. The analogous result is false for general subsets of $\mathbb{N}_0$ and progressions of length $\geq 5$.
 [356] arXiv:2002.09515 (cross-list from physics.geo-ph) [pdf, other]

Title: Joint geophysical, petrophysical and geologic inversion using a dynamic Gaussian mixture model
Comments: 35 pages, 10 figures, submitted paper awaiting decision
Subjects: Geophysics (physics.geo-ph); Computational Engineering, Finance, and Science (cs.CE); Applications (stat.AP)
We present a framework for petrophysically and geologically guided inversion to perform multi-physics joint inversions. Petrophysical and geological information is included in a multi-dimensional Gaussian mixture model that regularizes the inverse problem. The inverse problem we construct consists of a suite of three cyclic optimizations over the geophysical, petrophysical and geological information. The two additional problems over the petrophysical and geological data are used as a coupling term. They correspond to updating the geophysical reference model and regularization weights. This guides the inverse problem towards reproducing the desired petrophysical and geological characteristics. The objective function that we define for the inverse problem is comprised of multiple data misfit terms: one for each geophysical survey and one for the petrophysical properties and geological information. Each of these misfit terms has its target misfit value which we seek to fit in the inversion. We detail our reweighting strategies to handle multiple data misfits at once. Our framework is modular and extensible, and this allows us to combine multiple geophysical methods in a joint inversion and to distribute open-source code and reproducible examples. To illustrate the gains made by multi-physics inversions, we apply our framework to jointly invert, in 3D, synthetic potential fields data based on the DO-27 kimberlite pipe case study (Northwest Territories, Canada). The pipe contains two distinct kimberlite facies embedded in a host rock. We show that inverting the datasets individually, even with petrophysical information, leads to a binary geologic model consisting of background or kimberlite. A joint inversion, with petrophysical information, can differentiate the two main kimberlite facies of the pipe.
 [357] arXiv:2002.09526 (cross-list from math.OC) [pdf, other]

Title: Stochastic Subspace Cubic Newton Method
Comments: 29 pages, 5 figures, 1 table, 1 algorithm
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG)
In this paper, we propose a new randomized second-order optimization algorithm, Stochastic Subspace Cubic Newton (SSCN), for minimizing a high dimensional convex function $f$. Our method can be seen both as a stochastic extension of the cubically-regularized Newton method of Nesterov and Polyak (2006), and a second-order enhancement of stochastic subspace descent of Kozak et al. (2019). We prove that as we vary the minibatch size, the global convergence rate of SSCN interpolates between the rate of stochastic coordinate descent (CD) and the rate of cubic regularized Newton, thus giving new insights into the connection between first- and second-order methods. Remarkably, the local convergence rate of SSCN matches the rate of stochastic subspace descent applied to the problem of minimizing the quadratic function $\frac12 (x-x^*)^\top \nabla^2f(x^*)(x-x^*)$, where $x^*$ is the minimizer of $f$, and hence depends on the properties of $f$ at the optimum only. Our numerical experiments show that SSCN outperforms non-accelerated first-order CD algorithms while being competitive with their accelerated variants.
 [358] arXiv:2002.09538 (cross-list from stat.ML) [pdf, other]

Title: Knot Selection in Sparse Gaussian Processes
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
Knot-based, sparse Gaussian processes have enjoyed considerable success as scalable approximations to full Gaussian processes. Problems can occur, however, when knot selection is done by optimizing the marginal likelihood. For example, the marginal likelihood surface is highly multimodal, which can cause suboptimal knot placement where some knots serve practically no function. This is especially a problem when many more knots are used than are necessary, resulting in extra computational cost for little to no gains in accuracy.
We propose a one-at-a-time knot selection algorithm to select both the number and placement of knots. Our algorithm uses Bayesian optimization to efficiently propose knots that are likely to be good and largely avoids the pathologies encountered when using the marginal likelihood as the objective function. We provide empirical results showing improved accuracy and speed over the current standard approaches.
 [359] arXiv:2002.09547 (cross-list from stat.ML) [pdf, other]

Title: Stochastic Normalizing Flows
Comments: 17 pages, 4 figures
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
We introduce stochastic normalizing flows, an extension of continuous normalizing flows for maximum likelihood estimation and variational inference (VI) using stochastic differential equations (SDEs). Using the theory of rough paths, the underlying Brownian motion is treated as a latent variable and approximated, enabling efficient training of neural SDEs as random neural ordinary differential equations. These SDEs can be used for constructing efficient Markov chains to sample from the underlying distribution of a given dataset. Furthermore, by considering families of targeted SDEs with prescribed stationary distribution, we can apply VI to the optimization of hyperparameters in stochastic MCMC.
 [360] arXiv:2002.09558 (cross-list from eess.IV) [pdf, other]

Title: Self-Supervised Poisson-Gaussian Denoising
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Machine Learning (stat.ML)
We extend the blind-spot model for self-supervised denoising to handle Poisson-Gaussian noise and introduce an improved training scheme that avoids hyperparameters and adapts the denoiser to the test data. Self-supervised models for denoising learn to denoise from only noisy data and do not require corresponding clean images, which are difficult or impossible to acquire in some application areas of interest such as low-light microscopy. We introduce a new training strategy to handle Poisson-Gaussian noise, which is the standard noise model for microscope images. Our new strategy eliminates hyperparameters from the loss function, which is important in a self-supervised regime where no ground truth data is available to guide hyperparameter tuning. We show how our denoiser can be adapted to the test data to improve performance. Our evaluation on a microscope image denoising benchmark validates our approach.
 [361] arXiv:2002.09573 (cross-list from stat.ML) [pdf, ps, other]

Title: Causal structure learning from time series: Large regression coefficients may predict causal links better in practice than small p-values
Authors: Sebastian Weichwald, Martin E Jakobsen, Phillip B Mogensen, Lasse Petersen, Nikolaj Thams, Gherardo Varando
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Applications (stat.AP)
In this article, we describe the algorithms for causal structure learning from time series data that won the Causality 4 Climate competition at the Conference on Neural Information Processing Systems 2019 (NeurIPS). We examine how our combination of established ideas achieves competitive performance on semi-realistic and realistic time series data exhibiting common challenges in real-world Earth sciences data. In particular, we discuss a) a rationale for leveraging linear methods to identify causal links in non-linear systems, and b) a simulation-backed explanation as to why large regression coefficients may predict causal links better in practice than small p-values and thus why normalising the data may sometimes hinder causal structure learning.
For benchmark usage, we provide implementations at https://github.com/sweichwald/tidybench and detail the algorithms here. We propose the presented competition-proven methods for baseline benchmark comparisons to guide the development of novel algorithms for structure learning from time series.
 [362] arXiv:2002.09580 (cross-list from stat.ML) [pdf, other]

Title: Polarizing Front Ends for Robust CNNs
Comments: Published in 45th International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2020)
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Signal Processing (eess.SP)
The vulnerability of deep neural networks to small, adversarially designed perturbations can be attributed to their "excessive linearity." In this paper, we propose a bottom-up strategy for attenuating adversarial perturbations using a nonlinear front end which polarizes and quantizes the data. We observe that ideal polarization can be utilized to completely eliminate perturbations, develop algorithms to learn approximately polarizing bases for data, and investigate the effectiveness of the proposed strategy on the MNIST and Fashion MNIST datasets.
 [363] arXiv:2002.09589 (cross-list from stat.ML) [pdf, other]

Title: SURF: A Simple, Universal, Robust, Fast Distribution Learning Algorithm
Comments: 20 pages, 6 figures, 2 tables
Subjects: Machine Learning (stat.ML); Information Theory (cs.IT); Machine Learning (cs.LG); Statistics Theory (math.ST)
Sample- and computationally-efficient distribution estimation is a fundamental tenet in statistics and machine learning. We present $\mathrm{SURF}$, an algorithm for approximating distributions by piecewise polynomials. $\mathrm{SURF}$ is simple, replacing existing general-purpose optimization techniques by straightforward approximation of each potential polynomial piece by a simple empirical-probability interpolation, and using plain divide-and-conquer to merge the pieces. It is universal, as well-known low-degree polynomial-approximation results imply that it accurately approximates a large class of common distributions. $\mathrm{SURF}$ is robust to distribution mis-specification as for any degree $d\le 8$, it estimates any distribution to an $\ell_1$ distance $<3$ times that of the nearest degree-$d$ piecewise polynomial, improving known factor upper bounds of 3 for single polynomials and 15 for polynomials with arbitrarily many pieces. It is fast, using optimal sample complexity, and running in near sample-linear time. In experiments, $\mathrm{SURF}$ significantly outperforms state-of-the-art algorithms.
 [364] arXiv:2002.09611 (cross-list from eess.IV) [pdf, other]

Title: Tuning-free Plug-and-Play Proximal Algorithm for Inverse Imaging Problems
Authors: Kaixuan Wei, Angelica Aviles-Rivero, Jingwei Liang, Ying Fu, Carola-Bibiane Schönlieb, Hua Huang
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
Plug-and-play (PnP) is a non-convex framework that combines ADMM or other proximal algorithms with advanced denoiser priors. Recently, PnP has achieved great empirical success, especially with the integration of deep learning-based denoisers. However, a key problem of PnP-based approaches is that they require manual parameter tweaking. Such tweaking is necessary to obtain high-quality results across widely varying imaging conditions and scene content. In this work, we present a tuning-free PnP proximal algorithm, which can automatically determine the internal parameters including the penalty parameter, the denoising strength and the terminal time. A key part of our approach is to develop a policy network for automatic search of parameters, which can be effectively learned via mixed model-free and model-based deep reinforcement learning. We demonstrate, through numerical and visual experiments, that the learned policy can customize different parameters for different states, and is often more efficient and effective than existing handcrafted criteria. Moreover, we discuss the practical considerations of the plugged denoisers, which together with our learned policy yield state-of-the-art results. This is prevalent on both linear and nonlinear exemplary inverse imaging problems, and in particular, we show promising results on Compressed Sensing MRI and phase retrieval.
 [365] arXiv:2002.09615 (cross-list from stat.ML) [pdf, other]

Title: Preference Modeling with Context-Dependent Salient Features
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
We consider the problem of estimating a ranking on a set of items from noisy pairwise comparisons given item features. We address the fact that pairwise comparison data often reflects irrational choice, e.g. intransitivity. Our key observation is that two items compared in isolation from other items may be compared based on only a salient subset of features. Formalizing this framework, we propose the "salient feature preference model" and prove a sample complexity result for learning the parameters of our model and the underlying ranking with maximum likelihood estimation. We also provide empirical results that support our theoretical bounds and illustrate how our model explains systematic intransitivity. Finally, we demonstrate strong performance of maximum likelihood estimation of our model on both synthetic data and two real data sets: the UT Zappos50K data set and comparison data about the compactness of legislative districts in the US.
 [366] arXiv:2002.09621 (cross-list from math.OC) [pdf, other]

Title: Global Convergence and Variance-Reduced Optimization for a Class of Nonconvex-Nonconcave Minimax Problems
Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG); Machine Learning (stat.ML)
Nonconvex minimax problems appear frequently in emerging machine learning applications, such as generative adversarial networks and adversarial learning. Simple algorithms such as gradient descent ascent (GDA) are the common practice for solving these nonconvex games and have had considerable empirical success. Yet, it is known that these vanilla GDA algorithms with constant step size can potentially diverge even in the convex setting. In this work, we show that for a subclass of nonconvex-nonconcave objectives satisfying a so-called two-sided Polyak-Łojasiewicz inequality, the alternating gradient descent ascent (AGDA) algorithm converges globally at a linear rate and the stochastic AGDA achieves a sublinear rate. We further develop a variance-reduced algorithm that attains a provably faster rate than AGDA when the problem has the finite-sum structure.
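The alternating update named in the abstract above can be illustrated with a minimal sketch. The toy objective, step sizes, and function names are assumptions chosen for illustration and are not taken from the paper. The objective f(x, y) = 0.5x^2 + 2xy - 0.5y^2 is strongly convex in x and strongly concave in y, a simple case that satisfies a two-sided Polyak-Łojasiewicz inequality, with its saddle point at (0, 0).

```python
def agda(grad_x, grad_y, x, y, step_x, step_y, iters):
    """Alternating gradient descent ascent: update x, then update y with the new x."""
    for _ in range(iters):
        x = x - step_x * grad_x(x, y)  # descent step on the minimizing variable
        y = y + step_y * grad_y(x, y)  # ascent step on the maximizing variable (uses updated x)
    return x, y


# Hypothetical toy objective: f(x, y) = 0.5*x**2 + 2*x*y - 0.5*y**2
def grad_x(x, y):
    return x + 2.0 * y


def grad_y(x, y):
    return 2.0 * x - y


x_final, y_final = agda(grad_x, grad_y, x=1.0, y=1.0,
                        step_x=0.1, step_y=0.1, iters=500)
```

With these step sizes, each iteration on this quadratic is a linear map whose eigenvalues have modulus 0.9, so the iterates contract towards the saddle point at a linear rate.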
 [367] arXiv:2002.09622 (cross-list from math.LO) [pdf, ps, other]

Title: Notes on neighborhood semantics for logics of unknown truths and false beliefs
Authors: Jie Fan
Comments: 21 pages
Subjects: Logic (math.LO); Artificial Intelligence (cs.AI)
In this article, we study logics of unknown truths and false beliefs under neighborhood semantics. We compare the relative expressivity of the two logics. It turns out that they are incomparable over various classes of neighborhood models, and the combination of the two logics is equally expressive as standard modal logic over any class of neighborhood models. We propose morphisms for each logic, which can help us explore the frame definability problem, show a general soundness and completeness result, and generalize some results in the literature. We axiomatize the two logics over various classes of neighborhood frames. Last but not least, we extend the results to the case of public announcements, which has good applications to Moore sentences and some others.
 [368] arXiv:2002.09625 (cross-list from eess.IV) [pdf, other]

Title: Neural Architecture Search for Compressed Sensing Magnetic Resonance Image Reconstruction
Comments: 10 pages, submitted to IEEEtrans
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
Recent works have demonstrated that deep learning (DL) based compressed sensing (CS) implementations can provide impressive improvements in reconstructing high-quality MR images from subsampled k-space data. However, the network architectures adopted in current methods are all designed by hand, so the performance of these networks is limited by researchers' expertise and labor. In this manuscript, we propose a novel and efficient MR image reconstruction framework based on a Neural Architecture Search (NAS) algorithm. The inner cells in our reconstruction network are automatically defined from a flexible search space in a differentiable manner. Compared to previous works, where only a few common convolutional operations are tried by hand, our method can thoroughly explore different operations (e.g. dilated convolution) and their possible combinations. Our proposed method can also reach a better trade-off between computation cost and reconstruction performance for practical clinical translation. Experiments performed on a publicly available dataset show that our network produces better reconstruction results compared to the previous state-of-the-art methods in terms of PSNR and SSIM with 4 times fewer computation resources. The final network architecture found by the algorithm can also offer insights for network architecture design in other medical image analysis applications.
 [369] arXiv:2002.09635 (cross-list from eess.IV) [pdf, other]

Title: Towards Label-Free 3D Segmentation of Optical Coherence Tomography Images of the Optic Nerve Head Using Deep Learning
Authors: Sripad Krishna Devalla, Tan Hung Pham, Satish Kumar Panda, Liang Zhang, Giridhar Subramanian, Anirudh Swaminathan, Chin Zhi Yun, Mohan Rajan, Sujatha Mohan, Ramaswami Krishnadas, Vijayalakshmi Senthil, John Mark S. de Leon, Tin A. Tun, Ching-Yu Cheng, Leopold Schmetterer, Shamira Perera, Tin Aung, Alexandre H. Thiery, Michael J. A. Girard
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Since the introduction of optical coherence tomography (OCT), it has been possible to study the complex 3D morphological changes of the optic nerve head (ONH) tissues that occur along with the progression of glaucoma. Although several deep learning (DL) techniques have been recently proposed for the automated extraction (segmentation) and quantification of these morphological changes, the device-specific nature and the difficulty in preparing manual segmentations (training data) limit their clinical adoption. With several new manufacturers and next-generation OCT devices entering the market, the complexity in deploying DL algorithms clinically is only increasing. To address this, we propose a DL-based 3D segmentation framework that is easily translatable across OCT devices in a label-free manner (i.e. without the need to manually re-segment data for each device). Specifically, we developed 2 sets of DL networks. The first (referred to as the enhancer) was able to enhance OCT image quality from 3 OCT devices, and harmonized image characteristics across these devices. The second performed 3D segmentation of 6 important ONH tissue layers. We found that the use of the enhancer was critical for our segmentation network to achieve device independence. In other words, our 3D segmentation network trained on any of 3 devices successfully segmented ONH tissue layers from the other two devices with high performance (Dice coefficients > 0.92). With such an approach, we could automatically segment images from new OCT devices without ever needing manual segmentation data from such devices.
 [370] arXiv:2002.09656 (cross-list from q-fin.ST) [pdf]

Title: A new hybrid approach for crude oil price forecasting: Evidence from multi-scale data
Subjects: Statistical Finance (q-fin.ST); Machine Learning (cs.LG); Machine Learning (stat.ML)
With growing research into the factors that influence crude oil price fluctuations, and following the accelerated development of Internet technology, accessible data such as the Google search volume index (GSVI) are increasingly quantified and incorporated into forecasting approaches. In this paper, we use multi-scale data, including both GSVI data and traditional economic data related to crude oil price, as independent variables and propose a new hybrid approach for monthly crude oil price forecasting. This hybrid approach, based on a divide-and-conquer strategy, consists of the K-means method, kernel principal component analysis (KPCA) and the kernel extreme learning machine (KELM), where the K-means method is adopted to divide input data into clusters, KPCA is applied to reduce dimension, and KELM is employed for final crude oil price forecasting. The empirical results can be analyzed at the data and method levels. At the data level, GSVI data perform better than economic data in level forecasting accuracy but worse in directional forecasting accuracy because of herd behavior, while hybrid data combine their advantages and obtain the best forecasting performance in both level and directional accuracy. At the method level, the approaches with K-means perform better than those without K-means, which demonstrates that the divide-and-conquer strategy can effectively improve forecasting performance.
 [371] arXiv:2002.09658 (cross-list from math.OC) [pdf, other]

Title: An Efficient MPC Algorithm For Switched Nonlinear Systems with Minimum Dwell Time Constraints
Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY)
This paper presents an efficient suboptimal model predictive control (MPC) algorithm for nonlinear switched systems subject to minimum dwell time constraints (MTC). While MTC are required for most physical systems due to stability, power and mechanical restrictions, MPC optimization problems with MTC are challenging to solve. To efficiently solve such problems, the on-line MPC optimization problem is decomposed into a sequence of simpler problems, which include two nonlinear programs (NLP) and a rounding step, as typically done in mixed-integer optimal control (MIOC). Unlike the classical approach that embeds MTC in a mixed-integer linear program (MILP) with combinatorial constraints in the rounding step, our proposal is to embed the MTC in one of the NLPs using move blocking. Such a formulation can speed up on-line computations by employing recent move blocking algorithms for NLP problems and by using a simple sum-up-rounding (SUR) method for the rounding step. An explicit upper bound of the integer approximation error for the rounding step is given. In addition, a combined shrinking and receding horizon strategy is developed to satisfy closed-loop MTC. Recursive feasibility is proven using an $l$-step control invariant ($l$-CI) set, where $l$ is the minimum dwell time step length. An algorithm to compute $l$-CI sets for switched linear systems off-line is also presented. Numerical studies demonstrate the efficiency and effectiveness of the proposed MPC algorithm for switched nonlinear systems with MTC.
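For readers unfamiliar with the rounding step mentioned above, the following is a minimal sketch of textbook sum-up rounding for a single relaxed binary control on a uniform time grid. It is not the paper's implementation, it does not handle dwell time constraints or multiple modes, and the function name and arguments are hypothetical.

```python
def sum_up_rounding(alpha, dt=1.0):
    """Round a relaxed control alpha[i] in [0, 1] to {0, 1} on a uniform grid.

    On each interval the binary value is set to 1 when the accumulated
    relaxed control exceeds the accumulated rounded control by at least dt/2.
    """
    binary = []
    relaxed_integral = 0.0  # integral of the relaxed control up to and including interval i
    binary_integral = 0.0   # integral of the rounded control before interval i
    for a in alpha:
        relaxed_integral += a * dt
        b = 1 if relaxed_integral - binary_integral >= 0.5 * dt else 0
        binary.append(b)
        binary_integral += b * dt
    return binary


rounded = sum_up_rounding([0.5, 0.5, 0.5, 0.5])  # -> [1, 0, 1, 0]
```

For a single control on a uniform grid, this rule keeps the accumulated difference between the relaxed and rounded controls within dt/2, which is the kind of integer approximation error that such bounds quantify.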
 [372] arXiv:2002.09677 (cross-list from stat.ML) [pdf, other]

Title: Kernel interpolation with continuous volume sampling
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Numerical Analysis (math.NA); Probability (math.PR)
A fundamental task in kernel methods is to pick nodes and weights, so as to approximate a given function from an RKHS by the weighted sum of kernel translates located at the nodes. This is the crux of kernel density estimation, kernel quadrature, or interpolation from discrete samples. Furthermore, RKHSs offer a convenient mathematical and computational framework. We introduce and analyse continuous volume sampling (VS), the continuous counterpart (for choosing node locations) of a discrete distribution introduced in (Deshpande & Vempala, 2006). Our contribution is theoretical: we prove almost optimal bounds for interpolation and quadrature under VS. While similar bounds already exist for some specific RKHSs using ad-hoc node constructions, VS offers bounds that apply to any Mercer kernel and depend on the spectrum of the associated integration operator. We emphasize that, unlike previous randomized approaches that rely on regularized leverage scores or determinantal point processes, evaluating the pdf of VS only requires pointwise evaluations of the kernel. VS is thus naturally amenable to MCMC samplers.
 [373] arXiv:2002.09695 (cross-list from stat.ML) [pdf]

Title: A New Unified Deep Learning Approach with Decomposition-Reconstruction-Ensemble Framework for Time Series Forecasting
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Applications (stat.AP)
A new variational mode decomposition (VMD) based deep learning approach is proposed in this paper for the time series forecasting problem. Firstly, VMD is adopted to decompose the original time series into several sub-signals. Then, a convolutional neural network (CNN) is applied to learn the reconstruction patterns on the decomposed sub-signals to obtain several reconstructed sub-signals. Finally, a long short term memory (LSTM) network is employed to forecast the time series with the decomposed sub-signals and the reconstructed sub-signals as inputs. The proposed VMD-CNN-LSTM approach originates from the decomposition-reconstruction-ensemble framework, and innovates by embedding the reconstruction, single forecasting, and ensemble steps in a unified deep learning approach. To verify the forecasting performance of the proposed approach, four typical time series datasets are introduced for empirical analysis. The empirical results demonstrate that the proposed approach consistently outperforms the benchmark approaches in terms of forecasting accuracy, and also indicate that the reconstructed sub-signals obtained by the CNN are important for further improving the forecasting performance.
 [374] arXiv:2002.09703 (cross-list from eess.IV) [pdf, other]

Title: Automatic Data Augmentation via Deep Reinforcement Learning for Effective Kidney Tumor Segmentation
Comments: 5 pages, 3 figures
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
Conventional data augmentation realized by performing simple pre-processing operations (e.g., rotation, crop, etc.) has been validated for its advantage in enhancing the performance for medical image segmentation. However, the data generated by these conventional augmentation methods are random and sometimes harmful to the subsequent segmentation. In this paper, we developed a novel automatic learning-based data augmentation method for medical image segmentation which models the augmentation task as a trial-and-error procedure using deep reinforcement learning (DRL). In our method, we innovatively combine the data augmentation module and the subsequent segmentation module in an end-to-end training manner with a consistent loss. Specifically, the best sequential combination of different basic operations is automatically learned by directly maximizing the performance improvement (i.e., Dice ratio) on the available validation set. We extensively evaluated our method on CT kidney tumor segmentation which validated the promising results of our method.
 [375] arXiv:2002.09711 (cross-list from physics.bio-ph) [pdf]

Title: Robotic modeling of snake traversing large, smooth obstacles reveals stability benefits of body compliance
Journal-ref: Royal Society Open Science (2020), 7, 191192
Subjects: Biological Physics (physics.bio-ph); Systems and Control (eess.SY); Quantitative Methods (q-bio.QM)
Snakes can move through almost any terrain. Although their locomotion on flat surfaces using planar gaits is inherently stable, when snakes deform their body out of plane to traverse complex terrain, maintaining stability becomes a challenge. On trees and desert dunes, snakes grip branches or brace against depressed sand for stability. However, how they stably surmount obstacles like boulders too large and smooth to gain such anchor points is less understood. Similarly, snake robots are challenged to stably traverse large, smooth obstacles for search and rescue and building inspection. Our recent study discovered that snakes combine body lateral undulation and cantilevering to stably traverse large steps. Here, we developed a snake robot with this gait and snake-like anisotropic friction and used it as a physical model to understand stability principles. The robot traversed steps as high as a third of its body length rapidly and stably. However, on higher steps, it was more likely to fail due to more frequent rolling and flipping over, which was absent in the snake with a compliant body. Adding body compliance reduced the robot roll instability by statistically improving surface contact, without reducing speed. Besides advancing understanding of snake locomotion, our robot achieved high traversal speed surpassing most previous snake robots and approaching snakes, while maintaining high traversal probability.
 [376] arXiv:2002.09735 (cross-list from stat.ML) [pdf, other]

Title: Partially Observed Dynamic Tensor Response Regression
Comments: This contains the main paper only. The supplement is available upon request
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST)
In modern data science, dynamic tensor data is prevailing in numerous applications. An important task is to characterize the relationship between such dynamic tensor and external covariates. However, the tensor data is often only partially observed, rendering many existing methods inapplicable. In this article, we develop a regression model with partially observed dynamic tensor as the response and external covariates as the predictor. We introduce the low-rank, sparsity and fusion structures on the regression coefficient tensor, and consider a loss function projected over the observed entries. We develop an efficient non-convex alternating updating algorithm, and derive the finite-sample error bound of the actual estimator from each step of our optimization algorithm. Unobserved entries in tensor response have imposed serious challenges. As a result, our proposal differs considerably in terms of estimation algorithm, regularity conditions, as well as theoretical properties, compared to the existing tensor completion or tensor response regression solutions. We illustrate the efficacy of our proposed method using simulations, and two real applications, a neuroimaging dementia study and a digital advertising study.
 [377] arXiv:2002.09737 (cross-list from stat.ML) [pdf, other]

Title: Amortised Learning by Wake-Sleep
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
Models that employ latent variables to capture structure in observed data lie at the heart of many current unsupervised learning algorithms, but exact maximum-likelihood learning for powerful and flexible latent-variable models is almost always intractable. Thus, state-of-the-art approaches either abandon the maximum-likelihood framework entirely, or else rely on a variety of variational approximations to the posterior distribution over the latents. Here, we propose an alternative approach that we call amortised learning. Rather than computing an approximation to the posterior over latents, we use a wake-sleep Monte-Carlo strategy to learn a function that directly estimates the maximum-likelihood parameter updates. Amortised learning is possible whenever samples of latents and observations can be simulated from the generative model, treating the model as a "black box". We demonstrate its effectiveness on a wide range of complex models, including those with latents that are discrete or supported on non-Euclidean spaces.
 [378] arXiv:2002.09741 (cross-list from stat.ML) [pdf, other]

Title: VFlow: More Expressive Generative Flows with Variational Data Augmentation
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
Generative flows are promising tractable models for density modeling that define probabilistic distributions with invertible transformations. However, tractability imposes architectural constraints on generative flows, making them less expressive than other types of generative models. In this work, we study a previously overlooked constraint that all the intermediate representations must have the same dimensionality as the original data due to invertibility, limiting the width of the network. We tackle this constraint by augmenting the data with some extra dimensions and jointly learning a generative flow for augmented data as well as the distribution of augmented dimensions under a variational inference framework. Our approach, VFlow, is a generalization of generative flows and therefore always performs better. Combined with existing generative flows, VFlow achieves a new state-of-the-art 2.98 bits per dimension on the CIFAR-10 dataset and is more compact than previous models to reach similar modeling quality.
 [379] arXiv:2002.09769 (cross-list from stat.ML) [pdf, ps, other]

Title: Optimistic bounds for multi-output prediction
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
We investigate the challenge of multi-output learning, where the goal is to learn a vector-valued function based on a supervised data set. This includes a range of important problems in Machine Learning including multi-target regression, multi-class classification and multi-label classification. We begin our analysis by introducing the self-bounding Lipschitz condition for multi-output loss functions, which interpolates continuously between a classical Lipschitz condition and a multi-dimensional analogue of a smoothness condition. We then show that the self-bounding Lipschitz condition gives rise to optimistic bounds for multi-output learning, which are minimax optimal up to logarithmic factors. The proof exploits local Rademacher complexity combined with a powerful minoration inequality due to Srebro, Sridharan and Tewari. As an application we derive a state-of-the-art generalization bound for multi-class gradient boosting.
 [380] arXiv:2002.09783 (cross-list from quant-ph) [pdf, other]

Title: Optimality Study of Existing Quantum Computing Layout Synthesis Tools
Comments: 15 pages, 7 figures
Subjects: Quantum Physics (quant-ph); Hardware Architecture (cs.AR)
Layout synthesis, an important step in quantum computing, processes quantum circuits to satisfy device layout constraints. In this paper, we construct QUEKO benchmarks for this problem, which have known optimal depth. We use QUEKO to evaluate the optimality of current layout synthesis tools, including Cirq from Google, Qiskit from IBM, $\mathsf{t}|\mathsf{ket}\rangle$ from Cambridge Quantum Computing, and recent academic work. To our surprise, despite over a decade of research and development by academia and industry on compilation and synthesis for quantum circuits, we are still able to demonstrate large optimality gaps. Even combining the best of all four solutions we evaluated, the gap is still about 4x for circuits with depths suitable for state-of-the-art devices. This suggests substantial room for improvement. Finally, we also prove the NP-completeness of the layout synthesis problem for quantum computing. We have made the QUEKO benchmarks open source.
 [381] arXiv:2002.09806 (cross-list from math.OC) [pdf, ps, other]

Title: Finite-Time Last-Iterate Convergence for Multi-Agent Learning in Games
Comments: 21 Pages. Under review
Subjects: Optimization and Control (math.OC); Computer Science and Game Theory (cs.GT); Machine Learning (stat.ML)
We consider multi-agent learning via online gradient descent (OGD) in a class of games called $\lambda$-cocoercive games, a broad class of games that admits many Nash equilibria and that properly includes strongly monotone games. We characterize the finite-time last-iterate convergence rate for joint OGD learning on $\lambda$-cocoercive games; further, building on this result, we develop a fully adaptive OGD learning algorithm that does not require any knowledge of the problem parameter (e.g., the cocoercive constant $\lambda$) and show, via a novel double-stopping-time technique, that this adaptive algorithm achieves the same finite-time last-iterate convergence rate as its non-adaptive counterpart. Subsequently, we extend OGD learning to the noisy gradient feedback case and establish last-iterate convergence results (first qualitative almost sure convergence, then quantitative finite-time convergence rates), all under non-decreasing step-sizes. These results fill in several gaps in the existing multi-agent online learning literature, where three aspects (finite-time convergence rates, non-decreasing step-sizes, and fully adaptive algorithms) have not been previously explored.
 [382] arXiv:2002.09815 (cross-list from stat.ML) [pdf, other]

Title: Neuron Shapley: Discovering the Responsible Neurons
Subjects: Machine Learning (stat.ML); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
We develop Neuron Shapley as a new framework to quantify the contribution of individual neurons to the prediction and performance of a deep network. By accounting for interactions across neurons, Neuron Shapley is more effective in identifying important filters compared to common approaches based on activation patterns. Interestingly, removing just 30 filters with the highest Shapley scores effectively destroys the prediction accuracy of Inception-v3 on ImageNet. Visualization of these few critical filters provides insights into how the network functions. Neuron Shapley is a flexible framework and can be applied to identify responsible neurons in many tasks. We illustrate additional applications of identifying filters that are responsible for biased prediction in facial recognition and filters that are vulnerable to adversarial attacks. Removing these filters is a quick way to repair models. Enabling all these applications is a new multi-arm bandit algorithm that we developed to efficiently estimate Neuron Shapley values.
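To make the notion of a Shapley value concrete, here is a minimal sketch of the standard permutation-sampling Monte Carlo estimator on a toy value function. It is not the multi-arm bandit estimator described in the abstract; the names `shapley_monte_carlo` and `toy_value`, the weights, and the interaction bonus are hypothetical stand-ins for "network performance with a subset of filters kept".

```python
import random


def shapley_monte_carlo(players, value, num_permutations=2000, seed=0):
    """Estimate Shapley values by averaging marginal contributions
    over randomly sampled orderings of the players."""
    rng = random.Random(seed)
    totals = {p: 0.0 for p in players}
    order = list(players)
    for _ in range(num_permutations):
        rng.shuffle(order)
        coalition = set()
        previous = value(frozenset(coalition))
        for p in order:
            coalition.add(p)
            current = value(frozenset(coalition))
            # Marginal contribution of p given the players that came before it.
            totals[p] += current - previous
            previous = current
    return {p: total / num_permutations for p, total in totals.items()}


def toy_value(coalition):
    """Hypothetical value function: additive weights plus an interaction
    bonus that is earned only when players 0 and 1 are both present."""
    weights = {0: 1.0, 1: 2.0, 2: 0.5}
    bonus = 1.0 if {0, 1} <= coalition else 0.0
    return sum(weights[p] for p in coalition) + bonus


if __name__ == "__main__":
    print(shapley_monte_carlo([0, 1, 2], toy_value))
```

In this toy game the interaction bonus is split between players 0 and 1 in expectation, which is the kind of cross-player interaction that scores based on a single player in isolation would miss.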
 [383] arXiv:2002.09821 (cross-list from eess.AS) [pdf, other]

Title: A Multi-view CNN-based Acoustic Classification System for Automatic Animal Species Identification
Journal-ref: Ad Hoc Networks 2020
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
Automatic identification of animal species by their vocalization is an important and challenging task. Although many kinds of audio monitoring systems have been proposed in the literature, they suffer from several disadvantages such as non-trivial feature selection, accuracy degradation because of environmental noise, or intensive local computation. In this paper, we propose a deep learning based acoustic classification framework for Wireless Acoustic Sensor Networks (WASN). The proposed framework is based on a cloud architecture which relaxes the computational burden on the wireless sensor node. To improve the recognition accuracy, we design a multi-view Convolutional Neural Network (CNN) to extract the short-, middle-, and long-term dependencies in parallel. The evaluation on two real datasets shows that the proposed architecture can achieve high accuracy and significantly outperforms traditional classification systems when environmental noise dominates the audio signal (low SNR). Moreover, we implement and deploy the proposed system on a testbed and analyse the system performance in real-world environments. Both simulation and real-world evaluation demonstrate the accuracy and robustness of the proposed acoustic classification system in distinguishing species of animals.
 [384] arXiv:2002.09847 (cross-list from eess.IV) [pdf, other]

Title: Unsupervised Denoising for Satellite Imagery using Wavelet Subband CycleGAN
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Machine Learning (stat.ML)
Multispectral satellite imaging sensors acquire various spectral band images such as red (R), green (G), blue (B), near-infrared (N), etc. Thanks to the unique spectroscopic property of each spectral band with respect to the objects on the ground, multispectral satellite imagery can be used for various geological survey applications. Unfortunately, image artifacts from imaging sensor noise often affect the quality of scenes and have negative impacts on the applications of satellite imagery. Recently, deep learning approaches have been extensively explored for the removal of noise in satellite imagery. Most deep learning denoising methods, however, follow a supervised learning scheme, which requires matched noisy image and clean image pairs that are difficult to collect in real situations. In this paper, we propose a novel unsupervised multispectral denoising method for satellite imagery using a wavelet subband cycle-consistent adversarial network (WavCycleGAN). The proposed method is based on an unsupervised learning scheme using adversarial loss and cycle-consistency loss to overcome the lack of paired data. Moreover, in contrast to the standard image domain cycleGAN, we introduce a wavelet subband domain learning scheme for effective denoising without sacrificing high frequency components such as edges and detail information. Experimental results for the removal of vertical stripe and wave noise in satellite imaging sensors demonstrate that the proposed method effectively removes noise and preserves important high frequency features of satellite images.
 [385] arXiv:2002.09889 (cross-list from stat.ML) [pdf, other]

Title: Investigating the interaction between gradient-only line searches and different activation functions
Comments: 37 pages, 9 figures, submitted for journal review
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Optimization and Control (math.OC)
Gradient-only line searches (GOLS) adaptively determine step sizes along search directions for discontinuous loss functions resulting from dynamic mini-batch sub-sampling in neural network training. Step sizes in GOLS are determined by localizing Stochastic Non-Negative Associated Gradient Projection Points (SNN-GPPs) along descent directions. These are identified by a sign change in the directional derivative from negative to positive along a descent direction. Activation functions are a significant component of neural network architectures as they introduce non-linearities essential for complex function approximations. The smoothness and continuity characteristics of the activation functions directly affect the gradient characteristics of the loss function to be optimized. Therefore, it is of interest to investigate the relationship between activation functions and different neural network architectures in the context of GOLS. We find that GOLS are robust for a range of activation functions, but sensitive to the Rectified Linear Unit (ReLU) activation function in standard feedforward architectures. The zero-derivative in ReLU's negative input domain can lead to the gradient-vector becoming sparse, which severely affects training. We show that implementing architectural features such as batch normalization and skip connections can alleviate these difficulties and benefit training with GOLS for all activation functions considered.
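The core idea of choosing a step size from a negative-to-positive sign change of the directional derivative can be sketched as follows. This is a deterministic simplification under stated assumptions: the gradient is exact rather than computed on resampled mini-batches, and the bracket-then-bisect strategy, the function names, and the default parameters are illustrative choices, not the line search variants studied in the paper.

```python
def directional_derivative(grad, x, d, alpha):
    """Directional derivative of the loss at x + alpha * d along d."""
    point = [xi + alpha * di for xi, di in zip(x, d)]
    g = grad(point)
    return sum(gi * di for gi, di in zip(g, d))


def gradient_only_step(grad, x, d, alpha0=1e-3, max_doublings=60, bisections=50):
    """Locate a step size where the directional derivative along d changes
    sign from negative to positive, using gradient information only.

    Assumes d is a descent direction at x (negative derivative at alpha = 0).
    """
    lo, hi = 0.0, alpha0
    # Grow the bracket until the derivative at hi is non-negative.
    for _ in range(max_doublings):
        if directional_derivative(grad, x, d, hi) >= 0.0:
            break
        lo, hi = hi, 2.0 * hi
    else:
        return hi  # no sign change found within the allowed range
    # Bisect: the derivative is negative at lo and non-negative at hi.
    for _ in range(bisections):
        mid = 0.5 * (lo + hi)
        if directional_derivative(grad, x, d, mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # Hypothetical test problem: loss 0.5 * ||x||^2, whose gradient is x itself.
    x = [2.0, -4.0]
    d = [-xi for xi in x]  # steepest-descent direction
    print(gradient_only_step(lambda p: list(p), x, d))
```

No function values are evaluated anywhere in the search, which is the property that makes this family of methods applicable when the loss itself is discontinuous across mini-batches.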
 [386] arXiv:2002.09914 (cross-list from stat.ML) [pdf, other]

Title: ChemGrapher: Optical Graph Recognition of Chemical Compounds by Deep Learning
Comments: 16 pages, 6 figures
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
In drug discovery, knowledge of the graph structure of chemical compounds is essential. Many thousands of scientific articles in chemistry and pharmaceutical sciences have investigated chemical compounds, but in some cases the details of the structure of these chemical compounds are published only as images. A tool to analyze these images automatically and convert them into a chemical graph structure would be useful for many applications, such as drug discovery. A few such tools are available and they are mostly derived from optical character recognition. However, our evaluation of the performance of those tools reveals that they often make mistakes in detecting the correct bond multiplicity and stereochemical information. In addition, errors sometimes even lead to missing atoms in the resulting graph. In our work, we address these issues by developing a compound recognition method based on machine learning. More specifically, we develop a deep neural network model for optical compound recognition. The deep learning solution presented here consists of a segmentation model, followed by three classification models that predict atom locations, bonds and charges. Furthermore, this model not only predicts the graph structure of the molecule but also produces all information necessary to relate each component of the resulting graph to the source image. This solution is scalable and could rapidly process thousands of images. Finally, we empirically compare the proposed method to a well-established tool and observe significant error reductions.
 [387] arXiv:2002.09916 (cross-list from math.OC) [pdf, ps, other]

Title: Extended formulation and valid inequalities for the multi-item inventory lot-sizing problem with supplier selection
Subjects: Optimization and Control (math.OC); Computational Complexity (cs.CC)
This paper considers the multi-item inventory lot-sizing problem with supplier selection. The problem consists in determining an optimal purchasing plan in order to satisfy dynamic deterministic demands for multiple items over a finite planning horizon, taking into account the fact that multiple suppliers are available to purchase from. As the complexity of the problem was an open question, we show that it is NP-hard. We propose a facility location extended formulation for the problem which can be preprocessed based on the cost structure and describe new valid inequalities in the original space of variables, which we denote $(l,S_j)$-inequalities. Furthermore, we study the projection of the extended formulation into the original space and show the connection between the inequalities generated by this projection and the newly proposed $(l,S_j)$-inequalities. Additionally, we present a simple and easy-to-implement yet very effective MIP (mixed integer programming) heuristic using the extended formulation. Computational results show that the preprocessed facility location extended formulation outperforms all other formulations for small and medium instances, as it can solve nearly all of them to optimality within the time limit. Moreover, the presented MIP heuristic is able to obtain solutions which strictly improve those achieved by a state-of-the-art method for all the large benchmark instances.
 [388] arXiv:2002.09929 (cross-list from math.AP) [pdf, other]

Title: Solvability for Photoacoustic Imaging with Idealized Piezoelectric Sensors
Authors: Sebastian Acosta
Subjects: Analysis of PDEs (math.AP); Numerical Analysis (math.NA); Computational Physics (physics.comp-ph)
Most reconstruction algorithms for photoacoustic imaging assume that the pressure field is measured by ultrasound sensors placed on a detection surface. However, such sensors do not measure pressure exactly due to their non-uniform directional and frequency responses, and resolution limitations. This is the case for piezoelectric sensors that are commonly employed for acoustics-based biomedical imaging. In this paper, using the method of matched asymptotic expansions and the basic constitutive relations for piezoelectricity, we propose a simple mathematical model for piezoelectric transducers. The approach simultaneously models how the pressure waves induce the piezoelectric measurements and how the presence of the sensors affects the pressure waves. Using this model, we analyze whether the data gathered by piezoelectric sensors leads to the solvability of the photoacoustic imaging problem. We conclude that this imaging problem is well-posed in certain normed spaces and under a geometric assumption. We also propose an iterative reconstruction algorithm that incorporates the model for piezoelectric measurements. Numerical implementation of the reconstruction algorithm is presented.
 [389] arXiv:2002.09946 (cross-list from physics.flu-dyn) [pdf, other]

Title: The Whitham Equation with Surface Tension
Comments: 19 pages, 5 figures, 1 table, 36 references. Other author's papers can be downloaded at this http URL. arXiv admin note: text overlap with arXiv:1410.8299
Journal-ref: Nonlinear Dynamics (2017), Vol. 88, pp. 1125-1138
Subjects: Fluid Dynamics (physics.flu-dyn); Analysis of PDEs (math.AP); Numerical Analysis (math.NA); Atmospheric and Oceanic Physics (physics.ao-ph); Computational Physics (physics.comp-ph)
The viability of the Whitham equation as a nonlocal model for capillary-gravity waves at the surface of an inviscid incompressible fluid is under study. A nonlocal Hamiltonian system of model equations is derived using the Hamiltonian structure of the free surface water wave problem and the Dirichlet-Neumann operator. The system features gravitational and capillary effects, and when restricted to one-way propagation, the system reduces to the capillary Whitham equation. It is shown numerically that in various scaling regimes the Whitham equation gives a more accurate approximation of the free-surface problem for the Euler system than other models such as the KdV and Kawahara equations. In the case of relatively strong capillarity considered here, the KdV and Kawahara equations outperform the Whitham equation with surface tension only for very long waves with negative polarity.
 [390] arXiv:2002.09954 (cross-list from stat.ML) [pdf, other]

Title: Near-linear Time Gaussian Process Optimization with Adaptive Batching and Resparsification
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
Gaussian processes (GP) are one of the most successful frameworks to model uncertainty. However, GP optimization (e.g., GP-UCB) suffers from major scalability issues. Experimental time grows linearly with the number of evaluations, unless candidates are selected in batches (e.g., using GP-BUCB) and evaluated in parallel. Furthermore, computational cost is often prohibitive since algorithms such as GP-BUCB require a time at least quadratic in the number of dimensions and iterations to select each batch. In this paper, we introduce BBKB (Batch Budgeted Kernel Bandits), the first no-regret GP optimization algorithm that provably runs in near-linear time and selects candidates in batches. This is obtained with a new guarantee for the tracking of the posterior variances that allows BBKB to choose increasingly larger batches, improving over GP-BUCB. Moreover, we show that the same bound can be used to adaptively delay costly updates to the sparse GP approximation used by BBKB, achieving a near-constant per-step amortized cost. These findings are then confirmed in several experiments, where BBKB is much faster than state-of-the-art methods.
 [391] arXiv:2002.09970 (cross-list from quant-ph) [pdf, other]

Title: Computer-inspired Quantum Experiments
Comments: Comments and suggestions for additional references are welcome!
Subjects: Quantum Physics (quant-ph); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)
The design of new devices and experiments in science and engineering has historically relied on the intuitions of human experts. This credo, however, has changed. In many disciplines, computer-inspired design processes, also known as inverse-design, have augmented the capability of scientists. Here we visit different fields of physics in which computer-inspired designs are applied. We will meet vastly diverse computational approaches based on topological optimization, evolutionary strategies, deep learning, reinforcement learning or automated reasoning. Then we turn our attention specifically to quantum physics. In the quest for designing new quantum experiments, we face two challenges: First, quantum phenomena are unintuitive. Second, the number of possible configurations of quantum experiments explodes combinatorially. To overcome these challenges, physicists began to use algorithms for computer-designed quantum experiments. We focus on the most mature and \textit{practical} approaches that scientists used to find new complex quantum experiments, which experimentalists subsequently have realized in the laboratories. The underlying idea is a highly-efficient topological search, which allows for scientific interpretability. In that way, some of the computer-designs have led to the discovery of new scientific concepts and ideas, demonstrating how computer algorithms can genuinely contribute to science by providing unexpected inspirations. We discuss several extensions and alternatives based on optimization and machine learning techniques, with the potential of accelerating the discovery of practical computer-inspired experiments or concepts in the future. Finally, we discuss what we can learn from the different approaches in the fields of physics, and raise several fascinating possibilities for future research.
 [392] arXiv:2002.09980 (cross-list from math.CA) [pdf, ps, other]

Title: Orthogonal Systems of Spline Wavelets as Unconditional Bases in Sobolev Spaces
Authors: Rajula Srivastava
Comments: 21 pages, 1 figure
Subjects: Classical Analysis and ODEs (math.CA); Functional Analysis (math.FA); Numerical Analysis (math.NA)
We exhibit the necessary range for which functions in the Sobolev spaces $L^s_p$ can be represented as an unconditional sum of orthonormal spline wavelet systems, such as the Battle-Lemari\'e wavelets. We also consider the natural extensions to Triebel-Lizorkin spaces. This builds upon, and is a generalization of, previous work of Seeger and Ullrich, where analogous results were established for the Haar wavelet system.
 [393] arXiv:2002.09996 (cross-list from stat.ML) [pdf, other]

Title: ConBO: Conditional Bayesian Optimization
Comments: 10 pages, 7 pages appendix
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Optimization and Control (math.OC)
Bayesian optimization is a class of data-efficient, model-based algorithms typically focused on global optimization. We consider the more general case where a user is faced with multiple problems that each need to be optimized conditional on a state variable; for example, we optimize the location of ambulances conditioned on patient distribution, given a range of cities with different patient distributions. Similarity across objectives boosts optimization of each objective in two ways: in modelling, by data sharing across objectives, and in acquisition, by quantifying how all objectives benefit from a single point on one objective. For this we propose ConBO, a novel efficient algorithm based on a new hybrid Knowledge Gradient method that outperforms recently published works on synthetic and real-world problems and is easily parallelized to collect a batch of points.
 [394] arXiv:2002.09998 (cross-list from stat.ME) [pdf, other]

Title: Generalized Bayesian Filtering via Sequential Monte Carlo
Subjects: Methodology (stat.ME); Machine Learning (cs.LG); Computation (stat.CO); Machine Learning (stat.ML)
We introduce a framework for inference in general state-space hidden Markov models (HMMs) under likelihood misspecification. In particular, we leverage the loss-theoretic perspective of generalized Bayesian inference (GBI) to define generalized filtering recursions in HMMs that can tackle the problem of inference under model misspecification. In doing so, we arrive at principled procedures for robust inference against observation contamination through the $\beta$-divergence. Operationalizing the proposed framework is made possible via sequential Monte Carlo methods (SMC). The standard particle methods, and their associated convergence results, are readily generalized to the new setting. We demonstrate our approach on object tracking and Gaussian process regression problems, and observe improved performance over standard filtering algorithms.
 [395] arXiv:2002.10020 (cross-list from eess.SP) [pdf, other]

Title: Optimal Jammer Placement in UAV-assisted Relay Networks
Comments: 6 pages, 6 figures
Subjects: Signal Processing (eess.SP); Systems and Control (eess.SY)
We consider the relaying application of unmanned aerial vehicles (UAVs), in which UAVs are placed between two transceivers (TRs) to increase the throughput of the system. Instead of studying the placement of UAVs as pursued in existing literature, we focus on investigating the placement of a jammer or a major source of interference on the ground to effectively degrade the performance of the system, which is measured by the maximum achievable data rate of transmission between the TRs. We demonstrate that the optimal placement of the jammer is in general a non-convex optimization problem, for which obtaining the solution directly is intractable. Afterward, using the inherent characteristics of the signal-to-interference ratio (SIR) expressions, we propose a tractable approach to find the optimal position of the jammer. Based on the proposed approach, we investigate the optimal positioning of the jammer in both dual-hop and multi-hop UAV relaying settings. Numerical simulations are provided to evaluate the performance of our proposed method.
 [396] arXiv:2002.10022 (cross-list from physics.ao-ph) [pdf]

Title: Application of ERA5 and MENA simulations to predict offshore wind energy potential
Comments: 21 pages, 12 figures
Subjects: Atmospheric and Oceanic Physics (physics.ao-ph); Machine Learning (cs.LG); Machine Learning (stat.ML)
This study explores wind energy resources in different locations throughout the Gulf of Oman and their future variability due to climate change impacts. In this regard, EC-EARTH near-surface wind outputs obtained from CORDEX-MENA simulations are used for historical and future projection of the energy. The ERA5 wind data are employed to assess the suitability of the climate model. Moreover, the ERA5 wave data over the study area are applied to compute sea surface roughness, an important variable for converting near-surface wind speeds to wind speeds at turbine hub-height. Considering the power distribution, bathymetry and distance from the coast, some spots are selected as tentative energy hotspots to provide a detailed assessment of directional and temporal variability and to investigate climate change impacts. RCP8.5, a common climatic scenario, is used to project and extract future variation of the energy at the selected sites. The results of this study demonstrate that the selected locations have suitable potential for wind turbine planning and construction.
 [397] arXiv:2002.10023 (cross-list from math.OC) [pdf, other]

Title: Suboptimal Stabilization of Unknown Nonlinear Systems via Extended State Observers
Authors: Amir Shakouri
Comments: 5 pages, 2 figures
Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY)
This paper introduces a globally asymptotically stable, locally optimal stabilizer for multi-input multi-output nonlinear systems of any order with totally unknown dynamics in a special form. The control scheme proposed in this paper lies at the intersection of active disturbance rejection control (ADRC) and the state-dependent Riccati equation (SDRE) control method. It is shown that, using an extended state observer, the state-dependent coefficient matrix of the nonlinear system can be estimated. The system is then stabilized by a suboptimal controller in the region where the SDRE method is effective (an estimated region of attraction), and by an ADRC outside that region as a backup for global stability assurance.
 [398] arXiv:2002.10032 (cross-list from eess.IV) [pdf, other]

Title: Generalized Octave Convolutions for Learned Multi-Frequency Image Compression
Comments: 10 pages, 7 figures, 3 tables
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Learned image compression has recently shown the potential to outperform all standard codecs. The state-of-the-art rate-distortion performance has been achieved by context-adaptive entropy approaches in which hyperprior and autoregressive models are jointly utilized to effectively capture the spatial dependencies in the latent representations. However, the latents contain a mixture of high and low frequency information, which has inefficiently been represented by feature maps of the same spatial resolution in previous works. In this paper, we propose the first learned multi-frequency image compression approach that uses the recently developed octave convolutions to factorize the latents into high and low frequencies. Since the low frequency is represented by a lower resolution, its spatial redundancy is reduced, which improves the compression rate. Moreover, octave convolutions impose effective high and low frequency communication, which can improve the reconstruction quality. We also develop novel generalized octave convolution and octave transposed-convolution architectures with internal activation layers to preserve the spatial structure of the information. Our experiments show that the proposed scheme outperforms all standard codecs and learning-based methods in both PSNR and MS-SSIM metrics, and establishes the new state of the art for learned image compression.
 [399] arXiv:2002.10034 (cross-list from q-bio.QM) [pdf, other]

Title: Predicting Rate of Cognitive Decline at Baseline Using a Deep Neural Network with Multidata Analysis
Authors: Sema Candemir, Xuan V. Nguyen, Luciano M. Prevedello, Matthew T. Bigelow, Richard D. White, Barbaros S. Erdal (for the Alzheimer's Disease Neuroimaging Initiative)
Subjects: Quantitative Methods (q-bio.QM); Machine Learning (cs.LG); Image and Video Processing (eess.IV); Neurons and Cognition (q-bio.NC)
This study investigates whether a machine-learning-based system can predict the rate of cognitive decline in mildly cognitively impaired (MCI) patients by processing only the clinical and imaging data collected at the initial visit. We build a predictive model based on a supervised hybrid neural network utilizing a 3-Dimensional Convolutional Neural Network to perform volume analysis of Magnetic Resonance Imaging (MRI) and integration of non-imaging clinical data at the fully connected layer of the architecture. The analysis is performed on the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset. Experimental results confirm that there is a correlation between cognitive decline and the data obtained at the first visit. The system achieved an area under the receiver operating characteristic curve (AUC) of 66.6% for cognitive decline class prediction.
 [400] arXiv:2002.10060 (cross-list from stat.ML) [pdf, other]

Title: Handling the Positive-Definite Constraint in the Bayesian Learning Rule
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
The Bayesian learning rule is a recently proposed variational inference method, which not only contains many existing learning algorithms as special cases but also enables the design of new algorithms. Unfortunately, when posterior parameters lie in an open constraint set, the rule may not satisfy the constraints and requires line-searches which could slow down the algorithm. In this paper, we fix this issue for the positive-definite constraint by proposing an improved rule that naturally handles the constraint. Our modification is obtained using Riemannian gradient methods, and is valid when the approximation attains a \emph{block-coordinate natural parameterization} (e.g., Gaussian distributions and their mixtures). Our method outperforms existing methods without any significant increase in computation. Our work makes it easier to apply the learning rule in the presence of positive-definite constraints in parameter spaces.
 [401] arXiv:2002.10091 (cross-list from stat.ME) [pdf, other]

Title: Towards precise causal effect estimation from data with hidden variables
Authors: Debo Cheng (1), Jiuyong Li (1), Lin Liu (1), Kui Yu (2), Thuc Duy Lee (1), Jixue Liu (1) ((1) School of Information Technology and Mathematical Sciences, University of South Australia (2) School of Computer Science and Information Engineering, Hefei University of Technology)
Subjects: Methodology (stat.ME); Artificial Intelligence (cs.AI)
Causal effect estimation from observational data is a crucial but challenging task. Currently, only a limited number of data-driven causal effect estimation methods are available. These methods either provide only a bound estimation of the causal effect of a treatment on the outcome, or provide a unique estimation of the causal effect but rely on impractical assumptions on the data or have low efficiency. In this paper, we identify a practical problem setting and propose an approach to achieving unique causal effect estimation from data with hidden variables under this setting. For the approach, we develop the theorems to support the discovery of the proper covariate sets for confounding adjustment (adjustment sets). Based on the theorems, two algorithms are presented for finding the proper adjustment sets from data with hidden variables to obtain unbiased and unique causal effect estimation. Experiments with benchmark Bayesian networks and real-world datasets have demonstrated the efficiency and effectiveness of the proposed algorithms, indicating the practicability of the identified problem setting and the potential of the approach in real-world applications.
 [402] arXiv:2002.10118 (cross-list from stat.ML) [pdf, other]

Title: Being Bayesian, Even Just a Bit, Fixes Overconfidence in ReLU Networks
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
The point estimates of ReLU classification networks, arguably the most widely used neural network architecture, have been shown to yield arbitrarily high confidence far away from the training data. This architecture, in conjunction with a maximum a posteriori estimation scheme, is thus neither calibrated nor robust. Approximate Bayesian inference has been empirically demonstrated to improve predictive uncertainty in neural networks, although the theoretical analysis of such Bayesian approximations is limited. We theoretically analyze approximate Gaussian posterior distributions on the weights of ReLU networks and show that they fix the overconfidence problem. Furthermore, we show that even a simplistic, thus cheap, Bayesian approximation also fixes these issues. This indicates that a sufficient condition for a calibrated uncertainty on a ReLU network is "to be a bit Bayesian". These theoretical results validate the usage of last-layer Bayesian approximation and motivate a range of fidelity-cost trade-offs. We further validate these findings empirically via various standard experiments using common deep ReLU networks and Laplace approximations.
 [403] arXiv:2002.10123 (cross-list from eess.IV) [pdf, other]

Title: Fusion of Camera Model and Source Device Specific Forensic Methods for Improved Tamper Detection
Comments: 13 pages
Subjects: Image and Video Processing (eess.IV); Machine Learning (cs.LG); Signal Processing (eess.SP)
The PRNU-based camera recognition method is widely studied in the image forensic literature. In recent years, CNN-based camera model recognition methods have been developed. These two methods also provide solutions to the tamper localization problem. In this paper, we propose their combination via a Neural Network to achieve better small-scale tamper detection performance. According to the results, the fusion method performs better than the underlying methods even under high JPEG compression. For forgeries as small as 100$\times$100 pixels, the proposed method outperforms the state-of-the-art, which validates the usefulness of fusion for localization of small-size image forgeries. We believe the proposed approach is feasible for any tamper-detection pipeline using the PRNU-based methodology.
 [404] arXiv:2002.10133 (cross-list from physics.comp-ph) [pdf, other]

Title: Non-isothermal Scharfetter-Gummel scheme for electro-thermal transport simulation in degenerate semiconductors
Subjects: Computational Physics (physics.comp-ph); Other Condensed Matter (cond-mat.other); Numerical Analysis (math.NA); Applied Physics (physics.app-ph)
Electro-thermal transport phenomena in semiconductors are described by the non-isothermal drift-diffusion system. The equations take a remarkably simple form when assuming the Kelvin formula for the thermopower. We present a novel, non-isothermal generalization of the Scharfetter-Gummel finite volume discretization for degenerate semiconductors obeying Fermi-Dirac statistics, which preserves numerous structural properties of the continuous model on the discrete level. The approach is demonstrated by 2D simulations of a heterojunction bipolar transistor.
 [405] arXiv:2002.10138 (cross-list from physics.soc-ph) [pdf, ps, other]

Title: Universality of citation distributions and its explanation
Authors: Michael Golosovsky
Comments: 23 pages, 9 figures
Subjects: Physics and Society (physics.soc-ph); Digital Libraries (cs.DL)
Universality or near-universality of citation distributions was found empirically a decade ago, but its theoretical justification has been lacking so far. Here, we systematically study citation distributions for different disciplines in order to characterize this putative universality and to understand it theoretically. Using our calibrated model of citation dynamics, we find a microscopic explanation of the universality of citation distributions and explain deviations therefrom. We demonstrate that the citation count of a paper is determined, on the one hand, by its fitness, the attribute which, for most papers, is set at the moment of publication. The fitness distributions for different disciplines are very similar and can be approximated by the log-normal distribution. On the other hand, the citation dynamics of a paper is related to the mechanism by which the knowledge about it spreads in the scientific community. This viral propagation is non-universal and discipline-specific. Thus, universality of citation distributions traces its origin to the fitness distribution, while deviations from universality are associated with the discipline-specific citation dynamics of papers.
 [406] arXiv:2002.10167 (cross-list from eess.SP) [pdf, other]

Title: Joint blind calibration and time-delay estimation for multiband ranging
Comments: 4 pages, 45th International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2020)
Subjects: Signal Processing (eess.SP); Systems and Control (eess.SY)
In this paper, we focus on the problem of blind joint calibration of multiband transceivers and time-delay (TD) estimation of multipath channels. We show that this problem can be formulated as a particular case of covariance matching. Although this problem is severely ill-posed, prior information about radio-frequency chain distortions and multipath channel sparsity is used for regularization. This approach leads to a biconvex optimization problem, which is formulated as a rank-constrained linear system and solved by a simple group Lasso algorithm. Numerical experiments show that the proposed algorithm provides better calibration and higher resolution for TD estimation than current state-of-the-art methods.
 [407] arXiv:2002.10183 (cross-list from physics.ins-det) [pdf, other]

Title: J-PET Framework: Software platform for PET tomography data reconstruction and analysis
Comments: 12 pages, 4 figures
Subjects: Instrumentation and Detectors (physics.ins-det); Software Engineering (cs.SE); Medical Physics (physics.med-ph)
J-PET Framework is an open-source software platform for data analysis, written in C++ and based on the ROOT package. It provides a common environment for implementation of reconstruction, calibration and filtering procedures, as well as for user-level analyses of Positron Emission Tomography data. The library contains a set of building blocks that can be combined, even by users with little programming experience, into chains of processing tasks through a convenient, simple and well-documented API. The generic input-output interface allows processing the data from various sources: low-level data from the tomography acquisition system or from diagnostic setups such as digital oscilloscopes, as well as high-level tomography structures, e.g. sinograms or a list of lines-of-response. Moreover, the environment can be interfaced with Monte Carlo simulation packages such as GEANT and GATE, which are commonly used in the medical scientific community.
 [408] arXiv:2002.10201 (cross-list from eess.IV) [pdf, other]

Title: Beyond Camera Motion Removing: How to Handle Outliers in Deblurring
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
Performing camera motion deblurring is an important low-level vision task for achieving better imaging quality. When a scene has outliers such as saturated pixels and salt-and-pepper noise, the image becomes more difficult to restore. In this paper, we propose an edge-aware scale-recurrent network (EASRN) to conduct camera motion deblurring. EASRN has a separate deblurring module that removes blur at multiple scales and an upsampling module that fuses different input scales. We propose a salient edge detection network to supervise the training process and solve the outlier problem by proposing a novel method of dataset generation. Light streaks are printed on the sharp image to simulate the cutoff effect from saturation. We evaluate our method on the standard deblurring datasets. Both objective evaluation indexes and subjective visualization show that our method results in better deblurring quality than the other state-of-the-art approaches.
 [409] arXiv:2002.10206 (cross-list from q-fin.CP) [pdf, ps, other]

Title: Hybrid, adaptive, and positivity preserving numerical methods for the Cox-Ingersoll-Ross model
Comments: 24 pages, 3 figures, 2 tables
Subjects: Computational Finance (q-fin.CP); Numerical Analysis (math.NA)
We introduce an adaptive Euler method for the approximate solution of the Cox-Ingersoll-Ross short rate model. An explicit discretisation is applied over an adaptive mesh to the stochastic differential equation (SDE) governing the square root of the solution, relying upon a class of path-bounded timestepping strategies which work by reducing the stepsize as solutions approach a neighbourhood of zero. The method is hybrid in the sense that a backstop method is invoked if the timestep becomes too small, or to prevent solutions from overshooting zero and becoming negative. Under parameter constraints that imply Feller's condition, we prove that such a scheme is strongly convergent, of order at least 1/2. Under Feller's condition we also prove that the probability of ever needing the backstop method to prevent a negative value can be made arbitrarily small. Numerically, we compare this adaptive method to fixed step schemes extant in the literature, both implicit and explicit, and a novel semi-implicit adaptive variant. We observe that the adaptive approach leads to methods that are competitive over the entire domain of Feller's condition.
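For orientation, the objects named in this abstract can be written out in a common textbook notation. The symbols $\kappa$, $\theta$, $\sigma$ and $W_t$ below are notational assumptions, not taken from the listing, and the paper's own parametrisation may differ. The second equation is what Itô's formula gives for the square root $Y_t = \sqrt{X_t}$; its $1/Y_t$ drift term indicates why small steps matter near zero.

```latex
% Cox-Ingersoll-Ross short rate model (standard form; symbols are assumed notation)
dX_t = \kappa\,(\theta - X_t)\,dt + \sigma\sqrt{X_t}\,dW_t, \qquad X_0 > 0.

% SDE for the square root Y_t = \sqrt{X_t}, obtained from Ito's formula
dY_t = \left( \frac{4\kappa\theta - \sigma^2}{8\,Y_t} - \frac{\kappa}{2}\,Y_t \right) dt
       + \frac{\sigma}{2}\,dW_t.

% Feller's condition, under which X_t stays strictly positive almost surely
2\kappa\theta \ge \sigma^2.
```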
 [410] arXiv:2002.10218 (cross-list from astro-ph.CO) [pdf, other]

Title: Baryon acoustic oscillations reconstruction using convolutional neural networks
Authors: Tian-Xiang Mao, Jie Wang, Baojiu Li, Yan-Chuan Cai, Bridget Falck, Mark Neyrinck, Alex Szalay
Subjects: Cosmology and Nongalactic Astrophysics (astro-ph.CO); Machine Learning (cs.LG)
Here we propose a new scheme to reconstruct the baryon acoustic oscillations (BAO) signal, with key cosmological information, based on deep convolutional neural networks. After training the network with almost no fine-tuning, in the test set, the network recovers large-scale modes accurately: the correlation coefficient between the ground truth and recovered initial conditions still reaches $90\%$ at $k \leq 0.2~ h\mathrm{Mpc}^{-1}$, which significantly improves the BAO signal-to-noise ratio until the scale $k=0.4~ h\mathrm{Mpc}^{-1}$. Furthermore, our scheme is independent of the survey boundary since it reconstructs initial conditions based on the local density distribution in configuration space, which means that we can gain more information from the whole survey space. Finally, we found our trained network is not sensitive to the cosmological parameters and works very well in those cosmologies close to that of our training set. This new scheme will possibly help us dig out more information from the current, ongoing and future galaxy surveys.
 [411] arXiv:2002.10243 (cross-list from stat.ML) [pdf, other]

Title: Informative Gaussian Scale Mixture Priors for Bayesian Neural Networks
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
Encoding domain knowledge into the prior over the high-dimensional weight space is challenging in Bayesian neural networks. Two types of domain knowledge are commonly available in scientific applications: 1. feature sparsity (number of relevant features); 2. signal-to-noise ratio, quantified, for instance, as the proportion of variance explained (PVE). We show both types of domain knowledge can be encoded into the widely used Gaussian scale mixture priors with Automatic Relevance Determination. Specifically, we propose a new joint prior over the local (i.e., feature-specific) scale parameters to encode the knowledge about feature sparsity, and an algorithm to determine the global scale parameter (shared by all features) according to the PVE. Empirically, we show that the proposed informative prior improves prediction accuracy on publicly available datasets and in a genetics application.
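As a small illustration of the quantity mentioned above, the proportion of variance explained can be computed as one minus the ratio of residual variance to outcome variance. The sketch below uses that common definition; the function name and the synthetic data are hypothetical, and it does not reproduce the paper's prior or its algorithm for setting the global scale.

```python
import random
from statistics import pvariance


def proportion_of_variance_explained(y, y_pred):
    """PVE = 1 - Var(y - y_pred) / Var(y), using population variances."""
    residuals = [a - b for a, b in zip(y, y_pred)]
    return 1.0 - pvariance(residuals) / pvariance(y)


# Synthetic example (assumption): signal and noise have equal variance,
# so a predictor that recovers the signal exactly has PVE close to 0.5.
random.seed(0)
signal = [random.gauss(0.0, 1.0) for _ in range(10_000)]
noise = [random.gauss(0.0, 1.0) for _ in range(10_000)]
y = [s + e for s, e in zip(signal, noise)]

pve = proportion_of_variance_explained(y, signal)
```

A PVE near 1 means the outcome is almost noise-free, while a PVE near 0 means the features explain very little of it.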
 [412] arXiv:2002.10247 (cross-list from q-fin.ST) [pdf]

Title: Forecasting Foreign Exchange Rate: A Multivariate Comparative Analysis between Traditional Econometric, Contemporary Machine Learning & Deep Learning Techniques
Comments: 10 pages
Subjects: Statistical Finance (q-fin.ST); Machine Learning (cs.LG); Econometrics (econ.EM); Machine Learning (stat.ML)
In today's global economy, accuracy in predicting macroeconomic parameters such as the foreign exchange rate, or at least estimating the trend correctly, is of key importance for any future investment. In recent times, the use of computational intelligence-based techniques for forecasting macroeconomic variables has proven highly successful. This paper presents a multivariate time series approach to forecast the exchange rate (USD/INR) while comparing, in parallel, the performance of three multivariate prediction modelling techniques: Vector Auto Regression (a Traditional Econometric Technique), Support Vector Machine (a Contemporary Machine Learning Technique), and Recurrent Neural Networks (a Contemporary Deep Learning Technique). We have used monthly historical data for several macroeconomic variables from April 1994 to December 2018 for the USA and India to predict the USD/INR Foreign Exchange Rate. The results clearly show that the contemporary techniques of SVM and RNN (Long Short-Term Memory) outperform the widely used traditional method of Auto Regression. The RNN model with Long Short-Term Memory (LSTM) provides the maximum accuracy (97.83%), followed by the SVM Model (97.17%) and the VAR Model (96.31%). Finally, we present a brief analysis of the correlation and interdependencies of the variables used for forecasting.
 [413] arXiv:2002.10257 (cross-list from eess.IV) [pdf, other]

Title: Using wavelets to analyze similarities in image datasets
Authors: Roozbeh Yousefzadeh
Subjects: Image and Video Processing (eess.IV); Machine Learning (cs.LG); Machine Learning (stat.ML)
Deep learning image classifiers usually rely on huge training sets, and their training process can be described as learning the similarities and differences among training images. However, images in large training sets are not usually studied from this perspective, and fine-level similarities and differences among images are usually overlooked. Some studies aim to identify the influential and redundant training images, but such methods require a model that is already trained on the entire training set. Here, we show that analyzing the contents of large training sets can provide valuable insights about the classification task at hand, prior to training a model on them. We use wavelet decomposition of images and other image processing tools to perform such analysis, with no need for a pre-trained model. This makes the analysis of training sets straightforward and fast. We show that similar images in standard datasets (such as CIFAR) can be identified in a few seconds, a significant speed-up compared to alternative methods in the literature. We also show that similarities between training and testing images may explain the generalization of models and their mistakes. Finally, we investigate the similarities between images in relation to decision boundaries of a trained model.
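To make the idea of comparing images through a wavelet decomposition concrete, here is a minimal sketch that applies one level of the 2D Haar transform and measures the distance between the coarse (approximation) coefficients of two images. The choice of the Haar wavelet, the function names, and the toy 4x4 images are all assumptions for illustration; the abstract does not say which wavelet or similarity measure the author uses.

```python
import math


def haar2d_level(image):
    """One level of the orthonormal 2D Haar transform.

    `image` is a list of rows with even height and width. Returns four
    half-size coefficient arrays: the approximation plus three detail bands.
    """
    approx, det_rows, det_cols, det_diag = [], [], [], []
    for r in range(0, len(image), 2):
        a_row, r_row, c_row, d_row = [], [], [], []
        for c in range(0, len(image[0]), 2):
            tl, tr = image[r][c], image[r][c + 1]
            bl, br = image[r + 1][c], image[r + 1][c + 1]
            a_row.append((tl + tr + bl + br) / 2.0)  # local average (scaled)
            r_row.append((tl + tr - bl - br) / 2.0)  # top vs bottom row
            c_row.append((tl - tr + bl - br) / 2.0)  # left vs right column
            d_row.append((tl - tr - bl + br) / 2.0)  # diagonal difference
        approx.append(a_row)
        det_rows.append(r_row)
        det_cols.append(c_row)
        det_diag.append(d_row)
    return approx, det_rows, det_cols, det_diag


def approx_distance(img1, img2):
    """Euclidean distance between the Haar approximation coefficients."""
    a1, a2 = haar2d_level(img1)[0], haar2d_level(img2)[0]
    return math.sqrt(sum((x - y) ** 2
                         for row1, row2 in zip(a1, a2)
                         for x, y in zip(row1, row2)))


# Toy images (hypothetical): img_b differs from img_a in one pixel,
# img_c is unrelated.
img_a = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
img_b = [row[:] for row in img_a]
img_b[0][0] += 1
img_c = [[0] * 4 for _ in range(4)]

near = approx_distance(img_a, img_b)
far = approx_distance(img_a, img_c)
```

Because the approximation band has a quarter of the original pixels, comparing it is cheaper than comparing full images, which is one plausible reason such an analysis can run quickly on large datasets.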
 [414] arXiv:2002.10271 (cross-list from stat.ML) [pdf, other]

Title: Testing Goodness of Fit of Conditional Density Models with Kernels
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Statistics Theory (math.ST)
We propose two nonparametric statistical tests of goodness of fit for conditional distributions: given a conditional probability density function $p(y|x)$ and a joint sample, decide whether the sample is drawn from $p(y|x)r_x(x)$ for some density $r_x$. Our tests, formulated with a Stein operator, can be applied to any differentiable conditional density model, and require no knowledge of the normalizing constant. We show that 1) our tests are consistent against any fixed alternative conditional model; 2) the statistics can be estimated easily, requiring no density estimation as an intermediate step; and 3) our second test offers an interpretable test result providing insight on where the conditional model does not fit well in the domain of the covariate. We demonstrate the interpretability of our test on a task of modeling the distribution of New York City's taxi drop-off location given a pick-up point. To our knowledge, our work is the first to propose such conditional goodness-of-fit tests that simultaneously have all these desirable properties.
 [415] arXiv:2002.10290 (cross-list from nucl-th) [pdf, other]

Title: Trees and Forests in Nuclear Physics
Subjects: Nuclear Theory (nucl-th); Machine Learning (cs.LG); Nuclear Experiment (nucl-ex)
We present a detailed introduction to the decision tree algorithm using some simple examples taken from the domain of nuclear physics. We show how to improve the accuracy of the classical liquid drop nuclear mass model by performing feature engineering while using a decision tree. Finally, we apply the method to the Duflo-Zucker mass model, showing that, despite their simplicity, decision trees are capable of obtaining a level of accuracy comparable to more complex neural networks, while using far fewer adjustable parameters and yielding models that are easier to explain.
 [416] arXiv:2002.10291 (cross-list from math.OC) [pdf, other]

Title: Estimation-aware model predictive path-following control for a general 2-trailer with a car-like tractor
Comments: Submitted to IEEE Transactions on Robotics. arXiv admin note: text overlap with arXiv:2002.06874
Subjects: Optimization and Control (math.OC); Robotics (cs.RO)
The design of the path-following controller is crucial to enable reliable autonomous vehicle operation. This design problem is especially challenging for a general 2-trailer with a car-like tractor due to the tractor's curvature limitations and the vehicle's structurally unstable joint-angle kinematics in backward motion. Additionally, to make the control system independent of any sensor mounted on the trailer, advanced sensors placed in the rear of the tractor have been proposed to solve the joint-angle estimation problem. Since these sensors typically have a limited field of view, the proposed estimation solution introduces restrictions on the joint-angle configurations that can be estimated with high accuracy. To model and explicitly consider these constraints in the controller, a model predictive path-following control approach is proposed. Two approaches with different computation complexity and performance are presented. In the first approach, the constraint on the joint angles is modeled as a union of convex polytopes, making it necessary to incorporate binary decision variables. The second approach avoids binary variables at the expense of a more restrictive approximation of the joint-angle constraints. In simulations and field experiments, the performance of the proposed path-following control approach in terms of suppressing disturbances and recovering from non-trivial initial states is compared with a previously proposed control strategy where the joint-angle constraints are neglected.
 [417] arXiv:2002.10385 (cross-list from q-fin.CP) [pdf, ps, other]

Title: Predictive intraday correlations in stable and volatile market environments: Evidence from deep learning
Comments: 15 pages, 6 figures, preprint submitted to Physica A
Subjects: Computational Finance (q-fin.CP); Machine Learning (cs.LG); Machine Learning (stat.ML)
Standard methods and theories in finance can be ill-equipped to capture highly non-linear interactions in financial prediction problems based on large-scale datasets, with deep learning offering a way to gain insights into correlations in markets as complex systems. In this paper, we apply deep learning to econometrically constructed gradients to learn and exploit lagged correlations among S&P 500 stocks to compare model behaviour in stable and volatile market environments, and under the exclusion of target stock information for predictions. In order to measure the effect of time horizons, we predict intraday and daily stock price movements in varying interval lengths and gauge the complexity of the problem at hand with a modification of our model architecture. Our findings show that accuracies, while remaining significant and demonstrating the exploitability of lagged correlations in stock markets, decrease with shorter prediction horizons. We discuss implications for modern finance theory and our work's applicability as an investigative tool for portfolio managers. Lastly, we show that our model's performance is consistent in volatile markets by exposing it to the environment of the recent financial crisis of 2007/2008.
 [418] arXiv:2002.10399 (cross-list from stat.ME) [pdf, other]

Title: Confidence Sets and Hypothesis Testing in a Likelihood-Free Inference Setting
Comments: 16 pages, 7 figures, 3 tables, 4 algorithm boxes
Subjects: Methodology (stat.ME); Machine Learning (cs.LG); Machine Learning (stat.ML)
Parameter estimation, statistical tests and confidence sets are the cornerstones of classical statistics that allow scientists to make inferences about the underlying process that generated the observed data. A key question is whether one can still construct hypothesis tests and confidence sets with proper coverage and high power in a so-called likelihood-free inference (LFI) setting; that is, a setting where the likelihood is not explicitly known but one can forward-simulate observable data according to a stochastic model. In this paper, we present $\texttt{ACORE}$ (Approximate Computation via Odds Ratio Estimation), a frequentist approach to LFI that first formulates the classical likelihood ratio test (LRT) as a parametrized classification problem, and then uses the equivalence of tests and confidence sets to build confidence regions for parameters of interest. We also present a goodness-of-fit procedure for checking whether the constructed tests and confidence regions are valid. $\texttt{ACORE}$ is based on the key observation that the LRT statistic, the rejection probability of the test, and the coverage of the confidence set are conditional distribution functions which often vary smoothly as a function of the parameters of interest. Hence, instead of relying solely on samples simulated at fixed parameter settings (as is the convention in standard Monte Carlo solutions), one can leverage machine learning tools and data simulated in the neighborhood of a parameter to improve estimates of quantities of interest. We demonstrate the efficacy of $\texttt{ACORE}$ with both theoretical and empirical results. Our implementation is available on GitHub.
Replacements for Tue, 25 Feb 20
 [419] arXiv:0805.1293 (replaced) [pdf]

Title: Testability of Reversible Iterative Logic Arrays
Authors: Avik Chakraborty
Subjects: Other Computer Science (cs.OH)
 [420] arXiv:1303.3341 (replaced) [pdf, ps, other]

Title: A short proof that all linear codes are weakly algebraic-geometric using Bertini theorems of B. Poonen
Authors: Srimathy Srinivasan
Comments: Title modified, expository content shortened. Final version to appear in Discrete Math
Journal-ref: Discrete Math., vol. 343, Issue 6, June 2020
Subjects: Information Theory (cs.IT)
 [421] arXiv:1511.03019 (replaced) [pdf, other]

Title: 3D Time-lapse Reconstruction from Internet Photos
Comments: To appear in ICCV'15. Supplementary video at: this http URL
Subjects: Computer Vision and Pattern Recognition (cs.CV)
 [422] arXiv:1602.04410 (replaced) [pdf, ps, other]

Title: Simple Characterizations of Potential Games and Zero-sum Games
Subjects: Computer Science and Game Theory (cs.GT)
 [423] arXiv:1607.00174 (replaced) [pdf, other]

Title: Blockchain-based Proof of Location
Comments: 13 pages, 9 figures
Journal-ref: 2018 IEEE International Conference on Software Quality, Reliability and Security Companion (QRS-C)
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Cryptography and Security (cs.CR)
 [424] arXiv:1609.02490 (replaced) [pdf, ps, other]

Title: Information and dimensionality of anisotropic random geometric graphs
Comments: 38 pages
Subjects: Statistics Theory (math.ST); Social and Information Networks (cs.SI); Probability (math.PR)
 [425] arXiv:1703.10556 (replaced) [pdf, other]

Title: Sparse Signal Recovery via Generalized Entropy Functions Minimization
Journal-ref: IEEE Transactions on Signal Processing, Vol. 67 (5), Mar. 2019
Subjects: Information Theory (cs.IT)
 [426] arXiv:1710.04640 (replaced) [pdf, other]

Title: Hard and Easy Instances of L-Tromino Tilings
Comments: Full extended version of LNCS 11355:82-95 (WALCOM 2019)
Subjects: Computational Complexity (cs.CC); Data Structures and Algorithms (cs.DS); Combinatorics (math.CO)
 [427] arXiv:1711.09740 (replaced) [pdf, other]

Title: Distances between States and between Predicates
Subjects: Logic in Computer Science (cs.LO)
 [428] arXiv:1802.04613 (replaced) [pdf, other]

Title: First-order queries on classes of structures with bounded expansion
Subjects: Databases (cs.DB)
 [429] arXiv:1802.09081 (replaced) [pdf, other]

Title: Temporal Difference Models: Model-Free Deep RL for Model-Based Control
Comments: Appeared in ICLR 2018; typos corrected
Subjects: Machine Learning (cs.LG)
 [430] arXiv:1804.11021 (replaced) [src]

Title: On the Effect of Suboptimal Estimation of Mutual Information in Feature Selection and Classification
Comments: Some of the results in the paper need to be taken back as they were not able to be reproduced, and thus we are requesting a withdrawal of the paper until we are able to update and verify our previous experiments & scripts
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
 [431] arXiv:1805.00692 (replaced) [pdf, ps, other]

Title: Compressed Dictionary Learning
Comments: 5 figure, 4.6 pages per figure
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
 [432] arXiv:1806.01304 (replaced) [pdf, other]

Title: MOSES: A Streaming Algorithm for Linear Dimensionality Reduction
Subjects: Information Theory (cs.IT)
 [433] arXiv:1806.04225 (replaced) [pdf, other]

Title: PAC-Bayes Control: Learning Policies that Provably Generalize to Novel Environments
Comments: Extended version of paper presented at the 2018 Conference on Robot Learning (CoRL)
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Systems and Control (eess.SY); Optimization and Control (math.OC)
 [434] arXiv:1806.07709 (replaced) [pdf, ps, other]

Title: Notes on Abstract Argumentation Theory
Authors: Anthony Peter Young
Comments: 93 pages, 38 figures, 6 tables
Subjects: Artificial Intelligence (cs.AI)
 [435] arXiv:1806.07984 (replaced) [pdf, other]

Title: Enclave Tasking for Discontinuous Galerkin Methods on Dynamically Adaptive Meshes
Subjects: Mathematical Software (cs.MS); Distributed, Parallel, and Cluster Computing (cs.DC)
 [436] arXiv:1808.08347 (replaced) [pdf, other]

Title: A Comparison of the Taguchi Method and Evolutionary Optimization in Multivariate Testing
Comments: 5 pages, 4 figures, IAAI-19
Subjects: Neural and Evolutionary Computing (cs.NE)
 [437] arXiv:1809.01674 (replaced) [pdf, other]

Title: Hierarchical Selective Recruitment in Linear-Threshold Brain Networks - Part I: Single-Layer Dynamics and Selective Inhibition
Subjects: Systems and Control (eess.SY); Neural and Evolutionary Computing (cs.NE); Optimization and Control (math.OC)
 [438] arXiv:1809.02493 (replaced) [pdf, other]

Title: Hierarchical Selective Recruitment in Linear-Threshold Brain Networks - Part II: Multi-Layer Dynamics and Top-Down Recruitment
Subjects: Systems and Control (eess.SY); Neural and Evolutionary Computing (cs.NE); Optimization and Control (math.OC)
 [439] arXiv:1810.11187 (replaced) [pdf, other]

Title: TarMAC: Targeted Multi-Agent Communication
Authors: Abhishek Das, Théophile Gervet, Joshua Romoff, Dhruv Batra, Devi Parikh, Michael Rabbat, Joelle Pineau
Comments: ICML 2019
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA); Machine Learning (stat.ML)
 [440] arXiv:1810.11556 (replaced) [pdf, other]

Title: Efficient and Trustworthy Social Navigation Via Explicit and Implicit Robot-Human Communication
Journal-ref: IEEE Transactions on Robotics, pp(99):1-16, 2020
Subjects: Robotics (cs.RO)
 [441] arXiv:1811.01290 (replaced) [pdf, ps, other]

Title: AutoML Deep Learning for Rashi Scripts OCR
Comments: The paper is under consideration at Pattern Recognition Letters
Subjects: Computer Vision and Pattern Recognition (cs.CV)
 [442] arXiv:1811.02702 (replaced) [pdf, other]

Title: Greedy Frank-Wolfe Algorithm for Exemplar Selection
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [443] arXiv:1811.08581 (replaced) [pdf, other]

Title: Recent Advances in Open Set Recognition: A Survey
Comments: This is a preliminary and will be kept updated, any suggestions and comments are welcome (gengchuanxing@nuaa.edu.cn)
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [444] arXiv:1811.09231 (replaced) [pdf, other]

Title: Goal-constrained Planning Domain Model Verification of Safety Properties
Subjects: Artificial Intelligence (cs.AI)
 [445] arXiv:1811.11474 (replaced) [pdf, other]

Title: Improved Calibration of Numerical Integration Error in Sigma-Point Filters
Comments: 13 pages, 4 figures
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Signal Processing (eess.SP); Methodology (stat.ME)
 [446] arXiv:1811.12506 (replaced) [pdf, other]

Title: 3D Semi-Supervised Learning with Uncertainty-Aware Multi-View Co-Training
Authors: Yingda Xia, Fengze Liu, Dong Yang, Jinzheng Cai, Lequan Yu, Zhuotun Zhu, Daguang Xu, Alan Yuille, Holger Roth
Comments: Accepted to WACV 2020
Subjects: Computer Vision and Pattern Recognition (cs.CV)
 [447] arXiv:1811.12804 (replaced) [pdf, other]

Title: Asymmetry Helps: Eigenvalue and Eigenvector Analyses of Asymmetrically Perturbed Low-Rank Matrices
Comments: accepted to Annals of Statistics, 2020. 37 pages
Subjects: Statistics Theory (math.ST); Information Theory (cs.IT); Signal Processing (eess.SP); Numerical Analysis (math.NA); Machine Learning (stat.ML)
 [448] arXiv:1812.00071 (replaced) [pdf, other]

Title: Stochastic Gradient MCMC with Repulsive Forces
Comments: Extends the workshop version
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
 [449] arXiv:1812.00885 (replaced) [pdf, ps, other]
 [450] arXiv:1812.04201 (replaced) [pdf, other]

Title: Range-based Coordinate Alignment for Cooperative Mobile Sensor Network Localization
Subjects: Systems and Control (eess.SY); Multiagent Systems (cs.MA)
 [451] arXiv:1812.09806 (replaced) [pdf, other]

Title: Analysis of contagion maps on a class of networks that are spatially embedded in a torus
Subjects: Social and Information Networks (cs.SI); Algebraic Topology (math.AT); Dynamical Systems (math.DS); Adaptation and Self-Organizing Systems (nlin.AO); Physics and Society (physics.soc-ph)
 [452] arXiv:1901.00137 (replaced) [pdf, ps, other]

Title: A Theoretical Analysis of Deep Q-Learning
Comments: 65 pages
Subjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
 [453] arXiv:1901.03775 (replaced) [pdf, other]

Title: Creative AI Through Evolutionary Computation
Authors: Risto Miikkulainen
Journal-ref: In Banzhaf et al. (editors), Evolution in Action: Past, Present and Future. New York: Springer. 2020
Subjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
 [454] arXiv:1901.04008 (replaced) [pdf, other]

Title: Fast Deterministic Algorithms for Highly-Dynamic Networks
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS)
 [455] arXiv:1901.10300 (replaced) [pdf, other]

Title: Weighted-Sampling Audio Adversarial Example Attack
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD)
 [456] arXiv:1902.00450 (replaced) [pdf, other]

Title: Time Series Deconfounder: Estimating Treatment Effects over Time in the Presence of Hidden Confounders
Subjects: Machine Learning (cs.LG); Applications (stat.AP); Machine Learning (stat.ML)
 [457] arXiv:1902.09009 (replaced) [pdf, ps, other]

Title: Efficient Private Algorithms for Learning Large-Margin Halfspaces
Comments: changed title, added references and remarks
Subjects: Machine Learning (cs.LG); Data Structures and Algorithms (cs.DS); Machine Learning (stat.ML)
 [458] arXiv:1902.09458 (replaced) [pdf, other]

Title: Long-Range Indoor Navigation with PRM-RL
Authors: Anthony Francis, Aleksandra Faust, Hao-Tien Lewis Chiang, Jasmine Hsu, J. Chase Kew, Marek Fiser, Tsang-Wei Edward Lee
Comments: Accepted to T-RO
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
 [459] arXiv:1902.09527 (replaced) [pdf, other]

Title: clusterNOR: A NUMA-Optimized Clustering Framework
Comments: arXiv admin note: Journal version of arXiv:1606.08905
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
 [460] arXiv:1903.00070 (replaced) [pdf, other]

Title: Learning to Plan in High Dimensions via Neural Exploration-Exploitation Trees
Comments: 26 pages, 74 figures, ICLR 2020 spotlight
Subjects: Machine Learning (cs.LG); Robotics (cs.RO); Machine Learning (stat.ML)
 [461] arXiv:1903.01672 (replaced) [pdf, other]

Title: Causal Discovery from Heterogeneous/Nonstationary Data
Authors: Biwei Huang, Kun Zhang, Jiji Zhang, Joseph Ramsey, Ruben Sanchez-Romero, Clark Glymour, Bernhard Schölkopf
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [462] arXiv:1903.06187 (replaced) [pdf, other]

Title: No-regret Exploration in Contextual Reinforcement Learning
Comments: Under review. 25 pages, 2 figures
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
 [463] arXiv:1903.09338 (replaced) [pdf, other]

Title: Optimization Methods for Interpretable Differentiable Decision Trees in Reinforcement Learning
Authors: Andrew Silva, Taylor Killian, Ivan Dario Jimenez Rodriguez, Sung-Hyun Son, Matthew Gombolay
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [464] arXiv:1904.00069 (replaced) [pdf, other]

Title: Unpaired Point Cloud Completion on Real Scans using Adversarial Training
Comments: ICLR 2020
Subjects: Computer Vision and Pattern Recognition (cs.CV)
 [465] arXiv:1904.01803 (replaced) [pdf, other]

Title: GFF: Gated Fully Fusion for Semantic Segmentation
Comments: accepted by AAAI2020(oral)
Subjects: Computer Vision and Pattern Recognition (cs.CV)
 [466] arXiv:1904.02311 (replaced) [pdf, ps, other]

Title: Approximation Rates for Neural Networks with General Activation Functions
Subjects: Classical Analysis and ODEs (math.CA); Machine Learning (cs.LG)
 [467] arXiv:1904.03766 (replaced) [pdf, other]

Title: Generalized Persistence Algorithm for Decomposing Multiparameter Persistence Modules
Subjects: Algebraic Topology (math.AT); Computational Geometry (cs.CG)
 [468] arXiv:1904.04898 (replaced) [pdf, other]

Title: On multiple solutions to the steady flow of incompressible fluids subject to do-nothing or constant traction boundary conditions on artificial boundaries
Comments: 15 pages
Journal-ref: Journal of Mathematical Fluid Mechanics 22(11), 2020
Subjects: Fluid Dynamics (physics.flu-dyn); Numerical Analysis (math.NA)
 [469] arXiv:1904.07162 (replaced) [pdf, other]

Title: Single Machine Graph Analytics on Massive Datasets Using Intel Optane DC Persistent Memory
Authors: Gurbinder Gill (1), Roshan Dathathri (1), Loc Hoang (1), Ramesh Peri (2), Keshav Pingali (1) ((1) The University of Texas at Austin, (2) Intel Corporation)
Comments: 11 pages
Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
 [470] arXiv:1904.07184 (replaced) [pdf, ps, other]

Title: A monotone scheme for G-equations with application to the explicit convergence rate of robust central limit theorem
Comments: 31 pages
Subjects: Probability (math.PR); Numerical Analysis (math.NA)
 [471] arXiv:1904.07538 (replaced) [pdf, other]

Title: Long-Term Human Video Generation of Multiple Futures Using Poses
Subjects: Computer Vision and Pattern Recognition (cs.CV)
 [472] arXiv:1904.08554 (replaced) [pdf, ps, other]

Title: Using Honeypots to Catch Adversarial Attacks on Neural Networks
Comments: 14 pages
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Machine Learning (stat.ML)
 [473] arXiv:1904.09573 (replaced) [pdf, ps, other]

Title: Enabling Secure Wireless Communications via Intelligent Reflecting Surfaces
Comments: 6 pages, 5 figures, accepted to IEEE Global Commun. Conf. (GLOBECOM), Waikoloa, HI, USA, Dec. 2019, final version
Subjects: Information Theory (cs.IT)
 [474] arXiv:1904.09675 (replaced) [pdf, other]

Title: BERTScore: Evaluating Text Generation with BERT
Comments: Code available at this https URL; To appear in ICLR2020
Subjects: Computation and Language (cs.CL)
 [475] arXiv:1904.10164 (replaced) [pdf, other]

Title: Foundations, Properties, and Security Applications of Puzzles: A Survey
Subjects: Cryptography and Security (cs.CR)
 [476] arXiv:1904.12069 (replaced) [pdf]

Title: Improving Deep Speech Denoising by Noisy2Noisy Signal Mapping
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP)
 [477] arXiv:1905.02602 (replaced) [pdf, other]

Title: Dissecting Android Cryptocurrency Miners
Authors: Stanislav Dashevskyi, Yury Zhauniarovich, Olga Gadyatskaya, Aleksandr Pilgun, Hamza Ouhssain
Subjects: Cryptography and Security (cs.CR)
 [478] arXiv:1905.02789 (replaced) [pdf, other]

Title: Variational training of neural network approximations of solution maps for physical models
Subjects: Numerical Analysis (math.NA); Machine Learning (cs.LG); Machine Learning (stat.ML)
 [479] arXiv:1905.03428 (replaced) [pdf, other]

Title: Testing Scenario Library Generation for Connected and Automated Vehicles, Part II: Case Studies
Comments: 12 pages, 13 figures
Subjects: Robotics (cs.RO); Artificial Intelligence (cs.AI)
 [480] arXiv:1905.10428 (replaced) [pdf, other]

Title: LdSM: Logarithm-depth Streaming Multi-label Decision Trees
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [481] arXiv:1905.10634 (replaced) [pdf, other]

Title: Adaptive, Distribution-Free Prediction Intervals for Deep Networks
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
 [482] arXiv:1905.11460 (replaced) [pdf, other]

Title: Incidence Networks for Geometric Deep Learning
Comments: Last revised 24 Feb 2020
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [483] arXiv:1905.11475 (replaced) [pdf, other]

Title: Adversarial Example Detection and Classification With Asymmetrical Adversarial Training
Comments: ICLR 2020
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Machine Learning (stat.ML)
 [484] arXiv:1905.11656 (replaced) [pdf, other]

Title: Discrete Infomax Codes for Supervised Representation Learning
Comments: 19 pages
Subjects: Machine Learning (stat.ML); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
 [485] arXiv:1905.11978 (replaced) [pdf, other]

Title: Better Long-Range Dependency By Bootstrapping A Mutual Information Regularizer
Comments: Camera-ready for AISTATS 2020
Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Machine Learning (stat.ML)
 [486] arXiv:1905.12407 (replaced) [pdf, other]

Title: Nonlinear Multitask Learning with Deep Gaussian Processes
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
 [487] arXiv:1905.12915 (replaced) [pdf, ps, other]

Title: Separating an Outlier from a Change
Comments: 9 pages, 10 figures
Subjects: Signal Processing (eess.SP); Information Theory (cs.IT); Applications (stat.AP)
 [488] arXiv:1905.12935 (replaced) [pdf, ps, other]

Title: Consistency of circuit lower bounds with bounded theories
Subjects: Computational Complexity (cs.CC); Logic (math.LO)
 [489] arXiv:1905.13651 (replaced) [pdf, other]

Title: Principal Fairness: Removing Bias via Projections
Authors: Aris Anagnostopoulos, Luca Becchetti, Adriano Fazzone, Cristina Menghini, Chris Schwiegelshohn
Subjects: Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG)
 [490] arXiv:1906.00829 (replaced) [pdf, ps, other]

Title: An adaptive multiresolution discontinuous Galerkin method with artificial viscosity for scalar hyperbolic conservation laws in multidimensions
Subjects: Numerical Analysis (math.NA)
 [491] arXiv:1906.01827 (replaced) [pdf, ps, other]

Title: Coresets for Data-efficient Training of Machine Learning Models
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)
 [492] arXiv:1906.02735 (replaced) [pdf, other]

Title: Residual Flows for Invertible Generative Modeling
Comments: NeurIPS 2019
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
 [493] arXiv:1906.02922 (replaced) [pdf, other]

Title: Drifting Reinforcement Learning: The Blessing of (More) Optimism in Face of Endogenous & Exogenous Dynamics
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [494] arXiv:1906.03038 (replaced) [pdf, other]

Title: A Generative Framework for Zero-Shot Learning with Adversarial Domain Adaptation
Comments: Proceedings of Winter Conference on Applications of Computer Vision (WACV) 2020
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (stat.ML)
 [495] arXiv:1906.03218 (replaced) [pdf, other]

Title: Planning With Uncertain Specifications (PUnS)
Comments: Accepted for publication by IEEE Robotics and Automation Letters. Accepted for presentation at the 2020 IEEE International Conference on Robotics and Automation
Subjects: Robotics (cs.RO)
 [496] arXiv:1906.03231 (replaced) [pdf, ps, other]

Title: A cryptographic approach to black box adversarial machine learning
Subjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR); Machine Learning (stat.ML)
 [497] arXiv:1906.03671 (replaced) [pdf, other]

Title: Deep Batch Active Learning by Diverse, Uncertain Gradient Lower Bounds
Journal-ref: 2020 International Conference on Learning Representations
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [498] arXiv:1906.05204 (replaced) [pdf, other]

Title: Model-Free Practical Cooperative Control for Diffusively Coupled Systems
Comments: 12 pages, 7 figures
Subjects: Systems and Control (eess.SY); Optimization and Control (math.OC)
 [499] arXiv:1906.06289 (replaced) [pdf, other]

Title: Multi-Carrier Agile Phased Array Radar
Comments: 16 pages
Subjects: Signal Processing (eess.SP); Information Theory (cs.IT)
 [500] arXiv:1906.06575 (replaced) [src]

Title: Single Image Super-resolution via Dense Blended Attention Generative Adversarial Network for Clinical Diagnosis
Authors: Kewen Liu, Yuan Ma, Hongxia Xiong, Zejun Yan, Zhijun Zhou, Chaoyang Liu, Panpan Fang, Xiaojun Li, Yalei Chen
Comments: We abandoned this paper because it applies only to medical images; please see our latest work at arXiv:1911.03464
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
 [501] arXiv:1906.06794 (replaced) [pdf, other]

Title: Back-Projection based Fidelity Term for Ill-Posed Linear Inverse Problems
Subjects: Computer Vision and Pattern Recognition (cs.CV); Numerical Analysis (math.NA)
 [502] arXiv:1906.08720 (replaced) [pdf, other]

Title: Boosting for Control of Dynamical Systems
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [503] arXiv:1906.11152 (replaced) [pdf, other]

Title: Modulating Surrogates for Bayesian Optimization
Authors: Erik Bodin, Markus Kaiser, Ieva Kazlauskaite, Zhenwen Dai, Neill D. F. Campbell, Carl Henrik Ek
Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
 [504] arXiv:1907.00422 (replaced) [pdf, other]

Title: Dedicated Lane for Connected and Automated Vehicle: How Much Does A Homogeneous Traffic Flow Contribute?
Subjects: Systems and Control (eess.SY)
 [505] arXiv:1907.00939 (replaced) [pdf, other]

Title: Pano Popups: Indoor 3D Reconstruction with a Plane-Aware Network
Comments: 2019 International Conference on 3D Vision (3DV). IEEE, 2019
Subjects: Computer Vision and Pattern Recognition (cs.CV)
 [506] arXiv:1907.03406 (replaced) [pdf, other]

Title: Sparse Hierarchical Preconditioners Using Piecewise Smooth Approximations of Eigenvectors
Subjects: Numerical Analysis (math.NA); Computational Physics (physics.comp-ph)
 [507] arXiv:1907.03707 (replaced) [pdf, ps, other]

Title: Asymmetric LOCO Codes: Constrained Codes for Flash Memories
Comments: 9 pages (double column), 0 figures, accepted at the Annual Allerton Conference on Communication, Control, and Computing
Subjects: Information Theory (cs.IT)
 [508] arXiv:1907.04409 (replaced) [pdf, other]

Title: Global Optimality Guarantees for Nonconvex Unsupervised Video Segmentation
Comments: Proceedings of the 57th Annual Allerton Conference on Communication, Control, and Computing, 2019; added funding source information and notation definitions
Journal-ref: Proceedings of the 57th Annual Allerton Conference on Communication, Control, and Computing, pp. 965-972, 2019
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Optimization and Control (math.OC); Machine Learning (stat.ML)
 [509] arXiv:1907.04596 (replaced) [pdf, other]

Title: A Little Charity Guarantees Almost Envy-Freeness
Comments: Preliminary version appeared in SODA 2020
Subjects: Computer Science and Game Theory (cs.GT)
 [510] arXiv:1907.04640 (replaced) [pdf, ps, other]

Title: Santha-Vazirani sources, deterministic condensers and very strong extractors
Subjects: Computational Complexity (cs.CC)
 [511] arXiv:1907.05320 (replaced) [pdf, other]

Title: Trace-Relating Compiler Correctness and Secure Compilation
Authors: Carmine Abate, Roberto Blanco, Stefan Ciobaca, Adrien Durier, Deepak Garg, Catalin Hritcu, Marco Patrignani, Éric Tanter, Jérémy Thibault
Comments: ESOP'20 camera ready version together with online appendix
Subjects: Programming Languages (cs.PL); Cryptography and Security (cs.CR)
 [512] arXiv:1907.05444 (replaced) [pdf, other]

Title: On the Optimality of Trees Generated by ID3
Subjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
 [513] arXiv:1907.05972 (replaced) [pdf, other]

Title: Spearphone: A Speech Privacy Exploit via Accelerometer-Sensed Reverberations from Smartphone Loudspeakers
Comments: 15 pages, 25 figures
Subjects: Cryptography and Security (cs.CR)
 [514] arXiv:1907.06319 (replaced) [pdf]

Title: Enabling Multi-Shell b-Value Generalizability of Data-Driven Diffusion Models with Deep SHORE
Authors: Vishwesh Nath, Ilwoo Lyu, Kurt G. Schilling, Prasanna Parvathaneni, Colin B. Hansen, Yucheng Tang, Yuankai Huo, Vaibhav A. Janve, Yurui Gao, Iwona Stepniewska, Adam W. Anderson, Bennett A. Landman
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
 [515] arXiv:1907.06406 (replaced) [pdf, other]

Title: Improving the Harmony of the Composite Image by Spatial-Separated Attention Module
Comments: Accepted by IEEE Transactions on Image Processing (TIP) 2020
Subjects: Computer Vision and Pattern Recognition (cs.CV)
 [516] arXiv:1907.08089 (replaced) [pdf, other]

Title: Comparing the Effects of DNS, DoT, and DoH on Web Performance
Comments: The Web Conference 2020 (WWW '20)
Subjects: Networking and Internet Architecture (cs.NI); Cryptography and Security (cs.CR)
 [517] arXiv:1907.10257 (replaced) [pdf, other]

Title: Adaptive and Compressive Beamforming Using Deep Learning for Medical Ultrasound
Comments: This is a significantly extended version of the original paper in arXiv:1901.01706. This paper is accepted for IEEE Transactions on Ultrasonics, Ferroelectrics, and Frequency Control
Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
 [518] arXiv:1907.10905 (replaced) [pdf, other]

Title: A Group-Theoretic Framework for Data Augmentation
Authors: Shuxiao Chen,