\n", - " | title | \n", - "summary | \n", - "authors | \n", - "url | \n", - "_distance | \n", - "
---|---|---|---|---|---|
0 | \n", - "XFlow: Cross-modal Deep Neural Networks for Audiovisual Classification | \n", - "In recent years, there have been numerous developments towards solving\\nmultimodal tasks, aiming to learn a stronger representation than through a\\nsingle modality. Certain aspects of the data can be particularly useful in this\\ncase - for example, correlations in the space or time domain across modalities\\n- but should be wisely exploited in order to benefit from their full predictive\\npotential. We propose two deep learning architectures with multimodal\\ncross-connections that allow for dataflow between several feature extractors\\n(XFlow). Our models derive more interpretable features and achieve better\\nperformances than models which do not exchange representations, usefully\\nexploiting correlations between audio and visual data, which have a different\\ndimensionality and are nontrivially exchangeable. Our work improves on existing\\nmultimodal deep learning algorithms in two essential ways: (1) it presents a\\nnovel method for performing cross-modality (before features are learned from\\nindividual modalities) and (2) extends the previously proposed\\ncross-connections which only transfer information between streams that process\\ncompatible data. Illustrating some of the representations learned by the\\nconnections, we analyse their contribution to the increase in discrimination\\nability and reveal their compatibility with a lip-reading network intermediate\\nrepresentation. We provide the research community with Digits, a new dataset\\nconsisting of three data types extracted from videos of people saying the\\ndigits 0-9. Results show that both cross-modal architectures outperform their\\nbaselines (by up to 11.5%) when evaluated on the AVletters, CUAVE and Digits\\ndatasets, achieving state-of-the-art results. | \n", - "[arxiv.Result.Author('Cătălina Cangea'), arxiv.Result.Author('Petar Veličković'), arxiv.Result.Author('Pietro Liò')] | \n", - "http://arxiv.org/abs/1709.00572v2 | \n", - "40.346901 | \n", - "
1 | \n", - "Dualing GANs | \n", - "Generative adversarial nets (GANs) are a promising technique for modeling a\\ndistribution from samples. It is however well known that GAN training suffers\\nfrom instability due to the nature of its maximin formulation. In this paper,\\nwe explore ways to tackle the instability problem by dualizing the\\ndiscriminator. We start from linear discriminators in which case conjugate\\nduality provides a mechanism to reformulate the saddle point objective into a\\nmaximization problem, such that both the generator and the discriminator of\\nthis 'dualing GAN' act in concert. We then demonstrate how to extend this\\nintuition to non-linear formulations. For GANs with linear discriminators our\\napproach is able to remove the instability in training, while for GANs with\\nnonlinear discriminators our approach provides an alternative to the commonly\\nused GAN training algorithm. | \n", - "[arxiv.Result.Author('Yujia Li'), arxiv.Result.Author('Alexander Schwing'), arxiv.Result.Author('Kuan-Chieh Wang'), arxiv.Result.Author('Richard Zemel')] | \n", - "http://arxiv.org/abs/1706.06216v1 | \n", - "40.449284 | \n", - "
2 | \n", - "Domain Generalization for Object Recognition with Multi-task Autoencoders | \n", - "The problem of domain generalization is to take knowledge acquired from a\\nnumber of related domains where training data is available, and to then\\nsuccessfully apply it to previously unseen domains. We propose a new feature\\nlearning algorithm, Multi-Task Autoencoder (MTAE), that provides good\\ngeneralization performance for cross-domain object recognition.\\n Our algorithm extends the standard denoising autoencoder framework by\\nsubstituting artificially induced corruption with naturally occurring\\ninter-domain variability in the appearance of objects. Instead of\\nreconstructing images from noisy versions, MTAE learns to transform the\\noriginal image into analogs in multiple related domains. It thereby learns\\nfeatures that are robust to variations across domains. The learnt features are\\nthen used as inputs to a classifier.\\n We evaluated the performance of the algorithm on benchmark image recognition\\ndatasets, where the task is to learn features from multiple datasets and to\\nthen predict the image label from unseen datasets. We found that (denoising)\\nMTAE outperforms alternative autoencoder-based models as well as the current\\nstate-of-the-art algorithms for domain generalization. | \n", - "[arxiv.Result.Author('Muhammad Ghifary'), arxiv.Result.Author('W. Bastiaan Kleijn'), arxiv.Result.Author('Mengjie Zhang'), arxiv.Result.Author('David Balduzzi')] | \n", - "http://arxiv.org/abs/1508.07680v1 | \n", - "41.127644 | \n", - "
\n", + " | title | \n", + "summary | \n", + "authors | \n", + "url | \n", + "_distance | \n", + "
---|---|---|---|---|---|
0 | \n", + "XFlow: Cross-modal Deep Neural Networks for Audiovisual Classification | \n", + "In recent years, there have been numerous developments towards solving\\nmultimodal tasks, aiming to learn a stronger representation than through a\\nsingle modality. Certain aspects of the data can be particularly useful in this\\ncase - for example, correlations in the space or time domain across modalities\\n- but should be wisely exploited in order to benefit from their full predictive\\npotential. We propose two deep learning architectures with multimodal\\ncross-connections that allow for dataflow between several feature extractors\\n(XFlow). Our models derive more interpretable features and achieve better\\nperformances than models which do not exchange representations, usefully\\nexploiting correlations between audio and visual data, which have a different\\ndimensionality and are nontrivially exchangeable. Our work improves on existing\\nmultimodal deep learning algorithms in two essential ways: (1) it presents a\\nnovel method for performing cross-modality (before features are learned from\\nindividual modalities) and (2) extends the previously proposed\\ncross-connections which only transfer information between streams that process\\ncompatible data. Illustrating some of the representations learned by the\\nconnections, we analyse their contribution to the increase in discrimination\\nability and reveal their compatibility with a lip-reading network intermediate\\nrepresentation. We provide the research community with Digits, a new dataset\\nconsisting of three data types extracted from videos of people saying the\\ndigits 0-9. Results show that both cross-modal architectures outperform their\\nbaselines (by up to 11.5%) when evaluated on the AVletters, CUAVE and Digits\\ndatasets, achieving state-of-the-art results. | \n", + "[arxiv.Result.Author('Cătălina Cangea'), arxiv.Result.Author('Petar Veličković'), arxiv.Result.Author('Pietro Liò')] | \n", + "http://arxiv.org/abs/1709.00572v2 | \n", + "40.346901 | \n", + "
1 | \n", + "Dualing GANs | \n", + "Generative adversarial nets (GANs) are a promising technique for modeling a\\ndistribution from samples. It is however well known that GAN training suffers\\nfrom instability due to the nature of its maximin formulation. In this paper,\\nwe explore ways to tackle the instability problem by dualizing the\\ndiscriminator. We start from linear discriminators in which case conjugate\\nduality provides a mechanism to reformulate the saddle point objective into a\\nmaximization problem, such that both the generator and the discriminator of\\nthis 'dualing GAN' act in concert. We then demonstrate how to extend this\\nintuition to non-linear formulations. For GANs with linear discriminators our\\napproach is able to remove the instability in training, while for GANs with\\nnonlinear discriminators our approach provides an alternative to the commonly\\nused GAN training algorithm. | \n", + "[arxiv.Result.Author('Yujia Li'), arxiv.Result.Author('Alexander Schwing'), arxiv.Result.Author('Kuan-Chieh Wang'), arxiv.Result.Author('Richard Zemel')] | \n", + "http://arxiv.org/abs/1706.06216v1 | \n", + "40.449284 | \n", + "
2 | \n", + "Domain Generalization for Object Recognition with Multi-task Autoencoders | \n", + "The problem of domain generalization is to take knowledge acquired from a\\nnumber of related domains where training data is available, and to then\\nsuccessfully apply it to previously unseen domains. We propose a new feature\\nlearning algorithm, Multi-Task Autoencoder (MTAE), that provides good\\ngeneralization performance for cross-domain object recognition.\\n Our algorithm extends the standard denoising autoencoder framework by\\nsubstituting artificially induced corruption with naturally occurring\\ninter-domain variability in the appearance of objects. Instead of\\nreconstructing images from noisy versions, MTAE learns to transform the\\noriginal image into analogs in multiple related domains. It thereby learns\\nfeatures that are robust to variations across domains. The learnt features are\\nthen used as inputs to a classifier.\\n We evaluated the performance of the algorithm on benchmark image recognition\\ndatasets, where the task is to learn features from multiple datasets and to\\nthen predict the image label from unseen datasets. We found that (denoising)\\nMTAE outperforms alternative autoencoder-based models as well as the current\\nstate-of-the-art algorithms for domain generalization. | \n", + "[arxiv.Result.Author('Muhammad Ghifary'), arxiv.Result.Author('W. Bastiaan Kleijn'), arxiv.Result.Author('Mengjie Zhang'), arxiv.Result.Author('David Balduzzi')] | \n", + "http://arxiv.org/abs/1508.07680v1 | \n", + "41.127644 | \n", + "
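The `_distance` column is what LanceDB appends to vector-search results, and the `arxiv.Result.Author` reprs in the raw `authors` column point to the `arxiv` Python package, so results like these were most likely produced by an embedding search over a LanceDB table of arXiv papers. Below is a minimal sketch of such a query; the database path, table name, and embedding model are illustrative assumptions, not taken from the output above:

```python
import lancedb
from sentence_transformers import SentenceTransformer

# Assumed names: the database path, table name, and embedding model are
# placeholders for illustration only.
model = SentenceTransformer("all-MiniLM-L6-v2")
db = lancedb.connect("./lancedb")
papers = db.open_table("papers")

# Embed the query with the same model used to index the abstracts, then run
# a nearest-neighbor search. LanceDB appends a `_distance` column to the
# returned DataFrame (smaller distance = closer match).
query_vec = model.encode("multimodal audio-visual deep learning")
hits = papers.search(query_vec).limit(3).to_pandas()
print(hits[["title", "url", "_distance"]])
```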
\n", - " | title | \n", - "summary | \n", - "authors | \n", - "url | \n", - "_distance | \n", - "
---|---|---|---|---|---|
0 | \n", - "A General Theory for Training Learning Machine | \n", - "Though the deep learning is pushing the machine learning to a new stage,\\nbasic theories of machine learning are still limited. The principle of\\nlearning, the role of the a prior knowledge, the role of neuron bias, and the\\nbasis for choosing neural transfer function and cost function, etc., are still\\nfar from clear. In this paper, we present a general theoretical framework for\\nmachine learning. We classify the prior knowledge into common and\\nproblem-dependent parts, and consider that the aim of learning is to maximally\\nincorporate them. The principle we suggested for maximizing the former is the\\ndesign risk minimization principle, while the neural transfer function, the\\ncost function, as well as pretreatment of samples, are endowed with the role\\nfor maximizing the latter. The role of the neuron bias is explained from a\\ndifferent angle. We develop a Monte Carlo algorithm to establish the\\ninput-output responses, and we control the input-output sensitivity of a\\nlearning machine by controlling that of individual neurons. Applications of\\nfunction approaching and smoothing, pattern recognition and classification, are\\nprovided to illustrate how to train general learning machines based on our\\ntheory and algorithm. Our method may in addition induce new applications, such\\nas the transductive inference. | \n", - "[arxiv.Result.Author('Hong Zhao')] | \n", - "http://arxiv.org/abs/1704.06885v1 | \n", - "33.708359 | \n", - "
1 | \n", - "Learning Visual Reasoning Without Strong Priors | \n", - "Achieving artificial visual reasoning - the ability to answer image-related\\nquestions which require a multi-step, high-level process - is an important step\\ntowards artificial general intelligence. This multi-modal task requires\\nlearning a question-dependent, structured reasoning process over images from\\nlanguage. Standard deep learning approaches tend to exploit biases in the data\\nrather than learn this underlying structure, while leading methods learn to\\nvisually reason successfully but are hand-crafted for reasoning. We show that a\\ngeneral-purpose, Conditional Batch Normalization approach achieves\\nstate-of-the-art results on the CLEVR Visual Reasoning benchmark with a 2.4%\\nerror rate. We outperform the next best end-to-end method (4.5%) and even\\nmethods that use extra supervision (3.1%). We probe our model to shed light on\\nhow it reasons, showing it has learned a question-dependent, multi-step\\nprocess. Previous work has operated under the assumption that visual reasoning\\ncalls for a specialized architecture, but we show that a general architecture\\nwith proper conditioning can learn to visually reason effectively. | \n", - "[arxiv.Result.Author('Ethan Perez'), arxiv.Result.Author('Harm de Vries'), arxiv.Result.Author('Florian Strub'), arxiv.Result.Author('Vincent Dumoulin'), arxiv.Result.Author('Aaron Courville')] | \n", - "http://arxiv.org/abs/1707.03017v5 | \n", - "36.282284 | \n", - "
2 | \n", - "Encoder Based Lifelong Learning | \n", - "This paper introduces a new lifelong learning solution where a single model\\nis trained for a sequence of tasks. The main challenge that vision systems face\\nin this context is catastrophic forgetting: as they tend to adapt to the most\\nrecently seen task, they lose performance on the tasks that were learned\\npreviously. Our method aims at preserving the knowledge of the previous tasks\\nwhile learning a new one by using autoencoders. For each task, an\\nunder-complete autoencoder is learned, capturing the features that are crucial\\nfor its achievement. When a new task is presented to the system, we prevent the\\nreconstructions of the features with these autoencoders from changing, which\\nhas the effect of preserving the information on which the previous tasks are\\nmainly relying. At the same time, the features are given space to adjust to the\\nmost recent environment as only their projection into a low dimension\\nsubmanifold is controlled. The proposed system is evaluated on image\\nclassification tasks and shows a reduction of forgetting over the\\nstate-of-the-art | \n", - "[arxiv.Result.Author('Amal Rannen Triki'), arxiv.Result.Author('Rahaf Aljundi'), arxiv.Result.Author('Mathew B. Blaschko'), arxiv.Result.Author('Tinne Tuytelaars')] | \n", - "http://arxiv.org/abs/1704.01920v1 | \n", - "37.254250 | \n", - "
\n", + " | title | \n", + "summary | \n", + "authors | \n", + "url | \n", + "_distance | \n", + "
---|---|---|---|---|---|
0 | \n", + "A General Theory for Training Learning Machine | \n", + "Though the deep learning is pushing the machine learning to a new stage,\\nbasic theories of machine learning are still limited. The principle of\\nlearning, the role of the a prior knowledge, the role of neuron bias, and the\\nbasis for choosing neural transfer function and cost function, etc., are still\\nfar from clear. In this paper, we present a general theoretical framework for\\nmachine learning. We classify the prior knowledge into common and\\nproblem-dependent parts, and consider that the aim of learning is to maximally\\nincorporate them. The principle we suggested for maximizing the former is the\\ndesign risk minimization principle, while the neural transfer function, the\\ncost function, as well as pretreatment of samples, are endowed with the role\\nfor maximizing the latter. The role of the neuron bias is explained from a\\ndifferent angle. We develop a Monte Carlo algorithm to establish the\\ninput-output responses, and we control the input-output sensitivity of a\\nlearning machine by controlling that of individual neurons. Applications of\\nfunction approaching and smoothing, pattern recognition and classification, are\\nprovided to illustrate how to train general learning machines based on our\\ntheory and algorithm. Our method may in addition induce new applications, such\\nas the transductive inference. | \n", + "[arxiv.Result.Author('Hong Zhao')] | \n", + "http://arxiv.org/abs/1704.06885v1 | \n", + "33.708359 | \n", + "
1 | \n", + "Learning Visual Reasoning Without Strong Priors | \n", + "Achieving artificial visual reasoning - the ability to answer image-related\\nquestions which require a multi-step, high-level process - is an important step\\ntowards artificial general intelligence. This multi-modal task requires\\nlearning a question-dependent, structured reasoning process over images from\\nlanguage. Standard deep learning approaches tend to exploit biases in the data\\nrather than learn this underlying structure, while leading methods learn to\\nvisually reason successfully but are hand-crafted for reasoning. We show that a\\ngeneral-purpose, Conditional Batch Normalization approach achieves\\nstate-of-the-art results on the CLEVR Visual Reasoning benchmark with a 2.4%\\nerror rate. We outperform the next best end-to-end method (4.5%) and even\\nmethods that use extra supervision (3.1%). We probe our model to shed light on\\nhow it reasons, showing it has learned a question-dependent, multi-step\\nprocess. Previous work has operated under the assumption that visual reasoning\\ncalls for a specialized architecture, but we show that a general architecture\\nwith proper conditioning can learn to visually reason effectively. | \n", + "[arxiv.Result.Author('Ethan Perez'), arxiv.Result.Author('Harm de Vries'), arxiv.Result.Author('Florian Strub'), arxiv.Result.Author('Vincent Dumoulin'), arxiv.Result.Author('Aaron Courville')] | \n", + "http://arxiv.org/abs/1707.03017v5 | \n", + "36.282284 | \n", + "
2 | \n", + "Encoder Based Lifelong Learning | \n", + "This paper introduces a new lifelong learning solution where a single model\\nis trained for a sequence of tasks. The main challenge that vision systems face\\nin this context is catastrophic forgetting: as they tend to adapt to the most\\nrecently seen task, they lose performance on the tasks that were learned\\npreviously. Our method aims at preserving the knowledge of the previous tasks\\nwhile learning a new one by using autoencoders. For each task, an\\nunder-complete autoencoder is learned, capturing the features that are crucial\\nfor its achievement. When a new task is presented to the system, we prevent the\\nreconstructions of the features with these autoencoders from changing, which\\nhas the effect of preserving the information on which the previous tasks are\\nmainly relying. At the same time, the features are given space to adjust to the\\nmost recent environment as only their projection into a low dimension\\nsubmanifold is controlled. The proposed system is evaluated on image\\nclassification tasks and shows a reduction of forgetting over the\\nstate-of-the-art | \n", + "[arxiv.Result.Author('Amal Rannen Triki'), arxiv.Result.Author('Rahaf Aljundi'), arxiv.Result.Author('Mathew B. Blaschko'), arxiv.Result.Author('Tinne Tuytelaars')] | \n", + "http://arxiv.org/abs/1704.01920v1 | \n", + "37.254250 | \n", + "
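For completeness, here is how such a table could have been populated in the first place. The `arxiv.Result.Author(...)` reprs in the raw output identify the `arxiv` package as the metadata source; everything else (the search query, result count, embedding model, and table name) is again an assumption for the sketch:

```python
import arxiv
import lancedb
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model

# Fetch paper metadata with the `arxiv` package; the query string and
# max_results are placeholders.
client = arxiv.Client()
search = arxiv.Search(query="cat:cs.LG", max_results=100)
rows = [
    {
        "title": r.title,
        "summary": r.summary,
        "authors": [str(a) for a in r.authors],
        "url": r.entry_id,
        "vector": model.encode(r.summary).tolist(),  # embed the abstract
    }
    for r in client.results(search)
]

# Write the rows to a LanceDB table; queries like the ones above then run
# against the `vector` column.
db = lancedb.connect("./lancedb")
db.create_table("papers", data=rows, mode="overwrite")
```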
\n", + " | title | \n", + "summary | \n", + "authors | \n", + "url | \n", + "score | \n", + "
---|---|---|---|---|---|
0 | \n", + "Expert Gate: Lifelong Learning with a Network of Experts | \n", + "In this paper we introduce a model of lifelong learning, based on a Network\\nof Experts. New tasks / experts are learned and added to the model\\nsequentially, building on what was learned before. To ensure scalability of\\nthis process,data from previous tasks cannot be stored and hence is not\\navailable when learning a new task. A critical issue in such context, not\\naddressed in the literature so far, relates to the decision which expert to\\ndeploy at test time. We introduce a set of gating autoencoders that learn a\\nrepresentation for the task at hand, and, at test time, automatically forward\\nthe test sample to the relevant expert. This also brings memory efficiency as\\nonly one expert network has to be loaded into memory at any given time.\\nFurther, the autoencoders inherently capture the relatedness of one task to\\nanother, based on which the most relevant prior model to be used for training a\\nnew expert, with finetuning or learning without-forgetting, can be selected. We\\nevaluate our method on image classification and video prediction problems. | \n", + "[arxiv.Result.Author('Rahaf Aljundi'), arxiv.Result.Author('Punarjay Chakravarty'), arxiv.Result.Author('Tinne Tuytelaars')] | \n", + "http://arxiv.org/abs/1611.06194v2 | \n", + "4.703215 | \n", + "
1 | \n", + "Approximate Bayesian Image Interpretation using Generative Probabilistic Graphics Programs | \n", + "The idea of computer vision as the Bayesian inverse problem to computer\\ngraphics has a long history and an appealing elegance, but it has proved\\ndifficult to directly implement. Instead, most vision tasks are approached via\\ncomplex bottom-up processing pipelines. Here we show that it is possible to\\nwrite short, simple probabilistic graphics programs that define flexible\\ngenerative models and to automatically invert them to interpret real-world\\nimages. Generative probabilistic graphics programs consist of a stochastic\\nscene generator, a renderer based on graphics software, a stochastic likelihood\\nmodel linking the renderer's output and the data, and latent variables that\\nadjust the fidelity of the renderer and the tolerance of the likelihood model.\\nRepresentations and algorithms from computer graphics, originally designed to\\nproduce high-quality images, are instead used as the deterministic backbone for\\nhighly approximate and stochastic generative models. This formulation combines\\nprobabilistic programming, computer graphics, and approximate Bayesian\\ncomputation, and depends only on general-purpose, automatic inference\\ntechniques. We describe two applications: reading sequences of degraded and\\nadversarially obscured alphanumeric characters, and inferring 3D road models\\nfrom vehicle-mounted camera images. Each of the probabilistic graphics programs\\nwe present relies on under 20 lines of probabilistic code, and supports\\naccurate, approximately Bayesian inferences about ambiguous real-world images. | \n", + "[arxiv.Result.Author('Vikash K. Mansinghka'), arxiv.Result.Author('Tejas D. Kulkarni'), arxiv.Result.Author('Yura N. Perov'), arxiv.Result.Author('Joshua B. Tenenbaum')] | \n", + "http://arxiv.org/abs/1307.0060v1 | \n", + "4.515473 | \n", + "
2 | \n", + "Learning Visual Reasoning Without Strong Priors | \n", + "Achieving artificial visual reasoning - the ability to answer image-related\\nquestions which require a multi-step, high-level process - is an important step\\ntowards artificial general intelligence. This multi-modal task requires\\nlearning a question-dependent, structured reasoning process over images from\\nlanguage. Standard deep learning approaches tend to exploit biases in the data\\nrather than learn this underlying structure, while leading methods learn to\\nvisually reason successfully but are hand-crafted for reasoning. We show that a\\ngeneral-purpose, Conditional Batch Normalization approach achieves\\nstate-of-the-art results on the CLEVR Visual Reasoning benchmark with a 2.4%\\nerror rate. We outperform the next best end-to-end method (4.5%) and even\\nmethods that use extra supervision (3.1%). We probe our model to shed light on\\nhow it reasons, showing it has learned a question-dependent, multi-step\\nprocess. Previous work has operated under the assumption that visual reasoning\\ncalls for a specialized architecture, but we show that a general architecture\\nwith proper conditioning can learn to visually reason effectively. | \n", + "[arxiv.Result.Author('Ethan Perez'), arxiv.Result.Author('Harm de Vries'), arxiv.Result.Author('Florian Strub'), arxiv.Result.Author('Vincent Dumoulin'), arxiv.Result.Author('Aaron Courville')] | \n", + "http://arxiv.org/abs/1707.03017v5 | \n", + "4.332870 | \n", + "
3 | \n", + "Memory Aware Synapses: Learning what (not) to forget | \n", + "Humans can learn in a continuous manner. Old rarely utilized knowledge can be\\noverwritten by new incoming information while important, frequently used\\nknowledge is prevented from being erased. In artificial learning systems,\\nlifelong learning so far has focused mainly on accumulating knowledge over\\ntasks and overcoming catastrophic forgetting. In this paper, we argue that,\\ngiven the limited model capacity and the unlimited new information to be\\nlearned, knowledge has to be preserved or erased selectively. Inspired by\\nneuroplasticity, we propose a novel approach for lifelong learning, coined\\nMemory Aware Synapses (MAS). It computes the importance of the parameters of a\\nneural network in an unsupervised and online manner. Given a new sample which\\nis fed to the network, MAS accumulates an importance measure for each parameter\\nof the network, based on how sensitive the predicted output function is to a\\nchange in this parameter. When learning a new task, changes to important\\nparameters can then be penalized, effectively preventing important knowledge\\nrelated to previous tasks from being overwritten. Further, we show an\\ninteresting connection between a local version of our method and Hebb's\\nrule,which is a model for the learning process in the brain. We test our method\\non a sequence of object recognition tasks and on the challenging problem of\\nlearning an embedding for predicting $<$subject, predicate, object$>$ triplets.\\nWe show state-of-the-art performance and, for the first time, the ability to\\nadapt the importance of the parameters based on unlabeled data towards what the\\nnetwork needs (not) to forget, which may vary depending on test conditions. | \n", + "[arxiv.Result.Author('Rahaf Aljundi'), arxiv.Result.Author('Francesca Babiloni'), arxiv.Result.Author('Mohamed Elhoseiny'), arxiv.Result.Author('Marcus Rohrbach'), arxiv.Result.Author('Tinne Tuytelaars')] | \n", + "http://arxiv.org/abs/1711.09601v4 | \n", + "4.307245 | \n", + "
4 | \n", + "Explaining Aviation Safety Incidents Using Deep Temporal Multiple Instance Learning | \n", + "Although aviation accidents are rare, safety incidents occur more frequently\\nand require a careful analysis to detect and mitigate risks in a timely manner.\\nAnalyzing safety incidents using operational data and producing event-based\\nexplanations is invaluable to airline companies as well as to governing\\norganizations such as the Federal Aviation Administration (FAA) in the United\\nStates. However, this task is challenging because of the complexity involved in\\nmining multi-dimensional heterogeneous time series data, the lack of\\ntime-step-wise annotation of events in a flight, and the lack of scalable tools\\nto perform analysis over a large number of events. In this work, we propose a\\nprecursor mining algorithm that identifies events in the multidimensional time\\nseries that are correlated with the safety incident. Precursors are valuable to\\nsystems health and safety monitoring and in explaining and forecasting safety\\nincidents. Current methods suffer from poor scalability to high dimensional\\ntime series data and are inefficient in capturing temporal behavior. We propose\\nan approach by combining multiple-instance learning (MIL) and deep recurrent\\nneural networks (DRNN) to take advantage of MIL's ability to learn using weakly\\nsupervised data and DRNN's ability to model temporal behavior. We describe the\\nalgorithm, the data, the intuition behind taking a MIL approach, and a\\ncomparative analysis of the proposed algorithm with baseline models. We also\\ndiscuss the application to a real-world aviation safety problem using data from\\na commercial airline company and discuss the model's abilities and\\nshortcomings, with some final remarks about possible deployment directions. | \n", + "[arxiv.Result.Author('Vijay Manikandan Janakiraman')] | \n", + "http://arxiv.org/abs/1710.04749v2 | \n", + "4.206257 | \n", + "
5 | \n", + "A General Theory for Training Learning Machine | \n", + "Though the deep learning is pushing the machine learning to a new stage,\\nbasic theories of machine learning are still limited. The principle of\\nlearning, the role of the a prior knowledge, the role of neuron bias, and the\\nbasis for choosing neural transfer function and cost function, etc., are still\\nfar from clear. In this paper, we present a general theoretical framework for\\nmachine learning. We classify the prior knowledge into common and\\nproblem-dependent parts, and consider that the aim of learning is to maximally\\nincorporate them. The principle we suggested for maximizing the former is the\\ndesign risk minimization principle, while the neural transfer function, the\\ncost function, as well as pretreatment of samples, are endowed with the role\\nfor maximizing the latter. The role of the neuron bias is explained from a\\ndifferent angle. We develop a Monte Carlo algorithm to establish the\\ninput-output responses, and we control the input-output sensitivity of a\\nlearning machine by controlling that of individual neurons. Applications of\\nfunction approaching and smoothing, pattern recognition and classification, are\\nprovided to illustrate how to train general learning machines based on our\\ntheory and algorithm. Our method may in addition induce new applications, such\\nas the transductive inference. | \n", + "[arxiv.Result.Author('Hong Zhao')] | \n", + "http://arxiv.org/abs/1704.06885v1 | \n", + "4.150894 | \n", + "
6 | \n", + "A Brief Survey of Deep Reinforcement Learning | \n", + "Deep reinforcement learning is poised to revolutionise the field of AI and\\nrepresents a step towards building autonomous systems with a higher level\\nunderstanding of the visual world. Currently, deep learning is enabling\\nreinforcement learning to scale to problems that were previously intractable,\\nsuch as learning to play video games directly from pixels. Deep reinforcement\\nlearning algorithms are also applied to robotics, allowing control policies for\\nrobots to be learned directly from camera inputs in the real world. In this\\nsurvey, we begin with an introduction to the general field of reinforcement\\nlearning, then progress to the main streams of value-based and policy-based\\nmethods. Our survey will cover central algorithms in deep reinforcement\\nlearning, including the deep $Q$-network, trust region policy optimisation, and\\nasynchronous advantage actor-critic. In parallel, we highlight the unique\\nadvantages of deep neural networks, focusing on visual understanding via\\nreinforcement learning. To conclude, we describe several current areas of\\nresearch within the field. | \n", + "[arxiv.Result.Author('Kai Arulkumaran'), arxiv.Result.Author('Marc Peter Deisenroth'), arxiv.Result.Author('Miles Brundage'), arxiv.Result.Author('Anil Anthony Bharath')] | \n", + "http://arxiv.org/abs/1708.05866v2 | \n", + "3.549962 | \n", + "
7 | \n", + "Interpretable Explanations of Black Boxes by Meaningful Perturbation | \n", + "As machine learning algorithms are increasingly applied to high impact yet\\nhigh risk tasks, such as medical diagnosis or autonomous driving, it is\\ncritical that researchers can explain how such algorithms arrived at their\\npredictions. In recent years, a number of image saliency methods have been\\ndeveloped to summarize where highly complex neural networks \"look\" in an image\\nfor evidence for their predictions. However, these techniques are limited by\\ntheir heuristic nature and architectural constraints. In this paper, we make\\ntwo main contributions: First, we propose a general framework for learning\\ndifferent kinds of explanations for any black box algorithm. Second, we\\nspecialise the framework to find the part of an image most responsible for a\\nclassifier decision. Unlike previous works, our method is model-agnostic and\\ntestable because it is grounded in explicit and interpretable image\\nperturbations. | \n", + "[arxiv.Result.Author('Ruth Fong'), arxiv.Result.Author('Andrea Vedaldi')] | \n", + "http://arxiv.org/abs/1704.03296v4 | \n", + "3.451381 | \n", + "
8 | \n", + "Self corrective Perturbations for Semantic Segmentation and Classification | \n", + "Convolutional Neural Networks have been a subject of great importance over\\nthe past decade and great strides have been made in their utility for producing\\nstate of the art performance in many computer vision problems. However, the\\nbehavior of deep networks is yet to be fully understood and is still an active\\narea of research. In this work, we present an intriguing behavior: pre-trained\\nCNNs can be made to improve their predictions by structurally perturbing the\\ninput. We observe that these perturbations - referred as Guided Perturbations -\\nenable a trained network to improve its prediction performance without any\\nlearning or change in network weights. We perform various ablative experiments\\nto understand how these perturbations affect the local context and feature\\nrepresentations. Furthermore, we demonstrate that this idea can improve\\nperformance of several existing approaches on semantic segmentation and scene\\nlabeling tasks on the PASCAL VOC dataset and supervised classification tasks on\\nMNIST and CIFAR10 datasets. | \n", + "[arxiv.Result.Author('Swami Sankaranarayanan'), arxiv.Result.Author('Arpit Jain'), arxiv.Result.Author('Ser Nam Lim')] | \n", + "http://arxiv.org/abs/1703.07928v2 | \n", + "3.417501 | \n", + "
9 | \n", + "Graph Approximation and Clustering on a Budget | \n", + "We consider the problem of learning from a similarity matrix (such as\\nspectral clustering and lowd imensional embedding), when computing pairwise\\nsimilarities are costly, and only a limited number of entries can be observed.\\nWe provide a theoretical analysis using standard notions of graph\\napproximation, significantly generalizing previous results (which focused on\\nspectral clustering with two clusters). We also propose a new algorithmic\\napproach based on adaptive sampling, which experimentally matches or improves\\non previous methods, while being considerably more general and computationally\\ncheaper. | \n", + "[arxiv.Result.Author('Ethan Fetaya'), arxiv.Result.Author('Ohad Shamir'), arxiv.Result.Author('Shimon Ullman')] | \n", + "http://arxiv.org/abs/1406.2602v1 | \n", + "3.358721 | \n", + "
\n", - " | title | \n", - "summary | \n", - "authors | \n", - "url | \n", - "score | \n", - "
---|---|---|---|---|---|
0 | \n", - "Expert Gate: Lifelong Learning with a Network of Experts | \n", - "In this paper we introduce a model of lifelong learning, based on a Network\\nof Experts. New tasks / experts are learned and added to the model\\nsequentially, building on what was learned before. To ensure scalability of\\nthis process,data from previous tasks cannot be stored and hence is not\\navailable when learning a new task. A critical issue in such context, not\\naddressed in the literature so far, relates to the decision which expert to\\ndeploy at test time. We introduce a set of gating autoencoders that learn a\\nrepresentation for the task at hand, and, at test time, automatically forward\\nthe test sample to the relevant expert. This also brings memory efficiency as\\nonly one expert network has to be loaded into memory at any given time.\\nFurther, the autoencoders inherently capture the relatedness of one task to\\nanother, based on which the most relevant prior model to be used for training a\\nnew expert, with finetuning or learning without-forgetting, can be selected. We\\nevaluate our method on image classification and video prediction problems. | \n", - "[arxiv.Result.Author('Rahaf Aljundi'), arxiv.Result.Author('Punarjay Chakravarty'), arxiv.Result.Author('Tinne Tuytelaars')] | \n", - "http://arxiv.org/abs/1611.06194v2 | \n", - "4.703215 | \n", - "
1 | \n", - "Approximate Bayesian Image Interpretation using Generative Probabilistic Graphics Programs | \n", - "The idea of computer vision as the Bayesian inverse problem to computer\\ngraphics has a long history and an appealing elegance, but it has proved\\ndifficult to directly implement. Instead, most vision tasks are approached via\\ncomplex bottom-up processing pipelines. Here we show that it is possible to\\nwrite short, simple probabilistic graphics programs that define flexible\\ngenerative models and to automatically invert them to interpret real-world\\nimages. Generative probabilistic graphics programs consist of a stochastic\\nscene generator, a renderer based on graphics software, a stochastic likelihood\\nmodel linking the renderer's output and the data, and latent variables that\\nadjust the fidelity of the renderer and the tolerance of the likelihood model.\\nRepresentations and algorithms from computer graphics, originally designed to\\nproduce high-quality images, are instead used as the deterministic backbone for\\nhighly approximate and stochastic generative models. This formulation combines\\nprobabilistic programming, computer graphics, and approximate Bayesian\\ncomputation, and depends only on general-purpose, automatic inference\\ntechniques. We describe two applications: reading sequences of degraded and\\nadversarially obscured alphanumeric characters, and inferring 3D road models\\nfrom vehicle-mounted camera images. Each of the probabilistic graphics programs\\nwe present relies on under 20 lines of probabilistic code, and supports\\naccurate, approximately Bayesian inferences about ambiguous real-world images. | \n", - "[arxiv.Result.Author('Vikash K. Mansinghka'), arxiv.Result.Author('Tejas D. Kulkarni'), arxiv.Result.Author('Yura N. Perov'), arxiv.Result.Author('Joshua B. Tenenbaum')] | \n", - "http://arxiv.org/abs/1307.0060v1 | \n", - "4.515473 | \n", - "
2 | \n", - "Learning Visual Reasoning Without Strong Priors | \n", - "Achieving artificial visual reasoning - the ability to answer image-related\\nquestions which require a multi-step, high-level process - is an important step\\ntowards artificial general intelligence. This multi-modal task requires\\nlearning a question-dependent, structured reasoning process over images from\\nlanguage. Standard deep learning approaches tend to exploit biases in the data\\nrather than learn this underlying structure, while leading methods learn to\\nvisually reason successfully but are hand-crafted for reasoning. We show that a\\ngeneral-purpose, Conditional Batch Normalization approach achieves\\nstate-of-the-art results on the CLEVR Visual Reasoning benchmark with a 2.4%\\nerror rate. We outperform the next best end-to-end method (4.5%) and even\\nmethods that use extra supervision (3.1%). We probe our model to shed light on\\nhow it reasons, showing it has learned a question-dependent, multi-step\\nprocess. Previous work has operated under the assumption that visual reasoning\\ncalls for a specialized architecture, but we show that a general architecture\\nwith proper conditioning can learn to visually reason effectively. | \n", - "[arxiv.Result.Author('Ethan Perez'), arxiv.Result.Author('Harm de Vries'), arxiv.Result.Author('Florian Strub'), arxiv.Result.Author('Vincent Dumoulin'), arxiv.Result.Author('Aaron Courville')] | \n", - "http://arxiv.org/abs/1707.03017v5 | \n", - "4.332870 | \n", - "
3 | \n", - "Memory Aware Synapses: Learning what (not) to forget | \n", - "Humans can learn in a continuous manner. Old rarely utilized knowledge can be\\noverwritten by new incoming information while important, frequently used\\nknowledge is prevented from being erased. In artificial learning systems,\\nlifelong learning so far has focused mainly on accumulating knowledge over\\ntasks and overcoming catastrophic forgetting. In this paper, we argue that,\\ngiven the limited model capacity and the unlimited new information to be\\nlearned, knowledge has to be preserved or erased selectively. Inspired by\\nneuroplasticity, we propose a novel approach for lifelong learning, coined\\nMemory Aware Synapses (MAS). It computes the importance of the parameters of a\\nneural network in an unsupervised and online manner. Given a new sample which\\nis fed to the network, MAS accumulates an importance measure for each parameter\\nof the network, based on how sensitive the predicted output function is to a\\nchange in this parameter. When learning a new task, changes to important\\nparameters can then be penalized, effectively preventing important knowledge\\nrelated to previous tasks from being overwritten. Further, we show an\\ninteresting connection between a local version of our method and Hebb's\\nrule,which is a model for the learning process in the brain. We test our method\\non a sequence of object recognition tasks and on the challenging problem of\\nlearning an embedding for predicting $<$subject, predicate, object$>$ triplets.\\nWe show state-of-the-art performance and, for the first time, the ability to\\nadapt the importance of the parameters based on unlabeled data towards what the\\nnetwork needs (not) to forget, which may vary depending on test conditions. | \n", - "[arxiv.Result.Author('Rahaf Aljundi'), arxiv.Result.Author('Francesca Babiloni'), arxiv.Result.Author('Mohamed Elhoseiny'), arxiv.Result.Author('Marcus Rohrbach'), arxiv.Result.Author('Tinne Tuytelaars')] | \n", - "http://arxiv.org/abs/1711.09601v4 | \n", - "4.307245 | \n", - "
4 | \n", - "Explaining Aviation Safety Incidents Using Deep Temporal Multiple Instance Learning | \n", - "Although aviation accidents are rare, safety incidents occur more frequently\\nand require a careful analysis to detect and mitigate risks in a timely manner.\\nAnalyzing safety incidents using operational data and producing event-based\\nexplanations is invaluable to airline companies as well as to governing\\norganizations such as the Federal Aviation Administration (FAA) in the United\\nStates. However, this task is challenging because of the complexity involved in\\nmining multi-dimensional heterogeneous time series data, the lack of\\ntime-step-wise annotation of events in a flight, and the lack of scalable tools\\nto perform analysis over a large number of events. In this work, we propose a\\nprecursor mining algorithm that identifies events in the multidimensional time\\nseries that are correlated with the safety incident. Precursors are valuable to\\nsystems health and safety monitoring and in explaining and forecasting safety\\nincidents. Current methods suffer from poor scalability to high dimensional\\ntime series data and are inefficient in capturing temporal behavior. We propose\\nan approach by combining multiple-instance learning (MIL) and deep recurrent\\nneural networks (DRNN) to take advantage of MIL's ability to learn using weakly\\nsupervised data and DRNN's ability to model temporal behavior. We describe the\\nalgorithm, the data, the intuition behind taking a MIL approach, and a\\ncomparative analysis of the proposed algorithm with baseline models. We also\\ndiscuss the application to a real-world aviation safety problem using data from\\na commercial airline company and discuss the model's abilities and\\nshortcomings, with some final remarks about possible deployment directions. | \n", - "[arxiv.Result.Author('Vijay Manikandan Janakiraman')] | \n", - "http://arxiv.org/abs/1710.04749v2 | \n", - "4.206257 | \n", - "
5 | \n", - "A General Theory for Training Learning Machine | \n", - "Though the deep learning is pushing the machine learning to a new stage,\\nbasic theories of machine learning are still limited. The principle of\\nlearning, the role of the a prior knowledge, the role of neuron bias, and the\\nbasis for choosing neural transfer function and cost function, etc., are still\\nfar from clear. In this paper, we present a general theoretical framework for\\nmachine learning. We classify the prior knowledge into common and\\nproblem-dependent parts, and consider that the aim of learning is to maximally\\nincorporate them. The principle we suggested for maximizing the former is the\\ndesign risk minimization principle, while the neural transfer function, the\\ncost function, as well as pretreatment of samples, are endowed with the role\\nfor maximizing the latter. The role of the neuron bias is explained from a\\ndifferent angle. We develop a Monte Carlo algorithm to establish the\\ninput-output responses, and we control the input-output sensitivity of a\\nlearning machine by controlling that of individual neurons. Applications of\\nfunction approaching and smoothing, pattern recognition and classification, are\\nprovided to illustrate how to train general learning machines based on our\\ntheory and algorithm. Our method may in addition induce new applications, such\\nas the transductive inference. | \n", - "[arxiv.Result.Author('Hong Zhao')] | \n", - "http://arxiv.org/abs/1704.06885v1 | \n", - "4.150894 | \n", - "
6 | \n", - "A Brief Survey of Deep Reinforcement Learning | \n", - "Deep reinforcement learning is poised to revolutionise the field of AI and\\nrepresents a step towards building autonomous systems with a higher level\\nunderstanding of the visual world. Currently, deep learning is enabling\\nreinforcement learning to scale to problems that were previously intractable,\\nsuch as learning to play video games directly from pixels. Deep reinforcement\\nlearning algorithms are also applied to robotics, allowing control policies for\\nrobots to be learned directly from camera inputs in the real world. In this\\nsurvey, we begin with an introduction to the general field of reinforcement\\nlearning, then progress to the main streams of value-based and policy-based\\nmethods. Our survey will cover central algorithms in deep reinforcement\\nlearning, including the deep $Q$-network, trust region policy optimisation, and\\nasynchronous advantage actor-critic. In parallel, we highlight the unique\\nadvantages of deep neural networks, focusing on visual understanding via\\nreinforcement learning. To conclude, we describe several current areas of\\nresearch within the field. | \n", - "[arxiv.Result.Author('Kai Arulkumaran'), arxiv.Result.Author('Marc Peter Deisenroth'), arxiv.Result.Author('Miles Brundage'), arxiv.Result.Author('Anil Anthony Bharath')] | \n", - "http://arxiv.org/abs/1708.05866v2 | \n", - "3.549962 | \n", - "
7 | \n", - "Interpretable Explanations of Black Boxes by Meaningful Perturbation | \n", - "As machine learning algorithms are increasingly applied to high impact yet\\nhigh risk tasks, such as medical diagnosis or autonomous driving, it is\\ncritical that researchers can explain how such algorithms arrived at their\\npredictions. In recent years, a number of image saliency methods have been\\ndeveloped to summarize where highly complex neural networks \"look\" in an image\\nfor evidence for their predictions. However, these techniques are limited by\\ntheir heuristic nature and architectural constraints. In this paper, we make\\ntwo main contributions: First, we propose a general framework for learning\\ndifferent kinds of explanations for any black box algorithm. Second, we\\nspecialise the framework to find the part of an image most responsible for a\\nclassifier decision. Unlike previous works, our method is model-agnostic and\\ntestable because it is grounded in explicit and interpretable image\\nperturbations. | \n", - "[arxiv.Result.Author('Ruth Fong'), arxiv.Result.Author('Andrea Vedaldi')] | \n", - "http://arxiv.org/abs/1704.03296v4 | \n", - "3.451381 | \n", - "
8 | \n", - "Self corrective Perturbations for Semantic Segmentation and Classification | \n", - "Convolutional Neural Networks have been a subject of great importance over\\nthe past decade and great strides have been made in their utility for producing\\nstate of the art performance in many computer vision problems. However, the\\nbehavior of deep networks is yet to be fully understood and is still an active\\narea of research. In this work, we present an intriguing behavior: pre-trained\\nCNNs can be made to improve their predictions by structurally perturbing the\\ninput. We observe that these perturbations - referred as Guided Perturbations -\\nenable a trained network to improve its prediction performance without any\\nlearning or change in network weights. We perform various ablative experiments\\nto understand how these perturbations affect the local context and feature\\nrepresentations. Furthermore, we demonstrate that this idea can improve\\nperformance of several existing approaches on semantic segmentation and scene\\nlabeling tasks on the PASCAL VOC dataset and supervised classification tasks on\\nMNIST and CIFAR10 datasets. | \n", - "[arxiv.Result.Author('Swami Sankaranarayanan'), arxiv.Result.Author('Arpit Jain'), arxiv.Result.Author('Ser Nam Lim')] | \n", - "http://arxiv.org/abs/1703.07928v2 | \n", - "3.417501 | \n", - "
9 | \n", - "Graph Approximation and Clustering on a Budget | \n", - "We consider the problem of learning from a similarity matrix (such as\\nspectral clustering and lowd imensional embedding), when computing pairwise\\nsimilarities are costly, and only a limited number of entries can be observed.\\nWe provide a theoretical analysis using standard notions of graph\\napproximation, significantly generalizing previous results (which focused on\\nspectral clustering with two clusters). We also propose a new algorithmic\\napproach based on adaptive sampling, which experimentally matches or improves\\non previous methods, while being considerably more general and computationally\\ncheaper. | \n", - "[arxiv.Result.Author('Ethan Fetaya'), arxiv.Result.Author('Ohad Shamir'), arxiv.Result.Author('Shimon Ullman')] | \n", - "http://arxiv.org/abs/1406.2602v1 | \n", - "3.358721 | \n", - "
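Unlike the first two result sets, this one carries a `score` column where higher is better, which is what LanceDB returns for full-text (BM25) search rather than vector search. A sketch of such a keyword query under the same assumed names as before; the query string is only a guess at what could surface these lifelong-learning papers:

```python
import lancedb

db = lancedb.connect("./lancedb")   # assumed database path
papers = db.open_table("papers")    # assumed table name

# One-time: build a full-text index over the abstract column.
papers.create_fts_index("summary")

# Keyword (BM25) search: with a string query and query_type="fts", LanceDB
# ranks rows by a `score` column (higher = more relevant) instead of the
# `_distance` column produced by vector search.
hits = papers.search("lifelong learning", query_type="fts").limit(10).to_pandas()
print(hits[["title", "url", "score"]])
```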