Audio and Speech Processing

Authors and titles for recent submissions

[ total of 26 entries: 1-25 | 26 ]

Thu, 27 Feb 2020

[1]  arXiv:2002.11356 [pdf, ps, other]
Title: BUT System for the Second DIHARD Speech Diarization Challenge
Subjects: Audio and Speech Processing (eess.AS)
[2]  arXiv:2002.11312 [pdf, other]
Title: Multitask Learning and Multistage Fusion for Dimensional Audiovisual Emotion Recognition
Comments: Submitted to ICASSP 2020
Subjects: Audio and Speech Processing (eess.AS)
[3]  arXiv:2002.11268 [pdf, other]
Title: A Density Ratio Approach to Language Model Fusion in End-To-End Automatic Speech Recognition
Comments: 8 pages, 4 figures, presented at 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU 2019)
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Sound (cs.SD)
[4]  arXiv:2002.11250 [pdf]
Title: Dataset of raw and pre-processed speech signals, Mel Frequency Cepstral Coefficients of Speech and Heart Rate measurements
Comments: conference
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP)
[5]  arXiv:2002.11241 [pdf, other]
Title: Lightweight Online Separation of the Sound Source of Interest through BLSTM-Based Binary Masking
Comments: Submitted to IEEE/ACM Transactions on Audio Speech and Language Processing
Subjects: Audio and Speech Processing (eess.AS)
[6]  arXiv:2002.11561 (cross-list from cs.SD) [pdf, ps, other]
Title: An Open-set Recognition and Few-Shot Learning Dataset for Audio Event Classification in Domestic Environments
Comments: To be submitted to Expert Systems with Applications
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[7]  arXiv:2002.11474 (cross-list from cs.SD) [pdf, other]
Title: RTMobile: Beyond Real-Time Mobile Acceleration of RNNs for Speech Recognition
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[8]  arXiv:2002.11188 (cross-list from cs.NI) [pdf, other]
Title: IoT Based Real Time Noise Mapping System for Urban Sound Pollution Study
Authors: Sakib Ahmed (1), Touseef Saleh Bin Ahmed (1), Sumaiya Jafreen (1), Jannatul Tajrin (1), Jia Uddin (1) ((1) BRAC University)
Comments: Appendix by Sakib Ahmed. Accepted as Conference Paper at ICIEV and icIVPR, 2018, Student Conference on Informatics, Electronics & Vision (SCIEV): Paper ID 175
Subjects: Networking and Internet Architecture (cs.NI); Information Retrieval (cs.IR); Audio and Speech Processing (eess.AS)

Wed, 26 Feb 2020

[9]  arXiv:2002.10988 [pdf, other]
Title: An LSTM Based Architecture to Relate Speech Stimulus to EEG
Comments: 3 figures, 6 pages
Subjects: Audio and Speech Processing (eess.AS)
[10]  arXiv:2002.10708 [pdf]
Title: Controllable Sequence-To-Sequence Neural TTS with LPCNET Backend for Real-time Speech Synthesis on CPU
Subjects: Audio and Speech Processing (eess.AS)

Tue, 25 Feb 2020

[11]  arXiv:2002.09821 [pdf, other]
Title: A Multi-view CNN-based Acoustic Classification System for Automatic Animal Species Identification
Journal-ref: Ad Hoc Networks 2020
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
[12]  arXiv:2002.09661 [pdf, other]
Title: Multi-Branch Learning for Weakly-Labeled Sound Event Detection
Comments: Accepted by ICASSP 2020
Subjects: Audio and Speech Processing (eess.AS)
[13]  arXiv:2002.10336 (cross-list from cs.CL) [pdf, other]
Title: Semi-Supervised Speech Recognition via Local Prior Matching
Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
[14]  arXiv:2002.09748 (cross-list from cs.SD) [pdf, other]
Title: DECIBEL: Improving Audio Chord Estimation for Popular Music by Alignment and Integration of Crowd-Sourced Symbolic Representations
Comments: 81 pages, 47 figures
Subjects: Sound (cs.SD); Multimedia (cs.MM); Audio and Speech Processing (eess.AS)
[15]  arXiv:2002.09607 (cross-list from cs.MM) [pdf, other]
Title: Multi-Representation Knowledge Distillation For Audio Classification
Subjects: Multimedia (cs.MM); Sound (cs.SD); Audio and Speech Processing (eess.AS)

Mon, 24 Feb 2020

[16]  arXiv:2002.09286 [pdf, other]
Title: Efficient Trainable Front-Ends for Neural Speech Enhancement
Comments: 5 pages, 5 figures, ICASSP 2020
Subjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Sound (cs.SD); Machine Learning (stat.ML)
[17]  arXiv:2002.09026 [pdf]
Title: Multi-label Sound Event Retrieval Using a Deep Learning-based Siamese Structure with a Pairwise Presence Matrix
Comments: Paper accepted for 45th International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2020)
Subjects: Audio and Speech Processing (eess.AS); Information Retrieval (cs.IR); Machine Learning (cs.LG); Sound (cs.SD)
[18]  arXiv:2002.09143 (cross-list from cs.LG) [pdf, other]
Title: Few-shot acoustic event detection via meta-learning
Comments: ICASSP 2020
Subjects: Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)
[19]  arXiv:2002.09021 (cross-list from cs.SD) [pdf]
Title: A Comparative Study of Western and Chinese Classical Music based on Soundscape Models
Comments: Paper accepted for 45th International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2020)
Subjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)

Fri, 21 Feb 2020 (showing first 6 of 7 entries)

[20]  arXiv:2002.08933 [pdf, other]
Title: Wavesplit: End-to-End Speech Separation by Speaker Clustering
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD); Machine Learning (stat.ML)
[21]  arXiv:2002.08926 [pdf, ps, other]
Title: Imputer: Sequence Modelling via Imputation and Dynamic Programming
Comments: preprint
Subjects: Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); Machine Learning (cs.LG); Sound (cs.SD)
[22]  arXiv:2002.08796 [pdf, ps, other]
Title: iSEGAN: Improved Speech Enhancement Generative Adversarial Networks
Authors: Deepak Baby
Comments: A short report on improving SEGAN performance
Subjects: Audio and Speech Processing (eess.AS); Sound (cs.SD); Signal Processing (eess.SP)
[23]  arXiv:2002.08742 [pdf, other]
Title: Disentangled Speech Embeddings using Cross-modal Self-supervision
Comments: To appear in ICASSP 2020. The first three authors contributed equally to this work
Subjects: Audio and Speech Processing (eess.AS); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD)
[24]  arXiv:2002.08688 [pdf, other]
Title: An empirical study of Conv-TasNet
Comments: In proceedings of ICASSP2020
Subjects: Audio and Speech Processing (eess.AS); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Sound (cs.SD)
[25]  arXiv:2002.08700 (cross-list from cs.CV) [pdf, other]
Title: Photorealistic Lip Sync with Adversarial Temporal Convolutional Networks
Comments: 9 pages, 7 figures
Subjects: Computer Vision and Pattern Recognition (cs.CV); Audio and Speech Processing (eess.AS)
