This week, the 24th Annual Conference of the International Speech Communication Association (INTERSPEECH 2023) is being held in Dublin, Ireland, representing one of the world's most extensive conferences on the research and technology of spoken language understanding and processing. Experts in speech-related research fields gather to take part in oral presentations and poster sessions and to build collaborations across the globe.
We’re excited to be a Platinum Sponsor of INTERSPEECH 2023, where we will be showcasing more than 20 research publications and supporting a number of workshops and special sessions. We welcome in-person attendees to drop by the Google Research booth to meet our researchers and participate in Q&As and demonstrations of some of our latest speech technologies, which help to improve accessibility and provide convenience in communication for billions of users. In addition, online attendees are encouraged to visit our virtual booth in Topia where you can get up-to-date information on research and opportunities at Google. Visit the @GoogleAI Twitter account to find out about Google booth activities (e.g., demos and Q&A sessions). You can also learn more about the Google research being presented at INTERSPEECH 2023 below (Google affiliations in bold).
Board and Organizing Committee
ISCA Board, Technical Committee Chair: Bhuvana Ramabhadran
Area Chairs include:
Analysis of Speech and Audio Signals: Richard Rose
Speech Synthesis and Spoken Language Technology: Rob Clark
Special Areas: Tara Sainath
Satellite events
Keynote talk – ISCA Medalist
Survey Talk
Speech Compression in the AI Era
Speaker: Jan Skoglund
Special session papers
Cascaded Encoders for Fine-Tuning ASR Models on Overlapped Speech
Richard Rose, Oscar Chang, Olivier Siohan
TokenSplit: Using Discrete Speech Representations for Direct, Refined, and Transcript-Conditioned Speech Separation and Recognition
Hakan Erdogan, Scott Wisdom, Xuankai Chang*, Zalán Borsos, Marco Tagliasacchi, Neil Zeghidour, John R. Hershey
Papers
DeePMOS: Deep Posterior Mean-Opinion-Score of Speech
Xinyu Liang, Fredrik Cumlin, Christian Schüldt, Saikat Chatterjee
O-1: Self-Training with Oracle and 1-Best Hypothesis
Murali Karthick Baskar, Andrew Rosenberg, Bhuvana Ramabhadran, Kartik Audhkhasi
Re-investigating the Efficient Transfer Learning of Speech Foundation Model Using Feature Fusion Methods
Zhouyuan Huo, Khe Chai Sim, Dongseong Hwang, Tsendsuren Munkhdalai, Tara N. Sainath, Pedro Moreno
MOS vs. AB: Evaluating Text-to-Speech Systems Reliably Using Clustered Standard Errors
Joshua Camp, Tom Kenter, Lev Finkelstein, Rob Clark
LanSER: Language-Model Supported Speech Emotion Recognition
Taesik Gong, Josh Belanich, Krishna Somandepalli, Arsha Nagrani, Brian Eoff, Brendan Jou
Modular Domain Adaptation for Conformer-Based Streaming ASR
Qiujia Li, Bo Li, Dongseong Hwang, Tara N. Sainath, Pedro M. Mengibar
On Training a Neural Residual Acoustic Echo Suppressor for Improved ASR
Sankaran Panchapagesan, Turaj Zakizadeh Shabestary, Arun Narayanan
MD3: The Multi-dialect Dataset of Dialogues
Jacob Eisenstein, Vinodkumar Prabhakaran, Clara Rivera, Dorottya Demszky, Devyani Sharma
Dual-Mode NAM: Effective Top-K Context Injection for End-to-End ASR
Zelin Wu, Tsendsuren Munkhdalai, Pat Rondon, Golan Pundak, Khe Chai Sim, Christopher Li
Using Text Injection to Improve Recognition of Personal Identifiers in Speech
Yochai Blau, Rohan Agrawal, Lior Madmony, Gary Wang, Andrew Rosenberg, Zhehuai Chen, Zorik Gekhman, Genady Beryozkin, Parisa Haghani, Bhuvana Ramabhadran
How to Estimate Model Transferability of Pre-trained Speech Models?
Zih-Ching Chen, Chao-Han Huck Yang*, Bo Li, Yu Zhang, Nanxin Chen, Shuo-yiin Chang, Rohit Prabhavalkar, Hung-yi Lee, Tara N. Sainath
Improving Joint Speech-Text Representations Without Alignment
Cal Peyser, Zhong Meng, Ke Hu, Rohit Prabhavalkar, Andrew Rosenberg, Tara N. Sainath, Michael Picheny, Kyunghyun Cho
Text Injection for Capitalization and Turn-Taking Prediction in Speech Models
Shaan Bijwadia, Shuo-yiin Chang, Weiran Wang, Zhong Meng, Hao Zhang, Tara N. Sainath
Streaming Parrotron for On-Device Speech-to-Speech Conversion
Oleg Rybakov, Fadi Biadsy, Xia Zhang, Liyang Jiang, Phoenix Meadowlark, Shivani Agrawal
Semantic Segmentation with Bidirectional Language Models Improves Long-Form ASR
W. Ronny Huang, Hao Zhang, Shankar Kumar, Shuo-yiin Chang, Tara N. Sainath
Universal Automatic Phonetic Transcription into the International Phonetic Alphabet
Chihiro Taguchi, Yusuke Sakai, Parisa Haghani, David Chiang
Mixture-of-Expert Conformer for Streaming Multilingual ASR
Ke Hu, Bo Li, Tara N. Sainath, Yu Zhang, Francoise Beaufays
Real Time Spectrogram Inversion on Mobile Phone
Oleg Rybakov, Marco Tagliasacchi, Yunpeng Li, Liyang Jiang, Xia Zhang, Fadi Biadsy
2-Bit Conformer Quantization for Automatic Speech Recognition
Oleg Rybakov, Phoenix Meadowlark, Shaojin Ding, David Qiu, Jian Li, David Rim, Yanzhang He
LibriTTS-R: A Restored Multi-speaker Text-to-Speech Corpus
Yuma Koizumi, Heiga Zen, Shigeki Karita, Yifan Ding, Kohei Yatabe, Nobuyuki Morioka, Michiel Bacchiani, Yu Zhang, Wei Han, Ankur Bapna
PronScribe: Highly Accurate Multimodal Phonemic Transcription from Speech and Text
Yang Yu, Matthew Perez*, Ankur Bapna, Fadi Haik, Siamak Tazari, Yu Zhang
Label Aware Speech Representation Learning for Language Identification
Shikhar Vashishth, Shikhar Bharadwaj, Sriram Ganapathy, Ankur Bapna, Min Ma, Wei Han, Vera Axelrod, Partha Talukdar
* Work done while at Google