The Google Research Blog today announced that Google Voice Search has adopted an entirely new core technology that analyses spoken input faster and more accurately when performing searches.
Today, we’re happy to announce we built even better neural network acoustic models using Connectionist Temporal Classification (CTC) and sequence discriminative training techniques. These models are a special extension of recurrent neural networks (RNNs) that are more accurate, especially in noisy environments, and they are blazingly fast!
Under the old approach, the user's spoken waveform was split into consecutive 10-millisecond frames, which were analysed one by one. Different models then estimated how likely each phoneme was at each point in the utterance, and the most probable word sequence was selected as the recognition result.
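The frame-slicing step described above can be sketched in a few lines. This is a hypothetical illustration, not Google's pipeline; the 16 kHz sample rate and the `split_into_frames` helper are assumptions for the example.

```python
import numpy as np

SAMPLE_RATE = 16000                          # assumed 16 kHz audio
FRAME_MS = 10
FRAME_LEN = SAMPLE_RATE * FRAME_MS // 1000   # 160 samples per 10 ms frame

def split_into_frames(waveform: np.ndarray) -> np.ndarray:
    """Drop any trailing partial frame and reshape into (n_frames, FRAME_LEN)."""
    n_frames = len(waveform) // FRAME_LEN
    return waveform[: n_frames * FRAME_LEN].reshape(n_frames, FRAME_LEN)

# One second of audio -> 100 consecutive 10 ms frames of 160 samples each.
frames = split_into_frames(np.zeros(SAMPLE_RATE))
print(frames.shape)  # (100, 160)
```

Each of these frames would then be scored by the acoustic model independently, which is exactly the limitation the new recurrent models address.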
The new approach uses Recurrent Neural Networks, whose feedback loops let the model take the sounds before and after each frame into account, improving recognition accuracy. Continued training has also made the models lighter, so they can recognise speech with less computation. In addition, noise is deliberately mixed into the training data, which improves the models' ability to recognise speech in noisy surroundings.
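The "feedback loop" idea can be made concrete with a minimal vanilla RNN cell: the hidden state `h` is passed back in at every frame, so each prediction can depend on earlier audio. This is a toy sketch with made-up sizes, not Google's actual acoustic model.

```python
import numpy as np

rng = np.random.default_rng(0)

IN, HID = 40, 64   # illustrative: e.g. 40 acoustic features per frame, 64 hidden units
W_xh = rng.normal(0, 0.1, (HID, IN))   # input-to-hidden weights
W_hh = rng.normal(0, 0.1, (HID, HID))  # hidden-to-hidden (the feedback loop)
b = np.zeros(HID)

def rnn_step(x, h):
    """One time step: the new state mixes the current frame x with the old state h."""
    return np.tanh(W_xh @ x + W_hh @ h + b)

h = np.zeros(HID)
for frame in rng.normal(size=(100, IN)):   # 100 frames of dummy features
    h = rnn_step(frame, h)                 # h carries context from frame to frame
print(h.shape)  # (64,)
```

Because `h` is threaded through every step, the state after frame 100 still reflects frames seen much earlier, which is what lets the new models disambiguate a sound using its neighbours.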
We are happy to announce that our new acoustic models are now used for voice searches and commands in the Google app (on Android and iOS), and for dictation on Android devices. In addition to requiring much lower computational resources, the new models are more accurate, robust to noise, and faster to respond to voice search queries – so give it a try, and happy (voice) searching!
In short, with the new technology Voice Search is faster and more accurate. It is already live in the Google app on Android and iOS, and is also used for voice dictation. Give Voice Search a try and see whether it really is faster than before.
Source: Google Research Blog