From Machine Learning

Why don't automatic speech recognition models use prompting? [D]

Our take

Most Automatic Speech Recognition (ASR) models do not support prompting, even though it could improve their accuracy in real-world applications. Some platforms, such as Deepgram, offer word boosting, but it may not cope well with the variability of live conversations. Prompting would let a model use context, such as the conversation history or a category of expected words, to transcribe more accurately. Despite these advantages, the technique remains underused in ASR development, and understanding why could open new avenues for improving voice agents.

I've been working on the listening part of my full-duplex speech model and I realized that ASR prompting could be very useful.

Deepgram allows for word boosting, but that doesn't work very well in real-world applications.

Another thing that is missing is the ability to feed the whole conversation history to the ASR model as context. This could be very useful for voice agents.
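The post does not say how the conversation history would be serialized for the model. As a minimal sketch, assuming speaker-labelled lines wrapped in the same `<text>…</text><|start|>` format the post uses for its prompt examples, the history could be trimmed to a budget (keeping the newest turns) and turned into a prefix. `Turn`, `history_prompt`, and the character budget (a crude stand-in for a token limit) are hypothetical; the model itself would still need to be fine-tuned to use such a prefix.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    speaker: str  # hypothetical labels, e.g. "user" or "agent"
    text: str


def history_prompt(turns: list[Turn], max_chars: int = 500) -> str:
    """Build a hypothetical ASR text prefix from recent conversation turns.

    Keeps the most recent turns whose joined length fits within max_chars
    and wraps them in the <text>...</text><|start|> format from the post.
    """
    kept: list[str] = []
    used = 0
    # Walk backwards so the newest turns survive when the budget runs out.
    for turn in reversed(turns):
        line = f"{turn.speaker}: {turn.text}"
        cost = len(line) + (1 if kept else 0)  # +1 for the joining newline
        if used + cost > max_chars:
            break
        kept.append(line)
        used += cost
    kept.reverse()  # restore chronological order
    return f"<text>{chr(10).join(kept)}</text><|start|>"


turns = [
    Turn("agent", "Hi, how can I help?"),
    Turn("user", "I need to book a table."),
    Turn("agent", "Sure, what name should I put it under?"),
]
print(history_prompt(turns))
```

Trimming from the newest turn backwards keeps the context most likely to matter (here, the agent asking for a name) when the history is long.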

TL;DR: during testing I realized the model can be fine-tuned to accept prompts like:

<text>Expect a license plate (3 letters, 3 numbers). For example ABC123.</text><|start|> 

or

<text>Expect a person's name. It could also contain a last name. For example John Doe.</text><|start|> 

Instead of listing every specific word to boost (which is sometimes infeasible, or would exhaust the context window), we can just specify a category of words and the model will know what to boost.

<text>Boost words: [Australian cities, food names, TV shows]</text><|start|> 
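The example prompts above differ only in the text inside the tags, so they can be generated by a small helper. This is a sketch assuming the exact `<text>…</text><|start|>` wrapper shown in the post; `instruction_prompt` and `boost_prompt` are hypothetical names, and the length comparison at the end only illustrates the context-window argument by character count, not with a real tokenizer.

```python
def instruction_prompt(instruction: str) -> str:
    """Wrap a free-form expectation in the prompt tags from the post."""
    return f"<text>{instruction}</text><|start|>"


def boost_prompt(categories: list[str]) -> str:
    """Build a category-level boost prompt such as "Boost words: [A, B, C]"."""
    return instruction_prompt(f"Boost words: [{', '.join(categories)}]")


# Reproduces the license-plate and category examples from the post.
plate = instruction_prompt("Expect a license plate (3 letters, 3 numbers). For example ABC123.")
boost = boost_prompt(["Australian cities", "food names", "TV shows"])
print(plate)
print(boost)

# An explicit word list grows with the vocabulary; one category name does not.
cities = ["Sydney", "Melbourne", "Brisbane", "Perth", "Adelaide", "Hobart", "Darwin", "Canberra"]
explicit = instruction_prompt(f"Boost words: [{', '.join(cities)}]")
category = boost_prompt(["Australian cities"])
print(len(explicit), len(category))
```

The helper only formats the input; whether the model actually boosts the right words from a category name depends entirely on the fine-tuning the post describes.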

I assumed that by now most ASR models would support this, but it seems that none do.

Is there a reason why this is not a common feature?

Link to the full description:

https://ketsuilabs.io/blog/listen-head

submitted by /u/kwazar90
