Deepgram raises $12M for enterprise speech recognition

Deepgram, a startup focused on high-quality, real-time speech recognition, announced a $12 million Series A this morning.

According to Crunchbase data, the startup was founded half a decade ago with just a few million in raised capital. It is interesting because its success to date rests on two consecutive experiments: the first dealing with its technology, the second concerning its market.

Deepgram sits in the midst of our continuing conversation about AI-grounded companies, or at least companies that make use of deep learning. Let’s explore its round, and how the company got to where it is today.

Foundations

Speech recognition has come a long way since terrible ’90s headsets and trying to train Dragon NaturallySpeaking to better let you dictate into Word documents. Startups like Otter.ai have taken speech recognition tooling and made it available to the masses. But, while Otter.ai is something that journalists love for its ease of use and modest price point, there’s still something missing in the modern world of speech recognition: improved accuracy.

Otter and other services can do a fine job gisting a sound file into words and paragraphs, even working to differentiate between speakers. But they’re only so good, and the process is retroactive. With most calls that I conduct for TechCrunch, for example, I record the chat on my phone, export the audio, upload it to Otter.ai, leave it be and circle back later on to listen and clean up the text for use in an article. (Here’s one, for example.)

What Deepgram can do is a bit more heavy-duty and is not aimed at journalists or other individuals. Instead, Deepgram has built a speech recognition tool that it claims is more accurate and can handle real-time text input. It sells the tech to large companies.

TechCrunch spoke with Deepgram CEO Scott Stephenson about his company’s product during our call about the round itself. Summarizing our chat, here’s what we found out. Instead of trying to improve existing tech — which doesn’t sport strong gross margins, the CEO said — Deepgram started from scratch, building a deep learning tool that, after a few years’ work, was a step ahead of other speech recognition technologies in terms of accuracy.

Its investors agree. In a call with TechCrunch, Nvidia’s Jeff Herbst, who took part in the investment, said that Deepgram was “one of the best, if not the best” speech recognition companies around. Deepgram provides its services in two ways: hosted on its own hardware (the firm claims better margins by running its own metal, and you now know why Nvidia is involved) and on-prem on client hardware. The startup is targeting enterprise call centers and voice platforms as customers.

It took time to prove the company’s tech, years in fact. Deepgram then spent another few years testing out its possible commercial appeal. It may seem obvious today that there would be demand for what Deepgram built, but Gong.io and other, similar services are only so old. Regardless, after about four years, the company was content that it had proven out its product and customer base. Or as Stephenson told TechCrunch, the “tech risk” that Deepgram faced is now behind it, as is its “market risk.”

That’s why the company raised now, so let’s talk about the round.

The round

Deepgram’s $12 million investment was led by Wing VC. Other firms took part, including Nvidia, as mentioned, along with Y Combinator and SAP.

What’s the money for? Adding staff, among other things. Deepgram has about 40 people today, but declined to tell TechCrunch how quickly it will scale personnel (oddly, as that’s a pretty standard question), saying instead that it’s hiring aggressively, with a focus on go-to-market and engineering. The firm also intends to use some of its Series A on hardware.

What’s fun is that Deepgram has what it considers to be a strong market position, now crossed with a pile of cash. How fast it can grow is now the question, and the first thing we’re asking the next time we speak with the firm.