The voice space is much bigger than just Google and Amazon, and there are many ways to develop voice applications. What’s missing in the voice design space? A simple markup language.
Dr. Andrey Esaúlov is the CEO of SmartHouse Technologies and co-founder of BotTalk. SmartHouse Technologies specializes in consulting and development of voice applications and mobile apps. The company’s BotTalk product is a platform for creating Alexa Skills and Google Actions with a simple markup language.
From this conversation, you’ll learn that you don’t need to be a developer to build sophisticated, context-aware voice applications that can benefit your business or even become your business.
Carl: Hello and welcome back to another episode of the Voice Tech Podcast. My name is Carl Robinson, and today you'll hear me talk with Dr. Andrey Esaulov, CEO of SmartHouse Technologies and co-founder of BotTalk. SmartHouse Technologies specializes in consulting and development of voice applications and mobile apps, and the company's BotTalk product is a platform for creating Alexa Skills and Google Actions using a simple markup language. From our conversation, you will learn that you don't need to be a developer to build sophisticated, context-aware voice applications that can benefit your business.
So in this episode with Andrey Esaulov, you will get a lot of good advice for designing voice applications. You will hear some strategies you can use for intent capture and slot filling while avoiding annoying the user. We talk about the role of multimodal design and how to handle the interaction between screen and voice. We discuss the importance of contextual awareness, some of the difficulties of implementing it with current tools, and the features BotTalk has introduced to make it easier for developers. Finally, we discuss our expectations for the voice platforms, some of the services still missing from the voice space, and why there are so many tooling companies yet, in comparison, so few voice app developers.
There is much more besides. It's an exciting conversation, and you will get a lot from it if you are involved in building voice apps. You will also learn all about a new tool you can use to build them much more quickly and effectively.
Andrey: Thank you for having me!
Carl: It's going to be an exciting conversation. Very briefly, as an intro, you are the CEO of SmartHouse Technologies based in Cologne in Germany. You specialize in consulting and developing voice applications and mobile apps. You are also the CEO and co-founder of BotTalk, a cornerstone product of SmartHouse Technologies, a platform for creating Alexa Skills and Actions on Google with a simple markup language.
Can you give us a bit of background on what that means, specifically on BotTalk, the product you guys have there? What's it for, who's it for, what does it do, and what problems does it solve?
Andrey: So SmartHouse Technologies started about three years ago. We began with mobile apps, and pretty soon after came the voice applications. We hit a problem, which we confirmed at one of the hackathons in Berlin, "Talk To Me Hamburg": Google and Alexa were developing the same thing but in different ways. The solutions out there required a lot of backend programming, and a lot of know-how was needed for programmers to jump in.
We looked at what we used to do on the web and then on mobile and said: "Well, on the web there is HTML; in the mobile space there was React Native." It was pretty obvious to us that this kind of markup technology is straightforward to grasp and very easy to learn.
You can start small and grow your knowledge from there. This was missing in the voice space, and that is how BotTalk was born. We said: wouldn't it be nice if you could write your applications in a simple markup-style language? And we ran with that idea.
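To give a feel for the markup idea, a minimal scenario in a BotTalk-style YAML might look something like this. This is a hypothetical sketch; the step and key names here are illustrative, not BotTalk's actual schema:

```yaml
# Hypothetical markup, illustrative only; not BotTalk's real schema.
scenario: Coffee Shop
steps:
  - name: Welcome
    say: "Hi! Welcome to the coffee shop. What can I get you?"
    getInput: true        # wait for the user's reply
    next: TakeOrder
  - name: TakeOrder
    say: "Great, one coffee coming right up!"
```

The appeal is the same as with HTML: each step is declarative and readable, and the platform-specific plumbing stays out of sight.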
Carl: Fantastic. How long ago was that?
Andrey: One and a half years ago.
Carl: So it is still relatively new in the voice space. So you identified a real need by going to these events and seeing how people were building these apps. Then you decided to take on a pretty ambitious project, I have to say. Trying to simplify down from multiple platforms into an easy-to-use programming tool that anybody can launch these quite involved applications.
Was this a daunting task, or were you and your co-founder supremely confident?
Andrey: Well, initially it was a big task, but looking into the JSON that these platforms output, we realized that in the background it's all the same. What we were very nervous about in the beginning was the direction that API.AI could go, and then the internal tools of Alexa as well, because the first version was recreating something like Dialogflow entirely from scratch.
Thank god we changed our approach and said: okay, we can't keep up with these companies. Instead, we should build something smarter.
Carl: And so SmartHouse was born. Did that come at the same time as the idea?
Andrey: Almost. SmartHouse was born half a year before that. We were already developing some voice applications for a couple of publishing houses here in Germany. That's where we first developed this internal framework, and then we went to prove it at a couple of hackathons. I went to hackathons not for the sake of winning, but for the sake of going to different teams, saying: hey guys, how are you developing this? And telling them about our product.
Carl: It's got to be the best way to do it. Right?
Andrey: Yeah, and all the feedback you get directly from people is very helpful in the beginning. Some say it sounds interesting; some say no, we've got to stick to native code, and so on.
Carl: The JSON that these two platforms consume is an abstraction, basically. It's a middle layer, and your product translates clicks and drags into it. To be more accurate, it's YAML, a simplified markup language, but I understand that you also do multimodal, so I guess there is a visual element as well. You translate what comes from the designer or the programmer into this JSON middle layer, and then that is translated by Google or Amazon Alexa into the full-blown app that gets deployed. So there are multiple layers of processing going on.
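As a rough picture of that middle layer: a single spoken step in the markup ultimately has to become the JSON envelope that the platform's API expects. For Alexa, a plain-text response follows the standard Alexa Skills Kit response format, roughly like this (what BotTalk actually generates internally is an assumption here):

```json
{
  "version": "1.0",
  "response": {
    "outputSpeech": {
      "type": "PlainText",
      "text": "Hello user, nice to have you here."
    },
    "shouldEndSession": false
  }
}
```

Google Assistant expects a differently shaped payload for the same utterance, which is exactly the duplication a markup layer can abstract away.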
Andrey: Yeah, you should think of BotTalk as HTML, basically, right? You can describe visual things like a table or a list on your screen with HTML and CSS, and that's what BotTalk allows you to do as well, so you can describe what your player will look like on Alexa. We were delighted when, I think a year into BotTalk, Amazon introduced APL, the visual language for voice assistants that also lets you provide an additional visual interface.
Carl: Could you break that down for me a little bit more then?
So you say everything can be represented like an HTML and CSS page, but you don't need to be a web developer to use BotTalk; all of that is handled through the translation, right?
All you need to do is write YAML, and the YAML language is very human-readable and very structured; it's about the easiest way. Could you explain how multimodal works? How do I get stuff on the screen?
Andrey: Well, you should understand that the YAML we put out there is a set of instructions you are sending to your voice assistant. So you could say: okay, I am sending this text, "Hello user, nice to have you here," and then straight after that I might want to send some visual elements, for example a carousel or a set of suggestion chips for Google Assistant. So you could say, "Hello and welcome to our action. What do you want to do?" The person then basically has a choice: if they have a screen, they will see the suggestion chips and click on one of them, and that translates into the next steps. When they are interacting with a Google Home without a screen, they will just say "Let's check out the product" or something like that.
Carl: So in the YAML, I specify a spoken response along with a button or a title, kind of like standard tags, and I know that they will be shown to the user.
Andrey: And because we support different platforms, you can do different things on different platforms. You can, for example, send a voice message to both Alexa and Google Assistant, and then send the suggestion chips only to Google Assistant. If you're fancy, you also send some visuals to the Alexa devices with a screen. BotTalk handles all of that: okay, am I talking to Google Assistant? Am I talking to Alexa right now? It takes this whole checking process out of the developer's way.
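In markup terms, the idea Andrey describes might be sketched like this: one spoken message for every device, plus platform-specific visuals that are only delivered where they are supported. Again, this is a hypothetical sketch with invented key names, not BotTalk's real syntax:

```yaml
# Hypothetical, illustrative keys; not BotTalk's actual schema.
- name: Welcome
  say: "Hello and welcome to our action. What do you want to do?"  # all platforms
  suggestionChips:              # Google Assistant devices with a screen only
    - "Check out the product"
    - "Hear the news"
  aplDocument: welcome.json     # Alexa devices with a screen only
```

The point is that the author declares everything once, and the platform- and device-capability checks happen behind the scenes.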
So it is faster, and we also measured it. We compared our developers against a couple of agencies that work with native code, and it turned out to be three times faster to develop with BotTalk, because it takes away all those if statements checking what kind of device you're on. You can concentrate on the dialogue, the good stuff.
Carl: Wonderful. It sounds like an easy way to get a skill up and running. Do you have templates there as well, so you don't have to start from scratch? What kinds of things can the templates do, and how adaptable are they?
Andrey: One of the big things we put out a month ago is one-click deployment directly from our landing page. You choose a template on the landing page and click a button to use it; if you don't have an account, you are registered and redirected straight from the landing page to a working skill or action. That's amazing from a technical point of view. The idea behind the templates is to cover the most popular use cases. Those are usually audio skills, so a radio or podcast player and flash briefings are among them, and we have templates for those.
Many of our customers use Google Spreadsheets to organize their data, so we use that as a template as well, and for those developers who want to play around, there are templates for creating skills or actions that do games...
Click here to listen to the full podcast.