A collection of computer systems and programming tips that you may find useful.
 
Brought to you by Craic Computing LLC, a bioinformatics consulting company.

Tuesday, April 16, 2013

Web Text To Speech using Bing Translate - Demonstration Sinatra App

Given the variety of sophisticated web services available today, it is surprising that a good Text To Speech (TTS) API is not readily available.

Google has one as part of its Translation API but it is not publicized or actively supported. The Bing/Microsoft Translate API also has a TTS feature and this is supported. In my experience this works very well and allows you to specify the language of the text, which changes the voice and pronunciation that is used.

Accessing this API is easy enough using a wrapper library, such as https://github.com/CodeBlock/bing_translator-gem (disclaimer: I added the speak() function to this Ruby gem), but it returns binary data that represents an MP3 file. The HTML5 audio tag allows you to play audio files but not, apparently, to play the data itself.

The result is that the TTS output must first be written to a file and then played via the audio tag.
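
Here is a minimal sketch of that step in Ruby, assuming the MP3 bytes are already held in a string. The helper names `save_mp3` and `audio_tag`, the `/audio/` URL prefix, and the placeholder bytes are all hypothetical and are not taken from the demo code.

```ruby
require 'digest/sha1'
require 'erb'
require 'fileutils'
require 'tmpdir'

# Write raw MP3 bytes to a file under dir and return the file name.
# The name is a hash of the data, so identical audio maps to the same file.
def save_mp3(data, dir)
  FileUtils.mkdir_p(dir)
  name = "#{Digest::SHA1.hexdigest(data)}.mp3"
  File.binwrite(File.join(dir, name), data)
  name
end

# Build an HTML5 audio tag that points at the saved file's URL.
def audio_tag(url)
  %(<audio controls src="#{ERB::Util.html_escape(url)}"></audio>)
end

# Placeholder bytes standing in for real TTS output (not a valid MP3).
fake_mp3 = "ID3\x03\x00fake-audio".b
dir = Dir.mktmpdir
name = save_mp3(fake_mp3, dir)
puts audio_tag("/audio/#{name}")
```

Naming the file after a hash of its contents is only one option; it avoids collisions between requests without needing a counter or database.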

To demonstrate how these different pieces fit together, I have written a TTS demo app that consists of JavaScript in a web page that sends the query text to a Sinatra app using Ajax. The server in turn sends this to Bing and gets back the audio. The server then writes the audio to a file on Amazon S3 and returns its URL to the web page, where the audio is played.
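
The server-side flow can be sketched in plain Ruby as follows. This is not the demo's actual code: the TTS call and the S3 upload are replaced by hypothetical stand-ins (`fake_tts` and `local_store`) so that the sketch runs without credentials, and the handler name, parameter names, and example.com URL are assumptions. In a real app the handler body would sit inside a Sinatra route, with the Bing wrapper and an S3 client in place of the stand-ins.

```ruby
require 'digest/sha1'
require 'fileutils'
require 'json'
require 'tmpdir'

# Hypothetical stand-in for S3: returns a lambda that stores bytes in a
# local directory and hands back the URL the file would be served from.
def local_store(dir, base_url)
  FileUtils.mkdir_p(dir)
  lambda do |key, data|
    File.binwrite(File.join(dir, key), data)
    "#{base_url}/#{key}"
  end
end

# Handle one request: query text in, JSON containing the audio URL out.
def handle_tts_request(params, tts:, store:)
  text = params['text'].to_s.strip
  language = params.fetch('language', 'en')
  return JSON.generate('error' => 'text is required') if text.empty?

  audio = tts.call(text, language)                        # ask the TTS service for MP3 bytes
  key = "#{Digest::SHA1.hexdigest(language + text)}.mp3"  # stable name per language and text
  JSON.generate('url' => store.call(key, audio))          # store the audio, reply with its URL
end

# Hypothetical stand-in for the TTS service: returns fake bytes, not real audio.
fake_tts = ->(text, language) { "MP3:#{language}:#{text}".b }
store = local_store(Dir.mktmpdir, 'https://example.com/audio')

response = handle_tts_request({ 'text' => 'Hello world', 'language' => 'en' },
                              tts: fake_tts, store: store)
puts response
```

On the browser side, the page would read the `url` field from this JSON response and set it as the source of the audio tag.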

The Live Demo for this is at http://bing-translate-tts-demo.craic.com/ and the code required to implement the whole thing is freely available at https://github.com/craic/bing_translate_text_to_speech

The demo has several moving parts and setting it up for yourself requires experience with Sinatra, S3, etc.

