Reading non-ASCII text
Suppose you have your robot configured to speak French, and you want it to say some sentences from a data file.
Doing so is a bit trickier than it sounds, because you have to take care of the encoding.
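First, download the following files and put them on the robot, in the same directory: non_ascii.py, coffee_en.txt, coffee_fr_utf-8.txt and coffee_fr_latin9.txt.
The coffee_en.txt file contains the string “I like coffee”; the files coffee_fr_utf-8.txt and coffee_fr_latin9.txt hold its French translation, “J’aime le café”, so it’s best if your robot can speak French in addition to English :)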
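To make the difference between the two French files concrete, here is a minimal sketch that recreates files with the same names and shows how the accented character is stored in each encoding. The helper `write_sample_files` and the exact file contents (no trailing newline, a plain ASCII apostrophe) are assumptions for illustration; the downloadable files may differ slightly.

```python
import os

# Assumed contents of the sample files; the real downloads may differ.
SAMPLES = {
    "coffee_en.txt": (u"I like coffee", "ascii"),
    "coffee_fr_utf-8.txt": (u"J'aime le caf\xe9", "utf-8"),
    "coffee_fr_latin9.txt": (u"J'aime le caf\xe9", "latin9"),
}


def write_sample_files(directory):
    """Write each sample text into `directory` using its own encoding."""
    for name, (text, encoding) in SAMPLES.items():
        with open(os.path.join(directory, name), "wb") as fp:
            fp.write(text.encode(encoding))


# The same accented character is stored as different bytes:
utf8_bytes = u"caf\xe9".encode("utf-8")     # 'caf' followed by 0xC3 0xA9
latin9_bytes = u"caf\xe9".encode("latin9")  # 'caf' followed by 0xE9
```

Reading either file with the wrong encoding would therefore produce either an error or a wrong character, which is why the script passes the encoding explicitly.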
Let’s have a closer look at the non_ascii.py file:
#! /usr/bin/env python
# -*- encoding: UTF-8 -*-

"""Example: Non ascii Characters"""

import qi
import argparse
import sys
import codecs


def say_from_file(tts_service, filename, encoding):
    with codecs.open(filename, encoding=encoding) as fp:
        contents = fp.read()
        # warning: print contents won't work
        to_say = contents.encode("utf-8")
    tts_service.say(to_say)


def main(session):
    """
    This example uses non ascii characters.
    """
    # Get the service ALTextToSpeech.
    tts_service = session.service("ALTextToSpeech")

    try:
        tts_service.setLanguage('French')
    except RuntimeError:
        print "No French pronunciation because French language is not installed. Pronunciation will be incorrect."

    say_from_file(tts_service, 'coffee_fr_utf-8.txt', 'utf-8')
    say_from_file(tts_service, 'coffee_fr_latin9.txt', 'latin9')

    tts_service.setLanguage('English')
    # the string "I like coffee" is encoded the exact same way in these three
    # encodings
    say_from_file(tts_service, 'coffee_en.txt', 'ascii')
    say_from_file(tts_service, 'coffee_en.txt', 'utf-8')
    say_from_file(tts_service, 'coffee_en.txt', 'latin9')


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--ip", type=str, default="127.0.0.1",
                        help="Robot IP address. On robot or Local Naoqi: use '127.0.0.1'.")
    parser.add_argument("--port", type=int, default=9559,
                        help="Naoqi port number")

    args = parser.parse_args()
    session = qi.Session()
    try:
        session.connect("tcp://" + args.ip + ":" + str(args.port))
    except RuntimeError:
        print ("Can't connect to Naoqi at ip \"" + args.ip + "\" on port " + str(args.port) + ".\n"
               "Please check your script arguments. Run with -h option for help.")
        sys.exit(1)
    main(session)
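First, notice how we do not use open but codecs.open, specifying the encoding.
Also notice how we handle the result of the read from the file: codecs.open has already decoded it, so the object returned by fp.read() is a unicode object, and we need to encode it back to get a str object encoded in ‘UTF-8’, usable by the TTS service.
Trying to run print contents can fail with an error like: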
Traceback (most recent call last):
  File "non_ascii.py", line 22, in <module>
    main()
  File "non_ascii.py", line 18, in main
    say_from_file(filename)
  File "non_ascii", line 10, in say_from_file
    print contents
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 13: ordinal not in range(128)
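The error above appears because Python 2 implicitly encodes the unicode object with the output encoding, which is ASCII here, and ASCII has no “é”. One possible workaround, sketched below, is to encode explicitly before printing and to replace any character the target encoding cannot represent. The helper name `encode_for_terminal` is hypothetical and not part of the example script.

```python
import sys


def encode_for_terminal(text, encoding=None):
    """Return `text` as bytes that the given encoding can represent.

    Characters outside the target encoding become '?' instead of
    raising UnicodeEncodeError.
    """
    if encoding is None:
        # sys.stdout.encoding can be None, e.g. when output is piped.
        encoding = getattr(sys.stdout, "encoding", None) or "ascii"
    return text.encode(encoding, "replace")


contents = u"J'aime le caf\xe9"
ascii_safe = encode_for_terminal(contents, "ascii")  # accent replaced by '?'
utf8_safe = encode_for_terminal(contents, "utf-8")   # accent preserved
```

In the Python 2 script, writing `print encode_for_terminal(contents)` instead of `print contents` would then no longer raise, at the cost of showing “?” on terminals that cannot display the accent.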
Finally, notice that regardless of the file encoding, everything gets encoded to ‘UTF-8’ before being sent to the text-to-speech service.
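This normalisation step can be tried without a robot. The sketch below uses a hypothetical helper, `to_utf8`, and assumes the French files contain exactly “J'aime le café” with an ASCII apostrophe; it decodes the raw bytes from each source encoding and re-encodes them, ending with identical UTF-8 bytes in both cases.

```python
def to_utf8(raw_bytes, source_encoding):
    """Decode bytes from `source_encoding`, then re-encode them as UTF-8."""
    text = raw_bytes.decode(source_encoding)  # bytes -> unicode text
    return text.encode("utf-8")               # unicode text -> UTF-8 bytes


# Assumed raw contents of the two French files.
from_utf8_file = to_utf8(b"J'aime le caf\xc3\xa9", "utf-8")
from_latin9_file = to_utf8(b"J'aime le caf\xe9", "latin9")

# Both inputs end up as the same byte string.
same_bytes = from_utf8_file == from_latin9_file
```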
Running the example
Open an SSH connection to the robot, and type:
$ python non_ascii.py
Going further
If you are not sure whether your file is UTF-8 encoded, you can use something like:
with codecs.open(filename, encoding="utf-8") as fp:
    try:
        contents = fp.read()
    except UnicodeDecodeError:
        print filename, "is not UTF-8 encoded"
        return
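The same idea can be extended to try several candidate encodings in turn. This is only a sketch: the function name `read_text` and the default candidate list are assumptions. It reads the raw bytes and decodes them explicitly, so a file that is not valid UTF-8 is rejected as a whole. UTF-8 is tried first because a single-byte encoding such as Latin-9 accepts almost any byte sequence and would otherwise hide a mismatch.

```python
def read_text(filename, encodings=("utf-8", "latin9")):
    """Return (text, encoding) for the first encoding that decodes cleanly."""
    with open(filename, "rb") as fp:
        raw = fp.read()
    for encoding in encodings:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue  # not this one, try the next candidate
    raise ValueError("%s matches none of: %s"
                     % (filename, ", ".join(encodings)))
```

The returned text is a unicode object, so it still has to be encoded to UTF-8 before being handed to the text-to-speech service, as in `say_from_file`. A successful decode only means the bytes are valid in that encoding, not that the guess is right.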