ALTextToSpeech Tutorial



This tutorial explains how to say a sentence, change the voice effects, or change the language and the voice of the synthesis engine.

Note

All the examples provided are written in Python.

Creating a proxy on the module

Before using the TTS commands, you need to create a proxy on the TTS module.


            
# Creates a proxy on the text-to-speech module
from naoqi import ALProxy

IP = "<IP ADDRESS>"
tts = ALProxy("ALTextToSpeech", IP, 9559)
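
Creating the proxy fails if the robot cannot be reached. The sketch below wraps the call in a small helper. The names connect_tts and proxy_factory are hypothetical and introduced only for illustration, and the helper assumes that a failed connection raises RuntimeError, which may differ between SDK versions.

```python
def connect_tts(ip, port=9559, proxy_factory=None):
    """Return an ALTextToSpeech proxy, or None if the connection fails.

    proxy_factory is a hypothetical hook (not part of NAOqi) that makes the
    helper usable without a robot; by default the SDK's ALProxy is used.
    """
    if proxy_factory is None:
        # Imported lazily so that this helper can be loaded without the SDK
        from naoqi import ALProxy
        proxy_factory = ALProxy
    try:
        return proxy_factory("ALTextToSpeech", ip, port)
    except RuntimeError as error:
        # Assumption: connection problems surface as RuntimeError
        print("Could not reach ALTextToSpeech at %s:%d: %s" % (ip, port, error))
        return None
```

With the SDK installed, tts = connect_tts(IP) would then replace the direct ALProxy call.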

            
           

Saying a text string

You can say a sentence using the say function.


            
# Example: Sends a string to the text-to-speech module
tts.say("Hello World!")

            
           

Modifying speed

The API allows some modifications of the voice’s speed. Example:


            
tts.setParameter("speed", 200)

            
           

For further details, see ALTextToSpeechProxy::setParameter. To go back to the default speed value:


            
tts.setParameter("speed", 200)
tts.resetSpeed()
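
To change the speed for a few sentences only, setParameter and resetSpeed can be paired in a context manager. This is a minimal sketch: temporary_speed is a hypothetical helper, not part of the API, and it only relies on the two calls shown above.

```python
from contextlib import contextmanager

@contextmanager
def temporary_speed(tts, speed):
    """Hypothetical helper: apply a speed inside a with-block, then reset it."""
    tts.setParameter("speed", speed)
    try:
        yield tts
    finally:
        # Go back to the default speed even if a call inside the block failed
        tts.resetSpeed()

# Usage, assuming tts is the proxy created earlier:
# with temporary_speed(tts, 200):
#     tts.say("This sentence is spoken faster.")
# tts.say("This sentence is spoken at the default speed.")
```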

            
           

To change that default value:


            
tts.setParameter("defaultVoiceSpeed", 200)

            
           

Changing the value of the default speed also resets the current value of the speed.

Changing the default speed value is a non-persistent modification. If you switch to another language and then come back, the default speed value will be back to 100. To keep a specific value, modify the voiceSettings.xml file stored in the language package by adding:


            
             <Setting name="defaultVoiceSpeed" description="Voice speed" value="150.0"/>

            
           

This value will then be loaded every time the language is used.

Modifying pitch

The API allows some modifications of the voice’s pitch. Example:


            
tts.setParameter("pitchShift", 1.1)

            
           

This command raises the pitch of the main voice. 1.1 is the ratio between the fundamental frequency of the transformed voice and the original one.
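
Because pitchShift is a frequency ratio, it can be related to musical intervals: in equal temperament, one semitone corresponds to a ratio of 2^(1/12), so a ratio of 1.1 is about 1.65 semitones. The conversion can be sketched as follows; both function names are hypothetical, and the range of ratios that the engine accepts is not covered here.

```python
import math

def pitch_shift_from_semitones(semitones):
    """Frequency ratio for a shift of the given number of semitones."""
    return 2.0 ** (semitones / 12.0)

def semitones_from_pitch_shift(ratio):
    """Number of semitones that corresponds to a frequency ratio."""
    return 12.0 * math.log(ratio, 2)

# Usage, assuming tts is the proxy created earlier:
# tts.setParameter("pitchShift", pitch_shift_from_semitones(2))
```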

Modifying double voice parameters

The double voice rendering can be modified using 3 parameters:

  • doubleVoice: the ratio between the fundamental frequency of the transformed voice and the original one.
  • doubleVoiceLevel: the ratio between the volume of the second voice and the first one.
  • doubleVoiceTimeShift: the time shift between the second voice and the first one.

For example, a “robotic sounding” voice can be generated using these commands:


            
tts.setParameter("doubleVoice", 1)
tts.setParameter("doubleVoiceLevel", 0.5)
tts.setParameter("doubleVoiceTimeShift", 0.1)
tts.setParameter("pitchShift", 1.1)

            
           

Changing the language of the synthesis engine

The language of the synthesis engine can be changed using the setLanguage function. The list of the available languages can be obtained with the getAvailableLanguages function.


            
# Example: set the language of the synthesis engine to English:
tts.setLanguage("English")
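
The two functions can be combined so that a language is only selected when it is installed. The helper below is a sketch under the assumption that getAvailableLanguages returns a list of language names; set_language_if_available is a hypothetical name.

```python
def set_language_if_available(tts, language):
    """Hypothetical helper: switch language only if it is installed.

    Returns True if the language was set, False otherwise.
    """
    # Assumption: the call returns a list of names such as "English"
    available = tts.getAvailableLanguages()
    if language in available:
        tts.setLanguage(language)
        return True
    print("%s is not installed; available: %s" % (language, ", ".join(available)))
    return False
```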

            
           

Changing the voice of the synthesis engine

You can also change the voice of the synthesis engine with the setVoice function. The list of the available voices can be obtained with the getAvailableVoices function. When you change the voice, the current language is automatically changed to the language corresponding to this voice.


            
# Example: use the voice of Kenny:
tts.setVoice("Kenny22Enhanced")

            
           

Using tags for voice tuning

Available for all engines

Different tags are available to adjust the pronunciation to the context of your application. The available tags can differ depending on the engine of the language package.

Changing the pitch

Available for all engines

Insert \\vct=value\\ in the text. The value is between 50 and 200 in %. Default value is 100.


              
# Say the sentence with a pitch of +50%
tts.say("\\vct=150\\Hello my friends")

              
             

Changing the speaking rate

Available for all engines

Insert \\rspd=value\\ in the text. The value is between 50 and 400 in %. Default value is 100.


              
# Say the sentence 50% slower than normal speed
tts.say("\\rspd=50\\hello my friends")

              
             

Inserting a pause

Available for all engines

Insert \\pau=value\\ in the text. The value is a duration in msec.


              
# Insert a pause of 1s
tts.say("Hello my friends \\pau=1000\\ how are you ?")

              
             

Changing the volume

Available for all engines

Insert \\vol=value\\ in the text. The value is between 0 and 100 in %. Default value is 80. Values > 80 can introduce clipping in the audio signal.


              
# Say the sentence with a volume of 50%
tts.say("\\vol=50\\Hello my friends")
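
The pitch, speaking rate and volume tags can be generated by a small helper that also checks the documented ranges before the text is sent to the robot. This is only a sketch: tag, tagged and TAG_RANGES are hypothetical names, and the tags are emitted in alphabetical order simply to make the output predictable.

```python
# Allowed ranges in %, taken from the tag descriptions in this tutorial
TAG_RANGES = {"vct": (50, 200), "rspd": (50, 400), "vol": (0, 100)}

def tag(name, value):
    """Build one control tag, e.g. tag("vct", 150)."""
    low, high = TAG_RANGES[name]
    if not low <= value <= high:
        raise ValueError("%s must be between %d and %d, got %r"
                         % (name, low, high, value))
    return "\\%s=%d\\" % (name, value)

def tagged(text, **settings):
    """Prefix text with one tag per keyword argument."""
    prefix = "".join(tag(name, settings[name]) for name in sorted(settings))
    return prefix + text

# Usage, assuming tts is the proxy created earlier:
# tts.say(tagged("Hello my friends", vct=150, vol=50))
```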

              
             

Inserting a bookmark

Available for all engines

This tag is very useful if you want to synchronize the speech and a specific action of the robot.

Insert \\mrk=value\\ in the text. The value is between 0 and 64535.

The value will be raised in the “ALTextToSpeech/CurrentBookMark” event of ALMemory.


              
tts.say("\\mrk=0\\ I say a sentence.\\mrk=1\\ And a second one.")
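
One possible way to organize the synchronization is to number the bookmarks automatically and keep a table that maps each bookmark value to an action. The sketch below only builds the text and the table. Both function names are hypothetical, and the subscription to the ALMemory event, which would call on_bookmark with each raised value, depends on the SDK and is not shown.

```python
def build_bookmarked_text(steps):
    """steps is a list of (sentence, action) pairs.

    Returns the text with one bookmark per sentence, and a dict that maps
    each bookmark value to the action to run when it is raised.
    """
    parts = []
    actions = {}
    for index, (sentence, action) in enumerate(steps):
        parts.append("\\mrk=%d\\ %s" % (index, sentence))
        actions[index] = action
    return "".join(parts), actions

def on_bookmark(value, actions):
    """Run the action registered for a bookmark value, if there is one."""
    action = actions.get(value)
    if action is not None:
        action()
```

With the two sentences of the example above, build_bookmarked_text returns the same string that is passed to tts.say.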

              
             

Resetting control sequences to the default

Available for all engines

Insert \\rst\\ in the text.


              
tts.say("\\vct=150\\\\rspd=50\\Hello my friends.\\rst\\ How are you ?")

              
             

Nuance only

Setting the type of prosodic boundary

Available only for Nuance packages

Insert \\bound=value\\ in the text. The possible values are:

  • W: Weak phrase boundary
  • S: Strong phrase boundary
  • N: No boundary

              
# Say the sentence with a weak phrase boundary (no silence in speech)
tts.say("\\bound=W\\ Hello my friends")
# Say the sentence with a strong phrase boundary (silence in speech)
tts.say("\\bound=S\\ Hello my friends")

              
             

Setting the word prominence level

Available only for Nuance packages

Insert \\emph=value\\ in the text. The possible values are:

  • 0: Reduced
  • 1: Stressed
  • 2: Accented

              
tts.say("\\emph=0\\ There is a total of 32 apples and 12 oranges")
tts.say("\\emph=1\\ There is a total of 32 apples and 12 oranges")
tts.say("\\emph=2\\ There is a total of 32 apples and 12 oranges")

              
             

Controlling end-of-sentence detection

Available only for Nuance packages

Insert \\eos=value\\ in the text. The possible values are:

  • 0: suppress a sentence break
  • 1: force a sentence break

Warning

The tag must appear immediately after the symbol that triggers the break.


              
tts.say("Hello my friends.\\eos=0\\How are you ?") # no break
tts.say("Hello my friends.\\eos=1\\How are you ?") # break

              
             

Controlling the read mode

Available only for Nuance packages

Insert \\readmode=value\\ in the text. The possible values are:

  • sent: Sentence mode (default value)
  • char: Character mode (similar to spelling)
  • word: Word-by-word mode

              
tts.say("\\readmode=sent\\ Hello my friends")
tts.say("\\readmode=char\\ Hello my friends")
tts.say("\\readmode=word\\ Hello my friends")

              
             

Guiding text normalization

Available only for Nuance packages

Insert \\tn=value\\ in the text. The possible values are:

  • spell: start spelling out the following input text.
  • address: expand the following text as an address.
  • sms: expand the following text as an SMS message.
  • normal: reset to the regular text normalization.

              
tts.say("\\tn=address\\ 244 Perryn Rd Ithaca, NY \\tn=normal\\ That's spelled \\tn=spell\\ Ithaca \\tn=normal\\.")
tts.say("\\tn=sms\\ Carlo, can u give me a lift 2 Helena's house 2nite? David \\tn=normal\\")

              
             

Setting the spelling pause duration

Available only for Nuance packages

Insert \\spell=value\\ in the text. The value is the inter-character pause in msec.


              
tts.say("\\tn=spell\\hello")
tts.say("\\tn=spell\\\\spell=2000\\hello")

              
             

Setting the end of sentence pause duration

Available only for Nuance packages

Insert \\wait=value\\ in the text. The value is between 0 and 9, where the pause will be 200 msec multiplied by that number.


              
tts.say("\\wait=2\\ There will be a short wait period after this sentence. \\wait=9\\ This sentence will be followed by a long wait period. Did you notice the difference?")
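
Since the pause is 200 msec multiplied by the tag value, \\wait=2\\ gives 400 msec and \\wait=9\\ gives 1800 msec, which is the longest pause available. A hypothetical helper that picks the closest tag for a desired duration could look like this:

```python
def wait_tag(pause_ms):
    """Return the wait tag whose pause is closest to pause_ms.

    The pause is 200 msec times the tag value, and the value is
    limited to the 0 to 9 range described above.
    """
    steps = int(pause_ms / 200.0 + 0.5)  # round to the nearest step
    steps = max(0, min(9, steps))        # clamp to the allowed values
    return "\\wait=%d\\" % steps

# Usage, assuming tts is the proxy created earlier:
# tts.say(wait_tag(1000) + " A one second pause follows this sentence. And here is the next one.")
```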

              
             

Inserting a digital audio recording

Available only for Nuance packages

Insert \\audio=path\\ in the text. The path is the path to the audio file on the robot.

Warning

The audio file must be a WAV file that contains linear 16-bit PCM samples at 22050 Hz.


              
tts.say("\\audio=/usr/share/naoqi/wav/0.wav\\")
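
Before copying a file to the robot, its format can be checked with the wave module of the Python standard library. This is a minimal sketch with a hypothetical function name. It only checks the sample width and the sample rate, since the number of channels is not specified here, and it runs on the machine where the file is stored.

```python
import wave

def is_tts_compatible_wav(path):
    """Return True if the file is a PCM WAV with 16-bit samples at 22050 Hz."""
    try:
        # wave.open raises wave.Error for files it cannot parse as PCM WAV
        reader = wave.open(path, "rb")
    except (wave.Error, IOError, EOFError):
        return False
    try:
        # A sample width of 2 bytes means 16-bit samples
        return reader.getsampwidth() == 2 and reader.getframerate() == 22050
    finally:
        reader.close()
```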

              
             

Inserting phonetic text

Available only for Nuance packages

By default, the Nuance engine treats the input as orthographic text, but it also supports phonetic text. The tag \\toi=value\\ sets the type of the input that follows the control sequence:

  • lhp: Phonetic text in the phonetic alphabet L&H+ (Nuance specific alphabet).
  • orth: Orthographic text (default)

Note: to reset the type of input, don’t forget to insert \\toi=orth\\ after the word.


              
tts.setLanguage("English")
tts.say("\\toi=lhp\\'zi.R+o&U \\toi=orth\\")
# Same as
tts.say("zero")

              
             

Changing the style of the voice

Available only for Nuance Chinese, English and French packages

Insert \\style=value\\ in the text. The possible values are:

  • neutral (default value)
  • joyful
  • didactic

Note: when you change the style, the style set is saved for the next sentences.


              
tts.say("\\style=joyful\\ Today I am feeling happy.")
tts.say("And now I speak with a joyful voice.")
tts.say("\\style=didactic\\ I can explain to you how my ears work.")
tts.say("And now I speak with a didactic voice.")
tts.say("\\style=neutral\\ Everything is normal.")
tts.say("And now I speak with a neutral voice.")

              
             

Voxygen only

Inserting phonetic text

Available only for Voxygen packages

By default, the Voxygen engine treats the input as orthographic text, but it also supports phonetic text. The tag \\phoneme=value\\ reads the value using the x-voxygen phonetic alphabet (Voxygen-specific alphabet):


              
tts.setLanguage("English")
tts.say("\\phoneme=h_@_l_ou\\")
# Same as
tts.say("Hello")

              
             

Setting the word prominence level

Available only for Voxygen packages

Insert \\emph=value\\ in the text. The possible values are:

  • 0: Reduced
  • 1: Moderate
  • 2: Strong

              
tts.say("Hello my \\emph=0\\ friends") # reduced
tts.say("Hello my \\emph=1\\ friends") # moderate
tts.say("Hello my \\emph=2\\ friends") # strong

              
             

Controlling the read mode

Available only for Voxygen packages

Insert \\sayas=value\\ in the text. The possible values are:

  • date: Date mode
  • time: Time mode
  • telephone: Telephone number spelling
  • characters: Character mode (similar to spelling)
  • cardinal: number spelling in cardinal mode
  • ordinal: number spelling in ordinal mode

              
tts.say("\\sayas=date\\ 19/11/2015")
tts.say("\\sayas=time\\ 10:00 AM")
tts.say("\\sayas=telephone\\ My phone number is 0556340548")
tts.say("\\sayas=characters\\ Combo")
tts.say("\\sayas=cardinal\\ There are 42 apples.")
tts.say("\\sayas=ordinal\\ Great ! 1 place !")

              
             

Inserting a digital audio recording

Available only for Voxygen packages

Insert \\audio=path\\ in the text. The path is the path to the audio file on the robot.


              
tts.say("\\audio=/usr/share/naoqi/wav/0.wav\\")

              
             

Changing the timbre of the voice

Insert \\timbre=value\\ in the text. The value is between 20 and 200 in %. Default value is 100.


              
# Say the sentence with a timbre of 70%
tts.say("\\timbre=70\\Hello my friends")

              
             

Changing the style of the voice

Available only for Voxygen English

Insert \\style=value\\ in the text. The possible values are:

  • neutral (default value)
  • playful
  • narrative

Note: when you change the style, the style set is saved for the next sentences.


              
tts.say("\\style=playful\\ Today I am feeling happy.")
tts.say("And now I speak with a playful voice.")
tts.say("\\style=narrative\\ I can explain to you how my ears work.")
tts.say("And now I speak with a narrative voice.")
tts.say("\\style=neutral\\ Everything is normal.")
tts.say("And now I speak with a neutral voice.")

              
             

AiTalk only

Setting the word prominence level

Available only for AiTalk packages

Insert \\emph=value\\ in the text. The value is between 0 and 200 in %. Default value is 100.


              
tts.say("こんにちは\\emph=200\\マスター。\\emph=100\\")

              
             

Increasing speech volume

Available only for the Japanese language package

Set the parameter enableCompression to activate the audio compression. This will increase the speech volume.


              
tts.say("こんにちは!普通の声で話す!")
tts.setParameter("enableCompression",True)
tts.say("こんにちは!大きいな声で話す!")

              
             

Acapela only

For Acapela-specific tags, refer to:

Acapela Mobility Text TAGS Documentation