Mastering Pronunciation

Basic tweaks

About the voice of the robot

The voice you hear from your robot is generated automatically from written text by a text-to-speech (TTS) engine that relies on a language package. The language package contains a database of recorded speech and the rules needed to generate the proper pronunciation. To create the speech database for a given language package, a vocal talent (a native speaker) recorded her voice, which was then processed into as many speech fragments as possible. These fragments are phonemes, and they can be combined to recreate virtually any word. Because hearing a human voice come from a robot makes people uneasy (the uncanny valley effect), a filter was added to the language packages to make the voice sound slightly more artificial, and therefore more robotic.

Please note that you should never have to record someone's voice and play it as the robot's. To keep speech uniform, inserting a sound file in the robot's speech is reserved for sound design. You should therefore always use the language package of the language you want the robot to speak and understand. If the pronunciation is not perfect, this lesson gives you some useful tips on how to improve it!

Text editing tools

Several tools let you tweak the robot's speech; pick the one that suits your needs and preferences.

The Robot's webpage

Click on your robot's chest button to obtain his IP address, and type it in your browser to access his webpage.
Select the "settings" tab and check that your robot's language is correct for the text you're about to make him say.

Choosing your robot's language via the webpage.

Select the first tab again ("my robot"), write or paste the text you want to check in the text box and press "enter" (don't forget to check the volume level beforehand!).

The robot's webpage
You can remote-control Pepper via its webpage and make it say a short sentence such as `Hello world!` by editing the text in a text box.

It's quick and easy, but the text box is small, so if you want to tweak more than a sentence or two, you will find it easier to use a text editor or Choregraphe (see below).

Qicli command

Click on your robot's chest button to obtain his IP address, and open a terminal to connect:

ssh nao@10.0.xxx.xxx
or
rssh 10.0.xxx.xxx

(for Windows users, enter your robot's IP address in the appropriate box)

Note: even though it says "nao", this works for Pepper too!

You can then make your robot speak with the following command:

qicli call ALTextToSpeech.say "Hello world!"

As with the robot's webpage, this is a quick and easy way to test your tweaks, but for more than a sentence or two, you will find it easier to use a text editor or Choregraphe (see below).
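
When you test many sentences from the terminal, shell quoting is an easy place to slip. The sketch below, in Python, only formats the `qicli` command shown above with safe quoting. The helper name `build_say_command` is hypothetical, and nothing here connects to the robot: you still paste the printed line into your SSH session.

```python
import shlex


def build_say_command(text):
    """Return the qicli command line that makes the robot say `text`.

    Hypothetical helper: it only formats the command shown in this lesson
    and quotes the text so the shell passes it as a single argument.
    """
    if not text.strip():
        raise ValueError("text must not be empty")
    return "qicli call ALTextToSpeech.say " + shlex.quote(text)


if __name__ == "__main__":
    # Paste the printed lines into the SSH session opened on the robot.
    print(build_say_command("Hello world!"))
    print(build_say_command("Oh, wait!"))
```

The text is wrapped in single quotes, which the shell leaves untouched. That becomes useful later in this lesson, once your sentences contain backslashes.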

Choregraphe

Setting up Choregraphe to edit and test your outputs takes a bit longer than using the webpage or a terminal, but you won't have to copy and paste all the time, and it lets you catch typos that could cause compilation errors in dialogs later on.

Please note that the following steps are for Choregraphe 2.8; earlier versions may differ.

First, you will have to activate some parts of the interface that are not present by default: go to the top bar, select "View", and tick the boxes for "Script editor" and "Dialog" ("Log viewer" is also recommended to monitor what happens).

Choregraphe: activating views

Then, create a Dialog Topic. A Dialog Topic is a script file containing the rules of a human-robot conversation as a multilingual set of QiChat scripts. You can create and edit them with Choregraphe 2.8 as below.

Choregraphe: creating a dialog topic

Name your new dialog topic and select at least one language, as below:

Choregraphe: naming the new dialog topic

By default, the only language available is English. You can choose to support more languages by clicking on "Properties", and ticking the ones you want to support:

Choregraphe: selecting more languages

This way, they will be available in the topic. If you want to support several languages, you will have a .top file for each one.

Open the English .top file:

Choregraphe: opening the .top file

Note: check that your robot's language matches the language of the .top file you are working on.

Then write a test rule as below:

u:(test)
Hello world!

Choregraphe: writing a dialog

You can then copy/paste the robot's output in the webpage or terminal to check it, or use Choregraphe to make the robot talk directly.

Drag and drop the .dlg file to the behavior section to create the dialog box, and link it to the start button:

Choregraphe: creating a dialog box

Play the behavior:

Choregraphe: playing the behavior

Say "test" to your robot, or type it in the dialog widget, and your robot will say the output:

Choregraphe: entering an input in the dialog widget

If you want to correct it, stop and restart the behavior to take the changes into account.

Note: on 2.9, the only tool available is Android Studio.

Testing examples

In the following lesson, you will be able to listen to the various examples directly by clicking on the sound files, but you can also test them on your robot by copy/pasting them via the webpage. Don't forget to check that your robot's language is correct for the text you're about to make him say!
You can change your robot's language here:

Choosing your robot's language via the webpage.

Punctuation

When you want the robot to say something, you have to write the sentence, which is then converted to speech. Written and spoken languages are very different, and we all have unconscious bias when using one or the other. Therefore, when writing the robot's outputs, it is very important to keep in mind that these sentences will be said, not read, and to adapt them accordingly.

That being said, one writing convention should still be respected: start each sentence with an uppercase letter and finish it with end punctuation, so that the TTS engine can tell sentences apart. This makes a difference in the prosody, because each sentence is treated separately.

TTS takes care of the intonation automatically: it naturally falls at the end of a non-interrogative sentence, which is why end punctuation is important. Please note that for the TTS engine, there is no difference between a full stop (".") and an exclamation mark ("!"): both will make the intonation go down. But for interrogative sentences, the intonation should rise instead, which is why the question mark ("?") has a different effect on the sentence prosody.
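
The convention above is easy to check mechanically before you send text to the robot. Here is a minimal sketch in Python, assuming English-style punctuation and that sentences are separated by whitespace after ".", "!" or "?". The function name `check_sentences` is hypothetical, and the splitting rule is deliberately naive: abbreviations or sentences that end with a closing quote will confuse it.

```python
import re

END_PUNCTUATION = (".", "!", "?")


def check_sentences(text):
    """Return a list of (sentence, problem) pairs for the given output text."""
    problems = []
    # Naive split: a sentence ends at ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    for sentence in sentences:
        if sentence[0].islower():
            problems.append((sentence, "starts with a lowercase letter"))
        if not sentence.endswith(END_PUNCTUATION):
            problems.append((sentence, "has no end punctuation"))
    return problems


if __name__ == "__main__":
    for sentence, problem in check_sentences("Oh, wait! it works"):
        print("{!r}: {}".format(sentence, problem))
```

An empty list means every sentence starts and ends as the TTS engine expects; it says nothing about how natural the sentence sounds.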

Pauses

Commas (",") make a small pause in the speech and help separate words better when needed, so don't use them as you would in writing, but consider where the robot should make a pause in his speech to sound natural, as in the example below:

"Oh wait!" is not very natural, "Oh, wait!" is better.

Sound file for 'Oh wait!'

Sound file for 'Oh, wait!'

If the pauses from commas or periods still don't sound right, you can define their duration more precisely with a pause tag.

"\pau=value\" (value between 1 and 30000): pauses TTS for <value> milliseconds.

For example:

  • What you read: "Under Louis the 14th's reign, in the great city of Paris the cardinal is plotting. The bravest of the musketeers is fighting for lost causes and the love of his life."
  • How you should tweak it:
Under Louis the 14th's reign, in the great city of Paris \pau=300\ the cardinal is plotting. \pau=500\ The bravest of the musketeers \pau=10\ is fighting for lost causes and the love of his life.

Sound file for `Under Louis the 14th's reign, in the great city of Paris the cardinal is plotting. The bravest of the musketeers is fighting for lost causes and the love of his life.`

Notice the full stop followed by a pause: the end punctuation is important for the intonation, while the added pause gives the speech a storytelling effect.
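
If you build such sentences from a script, a small helper can keep the pause values inside the allowed range. This is a minimal sketch in Python: the names `pause_tag` and `join_with_pauses` are hypothetical, and the only facts taken from this lesson are the tag format and the 1 to 30000 range.

```python
MIN_PAUSE_MS = 1
MAX_PAUSE_MS = 30000


def pause_tag(milliseconds):
    """Return the pause tag for the given duration in milliseconds."""
    if not MIN_PAUSE_MS <= milliseconds <= MAX_PAUSE_MS:
        raise ValueError("pause must be between 1 and 30000 ms")
    return "\\pau={}\\".format(milliseconds)


def join_with_pauses(parts):
    """Join text chunks (str) and pause durations (int) into one string."""
    pieces = []
    for part in parts:
        if isinstance(part, int):
            pieces.append(pause_tag(part))
        else:
            pieces.append(part.strip())
    return " ".join(pieces)


if __name__ == "__main__":
    # Rebuilds the tweaked sentence from the example above.
    print(join_with_pauses([
        "Under Louis the 14th's reign, in the great city of Paris", 300,
        "the cardinal is plotting.", 500,
        "The bravest of the musketeers", 10,
        "is fighting for lost causes and the love of his life.",
    ]))
```

Keeping the durations as plain numbers next to the text makes it quicker to adjust a pause and listen again.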

Important note: using tags

To avoid compilation errors when using the tags, be careful to:

  • use uppercase for the body of the tags to avoid them being mistaken for escape characters. Using double backslashes can work too, but some tags are not recognized this way, so uppercase is a safer bet.
  • always leave a space between a quote and the backslash of a tag.
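
Both precautions can be applied automatically before you paste text into a dialog file. The sketch below, in Python, assumes that a tag always looks like a backslash, a name, "=", a value and a closing backslash (as the pause tag does), and that only the tag name needs to be uppercased. The function name `tidy_tags` is hypothetical; check the result against your own tags before relying on it.

```python
import re

# Assumed tag shape: \name=value\ with an optional double quote touching it.
TAG_PATTERN = re.compile(r'("?)\\([A-Za-z]+)=([^\\\s"]+)\\("?)')


def tidy_tags(text):
    """Uppercase tag names and put a space between quotes and tags."""

    def fix(match):
        quote_before, name, value, quote_after = match.groups()
        tag = "\\" + name.upper() + "=" + value + "\\"
        if quote_before:
            tag = quote_before + " " + tag
        if quote_after:
            tag = tag + " " + quote_after
        return tag

    return TAG_PATTERN.sub(fix, text)


if __name__ == "__main__":
    print(tidy_tags('"\\pau=300\\ Hello world!"'))
```

Running the function on text that is already tidy leaves it unchanged, so it can safely be applied more than once.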

Spelling & Pronunciation

When the robot mispronounces words or proper nouns, the most basic tweak is to change the spelling to force the pronunciation. Be careful though: you can't use this tweak if you intend to display the same text on Pepper's tablet.

The most common mispronunciations are:

  • Foreign words

"Change for line 6 at Charles de Gaulle Étoile." vs "Change for line 6 at sharl dugawl aytwol."

Sound file for `Change for line 6 at Charles de Gaulle Étoile.`

Sound file for `Change for line 6 at sharl dugawl aytwol.`

  • Brand names

"Louis Vuitton" vs "luy vueeton"

Sound file for `Louis Vuitton`

Sound file for `luy vueeton`

  • Proper nouns

"Nice to meet you Line!" vs "Nice to meet you Lynn!"

Sound file for `Nice to meet you Line!`

Sound file for `Nice to meet you Lynn!`
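
Because a respelled text can no longer be shown on the tablet, it helps to keep the normal spelling as the reference and derive the spoken version from it. Here is a minimal sketch in Python, using the respellings from the examples above. The function name `respell_for_speech` is hypothetical, and matching is case-sensitive on whole words, so the name "Line" is replaced while "line 6" is not.

```python
import re


def respell_for_speech(text, respellings):
    """Return `text` with each listed word or phrase replaced by its respelling.

    `respellings` maps the normal spelling to the spelling that sounds right.
    The original `text` is left untouched so it can still be displayed.
    """
    spoken = text
    # Replace longer entries first so that phrases win over single words.
    for written in sorted(respellings, key=len, reverse=True):
        pattern = r"(?<!\w)" + re.escape(written) + r"(?!\w)"
        spoken = re.sub(pattern, lambda m, r=respellings[written]: r, spoken)
    return spoken


if __name__ == "__main__":
    respellings = {
        "Charles de Gaulle Étoile": "sharl dugawl aytwol",
        "Louis Vuitton": "luy vueeton",
        "Line": "Lynn",
    }
    display_text = "Change for line 6 at Charles de Gaulle Étoile."
    print(display_text)                                   # text for the tablet
    print(respell_for_speech(display_text, respellings))  # text for the TTS
```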

Japanese is a very interesting language for tweaking, because you have to work with not one but three writing systems: hiragana, katakana and kanji. Words are usually written in one of them, or in a combination of kanji and hiragana. Hiragana and katakana are phonetic, but kanji are more complex: each one has several possible pronunciations and also carries a meaning.
Tweaking in Japanese usually involves mixing all three, regardless of common usage and meaning, to obtain the best pronunciation. The result is extremely hard for a casual reader to make sense of.

For example, "battery":

Normal writing, in katakana: バッテリー

Tweak mixing katakana, hiragana and kanji: ばッ照りぃ

(the kanji for "shining" is used for its phonetic value, regardless of the meaning).

Here are some more examples:

20歳: ハタチ ("twenty years old", normal writing with number and kanji, tweak with katakana)

カレー: 彼絵 ("curry", normal writing with katakana, tweak with the kanji for "he" and "painting")

もう一回: モー一回 ("one more time", normal writing with hiragana and kanji, tweak with katakana and kanji)

おはよう: お葉よう ("good morning", normal writing with hiragana, tweak with hiragana and the kanji for "leaf")

If you want to correct the pronunciation of a given word or phrase in more than one place, or simply keep your text cleaner, you can use skins (only in a Dialog Topic file, which uses the QiChat language) as below.

"s:(Output) ^replace(Expression1, Expression2, Frequency)" replaces Expression1 with Expression2 in each matching Output with the given Frequency (value between 0 and 1, with 0: never and 1: always).

For example, the following skin will always replace おはよう with the correct pronunciation お葉よう:

s:({}おはよう{}) ^replace(おはよう, お葉よう, 1)
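
If you maintain many corrections, you can generate the skin lines from a table instead of typing each one. The sketch below, in Python, simply copies the shape of the skin above, including the way the word is wrapped in the output pattern, and fills in the Japanese examples from this lesson. The function name `make_skin` is hypothetical; paste the generated lines into your .top file and test them as usual.

```python
def make_skin(written, spoken, frequency=1):
    """Format a skin line that replaces `written` with `spoken`."""
    if not 0 <= frequency <= 1:
        raise ValueError("frequency must be between 0 and 1")
    # Same shape as the skin shown above: s:({}WORD{}) ^replace(WORD, TWEAK, F)
    return "s:({{}}{0}{{}}) ^replace({0}, {1}, {2:g})".format(
        written, spoken, frequency
    )


# Normal writing -> tweak, taken from the examples in this lesson.
TWEAKS = {
    "バッテリー": "ばッ照りぃ",
    "20歳": "ハタチ",
    "カレー": "彼絵",
    "もう一回": "モー一回",
    "おはよう": "お葉よう",
}

if __name__ == "__main__":
    for written, spoken in TWEAKS.items():
        print(make_skin(written, spoken))
```

Keeping the table in one place also gives you a readable list of every tweak in use, which helps when the tweaked spellings themselves are hard to read.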

Now that you've got some basic clues for fixing mispronunciations and changing the prosody with punctuation and pauses, you can go further with all the subtle variations of TTS tags. On to the next step!