Robot Dialogue Guidelines

What you should know before writing dialogues for Pepper and NAO

Writing dialogue for robots may seem easy, but there is actually a lot to keep in mind: from naming conventions to best practices for writing good inputs and outputs, plus some tips about scenarios and how to best use Pepper's tablet. Let's check all this together and see how to make great dialogues for Pepper and NAO!

What is dialogue?

This article will give you a broad overview of the best practices and guidelines to make sure your dialogues are coherent and easy to maintain while giving the best results for a successful interaction.

Some parts are specific to NaoQi and therefore not applicable to QiSDK, while others focus explicitly on QiChat, SoftBank Robotics' proprietary language for dialogue. Apart from these specific tips, the advice given here can and should be applied to any method of creating a dialogue for Pepper and NAO.

First, let's make sure that everyone is on the same page with some basic dialogue vocabulary:

  • Inputs are what the robot can understand and respond to. Not all inputs are verbal: the robot can react to other stimuli, such as a touch on his sensors.
  • Outputs are what the robot says or does, either in answer to an input or as a proactive query.
  • A dialogue is a conversation between a robot and a human. The robot can simply answer the user's questions, but he can also lead the conversation proactively; in that case, an additional scenario is necessary to make the dialogue experience meaningful and enjoyable.

There are different rules for writing inputs and outputs, as they don't have the same goal:

Inputs are all about understanding, so the more you write, the better! Use as many variations as possible so that users can speak as they usually do and don't have to read a cheat sheet to know exactly what to say. Add some slang and oral variations to make sure the robot understands all users, but be careful: inputs have to be spelled correctly for speech recognition to work at its best.

Outputs, on the other hand, are more restricted in their variations. They need some variety too, so that the robot doesn't seem to repeat himself, but you want him to speak well at all times, so there will be fewer possibilities, and absolutely no slang! Outputs don't have to be spelled correctly, though: in fact, you will often find that you have to alter them to achieve perfect pronunciation (more on that later).

Finally, don't forget that Pepper and NAO can, and should, move! Conversation between humans is not only about words: body language is important too, so keep it in mind when writing dialogues.

Let's go into more details, starting with the syntax guidelines to keep your code clean and smooth.

1. Best practices for QiChat syntax

This is about the cosmetic part of writing topics (a file format containing the dialogue rules written with the QiChat language), to keep the syntax coherent and easy to read. For more specific instructions about using QiChat, you can check out this article.

Naming conventions

There are specific naming conventions for each item, but you will find a lot of common rules.

Dialog topic names
  • Use English.
  • Use lowercase.
  • Use underscores to separate words.
  • Never use any punctuation other than underscores.
  • Start topic names with "dlg_".
topic:~dlg_test()
QiChat concept names
  • Use English.
  • Use lowercase.
  • Use underscores to separate words.
  • Never use any punctuation other than underscores.
concept:(my_concept)
concept:(simple)
Bookmarks
  • Use English.
  • Use uppercase.
  • Use underscores to separate words.
  • Never use any punctuation other than underscores.
%BOOKMARK
%ANOTHER_BOOKMARK
QiChat variables

NaoQi only, not applicable to QiSDK

  • Use English.
  • Don't use any punctuation.

For simple ALMemory variables, use CamelCase starting with an uppercase letter.

$MyApplication/MyVariable
$MyVariable

For user variables, start with "$user/" then use camelCase starting with a lowercase letter.

$user/myVariable

For input and output variables, use their exact name (names are case-sensitive).

$onStopped
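Because these naming rules are so regular, you may want to check them automatically before committing a topic. The sketch below is a hypothetical lint helper (the `PATTERNS` table and `follows_convention` function are our own illustration, not part of NAOqi or QiSDK); it assumes names are passed without their `~`, `%` or `$` prefix and, as an assumption, allows digits, which the rules do not mention. Input and output variables are left out since they simply copy an existing name.

```python
import re

# Hypothetical lint patterns derived from the naming rules above.
# Digits are allowed here as an assumption; the rules do not mention them.
PATTERNS = {
    # Topics: lowercase words separated by underscores, starting with "dlg_".
    "topic": re.compile(r"dlg_[a-z0-9]+(?:_[a-z0-9]+)*"),
    # Concepts: lowercase words separated by underscores.
    "concept": re.compile(r"[a-z0-9]+(?:_[a-z0-9]+)*"),
    # Bookmarks: uppercase words separated by underscores.
    "bookmark": re.compile(r"[A-Z0-9]+(?:_[A-Z0-9]+)*"),
    # ALMemory variables: "/"-separated segments starting with an uppercase letter.
    "memory": re.compile(r"[A-Z][A-Za-z0-9]*(?:/[A-Z][A-Za-z0-9]*)*"),
    # User variables: "user/" then camelCase starting with a lowercase letter.
    "user": re.compile(r"user/[a-z][A-Za-z0-9]*"),
}


def follows_convention(kind: str, name: str) -> bool:
    """Check a name written without its ~, %, or $ prefix."""
    return PATTERNS[kind].fullmatch(name) is not None
```

For example, `follows_convention("topic", "dlg_test")` and `follows_convention("user", "user/myVariable")` return `True`, while `follows_convention("bookmark", "Another_Bookmark")` returns `False`.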

Indentation and syntax

Rules and subrules

  • Don't indent u: rules and proposals.
  • Indent u1: rules once, u2: rules twice, etc.
  • Put outputs below the rule, with the same indentation.
proposal:
Coffee is one of the most popular products in the whole world.
Can you tell me how much you like coffee? 
    u1:(I like it)
    Cool! Coffee rules!
    u1:(it's okay)
    Fair enough!
    u1:(I don't like it)
    Even though you don't like coffee, we could still talk about it, right?
        u2:(~yes)
        Great!
        u2:(~no)
        Ok, no problem!
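The indentation rule ties each rule's depth to its prefix (u: and proposal: at level 0, u1: at level 1, and so on), so it can be verified mechanically. Here is a minimal sketch of such a check, assuming one indentation level is four spaces as in the example above; `indentation_errors` is a hypothetical helper, not a SoftBank Robotics tool, and it only inspects rule lines, not outputs.

```python
import re

# Matches u:, u1:, u2:, ... or proposal: at the start of a line,
# capturing the leading whitespace and the subrule depth (if any).
RULE = re.compile(r"^(?P<lead>[ \t]*)(?:u(?P<depth>\d*):|proposal:)")


def indentation_errors(topic_text: str, unit: str = "    ") -> list[tuple[int, str]]:
    """List (line number, message) pairs for rules with the wrong indentation.

    Assumes one indentation level equals `unit` (four spaces by default).
    """
    errors = []
    for number, line in enumerate(topic_text.splitlines(), start=1):
        match = RULE.match(line)
        if match is None:
            continue  # not a rule line (output, comment, blank line...)
        # "u:" and "proposal:" have no depth digits, so they count as level 0.
        depth = int(match.group("depth") or 0)
        if match.group("lead") != unit * depth:
            errors.append((number, f"expected {depth} indentation level(s)"))
    return errors
```

Running it on the coffee example returns an empty list; a `u2:` rule written at the start of a line would be reported with "expected 2 indentation level(s)".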

Lists

  • Put each item on its own indented line if the items are long or if there are more than two of them.
u:(How are you?)
^rand[
    "Great!"
    "I'm feeling good!"
    "I'm fine!"
]
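If you generate topics from a script or spreadsheet, you can apply this list rule when writing the file. The following is a hypothetical sketch (`format_rand` is our own name): the 30-character threshold for a "long" item is an arbitrary assumption, and items are assumed to contain no double quotes.

```python
def format_rand(items: list[str], max_inline_length: int = 30) -> str:
    """Build a QiChat ^rand list from output strings.

    Items go on their own indented lines when there are more than two of them
    or when any item is longer than `max_inline_length` characters (an
    arbitrary threshold for this sketch). Items must not contain double quotes.
    """
    quoted = [f'"{item}"' for item in items]
    if len(items) > 2 or any(len(item) > max_inline_length for item in items):
        body = "\n".join(f"    {q}" for q in quoted)
        return f"^rand[\n{body}\n]"
    # Two short items can stay on one line.
    return "^rand[" + " ".join(quoted) + "]"
```

With the three answers from the example, it reproduces the multi-line list shown above; with two short items such as "Yes!" and "Sure!", it keeps them inline.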

Comments

  • Put comments above the rule being commented, with the same indentation.
#The robot says hello.
proposal:
Hello.

TTS markers

NaoQi only, not applicable to QiSDK

  • Use spaces around the markers.
  • If they are used in code, double the backslashes to avoid syntax errors.
- Hello\pau=500\\vct=200\world!
+ Hello \pau=500\ \vct=200\ world!
+ Hello \\pau=500\\ \\vct=200\\ world!
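Doubling matters because most programming languages treat a single backslash as the start of an escape sequence. In Python, for instance, `"\v"` is a vertical-tab character, so `\vct=200\` written with single backslashes in a normal string would silently lose its marker. The sketch below shows two safe spellings (doubled backslashes or a raw string) before the text is handed to a NAOqi text-to-speech call; `with_pause` is a hypothetical helper.

```python
# Doubled backslashes: each "\\" becomes one backslash in the final text.
doubled = "Hello \\pau=500\\ \\vct=200\\ world!"
# A raw string keeps single backslashes as they are.
raw = r"Hello \pau=500\ \vct=200\ world!"
# Both spell exactly the marker text the text-to-speech engine should receive.
assert doubled == raw
# No vertical tab slipped in, unlike "\vct" written with a single backslash.
assert "\v" not in doubled


def with_pause(before: str, after: str, milliseconds: int) -> str:
    """Join two phrases with a pause marker, keeping spaces around it."""
    return f"{before} \\pau={milliseconds}\\ {after}"
```

Note that a raw string cannot end with a single backslash, so doubling is the safer habit when a marker closes the string.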

Ordering

NaoQi only, not applicable to QiSDK

Lexicons and Basic Channel topics should be included at the top of the .top file, just below the header. Then add the specific concepts if necessary, and finally the u: and proposal rules.

topic:~dlg_test()
language:enu

include: lexicon_enu.top
include: dialog_adjust_volume/dlg_adjust_volume/dlg_adjust_volume_enu.top

concept:(my_concept)[...]
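The ordering rule can also be checked automatically: each top-level line belongs to a section, and sections must never go backwards. A minimal sketch, assuming top-level declarations start at the beginning of the line as in the example above; `ordering_errors` is a hypothetical helper and ignores indented subrules.

```python
from typing import Optional

# Rank of each kind of top-level line, following the ordering described above:
# header (topic, language), then includes, then concepts, then rules.
RANKS = {"topic:": 0, "language:": 1, "include:": 2, "concept:": 3,
         "u:": 4, "proposal:": 4}


def line_rank(line: str) -> Optional[int]:
    """Return the rank of a top-level line, or None for any other line."""
    for prefix, rank in RANKS.items():
        if line.startswith(prefix):
            return rank
    return None


def ordering_errors(topic_text: str) -> list[int]:
    """Return the numbers of lines that appear after a later section started."""
    errors = []
    highest = -1
    for number, line in enumerate(topic_text.splitlines(), start=1):
        rank = line_rank(line)
        if rank is None:
            continue
        if rank < highest:
            errors.append(number)
        highest = max(highest, rank)
    return errors
```

For instance, an `include:` line placed after a `u:` rule is reported, since includes belong just below the header.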

Thank you for sticking with us through these technical guidelines and conventions. We know they are not the most exciting part of this article, but they are important nonetheless, as clean and coherent code is easier to read and maintain. Now that you know how to make your QiChat look good, let's see how to make it work well!

2. Best practices for inputs

Inputs are the dialogue equivalent of the underwater part of an iceberg: unseen even though far bigger than the part above the surface, and providing a stable base for the rest. Indeed, in order to answer correctly, the robot first has to understand what was said, so there is no good output without solid inputs. Here are the golden rules:

Never write inputs in a language other than the one defined for the topic.

Always use correct spelling.

Inputs should be at least 3 syllables long to avoid overmatching (except for subrules that can use wordspotting).
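To spot inputs that are probably too short, a rough syllable estimate is often enough. The sketch below is a crude English-only heuristic of our own (vowel groups per word, minus a silent final "e"); it will miscount some words and is no substitute for testing on the robot. Remember that wordspotting subrules are exempt from the rule.

```python
import re


def estimate_syllables(text: str) -> int:
    """Roughly estimate the number of syllables in an English input.

    Crude heuristic: count groups of vowels (including "y") in each word and
    drop a final silent "e" (but not "-le" or "-ee").
    """
    total = 0
    for word in re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower()):
        count = len(re.findall(r"[aeiouy]+", word))
        if word.endswith("e") and not word.endswith(("le", "ee")) and count > 1:
            count -= 1
        total += max(count, 1)  # every word has at least one syllable
    return total


def is_long_enough(text: str, minimum: int = 3) -> bool:
    """Flag inputs that are likely too short to avoid overmatching."""
    return estimate_syllables(text) >= minimum
```

With this heuristic, "yes" counts as one syllable and is flagged, while "I like it" reaches three.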

Watch out for conflicts between inputs:

  • Don't duplicate inputs in different rules at the same level or in the same focus.
  • Avoid phonetically close inputs in different rules.
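Exact duplicates are easy to find with a script once you have listed the inputs of the rules that share a level or a focus. A minimal sketch, assuming you have already extracted those inputs into a dictionary (`duplicated_inputs` is hypothetical); it ignores case and extra spaces, but it cannot detect phonetically close inputs, which still need testing on the robot.

```python
from collections import defaultdict


def duplicated_inputs(rules: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map each input used by more than one rule to the names of those rules.

    `rules` maps a rule name (for example a bookmark) to the inputs of rules
    at the same level or in the same focus. Case and extra spaces are ignored.
    """
    seen = defaultdict(list)
    for rule_name, inputs in rules.items():
        for text in inputs:
            key = " ".join(text.lower().split())  # normalize case and spacing
            if rule_name not in seen[key]:
                seen[key].append(rule_name)
    return {key: names for key, names in seen.items() if len(names) > 1}
```

For example, if both a COFFEE rule and a TEA rule accept "I like coffee", the function reports `{"i like coffee": ["COFFEE", "TEA"]}`.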

Write as many variations as you can, depending on the context.

Don't forget that inputs are meant to catch what the user says: you can add some slang and oral variations to be sure to understand everyone.

Write numbers both as digits and spelled out.

u:([2 two])
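When inputs are generated rather than typed by hand, a small lookup table keeps the digit and word forms together. This is a hypothetical, English-only sketch covering 0 to 12; other languages and larger numbers need their own tables.

```python
# Spelled-out forms for small numbers (English only, 0 to 12 in this sketch).
NUMBER_WORDS = ["zero", "one", "two", "three", "four", "five", "six",
                "seven", "eight", "nine", "ten", "eleven", "twelve"]


def number_choice(n: int) -> str:
    """Return a QiChat choice accepting both the digits and the word."""
    return f"[{n} {NUMBER_WORDS[n]}]"


def number_input(n: int) -> str:
    """Wrap the choice in a u: rule input, as in the example above."""
    return f"u:({number_choice(n)})"
```

`number_input(2)` produces exactly the `u:([2 two])` input shown above.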

Only capitalize named entities and exceptions such as "I".

Don't use punctuation, except for "?" if your inputs are meant to be displayed.

Actions

Always validate with the user before doing an action that can potentially interrupt or disrupt the interaction.

User: Speak English.
Robot: Do you want me to speak English?
User: Yes.
Robot: Okay, let's speak English!

Language-dependent rules

For the languages that use the T–V distinction, always use the T form.

For English, use both contracted and non-contracted forms ("do not" and "don't" for example).
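If you write many English inputs, a helper can expand full forms into a choice that also accepts the contraction. The sketch below assumes QiChat's choice syntax with quoted multi-word options, as in `["do not" don't]`; the contraction list and the function names are illustrative assumptions, not an exhaustive or official resource.

```python
import re

# A few English pairs (full form, contraction); illustrative, not exhaustive.
CONTRACTIONS = [
    ("do not", "don't"),
    ("I am", "I'm"),
    ("it is", "it's"),
    ("cannot", "can't"),
]


def contraction_choice(full_form: str) -> str:
    """Return a QiChat choice accepting both the full and the contracted form."""
    short = dict(CONTRACTIONS)[full_form]
    # Options containing spaces are quoted inside a QiChat choice.
    options = [f'"{form}"' if " " in form else form for form in (full_form, short)]
    return "[" + " ".join(options) + "]"


def with_contractions(phrase: str) -> str:
    """Replace every known full form in an input with a two-form choice."""
    for full_form, _ in CONTRACTIONS:
        # Word boundaries avoid matching inside other words ("bit isn't").
        pattern = rf"\b{re.escape(full_form)}\b"
        phrase = re.sub(pattern, contraction_choice(full_form), phrase)
    return phrase
```

For example, `with_contractions("I do not know")` returns `I ["do not" don't] know`.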

Now you have a robot that is able to understand your users reliably thanks to your well-written inputs, but the users will only know they have been heard and understood through the robot's answers, so let's check out how to write good outputs!

3. Best practices for outputs

Don't forget that human communication is filled with a lot of subtle signals, like facial expressions, body language, emphasis, and intonations. The robot cannot imitate many of these subtleties because his face is static, his gestures are not as flexible as a human's and his voice uses text-to-speech software. Therefore, always make sure that the outputs are clear and unambiguous.

Remember that spoken and written languages are two distinct methods of communication: we do not speak the way we write. The robot uses spoken language, neither formal nor too familiar, and certainly not vulgar. Be careful to adapt the robot's vocabulary to the audience.

Always acknowledge what the user said and give feedback before moving on. The feedback doesn't necessarily have to be verbal: depending on the situation, a sound or an animation can work just as well, or even better.

When an error occurs, inform the user of what went wrong and how to fix it.

Don'ts

The robot must not express anger or irritation towards the user, nor put them down: the robot can sometimes be mad at himself, but at no one else.

To preserve a bonding experience, the robot is not self-oriented, and this shows in the way he speaks to users. Make sure he uses words such as "we", "us" and "together" instead of "you", "me" and "I" whenever possible.

The robot should never lie about his abilities or pretend to do something he can't.

Avoid anxiety-provoking expressions such as: "Caution," "I'm warning you", "Warning", "Watch out" or "Be careful" to avoid alarming the user. Rephrase these requests to use positive or neutral phrasing instead.

- Caution, battery is critically low.
+ My battery is running low, please plug in my charger.
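Before a review pass, you can flag outputs that still contain the expressions listed above. A minimal sketch, assuming outputs are plain strings with straight apostrophes; `alarming_expressions` is a hypothetical helper, and a human still has to do the rephrasing.

```python
import re

# Expressions listed in the guideline above; matching is case-insensitive.
ALARMING = ["caution", "i'm warning you", "warning", "watch out", "be careful"]


def alarming_expressions(output: str) -> list[str]:
    """Return the alarming expressions found in an output, in list order."""
    text = output.lower()
    return [phrase for phrase in ALARMING
            if re.search(rf"\b{re.escape(phrase)}\b", text)]
```

On the two battery outputs above, the first returns `["caution"]` and the rephrased one returns an empty list.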

Slang, insults and swear words are forbidden in outputs. The robot can understand some of them, but never says them. He does not self-censor either: do not insert censoring noises such as a "beep" in place of vulgar words; simply don't use them.

The robot should never talk and make a sound at the same time.

Never leave the robot static and silent for too long during an interaction, and make sure he looks alive when standing alone.

Prosody

Create multiple outputs whenever possible, to avoid repetitions.

Always test the outputs on the robot to make sure that everything is pronounced correctly. If a part is not, you can check the related article on how to fix it here.

Try to keep outputs short, around 10 words or 4 seconds. Depending on the situation, you can use longer outputs if needed, but keep in mind that the user may not remember everything and can quickly become bored.
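A word count is a quick first filter for long outputs; the 4-second limit still has to be checked by listening on the robot. In this hypothetical sketch, NAOqi-style TTS markers such as `\pau=500\` are stripped before counting, and contractions count as one word.

```python
import re

# NAOqi-style TTS markers such as \pau=500\ or \vct=200\ are not spoken words.
MARKER = re.compile(r"\\[A-Za-z]+=[^\\]*\\")


def word_count(output: str) -> int:
    """Count spoken words, treating contractions such as "don't" as one word."""
    return len(re.findall(r"[\w']+", MARKER.sub(" ", output)))


def is_short_output(output: str, max_words: int = 10) -> bool:
    """Check the rough 10-word guideline."""
    return word_count(output) <= max_words
```

The battery output "My battery is running low, please plug in my charger." counts exactly 10 words, so it just fits.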

Use correct writing conventions: capitalize the first word, use punctuation.

Use more punctuation than you normally would: TTS is monotonous, adding commas forces the robot to make pauses, therefore making his speech more natural. With NaoQi, you can also use a specific TTS marker to customize the length of the pause: \pau=xxx\ with xxx in milliseconds (more here).

- Oh wait!
+ Oh, wait!

Tip: To know where a pause is needed, read your text out loud to detect when one seems most natural and when you need to breathe in.

Just as our voice is instantly recognizable, so is the robot's. Therefore, his sound and tone must remain consistent throughout the application. There is only one exception to this rule: if the robot imitates a character, make it clear that this is an imitation by inserting a transition between the robot's voice and the voice of the character he is imitating.

NaoQi only, not applicable to QiSDK

The robot has three vocal styles: Neutral, Joyful, and Didactic. Please use Neutral for most applications. Do not use the Joyful or Didactic style often, and when you do, make sure it doesn't sound like two different voices.

Be careful with sound design: don't make the robot laugh too much, and check that onomatopoeias fit the robot's current language.

Questions

Guide the user through the dialogue. Keep in mind that they might not know how to speak to a robot, so make it easy for them:

  • Make it clear that the robot is expecting an answer by using a question.
  • Give some context or list the possible answers.
  • Don't list more than three possible answers; if there are more possibilities, you can indicate them by saying something like "for example".
- Tell me what you want to do.
+ Do you want to see a dance, play a game or do a quiz?

Questions from the robot must always:

  • Finish with a question mark.
  • Be at the end of a spoken phrase.
  • Have enough subrules to handle all possible answers.

Use the formal interrogative syntax to ask questions (invert the verb and subject).

- Wanna play with me?
+ Do you want to play with me?

Never ask more than one question at a time.

When the robot asks a question, always place it at the end of the output so that the user can answer directly.

- What's your favorite color? Mine is white.
+ My favorite color is white, what's yours?
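Two of these rules (one question at a time, question at the end) can be checked on the output text itself. This is a hypothetical sketch that relies only on question marks, so it cannot catch a question written without one; that part of the review stays manual.

```python
def question_problems(output: str) -> list[str]:
    """Check a robot output against the question rules above."""
    problems = []
    count = output.count("?")
    if count > 1:
        problems.append("more than one question")
    if count >= 1 and not output.rstrip().endswith("?"):
        problems.append("question is not at the end")
    return problems
```

On the color example, the first version returns `["question is not at the end"]` and the corrected version returns an empty list.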

Avoid open and rhetorical questions, as the robot won't be able to handle all the possible answers. Stick to closed questions, and pay attention to their formulation to anticipate answers.

Language-dependent rules

For the languages that use the T–V distinction, always use the T form.

English

Use contractions in outputs.

French

The negative form in French must have "ne...pas" in order to be understood correctly (it is usually omitted in spoken language, but the robot must use it).

The interrogative form must either invert the verb and subject or use "est-ce que". Use no more than one interrogative pronoun (such as "who" or "which") per question.

Japanese

The robot uses 僕 for himself.

Animations

Always use Animated Speech or custom animations throughout the dialogue, except when the robot needs to stand still (for example, while scanning a QR code): the robot should always seem alive.

And now your robot is able to understand humans and answer them in a correct and entertaining way! But a dialogue is more than one question and answer, so let's check the specific rules for scenarios.

4. Best practices for scenarios

As we said before, a dialogue can be simply about answering the user's questions, but the robot can also lead them through a scenario with a series of proactive questions. In that case, the robot must of course still be able to answer questions not related to the chosen scenario.

NaoQi only, not applicable to QiSDK

We suggest that you use the Basic Channel topics alongside your application, so that the robot already has conversation basics for grounded topics (related to himself).

A scenario can be started either by the robot or the user, but afterwards the robot keeps it going. Also, please note that relying on the user to trigger it means it will not always happen.

Remember that most people have never interacted with a robot and aren't sure of what to do. Users are sometimes shy when interacting with the robot in public.

Take care of your users by creating accessible and predictable applications, guiding them by indicating what to do and how to do it, and helping them if they're in trouble in the flow of the interaction. Leave no room for guessing.

Make your scenario flexible enough to suit both newcomers and experts: provide guided steps for the former and shortcuts for the latter.

Don'ts

The robot proactively engages users in conversation, asking closed-ended questions and proposing content. Never ask fully-open questions: the answers must be restricted to a given field so that they can all be predicted (avoid questions such as "what is your favorite food?" or "what is your hobby?" for example).

However, don't use more than two successive yes/no questions: there must be some more open alternatives in between or the scenario will be boring for the user.

There should be no dead-ends or gaps in conversation flow.

Subrules

There will always be users who won't follow the scenario, so the robot has to understand more answers than he suggests.

For yes/no questions, don't expect just "yes" and "no" as answers: plan for all possibilities such as "I don't know", "I don't care", "it's up to you" or "later".

NaoQi only, not applicable to QiSDK

You can include the lexicon and use its concepts for all the suggested subrules to ensure proper variability.

For semi-open questions, don't forget to plan more than the obvious answers: for example, if the user has to choose between three games, the answers can be "game 1/2/3" but also "the first/last one", "choose for me", "I don't want to play", etc. (and all their variants of course).

Always plan for exit triggers such as "I'm done" or "I'm not interested" to allow users to leave the scenario if they wish.

Another important subrule is the "not understood" one: you can use the "e:Dialog/NotUnderstood" event to catch when the robot didn't understand what the user said, and react accordingly. It all depends on the context of the question of course, but we recommend that the first time, you simply ask the user to repeat, then if the event is triggered again, you can suggest something else, like using the tablet. Never loop, and never leave the user without a solution.

User: blah blah blah
Robot: I didn't understand. Can you repeat that please?
User: blah blah blah
Robot: I'm sorry, it seems I cannot hear you very well. Please make a choice on my tablet.
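The escalation in this exchange is a tiny state machine: count the misses for the current question and never go back to the first reply. The sketch below models only the choice of reply, in Python; in QiChat the trigger would be the e:Dialog/NotUnderstood event, and the `NotUnderstoodHandler` class is a hypothetical illustration.

```python
class NotUnderstoodHandler:
    """Escalating replies to repeated "not understood" events for one question."""

    REPLIES = [
        "I didn't understand. Can you repeat that, please?",
        "I'm sorry, it seems I cannot hear you very well. "
        "Please make a choice on my tablet.",
    ]

    def __init__(self) -> None:
        self.misses = 0

    def on_not_understood(self) -> str:
        # Never loop back to "repeat": after the last reply, keep offering
        # the tablet so the user always has a way forward.
        reply = self.REPLIES[min(self.misses, len(self.REPLIES) - 1)]
        self.misses += 1
        return reply

    def reset(self) -> None:
        """Call when the user answers successfully or a new question starts."""
        self.misses = 0
```

Resetting the counter for each new question keeps the first, gentler reply available when the context changes.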

A timer can also help, especially in B2B settings where some users leave unexpectedly. Repeat the question if the user hasn't answered after 10 seconds, but be careful not to create an infinite loop.
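The key to avoiding an infinite loop is to cap the number of repeats. Here is a minimal decision function, assuming the caller tracks the silence duration and restarts it after each repeat; the name `on_silence` and the default of one repeat are our own assumptions, not platform settings.

```python
def on_silence(elapsed_seconds: float, repeats_done: int,
               timeout: float = 10.0, max_repeats: int = 1) -> str:
    """Decide what to do while waiting for the user's answer.

    Returns "wait", "repeat" (ask the question again) or "give_up" (leave the
    scenario, e.g. because the user has walked away). The cap on repeats is
    what prevents an infinite loop; its value is an assumption.
    """
    if elapsed_seconds < timeout:
        return "wait"
    if repeats_done < max_repeats:
        return "repeat"
    return "give_up"
```

After 10 silent seconds the robot asks again once; if the user still doesn't answer, the scenario ends instead of repeating forever.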

The depth of the scenario tree must not exceed 3, and 1 or 2 is usually enough. It is better to make converging blocks, especially when several answers lead to the same point: use functions such as ^gotoReactivate in NaoQi or ^enableThenGoto in QiSDK to create links and avoid too many subrules and duplicates. Let's check how to do this with a very simplified example (for the sake of clarity, we removed many input variants and subrules, as this is only meant to demonstrate the concept, not to provide a usable scenario)!

First, with subrules mimicking a decision tree:

proposal:
I have observed that coffee is oftentimes the drink of choice for philosophers and great thinkers. Why do you think that is?
    u1:([stimulant ~i_dont_know])
    ^firstOptional["$1==stimulant That's right!"]
    Coffee is a stimulant! However, it's traditionally served hot. Which means you have to ingest it carefully. So, coffee speeds you up, yet you have to literally slow down to drink it. That's an interesting juxtaposition, isn't it?
        u2:(~yes)
        I guess that's why there is such a thing as the coffee house philosopher!
        u2:(~no)  
        Alright, maybe it was a strange idea!
        u2:(what is juxtaposition)
        I had to look it up too! It's when two things being seen or placed close together have a contrasting effect. Like drinking something very slowly to speed you up! Got it?
            u3:(~yes)
            Great!
            u3:(~no)
            Yeah, it's a bit difficult, I'm not sure I fully understood it either.

Now the same scenario, but with more rules and redirections instead:

proposal: %COFFEE_PHILOSOPHICAL
I have observed that coffee is oftentimes the drink of choice for philosophers and great thinkers. Why do you think that is?
    u1:(stimulant)
    That's right! ^gotoReactivate(COFFEE_EXPLANATION)
    u1:(~i_dont_know)
    ^gotoReactivate(COFFEE_EXPLANATION)

proposal: %COFFEE_EXPLANATION
Coffee is a stimulant! However, it's traditionally served hot. Which means you have to ingest it carefully. So, coffee speeds you up, yet you have to literally slow down to drink it. That's an interesting juxtaposition, isn't it?
    u1:(~yes)
    I guess that's why there is such a thing as the coffee house philosopher!
    u1:(~no)  
    Alright, maybe it was a strange idea!
    u1:(what is juxtaposition)
    ^gotoReactivate(JUXTAPOSITION_EXPLANATION)

proposal: %JUXTAPOSITION_EXPLANATION
I had to look it up too! It's when two things being seen or placed close together have a contrasting effect. Like drinking something very slowly to speed you up! Got it?
    u1:(~yes)
    Great!
    u1:(~no)
    Yeah, it's a bit difficult, I'm not sure I fully understood it either.

The two versions work similarly, but the second one is far easier to read and maintain: there is no need to use a condition to adapt the output or, worse, to duplicate subrules. Plus, as there are only u1 rules in the second version, if you need to change or add anything afterwards, you can easily find it and make the modification without affecting the rest of the rules.
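The depth limit (never more than 3, ideally 1 or 2) is easy to audit across a whole project. A minimal sketch: `scenario_depth` is a hypothetical helper that reports the deepest uN subrule found in a topic's text.

```python
import re

# Subrule prefixes u1:, u2:, ... at the start of a line, after optional indentation.
SUBRULE = re.compile(r"^[ \t]*u(\d+):", re.MULTILINE)


def scenario_depth(topic_text: str) -> int:
    """Return the deepest subrule level (u1 = 1, u2 = 2, ...), or 0 if none."""
    return max((int(level) for level in SUBRULE.findall(topic_text)), default=0)


def too_deep(topic_text: str, limit: int = 3) -> bool:
    """True if the scenario tree exceeds the recommended depth."""
    return scenario_depth(topic_text) > limit
```

On the two coffee versions above, it would report a depth of 3 for the decision tree and 1 for the version using redirections.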

Scenarios are tricky, because it is impossible to predict each and every person's reactions, and the robot is far less adaptable than a human would be, but we hope that with these tips, you will be able to create a great experience for your users!

5. Best practices for Pepper's tablet

The tablet is helpful during interactions because it allows Pepper to display or highlight information, signs, and conversation indicators, in addition to the verbal communication taking place. But always keep in mind that Pepper is a robot, not a tablet-holder: the focus should always be on talking to the robot, and the tablet only used as a backup assistant.

Feel free to leave the tablet screen blank most of the time. That way, when something is shown, it catches the eye, and the rest of the time the screen doesn't prompt users to tap it for information.

Nonetheless, both verbal and tablet responses should be possible at every step so that users have the choice to connect with Pepper in whichever way they prefer.

Every action should therefore be possible to initiate via voice or touch, and readable items on the tablet have to be robust verbal triggers.

You can use quotes to make it more obvious that the suggestions are sayable.

The tablet display is one of the first places users will see information: it improves user experience by showcasing content that supports Pepper's dialogue. Therefore, use the tablet for information that is difficult or awkward to communicate verbally, such as a long list of choices or a high scores table. You can also help users by displaying suggestions of what to say to Pepper at a given moment.

Always use the same wording for what Pepper says and what is displayed on the tablet (the sentence written on the tablet can be a summary of what Pepper says, but it has to use similar wording).

Display the items on the tablet in the same order in which Pepper says them.
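Wording and order consistency between speech and tablet can be checked together. A minimal sketch, assuming the tablet reuses Pepper's exact wording (case aside); `same_order` is a hypothetical helper, and summarized labels would need a looser comparison.

```python
def same_order(spoken: str, displayed: list[str]) -> bool:
    """Check that each tablet item appears in the spoken output, in order.

    Case-insensitive substring matching; assumes the tablet reuses Pepper's
    wording exactly.
    """
    text = spoken.lower()
    position = 0
    for item in displayed:
        found = text.find(item.lower(), position)
        if found == -1:
            return False  # missing from speech, or said earlier than shown
        position = found + len(item)
    return True
```

For "Do you want to see a dance, play a game or do a quiz?", the items "See a dance", "Play a game", "Do a quiz" pass, while the same items shuffled do not.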

Be careful with pronouns: inputs displayed on the tablet are what users say to Pepper, so write them from the user's point of view.

- Speak with Pepper
+ Speak with you

Animations

When it is necessary for users to focus on the tablet, Pepper should make a gesture to present the tablet and look down at it.

Dedicated animations exist in the animation library, do not hesitate to use them!

When users have to use the tablet to enter information such as an email address or phone number, or to scan a QR code, Pepper must stop moving to avoid hitting them and making the task more difficult.

Pepper's tablet is a powerful tool: it can help users navigate the conversation more smoothly, but it can also draw their attention away from the robot if used incorrectly. Remember that it is not just a tablet and should not be treated as such: it is part of Pepper, and therefore part of the experience of interacting with Pepper, but never the main focus.

Thank you for taking the time to read this far! Creating a dialogue is a long process, and full of trials and errors, but we hope that you found our recommendations insightful and that they will help you minimize the inconvenience and maximize your efficiency and effectiveness throughout the whole process, and that you will create great interactions. Pepper and NAO are humanoid robots and as such, interact mainly by voice, but not only: sound design, lifelike animations and Pepper's tablet are also very important in order to achieve a great experience overall. For more tips and know-how about writing dialogue for our robots, do not hesitate to check out the related articles and lessons on our website!

Glossary

Topic: a file format containing the dialogue rules written with the QiChat language (a series of human inputs and robot outputs).

QiChat: a SoftBank Robotics proprietary language forked from ChatScript (a combination of a natural language engine and a dialogue management system initially designed for creating chatbots). QiChat provides specific robotic functions to manage both inputs (words and human actions) and outputs (the robot's speech and movements).

Dialogue: a conversation between a person and a robot as a feature of human-robot interaction (HRI). Dialogue rules are written in a file that predefines inputs (what the robot can understand) and outputs (what the robot can answer when he understands something).