Pepper plays ball-in-the-cup

We made Pepper the robot play games of skill with AI (Artificial Intelligence) at SBRE

SBRE AI Lab (Artificial Intelligence Laboratory) taught Pepper how to successfully throw a ball in a cup and a dart at the dartboard (they are exactly the same dynamic problem) using dexterity and a bit of dynamical systems theory. Here is the story of what it takes to match elementary games and robotics.

The state of the art in AI and Machine Learning technology is progressing quickly. Impressive-looking (and sometimes sadly hilarious) demos can be seen in the media. But what can we really expect from social robots such as Pepper? How can a non-programmer teach Pepper new skills? What really is Pepper’s ability to acquire and apply knowledge and skill? And could Pepper do without operators altogether and still carry out its mission?

The video above shows an operator giving Pepper a demonstration of how to play ball-in-a-cup by manipulating the robot's arm.

A state-of-the-art method to teach games of skill

An operator starts by providing Pepper with a demonstration of how the task is solved (throw the ball in the cup or the dart in the middle of the target). The operator physically executes the movement on Pepper the robot. Then the robot has to improve almost autonomously until it gets the movement right!

How does Pepper memorize the movement to learn and to improve?

The members of the AI Lab at SBRE used DMPs (Dynamic Movement Primitives) to model the task that Pepper should imitate. DMPs are a state-of-the-art method for programming a robot to learn a complex movement [1]. They are a mathematical formalization of primitive actions. When throwing a ball in a cup or a dart at a target, Pepper’s joints follow trajectories, very much like an animation run on the robot. In order to succeed (land the ball in the cup, or hit the target in the center), all of Pepper’s joints must move in harmony, following a very precise trajectory. DMPs model the time dependence of Pepper’s joints along this trajectory, and turn out to be an efficient way to make the robot learn the movement. Pepper measures the distance between the ball and the cup using its cameras and sensors. Progressively, over many trials, Pepper reduces the gap between the projectile and the target.
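
To make the idea of a DMP more concrete, here is a minimal sketch of a single-joint DMP integrated with simple Euler steps. It is an illustration only, not the code used on Pepper: the function name `dmp_rollout`, the gain values and the basis-function width heuristic are all assumptions chosen for this example. The `weights` are the parameters a learning algorithm would adjust from trial to trial.

```python
import math


def dmp_rollout(y0, goal, weights, tau=1.0, dt=0.001,
                alpha_z=25.0, beta_z=6.25, alpha_x=4.0):
    """Integrate a one-joint DMP and return the list of joint positions.

    All default gains are illustrative assumptions, not values from the project.
    """
    n = len(weights)
    # Basis-function centres spread along the decaying phase variable x.
    centres = [math.exp(-alpha_x * i / max(n - 1, 1)) for i in range(n)]
    # Heuristic widths (assumption): narrower basis functions for small centres.
    widths = [n ** 1.5 / c for c in centres]

    y, z, x = y0, 0.0, 1.0  # position, scaled velocity, phase
    trajectory = [y]
    for _ in range(int(tau / dt)):
        psi = [math.exp(-h * (x - c) ** 2) for h, c in zip(widths, centres)]
        total = sum(psi)
        forcing = 0.0
        if total > 1e-12:
            forcing = sum(p * w for p, w in zip(psi, weights)) / total
        # The forcing term fades out as the phase x decays towards zero.
        forcing *= x * (goal - y0)

        dz = (alpha_z * (beta_z * (goal - y) - z) + forcing) / tau
        dy = z / tau
        dx = -alpha_x * x / tau
        z += dz * dt
        y += dy * dt
        x += dx * dt
        trajectory.append(y)
    return trajectory


if __name__ == "__main__":
    flat = dmp_rollout(0.0, 1.0, [0.0] * 5)      # no forcing: smooth reach to the goal
    shaped = dmp_rollout(0.0, 1.0, [100.0] * 5)  # forcing term reshapes the path
    print(round(flat[-1], 3), round(max(shaped), 3))
```

With all weights at zero the joint simply glides to the goal; non-zero weights bend the path on the way there, which is what lets a demonstrated throw be encoded and then refined.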

From baseline actions to complex behaviour and optimized trajectories

For a darts champion, the perfect shot depends on a number of parameters: keeping the dart level, a solid grip with relaxed fingers, a stable stance to steady the gaze, the dart and the target in line, the correct distribution of the player's weight, good balance, etc.
To find their own champion style, dart players have to try out many grips, stances and throws. They have to master the basics and bring them together to execute a complex, perfect throw.

Likewise, Pepper has to master the goal point, joint positions, velocities and accelerations to perform a perfect throw.

In addition to the large number of parameters, human factors also play a part. The user’s demonstration of the task on the robot can be inaccurate, and the better the initial throw, the better the learning. (In Machine Learning (ML), cost functions are used to estimate the error.) In addition, the robot's sensors and controls have limitations, and the environment may also affect the robot's performance.
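
As an illustration of what such a cost function might look like, the sketch below scores one throw by the smallest distance observed between the tracked ball and the cup. This is a simplified assumption made for this example: the function name `throw_cost` and the input format (a list of 3-D positions) are hypothetical, and the cost function used in the actual project had to handle many more special cases.

```python
import math


def throw_cost(ball_positions, cup_position):
    """Return the smallest ball-to-cup distance seen during one throw.

    Lower is better; 0.0 would mean the ball passed exactly through the
    cup position. Inputs are hypothetical (x, y, z) tuples in metres.
    """
    if not ball_positions:
        raise ValueError("no ball positions were tracked for this throw")
    return min(math.dist(p, cup_position) for p in ball_positions)


if __name__ == "__main__":
    tracked = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    print(throw_cost(tracked, (1.0, 0.0, 0.5)))
```

A learning algorithm would then try to lower this number from one trial to the next.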

Basic research and robot applications, what can we expect today?

Machine learning worldwide has recently made a significant leap forward. Basic research programs such as skill learning on Pepper widen the scope of opportunities for robot applications. But operating successfully in the real world takes much more than quickly finding the optimal trajectory to throw a ball in a cup. Many learning tasks are harder than this, because correct behavior depends on the context, so Pepper would have to practice in each possible context. Holding pleasant human-robot interactions in a crowded mall, for instance, means coping with many changing elements in the environment.
So what is next? Could Pepper get a high score at games-of-skill tournaments? Maybe. But can Pepper learn how to clean up the kitchen, or how to act in a social group? Not yet. Much research is still needed to build AI that understands the world even remotely as well as, for example, a two-year-old child does.
In the meantime, teams at SBRE are working hard to make programming of Pepper as intuitive and simple as possible, because SBRE partners should not have to master classical mechanics and dynamical systems theory to teach Pepper the latest trendy moves!


Nikolas HEMION, AI expert at SBRE

Nikolas HEMION graduated with a Ph.D. in Intelligent Systems in 2013 from the Research Institute for Cognition and Robotics, Bielefeld University, Germany. He joined the AI Lab at SoftBank Robotics Europe as senior researcher, and was appointed director of the AI Lab in 2015 [2].
He supervised the Pepper Bilboquet Project and presents its highlights in this interview.

How did the Pepper Bilboquet Project start, and where did it take you?

“The Pepper Bilboquet project began in March 2016 when Asya GRECHKA [3] did her internship at SBRE AI Lab. As her supervisor, I instructed her and set her goals; she did the hard work of implementing it on Pepper and spent many, many hours with the robots, throwing ball after ball after ball, until eventually Pepper landed the balls in the cups!
The work that led up to the “Pepper robot learning ball-in-a-cup” video was Asya's internship. After SoftBank Robotics put the ball-in-a-cup video online, tens of thousands of people watched and shared it.
A lot of follow-up work has been done over the last three years, including a collaboration with researchers at the CITEC (Cluster of Excellence Cognitive Interaction Technology) [4] at Bielefeld University in Germany. This collaboration is still ongoing, as we are trying to make teaching Pepper new tricks easier.”

What are the most important concepts in robot skill learning?

“The main notions to consider in a Machine Learning project like this one are the way movement is represented (as a DMP in our case) and how learning is performed (choice of learning algorithm) in addition to the cost function. The difficulty really lies in having an intuition for these choices. DMPs for example are almost certainly not a good choice for making a robot learn how to grasp an object. There are other algorithms in the Reinforcement Learning (RL) literature that are much better adapted to these tasks. Unfortunately, there is no easy recipe. It takes a lot of experience in machine learning, and a very deep understanding of the different algorithms (and there are more than just a few) that can be used. If you want to do machine learning in your project, beyond simple object recognition, consider involving a machine learning expert. Be aware there is no easy solution for everything.”

Is the choice of the library critical in this kind of project?

“No, the library is just a tool. It’s the mathematical concepts (movement representation, learning algorithm, cost function, etc.) that make it work. We were lucky that a colleague and good friend, Dr Freek Stulp [5], had already put a lot of work into implementing DMPs [6], and this made our work much faster. But the same could be implemented with a large number of libraries and programming languages.”

What are the main obstacles you had to overcome and how did you deal with them?

“The cost function, actually. Getting the ball close to the cup was easy, but making it really land in the end, and reliably for that matter, was quite a challenge. A lot of factors had to be just right: the cameras had to have a very high frame rate to track the ball accurately, and lots of special cases had to be covered (what to do if the ball is occluded by the robot's hand in the image, what to give as a score when the ball bounces off the corner of the cup, ...). This took a while to get right.
To make it easier to use, we developed a way to teach Pepper with the help of a GUI (the Bilbo project, a software application on GitHub). The main reason was to get rid of the cost function part, because that was really the biggest hurdle. We found that we could teach Pepper the bilboquet without having to define a cost function at all! Instead, we just score each movement of the robot using a simple GUI, the way you rate items on Amazon (1 star: not good; 5 stars: great). This actually works even better than tediously designing a cost function, and is so simple that it could be used by laypeople. That's what we showed in a scientific user study, in collaboration with our colleagues from CITEC.
That research led to a publication in the journal Frontiers in Robotics and AI [7], including a video (see above).”

Video above: a User Study on Robots Skill Learning Without a Cost Function: Simple optimization of Dynamic Movement Primitives via Naive User Feedback; Video on line.
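
The idea of replacing a cost function with star ratings can be sketched as follows. This is one simple possibility, reward-weighted averaging, and is not necessarily the optimizer used in the study; the function names `sample_candidates` and `update_from_ratings` and the `sharpness` parameter are hypothetical. The robot tries several perturbed versions of its movement parameters, a person gives each try 1 to 5 stars, and the next parameter vector is an average that leans towards the better-rated tries.

```python
import math
import random


def sample_candidates(mean, sigma, count, rng):
    """Draw `count` perturbed copies of the current parameter vector."""
    return [[m + rng.gauss(0.0, sigma) for m in mean] for _ in range(count)]


def update_from_ratings(candidates, stars, sharpness=2.0):
    """Return the rating-weighted average of the candidate parameter vectors.

    `stars` holds one user rating (1 to 5) per candidate. `sharpness` is an
    assumed tuning knob: higher values favour the top-rated tries more strongly.
    """
    if not candidates or len(candidates) != len(stars):
        raise ValueError("need exactly one rating per candidate")
    best = max(stars)
    weights = [math.exp(sharpness * (s - best)) for s in stars]
    total = sum(weights)
    dim = len(candidates[0])
    return [sum(w * c[i] for w, c in zip(weights, candidates)) / total
            for i in range(dim)]


if __name__ == "__main__":
    rng = random.Random(0)
    tries = sample_candidates([0.0, 0.0], sigma=0.1, count=4, rng=rng)
    ratings = [1, 3, 5, 2]  # what a person might click in the GUI
    print(update_from_ratings(tries, ratings))
```

Repeating this sample, rate and update loop plays the role that cost-function minimization would otherwise play.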

What advice would you give to a developer involved in a Machine Learning project with Pepper the robot, to broaden their basic knowledge?

“That heavily depends on the project they are thinking of doing. If they want to do deep learning to implement state-of-the-art object recognition for their purposes (for example, to recognize a very specific object to be used in an interaction with Pepper), then they will find a lot of tutorials on the internet on how to use standard machine learning frameworks, such as TensorFlow [8]. Simply type "TensorFlow for object recognition tutorial" into Google. Add a few of the keywords for your own project, and you might be lucky enough to find other people who have worked on something similar. The machine learning community is very active on social media, and it's easier than ever to find material to start learning.
When it comes to more non-standard problems (such as using machine learning to make robots move in a smart way, which is definitely non-standard), things get a bit more complicated. For this, you really can’t get around reading scientific papers and following the work of other researchers in the robot learning community. The OpenAI Gym [9] is a nice place to start on this topic, but it's a steep learning curve.
For those who are serious about learning, YouTube has thousands of recordings of presentations by scientists talking about their work or giving lectures on the topic.”

Scoring Pepper on the tablet
In the final project, a human operator can score Pepper's movements on the robot's tablet and help Pepper correct its behaviour and minimize mistakes.

  1. Learning Motor Primitives for Robotics; Department of Empirical Inference and Machine Learning, Max Planck Institute for Biological Cybernetics; ICRA 2008 (International Conference on Robotics and Automation).
  2. Nikolas HEMION's bio and publications are available here:
  3. Asya GRECHKA pursued and achieved a double degree in mathematics and computer science and then a Master of Technology (M.Tech.), Robotics, Operations Research, Decision Theory, Machine Learning at Université Pierre et Marie Curie (UPMC) - Sorbonne Université (Paris). As a Robotics Engineer at SBRE, she was involved in the evaluation of Skill Learning Algorithms on Pepper and also worked on research and application of state-of-the-art technology in voice recognition. She is now an AI Consultant for Kickmaker (Station F).
  4. CITEC is a central academic department of Bielefeld University where research groups and faculties lead interdisciplinary research activities to adapt future technology to the human user. There, Biology, Linguistics and Literary Studies, Mathematics, Psychology and Sports Science, and Technology contribute to developing technical systems that are intuitive and easy to operate for human users.
  5. Freek STULP, Head of Department, Department of Cognitive Robotics, Institute of Robotics and Mechatronics, German Aerospace Center (DLR). Dr F. STULP's list of publications is maintained on his Google Scholar profile.
  6. C++ library for Function Approximation, Dynamical Movement Primitives, and Black-Box Optimization
  7. Vollmer A-L and Hemion NJ (2018) A User Study on Robot Skill Learning Without a Cost Function: Optimization of Dynamic Movement Primitives via Naive User Feedback. Front. Robot. AI 5:77. doi: 10.3389/frobt.2018.00077
  8. TensorFlow is an open-source software library for dataflow programming. Its official website provides a tutorial on Image Recognition.
  9. Gym is a toolkit for developing and comparing reinforcement learning algorithms. It supports teaching agents everything from walking to playing games like Pong or Pinball.


Learning from demonstration
Related terms: Imitation learning, Kinesthetic teaching, User-defined tasks

  1. Imitation Learning is a sequential task where the learner tries to mimic an expert's action in order to achieve the best performance.
    (Source: Global overview of Imitation Learning, Jan. 2018)
  2. A user gives the robot a demonstration of how the task is solved by physically performing a movement on the robot in order to make the robot learn the given task.

Dynamical Movement Primitives (DMPs)

  1. Method of trajectory control, first introduced in 2002 by Auke Ijspeert, Jun Nakanishi, and Stefan Schaal from the University of Southern California (USC), which aims to represent complex motor action in a flexible way. DMPs produce a representation that facilitates the optimization of trajectories to solve tasks.
  2. Differential equations representing a trajectory from start to end positions of a given joint, or group of joints, involving position, velocity and acceleration.
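
As an illustration of the second definition, one common way of writing a DMP for a single joint is given below. The notation is an assumption made for this glossary and may differ from the formulation used in the project: y is the joint position, z its scaled velocity, g the goal, y_0 the start position, x a phase variable that decays from 1 towards 0, τ a time-scaling constant, and w_i the weights adjusted by learning.

```latex
\begin{aligned}
\tau \dot{z} &= \alpha_z \bigl( \beta_z (g - y) - z \bigr) + f(x), \\
\tau \dot{y} &= z, \\
\tau \dot{x} &= -\alpha_x x, \\
f(x) &= \frac{\sum_i \psi_i(x)\, w_i}{\sum_i \psi_i(x)} \; x \, (g - y_0),
\qquad \psi_i(x) = \exp\bigl( -h_i (x - c_i)^2 \bigr).
\end{aligned}
```

Without the forcing term f(x), the first two equations behave like a damped spring that pulls the joint to the goal g; the forcing term shapes the path taken on the way and vanishes as x goes to zero.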

Pepper joint

  1. An articulation, such as the head, shoulder, elbow, wrist, hand, hip and knee, rotating around a joint axis (pitch, roll and yaw).
    For further details, see the NAOqi documentation.