CROWDBOT: Safe Navigation of Robots in Dense Crowds.
Today’s mobile robots stop when a human or any other obstacle comes too close, to avoid impact. This prevents robots from entering packed areas and from performing effectively in highly dynamic environments. CROWDBOT aims to fill the gap in knowledge on close interactions between robots and humans in motion.
1. Enabling mobile robots to navigate autonomously and assist humans in crowded areas.
CrowdBot is a European Union Horizon 2020 (EU H2020) project for research and innovation in the area of Information and Communication Technologies (ICT), funded by the European Commission for the period of January 2018 to June 2021. The official name of the project is “Safe Navigation of Robots in Dense Human Crowds”, which is also the focus area of our research activities. The name “CrowdBot” is an amalgam of human “crowd” and machine “bot.” We also lovingly refer to any of our four robots as a CrowdBot.
CROWDBOT gathers the required expertise to develop new capabilities that allow robots to move in a safe and socially acceptable manner. This requires achieving step changes in a) sensing abilities to estimate the crowd motion around the robot, b) cognitive abilities for the robot to predict the short-term evolution of the crowd state and c) navigation abilities to perform safe motion at close range to people.
Our partners in this project are:
- The Institut national de recherche en informatique et en automatique (Inria), the only French public research body fully dedicated to computational sciences.
- ETH Zurich is a Swiss academic institution with an excellent track record of research and teaching since 1855.
- The Ecole Polytechnique Fédérale de Lausanne (EPFL) is one of the two Swiss Federal Institutes of Technology.
- RWTH Aachen University is one of Germany’s major technical universities.
- University College London (UCL) was founded in 1826 and today has around 38,300 students and 12,420 staff.
- Locomotec GmbH was founded in March 2010; its business objective is to develop and commercialize technology in the domain of mobility and locomotion.
1.1 CROWDBOT Robots
Pepper is the world’s first social humanoid robot. It was optimized for human interaction and is able to engage with people through conversation and its touch screen. Pepper is available today for businesses and schools: over 2,000 companies around the world have adopted Pepper as an assistant to welcome, inform and guide visitors in an innovative way. Its curvy design ensures danger-free use and a high level of acceptance by users.
The cuyBot is a new type of robot, currently developed by Locomotec, that can safely execute a range of service and logistics tasks. Qolo is a device that combines active powered wheels with a passive exoskeleton to create a compact, lightweight wearable robot for wheelchair users. The smart wheelchair serves as a platform for implementing and testing shared-control navigation algorithms.
The CrowdBot project combines five main technological themes:
- Navigation in crowds
- Perception and Tracking
- Mapping and Localization
- Crowd Prediction and Simulation
- Safety and Robust Design
Each partner working on the project brings its own expertise on these topics. Their collaborations and exchanges of knowledge enable transversal advances and major innovations in the field of robot navigation in crowds.
2.1 Navigation in crowds
Navigating through human crowds is a tough challenge for a robot. Crowds can cause severe sensor occlusions and often don’t leave much free space for the robot to move in, leading to what’s known as the “freezing robot problem”. Our goal in CROWDBOT is to develop navigation and motion planning algorithms that allow the robot to work with the flow of the crowd to get to its destination.
Short-term vs. Long-term Motion Planning
Robot-crowd navigation is more than just following a pre-computed path. We need to consider the changing movements of the crowd as a whole as well as the interactions of the individual pedestrians with each other and with the robot. This means that we need to make motion plans that operate at different time scales. Short-term motion needs to ensure safety and compliance with the immediate crowd around the robot. On the other hand, long-term motion plans can decide when and how to engage with different groups of pedestrians within the crowd, e.g. by detouring around, following or inching through.
We’ve all heard of autonomous systems (self-driving cars etc.), but in some cases full autonomy isn’t really what users want. For example, most wheelchair users do not want to be treated simply like precious pieces of cargo and ferried around automatically from point A to point B. Instead, they want to be empowered to get around by themselves so they can continue with their activities of daily life as independently as possible. However, environmental barriers such as narrow passageways, rough terrain and, indeed, crowds of oblivious pedestrians can hamper their efforts. In such cases, we propose “shared-control navigation”, whereby the wheelchair itself actively assists the user in safely and efficiently maneuvering through these difficult situations, without taking away their overall control authority.
In terms of CROWDBOT navigation, a robot requires fast and safe methods for ensuring collision avoidance under normal operational conditions. The term reactive navigation refers to the ability to operate normally when dynamic obstacles surrounding the robot behave outside of any planner predictions.
For these situations, we have developed a dynamical-system-based approach in which an initial linear dynamical system (the desired motion given by a high-level planner) is modulated around obstacles. The modulation runs in real time and adapts dynamically to any number of obstacles.
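As a hedged illustration of this idea, the minimal sketch below modulates a linear dynamical system around a single circular obstacle in 2D. The function names (`linear_ds`, `modulate`) and the specific eigenvalue choices are illustrative assumptions, not the project's actual implementation, which handles arbitrary numbers and shapes of obstacles.

```python
import math

def linear_ds(pos, goal, gain=1.0):
    # Desired velocity from a linear dynamical system attracted to the goal.
    return [gain * (goal[0] - pos[0]), gain * (goal[1] - pos[1])]

def modulate(pos, vel, center, radius):
    # Modulate the velocity around one circular obstacle.
    # Assumes the robot is outside the obstacle (dist > 0).
    dx, dy = pos[0] - center[0], pos[1] - center[1]
    dist = math.hypot(dx, dy)
    gamma = max(dist / radius, 1.0)       # distance function, >= 1 outside
    # Local basis: radial direction and its tangent.
    r = (dx / dist, dy / dist)
    e = (-r[1], r[0])
    lam_r = 1.0 - 1.0 / gamma             # damp motion toward the obstacle
    lam_e = 1.0 + 1.0 / gamma             # amplify tangential motion
    v_r = vel[0] * r[0] + vel[1] * r[1]   # project velocity onto the basis
    v_e = vel[0] * e[0] + vel[1] * e[1]
    # Recompose the modulated velocity from the scaled components.
    return [lam_r * v_r * r[0] + lam_e * v_e * e[0],
            lam_r * v_r * r[1] + lam_e * v_e * e[1]]
```

At the obstacle boundary the radial eigenvalue vanishes, so the modulated velocity can never point into the obstacle; far away the modulation fades and the original linear dynamics are recovered.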
Making a robot navigate in a human environment requires social capabilities to be integrated on the robot. Some of these are fulfilled by the navigation techniques described above; others, however, require a special approach that has not yet been sufficiently explored by the robotics community.
When a robot cannot continue its navigation due to the presence of people, for example at a cocktail party, it experiences the “freezing robot problem”. Our approach is to consider people as interactive agents who can react to social cues from the robot to clear its path. Such cues include the robot asking for permission to pass when people are blocking its way, or a light physical touch from the robot, which can be useful in a noisy environment where a spoken request would not be heard. Once the path is free, the robot can continue its navigation.
2.2 Perception and Tracking
Visual sensing capabilities are extremely important for a robot making navigation decisions. The objective here is to enable the robots to sense their surroundings by detecting and tracking people in their neighbourhood. Since object detection can be challenging in dense crowds, flow estimation techniques also have to be used to associate moving objects and their parts.
Our robots are equipped with RGB-D cameras, which provide them with vision. We detect pedestrians using the rich information in images and localize them in 3D space using the measured depth. Advanced person analysis, including pose estimation and re-identification, is also carried out on the image data.
Our perception pipeline is designed following the well-known tracking-by-detection paradigm. Under this paradigm, objects are detected for each frame independently, and a tracking algorithm is used to associate detections that belong to the same object instance over multiple frames.
A 2D laser scanner sweeps an area at high frequency, measuring distance at a fixed angular resolution. Empowered by deep learning, our robots detect pedestrians from the range data collected by the 2D laser scanners. Thanks to the large combined field of view of a front-facing and a back-facing scanner, we obtain 360-degree pedestrian detection.
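As a rough sketch of how pedestrian candidates might be extracted from 2D range data, before any learned classifier is involved, the toy pipeline below segments a scan by jump distance and filters clusters by size. All names, thresholds and the heuristic filter are illustrative assumptions, not the project's trained detector.

```python
import math

def segment_scan(ranges, angle_step, jump=0.3):
    # Split a 2D laser scan into clusters wherever consecutive range
    # readings jump by more than `jump` metres.
    points = [(r * math.cos(i * angle_step), r * math.sin(i * angle_step))
              for i, r in enumerate(ranges)]
    clusters, current = [], [points[0]]
    for prev, curr in zip(points, points[1:]):
        if math.hypot(curr[0] - prev[0], curr[1] - prev[1]) > jump:
            clusters.append(current)
            current = []
        current.append(curr)
    clusters.append(current)
    return clusters

def pedestrian_candidates(clusters, min_pts=3, max_width=0.8):
    # Keep clusters whose spatial extent roughly matches a pair of legs
    # or a torso; a learned classifier would replace this heuristic.
    out = []
    for c in clusters:
        if len(c) < min_pts:
            continue
        xs, ys = [p[0] for p in c], [p[1] for p in c]
        width = math.hypot(max(xs) - min(xs), max(ys) - min(ys))
        if width <= max_width:
            out.append(c)
    return out
```

On a scan containing a long wall segment and a nearby person-sized blob, the wall is rejected by the width filter while the small cluster survives as a candidate.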
When the crowd density becomes very large, conventional tracking-by-detection systems based on RGB data might fail due to heavy occlusion. In such cases it would be helpful to fall back to low level vision techniques such as optical flow or scene flow. Optical flow provides motion information at the pixel level for a small neighbourhood within an image. This information can be useful to make motion predictions in cases where the tracker fails due to occlusions, for example when people close to the camera obstruct the field of view. For this purpose, we explore the benefits of integrating optical flow in our pipeline under the purview of CROWDBOT.
Detections from multiple sensors are collected together and passed to a tracking module. The tracker associates detections at different times and links them into trajectories of individual pedestrians.
In CROWDBOT, we deal with dense crowd scenarios in which heavy occlusions occur. To handle them, multiple input sources such as RGB-D and LIDAR have to be leveraged. Our tracker follows a modularised approach that allows for easy integration of multiple sensors.
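A minimal sketch of the tracking-by-detection idea, assuming 2D positions and a simple greedy nearest-neighbour association; a production tracker would use a stronger assignment method such as the Hungarian algorithm plus a motion model. The function name and distance threshold are hypothetical.

```python
import math

def associate(tracks, detections, max_dist=1.0):
    # Greedily match each existing track (by its last position) to the
    # closest unclaimed detection, nearest pairs first.
    pairs = sorted(
        (math.hypot(t[-1][0] - d[0], t[-1][1] - d[1]), ti, di)
        for ti, t in enumerate(tracks)
        for di, d in enumerate(detections)
    )
    used_t, used_d = set(), set()
    for dist, ti, di in pairs:
        if dist > max_dist or ti in used_t or di in used_d:
            continue
        tracks[ti].append(detections[di])   # extend the matched track
        used_t.add(ti)
        used_d.add(di)
    # Unmatched detections start new tracks.
    for di, d in enumerate(detections):
        if di not in used_d:
            tracks.append([d])
    return tracks
```

Because detection and association are decoupled, the detector feeding this step can be swapped (camera, LIDAR, or both) without changing the tracking logic, which is the point of the modularised design.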
2.3 Mapping and Localization
For a robot to plan a path from one point to another, it needs a map of the environment. Simultaneous localization and mapping (SLAM) allows a robot to build a map online and, at the same time, localize itself within that map. SLAM accounts for uncertainties and errors that arise from imperfect sensing and actuation. However, in crowded environments, additional challenges arise. Dynamic obstacles in the sensor field of view can be wrongly incorporated into the map. They can also cause the robot to become lost if there are too many occlusions.
In CROWDBOT, we take an active SLAM approach which balances the robot’s need to maintain good localization within parts of the map that it has already seen (i.e. areas with fewer dynamic obstacles), and the need to explore new areas to complete the map. We also conduct a pre-filtering step over the incoming sensor scans, in our case 360° 2D LIDAR, to remove points that are deemed to be returned from dynamic obstacles rather than from the static environment. This way, the robot builds clean, coherent maps that can be used for navigation.
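The pre-filtering step can be sketched as follows, under the simplifying assumptions that consecutive scans share the same bearings and that the robot's ego-motion has already been compensated; the function name and threshold are illustrative, not the project's actual filter.

```python
def filter_dynamic(prev_ranges, curr_ranges, threshold=0.2):
    # Keep only readings that are consistent between consecutive scans;
    # a reading that changed by more than `threshold` metres at the same
    # bearing is assumed to come from a moving obstacle and is dropped.
    static = []
    for prev, curr in zip(prev_ranges, curr_ranges):
        if abs(curr - prev) <= threshold:
            static.append(curr)
        else:
            static.append(float('inf'))   # mark as "no return" for the mapper
    return static
```

Only the surviving readings are fed to the SLAM back end, so pedestrians passing through the field of view leave no trace in the map.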
When operating in a dense crowd, the reality is that a robot may often be unable to sense the static environment around it due to severe occlusions from the crowd. This raises the chances of experiencing what’s known as the “kidnapped robot scenario”. Unfortunately, existing localization solutions have been shown to perform very poorly under these circumstances, i.e. when prior pose information is unavailable.
Our goal for CROWDBOT localization is a prior-free solution that quickly generates an accurate estimate of the robot pose at initialization and is also robust to partial or temporary sensor occlusions. Our proposed approach is therefore based on map-matching, as opposed to scan-matching. Furthermore, we use a branch-and-bound search to reduce the computational cost of the matching routine and enable real-time operation.
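To make the map-matching idea concrete, here is a deliberately brute-force sketch on a tiny occupancy grid: every candidate pose is scored by how many scan points land on occupied cells, and the best-scoring pose wins. The branch-and-bound search mentioned above prunes exactly this kind of exhaustive search using bounds computed on coarsened maps; the function names and the integer grid are assumptions for illustration only.

```python
import math

def match_score(grid, points, pose):
    # Count scan points (in the robot frame) that land on occupied
    # cells after being transformed by the candidate `pose`.
    x, y, theta = pose
    c, s = math.cos(theta), math.sin(theta)
    hits = 0
    for px, py in points:
        gx = int(round(x + c * px - s * py))
        gy = int(round(y + s * px + c * py))
        if 0 <= gy < len(grid) and 0 <= gx < len(grid[0]) and grid[gy][gx]:
            hits += 1
    return hits

def localize(grid, points, angles):
    # Exhaustive search over every cell and candidate heading; a
    # branch-and-bound variant would discard whole pose regions whose
    # upper-bound score cannot beat the best pose found so far.
    best, best_pose = -1, None
    for y in range(len(grid)):
        for x in range(len(grid[0])):
            for theta in angles:
                score = match_score(grid, points, (x, y, theta))
                if score > best:
                    best, best_pose = score, (x, y, theta)
    return best_pose, best
```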
2.4 Crowd Prediction and Simulation
For an autonomous or semi-autonomous robot to navigate safely in a crowded environment, a prediction of the future behaviour of surrounding agents is essential. The Crowd Prediction module is therefore in charge of estimating plausible future locations of these agents, considering their possible interactions with each other and with the environment. The output of the system is a set of predicted trajectories that can be taken into account by the navigation process in certain situations.
In CrowdBot we have designed and implemented a data-driven prediction system that learns from observed trajectories in the environment to improve its predictions. This system, called “Social Ways”, uses generative adversarial networks to map an input of N observed trajectories, corresponding to N detected agents, into K alternative sets of N predicted trajectories.
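Social Ways itself is a learned GAN model, but its interface (N observed trajectories in, predicted futures out) can be illustrated with the simplest possible baseline: constant-velocity extrapolation of each observed trajectory. The function below is a hypothetical stand-in for illustration, not the project's predictor.

```python
def predict_constant_velocity(observed, horizon):
    # Baseline predictor: extrapolate each agent's last observed velocity.
    # `observed` is a list of trajectories, one per agent; each trajectory
    # is a list of (x, y) positions sampled at fixed time steps.
    predictions = []
    for traj in observed:
        (x0, y0), (x1, y1) = traj[-2], traj[-1]
        vx, vy = x1 - x0, y1 - y0          # displacement per time step
        predictions.append([(x1 + vx * t, y1 + vy * t)
                            for t in range(1, horizon + 1)])
    return predictions
```

A learned model improves on this baseline exactly where it fails: when agents turn, slow down, or react to each other and to the robot.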
Our technologies need tools to evaluate, test and benchmark their performance. Such tests require full control over the environment, which is only possible in simulation. However, implementing a simulated crowd is not an easy task, as the realism of such crowds is quickly questioned. We therefore designed a simulation tool that uses virtual reality to immerse both (real) robots and humans in a virtual world.
A realistic simulated environment
Our simulation tool provides the possibility to simulate a crowd of realistic characters moving in realistic environments shared with our robots. To do so, we use various crowd simulation techniques (RVO, Vision Based, PowerLaw, etc.) and high-quality 3D models. Our 3D engine, Unity, is coupled with ROS and implements sensor simulation (LiDAR, RGB-D, ultrasound, etc.) and robot control. Using its physics engine, we can simulate proper robot motion and report on collisions with the crowd.
How can Virtual Reality help robots navigate crowds?
It is crucial to evaluate the capacity of robots to move safely in close proximity to humans. But testing such capacities in real conditions raises the risk of collisions between the robot and the experimenters or test participants. To avoid this risk, CrowdBot is exploring the use of Virtual Reality for such tests. The principle is illustrated above: while the robot and the human remain physically separated, each perceives the other as if they were face to face, as shown in the image on the right.
2.5 Ethical & Safety Measures
In this work, we focused on the identification of foreseeable hazards when a robot navigates in crowded environments and/or engages in social interaction with pedestrians. We developed a list of hazards considering both physical and psychological harms, all detailed in D6.1. With respect to physical contacts, we distinguished between “intended” and “unintended” contacts, and further divided them into “from robot to human” and “from human to robot”. The types of physical contact identified include collision, squash, push, swipe, drag and touch. As far as psychological hazards are concerned, we considered potential harms deriving from the robot’s presence, appearance, motion, physical contacts and social capabilities. In addition, in collaboration with the CrowdBot robots’ designers, we performed a preliminary risk estimation exercise with respect to the physical and psychological hazards related to powered wheelchairs, the platform Qolo, the Pepper robot, and the cuyBot.
Within the framework of this work, the Ethical and Safety Advisory Board (ESAB) has been activated, and two teleconference meetings were organized to discuss the CrowdBot scenarios and their ethical and legal implications. The role of the ESAB is to advise the consortium on the risk assessment of project experiments, the design of ethical protocols, and the applicable standards.
3. Next steps
We are currently in the middle of the project, integrating the different modules into one system and implementing it on the robots of the project. We are also planning experiments, both in simulation and in real-life situations, to test and validate our technologies. One of our final goals is to provide useful guidelines for designing robots and policies for robot navigation in crowded human environments.
Finally, remember that you can get improved navigation out of the box with Pepper by using Pepper OS update 2.9.4, thanks to its Visual SLAM module and the new Goto API.