ALVisionRecognition



What it does

ALVisionRecognition is a vision module in which the robot tries to recognize pictures, sides of objects, or even locations that it has learned previously.

How it works

This module is based on the recognition of visual key points and is only intended to recognize specific objects that have been learned previously.

Learning process

The learning process is described in the Choregraphe Video monitor documentation: Teaching NAO to recognize objects. After a few minutes of practice, teaching the robot a new object should take less than 30 seconds.

You can also have the robot learn new objects from image files by using the ALVisionRecognitionProxy::learnFromFile function provided by the module.

Detection process

To detect known objects, use:

The maximum number of objects to detect can be set using ALVisionRecognitionProxy::setMaxOutObjs. Use it when you want the robot to recognize several objects at the same time instead of only one. For example, suppose that you taught your robot to recognize your cup of coffee, your book, and your gamebox. By default, the recognition process returns one recognized object at a time. To have it return more than one, call setMaxOutObjs() on ALVisionRecognitionProxy with the desired number of recognized objects.
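The call described above can be sketched in Python as follows. This is a minimal sketch, not the module's reference usage: it assumes `proxy` is an ALVisionRecognition proxy created with the NAOqi Python SDK, and the helper name `allow_multiple_objects` and its input check are hypothetical. Only setMaxOutObjs comes from the module itself.

```python
def allow_multiple_objects(proxy, max_objects):
    """Ask the recognition process to report up to max_objects objects.

    `proxy` is assumed to be an ALVisionRecognition proxy, for example:
        from naoqi import ALProxy
        proxy = ALProxy("ALVisionRecognition", "<robot-ip>", 9559)
    """
    max_objects = int(max_objects)
    # Hypothetical sanity check, not something the module requires.
    if max_objects < 1:
        raise ValueError("max_objects must be at least 1")
    proxy.setMaxOutObjs(max_objects)
    return max_objects
```

With the cup, book, and gamebox example, `allow_multiple_objects(proxy, 3)` would let all three be reported together.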

How it stores information about known objects

Information on each object is stored in one xml file accompanied by its respective images. The xml file and image files are named after the hash value created when the object is inserted into the database. Each object has one unique hash value. The xml file contains all the meta-data of the object, such as its name, tags, original file, boundary, and descriptor values. The database is a folder containing several xml files and image files. By default, databases are located on the robot in the “/home/nao/naoqi/.local/share/naoqi/vision/visionrecognition/” folder. The default database name is “current”.
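To make this layout concrete, here is a minimal sketch that lists the objects of a database folder by grouping files on their hash. It rests on assumptions not confirmed above: that each file name is exactly the hash plus an extension, and that every non-xml file sharing a hash is one of the object's images. The function name `list_database_objects` is hypothetical.

```python
import os


def list_database_objects(db_dir):
    """Group the files of a database folder by object hash (the file stem)."""
    objects = {}
    for name in sorted(os.listdir(db_dir)):
        if not os.path.isfile(os.path.join(db_dir, name)):
            continue
        stem, ext = os.path.splitext(name)
        entry = objects.setdefault(stem, {"xml": None, "images": []})
        if ext.lower() == ".xml":
            entry["xml"] = name
        else:
            # Assumption: any other file with the same stem is an image.
            entry["images"].append(name)
    # Keep only hashes that have an xml description.
    return {h: e for h, e in objects.items() if e["xml"] is not None}
```

For the default database, `db_dir` would presumably be the “current” folder inside the path given above.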

How it reports the detection results

As with all other extractor modules, recognition results are placed in ALMemory.

When something is recognized, you see an ALValue (a series of fields in brackets) organized as explained here:

The “PictureDetected” key is organized as follows:


             
    [
      TimeStamp,
      PictureInfo[N]
    ]

             
            

with as many PictureInfo tags as things currently recognized.
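Reading this key from Python could look like the sketch below. It assumes `memory` is an ALMemory proxy, that the recognition extractor is already running, and that the second element of the value is the list of PictureInfo fields. Treating an empty or too-short value as "nothing recognized" is also an assumption, and the function name is hypothetical.

```python
def read_picture_detected(memory):
    """Return (timestamp, picture_infos), or None when nothing is recognized.

    `memory` is assumed to be an ALMemory proxy, for example:
        from naoqi import ALProxy
        memory = ALProxy("ALMemory", "<robot-ip>", 9559)
    """
    value = memory.getData("PictureDetected")
    # Assumption: an empty or too-short value means no current detection.
    if not value or len(value) < 2:
        return None
    timestamp, picture_infos = value[0], value[1]
    return timestamp, picture_infos
```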

TimeStamp

This field is the time stamp of the image that was used to perform the detection.


             
    TimeStamp =
    [
      TimeStamp_Seconds,
      Timestamp_Microseconds
    ]
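
Because the time stamp is split into seconds and microseconds, it is often convenient to merge both parts into one number. A minimal sketch, with a hypothetical helper name:

```python
def timestamp_to_seconds(timestamp):
    """Merge [TimeStamp_Seconds, Timestamp_Microseconds] into float seconds."""
    seconds, microseconds = timestamp
    return seconds + microseconds / 1e6
```

For example, `timestamp_to_seconds([10, 500000])` gives `10.5`.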

             
            

PictureInfo

For each detected picture, we have one PictureInfo field:


             
    PictureInfo =
    [
      Label[N],
      MatchedKeypoints,
      Ratio,
      BoundaryPoint[N]
    ]

             
            
  • Label is the list of names given to the picture (e.g. [“cover”, “my book”], or [“fridge corner”, “kitchen”, “my flat”]).
  • MatchedKeypoints is the number of keypoints retrieved in the current frame for the object.
  • Ratio is the number of keypoints retrieved in the current frame for the object divided by the number of keypoints found during the learning stage of the object.
  • BoundaryPoint is a list of point coordinates, expressed as angles (in radians), representing the reprojection in the current image of the boundaries selected during the learning stage.

             
    BoundaryPoint =
    [
      x,
      y
    ]
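
Putting the fields together, a single PictureInfo field can be unpacked into a named structure. This is a sketch under the field order listed above. The names `PictureInfo`, `parse_picture_info`, and `learned_keypoints` are hypothetical, and the last helper only inverts the definition of Ratio to estimate how many keypoints were found during learning.

```python
from collections import namedtuple

PictureInfo = namedtuple(
    "PictureInfo", ["labels", "matched_keypoints", "ratio", "boundary"]
)


def parse_picture_info(info):
    """Unpack one [Label[N], MatchedKeypoints, Ratio, BoundaryPoint[N]] field."""
    labels, matched, ratio, boundary = info
    # Each BoundaryPoint is an [x, y] pair of angles in radians.
    points = [(float(x), float(y)) for x, y in boundary]
    return PictureInfo(list(labels), int(matched), float(ratio), points)


def learned_keypoints(picture):
    """Estimate the keypoints found at learning time: MatchedKeypoints / Ratio."""
    if picture.ratio == 0:
        return None
    return int(round(picture.matched_keypoints / picture.ratio))
```

An object matched on 42 keypoints with a Ratio of 0.5 was therefore learned with about 84 keypoints.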

             
            

Performance and Limitations

Performance

  • The recognition process is robust to distance (down to half and up to twice the distance used for learning), angles (up to 50° inclination for something learned facing the camera), lighting conditions, and rotation. In addition, learned objects can be partially hidden during the recognition stage.
  • Performance of this module varies depending on the resolution of the input image and on the size of your database. For better performance on your robot, try a resolution of 320x240 and a database of no more than 50 objects.

Limitations

  • This module is based on the recognition of key points and not of the external shape of the objects, so it can’t recognize untextured objects.
  • Currently it is not designed for recognizing object classes (e.g. a cookie box) but object instances (that cookie box).

Getting Started

The easiest way to get started with ALVisionRecognition is to use Choregraphe. To learn an object, follow the Teaching NAO to recognize objects section. Learned objects can then be recognized by using the Choregraphe Vision Reco. box.

You can also interact with ALVisionRecognition via Choregraphe Script boxes. For further details, see: Calling APIs of ALVisionRecognition from a Choregraphe Script box.