November 2022

Year 3

Posted by Daniela Rato on December 9, 2022

Building a ROS node for Human Pose Estimation using OpenPose as an off-the-shelf 2D algorithm

The HPE node subscribes to the image topics defined by the user in the launch file. If the user wishes to run this algorithm on multiple cameras, it is advised to throttle the image topics to 5 fps in the launch file (an example is already provided); otherwise, the skeletons will lag behind the source images.
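To illustrate what throttling to 5 fps does to an image stream, here is a minimal sketch in plain Python. It is not the code used by the package or by the launch file; the class name `FrameThrottle` and the integer nanosecond timestamps are assumptions made for the example.

```python
class FrameThrottle:
    """Let through at most `rate_hz` frames per second and drop the rest.

    Hypothetical helper for illustration only; timestamps are integer
    nanoseconds to avoid floating-point rounding at the interval boundary.
    """

    def __init__(self, rate_hz: int) -> None:
        if rate_hz <= 0:
            raise ValueError("rate_hz must be positive")
        self.min_interval_ns = 1_000_000_000 // rate_hz
        self.last_stamp_ns = None

    def accept(self, stamp_ns: int) -> bool:
        """Return True if the frame with this timestamp should be processed."""
        if (self.last_stamp_ns is None
                or stamp_ns - self.last_stamp_ns >= self.min_interval_ns):
            self.last_stamp_ns = stamp_ns
            return True
        return False


# Simulate one second of a 30 fps camera throttled to 5 fps.
throttle = FrameThrottle(rate_hz=5)
stamps_ns = [i * 1_000_000_000 // 30 for i in range(30)]
kept = [s for s in stamps_ns if throttle.accept(s)]
```

In this simulation only every sixth frame reaches the pose estimator, which gives the network more time per image when several cameras share it.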

This node publishes the skeleton in two formats. For visualization, it publishes an image with the skeleton drawn on it. It also publishes a custom message with the keypoint coordinates and the detection score of each keypoint.

The following image represents the HPE package.

...

As mentioned before, custom messages were created to facilitate the comprehension of the skeleton and eventual subscription by other nodes.

  • Keypoint 2D custom message
        float32 x
        float32 y
        float32 score
  • Person 2D custom message
        std_msgs/Header header
        keypoint2D[] keypoints # Array of keypoints
  • Person 2D list custom message (not tested)
        std_msgs/Header header
        person2D[] persons
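To make the message layout easier to picture, the sketch below mirrors the keypoint and person fields with plain Python dataclasses. These are not the classes generated by ROS for the package; the names `Keypoint2D`, `Person2D` and `confident_keypoints`, the simplified header fields, and the score threshold are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Keypoint2D:
    """Mirrors the Keypoint 2D message: pixel coordinates plus score."""
    x: float
    y: float
    score: float


@dataclass
class Person2D:
    """Mirrors the Person 2D message; stamp and frame_id are a
    simplified stand-in for std_msgs/Header."""
    stamp: float
    frame_id: str
    keypoints: List[Keypoint2D] = field(default_factory=list)


def confident_keypoints(person: Person2D, min_score: float) -> List[Keypoint2D]:
    """Keep only the keypoints whose detection score reaches min_score."""
    return [kp for kp in person.keypoints if kp.score >= min_score]


# Example values are made up for illustration.
person = Person2D(
    stamp=0.0,
    frame_id="camera_1",
    keypoints=[Keypoint2D(320.0, 240.0, 0.91), Keypoint2D(12.0, 7.0, 0.05)],
)
reliable = confident_keypoints(person, min_score=0.5)
```

A subscriber could filter on the score in a similar way before using the keypoints, since each one carries its own detection confidence.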

The following video shows the node working with a recorded rosbag.

Future work and challenges

The next task for this human pose estimation package is to test it in a multi-person context and make the small modifications this requires. This also means that we must match each person across cameras, that is, find which detection in each camera corresponds to the same individual.

After that, the task will be to develop an algorithm that builds and publishes the 3D skeleton from the 2D skeletons detected in the individual camera images.

There are some challenges to address in order to accomplish these tasks. The first is how to manage the CUDA device (or devices) so that both the 2D and the 3D HPE inference can run. The second is choosing an appropriate network that receives the 2D skeletons and the camera extrinsics as input and outputs the 3D skeleton. Additionally, to build the 3D skeleton, it is necessary to define a set of rules for when a person is partially or totally occluded in one or more views, and to decide how to establish the correspondence between the same keypoint in different images.
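As an example of what one such occlusion rule could look like, the sketch below keeps, for a single keypoint, only the camera views where its detection score is high enough, and flags the keypoint as reconstructable only if at least two views remain. This is a hypothetical rule for illustration, not a decision already made for the package; the function names, the threshold of 0.3 and the two-view minimum are all assumptions.

```python
from typing import Dict, List


def usable_views(scores_by_camera: Dict[str, float],
                 min_score: float = 0.3) -> List[str]:
    """Return the cameras in which this keypoint was detected reliably.

    Hypothetical rule: a view counts only if the score reaches min_score.
    """
    return sorted(cam for cam, score in scores_by_camera.items()
                  if score >= min_score)


def enough_views(scores_by_camera: Dict[str, float],
                 min_score: float = 0.3,
                 min_views: int = 2) -> bool:
    """True if the keypoint is visible in at least min_views cameras."""
    return len(usable_views(scores_by_camera, min_score)) >= min_views


# Made-up scores for one keypoint seen by three cameras;
# camera_2 is treated as occluded because of its low score.
scores = {"camera_1": 0.85, "camera_2": 0.10, "camera_3": 0.62}
views = usable_views(scores)
ok = enough_views(scores)
```

Keypoints that fail a check like this would need a fallback, for example being skipped for that frame, which is exactly the kind of rule that still has to be defined.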

You can find out more about the implementation here: https://github.com/danifpdra/hpe.git