Exploring options for converting multiple single camera skeletons to 3D skeletons
To simplify the problem, I created Python scripts that work outside ROS as a first approach:
- Script that reads the rosbag, converts the Image ROS messages, and saves them to a folder.
- Script that reads the folder containing the images from the different cameras, uses OpenPose to estimate the 2D skeleton pose for each image, and saves, for each camera, a .npy file with the coordinates of the skeleton keypoints. It also creates a new folder with a sub-folder per camera containing the images with the skeletons overlaid.
- Script that reads each camera's .npy file and the calibrated xacro of the system to obtain the calibrated transformation tree between the sensors, and outputs the 3D skeleton.
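As a minimal sketch of the second step's output format, the per-camera OpenPose keypoints could be stored as one .npy array per camera. The `(frames, joints, 3)` layout with an `(x, y, confidence)` triple per joint is an assumption on my part (OpenPose's BODY_25 model produces 25 keypoints), not necessarily the exact layout used in the repository:

```python
import os
import tempfile

import numpy as np

# Assumed layout: one array per camera, shape (frames, joints, 3),
# where each joint stores (x, y, confidence) from OpenPose BODY_25.
n_frames, n_joints = 4, 25
keypoints = np.random.rand(n_frames, n_joints, 3).astype(np.float32)

out_dir = tempfile.mkdtemp()
path = os.path.join(out_dir, "camera_1.npy")  # hypothetical file name
np.save(path, keypoints)

# The triangulation script would later reload each camera's array:
loaded = np.load(path)
print(loaded.shape)
```

Keeping one file per camera makes it easy to associate each 2D skeleton with the corresponding sensor frame in the calibration tree.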
I also spent the month studying and experimenting with several options for converting multiple 2D skeletons into a single 3D skeleton. Some of the most popular approaches are:
- Single-frame estimation using CNNs or GCNs
- Video-based estimation using temporal or spatio-temporal networks
- 3D pose triangulation
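Of these, 3D pose triangulation is the approach that maps most directly onto the calibrated multi-camera setup above: each joint seen in two or more views can be lifted to 3D with linear (DLT) triangulation. Below is a self-contained sketch with two synthetic cameras (the projection matrices and the point are made up for illustration; in practice they would come from the calibrated transformation tree):

```python
import numpy as np

def triangulate_point(proj_mats, points_2d):
    """Triangulate one 3D point from N >= 2 views via the linear DLT method.

    proj_mats: list of 3x4 projection matrices (K @ [R | t]).
    points_2d: list of (u, v) pixel coordinates, one per camera.
    """
    rows = []
    for P, (u, v) in zip(proj_mats, points_2d):
        # Each view contributes two linear constraints on the homogeneous point.
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    A = np.stack(rows)
    # The solution is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]  # de-homogenize

# Two synthetic cameras: a reference camera and one translated along x.
K = np.array([[500.0, 0, 320], [0, 500, 240], [0, 0, 1]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])

X_true = np.array([0.2, -0.1, 3.0, 1.0])  # homogeneous ground-truth point
uv = []
for P in (P1, P2):
    x = P @ X_true
    uv.append(x[:2] / x[2])  # project to pixel coordinates

X_hat = triangulate_point([P1, P2], uv)  # should recover X_true[:3]
```

Applied per joint and per frame, this turns the per-camera 2D skeletons into a single 3D skeleton; with real detections, joints with low OpenPose confidence would typically be excluded from the system before solving.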
You can find out more about the implementation here: https://github.com/danifpdra/hpe.git .
I've also recorded a new rosbag with several people in the scene to study the methodology in a multi-person context.