The LightBuzz SDK produces 2D and 3D skeleton joint coordinates using Color and Depth data.
All sensor types produce 2D joint positions, also known as screen-space coordinates.
2D coordinates are relative to the color camera frame. Each coordinate has a horizontal (X) and a vertical (Y) value, measured from the top-left corner of the frame, which serves as the reference point. So, if the camera resolution is, e.g., 1280×720, every 2D coordinate falls within the 0–1280 (X) and 0–720 (Y) range.
2D coordinates are measured in pixels.
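Because 2D coordinates are plain pixel values, they are easy to remap for overlays or UI elements of a different size. As a minimal sketch, the variable names below are illustrative (they are not part of the SDK's API); the snippet simply normalizes a pixel coordinate to the [0, 1] range:

```csharp
// Hypothetical 2D joint value, expressed in pixels.
// The top-left corner of the color frame is (0, 0).
float x = 640.0f; // horizontal pixel position
float y = 360.0f; // vertical pixel position

// The color frame resolution (e.g., 1280×720).
int frameWidth = 1280;
int frameHeight = 720;

// Normalize to the [0, 1] range, e.g., for resolution-independent overlays.
float normalizedX = x / frameWidth;   // 0.5
float normalizedY = y / frameHeight;  // 0.5
```

Multiplying the normalized values by your display's dimensions maps the joint back onto any output resolution.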
Depth cameras (RealSense, Structure, Kinect, LiDAR, OAK-D) can also produce 3D joint positions, also known as world-space coordinates. 3D coordinates are relative to the camera, so the reference point is the camera itself. Each coordinate has a horizontal (X), vertical (Y), and depth (Z) value.
X and Y values can be either positive or negative, as shown in the image below. Z values are always positive.
3D coordinates are measured in meters.
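Because 3D coordinates are metric, real-world measurements fall out directly. As a sketch, here is the Euclidean distance between two joints; the plain tuple values below are illustrative stand-ins, not the SDK's joint types:

```csharp
using System;

// Illustrative 3D joint positions, in meters, relative to the camera.
var shoulder = (X: 0.10f, Y: 0.40f, Z: 1.50f);
var elbow    = (X: 0.15f, Y: 0.15f, Z: 1.45f);

// Euclidean distance gives the upper-arm length in meters.
float dx = elbow.X - shoulder.X;
float dy = elbow.Y - shoulder.Y;
float dz = elbow.Z - shoulder.Z;
float length = MathF.Sqrt(dx * dx + dy * dy + dz * dz);

Console.WriteLine($"{length:F2} m"); // ≈ 0.26 m
```

The same calculation works for any pair of joints, e.g., to estimate limb lengths or the person's distance from the camera (the Z value alone).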
Depth cameras are great for tracking objects in 3D; however, infrared technology is not always reliable and can’t “see” behind obstacles. To solve this problem, the LightBuzz SDK can post-process the depth feed and produce more accurate humanoid pose estimations.
Without enhanced depth:
With enhanced depth:
To turn the enhanced depth setting on or off, use the following property:
Sensor sensor = Sensor.Create(/* configuration */);
sensor.UseEnhancedDepth = false;
Since version 5.0, enhanced depth is enabled by default.
3D coordinates from 2D videos
The Enterprise version of our software allows you to capture 3D coordinates from plain RGB videos, webcams, and images — without the use of depth cameras. If you are an Enterprise customer, there’s nothing you need to do to enable that feature. The 3D joint coordinates will be automatically populated if you select the Webcam or Video sensor types!
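As an illustrative sketch, selecting the Webcam sensor type might look like the following. The configuration object and enum names here are assumptions modeled on the `Sensor.Create(/* configuration */)` call shown earlier; check the SDK reference for the exact API:

```csharp
// Hypothetical configuration — the SensorConfiguration type and
// SensorType enum names are illustrative, not the SDK's actual API.
Sensor sensor = Sensor.Create(new SensorConfiguration
{
    SensorType = SensorType.Webcam
});

// With the Enterprise version, the 3D joint coordinates are
// populated automatically; no extra setting is required.
```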