Hello boys (and girls, of course). I am back!

Hello boys, I'm back!

Today, I am going to give you a detailed guide to get started with the new Azure Kinect camera programming.

What is Azure Kinect?

Azure Kinect is Microsoft’s latest depth sensing camera and the natural successor of the older Microsoft Kinect One sensor.

In terms of hardware, Azure Kinect is actually a “bundle” of 4 devices:

  • A 4K RGB camera (Color data)
  • A wide-angle depth sensor (Depth data)
  • An inertial measurement unit (Accelerometer – Gyroscope)
  • A microphone array

The Microsoft Camera SDK allows developers to access and combine the raw data of the aforementioned devices.

On top of that, Microsoft has also provided a new Skeleton Tracking kit that detects the 3D coordinates of 32 human body joints.

This is a part of what you’ll be able to do upon completing this tutorial:

Prerequisites

To run the Azure Kinect demos, you need a computer with the following specifications:

  • 7th Gen Intel® CoreTM i5 Processor (Quad Core 2.4 GHz or faster)
  • 4 GB Memory
  • NVIDIA GeForce GTX 1070 or better
  • Dedicated USB3 port
  • Windows 10

To write and execute code, you need to install the following software:

To run the demos in Unity3D, you also need to install the following plugin:

Azure Kinect for Unity3D

Download the Azure Kinect SDK for Unity3D (C# programming language).

Download for Unity3D

Kinect Camera Configurations

No matter which programming language you are using, the process is exactly the same. Before accessing the color, depth, and skeleton streams, you need to get some information about the connected camera. Code-wise, you need to create a device object and specify the desired parameters.

For our demo, we need to support the following parameters:

Device index

As you may know, you can plug many Azure Kinect devices in the same computer. Each device has a unique index. Specifying the device index, the SDK knows which camera to access. In the most common scenario, we only have one camera connected, thus the index would be zero.

Color resolution

The RGB color camera supports 6 color resolutions:

RGB Camera Resolution (HxV)Aspect RatioFormat OptionsFrame Rates (FPS)Nominal FOV (HxV)(post-processed)
3840×216016:9MJPEG0, 5, 15, 3090°x59°
2560×144016:9MJPEG0, 5, 15, 3090°x59°
1920×108016:9MJPEG0, 5, 15, 3090°x59°
1280×72016:9MJPEG/YUY2/NV120, 5, 15, 3090°x59°
4096×30724:3MJPEG0, 5, 1590°x74.3°
2048×15364:3MJPEG0, 5, 15, 3090°x74.3°

Be careful while choosing the preferred resolution — high resolutions may cause significant delays in your application! Moreover, the 4K resolution is only running at 15 frames per second maximum!

Color format

The color format specifies how the color frames will be encoded. For the sake of simplicity, let’s choose the RGBA format: that is, every pixel will be encoded as a set of Red, Green, Blue, and Alpha values.

Depth mode

Similarly to the color resolution, the depth mode affects the frame rate, resolution, and field of view.

ModeResolutionFoIFPSOperating range*Exposure time
NFOV unbinned640×57675°x65°0, 5, 15, 300.5 – 3.86 m12.8 ms
NFOV 2×2 binned (SW)320×28875°x65°0, 5, 15, 300.5 – 5.46 m12.8 ms
WFOV 2×2 binned512×512120°x120°0, 5, 15, 300.25 – 2.88 m12.8 ms
WFOV unbinned1024×1024120°x120°0, 5, 150.25 – 2.21 m20.3 ms
Passive IR1024×1024N/A0, 5, 15, 30N/A1.6 ms

Frame rate (FPS)

The frame rate configuration setting specifies the desired number of frames per second, as long as the aforementioned parameters allow it. For example, if you select the “WFOV unbineed” depth mode, you won’t be able to set it to 30 FPS, no matter how much you want to.

Starting and stopping the camera

Enough said! Let’s bring it all together and write some code to specify the desired configuration and open the camera!

sensor = KinectSensor.Create(new Configuration
{
    DeviceIndex = 0,
    ColorResolution = ColorResolution.ColorResolution_1080P,
    ColorFormat = ColorFormat.BGRA32,
    DepthMode = DepthMode.NFOV_Unbinned,
    FPS = FramesPerSecond.FPS_30
});
sensor?.Open();

In case you want to have the default configuration, simply use:

sensor = KinectSensor.GetDefault();

Stopping the camera is straightforward, too:

sensor.Stop();

Azure Kinect Frames

The color, depth, and skeleton data are bundled into frames. Each frame is a set of raw color, depth, and skeleton data. A new frame is available 30 times per second (or 15 or 5, depending on your configuration). Here is how to access a latest frame:

Frame frame = sensor.Update();

Azure Kinect: Color Data

The Microsoft SDK allows us to access the raw color data in the form of a single-dimension byte array. The byte array is encoded like this:

RGBARGBARGBARGBARGBARGBARGBARGBARGBARGBARGBARGBA...

Here’s how to access the data:

if (frame != null)
{
    if (frame.ColorFrameSource != null)
    {
        byte[] colorData = frame.ColorFrameSource.Data;
        int colorWidth = frame.ColorFrameSource.Width;
        int colorHeight = frame.ColorFrameSource.Height;
    }
}

Azure Kinect: Depth Data

The depth data is encoded as a single-dimension array of unsigned short integer values. Each value represents the distance of the corresponding point in millimeters.

Similarly to the color data, we can access the depth data as follows:

if (frame != null)
{
    if (frame.DepthFrameSource != null)
    {
        ushort[] depthData = frame.DepthFrameSource.Data;
        int depthWidth = frame.DepthFrameSource.Width;
        int depthHeight = frame.DepthFrameSource.Height;
    }
}

Azure Kinect: Body Tracking

Lastly, let’s get to the cool part: body tracking. The new Azure Kinect Body Tracking SDK was developed using advanced Machine Learning AI algorithms. It’s combining the raw color and depth data to accurately estimate the pose of a person.

In theory, there is no limit to the number of people the SDK can track. In practice, I would not recommend avoiding overcrowded environments. First, having more than 10 people within the field of view would make most of their joints invisible to the sensor (thus, less accurate). Second, we are living in the Coronavirus era and we know it’s not safe.

Joking aside, the SDK supports the following joints:

Azure Kinect - Human body joints

Each joint has the following properties:

  • Type – the name of the joint (e.g. “Head”).
  • Tracking State – the tracking confidence  of the joint.
  • Position – the coordinates of the joint (XYZ) in the 3D world space.
  • Orientation – the orientation of the joint in terms of its parent joint.

Here’s how to access the tracked bodies and capture the position of the head joint. Obviously, you can access all of the other joints in the same way.

if (frame.BodyFrameSource != null)
{
    List bodies = frame.BodyFrameSource.Bodies;
 
    foreach (Body body in bodies)
    {
        Joint head = body.Joints[JointType.Head];
        Vector3 position = head.Position;
    }
}

The Unity3D SDK provides samples that visualize the streams and skeletons.

As a bonus, it also includes a more advanced Avateering demo.

Azure Kinect for Unity3D

Download the Azure Kinect SDK for Unity3D (C# programming language).

Download for Unity3D

Summary

In this tutorial, you’ve learnt the following:

  • How to configure, open, and stop an Azure Kinect device.
  • How to access the raw color and depth data.
  • How to access the skeleton data.

So, what else do you want me to cover about the new Azure Kinect sensor? Let me know in the comments below!

‘Til the next time, keep Kinecting!

Cover image credits: microsoft.com

Vangos Pterneas

Vangos Pterneas

Vangos is helping innovative businesses and Fortune-500 companies create amazing digital products. He's an award-winning Microsoft Most Valuable Professional.

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.