Hello boys (and girls, of course). I am back!

Today, I am going to give you a detailed guide to getting started with programming the new Azure Kinect camera.

What is Azure Kinect?

Azure Kinect is Microsoft’s latest depth sensing camera and the natural successor of the older Microsoft Kinect One sensor.

In terms of hardware, Azure Kinect is actually a “bundle” of 4 devices:

  • A 4K RGB camera (Color data)
  • A wide-angle depth sensor (Depth data)
  • An inertial measurement unit (Accelerometer – Gyroscope)
  • A microphone array

The Azure Kinect Sensor SDK allows developers to access and combine the raw data of the aforementioned devices.

On top of that, Microsoft has also provided a new Body Tracking SDK that detects the 3D coordinates of 32 human body joints.

To run the Azure Kinect demos, you need a computer with the following specifications:

  • 7th Gen Intel® Core™ i5 Processor (Quad Core 2.4 GHz or faster)
  • 4 GB Memory
  • NVIDIA GeForce GTX 1070 or better
  • Dedicated USB3 port
  • Windows 10

To write and execute code, you need to install the following software:

To run the demos in Unity3D, you also need to install the following plugin:

Azure Kinect for Unity3D

Download the Azure Kinect SDK for Unity3D (C# programming language).

Download for Unity3D

Kinect Camera Configurations

No matter which programming language you are using, the process is exactly the same. Before accessing the color, depth, and skeleton streams, you need to get some information about the connected camera. Code-wise, you need to create a device object and specify the desired parameters.

For our demo, we need to specify the following parameters:

Device index

As you may know, you can plug multiple Azure Kinect devices into the same computer. Each device has a unique index, and by specifying that index you tell the SDK which camera to access. In the most common scenario, only one camera is connected, so the index is zero.

Color resolution

The RGB color camera supports 6 color resolutions:

RGB camera resolution (HxV) | Aspect ratio | Format options | Frame rates (FPS) | Nominal FOV (HxV, post-processed)
3840×2160 | 16:9 | MJPEG | 0, 5, 15, 30 | 90°x59°
2560×1440 | 16:9 | MJPEG | 0, 5, 15, 30 | 90°x59°
1920×1080 | 16:9 | MJPEG | 0, 5, 15, 30 | 90°x59°
1280×720 | 16:9 | MJPEG/YUY2/NV12 | 0, 5, 15, 30 | 90°x59°
4096×3072 | 4:3 | MJPEG | 0, 5, 15 | 90°x74.3°
2048×1536 | 4:3 | MJPEG | 0, 5, 15, 30 | 90°x74.3°

Be careful when choosing the preferred resolution: high resolutions may cause significant delays in your application! Moreover, as the table shows, the 4096×3072 resolution runs at a maximum of 15 frames per second!
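The sketch below shows why high resolutions are costly. It estimates how much uncompressed BGRA32 data each resolution produces per frame, and per second at its maximum frame rate. BGRA32 uses 4 bytes per pixel and is the format used in the configuration code later in this guide. The `ColorBandwidth` class and the mode labels are hypothetical names for this illustration, not part of the Azure Kinect SDK. The widths, heights, and frame rates come from the table above.

```csharp
using System.Collections.Generic;

// Hypothetical helper, not part of the Azure Kinect SDK: estimates the size of
// uncompressed BGRA32 color frames for each resolution in the table above.
public static class ColorBandwidth
{
    // BGRA32 stores 4 bytes per pixel: Blue, Green, Red, Alpha.
    public const int BytesPerPixel = 4;

    // Width, height, and maximum frame rate, copied from the color resolution table.
    public static readonly Dictionary<string, (int Width, int Height, int MaxFps)> Modes =
        new Dictionary<string, (int Width, int Height, int MaxFps)>
        {
            ["3840x2160"] = (3840, 2160, 30),
            ["2560x1440"] = (2560, 1440, 30),
            ["1920x1080"] = (1920, 1080, 30),
            ["1280x720"]  = (1280, 720, 30),
            ["4096x3072"] = (4096, 3072, 15),
            ["2048x1536"] = (2048, 1536, 30),
        };

    // Bytes needed to hold a single uncompressed frame.
    public static long BytesPerFrame(string mode)
    {
        var (width, height, _) = Modes[mode];
        return (long)width * height * BytesPerPixel;
    }

    // Bytes produced per second when the camera runs at its maximum frame rate.
    public static long BytesPerSecondAtMaxFps(string mode)
    {
        return BytesPerFrame(mode) * Modes[mode].MaxFps;
    }
}
```

For example, a 1920×1080 frame holds 8,294,400 bytes. A 3840×2160 frame holds 33,177,600 bytes, four times as much, and your application has to copy and process it up to 30 times per second.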

Color format

The color format specifies how the color frames will be encoded. For the sake of simplicity, let’s choose the BGRA format: every pixel will be encoded as a set of Blue, Green, Red, and Alpha values.

Depth mode

As with the color resolution, the depth mode affects the frame rate, resolution, and field of view.

Mode | Resolution | FoV | FPS | Operating range | Exposure time
NFOV unbinned | 640×576 | 75°x65° | 0, 5, 15, 30 | 0.5 – 3.86 m | 12.8 ms
NFOV 2×2 binned (SW) | 320×288 | 75°x65° | 0, 5, 15, 30 | 0.5 – 5.46 m | 12.8 ms
WFOV 2×2 binned | 512×512 | 120°x120° | 0, 5, 15, 30 | 0.25 – 2.88 m | 12.8 ms
WFOV unbinned | 1024×1024 | 120°x120° | 0, 5, 15 | 0.25 – 2.21 m | 20.3 ms
Passive IR | 1024×1024 | N/A | 0, 5, 15, 30 | N/A | 1.6 ms

Frame rate (FPS)

The frame rate setting specifies the desired number of frames per second, as long as the aforementioned parameters allow it. For example, if you select the “WFOV unbinned” depth mode, you won’t be able to set it to 30 FPS, no matter how much you want to.
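You can catch invalid combinations before opening the camera by checking the requested frame rate against the depth mode table. The sketch below does this. `DepthModeOption` and `DepthFrameRates` are hypothetical names introduced for illustration. The SDK and the Unity plugin use their own `DepthMode` and `FramesPerSecond` types, so you would map those onto this check.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical enum for this sketch; the real SDK/plugin uses its own DepthMode type.
public enum DepthModeOption
{
    NfovUnbinned,
    Nfov2x2Binned,
    Wfov2x2Binned,
    WfovUnbinned,
    PassiveIr
}

// Hypothetical helper: validates a depth mode / frame rate combination
// against the depth mode table above.
public static class DepthFrameRates
{
    // Supported frame rates per depth mode, copied from the table.
    private static readonly Dictionary<DepthModeOption, int[]> Supported =
        new Dictionary<DepthModeOption, int[]>
        {
            [DepthModeOption.NfovUnbinned]  = new[] { 0, 5, 15, 30 },
            [DepthModeOption.Nfov2x2Binned] = new[] { 0, 5, 15, 30 },
            [DepthModeOption.Wfov2x2Binned] = new[] { 0, 5, 15, 30 },
            [DepthModeOption.WfovUnbinned]  = new[] { 0, 5, 15 },
            [DepthModeOption.PassiveIr]     = new[] { 0, 5, 15, 30 },
        };

    // True when the table lists the requested frame rate for this mode.
    public static bool IsSupported(DepthModeOption mode, int fps)
    {
        return Array.IndexOf(Supported[mode], fps) >= 0;
    }

    // Highest supported frame rate that does not exceed the requested one.
    public static int BestAvailable(DepthModeOption mode, int requestedFps)
    {
        int best = 0;
        foreach (int fps in Supported[mode])
        {
            if (fps <= requestedFps && fps > best) best = fps;
        }
        return best;
    }
}
```

For instance, asking for 30 FPS in WFOV unbinned mode fails the check, and `BestAvailable` falls back to 15 FPS.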

Starting and stopping the camera

Enough said! Let’s bring it all together and write some code to specify the desired configuration and open the camera!

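sensor = KinectSensor.Create(new Configuration
{
    DeviceIndex = 0,                                          // first (and usually only) connected camera
    ColorResolution = ColorResolution.ColorResolution_1080P,  // 1920×1080
    ColorFormat = ColorFormat.BGRA32,                         // 4 bytes per pixel: Blue, Green, Red, Alpha
    DepthMode = DepthMode.NFOV_Unbinned,                      // 640×576, up to 30 FPS
    FPS = FramesPerSecond.FPS_30                              // must be allowed by the chosen modes
});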

To use the default configuration instead, simply call:

sensor = KinectSensor.GetDefault();

Stopping the camera is straightforward, too:


Azure Kinect Frames

The color, depth, and skeleton data are bundled into frames, and each frame holds one set of raw color, depth, and skeleton data. A new frame becomes available 30 times per second (or 15 or 5, depending on your configuration). Here is how to access the latest frame:

Frame frame = sensor.Update();

Azure Kinect: Color Data

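The Microsoft SDK allows us to access the raw color data in the form of a single-dimension byte array. With the BGRA32 format, every pixel occupies 4 consecutive bytes (Blue, Green, Red, Alpha), and the pixels are stored row by row, starting from the top-left corner of the image.
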
Here’s how to access the data:

if (frame != null)
{
    if (frame.ColorFrameSource != null)
    {
        byte[] colorData = frame.ColorFrameSource.Data;     // BGRA bytes, row by row
        int colorWidth = frame.ColorFrameSource.Width;      // in pixels
        int colorHeight = frame.ColorFrameSource.Height;    // in pixels
    }
}
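To read an individual pixel from `colorData`, compute its byte offset and pick the four channels in B, G, R, A order. The helper below is a hypothetical sketch. It assumes BGRA32 data, as in the configuration code, with tightly packed rows (stride equal to width × 4) and no padding between rows.

```csharp
// Hypothetical helper: reads one pixel from a BGRA32 color byte array.
// Assumes rows are tightly packed (stride == width * 4) and stored top to bottom.
public static class BgraPixels
{
    public const int BytesPerPixel = 4;

    public static (byte Red, byte Green, byte Blue, byte Alpha) GetPixel(byte[] colorData, int width, int x, int y)
    {
        // Each pixel starts at (row * width + column) * 4; the byte order is B, G, R, A.
        int i = (y * width + x) * BytesPerPixel;
        return (colorData[i + 2], colorData[i + 1], colorData[i], colorData[i + 3]);
    }
}
```

For example, `BgraPixels.GetPixel(colorData, colorWidth, colorWidth / 2, colorHeight / 2)` returns the color of the center pixel.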

Azure Kinect: Depth Data

The depth data is encoded as a single-dimension array of unsigned short integer values. Each value represents the distance of the corresponding point in millimeters.

Similarly to the color data, we can access the depth data as follows:

if (frame != null)
{
    if (frame.DepthFrameSource != null)
    {
        ushort[] depthData = frame.DepthFrameSource.Data;   // distances in millimeters, row by row
        int depthWidth = frame.DepthFrameSource.Width;      // in pixels
        int depthHeight = frame.DepthFrameSource.Height;    // in pixels
    }
}
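The depth array is indexed the same way as the color array, with one `ushort` per pixel instead of four bytes. A depth value of 0 means the sensor has no valid measurement for that pixel, so skip zeros when searching for the nearest point. `DepthPixels` is a hypothetical helper written for this illustration.

```csharp
// Hypothetical helpers for the ushort depth array described above.
// A value of 0 means the sensor has no valid measurement for that pixel.
public static class DepthPixels
{
    // Returns the distance in meters, or null when the pixel has no valid reading.
    public static float? GetDistanceMeters(ushort[] depthData, int width, int x, int y)
    {
        ushort millimeters = depthData[y * width + x];
        if (millimeters == 0) return null;
        return millimeters / 1000f;
    }

    // Finds the nearest valid point in the frame (in millimeters), or 0 if none is valid.
    public static ushort NearestMillimeters(ushort[] depthData)
    {
        ushort nearest = 0;
        foreach (ushort value in depthData)
        {
            if (value != 0 && (nearest == 0 || value < nearest)) nearest = value;
        }
        return nearest;
    }
}
```

Returning a nullable `float?` makes missing readings explicit. A plain 0 is easy to mistake for "touching the sensor".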

Azure Kinect: Body Tracking

Lastly, let’s get to the cool part: body tracking. The new Azure Kinect Body Tracking SDK was developed using deep-learning (neural network) models. It processes the raw depth and infrared data to accurately estimate the pose of each person in view.

In theory, there is no limit to the number of people the SDK can track. In practice, I would recommend avoiding overcrowded environments. First, with more than 10 people within the field of view, many of their joints would be occluded, so tracking would be less accurate. Second, we are living in the Coronavirus era, and we know crowds are not safe.

Joking aside, the SDK supports the following joints:

Azure Kinect - Human body joints

Each joint has the following properties:

  • Type – the name of the joint (e.g. “Head”).
  • Tracking State – the tracking confidence of the joint.
  • Position – the coordinates of the joint (XYZ) in the 3D world space.
  • Orientation – the orientation of the joint in terms of its parent joint.

Here’s how to access the tracked bodies and capture the position of the head joint. Obviously, you can access all of the other joints in the same way.

if (frame != null && frame.BodyFrameSource != null)
{
    List<Body> bodies = frame.BodyFrameSource.Bodies;

    foreach (Body body in bodies)
    {
        Joint head = body.Joints[JointType.Head];   // look up any joint by its type
        Vector3 position = head.Position;           // 3D coordinates of the head
    }
}
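A common next step is to focus on a single person, for example the one closest to the sensor. The sketch below picks that person by comparing head positions. It assumes the positions are expressed in camera space with the sensor at the origin. `TrackedHead` and `BodySelection` are hypothetical types. To keep the example self-contained, it uses `System.Numerics.Vector3`. In Unity, you would use `UnityEngine.Vector3` and its `magnitude` property instead of `Length()`.

```csharp
using System.Collections.Generic;
using System.Numerics;

// Hypothetical stand-in for a tracked body: an ID plus its head joint position,
// expressed in camera space with the sensor at the origin.
public sealed class TrackedHead
{
    public int BodyId { get; }
    public Vector3 Position { get; }

    public TrackedHead(int bodyId, Vector3 position)
    {
        BodyId = bodyId;
        Position = position;
    }
}

public static class BodySelection
{
    // Returns the body whose head is closest to the sensor, or null when no body is tracked.
    public static TrackedHead Closest(IEnumerable<TrackedHead> heads)
    {
        TrackedHead closest = null;
        foreach (TrackedHead head in heads)
        {
            if (closest == null || head.Position.Length() < closest.Position.Length())
                closest = head;
        }
        return closest;
    }
}
```

Using the straight-line distance from the origin, rather than the Z value alone, avoids any assumption about which axis points away from the camera.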

The Unity3D SDK provides samples that visualize the streams and skeletons.

As a bonus, it also includes a more advanced Avateering demo.

In this tutorial, you’ve learnt the following:

  • How to configure, open, and stop an Azure Kinect device.
  • How to access the raw color and depth data.
  • How to access the skeleton data.

So, what else do you want me to cover about the new Azure Kinect sensor? Let me know in the comments below!

‘Til the next time, keep Kinecting!

Cover image credits: microsoft.com

Vangos Pterneas

Vangos Pterneas is a professional software engineer and an award-winning Microsoft Most Valuable Professional (2014-2019). Since 2012, Vangos has been helping Fortune-500 companies and ambitious startups create demanding motion-tracking applications. He’s obsessed with analyzing and modeling every aspect of human motion using Computer Vision and Mathematics. Kinect programming started as a hobby and quickly evolved into a full-time business. Vangos shares his passion by regularly publishing articles and open-source projects that help fellow developers understand the fascinating Kinect technology.


  • Hi there,
    is this SDK compatible with multiple synchronized Azure Kinect devices?
    Thanks in advance,

    • Hi Jaime and thanks for your comment. Just like the Azure Kinect C++ SDK, you can set the property “synced images only” to “true”. This way, you’ll be receiving only the synced frames 🙂

  • Thomas says:

    Hi Vangos,

    I’m using your asset, but ran into one issue: the build stops responding when it is closed. I close the sensor in OnDestroy, and also tried OnApplicationQuit, to no avail. Not closing the sensor also makes the application hang. The only thing that fixes it is not opening the sensor in the first place – which of course is not what I want. Got any pointers?


  • Jennie says:


    How can I add an avatar dynamically in Demo_Kinect4Azure_Avateering Scene?
    I can’t set the AvatarRoot in the script.

    • Hi Jennie. You can instantiate a new prefab or even have the avatar visible/hidden. If that’s not what you mean, please elaborate so that I can provide more information 🙂

  • Ostap Onyskiv says:

    Hello! Is it possible to mix the RGB camera feed with the 3D scene? The idea is to replace the whole body of a person (except for the head) with a 3D character.

    • Yes, it’s feasible, but it’s quite complex. You’ll need to use the User frame. The User frame specifies whether a particular pixel belongs to a player. After acquiring the User frame information, you’ll need to manually remove the pixels below the Neck joint.

  • Erwan says:

    Hi Sir,
    I’m working on a school project and your blog helped me out a lot so far so thanks already!
    I need to use the body index in a Unity project, and I get this “Pixels not aligned to stride of each lines” error when I try to get it.
    Here is my code to retrieve the depth pixels and, just under it, my attempt at getting the body index.

    //Retrieves depths from depth image
    ushort[] depthPixels;
    depthPixels = frame.Capture.Depth.GetPixels<ushort>().ToArray();

    //Retrieves body index from frame
    ushort[] bodyIndex;
    bodyIndex = frame.BodyIndexMap.GetPixels<ushort>().ToArray();

    I don’t know what to do; I get the error on this last line. Could you help me? Thanks a lot

    • Hi Erwan! You are one step ahead: I was actually planning to cover this topic in a future Masterclass.

      To answer your question, the body-index data array is an array of bytes. There is no reason to use a different data type since the body-index array only contains small integers.

      Here is the working code:
      byte[] bodyIndex = frame.BodyIndexMap.GetPixels<byte>().ToArray();
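Once you have the `byte[]` body-index array, you can use it as a per-pixel mask. For example, you could hide everything that does not belong to a person, which is also the first step toward the "replace the body" idea discussed above. In the Azure Kinect Body Tracking SDK, a value of 255 (`K4ABT_BODY_INDEX_MAP_BACKGROUND`) marks background pixels, and any other value is the index of a tracked body. `BodyIndexMask` is a hypothetical helper. It assumes the BGRA color image has already been mapped to the depth camera's resolution, so that pixel i in both arrays refers to the same point.

```csharp
using System;

// Hypothetical helper built on the byte[] body-index array.
// 255 marks background pixels; any other value is the index of a tracked body.
public static class BodyIndexMask
{
    public const byte Background = 255;

    // Makes every BGRA pixel that does not belong to a body fully transparent.
    // Assumes the color image is already aligned to the body-index (depth) resolution.
    public static void HideBackground(byte[] bgra, byte[] bodyIndex)
    {
        if (bgra.Length != bodyIndex.Length * 4)
            throw new ArgumentException("Color and body-index images must have the same resolution.");

        for (int i = 0; i < bodyIndex.Length; i++)
        {
            if (bodyIndex[i] == Background)
                bgra[i * 4 + 3] = 0; // alpha byte of pixel i
        }
    }

    // Counts the pixels that belong to any tracked body.
    public static int CountBodyPixels(byte[] bodyIndex)
    {
        int count = 0;
        foreach (byte value in bodyIndex)
        {
            if (value != Background) count++;
        }
        return count;
    }
}
```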

  • Richard Nockles says:

    Love this! Thanks! I’m really interested in hand tracking. Do you have any tips or insight into hand movement?

  • Hans says:

    Hi, any ideas on using the Azure Kinect and an external RGB camera as a stereo pair, with sync for trigger launch and frame sync for post?

    • It’s doable, as long as we know the relative position of the external camera in terms of the Kinect. We can then sync their frames and map from one coordinate space to the other. It’s not supported by the SDK, so there’s a lot of custom work involved. In case you need help with a project, feel free to drop us a line.

  • Kohei Yoshimoto says:

    When I set the point cloud step to 1 or 2, the frame rate drops below 30 FPS.
    Do you have any tips or solutions for this problem?

    • Hello Kohei. Creating a point cloud is a resource-intensive task, especially if you are doing it in real-time. The step factor specifies the portion of the point cloud that will be created. The more points you add, the higher the processing needs would be. Steps 1 and 2 would consider all and half the points respectively. That’s why the recommended step is 4.

    • Kohei Yoshimoto says:

      Thank you for your reply.
      When the point cloud step was set in the sensor configuration, Update was not called at 30 FPS, despite very little processing.
      Is it set to automatically reduce fps?
      Is there any way to change it?

      I want to get only the coordinates of all points at 30fps.
      There is no need to overlay the points.
      Is there any other good way?

    • Generating the point cloud is what takes so long. That’s why Microsoft recommends the step factor to be at least 4.

      If you are using a supported machine, the SDK should be running at 30 FPS.

    • Kohei Yoshimoto says:

      How many point clouds are generated when the point cloud step is 3 or 4?

    • One point cloud is generated. The higher the step, the smaller the number of points.
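To put numbers on the step discussion above, here is a hypothetical sketch of step-based sampling. It assumes the step is applied to the flattened depth array, keeping every step-th pixel. That matches the "step 2 considers half the points" description in the replies, but the plugin's actual sampling may differ. `PointCloudSampling` is an illustrative name, not part of the SDK.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper illustrating the point cloud "step" idea.
// Assumes every step-th pixel of the flattened depth array is kept.
public static class PointCloudSampling
{
    // Number of sampled pixels: ceil(width * height / step).
    public static int SampledPixelCount(int width, int height, int step)
    {
        if (step < 1) throw new ArgumentOutOfRangeException(nameof(step));
        int total = width * height;
        return (total + step - 1) / step;
    }

    // Indices of the sampled pixels that carry a valid (non-zero) depth value.
    public static List<int> SampleValidIndices(ushort[] depthData, int step)
    {
        if (step < 1) throw new ArgumentOutOfRangeException(nameof(step));
        var indices = new List<int>();
        for (int i = 0; i < depthData.Length; i += step)
        {
            if (depthData[i] != 0) indices.Add(i);
        }
        return indices;
    }
}
```

For the NFOV unbinned mode (640×576, i.e. 368,640 pixels), step 1 yields 368,640 candidate points, step 2 yields 184,320, and step 4 yields 92,160. Each of those points must be converted to 3D coordinates within roughly 33 ms to keep up with 30 FPS.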

  • We have kiosks with 1660s installed. Will these work for basic background subtraction, or are 1070s absolutely required? Thanks.

  • Maxim says:

    Hi there!
    Though I can see it says I’d need windows, would there be a way to use this software/equipment on a Mac?

  • KimMyongKil says:

    BoneFrame data with Master and two Sub cameras:
    Is there a way to merge and get it?

    Obtained separately, direction not synchronized and recognized as the other two bins.

  • Jonathon says:

    Is there any way to get a 60fps playback? The 30fps seems to add quite a delay on my machine.
