During the past few months, I have been heavily experimenting with the Intel RealSense D415 & D435 depth camera. Today, I am going to show you how to easily transform between different coordinate systems. RealSense D415/435 is a low-cost device that can enhance your applications with 3D perception. Its technical specs are displayed below:

Use Environment | Indoor/Outdoor |
---|---|

Depth Technology | Active IR Stereo (Global Shutter) |

Main Intel® RealSense™ component | Intel® RealSense™ Vision Processor D4 Intel® RealSense™ module D430 |

Depth FOV (Horizontal × Vertical × Diagonal) | 86 x 57 x 94 (+/- 3°) |

Depth Stream Output Resolution | Up to 1280 x 720 |

Depth Stream Output Frame Rate | Up to 90 fps |

Minimum Depth Distance (Min-Z) | 0.2m |

Sensor Shutter Type | Global shutter |

Maximum Range | ~10.0m |

RGB Sensor Resolution and Frame Rate | 1920 x 1080 at 30 fps |

RGB Sensor FOV (Horizontal x Vertical x Diagonal) | 69.4° x 42.5° x 77° (+/- 3°) |

Camera Dimension (Length x Depth x Height) | 90 mm x 25 mm x 25 mm |

Connectors | USB 3.0 Type – C |

Mounting Mechanism | One 1/4-20 UNC thread mounting point Two M3 thread mounting points |

## What is Coordinate Mapping?

One of the most common tasks when using depth cameras is mapping 3D world-space coordinates to 2D screen-space coordinates (and vice-versa).

The coordinate system in the 3D world-space is measured in meters. So, a 3D point is represented as a set of X, Y, and Z coordinates, which refer to the horizontal, vertical, and depth axis respectively. The origin point (0, 0, 0) of the 3D coordinate system is the position of the camera.

The coordinate system in the 2D screen-space is measured in pixels. A 2D point is represented as a set of X and Y coordinates, which refer to the positions of the leftmost and topmost point of the frame, respectively. The origin point (0, 0) of the Cartesian system is the top-left edge of the frame. RealSense consists of an RGB and a Depth sensor. This means there will be two discrete frame types generated from each sensor. For now, we’ll only focus on the Depth frames.

Coordinate mapping is the process of converting between the 3D and the 2D system.

Such transformations require knowledge of the internal configuration of the camera (intrinsic and extrinsic parameters). The intrinsic and extrinsic parameters specify properties such as the distortion of the lens, the focal point, the image format, the rotation matrix, etc. As a result, the required Mathematical transformations are usually performed internally by the SDK. Indeed, Intel has exposed the transformations as part of their **C++** and **Python** frameworks.

If you’ve been following this blog, you know that I am building most of my commercial applications in C# and Unity3D. So, it came out as a big surprise to see that the coordinate mapping functionality was not part of the official C# framework. This is why I decided to port the original C++ projection & deprojection code to C#.

Thankfully, the C# API already includes an **Intrinsics** struct. The Intrinsics struct is providing us with the raw camera intrinsic parameters.

## Background (lenses, distortion, photography)

Keep in mind that we’ll be using some terms that are common in photography and photogrammetry. The RealSense intrinsic parameters are the following:

### Width/Height

The width and height parameters are the dimensions of the camera image, measured in pixels.

### Principal Point

You can think of the principal point as the point in the camera where the parallel light rays are focusing. The coordinates of the principal point are stored in the **ppx** and **ppy** values.

### Focal Length

Focal length is the distance from the center of the lens to the principal point. It is represented with the **fx** and **fy** parameters.

The following illustration will help you understand the concept visually:

### Distortion model

Lenses do not see the world as a perfect rectangular frame. Instead, the frame is deviating from its rectilinear projection. The deviation is what causes those fish-eye effects. Distortion can be “corrected” if we know its coefficients. The correction is using the Brown-Conrady transformations.

### Distortion Coefficients

The distortion coefficients are values which describe the distortion in Mathematical terms and are stored in the **coeffs** matrix.

If all this seems too hard to understand, that’s because it is. Don’t worry, though. I promise I’ll do the Math for you 😉

## Prerequisites

To test the source code, you’ll need a RealSense camera, as well as the official RealSense SDK for C# or Unity.

- RealSense D435 camera or RealSense D415 camera
- RealSense SDK
- RealSense Unity package
- RealSense Source code

The recommended development environment is Visual Studio 2017 and Unity3D.

## Mapping 3D to 2D

First, we are going to convert 3D world coordinates to 2D camera coordinates. To encapsulate the 3D coordinates, I’ve created a simple **Vector3D** struct (X, Y, Z floating-point values). To represent the 2D coordinates, I’ve created a **Vector2D** struct (X, Y floating-point values).

```
public struct Vector3D
{
public float X;
public float Y;
public float Z;
}
public struct Vector2D
{
public float X;
public float Y;
}
```

*Obviously, you are free to use the native .NET or Unity vector structures if you want to.*

The following method is extending the Intrinsics class. It’s taking a 3D point as an input and it’s outputting the corresponding 2D coordinates. In case the transformations fail for any reason, the (0,0) point is returned.

```
public static Vector2D Map3DTo2D(this Intrinsics intrinsics, Vector3D point)
{
Vector2D pixel = new Vector2D();
float x = point.X / point.Z;
float y = point.Y / point.Z;
if (intrinsics.model == Distortion.ModifiedBrownConrady)
{
float r2 = x * x + y * y;
float f = 1f + intrinsics.coeffs[0] * r2 + intrinsics.coeffs[1] * r2 * r2 + intrinsics.coeffs[4] * r2 * r2 * r2;
x *= f;
y *= f;
float dx = x + 2f * intrinsics.coeffs[2] * x * y + intrinsics.coeffs[3] * (r2 + 2 * x * x);
float dy = y + 2f * intrinsics.coeffs[3] * x * y + intrinsics.coeffs[2] * (r2 + 2 * y * y);
x = dx;
y = dy;
}
if (intrinsics.model == Distortion.Ftheta)
{
float r = (float)Math.Sqrt(x * x + y * y);
float rd = (1f / intrinsics.coeffs[0] * (float)Math.Atan(2f * r * (float)Math.Tan(intrinsics.coeffs[0] / 2f)));
x *= rd / r;
y *= rd / r;
}
pixel.X = x * intrinsics.fx + intrinsics.ppx;
pixel.Y = y * intrinsics.fy + intrinsics.ppy;
return pixel;
}
```

## Mapping 2D to 3D

Now, we are going to do the opposite: map a 2D point to the 3D space. To do so, we need to know a couple of things:

- The X and Y of the 2D point, and
- The depth of the 2D point

The depth of the desired point can be found by calling the GetDistance() method of the DepthFrame class.

`float depth = depthFrame.GetDistance(x, y);`

Given these inputs, the following method will transform the 2D screen coordinates to 3D world coordinates:

```
public static Vector3D Map2DTo3D(this Intrinsics intrinsics, Vector2D pixel, float depth)
{
Vector3D point = new Vector3D();
float x = (pixel.X - intrinsics.ppx) / intrinsics.fx;
float y = (pixel.Y - intrinsics.ppy) / intrinsics.fy;
if (intrinsics.model == Distortion.InverseBrownConrady)
{
float r2 = x * x + y * y;
float f = 1 + intrinsics.coeffs[0] * r2 + intrinsics.coeffs[1] * r2 * r2 + intrinsics.coeffs[4] * r2 * r2 * r2;
float ux = x * f + 2 * intrinsics.coeffs[2] * x * y + intrinsics.coeffs[3] * (r2 + 2 * x * x);
float uy = y * f + 2 * intrinsics.coeffs[3] * x * y + intrinsics.coeffs[2] * (r2 + 2 * y * y);
x = ux;
y = uy;
}
point.X = depth * x;
point.Y = depth * y;
point.Z = depth;
return point;
}
```

### Mapping 2D color to depth

Finally, there is another coordinate mapping scenario: converting between 2D **color** coordinates and 2D **depth** coordinates. The C# SDK is providing the **Align** class for this purpose, so you can refer to the official documentation on usage instructions.

## Summary

This is it, folks! For your reference, here is the complete source code:

```
using Intel.RealSense;
using System;
namespace Intel.RealSense.Extensions
{
/// <summary>
/// Converts between 2D and 3D RealSense coordinates.
/// </summary>
public static class CoordinateMapper
{
/// <summary>
/// Maps the specified 3D point to the 2D space.
/// </summary>
/// <param name="intrinsics">The camera intrinsics to use.</param>
/// <param name="point">The 3D point to map.</param>
/// <returns>The corresponding 2D point.</returns>
public static Vector2D Map3DTo2D(this Intrinsics intrinsics, Vector3D point)
{
Vector2D pixel = new Vector2D();
float x = point.X / point.Z;
float y = point.Y / point.Z;
if (intrinsics.model == Distortion.ModifiedBrownConrady)
{
float r2 = x * x + y * y;
float f = 1f + intrinsics.coeffs[0] * r2 + intrinsics.coeffs[1] * r2 * r2 + intrinsics.coeffs[4] * r2 * r2 * r2;
x *= f;
y *= f;
float dx = x + 2f * intrinsics.coeffs[2] * x * y + intrinsics.coeffs[3] * (r2 + 2 * x * x);
float dy = y + 2f * intrinsics.coeffs[3] * x * y + intrinsics.coeffs[2] * (r2 + 2 * y * y);
x = dx;
y = dy;
}
if (intrinsics.model == Distortion.Ftheta)
{
float r = (float)Math.Sqrt(x * x + y * y);
float rd = (1f / intrinsics.coeffs[0] * (float)Math.Atan(2f * r * (float)Math.Tan(intrinsics.coeffs[0] / 2f)));
x *= rd / r;
y *= rd / r;
}
pixel.X = x * intrinsics.fx + intrinsics.ppx;
pixel.Y = y * intrinsics.fy + intrinsics.ppy;
return pixel;
}
/// <summary>
/// Maps the specified 2D point to the 3D space.
/// </summary>
/// <param name="intrinsics">The camera intrinsics to use.</param>
/// <param name="pixel">The 2D point to map.</param>
/// <param name="depth">The depth of the 2D point to map.</param>
/// <returns>The corresponding 3D point.</returns>
public static Vector3D Map2DTo3D(this Intrinsics intrinsics, Vector2D pixel, float depth)
{
Vector3D point = new Vector3D();
float x = (pixel.X - intrinsics.ppx) / intrinsics.fx;
float y = (pixel.Y - intrinsics.ppy) / intrinsics.fy;
if (intrinsics.model == Distortion.InverseBrownConrady)
{
float r2 = x * x + y * y;
float f = 1 + intrinsics.coeffs[0] * r2 + intrinsics.coeffs[1] * r2 * r2 + intrinsics.coeffs[4] * r2 * r2 * r2;
float ux = x * f + 2 * intrinsics.coeffs[2] * x * y + intrinsics.coeffs[3] * (r2 + 2 * x * x);
float uy = y * f + 2 * intrinsics.coeffs[3] * x * y + intrinsics.coeffs[2] * (r2 + 2 * y * y);
x = ux;
y = uy;
}
point.X = depth * x;
point.Y = depth * y;
point.Z = depth;
return point;
}
}
/// <summary>
/// Represensts a 2D vector/point.
/// </summary>
public struct Vector2D
{
public float X;
public float Y;
}
/// <summary>
/// Represensts a 3D vector/point.
/// </summary>
public struct Vector3D
{
public float X;
public float Y;
public float Z;
}
}
```

There you have it. You can now use the full power of RealSense in your C# and Unity applications!

‘Til the next time… keep coding!

Hello, would you mind providing the intrinsics of the D415 for me to use in my C++ program? Some of the github links don’t work anymore.

Or you could show me the way to access this information, thanks!

Hi Franklin. The intrinsics are provided by the RealSense SDK. Latest C++ version methods can be found here.

This is what you need :

StreamProfile depth_stream = _oSelectedPipe.GetStream(Stream.Depth);

VideoStreamProfile vspDepth = depth_stream.Cast();

Hi ! I intend to buy the version with Realsense support, but I’d like to know if the compatibility with realsense cameras means that I can use all angle features and analysis such as events like PeopleDetected and PeopleLeft.

Hello Nikolas. I assume you are referring to our Vitruvius product. All of our product versions include the same functionality, so you don’t miss anything.

Hello!

I’m not sure if you are still checking comments on this post, but I do have a few questions if you still are. I am just getting into programming with depth cameras (using the Intel RealSense D415) and while I have experience coding in Java and C#, I’m unfamiliar with a lot of the methods and attributes that the D415 uses (for instance, how PipeLine and DepthSensor objects are used, especially in creating motion-tracking applications). If you had any basic advice on how to start, I’d appreciate it. Thanks!

Hi Glaceon. You can start by downloading the official RealSense package for Unity3D. Includes a lot of handy demos and samples to get started quickly and easily.

This maybe a stupid question but I get the coordinates but how can I display them when I start the code?

That would depend on the framework you are using. For example, you could use a Canvas with Ellipses in XAML or a Canvas with RawImage elements in Unity3D.

Hello, and thanks for writing this!

I wanted to ask how I could use this in Unity. When I try dragging it onto a game object (like the D435 series prefab) it says it cannot be used because it’s abstract.

What changes do I need to make to pop it onto my camera prefab?

My ultimate goal is to get it to read the z values of specific X/Y coordinates to me.

Thanks for your comment. You would need to create a C# “wrapper” class and add the equivalent properties as Editor fields (SerializeField). Toggling the Editor fields would alter their corresponding C# properties.

Sir, can you please help a little more. I want to get the depth value of particular pixel. please explain the usage of this source code. I am unable to understand how to use this script in Unity with RealSense SDK. Thanks

First, install the RealSense Unity package. Then, import one of the samples and capture the information you need from the built-in device/sensor class.