[Dprglist] Depth image vs pointcloud

Paul Bouchier paul.bouchier at gmail.com
Wed Jan 5 11:49:56 PST 2022


Thanks for the clarification Karim - that makes sense, and thanks for the
additional real-world perspective. We are definitely on the same page.

Paul

On Wed, Jan 5, 2022 at 11:52 AM Karim Virani <pondersome64 at gmail.com> wrote:

> Yes I was asserting something subtly but importantly different through
> perhaps clumsy language.
>
> I was saying that a 3D point cloud and a depth frame are 100 percent
> equivalent. Meaning there is no substantive difference in the information
> they contain and what you can determine from that information. The depth
> frame *is* 3D data every bit as much as a point cloud. The 3D point cloud
> is just a linear transform applied to the constructed depth frame. It's a
> construct of a construct with nothing new added - except that it is
> presumably rectified to real-world coordinates from the camera's
> perspective.
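>
> (For illustration, a minimal sketch of that transform in Python/numpy,
> assuming a plain pinhole model - the intrinsics fx, fy, cx, cy here are
> placeholders for whatever your camera driver actually reports:)
>
>     import numpy as np
>
>     def depth_to_points(depth, fx, fy, cx, cy):
>         """Back-project a depth frame (HxW, in meters) into an Nx3 cloud."""
>         h, w = depth.shape
>         u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
>         x = (u - cx) * depth / fx
>         y = (v - cy) * depth / fy
>         return np.stack([x, y, depth], axis=-1).reshape(-1, 3)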
>
> I was not saying that a point cloud is useless. The Cartesian "real world"
> is a wonderful common meeting ground, particularly when you are doing
> sensor fusion - like correlating a lidar with a depth camera. Like I said,
> it also makes it easier to do things like scanning a room to build a model
> or map of it.
>
> I was only saying that you don't get anything magical out of the point
> cloud transformation and it can be thought of as unnecessary processing for
> applications that don't need it. So if you are concerned with navigating
> around obstacles on a mobile robot, you only need to know if something is
> going to block your navigation path. That's pretty straightforward to do
> without a point cloud once you've got a good understanding of your camera's
> perspective on the real world.
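>
> (A sketch of that idea, staying entirely in image space - the pixel window
> and stop distance are made-up numbers you would derive from your robot's
> width and the camera's mounting geometry:)
>
>     import numpy as np
>
>     def path_blocked(depth, stop_dist_m=0.5, rows=(200, 480), cols=(220, 420)):
>         """True if anything in the image-space ROI is closer than stop_dist_m.
>         Assumes depth is in meters and 0 means 'no return'."""
>         window = depth[rows[0]:rows[1], cols[0]:cols[1]]
>         valid = window[window > 0]
>         return valid.size > 0 and valid.min() < stop_dist_m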
>
> All of these considerations are just the beginning of course. What you
> were showing last night was the instantaneous point cloud constructed as a
> transform on the depth frame using the known camera intrinsics. It's great
> to let the system do that for you when you have the processing/power budget
> to spare. The next step is to correct the point cloud when your camera
> orientation changes (like when it looks down). Now you need to transform
> those points again, presumably with high-quality IMU information. And then
> you want to correlate frame clouds over time, which also involves transforms
> due to motion. This will result in a more stable point cloud model that can
> become the input to building a real 3D model of the environment. This is
> exactly what 3D scanners are doing.
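>
> (The orientation correction is just another rotation applied to the cloud.
> A toy single-axis version, assuming the IMU hands you a pitch angle:)
>
>     import numpy as np
>
>     def level_points(points, pitch_rad):
>         """Rotate camera-frame points (Nx3) about the camera x axis so a
>         tilted-down camera reports gravity-aligned coordinates."""
>         c, s = np.cos(pitch_rad), np.sin(pitch_rad)
>         rot_x = np.array([[1.0, 0.0, 0.0],
>                           [0.0,   c,  -s],
>                           [0.0,   s,   c]])
>         return points @ rot_x.T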
>
> It's all well established goodness. The only thing I was questioning was
> what sounded (to me) like an implicit assumption that you need point clouds
> to make sense of depth frames, or that they are somehow more efficient for
> collision avoidance. Some simple trig has to be done either way - to define
> your region of interest as the pixels or points your robot could collide
> with on its current trajectory. It's true that the definition of
> the ROI, and therefore the approach to clipping/filtering looks different
> in the two approaches. In a depth frame you are more likely to use 2D image
> filtering algorithms, which will probably feel more complex to those who
> don't have experience with frameworks like OpenCV. Computationally this is
> likely to be far more efficient, but one allure of point clouds is that the
> ROIs can be as simple as rectilinear 3D volumes. Don't forget you bought
> that simplicity of expression by doing transforms on every single pixel on
> every frame, but once you've sunk that cost, you might as well engage with
> that simplicity.
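>
> (That rectilinear-volume clip really is about as simple as it sounds - a
> sketch, with the box bounds as placeholders and the axes depending on your
> camera frame convention:)
>
>     import numpy as np
>
>     def clip_to_box(points, lo, hi):
>         """Keep only the points inside an axis-aligned 3D box."""
>         lo, hi = np.asarray(lo), np.asarray(hi)
>         mask = np.all((points >= lo) & (points <= hi), axis=1)
>         return points[mask]
>
>     # e.g. the volume the robot will sweep through over the next 2 meters
>     # obstacles = clip_to_box(cloud, lo=(-0.3, -1.0, 0.0), hi=(0.3, 0.1, 2.0))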
>
> To be clear, I am in favor of using point clouds when you have the compute
> budget. You are on the right side of Moore's law if you go that route. It's
> only going to get easier. However, it's still just a beginning. The road
> from instantaneous point clouds to extracted and interpreted real world
> geometry is a whole other non-trivial journey. In fact the very next step
> is a doozy.  How you identify the ground plane is not as simple as it seems
> once you put the camera on a moving / vibrating outdoor robot.
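>
> (The textbook starting point is a RANSAC plane fit - sketched below with
> made-up iteration and tolerance numbers. On a vibrating outdoor robot the
> fitted plane jumps around from frame to frame, which is exactly the doozy
> part:)
>
>     import numpy as np
>
>     def fit_ground_plane(points, n_iters=200, tol=0.03, rng=None):
>         """Return (normal, d) of the plane with the most inliers within tol."""
>         rng = np.random.default_rng() if rng is None else rng
>         best_inliers, best_plane = 0, None
>         for _ in range(n_iters):
>             p = points[rng.choice(len(points), 3, replace=False)]
>             n = np.cross(p[1] - p[0], p[2] - p[0])
>             if np.linalg.norm(n) < 1e-9:
>                 continue  # degenerate (collinear) sample
>             n = n / np.linalg.norm(n)
>             d = -n @ p[0]
>             inliers = np.sum(np.abs(points @ n + d) < tol)
>             if inliers > best_inliers:
>                 best_inliers, best_plane = inliers, (n, d)
>         return best_plane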
>
> I hope this makes sense. I think we're pretty much on the same page.
>
> Best,
>
> Karim
>
> On Tue, Jan 4, 2022 at 11:20 PM Paul Bouchier via DPRGlist <
> dprglist at lists.dprg.org> wrote:
>
>> It was asserted tonight that depth images and pointclouds are the same
>> thing, that no geometry or transformation is needed to access the xyz of a
>> pixel in a depth-image or point in a pointcloud, or to relate one to the
>> other. I think that is incorrect, and that I gave up too easily on that
>> point. See the attached diagram for why it's not true.
>>
>> The diagram is an old-time physics lens drawing, showing the camera lens
>> imaging onto the image plane on the left. Two arrows: a1 at depth d=1, a2
>> at depth d=2, both 1 unit high, image on different parts of the image
>> plane; a1 at 0,-50 and a2 at 0,-30 because it's further away, despite being
>> the same height. a3 is twice as tall as a2, and twice as far away, but
>> images at the same coordinates as a2 - only the depth tells you it's
>> taller. You have to take the angle from the central axis to 0,-30 and the
>> depth to compute z for a3 or a2 and to tell the difference. IOW you have to
>> transform the image with the depth data to compute the xyz location of each
>> arrowhead.
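>>
>> (A pinhole version of the same argument, with an arbitrary focal length -
>> the numbers are not meant to match the diagram, just the geometry:)
>>
>>     f = 60.0                    # made-up focal length, in pixel units
>>     # image row = f * height / depth, ignoring the sign flip at the lens
>>     v_a2 = f * 1.0 / 2.0        # a2: 1 unit tall, 2 units away -> 30.0
>>     v_a3 = f * 2.0 / 4.0        # a3: 2 units tall, 4 away -> 30.0, same row
>>     # only the depth lets you undo the projection and tell them apart:
>>     height_a2 = v_a2 * 2.0 / f  # 1.0
>>     height_a3 = v_a3 * 4.0 / f  # 2.0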
>>
>> Maybe I misunderstood what was being asserted, but it seems to me a
>> pointcloud can simply be clipped at the boundaries of the region of
>> interest, whereas in a depth image each pixel must be transformed from its
>> x,y,depth value to xyz using some math. All very doable, but if you're
>> trying to populate an obstacle map with some clipping on Z (e.g. to
>> eliminate the ground, or taller-than-relevant obstacles, or obstacles
>> outside the robot's path), you end up having to create a pointcloud to
>> determine if each point is within the region of interest.
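>>
>> (A sketch of that per-pixel transform-and-clip, using placeholder
>> intrinsics and ignoring camera tilt - it is the same math as building the
>> cloud:)
>>
>>     import numpy as np
>>
>>     def height_clip_mask(depth, fy, cy, z_min, z_max):
>>         """Mask of pixels whose vertical offset lies in [z_min, z_max]."""
>>         v = np.arange(depth.shape[0]).reshape(-1, 1)   # image rows
>>         height = (v - cy) * depth / fy                 # vertical offset per pixel
>>         return (height > z_min) & (height < z_max) & (depth > 0)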
>>
>> Did I misunderstand what was being asserted? David said he just wanted to
>> take the image and pull object height or horizontal offset from robot
>> centerline, and distance from the robot out of it (if I understood him), and
>> you have to do some trig to do that. It's the same math that has to be done
>> to create a pointcloud. It is true that the two forms contain the same
>> information, but it's represented differently, and transformation is needed
>> to map one to the other.
>>
>> Paul
>> _______________________________________________
>> DPRGlist mailing list
>> DPRGlist at lists.dprg.org
>> http://lists.dprg.org/listinfo.cgi/dprglist-dprg.org
>>
>