<div dir="ltr"><div>Thanks for the clarification Karim - that makes sense, and thanks for the additional real-world perspective. We are definitely on the same page.<br></div><div><br></div><div>Paul<br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Jan 5, 2022 at 11:52 AM Karim Virani <<a href="mailto:pondersome64@gmail.com">pondersome64@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">Yes I was asserting something subtly but importantly different through perhaps clumsy language.<br><br>I was saying that a 3D point cloud and a depth frame are 100 percent equivalent. Meaning there is no substantive difference in the information they contain and what you can determine from that information. The depth frame <b>is</b> 3D data every bit as much as a point cloud. The 3D point cloud is just a linear transform applied to the constructed depth frame. It's a construct of a construct with nothing new added - except that it is presumably rectified to real-world coordinates from the camera's perspective.<br><br>I was not saying that a point cloud is useless. The Cartesian "real world" is a wonderful common meeting ground, particularly when you are doing sensor fusion - like correlating a lidar with a depth camera. Like I said, it also makes it easier to do things like scanning a room to build a model or map of it.<br><br>I was only saying that you don't get anything magical out of the point cloud transformation and it can be thought of as unnecessary processing for applications that don't need it. So if you are concerned with navigating around obstacles on a mobile robot, you only need to know if something is going to block your navigation path. That's pretty straightforward to do without a point cloud once you've got a good understanding of your camera's perspective on the real world.<br><br>All of these considerations are just the beginning of course. What you were showing last night was the instantaneous point cloud constructed as a transform on the depth frame using the known camera intrinsics. It's great to let the system do that for you when you have the processing/power budget to spare. The next step is to correct the point cloud when your camera orientation changes (like if it looks down) Now you need to transform those points again, presumably with high-quality imu information. And then you want to correlate frame clouds over time which also involves transforms due to motion. This will result in a more stable point cloud model that can become the input to building a real 3D model of the environment. This is exactly what 3D scanners are doing.<br><br>It's all well established goodness. The only thing I was questioning was what sounded (to me) like an implicit assumption that you need point clouds to make sense of depth frames or that it was somehow more efficient in collision avoidance. Some simple trig should be done one way or the other - to define your region of interest as the pixels or points your robot could collide with on it's current trajectory. It's true that the definition of the ROI, and therefore the approach to clipping/filtering looks different in the two approaches. In a depth frame you are more likely to use 2D image filtering algorithms, which will probably feel more complex to those who don't have experience with frameworks like OpenCV. 
Computationally, the depth-frame route is likely to be far more efficient, but one allure of point clouds is that the ROIs can be as simple as rectilinear 3D volumes. Don't forget you bought that simplicity of expression by doing transforms on every single pixel on every frame, but once you've sunk that cost, you might as well engage with that simplicity.<br><br>To be clear, I am in favor of using point clouds when you have the compute budget. You are on the right side of Moore's law if you go that route. It's only going to get easier. However, it's still just a beginning. The road from instantaneous point clouds to extracted and interpreted real-world geometry is a whole other non-trivial journey. In fact, the very next step is a doozy. How you identify the ground plane is not as simple as it seems once you put the camera on a moving / vibrating outdoor robot.<br><div><br></div><div><div>I hope this makes sense. I think we're pretty much on the same page.</div><div><br></div><div>Best,</div><div><br></div><div>Karim</div></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, Jan 4, 2022 at 11:20 PM Paul Bouchier via DPRGlist <<a href="mailto:dprglist@lists.dprg.org" target="_blank">dprglist@lists.dprg.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div>It was asserted tonight that depth images and pointclouds are the same thing, that no geometry or transformation is needed to access xyz of a pixel in a depth-image or a point in a pointcloud, or to relate one to the other. I think that is incorrect, and that I gave up too easily on that point. See the attached diagram for why it's not true.</div><div><br></div><div>The diagram is an old-time physics lens drawing, showing the camera lens imaging onto the image plane on the left. Two arrows, a1 at depth d=1 and a2 at depth d=2, both 1 unit high, image on different parts of the image plane: a1 at 0,-50 and a2 at 0,-30, because a2 is further away despite being the same height. a3 is twice as tall as a2, and twice as far away, but images at the same coordinates as a2 - only the depth tells you it's taller. You have to take the angle from the central axis to 0,-30 and the depth to compute z for a3 or a2 and to tell the difference. IOW you have to transform the image with the depth data to compute the xyz location of each arrowhead.</div><div><br></div><div>Maybe I misunderstood what was being asserted, but it seems to me a pointcloud can simply be clipped at the boundaries of the region of interest, whereas in a depth image each pixel must be transformed from its x,y,depth value to xyz using some math. All very doable, but if you're trying to populate an obstacle map with some clipping on Z (e.g. to eliminate the ground, or taller-than-relevant obstacles, or obstacles outside the robot's path), you end up having to create a pointcloud to determine if each point is within the region of interest.</div><div><br></div><div>Did I misunderstand what was being asserted? David said he just wanted to take the image and pull object height or horizontal offset from robot centerline, and distance from the robot, out of it (if I understood him), and you have to do some trig to do that. It's the same math that has to be done to create a pointcloud. It is true that the two forms contain the same information, but it's represented differently, and transformation is needed to map one to the other.<br></div><div><br></div><div>Paul<br></div></div>
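<div><br></div><div>P.S. To make the trig concrete, here is a rough sketch of the transform I mean (the intrinsics are placeholder numbers, not a real calibration): back-projecting each depth pixel to xyz using the camera intrinsics is exactly the pointcloud construction, and once you have it the ROI check is a trivial box test.</div><pre>
import numpy as np

def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project a depth image of shape (h, w), in meters, to an (h*w, 3)
    array of x, y, z points in the camera frame (x right, y down, z forward)."""
    h, w = depth.shape
    uu, vv = np.meshgrid(np.arange(w) - cx, np.arange(h) - cy)
    x = depth * uu / fx    # same angle-times-depth trig as the a2/a3 example
    y = depth * vv / fy
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)

def clip_to_roi(points, x_lim=(-0.5, 0.5), y_lim=(-1.0, 0.2), z_lim=(0.2, 3.0)):
    """Keep only points inside an axis-aligned box region of interest
    (this also discards invalid pixels, which land at 0,0,0)."""
    keep = ((points[:, 0] > x_lim[0]) & (points[:, 0] < x_lim[1]) &
            (points[:, 1] > y_lim[0]) & (points[:, 1] < y_lim[1]) &
            (points[:, 2] > z_lim[0]) & (points[:, 2] < z_lim[1]))
    return points[keep]

# Hypothetical intrinsics, for illustration only:
# obstacles = clip_to_roi(depth_to_points(depth, fx=570.0, fy=570.0,
#                                         cx=320.0, cy=240.0))
</pre>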
_______________________________________________<br>
DPRGlist mailing list<br>
<a href="mailto:DPRGlist@lists.dprg.org" target="_blank">DPRGlist@lists.dprg.org</a><br>
<a href="http://lists.dprg.org/listinfo.cgi/dprglist-dprg.org" rel="noreferrer" target="_blank">http://lists.dprg.org/listinfo.cgi/dprglist-dprg.org</a><br>
</blockquote></div>
</blockquote></div>
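<div><br></div><div dir="ltr"><div>P.S. On Karim's point about the ground plane: the usual starting point is a RANSAC-style plane fit over the (already back-projected) points. Below is a rough, untested sketch in plain numpy with made-up thresholds - the hard part he's describing is everything around it (gating the fitted normal against the IMU, filtering it over time, handling frames where the ground is barely visible), not the fit itself.</div><pre>
import numpy as np

def fit_ground_plane(points, iters=200, dist_thresh=0.03, rng=None):
    """RANSAC-style plane fit: repeatedly fit a plane through 3 random points
    and keep the candidate with the most inliers (points within dist_thresh
    meters of the plane). Returns (normal, d) for the plane normal.p + d = 0,
    or None if every sample was degenerate."""
    rng = np.random.default_rng() if rng is None else rng
    best_inliers, best_plane = 0, None
    for _ in range(iters):
        sample = points[rng.choice(len(points), 3, replace=False)]
        normal = np.cross(sample[1] - sample[0], sample[2] - sample[0])
        length = np.linalg.norm(normal)
        if length < 1e-6:               # collinear sample, try again
            continue
        normal /= length
        d = -normal.dot(sample[0])
        inliers = np.count_nonzero(np.abs(points @ normal + d) < dist_thresh)
        if inliers > best_inliers:
            best_inliers, best_plane = inliers, (normal, d)
    return best_plane
</pre></div>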