DexNoMa: Learning Geometry-Aware Nonprehensile Dexterous Manipulation

Anonymous Authors

Abstract

Nonprehensile manipulation, such as pushing and pulling, enables robots to move, align, or reposition objects that may be difficult to grasp due to their geometry, size, or relationship to the robot or the environment. Much of the existing work in nonprehensile manipulation relies on parallel-jaw grippers or tools such as rods and spatulas. Multi-fingered dexterous hands offer richer contact modes and can provide stable support for diverse objects, which compensates for the difficulty of modeling the dynamics of nonprehensile manipulation. We propose Dexterous Nonprehensile Manipulation (DexNoMa), a method that frames nonprehensile manipulation as synthesizing and learning pre-contact dexterous hand poses that lead to effective pushing and pulling. We generate diverse hand poses via contact-guided sampling, filter them using physics simulation, and train a diffusion model conditioned on object geometry to predict viable poses. At test time, we sample hand poses and use standard motion planning tools to select and execute pushing and pulling actions. We perform 840 real-world experiments with an Allegro Hand, comparing our method to baselines. The results indicate that DexNoMa offers a scalable route for training dexterous nonprehensile manipulation policies. Our pre-trained models and dataset, including 1.3 million hand poses across 2.3k objects, will be open-sourced to facilitate further research.

NOTE: A tag on the index finger of the Allegro Hand in our recorded experiment videos could reveal personal information. To preserve double-blind review, we have obscured this tag with a black box in every video on our website.

Method Overview


i. We present a large-scale dataset of hand poses specifically for pushing or pulling, and leverage it to train a diffusion model.

ii. At execution time, given an object, we compute its basis point set (BPS) representation and pass it to our trained diffusion model. The model synthesizes diverse floating pre-contact hand poses of the kind produced by our large-scale data generation pipeline.

iii. Given these hand poses, we then check their feasibility in a physics simulator by adding the arm back in and performing motion planning. We rank the feasible hand poses (e.g., “3” is infeasible in the example here) and select the best performing one (e.g., “4” in our example) and execute it in the real world.
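The execution-time steps above can be sketched in a few lines. This is a minimal illustration and not the released implementation: it assumes the common basis point set formulation (a fixed set of points in a ball, each storing the distance to its nearest object point) and that the object point cloud is already normalized. The names `sample_basis_points`, `encode_bps`, and `select_pose` are hypothetical, and the `is_feasible` and `score` callables stand in for the motion planning check and the ranking, which are not reproduced here.

```python
import math
import random


def euclidean(p, q):
    """Euclidean distance between two points given as coordinate sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def sample_basis_points(num_points, radius=1.0, seed=0):
    """Sample a fixed set of 3D basis points inside a ball (rejection sampling)."""
    rng = random.Random(seed)
    origin = (0.0, 0.0, 0.0)
    basis = []
    while len(basis) < num_points:
        p = tuple(rng.uniform(-radius, radius) for _ in range(3))
        if euclidean(p, origin) <= radius:
            basis.append(p)
    return basis


def encode_bps(object_points, basis):
    """For each basis point, store the distance to its nearest object point."""
    return [min(euclidean(b, q) for q in object_points) for b in basis]


def select_pose(candidates, is_feasible, score):
    """Drop infeasible candidates, then return the highest-scoring one (or None)."""
    feasible = [c for c in candidates if is_feasible(c)]
    return max(feasible, key=score) if feasible else None
```

Because the basis points are fixed across objects, every object maps to a feature vector of the same length, which is what makes the encoding convenient as a conditioning input. With candidates labeled 1 to 4, a feasibility check that rejects "3", and a score that favors "4", `select_pose` returns 4, mirroring the example in step iii.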

Dataset

Explore our generated pushing hand poses over diverse objects.


Experiments

Execution with DexNoMa

The videos below are rollouts of DexNoMa over 8 3D-printed objects and 6 off-the-shelf daily objects. We test three pushing directions for each object.



Direction 1

Direction 2

Direction 3

Baseline Comparison

We compare DexNoMa with the following methods.

i. Pre-Trained Grasp Pose: We use a pre-trained, NeRF-based grasp synthesis model. For each object, we train a NeRF representation and then query the pre-trained model for a grasp.

ii. Nearest Neighbor: Given a test object, we find the training object with the most similar BPS representation (in terms of Euclidean distance) and retrieve its associated hand poses. We then run the same motion planning pipeline as in DexNoMa.

iii. DexNoMa w/o Ranking: An ablation that excludes the analytical ranking of hand poses (ignores Eq. 3) and executes a random feasible pose.
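The retrieval step of the Nearest Neighbor baseline can be illustrated as follows. This is a sketch under stated assumptions rather than the code used in the experiments: BPS representations are taken to be plain sequences of floats of equal length, and the dictionaries `train_bps_by_name` and `poses_by_name`, along with both function names, are hypothetical.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def nearest_training_object(test_bps, train_bps_by_name):
    """Return the name of the training object with the closest BPS vector."""
    return min(
        train_bps_by_name,
        key=lambda name: euclidean(test_bps, train_bps_by_name[name]),
    )


def retrieve_hand_poses(test_bps, train_bps_by_name, poses_by_name):
    """Look up the hand poses stored for the nearest training object."""
    name = nearest_training_object(test_bps, train_bps_by_name)
    return name, poses_by_name[name]
```

The retrieved poses would then go through the same feasibility check and motion planning as poses sampled from the diffusion model.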




DexNoMa

Pre-trained Grasp Pose



DexNoMa

Nearest Neighbor

Multi-step Planning

DexNoMa can be used to perform multiple pushes. The videos below show two multi-step pushing sequences using DexNoMa. In the left video, the robot uses two different hand poses to push the 3D-printed vase, because the hand pose chosen for the first push may not be ideal for the second push. This shows the benefit of re-planning.
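One way to picture re-planning between pushes is the loop below. It is only a sketch: the callables `observe`, `plan_pose`, and `execute` are hypothetical placeholders for perception, hand pose selection, and robot execution, and the function `multi_step_push` is not part of any released code.

```python
def multi_step_push(waypoints, observe, plan_pose, execute):
    """Select a fresh hand pose before every push instead of reusing the first."""
    used_poses = []
    for target in waypoints:
        state = observe()                # current object state before this push
        pose = plan_pose(state, target)  # may differ from the previous pose
        if pose is None:                 # no feasible pose found: stop early
            break
        execute(pose, target)
        used_poses.append(pose)
    return used_poses
```

Because the pose is chosen again from the current object state at each step, consecutive pushes are free to use different hand poses, as in the vase example.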


Failure Modes

DexNoMa still exhibits several limitations and failure modes.

i. Some rollouts of our method come close to a collision. Our evaluation metrics for real-world experiments do not account for such near collisions, which could damage the Allegro Hand.

ii. Our method can fail when the object topples. Since our focus is on nonprehensile hand pose generation, we did not investigate in depth the toppling caused by factors such as friction and center of mass. We leave a more detailed investigation of how physical properties influence dexterous nonprehensile manipulation as future work.


Near Collision

Toppling