DexNoMa: Learning Geometry-Aware Nonprehensile Dexterous Manipulation

Nonprehensile manipulation, such as pushing and pulling, enables robots to move, align, or reposition objects that may be difficult to grasp due to their geometry, size, or relationship to the robot or the environment. Much of the existing work in nonprehensile manipulation relies on parallel-jaw grippers or tools such as rods and spatulas. Multi-fingered dexterous hands offer richer contact modes and the versatility to handle diverse objects while providing stable support over them, which compensates for the difficulty of modeling the dynamics of nonprehensile manipulation. We propose Dexterous Nonprehensile Manipulation (DexNoMa), a method that frames nonprehensile manipulation as synthesizing and learning pre-contact dexterous hand poses that lead to effective pushing and pulling. We generate diverse hand poses via contact-guided sampling, filter them using physics simulation, and train a diffusion model conditioned on object geometry to predict viable poses. At test time, we sample hand poses and use standard motion planning tools to select and execute pushing and pulling actions. We perform 840 real-world experiments with an Allegro Hand, comparing our method to baselines. The results indicate that DexNoMa offers a scalable route for training dexterous nonprehensile manipulation policies. Our pre-trained models and dataset, including 1.3 million hand poses across 2.3k objects, will be open-sourced to facilitate further research.
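The test-time step of sampling hand poses from a diffusion model conditioned on object geometry can be sketched as standard DDPM ancestral sampling. This is a minimal sketch under stated assumptions, not DexNoMa's actual model: the pose layout (`POSE_DIM`), the number of basis points (`BPS_DIM`), the linear noise schedule, and `toy_denoiser` (a hypothetical stand-in for the trained network that predicts the added noise from the noisy pose, the timestep, and the object's BPS vector) are all illustrative choices.

```python
import numpy as np

# Hypothetical pose layout: wrist translation (3) + 6D rotation (6) + 16 Allegro joint angles.
POSE_DIM = 3 + 6 + 16
BPS_DIM = 1024  # hypothetical number of basis points in the geometry encoding
T = 100  # number of diffusion steps (assumed)

# Linear beta schedule (an assumption; not taken from DexNoMa).
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)


def toy_denoiser(x_t, t, bps):
    """Hypothetical stand-in for the trained noise predictor eps_theta(x_t, t, bps)."""
    # A real model would be a neural network; predicting zero noise keeps the loop runnable.
    return np.zeros_like(x_t)


def sample_poses(bps, num_samples, denoiser=toy_denoiser, seed=0):
    """DDPM ancestral sampling of num_samples hand poses conditioned on one BPS vector."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((num_samples, POSE_DIM))  # start from pure Gaussian noise
    cond = np.broadcast_to(bps, (num_samples, bps.shape[-1]))  # same geometry for every sample
    for t in reversed(range(T)):
        eps = denoiser(x, t, cond)
        # Posterior mean: (x_t - beta_t / sqrt(1 - alpha_bar_t) * eps) / sqrt(alpha_t)
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:
            # Add noise with variance beta_t (one common DDPM choice).
            x = mean + np.sqrt(betas[t]) * rng.standard_normal(x.shape)
        else:
            x = mean  # no noise on the final step
    return x


poses = sample_poses(np.zeros(BPS_DIM), num_samples=8)
print(poses.shape)  # (8, 25)
```

Drawing many samples for the same object is what produces a diverse set of candidate poses for the downstream motion planning step to choose from.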

NOTE: A tag on the index finger of the Allegro Hand in our recorded experiment videos could reveal personal information. To preserve double-blind review, we have obscured this tag with a black box in every video on our website.

i. We present a large-scale dataset of hand poses specifically for pushing or pulling, and leverage it to train a diffusion model.

ii. At execution time, given an object, we compute its basis point set (BPS) representation and pass it to our trained diffusion model. This model synthesizes diverse floating pre-contact hand poses, having been trained on data produced by our large-scale data generation pipeline.

iii. Given these hand poses, we check their feasibility in a physics simulator by adding the arm back in and performing motion planning. We discard infeasible hand poses (e.g., “3” in the example here), rank the feasible ones, and execute the best-performing one (e.g., “4” in our example) in the real world.
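The BPS encoding in step ii turns an object point cloud into a fixed-length vector: for each of a fixed set of basis points, it stores the distance to the nearest point on the object. This is a minimal sketch assuming a point-cloud input, 1024 basis points sampled uniformly inside a unit ball, and centering of the cloud before encoding; `make_basis` and `bps_encode` are hypothetical helpers, not DexNoMa's released code.

```python
import numpy as np


def make_basis(num_points=1024, radius=1.0, seed=0):
    """Sample a fixed set of basis points uniformly inside a ball (assumed sampling scheme)."""
    rng = np.random.default_rng(seed)
    directions = rng.standard_normal((num_points, 3))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)  # unit directions
    # Cube-root radius scaling gives a uniform density inside the ball.
    radii = radius * rng.random(num_points) ** (1.0 / 3.0)
    return directions * radii[:, None]


def bps_encode(points, basis):
    """Encode an (N, 3) point cloud as each basis point's distance to its nearest cloud point."""
    # Center the cloud so the encoding ignores the object's position (an assumption).
    centered = points - points.mean(axis=0)
    # Brute-force pairwise distances, shape (num_basis, N); a KD-tree scales better.
    dists = np.linalg.norm(basis[:, None, :] - centered[None, :, :], axis=-1)
    return dists.min(axis=1)


basis = make_basis()  # must be shared by every object so features are comparable
cloud = np.random.default_rng(1).uniform(-0.1, 0.1, size=(500, 3))  # toy object cloud
feature = bps_encode(cloud, basis)
print(feature.shape)  # (1024,)
```

Because the basis is fixed, every object maps to a vector of the same length regardless of how many points its cloud has, which is what makes the representation convenient as a conditioning input.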
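The select-and-execute logic of step iii amounts to a filter followed by a sort. The actual ranking criterion is DexNoMa's analytical metric (Eq. 3 in the paper), which is not reproduced here; in this sketch `score` is a hypothetical placeholder for it, and `feasible` is assumed to be the result of the simulator-side motion planning check.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    pose_id: int
    feasible: bool  # assumed output of the simulator-side motion planning check
    score: float  # hypothetical placeholder for the analytical ranking metric (higher is better)


def rank_feasible(candidates: List[Candidate]) -> List[Candidate]:
    """Drop infeasible poses and sort the remaining ones from best to worst score."""
    feasible = [c for c in candidates if c.feasible]
    return sorted(feasible, key=lambda c: c.score, reverse=True)


def select_best(candidates: List[Candidate]) -> Optional[Candidate]:
    """Return the top-ranked feasible pose, or None if no pose can be executed."""
    ranked = rank_feasible(candidates)
    return ranked[0] if ranked else None


# Hypothetical scores mirroring the example: pose 3 scores highest but is infeasible.
candidates = [
    Candidate(1, True, 0.4),
    Candidate(2, True, 0.6),
    Candidate(3, False, 0.9),
    Candidate(4, True, 0.8),
]
print(select_best(candidates).pose_id)  # 4
```

Filtering before ranking means a pose the arm cannot reach is never chosen, however good its score; the ablation without ranking would instead pick uniformly at random from the feasible list.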

Explore our generated pushing hand poses across diverse objects.

Execution with DexNoMa

The videos below are rollouts of DexNoMa on 8 3D-printed objects and 6 off-the-shelf everyday objects. We test three pushing directions for each object.

Baseline Comparison

We compare DexNoMa with the following methods.

i. Pre-Trained Grasp Pose: We use a pre-trained NeRF-based grasp synthesis model. For each object, we train a NeRF representation and then query the pre-trained model for a grasp.

ii. Nearest Neighbor: Given a test object, we find the training object with the most similar BPS representation (in terms of Euclidean distance) and retrieve its associated hand poses. We then apply the same motion planning pipeline as in DexNoMa.

iii. DexNoMa w/o Ranking: An ablation that excludes the analytical ranking of hand poses (ignoring Eq. 3) and executes a random feasible pose.
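The Nearest Neighbor baseline reduces to an argmin over Euclidean distances between BPS vectors. A minimal sketch, assuming every object has already been encoded with the same basis points and that `poses_by_object` is a hypothetical list holding each training object's stored hand poses in the same order as the rows of `train_bps`; the 2-D vectors below are toy stand-ins for real BPS features.

```python
import numpy as np


def nearest_training_object(query_bps, train_bps):
    """Index of the training object whose BPS vector is closest in Euclidean distance."""
    dists = np.linalg.norm(train_bps - query_bps[None, :], axis=1)
    return int(np.argmin(dists))


def retrieve_poses(query_bps, train_bps, poses_by_object):
    """Return the stored hand poses of the nearest training object."""
    return poses_by_object[nearest_training_object(query_bps, train_bps)]


# Toy 2-D "BPS" vectors for three training objects and their (hypothetical) pose labels.
train_bps = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
poses_by_object = [["a1"], ["b1", "b2"], ["c1"]]
query = np.array([0.9, 1.2])
print(retrieve_poses(query, train_bps, poses_by_object))  # ['b1', 'b2']
```

Unlike the diffusion model, this baseline can only return poses that already exist for some training object, so its output quality depends on how closely the test object resembles one of them.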
