Act the Part:

Learning Interaction Strategies for Articulated Object Part Discovery

People often use physical intuition when manipulating articulated objects, irrespective of object semantics. Motivated by this observation, we identify an important embodied task where an agent must learn to interact to recover object structures. To this end, we introduce Act the Part (AtP) to learn how to interact with articulated objects to discover and segment their parts. Our key insight is to couple learning interaction and motion prediction, which allows us to isolate parts and make perceptual part recovery possible without any explicit semantic information. Our experiments show the AtP model learns efficient strategies for discovering parts, can generalize to unseen categories, and is capable of conditional reasoning for the task. Although the model is trained in simulation, we show convincing transfer to real-world data with no fine-tuning.


Latest version: arXiv:2105.01047 [cs.CV]


1 Columbia University            2 Allen Institute for AI


title={Act the Part: Learning Interaction Strategies for Articulated Object Part Discovery},
author={Gadre, Samir Yitzhak and Ehsani, Kiana and Song, Shuran},
journal={arXiv preprint arXiv:2105.01047},
year={2021} }

Technical Summary Video (with audio)

Conditional Reasoning Demo

Probe the conditional reasoning of our model by (1) selecting an image from an unseen real-world category and (2) selecting a pixel to hold. The model outputs a confidence for pushing at each pixel in eight different directions. Note: the AtP interaction network runs in the browser, so expect about 10 s of latency.

The input image is rotated every 45°, so AtP implicitly reasons about pushing in eight directions while explicitly reasoning only about pushing right.
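To make the rotation trick concrete, here is a minimal sketch of how eight "push right" confidences, one per rotated copy of the image, could be mapped back to push directions in the original image. It is not the AtP code: the function names, the counterclockwise rotation convention, and the x-right, y-up coordinate frame are all assumptions made for illustration.

```python
import math

NUM_ROTATIONS = 8                   # one rotated copy every 45 degrees
STEP_DEG = 360 // NUM_ROTATIONS     # 45


def direction_for_rotation(k):
    """Original-frame direction of 'push right' in the k-th rotated copy.

    Assumption: copy k is the image rotated counterclockwise by k * 45
    degrees, so 'right' in that copy is 'right' rotated clockwise by the
    same angle in the original frame (x to the right, y up).
    """
    angle = math.radians(-k * STEP_DEG)
    # Round away floating-point noise such as cos(pi / 2) ~ 6e-17.
    return (round(math.cos(angle), 6), round(math.sin(angle), 6))


def best_push(push_right_scores):
    """Return (rotation index, original-frame direction) of the top score.

    push_right_scores holds one hypothetical 'push right' confidence per
    rotated copy, for a single candidate push pixel.
    """
    if len(push_right_scores) != NUM_ROTATIONS:
        raise ValueError("expected one score per rotation")
    k = max(range(NUM_ROTATIONS), key=lambda i: push_right_scores[i])
    return k, direction_for_rotation(k)
```

For example, under these assumptions `best_push([0.1, 0.2, 0.9, 0.0, 0.3, 0.1, 0.2, 0.4])` selects rotation index 2, which corresponds to pushing straight down in the original frame. The point of the design is that the network only ever needs to score a single action, pushing right, while the rotations supply the other seven directions.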


Thank you Shubham Agrawal, Jessie Chapman, Cheng Chi, the Gadres, Bilkit Githinji, Huy Ha, Kishanee Haththotuwegama, Gabriel Ilharco Magalhães, Amelia Kuskin, Samuel McKinney, Sarah Pratt, Jackie Reinhardt, Fiadh Sheeran, Mitchell Wortsman, and Zhenjia Xu for valuable conversations, code, feedback, and edits. Without you all, this work—and quarantine itself—would not have been possible. Special thanks to Fiadh Sheeran for the glasses and help filming.
This work was supported in part by the Amazon Research Award and NSF CMMI-2037101.


If you have any questions, please contact Samir.