A3D: Adaptive Affordance Assembly with Dual-Arm Manipulation

Abstract

Furniture assembly is a crucial yet challenging task for robots, requiring precise dual-arm coordination where one arm manipulates parts while the other provides collaborative support and stabilization. To do this effectively, robots need to adapt their support strategies throughout the long-horizon assembly process while also generalizing across diverse part geometries.

We propose A3D, a framework that learns adaptive affordances to identify optimal support and stabilization locations on furniture parts. The method employs dense point-level geometric representations to model part interaction patterns, enabling generalization across varied geometries. To handle evolving assembly states, we introduce an adaptive module that uses feedback from previous interactions to adjust support strategies during assembly. We establish a simulation environment featuring 50 diverse parts across 8 furniture types, designed for dual-arm collaboration evaluation. Experiments demonstrate that our framework generalizes effectively to diverse part geometries and furniture categories in both simulation and real-world settings.

Comparison of Single Arm vs Dual Arm Assembly

Overview

Method

Feature Extraction

PointNet++ extracts dense geometric features. Operation points are encoded to guide support estimation.

Affordance & Proposal

Predicts Top-K support points and uses a conditional variational autoencoder (CVAE) to generate feasible orientations.

Interaction Adaptation

Aggregates historical feedback (actions & movements) to refine predictions.

Key Technical Details

Our method learns a closed-loop adaptive policy π(u_t | S_t, I_t). The core innovation lies in the Interaction Context Adaptation Module.
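As a rough illustration of what "closed-loop" means here, the sketch below runs a propose, execute, record cycle in which the history I_t grows after each failed support attempt. Everything in it is a hypothetical stand-in: `run_closed_loop`, `toy_policy`, and `toy_execute` are illustrative names, the "point cloud" is just a list of per-point scores, and the toy policy simply avoids points that already failed rather than using a learned network.

```python
from typing import Callable, List, Optional, Tuple

State = List[float]                 # stand-in for the point cloud S_t
Record = Tuple[State, int, float]   # (point cloud, action, displacement)


def run_closed_loop(
    policy: Callable[[State, List[Record]], int],
    execute: Callable[[State, int], Tuple[bool, float]],
    state: State,
    max_attempts: int = 5,
) -> Tuple[Optional[int], List[Record]]:
    """Query the policy with (S_t, I_t); on failure, append feedback to I_t."""
    context: List[Record] = []      # I_t, the interaction history
    for _ in range(max_attempts):
        action = policy(state, context)
        success, displacement = execute(state, action)
        if success:
            return action, context
        context.append((state, action, displacement))
    return None, context


def toy_policy(state: State, context: List[Record]) -> int:
    """Assumed placeholder: pick the best-scoring point not yet tried."""
    tried = {action for _, action, _ in context}
    candidates = [i for i in range(len(state)) if i not in tried]
    return max(candidates, key=lambda i: state[i])


def toy_execute(state: State, action: int) -> Tuple[bool, float]:
    """Assumed placeholder: only point 2 stabilizes the part."""
    return (action == 2, 0.0 if action == 2 else 0.05)


scores = [0.9, 0.4, 0.7]
action, history = run_closed_loop(toy_policy, toy_execute, scores)
```

In this toy run the first proposal (point 0) fails, its record enters the history, and the second proposal (point 2) succeeds. The actual method conditions a learned affordance model on the history instead of filtering indices.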

Top-K Sampling Strategy: Instead of relying on a single deterministic output, we employ Top-K sampling to propose diverse candidate support points, avoiding local optima like unstable corners.
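A minimal sketch of Top-K candidate selection over per-point affordance scores follows. The function names, the example scores, and the choice to sample among the K candidates in proportion to their scores are assumptions for illustration; the page does not specify how a candidate is drawn from the Top-K set.

```python
import random
from typing import List


def top_k_candidates(scores: List[float], k: int) -> List[int]:
    """Return indices of the k highest-scoring points, best first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


def sample_support_point(scores: List[float], k: int, rng: random.Random) -> int:
    """Draw one candidate from the Top-K set, weighted by score (assumed rule).

    Scores are assumed non-negative with a positive sum over the Top-K set.
    """
    candidates = top_k_candidates(scores, k)
    weights = [scores[i] for i in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]


scores = [0.10, 0.80, 0.05, 0.75, 0.60]   # hypothetical affordance scores
candidates = top_k_candidates(scores, k=3)
picked = sample_support_point(scores, 3, random.Random(0))
```

Keeping several candidates rather than only the arg-max means a failed attempt at the top point still leaves alternatives to try.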

Interaction Context Encoding: When a support action fails, we record the tuple (Point Cloud, Action, Displacement). An attention mechanism aggregates these historical contexts to dynamically adjust the affordance heatmap.
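The aggregation step can be pictured as attention pooling over encoded interaction records. The sketch below uses plain scaled dot-product attention on hand-written two-dimensional vectors; the embeddings, the query, and the function names are hypothetical, and the actual module's encoders and attention variant may differ.

```python
import math
from typing import List, Tuple


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def attend(
    query: List[float],
    keys: List[List[float]],
    values: List[List[float]],
) -> Tuple[List[float], List[float]]:
    """Scaled dot-product attention: weight each past record by its match to the query."""
    scale = math.sqrt(len(query))
    logits = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(logits)
    dim = len(values[0])
    pooled = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return pooled, weights


# Hypothetical embeddings: one key/value pair per failed interaction record.
query = [1.0, 0.0]                    # feature of the point being scored
keys = [[1.0, 0.0], [0.0, 1.0]]       # encoded (point cloud, action) per record
values = [[0.2, 0.0], [0.0, 0.4]]     # encoded displacement per record
pooled, weights = attend(query, keys, values)
```

Here the first record matches the query more closely, so it receives the larger weight and dominates the pooled context vector that would then condition the affordance prediction.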

Results

Quantitative Evaluation

Method	Screw ↑	Push ↑	Pull ↑	Pick Up ↑
Random	9.0%	7.2%	4.0%	6.5%
Heuristic	46.7%	31.9%	52.8%	31.4%
DP3	17.4%	22.1%	10.1%	11.5%
LLM-Guided	0.0%	0.0%	0.0%	0.0%
A3D (Ours)	56.3%	67.9%	61.7%	47.1%
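To read the table at a glance, the short script below computes, for each task, the gap in percentage points between A3D and the strongest baseline. It uses only the numbers reported above; the variable names are illustrative.

```python
tasks = ["Screw", "Push", "Pull", "Pick Up"]

# Success rates (%) copied from the table above.
results = {
    "Random":     [9.0, 7.2, 4.0, 6.5],
    "Heuristic":  [46.7, 31.9, 52.8, 31.4],
    "DP3":        [17.4, 22.1, 10.1, 11.5],
    "LLM-Guided": [0.0, 0.0, 0.0, 0.0],
    "A3D (Ours)": [56.3, 67.9, 61.7, 47.1],
}

ours = "A3D (Ours)"
margins = {}
for i, task in enumerate(tasks):
    # Strongest baseline for this task, excluding our method.
    best_name, best_rate = max(
        ((name, rates[i]) for name, rates in results.items() if name != ours),
        key=lambda pair: pair[1],
    )
    margins[task] = (best_name, round(results[ours][i] - best_rate, 1))

for task, (name, gap) in margins.items():
    print(f"{task}: +{gap} points over {name}")
```

Heuristic is the strongest baseline on every task, and the margin is largest on Push (36.0 points) and smallest on Pull (8.9 points).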

Qualitative Analysis & Ablations

Visual Affordance Adaptation

Ablation Study

Simulation Demonstrations

Desk Assembly (Multi-View)

Lamp Assembly (Multi-View)

Real-World Experiments

Spotlight: Adaptive Correction in Action

Full Procedure Demonstrations

BibTeX

@inproceedings{
}