Bi-ACT: Bilateral Control-Based Imitation Learning via Action Chunking with Transformer

Osaka University

Thanpimon Buamanee*, Masato Kobayashi*, Yuki Uranishi, Haruo Takemura
* Co-first authors equally contributed to this work.


Bi-ACT Model

Autonomous manipulation with robot arms is a complex and evolving field of study in robotics. This work stands at the intersection of two approaches in robotics and machine learning. Inspired by the Action Chunking with Transformer (ACT) model, which uses joint positions and image data to predict future movements, our work integrates the principles of bilateral control-based imitation learning to enhance robotic control. Our objective is to combine these techniques into a more robust and efficient control mechanism. In our approach, the data collected from the environment are images from the gripper and overhead cameras, together with the joint angles, angular velocities, and forces of the follower robot, all recorded under bilateral control. The model is designed to predict the subsequent steps for the joint angles, angular velocities, and forces of the leader robot. This predictive capability is crucial for implementing effective bilateral control in the follower robot, allowing for more nuanced and responsive maneuvering.

Bi-ACT Model
To improve the model's understanding of the environment, our method is based on Action Chunking with Transformer and adds a new dimension to the input and output data. Along with the joint angle and image data used in the original work, we added angular velocity and torque to both the input and the output data.

As the architecture shows, the model receives two RGB images as input, each captured at a resolution of 360 x 640: one from the follower’s gripper and the other from an overhead perspective. In addition, the model processes the follower’s current joint data, which consists of three types of data (angle, angular velocity, and torque) across five joints, forming a 15-dimensional vector in total. Using action chunking, the policy generates a $k \times 15$ tensor representing the leader’s next actions over $k$ time steps. The leader’s actions for these time steps are then passed to the controller, which determines the current required at each joint of the follower robot so that it moves in the specified direction.
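The data layout described above can be illustrated with a minimal Python sketch. The ordering of the 15 values (five angles, then five angular velocities, then five torques) and the helper names `build_state` and `split_action_chunk` are assumptions made for illustration, as is reading 360 x 640 as height x width; the page does not specify any of them.

```python
from typing import Dict, List, Sequence

NUM_JOINTS = 5
FEATURES = ("angle", "velocity", "torque")  # three data types per joint
STATE_DIM = NUM_JOINTS * len(FEATURES)      # 15 values in total
IMAGE_SHAPE = (360, 640, 3)                 # assumed (height, width, RGB)


def build_state(angle: Sequence[float],
                velocity: Sequence[float],
                torque: Sequence[float]) -> List[float]:
    """Concatenate follower joint data into one 15-dimensional state vector.

    Assumed ordering: all angles, then all velocities, then all torques.
    """
    for name, values in zip(FEATURES, (angle, velocity, torque)):
        if len(values) != NUM_JOINTS:
            raise ValueError(f"{name} must have {NUM_JOINTS} entries")
    return [float(v) for v in (*angle, *velocity, *torque)]


def split_action_chunk(chunk: Sequence[Sequence[float]]) -> List[Dict[str, List[float]]]:
    """Split a k x 15 action chunk into per-step leader commands.

    Each of the k rows becomes a dict holding five values per data type,
    using the same assumed ordering as build_state.
    """
    steps = []
    for row in chunk:
        if len(row) != STATE_DIM:
            raise ValueError(f"each row must have {STATE_DIM} entries")
        steps.append({
            name: list(row[i * NUM_JOINTS:(i + 1) * NUM_JOINTS])
            for i, name in enumerate(FEATURES)
        })
    return steps
```

With a chunk of $k$ rows, `split_action_chunk` returns $k$ dictionaries, one per future time step, which a controller loop could consume in order.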


Experimental Environment

In the environment setup, two robots, designated as the leader and the follower, were positioned adjacent to each other. The experimental environment was arranged on the follower robot's side, which was the designated site for task execution.


Data Collection

In the first task, ‘Pick-and-Place’, the objective was for the gripper to accurately pick up objects of various shapes, weights, and textures from the pick area and then place them within the place area.

The second task was ‘Put-in-Drawer’, which involved moving an object from the pick area to the drawer.

Objects for Pick and Place


We used a foam ball and a softball during the data collection phase. To test the model, we used these two objects along with seven objects not seen during training: a table tennis ball, an eye-cream package, a canelé, a soccer ball, a plastic bell pepper, a honey bottle, and a glue jar.

Data Collection (Teleoperation: Bilateral Control)

We collected joint angle, angular velocity, and torque data from leader-follower robot demonstrations performed with a bilateral control system. The robot was controlled at a frequency of 1000Hz. Additionally, both the onboard hand RGB camera and the top RGB camera on the environmental side of the robot operated at approximately 200Hz. To align both sets of data with the system’s operating cycle, we adjusted the data to 100Hz for use as training data, because the model’s inference cycle is approximately 100Hz.
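One way to bring the two streams to a common 100Hz rate is sketched below. The page does not say how the data were adjusted, so the method here (keeping every tenth 1000Hz joint sample and picking the camera frame nearest to each kept timestamp) is an assumption, and the helper names `decimate`, `nearest_index`, and `align_frames` are hypothetical.

```python
from bisect import bisect_left
from typing import List, Sequence


def decimate(samples: Sequence, src_hz: int = 1000, dst_hz: int = 100) -> list:
    """Keep every (src_hz // dst_hz)-th sample of a uniformly sampled log."""
    if src_hz % dst_hz != 0:
        raise ValueError("src_hz must be an integer multiple of dst_hz")
    step = src_hz // dst_hz
    return list(samples[::step])


def nearest_index(timestamps: Sequence[float], t: float) -> int:
    """Return the index of the entry in sorted timestamps closest to t."""
    i = bisect_left(timestamps, t)
    if i == 0:
        return 0
    if i == len(timestamps):
        return len(timestamps) - 1
    # Choose between the neighbour at or after t and the one before it.
    return i if timestamps[i] - t < t - timestamps[i - 1] else i - 1


def align_frames(frame_times: Sequence[float],
                 target_times: Sequence[float]) -> List[int]:
    """For each target timestamp, pick the index of the nearest camera frame."""
    return [nearest_index(frame_times, t) for t in target_times]
```

For example, one second of joint data logged at 1000Hz reduces to 100 samples, and `align_frames` then selects one camera frame for each of those samples even when the camera timestamps are not perfectly regular.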

Data Collection of Foamball (Real-Time: 1X)
Data Collection of Soft Tennis Ball (Real-Time: 1X)
Data Collection of Put in Drawer (Real-Time: 1X)

Results (Autonomous)

Pick and Place (Real-Time: 1X)
Put in Drawer (Real-Time: 1X)


  author={Buamanee, Thanpimon and Kobayashi, Masato and Uranishi, Yuki and Takemura, Haruo},
  booktitle={2024 IEEE International Conference on Advanced Intelligent Mechatronics (AIM)}, 
  title={Bi-ACT: Bilateral Control-Based Imitation Learning via Action Chunking with Transformer}, 


Masato Kobayashi (Assistant Professor, Osaka University, Japan)