ALPHA-α and Bi-ACT Are All You Need: Importance of Position and Force Control and Information in Imitation Learning for Unimanual and Bimanual Robotic Manipulation

Masato Kobayashi*, Thanpimon Buamanee, Takumi Kobayashi

* Corresponding author: Masato Kobayashi

Team

Overview

Abstract:

Autonomous manipulation in everyday tasks requires flexible action generation to handle complex, diverse real-world environments, such as objects with varying hardness and softness. Imitation Learning (IL) enables robots to learn complex tasks from expert demonstrations. However, many existing methods rely on position/unilateral control, leaving challenges in tasks that require force information/control, such as carefully grasping fragile objects or objects of varying hardness. As the need for diverse controls increases, there is demand for low-cost bimanual robots that consider various motor inputs. To address these challenges, we introduce Bilateral Control-Based Imitation Learning via Action Chunking with Transformers (Bi-ACT) and “A” “L”ow-cost “P”hysical “Ha”rdware Considering Diverse Motor Control Modes for Research in Everyday Bimanual Robotic Manipulation (ALPHA-α). Bi-ACT leverages bilateral control to utilize both position and force information, enhancing the robot’s adaptability to object characteristics such as hardness, shape, and weight. ALPHA-α is designed around affordability, ease of use, repairability, ease of assembly, and diverse control modes (position, velocity, torque), allowing researchers/developers to freely build control systems on it. In our experiments, we conducted a detailed analysis of Bi-ACT in unimanual manipulation tasks, confirming its superior performance and adaptability compared to Bi-ACT without force control. Based on these results, we applied Bi-ACT to bimanual manipulation tasks using ALPHA-α. Experimental results demonstrated high success rates in coordinated bimanual operations across multiple tasks, verifying the effectiveness of our approach in complex real-world scenarios. The effectiveness of Bi-ACT and ALPHA-α can be seen through comprehensive real-world experiments.

ALPHA-α

“A” “L”ow-cost “P”hysical “Ha”rdware Considering Diverse Motor Control Modes for Research in Everyday Bimanual Robotic Manipulation

Overview

We aimed to develop ALPHA-α, low-cost bimanual robotic hardware that supports diverse motor control modes, is suitable for robotics research on everyday tasks, and can be built easily by many researchers and developers. Note that we do not claim ALPHA-α is superior to ALOHA in performance. We compare the two in this paper only to clarify the positioning of ALPHA-α relative to ALOHA, a bimanual robot platform with many users. ALPHA-α features low cost, ease of use, repairability, ease of assembly, support for various control types, and a high control frequency.

The distinctive features of ALPHA-α are as follows.

- Low Cost:

As of November 2024, according to the ALOHA price documentation[1], the robots and cameras cost $20,485.96 (USD), about 76% of ALOHA’s total cost of $27,067.41 (USD). This suggests that reducing the cost of the robots and cameras could make the hardware accessible to a wider range of users. We chose OpenMANIPULATOR SARA[2] for ALPHA-α’s leader and follower robots. The ALPHA-α robots (four OpenMANIPULATOR SARA) cost approximately $8,663, less than half the roughly $19,359 price of the four ALOHA robots[1], which fits within the budget of most robotics labs.

Note: OpenMANIPULATOR SARA was not developed by us. However, for ALPHA-α we modified the leader hand using parts of OpenMANIPULATOR SARA. Since OpenMANIPULATOR SARA is not available for purchase as of 2024/11, the prices in the table are based on building it by modifying OpenMANIPULATOR-X[3], which is also how we built ours. The price of OpenMANIPULATOR SARA may change in the future.

- Diverse Motor Control Modes:

ALPHA-α builds on and improves a robot whose motors support position, velocity, and torque (current) control. Because all three modes are available, researchers/developers have the flexibility to select a control method or develop their own.

- Data Collection Frequency:

As the range of available control systems grows, so do the demands on control frequency. For example, sensitive manipulation using force control requires a higher control frequency. ALPHA-α therefore employs motors whose joint angle, velocity, and current data can be collected and estimated at 1000 Hz, and an RGB camera capable of capturing images at 260 Hz. For stable collection, RGB image data is collected at about 100 Hz in this paper.

We selected and improved a robot that meets these specifications, constructing the physical hardware which we have named ALPHA-α.
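The cost comparison in the Low Cost item can be checked with a few lines of arithmetic. The sketch below uses only the dollar amounts quoted in this section; the variable names are illustrative.

```python
# Dollar amounts quoted in the "Low Cost" item (as of November 2024).
aloha_total = 27067.41            # total ALOHA cost (USD)
aloha_robots_cameras = 20485.96   # ALOHA robots + cameras (USD)
aloha_four_robots = 19359.0       # four ALOHA robots, approximate (USD)
alpha_four_robots = 8663.0        # four OpenMANIPULATOR SARA, approximate (USD)

# Share of the ALOHA budget spent on robots and cameras.
share = aloha_robots_cameras / aloha_total

# Cost of the ALPHA-α robots relative to the ALOHA robots.
ratio = alpha_four_robots / aloha_four_robots

print(f"Robots + cameras share of ALOHA cost: {share:.1%}")
print(f"ALPHA-α robots relative to ALOHA robots: {ratio:.1%}")
```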

ALPHA-α via Bilateral Control

This paper focuses on bilateral control-based imitation learning, so we implemented bilateral control in ALPHA-α. For clarity, we describe bilateral control in comparison with unilateral control, which is employed in ALOHA.

Leader-Follower Control

Unilateral Control (e.g., ALOHA)

The primary difference between ALOHA/ACT and Bi-ACT lies in the information used and the control method. ALOHA/ACT is based on unilateral control, which relies solely on the robot’s joint positions and uses the joint angle data predicted by the ACT learning model directly as command values for ALOHA’s joint position controller. This system prioritizes position targets, which can make it difficult to generate movements that require nuanced control of force. Note that although force modulation can be approximated in remote operation using only position control, operators typically need extensive practice with the leader robot to master it.
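As a minimal sketch of the unilateral scheme described above, the function below forwards the leader's joint angle to the follower as a position command and reflects nothing back. The function name and the single-joint simplification are assumptions for illustration; this is not ALOHA's actual controller code.

```python
from typing import Tuple


def unilateral_step(leader_angle: float,
                    follower_contact_torque: float) -> Tuple[float, float]:
    """One control step of unilateral leader-follower control (single joint).

    Returns (follower_position_command, leader_feedback_torque).
    """
    # The follower simply tracks the leader's joint angle.
    follower_position_command = leader_angle
    # Information flows one way only: whatever torque the follower feels on
    # contact is ignored and never reaches the leader or the operator.
    leader_feedback_torque = 0.0
    return follower_position_command, leader_feedback_torque


# The command is the same whether the follower touches nothing or a stiff object.
print(unilateral_step(0.3, follower_contact_torque=0.0))
print(unilateral_step(0.3, follower_contact_torque=5.0))
```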

Bilateral Control (e.g., ALPHA-α via Bilateral Control)

On the other hand, our Bi-ACT is based on bilateral control, which considers the robot’s joint positions, velocities, and torques. Bi-ACT utilizes not only the joint data of the leader robot—positions, velocities, and torques—but also incorporates this information from the actively operating follower robots to generate command values for current and torque control. This approach allows for control that combines both position and force in the robot’s movements. Crucially, the command values are not directly generated by the model; instead, they are produced by using the values generated by the model for the leader robot in conjunction with the actual values obtained from the follower robot. By leveraging four-channel bilateral control, this method enables the generation of command values that consider interactions with the environment, thus facilitating a broader range of movements.
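To make the contrast with unilateral control concrete, here is a sketch of one commonly used form of four-channel bilateral control for a single joint. It pursues two goals at once: position tracking (leader angle minus follower angle goes to zero) and the law of action and reaction (leader torque plus follower torque goes to zero). The control law, the gains `kp`, `kd`, `kf`, and the `inertia` value are assumptions for illustration and are not taken from this page. During autonomous execution, the leader-side arguments would be the values generated by the model.

```python
from typing import Tuple


def four_channel_bilateral(theta_l: float, omega_l: float, tau_l: float,
                           theta_f: float, omega_f: float, tau_f: float,
                           kp: float = 100.0, kd: float = 20.0,
                           kf: float = 1.0,
                           inertia: float = 0.01) -> Tuple[float, float]:
    """Return (leader_torque_ref, follower_torque_ref) for one joint.

    theta/omega/tau are the joint angle, angular velocity, and reaction torque
    of the leader (_l) and follower (_f). Gains and inertia are hypothetical.
    """
    # Position channel: PD action on the leader-follower tracking error.
    pos_term = 0.5 * inertia * (kp * (theta_l - theta_f) + kd * (omega_l - omega_f))
    # Force channel: drives the sum of the two torques toward zero.
    force_term = 0.5 * kf * (tau_l + tau_f)
    follower_torque_ref = pos_term - force_term
    leader_torque_ref = -pos_term - force_term
    return leader_torque_ref, follower_torque_ref


# Leader ahead of follower, no contact: the two arms are pulled toward each other.
print(four_channel_bilateral(0.2, 0.0, 0.0, 0.0, 0.0, 0.0))
```

Unlike the position-only case, both robots receive torque references, so contact forces on the follower influence the leader side and vice versa.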

Teleoperation Skills: ALPHA-α via Bilateral Control

Here is a video of ALPHA-α via bilateral control. In this way, researchers/developers can build various control systems on ALPHA-α.

Bi-ACT

“Bi”lateral Control-Based Imitation Learning via “A”ction “C”hunking with “T”ransformers

Our proposed work employs a method inspired by ACT research, utilizing joint and image data to predict movements, combined with Bilateral Control-Based Imitation Learning principles for a robust robotic control approach.

Data Collection

Data collected includes images from gripper and environmental cameras, along with joint angles, angular velocities, and torque of leader and follower robots.
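One way to picture a single recorded time step is as a small record holding the two camera images plus the leader and follower joint states. The class and field names below are hypothetical, and images are kept as raw bytes only to keep the sketch dependency-free.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RobotState:
    """Joint-space state of one robot at one time step."""
    angles: List[float]       # joint angles
    velocities: List[float]   # joint angular velocities
    torques: List[float]      # joint torques

    def as_vector(self) -> List[float]:
        # Flatten to [angles..., velocities..., torques...].
        return [*self.angles, *self.velocities, *self.torques]


@dataclass
class Sample:
    """Everything recorded at one time step of a demonstration."""
    gripper_image: bytes       # image from the gripper camera
    environment_image: bytes   # image from the environmental camera
    leader: RobotState
    follower: RobotState


# Example with a hypothetical two-joint robot.
state = RobotState(angles=[0.1, 0.2], velocities=[0.0, 0.0], torques=[0.5, -0.5])
sample = Sample(b"", b"", leader=state, follower=state)
print(len(sample.follower.as_vector()))
```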

Execution

Bi-ACT predicts the leader robot’s joint angles, angular velocities, and torques for the subsequent steps. These predictions serve as the leader-side inputs to bilateral control of the follower robot, enabling more responsive maneuvering.
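The idea of predicting several future steps at once (action chunking) can be sketched as a small executor that asks the policy for a chunk only when the previous one is used up. Everything here is an assumption for illustration: the class name, the chunk length of 4, the single-value observation, and the stand-in `dummy_policy` that replaces the trained model. Whether Bi-ACT re-queries on this exact schedule is not stated on this page.

```python
from collections import deque
from typing import Callable, Deque, List

Chunk = List[float]  # predicted leader targets for the next few steps (one joint)


class ChunkedExecutor:
    """Feeds predicted leader targets to the controller one step at a time."""

    def __init__(self, policy: Callable[[float], Chunk]) -> None:
        self.policy = policy
        self.queue: Deque[float] = deque()
        self.policy_calls = 0

    def next_leader_target(self, follower_obs: float) -> float:
        # Query the policy only when the current chunk has been consumed.
        if not self.queue:
            self.queue.extend(self.policy(follower_obs))
            self.policy_calls += 1
        return self.queue.popleft()


def dummy_policy(obs: float) -> Chunk:
    # Stand-in for the trained model: a short ramp starting from the observation.
    return [obs + 0.1 * (i + 1) for i in range(4)]


executor = ChunkedExecutor(dummy_policy)
targets = [executor.next_leader_target(0.0) for _ in range(8)]
print(executor.policy_calls, targets)
```

Each returned target would then be combined with the follower's measured state inside the bilateral controller.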

Autonomous Skills

Unimanual Robotic Manipulation

Task

For the initial task of ‘Pick-and-Place’, the objective was for the gripper to accurately pick up objects of various shapes, weights, and textures from the pick area and then place them within the place area.

Objects for Pick and Place

We used a foam ball and a softball during the data collection phase. To test the model, we used these two objects along with seven untrained objects: a table tennis ball, an eye-cream package, a canele, a soccer ball, a plastic bell pepper, a honey bottle, and a glue jar.

We collected joint angle, angular velocity, and torque data from leader-follower demonstrations performed under bilateral control. The robot was controlled at 1000 Hz. In addition, both the onboard hand RGB camera and the top RGB camera on the environment side were recording. To align both sets of data with the system’s operating cycle, we resampled the data to 100 Hz for use as training data.
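A minimal sketch of bringing 1000 Hz joint data down to 100 Hz is plain decimation, keeping every tenth sample. The page does not say which method was actually used (averaging or filtering are equally plausible), so treat this as one assumed option; the function name is hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def decimate(samples: Sequence[T], source_hz: int = 1000,
             target_hz: int = 100) -> List[T]:
    """Keep every (source_hz // target_hz)-th sample, starting with the first."""
    if target_hz <= 0 or source_hz % target_hz != 0:
        raise ValueError("source_hz must be a positive integer multiple of target_hz")
    step = source_hz // target_hz
    return list(samples[::step])


# One second of joint data logged at 1000 Hz becomes 100 training samples.
one_second = list(range(1000))
reduced = decimate(one_second)
print(len(reduced), reduced[:3])
```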

Results (Autonomous)

Pick and Place (Real-Time: 1X)

Why is Bi-ACT (or position and force information/control) important?

In this section, we analyze the Bi-ACT model’s performance on various objects, focusing on joint5, the gripper joint, which has the most contact with the objects. The results show that incorporating force information significantly improved its effectiveness. Details are as follows.

Difference in hardness

We observe the impact of object hardness on torque exerted at joint5. The torque data from the follower shows a clear trend: harder objects require more gripping force. Among the tested objects, the plastic bell pepper—with its slippery surface or unique shape—demanded the highest force, followed by the glue jar. This indicates that objects with either a high hardness level or challenging surface characteristics, like slipperiness or irregular shape, require increased force for stable manipulation. On the other hand, softer objects like the canele and softball showed lower force levels, with the softball in particular demonstrating a gradual force increase due to its larger size and lower hardness. These observations emphasize the model’s ability to adapt to different hardness levels and indicate that force control allows the robot to apply appropriate gripping power according to each object’s physical properties.

Difference in shape consistency

We analyzed the impact of shape consistency by comparing two objects: the table tennis ball (consistent shape) and the honey bottle (irregular shape). The torque readings for the table tennis ball remained uniform across the 10 trials, as its consistent shape provided predictable points of contact for the gripper. However, the torque values for the honey bottle varied significantly, likely due to its irregular shape, which changes the contact points between the gripper and the object with each attempt. This variation underscores the model’s responsiveness to irregular shapes and highlights the importance of force feedback in adapting to inconsistent contact points during manipulation.

These results confirmed the importance of position and force information/control when using Bi-ACT.
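The shape-consistency comparison above amounts to asking how much the joint5 torque varies from trial to trial. One simple way to quantify that is the coefficient of variation of the peak torque across trials. The metric choice and function name are assumptions, and the numbers below are synthetic placeholders, not measurements from the experiments.

```python
from statistics import mean, pstdev
from typing import Sequence


def coefficient_of_variation(peak_torques: Sequence[float]) -> float:
    """Trial-to-trial spread of peak torque, normalized by its mean magnitude."""
    mu = mean(peak_torques)
    if mu == 0:
        raise ValueError("mean peak torque is zero; variation is undefined")
    return pstdev(peak_torques) / abs(mu)


# Synthetic placeholder values (NOT experimental data), ten trials each.
consistent_shape = [1.0] * 10
irregular_shape = [0.6, 1.4, 0.9, 1.2, 0.7, 1.5, 1.0, 0.8, 1.3, 1.1]

print(coefficient_of_variation(consistent_shape))
print(coefficient_of_variation(irregular_shape))
```

A value near zero corresponds to uniform torque readings across trials, while a larger value reflects contact points that change from attempt to attempt.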

Bimanual Robotic Manipulation

Task

To examine the applicability of Bi-ACT, experiments were conducted on three tasks, “Put-Cup-Ball,” “Egg Handling,” and “Open Cap,” using ALPHA-α. For each task, we collected 5 demonstrations as training data.

- Put-Cup-Ball

In the “Put-Cup-Ball” task, the left robot arm transports a cup located on the left side, while the right robot arm picks up a ball and places it on top of the cup. The specific steps are as follows: (#0) Initial position (#1) Pick up the cup and ball (#2) Place the cup and move the ball (#3) Place the ball on top of the cup.

This task requires coordinated bimanual robot actions, as each arm must monitor the other’s status and avoid interference, ensuring effective cooperation between the two robot arms.

- Egg Handling

In the “Egg Handling” task, the two robots coordinate to lift two eggs and place them in a designated area. The specific steps are as follows: (#0) Initial position (#1) Pick two eggs (#2) Move to place area (#3) Place two eggs in place area.

This task requires the left and right arms to carefully grasp and transport fragile eggs to the specified location. Proper and delicate bimanual robot actions are essential to avoid breaking the eggs.

- Open Cap

In the “Open Cap” task, two robots are used to open the cap of a plastic bottle. The specific steps are as follows: (#0) Initial position (#1) Grasp the bottle with the right robot (#2) Pass the bottle to the left robot (#3) Open the bottle cap.

This task requires even more careful coordination than the “Put-Cup-Ball” task, as both arms must monitor each other’s status and avoid interference. In particular, it necessitates coordinated bimanual actions, such as passing the bottle between robots and holding the bottle with the left robot arm while the right robot arm opens the cap.

Results (Autonomous)

Autonomous actions are executed by a Bi-ACT model trained on only 5 collected demonstrations. The results demonstrated that the Bi-ACT model with force control, running on ALPHA-α, achieved high success rates, confirming its applicability to bimanual tasks.

A little extra

Repairability

During system development, the robot broke on several occasions. Being able to repair it quickly ourselves was a significant advantage.

Control

For example, “ALPHA-α via bilateral control” enables tasks such as grasping potato chips, manipulating a cup by utilizing the environment, grasping cream puffs, and performing dual-arm coordinated lifting. These actions are made possible by the bidirectional transmission of position and force information between the operator and the robot. Additionally, ALPHA-α allows for the flexible implementation of not only bilateral control but also unilateral control and other existing or novel control systems.

Citation

@ARTICLE{10883984,
  author={Kobayashi, Masato and Buamanee, Thanpimon and Kobayashi, Takumi},
  journal={IEEE Access}, 
  title={ALPHA- α and Bi-ACT Are All You Need: Importance of Position and Force Information/ Control for Imitation Learning of Unimanual and Bimanual Robotic Manipulation With Low-Cost System}, 
  year={2025},
  volume={13},
  number={},
  pages={29886-29899},
  keywords={Robots;Costs;Force;Data collection;Imitation learning;Robot vision systems;Motor drives;Cameras;Transformers;Hardware;Imitation learning;bilateral control;robot arm;manipulation;bimanual},
  doi={10.1109/ACCESS.2025.3541200}}

Contact

Masato Kobayashi (Assistant Professor, The University of Osaka, Japan)

X (Twitter)
- English : https://twitter.com/MeRTcookingEN
- Japanese : https://twitter.com/MeRTcooking
LinkedIn: https://www.linkedin.com/in/kobayashi-masato-robot/