Introduction to YOLO Object Detection in ROS2

This document explains how to perform object detection using YOLO (You Only Look Once) on images obtained from robot cameras in a ROS2 environment. YOLO is a high-performance deep learning model that can detect objects in real-time.

Learning Objectives

By the end of this lesson, you will be able to:

  • Understand what YOLO is and how it works for object detection
  • Integrate YOLO models into ROS2 nodes
  • Process camera images and detect objects in real-time
  • Visualize and publish object detection results

YOLO Object Detection Overview

This lesson covers:

  • Processing images obtained from robot cameras in real-time
  • Implementation of object detection using deep learning (YOLO)
  • Visualization of detection results and publishing to ROS2 topics
  • Working with pre-trained models and understanding model selection

1. What is YOLO

YOLO (You Only Look Once) is a deep learning model for detecting objects in images. Unlike traditional detectors that run a classifier over many candidate regions, YOLO predicts all bounding boxes and class probabilities in a single forward pass of the network, which gives it the following characteristics:

  • High Speed: A single network pass per image enables real-time detection
  • High Accuracy: Recent YOLO versions achieve high detection accuracy despite their speed
  • Ease of Use: Pre-trained models are provided and easy to use
  • Versatility: The standard pre-trained models detect a wide range of object classes (the COCO-trained YOLOv8 models cover 80 classes)

Main Uses of YOLO

  • Object Detection: Detect the position and class of objects in images
  • Object Tracking: Track detected objects across video frames
  • Segmentation: Identify the pixel regions (masks) of objects in images
  • Pose Estimation: Estimate human body keypoints (poses)

2. Basics of YOLO Object Detection in ROS2

To perform YOLO object detection in ROS2, the following steps are required (steps 1 and 2 are sketched on their own after the list):

  1. Subscribe to images from the camera
  2. Convert ROS2 image messages to OpenCV format
  3. Perform object detection using YOLO model
  4. Visualize detection results and publish as needed
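
Steps 1 and 2 are plain ROS2 image handling and can be tested before any YOLO code is added. The following is a minimal sketch, not part of the final node; the topic name /camera/image_raw and the node name image_viewer are illustrative assumptions and should be adjusted to your robot (the camera topic used later in this lesson is /kachaka/front_camera/image_raw).

#!/usr/bin/env python3
# Minimal sketch of steps 1 and 2: subscribe to a camera topic and convert to OpenCV.
# Assumption: '/camera/image_raw' is a placeholder topic name; change it for your robot.
import rclpy
from rclpy.node import Node
from sensor_msgs.msg import Image
from cv_bridge import CvBridge


class ImageViewer(Node):
    def __init__(self):
        super().__init__('image_viewer')
        self.bridge = CvBridge()
        self.subscription = self.create_subscription(
            Image, '/camera/image_raw', self.image_callback, 10)

    def image_callback(self, msg):
        # Step 2: convert the ROS2 Image message into an OpenCV BGR image
        cv_image = self.bridge.imgmsg_to_cv2(msg, desired_encoding='bgr8')
        self.get_logger().info(f"Received image with shape {cv_image.shape}")


def main(args=None):
    rclpy.init(args=args)
    node = ImageViewer()
    rclpy.spin(node)
    node.destroy_node()
    rclpy.shutdown()


if __name__ == '__main__':
    main()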

Required Libraries

  • rclpy: ROS2 Python client library
  • cv_bridge: Library for converting between ROS2 image messages and OpenCV format images
  • OpenCV: Library for image processing
  • Ultralytics: Library that provides YOLO models and pre-trained weights (a minimal standalone usage sketch follows this list)
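
Before combining these libraries in a ROS2 node, it helps to see the Ultralytics API on its own. The following is a minimal standalone sketch, assuming ultralytics and OpenCV are installed; 'test.jpg' is a placeholder for any local image, and yolov8n.pt is chosen only because it downloads and runs quickly.

#!/usr/bin/env python3
# Minimal standalone YOLO sketch (no ROS2 involved).
# Assumptions: ultralytics and opencv-python are installed, and 'test.jpg'
# is a placeholder path to any local image.
import cv2
from ultralytics import YOLO

# Load a small pre-trained model (downloaded automatically if not present)
model = YOLO('yolov8n.pt')

# Read an image with OpenCV (BGR format, the same layout cv_bridge produces)
image = cv2.imread('test.jpg')

# Run detection; the result is a list with one entry per input image
results = model(image)

# Print class name and confidence for each detected box
for box in results[0].boxes:
    class_name = results[0].names[int(box.cls[0])]
    confidence = float(box.conf[0])
    print(f"{class_name}: {confidence:.2f}")

# Show the annotated image until a key is pressed
cv2.imshow("YOLO detection", results[0].plot())
cv2.waitKey(0)
cv2.destroyAllWindows()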

3. Implementation of YOLO Object Detection

Create a ROS2 node for YOLO object detection. This node subscribes to images from the camera, performs object detection using the YOLO model, and visualizes and publishes detection results.

Implementation Steps

  1. Create a ROS2 node
  2. Subscribe to images from the camera
  3. Load YOLO model
  4. Perform object detection on images
  5. Visualize and publish detection results

Code Example

image_yolo_detection.py

#!/usr/bin/env python3
import rclpy
from rclpy.node import Node
from sensor_msgs.msg import Image
from cv_bridge import CvBridge
import cv2
from ultralytics import YOLO
from rclpy.qos import QoSProfile, QoSReliabilityPolicy, QoSDurabilityPolicy
from std_msgs.msg import String
import json


class ImageYoloDetection(Node):
    def __init__(self):
        super().__init__('image_yolo_detection')
        
        # Set QoS profile for image data transmission
        qos_profile = QoSProfile(
            depth=5,
            reliability=QoSReliabilityPolicy.BEST_EFFORT,
            durability=QoSDurabilityPolicy.VOLATILE
        )
        
        # Subscribe to camera image topic
        self.subscription = self.create_subscription(
            Image,
            '/kachaka/front_camera/image_raw',
            self.image_callback,
            qos_profile
        )
        
        # Create publisher for detection results
        self.publisher = self.create_publisher(
            Image,
            '/image_yolo_detection/image',
            qos_profile
        )

        # Publish detected object information
        self.object_publisher = self.create_publisher(
            String,
            '/image_yolo_detection/objects',
            qos_profile
        )

        # Create CvBridge for ROS-OpenCV image conversion
        self.bridge = CvBridge()

        # Load YOLOv8 model
        # 'yolov8x.pt' is the largest YOLOv8 model (highest accuracy, slowest)
        # The model file will be downloaded automatically on first use if not present
        # Other options: 'yolov8n.pt' (fastest), 'yolov8s.pt', 'yolov8m.pt', 'yolov8l.pt'
        self.model = YOLO('yolov8x.pt')
        
        # Remember the object classes seen in the previous frame so that
        # only newly appearing classes are announced in image_callback
        self.last_detected_objects = set()
        
        self.get_logger().info("YOLOv8 node has started.")

    def image_callback(self, msg):
        try:
            # Convert ROS Image message to OpenCV image
            cv_image = self.bridge.imgmsg_to_cv2(msg, desired_encoding='bgr8')
        except Exception as e:
            self.get_logger().error("Error converting image: " + str(e))
            return

        # Perform object detection using YOLOv8
        # The model processes the image and returns detection results
        # Results include: bounding boxes, class labels, confidence scores
        results = self.model(cv_image)

        # Draw detection results on the image
        # plot() draws bounding boxes, labels, and confidence scores on the image
        # This creates a visual representation of what was detected
        annotated_image = results[0].plot()

        # Extract detected object names from the results
        detected_objects = set()  # Use set to avoid duplicates
        for result in results:
            # Iterate through all detected bounding boxes
            for box in result.boxes:
                # Get the class ID (integer) and convert to class name (string)
                class_id = int(box.cls[0])  # Class ID (e.g., 0, 1, 2...)
                class_name = result.names[class_id]  # Class name (e.g., 'person', 'car')
                detected_objects.add(class_name)  # Add to set (automatically handles duplicates)

        # Publish information only if new objects are detected
        # This avoids spamming the topic with the same detections
        # Set difference finds objects in detected_objects but not in last_detected_objects
        new_objects = detected_objects - self.last_detected_objects
        if new_objects:
            # Publish detected object information in JSON format
            object_info = {
                "objects": list(new_objects),
                "timestamp": self.get_clock().now().nanoseconds * 1e-9
            }
            object_msg = String()
            object_msg.data = json.dumps(object_info)
            self.object_publisher.publish(object_msg)
            
            self.get_logger().info(f"Detected objects: {new_objects}")

        # Update detection results
        self.last_detected_objects = detected_objects

        # Convert detection results to ROS Image message and publish
        try:
            annotated_msg = self.bridge.cv2_to_imgmsg(annotated_image, encoding='bgr8')
            self.publisher.publish(annotated_msg)
        except Exception as e:
            self.get_logger().error("Error converting detection results: " + str(e))

        # Display detection results in a window
        cv2.imshow("YOLOv8 Detection", annotated_image)
        cv2.waitKey(1)

def main(args=None):
    rclpy.init(args=args)
    node = ImageYoloDetection()
    try:
        rclpy.spin(node)
    except KeyboardInterrupt:
        node.get_logger().info("Shutting down...")
    finally:
        cv2.destroyAllWindows()
        node.destroy_node()
        rclpy.shutdown()

if __name__ == '__main__':
    main()

Code Explanation

  • QoS Profile: BEST_EFFORT reliability with a small queue depth, which suits high-rate camera streams where dropping an occasional frame is acceptable
  • Subscription: Subscribe to images from the camera
  • Publisher: Create topics to publish detection results
  • cv_bridge: Convert between ROS2 image messages and OpenCV format images
  • YOLO Model: Load YOLOv8 model using Ultralytics library
  • Object Detection: Execute object detection using model(cv_image)
  • Result Visualization: Draw detection results on images using results[0].plot()
  • Result Publishing: Convert detection results to ROS Image messages and publish
  • Result Display: Display detection results using cv2.imshow function

4. Types of YOLO Models

YOLO provides various models. The main models are as follows:

YOLOv8 Models

  • YOLOv8n (nano): Lightest and fastest model, lowest accuracy
  • YOLOv8s (small): Fast with modest accuracy
  • YOLOv8m (medium): Balanced speed and accuracy
  • YOLOv8l (large): High accuracy, slower
  • YOLOv8x (extra large): Highest accuracy, slowest

Model Selection

  • Prioritize Processing Speed: Choose YOLOv8n or YOLOv8s
  • Prioritize Accuracy: Choose YOLOv8l or YOLOv8x
  • Prioritize Balance: Choose YOLOv8m (the sketch below shows one way to make the model selectable at run time)
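
Switching between these models only requires changing the weight file name passed to YOLO(). One convenient pattern, sketched here as an addition to the node's __init__ from section 3, is to expose the choice as a ROS2 parameter; the parameter name model_name is an illustrative choice, not a standard interface.

# Sketch: replace the fixed model load in __init__ with a parameter
# ('model_name' is an illustrative parameter name)
self.declare_parameter('model_name', 'yolov8n.pt')
model_name = self.get_parameter('model_name').get_parameter_value().string_value
self.model = YOLO(model_name)

The model can then be selected at launch time, for example with ros2 run image_yolo_detection image_yolo_detection --ros-args -p model_name:=yolov8m.pt.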

5. Package Creation and Execution

Package Creation

# Create YOLO object detection package
# (OpenCV and Ultralytics are Python packages installed with pip in the next step,
#  so they are not listed as ROS package dependencies here)
ros2 pkg create --build-type ament_python image_yolo_detection --dependencies rclpy sensor_msgs cv_bridge std_msgs --node-name image_yolo_detection

Installing Dependencies

# Install Ultralytics library
pip install ultralytics

Package Building

cd ~/ros2_ws
colcon build --packages-select image_yolo_detection
source install/setup.bash

Node Execution

# Run YOLO object detection node
ros2 run image_yolo_detection image_yolo_detection
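
Once the node is running and the camera topic is being published, you can check its output from another terminal with the standard ROS2 CLI tools, for example:

# List the topics created by the node
ros2 topic list | grep image_yolo_detection

# Print the JSON detection messages as they arrive
ros2 topic echo /image_yolo_detection/objects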

6. Applications of YOLO Object Detection

Practical Applications

  • Object Tracking: Track detected objects over time
  • Object Counting: Count specific objects
  • Anomaly Detection: Detect objects or situations that deviate from normal conditions
  • Robot Navigation: Navigate while avoiding detected objects

Research and Development Applications

  • Training on Custom Datasets: Train YOLO models for specific objects
  • Multimodal Detection: Detection combining images and other sensor data
  • Real-time Object Tracking: Track objects in videos in real-time
  • 3D Object Detection: Estimate 3D object position and pose from 2D images

Exercises

Basic Object Detection

  1. Run the YOLO object detection node and observe how objects are detected from camera images
  2. Use different YOLO models (YOLOv8n, YOLOv8s, YOLOv8m, YOLOv8l, YOLOv8x) and compare detection accuracy and processing speed (the parameter sketch in section 4 shows one way to switch models without editing the code)

Parameter Adjustment

  1. Change the YOLO model’s confidence threshold and observe changes in detection results (a hint sketch follows this list)
  2. Limit detection target classes to detect only specific objects
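
As a starting point for both exercises, the Ultralytics inference call accepts conf and classes arguments, so the detection line in image_callback from section 3 can be changed as sketched below; the threshold 0.5 and class ID 0 ('person' in the COCO class list) are illustrative values.

# Sketch: replace the detection call in image_callback
# conf raises the minimum confidence threshold (Ultralytics defaults to 0.25)
# classes restricts detection to the listed class IDs (0 is 'person' in COCO)
results = self.model(cv_image, conf=0.5, classes=[0])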

Node Extension

  1. Add detection probability (confidence score) information to the object detection node (a sketch covering these three extensions follows this list)
  2. Add a function to display the number of detected objects
  3. Calculate and display object position information (where in the image each object was detected)
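
A possible sketch for these extensions, again based on the node from section 3: each detected box exposes its confidence score and pixel coordinates, so the inner loop of image_callback can be extended roughly as follows.

# Sketch: extended detection loop for image_callback
for result in results:
    # Number of detected objects in this frame
    self.get_logger().info(f"Number of detections: {len(result.boxes)}")
    for box in result.boxes:
        class_name = result.names[int(box.cls[0])]
        confidence = float(box.conf[0])           # detection probability (0.0 to 1.0)
        x1, y1, x2, y2 = box.xyxy[0].tolist()     # bounding box corners in pixels
        center_x = (x1 + x2) / 2.0                # horizontal position in the image
        center_y = (y1 + y2) / 2.0                # vertical position in the image
        self.get_logger().info(
            f"{class_name}: conf={confidence:.2f}, center=({center_x:.0f}, {center_y:.0f})"
        )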

Summary

Use ROS2 and YOLO to detect objects in real-time from robot camera images!