Voice Output of Object Detection Results
Introduction
In this lesson, you will create a program that subscribes to the YOLO object detection results from Lesson7 and announces the names of the detected objects by voice. This lets the robot tell you “what it sees”.
Learning Objectives
By the end of this lesson, you will be able to:
- Integrate multiple ROS2 nodes (object detection and voice output)
- Subscribe to detection results and process them
- Combine sensor processing with robot interaction capabilities
- Create interactive robot behaviors using ROS2 communication
Program Overview
This program has the following two main functions:
- Subscribe to YOLO object detection results from Lesson7
- Output detected object names by voice
These functions are implemented in one node:
kachaka_speak_detection: Subscribes to object detection results and outputs them by voice
Required Packages
The following packages are required to run this program:
- image_yolo_detection: Object detection package implemented in Lesson7
- kachaka_interfaces: Package that provides interfaces for the Kachaka robot
Program Implementation
Voice Output Node (kachaka_speak_detection.py)
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import rclpy
from rclpy.node import Node
from std_msgs.msg import String
from kachaka_interfaces.action import ExecKachakaCommand
from kachaka_interfaces.msg import KachakaCommand
from rclpy.action import ActionClient
import json
import time


class KachakaSpeakDetection(Node):
    """
    Node for speaking detected objects
    """

    def __init__(self):
        super().__init__('kachaka_speak_detection')

        # Set QoS profile for object detection data
        qos = rclpy.qos.QoSProfile(
            reliability=rclpy.qos.ReliabilityPolicy.BEST_EFFORT,
            durability=rclpy.qos.DurabilityPolicy.VOLATILE,
            history=rclpy.qos.HistoryPolicy.KEEP_LAST,
            depth=10
        )

        # Subscribe to object detection information from Lesson7
        self.subscription = self.create_subscription(
            String,
            '/image_yolo_detection/objects',  # Topic from Lesson7
            self.object_callback,
            qos
        )

        # Action client for speech output
        self._action_client = ActionClient(
            self, ExecKachakaCommand, "/kachaka/kachaka_command/execute"
        )
        self._action_client.wait_for_server()

        # Dictionary for converting English object names (kept in English for voice output)
        self.object_names = {
            'person': 'person',
            'car': 'car',
            'chair': 'chair',
            'bottle': 'bottle',
            'cup': 'cup',
            'laptop': 'laptop',
            'mouse': 'mouse',
            'keyboard': 'keyboard',
            'cell phone': 'smartphone',
            'book': 'book',
            # Add more as needed
        }

        # Store the last detected objects
        self.last_objects = []

        # Create a timer to periodically speak the detected objects
        self.timer = self.create_timer(5.0, self.timer_callback)

        self.get_logger().info("Kachaka speak detection node has started.")

    def object_callback(self, msg):
        """
        Process detected object information and store it
        """
        try:
            # Convert JSON string to Python object
            object_info = json.loads(msg.data)
            self.last_objects = object_info.get("objects", [])
        except Exception as e:
            self.get_logger().error(f"Error processing object information: {str(e)}")

    def timer_callback(self):
        """
        Periodically speak the detected objects.

        This callback is triggered every 5 seconds by the timer. It creates
        a natural-sounding sentence describing the detected objects and
        sends it to the robot for voice output.
        """
        # If no objects are detected, don't say anything
        if not self.last_objects:
            return

        # Convert all detected objects and create a summary message
        # Map English object names to display names (or keep original if not mapped)
        object_names = []
        for obj in self.last_objects:
            # Get the display name from dictionary, or use original name if not found
            object_name = self.object_names.get(obj, obj)
            object_names.append(object_name)

        # Create a natural-sounding summary message
        # Handle singular vs plural cases for better speech
        if len(object_names) == 1:
            # Single object: "I detected person"
            text = f"I detected {object_names[0]}"
        else:
            # Multiple objects: "I detected person, car and chair"
            # Join all but the last with commas, then add "and" before the last one
            text = f"I detected {', '.join(object_names[:-1])} and {object_names[-1]}"

        # Execute speech output by sending the text to the robot
        self.send_goal(text)

        # Log what was said for debugging
        self.get_logger().info(f"Speech output: {text}")

    def send_goal(self, text):
        """
        Send a speech command to Kachaka using the action server.

        This method creates a command to make the robot speak the given text.
        The command is sent asynchronously, so it doesn't block the node.

        Parameters:
            text: The text string that the robot should speak
        """
        # Create a KachakaCommand message
        command = KachakaCommand()
        # Set command type to SPEAK_COMMAND (tells robot to use voice output)
        command.command_type = KachakaCommand.SPEAK_COMMAND
        # Set the text content to be spoken
        command.speak_command_text = text

        # Create a goal message for the action
        goal_msg = ExecKachakaCommand.Goal()
        # Attach the command to the goal
        goal_msg.kachaka_command = command

        # Send the goal asynchronously (non-blocking)
        # This returns a Future that will complete when the server responds
        future = self._action_client.send_goal_async(goal_msg)
        # Set a callback to handle when the server accepts or rejects the goal
        future.add_done_callback(self.goal_response_callback)

    def goal_response_callback(self, future):
        """
        Callback for when the goal is accepted or rejected
        """
        goal_handle = future.result()
        if not goal_handle.accepted:
            self.get_logger().info("Goal rejected")
            return
        self.get_logger().info("Goal accepted")
        result_future = goal_handle.get_result_async()
        result_future.add_done_callback(self.get_result_callback)

    def get_result_callback(self, future):
        """
        Callback for when the goal is completed
        """
        result = future.result().result
        self.get_logger().info(f"Result: {result}")


def main(args=None):
    rclpy.init(args=args)
    node = KachakaSpeakDetection()
    try:
        rclpy.spin(node)
    except KeyboardInterrupt:
        node.get_logger().info("Shutting down...")
    finally:
        node.destroy_node()
        rclpy.shutdown()


if __name__ == '__main__':
    main()

Program Explanation
Voice Output Node (kachaka_speak_detection.py)
This node has the following functions:
- Subscribe to YOLO object detection results from Lesson7
- Process detected object names
- Output object names by voice
Main processing flow:
- Receive information from Lesson7’s object detection node
- Convert the JSON string to a Python object (an example payload is sketched after this list)
- Process detected object names
- Output object names by voice
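The exact payload format is defined by Lesson7’s publisher and is not reproduced in this lesson. As an assumption for illustration, suppose the topic carries a std_msgs/String whose data is JSON with an "objects" list of class names; this is the shape that object_callback expects:

import json

# Hypothetical example payload from Lesson7 (format assumed, not confirmed here)
sample = '{"objects": ["person", "cup", "laptop"]}'

object_info = json.loads(sample)          # JSON string -> Python dict
objects = object_info.get("objects", [])  # falls back to [] if the key is missing
print(objects)                            # ['person', 'cup', 'laptop']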
Implementation of Voice Output
Voice output is implemented using the Kachaka robot’s action server. The send_goal method performs voice output in the following steps:
- Create a KachakaCommand message
- Set the command type to SPEAK_COMMAND
- Set the text to speak
- Create a goal message
- Send the goal asynchronously
- Set callback functions to handle goal acceptance and results
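Before wiring speech to detection, it can help to confirm that the speech action works on its own. The following is a minimal sketch (not part of the lesson package; the node name speak_test and the spoken text are arbitrary) that sends a single SPEAK_COMMAND goal to the same /kachaka/kachaka_command/execute action server and exits:

#!/usr/bin/env python3
# Minimal speech test (sketch): send one SPEAK_COMMAND goal and exit
import rclpy
from rclpy.node import Node
from rclpy.action import ActionClient
from kachaka_interfaces.action import ExecKachakaCommand
from kachaka_interfaces.msg import KachakaCommand


def main():
    rclpy.init()
    node = Node('speak_test')
    client = ActionClient(node, ExecKachakaCommand, '/kachaka/kachaka_command/execute')
    client.wait_for_server()

    # Build the speech command exactly as send_goal does in the lesson node
    command = KachakaCommand()
    command.command_type = KachakaCommand.SPEAK_COMMAND
    command.speak_command_text = 'Speech test'
    goal_msg = ExecKachakaCommand.Goal()
    goal_msg.kachaka_command = command

    # Wait for the goal to be accepted, then for the result
    goal_future = client.send_goal_async(goal_msg)
    rclpy.spin_until_future_complete(node, goal_future)
    result_future = goal_future.result().get_result_async()
    rclpy.spin_until_future_complete(node, result_future)

    node.destroy_node()
    rclpy.shutdown()


if __name__ == '__main__':
    main()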
Callback Processing
Action client callback processing is implemented as follows:
- goal_response_callback: Handles whether the goal was accepted or rejected
  - If the goal is accepted, set a callback to get the result
  - If the goal is rejected, log it
- get_result_callback: Handles the goal execution result
  - Log the result
Periodic Voice Output
To periodically output detected objects by voice, the following functions are implemented:
- last_objects variable: Stores the latest detection results
- Timer: Calls the timer_callback method every 5 seconds
- timer_callback method: Outputs the stored detection results by voice
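Because the timer fires every 5 seconds regardless of whether anything changed, the robot repeats the same sentence while the scene stays the same. One possible refinement, shown here as a sketch rather than as part of the lesson code, is to remember what was last announced (a hypothetical self.spoken_objects set initialized in __init__) and only speak when the set of detected objects changes:

# Sketch: announce only when the set of detected objects changes.
# Assumes self.spoken_objects = set() was added in __init__.
def timer_callback(self):
    current = set(self.last_objects)
    if not current or current == self.spoken_objects:
        return  # nothing detected, or nothing new to report
    self.spoken_objects = current
    names = [self.object_names.get(obj, obj) for obj in sorted(current)]
    if len(names) == 1:
        text = f"I detected {names[0]}"
    else:
        text = f"I detected {', '.join(names[:-1])} and {names[-1]}"
    self.send_goal(text)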
Usage
Preparation
- Create package
# Move to workspace src directory
cd ~/ros2_ws/src
# Create package
ros2 pkg create --build-type ament_python kachaka_speak_detection --dependencies rclpy std_msgs kachaka_interfaces --node-name kachaka_speak_detection
- Build package
# Move to workspace root directory
cd ~/ros2_ws
# Build package
colcon build --packages-select kachaka_speak_detection
# Set up environment
source install/setup.bash
- Confirm that Lesson7’s YOLO object detection node is running
ros2 run image_yolo_detection image_yolo_detection
- Run voice output node
ros2 run kachaka_speak_detection kachaka_speak_detection
Exercises
Add object names
- Add new object names to the object_names dictionary
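For example, YOLO’s standard COCO classes include 'dog', 'cat', and 'tv'; a minimal way to extend the mapping (display names chosen freely) is:

# Extend the mapping with additional COCO classes (example entries)
self.object_names.update({
    'dog': 'dog',
    'cat': 'cat',
    'tv': 'television',
})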
Display detection probability
- Modify to also output detection probability by voice
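Announcing probabilities requires Lesson7 to publish scores alongside names, which this lesson does not define. Assuming, purely for illustration, that each entry in the "objects" list were a dictionary such as {"name": "person", "score": 0.87}, timer_callback could be adapted like this sketch:

# Sketch: speak each object's confidence, assuming entries like
# {"name": "person", "score": 0.87} (this format is an assumption)
def timer_callback(self):
    if not self.last_objects:
        return
    parts = []
    for obj in self.last_objects:
        name = self.object_names.get(obj["name"], obj["name"])
        percent = int(obj["score"] * 100)
        parts.append(f"{name} with {percent} percent confidence")
    text = "I detected " + ", ".join(parts)
    self.send_goal(text)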
Detect only specific objects
- Modify to detect and output only specific objects (e.g., person or car) by voice
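One straightforward approach is an allow-list that filters detections as they arrive, so timer_callback keeps working unchanged. A sketch (the allow-list contents are examples):

# Sketch: keep only the classes we care about.
# Assumes self.target_objects = {'person', 'car'} was added in __init__.
def object_callback(self, msg):
    try:
        object_info = json.loads(msg.data)
        objects = object_info.get("objects", [])
        self.last_objects = [obj for obj in objects if obj in self.target_objects]
    except Exception as e:
        self.get_logger().error(f"Error processing object information: {str(e)}")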
Add object position information
- Modify to also output object position information (where in the image they were detected) by voice
Notes
- The voice output node will not work if Lesson7’s YOLO object detection node is not running (a quick check is sketched after these notes)
- Other processing may be blocked during voice output
- Extend object name mapping as needed
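A quick way to confirm that Lesson7 is publishing is a throwaway subscriber that prints every message on the detection topic. This sketch reuses the topic name and BEST_EFFORT reliability from the lesson node; the node name detection_echo is arbitrary:

#!/usr/bin/env python3
# Throwaway subscriber (sketch): print raw detection messages from Lesson7
import rclpy
from rclpy.node import Node
from rclpy.qos import QoSProfile, ReliabilityPolicy
from std_msgs.msg import String


def main():
    rclpy.init()
    node = Node('detection_echo')
    qos = QoSProfile(depth=10, reliability=ReliabilityPolicy.BEST_EFFORT)
    node.create_subscription(
        String,
        '/image_yolo_detection/objects',
        lambda msg: node.get_logger().info(msg.data),
        qos,
    )
    try:
        rclpy.spin(node)
    except KeyboardInterrupt:
        pass
    node.destroy_node()
    rclpy.shutdown()


if __name__ == '__main__':
    main()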