Implementing the Gaze Controller
To make the interaction with the robot more natural, we planned to implement a gaze controller that detects the location of the customer's face and moves the robot's head to look at the customer.
The gaze controller runs in a separate thread so that it executes in parallel with the robot's other interactions with the customer. It can be in two different states: idle and follow face. The gaze controller is in the idle state when the robot is not interacting with a customer. In this state, it performs slight random head movements and executes selected gestures; we added our own gestures for this state (e.g. looking around, shaking the head). The gaze controller is in the follow face state when the robot is interacting with a customer. In this state, it tries to detect the customer's face and moves the robot's head to look at the customer. The gaze controller receives its current state from the InteractionCoordinator and the InteractionManager.
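The following is a minimal sketch of how such a two-state controller could be structured. The class and callback names (GazeController, look_at, detect_face, idle_gesture) as well as the timing and probability values are illustrative assumptions, not our actual project code:

```python
import random
import threading
import time
from enum import Enum, auto


class GazeState(Enum):
    IDLE = auto()          # no customer present: slight random movements / idle gestures
    FOLLOW_FACE = auto()   # customer present: track the detected face


class GazeController(threading.Thread):
    """Runs in its own thread so gaze control is independent of the dialogue logic."""

    def __init__(self, look_at, detect_face, idle_gesture):
        super().__init__(daemon=True)
        self._look_at = look_at          # callable(x, y, z) that moves the robot's head
        self._detect_face = detect_face  # callable returning (x, y, z) or None
        self._idle_gesture = idle_gesture
        self._state = GazeState.IDLE
        self._lock = threading.Lock()

    def set_state(self, state):
        # Called from outside, e.g. by the InteractionCoordinator / InteractionManager.
        with self._lock:
            self._state = state

    def run(self):
        while True:
            with self._lock:
                state = self._state
            if state == GazeState.IDLE:
                # Slight random head movement, occasionally an idle gesture.
                self._look_at(random.uniform(-0.1, 0.1), random.uniform(-0.05, 0.05), 1.5)
                if random.random() < 0.2:
                    self._idle_gesture()
            else:  # GazeState.FOLLOW_FACE
                target = self._detect_face()
                if target is not None:
                    self._look_at(*target)
            time.sleep(0.5)
```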
Before implementing the main part of the gaze controller, we had to figure out the best way to control the robot's head movements. We're using the attend function of the FurhatRemoteAPI, which allows us to move the robot's head towards a specific location. The function takes the x and y coordinates of the location as values between -1 and 1, describing the relative movement of the robot's head in the horizontal and vertical direction. Additionally, the function takes the distance to the location as parameter z.
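A minimal example of such a call could look as follows, assuming the Python version of the FurhatRemoteAPI where the location is passed as an "x,y,z" string; the exact call signature may differ between API versions:

```python
from furhat_remote_api import FurhatRemoteAPI

# Connect to the robot (or the virtual Furhat from the SDK) running on this host.
furhat = FurhatRemoteAPI("localhost")

# Look slightly to the right and up, at a target roughly one meter away:
# x and y are relative horizontal/vertical offsets, z is the distance.
furhat.attend(location="0.3,0.1,1.0")
```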
To detect the location of the customer's face, we use the output of the FacialExpressionDetector, which uses the PyFeat library to detect faces in a frame. The detector returns the coordinates of the detected faces as rectangles. This representation differs from the one expected by the attend function we use to move the robot's head, so we had to convert the coordinates of the detected faces into attend coordinates. We divided the frame into a grid of 49 cells and move the robot's head to the center of the cell in which the customer's face is located. Compared to following the customer's face directly, this results in fewer head movements while still providing a natural interaction. To convert a cell center into the x and y parameters of the attend function, we calculate its relative distance from the center of the frame.
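A sketch of this conversion is shown below. The 7x7 grid size matches the 49 cells mentioned above; the bounding-box format (x1, y1, x2, y2 in pixels) is an assumption about the detector output, and the function name is illustrative:

```python
GRID_SIZE = 7  # 7 x 7 = 49 cells


def face_to_attend_xy(face_box, frame_width, frame_height):
    """Map a detected face rectangle to attend()-style x/y values in [-1, 1]."""
    x1, y1, x2, y2 = face_box
    face_cx = (x1 + x2) / 2
    face_cy = (y1 + y2) / 2

    # Find the grid cell containing the face center.
    cell_w = frame_width / GRID_SIZE
    cell_h = frame_height / GRID_SIZE
    col = min(int(face_cx // cell_w), GRID_SIZE - 1)
    row = min(int(face_cy // cell_h), GRID_SIZE - 1)

    # Use the cell center instead of the face itself to reduce head jitter.
    cell_cx = (col + 0.5) * cell_w
    cell_cy = (row + 0.5) * cell_h

    # Relative offset of the cell center from the frame center, normalized to [-1, 1].
    x = (cell_cx - frame_width / 2) / (frame_width / 2)
    y = (frame_height / 2 - cell_cy) / (frame_height / 2)  # image y grows downwards
    return x, y
```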
While testing this simple conversion, we identified three problems. First, the movements of the robot's head were too extreme, so we scaled the calculated x and y values by a fixed constant. Second, the robot's webcam is located above the robot's eyes, which results in a different perspective on the customer's face; to compensate for this, we had to adjust the calculated y value. Third, we needed a method to estimate the distance to the customer's face. To approximate it, we measured the distance to the customer's face in two different positions and fitted a linear function to the measured values.
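The corrections could be applied roughly as in the sketch below. All constants are placeholders rather than our tuned values, and using the face height in pixels as the input of the linear distance function is an assumption for illustration:

```python
SCALE = 0.5        # damp the head movements (placeholder value)
Y_OFFSET = -0.15   # compensate for the webcam sitting above the eyes (placeholder value)

# Two hypothetical reference measurements: face height in pixels -> distance (z).
H1, D1 = 220, 0.5
H2, D2 = 110, 1.0
SLOPE = (D2 - D1) / (H2 - H1)


def correct_attend_target(x, y, face_height):
    """Apply scaling, vertical offset, and the linear distance approximation."""
    x = x * SCALE
    y = y * SCALE + Y_OFFSET
    z = D1 + SLOPE * (face_height - H1)  # linear interpolation between the two measurements
    return x, y, max(z, 0.2)             # keep the distance positive
```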
After solving these problems, the robot follows the customer's face naturally, as the following video shows: