In this era of fears that Artificial Intelligence will destroy humanity, SARA is a Socially-Aware Robot Assistant, developed in Carnegie Mellon University’s ArticuLab. SARA interacts with people in a whole new way, personalizing the interaction and improving task performance by relying on information about the relationship between the human user and virtual assistant. Rather than taking the place of people, Sara is programmed to collaborate with her human users. Rather than ignoring the socio-emotional bonds that form the fabric of society, Sara depends on those bonds to improve her collaboration skills.
Specifically, Sara always attends to two kinds of goals at the same time: a task goal (such as finding information for her human user, helping the user navigate a conference, or helping the user learn new subject matter such as linear algebra) and a social goal (ensuring that her interaction style is comfortable and engaging, and that it results in increased closeness and a better working relationship between human and agent over time).
Sara accomplishes this innovative approach to robot (or virtual agent) assistance using cutting-edge socially-aware artificial intelligence, developed in the ArticuLab at Carnegie Mellon University. Specifically, Sara is capable of detecting social behaviors in conversation, reasoning about how to respond to the intentions behind those particular behaviors, and generating appropriate social responses – as well as carrying out her task duties at the same time.
SARA was presented at the World Economic Forum (WEF) Annual Meeting in Davos (January 17-20, 2017), and at the World Economic Forum Annual Meeting of the New Champions in Tianjin, China (June 26-28, 2016).
“Hello, I’m SARA. I’m here to be your personal assistant.”
In terms of detection, Sara can recognize visual (body language, using algorithms we developed, as well as the capabilities of OpenFace), vocal (acoustic features of speech, such as intonation or loudness, also using algorithms we developed, as well as the capabilities of OpenSmile) and verbal (linguistic features of the interaction such as conversational strategies, using models and binary classifiers we developed) aspects of a human user’s behavior. We leverage the power of recurrent neural networks (deep learning techniques) and L2 regularized logistic regression (a discriminative model in machine learning) with multimodal information from both the user and SARA (speech, acoustic voice quality, and the conversational strategies described above) to learn the fine-grained temporal relationships among these modalities, and their contextual information. Sara uses those sources of input to estimate the rapport between user and agent in real time. We call this social intention recognition, by analogy with the classic natural language processing and AI process of “(task) intention recognition”.
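To make the "L2 regularized logistic regression" and "binary classifiers" mentioned above more concrete, here is a minimal sketch of one such classifier in plain Python. It is not the ArticuLab implementation: the three features, the toy data, the hyperparameters, and the function names `train_l2_logreg` and `predict_proba` are all assumptions chosen for illustration.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_l2_logreg(X, y, l2=0.1, lr=0.5, epochs=2000):
    """Batch gradient descent on mean log-loss + (l2 / 2) * ||w||^2.
    The bias term is left unregularized."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        # The l2 * wj term is the gradient of the L2 penalty.
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(w, b, x):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Hypothetical per-utterance features (illustrative only):
# [contains first-person pronoun, smile detected, normalized loudness]
X = [[1, 1, 0.2], [1, 0, 0.3], [1, 1, 0.5],
     [0, 0, 0.8], [0, 1, 0.9], [0, 0, 0.4]]
y = [1, 1, 1, 0, 0, 0]  # 1 = utterance labeled as a self-disclosure

w, b = train_l2_logreg(X, y)
print(predict_proba(w, b, [1, 1, 0.3]))  # probability of "self-disclosure"
print(predict_proba(w, b, [0, 0, 0.7]))
```

One classifier of this kind per conversational strategy would give a set of binary decisions that a downstream rapport estimator could consume. The L2 penalty keeps the weights small, which matters when, as in this toy data, a single feature separates the classes perfectly.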
In terms of reasoning, Sara first carries out the classic kind of AI task reasoning needed to determine how best to fulfill the user’s goals. Then Sara carries out a brand new kind of reasoning — what we call “social reasoning” — to determine how to carry out the conversation (including language and body language) with the user so as to best accomplish both the task (information-seeking, teaching, calendar management, etc.) and social goals (managing rapport, etc.).
In terms of responding, the social reasoner produces suggestions (technically, a “suggestion” is the output of a spreading activation network in which one conversational strategy is activated). A suggestion such as
“the most effective way to increase rapport, given the user’s last utterance, the current rapport level, and what Sara is trying to achieve at this point in the conversation, would be a self-disclosure”
is sent to the Natural Language Generation (NLG) module. The NLG module is responsible for generating language that fits the requested conversational strategy and is grammatically correct. For example, if self-disclosure is the optimal next strategy, the NLG might generate language for Sara as follows:
“I certainly find it difficult to remember information without noting it down. If you’re like me, you might take a screenshot so you remember this information.”
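The hand-off from social reasoner to NLG described above can be sketched roughly as follows. This is a toy illustration only, not SARA's actual network: the node names, link weights, decay and inhibition constants, and the `TEMPLATES` table are all hypothetical. Activation spreads from observed conversational states to candidate strategies over a few steps, candidates inhibit one another, and the single most activated strategy is passed to a template-based generator.

```python
# Hypothetical link weights: how strongly each observed state excites (+)
# or inhibits (-) each candidate conversational strategy.
LINKS = {
    "rapport_low": {"self_disclosure": 0.8, "praise": 0.4, "tease": -0.6},
    "rapport_high": {"tease": 0.7, "self_disclosure": 0.2},
    "user_self_disclosed": {"self_disclosure": 0.6, "praise": 0.3},
    "task_recommendation_given": {"self_disclosure": 0.3, "praise": 0.1},
}

# Hypothetical surface templates, one per strategy.
TEMPLATES = {
    "self_disclosure": "I certainly find it difficult to remember "
                       "information without noting it down. {tip}",
    "praise": "Great choice! {tip}",
    "tease": "You are not going to forget this one, are you? {tip}",
}

def select_strategy(observed, steps=5, decay=0.5, inhibition=0.2):
    """Spread activation from observed states to strategies and return
    the most activated strategy together with all activation levels."""
    strategies = sorted({s for targets in LINKS.values() for s in targets})
    activation = {s: 0.0 for s in strategies}
    for _ in range(steps):
        updated = {}
        for s in strategies:
            incoming = sum(LINKS.get(state, {}).get(s, 0.0) * strength
                           for state, strength in observed.items())
            # Competing strategies with positive activation suppress s.
            rivals = sum(max(activation[r], 0.0)
                         for r in strategies if r != s)
            updated[s] = decay * activation[s] + incoming - inhibition * rivals
        activation = updated
    return max(strategies, key=lambda s: activation[s]), activation

def realize(strategy, tip):
    # Fill the template for the chosen strategy with the task content.
    return TEMPLATES[strategy].format(tip=tip)

observed = {"rapport_low": 1.0, "user_self_disclosed": 1.0,
            "task_recommendation_given": 1.0}
chosen, levels = select_strategy(observed)
print(chosen)
print(realize(chosen, "You might take a screenshot so you remember this."))
```

Keeping strategy selection separate from surface realization mirrors the division of labor in the text: the reasoner decides which strategy to use, and the generator decides how to phrase it.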
The NLG also generates appropriate body language for Sara – this body language includes hand gestures, shifts in eye gaze, smiles, head nods, etc., all with the goal of establishing and building a durable relationship with the user over time. Sara’s body was designed and animated by our talented animators, and implemented using the Virtual Human Toolkit.
This relationship between Sara and her human user is the social infrastructure for improved performance.
The current application of SARA is the front end of an event app; that is, a personal assistant that helps conference attendees achieve their goals, including introducing them to other attendees and telling them about sessions that fit their interests. Through rapport-building conversational strategies, the agent elicits the user’s interests and preferences and uses these to improve its recommendations. By estimating the current level of rapport with the user, and recognizing the conversational strategies the user has employed, the agent is able to choose the right conversational strategies to respond with.
While Sara is a personal assistant for busy conference attendees, applications of our work on socially-aware artificial intelligence have also included educational technologies, such as:
- Culturally-Aligned Virtual Peers: educational technologies in the shape of virtual children that support students in low-resourced schools for whom collaborative learning with peers, and a sense that they belong in their school environment, have been shown to be particularly important to learning gains.
- Authorable Virtual Peers: a system that allows children with high-functioning autism or Asperger’s to author behaviors for and control virtual peers as a way to acquire and practice key interactional social skills. Results have demonstrated improved social skills in subsequent peer-to-peer interaction.
- SCIPR: Educational games that rely on peer dynamics and augmented reality environments to evoke, scaffold, and preserve curiosity in an increasingly “teach-to-the-test” school paradigm.