Rapport has been identified as an important function of human interaction, but to our knowledge no dyad-level model exists of how humans and conversational agents build and maintain rapport over the long term. In this project, we leverage existing literature and our corpus of peer tutoring data to develop a framework that explains how humans in dyadic interactions build, maintain, and destroy rapport through specific conversational strategies that fulfill specific social goals and are instantiated in particular verbal and nonverbal behaviors.
This project has two thrusts: (1) developing a theory of how rapport is built, maintained, and destroyed among teens, and (2) developing a computational architecture and system implementation that allow a virtual peer to build, maintain, and, if necessary, respond to attempts to destroy rapport in the context of math tutoring.
One of the most interesting, and least understood, fields of behavioral science involves the social substructure of daily life: friendship, politeness, impoliteness, relationship formation, and rapport. We all know that feeling of “getting along” or “clicking” with someone, and the ways in which as a relationship deepens, rapport builds, but there are few comprehensive theories about what the rapport-building process is, and the mechanisms by which it takes place. And yet, it has been shown that increased rapport plays an important role in everyday life: people learn more from teachers and peers with whom they feel rapport, gain more medical benefits from doctors and therapists with whom they feel rapport, are more honest and more likely to complete surveys when they feel rapport with the interviewer, and so on. As computational devices take on an increasingly important and ubiquitous role in our lives, we believe that these devices should know how to build rapport so as to better support their users over time.
We know from research in the Learning Sciences that intelligent tutoring systems can help students learn, often vastly improving on the learning achieved in traditional, lecture-based classrooms. Furthermore, we know that when classroom peers collaborate on a learning project, those students who are friends learn more together than those students who are not friends. Finally, research has shown that peer tutoring can lead to greater learning gains for the student who does the tutoring than for the one being tutored. Combining these observations, we conclude that a computerized peer that can engage in reciprocal peer tutoring – teaching and being taught by a human – and can also develop rapport with that human, may be of great use in the classroom. Imagine a teachable agent that knows social cues well enough to say something impolite if that utterance would improve the chances that the tutor learns more. This is the system we are implementing.
While some researchers have studied “instant rapport,” we focus on long-term rapport, and the ways in which people change their behaviors as they come to know and feel deepening rapport with another person. This will allow us to build devices that truly can become a part of our lives over the long-term.
We have carried out extensive analyses of three datasets to look at the role of rapport in learning. The first, collected by Erin Walker (Walker et al., 2011), contained data on peer tutoring over a chat interface by 130 high school students. We annotated the text data for the social functions of impoliteness and positivity, and for the behaviors that might play a role in those social functions (criticisms, praise, insults, condescension, complaining, challenges, off-task behavior, and so on). Our analyses showed that negative behavior such as insults actually predicted learning gains (Ogan et al., 2012) and that both positivity and impoliteness could be automatically detected on the basis of the behaviors that make them up (Wang et al., 2012).
These results suggest that social functionality does play a role in peer tutoring, but that the nature of that social talk may not be the politeness and positivity that one might expect.
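The mapping from annotated behaviors to social functions can be illustrated with a toy aggregator. The behavior labels and scoring below are illustrative assumptions for exposition, not the project's actual annotation scheme or the trained detection model of Wang et al. (2012):

```python
from collections import Counter

# Hypothetical behavior labels; the real annotation scheme covers
# criticisms, praise, insults, condescension, and many more phenomena.
IMPOLITE = {"insult", "criticism", "condescension", "complaint"}
POSITIVE = {"praise", "encouragement", "agreement"}

def social_function_scores(behaviors):
    """Aggregate per-utterance behavior annotations for one session
    into coarse impoliteness / positivity scores (fractions of all
    annotated behaviors)."""
    counts = Counter(behaviors)
    impolite = sum(counts[b] for b in IMPOLITE)
    positive = sum(counts[b] for b in POSITIVE)
    total = max(len(behaviors), 1)  # avoid division by zero
    return {"impoliteness": impolite / total,
            "positivity": positive / total}

# One annotated session's behavior stream (invented example data)
session = ["praise", "insult", "agreement", "criticism", "praise", "off_task"]
print(social_function_scores(session))
```

In the actual work these behavior counts feed a statistical classifier rather than fixed sets, but the pipeline shape – behaviors in, social-function scores out – is the same.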
To follow up, we collected a second data set of face-to-face peer tutoring. We asked 12 dyads of high school students (half of the dyads were friends and half were strangers; half were girls and half were boys) to take turns tutoring one another in linear equations. The students came into the lab 5 times over 5 weeks. During each session both students in the dyad had the opportunity to tutor the other, with social time breaks built in between the tutoring (social time – tutoring – social time – tutoring – social time). Each session was videotaped from 3 angles so as to capture the face and torso of each individual, and a side view showing both participants. At the end of each session participants filled out a questionnaire about their rapport with and liking for the other, and at the beginning and end of the 5 weeks, the students took a test to evaluate their knowledge of linear equations.
Annotating Conversational Strategies
We have been transcribing and annotating the more than 90 hours (60 sessions) of human-human data. Based on prior literature in social psychology and communication studies, we have annotated nonverbal behaviors such as eye gaze, head nods, posture shifts, and smiles, and verbal behaviors such as insults, external vs. internal complaining, positive and negative self-disclosure, reference to shared experience, and more than 20 other phenomena. The longitudinal nature of the data, as well as the differences between friends and strangers, has allowed us to see how friends vs. strangers weather frustration, how they manage a task where one partner (the tutor) is given more power than the other, and what kinds of social support strategies enhance learning and what kinds diminish it.
This dataset has also allowed us to automatically detect friends vs. strangers based on their acoustic and nonverbal behavior (Zhou et al., 2013).
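A minimal sketch of such a dyad classifier is shown below. The feature names, weights, and threshold are assumptions chosen for illustration; they are not the actual features or model reported in Zhou et al. (2013):

```python
# Toy classifier: label a dyad as friends vs. strangers from a few
# session-level behavioral features, each normalized to [0, 1].

def classify_dyad(features, threshold=0.5):
    """features: dict with hypothetical cues such as overlap_rate
    (simultaneous speech), smile_rate, and gaze_at_partner.
    A fixed weighted sum stands in for a trained classifier."""
    score = (0.4 * features["overlap_rate"]
             + 0.3 * features["smile_rate"]
             + 0.3 * features["gaze_at_partner"])
    return "friends" if score >= threshold else "strangers"

# A dyad showing many mutual-engagement cues...
print(classify_dyad({"overlap_rate": 0.7, "smile_rate": 0.6,
                     "gaze_at_partner": 0.8}))
# ...versus one showing few.
print(classify_dyad({"overlap_rate": 0.1, "smile_rate": 0.2,
                     "gaze_at_partner": 0.3}))
```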
Finally, we used a think-aloud protocol to collect data about students tutoring a virtual agent (called a “teachable agent”) to see whether the results we obtained for human-human tutoring translated to a context where one member of the dyad was a computer. Here too, to our surprise, we found that students who insulted the agent, and students who engaged with the agent and referred to it as “you,” were more likely to learn than students who were polite, or students who referred to the agent as “she” or “it” (Ogan et al., 2012).
Based on the data analysis described above, as well as a thorough investigation into prior literature from the social sciences on the components that make up the experience of rapport, the way people assess rapport in others, and the goals and strategies people use to build, maintain, and destroy rapport, we propose a model for rapport enhancement, maintenance, and destruction in human-human and human-agent interaction. In Spencer-Oatey’s (Spencer-Oatey, 2005) perspective, each of these tasks requires management of face, which, in turn, relies on behavioral expectations and interactional goals. Our data support the tremendous importance of face, as the teens alternately praise and insult one another, all the while hedging their own positive performance on the algebra task in order to highlight the performance of the other. The data also contain numerous examples of mutual attentiveness and coordination as input into rapport management. Unlike prior work such as Tickle-Degnen’s (Tickle-Degnen & Rosenthal, 1990) and the computational work based on it, we found it difficult to code positivity independently of its role in face. Therefore, our model posits a tripartite approach to rapport management, comprising mutual attentiveness, coordination, and face management (Zhao et al., 2014).
Rapport Model (Enhancement/Maintaining)
Rapport Model (Destruction)
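The tripartite model above can be expressed as a minimal scoring sketch. The three component names follow the model, but the [0, 1] scaling and the equal weighting are assumptions for illustration, not part of the model in Zhao et al. (2014):

```python
# Sketch: combine the three components of rapport management into a
# single estimate. Equal weighting is an illustrative assumption.

def rapport_estimate(mutual_attentiveness, coordination, face_management):
    """Each argument is a [0, 1] score for one component of rapport
    management; the overall estimate is their mean."""
    components = (mutual_attentiveness, coordination, face_management)
    assert all(0.0 <= c <= 1.0 for c in components)
    return sum(components) / len(components)

print(rapport_estimate(0.8, 0.6, 0.7))
```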
Having proposed a theoretical framework for rapport management, we have also proposed a computational architecture that allows virtual agents to enhance, maintain, and destroy long-term rapport with their users. The proposed architecture is presented in the following figure, and is described in Papangelis et al. (2014).
Computational Dyadic Architecture for Rapport Management
The technical innovations represented by this architecture include its dyadic nature, meaning that updates and grounding are done by taking into account both sides of the interaction – both human and agent. While we defined rapport-management strategies above, their effect is not guaranteed (and therefore cannot be grounded) until we observe the user’s reaction. To achieve this, it is necessary to represent a dyadic state modeling what has been grounded; a model of the user, representing the system’s beliefs about the user; and a putative virtual agent state inside that user model, representing the system’s beliefs of how the user perceives it. The data structures in our architecture, derived from our theoretical model of rapport, include the dyadic state representing the current state of rapport and a user model containing information we learn during the interaction.
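The nested data structures described above can be sketched as follows. The class and field names are assumptions derived from the description (not the actual implementation), and the simple additive update of the rapport level is purely illustrative:

```python
from dataclasses import dataclass, field

@dataclass
class AgentModel:
    """Putative agent state: the system's beliefs about how the user
    currently perceives the agent (fields are hypothetical)."""
    perceived_friendliness: float = 0.5
    perceived_competence: float = 0.5

@dataclass
class UserModel:
    """The system's beliefs about the user, learned during interaction,
    with the putative agent state nested inside it."""
    shared_experiences: list = field(default_factory=list)
    putative_agent: AgentModel = field(default_factory=AgentModel)

@dataclass
class DyadicState:
    """Current grounded state of the dyad's rapport."""
    rapport_level: float = 0.0
    grounded_strategies: list = field(default_factory=list)

    def ground(self, strategy, user_reaction_positive):
        # A strategy's effect is only grounded once the user's
        # reaction to it has been observed.
        if user_reaction_positive:
            self.grounded_strategies.append(strategy)
            self.rapport_level = min(1.0, self.rapport_level + 0.1)
        else:
            self.rapport_level = max(0.0, self.rapport_level - 0.1)

state = DyadicState()
state.ground("self_disclosure", user_reaction_positive=True)
print(state.rapport_level, state.grounded_strategies)
```

The key design point carried over from the architecture is that the dyadic state updates only from observed user reactions, while the user model (and the putative agent state inside it) holds the system's unconfirmed beliefs.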
We continue to iteratively analyze the data and use the results to update our theoretical framework which, in turn, allows us to innovate the computational architecture for a rapport managing virtual peer.