Individual and team profiling to support theory of mind in artificial social intelligence
The experiment reported here was pre-registered on the Open Science Framework28, and the data collected in this study and used in the analyses presented here have been made publicly available (data available here:29). The experiment manipulated the presence of a teammate serving in an advisory role such that some USAR teams had no advisor, others had a human advisor, and the remainder were paired with one artificial social intelligence (ASI) agent. At the ASIST programmatic level, the purpose was to test the effectiveness of six different ASI agents, developed by independent research teams as part of a research program testing the effectiveness of social-cognitive architectures developed for teams25. Because we were more interested in understanding how profiles are related to human-agent teaming, we collapsed across the six agents for our analyses, combining them into a single group of ASI-advised teams. We discuss agent capabilities and constraints in a later section, but overall, the ASI advisors were designed to closely monitor the actions and communications of the teams with which they operated. Based upon inferences derived from the agents’ AToM, developed by monitoring their human teammates, the agents provided advice and interventions to improve the team’s process and achieve successful outcomes.
Simulated USAR missions
Teams had to complete two simulated Urban Search and Rescue (USAR) missions executed in a testbed built using the Minecraft game environment (see29 for a full description of the testbed). The testbed was designed to support virtual collaboration, allowing for remote experimentation during the pandemic. As such, teams were not collocated, and experiments were coordinated by remotely connecting participants to the testbed along with virtual meeting software allowing for synchronous communication and video of each team member. Missions were completed in a fixed order, with each mission featuring a rescue in a partially collapsed building and a different perturbation: in the first mission, teams experienced new rubble falling into the search area, blocking some areas, and, in the second mission, teams experienced a disruption (or black-out) of a shared map included to support coordination. Each of the 113 teams was assigned to one of eight (between-team) conditions, with 15 teams in the No Advisor condition, 14 teams in the Human Advisor condition, and 14 teams in each of the remaining six ASI Advisor conditions (i.e., 84 total teams in the ASI advisor condition). Each of the three team members was randomly assigned a role, such that the role assignment order for each team was fixed and roles were distributed in order of connecting to the testbed. The three roles in this study were Medic, Engineer, and Transporter, and each role had unique capabilities, tools, and knowledge related to possible locations of victims. Teammates communicated with each other using virtual meeting voice communications as well as through what we call knowledge externalization tools.
These tools were designed to be scaffolding for coordinative communications; that is, testbed tools that afforded externalization of cognition30 via physical markup inside the virtual building and reflected on a mini-map layout of the USAR environment31 (e.g., blocks marked with pertinent information and placed in front of a room in the building).
Medic role
One individual on each team was assigned the role of Medic, which involved using a device to triage the building-collapse victims distributed throughout the environment to determine what kind of injuries they had. The Medic was the only role that could acquire information about whether a victim was type A (abrasions), B (bone damage), or C (critical). After the injuries of a given victim were identified, the Medic could then stabilize that victim in preparation for transport. It is relevant to note that victims could be transported before they were stabilized, and that transporting victims was a capacity of all roles on the team; however, the nature of a given victim’s injuries dictated their evacuation point, so the effectiveness with which the Medic acquired and shared this information was foundational to team success as well as efficiency of coordination. Additionally, critical victims required assistance from team members to heal/stabilize, which is discussed in more detail at the end of this section.
Engineer role
A separate individual on each team was assigned the role of Engineer. Engineers had the slowest base movement speed on the team and were able to break rubble blocks. Clearing rubble was a critical task for mission success because it could open new paths or reveal that a victim was trapped within a pile of rubble. Engineers were also uniquely provided with information about the structural stability of rooms in the USAR environment, and they were shown the locations of ‘threat rooms’ (rooms where rubble was likely to fall and trap teammates) on the shared map. The effectiveness with which they shared this information was important for supporting the risk management process of the team.
Transporter role
The final individual on each of the three-person teams was assigned to the Transporter role which had the fastest base movement speed on the team, and provided participants with the ability to detect at a distance whether there was a victim inside of a room in the environment. The Transporter’s effectiveness at searching the environment for victims and communicating those locations to their team was vital to the team’s ability to coordinate triage, stabilizing, and evacuation work as well as to organize the interdependent team tasks.
As noted above, all of the roles were able to pick up the victims, carry them, and set them down in the environment, but a victim would only count as “evacuated” for the team if that victim was both successfully stabilized and transported to the correct evacuation zone for that victim type (A, B, or C). Additionally, each participant was provided with the same set of knowledge externalization and communication tools that displayed various symbols and could be placed on the virtual floor (as well as removed) such that teammates could see them in the environment as well as view them on a shared mini-map. The semantic meaning of each marker block was as follows: Victim type A, Victim B, No Victim Here, Critical Victim, Regular Victim, Threat Room, Rubble, and Help Me Here. These allowed teammates to quickly and clearly communicate their task needs and organize interdependent tasking as well as backup behaviors.
Teams were also challenged with a shared, interdependent joint task that required two teammates to work together to stabilize critical victims. Although the Medic role was still required to stabilize a critical victim, an additional teammate was required to be in proximity to the victim in order for that stabilization to be successful. If that teammate walked away from the victim or was not close enough, the Medic’s stabilization action would fail.
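The evacuation and joint-stabilization rules described above can be summarized in a minimal sketch. This is only an illustration of the rules as stated in the text, not the testbed's implementation: the `Victim` class, the function names, and in particular the `radius` proximity threshold are hypothetical, since the text does not specify how close the assisting teammate had to be.

```python
from dataclasses import dataclass


@dataclass
class Victim:
    victim_type: str          # "A", "B", or "C" (critical)
    stabilized: bool = False


def can_stabilize(victim, actor_role, teammate_distances, radius=2.0):
    """Return True if a stabilization attempt would succeed.

    `radius` is a hypothetical proximity threshold; the source only says
    a teammate must be "in proximity" for critical victims.
    """
    if actor_role != "Medic":
        return False  # only the Medic can stabilize
    if victim.victim_type == "C":
        # Critical victims need at least one other teammate nearby.
        return any(d <= radius for d in teammate_distances)
    return True


def counts_as_evacuated(victim, zone):
    """A victim scores only if stabilized AND dropped in the matching zone."""
    return victim.stabilized and zone == victim.victim_type


critical = Victim("C")
alone = can_stabilize(critical, "Medic", teammate_distances=[9.5, 14.0])
assisted = can_stabilize(critical, "Medic", teammate_distances=[1.0, 14.0])
critical.stabilized = assisted
```

Under these assumptions, a victim carried to the correct zone before being stabilized would not count toward the team's score, which reflects why the Medic's information sharing mattered for coordination.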
Before completing the experimental trials, participants responded to surveys and measures that captured participant individual differences and current dispositions, and after the missions they completed measures regarding their team’s success, perceptions of team process, and ratings of their team’s advisor (in conditions that included an advisor). The surveys relevant to this manuscript are described in more detail below.
Materials and measures
Individual player profiles
Player Profiles are based on a six-component model that is constructed from psychometric, psychographic, and skill elicitation measures that tap individuals’ taskwork-related and teamwork-related potential. Related to what was described earlier, this distinction is based on team theory to differentiate the varied competencies associated with completing a task and those needed to collaborate effectively. The combined model of the player profile includes six components with three tapping taskwork potential and three tapping teamwork potential. This integrated approach attempts to bridge the gap between traditional approaches to understanding/facilitating human behavior and modern methods for implementing artificial agents.
In this study, taskwork potential refers to a set of largely task-generic competencies related to performing in a virtual world. They capture one’s ability to navigate and recall pathing, comfort/familiarity with task completion in game-based environments, and task execution in the custom Minecraft testbed; the three components used in constructing the task potential part of the model were intended to capture these facets. Ability to navigate and recall pathing was captured using the Santa Barbara Sense of Direction scale (SSOD), a validated measure of spatial navigation32 that is predictive of the ability to successfully learn and navigate both real and virtual environments33. Comfort and familiarity with task completion in game-based environments was captured using a Video Game Experience Measure (VGE; see Appendix B of the Study 3 Preregistration in:29), which targeted video-gaming-specific experience and skills related to Minecraft and the USAR gamified task. The third component of the task potential part of the model was more task specific. It was captured through a timed, in-game Competency Test (Comp). Although somewhat task specific, the behaviors were fairly generic in the Minecraft game environment; that is, this was a behavioral measure in which each player had to individually complete a task battery requiring them to execute the essential game actions needed to complete the task in the Minecraft testbed (e.g., breaking walls).
The other half of the player profile model, the teamwork potential profile, consists of a set of team generic competencies related to collaboration and interpersonal competencies. This was built to capture an individual’s ability to discern mental states/emotions, group interaction tendencies, and collective engagement and grouping behavior. Ability to discern mental states/emotions was captured using the Reading the Mind in the Eyes Test (RMET). This is a validated measure designed initially to detect subtle deficits in ToM in adults with high-functioning autism, and has also been related to neurotypicals’ ability to make mental state attributions34. Interaction tendencies were captured through the Sociable Dominance scale (SD), a validated measure of sociable dominance in individuals that can predict social interactions; for example, it has been found that individuals high in sociable dominance tend to use reasoning and direct communication strategies with others35. Finally, collective engagement and grouping behaviors were captured using the Psychological Collectivism scale (Collectivism), a validated measure of attitudes individuals have about working in groups, including preferences for being in a group, concern for the group, and whether they tend to comply with group norms and rules36.
Individuals were categorized as high or low in teamwork potential and in taskwork potential based on the combination of their scores on the measures related to that part of the model (see Fig. 1). Specifically, if an individual scored high on two out of the three measures they would be classified as high, and if they scored low on two out of the three measures they would be classified as low. Possible profile groups included: low taskwork – low teamwork, high taskwork – low teamwork, low taskwork – high teamwork, and high taskwork – high teamwork potential. To provide a concrete example, an individual who scored above the sample median on the Video Game Experience measure and above the median on the SSOD would be classified as “high” on task potential. If that individual also scored lower than the median on Sociable Dominance and lower than the median on Reading the Mind in the Eyes, they would be classified as “low” on team potential (see Fig. 2 for examples). Thus, their combined profile would be high taskwork – low teamwork potential.
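The two-out-of-three rule can be illustrated with a short sketch. It rests on several assumptions that are not stated in the text: every measure is oriented so that larger values are better, a score exactly at the sample median counts as low, and the scores shown are invented purely for illustration. The measure keys and function names are likewise hypothetical.

```python
from statistics import median

TASKWORK = ("ssod", "vge", "comp")            # taskwork potential measures
TEAMWORK = ("rmet", "sd", "collectivism")     # teamwork potential measures


def sample_medians(players):
    """Median of each measure across the sample."""
    return {m: median(p[m] for p in players) for m in TASKWORK + TEAMWORK}


def classify(scores, medians, measures):
    """'high' if at least two of the three measures exceed the sample median."""
    n_high = sum(scores[m] > medians[m] for m in measures)
    return "high" if n_high >= 2 else "low"


def player_profile(scores, medians):
    """Return (taskwork level, teamwork level) for one player."""
    return (classify(scores, medians, TASKWORK),
            classify(scores, medians, TEAMWORK))


# Invented scores for three players (illustration only).
players = [
    {"ssod": 5.0, "vge": 40, "comp": 0.6, "rmet": 20, "sd": 2.0, "collectivism": 4.5},
    {"ssod": 3.0, "vge": 20, "comp": 0.5, "rmet": 30, "sd": 4.0, "collectivism": 3.0},
    {"ssod": 4.0, "vge": 30, "comp": 0.7, "rmet": 25, "sd": 3.0, "collectivism": 4.0},
]
medians = sample_medians(players)
profiles = [player_profile(p, medians) for p in players]
```

The first invented player mirrors the concrete example in the text: above the median on VGE and SSOD, below it on Sociable Dominance and RMET, yielding a high taskwork – low teamwork profile.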
Team holistic profile formulation
Team profiles were constructed from the combination of individual team member profiles through modal analysis (i.e., the most prevalent taskwork and teamwork potential profiles determined the overall team profile). This allowed us to categorize each entire team as low or high on taskwork potential and low or high on teamwork potential in a similar manner to the player classifications. For example, if a team consisted of one player categorized as high taskwork – low teamwork, a second categorized as low taskwork – low teamwork, and a third categorized as high taskwork – high teamwork, the team would be collectively categorized as high taskwork (2 of 3 representatives) – low teamwork (2 of 3 representatives). Further, to provide more insight into the predictiveness of the profiling technique, we additionally provide analyses focused on the far ends of the classification spectrum (i.e., teams that were classified as low taskwork – low teamwork or as high taskwork – high teamwork). These groups were selected for additional analysis because this approach to holistically profiling teams is novel, and we hypothesize that there may be interactions between the teamwork and taskwork profiles that would make interpretation of the low taskwork – high teamwork and high taskwork – low teamwork teams unclear at best.
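The modal aggregation can be sketched as follows, using the worked example from the paragraph above. This is a minimal illustration assuming each team has complete profiles for all members; the function names are hypothetical, and the handling of ties (which cannot occur with three binary-classified members) is an assumption rather than something the text specifies.

```python
from collections import Counter


def modal_level(levels):
    """Most prevalent level ('high' or 'low') among team members."""
    (top, n), *rest = Counter(levels).most_common()
    if rest and rest[0][1] == n:
        # Assumption: ties (possible only with an even number of profiles)
        # are treated as unclassifiable rather than resolved arbitrarily.
        raise ValueError("no unique modal level")
    return top


def team_profile(member_profiles):
    """Combine (taskwork, teamwork) player profiles into one team profile."""
    taskwork = modal_level([p[0] for p in member_profiles])
    teamwork = modal_level([p[1] for p in member_profiles])
    return (taskwork, teamwork)


def is_extreme(profile):
    """True for low-low or high-high teams, the far ends of the spectrum."""
    return profile[0] == profile[1]


# Worked example from the text: the team is classified high taskwork, low teamwork.
team = [("high", "low"), ("low", "low"), ("high", "high")]
result = team_profile(team)
```

Here `is_extreme` marks the low taskwork – low teamwork and high taskwork – high teamwork teams that receive the additional focused analyses.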
Artificial social intelligence agents overview
The six ASI agents in this study were individually developed by ASIST program performers/teams, and were instantiated with a performer-defined AToM, implemented as computational models of team attributes, teamwork processes, and their impact on effects in teams (see29 for these descriptions in the study preregistration). The ASI agents acted in an advisory role to support their three human team members with teaming behaviors and engaging in teamwork in the experimental tasks. The agents were required to adhere to certain constraints to maintain the primary goal of supporting teamwork processes rather than taskwork; to this end, the agents were not embodied in the virtual Minecraft-based simulated USAR testbed and were not able, or allowed, to engage in any of the taskwork in this study. The ASI were also constrained with respect to their knowledge of the environment so that their knowledge would be comparable to what could realistically be available in the real world. Specifically, they were not given omniscient knowledge of where every victim was, what the best route to a particular location would be, or where threats (e.g., risk of further building collapse) may exist. However, the ASI were able to observe the actions the human team members took in the virtual environment, see the externalized cognitive artifacts (e.g., marker blocks), and see the field of view for each team member, allowing them to perceive what each person had seen or encountered in the environment. This allowed the ASI agents to make inferences about the human team members’ beliefs, intentions, goals, and knowledge based on the observable actions taken in the environment. The different agents used various approaches to this, some utilizing Bayesian approaches to model human perspectives, short-term and long-term planning, and aspects of human cognition such as workload or emotion, and to track multiple hypotheses37.
Other agents used internal models that estimated participant knowledge of relevant known entities spatially through a 2D representation of the experimental environment, using neural network prediction models developed on human-annotated and simulated player data to make inferences and predictions based on accumulating knowledge of, and modeling, each individual team member over time and the team as a whole38. Additionally, the agents were provided access to all surveys taken by participants before completing the experimental task, as well as various analytic components (ACs) that were developed independently by performer teams. To varying degrees, the ACs were developed based upon team theory and designed to augment the ASI architecture. As described, our AC was developed to help the ASI understand player profiles. Other research teams based their ACs on other bodies of work, such as leadership theory. For example, one AC used pre-experiment surveys and in-game data to identify emergent leadership in human players39. This could then be used by ASI to determine where to direct leader-relevant interventions.
The player profiles we described above were implemented in the testbed as an analytic component with the intent to provide machine-readable input about the players, and the team as a whole, through the quantified profile model. The player profile analytic component read the survey and gameplay data used in the model, calculated the profile for each individual and the team overall, and published the player profile model’s output to the message bus used by the agent to receive testbed, survey, and component data. The player profile models were developed during prior program studies (see25,40) before the inclusion of the ASI agents in the present study. The profiles afforded the ASI the ability to interpret the measures and gameplay data used in the profile construction in the context of the theoretical grounding used to develop the profiles over the course of the ASIST program. The ASI agents integrated the player profile data into their internal models to help inform their predictions, allowing the ASI to incorporate an individual’s potential capacity for teaming and tasking behaviors into their existing models and calculations. Specification of the technical integration of the player profile models, or any analytic component used in this study, into the various ASI agents’ AToM and cognitive architectures is beyond the scope of this paper. Interested readers are referred to the publicly available dataset, which also contains all of the code and documentation for the agents used in this study, and to the documentation of the player profile analytic component (see27). Additionally, this study did not manipulate the provision of the player profiles or any other information to the ASIs as a variable. All ASI agents were provided the same access to the testbed, survey, and analytic component output data.
Thus, because we were not able to manipulate the provision of profiles within agents and teams, we are unable to comment on the particular impact the profiles had on agent interventions and determinations provided to teams. Rather, our analyses focus on the player profile models’ predictive power with regard to the experimental task measures and perceptions of the ASI. This allows us to comment on the utility of the a priori information being provided to actual artificial social intelligence agents through the player profile model, and whether the profiles provided the ASI with data that is indicative of overall and specific task performance.
USAR task metrics
Performance on the USAR task was tracked along multiple dimensions. Each of these measures is considered to reflect successful taskwork, successful teamwork, or a combination of taskwork and teamwork.
At the individual level, each role that participants could be assigned was associated with a set of unique or optimal taskwork functions that they could perform as well as a set of teamwork functions in which they could engage to support coordination. Given the nature of the testbed, measures of teamwork were relatively difficult to track because, as mentioned above, human teammates communicated through voice comms, and the natural language associated with those exchanges has not yet been fully processed and analyzed by our team. Accordingly, the measures we employ for this article often are taskwork focused or have taskwork woven into their execution, but several either have teamwork components or are entirely reflective of teamwork actions (see Table 1).
Study sample
This study involved 113 three-person teams completing two 17-minute gamified urban search and rescue missions implemented in Minecraft. All participants engaged in the study remotely from an internet connected computer of their choice and were overseen by experimenters at Arizona State University (see the Study 3 Preregistration in: 29) who carried out the methods in accordance with the approved protocol, and relevant guidelines and regulations. All participants in this study reviewed and completed an informed consent form prior to participation. The study was reviewed and approved by the Arizona State University Institutional Review Board. The data sharing agreement and approval for data analysis by researchers at the University of Central Florida were overseen, reviewed, and approved by the University of Central Florida Institutional Review Board. All local approvals were further submitted to the Army Human Research Protections Office (AHRPO) for supplemental review.
Analysis populations
The analyses reported here relate to multiple different groups of participants as a function of the focus on individual versus team outcomes, the availability of complete data for player profiling, the availability of complete data for team profiling, and the exclusion of groups to support the inspection of outcomes related to ASI advisors specifically. More detailed information on this study and the data repository can be found at28 and29, respectively. See Table 2 below for demographic descriptives for each subset of the data.