Sally–Anne test

The Sally–Anne test is a psychological test originally conceived by Daniel Dennett, used in developmental psychology to measure a person's social cognitive ability to attribute false beliefs to others.^[1] Based on the earlier ground-breaking study by Wimmer and Perner (1983),^[2] the Sally–Anne test was so named by Simon Baron-Cohen, Alan M. Leslie, and Uta Frith (1985), who developed the test further.^[3] In 1988, Leslie and Frith repeated the experiment with human actors (rather than dolls) and found similar results.^[4]

Test description

To develop an efficacious test, Baron-Cohen et al. modified the puppet play paradigm of Wimmer and Perner (1983), in which puppets represent tangible characters in a story, rather than hypothetical characters of pure storytelling.

In the test, after the dolls are introduced, the child is asked a control question that requires recalling the dolls' names (the Naming Question). A short skit is then enacted: Sally takes a marble and hides it in her basket. She then "leaves" the room and goes for a walk. While she is away, Anne takes the marble out of Sally's basket and puts it in her own box. Sally is then reintroduced and the child is asked the key question, the Belief Question: "Where will Sally look for her marble?"^[3]

In the Baron-Cohen, Leslie, and Frith study of theory of mind in autism, 61 children were tested with "Sally" and "Anne": 20 diagnosed as autistic under established criteria, 14 with Down syndrome, and 27 determined to be clinically unimpaired.^[3]

Outcomes

For a participant to pass this test, they must answer the Belief Question correctly by indicating that Sally believes that the marble is in her own basket. This answer is consistent with Sally's perspective, but not with the participant's own. If the participant cannot take an alternative perspective, they will indicate that Sally has cause to believe, as the participant does, that the marble has moved. Passing the test is thus seen as evidence that the participant understands that Sally has her own beliefs, which may not correspond to reality; this is the core requirement of theory of mind.^[5]

In the Baron-Cohen et al. (1985) study, 23 of the 27 clinically unimpaired children (85%) and 12 of the 14 children with Down syndrome (86%) answered the Belief Question correctly. However, only four of the 20 children with autism (20%) answered correctly. Overall, children under the age of four, along with most of the (older) autistic children, answered the Belief Question with "Anne's box", seemingly unaware that Sally does not know her marble has been moved.^[3]
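
The reported percentages follow directly from the counts given for each group. As a quick arithmetic check, here is a minimal Python sketch; the dictionary keys and variable names are illustrative and do not come from the study.

```python
# Belief Question results as (passed, total) per group, taken from the
# counts reported above for Baron-Cohen et al. (1985).
# The group labels and variable names are illustrative assumptions.
results = {
    "clinically unimpaired": (23, 27),
    "Down syndrome": (12, 14),
    "autistic": (4, 20),
}

# Pass rate per group, rounded to the nearest whole percent.
pass_rates = {
    group: round(100 * passed / total)
    for group, (passed, total) in results.items()
}

# The group sizes should add up to the 61 children in the study.
total_children = sum(total for _, total in results.values())

for group, rate in pass_rates.items():
    print(f"{group}: {rate}%")
print("total children:", total_children)
```

The rounded rates come out to 85%, 86%, and 20%, and the group sizes sum to 61, matching the figures in the text.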

Criticism

While Baron-Cohen et al.'s data have been taken to indicate a lack of theory of mind in autistic children, other factors may have affected the results. For instance, autistic individuals may pass the cognitively simpler recall task, but language issues in both autistic children and deaf controls tend to confound results.^[6]

Ruffman, Garnham, and Rideout (2001) further investigated links between the Sally–Anne test and autism in terms of eye gaze as a social communicative function. They added a third possible location for the marble: the pocket of the investigator. When autistic children and children with moderate learning disabilities were tested in this format, they found that both groups answered the Belief Question equally well; however, participants with moderate learning disabilities reliably looked at the correct location of the marble, while autistic participants did not, even if the autistic participant answered the question correctly.^[7] These results may be an expression of the social deficits relevant to autism.

Tager-Flusberg (2007) states that in spite of the empirical findings with the Sally–Anne task, there is growing uncertainty among scientists about the importance of the underlying theory-of-mind hypothesis of autism. In all studies that have been done, some children with autism pass false-belief tasks such as Sally–Anne.^[8]

In other hominids

Eye tracking of chimpanzees, bonobos, and orangutans suggests that all three anticipate the false beliefs of a subject in a King Kong suit, and pass the Sally–Anne test.^[9]^[10]

Artificial intelligence

Artificial intelligence and computational cognitive science researchers have long attempted to computationally model humans' ability to reason about the (false) beliefs of others in tasks like the Sally–Anne test. Many approaches have been taken to replicate this ability in computers, including neural network approaches,^[11] epistemic plan recognition,^[12] and Bayesian theory-of-mind.^[13] These approaches typically model agents as rationally selecting actions based on their beliefs and desires, which can be used either to predict their future actions (as in the Sally–Anne test) or to infer their current beliefs and desires. In constrained settings, these models are able to reproduce human-like behavior on tasks similar to the Sally–Anne test, provided that the tasks are represented in a machine-readable format.
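
To illustrate what a machine-readable representation of the task might look like, the following is a toy Python sketch of belief tracking. It is not an implementation of any of the cited approaches; the `Agent` and `World` classes and their method names are hypothetical. It assumes only that an agent updates its belief about the marble when it is present during a move, and that it searches where it believes the marble to be.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Agent:
    """Hypothetical agent whose belief updates only while it is present."""
    name: str
    present: bool = True
    believed_location: Optional[str] = None


@dataclass
class World:
    """Hypothetical world state: the true marble location plus the agents."""
    marble_location: Optional[str] = None
    agents: Dict[str, Agent] = field(default_factory=dict)

    def add_agent(self, name: str) -> None:
        self.agents[name] = Agent(name)

    def move_marble(self, location: str) -> None:
        # The true state always changes; only present agents observe it.
        self.marble_location = location
        for agent in self.agents.values():
            if agent.present:
                agent.believed_location = location

    def leave(self, name: str) -> None:
        self.agents[name].present = False

    def enter(self, name: str) -> None:
        # Returning does not reveal where the hidden marble is now.
        self.agents[name].present = True

    def predict_search(self, name: str) -> Optional[str]:
        # Assumption: an agent looks where it believes the marble is.
        return self.agents[name].believed_location


world = World()
world.add_agent("Sally")
world.add_agent("Anne")
world.move_marble("Sally's basket")  # Sally hides the marble; both see it
world.leave("Sally")                 # Sally goes for a walk
world.move_marble("Anne's box")      # Anne moves it; Sally does not see this
world.enter("Sally")

predicted = world.predict_search("Sally")
print("Sally will look in:", predicted)
print("Marble is actually in:", world.marble_location)
```

In this sketch the predicted search location for Sally is her own basket, while the marble is actually in Anne's box; keeping the agent's belief separate from the true state is what allows the two to diverge.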

On March 22, 2023, a research team from Microsoft released a paper reporting that the LLM-based AI system GPT-4 could pass an instance of the Sally–Anne test, which the authors interpret as "suggest[ing] that GPT-4 has a very advanced level of theory of mind."^[14] However, the generality of this finding has been disputed by several other papers, which indicate that GPT-4's ability to reason about the beliefs of other agents remains limited (59% accuracy on the ToMi benchmark),^[15] and is not robust to "adversarial" changes to the Sally–Anne test that humans flexibly handle.^[16]^[17] While some authors argue that the performance of GPT-4 on Sally–Anne-like tasks can be increased to 100% via improved prompting strategies,^[18] this approach appears to improve accuracy to only 73% on the larger ToMi dataset.^[16] In related work, researchers have found that LLMs do not exhibit human-like intuitions about the goals that other agents reach for,^[19] and that they do not reliably produce graded inferences about the goals of other agents from observed actions.^[20] The degree to which LLMs such as GPT-4 can perform social reasoning thus remains an active area of research.

References

[1] Wimmer, Heinz; Perner, Josef (January 1983). "Beliefs about beliefs: Representation and constraining function of wrong beliefs in young children's understanding of deception". Cognition. 13 (1): 103–128. doi:10.1016/0010-0277(83)90004-5. PMID 6681741. S2CID 17014009.

[2] Wimmer, Heinz; Perner, Josef (1983). "Beliefs about beliefs: Representation and constraining function of wrong beliefs in young children's understanding of deception". Cognition. 13 (1): 103–128. doi:10.1016/0010-0277(83)90004-5. PMID 6681741.

[3] Baron-Cohen, Simon; Leslie, Alan M.; Frith, Uta (October 1985). "Does the autistic child have a "theory of mind"?". Cognition. 21 (1): 37–46. doi:10.1016/0010-0277(85)90022-8. PMID 2934210. S2CID 14955234.

[4] Leslie, Alan M.; Frith, Uta (November 1988). "Autistic children's understanding of seeing, knowing and believing". British Journal of Developmental Psychology. 6 (4): 315–324. doi:10.1111/j.2044-835X.1988.tb01104.x.

[5] Premack, David; Woodruff, Guy (December 1978). "Does the chimpanzee have a theory of mind?". Behavioral and Brain Sciences. 1 (4): 515–526. doi:10.1017/S0140525X00076512.

[6] "Autism and Theory of Mind: A Theory in Transition". www.jeramyt.org. Retrieved 9 October 2016.

[7] Ruffman, Ted; Garnham, Wendy; Rideout, Paul (November 2001). "Social understanding in autism: eye gaze as a measure of core insights". Journal of Child Psychology and Psychiatry. 42 (8): 1083–1094. doi:10.1111/1469-7610.00807. PMID 11806690.

[8] Tager-Flusberg, Helen (December 2007). "Evaluating the theory-of-mind hypothesis of autism". Current Directions in Psychological Science. 16 (6): 311–315. doi:10.1111/j.1467-8721.2007.00527.x. S2CID 16474678.

[9] Krupenye, Christopher; Kano, Fumihiro; Hirata, Satoshi; Call, Josep; Tomasello, Michael (2016-10-07). "Great apes anticipate that other individuals will act according to false beliefs". Science. 354 (6308): 110–114. Bibcode:2016Sci...354..110K. doi:10.1126/science.aaf8110. hdl:10161/13632. ISSN 0036-8075. PMID 27846501.

[10] Devlin, Hannah (2016-10-06). "Apes can guess what others are thinking - just like humans, study finds". The Guardian. ISSN 0261-3077. Retrieved 2016-10-09.

[11] Rabinowitz, Neil; Perbet, Frank; Song, Francis; Zhang, Chiyuan; Eslami, S. M. Ali; Botvinick, Matthew (2018-07-03). "Machine Theory of Mind". Proceedings of the 35th International Conference on Machine Learning. PMLR: 4218–4227.

[12] Shvo, Maayan; Klassen, Toryn Q.; Sohrabi, Shirin; McIlraith, Sheila A. (2020-05-13). "Epistemic Plan Recognition". Proceedings of the 19th International Conference on Autonomous Agents and MultiAgent Systems. AAMAS '20. Richland, SC: International Foundation for Autonomous Agents and Multiagent Systems: 1251–1259. ISBN 978-1-4503-7518-4.

[13] Baker, Chris L.; Jara-Ettinger, Julian; Saxe, Rebecca; Tenenbaum, Joshua B. (2017-03-13). "Rational quantitative attribution of beliefs, desires and percepts in human mentalizing". Nature Human Behaviour. 1 (4): 1–10. doi:10.1038/s41562-017-0064. ISSN 2397-3374. S2CID 3338320.

[14] Bubeck, Sébastien; Chandrasekaran, Varun; Eldan, Ronen; Gehrke, Johannes; Horvitz, Eric; Kamar, Ece; Lee, Peter; Lee, Yin Tat; Li, Yuanzhi; Lundberg, Scott; Nori, Harsha; Palangi, Hamid; Ribeiro, Marco Tulio; Zhang, Yi (2023). "Sparks of Artificial General Intelligence: Early experiments with GPT-4". arXiv:2303.12712v5 [cs.CL].

[15] Sap, Maarten; LeBras, Ronan; Fried, Daniel; Choi, Yejin (2022). "Neural Theory-of-Mind? On the Limits of Social Intelligence in Large LMs". arXiv:2210.13312 [cs.CL].

[16] Shapira, Natalie; Levy, Mosh; Alavi, Seyed Hossein; Zhou, Xuhui; Choi, Yejin; Goldberg, Yoav; Sap, Maarten; Shwartz, Vered (2023). "Clever Hans or Neural Theory of Mind? Stress Testing Social Reasoning in Large Language Models". arXiv:2305.14763 [cs.CL].

[17] Ullman, Tomer (2023). "Large Language Models Fail on Trivial Alterations to Theory-of-Mind Tasks". arXiv:2302.08399 [cs.AI].

[18] Moghaddam, Shima Rahimi; Honey, Christopher J. (2023). "Boosting Theory-of-Mind Performance in Large Language Models via Prompting". arXiv:2304.11490 [cs.AI].

[19] Ruis, Laura; Findeis, Arduin; Bradley, Herbie; Rahmani, Hossein A.; Choe, Kyoung Whan; Grefenstette, Edward; Rocktäschel, Tim (2023-06-29). "Do LLMs selectively encode the goal of an agent's reach?".

[20] Ying, Lance; Collins, Katherine M.; Wei, Megan; Zhang, Cedegao E.; Zhi-Xuan, Tan; Weller, Adrian; Tenenbaum, Joshua B.; Wong, Lionel (2023). "The Neuro-Symbolic Inverse Planning Engine (NIPE): Modeling Probabilistic Social Inferences from Linguistic Inputs". arXiv:2306.14325 [cs.AI].
