RECOG-AI aims to improve AI evaluation by providing a framework and benchmarks for measuring the capabilities of AI systems.
RECOG-AI is a two-year (2021-2023) DARPA-funded project on the Robust Evaluation of Cognitive Capabilities and Generality in Artificial Intelligence. The project is split into three core work packages, described below.
The first work package is the codification of a new AI evaluation framework based on cognitively defined capabilities. This will draw on cognitive psychology and psychometrics to translate well-defined cognitive capabilities (such as “object permanence”) into skills (e.g. “find an occluded reward”), then into individual tests, and finally into varied instances of those tests (differing only in surface features). The aim is to support the robust conclusion that an individual “has” or “does not have” a capability, based on its pattern of performance across a battery of tests and instances.
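The capability → skill → test → instance hierarchy, and the idea of inferring a capability from a pattern of results, can be sketched in code. Everything below is illustrative only: the class names, the example test, and the simple proportion-based decision rule are assumptions for the sketch, not part of the RECOG-AI framework itself.

```python
from dataclasses import dataclass

# Hypothetical sketch of the capability -> skill -> test -> instance
# hierarchy. All names and the decision rule are illustrative.

@dataclass
class Test:
    name: str
    instances: list  # surface-feature variations of the same test

@dataclass
class Skill:
    name: str
    tests: list

@dataclass
class Capability:
    name: str
    skills: list

def has_capability(capability, results, threshold=0.8):
    """Decide whether an agent 'has' a capability from its pass/fail
    pattern across every instance of every test. A bare proportion
    rule is used here; the real framework would need a more robust
    psychometric criterion."""
    outcomes = [results[(test.name, i)]
                for skill in capability.skills
                for test in skill.tests
                for i, _ in enumerate(test.instances)]
    return sum(outcomes) / len(outcomes) >= threshold

# Example: "object permanence" grounded in the skill
# "find an occluded reward", tested across three instance variants.
occlusion = Test("occluded-reward", instances=["wall", "ramp", "box"])
perm = Capability("object permanence",
                  skills=[Skill("find an occluded reward",
                                tests=[occlusion])])
results = {("occluded-reward", 0): True,
           ("occluded-reward", 1): True,
           ("occluded-reward", 2): False}
print(has_capability(perm, results, threshold=0.6))  # → True
```

Note that the same result pattern fails a stricter threshold (2/3 of instances passed is below 0.8), which is the point of demanding consistency across a whole battery rather than success on any single test.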
The second is a protocol to assimilate existing AI benchmarks into the framework, so that performance on these established assessments can be interpreted in terms of capabilities, giving a clearer cognitive picture of existing AI progress. This will require understanding the relationship between the actual and the intended requirements of these benchmarks, and how those requirements relate to the skills afforded by specific capabilities.
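One way to picture the assimilation protocol is as an annotation of existing benchmark tasks with the capabilities they actually require, after which per-task scores can be reinterpreted as per-capability evidence. The sketch below assumes this shape; the task names, capability labels, and the simple averaging step are all invented for illustration.

```python
# Hypothetical annotation: which capabilities each existing benchmark
# task actually requires (names are illustrative, not real benchmarks).
benchmark_requirements = {
    "maze-navigation": {"spatial memory", "goal-directed navigation"},
    "delayed-reward":  {"object permanence", "spatial memory"},
}

def capability_profile(task_scores):
    """Reinterpret per-task benchmark scores as per-capability evidence
    by averaging each capability's scores over the tasks requiring it."""
    evidence = {}
    for task, score in task_scores.items():
        for cap in benchmark_requirements[task]:
            evidence.setdefault(cap, []).append(score)
    return {cap: sum(scores) / len(scores)
            for cap, scores in evidence.items()}

profile = capability_profile({"maze-navigation": 0.9,
                              "delayed-reward": 0.5})
print(profile["spatial memory"])  # → 0.7
```

The hard part the work package addresses is precisely the annotation step: a benchmark's intended requirements often differ from what it actually measures, so the mapping cannot simply be read off the benchmark's documentation.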
The third is the design of an innovative benchmark for evaluating cognition in AI within a 3D environment. This will be based on the existing Animal-AI benchmark, which will be adapted to incorporate the new features required by the evaluation framework designed in the rest of the project. This will require the design of well-controlled cognitive tests and the programming of new functionality and environments in Unity. Details of the Animal-AI platform and its latest developments can be found at https://github.com/mdcrosby/animal-ai. You can learn about, play, and watch AI performance on the previous version at http://animalaiolympics.com/AAI/.
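A well-controlled test in such an environment means holding the cognitive requirement fixed while systematically varying everything task-irrelevant. As a minimal sketch, instances of one test can be generated as the cross-product of surface-feature settings; the feature names and values below are invented for illustration and are not the Animal-AI configuration schema.

```python
import itertools

# Hypothetical task-irrelevant surface features for a single test
# (retrieve a reward hidden behind an occluder). Illustrative only.
SURFACE_FEATURES = {
    "wall_colour":   ["grey", "blue", "green"],
    "arena_size":    [20, 40],
    "occluder_side": ["left", "right"],
}

def generate_instances():
    """Enumerate the full cross-product of surface-feature settings,
    yielding one instance specification per combination."""
    keys = list(SURFACE_FEATURES)
    for values in itertools.product(*(SURFACE_FEATURES[k] for k in keys)):
        yield dict(zip(keys, values))

instances = list(generate_instances())
print(len(instances))  # → 12
```

An agent that only succeeds under particular surface conditions (say, one wall colour) has learned those surface features rather than the underlying skill, which is exactly what testing across all instances is meant to expose.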
The outcome of the project will be a new, robust evaluation framework for tracking AI progress with metrics better suited to communicating the capabilities of AI systems. This has important practical applications for AI ethics and governance, and for tracking the rate at which AI advances may translate into new capabilities for deployed AI systems. The project will also deliver a public testing platform designed to drive progress on the key cognitive abilities that are still missing from even the most advanced AI systems.