Andrey Breslav, August 29, 2023

todo

Goals:

Every frontier model undergoes evaluation.

Models that fail the safety criteria are not deployed.

Evaluation techniques are systematically improved.

Overall approach:

Roles: