Value Learning Problem


Autonomous AI systems’ programmed goals can easily fall short of programmers’ intentions. We discuss early ideas on how one might design smarter-than-human AI systems that can inductively learn what to value from labeled training data.
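The mention of inductively learning what to value from labeled training data can be made concrete with a toy example. The sketch below is illustrative only: the feature names, labels, and model are hypothetical and far simpler than anything a smarter-than-human system would involve. It fits a small logistic value model to outcomes labeled with human approval, then ranks candidate actions by predicted value.

import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_value_model(examples, epochs=500, lr=0.5):
    """Fit logistic-regression weights to (outcome features, approval) pairs."""
    n = len(examples[0][0])
    w = [0.0] * n
    b = 0.0
    for _ in range(epochs):
        for x, label in examples:
            pred = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = label - pred            # gradient of the log-loss w.r.t. the logit
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b

def predicted_value(model, x):
    """Predicted probability that a human labeler would approve of this outcome."""
    w, b = model
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

# Hypothetical labeled outcomes; features are [task_completed, resources_used, harm_caused].
training_data = [
    ([1.0, 0.2, 0.0], 1),  # task done cheaply, no harm -> approved
    ([1.0, 0.9, 0.0], 1),
    ([1.0, 0.3, 1.0], 0),  # task done, but caused harm -> disapproved
    ([0.0, 0.1, 0.0], 0),
    ([1.0, 0.5, 1.0], 0),
    ([1.0, 0.1, 0.0], 1),
]
model = train_value_model(training_data)

# Candidate actions, described by the predicted features of their outcomes.
candidates = {
    "complete task carefully": [1.0, 0.4, 0.0],
    "complete task by any means": [1.0, 0.2, 1.0],
    "do nothing": [0.0, 0.0, 0.0],
}
for name, feats in candidates.items():
    print(f"{name}: predicted value {predicted_value(model, feats):.2f}")

This only illustrates the bare inductive step; it says nothing about how a system should behave where the learned model is uncertain or the training labels are ambiguous, which is where the harder problems lie.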

Standard texts in AI safety and ethics generally focus on autonomous systems whose reasoning abilities are not strictly superior to those of humans. Smarter-than-human AI systems are likely to introduce a number of new safety challenges, and even small errors in the first superintelligent systems could have extinction-level consequences.

Even if AIs become much more productive than we are, the argument goes, it will remain to their advantage to trade with us and to ours to trade with them. However, as noted by Benson-Tilsen and Soares, rational trade presupposes that agents expect more gains from trade than from coercion. The upshot of this is that engineering a functioning society requires that powerful autonomous AI systems be prosocial.
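Stated schematically (a restatement of the point above, not a formalization drawn from Benson-Tilsen and Soares), a rational agent keeps trading only while

\[
\mathbb{E}\!\left[U(\text{trade with humans})\right] \;>\; \mathbb{E}\!\left[U(\text{coerce or bypass humans})\right],
\]

and nothing about increasing capability guarantees that this inequality continues to hold.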

The idea of superintelligent agents monomaniacally pursuing “dumb”-seeming goals may sound odd, but it follows from the observations of Bostrom and Yudkowsky. Instilling tastes or moral values into an heir isn’t impossible, but it also doesn’t happen automatically. Without a systematic understanding of how perverse instantiations differ from moral progress, how can we distinguish moral genius in highly intelligent machines from moral depravity?
