We can't figure out how to make an artificial general intelligence aligned with human values, because we can't figure out how to make humans aligned with human values. Give anyone enough power and the fate of the world is at risk. The tricky part is building a powerful computer optimization process without it being powerful in that dangerous sense. It might be impossible to explain human values (we haven't figured out how to explain them to each other), and watching people and learning from them isn't as powerful a paradigm as you might hope, if you want actual safety.

I suppose the trick is the same one we use on each other: we have a system where, to satisfy your own values, you have to delight others. There's concern that a sufficiently clever optimizer would break all the known rules, but if that framework mostly works for people, it seems like it would mostly work for a basic optimizer too.