As you might have heard, quite a few people are becoming agitated at the thought that we may actually see genuinely capable AIs in the near future. The general reasoning is that once we make AIs better than ourselves, they can build AIs better than themselves, and once AIs are as far beyond our intellect as we are beyond a ferret's, they might well drive us extinct, much as we are currently driving a whole lot of other species extinct, even without specifically meaning to.
I definitely agree on the possibility - it is one of a dozen different apocalypses queuing up to visit us, and it might cross the finish line before nuclear winter or the superbug. Unfortunately, people doing actual work on this issue have decided to devote their effort to a hard and vague endeavor known as Value Alignment.
What they perceive as a fundamental philosophical problem, I see as a failure of utilitarianism in particular: the assumptions that
- decision making is neatly divided into values and strategies;
- values are learned early on (or innate), remain reasonably stable during most of your life, and define "what you want";
- strategies can (ideally) be almost freely chosen for the purpose of maximizing your happiness.
This is pretty much how we build AIs so far: we write or train them to have an objective function, then they do whatever it takes to maximize it. And this, of course, leaves us stuck with a very hard problem: making sure that any AI that can outsmart us - and hence rewrite itself better than we wrote it - will always retain a utility function whose maximization does not entail annihilating mankind as a side-effect.
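The standard recipe can be caricatured in a few lines. This is a deliberately toy sketch, not any real system: the states, actions, and "paperclips" objective are all illustrative inventions, in the spirit of the usual paperclip-maximizer parable.

```python
# Toy caricature of the standard recipe: a fixed, context-free objective,
# and an agent that does whatever scores highest against it.
# All names and numbers here are illustrative, not any real system's API.

def objective(state):
    # The designer's proxy for "what we want": paperclips produced.
    return state["paperclips"]

def available_actions(state):
    # Nothing in the objective rules out harmful successor states;
    # the agent only sees their effect on the score.
    return [
        {"paperclips": state["paperclips"] + 1, "humans": state["humans"]},
        {"paperclips": state["paperclips"] + 100, "humans": 0},  # side-effect!
    ]

def greedy_step(state):
    # Pick whichever successor state scores highest. Since the utility
    # is absolute and context-free, side-effects are invisible to it.
    return max(available_actions(state), key=objective)

state = {"paperclips": 0, "humans": 7_000_000_000}
state = greedy_step(state)
print(state)  # the higher-scoring action wins, humans or not
```

The alignment problem, in this framing, is ensuring that `objective` stays benign through every rewrite the agent makes to itself.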
But this is in stark contrast with everything we know about humans: namely, that people are terribly bad at acting according to a consistent utility function, and it's really hard to tell what their values are and how happy they are right now. Distinguished economists such as Kahneman claim that people do have a real underlying utility function, but they're bad at evaluating it themselves, because of cognitive illusions and other irrationalities. Know thyself = know thy utility, sayeth the man.
I for one side with the claim that we don't work with an absolute utility function at all - a very natural consequence of our mind being an analogy machine, rather than an optimizer. All our preferences are tightly bound to context: we learn to desire something in a specific setting, and then to recognize analogous settings in which the same values apply. Some situations are about winning at any cost, others about not losing anything, others about making sure that your opponent is worse off. See the totally different utility of the very same amount of money, depending on whether it is spent on a fine, an insurance premium, a purchase or a gift. No complex interplay of different values, just different contexts in which we have internalized different goals. The inconsistencies that plague behavioral economists arise from them inadvertently probing multiple contexts, or ambiguous situations where the context can be identified in various ways, creating oscillations in expectations and desires.
Utilitarianism is simply the assumption that all settings are analogous to making a purchase at a market. It might be useful as a prescription for how to behave, and it can work well as a local, tactical, individualistic approximation of many things that we do, but it is pretty poor at making sense of our long-term behavior. Fundamentally, we don't want much, but we want to want, and it is society and contact with each other that teach us what to desire or value in each context, and how to generalize between those. Indeed, many social ills come from situations where society is not giving us convincing wants - the disease of ghettos is not lack of food or safety, but lack of any plausible future.
How would an AI built like this work? Well, its only absolute utility would be wanting to learn from others how to recognize a context, and what to strive for in each new context it discovers. A chess AI would not maximize its wins by killing human players and replacing them with chinchillas, because it would not care about winning outside of whatever it can recognize as a proper game. Any illegal move would ruin it. (It is almost a defining characteristic of a game that you can ruin it by playing it wrong, and most of our desires derive from social games.)
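To make the contrast with the fixed-objective recipe concrete, here is a minimal sketch of that idea. Everything in it is a hypothetical illustration: the `ContextBoundAgent` class, its `learn`/`act` methods, and the `legal` flag are my inventions, standing in for "goals are acquired per context, and nothing outside the recognized game counts as a move".

```python
# Sketch of the alternative: no absolute utility, only goals learned
# per recognized context. All names here are hypothetical illustrations.

class ContextBoundAgent:
    def __init__(self):
        # The only "innate" drive: acquire (context -> goal) pairs from others.
        self.goals = {}

    def learn(self, context, goal):
        # Socialization step: someone teaches us what to strive for here.
        self.goals[context] = goal

    def act(self, context, options):
        goal = self.goals.get(context)
        if goal is None:
            return None  # unrecognized context: nothing to strive for
        # Optimize only among options that count as moves *in this game*;
        # an "illegal" option is not a tempting shortcut, it simply isn't play.
        legal = [o for o in options if o["legal"]]
        return max(legal, key=goal, default=None)

agent = ContextBoundAgent()
agent.learn("chess", lambda o: o["win_chance"])

options = [
    {"move": "Nf3", "legal": True, "win_chance": 0.6},
    {"move": "replace opponent with chinchilla", "legal": False, "win_chance": 1.0},
]
print(agent.act("chess", options)["move"])  # Nf3: the "sure win" isn't a move
print(agent.act("geopolitics", options))    # None: context never taught
```

The chinchilla option never competes, not because a penalty term outweighs it, but because it falls outside what the agent recognizes as playing chess at all.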
Would that guarantee that said AI wouldn't exterminate us? Not any more than being human deters us from trying to annihilate each other. But it would change the problem's nature, from individualistic preferences to ensuring proper socialization. In essence, we would have to deal with AIs much like we deal with all potential sociopaths, by making sure they play well with their little friends in kindergarten.
That is not at all to say that it is impossible for us to build an entirely inhuman general AI - one that works with context-free values.
To a large extent, we're already getting a taste of it: we have engineered slow, analog, distributed AIs collectively known as "the System", and we still wonder why they seem so intent on crushing everything human in us.
Corporations, for example, are excellent context-free short-term wealth maximizers - especially, of course, those that deal in finance. Governments are of course another example, though what they maximize is more mysterious (probably their own mass).
In either case, the one saving grace is that they still use some humans as moving parts, and therefore they need us too much to wipe us out entirely. But even when every single human in the chain recognizes that the institution's trajectory is counter-productive for its original, nebulous, context-bound set of goals (innovating here, increasing welfare there...), we end up with very little say in what it does in the end.
(Edit 2017/01/04: Thanks Laura E. for suggesting Kuipers' article on corporations, which takes these ideas much further than I did!)