A short post today, to share a few thoughts about a classic idea in economics: Goodhart's law - to paraphrase, "when a metric becomes a target, it ceases to be a good metric".

In other words:
- Given that we want to guide a massive multi-agent system toward a set objective,
- Given that we know another quantity (the metric) to be empirically correlated with that objective in past data,
- What is the likelihood that, if we now incentivize agents to increase that metric, it will still correlate with the true objective in the new dynamics that then unfold?

It should be possible to give simple geometric conditions under which, if you are too good at optimizing for the metric, there comes a point beyond which progress along that metric is uncorrelated, or perhaps even anticorrelated, with improvement in what the metric was supposed to measure all along.
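To make this a bit more concrete, here is a toy sketch of my own (an assumed Gaussian model, not the geometric formalization alluded to above): a true objective and a proxy metric that are well correlated over the whole population, but whose correlation erodes once you select hard on the metric.

```python
# Toy illustration (assumed model, not a formal result): the metric is the
# true value plus independent noise, so the two correlate well overall,
# but selecting harder and harder on the metric erodes that correlation
# among the selected tail.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

value = rng.normal(size=n)    # the "true" objective
noise = rng.normal(size=n)    # everything else the metric happens to pick up
metric = value + noise        # the proxy we incentivize agents to increase

def corr(a, b):
    return np.corrcoef(a, b)[0, 1]

print(f"whole population: corr(metric, value) = {corr(metric, value):.2f}")

# "Optimizing for the metric": keep only the top slice ranked by the metric.
for top in (0.5, 0.1, 0.01):
    cutoff = np.quantile(metric, 1 - top)
    sel = metric > cutoff
    print(f"top {top:.0%} by metric: corr(metric, value) = "
          f"{corr(metric[sel], value[sel]):.2f}")
```

In this toy model the correlation only decays toward zero; genuine anticorrelation would take a metric whose noise component the agents can act on directly, which is closer to the real-world cases discussed below.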

That idea is very far from new (hello, perverse incentives), though formalization seems lacking. What might be slightly less done to death is that, to my mind, this is actually a learning problem, with a tradeoff between how good the system is at optimizing its current target and how good it is at updating that target. In some sense, I feel that working hard toward a metric means committing to it in a way that is hard to reverse.

For instance, stock markets have proven very efficient at optimizing for profit, and much less so at updating for the fact that profit does not correlate all that well with how "good" a company is. In that case, I believe it is fairly clear that the anticorrelation has grown dramatically as the system has evolved.

It may be better to have a system that doesn't optimize as well for its current metric, but doesn't get stuck forever with one single metric either. Academia is a market with far fewer players and a much more discrete currency (quantum career jumps), hence infinitely less fluid than financial markets, but perhaps also less likely to get stuck with former correlates forever. Although, as widely noted, it is headed fast down the single-metric, high-fluidity lane.

Among many other things, this may be crucial for future prediction markets, to say nothing of "futarchy" (i.e. market-based government). I think this objection goes beyond the usual one: instead of claiming to know how prediction markets can fail to achieve what they set out to do, I am suggesting that their very success could be the problem. The better they are at optimizing, the more we need to be sure that what they reward is exactly what we want to reward, rather than "roughly in the right direction", because there will be fewer and fewer chances to steer the system in a different direction.

Is this tradeoff a necessary property, or a kink that can be engineered out? I'm not sure. But so far it really seems that, once you have reduced everything to a single fluid metric, it is extremely hard to step back and de-abstract (or re-abstract) from it toward some other approximation that points more toward the original goal. Indeed, corporations might be a lost cause: to switch to another system that better indexes and rewards their contribution to the public good, you would need to dismantle an infinite network of parasitic outgrowths, from shareholders and the financial system to vast swathes of law, that has basically grown into the spine of our society.

Doing just that all over again with knowledge (or nature, by the way) seems... unwise.
