$FB | Facebook and the Dogfood Problem - The Diff

In this issue:

  • Facebook and the Dogfood Problem
  • Currency
  • Market Pricing
  • Substitution
  • Bitcoin's Long March: Tax Edition
  • First-Party Data

Facebook and the Dogfood Problem

"Eating your own dogfood" is a term in tech, first popularized at Microsoft, that refers to relying on products internally in order to test them out before they're made available to customers. It has a long and illustrious history; not only have many organizations used this rule to force themselves not to ship buggy or subpar products, but it has also helped them refine the products they care about before launch. Google was an email-heavy organization when Gmail was first developed, and Gmail was used internally for a few years before launching to the public, which is part of why the original version of Gmail was a) optimized for people who liked email shortcuts, b) optimized for people who got a heavy enough volume of email that they needed to search it rather than filter it, and c) designed around the assumption that the cost of storage would asymptotically approach zero.

Facebook does this, too; the company is one giant bet that many human-heavy problems can be solved with software, especially software written by Facebook and running in Facebook's datacenters—meeting people, socializing with friends and family, sending quick messages, but also, as it turns out, handling internal corporate communications and access to physical locations. At one level, it makes perfect sense that Facebook would do this; part of their core competency is ensuring that people can post status updates that are visible to close friends but not coworkers, or, as the case may be, to close friends but not family members. A company that specializes in providing that sort of service for billions of people could understandably assume that it ought to be able to determine who gets physical access to which conference rooms and servers.

Which, of course, did not work out as planned yesterday. Cloudflare has a good post going through the technical details, which revolve around the Border Gateway Protocol (one of those obscure topics that, when it's in the news, indicates that the news is bad, e.g. here). This outage didn't just knock out Facebook's consumer-facing products (including Instagram and WhatsApp), but also crippled their internal systems, including email and, as alluded to above, physical access to data centers. (They apparently did not need to use angle grinders, although it's a great image.)

The outage had some serious consequences. One company estimates that over the six hours of the outage, US non-Amazon e-commerce sales dropped by about 27%, or about 13% over a full 24 hours. With total US e-commerce sales of about $376bn, that implies a total sales impact in the US of around $200m. (This is a nice natural experiment demonstrating that online ads do indeed drive revenue, and that the subprime attention crisis is not true everywhere.) For Facebook, six hours of revenue is around $80m, also not insignificant. Some of this will obviously come back, and some of it, also obviously, won't; a great deal of money and brainpower is devoted to making ads as compelling as possible and purchases as seamless as possible, and breaking the site that serves those ads will cost sales if the ad-sellers are doing their jobs.

FB has been obsessive about reliability for a long time. The company's early days are full of stories about investing in hardware well ahead of immediate needs, because growth was rapid enough that being 3x overprovisioned today meant being underprovisioned in a few months (for all its issues with the broader story, The Social Network got this part right).

In one sense, Facebook's decision to centralize so many of its products is hard to argue with: there are only a handful of companies with a bigger economic interest in maximum uptime, and the other ones are direct or indirect Facebook competitors. (Amazon is a special case: AWS outages do happen, but the average person experiences them not as an AWS outage but as an outage at a site that uses AWS. And AWS is both a smaller business and one with a lower operating margin than Facebook.) Facebook could have implemented backup systems, but every fully independent backup system is just a way to trade off between reliability and security: if there are two ways for employees to communicate rather than one, there are two attack vectors, too.

Facebook's problems occurred at the discontinuity between distributed and centralized systems: the Internet is, as Cloudflare notes, a network of networks, but the process of getting those networks to talk to one another necessarily involves some centralization; everyone needs to be running the same protocols, and needs to have a canonical way to determine which server is meant when someone types "facebook.com" into their browser. That centralization meant that Facebook ran a centralized risk, too.

Solving this kind of problem is fundamentally hard. Solving the exact bug is easier, through some combination of 1) running a quick script that checks BGP and similar updates for any catastrophic mistakes, and 2) having more humans review them. But bugs are not one-off; they tell you that the distribution of outcomes you're sampling from is buggier than you thought. Compressing that distribution might mean more internal checks, or it might mean a fundamental rethink; in one sense, the goal of Facebook should be that "Facebook, Instagram, and WhatsApp are all down" is an implausible outcome, the way it's hard for the dollar or the English language to have downtime; there are specific systems that can fail, but there are layers and layers of fallbacks that make catastrophic failure less likely, and make it easier to recover from incremental failures.
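To make the "quick script" idea concrete, here is a minimal sketch of what a pre-flight check on a routing change might look like. It is not how Facebook's tooling works, and it does not talk to any real router or BGP library. The `RouteChange` format, the `check_change` function, the 25% threshold, and the example prefixes (all from the reserved documentation ranges) are assumptions made up for illustration. The check blocks a change if it would withdraw too large a share of announced prefixes at once, or if it would leave a network marked as critical (say, the one holding DNS servers) with no covering route.

```python
from dataclasses import dataclass
from ipaddress import ip_network


@dataclass(frozen=True)
class RouteChange:
    """A proposed change: prefixes to stop announcing (hypothetical format)."""
    withdraw: frozenset


def check_change(announced, critical, change, max_withdraw_fraction=0.25):
    """Return reasons to block the change; an empty list means it passes."""
    announced = {ip_network(p) for p in announced}
    critical = {ip_network(p) for p in critical}
    # Only prefixes that are actually announced today can be withdrawn.
    withdrawn = {ip_network(p) for p in change.withdraw} & announced
    remaining = announced - withdrawn

    problems = []
    # Rule 1 (assumed policy): no single change may pull too many prefixes.
    if announced and len(withdrawn) / len(announced) > max_withdraw_fraction:
        problems.append(
            f"withdraws {len(withdrawn)} of {len(announced)} announced prefixes"
        )
    # Rule 2 (assumed policy): every critical network must stay covered
    # by at least one prefix that is still announced after the change.
    for net in sorted(critical, key=str):
        covered = any(
            net.version == r.version and net.subnet_of(r) for r in remaining
        )
        if not covered:
            problems.append(f"critical network {net} would become unreachable")
    return problems


if __name__ == "__main__":
    announced = ["192.0.2.0/24", "198.51.100.0/24", "203.0.113.0/24"]
    critical = ["192.0.2.53/32"]  # hypothetical DNS server address
    bad = RouteChange(withdraw=frozenset(announced))
    for reason in check_change(announced, critical, bad):
        print("BLOCKED:", reason)
```

A check like this only catches the mistakes someone thought to write a rule for, which is the point of the paragraph above: it narrows the distribution of bad outcomes a little without changing its shape.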

Facebook has talked about the metaverse as an area that's exciting because they can have an impact on how it's designed and built. They got to smartphones a little late, and while they rode that paradigm shift for all it was worth, they clearly had ideas for how phones could be better. They may start looking back at older technology layers they rely on, in order to rethink those. Facebook went down due to human error, at many levels, but given that the networks of the future will be operated by humans, too, the goal will be to design a system resilient to those mistakes rather than trying to eliminate them entirely.

Elsewhere

Currency

In macroeconomics, it's easy to divide the world into countries that can easily borrow in their own currency and countries that have to borrow in somebody else's, taking FX risk along with the usual cost of servicing debt. Increasingly among companies, there's a similar division: companies like Roblox can issue their own currency, which users willingly hold and spend rather than converting back to more typical currencies. Tinder is testing the same thing. For an online dating site in particular, virtual currency can be very powerful, because some users are big contributors to the network effect while others are bigger contributors to the bottom line. Having an internal currency that can incentivize particular behaviors is an effective way to get the maximum value out of both.

Market Pricing

Shortages generally sound more acute when they're phrased in terms of particular inputs rather than the outputs they're meant to satisfy; lumber prices rising by a lot and then crashing are a more exciting story than housing rising at a more modest and stable pace. This is happening in an interesting way in energy right now, because the shortage of gas is leading to higher prices for emissions credits ($, FT). When emissions have a market price, natural gas is a cost-competitive and readily available substitute for coal; when gas prices are high, though, the converse holds true, too. This is a good case of market signals doing exactly what they're supposed to do: if carbon emissions are a negative externality, and coal can substitute for gas, then energy shortages are more expensive than they would be in an environment where emissions aren't priced—exactly the reality that carbon credit markets are meant to reflect.

Substitution

The WSJ notes that higher-priced appliances are more likely to be in stock than cheap ones ($, WSJ), because companies are making the rational decision to maximize gross profit per shipping container given a shortage of containers. This is a fairly straightforward case of Alchian-Allen, with an interesting side effect: if people who are shaping public perception around supply chain issues are disproportionately likely to be either well-off or renters, then they won't be directly exposed to the average person's experience of not being able to buy a replacement for a cheap washing machine.

Bitcoin's Long March: Tax Edition

I've written a few times before about how cryptocurrency gradually sneaks into more and more cautious institutions, and rarely gets rejected. Another example: financial advisors have discovered that crypto's volatility, and its treatment as property, means that tax losses can be realized quickly ($, WSJ), at least for the rest of this year. For 2021, that means more cryptocurrencies will be owned by people who reliably sell during drops (the effect of this on realized volatility is hard to predict, since perfectly predictable selling behavior creates perfectly predictable buying behavior from market makers). Over the longer term, it means that many more people will have made their first crypto investment for tax purposes, even if those tax purposes go away.

First-Party Data

Amazon is allowing people to send gifts without knowing the recipient's address as long as they have contact information. Given how long it took Amazon to launch this, it can't be too material to the business, but it is an interesting illustration of the options available to a sufficiently ubiquitous company. Many companies have some idea of people's addresses, but some of that address data got wildly out of date right around the middle of 2020. If people are going to update addresses, the most likely place to start is with the company that sends them the most stuff, so Amazon ends up having very up-to-date information on who lives where. (It would be interesting to see if any Amazon employees had made savvy real estate purchases based on this relocation data. There is a precedent ($, WSJ).) As with a number of other platform companies, Amazon can create the consumer surplus from sharing detailed personal information, without literally sharing that information—and then it can take a cut.

(Disclosure: I am long Amazon.)
