Mo Reads: Issue 3
This issue’s links range from literature to software optimization, from the psychology of money to philosophy, from space to artificial intelligence.
Links:
Thoughts on Meaning and Writing by Dormin (2,800 words, 11 mins)
The Cost of Cloud, a Trillion Dollar Paradox by Sarah Wang (3,200 words, 13 mins)
Dissolving the Fermi Paradox by Anders Sandberg et al (7,000 words, 28 mins)
Buy value, not price by Jacob Falkovich (1,000 words, 4 mins)
The mature optimization handbook by Carlos Bueno (18,000 words, 70 mins)
Public intellectuals have short shelf lives — but why? by Tanner Greer (3,000 words, 12 mins)
Boundaries, objects, and connections by David Chapman (800 words, 3 mins)
Cerebras WSE: Why we need big chips for deep learning by PR (2,800 words, 11 mins)
Better babblers by Robin Hanson (900 words, 4 mins)
Poets are intelligence assets by Benjamin Hoffman (2,500 words, 10 mins)
Thoughts on Meaning and Writing by Dormin (2,800 words, 11 mins): Dormin suggests a heuristic for meaning in life — “both in the short and long term I try to do things which one day could be put on a bullet point list about me. Or maybe I just try to do things I’ll remember… Especially when I’m older, I want to be able to sit at a computer and type out the actions, events, and people that I remember and be proud of the list before me”. This heuristic was inspired by his observation that the biographies of people like Arnold Schwarzenegger and Steve Bannon can be distilled into bullet points of historically noteworthy activities and accomplishments, then recognizing that most people neither can achieve nor want the ‘historical’ part, and substituting noteworthy ‘on one’s own terms’ instead: relationships, marriage & kids, work, money, passion projects. Dormin suggests 2 kinds of bullet points: (1) actions that create discrete memorable experiences (e.g. “do random spontaneous activity” vs “do more of the same old stuff”); (2) small daily contributions to ‘greater acts of meaning’ (e.g. regularly posting on a blog for years). I like this heuristic as an (but not the) answer to ‘the meaning of life’.
The Cost of Cloud, a Trillion Dollar Paradox by Sarah Wang (3,200 words, 13 mins): cloud is great for early-stage cos, but at scale its costs weigh on gross margins and hence on market cap; because this shift happens late, it’s hard to reverse, so infrastructure optimization is often considered a nonstarter. But it shouldn’t be: Sarah conservatively estimates the recaptured savings in the (extreme) case of full workload ‘repatriation’ at $4B/yr for just “50 of the top public software cos”, translating to $100B of market cap, or ~$500B for “the broader universe of enterprise software and consumer internet cos”. Proposed solutions: make cloud spend a KPI (e.g. Spotify’s Cost Insights), “tie the pain directly to folks who can fix it” (e.g. a spot bonus if you optimize/shut down unneeded workloads), use 3rd-party tools to optimize infra, and have software architects think about repatriation upfront (e.g. via Kubernetes and containerization). Prediction: growing awareness of these costs will force the public clouds to start giving up margin and/or workloads.
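A quick back-of-the-envelope sketch of the savings-to-market-cap step (illustrative only; the 25x multiple below is simply what the article’s own $4B and $100B figures imply, not a claim of mine about market valuations):

```python
# Rough arithmetic behind the article's headline figures (illustrative only).
annual_savings = 4e9        # ~$4B/yr recaptured by "50 of the top public software cos"
market_cap_gain = 100e9     # the article's corresponding market-cap estimate

implied_multiple = market_cap_gain / annual_savings
print(implied_multiple)     # => 25.0, i.e. each $1/yr of recaptured margin ~ $25 of cap

# Applying the same implied multiple to the ~$500B broader-universe market-cap figure
implied_broader_savings = 500e9 / implied_multiple
print(f"${implied_broader_savings / 1e9:.0f}B/yr")   # => ~$20B/yr of implied savings
```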
Dissolving the Fermi Paradox by Anders Sandberg et al (7,000 words, 28 mins): the Fermi paradox asks — where are the aliens? Are we alone in the cosmos? The Drake equation, which probabilistically estimates how many actively communicating civilizations there should be given reasonable parameter estimates, yields high odds they should’ve contacted us by now, but there’s no evidence this ever happened; what gives? Over 70 explanations have been proposed, from “aliens are really rare” to “civilizational lifetimes are really short” to “they’re out there, we’re just technologically incapable of finding them”, plus more creative ones (I’m partial to the economic argument that it’s cheaper to transmit information than to travel physically). Sandberg et al take a totally different tack — they dissolve the question with a 2-step argument: (1) everyone’s calculating the Drake equation wrong; don’t use point estimates for highly uncertain parameters, use distributions informed by current scientific knowledge; (2) doing that gives 30% odds we’re alone in the Milky Way and 10% in the observable universe. In other words: no paradox!
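To make step (1) concrete, here’s a minimal Monte Carlo sketch of “distributions, not point estimates” for the Drake equation. The priors below are made-up illustrative ranges, not the paper’s actual literature-derived distributions:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

def loguniform(lo, hi):
    """Sample uniformly in log10-space between lo and hi."""
    return 10 ** rng.uniform(np.log10(lo), np.log10(hi), n)

# Drake equation parameters (illustrative priors only, NOT the paper's):
R_star = loguniform(1, 100)       # star formation rate (stars/yr)
f_p    = loguniform(0.1, 1)       # fraction of stars with planets
n_e    = loguniform(0.1, 10)      # habitable planets per planetary system
f_l    = loguniform(1e-30, 1)     # fraction where life arises (hugely uncertain)
f_i    = loguniform(1e-3, 1)      # fraction developing intelligence
f_c    = loguniform(1e-2, 1)      # fraction producing detectable signals
L      = loguniform(1e2, 1e8)     # years a civilization remains detectable

N = R_star * f_p * n_e * f_l * f_i * f_c * L   # detectable civilizations in the galaxy
print("P(N < 1, i.e. we're alone in the galaxy):", (N < 1).mean())
```

With uncertainty this wide, appreciable probability mass lands below N = 1 even though the mean of N is large, which is exactly the “no paradox” move.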
Buy value, not price by Jacob Falkovich (1,000 words, 4 mins): I like Warren Buffett’s aphorism that “price is what you pay, value is what you get”; it’s the intuition behind statements like “this phone is a steal at this price tag!”. Jacob takes this insight a step further, noting that marketing theory decomposes value into 4 factors to help market products better — (1) functional (what it solves, e.g. kitchenware); (2) social (signaling value, e.g. college degrees); (3) psychological (happiness from owning it, e.g. a framed wedding photo); (4) monetary (financial benefit from owning/reselling it, e.g. stocks) — but that you-the-consumer can flip this around to figure out how much you’d pay for something. His example is a tailored suit: the functional value is the price of the cheapest alternative that keeps him warm/not naked (low), the psychological value is how much he’d pay to feel like he looks good in it (moderate), the monetary value is ~zero (nobody would pay for a suit tailored to someone else!), so most of its value is social signaling. Side observation: people in rich countries pay mostly for social value.
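A toy sketch of “flipping it around” (the numbers are mine, purely for illustration, not Jacob’s):

```python
# Willingness to pay = sum of what each value component is worth to *you*.
suit_value = {
    "functional": 50,      # cheapest alternative that keeps you warm / not naked
    "social": 400,         # what the signaling benefit is worth to you
    "psychological": 150,  # what feeling like you look good in it is worth to you
    "monetary": 0,         # resale value of a suit tailored to someone else
}

willingness_to_pay = sum(suit_value.values())
print(willingness_to_pay)  # buy if the price tag is below this, whatever the "deal" is
```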
The mature optimization handbook by Carlos Bueno (18,000 words, 70 mins): surprisingly accessible for a technical handbook. Carlos begins: “The trickiest part of speeding up a program is not doing it, but deciding whether it’s worth doing at all.” Impossible to summarize, so here are a few tidbits. (1) Optimizing without measurement guidance is foolish, as is optimizing everything; the biggest wins tend to be in “the critical 3%”. (2) Proper measurement by itself says little about whether optimization is worthwhile; that’s ultimately a cost/benefit decision. (3) Measurement entails knowing what problem you’re solving; problem definitions must be falsifiable, and are often wrong to begin with, but that’s fine; just iterate. (4) The sheer complexity of computer systems means you shouldn’t generalize too much from any given performance bug; measurement is your true north. (5) Big-O complexity is almost never the real-world reason for slow programs; it’s usually the constants/coefficients, the stuff ignored in CS. (6) Dimensional (e.g. OLAP) vs relational entity (e.g. OLTP) modeling in data warehousing is mostly about the number of table joins needed for analysis, i.e. an argument over performance, so “define the problem and measure” applies. Default: complicating the schema (flat table to OLAP etc) is an optimization made as needs arise, not data modeling orthodoxy!
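A minimal sketch of tidbits (1) and (5), my example rather than the handbook’s: two routines with the same O(n) complexity, where measurement rather than asymptotics reveals the gap in constants:

```python
import timeit
import numpy as np

def sum_squares_python(xs):
    """Plain-Python loop: O(n), but with a large constant per element."""
    total = 0.0
    for x in xs:
        total += x * x
    return total

def sum_squares_numpy(arr):
    """Same O(n) work pushed into a vectorized C loop."""
    return float(np.dot(arr, arr))

xs = list(range(100_000))
arr = np.array(xs, dtype=np.float64)

t_py = timeit.timeit(lambda: sum_squares_python(xs), number=100)
t_np = timeit.timeit(lambda: sum_squares_numpy(arr), number=100)
print(f"python loop: {t_py:.3f}s   numpy: {t_np:.3f}s")
# Whether the rewrite is worth doing is still a cost/benefit call; the
# measurement only tells you where the time actually goes.
```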
Public intellectuals have short shelf lives — but why? by Tanner Greer (3,000 words, 12 mins): Tanner observes: “public intellectuals… reign supreme in the public eye for about seven years or so. Most that loiter around longer reveal themselves oafish, old-fashioned, or ridiculous.” This is as true for the Song Dynasty and Elizabethan England as it is for (say) Thomas Friedman today — 3-time Pulitzer Prize winner, now the butt of a thousand jokes. The real question is “why are so many public intellectuals capable of generating insight, originality, or brilliance at the beginning of their careers, but are utterly incapable of fresh thinking a decade later?” Tanner has 2 guesses: (1) psychological: both analytic brilliance and various measures of creativity peak and then decline over a lifespan (artists early, STEM mid-age, humanities late) from brain cell loss, so the sweet spot for original work is one’s 30s-40s; this also explains why social attitudes shift via inter-generational churn rather than at the individual level; (2) sociological: once a great thinker has reached the top, they have little incentive to compensate for declining analytic/creative ability in order to stay original, because doing so requires unavoidably hard activities like poring through mountains of raw data or interviewing refugees in war zones. Tanner’s advice for thinkers is hence to “feel the urgency”: “figure out the most important intellectual problem you think you can help solve and make sure you spend your thirties doing that”.
Boundaries, objects, and connections by David Chapman (800 words, 3 mins): most people are innate dualists: they consider the world as consisting of clearly separate objects. Some are instead monists, rejecting the idea of boundaries. David claims both stances are wrong via the example of a jar of jam. Whether the jar, lid and jam inside are one thing or three depends on what we do with it; the jam itself isn’t persistently object-like; the boundary between jam and blueberry bits inside is fuzzy. (Clouds are a clearer illustration of boundary fuzziness.) David uses these intuition pumps to motivate his alternative to monism/dualism: “participation”, that “objects, boundaries, and connections are co-created by ourselves and the world in dynamic interaction”, and that “there’s no single right way to draw boundaries around objects, or even between self/other”. This shift in stance entails another idea: “neither subjective nor objective, but interactive”. All these ideas echo Julie Moronuki’s points in The unreasonable effectiveness of metaphor.
Cerebras WSE: Why we need big chips for deep learning by PR (2,800 words, 11 mins): these days there’s a growing gap between compute supply and demand. A major driver of rising demand is commercial AI deployment, mostly of deep neural networks, which need training to be useful, and training in turn needs enormous resources (months and millions of dollars). Supply isn’t catching up because of chip hardware constraints (core count, and the bandwidth/latency of communication between cores, are limited by chip size). The solution sounds trivial: use the whole wafer for one chip! (And add in DL-specific optimizations, e.g. “fine-grained dataflow scheduling” to avoid multiplying by the zeros that make up 90+% of sparse NNs.) The engineering obstacles are monumental, however: design, manufacturing, power, cooling, comms etc; Texas Instruments and ITT tried and failed. Cerebras is the first org to succeed, creating the CS-1 and now the CS-2. The specs are mind-blowing: 123x more cores than Nvidia’s industry-leading A100 GPU, 1,000x the on-chip memory, 45,000x the interconnect bandwidth etc. So is the performance: e.g. a CS-1 outperformed the 84,000-core NETL Joule 2.0 supercomputer (top 100 in the world) by over 200x on a powerplant combustion chamber modeling problem involving computational fluid dynamics — in fact, it ran faster than real time. The catch: if you can’t fit a model entirely in chip memory, you won’t get the ridiculous speedup. Still exciting to me!
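A crude illustration (mine, not Cerebras’) of the “don’t multiply by zero” idea behind fine-grained dataflow scheduling, using a sparse activation vector:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((512, 512))       # a dense weight matrix
x = rng.standard_normal(512)              # activations
x[rng.random(512) < 0.9] = 0.0            # make ~90% of activations zero

dense_out = W @ x                          # multiplies every entry, zeros included

nz = np.flatnonzero(x)                     # indices of nonzero activations
sparse_out = W[:, nz] @ x[nz]              # only ~10% of the multiply-accumulates

print(np.allclose(dense_out, sparse_out))  # True: same result, far less arithmetic
```

On a general-purpose CPU the indexing overhead can eat the savings; the appeal of doing this at the hardware level is that skipping zeros costs nothing per operand.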
Better babblers by Robin Hanson (900 words, 4 mins): “babbling” (as Robin defines it) is any style of talking/writing that relies mostly on low-order correlations — which words to use (e.g. teacher’s passwords), which words go together, which combos have positive associations etc — vs “deep structure” that isn’t reducible to low-order correlations, e.g. the tower of prerequisites behind academic concepts. Babbling is everywhere: polite conversation, inspirational TED talks, public intellectuals pontificating outside their expertise. Eliza the chatbot showed that even very low-order correlations get you surprisingly far, and stats/ML advances keep making AIs ever better babblers. This portends a future where (Robin claims) people will try harder and harder to distinguish themselves from AI via non-babbler styles — either via deep structure (hard), or via distinctive (“personalized”) talking styles, or via “more indirect ironic insider talk” like on Twitter, where words/phrases/references are used in “ways that only folks very near in cultural space can clearly accept as within recent local fashion” (easy), generating too little data for AI babblers to mimic. What’s striking, with the rise of the GPTs, is that that future is already here: they can already pass the Turing test against humans on autopilot, simply because most human conversation is babble.
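A toy bigram “babbler” (my sketch, not Robin’s): it knows nothing except which word tends to follow which, yet the output already sounds plausible-ish, which is the Eliza point:

```python
import random
from collections import defaultdict

# Train a bigram model: for each word, remember which words follow it.
corpus = ("deep structure is hard . babbling is easy . "
          "babbling relies on low order correlations . "
          "deep structure is not reducible to low order correlations .").split()

follows = defaultdict(list)
for a, b in zip(corpus, corpus[1:]):
    follows[a].append(b)

random.seed(0)
word, out = "babbling", ["babbling"]
for _ in range(15):
    nxt = follows.get(word) or corpus   # fall back to any word at a dead end
    word = random.choice(nxt)
    out.append(word)
print(" ".join(out))
```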
Poets are intelligence assets by Benjamin Hoffman (2,500 words, 10 mins): in literary analysis, it often happens that you can get something out of the text the author didn’t intend, and that the same text can yield far more coherent narratives than the author could’ve plausibly meant; this is usually taken to mean these readings are spurious. Ben’s counterargument is related to John Nerst’s Partial Derivatives and Partial Narratives, albeit inspired by Robin Hanson’s argument that one way to get info from published studies that’s ~uncontaminated by publication bias is to look at the coefficients of control variables (as less scrutiny is applied to them): great literature is usually an integrated multidimensional depiction where authors try to report how things might’ve happened to satisfy their own sense of verisimilitude, so you should expect it to be an honest informative account of everything except what authors meant to put into it. (Readers from different eras might find most intriguing the stuff authors just consider background.) Literature that “stands the test of time” is lit that admits of multiple readings by future generations. This incidentally explains why DFW’s Infinite Jest is disadvantaged in this regard: it’s already optimized along every dimension for contemporary readers, sacrificing stuff that might’ve been serendipitously interesting to future readers.