Jose Drost-Lopez

Chasing perfection: a tale of sequential decision-making

In Math, Personal on May 29, 2011 at 2:54 am

Now *that's* my kind of piano teacher.

My grandest ‘real-life’ use of probability theory until recently was to estimate my odds in Risk and poker. Then, while skimming a math-themed book, I singled out a principle that will shape how I will spend $6,000 and about 200 hours of my life in the upcoming year. As far as the math goes, it doesn’t matter whether I’m referring to getting a girlfriend, a pet, an apartment, or a job. But at the moment I am actually picking a piano teacher. The teacher-picking theorem I have in mind is based on an oversimplified scenario that has an optimal strategy. Trying to apply it has been interesting. But before I reflect on my story, let’s glance at what the math says.

Sequential decisions

Start by imagining a series of candidates, among which we want to choose the very best. We have to assess candidates one at a time, and we have to decide whether to accept or reject each one immediately after inspecting it. The candidates come to us in random order, so that the first one we assess is just as likely as the last one to be the best. The optimal strategy in this setup is to inspect and reject about 37% (more precisely, a fraction 1/e) of the candidates and then accept the next one that is better than all of those first 37%. This approach is optimal (i.e., most likely to snag the best option) because 37% of candidates is just enough to estimate which ones are exceptional without rejecting too many exceptional ones.
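
The 37% figure is easy to check numerically. Below is a minimal simulation sketch, assuming candidates can be ranked with no ties and arrive in uniformly random order. The function name `simulate_rule` and its parameters are illustrative choices, not taken from any source.

```python
import random

def simulate_rule(n, skip_fraction, trials=20_000, seed=0):
    """Estimate how often 'reject a first batch, then take the next
    record-breaker' ends up with the single best of n candidates."""
    rng = random.Random(seed)
    skip = int(n * skip_fraction)      # size of the look-only batch
    wins = 0
    for _ in range(trials):
        scores = list(range(n))        # n - 1 is the best candidate
        rng.shuffle(scores)            # random arrival order
        benchmark = max(scores[:skip], default=-1)
        chosen = scores[-1]            # stuck with the last one if nobody beats the benchmark
        for s in scores[skip:]:
            if s > benchmark:
                chosen = s
                break
        wins += chosen == n - 1
    return wins / trials

if __name__ == "__main__":
    for fraction in (0.05, 0.20, 0.37, 0.60, 0.90):
        print(f"skip {fraction:.0%}: best chosen in {simulate_rule(100, fraction):.1%} of trials")
```

With 100 candidates, skipping about 37% should find the best candidate a bit more than a third of the time, and skipping much less or much more should do noticeably worse.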

“Sequential decision making,” as this scenario is sometimes called, first came to my attention in a psychology study. When healthy people do computerized sequential decision tasks (sometimes simulating job interviews or shopping), they tend to jump the gun. However, this study in Germany found that people suffering from depression tend to wait longer and more closely approximate the best strategy. While this line of research has interesting implications for the causes and origins of depression, my main personal reaction was a sense that I probably don’t sample enough options in my life decisions. A prime example heads off my piano story: I picked a piano teacher two months ago by emailing a music professor I did not know and taking his recommendation. Out of 30 available teachers listed in a directory for my area, I ended up ‘sampling’ only 1/30 (3.3%) of them. That’s all the more suboptimal because with piano teachers I can go back and choose earlier candidates, which justifies sampling a larger fraction of teachers than in the “37% rule” I’ve described. My only defense is that the recommending professor seemed to be familiar with the candidates, so he probably ruled out some of the least compatible candidates.

The experiment

Fast forward to two weeks ago; I was now looking for a new piano teacher. Emboldened by my pet theorem, I emailed nine teachers (about 30% of my options) based on what little information I found online. It was not hard to pick four of the more interesting teachers to meet in person. I have now met three out of those four and I feel glad about my approach, but I am still making sense of the challenges involved. The main challenge is making fair comparisons. I started my search by de facto rejecting all the teachers I did not email based on little or no information; then I rejected some email respondents based on unreliable cues in their messages; and, in the final step, I made snap judgments from meetings that were subject to confounds like mood, time of day, shared expectations, and who knows what else. My two conclusions here are that (1) quality assessments are imprecise, especially when they involve judging a match between people, and (2) “counting” how many candidates have been “assessed” is subjective unless they all receive similar attention under similar conditions.

My next big hitch is about social emotions, not strategy. Specifically, my meetings so far were all at some point comically awkward, because in each case I did not want to admit how widely I had cast my net. I vaguely mentioned “considering my options” and, at most, I acknowledged meeting one other teacher. Although I don’t feel ashamed about my attempts to find a good match, I don’t want to upset teachers who might frown on my approach. After all, no one likes being compared to others or facing rejection. Thus I tried to treat each teacher as my top choice without making false promises. Looking to the future, I also wonder if more awkward situations will develop. Will any teachers gossip disapprovingly about me? Will I bump into “rejected” teachers at future recitals? These questions suggest that trying out multiple teachers has had a minor “emotional cost” for me. I imagine this type of cost could be much heavier in other choice processes like child adoption.

My final complaint with my piano teacher experiment is that it has been resource-intensive. The time and effort spent emailing, driving to, and meeting people has felt subjectively like “too much.” The root of this complaint is that sequentially finding the best piano teacher in my area is not my only goal in life. I have other uses for my time, energy, and gasoline, like finding a part-time job, catching up with friends, and sleeping. Therefore I constantly have to weigh the value of finding a slightly better teacher against improving some other aspect of my life. At some point, my search is no longer worth it (economists, read: diminishing returns or increasing opportunity costs). The inevitable trade-offs we all face could partly explain, from both evolutionary and practical perspectives, why we tend not to sample quite enough alternatives to make the ‘optimal’ choice. Satoshi Kanazawa makes a related point about dating in densely populated cities like New York: at some point there are so many eligible bachelors around that meeting or speed-dating 37% of them is infeasible. We usually settle for “good enough,” and most times we have to.

Murky math and puppy paradoxes

These reflections should make clear that an idealized model of sequential decision making cannot replace mental assets such as good intuition, resourcefulness, and common sense. In spite of the apparent certainty of the “37% rule,” its most useful lesson for daily choices is vague: get a good sense of the candidate pool. In my search for piano teachers, the easiest way to scope out the field has been to contact a lot of teachers directly. But other ways to do that include asking experts or reviewers and drawing on relevant past experiences.

Unfortunately, knowing our options can be just as counterproductive as it is helpful for some highly subjective decisions. As Sheena Iyengar and Barry Schwartz are fond of pointing out, we humans are susceptible to “choice overload.” When we see too many retirement plans or job offers, we take irrational shortcuts and sometimes we feel less satisfied with whatever pick we make. For some of us (depending on culture, personality, and exact circumstances) the most adorable puppy possible is one of the first we see. If beauty is in the eye of the beholder, then the beholder is liable to get fatigued and jaded by alternatives.

It's easiest to have the puppy pick you.

It’s no surprise that marrying a mathematical theorem like the 37% rule with the complexity of human decision-making requires a laundry list of caveats. But even with the caveats, mathematicians and other logical sorts are on hand to help us if we ever get the urge to approximate “rational” thinking. The rest of the time, our unconscious brains can run a decent autopilot for us—and thank goodness for that.

(Disclaimer: I disavow any responsibility, moral or legal, for terrible choices of piano teachers or puppies that result from your reading this post.)

‘Proofiness’: The Wrong Kind of Math

In Book Review on May 4, 2011 at 12:22 am

This guy must be onto something. Look at all those fancy numbers.

Proofiness: The Dark Arts of Mathematical Deception is a lucid exposition of innumerate thinking in its many ugly forms. The author, Charles Seife, notes that a fundamental source of numerical confusion is measurement, which necessarily involves units and a degree of uncertainty stemming from the measuring instrument. Sometimes reported measurements lack units because there is no well-defined quantity to measure: what does it mean for a type of mascara to have “12 times more impact,” as L’Oreal once advertised? Sometimes people treat different units as the same, as New York politicians have done in claiming drastic improvement in their state’s educational performance based on state tests that got easier over time. Even when units are handled correctly, most people misunderstand precision.

The commonest mistake is “disestimation”: assuming an estimate is more precise than it is. Take vote counts: due to all kinds of undercounting and double-counting errors, the margin of error will be at least 2% of the total votes. That means that in cases where the difference in votes between two candidates is tiny (the 2000 presidential election especially), the logical response is to declare a tie. But ties do not sit well with most people, so closely contested elections degenerate into squabbles over hundreds of votes, as if those decisive votes were the only ones subject to error. In one of the most hilarious passages of the book, Seife chronicles the fight over one ballot in Minnesota’s close 2008 Senate race; that particular ballot offered the write-in candidate “Lizard people” but also bubbled in Al Franken for Senate, leading to a heated fight among lawyers and a panel of judges about whether “Lizard people” is a valid individual (the decision: yes, he/she is).

Speaking of error, Seife devotes a chapter to undercutting most polls reported by the press. The typical opinion poll will show the percentage of people who gave each response, along with a “margin of error.” The lurking problem with these polls is that the largest source of error is not acknowledged. “Margin of error” as journalists report it is actually just statistical error due to random variation, which depends on sample size. Much more important is systematic error, skewing of the results due to the design of the survey. Examples of design problems include picking a sample that does not represent the population being studied, wording and ordering questions in a way that influences answers, and asking questions which might tempt people to lie. One glaring example of design failure is internet surveys, which can only include people with decent internet access who volunteer to take the survey based on motives that will probably skew their answers. But sadly, people will exaggerate even in careful face-to-face interviews; that’s why the CDC found in 2007 that heterosexual men somehow have more sexual partners than heterosexual women.
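
To make the sample-size point concrete, here is a minimal sketch of the textbook 95% margin of error for a proportion from a simple random sample. This is the standard formula, not something quoted from Seife's book, and the function name and the sample sizes are hypothetical.

```python
import math

def sampling_margin_of_error(sample_size, proportion=0.5, z=1.96):
    """Margin of error from random sampling alone, for a simple random sample.

    proportion=0.5 is the worst case; z=1.96 corresponds to 95% confidence.
    Systematic error from survey design is NOT captured by this number.
    """
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)

if __name__ == "__main__":
    for n in (250, 1000, 4000):  # hypothetical poll sizes
        print(f"n = {n}: about +/- {sampling_margin_of_error(n):.1%}")
```

A poll of 1,000 people comes out at roughly plus or minus 3 percentage points, and quadrupling the sample only halves that. No sample size shrinks the error that comes from a skewed sample or a leading question.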

In surveying mathematical failures, Seife offers his own cutesy terminology. Sometimes I found it dull: he calls misattributed causation “causuistry,” which is neither memorable nor easy to say. Other times I found myself chuckling. He dubs fitting inappropriate lines and curves to data points “regression to the moon.” This is a play on the phrase “regression to the mean” that gets across the idea that foisting simple models onto complex data leads to wacky conclusions. Case in point: a 2004 Nature paper extrapolates a linear fit for sprinters’ times to argue that women will surpass men in the next century. Seife rejects that as ridiculous, pointing out that the same linear extrapolation would predict sprinters eventually breaking the sound barrier and surpassing the speed of light.

Proofiness is essentially a series of warnings, anecdotes, and lessons. Those three elements dance together gracefully throughout the book, making for an engaging read. So go out and find yourself a copy! Here is some more background on Proofiness if you’re not sold on the book yet:

Simplifying English Spelling? Not So Simple.

In Reality Check on February 26, 2011 at 12:30 am

Wouldn’t it be great if every letter in English spelling only made one sound? Why can’t our language be more “phonetic” like Spanish? Well, it could be, but in some respects it is “too late,” and in any case our way has its perks.

To briefly make my point, here are a few practical obstacles for creating a 1:1 sound:letter script for English.

(1) Transparency of word roots is valuable. “Insane” and “insanity” have such related meanings that spelling them differently to account for pronunciation would be confusing. As another example, the silent /n/ at the end of “column, autumn, condemn” is worth keeping since it gives rise to “columnist, autumnal, condemnation.”

(2) Different spellings are helpful for reading homophones, which are common in English (though not as common as in Chinese, which needs many more meaning-specific morphemes). E.g., “eye” vs “I,” “you” vs “ewe,” “two” vs “too” vs “to.”

(3) Spelling should not reflect nuances of pronunciation that most people do not notice, such as coarticulation, assimilation, and resyllabification, which change the pronunciation of words depending on spoken context. “Cap driver” would be confusing even though we only imagine pronouncing the “b” sound most of the time. Same goes for “apsurd” and for foreign accents (“do you vant some beer?”).

These ideas aren’t mine; I got them mostly from Stanislas Dehaene’s “Reading in the Brain.” It’s a cool book!