
‘Everybody Lies: Big Data, New Data’ By Seth Stephens-Davidowitz

I must confess that I have learned a lot from this book. The more I learned about the true nature of human beings, the wider the range of feelings I experienced… I felt disgusted, scared, then as the stories moved along, I felt surprised and started laughing! Then I felt angry again. The author used Google searches to measure racism, self-induced abortion, depression, child abuse, hateful mobs, the science of humor, sexual preference, anxiety, son preference, and sexual insecurity, among many other topics. This great emotional roller coaster of a book is written by the internet data expert Seth Stephens-Davidowitz.

Davidowitz worked for one-and-a-half years as a data scientist at Google and is currently a contributing op-ed writer for the New York Times. He is a former visiting lecturer at the Wharton School at the University of Pennsylvania. He received his BA in philosophy, Phi Beta Kappa, from Stanford, and his PhD in economics from Harvard.

His 2017 book that I will review today, Everybody Lies, published by HarperCollins, was a New York Times bestseller; a PBS NewsHour Book of the Year; and an Economist Book of the Year.

Enjoy.

For this study, Davidowitz downloaded all of Wikipedia, pored through Facebook profiles, and scraped Stormfront. Plus, PornHub gave him its complete data on the searches and video views of anonymous people around the world.

Google Trends, a tool that was released with little fanfare in 2009, tells users how frequently any word or phrase has been searched in different locations at different times. The everyday act of typing a word or phrase into a compact, rectangular white box leaves a small trace of truth that, when multiplied by millions, eventually reveals profound realities.

RELATIONSHIPS: On Google, the top complaint about marriage is not having sex: Searches for “sexless marriage” are three and a half times more common than “unhappy marriage” and eight times more common than “loveless marriage”. The most common relationship-themed search is “abusive relationship” 😦


Sometimes new data reveals cultural differences: The writer gives the example of the different ways that men around the world respond to their wives being pregnant. In Mexico, the top searches about “my pregnant wife” include “frases de amor para mi esposa embarazada” (words of love for my pregnant wife) and “poemas para mi esposa embarazada” (poems for my pregnant wife). In the United States, the top searches include “my wife is pregnant now what” and “my wife is pregnant what do I do.” (Blogger’s note: Nope. Not as romantic.)

After daters carried recorders with them and data analysts transcribed the audio into text, we had an interesting set of information about how to have a successful first date! On the first date, for instance, one of the ways a man signals that he is attracted is obvious: he laughs at a woman’s jokes. When speaking, interested men also limit the range of their pitch. There is research suggesting that a monotone voice is often seen by women as masculine. The scientists found that a woman signals her interest by varying her pitch, speaking more softly, and taking shorter turns talking. A woman is unlikely to be interested when she uses hedge words and phrases such as “probably” or “I guess”. On the other hand, a woman is likely to be interested if she uses “I” and self-marking phrases such as “Ya know?” and “I mean”. This research revealed that men are more likely to report clicking with a woman who talks about herself. If lots of questions are asked on a date, it is less likely that both will report a connection.


Among the Facebook data scientists’ findings, Christmas is one of the happiest days of the year, BUT Davidowitz doesn’t trust Facebook data very much: He thinks Facebook is a digital brag-to-my-friends-about-how-good-my-life-is serum. In the Facebook world, family life seems perfect; in the real world, family life is messy. So don’t trust relationship posts very much.

Compare, for example, the way that people describe their husbands on public social media and in anonymous searches:

[Image: how people describe their husbands on social media versus in anonymous searches]

So human beings are liars? NO WAY! haha!

FEMALE-MALE: Parents are 2.5 times more likely to ask “Is my son gifted?” than “Is my daughter gifted?” Parents show a similar bias when using other phrases related to intelligence that they may shy away from saying aloud, like “Is my son a genius?”. What is funny is that in American schools, girls are 9 percent more likely than boys to be in gifted programs. So what do parents search about daughters more often? “Is my daughter overweight?” Parents Google this phrase roughly twice as frequently as they Google “Is my son overweight?”

Parents are also 1.5 times more likely to ask whether their daughter is beautiful than whether their son is handsome. And they are 3 times more likely to ask whether their daughter is ugly than whether their son is ugly. (Author’s note: How Google is expected to know whether a child is beautiful or ugly is hard to say 🙂 ) (Blogger’s note: Maybe parents are the ones causing females’ insecurities about how they look?) In general, parents seem more likely to use positive words in questions about sons.

Another interesting male/female difference: women use the word “tomorrow” far more often than men do. Adding the letter “o” to the word “so” like “Sooo” is one of the most feminine linguistic traits.

On Facebook, among the words used much more frequently by men than women are “fuck”, “shit”, “bullshit”, “fucking” and “fuckers” (Blogger’s note: Dear men, take a chill pill! haha), whereas for women they are “shopping”, “excited”, “cute”, “happy”, “family”, “soooo” and “yay” (Blogger’s note: Soooo happy to be a woman! Yay!)

SEX: Data science makes many parts of Freud falsifiable: it puts many of his famous theories to the test. For example, take Freud’s theory of phallic symbols (objects shaped like male genitals) in dreams. According to big data, the substance most often dreamed about is water. The top twenty foods include chicken, bread, sandwiches, and rice, all notably un-Freudian. Bananas are the second most common fruit to appear in dreams. But they are also the second most commonly consumed fruit.


Consider all Google searches of the form “I want to have sex with my…” The number one way to complete this search is “mom”. 😮 Overall, more than 3/4 of searches of this form are incestuous. Again, according to Google and PornHub Data, men retain an inordinate number of fantasies related to childhood (including mom, babysitter, wearing diapers, breast feeding, etc.)

Americans search for “porn” more than they search for “weather”.

There are twice as many complaints that a boyfriend won’t have sex than that a girlfriend won’t have sex. By far, the number one search complaint about a boyfriend is “My boyfriend won’t have sex with me”.

Do women care about penis size? Rarely, according to Google searches. More than 40% of complaints about a partner’s penis size say that it’s too big. For every search women make about a partner’s phallus, men make roughly 170 searches about their own! Men’s second most common sex question is how to make their sexual encounters longer. Once again, the insecurities of men do not appear to match the concerns of women. Women’s concern isn’t about when or how long it happened but why it isn’t happening at all.

However, women still outpace them when it comes to insecurity about how they look.


In 2004, in some parts of the US, the most common search regarding changing one’s butt was how to make it smaller. Beginning in 2010, however, the desire for bigger butts grew. Does women’s growing preference for a larger bottom match men’s preferences? Interestingly, yes. The internet data also says men show a preference for large breasts, but natural ones: About 3 percent of big-breast porn searches explicitly say they want to see natural breasts. (Blogger’s note: Thank you Beyonce, J-Lo, Rihanna and Kim Kardashian!!)

Men make as many searches looking for ways to perform oral sex on themselves as they do for how to give a woman an orgasm. (This was among the author’s favorite facts in Google search data.) (Blogger’s note: How is it even possible?? And nope, I won’t Google it!)

RACISM: Roughly one in every hundred Google searches that included the word “Obama” also included “kkk” or “nigger(s)”. There was a darkness and hatred that was hidden from the traditional sources but was quite apparent in the searches that people made. Places with the highest racist search rates included upstate New York, western Pennsylvania, eastern Ohio, industrial Michigan and rural Illinois, along with West Virginia, southern Louisiana and Mississippi. The true divide, Google search data suggested, was not South versus North; it was East versus West, and racism was not limited to Republicans.

Black Americans told polls they would turn out in large numbers to oppose Trump. But Google searches for information on voting in heavily black areas were way down.

You can see this on Google, where users sometimes ask questions such as “Why are black people rude?” or “Why are Jews evil?” Below, in order, are the top five negative words used in searches about various groups:

[Image: Top 5 negative words used in searches for specific groups]

Shortly after the mass shooting in San Bernardino, California on December 2, 2015, the top search in California containing the word “Muslims” was “kill Muslims”.

What is super interesting is that when Obama gave more speeches on TV about ‘equality’ and ‘racism’, it had the opposite effect. In his speech, he said “It is the responsibility of all Americans – of every faith – to reject discrimination.” Searches calling Muslims “terrorists”, “bad”, “violent” and “evil” doubled during and shortly after the speech. (But then one of Obama’s speeches succeeded. To find out which one, keep reading!)

Guess when are searches for “nigger(s)” or “nigger jokes” most common? Whenever African-Americans are in the news. Among the periods when such searches were highest was the immediate aftermath of Hurricane Katrina, when television and newspapers showed images of desperate black people in New Orleans struggling for their survival. They also went up during Obama’s first election. And searches for “nigger jokes” rise on average about 30% on Martin Luther King Jr. Day. Davidowitz claims that there is a hidden explicit racism in the USA.

HURRICANE KATRINA VICTIMS OUTSIDE SUPERDOME
Victims of Hurricane Katrina argue with National Guard Troops as they try to get on buses headed to Houston, TX on Thursday morning, September 1, 2005.

POLITICS: Nate Silver, an American statistician and writer, noticed that the areas where Trump performed best made for an odd map. Silver looked for variables that could explain this map. He found that the single factor that best correlated with Donald Trump’s support in the Republican primaries was how often an area made Google searches for “nigger”.

Google searches for “how to vote” or “where to vote” weeks before an election can accurately predict which parts of the country are going to have a big showing at the polls.

The most important year for developing political views is age 18 (Blogger’s note: I guess voting age being 18 was the right decision).

STRESS: Google searches reflecting anxiety tend to be higher in places with lower levels of education, lower median incomes, and where a larger portion of the population lives in rural areas.

The author was surprised by one fact. You would think that people would search for more jokes when they are sad or depressed, to cheer themselves up. However, data shows that searches for jokes are lowest on Mondays, the day when people report they are most unhappy. They are also lowest on cloudy and rainy days. People actually seek out jokes when things are going well in life.

In winter months, warm climates, such as that of Honolulu, Hawaii, have 40 percent fewer depression searches than cold climates, such as that of Chicago, Illinois.

HEALTH: In search data, looking up back pain and then yellowing skin turned out to be a sign of pancreatic cancer; searching for back pain alone made it unlikely that someone had pancreatic cancer. This pattern of symptoms hadn’t been recognized before.

We tend to overestimate the prevalence of anything that makes for a memorable story. People rank tornadoes as a more common cause of death than asthma. In fact, asthma causes about seventy times more deaths. But deaths by asthma don’t stand out and don’t make the news. The same goes for flu versus shark attacks.

Search rates for self-induced abortion were fairly steady from 2004 through 2007. They began to rise in late 2008, coinciding with the financial crisis and the recession that followed. They took a big leap in 2011, jumping 40%, when 92 state provisions restricting access to abortion were enacted. The state with the highest rate of Google searches for self-induced abortions is Mississippi, a state with roughly three million people and, now, just one abortion clinic.

UNEMPLOYMENT: Google engineers created a service, Google Correlate, that gives outside researchers the means to experiment with the same type of analyses across a wide range of fields. One day Davidowitz put the US unemployment rate from 2004 through 2011 into Google Correlate. Of the trillions of Google searches during that time, what do you think turned out to be the most tightly connected to unemployment? “New jobs”? No. It was “Slutload”. That’s right, the search that tracked unemployment most closely was for a pornographic site. Many unemployed people are stuck at home, alone and bored. The second most closely correlated search: Spider Solitaire. Again, not so surprising.
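For the curious, here is a rough sketch of the idea behind this kind of analysis: take one target time series and rank candidate search-term series by how strongly each correlates with it. This is only an illustration in Python, not how Google Correlate was actually implemented. The function names, the term names and all the numbers are made up for the example.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def rank_by_correlation(target, candidates):
    """Return (term, correlation) pairs, most positively correlated first."""
    scored = [(term, pearson(target, series)) for term, series in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy numbers only (NOT real unemployment or search data).
target = [1, 2, 3, 4, 5]
candidates = {
    "term_a": [2, 4, 6, 8, 10],  # rises in lockstep with the target
    "term_b": [5, 4, 3, 2, 1],   # moves in the opposite direction
    "term_c": [1, 3, 2, 5, 4],   # loosely follows the target
}

ranked = rank_by_correlation(target, candidates)
for term, r in ranked:
    print(f"{term}: {r:.2f}")
```

Note that a high correlation only says two series moved together. As the book stresses elsewhere, in the prediction business that can be enough, but it does not explain why.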

More rich people in a city means the poor there live longer. Poor people in NYC, for example, live a lot longer than poor people in Detroit. Contagious behavior may be driving some of this (behaviors like healthy eating habits, exercising, less stress, etc.)

SPORTS: The data tells us that in worse-off families, in worse-off communities, there are NBA-level talents who are not in the NBA.

Internet data shows that the most important year in a man’s life for the purposes of cementing his favorite baseball team as an adult, is when he is more or less 8 years old. This peak age for women is 22.

HOW BIG DATA WORKS: In the prediction business, you just need to know that something works, not why. For example: Before a hurricane hit the Southeast in 2004, Walmart (the biggest supermarket chain in the US) suspected, correctly, that people’s shopping habits may change when a city is about to be pummeled by a storm. They pored through sales data from previous hurricanes to see what people might want to buy. A major answer? Strawberry Pop-Tarts. This product sells seven times faster than normal in the days leading up to a hurricane. We don’t ask why, we care about what.


By using Google Ngrams, you can search through millions of digitized books for particular words or phrases. This way, you can see how the popularity of a phrase has changed over hundreds of years.

If you type “Why is…” the first two Google auto-completes currently are “Why is the sky blue?” and “Why is there a leap day?”, suggesting these are the two most common ways to complete this search. The third: “Why is my poop green?” And Google auto-complete can get disturbing. Today, if you type “Is it normal to want to…” the first suggestion is “kill”. If you type in “Is it normal to want to kill…” the first suggestion is “my family”.

People are seven times more likely to ask Google whether they will regret not having children than whether they will regret having children. Adults with children are 3.6 times more likely to tell Google they regret their decision than are adults without children.

MARKETING: Across the internet, researchers found 949 scanned yearbooks from American high schools spanning the years 1905-2013. Americans, and particularly women, smiled more and more as the years passed. They went from nearly stone-faced at the start of the twentieth century to beaming by the end.

When photographs were first invented, people thought of them like paintings. Subjects in photos adopted the same look. In the mid-20th century, Kodak, the film and camera company, was frustrated by the limited number of pictures people were taking and came up with a strategy to get them to take more. Kodak’s advertising began associating photos with happiness. (Blogger’s note: SO CLEVER, isn’t it? This reminded me of the increase in dental cleaning appointments since ‘selfies’ became a thing. Although, I’m sure dentists are not behind this whole selfie craziness).


Answer these questions for me:

  • Have you ever cheated on an exam?
  • Have you ever killed someone in your dream?

Were you tempted to lie? Many people under-report embarrassing behaviors and thoughts on surveys. They want to look good, even though most surveys are anonymous. This is called social desirability bias. Why do people lie on anonymous surveys? Roger Tourangeau, a research professor emeritus at the University of Michigan, explains: “About 1/3 of the time, people lie in real life. The habits carry over to surveys.”

Netflix was confused in the beginning. When it let its users create a queue of movies to watch later, it noticed something odd: when users were reminded of these movies later, they rarely clicked. That was because they were filling the queue with award-winning highbrow films, like black-and-white World War II documentaries or foreign movies. But when they came home from work, they were clicking on lowbrow comedies or romance films. They were lying to themselves. So, Netflix created an algorithm based on what users actually chose to watch. A former data scientist at Netflix, Xavier Amatriain, says “The algorithms know you better than you know yourself”.

Also, apparently on Facebook or YouTube, “Content is more likely to become viral the more positive it is”. (Blogger’s note: I found this very strange when I think about my news feed, which is full of bad news.)

CRIME: For every percentage point increase in the unemployment rate, there was an associated 3 percent increase in the search rate for “child abuse” or “child neglect”. The author argues that it’s safe to say that the Great Recession did make child abuse worse; the traditional measures did not show it, but Big Data did.

BIG DATA LESSON: When we lecture angry people, the search data implies that their fury can grow. But subtly provoking people’s curiosity, giving new information, and offering new images of the group that is stoking their rage may turn their thoughts in different, more positive directions. According to Google search data, one of Obama’s most successful speeches was this: “Muslim Americans are our friends and our neighbors, our co-workers, our sports heroes and yes, they are our men and women in uniform, who are willing to die in defense of our country.”


DOPPELGANGERS AND A/B TESTING

Doppelgangers play a huge role in predicting. Let’s see what it means first:

Doppelganger: an apparition or double of a living person.

Of course, we are not talking about people who look exactly the same. We are talking about comparing data to find the most similar people in a database, and then using what happened to them to predict the future of the person in question.

Think about a sick person suffering from certain symptoms. She is 34 years old, 5’5 tall, and weighs 132 lbs; she has no smoking or drinking habits; she had pneumonia when she was seven; and she lives in Kentucky. If we run the analysis and find that, once upon a time, a 34-year-old woman who lived in Kentucky, stood 5’5, weighed around 132 lbs, had no smoking or drinking history, and had pneumonia when she was little suffered from the same symptoms, we can predict what might happen to the patient. What happened to her doppelganger(s) will probably happen to her as well.
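If you like to see ideas as code, the patient example above can be sketched as a nearest-neighbor search: describe each person as a list of numbers, measure how far apart two people are, and keep the closest matches. This is only a minimal sketch in Python under my own assumptions. The records, the feature order, the scaling values and the function names are all hypothetical, and the book does not say which similarity measure real systems use.

```python
from math import sqrt

# Feature order (an assumption for this sketch):
# (age in years, height in inches, weight in lbs, smoker 0/1, had pneumonia 0/1)
# Rough scale of each feature, so no single unit dominates the distance.
SCALES = (10.0, 3.0, 20.0, 1.0, 1.0)


def distance(a, b, scales=SCALES):
    """Scaled Euclidean distance between two feature tuples."""
    return sqrt(sum(((x - y) / s) ** 2 for x, y, s in zip(a, b, scales)))


def find_doppelgangers(patient, records, k=2):
    """Return the k records whose features are closest to the patient's."""
    ranked = sorted(records, key=lambda rec: distance(patient, rec["features"]))
    return ranked[:k]


# Hypothetical database of past patients (invented for illustration).
records = [
    {"id": "A", "features": (34, 65, 130, 0, 1)},
    {"id": "B", "features": (60, 70, 200, 1, 0)},
    {"id": "C", "features": (33, 64, 135, 0, 1)},
    {"id": "D", "features": (35, 65, 132, 1, 1)},
]

# The 34-year-old, 5'5 (65 in), 132 lb non-smoker who had pneumonia as a child.
patient = (34, 65, 132, 0, 1)

matches = find_doppelgangers(patient, records, k=2)
print([rec["id"] for rec in matches])
```

Once the closest matches are found, the prediction step is simply to look at what happened to those people next. How the features are scaled matters a lot, since it decides what counts as "similar".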

Let’s explain it with a real example: the 35-year-old baseball player Ortiz was about to be let go because of his age. But data analysts went through every piece of information about him and compared it with historical records. They found 20 ballplayers who had played like he did at ages 24 through 33. Then they looked at how the careers of Ortiz’s doppelgängers had progressed. The data showed that Ortiz was about to enter the peak of his career. So Boston decided to be patient with its aging slugger. And they won. He took his team to the World Series at the age of 37, and he was also voted World Series MVP.

So how about A/B testing? What is it?

A/B testing basically means showing two different versions of something to two comparable groups and comparing the results. If Google wants to know how to get more people to click on ads on its sites, it may try two shades of blue in ads: one shade for Group A, another for Group B. Then Google can compare the click rates. Facebook now runs a thousand A/B tests per day. A former Google employee, Dan Siroker, used A/B testing for Barack Obama’s first presidential campaign. He A/B tested the campaign home page, using 3 different pictures (see below), 2 different slogans, and 3 different ‘click buttons’. Which one do you think got more clicks, and so more donations?

[Image: the pictures tested on the campaign home page]

The winner was the picture of Obama’s family and the button “Learn More”.
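Comparing the click rates of Group A and Group B, as described above, usually comes with one more question: is the difference real or just luck? One common way to check is a two-proportion z-test. The sketch below is my own illustration in Python, not something from the book and not necessarily what Google or the Obama campaign used. The function name and the visitor and click counts are invented.

```python
from math import erf, sqrt


def two_proportion_z_test(clicks_a, visitors_a, clicks_b, visitors_b):
    """Compare two click rates; return (rate_a, rate_b, z, two-sided p-value)."""
    rate_a = clicks_a / visitors_a
    rate_b = clicks_b / visitors_b
    # Pooled click rate under the assumption that A and B perform the same.
    pooled = (clicks_a + clicks_b) / (visitors_a + visitors_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    z = (rate_b - rate_a) / std_err
    # Standard normal CDF written with erf, then doubled for a two-sided test.
    normal_cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    p_value = 2 * (1 - normal_cdf)
    return rate_a, rate_b, z, p_value


# Invented counts: version A got 200 clicks and version B got 260 clicks,
# each shown to 10,000 visitors.
rate_a, rate_b, z, p_value = two_proportion_z_test(200, 10_000, 260, 10_000)
print(f"A: {rate_a:.2%}  B: {rate_b:.2%}  z = {z:.2f}  p = {p_value:.4f}")
```

A small p-value suggests the gap between the two versions is unlikely to be chance alone. With many variants at once (pictures, slogans and buttons), testers also have to be careful about running lots of comparisons, which this simple sketch ignores.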

EDUCATION: What makes some places better at allowing a poor kid to have a pretty good life?

  • Areas that spend more on education provide a better chance to poor kids.
  • Places with more religious people and lower crime do better.
  • Places with more black people do worse. Interestingly, this has an effect on not just the black kids but on the white kids living there as well.
  • Places with lots of single mothers do worse.

To see which cities/states produce the most successful people, the author zoomed in on baby boomers who have an entry in Wikipedia. Roughly 1 in 1,209 baby boomers (born between 1946 and 1964) born in California earned a Wikipedia entry. Roughly 1 in 748 baby boomers born in Suffolk County, Massachusetts, where Boston is located, made it to Wikipedia. Be careful: not ‘went to school’ there but ‘born’ there. The reason for this seems to be early exposure to innovation. Besides, New York City apparently produces notable journalists at the highest rate, Boston produces notable scientists at the highest rate and Los Angeles produces notable actors and actresses at the highest rate.

The greater the percentage of foreign born residents in an area, the higher the proportion of children born there who go on to notable success (Davidowitz’s note: Take that, Donald Trump!) (Blogger’s comment: Ditto that!)

Education spending did not correlate with rates of producing notable writers, artists, or business leaders.

What I found very motivational as a reader was on page 237. Even if you didn’t graduate from the best school, you still have an equal chance to be as successful as the people who graduated from Harvard, MIT or Stanford. People adapt to their experience, and people who are going to be successful find advantages in any situation. The factors that make you successful are your talent and your drive.

________________________________________________________________________

Davidowitz also discusses other interesting topics that I don’t cover here, such as “who gets loans more easily, who doesn’t”, “how much the casinos let you lose”, and what other jaw-dropping data analyses are being used in marketing.

In his final words, he emphasizes that social science is becoming a real science (Blogger’s note: Finally!!! I was so tired of people not seeing social science as real science! Thank you Davidowitz!). And this new, real science is poised to improve our lives.

If a violent movie comes to a city, does crime go up or down? If more people are exposed to an ad, do more people use the product? If a baseball team wins when a boy is twenty, will he be more likely to root for them when he is forty? These are all clear questions with clear yes-or-no answers. And in the mountains of honest data, we can find them.

This is the stuff of science, not pseudoscience.


I strongly recommend this book! And after you read it, please feel free to share your comments below 🙂

-Ece

Author:

Science lover, book enthusiast, a nerd who dedicated herself to education.
