The Real Original Source of the Phrase “Big Data”


In early 2013, Steve Lohr of the New York Times published an article in which he tracked down the origin of the phrase “Big Data”. He found several different sources, and declared that it originated in the mid-1990s. But… he specifically opted to conclude that the very earliest source he could find – from 1989 – was not the originator. His reasoning was based on two factors:

  1. He wanted to credit someone who used the phrase in a technical way: “The credit, it seemed to me, should go to someone who was aware of the computing context.”
  2. He did not feel that the original usage of the phrase fitted the same idea of ‘Big Data’ as his. He therefore concluded the first usage was: “not, I don’t think, a use of the term that suggests an inkling of the technology we call Big Data today.”

I read Steve’s article at the time and thought “that’s a little unfair”: he had declared that the first ever use of “Big Data” was not the originator. I keep going back to it, because the first source he found, apparently the original usage of the phrase “Big Data”, was very insightful. It covers perhaps the two biggest issues in relation to data today: its massive worth from a corporate point of view, and its massive privacy implications from a consumer point of view.

The original article was published on July 26th, 1989, under the headline “How Did They Get Your Name? Direct-mail Firms Have Vast Intelligence Network Tracking Consumers”. It was written by Erik Larson (now a best-selling author). The article talks about organisations gathering, joining, and mining data on millions of people, to use for marketing purposes. Here are a couple of example paragraphs:

“We’ve been scavenged by data pickers who sifted through our driving record and auto registrations, our deed and our mortgage, in search of what direct mailers see as the keys to our identities; our sexes, ages, the ages of our cars, the equity we hold in our home.

The scavengers record this data in central computers, which, in turn, merge it with other streams of revelatory data collected from other sources – the types of magazines we subscribe to, the organizations we support, how much credit we’ve got left – and then spit it all out (for a price) to virtually anyone who wants it.”

It goes on to talk about future implications of all of this:

“It is an interesting exercise to imagine the big marketing databases put to use in other times, other places, by less trustworthy souls. What, for instance, might health insurers do with the subscription lists of gay publications?”

Despite the dated & simplistic example, this is of course what many people today worry about: what governments try to regulate, where companies spend millions setting up & utilising systems, what we use in real time to deliver relevant ads to people as they browse websites, and – with a little stretching – what much of the NSA/Edward Snowden stuff was about. It is an article from 1989 talking about one of the biggest issues in technology today. And there, in the middle, is the first ever usage of the phrase “Big Data”:

[Image: the “Big Data” quote from the article]

There’s a copy of the original article over on the Orlando Sentinel website, ironically now full of real-time targeted ads. Erik Larson later released a book expanding on the topic, “The Naked Consumer: How Our Private Lives Become Public Commodities”. Despite being 25 years old, both the article and the book essentially talk about one of the versions of the phrase “Big Data” we use today: a cornerstone of modern marketing from a corporate point of view, and, for many, a privacy worry from a consumer point of view.

BuzzFeed is Watching You

When you visit BuzzFeed, they record lots of information about you.

Most websites record some information. BuzzFeed record a whole ton. I’ll start with the fairly mundane stuff, and then move on to one example of some slightly more scary stuff.

First: The Mundane Bits

Here’s a snapshot of what BuzzFeed records when you land on a page. They actually record much more than this, but this is just the info they pass to Google (stored within Google Analytics):

Here’s a description of what’s going on there:

The first line there is how many times in total I’ve visited the site (above this, which I’ve skipped for brevity, it also records the time I first visited, and a timestamp of my current visit).

Below that, the ‘Custom Var’ block is made up of elements BuzzFeed have actively decided “we need to record this in addition to what Google Analytics gives us out of the box”. Against these, you can see ‘scope’. A scope of ‘1’ means it’s something recorded about the user, ‘2’ means it’s recorded about the current visit, ‘page’ means it’s just a piece of information about the page itself.
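To make the idea of scope a little more concrete, here is a minimal Python sketch. It is not BuzzFeed's or Google's code: the variable names and values are invented for illustration, and the scope labels simply follow the description above (1 = user, 2 = visit, 'page' = page).

```python
from collections import defaultdict

# Scope labels as described above: 1 = user, 2 = visit, "page" = page.
SCOPE_NAMES = {1: "user", 2: "visit", "page": "page"}

# Hypothetical custom variables; names, values and scopes are invented.
custom_vars = [
    {"name": "logged_in", "value": "yes", "scope": 1},
    {"name": "country", "value": "gb", "scope": 2},
    {"name": "page_type", "value": "quiz", "scope": "page"},
]


def group_by_scope(variables):
    """Group custom variables by what they describe: user, visit or page."""
    grouped = defaultdict(dict)
    for var in variables:
        grouped[SCOPE_NAMES[var["scope"]]][var["name"]] = var["value"]
    return dict(grouped)


grouped = group_by_scope(custom_vars)
```

The practical difference is how long each value sticks: something recorded about the user follows them across visits, while a page-level value only describes the page being viewed.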

There you can see other info they’re tracking, including:

  • Have you connected Facebook with BuzzFeed?
  • Do you have email updates enabled?
  • Do they know your gender & age?
  • How many times have you shared their content directly to Facebook & Twitter & via Email?
  • Are you logged in?
  • Which country are you in?
  • Are you a BuzzFeed editor?
  • …and about 25 other pieces of information.

Within this you can also see it records ‘username’. I think that’s recording my user status, and an encoded version of my username. If I log in using 2 different browsers right now, it assigns me that same username string. I’ll caveat that I’m not 100% sure they’re recording that it is ‘me’ browsing the site (i.e. that they’re able to link the data they’re recording in Google Analytics about my activity on the site back to my email address and other personally identifiable information). Either way, everything we’ve covered so far is quite mundane.

The Scary Bit

The scary bit occurs when you think about certain types of BuzzFeed content; most specifically: quizzes. Most quizzes are extremely benign – the stereotypical “Which [currently popular fictional TV show] Character Are You?” for example. But some of their quizzes are very specific, and very personal.

Here, for example, is a set of questions from a “How Privileged are You?” quiz, which has had 2,057,419 views at the time I write this. I’ve picked some of the questions that may cause you to think “actually, I wouldn’t necessarily want anyone recording my answers here”.

When you click any of those quiz answers, BuzzFeed record all of the mundane information we looked at earlier, plus they also record this:

Here’s what they’re recording there:

  • ‘event’ simply means something happened that BuzzFeed chose to record in Google Analytics.
  • ‘Buzz:content’ is how they’ve categorised the type of event.
  • ‘clickab:quiz-answer’ means that the event was a quiz answer.
  • ‘ad_unit_design3:desktopcontrol’ seems to be their definition of the design of the quiz answer that was clicked.
  • ‘ol:1218987’ is the quiz ID. In other words, if they wish, they could say “show me all the data for quiz 1218987” knowing that’s the ‘Check Your Privilege’ quiz.
  • ‘1219024’ is the actual answer I checked. Each quiz answer on BuzzFeed has a unique ID like this, i.e. if you click “I have never had an eating disorder” they record that click.
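The fields listed above can be pulled apart with a few lines of Python. This is only a sketch: I don't know the real payload layout, so it assumes the six fields arrive as a simple list in the order described, and `parse_quiz_event` is a made-up helper name.

```python
def parse_quiz_event(fields):
    """Pull the quiz ID and answer ID out of a list of event fields.

    Assumes the field order described in the list above; the real
    payload layout is not known.
    """
    hit_type, category, action, design, quiz_field, answer_id = fields
    if hit_type != "event" or action != "clickab:quiz-answer":
        return None  # not a quiz-answer click
    prefix, _, quiz_id = quiz_field.partition(":")
    if prefix != "ol":
        return None  # quiz field is not in the expected 'ol:<id>' form
    return {"quiz_id": quiz_id, "answer_id": answer_id, "category": category}


example = [
    "event",
    "Buzz:content",
    "clickab:quiz-answer",
    "ad_unit_design3:desktopcontrol",
    "ol:1218987",
    "1219024",
]
parsed = parse_quiz_event(example)
```

Once each click is reduced to a quiz ID and an answer ID like this, it becomes trivial to filter or count by either.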

In other words, if I had access to the BuzzFeed Google Analytics data, I could query data for people who got to the end of the quiz & indicated – by not checking that particular answer – that they have had an eating disorder. Or that they have tried to change their gender. Or I could run a query along the following lines if I wished:

  • Show me all the data for anyone who answered the “Check Your Privilege” quiz but did not check “I have never taken medication for my mental health”.
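A query like that is not complicated. Here is a rough Python sketch of the logic, under some loud assumptions: the click data is available as (visitor, quiz ID, answer ID) rows, the visitor labels and answer IDs below are invented, and “answered the quiz” is approximated as “clicked at least one answer in it”. Only the quiz ID 1218987 comes from the data shown earlier.

```python
def visitors_who_skipped(events, quiz_id, answer_id):
    """Return visitors who clicked something in the quiz but never answer_id."""
    took_quiz = set()
    checked = set()
    for visitor, quiz, answer in events:
        if quiz != quiz_id:
            continue  # click belongs to a different quiz
        took_quiz.add(visitor)
        if answer == answer_id:
            checked.add(visitor)
    return took_quiz - checked


# Hypothetical rows: (visitor label, quiz ID, answer ID).
events = [
    ("visitor-a", "1218987", "answer-1"),
    ("visitor-a", "1218987", "answer-2"),
    ("visitor-b", "1218987", "answer-1"),
    ("visitor-c", "9999999", "answer-2"),
]
skipped = visitors_who_skipped(events, "1218987", "answer-2")
```

Whether “visitor” here maps back to a named person, a pseudonymous ID, or nothing at all is exactly the part I'm not sure about.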

In BuzzFeed’s defense, I’m sure when they set up the tracking in the first place they didn’t foresee that they’d be recording data from quizzes of this personal depth. This is just a single example, but I suspect this particular quiz would have had less than 2 million views if everyone completing it realised every click was being recorded & could potentially be reported on later – whether that data is fully identifiable back to individual users, or pseudonymous, or even totally anonymous.

What do you think?

.UK Domains Launched – Sorry!

On June 10th 2014, at 8am, Nominet (the UK domain registry) launched “.uk” domains. In other words, I could now move this site to “http://barker.uk” rather than “http://barker.co.uk”.

To announce the launch – the biggest change to UK internet addresses in many, many years – Nominet have launched what they call “the world’s largest welcome sign”, visible from 35,000 feet. Here’s how the Daily Mail described this enormous sign:


Sadly – here’s what you see if you visit the URL on the world’s largest welcome sign:

A shame to have launched the world’s largest welcome sign leading to a large “Sorry…” notice, and a nice lesson to remember to double check your landing pages when running multi-channel campaigns.

Note: If you’d like a full summary of the .uk change, what it means, and what to do about it, feel free to leave a comment and I’ll update this post later.

The John Lewis Email Spam Fine

Part of the email marketing industry in the UK is built around this phrase:

‘in the course of a sale or negotiations for the sale of a product or service’.

Those are the conditions under which – if you have collected an email address – you are allowed to send marketing emails (b2c) to that person, even if they have not explicitly opted in to receive mail from you.

Most sites assume that signing up for an account, or beginning a checkout process, falls within ‘negotiations for the sale of a product or service’. As a result, they consider it perfectly ok to send you abandoned basket emails if you have begun checkout, and it’s fairly standard practice to email users who have registered for an account with you, as long as they have not specifically opted out.

Here’s how the Information Commissioner’s Office talk about this:

John Lewis essentially did exactly that, or considered they had. Here is how the man who took them to court (a Sky News producer) described John Lewis’ argument (from http://news.sky.com/story/1272933/spammer-to-pay-damages-after-court-victory):

To be clear: What John Lewis were doing here is considered fairly good practice. The user signed up for an account. They had the opportunity to opt out & did not. Yet the court still considered it spam & issued a fine.

What does this mean for email marketing? 

If you are a business or a website owner:

  • It may mean you should relook at the wording on your website to make it clear that an account signup is considered ‘negotiation toward a sale’.
  • It may mean you need to speak to your abandoned basket email provider to ask “are we definitely covered here? If not, what do we need to do?”
  • It may mean that your ‘opt out’ box should be more prominent after signing up & that you highlight that the sign up is considered the beginning of a relationship.
  • It may mean you should check through how your existing email addresses have been acquired a little more thoroughly.
  • It may mean some sites need to watch out for scammers, putting in spam claims to try and win the fine money.
  • It may even mean you need to move to double opt-in, or more heavily confirm opt-ins. Anyone can, of course, enter anyone else’s email address on a form, so a signup is not necessarily confirmation from the actual email owner that they wish to receive your communications.
  • It may mean you should think about not emailing users unless they have explicitly ticked a box, even though the Information Commissioner says it’s fine to do just that under some conditions.
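For anyone unsure what double opt-in involves in practice, here is a minimal in-memory Python sketch. The class and method names are hypothetical, and it leaves out everything a real system needs around it (actually sending the confirmation email, token expiry, persistent storage). The point is only that an address becomes mailable after its owner clicks a link containing a token sent to that address.

```python
import secrets


class DoubleOptIn:
    """Sketch: an address is mailable only after its token is confirmed."""

    def __init__(self):
        self._pending = {}       # token -> email awaiting confirmation
        self._confirmed = set()  # emails whose owner clicked the link

    def request_signup(self, email):
        """Record a signup; return the token to put in the confirmation email."""
        token = secrets.token_urlsafe(32)
        self._pending[token] = email.strip().lower()
        return token

    def confirm(self, token):
        """Confirm the address behind a valid token; tokens are single-use."""
        email = self._pending.pop(token, None)
        if email is None:
            return False
        self._confirmed.add(email)
        return True

    def can_email(self, email):
        """True only for addresses that completed the confirmation step."""
        return email.strip().lower() in self._confirmed


optin = DoubleOptIn()
token = optin.request_signup("Someone@Example.com")
before = optin.can_email("someone@example.com")
confirmed = optin.confirm(token)
after = optin.can_email("someone@example.com")
```

Someone typing a stranger's address into your form gets as far as `request_signup`, but without access to that inbox they never reach `confirm`, so the address never becomes mailable.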

Or, this may just be a fluke, and another court may decide a similar case entirely differently.