Twitter and Google are naming people they should not

Twitter and Google regularly do something that – if you or I did it – would be breaking the law. They reveal the identities of people who courts have decided should not be named. If newspapers and members of the general public name them, there are very serious repercussions. Yet Google & Twitter’s algorithms seem free to do this.

Today they are doing it again, in relation to the killers of a 39-year-old woman.

One of the saddest legal cases you may ever read is that of Angela Wrightson. She was killed by two teenagers – 13 and 14 at the time – who inflicted more than 100 injuries on her. Today a judge sentenced them both to life in prison, serving a minimum of 15 years. The full, shocking, detail is here: https://www.judiciary.gov.uk/wp-content/uploads/2016/04/sentence_F_D.pdf

In considering whether Angela’s killers should remain anonymous, or whether newspapers, news media, and the general public should be allowed to reveal their identities, the judge said many things, including that naming them is likely to pose a great danger to them:

The judge’s summary was this:

I suspect some of us would agree with the above, and some would argue that they are murderers, and have brought it upon themselves. Either way, however strong they are, your feelings & my feelings are irrelevant: a judge has decided that there is a ‘real and present danger’ to these two girls, has referenced suicide attempts, and has therefore concluded that they must remain anonymous.

Yet, clicking on the victim’s name on Twitter, which has trended for much of the day, reveals 2 girls’ names:

Angela’s name trended across the UK. In other words, if you have used Twitter today, you were a single click away from seeing the names of two people who a judge had deemed should not be named.

And, as I have written about before, Google does a very similar thing, at the foot of search results:

This happens automatically, because both Google & Twitter have algorithms that associate related searches with each other. In other words, the algorithms are breaking the law.
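To make the mechanism concrete, here is a minimal toy sketch of how a ‘related searches’ feature could surface a name purely because users mention it alongside a trending term. It is an illustration only: it is not Twitter’s or Google’s actual system, and the function names (`build_related`, `related_searches`) and the placeholder terms are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_related(posts):
    """Count how often each pair of terms appears together in one post."""
    related = defaultdict(Counter)
    for terms in posts:
        # Each distinct pair of terms in a post counts once, in both directions.
        for a, b in combinations(sorted(set(terms)), 2):
            related[a][b] += 1
            related[b][a] += 1
    return related


def related_searches(related, query, limit=3):
    """Return the terms most often seen alongside the query, most frequent first."""
    return [term for term, _ in related[query].most_common(limit)]


# Placeholder data: each inner list stands for the terms found in one user post.
posts = [
    ["victim name", "sentencing"],
    ["victim name", "protected name"],
    ["victim name", "protected name"],
    ["protected name", "school"],
]

related = build_related(posts)
print(related_searches(related, "victim name"))
```

Nothing in this sketch knows that one of the terms is covered by a court order: the ranking is driven entirely by what users post, which is why a protected name can float to the top unless it is explicitly suppressed.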

I have written about this several times over the last few years:

  • Twitter & Google both showed photographs of one of James Bulger’s killers, when searching for broadly related terms: http://barker.co.uk/algolaw
  • The footballer Adam Johnson’s 15-year-old victim was named: http://barker.co.uk/algorithmlaw
  • In recent days, a very high profile celebrity was named by Twitter’s algorithm, when searching for the initials he had been given by the UK courts to conceal his identity.
  • And, again, it is happening today.

It is not right that this should happen. It is dangerous both in cases like this – for the killers, for their friends, and those associated with them – and it is most definitely not right in examples where victims are named.

The Footballer Adam Johnson, Search Algorithms, and the Law

Just over 3 years ago I wrote a post about 2 algorithms that appeared to be doing something that – if you or I did it – would break the law: Google’s image search algorithm and Twitter’s keyword search algorithm. At that time, they were displaying pictures of one of the killers of James Bulger – someone who is not legally allowed to be identified under UK law.

This week, there was a very high profile story in the media about Adam Johnson, a footballer who was found guilty of a child sex charge. It is illegal in the UK to identify the victim of a sexual offence. There has been much said about individuals naming the victim in this case, but less said about the capability for algorithms to also do so.

Here is a quick look at whether Twitter & Google have managed to improve over the last 3 years, or whether there is still a chance they may inadvertently break the law in this way.

Twitter’s Algorithm

Adam Johnson’s name was one of the top trends in Twitter for much of the day. A click on the trending term took users to this search results page:

As you can see, there are some quite nasty ‘related searches’ displayed for his name. A click on the 2nd related search leads to this result:

I have blurred 2 entries there: A Twitter username whose account has since been deleted, and what appears to be a woman’s name. I do not know whether either of these is the victim, but it’s worrying that Twitter aren’t on top of suppressing these on such high profile trends.

(update: On checking several hours after publishing this post, and after ‘Adam Johnson’ stopped trending, some Twitter results have now been cleaned.)

Google’s Algorithm

Google’s algorithm fares a little better at first. It is only when reaching the foot of search results that their ‘related searches’ appear. Here are the results at the foot of the first search results page:

As you can see, several names are mentioned there. The first 2 names have been mentioned many, many times in the press – Adam Johnson’s former partner, and daughter. ‘adam johnson 15 year old’ is also present, but the results aren’t quite as nasty as the first set of Twitter ‘related’ searches. But, on clicking ‘stacey flounders’, and scrolling to related searches there, the following appears:

Again, as you can see, I have blurred the results partially, where Google lists a person’s name which cannot be explained by other means (2 of the other, unblurred names there have appeared in other sad news stories, and are explainable). The above is simply when clicking the name of Adam Johnson’s partner. When clicking the ‘adam johnson 15 year old’ related search, the following appears:

Again, I have blurred a result there. As you can see, all of the related results are quite nasty here, but the blurred one in particular is very worrying. Ie: As with Twitter’s algorithm, Google is specifically naming someone who may/may not be the victim in the case. Additionally, Dan Bell noted a similar issue appears in Google Image Search. As Google themselves should not know the name of the victim, again, it is worrying that a name is allowed to appear here. (I have attempted to notify them.)

Summary

It is over 3 years since I last wrote about this topic, where both Twitter’s & Google’s algorithms appeared to be displaying results which could break the law.

I do not know in the above examples if the names they display are the victim in this case (frankly I hope not). Either way, it is worrying that both Google & Twitter’s search algorithms seem still to be capable of doing something that would likely be illegal for any person in the UK to do. It is concerning both from the broad point of view of algorithms breaking the law (and causing harm to individuals), and from the narrower point of view of this individual case.

Is the Cookie Law Being Enforced in the UK?

In 2012, “the cookie law” was implemented in the UK (it was actually a year earlier, but UK organisations were given a year’s grace period). I put in a ‘Freedom of Information’ request to the Information Commissioner’s office to see how they’re currently enforcing the law. Ashley Duffy (Lead Information Access Officer at the ICO) very kindly responded.

This post has a little bit of preamble, the numbers on how many ‘concerns’ have been raised about cookies by members of the public, detail on how the ICO has generally responded, and a summary.

Cookie Law?

The law essentially says you must tell your users prominently if your site is using cookies. Of course, by 2012 when the law began being enforced, almost every site on the web was using cookies, and therefore this meant every business in the UK rushed to do something to try and understand their requirements and comply with this new law. The Information Commissioner’s Office (who are responsible for policing this in the UK) flipped & flopped a little bit on what was acceptable for sites to do to gain consent that their visitors were happy to be tracked via cookies, but eventually agreed that ‘implied’ consent was a valid way for sites to achieve this. This is the approach that virtually every UK site now follows.

Here’s the ICO’s bullet-point guidance on what ‘implied consent’ means:

Some sites choose to take that to mean “we have to place a strip across the top of the site telling everyone”, some read it as “we just have to have a link in the footer that says ‘cookies’”, etc. In summary though – it means almost without exception, sites in the UK place cookies without the user taking a specific ‘explicit’ action to say they’re happy with that. In other countries the approach is much stricter – for example in France many sites avoid placing any cookies until the user has either accepted, or clicked/scrolled on a page.

Enforcement: The Numbers

The Information Commissioner’s office very kindly replied to a Freedom of Information request I put in, asking for a breakdown of complaints & their response so far. They publish much of this info on their website, but it’s a tiny bit out of date & missing one or two answers. Here is the number of complaints (they refer to a complaint as a ‘concern’) that have been expressed to the Information Commissioner’s office, broken down over the last few years:

In other words, there have been a total of 1,023 ‘concerns’ raised by members of the public in the time since the law began being enforced. The number has dropped over time, with more than 50% of all complaints happening in the first 6 months after enforcement began, and only 7% of complaints in the last 6 months.

As context, the ICO received 47,465 ‘concerns’ about unwanted marketing communications between April & June 2014. In other words – if you’ve been doing the maths there, you’ll have noticed these 2 key stats:

  • Between July & September there was roughly only 1 complaint every 3 days.
  • Between April & June (all being equal) – it was 1,249x more likely for a company to be complained about as a result of marketing communications than as a result of improperly informing users about cookies.
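As a rough check on those figures, the short sketch below works backwards from the numbers quoted in the text. The cookie ‘concern’ count for April to June is not quoted here, so the value used is inferred from the stated 1,249x ratio and should be treated as an assumption; the variable names are illustrative only.

```python
# Figures quoted in the post.
MARKETING_CONCERNS_APR_JUN_2014 = 47_465
STATED_RATIO = 1_249
TOTAL_COOKIE_CONCERNS = 1_023
SHARE_IN_LAST_SIX_MONTHS = 0.07

# Assumption: the April-June cookie count is inferred from the stated ratio,
# not taken from the ICO's own breakdown.
implied_cookie_concerns = round(MARKETING_CONCERNS_APR_JUN_2014 / STATED_RATIO)

# Recompute the ratio from the inferred count as a consistency check.
recomputed_ratio = MARKETING_CONCERNS_APR_JUN_2014 / implied_cookie_concerns

# 7% of all cookie concerns fell in the most recent six months.
concerns_last_six_months = TOTAL_COOKIE_CONCERNS * SHARE_IN_LAST_SIX_MONTHS

print(implied_cookie_concerns)          # 38
print(round(recomputed_ratio))          # 1249
print(round(concerns_last_six_months))  # 72
```

On these numbers, roughly 38 cookie concerns in a quarter sit against more than 47,000 marketing concerns, which is where the ‘1,249x’ comparison comes from.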

The ICO Response

The ICO give a good level of detail breaking down the above ‘concerns’ and their response:

  • Among the 1,023 complaints, there were 52 sites which were complained about more than once.
  • Following the 1,023 complaints since the ‘cookie law’ rolled out, the ICO say they have written to 275 organisations “where a complaint has been received about a website”. Absolutely no formal action has been taken (ie. no prosecution, fine, etc).
  • “27 larger sites have been investigated. We have prioritised those sites that are most-frequently visited by UK individuals (sites ranked within the top 200 most-visited in the UK). We have rated these sites as red, amber or green depending on the steps taken towards compliance. Currently all of these sites fall within the green category.”

The red/amber/green categories are as follows:

  • Red: The site hasn’t taken any steps to comply.
  • Amber: The site has taken some steps, but the ICO consider it ‘non-compliant’.
  • Green: “Significant steps taken to make users aware cookies are in use and obtain consent.”

Here’s a chart directly from the ICO showing the history of their classification for websites they’ve investigated. These are the group within their priority ‘top 200 most-visited in the UK’ about whom they’ve had complaints & have investigated:

As you can see, only one site among those has ever been in the red bracket, and all have moved into the “significant steps taken to make users aware cookies are in use and obtain consent” bucket. Ie: It looks like nobody’s ever been in any real trouble with the ICO in relation to cookies. I clarified this by asking for the number of sites prosecuted, or where other action was taken against a site for non-compliance with the cookie law, to which the response was:

“We have not had to take any formal action to date, instead we have used informal methods to secure compliance such as through correspondence and compliance meetings.”

That is the key line in this post really: nobody has been charged with anything in the UK, nobody has been fined, the ICO has simply worked with them to get them to a state where they’re happy that users have given ‘implied consent’ that they’re happy for sites to set cookies.

Summary:

In summary, and in answer to the question in the headline:

  • Yes, the “cookie law” is being enforced.
  • It is most definitely not a high priority within the Information Commissioner’s Office (they do not have a single member of full time staff assigned to it, for example).
  • ‘Enforcement’ so far has simply meant: take complaints from the public, prioritise them based on the scale of reach of the site concerned, contact organisations to ask them to take steps toward compliance, check whether they have done that.

Based on the extremely low number of complaints they’ve received, I’d say the ICO are doing a really good job of matching the response to the actual level of interest from the public: the general public does not seem fussed about this issue at all (for better or worse), or they are broadly happy with the way it’s presented by sites.

Finally, with the obvious caveats that the ICO could change their policies if they wish, and that I am not offering legal advice:

  • From a business perspective: if you’re not among the 200 most visited sites in the UK, it seems you’re likely to be lower priority from the ICO’s point of view.
  • Even among the top most visited sites, as long as you’ve taken steps toward compliance & you’re willing to cooperate and take more, you are likely (literally) to be able to achieve a green light.

The Guardian’s Terms & Conditions: Worse than Instagram?

The Guardian have launched a new ‘user content’ site in collaboration with EE called “Guardian Witness”, and are now urging users to post their content to the site at https://witness.guardian.co.uk. The site itself is nice and slick, as you’d expect if EE’s tech team has been involved.

The official announcement is full of comments criticising The Guardian for asking for free content, accusing them of trying to build up a free picture library, etc.

But something else seemed strange to me: throughout the launch article they keep saying that you ‘still own the copyright’ of any content you post. I have read The Guardian’s Ts & Cs before, and that didn’t seem quite right to me, so I did a little more digging.

Here are some notes, plus the key ‘Instagramesque’ part of their terms & conditions:

“You still own the copyright”

As part of the terms and conditions for using the site, they say this:

“You or the owner of the content still own the copyright in the content sent to us”

They’ve also put together a fairly friendly set of frequently asked questions which explain: “You (or whoever created the content) own the copyright to the content which means that you control what others can do with it.”

In their article promoting the site, they also point out in the comments that “the copyright is, and remains with, the creator of content added to GuardianWitness…” and later on in the comments they again say “The creator of the submission always holds the copyright.”

But what do the Terms & Conditions actually say?

Reading the above quotes, you may expect that it means that you, when submitting content, still control exactly who has the ‘right to copy’ any content you post there. What they don’t say in the launch article itself is that, in one of the clauses in the terms and conditions, there is this 50 word snippet:

“…by submitting content to us, you are granting us an unconditional, irrevocable, non-exclusive, royalty-free, fully transferable, perpetual worldwide licence to use, publish and/or transmit, and to authorise third-parties to use, publish and/or transmit your content in any format and on any platform, either now known or hereinafter invented.”

 

Going back again through a few of the key points there:

  • Unconditional – there are zero conditions on how they can use your content.
  • Irrevocable – once you’ve posted content, you cannot ever stop them from using it.
  • Royalty-Free – they won’t pay you anything.
  • Fully transferable – they can in turn pass the right to use your content on to whoever they choose.
  • Perpetual – the right lasts forever.
  • Worldwide – there are no geographical restrictions.
  • Any format and on any platform – your content can be used for anything. They also “reserve the right to cut, crop, edit” your content elsewhere in the terms.

In other words, they may be promoting this by saying you still ‘own’ the copyright, and the FAQs may say they will endeavour to assist in various areas, but according to the terms & conditions The Guardian can do anything they like with your content once you have uploaded it.

They could sell your content, use it alongside ads (in fact the Guardian Witness site is in collaboration with an advertiser, so the content is by default being used as part of a third-party marketing campaign), they could allow anyone they choose to use your photos, videos, text, or any other ‘content’ you submit in any way they choose, and even if it is used commercially they never have to pay you.

How does this compare to Instagram?

This may all sound vaguely similar to one of the complaints around the big Instagram Ts & Cs issue that blew up late last year. Oddly, one of the differences seems to be that The Guardian’s terms are slightly heavier than Instagram’s.

Here’s what The Guardian said about the Instagram issue in one of several articles about it:

“Instagram photos could be used in advertising, without reference to the owner, with all the payments going to Instagram. There is no opt-out from that use except to stop using the service and to delete your photos.”

The situation here is roughly similar, except that in The Guardian’s case you cannot opt out at all (even if you stop using the service). From the moment you post any content to Guardian Witness, you have granted them an “irrevocable, … perpetual worldwide licence”.

Keep that in mind when you read this advice, written by the excellent Jo Farmer in another Guardian article about user generated content following the Instagram fallout:

“Brands might be thinking that they can then use that content in future marketing, which might lead to a temptation to write something in the user terms and conditions to the effect that, “any content submitted by users may be used by the brand for any purpose without any payment to the user”.

The lessons we are learning from Instagram and other social media channels is to avoid any ham-fisted attempt to acquire such wide licence rights from your users in relation to their UGC.”

 

You can read the full terms here: https://witness.guardian.co.uk/terms.

Feedback very welcome on this, or do share this post if you think it would be of interest to others.

Can Algorithms Break the Law?

Here’s a breakdown of how 2 of the most prominent search algorithms on the web appear to be presenting information that it would be illegal for you or me to communicate.

  1. The first is the Twitter search algorithm, tweaked recently in a way which (inadvertently) increases the likelihood of people finding out ‘illegal’ information.
  2. The second is a Google algorithm, and 2 user interface changes which (again inadvertently) very much increase the chance of ‘illegal’ information being communicated.


The Background – a Very Sad Story

In the entire history of the British legal system, there are apparently only four prisoners who have been given new identities. One of them was a boy (now a man) who used to be called Jon Venables. When he was a child, Jon Venables & another boy (Robert Thompson) killed a very, very young boy (James Bulger). If you live in the UK, you will almost certainly know this.

Because of their crime, Jon Venables & Robert Thompson are very likely to be in extreme danger if the general public can identify them. Therefore, they were given new identities, and those identities were protected.

Since Jon Venables was given a new identity, nobody is allowed to know his new name: It is illegal to publish anything claiming to identify him, or even to ‘purport to identify’ anyone as him whether it is him or not.


Photos of Jon Venables were apparently published on Twitter recently. This hit the front pages of most newspapers in the UK (including, above, The Guardian).

The Twitter Search Algorithm Change

Until recently, Twitter’s search algorithm only returned very recent tweets. A few weeks ago, that was altered to also include much, much older tweets. That coincided with the 20 year anniversary of Jon Venables’ crime. As a result, lots of very old tweets and photos claiming to ‘out’ Jon Venables’ identity suddenly became visible far more easily, at a time when many were searching for his name. Many of these photos were retweeted; some new ones were posted, some were taken from Twitter and posted on other websites, and many users posted names that they claimed were Jon Venables’ new identity.

That hit the front page of most UK newspapers (though none made the link with Twitter’s algorithm update), and – of course – an investigation was carried out.

Twitter Search Last Week: Jon Venables

Here’s what happened if you searched Twitter for ‘Jon Venables’ a week ago:


You can see there, Twitter has pulled a ‘Top news’ box covering the investigation. It has automatically pulled a ‘Top’ tweet that was Retweeted 49 times, claiming to out Jon Venables’ new identity & urging readers to spread it.

There are 2 big red blocks there too where I’ve obscured 2 important items:

Item 1: Top Photos

On the left (item 1) I’ve obscured 2 photos showing an adult man, roughly the age Jon Venables would be now. Those were posted by users, but the algorithm picked them out as ‘top photos’.

Item 2: A man’s name

Across the top (item 2) I’ve obscured a man’s name, flagged as being a ‘Related search’ for ‘Jon Venables’. Again, as with the photos, this was posted by Twitter users, but Twitter’s algorithm picked it out & highlighted it as being a related search.

Clean Up?

Since this all came out, the UK Attorney General has announced that there will be ‘contempt’ proceedings launched against people who posted images ‘purporting’ to be Jon Venables.

Twitter has obviously made a substantial effort to clean up the photos too.

Twitter Search Now: Jon Venables

Today, if you search for ‘Jon Venables’ on Twitter, the photos on the left (which previously showed an adult man) have gone. BUT, the ‘Related’ search term, showing a man’s name, is still in place:


If you click through on the man’s name in the obscured ‘Related Search’ up there, it leads you through to this:


Twitter Summary

In the first instance, Twitter’s algorithm automatically highlighted photos & details of an adult man whenever users searched for ‘Jon Venables’. Twitter users themselves posted the content, but the algorithm crowdsourced from that to highlight particular photos, and a particular name.

Twitter have obviously gone to some lengths to clean things up here. BUT, on searching for ‘jon venables’, the algorithm still leaves a fairly prominent trail toward a man’s name, and toward tweets linking to photos of an adult man.

And Google?

The Google story is much, much shorter. It involves 2 relatively recent additions to Google’s user interface:

  1. The ‘knowledge graph box’ – an area on the right of search results that is intended to reveal ‘Facts’ related to searches.
  2. Google’s updated Image Search results, which used to show images in the context of a web page, but now simply show the image on a black background on Google itself.

Here’s what happens when you search for ‘Jon Venables’ on Google today:


I’ve obscured an area on the right there (within the ‘knowledge graph’ box), where there are 2 photos (within a single image) of an adult man.

And when you click on that obscured area, it leads us to this quite scary screen:


A few weeks ago, clicking the image from search results would have taken you to the website in question (faded in the background), showing clearly that it was a website other than Google publishing it. Today, doing so keeps you on Google’s own property (note the ‘google.co.uk’ URL). It simply shows a black background, 2 photos of a man, and the name ‘Jon Venables’, the old name of a person it is illegal to identify.