Visualizing SOPA on Twitter

When I heard that Tyler Gray at Public Knowledge was looking for someone to do some analysis on tweets that mentioned SOPA, I thought I might try Cytoscape (an open source tool used for biomedical research, but handy for large scale data visualization) to show some of the relationships between people discussing the controversial bill on Twitter.

The result is a graph of the most active users referencing SOPA

Public Knowledge worked with the Brick Factory to set up their slurp140 tool to record approximately 1.5 million tweets, which Tyler sent me in the form of a 350 MB CSV file. I first used Google Refine to clean and narrow the set down to only tweets which were replies to someone else. This left approximately 80,000 tweets, which I then imported into R. I ranked all of the usernames by how often they appeared both as senders and recipients, and then picked approximately the top 1,000 users. Since replies are sent from one user to another, the graph is directed: each edge has a direction with an origin and an arrow pointing at the recipient. There are 1,021 nodes identified by their Twitter usernames, and 1,757 edges, a good portion of which are labeled with the content of their tweet.
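A minimal sketch of that ranking step, written in Python rather than R and using made-up usernames in place of the real data. The function names are hypothetical, and keeping an edge only when both its sender and recipient made the cut is an assumption, not something the pipeline above spells out.

```python
from collections import Counter

# Hypothetical (sender, recipient) reply pairs standing in for the
# roughly 80,000 replies in the cleaned CSV.
replies = [
    ("alice", "bob"),
    ("carol", "bob"),
    ("bob", "alice"),
    ("dave", "erin"),
    ("alice", "carol"),
]

def top_users(replies, n):
    """Rank users by how often they appear as sender or recipient."""
    counts = Counter()
    for sender, recipient in replies:
        counts[sender] += 1
        counts[recipient] += 1
    return [user for user, _ in counts.most_common(n)]

def directed_edges(replies, keep):
    """Keep replies whose sender and recipient are both in `keep` (assumed rule)."""
    keep = set(keep)
    return [(s, r) for s, r in replies if s in keep and r in keep]

top = top_users(replies, 3)
edges = directed_edges(replies, top)
```

Each surviving pair is one directed edge, sender first and recipient second, which is the form a tool like Cytoscape can import as an edge list.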

Visualizing networks this large is more of an art than a science

I've tried to strike a balance between visual complexity, aesthetics and readability of tweets, but you'll find that this isn't always successful. Sometimes tweets run into nodes, sometimes edges run into labels, and sometimes the graph feels like a total mess. But that messiness is part of what made the SOPA debate so interesting over the last month.

Thousands of people participating with plenty of cross talk.

The colors and sizes of the nodes and edges are coded in the following ways:

  • The size of a node and its label maps to the number of tweets both posted by a user and mentioning a user. (Ex: @BarackObama is a huge node because so many people were tweeting at him about SOPA.)
  • Node color represents the number of outgoing tweets. The greener the node, the more replies a user posted. (Ex: @Digiphile sent a lot of tweets mentioning SOPA.)
  • Edge thickness represents "edge betweenness", which is how many "shortest paths" run through an edge. This is a rough measure of how central a given tweet is in the network. (Ex: @declanm and @mmasnick have a thick line connecting them because many other nodes are connected to the two through that tweet.)
  • Edge color represents the language of the tweet. (Ex: Tweets in English are blue, Spanish are yellow.)
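To make the edge betweenness idea concrete, here is a small sketch in Python that counts, for each directed edge, how many shortest paths between pairs of nodes run through it, splitting credit when several shortest paths tie. It follows the well-known Brandes approach. The function name and toy graph are hypothetical, it assumes each edge appears only once, and Cytoscape's own numbers may be scaled or normalized differently.

```python
from collections import defaultdict, deque

def edge_betweenness(nodes, edges):
    """Brandes-style edge betweenness for a directed, unweighted graph."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
    score = {edge: 0.0 for edge in edges}
    for s in nodes:
        # Breadth-first search from s, counting shortest paths (sigma).
        dist = {s: 0}
        sigma = defaultdict(float)
        sigma[s] = 1.0
        preds = defaultdict(list)
        order = []
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    sigma[v] += sigma[u]
                    preds[v].append(u)
        # Walk back from the farthest nodes, sharing credit along edges.
        delta = defaultdict(float)
        for v in reversed(order):
            for u in preds[v]:
                credit = sigma[u] / sigma[v] * (1.0 + delta[v])
                score[(u, v)] += credit
                delta[u] += credit
    return score

# Toy chain a -> b -> c -> d: the middle edge carries the most paths.
scores = edge_betweenness(
    ["a", "b", "c", "d"],
    [("a", "b"), ("b", "c"), ("c", "d")],
)
```

In the toy chain, the edge from b to c scores 4.0 (it lies on the paths a to c, a to d, b to c and b to d), while the two outer edges score 3.0 each, so the middle edge would be drawn thickest.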

The nodes are positioned using a "force directed" algorithm which is typically designed for undirected graphs, but I found it to be the most visually compelling of Cytoscape's layout options. To learn more about force directed graphs, take a look at this d3 tutorial visualizing the characters in Victor Hugo's Les Misérables.
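For a feel of what a force directed layout does, here is a toy sketch in Python, loosely in the style of Fruchterman and Reingold: every pair of nodes repels, every edge acts like a spring, and nodes take small steps that shrink over time. It is not Cytoscape's implementation; the function name, parameters and default values are all assumptions chosen for illustration. Note that it ignores edge direction, which is the sense in which such layouts are designed for undirected graphs.

```python
import math
import random

def force_layout(nodes, edges, iterations=200, k=1.0, step=0.05, seed=0):
    """Return {node: [x, y]} from a simple spring-and-repulsion simulation."""
    rng = random.Random(seed)
    pos = {n: [rng.uniform(-1, 1), rng.uniform(-1, 1)] for n in nodes}
    for i in range(iterations):
        force = {n: [0.0, 0.0] for n in nodes}
        # Every pair of nodes pushes apart, more strongly when close.
        for idx, a in enumerate(nodes):
            for b in nodes[idx + 1:]:
                dx = pos[a][0] - pos[b][0]
                dy = pos[a][1] - pos[b][1]
                d = math.hypot(dx, dy) or 1e-9
                f = k * k / d
                force[a][0] += dx / d * f
                force[a][1] += dy / d * f
                force[b][0] -= dx / d * f
                force[b][1] -= dy / d * f
        # Each edge pulls its endpoints together; direction is ignored.
        for a, b in edges:
            dx = pos[a][0] - pos[b][0]
            dy = pos[a][1] - pos[b][1]
            d = math.hypot(dx, dy) or 1e-9
            f = d * d / k
            force[a][0] -= dx / d * f
            force[a][1] -= dy / d * f
            force[b][0] += dx / d * f
            force[b][1] += dy / d * f
        # Move each node a capped distance; the cap shrinks over time.
        limit = step * (1 - i / iterations)
        for n in nodes:
            fx, fy = force[n]
            mag = math.hypot(fx, fy)
            if mag > 0:
                move = min(mag, limit)
                pos[n][0] += fx / mag * move
                pos[n][1] += fy / mag * move
    return pos

# `nodes` must be a list; edges are (source, target) pairs.
layout = force_layout(["a", "b", "c"], [("a", "b"), ("b", "c")])
```

With these force formulas, two nodes joined by a single edge settle at a distance of roughly `k`. Comparing every pair of nodes on each pass is fine for a toy graph but gets slow at a thousand nodes, which is one reason real tools use more elaborate versions.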

To really browse the graph visit GigaPan where I've uploaded a 32,000 x 32,000 pixel version.

I highly recommend GigaPan's full screen mode. I've also created a couple snapshots on GigaPan that highlight interesting nodes: @BarackObama, @GoDaddy, and @LamarSmithTX21 and @DarellIssa.

If you really want, you can also download the 36 MB gigapixel file, the Cytoscape source file, and the PDF vector version of the network graph.

Thanks again to Public Knowledge and The Brick Factory for providing the infrastructure to record the tweets, and to everyone who has helped fight against SOPA and PIPA over the last couple of months, especially those who tweeted about it.

Thoughts on Verizon and Google

In early 2007 I attended a talk at Fordham Law School by William Barr, the former US Attorney General and current Verizon General Counsel and Executive Vice President. The premise of his talk was that regulation, of the network neutrality kind, would only hurt technological innovation in the broadband and Internet space.

A lot has changed since then, and now that Google and Verizon have struck a deal purportedly threatening the openness of the future of the web, I thought I'd revisit some of my thoughts from that night as well as muse about what this deal might mean and why it's happening now.

During his lecture Barr attempted to point out that there had never been an instance of a telecommunications company violating the terms of network neutrality, so why would they begin now? Out of nowhere, from behind me, someone shouted "What about Madison River?" That person was Tim Wu, who I didn't personally know at the time, but who would later become a friend of mine. Tim had interrupted Barr to remind him about Madison River, where a local telecom had blocked VoIP connections for broadband subscribers because the telephone company didn't want to compete with inexpensive internet telephony. It was precisely the kind of violation of network neutrality that Barr was claiming had never happened. Barr dismissed Madison River as an isolated incident which didn't represent the overall policy of non-discrimination by the telecom industry.

Later in the lecture, Barr tried to envision an industry closely regulated by the FCC in order to uphold network neutrality. This would be a world that Barr thought no one would want: innovation would peter out as businesses would face a high barrier of entry in the form of regulations. Conversely, if corporations had the opportunity to really invest in research and development without the fear of future regulatory action, then they might come up with services and tech that would be even better than TCP/IP. Barr believed that it was naive for us to blindly accept that TCP/IP was the best we were going to get for transferring data and communications over a network. Who is to say Verizon or AT&T couldn't come up with a better protocol? TCP/IP has plenty of performance issues (real time synchronous voice communication was a huge challenge), so why not let Verizon innovate at the protocol level, and sure, maybe they'd prioritize some kind of traffic, but it would be for the benefit of technological innovation. Just think of all the potentially amazing applications they could come up with if the FCC just left the innovation to Verizon's R&D lab instead of the open internet and the public?

Just say no to walled gardens.

During the question and answer period, I asked Barr why he thought that consumers wanted more walled gardens of content, and whether it was wise to assume the market was going to support another set of AOLs, Compuserves and Prodigys? He replied that of course consumers wanted better content -- video on handheld devices was going to be the future and the telecoms were going to be the only companies who could deliver it. I insisted that consumers only really want the internet in their pockets and that he was kidding himself if he thought a curated walled garden on a handset would be nearly as appealing as an actual functional web browser (something no mobile company had delivered yet).

In a sense we were both wrong and we were both right. Consumers did want mobile video on demand, but they also wanted the entire open web in a functional experience.

Prior to Barr's lecture Verizon had announced a half-baked partnership with YouTube which would offer limited and selected versions of YouTube videos for watching on handheld devices. Then, a couple of months later, Steve Jobs announced the iPhone which would have even greater support for YouTube. Verizon was banking on curated portals inside hobbled handsets, and Apple had just bet the farm on the touchscreen and a mobile Safari browser. We know who won this battle. Does anyone ever talk about watching YouTube on their 3 year old cell phone any more? Does anyone even remember the partnership?

Why neither Verizon nor any of the other telecoms ever fully invested in a serious mobile browsing experience is best explained by their general hostility to the open web. The big telecoms have always loathed the net, whether it was manifest in an engineering snobbery towards the "dumbness" of TCP/IP or the fact that the net worked best when it treated their products not like products at all but like common utilities, something no company wants. So it has never been surprising that the telecommunications industry never bothered to create a real mobile browsing experience; they were more eager to strike Big Deals with Exclusive Providers of Proprietary Content than to supply an actual connection to the open web.

Steve Jobs, to his credit, saw the opportunity to serve consumers what they really wanted, and he and Apple have since been handsomely rewarded for creating a mobile browsing experience worth using. Google's choice to freely offer Android was a brilliant bit of strategy: all of the telecommunications firms and handset manufacturers were panicking and desperate to compete with Apple's iPhone, so why not supply them with what they wanted?

So now Verizon and Google are making an uneasy deal behind the FCC's back and trying to reassure the FCC and the public that they're really doing it in the name of technological innovation. Think about all the applications that could exist if we didn't have to rely on the Internet! Healthcare Monitoring! The Smart Grid! Advanced educational services! Incredible entertainment and gaming options! These are all ghosts of walled gardens past and there's no reason to believe that a competitive startup can't supply these exact services over the open web.

The wireless component of the Google/Verizon deal is the biggest wild card and the most controversial aspect of their joint policy proposal. The two companies argue that the principles of network neutrality shouldn't apply in the wireless space. I couldn't agree less. The telecoms have demonstrated very little capacity for innovation in the wireless space in the last 15 years (why is it so hard to develop SMS applications? why is Google Voice such a pain to reconfigure as my voicemail? etc.), so why would we trust them now?

Ultimately, why shouldn't the principles of common carriage and network neutrality apply to the wireless space? Because it's too difficult? Too expensive? I don't buy it. What the wireless space needs now is faster and cheaper TCP/IP service and a more open application infrastructure. Negotiating one-off deals for new channels and services will only remind us of Compuserve circa 1999.

Lessig, Crawford and Wu have a good post about the proposal, but also read Jonathan Zittrain's thoughts on it here too.

Fighting iPhone App Store Stockholm Syndrome with Easter Eggs

Some iPhone app store developers are beginning to suffer from Stockholm syndrome and are now sympathizing and fighting on behalf of their captor, known as the iPhone approval process.

From Wikipedia's article on Stockholm Syndrome:

Stockholm syndrome is a psychological response sometimes seen in abducted hostages, in which the hostage shows signs of loyalty to the hostage-taker, regardless of the danger or risk in which they have been placed.

And just as Patty Hearst picked up a machine gun to rob a bank while being held captive by the Symbionese Liberation Army, these developers are attacking the sane programmers trying to save them.

Here's a guest post on TechCrunch where Matt Galligan, the CEO of an iPhone app development shop, calls out Yelp for not abiding by Apple's rules:

Call it sneaky, call it clever, but I call it deceit. Apple has put forth specific guidelines, and “rules” around their app development, and while I don’t always agree, it’s the reality of how we must work with them for now. Yelp hid their easter egg behind shaking the device, which isn’t always the most intuitive action to take on an app that contains some maps and lists. As a result, the unsanctioned Augmented Reality view was gone from Apple’s radar.

Why is Galligan chastising Yelp? Sure, he acknowledges, the app store may act badly sometimes, but hey, rules are rules, right?

Wrong. He should be commending Yelp for putting their app's approval on the line by risking Apple's wrath. Yelp must have one of the most popular free apps in the iPhone app store, so it is quite a risk to release it with functionality purposely hidden from Apple.

But it's the right kind of risk; it's gutsy, offers a new whiz-bang feature, and asserts Yelp's right to develop whatever features they want outside the scrutiny of their captor. These are values that all developers need more of when creating iPhone applications.

And if, as Galligan predicts, Yelp's risk forces the App Store approval process to spend more time digging through source to discover undocumented functionality using forbidden (Gasp!) API calls, then maybe it will demonstrate to Apple that it's just not worth treating your developers like hostages, and they'll dismantle the approval process entirely.

Apple now has such strict control over the development process that some developers have clearly lost the ability to think for themselves. That means we have to find every opportunity to encourage them to fight against their captor's tyranny.

That means encouraging risks like Yelp's and developing more Easter eggs for iPhone apps.

So if you're reading this and are also currently developing an iPhone app, think about including an Easter Egg that might rankle Apple. You won't be ruining it for the rest of us, you'll be chipping away at the wall of Apple's tyranny over developers.

DeCSS and (My) Radicalization

Philosophy Club Poster

I made this poster for a meeting of the Philosophy Club at Wilton High School. Admittedly, my definition of "philosophy" was pretty loose and this poster's point was pretty incoherent (apologies to MLK), but I had found myself talking about the 2600 DeCSS case Universal v. Reimerdes so much with my friends, that I figured it might be good to found a club where we could keep similar conversations going. Since our school didn't have a debate club at the time (there were rumors about an ill-fated trip involving a school bus sinking in the Norwalk River), we didn't really have any other venues to do this besides study hall.

Luckily, my father happened to be a working philosophy of science professor and had enough spare time to help us get the club off the ground. I think I organized the first session and ranted about the DeCSS case, but we later moved onto more academic subjects and discussions. The club was a high point in what was mostly a difficult period in my life and school. I think I still have some photos that we intended to submit to the yearbook and if those turn up I'll try and post them. Unfortunately the club never survived after our class's graduation as we were unable to find a faculty adviser or enough student interest. I would later use the skills I developed to launch Free Culture @ NYU, so I suppose I was on the right track.

The polemical writings of Emmanuel Goldstein, editor in chief of 2600 and the main defendant in the case, about the magazine's choice to publish DeCSS had galvanized me. Goldstein articulated that the issues at hand in the suit were really ones of freedom, source code, and speech, not piracy and profits. As an early adopter of Linux (Slackware 3.3 anyone?) as well as a kid who loved movies and was incredibly excited about the potential of DVDs, the practicalities of the case were quite clear to me: why shouldn't I be able to run whatever software I wanted to play my own DVDs? Who says I can't read *that* source code? Jon Johansen, the teenage hacker who cracked the DVD encryption scheme, CSS (not to be confused with the other CSS), played the role of sympathetic hacker who I, not incidentally, looked up to.

Free speech on the internet, heck, freedom itself, appeared to be at stake, threatened by a very bad part of a very new law that sounded like it was bought and paid for by the exact interests suing our magazine.

During the case's 2nd Circuit Court of Appeals trial in May of 2001, I wore a t-shirt featuring the censored source code while sitting in the audience. The Wall Street Journal interviewed me that day and it wasn't until last year that I discovered my quote actually made it into the article in the paper:

Looking back, I now realize my interest and involvement in this case marks my early foray into the world of radical online free speech activism and copyright reform. I knew the 2600 case was important (clearly, I spent a disproportionate amount of time thinking about it, debating it, and following it closely), but I did not anticipate how much these issues would continue to shape and influence my life and career. I've now been involved in this community for almost a decade, and it's only beginning to get really interesting.

Obviously, I was not alone. This case and these issues not only radicalized a generation of free software developers and enthusiasts, but also trained them with a set of skills necessary to successfully navigate these issues in the future.

My friend and now colleague at NYU, Gabriella Coleman, has written an article about our story called "Code is Speech: Legal Tinkering, Expertise, and Protest among Free and Open Source Software Developers", published in the academic journal Cultural Anthropology. Biella's paper is one of the best overviews of the conditions that precipitated the birth of a generation of internet and free speech activists. Biella concludes by arguing that this type of political activism and legal autodidacticism represents a new kind of engagement with democracy, which, of course, I completely agree with and am proud to be part of.

Download the PDF of her paper here, or look for it in your copy of Cultural Anthropology.

Regarding Public Disclosure of Private Fact on Social Networks

A quick update about the Facebook governance post I wrote a while ago where I wondered whether disclosing private facts about yourself on your Facebook page would constitute "public disclosure of private facts" and thereby prevent you from claiming invasion of privacy should a friend disclose something they discovered on your semi-private profile:

... American law prevents me from disclosing private facts about Alice that are not news worthy. However, if Alice had disclosed such private facts in a public space (perhaps in front of a large audience), I can pass on the facts to others and even publish them.

But what if Alice discloses her private fact on her Facebook profile? It remains private in the sense that only I and her friends can see it by logging into Facebook’s private service, but it also arguably public in the sense that I and her friends are also an audience. Does it matter how many friends she has? What privacy settings did she have in place?

Through a Slashdot post, I just stumbled across a case that hinged on a very similar fact pattern, Moreno v. Hanford Sentinel. The judge decided that since a teenager wrote a post on her MySpace blog revealing facts she believed (and now regretfully wishes) were private, she could not claim a breach of privacy under the doctrine.

The judge astutely points out that since the teenager's MySpace page and blog were publicly available to "anyone with a computer and Internet connection", they couldn't be considered private even if she believed her actual audience to be tiny. But this leaves open the question of whether using Facebook's privacy settings would create a particular level of security that would classify the profile and facts as "private."

Obviously details about actions and relationships matter a great deal in determining whether privacy has been breached and whether certain disclosures are public "enough" to negate a plaintiff's privacy claim. But what is still interesting to me, is whether certain technical choices a user can make on Facebook are substantial enough to shift a profile from being public to being private in the eyes of the law.

As Lessig argues, code is law, but in this case, we might be able to see it the other way around: Facebook's code could amount to sufficient law.

The Staggering Hypocrisy of the MPAA

MPAA shows how to videorecord a TV set from timothy vollmer on Vimeo.

This video was shot by my friend Timothy Vollmer at the current DMCA exemption hearings. The issue is whether Congress should allow educators and students the right to rip DVDs for educational purposes. Peter Decherney succeeded in establishing this right for film historians working at universities, and is now seeking to broaden it to all educators and students.

In the video, a representative from the MPAA is demonstrating that it is "easy" to access and compile content from a DVD without the need to rip it using decryption software. Their suggested technique? A camcorder pointed at a flatscreen hooked into the audio signal.

This is evil and hypocritical for a number of reasons. First, the MPAA has positioned themselves against camcording movies. Here, they're showing how easy it is to do. They're also one of the main organizations which have successfully lobbied for criminal penalties against people bringing camcorders into movie theaters.

Second, the software used in the presentation is VLC. VLC disables the MPAA's price fixing scheme known as region encoding and can also decrypt DVDs, providing yet another example of where the MPAA thinks their own rules don't apply to them.

Third, the MPAA has been leading the pack in attempts to close the "analog hole" through legislation and collusion with hardware manufacturers. The analog hole is precisely the phenomenon demonstrated in this video; since audio and visual data needs to be broadcast into an analog signal eventually (our brains are not capable of decrypting 1s and 0s into images and audio yet), there will always be an avenue in which to record media so long as our computers obey us.

"Closing the analog hole" refers to forcing manufacturers to cripple hardware so that it is incapable of broadcasting analog signals and also incapable of recording them. It is the stuff of a dystopian science fiction plot, not technical reality.

Ultimately this video demonstrates the insidiousness of the MPAA's strategy: they want to force educators to use a technique that they're simultaneously lobbying to prohibit.

End result? The precise strategy suggested by the MPAA, the analog hole, gets legislated away by the MPAA, and educators are left wasting money and time on multiple copies of crippled media.

UPDATE: Another way I'm thinking about this video: it proves that the MPAA knows closing the analog hole is impossible, thus exposing their attempts at legislation as disingenuous.

Props go to Tim for posting such an illustrative video (not to mention the nerve to post clips of Harry Potter under fair use!)

What would Twitter have looked like on 9/11?

I spent the first week of college living through September 11th in and around New York City and have since endured recurring plane crash nightmares.

Which is why I was relieved to find out after the fact that today's close call with Air Force One and two F-16s was a photo-op rather than another generation-defining tragedy.

Reading the New York Times' extensive coverage of the episode on their blog had me wondering about how the event unfolded on everyone's-favorite-real-time-reporting-source: Twitter. What was the first tweet that observed the fly by? Was it panicked? How many people retweeted it? What would Twitter have looked like on 9/11?

We'll never know, but I've done a bit of searching for terms related to today's news ("nyc plane")* and have discovered one of the first tweets at around 10:30am (around the time of the first flyover) by n8s8e asking JetSetCD whether Obama was supposed to be in NYC:

Shortly after, @The_Pace asks a similar question, and then @hugoyles mentions that Goldman's trading floor was evacuated. Then @ChicagoSooner reports that CNBC had confirmed the sightings. @Rithesh asked if there was a plane crash in lower NYC, and then @grapejamboy breaks the news that the Pentagon confirmed the flights as a photo-op. From then on, most tweets cover the story properly.

It's clear that Twitter beat traditional news outlets today in relaying that something was happening with a plane over NYC's downtown skies. However, as @Rithesh's tweet demonstrates, there is potential that misinformation gets disseminated (there was no crash) as well, so the system is not noise proof.

There's also a limit to what can be gleaned from Twitter search at any given moment, and a very real chance that all the signal will itself become noise. As commentators smarter than I have observed, this makes Twitter a fantastic "raw material" in a journalist's process, but not a final product itself.

But really, what's the difference between leaving a search open in Tweetdeck and leaving CNN on in the background?

UPDATE: Zander points out this great piece in the Nieman Journalism Lab breaking down the Twitter accounts of today in much better and greater detail than I did.

*This search is not scientific at all and is probably leaving out earlier sightings. I tried searching for "plane" but Twitter's search is frustratingly limited to narrowing queries by day as opposed to hour and minute (which would be ideal here) and will only deliver a max of 1500 results for any term. There are obvious security reasons for this, but it presents a fantastic example of how Twitter can capitalize on search: I'm willing to shell out a couple of dollars for access to do more sophisticated searching.

We Are One if You Are HBO

photo by jurvetson on flickr

Techdirt is reporting that Against Monopoly is reporting that HBO is sending take down notices to people who have uploaded their own recordings of the Inaugural Concert: We Are One.  I haven't been able to verify this, but if it is indeed the case, it would seem that HBO is misunderstanding their rights under copyright law. Note that I am not a lawyer, so this is not legal advice.

Since HBO merely owns the copyright to their recording of the concert, they can't control what other people were doing with their own recordings from their own cameras. This is because a work is not entitled to copyright protection unless it is fixed. The actual performance that happened that evening wasn't fixed or copyrighted until it ended up on HBO's tapes (or hard drives).

If the content of the concert was in the public domain or free (e.g., The Star-Spangled Banner is in the public domain since it was created prior to 1923), then any audience member who recorded it had the right to make a recording of it and distribute that recording since they owned the copyright to the video. Putting aside questions of anti-bootlegging laws (which are arguably unconstitutional and not relevant to DMCA takedown notices), it is not clear that HBO can prevent distributions of privately filmed performances of public domain works that were performed in a public venue, which, if the Against Monopoly report is correct, is part of what they're trying to do.

However, according to the Wikipedia page, a lot of non-public-domain non-free content was performed.

Which means that by recording and distributing a live performance of, say, a Bruce Springsteen song, an audience member might be infringing on the Boss's copyright, but probably not HBO's copyright. Does anyone know more about bootlegging laws and how they might or might not apply here?

So what right does HBO have to send takedown notices for other people's works? Sending fraudulent DMCA takedown notices is itself a violation of the DMCA, so if you've been threatened by HBO for posting videos you recorded at the inaugural concert, you probably have the right to file a putback, and perhaps take action against HBO.

There are bigger questions, however, about the inaugural committee's right to leverage taxpayer money and support to sell off exclusive rights of a public event to a private entity such as HBO. I'm not clear on whether their status as a legal entity would entitle them to do this.

Anyway, while I would like to see HBO put the concert into the public domain along with other works of the federal government, that is probably impossible as the recording contains works that are in copyright, such as Bruce Springsteen songs.

There is the possibility that HBO could put the video but not the audio into the public domain, but I do not think there is an easy work around for including both the audio and video. This is not to say, however, that HBO is justified in sending nasty letters to citizens interested in helping celebrate an important event.

I sympathize with the inaugural committee's desire to produce and execute a fantastic recording of a historic moment in American history. I know that this kind of production costs money and there must be incentives for creating it. But I think the conflicts between HBO and citizens indicate that copyright is not the proper incentive here. It alienates too many citizens interested in documenting their own version of history, and given the context and content of our current president's administration, sets the wrong precedent for sharing that history. HBO should be ashamed of themselves.