Visualizing SOPA on Twitter

When I heard that Tyler Gray at Public Knowledge was looking for someone to do some analysis on tweets that mentioned SOPA, I thought I might try Cytoscape (an open source tool built for biomedical research, but handy for large-scale data visualization) to show some of the relationships between people discussing the controversial bill on Twitter.

The result is a graph of the most active users referencing SOPA

Public Knowledge worked with the Brick Factory to set up their slurp140 tool to record approximately 1.5 million tweets, which Tyler sent me in the form of a 350 MB CSV file. I first used Google Refine to clean the set and narrow it down to only tweets that were replies to someone else. This left approximately 80,000 tweets, which I then imported into R. I ranked all of the usernames by how often they appeared as either senders or recipients, and picked approximately the top 1,000 users. Since replies are sent from one user to another, the graph is directed: each edge has an origin and an arrow pointing at the recipient. There are 1,021 nodes, identified by their Twitter usernames, and 1,757 edges, a good portion of which are labeled with the content of their tweet.
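The filtering and ranking steps above can be sketched in a few lines. This is only an illustration in Python, not the Google Refine and R workflow I actually used; the column names (`from_user`, `text`), the sample rows, and the rule that a reply is any tweet starting with an @mention are all assumptions made for the example.

```python
import csv
import io
import re
from collections import Counter

# Hypothetical sample data; the real slurp140 export may use other columns.
SAMPLE_CSV = """from_user,text
alice,@bob SOPA would break DNS
bob,@alice agreed and I called my rep
carol,Reading about SOPA today
dave,@bob what does SOPA change?
"""

# Assumption: a tweet counts as a reply if it starts with an @mention.
REPLY_RE = re.compile(r"^@(\w+)")


def reply_edges(rows):
    """Yield (sender, recipient) pairs for tweets that look like replies."""
    for row in rows:
        match = REPLY_RE.match(row["text"].strip())
        if match:
            yield row["from_user"].lower(), match.group(1).lower()


def top_users(edges, n):
    """Rank users by appearances as sender or recipient and keep the top n."""
    counts = Counter()
    for sender, recipient in edges:
        counts[sender] += 1
        counts[recipient] += 1
    return {user for user, _ in counts.most_common(n)}


def build_graph(rows, n):
    """Return the directed edges whose endpoints are both top-n users."""
    edges = list(reply_edges(rows))
    keep = top_users(edges, n)
    return [(s, r) for s, r in edges if s in keep and r in keep]


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
edges = build_graph(rows, n=2)
print(edges)
```

Each pair in the result is one directed edge from sender to recipient, which is the shape of edge list that a tool like Cytoscape can import.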

Visualizing networks this large is more of an art than a science

I've tried to strike a balance between visual complexity, aesthetics, and readability of tweets, but you'll find that this isn't always successful. Sometimes tweets run into nodes, sometimes edges run into labels, and sometimes the graph feels like a total mess. But that messiness is part of what made the SOPA debate so interesting over the last month.

Thousands of people participating with plenty of cross talk.

The colors and sizes of the nodes and edges are coded in the following ways:

  • The size of a node and its label maps to the number of tweets both posted by a user and mentioning a user. (Ex: @BarackObama is a huge node because so many people were tweeting at him about SOPA.)
  • Node color represents the number of outgoing tweets. The greener the node, the more replies a user posted. (Ex: @Digiphile sent a lot of tweets mentioning SOPA.)
  • Edge thickness represents "edge betweenness," which is the number of "shortest paths" between pairs of nodes that run through that edge. This is a rough measure of how central a given tweet is in a network. (Ex: @declanm and @mmasnick have a thick line connecting them because many other nodes are connected to the two through that tweet.)
  • Edge color represents the language of the tweet. (Ex: Tweets in English are blue, Spanish are yellow.)
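To make the size, color, and thickness measures concrete, here is a minimal Python sketch that computes them for a toy directed graph. Cytoscape did this work for the real graph; the function names, the toy edges, and the unnormalized Brandes-style betweenness below are my own assumptions for illustration, and Cytoscape's exact definition may differ. Repeated replies between the same pair are collapsed into a single edge when computing betweenness.

```python
from collections import Counter, defaultdict, deque


def degree_metrics(edges):
    """Per user: replies sent (node color) and sent + received (node size)."""
    sent = Counter(s for s, _ in edges)
    received = Counter(r for _, r in edges)
    users = set(sent) | set(received)
    return {u: {"sent": sent[u], "total": sent[u] + received[u]} for u in users}


def edge_betweenness(edges):
    """Count shortest directed paths through each edge (unnormalized)."""
    adj = defaultdict(set)
    nodes = set()
    for s, r in edges:
        adj[s].add(r)
        nodes.update((s, r))
    score = {(s, r): 0.0 for s in adj for r in adj[s]}

    for source in nodes:
        # Breadth-first search, counting shortest paths (sigma) to each node.
        dist = {source: 0}
        sigma = defaultdict(float)
        sigma[source] = 1.0
        preds = defaultdict(list)
        order = []
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj.get(v, ()):
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)

        # Walk back from the farthest nodes, crediting each edge with its
        # share of the shortest paths that pass through it.
        delta = defaultdict(float)
        for w in reversed(order):
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1.0 + delta[w])
                score[(v, w)] += share
                delta[v] += share
    return score


toy_edges = [("a", "b"), ("b", "c"), ("c", "d")]
print(degree_metrics(toy_edges))
print(edge_betweenness(toy_edges))
```

In the toy chain, the middle edge from b to c gets the highest score because more shortest paths pass through it than through either end edge, which is why it would be drawn thickest.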

The nodes are positioned using a "force-directed" algorithm, which is typically designed for undirected graphs, but I found it to be the most visually compelling of Cytoscape's layout options. To learn more about force-directed graphs, take a look at this d3 tutorial visualizing the characters in Victor Hugo's Les Misérables.
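For a rough idea of how a force-directed layout works, here is a small Python sketch in the style of the Fruchterman-Reingold algorithm: every pair of nodes repels, connected nodes attract, and a cooling "temperature" caps how far nodes move on each step. This is not Cytoscape's implementation; the function name, parameters, and constants are assumptions chosen for the example, and edge direction is ignored, as is usual for this kind of layout.

```python
import math
import random


def force_layout(nodes, edges, iterations=200, size=1.0, seed=0):
    """Return {node: (x, y)} positions from a simple force simulation."""
    rng = random.Random(seed)
    nodes = list(nodes)
    pos = {n: [rng.uniform(0, size), rng.uniform(0, size)] for n in nodes}
    k = size / math.sqrt(max(len(nodes), 1))  # preferred edge length
    temperature = size / 10  # maximum movement per step

    for _ in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}

        # Repulsion: every pair of nodes pushes apart.
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx = pos[a][0] - pos[b][0]
                dy = pos[a][1] - pos[b][1]
                dist = math.hypot(dx, dy) or 1e-9
                force = k * k / dist
                disp[a][0] += dx / dist * force
                disp[a][1] += dy / dist * force
                disp[b][0] -= dx / dist * force
                disp[b][1] -= dy / dist * force

        # Attraction: each edge pulls its endpoints together.
        for s, r in edges:
            if s == r:
                continue
            dx = pos[s][0] - pos[r][0]
            dy = pos[s][1] - pos[r][1]
            dist = math.hypot(dx, dy) or 1e-9
            force = dist * dist / k
            disp[s][0] -= dx / dist * force
            disp[s][1] -= dy / dist * force
            disp[r][0] += dx / dist * force
            disp[r][1] += dy / dist * force

        # Move each node, limited by the current temperature, then cool.
        for n in nodes:
            dx, dy = disp[n]
            length = math.hypot(dx, dy)
            if length > 0:
                scale = min(length, temperature) / length
                pos[n][0] += dx * scale
                pos[n][1] += dy * scale
        temperature *= 0.95

    return {n: (x, y) for n, (x, y) in pos.items()}


layout = force_layout(["a", "b", "c", "d"], [("a", "b"), ("b", "c"), ("c", "d")])
for name, (x, y) in sorted(layout.items()):
    print(name, round(x, 3), round(y, 3))
```

The pairwise repulsion loop makes this quadratic in the number of nodes per step, so it is fine for a toy graph but would need optimization for a graph with around a thousand nodes.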

To really browse the graph, visit GigaPan, where I've uploaded a 32,000 x 32,000 pixel version.

I highly recommend GigaPan's full screen mode. I've also created a few snapshots on GigaPan that highlight interesting nodes: @BarackObama, @GoDaddy, and @LamarSmithTX21 and @DarellIssa.

If you really want, you can also download the 36 MB gigapixel file, the Cytoscape source file, and the PDF vector version of the network graph.

Thanks again to Public Knowledge and The Brick Factory for providing the infrastructure to record the tweets, and to everyone who has helped fight against SOPA and PIPA over the last couple of months, especially those who tweeted about it.

  1. wicked awesome.

    media companies need to stop trying to hold back technical progress (e.g. Napster) and adapt (e.g. itunes, rhapsody, new napster, netflix)

  2. [...] sites, Facebook pages, Twitter accounts, YouTube videos, and many blogs, including Fred Benenson's, which lets you visualize everything being said about SOPA on Twitter using [...]

  3. [...] You can take a look at the graph of the most-mentioned names related to SOPA over at [...]

  4. [...] growth and cost savings from tech and I.T. depts want cloud and analytics. – Quentin Hardy Visualizing SOPA on Twitter FREDBENENSON.COM | Very cool infographic by data savant @fredbenenson showing people discussing [...]

  5. [...] Visualizing SOPA on Twitter FREDBENENSON.COM | Very cool infographic by data savant @fredbenenson showing people discussing SOPA on Twitter. – Jenna Wortham [...]

  6. This is wicked, I agree. I love that you explain your methodology and link to helpful tools. However, you didn't include much analysis of your findings. Anything surprising or outstanding? Was much of the conversation dominated by arguing or general outrage? Not sure if we can make any inferences of this nature based on your findings, but I'd be curious to hear if you had any hypotheses.

    Finally, do the 80,000 tweets represent tweets in English or using a certain term or hashtag? Cool stuff. If you have time, thanks for answering.

  7. [...] I found the most beautiful expression of SOPA on Twitter on Fred Benenson's blog. He made an infographic of the conversations among tweets about SOPA [...]

  8. [...] Visualizing SOPA on Twitter | When I heard that Tyler Gray at Public Knowledge was looking for someone to do some analysis on tweets that mentioned SOPA, I thought I m… Fredbenenson [...]

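  9. [...] Internet users abroad reacted quickly: amid the wave of protest against SOPA, one used the Cytoscape analysis tool to capture, in a single graph, the people with the loudest voices on Twitter (posting tweets on the topic, @-mentioning others, retweeting others). It is an interesting sociological experiment. [...]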

  10. So amazing!!

    Thank you very much for sharing all the steps and troubles you have found. This post is a little treasure.

  11. [...] Twitter, the hashtag #sopa was used around 2.2 million times, #pipa around 411,000. Fred Benenson analyzed the hashtag data and visualized it clearly. Here is a 32,000 x 32,000 pixel [...]

  12. Good job! Thanks for sharing the images!
    Would it be possible for you to also share the graph itself in GraphML/GDF format?

  13. [...] [image/text] visualizing sopa on twitter (fredbenenson) [...]

  14. [...] Kickstarter data engineer Fred Benenson created a nice visualization of the #SOPA tweets which can be viewed on his blog, [...]

  15. [...] and the blogosphere but also the social networking sites. Fred Benenson created an interesting visualization of the tweets about #sopa during the day that reveals some of the patterns of the global [...]

  16. [...] data dude Fred Berenson visualized conversations around SOPA on Twitter. [...]

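  17. [...] Internet users abroad reacted quickly: amid the wave of protest against SOPA, one used the Cytoscape analysis tool to capture, in a single graph, the people with the loudest voices on Twitter (posting tweets on the topic, @-mentioning others, retweeting others). It is an interesting sociological experiment. He analyzed 1.5 million tweets in total and identified 1,021 active Twitter users; the final image is as large as 32,000 × 32,000 pixels. The original image can be found here. [...]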

  18. [...] Visualizing SOPA on Twitter by Fred Benenson (via Nerdcore) [...]

  19. [...] Fred Benson also has an interesting graph visualizing mentions of SOPA on Twitter. [...]

  20. [...] a few more links: Fred Benenson: Visualizing SOPA on Twitter Buzzfeed: 20 Images That Will Change Your Life (Nice one!) Why We’ve Censored The SOPA [...]

  21. [...] between people discussing the controversial bill on Twitter,” writes Fred Benenson. The result is astounding. Benenson admits that analyzing data that large is more an art than a [...]

  22. [...] I had worked with Cytoscape to render a network graphs, but this seemed like a good opportunity to make something interactive and also a perfect first [...]

  23. [...] the visualization here (be sure to view it in full-screen mode). Benenson has also put together a post that outlines the tools and process behind his [...]

  24. [...] that you are not. During the fight against the Stop Online Piracy Act, I was fortunate to work with Kickstarter’s Fred Benson on an analysis of tweets mentioning the hashtag “#sopa.” By studying this data-visualization we were able to demonstrate the depth and breadth of opposition [...]

  25. [...] data set, however, was missing from the paper: the role of social media, in particular Twitter, in reporting, amplifying and discussing the bills. The microblogging platform connected many [...]
