Research claims that a full 90% of all the data in the world has been generated over the last two years. This tsunami of digital data has brought along incredible insights, but also many headaches. One of the most prominent challenges relates to the inferences we draw from these data. Kate Crawford argues in “The Hidden Biases in Big Data” on Harvard Business Review that “We give numbers their voice, draw inferences from them, and define their meaning through our interpretations. Hidden biases in both the collection and analysis stages present considerable risks, and are as important to the big-data equation as the numbers themselves.” O’Reilly Radar’s Mike Loukides stresses, in “Data Skepticism”, that “even when you have unlimited data, you have to be very careful about the conclusions you draw from that data. It is in conflict with the all-too-common idea that, if you have lots and lots of data, correlation is as good as causation.”
This particularly strikes a nerve when it comes to social data (and Social Network Analysis). You must have, at least once, come across titles such as “Social Networks are making us anti-social”, “Facebook causes divorce” or “Twitter moods help predict stock markets”. Such headlines might reflect a mere misinterpretation of the original studies (causation sells way better than correlation, as the comic aptly illustrates) or a defect in the analysis and interpretation of the studies themselves. We tend to jump to such conclusions because our minds react better to narratives, and “because” is a convenient connector of ideas.
When examining the virality of content in social networks, a tweet for instance, the observed contagion phenomenon is often explained through ‘peer influence’. That could be the case, but it’s good to stop and think ‘Maybe the answer isn’t that simple’; maybe there are alternative explanations. Sinan Aral, an associate professor at the MIT Sloan School of Management, delivers a compelling talk about social contagion and highlights how homophily (the tendency of similar people to bond together) is a viable explanation for some diffusion phenomena often attributed to peer influence.
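To make the homophily argument concrete, here is a minimal toy simulation. It is a sketch, not a model from Aral’s talk. Everything in it is an assumption chosen for illustration: the function name `same_choice_rate`, the uniform “interest” trait, the 0.2 similarity threshold and the network sizes. Nobody in this simulated world influences anybody else; each person decides to share a piece of content based only on their own interest. The only thing that changes between the two runs is whether ties form between similar people.

```python
import random

def same_choice_rate(homophily, n=2000, n_edges=10000, seed=0):
    """Share of ties whose two endpoints made the same sharing decision.

    Hypothetical toy model: there is NO peer influence anywhere in it.
    """
    rng = random.Random(seed)
    # Latent interest in the topic; it alone drives the decision to share.
    interest = [rng.random() for _ in range(n)]
    # Each person decides independently of everyone else.
    shared = [rng.random() < p for p in interest]

    edges = []
    while len(edges) < n_edges:
        a, b = rng.sample(range(n), 2)
        similar = abs(interest[a] - interest[b]) < 0.2
        # With homophily, ties only form between people with similar interest.
        # Without it, any random pair becomes a tie.
        if similar or not homophily:
            edges.append((a, b))

    same = sum(shared[a] == shared[b] for a, b in edges)
    return same / len(edges)

if __name__ == "__main__":
    print("random ties:     ", same_choice_rate(homophily=False))
    print("homophilous ties:", same_choice_rate(homophily=True))
```

In the homophilous network, friends agree noticeably more often than in the random one, even though no one ever copied a friend. An analyst looking only at that clustering could easily read it as contagion.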
When stressing how ‘Big Data’ (social data included) will revolutionize business or how ‘visualization will save big data’, vendors fail to stress these interpretation issues (which is understandable). Human intervention is an omnipresent part of the conception and analysis process. Unless our analysts (or data scientists, if you like) are open-minded enough to consider alternative explanations, or we come to find more appropriate models, we might just be digging ourselves into a much bigger hole.