A Different World for Social Science: Enough data, at last

How well do we understand people? Back in college, when I was exploring different disciplines, I found the social sciences disappointing. Compared to the rigor of chemistry (my chosen major), psychology and sociology seemed to fall short. How things have changed.

One of my primary objections, as I read through the technical papers presented in my texts, was the smallness of the sample sizes. While there is no comparison between the number of molecules in a liter of liquid and even the entire population of the world, one would have to say that, thanks to the Internet, sample sizes have become interesting. Facebook alone has an estimated half a billion participants.

Studies done across that many people are likely to provide reliable results, ones that are acceptable even to this one-time chemist. And, of course, the studies themselves are likely to be interesting because they deal with what people want, need, care about and invest their time in. In other words, they yield information that is important to understanding people and to achieving the goals of both governments (which, we hope, are working to better serve citizens) and businesses.

There are other limits that have been overcome, thanks to technologies aimed at exploiting the data from these very large social networks and Internet sites. One of these is the reliability of data. The pervasiveness of connectivity, along with the vast number of interactions, has provided checks against two flaws that once were quite common in the social sciences. The first was that people do not always behave in natural ways when they are part of studies. They may, even unconsciously, work to provide the results that they think are expected or that reflect well on them. Of course, they also may turn things the other way and lie within studies. They may even provide answers that are not congruent with their actual preferences and behaviors because of a lack of self-knowledge or other misperceptions. But behaviors on the Web are tracked over extended periods of time, and it is possible to correlate them in ways that previously could not have been done.

The second flaw that limited social science was simply the lag between the acquisition of data and the time at which it could be processed and, ultimately, published. Because there is now a constant datastream, it is possible to make adjustments in the midst of studies and correct any design problems. Also, the problem of data being just a snapshot in time is basically overcome by the ability to monitor the data continuously.

I mentioned design problems. Any scientist knows how easy it is to create a hypothesis and devise tests, only to find that hidden assumptions invalidate the experiments. Now researchers can fix the experiments before they get too far and before large investments are wasted. But beyond that, design itself may be less important. The use of algorithms that simply detect patterns as they exist, without depending heavily on theoretical frameworks, opens up new possibilities for exploring and using information even before what is going on is well understood.

Because there is a constant flow of new information (veritable Amazons of data are discovered every day), it is possible to go beyond what social scientists dreamed of just a few decades ago. For example, the language that people use to describe the same phenomena can vary greatly. In one of my jobs, I experienced this directly.

I would read thousands of abstracts across the many subdisciplines of chemistry and come to understand that many of the people were working on common problems, even though their vocabularies were quite different. I got the opportunity, in fact, to bring together people who were unaware of the work of their colleagues. Often, this led to some interesting collaborations. Today, it is less necessary to have human intervention for people to find each other when their jargon is not the same. All of the advances of the Semantic Web make those differences less important. This same technology, which levels the differences between specific words as they are used, can become one more way to extract meaning and value from the new Amazons of data.

This is all just scratching the surface of the huge potential we now have. Admittedly, with this new power, we are seeing news about concerns and sometimes controversies with regard to privacy. Facebook probably does not want to have any more press events where it is pushed to explain a new feature and to defend the rights it has to expose, reveal and use information from its members.

Certainly, as people become more aware of what they are telling the world about themselves online, especially in social networks, we can expect adjustments in how people behave and how data will be protected. But, at the same time, the potential for deeper understanding of people, their relationships, and how they behave provides new ways of serving them more effectively and efficiently.

My host for this blog is holding a conference on 23 November to provide a forum for discussing data, social networks and what they mean to us, as well as all the other areas of research and innovation in the Institute. The session’s overall motto is «Conocer, Innovar, Crecer» («Know, Innovate, Grow») and you can explore the agenda. This agenda is in Spanish, but I was easily able to get a good translation via Google (and that illustrates another barrier to effective social science that has fallen).

1 Comment

Esteban Moro
November 2, 2010 at 8:26 am
Hi Peter.
Very interesting post. In line with your argument, I recommend the following outlook article by a number of prestigious researchers about what they have termed «Computational Social Science»
http://www.sciencemag.org/cgi/content/full/323/5915/721
Dealing with human digital traces is challenging, though. Most of the time, observations (or experiments) can only be performed once (as in the stock market), and thus the question is not only about discovering patterns in the data, but also about figuring out which of those patterns are universal, i.e. do not depend on the actual time, country, etc. of the experiment.
This is another advantage of the large datasets we deal with: the possibility of sampling the data to unveil possible scaling relationships in the patterns we observe. For example: do human mobility and social networks have the same properties at different population scales?
Best regards
