More About The Insightful Use of Data (and not about Basketball)

March 2009

Summary: Real-life experience of data collected for one purpose being used for other purposes, which adds to the argument that data mining is just beginning. And when you can do something about it (for example, by owning the data or getting access to it through a license in your agreements), you should.

Two articles in a recent issue of The Economist (February 28, 2009) added, at least in my mind, to the thesis that we are entering an era when the slicing and dicing of data (OK, OK, call it data mining) will yield actionable results and meaningful rewards.

To reiterate a point from my other posts: it is not so much the results of the studies that are interesting but the fact that data collected through new digital systems for one purpose were used for another.

Permit me to address the first one, on social networks as a source for data analysis.

Facebook and the Dunbar Number (Hint:  It is not 42)

First, forgive me if I get some of the facts wrong (they are not the point here). Several years ago a professor, Robin Dunbar, posited an upper bound (on average) on the number of people in a person’s social network. That is the Dunbar number.

Later, a Professor Marsden confirmed what common sense suggests: there is a much, much smaller “core” network. The Dunbar number is 148; the Marsden number is around ten. (By the way, the Dunbar number has been surprisingly stable over history as the size of organizing units for groups such as armies.)

Online social networks make social networks easier to create and sustain. (The studies cited conclude that online networks do not change these numbers, but that is not germane to the point of this post.) Crunching the numbers from Facebook (pretty much) confirmed both the Dunbar number and Marsden’s core. The average network is about 120 people (close enough to Dunbar), and, by looking at proxies for interaction (proxies are another theme of mine), the core worked out to about seven for men and ten for women, and somewhat higher in some circumstances. (Please keep in mind that these are averages: your mileage may vary.)

The Method Is Not Madness

The analysts looked more closely to determine the core. This is where it gets interesting. They used responses as proxies for interaction: leaving a comment for, or otherwise communicating with, someone who has communicated with you. (It turns out that there are cultural or national differences, by the way. For example, research on this core has long shown that American men tend to have a very small circle of people with whom they regularly discuss important matters, smaller than in other nationalities. This is, after all, a British magazine.)
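To make the reciprocity proxy concrete, here is a minimal Python sketch. It assumes a hypothetical log of (sender, recipient) pairs, one per comment or message, keyed by pseudonymous IDs rather than PII. The function names, the `min_each_way` threshold, and the sample data are illustrative inventions, not the method used in the studies the article cites.

```python
from collections import defaultdict


def reciprocal_core(interactions, min_each_way=1):
    """Map each user to the contacts with whom communication runs both ways.

    `interactions` is an iterable of (sender, recipient) pairs, e.g. one
    pair per comment or message (a hypothetical log format).
    """
    counts = defaultdict(int)  # (sender, recipient) -> number of messages
    for sender, recipient in interactions:
        if sender != recipient:  # ignore comments on one's own posts
            counts[(sender, recipient)] += 1

    core = defaultdict(set)
    for (a, b), n_ab in counts.items():
        # .get() avoids inserting new keys while iterating over counts
        n_ba = counts.get((b, a), 0)
        if n_ab >= min_each_way and n_ba >= min_each_way:
            core[a].add(b)
    return dict(core)


def average_core_size(core, users):
    """Average number of reciprocal contacts; users with none count as 0."""
    users = list(users)
    return sum(len(core.get(u, ())) for u in users) / len(users)


# Invented sample data with pseudonymous IDs.
events = [("ann", "bob"), ("bob", "ann"), ("ann", "cat"),
          ("cat", "dan"), ("dan", "cat"), ("dan", "cat")]
core = reciprocal_core(events)
print(core)
print(average_core_size(core, ["ann", "bob", "cat", "dan", "eve"]))  # 0.8
```

Here "ann" commented on "cat" but never heard back, so that link does not count toward either core. Raising `min_each_way` is one way to test how sensitive the core size is to what you call an "interaction."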

You can imagine all sorts of variables that affect these numbers, and you would be right. There may be more interaction between two people who are flirting; there may be more interaction between family members; and there may be less interaction among co-workers depending on their rank. Time of day, amount of alcohol (you pick) may also affect these numbers. (These points were not raised in the article; I am just mentioning them.)

So Test for It

But that is not the point. You could test for, or control for, any number of variables. Now think like an advertiser, or like a publisher that wants to increase the value of its audience (and thus the price of advertising) to advertisers.

(Note to readers: to anticipate one concern, the analysis does not have to include personally identifiable information, or PII, and can, and should, be done in full compliance with the strictest privacy policies.)

You could cross-check any number of such users to see where their interests overlap. You could then check those overlapping interests against their geographic distribution over time, then against demographic data for the group (such as the number of males in given age cohorts), and then against demographic data for the region(s), such as the baseline behavior for those cohorts.

An advertiser could then check those results against its sales or marketing efforts in the region or, for that matter, its marketing efforts to that group on that social networking site at those particular times.
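The cross-checking described above boils down to comparing what a group does with what a baseline predicts. The minimal Python sketch below assumes hypothetical per-user records (region, age cohort, set of interests) and a hypothetical table of regional baseline shares, and computes a "lift" ratio for each region and cohort: a lift above 1 means the interest is more common in the group than the baseline suggests. Lift is a standard marketing-analytics measure swapped in here for illustration; the article does not name a metric.

```python
from collections import Counter


def interest_lift(users, interest, baseline):
    """Observed share of users with `interest`, divided by a baseline share.

    users:    iterable of dicts with "region", "cohort" and "interests" keys
    baseline: {(region, cohort): expected share of people with the interest}
    Returns {(region, cohort): lift}; groups with no baseline are skipped.
    """
    totals, hits = Counter(), Counter()
    for user in users:
        key = (user["region"], user["cohort"])
        totals[key] += 1
        if interest in user["interests"]:
            hits[key] += 1

    lift = {}
    for key, n in totals.items():
        expected = baseline.get(key)
        if expected:  # skip missing or zero baselines
            lift[key] = (hits[key] / n) / expected
    return lift


# Invented sample data: no names or other PII, just coarse attributes.
users = [
    {"region": "NE", "cohort": "M25-34", "interests": {"cycling", "jazz"}},
    {"region": "NE", "cohort": "M25-34", "interests": {"cycling"}},
    {"region": "NE", "cohort": "M25-34", "interests": {"chess"}},
    {"region": "NE", "cohort": "M25-34", "interests": set()},
    {"region": "SW", "cohort": "M25-34", "interests": {"golf"}},
    {"region": "SW", "cohort": "M25-34", "interests": {"jazz"}},
    {"region": "SW", "cohort": "F35-44", "interests": {"cycling"}},
]
baseline = {("NE", "M25-34"): 0.25, ("SW", "M25-34"): 0.2}
print(interest_lift(users, "cycling", baseline))
```

In this toy data, half of the NE cohort lists cycling against a 25% baseline (lift 2.0). An advertiser could then line those lifts up against its own regional sales or campaign timing.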

OK, these insights are not entirely new. Many search and ad companies have been doing this sort of thing for a few years. What has not yet happened is for advertisers (and their gatekeepers, the ad agencies) to embrace the power of these kinds of data analytics. In other words, it has not yet gone mainstream.

So What About Dunbar Numbers?

I have not yet figured out what the actual Dunbar number (it looks like a constant to me) means for monetizing data. One thought is a kind of “data threading.” Assume that people belong to several, if not many, groups. One could trace the overlapping interests across those groups, as well as the overlap of group memberships. Here we are not necessarily talking about social groups (“friends” in the parlance of Facebook and “Contacts” in the parlance of LinkedIn).

For example, I belong to some groups in which a few of the members also belong to other groups of which I am a member, but they also belong to groups of which I am not a member. Some of those members share my interests; some do not. These overlaps could be “threaded,” and they could also be tested over time.
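One hypothetical way to "thread" these overlaps is sketched below in Python. It assumes group memberships are available as a simple mapping from group name to a set of pseudonymous member IDs. `thread_groups` follows my co-members out to groups I have not joined, and `group_overlap` scores every pair of groups with Jaccard similarity (shared members divided by combined members), a standard set-overlap measure chosen here for illustration. All names and data are invented.

```python
from itertools import combinations


def thread_groups(memberships, user):
    """Groups `user` has not joined, scored by how many co-members belong to them.

    memberships: {group_name: set of member IDs} (hypothetical structure)
    """
    my_groups = {g for g, members in memberships.items() if user in members}
    co_members = set()
    for g in my_groups:
        co_members |= memberships[g]
    co_members.discard(user)

    threads = {}
    for g, members in memberships.items():
        if g not in my_groups:
            shared = len(members & co_members)
            if shared:
                threads[g] = shared
    return threads


def group_overlap(memberships):
    """Jaccard similarity (shared / combined members) for each pair of groups that overlap."""
    overlap = {}
    for a, b in combinations(sorted(memberships), 2):
        common = memberships[a] & memberships[b]
        if common:
            overlap[(a, b)] = len(common) / len(memberships[a] | memberships[b])
    return overlap


memberships = {
    "chess": {"me", "ann", "bob"},
    "hiking": {"me", "cat"},
    "jazz": {"ann", "cat", "dan"},
    "poker": {"bob", "eve"},
    "knitting": {"zed"},
}
print(thread_groups(memberships, "me"))  # {'jazz': 2, 'poker': 1}
print(group_overlap(memberships))
```

To test the threads over time, one could run the same functions on membership snapshots taken at different dates and compare the scores.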

The Conclusion, Please

OK, OK, I am way over my self-imposed word limit. Perhaps you get the point. I will post on the second article (about crowds being analyzed by CCTV) soon. It should add weight to my point that your agreements should give you access to any data collected in any digital deals you do.

And, yes, as a result, expect CPMs (and monetizing online experiences) to rise.

