Duncan Watts, principal researcher at Microsoft Research, visited McGill yesterday as part of the official launch of the Centre for Social and Cultural Data Science. Before his formal talk, he spent an hour with graduate students discussing data science, the skills it requires, the usefulness of the data and working in industry versus academia. The discussion then moved to the Faculty Club, where he talked about projects using online experiments and social media data. The talk will hopefully be one of many future events discussing the issues raised by the recent rise of big data science.
“Data science” is extraordinarily broad, though we can roughly define some necessary skills. Dr Watts described himself more specifically as a computational social scientist. Some equate ‘data science’ with the ability to scrape social media data though, at least in my experience, this is actually one of the easiest parts of the start-to-finish process of handling data, and it is only one of many ways to acquire a dataset that could lend itself to the skills of a data scientist. Scraping data can take as few as 2 lines of code in R. Even once you have overcome the logistics of storing such data, the challenges have just begun. As Dr Watts mentioned, getting your data to the point where it can be analysed is the most time-consuming step. These data wrangling skills are a key part of finding the story in the data: in a large dataset, manually looking through the data is barely possible.
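To give a rough sense of what that wrangling step can look like, here is a minimal sketch in Python using only the standard library. It is not anything Dr Watts showed: the records, field names and date formats below are all invented for illustration, and a real scraped dataset would be far messier.

```python
from collections import Counter
from datetime import datetime

# Hypothetical raw records standing in for scraped posts (invented for illustration).
raw_posts = [
    {"user": " Alice ", "posted": "2015-03-01", "text": "Hello #McGill"},
    {"user": "bob", "posted": "01/03/2015", "text": "data science talk"},
    {"user": "bob", "posted": "01/03/2015", "text": "data science talk"},  # exact duplicate
    {"user": "", "posted": "2015-03-02", "text": "no author"},  # missing user
    {"user": "Carol", "posted": "not a date", "text": "bad timestamp"},  # unparseable date
]

# Assumed date formats: ISO (year-month-day) and day/month/year.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")


def parse_date(value):
    """Return a date for the first matching format, or None if none match."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value, fmt).date()
        except ValueError:
            continue
    return None


def clean(posts):
    """Normalize user names and dates; drop incomplete and duplicate records."""
    seen = set()
    cleaned = []
    for post in posts:
        user = post["user"].strip().lower()
        posted = parse_date(post["posted"])
        if not user or posted is None:
            continue  # skip records we cannot attribute or place in time
        key = (user, posted, post["text"])
        if key in seen:
            continue  # skip repeats of a record we already kept
        seen.add(key)
        cleaned.append({"user": user, "posted": posted, "text": post["text"]})
    return cleaned


posts = clean(raw_posts)
posts_per_user = Counter(p["user"] for p in posts)
```

Even in this toy case, three of the five records are dropped before any analysis starts, which is part of why wrangling takes so long: every rule about what to keep is a decision that shapes the final dataset.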
Regardless of the definition, a blend of statistical and coding skills is the best foundation for becoming a data scientist. This transferable, broad skill set tends to be more useful than substantive expertise, though this depends on what specifically you’re ‘doing data science’ for. Short of a focused degree in data science, getting these skills usually involves cobbling together your own curriculum by teaching yourself R and/or Python and/or statistics on the side. Dr Watts highlighted the wide availability of summer courses and workshops, including those run by McGill’s CSCDS. On top of learning these skills, I’d emphasize the importance of ‘proving’ them, a problem I’ve struggled with lately. It’s one thing to say you know how to do something, but another for your CV to show it. Think creatively…
So you’ve learned how to do some scraping, some analysis and how to prove it, but what next? The people who use Twitter are different from those who don’t. The people who post every minute on Facebook are different from those who don’t. The people who have never used a computer are different again. Dr Watts described how generalizability is a tough problem in research studies relying on the web and, in particular, social media. Depending on the research question, the problem may be insurmountable. For example, a study of an older population is probably best not done using Twitter feeds. But generalizability is only one of many problems inherent in ‘big data’, as Dr Watts acknowledged, and it may or may not be your biggest problem. Just be clear on your target population and, subsequently, on whether social media data is useful for studying it.
Given his dual academic-industry background, students were also curious about how much freedom researchers get in industry. Dr Watts suggested freedom was really company-dependent and thus an important question to ask during interviews. This can become an issue when research results would hurt a company’s bottom line, damage its reputation or create unwanted media attention. These problems might not matter to you if you are not interested in publishing, but they can be bothersome if you consistently feel censored. Understand what situations you would not like to be in, and find out whether a prospective employer would regularly put you in them.
After the informal discussion session with students, Dr Watts gave a formal talk at the official launch of CSCDS in the Faculty Club. His team has demonstrated that ‘going viral’ is rarely a real thing, that the popularity of a song is difficult to predict because it is perpetuated by peer pressure, and that some people always choose to cooperate with others to achieve an aim even when they get stung. Beyond experiments and an interest in social phenomena, Dr Watts also described a disaster response system that relied on real-time collection of information reported on social media. Similar approaches exist elsewhere, for example in aid distribution systems.
For more information on the science behind these data skills, see the Centre for Social and Cultural Data Science website. Also see the books Duncan Watts has published: