New IMS President Bin Yu has been wondering how IMS can help your career.
I began my IMS presidency at JSM in Montreal in August. Data science, or big data, had been on my mind well before the meeting, during my flight to Montreal, and at the meeting itself. One question that I have been pondering is what statistics as a community—and IMS as an organization—could do to make sure we are a key player in the “new” field of data science or the “chic” field of big data, while these fields are being defined.
At JSM, my scientific activities were all related to data science or big data. Most, if not all, JSM sessions could be called “data science”. We might think we have been doing data science since the beginning of our field. However, there are new components of “data science” that have been driven by advances in computing, data storage, and data communication.
Statisticians are data scientists, but so are people from computer science, electrical engineering, applied mathematics, physics, biology and astronomy. In my view, the key factor for our success in data science is human resources: we need to improve our interpersonal, leadership, and coding skills. There is no doubt that our expertise is needed for all big data projects, but if we do not rise to the occasion and take leadership in these projects, we will likely become secondary to other data scientists with better leadership and computing skills. We either compute or we concede.
Although this might be a bit technical, let me discuss briefly the importance of taking computer memory into account in our computation. This is important because of the predicted computation bottleneck in communication bandwidth and the resulting latency. In a nutshell, memory has a hierarchy for us to respect when we compute: CPUs have very fast access to very small cache memory, fast access to relatively small RAM, and slow access to very large disks. R has become a popular platform, even in many parts of industry, and it can call or interact with C++ code directly. It also has a few functions to monitor memory and time usage (e.g. gc(), system.time(), Rprof()). Parallel computation is an effective way to relieve the bottleneck, and R has a few packages, such as foreach, doParallel and doMPI, to parallelize computation on a multi-core machine or a cluster.
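To make the point about monitoring memory and time concrete, here is a minimal sketch in Python (chosen only for illustration; R's gc() and system.time() serve a comparable purpose). The helper measure() and the two sum-of-squares functions are hypothetical names invented for this example. The sketch compares a computation that builds its whole intermediate result in memory with one that streams values one at a time.

```python
import time
import tracemalloc


def measure(func, *args):
    """Return (result, elapsed_seconds, peak_bytes) for one call of func.

    Hypothetical helper: peak_bytes counts Python-level allocations
    traced by tracemalloc during the call, not total process memory.
    """
    tracemalloc.start()
    start = time.perf_counter()
    result = func(*args)
    elapsed = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, elapsed, peak


def sum_squares_materialized(n):
    # Builds the full list of squares in memory before summing it.
    squares = [i * i for i in range(n)]
    return sum(squares)


def sum_squares_streaming(n):
    # Produces one square at a time, so the working set stays small.
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    n = 200_000
    for f in (sum_squares_materialized, sum_squares_streaming):
        result, elapsed, peak = measure(f, n)
        print(f"{f.__name__}: result={result}, "
              f"time={elapsed:.4f}s, peak={peak / 1e6:.2f} MB")
```

Both functions return the same answer, but the streaming version should report a much smaller peak, which is the kind of difference that starts to matter once the data no longer fit comfortably in RAM.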
The best learning model is the growth model, in which one keeps learning. For this, there are many worthy resources on the internet. For computing skills there are, for instance, the Introduction to Python and other courses at Codecademy, and the online graduate course on parallel computing by Professor Jim Demmel at UC Berkeley (http://www.cs.berkeley.edu/~demmel/cs267_Spr13/). For the frontiers of big data, I highly recommend the NAS massive data report from the committee chaired by Professor Mike Jordan at UC Berkeley (Jordan et al. (2013): NAS report on Frontiers in Massive Data Analysis, http://www.nap.edu/catalog.php?record_id=18374).
The IMS is looking into ways to position our members better to engage in big data and data science activities. We hope to improve the communication skills of our members by co-sponsoring a writing workshop with ASA (led by Nell Sedransk and Keith Crank) on the Sunday of JSM 2014 in Boston; to discuss data science at the New Researchers Conference (chaired by Edo Airoldi), July 31–August 2, 2014, at Harvard, immediately before JSM; and to publish papers on data science in a special issue of the Annals of Applied Statistics (Editor in Chief Steve Fienberg).
The IMS Council has just had a discussion, led by Past-President Hans Künsch, on how to increase the representation of women among the named and Medallion lectures of IMS, triggered by the fact that this year all these lectures were given by men (cf. Terry Speed’s column in the June/July 2013 issue). There is a broader issue within IMS regarding how to increase the representation of women, and other under-represented groups, in probability and statistics. We can all contribute to this worthy cause in different, doable ways. We could work on attracting students from these groups into our graduate programs and retaining them by mentoring such students at the undergraduate and graduate levels. When on conference and award committees, we could make separate lists of qualified women and members of under-represented groups to make sure people from such groups are considered. Departments and individuals could also participate in activities of organizations such as the Math Alliance: please watch out for a future piece in the Bulletin by Kathryn Chaloner, giving more information about this organization. Many of our colleagues have been working on this righteous cause for many years. I know many such people and I hope you do too. We should at least remember to thank them in person, at our work places and conferences.
As IMS President, I would like to get a broader spectrum of people engaged in IMS activities. As a start, I have asked the Council to recommend people for appointments on IMS committees, and the IMS leadership is looking into other concrete measures to involve the IMS community more.
Last but not least, I’d like to remind you that the IMS Bulletin does have an online discussion forum [You’re reading it now! Ed.], at http://bulletin.imstat.org (you can leave a comment on any article or post).
Please email me at email@example.com with your suggestions and ideas on how IMS can help your career and engage us more in data science and big data.