Original Post From Irving Wladawsky-Berger
A couple of months ago I was appointed executive-in-residence at the Center for Urban Science and Progress (CUSP), an interdisciplinary applied science research institute led by NYU and NYU-Poly in partnership with academic institutions, global companies and New York City government agencies. CUSP’s overriding mission is the study of “the grand technical, intellectual, engineering, academic, and human challenges posed by a rapidly urbanizing world.” It was formally launched last April by New York City’s mayor Michael Bloomberg.
CUSP’s research and educational programs are centered on urban informatics–“the acquisition, integration, and analysis of data to understand and improve urban systems and quality of life.” Big Cities + Big Data and Bringing Urban Data to Life are prominently displayed on its website. This coming September, it will start offering two new programs in Applied Urban Science and Informatics: a 30-credit Master of Science and a 12-credit Advanced Certificate.
Given the central role of big data at CUSP and in other initiatives I’m involved with, I’d like to step back and reflect on what this all means.
Wikipedia defines big data as “a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications.” Over the past several years we have seen the application of big data to a variety of initiatives in business, government and academia. But a number of experts have questioned whether it is big data per se that we should be excited about, or the emerging data science disciplines, like urban informatics in CUSP’s case, which aim to leverage big data to discover new insights, enhance decision making, improve the effectiveness of people-oriented processes and optimize the overall management of social organizations like cities, companies and economies.
Data scientists should have a hybrid set of skills: the IT skills that are necessary to deal with and analyze vast amounts of data; and the subject matter skills needed to know which valuable business insights can be extracted from the data, and how to best frame the questions and build the right model that will reveal these insights.
Used by itself, as it often is, the term big data might imply that the emphasis is on the data. With data science, it is clear that the emphasis is on the science.
Scientific disciplines seek to develop testable explanations and predictions based on applying scientific methods to their particular areas of research. As per Wikipedia: “To be termed scientific, a method of inquiry must be based on empirical and measurable evidence subject to specific principles of reasoning.”
This has long been the case with mature disciplines like physics, chemistry and biology. Every time a new measuring instrument or technology is developed–e.g., a new kind of telescope, an advanced microscope, a better technique for genomic sequencing, a more powerful particle accelerator–lots of new information gets collected that validates theories and predictions and/or leads to new questions, and sometimes, to whole new areas of research.
For the past couple of decades, we have turned our measuring instruments on ourselves. We’ve been using the ubiquitous digital technologies and devices all around us to both create and collect massive amounts of information on who we are, what we do and how we interact as individuals, communities and institutions. And, just like the established disciplines, the emerging data-science-oriented disciplines like urban informatics and information-based medicine aim to leverage all these sources of data to develop new scientific methods of inquiry.
This is all in its very early stages. What is data science?, by Mike Loukides of O’Reilly Media, is one of the most comprehensive articles I have seen on the subject. “Merely using data isn’t really what we mean by data science,” writes Loukides. “A data application acquires its value from the data itself, and creates more data as a result. It’s not just an application with data; it’s a data product. Data science enables the creation of data products. . . The thread that ties most of these applications together is that data collected from users provides added value. Whether that data is search terms, voice samples, or product reviews, the users are in a feedback loop in which they contribute to the products they use. That’s the beginning of data science.”
Loukides contrasts the holistic approach employed by data scientists with those used in data mining, business analytics and other applications that apply statistical analysis to large data sets. “[Data scientists] are inherently interdisciplinary. They can tackle all aspects of a problem, from initial data collection and data conditioning to drawing conclusions. They can think outside the box to come up with new ways to view the problem, or to work with very broadly defined problems: ‘here’s a lot of data, what can you make from it?’”
According to experts Loukides interviewed for his article, the best data scientists tend to be physicists and other scientists. “Physicists have a strong mathematical background, computing skills, and come from a discipline in which survival depends on getting the most from the data. They have to think about the big picture, the big problem. When you’ve just spent a lot of grant money generating data, you can’t just throw the data out if it isn’t as clean as you’d like. You have to make it tell its story. You need some creativity for when the story the data is telling isn’t what you think it’s telling.”
In addition, a lot of the innovation in physics and other sciences involves knowing how to break large, complex problems into smaller ones, as well as how to attack a large, difficult problem that appears intractable by making the necessary approximations and by finding a more tractable problem whose solution can be related to the larger problem’s solution.
Over the past few centuries, we have significantly increased our understanding of the natural world around us by learning how to collect large amounts of data and by developing disciplined ways to study, analyze, model and make sense of all that data. We have similarly applied our scientific methods in the social sciences to enhance our understanding of societies and human behavior. Given the explosion of new data we can now gather with our ubiquitous digital technologies, let us hope that a whole new set of data-science-oriented disciplines will emerge to help us better understand and deal with our increasingly complex lives and human organizations–like cities.