Humankind 2.0
a book in progress...
Meditations on the future of technology and society...
...to be published in China in 2016
These are raw notes taken during and after conversations between piero scaruffi and Jinxia Niu of Shezhang Magazine (Hangzhou, China). Jinxia will publish the full interviews in Chinese in her magazine. I thought of posting the English notes on my website: while incomplete, they contain most of the ideas that we discussed.
(Copyright © 2016 Piero Scaruffi | Terms of use)
Big Data: History, Trends and Future (See also the slide presentation)
Narnia: "Big Data" is a vague term. What does it really mean? What is new in "data"?
piero:
The fact that the big companies are offering their big-data platforms as open-source software to everybody
is a sign that sometimes even the most aggressive businesses value collaboration more than competition.
Understanding big data is one field that will require a shift from competition to cooperation.
I think that "big data" will introduce a new way of thinking about human life. It may sound like a horrible
world in which machines produce data and machines read data and then machines tell other machines what to do,
and, yes, our body is the ultimate object of this process, and all these machines make it less "human".
However, you can also find a Buddhist way to look at data. Our existence is a chaotic flow of data. Those data
don't last in time, they are just instants of existence. All combined, they are "me".
They are similar to the Buddhist "dharmas".
Each dharma is relative to every other dharma, each dharma is caused by another dharma.
The "Visuddhimagga" says:
"Only suffering exists, but no sufferer is to be found. Acts are but there is no actor".
In Buddhism no being "exists" for any period of time: each moment is an entirely new existence.
I think that "big data" introduce a concept of human life that is somewhat similar. We think too often of
data as just numbers, but those numbers actually represent real people. If i write that, unfortunately,
600,000 people are killed every year of malaria, mostly children, that is not just a number: it is people,
and it is also the people around them, mothers and sisters and wives, who are crying.
I hope that in the future, when everything is interpreted in terms of data, we will be able to interpret
data as people, not numbers.
Big Data requires a new way of thinking. The data come from all sorts of sources. A specialist (whether human
or machine) cannot possibly absorb all of them. What is required is an interdisciplinary approach.
In the 1930s two men pioneered "big science" in the USA: Vannevar Bush at MIT and Ernest Lawrence at Berkeley.
Unfortunately, the motivation came from the war, but the beneficiary was actually peacetime society.
Bush and Lawrence realized that solving big problems requires many minds: big science gathered together
scientists from different disciplines. Out of that approach we got, for example, nuclear power and the Internet.
Big science was an early application of "big data" except that in those days the data were in the minds of the
scientists. The approach, however, will have to be similar. In order to use big data to solve big problems
we will need to use a similar interdisciplinary approach.
There is an even earlier example of solving big problems with big data: ancient China.
I think China can be a model for the new way of thinking because China actually invented it many centuries
ago. During the Tang and Song dynasties the ideal person was an interdisciplinary scholar: politician,
historian, writer, painter, poet, calligrapher... The ideal person was supposed to study all (all!) the classics,
not just one or two. China invented the multimedia mind (and the multitasking mind!)
The ideal person was in charge of solving the big problems of society, having absorbed
so much knowledge from so many different fields. What has calligraphy got to do with solving big problems?
It shapes your brain. If the brain is not right, you will never find the right solution. Every discipline
helps create the right way of thinking. I think it was the right approach, and it is still the right approach
today. Maybe China needs to rediscover its own approach to managing a complex society, except that this approach
needs to be adapted to the age of big data (i.e., the ideal person needs to use machines, not just the "maobi", the writing brush).
Improved statistical and computational methods, and improved visualization methods, are being developed at
many universities; but these new methods serve a simple purpose: to make fast computation cheaper (analyzing
big data requires expensive computers). The progress has been impressive:
decoding the human genome originally took 10 years, but now there are startups that do it in less than a day.
Stanford's most popular textbook for undergraduate computer science students is
"Mining of Massive Datasets", whose second edition was published by Cambridge University Press in 2014:
http://www.mmds.org/
There is no secret: anybody can use those methods to analyze big data.
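To give a concrete flavor of what these methods look like, here is a minimal sketch (in Python, with made-up example sets) of MinHash, one of the techniques taught in "Mining of Massive Datasets": it estimates the similarity of two very large sets from small fixed-size signatures, instead of comparing the sets element by element.

```python
import random

# MinHash: estimate the Jaccard similarity of two large sets cheaply.
# A sketch of the technique described in "Mining of Massive Datasets";
# the example sets below are made up for illustration.

NUM_HASHES = 200
PRIME = (1 << 61) - 1  # a large Mersenne prime

random.seed(42)
# Each hash function has the form h(x) = (a*x + b) mod PRIME.
hash_params = [(random.randrange(1, PRIME), random.randrange(PRIME))
               for _ in range(NUM_HASHES)]

def minhash_signature(items):
    """Fixed-size signature of a set of integers: one minimum per hash."""
    return [min((a * x + b) % PRIME for x in items) for a, b in hash_params]

def estimated_jaccard(sig1, sig2):
    """The fraction of matching positions estimates Jaccard similarity."""
    return sum(1 for x, y in zip(sig1, sig2) if x == y) / NUM_HASHES

# Two overlapping sets (in practice: hashed fragments of two documents).
set_a = set(range(0, 1000))
set_b = set(range(200, 1200))

true_jaccard = len(set_a & set_b) / len(set_a | set_b)
estimate = estimated_jaccard(minhash_signature(set_a), minhash_signature(set_b))
print(f"true Jaccard: {true_jaccard:.3f}, MinHash estimate: {estimate:.3f}")
```

Comparing two sets of a thousand elements directly is easy, of course; the point is that the same 200-number signatures work just as well when the sets have billions of elements.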
But new math will not give us more useful applications, just cheaper data analytics, and the
reason is simple: mathematicians are not the ones who know the problems of the world.
This is yet another field in which an interdisciplinary approach is required to come up with applications that
are not just "data analytics". Yes, we need mathematicians; but we also need scholars from all other disciplines.
Solving problems in human society is not just a math test.
For example,
Gary King, director of Harvard University's Institute for Quantitative Social Science, has assembled a team of
sociologists, economists, physicians, lawyers, psychologists, etc.
You can see the current line-up at http://www.iq.harvard.edu/team-profiles
Berkeley has set up the Berkeley Institute for Data Science (BIDS), staffing it with ethnographers, neuroscientists, sociologists, economists, physicists, biologists, psychologists and even a seismologist: http://bids.berkeley.edu/
And in 2012 the USA
launched the "Big Data Research and Development Initiative" to apply big data to government.
It is a shame that so far (since the invention of computers) the main application of data analysis has been to maximize the profits of big corporations.
These days applications of big-data analysis include the "recommendation engines" of Amazon and Alibaba, which use
data about other customers to suggest what you should buy. In the USA, people got upset when the media learned
that Target, one of the largest retail chains, used math to guess when women were pregnant. Target's algorithm
recognized purchases typically related to expectant mothers, for the sole purpose of targeting them with special
promotions. Is this all we can do with big data about pregnant women?
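To make the idea concrete, here is a toy version of such a recommendation engine (a minimal sketch in Python with invented purchase data; the actual algorithms of Amazon and Alibaba are, of course, far more sophisticated): find the customers whose purchase history most resembles yours, and suggest what they bought and you didn't.

```python
# Toy "people who bought what you bought" recommender.
# The purchase data are invented for illustration.

purchases = {
    "alice": {"tea", "teapot", "honey"},
    "bob":   {"tea", "teapot", "mug"},
    "carol": {"coffee", "mug", "grinder"},
    "dave":  {"tea", "honey", "biscuits"},
}

def jaccard(a, b):
    """Similarity between two sets of purchased items."""
    return len(a & b) / len(a | b)

def recommend(user, k=2):
    """Suggest items bought by the k customers most similar to `user`."""
    mine = purchases[user]
    neighbors = sorted((u for u in purchases if u != user),
                       key=lambda u: jaccard(mine, purchases[u]),
                       reverse=True)[:k]
    # Items the neighbors bought that this user has not bought yet.
    suggestions = set()
    for u in neighbors:
        suggestions |= purchases[u] - mine
    return suggestions

print(recommend("alice"))  # {'mug', 'biscuits'}
```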
There is a broader category of big-data applications, applications that guess the future. For example, using
big data we could predict when pollution will reach a dangerous level without waiting for the day that it happens;
we could predict where and when crime is more likely to happen and therefore allocate police resources there.
Banks already use a "predictive" kind of big-data analysis when they want to determine if a customer deserves
a loan: credit underwriting. Banks could decide to underwrite a loan in seconds by using all the data available
on people like you. You are likely to behave like all the other people in your age group, income group,
ethnic group, etc. The bank can use big data to determine if you can be trusted.
These predictive applications typically look for associations: if you have the same purchasing history as many other
people who defaulted on their credit-card payments, it is very likely that you will default too.
In technical terms, they look for patterns and then try to build hypotheses.
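A minimal sketch of this kind of association-based scoring might look like the following (Python, with invented customers and a deliberately crude similarity rule; real underwriting models are far richer): score a new applicant by the default rate among the most similar past customers.

```python
# Toy association-based credit scoring: predict default risk from the
# outcomes of the most similar past customers. All data are invented.

past_customers = [
    # (age_group, income_band, purchase_profile, defaulted)
    ("20s", "low",  "electronics", True),
    ("20s", "low",  "electronics", True),
    ("20s", "mid",  "groceries",   False),
    ("40s", "high", "travel",      False),
    ("40s", "mid",  "groceries",   False),
    ("30s", "low",  "electronics", True),
]

def similarity(a, b):
    """Count how many attributes two customers share."""
    return sum(1 for x, y in zip(a, b) if x == y)

def default_risk(applicant, k=3):
    """Fraction of the k most similar past customers who defaulted."""
    ranked = sorted(past_customers,
                    key=lambda c: similarity(applicant, c[:3]),
                    reverse=True)[:k]
    return sum(1 for c in ranked if c[3]) / k

print(default_risk(("20s", "low", "electronics")))  # 1.0: like past defaulters
print(default_risk(("40s", "high", "travel")))      # 0.33: mostly reliable peers
```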
But we are back to the problem that most data are "read" and analyzed by machines, not by humans.
We have known for centuries that hypothesis-formation methods have a weakness:
finding correlations in very large datasets is not difficult; what is difficult is
understanding "causation". If all the people who caught the flu yesterday in Turin prefer black-and-white shirts,
it doesn't mean that black-and-white shirts cause the flu, or that the sellers of black-and-white shirts are
contagious: it may simply mean that they are all fans of the Juventus football club, whose official shirt is
black and white. Half of the population of Turin are Juventus fans. Mathematicians who don't follow football would
reach the wrong conclusion. Machines that know nothing about football would be even worse at reaching the right
conclusion. Instead, a human being who knows the city of Turin would realize that the correlation does not
tell us much about causation, except that maybe the outbreak started at a stadium where Juventus played.
This problem is as old as the science of statistics, but it becomes particularly vexing with huge datasets because
in huge datasets the likelihood of accidental correlations is... huge.
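This is easy to demonstrate with purely synthetic data. The following sketch (Python; every variable is random noise, so by construction nothing causes anything) shows that, with enough variables and few observations, some pair will look strongly correlated by accident.

```python
import random
import statistics

# Accidental correlations: generate many unrelated noise variables and
# find the pair that happens to correlate most strongly by pure chance.

random.seed(7)
NUM_VARIABLES = 200  # e.g. 200 unrelated measurements about a city
NUM_SAMPLES = 20     # e.g. 20 days of observations

data = [[random.gauss(0, 1) for _ in range(NUM_SAMPLES)]
        for _ in range(NUM_VARIABLES)]

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

i, j, r = max(((i, j, pearson(data[i], data[j]))
               for i in range(NUM_VARIABLES)
               for j in range(i + 1, NUM_VARIABLES)),
              key=lambda t: abs(t[2]))
print(f"strongest accidental correlation: variables {i} and {j}, r = {r:.2f}")
# Typically prints |r| well above 0.7 even though all the data are noise.
```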
Predictions based on big data can be especially useful in the medical and biotech fields, where the amount of available data is virtually infinite but sometimes we don't even store them in digital format.
The human genome contains billions of base pairs. Our current knowledge of what all the genes in the human genome do,
and how they interact with each other to cause diseases, is ridiculously minimal.
For the record, biologists also study the microbiome, the bacteria that live inside us and are crucial to the proper functioning of our body (for example, digestion): there are 100 times more genes in the microbiome than in the genome.
We don't know what those billions of base pairs mean, but we have more than seven billion people on this planet
whose genomes can be compared to find out which combinations of genes are likely to be a problem and which
combinations can give immunity. Some people are immune to malaria. We can find out by studying the distributions
of those billions of base pairs.
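In its simplest form such a comparison is just counting: tally how often each genetic variant appears among the immune and among the susceptible, and flag the variants where the difference is striking. Here is a minimal sketch (Python, with a fabricated toy dataset; real genome-wide studies involve millions of variants and rigorous statistics to rule out chance).

```python
# Toy genome comparison: which variant is more common among people
# immune to a disease? All genotypes are fabricated for illustration.

# 1 = carries the variant, 0 = does not, for three candidate variants.
immune      = [[1, 0, 1], [1, 1, 0], [1, 0, 0], [1, 0, 1], [1, 1, 1]]
susceptible = [[0, 0, 1], [0, 1, 0], [1, 0, 1], [0, 0, 0], [0, 1, 1]]

def variant_frequency(group, v):
    """Fraction of the group carrying variant number v."""
    return sum(person[v] for person in group) / len(group)

for v in range(3):
    f_imm = variant_frequency(immune, v)
    f_sus = variant_frequency(susceptible, v)
    print(f"variant {v}: immune {f_imm:.1f}, susceptible {f_sus:.1f}, "
          f"difference {f_imm - f_sus:+.1f}")
# Variant 0 stands out (1.0 vs 0.2): a candidate for further study.
# As always, this is an association, not yet a proof of causation.
```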
Stanford hosts a yearly conference titled "Big Data in Biomedicine" with the motto
"Data science will shape human health for the 21st century".
Google itself once analyzed search terms by region to predict outbreaks of influenza;
and Google's project DNAStack studies genetic data from around the world to predict diseases https://www.dnastack.com
Shamefully, a lot of the big data needed to provide useful applications to the public are owned by
corporations that don't make them available to the researchers who could use them.
There are also data all over our environment that could provide useful information but that we "waste". For example,
the Sloan Foundation is funding a project to collect information about which microbes we humans leave on the
touchscreen machines of railway stations. Those microbes can yield a lot of information about the health situation
in a city.
Narnia: do you agree with Jaron Lanier that only corporations make money out of everybody's data?
piero:
Narnia: What about the "quantified self" movement? In 2007 writers Gary Wolf and Kevin Kelly of Wired magazine introduced the term "quantified self". The concept caught on quickly and in 2011 the first international conference was held in Mountain View. The "quantified self" movement started from the premise that our lives continuously produce data. We can physically record those data by wearing sensors connected to computers. These data can then be used to document a person's life: self-tracking and self-monitoring. Isn't that an example of "big data" applied to ordinary lives?
piero: