In 2012, IDC projected that the digital universe will reach 40 zettabytes by 2020. This was based on the assumption that 1.7 megabytes of new information will be created every second for every human being on the planet in 2020.

IDC realised that its projection was on the conservative side! Its current forecast says the global datasphere will grow to 163 zettabytes by 2025.

To give you an idea of the kind of size we are talking about, 1 zettabyte is a trillion gigabytes, which is roughly equal in size to 43 billion Blu-ray discs or 250 billion DVDs!
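The comparison above can be checked with a few lines of arithmetic. This is a minimal sketch, assuming decimal (SI) units and single-layer disc capacities of 25 GB for a Blu-ray disc and 4.7 GB for a DVD; those capacities are assumptions and are not stated in the lecture. Under these assumptions one zettabyte comes to about 40 billion Blu-ray discs and about 213 billion DVDs, the same order of magnitude as the figures quoted, which shift with the capacity assumed per disc.

```python
# Decimal (SI) units: 1 ZB = 10**21 bytes, 1 GB = 10**9 bytes.
ZETTABYTE = 10**21
GIGABYTE = 10**9

# Assumed single-layer disc capacities (illustrative, not from the lecture).
BLU_RAY_BYTES = 25 * GIGABYTE
DVD_BYTES = 4.7 * GIGABYTE


def discs_per_zettabyte(disc_bytes):
    """Return how many discs of the given capacity hold one zettabyte."""
    return ZETTABYTE / disc_bytes


gigabytes = ZETTABYTE // GIGABYTE          # a trillion gigabytes
blu_rays = discs_per_zettabyte(BLU_RAY_BYTES)
dvds = discs_per_zettabyte(DVD_BYTES)

print(f"1 ZB = {gigabytes:,} GB")
print(f"Blu-ray discs: {blu_rays / 1e9:.0f} billion")
print(f"DVDs: {dvds / 1e9:.0f} billion")
```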

This unimaginably vast amount of digital information can enable humanity to do many of the things that were previously considered impossible.

Data Science, which allows us to make sense of Big Data, is helping us find relationships between distinct kinds of information, thus generating new insights, creating new knowledge and leading to breakthrough innovation. Big Data can be the most potent instrument for transformative social change the world has ever seen.

It is no surprise then that the most powerful and valuable corporations in the world today are the ones that deal in data. The five US tech titans, Google, Microsoft, Amazon, Apple and Facebook, each have market caps in excess of half a trillion dollars. They are collectively worth nearly US$3.3 trillion, which is larger than the size of the Indian economy!

Data is quickly becoming the 21st century’s economic fuel, replacing oil as an essential resource for the global economy, and thereby becoming the focus of an intense struggle for control!

We stand at the dawn of the Data Age, where data will be the lifeblood of our rapidly expanding digital existence and the new raw material of business: an economic input at par with capital and labour.

IDC forecasts that 20% of data produced in 2025 will belong to data sets that run the risk of being disruptive or even life-threatening, if unavailable at any moment.



Unorganised data is useless information, but when it is analysed, structured and interpreted, it becomes valuable knowledge. Data Science, or the study of databases and data sets, epitomises what Abraham Flexner, founding director of the Institute for Advanced Study in Princeton, referred to as “The usefulness of useless knowledge”! Data has been collected and collated for centuries, but in the absence of computing power it remained archived as historical artefacts! For example, the Swedish town of Uppsala had diligently and meticulously collected health records of its citizens over 40 years. The power of this data was only realised in the 1990s, when computational analysis of this rich database provided familial and biochemical markers for cardiac diseases. Likewise, deCODE genetics created a large genomic database of the uniquely homogeneous Icelandic population in 2001, which was translated into genetic risk factors associated with several diseases and a gene discovery engine that would drive research and innovation in the future.

Have you ever wondered why Uber is valued at almost US$100 billion? Is it only because it is a successful taxi aggregator? No, that’s just a part of the story. Uber’s astronomical valuation is also a function of its ownership of the biggest pool of data about supply (car drivers) and demand (passengers) for personal transportation. Similarly, Tesla is not just a maker of fancy electric cars. The company’s latest models collect huge amounts of data, which allow it to optimise its self-driving algorithms and then update the software accordingly. By the end of 2016, the firm had gathered 1.3 billion miles’ worth of driving data: data that can be leveraged to make better real-time decisions and navigate real-world environments!

Algorithms are increasingly self-teaching: the more fresh data they are fed, the better they become, as they learn from input data and adapt their output to respond intelligently to new data. This kicks off the “data-network effect”, a powerful virtuous cycle of innovation in which data is used to attract more users, who then generate more data, which helps to improve services, which in turn attracts more users.

For example, the more users comment on or “like” Facebook posts, the more Facebook learns about those users and the better targeted the ads on their newsfeeds become. Similarly, the more people search on Google, the better its search results turn out. Or, the more buyers and sellers use a marketplace like eBay, the more efficient it becomes.

As the world becomes increasingly hyper-connected and everything and everyone becomes inter-connected into a single global network, the “data-network effect” will expand. It will allow Data Science to identify patterns across the data that would normally escape even the brightest experts among us. It will open new doors to the future.


Indian entrepreneurs have been anticipating opportunities from a Big Data explosion for over a decade. One of the first people to identify the opportunity was 28-year-old Dhiraj Rajaram, who quit his job in the US and came back to set up a small data analytics company called Mu Sigma in Bangalore in 2004.

Mu Sigma signed up its first customer — Microsoft — in 2005 to provide data analytics services that would help the software behemoth make day-to-day business decisions based on hard data rather than gut feel. Today, Mu Sigma counts 140 of the Fortune 500 companies as its clients and employs over 1000 data scientists.

As an early entrant into the fast-growing analytics space, Mu Sigma was able to sign up marquee names like Sequoia Capital, General Atlantic Partners, MasterCard and Fidelity as investors. Thanks to its expertise in data analytics, Mu Sigma is valued at over a billion dollars today!


Zettabytes of data demand exponentially greater computing power, which is the promise of quantum computing; conventional computing is akin to trying to “drink from a fire hydrant”. Unlike binary digital computing, quantum computation uses quantum bits (qubits), which allow certain complex algorithms, such as integer factorization, to deliver solutions at speeds that are orders of magnitude faster than classical computers. Simply put, quantum computing takes advantage of the strange ability of subatomic particles to exist in more than one state at any time, which allows it to store more multiplexed information and thereby perform operations more quickly, using less energy, than classical computers.

Google and IBM are leading the way in developing quantum computers, racing to claim “quantum supremacy” and to develop credible and reliable artificial-intelligence-based solutions by 2020. In 2015, a team of Google and NASA scientists reported that a D-Wave quantum computer ran up to 100 million times faster than a conventional computer on a specific optimisation problem. But moving quantum computing to an industrial scale is difficult. IBM, however, says it is confident it will deliver quantum computers with 50 qubits at commercial scale within a few years.

Artificial Intelligence cannot deliver on its stated promise of transforming the digital world without quantum computing. In the near term, AI will rely on machine learning based on first-generation quantum computing and thereby address obvious opportunities in driverless vehicles, robotics and weather forecasting. Next-generation quantum computing will have the potential to replace human intervention in ways that are unimaginable today!


Let me now turn to data science in the context of drug research. Creating a new drug from scratch is an expensive and time-consuming process. First, researchers must identify a potential therapeutic target. Then, a drug that acts on that target must be designed, purified, and tested, both on cells in a dish and in living animals. In order to be approved, this new drug must meet rigorous safety specifications and pass through highly controlled phases of human testing.

It could take up to a decade for a new drug to complete the ‘lab to market’ journey, and cost over US$2.5 billion.

With a success rate of only one in ten, the global pharmaceutical industry is looking to Data Science to enhance the probability of success and shorten timelines.

Data Science is today enabling the pharmaceutical industry to throw off the shackles of the conventional one-drug-one-target-one-disease model of healthcare innovation, which is inefficient, expensive and time-consuming.

Whilst it is true that the cost of bringing a new drug to market is enormous, it is also true that the astronomical price tags for revolutionary new treatments are unsustainable!

The US FDA recently approved the first gene therapy for an inherited disease, Luxturna, which will hit the market with a mind-boggling price tag of US$850,000. At US$425,000 for each eye, this treatment for a rare type of inherited blindness will be the most expensive drug on the market.

Moreover, despite these medical breakthroughs we are unable to anticipate ‘off target’ side effects caused by all drugs.  Additionally, we are unable to explain the presence of the large numbers of non-responders to many drugs.

Drug research today is therefore relying on multiplexed data sets to answer and solve many of these medical challenges.

An illustration of this is provided by ‘Bugworks’, a data-driven biotech startup in Bangalore (incubated at C-CAMP at NCBS) that is developing next-generation antibiotics to fight drug resistance. Antibiotics in the past have been happy accidental discoveries and have remained incremental innovations over the decades. It is noteworthy that since 1962, when the quinolone class (the forerunner of today’s fluoroquinolones) was discovered as the ‘nextgen’ antibiotic, we have not seen any breakthrough innovations to deal with a large unmet need in infection control, which still remains one of the highest causes of mortality. Bugworks has focused its efforts on leveraging and combining multiple data sets from a vast pool of clinical, genomic, pharmacological and even epidemiological data to develop algorithms that can design novel antibiotics to combat wide-ranging infections and even superbugs.


Data Science is therefore ushering in the next wave of drug innovation, by promising to transform every stage of the new drug discovery and development process.

Bioinformatics, a specialized branch of data science, is helping incorporate knowledge derived from genomics, proteomics and other biological disciplines into drug discovery and drug design in order to come up with revolutionary ideas for new molecules.

However, we are talking of a humungous amount of data here! If printed on standard office paper and stacked, the raw sequencing data of just one patient’s genome would top an 80-storey building!

The quest to sequence the first human genome was a massive undertaking. Between 1990 and the publication of a working draft in 2001, more than 200 scientists joined forces in a US$3-billion effort to read the roughly 3 billion base pairs of DNA that comprise our genetic material.

Fortunately, genome sequencing costs have plummeted in the last decade, from almost US$10 million in 2007 to close to US$1,000 today – cheap enough to put the cost of sequencing all of an individual’s DNA on par with many routine medical tests. Doctors and researchers can today study an individual’s genome without spending an astronomical sum.

The availability of genetic information, together with other phenotypic as well as medical information, is helping identify new drug targets by linking particular genes and their products to individual diseases.

In addition to genomic data, other -omics data have moved into the spotlight. Proteomics and metabolomics, as well as epigenetics and an integrated view of all of these disciplines, are gaining more and more traction. Also, the impact of lifestyle choices is now starting to be factored in.

On the other end of the spectrum, electronic health records and other patient-related information in registries, hospital administration databases and payer databases are helping establish real-world evidence for the effectiveness of a particular medicine.

Data analytics is helping predict clinical outcomes, inform clinical trial designs, support evidence of effectiveness, optimize dosing, predict product safety, and evaluate potential adverse event mechanisms.

Drug discovery research today is bringing together cross-disciplinary teams comprising biologists, chemists, clinicians and data scientists. Data scientists draw on their expertise in computer science and statistics to sift through gargantuan virtual databases of molecular and clinical data to zoom in on likely drug candidates that treat key mutations.

The Mazumdar Shaw Center for Translational Research (MSCTR), which I have helped set up in Bengaluru, is working closely with clinicians at the Mazumdar Shaw Medical Center (MSMC) to facilitate translational research that will contribute to early detection, diagnosis and treatment of various human diseases by analysing the data being generated through patients being treated at the hospital.

Close collaboration between the scientific teams of Strand Life Sciences, the Mazumdar Shaw Center for Translational Research (MSCTR), the Mazumdar Shaw Medical Center (MSMC) and HCG Cancer Hospitals recently led to breakthrough research in cancer detection. It resulted in Strand Life Sciences becoming one of the earliest companies to introduce the revolutionary liquid biopsy technology in India.

Liquid biopsy is a paradigm shift as it provides a highly sensitive technique to identify the ‘genetic signature’ of a person’s cancer through a simple blood draw. It helps to create personalized cancer treatment plans for each patient. It can also provide insights to assess if a patient is prone to a relapse and if a person is likely to respond to therapy or not.


Designing medicines to target diseases requires knowing what proteins are involved and what their shapes are.

Advances in data science are leading to accurate predictions of the interactions between novel drugs and their targets, helping reduce the cost of drug discovery by several orders of magnitude.

One of the most critical steps in evaluating any potential medical treatment involves testing its toxicity in animal models. Now Big Data is offering a validated alternative to animal tests. In fact, data analytics and machine-learning techniques that combine data sets from disparate chemical and biological assays have shown that a virtual model can at times predict toxicity better than traditional animal testing.

Data science is also leading to a rethink of the whole clinical development approach and promises to enable faster, safer, and less expensive clinical trials. Pharma companies are now using multiple data sources – including social media and public health databases – and more targeted criteria like genetic information, to identify which populations would benefit the most from a planned clinical trial.

In silico clinical trials, which involve the use of computer models and simulations in the development of a medicinal product, are gaining popularity. These trials can be executed quickly and for a fraction of the cost of a full scale live trial.

Leveraging advanced data analytics presents a real and significant opportunity for the pharmaceutical industry. A recent McKinsey analysis has shown that operating efficiencies attainable from scaling the impact of advanced analytics range as high as 15 to 30% of EBITDA over five years, accelerating to 45 to 75% over a decade given the potential impact of predictive modelling in discovering and optimizing new blockbuster therapies.

In increasingly cost-constrained global healthcare markets, pharma companies that leverage analytics for advanced data-driven decision-making over the next one to three years will gain a decisive advantage over their peers, says McKinsey.


Developing and testing a new anti-cancer drug can cost billions of dollars and take many years of research. However, advanced data analytics is turning up novel relationships in biological data that are impenetrable through ordinary statistical means. Data analytics is also allowing scientists to look at data in a multifaceted way and match drugs with an identified Mechanism of Action (MoA) to a particular disease or across different diseases.

It is helping researchers find effective anti-cancer medications from the pool of drugs already approved for the treatment of other medical conditions. Because these drugs are already known to be safe for human use, these trials could potentially be initiated very quickly thus cutting a considerable amount of time and money from the process.

For example, a computational method that systematically probes massive amounts of open-access data to discover new ways to use drugs led to the identification of four drugs with cancer-fighting potential last year. A team from the University of California, San Francisco was able to demonstrate that one of the four drugs, an FDA-approved drug called pyrvinium pamoate that is used to treat pinworms, could shrink hepatocellular carcinoma, a type of liver cancer, in mice. This cancer, which is associated with underlying liver disease and cirrhosis, is the second-largest cause of cancer deaths around the world, yet it has no effective treatment.

Similarly, a novel bioinformatics approach in 2016 enabled a team of researchers to find that the antimicrobial drug pentamidine, which is prescribed for pneumonia, can be used to treat patients with advanced kidney cancer.

There is also great excitement surrounding a promising new use for metformin, a drug that has historically been the mainstay of Type 2 diabetes management. Emerging data show a link between the use of metformin and both a decreased risk of developing cancer and reduced cancer-related mortality!


India is fast gaining global recognition for the quality of its data scientists, who enable evidence-based decision making. The ability of our data analysts has led to the emergence of decision sciences at the intersection of technology, business and maths, thus creating a new man-machine ecosystem.

The exponential growth in the fields of Artificial Intelligence and machine learning is leading to a huge demand for highly skilled professionals. Over the last year, the number of analytics jobs in India has almost doubled. As more data analytics projects get outsourced to India, due to a dearth of such skills across the world, we are likely to see a boom in this field in the years ahead.

India has an opportunity to create integrated databases that cut across genomic, molecular, chemical, clinical and even medical administrative and insurance data. Additionally, plant genomics and plant chemistries can provide another powerful multiplexed database.

India is home to more than 1 billion people, consisting of more than 4,500 anthropologically well-defined populations. India thus offers wide genetic variance – studies have shown that genetic diversity in India is up to four times greater than that found in Europeans. This offers a huge opportunity for India to create and mine a rich source of genomic information. In fact, smart mining of genetic data can help India transform the disadvantage of a huge disease burden into a competitive advantage by capturing and analysing this information.

Moreover, India is one of the richest countries in the world in genetic resources of medicinal and aromatic plants. It accounts for 11% of the total known world flora though its land mass occupies only 2% of the globe. India has 15 agro-climatic zones and 47,000 different plant species. Medicinal plants, as a group, comprise approximately 15,000 species and account for about 32% of all the higher flowering plant species of India. Out of these, the Indian systems of medicine have identified 1,500 medicinal plants, of which 500 species are generally used in the preparation of drugs. IT-enabled data mining can provide India with a huge competitive advantage in bringing innovative plant-based therapies to the world.

Bio-IT, being data agnostic, can also play a powerful role in other segments of biotechnology, including agricultural, environmental and industrial biotechnology and much more.

The Indian government can play an enabling role in incentivizing the creation and mining of these databases by giving tax breaks to companies involved in such activities. Such a move will give a fillip to the Digital India story as it will allow Indian companies to intelligently leverage data to come up with high value products for global markets.

The recent Economic Survey 2017-18 for the first time included a chapter on the transformative potential of Science & Technology. It called for a doubling of efforts and expenditure in R&D with a mission-mode focus on Genomics, Mathematics, Dark Matter, Energy Storage Systems, Cyber Physical Systems and Agriculture. Integral to this must be Big Data Analytics. It also called for the private sector to play a key role in these efforts. If so, then data-science-led R&D and businesses that rely on data analytics must be treated as priority sectors for both grants and lending. Gene sequencers, quantum computers and other Big Data infrastructure must be given tax incentives in order to build capacity and scale.


In 1906, Sir William Osler, who is known as the father of modern medicine, articulated a bold vision for medical science:

“To wrest from nature the secrets which have perplexed philosophers in all ages, to track to their sources the causes of disease, to correlate the vast stores of knowledge, that they are quickly available for the prevention and cure of disease — these are our ambitions.”

More than a century later we seem to be on the verge of realising Osler’s vision as Artificial Intelligence, fuelled by the exponential rise in Big Data, becomes a powerful extension of human intelligence.

Innovation is the collective compound interest of human ingenuity. By enabling new ways of thinking and amplifying human innovation, Data Science is enabling exponential, rather than linear, growth in human knowledge to unleash breakthrough innovation!

Download the full PDF of the speech here: IHC_Lecture_KMS


