Born of the merger between Ciba-Geigy and Sandoz, Novartis is a heavyweight in the pharmaceutical industry. In 2021, the group, which employs 110,000 people worldwide, achieved a turnover of $51 billion. It spent just over $9 billion on research and development.
Pharmaceutical research, clinical trials, manufacturing, logistics… all these activities at Novartis generate data. Lots of data.
In the group’s information system, data for research and development alone takes up more than 20 petabytes of storage space.
This fact is not new. The group continues to develop its infrastructure.
Major data collection projects
Since 2015, the company has been building a dedicated analytics platform for its clinical trials. In 2016, it introduced Nerve Live. This is the “command center” dedicated to recruiting patients for the approximately 500 clinical trials the group runs each year. It includes, among others, the SENSE unit.
SENSE is a “watchtower” that provides information on the status of clinical trials, so that problems can be identified and budget overruns or delays avoided. Other advanced analytics tools make it possible to manage human resources, simulate trial results, or manage the supply of treatments the trials need to proceed.
“[These are] primary defenses that have enabled us to defer and mitigate risks for our clinical trials during the global pandemic, with limited impact on our operations and schedules,” the group stated on its website.
In 2018, this platform was based on a data lake deployed on premises. That was before Novartis developed a hybrid architecture. Nerve Live now relies on on-premises data streaming servers paired with a private cloud.
Next, Novartis developed data42, a platform designed to analyze and explore historical data from its clinical trials. It is based on an architecture that combines HPC clusters with AWS instances.
As of February 2022, data42 was used by nearly 700 researchers across three Novartis entities. Petabytes of data were ingested from more than 3,000 source systems. The platform brings together at least 3,000 clinical trials involving nearly 900,000 patients.
In 2021, Novartis, in partnership with Microsoft, deployed a data science suite to study the formulations used during the early stages of manufacturing experimental treatments.
This extensive use of data doesn’t just apply to research. Since 2015, data historians and about two dozen ERPs from the group’s factories have been uploading their data into an on-premises Hadoop system.
After finding that its HDFS scaled poorly, in 2019 Novartis decided to migrate its data to Amazon S3 and EBS volumes. Since then, Novartis has combined historical data from its factories with data from IoT sensors equipped with AWS IoT Greengrass. Industrial data is ingested into the SpotOn real-time analytics platform, which was rolled out across 18 production sites in 2021.
In China, Novartis has deployed an app called AI Nurse in partnership with Tencent. It is dedicated to the prediction and monitoring of cardiovascular disease. It is used by more than 5,000 healthcare professionals to support 300,000 patients across 1,000 hospitals.
Formula One, a platform for two thousand data scientists
Novartis did not want these initiatives to remain isolated. In 2019, the group launched the Formula One (F1) program.
The pharmaceutical group then had the idea of creating a global platform that would connect all company data for analytical purposes. Today, the system integrates “nearly all of the company’s internal data,” according to Loïc Giraud, global head of digital delivery at Novartis.
“We have a global analytics platform to which we connect over 80 major data sources,” he says. “It is used for research, clinical trials, and manufacturing — which includes production, logistics, sales and marketing — as well as support functions: purchasing or human resource management.”
Given that the company’s operations, from research to drug marketing, are interconnected, the group settled on a modular architecture. The project had to accommodate the group’s 2,000 data scientists.
“The platform is a multi-cloud, multi-product architecture,” says Loïc Giraud. It is deployed across multiple AWS and Azure cloud regions in Europe, the US, and China. According to a job posting from the group published in February 2022, Novartis planned to increase its presence on AWS by 200%, and on Microsoft Azure by 1,500%.
This platform is divided into three zones. First, there is a “landing area” where data is ingested and modeled, before it is harmonized, standardized, and integrated into the MDM. At that point, the data is subject to the Novartis business rules.
The data can then be moved to an area designated for “scrutiny”, i.e. for refinement and for predictive and prescriptive analysis. Finally, this prepared or analyzed data is pushed to AI or machine learning applications.
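The flow through these zones can be pictured with a short Python sketch. It is only an illustration under stated assumptions, not Novartis code: the `Record` class, the zone names "analysis" and "serving", the `units_as_int` rule and the sample values are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    source: str
    payload: dict
    zone: str = "landing"
    history: list = field(default_factory=lambda: ["landing"])


def land(source, raw):
    """Landing zone: ingest a raw record and normalize its column names."""
    payload = {key.strip().lower(): value for key, value in raw.items()}
    return Record(source=source, payload=payload)


def apply_business_rules(record, rules):
    """Standardization step: run each rule (a function dict -> dict) in order."""
    for rule in rules:
        record.payload = rule(record.payload)
    return record


def promote(record, zone):
    """Move a record to the next zone and keep a trace of where it has been."""
    record.zone = zone
    record.history.append(zone)
    return record


def units_as_int(payload):
    """Hypothetical business rule: store the 'units' field as an integer."""
    return {**payload, "units": int(payload["units"])}


# Hypothetical sample record from an invented source system.
record = land("erp_site_a", {" Site ": "Site A", "Units": "12"})
record = apply_business_rules(record, [units_as_int])
record = promote(record, "analysis")
record = promote(record, "serving")
print(record.zone, record.history, record.payload)
```

Keeping the zone history on each record is a design choice of this sketch; it makes it easy to see which steps a data set has already passed through.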
A “marketplace” allows business users, especially data scientists, to browse and search for data sets, sources, or any other assets they have access to. The marketplace plays the role of a group-wide data governance solution, integrating data ratios, indexing, quality management, and data mining capabilities.
The goal is to design reusable, referenceable data sets to meet different use cases. Today, more than 1,500 assets are available from the platform.
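As a rough illustration of what searching such a marketplace involves, the Python sketch below filters a small catalog by access rights and by search term. Everything in it is assumed for the example: the `Asset` fields, the group names and the asset names are hypothetical and do not describe the actual Novartis catalog.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Asset:
    name: str
    kind: str  # e.g. "dataset" or "source"
    tags: frozenset
    allowed_groups: frozenset


def search(catalog, user_groups, term):
    """Return the names of assets matching `term` that the user may access."""
    term = term.lower()
    groups = set(user_groups)
    return [
        asset.name
        for asset in catalog
        if asset.allowed_groups & groups
        and (term in asset.name.lower() or term in asset.tags)
    ]


# Hypothetical catalog entries.
catalog = [
    Asset("trial_enrollment", "dataset", frozenset({"clinical"}), frozenset({"research"})),
    Asset("plant_sensor_feed", "source", frozenset({"manufacturing"}), frozenset({"operations"})),
    Asset("clinical_sites", "dataset", frozenset({"clinical"}), frozenset({"research", "operations"})),
]

print(search(catalog, ["operations"], "clinical"))
print(search(catalog, ["research"], "clinical"))
```

The access filter runs before the term match, so a user never sees the name of an asset they are not entitled to.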
Meanwhile, a DevSecOps team is responsible for developing the platform and keeping it operational. Data access and infrastructure provisioning are largely automated.
After first going into production in the third quarter of 2020, the platform now hosts more than 300 use cases. “Internal demand is growing exponentially,” says Loïc Giraud.
How Novartis is betting on Snowflake
One use case relates more specifically to the group’s US businesses targeting pharmacies and physicians. With a siloed, aging information system slowing their work, teams had to put significant effort into launching and monitoring new media campaigns. Analytical tools were no longer suited to the situation, while the launch of new drugs was difficult to forecast.
This was one of the first use cases envisaged when the Formula One initiative was launched at the end of 2019. The emergence of the COVID-19 pandemic upended the habits of sales representatives who used to meet pharmacy managers and doctors. Business operations had to evolve to continue to inform healthcare professionals in the United States. One goal was to improve the segmentation of this population in order to improve media campaigns.
That same year, Snowflake announced the launch of its data-sharing platform, Data Exchange. This is one of the arguments that eventually convinced the platform’s administrators to opt for the multicloud data warehouse.
“The launch of Data Exchange has changed a lot of things for us,” emphasizes Loïc Giraud.
Combined with marketing efforts to increase awareness of Novartis’ drugs, this data-sharing capacity has enhanced business opportunities and accelerated a range of processes, from research to the commercialization of molecules.
Novartis therefore rethought the way it accesses external data from partners like IQVIA and Symphony Health. “We were one of the first big pharma companies to really bet on Snowflake’s data-sharing platform,” said Ed Scura, head of solutions engineering at Novartis, during a session at Snowflake Summit 2022. “It only takes a few days now.”
When it comes to analytics, Novartis wanted its teams to have an integrated experience. Conveniently, the system at the heart of Data Exchange allows data sets to be shared using a simple URL. It was also essential that the data coming from around a hundred feeds be of reliable quality. “We built hundreds of quality checks before the data got into the hands of data scientists and analysts,” says Ed Scura. For this, the platform administrators from the F1 program have combined an in-house framework, the Matillion ETL tool and Apache Spark functionality provided by Databricks.
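The principle of checking data before it reaches analysts can be sketched in a few lines of Python. This is a minimal illustration, assuming two simple rule types (no missing values, no duplicate keys); it is not the in-house framework mentioned above, and the function names and sample rows are hypothetical.

```python
def not_null(column):
    """Build a check that passes when no row is missing `column`."""
    def check(rows):
        return all(row.get(column) is not None for row in rows)
    check.__name__ = f"not_null({column})"
    return check


def unique(column):
    """Build a check that passes when `column` holds no duplicate values."""
    def check(rows):
        values = [row.get(column) for row in rows]
        return len(values) == len(set(values))
    check.__name__ = f"unique({column})"
    return check


def failed_checks(rows, checks):
    """Return the names of the checks that the rows do not satisfy."""
    return [check.__name__ for check in checks if not check(rows)]


def release(rows, checks):
    """Gate: hand rows to analysts only when every check passes."""
    failures = failed_checks(rows, checks)
    if failures:
        raise ValueError("quality checks failed: " + ", ".join(failures))
    return rows


# Hypothetical sample rows with one missing value and one duplicate key.
rows = [
    {"id": 1, "region": "east"},
    {"id": 2, "region": None},
    {"id": 2, "region": "west"},
]
checks = [not_null("id"), not_null("region"), unique("id")]
print(failed_checks(rows, checks))
```

In this sketch, `release` raises an error rather than silently dropping bad rows, so a failing feed is stopped before anyone downstream consumes it.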
Analytics workbenches consist of a mix of Amazon SageMaker, Dataiku, and R Maker, among others.
For their part, business users can access some analytics via Qlik Sense apps. “We are the largest consumer of Qlik Sense. We have 60,000 users and more than 500 Qlik apps,” Loïc Giraud reported.
It is not uncommon to see analytics platforms that allow the integration of internal and external data sources. Beyond that, Novartis has deployed 300 data pipelines (both Matillion ETL flows and Spark jobs on Databricks) to feed its Snowflake data warehouse. These flows are linked to a graph database, Amazon Neptune, so that they can be listed in the internal marketplace.
This use case alone involves a petabyte of data that is made available to more than 1,000 users in the United States.
Another selection criterion that motivated the deployment of Snowflake was performance, according to Loïc Giraud. The pharmaceutical group was able to try Snowflake in 2017.
After migrating its data from Hadoop to AWS, Novartis realized that Amazon Redshift was not meeting the needs of its sales force compensation processing. “It took a long time. Our tests with Snowflake convinced us: we had never seen this level of performance before,” says Loïc Giraud. The cloud data warehouse was then extended to other use cases. “We started with analytics, but we’ve found that Snowflake can be used at all levels,” he adds.
New use cases under development
At Novartis, Snowflake has become the marketplace of choice for data exchange. The warehouse is one of the layers used to gather the data needed by the sales force, by research and development, and to support CSR policies, but it can also drive new use cases.
In fact, Loïc Giraud welcomes the arrival of hybrid tables, a mechanism introduced by Snowflake to handle transactional processing within the data warehouse. Likewise, recent support for unstructured data, the Python programming language, and Apache Iceberg could further increase the use of this technology.
Consequently, Novartis continues to roll out Snowflake in the EMEA region and would like the supplier to strengthen its presence in China.
Meanwhile, the pharmaceutical group is exploring other technologies, including digital twins and blockchain.
Novartis is a member of the European PharmaLedger consortium. The organization is developing a blockchain dedicated to the sharing, transparency and integrity of supply chain, clinical trial and health data.