In silico modeling for scaling up bioprocesses – its time has come

Bioprocesses harness the ability of living organisms to produce useful products such as vaccines, biofuels, and green ingredients for cosmetics.

For example, many vaccines use human cells, which are grown in bioreactors, to reproduce a harmless version of the virus, much as cells would in the body. The new mRNA vaccines use enzymes to rapidly produce RNA in large quantities from a DNA template. Enzymes are also widely used to turn biomass into useful green chemicals, which replace fossil-fuel-derived chemicals in automotive fuel, cosmetics, and fertilizer.

Each chemical output is produced according to a “recipe” that lists the production facility settings, such as timings, temperatures, volumes, concentrations, pH, microorganisms, lactate feed, glucose, and so on. Recipes often run to hundreds of pages and require expert knowledge that can be hard to codify.
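
Part of a recipe can be captured as structured data rather than prose, which makes it easier to validate, compare, and feed into models. The sketch below is a minimal, hypothetical Python representation. The class names (`Recipe`, `RecipeStep`), fields, checks, and all numeric values are illustrative assumptions, not taken from any real recipe.

```python
from dataclasses import dataclass, field


@dataclass
class RecipeStep:
    """One step of a hypothetical bioprocess recipe and its setpoints."""
    name: str
    duration_h: float
    temperature_c: float
    ph: float
    feeds: dict[str, float] = field(default_factory=dict)  # feed name -> rate (g/L/h)


@dataclass
class Recipe:
    """A hypothetical, heavily simplified recipe: a product plus ordered steps."""
    product: str
    organism: str
    working_volume_l: float
    steps: list[RecipeStep]

    def total_duration_h(self) -> float:
        return sum(step.duration_h for step in self.steps)

    def validate(self) -> list[str]:
        """Return the problems found; an empty list means every check passed."""
        problems = []
        for step in self.steps:
            if not 0.0 <= step.ph <= 14.0:
                problems.append(f"{step.name}: pH {step.ph} outside 0-14")
            if step.duration_h <= 0:
                problems.append(f"{step.name}: duration must be positive")
            for feed, rate in step.feeds.items():
                if rate < 0:
                    problems.append(f"{step.name}: negative {feed} feed rate")
        return problems


# Illustrative values only.
recipe = Recipe(
    product="example product",
    organism="example cell line",
    working_volume_l=2.0,
    steps=[
        RecipeStep("growth", duration_h=72, temperature_c=37.0, ph=7.1,
                   feeds={"glucose": 0.5}),
        RecipeStep("production", duration_h=240, temperature_c=33.0, ph=7.0,
                   feeds={"glucose": 0.8, "lactate": 0.1}),
    ],
)
print(recipe.total_duration_h())  # 312
print(recipe.validate())          # []
```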

Getting the recipe wrong can mean that batches of expensive product go bad and must be thrown out. More often, though, it means a sub-optimal yield, which hurts profitability and delays time to market.

The unique challenge of scaling bioprocesses

A particular challenge is the journey from lab to production facility. Chemical processes generally behave differently when moved from lab flasks to industrial vats. Processes designed on the lab bench often overlook scale-up challenges, leading to late-stage redesigns to improve efficiency, reduce reliance on expensive feedstocks, or design out unwanted waste chemicals. But the challenge is far greater for bioprocesses, which involve complex living organisms and must account for hard-to-predict variables, such as how the volume of the vessel affects the organisms' mutation rate.
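
Part of the reason is simple geometry. For geometrically similar vessels, volume grows with the cube of the diameter while wall area grows only with its square, so the wall area available per liter (for example, for heat exchange through a cooling jacket) shrinks as the vessel grows. The Python sketch below illustrates this under stated assumptions: an idealized cylinder whose height is a fixed multiple of its diameter (an assumed aspect ratio of 2), counting the side wall and base. It is a geometric illustration only, not a bioprocess model.

```python
import math


def surface_to_volume(volume_l: float, aspect_ratio: float = 2.0) -> float:
    """Side-wall plus base area per unit volume (m^2 per m^3) of an idealized
    cylindrical vessel whose height is aspect_ratio times its diameter."""
    volume_m3 = volume_l / 1000.0
    # V = pi * D^2 * H / 4 with H = aspect_ratio * D, solved for D
    diameter = (4.0 * volume_m3 / (math.pi * aspect_ratio)) ** (1.0 / 3.0)
    height = aspect_ratio * diameter
    area = math.pi * diameter * height + math.pi * diameter ** 2 / 4.0
    return area / volume_m3


for volume_l in (1, 100, 10_000):
    print(f"{volume_l:>6} L: {surface_to_volume(volume_l):6.1f} m^2 per m^3")
```

Because the ratio scales as V^(-1/3), going from a 1 L vessel to a 10,000 L vessel cuts the wall area per unit volume by a factor of 10,000^(1/3), roughly 21.5.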

The dream has long been in silico models of bioprocesses. These would take data from lab measurements and bioreactors to build biological models that predict how an industrial environment will impact the process.

Such models could create a digital twin of the actual process happening in the bioreactor. This data “twin” could be used to run AI simulations, tweaking input variables such as temperature, enzyme choice, etc., to understand the impact of changes and identify optimal approaches.
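
As a rough illustration of the idea, the Python sketch below treats a toy simulation function as the "twin" and sweeps two input variables to find the combination with the highest simulated product. Everything in it is a labeled assumption: `simulate_batch`, its kinetics, and its parameter values are hypothetical and chosen only for illustration, and a plain grid search stands in for the AI-driven optimization mentioned above.

```python
import itertools
import math


def simulate_batch(temperature_c: float, glucose_feed: float, hours: float = 120.0) -> float:
    """Hypothetical toy 'digital twin' returning final product concentration (a.u.).

    Assumed kinetics, for illustration only: the growth rate peaks near 36 C,
    product forms in proportion to biomass, and product degrades faster at
    higher temperature. Integrated with explicit Euler steps of 0.1 h.
    """
    mu_max = 0.05 * math.exp(-((temperature_c - 36.0) / 3.0) ** 2)  # 1/h
    k_deg = 0.002 * math.exp(0.15 * (temperature_c - 30.0))         # 1/h
    biomass, glucose, product = 0.1, 5.0, 0.0                       # g/L, g/L, a.u.
    dt = 0.1
    for _ in range(int(hours / dt)):
        growth = mu_max * glucose / (0.5 + glucose) * biomass  # Monod-type limitation
        biomass += growth * dt
        glucose = max(glucose + (glucose_feed - 2.0 * growth) * dt, 0.0)
        product += (0.02 * biomass - k_deg * product) * dt
    return product


# Sweep the "what if" space: every combination of the candidate settings.
temperatures_c = [30.0, 32.0, 34.0, 36.0, 38.0]
glucose_feeds = [0.0, 0.05, 0.1]  # g/L/h
grid = list(itertools.product(temperatures_c, glucose_feeds))
best = max(grid, key=lambda setting: simulate_batch(*setting))
print("best (temperature C, glucose feed g/L/h):", best)
```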

Once the process is underway, the model's predictions could also be compared with real-time measurements. This comparison would spot any deviation from the expected results, and the system could then automatically make adjustments or flag concerns.
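
A minimal sketch of that comparison, assuming the model's predictions and the sensor readings arrive as aligned series: an alert is raised only when the gap stays outside a tolerance for several consecutive samples, so a single noisy reading does not trigger it. The function name, tolerance, and pH values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DeviationAlert:
    index: int       # sample index at which the alert was raised
    residual: float  # measured minus predicted value at that sample


def check_deviation(predicted, measured, tolerance, consecutive=3):
    """Flag sustained deviations between model predictions and live measurements.

    An alert is raised when |measured - predicted| exceeds `tolerance` for
    `consecutive` samples in a row; one alert per sustained run.
    """
    alerts = []
    run = 0
    for i, (p, m) in enumerate(zip(predicted, measured)):
        residual = m - p
        run = run + 1 if abs(residual) > tolerance else 0
        if run == consecutive:
            alerts.append(DeviationAlert(i, residual))
    return alerts


predicted_ph = [7.00, 7.00, 6.98, 6.97, 6.95, 6.94, 6.93, 6.92]
measured_ph = [7.01, 6.99, 7.10, 6.96, 7.08, 7.09, 7.10, 7.11]
for alert in check_deviation(predicted_ph, measured_ph, tolerance=0.1):
    print(f"sample {alert.index}: residual {alert.residual:+.2f}")
# sample 6: residual +0.17
```

The isolated spike at sample 2 is ignored; the sustained drift from sample 4 onward triggers one alert once it has lasted three samples.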

Although in silico modeling is not new, scale-up models have proven hard to do well. The primary problem has been a lack of quality data on bioprocesses. But thanks to advances in measurement sensors that can be deployed in bioreactors, and in the tools that turn their data into meaningful insights (AI, modeling, compute power, and data management), modeling bioprocesses is increasingly possible.

In silico modeling for scale-up

The opportunity offered by in silico modeling of bioprocess scale-up has been recognized by, among others, the French government, which has committed to a five-year project with Tessella, Sanofi, and others to develop a new approach to modeling bioproduction processes using data and AI.

The goal is to develop scalable approaches that can be adapted to a wide range of bioprocess planning and optimization tasks, reducing cost and time to market as the industry grows.

Figure: Bioprocess data journey. Source: Capgemini Engineering

This project identified four stages for building better in silico scale-up models, and they are worth considering for anyone tackling this problem. The stages are:

  1. Sensor development and finer-grained data collection

    Acquire or develop sensors that allow detailed time-series analysis of what's going on in the bioreactor. For example, the project uses electrochemical and MEMS sensors and Raman spectroscopy to characterize microorganisms, contaminants, and chemical changes across the batch. High-quality temperature, pressure, and pH sensors are also important and are already widely used.

  2. Use data management approaches to ensure you make sense of your data

    Set up a storage and data architecture that feeds captured data into an anonymized database alongside relevant historical data.

    This approach should use intelligent indexing and query systems to ensure data sets are easy for all users to find, interrogate, and integrate into models. Bake in data security and incorporate machine learning tools to spot data errors, inconsistencies, and anomalies.

    Build dashboards that clearly visualize data for users and for regulatory compliance. We recommend graph-based representations, which display the relationships between data nodes (much as our brains make sense of the world) and aid human understanding of complex processes.

    Figure: The three pillars of bioproduction. Source: Capgemini Engineering

  3. Deploy advanced data science to model scale-up

    Pull relevant data sets from the database to understand and design processes that will perform beyond the lab. Build them into mechanistic physical and biological models, which follow the laws of physics (as opposed to statistical models fitted to past performance) and remain robust even when only limited data is available.

  4. Integrate the model with the real-world system

    Once the process gets underway, your models should predict the real-time sensor read-outs (assuming good sensors and good models). Software can then be added to correct drifting parameters to keep the process on track or flag concerns to a human if something happens outside the models’ expectations.
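
For stage 2, a minimal Python sketch using only the standard library: readings are stored in an in-memory SQLite table under a pseudonymized batch identifier, and suspicious values are flagged with a robust (median and MAD based) modified z-score, a simple statistical stand-in for the machine learning tools mentioned above. The key, batch ID, table layout, and 3.5 threshold (a common rule of thumb) are assumptions. Note that a keyed hash pseudonymizes identifiers but does not by itself make a data set anonymous.

```python
import hashlib
import hmac
import sqlite3
import statistics

# Hypothetical key; in practice it would come from a secrets manager, not source code.
PSEUDONYM_KEY = b"example-key-not-for-production"


def pseudonymize(batch_id: str) -> str:
    """Replace a batch identifier with a truncated keyed SHA-256 hash."""
    return hmac.new(PSEUDONYM_KEY, batch_id.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def robust_outliers(values: list[float], threshold: float = 3.5) -> list[int]:
    """Indices whose modified z-score, 0.6745 * (x - median) / MAD, exceeds threshold."""
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])
    if mad == 0:
        return []  # no spread to measure against
    return [i for i, v in enumerate(values) if abs(0.6745 * (v - median) / mad) > threshold]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE readings (batch TEXT, hour INTEGER, sensor TEXT, value REAL, suspect INTEGER)"
)

hourly_ph = [7.02, 7.01, 7.03, 7.00, 9.80, 7.01, 6.99]  # 9.80 mimics a sensor glitch
suspect_hours = set(robust_outliers(hourly_ph))
batch = pseudonymize("LOT-0001")  # hypothetical batch ID
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?, ?, ?)",
    [(batch, hour, "pH", value, int(hour in suspect_hours))
     for hour, value in enumerate(hourly_ph)],
)
print(conn.execute("SELECT hour, value FROM readings WHERE suspect = 1").fetchall())
# [(4, 9.8)]
```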
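
The graph-based representations recommended in stage 2 rest on a graph data model. The Python sketch below, with hypothetical node names, stores labeled relationships between recipes, raw-material lots, batches, sensors, and anomalies, then uses a breadth-first search to list everything within two hops of an anomaly, the kind of neighborhood a dashboard might draw. It is an in-memory illustration, not a recommendation of any particular graph database.

```python
from collections import defaultdict, deque

# Hypothetical provenance records: (source node, relationship, target node)
EDGES = [
    ("recipe R-v3", "instantiated as", "batch B1"),
    ("recipe R-v3", "instantiated as", "batch B2"),
    ("glucose lot G-12", "fed into", "batch B1"),
    ("glucose lot G-13", "fed into", "batch B2"),
    ("batch B1", "measured by", "Raman probe P-7"),
    ("batch B2", "measured by", "Raman probe P-7"),
    ("batch B2", "flagged", "anomaly: pH spike"),
]


def build_graph(edges):
    """Undirected adjacency list; each entry keeps the relationship label."""
    graph = defaultdict(list)
    for source, label, target in edges:
        graph[source].append((label, target))
        graph[target].append((label, source))
    return graph


def neighborhood(graph, start, max_hops=2):
    """Breadth-first search returning {node: hop distance} for nodes within max_hops."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for _label, neighbor in graph[node]:
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return distance


graph = build_graph(EDGES)
for node, hops in neighborhood(graph, "anomaly: pH spike").items():
    print(hops, node)
```

Here the search links the pH anomaly to batch B2 and, one hop further, to that batch's recipe, glucose lot, and Raman probe, while batch B1 stays out of view.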
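
Stage 3 calls for models built on physical and biological mechanisms rather than on past performance alone. As one textbook example (not necessarily what the project uses), the Python sketch below integrates Monod growth kinetics for a batch culture and fits its single unknown parameter, the maximum specific growth rate, to just four biomass measurements. The parameter values are hypothetical, and the "measurements" are generated from the model itself, so the fit recovers the true value exactly; real data would add noise and call for a proper optimizer instead of a candidate grid.

```python
def monod_batch(mu_max: float, ks: float = 0.5, yield_xs: float = 0.5,
                x0: float = 0.1, s0: float = 10.0, hours: int = 24,
                dt: float = 0.01) -> list[float]:
    """Batch culture with Monod kinetics, integrated by explicit Euler.

    dX/dt = mu_max * S / (Ks + S) * X      (biomass X, g/L)
    dS/dt = -(1 / Y_xs) * dX/dt            (substrate S, g/L)
    Returns biomass at hours 0, 1, ..., hours.
    """
    x, s = x0, s0
    steps_per_hour = round(1.0 / dt)
    biomass = [x]
    for _ in range(hours):
        for _ in range(steps_per_hour):
            growth = mu_max * s / (ks + s) * x
            x += growth * dt
            s = max(s - growth / yield_xs * dt, 0.0)
        biomass.append(x)
    return biomass


def fit_mu_max(observations: list[tuple[int, float]], candidates: list[float]) -> float:
    """Return the candidate mu_max (1/h) minimizing squared error against (hour, X) data."""
    def sse(mu: float) -> float:
        simulated = monod_batch(mu)
        return sum((simulated[hour] - x) ** 2 for hour, x in observations)
    return min(candidates, key=sse)


# Four sparse "measurements", synthesized here from the model with mu_max = 0.3 1/h.
truth = monod_batch(0.3)
observations = [(hour, truth[hour]) for hour in (0, 6, 12, 18)]
candidates = [round(0.05 * k, 2) for k in range(1, 13)]  # 0.05 to 0.60 1/h
print(fit_mu_max(observations, candidates))  # 0.3
```

Because the model's structure is fixed by the mechanism, only one parameter has to be learned from the data, which is why a handful of points can be enough.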
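
For stage 4, one simple pattern is a proportional correction with an escalation threshold: small gaps between predicted and measured values nudge a controllable input, while large gaps leave the process untouched and hand over to a human. The Python sketch assumes a glucose feed rate is the input being corrected; the function name, gain, limits, and units are hypothetical, and a real controller would need tuning and validation.

```python
def correct_or_escalate(predicted: float, measured: float, feed_rate: float,
                        gain: float = 0.05, max_step: float = 0.02,
                        escalate_above: float = 1.0,
                        feed_bounds: tuple[float, float] = (0.0, 0.5)) -> tuple[float, bool]:
    """One control cycle: return (new_feed_rate, escalate_to_human).

    - |error| > escalate_above: leave the feed unchanged and escalate.
    - otherwise: proportional nudge (gain * error), capped at +/- max_step
      per cycle and clipped to feed_bounds.
    """
    error = predicted - measured  # positive: less glucose than the model expects
    if abs(error) > escalate_above:
        return feed_rate, True
    step = max(-max_step, min(max_step, gain * error))
    low, high = feed_bounds
    return max(low, min(high, feed_rate + step)), False


feed = 0.20  # g/L/h, hypothetical starting feed rate
for predicted, measured in [(4.0, 3.9), (3.8, 3.5), (3.6, 2.3)]:  # glucose, g/L
    feed, escalate = correct_or_escalate(predicted, measured, feed)
    print(f"predicted {predicted} measured {measured} -> feed {feed:.3f}, escalate={escalate}")
```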

Why should we model scale-up early?

The opportunity to speed up the production process for bio-based medicines and chemicals is enormous. Applying modeling techniques to high-quality production process data will yield insights into how different parameters affect bioproduction, enabling accurate predictions of the conditions for optimal yield.

The outcome will be reliable recipes that scale effectively, avoiding production problems and waste, and shortening time to market.

Case study: Scaling up dengue and yellow fever vaccines

An example of modeling scale-up comes from our work with Sanofi Pasteur in 2020, in which we used in silico models to estimate bioprocess yield (that is, the volume produced) in dengue and yellow fever vaccine production.

We looked at historical and real-time sensor data from their current production processes, and processed the data into formats suitable for modeling.

We then applied machine learning and statistical approaches to this data, which allowed us to pinpoint the factors that led to drift in the process. This included quantifying the precise impact of temperature variations on yield and establishing the complex root causes of mutations, which let us predict which mutations were likely, how their likelihood changed as processes were scaled, and how they affected the process.

We packaged this into a tool that could run within their IT environment and be used by non-specialists. The tool helped model optimal production conditions and predict challenges, eliminating problems that could set production back by days.

This could be improved further using the approaches being developed by the French project discussed in this article. With reliable access to granular production data, we could have started much earlier in the process, running small-scale experiments that would have revealed higher-value insights more quickly and further reduced the number of experimental steps needed to reach the optimal recipe.



Patrick Chareyre, PhD

Principal Technologist, Tessella, part of Capgemini Engineering

Patrick works within a team of passionate data scientists solving complex operational problems by deploying innovative approaches to transform data into intelligence, thus resolving major pain points in client digitalization projects.



Valeria Sputtery

Project Manager, Capgemini Engineering

Valeria is an innovation expert with over 15 years of experience in Life Sciences and is currently project manager for implementing the Alliance France Bioproduction (AFB) within the French Strategic Committee for the Health Industries sector (CSF-ITS).

Stéphane Thery

Managing Director in Analytics and AI, France, Tessella, part of Capgemini Engineering

Stéphane has over 25 years of experience in leading multi-functional and international projects across industries in digitalization and, since 2018, has been the head of the Analytics and AI Center of Excellence in France.



