Statistical applications in the pharmaceutical and chemical fields

by Riccardo Bonfichi

ChatGPT-4o: A Powerful Tool to Quickly Identify Anomalous Lots in a Dataset

05/27/2024

Abstract

This article explores the application of ChatGPT-4o, an advanced artificial intelligence tool, in the field of pharmaceutical Quality Control. Using a dataset comprising analytical results from thirty-one production batches of a hypothetical active ingredient, the study demonstrates how ChatGPT-4o can quickly identify and efficiently interpret anomalies within complex datasets. Leveraging Principal Component Analysis (PCA), the AI not only identified anomalous batches, but also provided insights into the reasons behind such anomalies (for example, higher impurity levels or changes in solvent residues). Furthermore, the AI hypothesized potential problems in the production process or in the quality of raw materials, based on significant deviations observed in certain batches.
The results highlight the ability of artificial intelligence to make data interpretation easily accessible. However, the study also underscores the importance of statistical knowledge for formulating detailed questions and understanding the answers generated by the AI. Ultimately, ChatGPT-4o has proven to be a powerful tool for improving the efficiency and effectiveness of data review processes, such as those for Annual Product Quality Reviews (APQRs).
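For readers who want to reproduce this kind of screening outside the chat interface, the sketch below shows, in R, how PCA scores can be used to flag potentially anomalous lots. It is a minimal illustration, not the analysis performed by ChatGPT-4o in the post: the simulated batch data, the variable names and the chi-square cut-off are all assumptions made for the example.

# Minimal PCA screening sketch (simulated data and illustrative cut-off)
set.seed(123)

# Simulated QC results for 31 hypothetical batches
batches <- data.frame(
  assay   = rnorm(31, 99.5, 0.3),
  imp_A   = rnorm(31, 0.10, 0.02),
  imp_B   = rnorm(31, 0.05, 0.01),
  solvent = rnorm(31, 300, 25)
)
rownames(batches) <- sprintf("Lot_%02d", 1:31)

# Two deliberately "anomalous" lots (higher impurity, higher solvent residue)
batches["Lot_17", c("imp_A", "solvent")] <- c(0.25, 450)
batches["Lot_29", "imp_B"] <- 0.12

# PCA on autoscaled data
pca <- prcomp(batches, center = TRUE, scale. = TRUE)
summary(pca)                              # variance explained by each component

# Squared standardized distance of each lot in the PC1-PC2 score plane
scores <- pca$x[, 1:2]
t2     <- rowSums(scale(scores)^2)
sort(t2, decreasing = TRUE)[1:5]          # most extreme lots
names(t2)[t2 > qchisq(0.99, df = 2)]      # rough 99% chi-square flag

# Score plot for visual confirmation
plot(scores, main = "PCA score plot of 31 batches")
text(scores, labels = rownames(batches), pos = 3, cex = 0.7)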



Monte Carlo method: a useful tool for the simulation of pharmaceutical processes

04/10/2024

Abstract

In the precision-driven world of pharmaceuticals, where safety and regulatory compliance are paramount, simulation methods stand out for their ability to predict and optimize complex processes. This post focuses on the key role of Monte Carlo simulations, a statistical method that transcends guesswork, providing a robust framework for decision making in the pharmaceutical industry.
The concept of the “Monte Carlo Method” and how it works are introduced using simple analogies. The introduction is completed with some historical background and the advantages and disadvantages of the approach.
Five case studies are then presented which refer, respectively, to five different operations/situations typical of the pharmaceutical industry:
• crystallization,
• production of an API,
• micronization,
• robustness of the analytical method,
• stability studies.
These examples, although greatly simplified for reasons of clarity, clearly show the practical usefulness and versatility of Monte Carlo simulations in different scenarios of the pharmaceutical sector.
Each case study is illustrated with graphs, and the results are discussed, since practical decisions then depend on them.
The R scripts for the case studies mentioned above can be freely downloaded at:
github.com/rbonfichi/montecarlo simulation
In conclusion, this post aims to demonstrate how, once again, statistical methods applied to pharmaceutical control and production improve the reliability and efficiency of processes while reducing costs.
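To give a flavour of the approach (this is not one of the five case studies in the repository; the yield model, parameter values and lower limit are assumptions made only for this example), a Monte Carlo simulation in R can be as simple as propagating the variability of a few process inputs into a distribution of the batch yield:

# Minimal Monte Carlo sketch: propagate input variability into a yield distribution
set.seed(42)
n <- 100000                                   # number of simulated batches

# Illustrative input distributions (assumed, not taken from the post)
charge <- rnorm(n, mean = 100,  sd = 1.5)     # kg of starting material
conv   <- rnorm(n, mean = 0.92, sd = 0.02)    # conversion of the reaction step
isol   <- rnorm(n, mean = 0.95, sd = 0.015)   # isolation / drying recovery

yield_kg <- charge * conv * isol              # simple multiplicative model

# Summarise the simulated output
mean(yield_kg)
quantile(yield_kg, c(0.025, 0.975))
mean(yield_kg < 80)                           # probability of falling below 80 kg

hist(yield_kg, breaks = 60, main = "Simulated batch yield", xlab = "kg")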



Elements of Statistics for Pharmaceutical Quality Control using Microsoft Excel®

02/02/2024

Abstract

Every day, in the field of pharmaceutical manufacturing and control, enormous quantities of data are produced, which remain, for the most part, underutilized. A statistical approach enables the transformation of such often disorganized data into useful information, facilitating a better understanding, utilization, and improvement of the processes that generated them.
Microsoft Excel® is undoubtedly the simplest, most widespread, and commonly used program for "data management" in companies, including those in the pharmaceutical field. Although it was not created and developed for specific applications in the statistical domain, Excel® allows for significant achievements if its full potential is exploited.
The purpose of these slides is to demonstrate, through simple but meaningful examples, how even with this almost "zero cost" program, it is possible to extract a wealth of information from your data and, who knows, perhaps even spark a passion for statistics. Indeed, once the enormous potential of this discipline is understood, and with a small investment, the desire to delve deeper and transition to more specific software can become a natural progression.
In order to stimulate this interest and show how much Excel®, despite its intrinsic limitations, can offer, numerous common examples from daily production and control practice have been compiled. Application examples include assessments and decision-making related to trends in analytical parameters and yields, the impact of process parameters on production, deviation investigations, out-of-specification (OOS)/out-of-trend (OOT) results, supplier validation, and more.



Continued Process Verification: a Practical Approach

01/10/2024

Abstract

Continued (or Ongoing) Process Verification is a structured approach that allows a company to monitor the production process and make the necessary changes to the process and/or control strategy, as appropriate.
According to Dr. Shewhart, all manufacturing processes, and therefore also chemical-pharmaceutical and pharmaceutical ones, show a "controlled" or "natural" variability, to which an "uncontrolled" variability attributable to so-called "special" causes is often added.
Furthermore, all manufacturing processes tend to deviate from "ideal conditions" in which only "natural variability" characterizes them.
It is precisely for these reasons that, since 2011, the FDA and, since 2015, the EMA, supported by the ICH Q8, Q9, Q10, Q11 and Q12 guidelines, have encouraged manufacturers of both APIs and finished pharmaceutical forms to adequately control the variability of their manufacturing processes throughout the life cycle, in order to prevent dangerous deviations in the quality of the finished product.
In a nutshell, this is the meaning of Continued Process Verification (abbreviated as CPV) discussed in the FDA guideline on Process Validation (2011) and Annex 15 of Eudralex Vol. IV (2015).
Starting from the indications contained in these documents, the following slides show, through simple but significant examples and using the appropriate statistical tools, how it is possible to deal with the CPV in practice.
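A typical starting point for CPV is an individuals control chart on a critical quality attribute. The following minimal R sketch uses simulated assay values; the data, the attribute and the moving-range estimate of sigma are illustrative assumptions, not material taken from the slides.

# Minimal individuals (I) control chart sketch on simulated assay data
set.seed(1)
assay <- rnorm(40, mean = 99.0, sd = 0.4)     # 40 consecutive batches (illustrative)

mR    <- abs(diff(assay))                     # moving ranges of size 2
sigma <- mean(mR) / 1.128                     # Shewhart estimate of sigma (d2 = 1.128)
CL    <- mean(assay)
UCL   <- CL + 3 * sigma
LCL   <- CL - 3 * sigma

plot(assay, type = "b", ylim = range(c(assay, UCL, LCL)),
     xlab = "Batch", ylab = "Assay (%)", main = "Individuals chart")
abline(h = c(LCL, CL, UCL), lty = c(2, 1, 2))

which(assay > UCL | assay < LCL)              # batches signalling special causes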



Bootstrap using R: a useful approach for handling chunky data

09/04/2023

Abstract

The term 'chunky data' was coined by Dr. Wheeler in the 1990s to describe data that have been measured "in increments too large for the task at hand" or that result from "rounding or truncating experimental measurements". This type of data often occurs when experimental values must be reported in compliance with pre-established specifications that perhaps do not require decimal digits or, at most, only one. In the case of time series data that are naturally similar to each other (e.g., Annual Product Quality Reviews), and in the absence of decimal digits that differentiate them, it is common to find values that are repeated identically many times. This type of data, which can be clearly visualized using a probability plot or an individual value plot, leads to a substantial reduction in the variability of the dataset. As a result, a dataset may not follow a normal distribution, even though there is no scientific reason for this deviation. This non-normality can, however, represent an obstacle to the application of statistical tests that require normally distributed data.
A simple way to eliminate the problem caused by chunky data is to repeat the measurements with more suitable instruments or to report them including the decimal places that were lost through rounding. Unfortunately, this is often not feasible, for example when comparing measurements from two laboratories that have used different data reporting criteria. In these cases, the absence of normality makes it impossible to correctly apply the statistical tests commonly used to compare the means and dispersions of two data series (e.g., the two-sample t-test or the F-test for equal variances).
Bootstrapping, a nonparametric resampling technique, is an effective and easy-to-implement alternative to non-parametric tests (e.g., Mann-Whitney) for handling such data. Bootstrapping allows many simulated samples to be created from a single dataset, without making assumptions about the data's distribution. The technique can be used to estimate the sampling distribution of a statistic and to make inferences about differences in means and variances between two datasets, even when one or both are not normally distributed. This post demonstrates how to use a simple R script to implement a specific bootstrapping method, providing a quick and reliable solution.
Clearly, the approach presented here can be extended to compare two non-normally distributed datasets for reasons beyond the presence of chunky data. Typical examples include analytical parameters (e.g., related substances content) or critical process parameters that are naturally "limited" (the impurity content can never be less than zero) or arbitrarily constrained and are not normally distributed.
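The script used in the post is described there; as a stand-alone taste of the same idea, the sketch below builds a percentile bootstrap confidence interval for the difference between two means in base R. The two data series and the number of resamples are invented for the example.

# Minimal bootstrap sketch: CI for the difference between two means (chunky data)
set.seed(2023)

# Two illustrative series reported with one decimal place (hence many tied values)
lab_A <- c(99.1, 99.2, 99.1, 99.3, 99.1, 99.2, 99.2, 99.1, 99.3, 99.2)
lab_B <- c(99.0, 99.1, 99.0, 99.2, 99.0, 99.1, 99.1, 99.0, 99.1, 99.0)

B <- 10000
boot_diff <- replicate(B, {
  mean(sample(lab_A, replace = TRUE)) - mean(sample(lab_B, replace = TRUE))
})

mean(boot_diff)                        # bootstrap estimate of the mean difference
quantile(boot_diff, c(0.025, 0.975))   # 95% percentile bootstrap CI
hist(boot_diff, breaks = 50, main = "Bootstrap distribution of the mean difference")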


Applied Statistics for QA & QC in a GMP environment

01/30/2023

Abstract

In the 2011 FDA guideline on Process Validation, the term "statistical" was already used 13 times and the message to pharmaceutical manufacturers was clear: use quantitative statistical methods whenever possible to keep processes under control so as to ensure their stability over time and consistency with initial validation.
The concept of "Continued Process Verification", introduced by the FDA Guidance on Process Validation, was subsequently also taken up by Eudralex's Annex 15 "Qualification and Validation", published in 2015, which also recommended that “statistical tools should be used, where appropriate, to support any conclusions with regard to the variability and capability of a given process and ensure a state of control”.
Other important regulatory documents published later (ICH Q10 and ICH Q12) have further reaffirmed the importance of statistical tools, not only to better define the process control strategy but also to design processes adequately (Design of Experiments, Design Space, etc.), all with a view to reducing post-approval changes.
All this makes evident not only the many uses of statistical tools but also their strong practical impact.
The slides attached here, which were used for a two-day webinar held in June 2022, present, with a structured approach, numerous quantitative statistical tools applied to pharmaceutical manufacturing and control. Given the vastness of the subject, this material obviously cannot cover all topics. However, it provides an overview that should encourage the adoption of these tools, if only for the advantages, including economic ones, that they offer.


Elements of Acceptance Sampling by Attributes

09/20/2021

Abstract

Verifying whether a material supplied by a producer to a consumer, or by one department of a company to another, meets pre-established requirements calls for a set of statistical techniques known as acceptance control. In general, acceptance control can be carried out “by attributes” or “by variables” and is mainly used to establish whether the lots subjected to the control can be accepted or rejected, not to determine their quality level. This post focuses on acceptance control by attributes, in which the quality of a lot is measured by its percentage of defectives. The three main sampling plan schemes (i.e., hypergeometric, binomial and Poisson) are discussed and practical application examples are presented. Control by attributes is then considered from the process standpoint using the appropriate control charts (i.e., p- or np-charts and c- or u-charts). The analysis of the topic is completed by a discussion of the ISO 2859-1 standard, which specifies an acceptance sampling system for inspection by attributes indexed in terms of Acceptance Quality Limit (AQL), with some practical application examples.
The ultimate purpose of this post is to draw attention to the fact that, although sampling plans are challenging to design and implement, they can perform a much higher function than mere "police control". The information they return is invaluable, and it is a real waste of resources if, as often happens, it is simply filed and ignored.
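To give a concrete feel for the three schemes, the sketch below computes, in R, the probability of accepting a lot under a single sampling plan and draws the corresponding OC curve. The plan (n = 80, acceptance number 2), the lot size and the assumed fraction defective are illustrative, not values taken from the post.

# Minimal sketch: probability of accepting a lot under an attribute sampling plan
N   <- 2000    # lot size (illustrative)
n   <- 80      # sample size
acc <- 2       # acceptance number
p   <- 0.02    # true fraction defective (illustrative)
D   <- round(N * p)   # defectives in the lot

# Hypergeometric (exact, finite lot), binomial and Poisson models
Pa_hyper <- phyper(acc, m = D, n = N - D, k = n)
Pa_binom <- pbinom(acc, size = n, prob = p)
Pa_pois  <- ppois(acc, lambda = n * p)
c(hypergeometric = Pa_hyper, binomial = Pa_binom, poisson = Pa_pois)

# Operating characteristic (OC) curve under the binomial model
p_grid <- seq(0, 0.10, by = 0.001)
plot(p_grid, pbinom(acc, size = n, prob = p_grid), type = "l",
     xlab = "Lot fraction defective", ylab = "Probability of acceptance",
     main = "OC curve, n = 80, c = 2")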


How to extend the shelf life of an API? Look at its Stability Data from a Multivariate standpoint!

04/12/2021

Abstract

Stability studies are mandatory activities that, in general, are routinely conducted and equally routinely monitored as per official guidelines.
The traditional approach to stability studies is limited exclusively to recording the occurrence of a degradation process with the sole purpose of estimating a possible shelf life for the product. The objective is achieved by following the trend over time of a quantitative attribute, usually the assay value.
This approach, due to its univariate nature, is however unable to say anything about the possible causes of the degradation phenomenon and therefore to suggest a way to improve things.
Since at each stability time point other quality attributes are also determined besides assay (e.g., pH, water content, etc.), the adoption of a new perspective, i.e., a multivariate approach, makes it possible to identify which of the measured parameters most influence the degradation process. This allows us to hypothesize improvement actions on the process aimed at reducing, if not minimizing, degradation and therefore, ultimately, extending the shelf life of the product itself.
In this post, stability data obtained under "accelerated conditions" were chosen as a case study precisely because, being available before the others (i.e., long term), they allow the degradation process to be investigated immediately.
It was also observed experimentally that even with only the three-month data it was possible to obtain a model similar to the one obtained with the six-month data. It is therefore reasonable to assume that the use of additional accelerated aging techniques (e.g., 40°C ≤ T ≤ 80°C and 10% ≤ RH ≤ 75%) will make the data available for analysis in an even shorter time frame.
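For reference, the traditional univariate estimate recalled above (the trend of the assay value over time) can be sketched in a few lines of R in the spirit of ICH Q1E. The stability values, the specification limit and the one-sided 95% confidence bound are assumptions made for the example, not the data analysed in the post.

# Minimal sketch of the traditional (univariate) shelf-life estimate
months <- c(0, 3, 6, 9, 12, 18, 24)
assay  <- c(100.0, 99.6, 99.3, 98.9, 98.6, 97.9, 97.2)   # illustrative assay values (%)
spec   <- 95                                             # illustrative lower specification

fit <- lm(assay ~ months)

# Lower one-sided 95% confidence limit of the mean trend over an extended time grid
grid  <- data.frame(months = seq(0, 60, by = 1))
ci    <- predict(fit, grid, interval = "confidence", level = 0.90)  # two-sided 90% = one-sided 95%
lower <- ci[, "lwr"]

# Estimated shelf life: last time point at which the lower limit is still above spec
grid$months[max(which(lower >= spec))]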


ASEPTIC FILLING OF STERILE POWDERS: SOME ELEMENTS OF STATISTICAL PROCESS CONTROL AND PREVENTIVE MAINTENANCE

02/22/2021

Abstract

Precise and accurate dosing of sterile powders into vials under aseptic conditions still represents a challenge in the pharmaceutical field, and this is even more true for small quantities of high-potency active substances.
To conduct this important operation effectively and efficiently, microdosing machines are available that can fill more than 20,000 vials per hour.
Among the various filling methods available, the one that uses a vacuum / pressure system is very popular.
The discs of the microdosing machine, and the chambers contained therein, are subjected to a continuous operational stress which leads to an inevitable deterioration of their performance.
To what extent is this deterioration acceptable?
When should preventive actions be taken to limit it?
These questions are answered by descriptive statistics which, thanks to a simple summary index, the coefficient of variation, make it possible to compare the variability of each dosing chamber over time, build a case history, set acceptability limits and indicate when it is time to intervene preventively.
Furthermore, statistical methods allow us to examine the filling process in even more detail, modeling it and verifying its consistency between the different dosing chambers and over time.
It is worth noting that the approach and methods presented here are applicable, at least in some respects, to similar processes, such as tablet compression.
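As a minimal illustration of the summary index mentioned above, the sketch below computes the coefficient of variation of each dosing chamber in R. The fill weights, the number of chambers and the 2% acceptability limit are invented for the example.

# Minimal sketch: coefficient of variation (CV%) per dosing chamber
set.seed(7)

# Simulated fill weights (mg) for 6 chambers, 50 vials each; chamber C6 is drifting
fills <- data.frame(
  chamber = rep(paste0("C", 1:6), each = 50),
  weight  = c(rnorm(250, 50, 0.6), rnorm(50, 50, 1.4))
)

cv <- with(fills, tapply(weight, chamber, function(x) 100 * sd(x) / mean(x)))
round(cv, 2)

cv[cv > 2]      # chambers exceeding an illustrative 2% limit: candidates for maintenance

boxplot(weight ~ chamber, data = fills, main = "Fill weight by dosing chamber")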


PRINCIPAL COMPONENT ANALYSIS AND CLUSTER ANALYSIS AS STATISTICAL TOOLS FOR A MULTIVARIATE CHARACTERIZATION OF PHARMACEUTICAL RAW MATERIALS

12/14/2020

Abstract

Numerous factors contribute to the variability of pharmaceutical industry processes and, among these, raw materials play a primary role, as they often come from different sources that use different production processes.
Raw material characterization therefore plays a fundamental role in terms of Quality which, by its nature, is "the enemy of variability".
Multivariate Data Analysis (MVDA), beyond its complex mathematics, is presented here as a powerful and practical tool for the study and classification of raw materials.
Thanks to multivariate techniques such as Principal Component Analysis (PCA) or Cluster Analysis (CA), each lot, defined by the values of the different analytical parameters that characterize it, can be represented graphically as a point in a Cartesian diagram whose coordinates are the principal components. Since these components are built to capture the variability in the data, such graphs reveal features that would escape other types of analysis: they make it possible to catalog the lots based on their degree of intrinsic homogeneity and to identify any anomalous behavior. This approach can therefore be used both initially, to characterize incoming raw materials, and subsequently, in the case of anomalies, to see how the raw materials of the batches under investigation compare with those that had not given problems.
The techniques detailed here can also be extended to other typical situations in the pharmaceutical industry, for instance:
• comparative evaluation of finished product lots, for example for the purposes of Annual Product Quality Review (APQR).
• comparative evaluation of series of measurements performed by different operators, etc.
Once again, statistical methods show how it is possible to "simplify complexity" and extract practical and "ready-to-use" knowledge from complex datasets by capturing their information content.
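A minimal R sketch of the idea is given below; the lot matrix, the two hypothetical suppliers and the analytical parameters are simulated for illustration and are not the data of the post.

# Minimal sketch: PCA of raw material lots characterised by several analytical parameters
set.seed(11)

# 30 lots from two hypothetical suppliers, five analytical parameters each
supplier <- rep(c("A", "B"), each = 15)
lots <- data.frame(
  water    = rnorm(30, ifelse(supplier == "A", 0.20, 0.35), 0.03),
  assay    = rnorm(30, 99.3, 0.3),
  imp_tot  = rnorm(30, ifelse(supplier == "A", 0.15, 0.25), 0.03),
  particle = rnorm(30, ifelse(supplier == "A", 45, 60), 4),
  ash      = rnorm(30, 0.05, 0.01)
)

pca <- prcomp(lots, center = TRUE, scale. = TRUE)
summary(pca)                              # variance captured by the principal components

# Score plot: each point is a lot; similar lots group together
plot(pca$x[, 1:2], col = ifelse(supplier == "A", 1, 2), pch = 19,
     main = "PCA score plot of raw material lots")
legend("topright", legend = c("Supplier A", "Supplier B"), col = 1:2, pch = 19)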


MULTIPLE LINEAR REGRESSION: A POWERFUL STATISTICAL TOOL TO UNDERSTAND AND IMPROVE API MANUFACTURING PROCESSES

10/26/2020

Abstract

It is known that, over time, all production processes tend to deviate from their initial conditions, and this happens for many different reasons, such as changes in materials, personnel, environment, etc.
This variability in the processes, which often goes unnoticed, is nevertheless well captured by the data that Quality Control systematically collects for batch release purposes.
If these data are analyzed using Multiple Linear Regression (MLR), they reveal a great deal about the manufacturing processes that generated them.
This product knowledge is of great practical use to the Company as it makes it possible to:
• understand which parameters most affect product quality and how they interact with each other,
• establish whether the parameters currently controlled are really the ones needed or which others would be better to consider,
• define / improve a product control strategy based on experimental data and quantitative models rather than speculation,
• define and graphically represent the design space (ICH Q8) inherent to the production process considered,
• identify possible ways to improve process performance and scientifically pilot this improvement,
• mitigate the regulatory impact of changes.
This post details, step by step, how this ready-to-use process knowledge can be obtained from readily available experimental data.
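The sketch below shows, in a few lines of R, what such an analysis looks like; the response, the process parameters and the simulated relationship are assumptions made for the example, not the case study of the post.

# Minimal multiple linear regression sketch on simulated process data
set.seed(5)
n <- 40                                   # 40 historical batches (illustrative)

temp   <- runif(n, 60, 80)                # reaction temperature (deg C)
time_h <- runif(n, 4, 8)                  # reaction time (h)
water  <- runif(n, 0.1, 0.5)              # water content of a raw material (%)

# Assumed "true" relationship, used only to generate the example data
yield <- 70 + 0.25 * temp + 1.5 * time_h - 8 * water + rnorm(n, 0, 1)

fit <- lm(yield ~ temp + time_h + water)
summary(fit)          # which parameters significantly affect the yield, and by how much
confint(fit)          # confidence intervals of the estimated effects

# Predicted yield (with prediction interval) for a candidate operating point
predict(fit, newdata = data.frame(temp = 75, time_h = 6, water = 0.2),
        interval = "prediction")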


QUALITY METRICS AND DATA CONSISTENCY – Part 2

08/01/2020

Abstract

This second post is the continuation and completion of the first and covers the remaining Quality Metrics and case studies.

QUALITY METRICS AND DATA CONSISTENCY – Part 1

08/01/2020

Abstract

In 2002, the FDA launched the “Pharmaceutical cGMPs for the 21st Century” initiative with the aim of promoting a modern, risk- and science-based production approach. In 2015, in that same context, the FDA asked the industry for input to define an “FDA Quality Metrics program”, and in December 2019 it announced that the implementation of a “Quality Metrics Program” had become a priority.
Taking its cue from these FDA stimuli, this post and the next deal with the use of quantitative tools (or Quality Metrics) for understanding, monitoring and possibly improving pharmaceutical manufacturing processes. Real case studies showing the practical application of Quality Metrics to typical QA/QC topics are discussed and their statistical analysis is detailed step by step. In practice, it is shown how, from data normally available in the company, it is possible to easily extract useful information on the state of the processes and, above all, to predict their possible outcome. It is exactly this combination of two aspects, one descriptive and the other predictive, that makes it possible to really know a given process, control it and possibly even improve it. This knowledge is also useful for managing issues like OOS, OOT, deviations, etc. In fact, a poor knowledge of the process and of its quality indicators can lead to considering anomalous what is not.
Given the number of Quality Metrics considered and the breadth of the case studies discussed, the topic was split into two parts, of which this post is the first.
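Among the many possible Quality Metrics, process capability indices are a typical example of a quantity that is both descriptive and predictive. The R sketch below computes Cpk and the predicted proportion of out-of-specification results; the simulated assay data and the specification limits are assumptions made for the example, and Cpk is only one illustrative choice among the metrics a company may track.

# Minimal sketch: process capability index (Cpk) and predicted OOS rate
set.seed(99)
assay <- rnorm(60, mean = 99.2, sd = 0.5)   # 60 batches (illustrative)
LSL <- 98.0
USL <- 102.0                                # illustrative specification limits

m <- mean(assay)
s <- sd(assay)
Cpk <- min(USL - m, m - LSL) / (3 * s)
Cpk

# Predicted fraction of out-of-specification batches, assuming normality
pnorm(LSL, m, s) + (1 - pnorm(USL, m, s))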

Basics of Statistical Risk Analysis

07/23/2020

Abstract

Risk is an essential part of daily life, and even society as a whole needs to take risks to keep growing and developing. Risk management is the process of identifying, analyzing and responding to risk factors. According to ICH Q9, Risk Assessment consists of the identification of hazards and the analysis and evaluation of the risks associated with exposure to those hazards. Apart from a few exceptions (e.g., quantitative FTA), most of the risk analysis tools commonly used in the pharmaceutical field (e.g., FMEA, etc.) are basically subjective. In some cases, however, statistical techniques allow us to assess the extent of the risk associated with certain decisions. A typical example is the decision regarding the conformity, or not, of a lot based on the analysis of a sample of it. In such a decision two figures must be considered, the PRODUCER and the CUSTOMER (or CONSUMER), who run two different types of risk: the PRODUCER runs the risk of rejecting a “good” lot, while the CUSTOMER (or CONSUMER) runs that of accepting a “non-compliant” or “poor quality” product. This post briefly addresses this topic.
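For a lot-acceptance decision by attributes, both risks can be quantified directly. In the R sketch below the sampling plan and the two quality levels are assumptions chosen only to illustrate the calculation.

# Minimal sketch: producer's and consumer's risk for an attribute sampling plan
n   <- 50      # sample size (illustrative)
acc <- 1       # acceptance number
AQL <- 0.01    # fraction defective of a "good" lot
LQ  <- 0.08    # fraction defective of a "poor quality" lot (limiting quality)

alpha <- 1 - pbinom(acc, size = n, prob = AQL)   # producer's risk: rejecting a good lot
beta  <- pbinom(acc, size = n, prob = LQ)        # consumer's risk: accepting a poor lot

round(c(producer_risk = alpha, consumer_risk = beta), 3)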


Regulatory Technical Writing - Labor Ergo Scribo!

07/17/2020

Abstract

Those who work must necessarily write! The aims are many: to communicate the results of one's studies, to give operating instructions, to respond to requests, etc. In all cases, however, if the message contained in the writing does not reach the recipient, the entire communication process is frustrated, and the consequences can be significant. To appreciate this, it is enough to consider that at least a third of an executive's time is spent writing documents, and that the quality of a given piece of work, and the choice to continue it, interrupt it, finance it, etc., are often determined solely by the document that describes it! The focus of this presentation is therefore to analyze the structure of a technical document and provide practical suggestions for its preparation. Writing, however, is much more than this, and the presentation therefore also considers, more generally, what it means to write and how to do it.


Solvents Classification using a Multivariate Approach: Cluster Analysis.

07/16/2018

Abstract

This post continues and completes the analysis, initiated in the previous post, of a database consisting of 64 solvents, each described by eight physico-chemical descriptors. The subject of this study is the application of Cluster Analysis with the aim of finding groups in the data, i.e., identifying which observations are alike and categorizing them into groups, or clusters. As clustering is a broad set of techniques, this study focuses only on so-called hard clustering methods, i.e., those assigning observations with similar properties to the same group and dissimilar data points to different groups. Two types of algorithms have been considered: hierarchical and partitional. Regardless of the chosen technique, the experimental evidence indicates the presence in the database of three main groups, each consisting of individuals categorized as similar to one another, and a few isolated individuals dissimilar from the others. A similar finding was also obtained in the previous post using 2D contour plots. A closer examination of these three main groups of solvents shows a finer structure consisting of smaller groups of highly similar individuals, e.g., members of a given chemical family (alcohols, chlorinated hydrocarbons) or chemical entities sharing common characteristics (aprotic dipolar solvents).
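A minimal R sketch of the hard-clustering workflow is shown below on a tiny, invented matrix of a few common solvents and approximate descriptor values; the post itself works on the full 64 x 8 database.

# Minimal sketch: hierarchical clustering of solvents from physico-chemical descriptors
# Approximate, illustrative values: boiling point, dielectric constant, dipole moment, logP
solvents <- rbind(
  methanol     = c(64.7,  32.7, 1.70, -0.77),
  ethanol      = c(78.4,  24.5, 1.69, -0.31),
  acetone      = c(56.1,  20.7, 2.88, -0.24),
  acetonitrile = c(81.6,  37.5, 3.92, -0.34),
  chloroform   = c(61.2,   4.8, 1.04,  1.97),
  toluene      = c(110.6,  2.4, 0.36,  2.73),
  hexane       = c(68.7,   1.9, 0.00,  3.76)
)
colnames(solvents) <- c("bp", "epsilon", "dipole", "logP")

# Autoscale, compute distances and cluster (Ward linkage as one possible choice)
d  <- dist(scale(solvents))
hc <- hclust(d, method = "ward.D2")
plot(hc, main = "Hierarchical clustering of solvents")

cutree(hc, k = 3)     # assign each solvent to one of three groups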


Solvents Classification using a Multivariate Approach: Correlation and Principal Component Data Analysis.

06/01/2018

Abstract

The identification of data-driven criteria for making a conscious choice of solvents for practical applications is a rather old issue in the chemical field. Solvents, in fact, are mainly selected based on the Chemist’s experience and intuition, guided by parameters such as polarity, basicity and acidity. As early as 1985, at least two research groups approached the issue of solvent selection using multivariate statistical methods. These scientists, using different databases, each based on different types of physicochemical descriptors, obtained different classification patterns. In this post, one of those databases has been chosen and the data analysis process has been repeated and detailed systematically. This post deals with the first part of the process: it covers the intercorrelation among the physicochemical descriptors used to characterize the solvents under study and Principal Component Analysis. The correlation structure found makes it possible to capture 70% of the initial data variability using just two principal components, the first of which is related to the “polarity/polarizability” and “lipophilicity” of the molecules and the second to the “strength of intermolecular forces”. The use of these two principal components suggests the possibility of grouping solvents into aggregates (or clusters) of similar individuals, and this aspect will be covered in the following post.


A different way to look at pharmaceutical Quality Control data: multivariate instead of univariate.

05/09/2018

Abstract

In the pharmaceutical industry, Quality Control (QC) data are typically arranged in data tables, each row of which refers to a specific production lot and contains the results of different types of measurements (chemical and microbiological). Since there is a specific data table for each active chemical entity or dosage form, and all the lots listed therein are manufactured using the same approved process, each table contains the “analytical fingerprint” of that specific manufacturing process. In spite of their tabular form, QC data are usually reviewed, evaluated and trended in a univariate mode, i.e., each type of data is analyzed individually using statistical tools such as control charts, box plots, etc. The dataset is therefore studied “by columns”. This post proposes a different way to analyze QC data, i.e., a multivariate approach that improves upon separate univariate analyses of each variable by using information about the relationships between the variables. Moreover, the combination of multivariate methods with the power of the programming language R and its unsurpassed graphic tools allows data to be analyzed mainly by relying on graphics and, as stated by Chambers et al., “there is no statistical tool that is as powerful as a well-chosen graph”. This post shows how, using R for combined multivariate data analysis and visualization, the information contained in a QC chemical dataset can be easily extracted and converted into “knowledge ready to use”.


Riccardo Bonfichi

Hi and welcome to my website!

I am a chemist and have worked in the pharmaceutical industry since 1982, gaining experience in Analytical R&D, Quality Control and Quality Assurance. In the last six to seven years, I have developed a deep personal interest in statistical data analysis. After starting with Minitab and the univariate approach, I later discovered R/RStudio and Multivariate Analysis. These discoveries, which impressed and fascinated me, are among the main reasons for creating this website. I hope, in fact, that it will allow me to get in touch with scientists involved in the field of Multivariate Analysis and Clustering, to learn from them and to cooperate with them. So please get in touch to talk about statistical methods in the pharmaceutical and chemical industries and, in particular, Multivariate Analysis and Data Clustering.
The content of this website and the opinions therein have nothing to do with my current position or with my previous or current employers.

UNIVERSITY EDUCATION

1986 Master in Analytical and Chemical Methods of Fine Organic Chemistry
Polytechnic University of Milan, Italy

1981 Graduated in Chemistry
University of Milan, Italy

Training courses
• Statistical Process Control for the FDA-regulated Industry, Pragmata, Teramo, May 3-4, 2016
• Statistics for Data Science with R, Quantide, Legnano, October 19-20, 2018
• Data Mining with R, Quantide, Legnano, February 15-16, 2018
• Intermediate R Course, DataCamp, February 27, 2018
• Data Visualization and Dashboard with R, Quantide, Legnano, June 25-26, 2018

Affiliations
• Member of the Italian Statistical Society (since May 2018)

Languages
My mother tongue is Italian. From 1989 to 1992, I worked and lived in Basel (Switzerland), where I learned German. Besides this, I also speak English and a bit of French.

r-bloggers.com

quantide.com

r-project.org

rstudio.com/




Read Italian legislation on data protection and privacy.
P.I. 11136880967

Contact email: rbonfichi@gmail.com

Twitter
Linkedin





Privacy policy

Law D.Lgs. n. 196/03

COOKIES LAW
This site doesn't use any type of cookies (technical cookies or profiling cookies).
Pursuant to Section 122 of the “Italian Privacy Act” and Authority Provision of 8 May 2014, no consent is required from site visitors.
Garante della privacy (en-it)
PERSONAL DATA
This website doesn't collect or store any kind of personal data.

TECHNICAL COOKIES USED BY THIS SITE
This site does not use profiling cookies, for which the visitor's consent is required, as specified in more detail on the pages of the Garante della privacy (en-it).
PERSONAL DATA
This website does not request, collect or process personal data of any kind.