Blog Archives

The Million Dollar Rejected Lot(s)! Part 2

6/30/2020

Welcome to part 2 of this investigation series! You can catch up on Part 1 right here.

This investigation started with an immediate smoking gun! The pooling vessel had foaming issues and frozen broth chunks. The operators stuck a pool strainer into the tank to mix things up.

You read that right.

They used a non-controlled pool strainer that’s not mentioned in any procedure. I still have no idea where it came from. They brought it through a CNC/ISO 9 transition zone, moved it through an ISO 9/ ISO 8 transition zone, opened the manhole entrance at the top of the tank, stuck it in the broth, then swirled it around for half an hour.

When I interviewed the operators, they emphasized the exorbitant amount of alcohol used to sanitize the tool in each transition zone. They wiped down each and every crevice, including inside the extension poles. But this was as obvious a smoking gun as possible.

We didn’t want to delay the start of the next engineering run (the last one scheduled). I wrote up the investigation report, also addressing a few extra potential contributors like maintenance performed on the mechanical transfer arm during the thawing process and an issue with the sample tubing. The report included all WFI, Environmental Monitoring (EM), and cleaning data proving the site and equipment were otherwise in control. All reviewers, including the Microbiology department head and the site Quality Assurance director, were happy with the report. It closed on time and the next run could move forward as planned.

I watched that next engineering run with interest. Some self-doubt crept in, and I had some perspective on the contamination that didn’t quite jibe with the initial report. I snuck into the lab a day before the bioburden test plates were scheduled to be read. The plates looked exactly the same as the previous run: covered in Bacillus, >250 CFU/mL.

Now things got scary. The site planned to run their first commercial batch in a month, but the previous two runs had overwhelming bioburden counts. A superstar team was put together.

The site 6 Sigma / Master Black Belt to facilitate the investigation
Two corporate Investigation/Microbiology SMEs were flown to this manufacturing site. These women created the corporate procedure for performing microbiology related investigations.
The Mechanical Engineering Manager responsible for installing manufacturing and CIP equipment
The Validation Engineer responsible for performing the cleaning validation
Process engineers (and their manufacturing manager) that created all raw material thawing and transferring procedures
Two supervisors from the Microbiology department.
Myself and another microbiology compliance investigator.
Other expert witnesses would pop in as needed for various lengths of time, including a microbiologist investigator flown in from a sister site in the US, and three microbiologist investigators + an engineer flown in from sister sites overseas. All those sites manufactured the same product.

This investigation had everything. It was treated with the intensity of a final product sterility failure, even though ~10 downstream bioburden samples were collected after further purification steps. We followed a DMAIC (Define, Measure, Analyze, Improve, Control) road map, with all the investigative tools you could ask for, like:

Brainstorming Session
Multiple GEMBA walks
The Is/Is Not, Fishbone, and Affinity diagrams
RACI, Cause and Effect, Contradiction, and Prioritization Matrices
SIPOC (Suppliers, Process Inputs, Process Steps, Process Outputs, Customer) process map
SWOT Cost-Benefit Analysis
And lots of other acronyms and terms like 6M’s, FMEA, VOC, the 5 Why’s, CTQ, CCR, Process X, Project Y

We even had another smoking gun that somehow didn’t come up during the first investigation! The first three engineering runs used a single pair of thaw vessels, the same pair each time. The 4th and 5th runs used both thaw vessel pairs (as seen in the overhead diagram).

The organism had to be coming from this second thaw vessel chain, right? We thought so too, so we put together a major swabbing effort to see if we could find our Bacillus organism. We swabbed both pairs of thaw vessels and the piping leading from them to the pooling vessel. We hoped the pair used for the first 3 runs would function as a control. The stars on the overhead diagram represent the approximate areas we swabbed.

Keep in mind, all runs were performed using the pooling vessel marked “1”. The pooling vessel marked “2” (and the piping leading to it) was never used for these engineering runs. The Equipment CIP was performed soon after the 5th run was complete. Swabbing began a few days after the bioburden results were generated, so about a week passed between cleaning and swabbing.

Guess what? We saw our Bacillus organism in almost every location we sampled. Both sets of thaw vessels. The counts were highest downstream near the pooling vessel, dwindling to a single colony (if any) on the thaw vessel rims.

I gave a couple spoilers about non-root-causes in Part 1 of this series. My spoiler in Part 2: these results were accurate representations of what was swabbed. There were no sampling or testing errors that threw us off.

As all raw material moved in a single direction through the equipment, the group came to the following conclusions:

The organism was introduced in either room 1 or 2 and it made its way into the thaw vessels. The Micro people on the investigation team focused their efforts on this part of the investigation to find the source of this organism.
The cleaning cycles were not sufficient to kill/remove this spore/biofilm forming organism. The higher counts happened to settle in the downstream sections of piping between cleaning and swabbing. The engineering team and site management focused their efforts on remediating this issue. Serious conversations about permanent corrective actions were underway to either:
- Implement a contract cleaner (~$100,000/run + 3 day delay) to clean the equipment with a special cleaning agent between each run.
- Add a new acid cleaning cycle to the routine CIP, which would require a tank addition to each CIP skid and add hours to each CIP.

Fortunately, both of those conclusions were dead wrong. Unfortunately, it took two rejected lots to figure that out.

In the next part of this story, I’ll explain why the formulaic investigation techniques (i.e. the alphabet soup of techniques listed above) forced the investigation team to continue acting on those conclusions. I’ll also go into how the team should have been looking at this problem. This alternative mindset ended up saving a 3rd lot at the last minute.

Check out Part 3, Part 4, and Part 5 here!


The Million Dollar Rejected Lot (s)!

6/23/2020


Welcome to my series about critical investigations I've worked on! This is part 1 of an investigation I call "The Million Dollar Rejected Lots". This investigation determined over two million dollars in raw material needed to be destroyed.

First off- writing these stories with an interesting narrative is hard. I want to write these out as a mystery so you can experience what I did going through the investigation. But, with my benefit of hindsight, I know some things that threw off my team during these investigations. Aspects that threw off my team for days turned out to be pretty irrelevant, so it's hard to fit them into the story. I’m struggling to present those aspects in an intriguing way that doesn’t give away the ending. Let’s see how this goes!

This investigation was for massive bioburden (Bacillus) contamination seen in upstream processing of a biologic pharmaceutical. The main raw material cost around $1 million for each batch. We had to discard this material for each failing result.

The material itself didn’t help. Although we had good reason to believe it was sterile, the material was a nutrient buffet. Think of it as a growth promoting broth ice cube. Spoiler alert- the raw material was not the root cause of this investigation.

So what’s our problem statement?

A new manufacturing site was performing engineering runs for a new product. Three engineering runs were performed with minimal bioburden recoveries. On the 4th run, bioburden Too Numerous To Count (TNTC, >250 CFU/mL) was recovered from the test sample. The growth was so heavy there were serious conversations about the result being documented as a single colony that happened to grow so big it covered the entire filter. The test sample represents the pooling of all individual units of raw material used for the batch. No excipients or water are added to the batch at this time.

The site planned a total of 5 engineering runs. The 6th run was scheduled for commercial sale. Not only was the manufacturing process expensive - the product was a life saving drug in short supply. The pressure was on.

Below is a diagram of the raw material processing. The diagram starts where the material is moved out of frozen storage to a non-classified area. Each bottle is manually placed on a conveyor belt to be taken into an ISO-8 clean room. While on the conveyor belt, the bottles are rinsed with WFI, dried off with compressed air, cut open in the ISO 8 area, then a mechanical transfer arm pours the bottles into a thawing vessel. At this point the frozen broth is melted enough to be pumped through transfer piping to a pooling vessel. When the entire batch of broth (~7000 liters) is in the pooling vessel, the sample is collected.

It’s important to note- This process is slow! To speed it up, a total of 4 thaw vessels are used for each batch. The system is even set up to transfer into 2 different pooling vessels as needed. A valve on the transfer piping determined which pooling vessel would receive the broth. Even when all 4 thaw vessels were transferring into a single pooling vessel, it took about 16-20 hours to pool all the broth.

The picture below shows how the conveyor belts, pooling vessels, and transfer pipes are set up from a top-down perspective. The stick figures shown in room 1 highlight the amount of activity that occurs there. It is a high traffic area as people transfer product out of freezers, place bottles on the conveyor belt, and monitor the process in multiple ways. Room 2 was off limits while the mechanical transfer arms were operating, but personnel did need to enter the room a few times a run to perform maintenance.

There were a few other factors we knew off the bat.

All water and compressed air were tested daily. Results were consistently 0 CFU/200mL.
All environmental monitoring for the ISO-8 areas met requirements for multiple months before and after this event. Almost all 0 microbial recoveries. Per procedure, low-level recoveries were not identified.
All cleaning cycles were performed as required. The cleaning validation met all requirements. Cleaning rinse samples were collected before most of the runs associated with this investigation, and they all met validation requirements.
As I mentioned earlier- we had reason to believe the raw material itself was not a contributor. We also had evidence the sampling and testing was not a contributor. Another spoiler- sampling and testing was not the root cause.

What are your thoughts? What would you look at for this investigation? In part 2, I'll describe the superstar team we put together, what we looked at, and why it was terribly wrong.

Links to the following parts here:

Part 2

Part 3

Part 4

Part 5


Covid Thoughts

6/12/2020


I feel like sharing some thoughts on Covid-19 case numbers.

I’ve been using this site when I feel like checking on the numbers. When I’ve checked Ohio’s (and a few other states’) official sites for case and death counts, it has been spot-on.

There is a lot of noise surrounding the accuracy of the official counts. For every reason I’ve heard why official counts are possibly over-reporting, I can list another reason these numbers are an undercount. In that mind-set, I’m trusting the state official numbers are accurate enough to show general trends. That’s what I want to discuss here.

I do want to mention one of those noise-contributors for the data: Pneumonia deaths. According to the CDC, the weeks before Covid deaths started to increase, there were consistently ~3700 deaths attributed to pneumonia each week in the US. Those deaths were not linked to Covid. That number jumped to over 5000 in early April.

Pneumonia has almost universally been associated with late stage Covid cases that resulted in death. These additional pneumonia deaths could very likely have been caused by Covid, but were not attributed to it due to lack of testing or false negative tests. I’m under the impression pneumonia cases usually decrease in warmer months, so I’d love more background reading on this. But right now, I strongly believe Covid deaths have been undercounted in the US.

With the official numbers over the past 3 months, Covid-19 is only behind heart disease and cancer for average deaths per day in the U.S. There was a week in Mid-April where it averaged to the #1 cause.

It’s important to remember cancer and heart disease are not contagious. Pointing to those numbers and saying “heart disease and cancer still kills more people each day!” is not a good mindset when considering actions to prevent the spread.

I’m proud of my home state of Ohio so far. It has the 7th highest population in the US, but as of 6/12/2020, only the 16th highest case count. The spike of cases in April was due to expedited testing for an at-risk prison population with most prisoners and staff testing positive. No sign yet of increases in new case counts after soft re-opening of businesses last month. It looks like new cases are slowly moving down even with a slight increase in testing.

New York was hit hard and early. The state is currently home to over a quarter of all US Covid-19 deaths. There was a consistent period of over 10,000 cases and 500+ deaths per day in early to mid-April. Over 90% of cases in New York state come from the 6 counties surrounding New York City. Dense metro areas pose a special risk of spreading contagious disease. Seattle and New Orleans felt it early on as well.

New York is now seeing close to 1000 new cases a day. This decrease is the single biggest contributor to the decrease in new cases we’ve seen in the country for the past 2 months.

5 States alarm me right now: Arizona, California, Florida, North Carolina, and Texas. Up until recently, their case count per population has been relatively low compared to other states. They’re showing a strong increase in new cases over the past month.

These states are home to 15 of the top 35 metro areas in the US. Honorable mention to cities as large as Jacksonville, Raleigh, Tucson, and Fresno at 40, 42, 53, and 54, respectively.

This is what I’m hoping for: the increase in cases is attributed to an increase in test availability. People with minor symptoms and low risk for hospitalization/intensive care are able to get tested. This allows us to better track where the disease is spreading and limit exposure to higher risk patients.

News from South Carolina (another state with sharply increasing case counts) points to a worrying sign- Not only are cases increasing, but the percentage of people testing positive is increasing.
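To make the difference between "more testing" and "more spread" concrete, here is a minimal sketch of the percent-positive arithmetic. The numbers are made up purely for illustration and are not real state data; `percent_positive` is a hypothetical helper name.

```python
def percent_positive(positive_tests, total_tests):
    """Share of tests that came back positive, as a percentage."""
    if total_tests <= 0:
        raise ValueError("total_tests must be greater than zero")
    return 100.0 * positive_tests / total_tests


# Made-up numbers for illustration only (not real data).
baseline = percent_positive(500, 10_000)

# Scenario A: testing doubles and cases double, so the rate stays flat.
# The extra cases can be explained by the extra testing.
more_testing = percent_positive(1_000, 20_000)

# Scenario B: cases triple while testing only grows by half, so the rate climbs.
# Extra testing alone does not explain the extra cases.
more_spread = percent_positive(1_500, 15_000)

print(baseline, more_testing, more_spread)
```

If the case count rises while the percentage stays level, added testing is a plausible explanation. If both rise together, that is the worrying sign described above.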

Despite what we’re all hoping for, this virus is dangerous and is spreading. We should continue taking measures to limit the spread until hospitals are prepared to handle an influx of patients. There are large scale government/business projects to handle that assessment and I don’t have visibility to the decision-making process there. But I want to remind us what we can do to limit the spread, and speak to some confusion I’ve seen.

There has been a lot of controversy over masks. Below are points of confusion I’ve heard, followed by rationale why we should still wear them in public places.
- “We were told they weren't needed”. There is a lot to unpack here. But early on, cases in the US were concentrated in hospitals. It was imperative to protect our front-line medical workers when there was a short supply of masks. If a general public run on supply could be delayed, they would be available for hospitals until manufacturers could make more.
- “Pores in the masks are bigger than the virus- They won’t stop it from passing through”. The part about the virus being smaller than the holes in most masks- that's totally true. Literature implying masks are not effective is usually based around this issue. But that misses the point on how masks help. The virus is likely contained in droplets spewed out when talking, singing, coughing, etc. These droplets will catch on the mask with viral particles contained in them, significantly reducing what comes out of you and lands on surfaces and other people.
- “If I’m not sick, I don’t need to wear a mask”. A recent WHO spokesperson statement really muddied the waters on this one too. The thing is, the virus is so widespread and contagious, you may not know if you have it until days after you were exposed. During the time from your exposure to you feeling symptoms, the virus is using your body as a warm incubating workshop to replicate itself. It can build a home in your lungs and spray out when you breathe before you feel sick.

Social distancing is a luxury many can’t afford. For many, the risk of getting the virus is outweighed by the risk of losing their livelihood. It sucks and the system for situations like this needs to improve. But keeping physical distance is a huge way to keep people you care about safe. Please remember and think about that before going somewhere.

On that note- 1 in every 1625 black Americans have died of Covid since the outbreak in the US started 3 months ago. They are often older based on risk factors for Covid deaths.

Odds are, 1 in every 1000 black men will be killed by a police officer. They are likely younger with a lot of their life yet to live.

There is never a convenient time to protest- but I defer the decision to do it to the people disproportionately affected by both of these issues (and their allies).


You can hear the groans when it happens...

6/9/2020


You can hear the groans roll through the department when it happens. Missed sample collection! This happens way more frequently than any Quality Control group wants to admit. It’s somehow always an analyst’s fault for not getting the sample(s). Let’s look at one of those times…

…A supervisor hears a couple WFI ports weren’t sampled the previous week. Those sites require testing every week. That’s a deviation from procedure, so she opens an investigation.

In this department, sampling plans are allocated to each analyst. The plans outline the ports they’re supposed to get each week and the days they’re usually collected. The plans are allocated a few weeks in advance.

The missed samples were set for Friday collection. The analyst assigned to those samples was present that day but did not attempt to collect them.

The supervisor talked to the analyst. The analyst confessed to forgetting those samples. The SOP stated there was no impact for this situation, so this investigation could be closed quickly.

Sound familiar? 6Ms were reviewed for this event too:

Measurement/Materials/Mother Nature/Machinery: These M’s did not play a role in this deviation and are not considered the root cause.

Method: The method clearly states these samples must be collected on a weekly basis. The method has been followed correctly for years. Method is not the root cause.

Manpower: The analyst was aware of her sampling assignment for the week. She did not collect these samples. There was no barrier preventing her from collecting the samples. The analyst admitted she forgot to collect them. Manpower is the root cause.

This is another low impact investigation where the analyst admitted to a mistake. Easy manpower root cause! We can talk to the analyst, document this “talking-to” as the corrective action, and close the event. Deviation closing metrics look great again!

But I wouldn’t be bringing up this event if it were that simple.

For years, these specific locations were routinely collected on Mondays. Management decided previous test results were only representative of the site on Mondays. Since data did not represent the site over the course of the week, they started a program to rotate days each sample was collected. This was a good change.

But what a coincidence, these missed samples occurred the first week this program was implemented.

Now let’s throw in something else- The analyst responsible for collecting the sites- She was sick on Monday that week. Monday, the day these samples were previously scheduled for sampling on her sample plan. Monday is also the start of the week, the day the supervisors officially rolled out the plan.

When an analyst is out for the day, the supervisor assigns a back-up to collect samples on the analyst’s plan. These samples weren’t included because they were no longer collected on Mondays. By the end of the week, the analyst collected all samples she was assigned to except for the ones previously gotten on Mondays, the day she was absent.

Let’s add another twist- the supervisor is responsible for reviewing samples that were collected each week. This allows the weekend shift to get samples not collected during the week. The supervisor assigned someone to this review and the assigned reviewer stated all samples were collected. The supervisor reported that to the weekend shift without confirming the data.
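The weekly review step described here is essentially a reconciliation: compare the sites that must be sampled each week against the sites that actually have a sample recorded, no matter which day or which analyst they were assigned to. A minimal sketch of that check follows. The port names and the `missing_sites` helper are hypothetical, and this is not how the department's actual review worked.

```python
def missing_sites(required_sites, collected_sites):
    """Return required weekly sites with no recorded sample, in sorted order."""
    return sorted(set(required_sites) - set(collected_sites))


# Hypothetical port names; a real check would read from the sampling records.
required_this_week = ["WFI-01", "WFI-02", "WFI-03", "WFI-04"]
collected_this_week = ["WFI-01", "WFI-04"]

gaps = missing_sites(required_this_week, collected_this_week)
print(gaps)  # sites the weekend shift would still need to collect
```

Because the comparison is keyed on the site rather than on the analyst's daily plan, a sample that moved from Monday to Friday would still show up as a gap before the week closed.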

I didn’t perform this investigation. I discovered the missing data that prompted the event, so I had a vested interest in it. But guess who did the write-up? The supervisor. I worked with her to flesh out the extenuating circumstances they should investigate.

Those circumstances are hard to discuss in an investigation. How do you cite a non-controlled review process or sampling plan as a root cause? How do you implement a corrective action for something that management wants to keep outside of a formal procedure? After talking to me, the investigating Supervisor talked to a different Supervisor (the one responsible for tracking investigation metrics). During that discussion, the decision was made to remove any references to the sampling plan and data review. The root cause was attributed to a simple manpower error.

Investigations are an opportunity to see process gaps and improve upon them. These improvements lead to fewer mistakes and higher quality products. In this case, the quality of the investigation was sacrificed to make metrics look good. An unfortunate employee was caught in the middle.


What Idiot Wastes More Time With This Investigation? (Spoiler: It's Me)

6/1/2020


This is a story about an analyst performing a biochemical test. She preps her samples, loads them into the test equipment, and enters sample info into the test software.

Then she runs the test (she clicked "run" and went to lunch).

After the test is complete, she confirms the assay is valid and results meet requirements. She cleans up her work area, notifies manufacturing they can continue processing, disposes the samples, and hands her paperwork in for review.

Her supervisor notices a test parameter was entered incorrectly. The analyst used "0" for parameter X. Per the SOP, parameter X must be "100" for all samples.

The software was able to re-calculate the results after the fact. The difference in results was barely a rounding error. But manufacturing was told to continue processing based on an inaccurate result, which was technically a deviation.

The analyst and her supervisor discussed the error. The analyst immediately admitted to the mistake. The method clearly said to use 100 for parameter X. The event had no impact on the product, so the investigation could close quickly, but we wanted to use the 6Ms to show our due diligence. Here's how the 6Ms were considered (very abridged versions):

Measurements:  The parameter was entered incorrectly, and the parameter is used for the test measurement. But that's the problem statement.

Materials: All materials were used correctly for this test.

Mother Nature/Milieu:  The environment did not contribute to this issue.

Machinery:  The machinery was calibrated and functioning correctly at the time.

Method:  The method clearly states parameter X must be set to 100.

Those 5Ms didn't need much consideration for the root cause. Those factors had all been in place for years with no previous deviations like this. Now, we get to...

Manpower:  The analyst did not enter Parameter X correctly. There was no physical or electronic barrier stopping her from entering it. She completed the test with the correct value entered multiple times before.

Even with the 6Ms, it’s obvious the analyst made a mistake.

We could talk to the analyst, document a “talking-to” as the corrective action, and close the event. Metrics for closing deviations in less than 30 days would continue to look great.

The root cause must be manpower, right? What idiot wastes more time with this investigation?

Me. I'm that idiot. I drove the area manager nuts with this.

I talked to that analyst about how the parameter is normally entered in the system. The answer was interesting – she doesn’t normally enter the parameter. Analysts are taught to copy and paste information from a previous line when performing new tests. The analyst only needs to change test specific info. This saves the company lots of time over dozens of tests each day.   As parameter X never changes, analysts never do anything to that field.

This analyst was performing a single test that session, so she created a new blank line to enter sample information. The analyst found this easier for a single sample, and it is also the more compliant practice. The blank line defaults parameter X to 0. As she never adjusted this value in her training or in previous tests (and because this was a time sensitive sample), she didn’t think to look at it.

By investigating this issue just a little bit more, we uncover a lot:

Training for analysts to copy and paste is bad practice
The system was set to default at 0 when the SOP said to use 100
The analyst was set up to fail-
- She was trained to not need a manual entry.
- Why would the equipment default be set to 0 when the SOP says to always use 100?

All we had to do was re-program the default value to 100, and we would never see the issue again.
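As a rough illustration of that fix, here is a minimal sketch of a sample-entry record whose default matches the SOP, plus a pre-run check that flags any line that does not. The class, field, and function names are hypothetical and do not represent the real test software. The only value taken from the story is that the SOP requires parameter X to be 100.

```python
from dataclasses import dataclass

# From the story: the SOP requires parameter X = 100 for all samples.
SOP_PARAMETER_X = 100


@dataclass
class SampleLine:
    """One row of sample information in the (hypothetical) test software."""
    sample_id: str
    parameter_x: int = SOP_PARAMETER_X  # default now matches the SOP, not 0


def check_before_run(lines):
    """Return the sample IDs whose parameter X does not match the SOP."""
    return [line.sample_id for line in lines if line.parameter_x != SOP_PARAMETER_X]


# A new blank line picks up the SOP value without any manual entry.
blank_line = SampleLine(sample_id="S-001")

# A line still carrying the old default of 0 is flagged before the test runs.
legacy_line = SampleLine(sample_id="S-002", parameter_x=0)

flagged = check_before_run([blank_line, legacy_line])
print(flagged)
```

The design point is that the correct value no longer depends on the analyst remembering a field she was never trained to touch. The default carries the SOP requirement, and the check catches anything that slips through.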

If you do the routine 6M investigation you naturally blame manpower. This correction is NEVER discovered or implemented with that conclusion.

So what should we call the root cause? I definitely wouldn't consider it to be the analyst that ran the test. But it really doesn't matter. Check out my video on the futility of narrowing a root cause to a 1-2 word phrase here:

Another thing worth noting here- Around the time this event occurred, the company was rolling out a new policy. If you were the “manpower” root cause for an event, it would have a direct impact on your bonus and performance review. This policy could only impact front line, entry level employees that handle GMP tasks. In a very short sighted way, the company would save money on bonuses by jumping to blaming people. How would you feel working under that management? I know I way over-simplified the rationale here, but optics like this matter.

The 6Ms are terrible for these types of investigations. You really need to walk through a process with the people involved to figure out what happened, even for ones that seem trivial. Please, if you’re doing an investigation, challenge yourself to find the actual root cause.


My Quality Blog
