Particle counts were off the chart in an ISO 8 clean room! Literally, the particle counter couldn’t count high enough. A pump was spewing oil with each thrust. The spewing started slowly after a routine maintenance activity. Counts built up slowly over months, then exploded into out-of-specification (OOS) results, forcing us to act. We fixed the pump and everything was fine again.
Let's talk about the last step of that investigation. We needed to complete an effectiveness check to close the record in our quality system. We need to be “SMART” with our effectiveness checks! These principles were referenced in so many trainings I’ve been through. SOPs required this “SMART” plan. This is how the effectiveness check for the pump fix was SMART:
Specific - We had a particle excursion in a specific room, so we assessed results from that room. Pretty specific.
Measurable - We measured using the routine test procedure and compared measurements to our SOP/ISO 8 requirements.
Achievable - All results had to pass. We knew it was achievable because the room met requirements when the pump was working before the break.
Relevant - The rationale here was covered with “Specific”. If you had a particle failure, and you wanted to make sure that failure wasn’t ongoing, then you want to know results are good after the fix, right?
Timely - Per procedure, we couldn’t keep effectiveness checks open for longer than 3 months. That guaranteed us 3 post-fix results with our routine monitoring plan. Perfect.
The plan seems logical, right? Unfortunately, every aspect was misguided. In an effort to make the effectiveness check achievable and timely (make it pass, fast!), the plan was designed like too many other poorly designed effectiveness checks: "Look at the same test results for X amount of time." The corrective action is effective if everything passes.
First- The root cause wasn’t the broken pump, it was the yearly maintenance that broke the pump. Our post-fix tests showed we fixed it.
We had no idea if the next maintenance would break it again.
Second- We needed to know if the new maintenance procedure was effective. Since the maintenance was done on a yearly basis and our effectiveness checks were not allowed to be longer than 3 months, the “T” part of the “SMART” effectiveness check was impossible. To avoid keeping the record open for a year, we emphasized the broken pump as the root cause. That way, the effectiveness check just needed to confirm we fixed the pump.
Isn’t that crazy? We feared an effectiveness check requirement, so we changed the root cause. That could have impacted what we did to correct it. If I hadn’t pushed the maintenance group to update their procedure outside of the investigation record, we wouldn’t have had a maintenance fix at all.
This was one of many experiences that made me HATE effectiveness checks. But they need to be done. How else can we show our corrective actions are effective? Well, let’s see what the FDA has to say. Surely they have some recommendations.
The FDA defines effectiveness checks in their Regulatory Procedures Manual. “'Effectiveness Checks' are actions taken to verify that all …." Don’t waste your time with the whole definition. It only speaks to recall actions. It has nothing to do with corrective actions for GxP issues.
21 CFR Part 820 also speaks a little to corrective action effectiveness. Manufacturers must have procedures for “Verifying or validating the corrective and preventive action to ensure that such action is effective and does not adversely affect the finished device". That’s the extent of it. No hints on HOW to do the effectiveness check.
What about FDA warning letters? The FDA demands actions from countless firms that need to fix quality issues. I checked through a bunch of warning letters.
It's always similar wording, something along the lines of: “You should address how you plan to oversee your CAPA program to ensure that you are confident that all corrective actions taken by your firm are verified to be effective.” The FDA keeps making demands with no guidance on how to meet them! Well, they do make suggestions to bring in quality consultants. Those experts love the SMART plan. The particle example shows how, when that plan is taken out of a vacuum and mixed with metrics and business interests, you get some misguided and superficial effectiveness checks.
So what are we supposed to do? Well, what if we look at it a different way? How does the FDA train its own inspectors to see if a firm’s corrective actions are effective? Following that train of thought, I found a presentation on the FDA’s website for their inspectors. This is the most exhaustive section I can find on effectiveness checks:
"Determine if corrective and preventive actions were effective and verified or validated prior to implementation. Confirm that corrective and preventive actions do not adversely affect the finished device…"
Ok, on par with other FDA documents. Make sure the cure isn't worse than the disease. What else?
“Using the selected sample of significant corrective and preventive actions, determine the effectiveness of these corrective or preventive actions.”
Getting repetitive, determine if changes were effective, but go on…
“This can be accomplished by reviewing product and quality problem trend results. Determine if there are any similar product or quality problems after the implementation of the corrective or preventive actions."
Finally! Something substantial to determine if a corrective action is effective. It makes perfect sense. You can tell if your corrective action was effective by comparing trends of the problem before and after the action was complete. Firms should already have routine trend monitoring for potential quality problems.
That’s where corrective action effectiveness should be tracked. Trends can be annotated with the event at the point where it occurred, and with the corrective action at the point where a change is expected. Firms feel obligated to keep quality records open until individual effectiveness checks are complete. That's a big time drain, and it's shortsighted. Trend programs, which can cover years of product and environmental data (including other non-conformance trends), should include references to relevant excursions and events. Write your trending procedures to track data associated with those events. That way, you’ll never need to keep quality records open for effectiveness checks again.
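To make the before-and-after trend comparison concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the monthly counts, the dates, the event labels, the limit of 100, and the function names `summarize` and `compare_trend` are all made up, and a real trending procedure would define its own data sources and statistics.

```python
from datetime import date
from statistics import mean

# Hypothetical monthly particle counts for one room (made-up numbers).
RESULTS = [
    (date(2019, 10, 1), 40),
    (date(2019, 11, 1), 65),
    (date(2019, 12, 1), 90),
    (date(2020, 1, 1), 130),
    (date(2020, 2, 1), 210),
    (date(2020, 3, 1), 35),
    (date(2020, 4, 1), 42),
    (date(2020, 5, 1), 38),
]

# Events annotated on the trend (hypothetical dates and labels).
EVENTS = {
    date(2019, 9, 15): "yearly pump maintenance",
    date(2020, 2, 20): "pump repaired, maintenance procedure updated",
}


def summarize(values, limit):
    """Return the count, mean, and number of results above the limit."""
    if not values:
        return {"n": 0, "mean": None, "excursions": 0}
    return {
        "n": len(values),
        "mean": mean(values),
        "excursions": sum(1 for v in values if v > limit),
    }


def compare_trend(results, action_date, limit):
    """Split (date, value) results at the corrective action date and summarize each side."""
    before = [v for d, v in results if d < action_date]
    after = [v for d, v in results if d >= action_date]
    return {"before": summarize(before, limit), "after": summarize(after, limit)}


if __name__ == "__main__":
    action_date = date(2020, 2, 20)
    summary = compare_trend(RESULTS, action_date, limit=100)
    print("Corrective action:", EVENTS[action_date])
    for period, stats in summary.items():
        print(period, stats)
```

Because the events live on the trend itself, the results that follow the next yearly maintenance land in the same data set whenever they arrive. Nothing has to stay open in the quality system while you wait for them.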
This is a boring subject that I could talk about for hours. I have some more examples I want to add to this, but I’ll save that for another day. Also- It’s exhausting digging up all the FDA’s thoughts on this issue. Let me know if the FDA or other regulatory agencies have other documents that address effectiveness checks. I’d love more background!
*In the 2 contexts I hear it most often.
“Right First Time” (RFT) is a metric. It measures how frequently we complete a goal error free. We can measure batches made without discarded units, tests performed without deviations, shipments without damage, etc. When used appropriately, we can identify areas for improvement. By acting on that information, we can be RFT more often. That’s great, and it ties into my point at the end of this post. But there are 2 contexts in the manufacturing and quality environments where I hear RFT the most.
Context 1: A New Manager Introducing Themselves. We’ve all been there. A Manager/Director/Executive Team is new to the office, so they give a quick presentation to introduce themselves. The first couple slides discuss their family and background. Then they get to the “Management Style” slide. There always seems to be a bullet point for RFT. They emphasize it, saying it’s an expectation they have to make their group the best group possible. Everything should be done right the first time. Seems simple enough. What’s wrong with that? With that management style, RFT isn’t a metric you learn from, it’s an expectation. This expectation is held by someone outside the group who couldn’t possibly understand the nuances of the group’s work yet. Even if they're referring to how they use the RFT metric to identify process gaps, it gives an impression they don’t need to see, understand, or help with problems currently preventing RFT. Then, that attitude trickles down to the floor…
Context 2: After an error occurs. I’ve posted a few manufacturing and testing investigations on this site. I plan to write about a lot more. For almost all these events, group leaders, supervisors, and managers were quick to blame manpower. Once they jump to that conclusion, the first thing they say to their team is predictable. “Come on guys, we want to be Right First Time”. Responding this way is useless and demoralizing.
It’s like telling a child who just dropped their ice cream “Hey kid, try not to drop your ice cream”. At this point you’re too late. RFT isn’t possible. Repeating RFT here reinforces blaming manpower when we should be looking at process improvements (like- why would we even trust that girl to hold her own ice cream?). Conversations about the error circle around what an individual could have done differently to be RFT. That detracts from what you want to do as an organization: learn from this mistake to develop a system that can function RFT next time.
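For the metric sense of RFT described at the top of this post, here is a minimal sketch of using it to find where the process needs help rather than who to blame. The records, the step names, and the function names `rft_rates` and `lowest_first` are hypothetical, made up purely for illustration.

```python
from collections import defaultdict

# Hypothetical records: (process step, completed without error?)
RECORDS = [
    ("filling", True), ("filling", True), ("filling", False),
    ("labeling", True), ("labeling", False), ("labeling", False),
    ("shipping", True), ("shipping", True),
]


def rft_rates(records):
    """Return the fraction of error-free completions for each process step."""
    totals = defaultdict(int)
    clean = defaultdict(int)
    for step, ok in records:
        totals[step] += 1
        if ok:
            clean[step] += 1
    return {step: clean[step] / totals[step] for step in totals}


def lowest_first(rates):
    """Sort steps so the ones with the lowest RFT rate come first."""
    return sorted(rates.items(), key=lambda item: item[1])


if __name__ == "__main__":
    for step, rate in lowest_first(rft_rates(RECORDS)):
        print(f"{step}: {rate:.0%} right first time")
```

The output is a list of process steps, not people. The step at the top of the list is where to start asking what in the system makes errors likely.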
I propose a different mantra that can be used in both these contexts. "Learn and Improve". It's just as short and it meets the spirit of the RFT metric. I hope it catches on.
I’m desperate for good news about the Coronavirus. I’m jealous of New Zealand, Iceland, Taiwan, and South Korea. Maybe I just want to live on an island somewhere. As I link to those, I’m well aware of the size and diversity of the US posing a greater challenge for us. But good news is a double-edged sword. If it seems like we’ve won the fight, then it discourages us from keeping up with preventative measures that work. Examples:
But it’s also important that good news about the virus is real. If we’re making decisions about the health of our community (and our own actions), they need to be based on accurate information that’s not meant to mislead. Recently, a few charts were shared with me that are very misleading. I’m making this post to address them. I believe they were shared to counter the “fear” about the virus from the "media". When that “fear” diminishes, the current economic restrictions seem worse than the virus.
The charts are all based on Ohio’s data. They were shared on 8/1/2020. The data is recent as of 7/31/2020. To be clear- all numbers are real and come from the Ohio Department of Health. I’m happy that, despite all the issues around gathering this data quickly, we can acknowledge this data is as close as we can get to what’s going on (i.e., the numbers aren’t “fake”).
Before looking at the data, we have to know it’s lagging behind the actual spread. That’s the nature of the game. We find out who’s infected after the infection happens. Days to weeks after. With my past micro testing, results took 3-5 days for incubation. When I worked with customer complaints, we didn’t receive those until months after we made the product. I’ve learned there are a lot of ways to interpret lagging data, and there are constant calls to wait for more data before action. There’s loads of hospital and government bureaucracy the data goes through before it’s reported. There will be noise (I mean statistical noise, many of these cases are individually tragic):
But overall, I generally trust the experts that have dedicated their career to medicine and public service. I definitely trust them more than friends and family arm-chair quarterbacking the situation.
The first chart discusses new cases. The blue bars are the most striking element on the chart for me. They’re distinct from all the line graphs. And they’re going down! Actual case onset is going down! This is great, right? But what is "actual case onset"? For the purposes of this chart: when someone tests positive, they get lumped into the first day they started feeling symptoms or came in contact with a known carrier. If someone started feeling symptoms on those days, but hasn’t gotten tested or hasn’t gotten test results back, they’re not added to that date yet. As it takes time for people to get tested and get their results, recent days will always appear to have fewer cases. It’s like the makers of this chart purposely picked the earliest day they could to show a downward trend. As the virus is spreading, actual case onset is increasing. Full data for "actual case onset" lags by at least 2 weeks. The chart makes it look like only ~100 cases started on 7/30/2020. That number will inevitably rise as more data comes in.
EDIT ON 8/11/2020- I just checked Ohio's official website. That number is now at 979 for 7/30/2020.
The next most prominent element is the bright yellow line. This is the main point of the graph. Net new cases. The chart maker takes the number of new reported cases (brown line) and subtracts the number of new “presumed recovered” cases (green line). How does “net new cases” help us understand the spread of the virus? It’s a deceptive way of talking about active cases. Active cases tell us how many people in the state are infected. On 8/1/2020, that number was ~20,000. The yellow line shows a combination of how fast the spread is, and how many people were infected a month ago (long enough to be considered "presumed recovered").
Cool, but not really helpful. When the concern is virus spread, you want to know if more people are getting it this week compared to last week. 7/30/2020 and 7/31/2020, the two most recent days on this chart, had the 1st and 4th highest new case counts in Ohio. We've had high case spread for a month. Net new case count does not give an accurate picture of the spread.
The last element on the chart is the death count. The red blips next to the blue columns. This is a visual representation of what’s been known since the virus first made its way to the States. 1-2% fatality rate for people with serious enough conditions to get tested. That means that since April, Covid-19 is on pace to be the 3rd leading cause of death in Ohio. Growing case counts won’t help that stat.
The second chart shows cases and deaths by age. This chart emphasizes that the people who die are more likely to be older. It comes with a hint of "they were going to die anyway so their death isn’t as important". This is supposed to make younger people feel better. But it hides how:
The last chart discusses hospital data. There are open beds and ventilators in the state. That's great! But communicable diseases spread in local hot spots. Data from the entire state isn't too helpful here. It doesn't help my parents in Cleveland if there are open beds in Portsmouth, Marietta, or Springfield. That's why Ohio is going with the county color scheme now.
I'd love to have an answer for what the state (or country) could do to make the virus go away. I don't get to work with people closest to studying the virus's impacts, so my recommendations are moot. I just want our individual decisions (wearing masks, social distancing, avoiding crowded indoor spaces, how we talk with others about the virus) to be based on an accurate representation of the problem. I feel like sharing some thoughts on Covid-19 case numbers.