Mastering Root Cause Analysis with XMPro: Capture, Value, Impact
Last updated
Unlock the full potential of Root Cause Analysis in our in-depth webinar with Nicole Scheinbach, Engineering Consultant at XMPro. This session is a treasure trove for professionals eager to streamline their RCA processes using XMPro's sophisticated application. 📊 Nicole takes you through the intricacies of RCA, teaching you to not only capture recommendations but also to assess their impact and value comprehensively. Imagine having a centralized system where every recommendation is immediately accessible, leading to quick, informed decisions that translate into measurable financial results.

XMPro's application blueprints exemplify the platform's capability to transform insights into action. These blueprints are customizable, ensuring they fit a wide array of business needs, from those new to RCA to those seeking to enhance their existing condition monitoring applications. Ready to enhance your RCA proficiency? Join us to navigate the nuances of XMPro's RCA application and discover how to make it your own.

🔗 Blueprint Download: This RCA blueprint will soon be available for download on our blueprints, accelerators, and patterns page. Stay tuned! https://xmpro.github.io/Blueprints-Accelerators-Patterns/

🕒 Key Webinar Segments:
[2:16] What is Root Cause Analysis (RCA)?
[6:00] Exploring the Benefits of Conducting an RCA
[11:20] Introducing an XMPro-based Root Cause Analysis Application
[28:55] A Guide to XMPro's Blueprints, Accelerators & Patterns

Don't forget to like, share, and subscribe to our channel for more insightful webinars and tutorials on leveraging XMPro for your business success!

#XMPro #RootCauseAnalysis #BusinessProcessManagement #RCAApplication #BlueprintsAndPatterns #CustomizableSolutions #EngineeringExcellence #Webinar #XMProTutorial
Hello everybody, and welcome to our last webinar for 2023. Today I have the
pleasure of Nicole's company. She's going to run us through a root cause
analysis application, going through some terminology and then jumping into
some of the XMPro-specific pieces. Can you drop to the next slide, please?
Nicole, you can just pull up the whole slide. Some of the areas we're going
to cover: she's going to run you through what a root cause analysis is, some
types, some benefits, and an XMPro blueprint around root cause analysis, and
then at the end we're going to touch on our general blueprints, accelerators,
and patterns as well.
Next slide, please. It's my great pleasure to introduce to you one of our
resident engineers, Nicole. Nicole, if you could just give everyone a brief
introduction, the floor is yours.
Sure, thank you. My name is
Nicole Scheinbach. My background is in mechanical engineering. A little bit
about my previous roles: before being an engagement lead at XMPro, I was a
reliability engineer in a polymer facility. My primary responsibilities
included PM improvements, PM reviews, and equipment upgrades or modifications
based on process changes, as well as root cause
analysis. I was also a remote process engineer who specialized in asset
condition monitoring. This was primarily through the use of IoT sensors that
track vibration. Personally, I was over 15 different building product and
paper mill facilities, trying to prevent unplanned downtime on their
equipment. My current role is engagement lead. I've had a number of
clients um including we nutrient and so
my primary focus is to create use cases that capture and codify knowledge, to
ensure it's not lost, especially when people retire, as well as to enhance
and streamline current workflows to solve any problems that clients may have.
So, some basic terminology alignment.
First of all, what is a root cause analysis? Most people are familiar with
this term, but just to clarify: root cause analysis aims to identify the
causes of a problem in order to identify actions that help solve the issues.
At a high level, you want to be identifying the actual causes of a failure,
not necessarily the symptoms. As shown on the diagram on the left, the
symptoms are typically what you actually see, i.e. a pump bearing has locked
up. But why has the pump bearing locked up? Through the root cause analysis
process, the aim is to address these causes and not just the symptoms.
Next, going into the different types of root cause analysis. There are many,
many types of root cause analysis. We're going to focus primarily on the ones
that are of the cause-and-effect type. I've listed three here that are quite
popular. The first is the 5 Whys approach. This is what our solution is
loosely based on, and we'll go into that once I launch into the demo.
At a high level, the 5 Whys is an iterative technique to explore the
cause-and-effect relationships underlying a certain issue. People usually
describe this as when a child asks why something has happened and keeps
asking why: eventually they give up, or they get the answer they want.
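The iterative questioning described here can be sketched in a few lines of Python. This is a minimal illustration, not the XMPro implementation: the `FiveWhys` class, its method names, and the chain of answers are all hypothetical, and treating the last recorded answer as the candidate root cause is a convention of this sketch.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FiveWhys:
    """A problem statement plus the chain of answers to repeated 'why?' questions."""

    problem: str
    answers: List[str] = field(default_factory=list)

    def ask_why(self, answer: str) -> None:
        # Each recorded answer becomes the thing the next "why?" is asked about.
        self.answers.append(answer)

    def root_cause(self) -> Optional[str]:
        # Convention of this sketch: the deepest answer so far is the candidate root cause.
        return self.answers[-1] if self.answers else None


# Hypothetical chain of answers, for illustration only.
analysis = FiveWhys("Pump bearing locked up")
analysis.ask_why("The bearing was running without enough lubrication")
analysis.ask_why("A scheduled lubrication PM was not completed")
analysis.ask_why("Missed PMs are closed rather than rescheduled")

print(analysis.root_cause())
```

The number of questions is not fixed at five; the chain simply stops when the team agrees that the last answer is something they can act on.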
In the same fashion, if you keep asking why, you get further and further into
the details of a problem until you actually reach the root cause.
Next, we have the fishbone diagram, also called the Ishikawa diagram. This is
a visual method to organize the cause-and-effect relationships into
categories. There is a little fish on the diagram to the right; that is
typically what the structure looks like. The head is the actual problem, and
the associated bones are the different categories of causes. This is used
across multiple industries, and there are different mnemonics that people
typically use for the categories. One common one is the 6M for manufacturing.
There's also, I believe, a 4M; there are different M categories based on how
many categories you want to go through. The 6M is Manpower, Method, Machine,
Materials, Mother Nature, and Measurements.
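As a rough sketch of how the fishbone structure could be held in code, the snippet below groups causes under the six categories just listed. The function name `build_fishbone` and the example causes are assumptions made for illustration.

```python
from typing import Dict, Iterable, List, Tuple

# The 6M categories named in the talk.
SIX_M = ("Manpower", "Method", "Machine", "Materials", "Mother Nature", "Measurements")


def build_fishbone(problem: str, causes: Iterable[Tuple[str, str]]) -> Dict[str, object]:
    """Group (category, cause) pairs under the 6M bones; the problem is the head."""
    bones: Dict[str, List[str]] = {category: [] for category in SIX_M}
    for category, cause in causes:
        if category not in bones:
            raise ValueError(f"Unknown category: {category}")
        bones[category].append(cause)
    return {"head": problem, "bones": bones}


# Hypothetical causes, for illustration only.
fishbone = build_fishbone(
    "Pump bearing failure",
    [
        ("Method", "Lubrication PM skipped"),
        ("Machine", "No continuous vibration monitoring"),
        ("Method", "Missed PMs are not rescheduled"),
    ],
)

for category, items in fishbone["bones"].items():
    print(f"{category}: {items}")
```

Keeping the empty bones in the result is deliberate: an empty category is a prompt for the team to ask whether anything was overlooked there.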
That's a good starting point when your team wants to start putting associated
causes underneath a larger category. Finally, there is the Pareto chart. The
Pareto chart aims to identify the frequency and impact of a problem. It
sometimes follows the 80/20 rule, where 80% of the problems are caused by 20%
of the causes.
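The arithmetic behind a Pareto chart is a descending sort plus a running cumulative percentage. The sketch below assumes downtime hours per cause as the impact measure; the function names and the sample figures are hypothetical.

```python
from typing import Dict, List, Tuple


def pareto(downtime_by_cause: Dict[str, float]) -> List[Tuple[str, float, float]]:
    """Return (cause, hours, cumulative percent) rows sorted by descending hours."""
    total = sum(downtime_by_cause.values())
    if total == 0:
        return []
    rows, running = [], 0.0
    ordered = sorted(downtime_by_cause.items(), key=lambda item: item[1], reverse=True)
    for cause, hours in ordered:
        running += hours
        rows.append((cause, hours, round(100 * running / total, 1)))
    return rows


def vital_few(rows: List[Tuple[str, float, float]], cutoff: float = 80.0) -> List[str]:
    """Return the leftmost causes needed to reach the cumulative cutoff."""
    selected = []
    for cause, _, cumulative in rows:
        selected.append(cause)
        if cumulative >= cutoff:
            break
    return selected


# Hypothetical downtime hours per cause over one year.
downtime = {
    "Bearing failure": 120,
    "Seal leak": 60,
    "Misalignment": 10,
    "Electrical trip": 6,
    "Operator error": 4,
}

rows = pareto(downtime)
for cause, hours, cumulative in rows:
    print(f"{cause:<16} {hours:>5} h  {cumulative:>5}%")
print(vital_few(rows))
```

With these sample numbers, two of the five causes account for 90% of the downtime, which is the kind of concentration the 80/20 rule describes.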
In a Pareto chart, you're looking at the leftmost part of the chart, which
shows which issues are happening at a high frequency as well as causing the
most downtime. In my experience, at the end of the year it's a good look back
at cumulative impact. My reliability engineer would usually sit down with our
team and go through it: look, this is the frequency and impact of some
issues. Then you try to assign work accordingly for the next year to address
those. Next, the benefits of doing a root cause analysis. Why would you want
to do this? The main benefit is to prevent reoccurrence of issues in the
future. The whole goal is: X, Y, and Z happened, causing a failure, so you
want to create action items that resolve the root cause of a specific failure
rather than its symptoms. Again referencing the tree, the symptoms are what
you visually see, and the root cause is what you need to identify and
resolve. Additionally, there is improved team communication. Working with
your colleagues, there are multiple disciplines; there's a cross-functional
team that provides support for the whole process.
You need to ensure, first of all, that the full picture is captured, so you
want to involve as many relevant people as possible in your RCA process, so
that everything relevant is captured, understood, and addressed via team
collaboration. You also want everyone's input, to ensure everyone is on the
same page and agrees on the steps that ensure this problem does not happen
again.
Everyone needs to agree. Finally, in terms of documentation: this
documentation is important. You first want to validate that you have done
your due diligence, that you've captured all the associated data and evidence
as well as the associated action items to close the loop and ensure this
doesn't happen again. This is also a great way to share across your
organization. If you're in manufacturing, you typically have sister sites
that run a similar process to yours and might have identical equipment.
Ideally, you want to share your experience with them (obviously not a great
experience) to ensure that this doesn't happen to them in the same way that
the failure happened at your site. Now, in terms of the right diagram,
this shows the different steps and pieces that you need to fully complete an
RCA. The first is problem identification: what has happened and how you're
going to capture that, for example X, Y, and Z failed at this time with this
impact. The next is the data collection portion. This is an important portion
where, again utilizing your cross-functional team, you gather all the
necessary data: PM plans, operations routes, process data. Anything of
relevance to your failure needs to be captured in a timeline, so you can see
when the actual issue potentially cropped up. Next is cause mapping and
identifying the root cause: you're actually performing the root cause
analysis and identifying what the issue is. Finally, closing the loop: you
need to create actions that address the root cause or causes and ensure that
they are implemented, so this issue does not happen
again. So how does this process integrate into our existing XMPro process?
Depending on the client, there may be a very specific use case or problem
that they want to address, or it might be something broader, for example
reducing unplanned downtime. That's quite a broad statement, and it's
obviously very common amongst sites. We want to ensure that we're identifying
the right items to address this issue, so we typically go through this
process. We first identify the bad actors. You can do that via a Pareto
chart. You want to identify the ones that are either frequently failing or
causing a massive impact on production operations. Next you go into the
failure modes: what is actually causing this bad actor to fail? You need to
identify that to properly address it. Then come the root causes. Identifying
the root causes is the most important part, because if you don't identify the
proper root causes, you're not going to be addressing the correct problem.
The items to the right are the rest of our process: now that we know the root
causes, we identify any leading indicators, what data sources we need to
integrate with, and the associated recommendations.
Today, though, we are going to be focusing on the root cause review. We
wanted to make this blueprint available to people because we have implemented
solutions for clients, and we are now enabling clients, even new clients, to
utilize our root cause analysis application to identify what kind of root
causes are creating potential downtime, availability losses, anything like
that. You can utilize our platform to create solutions to address these
issues. The XMPro-based root cause analysis application: now I'm going to
take you through the actual demo, and you can see what we've provided in
terms of a blueprint. One second, I'll
bring it up. All right, so we have here the demo. I'll quickly go through the
basic pieces of this landing page. You'll land here. The first part, on the
left, is the number of failures per asset type for the last 12 months. This
is based on ISO 14224, which is the structure we've utilized to create our
variation of the 5 Whys. The ISO standard goes over how to capture data in a
reliability format, to ensure that when you do an analysis later in the year,
everything is captured in appropriate categories so you can analyze it in the
future. It is used across multiple manufacturing facilities as well as
different equipment types.
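A card like this amounts to counting categorized failure records inside a rolling 12-month window. The sketch below is one way to compute it, assuming each record is a `(date, asset type, failure mechanism)` tuple; the function name, record layout, and sample data are hypothetical, and the category labels are only in the style of ISO 14224.

```python
from collections import Counter
from datetime import date, timedelta
from typing import Iterable, Tuple

FailureRecord = Tuple[date, str, str]  # (failed_on, asset_type, mechanism)


def failures_by_mechanism(
    records: Iterable[FailureRecord], today: date, window_days: int = 365
) -> Counter:
    """Count failures per (asset type, mechanism) within the trailing window."""
    cutoff = today - timedelta(days=window_days)
    return Counter(
        (asset_type, mechanism)
        for failed_on, asset_type, mechanism in records
        if cutoff <= failed_on <= today
    )


# Hypothetical failure history, for illustration only.
records = [
    (date(2023, 12, 1), "Centrifugal pump", "Mechanical failure"),
    (date(2023, 6, 10), "Centrifugal pump", "Mechanical failure"),
    (date(2023, 3, 2), "Centrifugal pump", "Electrical failure"),
    (date(2022, 11, 30), "Centrifugal pump", "Material failure"),  # outside the window
]

counts = failures_by_mechanism(records, today=date(2023, 12, 15))
for (asset_type, mechanism), count in counts.most_common():
    print(asset_type, mechanism, count)
```

Because every record carries a fixed category label, the same data can later feed a year-end review without re-reading free-text descriptions.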
Here, for our sample, we have centrifugal pumps. We have the broad categories
for failure mechanisms: electrical failures, external influences, material
failures, and mechanical failures. Now, coming on to the right card, action
items due soon. At a high level, this shows all the actions that are due
across all the RCAs that you've created. This is great if you need to look up
one that you have been assigned to, double-checking which ones you have and
when they're due. Or, at a high level, perhaps a maintenance manager is
looking at all the associated action items that are due and doing any
necessary follow-up. Finally, we get to the bottom
card here, the all root cause analysis card. This card allows you to go
through and look back at your existing root cause analyses, look at any
pending action items, look at any of the timelines, anything like that. Once
they're completed, they will be stored here, so again, for documentation
purposes, you can reference them in the future. Right now we're going to go
through and create a new RCA so you can see the general process. First, as
mentioned in the
PowerPoint, the failure details. We just want to capture at a high level what
has happened and the associated financial impact. That's the most important
thing: you typically do root cause analysis for extremely high-impact things
that you need to address. For this demo we have centrifugal pumps as the
asset type. Going forward, you can add any associated asset types you want.
If you've got heat exchangers, fans, anything of that nature, you can add a
structure in there to add them to your RCA application. I'm going to go ahead
and copy and paste some of this data in so you don't have to watch me type.
We have an asset ID, an equipment ID, and what happened: there was a pump,
and it was shut down due to high overall vibration in the DE bearing. This
happened at the beginning of the month, and we're now trying to evaluate it
while everything is fresh.
Facility: this fictional client is based out of Texas, and they have two
areas of their facility. In terms of impact, there was no safety impact, a
bit of operational impact, and a larger production impact here. After
completion of all these fields, it will automatically sum up. Double-check
all your items here, and you can click save and continue. Oh, apologies, I
did not add a zero there. There is some validation on these fields. These
fields are required, so you do need to fill them all in. This is all
necessary information. Going on to our next phase,
the timeline: this is the data collection portion that's really vital to your
cause mapping. You need to identify all the associated events that could have
culminated in your failure. If you also notice, up here there are associated
breadcrumbs. These provide additional navigation between the pages, as well
as letting your team know how many parts you still have to do to complete
your RCA. Coming back: you have your cross-functional team available, and
they're digging through, and they've noticed that, way back in June, we
installed a new assembly. This was normal maintenance: there was an overhaul,
and we installed a new rebuilt assembly
here. Digging through your CMMS records, you notice that, unfortunately,
there was a failure, the failure of a fan, and it was all hands on deck. And
unfortunately a scheduled PM was not completed. This was for lubrication of
that pump. A couple of days later there is also a scheduled vibration route
that's done. It does not pick up any anomalous overall vibration yet; maybe
the bearing is still okay at this point. Unfortunately, these lubrication PMs
and vibe routes only happen every couple of months, so everything looks to be
okay until an operator comes up doing his normal routes, and he can hear
something wrong with this bearing. At that point it's too late; your
bearing's probably done.
What he does is tell his manager, and his manager calls up the reliability
and maintenance team, and they take another reading before the scheduled
reading. On the day that it's taken down, the vibe tech comes and says: look,
this is a stage four bearing failure. At this point you need to shut it down.
I have no idea when it's going to fail, and we don't want a random unplanned
downtime in the middle of the night when there's no
support. Your cross-functional team has gone through and put together the
series of events. They think this is good enough: we think we have an idea
now of what could have caused this issue. Now we still need to capture
everyone who has participated. First of all, we have Bob Costa; he is a
process engineer. You want to capture this for documentation purposes, but
you also want to ensure that everyone is captured here because, when you
assign the action items, you can only assign them to people who were captured
here. So again, make sure everyone is captured here. Next we have Jill Smith.
She's a reliability engineer; she also works at the company. Last but not
least, we have Max Berson. He is the maintenance team lead, so he has
provided his input into this RCA, and he is
Robson
comp.com okay so now we have our
participants and our timeline. We want to go ahead and save and continue. As
we go to this part, someone says, oh, I think I need to revise part of my
timeline. Okay, so we go back via the breadcrumbs here, and he says, I want
to make it clear that my operator informed me ASAP and we tried to get this
done as soon as possible. So we want to add an additional note here that
says: operator notified manager immediately.
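A timeline like the one assembled in this part of the demo is a list of dated events that can be sorted and annotated. The sketch below is a minimal illustration: the `TimelineEvent` class is hypothetical, and all dates are placeholders, since the webinar only mentions "June" and "the beginning of the month".

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class TimelineEvent:
    occurred_on: date
    description: str
    note: str = ""


def chronological(events: List[TimelineEvent]) -> List[TimelineEvent]:
    """Return the events ordered from earliest to latest."""
    return sorted(events, key=lambda event: event.occurred_on)


# Placeholder dates; events are entered in whatever order the team recalls them.
events = [
    TimelineEvent(date(2023, 12, 1), "Pump shut down for high overall vibration"),
    TimelineEvent(date(2023, 6, 15), "New rebuilt assembly installed during overhaul"),
    TimelineEvent(date(2023, 11, 20), "Operator heard abnormal bearing noise"),
    TimelineEvent(date(2023, 9, 5), "Scheduled lubrication PM not completed"),
]

# Revising an entry, as in the demo, is an update to its note.
events[2].note = "Operator reported the noise immediately"

timeline = chronological(events)
for event in timeline:
    print(event.occurred_on.isoformat(), event.description, event.note)
```

Sorting at display time, rather than at entry time, lets participants add events as they find them in the records.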
Okay, there are a couple of additional functions here. One of them is the
save button. You can see the save button at the top right of each card. If
you need to make modifications, you can go ahead and do so. After a certain
point in the RCA you'll no longer be able to make modifications; for
documentation purposes, people can't just come back and continually make
modifications. But at this point you haven't done the cause map, so you can
make modifications. You can also delete. I don't necessarily want to delete
here, but if you click here and click delete, you can delete anything. Maybe
you're doing a revision with your team and you decide, oh, this event
actually didn't happen, or potentially we need to shift some things around.
You can go ahead and delete and upload the necessary information. But we're
going to go ahead and save and
continue. Coming on to the failure analysis part: this is the most important
part, ensuring that you capture the correct failure analysis as well as
identifying the root cause so you have corrective actions to take. So, what
failure mechanism? Again utilizing ISO 14224, there are high-level buckets
that we want to place this in, again for documentation purposes in the
future. This overall vibration causing bearing failure would typically be
considered a mechanical failure. What kind of mechanical failure was this? It
was related specifically to vibration. And why did this happen? What did the
vibration do to kill the pump? In this case it created a bearing failure, and
the bearing failure was eventually going to freeze up the pump, and the pump
was going to stop rotating.
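The chain walked through here, from high-level mechanism down to the observed effect, can be represented as a small tree. The sketch below is an illustration only: the `CauseNode` class and `paths` helper are hypothetical names, not part of the XMPro blueprint.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class CauseNode:
    label: str
    children: List["CauseNode"] = field(default_factory=list)

    def add(self, label: str) -> "CauseNode":
        """Attach a child node and return it so calls can be chained."""
        child = CauseNode(label)
        self.children.append(child)
        return child


def paths(node: CauseNode, prefix: Tuple[str, ...] = ()) -> Iterator[Tuple[str, ...]]:
    """Yield every root-to-leaf path as a tuple of labels."""
    current = prefix + (node.label,)
    if not node.children:
        yield current
    for child in node.children:
        yield from paths(child, current)


# The chain from the demo: mechanism, then type, then effect on the pump.
root = CauseNode("Mechanical failure")
root.add("Vibration").add("Bearing failure").add("Pump stops rotating")

chains = list(paths(root))
print(" -> ".join(chains[0]))
```

A tree rather than a flat list leaves room for a second branch if the team later decides that more than one mechanism contributed.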
After this part, you're saying: okay, the failure mechanism, which is at the
highest level what you visibly see, is that the bearing failure due to
vibration caused the pump to fail. Now what kind of comments or additional
information can you provide? You go through and look at your system, maybe
your vibe system, and you analyze that data, and you find, like in your
timeline, that these bearing readings indicated a stage four failure and
there was an indication of BPFO. At this point your vibe tech has
recommended: please shut down. Again, we don't want unplanned
outages. Again, this is the higher level, what you actually see. Now we come
down to the actual failure causes. Unless there was a manufacturing defect,
the bearing doesn't just fail by itself; it has some help. From what we've
dug into through the timeline, it looks like a PM was missed. I don't know if
there's some way within the system to ensure the PM is done, but essentially
the PM was missed. That is a failure of the management or workflow system:
something is not aligning. This is a critical piece of equipment, and when
the PM is missed we want to ensure that it is done. We can't just be missing
PMs. So it's potentially a CMMS or documentation error, or potentially a
management error, depending on which one your team goes for. The management
of the PMs needs to be re-evaluated: you need to look at how we can ensure
that critical PMs are completed or rescheduled. If the PM had been done the
day after, this may not have been an issue.
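One way to picture the "completed or rescheduled" rule for critical PMs is a check that creates a follow-up task whenever a critical PM passes its due date without being done. This is a sketch under stated assumptions: the `PmTask` fields, the function name, and the one-day follow-up delay are all hypothetical and do not describe any particular CMMS.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass
class PmTask:
    asset_id: str
    due: date
    critical: bool
    completed_on: Optional[date] = None


def reschedule_missed_critical(
    tasks: List[PmTask], today: date, delay_days: int = 1
) -> List[PmTask]:
    """Create follow-up tasks for critical PMs that are overdue and not completed."""
    follow_ups = []
    for task in tasks:
        missed = task.completed_on is None and task.due < today
        if missed and task.critical:
            new_due = today + timedelta(days=delay_days)
            follow_ups.append(PmTask(task.asset_id, new_due, critical=True))
    return follow_ups


# Hypothetical tasks: one missed critical PM, one missed non-critical, one completed.
tasks = [
    PmTask("PUMP-A", date(2023, 9, 5), critical=True),
    PmTask("FAN-B", date(2023, 9, 5), critical=False),
    PmTask("PUMP-C", date(2023, 9, 1), critical=True, completed_on=date(2023, 9, 1)),
]

follow_ups = reschedule_missed_critical(tasks, today=date(2023, 9, 6))
for task in follow_ups:
    print(task.asset_id, task.due.isoformat())
```

Non-critical PMs are left alone here, which mirrors the point made in the talk that waiting for the next scheduled occurrence is only unacceptable for certain equipment.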
After you've evaluated with your team, you leave some comments. Because of
this unplanned failure on another piece of equipment, all hands were on deck
and this PM was missed. Unfortunately, because this PM does not happen at a
high frequency, it was not known until the bearing was in a stage four
failure that was going to cause an unexpected failure. So this is your cause
map. Once you're happy with it, you can go to save and continue.
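The breadcrumb stages and the point after which edits are no longer allowed can be sketched as a small state machine. The stage names follow the demo, but the class itself is hypothetical, and locking the timeline once the cause map has been saved is an assumption: the webinar only says that modifications stop "after a certain point".

```python
from enum import IntEnum
from typing import List


class Stage(IntEnum):
    FAILURE_DETAILS = 1
    TIMELINE = 2
    FAILURE_ANALYSIS = 3
    ACTIONS = 4


class RcaRecord:
    def __init__(self) -> None:
        self.stage = Stage.FAILURE_DETAILS
        self.cause_map_saved = False

    def save_and_continue(self) -> None:
        # Saving the failure analysis stage is what fixes the cause map in this sketch.
        if self.stage == Stage.FAILURE_ANALYSIS:
            self.cause_map_saved = True
        if self.stage < Stage.ACTIONS:
            self.stage = Stage(self.stage + 1)

    def remaining_stages(self) -> List[str]:
        """Stages still to complete, as the breadcrumbs would show them."""
        return [stage.name for stage in Stage if stage > self.stage]

    def can_edit_timeline(self) -> bool:
        # Assumed rule: timeline edits are allowed until the cause map is saved.
        return not self.cause_map_saved


record = RcaRecord()
record.save_and_continue()  # failure details -> timeline
print(record.stage.name, record.remaining_stages(), record.can_edit_timeline())
```

Keeping the lock as a separate flag, rather than deriving it from the current stage, means that navigating back through the breadcrumbs does not accidentally reopen a locked timeline.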
This is the final stage, the action portion. You can see here the cause map
that you just created. If you need to modify anything, you can go back to the
failure analysis tab via the breadcrumbs, but right now you can take a look
at this and identify what actions you need to take to address it. In this
case, one of the things is that the PM was missed, so we want to identify why
the PM was missed.
Some notes here: because this was missed, we need to discuss with maintenance
how to mitigate this in the future. I guess the current practice is that the
PM is closed and we wait for the next one to come along, but for certain PMs,
especially ones like this, that can't be the current practice. Max Berson is
in charge of assigning the different PMs. He can take a look at how we can
address this in the future. We should probably give him some time, probably
at least a month or two; he needs to go back through, double-check what the
current practice is, and communicate that to his
team. Next, we're noticing that, because there was no continuous vibration
monitoring on this piece of equipment, the problem was only picked up because
an operator heard the sounds, at which point it's not savable. We've been
hearing about all these IoT sensors; they return real-time data. This could
have potentially caught it before it became a stage four bearing failure. Why
don't we look into some of these IoT sensors?
There are a lot on the market, but based on our process, the needed
temperature requirements, and things like that, we can probably find
something that we can install on there to ensure that we are seeing the
vibration data in real time. Jill Smith is a reliability engineer. She's
going to take a look at any IoT sensor companies that look like they could be
a good fit for our
application. All right, new actions. So the team has deliberated: okay, we
feel that these actions are good and are addressing the issues. If we need to
come back and add actions in the future, we can go ahead, if we think of
something else. Otherwise we feel confident that this is going to address
this issue. Now that you've created your actions, you're going to go ahead
and return home. Now you can see that these are listed by RCA, and you can
see that this one was created, so now you can go back and reference it in the
future.
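Two rules from the demo lend themselves to a short sketch: action items can only be assigned to people captured as participants, and the landing page lists the actions that are due soon. The class and function names below are hypothetical, the 30-day window is an assumed default, and "Alex Doe" is an invented non-participant used only to show the rejection.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Iterable, List


@dataclass
class ActionItem:
    description: str
    assignee: str
    due: date


class Rca:
    def __init__(self, participants: Iterable[str]) -> None:
        self.participants = set(participants)
        self.actions: List[ActionItem] = []

    def add_action(self, description: str, assignee: str, due: date) -> ActionItem:
        # Only people recorded as participants can be given an action.
        if assignee not in self.participants:
            raise ValueError(f"{assignee} is not a participant in this RCA")
        action = ActionItem(description, assignee, due)
        self.actions.append(action)
        return action


def due_soon(actions: Iterable[ActionItem], today: date, window_days: int = 30) -> List[ActionItem]:
    """Return actions due between today and the end of the window, earliest first."""
    limit = today + timedelta(days=window_days)
    upcoming = [action for action in actions if today <= action.due <= limit]
    return sorted(upcoming, key=lambda action: action.due)


rca = Rca(["Bob Costa", "Jill Smith"])
rca.add_action("Evaluate IoT vibration sensors", "Jill Smith", date(2024, 1, 15))

try:
    rca.add_action("Review PM practice", "Alex Doe", date(2024, 2, 1))
except ValueError as error:
    print(error)

for action in due_soon(rca.actions, today=date(2023, 12, 20)):
    print(action.due.isoformat(), action.assignee, action.description)
```

Validating the assignee at the point of creation is why the participant list has to be complete before the action stage is reached.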
All right, so that is the demonstration. A couple of quick items to note
here. You can navigate to the RCAs via here. For example, if you want to see
another sample RCA, and you want to reference what was discussed there (it
might have been a while), you can go back and see the RCA from the action
item. Similarly, you can go back and reference the RCA here. Oh, it is not
created. It is. Let's see this. Now you can see the information that was
captured in that RCA. Okay, now I'm going to go back over to Gavin, and he's
going to finish out this
presentation.
Thank you very much, Nicole. The one thing I will mention is that this is a
blueprint. What that means is that if there's anything extra you want to
change or add, you can adapt this to your own processes and your own way of
doing things. All the data is captured. (If you can go to the next slide
please, Nicole.) The other thing is that even if you flag something as
deleted, it's not deleted from the database; it's flagged, so you can't
actually bring it up. However, you can put a ton of metrics on top of that
and actually bring a lot of that up. We will be expanding on the blueprints
as well, adding a lot of different feedback options for the reporting, etc.
So where can you find it?
It is part of our blueprints, accelerators, and patterns. We covered what
that is and how to access it in a prior webinar. On that page, if you hit the
landing page, at the bottom right there's an RSS feed. If you click that,
it'll give you the RSS feed, which you can load into your Outlook, etc., and
then you'll be informed whenever we publish new ones. This particular one
should be out just after these webinars, just before the holidays, and you'll
be able to access it and go from there. You can also contribute to these as
well, so if you have anything you feel you need to contribute, please don't
be shy. Next slide,
please. And with that, thank you, Nicole, for running us all through that.
This will be it for the webinars for the year, so we'll be taking a short
break for all the holidays that everyone's going to go on, and we'll be
seeing you in February 2024. Be safe. Thank you for the fantastic 2023, for
attending, and for the feedback and comments, and we will pick this up in
February of next year. Thank you all.
Thank you, everyone.