Contact Stephen Puryear

41 Cypress Court
San Pablo, CA 94806

How this all started

My manager at Chiron and I studied for two ASQ certifications (CQE and CQM) as well as the California State Engineering EIT License. In addition, we belonged to several of the same professional organizations. These included the Society for Maintenance and Reliability Professionals (SMRP) and the American Society for Quality (ASQ). It is not surprising that a chance conversation with him would get me started on a path that has led me down many an interesting road in the last eight years or so.

One day we were talking about the task of maintaining the 500 refrigeration systems Chiron Corporation was using at the time. We had all kinds: “walk-ins” big enough for forklift traffic, -80s, -20s, 2-8 degree “beer coolers” and domestic refrigerators in research labs, and HVAC units spread throughout the campus buildings. The standard maintenance strategy for these types of units was (and is to this day) to visit each unit on the basis of the calendar. There was little or no other input to help us predict whether that scheduled visit was going to be an almost total waste of time or whether the refrigeration tech was going to be walking up to a unit that was only 24 hours away from complete failure. Also, the scope of our discussion did not include the costs of running inefficient units. We were only discussing the costs in maintenance resources caused by our ignorance of the “health” of the systems for which we were responsible as a facilities operations group. These days it seems more important that a unit that is beginning to run a little less efficiently is going to be proportionally more expensive for a company to operate. This situation could easily stay invisible for many months before the cost increase caused by inefficiency also begins to show up as a set of symptoms.

An exciting and creative alternative to business as usual

We also discussed an exciting alternative approach: Condition Based Monitoring. Instead of basing our maintenance response to all these systems on the calendar, this approach is driven by our response to some measured critical parameter that we believe indicates the equipment’s current state of health. In some instances this measured variable correlates very well with the equipment’s future. Vibration analysis can be a good example of this kind of measurement. But sometimes there is no known variable with an attractive correlation with the machine’s future, either because such a variable has not yet been identified or because it does not exist. Sometimes it is truly cheaper to run the equipment to failure rather than do the necessary investigation.

I was quite naïve then. After that conversation, I was excited because I thought that my task was to search the marketplace for the software that had already been written for freezer users to analyze some variable in their operation that would provide enough warning of imminent failure that a technician could be dispatched to rescue the freezer contents and fix the unit.

Just run down to the store and pick up the software…

That software did not and still does not exist. Instead, in the academic parlance, this very broad territory is “an area of active interest”. In other words, we aren’t even close yet. Instead of giving up, I asked for access to a junk freezer at work. I got one and hooked it up to our building monitor system so that I could gather data on the way the freezer performed. This effort was inherently limited by the data bandwidth. The Chiron freezers only reported and alarmed the state of one internal probe to the building monitors, and that internal temperature would be the last variable to “head south” when a freezer unit was about ready to fail.

A Rig is Born

At this point my loaner freezer at work died and besides, I wanted more control, so I decided to take this project home with me and work on it in my garage. My wife Terri was indulgent enough to allow me to buy a cheap -20 freezer from a nearby box store and I borrowed an obsolete DigiStrip data collector from work. The “Digi” is a real workhorse; here is a picture of one. The Digi can collect up to 120 channels of thermocouple/RTD/digital data every 30 seconds, but it needs a dedicated PC as a terminal. It cannot store anything but its own profile. I happened to have a surplus home computer lying around which we were getting ready to throw out, so I ran some RS-232 cable between the Digi and our home office and I was in the freezer health data gathering business. My freezer only had about a 4 ounce charge of R-134a refrigerant in it, but I wanted to be above board, so I got an EPA Universal license so that I could alter and store the charge in my unit. I wanted to measure its response to being over- or undercharged, among other things. I needed a reservoir for the refrigerant being transferred back and forth that was safe to handle, so I bought what scuba divers call a pony bottle. It’s the yellow bottle in the photograph above, good to 4,000 psi. At first when I wanted to pump out my system and weigh the charge, I put the pony bottle in a bath of dry ice and alcohol and figured that when I directed the refrigerant towards the bottle, the vacuum created by the temperature gradient would make all the gas want to go there very badly. Later I bought a recovery pump. (Thanks, Terri!) I never used a gauge set to determine how much refrigerant was present because I wanted to have more accuracy, so I always weighed the pony bottle on an accurate scale.

I start thinking about a failure detector tool

At first, my concept was to create some analytical tool which would be able to distinguish a failure mode from normal performance: an imminent failure detector. It took a while to shake this off because it’s attractively dramatic: facility guys will sometimes even start fires just so they can arrive and put them out! One thing that convinced me to let it go was the book Condition Monitoring For Marine Refrigeration Plants by Hugo Grimmelius, which I have cited on the Resources tab. Grimmelius concentrated very diligently on failure mode analysis and also took a formal approach to modeling. By that I mean that he tried to write an objective function for a simple test bed plant. I don’t have nearly enough math for that, although thanks to Practical Optimization Methods with Mathematica by M. Asghar Bhatti (also cited on the Resources tab) I recognized the territory in which Grimmelius was working. Also, I kept reading a wide variety of experts say that one of the weaknesses of work in this area (call it pattern recognition for a moment) is that there is never a practical link to statistics and probability. In other words, we should start out wanting to always be able to answer this question: what probability rides along with any statement that we may venture to make about the current state of the freezer? If our current assessment of health is always accompanied by a built-in confidence interval that narrows over time, then we know that we are on to something important or at least interesting.
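The idea of a confidence interval that narrows over time can be made concrete with a very small sketch. It assumes the health indicator is simply the average compressor run percentage per cycle, that the readings are independent, and that a normal approximation is acceptable; real freezer data is a time series and is probably autocorrelated, so this is only an illustration. The function name mean_with_ci and all of the numbers are made up for the example and are not measurements from the rig.

```python
from math import sqrt
from statistics import mean, stdev


def mean_with_ci(samples, z=1.96):
    """Return (mean, half_width) of an approximate 95% confidence interval.

    Assumes independent readings and a normal approximation (z = 1.96).
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two readings to estimate spread")
    centre = mean(samples)
    half_width = z * stdev(samples) / sqrt(n)
    return centre, half_width


# Made-up run percentages per compressor cycle, NOT data from the rig.
run_pct = [40, 42, 38, 41, 39] * 8

# Re-estimate the interval as more cycles accumulate.
checkpoints = [5, 10, 20, 40]
results = [mean_with_ci(run_pct[:n]) for n in checkpoints]

for n, (centre, half_width) in zip(checkpoints, results):
    print(f"after {n:2d} cycles: {centre:.1f} % +/- {half_width:.2f}")
```

With a steady machine the interval tightens roughly as one over the square root of the number of cycles. If the estimate drifts while the interval keeps tightening, that is the kind of growing assurance described above.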

I find it necessary to change my basic approach

Slowly, I began to see that there were also technical advantages to abandoning failure detection if, at the same time, I grafted probability onto my concept. Gradually I saw that, instead of trying to catch a failure just before it happens, we can take advantage of the fact that the failure mode is usually preceded by a relatively long time when nothing is apparently wrong yet. It is during this window of time that our assurance that the health of the machine is changing can grow. We can afford the time to allow that assurance to grow because we are looking for the front of a slope, not trying to fight off the very back. If our assurance does not grow consistently, then we can begin to say that either we are in the measurement process noise and everything is OK with the machine, or else our tool is not working because it’s missing the symptoms of decreasing health. Since entropy almost always flows in one direction, a machine may not be ready to fail yet but it certainly is not going to heal itself. Before I leave this topic, I want to put in a plug for wavelet analysis in this application. Wavelets are beautiful, very flexible frequency analyzers. And because they are mathematical objects, not an engineer’s attempt to fashion a tool for a moment’s need, attaching a probability dimension to them is done when they are designed, not nailed on later.

And then there is sustainability…

Over the time of my interest in this topic, energy usage and its sustainability have risen dramatically in importance. This viewpoint provides an even better reason to abandon fault detection: the damned machine could eat us out of house and home before it ever gets around to presenting symptoms of a fatal condition that our detector is programmed to look for. Domestic kitchen freezer compressors will keep going even after they’ve gotten so hot that they have scorched the linoleum underneath them! A tribute, no doubt, to their almost bulletproof design, but aren’t there much cheaper and quicker ways to set the kitchen floor on fire?

Aside from reading everything under the sun, what have I actually done so far?

I have explained that by using PCA and a graphic look at the data, it became clear pretty soon that there were only about two variables controlling the way my rig acted. The clearest controlling variable was the ambient temperature in my work space. Garages with southeast-facing doors can get quite hot in the summer in San Pablo, California. The correlation between ambient air temperature and compressor run time was quite clear. I spelled it out in my Novartis Energy Award application elsewhere on this site. I would not have focused on it so much except for an illustrative incident from this period. Periodically, Chiron would entertain the idea of moving some or all of its freezers scattered throughout the campus into a central location. As it happened, I was in the middle of this study during one of these episodes. I was able to ask the engineer in charge of the project what he defined as the environmental ambient requirements for the freezer farm. The only requirement he had as far as the freezers were concerned was a recommendation from the freezer manufacturer that their units never be subjected to ambient temperatures of greater than 105 degrees F. This requirement clearly had nothing to do with energy consumption, but with the survival of the freezer compressors. I am not saying the engineer was stupid, but I am saying that the marginal energy costs were not considered at all.

This embarrassing situation inspired me to write a function that correlated compressor run time and ambient temperature derived not from a handbook or table but from my rig. This I did by comparing run times as dependent variables to ambient temperatures as independent variables in Excel and PAST. Tackling the relation between internal freezer loads and compressor run time is something that I found to be quite a bit more difficult. Here are a few observations: An empty freezer’s control system will hunt quite a bit as it tries to stabilize the internal freezer space temperature. One of the first tricks I ever learned as an instrument tech at Cetus was to place a gallon jug of water in an empty -80 freezer before trying to calibrate its temperature display. The added load of 8 pounds of water helped quite a bit. Thanks, Mikey! Without it, the displayed internal temperature oscillated quite rapidly. Also, intuitively, as the load on a compressor increases, the compressor run time increases, at first. Then slowly the increasing load starts to become less of a burden on the compressor and more of an appreciable temperature/energy reservoir. Visualize opening a freezer that is almost completely filled with 5-gallon jugs: assuming that the insulation is intact and that you do not hold the door open all day, that freezer is going to stay pretty cold for a while with little additional help needed from the compressor. So, the load contributes in a non-linear way to the work that the compressor must do to keep things going. At first, as the load increases, the compressor has to run more, and then later the increasing load allows the compressor to run less.
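The run time versus ambient temperature comparison described at the start of this section can be sketched as an ordinary least-squares line fit. This is only an illustration of the kind of fit a spreadsheet or PAST would produce, not the actual function derived from the rig. The helper name fit_line and the data values are hypothetical; the numbers are made up and are not rig measurements.

```python
from math import sqrt


def fit_line(x, y):
    """Least-squares fit y = slope * x + intercept, plus Pearson's r."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r


# Made-up illustrative values, NOT measurements from the rig:
# ambient temperature (degrees F) is the independent variable,
# compressor run time (percent of each hour) is the dependent variable.
ambient_f = [60, 70, 80, 90, 100]
run_pct = [31, 34, 41, 44, 50]

slope, intercept, r = fit_line(ambient_f, run_pct)
print(f"run_pct = {slope:.2f} * ambient_f + {intercept:.2f}, r = {r:.3f}")
```

The slope is the practical quantity here: it estimates how many extra percentage points of run time each additional degree of ambient temperature costs, which is the marginal energy question raised by the freezer farm episode. A straight line is an assumption; the load effect described above is explicitly non-linear and would not be captured this way.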

As I have said, I was able to use just a simple compressor run time average and found that it correlated very well with the temperatures reported to the Digi by the little ambient temperature thermocouple I had hanging down from the rafters in my lab. However, run times correlated very poorly with the load. At one point I even thought about placing my unit on a floor scale in an attempt to trap the load. I gave that up for several reasons, including, let’s see: cost, upkeep and the weird effect caused by frost accumulating on the unit. I found it strange that I had no way of knowing the freezer load except to open the door and look in the unit. At this point I wondered if perhaps Mahalanobis Distance was sensitive and robust enough to distinguish different loads if I exposed it to a matrix of ambient temperatures, internal temperatures and compressor run times. I have made quite a bit of progress on this approach. After I created the matrices and built the Mahalanobis tool, I was very pleased to see the different loads line up in their MD score order. The downside was that the MD scores seemed way out of whack. Some MD authorities say that the output of the MD process is a number whose units are standard deviations. This, if true (not every authority agrees with this assertion explicitly), implies to me that I should not see scores much greater than 3, which for a normally distributed variable would be roughly 3 parts in 1,000. To get my loads to fall into load order, I had to accept scores that were much higher than 3. So perhaps the tool does not work in this instance for reasons I do not understand, or I misapplied it and broke the math. I am assuming that this would be fairly straightforward for a pro to straighten out but it has beaten me so far.
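For readers who have not met it, the Mahalanobis Distance calculation itself is short. The sketch below is not the tool described above; it is a minimal standard-library version, assuming a baseline matrix whose rows are (ambient temperature, internal temperature, compressor run percentage). The function names, the baseline values and the candidate reading are all hypothetical and made up for illustration.

```python
from math import sqrt


def mean_and_cov(rows):
    """Column means and sample covariance matrix (n - 1 denominator)."""
    n, p = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(p)]
    cov = [[sum((r[i] - mean[i]) * (r[j] - mean[j]) for r in rows) / (n - 1)
            for j in range(p)] for i in range(p)]
    return mean, cov


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def mahalanobis(x, mean, cov):
    """sqrt((x - mean)^T cov^-1 (x - mean)) without forming the inverse."""
    diff = [xi - mi for xi, mi in zip(x, mean)]
    weighted = solve(cov, diff)
    return sqrt(max(0.0, sum(d * w for d, w in zip(diff, weighted))))


# Made-up baseline readings, NOT rig data:
# (ambient degrees F, internal degrees F, compressor run percent)
baseline = [
    (60, -4.0, 31), (70, -3.5, 34), (80, -4.2, 41),
    (90, -3.8, 44), (100, -3.6, 50), (75, -4.1, 37),
]
mean, cov = mean_and_cov(baseline)

# Hypothetical reading: run time far too high for an 80 degree ambient.
candidate = (80, -4.0, 60)
print(f"candidate MD = {mahalanobis(candidate, mean, cov):.1f}")
```

Two hedged remarks on interpreting the scores. First, with several variables and roughly normal data, the squared distance behaves approximately like a chi-square variable with as many degrees of freedom as there are variables, so the familiar univariate 3 standard deviation rule does not carry over directly. Second, the distance is measured relative to the baseline set, so readings taken under a different load than the baseline can legitimately score far above 3; that may be the separation the method is supposed to show rather than a sign of broken math.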

A 10,000-foot view of this situation leads me to the following summary: the task that I have set for myself involves normalizing the compressor run time for two factors: the ambient operating temperature and the internal load placed in the freezer. Suppose that we could place our freezer in a space whose ambient temperature was rigidly controlled, with an unchanging load inside our freezer. It seems logical that the problem of assessing system health based on the compressor performance would be relatively easy. At least it seems logical to me, but I could easily be wrong. At this point I can’t prove it either way. Another assumption that I am sneaking in here is that our platform does not fall victim to terrorists, or a meteorite, or some one-off “random event”.

Late-breaking news on the “what have I actually done?” front!

Over the last half year I have searched for a partner for this effort. I am excited to have located a wonderful resource, and they have positions! I recently drove to San Jose, California and talked extensively with Cypress Envirosystems. Their business model involves supplying wireless add-on devices for facility operators who know that they need to analyze/update individual systems within their shop without having to gut everything out to the walls. Cypress Envirosystems is a subsidiary of Cypress Semiconductor. Typical of their product spread is the wireless gauge reader. This is a clever device which clamps onto the face of an analog gauge in the field. It produces a local display (to reproduce the gauge’s hidden needle position) and a wireless output which can drive a control device. This is a very cheap way of digitizing a local analog device without engineering a new insertion into the process.

Cypress Envirosystems also has a wireless freezer monitor! Talking with them was great for a couple of reasons. First, it was lovely to talk about coupling wavelet analysis to facility operation data analysis without getting a blank stare in response! Second, it increased my confidence in the viability of using wavelet analysis on the front end of this approach. The time series value of the dataset is just too important to throw out without very careful consideration. This will be doubly true when pressing a wavelet analysis button is as fast and easy as pressing the tried and true (and in some cases nearly worn down) Fourier button.