Everything You Know Is Wrong April 2004

Answers to those Doggone Thermal Design Questions

By Tony Kordyban

Mr. Everything You Know,

In the 18 months since finishing school, I have been responsible for designing sheet metal parts, running the vibration testing machine, and building assembly jigs for the manufacturing line. Now I am supposed to come up with a Quality Procedure for a high temperature Burn-In for all of our electronic products. As the only mechanical engineer, I guess I am the closest thing to a thermal expert around.

My question is: Which is better, to burn in the products at a steady 50°C for 24 hours, or to do temperature cycling, between -20° and 50°C? I have heard that temperature cycling is more effective, but the chamber to do that is much more expensive.

Too Much on My Plate Already in Fat City

Dear Too Much,

I am heartened that as the first step in your Quality Control Program, you took the time to consult a stranger on the Internet who specializes in humorous stories. It shows the same regard for your customers that was demonstrated by your management when they gave that assignment to you. So at least your policies are consistent.

I have to admit that Quality Control and Product Reliability are not my field of expertise. I know how to predict and measure and control the temperature of electronics. I don’t know how temperature affects their functionality or expected life (other people think they do, but I’m not sure about that either). But you didn’t ask for facts — you asked me what I think about burn-in. I don’t have any trouble expressing an opinion.

There is a vague, generally accepted idea about Burn-In floating around. Somewhere out there, hidden away as a trade secret, or in some military test spec, is a magical test that we can run our finished products through, if we only knew exactly what it was. That test is something like the Obstacle Course in the military basic training. The recruits are run through this grueling test, and the weaklings are weeded out. That makes the surviving population stronger, on average.

Behind this Burn-In idea are three assumptions.

The first assumption of the theory of Burn-In is that in every batch of products you manufacture, there are some weaklings that are liable to fail, causing problems for your customers, and eventually, for you. You don’t want those weaklings to get into the customers’ hands. You want them to have Good Quality, whatever that means.
The second assumption is that running the products at elevated temperature, or maybe through some temperature cycles, is a grueling experience that will weed out the weaklings.
The last assumption of the Burn-In enthusiast is that after you start burning in your products, you never have to prove the first two assumptions. Just keep doing Burn-In and call it your Quality Process for Reducing Infant Mortality.

Although, as I’ve already admitted, I’m not a Quality Engineer, I am under the impression that a Quality Process is supposed to have metrics and feedback. By that I mean, if you say you have weaklings in your product population, you should be able to measure how many there are, statistically. For example, you might classify five out of every hundred as “weak”.

Then after your Burn-In process, whatever it is, the metrics should be able to tell you how many of the survivors are still “weak”, or at least, how many units fail during Burn-In. That is a measure of how effective your Burn-In process is at weeding out the weaklings. If nothing ever fails in Burn-In, either you don’t have any weaklings, and Burn-In is a waste of time, or the Burn-In process is ineffective at weeding out the weaklings, in which case it is still a waste of time.
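For anyone who wants to see what that bookkeeping might look like, here is a minimal Python sketch. The function name, its inputs, and the sample numbers are all invented for illustration. The "effectiveness" ratio assumes you can also count how many units that passed Burn-In still fail early in the field, which is an assumption of this sketch and not a standard definition.

```python
def burn_in_metrics(units_tested, failed_in_burn_in, early_field_failures):
    """Summarize one Burn-In lot. All inputs are hypothetical counts."""
    if units_tested <= 0:
        raise ValueError("units_tested must be positive")

    survivors = units_tested - failed_in_burn_in

    # Fraction of the lot that failed during Burn-In.
    fallout = failed_in_burn_in / units_tested

    # Fraction of survivors that still failed early at the customer.
    escape_rate = early_field_failures / survivors if survivors else 0.0

    # Of all the weaklings we eventually learned about, the share that
    # Burn-In actually caught. None means no weaklings were ever observed.
    weaklings_found = failed_in_burn_in + early_field_failures
    effectiveness = (
        failed_in_burn_in / weaklings_found if weaklings_found else None
    )

    return {
        "fallout": fallout,
        "escape_rate": escape_rate,
        "effectiveness": effectiveness,
    }


# Made-up example: 1000 units, 2 fail in Burn-In, 48 fail early in the field.
example = burn_in_metrics(1000, 2, 48)
print(example)
```

In the made-up example, Burn-In catches only 2 of the 50 known weaklings, which is the "ineffective" case described above. If both failure counts are zero, the effectiveness comes back as None, which is the "no weaklings" case.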

So, Mr. Too Much, the first thing you should find out before you design a Burn-In process, is whether there is any data about the weaklings in your product population. Perhaps if you request this data, it will take many months to dig up, during which you can get some of your more useful work done.

But if the Burn-In project still goes on after that, you will need to look at the second assumption. Is running your product at 50°C for 24 hours really a grueling ordeal? Is the nature of the “weakness” of your weaklings something that will be accelerated by elevated temperature? For the products I am used to designing, 50°C is a normal operating temperature, not some horribly stressful condition. Granted, 50°C is a little bit stressful compared to 20°C. But it seems unlikely to me that it is enough stress to cause even “weakling” products to fail quickly. Instead of an obstacle course, I see 24 hours of 50°C operation as a walk in the park (perhaps on a sunny, summer day.)

Temperature cycling is more stressful than steady state operation. At least I can imagine one physical failure mechanism that is being exercised when the temperature is cycled: the stresses due to differential thermal expansion are run through with each reversal in temperature. It is like bending a paper clip back and forth. With enough bends, the paper clip will break.

The problem is, how many cycles can you achieve with a roomful of finished products in 24 hours? If you cycle the room temperature too quickly, the component temperatures will not change very much at all, because of their thermal mass. If you change the temperature slowly, so the components reach the extremes you have in mind, then you will have only a few cycles by the end of the test. It takes many cycles to trigger mechanical failures, so this “Burn-In” test is likely to be not very grueling, either.
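The trade-off above can be put into rough numbers with a simple model. The Python sketch below treats a component as a single lumped thermal mass (a first-order lag) driven by a chamber temperature that swings sinusoidally. That model, the function names, and the 60-minute time constant are all assumptions made for illustration; a real product in a real chamber would have to be measured.

```python
import math


def swing_fraction(time_constant_min, period_min):
    """Fraction of the chamber's temperature swing that the component
    actually sees, for a first-order lag driven by a sinusoid."""
    omega_tau = 2.0 * math.pi * time_constant_min / period_min
    return 1.0 / math.sqrt(1.0 + omega_tau ** 2)


def cycles_in_test(period_min, test_hours=24.0):
    """Number of complete-or-partial cycles that fit in the test."""
    return test_hours * 60.0 / period_min


if __name__ == "__main__":
    chamber_swing_c = 70.0   # -20 to 50 degrees C, from the question
    tau_min = 60.0           # ASSUMED component time constant, minutes

    for period_min in (30.0, 120.0, 480.0):
        frac = swing_fraction(tau_min, period_min)
        print(
            f"period {period_min:5.0f} min: "
            f"{cycles_in_test(period_min):5.1f} cycles in 24 h, "
            f"component swing about {frac * chamber_swing_c:4.1f} C"
        )
```

Under these assumptions, short periods give plenty of cycles but only a small temperature swing at the component, while long periods give nearly the full swing but only a handful of cycles.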

My guess is that if you introduced either one of these Burn-In processes, you would not weed out any weaklings, even if you have any. There would be some random failures along the way, but they would not be triggered by the temperature process. I think you would get the same results by doing a functional test of your products, letting them run at room temperature for 24 hours, and then functionally testing them again. Just the stress of handling them and testing them twice will cause some failures, mainly due to ESD (electrostatic discharge). The more products are handled by hand in the factory, the more they suffer ESD damage.

Chances are you will be commanded to choose one of the two Burn-In processes you described. Pick one that makes sense to you (as I said, I don’t think either one will be effective anyway). But write into the process a requirement for data to be collected before and after the Burn-In. If, after 6 months or a year of doing it, you can show that the product failure rate is no different after Burn-In than it is before Burn-In, then you have evidence that Burn-In is a waste of time.

But don’t phrase it that way in your report. What you have to say is: “Our manufacturing quality is so good to start with that Burn-In is no longer an effective method of removing defects.”
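To back up a claim like that with the data you collected, one common approach is a two-proportion z-test. The Python sketch below is one way to do it, not the only way. It assumes you have failure counts for two groups of units, one that went through Burn-In and one that did not, and that there are enough failures in each group for the normal approximation to be reasonable. The function name and the sample counts are invented for illustration.

```python
import math


def compare_failure_rates(fail_a, n_a, fail_b, n_b):
    """Two-proportion z-test on failure counts from two groups.

    Returns (z, two_sided_p_value). Uses the pooled-variance normal
    approximation, which is only rough when the counts are small.
    """
    if n_a <= 0 or n_b <= 0:
        raise ValueError("group sizes must be positive")

    p_a = fail_a / n_a
    p_b = fail_b / n_b
    pooled = (fail_a + fail_b) / (n_a + n_b)
    variance = pooled * (1.0 - pooled) * (1.0 / n_a + 1.0 / n_b)

    if variance == 0.0:
        # No failures at all (or all failures): the rates cannot differ.
        return 0.0, 1.0

    z = (p_a - p_b) / math.sqrt(variance)
    # Two-sided tail probability of a standard normal variable.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Made-up counts: 12 failures in 2000 units without Burn-In,
# 11 failures in 2000 units with Burn-In.
z, p = compare_failure_rates(12, 2000, 11, 2000)
print(f"z = {z:.2f}, p = {p:.2f}")
```

A large p-value means the data give no evidence that the two failure rates differ. It does not prove they are identical, so the more units you track, the stronger your case.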

Dear Thermal Guy,

Aren’t there some component failure mechanisms that actually slow down with higher temperature? Maybe we should do Freeze-Out instead of Burn-In to get rid of infant mortality.

Jack Frost from Bear Lake, Minnesota

Dear Jack,

There is at least one failure mechanism that slows down as the component gets hot. It is one that I can understand, because I see it every winter on the body of my automobile. It is called metallic corrosion. Some metals corrode in the presence of liquid dihydrogen oxide (water) and other impurities. It happens especially when there are two different metals in contact, which is quite common on the surface of a chip. There is gold welded to aluminum touching copper sitting next to solder.

Maybe it’s hard to imagine, but plastic-encapsulated components are not watertight. Just the opposite — they suck water right out of the humid air. That water forms as liquid on the chip, and starts the metals turning into salts — even when the chip is turned off. That corrosion can even start to happen before the component gets assembled into a finished product.

The less water there is, the slower the chemical reaction. When the component is powered up, the water starts to bake out. If the chip is over 100°C, all the water is gone, and metallic corrosion comes to a halt. So it is possible that Burning-In products at 50°C ambient might actually slow down an important failure mechanism instead of accelerating it. Your Freeze-Out idea is maybe a more valid method of improving product quality than Burn-In at constant high temperature. At least it has some physics and chemistry on its side.

Are you planning on turning your ice fishing shack into a Freeze-Out Test Chamber?

—————————————————————————————————————

Isn’t Everything He Knows Wrong, Too?

The straight dope on Tony Kordyban

Tony Kordyban has been an engineer in the field of electronics cooling for different telecom and power supply companies (who can keep track when they change names so frequently?) for the last twenty years. Maybe that doesn’t make him an expert in heat transfer theory, but it has certainly gained him a lot of experience in the ways NOT to cool electronics. He does have some book-learnin’, with a BS in Mechanical Engineering from the University of Detroit (motto: Detroit, no place for wimps) and a Masters in Mechanical Engineering from Stanford (motto: shouldn’t Nobels count more than Rose Bowls?).

In those twenty years Tony has come to the conclusion that a lot of the common practices of electronics cooling are full of baloney. He has run into so much nonsense in the field that he has found it easier to just assume “everything you know is wrong” (from the comedy album by Firesign Theatre), and to question everything against the basic principles of heat transfer theory.

Tony has been collecting case studies of the wrong way to cool electronics, using them to educate the cooling masses, applying humor as the sugar to help the medicine go down. These have been published recently by the ASME Press in a book called, “Hot Air Rises and Heat Sinks: Everything You Know About Cooling Electronics Is Wrong.” It is available direct from ASME Press at 1-800-843-2763 or at their web site at http://www.asme.org/pubs/asmepress, Order Number 800741.

