As electronic systems become more complex, troubleshooting manufacturing and field failures becomes more difficult. Hard failures (i.e., permanent changes in a device that lead to reproducible failures) are comparatively straightforward to isolate and trace to a root cause. The more problematic area is “no trouble found” (NTF), where the original failure is difficult, if not impossible, to reproduce.
There are several sources of NTF:
- Noise – signal reflections, cross-talk, ground bounce
- Timing marginality – variations in rise times and delay times over temperature and voltage
- Weak cells (memory) – leakage in DRAM, read-write instability in SRAM
- Soft errors – upsets in memory and logic due to radiation from alpha particles within the IC package or neutrons generated from cosmic rays
Each of these sources of error can be difficult to reproduce. As a result, a failure initially observed at the system level often cannot be reproduced at the IC level. The key to effective failure analysis is to recognize the possible sources and run the appropriate stress tests to reproduce and isolate the failure:
- Noise and timing marginality are generally much worse at the system level than within the IC. However, memory chips can be tested with special patterns that highlight these marginality issues.
- Weak cells – failures will appear at random address locations from chip to chip, but refresh or voltage margin stress testing can be used to expose the weak cells.
- Soft errors are difficult to reproduce at the individual chip or system level because they occur so rarely under normal conditions. Acceleration techniques are required: high-altitude testing or neutron beams for cosmic ray soft errors, and sources containing high concentrations of radioactive materials for alpha particle soft errors.
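The idea of testing memory with special patterns can be illustrated with a march test. The text does not say which patterns are meant, so the sketch below is only an assumption: it uses March C-, a widely used march algorithm, against a simulated one-bit-wide memory. The `FaultyMemory` class and its single coupling fault (a 0-to-1 write on an "aggressor" cell disturbs a "victim" cell) are hypothetical stand-ins for a real device and tester interface.

```python
class FaultyMemory:
    """Simulated 1-bit-wide memory with an optional, hypothetical coupling fault."""

    def __init__(self, size, aggressor=None, victim=None):
        self.cells = [0] * size
        self.aggressor = aggressor  # address whose 0->1 write disturbs the victim
        self.victim = victim        # address that is forced to 1 by the aggressor

    def write(self, addr, value):
        old = self.cells[addr]
        self.cells[addr] = value
        # Coupling fault model (assumed): rising write on aggressor sets victim to 1.
        if addr == self.aggressor and old == 0 and value == 1:
            self.cells[self.victim] = 1

    def read(self, addr):
        return self.cells[addr]


def march_c_minus(mem, size):
    """Run March C- and return the sorted addresses where a read mismatched."""
    up = range(size)
    down = range(size - 1, -1, -1)
    # Each element: (address order, value expected on read, value to write).
    elements = [
        (up, None, 0),    # write 0 everywhere
        (up, 0, 1),       # ascending: read 0, write 1
        (up, 1, 0),       # ascending: read 1, write 0
        (down, 0, 1),     # descending: read 0, write 1
        (down, 1, 0),     # descending: read 1, write 0
        (up, 0, None),    # final read 0
    ]
    failures = set()
    for order, expect, write_value in elements:
        for addr in order:
            if expect is not None and mem.read(addr) != expect:
                failures.add(addr)
            if write_value is not None:
                mem.write(addr, write_value)
    return sorted(failures)


if __name__ == "__main__":
    good = FaultyMemory(8)
    bad = FaultyMemory(8, aggressor=5, victim=2)
    print("fault-free:", march_c_minus(good, 8))
    print("coupled   :", march_c_minus(bad, 8))
```

Running the pattern in both ascending and descending address order is what lets it catch the disturbed cell whether the aggressor sits above or below the victim. A real marginality screen would run such patterns while also sweeping voltage, temperature, or timing.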
Knowledge of the characteristics of each of these causes of NTF can then be used to help isolate the most likely source. Further testing at the system or component level with the correct source of stress is necessary to confirm or refute the cause.
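For soft errors, one way to confirm or refute the hypothesis is to project a field failure rate from an accelerated neutron beam test and compare it with the rate actually observed. The sketch below shows the usual arithmetic: an error cross-section is the number of upsets divided by the beam fluence, and multiplying it by the field flux gives a failure rate, expressed here in FIT (failures per 10^9 device-hours). All input numbers are made up for illustration. The default field flux of 13 neutrons/cm²/hour is a commonly quoted sea-level reference value for neutrons above 10 MeV. Treat it as an assumption and substitute the figure for the actual deployment site.

```python
def cross_section_cm2(errors, fluence_per_cm2):
    """Error cross-section: observed upsets per unit beam fluence (cm^2)."""
    if fluence_per_cm2 <= 0:
        raise ValueError("fluence must be positive")
    return errors / fluence_per_cm2


def projected_fit(errors, fluence_per_cm2, field_flux_per_cm2_hr=13.0):
    """Project a field soft-error rate in FIT from an accelerated beam test.

    field_flux_per_cm2_hr is an assumed reference flux; replace it with the
    value appropriate for the real altitude and location.
    """
    sigma = cross_section_cm2(errors, fluence_per_cm2)
    failures_per_hour = sigma * field_flux_per_cm2_hr
    return failures_per_hour * 1e9  # FIT = failures per 1e9 device-hours


if __name__ == "__main__":
    # Hypothetical beam run: 200 upsets after a fluence of 1e10 neutrons/cm^2.
    fit = projected_fit(errors=200, fluence_per_cm2=1e10)
    print(f"projected rate: {fit:.0f} FIT")
```

If the observed field NTF rate is far above the projected figure, soft errors alone are unlikely to explain it, and noise, timing marginality, or weak cells deserve a closer look.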