A Data Mining Algorithm for Monitoring PCB Assembly Quality

This book presents four different ways of theoretical and practical advances and applications of data mining in promising areas such as industrial, biological, and social applications. Twenty-six chapters cover special topics with proposed novel ideas. Each chapter gives an overview of its subject, and some chapters include cases with proposed data mining solutions. We hope that this book will be a useful aid in showing the right way for students, researchers, and practitioners in their studies.


Introduction
As surface mount technology (SMT) evolves, driven by the continuing miniaturization of electronic components and ever-growing board complexity, in-line defect inspection has become common for ensuring reliable production. For example, as an in-line measurement technique, visual defect metrology is now widely used in assessing process capability (Cunninggham & MacKinnon 1998; Rao et al. 1996; Barajas et al. 2003). In discrete printed circuit board (PCB) assembly, the boards within each shift are visually inspected to monitor variation in operational conditions. The visual inspections are often performed by automated machines, which use sophisticated optical and image processing techniques to detect the defects that lead to process yield loss. Literature on the semiconductor industry shows that over 60% of end-of-the-line defects can be traced back to the solder paste printing process (Breed 1998; Venkateswaran et al. 1997). Improving printing process performance is expected to reduce rework and lower cost in the downstream stages of PCB assembly by preventing small shifts and twists of components from becoming defects. Moreover, when components have a large number of pins, as in a ball grid array (BGA), it is crucial to reduce the variation between the deposits of electronic components after printing so that all joints will be soldered properly (Dempster et al. 1977). Therefore, inspection systems built into the paste printing process should not only detect defects, but also help operators identify the underlying root causes of poor yield resulting from inappropriate printing operations, and then develop corrective measures to avoid defective boards (Barajas et al. 2001; Litman 2004). A proper understanding of the patterns of variability in the measured solder paste profile is thus required to help operators adjust the influential stencil printing parameters before significant damage has occurred. To accommodate such quality control and yield improvement motivations, this paper proposes an effective method for identifying root causes of solder paste defects by integrating statistical analysis of solder paste measurements with engineering knowledge of the stencil printing process.

Very often, in semiconductor fabrication, the outputs of visual defect inspection constitute a list of binary values. That is, when hundreds of integrated circuits are assembled on a printed circuit board, the inspection machine marks each solder joint as either good or defective. Classical statistical process control (SPC) techniques have been applied to monitor process disturbances by charting the percentage of defects per PCB. If the total number of visual defects exceeds a predefined control limit, the identified offending equipment should be tuned and returned to in-control operational conditions. In PCB assembly, the optimal operational parameters for running the process in control are usually set from operator experience or from a small sample of measurements, because the cost of replicating massive quantities of PCBs for inspection prevents the application of experimental design approaches (Bartholomew & Knott 1999; Gopladrishnan & Srihari 1999). This paper addresses a new diagnostic scheme for identifying the fault pattern present in binary inspection data, which is shown to serve as a tool for extracting clustered patterns from inspected pastes on a PCB and thereby identifying the corresponding root causes for each cluster of defects. Note that this method does not assume any prior knowledge about the nature of stencil printing faults, or any particular distribution of the size, shape, or location of solder joint patterns. In short, this chapter introduces a method for routinely monitoring binary visual inspection data to detect the presence of clustered defects caused by assignable causes in stencil printing. As a key aspect of quality control and diagnosis, this root cause identification involves searching for systematic faults that explain the observed variability by incorporating process knowledge.

The solder paste printing process
A substantial proportion of the defects in PCB assembly occur in solder paste printing. For instance, insufficient solder paste volume may result in solder opens, while excess solder paste volume increases the chance of bridging (O'Hara & Lee 1996). Maximizing the uniformity of the solder paste profile to reduce subsequent assembly defects is therefore expected to improve the overall quality of PCB fabrication. In addition, detecting and reducing defects at an early stage of SMT manufacturing such as stencil printing also lowers the cost of downstream stages (Pan et al. 1999). Proper control of stencil printing has therefore become increasingly important in yield management over the years. As a major step in SMT manufacturing, stencil printing involves depositing an adequate amount of solder paste on each component pad. In practice, various process factors (e.g., printer alignment, squeegee pressure, printing speed, and separation speed) may affect the quality of solder paste printing. Fig. 1 schematically illustrates the stencil printing operation, in which a metallic stencil is first placed over a PCB and solder paste is kneaded on one side of the stencil. As shown in Fig. 1(a) and 1(b), the squeegee is pushed over the stencil under a predefined pressure and moved to the other side of the stencil at a specific speed. This procedure makes the solder paste roll and fill the apertures in the stencil while the squeegee blade removes the excess material, followed by the separation of the stencil from the PCB at a slow snap-off speed.

In stencil printing, operation parameters should be adjusted as controllable variables by process engineers. However, such parameter adjustment relies heavily on ad hoc algorithms or expert knowledge, because a direct evaluation of printing performance from visual inspection data is not readily achievable. The lack of an analytical process monitoring mechanism stems from the difficulty of deriving a direct mathematical function between paste defects and process parameters. Thus, a challenging problem is how to use the binary inspection information to identify the influential process factors (or systematic causes) that affect solder paste quality. When a sample of inspection data becomes available, as discussed below, a logistic regression model characterizes the correlation between binary solder paste defects and the measured physical profile (e.g., solder paste volume, height, and area). The results are then incorporated into a latent variable framework for clustering the systematic causes that explain the variation in solder paste and the consequent binary defects.

Logistic regression model
As a common statistical approach for analyzing binary data, the logistic regression model has been applied in various data mining and machine learning disciplines, such as data classification and predicting the certainty of a binary outcome (Bartholomew & Knott 1999; Jaakkola & Jordan 1997; McCulloch 1997). Under the present problem setting, for each solder paste in PCB assembly, let $y$ denote the binary inspection result, with $y = 1$ for a good paste and $y = -1$ for a failure, and let $x$ be a $d$-dimensional vector representing a set of physical characteristics (called the solder paste profile). Logistic regression analysis usually assumes the following quantitative relationship between $y$ and $x$:

$$P(y \mid x, \beta) = \sigma(y\,\beta^T x), \qquad (1)$$

where $\sigma(z) = 1/(1 + e^{-z})$. For a set of $m$ measurement pairs $\{(x_1, y_1), (x_2, y_2), \ldots, (x_m, y_m)\}$, the log-likelihood of the vector $\beta$ in Equation (1) is

$$L(\beta) = -\sum_{j=1}^{m} \ln\bigl(1 + \exp(-y_j\,\beta^T x_j)\bigr),$$

and the second-order Hessian matrix is

$$H = \frac{\partial^2 L(\beta)}{\partial\beta\,\partial\beta^T} = -\sum_{j=1}^{m} \sigma(\beta^T x_j)\bigl(1 - \sigma(\beta^T x_j)\bigr)\,x_j x_j^T. \qquad (2)$$

For notational simplicity, the Hessian matrix $H$ in Equation (2) is often written in matrix form, i.e., $H = -XAX^T$, where the non-zero elements of the diagonal matrix $A$ are $a_{jj} = \sigma(\beta^T x_j)\bigl(1 - \sigma(\beta^T x_j)\bigr)$ and $x_j$ is the $j$th column of the $d \times m$ sample matrix $X = [x_1, x_2, \ldots, x_m]$. The Newton optimization algorithm is an efficient way to estimate the $d \times 1$ regression coefficient vector by maximizing $L(\beta)$ using the second derivatives (2), which provides the following iterative calculation to estimate $\beta$:

$$\beta^{new} = \beta^{old} + (XAX^T)^{-1} \sum_{j=1}^{m} \bigl(1 - \sigma(y_j\,\beta^T x_j)\bigr)\,y_j x_j. \qquad (3)$$

The maximum-likelihood (ML) estimation of $\beta$ by this iteration is also called the iterative re-weighted least squares (IRLS) algorithm, where the computational complexity of each iteration is $O(md^2)$.
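As an illustration of the Newton (IRLS) iteration described above, the following Python sketch fits $\beta$ for labels coded as +1 and −1. It assumes the standard form of the model, $P(y \mid x) = 1/(1 + \exp(-y\,\beta^T x))$. The function names, the small `ridge` term added to the Hessian for numerical stability, and the synthetic data are assumptions of this example and are not taken from the chapter.

```python
import numpy as np

def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic_newton(X, y, n_iter=25, ridge=1e-8):
    """Newton (IRLS) estimate of beta for labels y in {+1, -1}.

    X is the d x m sample matrix (one column per measurement) and
    y is a length-m vector of +1 / -1 inspection results.
    """
    d, m = X.shape
    beta = np.zeros(d)
    for _ in range(n_iter):
        z = beta @ X                              # beta^T x_j for every j
        s = sigmoid(z)
        a = s * (1.0 - s)                         # diagonal of the weight matrix A
        grad = X @ ((1.0 - sigmoid(y * z)) * y)   # gradient of the log-likelihood
        neg_hess = (X * a) @ X.T + ridge * np.eye(d)  # X A X^T, i.e. -H (plus ridge)
        step = np.linalg.solve(neg_hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < 1e-10:          # stop once the update is negligible
            break
    return beta

# Hypothetical synthetic data: d = 3 features, m = 5000 inspected pastes.
rng = np.random.default_rng(0)
beta_true = np.array([1.0, -2.0, 0.5])
X_demo = rng.normal(size=(3, 5000))
y_demo = np.where(rng.random(5000) < sigmoid(beta_true @ X_demo), 1.0, -1.0)
beta_hat = fit_logistic_newton(X_demo, y_demo)
```

Forming $XAX^T$ costs $O(md^2)$ per iteration, which matches the complexity stated above. If the two classes are perfectly separable, the unpenalized ML estimate does not exist, so a real implementation would probably need a stronger penalty or a step-size safeguard.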
As discussed in previous research, the logistic regression model (1) has been used mostly to understand the role of the input variables x in predicting the binary response variable y. In manufacturing practice, however, many of the measurement variables in x are correlated due to common physical phenomena, which encourages us to seek a parsimonious form of the input variables that summarizes their effects on binary outcomes. In other words, the effects on the measured physical profile can be explained by a reduced set of latent variables without loss of statistical information, as described in the next section. Thus, we refit the regression model (1) with fewer latent variables to interpret their influence on the binary outputs observed in defect inspection. This statistical interpretation, equipped with proper pattern clustering and visualization, is shown to enhance the diagnosis of solder paste quality.

MLPCA based pattern clustering algorithm

Latent variable model and MLPCA
When correlations are present among the measured variables x for a product, this implies the existence of common systematic causes that govern the interrelated behavior. Therefore, multivariate statistical techniques such as PCA have been proposed to investigate the correlations when multiple variables are involved (Crida et al. 1997). A latent variable model is introduced to relate d characteristics of solder pastes to p unknown systematic causes v, by assuming that v affects the solder paste profile x through a linear model, i.e.,

$$x = Cv + w, \qquad (4)$$

where $C = [c_1, c_2, \ldots, c_p]$ is a $d \times p$ constant matrix with full rank, and $v = [v_1, v_2, \ldots, v_p]^T$ is a $p \times 1$ zero-mean random vector with independent components, each scaled without loss of generality to have unit variance.
As assumed in PCA, the latent variables are of smaller dimension (i.e., $p < d$), so that the dependencies among the observed data x can be described by a reduced set of variables v. The noise w denotes the aggregated effects that are not due to any systematic cause. It is assumed to be white noise, i.e., $w \sim N(0, \sigma_w^2 I)$, and independent of v. It is reasonable to assume that each root cause is associated with distinct physical dynamics, so that the latent variables v can be represented by normalized independent Gaussians, that is, $v \sim N(0, I)$. As such, the impacts of v on the measured solder joint profile x are quantified by the magnitude of the corresponding columns of matrix C. Equipped with prior distributions over v and w, model (4) now provides a parsimonious probabilistic description of the multivariate measurement data x (Hamada & Nelder 1997; Tipping & Bishop 1999). Moreover, the probabilistic assumptions enable an ML estimate for C (denoted by $C_{ML}$) that is shown to span the principal subspace of x (Tipping & Bishop 1999).
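For isotropic Gaussian noise w, model (4) yields the conditional probability of x as

$$p(x \mid v) = (2\pi\sigma_w^2)^{-d/2} \exp\left\{-\frac{1}{2\sigma_w^2}\lVert x - Cv\rVert^2\right\}. \qquad (5)$$

The Gaussian assumption on v implies that the marginal density of the data x can be readily obtained by integrating out v, so that $x \sim N(0, \Sigma)$ with covariance $\Sigma = \sigma_w^2 I + CC^T$. For a sample $\{x_j : j = 1, 2, \ldots, m\}$ from models (4) and (5), the log-likelihood is

$$L = -\frac{m}{2}\left\{d\ln(2\pi) + \ln|\Sigma| + \mathrm{tr}(\Sigma^{-1}S)\right\}, \qquad (6)$$

where $S = \frac{1}{m}\sum_{j=1}^{m} x_j x_j^T$ is the sample covariance matrix. The estimate of C that maximizes the log-likelihood (6) is shown to satisfy (Tipping and Bishop 1999):

$$C_{ML} = U_p(\Lambda_p - \sigma_w^2 I)^{1/2} R. \qquad (7)$$

The interpretation of Equation (7) is that the maximum of the log-likelihood is achieved when the column vectors of the $d \times p$ matrix $U_p$ are eigenvectors of S corresponding to the p largest eigenvalues. The eigenvalues $\lambda_i$ are stored in descending order within the diagonal matrix $\Lambda_p = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_p)$. The column vectors in $U_p$ are also called principal eigenvectors because they correspond to the largest eigenvalues, and R is a $p \times p$ orthogonal matrix. Furthermore, the ML estimate of $\sigma_w^2$ is given by

$$\sigma_{w,ML}^2 = \frac{1}{d-p}\sum_{i=p+1}^{d} \lambda_i,$$

in which the noise variance is viewed as the average of the $d-p$ smallest eigenvalues.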
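The closed-form estimate can be illustrated with a short Python sketch. It assumes the solution of Tipping & Bishop (1999), $C_{ML} = U_p(\Lambda_p - \sigma_w^2 I)^{1/2}R$, with the rotation fixed to $R = I$, and it centers the data before forming S. The function name `mlpca_closed_form` and the simulated data are hypothetical choices made for this example only.

```python
import numpy as np

def mlpca_closed_form(X, p):
    """Closed-form ML estimate of C and sigma_w^2 for x = C v + w.

    X is a d x m data matrix (one column per sample); p is the assumed
    number of latent causes. The rotation R is taken as the identity.
    """
    d, m = X.shape
    Xc = X - X.mean(axis=1, keepdims=True)     # enforce the zero-mean assumption
    S = (Xc @ Xc.T) / m                        # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(S)       # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]          # reorder to descending
    lam = eigvals[order]
    U = eigvecs[:, order]
    sigma2 = lam[p:].mean()                    # average of the d - p smallest eigenvalues
    scale = np.sqrt(np.maximum(lam[:p] - sigma2, 0.0))
    C = U[:, :p] * scale                       # U_p (Lambda_p - sigma2 I)^(1/2)
    return C, sigma2

# Hypothetical simulation: d = 5 measured features driven by p = 2 causes.
rng = np.random.default_rng(0)
C_sim = np.array([[2.0, 1.0], [2.0, -1.0], [1.0, 0.0], [0.0, 0.0], [0.0, 0.0]])
X_sim = C_sim @ rng.normal(size=(2, 20000)) + 0.3 * rng.normal(size=(5, 20000))
C_est, sigma2_est = mlpca_closed_form(X_sim, p=2)
```

Because the eigenvectors returned by `numpy.linalg.eigh` are determined only up to sign, the columns of `C_est` can differ in sign from those of `C_sim`, while the product `C_est @ C_est.T` is unaffected.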
The maximum-likelihood estimate of C in Equation (7) can be calculated by an iterative expectation-maximization (EM) algorithm that alternates between the following equations (Booth & Hobert 1999; Dempster et al. 1977):

$$C^{new} = SC\left(\sigma_w^2 I + M^{-1}C^T S C\right)^{-1}, \qquad (8)$$

$$\left(\sigma_w^2\right)^{new} = \frac{1}{d}\,\mathrm{tr}\left(S - SCM^{-1}(C^{new})^T\right), \qquad (9)$$

where $M = \sigma_w^2 I + C^T C$. Thus, the optimal C and noise variance $\sigma_w^2$ are obtained when Equations (8) and (9) converge. Note that the rotation matrix R introduces some ambiguity in the ML estimate of matrix C. In the proposed method, this ambiguity can be resolved by determining the rotation matrix from [...]. As implied in Equation (7), latent variable model (4) effects a mapping from the latent space into the principal subspace of the multivariate data x. In this sense the ML estimate $C_{ML}$ for model (4) is indeed a form of principal component analysis. Therefore, we term the proposed method maximum-likelihood PCA (MLPCA).

One major advantage of the latent variable model and the corresponding MLPCA estimate is that they offer an effective way to link the variability analysis of the solder paste profile and subsequent binary inspections to a candidate set of process faults. Supposing that the multivariate measurements x on solder pastes are correlated due to common unobservable process factors v, this paper provides an analytical tool for diagnosing product quality by relating the variation pattern of physical characteristics to these hypothesized systematic causes. As demonstrated in the later case study, this method is developed on a process-oriented basis: it applies MLPCA to determine the latent space of systematic root causes and then projects the logistic regression coefficients onto this reduced space for pattern clustering and interpretation. The visualization of the clustered variation pattern, combined with appropriate engineering knowledge, helps identify the underlying process faults. In contrast, classical PCA is a data-oriented approach that tries to explain the variance of x by seeking the principal eigenvectors. PCA works well when a single process fault occurs (i.e., p = 1), but cannot produce interpretable results for process diagnosis when p > 1 (Apley & Shi 2001). The limitations of PCA in root cause recognition and fault interpretation thus hamper its diagnostic capabilities in complicated multivariate process control.

The latent variable model (4) also considers the effects of measurement noise on solder paste, which is a non-negligible factor when accurate process modeling and diagnosis are required. The probabilistic formulation enables the introduction of a likelihood measure for obtaining the ML estimate $C_{ML}$. It is worth noting that $C_{ML}$ is built on the assumption that p is known. However, the probabilistic model itself does not provide a mechanism to determine p. For practical implementation, we need to address how to define the dimension of the latent variable v prior to parameter estimation. For p = d−1, the model is equivalent to a full-covariance Gaussian distribution, while p < d−1 implies that the variation in the remaining d−p directions is attributed to the noise variance $\sigma_w^2$. As a possible approach, cross-validation may compare all potential values of p; however, it becomes computationally expensive as d increases. Simulation results over numerous examples with varying p and d suggest the following practical rule to determine p and substitute it into the iterative EM algorithm: [...] (10)
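Separately from the rule for choosing p, which is taken as given here, the EM iteration itself can be illustrated with a short Python sketch. It assumes the update equations of Tipping & Bishop (1999) with $M = \sigma_w^2 I + C^T C$, namely $C^{new} = SC(\sigma_w^2 I + M^{-1}C^TSC)^{-1}$ and $(\sigma_w^2)^{new} = \mathrm{tr}(S - SCM^{-1}(C^{new})^T)/d$. The function name `mlpca_em`, the random initialization, and the stopping tolerance are assumptions of this example.

```python
import numpy as np

def mlpca_em(S, p, n_iter=2000, tol=1e-10, seed=0):
    """EM estimate of C (d x p) and sigma_w^2 from a d x d covariance matrix S.

    p, the number of latent causes, is assumed to be known in advance.
    """
    d = S.shape[0]
    rng = np.random.default_rng(seed)
    C = rng.normal(size=(d, p))                # random starting point
    sigma2 = 1.0
    eye_p = np.eye(p)
    for _ in range(n_iter):
        M = sigma2 * eye_p + C.T @ C
        SC = S @ C
        # Update of C: S C (sigma2 I + M^-1 C^T S C)^-1
        C_new = SC @ np.linalg.inv(sigma2 * eye_p + np.linalg.solve(M, C.T @ SC))
        # Update of the noise variance: tr(S - S C M^-1 C_new^T) / d
        sigma2_new = np.trace(S - SC @ np.linalg.solve(M, C_new.T)) / d
        converged = (np.max(np.abs(C_new - C)) < tol
                     and abs(sigma2_new - sigma2) < tol)
        C, sigma2 = C_new, sigma2_new
        if converged:
            break
    return C, sigma2

# Hypothetical check on an exact model covariance (d = 5, p = 2, sigma_w^2 = 0.5).
C_ref = np.array([[2.0, 0.0], [0.0, 1.5], [1.0, 1.0], [0.5, -1.0], [-1.0, 0.5]])
S_ref = C_ref @ C_ref.T + 0.5 * np.eye(5)
C_em, sigma2_em = mlpca_em(S_ref, p=2)
```

The returned `C_em` generally differs from `C_ref` by a rotation, which reflects the ambiguity in R discussed above, so only products such as `C_em @ C_em.T` are directly comparable. In this sketch the iteration can need many passes when the noise variance is very small relative to the leading eigenvalues.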

The regression coefficient clustering algorithm
The dramatic advances in in-process sensor and data collection technologies enable vast quantities of physical features of the manufacturing system to be measured. For instance, in PCB assembly, laser-optical measurement machines are commonly installed to record detailed dimensional characteristics of wet solder paste after it is deposited onto the board in stencil printing. When electronic components are positioned and the solder is cured in the reflow oven, dimensional characteristics are obtained via X-ray laminography (Crida et al. 1997; Litman 2004; Neubauer 1997). As in any quality control application, one fundamental objective considered in this paper is to explain as precisely as possible the nature of the variation in solder paste and to identify the root causes of binary defects by using the earlier measured physical information.
Although the aforementioned logistic regression method can estimate coefficients for each measurement variable, the high dimensionality of the solder profile makes it inefficient for engineers to explore how the underlying process factors cause defective outputs. On the other hand, as shown in Fig. 2, the latent variable model helps recognize the patterns of solder paste variation and thereby identify the corresponding systematic causes during the stencil printing operation. By integrating the logistic regression method with latent variable model (4), the proposed methodology quantifies the effects of process faults v on the solder profile x and defects y, and it operates entirely on the collected sample data with no a priori knowledge about the patterns of variation. Therefore, a core component of this approach is the proper clustering of regression coefficients with respect to the variables v, which provides more intuitive insight into the interdependencies among multiple measurement variables.

Following the assumptions on model (4), let $x = [x_1, x_2, \ldots, x_d]^T$ represent the measurable characteristics of a solder paste, and let $\{x_i;\ i = 1, 2, \ldots, n\}$ be a set of n solder pastes on the board. Fig. 2 implies that p independent causes $v_j$ apply their joint effects on the variation of the physical profile $x_i$ through a constant matrix $C_i$, and produce the consequent binary outputs $y_i$ through the logistic regression coefficient $\beta_i$. In particular, the effect of cause $v_j$ is represented by the $j$th column vector $c_{i,j}$ of $C_i$. Since each $v_j$ is scaled to have unit variance, $c_{i,j}$ indicates the magnitude or severity of the variation caused by $v_j$. After clustering a sample of solder pastes based on the distribution of their regression coefficients in terms of $v_j$, quality diagnosis of stencil printing becomes possible by assigning a process fault to the solder pastes within the same group.

Prior to the clustering analysis over regression coefficients, Equation (4) is substituted into the logistic regression model, yielding new coefficients $\beta_{v,i} = C_i^T \beta_i$ with respect to the latent variable v. Now the binary data $y_i$ can be explained by the systematic causes $v_j$, which takes the form of a logit function, that is,

$$\ln\frac{P(y_i = 1)}{1 - P(y_i = 1)} = \sum_{j=1}^{p} \beta_{v,i,j}\,v_j + \varepsilon_w,$$

where $\varepsilon_w$ denotes the transformed noise effect. The new coefficient $\beta_{v,i,j}$ corresponds to the change in the log odds per unit change in $v_j$ when $v_j$ does not interact with other sources (this is reasonable given the latent variable model assumptions). Equivalently, the effect of increasing $v_j$ by 1 is to multiply the odds that $y_i = 1$ by a factor $\exp(\beta_{v,i,j})$. Since $\beta_{v,i}$ depends on the systematic causes v, the regression coefficients can be classified so that each cluster describes a similar pattern of solder paste variation. In other words, the proposed clustering method is used to separate the impacts of the causes v. Once all inspected solder pastes on a PCB are clustered in terms of $\beta_{v,i}$, process diagnosis for variation reduction can be performed, since each cluster is mapped to a specific process fault or assignable cause.
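The substitution of model (4) into the logistic model can be illustrated numerically. The sketch below assumes that the projected coefficients are $\beta_{v} = C^T\beta$, which follows from $\beta^T(Cv + w) = (C^T\beta)^Tv + \beta^Tw$. All numeric values and the function name `project_coefficients` are hypothetical and chosen only for illustration.

```python
import numpy as np

def project_coefficients(beta, C):
    """Map coefficients on the d measured features to the p latent causes."""
    return C.T @ beta

# Hypothetical values for one solder paste: d = 3 features, p = 2 causes.
beta = np.array([0.8, -0.4, 0.3])
C = np.array([[1.0, 0.0],
              [0.5, 0.2],
              [0.0, 1.0]])

beta_v = project_coefficients(beta, C)

# Multiplicative change in the odds of a good paste (y = 1) when the
# corresponding cause v_j increases by one unit.
odds_factors = np.exp(beta_v)

# The log odds computed from the noise-free profile C v equals beta_v^T v.
v = np.array([0.7, -1.3])
log_odds_from_x = beta @ (C @ v)
log_odds_from_v = beta_v @ v
```

Here `log_odds_from_x` and `log_odds_from_v` agree up to floating-point error, which is the identity that allows the clustering to operate on the p-dimensional coefficients instead of the d-dimensional ones.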

MLPCA based clustering algorithm for quality diagnosing
As a statistical tool for diagnosing the quality of solder pastes, the proposed MLPCA based regression coefficient clustering algorithm is summarized as follows:
Step 1. Apply logistic regression model (1) to the binary inspection data collected from m PCBs, yielding the estimates of coefficients $\beta_i$ for the $i$th solder paste from the sample $\{y_i^j : i = 1, 2, \ldots, n;\ j = 1, 2, \ldots, m\}$.
Step 2. Given the set of measured solder paste profiles $x_i^j$, determine the dimension p via rule (10) and estimate the matrix $C_i$ in model (4) by the MLPCA method.
Step 3. Calculate the new regression coefficients $\beta_{v,i} = C_i^T \beta_i$, followed by a k-means clustering algorithm (Hastie et al. 2001) over $\beta_{v,i}$ to recognize the coefficient clusters.
Step 4. Present the geometrical clustering results on the board to process operators, who identify the process faults using their engineering knowledge. The diagnostic results on solder paste quality then lead to appropriate stencil printing operation adjustments.

By taking advantage of the diagnostic information from the latent variable model and the logistic regression coefficients, the MLPCA based clustering algorithm provides a visual way to relate stencil printing process problems to the variation in solder paste profile and the consequent binary defects. The case study in the following section shows that the proposed method is favorable for improving process quality by developing a more interpretable relationship between variation patterns and physical faults.
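Steps 1 to 3 above can be sketched end to end in Python. This is a minimal illustration under several assumptions that do not come from the chapter: p is supplied by the caller instead of being determined by rule (10); the rotation R is fixed to the identity and the clustering uses the magnitudes $|\beta_{v,i,j}|$ to sidestep the sign ambiguity of eigenvectors; a small ridge penalty stabilizes the logistic fit; and k-means uses a simple farthest-point initialization. All function names are hypothetical.

```python
import numpy as np

def fit_logistic(X, y, n_iter=50, ridge=1e-6):
    """Step 1: Newton/IRLS fit for labels in {+1, -1}; X is d x m."""
    d = X.shape[0]
    beta = np.zeros(d)
    for _ in range(n_iter):
        z = beta @ X
        s = 1.0 / (1.0 + np.exp(-z))
        grad = X @ (y * (1.0 - 1.0 / (1.0 + np.exp(-y * z)))) - ridge * beta
        neg_hess = (X * (s * (1.0 - s))) @ X.T + ridge * np.eye(d)
        beta = beta + np.linalg.solve(neg_hess, grad)
    return beta

def mlpca(X, p):
    """Step 2: closed-form ML estimate of C with R = I (p assumed given)."""
    Xc = X - X.mean(axis=1, keepdims=True)
    lam, U = np.linalg.eigh((Xc @ Xc.T) / X.shape[1])
    lam, U = lam[::-1], U[:, ::-1]             # descending eigenvalue order
    sigma2 = lam[p:].mean()
    return U[:, :p] * np.sqrt(np.maximum(lam[:p] - sigma2, 0.0)), sigma2

def kmeans(points, k, n_iter=100):
    """Step 3: plain Lloyd iterations with farthest-point initialization."""
    centers = [points[0]]
    for _ in range(1, k):
        dist = np.min([np.sum((points - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centers = np.array([points[labels == j].mean(axis=0)
                                if np.any(labels == j) else centers[j]
                                for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

def diagnose(X_list, y_list, p, k):
    """Cluster the n pastes by their projected coefficient magnitudes."""
    coeffs = []
    for X, y in zip(X_list, y_list):
        beta = fit_logistic(X, y)              # Step 1
        C, _ = mlpca(X, p)                     # Step 2
        coeffs.append(np.abs(C.T @ beta))      # Step 3: projection (abs is an assumption)
    coeffs = np.array(coeffs)
    labels, _ = kmeans(coeffs, k)
    return coeffs, labels
```

`X_list[i]` holds the d × m profile matrix of paste i across the m boards and `y_list[i]` the matching ±1 inspection results. The returned labels can then be plotted at each paste's X-Y position for Step 4, which remains a matter of engineering judgment.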

Application in PCB assembly
In the stencil printing process, each solder paste is deposited on the board automatically by printing machines, then registered with the screen and printed. Stencil printing is an established technology; however, some uncontrolled factors influence the quality of solder pastes (Lathrop 1997; Liu et al. 2001) and hence cause component failures in PCB assembly. In order to produce pastes with minimal variation in physical profile, the controllable parameters of the printing operation should be monitored and adjusted through appropriate diagnosis of solder paste quality. In the present study, solder paste printability was characterized by a physical profile collected from laser triangulation and X-ray based measurement machines. The purpose of the present experimental research is to identify the systematic factors in the solder deposition process by quantifying their impacts on paste quality. The set of process factors includes the printer steel squeegee angle, printing direction, and squeegee speed. The variation in the measured solder paste profile that leads to binary inspection results stems from improper parameter settings of stencil printing, called systematic factors. Their effects on solder paste (such as solder paste volume, area, and height) are present in the multivariate profile x. Due to the common factors v, the variables in x are always highly correlated, as shown in the scatter plots in Fig. 3. For a specific solder paste, the plots were drawn over pairs of distinct physical solder paste features from a sample of m = 30 inspected boards. In semiconductor fabrication, the solder paste profile often includes paste thickness, paste volume, shape of heel fillet, shape of toe or center fillet, alignment between pad and lead, pad average width, pad average height, and pad volume. For purposes of illustration, we chose ten physical features as the element variables of vector x, that is, d = 10.

Next, the coefficient clustering algorithm proposed in Section 4 was applied to map the pattern of solder paste defects on the PCB to the latent systematic causes, under the assumption that the variation of the solder paste profile was not completely random, i.e., not due to measurement noise alone. To accommodate pattern clustering and visualization, the present experimental study was undertaken over a region of the PCB that consists of more than 3000 solder joints (n = 3012), as shown in Fig. 4. Given the sample of binary inspection results $y_i$ and corresponding physical profiles $x_i$ (i = 1, 2, …, n), we first calculated the estimates of the coefficients $\beta_i$ by logistic regression model (1). MLPCA was then applied to estimate the variation pattern matrix $C_i$, in which the dimension of the systematic cause $v_i$ was always determined as two by rule (10) (i.e., p = 2 for all i). As indicated in the algorithm summary, after projecting the original $\beta_i$ onto the latent space spanned by $C_i$, the new coefficients $\beta_{v,i}$ became available for the k-means clustering algorithm (Hastie et al. 2001), which classified them into two clusters.
The graphical illustration of the clustered $\beta_{v,i}$ in Fig. 4 also validates the presence of two clusters as identified by the standard k-means algorithm. Since each solder paste is positioned on the PCB by unique X-Y coordinates, we can visualize the clustered coefficients on a printed circuit board such that the pastes in each cluster are denoted by the same symbol (e.g., "+" for cluster 1 and "×" for cluster 2). MLPCA implied that there were two systematic causes governing the variation in the measured solder paste profile. The pastes denoted by "+" in Fig. 4, for example, were dominantly affected by the first systematic cause $v_1$ and lie almost along the horizontal direction of the board, while the pastes denoted by "×" were distributed along the vertical direction and influenced mainly by the second cause $v_2$. The graphical demonstration of the clustered coefficients in Fig. 4 thus helped process engineers apply their expert knowledge and experience in diagnosing the solder paste defects. For instance, the solder pastes denoted by "+" in cluster 1 had relatively large coefficients $\beta_{v,i,1}$ and were mostly distributed along the length of the PCB. That is, the systematic cause corresponding to this cluster should influence the solder pastes along the horizontal direction to a greater extent during stencil printing. Intensive discussions with process engineers suggested a reasonable explanation: the causes are inappropriate parameter settings for the stencil printing speed and printing pressure. These process factors are expected to generate large variation in the solder paste profile along the horizontal and vertical directions, respectively, and correspondingly more inspected defects.
Further investigation of other potential process faults implied that the systematic causes identified above are the most likely to produce consistent results. The diagnostic results also agreed with what the stencil printing diagram in Fig. 1 suggests, where printing speed usually influences the solder pastes along the length of the PCB, while printing pressure has a greater impact on solder paste quality than other process factors (e.g., separation distance, printer alignment) along the width of the PCB. In addition, detailed inspections revealed that a substantial portion of the quality deficiencies (such as slumping, bridging, and bleeding of paste underneath the stencil) along the width of the PCB was caused by abnormally high printing pressure during stencil printing.
The case study shows that the systematic pattern of PCB assembly defects is often due to specific process faults such as inappropriate process operations, and is not completely random as it would be if caused by environmental or measurement noise alone. The proposed coefficient clustering algorithm provides an effective process-oriented diagnosis tool for identifying such production irregularities. By assuming that the potential systematic causes are mapped to the clusters of solder pastes with similar coefficients, the variation in solder paste profile and the corresponding binary defects can be traced to improper parametric control that deviates from optimal conditions, which suggests informative corrections for adjusting the stencil printing to improve process quality.

Conclusion
The distillation of massive quantities of solder paste inspection data into relevant quality information allows rapid understanding of low production yield in PCB assembly. The statistical diagnosis method proposed in this paper provides more meaningful insights into the defect mechanisms than traditional yield analysis methods: by integrating MLPCA and the logistic regression model, it can identify the assignable causes of defects and their effects on yield. This offers a systematic representation of the impacts of process condition changes on the variation of the solder paste profile. The probabilistic latent variable model allows ML estimation to determine the latent space by iteratively maximizing the likelihood function. In contrast to standard PCA, this approach is also efficient for multivariate process analysis when some sample data are missing. The clustering algorithm over the regression coefficients projected onto the latent space is relatively easy to implement with affordable computational effort. The experimental study demonstrates that the statistical interpretation of solder defect distributions can be enhanced by intuitive pattern visualization for process fault identification and variation reduction.

Fig. 2. Illustration of latent variable model that explains the relationship between systematic cause v, solder paste profile x, and final defect inspection output y.