This work is joint research with Daeun Jeong, Heungkook Ko, Sewon Park, and Jaeyong Lee, and was supported by Samsung Electronics from 2021 to 2023.
Data generated by the semiconductor manufacturing process have non-normal distributions, random missing patterns, and high missing rates, all of which complicate yield prediction. We propose the Dirichlet Process - Naive Bayes model (DPNB), which imputes missing values and performs classification simultaneously. In this project, I implemented the neural-network-based models and was a co-author of the paper. Specifically, I implemented GAIN (Yoon, J., Jordon, J., & van der Schaar, M., 2018) and MIDA (Gondara, L., & Wang, K., 2018) in PyTorch. Through this task, I improved my skills in using GPU cluster servers. The results of this project were published in IEEE Transactions on Semiconductor Manufacturing in 2023.
Publication
S. Park, K. Lee, D. Jeong, H. Ko, and J. Lee. (2023). Bayesian nonparametric classification for incomplete data with a high missing rate: an application to semiconductor manufacturing data. IEEE Transactions on Semiconductor Manufacturing, 36(2), 170-179.