Methodological Assessment of Data Suitability for Defect Prediction

Peter Schlegel, Daniel Buschmann, Max Ellerich, Robert H. Schmitt


Purpose: This paper provides a domain-specific concept for assessing the suitability of data from various sources along the production chain for defect prediction.

Methodology/Approach: A seven-phase methodology is developed to assess data suitability for defect prediction in interlinked production steps. For this purpose, the manufacturing process is mapped and variables that potentially influence the origin of defects are identified. The available data is then evaluated and quantified against five criteria: relevancy, completeness, appropriate amount of data, accessibility and interpretability. Finally, the individual assessments are visualized in an overview, gaps in data acquisition are identified and needs for action are derived.
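
As an illustration only, the evaluation, overview and gap-identification steps described above might be sketched as follows. The abstract does not specify a scoring scheme, so the 0 to 1 scale, the threshold of 0.5, the names `Assessment`, `find_gaps` and `overview`, and the example production steps and variables are all hypothetical assumptions rather than the authors' method.

```python
from dataclasses import dataclass

# The five criteria named in the abstract.
CRITERIA = ("relevancy", "completeness", "amount_of_data",
            "accessibility", "interpretability")


@dataclass(frozen=True)
class Assessment:
    """Scores for one influencing variable at one production step.

    The 0..1 scale is an assumption made for this sketch.
    """
    step: str
    variable: str
    scores: dict  # criterion name -> score in [0, 1]


def find_gaps(assessments, threshold=0.5):
    """Return (step, variable, criterion) triples scoring below the threshold."""
    gaps = []
    for a in assessments:
        for criterion in CRITERIA:
            # A criterion that was never scored counts as 0.0, i.e. as a gap.
            if a.scores.get(criterion, 0.0) < threshold:
                gaps.append((a.step, a.variable, criterion))
    return gaps


def overview(assessments):
    """Render a plain-text matrix: one row per variable, one column per criterion."""
    header = f"{'step/variable':<28}" + "".join(f"{c[:6]:>8}" for c in CRITERIA)
    rows = [header]
    for a in assessments:
        label = f"{a.step}/{a.variable}"
        cells = "".join(f"{a.scores.get(c, 0.0):>8.2f}" for c in CRITERIA)
        rows.append(f"{label:<28}{cells}")
    return "\n".join(rows)


# Illustrative data only; not taken from the paper.
example = [
    Assessment("casting", "melt_temperature",
               {"relevancy": 0.9, "completeness": 0.8, "amount_of_data": 0.7,
                "accessibility": 0.6, "interpretability": 0.9}),
    Assessment("machining", "tool_wear",
               {"relevancy": 0.8, "completeness": 0.3, "amount_of_data": 0.4,
                "accessibility": 0.7, "interpretability": 0.6}),
]

if __name__ == "__main__":
    print(overview(example))
    for gap in find_gaps(example):
        print("gap:", gap)
```

With the illustrative data, `find_gaps` flags completeness and amount of data for the hypothetical tool-wear variable. Each flagged triple would correspond to a candidate need for action in data acquisition.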

Findings: The research yields a seven-phase methodology for systematically assessing data suitability for defect prediction and identifying data gaps in interlinked production steps.

Research Limitation/implication: This research is limited to the analysis of contextual data quality for the use case of defect prediction. Other data analytics applications or processes outside of manufacturing are not included.

Originality/Value of paper: The paper provides a new approach to identifying gaps in data acquisition by systematically assessing data suitability for defect prediction and deriving needs for action. Subsequent optimization of the data basis is then intended to improve the accuracy of predictive defect models.


Keywords: predictive quality; defect prediction; failure prediction; data suitability; data quality








Copyright (c) 2020 Peter Schlegel, Daniel Buschmann, Max Ellerich, Robert H. Schmitt

ISSN 1335-1745 (print)
ISSN 1338-984X (online)
Covered, abstracted, indexed in:
Clarivate Analytics Emerging Sources Citation Index; Scopus; Google Scholar; IDEAS; EconPapers; RePEc; Cabells' Directories