Abstract
Polychlorinated biphenyls (PCBs) are persistent pollutants that strongly affect marine ecosystems. Machine learning techniques were used to build quantitative structure-activity relationship (QSAR) models that predict the bioconcentration factor (BCF) of PCBs. The models were built from topographic 2D and 3D descriptors calculated for molecular structures optimized at the molecular mechanics level of theory. Analysis of their statistical parameters showed that two models are robust enough to predict logBCF: M_4_LR, built with two molecular descriptors (r² = 0.9154, Q²LOO = 0.8944, and Q²ext = 0.9119), and M_13, built with four molecular descriptors (r² = 0.9375, Q²LOO = 0.9155, and Q²ext = 0.844). Both models passed the double validation phase and satisfied the criteria of Tropsha's test, which indicates that their logBCF predictions are accurate, as the results of the present study show.
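The three statistics quoted above measure different things. r² is the fit on the training set. Q²LOO is the leave-one-out cross-validated fit, computed as 1 − PRESS/SS_tot, where each compound is predicted by a model refitted without it. Q²ext is the predictive fit on an external test set. The sketch below computes all three. It makes several assumptions. Plain ordinary least squares with an intercept stands in for the fitted model, because the abstract does not say how M_4_LR or M_13 were fitted. Q²ext uses the common Q²F1 definition, which scales test residuals by deviations from the training-set mean. The helper names (`fit_ols`, `q2_loo`, `q2_ext`) and the two-descriptor data are hypothetical and synthetic, not taken from the study.

```python
# Sketch only: OLS stands in for the QSAR model; data are synthetic.


def fit_ols(X, y):
    """Fit y = b0 + b1*x1 + ... by solving the normal equations (X'X) b = X'y."""
    rows = [[1.0] + list(x) for x in X]  # prepend an intercept column
    p = len(rows[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        piv = max(range(col, p), key=lambda k: abs(A[k][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(p):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * c for a, c in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


def predict(coef, x):
    """Linear prediction: intercept plus weighted descriptors."""
    return coef[0] + sum(c * v for c, v in zip(coef[1:], x))


def r2(X, y, coef):
    """Coefficient of determination on the training set."""
    mean = sum(y) / len(y)
    ss_res = sum((t - predict(coef, x)) ** 2 for x, t in zip(X, y))
    ss_tot = sum((t - mean) ** 2 for t in y)
    return 1 - ss_res / ss_tot


def q2_loo(X, y):
    """Leave-one-out Q² = 1 - PRESS / SS_tot, refitting without each compound."""
    mean = sum(y) / len(y)
    press = 0.0
    for i in range(len(y)):
        coef = fit_ols(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - predict(coef, X[i])) ** 2
    return 1 - press / sum((t - mean) ** 2 for t in y)


def q2_ext(coef, X_test, y_test, y_train):
    """External Q² (Q²F1): test residuals over deviations from the training mean."""
    mean_train = sum(y_train) / len(y_train)
    ss_res = sum((t - predict(coef, x)) ** 2 for x, t in zip(X_test, y_test))
    ss_tot = sum((t - mean_train) ** 2 for t in y_test)
    return 1 - ss_res / ss_tot


# Hypothetical data: logBCF ~ 1 + 0.8*d1 + 0.5*d2 plus small fixed noise.
X_train = [[1.0, 0.2], [2.0, 0.1], [3.0, 0.4], [4.0, 0.3], [5.0, 0.6], [6.0, 0.5]]
y_train = [1.95, 2.62, 3.62, 4.31, 5.31, 6.08]
X_test = [[1.5, 0.3], [3.5, 0.2], [5.5, 0.4]]
y_test = [2.37, 3.87, 5.61]

coef = fit_ols(X_train, y_train)
print(f"r2     = {r2(X_train, y_train, coef):.4f}")
print(f"Q2_LOO = {q2_loo(X_train, y_train):.4f}")
print(f"Q2_ext = {q2_ext(coef, X_test, y_test, y_train):.4f}")
```

For ordinary least squares, each leave-one-out residual is at least as large as the corresponding training residual. Q²LOO therefore never exceeds r², which makes a quick sanity check on any reported pair. The values in the abstract follow this pattern.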

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
Copyright (c) 2021 José Ramón Mora, Martín Moreno
