Random forest estimation of genomic breeding values for disease susceptibility over different disease incidences and genomic architectures in simulated cow calibration groups
Naderi Darbaghshahi, Saeid; Yin, T.; König, S.
2016 • In Journal of Dairy Science, 99 (9), p. 7261-7273
disease trait; random forest methodology; accuracy of genomic prediction
Abstract :
A simulation study was conducted to investigate the performance of random forest (RF) and genomic BLUP (GBLUP) for genomic predictions of binary disease traits based on cow calibration groups. Training and testing sets were modified in different scenarios according to disease incidence, the quantitative-genetic background of the trait (h² = 0.30 and h² = 0.10), and the genomic architecture [725 quantitative trait loci (QTL) and 290 QTL, populations with high and low levels of linkage disequilibrium (LD)]. For all scenarios, 10,005 SNP (depicting a low-density 10K SNP chip) and 50,025 SNP (depicting a 50K SNP chip) were evenly spaced along 29 chromosomes. Together, the training and testing sets included 20,000 cows (4,000 sick, 16,000 healthy, disease incidence 20%) from the last 2 generations. Initially, the 4,000 sick cows were assigned to the testing set, and the remaining 16,000 healthy cows represented the training set. In the subsequent allocation schemes, the number of sick cows in the training set increased stepwise by moving 10% of the sick animals from the testing set to the training set, and vice versa. The sizes of the training and testing sets were kept constant. Evaluation criteria for both GBLUP and RF were the correlation between genomic breeding values and true breeding values (prediction accuracy) and the area under the receiver operating characteristic curve (AUROC). Prediction accuracy and AUROC increased for both methods and all scenarios as increasing percentages of sick cows were allocated to the training set. The highest prediction accuracies were observed for disease incidences in training sets that reflected the population disease incidence of 0.20. For this allocation scheme, the largest prediction accuracies of 0.53 for RF and 0.51 for GBLUP, and the largest AUROC of 0.66 for RF and 0.64 for GBLUP, were achieved using 50,025 SNP, a heritability of 0.30, and 725 QTL.
A decrease in heritability from 0.30 to 0.10 and a reduction in the number of QTL from 725 to 290 were associated with lower prediction accuracy and lower AUROC in all scenarios. This decrease was more pronounced for RF. The increase in LD also had a stronger effect on RF results than on GBLUP results. The highest prediction accuracy in the low LD scenario was 0.30 for RF and 0.36 for GBLUP, and it increased to 0.39 for both methods in the high LD population. Random forest successfully identified important SNP in close map distance to QTL explaining a high proportion of the phenotypic trait variation.
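The stepwise allocation scheme described in the abstract can be illustrated with a small worked example. This is only a sketch under one stated assumption: "vice versa" is read as moving the same number of healthy cows from the training set to the testing set at each step, which is what keeps both set sizes constant. The names `allocation`, `training_incidence`, and `STEP_PERCENT` are hypothetical and do not come from the paper.

```python
# Sketch of the stepwise allocation scheme (an interpretation of the abstract,
# not the authors' code). Assumption: each sick cow moved into the training set
# is replaced by one healthy cow moved into the testing set.

N_SICK = 4_000        # sick cows in the simulated population
N_HEALTHY = 16_000    # healthy cows in the simulated population
STEP_PERCENT = 10     # percentage of sick cows moved per step


def allocation(step: int) -> dict:
    """Return the composition of both sets after `step` moves (0 to 10)."""
    if not 0 <= step <= 100 // STEP_PERCENT:
        raise ValueError("step must lie between 0 and 10")
    moved = step * STEP_PERCENT * N_SICK // 100  # cows swapped in each direction
    return {
        "train_sick": moved,
        "train_healthy": N_HEALTHY - moved,
        "test_sick": N_SICK - moved,
        "test_healthy": moved,
    }


def training_incidence(step: int) -> float:
    """Disease incidence in the training set after `step` moves."""
    a = allocation(step)
    return a["train_sick"] / (a["train_sick"] + a["train_healthy"])


if __name__ == "__main__":
    for step in range(11):
        a = allocation(step)
        print(step, a, f"training incidence = {training_incidence(step):.3f}")
```

Under this reading, the training set always holds 16,000 cows and the testing set 4,000, and the training-set incidence rises in increments of 0.025 per step. It reaches the population incidence of 0.20 when 80% of the sick cows (3,200) are in the training set.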
Disciplines :
Animal production & animal husbandry
Author, co-author :
Naderi Darbaghshahi, Saeid ; Université de Liège - ULiège > Agronomie, Bio-ingénierie et Chimie (AgroBioChem) > Ingénierie des productions animales et nutrition
Yin, T.
König, S.
Language :
English
Publication date :
September 2016
Journal title :
Journal of Dairy Science
ISSN :
0022-0302
eISSN :
1525-3198
Publisher :
American Dairy Science Association, Champaign, United States - Illinois