Report: Optimizing the use of GBLUP for recycling in the IITA-NextGen-Cassava program

December 18th, 2020

The GS Baseline

GBLUP as a surrogate of genetic value

GBLUP can be fitted under several assumptions:

Additive effects only

Genomic estimated breeding value (GEBV): assumes that only additive effects contribute to the traits, since 'non-additive effects cannot be passed to progeny'. Only the additive marker effects are used to estimate the breeding value. Selection Unit = Individual

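            \(\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\mathbf{a} + \mathbf{e}\)

where \(\mathbf{y}\) is the vector of phenotypes, \(\boldsymbol{\beta}\) the fixed effects with design matrix \(\mathbf{X}\), \(\mathbf{a} \sim N(\mathbf{0}, \mathbf{G}\sigma^2_a)\) the additive genomic values with incidence matrix \(\mathbf{Z}\) and genomic relationship matrix \(\mathbf{G}\), and \(\mathbf{e}\) the residuals.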
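A minimal NumPy sketch of fitting the additive model above, under stated assumptions: one record per individual (so \(\mathbf{Z}\) is the identity), the overall mean as the only fixed effect, a VanRaden-style \(\mathbf{G}\) built from 0/1/2 allele counts, and a known variance ratio \(\lambda = \sigma^2_e/\sigma^2_a\) in place of REML estimates. The function names and simulated data are hypothetical; this is not the program's pipeline.

```python
import numpy as np

def vanraden_g(M):
    """Additive genomic relationship matrix from an n x m matrix of 0/1/2 counts."""
    p = M.mean(axis=0) / 2.0                      # allele frequency per marker
    Zc = M - 2.0 * p                              # center each marker column
    return Zc @ Zc.T / (2.0 * np.sum(p * (1.0 - p)))

def gblup(y, G, lam):
    """BLUP for y = 1*mu + a + e, a ~ N(0, G s2a), e ~ N(0, I s2e).

    lam = s2e / s2a is assumed known. Returns (mu_hat, a_hat).
    """
    n = len(y)
    ones = np.ones(n)
    V = G + lam * np.eye(n)                       # Var(y) up to the factor s2a
    mu = (ones @ np.linalg.solve(V, y)) / (ones @ np.linalg.solve(V, ones))  # GLS mean
    a = G @ np.linalg.solve(V, y - mu)            # BLUP: G V^-1 (y - 1 mu)
    return mu, a

# Hypothetical simulated data: 200 clones, 500 SNPs, heritability below 1.
rng = np.random.default_rng(1)
M = rng.binomial(2, 0.5, size=(200, 500)).astype(float)
true_a = (M - M.mean(axis=0)) @ rng.normal(0.0, 0.05, 500)
y = 10.0 + true_a + rng.normal(0.0, 1.0, 200)

G = vanraden_g(M)
mu_hat, a_hat = gblup(y, G, lam=1.0)
print(round(mu_hat, 2), round(np.corrcoef(a_hat, true_a)[0, 1], 2))
```

A larger \(\lambda\) (lower heritability) shrinks the predicted values more strongly toward zero.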

Limitation: non-additive effects can be confounded with additive effects during prediction, leading to overestimation of genetic parameters in downstream applications.

It has been shown that non-additive effects contribute substantially to many traits, and that modeling dominance effects improves prediction accuracy and increases genetic gain in outbred programs.


Additive + Dominance effects

Important for:

Increasing prediction accuracy and response to selection
Pairing selected parents for crossing
Maintaining dominance variation in the breeding population

Dominance can be modeled under different assumptions:

Additive + symmetrical dominance

Assumes symmetrical distributions of the posterior estimates of additive and dominance effects (no net directional dominance) when predicting genomic estimated breeding values.

Genomic estimated breeding values under dominance (GEBVd)

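      \(\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\mathbf{a} + \mathbf{W}\mathbf{d} + \mathbf{e}\)

where \(\mathbf{d}\) is the vector of dominance effects and \(\mathbf{W}\) its incidence matrix.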
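For the dominance term \(\mathbf{W}\mathbf{d}\), a dominance relationship matrix \(\mathbf{D}\) can be built from heterozygosity indicators. The sketch below assumes one widely used centered coding (heterozygotes \(1-2pq\), homozygotes \(-2pq\)), which has mean zero and variance \(2pq(1-2pq)\) per marker under Hardy-Weinberg equilibrium; the software used by the program may use a different parametrization, and the data are hypothetical.

```python
import numpy as np

def dominance_d(M):
    """Dominance relationship matrix from an n x m matrix of 0/1/2 allele counts."""
    p = M.mean(axis=0) / 2.0
    pq2 = 2.0 * p * (1.0 - p)                     # expected heterozygosity per marker
    H = (M == 1).astype(float) - pq2              # 1 - 2pq if heterozygous, -2pq otherwise
    return H @ H.T / np.sum(pq2 * (1.0 - pq2))    # scale so the average diagonal is about 1

# Hypothetical genotypes: 300 clones, 1000 SNPs.
rng = np.random.default_rng(2)
M = rng.binomial(2, 0.5, size=(300, 1000)).astype(float)
D = dominance_d(M)
print(D.shape, round(np.mean(np.diag(D)), 2))
```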

However, in outbred crops we are interested in, and affected by, heterosis and inbreeding depression, both of which are conditioned by directional dominance (a higher proportion of positive than negative dominance effects). We therefore need to select specifically for this directional dominance.
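The link between directional dominance and inbreeding depression follows from the classical expectation: with genotypic values \(+a\), \(d\), \(-a\) and inbreeding coefficient \(F\), the population mean is \(\sum [a(p-q) + 2pq(1-F)d]\), so inbreeding depression equals \(2F\sum pqd\), which is positive when the dominance effects are predominantly positive. The simulation below is hypothetical: it assumes \(d = \delta|a|\), with the dominance degree \(\delta\) centered on 0.3 (loosely mirroring meanDD = 0.3 in the treatments) or on 0 for the symmetrical case.

```python
import numpy as np

def population_mean(a, d, p, F=0.0):
    """Expected genotypic mean summed over loci (values +a, d, -a; inbreeding F)."""
    q = 1.0 - p
    return np.sum(a * (p - q) + 2.0 * p * q * (1.0 - F) * d)

rng = np.random.default_rng(7)
n_loci = 100
p = rng.uniform(0.1, 0.9, n_loci)                 # allele frequencies
a = rng.normal(0.0, 1.0, n_loci)                  # additive effects
abs_a = np.abs(a)
dominance = {
    "symmetrical": rng.normal(0.0, 0.2, n_loci) * abs_a,  # no net direction
    "directional": rng.normal(0.3, 0.2, n_loci) * abs_a,  # mostly positive
}

F = 0.25                                          # e.g. offspring of a full-sib mating
depression = {}
for label, d in dominance.items():
    depression[label] = population_mean(a, d, p) - population_mean(a, d, p, F)
    print(label, round(depression[label], 2))
```

Only the directional case shows a consistent loss of mean performance when heterozygosity drops; under symmetrical dominance positive and negative effects largely cancel.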


Additive + directional dominance

Additive + directional dominance: Selection Unit = Individual

Genomic estimated genetic value (GEGV)

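          \(\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{f}b + \mathbf{Z}\mathbf{a} + \mathbf{W}\mathbf{d} + \mathbf{e}\)

where \(\mathbf{f}\) is the vector of genomic inbreeding coefficients and \(b\) the inbreeding depression effect that captures directional dominance.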
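To see why the selection unit matters, the sketch below contrasts a breeding value (allele counts times substitution effects \(\alpha = a + d(q-p)\)) with a total genetic value (additive plus dominance deviations) for the same hypothetical marker effects. It is a simplification, not the program's implementation: it assumes per-locus effects \(a\) and \(d\) are already estimated and that the directional component of the \(\mathbf{f}b\) term is folded into \(d\).

```python
import numpy as np

def gebv(M, a, d):
    """Breeding values: centered allele counts times substitution effects alpha."""
    p = M.mean(axis=0) / 2.0
    alpha = a + d * (1.0 - 2.0 * p)               # q - p = 1 - 2p
    return (M - 2.0 * p) @ alpha

def gegv(M, a, d):
    """Genetic values: +a / d / -a for 2 / 1 / 0 copies of the counted allele, summed over loci."""
    het = (M == 1).astype(float)
    return (M - 1.0) @ a + het @ d

# Hypothetical example: three clones at two loci.
M = np.array([[2, 2], [1, 1], [0, 0]], dtype=float)
a = np.array([1.0, 0.5])
d = np.array([0.8, 0.8])
print("GEBV:", gebv(M, a, d))
print("GEGV:", gegv(M, a, d))
```

Here the fully heterozygous clone has the highest GEGV (1.6 versus 1.5) but a GEBV of zero: GEGV ranks clones on their own performance, including dominance, while GEBV ranks them on what they transmit additively.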

Additive + directional dominance: Selection Unit = Potential Crosses

Genomic prediction of cross-performance (GPCP)

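          \(\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{f}b + \mathbf{Z}\mathbf{a} + \mathbf{W}\mathbf{d} + \mathbf{e}\)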

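      \(\mathrm{GPCP} = a(p - q - y) + d\left[2pq + y(p - q)\right]\)

summed over loci, where \(p\) and \(q = 1 - p\) are the allele frequencies in one parent and \(y\) is the difference in allele frequency between the two parents.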

This is analogous to exploiting specific combining ability (SCA) in conventional reciprocal recurrent selection (RRS), but without the need to maintain two heterotic pools: markers are used to predict good cross combinations within a single pool.
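The GPCP formula can be applied to every candidate pair of parents and the crosses ranked by their predicted mean. The sketch below assumes diploid parents with known allele counts and already-estimated per-locus effects \(a\) and \(d\) (hypothetical values); for a single parent the allele frequency is 0, 0.5 or 1, and \(y\) is the difference between the two parents' frequencies. Function names are illustrative only.

```python
import numpy as np
from itertools import combinations

def gpcp(g1, g2, a, d):
    """Predicted mean genetic value of the cross g1 x g2 (allele counts 0/1/2 per locus)."""
    p = g1 / 2.0                                  # allele frequency in parent 1's gametes
    q = 1.0 - p
    y = p - g2 / 2.0                              # frequency difference between the parents
    return np.sum(a * (p - q - y) + d * (2.0 * p * q + y * (p - q)))

def rank_crosses(M, a, d):
    """Score all pairwise crosses among the rows of M, best first."""
    scores = [((i, j), gpcp(M[i], M[j], a, d)) for i, j in combinations(range(len(M)), 2)]
    return sorted(scores, key=lambda s: s[1], reverse=True)

# Hypothetical parents (rows) at two loci, with hypothetical marker effects.
M = np.array([[2, 2], [1, 1], [0, 0]], dtype=float)
a = np.array([1.0, 0.5])
d = np.array([0.8, 0.8])
for (i, j), score in rank_crosses(M, a, d):
    print(f"parent {i} x parent {j}: {score:.2f}")
```

With these values the top-ranked cross pairs the two complementary homozygotes (parents 0 and 2), whose progeny are all heterozygous; this cross-specific effect is what selecting on crosses captures and selecting on individuals does not.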

A recent preprint on genomic prediction of cross performance in clonal crops, simulating different levels of dominance, is available at https://www.biorxiv.org/content/10.1101/2020.06.15.152017v1

1. Introduction to the problem

Crop by region

IITA-Cassava NextGen

Problem specification

The program is using GBLUP for faster recycling, but the effect of this fast recycling on genetic gain is unknown. The program would also like to know whether this use of GBLUP would allow some stages to be skipped.

Breeding strategy component tackled

Crossing, Evaluation, Selection

Breeders’ equation terms tackled

L, \(\sigma_g\), i

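\(\Delta_g = \frac{i \, r \, \sigma_g}{L}\), where \(i\) is the selection intensity, \(r\) the selection accuracy, \(\sigma_g\) the genetic standard deviation and \(L\) the cycle length in years.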
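A quick numerical reading of the breeder's equation, using purely hypothetical values (not estimates from this program), shows how a shorter cycle can outweigh a lower selection accuracy:

```python
def gain_per_year(i, r, sigma_g, L):
    """Breeder's equation: Delta_g = i * r * sigma_g / L, with L the cycle length in years."""
    return i * r * sigma_g / L

# Hypothetical scenarios: same intensity and genetic SD, different accuracy and cycle length.
conventional = gain_per_year(i=1.4, r=0.6, sigma_g=1.0, L=5.0)
gs_recycling = gain_per_year(i=1.4, r=0.4, sigma_g=1.0, L=2.0)
print(f"conventional: {conventional:.3f}, GS recycling: {gs_recycling:.3f} genetic SD per year")
```

The equation treats \(\sigma_g\) as fixed; how quickly faster cycling erodes \(\sigma_g\) is the long-term trade-off examined in the simulations.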

Hypothesis

Using GBLUP for recycling will increase genetic gain more than the conventional method.

The performance of the GBLUP method, and its effect on genetic gain, depends on how directional dominance is managed within the breeding population, in addition to selection for additive effects.

2. We are moving fast: Where to?

Here we compare three models: ignoring directional dominance (GEBVd), accounting for directional dominance but selecting individuals (GEGV), and accounting for directional dominance but selecting based on cross performance (GPCP).

Treatments

Treatment	Description
Conv_PYT-AYT	Add + dom traits, no GS, recycling using the proposed mixed crossing block from both PYT and AYT
GEBVd_BASELINE_Ind	Add + dom traits (ignoring dom in GBLUP), trainPop=c(trainPop, CE, PYT, AYT, UYT1, UYT2), predictPop=SNGS, recyclePop=SNGS, ntrainPop =3000 (random), nSNPs = 5400, meanDD=0.3
GEGV_BASELINE_Ind	Add + drdom traits, selectUnit= Individual, trainPop=c(trainPop, CE, PYT, AYT, UYT1, UYT2), predictPop=SNGS, recyclePop=SNGS, ntrainPop =3000 (random), nSNPs = 5400, meanDD=0.3
GPCP_BASELINE_Ind	Add + drdom traits, selectUnit= Crosses, trainPop=c(trainPop, CE, PYT, AYT, UYT1, UYT2), predictPop=SNGS, recyclePop=SNGS, ntrainPop =3000 (random), nSNPs = 5400, meanDD=0.3

Simulation procedure

A 20-year burn-in period was modeled using the baseline scheme, followed by a 60-year evaluation period. Genetic gain was measured as the change in genetic merit at the UYT stage. Genotype-by-year interaction variance was assumed to equal the genetic variance (corresponding to an average genetic correlation of 0.5 between locations). Ten replications were run. traitsNames = c("MCMDS", "DM", "HI", "RTSZ", "PLTHT", "FYLD") and econWt = c(-10, 20, 7, 10, 10, 20).
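Two of these settings can be made concrete. The sketch assumes the economic weights act on per-trait genetic values on a common (for example standardized) scale, which the report does not state, and uses two hypothetical clones; it also checks that setting the genotype-by-year variance equal to the genetic variance gives a between-environment genetic correlation of 0.5.

```python
import numpy as np

# Weights from the simulation settings; the negative MCMDS weight favors lower disease severity.
weights = dict(zip(["MCMDS", "DM", "HI", "RTSZ", "PLTHT", "FYLD"], [-10, 20, 7, 10, 10, 20]))
econ_wt = np.array(list(weights.values()), dtype=float)

def index_value(g):
    """Aggregate merit as the weighted sum of per-trait genetic values (assumed common scale)."""
    return np.asarray(g, dtype=float) @ econ_wt

def between_env_correlation(var_g, var_gxe):
    """Genetic correlation between environments when G x E variance is homogeneous."""
    return var_g / (var_g + var_gxe)

# Hypothetical clones that differ only in mosaic disease severity.
clone_a = [-1.0, 0.5, 0.2, 0.0, 0.1, 0.8]
clone_b = [1.0, 0.5, 0.2, 0.0, 0.1, 0.8]
print(index_value(clone_a), index_value(clone_b))
print(between_env_correlation(var_g=1.0, var_gxe=1.0))
```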

2.1 Results

Comparing GS models with and without directional dominance over 20-year and 60-year breeding periods

GPCP realized genetic gains fastest. Accounting for directional dominance sustained the gains from GS for longer. All GS methods outperformed the conventional scheme over the 20-year breeding period, but not over the longer 60-year period. Which GS method is the team currently applying, and what has the experience been so far?

2.2 Genetic gain and variance (UYT) per trait

Model performance depends on the underlying trait architecture, as determined by the covariances among traits and their weights and direction. GPCP performs better for traits in which dominance is present.

3. Could we make more gain if we delayed recycling to CE?

We move forward with only the best model, genomic prediction of cross performance (GPCP), and drop GEBVd and GEGV.

Treatments

Treatment	Description
Conv_PYT-AYT	Add + dom traits, no GS, recycling using the proposed mixed crossing block from both PYT and AYT
GPCP_BASELINE	Genomic prediction of cross-performance, trainPop=c(trainPop, CE, PYT, AYT, UYT1, UYT2), predictPop=SNGS, recyclePop=SNGS, ntrainPop =3000 (random), nSNPs = 5400, meanDD=0.3
GPCP_CE	Genomic prediction of cross-performance, trainPop=c(trainPop, CE, PYT, AYT, UYT1, UYT2), predictPop=CE, recyclePop=CE, ntrainPop =3000 (random), nSNPs = 5400, meanDD=0.3
GPCP-PYT-AYT	Genomic prediction of cross-performance, trainPop=c(trainPop, CE, PYT, AYT, UYT1, UYT2), predictPop=PYT-AYT, recyclePop=PYT-AYT, ntrainPop =3000 (random), nSNPs = 5400, meanDD=0.3

Simulation procedure

As for the treatments in Section 2 above.

3.1 Results on recycling stage

Comparing 20-year and 60-year breeding periods under genomic prediction of cross performance

All GP scenarios outperformed the conventional method in the short term, but not in the long term, owing to depletion of genetic variation. There was no difference between carrying out GP at the current SNGS stage and delaying it to CE. Delaying GP to PYT-AYT reduced the gains from GP but maintained variation over a longer period.
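A simple drift-only calculation illustrates why faster cycling can deplete variation sooner. Assuming a hypothetical effective population size \(N_e\) per cycle, expected heterozygosity decays as \(H_t = H_0(1 - 1/(2N_e))^t\) over \(t\) cycles, so the same number of years contains more cycles when \(L\) is short. Selection, which the simulations include, would accelerate the loss; this sketch ignores it, along with mutation and introgression.

```python
def heterozygosity_retained(years, cycle_length, ne):
    """Fraction of initial heterozygosity left after drift alone (ne per cycle)."""
    cycles = years / cycle_length
    return (1.0 - 1.0 / (2.0 * ne)) ** cycles

# Hypothetical: Ne = 50 per cycle, a 2-year GS cycle versus a 5-year conventional cycle.
for years in (20, 60):
    fast = heterozygosity_retained(years, cycle_length=2.0, ne=50)
    slow = heterozygosity_retained(years, cycle_length=5.0, ne=50)
    print(f"{years} years: L=2 keeps {fast:.2f}, L=5 keeps {slow:.2f}")
```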

4. Could we skip stages and genotype more?

This question was also evaluated using only the best model, genomic prediction of cross performance (GPCP); GEBVd and GEGV were dropped.

Treatments

Treatment	Description
Conv_PYT-AYT	Add + dom traits, no GS, recycling using the proposed mixed crossing block from both PYT and AYT
GPCP_BASELINE	Genomic prediction of cross-performance, trainPop=c(trainPop, CE, PYT, AYT, UYT1, UYT2), predictPop=SNGS, recyclePop=SNGS, ntrainPop =3000 (random), nSNPs = 5400, meanDD=0.3
GPCP_PYT	Genomic prediction of cross-performance, skip=CE, trainPop=c(trainPop, PYT, AYT, UYT1, UYT2), predictPop=SNGS+PYT, recyclePop=SNGS, ntrainPop =3000 (random), nSNPs = 5400, meanDD=0.3
GPCP__AYT	Genomic prediction of cross-performance, skip=CE+PYT, trainPop=c(trainPop, AYT, UYT1, UYT2), predictPop=SNGS+AYT, recyclePop=SNGS, ntrainPop =3000 (random), nSNPs = 5400, meanDD=0.3
GPCP_PYT_i5250	Genomic prediction of cross-performance, skip=CE, trainPop=c(trainPop, PYT, AYT, UYT1, UYT2), predictPop=SNGS+PYT, recyclePop=SNGS, ntrainPop =3000 (random), nSNPs = 5400, meanDD=0.3, genotype 5250 (assuming resources are saved from the skipped stage)

Simulation procedure

As for the treatments in Section 2 above.

4.1 Results on skipping stages and selInt at F1

Comparing 20-year and 60-year breeding periods under genomic prediction of cross performance

There is a clear difference between the conventional method and the GP methods. There was no big difference between the current baseline and skipping one (CE) or two (CE+PYT) stages. Doubling the number genotyped gave a small advantage in the short term, but it did not appear to justify the extra investment.
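The small return from doubling the number genotyped is consistent with how truncation selection intensity behaves: if the number selected stays fixed, doubling the candidates halves the selected proportion \(p\), but \(i = \phi(x)/p\) (with \(x\) the standard normal truncation point) grows only slowly. The sketch uses the large-population formula and hypothetical proportions; it does not reproduce the simulation's settings.

```python
from statistics import NormalDist

def selection_intensity(prop_selected):
    """Truncation selection intensity i = phi(x) / p for an infinitely large population."""
    z = NormalDist()                              # standard normal
    x = z.inv_cdf(1.0 - prop_selected)            # truncation point
    return z.pdf(x) / prop_selected

# Hypothetical: keep the same number of selections while doubling the candidates genotyped.
base = selection_intensity(0.05)
doubled = selection_intensity(0.025)
print(f"i = {base:.3f} -> {doubled:.3f} ({100 * (doubled / base - 1):.0f}% increase)")
```

Doubling the candidates raises \(i\) by only about 13% here, before any change in accuracy or genetic variance is considered.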

5. Conclusion

It is important to model dominance effects in GBLUP to avoid losing heterozygosity and losing genetic gain to inbreeding depression.

The current results show that genomic prediction of cross performance is a promising method for maintaining dominance and guarding against rapid inbreeding depression.

In the current simulation, traits were assumed to have an average dominance degree of 0.3 (meanDD = 0.3). If a higher dominance degree is expected within the program, modeling directional dominance becomes even more important. Also, if heritability in the actual trials is lower than simulated, the ROI from genomic prediction is expected to be much higher than simulated (refer to the previous GBLUP-for-accuracy session).

Delaying recycling from the current SN stage to CE did not have a big impact on gains. Comparing short- and long-term breeding periods indicates the need to balance the rate of genetic gain against the targeted breeding period, so that enough variation remains to sustain genetic gains.

There was no clear effect from skipping stages, or even from doubling the number genotyped. This implies that the resources saved from a skipped stage could instead be used for more accurate trials in later stages, although this scenario is yet to be simulated.

Genomic prediction gives a faster return on investment in the short run, where all GP scenarios outperformed the conventional scheme. For GP approaches, short-term returns need to be weighed against long-term gains, and methods of strategic introgression should be explored to avoid rapid depletion of diversity.