November 12th, 2020

The continuous adoption of genomic prediction

The use of GP for early recycling reduces cohorts

1. Introduction to the problem

Crop by region

CIAT-Beans EA

Problem specification

Classical breeding programs do not fully exploit shorter cycle times to increase rates of genetic gain, nor do they use genomic prediction (GP) to shorten the cycle even further. The factors at play are not fully understood.

Breeding strategy component tackled

Crossing, Evaluation, Selection

Breeders’ equation terms tackled

L (cycle time)

\(\Delta_g = \frac{i \, \sigma_g \, r}{L}\)
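As a worked illustration of why L is the term targeted here, the equation can be evaluated for several cycle lengths. This is a minimal sketch: the function name and every numeric value are hypothetical, and it assumes that i, \(\sigma_g\) and r stay constant as L shrinks, which may not hold in practice because selection accuracy can change when selection moves to an earlier generation.

```python
def genetic_gain_per_year(i, sigma_g, r, L):
    """Expected genetic gain per year from the breeders' equation: (i * sigma_g * r) / L."""
    if L <= 0:
        raise ValueError("cycle length L must be positive")
    return (i * sigma_g * r) / L


# Hypothetical values, chosen only for illustration (not taken from the simulations).
i, sigma_g, r = 1.75, 1.0, 0.5

for L in (8, 6, 4, 2):
    gain = genetic_gain_per_year(i, sigma_g, r, L)
    print(f"L = {L} years -> expected gain per year = {gain:.3f}")
```

Under these assumptions, halving L doubles the expected gain per year; any loss in accuracy from selecting earlier would offset part of that benefit.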

Hypothesis

Using genomic prediction in the recycling (selection) process to reduce the cycle time could increase the rate of genetic gain.

2.1 Materials and methods (reduced cycle time)

Treatments

Treatment          Description
TPn_PPn_SPf4f5     TrainingPop=NULL, PredictedPop=NULL, RecyclingPop=F4-F5 using an index.
TPn_PPn_SPf5f6     TrainingPop=NULL, PredictedPop=NULL, RecyclingPop=F5-F6 using an index.
TPn_PPn_SPf6f7     TrainingPop=NULL, PredictedPop=NULL, RecyclingPop=F6-F7 using an index.
TPn_PPn_SPf7f8     TrainingPop=NULL, PredictedPop=NULL, RecyclingPop=F7-F8 using an index.
TPn_PPn_SPf7f8_NF  TrainingPop=NULL, PredictedPop=NULL, RecyclingPop=F7-F8 using an index with no family selection.
TPf5f6_PPf1_SPf1   TrainingPop=F5-F6, PredictedPop=F1, RecyclingPop=F1 using an index.
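The treatment labels follow a regular pattern (TP, PP and SP stages followed by optional modifiers), so they can be decoded programmatically. The sketch below is one hypothetical way to do that; the function and field names are assumptions for illustration, and the rules are inferred only from the labels and descriptions in the table above.

```python
import re


def decode_stage(code):
    """Turn 'n' into None, 'f1' into 'F1', and 'f5f6' into 'F5-F6'."""
    if code == "n":
        return None
    generations = re.findall(r"f(\d+)", code)
    if not generations:
        raise ValueError(f"unrecognised stage code: {code!r}")
    return "-".join(f"F{g}" for g in generations)


def parse_treatment(label):
    """Split a label such as 'TPf5f6_PPf1_SPf1_OCS' into its components.

    The field names used here are hypothetical, not part of the original study.
    """
    parts = label.split("_")
    if len(parts) < 3:
        raise ValueError(f"expected at least TP, PP and SP parts in {label!r}")
    fields = {}
    for prefix, name in (("TP", "training_pop"),
                         ("PP", "predicted_pop"),
                         ("SP", "recycling_pop")):
        part = parts.pop(0)
        if not part.startswith(prefix):
            raise ValueError(f"expected a part starting with {prefix!r}, got {part!r}")
        fields[name] = decode_stage(part[len(prefix):])
    fields["modifiers"] = parts  # e.g. ['NF'] for "no family selection"
    return fields


print(parse_treatment("TPn_PPn_SPf7f8_NF"))
print(parse_treatment("TPf5f6_PPf1_SPf1"))
```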

Simulation procedure

A 20-year burn-in period was followed by a 20-year evaluation period, over which rates of genetic gain were measured in F9 lines. Genotype-by-year and genotype-by-location interaction variances were assumed to be equal to the main genetic variance. 25 replications were run. Genetic merit was simulated as depending on 5 complex and 3 simple traits. TP=random, N.TP=3K, N.Markers=5K.

2.2 Results for cycle time reduction

  • Reducing cycle time to F5-F6 provides 1.31 (95% CI: 1.03, 1.66) times more gain at year 10 and 1.26 (95% CI: 1.12, 1.42) times more gain at year 20.
  • Reducing cycle time to F1 provides 2.05 (95% CI: 1.45, 2.90) times more gain at year 10 and 1.20 (95% CI: 1.03, 1.40) times more gain at year 20.
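The slides do not state how these ratios and intervals were computed. The reported intervals are roughly symmetric around the estimate on the log scale (for example, the square root of 1.03 × 1.66 is about 1.31), which is consistent with, but does not confirm, a log-scale calculation. A minimal sketch of that kind of calculation follows. It assumes that replicates are paired between treatment and baseline and uses a normal approximation for the interval; the function name is hypothetical and the data are synthetic.

```python
import math
import random
import statistics


def ratio_with_ci(gain_treatment, gain_baseline, level=0.95):
    """Geometric-mean ratio of paired replicate gains with a log-scale CI.

    Assumes positive gains and replicate-wise pairing; uses a normal
    approximation rather than a t distribution.
    """
    if len(gain_treatment) != len(gain_baseline):
        raise ValueError("both inputs need one value per replicate")
    log_ratios = [math.log(t / b) for t, b in zip(gain_treatment, gain_baseline)]
    mean = statistics.mean(log_ratios)
    se = statistics.stdev(log_ratios) / math.sqrt(len(log_ratios))
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    return math.exp(mean), math.exp(mean - z * se), math.exp(mean + z * se)


# Synthetic data for 25 replicates; these are NOT the simulation results.
rng = random.Random(1)
baseline = [rng.uniform(0.8, 1.2) for _ in range(25)]
treatment = [b * rng.uniform(1.0, 1.6) for b in baseline]

ratio, lower, upper = ratio_with_ci(treatment, baseline)
print(f"ratio = {ratio:.2f} (95% CI: {lower:.2f}, {upper:.2f})")
```

An interval that excludes 1 indicates a difference between treatment and baseline at the chosen level; an interval that includes 1 does not.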

2.2 Results for cycle time

The plateau reached by GP can be attributed to the rapid depletion of genetic variance and to drift in the simple traits.

3.1 Materials and methods (diversity issue)

Treatments

Treatment             Description
TPf5f6_PPf1_SPf1      TrainingPop=F5-F6, PredictedPop=F1, RecyclingPop=F1 using an index.
TPf5f6_PPf1_SPf1_OCS  TrainingPop=F5-F6, PredictedPop=F1, RecyclingPop=F1 using an index + optimal contribution.
TPf5f6_PPf1_SPf1_2S   TrainingPop=F5-F6, PredictedPop=F1, RecyclingPop=F1 using an index + 2-step selection (simple traits first, then complex traits).
TPf5f6_PPf1_SPf1_NN   TrainingPop=F5-F6, PredictedPop=F1, RecyclingPop=F1 using an index + an increased number of parents.
TPn_PPn_SPf7f8_NF     TrainingPop=NULL, PredictedPop=NULL, RecyclingPop=F7-F8 using an index with no family selection.

Simulation procedure

A 20-year burn-in period was followed by a 20-year evaluation period, over which rates of genetic gain were measured in F9 lines. Genotype-by-year and genotype-by-location interaction variances were assumed to be equal to the main genetic variance. 25 replications were run. Genetic merit was simulated as depending on 5 complex and 3 simple traits. TP=random, N.TP=3K, N.Markers=5K.

3.2 Results for addressing diversity issue

  • Reducing cycle time to F1+OCS provides 1.02 (95% CI: 0.78, 1.33) times more gain at year 20 than F1 alone.
  • Reducing cycle time to F1+2S provides 1.25 (95% CI: 1.07, 1.46) times more gain at year 20 than F1 alone.
  • Reducing cycle time to F1+NNp provides 1.29 (95% CI: 1.00, 1.66) times more gain at year 20 than F1 alone.

3.2 Results for diversity issue

OCS methodologies help to avoid a rapid depletion of genetic variance. Increasing the number of parents to offset the reduction in effective population size (Ne) caused by having fewer cohorts can also help. Treating simple and complex traits differently when applying GP makes a big difference.
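The link between Ne and variance depletion can be illustrated with the classical drift expectation, in which the fraction of genetic variance retained after t cycles is (1 - 1/(2Ne))^t. This is a rough sketch, not the model used in the simulations: it assumes an idealized random-mating diploid population and ignores selection, which depletes variance faster. The function name and the Ne values are hypothetical.

```python
def variance_retained(ne, cycles):
    """Expected fraction of genetic variance left after `cycles` cycles of pure drift.

    Idealized expectation (1 - 1/(2*Ne))**t; selection and mutation are ignored.
    """
    if ne <= 0:
        raise ValueError("effective population size must be positive")
    if cycles < 0:
        raise ValueError("number of cycles must be non-negative")
    return (1.0 - 1.0 / (2.0 * ne)) ** cycles


# Hypothetical Ne values; 20 cycles corresponds to one cycle per year over 20 years.
for ne in (10, 20, 50):
    print(f"Ne = {ne:>2}: {variance_retained(ne, 20):.2f} of the variance retained after 20 cycles")
```

Because shorter cycles mean more cycles per calendar year, the same Ne loses variance faster in years than it would under a longer cycle, which is one reason extra parents or OCS become more relevant when recycling early.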

4.1 Materials and methods (training pop stage)

Treatments

Treatment             Description
TPf5f6_PPf1_SPf1_OCS  TrainingPop=F5-F6, PredictedPop=F1, RecyclingPop=F1 using an index + OCS.
TPf6f7_PPf1_SPf1_OCS  TrainingPop=F6-F7, PredictedPop=F1, RecyclingPop=F1 using an index + OCS.
TPf7f8_PPf1_SPf1_OCS  TrainingPop=F7-F8, PredictedPop=F1, RecyclingPop=F1 using an index + OCS.

Simulation procedure

A 20-year burn-in period was followed by a 20-year evaluation period, over which rates of genetic gain were measured in F9 lines. Genotype-by-year and genotype-by-location interaction variances were assumed to be equal to the main genetic variance. 25 replications were run. Genetic merit was simulated as depending on 5 complex and 3 simple traits. TP=random, N.TP=3K, N.Markers=5K.

4.2 Results for training pop stage

  • Closing the gap between TP and PP by 2 generations (F7-F8 to F5-F6) provides 1.31 (95% CI: 1.02, 1.67) times more gain at year 10 and 1.11 (95% CI: 1.00, 1.24) times more gain at year 20. M=5K; N=3K; S=random.

4.2 Results for training pop stage

Whether the model is trained with STG1 and STG2 data generated earlier or later does not seem to matter much in the long term, but it could matter more in the early years. The earlier the data feed the model, the better; that is, the smaller the gap between the TP and the PP, the better.

5.1 Materials and methods (training pop sample)

Treatments

Treatment                    Description
TPf5f6_PPf1_SPf1_OCS_FIRST   TrainingPop=F5-F6 using 3K old individuals, PredictedPop=F1, RecyclingPop=F1 using a base index + optimal contribution.
TPf5f6_PPf1_SPf1_OCS_LAST    TrainingPop=F5-F6 using 3K recent individuals, PredictedPop=F1, RecyclingPop=F1 using a base index + optimal contribution.
TPf5f6_PPf1_SPf1_OCS_RANDOM  TrainingPop=F5-F6 using 3K random individuals, PredictedPop=F1, RecyclingPop=F1 using a base index + optimal contribution.

Simulation procedure

A 20-year burn-in period was followed by a 20-year evaluation period, over which rates of genetic gain were measured in F9 lines. Genotype-by-year and genotype-by-location interaction variances were assumed to be equal to the main genetic variance. 25 replications were run. Genetic merit was simulated as depending on 5 complex and 3 simple traits. TP=varied, N.TP=3K, N.Markers=5K.

5.2 Results for training pop sample

  • Using recent individuals for the TP provides 1.28 (95% CI: 1.15, 1.43) times more gain at year 20 than using old individuals, and using random individuals provides 1.17 (95% CI: 1.08, 1.26) times more gain. M=5K; N=3K; S=varied.

5.2 Results for training pop sample

Using recent or random individuals is preferable to using old data. Recent data performed better than a random sample, but not by much.

6.1 Materials and methods (training pop number)

Treatments

Treatment                   Description
TPf5f6_PPf1_SPf1_OCS_N500   TrainingPop=F5-F6 using 500 random individuals and 5K markers, PredictedPop=F1, RecyclingPop=F1 using a base index + optimal contribution.
TPf5f6_PPf1_SPf1_OCS_N1000  TrainingPop=F5-F6 using 1000 random individuals and 5K markers, PredictedPop=F1, RecyclingPop=F1 using a base index + optimal contribution.
TPf5f6_PPf1_SPf1_OCS_N3000  TrainingPop=F5-F6 using 3000 random individuals and 5K markers, PredictedPop=F1, RecyclingPop=F1 using a base index + optimal contribution.
TPf5f6_PPf1_SPf1_OCS_N5000  TrainingPop=F5-F6 using 5000 random individuals and 5K markers, PredictedPop=F1, RecyclingPop=F1 using a base index + optimal contribution.

Simulation procedure

A 20-year burn-in period was followed by a 20-year evaluation period, over which rates of genetic gain were measured in F9 lines. Genotype-by-year and genotype-by-location interaction variances were assumed to be equal to the main genetic variance. 25 replications were run. Genetic merit was simulated as depending on 5 complex and 3 simple traits. TP=random, N.TP=varied, N.Markers=5K.

6.2 Results for training pop number

  • Using 3K individuals in the training population provides 1.28 (95% CI: 0.84, 1.95) times more gain than 500 individuals at year 20, and 5K individuals provide 1.52 (95% CI: 1.08, 2.14) times more gain. M=5K; N=varied; S=random.

6.2 Results for training pop number

The greater the number of individuals in the model, the better. We limited our scenarios to what was computationally feasible, but the lesson is that the more data are available, the better.

7.1 Materials and methods (number of markers)

Treatments

Treatment                    Description
TPf5f6_PPf1_SPf1_OCS_M500    TrainingPop=F5-F6 using 3K random individuals and 500 markers, PredictedPop=F1, RecyclingPop=F1 using a base index + optimal contribution.
TPf5f6_PPf1_SPf1_OCS_M1000   Same as M500, but with 1000 markers.
TPf5f6_PPf1_SPf1_OCS_M2500   Same as M500, but with 2500 markers.
TPf5f6_PPf1_SPf1_OCS_M5000   Same as M500, but with 5000 markers.
TPf5f6_PPf1_SPf1_OCS_M10000  Same as M500, but with 10000 markers.
TPf5f6_PPf1_SPf1_OCS_M30000  Same as M500, but with 30000 markers.

Simulation procedure

A 20-year burn-in period was followed by a 20-year evaluation period, over which rates of genetic gain were measured in F9 lines. Genotype-by-year and genotype-by-location interaction variances were assumed to be equal to the main genetic variance. 25 replications were run. Genetic merit was simulated as depending on 5 complex and 3 simple traits. TP=random, N.TP=3K, N.Markers=varied.

7.2 Results for number of markers

  • Using 5000 markers in the prediction model provides 1.24 (95% CI: 1.11, 1.38) times more gain than 500 markers at year 20, and 30K markers provide 1.24 (95% CI: 1.11, 1.38) times more gain. M=varied; N=3K; S=random.

7.2 Results for number of markers

The number of markers seems to matter when it is too low, but once it reaches the thousands the differences among treatments become smaller. We expect that this could change with different TP sizes. A grid of TP sizes and marker numbers is needed for a definitive answer, but it seems that 5-10K markers may be enough.
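Such a grid could be enumerated by crossing the TP sizes and marker numbers already used in sections 6 and 7. The sketch below only builds the scenario labels. The combined `_N…_M…` suffix is a hypothetical extension of the labeling scheme, not one used in the study, and running all combinations may exceed what was computationally feasible.

```python
from itertools import product

# Levels taken from the treatments in sections 6.1 and 7.1.
TP_SIZES = (500, 1000, 3000, 5000)
MARKER_COUNTS = (500, 1000, 2500, 5000, 10000, 30000)


def grid_labels(tp_sizes=TP_SIZES, marker_counts=MARKER_COUNTS,
                base="TPf5f6_PPf1_SPf1_OCS"):
    """Build one hypothetical scenario label per (TP size, marker count) pair."""
    return [f"{base}_N{n}_M{m}" for n, m in product(tp_sizes, marker_counts)]


labels = grid_labels()
print(len(labels), "scenarios")
print(labels[0], "...", labels[-1])
```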

8. Conclusion

We highly recommend the use of GP for reducing cycle time, provided that all the proper steps have been adopted. Watch out for the following: 1) Recycling early using GP models drastically increases genetic gain but exhausts diversity much more quickly. 2) Do not jump into GP to reduce cycle time if you have not sorted out the logistics. 3) Training models should have data for all traits, not only yield.

Practical recommendations for the adoption of GP for reducing cycle time include:

1) Implement a closed system to keep training models accurate. Do not try to predict exotic germplasm or use exotic germplasm to predict the elite population.

2) Exhausting the variance is inevitable, so a good selection process needs to be combined with measures such as using the right number of parents and OCS.

3) The time separation between the TP and the PP does not seem to have much effect once the method is established.

4) The number of individuals in the training model (assuming a closed system) has a great impact, and we recommend using as much information as possible (information from the same pool, not from exotic or research pools).

5) The number of markers used to capture the relationship between the TP and the PP is important, but several thousand markers (e.g. 5K) are enough to capture the relationships well for several decades in a closed system.