
L.GEN.2004-Data Quality Metrics for Sheep Genetics and BREEDPLAN

Did you know, improving data feedback gives sheep breeders enhanced accuracy of estimated breeding values and increases their ability to make accurate selection decisions?

Project start date: 07 February 2020
Project end date: 29 June 2021
Publication date: 16 February 2022
Project status: Completed
Livestock species: Sheep
Relevant regions: National

Summary

This project developed metrics to describe the quality of data submitted for inclusion in genetic evaluation systems. The work focused mainly on the sheep genetic evaluation systems delivered through Sheep Genetics, and on the refinement and reporting of the data quality metrics currently included in the RAMping Up Genetic Gains (RUGG) reports.

Objectives

• Objective 1. Demonstration of the value proposition of data quality metrics in relation to prediction of genetic merit
• Objectives 2 & 3. Development of applicable data quality metrics and refinement of the current data quality reports and metrics within Sheep Genetics, as well as any additional metrics from BREEDPLAN
• Objective 4. Demonstration of ways to calculate and report data quality metrics on a per flock basis to the public

Key findings

• Objective 1. The value proposition
Most current data quality metrics were significant predictors of genetic gains, explaining between 2% and 60% of the observed variation in the rate of progress between flocks. Flocks with higher quality data made more genetic progress.

• Objectives 2 & 3. Development and refinement of data quality metrics
The current RUGG data quality metrics were refined, and additional metrics (including DataAudit-inspired metrics) were calculated. There were 4 quantity, 9 quality and 5 timeliness-related new metrics calculated.

• Objectives 2 & 3. Development of an overall Data Quality Score (DQS)
To combine the quantity, quality and timeliness metrics, an overall Data Quality Score (DQS) for each flock was derived using 3 methods. Detailed investigation of the alternative methods did not identify a single optimal method for calculating an overall DQS. All resulting DQSs were related to genetic gains, and were moderately to highly correlated with each other.
The informed weightings approach, which uses 21 metrics, is recommended for deriving the overall DQS. The calculation of this DQS did not include metrics describing the genetic gains achieved by each flock, yet the DQS and genetic gains were still moderately to strongly correlated.

• Objective 4. Reporting of data quality metrics and score
OVIS software has been updated to calculate the new metrics and the DQS, and to automatically identify data recording strengths and recommendations for improved data recording. New software was developed to generate interim DQS reports, demonstrating how the DQS can be incorporated into RAMping Up Genetic Gains reports.

• Objective 4. Road-testing of DQS with industry
The DQS prototype was road-tested at 6 events involving 96 flocks. The prototype was well-received and constructive feedback was provided to further enhance the usefulness of the DQS.
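
The weighted-score idea behind the DQS, and the "variation explained" measure quoted under Objective 1, can be illustrated with a minimal Python sketch. It is not the project's implementation: the three metric names, their weights and the 0 to 100 scale are hypothetical placeholders (the 21 metrics and their informed weightings are not listed in this summary), and the R-squared shown is that of a simple one-predictor linear regression, which may differ from the model used in the report.

```python
from statistics import mean

# Hypothetical metrics and weights, for illustration only. The recommended
# DQS uses 21 metrics whose names and weights are not given in this summary.
WEIGHTS = {
    "pedigree_completeness": 5,  # a quality-type metric
    "records_per_animal": 3,     # a quantity-type metric
    "on_time_submission": 2,     # a timeliness-type metric
}


def data_quality_score(metrics, weights=WEIGHTS):
    """Weighted average of metrics scaled to 0..1, returned on a 0..100 scale."""
    total_weight = sum(weights.values())
    weighted_sum = sum(weights[name] * metrics[name] for name in weights)
    return 100.0 * weighted_sum / total_weight


def r_squared(x, y):
    """Share of the variation in y explained by a simple linear regression on x."""
    mean_x, mean_y = mean(x), mean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)


# Made-up flock, used only to show the call.
example_flock = {
    "pedigree_completeness": 1.0,
    "records_per_animal": 0.5,
    "on_time_submission": 1.0,
}
print(data_quality_score(example_flock))
```

Given a list of per-flock scores and a matching list of per-flock rates of genetic gain, `r_squared(scores, gains)` returns the proportion of between-flock variation in gain associated with the score, which is the kind of quantity the 2% to 60% range above describes.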

Benefits to industry

• An enhanced data feedback tool for breeders. The RAMping Up Genetic Gains reports can be further enhanced with new data quality metrics, a Data Quality Score, a star rating, recommendations and strengths. This provides targeted advice to breeders to assist in management changes and to improve data collection and submission, and hence ASBV accuracy. In turn, this will assist in more accurate selection decisions and increased rates of genetic progress.
• Transparency for ram buyers about the quality of data used to calculate EBVs. While EBV accuracies are available for individual rams, a DQS provides an indication of the overall quality of the flock’s data and allows direct comparison across flocks.
• A way to identify and highlight breeders who collect high quality data. This could be used as a basis for discounted registration fees, breeder awards, or other signals and/or rewards.
• An engagement tool for Sheep Genetics development officers (and service providers) for targeted extension activities for flocks with poor data quality.
• A data quality framework that can be further developed to determine and value data contribution to reference populations.

MLA action

• Public and private reporting: The recommended roll-out strategy is to initially privately report the DQS whilst road-testing, before public reporting of star ratings after a grace period. The pathway to public release (including the length of the grace period) is yet to be fully defined. The reporting will take place via upgrades to the Sheep Genetics website.
• Continued road-testing and education: This is particularly important if there is a reward or incentive to having a high data quality score. This requires a detailed communication strategy, which may involve media releases, and fact sheets and videos on the Sheep Genetics website.
• Incorporation into RUGG reports: While an interim report is available, it would be ideal to incorporate the DQS and associated features into RUGG reports. Increasing the availability and use of the RUGG reports by service providers and breeders should also be a key strategy.
• Continuous monitoring and refinement: The metrics require monitoring, and weights require refinement over time. This will assist in evaluating how effective the reporting is in encouraging change. There is also potential to further refine the DQS reporting.
• Understanding of poor data recording: Understanding recording challenges, and devising targeted extension messages.
• Application in beef: The demand for an updated data quality framework for BREEDPLAN (currently delivered through DataAudit software) is unknown. In principle, frameworks for evaluating data quality should be consistent across species, primarily to facilitate extension, and potentially to simplify introduction of systems for valuing of data for reference populations.

Future research

There are several areas for further research arising from the project including:
• Understanding challenges and reasons for poor data recording. Since some metrics are poorly recorded across many flocks (e.g. level of full pedigree recording in Merino flocks), it would be beneficial to 1) understand why they are poorly recorded, 2) explore or devise tools to increase ease of recording, and 3) design an extension campaign to target improvement of recording.
• Cost benefit analysis and tools to understand to what extent it is worth improving data recording, considering the costs associated, at both the individual flock and the industry levels.
• Better demonstration of the value proposition. Flocks with higher Data Quality Scores had higher index accuracies and rates of genetic gains. The Data Quality Score also provides additional information not captured in EBV/index accuracy. There is potential to undertake a simulation to better understand how changes in data recording flow through to genetic gains. An important component over time will be to understand whether RUGG and DQS reporting leads to changes in behaviour and improved recording.
• Better accounting of fixed effects. Completeness and accuracy of fixed effect recording is only captured to a limited extent in the proposed framework. This requires more in-depth examination.
• Data Quality Score: reference population vs. individual breeder. The proposed DQS characterises the data, with the purpose of understanding the value of the data to individual breeders and their clients. A related but alternative perspective is to value the data contributed to the reference population and/or to other breeders.

For more information:

Contact Project Manager: Peta Bradley

E: reports@mla.com.au