Timely H5N1 sequence data now provided by many – but not all – countries

Sarah Otto*

* Department of Zoology & Biodiversity Research Centre, The University of British Columbia, Vancouver BC Canada V6T 1Z4.

A year ago, we raised concerns about the lengthy delays¹ between when genetic samples of highly pathogenic avian influenza (H5N1) were obtained and when genomic sequence data were submitted to the Global Initiative on Sharing All Influenza Data (GISAID) repository for virus data and associated metadata²,³. Because our hope in drawing attention to these delays was to motivate faster data sharing for this virus of pandemic concern, we reexamined the delays for H5N1 (clade 2.3.4.4b) sequences submitted to GISAID in 2025 (accessed February 26 2026), relative to delays for similar data submitted in 2024 (accessed February 26 2025). In both cases, we restricted sample collection dates to a five-year window (starting January 1 2020 for the 2024 submissions and January 1 2021 for the 2025 submissions) so that historical samples would not strongly skew the data.

Across these two snapshots of time, there was a slight drop in total submissions from 11,131 in 2024 to 10,819 in 2025. Ten additional samples were submitted in 2025 but embargoed, a new option through GISAID, and these ten samples are not further analysed.

Considering only data with complete collection date information, the median delay between collection and submission dates has improved substantially in the past year, down from 474 days in 2024 to 59 days in 2025. Very long delays remain common, however, so the mean delay shows less improvement and remains over six months (mean ± SD: 427 ± 336 days across 10,254 datapoints in 2024 and 208 ± 288 days across 7,617 datapoints in 2025).
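The delay calculation described above can be sketched in a few lines of Python. This is only an illustration, not the Mathematica notebook used for the analysis: the function names, the YYYY-MM-DD date format, and the toy records are assumptions, and the sample standard deviation is used here without knowing which estimator the original analysis applied.

```python
from datetime import date
from statistics import mean, median, stdev


def parse_complete_date(text):
    """Return a date for a full YYYY-MM-DD string, or None if day or month is missing."""
    parts = text.strip().split("-")
    if len(parts) != 3:
        return None  # e.g. "2025" or "2025-03": incomplete collection date
    try:
        year, month, day = (int(p) for p in parts)
        return date(year, month, day)
    except ValueError:
        return None


def delays_in_days(records):
    """records: iterable of (collection_date, submission_date) strings.

    Records with an incomplete date are skipped, mirroring the restriction
    to samples with complete collection date information.
    """
    delays = []
    for collected, submitted in records:
        c = parse_complete_date(collected)
        s = parse_complete_date(submitted)
        if c is not None and s is not None:
            delays.append((s - c).days)
    return delays


# Made-up records for illustration only (not GISAID data).
toy_records = [
    ("2025-01-10", "2025-02-09"),  # short delay
    ("2025-03-01", "2025-03-21"),  # short delay
    ("2024-01-01", "2025-01-10"),  # one very long delay
    ("2025", "2025-06-01"),        # year only: excluded
]

delays = delays_in_days(toy_records)
print("median:", median(delays))
print("mean:", round(mean(delays), 1), "SD:", round(stdev(delays), 1))
```

In the toy data, a single long delay leaves the median short while pulling the mean far upward, the same pattern that separates the median and mean delays reported for 2025.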

The delay distribution for countries with at least 100 samples submitted in either 2024 or 2025 is shown in Figure 1 (summary statistics in Table 1). Six countries (Austria, Czech Republic, United Kingdom, Italy, Netherlands, Germany) submitted data within two months of collection in both years, with three more countries reaching this mark in 2025 (Norway, France, United States). Improvements in Norway and France were tremendous, with ten-fold lower median delays in 2025. By contrast, nearly half of the submissions from the United States lacked day-of-collection information (3169/6595), obscuring delays from the United States. Of the countries submitting substantial numbers of sequences, Canada had the longest delay in 2024 (median of 706 days); while the timeliness of Canadian data improved substantially in 2025, the delay remained over a year (median of 469 days), hampering analyses of novel variants and transmission among species.
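A per-country summary of the kind reported in Table 1 could be assembled along the following lines. This is a hedged sketch rather than the analysis code: `summarise_by_country` and its `min_samples` parameter are hypothetical names, the threshold is applied to one year's delays only (a simplification of the "either 2024 or 2025" rule), and Python's default quantile interpolation may differ from the method used in the original analysis.

```python
from collections import defaultdict
from statistics import mean, median, quantiles, stdev


def summarise_by_country(rows, min_samples=100):
    """rows: iterable of (country, delay_in_days) pairs.

    Returns one summary dict per country with at least `min_samples`
    delays, sorted by median delay (shortest first).
    """
    by_country = defaultdict(list)
    for country, delay in rows:
        by_country[country].append(delay)

    table = []
    for country, delays in by_country.items():
        # stdev and quantiles need at least two values.
        if len(delays) < max(min_samples, 2):
            continue
        cuts = quantiles(delays, n=40)  # cut points at 2.5%, 5%, ..., 97.5%
        table.append({
            "country": country,
            "n": len(delays),
            "median": median(delays),
            "mean": mean(delays),
            "sd": stdev(delays),
            "q2.5": cuts[0],
            "q97.5": cuts[-1],
        })
    return sorted(table, key=lambda row: row["median"])


# Made-up delays for three hypothetical countries (not GISAID data).
toy_rows = [
    ("A", 10), ("A", 20), ("A", 30),
    ("B", 400), ("B", 500), ("B", 600),
    ("C", 5),  # too few samples: dropped by the threshold
]
summary = summarise_by_country(toy_rows, min_samples=3)
for row in summary:
    print(row["country"], row["n"], row["median"], row["mean"])
```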

Beyond data delays, data submitted in 2025 were less likely to have complete metadata. While collection date information was incomplete for 7.7% of samples in 2024, the proportion incomplete rose dramatically to 29.6% of samples in 2025, with most of these incomplete dates giving only the year of collection and most originating from the United States (3169 of 3202 samples with incomplete dates). While geographic information about the state from which a sample was collected was typically supplied in 2024 for the United States (89.4% of 6889 samples), geographic resolution was often missing in 2025 (only 52.2% of samples from the United States provided the state), typically coinciding with an incomplete collection date. By contrast, the province of origin was always indicated for data from Canada in both years. No data on the health status of the animals or their domestic/wild status were provided in 2025 (both down from 5.9% in 2024).
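The 2025 proportions quoted in this report follow directly from the counts given in the text, as the short check below shows. Only the reported counts are used; the variable names and the `pct` helper are illustrative.

```python
# Counts for data submitted in 2025, as reported in the text.
total_2025 = 10_819      # all submissions
complete_2025 = 7_617    # submissions with complete collection dates
usa_total = 6_595        # submissions from the United States
usa_incomplete = 3_169   # United States submissions lacking day of collection

incomplete_2025 = total_2025 - complete_2025


def pct(part, whole):
    """Percentage of `whole` made up by `part`, rounded to one decimal place."""
    return round(100 * part / whole, 1)


print("incomplete dates:", incomplete_2025)
print("share of all 2025 samples (%):", pct(incomplete_2025, total_2025))
print("share of incomplete dates from the United States (%):",
      pct(usa_incomplete, incomplete_2025))
print("share of United States samples with incomplete dates (%):",
      pct(usa_incomplete, usa_total))
```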

Because data uploaded to GISAID are not always made publicly available when submitted and may be updated by the submitters, we reanalysed a more recent download of data submitted in 2025 (accessed May 29 2026). While there were updates on the health status of the animals (2.6%) or their domestic/wild status (8.5%), incomplete collection dates remained common (30.4% of datapoints, 3419/11,240), especially from the United States (3169/3419). Notably, 421 more virus samples were listed as submitted in 2025 when the database was queried on May 29 2026 than when it was queried on February 26 2026 (e.g., EPI_ISL_19766196, with submission date March 4 2025, was not in the database on February 26 2026). This appearance of data on GISAID months after submission is consistent with back-filling of data without transparent information about updates (see also ref. 1), which hampers reproducibility because the data available depend on the date of access.
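A comparison of two downloads of this kind can be sketched as a set difference on accession IDs. The sketch below is illustrative only: `backfilled_accessions` is a hypothetical helper, submission dates are assumed to be YYYY-MM-DD strings, and the accession IDs are invented placeholders rather than real GISAID identifiers.

```python
def backfilled_accessions(earlier, later, year=2025):
    """Find records that appear only in the later snapshot.

    earlier, later: dicts mapping accession ID -> submission date (YYYY-MM-DD).
    Returns the sorted IDs whose submission date falls in `year` but which
    were absent from the earlier snapshot.
    """
    prefix = f"{year}-"
    return sorted(
        accession
        for accession, submitted in later.items()
        if submitted.startswith(prefix) and accession not in earlier
    )


# Invented placeholder IDs and dates (not real GISAID accessions).
snapshot_feb = {
    "toy_001": "2025-01-15",
    "toy_002": "2025-06-30",
}
snapshot_may = {
    "toy_001": "2025-01-15",
    "toy_002": "2025-06-30",
    "toy_003": "2025-03-04",  # submitted in 2025 but only visible later
    "toy_004": "2026-04-01",  # submitted in 2026: outside the year of interest
}

late_arrivals = backfilled_accessions(snapshot_feb, snapshot_may)
print(late_arrivals)

# The same bookkeeping on the reported totals: 11,240 records in the
# later download versus 10,819 in the earlier one.
print(11_240 - 10_819)
```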

In summary, the sharing of important genomic information about highly pathogenic avian flu has improved over the last year, with many countries now submitting data within two months of collection (the median in 2025). Nevertheless, the mean delay remained over six months in 2025, in part due to a large number of lengthy delays for samples from Canada. While the United States contributed the most data in both years (Table 1), the quality of the metadata from the United States has declined substantially, with almost half of samples lacking data on the date and state of collection. Viral sequence data have been enormously helpful for tracking disease transmission among species⁴ and agricultural communities⁵ and for monitoring variant evolution of highly pathogenic avian influenza⁶, but comprehensive analyses require that data are available. It is hoped that this short report will spur improved data sharing by applauding the progress made and highlighting gaps that remain.

Figure 1: Collection-to-submission times by country for highly pathogenic avian influenza (H5N1). Violin plots show the distribution of days from sample collection to submission for all countries that had submitted at least 100 accessions in 2024 or 2025, ordered by median delay (numbers at top of graph). Grey distribution on the far right is for the global dataset.

Table 1: Summary table by country. Data are restricted to H5N1 sequence data submitted to GISAID in 2024 or 2025 with complete collection dates. # gives the number of GISAID samples uploaded in 2024 (left) or 2025 (right), followed by the median, mean, standard deviation, and 2.5% and 97.5% quantiles of the distribution. Grey data at the bottom are for the combined global dataset. [Link to spreadsheet for Table 1 or for all countries.]

DATA AVAILABILITY

The findings of this study are based on metadata accessed from GISAID for submissions in 2024 and in 2025 (Appendix). The first batch (2024 submissions) was accessed on February 26 2025 (list of EPI_SET IDs at https://doi.org/10.55876/gis8.250226wp). The second batch (2025 submissions) was accessed on February 26 2026 (list of EPI_SET IDs at https://doi.org/10.55876/gis8.260529vh) and again on May 29 2026 (list of EPI_SET IDs at https://doi.org/10.55876/gis8.260529vz). The download format was “Isolates as XLS (virus metadata only)”. All analyses were conducted in Mathematica (notebook link).

ACKNOWLEDGEMENTS

I thank Sean Edgerton for valuable discussions on data gaps.

REFERENCES

1. Otto, S. P. & Edgerton, S. V. (2025) Lengthy delays in H5N1 genome submissions to GISAID. Nature Biotechnology, 43, 665-666.

2. Shu, Y. & McCauley, J. (2017) GISAID: Global initiative on sharing all influenza data - from vision to reality. Euro Surveill. 22. https://doi.org/10.2807/1560-7917.ES.2017.22.13.30494

3. Elbe, S. & Buckland-Merrett, G. (2017) Data, disease and diplomacy: GISAID’s innovative contribution to global health. Global Chall. 1, 33–46. https://doi.org/10.1002/gch2.1018

4. Giacinti, J. A., et al. (2024) Transmission dynamics of highly pathogenic avian influenza virus at the wildlife-poultry-environmental interface: a case study. One Health, 19, 100932. https://doi.org/10.1016/j.onehlt.2024.100932

5. Caserta, L. C., et al. (2024) Spillover of highly pathogenic avian influenza H5N1 virus to dairy cattle. Nature, 634(8034), 669-676. https://doi.org/10.1038/s41586-024-07849-4

6. Chakraborty, C., & Bhattacharya, M. (2024) Evolution and mutational landscape of highly pathogenic avian influenza strain A (H5N1) in the current outbreak in the USA and global landscape. Virology, 600, 110246. https://doi.org/10.1016/j.virol.2024.110246

SUPPLEMENTARY APPENDIX

As recommended by GISAID, all genome sequences and associated metadata supporting the findings of this study can be accessed through the persistent digital object identifiers: https://doi.org/10.55876/gis8.250226zt, https://doi.org/10.55876/gis8.260529vh, https://doi.org/10.55876/gis8.260529vz.

In addition to the minted DOI, GISAID also communicates the aggregation of GISAID accession numbers (EPI_ISL_IDs) through the corresponding EPI_SET identifiers to facilitate both the acknowledgment of all data contributors and the direct retrieval of the underlying data from GISAID used in this study.