Transformer architectures

  • Text (a minimal loading sketch for these checkpoints follows the list):
    • BioBERT[1]: a domain-specific language representation model pre-trained on large-scale biomedical corpora (PubMed abstracts and PMC full-text articles).
    • PubMedBERT[2]: pre-trained from scratch on biomedical literature from PubMed.
    • BioMegatron[3]: a larger biomedical language model pre-trained on PubMed-derived free text.
    • ClinicalBERT[4]: pre-trained on clinical text from MIMIC-III.
    • GatorTron[5]: pre-trained on >90 billion words of text, including de-identified clinical notes from University of Florida (UF) Health, PubMed articles, and Wikipedia.
  • EHR:
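
All of the text models above expose the standard BERT encoder interface, so at inference time switching pretraining corpora amounts to switching checkpoints. Below is a minimal sketch, not the original authors' pipeline: it assumes the Hugging Face `transformers` library (with PyTorch installed) and the commonly listed Hub identifiers for four of the models; verify the exact checkpoint names on the Hub before relying on them. BioMegatron is distributed through NVIDIA NGC/NeMo rather than a single canonical Hub checkpoint, so it is omitted.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Assumed Hub identifiers for the models listed above; verify before use.
CHECKPOINTS = {
    "BioBERT": "dmis-lab/biobert-v1.1",
    "PubMedBERT": "microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext",
    "ClinicalBERT": "emilyalsentzer/Bio_ClinicalBERT",
    "GatorTron": "UFNLP/gatortron-base",
}

def cls_embedding(text: str, checkpoint: str) -> torch.Tensor:
    """Encode `text` and return its [CLS] vector as a sentence embedding."""
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModel.from_pretrained(checkpoint).eval()
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        outputs = model(**inputs)
    return outputs.last_hidden_state[:, 0]  # shape: (1, hidden_size)

vec = cls_embedding("EGFR mutations predict response to gefitinib.",
                    CHECKPOINTS["BioBERT"])
print(vec.shape)
```

For token-level tasks such as biomedical named-entity recognition, the same checkpoints load under `AutoModelForTokenClassification` in place of `AutoModel`, with a task-specific head fine-tuned on labeled data.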

[1] Lee, J. et al. BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics 36, 1234–1240 (2020).
[2] Gu, Y. et al. Domain-specific language model pretraining for biomedical natural language processing. ACM Trans. Comput. Healthc. 3, 1–23 (2022).
[3] Shin, H.-C. et al. BioMegatron: larger biomedical domain language model. In Proc. 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) 4700–4706 (2020).
[4] Alsentzer, E. et al. Publicly available clinical BERT embeddings. In Proc. 2nd Clinical Natural Language Processing Workshop 72–78 (2019).
[5] Yang, X. et al. A large language model for electronic health records. NPJ Digit. Med. 5, 194 (2022).