Title | : | Bridging the Knowledge Gap: Integrating Biomedical Ontologies to Enhance BERT-Based Medical MCQA |
Speaker | : | Sahil (IITM) |
Details | : | Tue, 17 Oct, 2023 4:00 PM @ SSB-333 |
Abstract | : | We address the challenge of enhancing BERT-based language models for specialized medical tasks by integrating them with biomedical ontologies. Despite their robust language understanding capabilities, BERT models often lack domain-specific knowledge, particularly in medical contexts. To bridge this gap, we introduce BioOntoBERT, a BERT-based model pre-trained on multiple biomedical ontologies. To generate knowledge-rich documents, we develop the Onto2Sen system, which extracts entity names, synonyms, definitions, and concept relationships from ontologies, enriching the model's knowledge of biomedical concepts. We evaluate on two medical multiple-choice question answering (MCQA) benchmarks, MedMCQA and MedQA. The results show that BioOntoBERT improves over baseline models such as BERT, SciBERT, BioBERT, and PubMedBERT on both datasets. BioOntoBERT attains this improvement by incorporating only 158MB of ontology-generated data during pre-training, which is just 0.7% of the data used for PubMedBERT pre-training. Extending this work, we fine-tune BERT-based models on a synthetic MCQA dataset, BioOntoMCQA, constructed from biomedical ontologies. We apply this fine-tuning strategy across various BERT models and assess their performance on the MedMCQA and MedQA datasets. The results demonstrate notable accuracy improvements, highlighting the role of biomedical ontologies in improving language models for medical domains. Our findings underscore the significance of fine-tuning with ontology-generated data and of model adaptation within specialized domains. |
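
The ontology-to-text step described in the abstract can be illustrated with a minimal sketch. This is not the Onto2Sen implementation: the `OntologyTerm` class, the `term_to_sentences` function, and the sentence templates are all hypothetical, chosen only to show how entity names, synonyms, definitions, and concept relationships might be turned into plain sentences for a pre-training corpus.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class OntologyTerm:
    """Hypothetical in-memory view of one ontology concept."""
    name: str
    synonyms: list[str] = field(default_factory=list)
    definition: str = ""
    # (relation label, target concept name) pairs, e.g. ("is a", "heart disease")
    relations: list[tuple[str, str]] = field(default_factory=list)


def term_to_sentences(term: OntologyTerm) -> list[str]:
    """Render one concept as sentences using assumed, illustrative templates."""
    sentences = []
    if term.definition:
        # Strip a trailing period so the template adds exactly one.
        sentences.append(f"{term.name} is defined as {term.definition.rstrip('.')}.")
    for synonym in term.synonyms:
        sentences.append(f"{term.name} is also known as {synonym}.")
    for relation, target in term.relations:
        sentences.append(f"{term.name} {relation} {target}.")
    return sentences


if __name__ == "__main__":
    # Illustrative concept; the field values are examples, not taken from the talk.
    example = OntologyTerm(
        name="myocardial infarction",
        synonyms=["heart attack"],
        definition="necrosis of heart muscle caused by interrupted blood supply.",
        relations=[("is a", "heart disease")],
    )
    for sentence in term_to_sentences(example):
        print(sentence)
```

A real pipeline would read concepts from ontology files rather than hand-built objects, and would likely use more varied templates; the sketch only fixes the idea of one sentence per extracted fact.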