Large Language Model for Cancer Characteristics
Extracting tumor characteristics and other relevant data from clinical reports is vital for supporting clinical cancer research and for downstream uses, such as the secondary use of observational electronic health record (EHR) data to understand treatment effectiveness and safety in real-world patient populations, and the identification of eligible cancer patients for clinical trial matching. Natural language processing (NLP) is key to large-scale extraction of nuanced data from clinical texts.
MSI PIs Rui Zhang (Professor, Surgery; Masonic Cancer Center) and Steve Johnson (Assistant Professor, Institute for Health Informatics; Masonic Cancer Center), and Professor Anne Blaes (Medicine; Masonic Cancer Center) are working on a project, “CancerLLM: Development of a Cancer-domain Large Language Model to Extract Diagnostic Information,” that seeks to develop a privacy-preserving, cancer-domain-specific large language model (LLM) capable of extracting diagnostic information for breast cancer. This project extends the researchers’ prior work (CancerBERT) to develop a cancer-domain-specific LLM (CancerLLM) that extracts diagnostic information from the records of patients with diverse cancer types. CancerLLM will enable researchers to process clinical notes in a privacy-preserving manner, without releasing potentially identifiable data externally. The ability to extract these concepts more accurately will improve the use of unstructured observational data in cancer research, allow patients to be better matched to clinical trials, and ultimately improve patient outcomes.
This project recently received a DSI Seed Grant. The Seed Grant program is intended to promote, catalyze, accelerate, and advance U of M-based data science research so that U of M faculty and staff are well prepared to compete for longer-term external funding opportunities.
The program was updated in Summer 2024 to include three focus areas: Foundational Data Sciences; Digital Health and Personalized Health Care Delivery; and Agriculture and the Environment. The award types are Rapid Response Grants and the newly added Awards for DSI Faculty Fellowship and Data Sets (Data as an Asset). This project falls under the Digital Health and Personalized Health Care Delivery focus area.
Complete information about DSI Seed Grants can be found on the DSI website.
Image description: Graphical abstract of the project.
