Vision

To understand what we see in an image, we must first understand the world it represents. An image is never acquired in isolation; it is always part of a larger context. In medical imaging, this means understanding both the patient and the patient workflow.

Example workflow

A patient presents with neurological symptoms and is evaluated by a neurologist, who refers the patient for an MRI scan. The radiologist detects a brain tumor. While the tumor’s size and location are assessed, its aggressiveness remains unclear. The patient is scheduled for a brain biopsy, and the sample is sent to a pathologist for histological examination. The combined findings from imaging and pathology are discussed by a multidisciplinary team, including an oncologist and neurosurgeon. Surgery is recommended and the neurosurgeon uses pre-operative images to plan the procedure. Following surgery, the patient begins a course of radiotherapy and/or chemotherapy as part of the treatment plan. The patient will receive regular follow-up imaging to monitor for recurrence.

Although the above is a simplification, it provides valuable context to explain the role of imaging and AI.

Role of imaging

First, AI will not magically replace medical professionals, at least not any time soon. The example illustrates highly complex and specialized care, where human expertise remains essential. However, certain tasks in this workflow, such as tumor segmentation, can be automated using AI.

Second, imaging serves distinct purposes, each imposing specific requirements for image analysis. For example, tumor removal requires accurate segmentation, while follow-up imaging must ensure no tumor tissue is missed. Or in acute stroke imaging, fast detection is crucial because time is brain.

Third, image interpretation is typically performed by a radiologist or pathologist, who is also part of a multidisciplinary team where findings and their implications are discussed. Radiologists and pathologists are trained to interpret medical images and to understand the patient and the workflow. Thus, when it comes to understanding medical images, AI must learn from their expertise.
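
To make the second point concrete: for surgical planning, overall overlap with the reference standard (the Dice score) is a natural measure, whereas for follow-up imaging missing tumor tissue is the worst error, so sensitivity matters most. A minimal sketch with toy masks (hypothetical data, not a clinical evaluation):

```python
import numpy as np

def dice(pred: np.ndarray, ref: np.ndarray) -> float:
    """Overlap between predicted and reference-standard masks (0..1)."""
    intersection = np.logical_and(pred, ref).sum()
    return 2.0 * intersection / (pred.sum() + ref.sum())

def sensitivity(pred: np.ndarray, ref: np.ndarray) -> float:
    """Fraction of reference tumor voxels that the prediction found."""
    true_positives = np.logical_and(pred, ref).sum()
    return true_positives / ref.sum()

# Toy example: a prediction that misses part of the tumor.
ref = np.zeros((10, 10), dtype=bool)
ref[2:8, 2:8] = True             # reference-standard tumor
pred = np.zeros_like(ref)
pred[2:8, 2:6] = True            # prediction misses the right-hand part

print(f"Dice:        {dice(pred, ref):.2f}")
print(f"Sensitivity: {sensitivity(pred, ref):.2f}")
```

In this toy case the Dice score still looks reasonable (0.80), while the sensitivity (0.67) reveals that a third of the tumor was missed. Which number matters depends on the clinical purpose.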

Human interpretation often involves manual annotation. Before a machine learning algorithm can automatically interpret unseen images, it must first learn from annotated examples. These annotations are referred to as the reference standard, and this process is known as supervised learning. Image analysis requires:

Damned finest coffee

Data

Reference standard

Neural networks

Computational resources
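
How these pieces come together can be sketched in a few lines of PyTorch. This is a deliberately tiny, hypothetical example with placeholder data and a toy network, not a real segmentation pipeline:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Data + reference standard: image patches paired with expert annotations
# (random tensors here as stand-ins).
images = torch.randn(32, 1, 64, 64)
masks = (torch.rand(32, 1, 64, 64) > 0.7).float()
loader = DataLoader(TensorDataset(images, masks), batch_size=8, shuffle=True)

# Neural network: a deliberately tiny segmentation model.
model = nn.Sequential(
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 1, 1),
)

# Computational resources: a GPU if available, otherwise CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

for epoch in range(5):
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)  # compare prediction with reference standard
        loss.backward()
        optimizer.step()
```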

These are the basic building blocks. Despite all the amazing developments in AI, the brilliant new architectures and the massive computing power, we still need quality, representative data and a clear understanding of what we are looking at before AI can potentially take over the task. There is no shortcut. Also, behind every block there is a complete world of scientific progress and technical innovation. The field of medical imaging is constantly changing, making an image (and its interpretation!) essentially a snapshot in space and time.

Snapshot in spacetime

A few simple examples illustrate why this matters. Say a new segmentation method (a new AI model) has been developed on CT images with a certain resolution. If technological advancements enable scanning at much higher resolution, for example with photon-counting CT, then this model cannot be used as is on the new imaging data. Likewise, if an AI model has been trained on an adult population, it cannot be used as is on a pediatric population. All factors contributing to image creation, as well as who is looking, influence the AI model. Therefore, not only is the image a snapshot in spacetime, the AI model itself is as well: both are static. As local data and environments change, model performance declines over time, which requires monitoring once the model is deployed in clinical practice. Segmentation is thus an engineering problem; it cannot be solved once and for all the way a mathematical problem can.
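
One practical consequence is to check, before applying a model, whether an incoming scan matches the conditions the model was trained under. A minimal sketch with hypothetical metadata fields and thresholds (not a validated deployment check):

```python
# Hypothetical description of what the model was trained on.
TRAINED_ON = {
    "modality": "CT",
    "voxel_spacing_mm": (1.0, 1.0, 1.0),
    "min_patient_age": 18,
}

def training_condition_warnings(scan_meta: dict, tolerance: float = 0.25) -> list[str]:
    """Return warnings when a scan drifts away from the training conditions."""
    warnings = []
    if scan_meta["modality"] != TRAINED_ON["modality"]:
        warnings.append(f"modality {scan_meta['modality']} differs from {TRAINED_ON['modality']}")
    for got, expected in zip(scan_meta["voxel_spacing_mm"], TRAINED_ON["voxel_spacing_mm"]):
        if abs(got - expected) / expected > tolerance:
            warnings.append(f"voxel spacing {scan_meta['voxel_spacing_mm']} outside expected range")
            break
    if scan_meta["patient_age"] < TRAINED_ON["min_patient_age"]:
        warnings.append("pediatric patient, model was trained on adults")
    return warnings

# Example: a high-resolution photon-counting CT scan of a child.
print(training_condition_warnings(
    {"modality": "CT", "voxel_spacing_mm": (0.2, 0.2, 0.2), "patient_age": 9}
))
```

In a real deployment such upfront checks would be complemented by ongoing performance monitoring.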

But AI solves everything

No, AI does not solve everything. At least, perhaps, not until AI can discover new physics, a sign of truly understanding our world and universe. We are also impressed and excited by AI developments. The first time interacting with a large language model (LLM), such as ChatGPT, can feel like magic. LLMs belong to the class of generative AI, which includes models capable of generating text, images, audio, and video. The potential of generative AI in medicine is large, but eloquence and realism can be misleading. First, it is not always obvious when results are wrong. Ensuring correctness is a genuinely hard problem (and what is correct?), making patient safety a concern. Second, only big tech companies have the resources to build these models, which raises patient privacy concerns. Finally, the polish may obscure the fact that, under the hood, quality data and manual annotation (now called 'reinforcement learning with human feedback') are still crucial for training.

Foundation models

Foundation models play an important role in medical imaging; examples are TotalSegmentator and SAM. They work because images have more commonalities than not. For example, scanners operate on the same physical principles regardless of manufacturer, anatomy is similar regardless of age or race, and segmentation tasks rely on similar image gradients regardless of organ. Earlier models did not exhibit grokking; later models do, enabling zero-shot learning: segmenting unseen images without additional training. Foundation models are excellent starting points for segmentation, but they require fine-tuning for your application, since your patient data is unique. Do you trust SAM, trained on images scraped from the internet, to segment images of your patients? Probably not. Even with foundation models, there is no free lunch.
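
What fine-tuning for your application might look like, as a minimal hypothetical sketch in PyTorch; the pretrained encoder, its checkpoint, and the local data are placeholders, not the actual TotalSegmentator or SAM interfaces:

```python
import torch
from torch import nn

# Placeholder for a pretrained foundation-model encoder; in practice this would
# be loaded from a released checkpoint.
encoder = nn.Sequential(
    nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
)
# encoder.load_state_dict(torch.load("foundation_encoder.pt"))  # hypothetical checkpoint

# Freeze the foundation encoder: keep its general-purpose features.
for p in encoder.parameters():
    p.requires_grad = False

# Small task-specific head, trained on *your* annotated patient data.
head = nn.Conv2d(32, 1, kernel_size=1)
model = nn.Sequential(encoder, head)

optimizer = torch.optim.Adam(head.parameters(), lr=1e-4)
loss_fn = nn.BCEWithLogitsLoss()

# Placeholders for a (small) locally annotated dataset.
local_images = torch.randn(16, 1, 64, 64)
local_masks = (torch.rand(16, 1, 64, 64) > 0.7).float()

for epoch in range(10):
    optimizer.zero_grad()
    loss = loss_fn(model(local_images), local_masks)
    loss.backward()
    optimizer.step()
```

Freezing the pretrained encoder keeps its general-purpose features and limits how much local annotated data is needed; whether and how far to unfreeze it depends on how different your data is from what the foundation model was trained on.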

Congratulations on making it to the end. The above contains a wealth of ideas and information. If you have questions or wonder how this applies to your medical imaging problems, please feel free to reach out.