AI In Cancer Diagnosis

A 2015 study funded by the United States National Institutes of Health investigated the accuracy of breast cancer diagnosis. In the study, 240 breast tissue biopsies were distributed among 115 pathologists, who examined the samples, interpreted what they saw, and determined the severity of each case. Samples were grouped into four categories: benign without atypia, atypia, ductal carcinoma in situ, and invasive cancer, the most dangerous of the four. The results were concerning: for atypia cases, the pathologists agreed with the expert reference diagnosis only 48 per cent of the time. At that rate, you might as well be flipping a coin for your diagnosis. Heads, and you could undergo an unnecessary mastectomy, potentially costing you hundreds of thousands of dollars. Tails, and you could miss the opportunity to detect and treat your cancer in its earliest phase. The stakes are exceptionally high, and the consequences of either error could be devastating. Can an algorithm do better?

Until recently, most software was written using “conventional programming,” in which a programmer spells out all the rules for the machine in advance. For example, imagine a machine identifying pictures of dogs from a set of twenty images. To do this, it must be told what defines a “dog.” It can be programmed with criteria like fur, a snout, a tail, four legs, and floppy ears. However, this approach has limitations. What if the dog is sitting, hiding its hind legs? Or if it has pointed ears instead of droopy ones? Or if its back is turned? And how does the computer differentiate dogs from other animals and objects with similar features? One solution would be to list every possible scenario for the computer to consider, which is what conventional programming requires, but this is impractical given the immense volume of rules and code involved. A better solution is needed.
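To make the limitation concrete, here is a minimal sketch of such a hand-written rule set in Python. The feature names (`has_fur`, `visible_legs`, `ear_shape` and so on) and the example animals are hypothetical, invented purely for illustration; a real system would first have to extract such features from pixels, which is a hard problem in itself.

```python
# Hypothetical hand-written rules. The feature names are illustrative
# assumptions, not the output of any real image-analysis library.
def looks_like_dog(features: dict) -> bool:
    """Return True only when every hard-coded rule is satisfied."""
    return bool(
        features.get("has_fur")
        and features.get("has_snout")
        and features.get("has_tail")
        and features.get("visible_legs") == 4
        and features.get("ear_shape") == "floppy"
    )


standing_labrador = {
    "has_fur": True,
    "has_snout": True,
    "has_tail": True,
    "visible_legs": 4,
    "ear_shape": "floppy",
}
# The same dog, sitting: the hind legs are hidden, so only two are visible.
sitting_labrador = {**standing_labrador, "visible_legs": 2}
# A dog with pointed ears breaks the "floppy ears" rule.
german_shepherd = {**standing_labrador, "ear_shape": "pointed"}
# A floppy-eared goat satisfies every rule without being a dog.
floppy_eared_goat = dict(standing_labrador)

for name, animal in [
    ("standing labrador", standing_labrador),
    ("sitting labrador", sitting_labrador),
    ("german shepherd", german_shepherd),
    ("floppy-eared goat", floppy_eared_goat),
]:
    print(f"{name}: {looks_like_dog(animal)}")
```

Two real dogs are rejected and a goat is accepted. Each failure could be patched with another rule, but every patch creates new exceptions, which is exactly the explosion of cases described above.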

This is where machine learning comes in. Supervised learning is a type of machine learning, and neural networks are one family of models commonly trained with it. It is a trial-and-error technique in which the computer builds on feedback from an external source: examples that humans have already labelled. This type of learning operates on the premise that the machine does not need to know all the rules beforehand. Initially, the machine knows very little about what a “dog” looks like. A picture is entered into the system, and the algorithm makes a guess. After each attempt, the guess is compared with the correct label, and the difference is fed back to the network. The algorithm then adapts to the new information, weakening the criteria that led to mistakes and reinforcing those that work. Another image is input and processed by the adjusted model. This process repeats until the accuracy stops improving and is high enough for the task.
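The guess, feedback, adjust cycle can be sketched with a perceptron, a single artificial neuron and the simplest building block of a neural network. This is an illustrative assumption rather than the method of any particular system: the two numeric features per example and their labels are made up, and a real image classifier would use far more features and many more neurons.

```python
# Toy labelled data: (feature_1, feature_2) -> 1 for "dog", 0 for "not dog".
# The numbers are invented for illustration only.
examples = [
    ((0.9, 0.8), 1),
    ((0.8, 0.6), 1),
    ((0.7, 0.9), 1),
    ((0.2, 0.1), 0),
    ((0.1, 0.3), 0),
    ((0.3, 0.2), 0),
]

weights = [0.0, 0.0]  # the machine starts out knowing nothing
bias = 0.0
learning_rate = 0.1


def predict(features):
    """Guess 1 ("dog") when the weighted sum of the features is positive."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if score > 0 else 0


for epoch in range(100):
    mistakes = 0
    for features, label in examples:
        error = label - predict(features)  # 0 when the guess was right
        if error != 0:
            mistakes += 1
            # Feedback step: nudge the weights towards the correct answer.
            for i, x in enumerate(features):
                weights[i] += learning_rate * error * x
            bias += learning_rate * error
    if mistakes == 0:  # a full pass with no errors: stop training
        break

accuracy = sum(predict(f) == y for f, y in examples) / len(examples)
print(f"stopped after epoch {epoch + 1}, training accuracy {accuracy:.0%}")
```

No rule about fur or ears is ever written down. The weights are shaped entirely by the corrections, and the loop ends once a full pass over the examples produces no mistakes.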

Within supervised learning, there are two main kinds of task, which produce different outputs: “classification” and “regression.” Classification involves sorting data into predefined categories, or labels, based on its characteristics. It answers questions like “What is this?” by assigning each data point to a specific class. For instance, classification can be used to determine whether an incoming email is spam, to recognise whether an image shows a face, a document, or a tree, or to evaluate website activity and identify high-value customers among visitors. By training algorithms on labelled examples, classification lets large volumes of data be categorised quickly and consistently. This is precisely what is needed for identifying cancerous cells in breast tissue. Regression follows a similar approach but predicts numerical quantities, answering questions like “how much,” “how many,” and “how long.” It is commonly used to forecast values such as the sale price of a home, considering factors like location, square footage, condition, and proximity to amenities.
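The difference in output type is easy to see side by side. The sketch below uses two deliberately simple stand-in techniques, a nearest-neighbour classifier for the spam example and a one-variable least-squares line for the house-price example. All the data is invented, and the house prices are made perfectly linear so the fitted line is easy to check by hand; real prices depend on many factors at once.

```python
import math

# --- Classification: the output is a category ---
# Hypothetical features per email: (number of links, number of exclamation marks).
spam_examples = [
    ((8, 5), "spam"),
    ((6, 7), "spam"),
    ((0, 1), "not spam"),
    ((1, 0), "not spam"),
]


def classify_email(features):
    """Return the label of the closest labelled example (1-nearest neighbour)."""
    nearest = min(spam_examples, key=lambda ex: math.dist(ex[0], features))
    return nearest[1]


# --- Regression: the output is a number ---
# Made-up, perfectly linear data: price = 100 * square_feet + 50,000.
square_feet = [1000, 1500, 2000, 2500]
sale_price = [150_000, 200_000, 250_000, 300_000]

mean_x = sum(square_feet) / len(square_feet)
mean_y = sum(sale_price) / len(sale_price)
# Ordinary least-squares slope and intercept for a single input variable.
slope = sum(
    (x - mean_x) * (y - mean_y) for x, y in zip(square_feet, sale_price)
) / sum((x - mean_x) ** 2 for x in square_feet)
intercept = mean_y - slope * mean_x


def predict_price(sqft):
    """Return the price the fitted line gives for a home of this size."""
    return intercept + slope * sqft


print(classify_email((7, 6)))  # a label
print(predict_price(1800))     # a quantity
```

The classifier can only ever answer with one of the labels it was trained on, while the regression can return any number along the fitted line.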

Identifying and categorising breast tissue biopsies with machine learning calls for supervised learning and, specifically, classification. The training data would consist of biopsy images, each labelled with its level of severity. These labels must be “accurate” and “clean,” because the model can only be as reliable as the examples it learns from. By analysing each image together with its label, the model establishes a set of criteria that it then uses to classify unlabelled pictures. As in the dog example, when an unlabelled picture is fed into the program, the computer must determine the seriousness of the cancerous cells in the breast tissue, if any are present. During training, each prediction is compared with the known label: criteria that led to a correct classification are reinforced, and those that led to an error are adjusted or discarded. This process repeats until accuracy on images the model has not seen before is very high, at which point the trained model could be considered for diagnosing real-life patients.
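The overall workflow (labelled examples, training, then checking accuracy on examples held back from training) can be sketched as follows. This is a toy under loud assumptions, not a diagnostic model: the two numeric "features" per sample are synthetic random numbers standing in for whatever would be measured from a real biopsy image, the class centres are arbitrary, and a simple nearest-centroid classifier stands in for a neural network.

```python
import math
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Arbitrary, well-separated centres for the four categories (an assumption).
CLASS_CENTRES = {
    "benign without atypia": (0.0, 0.0),
    "atypia": (5.0, 0.0),
    "ductal carcinoma in situ": (0.0, 5.0),
    "invasive cancer": (5.0, 5.0),
}


def make_samples(centre, n):
    """Generate n synthetic two-feature samples scattered around a centre."""
    return [
        (rng.gauss(centre[0], 0.5), rng.gauss(centre[1], 0.5)) for _ in range(n)
    ]


# Build labelled data and hold back a quarter of each class for testing.
train, test = [], []
for label, centre in CLASS_CENTRES.items():
    samples = make_samples(centre, 20)
    train += [(s, label) for s in samples[:15]]
    test += [(s, label) for s in samples[15:]]

# "Training": summarise each class by the average of its labelled examples.
centroids = {}
for label in CLASS_CENTRES:
    points = [s for s, y in train if y == label]
    centroids[label] = tuple(sum(axis) / len(points) for axis in zip(*points))


def classify(sample):
    """Assign the sample to the class whose centroid is closest."""
    return min(centroids, key=lambda label: math.dist(centroids[label], sample))


# Evaluation uses only samples the model never saw during training.
correct = sum(classify(s) == y for s, y in test)
test_accuracy = correct / len(test)
print(f"accuracy on {len(test)} held-out samples: {test_accuracy:.0%}")
```

The held-out test set is the important design choice. A model that merely memorised its training images would score well on them and still fail on a new patient, so accuracy only counts when it is measured on data the model has not seen.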