Surgeon vs Machine or Surgeon with Machine:
Machine-Learning Applied to Acute Care Surgery
Caroline Park, MD, MPH, FACS1
Vikas Chowdhry, MBA2
1Department of Surgery, University of Texas Southwestern Medical Center/Parkland Memorial Hospital, Dallas, Texas
As acute care surgeons, we constantly triage, resuscitate, and operate with incomplete data. We need methods that inform the best care using patient-specific data and risk assessment. Further, we want data that help us assess the risk of venous thromboembolism (1), mortality (2-4), sepsis, and other important outcomes BEFORE they happen. The dilemma is that these data are far less useful 30 days, 90 days, or 1 year after the fact. Machine learning, when used properly, can help triage and risk-stratify patients in real time.
One of the most pressing and time-sensitive issues is resuscitation and assessment of trauma mortality within the first 24-48 hours. Most trauma mortality scoring systems and risk assessments are retrospective or require lengthy calculations that do not occur in real time. These include TRISS (5), the Trauma Score (6), the Injury Severity Score and Abbreviated Injury Scale (AIS), and ASCOT (7). All of these validated scoring systems are tremendously helpful for grading the injury severity of our patients at M&M or performance improvement conference, but they are not useful for treating the patient in front of you.
Machine-Learning - What is it?
Machine learning (ML) is a subfield of artificial intelligence (AI) and refers to a set of techniques that enable computers to acquire knowledge from data using mathematical models. When humans first label the data (input) to indicate the desired outcome (output), it is called supervised learning. Unsupervised learning occurs when the system instead analyzes patterns in the data, such as records in an EMR, without human-provided labels.
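To make the distinction concrete, here is a minimal Python sketch using scikit-learn. The vital-sign values and the ‘stable/unstable’ labels are invented purely for illustration: the supervised model learns from the human-supplied labels, while the clustering step finds groupings in the same measurements on its own.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

# Supervised learning: each input (heart rate, systolic BP) comes with a
# human-supplied output label. All values are invented for illustration.
X = [[110, 85], [125, 80], [70, 120], [65, 130], [118, 88], [72, 125]]
y = [1, 1, 0, 0, 1, 0]  # 1 = "unstable", 0 = "stable" (labels from humans)

clf = LogisticRegression().fit(X, y)
print(clf.predict([[120, 82]]))  # predicts a label it learned from the humans

# Unsupervised learning: the same measurements with NO labels; the algorithm
# groups similar patients without being told what the groups mean.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)  # cluster assignments discovered without human labels
```

Note that the unsupervised step only produces groupings; a human still has to interpret what each cluster represents.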
In simpler terms, “Machine learning is a subfield of artificial intelligence that gives computers the ability to learn without explicitly being programmed.”(8)
The more data a model has, the more ‘tries’ it has to improve. The chihuahua-versus-blueberry-muffin meme (9) is a good example of pattern recognition in supervised learning. Humans readily detect two eyes and a nose along with other features, like a snout and ears. A machine-learning algorithm in its early training phases might see every blueberry muffin as a chihuahua, but the more data it processes to ‘learn’ these subtle features, the more accurate it becomes at distinguishing an animal from a pastry.
Fig 1. Grid of chihuahuas and blueberry muffins. https://twitter.com/teenybiscuit/status/707727863571582978/photo/1 (Accessed August 1, 2022)
Machine Learning 101 – Types and Uses
It’s already here, probably in your hand as you read this article.
- Predictive text: you’re texting thora- to your colleague, and it suggests ‘thoracotomy’.
- Image recognition: the Face ID option when you sign in to your phone, even with your mask on. (10)
- Autonomous driving: using multiple sensors and calculations, cars can now park and drive themselves.
The list goes on.
Types of models (and how they work)
The types of models used for ML include regression, Bayesian networks, decision trees, and artificial neural networks (ANN). Deep learning, a family of techniques now used broadly from image recognition to natural language processing, is based on ANNs. These models vary substantially in their mathematical techniques, but they all fundamentally “learn” the same way: by acquiring knowledge from experience, where experience equals data.
A simple example is an ML model that classifies your email as spam. To develop the model, a large dataset of emails (called “training data”) that humans have previously classified as spam or not spam is fed to the algorithm. Based on various factors in each email (called ‘input features’), such as the subject, the sender’s domain, and certain words in the body, the model learns the patterns that indicate whether an email is spam (the output required from the model).
The model is then tested on a separate dataset, called “test data,” to see how well it performs on data it has not yet encountered. The model is expected to keep improving through the new data that an email vendor like Microsoft or Google is constantly gathering from your interactions with email. When you press the ‘spam’ button on an email the system inaccurately classified as not spam (or vice versa), you are providing labeled data for training future iterations of the model and, hopefully, improving its performance.
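The spam workflow above can be sketched in a few lines of Python with scikit-learn. The example emails and their labels are invented; a real filter would train on millions of messages and far richer features than word counts.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Training data: emails previously labeled by humans (invented examples)
train_emails = [
    "WIN a FREE prize now", "cheap meds limited offer",
    "claim your reward money", "meeting moved to 3pm",
    "draft of the trauma manuscript attached", "call schedule for next week",
]
train_labels = [1, 1, 1, 0, 0, 0]  # 1 = spam, 0 = not spam

# Turn each email's words into numeric input features (word counts)
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_emails)

model = MultinomialNB().fit(X_train, train_labels)

# "Test data": emails the model has never seen before
test_emails = ["free prize money now", "next week's call schedule"]
X_test = vectorizer.transform(test_emails)
print(model.predict(X_test))  # classifies the unseen emails
```

Every time a user corrects a misclassification, that email joins the labeled pool for the next round of training, which is exactly the feedback loop described above.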
Pros and Cons
Statistical performance of these models can vary widely based on the type of task at hand and the quality of the data; smaller datasets generally lead to decreased performance.
Machine-learning can save time with complex calculations for one patient or thousands.
Example: embedding Caprini scores into an EMR algorithm to flag patients at risk for venous thromboembolism (VTE) and automatically suggest VTE prophylaxis for patients who should be anticoagulated as soon as possible.
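A minimal sketch of how such a rule might sit inside an EMR pipeline follows. The point values and alert threshold below are a deliberately simplified stand-in for the real Caprini instrument (which weighs roughly 40 risk factors) and are not clinical guidance.

```python
# Deliberately simplified stand-in for the Caprini score: the real instrument
# has ~40 weighted risk factors. These items and weights are illustrative only.
def simplified_vte_risk_points(patient):
    points = 0
    if patient["age"] >= 61:
        points += 2
    elif patient["age"] >= 41:
        points += 1
    if patient["bmi"] >= 25:
        points += 1
    if patient["major_surgery"]:
        points += 2
    if patient["history_of_vte"]:
        points += 3
    return points

def emr_vte_alert(patient, threshold=5):
    """Return an alert string the EMR could surface, or None if below threshold."""
    points = simplified_vte_risk_points(patient)
    if points >= threshold:
        return f"High VTE risk ({points} pts): consider chemoprophylaxis"
    return None

print(emr_vte_alert({"age": 67, "bmi": 31,
                     "major_surgery": True, "history_of_vte": False}))
```

The value of embedding the calculation is that it runs automatically for every admitted patient, rather than only when a provider remembers to score one by hand.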
It can flag patients who may need additional resources and alert providers accordingly.
Example: a model that uses pre-hospital or admission vital signs to flag patients at risk of requiring massive transfusion and alerts providers and pertinent ancillary resources (i.e., the blood bank) to start thawing plasma before the order is placed.
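As a sketch, even a rule as simple as the shock index (heart rate divided by systolic blood pressure) could drive such a pre-alert. The 1.0 cutoff below is illustrative, not a validated trigger, and a deployed model would learn its thresholds from data rather than use a single hand-picked value.

```python
# Illustrative rule-based pre-alert; the cutoff is an assumption for this
# sketch, not a validated clinical trigger.
def shock_index(heart_rate, systolic_bp):
    return heart_rate / systolic_bp

def mtp_prealert(prehospital_vitals):
    """Flag patients whose pre-hospital vitals suggest massive transfusion risk,
    so the blood bank can be notified before any order is placed."""
    si = shock_index(prehospital_vitals["hr"], prehospital_vitals["sbp"])
    return si >= 1.0

if mtp_prealert({"hr": 128, "sbp": 84}):
    print("Notify blood bank: begin thawing plasma")
```

A learned model would replace the fixed cutoff with weights fitted to outcomes, but the integration point, firing before the human order, is the same.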
There are often trade-offs between statistical performance and other factors that contribute to usability and acceptability. These trade-offs should be discussed explicitly with all stakeholders who will use the model.
Example: a sepsis calculator built into an EMR flags patients who meet criteria (leukocytosis, tachypnea, hypotension, etc.) and triggers a sepsis alert in a post-op day 1 trauma patient who is bleeding or in pain. Such a screen may be highly sensitive, but at the expense of a low positive predictive value. Poor models can also cause alert fatigue, conditioning providers to ignore repeated warnings.
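Predictive values fall directly out of an alert's error counts. The counts below are invented for 1,000 hypothetical screened patients, and show how an alert can rarely miss a septic patient yet still fire mostly false alarms, which is the recipe for alert fatigue:

```python
# Hypothetical alert performance over 1,000 screened patients (invented counts)
true_pos, false_pos = 45, 150   # alerts fired: real sepsis vs. false alarms
false_neg, true_neg = 5, 800    # no alert: missed sepsis vs. truly well

sensitivity = true_pos / (true_pos + false_neg)  # 45/50  = 0.90
ppv = true_pos / (true_pos + false_pos)          # 45/195 ≈ 0.23 (mostly false alarms)
npv = true_neg / (true_neg + false_neg)          # 800/805 ≈ 0.99

print(f"sensitivity={sensitivity:.2f}, PPV={ppv:.2f}, NPV={npv:.2f}")
```

With these numbers, 3 of every 4 alerts are wrong even though only 1 in 10 septic patients is missed, which is why predictive values, not just sensitivity, belong in the stakeholder discussion.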
How are they used well (and used poorly)
Models that are widely adopted and contribute to positive outcomes are those that complement humans and integrate well into their workflow tools. Poorly used models fail to account for those factors and may focus too heavily on the statistical aspects of the model at the expense of everything else.
Example: a 30-day readmission model that simply displays a probability, without any information on the underlying factors, is not as helpful: providers cannot identify the top reasons for readmission and target those areas for improvement.
It is also critical to understand that these models learn patterns from clinical practice. Any biases, or care practices that do not follow standards of care, will naturally influence a machine-learning model. A good implementation team will include data scientists who are aware of these challenges and can apply statistical techniques to measure (and in some instances correct) these biases, or at least make deviations easy to identify for human review.
Anticipated Barriers to Implementation - What questions should I ask?
A machine-learning model that is integrated into your facility’s clinical systems will need strong support and approval from various parts of your organization. The barriers to implementation will be higher if the model is not out-of-the-box integrated with your EMR (it may have been developed by the research or informatics team). The following questions can help:
- What problems does this model solve for the organization?
- How does it help the end-users of the model?
- Does it impact clinical autonomy of end-users in any way?
- What is the level of effort needed to integrate with the EMR?
- If the model is offered by an outside vendor, does it come pre-integrated with your EMR vendor?
- If something goes wrong with the model, who will support it?
The Future for Acute Care Surgery
Currently, few models focus on acute care surgery patients, and fewer still are integrated with the electronic medical record (EMR). However, there are general-purpose models that can be applied to these workflows, including models for early diagnosis and prediction of sepsis, post-surgical infections, and falls, as well as early warning scores such as the Rothman Index (11). Because these models were not specifically validated on trauma cohorts, their statistical performance will vary, but they offer an immediate opportunity to implement machine learning in acute care surgery patients.
- He L, Luo L, Hou X, Liao D, Liu R, Ouyang C, Wang G. Predicting venous thromboembolism in hospitalized trauma patients: a combination of the Caprini score and data-driven machine learning model. BMC Emerg Med. 2021 May 10;21(1):60. doi: 10.1186/s12873-021-00447-x. PMID: 33971809; PMCID: PMC8111727.
- Lang EW, Pitts LH, Damron SL, Rutledge R. Outcome after severe head injury: an analysis of prediction based upon comparison of neural network versus logistic regression analysis. Neurol Res. 1997 Jun;19(3):274-80. doi: 10.1080/01616412.1997.11740813. PMID: 9192380.
- Nederpelt CJ, Mokhtari AK, Alser O, Tsiligkaridis T, Roberts J, Cha M, Fawley JA, Parks JJ, Mendoza AE, Fagenholz PJ, Kaafarani HMA, King DR, Velmahos GC, Saillant N. Development of a field artificial intelligence triage tool: Confidence in the prediction of shock, transfusion, and definitive surgical therapy in patients with truncal gunshot wounds. J Trauma Acute Care Surg. 2021 Jun;90(6):1054-1060. doi: 10.1097/TA.0000000000003155.
- Maurer LR, Bertsimas D, Bouardi HT, El Hechi M, El Moheb M, Giannoutsou K, Zhuo D, Dunn J, Velmahos GC, Kaafarani HMA. Trauma outcome predictor: An artificial intelligence interactive smartphone tool to predict outcomes in trauma patients. J Trauma Acute Care Surg. 2021 Jul 1;91(1):93-99. doi: 10.1097/TA.0000000000003158. PMID: 33755641.
- Boyd CR, Tolson MA, Copes WS. Evaluating trauma care: the TRISS method. Trauma Score and the Injury Severity Score. J Trauma. 1987; 27(4):370-8.
- Champion HR, Sacco WJ, Carnazzo AJ, Copes W, Fouty WJ. Trauma score. Crit Care Med. 1981 Sep;9(9):672-676.
- Champion HR, Copes WS, Sacco WJ, Lawnick MM, Bain LW, Gann DS, Gennarelli T, Mackenzie E, Schwaitzberg S. A new characterization of injury severity. J Trauma. 1990 May;30(5):539-45; discussion 545-6. doi: 10.1097/00005373-199005000-00003. PMID: 2342136.
- https://twitter.com/teenybiscuit/status/707727863571582978/photo/1 (Accessed August 1st, 2022)
- Mancosu M, Bobba G. Using deep-learning algorithms to derive basic characteristics of social media users: The Brexit campaign as a case study. PLoS One. 2019 Jan 25;14(1):e0211013. doi: 10.1371/journal.pone.0211013. PMID: 30682111; PMCID: PMC6347201.
- Rothman Index. https://clinicaltrials.gov/ct2/show/NCT04403737