Health Care Data Analytics: Unit 6: Machine Learning and Natural Language Processing - Lecture C

Dr Chris Paton - Digital Health, Informatics & AI

[00:00]

welcome to component 24 healthcare data

[00:03]

analytics

[00:04]

this is unit 6 machine learning and

[00:06]

natural language processing and lecture

[00:09]

c

[00:10]

the component healthcare data analytics

[00:13]

covers the topic of healthcare data

[00:15]

analytics which applies the use of data

[00:17]

statistical and quantitative analysis

[00:20]

and explanatory and predictive models to

[00:22]

drive decisions and actions in

[00:24]

healthcare

[00:25]

the learning objectives for this unit

[00:27]

machine learning and natural language

[00:29]

processing

[00:30]

are to describe the major tasks for

[00:33]

which machine learning is used

[00:35]

compare and contrast the major

[00:37]

approaches for machine learning

[00:39]

describe the major tasks for which

[00:41]

natural language processing is used

[00:44]

and discuss the major approaches and

[00:46]

challenges for processing clinical

[00:48]

narratives

[00:50]

in the last lecture we started our

[00:52]

discussion of natural language

[00:54]

processing or nlp of clinical text we

[00:58]

began by looking at basic definitions

[01:00]

and approaches to nlp

[01:02]

this was followed by challenges in

[01:04]

processing the clinical narrative

[01:06]

now we will discuss various clinical nlp

[01:09]

approaches and projects

[01:10]

as well as describe alternatives and

[01:13]

future directions

[01:15]

let's talk about clinical nlp approaches

[01:17]

and projects

[01:19]

we'll begin by discussing a couple

[01:21]

original nlp projects

[01:23]

the linguistic string project which was

[01:25]

one of the first

[01:26]

large-scale attempts at clinical nlp and

[01:29]

the medical language extraction and

[01:31]

encoding

[01:31]

or MedLEE system MedLEE was developed

[01:34]

after the linguistic string project

[01:36]

and is used in operational clinical

[01:38]

settings

[01:40]

we'll describe some other nlp systems

[01:43]

we will also discuss a couple important

[01:46]

projects

[01:47]

the electronic medical records and

[01:48]

genomics or eMERGE network

[01:51]

and the i2b2 challenge evaluations

[01:54]

that'll be followed by a description of

[01:56]

some other research on nlp

[01:58]

issues and the results obtained also

[02:01]

there are a growing number of commercial

[02:03]

systems that have become available

[02:05]

including some integrated into ehr

[02:07]

products

[02:09]

as noted on the last slide the

[02:11]

linguistic string

[02:12]

project was one of the first large-scale

[02:14]

attempts to do nlp over clinical text

[02:17]

the project was started by Sager and

[02:19]

colleagues in the 1980s

[02:21]

and was based on work she and her

[02:23]

colleagues had done in analyzing

[02:25]

clinical documents

[02:26]

there were some presumptions about

[02:28]

clinical documents that the system was

[02:30]

built around

[02:31]

one was that technical documents in a

[02:33]

single field such as medicine

[02:35]

used only that subset of english grammar

[02:38]

and vocabulary that she called a

[02:39]

subgrammar

[02:41]

in fact in analyzing large numbers of

[02:43]

documents

[02:44]

it was believed that essentially all

[02:46]

statements in clinical documents could

[02:48]

be reduced to one of six

[02:49]

information formats these formats were

[02:52]

general medical management

[02:54]

treatment other than medication

[02:56]

medication

[02:58]

test and results patient state and

[03:00]

patient behavior

[03:03]

the system went through a number of

[03:04]

steps that would aim to take clinical

[03:06]

language and

[03:07]

map it into the meaning encoded in these

[03:09]

information formats

[03:11]

the first step was parsing which

[03:13]

consisted of labeling each word with a

[03:15]

syntactic category

[03:17]

such as verb noun etc the next step was

[03:20]

choosing a sublanguage that helped

[03:22]

disambiguate the words and sentences

[03:25]

this was followed by regularization of

[03:27]

the language

[03:28]

so the words were normalized into

[03:30]

equivalent forms

[03:32]

finally there was an information

[03:34]

formatting step where one of the six

[03:36]

information formats was selected if the

[03:39]

system unambiguously mapped into one of

[03:41]

those formats

[03:42]

it was then entered into a database this

[03:45]

slide shows an example for one of the

[03:47]

information formats

[03:49]

the medication information format is one

[03:52]

of the simpler ones

[03:53]

so it is more easily visible on a slide

[03:56]

the original text in the clinical

[03:58]

document was

[03:59]

patient was treated by ampicillin 500

[04:02]

milligrams

[04:02]

tid orally the medication information

[04:06]

format has slots for the patient

[04:08]

the medication the dose the frequency

[04:12]

the manner in which it was given and the

[04:14]

verb

[04:15]

as seen in the slide the text of the

[04:17]

sentence mapped into those slots

[04:19]

and gives a complete picture of one

[04:21]

medication used by this patient
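
The slot filling described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `MedicationFormat` class, the `fill_medication_format` function, and the regular expression are hypothetical names invented here, and the pattern covers only sentences shaped like the lecture's example. The Linguistic String Project itself used parsing and a sublanguage grammar, not a single regular expression.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class MedicationFormat:
    """Slots of the medication information format named in the lecture."""
    patient: str
    verb: str
    medication: str
    dose: str
    frequency: str
    manner: str


# Hypothetical pattern: it only recognizes sentences shaped like the
# lecture's example, with a small made-up list of frequencies and routes.
PATTERN = re.compile(
    r"(?P<patient>patient) was (?P<verb>treated) by "
    r"(?P<medication>\w+) (?P<dose>\d+ (?:milligrams|mg)) "
    r"(?P<frequency>tid|bid|qid|qd) (?P<manner>orally|intravenously)",
    re.IGNORECASE,
)


def fill_medication_format(sentence: str) -> Optional[MedicationFormat]:
    """Map a sentence into the format's slots, or return None if it does not fit."""
    match = PATTERN.search(sentence)
    if match is None:
        return None
    return MedicationFormat(**match.groupdict())


example = fill_medication_format(
    "patient was treated by ampicillin 500 milligrams tid orally"
)
```

Here `example.medication` holds the drug name and `example.frequency` holds the dosing frequency, mirroring the slots shown on the slide. Returning `None` for a sentence that does not fit reflects the lecture's point that only unambiguous mappings were entered into the database.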

[04:24]

Friedman developed a different approach

[04:26]

called MedLEE and

[04:28]

this approach used what is called a

[04:29]

semantic grammar

[04:31]

where the grammar is not focused on the

[04:33]

syntactic categories

[04:34]

but actually the semantic categories the

[04:37]

initial focus of MedLEE was on radiology

[04:40]

reports

[04:41]

but it has been extended to quite a

[04:42]

number of other applications

[04:45]

MedLEE goes through four steps that are

[04:47]

described in the paper cited here

[04:50]

there is a preprocessor that gets the

[04:52]

text ready for processing in a parser

[04:54]

which focuses on the semantic categories

[04:57]

such as medication and disease

[04:59]

rather than the syntactic categories

[05:02]

there is also a phrase regularizer that

[05:04]

normalizes the language

[05:06]

and then an encoder that attempts to

[05:08]

encode the language into controlled

[05:10]

vocabulary terms

[05:12]

after this processing is done the output

[05:14]

is sent to a clinical information system

[05:17]

where it may be used for statistical

[05:19]

aggregation decision support or other

[05:21]

functions

[05:23]

MedLEE has been evaluated extensively

[05:26]

including its performance in its very

[05:28]

first task of identifying conditions in

[05:30]

chest x-ray reports

[05:32]

that first evaluation study of MedLEE

[05:35]

looked at reports that had been coded by

[05:37]

three physicians each

[05:38]

and then compared MedLEE with that

[05:41]

approach

[05:42]

the recall of identifying correct

[05:44]

concepts was 70 percent

[05:46]

and the precision of concepts identified

[05:48]

was 87 percent

[05:50]

when the system was modified based on

[05:52]

these results

[05:53]

the recall improved to 85 percent while

[05:56]

the precision remained unchanged
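
For readers new to these two measures, the following is a minimal sketch of how recall and precision are computed when a system's output is compared against a reference coding. The function name `recall_precision` and the condition names are made up for illustration; they are not data from the MedLEE evaluation.

```python
from typing import Set, Tuple


def recall_precision(reference: Set[str], system: Set[str]) -> Tuple[float, float]:
    """Return (recall, precision) of system output against a reference coding."""
    true_positives = len(reference & system)
    # Recall: share of the reference concepts that the system found.
    recall = true_positives / len(reference) if reference else 0.0
    # Precision: share of the system's concepts that are in the reference.
    precision = true_positives / len(system) if system else 0.0
    return recall, precision


# Hypothetical codings of one chest x-ray report (illustrative only).
reference = {"pneumonia", "pleural effusion", "cardiomegaly", "atelectasis"}
system = {"pneumonia", "cardiomegaly", "atelectasis", "pneumothorax"}

recall, precision = recall_precision(reference, system)
```

In this toy case the system finds three of the four reference conditions and three of its four outputs are correct, so both measures are 0.75. Finding more of the reference concepts without adding wrong ones raises recall while leaving precision steady or higher, which is the pattern the lecture describes after the system was modified.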

[05:58]

a more comprehensive evaluation of

[06:00]

MedLEE with chest x-ray reports

[06:03]

measured what they called distance which

[06:05]

was the average number of conditions per

[06:07]

report

[06:08]

where the physicians disagreed across

[06:10]

different groups of individuals

[06:12]

these groups included internists

[06:14]

radiologists

[06:16]

laypersons and then a variety of

[06:18]

computer systems what this study showed

[06:21]

was that there was variation across all

[06:23]

of these individuals and also within

[06:25]

each category

[06:26]

so variation even within internists and

[06:29]

radiologists and lay people

[06:32]

the distance from which MedLEE was

[06:33]

relative to the human coders

[06:35]

was within the statistical confidence

[06:37]

interval meaning that the rate of

[06:39]

variation of MedLEE term recognition and

[06:41]

assignment was no different than the

[06:43]

rate of variation between different

[06:45]

humans

[06:47]

MedLEE has been extended to a number of

[06:49]

other applications

[06:50]

one of these is the parsing of

[06:52]

notational text

[06:54]

the terse kind of highly abbreviated

[06:56]

text that we see from specialists such

[06:58]

as ophthalmologists

[07:00]

MedLEE was found to perform better than

[07:02]

a specialized parser for ophthalmologist

[07:04]

notes concerning glaucoma

[07:06]

for six findings related to glaucoma

[07:09]

MedLEE had recall better than 80 percent

[07:11]

and 100 percent

[07:12]

precision MedLEE has also been adapted

[07:16]

to coding the locations where strokes

[07:18]

occur in the brain

[07:19]

performing comparably to manual coding

[07:22]

it's been extended to clinical documents

[07:24]

generally

[07:25]

and has also been extended to handle

[07:27]

temporal data that describes events that

[07:29]

occur

[07:30]

over time more recently it's been

[07:32]

combined with machine learning data to

[07:34]

be used in a number of areas

[07:37]

as mentioned earlier MedLEE is used

[07:39]

operationally in new york presbyterian

[07:41]

hospital

[07:43]

there are a number of other clinical nlp

[07:45]

systems

[07:46]

five of which are listed here some of

[07:49]

these are available as open source

[07:51]

software and can be downloaded for use

[07:54]

others are being developed into

[07:55]

commercial products

[07:57]

just to describe them briefly there is

[07:59]

the HITEx system that's part of the

[08:01]

i2b2 software suite

[08:03]

and is an open source system that can be

[08:05]

downloaded and used

[08:07]

there is KnowledgeMap which is part of

[08:09]

the eMERGE network from vanderbilt

[08:11]

university

[08:12]

there's MetaMap from the national

[08:14]

library of medicine

[08:16]

which maps text into terms from the

[08:18]

unified medical language system

[08:20]

or umls metathesaurus that can be

[08:23]

downloaded and used in an open source

[08:25]

manner

[08:26]

there is also the cTAKES system from

[08:28]

mayo clinic which is available as open

[08:31]

source

[08:31]

and the TIES system from the university

[08:33]

of pittsburgh

[08:35]

the eMERGE project has mostly been used

[08:37]

in clinical research settings

[08:39]

the eMERGE project aims to find

[08:41]

associations between the genotype

[08:44]

that is the genes in dna and the

[08:46]

phenotype

[08:47]

which is the set of characteristics that

[08:49]

are expressed in the living organism

[08:51]

the eMERGE network is a consortium that

[08:54]

aims to link the growing number of dna

[08:57]

biorepositories with phenotype

[08:59]

information extracted from ehr systems

[09:02]

the overall goal is large-scale

[09:05]

high-throughput genetic research

[09:08]

one way to get phenotypes out of the ehr

[09:11]

is to use icd-9 codes that are assigned

[09:13]

in diagnosis for each encounter

[09:16]

for a variety of reasons icd-9 codes are

[09:19]

inadequate

[09:20]

and there's much richer information in

[09:22]

the text the nlp system being used by

[09:25]

the eMERGE network has been found to be

[09:27]

more effective in correctly identifying

[09:30]

patient phenotypes than icd-9 codes

[09:33]

alone

[09:34]

there have been many results from the

[09:35]

eMERGE project

[09:37]

the initial work looked at replicating

[09:39]

findings of known gene-disease

[09:41]

associations that could be detected in

[09:43]

ehr data

[09:44]

subsequent work led to the discovery of

[09:47]

new associations and follow-on

[09:49]

biological research to assess those

[09:51]

hypothesized associations

[09:53]

another important result from eMERGE was

[09:56]

that the nlp algorithms have been found

[09:58]

to be easily transportable across

[10:00]

different institutions

[10:02]

and are not so uniquely tied into one

[10:04]

institution

[10:06]

finally the eMERGE project has given

[10:08]

rise to a new type of analysis

[10:10]

called phenome-wide association studies

[10:13]

or PheWAS

[10:14]

where many aspects of the patient

[10:16]

phenome that is

[10:18]

findings diseases and treatments that

[10:20]

the patient may have

[10:21]

are associated with a genome variant in

[10:24]

addition to the evaluations of the

[10:26]

systems already described

[10:28]

there's been another activity for the

[10:30]

clinical nlp community

[10:32]

which is the i2b2 challenge evaluations

[10:36]

challenge evaluations are common in many

[10:38]

areas of computer science

[10:40]

where a standardized task and data set

[10:42]

are developed

[10:43]

and different research groups compare

[10:45]

their results with others

[10:47]

many of these are done on an annual

[10:49]

basis which is what has happened for the

[10:51]

most part with i2b2

[10:54]

the bullets on the slides here list the

[10:56]

tasks that have been covered in the

[10:57]

challenge evaluations in the first year

[11:01]

the task was automated de-identification

[11:03]

of medical records

[11:05]

this was followed the next year by

[11:07]

identification of smoking status from

[11:09]

medical discharge summaries

[11:12]

in the following year the task focused

[11:14]

on identification of obesity

[11:16]

and its comorbidities this was followed

[11:19]

by extraction of medication information

[11:21]

the detection of relationships between

[11:23]

concepts or entities in clinical text

[11:26]

co-reference resolution and sentiment

[11:28]

classification

[11:29]

and more recently the identification of

[11:32]

temporal

[11:33]

or time-related relationships and most

[11:35]

recently

[11:36]

more de-identification as well as risk

[11:38]

factor detection

[11:40]

there has also been other research in

[11:42]

clinical nlp

[11:44]

one area of work has been negation

[11:46]

detection

[11:48]

as noted earlier clinical narratives are

[11:50]

full of negations

[11:52]

and the ability to detect those

[11:53]

negations is very important
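
To make the negation problem concrete, here is a deliberately simplified sketch of one rule-based way to flag a negated concept: look for a trigger phrase in the few words before the concept mention. The trigger list, the window size, and the name `is_negated` are assumptions chosen for illustration. Real clinical negation detectors use much larger trigger lists and also handle scope, so this should not be read as any specific published algorithm.

```python
import re

# Hypothetical, intentionally tiny trigger list and look-back window.
NEGATION_TRIGGERS = ("no", "denies", "without", "negative for")
WINDOW = 5  # number of tokens before the concept to inspect


def is_negated(sentence: str, concept: str) -> bool:
    """Return True if a trigger phrase occurs shortly before the concept."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    concept_tokens = concept.lower().split()
    n = len(concept_tokens)
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == concept_tokens:
            preceding = " ".join(tokens[max(0, i - WINDOW):i])
            # Pad with spaces so triggers match whole words only.
            if any(f" {t} " in f" {preceding} " for t in NEGATION_TRIGGERS):
                return True
    return False


negated = is_negated("Patient denies chest pain or shortness of breath.", "chest pain")
affirmed = is_negated("Patient reports chest pain.", "chest pain")
```

In this sketch `negated` is `True` and `affirmed` is `False`. A fixed window is a crude stand-in for negation scope, and a sentence such as "no fever but has cough" would wrongly mark "cough" as negated, which hints at why this remains an active research area.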

[11:56]

there is also the important problem of

[11:58]

syndromic surveillance of possible

[12:00]

outbreaks of disease

[12:01]

that are often detected through chief

[12:03]

complaints of patients presenting to

[12:05]

emergency departments

[12:07]

there is also the detection of

[12:08]

healthcare quality measures

[12:10]

that are oftentimes derived from text in

[12:13]

the clinical record

[12:15]

another area is clinical research for

[12:17]

example

[12:18]

finding patients who may have a

[12:20]

condition such as congestive heart

[12:22]

failure

[12:23]

or patients who have something like a

[12:24]

foot examination done

[12:26]

in the presence of diabetes other

[12:28]

research has looked at identification

[12:30]

of follow-up recommendations from

[12:32]

radiology reports

[12:34]

which sometimes are not seen by

[12:36]

clinicians when they should be

[12:38]

another area of research is handling

[12:40]

variations in language

[12:42]

such as abbreviation and other ambiguity

[12:44]

that occurs in clinical text

[12:47]

another question is whether clinical nlp

[12:49]

is ready for prime time

[12:51]

especially if it's going to be used

[12:53]

outside of informatics research settings

[12:55]

where we monitor closely how accurate

[12:58]

the algorithms are

[12:59]

Stanfill and colleagues performed a

[13:01]

systematic review of all

[13:03]

automated coding classification systems

[13:05]

through 2010

[13:07]

the recall and precision results of

[13:09]

those systems are shown on this slide

[13:12]

as can be seen while many systems were

[13:14]

highly accurate

[13:16]

many were not and it often depended on

[13:18]

the task being done

[13:20]

the disease or finding being studied and

[13:22]

so forth

[13:23]

but clearly if we're going to use nlp in

[13:26]

a large scale way

[13:27]

the algorithms need to be highly

[13:29]

accurate with both high recall and

[13:31]

precision

[13:33]

in closing what are alternatives to nlp

[13:36]

and future directions that may be taken

[13:39]

despite the successes that we've seen

[13:41]

we still know that clinical nlp systems

[13:44]

are limited

[13:45]

they are difficult to generalize

[13:47]

across subject domains

[13:48]

as we often need to develop new rules

[13:51]

new data etc for each new area of use

[13:54]

in addition we still really don't know

[13:57]

how good performance must be

[13:58]

before we can use them in a clinically

[14:00]

reliable manner

[14:02]

are 95 to 98 percent recall and precision good

[14:06]

enough

[14:07]

to test that we have to test the systems

[14:09]

in operational clinical settings

[14:11]

one alternative to nlp is to get

[14:14]

clinicians to enter more structured data

[14:16]

on the front end

[14:18]

approaches such as menu driven systems

[14:20]

for entering clinical data have been

[14:22]

tried for years

[14:23]

but are probably best in limited domains

[14:26]

especially when they take a great deal

[14:28]

of time for busy clinicians to use

[14:31]

clearly there's an important role in the

[14:32]

future for nlp

[14:34]

and as pointed out in the paper by

[14:36]

Chapman and colleagues

[14:38]

we need tools and shared tasks to define

[14:40]

that ultimate role

[14:42]

this concludes lecture c of machine

[14:45]

learning and natural language processing

[14:47]

in summarizing this lecture we learned

[14:50]

there have been many clinical nlp

[14:52]

systems but

[14:54]

only a small number are used

[14:55]

operationally for clinical care or

[14:57]

research

[14:59]

the performance of clinical nlp systems

[15:01]

is imperfect

[15:02]

and the adequate level of performance

[15:04]

for clinical use is not known

[15:07]

further research is required to

[15:09]

determine the optimal use of nlp in

[15:11]

healthcare

[15:12]

this also concludes the unit on machine

[15:15]

learning and natural language processing

[15:17]

in summarizing this unit we've seen that

[15:20]

being able to learn from data and

[15:22]

process data within text

[15:24]

are important aspects of applying data

[15:26]

analytics to healthcare

[15:28]

machine learning is the field focused on

[15:30]

learning from data

[15:32]

and can occur in a supervised or

[15:33]

unsupervised manner

[15:36]

natural language processing is the area

[15:38]

that aims to understand the text in

[15:40]

natural languages

[15:41]

and has many challenges in the clinical

[15:52]

domain

