Health Care Data Analytics: Unit 6: Machine Learning and Natural Language Processing - Lecture C
Dr Chris Paton - Digital Health, Informatics & AI
Welcome to Component 24, Health Care Data Analytics. This is Unit 6, Machine Learning and Natural Language Processing, Lecture C.
The component Health Care Data Analytics covers the topic of health care data analytics, which applies the use of data, statistical and quantitative analysis, and explanatory and predictive models to drive decisions and actions in health care.
The learning objectives for this unit, Machine Learning and Natural Language Processing, are to describe the major tasks for which machine learning is used; compare and contrast the major approaches for machine learning; describe the major tasks for which natural language processing is used; and discuss the major approaches and challenges for processing clinical narratives.
In the last lecture, we started our discussion of natural language processing, or NLP, of clinical text. We began by looking at basic definitions and approaches to NLP. This was followed by challenges in processing the clinical narrative. Now we will discuss various clinical NLP approaches and projects, and describe alternatives and future directions.
Let's talk about clinical NLP approaches and projects. We'll begin by discussing a couple of original NLP projects. The first is the Linguistic String Project, which was one of the first large-scale attempts at clinical NLP. The second is the Medical Language Extraction and Encoding System, or MedLEE. MedLEE was developed after the Linguistic String Project and is used in operational clinical settings. We'll describe some other NLP systems. We will also discuss a couple of important projects: the Electronic Medical Records and Genomics, or eMERGE, network and the i2b2 challenge evaluations. That will be followed by a description of some other research on NLP issues and the results obtained. Also, a growing number of commercial systems have become available, including some integrated into EHR products.
As noted on the last slide, the Linguistic String Project was one of the first large-scale attempts to do NLP over clinical text. The project was started by Sager and colleagues in the 1980s. It was based on work she and her colleagues had done in analyzing clinical documents. The system was built around some presumptions about clinical documents. One was that technical documents in a single field, such as medicine, used only a subset of English grammar and vocabulary, which she called a subgrammar. In fact, from analyzing large numbers of documents, it was believed that essentially all statements in clinical documents could be reduced to one of six information formats: general medical management, treatment other than medication, medication, tests and results, patient state, and patient behavior.
The system went through a number of steps that aimed to take clinical language and map it into the meaning encoded in these information formats. The first step was parsing, which consisted of labeling each word with a syntactic category, such as verb or noun. The next step was choosing a sublanguage, which helped disambiguate the words and sentences. This was followed by regularization of the language, in which words were normalized into equivalent forms. Finally, there was an information formatting step, in which one of the six information formats was selected. If the system mapped unambiguously into one of those formats, the result was entered into a database.
This slide shows an example for one of the information formats. The medication information format is one of the simpler ones, so it is easier to show on a slide. The original text in the clinical document was "patient was treated by ampicillin 500 milligrams tid orally," where tid means three times a day. The medication information format has slots for the patient, the medication, the dose, the frequency, the manner in which it was given, and the verb. As seen in the slide, the text of the sentence mapped into those slots and gives a complete picture of one medication used by this patient.
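As an illustration only, the sketch below fills the medication format's slots for the example sentence using one regular expression and two small abbreviation tables. It is not the Linguistic String Project's method, which relied on full syntactic parsing and a sublanguage grammar. The slot names, the regex, and the `FREQUENCY_TERMS` and `MANNER_TERMS` tables are assumptions made for this example. Mirroring the rule that a format was only entered when the mapping was unambiguous, the sketch returns `None` whenever a slot cannot be filled or regularized.

```python
import re
from typing import Optional

# Hypothetical regularization tables: abbreviation -> normalized form.
FREQUENCY_TERMS = {"qd": "once daily", "bid": "twice daily", "tid": "three times daily"}
MANNER_TERMS = {"orally": "oral", "po": "oral", "iv": "intravenous"}

# Toy pattern for sentences shaped like the slide's example (an assumption, not the LSP grammar).
MEDICATION_PATTERN = re.compile(
    r"(?P<patient>patient)\s+was\s+(?P<verb>treated)\s+(?:by|with)\s+"
    r"(?P<medication>[a-z]+)\s+(?P<dose>\d+\s*(?:milligrams|mg))\s+"
    r"(?P<frequency>[a-z]+)\s+(?P<manner>[a-z]+)",
    re.IGNORECASE,
)


def medication_format(sentence: str) -> Optional[dict]:
    """Fill the medication-format slots, or return None if the mapping is not unambiguous."""
    match = MEDICATION_PATTERN.search(sentence)
    if match is None:
        return None
    slots = {name: value.lower() for name, value in match.groupdict().items()}
    # Regularization: an unknown frequency or manner term counts as ambiguous.
    if slots["frequency"] not in FREQUENCY_TERMS or slots["manner"] not in MANNER_TERMS:
        return None
    slots["frequency"] = FREQUENCY_TERMS[slots["frequency"]]
    slots["manner"] = MANNER_TERMS[slots["manner"]]
    return slots


print(medication_format("Patient was treated by ampicillin 500 milligrams tid orally."))
```

A hand-written pattern like this breaks as soon as the wording changes. That brittleness is exactly why the LSP relied on a grammar rather than surface patterns.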
Friedman developed a different approach, called MedLEE. It used what is called a semantic grammar, which focuses on semantic categories rather than syntactic categories. The initial focus of MedLEE was on radiology reports, but it has been extended to quite a number of other applications.
MedLEE goes through four steps that are described in the paper cited here. First, a preprocessor gets the text ready for processing. Second, a parser focuses on semantic categories, such as medication and disease, rather than syntactic categories. Third, a phrase regularizer normalizes the language. Fourth, an encoder encodes the language into controlled vocabulary terms. After this processing is done, the output is sent to a clinical information system, where it may be used for statistical aggregation, decision support, or other functions.
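To make the four stages concrete, here is a toy pipeline with the same shape: preprocess, parse by semantic category, regularize, and encode. Everything in it is invented for illustration and is not MedLEE's actual grammar or vocabulary. That includes the lexicon, the category names (`finding`, `disease`, `bodyloc`, `certainty`), the synonym table, and the `VOCAB:` codes. A real semantic grammar also captures structure such as body locations, modifiers, and relations, all of which this sketch ignores.

```python
import re

# Hypothetical semantic lexicon: word -> semantic category.
SEMANTIC_LEXICON = {
    "infiltrate": "finding", "opacity": "finding", "effusion": "finding",
    "pneumonia": "disease", "lung": "bodyloc", "lobe": "bodyloc",
    "no": "certainty", "possible": "certainty",
}
# Hypothetical regularization table: variant term -> normalized term.
SYNONYMS = {"opacity": "infiltrate"}
# Hypothetical controlled-vocabulary codes, invented for this example.
VOCABULARY = {"infiltrate": "VOCAB:001", "effusion": "VOCAB:002", "pneumonia": "VOCAB:003"}


def preprocess(text):
    """Stage 1: lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def parse(tokens):
    """Stage 2: keep only tokens that carry a semantic category."""
    return [(tok, SEMANTIC_LEXICON[tok]) for tok in tokens if tok in SEMANTIC_LEXICON]


def regularize(parsed):
    """Stage 3: map synonyms onto a single normalized term."""
    return [(SYNONYMS.get(tok, tok), cat) for tok, cat in parsed]


def encode(regularized):
    """Stage 4: emit coded terms, each tagged with the most recent certainty modifier."""
    output, certainty = [], "present"
    for term, category in regularized:
        if category == "certainty":
            certainty = "absent" if term == "no" else term
        elif term in VOCABULARY:
            output.append({"term": term, "code": VOCABULARY[term], "certainty": certainty})
            certainty = "present"  # a modifier applies to the next coded term only
    return output


print(encode(regularize(parse(preprocess("Possible opacity in the left lower lobe. No effusion.")))))
```

Keeping each stage as a separate function mirrors the pipeline described above. Each stage can be replaced or evaluated independently.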
MedLEE has been evaluated extensively, starting with its very first task, identifying conditions in chest x-ray reports. That first evaluation study used reports that had each been coded by three physicians and compared MedLEE's output with the physicians' coding. The recall for identifying correct concepts was 70 percent, and the precision of the concepts identified was 87 percent. When the system was modified based on these results, recall improved to 85 percent while precision remained unchanged.
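Recall and precision, as used above, have standard definitions. Recall is the fraction of reference concepts the system found, TP / (TP + FN). Precision is the fraction of system-identified concepts that are correct, TP / (TP + FP). The sets below are hypothetical and are not data from the MedLEE study. Real evaluations usually pool these counts across all reports in the test set.

```python
def recall_precision(system: set, reference: set) -> tuple[float, float]:
    """Recall = TP / (TP + FN); precision = TP / (TP + FP), over sets of coded concepts."""
    true_positives = len(system & reference)
    recall = true_positives / len(reference) if reference else 0.0
    precision = true_positives / len(system) if system else 0.0
    return recall, precision


# Hypothetical report: physicians' reference coding versus a system's output.
reference = {"pneumonia", "pleural effusion", "cardiomegaly", "atelectasis"}
system = {"pneumonia", "pleural effusion", "cardiomegaly"}
print(recall_precision(system, reference))  # → (0.75, 1.0)
```

Here the missed concept (atelectasis) lowers recall but not precision. A spurious extra concept would do the opposite.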
A more comprehensive evaluation of MedLEE with chest x-ray reports measured what the authors called distance. Distance was the average number of conditions per report on which two subjects disagreed, and it was measured across different groups of individuals. These groups included internists, radiologists, laypersons, and a variety of computer systems. The study showed that there was variation across all of these groups and also within each category. There was variation even among internists, among radiologists, and among laypersons. MedLEE's distance from the human coders was within the statistical confidence interval. In other words, MedLEE's variation in recognizing and assigning terms was no different from the variation between different humans.
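The distance measure can be sketched as follows, assuming each subject marks a set of conditions as present in each report. The distance between two subjects is the mean, over reports, of the number of conditions on which they disagree. A subject's distance from a group is its mean distance to the group's members. This is a simplified reading of the measure described above. The published study's exact definition and its confidence-interval computation are not reproduced here, and the codings are hypothetical.

```python
from statistics import mean

# A coding maps report id -> set of conditions marked present (hypothetical layout).
Coding = dict[str, set[str]]


def pairwise_distance(a: Coding, b: Coding) -> float:
    """Average number of conditions per report on which two subjects disagree."""
    # Symmetric difference = conditions marked present by exactly one subject.
    return mean(len(a[report] ^ b[report]) for report in a)


def distance_from_group(subject: Coding, group: list[Coding]) -> float:
    """Mean distance between one subject (human or system) and each member of a group."""
    return mean(pairwise_distance(subject, member) for member in group)


physicians = [
    {"r1": {"pneumonia"}, "r2": {"effusion", "cardiomegaly"}},
    {"r1": {"pneumonia", "atelectasis"}, "r2": {"effusion"}},
]
system = {"r1": {"pneumonia"}, "r2": {"effusion"}}
print(distance_from_group(system, physicians))  # → 0.5
```

Comparing the system's distance with the distances among the physicians themselves is what allows a statement like "no different from the variation between humans."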
MedLEE has been extended to a number of other applications. One of these is the parsing of notational text, the terse, highly abbreviated kind of text that we see from specialists such as ophthalmologists. MedLEE was found to perform better than a specialized parser on ophthalmologists' notes concerning glaucoma. For six findings related to glaucoma, MedLEE had recall better than 80 percent and 100 percent precision. MedLEE has also been adapted to coding the locations where strokes occur in the brain, performing comparably to manual coding. It has been extended to clinical documents generally, and also to handling temporal data that describe events occurring over time. More recently, it has been combined with machine learning for use in a number of areas. As mentioned earlier, MedLEE is used operationally at New York-Presbyterian Hospital.
There are a number of other clinical NLP systems, five of which are listed here. Some of these are available as open-source software and can be downloaded for use; others are being developed into commercial products. To describe them briefly:
- HITEx is part of the i2b2 software suite and is an open-source system that can be downloaded and used.
- KnowledgeMap, from Vanderbilt University, is part of the eMERGE network.
- MetaMap, from the National Library of Medicine, maps text into terms from the Unified Medical Language System (UMLS) Metathesaurus, and it can be downloaded and used in an open-source manner.
- cTAKES, from Mayo Clinic, is available as open source.
- TIES comes from the University of Pittsburgh.
The eMERGE project has mostly been used in clinical research settings. It aims to find associations between the genotype, that is, the genes in DNA, and the phenotype, which is the set of characteristics expressed in the living organism. The eMERGE network is a consortium that aims to link the growing number of DNA biorepositories with phenotype information extracted from EHR systems. The overall goal is large-scale, high-throughput genetic research.
One way to get phenotypes out of the EHR is to use the ICD-9 codes assigned as diagnoses for each encounter. For a variety of reasons, ICD-9 codes are inadequate, and there is much richer information in the text. The NLP system used by the eMERGE network has been found to be more effective at correctly identifying patient phenotypes than ICD-9 codes alone.
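A common pattern is to combine billing codes with NLP-derived evidence from the notes. The sketch below is a hypothetical rule, not an actual eMERGE phenotype algorithm. It treats a patient as a case when an ICD-9 code in the target category appears on at least two encounters, or when the notes contain an affirmed (non-negated) mention of the condition. The threshold, the `Patient` record layout, and the assumption that an upstream NLP step supplies `(concept, negated)` pairs are all illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class Patient:
    patient_id: str
    # One list of ICD-9 codes per encounter.
    icd9_codes_by_encounter: list[list[str]] = field(default_factory=list)
    # Output of a hypothetical upstream NLP step: (concept, negated?) pairs from the notes.
    note_mentions: list[tuple[str, bool]] = field(default_factory=list)


def is_case(patient: Patient, code_prefix: str, concept: str, min_coded_encounters: int = 2) -> bool:
    """Hypothetical rule: enough coded encounters OR at least one affirmed NLP mention."""
    coded_encounters = sum(
        any(code.startswith(code_prefix) for code in codes)
        for codes in patient.icd9_codes_by_encounter
    )
    affirmed_mention = any(c == concept and not negated for c, negated in patient.note_mentions)
    return coded_encounters >= min_coded_encounters or affirmed_mention


# ICD-9-CM category 250 is diabetes mellitus; "401.9" stands in for an unrelated code.
coded_once = Patient("p1", [["250.00"], ["401.9"]], [("diabetes mellitus", False)])
ruled_out = Patient("p2", [["250.00"]], [("diabetes mellitus", True)])
print(is_case(coded_once, "250", "diabetes mellitus"), is_case(ruled_out, "250", "diabetes mellitus"))
```

The first patient has only one coded encounter, but the notes confirm the condition. For the second, the notes negate it, so a single code is not enough evidence.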
There have been many results from the eMERGE project. The initial work looked at replicating known gene-disease associations that could be detected in EHR data. Subsequent work led to the discovery of new associations and to follow-on biological research to assess those hypothesized associations. Another important result from eMERGE was that the NLP algorithms have been found to be easily transportable across different institutions, rather than being tied to one institution. Finally, the eMERGE project has given rise to a new type of analysis, called phenome-wide association studies, or PheWAS. In a PheWAS, many aspects of the patient phenome, that is, the findings, diseases, and treatments that the patient may have, are tested for association with a genome variant.
In addition to the evaluations of the systems already described, there has been another activity for the clinical NLP community: the i2b2 challenge evaluations.
Challenge evaluations are common in many areas of computer science. A standardized task and data set are developed, and different research groups compare their results with one another. Many of these evaluations are run annually, which has mostly been the case with i2b2. The bullets on the slide list the tasks that have been covered in the challenge evaluations. In the first year, the task was automated de-identification of medical records. This was followed the next year by identification of smoking status from medical discharge summaries. In the following year, the task focused on identification of obesity and its comorbidities. Later tasks covered extraction of medication information; detection of relationships between concepts or entities in clinical text; co-reference resolution; and sentiment classification. More recently, the tasks included identification of temporal, or time-related, relationships. Most recently, they have included further de-identification as well as risk factor detection.
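De-identification, the first i2b2 task, can be illustrated with a rule-based sketch that replaces a few identifier patterns with category placeholders. The patterns below cover dates, phone numbers, and an "MRN" format, all of which are assumptions. They also cover only a small fraction of the identifier types that must be removed. Practical de-identification systems are considerably more sophisticated, so a handful of regular expressions like these should not be relied on for real records.

```python
import re

# Hypothetical identifier patterns; real de-identification covers many more identifier types.
PHI_PATTERNS = [
    ("DATE", re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b")),
    ("PHONE", re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b")),
    ("MRN", re.compile(r"\bMRN:?\s*\d{6,10}\b", re.IGNORECASE)),
]


def deidentify(text: str) -> str:
    """Replace each matched identifier with a bracketed category placeholder."""
    for label, pattern in PHI_PATTERNS:
        text = pattern.sub(f"[{label}]", text)
    return text


print(deidentify("Seen 03/14/2012, MRN 12345678. Call 555-123-4567 with results."))
# → Seen [DATE], [MRN]. Call [PHONE] with results.
```

Replacing identifiers with category labels, rather than deleting them, keeps the note readable and preserves the information that a date or number was present.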
There has also been other research in clinical NLP. One area of work has been negation detection. As noted earlier, clinical narratives are full of negations, and the ability to detect those negations is very important.
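A widely used rule-based strategy is NegEx-style negation detection. It finds a negation trigger phrase and treats target concepts within a short window after it as negated, unless a termination word such as "but" closes the scope first. The sketch below is a simplified version and not the published NegEx algorithm. The trigger list, the termination terms, and the five-token window are illustrative assumptions, and concepts are assumed to be lowercase word phrases.

```python
import re

# Illustrative trigger phrases, listed longest first so the first match is the longest.
NEGATION_TRIGGERS = ["no evidence of", "negative for", "denies", "without", "no"]
# Words that end a negation scope early (illustrative).
TERMINATION_TERMS = {"but", "however", "although"}
WINDOW = 5  # maximum number of tokens after a trigger that can fall in its scope


def negated_concepts(sentence: str, concepts: list[str]) -> set[str]:
    """Return the concepts that fall inside the scope of a negation trigger."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    negated: set[str] = set()
    i = 0
    while i < len(tokens):
        trigger = next(
            (t.split() for t in NEGATION_TRIGGERS if tokens[i:i + len(t.split())] == t.split()),
            None,
        )
        if trigger is None:
            i += 1
            continue
        start = i + len(trigger)
        scope = []
        for token in tokens[start:start + WINDOW]:
            if token in TERMINATION_TERMS:
                break
            scope.append(token)
        scope_text = " ".join(scope)
        negated.update(c for c in concepts if re.search(rf"\b{re.escape(c)}\b", scope_text))
        i = start
    return negated


print(negated_concepts("Patient denies chest pain but reports shortness of breath.",
                       ["chest pain", "shortness of breath"]))
```

The termination word is what keeps "shortness of breath" affirmed here. Without it, a fixed window alone would wrongly negate concepts that follow "but".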
There is also the important problem of syndromic surveillance of possible disease outbreaks. Outbreaks are often detected through the chief complaints of patients presenting to emergency departments.
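Chief-complaint-based surveillance can be sketched in two steps. First, map each free-text complaint to syndrome categories by keyword. Then count visits per syndrome per day so that unusual increases stand out. The syndrome names, keyword lists, and visit data below are hypothetical. Deployed systems use much larger vocabularies, handle misspellings and abbreviations more carefully, and apply statistical aberration detection to the counts.

```python
from collections import Counter

# Hypothetical syndrome keyword lists (substrings, so "wheez" matches "wheezing").
SYNDROME_KEYWORDS = {
    "respiratory": ["cough", "sob", "shortness of breath", "wheez"],
    "gastrointestinal": ["vomit", "diarrhea", "nausea", "abd pain"],
    "fever": ["fever", "febrile", "chills"],
}


def classify_complaint(complaint: str) -> set[str]:
    """Return every syndrome whose keywords appear as substrings of the complaint."""
    text = complaint.lower()
    return {syndrome for syndrome, words in SYNDROME_KEYWORDS.items()
            if any(word in text for word in words)}


def daily_syndrome_counts(visits: list[tuple[str, str]]) -> Counter:
    """Count (date, syndrome) pairs across emergency department visits."""
    counts = Counter()
    for date, complaint in visits:
        for syndrome in classify_complaint(complaint):
            counts[(date, syndrome)] += 1
    return counts


visits = [
    ("2024-01-05", "Cough and fever x3 days"),
    ("2024-01-05", "SOB, chills"),
    ("2024-01-05", "Vomiting since last night"),
]
print(daily_syndrome_counts(visits))
```

Plain substring matching is deliberately crude; for example, "sob" would also match inside an unrelated word. That kind of error is one reason chief-complaint classifiers are evaluated carefully before use.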
There is also the detection of health care quality measures, which are often derived from text in the clinical record.
Another area is clinical research. Examples include finding patients who may have a condition such as congestive heart failure, or finding patients with diabetes who have had something like a foot examination done. Other research has looked at identifying follow-up recommendations in radiology reports, since clinicians sometimes do not see these recommendations when they should.
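A first pass at finding follow-up recommendations is often pattern matching. The sketch below flags report sentences that contain both a recommendation verb and a follow-up cue, and skips sentences that explicitly say no further action is needed. The cue patterns and the sample report are assumptions for illustration. A real system would need to handle far more phrasings and hedged wording, and would be evaluated against annotated reports.

```python
import re

# Hypothetical cue patterns for follow-up recommendations.
RECOMMEND = re.compile(r"\b(recommend(?:ed|s)?|suggest(?:ed|s)?|advise[ds]?)\b", re.IGNORECASE)
FOLLOW_UP = re.compile(
    r"\b(follow[- ]?up|repeat|further evaluation|in \d+ (?:weeks?|months?))\b", re.IGNORECASE
)
NO_ACTION = re.compile(r"\bno (?:further|follow[- ]?up)\b", re.IGNORECASE)


def follow_up_sentences(report: str) -> list[str]:
    """Return sentences that appear to recommend follow-up imaging or evaluation."""
    sentences = re.split(r"(?<=[.!?])\s+", report.strip())
    return [s for s in sentences
            if RECOMMEND.search(s) and FOLLOW_UP.search(s) and not NO_ACTION.search(s)]


report = ("4 mm nodule in the right upper lobe. Recommend follow-up CT in 12 months. "
          "No pleural effusion.")
print(follow_up_sentences(report))  # → ['Recommend follow-up CT in 12 months.']
```

Flagged sentences could then be routed to a tracking list, so that a recommendation is not lost in a long report.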
Another area of research is handling variations in language, such as abbreviations and other ambiguity that occur in clinical text.
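Abbreviation ambiguity can be illustrated with a toy context-word approach. Each candidate expansion has a few indicative words, and the expansion whose words overlap most with the surrounding text wins. For example, MS can mean multiple sclerosis, mitral stenosis, or morphine sulfate. The sense inventory and cue words below are hypothetical. Real systems typically learn such associations from annotated or large unlabeled corpora instead of hand-written lists.

```python
import re
from typing import Optional

# Hypothetical sense inventory: abbreviation -> {expansion: indicative context words}.
SENSES = {
    "ms": {
        "multiple sclerosis": {"demyelinating", "mri", "relapsing", "neurology", "lesions"},
        "mitral stenosis": {"murmur", "valve", "echo", "rheumatic", "diastolic"},
        "morphine sulfate": {"mg", "pain", "iv", "dose", "opioid"},
    }
}


def expand(abbreviation: str, context: str) -> Optional[str]:
    """Pick the expansion with the most overlapping context words; None if no cue or a tie."""
    words = set(re.findall(r"[a-z]+", context.lower()))
    scores = {sense: len(cues & words) for sense, cues in SENSES[abbreviation.lower()].items()}
    best = max(scores.values())
    winners = [sense for sense, score in scores.items() if score == best]
    return winners[0] if best > 0 and len(winners) == 1 else None


print(expand("MS", "Echo shows rheumatic valve disease with severe MS."))
print(expand("MS", "Give MS 4 mg IV for pain."))
```

Returning `None` when there is no evidence or a tie is safer than guessing, because a wrong expansion can silently change a patient's recorded diagnosis.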
Another question is whether clinical NLP is ready for prime time. This matters especially if it is going to be used outside of informatics research settings, where we closely monitor how accurate the algorithms are. Stanfill and colleagues performed a systematic review of automated clinical coding and classification systems through 2010. The recall and precision results of those systems are shown on this slide. As can be seen, many systems were highly accurate, but many were not. Accuracy often depended on the task being done, the disease or finding being studied, and so forth. But clearly, if we are going to use NLP on a large scale, the algorithms need to be highly accurate, with both high recall and high precision.
In closing, what are the alternatives to NLP, and what future directions may be taken? Despite the successes that we've seen, we know that clinical NLP systems are still limited. They are difficult to generalize across subject domains, because we often need to develop new rules, new data, and so on for each new area of use. In addition, we still don't really know how good performance must be before we can use these systems in a clinically reliable manner. Are 95 to 98 percent recall and precision good enough? To answer that, we have to test the systems in operational clinical settings.
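Some simple arithmetic shows why that threshold question matters at scale. Assume a hypothetical volume of 100,000 true mentions of a finding per year. The sketch computes how many mentions would be missed at a given recall. It also computes how many false positives accompany the true detections at a given precision: since precision = TP / (TP + FP), it follows that FP = TP × (1 − precision) / precision. The volume is an assumption, not a figure from the lecture or the Stanfill review.

```python
def expected_errors(true_mentions: int, recall: float, precision: float) -> tuple[float, float]:
    """Return (missed mentions, false positives) implied by a recall and a precision."""
    true_positives = true_mentions * recall
    missed = true_mentions - true_positives
    # precision = TP / (TP + FP)  =>  FP = TP * (1 - precision) / precision
    false_positives = true_positives * (1 - precision) / precision
    return missed, false_positives


for rate in (0.95, 0.98):
    missed, false_pos = expected_errors(100_000, rate, rate)
    print(f"recall = precision = {rate:.0%}: ~{missed:,.0f} missed, ~{false_pos:,.0f} false positives")
```

Even at 98 percent, this hypothetical volume leaves about 2,000 missed mentions and about 2,000 false alarms per year. Whether that is acceptable depends on what the output is used for.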
One alternative to NLP is to get clinicians to enter more structured data on the front end. Approaches such as menu-driven systems for entering clinical data have been tried for years. However, they are probably best suited to limited domains, especially since they can take a great deal of time for busy clinicians to use.
Clearly, there is an important role for NLP in the future. As pointed out in the paper by Chapman and colleagues, we need tools and shared tasks to define that ultimate role.
This concludes Lecture C of Machine Learning and Natural Language Processing. In summarizing this lecture, we learned that there have been many clinical NLP systems, but only a small number are used operationally for clinical care or research. The performance of clinical NLP systems is imperfect, and the adequate level of performance for clinical use is not known. Further research is required to determine the optimal use of NLP in health care.
This also concludes the unit on Machine Learning and Natural Language Processing. In summarizing this unit, we've seen that learning from data and processing data within text are important aspects of applying data analytics to health care. Machine learning is the field focused on learning from data, and it can occur in a supervised or unsupervised manner. Natural language processing is the area that aims to understand text in natural languages, and it faces many challenges in the clinical domain.

