Comprehensive CS50 Course Review: From Basics to Web Development
Description
Explore the full journey of Harvard's CS50 course, covering foundational programming concepts, data structures, Python, SQL, web development with HTML, CSS, JavaScript, and cybersecurity essentials. Gain insights into practical applications, debugging, and best practices for secure coding and user interface design.
Keywords
CS50 course review, Harvard CS50, programming fundamentals, Python tutorial, SQL basics, web development HTML CSS JavaScript, cybersecurity, debugging techniques
Content
Introduction to CS50 and Computational Thinking
- CS50 is Harvard University's introduction to computer science and programming.
- Focuses on computational thinking: precise, logical problem solving.
- Emphasizes personal progress over comparison with peers.
Foundations of Programming and Data Structures
- Early weeks cover basics: variables, loops, conditionals, functions.
- Introduction to data structures: arrays, linked lists, trees, hash tables, tries.
- Trade-offs between time and space complexity discussed.
- Big O notation used to analyze algorithm efficiency.
Transition to Python and SQL
- Python simplifies syntax and automates memory management.
- SQL introduced for relational database management.
- Use of libraries and frameworks (Flask, Bootstrap) to build web apps.
- Emphasis on code readability, modularity, and error handling.
Web Development Essentials
- HTML for page structure; tags and attributes explained.
- CSS for styling; classes, IDs, selectors, and external stylesheets.
- JavaScript for interactivity; event listeners, DOM manipulation.
- Responsive design and accessibility considerations.
Building Dynamic Web Applications
- Flask framework for Python-based web servers.
- Routing, templates, and form handling with GET and POST methods.
- Session management and cookies for user state and authentication.
- Integration of databases for persistent data storage.
Security and Best Practices
- Importance of strong, random passwords and two-factor authentication.
- Risks of SQL injection and mitigation via parameterized queries.
- Use of debugging tools and error handling techniques.
- Encouragement to use password managers and encryption.
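The SQL injection risk and its mitigation noted above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses the standard-library sqlite3 module rather than whatever database library the course itself uses, and the users table, its columns, and the two helper functions are hypothetical names made up for the example.

```python
import sqlite3

# Hypothetical in-memory database with a made-up users table.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (username TEXT, password_hash TEXT)")
db.execute("INSERT INTO users VALUES (?, ?)", ("alice", "hash1"))
db.execute("INSERT INTO users VALUES (?, ?)", ("bob", "hash2"))


def find_user_unsafe(name):
    # Vulnerable: user input is pasted directly into the SQL text.
    query = f"SELECT username FROM users WHERE username = '{name}'"
    return db.execute(query).fetchall()


def find_user_safe(name):
    # Safer: the ? placeholder passes the input as data, never as SQL.
    return db.execute(
        "SELECT username FROM users WHERE username = ?", (name,)
    ).fetchall()


# Input crafted to turn the WHERE clause into something always true.
malicious = "nobody' OR '1'='1"
print(find_user_unsafe(malicious))  # every row comes back
print(find_user_safe(malicious))    # no row has that literal username
```

With the placeholder version, the quote characters in the input are treated as part of the value being compared, so the crafted string cannot change the structure of the query.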
Advanced Topics and Real-World Applications
- Image and video handling on the web.
- Speech synthesis and recognition with Python libraries.
- Face recognition and QR code generation.
- Overview of internet protocols: TCP/IP, HTTP, DNS.
Course Conclusion and Resources
- Encouragement to continue learning and practicing.
- Recommendations for tools: Git, local development environments.
- Community resources: Stack Overflow, Reddit, official docs.
- Final project as a capstone to synthesize learned skills.
Interactive Review and Engagement
- Use of live polls and quizzes to reinforce learning.
- Emphasis on understanding over memorization.
- Recognition of the course's comprehensive scope and depth.
This comprehensive review encapsulates the CS50 course's progression from fundamental programming concepts to advanced web development and cybersecurity, highlighting practical skills, theoretical knowledge, and real-world applications essential for aspiring computer scientists and developers.
if you want to learn about computer science and the art of programming this course is where to start CS50 is
considered by many to be one of the best computer science courses in the world this is a Harvard University course
taught by Dr. David Malan and we are proud to bring it to the freeCodeCamp YouTube channel throughout a series of lectures
Dr. Malan will teach you how to think algorithmically and solve problems efficiently and make sure to check the
description for a lot of extra resources that go along with the course [Music]
all right this is CS50 Harvard University's introduction to the intellectual enterprises of computer
science and the art of programming back here on campus in beautiful Sanders Theatre for the first time in quite a
while so welcome to the class my name is David Malan and I took
this class myself some time ago but almost didn't it was it was sophomore fall and I was sitting in on the class
and I was a little curious but eh it didn't really feel like the field for me I was definitely a computer person but
computer science felt like something altogether and I only got up the nerve to take the class ultimately because the
professor at the time Brian Kernighan allowed me to take the class pass/fail initially and that is what made all the
difference I quickly found that computer science is not just about programming and working in isolation on your
computer it's really about problem solving more generally and there was something about homework frankly that
was like actually fun for perhaps the first time in what 19 years and there was something about this ability that I
discovered along with all of my classmates to actually create something and bring a computer to life to solve a
problem and and sort of bring to bear something that I've been using every day but didn't really know how to harness
that's been gratifying ever since and definitely challenging and frustrating like to this day all these
years later you're going to run up against mistakes otherwise known as bugs in programming that just drive you nuts
and you feel like you've hit a wall but the trick really is to give it enough time to take a step back take a break
when you need to and there's nothing better I dare say than that sense of gratification and pride really when you
get something to work and in a class like this present ultimately at term's end something like your very own final
project now this isn't to say that I took to it 100% perfectly in fact just this uh this past week I I looked in my
old cs50 binder which I still have from some 25 years ago and took a photo of what was apparently the very first
program that I wrote and submitted and quickly received minus two points on but this is a program that we'll soon see in
the coming days that does something quite simply like print hello cs50 in this case to the screen and to be fair I
technically hadn't really followed the directions which is why I lost those couple of points but if you just look at
this especially if you've never programmed before you might have heard about programming language but you've
never typed something like this out undoubtedly it's going to look cryptic but unlike human languages frankly which
were a lot more sophisticated a lot more vocabulary a lot more grammatical rules programming once you start to wrap your
mind around what it is and how it works and what these various languages are it's so easy you'll see after a few
months of a class like this to start teaching yourself subsequently other languages as they may come in the coming
years as well so what ultimately matters in this particular course is not so much where you end up relative to your
classmates but where you end up relative to yourself when you began and indeed you'll begin today and the only
experience that matters ultimately in this class is your own and so consider where you are today consider perhaps
just how cryptic something like that looked a few seconds ago and take comfort in knowing just some months from
now all of that will be uh within your own grasp and if you're thinking that okay surely the person in front of me to
the left to the right behind me knows more than me that's statistically not the case two-thirds of CS50 students have
never taken a CS course before which is to say you're in very good company uh throughout this whole term so then what
is computer science I claim that it's problem solving and the upside of that is that problem solving is something we
sort of do all the time but computer science class learning to program I think kind of cleans up your thoughts it
helps you learn how to think more methodically more carefully more correctly more precisely because honestly the
computer's not going to do what you want unless you are correct and precise and methodical and so as such there's these
sort of fringe benefits of just learning to think like a computer scientist and a programmer and it doesn't take all that
much to start doing so this for instance is perhaps the simplest picture of computer science sure but really problem
solving in general problems are all about taking input like the problem you want to solve you want to get the
solution AKA output and so something interesting's got to be happening in in here in here when you're trying to get
from those inputs to outputs now in the world of computers specifically we need to decide in advance how we represent
these inputs and outputs we all just need to decide uh whether it's Macs or PCs or phones or something else that
we're all going to speak some common language irrespective of our human languages as well and you may very well
know that computers tend to speak only what language so to speak assembly one but binary two might be
your go-to binary bi implying two means that the world of computers has just two digits at its disposal zero and one and
indeed we humans have many more than that certainly not just zeros and ones alone but a computer indeed only has
zeros and ones and yet somehow they can do so much they can crunch numbers and Excel send text messages create images
and and artwork and movies and more and so how do you get from something as simple as a few zeros a few ones to all
of the stuff that we're doing today in our pockets and laptops and desktops well it turns out that we can start
quite simply if a computer were to want to do something as simple as count well what could it do well in our human
world we might count doing this like 1 2 3 4 5 using so-called unary notation literally the digits on your fingers
where one finger represents one person in the room if I'm for instance taking attendance now we humans would typically
actually count 1 2 3 4 5 six and we'd go past just those five digits and count much higher using zeros through nines
but computers somehow only have these zeros and ones so if a computer only somehow speaks
binary zeros and ones how does it even count past the number one well here are three zeros of course and if you
translate this number in binary 0 0 0 to a more familiar number in decimal we would just call this zero enough said if
we were to represent with a computer the number one it would actually be 0 0 1 which not surprisingly is exactly the
same as we might do in our human world but we might not bother writing out the two zeros at the beginning
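The three-bit counting being described here can be sketched in Python, a language the course reaches later. This is a minimal illustration rather than anything shown in the lecture, and the helper name three_bit_patterns is made up for the example.

```python
def three_bit_patterns():
    # format(n, "03b") renders n in binary, padded with zeros to 3 digits.
    return [format(n, "03b") for n in range(8)]


# Print each pattern of three bits next to the decimal number it stands for.
for decimal, pattern in enumerate(three_bit_patterns()):
    print(pattern, "->", decimal)
```

Three bits give eight patterns in total, from 000 up to 111, which is why counting stops at seven until another bit is added.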
but a computer now if it wants to count as high as two it doesn't have the digit two and so it has to use a different
pattern of zeros and ones and that happens to be 0 1 0 so this is not 10 with a zero in front of it it's indeed 0
1 0 in the context of binary and if we want to count higher now than two we're going to have to tweak these zeros and
ones further to get three and then if we want four or five or six or seven we're just kind of toggling these zeros and
ones AKA bits for binary digits that represent via these different patterns different numbers that you and I as
humans know of course as the so-called decimal system 0 through 9 dec implying 10 10 digits those zeros through nines so
why that particular pattern and why these particular zeros and ones well it turns out that representing one thing or
the other is just really simple for a computer why at the end of the day they're powered by electricity and it's
a really simple thing to just either store some electricity or don't store some electricity like that's as simple
as the world can get on or off one or zero so to speak so in fact inside of a computer a phone anything these days
that's electronic pretty much is some number of switches otherwise known as transistors and they're tiny you've got
thousands millions of them in your Mac or PC or phone these days and these are just tiny little switches that can get
turned on and off and by turning those things on and off in patterns a computer can count from zero on up to seven and
even higher than that and so these switches really you can think of being as like switches like this let me just
borrow one of our little stage lights here here's a light bulb it's currently off and so I could just think of this as
representing in my laptop a transistor a switch representing zero but if I allow some electricity to flow now I in fact
have a one well how do I count higher than one I of course need another light bulb so let me grab another one here and
if I put it in that same kind of pattern I don't want to just do this that's sort of the old finger counting way of unary
just one two I want to actually take into account the pattern of these things being on and off so if this was one a
moment ago what I think I did earlier was I turned it off and let the next one over be on AKA 0 1 0 and let me get us a
third bit if you will and that feels like enough here is that same pattern now starting at the beginning with three
so here is 0 0 0 here is 0 0 1 here is 0 1 0 AKA in our human world of decimal two and then we of course keep counting
further this now would be three and dot dot dot if this other bulb now goes on and that switch is turned and all three
stay on this again was what number okay so seven so it's just as simple relatively as that if you will but how
is it that these patterns came to be well these patterns actually follow something very familiar you and I don't
really think about it at this level anymore cuz we've probably been doing math and numbers since grade school or
whatnot but if we consider something in decimal like the number 123 I immediately jump to that this
looks like 123 in decimal but why it's really just three symbols a one a two with a bit of curve a three with a
couple of curves that you and I now instinctively just assign meaning to but if we do rewind a few years that is
123 because you're assigning meaning to each of these columns the three is in the so-called one's place the two is in
the so-called 10's place and the one is in the so-called hundreds place and then the math ensues quickly in your head
this is technically 100 * 1 plus 10 * 2 + 1 * 3 AKA 100 + 20 + 3 and there we get the sort of mathematical notion we
know is 123 well nicely enough in binary it's actually the same thing it's just these
columns mean a little something different if you use three digits in decimal and you have the ones place the
tens place and the hundreds place well why was that 1 10 and 100 they're technically just powers of 10 so 10 to the
0 10 to the 1 10 to the 2 why 10 decimal system dec meaning 10 you have 10 digits zero through nine in the
binary system if you're going to use three digits just change the bases if you're using only zeros and ones so now
it's powers of two 2 to the 0 2 to the 1 2 to the 2 AKA 1 and 2 and 4 respectively and if you keep going it's
going to be eights column sixteens column 32 64 and so forth so why did we get these patterns that we did here's your 0
0 0 because it's 4 * 0 2 * 0 1 * 0 obviously zero this is why we got the decimal number one in binary this is why
we got the number two in binary because it's 4 * 0 + 2 * 1 + 1 * 0 and now 3 and now 4 and now 5 and now 6 and
now seven and of course if you wanted to count as high as eight to be clear like what do you have to do what does a
computer need to do to count even higher than seven add a bit add another light bulb
another switch and indeed computers have standardized just how many zeros and ones or bits or switches they throw at
these kinds of problems and in fact most computers would typically use at least eight at a time and even if you're only
counting as high as three or seven you would still use eight and have a whole bunch of zeros but but that's okay
because the computers these days certainly have so many more thousands millions of transistors and switches
that that's quite okay all right so if with that said if we can now count as high as seven or frankly as high as we
want that only seems to make computers useful for things like Excel like number crunching but computers of course let
you send text messages write documents and so much more so how would a computer represent something like a letter like
the letter A of the English alphabet if at the end of the day all they have is
switches any thoughts yeah okay so we could represent letters using numbers okay so give me what's a
proposal what number should represent what starting at the beginning perfect yeah we just all have
to agree somehow that one number is going to represent one letter so one is a two is B 3 is C uh Z is 26 and so
forth maybe we can even take into account uppercase and lower case we just have to agree and sort of write it down
in some global standard and humans indeed did just that they didn't use one two three turns out they started a
little higher up capital A has been standardized as the number 65 uh and capital B has been standardized as the
number 66 and you can kind of imagine how it goes up from there and that's because whatever you're representing
ultimately can only be stored at the end of the day as zeros and ones and so some humans in a room before decided that
capital A shall be 65 or really this pattern of zeros and ones inside of every computer in the world 0 1 0 0
0 0 0 1 so if that pattern of zeros and ones ever appears in a computer it might be interpreted then as indeed a capital
letter a eight of those bits at a time but I worry just to be clear we might have now created a problem it might seem
if I play this naively that okay how do I now actually do math with the number 65 if now Excel displays 65 as an A let
alone B's and C's so how might a computer do as you've proposed have this mapping from numbers to letters but
still support numbers it feels like we've given something up yeah by having a
prefix okay so we could perhaps have some kind of prefix like some pattern of zeros and ones I like this that
indicates to the computer here comes another pattern that represents a letter here comes another pattern that
represents a number or a letter so not bad I like that other thoughts how might a computer distinguish these two
yeah indeed and that's spot on nothing wrong with what you suggested but the world generally does just that the
reason we have all of these different file formats in the world like JPEGs and GIFs and PNGs and Word documents
.docx and .doc Excel files and so forth is because a bunch of humans got in a room and decided well in the
context of this type of file or really more specifically in the context of this type of program Excel versus Photoshop
versus Google Docs or the like we shall interpret any patterns of zeros and ones as being maybe numbers for Excel maybe
uh letters in like a text messaging program or Google Docs or maybe even colors of the rainbow and something like
Photoshop and more so it's context dependent and we'll see when we ourselves start programming you the
programmer will ultimately provide some hints to the computer that tells the computer interpret it as follows so
similar in spirit to that but not quite as standardized with these prefixes so this system here actually has a name ASCII
the American Standard Code for Information Interchange and indeed it began here in the US and that's why it's
actually a little biased toward A's through Z's and a bit of punctuation as well and that quickly became a problem
but if we start simply now in English the mapping itself is fairly straightforward so if A is 65 B is 66 and dot dot dot
suppose that you received a text message an email from a friend and underneath the hood so to speak if you kind of
looked inside the computer what you technically received in this text or this email happened to be the numbers 72
73 33 or really the underlying pattern of zeros and ones what might your friend have sent you as a message if it's 72 73
33 hey close hi it's indeed HI why well apparently according to this little
cheat sheet H is 72 I is 73 it's not obvious from this chart what the 33 is but indeed this pattern represents HI
and anyone want to guess or if you know what 33 is exclamation point and this is frankly not the kind of thing most
people know but it's easily accessible via a nice user-friendly chart like this so this is an ASCII chart when I said
that we just need to write down this mapping earlier this is what people did they wrote it down in a book in a chart
and for instance here is our 72 for H here is our 73 for I and here is our 33 for exclamation point and computers
Macs PCs iPhones Android devices just know this mapping by heart if you will they've been designed to understand
those letters so here I might have received HI technically what I've received is these patterns of zeros and
ones but it's important to note that when you get these patterns of zeros and ones in any format be it email or text
or a file they do tend to come in standard lengths with a certain number of zeros and ones all together and this
happens to be 8 plus 8 plus 8 so just to get the message HI exclamation point you would have received at least it
would seem some 24 bits but frankly bits are so tiny literally and mathematically that we don't tend to think or talk
generally in terms of bits you're probably more familiar with bytes b-y-t-e-s is a byte a byte is
just eight bits and even those frankly aren't that useful if we do out the math how high can you count if you have eight
bits anyone know say again uh higher than that unless you want to go negative that's
that's fine 256 technically 255 long story short if we actually got into the weeds
of all of these zeros and ones and we figured out what 1 1 1 1 1 1 1 1 mathematically adds up to in decimal it
would indeed be 255 or less if you want to represent negative numbers as well so this is
useful because now we can speak not just in terms of bytes but if the files are bigger kilobytes is thousands of bytes
megabytes is millions of bytes gigabytes is billions of bytes terabytes are trillions of bytes and so forth we have
a vocabulary for these increasingly large uh quantities of uh data the problem is that if you're using
ASCII and therefore eight bits or one byte per character and originally only seven you can only represent 255 characters
and that's actually fine or 256 total characters including zero and that's fine if you're using literally English
in this case plus a bunch of punctuation but there's many human languages in the world that need many more symbols
and therefore many more bits so thankfully the world decided that we'll indeed support not just the US English
keyboard but all of the accented characters that you might want for some languages and Heck if we use enough bits
zeros and ones not only can we represent all human languages in written form as well as some emotions along the way we
can capture the latter with these things called emojis and indeed these are very much in vogue these days you probably
send and or receive many of these things any given day these are just characters like letters of an alphabet patterns of
zeros and ones that you're receiving that the world has also standardized for instance there are certain emojis that
were represented with certain patterns of bits and when you receive them your phone your laptop your desktop displays
them as such and this newer standard is called Unicode so it's a superset of what we called ASCII and Unicode is just a
mapping of many more numbers to many more letters or characters more generally that might use eight bits for
backwards compatibility with the old way of doing things with ASCII but they might also use 16 bits and if you have 16 bits
you can actually represent more than 65,000 possible letters and that's getting up there and heck Unicode
might even use 32 bits to represent letters and numbers and punctuation symbols and emojis and that would give
you up to 4 billion possibilities and I dare say one of the reasons we see so many emojis these days is we have so
much room I mean there's got room for billions more literally so in fact just as a little bit of trivia has anyone
ever received this decimal number or if you prefer binary now has anyone ever received this pattern of zeros and ones
on your phone in a text or an email perhaps this past year here well if you actually look this up this esoteric uh
sequence of zeros and ones happens to represent uh face with medical mask and notice that if you've got an iPhone or
an Android device you might be seeing different things in fact this is the Android version of this most recently
this is the iOS version of it most recently and there's Bunches of other interpretations by other companies as
well so Unicode as a Consortium if you will has standardized the descriptions of what these things are but the
companies themselves manufacturers out there have generally interpreted it as they see fit and this can lead to some
human uh miscommunications in fact for like literally embarrassingly like a year or two I started being in the habit
of using the Emoji that kind of looks like this cuz I thought it was like woo happy face or whatever didn't realize
this is the emoji for hug because whatever device I was using sort of looks like this not like this and that's
because of their interpretation of the data this has happened too when what was a gun became a water pistol in some
manufacturers eyes and so it's an interesting dichotomy between what information we all want to represent and
how we choose ultimately to represent it questions then on these representations of formats be it numbers or letters or
soon more yeah and sorry why is why is what so popular popular
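The letter and emoji mappings covered so far can be checked with a short Python sketch. It is an illustration under assumptions, not something shown in the lecture: the transcript does not preserve the number displayed for the emoji, so the example uses the Unicode code point 128567 (hex 1F637), and the UTF-8 encoding step is one common way of storing such a character, not one the lecture names.

```python
import unicodedata

# The three numbers from the text-message example, read as ASCII codes.
codes = [72, 73, 33]
message = "".join(chr(code) for code in codes)
print(message)

# Each of these characters fits in one byte, so the message takes 24 bits.
print(" ".join(format(code, "08b") for code in codes))

# Assumption: code point 128567 (hex 1F637) is the character that the
# Unicode character database names FACE WITH MEDICAL MASK.
emoji = chr(128567)
print(unicodedata.name(emoji))

# Encoded as UTF-8, this single character occupies four bytes (32 bits).
print(len(emoji.encode("utf-8")))
```

How the emoji is actually drawn is left to whichever font the device uses, which is why the same number can look different from one manufacturer to the next.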
yeah so we'll come back to this in a few weeks in fact there are other ways to represent numbers binary is one decimal
is another unary is another and hexadecimal is yet a fourth that uses 16 total digits literally 0 through 9 plus A B C D
E F and somehow you can similarly count even higher with those uh we'll see in a few weeks why this is uh compelling but
hexadecimal long story short uses four bits per digit and so four bits if you have two digits in hex that gives you eight
and it's just a very convenient unit of measure and it's also human convention in the world of like files and other
things but we'll come back to that soon other questions the lights on the stage supposedly do the lights on the stage
supposedly say anything well if we had thought in advance to use maybe 64 light bulbs that would seem to give us uh
eight total bytes on stage eight times uh eight giving us just that maybe good question other questions on zeros and
ones it's a little bright in here no oh
yes where everyone's pointing somewhere specific there we go sorry very bright in this
corner oh sure and we'll come back to this in some form in the coming days too at at a slower Pace too we have with
eight bits two possible values for the first and then two for the next two for the next and so forth so that's 2 * 2
* 2 that's 2 to the 8th power total which means you can have 256 total possible patterns of zeros and ones but as we'll
see soon computer scientists programmers software often start counting at zero by convention and if
you use one of those patterns 0 0 0 0 0 0 0 0 to represent the decimal number we know is zero you only have 255 other
patterns left to count as high as therefore 255 that's all good question
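The arithmetic in this answer, together with the place-value idea from earlier, can be written out as a small Python sketch. It is a minimal illustration and not the course's own code; the function name bits_to_decimal is made up for the example.

```python
def bits_to_decimal(bits):
    """Add up the place values of a string of 0s and 1s."""
    total = 0
    for position, bit in enumerate(reversed(bits)):
        # The rightmost column is worth 2**0, the next 2**1, and so on.
        total += int(bit) * 2 ** position
    return total


print(bits_to_decimal("010"))            # 4*0 + 2*1 + 1*0
print(bits_to_decimal("11111111"))       # largest value eight bits can hold
print(2 ** 8)                            # number of distinct eight-bit patterns
print(chr(bits_to_decimal("01000001")))  # the pattern given for capital A
```

Eight bits yield 256 distinct patterns, and because one of them is spent on zero, the highest number that can be counted to is 255.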
question all right so what then might we have besides uh these emojis and letters and numbers well we of course have
things like colors and programs like Photoshop and pictures and photos well let me ask the question again how might
a computer do you think knowing what you know now represent something like a color like what are our options if all
we've got are zeros and ones and switches yeah RGB well RGB indeed is this acronym
that represents some amount of red and some amount of green and blue and indeed computers can represent colors by
just doing that remembering for instance this dot this yellow dot on the screen that might be part of any of those
emojis these days well that's some amount of red some amount of green some amount of blue and if you sort of mix
those colors together you can indeed get a very specific one and we'll see in just a moment just that so indeed
earlier on did uh humans only use seven bits total and it was only once they decided oh let's add an eighth bit that
they got extended ASCII and that was initially in part a solution to the same problem of not having enough room if you
will in those patterns of zeros and ones to represent all of the characters that you might want but even that wasn't
enough and that's why we've now gone up to 16 and 32 and long past seven so if we come back now to this one particular
color RGB was proposed as a scheme but how might this work well consider for instance this if we do indeed decide as
a group to represent any color of the rainbow with some mixture of some red some green and some blue we have to
decide how to represent the amount of red and green and blue well it turns out if all we have are zeros and ones ergo
numbers let's do just that for instance suppose a computer were using these three numbers 72 73 33 no longer in the
context of an email or a text message but now in the context of something like Photoshop a program for editing and
creating graphical files maybe this first number could be interpreted as representing some amount of red green
and blue respectively and that's exactly what happens you can think of the first digit as red second as green third as
blue and so ultimately when you combine that amount of red that amount of green that amount of blue it turns out it's
going to resemble this shade of yellow and indeed you can come up with numbers between 0 and 255 for each of those
colors to mix any other color that you might want and you can actually see this in practice even though our screens
admittedly are getting really good on our phones and laptops such that you barely see the dots they are there you
might have heard the term pixel before a pixel is just a dot on the screen and you've got thousands millions of them
these days horizontally and vertically and if I take even this Emoji which again happens to be one company's
interpretation of a face with medical mask and zoom in a bit maybe zoom in a bit more you can actually start to see
these pixels things get pixelated because what you're seeing is each of the individual dots that compose this
particular image and apparently each of these individual dots are probably using 24 bits eight bits for red eight bits
for green eight bits for blue in some pattern and this program or some other like Photoshop is interpreting one
pattern it's white or yellow or black or some brown in between and so if you look sort of awkwardly but up close to your
phone or your laptop or maybe your TV you can see exactly this too all right well what about things that we also
watch every day on YouTube or the like things like video how would a computer knowing what we
know now represent something like a video how might you represent a video using only zeros and ones
yeah yeah exactly and to summarize what video really adds is just some notion of time it's not just one image it's not
just one letter or number it's presumably some kind of sequence because time is passing and so with a whole
bunch of images maybe 24 maybe 30 per second if you fly them by the human's eyes we can interpret them using our
eyes and brain that there is now movement and therefore video and similarly with audio or music if we just
came up with some convention for representing those same notes on a musical instrument could we have the
computer synthesize them too and this might be actually pretty familiar let me pull up a quick video here which uh
happens to be an old school version of the same idea you might remember from [Music]
childhood so granted that particular video is an actual video of a paper based animation but indeed that's really
all you need is some sequence of these uh these images which themselves of course are just zeros and ones because
they're just this grid of these pixels or dots now something like musical notes like these those of you who are musicians
might just naturally play these on physical devices but computers can certainly represent those sounds too and
for instance a popular format for audio is called MIDI and MIDI might just represent each note that you saw a
moment ago essentially as a sequence of numbers but more generally you might think about music as having notes for
instance a through G maybe some flats and some Sharps uh you might have the duration like how long is the note being
heard or played on a piano or some other device and then just the volume like how hard does a human in the real world
press down on that key and therefore how loud is that sound it would seem that just remembering little details like
that quantitatively we can then represent really all of these otherwise analog uh human realities so
that then is really a laundry list of ways that we can just represent information again computers or digital
have all of these different formats but at the end of the day and as fancy as those devices of yours are it's just
zeros and ones tiny little switches or light bulbs if you will represented in some way and it's up to the software
that you and I and others write to use those zeros and ones in ways we want to get the computers to do something more
powerfully questions then on this representation of information which I dare say is ultimately what problem
solving is all about taking in information and producing new via some process in
between any questions out there yeah yeah in back yeah we so a really good question there are
many other file formats out there you allude to MP4 for video and more generally there are these things called
codecs and containers it's not quite as simple when using larger files for instance in more modern formats that a
video is just a sequence of images for instance why if you stored that many images for like a Hollywood movie like
24 or 30 of them per second that's a huge number of images and if you've ever taken phone uh photos on your phone you
might know how many megabytes or larger even individual photographs might be so humans have developed over the years uh
fancier software that uses much more math to represent the same information more minimally just using somehow
shorter patterns of zeros and ones than our most simplistic representation here and they use what might be called
compression if you've ever used a zip file or something else somehow your computer is using fewer zeros and ones
to represent the same amount of information ideally without losing any information and in the world of
multimedia which we'll touch on a little bit in a few weeks there are both lossy and lossless formats out there lossless
means you lose no information whatsoever but more commonly as you're alluding to one uses lossy compression L-O-S-S-Y where
you're actually throwing away some amount of quality you're getting some amount of pixelation that might not look
perfect to the human but heck it's a lot cheaper and a lot easier to distribute and in the world of multimedia you have
containers like QuickTime and other uh MPEG containers that can combine different formats of video different formats of
audio in one file but there too do designers have discretion so more in a few weeks too other questions then
on information here as well yeah I know computers used to be very big taking up like a whole room and uh is the reason they've
gotten smaller because information what exactly I mean back in
the day you might have heard of the expression a vacuum tube which is like some physically large device um that
might have only stored some zero or one um yes it is the miniaturization of Hardware these days that has allowed us
to store as many and many more zeros and ones much more closely together and as we've built more fancy machines that can
sort of Design This Hardware at an even smaller scale we're just packing more and more into these devices but there
too is a trade-off for instance you might know by using your phone uh or your laptop for quite a while maybe on
your lap starts to get warm and so there are these literal physical side effects of this where now some of our devices
run hot and this is why like a data center in the real world might need more air conditioning than a typical place
because there are these physical artifacts as well and in fact if you'd like to see one of the earliest computers
from decades ago across the river here now in Allston in the new engineering building is the Harvard Mark I computer
uh that will give you a much better mental model of just that well if we come back now to this first picture
being computer science or really problem solving I dare say we have more than enough ways now to represent information
input and output so long as we all just agree on something and thankfully those before us have given us things like ASCII
and Unicode not to mention MP4s Word documents and the like but what's inside of this proverbial black box into which
these inputs are going and the outputs are coming well that's where we get this term you might have heard too an
algorithm which is just step-by-step instructions for solving some problem incarnated in the world of computers by
software when you write software AKA programs you are implementing one or more algorithms one or more step uh sets
of instructions for solving some problem and maybe you're using this language or that but at the end of the day no matter
the language you use the computer is going to represent what you type using just zeros and ones so what might be a
representative algorithm nowadays you might use uh your phone quite a bit to make calls or send texts or emails and
therefore you have a whole bunch of contacts in your address book nowadays of course this is very digital but
whether on iOS or Android or the like you might have a whole bunch of names a first name and or last as well as
numbers and emails and the like and you might be in the habit of like scrolling through on your phone all of those names
to find the person you uh want to call uh it's probably sorted alphabetically by first name or last name a through z
or some other symbol and this is frankly quite the same as we used to do you know back in in my day cs50 when we just used
a physical book and this physical book might be a whole bunch of names alphabetically sorted from left to right
corresponding to a whole bunch of numbers so suppose that in this old Harvard phone book we want to
search for John Harvard we might of course start quite simply at the beginning here looking at one page at a
time and this is an algorithm this is like literally step by step uh looking for the solution to this problem in that
sense if John Harvard's in the phone book is this algorithm Page by Page correct would you
say yes like if uh John Harvard's in the phone book obviously I'm eventually going to get to him so that's what we
mean by correct is it efficient is it well-designed would you say no I mean this is going to take forever even just
to get to the J's or the H's depending how this thing sorted all right well let me go a little faster I'll start like
two pages at a time 2 4 6 8 10 12 and so forth sounds faster is faster is it correct okay why is it not correct
yeah exactly if I start an odd number of pages and I'm going two at a time I miss pages in between and if I therefore
conclude when I get to the back of the book there was no John Harvard I might have just erred this would be again one
of these bugs but if I try a little harder I feel like there's a solution we don't have to completely throw out this
algorithm I think we can probably go roughly twice as fast still but what should we do instead to fix this yeah
in back nice so I think what many of us most of us if we even use this technology
anymore these days we might go roughly to the middle of the phone book just to kind of get us started and now I'm
looking down I'm looking for J assuming first name J Harvard and it looks like I'm in the M section so just
to be clear what should I do next okay and presumably John Harvard would be to the left of this so
here's an opportunity to figuratively and literally tear this particular problem in half throw half of the
problem away it's it's actually pretty easy if you just do it that way the hard way is this way but I've now just uh
decreased the size of this problem really in half so if I started with like a thousand pages of phone numbers and
names now I'm down to 500 and already we haven't found John Harvard but that's a big bite out of this problem and I do
think it's correct because if J is to the left of M of course he's definitely not going to be over there and I think
if I repeat this is again dividing and conquering if you will here I might have gone a little too far now I'm in like
the E section so let me tear the problem in half again throw another 250 Pages away and again repeat dividing and
dividing and conquering until finally presumably I end up with just one page of a phone book on which John Harvard's
name either is or is not but because of the algorithm you proposed step by step I know that he's not in anything I
discarded so dramatic as that might have been um uh been made out to be it's actually
just harnessing pretty good human intuition indeed this is what programming is all about too it's not
about learning a completely new world but really just how to harness intuition and ideas that you might already have
and take naturally but learning how to express them now more succinctly more precisely and using things called
programming languages why is an algorithm like that if I found John Harvard better than ultimately just
doing the first one or even the second maybe doubling back to check those even Pages well let's just look at a little
chart here again we don't have to get into the nuances of numbers but if we've got like a chart here an XY plot on
the x-axis here I claim is the size of the problem so measured in the number of pages in the phone book so the
farther you go out here the more pages are in the phone book and here we have time to solve on the y-axis so the
higher you go up the more time it's going to be taking to solve that particular problem so let's just
arbitrarily say that the first algorithm involving like n Pages might be represented graphically like this no
matter the slope it's a straight line because there's presumably a one-to-one relationship between number of pages
and number of seconds or number of page turns why if the phone company adds another page next year because some new
people move to town that's going to require one additional page for me one to one if though we use the second
algorithm flawed though it was unless we double back a little bit to fix someone being in between that's too going to be
a straight line but it's going to be a different slope because now there's a two-to-one or a one-to-two relationship
because I'm going two pages at a time so if the phone company adds another page that's going to take me only or another
two pages that's still only just one more step and you can see the difference if I kind of draw this if this is
the phone book in question this number of pages it might take this many seconds on the yellow line to represent or to
solve uh to find someone like John Harvard but of course on the first algorithm the red line
it's literally going to take twice as many steps and what do the n's here mean n is the go-to variable for a
computer scientist or programmer just generically representing a number so if the number of pages in the phone book is
n the number of steps the second algorithm would have taken would be in the worst case n/2 half as many
because you're going twice as fast but the third algorithm actually if you recall your logarithms looks a
little something like this there's a fundamentally different relationship between the size of the problem and the
amount of time required to solve it that technically is log base two of n but it's really the shape that's different
and the implication there is that if for instance Cambridge and Allston two different towns here in Massachusetts
merge next year and there's just one phone book that's twice as big no big deal for that third and final algorithm
why you just tear the problem one more time in half taking one more bite that's it not another thousand bites just to
get to the solution and put another way you can walk out way way way out here to a much bigger phone book and ultimately
that green line is barely going to have budged so this then is just a way of now formalizing and thinking about what the
performance or quality of these algorithms might be and before we now make one more formalization of the
algorithm itself any questions then on this notion of efficiency or now performance of
ideas yeah many um a lot of phone books over the years and if you or your parents have
any more still somewhere we could definitely use them because they're hard to find other questions but thanks other
questions here too no oh was that a murmur yes over here sorry say
again oh yeah hopefully and then we could uh then we'd have a little something more to use here so now if we
want to formalize further what it is we just did we can go ahead and introduce this form of code AKA pseudo code
pseudo code is not a specific language it's not like something we're about to start coding in it's just a way of
expressing yourself in English or any human language succinctly correctly toward an end of getting your idea for
an algorithm across so for instance here might be how we could formalize the code the pseudo code for that same algorithm
step one was pick up the phone book as I did step two might be open to the middle of the phone book as you proposed that
we do first step three was probably to look down at the pages I did and step four gets a little more interesting
because I had to quickly make a decision and ask myself a question if person is on page then I should probably just go
ahead and call that person but that probably wasn't the case at least for John Harvard and I opened the M section
and so there's this other question I should now ask else if the person is earlier in the book then I should tear
the problem in half as I did but go left so to speak and then not just open to the middle of the left half of the book
but really just go back to step three repeat myself why because I can just repeat what I just did but with a
smaller problem having taken this big bite but if the person was later in the book as might have happened with a
different person than John Harvard then I should open to the middle of the right half of the book again go back to line
three but again I'm not going to get stuck doing something forever like this cuz I keep shrinking the size of the
problem lastly the only possible scenario that's left if John Harvard is not on the page and
he's not to the left and he's not to the right what should our conclusion be he's not there he's not listed and so
we need to quit in some other form now as an aside it's kind of deliberate that I buried that last question at the end
because this is what happens all too often in programming whether you're new at it or a professional just not
considering all possible cases Corner cases if you will that might not happen that often but if you don't anticipate
them in your own code pseudo code or otherwise this is when and why programs might crash or you might see
stupid little spinning beach balls or hourglasses or your computer might reboot why it's doing something sort of
unpredictable if a human maybe myself didn't anticipate this like what does this program do if John Harvard's not in
the phone book If I Had omitted lines 12 and 13 I don't know maybe it would behave differently on a Mac or PC
because it's sort of undefined behavior and these are the kinds of omissions that frankly you're invariably going to
make bugs you're going to introduce mistakes you're going to make early on and me too 25 years later but you'll get
better at thinking about those Corner cases and handling anything that can possibly go wrong and as a result your
code will be all the better for it now the problem ultimately with learning how to program especially if you've never
had experience or even if you do but you learned one language only is that they all look a little cryptic at first
glance but they do share certain commonalities and in fact we'll use this pseudo code to define those first highlighted in
yellow here are what henceforth we're going to start calling functions lots of different programming languages exist
but most of them have what we might call functions which are actions or verbs that solve some smaller problem that is
to say you might use a whole bunch of functions to solve a bigger problem because each
function tends to do something very specific or precise these then in English might be translated in code
actual computer code to these things called functions highlighted in yellow now are what we might call conditionals
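As a hedged sketch, the phone book pseudo code walked through above might be translated into Python along these lines, with comments marking where a function, a loop, conditionals, and Boolean expressions show up. The list-of-names model (one name per page, sorted), the function name `search_phone_book`, the generated names, and the step counter are all assumptions added for illustration.

```python
def search_phone_book(pages: list[str], name: str) -> tuple[bool, int]:
    """Function: return (found, steps) for a sorted book with one name per page."""
    left, right = 0, len(pages) - 1
    steps = 0
    while left <= right:                  # loop: repeat on a smaller problem each time
        steps += 1
        middle = (left + right) // 2      # open to the middle of what is left
        if pages[middle] == name:         # conditional on a Boolean expression
            return True, steps            # person is on the page
        elif name < pages[middle]:        # person is earlier in the book
            right = middle - 1            # throw away the right half
        else:                             # person is later in the book
            left = middle + 1             # throw away the left half
    return False, steps                   # the easily forgotten case: not listed


# Hypothetical books with zero-padded names so that they sort alphabetically.
small_book = [f"name{i:04d}" for i in range(1000)]
big_book = [f"name{i:04d}" for i in range(2000)]

print(search_phone_book(small_book, "name0742"))
print(max(search_phone_book(small_book, n)[1] for n in small_book))  # worst case, 1000 pages
print(max(search_phone_book(big_book, n)[1] for n in big_book))      # worst case, 2000 pages
```

Comparing the two worst-case counts is a way to check the earlier claim that doubling the size of the phone book costs this algorithm only one more step.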
conditionals are things that you do conditionally based on the answer to some question you can think of them kind
of like Forks in the road do go left or go right or some other direction based on the answer to some question well what
are those questions highlighted now in yellow are what we would call Boolean Expressions named after mathematician
last name Boole that simply have yes no answers or if you prefer true or false answers or heck if you prefer one or
zero answers we just need to distinguish one scenario from another the last thing manifest in this pseudo code is what I
might highlight now and call Loops some kind of cycle some kind of directive that tells us to do something again and
again so that I don't need a thousand line program to search a thousand page phone book I can get away with a 13 line
program but sort of repeat myself inherently in order to solve some problem
until I get to that last step so this then is what we might call pseudo code and indeed there are other
characteristics of programs that we'll touch on before long things like arguments and return values variables
and more but unfortunately in most languages including some we will very deliberately use in this class and that
everyone in the world these days still uses these programs tend to look like this this for instance is a
distillation of that very first program I wrote in 1996 in CS50 itself just to print something on the screen and in
fact this version here just tries to print quote unquote hello world which is I dare say the most canonical first thing
that most any programmer ever gets a computer to say just because but look at this mess I mean there's a hash symbol
these angled brackets parentheses words like int curly braces quotes parentheses semicolons and backslashes I mean
there's more overhead and more syntax and clutter than there is an actual idea now that's not to say that you won't be
able to understand this before long because honestly there's not that many patterns indeed programming languages have
typically a much smaller vocabulary than any actual human language but at first it might indeed look quite cryptic but
you can perhaps infer I have no idea what these other lines do yet but hello world is presumably quote unquote what
will be printed on the screen but what we'll do today after a short break and set the stage for next week is introduce
these exact same ideas in just a bit using Scratch something that you yourselves might have used when you were
quite younger but without the same vocabulary applied to those ideas and the upside of what we'll soon do using
scratch this graphical programming language from our friends down the road at MIT it'll let us today start to drag
and drop things that look like puzzle pieces that interlock together if it makes logical sense to do so but without
the distraction of hashes parentheses curly braces angled brackets semicolons and things that are quite beside the
point but for now let's go ahead and take a 10-minute break here and when we resume we will start programming so this
on the screen is a language called C something that we'll dive into next week and thankfully this now on the screen is
another language called python that we'll also take a look at in a few weeks before long along with other languages
along the way today though and for this first week week zero so to speak we use scratch because again it'll allow us to
explore some of those programming fundamentals that will be in C and in Python and in JavaScript and other
languages too but in a way where we don't have to worry about the distractions of syntax so the world of
scratch looks like this it's a web-based or downloadable programming environment that has this layout here by default on
the left here we'll soon see is a palette of puzzle pieces programming blocks that represent all of those ideas we just
discussed and by dragging and dropping these puzzle pieces or blocks over to this big area and connecting them together
if it makes logical sense to do so we'll start programming in this environment the environment allows you to have
multiple Sprites so to speak multiple characters things like a cat or anything else and those Sprites exist in this
rectangular World up here that you can full screen to make bigger and this here by default is scratch who can move up
down left right and do many more things too and within Scratch's world you can think of it as perhaps a familiar uh
coordinate system with x's and y's which is helpful only when it comes time to like position things on the screen right
now scratch is at the default 0 comma 0 where x equals 0 and Y equals 0 if you were to move the cat way up to the top X
would stay zero y would be positive 180 if you move the cat all the way to the bottom X would stay zero but y would now
be negative 180 and if you went left X would become -240 but y would stay zero or to the right X would be 240 and Y
would stay zero so those numbers generally don't so much matter because you can just move relatively in this
world up down left right but when it comes time to like uh precisely position some of these Sprites or other imagery
it'll be helpful just to have that mental model of up down left and right well let's go ahead and make perhaps the
simplest of programs here I'm going to switch over to the same programming environment now for a tour of the left
hand side so by default selected here is the category in blue motion which has a whole bunch of puzzle pieces or
blocks that relate to motion and whereas Scratch as a graphical language categorizes things by the type of things
that these pieces do we'll see that throughout this whole palette we'll have functions and variables and conditionals
and Boolean expressions and more each in a different color and shape so for instance moving 10 steps or turning one
way or the other would be functions categorized here as things like motion under looks in purple you might have
speech bubbles uh that you can create by dragging and dropping these that might say hello or whatever for some number of
seconds or you could switch costumes change the cat to look like a dog or a bird or anything else in between sounds
too you can play sounds like meow or anything you might import or uh record yourself and then there's these things
scratch calls events and the most important of these is the first when green flag clicked because if we look
over to the right of scratch's world here this rectangular region has this green flag and red stop sign up above
one of which is for play one of which is for stop and so that's going to allow us to start and stop our actual programs
when that green flag is initially clicked but you can listen for other types of events when the space bar is
pressed or something else when this Sprite is clicked or something else and here you already see like a programmer's
incarnation of things you and I take for granted like every day now on our phones anytime you tap an icon or drag your
finger or hit a button on the side these are what a programmer would call events things that happen and are often
triggered by us humans and things that a program be it in scratch or python or C or anything else can listen for and
respond to indeed that's why when you tap the phone icon on your phone the phone application starts up because
someone wrote software that's listening for a finger press on that particular icon and so scratch has these same
things too under control in Orange you can see that we can wait for one second or repeat something some number of times
10 by default but we can change anything in these white circles to anything else there's another puzzle piece here
forever which implies some kind of loop where we can do something again and again and even though it seems a
little tight there's not much room to fit something there scratch is going to have these things grow and Shrink
however we want to fill similarly shaped pieces and here's those conditionals if something is true or false then do this
next thing and that's how we can put in this little trapezoid like shape some form of Boolean expression a
question with a yes no true false or one zero answer and decide whether to do something or not you can mess uh combine
these things too if something is true do this else do this other thing and you can even tuck one inside of the other if
you want to ask three or four or more questions sensing too is going to be a thing you can ask questions AKA Boolean
expressions like is the Sprite touching the mouse pointer the arrow on the screen so that you can start to interact
with these programs uh what is the distance between a Sprite and a mouse pointer you can do simple calculations
just to figure out maybe if the the enemy is getting close to the cat under operators some lower level stuff like
math but also the ability to pick random numbers which for a game is great because then you can kind of vary the
difficulty or what's happening in a game without the same game playing the same way every time and you can combine ideas
something and something must be true in order to make that kind of decision as before or we can even join two words
together says apple and banana by default but you can type in or drag and drop whatever you want there to combine
multiple words into full larger sentences and then lastly down here there's in Orange things called
variables in math we've obviously got X and Y and whatnot in programming we'll have the same ability to sort of
store in these named symbols X or Y values that we care about numbers or letters or words or colors or anything
ultimately but in programming you'll see that it's much more conventional not to just use Simple letters like X and Y and
Z but to actually give variables full words or word uh singular or plural words to describe what they are and then
lastly if this isn't enough colors of blocks for you you can create your own blocks and indeed this is going to be a
programming principle we'll apply today and with the first problem set whereby once you start to assemble these puzzle
pieces and you realize oh it would have been nice if those several pieces could have just been replaced by one had MIT
thought to give me that one puzzle piece you yourself can make your own blocks by connecting these all together giving
them a name and boom a new puzzle piece will exist so let's do the simplest most canonical programs here starting up with
control and I'm going to click and drag and drop this thing here when green flag clicked and then I'm going to grab one
more for instance under looks and under looks I'm going to go ahead and just say something like initially not just hello
but the more canonical hello comma world now you might guess that in this programming environment I can go over
here now and click the green flag and voila hello comma world so that's my first program and obviously much more
user-friendly than typing out the much more cryptic text that we saw on the screen that you too will type out next
week but for now we'll just focus on these ideas and in this case a function so what is it just that just happened
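For comparison with the two Scratch blocks just assembled, here is a minimal sketch of the same first program in Python, the language the lecture says will come up in a few weeks. The `say` wrapper is a hypothetical stand-in for Scratch's purple say block and is not part of Python itself; it simply hands its input to the built-in `print` function.

```python
def say(message: str) -> None:
    """Hypothetical stand-in for Scratch's say block: show the message on screen."""
    print(message)


# Running the file plays the role of clicking the green flag.
say("hello, world")
```

The text inside the parentheses corresponds to what was typed into the white oval of the block.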
this purple block here is say that's the function and it seems to take some form of input in the white oval specifically
hello comma World well this actually fits the Paradigm that we looked at earlier of just inputs and outputs so if
I may if you consider what this puzzle piece is doing it actually fits this model the input in this case is going to
be hello comma World in white the algorithm is going to be implemented as a function by MIT called say and the
output of that is going to be some kind of side effect like the cat and the speech bubble are saying hello world so
already even that simple drag and drop mimics exactly this relatively simple mental model so let's take things
further let's go ahead now and make the program a little more interactive so that it says something like hello David
or hello Carter or hello to you specifically and for this I'm going to go under sensing and you might have to
poke around to find these things the first time around but I've done this a few times so I kind of know where things
are and what color there's this function here ask what's your name but that's in white so we can change the question to
anything we want and it's going to wait for the human to type in their answer and this function called ask is a little
different from the say block which just had this side effect of printing a speech bubble to the screen the ask
function is even more powerful in that after it asks the human to type something in this function is going to
hand you back what they typed in in the form of what's called a return value which is stored ultimately in by
default this thing called answer this little blue oval here called answer is again one of these variables that in
math would be called just X or Y but in programming we're saying what it does so I'm going to go ahead and do this let
me go ahead and drag and drop this block and I want to ask the question before saying anything but you'll notice that
scratch is smart and it's going to realize I want to insert something in between and it's just going to move
things up and down I'm going to let go and ask the default question what's your name and now if I want to go ahead and
say hello David or Carter let's just do hello comma cuz I obviously don't know when I'm writing the program who's going
to use it so let me now grab another looks block up here uh say something again and now let me go back to sensing
and now grab the return value represented by this other puzzle piece and let me just drag and drop it here
and notice it's the same shape even if it's not quite the same size things will grow or Shrink as needed all right so
let's now zoom out let me go and stop the old versions because I don't want to say hello world anymore let me hit the
green flag and what's my name all right David enter huh all right maybe I just wasn't
paying close enough attention let me try it again green flag David enter this seems like a
bug what's the bug or mistake might you think uh yeah yeah we kind of want to combine
them in the same text box and it's you know it's technically a bug because this just looks kind of stupid it's just
saying David after I was asked for my name I'd like it to say maybe hello then David but it's just blowing past the hello and
printing David but let's put our finger on why this is happening you're right for the solution but what's the actual
fundamental problem in back perfect I mean computers are really darn fast these days it is saying hello
all of us are just too slow in this room to even see it because it's then saying David on the screen so fast as well so
there's a couple of solutions here and yours is spot on but just to poke around you'll see the first example of how many
ways in programming be it scratch or C or python or anything else that there are going to be to solve problems and
we'll teach you over the course of the weeks Sometimes some ways are better relatively than others but rarely is
there a Best Way necessarily because again reasonable people will disagree and what we'll try to teach you over the
coming weeks is how to kind of think through those nuances and it's not going to be obvious at first glance but the
more programs you write the more feedback you get the more bugs that you introduce the more you'll get your
footing with exactly this kind of problem solving so let me try this in a couple of ways up here would be one
solution to the problem MIT anticipated this kind of issue especially with first-time programmers and I could just
use a puzzle piece that says say the following for two seconds or one second or whatever then do the same with the
next word and it might be kind of a bit of a pause hello 1 second 2 seconds David 1 second 2 seconds but at least it
would look a little more grammatically correct but I can do it a little more elegantly as you proposed let me go ahead
and throw away one of these blocks and you can just drag and let go and it'll delete itself let me go down to
operators because this join block here is the right shape and so even if you're not sure what goes where just focus on
the shapes first let me drag this over here and it grew to fill that let me go ahead and say hello comma space and now
it could just say by default hello banana but let me go back to sensing
drag answer and that's going to drag and drop there and so now notice we're sort of stacking or nesting one block on
another so that the output of one becomes the input to another but that's okay here let me go ahead and zoom out
hit stop and hit play all right what's your name D-A-V-I-D enter and voila now it's presumably as we first intended so
thank you thank you no no minus two this time so consider that even with these uh this
additional example it still fits the same mental model but in a little more interesting way here's that new function
ask something and wait and notice that in this case too there's an input Otherwise Known henceforth as an
argument or a parameter programming speak for just an input in the context of a function and if we use our drawing
as before to represent this thing here we'll see that the input now is going to be quote unquote what's your name
the algorithm is going to be implemented by way of this new puzzle piece the function called ask and the output of
that thing this time is not going to be the cat saying anything yet but rather it's going to be the actual answer so
instead of the visual side effect of the speech bubble appearing now nothing visible is happening yet thanks to this
function it's sort of handing me back like a scrap of paper with whatever I typed in written on it so I can reuse
David one or more times even like I did now what did I then do with that value well consider that with the subsequent
function we had this say block too combined with a join so we have this variable called
answer we're joining it with that first argument hello so already we see that some functions like join can take not
one but two arguments or inputs and that's fine the output of join is presumably going to be hello David or
hello Carter or whatever the human typed in and that output notice is essentially becoming the input to another function
say just because we've kind of stacked things or nested them on top of one another but um
methodically it's really the same idea the inputs now are two things hello comma and the return value from the previous
ask function the function now is going to be join the output is going to be hello David but that hello David output
is now going to become the input to another function namely that first block called say and that's then going to have
the side effect of printing out hello David on the screen so again as sort of sophisticated as ours as yours as others
programs are going to get they really do fit this very simple mental model of inputs and outputs and you just have to
learn to recognize the vocabulary and to know what kinds of puzzle pieces or concepts ultimately to apply but you can
ultimately really kind of spice these things up let me go back to my program here that just is using the speech
bubble at the moment Scratch as an aside has some pretty fancy interactive features too I click the extensions button in the
bottom left corner and let me go ahead and choose the um text to speech extension this is using a cloud service
so if you have an internet connection it can actually talk to the cloud or a third-party service and this one's going
to give me a few new green puzzle pieces namely the ability to speak something from my speakers instead of just saying
it textually so let me go ahead and drag this and now notice I don't have to interlock them if I'm just kind of
playing around and I want to move some things around I just want to use this as like a canvas temporarily let me go
ahead and steal the join from here put it there let me throw away the say block by just moving it left and letting go
and now let me join this in so I've now changed my program to be a little more interesting so now let me stop the old
version let me start the new what's your name type in David and voila hello banana okay minus to for real all right
so what I accidentally threw away there uh intentionally for instructional purposes was the actual answer that came
back from the ask block that's embarrassing so now if I play this again let's click the green icon what's your
name David and now hello David there we go hello David all right thank you okay so we have these functions then
in place but what more can we do well what about those conditionals and loops and other constructs how can we
bring these programs to life so it's not just clicking a button and voila something's happening let's go ahead and
make this now even more interactive let me go ahead and throw away most of these pieces and let me just spice things up
with some more audio under sound I'm going to go to play sound meow until done here we go green
flag okay it's a little loud but it did exactly do what it said let's hear it again okay it's kind of an underwhelming
program eventually since you'd like to think the cat would just meow on its own but I have to keep hitting the
button well this seems like an opportunity for uh doing something again and again so all right well if I wanted
to meow meow meow let me just grab a few of these or you can even right click or control click and you can copy paste
even in code here let me play this now all right so now like it's not really emoting happiness in quite the
same way it might be hungry or upset so you know let's slow it down let me go to control wait 1 second in between which
might be a little uh less worrisome here we go [Music]
play okay so if my goal was to make the cat meow three times I dare say this code or algorithm is correct but let's now
critique its design is this well designed and if not why not what are your thoughts here uh
yeah yeah sure yeah yeah so yeah agreed I could use forever or repeat but let me push a little harder
but why like this works I'm kind of done with the assignment what do I what's bad about it there's too much repetition yeah
there's too much repetition right if I wanted to change the sound that the cat is making to like a different variant of
meow or have it bark instead like a dog I could change it from the drop down here apparently but then I'd have to
change it here and then I'd have to change it here and God if this were even longer that just gets tedious quickly
and you're probably increasing the probability that you're going to screw up and you're going to miss one of the
drop downs or something stupid and introduce a bug or if you wanted to change the number of seconds you're
waiting you've got to change it in two maybe even more places again you're just creating risk for yourself and potential
bugs in the program so I do like the repeat or the forever idea so that I don't repeat myself and indeed what I
alluded to being possible copy pasting earlier doesn't mean it's a good thing and in code generally speaking when you
start to copy and paste puzzle pieces or text next week you're probably not doing something quite well so let me go
ahead and throw away most of these to get rid of the duplication keeping just two of the blocks that I care about let
me grab the repeat block for now let me move this inside of the repeat Block it's going to grow to fit it let me
reconnect all this and change the 10 just to a three and now [Music]
play so better it's the same thing it's still correct but now I've set the stage to let the cat meow for instance four
times by changing one thing 40 times by changing one thing or I could just use the forever block and just walk away and
it will meow forever instead if that's your goal that would be better a better design but still correct but you know
what now that I have a program that's designed to have a cat meow wow like why I mean MIT invented scratch scratch is a
cat why is there no puzzle piece called meow feels like a missed opportunity now to be fair they gave us all the building
blocks with which we could Implement that idea but but a principle of programming and really computer science
is to leverage what we're going to now start calling abstraction we have step-by-step instructions here the
repeat the play and the wait that collectively implements this idea that we humans would call meowing wouldn't it
be nice to abstract away those several puzzle pieces into just one that literally just says what it does meow
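The step-by-step version mentioned above (repeat, play, wait) could look roughly like this in Python. It is a sketch under assumptions: no audio is played, the string "meow" stands in for the play sound block, and `run_script` and `pause_seconds` are hypothetical names chosen for illustration.

```python
import time


def run_script(pause_seconds=1.0):
    """Loop instead of copy-pasting the same two blocks three times."""
    played = []
    for _ in range(3):             # repeat 3
        played.append("meow")      # play sound meow until done (stand-in)
        time.sleep(pause_seconds)  # wait 1 second
    return played
```

Changing the sound or the pause now means editing one place rather than every pasted copy, which is the design point the lecture is making.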
well here's where we can make our own blocks let me go over here to scratch under the pink uh block category here
and let me click make a block and here I see a slightly different interface where I can choose a name for
it and I'm going to call it meow and I'm going to keep it simple that's it no inputs to meow yet I'm just going to
click okay now just going to clean this up a bit here let me drag and drop play sound and wait over here and you know
what I'm just going to drag this way down here way down here because now that I'm done implementing meow I'm going to
literally abstract it away sort of out of sight out of mind because now notice at top left there is a new puzzle piece
called meow and so at this point I'd argue it doesn't really matter how meow is implemented frankly I don't know how
ask or say was implemented by MIT they abstracted those things away for us now I have a brand new puzzle piece that
just says what it is and this is now still correct but arguably better design why because it's just more readable to
me to you it's more maintainable when you look at your code a year from now for the first time because you're sort
of fondly looking back at the very first program you wrote it says what it does the function itself has semantics which
conveys what's going on if you really care about how meow is implemented you could scroll down and start to Tinker
with the underlying implementation details but otherwise you don't need to care anymore now I feel like there's an
even additional opportunity here for abstraction and to sort of factor out some of this functionality it's kind of
lame that I have this repeat block that lets me call the meow function so to speak use the meow function
three times wouldn't it be nice if I could just call the meow function AKA use the meow function and pass it an
input that tells the puzzle piece how many times I want it to meow well let me go ahead and zoom out and scroll down
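The two ideas just described, a custom meow block with no inputs and a version that takes the number of times as an input, might be sketched in Python as follows. The names `meow` and `meow_times` are hypothetical, and returning strings instead of playing a sound is an assumption made so the sketch is easy to run and check.

```python
def meow():
    """Abstraction: callers only need the name, not how meowing is implemented."""
    return "meow"


def meow_times(n):
    """Parameterized version: n is the input that controls how many meows happen."""
    return [meow() for _ in range(n)]
```

With this in place the main program shrinks to a single call such as `meow_times(3)`, and the repeat logic lives inside the function, out of sight.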
let me right click or control click on the pink piece here and choose edit or I could just start from scratch no pun
intended with a new one and now here rather than just give this thing a name meow let me go ahead and add an input
here and I'm going to go ahead and type in for instance n for number of times to meow and just to make this even more
user-friendly and self-descriptive I'm going to add a label which has no functional impact it's just an aesthetic
and I'm just going to say times just to make it read more like English in this case that tells me what the puzzle piece
does and now I'm going to click okay and now I need to refine this a little bit let me go ahead and grab under control a
repeat block let me move the play sound and wait into the repeat block I don't
want 10 and I also don't want three here what I want now is this n that is my actual variable that Scratch is creating
for me that represents whatever input the human programmer provides notice it snaps right in place let me connect this
and now voila I have an even fancier version of meow that is parameterized it takes input that affects its behavior
accordingly now I'm going to scroll back up because out of sight out of mind I just care that meow exists now I can
tighten up my code so to speak use even fewer lines to do the same thing by throwing away the repeat block
reconnecting this new puzzle piece here that takes an input like three and voila now we're really programming right we've
not made any forward progress functionally the thing just meows three times but it's a better design and as
you program more and more these are the kinds of instincts you'll start to acquire so that one you can start to
take a big assignment a big problem set something for homework even that feels kind of overwhelming at first like oh my
God where do I even begin but if you start to identify what are the sub problems of a big problem then you can
start making progress and I do this to this day where if I have to tackle some programming related project it's so easy
to like drag my feet or oh it's going to take forever to start until I just start writing down like a to-do list and I
start to modularize the program and say all right well what do I want this thing to do meowing what's that mean I got to
have it say something on the screen all right uh I need to have it say something on the screen some number of times like
literally a mental or written checklist or pseudocode if you will in English on a piece of paper or text file and then
you can decide okay the first thing I need to do for homework to solve this real world problem I just need a meow
function I need to use a bunch of other code too but I need to create a meow function and boom now you have a piece
of the problem solved not unlike we did with the phone book there but in this case we'll have presumably other
problems to solve all right so what more can we do let's add a few more uh pieces to the puzzle here let's actually
interact with the cat now let me go ahead and now when the green flag is clicked let me go ahead and ask a
question using an event here let me go ahead and say if let's see if the cursor I want to do something like uh implement
the notion of petting the cat so if the cursor is petting touching the cat like here something like this it'd be cute if
like the cat meows like you're petting a cat so I'm going to ask the question when the green flag is clicked if let's
see I think I need sensing uh so if touching Mouse pointer this is way too big but again the shape is fine so there
it goes grew to fill and then if it's touching the mouse pointer that is if the cat to whom this script or this
program anytime I attach puzzle pieces MIT calls them a script or like a program if you will uh let me go ahead
then and choose a sound and say play sound meow until done all right so here it is to be clear when the green flag is
clicked ask the question if the cat is touching the mouse pointer then play that sound meow here we go
play huh all right let's try again play huh I'm worried it's not Scratch's fault feels like mine what's the bug
here why doesn't this work yeah in back yeah who just turned yeah the problem is the moment I
click that green flag scratch asks the question is the cat touching the mouse pointer and obviously it's not cuz the
cursor was like up there a moment ago and it's not down there it's fine if I move the cursor down there but too late
the program already asked the question the answer was no or false or zero however you want to think about it so no
sound was played so what might the solution here be I could move my cursor quickly but that feels like never
going to work out right other solutions here yeah in way back the forever loop so I could indeed
use this forever loop because if I want my program to just constantly listen to me well let's literally do something
forever or at least forever as long as the program is running until I explicitly hit stop so let me grab that
let me go to control let me grab the forever block let me move the if inside of this forever block reconnect this go
back up here click the green flag and now nothing's happened yet but let me try moving my cursor
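The difference between asking the question once and asking it forever, as described above, can be sketched in Python with a simulated list of moments. Everything here is an assumption for illustration: `touching_by_frame` is a hypothetical list of booleans standing in for whether the cat is touching the mouse pointer at each moment, and a finite loop stands in for Scratch's forever block.

```python
def check_once(touching_by_frame):
    """Buggy version: the question is asked only at the moment the flag is clicked."""
    sounds = []
    if touching_by_frame and touching_by_frame[0]:
        sounds.append("meow")
    return sounds


def check_forever(touching_by_frame):
    """Fixed version: the same question is asked again on every pass of the loop."""
    sounds = []
    for touching in touching_by_frame:  # stands in for the forever block
        if touching:
            sounds.append("meow")
    return sounds


# The cursor starts away from the cat and reaches it only later.
frames = [False, False, True, True, False]
```

With these frames, `check_once(frames)` produces no sound because the answer was false at the start, while `check_forever(frames)` meows on each later moment when the cursor is touching the cat.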
now ah so now that's kind of cute so now the cat is actually responding and it's going to keep doing this again and again and
so now we have this idea of taking these different ideas these different puzzle pieces assembling them into something
more complicated and I could definitely put a a name to this I could create a custom block but for now let's just
consider what kind of more interactivity we can do let me go ahead and do this by again grabbing a when green flag clicked
let me go ahead and click the video sensing and I'm going to rotate the laptop because otherwise we're going to
get a little Inception thing here where the camera is picking up the camera is up there so I'm going to go reveal to
you what's inside the lectern here while we rotate this and now that we have a non-video
backdrop I'm going to say this instead of the green flag clicked actually I'm going to say when the video motion is
greater than some arbitrary measurement of motion I'm going to go ahead and play sound meow until done and then I'm going
to get out of the way so here's the cat and let's put him we'll put him on top on top of there just
okay all right and here we go so my hand is moving faster than 50 something or other whatever the unit of
measure is and thank you so now we have an even more interactive
version but I think if I sort of slowly right I'm not it's completely creepy but I'm not like
exceeding the threshold until finally my hand moves as fast as that and so here actually is an
opportunity to show you something a former student did let me go ahead here in okay got to stop this let me go ahead
and zoom out of this in just a moment uh if someone would be if someone would be comfortable coming up not only masked but
also on camera on the internet thought we'd play one of your former classmates projects here up on stage would anyone
like to volunteer here and be up on stage who's that yeah come on down what's your name Sahar all right
come on down let me get it set up for you [Applause]
here all right let me go ahead and full screen this here so this is uh wacka by one of your former predecessors it's
going to use the camera focusing on your head which we'll have to position inside of this rectangle and if you ever played
the like whack-a-mole game in an arcade okay so for those who haven't like these little moles pop up and with a very
fuzzy Hammer you sort of hit down you though if you don't mind are going to use your head to do this
virtually so let's line up your head with this red rectangle if you could we'll do
beginner all right here we go Sahar give it a moment okay come a little
closer and now hit the moles with your head there we go one point one point Point
nice 15 seconds to go there we go oh yep one point 6
seconds oh no there we go quick all right a round of applause for Sahar thank you
so beyond having a little bit of fun here the goal was to demonstrate that by using some fairly simple primitives some
basic building blocks but assembling them in a fun way with some music maybe some new costumes or artwork you can
really bring programs to life but at the end of the day the only puzzle pieces really involved were ones like the ones
I just dragged and dropped and a few more because there were clearly lots of moles so the student probably created a
few different sprites not a single cat but at least four different moles they had like some kind of graphic on the
screen that showed Sahar where to position the head there was some kind of timer maybe a variable that every second
was counting down so you can imagine taking what looks like a pretty impressive project at first glance and
perhaps overwhelming to solve yourself but just think about what are the basic building blocks and pluck off one piece
of the puzzle so to speak at a time and so indeed if we rewind a little bit let me go ahead here and introduce a program
that I myself made back in graduate school when scratch was first being developed by MIT let me go ahead and
open here give me just one second something that I called back in the day Oscar time that looks a little something
like this if I full screen it and hit play so you'll notice piece of trash is falling I can click on it and drag and
as I get closer and closer to the trash can notice it wants to go in it seems and if I let
go one point here comes another I'll do the same two points there's a sneaker falling from
the sky so another Sprite of some sort I can also get just a little a little lazy and just let them fall into
the trash themselves if I want to so you can see it it doesn't have to do with my mouse cursor it has to do apparently
with the distance here let's listen a little further I think there's some additional trash is about to make its
appearance presumably there's some kind of like variable that's keeping track of this
score okay let's see what the last the last chorus here [Music]
okay and thus it continues and the song actually goes on and on and on and I do not have fond memories of
implementing this and hearing this song for like 10 straight hours but it's a good example to just consider how was
this program composed how did I go about implementing it the first time around and let me go ahead and open up some
programs now that I wrote in advance just so that we could see how these things are assembled honestly the first
thing I probably did was probably to do something a little like this here is just a version of the program where I
set out to solve just one problem first of planting a lamp post in the program right I kind of had a vision of what I
wanted you know it evolved over time certainly but I knew I wanted trash to fall I wanted a cute little Oscar the
Grouch to pop out of the trash can and some other stuff but wow that's a lot to just tackle all at once I'm going to
start easy download a picture of a lamp post and then drag and drop it into the stage as a costume and boom that's
version one it doesn't functionally do anything I mean literally that's the code that I wrote to do this all I did
was use like the backdrops feature and drag and drop and move things around but it got me to version one of my program
then what might version two be well I considered what piece of functionality frankly might be the easiest to pluck
off next and the trash can that seems like a pretty core piece of functionality it just needs to sit there
most of the time so the next thing I probably did was to open up for instance the trash can version here that looks a
little something now like this so this time I'll show you what's inside here there is some code but not much notice
at bottom right I changed the default cat to a picture of a trash can instead but it's the same principle that I can
control and then over here I added this code when the green flag is clicked switch the costume to something I arbitrarily
called Oscar one so I found a couple of different pictures of a trash can one that looks closed one that looks partly
open and eventually one that has Oscar coming out and I just gave them different names so I said switch to
Oscar one which is the closed one by default then forever do the following if touching the mouse pointer then switch
the costume to Oscar 2 else switch to Oscar 1 that is to say I just wanted to implement this idea of like the can
opening and closing even if it's not exactly what I wanted ultimately I just wanted to make some forward progress so
here when I run this program by clicking play notice what happens nothing yet but if I get closer to the trash can it
indeed pops open because it's forever listening for whether the Sprite the trash can in this case is touching the
mouse pointer and that's it that was version two if you will and if I went in now and added the lamp post and composed
the program together now we're starting to make progress right now it would look a little something more like the program
I intended ultimately to create what piece did I probably bite off uh after that well I think what I did is I
probably decided let me Implement one of the pieces of trash not the shoe and the newspaper all at once let's just get one
piece of trash working correctly first and so let me go ahead and open this one and again all these examples will be
available on the course's website so you can see all of these examples too it's not terribly long I just implemented in
advance so we could flip through kind of quickly here's what I did here on the right hand side turned my Sprite into a
piece of trash this time instead of a cat instead of a trash can and I also created with Carter's help a second uh
Sprite this one a floor it's literally just a black line because I just wanted initially to have some notion of a floor
so I could detect if the trash is touching the floor now without seeing the code yet just hearing that
description why might I have wanted the second Sprite and this black line for a floor with the trash intending to fall
from the sky what might I have been thinking like what problem might I be trying to solve yeah you don't want the first
Sprite to go through it yeah you don't want the first Sprite to start at the top go through and then boom like you
completely lose it like that would not be a very uh useful thing or it would seem to maybe eat up more and more of
the computer's memory if the trash is just endlessly falling and I can't grab it uh it might be a little traumatic if
you try to get it and you can't pull it back out and you can't fix the program and so I just wanted the thing to stop
so how might I have implemented this let's look at the code at left here I have a bit of randomness like I proposed
earlier exists there's this blue function called go to x comma y that lets me move a sprite to any position up
down left right I picked a random X location either here or over here -240 to positive 240 and then a y value of
180 which is the top and this just makes the game more interesting it's kind of lame pretty quickly if the trash always
falls from the same spot here's just a little bit of Randomness like most any game would have that spices things up
and so now if I click the green flag you'll see that it just Falls nothing interesting is going to happen but it
does stop when it touches the black line because notice what we did here I'm forever asking the question if the
distance of the sprite the trash to the floor is greater than zero that's fine change the y location by negative 3
so move it down three pixels down three pixels until the distance to the floor is not greater than zero it is zero or
even negative at which point it should just stop moving altogether there's other ways we could have
implemented this but this felt like a nice clean way that logically just made it make sense okay now I got some trash
falling I got a trash can that opens and closes I have a lamp post now I'm you know a good three steps into the program
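The falling behavior described above (start at a random x between -240 and 240 with y at 180, then keep moving down 3 pixels while the distance to the floor is greater than zero) could be sketched in Python like this. It is a minimal sketch under assumptions: `FLOOR_Y = -180` is an assumed floor position, distance is simplified to a vertical difference, and the function names are hypothetical.

```python
import random

FLOOR_Y = -180  # assumption: the floor sits at the bottom edge of the stage


def start_position(rng=random):
    """Random x across the stage, fixed y at the top, as in the go to x, y block."""
    return rng.randint(-240, 240), 180


def fall(y, floor_y=FLOOR_Y, step=3):
    """Move down by `step` while the distance to the floor is greater than zero."""
    while y - floor_y > 0:
        y -= step
    return y
```

Starting from y = 180 the loop ends exactly on the floor. A starting height that is not a multiple of the step overshoots slightly, which matches the remark that the distance may end up zero or even negative.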
we're making progress if we consider one or two final pieces something like the dragging of the trash let me go ahead
and open up this version too uh dragging the trash requires a different type of question let me zoom in here here's the
piece of trash I only need one Sprite no floor here because I just want the human to move it
up down left right and the human's not going to physically be able to move it outside of the world and if we zoom in
on this code the way we've solved this is as follows we're using that and conjunction that we glimpsed earlier
because when the green flag is clicked we're forever asking this question or really these questions plural if the
mouse is down and the cat or sorry the trash is touching the mouse pointer that's equivalent logically to clicking
on the trash go ahead and move the trash to the mouse pointer so again it takes this very familiar idea that you and I
take for granted every day on Macs and PCs of clicking and dragging and dropping how is that implemented well
Mac OS or Windows are probably asking a question oh for every icon is the mouse down and is the icon touching
the mouse if so go to the location of the mouse forever while that Mouse button is clicked down so how does this
work in reality now let me go ahead and click on the play nothing happens at first but if I click on it I can move it
up down left right it doesn't move thereafter so I now need to kind of combine this idea of dragging with
falling but I bet I could just start to use just one single program right now I'm using separate ones to show
different ideas but now that's another bite out of the problem and if we do one last one something like the scorekeeping
is interesting cuz recall that every time we dragged a piece of trash into the can Oscar popped out and told us the
current score so let me go ahead and find this one Oscar variables and let me zoom in on this one
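The dragging logic described a little earlier, where two questions are joined with and (is the mouse down, and is the trash touching the mouse pointer), can be sketched as one step of the forever loop. This is an illustrative sketch only: `drag_step` and its parameters are hypothetical names, and positions are plain (x, y) tuples.

```python
def drag_step(trash_pos, mouse_pos, mouse_down, touching_pointer):
    """One pass of the forever loop: follow the mouse only if both answers are true."""
    if mouse_down and touching_pointer:
        return mouse_pos  # go to mouse pointer
    return trash_pos      # otherwise stay where it is
```

Running this step repeatedly while the button is held is what makes the trash appear to be dragged. As soon as either condition is false the position stops changing.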
and this one is longer because we combined all of these elements so this is the kind of thing that if you looked
at at first glance like I have no idea how I would have implemented this from nothing from scratch literally but again
if you take your vision and componentize it into these smaller bite-size problems you could take these
baby steps so to speak and then solve everything collectively so what's new here here is this bottom one forever do
the following if the trash is touching Oscar the other Sprite that we've now added to the program change the score by
one this is an orange and indeed if we poke around we'll see that orange is a variable like an X or a y but with a
better name changing it means to add one or if it's negative subtract one and then go ahead and have the trash go to
pick random what is this all about well let me show you what it's doing and then we can infer backwards
let me go ahead and hit play all right it's falling I'm clicking and dragging it I'm moving it over and I'm Letting Go
all right let me do it once more letting go let me stop why do I have this function at the end called go to X and Y
randomly like what problem is this solving here yeah in way back yeah yeah exactly even though the human
perceives this is like a lot of trash falling from the sky it's actually the same piece of trash just kind of being
magically moved back to the top as though it's a new one and there too you have this idea of reusable code if you
were constantly copying and pasting your pieces of trash and creating 20 pieces of trash 30 pieces of trash just because
you want the game to have that many levels probably doing something wrong reuse the code that you wrote reuse the
Sprites that you wrote and that would give you not just correctness but also a better design well let's take a look at
one final set of building blocks that we can compose ultimately into something particularly interactive as follows let
me go ahead and zoom out here and let me propose that we Implement something like some kind of maze-based game and
let me go ahead here so I want to implement some maze-based game that looks at first glance like this let me
hit play it's not a very fun game yet but here's a little Harvard Shield a couple of black lines this time vertical
instead of horizontal but notice you can't quite see my hand here but I'm using my arrow keys to go down to go up
to go left to go right but if I keep going right right right right right right right it's not going anywhere and
left left left left left left left left left left left left it eventually stops so before we look at the code how might
this be working what kinds of scripts collections of puzzle pieces might collectively help us implement this what
do you think perfect yeah there's probably some question being asked if touching the
black line and it happens to be a couple of Sprites Each of which is just literally a vertical black line we're
probably asking a question like are you touching it or is the distance to it zero or close to zero and if so we just
ignore the up down left or rather we ignore the left or the right arrow at that point so that works but otherwise
if we're not touching a wall what are we probably doing instead forever here how is the moving working presumably yeah
in back okay sure let's go sorry say a little
louder exactly it's continually forever looking or listening for the arrow keys up down left right and if the up arrow
is pressed we're probably changing the Y by a positive value uh if the down arrow is pressed We're Going Down by Y and
left and right accordingly so let's actually take a quick look if I zoom out here and take a look at the code that
implements this there's a lot going on at first glance but let's see first of all let me drag some stuff out of the
way because it's kind of overwhelming at first glance especially if you for instance were poking around online as
for problem set zero just to get inspiration most projects out there are going to look overwhelming at first
glance until you start to wrap your mind around what's going on but in this case we've implemented some abstractions from
the get-go to sort of explain to ourselves and to anyone else looking at the code what's going on this is that
program with the two black lines and the Harvard Shield going up down left and right it initially puts the shield in
the middle 0 comma 0 it then forever listens for keyboard as I think you were describing and it feels for the walls as
I think you were describing now how is that implemented don't know yet these are custom blocks we created as
abstractions to kind of hide those implementation details because honestly that's all I need to know right now but
as aspiring programmers if we're curious now let's scroll down to the actual implementation of listening for keyboard
this is the one on the left and it is a little long but it's a lot of similar structure we're doing the following if
the up arrow is pressed then change y by one go up if the down arrow is pressed then change y by negative one
go down right arrow left arrow and that's it so it just assembles all of those ideas combines it into one new
block just cuz it's kind of overwhelming let's just implement it once and tuck it away and if we scroll now over to
the feel for walls function this now is asking the question as hypothesized if I'm touching the left wall change my x
value by one sort of move away from it a little bit if I'm touching the right wall then move X by negative one to move
a little bit away from it so it kind of bounces off the wall just in case it slightly went over we keep the crest
within those two walls all right so then a couple of more pieces here to introduce what if we want to actually
add some kind of adversary or opponent to this game well let me go ahead to um maybe this one here where the adversary
in this game might for instance be designed to be bouncing to stand in your way if this is like a maze and you're
trying to get the Harvard shield from the bottom to the top or vice versa uh-oh Yale is in the way and it seems to be
automatically bouncing back and forth here well let me ask someone else to hypothesize how is this working this is
an idea you have this is an idea you see let's reverse engineer in your head how it
works how might this be working yeah in back yeah so if the Yale symbol is touching the left wall or the right wall
we somehow have it bounce and indeed we'll see there's a puzzle piece that can do exactly that technically off the
edge as we'll see but there's another way we can do this and let's look at the code the way we ourselves can Implement
exactly that idea bounce is just with a little bit of logic so here's what this version of the program's doing it's
moving Yale by default to 0 comma 0 just to arbitrarily put it somewhere pointing it in direction 90° which means just
horizontally essentially and then it's forever doing this if touching the left wall or touching the right wall here's
our translation of Bounce we're just turning 180° and the nice thing about that is we don't have to worry if we're
going from right to left or left to right 180° is going to work on both of the walls and that's it after we do
that we just move one step one pixel at a time but we're doing it forever so something is happening continually and
the Yale icon is bouncing back and forth well one final piece here what if now we want another adversary a
more advanced adversary down the road for instance to go and follow us wherever we are such that this time we
want the other Sprite to not just bounce back and forth but literally follow us no matter where we go how might this be
implemented on the screen I bet it's another forever block but what's inside yeah forever point at the
location of the Harvard shield and go one step toward it this is just going to go on forever if I just give up at least
in this version notice it's sort of twitching back and forth because it goes one pixel then one pixel then one
pixel it's sort of in a frantic State here we haven't finished the game yet but if we see inside we'll see exactly
that it didn't take much to implement this simple idea go to a random position just to make it kind of fair initially
then forever Point towards Harvard which is what we call the Harvard Crest sprite move one step suppose we now wanted to
make a more advanced level what's a minor change I could logically make to this code just to make MIT even better
at this all right change the number of steps to two so let's try that so now
they got twice as fast let me go ahead and just get this out of the way oops let me make it a fair fight green flag
all right I unfortunately am still moving one pixel at a time so this isn't going to end well it caught up to me and
if we're really aggressive and do something like 20 steps at a time click the green flag Jesus okay so that's how
you might then make your levels progressively harder and harder so it's not an accident that we chose these
particular examples here involving these particular schools because we have one more demonstration we thought we'd
introduce today if we could get one other volunteer to come up and play what was called by one of your predecessors
Ivy's hardest game let's see who in the middle do you want to come on up what's your
name say again come a little closer actually and say sorry hard to hear here all right a round of applause here if we
could too okay sorry what was your name Celeste come on over nice
to meet you too so here we have on this other screen Ivy's hardest game written by a former cs50 student I think you'll
see that it combines these same principles the maze is clearly more advanced the goal at hand is to
initially move the Harvard Crest to the Sprite all the way on the right so that you catch up to him in this case but
you'll see that there's different levels of sophistication so if you're up for it
you can use just these arrow keys up down left right you'll be controlling the Harvard Sprite and if we could raise
the volume just a little bit we'll make this our final example here we go clicking the green flag
all right that's it for cs50 Welcome to the class we'll see you next time
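The maze demos from this lecture (listen for keyboard, feel for walls, the bouncing Yale sprite, and the following MIT sprite) can be sketched in C, the language the next lecture introduces. This is only a rough translation under stated assumptions: the struct, the function names, the wall coordinates, and the grid-based version of "point towards, then move" are all hypothetical and are not taken from the Scratch projects themselves.

```c
// A rough C sketch of the Scratch maze logic described above.
// All names and coordinates here are assumptions for illustration.

typedef enum { KEY_NONE, KEY_UP, KEY_DOWN, KEY_LEFT, KEY_RIGHT } arrow_key;

typedef struct
{
    int x;
    int y;
    int dx; // horizontal heading for a bouncing sprite: +1 right, -1 left
} sprite;

// Assumed x positions of the two vertical walls
#define LEFT_WALL (-100)
#define RIGHT_WALL 100

// "listen for keyboard": move one step in the direction of the pressed arrow
void listen_for_keyboard(sprite *s, arrow_key key)
{
    if (key == KEY_UP)
    {
        s->y += 1;
    }
    else if (key == KEY_DOWN)
    {
        s->y -= 1;
    }
    else if (key == KEY_RIGHT)
    {
        s->x += 1;
    }
    else if (key == KEY_LEFT)
    {
        s->x -= 1;
    }
}

// "feel for walls": if touching a wall, nudge one step away from it
void feel_for_walls(sprite *s)
{
    if (s->x <= LEFT_WALL)
    {
        s->x += 1;
    }
    if (s->x >= RIGHT_WALL)
    {
        s->x -= 1;
    }
}

// Bouncing adversary: at a wall, reverse heading (the "turn 180 degrees"),
// then move one step, whichever wall it was
void bounce_step(sprite *s)
{
    if (s->x <= LEFT_WALL || s->x >= RIGHT_WALL)
    {
        s->dx = -s->dx;
    }
    s->x += s->dx;
}

// Move one coordinate toward a target value by `steps`
static int toward(int from, int to, int steps)
{
    if (from < to)
    {
        return from + steps;
    }
    if (from > to)
    {
        return from - steps;
    }
    return from;
}

// Following adversary: a grid simplification of "point towards, move N steps"
void follow_step(sprite *chaser, const sprite *target, int steps)
{
    chaser->x = toward(chaser->x, target->x, steps);
    chaser->y = toward(chaser->y, target->y, steps);
}
```

Calling these functions repeatedly inside a loop plays the role of the forever block, and raising `steps` in `follow_step` from one to two is the same small change that made the chaser harder to escape in the demo.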
all right this is cs50 and this is week
one the one in which you learn a new language which is something we technically said last week at least if
you had never played with this graphical language known as scratch before which itself was a programming language but
today as promised we transition to something a little more traditional a little more text based not puzzle piece
or block based known as C this is an older language it's been around for decades but it's a language that
underlies so many of today's more modern languages among them something called python that we'll also come to in a few
weeks time indeed at the end of the semester the goal is for you to feel that you've not learned scratch you've not
learned C or even python for that matter but fundamentally that you've learned how to program unfortunately when you
learn how to program with a more traditional language like this there's just so much distraction last week I
described all of the syntax all of the weird punctuation that you see in this like the hash symbol these angled
brackets parentheses curly braces backslash and more well today we're not going to reveal what all of those little
particulars mean but by next week will this no longer look like the proverbial Greek to you a language that presumably
you've never actually seen or typed before but to do that we'll explore some of the very same topics as last week so
recall that via scratch and presumably via problem set zero we took a look at things called functions that are actions
or verbs and related to functions were arguments like inputs and related to some functions were return values like
outputs then we talked a bit about conditionals Forks in the road so to speak Boolean Expressions which are just
yes no questions or true false questions Loops which let you do things again and again variables like in math that let
you store values temporarily and then even other topics still so if you were comfortable on the heels of problem set
zero last week realize that all of these topics are going to remain with us so really today is just about
acquiring all the more of a mental model for how you translate those ideas into presumably a very cryptic new
syntax frankly that's actually more simple in some ways than your own human language be it English or something else
because there's far fewer vocabulary words there's actually far less syntax that you might have in say a typical
human language but you need to be with these computer languages all the more precise so that you're
ultimately correct and ultimately we'll see too your code is successful along a few other lines as well so if you think
about like the last time you kind of wandered around not really knowing what you were doing or encountered something
new might not have been that long ago entering Harvard Yard for the very first time or old campus or the like be it in
Cambridge or New Haven you know you didn't really need to know how to do everything as a first year you didn't
need to know who everyone was where everything was how Harvard or Yale or anything else for that matter worked you
sort of got by day to day by just focusing on those things that matter and anything you didn't really understand you sort of
turned a blind eye to until it's important and that's indeed what we're going to do today and really for the
next several weeks we'll focus on details that are initially important and try to wave our hands so to speak at
details that yeah eventually we'll get to might be interesting but for now they might be distractions and by
distractions I really mean some of that syntax to which I alluded earlier so by the end of today and really by the end
of problem set one your first foray presumably into this language called C you'll have written some code and you'll
be asking yourselves just how good is that code well first and foremost per last week be
it in scratch or phone book form like code ultimately needs to be correct to be Well Done Right you want the problem
to be solved correctly so that one sort of goes without saying and along the way this term will provide you with tools
and techniques so you don't have to just sit there sort of endlessly trying an input checking the output trying another
input checking the output there's a lot of automation tools in the real world and in this class and others like it
that'll help facilitate you answering that question for yourself is my code correct according to our specifications
or the like but then something that's going to take more time and you're probably not going to feel 100%
comfortable with the first week or the first weeks is just how well designed your code is it's one thing to sort of
speak English or write English but it's another thing or any language for that matter but it's another thing to speak
it or write it well and we spent all these years in middle school high school presumably writing papers and other
documents getting grades and feedback on them as to how well formulated your arguments were how well structured your
paper was and the like and there's that same idea in programming it doesn't matter necessarily that you've just
solved a problem correctly if your code is a complete visual mess or if it's crazy long it's
going to be really hard for someone else to wrap their mind around what your code is doing and indeed to be confident if
it is correct and honestly you the next morning the next year the next time you look at that code might have no
idea what you yourself were even thinking but you will if you focus too on designing good code getting your
algorithms efficient getting your code nice and clean and even making sure your code looks pretty which we'd describe
as a matter of style so in the sort of written human world you know having punctuation in the right place
capitalization and the like the sort of way you write an essay but not necessarily send a text message relates
to style for instance and so good style and code is going to have a few of these characteristics that are pretty easily
taught and remembered but you just have to start to get in the habit of writing code in a certain way so these three
axes so to speak correctness design and style are really the overarching goals when writing code that ultimately is
going to look like this so this program we conjectured last week does what if you run it on a Mac or PC or
somewhere else presumably what does it do yeah it just prints hello world and
honestly that's kind of atrocious that you need to hit your keyboard keys this many times with this cryptic syntax just
to get a program to say hello world so a spoiler in a few weeks time when we introduce other more modern languages
like python you can distill this same logic into literally one line of code and so we're getting there ultimately
but it's helpful to understand what it is that's going on here cuz even though this is a pretty cryptic syntax
there's nothing after this week and really next week that you shouldn't be able to understand even about something
that right now looks a little something like this so how do you write code well I've given us sort of the answer to a
problem how do you print hello world on the screen so what do I do with this code well we're in the habit of
typically writing things with like Microsoft Word or Google documents and yeah I could open up word or Google Docs
or Pages or the like and just literally transcribe that character for character save it and boom I've got a program but
the problem per last week is that computers only understand or speak what other language so to
speak yeah so binary zeros and ones and so this obviously is not zeros and ones so it doesn't matter if I put it in a word
doc Google doc Pages file or the like the computer's not going to understand it until I somehow translate it to zeros
and ones and honestly none of those tools that I rattled off are really appropriate for programming why well
they come with features like bold facing and italics and sort of fluffy aesthetic stuff that has no functional impact on
what you're trying to do with your code and they don't have the ability it would seem to convert that code ultimately to
zeros and ones but tools that do have this capability might be called integrated development environments or
IDEs or more simply text editors a text editor is a tool that a programmer uses perhaps every day to write their
code and it's a simple program here for instance a very popular one called Visual Studio code or VS Code and at
the top here you see that I've actually created in advance before class a very simple empty file called hello.c why
well .c indicates by convention that this is going to be a file in which there is C code it's not .docx which would mean in
this file is a Microsoft Word document or .pages for a Pages file this is .c which means in this file is going to be
text in the language called C this number one here is just sort of an automatic line number that's going to
help me keep track of how long or short this program is and the cursor is just blinking there waiting for me to start
typing some code well let me go ahead and type out exactly the same code for me it comes pretty comfortably from
memory so I'm going to go ahead and include something called stdio.h more on that later I'm going to sort of
magically type int main void whatever that means we'll come back to that later one of these curly braces and then a
sibling there that closes the same then I'm going to hit tab to indent a few spaces and then I'm going to type not
print but printf then hello comma world backslash n close quote close parenthesis semicolon and I dare say this was
essentially the very first program I wrote some 25 years ago I wrote it to say hi cs50 now it just says the more
canonical conventional hello world but that's it that's my very first program and all I need to now do is maybe hit
command S or control S to save the file and voila I am a programmer the catch though is like okay how do I run this
like on your Mac or PC how do you run a program well usually double click an icon on your phone you tap an icon in
this environment that we're using and that many programmers dare say most programmers use you don't have
immediately a nice pretty icon to double click on that's very user friendly but it's not very necessary especially when
you get more comfortable with programming you're going to want to type commands because it's just faster than
pointing and clicking a mouse and you're going to want to automate things which is a lot easier if it's all command or
text based as opposed to mouse and muscular movements and so here my program lives in this file
called hello.c I need to now convert it though to zeros and ones well how do I go about doing this and how am I going
to get from this so-called code or source code as it's conventionally called to these zeros and ones that
we'll now start calling machine code the zeros and ones from last week can be used not only to represent numbers and
letters colors uh audio video and more it can also represent instructions to a computer like print or play a sound or
delete a file or save a file all the sort of basics of a computer somehow can be represented by other patterns of
zeros and ones and just like last week it depends on the context in which these numbers are stored sometimes they're
interpreted as numbers like in a spreadsheet sometimes they're interpreted as colors sometimes they're
interpreted as instructions commands to your computer to do very low-level operations like print something on the
screen so fortunately last week's definition of computer science as problem solving is a nice mental model
for exactly the goal at hand I have some input AKA source code I want to Output ultimately machine code those zeros and
ones I certainly don't want to do this kind of process by hand so hopefully there's an algorithm implemented by some
special program that does exactly that and those of you who do have some prior experience this program might be called
a compiler so a few of you have indeed programmed before not all languages use compilers C in fact is a language that
does use a compiler and so I just need to find myself on my computer somewhere presumably a so-called compiler a
program whose purpose in life is to convert one language to another and source code written textually in C like
we saw a moment ago is source code the machine code is the corresponding zeros and ones so let me go back to the same
programming environment called Visual Studio code or VS Code this is typically a program you or any
programmer on the internet can download onto their own Mac or PC and be on their way with whatever computer you own
writing some code a downside though of that approach is that all of us have slightly different versions of Macs or
PCS we have slightly different versions of operating systems they may or may not be up to date it's just sort of a
technical support nightmare to create a uniform environment especially for like an introductory class where everyone
should ideally be on the same page so we can get you up and running quickly and so I'm actually using a cloud-based
version of vs code something that you only need a browser to access and then you can be on any computer today or
tomorrow by the end of the semester we're going to get you out of the cloud so to speak as best we can and get you
onto your own Mac or PC so that after this class especially if it's the only CS class you ever take you feel like you
can continue programming in any number of languages even with cs50 behind you but for now wonderfully the browser
version of VS Code should pretty much be identical to what the eventual downloadable version of the same would
be and you'll see in problem set one how to access this and how to get going yourself with your first programs but I
haven't mentioned this bottom part of the screen and this is an area where we have
what's called a terminal window so this is sort of old school technology that allows you with a keyboard to interact
with a computer wherever it may be on your lap in your pocket or even in this case in the cloud so on the top
portion of this screen is my code my text editor like tabbed windows like in many programs where I can just create
files and write code the bottom of the screen here my so-called terminal window gives me the ability to run commands on
a server that currently I have exclusive access to so because I logged into VS Code with my account online I have my
own sort of virtual server if you will in the cloud otherwise known as in this context a container this has its own
operating system for me its own hard drive if you will where I can save and create files of my own separate from
yours and vice versa and it's at this very simple prompt which is conventionally but not always
abbreviated by a dollar sign has nothing to do with currency it just means type your commands here this is where I'm
going to be able to type commands like compile my source code into machine code so it's a command line interface or
CLI on top of an operating system that you might not have ever used or seen but it's very popular called Linux odds are
almost all of us in this room are using Mac Os or Windows right now but we're all going to start using an operating
system called Linux which is in a family of operating systems that offer not only this command line interface but are used
not just for programming but for serving websites and developing applications and the like and it's indeed uh a familiar
and very powerful interface as we'll see so how do I go about making this file hello.c into a program
there's no icon to double click but there is a command I can type make hello at this dollar sign prompt go ahead and
hit enter and nothing appears to happen but that's a good thing and as we'll see in programming almost always if you
don't see anything go wrong that means everything went right so this is going to be a rarity at first but this is a
good thing that it just seems to do nothing but now there is in the folder in my account in the cloud a
file called hello and to run it it's a bit of a weird command but you'll get familiar with it before long dot just means go
into my current folder slash hello means run the program called hello in this current folder so ./hello
and then enter and voila now I'm actually not just programming but running my actual code so what have I
just done let me go ahead and do this I'm going to go ahead and open up the sidebar of this program and you'll see
in problem set one how to do this and this might look a little different based on your own configuration even the color
scheme I'm using might ultimately look different from yours cuz it supports nice colorful themes so you can have
different colors and uh brightnesses depending on your mood or the time of day what I've opened here though is what
is called in vs code Explorer and this is just all of the files in my cloud account and there's not many right now
there's only two one is the file called hello.c and it's highlighted because I've got it open right there and the
other is a file called hello which is brand new and was created when I ran that command and what's now worth noting
is that now things are getting a little more like Mac OS and windows like on the left hand side you have a GUI a
graphical user interface but on the bottom here again you have a CLI command line interface these are just different
ways to interact with computers and you'll get comfortable with both and honestly you're certainly familiar and
comfortable with GUIs already so it's the command line one with which we'll spend some time now suppose that I just
wanted to do something more than compile this program suppose I wanted to go ahead and remove it like uh-oh no I made
a mistake I want to say hello cs50 not Hello World I could just hover up here like in any software and I could
rightclick and I could poke around and there delete permanently so most of us might have that instinct on a Mac or PC
you right click or control click and you poke around but in a command line interface let me do this instead the
command for removing or deleting a file in the world of Linux this other operating system is just to type rm for
remove and then hello enter it's a somewhat cryptic confirmation message but this just means are you sure I'm
going to go ahead and type y for yes and now when I hit enter Watch What Happens at top left in the Explorer the
GUI the graphical interface voila it disappears right not terribly exciting but this just means this is a graphical
version of what we're seeing here and in fact if you want to never use the GUI again I'll go ahead and close it with a
keyboard shortcut here you can forever just type ls for list and hit enter and you will see in the command line
interface all of the files in your current folder so anything you can do with the mouse you can do with this command
line interface and indeed we'll see many more things that you can do as well but the inventors of this
operating system and its predecessors were very succinct like the command is rm for remove the command is ls for list
it's very terse why cuz ah it's just faster to type so before we Forge ahead with making something more interesting
than just hello world let me pause here to see if there's questions on source code or machine code or compiler or this
command line interface yeah really good question and let me
recap if I were to make changes to the program run it and then maybe make other changes and try to rerun it would those
changes be reflected even though I've reworded slightly well let's do this I already removed the old version so let
me go ahead and point out that if I do ./hello now I'm going to see some kind of error because I just deleted the file no
such file or directory so it's not terribly user friendly but it's saying what the problem is let me go ahead
and remake it by typing make hello now if I type ls I'll see not one but two files again and one of them is even
green with a little asterisk to indicate that it's executable it's sort of the textual version of something you could
double click in our human world so now of course if I run ./hello we're back where I started hello world but now
suppose I change it to hello comma cs50 like I did years ago let me go ahead and save the file with command S or control S
down here now let me run ./hello again and voila huh so let me ask someone else to answer that question what's the
missing step why did it not say hello cs50 yeah yeah so I didn't compile
it again so sort of newbie mistake you're going to make this mistake and many others before long but now let me
go ahead and remake hello enter it's going to seemingly make the same program but this time when I run it it's
hello cs50 all right any other questions on some of these building blocks and we'll
come back to all the crazy syntax I typed before long but for now we're focusing on just the output
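As a minimal sketch of the code being edited and recompiled in this demo, the one line of logic in hello.c might look like the following. In the lecture the printf call sits directly inside int main(void); here it is wrapped in a hypothetical helper named hello (an assumption, not the lecture's code) so that it can be called and checked on its own.

```c
#include <stdio.h>

// Hypothetical helper holding the one line of logic from hello.c.
// The text to print goes in double quotes, \n moves to a new line,
// and the semicolon ends the statement.
// printf returns the number of characters it printed, which is passed back.
int hello(void)
{
    return printf("hello, world\n");
}
```

To build it the way the demo does, the call would live inside int main(void) in hello.c. Changing the text to hello, cs50 and saving is not enough on its own: ./hello keeps printing the old message until make hello is run again to produce fresh machine code.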
yeah uh when I keep running make it creates a new version of the machine code so it keeps changing the hello
program in the hello file and that's it there's no make file per se ah good question no if I open up that
directory you'll see that there's just the one and it doesn't matter how many times I run make hello two three four five
it just keeps overwriting the original so it's kind of like just saving in the world of Google docs or Microsoft Word
or the like but there's an additional step today we have to then convert my words to the computers the zeros and
ones yeah in front ah what happens if I run hello.c so let me go ahead and do ./hello.c which
is a mistake you'll invariably make early on permission denied so what does that mean this is where the error
messages mean something to the people who design the operating system but it's a little cryptic it's not that you don't
have access to the file it means that it's not executable this is not something you have permission to run but
you do have permission to read or write it that is change it ah really good question so if I have
named my file hello.c or more generally something.c one of the things that make does is it automatically picks the file
name for me and we'll discuss this a bit more next week make itself and this is kind of the first of our
white lies today is itself not a compiler it's a program that knows how to find and use a compiler on the system
and automatically create the program if I use as we'll discuss next week the actual compiler myself I have to type a
much longer sequence of commands to specify explicitly what do I want the name of my program to be make is a nice
program especially in week one because it just automates all of that for us and so here we have now a program that very
simply prints something on the screen so let's now put this into the context of where we left off last time in the
context of scratch and inputs and outputs so we discussed last time of course functions and arguments functions
again are those actions and verbs like say or ask or the like and the arguments were the inputs to those functions
generally in those little white ovals that in scratch you could type words or numbers into we'll see in all of the
languages we're going to see this term have that same capability and let's just start to translate one of these things
to another so for instance let's put this same program in C in the context of scratch this is what hello world looked
like last week in the form of one function this week of course it looks like print and then the parentheses
notice are kind of deliberately designed in the world of scratch to resemble that same shape even though this is a white
oval you kind of get that it's kind of evoking that same idea with the parentheses technically the function
in C it's not called say it's not even called print it's called printf the f stands for formatted but we'll see what
that means in a moment but printf is the closest analogous function for say in the world of C notice though if you
want to print something like hello world or hello cs50 in C you don't just write the words as we did last week you also
have to add what if you noticed already what's missing from this version yeah so the double quotes on the left and the
right so that's necessary in C whenever you have a string of words and I'm using that word deliberately whenever
you have multiple words like this this is known as a string as we'll see and you have to put it in double quotes not
single quotes you have to put it in double quotes there's one other stupid thing that we need to have in my C code
in order to get this function to do something ultimately which is what semicolon so just like in our human
world you eventually got into the habit of using at least in formal writing periods semicolon is generally what you
use to finish your thought in the world of programming with C all right so we have that function in place now what
does this really fit into in terms of the mental model well functions take arguments and it turns out functions can
have different types of outputs and we've actually seen both already last week one type of output from a function
can be something called a side effect and it generally refers to something visual like something appearing on the
screen or a sound playing from your computer it's sort of a side effect of the function doing its thing and indeed
last week we saw this in the context of passing in something like hello world as input to the say function and we saw on
the screen hello world but it was kind of a one-off it's one and done you can't actually do anything with that
visual output other than consume it visually with your human eyes but sometimes recall last week we had
functions like the ask block that actually returned me some value remember the ask what's your name it handed me
back whatever answer the human typed in it didn't just arbitrarily display it on the screen the cat didn't necessarily
say it on the screen it was stored instead in that special variable that was uh called answer because some
functions have not side effects but return values they hand you back an output that you can use and reuse unlike
the side effect which again displays and that's it you can't sort of catch it and hold on to it so in the context of last
week we had the ask block and that had this special answer return value in C we're going to see in just a moment we
could translate this as follows the closest match I can propose for the ask block is a function that we're going to
start calling get_string string is again a word a set of words like a phrase or a sentence in programming it too is a
function insofar as it takes input and pretty much this isn't always true but very often when you have a word in C
followed by an open parenthesis and a closed parenthesis it's most likely the name of a function and we're going to
see that there's some exceptions to that but for now this indeed looks like a function because it matches that pattern
if I want to ask the question what's your name question mark and I'm even going to deliberately put a space there
just to kind of move the cursor a little bit over so that the human isn't typing literally after the question mark so
that's just a nitpicky aesthetic this is perhaps the closest analog to just asking that question but because the ask
block returns a value the analog here for get_string is that it too returns a value it doesn't just print the human's
input it hands it back to you in the form of a variable AKA a return value that I can then use and reuse now
ideally it would be as simple as this literally saying answer on the left equals and this is where things kind of
start to diverge from math and sort of our human world this equal sign henceforth is not the equal sign it is
the assignment operator to assign a value means to store a value in some variable and you read these things
weirdly right to left so here is a function called get_string I claim that it's going to return to you whatever the
human types in as their name it's going to get stored over here on the left because of this so-called assignment
operator that yes is an equal sign but it doesn't mean equality in this context it makes things equal but it does so by
copying the value on the right into the thing on the left unfortunately we're not quite done yet with C and this is
where again it gets a little annoying at first whereas scratch just let us express our ideas without so much
syntax in C when you have a variable you don't just give it a name like you did in scratch you also have to tell the
computer in advance what type of value it is storing string is one such type of value int for integer is going to be
another and there's even more than that that we'll see today and Beyond and this is partly an answer to the question that
came up one or more times last week which was how does a computer distinguish this pattern of zeros and
ones from this like is this a letter a number a color a piece of video and I just claimed last week that it totally
depends on the program it depends on the context and that's true but within those programs it often depends on what the
human programmer said the type of the value is if this specifies it's a string which means interpret the following
zeros and ones that are stored in my program as words or letters more generally if it said int for integer it
would be implying by the programmer treat the following zeros and ones in my program as a number an integer not a
string so here's where this week unlike with scratch which just kind of figures out what you mean with C and a lot of
languages you have to be this pedantic and tell it what you mean there's still one stupid thing missing from my code
here what's still missing here yeah and we still need the stupid semicolon and I'm sort of imputing it here because
honestly these are the kinds of stupid mistakes you're going to make today tomorrow this weekend next week a few
weeks from now until you start to notice this and recognize it as well as you do English or whatever your uh spoken
language is Yeah question good question suppose I mix apples and oranges so to speak and I try to put a string in an
int or an int in a string the compiler is going to complain so when I run that make command as I did earlier it's not
going to be nice and blissfully quiet and just give me another prompt it's going to yell at me with honestly a very
cryptic looking error message until we get the muscle memory for reading it other questions
ah what happened to the backslash n so we'll come back to that in just a moment if we may because I have deliberately
omitted it here but we did have it earlier and we'll see the different behavior in a sec other
questions yeah not at all nitpicky these are the kinds of things that just matter and it's going to take time to recognize
and develop this muscle memory everything I've typed here except for the W at the moment is lowercase and the
W is capitalized just because it's English everything else is lowercase and this kind of varies by language and also
context so in many languages the convention is to use all lowercase letters for your variable names other
languages might use some capitalization as well but we'll talk about that before long but this is the kind of thing that
matters and is hard to see at first especially when a little s doesn't look that different you know when it's on
your tiny laptop screen from a capital S but you'll start to develop these instincts all right so besides this
particular block let's go ahead and consider how we can go about implementing this now in code so let me
switch back to VS Code here this was the program I had earlier and let me go ahead and undo my cs50 change and this
time just rerun make on hello with the original version with the backslash n enter nothing bad seems to have
happened so ./hello enter hello world now if you're curious this is a good instinct to start to acquire what
happens if I get rid of this well I'm probably not going to break things too badly so let's try let me go ahead now
and do make hello still compiles so it's not a really bad mistake so let me go ahead and run ./hello
what's the difference here yeah what do you see that's different yeah the dollar sign my
so-called prompt stayed on the same line why well we can presumably infer now that the backslash n is some fancy notation
for saying create a new line move the cursor so to speak to the next line notice that the cursor will move to the
next line in my terminal window if I keep hitting it it just automatically by nature of hitting enter does it but it'd
be kind of stupid if when you run a program in this world simple as it is if like the next command is now weirdly
spaced in the middle of the terminal with the dollar sign it just looks sloppy right it's really just an
aesthetic argument and notice that it's not acceptable or correct to do this to hit enter there let me go ahead and save
that though and see what happens let me go ahead now and run make hello enter oh my God like four errors this is like
what 10 lines of errors for like a one-line program and this is where again you'll start to develop the instincts
for just reading this stuff these kinds of tools like the compiler tool we're using were not designed necessarily with
user friendliness in mind that's changed over the decades but certainly early on it's really just meant to be correct and
precise with its errors so what did I do here missing terminating close quote character long story short when you have
a string in C your double quotes just have to be on the same line just because now there's the slight white lie there's
ways around this but the best way around it is to use this so-called escape sequence to escape something
means generally to put a backslash and then a special symbol like n for new line and this is just the agreed upon
way that humans decades ago decided okay you don't just hit your Enter key you instead put backslash n and that tells the
computer to move the cursor to the new line so again kind of cryptic but once you know it like that's it it's just another
word in our vocabulary so now let me transition to making my program a little more interactive instead of just saying
hello world let me change it like last week to say hello David or whoever's interacting with the program so I'm
going to do string answer gets get_string quote unquote what's your name I'm not going to bother with a new line
here I could this is now just a judgment call I I deliberately want the human to type their name on the same line just
because and how do I now print this well last week recall we used say and then we used the other block called join so the
idea here is the same but the syntax this week is going to be a little different it's going to be printf which
prints something on the screen I'm going to go ahead and say hello comma and let me just go with this initially with the
backslash n semicolon let me go ahead and recompile my code whoops huh damn doesn't work still and look at all these
errors like there's more errors than code I wrote but what's going on here well this
is actually a mistake you'll see somewhat often at least initially and let's start to glean what's going on
here so here if I look at the very first line of output after the dollar sign so even though it uh jumped down the screen
pretty fast I wrote make hello at the dollar sign prompt and then here's the first error on hello.c line five
technically character five but generally line is enough to get you going there's an error use of Undeclared identifier
string did you mean standard in so I didn't and this is not an obvious solution at first but you'll start to
recognize these patterns in error messages it turns out that if I want to use string I actually have to do this I
have to include another Library up here another line of code rather called cs50.h we'll come back to what this
means in just a moment but if I now retroactively say all right what does standard IO do for us up here before I
added that new line what is standard IO doing well if you think back to scratch there were a few examples with like the
camera and with the speech to uh the text to voice remember I had to poke around in like the extensions button and
then I had to load it into scratch it didn't come natively with scratch C is quite like that some functions come with
the language but for the most part if you want to use a function an action or verb like printf you have to load that
extension so to speak that more traditionally is called a library so there is a standard IO library stdio
standard IO where IO just means input and output which means just like in MIT's world there was an extension for
doing text to voice or for using your camera in C there's an extension AKA a library for doing standard input and
output and so if you want to use any functions related to standard input and output like text from a keyboard
you have to include stdio.h and then you can use printf same goes here get_string it turns out is a
function that cs50 wrote some time ago and as we'll see over the coming weeks it just makes it way easier to get
input from a user C is very good with printf at printing output on the screen C makes it really annoying and hard as
we'll see in a few weeks to just get input from the user so we wrote a function called get_string but the only way you
can use that is to load the extension AKA load the library called cs50 and we'll come back in time to why is it .h
why is it a hash symbol but for now standard IO is a library that gives you access to printf and input and output
related stuff cs50 is a second library that provides you with access to functions that don't come with C that
include something like get_string so with that said we've now kind of teased apart at a high level what lines two and
now one are doing let me go ahead and rerun make hello now it worked so all those crazy error messages were resolved
by just one fix so key takeaway is not to get overwhelmed by the sheer number of errors let me now do ./hello and if I
type in my name what am I going to see what do you think yeah hello answer because the
computer is going to take me literally and it turns out that if you just write hello comma answer all in the double
quotes you're really just passing English as the input to the printf function you're not actually passing in
the variable and unfortunately in C it's not quite as easy to plug things in to other things that you've typed remember
in scratch there was not just the say block but the join block which was kind of pretty it let you combine
apples and oranges or what was it apple and banana then we changed it to hello and then the answer that the human typed
in in C the syntax is going to be a little different you tell the computer inside of your double quotes that you
want to have a placeholder there a so-called format code percent s means hey computer put a string here
eventually then outside of your quotes you just add a comma and then you type in whatever variable you want the
computer to plug in at that percent s location for you so percent s is a format code which serves as a placeholder and
now the printf function was designed by humans years ago to figure out how to do the apple and banana thing of joining
two words together it's not nearly as user friendly as it is in scratch but it's a very common paradigm so let me
try and rerun this now make hello no errors that's good ./hello what's my name David if I hit enter now it's hello
David and here's the f in printf it formats its input for you by using these placeholders for things like
strings represented again by percent s so a quick question then if I focus here on line seven for just a moment and even
zoom in here how many inputs is printf taking as a function a moment ago I'd admit
that it was taking one input hello world quote unquote how many inputs might you infer printf is taking
now two and it's implied by this comma here which is separating the first one quote unquote hello percent s from the
second one answer and then just as a quick safety check here why is it not three because there's obviously two
commas here why is it not actually three arguments or inputs exactly the comma to the
left is actually part of my English grammar that's all so same syntax and again here's where programming can
just be confusing because we're using the same special punctuation to mean different things it just depends on the
context and so now is actually a good time to point out all of the somewhat pretty colors that have been popping up
on the screen here even though I wasn't going to like a format menu I wasn't bold facing things I certainly wasn't
changing things to red or blue or whatnot that's because a text editor like VS Code syntax highlights for you
this is a feature of so many different programming environments nowadays VS Code does it as well if your text editor
understands the language that you're programming in C in this case it highlights in different colors the
different types of ideas in your code so for instance string and answer here are in black but get_string a function is in
this sort of nasty brown yellow here right now but that's just how it displays on the screen the string though
here in red is kind of jumping out at me and that's marginally useful the percent s is in blue that's kind of nice because
it's jumping out at me and so it's just using different colors to make different things on the screen pop so you can
focus on how these ideas interrelate and honestly when you might make a mistake for instance let me accidentally leave
off this quote here and now all of a sudden everything notice if I delete the quote the colors start to get a little
awry but if I go back there and put it back now everything's back in place what's another feature of this text
editor notice when my cursor is next to this parenthesis which demarcates the end of the inputs to the function notice
that highlighted in green here is the opening parenthesis why it's just a visually useful thing especially when
you start writing more and more code just to make sure your parentheses are lining up and that's true for these
curly braces over here on the left and the right we'll come back to those in a moment if I put my cursor there you can
see that these things correspond to one another so it's nothing in your code fundamentally it's just the editor
trying to help you the human programmer and you can even see it though it's a little subtle see these four dots here and
these four dots here that's my indentation I've configured VS Code to indent by four spaces which is a very
common convention anytime I hit the Tab Key this too can help you make sure once we have more interesting and longer
programs that everything lines up nice and neatly all right any questions then on
printf or more yeah short answer yes printf can handle more than one type of variable or value
percent s is one we're going to see percent i is another for plugging in an integer you can have multiple i's
multiple s's and even other symbols too we'll come back to that in just a little bit printf can take many more
arguments than just these two this is just meant to be representative yeah over here can you
declare variables within the printf no the only variable I'm using right now is answer and it's got
to be done outside the context of printf in this case good question we'll see more of that before long uh yeah in back
how do we download the cs50 library so we will show you in problem set one exactly how to do that it's
automatically done for you in our version of VS Code in the cloud if ultimately you program on your own Mac
or PC either initially or later on it's also installable online but if you want to ask that online or
afterward we can point you in the right direction but pset one will itself yeah string is the type of the variable
or more properly the data type of the variable int is another keyword I alluded to earlier haven't used it yet
int for integer is going to be another type or data type of variable
yeah ah good question could I go ahead and just plug in this function kind of like we did in scratch getting rid of
the variable altogether and just do this which recall is reminiscent of what I did in scratch by plopping block on
top of block on block am I answering that right can I put string in front of get_string
no you only put the word string in front of a variable that you want to make a string and even though I'm
apparently answering the wrong question let me go ahead and zoom out save this do make hello again seems to compile
okay if I run ./hello type in David voila that too works and so actually let's go down this rabbit hole for just a moment
clearly it's still correct at least based on my limited testing is this better designed or worse
designed let's open that question like we did last week yeah yeah I kind of agree with that um reasonable people could
disagree but I I do agree that this seems harder to read because like I have to I start reading here but wait a
minute get_string is going to get used first and then it's going to give me back a value so yeah it just feels like
it was nicer to read top to bottom I would say your thoughts yeah and so this is useful if I
only want to print out the person's name once if I want to use it later in a longer program I'm out of luck and so I
haven't saved it in a variable so I think long story short we could sort of debate this all day long but in this
case like if you can make a reasonable argument one way or the other like that's a pretty solid ground to stand on
but invariably reasonable people are going to disagree um whether first-time programmers or many years after that so
let's frame this one last example in the context of the same process of taking inputs and outputs the functions we've
been talking about all take inputs otherwise now known as arguments or parameters pretty much synonymous that's
just the fancy word for for an input to a function and some functions have either side effects like we saw printing
something saying something on the screen sort of visually or audibly or they return a value which is a reusable value
like name or answer in this case if we look then at what we did last time in the world of scratch last week the input
was what's your name the function was ask and the return value was answer and now let's take a look at this block
which is honestly a more user-friendly version of what we just did with the percent s last week we said say then
join then hello and answer but the interesting takeaway there was not how to say hello anything it was the fact
that in scratch too the output of one function like the green join could become the input to another function the
purple say the syntax in C is admittedly pretty different but the idea is essentially the same here though we have
hello a placeholder but we have to in this world of C tell printf what we want to plug in for that placeholder it's
just different but that's the way to do it when we get to Python and other languages later in the term there's
actually easier ways to do this but this is a very common Paradigm particularly when you want to format your data in
some way all right let's then take a step back to where we began which was with that whole program which had the
include and it had int main void and all of this other cryptic syntax this scratch piece last week was kind of
like the go-to whenever you want to have a main part of your program it's not the only way to start a scratch program you
could listen for clicks or other things not just the green flag but this was probably the most popular place to start
a program in scratch in C the closest analog is to literally write this out so just like
last week if you were in the habit of dragging and dropping when green flag clicked as a c programmer the first
thing you would do after creating an empty file like I did with hello.c you'd probably type int main void open curly
brace close curly brace and then you can put all of your code inside of those curly braces so just like scratch had
this sort of magnetic nature to it where the puzzle pieces would snap together C as a text-based language tends to use
these curly braces one of them opened the other one closed and anything inside of those braces so to speak is part of
this puzzle piece AKA main so what was atop that we went down this rabbit hole a moment ago with these
things called header files even though I didn't call them by this name but indeed when we have a whole program in scratch
super easy just have the one green flag clicked and then say hello world there's no special syntax after all it's meant
to be very user friendly and graphical in C though you technically can't just put int main void print F hello world
you also need this because again you need to tell the compiler to load the library code that someone else will
wrote so that the compiler knows what printf even is you have to load the cs50 library whenever you want to use
get string or other functions like get int as we'll soon see otherwise the compiler won't know what get string is
you just have to do it this way the specific file name I'm mentioning here standard i.h
cs50.h is what C programmers called a call a header file we'll see eventually what's inside of those files but long
story short it's like a menu of all of the available functions so in cs50.h there's a menu mentioning get string get
int and a bunch of other stuff and in as standard i.h there's a menu of functions among which are printf and that menu is
what prepares the compiler to know how to implement those same functions all right let me pause here
question ah not quite a library provides all of the functionality we're talking about a header file is the very specific
mechanism via which you include it and we'll discuss this more next week for now they're essentially the same but
we'll discuss nuances between the two next week yeah the library would be standard
IO the library would be cs50 the corresponding header file is stdio.h
cs50.h indeed other questions yeah indeed that too is on the menu we'll come back to that but the
word string is incredibly common in the world of programming it's not a cs50 idea but in C there's technically no
such data type as string by default we have sort of conjured it up to simplify the first few weeks that's a training
wheel that we'll very deliberately in a few weeks take away and we'll see why we're even using get_string and string
because C otherwise makes things quite a bit more challenging early on which then gets beside the point for us
yeah yes early on you will have to use whatever is prescribed by the
specification that will include cs50's functions long story short you referred I think a moment ago to another
function called scanf that we won't talk about for a few weeks long story short in C it's pretty easy and possible to
get input from a user the catch is that it's really easy to do it dangerously and C because it's an older uh lower
level language so to speak that gives you pretty much all uh ultimate control over your computer's Hardware it's very
easy to make mistakes and indeed um that's too why we use the library so your code won't crash
unintendedly all right so with this in mind we have this now mapping between the scratch version and the other let me
just give you a quick tour of some of the other placeholders and data types that we'll soon start seeing as we
assemble more interesting programs in the world of Linux here is a non-exhaustive list of commands with
which you'll get familiar over the next few weeks by playing with problem sets we've only seen two of these so far
ls for list rm for remove but I mention them now just so that it doesn't feel too foreign when you see them on
screen or online in a problem set cp is going to stand for copy mkdir is going to stand for make directory mv
is going to stand for move or rename rmdir is going to be remove directory and cd is going to be for change
directory and let me show you this last one here first only because it's something you'll use so commonly if I go
back to my code here on the screen I'm going to go ahead and reopen the little GUI on the left hand side the
so-called Explorer revealing that I've got two files hello and hello.c so nothing has changed since there suppose
now that you know it's a few weeks into class and I want to start organizing the code I'm writing so that I have a folder
for this week or next week or maybe a folder for problem set one problem set two I can do this in a few ways in the
GUI I can go up here and do what most of you would do instinctively on a PC you look for like a folder icon you click it
and then you name a folder like pset1 enter voila you've got a folder called pset1 I can confirm as much with my
command line interface by typing what command how can I list what's in my folder yeah so ls for list and now I see
hello and it's green with an asterisk because that's my executable my runnable program hello.c which is my source code
and now pset1 with a slash at the end which just implies that it's indeed a folder all right I didn't really want to
do it that way I'd like to do it in a more advanced way so let me go ahead and right click on pset1 delete permanently I
get a scary irreversible error message but there's nothing in it so that's fine now I've deleted it using the GUI but
now let me go ahead and start doing the same thing from the command line and if you're wondering how things keep
disappearing if you hit control L in your terminal window or explicitly type clear it will clear everything you previously
typed just to kind of clean things up in practice you don't need to be doing this often I'm doing it just to keep our
focus on my latest commands if I do what was the command to make a new directory yeah so mkdir make
directory let me create pset1 enter and notice at left there's my pset1 if I want to get a little overzealous
and plan for next week here's my pset2 directory suppose now I want to open those folders on a Mac or PC or in
this GUI I could double click on it like this and you'd see this little arrow is moving it's not doing anything
because there's nothing in there but that's fine but suppose again I want to get more comfortable with my command
line notice if I type ls now I see all four same things let me change directories with cd so cd space pset1
enter and now notice two things will have happened one my prompt has changed slightly to remind me where I am just to
keep me sane so that I don't forget what folder I'm actually in so here is just a visual reminder of what folder I'm
currently in if I type LS now what should I see after hitting enter nothing because I've only created
empty folders so far and indeed I see nothing if I wanted to create a folder called Mario for a program that might be
called Mario this week I can do that now if I type LS there's Mario now if I do CD Mario notice my prompts going to
change to be a little more precise now I'm in piece at one/ Mario and notice what's happening at top left nothing
nothing now cuz these folders are collapsed but if I click the Little Triangle there I see Mario nothing's
going on in there CU there's no files yet but suppose now I want to create a file called
mario.c I could go up here I could click the little plus icon and use the goey or I can just type code mario.c voila that
creates a new tab for me I'm not going to write any code in here yet but I am going to save the file and now at top
left you'll see that mario.c appears so at some point you can eventually just close the explorer cuz again it's not
providing you with any new information it's maybe more user friendly but there's nothing you can't do at the
command line that you could do at the GUI all right but now I'm kind of stuck how do I get out of this folder in my
Mac or PC World I'd probably click the back button or something like that or just close it and start all over in the
terminal window I can do cd .. where .. is a nickname if you will for the parent directory that is the previous
directory so if I hit enter now notice I'm going to close the mario folder AKA directory and now I'm back in
pset1 or if I want to be fancy let me go back into mario temporarily if I type ls there's mario.c just to orient
us if I want to do multiple things at a time I could do cd uh cd ../.. which goes to my parent to my
grandparent all in one breath and voila now I'm back in my default folder if you will and one last little trick of the
trade if I'm in pset1/mario like I was a moment ago and you're just tired of all the navigation if you just type
cd and hit enter it'll whisk you away back to your default folder and you don't have to worry about getting there
manually recall a bit ago though that I was running hello as this ./hello if .. refers to my parent perhaps
infer here syntactically what does a single dot mean instead it means this directory your
current directory why is that necessary it just makes super explicit to the computer that I
want the program called hello that's installed here not in some random other folder on my hard drive so to speak I
want the one that's right here instead all right so besides these commands there's going to be others that we
encounter over time those are kind of the basics that allow you to sort of wean yourself off of a GUI graphical
user interface and start using more comfortably with practice and time a command line interface instead well what
about those other types now back in the world of C those commands were not C those are just commands specific to a
command line interface like in Linux which again we're using in the cloud it's an alternative to macOS and
windows back in the world of C now we've seen strings which are words I mentioned int or integer but there's others as
well in the world of C we've seen string we will see int if you want a bigger integer there's something literally
called a long if you want a single character there's something called a char
if you want a Boolean value true or false there is a bool and if you want a floating point value a fancy way of
saying a real number something with a decimal point in it that is what C and other languages call a float and if you
want even more numbers after the decimal point that is more Precision you can use something called a double that is to say
here's again an example in programming where it's up to you now to provide the computer with hints essentially that it
will rely on to know what is this pattern of zeros and ones is it a number a letter is it a sound an image a
color or the like these are the types of data types that provide exactly those hints what are the functions that come
in the menu that is the CS50 library we talked about standard I/O and that's just one function so far printf in the CS50
library you can see that that follows a pattern the CS50 library exists largely for the first few weeks of the class to
make our lives easier when you just want to get user input so if you want to get a string like a word or
words from the human you use get_string if you want to get an integer from the user you're going to use get_int when
you want to get any of those other data types for the most part you use get_ something else and they're indeed all
lowercase by convention what about printf if we have the ability now to store different types of data and we
have functions with which to get different types of data how might you go about printing different types of data
well we've seen %s for string %i for integer %c for char %f for a float or a double those
real numbers I described earlier and then %li for a long integer so here's the first example of like
inconsistencies in an ideal world it would just be %l and we'd move on it's %li instead in this
case that's printf and some of its format codes um what more might we do well in C as we'll see no pun intended
there's a whole bunch of operators and indeed computers one of the first things they did was a lot of math and
calculation so there's a lot of operators like these computers and in turn C really good at addition
subtraction multiplication division and even the percent sign which is the remainder operator there's a special
symbol in C and other languages just for getting the remainder when you divide one number by another um there are other
features in the world of C like variables as we've seen and there's also what is sort of playfully called
syntactic sugar that makes it easier over time to write fewer characters but express your thoughts the same so just
as a single example of this consider this use of a variable last week here in Scratch is how you
might set a variable called counter to zero in C it's going to be similar if you want the variable to be called
counter you literally write the word counter or whatever you want it to be called you then use the assignment
operator AKA equal sign and you assign it whatever its initial value should be here on the right so again the zero is
going to get copied from right to left into the variable because of that single equal sign but this isn't sufficient in
C what else is missing on the right hand side instinctively now even if you've never programmed in this before yeah in
front a semicolon at the end and one other thing I think is probably missing again a data type so what would you if
we can uh keep going back and forth here what data type seems appropriate intuitively for counter int for integer
so indeed we need to tell the computer when creating a variable what type of data we want and we need to finish our
thought with the semicolon so there might be a counterpart there what about in Scratch if we wanted to increment
that counter variable we had this very user friendly puzzle piece last time that was change counter by one or add
one to counter in C here's where things get a little more interesting and pretty commonly done you
might do this counter equals counter plus one with a semicolon and this is where again it's important to note the
equal sign it's not equality otherwise this makes no sense counter cannot equal counter plus one right like that just
doesn't work if we're talking about integers here that's because the equal sign is assignment so it can certainly
be the case that you calculate counter plus one whatever that is then you up at the value of counter from right to left
to be that new value this as we'll see is a very common thing to do in programming just to kind of count upward
for whatever reason you can write this more succinctly this code here is what we'll call syntactic Sugar sort of a
fancy way of saying the same thing with fewer words or fewer characters on the screen this also adds one or whatever
number you type over here to the variable on the left and there's one other form of syntactic Sugar We're
going to start seeing too and it's even more terse than this that too will increment counter by one by literally
changing its value by one or if you change it to minus minus subtracting one from it you can't do that with two and
three and four but you can do it by default with just plus plus or minus minus adding or subtracting one
yeah uh so when you are changing a variable that already has been created as we did with the code that looked like
this you no longer need to remind the computer what the data type is thankfully the computer's uh at least
as smart as that it will remember the type of the data that you intended other questions or comments on
this all right that's quite a lot why don't we go ahead here take a a 10-minute break and we'll be back we'll
start writing some code all right so we are back we've just looked at some of the basics of
compiling even if it doesn't quite feel that basic but now let's actually start to focus really on writing more and more
code more and more interesting code kind of like we dove into Scratch uh last week so here I have VS Code open I've
closed the GUI I'm going to focus more on my terminal window and my code editor there's many different ways I can create
new files but I want to create something called a calculator so again within this environment of VS Code I can literally
write the code command which is VS Code specific and it just creates a new file for me automatically or I could do that
in the GUI I'm going to go ahead and create this file called calculator.c and I'm going to go ahead and include
some familiar things so I'm just going to go ahead and proactively include cs50.h stdio.h I'm going to go
ahead from memory and do the int main void more on that next week why it's int why it's void and so forth and now let
me just Implement a very simple calculator we saw like some uh mathematical operators like plus and the
like so let's actually use this so let me go ahead and first give myself a variable called X sort of like uh grade
school math or algebra let me go ahead then and use get_int which is new but I mentioned this exists and then let me
just ask the user for whatever their x value is the thing in the quotes is just the English or the string that I'm
printing on the screen so I could say anything I want I'm just going to say x colon to prompt the user accordingly now
I'm going to go ahead and get another variable called y I'm going to use get_int again and now I'm going to prompt the
user for y and I'm just very nitpickily using a space just to like move the cursor so it doesn't look too messy on
the screen and then lastly let me go ahead and just print out the sum of x and y in an ideal world I would just say
something like printf x plus y but that is not valid in C the first argument recall in printf has to be a string in
double quotes So if I want to print out the value of an integer I need to put something in quotes here maybe followed
by a new line if I want to move the cursor as well so again we only glimpsed it briefly but what do I
replace these question marks with if I want a placeholder for an integer yeah so %i right just
like %s was string %i is integer so I change this to %i and now if I want to add x and y for
instance super simple calculator doesn't do much of anything other than addition of two integers I think this works and
again it looks definitely cryptic at first glance like it would be nice if programming weren't this cryptic other
languages will clean this up for us but again if you focus on the basics printf takes one input first which is a format
string with English or whatever language some placeholders maybe then it takes potentially more arguments after the
comma like the value of x + y all right let me go ahead now and make calculator which again compiles my source code in C
pictured above and converts it into corresponding machine code or zeros and ones no error messages so that's
already good now I do ./calculator let's do 1 + 1 and enter voila now I have the makings of a
calculator now let's start to tinker with this a little bit what if I instead had done this int z gets x + y and then
plug in z here if I rerun make calculator enter rerun ./calculator type in 1 + 1 still equals 2 and let me
claim that it will work for other values as well which of these versions is better designed if both seem to be
correct at very cursory glance is this version better or is the previous one without the Z okay so this one's
arguably better because I've now got a reusable variable called Z that I can not only print but Heck if my program's
longer I can use it elsewhere counterarguments yeah debatable like before because it depends on my intent and
honestly I think a pretty good argument can be made for the first version because if I have no intention
of as you note using that letter uh that variable again you know what maybe I might as well do this just because it's
one less thing to think about it's one less distraction it's one less line of code to have to understand it's just a
little tighter so here again I think it does depend on your intention but this feels pretty reasonable and I think
as someone noted earlier when I did the same thing with get_string that yeah maybe kind of crossed the line because
get_string and the what's your name inside of it it was just so much longer but x plus y
it's not that hard to wrap our mind around what's going on inside of the printf argument so again these are the
kinds of thoughts that hopefully you'll acquire the Instinct for um not necessarily reaching the same answer as
someone else but again the thought process is what matters here all right so how might I enhance this program a
little bit let's just talk about style for just a moment so X and Y at least in this case are pretty reasonable variable
names why because like that's the go-to variable names in math when you're adding two things together so X and Y
seem pretty reasonable I could have done something like well maybe my first variable should be called
first number and my next variable should be called second number and then down here I would have to change this to
first number plus second number like this isn't really adding anything semantically to help my comprehension
but that would be one other direction we could have taken things so if you have very simple ideas that are
conventionally expressed with common variable names like x and y totally fine here what if I want to annotate
this program and remind myself what it is it does well I can add in C what are called comments with a slash slash
two forward slashes you can write a note to yourself like prompt user for x and then down here I could do something like
prompt user for y just to remind myself what I'm doing there and down here perform addition now in this case
not sure these comments are really adding all that much because in the time it took me to write and eventually read
these comments I could have just read the three lines of code but as our programs get more sophisticated and you
start to learn more syntax that honestly you might forget the next day the next week the next month might be useful to
have these sort of notes to self that remind you of what your code is doing or maybe even how it is doing that thing
with these early programs not really necessary doesn't really add all that much to our comprehension but it is a
mechanism you have in place that can help you actually uh remind yourself or remind someone else what it is that's
going on well let me go ahead and rerun this again in this current version make calculator and here too you might think
I'm typing Crazy Fast not really I'm hitting tab a lot so it turns out that Linux the operating system we're using
here in the cloud but actually Windows and macOS nowadays support this too supports autocomplete so if you only
have one program that starts with C you don't have to finish writing calculator you can just hit Tab and the computer
will finish your thought for you the other thing you can do is if you hit up and keep going up you'll scroll through
your entire history of commands so there too I've been saving some key strokes by hitting up quickly rather than retyping
the same darn thing again and again so again just another little convenience to make programming and interacting with a
command line interface even faster all right let me go ahead and just make sure it's compiled in the current form the
comments have no functional impact these green things are just notes to self let me run calculator with maybe how about
this instead of 1 + 1 how about uh one billion uh oops I got 1 million one billion and another 1
billion and that answer is two billion all right so that seems correct let's run this program one more time how about
two billion plus another 2 billion did you know that so apparently it's not so correct
and clearly running 1 plus one was not the most robust testing of my code here what might have gone wrong what might
have gone wrong yeah yeah the computer probably ran out of space with bits so it turns out with
these data types we've been talking about string and int and also float and Char and those other things they all use
a specific and most importantly finite number of bits to represent them it can vary by computer newer computers use
more bits older computers tended to use fewer bits um it's not necessarily standardized for all of these data types
but in this case in this uh environment it is using 32 bits for an integer that's a lot with 32 bits you can count
pretty high this is 64 light bulbs on the stage and could count even higher an int is only using half of these or we
have two integers here on the stage now if you think back to last week we talked about eight bits at one point and if you
have eight bits eight zeros and ones you can count as high as 255 that is 256 values including zero just a good number to generally remember as trivia 8
bits gives you 256 permutations of zeros and ones 32 gives you roughly how many if anyone knows it's 2 to the 32nd power
so it's roughly 4 billion 2 to the 32 if you don't know that it's it's fine most uh programmers they eventually remember
these kinds of heuristics so it's roughly 4 billion so that feels like enough 2 billion plus 2 billion is
exactly 4 billion and that actually should fit in a 32-bit integer the catch is that my Mac
your PC and the like also like to support negative numbers and if you want to support both positive and negative
numbers that technically means with 32-bit integers you can count as high as roughly 2 billion positive or 2 billion
negative in the other direction that's still four billion give or take but it's only half as many in One Direction or
the other so how could I go about implementing a correct calculator here what might the solution
be yes so not just %li which was for long integer I have to make one more change which is to the data type itself so let
me go back up here and change x from an int to a long AKA long integer and then let me change y as well and then let me
change the format code per the little cheat sheet we had up a few minutes ago to %li let me recompile the calculator
seems to work okay let's rerun it now let's do 1 + 1 that should obviously be the same now let's do two
billion and another two billion and cross our fingers this time now we're counting as high as four billion and we
can go way higher than 4 billion but we're only kicking the can down the street a bit even though we're now using
with a long 64 bits which is as long as this stage now that's still a finite value it might be a really big value but
it's still finite and we'll come back at the end of today to these kinds of fundamental limitations because arguably
now my calculator is correct for like millions billions of possible inputs but not all and that's problematic if
you actually want to use my calculator for any possible uh inputs not just ones that are roughly less than say two
billion as in this case are any questions then on that but it's really just a precursor for all the problems
that we're going to have to eventually deal with later on good question yes if we were
still using z we would also have to change it to a long otherwise we'd be ignoring 32 of the bits that had been
added together via the longs good question all right so how about we spice things up with maybe not just addition
here how about um something with some conditions let's start to ask some actual questions so a moment ago recall
that we had uh just the declaration of variables now let's look back at something in Scratch
that looked a little something like this a bunch of puzzle pieces asking questions by way of these conditionals
and then these Boolean expressions here in green maybe saying something like x is less than y in C this actually maps
pretty cleanly it's much cleaner from left to right than it was with printf and join uh here we have just code that
looks like this if a space two parentheses and then x less than y and then we have something like printf
there in the middle so here it's actually kind of a nice mapping notice that just as the yellow puzzle piece in
Scratch is kind of hugging the purple puzzle piece that's effectively the role that these curly braces are playing
they're sort of encapsulating all of the code on the inside the parentheses represent the Boolean expression that
needs to be asked and answered to decide whether or not to do this thing and here's an exception to what I alluded to
earlier usually when you see a word and then a parenthesis something and then close parenthesis I claim that's usually
a function and I still feel pretty good about that claim but there are exceptions and the word if is not a
function it's just a programming construct it's a feature of the C language that similarly uses parentheses
just for different purposes for a Boolean expression how about something like this last week if you wanted to
have a two-way fork in the road go this way or that way you can have if and else in C that would look a little
something like this and if we add in the printfs it now looks quite like the same but it adds of course the word else and
then a couple of more curly braces as an aside in C it's not strictly necessary to have curly braces if you have only
one line of code indented underneath for best practice though do so anyway because it makes super clear to you um
and ultimately anyone else reading your code that you intend for just that one or more lines of code to execute how
about this from last week here was a three-way fork in the road if x is less than y else if x is greater than
y else if x equals y now here's where you have some disparities between Scratch and C
Scratch uses an equal sign for equality to compare two values C uses a single equal sign for assignment from right to
left minor difference between the two worlds in C we could implement the same code like this the addition being just
this additional else if and if we add in the printfs it looks a little something now like this this is correct both in
the Scratch world and in the C world but could someone make a claim that this is not again well designed exactly we
don't need the last if we need the else at least but we don't need that last if because at least in the world of
comparing integers it's either going to be less than greater than or equal to there is no other case so you can save a
you know a few seconds if you will of your program running a blink of the eye by only asking two questions and then
inferring what the answer to the third must be just by nature of your own human logic here now why is that a good thing
if for instance X and Y happen to equal each other I type in one and one for both values either in scratch or in the
C world in the case of this version you're sort of stupidly asking three questions all of which are going to get
asked even though the answer is no no yes that is true uh false false true that seems to be unnecessary because if
we instead optimize this code get rid of the unnecessary if and just do as you proposed logically else print that x is
equal to y now if x indeed equals y because they're both one or some other value now you're only going to ask two
questions so 2/3 as many questions and then you're going to get your same correct result so again A Minor Detail
but again the kinds of things you should be thinking about not only as you write your code uh to be correct but also
write it to be well designed as well all right so why don't we go ahead and translate this into the context of an
actual program here I'll create a blank window here and let's do something with like points like points on my own very
first CS50 problem set uh let me go ahead and run code of points.c that's just going to give me a new text file
and then up here I'm going to do my usual include cs50.h include stdio.h int main void
so a lot of boilerplate so to speak in these early programs and now let me just let's see let's ask the user how many
points did they lose on their most recent CS50 pset to sort of evoke uh my photograph of my own very first pset last
week where I lost a couple of points myself so int points equals get_int then I'll ask a question in English like how
many points did you lose question mark space and then once I have this answer let's now ask some questions of it so if
points is less than two borrowing the syntax that we saw on the screen a moment ago let's go ahead and print
out something explanatory like you lost fewer points than me backslash n all right else if points greater than two which is
again how many I lost I'm going to go ahead and print out you lost more points than me backslash n else if wait a minute
else seems to be sufficient logically here I'm just going to go ahead and print out something like uh you lost the
same number of points as me backslash n so really just a straightforward application of that simple idea
but to like a concrete scenario here so let me go ahead and save this uh let me go ahead and run make points enter no
errors that's good run ./points and then how many points did you lose how about it's one point all right you lost fewer
points than me how about zero points even better how about three points and so forth so again we have the ability to
express in C now pretty basic idea from last week in reality which is this notion of conditionals and asking
questions there's something subtle here though that's maybe not super well designed that someone might call a magic
number this is sort of programming speak for something I've done here there's a bit of redundancy unrelated to the if
and the else if and the else but is there something I typed twice just to ask perhaps for the
obvious exactly I've I've hardcoded so to speak manually typed out the number two in two locations in this case that
did not come from the user where so this is apparently once I compile this this is it you're always comparing yourself
to me in like 1996 which for better or for worse is all the program can do but this is an example too of a magic number
in the sense like wait where did that two come from and why is it in two places it feels like we are setting the
stage for just a higher probability of screwing up down the road because the longer this code gets suppose I'm
comparing against two points elsewhere two three four five places am I going to keep typing the number two like yeah
that's fine it's correct it's going to work but honestly eventually you're going to screw up and you're going to
miss one of the twos you're going to change it to a three because maybe I did worse the next year or one I did better
and you don't want these numbers to get out of sync so what would be like a logical Improvement to this design
rather than hardcoding the same number sort of magically in two or more places yeah why don't I make a variable
that I can use in there so for instance I could create a variable like this another integer called mine and I'm just
going to initialize it to two and then I'm going to change mentions of two to this and mine is a pretty
reasonable name for a variable insofar as it uh refers to exactly whose points are in question um there's a risk here
though minor though it is I could accidentally change mine at some point maybe I forget what mine represents and
I do some addition or subtraction so there's a way to tell the computer don't trust me because I'm going to screw up
eventually by making a variable constant too so a constant in a programming language this did not exist in scratch
is just an additional hint to the computer that essentially enables you to program more defensively if you don't
trust yourself necessarily to not screw up later or honestly in practice if you know that number should never change
make it constant and never think about it again this tells the compiler to make sure that even you later in your code
cannot change the number two and another convention in C and other languages when you have a constant it's often common to
just capitalize the variable kind of like like you're yelling but it really just visually makes it stand out so it's
kind of like a nice rule of thumb that helps you realize oh that must be a constant capitalization alone does not
make it constant the word const does but the capitalization is just a visual reminder that this is somewhere somehow
a constant so just a minor refinement but again we're sort of getting better at programming just by sort of um
instilling these kinds of heuristics questions then on conditionals in C or these constants
yeah yeah why do you not use a semicolon on like lines 9 13 no good like just because like this is the way the
language was designed and it's confusing early on generally speaking when you're using conditionals and eventually we'll
see Loops there's no semicolons involved for now assume that semicolons usually finish your thought after a function
that's not a 100% reliable heuristic but it will get you most of the way there and just because left hand was not
talking to right hand when some of these languages were designed all right so let's do something else how about this
if I have the ability to ask something conditionally is this thing true or is this other thing could I write a very
simple program that does something basic like tells me if a number that the human types is even or odd well let me just
get the framework for that in place let me go ahead and write code of parity.c parity is a fancy way of saying even or odd and
let me go ahead and include cs50.h include stdio.h int main void again more on those down the road
but for now I'm going to go ahead and get a number n from the user uh by calling get_int and asking them for
whatever n is and then now I'm going to introduce some pseudo code so here's the first example of a program honestly that
I'm not really sure how to proceed so let me just resort to some pseudo code using comments eventually I'll get rid
of this and write actual code but if n is even then print actually let me just print that let me just go ahead and say
printf quote unquote even because I know how to use printf uh else all right I know how to printf odd
so let me just say printf quote unquote odd so here I've sort of taken a bite out of the problem if you will and let
me go ahead and put in my little placeholders I want to do some kind of condition so if question marks now let
me go ahead and fill in the blanks here else I'll put this here so I think I'm okay now I'm getting closer to solving
this but I still have this question mark here how using syntax we've seen might I determine if n is even or odd what do
you think? Nice. There's this little operator I mentioned by name earlier, the remainder operator, that will let you
do exactly that. If you divide any number by two, that mathematical heuristic is going to tell you if it's even or odd
based on whether there's a remainder of zero or one. And that's nice, because the alternative would seem to be doing
something stupid like if n equals 0, or n equals 2, or n equals 4, or... right, your code would be
infinitely long if you had to ask all possible questions. But if I do n divided by two and look at the remainder,
it's a little cryptic, but this will indeed do the trick. So the percent sign is the remainder operator. It does
numerator divided by denominator and returns not the result of that but rather the remainder of that. So if you
divide anything by two, it's going to be a zero or one remainder. And if indeed two divides into n evenly, giving you
zero, then you're going to print even; else it's got to be odd. But there is something odd, pun intended, in this
highlighted line. What is another new piece of syntax, apparently, besides the percent
sign? What's a little off there? Yeah. So that's not a typo, and I even caught myself verbally saying it a
moment ago, just because it's so ingrained. What must this mean here?
Yeah, if something's equivalent to the other. So this is the equality operator; it's not assignment from right
to left. And this one, too, is an example of humans literally not planning ahead, perhaps, left hand not
talking to right hand, in that someone decided, let's use the equal sign for assignment, and then some number of
minutes or days later people were like, damn, how do we now compare for equality? Well, let's just use two. And if you think
this is a little weird, in some languages like JavaScript there's a third version where you use three equal signs. So again,
it's humans that design these languages, so if you're ever frustrated by them or confused by them, admittedly, it might
just not have been the best design, but we've just kind of had to live with it ever since. So let me go ahead and zoom
out here. Let me go ahead and make parity here. So make parity, and again, parity is just the name of my file, parity.c.
./parity, type in a number like two, that's indeed even; four, that's indeed even; three, that's indeed odd; and so forth. If
we continue testing, presumably we'll get the same kinds of answers. How about something else? Let me go ahead now and
let me start copying and pasting some of this code, because admittedly it's getting a little tedious to keep typing
out all of that boilerplate at the top. Let me create a program called agree.c that's reminiscent of any of those
forms you have to agree to online with a checkbox or by typing in yes or no or the like. So let me throw away all the guts
of this main program and now ask something like this. Let me go ahead and prompt the user to agree to something. I'm
going to go ahead and say, how about, get_string, "Do you agree?" or whatever the question might be. And I want the human
to type yes or no, y or n for yes or no respectively. So if it's only a single character, I can actually get by
with just get_char. We've not used it before, but it was on our menu of functions from the CS50 library. And if I want to get
the user's response, the return value should be a char also on the left. So now we've seen strings, ints, and now chars, if
we only care about a single letter. And now let's go ahead and check whether the user agreed. So how about if c equals
equals, quote unquote, y, then let me go ahead and, inside of my curly braces, print out "agreed" or some such sentence
like that. Else, if they did not type y, or, you know what, let's be explicit here, just so they can't type z or b or some
random letter: else if c equals equals, quote unquote, n for no, then let me go ahead and print out "not agreed" or
something like that. And I'm just going to ignore the user if they don't cooperate and they type z or b or
something that's not y or n. All right, let me go ahead now and compile this code: make
agree, ./agree. All right, do I agree? Yes, let's go with the default. Okay, so that seems to work. No, I don't agree this time.
That seems to work. How about if my caps lock key is on, or I'm just really yelling? Capital Y: it ignores me. Capital
N: it ignores me. So obviously a bug, at least if I want to tolerate uppercase and lowercase, which is kind of
reasonable. So what would be the possible solutions here, do you think? How do I solve this and tolerate both
capital and lowercase? Maybe what's the simplest, most naive implementation? Yeah, so why not just ask
two questions? Or, you know what, even more simplistically, based only on what we've seen before, let me just
kind of copy and paste some of this code, change this to an else, whoops, not in caps, else if, quote unquote, capital Y, and
then I bet I could do the same thing with n. But here, too, just like with Scratch, as soon as you start to find
yourself copy-pasting, you're probably doing something wrong. And what you said verbally, if I may, was actually better,
because you're implying that I could just say something like "or c equals equals capital Y," or, down here, "or c equals
equals capital N." The catch is you can't use the word "or" in C; it's actually two vertical bars. So you can express one
question or another; you only need one of the answers to be yes or true, and you use two vertical bars. By contrast, just so
you've seen it, if you wanted to check if something is equal to something and something else, you could use two
ampersands. This logically would make no sense here, though; certainly what the human typed can't both be lowercase and
uppercase. So in this case we do want "or," and that allows me to tighten my code up. I don't have to
start copying and pasting whole branches; I can now ask two questions at once. Questions, then, on this variation?
Really good question: can you convert the input to all lowercase? Absolutely, you could. We don't have the capability yet.
It turns out that's going to require, to be easy, another library, though we could do it ourselves knowing a little bit about
ASCII or Unicode from last week. But yes, that would be an alternative; more on that a different time. Other
questions? Good question. Unfortunately, you have to be explicit in C. You can't just say this, even though that's kind of
how you might think about it. You have to ask a complete question, using the equality sign twice in this
case. Let me ask a question now, too. It's not a typo: I deliberately used single quotes around all of my single letters
here. Why might that be? Previously we used double quotes for anything that looked like
text. Yeah, correct: a string uses double quotes, for multiple characters or even one,
technically, and single quotes are for single characters. Because my data type is different, I chose the route
of just using a single char. In fact, this program won't work with "yes" or "no"; that's not supported at the moment. More on that
another time. I had to use single quotes because that's how C does it. If you're dealing with single characters, a.k.a. chars,
use single quotes. If it's a string, even if it's one single character in a string, as though you're starting to write out a
longer word or sentence, that would be double quotes. We'll see why this is before long, too, but again, just sort of
things to keep in mind whenever writing code in this particular language. Yeah, down
here? So, short answer, if I'm understanding correctly: this would be incorrect, and this would be
even more incorrect. But if you don't mind, let me kick the can a couple of weeks on this as to why this doesn't
work the right way. The most pleasant way to do this would indeed be to do something like this, but even this is a
slippery slope, because what if the user does something weird, like they capitalize just the W? You could imagine
this getting messy quickly. I like your idea earlier about just forcing everything to lowercase, just to
standardize things. Unfortunately, you cannot compare strings for equality like this, for, again, reasons we'll come to
before long. So for today we're keeping it simple, even though arguably it's not nearly as user friendly to only tolerate
individual letters. And there's a question over here. On the US English keyboard,
it's Shift and then the backslash key above Return, but depending on your keyboard it will
vary. All right, so let's actually now look back at something we did a little bit last week. Let me go ahead and open a
file called meow.c, because recall that's what we had Scratch do initially. Let me include not the CS50 library this
time but just stdio.h, because I only want printf for this demo. Let me go ahead now and just print out
meow, and then, if I want the cat to meow three times like it did last week: meow, meow, meow. Save it, make meow, ./meow, voila.
The program is written. It's correct, I claim; it ran, it compiled okay. But again, this was the beginning of our conversation last
week about it not being particularly well designed. And if someone wants to maybe point out the now obvious, why is
this not well designed, necessarily? Yeah, it's just repetition, right? Again, I literally resorted to
copy-paste, and that should be the signal that you're probably doing something wrong, or at best it's just lazy of you in
this case. So the solution, as you might glean from last week, is probably going to be one of those things called loops.
So let's just take a look at some of the syntax for loops and see. But again, no new ideas; it's just some new syntax
that'll take some getting used to. In Scratch, if you wanted to meow forever, you'd use something like this. There's not a
forever keyword in C, so this one's a little weird to look at, but this is the best we can do. It turns out there is a
keyword called while in C, and that kind of has the right semantics, because it's like "while I do something again and
again." That's the best I can do. But just like an if condition or an else if condition, which took a Boolean expression in
parentheses, a while loop also takes a Boolean expression in parentheses. So I have to ask a question now. If I want to
do something forever, I could kind of stupidly just say while two is greater than one, or while three is greater than two,
or just something completely arbitrary. But that should rub you the wrong way, because why two versus one, why three?
If you want true, just say true. So it turns out in C there are special keywords true and false that are
literally true and false respectively. I could also put the number one for true and the number zero for false, but most
people would just say true to be explicit. So it's a little hackish, if you will, but very conventional. There's no
forever keyword in C. If I want to then print meow forever, I'm going to just use something like printf here. So again, not
a perfect translation from one to the other, but absolutely possible in C.

What about this? This is a little more common,
if you want to do something a finite number of times, like repeat three. There are a few different ways we can do
this in C. Here's one I would approach, and here's where, with C, like a lot of text-based languages, you kind of have to whip
out that toolkit of all of the basic building blocks and think about, all right, how can I build a little machine
in software that does something some number of times? Well, let me give myself a variable called counter, set it equal
to zero. Let me create a loop whose Boolean expression is: is counter less than three?
The idea being here, why don't I just kind of count one, two, three? So how do I implement this physicality in code? I
give myself a variable, set it to zero: zero fingers up. Now I ask the question, is counter less than three? If so, go
ahead and print out meow. And just intuitively, even if you've never seen C code or any code before Scratch, what
more do I need to do? I've left room here for one more line of logic. Yeah, we have to increase counter. So I need code like
I showed earlier, like counter equals counter plus one. And so here's where programming sometimes becomes a bit
more like plumbing. You can't just say what you mean, like you could in Scratch; you have to build a little sort
of software machine that initializes a value, does something, increments it, checks it. So it's kind of like this
software-based machine, but altogether that's just using some familiar building blocks. But this is pretty common. Just
like in Scratch you might have used loops a bunch of times, they're pretty common in C. So can we tighten this code up? This is
correct, but here are some conventions that are popular. If you're going to count, just say
i. A convention in programming, with at least languages like C, is to just use i as an integer if all its purpose is to
count from, like, zero on up. Counter is not wrong, it's not bad, it's just more verbose than you need to
be. Just call it i; you don't need more semantics than that. All right, what else can I do here? There's another
opportunity to tighten up this code, do you recall? Yeah, that syntactic sugar that does
nothing new but does it more succinctly. I can change this to either the intermediate format or the even
tighter format of just i++. Now, this is pretty canonical. This is how many people, most people, would implement
something three times using a loop in C, using a while loop, that is. It turns out that it's so common in C and other
languages to do something finitely many times that there are a couple of ways to do it. In this model, to be clear, the logic,
though, is that we start by initializing the variable, like I've highlighted here. We then ask the question, is i less than
three? If so, everything that's indented inside the curly braces gets executed, namely meow, then the update. Then the
computer is going to recheck the condition to make sure that i hasn't gotten so big that it's no longer less than
three. If it hasn't, it then does this again, and it does this again, and then it repeats, constantly checking the
condition and executing what's in the block, checking the condition and executing what's in the block. After
three times of that, the condition is going to be false, or a no answer, and that's it for the code. It just proceeds
to whatever is down here; just like with Scratch, it jumps to the next blocks down below. All right, what's another way
though, to do this? Well, I've deliberately been counting from zero, and that's a programming convention, right? We started
last week with all the light bulbs off, which was zero, so it's pretty reasonable to start counting at zero, just like you
would here: no fingers are up, this is zero fingers on your hand. But if you prefer, you could start counting at i
equals 1, but then you don't want to do it while i is less than 3; you want i less than or equal to 3. On most
keyboards there's no symbol for less than or equal to, or greater than or equal to, so in C you use two characters:
less than, and then an equal sign, with no spaces in between. That just means less than or equal to. We could change it to
set i to two and make this condition be less than or equal to four. We could make this be, um,
10 and less than or equal to 12. But again, just stick with the basics: start at zero and count on up would be the
convention. Or if you prefer to count down, that's fine too. Set i to three and then do this so long as i is greater than
zero, but you have to decrement instead of increment. So again, we could do this all day long. There's literally an
infinite number of ways to implement this idea, and that's why I keep emphasizing convention. Call the variable
i for something like this, initialize it to zero for something like this, and just generally count up unless you really
prefer to count down. Again, just certain human conventions. All right, how about another way to do this? This is what's
called a for loop in C, also very common. It's not quite as straightforward in that it doesn't really read top to
bottom in exactly the same way; this kind of has a lot more logic tucked into its first line. But it does exactly the
same thing. What happens here is, notice that inside the parentheses next to the word for there are two semicolons, which is
another weird use of syntax. They're not at the end of the line now; they're in the middle of the parentheses. But that's
what the humans chose years ago. The first thing, before the semicolons, initializes your variable: int i equals 0.
The next thing is the condition that's going to constantly get checked every cycle through this loop. And the last
thing is going to be what you do after each loop, which in this case is going to be count up. So again, if I rewind, we
initialize i to zero. We then ask the question, is i less than three? If so, execute what's inside of the
loop. Then the computer does the update, incrementing i by one, and then it's not going to blindly meow
again; it's going to check the condition again. Is i less than three? Then it's going to meow if so. Then it might go
ahead and increment i and check the condition again. So again, this does not read quite in the same simple fashion,
top to bottom; you kind of read it left to right and then jump around. But again, the
initialization, the Boolean expression constantly being checked, and the update after each time do the exact same
thing as what we saw a moment ago in this while loop format. Which one is better? Eh, they're the same. I think most
people would probably eventually use a for loop once comfortable, but "just because" is really the answer there. All
right, any questions, then, on loops as we've translated them to C? Yeah. A for loop and while loop can both
be used to do exactly the same thing. There are subtle differences with issues of scope, which we'll discuss before long,
where when you create a variable in a for loop, notice that it was, again, inside of those parentheses, which technically
means it's only going to exist in these four lines of code. By contrast, with the while loop I declared my variable
outside of the loop, so that variable is going to continue to exist elsewhere in my program. So that's one of the minor
differences there. Good question, but you'll see some others over time.

All right, so we claim, then, that it's better
in some form to do this with a loop. So let's actually jump back to the code. Let me go ahead and now reimplement meowing
with a for loop, for instance. So how about for, int i gets zero, i less than three, whoops, i less than three, i++. Then
inside my curly braces let me go ahead and print out with printf meow with a newline and a semicolon. So I did it
pretty quickly just because I've long acquired the muscle memory, but if I now make meow, no errors there. Run ./meow and
I see meow, meow, meow. Well, let's do now what we did last week, which was to begin to make our own custom functions, if you
will, by writing our own in C. So here's where the syntax gets a little funky, but we'll explain over time what each of
these keywords is doing. If I want to create a function called meow, because the authors of C did not create a
function called meow decades ago, I need to give it a name like meow. I need to specify if it takes any inputs; for now
I'm going to say no, and I'm going to explicitly say no by writing the special word
void. It's also necessary when implementing a function in C, which was not necessary in Scratch, to specify what
its return type is. But for now I'm just going to say that meow is the name of the function, it takes no inputs, and
that's what the void in parentheses means, and it does not return anything, like ask did, or like
get_string or get_int does. Meow's purpose in life is just to have side effects, visual side effects, by printing something on
the screen. So what is meow going to do? I'm going to have it quite simply say printf, quote
unquote, meow, backslash n. And now, just like in Scratch, I can just call a brand new function called meow. And
here's where, too, if you really don't like the curly braces, technically speaking you can get rid of them when
there's only one line of code inside your loop. But again, stylistically, I would encourage you to preserve them to
make super clear to yourself and others what it is that's going on. Let me go ahead and save this and do make meow.
Whoops, darn. All right, what did I do? Something stupid. Yeah, so zero does not belong
there; I meant to hit parenthesis. So let me rerun make meow. Okay, fixed my mistake. All right, it's
still working okay. But recall what I did in Scratch: kind of out of sight, out of mind. Just to make a point, let me
highlight this and move it way down in the file, because, again, now that meow exists, it's an abstraction.
I just know a meow function exists; I want to be able to use it. So let me scroll back up. My main function is the
same. Let me go ahead and make meow again, and now, just by moving that function, I've created all these lines of errors.
And let's look at the first. Again, the rule of thumb here; it's a little small, but it says meow.c in bold, which is
the name of the file where the bug is. Five is the line number and 20 is the character. So line number... no,
uh, let's see. Oh, this is what happens when I scroll up too far. Sorry, this is the error we're now looking at:
line seven. I was looking at the old error message from earlier, before I fixed the zero. Meow.c, line seven. All
right, apparently C does not know what the meow function is: "implicit declaration of function 'meow' is invalid
in C99." Well, what does that mean? Declaration of function means your creation of a function. I'm
declaring that meow exists, but I haven't apparently defined it yet. And C99 is the version of C from the year 1999,
which we generally use here; it's one of the more recent versions. So why is that the case? Can you infer from the mere
fact that I just moved meow to the bottom of the file, which was fine in Scratch but now is bad? Why is
that? Yeah, C is just kind of old school. It reads your code top to bottom, and if it does not know what meow is when you
first try to use it, it just freaks out and prints out these error messages. So the solution is, quite simply, don't do
that; just leave it where it was. But you could imagine this getting a little annoying over time, if only because main
is, by name, the main part of your program, and honestly it would just be nice if main were always at the
top of your code, because if you want to understand what a file is doing, it makes sense to just read it top to bottom. Well,
there is a solution to this. You can put functions in different orders, with main at the top, so long as you, and this is
perhaps the only time copy-paste is appropriate, leave a little breadcrumb for the compiler at
the very top of your file that literally repeats the return type, the name, and the arguments to that function, then a semicolon.
This is, so to speak, declaring your function, and the real fancy way of saying it is that this is a prototype. It's like, what is this
thing going to look like? But the semicolon means I'm not going to deal with this yet; I'm going to actually
define the function, or implement it, down below here. This is kind of a stupid detail; more recent languages
get rid of this need, and you can put your functions in any order. But again, if you just think about the basics of
programming languages like this one here, as you noted, it must just be reading your code top to bottom. So annoying, yes,
but explained, yes, too. So let me go ahead and make meow one more time, ./meow. Still working okay. And let me make one final
enhancement to this meow program here. Let me go ahead now and say something like this.
Wouldn't it be nice if my meow function could do something for me some number of times? So suppose I want to do
this. This meow program at the moment is going to meow three times, but suppose I want to meow n times, where n
is just some number provided by the user. Well, just like in Scratch, custom functions can take inputs. I just
presently am saying void, but if I change this to int n, I'm thereby telling the compiler, hey,
meow still doesn't return something, but it does take something as input: it takes an integer, and I want to call it n. So
this is another way of declaring a variable, but a way of declaring a variable that gets handed in as input to
the function. So now, if I tighten up main here, I can actually do something really cool, just like in Scratch, which
is this. If I now look at this code, let me zoom in here, now my main program is really well written, in the sense that it
just says what it does: meow three times. This works, though, because I defined meow as now taking an input, an integer called
n, and then using n in my now familiar for loop. There's one change, you might have caught my one mistake: I
also have to remind myself up here to make that change too. Again, this is one of the only redundancies or copy-pastes
that's sort of reasonable. But there I have now a better version. So let me go ahead and rerun this: make meow, ./meow,
voila. So again, no change in correctness, but now, again, we're sort of modularizing our code. And heck, what you
could do now, and this is just a tease of a feature down the road: those header files we talked about earlier, those libraries,
this is the kind of modularization we're talking about. We, the staff, wrote functions called get_string, get_int, and
so forth. We put them in a file called cs50.c, and we put little breadcrumbs, specifically these things called
prototypes, in cs50.h, so that when you all, as aspiring programmers, include cs50.h, you are sort
of secretly telling the compiler at the very top of your code what the menu of available functions is. Why? Because in
cs50.h are lines like these, obviously not for meow but for get_string, get_int, and so forth. And in stdio.h are the same
kinds of lines for things like printf. So that's all that's going on there; it's just a way of telling the computer in
advance what functions to expect. Any questions, then, on these here? Correct. So, if you don't mind, I want
to continue to wave my hand at that detail for today. Indeed, int main(void) is a little weird, because what would the
input to main be? We have no mechanism for providing input yet. And what does it mean for main to return anything?
Who is it returning to? For another day, if we may; they're going to come into play. But that, for now, today, is just
something you should take at face value as necessary copy-paste to begin programs. So meow is a function that
takes an input, the number of times to meow, but it didn't actually have a return value, hence the void. But
what if we actually want to create our own function that not only takes zero or more inputs as arguments but also
returns some value, maybe an int, maybe a float, maybe something else altogether? Well, it turns out in C we can do that as
well. Let me go ahead and create a new file here called discount, and let's implement a quick program via which we
can discount some regular price by some percentage, as though there's a sale going on in a store. Let me go ahead and
include our usual cs50.h, followed by stdio.h, at the top. Let me give myself int main(void) as before, and inside
of main let's go ahead and do something simple. Let's give ourselves a float called regular, representing the
regular price of something in a store. Let's go ahead and get a float from the user, asking them what that regular price
is. Then next, let's go ahead and declare a second variable, also a float, called sale, ultimately representing the sale
price after some percentage discount. And let's go ahead and simply calculate whatever regular is and say 15% off is a
pretty good discount. So let's go ahead and discount regular, whatever it is, by 15%, which is equivalent, of course, to
multiplying it, with the asterisk, by 0.85. Of course, if we're taking off 15%, we multiply the regular price by 0.85. Now
let's go ahead and print out the results here. Let me go ahead and say printf, "Sale price: ", and let me go ahead and use
%f, but more specifically %.2f, because at least in US currency we typically show cents to two decimal
places, followed by a newline. And then let me go ahead and plug in the value of sale. All right, let's go down here and do
make discount, Enter. So far, so good. ./discount, and the regular price is maybe $100, so the sale price should be $85. So
our arithmetic seems to be correct here. But let's fast forward now in time. Suppose that we find ourselves
discounting a lot of prices in an application, maybe a website like Amazon, where they're offering some kind of
percentage discount, and it'd be nice to have a reusable function that just does this arithmetic for us, simple though it may be.
So let's go ahead and modify discount this time to give ourselves our own function, called discount for instance,
that takes an input, like the regular price that you want to discount, and then also returns a value. It doesn't just
print it out; it returns a value, namely a float, that represents what the sale price is. So let me go down below main
and go ahead and define a function that's going to return a float, because we're dealing with dollar amounts still.
The function is going to be called discount, and it's going to take one input,
like the price that we want to discount. In here I'm going to do something very simple. I'm going to say float sale
equals whatever that price is times 0.85, and then I'm going to go ahead and return sale. Now, for that matter, I can
actually tighten this up a bit. If I'm only declaring a variable to store a value that I'm then returning with this
keyword return, I actually don't even need that variable. So I can delete the second line, and I can actually just go
ahead and get rid of that variable altogether and immediately return whatever the arithmetic result is of
taking the price input, the argument that's being passed in, times 0.85. So, a very simple function that simply does the
discounting for me. As always, let me go ahead and copy-paste, almost the only time it's okay to copy-paste, the
prototype of that function to the top of the file, so that when compiling this code, main has already seen the word
discount before. And now let me go into the code here, and instead of doing the math myself in main, let me presume that
we have some function already in our toolkit called discount that lets me discount the regular price and return
that value. And then down here, my code doesn't need to change; I'm still going to print out sale, the variable in which
I'm storing that result. But notice what I've done here. I've sort of abstracted away the notion of taking a
discount by creating my own function that takes a float called price, or anything else, as input. It does a little
bit of math, simple though it is here, and then it returns a value. But notice that discount is not printing that value; it's
literally using this other keyword called return, so that I can hand back that value, just like get_string hands
back a value, just like get_int hands back an integer, without printing it for you, so that I, up here on line nine, can go
ahead and store that value in a variable if I want and then actually print it out. Let me go ahead now and recompile this
code with make discount. Let me go ahead and do ./discount, and let's again do
$100. The sale price is going to be $85 as well. Now, it turns out that functions don't have to take just zero or one
argument as input; they can actually take two or three or more. So in fact, suppose we wanted to now enhance this version of
my program and take in as input to the discount function not just the price that I want to discount but also the
percentage off, thereby allowing us to support not just 15% off but any number of percentage points off. Well, let me go
up here and declare an int and call it percent off, and let me ask the user for how many percentage points they want to
take off. So I'm going to say "percent off" inside of the prompt here and get that int called percent off. And now, in addition
to passing in regular as an input to the discount function, I'm also going to pass in percent off. But I need to tell
the computer that it is now taking two arguments, and the way I do this is just with a comma. Down here in the function's
own definition, there is going to be a percentage argument, a second argument per the comma, and I'm going to use that
percentage in a slightly familiar way. I don't want to just multiply by the percentage like this, because of course that's going to
increase the size of the total price. I actually need to do a little bit of real-world math, where if this is a percentage
off, like the number 15 for 15 percentage points, I need to do 100 minus that many percentage points, thereby giving me
100 minus 15, or 85, and then I need to divide that by 100 in order to give myself 0.85 times the price that was passed
in. But if I go ahead now and save this and run make discount one last time, I notice that I've actually got an error here.
what have I done wrong well I need to change that prototype too and again this is admittedly an annoying aspect of C
that you have to maintain consistency here but that's fine I'm just going to go up here change this to int percentage
spelling incorrectly and now let me retry compilation make discount crossing my fingers this time worked okay/
discount and while $100 and percent off say 15 points and voila
$85. Now it's worth noting that I've deliberately returned the result of my math from this function; I haven't just done the math on the original variable that's being passed in. In fact, if we take a look at this second version, where discount is now taking a price argument and a percentage argument, notice that I'm not doing something like this: I'm not just saying price = price * (100 - percentage) / 100 and leaving it at that. The problem there is that this variable price is going to be scoped to that discount function. We'll encounter this again before long, but this notion of scope just refers to where a variable actually lives, exists, or is accessible. So it turns out that if I change price in the context of this discount function, that's not going to have a lasting effect. If I actually want to get the result back to the function that used the discount function, namely main, I do need to take this approach of returning the value explicitly, so that ultimately I'm handing back the discounted price.
right well let's go ahead and maybe how about let's just use these Primitives in just a few different ways how about
a little game of yesteryear, Super Mario Bros. In the original Super Mario Bros., and in bunches of variants, you have these side-scrolling worlds that look like this, where there are some coins in the sky hidden behind these question marks. So let's just use this as a visual to consider how in C I could start to make something semi-graphical: not actual colors or fanciness, that feels like too much too soon, just something like printing out some question marks. Well, if I go back over here, let me create that actual file that I alluded to earlier. So let me code up mario.c. Let me go ahead and include stdio.h, int main(void) again, which we'll continue to copy-paste for today, and then let me just go ahead and do something simple like printing one, two, three, four question marks and a new line. All right, this is what we might call ASCII art, which just means graphics but really just implemented with your keyboard. And if I make mario and do ./mario, it's not nearly as engaging visually as this, but it's the beginning of this kind of map for a game. Well, if I wanted to print out those things dynamically, let me go back to my code here, and instead of printing out four all at once, I could do something like for (int i = 0; i < 4; i++), and then inside here I could just print out one of them at a time. Let me save that, make mario, and at the risk of disappointing, ./mario: so close, but I made a mistake, just a stupid aesthetic one. The prompt is not on the new line. How could I move it? Yeah, I need an escape character, the backslash n. But should I put it here? OK, no, because that's going to put it after every one, and it's going to make this thing vertical instead of horizontal. So logically, just like in Scratch, put it at the end of the loop, so something out here, and just print out, for instance, only quote unquote backslash n. And now if I do make mario again, ./mario,
okay we're back in business but a little better designed in that now I'm not repeating myself multiple times I'm
doing this again and again but let's do one other thing here with Mario let me go ahead and ask the user how many
question marks or coins to print the catch here is that there's another type of loop that's helpful for this and it's
called a do while loop. Generally, a do while loop is similar to a while loop, but it checks the condition last instead of first. Recall that earlier on the slide we had while with an open parenthesis and a close parenthesis, and I kept claiming that we check whether i is less than whatever it was, three, in advance, again and again. A do while loop just inverts the logic, so that you can actually do something like this. At the top of this program, I'm going to go ahead now and give myself a variable n like this, of type integer, and then I'm going to do literally the following: with the keyword do, n equals get_int, and I'm going to ask the user for the width, like the number of question marks to print, and I'm going to do this while n is less than, say, one. So this is a little cryptic, but the salient difference is that the while, the Boolean expression, is now at the bottom of my block of code, not at the top. Now why is this? Well, if I make mario, whoops, I need to add cs50.h because I'm now using get_int. If I now compile this version of mario and do ./mario, a do while loop is helpful when you want to do something no matter what first, and then check some condition or some Boolean expression to see if, maybe in this case, the user cooperated. It would make no sense if the user typed in, say, zero, because there's no work to be done. It'd be really weird if they said negative 100, because that makes no sense logically. So with this simple construct here, I am doing the following while n is less than one; the implication is that as soon as n equals one or is bigger than one, I'm going to break out of this loop, and I've got myself a variable called n containing essentially a positive value, one through two billion or so. And I can now use this, for instance, here: change the 4 to an n, so now my program is completely dynamic. Let me go ahead and do make mario, ./mario again, and I'll do 4: still works. I'll do 40: still works.
The difference here with the do while is that if something like this involves getting user input, well, there's no question to ask yet, since the user hasn't given you anything. So you have to do something first, then check, and break out of the loop if the human has, for instance, cooperated in this case. All right, well, why don't we escalate to something more like this, in the same game, where Mario is underground and this is like a two-dimensional wall that's popping up here. It looks like a 3x3, for the sake of discussion, and it's made out of bricks, so I'll use hash symbols this time. Well, it turns out that we can nest, that is, combine, some of these same ideas as follows. Let me go ahead now and change back to this code, and I'm going to keep the do while loop from before, and I'm going to ask, though, this question: what's the size of this square? I'm going to assume it's n by n, so 3x3, 4x4, whatever, so I'm just going to ask for the size of the square of bricks. And now how do I do this? Well, I'm going to go ahead, for instance, and write for (int i = 0; i < n; i++). Let me just keep it simple and print out something like this: just a single hash symbol, that is, a brick, and a new line after it. All right, let's make mario, run mario with 3. OK, that's close to being it. I've got a column, all right, but I need it to be wider. So the solution last time was to get rid of the new line and then maybe put the new line here, out after the loop. All right, so let's do make mario, ./mario, and type in 3, and huh. All right, so I kind of need to combine these two ideas somehow. So how might we solve
this problem? I want to print rows and columns, not a row or a column. How do I do this? Yeah, a loop. Yeah, add another loop in the for loop, right? If you use one loop conceptually to count the rows from top to bottom, and then within each row you, old-school typewriter style, do character, character, character, character horizontally, I think we could do exactly what we want to achieve here. So how about this: let me get rid of this line and get rid of this line for now, and let me just give myself another loop on the inside. And since I'm already using i, another reasonable convention here would be to say something like j: so j also gets zero, j is less than n, j++. And now what's going to happen? Let me go ahead and print out just one of these things at a time, and let me save, and let me run this; let me see how close we are. make mario, ./mario, 3. OK, that's clearly wrong, but I see nine things there on the screen, so we're close. What's the one fix I need now to move the old-school typewriter head down to the next row when appropriate? What do you think? Yeah, I need one of these backslash n's, and let me add some comments now to help everyone visualize what I've done: for each row, for each column, print a brick, just to explain the logic. And I add that because now, for move to next row, I could do something like this with a backslash n. So here's where the comments, really my pseudocode, kind of illuminate the situation a bit. Let me go ahead and recompile mario, ./mario, 3: now we're talking. It's not a perfect square, just because these hash symbols are a little taller than they are wide, but that's just a font detail. Here now I've done something that's much more akin to something like this. All right, so let me pause here.
questions again the code's getting a little more complicated but we're just building like more complicated programs
like in scratch like familiar puzzle pieces some variables Some Loops some conditionals it's all the same as before
Yeah? Can you multiply strings in C? No, but ask that same question again in a few weeks when we get to Python, and the answer will be yes. Other questions? Yeah? In C, you must specify the return type, the name of the function, and the inputs or arguments to the function, in that order, and if none of them are applicable, you write the word void. So, same question as earlier: let me kick that can a week or so, and we'll come back to that and we'll see why, but for now just take on faith that you need to do that with main, because main is a little special, similar to the "when green flag clicked" block; it too was a little special as well. Yeah? Yes, if you want to get out of a loop early, you could do this. So let me answer this question this way: an alternative to a do while loop would be to do something like this: how about while (true), so do the following forever. Let me go ahead and get an int from the user for the size of this thing. If n is greater than zero, that is, a positive integer, then go ahead and use a new keyword called break. This is identical to what we just did; it's just a little longer. It's a couple of extra lines, a lot of them are blank, and so it's just an alternative, but a do while does the same thing, but a little tighter.
that's an answer to your question all right so let's now introduce finally a sequence of problems that I've kind of
been brushing under the rug, though we did see a little bit of evidence of this earlier when we tried to add two billion and two billion and it overflowed the number of bits in an int, so to speak. Let me go ahead and code up a program called calculator again, but I'm going to go ahead now and change to just floats. So I'm going to change x to a float, and I'm going to use get_float, and a float again is just a floating-point value, which is a fancy way of saying a real number with a decimal point in it. And down here I'm going to go ahead and use %f for float, and I'm going to do one more thing: instead of addition, I want to do something fancier like division, so divide x by y, and I'm going to give myself another, third float called z, as we did at the beginning of today, and I'm going to print out z instead of x and y explicitly. So I'm going to go ahead now and do make calculator, ./calculator, and let's do something like 2/3: 2 divided by 3 is 0.666667, so that's what you would rather expect. Let me run it again with 1/10: all right, so 0.1 and a bunch of zeros. That too is what you would rather expect. But now let me get a little curious. It turns out that in C you can modify the behavior of these format codes a little bit. By default you get six or so digits. Suppose that you want to get exactly two digits: you can more succinctly say .2 before the f and after the percent. This is the kind of thing that's hard to remember, but you Google it and you find that, OK, the format code for floats uses .2 to do two decimal places. So let me do make calculator again, ./calculator, how about 2/3: 0.67, so it handles the display of significant digits for us here. And now let me go ahead and do 1/10: 0.10, so it's adhering to that. Well, maybe I really want a lot of precision, right? I've got a really powerful computer. Let me see 50 numbers after the decimal point; that's a lot of significant digits. Let me remake the calculator, whoops, typo. Let me remake the calculator, ./calculator, not mario, and how about 2/3 again? Well, that's interesting: pretty sure it's supposed to be like a 6 with a line over it, right, in grade school math.
Well, maybe that's just a bug. How about 1/10? OK, that's really getting funky. So what's going on? It seems that my program not only can't do addition very well (we eventually hit problems in the billions), it can't even represent very precise numbers here. What's going on, exactly? In a nutshell, the computer is approximating the answer using that many numbers after the decimal point, but the problem fundamentally is actually very similar to that integer overflow from before, and I'm using that now as a term of art. Integers can overflow if you're trying to use more bits than you actually have available to you: you change them all to ones and then you're out of bits, so to speak. Same thing here, but in the different context of floats. If you only have 32 bits, or heck, if we change to double and only have 64 bits, that's a lot of precision, but it's not infinite, and yet I'm pretty sure there's an infinite number of real numbers in the world, which is to say a computer with finite memory cannot possibly represent all possible numbers in the world, because again, there's not an infinite number of permutations of 32 or 64 bits. It might be a lot, in the billions or more, but it's still finite. And so indeed this is the computer's closest approximation to what's actually going on there, and so this is an example of what we would generally call floating-point imprecision. Floating-point imprecision refers to the inability of computers, fundamentally, to represent all possible real numbers 100% precisely, at least by default in languages like C. Thankfully, in the world of scientific computing and so forth, there are solutions to this problem that just give you more digits, but the problem fundamentally is still going to be there. So there's a reason I changed x and y to floats. Let's see what would
happen if we re-engineer this a bit and, instead of using floats for x and y, again use integers, so int x and int y, and let's go back and use get_int as well, thereby giving us integers x and y. Let's still leave z as a float, because at the end of the day we want to be able to handle fractions or floating-point values, but let's go ahead now and print out this value of z, having changed x and y now to ints. make calculator, ./calculator, and let's do, say, 2 for the numerator, 3 for the denominator, and wow, it's not 0.666, and it's not even rounding oddly; it's just all zeros this time. So why is that? Well, it turns out that C, when dividing an integer by an integer, is always going to give you back an integer, an int. The problem is that floating-point values don't fit in ints; only the integral part to the left of the decimal point does. Everything at and beyond the decimal point itself gets thrown away, a behavior in C known as truncation. When dividing an integer by an integer, you get back an integer, but if you're trying to then store what's actually a floating-point result in that integer, C is just going to throw away everything at and beyond the decimal point, leaving us, in this case, with just the zero from what should have been 0.666666 and so forth. So let's see one more example. In fact, let me go back to my terminal here, let me do ./calculator again, and let's do 4/3 this time. It should be 1.33333 and so forth, but let's see: 4 divided by 3, both as integers this time, gives us 1.0000. But there too, the answer should be 1.333, but the floating-point part is getting truncated, or thrown away, leaving us with just one. So how do we solve this?
Well, certainly we could just use floats from the get-go, as I did, but if by nature of your program you only have access to integers, or maybe even longs, for which the same problem would occur, what we can actually do is called type conversion, and we can explicitly tell the computer that we actually want to treat this int as though it's a floating-point value, and we can do that for both x and y. So let me go back to my code here, and I have a couple of options. In fact, I can convert y to a float by doing this: I can cast y to a float by literally writing the type float inside of parentheses right before the y. And if I really want to be explicit, I can also do the same to x, but strictly speaking it suffices to just change one or the other, not necessarily both. Let me go ahead now and do make calculator again, ./calculator, and let's try 2 / 3, and now we're back to an answer that's closer to correct, but indeed we're still having some rounding issues there. Let's run it one more time for 4 / 3: there too we're closer to the right answer at least, but we still have that floating-point imprecision. But that's going to be another problem altogether to solve. And here, in a little more detail, is that issue of integer overflow, in the context of ints. Suppose that we think back to last week, when we had three bits and we counted from 0 to 7: 0, 1, 2, 3, 4, 5, 6, 7. I think I asked the question, how would we count to eight? Someone proposed, well, we need a fourth bit. That's fine if you have a fourth bit, if you have access to another light bulb or transistor. If you don't, though, the next number after this is technically 1000, but if you don't have space or hardware for that fourth bit, you might as well just be representing the number zero. So in the world of integers, if you're only using three bits, those three bits eventually overflow when you count past seven, because what should be eight can't fit, so to speak, so it rolls back over to zero. And as arcane as this problem might seem, we humans have done this a couple of times. You might recall
knowing about or reading about the Y2K problem, where a lot of people thought the world was going to end. Why? Because on January 1st of 2000, a lot of computers presumably were going to update their clocks from 1999 to the year 2000. The problem is, though, that for decades, for efficiency, we humans were honestly in the habit of not storing years as four digits. Why? Because that's just a lot of space to waste, especially since centuries don't happen that often. So a lot of computer systems, especially early on, when hardware was very expensive and memory was very tight, just stored the last two digits of any year. The problem, of course, on January 1st of 2000 is that 99 rolls over to 100, but if you don't have room for another digit, it's just 00, and if your code assumes a prefix of 19, well, we just went from the year 1999 back to the year 1900. Thankfully, long story short, a lot of people wrote a lot of code in a lot of old languages and mostly warded off this problem, so the world did not end. The next time the world might end, though, is on January 19th, 2038. Now that might feel like a long time away, but so did the year 2000 at one point. Why might clocks again break in today's modern computers in 2038, might you think? Indeed, so this refers to some number of seconds. It turns out that the way computers generally keep track of time is they count the total number of seconds since the epoch, which is defined as January 1st, 1970. Why? It was just a good year to start counting at, when computers really came onto the scene. Unfortunately, most computers used 32 bits to count the number of seconds since January 1st, 1970, the implication of which is that we can only count up to roughly two billion seconds. Two billion seconds is going to happen in 2038, at which point 31 ones are going
to roll over as follows. That number, two billion, which is the max because, if you're representing positive and negative numbers, recall that you can only count as high as positive two billion or as low as negative two billion, looks like this. This is roughly the number two billion in binary: it's all ones, with one zero way over here. If I count one second past that two billion number, give or take, that means, all right, I add one, I carry the one; it's just like nines becoming zeros in decimal. If I keep this simple animation going and I keep carrying the one, carrying the one, carrying the one, one second after two billion seconds, give or take, I have this number in the computer's memory. So there's still one bit that's a one, all the way to the left. Unfortunately, that bit often represents negativity, whereby if that first bit is a one, that represents that the rest of this somehow represents a negative number. It's not negative zero; there's a fancier representation, but a very big positive number very suddenly becomes a very big negative number, and that number is roughly -2 billion. That means computers in 2038, on that date, are going to accidentally think that it's been -2 billion seconds since January 1st, 1970, which is going to make computers potentially think it's 1901. So what is the solution to the 2038 problem, perhaps? Y2K was because we were using two digits for years. What about 2038? More bits. And thankfully, we're getting a little better at lessons learned here, and computers now are increasingly using 64 bits, and all of us will be long gone by the time we run out of that number of seconds, so it's someone else's problem many, many years from now. But that's really the fundamental solution: if you're running up against something finite, well, just kick the can further and give yourself more bits. And frankly, because hardware is so much cheaper these days and computers are so much faster, it's not as big of a deal as it might have been decades ago, but that's indeed the solution. But this arises in very common contexts.
Let me go ahead and write a real quick program here called pennies. You might think that just converting dollars to pennies in US currency might be simple, but let me go ahead and do this in pennies.c. I'm going to go ahead and include cs50.h, and I'm going to include stdio.h, int main(void) as my starting point, and now down here I'm going to do this: I'm going to get a float called amount, and I'm going to ask the user for some amount of dollars, so a dollar amount, and I'm going to store that in a variable called amount. Then I'm going to simply convert that amount to pennies by doing, say, amount times 100, and then I'm going to go ahead and print out that the number of pennies is %i, because that's just an integer, in pennies, backslash n, close quote, comma, pennies. All right, so if I didn't make any mistakes here, let me make pennies, ./pennies, and suppose I have, say, 99 cents, so .99: that's 99 pennies. Suppose I have $1.23: that's pretty good. Suppose I have $4.20: huh, there's that imprecision issue, and this isn't even that big of an amount. Now, not a big deal if the cashier gives you one penny less than you're owed, but you can imagine this adding up; you can imagine this being worrisome for financial transactions, for scientific measurements, and the like. My program can't even handle this.
There are some solutions here, and it looks like what's really happening, if I print it out using %f with a .50 or whatever to see more decimal places, is that presumably the computer is struggling to represent $4.20 precisely. It's probably storing $4.1999-something, so it's close, but it's not quite there. So I could at least solve this by rounding, for instance, and it turns out there is a round function out there, and it turns out that it's in a library called the math library, and you would know this by looking at online documentation and the like, as we'll point you to. And if I now make pennies again and do ./pennies, I can now do $4.20, and voilà, now it's correct. So at least in this context it seems like a solvable problem, but it's certainly something I need to be mindful of nonetheless. Unfortunately, even professional, full-time programmers over the years have not been particularly attentive to these kinds of details, and in a class like this the goal is not just to teach you programming but to really teach you what's going on underneath the hood, so to speak, so that you have a bottom-up understanding of how data is represented and how computers are manipulating it, so that you are not on the failing end of some program having some bug, and so that we as a society are not beholden to those kinds of mistakes too. And this happens
unfortunately all of the time this is a Boeing airplane that a few years ago needed to be rebooted after every 248
days why because this Boeing airplane software was using a 32-bit integer counting up T of a second to keep track
of something or other related to its electrical power and unfortunately after 248 days of the airplane being
continuously on on which in the airline industry is apparently not uncommon to make every dollar count keeping the
planes up and running all the time the 32-bit number would roll over and the power would shut off on the airplane as
a side effect because of sort of undefined behavior in that case the temporary Solution by boing at the time
was apparently essentially sort of operating system style well have you rebooted your plane and that was indeed
the fix until they rolled out an actual software patch this stuff really matters and the more Hardware we carry around
and the more we as a society use these kinds of devices the more of these problems we're going to run into down
the road that's it for cs50 we'll see you next time [Music]
This is CS50, and this is week two. Now that you have some programming experience under your belt in this more arcane language called C, among our goals today is to help you understand exactly what you have been doing these past several days, wrestling with your first programs in C, so that you have more of a bottom-up understanding of what some of these commands do and, ultimately, what more we can do with this language. So this, recall, was the very first program you wrote, I wrote, in this language called C, much more textual, certainly, than the Scratch equivalent. But at the end of the day, computers, your Mac, your PC, VS Code, don't understand this actual code. What's the format into which we need to get any program that we write, just as a recap? So binary, otherwise known as machine code, right, the zeros and ones that your computer actually does understand. So somehow we need to get to this format, and up until now we've been using this command called make, which is aptly named because it lets you make programs. And the invocation of that has been pretty simple: make hello looks in your current directory or folder for a file called hello.c implicitly, and then it compiles that into a file called hello, which itself is executable, which just means runnable, so that you can then do ./hello. But it turns out that make is actually not a compiler itself. It does help you make programs, but make is this utility that comes on a lot of systems that makes it easier to actually compile code by using an actual compiler, the program that converts source code to machine code, on your own Mac or PC or whatever cloud environment you might be using. In fact, what make is doing for us is actually running a command automatically known as clang, for C language, and so in fact here, for instance, in VS Code is that very
first program again, this time in the context of a text editor, and I could compile this with make hello, but let me go ahead and use the compiler itself manually, and we'll see in a moment why we've been automating the process with make. I'm going to run clang instead, and then I'm going to type hello.c. So it's a little different how the compiler is used: it needs to know explicitly what the file is called. I'll go ahead and run clang hello.c, Enter. Nothing seems to happen, which generally speaking is a good thing, because no errors have popped up. And if I do ls now, for list, you'll see there is not a file called hello, but there is a curiously named file called a.out. This is a historical convention: it stands for assembler output, and this is just the default file name for a program that you might compile yourself manually using clang itself. Let me go ahead now, though, and point out that that's kind of a stupid name for a program, even though it works; ./a.out would work. But if you actually want to customize the name of your program, we could just resort to make, or we could do explicitly what make is doing for us, because it turns out some programs, among them make, support what are called command-line arguments, and more on those later today. But these are literally words or numbers that you type at your prompt after the name of a program that just influence its behavior in some way; they modify its behavior. And it turns out, if you read the documentation for clang, you can actually pass a -o, for output, command-line argument that lets you specify explicitly what you want your outputted program to be called, and then you go ahead and type the name of the file that you actually want to compile from source code to machine code. Let me go ahead and hit Enter now. Again, nothing seems to happen, and I type ls, and voilà, now we still have the old a.out, because I didn't delete it yet, and I do have hello now. So ./hello, voilà, runs hello, world again. And let me go ahead and remove this file. I could of course resort to using the Explorer on the left-hand side, which I am in the habit of closing just to give us more room to see, but I could go ahead and right-click or control-click on a.out if I want to get rid of it. Or, again, let me focus on the command-line interface, and I can use, anyone recall (we didn't really use it much), what command removes a file? rm. So rm for remove: rm a.out, Enter, "remove regular file a.out?", y for yes, Enter, and now if I do ls again, voilà, it's gone. All right, so let's now enhance this program to do the second version we
ever did which was to also include uh cs50.h so that we have access to functions like get string and the like
and let me go ahead and do uh string name gets get string what's your name question mark and now let me go ahead
and say hello to that name with our percent s placeholder comma name so this was version two of our program last time
that very easily compiled with make hello but notice the difference now if I want to compile this thing myself with
clang using that same lesson learned all right let's do it clang -o hello just so I get a better name for the program
hello.c enter and a new error pops up that some of you might have encountered on your own so it's a bit arcane here
and there's this mention of a cryptic looking path with temp for temporary there but somehow my issue's in main as
we can see here it somehow relates to hello.c even though we might not have seen this language last time in class
but there's an undefined reference to get string as though get string doesn't exist now your first instinct might be
well maybe I forgot cs50.h but of course I didn't that's the very first line of my program but it turns out make is
doing something else for us all this time just putting cs50.h or any header file at the top of your code for that
matter just teaches the compiler that a function will exist it sort of asks the compiler to trust
that I will eventually get around to implementing functions like get string in cs50.h and in stdio.h like printf
therein but this error here some kind of linker command relates to the fact that there's a separate process for actually
finding the zeros and ones that cs50 compiled long ago for you that the authors of this operating system
compiled for you long ago in the form of printf we need to somehow tell the compiler that we need to link in code
that someone else wrote the actual machine code that someone else wrote and then compiled so to do that you'd have
to type -lcs50 for instance at the end of the command so additionally telling clang that not only do you want to
Output a file called hello and you want to compile a file called hello.c you also want to quote unquote Link in a
bunch of zeros and ones that collectively Implement get string and printf so now if I hit enter this time
it compiled okay and now if I run ./hello it works as it did last week just like that but honestly this is just going to
get really tedious really quickly notice already just to compile my code I have to run clang -o hello hello.c -lcs50
and you're going to have to type more things too if you wanted to use the math library like to use that round function
you would also have to do -lm typically to specify give me the math bits that someone else compiled and the commands
just get longer and longer so moving forward we won't have to resort to running clang itself but clang is indeed
the compiler that is the program that converts from source code to machine code but we'll continue to use make
because it just automates that process and the commands are only going to get more cryptic the more uh sophisticated
and more uh featureful your programs get and make again is just a tool that makes all that happen so to speak let me pause
there to see if there's any questions before then we take a look further under the hood yeah in front
sure let me come back to that in a moment what does the -lcs50 mean we'll
come back to that visually in just a moment but it means to link in the zeros and ones that collectively implement get
string and printf but we'll see that visually in a sec yeah behind
you really good question how come I didn't have to link in standard IO because I used printf in version one
standard IO is just literally so standard that it's built in it just works for free cs50 of course is not it
did not come with the language C or the compiler we ourselves wrote it and other libraries even though they might come
with the language C they might not be enabled by default generally for efficiency purposes so you're not
loading more zeros and ones into the computer's memory than you need to so standard IO is special if you will other
questions yeah oh what does the -o mean so -o is shorthand for the English word output
and so -o is telling clang to please output a file called hello because the next thing I wrote after the command
line recall was clang -o hello then the name of the file then -lcs50 and this is where these commands do get and
stay fairly arcane it's just through muscle memory and practice that you'll start to remember oh what are the other
command line arguments you can provide to programs but we've seen this before
technically when you run make hello the program is called make and hello is the command line argument it's an input to
the make function albeit typed at the prompt that tells make what you want to make even when I used rm a moment ago
and did rm of a.out the command line argument there was called a.out and it's just telling rm what to delete it is
entirely dependent on the programs to decide what their conventions are whether you use Dash this or Dash that
um but we'll see over time which ones actually matter in practice so to come back to the first question about what
actually is happening there let's consider the code more close up so here is that first version of the code again
with stdio.h and only printf so no cs50 stuff yet until we add it back in and had the second version where we
actually get the human name when you run this command then there's actually a few things that are happening underneath the
hood and we won't dwell on these kinds of details indeed we'll abstract it away so to speak by using make but it's worth
understanding at least from the get-go how much automation is going on so that when you run these commands it's not
magic you actually do have this bottom up understanding of what's going on so when we say you've been compiling your
code with make that's a bit of an oversimplification technically every time you compile your code you're having
the computer actually do four distinct things for you and this is not four distinct things that you need to sort of
memorize and remember every time you run your program what's happening but it helps to sort of break it down into
building blocks as to how we're getting from source code like C into zeros and ones it turns out that when you compile
quote unquote your code technically speaking you're doing four things sort of automatically and all at once
pre-processing it compiling it assembling it and linking it just humans decided eh let's just call the whole
process compiling but for a moment let's just consider what these steps are so pre-processing refers to this if we look
at our source code here version two that uses the cs50 library and therefore get string notice that we indeed have these
include lines at top and they're kind of special versus all the other code we've written because they start with hash
symbols specifically and that's sort of a special syntax that means that these are technically called pre-processor
directives fancy way of saying they're handled special versus the rest of your code in fact if we focus on cs50.h
recall from last week that I provided a hint as to what's actually in cs50.h among other things like what was the one
Salient thing that I said was in cs50.h and therefore why we were including it in the first
place so get string specifically the prototype for get string we haven't made many of our own functions yet but recall
that anytime we've made our own functions and we've written them like below Main in a file we've also had to
somewhat stupidly copy paste the Prototype of the function at the top of the file just to teach the compiler that
this function doesn't exist yet it does down there but it will exist just trust me so again that's what these prototypes
are doing for us so therefore in my code if I want to use a function like get string or print f for that matter
they're not implemented clearly in the same file they're implemented elsewhere so I need to tell the compiler
to trust me that they're implemented somewhere else and so technically inside of cs50.h which is installed somewhere
in the Cloud's hard drive so to speak that you all are accessing via vs code there's a line that looks like this a
prototype for the get string function that says the name of the function is get string it takes one input or
argument called prompt and the type of that prompt is a string get string not surprisingly has a return value and it
returns a string so literally that line and a bunch of others are in cs50.h and so rather than you all having to copy
paste the Prototype you can just trust that cs50 figured out what it is you can include cs50.h and the compiler is going
to go find that prototype for you same thing in standard iio someone else what must clearly be in standard i.h among
other stuff that motivates our including standard io. H2 yeah
print F the prototype for print F and indeed I'll just change it here in yellow to be the same and it turns out
the format uh the prototype for print f is actually pretty fancy because as you might have noticed print F can take one
argument just something to print two if you want to plug a value into it three or more so the dot dot dot just
represents exactly that it's not quite as simple a prototype as get string but more on that another time so what does
it mean to pre-process your code the very first thing the compiler clang in this case is doing for you when it reads
your code top to bottom left to right is it notices ooh here is hash include oh here's another hash include and it
essentially finds those files on the hard drive cs50.h stdio.h and does the equivalent of copying and pasting
them automatically into your code at the very top thereby teaching the compiler that get string and printf will
eventually exist somewhere so that's the pre-processing step whereby again it's just doing a find and replace of
anything that starts with hash include it's plugging in the files there so that you essentially get all the prototypes
you need automatically okay what does it mean then to compile the results because at
this point in the story your code now looks like this in the computer's memory it doesn't change your file it's doing
all of this in the computer's memory or RAM for you but it essentially looks like this well the next step is what's
technically really compiling even though again we use compile as an umbrella term compiling code in c means to take code
that now looks like this in the computer's memory and turn it into something that looks like this which is
way more cryptic but it was just a few decades ago that if you were taking a class like cs50 in its earlier form we
wouldn't be using C if it didn't exist yet we would actually be using this something called Assembly Language and
there's different types of or flavors of Assembly Language but this is about as low level as you can get to what a
computer really understands be it a Mac or PC or a phone before you start getting into actual zeros and ones and
most of this is cryptic I couldn't really tell you what this is doing unless I really thought it through
carefully and rewound mentally years ago from having studied it but let's highlight a few key words in yellow
notice that this Assembly Language that the computer is outputting for you automatically still has mention of Main
and it has mention of get string and it has mention of printf so there's some relationship to the C code we saw a
moment ago and then if I highlight these other things these are what are called computer instructions at the end of the
day your Mac your PC your phone actually only understands very basic instructions like addition subtraction
division multiplication move into memory load from memory print something to the screen like very basic operations and
that's what you're seeing here these assembly instructions are what the computer actually feeds into the brains
of the computer the CPU the central processing unit and it's that Intel CPU or whatever you have that understands
this instruction and this one and this one and this one and collectively long story short all they do is print hello
world on the screen but in a way that the machine understands how to do so let me pause here are there any
questions on what we mean by pre-processing which just finds and replaces the hash include symbols among
others and compiling which technically takes your source code once pre-processed and converts it to that
stuff called assembly language CPU correct each type of CPU has its own instruction
set indeed and as a teaser this is why at least back in the day when we used to like install software from CD ROMs or
some other type of media this is why you can't take a program that was sold for a Windows computer and run it on a
Mac or vice versa because the commands the instructions that those two uh products understand are actually
different now Microsoft or any company could generally write code in one language like C or another and they can
compile it twice saving a PC version and saving a Mac version it's twice as much work and sometimes you get into some uh
incompatibilities but that's why these steps are somewhat distinct you can now use the same code and support even
different platforms or systems if you'd want all right assembling thankfully this part is fairly
straightforward at least in concept to assemble code which is step three of four that is just happening literally
for you every time you run make or in turn clang this Assembly Language which the computer generated automatically for
you from your source code is turned into zeros and ones so that's the step that last week I simplified and said when you
compile your code you convert it from source code to machine code technically that happens when you
assemble your code but no one in normal conversations uh says that they just say compile for all of these
terms all right so that's assembling there's one final step even in this simple program of getting
the user's name and then plugging it into printf I'm using three different people's code if you will my own which
is in hello.c some of cs50's which is in
cs50.c which is not a file I've mentioned yet but it stands to reason that if there's a
cs50.h that has prototypes turns out the actual implementation of get string and other things are in cs50.c
and there's a third file somewhere on the hard drive so to speak that's involved in compiling even this simple
program hello.c cs50.c and by that logic what might the other
be yeah stdio.c and that's a bit of a white lie because that's such a big fancy library that
there's actually multiple files that compose it but the same idea and we'll take the simplification so when I have
this code now and I compile my code here I get those zeros and ones that end up taking hello.c and turning
it effectively into zeros and ones that are combined with cs50.c followed by stdio.c as well so let me rewind
here here might be the zeros and ones for my code the two lines of code essentially that I wrote here might be
the zeros and ones for what cs50 wrote some years ago in cs50.c here might be the zeros and ones that someone wrote
for standard io decades ago the last and final step is that linking command that links all of these zeros and ones together
essentially stitches them together into one single file called hello or called a.out whatever you name it that last
step is what combines all of these different programmers' zeros and ones and my God like now we're really in the
weeds who wants to even think about running code at this level you shouldn't need to but it's not magic when you're
running make there's just some very concrete steps that are happening that humans have developed over the years
over the decades that sort of break down this big problem of source code going to zeros and ones or machine code into
these very specific steps but henceforth you can call all of this compiling questions or confusion
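As a rough sketch of those four steps, the file below lists in its comments the standard clang flags that stop after each stage. The file name greet.c, the macro GREETING and the function print_greeting are all made up for illustration, and the last command assumes a separate hello.o that defines main; the lecture itself only ever runs make and plain clang.

```c
// greet.c (hypothetical file name): something small to feed through each step.
//
// Each flag below stops clang after one stage so the intermediate result
// can be inspected:
//
//   clang -E greet.c     1. preprocess only; prints the expanded source
//   clang -S greet.c     2. stop after compiling; writes assembly to greet.s
//   clang -c greet.c     3. stop after assembling; writes machine code to greet.o
//   clang -o hello hello.o greet.o
//                        4. link; assumes a separate hello.o that defines main
#include <stdio.h>

// Hypothetical macro: the preprocessor swaps GREETING for the string
// before the compiler proper ever sees the code.
#define GREETING "hello, world"

// Prints the greeting and returns printf's result, the number of
// characters written (12 for the text plus 1 for the newline).
int print_greeting(void)
{
    return printf("%s\n", GREETING);
}
```

Opening greet.s after the second command is one way to see assembly like the slide's, with the name print_greeting and a call to printf still visible in it.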
yeah sure what does a.out signify a.out is just the conventional default file name
for any program that you compile directly with a compiler like clang it's just kind of a meaningless name though
it stands for assembler output and assembler might now sound familiar from this assembling process it's just kind
of a lame name for a computer program and so we can override it by outputting something like hello instead
yeah so to recap there are other prototypes in those files cs50.h
stdio.h technically they're all included at the top of your file even though you strictly speaking don't need most of
them but indeed they are there just in case you might want them and finally any other questions
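To make the prototype idea concrete, here is a small sketch in which one function is called before it is defined. The names print_bricks and build_column are invented for illustration, and the printf declaration in the comment is a simplified rendering of what stdio.h provides.

```c
#include <stdio.h>

// Prototype: tells the compiler that print_bricks exists, what it takes,
// and what it returns, even though its body appears further down.
// Header files such as stdio.h hold lines of this kind; printf's is
// declared roughly as: int printf(const char *format, ...);
void print_bricks(int count);

// Hypothetical helper that calls print_bricks before its definition.
// Returns the height it was asked to build.
int build_column(int height)
{
    print_bricks(height);
    return height;
}

// The actual implementation, which could just as well live in another
// file and be linked in later.
void print_bricks(int count)
{
    for (int i = 0; i < count; i++)
    {
        printf("#\n");
    }
}
```

Deleting the prototype line should make clang complain at the call inside build_column, which is the same situation the hash include lines prevent for get string and printf.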
yeah does it matter what order we're telling the computer to run sometimes
with libraries yes it matters what order they are linked in together but for our purposes it's really not going to matter
it's just make is going to take care of automating that process for us all right so with that said henceforth compiling
technically is these four things but we'll focus on it just as a higher level concept and abstraction if you will
known as compiling itself so another process that we'll now begin to focus on all the more
this week because invariably this past week you ran up against some challenges you probably created your
very first bugs or mistakes in a program and so let's focus for a moment on actual techniques for debugging as you
spend more time this semester in the years to come if you continue to program you're never frankly probably going to
write bug free code ultimately your programs are just going to get more featureful more sophisticated and we're
all just going to start to make uh more sophisticated mistakes and to this day I write buggy code all the time and I'm
always horrified when I do it up here but hopefully that won't happen too often but when it does it's just a
process now of debugging trying to find the mistakes in your program and you don't have to just stare at your code or
sort of shake your fist at your code there are actual tools that real world programmers use to help debug their code
and find these faults so what are some of the techniques and tools that folks use well as an aside
a bug in a program is a mistake and that term has actually been around for some
time if you've ever heard this tale some 50 plus years ago in 1947 this is actually an entry in a log book
written by a famous computer scientist named Grace Hopper who happened to be the one to record the very first
discovery of a quote unquote actual bug in a computer this is actually like a moth that had flown into what at the time was
a very sophisticated system known as the Harvard Mark II computer sort of large sort of refrigerator size type systems
in which an actual bug caused an issue the etymology of bug though actually predates this particular
instance but here you have uh as any computer scientist might know the example of a first physical bug in a
computer how though do you go about removing such a thing well let's consider a very simple scenario from
last time for instance when we were trying to print out various aspects of Mario like this column of three bricks
let's consider how I might go about implementing a program like this and let me switch back over to vs code here and
I'm going to go ahead and write a program and I'm not going to trust myself so I'm going to call it buggy.c
from the get-go knowing that I'm going to mess something up but I'm going to go ahead and include stdio.h and I'm
going to go ahead and define main as usual so hopefully no mistakes just yet and now I want to print those three
bricks on the screen using just hashes for bricks so how about for int i gets zero i less than or equal to 3 i++ now
inside of my curly braces I'm going to go ahead and print out a hash followed by a backslash n semicolon all
right saving the file doing make buggy enter it compiles so there's no syntactical errors like my code is
syntactically correct but some of you have probably seen The Logical error already because when I run this program
I don't get this picture which was three bricks high I seem to have four bricks instead now this might be jumping out
at you why it's happening but I've kept the program simple just so that we don't have to actually find an actual bug we
can use a tool to find one that we already know about in this case what might be the first strategy for actually
finding a bug like this rather than just staring at your code asking a question trying to sort of just think through the
problem well let's actually try to diagnose the problem more proactively and the simplest way to do this now and
years from now is honestly going to be to use a function like printf printf is a wonderfully useful function
not just for printing formatted strings and all that but for just looking inside the values of variables that you
might be curious about to see what's going on so you know what let me do this I see that there's four coming out
but I intended three so clearly something's wrong with my i variable so let me just be a little more pedantic
let me go inside of this loop and just temporarily say something explicit like i is percent i backslash n and then just
plug in the value of i right this is not the program I want to write it's the program I'm temporarily writing because
now I'm going to go ahead and say make buggy ./buggy and if I look now at the output I have some helpful diagnostic
information i is zero and I get a hash i is one and I get a hash two and I get a hash three and I get a hash okay wait a
minute I'm clearly going too many steps because maybe I forgot that computers are essentially counting from zero and
now oh it's less than or equal to now you see it right again trivial example but just by using printf you can see
inside of the computer's memory by just printing stuff out like this and now once you figured it out oh so this
should probably be less than three or I should start counting from one there's any number of ways I could fix this but
the most conventional is probably just to say less than three now I can go ahead and delete my temporary print
statement rerun make buggy ./buggy and voila problem solved all right and to this day I do this like whether it's
making a command line application or a web application or mobile application it's very common to use printf or some
equivalent in any language just to poke around and see what's inside the computer's memory thankfully there's
more sophisticated tools than this let me go ahead and reintroduce the bug here and let me go ahead and reopen my
sidebar at left here and let me go ahead now and recompile the code to make sure it's current and I'm going to run a
command called debug50 which is a command that's representative of a type of program known as a debugger and
this debugger is actually built into VS Code and all debug50 is doing for us is just automating the process of starting
VS Code's built-in debugger so this isn't even a cs50 specific tool we've just given you a debug50 command to
make it easier to start it up from the get-go and the way you run this debugger is you say debug50 space and then the
name of the program that you want to debug so in this case ./buggy so you don't mention your C file you mention
your already compiled code and what this debugger is going to let me do is most powerfully walk through my code
step by step because every program we've written thus far just kind of runs from start to finish even if I'm not done
sort of thinking through each step at a time with the debugger I can actually like click on a line number and say
pause execution here and the debugger will let me walk through my code one step at a time one second at a time one
minute at a time at my own human pace which which is super compelling when the programs get more complicated and they
might otherwise just fly by on the screen so I'm going to click to the left of line five and notice that these
little red dots appear and if I click on one it stays and gets even redder and I'm going to now run debug50 on
./buggy and in just a moment you'll see that a new panel opens on the left hand side it's doing some configuration of
the screen and now let me go ahead and zoom out just a little bit here so we can see more on the screen at once and
sometimes you'll see in VS Code that debug console opens up which looks very cryptic just go back to terminal window
if that happens because the terminal window is where you can still interact with your code and let's now
take a look at what's going on if I zoom in on my buggy.c code here you'll notice that we have the same program
as before but highlighted in yellow is line five not a coincidence that's the line I set a so-called breakpoint at the
little sort of red dot means break here pause execution here and the yellow line has not yet been executed but if I now
at the top of my screen notice these little arrows there's one for play there's one for this which if I hover
over it says step over there's another that's going to say step into there's a third that says step out I'm just going
to use the first of these step over and I'm going to do this and you'll see that the yellow highlight moved from line
five to line seven because now it's ready but hasn't yet printed out that hash but the most powerful thing here
notice is at top left here it's a little cryptic cuz there's a bunch of things going on that'll make more sense over
time but at the top there's a section called variables below that something called locals which means local to my
current function main and notice there's my variable called i and its current value is zero so now once I click step
over again Watch What Happens we go from line seven back to line five but look in the terminal window one of the hashes
has printed but now it's kind of printed at my own pace I can sort of think through this step by step notice that i
has not changed yet it's still zero because the yellow highlighted line hasn't yet executed but the moment I
click step over it's going to execute line five now notice at top left i has become
one and nothing has printed yet because now highlighted is line seven and so if I click step over again we'll see the
hash and if I repeat this process at my own human comfortable Pace I can see my variables changing I can see output
changing on the screen and I can just think about should that have just happened and I can pause and give
thought to what's actually going on without trying to race the computer and figure it all out at once I'm going to
go ahead and just stop here because we already know what this particular problem is and that just brings me back
to my default terminal window but this debugger let me disable the break point now so it doesn't keep breaking
this debugger will be your friend moving forward in order to step through your code step by step at your own pace to
figure out where something has gone wrong printf is great but it gets annoying if you have to constantly add
print this print this print this print this recompile rerun it oh wait a minute print this print this like the debugger
just lets you do the equivalent but automatically questions on then this debugger which you'll see all the more
Hands-On over time questions on debugger yeah really good question we'll see this
before long but those other buttons that I glossed over like step into and step out of actually let you step into
specific functions if I had any more than main so if main called a function called something and something called a
function called something else instead of just stepping over the entire execution of that function I could step
into it and walk through its lines of code one by one so anytime you have a problem set you're working on that has
multiple functions you can set a break point in main if you want or you can set it inside of one of your additional
functions to focus your attention only on that and we'll see examples of that over
time all right so what else and the sort of elephant in the room so to speak is actually a duck
in this case why is there this duck and all of these ducks here well it turns out a third genuinely recommended
debugging technique is talking through problems talking through code with someone else now in the absence of
having a family member or a friend or a roommate who actually wants to hear you talk about code of all things um
generally programmers turn to a rubber duck or other inanimate objects uh if something animate is not available and
the idea behind rubber duck debugging so to speak is that simply by looking at your code and talking it through okay on
line three I'm starting a for loop and I'm initializing i to zero okay then I'm printing out a hash just
by talking through your code step by step invariably finds you having the proverbial light bulb go off over your
head because you realize wait a minute I just said something stupid or I just said something wrong and this is really
just a proxy for any other human teaching fellow teacher friend colleague um but in the absence of any of those
people in the room you're welcome to take on your way out today one of these little rubber ducks and consider using
it for real anytime you just want to talk through one of your problems in cs50 or maybe life more generally but
having it there on your desk is just a way to help you hear illogic in what you think might otherwise be logical
code so printf debug50 and rubber duck debugging are just three of the ways you'll see over time to sort of get to
the source of code that you will write that has mistakes which is going to happen but it'll Empower you all the
more to solve those mistakes are there any questions on debugging in general or these three techniques
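The printf technique from the column example might look like the sketch below. The function names print_column_buggy and print_column_fixed are hypothetical, and each one returns a brick count, which the lecture's program does not do, only so that the difference between the two loops is easy to check.

```c
#include <stdio.h>

// Buggy version: the <= makes the loop run height + 1 times.
// The extra printf is the temporary diagnostic line; it shows the value
// of i on every pass so the off-by-one becomes visible in the output.
int print_column_buggy(int height)
{
    int bricks = 0;
    for (int i = 0; i <= height; i++)
    {
        printf("i is %i\n", i); // temporary, delete once the bug is found
        printf("#\n");
        bricks++;
    }
    return bricks;
}

// Fixed version: counting from 0 up to but not including height
// runs exactly height times, so the diagnostic line can go.
int print_column_fixed(int height)
{
    int bricks = 0;
    for (int i = 0; i < height; i++)
    {
        printf("#\n");
        bricks++;
    }
    return bricks;
}
```

Called with 3, the buggy version reports i going from 0 through 3, which is four passes, while the fixed version prints three bricks.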
yeah what's the difference between step over and step into at the moment the only one that's applicable to the code I
just wrote is step over because it means step over each line of code if though I had other functions that I had written
in this program maybe lower down in the file I could step into those function calls and walk through them one at a
time so we'll come back to this with an actual example but step into will allow me to do exactly that in fact this is a
perfect segue to doing a little something like this let me go ahead and open up maybe another file here actually
we'll use the same buggy and we're just going to write one other thing that's buggy as well let me go ahead up here
and include as before cs50.h let me include stdio.h
let me do int main void so all of this I think is correct so far and let's do this let's give myself an int
called i and let's ask the user for a negative integer with get negative int this is not a function that exists technically yet but I'm
going to assume for the sake of discussion that it does and then I'm just going to go ahead and print out
with percent i and a new line whatever the human typed in so at this point in the story my program I think is
correct except for the fact that get negative int is not a function in the cs50 library or anywhere else I'm going
to need to invent it myself so suppose in this case that I declare a function called get negative int its return
type so to speak should be int because as its name suggests I want to hand the user back an integer and it's going to
take no input to keep it simple so I'm just going to say void there no inputs no special prompts nothing like that let
me now give myself some curly braces and let me do something familiar perhaps now from problem set one let me give myself
a variable like n and let me do the following within this block of code assign n the value of get int asking the
user for a negative integer using get int's own prompt and I want to do this while
n is less than zero because I want to get a negative int from the user and recall from having used this block in
the past I can now return n as the very last step to hand back whatever the user has typed in so long as they cooperated
and gave me an actual negative integer now I've deliberately made a mistake here and it's a subtle sort of silly
mathematical one but let me compile this program after copying now the Prototype up to the top just so I don't make that
mistake again let me do make buggy enter and now let me do ./buggy I'll give it a negative integer like
-50 uh huh that did not take uh how about -5 no uh how about
zero? Huh. All right, so it's clearly sort of working backwards, or incorrectly, here, logically. So how could I go about debugging this? Well, I could do what I've done before. I could use my printf technique and say something explicit like "n is %i\n", comma, n, just to print it out. Let me recompile buggy, let me rerun buggy, let me type in -50. OK, n is -50. So that didn't really help me at this point, because that's the same as before. So let me do this: debug50 ./buggy. Oh, but I've made a mistake: I didn't set my breakpoint yet. So let me go ahead and do this, and I'll set a breakpoint this time. I could set it here on line 8; let's do it in main, as before. Let me rerun debug50 now on ./buggy. That fancy user interface is going to pop up. It's going to highlight the line that I set the breakpoint on. Notice that on the left-hand side of the screen, i is defaulting at the moment to zero, because I haven't typed anything in yet. But let me go ahead now and step over this line that's highlighted in yellow, and you'll see that I'm being prompted. So let's type in -50, Enter. All right, and notice now that I'm stuck in that function. So clearly the issue seems to be in my get_negative_int function. So, OK, let me go ahead and stop this execution. My problem doesn't seem to be in main per se; maybe it's down here. So that's fine. Let me set my same breakpoint at line 8. Let me rerun debug50 one more time. But this time, instead of just stepping over that line, let's step into it. So notice line 8 is again highlighted in yellow. In the past, I've been clicking Step Over. Let's click Step Into now. And when I click Step Into, boom, now the debugger jumps into that specific function. And now I can step through these lines of code again and again. I can see what the value of n is as I'm typing it in. I can think through my logic. And voila, hopefully, once I've solved the issue, I can exit the debugger, fix my code, and move on. So Step Over just goes over the line but executes it; Step Into lets you go into other functions you've written.
All right, so let's go ahead and do this. We've got a bunch of possible approaches that we can take to solving some problem. Let's go ahead and pace ourselves today, though. Let's take a five-minute break here, and when we come back, we'll actually take a look at that computer's memory we've been talking about. See you in five. All right, so let's dive back in. Up until now, both by way of week 1 and problem set 1, for the most part, we've just translated from Scratch into C all of these basic building blocks, like loops and conditionals, Boolean expressions, variables. So, sort of more of the same. But there are features in C that we've already stumbled across, like data types, the types of variables, that don't exist in Scratch but that, in fact, do exist in other languages, in fact a few that we'll see before long. So to summarize the types we saw last week, just recall this little list here. We had ints and floats and longs and doubles and chars. There's also bools, and also string, which we've seen a few times. But today, let's actually start to formalize what these things are, and what your Mac and PC are doing when you manipulate bits as an int versus a char versus a string versus something else, and see if we can't put more tools into your toolkit, so to speak, so we can start quickly writing more featureful, more sophisticated programs
in C. So it turns out that on most systems nowadays, though this can vary by actual computer, this is how large each of the data types typically is in C. When you store a Boolean value, a zero or one, a true or a false, it actually uses 1 byte. That's actually a little excessive, because strictly speaking, you only need 1 bit, which is 1/8 of this size. But for simplicity, computers use a whole byte to represent a bool, true or false. A char, we saw last week, is only 1 byte, or 8 bits. And this is why ASCII, which uses 1 byte, or technically only 7 bits early on, was confined to only 256 maximally possible characters. Notice that an int is 4 bytes, or 32 bits. A float is also 4 bytes, or 32 bits. But the thing that we called long is literally twice as long: 8 bytes, or 64 bits. And so is a double: a double is 64 bits of precision for floating-point values. And a string, for today, we're going to leave as a question mark, because we'll come back to that later today and next week as to how much space a string takes up. But suffice it to say, it's going to take up a variable amount of space, depending on whether the string is short or long. But we'll see exactly what that means before long. So here's a photograph of a typical piece of memory
inside of your Mac or PC or phone, and odds are it might be just a little smaller in some devices. This is known as RAM, or random access memory, and each of these little black chips on this circuit board, the green thing, is where zeros and ones are actually stored. Each of those stores some number of bytes, maybe megabytes, maybe even gigabytes nowadays. So let's focus on just one of those chips, just to give us a zoomed-in version thereof. And let's consider the fact that, even though we don't have to care exactly how this kind of thing is made, if this is like 1 gigabyte of memory for the sake of discussion, it stands to reason that if this thing is storing 1 billion bytes, 1 gigabyte, then we can number them, kind of arbitrarily. Maybe this will be byte 0, 1, 2, 3, 4, 5, 6, 7, 8, and then maybe way down here in the bottom right corner is byte number 1 billion. We can just number the things, as might be our convention. So let's draw that graphically, not with a billion squares, but fewer than those, and let's zoom in further. At this point in the story, let's abstract away all the hardware and all the little wires and just think of data as taking up some number of bytes. So, for instance, if you were to store a char in a computer's memory, which was 1 byte, it might be stored literally at this top left-hand location of this black chip of memory. If you were to store something like an integer that uses 4 bytes, well, it might use four of those bytes, but they're going to be contiguous, back to back to back, in this case. If you were to store a long or a double, you might actually need 8 bytes. So I'm just filling in these squares to represent how much memory any given variable of some data type would take up: 1 or 4 or 8, in this case here. Well, from here, let's go ahead and abstract away from all of the hardware and really focus on memory as being a grid, or really like a canvas that we can paint any types of data onto that we want. At the end of the day, all of this data is just going to be zeros and ones, but it's up to you and me to build abstractions on top of that, things like actual numbers and colors and images and movies and beyond. But we'll start lower level here first. Suppose I had a program that needs three integers, like a simple program whose purpose in life is to average your three scores on an exam, or some such thing, and suppose that your three scores were these: 72 and 73, not too bad, and 33, which is particularly low. Let's go ahead and write a program that actually does this kind of averaging for us. Let me go back to VS Code here. Let me open up a file
called scores.c, and let me go ahead and implement this as follows. Let me include stdio.h at the top, int main(void) as before, and then inside of main, let me go ahead and declare score1, which is 72, give me another score, 73, and then a third score called score3, which is going to be 33. And now I'm just going to use printf to print out the average of those things. I can do this in a few different ways, but I'm going to just print out %f, and I'm going to do score1 plus score2 plus score3, divided by 3, close parenthesis, semicolon. So just some relatively simple arithmetic to compute the average of three scores, if I'm curious what my average grade is in the class with these three assessments. All right, let me go ahead now and do make scores. Huh. All right, so I've somehow made an error already, but this one is actually germane to a problem we hopefully won't encounter too frequently. What's going on here? So underlined is score1 plus score2 plus score3 divided by 3: format specifies type double, but the argument has type int. Well, what's going on here? Because the arithmetic seems to check out. Yeah? Correct, and we'll come back to this in more detail. But indeed, what's happening here is I'm adding three ints together, obviously, because I defined them right up here, and I'm dividing by another int, 3. But the catch is, recall that C, when it performs math, treats all of these things as integers. But integers are not floating-point values. So if you actually want to get a precise average for your score without throwing away the remainder, everything after the decimal point, it turns out in this case we're going to have to convert this whole expression somehow to a float. And there's a few ways to do this, but the easiest way for now: I'm just going to go ahead and change the divide by 3 to divide by 3.0, because it turns out, long story short, in C, so long as one of the values participating in an arithmetic expression like this is something like a float, the rest will be treated as, promoted to, so to speak, a floating-point value as well. So let me now recompile this code with make scores, Enter. This time it worked OK, because I'm treating a float as a float. And let me do ./scores, Enter. All right, my average is 59.33333 and so forth. All right, so the math presumably checks out, floating-point imprecision per last week aside. But let's consider the design of this program. What is kind of bad about it? Or if we maintain this program longer term, are we going to regret the design of this program? What might not be ideal here?
Yeah. Yeah, so in this case, I have hardcoded my three scores. So if I'm hearing you correctly, this program is only ever going to tell me this specific average. I'm not even using something like get_int or get_float to get three different scores. So that's not good. And suppose that we wait until later in the semester; I think other problems could arise. Yeah? I can't reuse the number, because I haven't stored the average in some variable, which in this program is not a big deal, but certainly if I wanted to reuse it elsewhere, that's a problem. And let's fast forward again a little later in the semester. I don't just have three test scores or exam scores; maybe I have four or five or six. Where might this take us? Yeah, I've sort of capped this program at three. And honestly, this is just kind of bordering on copy-paste, even though the variables, yes, have different names: score1, score2, score3. Imagine doing this for a whole gradebook for a class, having score4, 5, 6, 11, 10, 12, 20, 30. That's a lot of variables, and you can imagine just how ugly the code starts to get if you're just defining variable after variable after variable. So it turns out there are
better ways in languages like C if you actually want to have multiple values stored in memory that happen to be of the same data type. And so let's take a look back at this memory here to see what these things might look like in memory. Here's that grid of memory, and each of these, recall, represents a byte. So just to be clear, if I store score1 in memory first, how many bytes will it take up? So four, a.k.a. 32 bits. So I might draw score1 as filling up this part of the memory. It's really up to the computer as to whether it goes here or down there or wherever; I'm just keeping the pictures clean, though, for today, from the top left on down. If I then declare another variable called score2, it might end up over there, also taking up 4 bytes, and then score3 might end up here. And so that's just representing what's going on inside of the computer's memory. But technically speaking, to be clear, per week 0, what's really being stored in the computer's memory are patterns of zeros and ones, 32 total in this case, because 32 bits is 4 bytes. But again, it gets boring quickly to think in and look at binary all the time, so we'll generally abstract this away as just using decimal numbers in this case instead. But there might be a better way to store not just three of these things, but maybe four, maybe five, maybe 10, maybe more, by declaring one variable to store all of them, instead of three or four or five or more individual variables. And the way to do this is by
way of something generally known as an array. An array is another type of data that allows you to store multiple values of the same type back to back to back, that is to say, contiguously. So an array can let you create memory for one int or two or three or even more than that, but describe them all using the same variable name, the same one name. So for instance, if for one program I only need three integers, but I don't want to messily declare them as score1, score2, score3, I can actually do this instead: int scores[3]; and this is today's first new piece of syntax, the square brackets that we're now seeing. This line of code here is similar to int score1; or int score1 = 72; and it is declaring for me, so to speak, an array of size 3, and that array is going to store three integers. Why? Because the type of that array is an int here. The square brackets tell the computer how many ints you want, in this case three, and the name is, of course, scores, which in English I've just deliberately pluralized now so that I can describe this array as storing multiple scores. Indeed, if I want to now assign values to this variable called scores, I can do code like this. I can say scores[0] = 72, scores[1] = 73, and scores[2] = 33. The only thing weird there is admittedly the square brackets, which are still new, but we're also, notice, zero indexing things. To zero index means to start counting at zero, and we've talked about that before. Our for loops have generally been zero indexed. Arrays in C are zero indexed, and you do not have choice over that. You can't just start counting at 1 in arrays just because you prefer to; you'd be sacrificing one of the elements. You have to start in arrays counting from zero. So out of context, this doesn't necessarily solve a problem, but it definitely is going to once we have more than even three scores here. In fact, let
me go ahead and change this program a little bit. Let me go back to VS Code here, and let me go ahead and delete these three lines here, and let me replace them with a scores variable that's ready to store three total integers. And then let me go ahead and initialize them as follows: scores[0] is 72, as before, scores[1] is going to be 73, scores[2] is going to be 33. Notice I do not need to say int before any of these lines, because that's been taken care of already for me on line 5, where I already specified that everything in this array is going to be an int. Now, down here, this code needs to change, because I no longer have three variables, score1, 2, and 3. I have one variable, but one that I can index into. I'm going to here, then, do scores[0] plus scores[1] plus scores[2], which is equivalent to what I did earlier, giving me back those three integers. But notice I'm using the same variable name every time, and again, I'm using this new square bracket notation to, quote unquote, index into the array to get at the first int, the second int, and the third, and then to do it again down here. Now, this program is still not really solving all the problems we described. I still can only store three scores, but we'll come back to something like that before long. For now, we're just introducing a new syntax and a new feature whereby I can now store multiple values in the same variable. Well, let's enhance this a bit more.
Instead of hardcoding these scores, as was identified as a problem, let's go ahead and use get_int to ask the user for a score. Let's then use get_int to ask the user for another score. Let's use get_int to ask the user for a third score, storing them in those respective locations. And now if I go ahead and save this program, recompile scores... huh. I've messed up here, but now these errors should be getting a little familiar. What mistake did I make? Let me give folks a moment. cs50.h. So that was not intentional; still making mistakes all these years later. I need to include cs50.h. Now I'm going to go back to the bottom in the terminal window, make scores. OK, we're back in business. ./scores. Now the program is getting a little more interesting. So maybe this year was better, and I got a 100 and a 99 and a 98, and there my average is 99.00. So now it's a little more dynamic, so it's a little more interesting. But it's still capping the number of scores at three, admittedly. But now I've kind of introduced another symptom of bad programming. There's this expression in programming, too, called code smell, where something smells a little off. And there's something off here, in that I could do better with this code. Does anyone see an opportunity to improve the design of this code here, if my goal still is just to get three scores from the user, but without it smelling kind of bad? Yeah?
Yeah, exactly. Those lines of code are almost identical, and honestly, the only thing that's changing is the number, and it's just incrementing by one. We have all of the building blocks to do this better. So let me go ahead and improve this. Let me go ahead and delete that code. Let me go ahead now and have a for loop: for int i gets 0, i less than 3, i++. Then inside of this for loop, I can distill all three of those lines into something more generic, like scores[i] equals get_int, and now ask the user just once, via get_int, for a score. So this is where arrays start to get pretty powerful. You don't have to hardcode, that is, literally type in, all of these magic numbers like 0, 1, and 2. You can start to do it programmatically, as you propose, with a loop. So now I've kind of tightened things up. I'm now dynamically getting three different scores, but putting them in three different locations. And so this program ultimately is going to work pretty much the same: make scores, ./scores, and 100, 99, 98, and we're back to the same answer, but it's a little better designed, too. If I really want to nitpick, there's something that still smells a little bit here: the fact that I have, indeed, this magic number 3 here that really has to be the same as this number here; otherwise, who knows what's going to go wrong. So what might be a solution, per last week, to cleaning that up further, too? OK, so we could leave it up to the user's discretion. And so we could actually do something like this. Let me take this a few steps ahead. Let me say something like int n gets get_int, "How many scores?" Then I could actually change this to an n, and then this to an n, and indeed make the whole program dynamic. Ask the human how many tests there have been this semester; then you can type in each of those scores, because the loop is going to iterate that many times, and then you'll get the average of one test, two tests, three, or however many scores were actually specified by the user. Yeah?
Question: how many bytes are used in an array? Ah, so the purpose of an array is not to save space; it's to eliminate having multiple variable names, because that just gets very messy quickly. If you literally have score1, score2, score3, dot dot dot, score99, that's literally 99 different variables, potentially, that you could actually collapse into one variable that has 99 locations, if you will, at different indices, or indexes, as some would say. The index for an array is whatever is in the square brackets. So it's a good question. I'm using ints for everything, and honestly, we don't really need ints for scores, because I'm not really likely to get a 2 billion on a test anytime soon. And so you could actually use different data types, and that list we had on the screen earlier is not all of them. There's actually a data type called short, which is literally shorter than an int. You could technically use char in some form, or even other data types as well. Generally speaking, in the year 2021, these tend to be overly optimized decisions. Everyone just uses ints, even though no one's going to get a test score that's 2 billion or more, because int is just kind of the go-to. Years ago, memory was expensive, and every one of your instincts would have been spot on, because memory was so tight. But nowadays, we don't worry as much about it.
Yeah? So, what is the difference between dividing two ints and not getting an error, as you might have encountered in a program like cash, versus dividing two ints and getting an error, like I did a moment ago? The problem with the scenario I created a moment ago was that printf was involved, and I was telling printf to use a %f, but I was giving printf the result of dividing integers by another integer. So it was printf that was yelling at me. And I'm guessing in the scenario you're describing, for something like cash, printf was not involved in that particular line of code. So that's the difference there. All right, so we now have this ability to create an array, and an array can store multiple values. What, then, might we do that's more interesting than just storing numbers in memory? Well, let's take this one step further. As opposed to just storing 72, 73, 33 or 100, 99, 98 at these given locations (because, again, an array gives you one variable name but multiple locations, or indices, therein: bracket 0, bracket 1, bracket 2, on up if it were even bigger than that), let's now start to consider something more modest, like simple chars, chars being 1 byte each, so they're even smaller; they take up much less space. And indeed, if I wanted to say a message like HI!, I could use three variables. If I wanted a program to print H, I, exclamation point, literally, I could of course store those in three variables, like c1, c2, c3. And just for the sake of discussion, let's go ahead and whip this up real quickly. Let me create a new program here in VS Code. This time, I'm going to call it hi.c, and I'm not going to bother with the CS50 library here; I just need the standard I/O one for now. int main(void), and then inside of main, I'm going to simply create three variables. This is already hopefully striking you as a bad idea, but we'll go down this road temporarily, with c1 and c2 and finally c3 storing each character in the phrase I want to print. And I'm going to go ahead now and print this in a different way than usual. Now I'm dealing with chars, and we've generally dealt with strings, which was easier, certainly, last week. But %c %c %c will let me print out three chars, like c1, c2, and c3. So it's kind of a stupid way of printing out a string, and we already have a solution to this problem from last week, but let's just poke around at what's actually going on underneath the hood here. So let's make hi, ./hi, and voila, no surprise. But we, again, could have done this last week with a string and just one variable, or even zero at that. But let's go ahead now and start converting these characters to their apparent numeric equivalents, like we talked about in week 0. Let me go ahead and modify these %c's, just for fun, to be %i, and let me just add some spaces so that there are gaps between each of them. Let me now recompile hi, and let me rerun it, and just a guess: what should I see on the screen
now? Any guesses? Yeah, the ASCII values. And it's intentional that I keep using the same word, HI!, because it should be, hopefully, the old friends 72, 73, and 33, which is to say that C knows about ASCII, or equivalently Unicode, and can do this conversion for us automatically. And it seems to be doing it implicitly for us, so to speak. Notice that c1, c2, and c3 are obviously chars, but printf is able to tolerate printing them as integers. If I really wanted to be pedantic, I could use this technique again, known as typecasting, where I can actually convert one data type to another if it makes logical sense to do so. And at the end of the day, we saw in week 0, chars, or characters, are just numbers like 72, 73, and 33. So I can actually use this parenthetical expression to convert three chars to three integers instead. So that's what I meant to type the first time. There we go; strike two today. So parenthesis, int, close parenthesis just says: take whatever variable comes after this, c1 or c2 or c3, and convert it to an int. The effect is going to be no different here. make hi, and then running ./hi still works the same, but now I'm explicitly converting chars to ints. And we can do this all day long: chars to ints, floats to ints, ints to floats. Sometimes it's equivalent; other times you're going to lose information. Taking a float to an int, just intuitively, is going to throw away everything after the decimal point, because, after all, an int has no decimal point. But for now, I'm going to go ahead and rewind to the version of this that just did implicit type conversion, or implicit casting, just to demonstrate that we can indeed see the values underneath the hood.
Let me go ahead and do this now the week 1 way. This was kind of stupid. Let's just do printf, quote unquote... actually, let's do this: string s equals, quote unquote, HI!, and then let's go ahead and do a simple printf with %s, printing out s there. So now I've rewound to last week, where we began this story. But you'll notice that if we keep playing around with this... whoops, what did I do here? Oh, let me introduce the CS50 library here; more on that before long. Let me go ahead and recompile, rerun this. We seem to be coding in circles here; I've just done the same thing multiple different ways. But there's clearly an equivalence, then, between sequences of chars and strings. If you do it the real pedantic way, you have three different variables, c1, c2, c3, representing H, I, exclamation point, or you can just treat them all together, like this: HI!. But it turns out that strings are actually implemented by the computer in a pretty now-familiar way. What might a string actually be, as of this point in the story? Where are we going with this? Let me try to look further back. Yeah, in the way back? Yeah, an array. A string might be, and indeed is, just an array of characters. So last week, we just took for granted that strings exist. Technically, strings exist, but they're implemented as arrays of characters, which actually opens up some interesting possibilities for us. Because, let me see if I can do this: let me try to print out now three integers again. But if string s is but an array, as you propose, maybe I can do s[0], s[1], and s[2]. So maybe I can start poking around inside of strings, even though we didn't do this last week, so I can get at those individual values. So make hi, ./hi, and voila, there we go again. It's the same 72, 73, 33, but now I'm hopefully wrapping my mind around the fact that, all right, a string is just an array of characters, and arrays you can index into using this new square bracket notation. So I can get at any one of these individual characters and, heck, convert it to an integer, like we did in week 0, as I might. But let me get a little curious now, too. What else might be in the computer's memory? Well, let's toggle back to the depiction of these same things. Here might be how we originally implemented HI! with three variables, c1, c2, c3. Of course, those map to these decimal digits, or equivalently these binary values. But what was this looking like in memory, literally, when you create a string in memory like this, string s equals, quote unquote, HI!? Let's consider what's going on underneath the hood, so to speak. Well, as an abstraction, a string, it's H, I, exclamation point, taking up, it would seem, 3 bytes. I've gotten rid of the bars there, because if you think of a string as a type, I'm just going to use one big box of size 3. But technically, a string, we've just revealed, is an array, and the array is of size 3. So technically, if the string is called s, s[0] will give you the first character, s[1] the second, and s[2] the third. But let me ask this question now.
If this, at the end of the day, is the only thing in your computer's memory, the ability, like a canvas, to draw zeros and ones, or numbers or characters or whatever, on it, and that's it, this is what your Mac and PC and phone ultimately reduce to, then suppose that I'm running a piece of software, like a text messenger, and now I write down BYE!. Well, where might that go in memory? Well, it might go here: B, Y, E. And then the next thing I type might go here, here, here, and so forth. My memory just might get filled up over time with things that you or someone else are typing. But then how does the computer know, if potentially B, Y, E, exclamation point is right after H, I, exclamation point, where one string ends and the next one begins? All we have are bytes, or zeros and ones. So if you were designing this, how would you implement some kind of delimiter between the two, or figure out what the length of a string is? What do you think? OK, so the right answer is: use a null character. And for those who don't know, what does that mean? Yeah, so it's a special character. Let me describe it as a sentinel character. Humans decided some time ago that, you know what, if we want to delineate where one string ends and where the next one begins, we just need some special symbol, and the symbol they'll use is generally written as backslash zero, \0. This is just shorthand notation for literally eight 0 bits, 00000000. And the nickname for eight 0 bits in this context is NUL, N-U-L, so to speak. And we can actually see this as follows. If you look at the corresponding decimal digits, like you could do by just doing out the math or doing the conversion like we've done in code, you would see, for storing HI!, 72, 73, 33, but then one extra byte that's sort of invisibly there but that is all zeros, and now I've just written it as the decimal number 0. The implication of this is that the computer is apparently using not 3 bytes to store a word like HI!, but 4 bytes: whatever the length of the string is, plus 1 for this special sentinel value that demarcates the end of the string. So we might draw it like this instead, and this character is, again, pronounced null, or written N-U-L. So that's all. If humans at the end of the day just have this canvas of memory, they just needed to decide, all right, well, how do we distinguish one string from another? It's a lot easier with chars individually; it's a lot easier with ints; it's even easier with floats. Why? Because, per that chart earlier, every character is always 1 byte, every int is always 4 bytes, every long is always 8 bytes. How long is a string? Well, HI! is 1, 2, 3, with an exclamation point. BYE! is 1, 2, 3, 4, with an exclamation point. David is D-A-V-I-D, 5, without an exclamation point. And so a string can be any number of bytes long, so you somehow need to draw a line in the sand to separate in memory one string from
another so what's the implication of this well let me go back to code here let's actually poke around this is a bit
dangerous but I'm going to start looking at memory locations past my string here so let me go ahead and recompile
make hi whoops what did I do here oh I forgot a format code let me add one more %i now let me go ahead and
rerun make hi ./hi enter there it is so you can actually see in the computer unbeknownst to you previously
that there's indeed something else going on there and if I were to make like one other variant of this program let's get
rid of just this one word and maybe let's have two so let me give myself another string called t for instance
just by common convention with BYE! let me then go ahead and print out with %s s and
let me also print out with %s whoops printf print out t as well let me just recompile this program and
obviously the ah this is what happens when I go too fast all right third mistake today close quote is what I was
missing make hi fourth mistake today make hi ./hi okay voila now we have a program that's printing both HI!
and BYE! only so that we can consider what's going on in the computer's memory if s is storing HI! and apparently one
bonus byte that demarcates the end of that string BYE! is apparently going to fit into the location directly after
and it's wrapping around but that's just an artist's rendition here but BYE! is taking up 1 2 3 4
plus a fifth byte as well all right any questions on this underlying representation of strings and
we'll contextualize this before long so that this isn't just like okay who really cares this is going to be the
source of actually implementing things in fact for problem set 2 like cryptography and encryption and actually
scrambling actual human messages but some questions
first a good question too and let me summarize as if we were instead to use chars all the time we would indeed have
to know in advance how many chars you want for a given string that you're storing how then does something like get
string work because when CS50 wrote the get string function we obviously don't know how long the words are going to be
that you are typing in it turns out two weeks from now we'll see that get string uses a technique known as dynamic
memory allocation and it's going to grow or shrink the array automatically for you but more on that soon other
questions isn't that wasting a byte good question why are we using a NUL value isn't it
wasting a byte yes but I claim there's really no other way to distinguish the end of one string from the start of
another unless we make some sort of mark or notation so to speak in memory all we have at the end of the day inside of a
computer are bits therefore all we can do is spend those bits in some creative way to solve this problem and so we're
minimally going to spend one byte to solve this problem here
yeah how does the computer know to move to a next line when you have a \n so \n
even though it looks like two characters it's actually stored as just one byte in the computer's memory there's a mapping
between it and an actual number and you can see that for instance on the ASCII chart from the other day so
it would be if I had put a \n in my code here right after the
exclamation point here and here that would actually shift everything in memory because we would need to make
room for a \n here and another one over here so it would take two more bytes exactly other
questions and what's the last thing
you said it's context sensitive so if at the end of the day all we're storing is these
numbers like 72 73 33 recall that it's up to the program to decide based on context how to interpret them and I
simplified this story in week zero saying that photoshop interprets them as RGB colors and iMessage or a text
messaging program interprets them as uh letters and Excel interprets them as numbers how those programs do it is by
way of variables like string and int and float and in fact later in the semester we'll see a data type via which you can
represent a color as a triple of numbers a red value a green value and a blue value so we'll see other data types as
well yeah if it seems easy enough to mark the end why do we have int and long why can't we make everything
variable in data size really interesting question why could we not just make all data types variable in size and some
languages some libraries do exactly this C is an older language and so because memory was expensive memory was
limited the reality was you gain benefits from just standardizing the size of these things you also get
performance increases in the sense that if you know every int is four bytes you can very quickly and we'll see this next
week jump from integer to another to another in memory just by adding four inside of those square brackets you can
very quickly poke around whereas if you had variable length numbers you would have to kind of follow follow follow
looking for the end of it follow follow you would have to look at more locations in memory so that's a topic we'll come
back to but it was generally for efficiency other questions
yeah good question why not store the NUL character at the
beginning let's see why not store it at the beginning you could do that you could
absolutely well could you do this if you were to do that at the beginning short answer no okay now I
retract that no because I finally thought of a problem with this if you start at the beginning instead we'll see
in just a moment how you can actually write code to figure out where the end of a string is and the problem there is
you wouldn't necessarily know if you eventually hit a zero at the end of the string because it's the number zero in
the context of like Excel using some memory or if it's the context of some other data type altogether so the fact
that we've standardized strings as ending with NUL means that we can reliably
distinguish one variable from another in memory and that's actually a perfect segue now to actually using this
primitive to building up our own code that manipulates these things at a lower level so let me go ahead and do this let
me create a new file this time called length and let's just use this basic idea to figure out what the length of a
string is after uh it's been stored in a variable here so let's go ahead and do this let me include both the
cs50 header and the standard IO header give myself int main void again here and inside of main let me go ahead and do
this let me prompt the user for a string s and I'll ask them for uh a string like their name here and then let me go ahead
and actually let me name it more verbosely name this time and now let me go ahead and do this let me iterate over
every character in this string in order to figure out what it's length is so initially I'm going to go ahead and say
this int length equals 0 because I don't know what it is yet so we're going to start at zero and then while the
following is true while let me do I want to do this let me change this to i just for
clarity let me go ahead and do this while name bracket i does not equal that special NUL character so I typed it on
the slide as NUL but you don't write NUL in code you actually use its numeric equivalent which is \0 in single
quotes while name bracket i does not equal the NUL character I'm going to go ahead and increment i with
i++ and then down here I'm going to print out the value of i to see what we actually get printing out the value of i
all right so what's going to happen here let me go ahead and run make length fortunately no errors ./length and let me
type in something like HI! enter and I get three let me try BYE! enter and I get
four let me try my own name David enter five and so forth so what's actually going on here well it seems that by way
of this while loop we are specifying a local variable called i initialized to zero because we're figuring out the
length of the string as we go I'm then asking the question does location 0 that is i in the name string which we
now know is an array does it not equal \0 because if it doesn't that means it's an actual character like H or
B or D so let's increment i then let's come back around to line nine and let's ask the question again now i equals 1 so
does name bracket 1 not equal \0 well if it doesn't and it won't if it's an I or a Y or an a based on what I
typed in we're going to increment i once more fast forward to the end of the story once I get to the end of the
string technically one space past the end of the string name bracket i will equal \0 so I don't increment i
anymore I end up just printing the result so what we seem to have here with some low-level C code just this while
loop is a program that figures out the length of a given string that's been typed in let's practice our abstraction
and decompose this into maybe a helper function here let me actually grab all of this code here and let me assume for
the sake of discussion for a moment that I can just call a function now called string length and the length of the
string is name that I want to get and then I'll go ahead and print out just as before with percent I the length of that
string so now I'm abstracting away this notion of figuring out the length of a string that's an opportunity for me to
create my own function if I want to create a function called string length I'll claim that I want to take a string
as input and what should I have this function return as its return type what should string length
presumably return yeah an int right an int makes sense float really wouldn't make sense
because we're measuring things that are integers in this case the length of something so indeed let's have it return
an INT I can pretty much use the same code as before so I'm just going to paste what I cut earlier in the file and
the only thing I have to change here is the name of the variable because now this function I decided kind of
arbitrarily that I'm going to call it s just to be more generic so I'm going to look at s bracket I at each location and
I don't want to print it at the end this would be a side effect what's the line of code I should include here if I
actually want to hand back the total length yeah say again return I in this case so I'm going to go ahead and return
I not print it because now my main function can use the return value stored in length and print it on the next line
itself I just need a prototype so that's my one forgivable copy paste here I'm going to rerun make length hopefully I
didn't screw up I didn't ./length I'll type in HI! oops I'll type in HI! again that works I'll type in BYE! again
and so forth all right so now we have a function that determines the length of a string well it turns out we didn't
actually need this all along it turns out that we can get rid of my own custom string length function here I can
definitely delete the whole implementation down here because it turns out in a file called string.h
which is a new header file today we actually have access to a function called more succinctly Sterling St Len
which literally does that this is a function that comes with C albeit in the string.h header file and it does pretty
much what we just implemented manually so here's an example of admittedly a wheel we just reinvented but no more we
don't have to do that and how do you know what kinds of functions exist well let me actually pop out of my browser
here to a website that is CS50's incarnation of what are called manual pages it turns out that in a lot of
systems Macs and Unix and Linux systems including the Visual Studio Code instance that we have in the cloud there
are publicly accessible manual pages for functions they tend to be written very expertly in a way that's not very
beginner friendly so what we have here at manual.cs50.io is CS50's version of manual pages that have this less
comfortable mode that give you a sort of cheat sheet of very frequently used helpful functions in C and we've
translated the sort of expert notation to things that a beginner can understand so for instance let me go
ahead and search for string up at the top here you'll see that there's documentation for our own get string
function but more interestingly down here there's a whole bunch of string related functions that we haven't even
seen most of yet but there's indeed one here called strlen calculate the length of a string and so if I actually
go to strlen here I'll see some less comfortable documentation for this function and the way a manual page
typically Works whether in cs50's format or any other system is you see typically a synopsis of what header files you need
to use the function so you would copy paste these couple of lines here you see what the Prototype is of the function so
that you know what its inputs are if any and its outputs are if any then down below you might see a description which
in this case is pretty straightforward this function calculates the length of s then you see what the return value is if
any and you might even see an example like this one that we've whipped up here so these manual pages which are again
accessible here and we'll link to these in the problem sets moving forward are pretty much the place to start when you
want to figure out has a wheel been invented already is there a function that might help me solve some problem
set problem so that I don't have to really get into the weeds of doing all of those lower level steps as I have
sometimes the answer is going to be yes sometimes it's going to be no but again the point of our having just done this
together is to reveal that even the functions you start taking for granted they all reduce to some of these basic
building blocks at the end of the day this is all that's inside of your computer is zeros and ones we're just
learning now how to harness those and how to manipulate them ourselves all right any questions here
on this any questions at all
yeah good question is it so common that you would have to specify it or not you do need to include its header files
because that's where all of those prototypes are you don't need to worry about linking it in with -l anything and
in fact moving forward you do not ever need to worry about linking in libraries when compiling your code we the staff
have configured make to do all of that for you automatically we want you to understand that it is doing it but we'll
take care of all of the -l's for you but the onus is on you for the prototypes and the header files other questions on
these representations or techniques
yeah a good question if you were to have a string with actual spaces in it that is multiple words what would the computer
actually do well for this let me go to asciichart.com which is just a random website that's my go-to for the first
127 characters of ASCII this is in fact what we had a screenshot of the other day and if you look here it's a little
non-obvious but SP is space if a computer were to store a space it would actually store the decimal number 32 or
technically the pattern of zeros and ones that represent the number 32 all of the US English keys that you might type
on a keyboard can be represented with a number and using Unicode you can express even things like emojis and
other languages yeah good question only strings are
accompanied by NULs at the end because every other data type we've talked about thus far is of well-defined finite
length one byte for char four bytes for int and so forth if we think back though to last week we did end the week with a
couple of problems integer overflow because four bytes heck even eight bytes is sometimes not enough we also talked
about floating point imprecision thankfully in the world of scientific computing and financial computing there
are libraries you can use that draw inspiration from this idea of a string and they might use nine bytes for an
integer value or maybe 20 bytes you can count really high but they will then start to manage that memory for you and
what they're really probably doing is just grabbing a whole bunch of bytes and somehow remembering how long the
sequence of bytes is that's how these higher level libraries work too all right this has been a lot let's take one
more break here we'll do like a seven minute break here and when we come back we'll flesh out a few more
details all right so we just saw strlen as an example of a function that comes in the string
library let's start to take more of these library functions out for a spin so we're not relying only on the
built-ins that we saw last week let me go ahead and switch over to VS Code and let me create a file called say
string.c just to kind of apply this lesson learned as follows let me go ahead and include
cs50.h let me include stdio.h and this new thing string.h as well at the top I'm going to do the usual int main
void here and then in this program suppose for the sake of discussion that I didn't know about %s for
printf or heck maybe early on there was no %s format code and so there was no easy way to print strings well at
least if we know that strings are just arrays of characters we could use %c as a workaround so to speak a
solution to that sort of contrived problem so let me ask myself for a string s by using get string here and
I'll ask the user for some input and then let me go ahead and print out say output and all I want to do is print
back out what the user typed now the simplest way to do this of course is going to be like last week printf
%s and plug in the s and we're done but again for the sake of discussion I forgot about or someone
didn't implement %s so how else could we do this well in pseudocode or in English like what's the gist of how
we could solve this problem printing out the string s on the screen without using
%s how might we go about solving this just in English high level what would your pseudocode look like
yeah okay so just print each letter and maybe more precisely some kind of loop like let's iterate over all of
the characters in s and print one at a time so how can I do that well for int i gets 0 is kind of the go-to starting
point for most loops i is less than okay how long do I want to iterate well it's going to depend on what I type in but
that's why we have strlen now so iterate up to the length of s and then increment i with plus plus on each
iteration and then let's just print out %c with no new line because I want everything on the same line
whatever the character is at s bracket i and then at the very end I'll give myself that new line just to move the
cursor down to the next line so the dollar sign's not in a weird place all right so let's see if I didn't screw up
any of the code make string enter so far so good ./string and let me type in something like HI! enter and I see
output of HI! too let me do it once more with BYE! enter and that works too notice I very deliberately and quickly
gave myself two spaces here and one space here just because I literally wanted these things to line up properly and input is
shorter than output but that was just a deliberate formatting detail so this code is correct which is a claim I've
made before but it's not well designed now it is well designed in that I'm using someone else's library
function like I've not reinvented a wheel there's no line 15 or below I didn't implement string length myself so
I'm at least kind of practicing what I've preached but there's still an imperfection a
suboptimality this one's really subtle though and you have to think about how loops work what am I doing that's not
super efficient yeah in back yeah this is a little subtle but if you think back to the basic definition
of a for loop and recall when I highlighted things last week what happens well the first thing is that i
gets set to zero then we check the condition how do we check the condition we call strlen on s we get back an
answer like three if it's HI! and 0 is less than three so that's fine and then we print out the character
then we increment i from 0 to 1 we recheck the condition how do I recheck the condition I call strlen of s get
back the same answer three compare three against one we're still good so we print out another character i gets incremented
again i is now two we check the condition what's the condition well what's the string length of s it's still
three two is still less than three so I keep asking the same question sort of stupidly because the string is
presumably never changing in length and indeed every time I check that condition that function is going to get called and
every time the answer for HI! is going to be three three three so it's a marginal sort of suboptimality but I
could do better right like don't ask multiple times questions that you can remember the answer to so how could I
remember the answer to this question and ask it just once how could I remember the answer to
this question let me see yeah back there so store it in a variable right that's been our answer most any time we
want to keep something around so how could I do this well I could do something like this int maybe length
equals strlen of s then I can just change this function call so to speak and let me fix my spelling here
let me fix this to be now comparing against length and this is now okay because now strlen is only called once
on line nine and I'm reusing the value of that variable AKA length again and again and again so that's more efficient
turns out that for Loops actually let you declare multiple variables at once so we can actually do this a little more
elegantly all in one line and this is just now some syntactic improvement I could actually do something like this n
equals strlen of s and then I could just say n here or I could call it length but heck while I'm being succinct
I'm just going to use n for number so now it's just a marginal change but I've now declared two variables inside of my
loop i and n i is set to zero n is set to the string length of s but now hereafter all of my condition checks are just i
less than n i less than n and n is never now changing all right so a marginal improvement there now that I've used
this new function let's use some other functions that might be of interest let me go ahead and write a quick program
here that changes to uppercase some string that
the user types in so let me go ahead and code a file called uppercase.c up here I'll use my new friends cs50.h and
stdio.h and string.h so just as before int main void and then inside of main what
I'm going to do this time is let's ask the user for a string s with get string asking them for the before value and
then let me go ahead and just print out something like after just so I can see what the uppercase
version thereof is and then after this let me go ahead and do the following for int i equals 0 oh let's practice that
same lesson so n equals the string length of s i is less than n i++ so really nothing new really fundamentally
yet how do I now convert characters from lowercase if they are to uppercase in other words if I type in hi h-i in
lowercase I want my program now to uppercase everything to capital H capital I well how can I go about doing
this well you might recall that there is this ASCII chart so let's just consult this
real quick on asciichart.com we've looked at this last week notice that a capital A is 65 capital B is 66 capital
C is 67 and heck here's lowercase a lowercase b lowercase c and that's 97 98 99 and if I actually do some math
there's like a distance of 32 right so if I want to go from uppercase to lowercase I can do 65 + 32 will give me 97
and that actually works out across the board for everything else 66 plus 32 gets me to 98 or a lower case b or
conversely if you have a lowercase a and its value is 97 subtract 32 and boom you have capital A so all right there's some
arithmetic here involved but now that we know that strings are just arrays and we know that characters which are in those
arrays are just binary representations of numbers I think we can manipulate a few of these things as follows let me go
back to my program here and first ask the question if the current character in the array during this Loop is lowercase
let's force it to uppercase so how am I going to do that if the character at s bracket i the current location in the
array is greater than or equal to lowercase a and s bracket i is less than or equal to lowercase z kind of a weird
Boolean expression but completely legitimate because in this array s is a whole bunch of characters that the
human typed in because that's what a string is greater than or equal to a might be a little nonsensical because when
have you ever compared numbers to letters but we know from week zero lowercase a is 97 lowercase z is what is it I
don't even remember what is that 132 132 we know and so that would allow us to answer the question is the current
letter lowercase all right so let me go ahead here and answer that question if it is what do I want to print out I
don't want to print out the letter itself I want to print out the letter minus 32 right because if it happens to be a
lowercase a 97 97 - 32 gives me 65 which is uppercase A and I know that just from having stared at that chart in the
past else if the character is not between little a and little z I'm just going to print out the character itself
by printing s bracket i and at the very end of this I'm going to go ahead and print out a new line just to move the
cursor to the next line so again it's a little wordy but this loop here which I borrowed from our code previously just
iterates over the string AKA array character by character through its length this line 11 here is just asking
the question if that current character the ith character of s is greater than or equal to little a and less than or equal
to little z that is between 97 and 132 then we're going to go ahead and force it to uppercase instead all right and
let me go ahead and zoom out here for just a second and sorry I misspoke 122 which is what you might have said
there's only 26 letters so 122 is little z let me go ahead now and compile and run this program so make
uppercase ./uppercase and let me type in hi in lowercase enter and there's the capitalized version thereof let me do
it again with like my own name in lowercase and now it's capitalized as well well what could we do to improve
this well you know what let's stop reinventing wheels let's go to the manual pages so let me go here and
search for something like I don't know lowercase and there I go I did some autocomplete here our little
search box is saying that okay there's an islower function check whether a character is lowercase well how do I use
this well let me check islower now I see the actual man page for this function now we see include ctype.h
so that's the header file I need to include this is the prototype for islower it apparently
takes a char as input and returns an int which is a little weird I feel like islower should return true or false so
let's scroll down to the description and return value it returns oh this is interesting
and this is a convention in C this function returns a nonzero int if c is a lowercase letter and zero if c is not a
lowercase letter so it returns nonzero so like one negative 1 something that's not zero if c is a lowercase letter and
zero if it is not a lowercase letter so how can we use this building block let me go back to my code here let me
add this file include ctype.h and down here let me get rid of this cryptic expression which was kind of you know
painful to come up with and just ask this if islower s bracket i that should actually work but why
well islower again returns a nonzero value if the letter is lowercase well what does that mean that means it could
return one it could return negative 1 it could return 50 or -50 it's actually not precisely defined why just because
this was a common convention to use zero to represent false and use any other value to represent true and so it turns
out that inside of Boolean expressions if you put a value like a function call like this that returns zero that's going
to be equivalent to false it's like the answer being no it is not lower but you can also just in parentheses put the
name of the function and its arguments and not compare it against anything because we could do something like this
well if it's not equal to zero then it must be lowercase because that's the definition if it returns a nonzero
value it's lowercase but a more succinct way to do that is just a bit more like English if it's islower then print out
the character minus 32 so this would be the common way of using one of these is functions to check if the answer is true
or false okay well we might be done okay no so it's not necessarily one it
would be incorrect to check for one or negative 1 or anything else you want to check for the opposite of zero so not
equal zero or more succinctly like I did by just putting it into parentheses let me see what happens
here so this is great but some of you might have spotted a better solution to this problem a moment ago when we
were on the manual pages searching for things related to lowercase what might be another building block we can employ
here based on what's on the screen here yeah so toupper there's a function that would literally do the uppercasing
for me so I don't have to get into the weeds of like - 32 plus 32 I don't have to consult that chart someone has solved
this problem for me in the past and let's see if I can actually get back to it there we go let me go ahead now and
use this so instead of doing s bracket i minus 32 let's use a function that someone else wrote and just say toupper s
bracket i and now it's going to do the solution for me so if I rerun make uppercase and then
./uppercase type in hi now it's working as expected and honestly if I read the documentation for toupper
by actually going back to its man page or manual page what you'll see is that it says if it's lowercase it will return
the uppercase version thereof if it's not lowercase it's already uppercase it's punctuation it will just return the
original character which means thanks to this function I can actually tighten this up significantly get rid of all of
my conditional there and just print out the toupper return value and leave it to whoever wrote that function to figure
out if something's uppercase or lowercase all right questions on these kinds of tricks again it all
reduces to like week zero basics but we're just building these abstractions on top
yeah yes unfortunately no there is no easy way in C to say give me everything
that was historically for performance reasons they want you to be explicit as to what you want to include in other
languages like Python and Java one of which we'll see later this term you can say give me everything but that actually tends not to be
best practice because it can actually slow down execution or compilation of your code
yeah does toupper accommodate special characters like punctuation yes if I read the documentation more
pedantically we would see exactly that it will properly hand me back an exclamation point even if I passed it in
so if I do make uppercase here and let me do ./upper sorry ./uppercase hi with an exclamation point it's
going to handle that too and just pass it through unchanged
yeah really good question too no we do not have access to a function that at least comes with C or comes with cs50's
library that will just force the whole thing to uppercase in C that's actually easier said than done in Python it's
trivial so stay tuned for another language that will let us do exactly that all right so what does this leave
us with let's come full circle now to where we began today where we were talking about those command line
arguments recall that we talked about rm taking a command line argument the file you want to delete we talked
about clang taking command line arguments that again modify the behavior of the program how is it that maybe you
and I can start to write programs that actually take command line arguments well here is where I can finally explain
why we've been typing int main void for the past week and just asking that you take on faith that it's just the way you
do things well by default in C at least the most recent versions thereof there are only two
official ways to write main functions you might see other formats online but they're generally not consistent with
the current specification this again was sort of the boilerplate for the simplest function we might write last week and
recall that we've been doing this the whole time void what that void means for all of the programs I have written thus
far and you have written thus far is that none of our programs that we've written take command line arguments
that's what the void there means it turns out that main is the way you can specify that your program does in fact
take command line arguments that is words after the command in your terminal window if you want to actually not use
get_int or get_string you want the human to be able to say something like hello David and hit enter and just run hello
print hello David on the screen you can use command line arguments words after the program name on your command line
so we're going to change this in a moment to be something more verbose but something that's now a bit more familiar
syntactically if you change that void in main to be this incantation instead int argc comma string argv open bracket
close bracket you are now giving yourself access to writing programs that take command line arguments argc which
stands for argument count is going to be an integer that stores how many words the human typed at the prompt C
automatically gives that to you string argv stands for argument vector that's going to be an array of all of the words
that the human typed at the prompt so with today's building block of an array we have the ability now to let the
humans type as many words or as few words as they want at the prompt C is going to automatically put them in an
array called argv and it's going to tell us how many words there are in an int called argc
the int as the return type here we'll come back to in just a moment let's actually use now this definition to make
maybe just a couple of simple programs but in problem set two we'll actually use this to control the behavior of your
own code let me go ahead and code up a file called argv.c just to keep it aptly named let me go ahead and include
cs50.h let me go ahead and include whoops that is not the right name of a program let's start that
over let's go ahead and code up argv.c and here we have include cs50.h include stdio.h int main not void
let's actually say int argc string argv open bracket close bracket no numbers in between because
you don't know in advance how many words the human's going to type at their prompt now let's go ahead and do this
let's write a very simple program that just says hello David hello Carter whoever the name is that gets typed but
not using get string let's instead have the human just type their name at the prompt just like RM just like clang just
like make so it's just one and done when you hit enter no additional prompts let me go ahead then and do this print F
quote unquote hello comma and instead of world today I want to print out whatever the human typed in so let's go ahead and
do this argv bracket zero for now but I don't think this is quite what I want because of course that's going to
literally print out argv bracket zero bracket then I need a placeholder so let me put a percent s here and then put
that here so if argv is an array but it's an array of strings then argv bracket zero is itself a single string and so it
can be plugged into that percent s placeholder let me go ahead and save my program and let me go ahead and compile
argv so far so good let me go ahead now and type in my name after the name of the program so no get_string I'm
literally typing an extra word my own name at the prompt enter okay it's apparently a little
buggy in a couple of ways I forgot my backslash n but that's not a huge deal but apparently inside of argv is literally
everything the human typed in including the name of the program so logically how do I print out hello David or hello so
and so and not the actual name of the program what needs to change here yeah yeah so presumably index to one if
that's the second thing I or whichever human has typed at the prompt so let's do make argv again ./argv enter
huh hello null so this is another form of null but this is user error now on my part I didn't do exactly what I said I
would yeah yeah I forgot the parameter so that's actually hm I should probably deal
with that somehow so that people aren't sort of breaking my program and printing out random things like null but if I do
say ./argv David now you see hello David I can get a little curious like what's at location two well we can see
make argv ./argv David enter all right so just nothing is there but it turns out
in a couple weeks we'll start really poking around memory and see if we can't crash programs deliberately because
nothing is technically stopping me from saying oh what's at location 2 million for instance we could really start to
get curious but for now we'll do the right thing but let's now make sure the human has typed in the right number of
words so let's say this if argc equals 2 that is the name of the program and one more word after that go ahead and trust
that in argv bracket 1 as you proposed is the person's name else let's go ahead and just default here to something simple
and basic like well if we don't get a name from the user just say hello world like always so now we're sort of
programming defensively this time the human even if they screw up they don't give us a name or they give us too many
names we're just going to say hello world because I now have some error handling here because again argc is
argument count the number of words total typed at the command line so make argv ./argv let me make the same mistake as
before okay I don't get this weird null behavior I get something well defined I could now do David I could do David Malan
but that's not currently supported I would need to alter my logic to support more than just two words after the
prompt so what's the point of this at the moment it's just a simple exercise to actually give myself a way of taking
user input when they run the program because consider that's just more convenient in this new command line
interface world if you had to use get_string every time you compile your code it'd be kind of annoying right you type
make then you might get a prompt what would you like to make then you type in hello or cash or something else then you
hit enter it just really slows the process but in this command line interface world if you support command
line arguments then you can use these little tricks like scrolling up and down in your history with your arrow keys you
can just type commands more quickly because you can do it all at once and you don't have to keep
prompting the user more pedantically for more and more info so any questions then on command line arguments which finally
reveals why we had void initially but what more we can now put in main that's how you take command line arguments yeah
if you were to use argv and you were to put integers inside of it would it still give you like a string would that
be considered a string yes if you were to type at the command line something like not a word
but something like the number 42 that would actually be treated as a string why because again context matters so if
your program is currently manipulating memory as though it's characters or strings whatever those patterns of
zeros and ones are they will be interpreted as ASCII text or Unicode text if we therefore go to the chart here
that might make you wonder well then how do you distinguish numbers from letters in the context of something like chars
and strings well notice 65 is capital A 97 is lowercase a but also 49 is one and 50 is two so the designers of ASCII and then later Unicode
realized well wait a minute if we want to support programs that let you type things that look like numbers even
though they're not technically ints or floats we need a way in ASCII and Unicode to represent even numbers so here are
your numbers and it's a little silly that we have numbers representing other numbers but again if you're in the world
of letters and characters you got to come up with a mapping for everything and notice here here's the dot even if
you were to represent 1.23 as a string or as characters even the dot now is going to be represented as an ASCII
character so again context here matters all right one final example to tease apart what this int is and what it's
been doing here for so long so I'm going to go ahead and add one bit of logic here to a new file that I'm going to
call exit.c so in exit.c we're going to introduce something generally known as exit statuses it turns
out this is not a feature we've used yet but it's just useful to know about especially when automating tests of your
own code when it comes to figuring out if a program succeeded or failed it turns out that main has one more feature
we haven't leveraged an ability to signal to the user whether something was successful or not and that's by way of
main's return value so I'm going to go ahead and now modify this program as follows like this suppose I want to
write a similar program that requires that the user type a word at the prompt so that argc has to be two for whatever
design purpose if argc does not equal two I want to quit out of my program prematurely because I want to just
insist that the user operate the program correctly so I might give them an error message like missing command line
argument backslash n but now I want to quit out of the program now how can I do that the right way quote unquote to do that
is to return a value from main now it's a little weird because no one called main yet right main gets called
automatically but the convention is anytime something goes wrong in a program you should return a nonzero
value from main one is fine as a go-to we don't need to get into the weeds of having many different exit statuses so
to speak but if you return one that is a clue to the system the Mac the PC the cloud device that something went wrong
why because one is not zero if everything works fine like let's go ahead and print out hello comma
percent s like before uh quote unquote argv bracket one so this is just a version of the program without an else
so this is the same as doing essentially an else here like I did earlier I want to signal to the computer that all is
well and so I return zero but strictly speaking if I'm already returning here I don't technically need if I really want
to be nitpicky I don't technically need the else because the only way I'm going to get to line 11 is if I didn't already
return so what's going on here the only new thing here logically is that for the first time ever I'm returning a value
from main that's something I could always have done because main has always been defined by us as having an int as a
return type by default main automatically sort of secretly returns zero for you if you've never once used
the return keyword which you probably haven't in main it just automatically returns zero and the system assumes that
all went well but now that we're starting to get a little more sophisticated with our code and you
the programmer know something went wrong you can abort programs early you can exit out of them by returning some other
value besides zero from main and this is sort of fortuitous that it's an int right zero means everything worked
unfortunately in programming there are seemingly an infinite number of things that can go wrong an int gives you four
billion possible codes that you can use aka exit statuses to signify errors so if you've ever on your Mac or PC gotten
some weird popup that an error happened sometimes there's a cryptic number in it maybe it's positive maybe it's negative
it might say error code 123 or 49 or something like that what you're generally seeing are these exit statuses
these return values from main in a program that someone at Microsoft or Apple or somewhere else wrote something
went wrong they are sort of unnecessarily showing you the user what the error code is if only so that when
you call customer support or submit a ticket you can tell them what exit status you encountered what error code
you encountered all right any questions then on exit statuses which is the last of
our new building blocks for now any questions at all [Music]
yeah the question is can you do things again and again at the command line like you could with get_string and get_int
which by default recall are automatically designed to keep prompting the user in their own loop until they
give you an int or a float or the like with command line arguments no you're going to get an error message but then
you're going to be returned to your prompt and it's up to you to type it correctly the next time good question
yeah if you do not return a value explicitly main will automatically return zero for you like that is the way
C simply works so it's not strictly necessary but now that we're starting to return values explicitly if something
goes wrong it would be good practice to also start returning a value from main when something goes right and there are
no errors in fact so let's now get out of the weeds and contextualize this for some actual problems that we'll be
solving in the coming days by way of problem set two and beyond so here for instance is a
problem that you might think back to when you were a kid the readability of some text or some book the grade
level at which some book is written if you're a young student you might read at first grade level or third grade level
in the US or if you're in college presumably you're reading at a university level of text but what does
it mean for text like in a book or in an essay or something like that to correspond to some kind of grade level
well here's a quote a title of a childhood book One Fish Two Fish Red Fish Blue Fish what might the grade
level be for a book that has words like this maybe when you were a kid or if you have a sibling still reading these things
what might the grade level of this thing be any guesses yeah sorry
again before grade one is in fact correct so that's for really young kids and why is that well let's consider
these are actually pretty simple phrases right One Fish Two Fish I mean there's not even verbs in these sentences
they're just uh nouns and adjectives and very short sentences and so that might be a heuristic we could use when
analyzing text well if the words are kind of short the sentences are kind of short everything's very simple that's
probably a very young or early grade level and so by one formulation it might indeed be even before grade one for
someone quite young how about this Mr and Mrs Dursley of number four Privet Drive were proud to say that they were
perfectly normal thank you very much they were the last people you would expect to be involved in anything
strange or mysterious because they just didn't hold with such nonsense and onward all right what grade level is
this book at okay I heard third seventh fifth okay all over the
place but grade seven according to one particular measure and we can debate exactly what age you
were when you read this and maybe you're feeling ahead of your time or behind now but here we have a snippet of text what
makes this text assume an older audience a more mature audience a higher grade level would you
think yeah different types of words yeah it's longer different types of words there's commas now and phrases and so
forth so there's just some kind of sophistication to this so it turns out for the upcoming problem set among the
things you'll do is take as input texts like this and analyze them considering well how many words are in the text how
many sentences are in the text how many letters are in the text and use those according to a well-defined formula to
prescribe what exactly the grade level of some actual text might actually be well what else are we
going to do in the coming days well I've alluded to this notion of cryptography in the past this notion of scrambling
information in such a way that you can uh hide the contents of a message from someone who might otherwise intercept it
right the earliest form of this might also be when you're younger and you're in class and you're passing a note from
one person to another from yourself to someone else you don't want to just necessarily write a note in English or
some other written language you might want to scramble it somehow or encrypt it maybe you change the A's to a b and
the B's to a c so that if the teacher snaps it up and intercepts it they can't actually understand what it is you've
written because it's encrypted now so long as your friend the recipient of this note knows how you manipulated it
how you added or subtracted sort of letters to each other they can decrypt it which is to say reverse that process
so formally in the world of cryptography and computer science this is just another problem to solve your input though when
you have a message you want to send securely is what's generally known as plaintext there's some algorithm that's
going to then encipher or encrypt that information into what's called ciphertext which is the scrambled version that
theoretically can get safely intercepted and your message has not been spoiled unless that interceptor actually knows
what algorithm you used inside of this process so that would be generally known as a cipher the ciphers typically
take though not one input but two if for instance your cipher is as simple as a becomes b b becomes c c becomes d dot
dot dot z becomes a you're essentially adding one to every letter and encrypting it now that would be what
we call the key you and the recipient both have to agree presumably before class in advance what number you're
going to use that day to rotate or change all of these letters by because when you add one they upon receiving
your ciphertext have to subtract one to get back the answer so for instance if the input plaintext is hi as before
and the key is one the ciphertext using this simple rotational algorithm otherwise known as a Caesar cipher might
be IJ exclamation point so it's similar but it's at least scrambled at first glance and unless the teacher really
cares to figure out what algorithm are they using today or what key are they using today it's probably sufficiently
secure for your purposes how do you reverse the process well your friend gets this and reverses it by negative
one so I becomes h j becomes I and things like punctuation remain untouched at least in this scheme so let's
consider one final example here if the input to the algorithm is UIJT XBT DT50 and the key this time is negative 1 such
that now b should become a and c should become b and a should become z so we're going in the other direction how might
we analyze this well if we spread all the letters out and we start from left to right and we start subtracting one
letter u becomes t i becomes h j becomes i t becomes s x becomes w b becomes a and so on this was cs50 we'll see you next time
this is cs50 and this is already week three and even as we've gotten much
more into the minutiae of programming and some of the C stuff that we've been doing is all the more cryptic looking
recall that at the end of the day like everything we've been doing ultimately fits into this model so keep that in
mind particularly as things seem like they're getting more complicated more sophisticated it's just a process of
learning a new language that ultimately lets us express this process and of course last week we really went into the
weeds of like how inputs and outputs are represented and this thing here a photograph thereof is called
what this is what RAM I heard random access memory or just generally known as memory and
recall that we looked at one of these little black chips that contains all of the bytes all of the bits
ultimately it's just kind of a grid sort of an artist's grid that allows us to think about every one of these memory
locations as just having a number or an address so to speak like this might be byte number zero and then one and then
two and then maybe way down here again something like two billion if you have 2 gigabytes
of memory and so as we did that we started to explore how we could use this canvas to create kind of our own
information our own inputs and outputs not just the basics like ins and floats and so forth but we also talked about
strings and what is a string as you now know it how would you describe in layperson's terms a string yeah over
there an array of characters and an array meanwhile let's go there how might someone else define an array in more
familiar now terms what would be an array
yeah an indexed set of things not bad and I think a key characteristic to keep in mind with an array is that it does
actually pertain to memory and it's contiguous memory byte after byte after byte is what constitutes an array and
we'll see in a couple of weeks time that there's actually more interesting ways to use this same primitive canvas to
stitch together things that are sort of two-directional even that have some kind of shape to them but for now all
we've talked about is arrays and just using these things from left to right top to bottom contiguously to represent
information so today we'll consider still an array but we won't focus so much on representation of strings or
other data types we'll actually now focus on the other part of that process of inputs becoming outputs namely the
thing in the middle algorithms but we have to keep in mind even though every time we've looked at an array thus far
certainly on the board like this you as a human certainly have the luxury of just kind of eyeballing the whole thing
with a bird's eye view and seeing where all of those numbers are if I asked you where a particular number is like zero
odds are your eyes would go right to where it is and boom problem solved in sort of one step but the catch is with a
computer that has this memory even though you the human can see everything at once a computer cannot
it's better to think of your computer's memory your phone's memory or more specifically an array of memory like
this as really being a set of closed doors not unlike lockers in a school and only by opening each of those doors can
the computer actually see what's in there which is to say that the computer unlike you doesn't have this bird's eye
view of all of the data in all these locations it has to much more methodically look here maybe look here
maybe look here and so forth in order to find something now fortunately we already have some building blocks loops
conditions Boolean expressions and the like where you could imagine writing some code that very methodically goes
from left to right or right to left or something more sophisticated that actually finds something you're looking
for and just remember that the conventions we've had since last week now is that these arrays are zero
indexed so to speak to be zero indexed just means that the data type starts counting from zero so this is location
zero 1 2 3 4 5 6 and notice even though there are seven total doors here the rightmost one of course is called six
just because we've started counting at zero so in the general case if you had n doors or n bytes of memory zero would
always be at the left and N minus one would always be at the right that's sort of a generalization of just thinking
about this kind of convention all right so let's revisit the problem that we started the whole term off with in week
zero which was this notion of searching and what does it mean to search for something well to find
information and this of course is omnipresent anytime you take out your phone you're searching for a friend's
contact anytime you pull up a browser you're Googling for this or that so search is kind of one of the most
omnipresent topics and features of any device these days so let's consider how the Googles the Apples the Microsofts of
the world are implementing something as seemingly familiar as this so here might be the problem statement we want some
input to become some output what's that input going to be maybe it's a bunch of closed doors like this out of which we
want to get back an answer true or false is something we're looking for there or not you could imagine taking this one
step further and trying to find where is the thing you're looking for but for now let's just take one bite out of the
problem can we tell ourselves true or false is some number behind one of these doors or lockers in memory but before we
go there and start talking about ways to do that that is algorithms let's consider how we might lay the foundation
of like comparing whether one algorithm is better than another we talked about correctness and it sort of goes without
saying that any code you write any algorithm you implement had better be correct otherwise what's the point if it
doesn't give you the right answers but we also talked about design and in your own words like what do we mean when we
say a program is better designed at this stage than another how do you think about this notion of design now yeah in
the middle okay so easier to understand I like that other thoughts yeah efficiency and what do you mean by
efficiency precisely memory nice it doesn't use up too much
memory and it isn't redundant so you can think about design along a few of these axes sort of the quality of the code but
also the quality of the performance and as our programs get bigger and more sophisticated and just
longer those kinds of things are really going to matter and in the real world if you start writing code not just by
yourself but with someone else getting the design right is just going to make it easier to collaborate and ultimately
produce right code with just higher probability so let's consider how we might focus on exactly the second
characteristic the efficiency of an algorithm and the way we might talk about the efficiency of algorithms just
how fast or how slow they are is in terms of their running time that is to say when they're running how much time
do they take and we might measure this in seconds or milliseconds or minutes or just some number of steps in the general
case because presumably fewer steps to your point is better than more steps so how might we think about running times
well there's one general notation we should define today so computer scientists tend to describe the running time of an
algorithm or a piece of code for that matter in terms of what's called Big O notation this is literally a capitalized
O A Big O and this generally means that the running time of some algorithm is on the order of such and such where such
and such we'll see is just going to be a very simple mathematical formula it's kind of a way of waving your hands
mathematically to convey the idea of just how fast or how slow some algorithm or code is without getting into the
weeds of like it took this many milliseconds or this many specific number of steps so you might recall then
from week zero I even introduced this picture but without much context at the time we just used this to compare those
phone book algorithms recall that this red straight line was the first algorithm one page at a time the yellow
line that's still straight differed how if you recall that line represented What alternative
algorithm looking out and back what is that second algorithm yeah over there two pages at a time which was
almost correct so long as we potentially double back a page if maybe we go a little too far in the phone book so it
had a potential bug but arguably solvable this last algorithm though was the so-called divide and conquer
strategy where I sort of unnecessarily tore the phone book in half and then in half and then in half which dramatic as
that was unnecessarily it actually took significantly bigger bites out of the problem like 500 pages the first time
another 250 another 125 versus just one or two bites at a time and so we described its running time as this
picture there though I didn't use that expression at the time running times but indeed time to solve might be measured
just abstractly in some unit of measure seconds milliseconds minutes pages via this y-axis here so let's now slap some
numbers on this if we had n pages in that phone book n just representing a generic number the first algorithm here
we might describe as taking n steps second algorithm we might describe as taking n divided by two steps maybe give
or take one if we have to double back but generally n / two and then this thing if you remember your logarithms
was log base 2 of n or just log of n for short so this is sort of a
fundamentally different formula but what's noteworthy is that these first two algorithms even though yes the
second algorithm was hands down faster I mean literally twice as fast when you start to zoom out and if I increase my y-axis
and x-axis these first two start to look awfully
similar to one another and if we keep zooming out zooming out zooming out as n gets really large that is the x-axis gets really long these first two
algorithms start to become essentially the same and so this is where computer scientists use Big O
notation instead of saying specifically this algorithm takes n steps and this one n divided by two a computer scientist
would say ah each of those algorithms takes on the order of n steps or on the order of n/2 but you know what on the
order of n / 2 is pretty much the same when n gets really large as being equivalent to Big O of n itself so yes
in practice it's obviously fewer steps to move twice as fast but in the big picture when n becomes a million a
billion the numbers are already so darn big at that point that these are as the shapes of these curves imply pretty much
functionally equivalent but this one still looks better and better as n gets large because it's rising so much less
quickly and so here a computer scientist would say that that third algorithm was on the order of that is Big O of log n
and you don't have to bother with the base because it's a smaller mathematical detail that is also just in some sense a
constant multiplicative Factor so in short what are the takeaways here this is just a new vocabulary that we'll
start to use when we just want to describe the running time of an algorithm to make this more real if any
of you have implemented a for loop at this point in any of your code and that for Loop iterated n times where maybe n
was the height of your pyramid or maybe n was something else that you wanted to do n times you wrote code or you
implemented an algorithm that operated in Big O of n time if you will so this is just a way now to retroactively start
describing with somewhat um mathematical notation what we've been doing in practice for a while now so here's a
list of uh commonly seen running times in the real world this is not a uh thorough list because you could come up
with an infinite number of mathematical formulas certainly but the common ones will discuss and you will see in your
own code probably reduced to this list here and if you were to study more computer science theory this list would
get longer and longer but for now these are sort of the most familiar ones that we'll soon see all right two other
pieces of vocabulary if you will before we start to use this stuff so this big Omega the capital omega symbol is used
now to describe a lower bound on the running time of an algorithm so to be clear Big O is on the order of that is
an upper bound on how many steps an algorithm might take on the order of so many steps if you want to talk though
from the other perspective well how few steps might my algorithm take maybe in the so-called best case it'd be nice if
we had a notation to just describe what a lower bound is because some algorithms might be super fast in these so-called
best cases so the symbology is almost the same but we replace the Big O with the big Omega so to be clear Big O
describes an upper bound and Omega describes a lower bound and we'll see examples of this before long and then
lastly last one here big Theta is used by a computer scientist when you have a case where both the upper bound on an
algorithm's running time is the same as the lower bound you can then describe it in one breath as being in Theta of such
and such instead of saying it's in Big O and in Omega of that same thing all right so out of context these are sort of just
seemingly cryptic symbols but all they refer to is upper bounds lower bounds or when they happen to be one and
the same and we'll now introduce over time examples of how we might actually apply these to concrete problems but
first let me pause to see if there's any questions any questions here any
questions uh I see pointing somewhere uh where are you pointing to over here there we go okay sorry very
bright smaller n functions move faster so yes if you have something like n that takes only n steps if you have a formula
like n squared just by nature of the math that would take more steps and therefore be slower so the larger the mathematical
expression the slower your algorithm is because the more time or more steps that it takes
so you want your n function so to speak to be small yes and in fact the
Holy Grail so to speak would be this last one here either in Big O notation or even Theta when an algorithm is on
the order of a single step that means it literally takes constant time one step or maybe 10 steps 100 steps but a fixed
constant number of steps that's the best because even as the phone book gets bigger even as the um data set you're
searching gets larger and larger if something only takes a finite number of steps constantly then it doesn't matter
how big the data set actually gets questions as well on these notations yep thank you for the pointing this is
actually very helpful I'm seeing pointing this [Music]
way what is the input to each of these functions it is an expression of how many steps an algorithm takes so in fact
let me go ahead and make this more concrete with an actual example here if we could so on stage here we have
seven lockers which represent if you will an array of memory and this array of memory is maybe storing seven
integers seven integers that we might actually want to search for and if we want to search for these values how
might we go about doing this well for this why don't we make things interesting would a volunteer like to
come on up you'd have to be masked and on the internet if you are comfortable with both of those oh is someone putting their
friend's hand up in back yes okay come on down and in just a moment our brave volunteer
is going to help me find a specific number in the data set that we have here on the screen so come on down and I'll
get things ready for you in advance here come on [Music]
down nice to meet and what is your name Mana n namir nice to meet come on over so here we have for Nira uh seven
lockers or an array of memory and behind each of these doors is a number and the goal quite simply is given this array of
memory as input to return uh true or false is the number I care about actually there so suppose I care about
the number zero what would be the simplest most correct algorithm you could apply in order to find us the
number zero okay try open the first one all right and just maybe just step
aside so the audience can see I think you have not found zero yet okay so keep the door open let's move on to your next
choice second door sure oh go ahead second door well let's keep it simple let's just move from left
to right sort of searching our way and what do you see there up six not zero how about the next
[Music] door all right also not working out so well yep but that's okay if you want to
go on to the next we're still looking for zero all right I see a two all right not
so good yet let's keep going next door seven no okay next door no that's uh five all right very well
done okay all right so I kind of set you up for a fairly slow algorithm but let me
just ask you to describe what is it you did by following the steps I gave you I just went one by one to each character
you went one by one to each character and if you want to talk into here so you went one by one by each character and
would you say that algorithm left to right is correct no
no or yes in the scenario okay yes in the scenario and why are you hesitating what's going through your mind it's not the most
efficient way to do it okay good so we see a contrast here between correctness and design I mean I do think it was
correct because even though it was slow you eventually found zero but it took some number of steps so in fact this
would be an algorithm it has a name called linear search and Nira as you did you kind of walked along a line going
from left to right now let me ask if you had gone from right to left would the algorithm have been fundamentally
better yes okay and why because the zero is here in the first scenario but then um if it was like zeros in the middle it
wouldn't have been yeah and so here's sort of where the the the right way to do things becomes a little less obvious
you would absolutely have given yourself a better result if you had just happened to start from the right or if I had
pointed you to start over there but the catch is if I asked her to find another number like the number eight well that
would have backfired and this time it would have taken longer to find that number because it's way over here
instead and so in the general case you know going left to right or heck right to left is probably as correct as you
can get because if you know nothing about the order of these numbers and indeed they seem to be fairly random
some of them are smaller some of them are bigger there doesn't seem to be Rhyme or Reason linear search is about
as good as you can do when you don't know anything a priori about the numbers so I have a little thank you gift here a
little cs50 stress ball Round of Applause for our first volunteer thank you so much
let's try to formalize what I just described as linear search because indeed no matter which end Nira had
started on I could have kind of changed up the problem to make sure that it appears to be running slow but it is
correct if zero were among those doors she absolutely would have found it and indeed did so let's now try to translate
what we did into what we might call again pseudo code as from week zero so with pseudo code we just need a terse
English like or any language syntax to describe what we did so here might be one formulation of what Nira did for each
door from left to right if the number is behind the door return true else at the very end of the program you would return
false by default and now you got lucky and by the seventh door Nira had indeed returned true by saying voila there is
the zero but let's consider if this pseudo code is now correct an accurate translation first of all normally we've
seen ifs we might see an if else and yet down here return false is aligned with the for why did I not indent the return
false or put another way why did I not do if number is behind door return true else return
false why would that version of this code have been problematic way in back
okay I'm not sure it's because of redundancy let me go ahead and just make
this explicit if I had instead done else return false I don't think it's so much redundancy that I'd be worried about let
me bounce somewhere else yeah in front yeah I would be returning false
even though I or rather Nira had only looked at one element and it would have been as though all
these doors were still closed she opens this up and sees nope this is not zero return false that would give me an
incorrect result because obviously at that stage in the algorithm she wouldn't have even looked through any of the
other doors so just the original indentation of this if you will without the else is correct because only if I get
to the bottom of this algorithm or this pseudo code does it make sense to conclude at that point once she's gone
through all of the doors that nope the number I'm looking for is in fact not actually
there so how might we consider now the running time of this algorithm we have a few different uh types of vocabulary now
and if we consider now how we might think about this let's start to translate it from sort of higher level
pseudo code to something a little lower level right we've been writing code using n and and loops and the like so
let's take this higher level pseudo code and now just kind of get a middle ground between English and C let me propose
that we think about this version of the same algorithm as being a little more pedantic for I from 0 to n minus one if
number behind doors bracket I return true otherwise at the end of the program return false now I'm kind of mixing
English and C here but that's reasonable if the reader is familiar with C or some similar language and notice this pattern
here this is a way of just saying in pseudo code uh give myself a variable called I start at zero and then just
count up to n minus one and recall n minus one is not one shy of the end of the array n minus one is the end of the
array because again we started counting at zero so this is a very common way of expressing this kind of loop from the
left all the way to the right of an array doors I'm kind of implicitly treating as the name of this array like
it's a variable from last week that I defined as being an array of integers in this case so doors bracket I means that
when I is zero it's this location when I is one it's this when I is six or more
generally n minus one that's this location here so same idea but a translation of it so now let's consider
what the running time of this algorithm is if we have this menu of possible answers to this question how efficient
or inefficient is this algorithm let's take a look in the context if this pseudo code we don't even have to bother
going all the way to see how do we go about analyzing each of these steps well let's consider that
this outermost Loop here for I from 0 to n minus1 that line of code is going to execute how many times how many times
will that Loop execute let me give folks this moment to think on it how many times is that going to Loop
here uh yeah over there n times right because it's from 0 to n minus one and if it's a little
weird to think in from 0o to n minus one this is essentially the same mathematically as from one to n and
that's perhaps a little more obviously more intuitively n total steps so I might just make a note to myself this
Loop is going to operate n times what about these inner steps well how many steps or seconds does it take to ask a
question if the number you're looking for is behind doors bracket I well as Nira did that's
kind of like one step right she opened the door and boom all right maybe it's two steps but it's a constant number of
steps so this is some constant number of steps let's just call it one for Simplicity how many steps or seconds
does it take to return true I don't know exactly in the computer's uh memory but that feels like a single step just
return true so if this takes one step this takes one step but only if the condition is true it looks like you're
doing a constant number of things n times or maybe you're doing one additional step so in short the only
thing that really matters here in terms of the efficiency or inefficiency of the algorithm is what are you doing again
and again and again because that's obviously the thing that's going to add up doing one thing or two things a constant
number of times not a big deal but looping that's going to add up over time because the more doors there are the
more the bigger n is going to be and the more steps that's going to take which is all to say if you were to describe
roughly how many steps does this algorithm take in Big O notation what might Your Instinct
say how many steps is this algorithm on the order of given n doors or n integers yeah say again Big O of N and indeed
that's going to be the case here why because you're essentially at the end of the day doing N Things as an upper bound
on running time and that's in fact exactly what happened with Nira she had to look at all n lockers before finally
getting to the right answer but what if she got lucky and the number we were looking for was not at the end of the
array but was at the beginning of the array how might we think about that well we have a nomenclature for this too
of course Omega notation remember Omega notation is a lower bound so given this menu of possible running times for lower
bounds on an algorithm what might the Omega notation be for Nira's linear search omega of one and why
that right because if just by chance she gets lucky and the number she's looking for is right there where she Begins the
algorithm that's it it's one step maybe it's two steps if you have to like unlock the door and open it but it's a
constant number of steps and the way we describe constant number of steps is just with a single number like one so
the Omega notation for linear search might be Omega of one because in the best case she might just get the number
right from the get-go but in the worst case we need to talk about the upper bound which might indeed be Big O of n
so again there's this way now of talking symbolically about best cases and worst cases or for upper uh lower bounds and
upper bounds Theta notation just as a little trivia now is it applicable based on the definition I gave
earlier okay no because you only take out the Theta notation when those two bounds upper and lower happen to be the
same for shorthand notation if you will so it suffices here to talk about just Big O and Omega notation well what if we
are a little smarter about this let me go ahead and uh sort of semi secretly here rearrange these numbers but first
how about one other volunteer one other volunteer to be comfortable with your mask and you're uh
being on the internet how about over here yes you want to come on down all right come on down and don't look at
what I'm doing cuz I'm going [Applause] [Music]
to take your time and don't look up this way cuz I need a moment to rearrange all of the
numbers and actually if you could stay right there before coming up just an awkward few seconds while I finish
hiding the numbers Behind These doors for you I will be right with
you actually if um do you want to warm up the crowd for a moment and I'll be right back you want
to introduce yourself yeah hi guys I'm Rave [Applause]
yeah all right I think I am ready thank you for stalling there of course and and I didn't catch your name what was your
name I'm Rave sorry Rave like a party Rave okay nice to me come on over so Rave has kindly volunteered now and I'm
going to give you an additional Advantage this time um unbeknownst to you I now took numbers behind the doors
but I sorted them for you so they're not in the same random order like they were for Nina you now have the advantage to
know that the numbers are sorted from small to big okay given that and given perhaps what we talked about in week
zero with the phone book where might you propose we begin the story this time with which locker to find zero uh let's
find number six this time let's make things interesting okay um I'll start in the middle okay so the middle there's
seven total so that would be right here go ahead open that up and you find sadly the number five so what do you know now
um I know to go up okay you okay all right so and just to keep it uniform just like I did uh I open to the right
half of the phone book let's keep it similar yeah all right all right and a little too far even though I know you
wanted to go one over all good all good and now we're going to go which direction over here in the middle all
right and voila the number six all right so very nicely done little stress ball for you as well
thank you again so here we see by nature of the locker door still being open sort of a an artifact of the greater
efficiency it would seem of this algorithm because now that um Rave was given the assumption that these numbers
are sorted from small on the left to large on the right she was able to apply that same divide and conquer algorithm
from week zero which we're now going to give a name binary search and simply by starting in the middle and realizing
okay too small then by going to the right half and realizing oh went a little too far then by going to the left
half Rave was able to find in just three steps instead of seven the number six in this case that we were actually
searching for so you can see that this would seem to be more efficient let's consider for just a moment is it correct
if I had used different numbers but still sorted them from left to right would it still have
worked this algorithm you're nodding your head can I call on you like why would it still have worked do you
[Music] think yeah so so long as the numbers are always in the same order from left to
right or heck they could even be in reverse order so long as it's consistent the decisions that Rave was making if
greater than else if less than would guide us to the solution no matter what and it would seem to take fewer steps so if we
consider now the pseudo code for this algorithm let's take a look how we might describe binary search so binary search
we might describe with something like this if the number is behind the middle door which is where Rave began then we
can just return true else if the number is less than the middle door so if six is less than whatever is behind the
middle door then Rave would have searched the left half else if the number is greater than the middle door
Rave would have searched the right right half else if there are no doors and we'll see in a moment why I put this up
top just to keep things clean if there's no doors what should Rave have presumably returned immediately if I
gave her no lockers to work with just return false but this is an important case to consider because if in
the process of searching locker by locker we might have whittled down the problem from seven doors to three doors
to one door to zero doors and at that point we might have had no doors left to search so we have to naturally have a
scenario for just considering if there were no doors so it's not to say that maybe I don't give Rave any doors to
begin with but as she divides and divides and divides if she runs out of lockers to ask those questions of or a
few weeks ago if I ran out of phone book pages to tear in half I too might have had to return false as in this case so
how can we now describe this a little more like C just to give ourselves a variable to start thinking and talking
about well I might talk about doors as being an array and so if I want to express the middle door I could just in
pseudo code say doors bracket middle I'm assuming that someone has done the math to figure out what the middle door is
but that's easy enough to do and then doors if the number we're looking for is less than doors bracket middle then
search doors zero through doors middle minus one so again this is a more pedantic way of taking what's a pretty
intuitive idea search the left half search the right half but start to now describe it in terms of actual indices
or indexes like we did with our array notation the last scenario of course is if the number is greater than the doors
bracket middle then Rave would have wanted to search the middle door plus one so one over through doors n minus one
so again just a way of sort of describing a little more syntactically what it is that's going on
so how might we translate this now into Big O notation well in the worst case how many steps total might Rave's binary
search algorithm have taken given seven doors or given more generically n doors how many times could she go left
or go right before finding herself with one or no doors left what's the way to think about that H yeah in the middle
log n log n so there's log n again and even if you're not feeling wholly comfortable with your logarithms still
pretty much in programming and in computer science generally anytime we talk about some algorithm that's
dividing and conquering in half in half in half or any other multiple it's probably involving logarithms in some
sense and log base two of n essentially refers to the number of times you can divide n by two until you bottom out at just a
single door or equivalently zero doors left so log n so we might say that indeed binary search is in Big O of log n
because the door that Rave opened last this one happened to be three doors away and actually if you do the math here
that roughly works out to be exactly that case if we add one that's sort of um out of seven doors or roughly eight
we were able to search it in just three total steps what about Omega notation though like in the best case Rave might
have gotten lucky she opened the door and there it is so how might we describe a lower bound on the running time of
binary search yeah say again oh Omega of one so here too we see that in some cases binary search and linear
search eh like they're pretty equivalent and so this is why it's sometimes compelling to
consider both the best case and the worst case because honestly in general who really cares if you just get lucky
once in a while and your algorithm is super fast what you probably care about is what's the worst case how long are my
users how long am I going to be sitting there watching some spinning hourglass or a beach ball trying to give
myself an answer to a pretty big problem well odds are you're going to generally care about Big O notation so
indeed moving forward we'll generally talk about the running time of algorithms often in terms of Big O A
Little Less so in terms of Omega but understanding the range can be important depending on the nature of the data that
you're going to actually be given here all right let me pause and see if there is any
questions any questions here yes thank you [Music]
yeah that's a really good question and if I can generalize it how do you guarantee that you can do this at scale
which algorithm is better I've sort of led us down this road of implying that Rave's second algorithm binary search is
better because it's so much faster it's log of n in the worst case instead of Big O of n but Rave was given an
advantage when she came up here in that the doors were already sorted and so that sort of invites the question well
given a whole bunch of random data either a small data set or heck something Google size with millions
billions of pieces of data should you sort it first from smallest to largest and then search or should you just Dive
Right In and search it linearly like how might you think about that if you are Google for instance and
you've got millions billions of web pages should they just go with linear search because it's always going to work
even though it might be slow or should they invest the time in sorting all of that data we'll see how in a bit and
then search it more efficiently like how do you decide between those [Music]
options yeah if you had to sort the data first and we don't yet formally know how to do this but obviously as humans we
could probably figure it out you do have to look at all of the data anyway and so you're sort of wasting your time if
you're sorting it only then to go and search it but maybe it depends a bit more like that's absolutely right and if
you're just searching for one thing in life then that's probably a waste of time to sort it and then search it
because you're just adding to the process but what's another scenario in which you might not worry about
that whereby it might make sense to sort it and then search [Music]
yeah yeah exactly so if your problem is a Google like problem where you have more than just one user who's searching
for more than just one web page probably you should incur the cost up front and sort the whole thing because every
subsequent request thereafter is going to be faster faster faster because it's going to be Rave's algorithm of binary search
binary search binary search that's going to add up to be way fewer steps than doing linear search multiple times so
again kind of depends on the use case and kind of depends on how important it is and this happens even in like real
world's context I think back always to graduate school when I was writing some code to analyze some large data set and
honestly it was actually easier at the time for me to write pretty inefficient but hopefully correct code because you
know what I could just go to sleep for eight hours and let it analyze this really big data set I didn't have to
bother writing more complex code to sort it just to run it more efficiently why because I was the only user and I only
needed to run these queries once and so this was kind of a reasonable approach reasonable until I woke up 8 hours later
and my code was incorrect and now I had to spend another 8 hours rerunning it after fixing it but even there you see
an example where what is your most precious resource is it time to run the code is it time to write the code is it
the amount of memory the computer is using these are all resources we'll start to talk about because it really
depends on what your goals are any questions then on upper bounds lower bounds or each of these two searches
linear or binary yeah [Music] when analyzing running time does the
Sorting step count if you want it to if you actually do it at the moment it did not apply I just gave Rave the luxury of
knowing that the data was sorted but if I really wanted to charge her for the amount of time it took to find that
number six I should have added the time to sort plus the time to search and in fact that's a road we'll go down why
don't we go ahead and pace ourselves as before let's take a 10-minute break here and when we come back we'll write some
actual code so we've seen a couple searches linear search and binary search which to
be fair we saw back in week zero but let's actually translate at least one of those now to some code using this
building block from last week where we can actually Define an array if we want like an array of integers called numbers
so let me switch over to vs code here let me go ahead and start a program called numbers. C and in numbers. C let
me go ahead here and how about let's include our familiar header file so cs50.h I'll include stdio.h so that
we can get input and print output if we want and now I'm going to go ahead and give myself int main void no command
line arguments today so I'll leave that as void and I'm going to go ahead and give myself an array of how about seven
numbers so I'll call it int numbers bracket seven and then I can fill this array with numbers like numbers bracket zero can be
the number four and numbers bracket one can be the number six and numbers bracket two can be the number eight and
this is the same list that we saw with Nira a bit ago where it was four then six then eight but you know what there's
actually another syntax I can show you here if you know in advance in a c program that you want an array of
certain values and you know therefore how many of those values you want you can actually do this little trick using
curly braces you can say don't worry about how big this is it's going to be implicit by way of these curly braces
here I can do 4 6 8 2 7 5 0 close curly brace so it's a somewhat new use of curly braces but this has the effect of
giving me an array called numbers inside of which are a whole bunch of integers how many the compiler can infer it from
whatever is inside these curly braces and it seems to be of size 1 2 3 4 5 6 7 and all seven elements will be
initialized with 4 6 8 2 7 5 0 respectively so just a minor optimization codewise to tighten up what
would have otherwise been like eight separate lines of code now let's go ahead and Implement linear search as we
called it and you can do this in a bunch of ways but I'm going to do it like this for int i gets zero i is less than 7
i ++ then inside of my loop I'm going to ask the question well if the numbers at location i equals equals as we asked
of Nira the number zero then I'm going to go ahead and do something like printf found backslash n
and then I'm going to return zero just because of last week's discussion of returning a value for main when all
is well I'm going to return zero by convention just a signal that indeed I found what I'm looking for otherwise on
what line do I want to go and add a printf like not found and return something other than zero right I don't
think I want an else here per our pseudo code earlier so on what line would you prefer I sort of insert a default
scenario of not found and I'll return an error uh yeah over here at the
nice so at the end of the for Loop because you want to give the program or our volunteer earlier a chance to go
through all of the doors all of the numbers but if you go through the whole thing through the whole loop at the very
end you probably just want to conclude not found backslash n and then return something like positive one just to
signify that an error happened and again this was a minor detail last week anytime main is successful the
programming convention is to return zero that means all is well and if something goes wrong like you didn't find what
you're looking for you might return something other than zero like positive one maybe positive two or even negative
numbers if you want all right well let me go ahead and save this let me do make numbers hopefully no syntax errors all
good so far dot slash numbers enter all right and it's found as I would hope it would be and just as a little check
let's search for something that's definitely not there like the number uh -1 let me go ahead and recompile the
code with make numbers let me rerun the code with dot slash numbers and hopefully okay not found so proof by example seems
to be working correctly but let's make things a little more interesting now right now I'm using just an array of
integers let me go ahead and introduce maybe an array of strings instead and maybe this time I'll store a bunch of
names and not just integers but actual strings of names so how might I do this well let me go back to my code here I'm
going to switch us over to maybe a file called names. C and in here I'll go ahead and include
cs50.h I'll include uh stdio.h and I'm going to go ahead and for now include a new friend from last week
string.h which gives me some string related functionality int main void because I'm not going to bother with any
command line Arguments for now and now if I want an array of strings I could do something like this uh string names
bracket 7 and then I could start doing like before names bracket zero could be someone like Bill and names bracket one
could be someone like Charlie and so forth but there's this new uh Improvement I can make let me just let
the compiler figure out how many names there are and using curly braces I'll do Bill and then Charlie and then Fred and
then George and then Jenny and then Percy and then Ron if there's the pattern there all right so now I have
these seven names as strings let's do something similar so for int i gets zero i is less than seven as before i++ as before
and inside of the loop let's this time check for the string in question and suppose we're searching for Ron
arbitrarily he is there so we should eventually find him let me go ahead and say if names bracket i equals equals quote
unquote Ron then inside of my if condition I'm going to say printf found just like before and I'm going to return
zero just because all is well and I'm going to take your advice uh from the get-go this time and at the end of the
loop print out not found because if I get this far I have not printed found and I have not returned already so I'm
just going to go ahead and return one after printing not found all right let me go ahead and cross my fingers as
always make names this time and it doesn't seem to like my code here this is perhaps a new error that
you might not have seen yet in names.c line 11 so that's this line here my if condition result of comparison
against a string literal is unspecified use an explicit string comparison function instead I mean that's kind of a
mouthful and first time you see it you're probably not going to know how to make sense of that but it does kind of
draw our attention to something being awry with the equality checking here with equals equals and Ron and here's
where again we've been telling sort of a white lie for the past couple of weeks strings are a thing in C strings are a
thing in programming but recall from last week I did disclaim there's no such thing as a string data type technically
because it's not a primitive in the way an int and a float and a bool are that are sort of built into the language
you can't just use equal equals to compare two strings you actually have to use a special function that's in this
header file we talked briefly about last week in that header file was string length or strlen but there's other
functions in there as well let me in fact go ahead and open up the manual pages and if we go to string.h let me scroll
down a bit in string.h you can perhaps infer what function will probably take the place of equals equals for
today what do we want to use yeah so strcmp which apparently compares two
strings and if I click on that we'll see more information and indeed if I click on strcmp we'll see under the
synopsis that okay I need to use the CS50 library's header file and string.h as I already have here is its prototype
which is telling me that strcmp takes two strings s1 and s2 that are presumably going to be compared and it
returns an integer which is interesting so let's let's read on the description of this function is that it Compares two
strings case sensitively so uppercase or lowercase matters just FYI and then let's look at the return value here the
return value of this function returns an int less than zero if s1 comes before s2 zero if s1 is the same as s2 or an int
greater than zero if s1 comes after s2 so the reason that this function returns an integer and not just a bool true or
false is that it actually will allow us to sort these things eventually because if you can tell me if two strings come
in this order or in this order or they're the same you need three possible return values and a bool of course only
gives you two but an int gives you like four billion even though we just need the three so zero or a positive number
or a negative number is what this function returns and the documentation goes on to explain what we mean by
ASCIIbetical order recall that capital A is 65 capital B is 66 and it's those underlying ASCII or Unicode numbers that a
computer uses to figure out whether something comes before it or after it like in a dictionary but for our purposes
now we only care about equality so I'm going to go ahead and do this if I want to compare names bracket i against
Ron I use strcmp of names bracket i comma quote unquote Ron so it's a little more involved than
actually using equals equals which does work for integers longs and certain other values but for strings it turns
out we need to use a more powerful function why well last week recall what a string really is it's an array of
characters and so whereas you can use equals equals for single characters strcmp as we'll eventually see is going to
compare multiple characters for us there's more logic there there's a loop needed and that's why it comes with the
string library but it doesn't just work out of the box with equals equals alone that would literally be comparing two
things not two arrays of things and we'll come back to this next week as to what's really going on under the hood so
let me go ahead and fix one bug that I just realized I made I want to check if the return value of strcmp is
equal to zero because per the documentation that meant they're the same all right let me go ahead and make
names this time now it compiles dot slash names enter found and just as a sanity check let's check someone
outside the family searching now for Hermione after recompiling the code after rerunning the code and she's not in fact
found so here's just a similar implementation of linear search not for integers this time but instead for
strings the subtlety really being we need a helper function strcmp to actually do the legwork for us of
comparing two arrays of characters all right questions on either of these implementations yeah in the
middle ah good question if I had uh not fixed what I claimed was a mistake earlier and I did this and we saw an
example of this last week actually if a function returns an integer be it negative or positive or zero when you
get back zero the expression the Boolean expression will be considered false so 0 equals false always if a function
returns any positive number or any negative number that's going to be interpreted as true even if it's
positive or negative whether it's one negative 1 2 -2 and so if I did this this would be saying the opposite so if
I were to say this if strcmp of names bracket i and Hermione that's implicitly like saying this does not
equal zero or it means sort of is true but you don't want to check for true because again we're comparing integers
here so the reason I did zero here in this case is that it explicitly checks for the return value that means they're
the same and yeah followup yes you might not have seen this yet but you can express the
equivalent because if you want to check if this is false you can actually use an exclamation
point known as a bang in programming that inverts the meaning so false becomes true true becomes false so this
would be another way of expressing it this is arguably a worse design though because the documentation explicitly
says you should be checking for zero or a positive value or a negative value and this little trick while correct and I
think you can make a reasonable case for it sort of hides that detail and I would argue instead for the first way checking
for equals equals zero instead and if that's a little subtle not to worry we'll come back to sort of uh little
syntactic tricks like that before long uh other questions on linear search in these two
forms is there another hand or hands two hands no okay just holler if I missed so let's now actually take this one step
further suppose that we want to write a program that maybe implements something a little more like a phone book that has
both names and numbers and not just integers but actual phone numbers well we could escalate things like this we
could now have two arrays one called names one called numbers and I'm going to use strings for the numbers now the
phone numbers because in most communities uh phone numbers might have dashes pluses parentheses so something
that really looks more like a string even though we call it a phone number probably don't want to use an INT lest
we throw away those kinds of details so let me switch back to vs code here and let's do one more program this one in a
file called phonebook.c and then inside of my program I'm going
to give myself two arrays the efficient way this time uh string names will be just two of us this time how about uh
Carter and me and then I'll give myself whoops typo already if I want this to be an array I don't have to specify the
number the compiler can count for me but I do need the square brackets then for numbers I'm again going to use a
string array specifying with the curly braces that how about Carter can be at +1 617 495 1000 and how about my
own number here +1 949 468 pattern appearing 2750 will be mine why mine well I've
just kind of lined things up so Carter's number is apparently uh first in this array and I'm claiming that he'll be
first in this array respectively I David will be the second in the names array and second in the numbers
array if you want to have a little fun with programming feel free to text or call me sometime at that number so now
let's actually use this data in some way let's go ahead and actually search for my own name and number here so let me do
for int i gets zero there's two of us this time so i less than two and then i++ as before and now I'm going to
practice what I preached earlier and I'm going to use strcmp to find my name in this case and I'm going to say
if strcmp of names bracket i comma quote unquote David and that equals zero meaning they're the same then just as
before I'm going to go ahead and print something out but this time I'm going to make the program more useful not just
say found or not found now I'm implementing a phone book like the contacts app on iOS or Android so I'm
going to say something like quote unquote found percent s backslash n and then actually plug in numbers bracket i
to correspond to the current names bracket I and then I'll return zero as before and then down here if we get all
the way through the loop and David is not there for some reason I'm going to print as before not found and then
return one so let me go ahead and compile this with make phonebook dot slash phonebook and it seems to have found the
number so this code I'm going to claim is correct it's kind of stupid because I've just made a phone book or a
contacts app that only supports two people they're only going to be me and Carter this would be like downloading
the contact app on a phone and you can only call two people in the world there's no ability to add names or edit
things that of course could come later using get string or something else but for now for the sake of discussion I've
just hardcoded two names and two numbers but for what it does I claim this is correct it's going to find me and print
out my number but is it well designed let's start to now consider if we're not just using arrays but are we using them
well we started to use them last week but are we using them well this week and what might I even mean by using an array
well or designing this program well any critiques or concerns with why this might not be the
best road for us to be going down when I want to implement something like a phone book with pieces of information it seems
all too vulnerable to just mistakes for instance if I screw up the number the actual number of names in the names
array such that it's now more or less than is in the numbers array or vice versa it feels like there's not a tight
relationship between those pieces of data and it's just sort of trusting the honor system that anytime I use
names bracket i that it lines up with numbers bracket i and that's fine if you're the one writing the code
you're probably not going to really screw this up but if you start collaborating with someone else or the
program is getting much much longer the odds that you or your colleagues remember that you're sort of just
trusting that names and numbers line up like this is going to fail eventually someone's not going to realize that and
just the code is going to break and you're going to start outputting the wrong numbers for names which is to say
it'd be much nicer if we could somehow couple these two pieces of data names and numbers a little more tightly
together so that you're not just trusting that these two independent variables names and numbers have this
kind of relationship with themselves so let's consider how we might solve this a new feature today that we'll introduce
is generally known as a data structure in C we have the ability to invent our own data types if you will data types
that the authors of C decades ago just didn't Envision or just didn't think were necessary because we can Implement
them ourselves similar to Scratch just as you could create custom puzzle pieces or in C you can create custom functions
so in C can you create your own types of data that go beyond the built-in ints and floats and even strings you can make
for instance a person uh data type or a candidate data type in the context of Elections or a person data type more
generically that might have a name and a number so how might we do this well let me go here and propose that if we want
to define a person wouldn't it be nice if we could have a person data type and then we could have an array called
people and maybe that array is our only array with two things in it two persons in it but somehow those data types these
persons would have both a name and a number associated with them so we don't need two separate arrays we need one
array of persons a brand new data type so how might we do this well if we want every person in the world or in this
program to have a name and a number we literally write out first those two data types give me a string called name give
me a string called number semicolon after each and then we wrap that those two lines of code with this syntax which
at first glance is a little cryptic it's a lot of words all of a sudden but typedef is a new keyword today that defines
a new data type this is the C keyword that lets you create your own data type for the very first time struct is
another related keyword that tells the compiler that this isn't just a simple data type like an INT or a float renamed
or something like that it actually is a structure it's got some Dimensions to it like two things in it or three things in
it or even 50 things inside of it the last word down here is the name that you want to give your data type and it
weirdly goes after the curly braces but this is how you invent a data type called person and what this code is
implying is that henceforth the compiler clang will know that a person is composed of a name that's a string and a
number that's a string and you don't have to worry about having multiple arrays now you can just have an array of
people moving forward so how can we go about using this well let me go back to my code from before where I was
implementing a phone book and why don't we enhance the phone book code a little bit by borrowing some of that new syntax
let me go to the top of my program above Main and Define a type that's a structure or a data structure uh that
has a name inside of it and that has a number inside of it and the name of this new structure again is going to be
called person inside of my code now let me go ahead and delete this old stuff temporarily let me give myself an array
called people of size two and I'm going to use the non-terse way to do this I'm not going to use the curly braces I'm
going to more pedantically spell out what I want in this array of size two at location zero which is the first
person in an array because you always start counting at zero I'm going to give that person a name of quote unquote
Carter and the dot is admittedly one new piece of syntax today too the dot means go inside of that structure and access
the variable called name and give it this value Carter similarly if I want to give Carter a number I can go into
people bracket zero dot number and give that the same thing as before +1 617 495 1000 and then I can do the same
for myself here people bracket where should I go okay one because again two elements
but we started counting at zero dot name equals quote unquote David and then lastly people bracket one dot number equals
quote unquote +1 949 468 2750 so now if I scroll down here to my
logic I don't think this part needs to change too much I'm still for the sake of discussion going to iterate two times
from i zero on up to but not through two but I think this line of code needs to change how should I now refer to the
i-th person's name as I iterate what should I compare quote unquote David to this time let me see on
the end here yeah people bracket i dot name why because people is the name of the array
bracket i is the i-th person that we're iterating over in the current loop first zero then one maybe higher if it had
more people then dot is our new syntax for going inside of a data structure and accessing a variable therein which in
this case is name and so I can compare David just as before so it's a little more verbose but now arguably this is a
better program because now these people are full-fledged data types unto themselves there's no more honor System
inside of my Loop that this is going to line up because in just a moment I'm going to fix this one Last Remnant of
the previous version and if I can call back on you again what should I change numbers bracket I to this
time dot number exactly so gone is the honor system that just assumes that
bracket I in this array lines up with bracket I in this other array now why there's only one array it's an array
called people the things it stores are persons a person has a name and a number and so even though kind of marginal
admittedly given that this is a short program and given that this kind of made things look more complicated at first
glance we're now laying the foundation for just a better design because you really can't screw up now the
association of names with numbers because every person's name and number is so to speak encapsulated inside of
the same data type and that's a term of Art in CS encapsulation means to encapsulate that is contain related
pieces of information and thus we have a person that encapsulates two other data types name and number and this just
sets the foundation for all of the cool stuff we've talked about and you use every day how what is an image well
recall that an image is a bunch of pixels or dots on the screen every one of those dots has RGB values associated
with it red green and blue you could imagine now creating a structure in C probably where maybe you have three
values three variables one called red one called green one called blue and then you could name the thing not person
but pixel and now you could store in c three different colors some amount of red some green some blue and
collectively treat it as the color of a pixel and you could imagine doing something similar perhaps for video or
music music you might have three variables one for the musical note the duration the loudness of it and you
could imagine coming up with your own data type for music as well so this is a little low level we're just using like a
familiar contacts application but we now have the way in code to express most any type of data that we might want to
implement or discuss ultimately so any questions now on struct or defining our own types the purposes for which are to
use arrays but use them more responsibly now in a better uh design but also to lay the foundation for implementing
cooler and cooler stuff um per our week zero discussion yeah what's the difference between this
and an object in an object oriented language so slight side note C is not object oriented languages like Java and
C++ and others which you might have heard of programmed yourself had friends program in are object oriented languages
in those languages they have things called classes or objects which are interrelated and objects can store not
just data like variables objects can also store functions and you can kind of sort of do this in C but it's not sort
of conventional in C you have data structures that store data in languages like Java and C++ you have objects that
store data and functions together python is an object oriented language as well so we'll see this issue in a few weeks
but let me wave my hands at it for now yeah you use yes could you use this struct to redefine how an INT is defined
short answer yes we talked uh a couple of times now about integer overflow and most recently you might have seen me
mention the bug in iOS and macOS that was literally related to an int overflow that's the result
of an int only storing four bytes or 32 bits or even a long is 64 bits or 8 bytes but it's finite
but if you want to implement some financial software or some scientific or mathematical software that allows
you to count way bigger than a typical int or a long you could imagine coming up with your own structure and in fact in
some languages there is a structure called BigInt which allows you to express even bigger numbers how well
maybe you store inside of a big int an array of values and you somehow allow yourself to store more and more bits
based on how high you want to be able to count so in short yes we now have the ability now to do most anything we want
in the language even if it's not built in for us other
questions could you define a name and a number in the same line uh sort of it starts to get syntactically a little
messy so I did it a little more pedantically line by line good question over here
prototypes you have to in C you have to define anything you're going to use or
declare anything you're going to use before you actually use it so it is deliberate that I put it at the top of
my code in this file otherwise the compiler would not know what I mean by person when I first use it here on
what's line 14 so it has to come first or it has to be put into something like a header file so that you know so you
include it at the very top of your code other questions over here
yeah yeah good question we'll come back to this later in the term when we talk about SQL a database language and
storing things in actual databases generally speaking even though we humans call things uh phone numbers or in the
US we have social security numbers those types of numbers often have other punctuation in it like dashes
parentheses uh pluses and so forth you could not store any of that syntax or that punctuation inside of an INT You
Could Only store numbers so one motivation for using a string is just I can store whatever the human wanted me
to store including parentheses and so forth another reason for storing things as strings even if they look like
numbers is in the context of like ZIP codes in the United States again we'll come back to this but long story short
years ago actually I was using uh Microsoft Outlook for my email client and eventually I switched to Gmail and
this is like 10 plus years ago now and Outlook at the time let you export all of your contacts as a CSV file comma
separated values more on that in the weeks to come too and that just means I could download a text file with all of
my friends and family and their numbers uh inside of it unfortunately I opened that same CSV file with Excel I think at
the time just to kind of spot check it and see if what's in there was what it was expected and I must have
instinctively hit like command or control s to save it and Excel at least has this habit of sort of reformatting
your data if things look like numbers it treats them as numbers and Apple Numbers does this too Google spreadsheets does
this too nowadays but long story short I then imported my newly saved CSV file into Gmail and now 10
plus years later I'm still occasionally finding friends and family members whose zip codes are in Cambridge Massachusetts
2138 which is missing the zero because we here in Cambridge are 02138 and that's because I treated or I let Excel
treat what looks like a number as an actual number or int and now leading zeros become a problem because
mathematically they mean nothing but but in the mail system they do sending envelopes and such all right other final
questions here yeah so could I have used a 2D or two-dimensional array to solve the
problem earlier of having just one array yes but uh one I'd argue it's less readable especially as I get lots of
names and numbers and two that too is also kind of relying on the honor system it would be all too easy to omit some of
the square brackets in the two-dimensional array so I would argue it too is not as good as
introducing a struct more on that down the road two-dimensional arrays just means arrays of arrays as you might
infer all right so now that we have this ability to store different types of data like contacts in a phonebook having
names and addresses let's actually take a step back and consider how we might now solve one of the original problems
uh by actually sorting the information we're given in advance and considering per our discussion earlier just how
costly how time consuming is that because that might tip the scales in favor of sorting then searching
or maybe just not sorting and only searching it'll give us a sense of just how expensive so to speak uh sorting
something actually is well what's the formulation of this problem it's the same thing as week zero we've got input
to sort we want it to be output as sorted so for instance if we're taking unsorted input as input we want the
sorted output as the result more concretely if we've got numbers like these 6 3 8 5 2 7 4 1 which are just
randomly arranged numbers we want to get back out 1 2 3 4 5 6 7 8 so we just want those things to be sorted so again
inside of the black box here is going to be one or more algorithms that actually gets this job done so how might we go
about doing this well just to vary things a bit more I think we have a chance here for a bit more audience
participation uh but this time we need eight people if we may all of you have to be comfortable appearing on the
internet okay this is actually quite convenient that you're all quite close how about 1 2 3 4 5 6
7 oh okay and someone volunteering their friend number eight come on down come on down oh and
if you could I'm going to set things up if you all could join Valerie my colleague over there to give you a prop
to use here we'll go ahead in just a moment and try to find some numbers at hand in just a moment each of our
volunteers is going to be representing an integer and that integer is initially going to be in
unsorted order and I claim that using an algorithm step-by-step instructions we can probably sort these folks in at
least a couple of different ways so they're in wardrobe right now uh just getting their very own Harvard T-shirt
with a jersey number on it which will then represent an element of our array give us just a moment to
finish getting the attire ready they're being handed a shirt and a number and let me ask the audience for
just a moment as we have these numbers up here on the screen these numbers too are unsorted they're just in random
order and let me ask the audience how would you go about sorting these eight numbers on the
screen how would you go about sorting these yeah what are your thoughts [Music]
okay okay okay so just to just to recap you would start with one of the numbers on
the end you would look to the number to the right or to the left of it depending on which end you start at and if it's
out of order you would just start to swap things and that seems reasonable there's a whole bunch of mistakes to fix
here because things are pretty out of order but probably if you start to solve small problems at a time you can achieve
the end result of getting the whole thing sorted other instincts if you were just handed these numbers how you might
go about sorting them how might you yeah in the back okay I like that so to recap there
find the smallest one first and and put it at the beginning if I heard you correctly and then presumably you could
do that again and again and again and that would seem to give you a couple of different algorithms and if you all are
attired here do do you want to come on up if you're ready we had some felt volunteers too
come on over so if you all would like to line yourselves up facing the audience in
exactly this order so whoever is number zero should be way over here and whoever is number five should be way over there
feel free to distance as much as you'd like and scooch a little this way if you could okay all right and make a little
more room so seven let's see 5 2 7 four four hopefully one uh and yeah keep him to the side okay one uh six and
there we go three come on over three looking for you all right so here we have an array of eight numbers eight
integers if you will and do you want to each say a quick hello to the group hello I'm Quinn go
Canaday hi everyone I'm agad hey I'm Mitchell hi I'm Brett and also go
Canaday I'm Hannah go Apley hi I'm Matthew gobut hi I'm Miriam go wiup hi I'm
Celeste go Straus wonderful but welcome all to the stage and let's just visualize perhaps organically how you eight
would solve this problem so we currently have the numbers 0 through 7 quite out of order could you go ahead and just sort
yourselves from zero through seven okay so what did they just do what okay yes first of all yes very well well
done how would you describe what they just did well let's do this could you go back
into that order on the screen 5 2 7 4 1 6 3 0 and could you do exactly what you just did again sort
yourselves all right what it okay yes well done again all right so admittedly there's
kind of a lot going on because each of you except number four are doing something in parallel all at the same time and
that's not really how A computer typically works just like a computer can only look at one memory location at one
Locker at a time so can a computer only move one number at a time sort of opening a locker checking what's there
moving it as needed so let's try this more methodically based on the two audience suggestions if you all could
randomize yourself again to 5 2 7 4 1 6 3 0 let's take the second of those approaches first I'm going to look at these
numbers and even though I as the human can obviously see all the numbers and I just kind of have the intuition for how
to fix this we've got to be more methodical because eventually we got to translate this to pseudo code and then
code so let me see I'm going to search for as you propose the smallest number and I'm going to start from left to
right I could do it right to left but left to right just tends to be convention all right five at this moment
is the smallest number I've seen so I'm going to sort of remember that in a variable if you will now I'm going to
take one more step two okay two I'm going to compare to the variable in mind obviously smaller I'm going to
forget about five and only now remember two as the now smallest element seven nope I'm going to ignore that cuz it's
not smaller than the two I have in mind 4 One okay I'm going to update the variable in mind because that's indeed
smaller now obviously we the humans know that's getting pretty small maybe it's the end I have to check all values to
see if there's something even smaller because six is not three is not but zero is and what's your name again Celeste
Celeste where should Celeste or number zero go according to this proposed algorithm all right all I'm seeing a lot
of this so at the beginning of the array so before doing this for real let's have you pop out in front and could you all
shift and make room for Celeste is this a good idea to have all of them move or equivalently move everything in the
array to make room for Celeste and number zero over there no probably not that felt like a lot of work and even
though it happened pretty quickly that's like seven steps to happen just to move her in place so what would be marginally
smarter perhaps a little more efficient perhaps what's that swapping what do you mean by
swap okay replace two values so if you want to go back to where you were one step over number five he's not in the
right place he's got to move eventually so you know what if that's where Celeste belongs why don't we just swap five and
zero so if you want to go ahead and exchange places with each other notice what's just happened the problem
I'm trying to solve has gotten smaller instead of being size eight now it's size seven now granted I moved five
to another wrong location but if these numbers started off randomly doesn't really matter where five goes until we
get him into the right place so I think we've improved and now if I go back my Loop is sort of coming back around I can
ignore Celeste and make this a seven step problem and not eight cuz I know she's in the right place two seems to be
the smallest I'll remember that not seven not four one seems to be the smallest now I know as a human this
should be my next smallest but why intuitively should I keep going do you think I can't sort of optimize as a
human and just say number one let's let's get you into the right place I still want to check the whole array why
yeah maybe there's another one and that could be another problem all together other thoughts yeah another zero there
could be another zero indeed but I did go through the list once right and I kind of know there isn't your thoughts
we don't know that value is represented so maybe you just don't
know yeah I don't necessarily know what is there and honestly I only stipulated earlier that I'm using one variable in
my mind I could use two and remember the two smallest elements I've seen I could use three variables four but then I'm
going to start to use a lot of space in addition to time so if I've stipulated that I only have one variable to solve
this problem I don't know anything more about these elements because the only thing I'm remembering at this moment is
number one is the smallest element I've seen so I'm going to keep going six nope three nope five nope okay I know that
number one and your name was Hannah Hannah is the next smallest element I could have everyone move over to make
room but nope two you know even though you're so close to where I want you I'm just going to keep it simple and swap
you two so granted I've made the problem a little worse but on average I could get lucky too and just pop number two
into the right place now let me just accelerate this I can now ignore Hannah and Celeste making the problem size six
instead of eight so it's getting smaller seven is the smallest nope now four is two is the smallest still two still two
still two so let's go ahead and swap two and seven and now I'll just kind of orchestrate it verbally uh four you're
about to have to do something so we now have 4 7 6 3 five okay three could you swap with four all right now we have 7
six 4 five okay four could you swap with seven now we have six seven five uh five could you swap with six and now we have
seven six six would you swap with seven and now perhaps a round of applause they've sorted themselves okay hang on
there one minute so we'll do this one other approach and my God that felt so much
slower than the first approach but that's one because I was kind of providing a long voiceover but two we
were doing one thing at a time whereas the first time you guys had the luxury of moving like eight different CPUs
brains if you will were all operating at the same time and computers like that exist if you have a a computer with
multiple cores so to speak that's like having a computer that technically can do multiple things at once but software
typically at least as we've written it thus far can only do one thing at a time so in a bit we'll add up all of these
steps but for now let's take one other approach if you all could reorder yourselves like that
5 2 7 4 1 6 3 0 let's take the other approach that was recommended by just fixing small problems and see where this
gets us so we're back in the original order five and two are clearly out of order so you know what let's just bite this
problem off now five and two could you swap now let me take a next step five and seven I think you're okay there's a
gap yes but that might not be a big deal seven and four problem let's have you swap okay seven and one let's have you
swap seven and six let's have you swap seven and three you swap seven and zero you swap now let me pause for just a
moment still not sorted so I'm clearly not done but have I improved the problem right I can't cheat like before or I
can't optimize like before because zero is obviously not here she's still way back there so it's not like I've
gone from eight steps to seven to six just yet but have I made any improvements yes in what sense is this
improved what's a concrete thing you could point to as better yeah sorted the highest I've
sorted the highest number which is indeed seven and conversely if you prefer Celeste is one step closer to the
beginning now worst case Celeste is going to have to move one step on each iteration so I might need to do this
thing like n total times to move her all the way over but that might work out okay let me see uh two and five you're
good five and four swap you five and one let's swap you five and six you're good six and three let's swap you six and
zero let's swap you six and seven you're good and I think now notice that the high values as you noted are sort of
bubbling up if you will to the end of the list two and four you're good four and one let's swap four and five good
five and three swap five and zero swap 5 six 7 of course are good so now you can sort of see the problem resolving itself
and let's just do this part now faster two and one two and four okay four and three four and
zero all right now one and two two and three three and zero and good so we do have some optimization there we don't
need to keep going because those all are sorted one and two you're good two and zero
all right done one and zero and big round of applause in closing okay thank you all um we need the
puppets back but you can keep the shirts thank you for volunteering here uh feel free to make your way exit left or right
and let's see if thanks to our volunteers here we can't now formalize a little bit what we did on both passes
here um I claim that the first algorithm our volunteers kindly acted out is what's called selection sort and as the
name implies we selected the smallest element again and again and again working our way from left to right
putting Celeste into the right place and then continuing with everyone else so selection sort as it's
formally called can be described for instance with this pseudo code here for i from 0 to n minus one and again why this
this is just how we think about and talk about arrays the left end is zero the right end is n minus one where n in this
case happened to be eight people so that's 0 through 7 so for i from 0 to n minus one what did I do I found the
smallest number between numbers bracket i and numbers bracket n minus one so a little cryptic at first glance but this
is just a very pseudo code like way of saying find the smallest element among all eight volunteers because if i
starts at zero and n minus one never changes because there's always eight people so 8 - 1 is 7 this first
says find the smallest number between numbers bracket 0 and numbers bracket 7 if you will then what do I do swap the
smallest number with numbers bracket I so that's how we got Celeste from over here all the way over there we just
swapped those two values what then happens next in this pseudo code i of course goes from zero to one and that's
the technical way of saying now find the smallest element among the seven remaining volunteers ignoring Celeste
this time because she was already in the correct location so the problem went from size 8 to size 7 and if we repeat
size 6 5 4 3 2 1 until boom it's all done at the very end so this is just one way of expressing in pseudo code what we
did a little more organically a formalization of what someone volunteered out in the
audience so if we consider then the efficiency of this algorithm maybe abstracting it away now as a bunch of
doors where the leftmost again is always zero the right most is always n minus one or equivalently the second to last
is n minus 2 the third to last is n minus 3 where n might be 8 or anything else how do we think about or quantify
the running time of selection sort Big O of what I mean that was a lot of steps to
be adding up uh it's probably more than n right because I went through the list again and again it was like n + (n - 1) +
(n - 2) any instincts here we got like the whole team in the orchestra now let me propose we think about
it this way with just a bit of a formula so the first time I had to look at n different volunteers n was
eight in this case but generically I looked at all n numbers in order to decide who was the smallest and sure
enough Celeste was at the very end she happened to be all the way to the right but I only knew that once I looked at
all eight or all n volunteers so that took me n steps first but once Celeste was swapped into the right place then my
problem was size n minus one and I had n minus one other people to look through so that's n minus one steps then after
that it's n minus 2 plus n minus 3 plus n minus 4 plus dot dot dot until I had one final step and it's obvious that I only
have one human left to consider so we might wave our hands at this with a little ellipses and just say dot dot dot
plus one for the final step now what is this actually equal to well this is where you might think back on your high
school math or physics textbook that has a little cheat sheet at the end that shows these kinds of recurrences that
happens to work out mathematically to be n(n + 1) all divided by two that's just what that recurrence that series
actually adds up to so if you take on faith that that math is correct let's just now multiply this out
mathematically that's (n^2 + n) / 2 or n^2 / 2 + n / 2 and here's where we're starting to get annoyingly into
the weeds like honestly as n gets really large like a million doors or integers or a billion web pages in Google's search
engine honestly which of these terms is going to matter the most mathematically if n is a really big number is n^2 divided by
two the dominant factor or is n divided by two the dominant factor yeah n squared I mean no matter what
n is the bigger it is the bigger raising it to the power two is going to be so you know what let's just wave our
hands at this because at the end of the day as n gets really large the dominant factor is indeed that first one
and you know what even the divided by two as I claimed earlier with our two phone book examples where the two straight
lines if you keep zooming out essentially look the same when n is large enough let's just call this on the
order of n^2 so that is to say a computer scientist would describe selection sort as taking on the order of n squared
steps that's an oversimplification if we really added it up it's actually this many steps n^2 / 2 plus n / 2 but again
if we want to just be able to generally compare two algorithms performance I think it's going to suffice if we look
at that highest order term to get a sense of what the algorithm feels like if you will or what
it even looks like graphically all right so with that said we might describe
selection sort as being in big O of n^2 but what if we consider now the best case scenario an opportunity to talk
about a lower Bound in the best case how many steps does selection sort take well here we need some context like what does
it mean to be the best case or the worst case when it comes to sorting like what could you imagine meaning the best
possible scenario when you're trying to sort a bunch of numbers okay the whole crew here again
yeah all right they're already sorted right I can't really imagine a better scenario than I have to sort some
numbers but they're already sorted for me but does this algorithm leverage that fact in practice even if all of our
humans had lined up from 0 to seven I'm pretty sure I would have pretty naively started here and yes Celeste happens to
be here but I only know she needs to be here once I've looked at all eight people and then I would have realized
well that was a waste of time I can leave Celeste be but then what would I have done I would have
ignored her position cuz we've solved one problem I would have done the same thing now for seven people then
six people so every time I walk through I'm not doing much useful work but I am doing those comparisons because I don't
know until I do the work that the people were in the right order so this would seem to imply that the Omega notation
the best case scenario even a lower bound on the running time would be what then a little
louder it's still going to be n squared in fact because the code I'm giving myself doesn't leverage or benefit from
any of that scenario because it just mindlessly continues to do this again and again so in this case yes I would
claim that selection sort is also in Omega of n^2 so those are the kinds of numbers to beat
it seems like the upper bound and lower bound of selection sort are indeed N squared and so we can also describe
selection sort therefore as being in Theta of n^2 that's the first algorithm we've had the chance to describe that in
which is to say that it's kind of slow I mean maybe other algorithms are slower but this isn't the best starting point
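The selection sort pseudo code described above can be sketched in C roughly as follows. This is a minimal illustration, not the lecture's own code: the function name selection_sort and the variable names smallest and tmp are assumptions. The outer loop runs i from 0 to n - 1, the inner loop finds the smallest value between numbers[i] and numbers[n - 1], and the two are then swapped.

```c
// Sort numbers[0] through numbers[n - 1] in place, smallest first.
// Hypothetical helper for illustration; names are not from the lecture.
void selection_sort(int numbers[], int n)
{
    for (int i = 0; i < n; i++)
    {
        // Remember the position of the smallest value seen so far,
        // using a single variable as in the on-stage demonstration.
        int smallest = i;
        for (int j = i + 1; j < n; j++)
        {
            if (numbers[j] < numbers[smallest])
            {
                smallest = j;
            }
        }

        // Swap the smallest remaining value into position i.
        int tmp = numbers[i];
        numbers[i] = numbers[smallest];
        numbers[smallest] = tmp;
    }
}
```

Note that the inner loop always scans the whole unsorted remainder, even when the input is already sorted, which is why the best case is no faster than the worst case.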
can we do better well there's a reason that I guided us to doing the second algorithm second even though you
verbally proposed them in a different order this second algorithm we did is generally known as bubble sort and I
deliberately used that word a bit ago saying the big values are bubbling their way up to the right to
kind of capture the fact that indeed this algorithm Works differently but let's consider if it's better or worse
so here we have pseudo code for bubble sort you could write this too in different ways but let's consider what
we did on the stage we repeated the following n minus one times we initialized at least even though I
didn't verbalize it this way a variable like i from 0 to n minus 2 and then I asked this question if numbers
bracket i and numbers bracket i + 1 are out of order then swap them so again I just did it more intuitively by pointing
but this would be a way with a bit of pseudo code to describe what's going on but notice
that I'm doing something a little differently here I'm iterating from i equals 0 to n minus 2 why well if I'm
comparing two things left hand and right hand I still want to start at zero but I don't want to go all the way to n
minus one because then I'd be going past the boundary of my array which would be bad I want to make sure that my left
hand i if you will stops at n minus 2 so that when I add one in my pseudo code I'm looking at the last two elements not
the last element and then past the boundary that's actually a common programming mistake that we'll
undoubtedly soon make by going beyond the boundaries of your array so this pseudo code then allows me to say
compare everyone again and again and swap them if they're out of order why do I repeat the whole thing n minus one
times like why does it not suffice just to do this Loop here think what happened with
Celeste why do I repeat this whole thing n minus one
times yeah [Music] back indeed and I think if I can recap
accurately think back to Celeste again I'm sorry to keep calling on you as our number zero each time through bubble
sort she only moved one step and so in total if there's n locations at the end of the day she needs to move n minus one
steps to get zero all the way to where it needs to be and so this inner loop if you will where we're iterating using I
that just fixes some of the problems but it doesn't fix all of the problems until we do that same logic again and again
and again and so how might we quantify the running time of this algorithm well one way to see it is to just literally
look at the pseudo code the outer loop repeats n minus one times by definition it literally says that the inner loop
the for loop also iterates n minus one times why because it's going from 0 to n minus 2 and if that's hard to think about
that's the same thing as 1 to n minus one if you just add one to both ends of the formula so that means you're doing n
minus one things n minus one times so I literally multiply how many times the outer loop is running by how many
times the inner loop is running which gives me sort of FOIL method n minus one squared and I could multiply that whole
thing out well let's consider this just a little more methodically here if I have n - 1 on the outer n - 1 on the inner
let's go ahead and FOIL this so n^2 - n - n + 1 combine like terms n^2 - 2n + 1 and now which of these terms is
clearly going to be dominant so to speak the n^2 so yes even though minus 2n is a good thing because it's subtracting
off some of the time required plus one's not that big a thing they're such drops in the bucket when n gets really large
like in the millions or billions certainly bubble sort too is on the order of n^2 it's not the same exactly as selection
sort but as n gets big honestly we're barely going to be able to notice the difference most likely and so it too
might be said to be on the order of n squared and if we consider now the lower bound on bubble sort's running time
here's where things get potentially interesting um what might you claim is the running time of bubble sort in the
best case and the best case I claim is when the numbers are already sorted is our pseudo code going to take
that into account okay n why do you propose [Music]
n yes and that's the key word to summarize in bubble sort I do have to minimally make one pass because if I
don't look at all n elements then I'm theoretically just guessing if it's sorted or not like I obviously
intuitively have to look at every element to decide yay or nay it's in the right order and my original pseudo code
though is pretty naive it's just going to blindly go back and forth n times n minus one times again and again and
that's going to add up but what if I add a bit of an optimization that you might have glimpsed on the slide a moment ago
where if I compare two people and I don't swap them compare two people don't swap them and I go all the way through
the list comparing every pair of adjacent people and I make no swaps it would be kind of not just naive but
stupid to do that same process again because if the humans have not moved I'm not going to make any different
decisions I'm gonna do nothing again nothing again so at that point it would be stupid very inefficient to go back
and forth and back and forth so if I modify our pseudo code with just an additional if condition I bet we can
speed this up inside of that same pseudo code what if I say hey if no swaps quit like quit prematurely before the loops
are finished running one of the loops has gone through per the indentation here but if I do a loop from left
to right and I have made no swaps which you can think of as just being one other variable that's plus plus as I go
keeping track of how many swaps if I've made no swaps from left to right I'm not going to make any swaps the next time
around either so let's just quit at that point and that is to say in the best case if you will when the list is
already sorted the Omega notation for bubble sort might indeed be Omega of n if you add that optimization so as to
short circuit all of that inefficient looping to do it only as many times as is
necessary let me pause to see if there's any questions here [Music]
yeah good question if the running times of selection sort and bubble sort are both in big O of n^2
but selection sort's in Omega of n^2 while bubble sort's in Omega of n which sounds better I think if I may uh should
we just always use bubble sort yes if we think that we might benefit over time from a lot of good case scenarios or
best case scenarios um however the goal at hand in just a bit is going to be to do even better than both of these so
hold that question further for a moment [Music] yeah oh uh no so yes good question so I
say Omega of n but is it technically Omega of n minus one maybe but again we're throwing away lower order
terms and that's an advantage because we're not comparing things ever so precisely just like I plotted with the
green and yellow and red chart I just want to get a sense of the shape of these algorithms so that when n gets
really large which of these choices is going to matter the most at the end of the day it's actually perfectly
reasonable to use selection sort or bubble sort if you don't have that much data cuz they're going to be pretty fast
my God our computers nowadays are 1 gigahertz 2 gigahertz 1 billion things per second 2 billion things per second
but if we have large data sets as we will later in the term and as you might in the real world at the Googles of the
world then you're going to want to be more
today all right so let's actually see this visualized a little bit in a moment I'm going to change screens here to open
up what is a little visualization tool um that will give us a sense of how these things actually work and look at a
faster rate than our humans were able to do here on stage so here is another visualization of a bunch of numbers
an array of numbers short bars mean small numbers tall bars mean big numbers so instead of having the numbers on
their torsos here we just have bars that are small or tall based on the magnitude of the number let me go ahead and I
preconfigured this in advance to operate somewhat quickly let's go ahead and do selection sort by clicking this button
and you'll see some pink bars flying by and that's like me walking left and right left and right to select the next
smallest number and so what you'll see happening on the left of this array of numbers is Celeste if you will and all
of the other smaller numbers are appearing on the left while we continue to solve the remaining problems to the
right so again we no longer have to touch the smaller numbers here so that's why the problem's getting smaller and
smaller and smaller over time but you can notice now visually look at how many times we're retracing our steps this is
why things that are n s tend to be frowned upon if avoidable because I'm touching the same elements again and
again when I was walking through I kept pointing at the same humans again and again and that adds up so let's see if
bubble sort looks or feels a little different let me randomize the thing and let me now click bubble sort at the top
and as you might infer there's other sorting algorithms out there not all of which we'll look at but here's bubble
sort same pink coloration but it's doing something different it's two pink bars going through again and again comparing
the adjacent numbers and you'll see that the largest numbers are indeed bubbling their way up to the right but the
smaller numbers like our number zero are only slowly making their way over here's a comparable here's the number
one and it's going to take a while to get all the way to the left and here too notice how many times the same bars are
becoming pink how many times the algorithm is retracing and retracing its steps why because it's only solving one
problem at a time on each pass and each time we do that we're stepping through practically the whole array and now
granted I could speed this up even further if I really wanted to but my God this is only what like 50 or 60
elements something like that this is slow like this is what n s looks like and feels like and now I'm just trying
to come up with words to say until we get to the finish line here like this would be annoying if this is the speed
of sorting and this is why I sort of secretly sorted the numbers uh for Rave in advance because it would have taken
us an annoying number of steps to get that in place for her so those two algorithms are n squared can we do in fact
better well to save the best algorithm for last let's take a shorter five minute break here and when we come back we'll
do even better than n squared all right so the challenge at hand is to do better than selection sort and
better than bubble sort and ideally not just marginally better but fundamentally better just like in week zero that third
and final divide and conquer algorithm was sort of fundamentally faster than the other two so can we do better than
something on the order of n^2 well I bet we can if we start to approach the problem a little differently the sorts
we've done thus far are generally known as comparison sorts and that kind of captures the reality that we were doing
a huge number of comparisons again and again and you kind of saw that in the vertical bars that were going Pink as
everything was being compared again and again but there's this programming technique and it's actually a
mathematical technique known as recursion that we've actually seen before and this is a building block or
a mental model we can bring to bear on the problem to solve the Sorting problem sort of fundamentally differently but
but first let's look at it in a more familiar context a little bit ago I proposed this pseudo code for the binary
search algorithm and notice that what was interesting about this code even though I didn't call it out at the time
it's kind of cyclically defined like I claim this is an algorithm for search and yet it seems a little unfair that
I'm using the verb search inside of the algorithm for search it's like in English sort of defining a word by using
the word normally you shouldn't really get away with that but there's something interesting about this technique here
because even though this whole thing is a search algorithm and I'm using my own algorithm to search the left half or the
right half the key feature here that doesn't normally happen in English when you define a word in terms of a word is
that when I search the left half or search the right half yes I'm doing the same thing I'm using the same algorithm
but the problem is by definition half as large so this isn't going to be a cyclical argument in the same way
this approach by using search within search is going to whittle the problem down and down and down until hopefully
one door or no doors remains and so recursion is a programming technique whereby a function calls itself and we
haven't seen this yet in C and we haven't seen this really in scratch but in C you can have a function call itself
and the form that takes is like literally using the function's name inside of the function's implementation
itself we've actually seen an opportunity for this once before too
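As a sketch of what a function calling itself can look like, the binary search idea above (search the left half or the right half, each half as large) might be written in C as follows. The function name search and the parameters low, high, and target are assumptions for illustration, and the array is assumed to be sorted in ascending order.

```c
#include <stdbool.h>

// Recursively search sorted numbers[low] through numbers[high] (inclusive)
// for target. Hypothetical signature, not taken from the lecture.
bool search(const int numbers[], int low, int high, int target)
{
    // Base case: no elements remain, so the target is not here.
    if (low > high)
    {
        return false;
    }

    int middle = low + (high - low) / 2;

    if (numbers[middle] == target)
    {
        return true;
    }
    else if (target < numbers[middle])
    {
        // Same algorithm, but on the left half only.
        return search(numbers, low, middle - 1, target);
    }
    else
    {
        // Same algorithm, but on the right half only.
        return search(numbers, middle + 1, high, target);
    }
}
```

Each recursive call works on a strictly smaller range, so the calls eventually reach the base case rather than repeating forever.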
think back to week zero here's that same pseudo code for searching for someone in an actual physical phone book and notice
these yellow lines here we describe those in week zero as inducing a loop a cycle and this is a very procedural
approach if you will because lines eight and 11 are very mechanically if you will telling me to go back to line three to
do this kind of looping thing but really what that's doing in the binary search algorithm for the phone book is it's
just telling me to search the left half or search the right half I'm doing it more mechanically Again by sort of
telling myself what line number to go back to but that's equivalent to just telling myself go search the left half
search the right half the key thing being the left half and the right half are smaller than the original problem it
would be a bug if I just said search the phone book search the phone book search because obviously you never get anywhere
but if you search the half the half the half problem gets smaller and smaller so let's reformulate week Zero's phone book
code to be not procedural as here but recursive whereby in this search algorithm AKA binary search formerly
called divide and conquer I'm going to literally use also the keyword search here notice among the benefits of doing
this is it kind of tightens the code up makes it a little more succinct even though that's kind of a fringe benefit
here but it's an elegant way too of describing a problem by just having a function use itself to solve a
smaller puzzle at hand so let's now consider a familiar problem a smaller version than the one you've dabbled with
this sort of pyramid this half pyramid from Mario and let's throw away the parts that aren't that interesting and
just consider how we might up until now implement this in C code this left aligned pyramid if you will let me go over
here and let me create a file called how about iteration.c and in this file I'm going to go ahead and include
cs50.h and I'm going to include stdio.h and the goal at hand is to implement in C a little
program that just prints out this and exactly this pyramid so no get_string or any of that we're just going to keep it
simple and print exactly this pyramid of height four here so how might I do this well let me go ahead and in main let me
first ask the user for um well we'll go ahead and generalize it let's go ahead and ask the user for height using
get_int as before and I'll store that in a variable called height and then let me go ahead and simply call a function draw
passing in that height so for the moment let me assume that someone somewhere has implemented a draw function and this
then is the entirety of my program all right unfortunately C does not come with a draw function so let me go ahead and
invent one uh it doesn't need to return a value it just needs to print something so called side effect so I'm going to
define a function called draw that takes as input an int I'll call it n for number but I could call it anything I
want and inside of this I'm going to go ahead and print out a left aligned pyramid like this from top to bottom the
Salient features here are that this is a pyramid at least in this example of height four and on height four the first
row has one brick the second row has two the third has three the fourth has four that's a nice pattern that I can
probably represent in code so how might I do this well how about for int i gets let me do it the old school way one and
then i is less than or equal to n and then i++ so I'm going from 1 to four just to keep myself sane here and then
inside of this loop what do I want to do well let me keep it conventional in fact let me just change this to be
the more conventional 0 to n even though it might not be as intuitive because now on row zero I want one brick on row one
I want two bricks dot dot dot on row three I want four so it's kind of offset now but I'm being more conventional
so on each row how many bricks do I want to print well I think I want to do this for int j for instance it's
common to use j after i if you have a nested loop let's start j at zero and do this so long as j is less than i + 1 and
then do j++ so why i + 1 well again when i equals 0 that's the first row and I want one brick when i equals 1 that's the
second row I want two bricks and dot dot dot when i is three I want four bricks so again I have to add one to i to get
the total number of bricks that I want to print on the screen so inside of this nested for loop I'm going to do printf
of a hash with no new line I'm going to save the new line for about here instead all right the
last thing I'm going to do is copy and paste the Prototype at the top of the file so that I can call this and again
this is sort of now week one week two wouldn't necessarily come to your mind as quickly as it might to mine after all
this practice but this is something reminiscent of what you yourselves did already for Mario printing out a pyramid
that hopefully in a moment is going to look like this so let me go back to my code let me run make
iteration and let me do ./iteration I'll type in four and voila seems to be correct and let's assume it's going to
work for other inputs as well oh thank you so this is indeed an example of
iteration doing something again and again and it's very procedural like I literally have a function called draw
that does this thing but I can think about implementing draw in a somewhat different way that's kind of clever and
it's not strictly necessary for this problem because this problem honestly is not that complicated to solve once you
have practice under your belt certainly the first time around probably significantly challenging but now that
you kind of associate okay row one with one brick row two with two bricks it kind of comes together with these for
loops but how else could we think about this problem well this physical structure these bricks in some sense is
a recursive structure a structure that's defined in terms of itself now what do I mean by that well if I were to ask you
the question what does a pyramid of height four look like you would point of course to this picture but you could
also kind of um you know cleverly say to me well it's actually a pyramid of height three plus one additional row and
here's that cyclical argument right kind of obnoxious to do typically in English or in a spoken language because you're
defining one thing in terms of itself what's a pyramid of height four well it's a pyramid of whoops it's a pyramid
of height three plus one more row but we can kind of Leverage this logic in code well what's a pyramid of height three
well it's a pyramid of height two plus one more row fine what's a pyramid of height two well it's a pyramid of height
one plus one more row and then hopefully this process ends and it does because notice the pyramid is getting smaller
and smaller so you're not going to sort of have this sort of silly back and forth with me infinitely many times
because when we finally get to the base case the end of the pyramid fine what is a pyramid of height one well it's a
pyramid of no height plus one more row and at that point things just get Negative no pun intended things just
would otherwise go negative and so you can just kind of stop the base case is when there is no more pyramid so there's
a way to sort of draw a line in the sand and say stop no more arguments but this idea of defining a physical structure in
terms of itself or code in terms of itself actually lets us do some interesting new algorithms let me go
back to my code here let me go ahead and create one final file here called recursion.c that leverages this idea
of this built-in self-referential nature let me include cs50.h let me go ahead and include
stdio.h int main void and then inside of main I'm going to do the exact same thing int height equals get_int
asking the user for height and then I'm going to go ahead and call draw passing in height so that's going to stay the
same I even am going to make my prototype the same void draw int n semicolon and now I'm going to implement
draw down here with that same prototype of course but the code now is going to be a little different what am I going to
do here well first of all if you ask me to draw a pyramid of height n I'm going to be kind of a you know wise ass here
and say well just draw a pyramid of n minus one done all right but there's still a little more work to be done what happens
after I print or draw a pyramid of height n minus one according to our structural definition a moment
ago what remains after drawing a pyramid of height n minus one or three specifically we need one more row of
hashes okay so I can do that right I'm okay with single loops there's no nesting necessary here I'm just going
to do this for int i gets zero i is less than n which is the height that's passed in i++ and then inside of this loop I'm
very simply going to print out a single hash and then down here I'm going to print out a new line at the very end so
that's good right I might not be as comfortable with nested loops this is nice and simple what does this loop do
here on line 17 through 20 it literally prints n hashes by counting from i equals 0 on up to but not through n
so that's sort of week one style syntax but this is kind of trippy now because I've somehow boiled down the
implementation of draw into printing a row after just drawing the thing above it but this is problematic as is because
in this case my draw function notice is always going to call the draw function forever in some sense but
ideally when do I want this cyclical process to stop when do I want to not call draw anymore
yeah when n is one right when I get to the top of the pyramid when n is one or heck when the pyramid's all gone and n
equals zero I can pick any Line in the Sand so long as it's sort of at the end of the process then I don't want to call
draw anymore so maybe what I should do is this if n equals equals 0 there's really
nothing to draw so I'm just going to go ahead and return like this otherwise I'm going to go ahead and draw uh n minus
one rows and then one more row and I could express this differently I could do something like this which would be
equivalent I could say something like if n is greater than or equal to zero then go ahead and draw the row but I like it
this way first for now I'm going to go with the original way just to ask a simple question and then just bail out
of the function if n equals 0 and heck just to be super safe just in case the user types in a negative number let me
also just check if N is a negative number also just return immediately don't do anything I'm not returning a
value because again the function is void it doesn't need or have a return value so just saying return suffices but if n
equals 1 or two or three or anything higher it is reasonable to draw a pyramid of slightly shorter height like
instead of four three and then go ahead and print one more row so this is an example now of code that calls itself
within itself draw is calling draw but this so-called base case ensures this conditional ensures that we're not going
to do this forever otherwise we literally would do this infinitely many times and something bad is probably
going to happen all right let me go ahead and compile this code make recursion okay no syntax errors
./recursion enter height of four and voila if only because some of you have run into this issue accidentally already let
me get rid of the base case here and let me recompile the code make recursion oh and actually now it's actually catching
it so the compiler is smart enough here to realize that all paths through this function will call itself AKA it's going
to loop forever so let me do the first thing suppose I only check for n equaling zero let me go ahead and
recompile this code with make recursion and now let me just be kind of uncooperative when I run this program
still works for four still works for zero what if I do like a negative number have any of you experienced a
segmentation fault or core dump okay so no shame in this like this means I have somehow touched memory that I shouldn't
have and in short I actually called this function thousands of times accidentally it would seem now until the program just
bailed on me because I eventually touched memory in the computer that I shouldn't have that'll make even more
sense next week but for now it's simply a bug and I can avoid that bug in this context probably not your own pset
context by just making sure we don't even allow for negative numbers at all so with this building block in place
what can we now do in terms of those same numbers to sort well it turns out there's a sorting algorithm called merge
sort and there's bunches of others too but merge sort is a nice one to discuss because it fundamentally we hope is
going to do better than selection sort and bubble sort that is better than n squared but the catch is it's a little harder
to think about in fact I'll act it out myself with just these numbers on the shelf here rather than humans because
recursion in general takes a little bit of effort to wrap your mind around typically a bit of practice but I'll see
if we can't walk through it methodically enough such that this comes to light so here is the pseudo code I propose for
this algorithm called merge sort in the spirit of recursion this sorting algorithm literally calls itself by
using the verb sort in its pseudo code so how does merge sort work it sort of obnoxiously says well if you want to
sort all of these things go sort the left half then go sort the right half and then merge the two together now
obnoxious in what sense well if I just asked you to sort something and you just tell me well go sort that thing and then
go sort that thing what was the point of asking you in the first place but the key is that each of these lines is
sorting a smaller piece of the problem so eventually we'll be able to pare this down into something that doesn't go on
forever because in fact in merge sort there's a base case too there's a scenario where we just check wait a minute
if there's only one number to sort that's it quit then because you're all done so there has to be this base case
in any use of recursion to make sure that you don't mindlessly call yourself forever you've got to stop at some point
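The draw function from earlier in this lecture is a small instance of that rule. The sketch below is reconstructed from the spoken walkthrough of recursion.c. The two checks mentioned there (n equal to 0, and n negative) are folded into a single n <= 0 test, which is an editorial simplification and may not match the code on screen. As before, main is omitted.

```c
#include <stdio.h>

// Recursively prints a left-aligned pyramid of height n.
void draw(int n)
{
    // Base case: nothing to draw for zero or negative heights,
    // so the function stops calling itself here
    if (n <= 0)
    {
        return;
    }

    // A pyramid of height n is a pyramid of height n - 1 ...
    draw(n - 1);

    // ... plus one more row of n bricks
    for (int i = 0; i < n; i++)
    {
        printf("#");
    }
    printf("\n");
}
```

Without the base case, every call would make another call. That is the situation that ended in a segmentation fault when only n equal to 0 was checked and a negative height was typed in.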
so let's focus on the third of these steps what does it mean to merge two lists uh two halves of a list just cuz
this is apparently going to be a key ingredient so here for instance are two halves of a list of size 8 we have
the numbers and I'll call them out if you're at a bad angle 2 4 5 7 and 0 1 3 6 notice that the left half at the
moment 2 4 5 7 is already sorted and the right half 0 1 3 6 is also sorted as well so that's a good thing because it means
that theoretically I've sorted the left half already I've sorted the right half already before we begin I just need to
merge these two halves what does it mean to merge two halves well for the sake of discussion I'm just going to turn over
most of the numbers except for the first numbers in each of these halves there's two halves here left and right at the
moment I'm only going to consider the leftmost element of each half that is the one on the left here and the one on
the left here how do I merge these two lists together well if I look at two and I look at zero which one should
presumably come first the smaller one so I'm going to grab the zero and I'm going to put it into its own place on this new
shelf here and now I'm going to consider as part of my iteration the beginning of this list and
the new beginning of this list so I'm now comparing two and one which one's smaller I'm going to go ahead and grab
the one now I'm going to compare the beginning of the left list and the new beginning of the right list two and
three of course it's two now I'm going to compare the beginning of the left list and the beginning of the right list
four and three it's of course three now I'm going to compare the four against the beginning and end it
turns out of the second list four of course now I'm going to compare the beginning of the left list and the
beginning of the right list five of course I'm realizing this is not going to end well because
I left too much distance between the numbers but that has nothing to do with the algorithm seven is the beginning of
the left list six is the beginning of the right list it's of course six and at the risk of knocking all of these over
if I now make room for this element we have hopefully sorted the whole thing by having merged
together the two halves of the list so in short thank you I'm a little worried that's just
getting sarcastic now but we now have merged two half lists we haven't done the guts of the
algorithm yet sort the left half and sort the right half but I claim that that is how mechanically you merge two
sorted halves you keep looking at the beginning of each list and you just kind of weave them together based on which
one belongs first based on its size so if you agree that that was a reasonable way to merge two lists together let's go
ahead and focus lastly on what it means to actually sort the left half and sort the right half of a whole bunch of
numbers and for this I'm going to go ahead and order them in this seemingly random order and I just have a little
cheat sheet above so that I don't mess up and I'm going to start at the very top this time and hopefully these
will not fall down at any point but I'm just deliberately putting them in this random order
5 2 7 4 and then we have 1 6 3 0 hopefully this won't fall over here is now an array of size eight with eight
integers and I want to sort this I could use selection sort and just go back and forth and back and forth I could use
bubble sort and just compare pairs pairs pairs but those are going to be on the order of Big O of n^2 my hope is to do
fundamentally better here so let's see if we can do better all right so let me look now at my code I'll keep it on the
screen how do I implement merge sort well if there's only one number I quit there's obviously not there's eight
numbers so that's not applicable I'm going to go ahead and sort the left half of numbers all right here's the left
half 5 2 7 4 how do I sort an array of size four well here's where the recursion kicks
in how do you sort a list of size four well there's the pseudo code on the board I sort the left half of the
list of size four so here we go I have a list of size four how do I sort it I sort the left half all right now I have
a list of size two how do I sort this well sort the left half so here we go here's a list of size one how do I
sort this I think it's done right that's quit right if only one number I'm done the
five is sorted all right what was the next step you have to now rewind in time I just sorted the left half of the left
half of the left half what do I now sort the right half which is two this is one Element so I'm done so now at this point
in the story I have sorted sort of idiotically the five is sorted and the two is sorted but what's the third and
final step of this phase of the algorithm merge the two together so here's the left here's the right list
how do I merge these together I compare the lists and I put the two there I only have the five left and I do that so now we
see some visible progress but again let's rewind how did we get here we started to sort the left half of the
left half of the left half then the right half and now where are we we've just sorted the left half of the left
half so what comes after sorting the left half of anything right half all right here's the sort of same
nonsensical thing here's a list of size two let's sort the left half done let's sort the right half done what's the
third step merge them together so that's the four and that's the seven what have I now done in total I've now sorted
the left half of the original thing so what happens next wait a minute wait a minute I have not done that what have I
done I have sorted the left half of the left half and I've sorted the right half of the left half what do I now need to
do lastly merge those two lists together so again I put my finger on the beginning of this list the beginning of
this list and if you want I'll do the same thing when I merged last time to be clear what I'm comparing two and four
the two obviously comes first what comes next well the four comes next what comes next the five comes next and then lastly
of course the seven notice that the two four 5 seven are now sorted so the original left half is sorted and now
I'll do the rest a little faster cuz my God this feels like it takes forever but I bet we're on to something here what
step remains next I've just sorted the left half of the original sort the right half of the original how do I sort this
I sort the left half of the right half how do I sort this I sort the left half of the left half done I sort the right
half of the left half done now I merge the two together the one comes first the six comes next now I sort the right half
of the right half what do I do sort the left half done sort the right half done what do I
do merge them together so that's the third step of that phase now where are we in the story oh my God where are we
in the story we have sorted the left half of the right half and the right half of the right half what comes next
merge so I'm going to compare and I'm going to move those down just to make clear what I'm comparing the beginning
of both sublists what comes first of course the zero what comes next the
one what comes next the three and then lastly comes the six all right where are we in the story we've now
sorted the left half of the original and the right half of the original what step remains merge all right so I'm going to
make the same point and this is actually literally what we did earlier because I deliberately demoed those original
numbers in this order two and a zero this comes out first what comes next two and one the one comes out next what
comes next the two comes next what comes next the three comes next what comes next the
four what comes after that the five what comes after that the six and lastly slight this is when we run out of
memory the seven over there is actually in place
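The shelf demonstration can be expressed compactly in code. What follows is a minimal sketch of merge sort under a few assumptions that are not from the lecture: the function names merge, sort_range, and merge_sort are hypothetical, ranges are half-open (lo included, hi excluded), and a second array allocated with malloc plays the role of the extra shelf.

```c
#include <stdlib.h>

// Merges the sorted halves a[lo..mid) and a[mid..hi), using tmp as spare space.
void merge(int a[], int tmp[], int lo, int mid, int hi)
{
    int i = lo;   // beginning of the left half
    int j = mid;  // beginning of the right half
    int k = lo;   // next free slot in the spare array

    // Keep comparing the beginnings of both halves and take the smaller one
    while (i < mid && j < hi)
    {
        if (a[i] <= a[j])
        {
            tmp[k++] = a[i++];
        }
        else
        {
            tmp[k++] = a[j++];
        }
    }

    // One half has run out, so copy whatever remains of the other
    while (i < mid)
    {
        tmp[k++] = a[i++];
    }
    while (j < hi)
    {
        tmp[k++] = a[j++];
    }

    // Move the merged run back into the original array
    for (k = lo; k < hi; k++)
    {
        a[k] = tmp[k];
    }
}

// Sorts a[lo..hi): sort the left half, sort the right half, merge the two.
void sort_range(int a[], int tmp[], int lo, int hi)
{
    // Base case: zero or one number is already sorted
    if (hi - lo <= 1)
    {
        return;
    }
    int mid = lo + (hi - lo) / 2;
    sort_range(a, tmp, lo, mid);
    sort_range(a, tmp, mid, hi);
    merge(a, tmp, lo, mid, hi);
}

// Sorts the n integers in a. Returns 0 on success, -1 if memory ran out.
int merge_sort(int a[], int n)
{
    if (n <= 1)
    {
        return 0;
    }
    int *tmp = malloc((size_t) n * sizeof(int));
    if (tmp == NULL)
    {
        return -1;
    }
    sort_range(a, tmp, 0, n);
    free(tmp);
    return 0;
}
```

Passing the array 5 2 7 4 1 6 3 0 with n equal to 8 to merge_sort leaves it as 0 1 2 3 4 5 6 7, the same result as on the shelf.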
okay okay so admittedly a little harder to explain and honestly gets a little trippy because it's so easy to forget
about like where you are in the story because we're constantly like diving into the algorithm and then backing back
out of it but in code we could express this pretty correctly and it turns out pretty efficiently because what I was
doing even though it's longer when I do it verbally I was touching these elements a minimal amount of times right
I wasn't going back and forth back and forth in front of the Shelf again and again I was deliberately only ever
merging the smallest elements in each list so every time we merged even though I was doing it quickly my fingers were
only touching each of the elements once and how many times did we divide divide divide in half the list well we
started with all of the elements here and there were eight of them and then we moved them one two three positions so
the height of this visualization if you will is actually log n right if I started with eight turns out if you do
the arithmetic this is log n height because 2 to the 3 is 8 but for now just trust that this is log n height and how wide is
the shelf well it's of width n because there's n elements anytime they were on the shelf so technically I was kind of
cheating this algorithm because this is the first time I've needed shelves right with the human examples we just had the
humans and that's it and only eight of them here I was sort of using more and more memory in fact I was using like
four times as much memory even though that was just for visualization sake merge sort actually requires that you
have some spare space an empty array to move the elements into when you're merging them together but if I really
wanted and if I didn't have this shelf or this shelf honestly I could have just gone back and forth between the two
shelves that would have been sufficient so merge sort uses more memory for this merging process but the advantage of
using more memory is that the total running time if you can perhaps infer from that math is what the Big O
notation for merge sort it turns out is actually going to be n times log n and even if you're a little rusty still on
your logarithms we saw in week zero and again today that log n is smaller than n right that's a good thing any binary
search was log n that's faster than linear search which was n so n * log n is of course smaller than n * n or N
squared so it's sort of lower on this little cheat sheet that I've been drawing which is to suggest that its
running time is indeed better or faster and in fact if we consider the best case running time turns out it's not quite as
good as bubble sort with Omega of n where you can just sort of abort if you realize wait a minute I've done no work
merge sort you actually have to do that work to get to the Finish Line anyway so it's actually in Omega and ultimately
Theta of n log n as well so again a trade-off there because if you happen to have a data set that is very often
sorted honestly you might want to stick with bubble sort but in the general case where the data is unsorted n log n is
sounding better than n squared well what does it actually look or feel like give me a moment to just change over to our
visualization here and we'll see with this example what merge sort looks like depicted with now these vertical bars so
same algorithm but instead of my numbers on shelves here is a random array of numbers being
sorted and you can see it being done half at a time and you see sort of remnants of the previous bars
actually that was unfair let me zoom out here let me zoom out so you can actually see the height here let me go ahead and
randomize this again and run merge sort there we go now you can see the second array and where the values are going
temporarily and even though this one looks way more cryptic visualization wise it does seem to be moving faster
and it seems to be merging halves together and boom it's done let's actually see in conclusion what these
algorithms compare to and consider that moving forward as we write more and more code the goal is again not just to be
correct but to be well designed and one measure of design is going to indeed be efficiency so here we have a final
visualization of three algorithms selection sort bubble sort and merge sort from top to bottom and let's see
what these algorithms might look or sound like here if we can dim the lights for dramatic
effect selection's on top bubble on bottom merge in the middle [Applause]
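The gap seen in that comparison can be given rough numbers using the reasoning from earlier: the list can be halved log n times, and each level touches all n elements. The sketch below only estimates orders of growth and is not an exact count of comparisons. The function names halvings, merge_sort_steps, and quadratic_steps are hypothetical.

```c
// Counts how many times n can be halved before reaching 1,
// which is log base 2 of n, rounded down.
int halvings(int n)
{
    int levels = 0;
    while (n > 1)
    {
        n /= 2;
        levels++;
    }
    return levels;
}

// Rough step count for merge sort: n elements touched at each of log n levels.
long merge_sort_steps(int n)
{
    return (long) n * halvings(n);
}

// Rough step count for selection sort and bubble sort: on the order of n times n.
long quadratic_steps(int n)
{
    return (long) n * n;
}
```

For the eight numbers on the shelf, halvings(8) is 3 because 2 to the 3 is 8, so the estimate is 8 times 3, or 24 steps, for merge sort against 8 times 8, or 64, for the other two.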
well this is CS50 and already this is
week four and recall that last week week three we began to explore like the inside of a computer's memory a bit more
we talked about arrays which were just chunks of memory back to back to back that really laid things out left to
right top to bottom and this is actually a pretty common Paradigm even if you're new to programming and certainly new to
C you've kind of seen this approach of just using memory in some way to lay things out like images for instance so
for instance here is a photo taken of last week's front row and this is kind of an
opportunity to explore exactly what happens if we start to zoom in and zoom in and zoom in because it seems like
most any TV show like CSI whatever or any movie that explores forensic information might have the
investigators sort of zoom in on an image like this to see what the glint in someone's eye is because that reveals
like the license plate number of someone that just drove past right something that's a little over the top there but
there's an opportunity here to speak to why that is so unrealistic for instance let's zoom in on this puppet's eye here and
let's zoom in a little more to see what might be reflected let's zoom in a little more and that's it there's only a
finite amount of information if you have an image represented in this way using pixels these dots on the screen as rows
and columns because if you're only using a finite amount of memory then at the end of the day you can only store a
finite amount of information and at least I don't really see in this grid here any glint of a license plate or
something like that that you might otherwise see in in Hollywood so today we'll explore sort of these kinds of
representations of how you might use memory in new and interesting ways to represent now very familiar things but
also start to explore what some of the limitations are of this representation but consider after all that this doesn't
need to be even as high resolution as many pixels as something like this other image you could imagine just doing
something silly with Post-it notes like this and if you think of an image as just having rows and columns these rows
otherwise known as scan lines something we'll explore in the coming week you could make this sort of fun smiley face
by just using two different values maybe a zero and a one or yellow and purple or vice versa just to kind of make
something come to life now in practice recall we talked about storing not just a zero or one but maybe an r a g and a b
value like 24 bits or three uh bytes in total but we'll come back to that that would just be a more involved image but
for fun today if you sort of want to tackle something passively in the background
we've put together an opportunity to do a bit of pixel art if you go to this URL here that will redirect
you to a Google spreadsheet if you have a laptop with you today that'll look a little something like this which we sort
of organized in rows and columns so if you'd like to go ahead and use Google spreadsheets colorization feature to
color in those individual squares if you'd like see if you can't make something a little creative and then
email it to Carter and we'll exhibit some of the uh the best or favorites on the website thereafter so let's
transition then to something a little more familiar images and not all of you have used presumably Photoshop but
you're probably generally familiar with Photoshop as a program for editing and creating uh images or photos or the like
and here is a screenshot of Photoshop's color picker via which you can change like what color you're going to draw
with the paintbrush or what color you're going to fill in with the paint buckets it's representative of any kind
of graphical tool and there's kind of a lot of information in here but there's perhaps some familiar terms now r g and
B in fact right now this is Photoshop's way of saying you're about to fill in your background or foreground
with the color black and that appears to be represented with an r a g and a b value of 0 0 0 or alternatively using a
hash symbol and then 000000 and if some of you have already made web pages before and you
know a little bit of HTML and CSS you probably are familiar with this kind of syntax like a hash symbol and then six
or sometimes three digits thereafter and if we look at a few different colors here for instance here might be the
representation of white now the r the G and the B values went way up from 0 to 255 255 255 or alternatively it looks
like Photoshop and in turn web browsers could represent that same color white with FFFFFF and let's just do a
few others here is red and it turns out that red is a whole lot of red 255 but no green no blue AKA
FF0000 so there's perhaps a pattern here emerging here is green 0 255 0 AKA 00FF00 or lastly here blue which is no red
no green but apparently a lot of blue 255 again AKA 0000FF now some of you might have seen this notation
before these zeros and these Fs and all of the numbers and letters in between but this is another form of notation and
in fact we'll explore this today really is just a precondition for talking about some other Concepts but the ideas
ultimately are really no different what we're about to see is a different base system uh not just binary not just
decimal but something we're about to call hexadecimal but first recall that with RGB we previously did the
following any RGB value red green blue just combines some amount of red or green or blue so here we have 72 73 33 which
in the context of an email or a text of course said what a couple weeks back just hi with an exclamation point but
in the context of a Photoshop like program this might instead be representing collectively this shade of
yellow for instance when you combine that much red that much green that much blue so here is the same idea if you've
got a lot of red no green no blue together that's going to give us red if you've got no red a lot of green no blue
that's going to give us of course green if you've got no red no green a lot of blue that of course is going to give us
blue so there's a pattern emerging here where apparently 00 is none as always and FF is apparently a lot and it's
maybe somehow equated with 255 at least per that photoshop screenshot meanwhile if we combine one last one a lot of red
a lot of green a lot of blue that's actually going to give us a single white pixel like this all right so think back
here was binary in the world of binary you had just two digits zero and one could have been anything else A or B X
or Y but the world standardized on these numerals 0 and 1 in our world's decimal system of course you have 0
through 9 as of today though we're going to start using hexadecimal sometimes in the context of images and also files
just because it's a convention and there's some conveniences to it where now you're going to be able to count up
to F in a notation called hexadecimal from 0 through 9 then you keep going to A to B to C to D to E to F the idea being each
of these even though it's weirdly a letter of the English alphabet it's still just a single symbol it's not one
zero for 10 or one one for 11 all 16 of these values these digits so to speak are indeed still just single symbols and
that's a characteristic of just using this other notational system so how do we get from 00 and FF to something
like 0 and 255 respectively well this hexadecimal system AKA base 16 just does the math from week zero and really grade
school a little bit differently for instance if you have a number that's got two digits or hexadecimal digits as of
today the columns are just a little different instead of powers of two or powers of 10 which we saw for binary and
decimal respectively it's powers of 16 so if we just kind of do the math out that's the ones column this is the 16's
column and so forth things get actually pretty big pretty quickly in this system but now let's just consider how we would
represent familiar numbers if you've got two hexadecimal digits for which these hashes are just placeholders 00 is going to
mathematically equal the decimal number you and I know of course as zero why same thing as week 0 16 * 0 + 1 * 0 is
the number you and I know as zero and we can count up from here this in hexadecimal would be how a computer represents the
number we know as one it would be 01 in this case this would be 2 3 4 5 6 7 8 9 in decimal we're about to go to 10 but
in hexadecimal to be clear what comes next so apparently A so 0A 0B which is now 10 or 11 or 12 13 14 15 so using hexadecimal
is just kind of an interesting way of using single symbols now 0 through F to count from zero through 15 and we'll see
why it's 15 in a moment but as soon as we get to F anyone want to conjecture how in hexadecimal AKA hex do we now
count up one position higher what comes after 0F in hexadecimal so one zero it's the same kind of
thing like once you're at the highest digit possible F or in our decimal world it would have been nine you sort of add
one more and nine wraps around to zero or in this case f wraps around to zero you carry the one and voila now we're
representing the number you and I know is 16 and we could keep going forever literally this could be 17 18 19 20 in
decimal but let's just wave our hands at it and count as high as we can dot dot dot the highest we could count in hexadecimal
with two digits just logically would be what in hexadecimal something something FF I heard so yes that's the
biggest digit possible so FF is what we have so how high can you count in hexadecimal if you've got just two of these
digits well it's the same math as always 16 * F AKA 15 so that's 16 * 15 + 1 * F or 1 * 15 that gives us 240 plus 15 in
decimal the result of which of course now is 255 so this hexadecimal system you may have seen
in the world of web pages and if you haven't we'll get to that in this class in a few weeks or we just saw in the
context of Photoshop just has this sort of shorthand notation of counting as high as 255 but just calling it FF now
it's marginal but that's like 33% savings of how many digits you need in order to count as high as 255 because in
decimal of course 255 is three digits in hexadecimal you can count as high using just two and that difference is going to get
magnified the bigger our numbers get let me stipulate for now you're going to get more and more savings in terms of just
how many symbols you need on the screen to represent bigger and bigger numbers than that all right let me pause here
just to see if there's any questions thus far on what we've called hexadecimal which again just gives us 0 through 9 as
well as a through F any questions or confusion and if it feels like we're
lingering a bit much on arithmetic we're not really going to see other notations besides this moving forward these are
sort of the go-to three in a programmer's world typically but there are some others
yeah good question does hexadecimal require more storage or less storage than a decimal system theoretically no because
this is just a way of representing information and we'll see in a concrete example in a moment but inside of the
computer at the end of the day you're still storing bits and using hexadecimal is not using more or
fewer bits think of this as how you might write it down on a piece of paper just how many digits you're going to
write or on a computer screen how many digits you're going to see at once but it doesn't change how the
computer is representing information because all they're representing at the end of the day is zeros and ones so in
fact let's go there if this a moment ago FF I claim was 255 let's just rewind to week zero and if we wanted to count to
255 in binary that's as high as you can count recall with eight bits and there's only a few of these numbers that are
useful to memorize like 255 is as high as you can count with eight bits if you start at zero because 2 to the 8th is 256 but
if you start at 0 it's zero through 255 so in binary recall if you have eight bits all of which were ones and I won't
do out the math pedantically here but if I do do this plus this plus this dot dot dot that's also going to give me
255 so this is what's interesting here about hexadecimal it turns out that an upside of storing values in hexadecimal is
that we're going to see the first F sort of represents the left half of all these bits and the second F in this case
represents the rightmost four of these bits so it turns out hexadecimal is very useful when you want to treat data in
units of four it's not quite eight but units of four and that's not bad which is why if you use two digits like I have
thus far 0 0 or FF or anything in between that's actually a convenient way of representing eight bits in total one
hex digit for the first four bits one hex digit for the second and again there's nothing new intellectually here per se
it's just a different way of representing the same story as before zeros and ones so in what context do we
see this well we talked about memory last week we're going to talk more about it this week if this is my computer's
RAM random access memory you can again think of each byte as having a number associated with it like its address or
location this might be zero this might be like two billion and so in the past I described these just using
decimal numbers here's byte 0 1 2 3 4 5 6 7 15 16 would be here and so forth but it turns out in the world of memory and
thus today programming people tend to count memory bytes using hexadecimal partly just by convention but also partly
because it's a little more succinct and again each digit represents four bits typically so what comes after F here
well if I think about the computer's memory I normally might do after F which is 15 16 but instead one-zero one-one one-two one-three
this is not ten eleven twelve thirteen because I claim I'm in the context of hexadecimal now as per the previous slide we already started
going into As through Fs so you immediately see here a possible problem like why is this now worrisome if all of
a sudden you're seeing seemingly familiar numbers like 10 11 12 13 we didn't really stumble
across this problem when it was like all zeros and ones before yeah yeah so if you're reading some code
in C that's doing some math you might accidentally or the computer might accidentally confuse hexadecimal with
decimal if they look in some context the same I mean any number on the board that doesn't have a letter is ambiguously
hexadecimal or decimal at this point and so how might we resolve this well it turns out that what computers typically do is
this by convention anytime you see visually 0x and then a number that's a human convention of signaling to
the reader that this is in fact a hexadecimal number so if it's 0x10 that is not the number 10 that is
the hexadecimal number one-zero which recall we said earlier is how you count up to 16 and again
these are not the kinds of things to memorize it's really just the system for how you think about these things so
henceforth today we're going to start seeing hexadecimal in a bunch of contexts when you write code you might even write
code using some hexadecimal but again it's just a different way of representing numbers and humans have different
conventions for different contexts all right so with that said any questions now on this building block but here on
out we'll start using it in some actual code any questions nothing so far all right so let's go ahead and consider
maybe a familiar example something involving code where I initialize a variable like n to a value like 50 in
this case and then let's start to Tinker around with what's going on inside of the computer's memory in a moment I'm
going to load up VS Code on my computer and I'm going to go ahead and whip up a program that very simply assigns a value
like the number 50 to a variable called n but today keep in mind that that variable n and that value 50 is going to
be stored somewhere in my computer's memory and it turns out today we'll introduce a bit more syntax you can
actually see where things are being stored so let me click over to VS Code here I'm going to create a program
called address.c just to explore computers' addresses today and I'm going to do an include stdio.h int main
void as usual no command line arguments for now I'm going to declare that variable n equals 50 and then I'm just
going to go ahead and print it out so nothing very interesting but I'll use %i \n and then comma n to
print out that value nothing here should be very interesting to compile or run but I'll do it just to make sure I
didn't make any mistakes looks like as expected it simply prints out the number 50 like this but let's consider
then what this code is doing underneath the hood when it's actually run on your machine so here we have that grid of
memory that variable n is an int and if you think back how many bytes typically do we use for an int yeah four so
four bytes or 32 bits so if each of these squares represents one byte then my computer somewhere in my memory or
RAM is using four of these squares maybe it ends up over here just because there's other stuff being used elsewhere
for instance though I don't really know and frankly I don't really care where it ends up just that it ends up somewhere
so the variable the value 50 is stored here in a variable called n even though I've written it as decimal just like in
my code let me again remind that this is 32 zeros and ones representing that 50 it's just going to be very tedious if we
start writing everything in binary so I'll use the more comfortable human Decimal System so that's what's going on
inside of the computer's memory so what if I actually wanted to start tinkering with its location or maybe just knowing
its location well this variable n indeed has a name n that's a label of sorts for it but at the end of the day that 50 is
technically at a specific address and I'm going to make one up 0x123 and it's 123 because I really
don't care what it is I just want an address for the sake of discussion so way over here off screen might be byte
zero way down here is byte 0x123 it's in hexadecimal notation just by convention so how can I actually see where my
variables are ending up in memory if I'm curious to do so well let me go back to my code here and let me actually change
this just a little bit let me go ahead and introduce for instance another symbol here and another topic
altogether namely pointers so a pointer is a variable that stores the address of some value the location of some value or
more specifically the specific byte in which that value is stored so again if you think of your memory as being a
whole bunch of bytes zero at top left two billion or whatever at bottom right depending on how much RAM you have each
of those things has a location or an address a pointer is just a variable storing one such address so it turns out
that in the world of C there's a couple of new symbols we can use if we want to see what it is we're talking about here
and those two operators as of today are these you can use the ampersand operator in C in a couple of ways we
already saw it very briefly to do ampersand ampersand to kind of and two Boolean expressions together in the
context of a conditional this is different a single ampersand is the address-of operator so literally in your
code if you've got a variable like n or anything else and you write &n C is going to figure out for you what is
the address of that variable n in the computer's memory and it's going to give you a
number otherwise known as the address of that if you want to store that address in a variable even though
yes it's a number like 0x123 you have to tell C in advance that you want to store not an int per se
but the address of an int and the syntax for doing that somewhat non-obviously is to use an asterisk here a star
operator and you say this when creating the variable if you want p to be a pointer that is the address of some
other variable you do int *p and the star just tells the computer this is not an integer per se this is the address of
something that yes is an int but we're just being more precise so on the right hand side you have the address-of
operator as always with the equal sign you copy from right to left because &n is by definition the address
of something you have to store it in a pointer and the way to declare a pointer is to specify the type of value whose
address you're storing and then use the star to indicate that this is indeed a pointer and not just a regular old
int so let's see this in practice let me go back to my own source code here and let me make just a couple of tweaks I'm
going to leave n alone here but I'm going to go ahead and initially just do this let me say int *p equals
&n and then down here I'm going to print out not n this time but p the variable p and then even though yes it's
just a number and therefore I kind of sort of could use %i for integers there's actually a special format code
in printf for printing pointers or addresses and that's %p so now let's go ahead and recompile this make
address so far so good ./address enter and a little weirdly but perhaps understandably now the address in my
computer's memory at which the variable n happened to be stored was not quite as simple as 0x123 this computer
has a lot more memory so technically it was stored at 0x7ffcb4578e5c now that has no special
significance to me it could have ended up somewhere else altogether but this is just where in my computer or technically
the cloud server to which I'm connected using VS Code here that just happens to be where n ended up and strictly
speaking I don't even need to introduce this variable I could get rid of p and I could just say print not just n but the
address of n and achieve the same thing you don't need to temporarily store it in a variable let me just do make address
again ./address and now I see this address here and notice if I keep running the program it's actually moving
around there's other stuff presumably going on inside of the computer maybe it's actually randomizing it so it's not
always at the same location that can actually be a security feature underneath the hood but this happens to
be at that moment in time where that value is in memory quite like our picture a moment ago all right so let me
pause here to see if there's now any questions on what we just did yeah really good question is there any
way to control where something is in memory short answer is yes and this is both the power and the danger of C and
we're going to do this today and make a few deliberate mistakes because with this power of going to or getting the
address of any variable I could just arbitrarily right now write code that stores a value at like byte two billion
or zero or anything in between but that also means potentially I could start kind of creepily looking around at all
of the computer's memory even at things that I didn't put there maybe other programs maybe other parts of programs
and indeed this is a potential security threat if suddenly you're able to just look anywhere you want in the computer's
memory now I'm overselling it a little bit because nowadays in this decade there are some defenses in place in
compilers and in our operating systems that do hedge against this a little bit but this is still a very frequent source
of problems and later today we'll talk briefly about things called stack overflow which is not just a website it
is a problem that you can encounter heap overflow and more generally buffer overflows there's just so many things
that can go wrong using this language called C and if any of you have encountered a segmentation
fault yet I think we saw a few hands for that already you touched memory that you shouldn't have and odds are you did it
most recently by going too far in an array going to the left or negative in an array or somehow looking at memory
you shouldn't have and we'll explain today why it is you were able to do that other questions on these Primitives so
far yeah from Carter good question earlier we used star P let me rewind in time to the
previous version of this code where I actually had a variable called P just like with variable declarations in the
past once you've declared a variable to be an int a char a bool or an int * AKA a pointer you don't thereafter keep
using the word int or now the star once you've declared it that's it you only refer to it by name and so it's very
deliberate what I did here saying that the type here is int * that is a pointer to an int but here I
just said the name of the variable as always I didn't repeat int and I also didn't repeat star but at the risk of
kind of bending one's mind a little bit there is unfortunately one other use for the star operator and that's as follows
if you want to print out not the address of something but what is at a specific address you can actually do this if I
want to print out the integer via %i that is at that address I can actually use the star here which
technically contradicts what I just said but it has a different function here different purpose so let me go ahead and
do this in two different ways I'm going to leave this line of code as is but I'm going to add another line of code now
that prints out what apparently will be an integer in a moment so %i \n and I could cheat and let me just do
n for now so there's really nothing special happening now I'm just adding a sort of mindless printing of n so make
address ./address there's the current address of n and there's the value of n but what's kind of cool about
C here too is if you know that a value is at a specific address like p there's one other use for this star operator the
asterisk you can use it as the so-called dereference operator which means go to that address and so here what we
actually have is an example of a pointer p which is an address like 0x123 or 0x7ff and so forth but if you say *p
now you're not redeclaring the variable because I didn't mention int you're going to that address in p so let me
recompile this now make address ./address and just to be clear what should I see I'm first going
to see the pointer itself 0x something what's the second line of output I should presumably see
now just shout a little louder so I'm hearing 50 and that's true because if you figure out the address of n and
print it in line seven but then go to the address of n AKA p that's indeed going to just show you the number n the
value of n again all right any questions now on this syntax and I will concede I think this is confusing the fact that we
use the star for multiplication the fact that we use the star to declare a pointer but then we use a star in a
third way to dereference the pointer and go to the pointer it's just confusing honestly but with practice comes comfort
yeah good question when you are using the ampersand operator to get the address of something the onus is
on you at the moment to know what you are getting the address of is it a string is it a char is it a bool is it
an int I wrote this code so I know in line six that I'm trying to get the address of what is an
integer in line eight you don't have to worry about that good question notice in line eight I didn't tell the computer other
than the %i what kind of address I'm going to but I did already in line six I told the compiler that p now and
forever is going to be the address of an int that's enough information in advance so that printf or really the language C
still knows on line eight that p is a pointer to an int and that way it will print out all four bytes at that address
not just part of it and not more than those four bytes good question yeah next to
you do pointers have pointers yes we won't sort of do this today by having pointers to pointers but yes you can use
star star and then things get in I'm sorry we won't do that today and we won't do that often in fact Python
another language is just a couple of weeks away so hang in there almost there question back here was there that was
amazing good more verbal feedback like that is helpful as we forge into the more complicated stuff other questions
yeah what's the point of printing the address like sure what's the point of doing this
if you don't mind let me let's get there in a moment this is not the common use case just printing out the address like
who really cares um at the moment we care only for the sake of discussion we're soon going to start using these
addresses so hang in there just a little bit for that one too but it will uh solve some problems for us before long
so let's actually just now depict what was going on inside of the computer's memory just a moment ago so if I toggle
back here let me redraw my computer's memory let me plop into the memory n which is storing in this program the
number 50 where is p in my computer's memory specifically I don't know and apparently it moves around each time I
run the program so for the sake of discussion let's just propose that if 50 ended up at address ox123 I don't know P
ends up over here at address whoops uh at whatever address this is here but notice a couple of Curiosities now if p
is a pointer it's the address of something so the value in P should be an address and I've indeed written it as
such 0x123 and technically there's not an x there there's not a zero there there's not even a 123 there per
se there's a pattern of bits that represents the address 0x123 but again that's week zero don't care about binary
day to day so if this is p and this I claimed was n why is p so much bigger can someone conjecture here because it
turns out whether n is an int or a char or a bool which are different types heck even a long it turns out that p is always
going to take up eight squares on the board but why might that be What might explain
that yeah thoughts okay fair maybe it's allocating eight bytes because it doesn't know the
type turns out that's okay because an address is an address it's really up to the programmer to use it as a string or
bool other thoughts okay possibly it could be that pointers have some complexity like a
backslash n or something curious like that like we talked about for strings turns out that's not the case it turns
out that pointers nowadays typically but not always are eight bytes AKA 64 bits because you and I our Macs our PCs heck
even our phones have a lot more memory than they did years ago back in the day a pointer might have only been 32 bits
or even only eight bits way back in the day but consider 32 bits because that was the norm for some time how high can
you count roughly if you've got 32 bits what's the number we keep rattling off 32 bits is
roughly 2 to the 32 so it's 4 billion and I keep saying it's two billion if you do negative but in the world of
memory there's a reason I keep saying 2 billion bytes 2 gigabytes because for a very long time that was the maximum
amount of memory a computer could have why because the pointers that the computers were using were only for
instance 32 bits and with 32 bits depending on whether you allow for negatives or not you can count as high
as 2 billion roughly or maybe 4 billion but you know what your Mac your PC your phone could not have had 5 gigabytes of
memory or 5 billion bytes of memory you certainly couldn't have had what computers nowadays come with which might
be 8 gigabytes of memory why because with four bytes or 32 bits you literally physically can't count
that high which means if I drew a picture of all of the memory we would run out of numbers to describe them
which means like most of my memory would just be unusable so pointers nowadays are 64 bits or eight bytes that's really big
I can't even pronounce how big that number is but it's plenty for the next many years and so we've drawn it that
way on the board here now let's just kind of abstract this away let's get rid of all the other bytes that are storing
something or nothing else and let's now start to abstract away this complexity because the reality is to
your question earlier you know what is this useful for do we actually care about these addresses
generally no we're doing this so that you see there's no magic we're just moving things around and poking around
in memory but what a person would typically do when talking about pointers would literally be to just point at
something like I really don't care what address n is at so it suffices in general when drawing pictures on a
whiteboard having a discussion with another programmer you just kind of draw an arrow from the pointer to the value
in question because neither you nor I probably care about the specifics of 0x whatever and there's your pointer it's
literally an arrow and we can kind of see this so it turns out that these pointers these addresses are not that
dissimilar to what we've done for hundreds of years in the form of a postal system for instance here is a
post office here no here is a mailbox and suppose that this is a mailbox labeled p it's a pointer and suppose
there's another mailbox like way over there which is just another byte of my computer's memory what are we really
talking about well you store in a computer's memory values like the number 50 or the word like hi inside of your
computer's memory at some location but today we can also use those same memory locations to store the address of things
for instance if I open this up here and I see okay the value inside of this mailbox is not a number like 50 it's
actually an address 0x123 that's kind of like a pointer sort of a breadcrumb leading from one location in memory to
another and in fact with someone who's seated roughly over there do you mind getting the mail over there any
volunteers over in this section just need you to get to the mailbox before I do who's who's being volunteered oh yes
please whoever is gesturing most wildly come on down sure what's your name say again Anu
okay come on up to the edge of the stage there and just to be clear if this is p that is apparently n but to make clear
what we're talking about when we're storing 0x whatever values like 0x123 that's essentially equivalent to my you
know maybe pulling out something like this and just abstractly pointing to your mailbox there or if you prefer
pointing to the mail okay all right oh thank you all right this is akin to me pointing at
your mailbox and if you want to go ahead and open your mailbox and reveal to the crowd what's inside your mailbox labeled
n all right thank you we have a little CS50 stress ball
for your trouble thank you for coming up so that's just to put a visual on what it is we're talking about because it can
get very abstract very cryptic quickly when we're talking about addresses and memory and drawing it like these little
squares but if you think about just walking into a post office or an apartment complex that's got a lot of
mailboxes those mailboxes essentially are a big chunk of memory and each of those mailboxes has an address this is
apartment 123 apartment 2 billion and inside of those mailboxes can go anything that can be represented as
information it could be a number like n or 50 or if you prefer it could be a number that represents the address of
another mailbox and this is akin really if you've ever had an apartment or you and your parents have moved to
having a forwarding address it's like having the post office in the US put some kind of piece of paper in your old
mailbox saying actually forward it to that other mailbox that really is all a pointer is doing at the end of the day
it's just a number but it's a number being used in a different way and it's the syntax that we've introduced not
just int but int * that tells the computer how to treat that number in this slightly different way all right
any questions then on this yeah in back if I did int c and set it can you
say the code again once more equal to n so let me actually type it out if I give myself
another line of code tell me one last time what to type int c is equal to n like this so this is
okay and I can't draw it quite quickly enough on the board here but this would be like creating another four bytes
somewhere in memory maybe down here that stores an identical copy of 50 because the assignment operator from right to
left copies one value to another so that would just add one more rectangle of size four to this particular
picture if I'm answering your question as intended okay so that is sort of week one style use of assignment
operators before pointers I could though start copying pointers but again we'll come back to some of that complexity any
other questions here that was a great question ah good question short answer no to repeat for the camera if I create
a second variable like this int c equals n and I claim without actually drawing it on the board that this gives me
another rectangle the value of which is also 50 p does not get touched and this is what's important and really
characteristic of C nothing happens automatically for you like no p is not going to be updated unless you update p
in some way so creating a third variable called C even if you're copying its value from right to left that has no
effect on anything else in the program a good question so what have we seen that's perhaps now a little more
explainable well recall that we talked quite a bit last week about strings and just to recap in Lay person's terms like
what is a string as you now understand it say let me take a specific hand here what's a string how about uh over here
aray okay sure both of you are right an array of characters an array of characters and we I claimed or revealed
last week that string is not technically a feature built into C like it's not an official data type but every programmer
in most any language refers to sequences of characters Words letters paragraphs as RS so the vernacular exists but the
data type doesn't typically exist per se in C so what we're about to do if you will for dramatic effect is take off
some training wheels today the CS50 library implemented in the form of the header file cs50.h we claim has had a
bunch of things in it prototypes for get_string prototypes for get_int and all of those other functions but it turns out
it also is what defines the word string in such a way that you all can use it these past several weeks so let's take a
look at an example of a string in use here for instance is a tiny bit of code that uses the word string creating
a variable called s and then storing quote unquote HI! let's consider what this looks like now
in the computer's memory I don't care about all the other bytes let's just focus on these and this per last week is
how HI! might be stored H I exclamation point and then one more as someone already observed that sentinel value
that null character which just means eight zero bits to demarcate the end of that string just in case there's
something to the right of it the computer can now distinguish one string from another so last week we introduced
this new syntax well if strings are just arrays of characters you can then very cleverly use that square bracket
notation and go to location zero or one or two which are kind of sort of like addresses but they're relative to the
string right this could be at 0x123 or 0x456 but with this bracket notation zero is always the beginning of the
string one is the next two is the next and so forth so that was our array syntax for indexing into an array but
technically speaking we can go a little deeper today technically speaking if
HI! is starting at the address 0x123 then it stands to reason that I is at 0x124 exclamation point's at 0x125 and
the null is at 0x126 now I don't care about 0x123 per se but even though this is hexadecimal this is correct
math even in hex if you just add one when you start at 0x123 the next number is 4 5 6 at the end I don't have to
worry about As Bs and Cs because I'm not counting that high in this example so if that's the case and my computer is
actually laying out the word HI! in memory like that well what exactly is s right what exactly is s if at the end of
the day H I exclamation point null are stored at these addresses like where is s like now that I've kind
of taken off those training wheels and showed you where H I exclamation point null actually are what happened to s
well s as always is actually a variable right even in the code I proposed a moment ago s is apparently of a data type
that yes doesn't come with C but CS50's library makes it exist s is a variable of type string so where is s in this
picture well it turns out that s might be up here again I'm just drawing it anywhere for the sake of discussion
but s is a variable per that line of code what s is storing apparently I claim is
0x123 I actually don't really care about these addresses so let's abstract that away s is apparently as of now today one
week later just a pointer to a character specifically the first character in s and this is kind of the last piece of
the puzzle last week we had this clever way of demarcating the end of a string well it turns out that strings are
represented in the computer's memory as a variable that is a pointer inside of which is the address
of the first character in the string so if s is got points at the first character and you can trust that back
sl0 is at the end of the string that's literally all you need to figure out where a string begins and ends so what
do I mean by this well let's be a little more Concrete in terms of this picture if I've started with this line of code
here it turns out all this time since week one that the word string has just semi secretly been an alias
for Char star I know so Char star so why does this make sense it's a little weird
still but if in our previous example we were able to store the address of an integer by declaring a variable called p
as int star p well if as of now strings are just the address of the first character in a string then probably a
string is just a Char star because that means s is the address of a character the very first character in the string
now the string might have three letters like it did or four or even 100 if it's a long paragraph but that's fine because
you can trust that there's going to be that null character at the very end so this is a general purpose way of
representing strings using this new mechanism in C so in fact let me go ahead here and introduce maybe a couple
of manipulations of this let me go back to my code here and let's get rid of this integer stuff and let's instead Now
do for instance this let me add in the cs50 library so we include cs50.h for now I'm going to go ahead and inside of
main give myself a string s equals quote unquote HI! I don't type the backslash zero C does that for me
automatically uh by using my double quotes like this now let me just go ahead and print it so this again is sort
of week one style stuff where I'm just printing a string no pointers yet so let me do make address enter
dot slash address and hopefully I see HI! so nothing new there but let's start to peel back some of these layers here let
me first of all get rid of the cs50 library for a moment and let me change string to char star and it's a little
bit weird but yes the convention is to say Char a space then the star and then immediately thereafter the name of the
variable strictly speaking though you might see textbooks or websites that do it like this or like this but the
canonical way is typically to do it like that so now no more cs50 Library no more training wheels if you will I'm just
treating strings for what they really are let me go ahead and do make address enter so far so good dot slash address and that
too still works so percent s is a thing that comes with printf because the word string is programmer terminology but
strictly speaking C doesn't have a string data type it's always been Char star so what this means now is I can
start to kind of have some fun with these basic ideas even though this is not purposeful other than for the sake
of discussion but if s is this let me go back and give myself the cs50 library let's put those training wheels back on
for just a moment so that I can do one manipulation at a time here's my string S as before well let me go ahead and
declare a Char called C and let me store the first character in the string there which is s bracket zero and that should
give me H and then Just for kicks let me go ahead and do Char star whoops let me go ahead and do Char star P equals
Ampersand C and see what this actually prints for me let me go ahead and print out what p is here so we're just playing
around so make address so far so good dot slash address all right so what have I just done I've just created a char c and
stored in it the letter H which is the same thing as s bracket zero then I'm saying what's the address of c and that's
apparently 0x7ff whatever so that's the address but I technically didn't have to do that let me go ahead and do two
things now instead of just printing P let me go ahead and print out maybe s itself let me go ahead and do make
address enter so far so good slash address and damn it what did I do oh shoot I didn't want to do that oh
I really made a mess of this um what did I want to do here that was supposed to be impressive
but it was the opposite so let me turn it around so if I intended to do this why are lines nine and ten printing
different values didn't really intend to go here but let me try to save this why are we seeing different addresses namely
this address 40204 for s and then 0x7ff for p any thoughts yeah over
here correct so if I really wanted to weasel my way out of this this is a great answer to the previous question
which was about what if I introduce another variable C that's a copy of the value and not in this case an INT but an
actual char here I've made c be a copy of the char that's at the beginning of s but that's
indeed a copy so if I were to draw it on the screen that would give me a different rectangle in which this copy
of H would actually be stored so I didn't intend to do this but what you're seeing is yes the address of s and
apparently that's at a pretty low address by default here then you're seeing the address of C but even though
each of them is H I claim one is at a different address in memory and this is always been happening anytime you
created one variable or another it was ending up here or here or here or somewhere else in memory now for the
first time all we're doing is actually just poking around the computer's memory to see what is actually there so let me
actually back this up a little bit and do what I intended to do here which was something like this so if string s
equals quote unquote HI! let's go ahead and give myself a pointer called p to the first character in
s all right so now let me go ahead and print out the value of this pointer percent P printing out P so we're just
going to do one thing at a time so make address enter dot slash address there at the moment is the address of the first
character in s what I meant to do now was this if I want to print out two things this time let me print out not
only what p is but also what s itself originally is because if I claim that everyone from last week should be
comfortable with s bracket zero just representing the first character in s by definition of strings being arrays of
characters then S as of today is itself the address of a character the first one in s so if I now do make address and do
dot slash address this time I see the same exact things thank you this is really like the lamest sort
of thing to be applauding over but what we're demonstrating here is that s is by definition the address of the first
character in s so if we borrow some of our mental model from last week well if s bracket zero is the first character in s
doing the ampersand on that expression should be the same as s now this isn't to say that we would kind of jump
through these hoops all the time with this much syntax but this is just to do proof by example that s is in fact as I
claimed a moment ago just the address of a character not even multiple characters it's the address of a single character
but the key thing is it's the address of the first character in the string and per last week we trust that c is going
to look for that null character at the very end just to make sure it knows where the string actually ends all right
a question came up over here correct to summarize on line 8 when I am using percent P that just means print a
pointer value so 0x something I'm passing it s previously when we used percent s printf knew to print not just
the first character of s but h i exclamation point and then stop when it hits the back slash zero percent p is
different percent P tells the computer to go to that address sorry tells the computer to print that address on the
screen so this is where percent s all this time has been powerful the reason print F
worked in week one and two and three was because print F was designed by some human years ago to go to the address
that's being passed in for instance s and print out character after character after character until it sees the null
character backslash zero and then stop printing it so you're getting a lot of functionality sort of for free from
percent s today we're using something much simpler percent P which just literally prints what s is and the
reason we don't do this in week one is just because this is like way too much to be interesting when all you want to
print out is hi or hello world or the like but now what we're really doing is revealing what's been going on this
whole time and let me make one other example here let me go ahead and get rid of this variable here and let me just
print out a few things to make the same point I'm going to print out not just s like I did here but let's go ahead and
print out the address of every character in s so let's get the first letter in s and get its address and I'm
going to do copy paste for time sake but not something I would do frequently so let me print out the address of the
first character the second character the third and actually even the fourth which is the backslash zero by doing this so when I
compile this program make address dot slash address I should see two identical values and then additional
values that are one byte away in my diagram a moment ago my addresses were arbitrarily 0x123 0x124 0x125 0x126 now it
starts by chance at an address ending in 04 which is s and the second value is the same thing as s because I'm
just saying go to the first character and then get its address those are one and the same now and then after that are
the addresses ending in 05 06 and 07 because that is just like the diagram go to the I to the exclamation point and to the null
character so all I'm doing now is using my newfound understanding of what ampersand does and what the star does I'm
just kind of playing around I'm poking around in the computer's memory just to demonstrate there's no magic it's all
there very deliberately because I or print F or someone else put it there yeah really good observation so it's
indeed the case that HI! unlike 50 is kind of ending up at a very low address not the 0x7ff wherever it was
that's actually because long story short strings are often stored in a different part of the computer's memory more on
that later today for efficiency there's actually only going to be one copy of HI! and
the computer is going to tuck it at sort of the beginning of my memory but other values like ints and floats and the like
they end up elsewhere in memory by convention at those higher addresses but a good observation because that is consistent here all
right so a couple final details then on what's been going on here let me go ahead and claim that we implemented Char
star or rather string as a Char Star as follows as of last week we were writing this code as of this week we can now
start writing this code because string specifically we invented in the cs50 library but it turns out you've seen a
way of inventing your own data types recall this thing here we played around last time with data structures or the
struct keyword in C and briefly the typedef keyword which defines a type for you and if I highlight what's interesting
here the way we invented a person data type last time was to define a person as having two variables inside of
it a structure that encapsulates a name and encapsulates a number now even though the syntax is a little different
today because of the star thing notice that this could be a similar application of that idea if I want to create a type
called string highlighted in yellow here then I use typedef to make it defined to be char star so this is literally all
that has ever been in cs50.h in addition to those prototypes of functions we've talked about typedef char star string is a
one line of code that brings the word string as a data type into existence and that's all that's ever been there but the star
the Char star it's just too much in week one we wait until this point to sort of Peel back that layer are any questions
then on what a string is what star or the Ampersand are doing yeah oh my God massive spoiler but yes if
the question is is that why when you compare two strings as I briefly did or almost did problems arise and in fact
yes last week we used str compare strcmp for a very deliberate reason because yes the spoiler is I
accidentally would have compared two addresses in memory not the strings at those
addresses other questions here now are well before we give ourselves maybe a 10-minute break here
um we have lots of pieces of paper if anyone wants to come on up and play with this big stack of Post-its if you want to
make your own 8 by 8 grid of something to share with the class if you're artistically inclined come on up
otherwise let's take 10 minutes and we'll return after 10 all right so let's come back to this question of how we can
start to use these pointers and these addresses ultimately in an interesting way the goal ultimately next week is
going to be to use these addresses to really stitch together more complicated data structures than just persons like
last week or candidates in the context of like an electoral algorithm if you will and actually really use our memory
in the most versatile way to represent not just images but maybe videos and other two-dimensional structures as well
but for now let's come back to this address example widle it down to just a high initially and see what's going on
again here underneath the hood so let me read the cs-50 library just so we use our synonym for a moment that is the
word string and I'll redefine S as a string ring and what I didn't mention before is that these double quotes that
you've been using for some time are actually a little special the double quotes are a clue to the compiler that
what is between them is in fact a string as we now know it which means the compiler will do all the work of
figuring out where to put the H the I the exclamation point and even adding for you automatically a backslash zero and
what the compiler will do for you too is figure out what address all four of those chars ended up at and store it for
you in the variable s so that's why it just kind of happens with strings without using ampersands or even stars
explicitly but the star at least has been there because again string is just synonymous now with Char star it's not
really as readable but it is now the same idea so I'll leave string in place just to do something sort of week one
style here for a moment and let's go ahead and print out a few characters so I'm going to use percent C this time and
I'm going to print out s bracket 0 and then I'm going to print out s bracket one and S bracket
2 literally doing sort of week uh three style from last week a printing of every character in s as though it were an
array so dot slash address should give me H I exclamation point and if I really want to get curious technically speaking I
could print out one more location and let me go ahead and recompile make address dot slash address and there is it would
seem the backslash zero I'm not seeing zero because I didn't type literally the zero char in ASCII it's literally eight zero
bits which are technically unprintable if you will in print F speak and so what I'm seeing here is like a blank symbol
that just means there is something else there it's apparently all eight zero bits but they are uh there even though we're not
seeing them literally right now well let's go ahead and peel back one of these layers and let me go ahead and get
rid of the cs50 library and get rid of therefore the word string because again henceforth it's just Char star nothing
else is different I'm going to now do make address dot slash address and it's the same exact thing and now let's just
focus on the HI! rather than even worry about that so I'm going to recompile one last time and now I have H I exclamation
point well it turns out that the array notation we used last week was technically some of this syntactic sugar
sort of a neat way to use syntax in a useful way but we can see more explicitly today what the square
brackets for a string is actually doing let me go ahead and do this let me adventurously say I want to print out
not s bracket zero but I want to print out whatever the first character of s is so to be clear what is s now it's the
address of a string okay but what is s really s is the address of the first Char in a string and again that's
sufficient for defining a string because eventually the computer will see that there's a backslash zero at the end of it
so s is specifically the address of the first character in a string so that means using my new syntax if I want to
print out that first character I can print out star s because recall that star is the dereference operator when
you don't repeat the word Char you don't repeat the word int you just use the star here that means go to that
address similarly if I in my sort of new found knowledge of how strings work know that the H comes first then the I right
after it then the exclamation point then the backslash zero contiguously one byte apart I could kind of start to do
some arithmetic I could go to s plus one byte and print out the second character and I could print out whatever is at s plus
two in fact doing what's generally known as pointer arithmetic literally treating pointers as the numbers they are hexadecimal
or decimal doesn't really matter it's still just numbers and go ahead and add one byte or two bytes to them to start at
the beginning of a string and just kind of poke around from left to right so this now is equivalent to what we did
last week using square bracket notation but now I'm reimplementing that same idea with the sort of lower level
plumbing understanding ampersands and stars now a little bit more so if I remake this program and do dot slash address I
should still see Hi exclamation point but what I'm really doing is just kind of demonstrating hopefully my my sort of
understanding of what really is going on in the computer's memory now programmers who are maybe trying to show off might
actually write this syntax I think the more common syntax would be what we did last week s bracket zero s bracket one why it's
just a little more readable and like we don't need to sort of uh brag about or care about this underlying
representation the square brackets last week were an abstraction if you will on top of what is lower level math but
that's all that's going on underneath the hood we're poking around from byte to byte to byte all right let me pause
here see if there's any questions on that one any questions on
this let's do one more then just to demonstrate that this is not even specific to Strings let me go ahead and
get rid of all of this and let me give myself an array of numbers like I did last week so if I'm
going to declare all the numbers at once using this funky curly brace notation I can do like 4 6 8 2 7 5 0 so seven
different numbers inside of an array that's automatically initialized like this I don't strictly speaking need to
say seven the compiler is smart enough to figure out how many numbers I put with commas between them and that just
gives me an array containing 4 6 8 2 7 5 0 so it turns out I can print each of these numbers in the familiar way I
can do a printf of percent i backslash n and I can print numbers bracket zero and let me just do some quick copy paste
just to print the first three of these theoretically that should print out 4 6 8 and so forth but I can do the same
sort of manipulation understanding what pointers now are using pointer arithmetic so let me actually unwind
this and just go back to one print F and instead of printing numbers bracket zero like I might have last week let me just
go and print out whatever is at that address so asterisk numbers let me then print out the second digit which is
going to be whatever is at numbers + one and then let me do this further and do whatever is at numbers plus two and if I
really want to repeat this let me do it four more times and do what's at uh location three four five and six and
that's seven total numbers because I started counting at zero so let me just quickly run this make address dot slash
address there are those seven digits being printed but there's something subtle but also kind of useful here each
of these digits 4 6 8 2 7 5 0 is an int why because I made an array of integers but think back how big is a typical integer
have we claimed four bytes or 32 bits so it's worth noting that I don't really need to
worry about that detail notice that I did not do plus 4 plus 8 plus 12 plus 16 plus 20 right I the programmer strictly
speaking don't need to worry about how big the data type is this is the power of pointer arithmetic the compiler is
smart enough to know that if you add one to this pointer that is the same as saying go one more piece of data not
just one byte so if it's an int move four if it's a second int move eight if it's a third int move 12 pointer
arithmetic sort of handles that annoying arithmetic for you so you can just think of this as a number after a number after
a number that are back to back to back but not one byte apart but four bytes apart which is only to say plus one plus
two plus three Works no matter the data type why because the compiler knows what type of data you're talking about now
there's one other detail I should reveal here that I've kind of taken for granted in the past I was using double quotes to
represent strings and I claim that the compiler's smart enough to realize that oh if I have double quote HI! that means
it's an array of H I exclamation point and then the backslash zero notice this usefulness it turns out that you can
actually treat arrays as though the name of the array is itself a pointer and this is actually going to be something
useful in upcoming problems when we want to pass arrays around in the computer's memory notice that strictly speaking on
line five there's no pointers going on there's no star there's no Ampersand there's nothing new there and yet
instantly on line seven I'm sort of pretending that it is the address and this is actually okay it turns out that
an array really can be treated as the address of the first element in that array the difference is that there's no
secret back slash zero anywhere like this is just part of the phone number here the ending in zero that's not like
a special backslash zero so this is something we're going to take advantage of too before long there's this sort of
inter relationship between addresses and uh arrays that just generally allows you to treat one as though it is the other
but the math is taken care of for you are any questions then on on this before we start to solve some some bigger
problems yeah potentially if you go beyond the end of an array you might get a
segmentation fault the problem is that that symptom is sometimes non-deterministic which means that
sometimes it will happen sometimes it won't it often depends on how far off the end of the array you actually go
you'll often not induce a segmentation fault if you just kind of poke a little too far but if you go way too far it
quite likely will but we'll give you a tool today actually for detecting and solving exactly that kind of situation
so let's go ahead now and do something a little different in code but that actually comes back to that spoiler from
earlier let me go ahead and create a program called compare. C and in this program I'm going to go ahead and allow
myself uh the cs50 library not so much for string but so that I can actually use get int still which is way easier
than the way we'll see that C normally lets you get input let me give myself stdio.h do an int main void not
worrying about command line arguments today and let me go ahead and get an int i using get int and ask the human for
the value of I then let me give myself an INT J ask the user for another int calling it J and then let me go ahead
and kind of naively but to your point earlier if I equals equals J then let's go ahead and print out something like
same backslash n else let's go ahead and print out different if they are not in fact the same so that would seem to be a
program that compares the value of two integers all right so let's go ahead and run make compare so far so good dot slash compare
okay i will be 50 j will be 50 they're the same let's do it once more i will be 50 j will be 42 they are different so so
far so good in this first version of comparison but as you might see where I'm going with this let's move away from
integers and let's actually change these things to strings so I could do string s over here get string s over
here then I could do uh string T over here and get string over here asking the user for T this time here and then I can
compare the two if s equals equals T and this is a common convention if you've used s for string already you can use T
for the next one at least for simple demonstrations like this I'm going to compare the two just like I did for ints
which worked great make compare so far so good dot slash address oh uh sorry wrong program dot slash compare let me go ahead and type
in something like HI! and BYE! which of course should
definitely be different let me run it again with HI! and
HI! huh different maybe I messed up let's maybe do lowercase maybe that'll fix it but no those two are
different so to come back to what I described as a spoiler earlier what's the fundamental issue here to be
clear why is it saying different even though I pretty sure I typed the same thing twice yeah
yeah this is where it's now useful to know that string has been an abstraction a training wheel if you will and if we
take that away still use get string because that's convenient still but if I change string to be Char star it's a
little more explicit as to what s and what T are s is a pointer to a Char that is the address of a Char T is a pointer
to a Char that is the address of a Char specifically the first character in s and the first character in t
respectively so if I'm comparing these two it should stand a reason that they're going to be different why
because s might end up here in memory and T might end up here in memory each time I call get string it is not smart
enough or advanced enough to know that wait a minute you typed the same thing I'm just going to hand you back the same
address that doesn't happen because we did not design get string that way each time I call get string it returns
apparently a different copy of the string that was typed in a HI! over here and a HI! over here they might
look the same to the human but to the computer they are different chunks of memory and therefore at different
addresses and here too we can reveal what is get string returning well up until today it was returning a string so
to speak that's not really a thing technically what get string has always been doing is returning the address of
the first char in a string and trusting that we put a backslash zero at the end of whatever the human typed in and
that's enough now for printf for strlen for you to know where a string begins and ends so get string has
actually always returned a pointer it has not returned a quote unquote string per se but there are functions that can
solve this comparison for us recall that I could do something like this I could actually go in here and I could uh let's
see where was it so if I uh include str compare here and use it to pass in two values s and t let's see now what
happens when I make compare huh implicitly declaring library function strcmp with type int and well
there's a star so you might have seen this error before and you might have uh ignored most of it but there's some
evidence of stars or pointers going on here uh looks like I didn't include the string.h header file so that's an easy
fix include string.h which despite its name does not create a data type called string it just has string related
functions in it like str compare let's make compare again now it compiles dot slash compare now let's type in HI!
and even the same thing again these are
now oh I used it wrong okay user error that was supposed to be impressive but it's the opposite what did I do
wrong what did I do wrong here yeah yeah yeah it returns three different
values zero if they're the same positive if one comes before the other negative if the opposite is true I just forgot that
so like I did last week correctly if I want to compare them for equality per the manual page I should be checking for
zero as the return value now make compare dot slash compare enter let's try it one last time HI! and HI! okay now they're
in fact the same and justtin thank you and indeed not that it's returning same all the time if I type in high and
then BYE! it's indeed noticing that difference as well well let me go ahead and do one other thing here let's do one
other thing let me go ahead now and just reveal more pictorially what's going on let's get rid of the string comparison
and let's just print these things out the simple way to print this out would be with percent s and again percent s is
special printf knows to take in an address and start there print every character up until the backslash zero so let's just hand it
s and do that and then let's do one more percent s comma t this is again sort of a mix of week one and this week cuz I got
rid of the word string I'm using char star but I'm still using printf and percent s in the same way let me go ahead and
run compare now and if I type HI! and HI! I should see the same thing twice so they look the same but here now we
have the syntax today to print out the actual addresses of these things so let me just change the s to a p because p
means don't go to the address and print it it means just print the address as a pointer so make compare dot slash compare and
now let's type in HI! and once more and I should see indeed two slightly different addresses given
in hexadecimal one's got a B at the end one's got an F at the end and they are indeed a few bytes apart so this is just
confirming what our suspicions have actually been so what does this mean perhaps in the computer's memory well
let's kind of take a look I've zoomed out so I have a few more squares to look at at once here might be s in
memory when I do string s equals or char star s equals I get a variable that's of size 1 2 3 4 5 6 7 8 cuz I claimed
earlier that on Modern systems pointers are generally eight bytes nowadays so they can count even higher and inside of
the computer's memory also might be HI! and I don't know where it ends up so for the sake of discussion it ended up down
here that's what was free when I ran the program H I exclamation point backslash zero maybe it ended up for the sake of
discussion at 0x123 0x124 0x125 and 0x126 so to be clear what is s storing once the assignment operator copies from right to
left what is s storing if I Advance one more slide yeah
0x123 the presumption being that if a string is defined by the address of its first char and that address of its
first char is 0x123 then that's indeed what should be in the variable s and so technically that's what's been happening
with that assignment operator from right to left get string indeed returns a string so to speak but more properly it
Returns the address of a Char what's been then copied from right to left using that assignment operator all these
weeks is indeed that address now technically we don't really need to care about where these addresses are it
suffices to just think about them sort of referentially but let's first consider where T might be T is just
another variable that I created on my second line of code maybe it ends up there maybe somewhere else for the sake
of discussion I'll draw it left and right where did the second uh word end up that I typed in well suppose the
second copy of HI! ended up at 0x456 0x457 0x458 0x459 what ended up in t I'll pluck this one off myself 0x456
presumably and so this is now a pictorial representation of why and let's abstract away everything else when I compared s
against T using equal equals based on the picture they're obviously not the same one is over here one is over here
and per a moment ago one is 0x123 the other is 0x456 yes technically they're pointing at something that's the same
but that just reveals how str compare works str compare is apparently a function that takes in the address of a
string as its argument and the address of another string as its argument it goes to the first character in each of
those strings respectively and probably has like a for Loop or a while loop and just goes from left to right comparing
looking for the same chars left and right and if it doesn't notice any differences boom it returns zero if it
does notice a difference it returns a posit positive or A negative value and that's very similar recall to how we
implemented string length ourselves last week I sort of used a for loop I was looking for a back SL zero stir compare
is probably a little similar in spirit looping from left to right but comparing this time not just counting are any
questions then on string comparison and why it is that we use Stir compare not equals equals yeah
Do pointers have addresses? Yes. So, we won't do that today, but I could actually use the ampersand operator on
s or on t. That would give me the equivalent of a char **, and that itself could be stored elsewhere in
memory. That's where it ends; we don't sort of do that recursively forever. There's star and there's star star. But
yes, that is a thing, and it's very often useful in the context of, like, two-dimensional arrays, which we haven't
really talked about, but that is a feature of the language too. But not today. Good question. All right, so what
might we now do to take things up a notch? Well, let's go ahead and implement a different program here that maybe
tries copying some values, just to demonstrate this. Let me open up a file called, how about,
copy.c. I'm going to start off with a few includes: let's include the CS50 library, just so we have a way of getting
user input; let's include, how about, standard I/O, as always; let's preemptively include string.h, and maybe one other in
a moment. Let's do int main(void), as before, and then in here let's get a string from the user and just call it s,
for simplicity. And heck, we can actually just call this char * if we want, or string,
since we're using the CS50 library, but we'll come back to that. Let's now make a copy of s and do s = t using a
single assignment operator. And then let's check something like this: let's go into the first character of t, which is
t[0], and then let's uppercase it using that function that we've used in the past, toupper(t[0]),
semicolon. And actually, I should go back up here: if I'm using toupper, or if you've used tolower or isupper or
islower, I might not remember this offhand, but it was in another header file called ctype.h. There was a bunch of helpful
functions in that library as well. Now, at the very last line of the program, let's just print out what both s and t are by
simply printing out %s for each of them; and t is %s also, not %t, of course. And let's see what
happens here. So let me make copy. Oh my God, so many mistakes. What did I do wrong? Oh, okay,
that was unintended: string t = s, sorry. So I'm creating two variables, s and t respectively, and I'm copying s into t.
make copy, Enter. There we go. ./copy, and let's now type in, for instance, how about hi! in all
lowercase this time. And now what gets printed? Huh. I don't think that's what I intended,
so to speak, here. Because notice that I got s from the user, so that checks out. I then copied s into t, which looks correct;
that's what we always use assignment for. Then I uppercased the first letter in t, but not s, at least in my code. Then I
printed s and t, and then notice: apparently both s and t got capitalized. So if you're
starting to get a little comfortable with what's going on underneath the hood, what's the fundamental problem
here? Why did both get capitalized? Yeah, over
here? Yeah, they're representing the same address. So C is really literal: if you create another variable called t and you
assign it the value of s, you are literally assigning it the value in s, which is like 0x123 or something like
that. And so at that point in the story, both s and t presumably have a value of 0x123, which means they technically point
to the same hi! in memory. Nowhere did I tell the computer to give me a copy of hi!
per se; I literally said just copy s. So here's where an understanding of what s literally is kind of explains the
situation: I'm only copying the pointers. So what actually went on in memory? Let's take a look here at this grid. If I
created s initially, maybe it ends up here, and I created hi! in lowercase and it ended up down here, then the address
was again like 0x123, 0x124, 0x125, 0x126; 0x123 is what's in s. If I then create a second variable called t, and I call it a
string, aka char *, maybe it again ends up here. But when I copy s into t by doing t = s, semicolon, that literally just
copies s into t, which puts the value 0x123 there. So if we now abstract away all these numbers and just think about a
picture with arrows, what we've drawn in the computer's memory is this: two different pointers, but storing the same
address, which means the breadcrumbs lead to the same place. And so if you follow the t breadcrumb and capitalize the
first letter, it is functionally the same as changing the first letter in the version s as
well. So what's the solution, then, to this kind of problem? Even if you have no idea
how to do it in code, what's the gist of what I really intended, which is: I want a genuine copy of
s called t, I want a new hi!\0. What do I need to do to make that
happen? Thoughts? So there is a function called strcpy, s-t-r-c-p-y, which is a
possible answer to this question. The catch with strcpy is that you have to tell it in advance not only what the
source string is, the one you want to copy; you also need to pass in the address of a chunk of memory into which
you can copy the string. And here's one thing we haven't seen yet, and we need one more building block today, if
you will: we haven't yet seen a way to create new chunks of memory and then let some other function copy into them.
And for this, we're going to introduce something called dynamic memory allocation, and this is the last and most
powerful feature, perhaps, today, whereby we're going to introduce two functions, malloc and free, where malloc means memory
allocate, which literally does just that. It's a function that takes a number as input: how many bytes of memory do you
want the operating system to find for you somewhere in that big grid? It's going to find it, and it's going to
return to you the address of the first byte of contiguous memory, back to back to back, and then you can do anything you
want with that chunk of memory. free is going to do the opposite: when you're done using a chunk of memory that malloc
has given you, you can say free it, and that means you hand it back to the operating system, and then the operating
system can use it for something else later. So this is actually evidence of a common problem in programming. If your
Mac or your PC has ever been in the habit of starting to get really, really slow, or it's kind of slowing to a crawl,
heck, maybe it even freezes, one of the possible explanations could be that the program you're running, by Apple
or Microsoft or whoever, maybe they're using malloc or some equivalent, asking the operating system, macOS or Windows:
give me more memory, I need more memory, the user is creating more images, the user is typing a longer essay, give
me more memory, more memory. If the program has a bug and never actually frees any of that memory, your computer
might end up using all of the available memory. And honestly, humans are not very good at handling corner cases like that;
very often programs, computers, just freeze at that point or get really, really slow, because they start trying to
be creative when there's not enough memory left. So one of the reasons for a computer really slowing down might be
calling malloc a lot, or some equivalent, but never freeing it. Which is to say, you should always use these two
functions in concert and free memory once you are done with it. So let me go ahead and do this in code and solve this
problem properly. Let me go ahead and do this: before I copy s into t using something like strcpy, I first need to
get a bunch of memory from the computer. So to do that, let's make this super clear that we're dealing with pointers, so
I'm going to change my strings to char * for both s and t. And what I technically am going to store in t is
the address of an available chunk of memory. To do that, I can ask the computer to allocate memory for me. And
how many bytes? If I want to create a copy of hi!, I need how many
bytes? Good, four, because I need the h, the i, the exclamation point, and additional space for the \0. It's up to
me to understand that and ask for it; it's not going to happen magically, nothing does in C. So I could just
naively type 4 there, and that would be correct if I type in hi! or any other three-letter word or
phrase. But to do this dynamically, I should probably do something like strlen(s) + 1 for the
additional null character. Recall that string length does it in sort of the English sense: it returns the length of
the string you see. Plus one also takes into account the fact that I'm going to need that \0. Now let me do
this old-school style first. Let me go ahead and manually copy the string s into t first. So for int i = 0,
i is less than the string length of s, i++. Then inside my for loop, I'm going to do t[i] = s[i]. But
actually, I want the null character too, so I want to do the length of the string plus one more. And heck, I think I learned
an optimization last time: if I'm doing this again and again, I could really do n = strlen(s) + 1 and then
do i is less than n, just as a nice design optimization. I think this for loop will actually handle the process,
then, of copying every character from s into every available byte of memory in t. Or I could get rid of all of that and
take your suggestion, which is to use strcpy, which takes as its first argument the destination and as its second
argument the source, so copy from right to left in this case too. That's going to do all of that automatically for me as
well. Now I think I'm good. I can now safely capitalize the first character in t, which is now a different chunk of
memory than s, and then I can print them both out to see that one has not changed but the other has. So, make copy. All right.
What did I do wrong? Implicitly declaring library function malloc, dot dot dot. So we've seen this kind of error
before. Even if you don't know quite how to solve it, what's the essence of the solution? What do I need to do to
fix this kind of problem involving implicitly declaring a library function? What did I forget?
Yeah, I need to include the library. And I could look this up in the manual, or I know it off the top of my head, I
just forgot it: there's another library we'll occasionally need now called stdlib.h, standard library, that
contains malloc and free prototypes and some other stuff too. All right, let me just clear this away and do make copy
one more time. Now I'm good. ./copy, Enter. All right, I'm going to type in hi! in lowercase. t and s now come back as
intended: s is untouched, it would seem, but t is now capitalized. All right, any questions then
on what we just did in code? Yeah, indeed, there's a few improvements I want to make, so let me actually do those
right now. Technically, I should practice what I preached, and I should indeed, when I'm done with t, free t. Fortunately, I
don't have to worry about how big t was; the computer remembers how many bytes it gave me, and it will go
free all of them, not just the first. I should do free(t). I don't need to do free(s), and I shouldn't, because that is
handled automatically by the CS50 library. s, recall, came from get_string, and we actually have some fancy code in
place that makes sure that at the end of your program's execution, we free any memory that we allocated, so we don't
actually waste memory like I described earlier. But there's actually a couple of other things, if I really want to be
pedantic, I should put in here. It turns out that sometimes malloc can fail: sometimes malloc doesn't have enough
memory available, because maybe your computer's doing so much stuff there's just no more RAM available. So
technically I should do something like this: if t == NULL, with two L's today, then I should just return 1
or something to say that there was a problem. I should probably print an error message too, but for now I'm going to
keep it simple. I should also probably check this; this is a little risky of me. If I'm doing t[0], this is
assuming that there is a letter there. But what if the human just hit Enter at the prompt and didn't even type
h, let alone hi!? What if there is no t[0]? So technically, what I should probably do
here is: if the length of t is greater than zero, then go ahead and safely capitalize the first letter of it.
And then, at the very end, if all goes well, I can return 0, thereby signifying that indeed this thing was
successful. So yes, these two functions, malloc and free, should be used in concert: if you call malloc, you should call
free eventually. But you did not call malloc for s, so you should not call free for s. All right, yeah, other
question? Why did I do malloc plus one? So, malloc of string length of s plus one: the string length is sort of
the literal length of the string as a human would perceive it in English. So for hi!, strlen gives me
three. But I know now, as of last week and this week, what a string technically is, and a string always has an extra byte.
The onus is on me to understand and apply that lesson learned, so that I actually give strcpy enough room for
that trailing null character. And here's just an annoying thing: we called the \0 NUL, N-U-L, last week. It turns out
that NULL, N-U-L-L, is the same idea: it's also zero, but it's zero in the context of pointers. So, long story short, you never really
write NUL; I've just said it and we saw it on the screen. You will start writing NULL when you want to check whether or not a
pointer is valid. And what I mean by that is this: if malloc fails and there's just not enough memory left
inside of the computer for you, it's got to return a special value, and that special value is NULL, in all capital
letters. That signifies something went wrong: do not trust that I'm giving you a useful return
value. Other questions on these copies thus far? Yeah, over
there? Good question: will strcpy not work without malloc? You kind of need both in this case, because strcpy, by
definition, if I pull up its manual page, needs a destination to put the copied characters. It's not sufficient just to
say char *t, semicolon; that only gives you a pointer. But I need another chunk of memory that's just as big as
hi!\0. So malloc gives me a whole bunch of memory, and then strcpy fills it with hi!\0.
So again, that's why we're sort of going down to this lower level, because once you understand what needs to be
done, you now have the functions to do it. So let's actually consider what we just solved. In this version of the
program where I actually introduced malloc, t was initialized to the return value of malloc, and maybe the memory that
I got back was here: 0x456, 0x457, 0x458, 0x459. I've left it blank initially because nothing is put there automatically by
malloc; I just get a chunk of memory that is now mine to use as I see fit. I then assign t that return value, which
points t at the first address. Notice there's no \0; this is not yet a string, it's just a chunk of memory,
four bytes, an array of four bytes. What strcpy eventually did for me was copy the h over, the i over, the
exclamation point over, and the \0. And if I didn't want to use strcpy, or I forgot that it existed, my
for loop would have done exactly the same thing. All right, any questions then on
these examples here? Any questions? Yeah, good question: after malloc, if I had then still done just t = s, it actually
would have recreated the same original problem by just copying 0x123 from s into t. So then I would have been left
with a picture that looked like this a few steps ago. I can't quite do it live, but if I did what you
just described, this would now be pointing over here, and so I wouldn't have fundamentally solved the problem; I would
have just additionally wasted four bytes, temporarily, that I'm not actually using. Yeah? Do you always
use malloc and strcpy together? Not necessarily. These are solving two different problems: malloc's giving me
enough memory to make a copy, strcpy's doing the copy. However, you could actually use an array of
characters if you wanted, and you could use strcpy on that, and there's other use cases for strcpy. But thus far, it's a reasonable
mental model to have that if you want to copy strings, you use malloc and then strcpy or your own homegrown loop.
Yeah, say that once more? No, it will. Good question. strcpy, per its
documentation, will copy the whole string plus the null character at the end; it just assumes there will be one there.
It's therefore up to you to pass strcpy a long enough chunk of memory to have room for that. If I only asked malloc
for three bytes, that could have potentially created a memory problem, whereby strcpy would just still
blindly copy one, two, three, four bytes, but technically it should only have touched three of those. You do not yet have
access to the fourth one, or the rights to it, because you never asked malloc for it.
Yeah? Correct: the number inside malloc, its one argument, is the number of bytes you want
back. Yes, the onus is on you, the programmer, to remember, or frankly to use a function to figure out, how many bytes
you actually need. That's why I did not ultimately type in 4 manually; I used strlen plus one. So the plus one is
necessary if you understand how strings are represented, but using strlen means that I can actually play around
with any types of inputs, and it will dynamically figure out the length. So
suffice it to say, there's so many ways already where you can start to break programs. Let's give you at least one
tool for finding mistakes that you might make, and indeed, in upcoming problem sets, you will use this to find bugs in your
own code, not just using printf, not just using the built-in debugger, but another tool here as well. So let me
go ahead and deliberately write a program
called memory.c that has some memory-related errors. Let me include stdio.h at the top, and let me include
stdlib.h at the top so I have access to malloc. Now let me do int main(void), and then inside of main let me do
this: I want to allocate space for three integers. Why? Just for the sake of discussion. So I'm going
to go ahead and do malloc of three, but I don't want three bytes, I want three integers, and an integer is four bytes. So
technically I could do this, 3 * 4, or I could do 12. But again, that's making certain assumptions, and if I run this
program on a slightly different computer, ints might be a different size. So the better way to do this would be three
times whatever the sizeof an int is, and this is just an operator you can use anytime if you just want to find out, on
this computer, how big is an int, how big is a float, or something else. So that's going to give me that much
memory for three ints. What do I want to assign this to? Well, malloc returns an address, and pointers are addresses, so I'm
going to create a pointer to an int called x and assign it the value. So what am I doing here? This is a little less
obvious, but again, go back to basics. The right-hand side here gives me a chunk of memory for three integers. malloc returns
the address of the first byte of that chunk. How do I store the address of anything? I need a pointer. The syntax for
today is type of data, star. The type of data in question is three ints, so I do int *x. Again, it's kind of purposeless,
only for sort of instructional purposes here, but this is equivalent now to having a chunk of memory of size 12 in
total, presumably. So I can technically now do this: I can go into maybe the first location and assign it the number
72, like the other day, the second location the number 73, and the third location the number
33. Now, I've deliberately made two mistakes here, because I'm trying to trip over my newfound understanding, or my
sort of greenness with understanding pointers. One, I didn't remember that I should be treating chunks of memory as
zero-indexed. malloc essentially returns an array, if you want to think of it as that, an array of three ints, or, more
technically, the address of a chunk of memory that could fit three ints. So I can use my square bracket notation, or I could be
really cool and use pointer arithmetic, but this is a little more user-friendly. But I have made two mistakes. I did not
start indexing at zero: line 7 should have been x[0], line 8 should have been x[1], and then line 9
should have been x[2]. So, first mistake. The second mistake that I've made, as a side effect, is I'm also
touching memory that I shouldn't. x[3] would mean go to the fourth int in the chunk of memory that came back; I only
asked for enough memory for three ints, not four. So this is what's called a buffer overflow: I am accidentally, but
deliberately at the moment, going beyond the boundaries of this array, this chunk of memory. So bad things happen, but not
necessarily by just running your program. Let me go ahead and just try this: make memory, and you'll see here that it
compiles okay. ./memory, and it actually does not segmentation fault, which comes back to that point of non-determinism:
sometimes it does, sometimes it doesn't. It depends on how bad of a mistake you made. But there's a program that can spot
these kinds of mistakes, and I'm going to go ahead and expand my terminal window for a moment, and I'm going to run not
just ./memory but a program called valgrind: valgrind ./memory. This is a command that comes with a lot of computer
systems that's designed to find memory-related bugs in code. So it's a new tool in your toolkit today, and you'll use it
with the coming problem sets. I'm going to run this now. Its output, honestly, is hideous, but there's a few things that
will start to jump out, and we'll help you with tools in the problem set to see these kinds of things. Here's the
first mistake: invalid write of size 4. That's on memory.c, line 9, per my highlights. So let me go look at line
nine. In what sense is this an invalid write of size four? Well, I'm touching memory that I shouldn't, and I'm touching
it as though it's an int, and an int is four bytes, size four. So again, this takes some practice to get used to the
nomenclature here, but this is now a clue for me, the programmer, that not only did I screw up, but I screwed up related to
memory. And so this is just kind of a hint, if you will. It's not going to necessarily tell you exactly how to fix
it; you have to kind of wrestle with the semantics. But invalid write of size four: oh, okay, so I should not have
indexed past the boundary here. All right, so I shouldn't have done that. So let me go ahead, then, and change this to 0,
1, and 2, perhaps, here. All right, so let me go ahead and recompile my code: make
memory, ./memory. Still doesn't seem to be broken, but it is technically buggy. Let me go ahead and run valgrind again, so
valgrind ./memory, Enter. And now there's less scary output, but there's still something in there.
Notice this: 12 bytes in 1 blocks, no regard for grammar there, are definitely lost in loss record 1 of 1. Super
cryptic, but this is hinting at a so-called memory leak. The blocks of memory are lost in the sense that I malloc'd
them, I asked for them, but I never, take a guess, freed them. I have a memory leak, and this is the arcane way of saying
you've screwed up, you have a memory leak. So this is an easy fix, fortunately: once I'm done with this memory, I just need to
free it at the end. So now let me go ahead and rerun make memory. It still runs fine, so all the while I might have
thought, incorrectly, my code is correct. But let me run valgrind one more time: valgrind ./memory, Enter. Now this is
pretty good: all heap blocks were freed, whatever that means; no leaks are possible. And even though it's
still a little cryptic, there's no other error here, and in fact it's pretty explicit: error summary, zero errors from
zero contexts, dot dot dot. So even though this is one of the most arcane tools we'll use, it's also one of the most
powerful, because it can see things that you the human might not, and maybe even that the debugger might not. It does a
much closer reading of your code while it's running to figure out exactly what is going
on. All right, any questions then on this tool? And we'll guide you after today with actually using this too. It just helps
you find memory-related mistakes that you might now be capable of making. All right, let's do one other memory-related
thing. Let me shrink my terminal window here. Let me create one other file here called garbage.c. So it turns out
there's a term of art called garbage values in programming that we can reveal as follows. Let me include stdio.h,
and let me include, how about, stdlib.h, and then let me give myself int main(void). And then in this relatively short
program, let me give myself three ints using last week's notation, just int scores[3], for three quiz
scores or whatever. Then let me go ahead and do for int i = 0, i less than 3, i++. Then let me go ahead and print out
%i\n, scores[i], semicolon. That's it. This code, pretty sure, is going to
compile, and it's going to run, but what is my logical bug? I've sort of forgotten a step, even though the code that's
written is not so wrong. Yeah? Yeah, I didn't provide the scores, so I didn't actually initialize the array
called scores to have any scores whatsoever. What's curious about this, though, is that the computer technically
doesn't mind. Let me go ahead and sort of playfully make garbage, Enter, and it's kind of an apt description, because what
I'm about to see are so-called garbage values. When you the programmer do not initialize your code's variables to have values,
sometimes who knows what's going to be there. The computer's been doing some other things; there's a bit of work that
happens even before your code runs in the computer. So there might be remnants of past ints, chars, strings, floats, anything
else in there, and what you're seeing is those garbage values. Which is to say, you should never forget, as I just did, to
initialize the value of some variable. And this is actually pretty dangerous, and there have been many examples of
software being compromised because of one of these issues, where a variable wasn't initialized and all of a sudden
users, maybe people on the internet in the context of web applications, could suddenly see the contents of someone
else's memory, or remnants, maybe someone's password that had been previously typed in, or some other value
like a credit card number that had been previously typed in. There are different defense mechanisms in place to generally
make this not so likely, but it's certainly very possible, at least in this kind of context, to see values that you
probably shouldn't, because they might be remnants from something else that used them. So this is to say, again, you sort of
have this great power now to manipulate memory, but also now you have this great sort of hacking ability to poke around
the contents of memory, and this is exactly what hackers sometimes do when trying to find ways to exploit systems.
Any questions here? No? All right, let's go ahead and take a quick five-minute break, and when
we come back, we'll build on these final topics. See you in five. We are back. First, just a little programmer humor
from XKCD, which hopefully now will make a little bit of sense to you. And what we'll also do next is take a look
at a short two-minute video that animates, with claymation, if you will, from our friends at Stanford, exactly
what happens now that you have an understanding of what garbage values are and how they get there, and what happens
then if you misuse them. It's one thing just to print them out, as I just did; it's another if you actually mistake a
garbage value for a valid pointer. Because garbage values are just zeros and ones somewhere, numbers that is, but
if you use that new dereference operator, the star, and try to go to a garbage value, thinking incorrectly that it's a valid
pointer, bad things can happen. Computers can crash or, more familiarly, segmentation faults can happen. So allow
me to introduce, if we could dim the lights for two minutes, our friend Binky from
Stanford. Hey, Binky, wake up, it's time for pointer fun. What's that? Learn about pointers? Oh, goody. Well, to get started, I
guess we're going to need a couple pointers. Okay, this code allocates two pointers which can point to integers.
Okay, well, I see the two pointers, but they don't seem to be pointing to anything. That's right, initially pointers
don't point to anything. The things they point to are called pointees, and setting them up is a separate step. Oh, right,
right, I knew that, the pointees are separate. Er, so how do you allocate a pointee? Okay, well, this code allocates a
new integer pointee, and this part sets x to point to it. Hey, that looks better. So make it do something. Okay, I'll
dereference the pointer x to store the number 42 into its pointee. For this trick, I'll need my magic wand of
dereferencing. Your magic wand of dereferencing? Uh, that's great. This is what the code looks like. I'll just set
up the number and, hey, look, there it goes. So doing a dereference on x follows the arrow to
access its pointee, in this case to store 42 in there. Hey, try using it to store the number 13 through the other pointer,
y. Okay, I'll just go over here to y and get the number 13 set up, and then take the wand of dereferencing and
just... oh! Hey, that didn't work. Say, uh, Binky, I don't think dereferencing y is a good idea, 'cause, you know, setting up the
pointee is a separate step, and I don't think we ever did it. Good point. Yeah, we allocated the pointer y, but we
never set it to point to a pointee. Very observant. Hey, you're looking good there, Binky. Can you fix it so that y
points to the same pointee as x? Sure, I'll use my magic wand of pointer assignment. Is that going to be a problem like
before? No, this doesn't touch the pointees; it just changes one pointer to point to the same thing as another. Oh, I see, now y
points to the same place as x. So wait, now y is fixed, it has a pointee, so you can try the wand of dereferencing again
to send the 13 over. Uh, okay, here it goes. Hey, look at that, now dereferencing works on y. And because the pointers
are sharing that one pointee, they both see the 13. Yeah, sharing, uh, whatever. So are we going to switch places now? Oh,
look we're out of time but it's from our friend Nick parlante at Stanford so let's consider what Nick did here as
Binky so here's kind of all the code together these first couple of lines were not bad and notice that in
Stanford's code they move the Stars to the left that's fine again more conventional might be this syntax here
um these two lines are fine it's okay to create variables even pointers and not assign them a value initially so long as
you eventually do so we eventually do here with this line we assign to X the return value of Malo which is presumably
the address of something to be fair we should really be checking for null as well but that's not the biggest problem
here the biggest problem is not in this next line which means go to the memory location in X and store the number 42
there that's fine CU again malok Returns the address of some chunk of memory this chunk of memory is big enough for an INT
X is therefore going to store the address of that chunk that's big enough for an INT starx recalls the D reference
operator means go to that address and put 42 in it it's like going to the mailbox and putting the number 42 in it
instead of taking the number 50 out like we did before but why is this line bad this is where Binky sort of lost his
head so to speak why is this bad yeah exactly we haven't yet allocated space for y there's no mention of malloc
there's no assignment of Y even to that same memory so this would be go to the address in y but if there is no known
address in y it is a so-called garbage value which means go to some random address that you have no control over
and boom that might cause what we've seen in the past perhaps as a segmentation fault now this fortunately
is the kind of thing that if you don't quite have the eye for it yet Valgrind that new tool could help you find as
well but it's just another example of again the sort of upside and downside of having control now over memory at this
level all right well let's go ahead and do one other thing considering from last week that this notion of swapping was
actually a really common operation we had all of our volunteers come up we had to swap a lot of things during bubble sort
and even selection sort and we just kind of took for granted that you know the two humans would swap themselves
just fine but there needs to be code to do that if you actually Implement bubble sort selection sort or anything that
involves swapping so let's consider some code like this we'll keep it simple like last week and where we wanted to uh swap
some uh values uh like int a and int B uh for instance here void because I'm not going to return a value but I have a
function called swap so here for instance might be some code for this uh but why is it so complicated here let's
actually take a step back why don't we do this here um I think we have time for one more volunteer could we get someone
to come on up you have to be comfy on camera and oh and you're being asked to help with your oh I'll go with the
friend pointing so whoever has their friend doing this here no now they're pointing over here
now literally an arm is being Twisted okay come on down that backfired come on over
and what is your name Marina Marina nice to meet you who were you trying to volunteer my friend J okay so here we
have for Marina two glasses of liquid orange and purple just so that they're super obvious and suppose that the
problem at hand like last week is just to swap two values like as though these two glasses represented two people
and we want to swap them but let's consider these glasses to be like variables or locations in an array and you
know what I'd really like you to swap the values so like orange has to go in there and purple has to go in there
how would you do it and we'll see if we can then translate that to code okay what you say it a little lat
it all right yeah so presumably you're sort of struggling mentally with how you would do this without having an extra
cup so good foresight here let me go ahead and we do have a a temporary variable if you will so if I hand you
this how would you now solve this problem well I would go like that no that's oh well okay do go with your
instincts okay there sure go ahead go to whatever your instincts are yeah so a little so strictly
speaking probably shouldn't have moved the glasses just because that would be like moving the array locations so let's
actually do it one more time but the glasses now have to go back where they originally are so how would you swap
these now you this temporary variable okay good otherwise we'd be completely uprooting the array for
instance by just physically moving it around so you moved the orange into this temporary variable then you copied the
purple into where the orange was and now presumably excellent the Orange is going to end up where the purple once was and
this temporary variable it's sort of some extra memory it was necessary at the time but not necessary ultimately
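The three pours in the demonstration map onto three assignments. A minimal sketch, assuming the glasses are two locations in an array that stay put while only their contents move; the names swap_in_place and glasses are hypothetical and not from the lecture.

```c
// Swap the values at positions i and j of an array, leaving the locations where they are.
void swap_in_place(int glasses[], int i, int j)
{
    int tmp = glasses[i];       // pour the first value into the empty glass
    glasses[i] = glasses[j];    // pour the second value where the first one was
    glasses[j] = tmp;           // pour the saved value where the second one was
}
```

After the function returns, tmp is no longer needed, which matches the empty third glass at the end of the demonstration.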
but a round of applause if we could and thank you for doing that so well so right the fact that it sort of
instantly occurred to Mariana that like you need some temporary variable is a perfect translation to code and in fact
this code here that we might Glimpse now is reminiscent of exactly that algorithm where A and B at the end of the day are
the same chunks of memory just like the second time the two glasses have to kind of stay put even though we're physically
lifting them but they're going back to where they were is kind of like having two values A and B and you just have a
temporary variable into which you copy a then you change a with B then you go and change B with whatever the original
value of a was because you temporarily stored it in uh this temporary variable TMP unfortunately this code doesn't
necessarily work as intended so let me go over to my uh VS Code here and open up a program called swap.c and in
swap.c let me whip up something really quickly here with how about include
stdio.h int main void inside of main let me do something like int x gets 1 int y gets 2 let me just print out as a
visual confirmation that x is percent i y is percent i backslash n plugging in x and y respectively then let me call a
swap function that we'll invent in just a moment swap x and y and then let me print out again x is percent i y is
percent i backslash n just to print out again what they are because presumably I should see one two first
then 2 1 the second time now how is swap going to be implemented let me implement it exactly as on the screen a moment ago
so void swap int x or let's call it int a for consistency int b but I could always call those anything I want int
temp gets a a gets b b gets temp so exactly as I proposed a moment ago and exactly as Mariana really implemented it
using these glasses of water I need to now include my prototype as always so nothing new there and I'll just copy
paste that up here and now let's go ahead and run this so make swap so far so good ./swap x is now one y is two x is
one y is two so there seems to be a bit of a bug here but why might this be this code does not in fact work even though
it obviously worked in reality yeah good and let me summarize a and b do indeed have different addresses than x
and Y and in fact what happens when you call a function like this on line 11 calling swap passing in X and Y you are
calling a function by value so to speak and this is a term of art that just means you are passing in copies of X and
Y respectively and calling them A and B in the context of this function but they're indeed copies now technically
these names are local only I could have called this x I could have called this y I could have changed this to x this to y
this to x and this to y the problem would still remain just because you use the same names in one function as you do
elsewhere that doesn't mean they're the same they just look the same to you but indeed swap is going to get copies of
this X and Y and in this context this scope so to speak uh X and Y will be copies of the original so for clarity
let me revert this back to a and b just to make super clear that they're indeed different albeit copies but there's
indeed a problem there this function actually works fine in fact notice this let me go ahead and print out inside of
this printf a is percent i b is percent i backslash n and then I'll print a and b and let me do that same thing at the
beginning of this function before it does any work let me go ahead and rerun make
swap ./swap and this is promising initially x is one y is two a is one b is two a is two b is one but then nope x is
one y is 2 so if anything I've confirmed that the logic is right Mariana's logic is right but there's something about C
there's something about using one function versus another that's actually creating a problem here the fact that
I'm passing in copies of these values is creating this problem so what in fact is going on well again inside of your
computer's memory there's these little chips and we've been talking about them abstractly it's just this grid of memory
locations it turns out that your computer uses this memory in a pretty conventional way it's not just kind of
random where it just puts stuff wherever is available it actually uses like different parts of the memory for
different purposes and you have control over a lot of it but the computer uses some of it for itself and let's go ahead
and zoom out from this and consider that within your computer's memory what a computer will typically do is actually
store initially all of the zeros and ones that you compiled in the top of your computer's memory so to speak so
when you compile a program and then you run it with ./whatever or on a Mac or PC you double click on it the computer
first the operating system first loads all of your program's zeros and ones AKA machine code into
just one big chunk of memory at the top so to speak below that it stores Global variables any variables you have created
in your program that are outside of Main and outside of any functions generally at like the top of your file globals
tend to go at the top there then there's this chunk of memory that's generally known as the Heap and we saw that word
briefly in Valgrind's output and then there's this other chunk of memory called the stack and it turns out
that up until this week you were using the stack heavily anytime you use local variables in a function they end up on
the stack anytime you use malloc that memory ends up on the heap now as the arrow suggests this actually looks like
a problem waiting to happen because if you use more and more and more Heap and more and more and more stack it's like
you know two things barreling down the tracks at one another this does not end well and that's actually a problem if
you've ever heard the phrase stack Overflow or use the website this is the origin of its name when you start to use
more and more and more memory by calling lots and lots of functions or using lots and lots of local variables you use a
lot of this stack memory or if you use malloc a lot and keep calling malloc malloc malloc and never really or rarely calling
free you just use more and more memory and eventually these two things might overflow each other at which point
you're just kind of out of luck the program will crash or something bad will happen so the onus is kind of on you
just to don't do that but this is the design generally of what's going on inside of your computer memory now
within that memory though there are certain conventions focusing on here the stack and in fact let me go over here
with a marker and say that this represents like the bottom of my memory ultimately and so here uh we have a
whole bunch of wooden blocks and each of these squares represents a byte of memory and this for instance might
represent four bytes altogether good enough for an int or something like that so in my original code that I wrote
earlier that is in fact buggy what is in fact going on inside the swap function we can kind of visualize it like this when you
run ./swap or any program for that matter main is the first function to get called with a C program and so I'm just going
to label this bottom row of memory as Main and what were the two variables I had in main called in this
code yeah x and y and each of those was an int so that's four bytes so it's kind of deliberate that I reserved a four uh
a chunk of wood here that's four bytes so let me just call this X and I'm just going to write the number one in this
box here and then I had my other variable Y and I'm going to put the number two there what happens when main
calls swap like it does in this code here well it has two variables of its own a and b and a initially is one and B
is initially two but it has a third variable temp which is a local variable in addition to the arguments A and B
that are passed in so I'm going to call this temp TMP over here and what is the value of temp well we have to look back
at the code temp initially gets the value of a all right the value of a was one so temp initially gets one that's
step one in my three line program okay a equals B so that is a sign from the right to the left of the B into the a so
B is two a is this so let me go ahead and erase this and just overwrite that so at this moment in the story you have
two copies of two so that's that's okay though because the third line of code says temp gets copied into B so what's
temp one gets copied into b so let me overwrite this two with a one and now what happens now unfortunately the code
ends swap doesn't actually do anything with the result and the problem in C is that I could have had a return value I
could go in there and change void to int but which one am I going to return the a or the B the whole goal is to swap two
values and it seems kind of lame if you can't write a function to do something as common per last week's sorting
algorithms as swapping two values but what really happens well even though when this program starts running main
is using this chunk of memory at the bottom in the so-called stack and the stack is just like a cafeteria stack of
trays it grows up like this here's Main's memory on the stack here's the swap functions memory on the stack it's
using three ints instead of two uh instead of only two what happens when the function returns whether it's void or
not the sort of recollection that this is swap's memory goes away and garbage values are left so adorably we get rid
of these values here and there's still data there technically the numbers one one and two are still there in the
computer's memory but they no longer belong to us because the function has now returned so they're still in there
and this is kind of an example visually of why there's other stuff in memory even though you didn't put it there
necessarily sometimes you did put it there but now once swap returns you only should be touching memory inside of main
but we've never actually copied one value into main we haven't returned anything and we haven't solved this
fundamentally so how could we do this well what if we instead passed into swap not copies of X and Y calling them A and
b what if we passed in breadcrumbs to x and y sort of a treasure map that will lead swap to the actual x and to
the actual y today we have that capability using pointers so suppose that we use this code instead there's a
lot of stars going on here which is a bit annoying but let's consider what it is we're trying to achieve what if we
pass in not X and Y but the address of X and the address of Y respectively breadcrumbs if you will that will lead
swap to the original values then what we do is we still give ourselves a temp variable Like An Empty Glass it's still
a glass so we still call it an int but what do we want to put into that temporary variable we don't want to put
a into it because that's an address now we want to go to that address per the star and put whatever's at that address
what do we then want to do well we want to then copy into whatever's at location a we want to copy over to location A's
contents whatever is at location b's contents and then lastly we want to copy temp into whatever's at location b so
again we're very deliberately introducing all of these stars because we don't want to change any of these
addresses we want to go to these addresses per the D reference operator and put values there or get values from
so what does this actually mean well if I kind of rewind in this story and I go back here I still have temp although I'm
going to delete its value to begin with I still have uh B and I still have a but what's going to be different this time
is how I use a and b so let me finish erasing those that's a on the left this is B on the right at this point in the
story we're rerunning swap with this new and improved version and let's see what happens well X is presumably at some
address maybe it's like 0x123 as always what then does a get when I'm using this code the value of a
is 0x123 what is the value of b maybe y is at 0x456 what goes in b well I'm going to put
0x456 and then what am I going to do based on these three lines of code I'm going to store in temp whatever is at the
address in a what is the address in a that's this thing here so I'm going to put one in temp line two I'm going to go
to b all right b is 0x456 so I'm going to b and I'm going to store two at whatever is at location a and at location a is
0x123 so that's this so what am I going to do I'm going to change this one to a two last line of code get the value of temp
which is one and then put it at whatever the location b is so b 0x456 go there and change it to be the value of temp tmp
which puts one here that's it for the code there's still no return value swap returns which means these three
temporary variables are sort of garbage values now they can be reused by subsequent function calls but now I've
actually swapped the values of X and Y which is to say what came as naturally as the real world here for Mariana is
not quite as simply done in C because again functions are sort of isolated from each other you can pass in values
but you get copies of those values if you want one function to affect the value of a variable somewhere else you
have to one understand what's going on but two pass things in via pointer here so if I go back to my code here I
need to make a few changes now let me get rid of these extra printfs let me go in and add all these Stars so I'm
dereferencing these actual addresses here in here and I've got to make one more change how do I now call
swap if swap is expecting an int star and an int star that is the address of an int and the address of another int what do I
change on line 11 here yeah sorry a little
louder sorry the address of operator so up here on line 11 we do ampersand x and ampersand y so that yes we're
technically passing in a copy of a value but this time the copy we're passing in is technically an address and as soon as
we have an address just like when I held up the fuzzy finger the foamy finger I can point at that address I can go to
that address and actually get a value from the mailbox or put a value into the mailbox if I even want so let's cross
our fingers now and do make swap enter oh my God so many mistakes oh I didn't remember to change my prototype so let
me go way up here and add two more stars because I made that change already make swap ./swap and voila now I have
actually swapped thank you thank you the two values all right so what more can we do here
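To put the two versions from this walkthrough side by side, here is a minimal sketch. The pointer version follows the narration (stars inside the function, ampersands at the call); the names swap_by_value and demo are hypothetical labels added here so both versions can live in one file.

```c
#include <stdio.h>

// Receives copies of the caller's values, so the caller never sees the change.
void swap_by_value(int a, int b)
{
    int tmp = a;
    a = b;
    b = tmp;
}

// Receives addresses; each star means "go to that address".
void swap(int *a, int *b)
{
    int tmp = *a;    // save what is at address a
    *a = *b;         // copy what is at address b into address a
    *b = tmp;        // copy the saved value into address b
}

// Prints x and y after each kind of call.
void demo(void)
{
    int x = 1;
    int y = 2;
    printf("x is %i, y is %i\n", x, y);

    swap_by_value(x, y);    // copies are swapped, x and y are untouched
    printf("x is %i, y is %i\n", x, y);

    swap(&x, &y);           // addresses are passed, so x and y really change
    printf("x is %i, y is %i\n", x, y);
}
```

The only difference at the call site is the ampersand, but it decides whether the function works on main's memory or on its own copies on the stack.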
well let me consider that all this time we've been deliberately using get string and get
int and get float and so forth but for a reason these aren't just training wheels for the sake of like making things
easier they're actually in place to make your code safer and to illustrate this let me go ahead and open up one
other file here how about a file called uh code uh scanf.c it turns out that the old school way the way in C really
of getting user input is via functions like scanf and let me go ahead and include stdio.h int main void and
without using the cs50 library at all for strings or for any of those get functions let me give myself an int
called X let me just print out what the value of x is even though it's going to be um or rather ask the user for the
value by asking them for x and I'm going to use a function called scanf that's going to scan in an integer using percent i
and I'm going to store whatever the human types in at this location and then I'm going to go ahead and just so we can
see what happened I'm going to print out with percent I whatever the human typed in as follows all right so line eight is
week one style code line five and six is week one style code so the Curiosity today is this new line scanf is another
function in stdio.h and notice what I'm doing I'm using the same syntax that I use for printf which is kind of a
little clue a format code to tell scanf what it is I want to scan in that is read from the human's keyboard and I'm
telling it where to put whatever the human typed in I can't just say x because we run into the same darn
problem as with swap I have to give a little breadcrumb to the variable where I want scanf to put the human's integer
and so this just tells the computer to get an int this is what you would have had to type essentially in week one just
to get an int from the user and there's a whole bunch of things that can go wrong still but that's the cryptic
syntax we would have had to show you in week one let me go ahead and make scanf here oops uh user error put the
semicolon in the wrong place make scanf enter oh my God uh non-void doesn't return a
value oh thank you strike two okay make scanf there we go okay so ./scanf I'm going to type in a number like 50 and it
just prints it back out so that is the sort of traditional way of implementing something like get int the problem though
is when you start to get into Strings things get dangerous quickly let me delete all of this and give myself a
string s although wait a minute we don't call it strings anymore char star to store a string then let me go ahead and
just prompt the user for a string using just printf then let me go ahead and use scanf ask them for a string this
time with percent s and store it at that address then let me go ahead and print out whatever the human typed in just by
using the same notation so here line five is the same thing as string s but we've taken back that layer today so
it's char star s this is just week one this is just week one line seven is new scanf will also read from the human's
keyboard a string and store it at s but that's okay because s is an address it's correct not to do the ampersand it's
not necessary a string is and has always been a char star AKA string the problem though arises as follows if I do make
scanf oh my God what did I do oh uh I can't okay we have certain defenses in place with make let me do clang of
scanf.c and output a program called scanf all right so I'm overriding some of our pedagogical defenses that we have
in place with make let me now run ./scanf of this version enter and let me type in something like uh how about hi again
huh so it didn't even store something and it weirdly printed out null this time it's in lowercase but that is
somewhat related what did I fundamentally do wrong though here why is this getting more and more
dangerous and let me illustrate the point even more what if I type in not just something like hello which also
doesn't work what if I do like hello and make a really long string enter that still works can I do this
again let's try again right a really really long unexpectedly long string this is the non-determinism kicking in
enter all right damn it I was trying to trigger a segmentation fault but it wouldn't uh but the point Still Remains
it's still not working but what's the essence of why this isn't working and it's not storing my actual input
yeah uh we have to make space for it so what we're missing here is malloc or something like that so I I could do that
I could do something like this well let me let the human type in at least a three-letter word so I could do malloc of three
plus one for the new for the null character so like let me give them four characters and let me go ahead and do
make scanf whoops uh no sorry clang I have to cir no damn it oh include
stdlib.h there we go that gives me malloc now I'm going to recompile this with clang now I'm going to rerun it and now I'm
going to type in my first thing hi that now works and let me get a little aggressive now and type in hello which
is too long still works but I'm getting lucky let me try a hello damn it that still works too sort
of but it actually not quite there's some weirdness going on there already it turns out I can also do this I could
actually just say char s bracket 4 and give myself an array of four characters let me try this one more time so let me
rerun clang scanf hello clearly exceeding the four characters
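A hedged sketch of the two reads being demonstrated. It uses sscanf, the standard sibling of scanf that reads from a string instead of the keyboard, only so the example can run without anyone typing; the format strings work the same way with scanf. The function names read_int and read_word are hypothetical, and the field width in percent 3s is an addition not shown in the lecture: it tells the function to stop after three characters so the four-byte buffer cannot overflow.

```c
#include <stdio.h>

// Reads one integer from text into *x; returns 1 on success, 0 otherwise.
// The caller passes the address of its int, the same breadcrumb idea as swap.
int read_int(const char *text, int *x)
{
    return sscanf(text, "%i", x) == 1;
}

// Reads one word of at most 3 characters into s, which must have room for 4 bytes
// (3 characters plus the terminating null character).
// No ampersand is needed on s because an array name already acts as an address.
int read_word(const char *text, char s[4])
{
    return sscanf(text, "%3s", s) == 1;
}
```

With the input hello, read_word stores only hel plus the null character; plain percent s with no width would keep writing past the end of the array, which is the danger being pointed out here.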
there we go thank you all right so the the point here though is if we hadn't given you get int you would have
had to use the scanf thing not a huge deal because it seemed to work but if we hadn't given you get string you would
have had to do stuff like this knowing about malloc already or knowing about strings being arrays and even now
there's a danger if the human types in five letters six letters 100 letters this code like with the hello input will
probably just crash which is bad so get string also has this functionality built in where we have a fancy Loop inside
such that we allocate using malloc as many bytes as you physically type in and we use malloc essentially every keystroke
the moment you type in h e l l o we're sort of like laying the tracks as we go and we keep allocating more and more
memory so that we theoretically will never crash with get string even though it's this easy to crash well this easy to
crash your code using scanf if you again did it without the help of a library so where are we all going with
this well let me show you a few final examples that'll pave the way for what'll be problem set 4 let me go ahead
and open up from uh today's code which is available on the course's website for instance a program
like this uh called phonebook.c file to open like a CSV that you might manipulate in Excel or Google
spreadsheets or the like comma separated values and then something like a for append r for read w for write depending on
whether you want to add to the file just open it up or change it um we're going to introduce you to a file pointer
you'll see that capital FILE which is a little bit nonconventional capital FILE is a pointer to an actual file on the
computer's hard drive so that you can actually access something like a CSV file or heck even images and we're going
to see down below that you're also going to have the ability to write files as well or print to files you'll see
functions like fprintf for file printf or fwrite which now that you will begin to understand pointers
you'll have the ability to actually not only read files text files images other things but also write them out in fact
for instance just as a teaser here um jpegs will be one of the uh things we focus on this week where we give you a
forensic image and your goal is to recover as many photographs from this forensic image of like a digital camera
as you possibly can and the way you're going to do that is by knowing in advance that every jpeg in the world
starts with these three bytes written in hexadecimal but these three numbers and so in fact just as a teaser let me open up
an example you'll see on the course's website for today if I scroll through here you'll see a program that does a
little something like this and again more on this in if we could hit the button oh there
we go so here we have um the notion of a byte we're going to create for ourselves we'll see a data type called byte which
is a common convention this gives me three bytes and you're going to learn about a function called fread which
reads from a file some number of bytes for instance three bytes we might then use code like this if bytes bracket 0
equals equals 0xff and bytes bracket 1 equals equals 0xd8 and bytes bracket 2 equals equals 0xff all three of those bytes I just
claimed represent a JPEG you'll see an output like this let me go ahead and run this program as follows let me copy
jpeg.c into my directory from today's distribution let me do make uh jpeg and let me run jpeg on a file which
is available online called lecture.jpeg and I claim yes it's possibly a JPEG well what is that file let me open it up
for us called lecture.jpeg and here for instance is that same photo with which we began class namely implemented as a
jpeg but what we're also going to do this week is start to implement our own sort
of uh filters à la Instagram whereby we might take images and actually run them through a program that creates different
versions thereof for instance using a different file format called BMP which essentially lays out all of its pixels
from left to right top to bottom in a grid you're going to see a struct a data structure in C that's way more complicated
than like the candidate structure from the past or the person structure from the past that looks like this which is
just a whole bunch more values in it but we'll walk you through these in the P set and we might take a photograph like
this and ask you to run a few different filters on it à la Instagram like a black and white filter or grayscale a sepia
filter to give it some old school feel or a reflection like this to invert it or blur it even in this way and just to
end on a note here I have a version of this code ready to go that doesn't Implement all of those filters it just
implements one filter initially uh let me go ahead and just ready this on my computer here I'm going to go into my
own version of filter and you'll see a few files that we'll give you a tour of this coming week in bitmap.h for instance
is a version of this structure that I claimed existed a moment ago and let me show you this file here helpers.c in
which there is a function called filter that I've already implemented in advance today but the ones we give you
for the pset won't already be implemented this function called filter takes the height of an image the width
of an image and a two dimensional array so rows and Columns of pixels and then I have a loop like this that iterates over
all of the pixels in an image from top to bottom left to right and then notice what I'm going to do here I'm going to
change the blue value to be zero in this case and the green value to be zero in this case but why well the image I have
here in mind is this one whereby we have this hidden image that simply has sort of old school style a secret message
embedded in it and if you don't happen to have in your dorm like one of these sort of secret decoder glasses that
essentially make everything red getting rid of the Green in the world and the blue in the world you can actually I'm
actually probably the only one who can read this right now see what message is hidden behind all of this red noise but
if using my code written here in helpers.c I make filter run ./filter on this hidden
message.bmp I'm going to save it in a new file called message.bmp and with one final flourish we're going to open
up message.bmp which is the result of having put on these glasses and hopefully now you too will see what I
see all right all right all right that's it for cs50 we'll
see you next time [Music] all right this is cs50 and this is
already week five which means this is actually our last week in C together in fact in just a few days time what has
looked like this and much more cryptic than this perhaps is going to be distilled into something much simpler
next week when we transition to a language called Python and with python we'll still have our conditionals and
loops and functions and so forth but a lot of like the low-level Plumbing that you might have been wrestling with
struggling with frustrated by over the past couple of weeks especially now that we've introduced pointers and it feels
like you probably have to do everything yourself in Python and in a lot of higher level languages so to speak more
modern more recent languages you'll be able to do so much more with just single lines of code and indeed we're going to
start leveraging libraries all the more code that other people wrote uh Frameworks which is collections of
libraries that other people wrote and on top of all that will you be able to make even better grander more impressive
projects that actually solve problems of particular interest to you particularly by way of your own final project so last
week though in week four recall that we focused on memory and we've been treating this memory inside of your
computer as kind of like a canvas right at the end of the day it's just zeros and ones or bytes really and it's really
up to you what you do with those bytes and how you interconnect them how you represent information on them and arrays
were like one of the simplest ways we started playing around with that memory just contiguous chunks of memory back to
back to back but let's consider for a moment some of the problems that pretty quickly arise with arrays and then today
focus on what more generally are called data structures using your computer's memory as a much more versatile uh
canvas to create even two-dimensional structures to represent information and ultimately to solve more interesting
problems so here's an array of size three maybe the size of three integers and suppose that this is inside of a
program and at this point in the story you've got three numbers in it already one two and three and suppose whatever
the context you need to now add a fourth number to this array like the number four well instinctively where should the
number four go if this is your computer's memory and we currently have this array 1 two three from left to
right where should the number four just perhaps naively go yeah what do you think number one sorry replace number one oh
okay so you could replace number one I don't really like that though because I'd like to keep number one around but
that's an option but I'm losing of course information so what else could I do if I want to add the number four over
there right yeah so I mean it feels like if there's some ordering to these which seems kind of a reasonable inference
that it probably belongs somewhere over here but recall last week as we started poking around a computer's memory
there's other stuff potentially going on and if we sort of fill that in ideally we'd want to just plop the number four
here if we're maintaining this kind of order but recall in the context of your computer's memory there might be other
stuff there some of these garbage values that might be usable but we don't really know or care what they are as
represented by Oscar here but there might actually be useful data in use like if your program has not just a few
integers in this array but also a string that says like hello world it could be that your computer has plopped the h e l
l o w o r l d right after this array why well maybe you created the array in one line of code and filled it with 1 two
three maybe the next line of code used get string or maybe just hardcoded a string in your code for hello world and
so you kind of painted yourself into a corner so to speak now I think you might claim well let's just overwrite the H
but that's kind of problematic for the same reasons we don't want to do that so where else could the four go or how do
we solve this problem if we want to add a number and there's clearly memory available cuz those garbage values are
junk that we don't care about anymore so we could certainly reuse those where could the four and perhaps this whole
array go okay so I'm hearing we could move it somewhere maybe replace some of those garbage values and honestly we
kind of have a lot of options we could use any of these garbage values up here we could use any of these down here or
even further down the point is there is plenty of memory available as indicated by these Oscars where we could put four
maybe even five six or more integers the catch is that we sort of chose poorly early on or we just got unlucky and 1 2
three ended up back to back with some other data that we care about all right so that's fine let's go ahead and assume
that we'll abstract away everything else and we'll plop the new array in this location here so I'm going to go ahead
and copy the one over the two over the three over and then ultimately once I'm ready to fill the four I can throw away
essentially the old array at this point because I have it now entirely in duplicate and I can populate it with the
number four all right so problem solved that is a correct potential solution to this problem
but what's the tradeoff and this is something we're going to start thinking about all the more what's the downside
of having solved this problem in this way yeah yeah I'm adding a lot of running time it took me a lot of effort
to copy those additional numbers now granted it's a small array three numbers who cares it's going to be over in the
blink of an eye but if we start talking about interesting data sets sort of uh web application data sets mobile app
data sets where you have not just a few but maybe a few hundred a few thousand a few million pieces of data this is
probably kind of a suboptimal solution to just oh move all your data from one place to another because who's
to say that we're not going to paint ourselves into a new corner and it would feel like you're wasting all of this
time moving stuff around and ultimately just costing yourself a huge amount of time in fact if we put this now into the
context of our Big O notation from a few weeks back what might the running time now of search be for an array let's
start simple a throwback a couple of weeks ago if you're using an array to recap what was the running time of a
search algorithm in Big O notation so maybe in the worst case if you've got n numbers three in
this case or four but N More generally Big O of what for search yeah what do you think Big O of N
and what's your intuition for that okay yeah so if we go through each
element for instance from left to right then search is going to take Big O of n running time if though
we're talking about these numbers specifically and now I'll explicitly stipulate that yeah they're sorted does
that bias anything what would the Big O notation be for searching an array in this case be it of size three or four or
n more generally Big O of not n but rather log n right because we could use per week zero binary search on an array
like this we'd have to deal with some rounding because there's not a perfect number of elements at the moment but you
could use binary search go to the middle roughly and then go left or right left or right until you find the element you
care about so search remains in Big O of log n when using arrays but what about insertion now if we start to think
about other operations like adding a number to this array or adding a friend to your contacts app or Google finding
another page on the internet so insertion happens all the time what's the running time of
insert when it comes to inserting into an existing array of size n how many steps might that
take Big O of n it would be indeed n why because in the worst case where you're out of space you have to allocate it
would seem a new array maybe taking over some of the previous garbage values but the catch is even though you're only
inserting one new number like the number four you have to copy over all the darn existing numbers into the new one so if
your original array is size n the copying of that is going to take Big O of n plus one but we can throw away the
plus one because of the math we did in the past so insert now becomes Big O of N and that might not be ideal because if
you're in the habit of inserting things frequently that could start to add up and add up and add up and this is why
computer programs and websites and mobile apps could be slow if you're not being mindful of these kinds of
tradeoffs so what about uh just for good measure uh Omega notation in maybe the best case well just to recap here we
could get lucky and search could just take one step because you might just get lucky and boom the number you're looking
for is right there in the middle if using binary search or even linear search for that matter and insert too if
there's enough room and we didn't have to move all of those numbers 1 2 and three to a new location you could get
lucky and we could as someone suggested just put the number four right there at the end and if we don't get
lucky it might take n steps if we do get lucky it might just take the one or constant number of steps in fact let me
go ahead and do this how about we do something like this let me switch over to some code here let me start to make a
program called list.c and in list.c let's start with the old way so we kind of follow the breadcrumbs we've laid
for ourselves as follows so in this list.c I'm going to include stdio.h int main void as usual then inside
of my code here I'm going to go ahead and give myself the first version of memory so int list[3] is now
implemented at the moment in an array so we're rewinding for now to week two style code and then let me just
initialize this thing at the first location will be one at the next location will be two and at the last
location will be three so the array is zero indexed always but just for the sake of discussion though I'm putting in the
numbers one two three like a normal person might all right so now let's just print these out for int i gets 0 i
less than 3 i++ let's go ahead now and print out using printf percent i backslash n list bracket i so very simple
program kind of inspired by what we did in week two just to create and then print out the contents of an array
let's make list so far so good ./list and voila we see 1 2 3 now let's start to
practice some of what we're preaching with this new syntax so let me go in now and get rid of the array version and let
me zoom out a little bit to give ourselves some more space and now let's begin to create a list of size three so
if I'm going to do this now dynamically so that I'm allocating these things again and again let me go ahead and do
this let me give myself a list that's of type int star equal to the return value of malloc of
three times whoops three times the size of an int so what this is going to do for me is give me enough memory for that
very first picture we drew on the board which was the array containing one two and three but laying the foundation to
be able to resize it which was ultimately the goal so my syntax is a little different here I'm going to use
malloc and get memory from the so-called heap as we called it last week instead of using the stack by just doing the
previous version where I said int list[3] that is to say this line of code from the first version is in some
sense identical to this line of code in the second version but the first line of code puts the memory on the stack
automatically for me the second line of code that I've left here now is creating an array of size three but it's putting
it on the heap and that's important because it was only on the heap and via this new function last week malloc that
you can actually ask for more memory and even give it back when you just use the first notation int list[3] you have
uh permanently given yourself an array of size three you cannot add to that in code so let me go ahead and do this if
list equals equals NULL something went wrong the computer's out of memory so let's just return one and quit out of
this program there's nothing to see here so just a good error check there now let me go ahead and initialize this list so
list bracket zero will be one again list bracket one will be two and list bracket two will be three so that's the same
kind of syntax as before and notice this equivalence recall that there's this relationship between chunks of memory
and arrays and arrays are really just doing pointer arithmetic for you when the square bracket notation is used so if
I've asked myself here in line five for enough memory for three integers it is perfectly okay to treat it now like an
array using square bracket notation because the computer will do the arithmetic for me and find the first
location the second and the third if you really want to be kind of cool and hacker like well you could say *list
= 1 and *(list + 1) = 2 and *(list + 2) = 3 that's the same thing using very explicit pointer
arithmetic which we looked at briefly last week but this is atrocious to look at for most people it's just not very
user friendly it's longer to type so most people even when allocating memory dynamically as I did a second ago would
just use the more familiar notation of an array all right so let's go on now suppose uh time passes and I realized oh
shoot I really wanted this array to be of size four instead of size three now obviously I could just rewind and like
fix the program but suppose that this is a much larger program and I've realized at this point that I need to be able to
dynamically add more things to this array for whatever reason well let me go ahead and do this let me just say all
right list should actually be the result of asking for four uh chunks of memory from malloc and
then I could do something like this uh list bracket 3 equals 4 now this is buggy potentially in a couple of ways
but let me ask first what's really wrong first with this code the goal at hand is to start
with the array of size three with the one two three and I want to add a number four to it so at the moment in line 17
I've asked the computer for a chunk of four integers just like the picture and then I'm adding the number four to it
but I kind of have skipped a few steps and broken this somehow [Music]
yeah yeah I don't necessarily know where this is going to end up in memory it's probably not going to be immediately
adjacent to the previous chunk and so yes I even though I'm putting the number four there I haven't copied the one the
two or the three over to this chunk of memory so well let me fix well hm that's actually indeed really the essence of
the problem I am orphaning the original chunk of memory if you think of the picture that I drew earlier the
line of code up here on line five that allocates space for the initial three integers this code is fine this code is
fine but as soon as I do this I'm clobbering the value of list and saying no no no don't point at this chunk of
memory point at this chunk of memory at which point I've forgotten if you will where the original chunk of memory is
the right way to do something like this would be a little more involved let me go ahead and give myself a temporary
variable and I'll literally call it temp spelled tmp kind of like I did last week so that I can now ask the computer for a
completely different chunk of memory of size four I'm going to again say if temp equals NULL I'm going to say oh bad
things happened here so let me just return one and you know what just to be tidy let me free the original list
before I quit because remember from last week anytime you use malloc you eventually have to use free but this
chunk of code here is just a safety check if there's no more memory there's nothing to see here I'm just going to
clean up my state and quit but now if I have asked for this chunk of memory now I can do this for int i get whoops for
int i gets 0 i is less than 3 i++ what if I do something like this temp bracket i equals list bracket i that
would seem to have the effect of copying all of the memory from one to the other and then I think I need to do one last
thing temp bracket 3 gets the number four for instance again I'm kind of just hardcoding the numbers for the sake of
discussion after I've done this what could I now do I could now set list equal to
temp and now I have updated my list properly so let me go ahead and do this for int i gets 0 i is less than 4 i
++ let me go ahead and print each of these elements out with percent i using list bracket i and then I'm going to
return zero just to signify that all is successful now so to recap we initialize the original array of size three and
plug in the values 1 2 3 time passes and then I realize wait a minute I need more space and so I asked the computer for a
second chunk of memory this one of size four just as a safety check I make sure that temp doesn't equal null because if
it does I'm out of memory so I should just quit altogether but once I'm sure that it's not null I'm going to copy all
the values from the old list into the new list and then I'm going to add my new number at the end of that list and
then now that I'm done playing around with this temporary variable I'm going to remember in my list variable what the
address is of this new chunk of memory and then I'm going to print all of those values out so at least aesthetically
when I make this new version of my list except for my missing semicolon let me try this again when I make list okay
what did I do this time implicitly declaring a library function malloc dot dot dot what's my mistake anytime you see that
kind of error yeah library so up here I forgot to do include stdlib.h which is
where malloc lives let me go ahead and again do make list there we go so I fixed that ./list and I should see 1 2 3 4
but there's still a bug here does anyone see the bug or question oh sorry say
again I forgot to free the original list and we could see this even if not just with our own eyes or intuition if I do
something like valgrind of ./list remember our tool from this past week let me increase the size of my terminal
window temporarily the output is crazy cryptic at first but notice that I have definitely lost some number of bytes
here and indeed it's even pointing at the line number in which some of those bytes were lost so let me go ahead and
go back to my code and indeed I think what I need to do is before I clobber the value of list pointing it at this new
chunk of memory instead of the old I think I now need to First proactively say free the old list of memory and then
change its value so if I now do make list and do ./list the output is still the same and if I cross my fingers
and run valgrind again after increasing my window size hopefully here ah still a bug so better it seems like less memory
is lost what have I now forgotten to do I forgot to free it at the very end
too cuz I still have a chunk of memory that I got from malloc so let me go to the very bottom of the program now and
after I'm done sort of senselessly just printing this thing out let me free the new list and now let me
do make list ./list it still works visually now let's do valgrind
of ./list enter and now hopefully all heap blocks were freed no leaks are possible so this is perhaps the best
kind of output you can see from a tool like valgrind I used the heap but I freed all the memory as well so there were two
fixes needed there any questions then on this array based approach the first of which is statically allocating
an array so to speak by just hard coding the number three this second version now is dynamically allocating the array
using not the stack but the Heap but it too suffers from the slowness we described earlier of having to copy all
those values from one to the other okay uh hand was over here why did you not have to free the temp
good question why did I not have to free the temp I essentially did eventually because temp was pointing at the chunk
of four integers but on line 33 here I assigned list to be identical to what temp was pointing at and so when I
finally freed list that was the same thing as freeing temp in fact if I wanted to I could say free temp here and
it would be the same but conceptually it's sort of wrong because at this point in the story I should be freeing the
actual list not that temporary variable but they were the same at that point in the story
yeah good question and long story short everything we're doing this far is still in the world of arrays the only
distinction we're making is that in version one when I said int list bracket 3 close bracket that was an array of
fixed size so-called statically allocated on the stack as per last week this version now is still dealing with
arrays but I'm kind of flexing my muscles and using dynamic memory allocations so that I can still use an
array per the first pictures we started talking about but I can at least grow the array if I want so we haven't even
now solved this even better in a sense with a linked list that's going to come next [Music]
yeah how am I able to free list I freed the original address of list I then changed what list is storing I'm moving
its Arrow to a new chunk of memory and that is perfectly reasonable for me to now manipulate because now list is
pointing at the same value as temp and temp is what was given the return value of malloc the second time so that chunk
of memory is valid so these are just um you know squares on the board right there's just pointers inside of them so
what I'm technically saying is I'm not pointing I'm not freeing list per se I am freeing the chunk of memory that
begins at the address currently in list therefore if a few lines later I change what the address is in list totally
reasonable to then touch that memory and eventually free it later because you're not freeing the variable per se you're
freeing the address in the variable good distinction all right so let me back up here and just now make one final edit so
let's finish this with one final Improvement here because it turns out there's a somewhat better way to
actually resize an array as we've been doing here and there's another function in stdlib.h that's called realloc
for reallocate and I'm just going to go in and make a little bit of a change here so that I can do the following um
let me go ahead and first comment this now so we can keep track of what's been going on this whole time so uh
dynamically allocate an array of size three assign three numbers to that array time passes allocate new array of size
4 copy numbers from old array into new array and add fourth number to new array free old
array um remember if you will new array using my same list variable and now print new
array free new array hopefully that helps and we'll post this code online after too which tells a more explicit
story so it turns out that we can reduce some of the labor involved with this um not so much with the printing here but
with this copying turns out C does have a function called realloc that can actually handle the resizing of an array
for you as follows I'm going to scroll up to where I previously allocated a new array of size four and I'm instead going
to say this resize old array to be of size four now previously this wasn't necessarily
possible because recall that we had painted ourselves into a corner with the example on the screen where hello world
happened to be right after the original array but let me do this let me use realloc for reallocate and pass in not
just the size of memory we want this time but also the address that we want to resize which again is this array
called list all right the code thereafter is pretty much the same but what I don't need to do is this so
realloc is a pretty handy function that will do the following if at the very beginning of class when we had one two
three on the board and someone's Instinct was to just plop the four right at the end of the list if there's
available memory realloc will just do that and boom it will just grow the array for you in the computer's memory
if though it realizes sorry there's already a string like hello world or something else there realloc
will handle the trouble of moving that whole array from one chunk of memory originally to a new chunk of memory and
then realloc will return to you the address of that new chunk of memory and it will handle the process of freeing
the old chunk for you so you do not need to do this yourself so in fact let me go ahead and get rid of this as well
realloc just condenses a lot of what we just did into a single function whereby uh realloc handles it for you all
right so that's the final Improvement on this array based approach so what now knowing what your memory is what can we
now do with it that solves that kind of problem because the world is going to get really slow and our apps and our
phones and our computers are going to get really slow if we're just constantly wasting time moving things around in
memory what could we perhaps do instead well just just one new piece of syntax today that builds on these three pieces
of syntax from the past recall that we've looked at struct which is a keyword in C that just lets you invent your
own structure your own variable if you will in conjunction with typedef which lets you say a person has a name and a
number or something like that or a candidate has a name and some number of votes you can encapsulate multiple
pieces of data inside of just one using struct what did we use the dot notation for now a couple times what does the dot
operator do in C perfect to access the field inside of a structure so if you've got a person
with a name and a number you could say something like person.name or person.number if person is the name of one such
variable star of course we've seen now in a few ways like way back in week one we saw it as like multiplication uh last
week we began to see it in the context of pointers whereby you use it to declare a pointer like int star P or
something like that but we also saw it in one other context which was like the opposite which was the dereference
operator which says if this is an address that is if this is a variable like a pointer and you put a star in
front of it then with no int or no char No data type in front of it that means go to that address and it dereferences
the pointer and goes to that location so it turns out that using these three building blocks you can actually start
to now use your computer's memory almost any way you want and even next week when we transition to Python and you start to
get a lot of features for free like a single line of code will just do so much more in Python than it does in C it
boils down to those basic Primitives and just so you've seen it already it turns out that it's so common in C to use this
operator to go inside of a structure and this operator to go to an address that there's shorthand notation for it AKA
syntactic sugar that literally looks like an arrow so recall last week I was in the habit of pointing even with the
big foam finger this arrow notation a hyphen and an angle bracket uh denotes going to uh an address and looking at
a field inside of it but we'll see this in practice in just a bit so what might be the solution now to this problem we
saw a moment ago whereby we had painted ourselves into a corner and our memory a few moments ago looked like this we
could just copy the whole existing array to a new location add the four and go about our business what would another
perhaps better solution longer term be that doesn't require constantly moving stuff
around maybe hang in there for your instincts if you know the sort of Buzz phrase we're looking for from past
experience hang in there but if we want to avoid moving the one two and the three but we still want to be able to
add endless amounts of data what could we do yeah so maybe create some kind of list using pointers that just kind of
point at a new location right in an ideal world even though this uh piece of memory is being used by this H in the
string hello world maybe we could somehow use a pointer from last week like an arrow that says after the three
oh I don't know go down over here to this location in memory and you just kind of stitch together these uh
integers in memory so that each one leads to the next it's not necessarily the case that it's literally back to
back that would have the downside it would seem of costing us a little bit of space like a pointer which recall takes
up some amount of space typically eight bytes or 64 bits but I don't have to copy potentially a huge amount of data
just to add one more number and so these things do have a name and indeed these things are what generally would be
called a linked list a linked list captures exactly that kind of intuition of
linking together things in memory so let's take a look at an example here's computer's memory in the abstract
suppose that I'm trying to create an array no let's generalize it as a list now of numbers an array has a very
specific meaning it's memory that's contiguous back to back to back at the end of the day I as the programmer just
care about the data 1 two 3 four and so forth I don't really care how it's stored until uh I don't care how it's
stored when I'm writing the code I just want it to work at the end of the day so suppose that I first insert my number
one and who knows it ends up up there at location 0x123 for the sake of discussion all
right maybe there's something already here and heck maybe there's something already here but there's plenty of other
options for where this thing can go and suppose that for the sake of discussion the first available spot for the next
number happens to be over here at uh location 0x456 for the sake of discussion so that's where I'm going to
plop the number two and where might the number three end up oh I don't know maybe down over there at 0x789 the point
being I don't know what is or really care about everything else that's in the computer's memory I just care that there
are at least three locations available where I can put my one my two and my three but the catch is now that we're
not using an array we can't just naively assume that you just add one to an index and boom you're at the next number add
two to an index and boom you're at the next next number now you kind of have to leave these little breadcrumbs or use
the arrow notation to kind of lead from one to the other and sometimes it might be close a few bytes away maybe it's a
whole gigabyte away in an even bigger computer's memory so how might I do this like where do these pointers go as you
proposed right all I have access to here are bites I've already stored the one the two and the three so what more
should I do okay yeah so let me put the pointers right next to these numbers
let me at least plan ahead so that when I ask the computer like malloc recall from last week for some memory I don't
just ask it now for a space for just the number let me start getting into the habit of asking malloc for enough space
for the number and a pointer to another such number so it's a little more aggressive of me to ask for more memory
but I'm kind of planning ahead and here's an example of a trade-off almost anytime in CS when you start using more
space you can save time or if you try to conserve space you might have to lose time there being that kind of trade-off
there so how might I solve this well let me abstract this away and either next to or below I'm just drawing it uh
vertically just for the sake of discussion so the arrows are a bit prettier I've asked malloc for now twice
as much space it would seem than I previously needed but I'm going to use this second chunk of memory to refer to
the next number and I'm going to use this chunk of memory to refer to the next essentially stitching this thing
together so what should go in this first box well I claim the number 0x456 and it's written in hex because it
represents a memory address but this is the equivalent of sort of drawing an arrow from one to the other as a
little uh check here what should go in this second box if the goal is to stitch these together in order one two three
feel free to just shout this out okay oh okay that worked well so 0x789 indeed and you can't do that with
hands because I can't count that fast so 0x789 should go here because that's like a little breadcrumb to the next and
then we don't really have terribly many possibilities here this has to have a value right because at the end of
the day it's got to uh use its 64 bits in some way so what value should go here if this is the end of this
list so it could be 0x123 the implication being that it would kind of be a cyclical list which is okay but
potentially problematic if any of you have accidentally sort of lost control over your uh codespace because you had
an infinite loop this would seem a very easy way to give yourself the accidental uh probability of an infinite loop what
might be simpler than that and ward that off say again so just the null character not NUL confusingly which is at the end
of strings but NULL as we introduced it last week which is the same as 0x0 so this is just a special value that
programmers decades ago decided that if you store the address zero that's not a valid address there's never going to be
anything useful at 0x0 therefore it's a sentinel value just a special value that indicates that's it there's nowhere
further to go it's okay to come back to your suggestion of making a cyclical list but we better be smart enough to
maybe remember where did the list start so that you can detect Cycles if you start looping around in this structure
otherwise all right but these addresses who really cares at the end of the day if we abstract this away it really just
now looks like this. And indeed, this is how most anyone would draw this on a whiteboard if having a discussion at
work, talking about what data structure we should use to solve some problem in the real world. We don't care generally
about the addresses. We care that in code we can access them, but in terms of the concept alone, this would be perhaps the
right way to think about this. All right, let me pause here to see if there are any questions on this idea of creating a
linked list in memory by storing not just the numbers like 1, 2, 3, but twice as much data, so that you have
little breadcrumbs in the form of pointers that can lead you from one to the
next. Any questions on these linked lists? Any questions? No? All right. Oh, yeah, over
here. This does take more memory than an array, because I now need space for these pointers. And to be clear, I technically
didn't really draw this to scale. Thus far in the class we've generally thought about integers like 1, 2, and 3 as
being four bytes, or 32 bits. I made the claim last week that on modern computers pointers tend to be 8 bytes, or
64 bits. So technically this box should actually be a little bigger. It was just going to look a little stupid in
the picture, so I abstracted it away. But indeed, you're using more space as a result. Oh, sorry, how does
the computer distinguish useful data from used data, so, for instance, garbage values from non-garbage values? For now,
think of that as the job of malloc. So when you ask malloc for memory, as we started to last week, malloc keeps track
of the addresses of the memory it has handed to you as valid values. The other type of memory you use is not just
from the heap. Recall we briefly discussed that malloc uses space from the heap, which was drawn at the top of
the picture, pointing down. There's also stack memory, which is where all of your local variables go and where all of the
memory used by individual functions goes, and that was drawn in the picture as working its way up. That's just an artist's
rendition of direction. The compiler essentially will also help keep track of which values are valid or not inside of
the stack, or really the underlying code that you've written will keep track of that for you. So it's managed for you at
that point. All right, good question. Sorry it took me a bit to catch on. So let's now translate
this to actual code. How could we implement this idea of, let's call these things nodes? That's a term of art in
CS: whenever you have some kind of data structure that encapsulates information, node, N-O-D-E, is the generic term
for that. So each of these might be said to be a node. Well, how can we do this? Well, a couple of weeks ago we saw
how we could represent something like a student or a candidate, or rather a person, which we said has a name and a
number. And we used a few pieces of syntax here. One, we used the struct keyword, which gives us a data structure.
We used typedef, which defines the name person to be our new data type representing that whole structure. So we
probably have the right ingredients here to build up this thing called a node. And just to be clear, what should go inside
of one of these nodes, do we think? It's not going to be a name or a number, obviously, but what should a node have in
terms of those fields? Yeah, so a number and a pointer in some form. So let's translate this to actual
code. Let's rename person to node to capture this notion here. And the number is easy: if it's just going to be
an int, that's fine, we can just say int number or int n or whatever you want to call that particular field. The next
one's a little nonobvious, and this is where things get a little weird at first, but in retrospect it should all kind of
fit together. Let me propose that ideally we would say something like node *next, and I could call the word next
anything I want. Next just means what comes after me, which is the notion I'm using it in. So a lot of CS people would just
use next to represent the name of this pointer. But there's a catch here: C and C compilers are pretty naive. Recall they
only look at code top to bottom, left to right, and anytime they encounter a word they have never seen before, bad things
happen: you can't compile your code, you get some cryptic error message, or the like. And that seems to be about to
happen here, because if the compiler is reading this code from top to bottom, it's going to say, oh, inside of
this struct should be a variable called next, which is of type node *. What the heck is a node? Because it literally does
not find out until two lines later, after that semicolon. So the way to avoid this, which we haven't quite seen before, is
that you can temporarily name this whole thing up here struct node, and then down here, inside of the data structure, you
say struct node *, and then you leave the rest alone. This is kind of a workaround. This is possible because now
you're teaching the compiler from the first line that here comes a data structure called struct node. Down here
you're shortening the name of this whole thing to just node. Why? It's just a little more convenient than having to
write struct everywhere. But you do have to write struct node * inside of the data structure, and that's OK because
it's already come into existence as of that first line of code. So that's the only fundamental difference between what
we did last week with a person or a candidate. We just now have to use this workaround
syntactically. All right, yeah, question? Why is the next variable a
struct node * pointer and not an int * pointer, for instance? So think about the picture we are trying to draw.
Technically, yes, each of these arrows I deliberately drew is pointing at the number, but that's not all. They need to
point at the whole data structure in memory, because the computer ultimately, and the compiler in turn, needs to know
that this chunk of memory is not just an int. It is a whole node. Inside of a node is a number and also another pointer. So
when you draw these arrows, it would be incorrect to point at just the number, because that throws away information
that would leave the compiler wondering, OK, I'm at a number, where the heck is the pointer? You have to tell it that
it's pointing at a whole node, so it knows a few bytes away is that corresponding pointer. Good question.
Yeah, really good question. It would seem that just as copying the array earlier required twice as much memory, because we
copied from old to new (so technically twice as much plus one for the new number), here too it looks like we're
using twice as much memory. And to my comment earlier, it's even more than twice as much memory, because these
pointers are eight bytes and not just four bytes like a typical integer is. The differences are these. In the context of
the array, you were using that memory temporarily. So yes, you needed twice as much memory, but then you were quickly
freeing the original, so you weren't consuming more memory long-term than you might need. The difference here, too, is
that, as we'll see in a moment, it's going to be relatively quick for me, potentially, to insert new numbers
in here, because I'm not going to have to do a huge amount of copying. And even though I might still have to follow all
of these arrows, which is going to take some amount of time, I'm not going to have to be asking for more memory and
freeing more memory. And certain operations in a computer, anything involving asking for or giving back
memory, tend to be slower, so we get to avoid that situation as well. There are going to be some downsides, though. This
is not all upside, but we'll see in a bit just what some of those trade-offs actually are. All right, so from here, if
we go back to the structure in code as we left it, let's start to now build up a linked list with some actual code. How do
you go about representing a linked list in C? Well, at the moment it would actually be as simple as this:
you declare a variable called list, for instance, that itself stores the address of a node. That's what node * means,
the address of a node. So if you want to store a linked list in memory, you just create a variable called list, or
whatever else, and you just say that this variable is going to be pointing at the first node in a list, wherever it happens
to end up, because malloc is ultimately going to be the tool that we use to go get any one particular node in
memory. All right, so let's actually do this in pictorial form. When you write a line of code like I just did here, and I
do not initialize it to anything with the assignment operator, an equal sign, it does exist in memory as a box, as I'll
draw it here, called list. But I've deliberately drawn Oscar the Grouch inside of it. Why? To connote what,
exactly? It's a garbage value. I have been allocated the variable in memory called list, which is going to
give me 64 bits, or eight bytes, somewhere, drawn here with this box. But if I myself have not used the assignment operator,
it's not going to get magically initialized to any particular address for me. It's not even going to give me a
node. This is literally just going to hold the address of a future node. So what would be a solution here? Suppose
that I'm beginning to create my linked list but I don't have any nodes yet. What would be a sensible thing to initialize
list to, perhaps? Yeah, again, NULL. So just NULL, right? When in doubt with pointers,
generally it's a good thing to initialize things to NULL, so at least it's not a garbage value. It's a known
value. Invalid, yes, but it's a special value you can then check for with a condition or the like. So this might be a
better way to create a linked list even before you've inserted any numbers into the thing itself. All right, so after that,
how can we go about adding something to this linked list? So now the story looks like this. Oscar is gone, because inside
of this box is all zero bits, just because it's nice and clean, and this represents an empty linked list. Well, if I
want to add the number 1 to this linked list, what could I do? Well, perhaps I could start with code like this,
borrowing inspiration from last week. Let's ask malloc for enough space for the size of a node. And this kind of gets
to your question earlier, like, what is it I'm manipulating here? I don't just need space for an int, and I don't just need
space for a pointer. I need space for both, and I gave that thing a name, node. So sizeof(node) figures out and does the
arithmetic for me and gives me back the right number of bytes. This then stores the address of that chunk of
memory in what I'll temporarily call n, just to represent a generic new node. And it's of type node * because, just like
last week when I asked malloc for enough space for an int and I stored it in an int * pointer, this week if I'm asking
for memory for a node, I'm storing it in a node * pointer. So technically nothing new there, except for this new
term of art and data structure called node. All right, so what does that do for me? It essentially draws a picture like
this in memory. I still have my list variable from my previous line of code, initialized to NULL, and that's why I've
drawn it blank. I also now have a temporary variable called n, which I initialized to the return value of
malloc, which gave me one of these nodes in memory. But I've drawn it having garbage values too, because I don't know
what int is there, I don't know what pointer is there. It's garbage values because malloc does not magically
initialize memory for me. There is another function for that, but malloc alone just says, sure, use this chunk of memory,
deal with whatever's there. So how could I go about initializing this to known values? Well, suppose I want to insert the
number 1 and then leave it at that, a list of size one. I could do something like this, and this is where you have to
think back to some of these basics. My conditional here is asking the question, if n does not equal NULL, that is, if
malloc gave me valid memory and I don't have to quit altogether because my computer's out of memory. If n does not
equal NULL, that is, it equals a valid address, I'm going to go ahead and do this. And this is cryptic-looking syntax
now, but does someone want to take a stab at translating this inside line of code to English? In some sense, how might you
explain what that inner line of code is doing: (*n).number = 1? Let me go further back. No? OK, over here.
Yeah, perfect: the place that n is pointing to, set it equal to 1. Or, using the vernacular of going there, go to the
address in n and set its number field to 1. However you want to think about it, that's fine. But the star again is the
dereference operator here, and we're doing the parentheses, which we haven't needed to do before because we haven't dealt with
pointers and data structures together until today. This just means go there first, and then once you're there, go
access number. You don't want to do one thing before the other, so this is just enforcing order of operations, the
parentheses just like in grade school math. All right, so this line of code is cryptic, it's ugly, it's not something
most people easily remember. Thankfully, there's syntactic sugar that simplifies this line of code to just
this: n->number = 1. And this, even though it's new to you today, should eventually feel a little more familiar, because
this now is shorthand notation for saying start at n, go there as by following the arrow, and when you get there, change
the number field, in this case to 1. So most people would not write code like the previous version. It's just ugly, it's a
couple extra keystrokes. This just looks more like the artist's renditions we've been talking about, and how most CS
people would think about pointers, as really just being arrows in some form. All right, so what have we just done? The picture now, after
setting number to 1, looks a little something like this. So there's still one step missing, and that's of course to
initialize, it would seem, the pointer in this new node to something known, like NULL. So I bet we could do this
with a different line of code. I'm just going to say, if n does not equal NULL, then set n's next field to NULL. Or, more
pedantically, go to n, follow the arrow, and then update the next field that you find there to equal NULL. And again, this
is just doing some nice bookkeeping. Technically speaking, we might not need to set this to NULL if we're going to
keep adding more and more numbers to it, but I'm doing it step by step so that I have a very clean picture and there are no
bugs in my code at this point. But I'm still not done. There's one last thing I'm going to have to do here. If the goal
ultimately was to insert the number 1 into my linked list, what's the last step I should perhaps do
here? Just in English is fine. Yeah? Yes, I now need to update the actual variable that represents my linked list to
point at this brand new node, which is now perfectly initialized as having an integer and a null pointer. Yeah,
technically this is already pointing there, but I described this deliberately earlier as being temporary. I just needed
this to get it back from malloc and clean things up initially. This is the long-term variable I care about, so I'm
going to want to do something simple like this: list = n. And this seems a little weird, that list equals n, but
again, think about what's inside this box. At the moment, this is NULL, because there is no linked list at the
beginning of our story. n is the address of the beginning, and it turns out the end, of our linked list. So it stands to reason
that if you set list equal to n, that has the effect of copying this address up here, or really just copying the arrow
into that same location, so that now the picture looks like this. And heck, if this was a temporary variable, it'll
eventually go away, and now this is the picture. So kind of an annoying number of steps, certainly to walk through verbally
like this, but it's just: malloc to give yourself a node, initialize the two fields inside of it, update
the linked list, and boom, you're on your way. I didn't have to copy anything. I just had to insert something in this
case. Let me pause here to see if there are any questions on those steps, and we'll see it all in context before long with
some larger code. Yes, I drew them separately
just for the sake of the voiceover, doing each thing very methodically. In real code, as we'll transition to now, I could
have and should have just done it all inside of one conditional: after checking if n is not equal to NULL, I could set
number to a value like 1, and I could set the pointer itself to something like NULL. All right, well, let's translate
this, then, into some similar code that allows us to build up a linked list, now using code similar in spirit to before but
using this new primitive. So I'm going to go back into VS Code here. I'm going to go ahead now and delete the entirety of
this old version that was entirely array-based, and now inside of my main function I'm going to go ahead and first
do this. I'm going to first give myself a list of size zero, and I'm going to call that node *list, and I'm going
to initialize that to NULL, as we proposed earlier. But I'm also now going to have to take the additional step of
defining what this node is. So recall that I might do something like typedef struct node. Inside of this struct node
I'm going to have a number, which I'll call number, of type int, and I'm going to have a struct node with a star
that says the next pointer is called next. And I'm going to call this whole thing, more succinctly, node instead of
struct node. Now, as an aside, for those of you wondering what the difference really is between struct node and node,
technically I could do something like this: not use typedef and not use the word node alone. This syntax here would
actually create for me a new data type called, verbosely, struct node, and I could use this throughout my code, saying
struct node, struct node. That just gets a little tedious, and it would be nicer just to refer to this thing more simply
as a node. So what typedef has been doing for us is, again, it lets us invent our own word that's even more succinct,
and this just has the effect now of calling this whole thing node, without the need subsequently to keep saying struct
all over the place.
Just FYI. All right, so now that this thing exists in main, let's go ahead and do this. Let's add a number to list, and
to do this I'm going to give myself a temporary variable. I'll call it n for consistency. I'm going to use malloc to
give myself the size of a node, just like in our slides, and then I'm going to do a little safety check. If n equals equals
NULL, I'm going to do the opposite of the slides: I'm just going to quit out of this program, because there's nothing
useful to be done at this point. But most likely my computer's not going to run out of memory, so I'm going to assume we
can keep going with some of the logic here. If n does not equal NULL, that is, it's a valid memory address, I'm going
to say n->number = 1, and then n->next = NULL. And now, update list to point to the new node:
list = n. So at this point in the story we've essentially constructed what was that first picture, which looks like
this. This is the corresponding code via which we built up this node in memory. Suppose now we want to add the number
2 to the list. So let's do this again: add a number to list. How might I do this? Well, I don't need to
redeclare n, because I can use the same temporary variable as before. So this time I'm just going to say n = malloc and
the size of a node. I'm again going to have my safety check, so if n equals equals NULL, then let's just quit out of
this altogether. But I have to be a little more careful now. Technically speaking, what do I
still need to do before I quit out of my program, to be really proper? Free the memory that did succeed
a little higher up. So I think it suffices to free what is now called list, way at the top. All right, now if all was
well, though, let's go ahead and say n->number = 2, and now n->next =
NULL. And now let's go ahead and add it to the list. If I go ahead and
do list->next = n, I think what we've just done is build up the equivalent now of this in the computer's
memory, by going to list's next field, which is synonymous with the 1 node's bottommost box, and storing the
address that was in n, which a moment ago looked like this. And I'm just throwing away in the picture the
temporary variable. All right, one last thing to do. Let me go down here and say, add a
number to list. n = malloc, let's do it one more time, sizeof(node). And clearly in a real program we might want
to start using a loop and do this dynamically, or a function, because it's a lot of repetition now. But just to go
through the syntax here, this is fine. If n equals equals NULL, out of memory for some reason, let's return 1. But
we should free the list itself and even the second node, list->next. But I've deliberately done
this poorly. All right, this is a little more subtle now, and let me get rid of the highlighting just so it's a little
more visible. If n happens to equal NULL and something really just went wrong there, out of memory, why am I
freeing two addresses now? And again, it's not that I'm freeing those variables per se. I'm freeing the addresses in those
variables. But there's also a bug with my code here, and it's subtle. Let me ask more pointedly: this
line here, 43, what is that freeing specifically? Can I go to you? Not quite, but that's OK.
I'm not freeing list two times. Technically I'm freeing list once and list->next once. But let me just ask the
more explicit question: what am I freeing with line 43 at the moment? Which node? I think node number 1. Why? Because
if 1 is at the beginning of the list, list contains the address of that number 1 node, and so this frees that node.
This line of code, you might think now intuitively, OK, it's probably freeing node number 2. But this is bad, and
this is subtle. Valgrind might help you catch this, but by eyeing it, it's not necessarily obvious. You should never
touch memory that you have already freed. And so the fact that I did this in this order is very bad, because I'm telling the
operating system, I don't need the list address anymore, do with it what you want. And then literally one
line later you're saying, wait a minute, let me actually go to that address for a moment and look at the next field of
that first node. It's too late. You've already given up control over the node. So it's an easy fix in this
case logically, but we should be freeing the second node first and then the first one, so that we're doing it in
essentially reverse order. And again, Valgrind would help you catch that, but that's the kind of thing one needs to be careful
about when touching memory at all. You cannot touch memory after you've freed it. But here is my last step. Let me go ahead
and update the number field of n to be 3 and the next field of n to be NULL, and then, just like
in the slide earlier, I think I can do list->next->next = n, and that has the effect now of building up in the
computer's memory essentially this data structure, very manually, very pedantically. In a better world we'd
have a loop and some functions that are automating this process, but for now we're doing it just to play around
with the syntax. So at this point, unfortunately, suppose I want to print the numbers. It's
no longer as easy as int i = 0; i < 3; i++, because you cannot just do something like
this, because pointer arithmetic no longer comes into play when it's you who are stitching together
the data structure in memory. In all of our past examples with arrays, you've been trusting that all of the bytes in
the array are back to back to back, so it's perfectly reasonable for the compiler and the computer to just figure
out, oh, well, if you want bracket zero, that's at the beginning; bracket one, it's one location over; bracket two, it's one
more location over. This is way less obvious now, because even though you might want to go to the first element in the linked
list, or the second, or the third, you can't just jump to those arithmetically by doing a bit of math.
Instead you have to follow all of those arrows. So with linked lists you can't use this square bracket notation anymore,
because one node might be here, over here, over here, over here. You can't just use some simple offset. So I think our code's
going to have to be a little fancier, and this might look scary at first, but it's just an application of some of the basic
definitions here. Let me do a for loop that actually uses a node * variable, temp, initialized to the list itself. I'm going
to keep doing this so long as temp does not equal NULL, and on each iteration of this loop I'm going to update temp to be
whatever temp->next is. And I'll rewind in a moment and explain in more detail, but when I print something here
with printf, I can still use %i, because it's still a number at the end of the day. But what I want to print out
is the number that this temporary variable points at. So maybe the ugliest for loop we've ever seen, because it's mixing not just the
idea of a for loop, which itself was a bit cryptic weeks ago, but now I'm using pointers instead of integers. But I'm not
violating the definition of a for loop. Recall that a for loop has three main things in parentheses: what do you want
to initialize first, what condition do you want to keep checking again and again, and what update do you want to
make on every iteration of the loop. So with that basic definition in mind, this is giving me a temporary variable called
temp that is initialized to the beginning of the list, so it's like pointing my finger at the number 1
node. Then I'm asking the question, does temp not equal NULL? Well, hopefully not, because I'm pointing at a valid node,
that is, the number 1 node. So of course it doesn't equal NULL yet. It won't be NULL until we get to the end of the list. So
what do I do? I start at this temp variable, I follow the arrow, and go to the number field
therein. What do I then do? The for loop says change temp to be whatever is at temp, by following the arrow and grabbing
the next field. That then has the result of being checked against this conditional. Nope, of course it doesn't
equal NULL, because the second node is the number 2 node. NULL is still at the very end. So I print out the number 2.
Next step, I update temp one more time to be whatever is next. That then does not yet equal NULL, so I go ahead and print
out the number 3 node. Then one last time I update temp to be whatever is in temp's next field. But after 1, 2,
3, that last next field is NULL, and so I break out of this for loop altogether. So if I do this in pictorial
form, all we're doing, if I now use my finger to represent the temp variable: I initialize temp to be whatever list is,
so it points here. That's obviously not NULL, so I print out whatever is at temp, follow the arrow to number, and I print
that out. Then I update temp to point here, then I update temp to point here, then I update temp to point here. Wait,
that's NULL. The for loop ends. So again, admittedly much more cryptic than our familiar int i = 0 and so forth, but
it's just a different utilization of the for loop syntax. Yes?
Good question. How is it that I'm actually printing numbers and not
printing out addresses instead? The compiler is helping me here, because I taught it at the very beginning of my
program what a node is, which looks like this here. The compiler knows that a node has a number field and a next field. Down
here in the for loop, because I'm iterating using a node * pointer and not an int * pointer, the compiler
knows that anytime I'm pointing at something, I'm pointing at the whole node. It doesn't matter where specifically in the
rectangle I'm pointing per se; it's ultimately pointing at the whole node itself. And the fact that I then use
temp->number means, OK, adjust your finger slightly so you're literally pointing at the number field and not the
next field. So that's sufficient information for the computer to distinguish the two. Good question. Other
questions, then, on this approach here? Yeah, in back. How would I use a for loop to add
elements to a linked list? You will do something like this, if I may, in problem set five. We will give you some
of the scaffolding for doing this, and this coming week's materials will guide you to it. But let me not
spoil it just yet. Fair question, though. Yeah?
Good question. Is line 49 acceptable even if we freed it earlier? We didn't free it in line 43 in this case, right? You can
only reach line 49 if n does not equal NULL and you do not return on line 45. So that's safe. I was only doing that
freeing if I knew on line 45 that I'm out of here anyway at that point. Good question.
Yeah, correct. If you're asking about temp, because it's in a for loop, does that
mean you don't have to free it? You never have to free pointers per se. You should only free addresses that were returned
to you by malloc. So I haven't finished the program, to be fair, but you're not freeing variables, you're not freeing
fields, you are freeing specific addresses, whatever they may be. So the last thing, and I was kind of stalling on
showing this just because it too is a little cryptic, here is how you can free a whole linked list. In the world of
arrays, recall, it was so easy: you just say free(list), you return 0, and you're done. Not with a linked list, because again
the computer doesn't know what you have stitched together using all of these pointers all over the computer's memory.
You need to follow those arrows. So one way to do this would be as follows. While the list itself is not NULL, so while
there's a list to be freed, what do I want to do? I'm going to give myself a temporary variable called temp again, and
it's a different temp because it's in a different scope: it's inside of the while loop instead of the for loop a few
lines earlier. I am going to initialize temp to be the address of the next node, just so I can get one step ahead of
things. Why am I doing this? Because now I can boldly free the list itself, which does not mean the whole list. Again, I'm
freeing the address in list, which is the address of the number 1 node. That's what list is: it's just the address of
the number 1 node. So if I first use temp to point at the number 2, slightly in the middle of the picture, then it is
safe for me on line 61 at the moment to free list that is the address of the first node now I'm going to say all
right once I freed the first list the first node in the list I can update the list itself to be literally Temp and now
the loop repeats so what's happening here if you think about this picture temp is initially pointing at not the
list but list Arrow next so temp represented by my right hand here is pointing at the number two totally safe
and reasonable to free now the list itself aka the address of the number one node that has the effect of just
throwing away the number one node telling the computer you can reuse that memory for you the last line of code I
wrote updated list to point at the number two at which point my Loop proceeded to do the exact same thing
again and only once my finger is literally pointing at nowhere the null symbol will the loop by nature of a
while loop as I'll toggle back to break out and there's nothing more to be freed so again what you'll see ultimately in
problem set five more on that later is an opportunity to play around with just this syntax but also these ideas but
again even though the syntax is admittedly pretty cryptic we're still using Basics like these for Loops or
while Loops we're just starting to now follow explicit addresses rather than letting the computer do all of the
arithmetic for us as we previously benefited from at the very end of this thing I'm going to return zero as though
all is well and I think then we're good to go all right questions on this linked list code now and again we'll walk
through this again in the coming weeks
yeah sure can we explain this this while loop here for freeing the list so notice that first I'm just asking the obvious
question is the list null because if it is there's no work to be done however while the list is not null
according to line 58 what do we want to do I want to create a temporary variable that points at the same thing that
list Arrow next is pointing at so what does that mean here's list list Arrow next is whatever this thing is here so
if my right hand represents the temporary variable I'm literally pointing at the same thing as the list
is itself the next line of code recall was free the list and unlike in our world of arrays like half an hour ago
where that just meant free the whole darn list you now have taken over control over the computer's memory with
a linked list in ways that you didn't with the array the computer knew how to free the whole array because you malloc'd the whole
thing at once you are now mallocing the linked list one node at a time and the operating system does not keep track of
for you where all these nodes are so when you free list you are literally freeing the value of the list variable
which is just this first node here then my last line of code which I'll flip back to in a second updates list to now
ignore the freed memory and point at two and the story then repeats so again it's just a very pedantic way of using
this new syntax of star notation and the arrow notation and the like to sort of do the equivalent of walking down all of
these arrows following all of these breadcrumbs um but it does take admittedly some getting used to syntax
you only have to do one week but again next week in Python will we begin to abstract a lot of this complexity away
but none of this complexity is going away it's just that someone else the authors of Python for instance will have
automated this kind of stuff for us the goal this week is to understand what it is we're going to get for free so to
speak next week all right questions on these linked lists all right just oh yeah and
back fair question let me summarize as could we have freed this with a for loop
absolutely um it just is a matter of style it's a little more elegant to do it in a while loop according to me but
other people will reasonably disagree um anything you can do with a while loop you can do with a for Loop and vice
versa do while loops recall are a little different but they will always do at least one thing but for loops and while
loops behave the same in this case sure other questions all right well let's just vary things a little bit here
just to see what some of the pitfalls might now be without getting into the weeds of code indeed we'll try to save
some of that for problem set 5's exploration but instead let's imagine that we want to create a list here of
our own um I can offer an exchange for a few volunteers uh some foam fingers to bring to the next game perhaps uh could
we get maybe just one volunteer first come on up you will be our linked list from the get-go what's your name Pedro
Pedro come on up all right thank you to Pedro and if you want to just stand
roughly over here but you are a null pointer so just point sort of at the ground as though you're pointing at zero
all right so Pedro is our linked list of size zero which pictorially might look a little something like this for
consistency with our past pictures now suppose that we want to go ahead and malloc oh how about uh the number two can
we get a a volunteer to be on camera here okay you kind of jumped out of your seat do you want to come
up okay you really want the foam finger I see all right round of applause sure okay and what's your name Caleb say
again Caleb hayin Caleb Caleb Caleb sorry all right so here is your number two for your number field and here is
your pointer and come on let's say that there was room for Caleb like right there that's perfect so Caleb got malloc'd
if you will over here so now if we want to insert Caleb and the number two into this linked list well what do we need to
do I already initialized you to two and pointing as you are to the ground means you're initialized to null for your next
field Pedro what you should you perfect what should Pedro do Point that's fine too so Pedro is now pointing at the list
so now our list looks a little something like this so so far so good all is well so the first couple of these will be
pretty straightforward let's insert one more or if anyone really wants another foam finger here how about right in the
middle come on down and just in anticipation how about let's malloc someone else okay your friends are
pointing at you do you want to come down to preemptively this is a a pool of memory if you will what's your name
Hannah Hannah all right Hannah you are number four and hang there for just a moment
all right so we've just malloc'd Hannah and Hannah how about Hannah suppose you ended up over there in just some random
location all right so what should we now do if the goal is to keep these things sorted how about so Pedro do you have to
update yourself no no all right Caleb what do you have to do okay and Hannah what should you be
doing I would just you're it's just you for now so point at the ground representing null okay so again
demonstrating the fact that unlike in past weeks where we had our nice clean array back to back to back contiguously
these guys are deliberately all over the stage so let's malloc another how about number five what's your name Jonathan
Jonathan all right Jonathan you are our number five and pick your favorite place in memory
okay all right so Jonathan's now over there and Hannah's over there so five we want to point Hannah at number five so
you of course you're going to point there and where should you be pointing down to represent null as well okay so
pretty straightforward but now things get a little interesting and here we'll use a chance to without the weeds of
code point out how order of operations is really going to matter suppose that I next want to allocate say the number one
and I want to insert the number one into this list yes this is what the code would look like but if we sort of act
this out could we get one more volunteer uh how about on the end there in the sweater yeah come on down we have what's
your name Lauren Lauren okay Lauren come on down and how about Lauren why don't you
go right in here in front if you don't mind here is your number here is your pointer so I've initialized Lauren to
the number one and your pointer will be null pointing at the ground uh where do you belong if we're maintaining sorted
order looks like right at the beginning what should happen here okay so Pedro has presumed to point
now at Lauren but how do you know where to point number two Pedro's undoing what he
did a moment ago so this was deliberate and that was perfect that Pedro presumed to point immediately at Lauren why you
literally just orphaned all of these folks all of these chunks of memory why because Pedro was our only variable
pointing at that chunk of memory this is the danger of using pointers and dynamic memory allocation and building your own
data structures the moment you point temporarily if you could to Lauren I have no idea where he's pointing to I
have no idea how to get back to Caleb uh or Hannah or anyone else on stage so that was bad so you did undo it so
that's good I think we need Lauren to make a decision first who should you point at so pointing at Caleb why
because you're pointing at literally who Pedro is pointing at Pedro now what are you safe to do
good so order of operations there matters and if we had just done this line of code in red here list equals n
that was like Pedro's first instinct bad things happen and we orphaned the rest of the list but if we think through it
logically and do this as Lauren did for us instead we've now updated the list to look a little something more like this
let's do one last one we got one more foam finger here for the number three how about on the end yeah you want to
come down all right one final volunteer all right what's your name Miriam sorry Miriam Miriam all right so here is your
number three here's your pointer if you want to go maybe in the middle of the stage in a random memory location so
here too the goal is to maintain sorted order so let's ask the audience who or what number should point at whom first
here so we don't screw up and orphan some of the memory and this is if we do orphan memory this is what's called
again per last week a memory leak your Mac your PC your phone can start to slow down if you keep asking for memory but
never give it back or lose track of it so we want to get this right who should point at whom or what number say again
to four three should point at four so three do you want to point at four and not uh so okay good and how did
you know Miriam whom to point at perfect okay so copying Caleb why because if you look at where this list
is currently constructed and you can cheat on the board here two is pointing to four if you point at whoever Caleb
number two is pointing at that indeed leads you to Hannah for number four so now what's the next step to stitch This
Together our our voice in the crowd two to two to three so two to three so Caleb I think it's now safe for you to
decouple because someone is already pointing at Hannah we haven't orphaned anyone so now if we follow the
breadcrumbs we've got Pedro leading to one to two to three to four to five we need the numbers back but you can keep
the foam fingers thank you to our volunteers here thank you thank you you can put the numbers
here thank you to all so this is only to say that when you start looking at the code this week and the problem set
it's going to be very easy to sort of lose sight of the forest for the trees because the code does get really dense
but the ideas again really do bubble up to these higher level descriptions and if you think about data structures at
this level if you go off and program after a class like CS50 and you're whiteboarding
something with a friend or a colleague most people think at and talk at this level and they just assume that yeah if
we went back and look at our textbooks or class notes we could figure out how to implement this but the important
stuff is the conversation and the ideas up here even though via this week will we get some practice with the actual
code so when it comes to analyzing an algorithm like this let's consider the following what might be now the running
time of operations like searching and sort and searching and inserting into a linked list we talked about arrays
earlier and we had some binary search possibilities still so long as it's an array but as soon as we have a linked list
these arrows like our volunteers could be anywhere on stage and so you can't just assume that you can jump
arithmetically to the middle element to the middle element to the middle one you pretty much have to follow all of these
breadcrumbs again and again so how might that inform what we see well consider this too even though I keep drawing all
these pictures with all of the numbers exposed and all of us humans in the room can easily spot where the one is where
the two is where the three is the computer again just like with our lockers and arrays can only see one
location at a time and the key thing with a linked list is that the only address we've fundamentally been
remembering is what Pedro represented a moment ago he was the link to all of the other nodes and in turn each person led
to the next but without Pedro we would have lost some of or all of the linked list so when you start with a linked
list if you want to find an element as via search you have to do it linearly following all of the arrows following
all of the pointers on the stage in order to get to the node in question and only once you hit null can you conclude
yep it was there or no it was not so given that if a computer essentially can only see the number one or the number
two or the number three or the number four or the number five one at a time how might we think about
the running time of search it is indeed big O of n but why is that well in the worst case the number you might be
looking for is all the way at the end and so obviously you're going to have to search all of the N elements and I drew
these things with boxes on top of them because again even though you and I can immediately see where the five is for
instance the computer can only figure that out by starting at the beginning and going there so there too is another
tradeoff it would seem that overnight we have lost the ability to do a very very powerful algorithm from week zero known
as binary search right it's gone because there's no way in this picture to jump mathematically to the middle node unless
you remember where it is and then remember where every other node is and at that point you're back to an array
linked lists by design only remember the next node in the list all right how about something like insert in the worst
case perhaps how many steps might it take to insert something into a linked list someone else someone else yeah say
again n squared fortunately it's not that bad it's not as bad as n squared that typically means doing n things n times
and I think we can stay under that but not a bad bad thought yeah why would it be
[Music] n okay so you're to summarize you're proposing n because to find where the
thing goes you have to Traverse potentially the whole list because if I'm inserting the number six or the
number 99 that numerically belongs at the very end I can only find its location by looking for all of them at
this point though in the term and really this point in the story you should start to question these kinds of very
simplistic questions to be honest because the answer is almost always going to depend right if I've just got a
linked list that looks like this the first question back to uh someone asking this question would be well does
the list need to be sorted right I've drawn it as sorted and it might imply as much so that's a reasonable assumption
to have made but if I don't care about maintaining sorted order I could actually insert into a linked list in
constant time why I could just keep inserting into the beginning into the beginning into the beginning and even
though the list is getting longer the number of steps required to insert something before the first element is
not growing at all you just keep kind of inserting inserting if you want to keep it sorted though yes it's going to be
indeed Big O of n but again these kinds of now assumptions are going to start to matter so let's for the sake of
discussion say it's Big O of n if we do want to maintain sorted order but what about um in the case of not caring it
might indeed be Big O of one and now these are the kinds of decisions that we'll start to leave to you what about in
the best case here if we're thinking about big Omega notation then frankly we could just get lucky in the best case
and the element we're looking for happens to be at the beginning or heck we just blindly insert to the beginning
irrespective of the order that we want to keep things in all right so besides that how can we improve further on this
design we don't need to stop at linked lists because honestly it's not been a clear win like linked lists allow us to use
more of our memory because we don't need massive growing chunks of contiguous memory so that's a win but they still
require Big O of n time to find the end of it if we care about order we're using at least twice as much memory for the
darn pointer so that seems like you know a sidestep it's not really a step forward so can we do better here's where
we can now accelerate the story by just stipulating that hey even if you haven't used this technique yet we would
seem to have an ability to stitch together pieces of memory just using pointers and anything you could imagine
drawing with arrows you can implement it would seem in code so what if we leverage a second dimension instead of
just stringing together things laterally left to right essentially even though they were bouncing around on the screen
what if we start to leverage a second dimension here so to speak and build more interesting structures in the
computer's memory well it turns out that in a computer's memory we could create a tree similar to a family tree if you've
ever seen or drawn a family tree with grandparents and parents and siblings and so forth It's kind of this uh uh you
know so inverted branch of a tree that grows um typically when it's drawn downward instead of upward like a
typical tree but that's something we could translate into code as well specifically let's do something called a
binary search tree which is a type of tree and what I mean by this is the following notice this this is an example
of an array from like week two when we first talked about those and we had the lockers on stage and recall that what
was nice about an array if one it's sorted and two all of its numbers are indeed contiguous which is by definition
of an array we can just do some simple math for instance if there's seven elements in this array and we do 7 / two
that's what three and a half round down through truncation that's 3 0 1 2 3 that gives me the middle element
arithmetically in this thing and even though I have to be careful about rounding using simple arithmetic I can
very quickly with a single line of code or math find for you the middle of the left half of the left half of the right
half or whatever that's the power of arrays and that's what gave us binary search and how did binary search work
well we looked at the middle and then we went left or right and then we went left or right again sort of as implied by
this color scheme here wouldn't it be nice if we somehow preserved the new upsides today of dynamic memory
allocation giving ourselves the ability to just add another element add another element add another element but retain
the power of binary search because log of n was much better than n certainly for large data sets right even the phone
book demonstrated as much weeks ago so what if I kind of draw this same picture in two dimensions and I preserve the
color scheme just so it's obvious what came where what do these things kind of look like
now maybe like things we might now call nodes right a node is just a generic term for like storing some data
what if the data these nodes are storing are numbers so still integers but what if we kind of connected these cleverly
like an old family tree whereby every node has not one pointer now but as many as two maybe zero like in the leaves at
the bottom there in green but other nodes on the interior might have as many as two like having two children so to
speak and indeed the vernacular here is exactly that this would be called the root of the tree or this would be a
parent with respect to these children the green ones would be grandchildren with respect to these and these green ones would be
siblings with respect to each other and over there too so all the
same jargon you might use in the real world applies in the world of data structures and CS trees but this is
interesting because I think we could build this now this kind of data structure in the computer's memory how
well suppose that we defined a node to be no longer just this a number and a next field what if we sort of give
ourselves a bit more room here and give ourselves a pointer called left and another one called right both of which
are pointers to a struct node so same idea as before but now we just make sure we think of these things as pointing
this way and this way not just this way not just a single Direction but two so you could imagine in code building
something up like this with a node that creates in essence this diagram here but why is this compelling suppose I want to
find the number three I want to search for the number three in this tree it would seem just like Pedro was the
beginning of our link list in the world of trees the root so to speak is the beginning of your data structure you can
retain and remember this entire tree just by pointing at the root node ultimately one variable can hang on to
this whole tree so how can I find the number three well if I look at the root node and the number I'm looking for is
less than notice I can go this way or if it's greater than I can go this way so I've preserved that property of the
phone book or just a sorted array in general what's true over here if I'm looking for three I can go to the right
of the two because that number is going to be greater if I go left it's going to be smaller instead and here's a an
example of actually recursion recursion in a physical sense much like the Mario's pyramid which was kind of
recursively defined notice this I claim this whole thing is a tree specifically a binary search tree which means every
node has two or maybe one or maybe zero children but no more than two hence the bi in binary and it's the case that
every left child is smaller than the root and every right child is larger than the root that definition certainly
works for two four and six but it also works recursively for every sub tree or branch of this tree notice if you think
of this as the root it is indeed bigger than this left child and it's smaller than this right child and if you look
even at the leaves so to speak the grand children here this root node is bigger than its left child if it existed so
it's sort of a meaningless statement and it's it's less than its right child or it's not greater than certainly so
that's meaningless too so we haven't violated the definition even for these leaves as well and so now how many steps
does it take to find in the worst case any number in a binary search tree it would
seem so it seems two literally and the height of this thing is actually three and so long story short especially if
you're a little less comfy with your logarithms from yesteryear log base 2 is just like the number of times
you can divide something in half and half and half until you get down to one this is kind of like a logarithm in the
reverse direction here's a whole lot of elements and we're halving we're halving until we get down to one so the height
of this tree that is to say is log base 2 of n which means that even in the worst case the number you're looking for
maybe is all the way at the bottom in the leaves doesn't matter it's going to take log base 2 of n steps or log of n
steps to find maximally any one of those numbers so again binary uh sorry binary search is back but we've paid a price
right this isn't a linked list anymore it's a tree but we've gained back binary search which is pretty compelling right
that's where the whole class began on making that distinction but what price have we paid to retain binary
search in this new world yeah it's no uh it's no longer sorted left to right but this is I claim sorted
according to the binary search tree definition where again left tree is left child is smaller than root and right
child is greater than root so it is sorted but it's sorted in a two-dimensional sense if you will not
just one but another price paid exactly every node now needs not one number but two three pieces of data
a number and now two pointers so again there's that kind of trade-off again where well if you want to save time you
got to give something if you start giving space and you start using more space you can speed up time like you've
got it there's always a price paid and it's very often in space or time or complexity or developer time uh the
number of bugs you have to solve I mean all of these are sort of finite resources that you have to juggle among
so if we consider now the code with which we can implement this here might be the node and how might we actually
use something like this well let's take a look at maybe one final program in C here before we transition to higher
level Concepts ultimately let me go ahead here and let me just open a program I wrote here in advance so let
me in a moment copy over a file called tree.c which we'll have on the course's website and I'll walk you through some
of the logic here that I've written for tree.c all right so what do we have here first so here is an
implementation of a binary search tree for numbers and as before I've just kind of played around and I've inserted
the numbers manually so what's going on first here is my definition of a node for a binary search tree copied and pasted
from what I proposed on the board a moment ago here are two prototypes for two functions that I'll show you in a
moment that allow me to free an entire tree one node at a time and that also allow me to print the tree in
order so it's even though they're not sorted left to right I bet if I'm clever about what child I print first I can
reconstruct the idea of printing this tree properly so how might I Implement a binary search tree here's my main
function here is how I might represent a a tree of size zero it's just a null pointer called tree here's how I might
add a number to that list so here for instance is me mallocing space for a node storing it in a temporary variable
called n here is me just doing a safety check make sure n does not equal null and then here is me initializing this
node to contain the number two first then initializing the left child of that node to be null and the right child of
that node to be null and then initializing the tree itself to be equal to that particular node so at this point
in the story there's just one rectangle on the screen containing the number two with no children all right let's just
add manually to this a little further let's add another number to the list by mallocing another node I don't need to
redeclare n as a node star because it already exists at this point here's a little safety check I'm going to not
bother with my let me do this uh free memory here just to be safe do I want to do this um we want to free memory too
which I've not done here but I'll save that for another time here I'm going to initialize the number to one I'm going
to initialize the children of this node to null and null and now I'm going to do this initialize the tree's left child to
be n so what that's essentially doing here is if this is my root node the single rectangle I described a moment
ago that currently has no children neither left nor right here's my new node with the number one I want it to
become the new left child so that line of code on the screen there tree arrow left equals n
is like stitching these two together with a pointer from two to the one all right the next line of code next lines
of code you can probably guess are me adding another number to the list just the number three so this is a simpler
tree with two one and three respectively and this code let me wave my hands is almost the same except for the fact that
I'm updating the tree's right child to be this new and third node let's now run the code before looking at those two
functions let me do make tree dot slash tree and voila one two three so it sounds like the data structure is
sorted to your concern earlier but how did I actually print this and then eventually free the whole thing well
let's look at the definition of first print tree and this is where things get kind of interesting print tree returns
uh nothing so it's a void function but it takes a pointer to a root element as its sole argument node star root here's
my safety check if root equals equals null there's obviously nothing to print just return that sort of goes without
saying but here's where things get a little magical otherwise print your left child then print your own number then
print your right child what is this an example of even though it's not mentioned by name
here what programming technique here yeah so this is actually perhaps the most compelling use of recursion yet it
wasn't really that compelling with the Mario thing because we had such an easy implementation with a for loop
weeks ago but here is kind of a perfect application of recursion where your data structure itself is recursive right if
you take any snip of any branch it all still looks like a tree just a smaller one that lends itself to recursion so
here is this leap of faith where I say ah print my left tree or my left sub tree if you will via my child at the
left then I'll print my own root node here in the middle then go ahead and print my right sub tree and because we
have this base case that just make sure that if the tree the root is null there's nothing to do you're not going
to recurse infinitely you're not going to call yourself again and again and again infinitely many times so it just
kind of works out and prints the one the two and the three and notice what we could do too if you wanted to print the
tree in reverse order you could do that print your right tree first the greater element then yourself then your smaller
sub tree and if I do make tree here and dot slash tree voila now I've reversed the order of the list and that's pretty cool you
could do it with the for Loop in an array but you can also do it even with this two-dimensional structure let's
lastly look just at this free tree function and this one's almost the same order doesn't matter in quite the same
way but it does still matter here's what I did with free tree well if the root of the tree is null there's obviously
nothing to do just return otherwise go ahead and free your left child and all of its descendants then free your right
child and all of its descendants and then free yourself and again free literally just frees the address in that
variable doesn't free the whole darn thing it just frees literally what's at that address why was it important that I
did line 72 last though why did I free the left child and the right child before I freed myself so to
speak exactly if you free yourself first if I had done incorrectly this line higher up you're not allowed to touch
the left child subtree or the right child subtree because the memory address is no longer valid at that point you
would get some kind of memory error perhaps the program would crash valgrind definitely wouldn't like it bad things
would otherwise happen but here then is an example of recursion and again just a recursive use of an actual data
structure and what's even cooler here is relatively speaking suppose we wanted to search something like this binary search
actually gets pretty uh straightforward to implement too for instance here might be the prototype for a search function
for a binary search tree you give me the uh the root of a tree and you give me a number I'm looking for and I can pretty
easily now return true if it's in there or false if it's not how well let's first ask a question if tree equals
equals null then you just return false because if there's no tree there's no number so it's obviously not there
return false else if the number you're looking for is less than the tree's own number which direction should we go okay
left how do we express that well let's just return the answer to this question search the left sub Tree by way of my
left child looking for the same number and you just assume through the beauty of recursion that ah you're kicking the
can and let yourself figure it out with a smaller problem just that snipped left tree instead else if the number you're
looking for is greater than the tree's own number go to the right as you might infer so I can just return the answer to
this question search my right sub tree for that same number and a fourth and final condition what's the fourth
scenario we have to consider explicitly yeah if the number itself is right there so else if the number I'm looking for
equals the tree's own number then and only then should you return true and if you're you're thinking quickly here
there's an optimization possible better design opportunity think back to even our scratch days what could we do a
little better here you're pointing at it exactly an else suffices because if there's logically only four things that
could happen you're wasting your time by asking a fourth gratuitous question an else here suffices so here too more so
than the Mario example a few weeks ago there's just this Elegance arguably to recursion and that's it this is not
pseudo code this is the code for binary search on a binary search tree and so recursion tends to work in lock step
with these kinds of data structures that have this kind of structure to them as we're seeing here are any questions then
on binary search as implemented here with a tree
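The search and free functions described above might be written in C roughly as follows. This is a sketch rather than the lecture's exact file: the node layout (number, left, right) is inferred from the description, and make_node is a hypothetical helper added only so that a small tree can be built for experimenting.

```c
#include <stdbool.h>
#include <stdlib.h>

// Node layout assumed from the description: a number plus two children
typedef struct node
{
    int number;
    struct node *left;
    struct node *right;
} node;

// Recursive binary search: each call hands the question to a smaller subtree
bool search(node *tree, int number)
{
    if (tree == NULL)
    {
        return false; // base case: no tree, so no number
    }
    else if (number < tree->number)
    {
        return search(tree->left, number);
    }
    else if (number > tree->number)
    {
        return search(tree->right, number);
    }
    else
    {
        return true; // only case left: the numbers are equal
    }
}

// Free both subtrees before the node itself, so freed memory is never touched
void free_tree(node *root)
{
    if (root == NULL)
    {
        return;
    }
    free_tree(root->left);
    free_tree(root->right);
    free(root);
}

// Hypothetical helper (not from the lecture): allocate one node with given children
node *make_node(int number, node *left, node *right)
{
    node *n = malloc(sizeof(node));
    if (n == NULL)
    {
        return NULL;
    }
    n->number = number;
    n->left = left;
    n->right = right;
    return n;
}
```

With a tree built as make_node(2, make_node(1, NULL, NULL), make_node(3, NULL, NULL)), search should report 3 as present and 5 as absent, and free_tree then releases every node, children first.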
yeah uh good question so uh when returning a Boolean value true and false are values that are defined in a library
called standard bool stdbool.h with the header file that you can use um it is the case that in that header true is
defined as one and false as zero but you should not compare
them explicitly to one and zero when you're using true and false you should compare them to each
other ah sorry so if I am in my own code from earlier in a void function it is
totally fine to return you just can't return something explicitly so return just means that's it quit out of this
function you're not actually handing back a value so it's a way of short circuiting the execution if you don't
like that and some people people do frown upon having code return from functions prematurely you could invert
the logic and do something like this if the root does not equal null do all of these things and then indent all three
of these lines underneath that's perfectly fine too I happen to write it the other way just so that there was
explicitly a base case that I could point to on the screen whereas now it's kind of implicitly there for us only but
a a good observation too all right so let's ask the question as before about like running time of this it would look
like binary search is back and we can now do things in logarithmic time but we should be careful is this a binary
search tree just to be clear and again a binary search tree is a tree where the root is greater than its uh left child
and smaller than its right child that's the essence so you're Shake you're nodding your head you agree all okay I
agree so this is a binary search tree is this a binary search tree okay I'm hearing yeses or I'm hearing
just my delay changing the vote it would seem so this is kind of one of those trick questions this is a binary search
tree because I've not violated the definition of what I gave you right is there any example of a left child that
is greater than its parent or is there any example of a right child that's smaller than its parent that's just the
opposite way of describing the same thing no this is a binary search tree unfortunately it also looks like albeit
at a different axis what linked list but you could imagine this happening right suppose that I hadn't been as thoughtful
as I was earlier by inserting two and then one and then three which kind of nicely balanced everything out suppose
that instead because of what the user's typing in or whatever you contrive in your own code suppose you insert a one
and then a two and then a three like you've kind of created a problem for yourself because if we follow the same
logic as before going left or going right this is how you might Implement a binary search tree accidentally if you
just blindly keep following that definition I mean this would be better designed as what if we kind of like
rotated the whole thing around and that's totally fine and those kinds of trees actually have names there's trees
called AVL trees in computer science there are red black trees in computer science there are other types of trees
that additionally add some logic that tell you when you got to Pivot the thing and rotate it and kind of snip off the
root and fix things in this way but a binary search tree in and of itself does not guarantee that it will be balanced
so to speak and so if you consider the worst case scenario of even using a binary search tree if you're not smart
about the code you're writing and you just blindly follow this definition you might accidentally create a crazy long
and stringy binary search tree that essentially looks like a linked list because you're not even using any of the
left children so unfortunately the literal answer to the question here is what's the running time of search well
hopefully log n but not if you don't maintain the balance of the tree both insert and search could actually devolve
into instead of Big O of log n literally Big O of n if you don't somehow take that into account and we're not going to do
the code for that here sort of a higher level thing you might explore uh down the road it can devolve into
something that you might not have intended and so now that we're talking about two dimensions the onus is really
on the programmer to consider what kinds of perverse situations might happen where the thing devolves into a
structure that you don't actually want it to devolve into all right we've got just a few structures to go let's go
ahead and take one more five minute break here when we come back we'll talk at this level about some final
applications of this see you in five all right so we are back and as promised we'll sort of operate now at this higher
level where if we take for granted that even though you haven't had an opportunity to play with these
techniques yet you have the ability now in code to kind of stitch things together both in a one dimension and
even two dimensions to build things like lists and trees so if we have these building blocks things like now arrays
and lists and trees what if we start to kind of amalgamate them such that we build things out of multiple data
structures can we start to get some of the best of both worlds by way of for instance something called a hash table
so a hash table is sort of a Swiss army knife of data structures in that it's so commonly used because it allows you
to associate keys with values so to speak so for instance it allows you to associate um a username with a password
or a name with a number or anything where you have to take something as input and get as output a corresponding piece
of information a hash table is often the data structure of choice and here's what it looks like it actually looks like
an array at first glance but for discussion sake I've drawn this array vertically which is totally fine it's
still just an array but it allows you a hash table to jump to any of these locations randomly that is instantly so
for instance there's actually 26 locations in this array because I want to for instance store initially names uh
of people for instance and wouldn't it be nice if the person's name starts with a I have a go-to place for it maybe the
first box and if it starts with z i put them at the bottom so that I can jump instantly arithmetically using a little
bit of ASCII or Unicode fanciness exactly to the location that they need to go so for instance here's our array
zero indexed 0 through 25 if I think of this though as A through Z I'm going to think of these 26 locations now in the
context of a hash table as what we'll generally call buckets so buckets into which you can put values so for instance
suppose that we want to insert a value one name into this data structure and that name is say Albus so Albus starting
with A might go at the very beginning of this list all right and then we want to insert another name this
one happens to be Zacharias starting with Z so it goes all the way at the end of this data structure in location 25
AKA Z and then maybe a third name like Hermione and that goes at location H according to that position in the
alphabet so this is great because in constant time I can insert and conversely search for any of these names
based on the first letter of their name a or Z or H in this case let's fast forward and assume we put a whole bunch
of other names that might look familiar into this hash table it's great because every name has its own
location but if you're thinking of names you don't yet see it on the screen we eventually encounter a problem with this
right when could something go wrong using a hash table like this if we want to insert even more names what's going
to eventually happen yeah there's already someone with the first letter right like I haven't even mentioned
Harry for instance or Hagrid and yet Hermione is already using that spot so that sort of invites the question well
what happens maybe if we want to insert Harry next do we maybe cheat and put him at location I but then if there's
someone for location I where do we put them and it just feels like the situation could very quickly devolve but I've deliberately
drawn this data structure that I claim as a hash table sort of in two directions an array vertically here but
what might this be hinting I'm using horizontally even though I'm drawing the rectangles a little differently from
before yeah maybe another array to be fair but honestly arrays are such a pain with the allocating and reallocating and
so forth these kind of look like the beginnings of a linked list if you will where the name is where the number used
to be even though I'm drawing it horizontally now just for discussion sake and this seems to be like a pointer
that isn't pointing anywhere yet but it looks like the array is 26 pointers some of which are null that is empty
some of which are pointing at the first node in a linked list so that's really what a hash table might be in your mind
an amalgam of an array whose elements are linked lists and in theory this kind of gives you the best of both worlds
right you get random access with high probability right you get to jump immediately to the location you want to
put someone but if you run into this perverse situation where there's someone already there okay fine it starts to
devolve into a linked list but it's at least 26 smaller linked lists not one massive linked list which would be Big O
of n and quite slow to solve so if Harry gets inserted and Hagrid yeah you have to kind of chain them together so to speak
in this way but at least you've not painted yourself into a corner and in fact if we fast forward
and put a whole bunch of familiar names in the data structure starts to look like this so the chains are not terribly
long and some of them are actually of size zero because there's just some unpopular letters of the alphabet among
these names but it seems better than just putting everyone in one big array or one big linked list we're sort of
trying to balance these trade-offs a little bit in the middle here well how might we represent something like this
here's how we could describe this thing a node in the context of a linked list could be this I have an array called
word of type Char and it's big enough to fit the longest word in the alphabet plus one and the plus one why probably
the null character so I'm assuming that longest word is like a constant defined elsewhere in the story and it's
something big like 40 100 whatever whatever the longest word in the uh Harry Potter universe is or the English
alphabet or English dictionary is longest word plus one should be sufficient to store any name in the
story here and then what else does each of these nodes have well it has um a pointer to another node so here's how we
might implement the notion of a node in the context of storing not integers uh but names instead like this but how do
we decide what the hash table itself is well if we now have a definition of a node we could have a variable in main or
even globally called hashtable that itself is an array of node star pointers that is an array of pointers to nodes
the beginnings of linked lists number of buckets is kind of up to me I proposed verbally that it be 26 but honestly if
you get a lot of collisions so to speak a lot of H names trying to go to the same place well maybe we need to be
smarter and not just look at the first letter of their name but maybe the first and the second so it's HA and HE but
wait no then Harry and Hagrid still Collide but we start to at least make the problem a little less impactful by
tinkering with something like the number of buckets in a hash table like this but how do we decide where someone goes in a
hash table in this way well it's an old school problem of input and output the input to the problem is going to be
something like the name and the algorithm in the middle as of today is going to be something called a hash
function a hash function is generally something that takes as input a string a number whatever and produces as output a
location in our context like a number 0 through 25 or 0 through 16,000 or whatever the number of buckets you want
is it's going to just tell you where to put that input at a specific location so for instance Albus according to the
story thus far gave me back zero as output Zacharias gave me 25 so the hash function in the middle of that black box
is pretty simplistic in this story it's just looking at like the ASCII value it seems of the first letter in their
name and then subtracting off what capital A is 65 so like doing some math to get back a number between 0 and
25 so that's how we got to this point in the story and how might we then resolve the problem further and use this notion
of hashing more generally well just for demonstration sake here's actually uh some buckets literally and we've
labeled in advance these buckets with the suits from a deck of cards so we've got some Spades and we've got
diamonds here and we've got what else here uh clubs and
hearts so we have a a deck of cards here for instance right and this is something you yourself might do instinctively If
you're sort of getting ready to start playing a game of cards you're just kind of cleaning up or you want things in
order like here is literally a jumbo deck of cards what would be the easiest way for me to sort these things well
we've got a whole bunch of sorting algorithms from the past so I could go through like here's the of diamonds and
I could here let me throw this up on the screen just so if you're far in back so here's uh you know diamonds I could put
this here three four I could do this in order here but a lot of us honestly if given a deck of cards and you just want
to kind of clean it up and sort it in order you might do things like this well here's my input three of diamonds let's
put it in this bucket four of diamonds this bucket five of diamonds this bucket and if you keep going through the cards
here's seven of hearts hearts bucket eight bucket uh Queen of Spades over here and it's
still going to take you 52 steps but at the end of it you have hashed all of the Cards into four distinct buckets and now
you have problems of size 13 which is a little more tenable than doing one massive 52 card problem you can now do
four 13 size problems and so hashing is something that even you and I might do instinctively taking as input some card
some name and producing as output some location a sort of temporary pile in which you want it to uh stay so to speak
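Going back from cards to names, the 26-bucket table with chaining described earlier could be sketched in C along these lines. Treat it as an illustration under stated assumptions: LONGEST_WORD is given an arbitrary value of 45 (the lecture only says "something big"), the function names hash, insert and contains are hypothetical, and names that do not start with a letter are simply sent to bucket 0.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define LONGEST_WORD 45      // assumed value; any sufficiently large constant works
#define NUMBER_OF_BUCKETS 26 // one bucket per letter A through Z

typedef struct node
{
    char word[LONGEST_WORD + 1]; // +1 leaves room for the terminating null character
    struct node *next;
} node;

// An array of pointers to nodes; as a global it starts out with every bucket NULL
node *hash_table[NUMBER_OF_BUCKETS];

// First-letter hash: 'A' maps to 0, 'Z' maps to 25, case-insensitively
unsigned int hash(const char *word)
{
    if (!isalpha((unsigned char) word[0]))
    {
        return 0; // fallback bucket for non-letters (an assumption of this sketch)
    }
    return toupper((unsigned char) word[0]) - 'A';
}

// Insert by prepending to the bucket's chain; false if the word is too long or malloc fails
bool insert(const char *word)
{
    if (strlen(word) > LONGEST_WORD)
    {
        return false;
    }
    node *n = malloc(sizeof(node));
    if (n == NULL)
    {
        return false;
    }
    strcpy(n->word, word);
    unsigned int bucket = hash(word);
    n->next = hash_table[bucket]; // new node points at the old head of the chain
    hash_table[bucket] = n;       // bucket now points at the new node
    return true;
}

// Walk only the one chain the hash function selects
bool contains(const char *word)
{
    for (node *ptr = hash_table[hash(word)]; ptr != NULL; ptr = ptr->next)
    {
        if (strcmp(ptr->word, word) == 0)
        {
            return true;
        }
    }
    return false;
}
```

Inserting Hermione, Harry and Hagrid sends all three to bucket 7, so that chain has length three while most other buckets stay empty, which is the collision situation discussed here. Prepending keeps insertion constant time; keeping each chain sorted would be a different design choice.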
but these collisions are kind of inevitable and honestly if we kept going through the Harry Potter Universe some
of these chains would get longer and longer and longer which means that instead of getting someone's name
quickly searching for them or inserting them might start taking a decent amount of time so what could we
do instead to resolve situations like this if the problem fundamentally is that the first letter is just too darn
popular H we need to take in more input not just the first letter but maybe the first two letters so if we do that we
can go from A through Z to something more extreme like maybe HA HB HC HD HE HF and so forth so that now Harry and
Hermione end up at different locations but you know darn it Hagrid still collides with Harry so it's better than
before the chains aren't quite as long but the problem isn't fundamentally gone and in this case here anyone know how
many buckets we just increased to if we now look at not just A through Z but AA through ZZ
roughly yeah okay good so the easy answer is 26 squared or 676 so that's a lot more buckets and this is why I only
showed a few of them on the screen so that's a lot more and it spreads things out in particular what if we
take this one step further instead of HA we do like HAA through HZZ and so forth well now we have an even better situation
because Hermione has her one spot Harry has his one spot Hagrid has his one spot but there's a trade-off here the upside
is now arithmetically we can find their locations in constant time maybe technically three steps but three is
constant no matter how many other names are in here it would seem but what's the downside
here sorry say again memory so significantly more we're now up to 17,576 buckets which itself isn't that
big a deal right computers have a lot of memory these days but as you can kind of infer you know I can't can't really
think of someone whose name started with HQ for instance in the Harry Potter universe and if we keep going definitely
don't know of anyone whose name started with ZZZ or AAA there's a lot of sort of not useful combinations that have to
be there mathematically so that you can do a bit of math and jump to randomly so to speak the precise location but
they're just going to be empty so it's a very sparsely populated array so to speak so what does that really mean for
performance ultimately well let's consider again in the context of our Big O notation it turns out that a hash
table technically speaking is still just going to give us Big O of n in the worst case why if you have some crazy perverse
case where everyone in the universe has a name that starts with a or starts with H or starts with z you just get really
unlucky and your chain is massively long well then at that point it's just a linked list it's not a hash table it's like the
perverse situation with the tree where if you insert it without any mind for balancing it keeping it balanced it
just devolves but there's a difference here between sort of a theoretical performance and an actual performance if
you look back at the tree uh the hash table here this is absolutely in practice going to be faster than a
single linked list you know mathematically asymptotically Big O notation sure it's all the same Big O of n but if what
we're really caring about is real humans using our software there's something to be said for crafting a data structure
that technically if this data were uniformly distributed is 26 times faster than a linked list alone and so there's
this tension too in between like uh systems uh types of CS and theoretical CS where yeah theoretically these are
all the same but in practice we're making real world software you know improving the speed by a factor of 26 in
this case let alone 676 or more might actually make a big difference but there's going to be a trade-off and
that's typically some other resource like giving up more space all right how about another data structure we could
build let me fast forward to something here called a trie so a trie sort of a weird name in pronunciation short for
retrieval pronounced try typically a trie is a tree that actually gives us constant time lookup even for
massive data sets what do I mean by this in the world of a trie you create a tree out of arrays so we're really getting
into like the Frankenstein territory of just building things up with like spare parts of data structures that we have
here but the root of a trie is itself an array for instance of size 26 where each element in that array points to another
node which is to say another array and each of those locations in the array represents a letter of the alphabet like
A through Z so for instance if you wanted to store the names of the Harry Potter universe not in a hash table not
in a linked list not in a tree but in a trie what you would do is hash on every letter in the person's name one at a
time so a trie is like a multi-tier hash table in a sense where you first look at the first letter then the second letter
then the third and you do the following for instance each of these locations represents a letter A through Z suppose
I wanted to insert someone's name into this that starts with the letter H uh like Hagrid for instance well I go to
the location H I see it's null which means I need to malloc myself another node or another array and that's depicted
here then suppose I want to store the second letter in Hagrid's name an A so I go to that location in the second node
and I see okay it's currently null there's nothing below it so I allocate another node using malloc or the like and
now I have h a and I continue this with g r i d and then when I get to the bottom of this person's name I just have to
indicate here in color but probably with a Boolean value or something like a true value that says a name stops here so
that it's clear that the person's name is not h a or h a g or h a g r it's h a g r i d and the d is green
just to indicate there's like some other Boolean value that just says yes this is the node in which the name stops and if
I continue this logic here's how I might insert someone like Harry and here's how I might insert someone like Hermione and
what's interesting about the design here is that some of these names share a common prefix starts to get compelling
because you're reusing space you're using the same nodes for names like h a g and h a r because they share H and an
A in common and they all share an H in common so you have this data structure now that itself is a tree each node in
the tree is itself an array and we therefore might implement this thing using code like this every node is
containing I'll do it in reverse order an array I'll call it children because that's what it really represents up to
26 children for each of these nodes size of the alphabet so I might have used a constant for number 26 to give myself 26
letters of the alphabet and each of those arrays stores that many node stars that many pointers to another node and
here's an example of the bool this is what I represented in green on the slide a moment ago I also need another piece of
data just a zero or one a true or false that says yes a name stops in this node or it's just a path to the rest of the
person's name but the upside of this is that the height of this tree is only as tall as the person's longest
name h a g r i d or h e r m i o n e and notice that no matter how many other people are in this data structure
there's three at the moment if there were three million it would still take me how many steps to search for Hermione
h e r m i o n e so eight steps total no matter if there's two other people 2 million 10 million other people because
the path to her name is always on the same path and if you assume that there's a maximum limit on the length
of names in the human world maybe it's 40 100 whatever whatever the longest name in the world is that's constant
maybe it's 40 100 but that's constant which is to say that with a trie technically speaking it is the case that
your lookup time in Big O notation would be Big O of one it's constant time because unlike every other
data structure we've looked at with a trie the amount of time it takes you to find one person or insert one person
is completely independent of how many other pieces of data are already in the data structure and this holds true even
if one name is a prefix of another I don't think there was a a Daniel or Danielle in the Harry Potter universe
that I could think of but d a n i e l could be one name and therefore we have a true there in green and if there's a
longer name like Danielle then you keep going until you get to the E so you can still have with a trie one name that's a
substring of another name so it's not as though we've created a problem there that too is still possible but at the
end of the day it only takes a finite number of steps to find any of these people and again that's what's
particularly compelling that you effectively have constant time lookup so that's amazing right we've gone through
this whole story for weeks now of like linear time and then it went up to like n squared and then log n and now constant
time what's the price paid for a data structure like this this so-called trie what's the downside here there's got
to be a catch and in fact tries are not actually used that often amazing as they might sound on some CS level
here memory what would why in what sense exactly if you're storing all of these darn arrays it's again a
sparsely populated data structure and you can kind of see it here granted that there's only three names but most of
those boxes most of those pointers are going to remain null so this is an incredibly wide data structure if you
will it uses a huge amount of memory to store the names but again you got to pick a lane either you're going to
minimize space or you're going to minimize time it's not really possible to get truly The Best of Both Worlds you
have to decide where the inflection point is for the device you're writing software for how much memory it has how
expensive it is and again taking all of these kinds of things into account so lastly let's do one further abstraction so
even higher level to discuss things that are generally known as abstract data structures it turns out we could
spend like all day all week talking about different things we could build with these data structures but for the
most part now that we have arrays now that we have linked lists or their cousins trees which are two-dimensional and
beyond that there's even graphs where the arrows can go in multiple directions not just down so to speak now that we
have this ability to Stitch things together we can solve all different types of problems so for instance a very
common type of data structure to use in a program or even our human world are things called queues a queue being a data
structure uh like a line outside of a store where it has what's called a FIFO property first in first out which is
great for fairness at least in the human world and if you've ever waited outside of uh tasty burger or salsa Fresco or
some other restaurant nearby presumably if you're queuing up at the counter you want the store to maintain a FIFO
system first in first out so that whoever's first in line gets their food first and gets out first so a queue is
actually a computer science term too and even if you're still in the habit of printing things on paper there are
things you might have heard called printer queues which also do things in order the first person to send their
essay to the printer should ideally be printed before the last person to send their uh essay to the printer again in
the interest of fairness but how can you implement a queue well you typically have to implement like two fundamental
operations enqueue and dequeue so adding something to it and removing something from it and the interesting thing here is that how
do you implement a queue well in the human world you would just have like literally physical space for humans to
line up from left to right or right to left same in a computer like a printer queue if you send a whole bunch of jobs
to be printed a whole bunch of essays or documents well you need a chunk of memory like an array all right well if
you use an array what's a problem that could happen in the World of Printing for instance if you use an array to
store all of the documents that need to be printed it could be filled right so if
the programmer decided HP or whoever makes the printer decides oh you can send like a megabyte worth of documents
to this printer at once at some point you might get an error message which says sorry out of memory wait a few
minutes which is maybe a reasonable solution but a little Annoying or HP could write code that maybe dynamically
resizes the array or so forth but at that point maybe they should just use a linked list and they could so there too
you could implement the notion of a queue using a linked list instead you're going to spend more memory but you're not
going to run out of space in your array which might be more compelling you know this happens even in the physical world
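As a rough sketch of that linked-list approach in C: the queue below keeps pointers to both ends so that enqueue and dequeue each take a constant number of steps. The struct layout, the field names front and back, and the use of an int as a stand-in for a print job are all assumptions made for illustration, not code from the lecture.

```c
#include <stdbool.h>
#include <stdlib.h>

typedef struct node
{
    int job;            // stand-in for a print job; a real queue might store a document
    struct node *next;
} node;

typedef struct
{
    node *front; // oldest item, the next one to leave
    node *back;  // newest item, the most recent arrival
} queue;

// Add a job at the back of the line; false only if malloc fails
bool enqueue(queue *q, int job)
{
    node *n = malloc(sizeof(node));
    if (n == NULL)
    {
        return false;
    }
    n->job = job;
    n->next = NULL;
    if (q->back == NULL)
    {
        q->front = n; // queue was empty, so the new node is also the front
    }
    else
    {
        q->back->next = n;
    }
    q->back = n;
    return true;
}

// Remove the job at the front of the line (first in, first out); false if the queue is empty
bool dequeue(queue *q, int *job)
{
    if (q->front == NULL)
    {
        return false;
    }
    node *n = q->front;
    *job = n->job;
    q->front = n->next;
    if (q->front == NULL)
    {
        q->back = NULL; // removed the last node, so the queue is empty again
    }
    free(n);
    return true;
}
```

Starting from an empty queue (both pointers NULL), jobs enqueued as 1, 2, 3 come back out in that same order. The per-node pointer is the extra memory being traded for never hitting a fixed array capacity.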
you go to the store and you know you start having to line up outside and down the road and like for a really busy
store they kind of run out of space so they they make do but in that case it tends to be more of an array just
because of the physical notion of humans lining up but there's other data structures too if you've ever gone to
the dining hall and picked up like a a Harvard or a Yale tray right you're typically picking up the last tray that
was just cleaned not the first tray that was cleaned why cuz these uh cafeteria trays stack up on top of each other and
indeed a stack is another type of abstract data structure in the physical world it's literally something physical
like a um a stack of trays which have what we would call a LIFO property last in first out so as these things come out
of the washer they're putting the most recent ones on the top and then you the human are probably taking the most
recently cleaned one which means in the extreme no one on campus might ever use that very first uh tray which is
probably fine in the world of trays but would really be bad in the world of like tasty burger or lining up for food if
LIFO were the property being implemented but here too it could be an array it could be a linked list and you see this
honestly every day if you're using Gmail and your Gmail inbox that is actually kind of a stack at least by default
where your newest messages last in are the first ones at the top of the screen that's kind of a LIFO data structure and
it means that you see your most recent emails but if you have a busy day you're getting a lot of emails it might not be
a good thing because now you're kind of ignoring the people who wrote you way earlier in the day or the week so LIFO
and FIFO are just properties that you can achieve with these very specific types of data structures and the
parlance in the world of stacks is to push something onto a stack or pop something out um these are here for
instance as an example of like why might you always wear the same color well if you're storing all of your clothes in a
stack you might not ever get to like the different colored clothes at the bottom of the list and in fact to paint this
picture we have a couple minute uh video here just to paint this here by a faculty member elsewhere let's go
ahead and dim the lights for just a minute or two here so that we can take a look at Jack learning some
facts once upon a time there was a guy named Jack when it came to making friends Jack did not have the Knack so
Jack went to talk to the most popular guy he knew he went up to Lou and asked what do I do Lou saw that his friend was
really distressed well Lou began just look how you're dressed don't you have any clothes with a different look yes
said Jack I sure do come to my house and I'll show them to you so they went off to Jacks and Jack showed Lou the box
where he kept all his shirts and his pants and his socks Lou said I see you have all your clothes in a pile why
don't you wear some others once in a while Jack said well when I remove clothes and socks I wash them and put
them away in the box then comes the next morning and up I hop I go to the box and get my clothes off the top Lou quickly
realized the problem with Jack he kept clothes CDs and books in a stack when he reach for something to read or to wear
he chose the top book or underwear then when he was done he would put it right back back it would go on top of the
stack I know the solution said a triumphant Lou you need to learn to start using a queue Lou took Jack's
clothes and hung them in a closet and when he had emptied the box he just tossed it then he said now Jack at the
end of the day put your clothes in the left when you put them away then tomorrow morning when you see the sun
shine get your clothes from the right from the end of the line don't you see said Lou it will be so nice you'll wear
everything once before you wear something twice and with everything in queues in his closet and shelf Jack started
to feel quite sure of himself all thanks to Lou and his wonderful queue so just to help you realize that
these things are everywhere in the world even in our human world if you've ever lined up at
this place anyone recognize this okay so Sweetgreen a little salad place in the Square this is if you order
online or in advance your food ends up according to the first letter in your name which actually sounds awfully
reminiscent of something like a hash table and in fact no matter whether you implement a hash table like we did with
an array and linked lists or with like three shelves like this this is actually an abstract data type called a
dictionary and a dictionary just like in our human world has keys and values words and their definitions this just
has uh letters of the alphabet and salads as their value but here too there's a real world constraint in
what kind of scenario does this system at Sweetgreen devolve into a problem for instance because they too
are using only finite space finite storage what could go wrong yeah yeah if they run out of space on the shelf and
there's a lot of people whose names start with D or e or whatever and so they just pile up and then maybe they
kind of overflow into the e or the fs and you know they probably don't really care because any human's going to come
by and just eyeball it and figure it out anyway but in the world of a computer you're the one coding and have to be
ever so precise we thought we would lastly do one final thing here um in advance we prepared a uh a linked list
of sorts in the audience since this has become a bit of a thing I am starting to represent the beginning of this linked
list insofar as I have a pointer here with seat location G9 uh whoever is in G9 would you mind standing
up and what letter is on your seat there okay so you have F15 and your letter say again F5 so I see you're holding a c in
your node you are pointing to if you could physically F15 F15 what do you hold you have an S and who should you be
pointing at F5 F5 could you stand up F5 you're holding a five I see what what address F12 F12 big finale F12 if you'd
like to stand up holding a zero and null which means that was CS50 all right we'll see you next time
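A minimal sketch of the three ideas that close this lecture, written in Python (the language the next lecture introduces) rather than C. It is not code from the lecture: the clothing items and the names top_of_stack, front_of_queue, nodes, and spell are illustrative assumptions, and the seat labels follow one reading of the audience demo. collections.deque is the standard library's double-ended queue.

```python
from collections import deque

# Stack (LIFO): Jack's box. The last item put in is the first one taken out.
stack = []
for item in ["red shirt", "blue shirt", "black shirt"]:
    stack.append(item)            # push onto the top of the stack
top_of_stack = stack.pop()        # pop removes the most recently pushed item

# Queue (FIFO): Lou's closet. Items join at one end and leave from the other.
queue = deque()
for item in ["red shirt", "blue shirt", "black shirt"]:
    queue.append(item)            # enqueue at the back of the line
front_of_queue = queue.popleft()  # dequeue removes the oldest item

# Linked list: each node stores a value and the "address" (seat) of the next node.
# None plays the role of NULL in the last node.
nodes = {
    "G9": ("C", "F15"),
    "F15": ("S", "F5"),
    "F5": ("5", "F12"),
    "F12": ("0", None),
}


def spell(seat):
    """Follow the pointers starting at the given seat and join the values visited."""
    letters = []
    while seat is not None:
        value, next_seat = nodes[seat]
        letters.append(value)
        seat = next_seat
    return "".join(letters)


print(top_of_stack)    # black shirt
print(front_of_queue)  # red shirt
print(spell("G9"))     # CS50
```

The stack hands back the newest shirt and the queue the oldest, which is the whole difference between LIFO and FIFO; the traversal stops only when it reaches the node whose next pointer is None.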
[Music] all right this is cs50 and this is already week six and this is the week in
which you learn yet another language but the goal is not just to teach you another language for language's sake as
we transition today and in the coming weeks from C where we've spent the past several weeks now to python the goal
ultimately is to teach you all how to teach yourselves new languages so that by the end of this course it's not in
your mind the fact that you learned how to program in C or learned some weeks back how to program in scratch but
really how you learned how to program fundamentally in a paradigm known as procedural programming as well as with
some taste today and in the weeks to come of other aspects of programming languages like object-oriented
programming and more so recall though back in week zero hello world to look a little something like this and the world
was quite simple all you had to do was drag and drop these puzzle pieces but there were still functions and
conditionals and loops and variables and all of those kinds of Primitives we then transitioned of course to a much more
Arcane language that looked a little something like this and even now some weeks later you might still be
struggling with some of the syntax or getting annoying bugs when you try to compile your code and it just doesn't
work but there too the past few weeks we've been focusing on functions and loops and variables conditionals and
really all of those same ideas and so what we begin to do today is to one simplify the language we're using
transitioning from C now to python this now being the equivalent program in Python and look at its relative
Simplicity but also transitioning to look at how you can Implement these same kinds of features just using a different
language so we're going to see a lot of code today and you won't have nearly as much practice with python as you did
with C but that's because so many of the ideas are still going to be with us and really it's going to be a process of
figuring out all right I want to do a loop I know how to do it in C how do I do this in Python how do I do the same
with conditionals how do I declare variables and the like and moving forward not just in cs50 but in life in
general if you continue programming and learn some other language after the class if in 5 10 years there's a new
more popular language that you pick up it's just going to be a matter of Googling and looking at websites like
stack Overflow and the like to look at just basic building blocks of programming languages because you
already speak after these past six plus weeks uh you already speak programming itself fundamentally all right so let's
do a few quick comparisons left and right of what something might have looked like in scratch and what it then
looked like in C but now as of today what it's going to look like in Python then we'll turn our attention to the
command line ultimately in order to implement some actual programs so in scratch we had functions like this say
hello world a verb or an action in C it looked a little something like this and a bit of a cryptic mess the first week
you had the printf you had the double quotes you had the semicolon the parentheses there's a lot more syntax
just to do the same thing we're not going to get rid of all of that syntax now but as of today in Python that same
statement is going to look a little something like this and just to perhaps call out the obvious what is different
or now simpler in python versus C even in this simple example here [Music]
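For reference, a small sketch of the Python statement under discussion, assuming the slide shows the usual hello, world program. It also shows print's end parameter, which controls what is written after the text. The in-memory buffer is an assumption added only so the exact characters written can be inspected; an ordinary program would call print without the file argument.

```python
import io

buffer = io.StringIO()  # in-memory text buffer, used only to inspect the output

print("hello, world", file=buffer)          # print appends "\n" automatically
print("hello, world", end="", file=buffer)  # end="" overrides that default

captured = buffer.getvalue()
print(repr(captured))
```

The first call ends its line on its own, while the second leaves the cursor on the same line because end was set to the empty string.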
yeah good so it's now print instead of printf and there's also no semicolon and there's one other subtlety over here
yeah so no new line and that doesn't mean it's not going to be printed it just turns out that one of the
differences we'll see is that with print you get the new line for free it automatically gets outputted by default
being sort of a common case but you can override it we'll see ultimately too how about in scratch we had multiple
functions like this that not only said something on the screen but also asked a question thereby being another function
that returned a value called answer in C we saw code that looked a little something like this whereby that first
line declares a variable called answer sets it equal to the return value of get_string one of the functions from the
cs50 library and then the same double quotes and parentheses and semicolon then we had this format code in C that
allowed us with %s to actually print out that same value in Python this too is going to look a little bit simpler
instead we're going to have answer equals get_string quote unquote what's your name and then print with a plus
sign and a little bit of new syntax but let's see if we can't just infer from this example what it is that's going on
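A rough sketch of the two-line program being described, under a couple of assumptions. The helper name greeting is hypothetical, and the name is hard-coded so the sketch runs without a keyboard. In an interactive program the value would come from get_string in the cs50 library or from Python's built-in input function.

```python
def greeting(answer):
    # With two strings, + joins (concatenates) them; no arithmetic is involved.
    return "hello, " + answer


# Interactive version, left as a comment so the sketch does not wait for input:
#     answer = input("What's your name? ")
answer = "David"  # assumed sample value

print(greeting(answer))
```

Note that there is no type to the left of answer and no semicolon at the end of either line.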
well first missing on the left is what to the left of the equal sign there's no what this time feel free to just call it
out so there's no type there's no type like the word string which even though that was a type in cs50 every other
variable in C did we use int or string or float or bool or something else in Python there's still going to be data
types today onward but you the programmer don't have to bother telling the computer what types you're using the
computer is going to be smart enough the language really is going to be smart enough to just figure it out from
Context meanwhile on the right hand side get_string is going to be a function we'll use today and this week which comes
from a python version of the cs50 library but we'll also start to take off those training wheels so that you'll see
how to do things without any cs50 Library moving forward using a different function instead as before no semicolon
but the rest of the syntax is pretty much the same here this starts of course to get a little bit different though
we're using print instead of printf but now even though this looks a little cryptic perhaps if you've never
programmed before cs50 what might that plus be doing just based on inference here what
do you think add to hello yeah so adding answer to the string hello and adding so to speak not
mathematically but in the form of joining them together much like we saw the join Block in scratch or
concatenation was the term of art there this plus sign uh appends if you will whatever is in answer to whatever is
quoted here and I deliberately left a space there so that grammatically it looks nice after the comma as well now
there's another way to do this and it too is going to look cryptic at first glance but it just gets easier and more
convenient over time you can also change this second line to be this instead so what's going on here this is actually a
relatively new feature of python in the past couple of years where now what you're seeing is yes a string between
these same double quotes but this is what python would call a format string or F string and it literally starts with
the letter F which admittedly looks I think a little weird but that just indicates that python should assume
that anything inside of curly braces inside of the string should be interpolated so to speak which is a fancy term for
saying substitute the value of any variables therein and it can do some other things as well so answer is a
variable declared of course on this first line this F string then says to python print out hello comma space and
then the value of answer if by contrast you omitted the curly braces just take a guess what would
happen What would the symptom of that bug be if you accidentally forgot the curly braces but maybe still had the F
there it would print hello answer yeah it would literally print hello comma answer because it's going to take you
literally so the curly braces just kind of allow you to plug things in and again it looks a little more cryptic but it's
just going to save us time over time and if any of you programmed in Java in high school for instance you saw Plus in that
context too for concatenation this just kind of makes your code a little tighter a little more succinct so it's a
convenient feature now in Python all right this was an example in scratch of a variable setting a variable like
counter equal to zero in C it looked like this where you specify the type the name and then the value with a
semicolon in Python it's going to look like this and I'll State the obvious here you don't need to mention the type
just like before with string and you don't need a semicolon so it's a little simpler if you want a variable just
write it and set it equal to some value but the single equal sign still behaves the same as in C suppose we wanted to
increment counter by one in scratch we use this puzzle piece here in C we could do this actually in a few different ways
there was this way if counter already exists you just say counter equals counter plus one there was the slightly
less verbose way where you could whoops sorry let me do the first one first in Python that same thing as you might
guess is actually going to be almost the same you just throw away the semicolon and the mathematics are ultimately the
same copying from right to left via the assignment operator now recall in C that we had this shorthand notation
which did the same thing in Python you can similarly do the same thing just no need for the semicolon the only step
backwards we're taking if you were a big fan of counter++ that doesn't exist in Python nor minus minus you just can't
do it you have to do the plus equals one or minus equals one to achieve that same result all right how
about in Python too here in scratch recall was a conditional asking a silly question like is X less than y and if so
just say as much in C that looked a little something like this printf and if with the parentheses the curly braces
the semicolon and all of that in Python this is going to get a little more pleasant to type too
it's going to be just this and if someone wants to call out some of the obvious changes here what
has been simplified now in Python for a conditional it would seem yeah what's missing or changed so no curly
braces and sorry and we're using the colon instead so I got rid of the curly braces in
Python but I'm using a colon instead and even though this is a single line of code so long as you indent subsequent
lines along with the print that's going to imply that everything uh if the if condition is true should be executed
below it until you start to unindent and start uh writing a different line of code all together so indentation in
Python is important so this is among the reasons we've emphasized uh axes like style just how well styled your code is
and honestly we've seen certainly in office hours and you've seen in your own code sort of a tendency sometimes to be
a little lax when it comes to indentation right if you're one of those folks who likes to indent everything on
the left hand side of the window yeah might compile and run but it's not particularly readable by you or anyone
else python actually addresses this by just requiring indentation when logically needed so python is going to
force you to start indenting properly now if that's been perhaps um a tendency otherwise what else is missing well we
have no semicolon here of course it's print instead of printf but otherwise those seem to be the primary differences
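To make the indentation rule concrete, here is a small sketch under the assumption that x and y are plain integers. The function name describe and the messages are illustrative, not the slide's code. Lines indented under the if belong to it, and the first unindented line ends the block.

```python
def describe(x, y):
    lines = []
    if x < y:
        # Both indented lines belong to the if and run only when x < y is true.
        lines.append("x is less than y")
        lines.append("still inside the if")
    # Back at the outer indentation level, so this line always runs.
    lines.append("after the if")
    return lines


print(describe(1, 2))
print(describe(2, 1))
```

No curly braces mark where the block starts or stops; the colon opens it and the change in indentation closes it.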
what about something larger in scratch if an if else block like this you can perhaps guess what it's going to look
like in C it looks like this curly braces semicolons and so forth in Python it's going to now look like this almost
the same but indentation is important the colons are important and there's one other difference that's now again
visible here but we didn't call it out a second ago what else is different in python versus C for these conditionals
[Music] yeah perfect we don't have any parentheses around the condition the
Boolean expression itself and why not well it's just simpler to type it's less to type you can still use parentheses
and in fact you might want to or need to if you want to like uh combine thoughts and do this and that or this or that but
by default you no longer need or should have those parentheses just say what you mean lastly with conditionals we had
something like this an if else if else statement in C it looked a little something like this in Python it's going
to get really tighter now it's just if and this is the curiosity elif x greater than y so it's not else if it's
literally one keyword elif and the colons remain now on each of the three lines but the indentation is important
and if we did want to do multiple things we could just indent below each of these conditionals as well all right let me
pause there first to see if there's any questions on these syntactic differences [Music]
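The three-way conditional described above can be sketched as a small function so the outcome is easy to check. The function names compare and in_range and the returned messages are illustrative assumptions, not the slide's exact code.

```python
def compare(x, y):
    if x < y:
        return "x is less than y"
    elif x > y:  # one keyword, elif, rather than "else if"
        return "x is greater than y"
    else:
        return "x is equal to y"


def in_range(x, low, high):
    # Parentheses are optional; here they only group the two comparisons.
    return (x >= low) and (x <= high)


print(compare(1, 2))
print(compare(2, 1))
print(compare(3, 3))
print(in_range(5, 1, 10))
```

Each branch ends with a colon, and only the indentation shows which statements belong to which branch.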
yeah uh between between what and what ah good question question is python
uh sensitive to spaces and where they go sometimes no sometimes yes is the short answer stylistically though you should
be practicing what we're preaching here whereby you do have spaces to the left and right of binary operators as
they're called something like less than or greater than is a binary operator because there's two operands to the
left and to the right of them and in fact in Python more so than the world of C there's actually formal style
conventions not only within cs50 have we had a style guide on the course's website for instance that just dictates
how you should write your code so that it looks like everyone else's in the python Community they take this one step
further and there's an actual standard known as PEP 8 whereby you don't have to adhere to it but generally speaking in the real world
someone would reprimand you would reject your code if you're trying to contribute it to another project if you don't adhere
to these standards so while you could be lax with some of this white space do make things readable and that's Python's
theme for the code to be as readable as possible all right let's take a look at a couple of other constructs before
transitioning to some actual code this of course in scratch was a loop meowing forever in C the closest we could get
was doing something while true because true never changes so it's sort of a simple way of just saying do this
forever in Python it's pretty much the same thing but a couple small differen is here the parentheses are gone the
colon is there the indentation is there no semicolon and there's one other subtle difference what do you see true
is capitalized just because both true and false are Boolean values in Python but you got to start capitalizing them
just because all right how about a loop like this where you repeat something a finite number of times like meowing
three times in C we could do this a few different ways there's this very mechanical way where you initialize a
variable like I to zero you then use a while loop and check if I is less than three the total number of times you want
to meow then you print what you want to print you increment I using this syntax or the longer more verbose syntax with
plus equals or whatnot and then you do it again and again and again in Python you can do it functionally the
same way same ideas slightly different syntax you just don't bother saying what type of variable you want python will
infer from the fact that there's a zero right there you don't need the parenthesis you do need the colon you do
need the indentation you can't do the i++ but you can do this other technique as we could have done in C as well how
else might we do this though too well it turns out in C we could do something like this which again sort of cryptic
at first glance became perhaps more familiar where you have an initialization a conditional and then an
update that you do after each iteration in Python there isn't really an analog there is no analog in Python where you
have the parentheses and the multiple semicolons in the same line instead there is a for Loop but it's meant to
read a little more like English for I in 0 1 and 2 so we'll see in a bit these square brackets represent an array now
to be called a list in Python so lists in Python are more like linked lists than they are arrays in that they can grow and shrink more
on that soon so this just means for I in the following list of three values and on each iteration of this Loop python
automatically for you it first sets I to zero then it sets I to one then it sets I to two so that you effectively do
things three times but this doesn't necessarily scale as I've drawn it on the board suppose you took this at face
value as the way you iterate some number of times in Python using a for loop at what Point does this approach perhaps
get bad or bad design let me give folks just a moment to think yeah in the
back sure if you don't know how many times you want to Loop or iterate you
can't really create a hardcoded list like that of 0 1 2 other
thoughts yeah if you're iterating a large number of times this list is going to get longer and longer and you're just
kind of stupidly going to be typing out like comma 3 comma 4 comma 5 comma dot dot dot comma 99 comma 100 I mean your
code would start to look atrocious eventually so there is a better way in Python there is a function or
technically a type called range that essentially magically gives you back a range of values from zero on up to but
not through a value so the effect of this line of code for I in the following range essentially hands you back a list
of three values thereby letting you do something three times and if you want to do something 99 times instead you of
course just change the three to a 99 yeah
a really good question can you start counting at a higher number so not zero which is the implied default but
something larger than that yes so it turns out the range function takes multiple arguments not just one but
maybe two or even three that allows you to customize this Behavior so you can customize where it begins you can
customize the increment by default it's one but if you want to do every two values for like evens or odds you could
do that as well and a few other things and before long we'll take a look at some python documentation that will
become your authoritative source for answers like that like what can this function do other questions on this thus
far seeing none so what else might we uh compare and contrast here well in the world of C recall that we had a whole
bunch of built-in data types like these here um bool and char and double and float and so forth string which happened
to come from the cs50 library but uh the language C itself certainly understood the idea of strings because
the backslash zero the support for %s in printf that's all native built into C not a cs50 simplification all we did
and revealed as of a couple of weeks ago is that string this data type is just a synonym a typedef for char star which
is part of the language natively in Python now this list actually gets a little shorter at least for these common
primitive data types still going to have bools we're going to have floats ints and we're going to have strings but we're
going to call them strs and this is not a cs50 thing from the library str S T R is in fact a data type in Python
that's going to do a lot more than strings did for us automatically in C ints and floats meanwhile um don't need
the corresponding longs and doubles because in fact among the problems python solves for us too ints can get as
big as you want integer overflow is no longer going to be an issue per week one the language solves that for us floating
point imprecision unfortunately is still a problem that remains but there are libraries code that other people
have written as we briefly discussed in weeks past that allow you to do scientific or financial Computing using
libraries that build on top of these data types as well so there's other data types too in Python which we'll see
actually gives us a whole bunch of more power and capability things called ranges like we just saw lists like I
called out verbally with the square brackets things called tuples for things like X comma y or latitude comma
longitude uh dictionaries or dicts which allow you to store keys and values much like our hash tables from last time um and
then sets in the mathematical sense where they filter out duplicates for you and you can just put a whole bunch of
numbers a whole bunch of words or whatnot and the language via this data type will filter out duplicates for
you now there's going to be a few functions we give you this week and Beyond training wheels that we're then
going to very quickly take off just because as we'll see today they just simplify the process of getting user
input correctly without accidentally writing buggy code just when you're trying to get hello world or something
similar to work and we'll give you functions not as long as this list in C but a subset of these get_float
get_int and get_string that'll automate the process of getting user input in a way that's more resilient
against potential bugs but we'll see what those bugs might be and the way we're going to do this is similar in
spirit to C instead of doing include cs50.h like we did in C you're going to now start saying import cs50 python
supports similar to C libraries but there aren't header files anymore you just use the name of the library in
Python and if you want to import cs50's functions you just say import cs50 or if you want to be more precise and not just
import the whole thing which could be slow if you've got a really big library with a lot of functionality in it you
can be more precise and say from cs50 import get_float from cs50 import get_int from cs50 import get_string or you
can just separate them by commas and import three and only three things from a particular Library like ours but
starting today and onward we're going to start making much more heavy use of libraries code that other people wrote
so that we're no longer Reinventing the wheel we're not making our own linked lists our own trees our own dictionaries
we're going to start standing on the shoulders of others so that you can get real work done so to speak Faster by
building your software on top of others code as well all right so that's it for the
syntactic tour of the language and the sort of core features soon we'll transition to application thereof but
let me pause here to see if there's any questions on syntax or Primitives or otherwise
oh yes in the back sorry say it again why doesn't
python have what kind of operators sorry someone coughed when you said something operators the increment
the oh the increment operator I'd have to check the history honestly python has tended to be a fairly minimalist language
and if you can do something one way the community arguably has tended to not give you multiple ways to do the same
thing syntactically um there's probably a better answer and I'll see if I can dig in and post something online uh to
follow up on that all right so before we transition to now writing some actual code let me go ahead and
consider exactly how we're going to write code in the world of C recall that it's generally been a two-step process
we create a file called like hello.c and then step one make hello step two ./hello or if you think back to week two when we
sort of peeled back the layer of what make was doing you could more verbosely type out the name of the
actual compiler clang in our case command line arguments like -o hello to specify what name you want to create and
then you can specify the file name and then you can specify what libraries you want to link in so that was a very
verbose approach but it was always a two-step approach and so even as you've been doing recent problem sets odds are
you've realized that anytime you want to make a change to your code and try and test
your code again you're constantly doing those two steps moving forward in Python it's going to become simpler and it's
going to be just this the file name is going to change but that might go without saying it's going to be
something like hello.py instead of hello.c and that's just a convention using a different file extension but
there's no compilation step per se you jump right to the execution of your code and so python it turns out is the name
not only of the language we're going to start using it's also the name of a program on a Mac a PC assuming it's been
pre-installed that interprets the language for you this is to say that python is generally described as being
interpreted not compiled and by that I mean you get to skip from the programmer's perspective that
compilation step there is no manual step in the world of python typically of writing your code and then compiling it
to zeros and ones and then running the zeros and ones instead these kind of two steps get collapsed into the illusion of
one whereby you instead are able to just run the code and let the computer figure out how to actually convert it to
something the computer understands and the way we do that is via this old process input and output but now when
you have source code it's going to be passed into an interpreter not a compiler and the best analog of this is
just to perhaps point out that in the human world if you speak or don't speak multiple human languages it can be a
pretty slow process from going from one language to another for instance here are step-by-step instructions for
finding someone in a phone book unfortunately in Spanish unfortunately if you don't speak or read Spanish you
could figure this out you could run this algorithm but you're going to have to do some Googling or you're going to have to
open up literal dictionary from Spanish to English and convert this and the catch with translating any language
human or computer or otherwise is that you're going to pay a price price typically some time and so converting
this in Spanish to this in English is just going to take you longer than if this were already in your native
language and that's going to be one of the subtleties with the world of python yes it's a feature that you can just run
the code without having to bother compiling it manually first but we might pay a price and things might be a little
slower now there's ways to chip away at that but we'll see an example thereof in fact let me transition now to just a
couple of examples that demonstrate how python is not only easier for many people to use perhaps yourselves too cuz
it throws away a lot of the Annoying syntax it shortens the number of lines you have to write and also it comes with
so many darn libraries you can just do so much more without having to write the code yourself so as an example of this
let me switch over here to this image from problem set four which is the Weeks Bridge down by the Charles
River here in Cambridge and this is the original photo pretty clear and it's even higher res if we looked at the
original version of the photo but there have been no filters a la Instagram applied to this photo recall for problem
set 4 you had to implement a few filters and among them might have been blur and blur was probably among the more
challenging of the ones because you had to iterate over all of the pixels you had to take into account what's above
what's below to the left to the right I mean there was a lot of math and arithmetic and if you ultimately got it
it was probably a great sense of satisfaction but that was probably several hours later in a language like
python where there might be libraries that have been written by others on Whose shoulders you can stand we could
perhaps do something like this let me go ahead and run a program or write a program called blur.py here and in
blur.py in VS Code let me just do this let me import from a library not the cs50 library but the Pillow library so
to speak a uh name called Image and another one called ImageFilter then let me go ahead and say let me open the
current version of this image which is called bridge.bmp so the before version of the image will be the result of
calling Image.open quote unquote bridge.bmp and then let me create an after version so you'll see before and
after after equals before.filter of ImageFilter and there is if I read the documentation I'll see that
there's something called a BoxBlur that allows you to blur in box format like one pixel above below left and right so
I'll do one pixel there and then after that's done let me go ahead and save the file as something like out.bmp
that's it assuming this Library works as described I am opening the file in Python using line three and this is
somewhat new syntax in the world of Python. We're going to start making use of the dot operator more, because in the
world of Python you have what's called object-oriented programming, or OOP as a term of art. What this means is that
you still have functions and you still have variables, but sometimes those functions are embedded inside of the variables,
or more specifically inside of the data types themselves. Think back to C: when you wanted to convert something to
uppercase, there was a toupper function that takes as input an argument that's a char. You can pass in any char you
want, and it will uppercase it for you and give you back a value. Well, if that's such a common paradigm,
where uppercasing chars is a useful thing, what the world of Python does is embed into the string data type, or
char if you will, the ability to uppercase any char by treating the char or the string as though it's a struct in
C. Recall that structs encapsulate multiple types of values. In object-oriented programming, in a
language like Python, you can encapsulate not just values but also functionality: functions can now be inside of structs.
But we're not going to call them structs anymore; we're going to call them objects, but that's just a different vernacular.
So what am I doing here? Inside of the Image part of the library there's a function called open, and it takes an argument, the name
of the file to open. Once I have a variable called before, that is a struct, or technically an object, inside of which
is now, because it was returned from this function, a function called filter that takes an argument. The argument here
happens to be ImageFilter.BoxBlur(1), which itself is a function call, but it just returns the filter to use. And then after.save
does what you might think: it just saves the file. So instead of using fopen and fwrite, you just say .save, and
that does all of that messy work for you. So it's just, what, four lines of code total. Let me go ahead and go down
to my terminal window. I won't bother showing ls first, because I have other examples to come. I'm going to go ahead and run python of blur.py.
Nope, sorry, wrong place. Okay, let me go ahead and type ls inside of my filter
directory, which is among the sample code online today. There's only one file there, bridge.bmp; I'm trying to
get these things ready at the same time. Let me rewind and move this code into place. All right, I've gone ahead and moved
this file, blur.py, into a folder called filter, inside of which there's another file called bridge.bmp, which we can
confirm with ls. Let me now go ahead and run python, which is my interpreter and also the name of the language,
on this file. So much like running the Spanish algorithm through Google Translate or something like that as
input to get back English output, this is going to translate the Python language to something this computer, or this
cloud-based environment, understands and then run the corresponding code top to bottom, left to right. I'm going to go
ahead and hit Enter. No error message is generally a good thing. If I type ls, you'll now see out.bmp. Let me go ahead
and open that. And you know what, just to make clear what's really happening, let me blur it even further. Let's make a box
that's not just one pixel around but 10. So let's make that change, and let me rerun it with python
of blur.py. I still have out.bmp. Let me go ahead and open out.bmp and show you first the before, which looks like this.
That's the original. And now, crossing my fingers, four lines of code later, the result of blurring it. So the
library is doing all of the same kind of legwork that you all did for the assignment, but it's encapsulated it all
into a single library that you can then use instead. Those of you who might have been feeling more comfortable might have
done a little something like this. Let me go ahead and open up one other file called
edges.py, and in edges.py I'm again going to import from the Pillow library Image and ImageFilter. Then I'm
going to create a before image that's the result of calling Image.open on the same thing, bridge.bmp.
Then I'm going to run a filter on that using ImageFilter.FIND_EDGES, which is like a
constant, if you will, defined inside of this library for us. And then I'm going to do after.save("out.bmp"),
using the same file name. I'm now going to run python of edges.py. Sorry, user error; we'll see
what syntax error means soon. Let me go ahead and run the code now, edges.py. Let me now open that new file, out.bmp.
Before we had this, and now, which will look familiar especially if you did the more comfortable version of pset 4, we
get this, after just four lines of code. So again, this suggests the power of using a language that's better optimized
for the task at hand. And at the risk of really making folks sad, let's go ahead and reimplement, if we could, problem set
5 real quickly here. Let me go ahead and open another version of this code, wherein I have a C version from
problem set 5, where you implemented a spell checker, loading 100,000-plus words into memory, and then you kept track
of just how much time and memory it took. That probably took a while, implementing all of those functions in
dictionary.c. Let me instead now go into a new file called dictionary.py, and let me stipulate for the sake of
discussion that we already wrote in advance speller.py, which corresponds to speller.c. You didn't write either of
those; recall that for problem set 5 we gave you speller.c. Assume that we're going to give you speller.py, so the onus
on us right now is only to implement dictionary.py. All right, so I'm going to go ahead and define a few
functions, and we're going to see now the syntax for defining functions in Python. I want to first define a
hash table, which was the very first thing you defined in dictionary.c. I'm going to go ahead and say words
gets dict(): give me a dictionary, otherwise known as a hash table. All right, now let me define a function called check, which
was the first function you might have implemented. check is going to take a word, and you'll see in Python the syntax
is a little different: you don't specify the return type, and you use the word def instead, to define. You still specify the
name of the function and any arguments too, but you omit any mention of types, and you do use a colon
and indent. So how do I check if a word is in my dictionary, or in my hash table? Well, in Python I can just say: if word in
words, return True; else, return False. Done with the check function. All right, now I want to
do load. That was the heavy lift, where you had to load the big file into memory. So let me define a function
called load. It takes a string, the name of a file to load, so I'll call that dictionary, just like in C, but with no data
type. Let me go ahead and open a file by using the open function in Python, opening that dictionary in read mode.
This is a little similar to fopen, a function in C you might recall. Then let me iterate over every line in the
file. In Python this is pretty pleasant: for line in file, colon, indent. How now do I get at the current word and strip
off the newline? Because in this file of 140,000 words there is word, backslash n, word, backslash n. All right, well,
let me go ahead and get a word from the current line, but strip off from the right end of the string the newline,
which the rstrip function in Python does for me. Then let me go ahead and add to my dictionary, or hash table, that word.
Done. Let me go ahead and close the file for good measure, and then let me go ahead and return True, because all was well.
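A minimal sketch of check and load as dictated so far. One assumption is flagged in the code: words is created with set() rather than dict(), because Python's dict objects have no add method, so the call to add would otherwise fail.

```python
words = set()  # assumption: a set, since dict objects have no add method


def check(word):
    """Return True if word is among the loaded words."""
    if word in words:
        return True
    else:
        return False


def load(dictionary):
    """Read one word per line from the file named by dictionary."""
    file = open(dictionary, "r")   # read mode, much like fopen in C
    for line in file:
        word = line.rstrip()       # strips trailing whitespace, including the newline
        words.add(word)
    file.close()
    return True
```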
That's it for the load function in Python. How about the size function? This did not take any arguments; it just
returns the size of the hash table or dictionary. In Python I can do that by returning the length of the
dictionary in question. And then lastly, gone from the world of Python are malloc and free. Memory is managed for you, so no
matter what I do, there's nothing to unload; the computer will do that for me. So I give you, in these functions,
problem set 5 in Python. I'm sorry we made you write it in C first, but the implication now is: what are you
getting for free in a language like Python? Well, encapsulated in this one line of code is much of what you wrote
for problem set 5: implementing your array for all of the letters of the alphabet or more, all of the linked lists
that you implemented to create chains to store all of those words. All of that is happening; it's just that someone else in the
world wrote that code for you, and you can now use it by way of a dictionary. And actually, I can change this a little
bit, because add is technically not the right function to use here. I'm actually treating the dictionary as something
simpler, a set, so I'm going to make one tweak. Set, recall, was another data type in Python, but a set just
handles duplicates for you and allows you to throw things into it by literally using a function as simple as add. And I'm
going to make one other tweak here, because when I'm checking a word it's possible it might be given to me in
uppercase or capitalized. It's not necessarily going to come in in the same lowercase format that my dictionary did.
I can force every word to lowercase by using word.lower(), and I don't have to do it character by character; I can do the
whole string at once by just saying word.lower(). All right, let me go ahead and open up a terminal window here and
let me go into first my C version on the left. And actually, I'm going to split my terminal window into two,
and on the right I'm going to go into a version that I essentially just wrote, but it's also available online if you
want to play along afterward. I'm going to go ahead and make speller in C on the left, and note that it takes a moment to
compile. Then I'm going to be ready to run speller on, let's say, the Sherlock Holmes text, which is
pretty big. And then over here let me get ready to run python of speller.py on texts/holmes.txt too. So the syntax is a little
different at the command prompt: on the left I have to compile the code with make and then run it with ./speller; on
the right I don't need to compile it, but I do need to use the interpreter. So even though the lines are wrapping a little
bit here, let me go ahead and run it on the left, and I'm going to count how long it takes, verbally, for demonstration's
sake. One Mississippi, two Mississippi, three Mississippi. Okay, so it's like 3 seconds, give or take.
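For reference, the complete Python dictionary described above, with the set and lowercase tweaks, might be sketched as follows. Two things here are not from the lecture: the with statement replaces the explicit open and close calls, and time_checks is a hypothetical helper that measures elapsed time in code rather than by counting aloud.

```python
import time

words = set()


def check(word):
    """Case-insensitive membership test."""
    return word.lower() in words


def load(dictionary):
    """Load one word per line; the with block closes the file automatically."""
    with open(dictionary) as file:
        for line in file:
            words.add(line.rstrip())
    return True


def size():
    """Number of words loaded."""
    return len(words)


def unload():
    """Nothing to free: Python manages memory for us."""
    return True


def time_checks(text_words):
    """Hypothetical helper: return (misspelled_count, seconds_elapsed)."""
    start = time.perf_counter()
    misspelled = sum(1 for w in text_words if not check(w))
    return misspelled, time.perf_counter() - start
```

Timing the same word list against a C implementation and this one is one way to put numbers on the trade-off, though results depend heavily on the machine and interpreter settings.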
Now running it in Python, keeping in mind I spent way fewer hours implementing a spell checker in Python than you might
have in problem set 5. But what's the trade-off going to be, and what kinds of design decisions do we all now need to
be making consciously? Here we go, on the right, in Python: one Mississippi, two Mississippi, three Mississippi, four
Mississippi, five Mississippi, six Mississippi, seven Mississippi, eight Mississippi, nine Mississippi, ten Mississippi, eleven
Mississippi. All right, so 10 or 11 seconds. So which one is better? Let's go to the group here: which of these
programs is the better one? How might you answer that question based on the demonstration
alone? What do you think? "I think Python is better for the programmer, but C is better for the user." Okay, so
Python, to summarize, is better for the programmer because it was way faster to write, but C is maybe better for the
computer because it's much faster to run. I think that's a reasonable formulation. Other opinions? Yeah. "I think it depends on the
size of what you're dealing with. If it's going to be something that's relatively quick, I might not care that it
takes 10 seconds to do it, and it was way faster to write in Python. Whereas if I'm dealing with something like a
massive data set or something huge, that time is going to really build up, and it might be worth it to put in the
upfront effort in C so the process continually runs faster over a
longer period of time." Absolutely, a really
good answer, and let me summarize: it depends on the workload, if you will. If you have a very large
data set, you might want to optimize your code to be as fast and performant as it can be, especially if you're running that
code again and again. Maybe you're a company like Google and people are searching a huge database all the time. You really
want to squeeze every bit of performance you can out of the computer, so you might want to have someone smart take a
language like C and write it at a very low level. It's going to be painful: they're going to have bugs, they're going
to have to deal with memory management and the like, but if and when it works correctly it's going to be much faster,
it would seem. By contrast, if you have a data set that's big, and 140,000 words is not small, but you don't want to spend
5 hours or 10 hours a week of your time building a spell checker or dictionary, you can instead leverage a
different language with different libraries and build on top of it in order to prioritize the human time
instead. Other thoughts? [student comment, inaudible] That's a perfect segue to exactly the next
point we wanted to make, which is: is there something in between? And indeed there is. I'm oversimplifying what this
language is actually doing. It's not as stark a difference as saying, hey, Python is four times slower than C;
that's not the right takeaway. There are absolutely ways that engineers can optimize languages, as they have already
done for Python, and in fact I've configured my settings in such a way that I've dramatized just how
big the difference is. Python is typically going to be slower than the equivalent C program, but it doesn't have
to be as big of a gap as it is here, because among the features you can turn on in Python is to save some
intermediate results. Technically speaking, yes, Python is interpreting dictionary.py and these other files,
translating them from one language to another, but that doesn't mean it has to do that every time you run the
program. As you propose, you can save, or cache, the results of that process so that the second time and the third time
are actually notably faster. And in fact Python itself, the interpreter, or the most popular version thereof, is itself
implemented in C, so you can make sure that your interpreter is as fast as possible. And what then is maybe
the high-level takeaway? Yes, if you are going to try to squeeze every bit of performance out of your code, and maybe
you're constrained, maybe you have very small devices, maybe it's a watch nowadays or maybe it's a sensor
that's installed in some small format in an appliance or in infrastructure, where you don't have much
battery life and you don't have much size, you might want to minimize just how much work is being done. And so the
faster the code runs, the better, and the better it's going to be if it's implemented in something low level. So C is still very
commonly used for certain types of applications. But again, if you just want to solve real-world problems and get
real work done, and your time is just as valuable as, if not more valuable than, the device you're running it on, long term, you know
what, Python is among the most popular languages as well. And frankly, if I were implementing a spell checker moving
forward, I'm probably starting with Python, and I'm not going to waste time implementing all of that low-level stuff,
because the whole point of using newer, modern languages is to use abstractions that other people have created for you.
And by abstraction I mean something like the dict function that just gives you a dictionary, or hash table, or the
equivalent version that I used, which in this case was a set. All right, any questions then on Python thus
far? No? All right, let's... oh, yeah, in the middle. [student question, inaudible]
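As a small illustration of the saved intermediate results mentioned above: in the standard CPython interpreter, that intermediate form is called bytecode, and for imported modules it is normally cached automatically in a __pycache__ folder. The sketch below produces it by hand with the standard library's py_compile, compile, and dis; the file names are arbitrary and chosen only for the example.

```python
import dis
import os
import py_compile
import tempfile

# Write a tiny source file into a temporary folder (names are arbitrary).
folder = tempfile.mkdtemp()
source = os.path.join(folder, "hello.py")
with open(source, "w") as f:
    f.write('print("hello, world")\n')

# Compile it once; py_compile writes the bytecode file and returns its path.
cached = py_compile.compile(source, cfile=os.path.join(folder, "hello.pyc"), doraise=True)

# The built-in compile() produces the same kind of bytecode in memory.
code = compile('print("hello, world")', "<example>", "exec")
dis.dis(code)  # prints the instructions; the exact listing varies by Python version
```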
Really good question, or observation: could you just compile Python code? Yes,
absolutely. This idea of compiling code or interpreting code is not native to the language itself; it tends to be
native to the conventions that we humans use. So you could actually write an interpreter for C that would read it top
to bottom, left to right, converting it on the fly to something the computer understands, but historically
that's not been the case. C is generally a compiled language, but it doesn't have to be. What Python nowadays is actually
doing is what you described earlier: it technically is, sort of unbeknownst to us, compiling the code, not into
zeros and ones but into something called bytecode, which is this intermediate step that just doesn't take
as much time as it would to recompile the whole thing. And this is an area of research for computer scientists
working in programming languages, to improve these kinds of paradigms. Why? Well, honestly, for you and me, the
programmers, it's just much easier to, one, run the code and not worry about the second step of compiling it all
the time. Why? It's literally half as many steps for me, the human, and that's a nice thing to optimize for. And ultimately,
two, you might want all of the fancy features that come with these other languages, so you should really just be
fine-tuning how you can enable these features, as opposed to shying away from them. And in fact the only time I
personally ever use C is from like September to October of every year, during CS50. In almost every other month
I reach for Python or another language called JavaScript to actually get real work done, which is not to impugn C; it's
just that those other languages tend to be better fits for the amount of time I have to allocate and the types of
problems that I want to solve. All right, let's go ahead and take a five-minute
break here, and when we come back we'll
start writing some programs from scratch. All right, so let's go ahead and start writing some code from the beginning
here, whereby we start small with some simple examples and then build our way up to more sophisticated examples in
Python. What we'll do along the way is first look side by side at what the C code looked like way back in week one or
two or three and so forth, and then write the corresponding Python code at right, and then we'll transition to
focusing just on Python itself. What I've done in advance today is download some of the code from the course's website, my
src6 directory, which contains all of the prewritten C code from weeks past, but it'll also have copies of the
Python code we'll write here together and look at. So first, here is hello.c, back from week zero; this was
version zero of it. I'm going to go ahead and split my code window up here, and I'm
going to create a new file called hello.py. This isn't something you'll typically have to do, laying your
code out side by side, but I've just clicked the little icon in VS Code that looks like two columns, which splits my
code editor into two places so that we can see things, for now, side by side, with my terminal window down below.
All right, now I'm going to go ahead and write the corresponding Python program on the right, which recall was just
print("hello, world"), and that's it. Now, down in my terminal window, I'm going to go ahead and run python of
hello.py, Enter, and voilà, we've got hello.py working. So again, I'm not going to play any further with the C code; it's
there just to jog your memory, left and right. So let's now look at a second version of hello, world from that first
week, whereby if I go and get hello1.c, I'm going to drag that over to the
left here, and now on the right let's modify hello.py to look a little more like this second version
in C. All right, I want to get an answer from the user as a return value, but I also want to get some input from them. So
from cs50 I'm going to import the function called get_string. We're going to get rid of that eventually, but
for now it's a helpful training wheel. And then down here I'm going to say answer = get_string("What's your name? "),
with a space after the question mark but
no semicolon and no data type. And then I'm going to go ahead and print, just like
the first example on the slide, "hello, " plus answer. And now let me go ahead and run this: python of hello.py.
All right, it's asking me what's my name. David. hello, David. But it's worth calling attention to the fact that I've
also simplified further. It's not just that the individual functions are simpler. What is also now glaringly
omitted from my Python code at right, both in this version and the previous version? What did I not bother
implementing? main, yeah. So I didn't even need to implement main. We'll revisit the main function, because having a main
function actually does solve problems sometimes, but it's no longer required. In C you have to have that to
kick-start the entire process of actually running your code, and in fact if you were missing main, as you might have
experienced if you accidentally compiled helpers.c, you would have gotten an error. In Python it's not necessary; you can just jump right
in, start programming, and boom, you're good to go. Especially if it's a small program like this, you don't need the
added overhead or complexity of a main function. So that's one other difference here. All right, there were a few other
ways we could say hello, world. Recall that I could use a format string, so I could put this whole thing in quotes, I
could use this f prefix, and then let me go ahead and run python of hello.py again. You can perhaps see where we're
going with this. Let me type my name, David, and here we go. Okay, that's the mistake that someone identified earlier:
you need the curly braces; otherwise no variables are interpolated, that is, substituted with their actual values. So
if I go back in and add those curly braces to the f-string, now let me run python of hello.py, type in my name, and
there we go, we're back in business. Which one's better? I mean, it depends, but generally speaking, making shorter, more
concise code tends to be a good thing, so stylistically the f-string is probably a reasonable instinct to have.
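The three variations just walked through can be put side by side in a short sketch. The function names are made up for illustration, and the name is passed in as an argument rather than read with get_string so the snippet runs without user input.

```python
def greet_concat(answer):
    return "hello, " + answer    # join two strings with +


def greet_fstring(answer):
    return f"hello, {answer}"    # f prefix plus curly braces: answer is interpolated


def greet_missing_braces(answer):
    return f"hello, answer"      # the mistake: no braces, so the literal word "answer" appears


print(greet_fstring("David"))         # hello, David
print(greet_missing_braces("David"))  # hello, answer
```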
Well, what more can we do besides this? Let me go ahead here and let's get rid of the training wheel altogether,
actually. So, same C code at left. Let me get rid of the CS50 library, which we will ultimately do in a couple of weeks
anyway. I can't use get_string, but I can use a function that comes with Python called input, and in fact this is
a one-for-one substitution, pretty much. There's really no downside to using input instead of get_string; we
implement get_string just for consistency with what you saw in C. python of hello.py. What's your name?
David. It still works the same. So gone are the CS50-specific training wheels, but we're going to bring them
back shortly just to deal with integers or floats or other values too, because it's going to make our lives a little
simpler with error checking. All right, any questions before we now pivot to revisiting other examples from week one,
but now in Python? All right, let me go ahead and open up now, let's say, calculator0.c,
which was one of the first examples we did involving math and operators like that, as well as functions like get_int.
Let me go ahead and create a new file now called calculator.py at right, so that I have my C
code at left still and my Python code at right. All right, let me dive into a translation of this code into Python. I
am going to use get_int from the CS50 library, so let me import that. I'm going to go ahead now and get an int from the
user, so x = get_int, and I'll ask them for an x value just like we did weeks ago. No need to specify a semicolon,
though, or an int for the x; it will just figure it out. y is going to get another int via "y: ", and then down
here I'm going to go ahead and say print of x + y. So this is already a bit new. Recall the C version required that I use
this format string as well as printf itself. Python's just a little more user friendly: if all you want to do is print
out a value like x + y, just print it. Don't fuss with any percent signs or format codes. It's not printf; it's
indeed just print now. All right, let me go ahead and run python of calculator.py, Enter, and just do a quick sample: 1 +
2 indeed equals 3. As an aside, suppose I had taken a different approach, importing the whole cs50 library.
Functionally it's the same, and you're not going to notice any performance impact here; it's a small library. But notice
what does not work now, whereas it did work in C. python of calculator.py, Enter: we see our first traceback, deliberately,
here. So a traceback is just a term of art that says, here is a trace back through all of the functions that just
got executed. In the world of C you might call this a stack trace, stack being the operative word. Recall that when we
talked about the stack and the heap, the stack, like a stack of trays, was all of the functions that might get called one
after the other: we had main, we had swap, then swap went away, and then main finished, recall. So here's a traceback of
all the functions or code that got executed. There aren't really any functions other than my file itself;
otherwise there'd be more detail. But even though it's a little cryptic, we can perhaps infer from the output here,
NameError, that it's something related to the name of something: name 'get_int' is not defined. And this of course happens on
line three over there. All right, so why is that? Well, Python essentially allows us to namespace the functions that come
from libraries. There was a problem in C: if you were using the CS50 library, and thus had access to get_int, get_string,
and so forth, you could not use another library that had the same function names. They would collide, and the compiler
would not know how to link them together correctly. In Python and other languages like JavaScript and Java, you
have support for what are effectively called namespaces: you can isolate variables and function names to
their own namespace, like their own container in memory. What this means is that if you import all of cs50, you have to
say that the get_int you want is inside the cs50 library. So just like with the image blurring and the image edges
before, where I had to specify Image dot and ImageFilter dot, similarly here I am specifying with a dot operator, albeit a
little differently, that I want cs50.get_int in both places. And now if I rerun python of calculator.py, 1 and
2, now we're back in business. Which one is better? Generally speaking, it depends on just how many functions
you're using from the library. If you're using a whole bunch of functions, just import the whole thing; if you're only
using maybe one or two, import them line
by line. All right, so let's go ahead and make a little tweak here. Let's get rid
of this library and take this training wheel off too, as quickly as we introduced it, though for problem set
6 you'll be able to use all of these same functions. Suppose I get rid of this and just use the input function, just
like I did when replacing get_string earlier. Let me go ahead now and run this version of the code: python of
calculator.py. Okay, how about 1 and 2? 1 + 2 should equal 3. Huh. All right, obviously incorrect. Can anyone explain what just
happened? Based on instincts, what just happened here? Yeah, sure.
[student answer, inaudible] Yeah, exactly: Python is treating both x and y as strings,
which is actually what the input function returns, and so plus is now being interpreted as concatenation, as we
defined it earlier. So x + y isn't x plus y mathematically but in terms of string joining, just like in Scratch. So that's
why we're getting 12, or really "1" "2", which isn't itself a number; it too is another string. So we somehow need to
convert things, and we didn't have this ability quite as easily in C. We did have the atoi function, ASCII to integer,
which did allow you to do this. The analog in Python is actually just to do a cast, a typecast, using int. So just
like in C you can use the keyword int, but you use it a little differently. Notice that I'm not doing parenthesis, int, close
parenthesis before the value; I'm using int as a function. So indeed, in Python int is a function and float is a
function that you can pass values into to do this kind of conversion. So now if I run python of calculator.py, 1 and
2, now we're back in business and getting the answer of 3. But there's kind of a catch here. There's always
going to be a trade-off. That sounds amazing, that it just works in this way and we can throw away the CS50 library
already. But what if the user accidentally types, or maliciously types, something like cat instead of a number? Damn.
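The behavior described here can be reproduced without typing anything at a prompt. In this sketch the strings "1" and "2" stand in for what input would hand back if the user typed 1 and 2.

```python
x = "1"   # what input() returns if the user types 1: a str, not an int
y = "2"

print(x + y)            # 12  (+ concatenates two strings)
print(int(x) + int(y))  # 3   (int() converts each string to a number first)

# Passing something that is not a number makes int() raise an exception:
# int("cat")  ->  ValueError: invalid literal for int() with base 10: 'cat'
```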
Well, there's one of these tracebacks. Now my program has crashed. This is
similar in spirit to the kinds of segfaults
that you might have had in C, but they're not segfaults per se. It doesn't necessarily relate to memory this time; it relates to
actual runtime values not being as
expected. So this time it's not a NameError; it's a ValueError: invalid literal
for int() with base 10: 'cat'. So again, it's written for a programmer more than a
typical person, because the language here is pretty arcane, but let's try to interpret it. Invalid literal: a
literal is just something someone typed. For int, which is the function name. With base 10: it's defaulting to
decimal numbers. cat is apparently not a decimal number. It doesn't look like it, therefore it can't be treated like it,
therefore there's a ValueError. So what can we do? Unfortunately, you would have to somehow catch this error, and the only
way to do that in Python really is by way of another feature that C did not have, namely what are called exceptions.
An exception is exactly what just happened: NameError, ValueError. They are things that can go wrong when your
Python code is running that aren't necessarily going to be detected until you run your code. So in Python, and in
JavaScript and Java and other more modern languages, there's this ability to try to do something, except if
something goes wrong. And in fact I'm going to introduce a bit of syntax here, even though we won't have to use it
much just yet. Instead of just blindly converting x to an int, let me go ahead and try to do that, and if there's an
exception, go ahead and print something like "that is not an int!" and then
do something like exit right there. And let me go ahead and do the same here: let me try to get y, except if
there's an exception, then let me say again, "that is not an int!", and then I'm going to
exit from there too. Otherwise I'll go ahead and print x + y. If I run python of calculator.py now... whoops.
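A sketch of the try and except pattern just dictated, written as small functions so it can run without a prompt. Both function names are hypothetical. The first prints a message and exits, as in the walkthrough, but catches ValueError specifically rather than every exception. The second is a guess at how a retrying helper might look; it is not the CS50 library's actual code.

```python
import sys


def to_int(text):
    """Convert text to an int, or print a message and exit if it is not one."""
    try:
        return int(text)
    except ValueError:
        print("That is not an int!")
        sys.exit(1)


def get_int_sketch(prompt, read=input):
    """Hypothetical retrying variant: keep asking until int() succeeds."""
    while True:
        try:
            return int(read(prompt))
        except ValueError:
            pass  # not an int, so prompt again


# Usage mirroring calculator.py:
# x = to_int(input("x: "))
# y = to_int(input("y: "))
# print(x + y)
```

The read parameter exists only so the loop can be exercised with canned answers instead of a keyboard.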
uh that oh forgot my close quote sorry all right so close quote python of calculator. Pi uh one and two still work
but if I try to type in something wrong like cat now it actually detects the error so what is the cs50 library in
Python doing it's actually doing that try and accept for you because suffice it to say otherwise your programs for
something simple like a calculator start to get longer and longer so we factored that kind of logic out to the cs50 get
int function and get float function but underneath the hood they're essentially doing this try accept but they're being
a little more precise they're detecting a specific error and they are doing it in a loop so that these functions will
get executed again and again again in fact the best way to do this is to say except if there's a value error then
print that error message out to the user and again let's not get to into the weeds here with this feature we've
already put it into the cs50 library but that's why for instance we bootstrap things by just using these functions out
of the box all right let's do something more with our calculator here how about this in the world of C we had another
version of this code which actually did division of numbers not
just the addition here so let me go ahead and close the C version and let's focus only on Python now doing some of
these same lines of code but I'm going to go ahead and just assume that the user is going to cooperate and use
proper input so from cs50 import get_int that'll deal with any errors for me x gets get_int ask the user for an int x
y equals get_int ask the user for an int y and then let's go ahead and do this let's declare a variable called z set it
equal to x / y then let's go ahead and print z still no need for a format string I can just print out the
variable's value let me go ahead and run python of calculator.py let me do 1 and 10 and I get 0.1 what did I get in C
though if you think back what would have happened in C zero yeah we would have gotten zero in
C but why when you divide one int by another and those ints are like 1 and 10 respectively it will give you
what it will give you an integer back and unfortunately 0.1 the integer part of it is indeed
zero so this was an example of truncation so truncation was an issue in C but it would seem as though this is no
longer a problem in Python insofar as the division operator actually handles that for us as an aside if you want the
old behavior because it actually is sometimes useful for rounding or flooring values you can actually use
two slashes and now you get the C behavior so that now 1 divided by 10 is zero so you don't give up that capability but
at least it does a more sensible default most people especially new programmers when dividing one value by another would
want to get 0.1 not zero for reasons that indeed we had to explain weeks ago but what about another problem we had
with the world of floats before whereby there is imprecision let me go ahead and somewhat cryptically print out the value
of z as follows I'm going to format it using an f-string and I'm going to go ahead and format not just z because this
is essentially the same thing notice this if I do python of calculator.py 1 and 10 I get by default just one
significant digit but if I use this syntax in Python which you won't have to use often I can actually see like
I did before in C 50 digits after the decimal point so now let me rerun python of calculator.py 1 and 10 and
let's see if floating-point imprecision is still with us unfortunately it is and you can see as
much here the f-string the format string is just showing us now 50 digits instead of the default one so we've not solved
all problems but we have solved at least some all right before we pivot away from a mere calculator any questions now on
syntax or concepts or the like [Music] yeah how do you what oh
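To make the division discussion concrete, here is a minimal sketch. The names x, y and z follow the lecture; the fixed values 1 and 10 are an assumption standing in for user input so that the snippet runs without a prompt. It contrasts `/`, `//` and an f-string that asks for 50 digits after the decimal point. One caveat worth knowing: `//` floors toward negative infinity, so it matches C's truncating integer division only when the result is not negative.

```python
# Stand-ins for the two integers the lecture reads from the user (assumption).
x = 1
y = 10

z = x / y          # true division: produces a float in Python 3
floored = x // y   # floor division: rounds down to an integer

print(z)           # 0.1
print(floored)     # 0
print(-1 // 10)    # -1, whereas C's integer division would give 0

# Ask for 50 digits after the decimal point to expose floating-point imprecision.
digits = f"{z:.50f}"
print(digits)
```

The last line prints a value that starts with 0.1 followed by zeros and then drifts into other digits, which is the imprecision the lecture points out.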
how do you comment really good question if you're using double slash for division with flooring or truncation
like I described how do you do a comment in Python this is a comment and the convention is actually to use a complete
sentence like with a capital T here you don't need a period unless there's multiple sentences and technically it
should be above the line of code by convention so you would use a hash symbol instead good question haven't
seen those yet all right let's go ahead and make something else here how about let me go ahead and open up for instance
an example called points1.c which we saw a few weeks back and let me go ahead on the other side and create a
file called points.py this was a program recall that asked the user how many points they lost on the first
assignment and then it went ahead and just printed out whether they lost fewer points than me because I lost two if you
recall the photo more points than me or the same points as me let me go ahead and zoom out so we can see a bit more of
this and let me now on the top right here go about implementing this in Python so I want to First prompt the
user for some number of points so from cs50 let's import get_int so it handles the error checking let's then do points
equals get_int and ask the user how many points did you lose question mark then let's go ahead and say if points less
than two which was my value print you lost fewer points than me otherwise else if points greater than two go
ahead and print you lost more points than me else let's go ahead and handle the final
scenario which is you lost the same number of points as me before I run this does anyone want to point out a mistake
I've already made yeah yeah so else if in C is actually now elif in Python it's a single word so
let me change this to elif and now cross my fingers python of points.py suppose you lost three points on some assignment
you lost more points than my two if you only lost one point you lost fewer points than me so the logic is the same
but notice the code is much tighter we did in 10 total lines what was 24 lines because we've thrown away a lot of
the syntax the curly braces are no longer necessary the parentheses are gone the semicolons so this is why it
just tends to be more pleasant pretty quickly using a language like this all right let's do one other example here
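The points comparison just described can be sketched as follows. This is not the lecture's file verbatim: the helper name `compare` and the constant name `MINE` are hypothetical, and the logic is wrapped in a function (instead of prompting with get_int) so it can be run and checked without typing anything.

```python
MINE = 2  # the instructor's own number of lost points in the lecture's example


def compare(points):
    """Return the message that corresponds to the given number of lost points."""
    if points < MINE:
        return "You lost fewer points than me"
    elif points > MINE:  # a single word, elif, where C writes else if
        return "You lost more points than me"
    else:
        return "You lost the same number of points as me"


# Try one value from each branch.
for sample in (1, 2, 3):
    print(sample, compare(sample))
```

No curly braces, parentheses around the conditions, or semicolons are needed; the colons and indentation carry the structure.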
in C recall that we were able to determine the parity of some number if something is even or odd well in Python
let me go ahead and create a file called parity.py and let's look for a moment at the C version at left here was the
code in C that we used to determine the parity of a number and really the key takeaway from all these lines was just
the remainder operator and that one is still with us so this is a simple demonstration just to make that point if
in Python I want to determine whether a number is even or odd well let's go ahead and from cs50 import get_int then
let's go ahead and get a number like n from the user using get_int and ask them for n and then let's go ahead and say if
n percent sign 2 equals 0 then let's go ahead and print quote unquote even else let's go ahead and print out
odd but before I run this anyone want to instinctively even though we've not talked about this point out a mistake
here what did I do wrong yeah so double equals again so even though some of the stuff is changing some of the ideas
are the same so this too should be a double equal sign because I'm comparing for equality here and why is this the
right math well if you divide a number by two it's either going to have zero or one as a remainder and that's going to
determine if it's even or odd for us so let's run python of parity.py type in a number like 50 and hopefully we get
indeed even so again same idea but now we're down to eight lines of code instead of the 20 well let's now do
something a little more interactive and a little representative of tools that actually ask the user questions in C
recall that we had this agreement program agree.c and then let's go ahead and implement a corresponding version in
Python in a file called agree.py and let's look at the C version first on the left we used get_char here and then we
used the double vertical bars to check if c is equal to capital Y or lowercase y and then we did the same thing for n
for no and so let's go over here and let's do from cs50 import get okay get_char is not a thing
and this here is another difference with Python there is no data type for individual characters you have strings
strs and honestly those are fine because if you have a str that's just one character for all intents and
purposes it is just a single character so it's just a simplification you don't have to think as much you don't have to
worry about double quotes single quotes in fact in Python you can use double quotes or single quotes so long as
you're consistent the single quotes do not mean something different like they do in C so
I'm going to go ahead and use get string here although strictly speaking I could just use the input function as we saw
before I'm going to get a string from the user that asks them this get_string quote unquote do you agree like a
little checkbox or interactive prompt where you have to say yes or no you want to agree to the following terms or
whatnot and then let's translate the conditionals to Python now too so if s equals equals quote unquote capital Y or s
equals equals lowercase y let's go ahead and print out agreed just like in C elif s equals equals capital N or s equals
equals little n let's go ahead then and print out not agreed and you can already see perhaps one of the differences here
too is Python is a little more English-like in that you just literally use the English word or instead of the two
vertical bars but it's ultimately doing the same thing can we simplify this code a bit though this would be a little
annoying if we wanted to add support not just for big Y and little y but yes or big yes or little yes or big Y lowercase
e capital S right there's a lot of permutations of yes or just y that we ideally should tolerate otherwise the
user is going to have to type exactly what we want which isn't very user friendly any intuition for how we could
logically even if you don't know how to do it in code make this better yeah nice yeah we saw an example of a
list before just zero one two why don't we take that same idea and ask a similar question If s is in the following list
of values capital Y or little y or heck let me add to the list now yes or maybe all capital YES and it's going to get a
little annoying admittedly but this is still better than the alternative with all the ors I could do things like this
and so forth there's a whole bunch more permutations but let's leave this alone and let me just go into here and change
this too if s is in the following list of capital N or little n or no and let's just not worry about the weird
capitalizations there for now let's go ahead and run this python of agree.py do I agree y okay how about yes all
right how about big YES okay that does not seem to work notice it did not say agreed and it did not say not agreed it
didn't detect it so how can I do this well you know what I could do I don't really need the uppercase and
lowercase let me tighten this list up a little bit and why don't I just force s to be lowercase s.lower recall whether
it's one character or more is a function built into strs now strings in Python that forces the whole thing to
lowercase so now watch what I can do python of agree.py little y that works big Y that works big YES that works big
Y little e big S that also works so we've now handled in one fell swoop a whole bunch more logic and you know what
we can tighten this up a bit here's an opportunity in Python for slightly better design what have I done in here
that's a little redundant does anyone see an opportunity to eliminate a redundancy doing
something more times than you need to yep we could move the s.lower above
notice that I'm using s.lower twice but it's going to give me the same answer both times so I could do a couple of
things here I could first of all get rid of this lower and get rid of this lower and then above this maybe I could do
something like this s.lower but I can't just do this because that throws the value away it does the work but it doesn't
convert the string itself it's going to return a value so I have to say s equals s.lower I could do that or honestly I
can chain these things together and this is not something we saw in C if get_string returns a string and strings have
functions like lower in them you can chain these functions together like this and do dot this dot that dot this other
thing and eventually you want to stop because it's going to be crazy long but this is reasonable still fits on the
screen it's pretty tight it does in one place what I was doing in two so I think that's okay let me go ahead and do
python of agree.py one last time let's try it one last time and it's still working as intended also if I tried
those other inputs as well yeah [Music] question cover all the functions lower
or all the functions upper together let me summarize could we
handle uppercase and lowercase together in some form I'm actually doing that already I just have to pick a lane I
have to either be all lowercase in my logic or all uppercase and not worry about what the human types in because
no matter what the human types in I'm forcing their input to lowercase and then I am using a lowercase list of
values if I want to flip that fine I just have to be self-consistent but I'm handling that already yeah
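The final shape of the agreement check can be sketched like this. The function name `classify` is hypothetical and exists only so the logic can be exercised without a prompt; in the lecture the same lowercasing is chained directly onto the prompt, along the lines of get_string("Do you agree? ").lower().

```python
def classify(answer):
    """Map a user's reply to "Agreed", "Not agreed", or None if unrecognized."""
    # Force the reply to lowercase once, so the lists below only need
    # lowercase entries. lower() returns a new string; it does not modify answer.
    s = answer.lower()
    if s in ["y", "yes"]:
        return "Agreed"
    elif s in ["n", "no"]:
        return "Not agreed"
    return None


# Mixed capitalizations all land in the same branch.
for reply in ["y", "Y", "YES", "YeS", "no", "maybe"]:
    print(reply, classify(reply))
```

Picking one lane (all lowercase here) keeps the lists short; the same idea works with upper() and uppercase lists, as long as the two sides are consistent.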
[Music] a really good loaded question are strings no longer an array of characters
conceptually yes underneath the hood no they're a little more sophisticated than that because with strings you have a few
changes not only do they have functions built into them because strings are now what we call objects in what's called
object-oriented programming and we're going to keep seeing examples of this dot operator they are also immutable so
to speak i m m u t a b l e immutable means they cannot be changed which means unlike C you can't go into a string and
change its individual characters you can make a copy of the string that makes a change but you can't change the original
string itself this is both a little annoying maybe sometimes but it's also pretty protective because you can't do
screw-ups like I did weeks ago when I was trying to copy s and call it t and then one affected the other Python
underneath the hood is handling all of the memory management and the pointers and all of that there are no pointers in
Python if that wasn't clear all of that pain if you will all of that power is now handled by the language itself
not by us the programmers all right so let's introduce maybe some loops like we've been in
the habit of doing let me open up meow.c which was an example in C just meowing a bunch of times textually let me create
a file called meow.py here on the right and notice on the left this was correct code in C but it was kind of poorly
designed why because it was a missed opportunity for a loop why say something three times when you can say it just
once so in Python let me do it the poorly designed way first let me print out meow and like I generally
should not let me copy paste it three times run python of meow.py and it works okay but not good practice so let me go
ahead and improve this a little bit and there's a few ways to do this if I wanted to do this three times I could
instead do something like this for i in range of three recall that that was the better version rather than
arbitrarily enumerate numbers yourself let me go ahead and print out quote unquote meow now if I run python of meow.py
still seems to work so it's a little tighter and my god like programs can't really get much shorter than this we're
down to two lines of code no main function no gratuitous syntax let's now improve the design further like we did
in C by introducing a function called meow that actually does the meowing so this was our first abstraction recall
both in Scratch and in C let me focus now entirely on the Python version here let me go ahead and first define a
function let me first go ahead and do this for i in range of three let's assume for the
moment that there's a meow function that I'm just going to call let's now go ahead and define using the def keyword
which we saw briefly with the speller demonstration a function called meow that takes no arguments and all it does
for now is print meow let me now go ahead and run python of meow.py enter huh one of those tracebacks so this is
another name error and again name meow is not defined what's your instinct here even
though we've not tripped over this yet in Python where does your mind go here [Music]
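The traceback just described can be reproduced in miniature. In this sketch the try and except wrapper and the `message` variable are additions that are not in the lecture's file; they are there only so the script keeps running and the error text can be inspected.

```python
message = None

try:
    for i in range(3):
        meow()  # at this point the interpreter has not yet reached def meow
except NameError as error:
    message = str(error)


def meow():
    print("meow")


print(message)  # name 'meow' is not defined
meow()          # works now, because the def above has been executed
```

The interpreter reads the file top to bottom, so a name only exists once the line that creates it has run.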
yeah perfect as smart as Python seems to be it still makes certain assumptions and if it hasn't
seen a name yet it just doesn't exist so if you want it to exist we have to be a little clever here I could just
flip it around like this but this honestly isn't particularly good design why because now if you the reader
of your code whether you wrote it or someone else you kind of have to go fishing now like where does this program
begin and even though yes it's obvious that it begins on line four logically like if the file were longer
you're going to be annoyed fishing visually for the right lines of code so let's reintroduce main and indeed this
would be a common paradigm when you want to start having abstractions in your own functions just put your own code in main
so that one you can leave it up top and two you can solve the problem we just encountered so let me define a function
called main that has that same loop meowing three times but now watch what happens let me go into my terminal and
run python of meow.py enter nothing all
right investigate this what could explain this symptom I have not told you the answer yet so all you have is your
instinct assuming you've never touched python before What might explain this symptom where nothing is meowing
yeah yeah I didn't run the main function so in C this is functionality you get for free you have to have a main
function but heck so long as you make it it will be called for you in Python this is just a convention to create a main
function borrowing a very common name for it but if you want to call that main function you have to do it so this looks
a little weird admittedly that you have to call your own main function now and it has to be at the bottom of the file
because only once the interpreter gets to the bottom of the file have all of your functions been defined higher up
but this solves both problems it keeps your code that's the main part of your code at the very top of the file so it's
just obvious to you and a TF or any reader in the future where the program logically starts but it also ensures
that main is not called until everything else main included has been defined so this is another perfect example of when you're
learning a new language for the first time you're not going to have heard all the answers before just apply some logic
like all right what could explain this symptom and start to infer how the language does or doesn't work if I now
go and run this python of meow.py now we're back in business and just so you have seen it there is a quote unquote
better way of doing this that solves different problems that we are not going to encounter certainly in these initial
days uh typically you would see in online tutorials or books something that looks like this where you actually have
a weird conditional with multiple underscores that's functionally the same thing but it solves problems with
libraries if we ourselves were implementing a library or something similar in spirit but we're going to
keep things simpler and just write main at the bottom because we're not going to encounter that problem just yet all
right let's make one change to this just to show how it's done in C the last version of meow
also took arguments to the function meow so suppose that I want to factor this out and I want to just call
meow as a better abstraction where I just say meow this number of times and I figure out how many times by just like
putting in the number three or using get_int or something like that to figure out how many times to say meow well now I have
to define inside my meow function an input let's call it n and then use that by doing this for i in range of n let
me go ahead and print out meow that many times so again the only thing that's different from C is we don't bother
specifying return types for any of these functions and we don't bother specifying the type of our arguments or our
variables so same ideas simpler in some sense we're just throwing away keystrokes all right let me run this one
final time python of meow.py and we still have the same program all right let me pause here any questions and I know this
is going fast but hopefully the C code is still somewhat familiar
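Putting the pieces of this example together, a minimal sketch of meow.py might look like the following. The lecture simply writes main() on the last line; the conditional with the underscores used here is the "better way" mentioned in passing, and it behaves the same when the file is run directly.

```python
def main():
    # The main part of the program stays at the top of the file.
    meow(3)


def meow(n):
    # No return type and no parameter type are declared, unlike in C.
    for i in range(n):
        print("meow")


# main is only called here, after every function above has been defined.
if __name__ == "__main__":
    main()
```

Without that final call nothing would print at all, because defining main does not run it.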
yeah good question is there any difference between global and local variables short answer yes and we would
run into that same problem if we declare a variable in one function another function is not going to have access to
it we can solve that by putting variables globally but we don't have all of the features we had in C like
there's no such thing as a constant in Python the mentality in the Python community is if you don't want some
value to change don't touch it like just don't screw up so there's trade-offs here too some languages are stronger or
more defensive than that but that too is part of the mindset with this particular language
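As a rough illustration of that answer, the sketch below uses hypothetical names throughout (`SETTING`, `secret`, and the three functions). It shows a module-level name being read inside a function, a local name that another function cannot see, and the fact that writing a name in capitals is only a convention and does not stop reassignment.

```python
SETTING = "foo"  # module-level (global) name; capitals signal "please don't change"


def read_global():
    # Reading a global from inside a function works without any extra syntax.
    return SETTING


def make_local():
    secret = 42  # local to make_local
    return secret


def read_other_functions_local():
    # secret belongs to make_local, so it is not visible here.
    try:
        return secret
    except NameError:
        return None


print(read_global())                 # foo
print(make_local())                  # 42
print(read_other_functions_local())  # None

SETTING = "bar"        # nothing prevents this; there is no const keyword
print(read_global())   # bar
```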
yeah oh sorry where's say it louder [Music] that is an amazing segue let's come to
that in just a moment because we're going to recreate also that Mario example where we had like the um
question marks for the coins and the vertical bars so let's come back to that in a second and your
[Music] question correct strings are immutable anytime you seem to be modifying it as
with the lower function you're getting back a copy so it's taking a little more memory somewhere but you don't have to
deal with it Python's doing that for you [Music] say it again you don't need
what you don't free anything so if you weren't a big fan over the past couple of weeks of malloc or free or memory or
addresses or all of those low-level implementation details python is the language for you because all of that is
handled for you automatically Java does the same uh JavaScript does the same yeah
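A small sketch of what immutability means in practice, using made-up values: lower() hands back a new string and leaves the original alone, and trying to change a character in place raises an error. The variable name `error_message` is an addition for illustration.

```python
s = "YES"
t = s.lower()  # returns a new, lowercased copy

print(s)  # YES  (the original is untouched)
print(t)  # yes

error_message = None
try:
    s[0] = "y"  # strings cannot be changed character by character
except TypeError as error:
    error_message = str(error)

print(error_message)
```

Nothing here has to be freed; the interpreter reclaims strings that are no longer in use.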
[Music] how do you define a global variable if there's no main function in Python
global variables by definition always need to be outside of main as well so that's not a problem if I wanted to have
a variable that's outside of and therefore global to all of these like global actually don't use the word
global that's a special word in Python variable equals foo just as an arbitrary string value that a computer
scientist would typically use that is now global there are some caveats though as to how you access that but let's come
back to that another time but that problem is solvable too all right so let's go ahead and do this to come back
to the question about the print command let me go ahead and create a file now called
mario.py won't bother showing the C code anymore we'll focus just on the new language here but recall that
in Mario we wanted to first do something like this this was a random screen from the side scroller version one
of Super Mario Brothers and we just want to print like three hashes to represent those three blocks well in Python we
could do something like this for i in range of three go ahead and print out quote unquote hash
and I think this is pretty straightforward python of mario.py we get our three hashes you could imagine
parameterizing this now though and getting actual user input so let's do that let me go up here and let me go and
say from cs50 import get_int and then let's get the input from the user so it actually is a value n like all right
get_int the height of the column of bricks that you want to do and then let's go ahead and print out n hashes
instead of three so let me run this let's print out like five hashes okay 1 2 3 four five that seems to work too and
it's going to work for any positive value but it's not going to work for how about -1 that just doesn't do anything
but that seems okay but also recall that it's not going to work if the user types in something weird like cat oh sorry it
is going to work if the user types in something weird like cat why we're using cs50's get_int function which is
handling all of those headaches for us but what if the user indeed types a negative number we're tolerating that so
that was the bug I wanted to highlight it would be nice to reprompt them and reprompt them and in C what was the
programming construct we used when we wanted to ask the user a question and then if they didn't cooperate prompt
them again prompt them again what was that yeah yeah do while loop right that was useful because it's
almost the same as a while loop but instead of checking a condition and then doing something you do something and
then check a condition which makes sense with user input because what are you even going to check if the user hasn't
done anything yet you need that inverted logic unfortunately in Python there is no do while loop there is a for Loop
there is a while loop and frankly those are enough to recreate this idea and the way to do this in Python the pythonic
way which is another term of art in the community is to say this deliberately induce an infinite loop while True with
capital T for True and then do what you got to do like get an int from a user asking them for the height of this thing
and then if that is what you want like a number greater than zero go ahead and break out of the loop so this is how in
Python you could recreate the idea of a do while loop you deliberately induce an infinite loop so something's going to
happen at least once then if you get the answer you want you break out of it effectively achieving the same logic so
this is the pythonic way of doing a do while loop let me go ahead and run python of mario.py type in three this
time and now I get back just the three hashes as well what if though I wanted to get rid
of ultimately that cs50 library function and also encapsulate this in a function well let's go ahead
and tweak this a little bit let me go ahead and remove this temporarily give myself a main function so I don't make
the same mistake as I did initially earlier and let me give myself a function called get_height that takes no
arguments and inside of that function is going to be that same code but I don't want to break in this case I want to
return n so recall that if you return from a function you're done you're going to exit from right that point so this
would be fine you can just say return n inside of the loop or if you would prefer to break out you could do
something like this instead break and then down here you could return n as well and let
me make one point here before we go back up to main this is a little different from C and this one's subtle what have I
done here that in C would have been a bug but is apparently not I claim in Python it's super subtle this one
yeah [Music] so similar it's not quite that we're
using it first so it's okay not to declare a variable with like the data type we've addressed that before but on
line nine we're assigning n a value it seems and then we return n on line 12 but notice the indentation in the world
of C if we had declared a variable inside of a loop on line nine it would have been scoped to that loop which
means as soon as you get out of that loop like further down in the program n would not exist it would be local to the
curly braces therein here logically curly braces are gone but the indentation makes clear that n is still
inside of this loop between lines 8 through 11 but n is actually still in scope in Python the moment you create a
variable in Python for better or for worse it is available everywhere within that function even outside of the loop
in which you defined it so this logic is actually okay in Python in C recall to solve this same problem we would have
had to do something a little hackish like this like Define n up here on line 8 so that it exists now on line 10 and
so that it exists on line 13 that is no longer an issue or need in Python once you create a variable even if it's
nested nested nested inside of some loops or conditionals it still exists within the function itself all right any
questions then on this before we now run this and then get rid of the cs50 library
again okay so let me go ahead and get the height from the user let's go ahead and create a variable in main called
height let's call this get_height function and then let's use that height value instead of something hardcoded
there and let me see if this all works now python of mario.py hopefully I haven't messed up but I did but this is an easy
fix now yeah I got to call main so again I deleted that earlier but let me bring it
back so I'm actually calling main let me rerun python of mario.py there we go height three now it seems to be working
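Here is a sketch of this stage of mario.py showing the while True and break idiom together with the scoping point. It departs from the lecture in one labeled way: the lecture's get_height takes no arguments and calls cs50's get_int, whereas here the prompting function is passed in as a parameter (an assumption) and fed scripted answers, so the sketch runs without the cs50 library or a keyboard. The helper `print_column` is also a hypothetical name.

```python
def main():
    # Scripted replies standing in for what a user might type: two bad, one good.
    answers = iter([-1, 0, 3])
    height = get_height(lambda prompt: next(answers))
    print_column(height)


def get_height(get_int):
    # Python has no do-while loop, so loop forever and break when satisfied.
    while True:
        n = get_int("Height: ")
        if n > 0:
            break
    # n was first assigned inside the loop but is still in scope here,
    # because Python variables are scoped to the function, not to the block.
    return n


def print_column(height):
    for i in range(height):
        print("#")


main()
```

Run as written, this reprompts twice internally and then prints a column of three hashes.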
so let's do one last thing with Mario just to tie together that idea now of exceptions from before again exceptions
are a feature of python whereby you can try to do something and if there's a problem you can handle it in any way you
see fit previously I handled it by just yelling at the user that that's not an int but let's actually use this to
reimplement cs50's own get_int function let me throw away cs50's get_int function and now let me go ahead and
replace get_int with input but it's not sufficient to just use input what do I have to add to this line of code on
line 8 if I want to get back an int yeah I have to cast it to an int by calling the int function around that
value or I could do it on a separate line just to be clear I could also do n equals int of n that would work too but
it's sort of an unnecessary extra line this is not sufficient because that does not change the value it creates the
value but then it throws it away you need to assign it so the conventional way to do this would probably be in one
line just to keep things nice and tight so that works fine now if I run python of mario.py I can still type in three
and all is well I can still type in negative 1 because that is an int that I am handling what I'm not yet handling is
weird input like cat or some string that is not a base 10 number so here again is my traceback and notice that here let
me scroll up a little bit here we can actually see more detail in the traceback notice that just like in C or just
like in the debugger in VS Code you can see a few things you can see mention of module that just means your file main
which is my main function and get_height so notice it's kind of backwards it's top to bottom instead of bottom up as we
drew it on the board the other day and as we envisioned stacks of trays in the cafeteria but this is your stack of
functions that have been called from top to bottom get_height is the most recent main is the very first value
error is the problem so let's try to do this literally except if there's an error so what do I want to do
I'm going to go in here and I'm going to say try to do the following
except if there's a value error then go ahead and say something like well like before print that's not
an integer exclamation point but the difference this time is because I'm in a loop the user's going to have a chance
to recover from this issue so if I run mario.py 3 still works as before if I run mario.py and type in cat I detect it now
and because I'm still in that loop and because the program hasn't crashed because I've caught so to speak the
value error using this line of code here that's the way in Python to detect these kinds of errors that would otherwise end
up being on the user's own screen if I type in cat dog that doesn't work if I type in though two I get my two hashes
because that's indeed an int all right any questions on this and we're not going to spend too much time on exceptions but
just wanted to show you what's involved with getting rid of those training wheels
Yeah? OK, so let's do this. That actually comes to the earlier question about printing the hashes on the same line, or
maybe something like this, where we have little bricks in the sky, or little question marks. Let's recreate this idea,
because the problem with print, as was noted earlier, is that you're automatically printing out new lines. But what if we
don't want that? Well, let's change this program entirely. Let me throw away all the functions. Let's just go to a simpler
world where we're just doing this. So let me start fresh in mario.py. I'm not going to bother with exceptions or
functions. Let's just do a very simple program to create this idea: for i in range(4), this time because there are four of
these things in the sky, let's go ahead and just print out a question mark to represent each of those bricks. Odds are,
you know, this is not going to end well, because these are unfortunately, as you predicted, on separate lines. So it turns
out that the print function actually takes in multiple arguments: not just the thing you want to print, but also some
additional arguments that allow you to specify what the default line ending should be. But what's interesting about
this is that if you want to change the line ending to be something like quote-unquote, that is, nothing, instead of
backslash-n, this is not sufficient, because in Python you can have two types of arguments or parameters. Some arguments
are positional, which is the fancy way of saying it's a comma-separated list of arguments, and that's what we did all the
time in C: something, comma, something, comma, something. We did it in printf all the time and in other functions that
took multiple arguments. In Python, you have not only positional arguments, where you just separate them by commas to give
one or two or three or more arguments; there are also named arguments, which look weird but are helpful for reasons
like this. If you read the documentation, you will see that there is a named argument that Python accepts called end,
and if you set that equal to something, that will be used as the end of every line instead of the default, which the
documentation will also say is quote-unquote backslash-n. So this line here has no effect on my logic at the moment, but
if I change it to just quote-unquote, essentially overriding the default new line character, and now run mario again,
now I get all four on the same line. There's a bit of a bug, though: my prompt is not meant to be on the same line. So I
can fix that by just printing nothing, but really it's not nothing, because you get the new line for free. So let me run
python of mario.py again, and now we have what I intended in the first place, which was a little something that looked like
this. And this is just one example of an argument that has a name, but this is a common paradigm in Python, too: to not just
separate things by commas but to be very specific, because the print function might take 5, 10, even 20 different
arguments, and my God, if you had to enumerate like 10 or 20 commas, you're going to screw up. You're going to get
things in the wrong order. Named arguments allow you to be resilient against that, so you only specify
arguments by name and it doesn't matter what order they are in. All right, any questions then on this and the overriding of
new line? And to be clear, you can do something very weird but logically expected like this by just changing the
line ending, too. But the right way to solve the Mario problem would be just to override it to be nothing, like this.
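A small sketch of the named `end` argument in action. The function name `print_row` and its `file` parameter are hypothetical, chosen here so the exact characters written can be inspected; `end`, `sep`, and `file` themselves are real named arguments of the built-in `print`.

```python
import io


def print_row(width, brick="?", file=None):
    """Print `width` bricks on one line, then finish the line once."""
    for _ in range(width):
        # end="" replaces the default end="\n", so nothing follows each brick
        print(brick, end="", file=file)
    # A bare print() writes only the default line ending
    print(file=file)


print_row(4)  # the four bricks appear on a single line in the terminal

# Writing into an in-memory buffer makes the exact characters visible
buffer = io.StringIO()
print_row(4, file=buffer)
captured = buffer.getvalue()

# Named arguments can be supplied in any order
print("a", "b", end="!\n", sep="-")
```

Because the arguments are named, swapping `end` and `sep` in the last call would not change what is printed.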
All right, how about this for cool? And this is why a lot of people like Python. Suppose you don't really like loops. You
don't really like three-line programs, because that was kind of three times longer than it needs to be. What if you just
printed out a question mark four times? python of mario.py. That also works. So it turns out that, just
like the plus operator in Python can join things together, the multiply operator is not arithmetic in this case;
it actually means take this and do the concatenation four times over. So that's a way of just distilling into one line what
would have otherwise taken multiple lines in C, fewer but still multiple lines in Python, but is really now rather
succinct in Python by doing that instead. Let's do one last Mario example, which looked a little something like
this. If this is another part of the Mario interface, this is like a grid of 3x3 bricks, for instance. So two
dimensions now: not just vertical, not just horizontal, but now both. Let's print out something like that using hashes.
Well, how do I do this? So how about for i in range(3), then I could do for j in
range(3), just because j comes after i and that's reasonable for counting. I could now print out a hash
symbol. Well, let's see what this does. python of mario.py. OK, that's just one crazy
long column. What do I need to fix in here to make this look like this, so 3x3 bricks instead of one long
column? Any instincts? OK, so after printing three,
we want to skip a line, so maybe print out a blank line here. OK, let's try that. I like that instinct, right:
print three, new line; print three, new line. Let's go ahead and run python of mario.py. OK, it's more visible what
I'm doing, but still wrong. What's the remaining fix, though?
Yeah, I'm getting an extra new line here, which I don't want while I'm on this row. So let me do end equals
quote-unquote, and now together your solutions might take us the whole way there. python of mario.py. Voilà,
now we've got it in two dimensions. And even this we can tighten up. We could just use the little trick we
learned, so we could just say print a hash times three, and we can get rid of one of those loops altogether.
All it's doing is automating that process. But no, I don't want to do that. How do
I fix this here? Oh, I don't think I want this anymore, right, because that's giving me an extra new line. So now this program
is really tightened up: same thing, two lines of code, but we're now implementing the same two-dimensional structure here.
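The two versions of the grid can be sketched side by side. The function names are hypothetical; the first mirrors the nested loops with `end=""`, and the second uses the string repetition trick.

```python
def print_grid(size=3):
    """Nested loops: end="" keeps a row together, print() ends the row."""
    for i in range(size):
        for j in range(size):
            print("#", end="")
        print()


def grid_lines(size=3):
    """The same rows built with string repetition, e.g. "#" * 3 == "###"."""
    return ["#" * size for _ in range(size)]


print_grid()
for line in grid_lines():
    # No end="" here: each row is already complete, so the default
    # newline from print is exactly what is wanted
    print(line)
```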
All right, any questions here on these? Yeah? In print, any spaces? Say
that once more?
Oh, yes, good question. I see what you're saying. So in a previous version, let me rewind in time, when we had this, I
did not put spaces around the equals sign. The convention in Python is not to do that. Why? It just starts to add too much space. And this is
a little inconsistent, because earlier, when we talked about pluses, or spaces around the less-than or equal
signs, I did say to add them. Here, it's actually clearer, and recommended, to keep them tighter together; otherwise it just
becomes harder to read where the gaps are. Good observation. All right, let's do, how about,
another five-minute break. Let's do that, and then we're going to dive into
some more sophisticated problems, and then ultimately build with some audio and visual examples as well. See you
in five. All right, so almost all of the examples we
just did were re-creations of what we did in week one, and recall that week one was our most syntax-heavy week. It was
when we were first learning how to program in C. But after week one, we began to focus a bit more on ideas like arrays
and other higher-level constructs, and we'll do that again here, condensing some of those first early weeks into a smaller
set of examples in Python. And we'll culminate by actually taking Python out for a spin and doing things that would
be way harder and way more time-consuming to do in C, even more so than the speller example. But how
do you go about figuring out what functions exist? If you didn't hear it in class, you don't see it online, but you
want to see it officially, you can go to the Python documentation, docs.python.org, here. And I will disclaim that,
honestly, the Python documentation is not terribly user-friendly. Google will often be your friend, so Googling something
you're interested in to find your way to the appropriate page on python.org, or stackoverflow.com is another popular
website. As always, though, the line should be: Googling things like how do I convert a string to lowercase, that's
reasonable to Google, or how to convert to uppercase, or how to implement a function in Python. But Googling, of course, things
like how to implement problem set 6 in CS50, of course, crosses the line. But moving forward, really with
programming in general, Google and Stack Overflow are your friends, but the line is between the reasonable and the
unreasonable. So let me officially use the Python documentation search just to search for something like the lowercase
function. I know I can lowercase things in Python; I don't quite remember how. So let me just search for the word
lower. You're often going to get an overwhelming number of results, because Python's a pretty big language with lots
of functionality, and you're going to want to look for familiar patterns. For whatever reason, str.lower, which is
probably more popular or more commonly used than these other ones, is third on the list, but it's purple because I clicked
it a moment ago when looking for it. So str.lower is probably what I want, because I am interested at the moment in
lowercasing strings. When I click on that, this is an example of what Python's documentation tends to look like. It's in
this general format. Here's my str.lower function. This returns a copy of the string with all the cased characters
converted to lowercase, and the lowercasing algorithm, dot dot dot. So that doesn't give me much. It doesn't give me
sample code, but it does say what the function does. And if we keep looking, you'll see mention of lstrip, which is
left strip. I used its analog rstrip before, right strip, which allows you to remove, that is, strip, from the end of a
string something like white space, like a new line, or even something else. And if you scroll through this str web page
here, and we're halfway down the page already, if you see my scroll bar, tiny on the right, there's a huge amount of
functionality built into string objects here. And this is just testament to how rich the language itself is. But it's
also reason to reassure you that the goal when playing around with some new language and learning it is not to learn
it exhaustively. Just like in English or any human language, there are always going to be vocab words you don't know, ways
of presenting the same information in some language. That's going to be the case with Python. And what we'll do today
and this week in problem set six is really get your footing with this language. But you won't know all of
Python, just like you won't know all of C, and honestly you won't know all of any of these languages on your own unless
you're perhaps using them full-time professionally, and even then there are more libraries than one might even
retain themselves. So let's actually now pivot to a few other ideas that we'll implement in Python in a moment. Let me
switch back over to VS Code here, and let me whip up, say, a re-creation of our scores example from week two, where we
averaged three scores together, and that was an opportunity in week two to play with arrays, to realize how
constrained arrays are: they can't grow or shrink; you have to decide in advance. But let's see what's different here in
Python. So let me do scores.py, and let me give myself an array in Python called scores. Sorry, let me give myself a
variable in Python called scores, set it equal to a list of three scores, which are the same ones we've used before: 72,
73, 33, in this context meant to be scores, not ASCII values. And then let's just do the average of these. So average will be
another variable, and it turns out I can do... well, how did I sum these before? I probably had a for loop to add them, and then
I knew how many there were. Turns out in Python you can just say sum of scores divided by the length of scores. That's
going to give me my average. So sum is a function that takes a list, in this case, as input, and it just does the sum for
you with a for loop or whatever underneath the hood. len gives you the length of the list, how many things are
in it, so I can dynamically figure that out. Now let me go ahead and print out, using print, the word Average and then,
in curly braces, the actual average, close quote. All right, so let's run this code: python of scores.py, and there's my
average, in this case 59.33333 and so forth, based on the math. Well, let's actually now change this a little bit
and make it a little more interesting and actually get input from the user rather than hard-coding this. Let me go
back up here and use from cs50 import get_int, because I don't want to deal with all the exceptions and the loops.
I just want to use someone else's function here. Let me give myself an empty list called scores, and this is not
something we were able to do in C, right? Because in C, if you tried to make an empty array, well, that's pretty stupid,
because you can't add things to it. It's a fixed size, so it wouldn't even let you do that. But I can just create an
empty list in Python, because lists, unlike arrays, behave more like linked lists: they'll grow and shrink, but you and I are not dealing
with all the pointers underneath the hood; Python's doing that for us. So now let's go ahead and get a whole bunch of
scores from the user. How about three of them in total? So for i in range(3), let's go ahead and grab a score
from the user using get_int, asking them for Score, and then let's go ahead and append to the
scores list that particular score. So it turns out that a list, and I could read the Python documentation to confirm
as much, lists have a function built into them, and functions built into objects are generally known as methods, if you've
heard that term before. Same idea, but whereas a function kind of stands on its own, a method is a function built into an
object, like a list here, that's going to achieve the same result. Strictly speaking, I don't need the variable. Just
like in C, I could tighten this up and do something like this as well. But I don't know, I kind of like it this way. It's
clearer to me, at least, what I'm doing here: getting the score and then appending it to the list. Now the rest of
the code can stay the same. python of scores.py. Scores will be 72, 73, 33, and I get back the math, but now the program is
a little more dynamic, which is nice. But there's other syntax I could use here, just so you've seen it. Python does have
some neat syntactic tricks whereby, if you don't want to do scores.append, you can actually say scores plus-equals this
score. So you can actually concatenate lists together in Python, too. Just as we use plus to join two strings together,
you can use plus to join two lists together. The catch is you need to put the one score I'm adding here in a list
of its own, which is kind of silly, but it's necessary so that this thing and this thing are both lists. To do this
more verbosely, which most programmers wouldn't do, but just for clarity, this is the same thing as saying scores equals scores plus
this score. So now maybe it's a little more clear that scores and bracket score, singular, are both lists
themselves, being concatenated or joined together. So two different ways; not sure one is better than the other. This way is
pretty common, but .append is also quite reasonable as well.
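The scores example can be sketched as below, showing the ways of adding to a list next to each other. Fixed values stand in for the `get_int("Score: ")` calls so the snippet runs without the cs50 library, and the extra value 100 is an arbitrary illustration that is not from the lecture.

```python
scores = []                # an empty list is fine; it grows as items arrive
for score in (72, 73):     # fixed values stand in for get_int("Score: ")
    scores.append(score)   # append is a method on the list object

scores += [33]             # concatenation: the new item is wrapped in a list
longer = scores + [100]    # plain + builds a new list; scores is unchanged

# sum() adds the items and len() counts them, so no explicit loop is needed
average = sum(scores) / len(scores)
print(f"Average: {average}")
```

One small difference worth knowing: `scores += [33]` extends the existing list in place, while `scores + [100]` produces a separate list, which is why `scores` still holds three items afterwards.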
From week two, this one was called uppercase, so let me do this in uppercase.py, though, this time. And let me import
from cs50 get_string again, and let me go ahead and say before will be my first variable. Let me get a string from the
user, asking them for a Before string, and then let me go ahead and print After, just to demonstrate some changes, uppercasing,
to this string. Let me change my line ending to be empty, using our new trick. And this is where things get cool in
Python, relatively speaking. If I want to iterate over all of the characters in a string and print them out in uppercase,
one way to do that would be this: for c in the before string, go ahead and print out c.upper, but don't
end the line yet, because I want to keep these all on the same line until I'm all done. So what am I doing? python of
uppercase.py. Let me type in hello in all lowercase. I've just uppercased the whole string. How? I first get a string,
calling it before. I then just print out some fluffy text that says After, colon, and I get rid of the line ending just so
I can kind of line these up. Notice I hit the space bar a couple of times just so the letters line up to be pretty. for c in
before: this is new; this is powerful in Python, whereby you don't have to do int i equals zero, i less than
this. You could just say for c in the string in question, for c in before. And then here is just uppercasing that
specific character and making sure we don't output a new line too soon. But this is actually more work than I need
to do, based on what we've seen thus far, like from our agreement example. Can I tighten this up further? Can I collapse
lines five and six, maybe even seven, altogether? If the goal of this program is just to uppercase the before
string, how might I do this? Yeah, in back. str.upper, yeah. So I could do something like this:
after gets before.upper. So it's not str literally; str just represents the string in question. So it would be
before.upper, but right idea otherwise. And so let me go ahead and just tweak my print statement a little bit. Let me just
go ahead and print out the after variable here, after creating it. So this line is the same: I'm getting a string
called before. I'm creating another variable called after, and, as you proposed, I'm calling upper on the whole string,
not one character at a time. Why? Because it's allowed. And again, in Python there aren't technically chars, characters
individually; there are only strings anyway, so I might as well do them all at once. So if I rerun the code now, python of
uppercase.py, now I'll type in hello in all lowercase, and, oh, so close. I think I can get rid of this override,
because I'm printing the whole thing out at once, not character by character. So now, if I type in hello before, now I have
an even tighter version of the program here. All right, any questions then on str or on strings and what this kind of
function, upper, represents, with its docs? All right, so a couple of other building blocks before we start. Oh, where
was that? To the right, yes, thank
you. Yes, do I have to create this variable, after? No, I don't. I could actually tighten this up, and if you
really want to see something neat, inside of the curly braces you don't have to just put the names of variables. You can
put a small amount of logic, so long as it doesn't start to look stupid and kind of overwhelmingly complex, such that it's
sort of bad design at that point. I can tighten this up like this and now run python of uppercase.py, writing hello
again, and that too worked. But I would be careful about this. You want to resist the temptation of having a long
line of code that's inside the curly braces, because it's just going to be harder to read. But absolutely, you could.
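The three versions of the uppercase program can be sketched together. A fixed string stands in for `get_string("Before: ")` so the snippet runs without the cs50 library, and the helper name `upper_each` is hypothetical.

```python
def upper_each(before):
    """Uppercase one character at a time, as in the first version."""
    after = ""
    for c in before:          # each c is itself a one-character str
        after += c.upper()
    return after


before = "hello"              # fixed stand-in for get_string("Before: ")

# Second version: call upper() on the whole string at once
after = before.upper()
print(f"After:  {after}")

# Third version: put the call inside the f-string's curly braces
print(f"After:  {before.upper()}")
```

The third form is compact, but as noted, anything much longer than a single method call inside the braces tends to hurt readability.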
All right, how about command-line arguments, which was one thing we introduced in week two also, so
that we could actually take input from the
user at the command line, so as to take literally command-line arguments. These are a little different, but it follows
the same paradigm. There's no main by default, and there's no main with int argc and char star, or what we called string, argv by
default. There's none of this. So if you want access to the argument vector, argv, you import it. And it turns out there's
another module in Python, or library in Python, called sys, and you can import from sys this thing called argv.
So same idea, different place. Now I'm going to go ahead and do this. Let's write a program that just requires that
the user types in a word after the program's name, or none at all. So if the length of argv equals 2,
let's go ahead and print out, how about, hello, comma, argv bracket 1, close quote. Else, if they don't type two words total
at the prompt, let's just say the default, like we did weeks ago: hello, world. So the only thing that's new here is we're
importing argv from sys, and we're using this fancy f-string format, which, kind of to your point, too, is putting
more complex logic in the curly braces. But that's OK in this case. It's a list called argv, and we're getting
bracket one from it. Let's do python of argv.py, Enter: hello, world. What if I do argv.py David at the command line? Now
I get hello, David. So there's one curiosity here: python is not included in argv, whereas in C, dot slash whatever was the
first thing. The analog in Python is that the name of your Python program is the first thing, in bracket zero, which is
why David is in bracket one. The word python does not appear in the argv list, just to be clear. But otherwise, the idea
of these arguments is exactly the same as before. And in fact, what you can do, which is kind of cool, is, because argv is
a list, you can do things like this: for arg in argv, go ahead and print out each argument. So instead of using a for loop
and i and all of this, if I do python of argv.py, Enter, it just writes the program's name. If I do python of argv.py foo, it puts
argv.py and foo. If I do foo and bar, those words all print out. If I do foo bar baz, those print out too. And
foo and bar and baz are like a mathematician's x and y and z for a computer scientist, when you just need
some placeholder words. So this is just nice. It reads a little more like English, and this for loop is just much more concise.
It allows you to iterate very quickly when you want something like that. Suppose I only wanted the real words that the
human typed after the program's name. Like, suppose I want to ignore argv.py. I mean, I could do something hackish like
this: if arg equals argv.py, I could just ignore it. You know, let's invert the logic. I could do this, for instance: if
the arg does not equal the program name, then go ahead and print out the word. So I get foo, bar, and baz only. Or, this is
what's kind of neat about Python, too: let me undo that, and let me just take a slice of the array, of the list, instead.
So it turns out, if argv is a list, I can actually say, you know what, go into that list, start at element one instead of
zero, and then go all the way to the end. And we have not seen this syntax in C, but this is a way of slicing a list in
Python. So now watch what happens if I run python of argv.py foo bar baz, Enter: I get only a
subset of the list, starting at position one, going all the way to the end. And you can even do kind of the opposite. If, for
whatever reason, you want to ignore the last element, you can say colon -1 and use a
negative number, which we've not seen before, which slices off the end of the list as well. So there are some syntactic
tricks that tend to be powerful in Python, too, even if at first glance you might not need them for typical things.
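A sketch of the argv ideas above: the length check, iterating over the list, and the two slices. The helper functions are hypothetical and take the argument list as a parameter so they can be tried with any list, not only the real `sys.argv`.

```python
import sys


def greeting(argv):
    """hello, <name> when exactly one word follows the program's name."""
    if len(argv) == 2:
        return f"hello, {argv[1]}"
    return "hello, world"


def typed_words(argv):
    """Everything after the program's name: start at index 1, go to the end."""
    return argv[1:]


def all_but_last(argv):
    """A negative index counts from the end, so this drops the last element."""
    return argv[:-1]


print(greeting(sys.argv))
for arg in typed_words(sys.argv):
    print(arg)
```

For example, with the list `["argv.py", "foo", "bar", "baz"]`, `typed_words` keeps the three placeholder words and `all_but_last` keeps everything except baz.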
All right, let's do one other example with exit, and then we'll start actually applying some algorithms to make things
interesting. So in one last program here, let's do exit.py, just to do one more mechanic before we introduce some
algorithms. And let's do this: from sys import argv. Let's now do this: let's
make sure the user gives me one command-line argument. So if the length of argv does not equal two in total, then let's
go ahead and print out something like Missing command-line argument, just to explain what the problem is. And then
let's do this: we can exit, but I'm going to use a better version of exit here. Let me import two functions from
sys. Turns out the better way to do this is with sys.exit, because I can then exit specifically with an exit code.
Otherwise, down here, I'm going to go ahead and print out something like hello, comma, argv bracket 1, same as before,
and then I'm going to exit with zero. So again, this was a subtle thing we introduced in week two, where you could
actually have your programs exit with some number, where zero signifies success and anything else signifies error. This
is just the same idea in Python. So if I, for instance, just run the program like this... oops, I screwed up. I meant to say
exit here and exit here. Let me do that again. If I run this like this, I'm missing a command-line argument, so let
me rerun it with my name at the prompt, so I have exactly two command-line arguments, the file name and my name:
hello, David. And if I do David Malan, it's not going to work either, because now the length of argv does not equal two. But the
difference here is that we're exiting with one, so that special programs can detect an error, or zero in the event of
success. And now there's one other way to do this, too. Suppose that you're importing a lot of functions and you don't really want
to make a mess of things and just have all of these function names available without it being clear where they came
from. Let's just import all of sys, and let's just change our syntax, kind of like I proposed for cs50, where we just
prepend to all of these library functions sys, just to be super explicit where they came from. And if there's another exit
or argv value that we want to import from a library, this is one way to avoid collision. So if I do it one last time
here: Missing command-line argument, but David still actually works. All right, only to demonstrate how we can implement
that same idea. Let's now do something more powerful, like a search algorithm, like binary search. I'm going to go ahead
and open up a file called numbers.py, and let's just do some searching, or linear search rather, on a list of numbers. Let's
go ahead and do this. How about import sys, as before. Let me give myself a list of numbers like 4, 6, 8, 2, 7, 5, 0, so just
a bunch of integers. And then let's do this: if you recall, from week three, we searched for the number zero at the end
of the lockers on stage. So let's just ask that question in Python. No need for a loop or anything like that. If 0 is
in the numbers, go ahead and print out Found, and then let's just exit successfully with zero. Else, if we get
down here, let's just say print Not found, and then we'll sys.exit with one. So this is where Python starts
to get powerful again. Here's your list. Here is your loop that's doing all of the checking for you. Underneath the hood,
Python's going to use linear search. You don't have to implement it yourself: no while loop, no for loop. You just ask a
question: if 0 is in numbers, then do the following. So that's one feature we now get with Python, and we get to throw
away a lot of that code. We can do it with strings, too. Let me open a file called names.py instead and do
something that was even more involved in C, because we needed strcmp and the for loop and so forth. Let me import sys
for this file. Let's give myself a bunch of names like we did in C, and those were Bill and Charlie and Fred and
George and Ginny and two more, Percy and lastly Ron. And recall at the time we looked for Ron, and so we had to iterate
through the whole thing, doing strcmp and i++ and all of that. Now just ask the question: if Ron is in names, then
let's go ahead and... whoops, let me hide that; I hit the command too soon. Let me go ahead and say print Found, as
before, sys.exit with zero, just to indicate success. And then down here, if we get to this point, we can say Not found, and then
we'll just sys.exit with one instead. So again, this just does linear search for us by default. python of names.py: we found Ron,
because indeed he's there, at the end of the list. But we don't need to deal with all of the mechanics of it all.
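The `in` check and the exit codes from the last two programs can be sketched together. The `search` and `main` functions are hypothetical wrappers: `search` returns the status a script would pass to `sys.exit`, so the logic can be tried without actually ending the interpreter.

```python
import sys

numbers = [4, 6, 8, 2, 7, 5, 0]
names = ["Bill", "Charlie", "Fred", "George", "Ginny", "Percy", "Ron"]


def search(item, collection):
    """Return 0 (success) if item is present, 1 (error) otherwise."""
    # For a list, `in` checks the elements one after another for us
    if item in collection:
        print("Found")
        return 0
    print("Not found")
    return 1


def main():
    # sys.exit hands the status to whatever launched the program
    sys.exit(search("Ron", names))


status = search(0, numbers)
```

In a real script, calling `main()` on the last line would end the program with status 0 for Ron and status 1 for a name that is not in the list.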
All right, let's take things one step further. In week three, we also implemented the idea of a phone book that actually
associated keys with values. But remember, the phone book in C was kind of a hack, right? Because we first had two arrays,
one with names, one with numbers. Then we introduced structs, and so we gave you a person structure, and then we had
an array of persons. You can do this in Python using objects and things called classes, but we can also just use
a general-purpose dictionary, because just like in pset 5 you can associate keys with values using a hash table or
using a trie, well, similarly, Python can just do this for us. From cs50, let's import get_string,
and now let's give myself a dictionary of people: dict, open paren, close paren, gives you a dictionary, or you can
simplify the syntax, actually. And a dictionary, again, is just keys and values, words and definitions. You can also just
use curly braces instead. That gives me an empty dictionary. But if I know what I want to put in it by default, let's put
Carter in there with a number of +1-617-495-1000, just like last time, and put
myself, David, with +1 949 468 275, and it came to my attention tragically after class that day that we
had a bug in our little Easter egg. If today you would like to call me or text me at that number, we have fixed the code
that underlies that little Easter egg. Spoiler ahead. All right, so this now gives me a variable called people that's
associating keys with values. There is some new syntax here in Python: not just the curly braces, but the colons and the
quotes on the left and the right. This is a way in Python of associating keys with values, words with definitions, anything
with anything else. And it's going to be a super common paradigm, including in week seven, when we look at CSS and HTML
and web programming. Keys and values are like this omnipresent idea in computer science and programming, because it's
just a really useful way of associating one thing with another. So at this point
table if you will of people associating names with phone numbers just like a real world phone book let's write a
program that gets a string from the user and asks them whose number they would like to look up then let's go ahead and
say uh if that name is in the people dictionary go ahead and print out that person's number by going into the people
dictionary and going to that specific name within there using an F string for the whole thing so this is similar to in
spirit to before linear search and Dictionary lookups will just happen automatically for you in Python by just
asking the question if name and people and this Line's just going to print out whoever is in the people dictionary at
that name so I'm using square brackets because here's the interesting thing in Python just like you can index into an
array or a list in Python using numbers 012 you can very conveniently index into a dictionary in Python using square
brackets as well and just to make clear what's going on here let me go and create a temporary
variable person equals people bracket name and then let's just all right sorry let's say number equals people bracket
name and let's just print out the number in question in C and previously in Python anything with square brackets
like this would have been go to a location in a list or an array using a number but that can actually be a string
like a word the human is typed and this is what's amazing about dictionary it's not like a big line of a big linear
thing it's this table that you can look up in one column the name and get back in the other column the number so let's
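A minimal sketch of the dictionary lookup. The phone numbers below are placeholders rather than the ones read out in the lecture, the `lookup` function is a hypothetical wrapper around what the script does with `get_string`, and returning `None` for an unknown name is an assumption, since the lecture only shows the found case.

```python
people = {
    "Carter": "+1-617-555-0100",  # placeholder number
    "David": "+1-617-555-0199",   # placeholder number
}


def lookup(name, directory=people):
    """Return a message with the number for name, or None if absent."""
    if name in directory:
        # Square brackets index a dict by key, here a string, not a position
        number = directory[name]
        return f"Number: {number}"
    return None


print(lookup("David"))
```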
Let's go ahead and run python of phonebook.py. Found? Not that. Oh wait, that's not what's supposed to happen
at all. I think I'm in the wrong place. What's going
on? Print Found? I am confused. OK, let's run this again: python of phonebook.py. What
the... OK, stand by.
What am I not understanding here? OK, Rongxin, Carter, do you see what I'm doing
wrong? Say
again? Oh, what? Yeah, Found. OK, we're going to do this. One sec.
Whoa, OK. All this is coming out of the video.
So, thanks. All right, I will try to figure out what was going wrong. The best I can
tell, it was running the wrong program. I don't quite understand why, so we will diagnose this later. I just put the file
into a temporary directory for now to run it. So let me go ahead and just run this: python of phonebook.py,
type in, for instance, my name, and there's my corresponding number. I have no idea what was just happening, but I
will get to the bottom of it and update you if we can put our finger on it. So this was just an example now of
implementing a phone book. Let's now consider what we can do that's a little more powerful in these examples, like a phone book that actually keeps this information around.
Thus far, these simple phone book examples throw the information away. But using CSV files, comma-separated values, maybe we could actually keep around the names and numbers, so that, like on your phone, you can actually keep your contacts around long term.
So I'm going to go ahead now and do a slightly different example. Let me just hide this detail so it's not confusing. Whoops. I'm going to change my prompt temporarily.
So let me go ahead now and refine this example as follows. I'm going to go into phonebook.py, and I'm going to import a whole library called csv. This is a powerful one, because Python comes with a library that just handles CSV files for you. A CSV file is just a file with comma-separated values.
And in fact, to demonstrate this, let me check on one thing here, just to make this a little more real. Let's go ahead and do this. Let me import the csv library. From cs50, let me import get_string.
Let me then open a file, using the open function, called phonebook.csv in append mode, in contrast with read mode and write mode. Write just blows the file away if it exists; append adds to the bottom of it, so I keep this phone book around, just like you might keep adding contacts to your phone.
Now let me go ahead and get a couple of values from the user. Let me say get_string and ask the user for a name. Then let me call get_string again and ask the user for their number.
And now let me go ahead and do this, and this is new, and this is Python-specific, and you would only know this by following a tutorial or reading the documentation. Let me give myself a variable called writer and ask the csv library for a writer to that file.
Then let me go ahead and use that writer variable, using a function, or a method, inside of it called writerow to write out a list containing that person's name and number. Notice the square brackets inside the parentheses, because I'm just writing a list to that particular row in the file.
And then I'm just going to close the file. So what is the effect of all of this?
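The steps just described (open in append mode, ask the csv library for a writer, call writerow with a list, close the file) can be sketched as follows. This is an illustration under stated assumptions, not the lecture's exact file: the lecture reads the name and number with get_string from the cs50 library, whereas this sketch passes them in as function arguments and writes to a temporary directory so that it runs anywhere. The helper name add_contact is hypothetical.

```python
import csv
import os
import tempfile


def add_contact(path, name, number):
    """Append one name,number row to the CSV file at path."""
    # "a" appends to the bottom of the file; "w" would blow the file away first.
    # newline="" is what the csv documentation recommends when opening files.
    file = open(path, "a", newline="")
    writer = csv.writer(file)
    # writerow takes a single list, hence the square brackets.
    writer.writerow([name, number])
    file.close()


# Assumption: a throwaway location instead of the current directory.
path = os.path.join(tempfile.mkdtemp(), "phonebook.csv")
add_contact(path, "Carter", "+1-617-495-1000")
add_contact(path, "David", "+1-949-468-2750")

file = open(path, "r", newline="")
contents = file.read()
file.close()
print(contents)
```

Because the file is opened in append mode, the second call adds a second row rather than replacing the first.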
Well, let me go ahead and run this version of phonebook.py, and I'm prompted for a name. Let's do Carter first, +1-617-495-1000. And then let's go ahead and type ls.
Notice in my current directory there are two files now: phonebook.py, which I wrote, and apparently phonebook.csv. CSV just stands for comma-separated values, and it's like a very simple way of storing data in a spreadsheet, if you will, where the comma represents the separation between your columns. There are only two columns here, name and number.
But because I'm writing to this file in append mode, let me run it one more time: python phonebook.py. And let me go ahead and do David, +1-949-468-2750, Enter. And notice what happened in the CSV file: it automatically updated, because I'm now persisting this data to the file in question.
So if I wanted to now read this file in, I could actually go ahead and do linear search on the data, using a read function to actually read from the CSV. But for now, we'll just leave it a little simpler, as a write.
And let me make one refinement here. It turns out that if you're in the habit of opening a file, you don't even have to close it explicitly. You can instead do this. You can instead say: with the opening of a file called phonebook.csv, calling the thing file, go ahead and do all of these lines here.
So the with keyword is a new thing in Python, and it's used in a few different ways, but one of the ways it's used is to tighten up code here. And I'm going to move my variables to the outside, because they don't need to be inside of the with statement where the file is opened.
This just has the effect of ensuring that you, the programmer, don't screw up and accidentally forget to close your file. In fact, you might recall from C that Valgrind might have complained at you if you had a file that you didn't close; you might have had a memory leak as a result. The with keyword takes care of all of that for you as well.
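A minimal sketch of the same append, rewritten with the with keyword as described. As before, passing the name and number as arguments (rather than calling get_string), the temporary directory, and the helper name add_contact are assumptions made so the example is self-contained.

```python
import csv
import os
import tempfile


def add_contact(path, name, number):
    """Append one row; the file is closed automatically when the block ends."""
    with open(path, "a", newline="") as file:
        writer = csv.writer(file)
        writer.writerow([name, number])
    # No explicit file.close() needed: leaving the with block closed the file.
    return file.closed


path = os.path.join(tempfile.mkdtemp(), "phonebook.csv")
closed_after_first = add_contact(path, "Carter", "+1-617-495-1000")
closed_after_second = add_contact(path, "David", "+1-949-468-2750")

with open(path, "r", newline="") as file:
    rows = list(csv.reader(file))

print(closed_after_first, closed_after_second)
print(rows)
```

The values to be written are computed outside the with block, mirroring the remark that the variables do not need to live inside it.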
Let me go ahead and propose that, on your phone or laptop, here or online, you go to this URL here, where you'll find a Google Form.
This is just to show that these CSVs are actually kind of omnipresent: if you've ever used a Google Form, or managed a student group or something where you've collected data via Google Forms, you can actually export all of that data via CSV files.
So go ahead to this URL here. Those of you watching on demand later will find that the form is no longer working, since we're only doing this live. But that will lead to a Google Form that's going to let everyone input their answer to a question like "What house do you want to end up in?", sort of an approximation of the Sorting Hat in Harry Potter. And via this form, we'll then have the ability to export, we'll see, a CSV file.
So let's give you a moment to do that. In just a moment, I'll share my version of the screen, which is going to let me actually open the form itself.
OK, so this is now my version of the form here, where we have 200-plus responses to a simple question of the form "What house do you belong in?": Gryffindor, Hufflepuff, Ravenclaw, or Slytherin.
If I go over to Responses, I'll see all of the responses in the GUI form here, so graphical user interface. We could flip through this, and it looks like, interestingly, 40% of Harvard students want to be in Gryffindor, 22% in Slytherin, and everyone else is split between the others.
But you might have noticed, if ever using a Google Form, this Google Spreadsheets link. So I'm going to go ahead and click that, and that's going to automatically open, in this case, Google Spreadsheets, but you can do the same thing with Office 365 as well. And now you see the raw data as a spreadsheet.
But in Google Spreadsheets, if I go to File and then I go to Download, notice I can download this as an Excel file, a PDF, and also a CSV, comma-separated values. So let me go ahead and do that. That gives me a file in my Downloads folder on my computer.
I'm going to now go back to my code editor here, and what I'm going to go ahead and do is upload this file from my Downloads folder to VS Code so that we can actually see it within here. And now you can see this open file. I'm going to shorten its name just so it's a little easier to read; I'm going to rename this using the mv command to just hogwarts.csv.
And then we can see in the file that there are two columns, Timestamp and House, where you have a whole bunch of timestamps for when people filled out the form, with someone very early in class and then everyone else just a moment ago. And the second value after
each comma is the name of the house.
Well, let me go ahead here and implement a program in a file called hogwarts.py, just a program that now reads a CSV, in this case not a phone book but everyone's Sorting Hat information. I'm going to go ahead and import csv.
And suppose I want to answer a reasonable question, ignoring the fact that Google's GUI, or graphical user interface, can do this for me: I just want to count up who's going to be in which house.
So let me give myself a dictionary called houses that's initially empty, with curly braces, and let me pre-create a few keys. Let me say Gryffindor is going to be initialized to 0, Hufflepuff will be initialized to 0 as well, Ravenclaw will be initialized to 0, and finally Slytherin will be initialized to 0.
So here's another example of a dictionary, or a hash table, just being a very general-purpose piece of data. You can have keys and values. The keys in this case are the houses; the values are initially 0. But I'm going to use this, instead of like four separate variables, to keep track of everyone's answer to this form.
So I'm going to do this: with opening hogwarts.csv in read mode, not append (I don't want to change it, I just want to read it), as file as my variable name, let's go ahead and create a reader this time, using the reader function in the csv library and passing in that open file.
I'm going to go ahead and ignore the first line of the file, because recall that the first line is just Timestamp and House; I want to get the real data. So this next function is just a little trick for ignoring the first line of the file.
Then let's do this: for every other row in the reader, that is, line by line, get the current person's house, which is in row[1]. This is what the csv reader is doing for us. It's handling all of the reading of this file. It figures out where the comma is, and for every row in the file it hands you back a list of size two: in bracket 0 is the timestamp, in bracket 1 is the house name.
So in my code I can say house equals row[1]; I don't care about the timestamp for this program. And then let's go into my dictionary called houses, plural, index into it at the house location, by its name, and increment that 0 to 1.
And now, at the end of this block of code, that has the effect of iterating over every line of the file, updating my dictionary in four different places based on whether someone typed Gryffindor or Slytherin or anything else.
And notice that I'm using the name of the house to index into my dictionary, to essentially go up to this little cheat sheet and change the 0 to a 1, the 1 to a 2, the 2 to a 3, instead of having like four separate variables, which would just be much more annoying to maintain.
Down at the bottom, let's just print out the results. For each house in those houses, iterating over the keys therein, by default in Python, let's go ahead and print out an f-string that says the current house has the current count, and count will be the result of indexing into houses for that given house. And let me close my quote.
So let's run this to summarize the data: python hogwarts.py. 140 of you answered Gryffindor, 54 Hufflepuff, 72 Ravenclaw, and 80 of you Slytherin. And that's just my way now, with code.
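The counting program described above can be sketched roughly as follows. The names houses, reader, row, house, and count come from the lecture; the six sample responses and their placeholder timestamps are invented for illustration, and the file is written to a temporary directory rather than downloaded from a Google Form, so the counts differ from the real ones quoted in the lecture.

```python
import csv
import os
import tempfile

# Assumption: a tiny stand-in for the exported hogwarts.csv.
path = os.path.join(tempfile.mkdtemp(), "hogwarts.csv")
with open(path, "w", newline="") as file:
    writer = csv.writer(file)
    writer.writerow(["Timestamp", "House"])
    writer.writerows([
        ["sample-time-1", "Gryffindor"],
        ["sample-time-2", "Slytherin"],
        ["sample-time-3", "Gryffindor"],
        ["sample-time-4", "Ravenclaw"],
        ["sample-time-5", "Hufflepuff"],
        ["sample-time-6", "Gryffindor"],
    ])

# One dictionary instead of four separate counter variables.
houses = {"Gryffindor": 0, "Hufflepuff": 0, "Ravenclaw": 0, "Slytherin": 0}

with open(path, "r", newline="") as file:
    reader = csv.reader(file)
    next(reader)  # skip the header row (Timestamp, House)
    for row in reader:
        house = row[1]  # row[0] is the timestamp, row[1] is the house
        houses[house] += 1

# Iterating over a dictionary iterates over its keys by default.
for house in houses:
    count = houses[house]
    print(f"{house}: {count}")
```

A house name that is not one of the four pre-created keys would raise a KeyError in this sketch; that is acceptable here only because the form offered exactly four choices.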
And this is, oh my God, so much easier than C to actually analyze data in this way. One of the reasons that Python is so popular for data science and analytics more generally is that it's actually really easy to manipulate data and run analytics like this.
And let me clean this up slightly. It's a little annoying that I just have to know and trust that the house name is in bracket 1 and the timestamp is in bracket 0. Let's clean this up.
There's something called a dictionary reader in the csv library that I can use instead: DictReader, capital D, capital R. This means I can throw away this next thing, because what a dictionary reader does is it still returns to me every row from the file, one after the other, but it doesn't just give me a list of size two representing each row. It gives me a dictionary, and it uses as the keys in that dictionary Timestamp and House for every row in the file.
Which is just to say, it makes my code a little more readable, because instead of doing this little trickery, bracket 1, I can say bracket, quote-unquote, House, with a capital H because it's capitalized in the Google Form itself.
So the code now is just minorly different, but it's way more resilient, especially if I'm using Google Spreadsheets and I'm moving the columns around or doing something like that where the numbers might get messed up.
Now I can run this on hogwarts.py again, and I get the same answers, but I now don't have to worry about where those individual columns are.
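To illustrate the resilience claim, here is a hedged sketch that counts houses with csv.DictReader and runs it on two sample files whose columns are in opposite orders. The helper names write_sample and count_houses, the sample responses, and the placeholder timestamps are all assumptions made for this example.

```python
import csv
import os
import tempfile


def write_sample(path, header, rows):
    """Write a small CSV file with the given header and rows."""
    with open(path, "w", newline="") as file:
        writer = csv.writer(file)
        writer.writerow(header)
        writer.writerows(rows)


def count_houses(path):
    """Count responses per house, looking columns up by name, not position."""
    houses = {"Gryffindor": 0, "Hufflepuff": 0, "Ravenclaw": 0, "Slytherin": 0}
    with open(path, "r", newline="") as file:
        reader = csv.DictReader(file)  # consumes the header row itself
        for row in reader:
            houses[row["House"]] += 1  # capital H, as in the form
    return houses


# Hypothetical responses.
responses = [
    ("sample-time-1", "Gryffindor"),
    ("sample-time-2", "Slytherin"),
    ("sample-time-3", "Gryffindor"),
    ("sample-time-4", "Ravenclaw"),
]

folder = tempfile.mkdtemp()
original = os.path.join(folder, "hogwarts.csv")
swapped = os.path.join(folder, "hogwarts_swapped.csv")
write_sample(original, ["Timestamp", "House"], [[t, h] for t, h in responses])
write_sample(swapped, ["House", "Timestamp"], [[h, t] for t, h in responses])

counts_original = count_houses(original)
counts_swapped = count_houses(swapped)
print(counts_original)
print(counts_swapped)
```

Note that there is no next(reader) call here: DictReader uses the first row as the keys, which is why the earlier trick can be thrown away.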
Any questions on those capabilities there? That's a teaser of sorts for some of the manipulation we'll do in pset 6.
All right, so some final examples and flair, to intrigue with what you can do with Python. I'm going to actually switch over to a terminal window on my own Mac so that I can actually use audio a little more effectively. So here's just a terminal window on macOS.
Before class, I pre-installed some additional Python libraries that won't really work in VS Code in the cloud, because they require audio that the browser won't necessarily support. But I'm going to go ahead and write an example here that involves writing a speech-based program that actually does something with speech.
I'm going to go ahead and import a library that, again, I pre-installed, called Python text-to-speech, and I'm going to go ahead and, per its documentation, give myself a speech engine by using that library's init function, for initialize. I'm then going to use this engine's say function to do something fun like "hello, world". And then I'm going to go ahead and tell this engine to run and wait while it says those words.
All right, I'm going to save this file. I'm not using VS Code at the moment; I'm using another popular program that we used in CS50 back in my day called Vim, which is a command-line program that's just in this black and white window.
Let me go ahead now and run python speech.py. "Hello, world." All right, so it's a little computerized, but it is speech that has been synthesized from this example.
Let's change it a little bit to be more interesting. Let's do something like this. Let's ask the user for their name, like "What's your name?", question mark, and then let's use a little f-string and say not "hello, world" but hello to that person's name.
Let me save my file, run python speech.py, Enter. David. "Hello, David." All right, so we pronounce my name OK. It might struggle with different names depending on the phonetics, but that one
seemed to be OK. Let's do something else with Python using, similarly, just a few lines of code.
Let me go into today's examples, and I'm going to go into a folder called faces. In this folder, which I've written in advance, are a few files, detect.py and recognize.py, and two photos, office.jpeg and toby.jpeg. If you're familiar with the show, here for instance is the cast photo from The Office.
So here is a photo as input. Suppose I want to do something very Facebook-style, where I want to detect all of the faces in there. Well, let me go ahead and show you a program I wrote in advance that's not terribly long; much of it is actually comments. But let's see what I'm doing.
I'm importing the Pillow library, again, to get access to images. I'm importing a library called face_recognition, which I downloaded and installed in advance, but it does what it says. According to its documentation, you go into that library and you call a function called load_image_file to load something like office.jpeg, and then you can use a line of code like this: call a function called face_locations, passing the image as input, and you get back a list of all of the faces in the image.
And then down here, a for loop that iterates over all of those face locations, and inside of this loop I just do a bit of trickery. I figure out the top, right, bottom, and left corners of those locations, and then, using these lines of code here, I'm using that image library to just draw a box, essentially.
The code looks cryptic; honestly, I would have to look this up to write it again. But per the documentation, this just draws a nice little box around each face.
So let me go ahead and zoom out here and run this now on office.jpeg. All right, it's analyzing, analyzing. And you can see in the sidebar here, here's the original, and here is every face that my, what, 10 lines of Python code found within that file.
What's a face? Presumably the library is looking for something, maybe without a mask, that has two eyes, a nose, and a mouth in some kind of arrangement, some kind of pattern. So it would seem pretty reliable, at least on these fairly easy-to-read faces here.
What if we want to look for someone specific, for instance someone that's always getting picked on? Well, we could do something like this: recognize.py, which is taking two files as input, that image and the image of one person in particular.
And if you're trying to find Toby in a crowd here... I conflated the program, sorry. This is the version that draws a box around the given face. Here we have Toby, as identified.
Why? Because that program, recognize.py, has a few more lines of code, but long story short, it additionally loads as input toby.jpeg in order to recognize that specific face. And that specific face is a completely different photo, but it looks similar enough to the person that it all worked out OK.
Let's do one other that's a little sensitive to microphones.
Let me go into, how about, my listen folder here, which is available online too, and let's just run python listen0.py.
I'm going to type in like David... oh, sorry, no, I'm going to say "Hello, world." Oh no, that's the wrong version. OK, I look like an idiot. OK: "hello." There we go: "Hello to you too." And if I say "goodbye"... I'm talking to my laptop like an idiot. OK, now it's detecting what I'm saying here.
So this first version of the program is just using some relatively simple if, elif, elif, and it's just asking for input, forcing it to lowercase (and that was my mistake with the first example), and then I'm just checking: is "hello" in the user's words? Is "how are you" in the user's words? Didn't see that, but it's there. Is "goodbye" in the user's words?
Now let's do a cooler version using a library, just by looking at the effect: python listen1.py. "Hello, world." Huh. Let's do version two of this, which uses an audio speech-to-text library. "Hello, world." OK, so now it's artificial intelligence.
Now let's do something a little more interesting: the third version of this program, which actually analyzes the words that are said. "Hello, world. My name is David. How are you?" OK, so that time it not only analyzed what I said, but it plucked my name out of it.
Let's do two final examples. This one will generate a QR code. Let me go ahead and write a program called qr.py that very simply does this. Let me import a library called os. Let me import a library called qrcode. Let me grab an image here, that's qrcode.make, and let me give it, like, the URL of a lecture video on YouTube or something like that, with this ID. Let me just type this so I don't get it wrong.
OK, so if I now use this URL here of a video on YouTube, making sure I haven't made any typos, I'm now going to go ahead and do two lines of code in Python. I'm going to first save that as a file called qr.png, which is a two-dimensional barcode, a QR code. And indeed, I'm going to use this format, and I'm going to use the os.system function to open qr.png automatically.
And if you'd like to take out your phone at this point, you can see the result of my barcode that's just been dynamically generated. Hopefully, from afar, that will scan.
And I think that's an appropriate line to end on. So that's it for CS50. We will see you next time.
[Applause] [Music]
This is CS50, and this is week seven, the week here of Halloween. Indeed, special thanks to CS50's own Valerie and her mom for having created this very festive scenery and all past ones as well.
Today we pick up where we left off last time, in which, recall, we introduced Python. That was our big transition from C, where suddenly things started to look new again, probably syntactically, but also, probably, things hopefully started to feel easier.
Well, with that said, problem set 6 certainly added some challenges, and you did some new things, but hopefully you've begun to appreciate that with Python, just a lot more stuff is easier to do. You get more out of the box with the language itself, and that's going to be so useful over the coming weeks as we transition further to introducing something called databases today, and web programming next week and the week after, so that by term's end, and perhaps even for your final project, you really are building something from scratch using all of these various tools somehow together.
So before we do that, though, today let's consider what we weren't really able to do last week, which was actually create and store data ourselves. In Python, we've played around with the csv, comma-separated values, library, and you've been able to read in CSVs from disk, so to speak, that is, from files in your programming environment. But we haven't necessarily started saving data, persisting data, ourselves.
And that's a huge limitation, because pretty much all of the examples we've done thus far, with a couple of exceptions, have involved my providing input at the keyboard, or even vocally, but then nothing happens to it. It disappears the moment the program quits, because it was only being stored in memory.
But today we'll start to focus all the more on storing things on disk, that is, storing things in files and folders, so that you can actually write programs that remember what it is the human did last time. And ultimately, you can actually make mobile or web apps that actually begin to grow and grow and grow their data sets, as might happen if you get more and more users, for instance, on a website.
To play, then, with this new capability of being able to write files, let's go ahead and just collect some
data. In fact, those of you here in person, if you want to pull up this URL on your phone or laptop, that's going to lead you to a Google Form. And that Google Form is going to ask you, in just a moment, for really just your favorite TV show, and it's going to ask you to categorize it according to its genre, like comedy or drama or action or musical or something like that.
This is useful because, if you've ever used a Google Form before, or Microsoft's equivalent with Office 365, it's a really useful mechanism for just collecting data from users and then ultimately putting it into spreadsheet form.
So this is a screenshot of the form that those of you here in person or tuning in on Zoom are currently filling out. It's asking only two questions: what's the title of your favorite TV show, and what are one or more genres into which your TV show falls?
And I'll go ahead and pivot now to the view that I'll be able to see as the person who created this form, which is quite simply a Google spreadsheet. Google Forms has this nice feature, if you've ever noticed, that allows you to export your data to a Google spreadsheet, and then from there we can actually grab the file and download it to my own Mac or your own PC so that we can actually play around with the data that's come in.
So in fact, let me go ahead and slide over to the live Google spreadsheet, and you'll see probably a whole bunch of familiar TV shows here, all coming in. And if we keep scrolling and scrolling and scrolling... only 46, 47... there we go, up to 50-plus already.
If you need that URL again, if you're just tuning in, you can go to this URL here. And in just a moment, we'll have a bunch of data with which we can start to experiment. I'll give you a moment or so there.
All right, let me hang in there a little longer. OK, we've got over 100 submissions. Good, good. Even more coming in now, and we can see them coming in live here. Let me switch back to the spreadsheet. The list is growing and growing and growing.
And in just a moment... let me give Carter a moment to help me export it in real time. Carter, just give me a heads up when it's reasonable for me to download this file.
All right, and I'll begin to do this very slowly. So I'm going to go up to the File menu, if you've never done this before, and Download. You can download a whole bunch of formats, one in Excel, but more simply, the one we'll start to play with here is comma-separated values.
So, CSV files: we used these this past week. Why are they useful, now that you've played with them or used them in the past real world? Like, what's the utility of a CSV file versus something like Excel, for
instance? Why CSV in the first place? Any instincts? Yeah. A text file? OK, so storage is compelling. A simple text file with ASCII or Unicode text is probably pretty small. I like that.
Other thoughts? The structure of it? Yeah, well said. It's just a simple text format, but using conventions like commas, you can represent the idea of columns. Using newlines, backslash n's, invisibly at the end of your lines, you can create the idea of rows.
So it's a very simple way of implementing what we might call a flat-file database. It's a way of storing data in a flat, that is, very simple, file that's just pure ASCII or Unicode text.
And more compellingly, I dare say, is that a CSV file is completely portable. Something is portable in the world of computing if you can use it on a Mac or a PC, running this operating system or this other one.
And portability is nice, because if I were to download an Excel file, there'd be a whole bunch of people in this room and online who couldn't open it, because they haven't bought Microsoft Excel or installed it, or if they have a Mac they might not have it. Or if it's a Numbers file in the Mac world, a PC user might not be able to open it. So a CSV is indeed very portable.
So I'm going to go ahead and download, quite simply, the CSV version of this file. That's going to put it into my own Mac's Downloads folder. And let me go ahead here and, in just a moment, let me just simplify the name, because it actually downloads it with a pretty large name.
Give me just one moment here, and you'll see that indeed, on my Mac, I have a file called favorites.csv; I shortened the name real quick.
And now what I'm going to do is go over to VS Code, and in VS Code I'm going to open my file explorer. And if I minimize my window here for a moment, a handy feature of VS Code is that you can just drag and drop a file, for instance, into your explorer, and voila, it's going to automatically upload it for you.
So let me go ahead and full-screen here, close my explorer temporarily, close my terminal window, and you'll see here a CSV file, favorites.csv. And the first
row, by convention, has whatever the columns were in Google Spreadsheets, or Office 365 in Excel online: Timestamp, comma, title, comma, genres.
Then we have timestamps, which indicate when people started submitting; it looks like a couple of people were super eager to get started an hour or two ago. And then you have the title next, after a comma.
But there's kind of a curiosity after that. Sometimes I see the genre, like comedy, comedy, comedy, but sometimes it's like crime, comma, drama, or action, comma, crime, comma, drama, and those things are quoted. And yet I didn't do any quotes; you probably didn't type any quotes. Where are those quotes coming from in this CSV file? Why are they there, if we infer? Yeah.
Yeah, so you have kind of a corner case, if you will, because if you're using commas, as you describe, to separate your data into what are effectively columns, well, you've kind of painted yourself into a corner if your actual data has commas in it itself.
So what Google has done, what Microsoft does, what Apple does, is they quote any strings of text that themselves have commas, so that these are now sort of English grammatical commas, not CSV-specific commas. So it's a way of escaping your data, if you will, and escaping just means to call out a symbol in a special way so it's not misinterpreted as something else.
All right, so this is all to say that we now have all of this data with which we can play, in the form of what we'll start calling a flat-file database.
So suppose I wanted to now start manipulating this data, and I want to store it, ultimately, indeed in this CSV format. How can I actually start to read this data, maybe clean it up, maybe do some analytics on it, and actually figure out what's the most popular show among those who submitted here over the past few minutes? Well, let me go ahead and close this.
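The quoting behavior described above (fields that themselves contain commas get wrapped in quotes) can be reproduced with Python's own csv module. This is only a sketch: the rows and the placeholder timestamps are made up, the header capitalization is an assumption, and an in-memory io.StringIO buffer stands in for a real file.

```python
import csv
import io

# Write three rows into an in-memory text buffer instead of a file on disk.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["Timestamp", "title", "genres"])
writer.writerow(["sample-time-1", "The Office", "Comedy"])
writer.writerow(["sample-time-2", "Breaking Bad", "Crime, Drama"])

text = buffer.getvalue()
print(text)

# Reading the text back: the quoted field comes back as ONE value,
# with its internal comma intact and the quotes removed.
rows = list(csv.reader(io.StringIO(text)))
print(rows[2])
```

By default the writer quotes only the fields that need it, which matches what the exported spreadsheet shows: "Comedy" is bare, while "Crime, Drama" is quoted.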
Let me go ahead, then, and open up, for instance, just my terminal window, and let's code up a file called favorites.py. Let's go ahead and iteratively start simple, by just opening up this file and printing out what's inside of it.
So you might recall that we can do this by doing something like import csv, to give myself some CSV-reading functionality. Then I can go ahead and do something like: with open, the name of the file that I want to open, in read mode (quote-unquote "r" means to read it), and then I can say as file, or whatever other name for a variable, to say that I want to open this file and essentially store some kind of reference to it in that variable called file.
Then I can give myself a reader, and I can say csv.reader, passing in that file as input. And this is the magic of that library: it deals with the process of opening it, reading it, and giving you back something that you can just iterate over, like with a for loop.
I do want to skip the first row, and recall that I can do this: next(reader) is this little trick that just says ignore the first row, because the first one's special. It said Timestamp, title, genres. That's not your data; that was mine. But this means that now that I've skipped that first row, everything hereafter is going to be the title of a show that you all like.
So let me do this: for row in the reader, let's go ahead and print out the title of the show each of you typed in. How do I get at the title of the show each of you typed in? It's somewhere inside of row. Row, recall, is a list. So what do I want to type next in order to get at the title of the current row?
Just as a quick check here, what do I want to type to get at the title of the row, keeping in mind again that it was Timestamp, title, genres? Yeah, so row[1] would give me the second column, zero-indexed, that is, the one in the middle with the title.
So this program isn't that interesting yet, but it's a quick and dirty way to figure out, all right, what does my data look like? Let me actually just do a little bit of a check here and see if it contains the data I think it does.
Let me maximize my terminal window here. Let me run python favorites.py, hitting Enter, and you'll see now a purely textual list of all of the shows you all seem to like here.
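A rough sketch of this first version of favorites.py. So that it runs without the downloaded file, it reads from an in-memory io.StringIO holding a few invented rows instead of opening favorites.csv, and it collects the titles in a list called printed (a name introduced here purely so the result can be inspected).

```python
import csv
import io

# Assumption: a small stand-in for favorites.csv with made-up submissions.
sample = io.StringIO(
    "Timestamp,title,genres\n"
    "sample-time-1,The Office,Comedy\n"
    "sample-time-2,friends,Comedy\n"
    "sample-time-3,Friends,Comedy\n"
    'sample-time-4,Breaking Bad,"Crime, Drama"\n'
)

reader = csv.reader(sample)
next(reader)  # skip the header row: Timestamp, title, genres

printed = []
for row in reader:
    title = row[1]  # second column, zero-indexed
    print(title)
    printed.append(title)
```

With a real file, the only change would be wrapping the loop in a with open("favorites.csv", "r") as file block and passing file to csv.reader.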
But what's noteworthy about it? Specific shows aside, judgment aside as to people's TV tastes, what's interesting or noteworthy about the data that might create some problems for us if we start to analyze this data and figure out what's the most popular, how many people like this or that? What do you think? Yeah.
Yeah, there might be user errors, or just sort of stylistic differences, that give the appearance that one show is different from the other. For instance, let's see if I can see an example on the screen here. Yeah, so "friends" here is in all lowercase; "Friends" here is capitalized. No big deal, we can sort of mitigate that.
But this is just a tiny example of where data in the real world can get messy fast. And that probably wasn't even a typo; it was just someone not caring as much to capitalize it. And that's fine; your users are going to type what they're going to type.
So let's see if we can't now begin to get at more specific data and maybe even clean some of this data up. Let me go back into my file called favorites.py here, and let's actually do something a little more user-friendly for me.
Instead of a reader, recall that there was this dictionary reader that's just a little more user-friendly, and it means I can type in DictReader here, passing in the same file. But now when I iterate over this reader variable, what is each row when using a DictReader instead of a reader? Recall, and this is just a peculiarity of the csv library, this gives me back not a list of cells but what instead, which is marginally more user-friendly for me? Yeah.
Yeah, I can now use open bracket, quotes, and the title, because what's coming back now is a dict object, that is, a dictionary, which has keys and values, the keys of which are the column headings, the values of which are the data I actually care about.
So this is just marginally better because, one, it's just way more obvious to me, the author of this code, what it is I'm getting at. I mean, I don't remember what column the title was. Was it 0? Was it 1? Was it 2? That's something you're going to forget over time.
And, God forbid, someone changes the data by just dragging and dropping the columns in Excel or Apple Numbers or Google Spreadsheets; that's going to break all of your numeric indices. And so a dictionary reader is arguably just better designed, because it's more robust against changes and potential errors like that.
Now, the effect of this change isn't going to be really any different. If I run python favorites.py, voila, I get all of the same results. But I've now not made any assumptions as to where each of the columns actually is numerically.
because there's a lot of commonality among some of the shows here so let's see if we can't filter out duplicates if
I'm reading a CSV file top to bottom what intuitively might be the like logic I want to implement to filter out
duplicates it's not going to be quite as simple as a simple function that does it for me I'm going to have to build
this but logically if you're reading a file from top to bottom how might you go about in python or just any context
getting rid of duplicate values yeah what do you think sure I could use a list and I could add
each title to the list but first check if I've put this into the list before so let's try a little something like that
let me go ahead and create a variable at the top of my program here I'll call it titles for instance initialized to an
empty list Open Bracket close bracket and then inside of my Loop here instead of printing it out let's start to make a
decision so if the current row's title is in the titles list I don't
want to put it there and actually let me invert the logic so I'm doing something proactively so if it's not the case that
row bracket title is in titles then go ahead and do something like titles.append the current row's title
and recall that we saw append uh a week or so ago where it just allows you to append to the current list and then what
can I do at the very end after I'm all done reading the whole file why don't I go ahead and say for title in titles go
ahead and print out the current title so it's two Loops now and we can come back to the quality of that design but let me
go ahead here and rerun python of favorites.py let me increase the size of my terminal window so we can focus just
on this and hit enter and now I'm just skimming I don't think I'm seeing
duplicates although I am seeing some near duplicates for instance there's friends
again and if we keep going and going and going and going there's friends again oh interesting so that's
curious that I seem to have multiple friends and I have this one here too so how might we clean this up further I
like your instincts and that's a step closer to it what are we going to have to do to really filter out those near
duplicates any thoughts yeah what are the common mistakes to
summarize we could ignore the capitalization all together and maybe just force everything to lowercase or
everything to uppercase doesn't matter which but let's just be consistent and for those of you who might have
accidentally or instinctively hit like the space bar at the beginning of your input or even at the end we can strip
that off too stripping white space is a common thing just to clean up user input so let me go back into my code here and
let me go ahead and tweak the title a little bit let me say that the current title inside of this Loop is not going to be
just the current row's title but let me go ahead and strip off from the left and the right implicitly any whitespace if you
read the documentation for the strip function it does just that it gets rid of whitespace to the left whitespace to
the right and then if I want to force everything to maybe uppercase I can just uppercase the entire string and remember
what's handy about python is you can chain some of these function calls together by just using dots again and
again and that just takes whatever just happened like the white space got stripped off then it Additionally
uppercases the whole thing as well so now I'm going to check whether this specific title is in titles and if not
I'm going to go ahead and append that title massaged into this different format if you will so I'm throwing away
some information like I'm sacrificing all of the nuances of your uh grammar and input to the form itself but at
least I'm trying to canonicalize that is standardize what the data actually looks like so let me go ahead and run python
of favorites.py again and hit enter oh and this is just user error maybe you haven't seen this before this just looks
like a mistake on my part I meant to say not even uppercase that's completely wrong the function is called upper now
that I think of it all right let's go and increase the size of the terminal window again run python of favorites.py
and now you know it's a little more overwhelming to look at because it's not sorted yet and it's all
capitalized but I don't think I'm seeing multiple friends so to speak there's one friends up here and that's it I'm back
up at my prompt already so we seem now to be filtering out duplicates now before we dive in further and clean this
up further than this what else could we have done well it turns out that in Python too you often do get a lot of
functionality built into the language and I'm kind of implementing myself the idea of a set if you think back to
mathematics a set is typically something with a bunch of values that has duplicates filtered out recall that
python already has this for us and we saw it really briefly when I whipped up the dictionary implementation a couple
of weeks back so I could actually Define my titles to be a set instead of a list and this would just modestly allow me to
refine my code here such that I don't have to bother checking for duplicates anyway I can instead just say something
like titles.add the current title like this you know marginally better design if you know that a set exists you're
just getting more functionality out of this all right so let's clean the data up further we've now gone ahead and
fixed the problem of case sensitivity we threw away whitespace in case someone had hit the space bar with some of the
input let's go ahead now and sort these things by the titles themselves so instead of just printing out the titles
in the same order you all inputted them but filtering out duplicates as we go let me go ahead and use another
function in Python you might not have seen which is literally called sorted and will take care of the process of
actually sorting the titles for you let me go ahead and increase the font size of my terminal run python of favorites.py and
hit enter and now you can really see how many of these shows start with the word the or do not now it's a little easier
to wrap our minds around just because it's at least sorted alphabetically but now you can really see some of the
differences in people's inputs so far so good but a few of you decided to stylize Avatar in three different ways here um
Brooklyn 99 a couple of different ways here and I think if we keep going we'll see further and further variances that
we did not fix by focusing on whitespace and capitalization alone so already here this is only what 100 plus 200 rows
already real world data starts to get messy quickly and that might not bode well when we actually want to keep
around real data from real users you can imagine an actual website or a mobile application dealing with this kind of
thing at scale well let's go ahead and do this let's actually figure out the popularity of these various shows by now
iterating over my data and keeping track of how many of you inputted a given title we're going to ignore the problems like
Brooklyn 99 and Avatar where there were things that were different beyond just
whitespace and capitalization but let's go ahead and keep track of now how many of you inputted each of these titles
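The cleanup steps covered so far (strip, upper, a set, and sorted) can be sketched together as below. The sample titles, including the longer Avatar variant, are hypothetical and only illustrate the kinds of differences described; the real submissions are not reproduced here.

```python
# Hypothetical raw submissions with stray spaces and mixed capitalization.
raw_titles = [
    "Friends",
    " friends",
    "FRIENDS ",
    "The Office",
    "the office",
    "Avatar",
    "Avatar: The Last Airbender",
]

titles = set()
for raw in raw_titles:
    # Canonicalize: drop leading/trailing whitespace, then force uppercase.
    title = raw.strip().upper()
    # A set ignores values it already contains, so no membership check is needed.
    titles.add(title)

# sorted() returns a new alphabetically ordered list of the set's values.
cleaned = sorted(titles)
for title in cleaned:
    print(title)
```

Whitespace and case differences collapse into one entry, while variants that differ in their actual wording (the two Avatar spellings here) still survive as separate titles.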
so how can I do this I'm still going to take this approach of iterating over the CSV file from top to bottom we've used a
couple of data structures thus far a list to keep track of titles or a set to keep track of titles but what if I now
want to keep around a little more information for each title I want to keep around how many times I've seen it
before I'm not doing that yet I'm throwing away the total number of times I see these shows how could I start to
keep that around we could use a dictionary and how elaborate on
that perfect really good instincts using a dictionary in so far as it lets us store keys and values that is
associate something with something else this is why a dictionary or hash tables more generally are such a useful
practical data structure because they just let you remember stuff in some kind of structured way so if the keys are
going to be the titles I've seen the values could be the number of times I've seen each of those titles and so it's
kind of like just having a two column table on paper for instance if I were going to do this on a piece of
paper I might just have two columns here where maybe this is the title that I've seen and this is the count over
here this is in effect a dictionary in Python it's two columns keys on the left values on the right and this if I can
Implement in code will actually allow me to store this data and then maybe do some simple arithmetic to figure out
which is the most popular so let's do this let me go ahead and change my titles to not be a list not be a set
let's have it be a dictionary instead either doing this or more succinctly two curly braces that are empty gives me an
empty dictionary automatically what do I now want to do I think most of my code can stay the same but down here I don't
want to just blindly add titles to the data structure I somehow need to keep track of the count and unfortunately if
I just do this let's do titles bracket title plus equals 1 this is a reasonable first attempt at this because
what am I doing if titles is a dictionary and I want to look up the current title therein the syntax for
that like before is titles bracket and then the key you want to use to index into the dictionary it's not a number in
this case it's an actual word a title and you're just going to increment it by one and then eventually I'll come back
and finish my second Loop and do things uh in terms of the order but for now let's just keep track of the total
counts let me go ahead and increase my terminal window let me do python of favorites.py and hit enter and huh How I
Met Your Mother is giving me a KeyError what does that mean and why am I seeing this and in
fact just to give a little bit of a breadcrumb here let me zoom out here let me open up the CSV file again
real quickly and wow we didn't even get past the second row in the file or the first show in the file notice that How I
Met Your Mother somewhat lowercased is the very first show therein what's your instinct for why this is happening
starting point I don't have a starting point right I'm adding one to what like I'm blindly indexing into the dictionary
using a key How I Met Your Mother that doesn't yet exist in the dictionary and so python throws what's called a key
error because the key you're trying to use just doesn't exist yet so logically how could we fix
this we're close we got like half of the problem solved but I'm not handling the now obvious case of nothing being there
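To make the failure concrete, here is a small sketch that reproduces it on an empty dictionary. The try/except is added only so the example keeps running; it is not part of the program being discussed.

```python
titles = {}
key_was_missing = False

try:
    # += has to read the current value before adding 1, and the key is not there yet.
    titles["HOW I MET YOUR MOTHER"] += 1
except KeyError as error:
    key_was_missing = True
    print("KeyError:", error)
```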
yeah creating the counter itself so maybe I could do something like this let me close my terminal
window and let me ask a question first if the current title is in the dictionary already if title in titles
that's going to give me a true false answer it turns out then I can safely say titles bracket title plus equals 1
and recall this is just shorthand notation for the same thing as in C title plus one whoops typo don't do that
that's the same thing as this but it's a little more succinct just to say plus equals 1 else if it's logically not the
case that the current title is in the titles dictionary then I probably want to say titles bracket title equals feel
free to just shout it out zero I just have to put some value there so that the key itself is also
there all right so now that I've got this going on let me go ahead and undo my sorting temporarily and now let me go
ahead and do this I can as a quick check let me go ahead and just run the code as is python of favorites.py I'm back in
business it's printing correctly no key errors but it's not sorted and I'm not seeing any of the counts let me just
quickly add the counts and there's a couple of ways I could do this I could say print out the title and then maybe
let's do something like uh how about just comma titles bracket title so I'm going to print two things at once both
the current title in the dictionary and whatever its value is by indexing into it let me increase my terminal window
let me run python of favorites.py enter and okay huh
huh none of you said a whole lot of TV shows it seems what's the logical error here what did I do wrong if I look back
at my code here yeah why so many zeros
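The zeros come down to how a new key is initialized. As a sketch of the two working variants the discussion arrives at, using hypothetical sample titles in place of the CSV rows:

```python
# Hypothetical raw submissions, not the real survey data.
raw_titles = ["The Office", "Friends", "the office ", "THE OFFICE"]

# Variant 1: a title seen for the first time starts at 1, otherwise increment.
counts_a = {}
for raw in raw_titles:
    title = raw.strip().upper()
    if title in counts_a:
        counts_a[title] += 1
    else:
        counts_a[title] = 1

# Variant 2: make sure the key exists at 0, then always increment.
counts_b = {}
for raw in raw_titles:
    title = raw.strip().upper()
    if title not in counts_b:
        counts_b[title] = 0
    counts_b[title] += 1

for title in counts_a:
    print(title, counts_a[title])
```

Both variants guarantee that the key exists before it is incremented, which is the point that matters; which one reads better is a matter of taste.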
exactly to summarize I initialized the count to zero the first time I saw it but I should have initialized it at
least to one because I just saw it or I should change my code a bit so for instance if I go back in here the
simplest fix is probably to initialize to one because on this iteration of the loop obviously I'm seeing this title for
the very first time or I could change my logic a little bit I could do something like this instead if the current title
is not in titles then I could initialize it to zero and then I could get rid of the else and now blindly index into the
titles dictionary because now on line 11 I can trust that lines 9 and 10 took care of the initialization for me if
need be which one is better I don't know this one's a little nicer maybe because it's one line fewer but I think both
approaches are perfectly reasonable and well designed but the key thing no pun intended is that we have to make sure
the key exists before we presume to actually increment oh this is wrong this is incorrect code
what did I do wrong okay yes there we go so otherwise everyone would have liked this show once
and no matter how many people said the same thing now the code is as it should be so let me go ahead and open up my
terminal window again let me run python of favorites.py and now we see more reasonable counts some of the shows
weren't that popular there's just ones and maybe twos but I bet if we sort these things we can start to see a
little more detail so how else can we do this well it turns out when dealing with a dictionary
like this let's go ahead and just sort the titles themselves so let's reintroduce this sorted function as I
did before but no other changes let me go ahead now and run python of favorites. py now it's just a little
easier to wrap your mind around it because at least it's alphabetical but it's not sorted by value it's sorted by
key but sure enough if we scroll down there's something down here for instance like uh let's see the office that's
definitely going to be a contender for most popular 15 responses but let's see what's actually going to Bubble up to
the top unfortunately the sorted function only sorts dictionaries by keys by default not by values but it turns
out in Python if you read the documentation for the sorted function you can actually pass in other arguments
that tell it how to sort things for instance if I want to do things in reverse order I can add a second
parameter to the sorted function called reverse and it's a named parameter you literally say reverse
equals true so that the position of it in the comma separated list doesn't matter if I now rerun this after
increasing my terminal window you'll see now that it's in the opposite order now Adventure an an with an E is at the
bottom of the output instead of the top how can I tell it to sort by
values instead of by key well let's go ahead and do this let me go ahead and Define a function
um I'm just going to call it f to keep things simple and this F function is going to take a title as input and given
a given title it's going to return the value of that title so actually maybe a better name for this would be get value
or we could come up with something else as well the purpose of the get value function to be clear is to take as
input a title and then return the corresponding value why is this useful well it turns out that the sorted
function in Python according to its documentation also takes a key parameter where you can pass in crazy enough the
name of a function that it will use in order to determine what it should sort by by the key or by the value or in
other cases even other types of data as well so there's a curiosity here though that's very deliberate key is the name
of the parameter just like reverse was the name of this other parameter the value of it though is not a function
call it's a function name notice I am not doing this no parentheses I'm instead passing in get value the
function I wrote by its name and this is a feature of python and certain other languages just like variables you can
actually pass whole functions around so that they can be called for you later on by someone else so what this
means is that the sorted function written by python they didn't know what you're going to want to sort by
today but if you provide them with a function called get value or anything else now their sorted function will use
that function to determine okay if you don't want to sort by the key of the dictionary what do you want to sort by
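A minimal sketch of passing a function by name to sorted follows. The get_value name comes from the discussion above; the counts dictionary is a hypothetical stand-in (only the figure of 15 for the office is mentioned in the lecture, the other numbers are made up for illustration).

```python
# Hypothetical counts; only the 15 for THE OFFICE is mentioned in the lecture.
counts = {"FRIENDS": 9, "THE OFFICE": 15, "COMMUNITY": 7}


def get_value(title):
    """Return the count stored for a title, so sorted() can order by it."""
    return counts[title]


# Default behavior: iterating a dict yields its keys, so this sorts by title.
by_title = sorted(counts)

# key=get_value passes the function itself (no parentheses); sorted() calls it
# once per title. reverse=True puts the largest count first.
by_count = sorted(counts, key=get_value, reverse=True)

for title in by_count:
    print(title, counts[title])
```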
this is going to tell it to sort by the value by returning the specific value we care about so let me go ahead now and
rerun this after increasing my terminal python of favorites.py enter here we have now an example of all of the titles you
all typed in albeit forced to uppercase and with any whitespace thrown out and now the office is
an easy win over friends versus Community versus Game of Thrones Breaking Bad and then a lot of variants
thereafter so there's a lot of steps to go through like this you know isn't that bad once you've done it once and you
know what these functions are and you know that these parameters exist but it's a lot of work I mean that's 17
lines of code just to analyze a CSV file that you all created by way of those Google form submissions but it took me a
lot of work just to get simple answers out of it and indeed that's going to be among the goals for today ultimately is
how can we just make this easier right it's one thing to learn new things in Python but if we can avoid writing code
or this much code that's going to be a good thing and so one other technique we can introduce here that
does allow us to write a little less code is we can actually get rid of this function it turns out in Python if you
just need to make a function but it's going to be used and then essentially thrown away it's not something you're
going to be reusing in multiple places it's not like a library function that you want to keep around you can actually
just do this you can change the value of this key parameter to be what's called a Lambda function which is a fancy way of
saying a function that technically has no name it's an anonymous function why does it have no name well it's kind of stupid
that I invented this name on line 13 I used it on line 16 and then I never again used it right if it's only
being used in one place why bother giving it a name at all so if you instead in Python say lambda and then
type out the name of the parameter you want this Anonymous function to take you can then say go ahead and return this
value now notice the inconsistencies here when you use this special lambda keyword that says hey
give me an anonymous function a function with no name it then says python this Anonymous function will take one
parameter notice there's no parentheses and that's deliberate if confusing it just tightens things up a little bit
notice that there's no return keyword which similarly tightens things up a bit albeit inconsistently but this line of
code I've just highlighted is actually identical in functionality to this but it throws away the word def it throws
away the word get value it throws away the parenthesis and it throws away the return keyword just to tighten things up
and it's well suited for a problem like this where I just want to pass in a tiny little function that does something
useful but it's not something I'm going to reuse it doesn't need multiple lines to take up space it's just a nice
elegant oneliner that's all a Lambda function does it allows you to create an anonymous function right then and there
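As a sketch of that equivalence, the named function and the lambda below produce the same ordering. The counts dictionary is again a hypothetical stand-in for the real tallies.

```python
# Hypothetical counts standing in for the real tallies.
counts = {"FRIENDS": 9, "THE OFFICE": 15, "COMMUNITY": 7}


def get_value(title):
    return counts[title]


# Named function passed by name.
with_def = sorted(counts, key=get_value, reverse=True)

# Same logic as an anonymous function: no def, no name, no parentheses
# around the parameter, and no return keyword.
with_lambda = sorted(counts, key=lambda title: counts[title], reverse=True)

print(with_def == with_lambda)
```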
and then the function you're passing it to like sorted will use it as before indeed if I run python of favorites.py
after growing my terminal window the result is exactly the same and we see at the bottom here all of those small
results are any questions then on this syntax on these ideas the goal here has been to write a Python program that just
starts to analyze or clean up data like this yeah could you use the Lambda if it's
just returning immediately it's really meant for one line of code generally so you don't use the return keyword you
just say what it is you want to return good question could you do more
in that one line if it's got to be a more involved algorithm yes but you would just ultimately return the value
in question in short if it's getting at all sophisticated you don't use the lambda function in Python you go ahead
and actually just define a name for it even if it's a one-off name JavaScript another language we'll look at in a few
weeks makes heavier use I dare say of lambda functions and those can actually be multiple lines but python
does not support that instinct all right so let's go ahead and do one other thing office was clearly
popping out of the code here quite a bit let's go ahead and write a slightly different program that maybe just
focuses on the office for the moment just focuses on the office so let me go ahead and throw most of this code away
up until this point when I'm inside of my inner loop and let me go ahead and I don't even want the global variable here
all I want to do is focus on the current title how could I detect if someone likes the office well I could say
something like how about this so counter equals zero we'll just focus on the office if title equals equals the
office I could then go ahead and say counter plus equals 1 I don't need a key there's no dictionary involved now
it's just a simple integer variable and then down here I'll say something like number of people who like the office
is whatever this value is and I'll put in counter in curly braces and then I'll turn this whole thing into an F
string all right let me go ahead and run this python of favorites. py enter number of people who like the office is
15 all right so that's great but let's go ahead now and deliberately muddy the data a bit all of you were very nice in
that you typed in the office but you could imagine someone just typing office for instance maybe there maybe there and
many people might just write office you could imagine didn't happen here but suppose it did and probably would have if we had
even more submissions over time now let's go ahead and rerun this program no changes to the code now only 13 people
like the office so let's fix this the data is now as I mutated it to have a couple of offices and many the offices
how could I change my python code to now count both of those situations what could I change up here
in order to improve this situation any thoughts yeah title equals the office or title equals
yeah so I could just ask two questions like that if title equals the office or title equals equals just office for
instance and I still don't have to worry about capitalization I don't have to worry about spaces because I at least
threw that all away now I can go ahead and rerun this code let me go run it a third time okay so we're back up to 15
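The single-show counter with two accepted spellings might look like the sketch below. The sample titles are hypothetical, so the resulting count will not match the figures quoted from the real data.

```python
# Hypothetical submissions; the real CSV rows are not reproduced here.
raw_titles = ["The Office", "the office ", "Office", "Friends"]

counter = 0
for raw in raw_titles:
    title = raw.strip().upper()
    # Each side of "or" must be a full comparison against title.
    if title == "THE OFFICE" or title == "OFFICE":
        counter += 1

print(f"Number of people who like The Office: {counter}")
```

This works for two spellings but needs another comparison for every new variant, which is the scaling concern raised next.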
so I like that but you could imagine this not scaling very well right like Avatar had three different
permutations and there were some others if we dug deeper that there might have been more variants could we do something
a little more general purpose well we could do something like this if office in the title this is kind of a
cool thing you can do with python it's very english-like just ask the question albeit tersely interesting this just got
me into trouble now all of a sudden we're up to 16 does anyone know what the other one
is office which what office oh interesting yes so they hit V and
okay okay someone did that sure so the V office so okay this one's actually going to be hard to correct for
like I can't really think of a general well I mean this is actually a good example of like data gets messy
fast and you could imagine doing something where okay we could have like 26 conditions if someone said the a
office or the B office or right you could imagine doing that but then there's surely going to be other typos
that are possible so that's actually a hard one to fix but it turns out we got lucky and now this is actually the
accurate count but the data is itself messy let me show another way just to add another tool to our toolkit it turns
out that there's this feature in many programming languages python among them called regular expressions and this is
actually a really powerful technique that we'll just scratch the surface of here but it's going to be really useful
actually maybe toward final projects in web programming anytime you want to clean up data or validate data and
actually just to make this clear give me a moment before I switch screens here and let me open up a Google form from
scratch give me just a moment to create something real quick if you've never noticed this
before when creating a Google form you can do like a a question and if you want the user to type in something very
specific as a short text answer like this you might know that there's toggles like this in Google's world like you can
require it or you can do response validation like you could say what's your email and then you could say
something like text is an email so here is an example in Google Forms of how you can
validate users input but a feature most of you have probably never noticed or cared about or used is this thing called
a regular expression where you can actually Define a pattern and I could actually reimplement that same idea by
doing something like this I can say let the user type in anything represented by dot star then an at sign then something else
then a literal period then for instance something else so it's very cryptic admittedly at first glance but this
means any character zero or more times this means any character zero or more times this means a literal period because
apparently dot means any character in the context of these patterns then this thing means any character zero or more
times so I should actually be a little more nitpicky you don't want zero or more times you want one or more times so
this with a plus means any character one or more times so there has to be something there and I think I want the
same thing here one or more times one or more times or Heck if I want to restrict this form in some sense to edu addresses
I could change that last thing to literally .edu and so long story short even though this looks I'm sure pretty
cryptic there's this sort of mini language built into Python and JavaScript and Java and other languages
that allows you to express patterns in a standardized way and this pattern is actually something we can implement in
code too and let me switch back to python for a second just to do the same kind of idea let me toggle back to my code
here let me put up for instance a summary of what it is you can do and here's just a quick summary of all of
the available or some of the available symbols a period represents any character dot star or dot asterisk
means zero or more characters so the dot means anything so it can be a or nothing it can be b or nothing it can be ABC
it can be any combination of zero or more characters change that to a plus and you now express one or more
characters question mark means something is optional the caret symbol means start matching at the beginning of the user's
input dollar sign means stop matching at the end of the user's input so we
won't play with all of these just now but let me go over here and actually tackle this office problem let me go
ahead and import a new library called the regular expression library import re and then down here let me say this if
re.search this pattern let's just search for office quote unquote in the current
title then we're going to go ahead and increase the counter so it turns out that the regular expression library
has a function called search that takes as its first argument a pattern and then as its second argument the string you
want to analyze for that pattern so it sort of looks for a needle in this haystack from left to right let me go ahead
now and run this version of the program enter and now I screwed up because I forgot my colon but that's old stuff
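As a small sketch of re.search and the symbols summarized above, the snippet below tries a few patterns on hypothetical strings. The email address and the EMAIL variable name are made up for illustration, and the pattern is only the rough .edu check described earlier, not a complete email validator.

```python
import re

# re.search(pattern, string) returns a match object if the pattern occurs
# anywhere in the string, and None otherwise.
found = re.search("office", "the office") is not None
missing = re.search("office", "friends") is not None

# ^ anchors the match to the start of the string and $ to the end.
exact = re.search("^office$", "office") is not None
not_exact = re.search("^office$", "the office") is not None

# .+ means one or more of any character; \. is a literal period.
EMAIL = r"^.+@.+\.edu$"
valid = re.search(EMAIL, "student@example.edu") is not None
invalid = re.search(EMAIL, "student@example") is not None

print(found, missing, exact, not_exact, valid, invalid)
```

Because a failed search returns None, the call can be used directly as the condition of an if statement.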
enter huh number of people who like the office is now zero so this seems like a big step
backwards what did I do wrong yeah so I forced all my input to
uppercase so I probably need to do this so we'll come back to other approaches there let me rerun it now okay now we're
back up to 16 but I could even let's say I could tolerate just the office how about this or how about something like
or the office let me do this instead and let me use these other special characters this caret sign means the
beginning of the string this dollar sign weirdly represents the end of the string I'm adding in some parentheses just like
in math just to add another symbol here the or symbol here and this is saying start matching at the beginning of the
user string check if the beginning of the string is office or the beginning of the string is the office and then you
better be at the end of the string so they can't keep typing words before or after that input let me go ahead and
rerun the program and now we're down to 15 which used to be our correct answer but then we noticed the V office how
can we deal with that it's going to be messier to deal with that but how about if I tolerate any character
represented by dot in between the and office now if I rerun it now I really have this expressive capability so this
is to say there are so many ways in languages in general to solve problems and some of these tools are more
sophisticated than others this is one that you've actually probably glanced at but never used in the context of Google
forms for years if you're in the habit of creating these for student groups or other activities but it's now something
you can start to leverage and we're just scratching the surface of what's actually possible with this but let's
now do one final example just using some python code here and let's actually write a program that's a little more
general purpose that allows me to search for any given title and figure out its
popularity so let me go ahead and simplify this let's get rid of our regular expressions let's go ahead and
continue capitalizing the title and let's go to the beginning of this program and first ask the user for
the title they want to search for so title equals let's ask the user for input which is essentially the same
thing as our cs50 get_string function ask them for the title and then whatever they type in let's go ahead and strip
whitespace and uppercase the thing again and now inside of my loop I could say something like this if the current
row's title after stripping whitespace and forcing it to uppercase too equals the user's title then go ahead and maybe
increment a counter so I still need that counter back so let me go ahead and define this maybe in here counter equals
zero and then at the very end of this program let me go ahead and print out just the popularity of whatever the
human typed in so again the only difference is I'm asking the human for some input this time I'm initializing my
counter to zero then I'm searching for their title in the CSV file by doing the same massaging of the data by forcing it
to uppercase and getting rid of the whitespace. So now when I run python favorites.py, enter, I could type in the
office, all lowercase even, and now we're down to 13.
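The program just described can be sketched roughly as follows. This is a minimal sketch, not the lecture's exact file: the sample rows in SAMPLE_CSV are made up to stand in for favorites.csv, the helper name count_title is hypothetical, and the title is passed in as an argument rather than read with input() so the example runs without a keyboard.

```python
import csv
import io

# Hypothetical stand-in for favorites.csv; the real file holds the class's form responses.
SAMPLE_CSV = """Timestamp,title,genres
10/19/2020 13:00:00,The Office,Comedy
10/19/2020 13:01:00, the office ,"Comedy, Drama"
10/19/2020 13:02:00,Friends,Comedy
10/19/2020 13:03:00,THE OFFICE,Comedy
"""


def count_title(csv_file, wanted):
    """Count rows whose title matches `wanted`, ignoring case and surrounding whitespace."""
    wanted = wanted.strip().upper()
    counter = 0
    for row in csv.DictReader(csv_file):
        # Same massaging as the lecture: strip whitespace, force to uppercase.
        if row["title"].strip().upper() == wanted:
            counter += 1
    return counter


# The lecture asks the human with input("Title: "); a fixed string is used here instead.
print(count_title(io.StringIO(SAMPLE_CSV), "the office"))  # prints 3
```

Swapping the fixed string for input() and io.StringIO(SAMPLE_CSV) for open("favorites.csv") would give the interactive version run in the lecture.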
13? Why? Oh, that's correct, because I'm the one that went in and removed those "the"
keywords a bit ago. If we fixed those we would be back up to 15; if we added support for the V office we would be up
to 16 as well all right any questions then on these various manipulations and if you're feeling like oh my God this is
so much python code just to do simple things that's the point and indeed even though it's a powerful language and can
solve these kinds of problems we had to write almost 20 lines of code just to ask a single question like this but any
questions on how we did this or on any of these building blocks along the way anything here no all right that was
a lot let's take a five minute break here when we come back we'll do it better so we are back and the rest of
today is ultimately about how can we store and manipulate and change and retrieve data more efficiently than we
might by just writing raw code this isn't to say that you shouldn't use Python to do the kinds of things that we
just did and in fact it might be super common if you're getting a lot of like messy input from users that you might
want to clean it up and maybe the best way to do that is to write a program so that step by step you can make all of
the requisite changes and fixes like we did uh with the office for instance again and again and reuse that code
especially if more and more submissions are coming through but another theme of today ultimately is that sometimes there
are different if not better tools for the same job and in fact now at this point in the term is we begin to
introduce not just python but in a moment a language called SQL and next week a language called JavaScript and
the week after that synthesizing a whole lot of these languages together is to just kind of paint a picture of like how
you might decide what the trade-offs are between using this tool or this tool or this other tool because undoubtedly you
can solve problems moving forward in many different ways with many different tools so let's give you another tool one
with which you can Implement a proper relational database what we just saw in the form of CSV files are what we might
call flat file databases again just a very simple file flat in that there's no hierarchy to it it's just like rows and
columns, and that is all ultimately storing ASCII or Unicode text. A relational database, though, is something
that's actually closer to a proper spreadsheet program. A CSV is an individual sheet, if you will, from a
spreadsheet when you export it. If you had multiple sheets in a spreadsheet you would have to export multiple CSVs,
and that gets annoying quickly in code if you have to open up this CSV and this CSV, all of which represent different sheets
or tabs in a proper spreadsheet. A relational database is more like a spreadsheet program that you, a
programmer now can interact with you can write data to it you can read data from it and you can have multiple sheets AKA
tables storing all of your data. So whereas Excel and Numbers and Google Spreadsheets are meant to be used
really by humans, with their mouse and their keyboard, clicking and pointing and manipulating things graphically, a relational
database is one in which the programmer has similar capabilities but does so in code,
specifically using a language called SQL, and at a scale that's much grander than spreadsheets alone. In fact, if you try on
your Mac or PC to open a spreadsheet that's got tens of thousands of rows it'll probably work fine hundreds of
thousands of rows millions of rows no way like at some point your Mac or PC is going to struggle to open particularly
large data sets and that too is where proper databases come into play and proper languages for databases come into
play when it's all about scale and indeed most any mobile app or web app today that you or someone else might
write should probably plan on lots of data if it's successful so we need the right tools for that problem so
fortunately even though we're about to learn yet another language it only does four things fundamentally known by this
silly acronym CRUD. SQL, this language for databases, supports the ability to create data, read data, update data, and
delete data like that's it there's a few more keywords that exist in this language called SQL that we'll soon see
but at the end of the day even if you're starting to feel like this is a lot very quickly it all boils down to these four
basic operations and the four commands in SQL if you will functions in a sense that Implement those four ideas happen
to be these. They're almost the same but with some slight variance: the ability to CREATE or INSERT data is the C, the
ability to SELECT data is the R, or read, UPDATE is the same, DELETE is the same, but DROP is also a keyword as well. So
we'll see these and a few other keywords in SQL that at the end of the day just allow you to create, read, update, and delete data
using verbs if you will like these so to do that what's the syntax going to be well we won't get into the weeds too
quickly on this but here's a representative syntax of how you can create using this language called SQL in
your very own database a brand new table right this is so easy in Excel and Google spreadsheets and Apple Numbers
you want a new sheet you like click the plus button you get a new tab you give it a name and boom you're done in the
world of uh programming though if you want to create the analog of that spreadsheet in the computer's memory you
create something called a table like a sheet that has a name and then in parentheses has one or more columns but
unlike Google spreadsheets and apple numbers and Excel you have to decide as the programmer what types of data you're
going to be storing in each of these columns. Now even though Excel and Google Spreadsheets and Numbers do allow you
to format or present data in different ways, it's not strongly typed data like it is, for instance, when we were using C.
And heck, even in Python there are underlying data types, even if you don't have to type them explicitly. With databases
you're going to want to know: are you storing integers, are you storing real numbers or floats, are you storing text?
Why? Because especially as your data scales, the more hints you give the database about your data, the more
performant it can be, the faster it can help you get at and store that data. So types are about to be important again.
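A minimal sketch of that CREATE TABLE shape, run through Python's built-in sqlite3 module so it can be executed anywhere. The table name measurements and its three columns are hypothetical, chosen only to show one column each of the integer, text, and real types mentioned above.

```python
import sqlite3

# An in-memory database keeps the experiment from touching any file on disk.
conn = sqlite3.connect(":memory:")

# General shape: CREATE TABLE name (column type, column type, ...);
# "measurements" and its columns are made-up names for illustration.
conn.execute(
    """
    CREATE TABLE measurements (
        id INTEGER,
        label TEXT,
        value REAL
    )
    """
)

conn.execute(
    "INSERT INTO measurements (id, label, value) VALUES (?, ?, ?)",
    (1, "pi", 3.14),
)

# typeof() is SQLite's way of reporting how a stored value is typed.
row = conn.execute(
    "SELECT id, label, value, typeof(value) FROM measurements"
).fetchone()
print(row)
```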
But there's not going to be that many of them, fortunately. Now how can I go about converting, for instance, some real data
like that from you, my favorites.csv file, into a proper relational database? Well, it turns out that using SQL I can
do this in VS Code on my own Mac or PC, or in the cloud here, by just importing the CSV into a database. We'll see
eventually how to do this manually for now I'm going to use more of an automated process so let me go over to
VS Code here. Let me type ls to see where we left off before. I had two files: favorites.csv, which I downloaded from
Google Spreadsheets (recall that I made a couple of changes; we deleted a couple of "the"s from the file for The Office, but this is
the same file as before), and then we have favorites.py, which we'll set aside for now. I'm going to go ahead now and run a
command, sqlite3. So in the world of relational databases there are many different products out there, many
different pieces of software that implement the SQL language. Microsoft has their own, there's something called MySQL
that's been very popular for years (Facebook, for instance, used it early on), PostgreSQL, Microsoft Access server,
Oracle, and maybe a whole bunch of other product names you might have encountered over time. Which is to say
there's many different types of tools and servers and software in which you can use SQL we're going to use a very
lightweight version of the SQL language today called SQLite. This is the version of SQL that's generally used on
iPhones and Android devices these days. If you download an app that stores data, like your own contacts, that typically is
stored using SQLite because it's fairly lightweight, but you can still store hundreds, thousands, even tens of
thousands of pieces of data even using this lightweight version thereof. sqlite3 is like version three of this
tool. We're going to go ahead and run sqlite3 with a file called favorites.db. It's conventional in the world of
SQLite to name your file something.db. I'm going to create a database called favorites.db. Once I'm inside of the
program now I'm going to go ahead and enter CSV mode again not something you have to memorize just something you can
look up as needed. And then I'm going to import favorites.csv into a table, that is, a sheet if you will, called
favorites as well. Now I'm going to hit enter, and I'm going to go ahead and exit the program altogether and type ls. Now
I have three files in my current directory: the CSV file, the Python file from before, and now favorites.db. But if
I did this right, all of the data you all typed into the CSV file has now been loaded into a proper database where I
can now use this SQL language to access it instead. So let's go ahead again and run sqlite3 of favorites.db,
which now exists, and now at the SQLite prompt I can start to play around and see what this data is. For instance, I
can look, by typing .schema, at what the schema is of my data: what's the design? Now, no thought was put into the design
of this data at the moment because I automated the whole process once we start creating our own databases we'll
give more thought to the data types and the columns that we have but we can see what SQL light presumed I wanted just by
importing the data by default what the import command did for me a moment ago is essentially the syntax it automated
the process of creating a table, if it doesn't exist, called favorites, and then notice in parentheses it gave me three
columns, Timestamp, title, and genres, which were inferred obviously from the CSV, all three of which have been decreed to be
text. Again, once we're more comfortable, we'll create our own tables and choose our own types and column names.
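What the automated import amounts to can be approximated in a few lines of Python. This is only a sketch under stated assumptions: SAMPLE_CSV is made-up data standing in for favorites.csv, the database lives in memory rather than in favorites.db, and Python's sqlite3 and csv modules are swapped in for the sqlite3 shell's own import machinery.

```python
import csv
import io
import sqlite3

# Hypothetical sample standing in for favorites.csv.
SAMPLE_CSV = """Timestamp,title,genres
10/19/2020 13:00:00,The Office,Comedy
10/19/2020 13:01:00,Game of Thrones,"Action, Adventure, Drama"
"""

# In the sqlite3 shell, the lecture does this with two dot-commands:
#   .mode csv
#   .import favorites.csv favorites
conn = sqlite3.connect(":memory:")  # the lecture uses a file, favorites.db

# Every column is TEXT because nothing told the importer otherwise.
conn.execute(
    'CREATE TABLE IF NOT EXISTS "favorites" ("Timestamp" TEXT, "title" TEXT, "genres" TEXT)'
)

# Each CSV row becomes a dict keyed by the header names, which the
# named placeholders (:Timestamp, :title, :genres) then pick up.
rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
conn.executemany(
    "INSERT INTO favorites VALUES (:Timestamp, :title, :genres)",
    rows,
)
conn.commit()

print(conn.execute("SELECT title FROM favorites").fetchall())
```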
But for now I just automated the whole process, just to get us started, by using this built-in import command as well. All
right so what now can I begin to do well if I wanted to for instance start playing around with data therein I might
execute a couple of different commands um one of which let me find the right one here one of which would
be select select being one of our most um versatile tools to select data from this database so if I have these three
columns here timestamp title and genres suppose I want to select all of the titles doing that earlier in Python
required importing the csv library, opening the file, creating a reader or a DictReader, iterating over every row,
adding every title to a dictionary or just printing it out, and dot dot dot, right? There was like a dozen or so lines
of code when we first began. Now how about this: SELECT title FROM favorites; done. So now with this
particular language the output is very textual, and it's simulating what it looks like if it were more graphical by
creating this table, so to speak. SELECT title FROM favorites is a distillation in a different language called SQL of
all the lines of code I wrote early on when we first started playing with favorites.py. SQL is therefore optimized
for reading and creating and updating and ultimately deleting data so here's a perhaps a better tool for the job once
you have the data tossing it into a more powerful versatile format might allow you now to get more work done more
quickly without having to reinvent the wheel someone else has figured out how to select data like this what more can I
do here well let me go ahead and pull up in a moment just a little bit of a cheat sheet here give me one second to find
this. So suppose I want to now select data a little more powerfully. Here's what I just did in a canonical way. SELECT
typically works like this: you select columns FROM a specific table, semicolon. Unfortunately, stupid semicolons are back.
SELECT columns FROM table, then, is the sort of generic form of what I just did. More specifically, I selected one column
called title from favorites (favorites is the name of the table), and the semicolon ends my thought. Suppose I wanted to get two
things, like the genres that each of you inputted. I could instead do SELECT title, genres FROM favorites and then a
semicolon and enter it's going to look a little ugly on my screen because some of these titles and some okay one of you
really went all out with Community um you can see that it's just wrapping in an ugly way but it's just now showing me
two columns if we scroll up to the very top again the left most of one Black Mirror went all out too thank you and
now okay we're going to have to clean some of these up Game of Thrones good comedy yes
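The two SELECT statements above can be tried on a throwaway database. A minimal sketch: the two rows are hypothetical stand-ins for the class's responses, and Python's sqlite3 module is used in place of the interactive sqlite3 prompt.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE favorites (Timestamp TEXT, title TEXT, genres TEXT)")
conn.executemany(
    "INSERT INTO favorites VALUES (?, ?, ?)",
    [
        # Made-up rows; the timestamps are placeholders.
        ("t1", "Black Mirror", "Drama, Sci-Fi, Thriller"),
        ("t2", "Game of Thrones", "Action, Adventure, Fantasy"),
    ],
)

# General form: SELECT columns FROM table;
titles = conn.execute("SELECT title FROM favorites").fetchall()

# Two columns: each result row is now a pair.
pairs = conn.execute("SELECT title, genres FROM favorites").fetchall()

for title, genres in pairs:
    print(f"{title}: {genres}")
```

Each result row comes back as a tuple, one element per selected column, which is the programmatic counterpart of the text table the prompt draws.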
um keep going keep going keep going so now we've selected two of The Columns that we care about there it is okay so
it's crazy wide because of all of those genres but it allows me to select exactly the data I want let's go back to
the titles though and perhaps start playing around with some modifiers here for instance it turns out using SQL
there's a lot of functionality built into the language. You've got a lot of functions similar to Excel or Google
Spreadsheets, where you can have formulas. SQL provides you with some of the same heuristics that allow you to apply
operations like these on entire columns. For instance, you can take averages, count the total, get the distinct values, force
things to lowercase or uppercase, take the min and max, and so forth. So let's try DISTINCT, for instance. Let me go back to my
terminal and let's say select how about the distinct titles from the favorites table enter I didn't bother selecting
the genres because I want it to be a little prettier and you can see here that we have just the distinct title
except for issues of formatting right so white space is going to be an issue again capitalization is going to be a
thing again so there's a tradeoff I mean one of the things I was doing in Python was forcing everything to uppercase and
then getting rid of whites space but we could combine some of these I could do something like force every title to
uppercase then get the distinct value and that's actually going to get rid of some of those values as well and again I
did it all in one simple line that was fast. So let me pull up at the bottom of the screen again: I selected DISTINCT
UPPER(title) FROM favorites, and that did everything for me at once in just one breath. Suppose I want to get the
total count of titles. How about SELECT COUNT of all of those titles FROM favorites, semicolon, enter,
and now you get back sort of a mini table that contains just your answer, 158 in this case. So that's the total number
of, not distinct, but total titles that we had in the file. And we could continue to manipulate the data further
using again functions like these here but there's also additional filtration we can do we can also qualify our
selections by saying where some condition is true so just as in scratch in C and python you have Boolean
expressions, you can have the same in SQL as well, where I can filter my data where something is true or false. LIKE
allows me to do approximations: if I want to get something that's like the office but not necessarily t-h-e space office, I
could do pattern matching using LIKE here. ORDER BY, LIMIT, and GROUP BY are other commands I can execute too. So let
me go back and do a couple of these here. How about, let me just get, oh I don't know, all of the titles from favorites but
LIMIT it to 10 results. That might be one thing that's helpful to see if you just care about some of the data at the top
there instead. How about SELECT all of the titles FROM favorites WHERE the title itself is LIKE quote unquote
office. This will give me only two answers; those are the two rows, recall, that I mutated by getting rid of the
word "the". Notice that LIKE allows me to tolerate uppercase and lowercase, because if I instead just used the equal sign
(and in SQL a single equal sign does in fact mean equality; it's for comparison's sake, it's not doing
assignment, this is not how you assign data in SQL), I got back no answers there. So indeed the equal sign is giving me
literal answers that searches just for what I typed in how could I get all of these well similar in spirit to regular
Expressions but not quite as powerful in SQL I could do something like this I can select the title from favorites where
the title is LIKE quote unquote office, but I can add, a bit weirdly, percent signs to the left and the right. So the
language SQL supports the same notion of pattern matching, but much more limited out of the box. If we want more
powerful regular expressions we probably do want to use Python instead. But the percent sign here means zero or more
characters on the left, zero or more characters on the right, so this will just grab any title that contains o-f-f-i-c-e
in it, in that order. And now I get all 16, it would seem, of those results. Again, how do I know it's 16? Well, I can
just get the count of those titles and get back that answer instead as well so again take some getting used to to the
library uh the vocabulary and sort of the syntax that you can use there's these building blocks and others but SQL
is really designed again for creating reading updating deleting data for instance um I've never really been a fan
of Friends, for instance. So right now if I do SELECT title FROM favorites WHERE title LIKE quote unquote
friends, with the percent signs, we can see that there's a whole bunch of them. How many exactly? Let's just do a
quick count: that's nine of them. Well, DELETE FROM favorites... okay, you and me, you...
DELETE FROM favorites WHERE title LIKE friends, enter. Nothing seems to happen, but bye-bye Friends. So now we've
actually, thank you, so now we've actually changed the data, and this is what's compelling about
a database a proper database yes you could technically write python code that not only reads the CSV file but also
writes it like you can change using quote unquote a for append or quote unquote W for write instead of quote
unquote R for read alone but it's definitely a little more involved to do that in Python but with SQL you can
update the data in real time and if I were actually running like a web application here or a database for a
mobile app that change theoretically would be reflected everywhere on your own devices if you're somehow talking to
this application so that's sort of the direction we're headed um this other thing has been bothering me so select uh
how about title from favorites where uh title equals what was it the V office was it yeah it was that
one. How about we UPDATE favorites by setting title equal to The Office WHERE title equals quote unquote the V office,
semicolon. And now if I select the same thing again (I can go up and down with my arrow keys quickly), now there is no the V
office; we've changed that value. How about genres? SELECT genres FROM favorites WHERE title equals Game
of Thrones semicolon these were kind of long and you know I don't really agree with all of that so how about we update
favorites set genres equal to I mean sure action adventure sure drama okay so it's a decent list fantasy sure Thriller
War okay anything really but comedy I would say let's go ahead and hit enter now and now if I select genres again
same query now we've sort of canonicalized that we've thrown data away so whether or not that is right is
probably a bit subjective and argumentative but I have at least cleaned up my data which is again the U
in CRUD: create, read, update, delete. You can do it that easily. Beware using DELETE; beware, worse, using DROP, whereby
you can drop an entire table. But via these kinds of commands we can actually now manipulate our data much more
rapidly and with single thoughts and in fact if you're an aspiring statistician or data scientist or analyst in the real
world I mean SQL is such a commonly used language because it allows you to really dive into Data quickly and ask questions
of the data and get back answers quite quickly and this is a simple data set you can do this with much larger data
sets, as we soon will too. Any questions on what we've seen of SQL thus far? We've only scratched the surface, but again
it boils down to creating, reading, updating, and deleting data. Questions here? All right, well let's
consider the design of this data. Recall that if I do .schema, that shows me the design of my table, the so-called schema
of my data. This is okay, it gets the job done, and frankly everything the user typed in was arguably text, including the
timestamp, which is the date and time. So the data set itself is somewhat simple, but if we look at the data set
itself, especially genres, let's do this: SELECT genres FROM favorites. And let me point out one other thing
stylistically too: I am very deliberately capitalizing all of the special SQL keywords and I'm lowercasing all of the
column names and the table names. This is a convention, and honestly it just helps you read, I think, the code
when you're commingling your names for columns and tables with proper SQL keywords. But I could just as easily do
select genres from favorites, but again the SQL-specific keywords don't quite jump out as much. So stylistically we
would recommend this, SELECT genres FROM favorites, semicolon. So here is where
oh, okay, that was not intended. I accidentally made every show, including The Office, about action, adventure,
drama, fantasy, thriller, and war. How did I do that accidentally? What did I do wrong, do you think?
You updated all? Yeah. So beware, it was funny, I think I did say beware around this time. So the
SQL database took me literally: I updated favorites, setting genres equal to that, semicolon, end of thought. I really wanted to say
WHERE title equals quote unquote Game of Thrones. Unfortunately there isn't an undo command or time machine with a SQL
database, so the best we can do here is let's actually get rid of favorites.db. Let's run sqlite3 of favorites.db
again, which now will be recreated. Let me change myself into CSV mode, let me import into my favorites table the
CSV file, and now Friends is back, for better or for worse, but so are all of our genres. So if
I now reload the file and do SELECT genres FROM favorites, that was
the result I was getting. It's much messier, but that's because some of these are quite long. But now we're back
to the original data. Lesson here: be sure to back up your work. All right, so what more can we now do with this data?
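The reads, the delete, and both versions of the update from this part of the lecture can be replayed safely on an in-memory database. This is a sketch under assumptions: the five rows in ROWS are a hypothetical miniature of the favorites table, fresh_db and scalar are made-up helper names, and Python's sqlite3 module stands in for the interactive prompt.

```python
import sqlite3

# Hypothetical miniature of the favorites table; the real one has 158 rows.
ROWS = [
    ("The Office", "Comedy"),
    ("the office", "Comedy, Drama"),
    ("Office", "Comedy"),
    ("Friends", "Comedy"),
    ("Game of Thrones", "Action, Adventure, Comedy, Drama"),
]


def fresh_db():
    """Return a new in-memory database so each experiment starts from the same data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE favorites (title TEXT, genres TEXT)")
    conn.executemany("INSERT INTO favorites VALUES (?, ?)", ROWS)
    return conn


def scalar(conn, sql, params=()):
    """Run a query that yields a single value and return that value."""
    return conn.execute(sql, params).fetchone()[0]


conn = fresh_db()

# Read: aggregate functions and filters.
total = scalar(conn, "SELECT COUNT(title) FROM favorites")
distinct_upper = scalar(conn, "SELECT COUNT(DISTINCT UPPER(title)) FROM favorites")
equals_office = scalar(conn, "SELECT COUNT(title) FROM favorites WHERE title = 'office'")
like_office = scalar(conn, "SELECT COUNT(title) FROM favorites WHERE title LIKE 'office'")
contains_office = scalar(conn, "SELECT COUNT(title) FROM favorites WHERE title LIKE '%office%'")

# Delete: only rows matching the WHERE clause are removed.
deleted = conn.execute("DELETE FROM favorites WHERE title LIKE '%friends%'").rowcount

# Update with WHERE: exactly one row changes.
new_genres = "Action, Adventure, Drama, Fantasy, Thriller, War"
updated_one = conn.execute(
    "UPDATE favorites SET genres = ? WHERE title = ?",
    (new_genres, "Game of Thrones"),
).rowcount

# Update without WHERE, on a fresh copy: every row changes, which is the accident above.
conn2 = fresh_db()
updated_all = conn2.execute("UPDATE favorites SET genres = ?", (new_genres,)).rowcount

print(total, distinct_upper, equals_office, like_office, contains_office)
print(deleted, updated_one, updated_all)
```

Checking rowcount after an UPDATE or DELETE is one cheap way to notice that a statement touched far more rows than intended; it does not replace keeping a backup of the data.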
Well, I don't love the design of the genres column for a couple of reasons. One, I mean, we didn't have any sort of
validation, but user input's going to be messy. There's just a lot of redundancy in here, right? Like, let's go ahead and do
this: let me select all the comedies you all typed in. So SELECT title FROM favorites WHERE genres equals
quote unquote Comedy. Okay, so there's all of the shows that are explicitly comedies, but I think there might
actually be others let me scroll back up here comedy drama what was a comedy and a drama how about let's search for the
oops let me copy paste comedy comma drama okay so the office in this case was considered comedy and drama billions
It's Always Sunny in Philadelphia and Gilmore Girls as well but notice that I get many more when I just search for
comedy so the catch here is that because I have all of these genres implemented the way Google did as a comma separated
list it's actually really hard and messy to get at any show all of the shows that are somewhere described as comedy right
because if I search for quote unquote comedy the only answers I'm going to get are this one Whatever that show is this
one Whatever that show is this one but I'm not going to get this one I'm not not going to get this one why if I'm
searching for WHERE genres equals quote unquote Comedy, why am I missing those other
shows? Yeah, exactly, it's not just a comedy, it's
a comedy and a drama, or a comedy and a news show, and so forth. So I have to search for these commas, and this gets
messy quickly. Let me copy this so I can do this: let me search for WHERE genres equals Comedy, how about
OR genres equals Comedy, Drama, OR genres equals this whole thing, Comedy, News, Talk Show. I'm going to get more and more
results, but that's not going to scale well. What could I do instead of enumerating with ORs all of the
different permutations of genres, do you think? Yeah, so I could use the keyword,
similar in Python to the word in, I could use the LIKE keyword, so that so long as the genres is like Comedy somewhere in
there that's going to give me all of them so long as the word comedy is in there but let me go ahead and just open
the form from earlier uh the form had let me see if I can open this real quick before I toggle
over if we look back at the form recall that there were all of those radio buttons asking for the specific genres
into which something fell and if I open this let me f screen Here and Now open the original form you'll see all of the
genres here none of which are that worrisome except for a corner case is jumping out at me where might the like
keyword alone get me into trouble it's not with comedy I'm okay with comedy but yeah music and musical are
deliberately on the list here because one there are separate genres but if I just search for something that's like
music I'm going to accidentally suck in all of the musicals which might not be what I intend if music is like a music
video or whatever and musical is actually a different type of show I don't want to just do that so it seems
just very messy like I could probably hack something together with maybe add a commas in there or something like this
but this is just not a good design for the data Google has done it this way because it's just simple to actually
keep the user's data all in a single column and just as they did separated by commas but this is a real m y way to use
CSV is by putting comma separated values in your comma separated values arguably the folks at Google probably just did
this because it's just simpler and they didn't want to give people multiple sheets or complicate things using some
other weirder character than commas alone but I bet there's a better way for us to do this and let me go ahead and do
this let me go back into my code here and in just a moment I'm going to grab a program that I wrote in advance that's
going to use Python to open up the CSV file iterate over all of the rows and load the data into two tables this time
two tables one called shows and one called genres so it's to actually separate these two things out give me
just a moment to grab the code and when I run this I'll only have to run it once let me go ahead and run python in a
moment and I'll reveal the results in a sec uh this is going to be version eight of the code online when I do this let me
go ahead and open up this file give me a second to move it into this directory version eight okay so here we
have version eight of this that's available online that's going to do the following and I'll gloss over some of
the details just so that we don't get stuck in the weeds of some of this code. I'm going to be using at the top of
this program, as we'll soon see, the cs50 library, not for the sake of get_string or get_int or get_float but because
there's some built-in SQL functionality that we didn't discuss a couple of weeks back with the cs50 library itself. But
inside of the cs50 library we'll see there is a special function called SQL that gives you the ability, using this
weird URL-looking thing, technically called a URI, to open a file called favorites.db. And long story
short, all of the subsequent code is going to iterate over this favorites.csv file that we downloaded, and it's
going to import it into the SQLite database, but it's going to use two tables instead of just one. So give me
just a moment to run this and then I'll reveal the actual results. This is going to be run on favorites.csv,
and taking a look here, give me just a
moment oh uhuh give me a sec come on come on this program should not be
taking this long sorry let's open this real fast whoops not that file
okay let me just skim this code real quick to see where we've gone wrong with favor reader reader title show ID insert
into shows AOW and genres split execute all right this is me debugging
in real time all those times we encourage you to use print this is me actually using
print we'll see how quickly I can recover from this python of favorites version 8
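The program being debugged here is not shown in full, but its described behavior (read each CSV row, insert the uppercased title into shows, split the genres on commas, insert each genre into genres) can be sketched as follows. Several things are assumptions rather than the lecture's code: Python's built-in sqlite3 module replaces the cs50 library's SQL function, the function name load and the column names id, show_id, and genre are guesses at the spoken names, SAMPLE_CSV is made-up data, and the database is kept in memory.

```python
import csv
import io
import sqlite3

# Hypothetical sample standing in for favorites.csv.
SAMPLE_CSV = """Timestamp,title,genres
10/19/2020 13:00:00,How I Met Your Mother,Comedy
10/19/2020 13:01:00,the office ,"Comedy, Drama"
10/19/2020 13:02:00,Game of Thrones,"Action, Adventure, Drama, Fantasy"
"""


def load(csv_file, conn):
    """Copy each CSV row into shows, and each of its genres into its own genres row."""
    conn.execute("CREATE TABLE shows (id INTEGER, title TEXT NOT NULL, PRIMARY KEY(id))")
    conn.execute(
        "CREATE TABLE genres (show_id INTEGER, genre TEXT NOT NULL, "
        "FOREIGN KEY(show_id) REFERENCES shows(id))"
    )
    for row in csv.DictReader(csv_file):
        title = row["title"].strip().upper()
        # SQLite assigns the next integer id; lastrowid reports which one it chose.
        show_id = conn.execute(
            "INSERT INTO shows (title) VALUES (?)", (title,)
        ).lastrowid
        # Google Forms joins the checked boxes with ", ", so split on that.
        for genre in row["genres"].split(", "):
            conn.execute(
                "INSERT INTO genres (show_id, genre) VALUES (?, ?)",
                (show_id, genre),
            )
    conn.commit()


conn = sqlite3.connect(":memory:")  # the lecture writes to a .db file instead
load(io.StringIO(SAMPLE_CSV), conn)
print(conn.execute("SELECT * FROM shows").fetchall())
print(conn.execute("SELECT * FROM genres").fetchall())
```

As in the lecture, duplicates are not filtered out: every submitted row gets its own id, and only whitespace and capitalization are normalized.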
okay so here's me debugging in real time it's printing it oh maybe I just didn't wait long enough okay so here we go what
I'm doing is printing out the dictionary that represents each row that you all typed in and we're actually making
progress. All right, I was just too impatient and didn't wait long enough. So in a moment, there we go. All
right, so all you have to do sometimes is wait. Let me go ahead now and open this file using sqlite3. So in sqlite3
I now have a different version of favorites.db (I named it number eight for consistency). Once I've run the
program I can do .schema to look inside of it, and here's what the two tables in this database are going to look like.
I've created a table called shows this time to represent all the TV shows that are favorites that has two columns one
is called ID, one is called title. But now I'm going to start taking out for a spin some of the other features of SQL.
Besides there being a data type called text, it turns out there's a data type called integer.
There's also a special key phrase, NOT NULL, with which you can specify that the title can never be null. Think back to our use of
NULL in C; think back to the keyword None in Python. This is a database constraint that allows you to ensure
that none of you can have a favorite TV show without a title: if you submit the form, you have to have typed in a title for it to
end up in our database here. And you'll notice one other new feature: it turns out on this table I'm defining what's
called a primary key, specifically to be the ID column. More on that in just a moment. Meanwhile the second table my
code has created for me, as we'll soon see, gives me a column called show ID and then a genre, the value of which is text
that can also not be null. And then, more on this in a moment, this table has what we're going to call a foreign key,
specifically the show ID column that references the shows table's ID. So before we get into the weeds of this, this is now a way
of creating the relation in relational database. If I have two tables now, not just one, they can somehow be linked
together by a common column. In other words, the shows table is going to give me a table with two
columns, an ID and a title; every title you gave me I'm going to assign a unique value. The genres table meanwhile is
going to associate individual genres, singular, with that same ID. And the result of this, to
pop back to the terminal here, is, let's do this: SELECT star FROM shows of this new
database and you'll see that I've given indeed all of the shows you all typed in unique identifiers I didn't filter out
duplicates or do anything beyond just forcing everything to uppercase so there's going to be some duplicates here
because I didn't want to get rid of anyone's data but you'll see that indeed I've given everyone a unique identifier
from the very first person who typed How I Met Your Mother all the way down to input number
158 meanwhile if I do select star from genres which is now a table not just a column in the original data now you'll
see a much better design for this data notice what I've done here let me go all the way to the top and you'll see two
columns one of which is called show ID the other of which is called genre and again I wrote some code to do
this because I had to take Google's messy output where everything was separated by commas I had to tear
away the commas and then put each genre into this table by itself even though we haven't introduced the syntax via which
we can reconstitute the data and reassociate your genres with your titles why at a glance might this be a better
design now even though I've doubled the number of tables from one to two why is this probably on the direction toward a
better design what might your instincts be why is this cleaner again first time
with SQL why is it better perhaps that we've done this with our genres table can I come to you why might this be
better yep oh just because we had the conversation before about the commas exactly it's as simple as that
we've cleaned up the data by giving every genre every word in the genres column in the original Google
spreadsheet its own cell in this table if you will and now notice show ID might appear multiple times whoever typed in
How I Met Your Mother they only Associated one genre with it and so we see that show ID one is a comedy but
whoever typed in I forget the name of the second show offhand but that person whoever was assigned show ID 2 checked
off a whole bunch of the genre boxes that happened again with show IDs 3 and 4 while persons 5 6 and 7 only checked one
box and so you can see now that we've associated the data with what we might call a one to many relationship a one to
many relationship whereby for every one show in the shows table it can now have many genres associated with it each of
which is represented by a separate row here so again if I go ahead and select star from shows
let's limit it to the first 10 just to focus on a subset of the data How I Met Your Mother The Sopranos was the second
input there it would seem that now that I've created the data in this way I could ideally
somehow search the data but a little more correctly I don't have to worry about the commas I don't have to worry
about the hackish approach of Music being a substring of musical but how can I actually get back at this data well
let's go ahead and do this suppose I did want to get back maybe all of the comedies all of the comedies no matter
whether the person checked just the comedy box or multiple boxes instead how now given that I have two tables could I
go about selecting only the titles of comedies like I've actually made the problem a little harder but again SQL
is going to give me a solution for this the problem is that if I want to search for comedies I have to check the genres
table first and then what's that going to give me like if I search the genres table for comedies what's that
going to give me back potentially yeah show ID maybe show ID so let me try that let me do select
show ID from genres where the genre in a given row equals quote unquote comedy no commas no like no percent signs cuz
literally that column now is singular words like comedy or drama or the like let me go ahead and hit enter here okay
so I got back a whole bunch of ID numbers now this could very quickly get annoying it looks like show ID 1 2 4 5 6
7 9 and so forth are all comedies so I could do something really crazy like select title from shows where ID equals
1 or ID equals 2 or ID I mean this is not going to scale very well but this is why SQL is especially powerful you can
actually compose one SQL question from multiple ones so let's do this why don't I select the title where the ID of the
show is in the following list of IDs select show ID from genres where the specific genre is quote unquote comedy
so I've got two SQL queries one is deliberately nested inside of parentheses that's going to give me back
that whole list of show IDs but that's exactly what I want to then look up the titles for by selecting title from shows
where the ID of the show is in that big tall list and so now if I hit enter I get back only those shows that
were somehow flagged as comedy whether you in the audience checked one box for comedy two boxes or all of the boxes
somehow we teased out comedy again just by using that python script which loaded this data not into one big table but
instead two and if we want to clean this up let's do a couple of things let's outside of the parentheses do order by
title this is a way of sorting the data in SQL very easily now we have a whole list of the same titles that are now
sorted and what was the keyword with which I could filter out duplicates yeah distinct so let's try
this same query but let's select only the distinct titles from that whole query and notice I've very deliberately
done it this way and to this day anytime I'm using SQL I don't just start at the beginning and type out my whole thought
and just get it right on the first try I very commonly start with the subquery if you will the thing in parentheses just
to get myself one step toward what I care about then I add to it then I add to it then I add to it just like we've
encouraged in Python and C taking baby steps in order to get to the answer you actually care about like this one now
and other than this mistake which um we didn't fix because I reimported the data after accidentally changing everyone's
genre we now have an alphabetized list of all of the same data but now it's better designed because we have it split
across these two tables oh thank you we're okay just thanks what questions do we have if any
here questions on this approach yeah
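The baby-steps approach to the nested comedy query can be sketched in Python. This is a minimal sketch that uses the standard library's sqlite3 module rather than the course's own tooling, and the handful of rows below are invented sample data standing in for the class's submissions; only the table and column names (shows, genres, id, title, show_id, genre) come from the discussion above.

```python
import sqlite3

# In-memory stand-in for the two-table favorites database (sample rows are invented).
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE shows (id INTEGER, title TEXT NOT NULL, PRIMARY KEY(id));
    CREATE TABLE genres (
        show_id INTEGER,
        genre TEXT NOT NULL,
        FOREIGN KEY(show_id) REFERENCES shows(id)
    );
    INSERT INTO shows (id, title) VALUES
        (1, 'HOW I MET YOUR MOTHER'),
        (2, 'THE SOPRANOS'),
        (3, 'THE OFFICE'),
        (4, 'THE OFFICE');
    INSERT INTO genres (show_id, genre) VALUES
        (1, 'Comedy'),
        (2, 'Crime'), (2, 'Drama'),
        (3, 'Comedy'),
        (4, 'Comedy'), (4, 'Drama');
""")

# Step 1: run the inner query on its own to see which show IDs are comedies.
comedy_ids = [row[0] for row in db.execute(
    "SELECT show_id FROM genres WHERE genre = 'Comedy'"
)]

# Step 2: nest that same query in parentheses, then add DISTINCT and ORDER BY
# to drop duplicate titles and alphabetize the result.
comedy_titles = [row[0] for row in db.execute("""
    SELECT DISTINCT title FROM shows
    WHERE id IN (SELECT show_id FROM genres WHERE genre = 'Comedy')
    ORDER BY title
""")]

print(comedy_ids)
print(comedy_titles)
```

Because each genre sits in its own row, an exact comparison with the equals sign is enough here; no commas, LIKE, or percent signs are needed.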
oh now that we have a database how do we transfer it to a CSV there are ways to do that and in fact there's a command
within SQLite that allows you to export your data back to a CSV file if you want to email it to someone and you
want them to be able to open it in Excel or Google Spreadsheets or Apple Numbers or the like you can go in the other
direction generally though once you're in the world of SQL you're probably storing your data there long term and
you're probably updating it maybe deleting it adding to it and so forth for instance the one command I did not
show earlier is suppose someone forgot a show let's see did I see this in the output all right so Curb
Your Enthusiasm saw that last night it was just yeah did anyone see it last night all right well just the one person
that checked that box so you and me uh what's another show that didn't make the list how about uh Seinfeld is now on
Netflix apparently so insert into uh shows uh what do we want to insert well
we want to insert maybe an ID and a title but you know I don't actually care what the ID is so I'm just going to
insert a title and the value I'm going to give to that title is going to be quote unquote Seinfeld and then let me
go ahead and hit semicolon nothing seems to happen but let me rerun the big query from before looking for comedies and
unfortunately Seinfeld has not yet been flagged as a comedy so let's get this right too what intuitively I'm going to
have to do to associate now Seinfeld with my comedies I just inserted into the shows
table what more needs to happen before we can flag Seinfeld as a comedy say
again yeah so I need to insert into the genres table two things now a show ID like this and then the name of the genre
which presumably is comedy what values do I want to insert well the show ID I better grab that oh I don't even know
what it is I'm going to have to figure out what that is so I could do this in a couple of ways let me do it as a one-time
thing select star from shows where title equals quote unquote Seinfeld semicolon 159 so now I could do insert into genres
a show ID and a genre name the values 159 and quote unquote comedy semicolon enter and now if I scroll back in my
history and execute that really big query again looking for all the comedies now Seinfeld has made the list but I did
this manually so I didn't actually capitalize it let's clean that up let's do update shows
set title equals quote unquote SEINFELD in all caps semicolon no okay thank you where title equals quote unquote Seinfeld let's not
make that mistake again enter and now if I execute that really big query now Seinfeld is indeed considered a
comedy so where are we going with this well thus far we've been doing all this pretty manually and this is absolutely
what an analyst a data scientist type person might do if just manipulating a pretty large data set just to get at
interesting answers that might be across one two or even many more tables eventually in a few weeks we're going to
start to automate all of this by writing code in Python that generates SQL to do this right if you go to most any website
on the internet today and you for instance log in odds are you're typing a username and password clicking submit
what's then happening well the website might not be implemented in Python but it's probably implemented in some
language python JavaScript Java Ruby something else and that language is probably using something like a
relational database to use SQL to get your username get your password and compare the two against what you've
typed in and actually it's hopefully not getting your actual password but something called the hash thereof but
there's probably a database involved doing that when you buy something on amazon.com and you click check out odds
are there's some code on Amazon's server that's looking at what you added to your shopping cart and then maybe
using a for loop of some sort in Python or another language it's doing a whole bunch of SQL inserts to store in
their database what it is you bought there's other types of databases too but SQL databases or relational databases
are quite popular so let's go ahead and write one other program here in Python that now merges these two languages
together whereby I'm going to use SQL inside of a Python program so I can Implement my sort of logic of my program
in Python step by step line by line but when I want to get at some data I can actually talk to a SQL database so let
me go ahead and open favorites.py here and let me go ahead and throw away some of what we did earlier
and really just now add SQL to the mix from the CS50 library let's import the SQL function this will be useful to
use because most third-party libraries that deal with SQL and Python are more complicated than they need to be so I
think you'll find this Library easier to use let's then do the following create a variable called DB for database I could
call it anything I want let's use that URI which is a fancy way of saying something that looks like a URL but that
actually opens up a database um on disk that is in the current folder let's now ask the user for a title by
prompting them for a quote unquote title like this and let's strip off any whitespace just so that the data is not messy
and then let's go ahead and do this and this is the new logic I'm going to go ahead now and write a line of code that
uses Python to talk to the original favorites.db so again I'm not using the two table database which is in favorites8.db
I'm using the original that we imported from your own data and I'm going to do the following I'm going to
use db. execute to execute a SQL command inside of python I'm going to select the count
of shows from the favorites table where the title the user
typed in is like this question mark and why I'm doing that is this just like in C when we had percent s in SQL
for now the analog is going to be a question mark So same idea different syntax instead of percent s it's just a
question mark and using a comma outside of this first string using CS50's execute function I can pass in a SQL
string a command then any arguments I want to plug into the question marks therein so the goal at hand is to
actually write a program that's going to search favorites.csv AKA favorites.db for the total number of people that
liked a particular show so this is going to select the count of people from the favorites table where the title they
typed in is like whatever the user has just now typed in this db.execute function returns a list it returns a
list of rows and you would only know that by my telling you or reading the documentation and therefore if I want to
get back the total count I'm going to go ahead and grab the first row from those rows because it's only going to give me
back the count and then I'm going to go ahead and print out that Row's first value but it's going to be a little
weird technically the column is going to be called count star quote unquote which is a little weird let me add one more
feature to the mix you can actually give nicknames to columns that are coming back especially if they are the result
of functions like this I can just call that column counter in all lower case that means I can now say get back the
counter key inside of this dictionary so just to recap what have we done we've imported the CS50 library's SQL function
we've with this line of code opened the favorites.db file that you and I created earlier by importing your CSV
into SQLite I'm now just asking the user for a title they want to search for I'm now executing this SQL query on that
database plugging in whatever the human typed in as their title in order to get back a total count and I'm giving the
count a nickname an alias of counter just so it's more self-explanatory this function db.execute
no matter what always returns a list of rows even if there's only one row inside of it so this line of code
just gives me the first and only row and then this goes inside of that row which it turns out is a dictionary and gives
me the key counter and the value it corresponds to so what to be clear is this doing let's go ahead and run this
manually in my terminal window first let me run sqlite3 on favorites.db let me
import the data again so dot mode csv dot import from favorites.csv into a favorites table so I've just
recreated the same data set that you all gave me earlier in favorites.db if I were to do this manually let's search
for the office again select count star from favorites where title like and let's just manually type it in for now
the office we'll search for the one with the word the semicolon I get back 12 but technically notice what I
get back I technically get back a miniature table containing one column and one row
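The counting program described above relies on the CS50 library's SQL function. As a rough, hedged equivalent that runs with nothing but Python's standard library, the sketch below swaps in the built-in sqlite3 module; the in-memory table and its four rows are invented stand-ins for favorites.db, and count_title is a hypothetical helper name. The question mark placeholder and the counter alias work the same way as described.

```python
import sqlite3

# In-memory stand-in for favorites.db (the rows are invented sample data).
db = sqlite3.connect(":memory:")
db.row_factory = sqlite3.Row  # lets each row be indexed by column name, like a dictionary
db.execute("CREATE TABLE favorites (title TEXT)")
db.executemany(
    "INSERT INTO favorites (title) VALUES (?)",
    [("The Office",), ("the office",), ("ThE OffiCE",), ("Friends",)],
)


def count_title(db, title):
    """Return how many rows have a title LIKE the given one (hypothetical helper)."""
    # The ? is a placeholder; the value is passed separately, never pasted into the string.
    rows = db.execute(
        "SELECT COUNT(*) AS counter FROM favorites WHERE title LIKE ?",
        (title.strip(),),
    ).fetchall()
    # The result is always a list of rows, even when there is only one row;
    # take the first row and look up the aliased column.
    return rows[0]["counter"]


print(count_title(db, "  the office "))
```

Without the alias, the only column in the result would be named after the function call itself, which is awkward to use as a key.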
what if I want to rename that column that's where the as keyword comes in so select count Star as counter notice what
happens enter I just get back same simple table but I've renamed the column to be counter just because it's a little
more self-explanatory as to what it is so what am I doing with this line of code this line of code is returning to
me that miniature temporary table in the form of a list of dictionaries the list contains one row
as we'll see and it contains one column as we'll see the column the key for which is counter so let's now run the
code itself I'm going to get out of sqlite3 and I'm going to run python of favorites.py enter I'm being
prompted for a title I'm going to type in the office and cross my fingers and there's that 12 why is it 12 well
there's a typo again because I reimported the CSV I had deleted two of the th so we're back at the original
data set so there's 12 total that have uh quote unquote the office in the title like that so what have we
done we've combined some python with some SQL but we've relegated all of the complexity of searching for something
the selecting of something gotten rid of all of the with keyword the open keyword the for Loop the reader the dict reader
and all of that and it's just one line of SQL now sort of using the best of both worlds are there any questions on what
we've just done here or how any of this works any questions here yeah when does this function return more
than one row well let me was that the question yeah so let's do that by changing the problem at hand this
program was designed just to select the total count let's go ahead and select for
instance all of the ways you all typed in the office by selecting the title this time if I do this in
sqlite3 whoops if I do this in sqlite3 let me go ahead and do this again
after increasing my terminal window let's do it manually select title from favorites where the title is like quote
unquote uh the office semicolon I get back all of these different rows and we didn't even notice this one there's
actually another little typo in there with some capitalization of the E and the C and the E uh that would be an
example of a query that gives me back therefore multiple rows so let's now change my Python program if I now in my
python program do this I get back a whole bunch of rows containing all of those titles I can now do for Row in
rows I can print out the current row's title and now manipulate all of those things together let me keep both on the
screen let me run python of favorites.py and that for loop now should iterate what 10 or more times once for each of
those titles and indeed if I type in the office again enter oops an error on title what did I do wrong oh I
should not be renaming title to counter this time so that's just a dumb mistake on my part let me rerun it again and now
I should see after typing in the office enter a whole bunch of the offices and because I'm using like even the
miscapitalizations are coming through because like is case insensitive doesn't matter if it's uppercase or lowercase
whereas had I used the equal sign I would get back only the ones capitalized correctly all right any questions on
this next all right so let's transition to a larger juicier data set and consider some of the issues that arise
when actually now using SQL and skating toward a world in which we're using SQL for mobile apps web apps and generally
speaking very large data sets so let's start with a larger data set just like that give me just a moment to switch
screens over to uh what we have for you today which is an actual relational database that we've created out of a
real world data set from IMDb so the Internet Movie Database imdb.com is a website where you can search for TV shows and
movies and actors and so forth all using their database behind the scenes IMDb wonderfully makes their data set
available as not CSV files but TSV files tab separated values and so what we did is before class we downloaded those TSV
files we wrote a Python program similar to my favorites8.py file earlier that read in all of those TSV files created
some SQL tables in an IMDb database for you in SQLite that has multiple tables and multiple columns so
let's go and wrap our minds around what's actually in this data set let me go back to vs code here and in just a
moment I'm going to go ahead and copy the file which we've named shows.db and I'm going to go ahead and increase my
terminal and do sqlite3 of shows.db whenever playing around with a SQLite database for the first time
typing dot schema is perhaps a good place to start to give you a sense of what's in there and things just escalated quickly
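The dot schema command only exists inside the sqlite3 shell. From Python, a rough equivalent is to read the CREATE statements that SQLite keeps in its built-in sqlite_master table. The sketch below is an assumption-laden stand-in: it builds two small placeholder tables in memory instead of opening the real shows.db, and schema_statements is a hypothetical helper name.

```python
import sqlite3

# Placeholder database; with the real file this would be sqlite3.connect("shows.db").
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE shows (id INTEGER, title TEXT NOT NULL, PRIMARY KEY(id));
    CREATE TABLE genres (show_id INTEGER, genre TEXT NOT NULL);
""")


def schema_statements(db):
    """Return the stored CREATE statements, similar in spirit to .schema (hypothetical helper)."""
    rows = db.execute(
        "SELECT sql FROM sqlite_master WHERE sql IS NOT NULL ORDER BY name"
    )
    return [row[0] for row in rows]


for statement in schema_statements(db):
    print(statement + ";")
```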
like there's a lot in this data set because indeed there's going to be tens or hundreds of thousands of rows in this
data set and also problem set seven where we'll look at the movie side of things and not just the TV shows so what
is the schema that we have created for you from IMDB's actual real world data one there's a table called shows and
notice we've just added whitespace by hitting enter a bunch of times to make it a little more stylistically readable
the shows table has an ID column a title column a year and the total number of episodes for a given show and the types
of those columns are integer text numeric and integer so it turns out there's actually a few different data
types that are worth being aware of when it comes to creating tables themselves in fact in SQLite there's five data
types and only five fortunately one of which is indeed integer negative or positive numeric which is kind of a
catchall for dates and times things that are kind of numeric but are not just integers and not just real numbers for
instance real number is what we've generally thought of as float up until now text of course is just text but
notice that you don't have to worry about how big it is like in Python it will size to fit and then there's blob
which is binary large object which is for just like raw zeros and ones like for files or things like that but we'll
generally use the other four of these and so indeed when we imported this data for you we decided that every show would
be given an ID which is just an integer every show has of course a title which should not be null otherwise why is it
in the database every show has a year which is numeric according to that definition a moment ago and the total
number of episodes for a show is going to be an integer what now is with these primary keys that we mentioned earlier
too a primary key is the column that uniquely identifies all of the data in our case with the favorites I
automatically gave each of your submissions a unique ID so that even if two or more of you typed in the office
your submission still had a unique identifier a number that allowed me to then correlate it with your genres just
as we saw a moment ago in this version of IMDb's data there's also genres but they don't come from us they come from
imdb.com and so a genre has a show ID and a genre just like our database but these are real world genres with a bit
more filtration notice though just like my version there's a foreign key a foreign key is the appearance of another
table's primary key in its own table so when you have a table like genres which is somehow cross referencing the
original shows table if shows have a primary key called ID and those same numbers appear in the genres table under
the column called show ID by definition show ID is a foreign key it's the same numbers but it's foreign in the sense
that the number is being used in this table even though it's officially defined
primarily in this other table this is what we mean by relational databases you have multiple tables with some column in
common numbers typically and those numbers allow you to line the two tables up in such a way that you can reconnect
the shows with their genres just like we did with our smaller data set a moment ago this logic is extended further
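The primary key and foreign key ideas can be made concrete with a small sketch. It uses Python's built-in sqlite3 module and an in-memory database; the column name episodes and the sample row are assumptions for illustration, and rejected is a hypothetical helper. One detail worth knowing is that SQLite only enforces foreign keys when the foreign_keys pragma is turned on for the connection.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when asked to
db.executescript("""
    CREATE TABLE shows (
        id INTEGER,
        title TEXT NOT NULL,
        year NUMERIC,
        episodes INTEGER,          -- column name assumed for illustration
        PRIMARY KEY(id)
    );
    CREATE TABLE genres (
        show_id INTEGER NOT NULL,
        genre TEXT NOT NULL,
        FOREIGN KEY(show_id) REFERENCES shows(id)
    );
""")

# No id is supplied: the integer primary key is assigned automatically.
cursor = db.execute(
    "INSERT INTO shows (title, year, episodes) VALUES ('The Office', 2005, 188)"
)
office_id = cursor.lastrowid

# The same number reappears in genres.show_id, which is what makes it a foreign key.
db.execute("INSERT INTO genres (show_id, genre) VALUES (?, 'Comedy')", (office_id,))


def rejected(sql, params=()):
    """Return True if the database refuses the statement (hypothetical helper)."""
    try:
        db.execute(sql, params)
    except sqlite3.IntegrityError:
        return True
    return False


# NOT NULL: a show without a title is refused.
null_title_rejected = rejected("INSERT INTO shows (title) VALUES (NULL)")
# FOREIGN KEY: a genre pointing at a show ID that does not exist is refused.
orphan_genre_rejected = rejected(
    "INSERT INTO genres (show_id, genre) VALUES (?, 'Drama')", (999,)
)

print(office_id, null_title_rejected, orphan_genre_rejected)
```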
notice that the IMDb database we've created for you has a stars table like TV show stars
the actors therein and that table interestingly has no mention of people and no mention of shows per se it only
has a column called show ID which is an integer and a person ID which is an integer meanwhile if we scroll down to
the bottom you will see a table called people and we have decided in IMDb's world that every
person in the TV show world will have a unique identifier that's a number
a name that's text a birth date which is numeric and then again specifying that ID is going to be their
primary key so what's going on here well it turns out that TV stars and writers are both types of people so
using this relational database notice the road we're going down we're sort of factoring out
commonalities and if a person can be different things in life well we're defining them first and foremost as
people and then notice these two tables are almost the same the Stars table has a
show ID which is a number and a person ID which is a number which allows us via this middleman table if you will to
link people with TV shows similarly the writers table allows us to connect shows with people too by just recording
those numbers so if we go into this data set let's do the following let's do select star from people semicolon it's a
huge amount of data is coming back right this is hundreds of thousands of rows now based on the ID numbers alone so
this is real world data now flying across the screen there's a lot of people in the TV show business not just
actors and writers but others as well it's still going there's a lot of data there so my God like if you had to do
anything manually in this data set it's probably not going to work out very well and actually we're up to what a million
people in this data set plus which would mean this probably isn't even going to open very well in Excel or Google
Spreadsheets or Apple Numbers SQL probably is the better approach here let's search for someone specific like
select star from people where name equals like Steve Carell for instance sticking with comedies all right so
there's Steve Carell he is person number 13679 born in 1962 and that's as much data as we have on Steve Carell here how
do we figure out what shows for instance he's in well let's see select star from shows semicolon there's a crazy number
of shows out there in the IMDb database and you can see it here again flying across the screen feels like we're going to
have to employ some techniques in order to get at all of Steve Carell's shows so how are we going to do that
well God this is a lot of data here and in fact yeah we have what uh 15 million shows Plus in this data set too so doing
things efficiently is now going to start to matter so let's actually do this let me select a specific show select star
from shows where title equals quote unquote the office and there presumably shouldn't be typos in this data because
it comes from the real website imdb.com let's get back the show turns out there's been a lot of the offices out in the
world the one that started in 2005 is the one that we want presumably the most popular with 188 episodes how
can we get just that maybe we could do like and year equals uh how about 2005 all right so now we've got back just the
ID of the office that we care about and let's do this too let me turn on a timer within SQLite just to get a sense of
running time now let me do that again select star from shows where title equals the office and year equals 2005
and let's keep it simple let's just do titles for now enter all right so not terribly long it found it pretty fast
but it looks like it took how much real time 0.02 seconds not bad for just a title but just to plant a seed it turns out
that we can probably speed even this up let me do this let me create something called an index which is another use of
the C in crud for creating something and I'm going to call this like title index and I'm going to create it on the shows
table uh specifically on the title column and we'll see in a moment what this is going to do for me enter took a
moment like 0.349 seconds to create something called an index but now watch if I select star from shows searching
for the office again previously it took me 0.021 seconds not bad but now wow like literally no time at all or
so low that it wasn't really measurable and I'll do it again just to get a sense of things still quite low now even
though 0.021 seconds not crazy long imagine now having a lot of data a lot of users running a real website a real
mobile app every millisecond we can start to shave off is going to be compelling so what is it we just did
well we actually just created something called an index and this is a nice way to tie in now some of our week five
discussion of data structures and our week three discussion of running times an index in a database is some kind of
fancy data structure that allows the database to do better than linear search I mean literally as you just saw these
tables are crazy long or tall right now very linear that is and so when I first searched for the office it was literally
doing linear search top to bottom looking at as many as like what a million plus rows that's relatively slow
I mean it's not that slow 0.021 seconds but that's relatively slow just theoretically algorithmically doing anything linearly
but if you instead create an index using syntax like this which I just did creating an index on the title column of
the shows table that's like giving the database a clue in advance saying hey I know I'm going to search on this column
in this table a lot do something with data structures to speed things up and so if you think back to our discussion
of data structures maybe it's using a tree maybe it's using a trie or a hash table some fancier two-dimensional data
structure is generally going to lift the data up creating right maybe a tree structure so it's just much faster to
find data especially if it's sorting it now based on title and not just store it in one long list and in fact in the
world of relational databases the type of structure that's often used in a database is something called a B-tree
it's not a binary tree different use of the letter B but it looks a little something like the trees we've
seen it's not binary because some of the nodes might have more than two children or fewer but it's a very wide but
relatively shallow tree it's not very tall and the upside of that is that if your data is stored in this tree the
database can find it more quickly and the reason it took like half a second a third of a second to build the index is
because SQLite needed to take some nonzero amount of time to just build up this tree in memory and it has
algorithms for doing so based on like alphabetization or other techniques but you spend a bit of time up front a third
of a second and then thereafter wow like every subsequent query if I keep doing it again and again is going to be crazy
low 0.0 maybe 0.001 but an order of magnitude a factor of 10 or 100 faster than it previously was earlier so we
have these indexes which allow us to get at data faster but what if we want to actually get data that's now across
these multiple tables how can we do that and how might these indices or indexes help further Well turns out there is a
way that we've seen already indirectly to join two tables together previously when I selected the ID of the office and
then I searched for it in the other table using select and a nested query I was kind of joining two tables together
and it turns out there's a couple of ways to do this let's go ahead now and for instance find all of like Steve
Carell's TV shows not just the office but all of them too unfortunately if we look at our schema shows up here has no
mention of the TV stars in them and people have no mention of shows we
somehow need to use this table here to connect the two and this is called a join table in the sense that using two
integer columns it kind of joins the two tables together logically and so if you're kind of Savvy enough with SQL you
can kind of do what I did with my hands earlier and like recombine Tables by using these common IDs these integers
together so let me do this let me go ahead and figure out step by step Steve Carell's shows so how am I going to do
this well if I select star from people where name equals Steve Carell fortunately there's only one of them so
this gives me back his name his ID and his birth year but it's really only his ID that I care
about why because in order to get back his shows I need to link person ID with show ID right so I need to know his ID
number so what could I do with this well remember the schema and the Stars table I've just gotten from the people table
Steve Carell's ID I bet by transitivity I could now use his person ID his ID to get back all of his show IDs and then
once I've got all of his show IDs I can take it one step further and get back all of his shows titles so the answer is
actually English words and not just random seemingly integers so let me go ahead and do this let me again get Steve
Carell's ID number but not star since star represents everything it's a wildcard character in SQL let me just select
the ID of Steve Carell and that gives me back 13679 and it's only giving me back one
value the thing called ID is just the column heading up above now suppose I want to select all of the show IDs that
Steve Carell is affiliated with let me select show ID from Stars where the person ID in Stars happens to equal
Steve Carell's ID so again I'm sort of building up my answer in reverse and taking these baby steps on the right in
parenthesis I'm getting Steve Carell's ID on the left I am now selecting all of the show IDs that have some connection
with that person ID in the stars table this answer too is not going to be that illuminating it's just a whole bunch of
integers that have no meaning to me the human but let's take this one step further and even though my code is
getting long I could hit enter and kind of format it nicely especially if I were doing this in a code file but I'm
just doing it interactively for now let's now select all of the titles from the shows table where the ID of the show
is in the result of this previous query so again the query is getting long but notice it's the third and last step
Select Title from the shows table where the ID of the show is in the list of all of the show IDs that came back from the
Stars table searching for Steve Carell's person ID how did we get that person ID let me scroll to the end well I selected
in my innermost parentheses Steve Carell's own ID so now when I hit enter voila I get all of Steve Carell's TV
shows up until now and if I want to tidy this up further I can use the same tricks as before order by title
semicolon now I've got it all alphabetized as before so again with SQL comes the
ability to search I mean look how quickly we did this 0.094 seconds to search across three different tables to
get back this answer but my data is now all kind of neatly designed in individual tables which is going to be
important now that the data set is so large but but let me take this one step further let me go ahead and do this let
me go ahead and point out that with this query notice that I'm searching on
uh let's say I'm searching on a person ID here and at the end here I'm searching on a name column here so let
me actually go ahead and do this let me go ahead and see if we can't speed this up this
query at the moment takes 0.092 seconds let's see if we can't speed this up further by just quickly creating a few
more of those B-trees in the database's memory create an index called person index and I'm going to do this on the
Stars table uh on the person ID column enter it's taking a moment taking a moment that's almost a full second
because that's a big table let's create another index called show index on the Stars table why because I want to search
by the show ID also that was part of my big query takes a moment okay about two-thirds of a second
now let's create one last one another index called name index but I could call these things anything I want on the
people table why because I'm also searching on the name column so in short I'm creating indexes on each of the
columns that are somehow involved in my search query going from one table to the other now let's go back to the previous
query which recall took whoops I think I erased it about 0.091 seconds all right well it was roughly this
order of magnitude we're not seeing the data now but let me go ahead and run my original big query once and boom we're
down to almost nothing so again creating these indexes in memory has the effect of rapidly speeding up our computation
time now if you've ever used for instance the my.harvard course shopping tool here on campus or Yale's
analog you might wonder like why is the thing so slow this could be one of the reasons why large data sets with
thousands of rows thousands of courses tend to be slow if and I'm only conjecturing if the database isn't
properly indexed if you're building your own web application and you're finding that users are waiting and waiting and
things are spinning and spinning what might be among the problems well it could absolutely just be bad algorithms and
bad code that you wrote or it might be that you haven't thought about well what columns should be optimized for searches
and filtration like I've done here in order to speed up subsequent queries again from the outside in we can only
conjecture But ultimately um this is just one of the things that explains performance problems as well all right
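The nested query and the three indexes described above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration under stated assumptions, not the lecture's actual database: the column names show_id and person_id, the index names, the ID numbers and the three sample rows are all made up for the sketch, and on a table this small the indexes make no measurable difference. The point is only that the answer is the same before and after indexing.

```python
import sqlite3

# Tiny in-memory stand-in for the people / shows / stars schema.
# All IDs and rows here are illustrative, not the real data set.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, birth INTEGER);
    CREATE TABLE shows (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE stars (show_id INTEGER, person_id INTEGER);
    INSERT INTO people VALUES (1, 'Steve Carell', 1962), (2, 'Someone Else', 1970);
    INSERT INTO shows VALUES (10, 'The Office'), (11, 'Space Force'), (12, 'Unrelated Show');
    INSERT INTO stars VALUES (10, 1), (11, 1), (12, 2);
""")

# Innermost query: name -> person ID.
# Middle query: person ID -> show IDs.
# Outer query: show IDs -> titles.
NESTED = """
    SELECT title FROM shows WHERE id IN
        (SELECT show_id FROM stars WHERE person_id =
            (SELECT id FROM people WHERE name = ?))
    ORDER BY title
"""

def titles(name):
    """Return the alphabetized show titles for one person."""
    return [row[0] for row in db.execute(NESTED, (name,))]

before = titles("Steve Carell")

# One index per column that the query searches on.
db.executescript("""
    CREATE INDEX person_index ON stars (person_id);
    CREATE INDEX show_index ON stars (show_id);
    CREATE INDEX name_index ON people (name);
""")

after = titles("Steve Carell")
print(before)
print(before == after)
```

Indexes change how the database finds rows, not which rows it finds, so the two result lists match.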
let's point out just a couple of final syntactic things and then we'll consider bigger picture some problems that might
arise in this world if these nested queries start to get a little much there are other ways just so you've
seen it that you can execute similar logic in SQL for instance if I know in advance that I want to connect Steve
Carell to his show IDs and to their titles we can do something more like this Select Title from the people table
joined with the stars table on people.id equals stars.person_id so what am I doing new syntax and again
this is not something you'll have to memorize or ingrain right away but just so you've seen other approaches Select
Title from people join stars this is an explicit way to say take the people table in one hand the stars table in
the other hand and somehow join them as I keep doing with my fingertips here how to join them join them so that the
ID column in the people table lines up with the person ID in the stars table but that's not quite everything I
could also say join further on the shows table where stars.show_id equals shows.id so what am
I doing here that's saying go further and join the stars table with the shows table
joining the show ID column with the ID column again this starts to get a little messy to think about but now I
can just say name equals quote unquote Steve Carell I can do in one query what previously took me three nested queries
and get back the same answers and I can still add in my order by title to get back the results and if I do this a
little more uh neatly let me type this out a little differently let me type this out by adding a new line ah can't
do that here I'm going to leave it alone for now we can type it on multiple lines in
other contexts and let me do one last thing do I want to show that I'm going to show it but this is not something you
should ingrain just yet either Select Title from people stars and shows if you know in advance that you want to do
something with all three tables you can just enumerate them one table name after the other and then you can say where
people.id equals stars.person_id and now I'm hitting enter so that it formats a little more readably on my screen and
stars.show_id equals shows.id and lastly name equals Steve Carell in short you specify that you want to select data
from all three of these tables and then you tell the database how to combine foreign keys with primary keys that is
the columns that have those integers in common if I hit enter now I get these same exact results even more so if
I also add in an order by title oops uh that's why I didn't want to do this earlier I'd have to hit uh I'd have to
go back through my history multiple times to actually get back the multi-line query this time all right
that was a lot all at once but this is only to say that even as we sort of make the design of the data more
sophisticated and we put some of it over here some of it over here some of it over here so as to avoid duplication of
data weird hacks like putting commas in the data we can still get back all of the answers that we might want across
these several tables and using indexes we can significantly speed up these processes so as to handle 10 times as
many a 100 times as many users on the same actual database there is going to be a downside and thinking back to our
discussion of algorithms and data structures in past weeks what might be a downside of creating these indexes
because as of now I created four separate indexes on the name column the title column and some other columns too
like why wouldn't I just go ahead and index everything if it's clearly speeding things up memory so space
anytime you're starting to benefit time-wise in computer science odds are you're sacrificing space or vice versa
and probably indexing absolutely everything is a little dumb because you're going to create you're going to
waste way more space than you might actually need so figuring out where the right inflection point is is part of the
process of Designing and just getting better at these things now unfortunately a whole lot of things can
go wrong in this world and they continue to in the real world with people using SQL databases in fact here on out if
you're reading something technical about SQL databases and websites being hacked in some form and passwords leaking out
unfortunately all too often it is because of what are called SQL injection attacks and just to give you a sense now
to counterbalance maybe any of the enthusiasm for like woo that was neat how we can do things so quickly with
great power comes responsibility in this world too and so many people introduce bugs into their code by not quite
appreciating how it is that data is getting into your application so what do I mean by that
here for instance is a typical login screen for Yale and here's the analog for Harvard where you're prompted like
every day probably for your username and your password your email address and your password Here suppose though that
behind this login page whether Harvard's or Yale's there's some website and that website is using SQL underneath the hood
to store all of the Harvard or Yale people's usernames passwords ID numbers courses transcript all of that stuff so
there's a SQL database underneath the website well what might go wrong with this process unfortunately there's some
special syntax in SQL just like there is in C and Python for instance there are comments in SQL too if you do two hyphens
dash dash that's a comment in SQL and if you the programmer aren't sufficiently distrustful of your users such that you
defend against potentially adversarial attacks you might do something like this suppose that I somewhat maliciously or
curiously log in by typing my username malan@harvard.edu and then maybe a single quote and a dash dash why
because I'm trying to suss out if there is a vulnerability here to a SQL injection attack do not do this in
general but if I were the owner of the website trying to see if I've made any mistake I might try using potentially
dangerous characters in my input dangerous how because single quote is used for quoting things in SQL as we've
seen single quotes or double quotes dash dash I claim now is used for commenting but let's now imagine what the code
underneath the hood might be for something like uh Yale's login or Harvard's login what if it's code that
looks like this so let me read it from left to right suppose that they are using something like cs50's own execute
function and they've got some SQL typed into the website that says select star from users where username equals this
and password equals that and they're plugging in username and password so what am I doing here well when the user
types their username password hits enter I probably want to select that user from my database to see if the username and
passwords match so the underlying SQL might be select star from users where username equals question mark and
password equals question mark users is the table one column is username one column is password all right and if we
get back one row presumably malan@harvard.edu exists with that password we should let him proceed from there on out
so that's like some pseudo code if you will for the scenario what if though uh this code is not as well written as it
currently is and isn't using question marks so the question mark syntax is a fairly common SQL thing where the
question marks are used as placeholders just like percent s was in printf but this function db.execute from CS50's
library and third party libraries as well is also doing some good stuff with these question marks and defending
against the following attack suppose that you were not using a third-party library like ours and you were just
manually constructing your SQL queries like this you were to do something like this instead using an F string in Python
you're comfortable with format strings now you've gotten into the habit of using curly braces and plugging in
values suppose that you the aspiring programmer are just using techniques that you've been taught so you have an F
string with select star from users where username equals quote unquote username in curly braces and password equals
quote unquote password in curly braces right like as of L what two weeks ago this was perfectly legitimate technique
in Python to plug in values into a string but notice if you are using single quotes yourself
and the user has typed in single quotes to their input what could go wrong here like where are we going with this if
you're just blindly plugging user input into your own prepared string of text [Music]
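To make the question concrete, here is a minimal sketch of the two ways of building the login query just described: question-mark placeholders versus an f-string. It uses Python's built-in sqlite3 module rather than CS50's library (sqlite3 also supports ? placeholders), and the users table, the stored password, the email address and the function names login_unsafe and login_safe are all hypothetical, invented for the illustration. Only run something like this against a database you own.

```python
import sqlite3

# Hypothetical users table with a single made-up account.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (username TEXT, password TEXT)")
db.execute(
    "INSERT INTO users VALUES (?, ?)",
    ("malan@harvard.edu", "correct-password"),
)

def login_unsafe(username, password):
    """Builds the SQL text by pasting user input between single quotes."""
    query = (
        f"SELECT * FROM users WHERE username = '{username}' "
        f"AND password = '{password}'"
    )
    return db.execute(query).fetchall()

def login_safe(username, password):
    """Passes user input separately so it is treated as data, never as SQL."""
    query = "SELECT * FROM users WHERE username = ? AND password = ?"
    return db.execute(query, (username, password)).fetchall()

# A username ending in a single quote and two dashes: the quote closes the
# programmer's own quote, and the dashes comment out the password check.
attack = "malan@harvard.edu'--"

unsafe_rows = login_unsafe(attack, "wrong-password")
safe_rows = login_safe(attack, "wrong-password")

print(len(unsafe_rows))  # one row comes back even though the password is wrong
print(len(safe_rows))    # no row: the whole string is compared as a username
```

With placeholders, the library keeps the query text and the user's values apart, so a stray quote in the input cannot end the programmer's quote.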
yeah yeah worst case they could insert what is actually SQL code into your database as follows generally speaking
if you're using special syntax like single quotes to surround the user input you'd better hope that they don't have
an apostrophe in their name or you'd better hope that they don't type a single quote as well because what if
their single quote finishes your single quote instead and then the rest of this is somehow ignored well let's consider
how this might happen let me go ahead in here it's got a little blurry here but let me plug in here wow that looks awful
let me fix the red just change this to White so it's more readable what happens if the user
does this instead they type in like I did into the screenshot malan@harvard.edu single quote
dash dash what has just happened logically even though we've only just begun with SQL today well select star
from users where username equals malan@harvard.edu end quote what's bad about the rest of
this dash dash I claim means a comment which means my color coding is going to be a little blurry again but everything after
the dash dash is just ignored the logic then of this SQL query is to just say select malan@harvard.edu from the
database not even checking the password anymore therefore you will get back at least one row so length of rows will
equal one and so presumably the rest of the pseudocode logs the user in gives them access to my my.harvard account or
whatever it is and they've pretended to be me simply by using a single quote and a Dash Dash in the username field again
please don't go start doing this later today on Harvard Yale or other websites but it could be as simple as that why
because the programmer practiced what they were taught which was just to use curly braces to plug values into f-strings
but if you don't understand how the user's input is going to be used and if you don't distrust your users
fundamentally for every good person out there there's going to be unfortunately some adversary who just wants to try to
find fault in your code or hack into your data set this is what's known as a SQL injection attack because the user
can type something that happens to be or look like SQL and trick your database into doing something it didn't intend to
like for instance uh logging the user in worst case they could even do something else maybe the user types a semicolon
then the word drop or the word update you could imagine doing semicolon update table grades where name equals Malan and set
the grade equal to a instead of b or something like that the ability to inject SQL into the database means you
can do anything you want with the data set either constructively or Worse destructively all right and now just a
quick little cartoon that should now make [Music]
sense okay to like one of us two of us awkwardly somewhat funny all right so let's move on to one last condition
there's one other problem that can go awry here oh and I should explain this so this is an allusion to the son Robert
having typed in semicolon the word drop table students and doing some of the same technique this is sort of humor
that only CS people would understand because it's the mom realizing oh her son's doing a SQL injection attack onto
the database less funny when you explain it but once you notice the syntax that's all this is an allusion to all
right so one final threat now that you are graduating to the world of proper databases and away from CSV files alone
things can go wrong when using databases and honestly even using CSV files if you have multiple users and thus far you and
I have had the luxury in almost every program we've written that it's just me using my code it's just you using your
code and even if your teaching fellow or ta is using it probably not at the same time but the world gets interesting if
you start putting your uh code on phones on websites such that now you might have two users literally trying to log in at
the same time literally clicking a button at the same or nearly the same time what happens then if a computer is
trying to hand over requests from two different people at once as might happen all the time on a website you might get
what are called race conditions and this is a problem in Computing in general not just with SQL not just with python
really just anytime you have shared data like a database as follows this apparently is one of the most uh liked
uh Instagram posts ever it is literally just a picture of an egg has anyone clicked on this egg like a couple okay
wow all right so yes so go search for this photo if you'd like to add to the likes on Instagram the account is world record
egg um this is just a screenshot of Instagram of that picture of an egg if you're in the habit of using Instagram
or like any social media site there's some equivalent of a like button or a heart button these days and that's
actually a really hard problem such a simple idea to like count the number of likes something has but that means
someone has to click on it your code has to detect The Click your code has to update the database and then do it again
and again even if multiple people are perhaps right now clicking on that same egg and
unfortunately bad things can happen if two people try to do something like this at the same time on a computer how might
this happen so here's some more code sort of half pseudocode half Python code here as follows suppose that what
happens when you literally right now maybe click on the like button on the Instagram post suppose that code like
the following is executed on Facebook servers db. execute of Select likes from Posts where ID equals question mark all
right so what am I assuming here I'm assuming that that photograph has a unique ID it's like some big integer whatever
it was randomly assigned I'm assuming that when you click on the heart the unique ID is somehow sent to Instagram
server so that their code can call it ID and I'm assuming that uh Instagram is using a SQL database and selecting from
a post table the current number of likes of that egg for that given ID number why cuz I need to know how many likes it
already has if I want to add one to it and then update the database right I need to select the data then I need to
update the data here all right so in some python code here let's store in a variable called likes whatever comes
back in the first row from the likes column again this is new syntax specific to our library but a common way of
getting back first row and the column called likes therein so at this point in the story likes is storing the total
number of likes in the millions or whatever it is of that particular egg then I do this execute update posts set
the number of likes equal to this value where the ID of the post equals this value what do I want to update the likes
to whatever likes currently is plus one and then plugging in the ID so a simple idea right I'm checking the value of the
likes and maybe it's 10 I'm changing 10 to 11 and then updating the table but a problem can arise if two people have
clicked on that egg at roughly the same time or literally the same time why is that well in the world of databases and
servers and the Instagrams of the world have thousands of physical servers nowadays so they can support millions
billions even of users nowadays what can go wrong well typically code like this is not what we'll call Atomic to be
Atomic means that it all executes together or not at all rather code typically is executed as you might
imagine line by line and if your code is running on a server that multiple people have access to which is absolutely the
case for an app like Instagram if you and I click on the heart at roughly the same time for efficiency the computer
the server owned by Instagram might execute this line of code for me then it might execute this line of code for you
then this line of code for me then this line of code for you then this line of code for me then this line of code for
you that is to say our queries kind of might get intermingled uh chronologically because it' be a little
obnoxious if when you're using Instagram I'm blocked out while you're interacting with the site it'd be a lot nicer for
efficiency and fairness if somehow they do a little bit of work for me a little bit of work for you and back and forth
and back and forth equitably on the server so that's what typically happens by default these lines of code get
executed independently and they can happen in alternating order with other users you can get them sort of combined
like this same same order top to bottom but other things might happen in between so suppose that the number of likes at
the very beginning was like 10 and suppose that Carter and I both click on that egg at roughly the same time and
suppose this line of code gets executed for me and that gives me a value in likes ultimately of 10 suppose then that
the computer takes a break from dealing with my request does the same code for Carter and gets back what value for the
current number of likes also 10 for Carter because mine has not been recorded yet at this point in the story
somewhere in the computer's memory there's a likes variable for me storing 10 and there's a likes variable
storing 10 for Carter then this line of code executes for me it updates the database to be likes plus one which
stores 11 in the database then Carter's code is executed updating the same Row in the database
to 11 unfortunately because his value of likes happened to be the same value of mine and so the metaphor here that if we
had a refrigerator on stage we would actually act out is something that was taught to me years ago in an operating
systems class whereby the kind of most similar analog in the real world would be if like you've got like a mini fridge
in your dorm room and one of you and your roommates comes home opens the fridge and realizes
like oh we're out of milk was how the story went in my day so you close the refrigerator and you walk across the
street go to CVS and get in line to buy some milk Meanwhile your roommate comes home they too inspect the state of your
refrigerator AKA a variable open the door and realize oh we're out of milk I'll go get more milk close the fridge
go across the street and head to maybe a different store or the line is long enough that you don't see each other at
the store so long story short you both eventually get home open the door and damn it now there's milk from your other
roommate there because you both made a decision on this based on the state of a variable that you
independently examined and you didn't somehow communicate now in the real world this is absolutely solvable like
how would you fix this or avoid this problem in the real world literally own roommate own
fridge perfect let them know so somehow communicate and in fact the terminology here would be multiple threads can
somehow intercommunicate by having shared State like the iMessage thread on your phone you could leave a note you
could more dramatically lock the refrigerator somehow thereby making the milk purchasing process Atomic the
fundamental problem is that for efficiency again computers tend to intermingle logic that needs to happen
when it's happening across multiple users just for fairness' sake for scheduling sake you need to make sure
that all three of these lines of code execute for me and then for Carter and then for you if you want to ensure that
this count is correct and for years when social media was first getting off the ground this was a super hard problem
Twitter used to go down all of the time and tweets and retweets were a thing that were similarly happening with a
very high frequency these are hard problems to solve and thankfully there are solutions and we won't get into the
weeds of how you might use these things but know that there are solutions in the form of things called locks which I use
that word deliberately with the fridge software locks can allow you to protect a variable so no one else can look
at it until you're done with it um there are things called transactions which allow you to do the equivalent of
sending a message to or really locking out your roommate from accessing that same variable too but for slightly less
amount of time there are solutions to these problems so for instance in Python the same code now in green might look a
little something like this when you know that something has to happen all at once altogether you first begin a transaction
and you do your thing and then you commit the transaction at the very end here too though there's going to be
a downside typically the more you use transactions in this way potentially the higher the probability
is that you're going to box someone out or make Carter's request a little slower why because we can't interact at the
same time or you might make his request fail if he tries to update something that's already been updated so you
generally want to have as few lines of code together in between these transactions so that you get in and you
get out and you go to CVS and you get back really fast so as to not cause these kinds of performance problems
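The lost like and the transaction-based fix described above can be sketched as follows, using Python's built-in sqlite3 module instead of CS50's library. Everything here is an assumption for illustration: the posts table, the post ID 1, the starting count of 10 and the helper names are hypothetical, and the interleaving is written out by hand in a single thread so that the outcome is repeatable. On a real server the two requests would run concurrently, and the transaction is what keeps the read and the write together; in this single-connection sketch it only shows the shape of the code.

```python
import sqlite3

# isolation_level=None puts sqlite3 in autocommit mode, so the explicit
# BEGIN TRANSACTION / COMMIT statements below are the only transactions.
db = sqlite3.connect(":memory:", isolation_level=None)
db.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, likes INTEGER)")
db.execute("INSERT INTO posts VALUES (1, 10)")

def read_likes(post_id):
    """Step 1 of a like: look up the current count."""
    row = db.execute("SELECT likes FROM posts WHERE id = ?", (post_id,)).fetchone()
    return row[0]

def write_likes(post_id, likes):
    """Step 2 of a like: store the new count."""
    db.execute("UPDATE posts SET likes = ? WHERE id = ?", (likes, post_id))

# Two requests interleaved line by line, as in the story above.
mine = read_likes(1)            # I see 10
carters = read_likes(1)         # Carter also sees 10
write_likes(1, mine + 1)        # I store 11
write_likes(1, carters + 1)     # Carter stores 11 too, so one like is lost
lost_update_total = read_likes(1)

# Reset the count, then treat each like as one all-or-nothing unit.
write_likes(1, 10)

def like_atomically(post_id):
    """Read and update inside a single transaction."""
    db.execute("BEGIN TRANSACTION")
    likes = read_likes(post_id)
    write_likes(post_id, likes + 1)
    db.execute("COMMIT")

like_atomically(1)
like_atomically(1)
atomic_total = read_likes(1)

print(lost_update_total)
print(atomic_total)
```

Keeping only the read and the update between BEGIN TRANSACTION and COMMIT follows the advice above: the less work inside the transaction, the less time anyone else is kept waiting.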
things indeed escalated quickly today the original goal was just to solve problems using a different language more
effectively than python but as soon as you have these more powerful techniques a whole new set of problems arises takes
practice to get comfortable with but ultimately this is all leading us toward the introduction next week of web
programming with HTML CSS and some JavaScript the week after bringing Python and SQL back into the mix so that
by term's end we've really now used all of these different languages for what they're best at and over the next few
weeks the goal is to make sure you're understanding and comfortable with what things each of these things is good and
bad for let's go ahead and wrap here I'll Stick Around for questions we'll see you next time
[Music] all right this is cs50 and this is already week eight and if we think back
to like the past several weeks now recall that things started pretty interestingly pretty interactively in
like week zero when we were using Scratch because with Scratch we had a GUI a graphical user interface so even
as we explored variables and loops and conditionals and all of that you had kind of a fun environment in which to
express those ideas and then in week one we sort of took a lot of that away when we introduced c and a terminal window
and a command line because now all of your programs became very textual uh very keyboard-based and gone was the
mouse the animation the menus and so forth and so now fast forward to week eight we're going to bring those kinds
of user interface UI elements back in the form of web programming and this goes beyond just laying out websites
this will this week and next week combine elements of the backend server stuff that we've been doing for the past
several weeks using python using SQL and now introducing a couple of other languages on the so-called client side
on your own Mac your own PC your own phone that's going to talk to those backend services so indeed at this
end of CS50 everything rather comes together into a user interface that's just super familiar all of us are
on our phones desktops laptops every day and increasingly even the mobile apps that you all are using are implemented
not necessarily in languages like Swift or Java if you're familiar with those but with languages called HTML CSS and
JavaScript which we'll focus on here today but before we do that let's kind of provide a foundation on which these
apps can run because indeed we'll start to look underneath the hood of how the internet itself works albeit quickly so
that we have kind of a mental model for where all of this code is running how you can troubleshoot issues and how
really ultimately after cs50 you can learn by just poking around other actual websites so the internet we're all on it
literally right now what is it in your own words what is the
Internet it's this utility nowadays that we all rather take for granted how would you describe
it okay big storage and indeed that's how the cloud is described which is kind of an abstraction if you will for a whole
lot of wires and cables and hardware and the internet other formulations of the term how
else okay a bunch of data that we can all reach by way of being interconnected somehow with wires or wirelessly and so
really the internet too is a hardware thing there's a whole lot of servers out there that are somehow interconnected
via physical cables via internet service providers via Wireless connectivity and the like and once you start to have
networks of networks of networks do you get the internet indeed Harvard has its own network and Yale has its own network
and your own home probably has its own network but once you start connecting those networks do you get the
interconnected Network that is the internet as we now know it so there's this whole alphabet soup that goes with
the internet some of whose acronyms and terms you've probably seen before but let's at least peel back some of those
layers and consider what some of the building blocks are so here's a picture of the internet before it was known as
the internet back in 1969 when it was something called ARPANET from the Advanced Research Projects Agency and
the intent originally was just to interconnect a few universities here in Utah and California literally servers or
computers on each of those uh in each of those areas somehow interconnected with wire so that people could start to share
data a year later it expanded to include MIT and Harvard and others and now fast forward to today you have a huge number
of systems around the world that are on this same network and in fact if I just pull up a web page here that's sort of
constantly changing a visualization of the internet as it might now be today this here in the abstract all of these
lines and interconnections represent just how interconnected the world is today and it just means that there's all
the more servers all the more cabling all the more hardware giving us this underlying infrastructure but if we
focus really on just these nodes these individual dots whether back in 1970 or now in 2021 each of these dots you can
think of as yes a server but a certain type of server namely a router and a router as the name implies just
routes data left to right top to bottom from one point to another and so there's all these servers here on campus at
Harvard on Yale's campus in Comcast's network Verizon's network your own home network you have your own routers out
there whose purpose in life is to take in data and then decide should I send it this way or this way or this way so to
speak assuming there are multiple options with multiple cables you and your home probably have just one cable
coming in or going out but certainly if you're a place like Harvard or Yale or Comcast or the like there's probably
a whole bunch of interconnections that the data can then travel across ultimately so how do we
get data among these routers for instance if you want to send an email to someone at Stanford in California from
here on the East Coast or if you want to visit www.stanford.edu how does your laptop
your phone your desktop actually get data from point A to point B well essentially your laptop or phone knows
when it boots up at the beginning of the day what the local router is what the address of that local router is so if
you want to send an email from like my laptop over here my laptop's essentially going to hand it to the nearest Harvard
router and then from there I don't know I don't care how it gets the rest of the distance but hopefully within some small
number of steps later Harvard's router is going to send it to maybe Boston's router is going to send it to
California's router is going to send it to Stanford's router until finally it reaches Stanford's email server and we
can depict this actually a bit playfully thankfully the course's staff kindly volunteered to create a
visualization for this using a familiar technology so here we have some of our TFs and TAs and CAs present and
past let me go ahead and full screen this window here give me just a moment to pull it up on my screen here and
we'll consider what happens if we want to send a a packet of information from one person or router namely Phyllis in
this case in the bottom right hand corner up to Brian in this case in the top left-hand corner so each of the staff
members here represents exactly one of these routers on the internet
the applause is appreciated it actually took us a significant number of attempts
to get that ultimately right so what was it the staff were all passing here here we have just physically what
it was the staff were passing around so Phyllis started with an envelope inside of which was that email presumably on
the east coast and she wanted to send it to Brian on the west coast top left-hand corner and so she had all of these
different options different connections between her and point B namely Brian she could go up down in her case and
then each of those subsequent routers could go up down left or right until it finally reaches Brian and long story
short there's algorithms that figure out how you decide to send a packet up down left or right so to speak but they do so
by taking an input and the form of that input is this envelope and there's at least a couple of things on the outside
of this because all of these routers and in turn all of our Macs and PCs and phones these days speak something called
TCP/IP a set of acronyms you've probably seen somewhere on your phone your Mac or PC in print somewhere which refers to
two protocols two conventions that computers use to intercommunicate these days now what's a protocol a protocol is
like a set of rules for how you behave in healthier times I might extend my hand and someone like Carter might extend his
hand thereby interacting with me based on a human protocol of like literally physically shaking hands nowadays we
have mask protocols whereby what you need to do is wear a mask indoors but that too is just a set of rules that
we all follow and adhere to that's somewhere standardized and documented so computers use protocols all the time to
govern how they are sending information and receiving information and TCP and IP are two such protocols that standardize this
as follows what TCP/IP tells someone like Phyllis to do if she wants to send an email to Brian is put the email
in a virtual envelope so to speak but on the outside of that virtual envelope put Brian's unique address and I'll describe
this as destination on the middle of the envelope just like in our human world you would write the destination address
on the envelope and then she's going to put her own source address in the top left hand corner just like you
the sender would put your own source address in the human world but instead of these addresses being like something
Kirkland Street Cambridge Massachusetts 02138 USA you probably know that computers on the internet have unique
addresses of their own known as IP addresses and an IP address is just a numeric identifier on the internet that
allows computers like Phyllis and Brian to address these envelopes to and from each other and you've probably seen the
format at some point typically the format of IP addresses is something dot something dot something dot something each
of those somethings represented here with the hash symbol is a number from 0 through
255 and based on that little hint if each of these hashes represents a number from 0 to 255 each of those hashes is
represented with how many bytes or bits eight bits or one byte which is to say we can extrapolate from there an IP
address must use 32 bits or four bytes if we return now to some of the primitives we looked at in week zero and what that
means is at least At a Glance it looks like we have 4 billion some odd IP addresses available to us now
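The arithmetic just described can be checked in a few lines of Python. This is a minimal sketch rather than anything shown in the lecture: the address 192.0.2.7 is a made-up example from a range reserved for documentation, and `to_32_bits` is a hypothetical helper name.

```python
import ipaddress

# Each of the four "somethings" is a number from 0 through 255, i.e. one byte.
BITS_PER_PART = 8
PARTS = 4

total_bits = BITS_PER_PART * PARTS      # 32 bits, or four bytes
total_addresses = 2 ** total_bits       # the "4 billion some odd" addresses


def to_32_bits(dotted: str) -> int:
    """Pack a dotted address like '192.0.2.7' into one 32-bit integer."""
    value = 0
    for part in dotted.split("."):
        number = int(part)
        if not 0 <= number <= 255:
            raise ValueError(f"{part} does not fit in one byte")
        value = (value << 8) | number   # shift left one byte, add the next part
    return value


example = "192.0.2.7"  # assumption: an illustrative address, not a real host
packed = to_32_bits(example)

print(total_bits)
print(total_addresses)
# The standard library's own parser should agree with the hand-rolled packing.
print(packed == int(ipaddress.IPv4Address(example)))
```

The shift-and-or loop is just one way to show that four one-byte numbers fill exactly 32 bits; in real code `ipaddress.IPv4Address` already does this parsing.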
unfortunately there's a huge number of humans in the world these days many of whom have multiple
devices certainly in places like this where you have a laptop and a phone and you have other Internet of Things type
devices all of which need to be addressed so there's another type of IP address that's starting to be used more
commonly this is version four of IP there's also version six which instead of 32 bits uses 128 bits which gives us
a crazy number of possible addresses for computers so we can at least handle all of the additional devices we
now have today so this is to say what ultimately is going on this envelope is the destination address that is Brian's
IP address and the source address that is phyllis's IP address so that this packet can go from point A to point B
and if need be back by just flipping the source and the destination but on the internet you presumably know that
there's not just email servers there's web servers there's chat servers video servers Game servers like there's all of
these different functions on the internet nowadays and so when Brian receives that envelope how does he know
it's an email versus a uh web page versus a Skype call versus something else altogether Well turns out that we
can look at the other part of this acronym the TCP in TCP/IP and what TCP allows us to do for instance is specify
a couple of things one the type of service whose data is in this envelope that is it does this with a
numeric identifier and I'm going to go ahead and write down a colon and the word port p o r t and I'm going to write
that in the source address too colon and port so technically now what's on this envelope is not just the addresses but
also a unique number that represents what kind of service is being sent from point A to point B whether it's
email or web traffic or Skype or something else these numbers are standardized and here are just two of
the most common ones not even in the context of email but in the context of the web Port 80 is typically used
whenever an envelope contains a web page or a request for one or the number 443 when that request is actually encrypted
using that thing you probably know in URLs known as HTTPS where the S literally means secure more on what the
HTTP means later if it's email the number might be 25 or 465 or 587 like these are the kinds of things you Google
if you ultimately care about them but if you've ever had to configure like Outlook or even Gmail to talk to another
account you might very well have seen these numbers by typing in something like smtp.gmail.com and then a number
which is only to say these numbers are omnipresent but they're typically not things you and I have to care about
because servers and computers nowadays automate much of this process but that's all it takes ultimately for Phyllis to
get this message to Brian but what if it's a really big message if it's a short email it might fit perfectly in
one single packet so to speak but suppose that Phyllis wants to send Brian a picture of a cat like this or worse
a video of a cat it would be kind of inequitable if no one else could do anything on the internet just because
Phyllis wants to send Brian a really big picture a really big video of a cat it would be nice if we could kind of time
share the interconnections across these routers so that we can give a little bit of time to Phyllis a little bit of time
to someone else a little bit of time to someone else so that eventually Phyllis's entire cat gets through the
internet but in terms of fairness she doesn't monopolize the bandwidth of the network in question
and this then allows us to do one other feature of TCP/IP which is fragmentation where we can temporarily and Phyllis's
computer would do this automatically fragment the big packet in question or the big file in question and then use
not just a single envelope but maybe a second a third and a fourth or more if we do that though we're probably going
to need one other piece of information just logically on these envelopes like if you were implementing this chopping
up this picture of a cat into four parts like intuitively what might you want to put virtually on the outside of this
envelope now yeah order the order of them somehow so probably something like part one of
four part two of four part three of four and so forth so I'm going to write one more thing in like the memo line of the
envelope here I put some kind of sequence number that's just a little bit of a clue to Brian to know in what order
to reassemble these things and even more powerfully than that this actually gives us this simple primitive of just using
ints on these envelopes in these packets if Brian receives envelopes like these with numbers like these in the memo
field what other feature does TCP apparently enable Brian and Phyllis to implement this is a bit subtle but it's
not just the ordering of the packets what else might be useful about putting numbers on these things might you
think what might be useful here yeah back if you missed something that was intended to be sent if I heard that
correct so shorter answer exactly yes so TCP because of this simple little integer that we're including can quote
unquote guarantee delivery why because if Brian receives one out of four two out of four four out of four but not
three out of four he now knows predictably that he needs to ask Phyllis somehow to resend that packet and so
this is why pretty much always if you receive an email you either receive the whole thing or nothing at all like
sentences and words and paragraphs should never really be missing from an email or if you download a photograph on
the web it shouldn't just have a blank hole in the middle just because that packet of information happened to be
lost TCP if it is the protocol being used to transmit data from point A to point B ensures that it either all gets
there or ultimately none of it at all so this is an important property but just as a teaser there's other protocols out
there there's something called UDP which is an alternative to TCP that doesn't guarantee delivery and just as a taste
of why you might ever not want to guarantee delivery maybe you're watching like a streaming video like a sports
event online you probably don't necessarily want the thing to buffer and buffer and buffer just because you have
a slow connection because you're going to start to miss things and then you're going to be the only one in the world
watching the game that ended 20 minutes ago when everyone else is sort of up to speed similarly for a voice call it'd be
really annoying if our voice is constantly buffered so UDP might be a good protocol for making sure that even
if the person on the other end sounds a little crappy at least you can hear them it's not pausing and resending and
resending because that would really slow down that sort of human interaction so in short IP handles the addressing of
these packets and standardizes numbers that every computer your own included gets and TCP handles the standardization
of like what services can be used between point A and point B all right this is great but presumably when
Phyllis sends a message to Brian like she doesn't really know and probably shouldn't care what his IP address is
right these days it's like I don't know most of the phone numbers that my friends have I instead look them up in
some way and indeed when you visit a website what do you type in it's typically not something dot something dot
something dot something where each of those somethings is a number what do you typically type in to a browser so a
domain name right something like stanford.edu harvard.edu yale.edu gmail.com or any other such domain name
and so thankfully there's another system on the internet one more acronym for today called DNS domain name system and
pretty much every network on the internet Harvard's Yale's Comcast's your own home network somewhere somehow has a
DNS server you probably didn't have to configure it yourself someone else did your campus your job your internet
service provider but there's some server connected somehow to the network you're on via wires or wirelessly that
just has a really big table in its memory a big spreadsheet if you will or if you prefer a hash table that has at
least two columns of keys and values respectively where on the left hand side is what we'll call domain name something
like harvard.edu yale.edu and IP address on the right hand side that is to say a DNS server's purpose in life is just to
translate domain names to IP addresses and vice versa if you want to go in the other direction and technically just to
be precise it translates fully qualified domain names to IP addresses and we'll see what those are in just a moment but
again all of this just kind of happens magically when you turn on your phone or your laptop today because all of these
things are preconfigured for us nowadays so how can we actually start to see some of these things in action well let's go
ahead and poke around for instance at a couple of URLs here let's see what we can actually do now
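Before moving on, the primitives described so far (a DNS table, an envelope with source and destination addresses plus a port, and a sequence number for fragments) can be sketched together in Python. This is a toy model under stated assumptions, not how real TCP/IP is implemented: the addresses come from ranges reserved for documentation, the source port 50000 is made up, the names `Envelope`, `resolve`, `fragment`, `missing`, and `reassemble` are hypothetical, and the "part 1 of 4" numbering follows the lecture's simplification (real TCP sequence numbers count bytes).

```python
from dataclasses import dataclass

# Assumption: a toy DNS table of keys (names) and values (made-up addresses).
DNS_TABLE = {
    "www.example.com": "192.0.2.10",
    "mail.example.com": "192.0.2.25",
}


@dataclass(frozen=True)
class Envelope:
    source: str        # sender's IP address and port, e.g. "198.51.100.5:50000"
    destination: str   # recipient's IP address and port
    sequence: int      # "part 1 of 4" style number, starting at 1
    total: int         # how many parts there are altogether
    payload: bytes


def resolve(name: str) -> str:
    """Translate a fully qualified domain name to an IP address."""
    return DNS_TABLE[name]


def fragment(data: bytes, source: str, destination: str, size: int) -> list[Envelope]:
    """Chop data into numbered envelopes of at most `size` bytes each."""
    chunks = [data[i:i + size] for i in range(0, len(data), size)] or [b""]
    return [
        Envelope(source, destination, number, len(chunks), chunk)
        for number, chunk in enumerate(chunks, start=1)
    ]


def missing(received: list[Envelope]) -> list[int]:
    """Return the sequence numbers the receiver should ask to have resent."""
    if not received:
        return []
    seen = {envelope.sequence for envelope in received}
    return [n for n in range(1, received[0].total + 1) if n not in seen]


def reassemble(received: list[Envelope]) -> bytes:
    """Put the payloads back in order, whatever order they arrived in."""
    if missing(received):
        raise ValueError("cannot reassemble until every part has arrived")
    ordered = sorted(received, key=lambda envelope: envelope.sequence)
    return b"".join(envelope.payload for envelope in ordered)


destination = f"{resolve('www.example.com')}:80"   # port 80: web traffic
envelopes = fragment(b"a picture of a cat", "198.51.100.5:50000", destination, size=5)

arrived = [envelopes[3], envelopes[0], envelopes[1]]   # out of order, part 3 lost
print(missing(arrived))                                 # parts to ask for again
arrived.append(envelopes[2])                            # the sender resends it
print(reassemble(arrived))
```

Keeping the sequence number on every envelope is what lets the receiver both reorder the parts and notice a gap, which is the "guaranteed delivery" idea from the discussion of TCP.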
with these basic Primitives if we now have the ability to move data from point A to point B and what can be in that
envelope could be yes an email but today onward it's really going to be web content there's going to be content that
you're requesting like give me today's homepage and there's content you're sending which would be the contents of
that actual homepage and so just to go one level deeper now that we have these packets that are getting from point A to
point B using TCP/IP let's put something specific inside of them not just an email
and a bunch of text but something called HTTP which stands for hyper text transfer protocol you've seen this for
decades now probably in the form of URLs so much so that you probably don't even type it nowadays your browser just adds
it for you automatically and you just type in harvard.edu or yale.edu or the like but HTTP is just a final protocol
that we'll talk about here that just standardizes how web browsers and web servers intercommunicate so this is a
distinction now between the internet and the web the internet is really like the low-level Plumbing all of the cables all
of the technology that just moves packets from left to right right to left top to bottom that gets data from point
A to point B you can do anything you want on top of that internet nowadays email and web and video and chat and
gaming and all of that so HTTP or the web is just one application that is conceptually built on top of
the internet once you take for granted that there is an internet you can do really interesting things with it just
like in our physical world once you have electricity you can just assume you can do really interesting things with that
too without even knowing or caring how it works but now that you'll be programming for the web it's useful to
understand how some of these things indeed work so let's take a peek at the format of the things that go inside of
these messages these days it's usually actually HTTPS that's in play where again the S just means secure more on
that later but the HTTP is what standardizes what kinds of messages go inside of these envelopes and
wonderfully it's just textual information typically there's a simple text format that humans decided on years
ago that goes inside of these envelopes that tells a browser how to request information from a server and how to
respond from the server to that client with information so here's for instance a canonical URL
https://www.example.com what might you see at the end of this you might sometimes see a slash browsers nowadays kind of
simplify things and don't show it to you but slash as we'll see just represents like the default folder the root of the
web server's hard drive like whatever the base is of it it's like C:\ on Windows or it's you know
my computer on macOS but a URL can have more than that it can have /path where path is just a word or multiple words
that sort of describe a longer part of the URL that path could actually be a specific file we'll see like something
called file.html more on HTML in just a bit or it can even be /folder maybe with another slash or maybe it can be
/folder/file.html now these days Safari and even Chrome to some extent and other browsers are in the habit of trying to
hide more and more of these details from you and me ultimately though it'll be
useful to understand what URL you're at because it maps directly to the code that we're ultimately going to write but
this is only to say that all this stuff in yellow refers to presumably a specific file and or folder on the web
server on which you're programming all right what's this example.com this is the domain name as we described it
earlier example.com is the so-called domain name this whole thing www.example.com is the fully qualified
domain name and what the www is referring to is specifically the name of a specific server in that domain so back
in the day there was a www.example.com web server there might have been a mail.example.com mail
server there might have been a chat.example.com chat server nowadays this host name or subdomain depending on the
context can actually refer to a whole bunch of servers right when you go to www.fb.com that's not one server that's
thousands of servers nowadays so long story short there's technology that somehow gets your data to one of those
servers but this whole thing is what we meant by fully qualified domain name this thing here host name in the context
of an email address it might alternatively be called a subdomain this thing here top level domain you probably
know that .com means commercial although anyone can buy it these days .org is similar .net some of them are a bit
restricted .mil is just for the US military .edu is just for accredited educational institutions but there are
hundreds if not more top level domains nowadays some more popular than others CS50's tools for instance use cs50.io
.io sort of connotes input output it actually belongs though to a small island nation a country whose
country code is io and you see other TLDs top level domains that are
country specific indeed something.uk something.jp and the like typically refer to countries but some of them have
been rather co-opted .tv as well because they have these meanings in English as well lastly this is what we'll call the
protocol that specifies how the server uses this URL to get data from point A to point B so what is inside of this
envelope let's now start poking around a little bit more what is inside of this envelope it's essentially for our
purposes today one of two verbs either GET or POST and if any of you have dabbled with HTML or made your own website you
might have seen some of these terms before but these two verbs describe just how to send information from you to the
server long story short more on this next week GET means put any user input in the URL POST means hide it so that
things you're searching for credit card numbers you're typing in usernames and passwords you're inputting don't show up
in the URL and therefore aren't visible to anyone with access to your computer and your search history but rather
they're somehow provided elsewhere deeper into that envelope but for now we'll focus almost entirely on GET which
is perhaps the most common one that we're always going to use and what we're going to do is this let me switch over
just to a blank screen here and if we assume that little old me is this laptop here and I'm connected to the cloud and
in that cloud is some server that I want to request the web page of harvard.edu or yale.edu it's really going to be a
two-step process there's going to be a request that goes from point A to point B and then hopefully the server that
hears that request is going to reply with what we'll typically call a response and other terms that are
relevant here my laptop is the so-called client harvard.edu yale.edu whatever it is is the so-called server
and just like in a restaurant where you might request something to eat the server might bring it to you it's again
that kind of bidirectional relationship one request one response for each such web page we request all
right so what's inside these envelopes and what do we actually see well this Arrow this line I just drew from left to
right representing the request technically looks a little more like this when you visit a web page using
your browser on your phone laptop or desktop what's going inside that envelope and the textual message your
Mac or PC or phone is automatically generating looks a little something like this the verb GET the URL or rather the path
that you want to get / represents the default page on the website HTTP/1.1 is just some mention of what
version of HTTP you're speaking now we're up to versions two and version three but 1.1 is quite common and the
envelope contains some mention of the host that was typed in the fully qualified domain name this is because
single servers can actually host many different websites if you're using Squarespace or Wix or one of these
popular hosting websites nowadays you don't get your own personal server most likely you're on the same server as
dozens hundreds of other customers but when your customers' your users' browsers include a little mention of your
specific fully qualified domain name in the envelope Squarespace and Wix just know to send it to your web page or my
web page or some other customer altogether dot dot dot there's some other stuff there but that's really the
essence of what's in these requests hopefully then when your browser requests this web page from the
server what comes back well hopefully a response that looks like this HTTP/1.1 so the same version some status
code like a number 200 and then literally a short phrase like OK which means exactly that like okay this
request was satisfied then it contains some other information like the type of content that's coming back and we'll see
that this too is standardized text/html means here comes some HTML which is just a text language it could instead be
image/jpeg or image/png or video/mp4 there are these different content types otherwise known as MIME types that
uniquely identify types of files that come back similar in spirit to file extensions but a little more
standardized this way then there's some more stuff dot dot dot but in general what you see here is our
familiar pattern keys and values these keys and values are otherwise known as HTTP headers and your browser has been
sending these every time you visit a website and indeed we can see this right now ourselves let me go over in just a
second to Chrome on my computer though you can do this kind of thing with most any browser today I'll go ahead and
visit http://harvard.edu enter and voila I'm at Harvard's homepage for today the content often changes but this
is what it looks like right now well I typed in the URL but notice it changed a little bit it actually sent me to https
and it added www even though I didn't type that but it turns out we can poke around at what my browser is actually
doing let me open another page and I'm going to start to use incognito mode this time not because I care that people
know I'm visiting harvard.edu but because it throws away any history that I just did so that every request is
going to look like a brand new one and that's just useful diagnostically because we're always going to see fresh
information my browser is not going to remember what I previously already requested but I'm going to go up to view
developer developer tools which is something that all of you have if you use Chrome and there's something
analogous for Firefox and Edge and Safari and other browsers developer tools is going to open up these tabs
down here I don't really care what's new so I'm going to close the bottom thing there and I'm going to hover over the
network tab for just a moment and now I'm going to go and say http://harvard.edu so the shorter version I'm
going to hit enter and a whole bunch of stuff just flew across the screen and it's still coming in and if I zoom in
down here my God visiting harvard.edu still going is downloading what 17 18 19 megabytes 20 megabytes millions of bytes
of information over 111 HTTP requests in other words bit of a simplification but my browser unbeknownst to me sent one
envelope initially with the request then the server said okay by the way there's 110 other things you need 112 other
things you need to get so my computer went back and forth requesting even more content for me why well inside of
Harvard's web page is a whole bunch of images and maybe sound files and videos and other stuff that all need to be
downloaded to compose what is ultimately the web page but I don't care about like a 100 plus of these things
let's focus on the very first one first the very first request I sent was up here and I'm going to click on this row
under the network Tab and then I'm going to see a bit of diagnostic information to an average person using the web they
needn't care about this just as you probably didn't care about it until right now and even then perhaps not but
if I scroll down to request headers you will see if I click view source literally everything that was in the
request my Mac just sent to harvard.edu two of the lines are familiar GET / HTTP/1.1 Host: harvard.edu and then
other stuff that for now is not that interesting for us but let's look at the response that came back from the server
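Before turning to the response, the two familiar request lines can be generated from a URL in Python. This is a minimal sketch assuming only the request line and the Host header matter; real browsers send many more headers, and `build_get_request` and `parse_request` are hypothetical helper names, not part of any library.

```python
from urllib.parse import urlsplit


def build_get_request(url: str) -> str:
    """Return the text of a minimal HTTP/1.1 GET request for a URL."""
    parts = urlsplit(url)
    path = parts.path or "/"          # no path typed means the default page
    lines = [
        f"GET {path} HTTP/1.1",       # verb, path, and HTTP version
        f"Host: {parts.hostname}",    # the name that was typed in
        "",                           # a blank line ends the headers
        "",
    ]
    return "\r\n".join(lines)


def parse_request(text: str) -> tuple[str, str, str, dict[str, str]]:
    """Split request text into verb, path, version, and a dict of headers."""
    head = text.split("\r\n\r\n", 1)[0]
    request_line, *header_lines = head.split("\r\n")
    method, path, version = request_line.split(" ")
    headers = {}
    for line in header_lines:
        key, _, value = line.partition(":")   # headers are key: value pairs
        headers[key.strip()] = value.strip()
    return method, path, version, headers


request = build_get_request("http://harvard.edu")
print(request)
print(parse_request(request))
```

Passing a longer URL such as `https://www.example.com/folder/file.html` shows how the host name ends up in the Host header while everything after it becomes the path on the first line.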
I'm going to scroll up now and see response headers view source and this is interesting it is not okay there's no
200 there's no word OK curiously harvard.edu has moved permanently what does that mean well
there's a whole bunch of stuff here that's not that interesting for us but this line location is interesting this
is an HTTP header a standardized key value pair that's part of the HTTP protocol that is its conventions and if I
highlight just this one it's telling me Harvard is not at http://harvard.edu Harvard's website is now and perhaps
forever at https://www.harvard.edu so what's the value here
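What a browser does with a response like this one can be sketched in Python. The response text below is a hand-written sample containing only the status line and the Location header (the real response carries more headers), and `parse_response`, `next_url`, and `REDIRECT_CODES` are hypothetical names chosen for illustration.

```python
from typing import Optional

# Assumption: a trimmed-down sample, not the exact bytes Harvard's server sent.
RAW_RESPONSE = (
    "HTTP/1.1 301 Moved Permanently\r\n"
    "Location: https://www.harvard.edu/\r\n"
    "\r\n"
)

# Status codes that tell the client to go to the URL in the Location header.
REDIRECT_CODES = {301, 302, 303, 307, 308}


def parse_response(text: str) -> tuple[int, str, dict[str, str]]:
    """Split response text into status code, phrase, and a dict of headers."""
    head = text.split("\r\n\r\n", 1)[0]
    status_line, *header_lines = head.split("\r\n")
    _version, code, phrase = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        key, _, value = line.partition(":")
        headers[key.strip().lower()] = value.strip()   # header names ignore case
    return int(code), phrase, headers


def next_url(text: str) -> Optional[str]:
    """Return where to go next if the response is a redirect, else None."""
    code, _phrase, headers = parse_response(text)
    if code in REDIRECT_CODES:
        return headers.get("location")
    return None


print(parse_response(RAW_RESPONSE))
print(next_url(RAW_RESPONSE))
```

A response whose status line says 200 OK makes `next_url` return `None`, meaning there is nothing further to follow.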
probably someone at Harvard wants you to use a secure connection so they redirected you from HTTP to HTTPS maybe
the marketing people want you to be at www instead of just harvard.edu just to standardize things but there are
technical reasons to use a host name and not just the raw domain name and all this other stuff is sort of
uninteresting for our purposes now because a browser that receives a 301 response knows by design by the
definition of HTTP to automatically redirect the user and that's why in my browser all of this
happened in like a split second because I didn't really know or care about all of those headers but that's why and how I
ended up at this URL here my browser was told to go elsewhere via that new location and the browser just followed
those breadcrumbs if you will at which point it downloaded all of the other images and files and so forth that
compose this particular page well let me zoom out and let me actually go into VS Code if only because it's a
little more pleasant to do things in just a terminal window without actually using a full-fledged browser so now
let's just use an equivalent program it's called curl for connecting to a URL that's going to allow me to play with
websites and just see those headers without bothering to download all the images and text and so forth from the
website it's going to allow me to do something like this let me go ahead and run for instance curl -I -X GET
which is just the command line arguments that say simulate a GET request textually as though you're a browser and
let's go to http://harvard.edu enter now by way of how curl works I'm just seeing the headers it didn't bother
downloading the whole website and you see exactly the same thing 301 moved permanently location is indeed this one
here so that's kind of interesting but let's follow it manually now let's now do what it's telling me to do let's go
to the location with https and the www and hit enter and now what's a good sign with this output
most of it's irrelevant 200 okay that means I'm seeing presumably if I were using a real
browser the actual content of the web page looks like Harvard's version of HTTP is even newer than my the one I'm
using it's using HTTP version two which is fine but 200 is indeed indicative of things being okay well what if I try
visiting some bogus URL like harvard.edu and this uh file does not exist something completely random probably
doesn't exist and hit enter what do you see now that's perhaps familiar in the real world
yeah yeah error 4 or4 all of us have seen this probably endlessly from time to time when you screw up by mistyping a
URL or someone deletes the web page in question but all that is is a status code that a browser is being sent from
the server that's a little clue as to what the actual problem is underneath the hood so instead of getting back for
instance something like OK or moved permanently what I've just gotten back quite simply is 404 not found well it
turns out there's other types of status codes that you'll start to see over time as you start to program for the web
200's okay 301 is moved permanently 302 304 307 are all similar in spirit they're related to redirecting the user
from one place to another 401 403 unauthorized or forbidden if you ever mess up your password or you
try visiting a URL you're not supposed to look at you might see one of these codes indicating that you just don't
have authorization for those 404 not found 418 I'm a teapot was an April Fool's joke by the tech
community years ago 500's bad and unfortunately all of you are probably on a path now to creating HTTP 500 errors
once next week we start writing code because all of us are going to screw up we're going to have typos logical errors
and this is on the horizon just like segfaults were in the world of C but solvable with the right skills 503
service unavailable means maybe the server's overloaded or something like that and there's other codes there but
those are perhaps some of the most common ones um we can get away with this here less so in New
Haven has anyone ever visited safetyschool.org
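As an aside on the status codes just listed, here is a minimal Python sketch that looks each one up in the standard library's `http.HTTPStatus` table. The helper names `describe` and `category` are made up for this illustration, and 418 is left out because it is a joke code that older Python versions do not define.

```python
from http import HTTPStatus

# Codes mentioned in the lecture (418 omitted, see note above).
LECTURE_CODES = [200, 301, 302, 304, 307, 401, 403, 404, 500, 503]


def describe(code: int) -> str:
    """Return 'CODE Phrase' using the standard library's table."""
    status = HTTPStatus(code)
    return f"{status.value} {status.phrase}"


def category(code: int) -> str:
    """Classify a code by its first digit, which names its family."""
    families = {
        2: "success",
        3: "redirection",
        4: "client error",
        5: "server error",
    }
    return families.get(code // 100, "other")


if __name__ == "__main__":
    for code in LECTURE_CODES:
        print(describe(code), "-", category(code))
```

The first digit is the quick clue: 3xx codes concern redirection, 4xx codes point at the request, and 5xx codes point at the server.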
https://safetyschool.org dare we do this enter oh look at that where did we end
up okay so this has been a joke for like 10 or 20 years someone out there has been
paying for the domain name safetyschool.org just for this two second demonstration but we can now
infer how did this work the person who bought that domain name and somehow configured DNS to point to like their
web server the IP address of their web server what is their web server presumably spitting out whenever a
browser requests the page what status code perhaps well we can simulate this let me go over to vs
code uh let me go back over here let me increase my terminal window let me do curl -I -X GET
https://safetyschool.org enter and that's all this website does there's not even an actual
website there no HTML no CSS languages we're about to see it literally just exists on the internet to do that
redirect there um in fairness there are others let me actually do another one here instead of safetyschool.org
turns out someone some years ago bought harvardsucks.org enter and when we do this you'll see
that oh they don't need us to be secure but I do need the www let's do this one enter oh that one is not
found this demo actually worked for so many years but someone has stopped paying for the Squarespace account
recently apparently so okay fortunately we did save the YouTube video to which this thing refers
and so just to put this into context since it's been quite a few years Harvard and Yale of course have this
long-standing rivalry uh there is this tradition of pranking each other and honestly hands down one of the best
pranks ever done in this rivalry was by Yale to Harvard it's about a 3-minute retrospective it's one of the earliest
videos I dare say on YouTube so the quality is uh representative of that but let me go ahead and full screen my
page here and what used to live at harvardsucks.org is this video here if we could dim the lights for about 3 minutes
[video of the Yale prank on Harvard plays]
all right so thanks to our friends at Yale for that one um let's go ahead
here and consider for just a moment what is deeper down inside of the envelope oh okay
YouTube autoplay again got to stop doing that let's consider for just
a moment that we now have this ability to get data from point A to point B and we
have the ability to specify in those envelopes what it is we want from the website we want to get the homepage we
want to get back the HTML but what is that HTML in fact we don't yet have the language with which the web pages
themselves are written namely HTML and CSS but let's go ahead and take a 5-minute break here and when we come back
we'll learn those two languages all right we are back so we got three languages to look at today but
two of them are not actually programming languages what makes something a programming language like C or Python
and SQL is that there are these constructs via which you can express conditionals you might have variables
you might have looping constructs you have the ability ultimately to express logic HTML and CSS aren't so much about
logic they are about structure and the Aesthetics of a page and so we're going to create like the skeleton of a web
page using this pair of languages HTML and CSS and then toward the end of today we'll introduce an actual
programming language that actually is pretty similar in spirit and syntactically to both C and python but
that's going to allow us to make these web pages not just static things that you look at but interactive applications
as well and then next week again in week nine we will reintroduce Python and SQL and tie all of this together so that you can
actually have a browser or a phone talking to a backend server and creating the experience that you and I now take for
granted from most any app or website today well let's go ahead and do this let's quickly whip up something in this
language called HTML I'm in vs code here I'm going to go ahead and create a file quite simply called uh hello.html the
convention is typically to end your file names in .html and I'm going to go ahead and bang this out real quick but then
we'll more slowly step through what the constructs are here so I'm going to say DOCTYPE HTML Open Bracket HTML and
then notice I'm going to do Open Bracket SL HTML close bracket and I'm leveraging a feature of vs code and programming
environments more generally to do a bit of autocomplete so you'll see that there's this symmetry to much of what
I'm going to type but I'm not typing all of these things VS Code is automatically generating the end of my
thought for me if you will uh let me go ahead and say uh open the head tag open the title tag I'll say something
cute like hello title and then down here I'm going to create the body of this web page and say something like hello body
and let me specify at the very top that all of this is really in English Lang equals en so at this moment I have a
file in my vs code environment called hello.html vs code as we're using it of course is cloud-based we're using it in
a browser even though you can also download it and run it on a Mac and PC so we are kind of in this weird
situation where I'm using the cloud to create a web page and I want that web page to also live in the Cloud that is
on the internet but the thing about VS Code or really any website that you might use in a browser by default that
website is using probably TCP port number 80 or TCP port number 443 which is HTTP and https respectively but here
I am sort of a a programmer myself trying to create my own website on an existing website so it's a bit of a
weird situation but that's okay because what's nice about TCP is that you and I can just pick port numbers to use and
run our own web server on a web server that is we can control the environment entirely by just running our own web
server via this command http-server in my terminal window this is a command that we pre-installed in vs
code here and you'll notice a pop-up just came up your application running on port 8080 is available that's a commonly
used TCP port number when 80 is already used and 443 is already used you can run your own server on your own port 8080 in
this case I've opened that tab in advance and if I go into another browser tab here I see a so-called
directory listing of the web server I'm running so I don't see any of my other files I don't see anything belonging to
vs code itself I only see the file that I've created in my current directory called hello.html and so if I click on
this file now I should see Hello body I don't see the title but that's because the title of a web page nowadays is
typically embedded in the tab and if I'm full screen in my browser there are no tabs so let me minimize the window a bit
and now you can see just in this single browser window and my own URL here that hello body is in the top left hand
corner and if I zoom in there's hello title so what have I done here I have gone ahead and created my own web page
in HTML in a file called hello.html and then I have opened up a web server of my own configured it to listen on TCP port
8080 which just says to the internet Hey listen for requests from web browsers not on the standard port number 80 or
443 listen on 8080 and this means I can develop a website using a web-based tool like this one here which is
increasingly common today all right so now let's consider what it is I actually just typed out HTML is characterized
really by just two features two vocab words tags and attributes most of what I just typed were tags but there was at
least one attribute already here's the same source code that I typed out in HTML from top to bottom let's consider
what this is the very first line of code here DOCTYPE HTML is the only anomalous one it's the only one that
starts with an Open Bracket a less than sign and an exclamation point there's no more exclamation points thereafter for
now this is the document type declaration which is a fancy way of saying it's just got to be there
nowadays it's like a little breadcrumb at the beginning of a file that says to the browser you are about to see a
file written in HTML version five that line of code has changed over time over the years the most recent version of it
is nice and succinct like this and it's just a clue to the browser as to what version of HTML is being used by you the
programmer all right what comes after that well after that and I've highlighted two things in yellow this is
what we're going to start calling an open tag or a start tag Open Bracket HTML then something close bracket is the
so-called start or open tag then the corresponding close or end tag is down here and it's almost the same you use
the same tag name you use the same angled brackets but you do add a slash and you don't repeat yourself with any
of the things called attributes because what is this thing here Lang equals quote unquote en means my
page is written in the English language uh humans have standardized two- and three-letter codes for every human language
right now and so this is just a clue to the browser for like automatic translation and accessibility purposes
what language the web page itself is written in not the tags but the words like hello title and hello body which
while minimalist are indeed in English so when you close a tag you close the name of it with the Slash and the angled
brackets you don't repeat the attribute that would just be annoying to have to type everything again but notice the
pattern here it's new syntax but this is another example of key value pairs in Computing the key is Lang the value is
en for English the attribute is called Lang the value is en so again it's just key value pairs in just yet
another context probably the browser is using a hash table underneath the hood to keep track of this stuff like a two
column table with keys and values again humans keep using the same Paradigm in different languages what's inside of
that the nesting is important visually not to the computer but to us the humans because it implies that there's some
hierarchy here and indeed what is inside of the HTML tag here well we have what we'll call the head tag the head tag
says Hey browser here comes the head of the page and then the body tag says Hey browser here comes the body of the page
the body is like 99% of the user's experience the big rectangular window the head is really just the address bar
and other such stuff at top like the title that we saw a moment ago just to introduce the vernacular then the HTML
tag otherwise known as an element has two children the head child and the body child which is to say that head and body
are now siblings so you can use the same kind of family tree terminology that we used when talking about trees weeks ago
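To make the parent, child, and sibling vocabulary concrete, here is a minimal Python sketch. It embeds `hello.html` as reconstructed from the narration (the exact text and punctuation of the two strings are an assumption) and uses the standard library's `html.parser` to print each element at its nesting depth, along with the `lang` attribute as a key/value pair. The class name `OutlineParser` and helper `outline` are made up for this illustration; a real browser builds a richer tree than this.

```python
from html.parser import HTMLParser

# hello.html as described in the lecture (reconstructed from the narration).
HELLO_HTML = """<!DOCTYPE html>

<html lang="en">
    <head>
        <title>hello, title</title>
    </head>
    <body>
        hello, body
    </body>
</html>
"""


class OutlineParser(HTMLParser):
    """Records each element and text node with its nesting depth."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.outline = []      # list of (depth, description) pairs
        self.attributes = {}   # tag name -> dict of attribute key/value pairs

    def handle_starttag(self, tag, attrs):
        self.outline.append((self.depth, f"<{tag}>"))
        self.attributes[tag] = dict(attrs)
        self.depth += 1

    def handle_endtag(self, tag):
        self.depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text:  # ignore whitespace-only text between tags
            self.outline.append((self.depth, repr(text)))


def outline(html: str) -> OutlineParser:
    parser = OutlineParser()
    parser.feed(html)
    parser.close()
    return parser


if __name__ == "__main__":
    parsed = outline(HELLO_HTML)
    for depth, description in parsed.outline:
        print("    " * depth + description)
    print(parsed.attributes["html"])
```

Head and body land at the same depth under html, which is what makes them siblings; the two strings are the text nodes one level below title and body.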
if we look at the head tag how many children does it seem to have I'm saying one and indeed at least if we ignore all
the white space the spaces or tabs or new line characters there's just one child a title element and an element is
the terminology that includes the start tag and the end tag and everything in between so this is the title element and
the title element has one child which is just pure text otherwise known as a text node recall node from our discussions of
data structures weeks ago if we jump then to the body which is the other child of the HTML tag it too has one
child which is just another chunk of text a text node that says quote unquote hello body what's nice about this
indentation even though the browser technically is not going to care is that it implies this kind of structure and
this is where we connect week five and now week eight here is the tree structure we began to talk about even in
our world of C it's not a binary tree even though this one happens to have no more than two children it's an arbitrary
tree that can have zero or any number of children but if we have a special node here that refers to the document the
root node so to speak is HTML drawn with a rectangle here just for discussion sake it has two children head and body
also rectangles head has a title child and then it and body have text nodes which I've drawn with ovals instead
which is only to say that when your browser Chrome Safari whatever downloads a web page opens up that envelope and
sees the contents that have come back from the server it essentially reads the code that someone wrote the HTML code
top to bottom left to right and creates in the browser's memory in your Mac or your PC or your phone's memory or Ram
this kind of data structure that's what's going on underneath the hood and that's why aesthetically it's just nice
as a human to indent things stylistically because it's very clear then to you and to other programmers
what the structure actually is so that's it for like the fundamentals of HTML we'll see a bunch of tags and a bunch of
examples now but HTML is just tags and attributes and it's the kind of thing that you look them up when you need to
eventually many of them get ingrained I constantly check the reference guides or stack Overflow if I'm trying to figure
out how do I lay something out it's really just these building blocks that allow you to assemble the structure of a
web page this one's being super simple but it's just tags and attributes any questions on this framework before we
start to add more tags more vocabulary if you will and in the middle yeah if we put the title tag around body uh
that's a good question let's try it so let me actually go to uh this and say Open Bracket title
whoops sometimes you don't want it to finish your thought for you but it did that time I've gone ahead and uh changed
the file let me open my terminal window and go back to the URL that has my page give me
a second there's my hello.html let me zoom in on this and
let me go ahead now and click on hello.html and in this case it looks like we don't
actually see anything so the browser is hiding it technically speaking browsers tend to be pretty generous and half the
time when you make mistakes in HTML it might not display as you intended it might not
display the same on Macs or PCS or Chrome or on Firefox there is a tool though that we'll see that can help
answer this question for you for instance if I go to validator.w3.org W3C is the World Wide Web Consortium a group
of people that standardize this kind of stuff I can click on validate by direct input and just copy paste my sample HTML
into this box and click check and I should see hopefully that indeed it's an error what you propose that I do the
browser just did its best to do something which was to show me nothing at least rather than the incorrect
information but if I revert that change and let me undo what we just did let me copy my original code back into this
text box and click check now you can see conversely my code is now correct and there's automated tools to check that
but we'll encourage you for problem sets and projects to use that particular manual tool all right so let's go ahead
and enhance this a little bit by introducing a whole bunch of tags just to give you a sense of some of the
building blocks here so I'm going to go ahead and create a new file uh called paragraphs.html and I'm just going to
do a bunch of copy paste just to start things off so I'm not constantly typing all this darn stuff again and again
because I want everything to be the same here except I'm going to change my title to be paragraphs for this demo and
inside of the body I need a whole bunch of paragraphs of text and I don't really want to come up with some text so let me
go to some random website here and grab lorem ipsum text which if you're involved in like student newspaper or
just design uh this is just placeholder text kind of looks like Latin but technically isn't here though I have a
handy way of just getting three long paragraphs in something that looks like Latin and I've put those notice inside
of the body and they're indeed long look how long the made-up words here are so let me go now into my browser
here let me reload this page and you'll see two files have now appeared paragraphs.html which is my new one and
hello.html let me click on paragraphs.html and what clearly seems to be wrong
yeah yeah it's obviously one massive paragraph instead of three so that's interesting but it's just a little hint
as to how pedantic HTML is it will only do what you say and each of these tags tells the browser to start doing
something and then maybe stop doing something like hey browser here comes my HTML hey browser here comes the head of
my page hey browser here comes the title of my page hello title hey browser that's it for the title that's it for
the head here comes the body tag so it's kind of having this conversation between the HTML and the
browser doing literally what it says so if you want a paragraph you're probably going to want to use the P tag for
paragraph and I'm going to go ahead and add this to my code I'm going to keep things neat even though the browser
won't care by indenting things here let me create another paragraph tag here and close it right after that one indenting
again and I'm keeping everything nice and orderly let me do one more here let me indent that and then let me add it to
the end of my page here so again a little tedious but now I have three paragraphs of text that say Hey browser
start a paragraph hey browser stop that paragraph start stop and so forth let me go back to the browser window here let
me hit command r or controlr to reload the page and voila now I have three cleaner paragraphs all right so there's
a P tag for paragraphs so now we have that particular building block what if I want to add for instance some headings
to this page well that's something that's possible too let me go ahead and create a new file called headings.html
let me copy and paste that same code as before but now let's preface each paragraph with maybe H1 and I'm going to
just write the word one and here I'm going to say H2 two and down here I might say H3 three
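The paragraph and heading markup described so far can be sketched as follows. This is a minimal Python illustration, not the lecture's file: the paragraph text is a short stand-in for the lorem ipsum placeholder, and the names `StartTagRecorder`, `start_tags`, and `collapse_whitespace` are made up here. The last helper approximates why untagged text showed up as one massive paragraph: by default a browser treats any run of spaces and newlines as a single space.

```python
from html.parser import HTMLParser

# Body of headings.html as narrated. The paragraph text is a stand-in for
# the lecture's lorem ipsum placeholder, not the text used in class.
HEADINGS_BODY = """
<h1>One</h1>
<p>Lorem ipsum dolor sit amet.</p>
<h2>Two</h2>
<p>Consectetur adipiscing elit.</p>
<h3>Three</h3>
<p>Sed do eiusmod tempor.</p>
"""


class StartTagRecorder(HTMLParser):
    """Collects the names of start tags in document order."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def start_tags(html: str) -> list:
    recorder = StartTagRecorder()
    recorder.feed(html)
    recorder.close()
    return recorder.tags


def collapse_whitespace(text: str) -> str:
    """Approximate default rendering of untagged text: runs of spaces and
    newlines become one space, so blank lines alone do not make paragraphs."""
    return " ".join(text.split())


if __name__ == "__main__":
    print(start_tags(HEADINGS_BODY))
    print(collapse_whitespace("first\n\nsecond\n\nthird"))
```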
so this is another tag another three tags H1 H2 H3 as you might have inferred by the file name I chose this gives
you headings like in a book different chapters or sections or subsections or in an academic paper you
have different hierarchies to the text that you're writing so now that I've added an H1 tag and the word one h2 tag
the word two H3 tag and the word three let's go back to the browser reload the page again and
voila once the page reloads I'll do it with the manual button reload the
page oh what am I doing wrong yeah right I'm not in the
headings file so let me go back a page now there's headings.html let me click on that okay now we see some evidence of
this again it's nonsensical content but you can kind of see that H1 is apparently big and bold H2 is slightly
less big but still bold H3 is the same but a little smaller and it goes all the way down to H6 after that you should
probably reorganize your thoughts but there's six different levels of hierarchy here as you might use for chapters sections
subsections and so forth all right so those are headings as an HTML tag uh in our vocabulary what's a common thing too
well let me go to uh vs code again let me go ahead and get some boilerplate here create a file
called list.html let's create a simple list inside of my body and I'll give this a title of uh list and let me fix
the title of this one to be headings as well so in list.html suppose I want to have a list of things uh foo bar and baz
are like a computer scientist's go-to words just like a mathematician might say x y z foo bar baz is in list.html let me go
back to my browser hit the back button there's list.html and hopefully I'll see foo bar and baz one on each line like
a nice little list but of course I do not and this is not English Chrome thinks it might be Arabic um but that's
um curious too because the lang attribute should be overriding that so Google is trying to override it all
right what's the obvious explanation why we're seeing foo bar and baz on the same line and not three separate
ones we didn't tell it to do that so we need paragraph tags or maybe something else turns out there is something else
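The "something else" is list markup: a `ul` (unordered) or `ol` (ordered) element wrapping one `li` element per item. Here is a minimal Python sketch that generates that markup for foo, bar, and baz. The function name `html_list` and its `ordered` flag are made up for this illustration, and the four-space indentation is only a style choice.

```python
from html import escape


def html_list(items, ordered=False):
    """Build <ul> or <ol> markup with one <li> per item."""
    tag = "ol" if ordered else "ul"
    lines = [f"<{tag}>"]
    # escape() keeps characters such as < and & from being read as markup.
    lines += [f"    <li>{escape(item)}</li>" for item in items]
    lines.append(f"</{tag}>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(html_list(["foo", "bar", "baz"]))                # bulleted
    print(html_list(["foo", "bar", "baz"], ordered=True))  # numbered by the browser
```

Switching between bullets and numbers changes only the outer tag; the numbers themselves never appear in the markup, because the browser supplies them.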
there is a UL tag for an unordered list in HTML inside of which you can have Li tags for list item inside of which you
can put your words so there's my foo there's my bar there's my baz and again notice that VS Code is finishing my
thought for me but notice the hierarchy open UL open Li close Li open Li close Li open Li close Li close UL so it's
sort of done in reverse order here let me go back to my browser uh reload the same page list.html and voila a default
bulleted list that still seems to be in Arabic what if I want this list to be numbered well you can probably guess if
you don't want an unordered list but an ordered list what tag should I use OL sure so let's try that not always that
easy as just guessing but in this case OL is going to do the trick let me go back to my other browser let me reload the
page and now it's going to automatically number for me it's a tiny thing but this is actually useful if you have a very
long list of data and maybe you might add some things in the middle the beginning or the end it would just be
annoying to have to go and renumber it the computer is doing it for us by instead just numbering from top to
bottom here all right what about another type of layout not just paragraphs not just lists but what about tabular
data you've got some research data you want to present some financial data you want to present a phone book that you
want to present how might we go about laying out data à la a table well let me create a file called
table.html and I'll just copy paste where we started earlier let me start to close some of these other files and in
table.html this is going to be a bit more HTML but I'm going to go ahead and do this table and close table tables can have
table headings so thead is the name of that tag and tables can have tbodies table bodies so I'm going to add that
tag and this is a common technique sort of start your thought finish your thought and then go back and fill in
what's in between what do I want to put in this table how about a bunch of names and numbers uh so for instance like left
column name right column number so let's create a table row with what's called the TR tag let's create a table heading
with the TH tag and let's say name here let's create another table heading called number here and all of that to be
clear is in one table row meanwhile in the table body let me create another table row but this time it's it's not a
heading now I'm in the guts of my table let's do table data which is synonymous with like the cell of the table in uh
like in an Excel spreadsheet or Google spreadsheet in this TD I'm going to say like Carter's name and then let's
grab Carter's number from our past demo 617-495-1000 then let's put me into the mix and I'll go ahead and copy paste
here which often is not good but we'll see that there's a lot of shared structure with HTML let me go ahead and
do mine 949-468-2750 and now save this page so we're getting to be a lot of indentation
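The table just built can be sketched as follows. This minimal Python example generates the `thead`/`tbody` structure described above from a list of rows. The function name `html_table` and the `PHONEBOOK` constant are made up for this illustration, and the hyphenated number format is an assumption about how the digits were typed.

```python
from html import escape


def html_table(headers, rows):
    """Build table markup: one <th> row in <thead>, one <td> row per entry in <tbody>."""
    out = ["<table>", "    <thead>", "        <tr>"]
    out += [f"            <th>{escape(h)}</th>" for h in headers]
    out += ["        </tr>", "    </thead>", "    <tbody>"]
    for row in rows:
        out.append("        <tr>")
        out += [f"            <td>{escape(cell)}</td>" for cell in row]
        out.append("        </tr>")
    out += ["    </tbody>", "</table>"]
    return "\n".join(out)


# Names and numbers from the lecture's phone book demo.
PHONEBOOK = [("Carter", "617-495-1000"), ("David", "949-468-2750")]

if __name__ == "__main__":
    print(html_table(["Name", "Number"], PHONEBOOK))
```

Every row, heading or data, is a `tr`; only the cell tag changes between `th` and `td`, which is the shared structure that makes copy and paste tempting here.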
I'm using four spaces by default some people use two spaces by default so long as you're consistent that's considered
good style but let me go back to my browser here and hit back that then brings me to my directory listing again
here's table.html and this is not that interesting yet but you can see that there's two columns name and number
because it's a table heading th uh the browser made it boldfaced for me and there in the table are two rows below
that Carter and David it's a little oh I forgot my number one sorry about that one and one it's not the prettiest table
right I feel like I kind of want to separate things a little more maybe put some Borders or the like but with HTML
alone I'm really focusing on the structure alone so we'll make this prettier soon but for now this is how
you might lay out tabular data all right let me pause here just to see if there's any questions but again the goal right
now is just to kind of throw at you some basic building blocks that again can be easily looked up in a reference but
we're going to start stylizing these things soon too and yeah in the
middle how do you indent paragraphs really good question for that we're probably going to want something called
CSS cascading style sheets so let me come back to that in just a little bit for the stylization of these things beyond
the basics like big and bold we're going to need a different language altogether all right well let's now create what the
web is full of which is uh like photographs and images and the like let me go ahead and create a new file called
image.html and let me go ahead and change the title here to be say image and then in the body of this page let's
go ahead and put an image the interesting thing about an image is that it's actually not going to have a start
tag and an end tag because that's kind of illogical like how can you start an image and then eventually finish it it's
either there or it isn't so some tags do not have end tags so let me do image IMG src equals harvard.jpeg and let
me go ahead and in my terminal window I actually came with a photo of Harvard let me grab this for just a
second uh let me grab harvard.jpeg and put it into my directory pretend that I downloaded that in advance and so I'm
referring to now a file called harvard.jpeg that apparently is in the same folder as my image.html file if this
image were on the internet like Harvard's server I could also say like https www.harvard.edu
folder name whatever it is harvard.jpeg but if you've in advance uploaded a file to your own vs code environment
like I did before class by dragging and dropping this photo of Harvard you can just refer to it
relatively so to speak this would be the same thing as saying ./harvard.jpeg go to the current directory and get the
file called harvard.jpeg but that's unnecessary to type for accessibility purposes though for someone who's
vision impaired it's ideal if we also give this an alternative text something like uh Harvard University in the
so-called alt attribute and this is so that screen readers will recite what it is the photo is for folks who can't see it
and if you're just on a slow connection sometimes you'll see the text of what you're about to see before the image
itself downloads especially on a mobile device so let's now go back to my open browser Tab and let's look in the
directory I now have harvard.jpeg which I downloaded in advance and image.html let me click on image.html and here we
have a really big picture of Memorial Hall the building we're currently in so suffice it to say I should probably fix
this and maybe make it only so wide but to do that we're going to probably want to use this other language CSS there are
some historical attributes uh that you can still use to control width and height and so forth but we're going to
do it the better way so to speak with a language designed for just that how about a video though I also came
prepared with let me grab another file here uh let me grab a file called uh halloween.mp4 which is an MPEG file
and let me go ahead and change this now to be a file called video.html I'll change my title to be video and let's go
ahead and now introduce another tag a video tag Open Bracket video and then let me go ahead and close that tag
proactively and then inside of the video tag uh you can say the source of the video is going to be specifically
halloween.mp4 the type of this file I know is video/mp4 because I looked up its content type or MIME type and the
video tag actually has a few attributes I can have this thing autoplay I can have it Loop forever I can mute it so
that there's no sound which is necessary nowadays most browsers to prevent ads don't autoplay videos if they have sound
so if you mute your video it will autoplay but presumably not annoy users and let me set the width of this thing
to be like oh 1280 pixels wide but I can make it any size I want so I know this just from having you know looked up the
Syntax for this tag but notice one curiosity sometimes attributes don't have values they're empty attributes
they're just single words autoplay loop muted and that kind of makes sense for any attribute that really does what it
says like it doesn't make sense to say muted equals something like it's either muted or not the attribute is there or
not similarly for these others as well so let me go back to my other browser tab reload the directory listing
there is both my mp4 and also video.html which is the web page that embeds it and this is actually a video that was
just on Harvard's website yesterday and it was amazing so we included it in this demo
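The image and video markup described above can be sketched as follows. This minimal Python example embeds both tags as narrated and uses the standard library's `html.parser` to show how attributes come through: ones with values as key/value pairs, and empty attributes such as `autoplay`, `loop`, and `muted` as keys with no value (`None` in this parser). The file names follow the narration; the names `AttributeCollector` and `collect_attributes` are made up for this illustration.

```python
from html.parser import HTMLParser

# Markup reconstructed from the narration; img has no end tag.
MEDIA_HTML = """
<img alt="Harvard University" src="harvard.jpeg">
<video autoplay loop muted width="1280">
    <source src="halloween.mp4" type="video/mp4">
</video>
"""


class AttributeCollector(HTMLParser):
    """Maps each tag name to a dict of its attributes."""

    def __init__(self):
        super().__init__()
        self.seen = {}

    def handle_starttag(self, tag, attrs):
        # Attributes written without a value arrive with value None.
        self.seen[tag] = dict(attrs)


def collect_attributes(html: str) -> dict:
    collector = AttributeCollector()
    collector.feed(html)
    collector.close()
    return collector.seen


if __name__ == "__main__":
    for tag, attrs in collect_attributes(MEDIA_HTML).items():
        print(tag, attrs)
```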
here this is the video that was on harvard.edu last night same photo but you can see here
that an image alone probably would not have the same effect this is actually a movie a small video file that's now
looping now there's some artifacts here like there's a white border around the top I feel like it'd be nice to fill the
screen but again we'll come back to a language that can allow us to do exactly that well it's not just videos like this
that you might want to put into a web page let me create another file called iframe.html if you've ever poked
around if you have your own YouTube account or you had your own blog or WordPress site or Wix or Squarespace you
might have been in the habit of embedding videos in websites using like embedded YouTube players well this is
possible too using what's called an inline frame an iframe and an iframe is just a tag that is literally iframe it
has src equals and then a URL and if it happens to be a YouTube video there's a certain URL format you need to follow
per YouTube's documentation so you might do www.youtube.com/embed/ and then here's an
ID of a video uh so this is essentially what we do if we want to embed uh cs50's own
lecture videos in the course's website the video player does literally this if I want to allow full screen I can add
this attribute too that I know exists by just having checked the documentation and if I now go back to my browser here
reload my directory listing there's iframe.html it's not going to fill the screen because I haven't customized the
Aesthetics yet but it does seem to embed a tiny little video there for you to play with later if you'd like so we
could change the width change the height get rid of that margin and so forth but an iframe is a way of embedding someone
else's web page in your web page if they allow it so as to create all the more of an interactive experience for them on
say your site all right well what the web is of course known for is things like links let's go ahead and create a
file called link.html and if we want to create a web page that actually links from itself somewhere else let's go
ahead and do this something very simple like visit uh harvard.edu period now in like Facebook Instagram a
lot of websites nowadays if you just type in a domain name or a fully qualified domain name it automatically
becomes a link that's because those websites have code in them that automatically detects something that
looks like a URL and turns it into a proper link HTML itself does not do that for you and so if I go back to my web
page here click on link.html if you type visit harvard.edu period that's all you're literally going to see but
instinctively even if you've never written HTML before what should we probably do here to solve this
problem what do I probably
want to add yeah a URL yeah so I want to surround the URL with some kind of link tag and you
wouldn't necessarily know this until someone told you or you looked it up but the tag for creating a link is
somewhat weirdly called the a tag for anchor it has an attribute called href for hyper reference which is like a link
in the virtual world to a URL so let me type in Harvard's full and proper URL here then I'm going to close the tag and
then I can still say harvard.edu and make that what the human sees but the place they're going to go should be a
full URL protocol and all HTTP or https and all now if I go back here and reload the page now it automatically gets
underlined it happens to be purple by default why because we visited harvard.edu a few minutes ago so my
browser by default is indicating in purple that I've been there before but now I have a link that I can click on and
if I hover over it but don't click you'll see that in most browsers there's a little clue as to where you will go if
you click subsequently on this link and without going too far down a rabbit hole but to tie together our discussion of
cybersecurity recently what if I were to do something like this right now you have the beginnings of a phishing attack
of sorts p h i s h i n g whereby you can create clearly a web page or heck even an email using HTML that tells the user
they're going to go one place but they're really going to go someplace else altogether and that is the essence
of phishing attacks these days if you've ever gotten a bogus email pretending to be from PayPal or your bank or some
other website odds are they've just written HTML that says whatever they want but the underlying tags might do
something very different and so having the instinct to look in the bottom left-hand corner or be a little suspicious
when you're just told blindly to click on a link it's this easy to socially engineer people that is deceive them by
just saying one thing and linking to another well what if I want to link my page to another page I already created
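
A minimal sketch of the anchor tag described above, assuming a file like link.html. The first link pairs a full URL in href with the text the human sees. The second shows the mismatch that makes phishing possible; its destination, example.com, is an arbitrary placeholder and not a domain taken from the lecture. The third sketches the same-site case the question above raises, with image.html as an assumed file name.

```html
<!DOCTYPE html>

<html lang="en">
    <head>
        <title>link</title>
    </head>
    <body>
        <!-- href holds the real destination; the text between the tags is what the human sees -->
        <p>Visit <a href="https://www.harvard.edu/">harvard.edu</a>.</p>

        <!-- Mismatch: the visible text names one site, href points somewhere else (placeholder) -->
        <p>Visit <a href="https://www.example.com/">harvard.edu</a>.</p>

        <!-- Relative link: just the name of another file in the same folder (assumed name) -->
        <p>See <a href="image.html">the photo</a>.</p>
    </body>
</html>
```

Hovering over the second link in most browsers should reveal the real destination, which is the habit the transcript recommends.
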
well if I want to link to like that photo of Harvard I can just do href equals quote unquote and the name of a
file in my same account that is itself a web page so this is how you can create relative links for
multi-page websites yourself so if I now reload this page hover over harvard.edu you'll see in the bottom left-hand
corner a very long URL but that's because I'm in codespaces right now in vs code and
it's appending automatically to the end of my current URL the file name image.html but this should work when I click
on this I go immediately to that file we created earlier with a crazy big version of the image but that's just a way that
one page on a website can link to another page on a website let's do one other thing here uh making things more
responsive because in fact that wasn't a particularly responsive website responsive means responding to the size
of the user's device which is so important when someone might be on a screen like this or on a screen like
this these days there are special tags we can use to tell the browser to modify its display based on the hardware so let
me create a file called responsive.html I'm going to copy paste some starting point here call this title responsive
and let me go ahead and just grab some of that lorem ipsum text from before just so that we have a
sizable paragraph to play with here and let me go ahead and grab this text here and I'm just going to paste this into
the body of this page and that's it so I just have a big paragraph at the moment inside of my body let me go back to my
browser let me open up this file called responsive.html to make the point that it is not yet responsive let me go ahead
and click on responsive.html that looks fine but here's another trick you can do using Chrome or Edge or other browsers
these days you can pretend to be another device let me go to view developer developer tools again last time we used
this to use the network tab which was kind of interesting because we could see what the underlying Network traffic is
but notice we can also click on this icon in Chrome at least that looks like a mobile phone I can turn my laptop into
what looks like a mobile device by clicking this I'm going to click the dot dot dot menu over here and just move the
dock instead of on the bottom where it might be by default I'm going to move it to the right hand side so that now on
the left you see what looks more like the shape of a vertical phone and in fact if I go to my Dimensions here I'll
choose something like iPhone x a few years back here's what that same website might look like on an iPhone x you know
that looks pretty damn small you know to be able to read it and that's because the website has not automatically
responded to the fairly narrow dimensions of the iPhone in question or Android device or whatnot so let me go
ahead and do this let me go back into my code and let me go into the head of the page and for the first time add another
tag up here this word is now all over the internet but there is a tag that is called meta that allows you to
specify the name of some kind of configuration detail here or property if you will viewport is the technical term
for the rectangular region that the human sees in a browser it's essentially the body of the page but only the part
the human is currently seeing and you can specify the content of the viewport should have an initial scale of one so
it shouldn't be zoomed in or out and the width that the browser should assume should be equal to the device's width
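
A minimal sketch of the meta tag just described, assuming a file like responsive.html with a single placeholder paragraph. The name is viewport, and the content sets an initial scale of 1 and a width equal to the device's width.

```html
<!DOCTYPE html>

<html lang="en">
    <head>
        <!-- Tell the browser not to zoom and to treat the page as wide as the device -->
        <meta name="viewport" content="initial-scale=1, width=device-width">
        <title>responsive</title>
    </head>
    <body>
        <p>
            Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod
            tempor incididunt ut labore et dolore magna aliqua.
        </p>
    </body>
</html>
```
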
these are sort of magical statements that you just have to know or copy paste or transcribe that just Express to the
browser assume that the width of the page is the same thing as the width of the device don't assume the luxury of a
big laptop or desktop computer now making only that change let me go back to my pretend iPhone here using Chrome's
developer tools let me reload the page and now it's not very effective on this screen
there we go let's do this so if I zoom in to 100% this would be on an actual physical
device much more readable than it would have been a moment ago even though I realize that demo was not necessarily
persuasive but it's as simple as telling the browser to size the page to the width of the device all right let me pause
here to see if there are any questions because that feels like enough HTML tags we'll add just a couple more
but for the most part HTML tags are things you Google and figure out over time just to build up your vocabulary
the basic building blocks are tags attributes some attributes have values some do not and that's sort of the
structure of HTML in essence questions on any of these though yeah do attributes have an order uh no
attributes can be in any order from left to right I tend to be a little nitpicky and so I alphabetize them if only
because then I can easily spot if something's missing if it's not there alphabetically most people on the
internet don't seem to do that yeah in the middle yeah good question I mentioned
that HTML is starting to replace other languages for user interfaces and it's not just HTML alone it's HTML with CSS
with JavaScript both of which we'll get a taste of here today um that rather has been the trend for portability and the
ability for companies for individual programmers to write one version of an app and have it work on Android devices
and iPhones and Macs and PCs and the like it is very expensive it is very time-consuming to learn a language like
Java and write an Android app learn another language called Swift and make an iOS app not to mention make them look
and behave the same not to mention fix a bug in one and then remember to fix it in the other I mean this is just very
painful and time-consuming and costly so this standardization on HTML CSS and JavaScript even for mobile apps and
web apps has been increasingly compelling because it solves problems like that all right so let's go ahead
and now do something that's finally interactive all of these Pages thus far are really just tastes of static content
content that does not change well let's go ahead and do this let me introduce one other format of URLs which
looks a little something like it did before so /path but it could actually be something like this /path?key=value
you might not have noticed or cared to notice the URLs in your URL bar every day but these things are
everywhere often when you type into a search engine like Google a search query whatever you just typed ends up in the
URL when you click on a link that contains some information there might be a question mark and then some keys and
values there might be an ampersand and more keys and values here again is that very common programming paradigm of just
associating keys with values we can see this as follows let me actually go to google.com in a browser here and let me
search for something the internet is filled with cats enter notice now that my URL changed from google.com to
google.com/search?q=cats ampersand and then a bunch of stuff that I don't understand or know
so let's just delete it for now and leave it with the essence of that URL and that still works if I zoom out here
years ago you would get pictures of cats now you get videos of the movie and then that top query there is
cats a bad movie but we can also of course click on images and there are the adorable cats creepy cats all right this
didn't used to happen when we searched for cats but anyhow the point is that the URL changed to include the user's input
and this is such a simple but such a powerful thing this is how humans provide input to servers they don't
manually create the URLs like I sort of just did but when you fill out a form on the web and you hit enter typically the
URL suddenly changes to include whatever you typed in assuming the form is using the verb GET that's not
ideal if you're typing in a username a password or credit card information because you don't want the next person
to sit down at your laptop to see literally everything you typed in saved in your history so there's another verb
POST that can hide all of that and it's just sent a little differently but things like this are typically sent via
GET and what that means underneath the hood is that your browser is just making a request like this GET /search?q=
whatever you typed in then the host that you visited and so forth and hopefully what comes back
is a page full of search results including cats and what's interesting here now is if I go back to vs code on
my own computer and let me go ahead and create a file called how about search.html in search.html I'm going to start
with some copy paste from before change my title to search and in the body of this page I'm going to introduce a form
tag and in this form tag I'm going to have a couple of inputs the type of the first input is going to be text and the
type of the second input is going to be submit and this isn't that interesting yet but let's see what's happening in the page
itself let me go back to my directory listing let me click on search.html I seem to have the beginning of my own
search engine it's not very interesting it's just a text box and a submit button but let's finish my thoughts here so
let's specifically give this text box a name of q which if you roll back to the late 90s when Larry and Sergey of
Google fame created google.com q represented query the query that the human's typing in so the name of this
text box shall be q the form is going to use what method technically it uses get by default but
I'll be explicit and say method equals quote unquote get stupidly it's lowercase in HTML even though what's in
the envelope is indeed uppercase by convention the action of this form specifically would ideally go to my own
server but we don't really have time today to implement Google itself so we're just going to send the user's
request to google.com/search so I'm creating a form the action of which is to send the data to Google's search path
using the get method it's going to send an input called q whenever I click this submit button let me go back to the
browser reload the page nothing seems to have changed yet but let me zoom out so we can see the URL bar
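
A minimal sketch of the form as described so far, assuming a file like search.html. The action is Google's search path, the method is get (lowercase in HTML), and the text box is named q, so whatever is typed should end up in the URL as the value of q.

```html
<!DOCTYPE html>

<html lang="en">
    <head>
        <title>search</title>
    </head>
    <body>
        <!-- On submit, the browser requests the action URL with ?q=... appended -->
        <form action="https://www.google.com/search" method="get">
            <input name="q" type="text">
            <input type="submit">
        </form>
    </body>
</html>
```

Typing cats and submitting should take the browser to a URL ending in /search?q=cats, which is the key value pair the transcript refers to.
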
right now I'm in search.html if I zoom out and search for cats now and click submit I'm whisked away to google.com
but notice that the URL is parameterized with that key value pair and I get back a whole bunch
of cat results and I can very easily now make this a little prettier right now it's not ideal that like the human has
to move their cursor and click in the box and it's a little obnoxious that autocomplete is enabled if I don't want
to search for cats anymore well according to HTML's documentation I can say something like this autocomplete
equals off to turn off autocomplete autofocus to automatically put the cursor inside of that text box if I want
some explanatory text I can put placeholder text like quote unquote query and now if I go back to this page
and reload now it's a little more user friendly you see query in kind of gray text the cursor is already there and
blinking I don't have to even move my cursor I can search for dogs now and you didn't see any autocomplete at
all hit enter to submit and now I'm searching for there we go adorable dogs instead so what have I done I've
implemented the front end of google.com just not the back end to implement the back end we're obviously going to need
like a really big database maybe something like SQL we're going to need some code that like searches the
database for dogs or cats or anything else we're going to need Python for something like that and in fact that's
the direction we're steering next week when we implement that back end so today it's all about this front end any
questions then about forms or these URL parameters before we now transition to making things look a little prettier
with CSS and then we'll end by making things a little more functional with JavaScript anything at
all no all right so let's start to answer a couple of the questions that came up by making these Pages a little
more aesthetically interesting let's go ahead now and introduce to the mix one other language as follows let me go
ahead and create a file called home.html as though I'm making a homepage for the very first time and in this page I'm
going to give a title of home and I'm just going to have like three things first I'm going to have maybe a
paragraph of text up here at the top that says something welcoming for my homepage like my name John Harvard for
instance or John Harvard's homepage then in the middle of the page I'm going to have some text like uh welcome to my
homepage exclamation point and at the bottom of the page I'm going to have a final paragraph that says something like
copyright the copyright symbol John Harvard or something like that all right so it's like a web page with three
different structural areas made with text this isn't that interesting if I open this page called home.html let me
go ahead and create three quick paragraphs the first paragraph for John Harvard in the middle I'm going to
say something like welcome to my homepage exclamation point and at the bottom a little
footer that says something like copyright a little simple copyright symbol and John Harvard's name all
right now let me reload the page and there we go it's a very simple very underwhelming web page that has three
main sections let's start to now stylize this in an interesting way so that it's a little more aesthetically pleasing
first these aren't really paragraphs they're sort of like areas of the page divisions like the header is up here
there's like the main part of my screen and then there's the footer of my screen so paragraphs isn't quite right if these
aren't really paragraphs of texts I might more properly call them divs or divisions of the page which is a very
commonly used tag in HTML which just has this generic rectangular region to it it does not do anything aesthetically no
bold facing no size changes it just creates an invisible rectangular region inside of which you can start to
style the text or I can take this one step further there's some other tags in HTML known as semantic tags that
literally have names that describe the parts of your page which is all the more compelling these days for accessibility
for screen readers for search engines because now a screen reader or a search engine can realize that the footer is
probably a little fluffy the header might be a little interesting the main part of the page is probably the juicy
part that I want users to be able to search for or read aloud uh substantively so let's start to stylize
this page somehow let's introduce a style attribute in HTML inside of which is going to be text like this font-size
colon large text-align colon center on main I'm going to add a style attribute and say font-size medium text-align
center and then on the footer I'm going to say style equals font-size small text-align center what's going on here well
in blue is the language we promised called CSS for cascading stylesheets we're not really seeing the cascading
stylesheet of it yet but in blue here notice is another very common Paradigm it's different syntax now but how would
you describe what you're looking at here in blue this is another example of what kind of
programming convention yeah it's just more key value pairs right it'd be nice if the world
standardized how you write key value pairs because we've now seen equal signs and arrows and colons and semicolons and
all this but it's just different languages different choices the key here is font-size the value is large
the other key is text-align the value is center the semicolon just separates one key value pair from
another just like in the URL the ampersand did in the context of HTTP the designers of CSS used semicolons instead
strictly speaking this last semicolon isn't necessary I tend to include it just for symmetry but it doesn't matter because
there's nothing after that this is a bit of a weird example this is the co-mingling of CSS inside of HTML
so as of now you can use the CSS language inside of the quote marks in the value of a style attribute we did
something a little similar a week plus ago when we included some SQL inside of Python so again
languages can kind of cross barriers together but we're going to clean this up because this is going to get messy
quickly certainly for large web pages the size of Harvard's or Yale's or the like so let's see what this looks like
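
A minimal sketch of the inline version just described, assuming home.html uses the header, main, and footer tags the styles are applied to. Each style attribute holds CSS key value pairs separated by semicolons; the exact page text is illustrative.

```html
<!DOCTYPE html>

<html lang="en">
    <head>
        <title>home</title>
    </head>
    <body>
        <!-- CSS lives inside the quotes of each style attribute -->
        <header style="font-size: large; text-align: center;">
            John Harvard
        </header>
        <main style="font-size: medium; text-align: center;">
            Welcome to my home page!
        </main>
        <footer style="font-size: small; text-align: center;">
            Copyright John Harvard
        </footer>
    </body>
</html>
```
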
let me go back to my browser window here reload the page and it's not that different but it's indeed centered and
it's indeed large medium and small text and let me make one refinement the copyright symbol actually can be
expressed but there's no key on my US keyboard here I can actually magically say ampersand hash 169 semicolon using
what's called an HTML entity turns out there are numeric codes with this weird syntax that allow you to specify symbols
that exist in Macs and PCs and phones but that don't exist on most keyboards if I reload the page now now it's a
proper Copyright symbol so minor aesthetic but it introduces us to these HTML
entities so even if you've never seen CSS before you can probably find something kind of dumb about what I did
here like poor design it is correct if my goal was small medium and large bottom up what looks like a bad design
perhaps even if you've never seen this language before yeah yeah I've used the same style three times like copy paste
or typing the exact same thing again and again has rarely been a good thing well here's where we can take advantage of
the design of CSS because it supports what we might call inheritance whereby children inherit the properties
the key value pairs of their parents or ancestors and what that means is I can do this let me get rid of this text-align
let me get rid of this text-align let me get rid of this one I could get rid of the semicolon too but I'll leave it for
now and let me add that text-align style to the parent element the body so that it sort of cascades down to the header the
main and the footer tags as well and let me close my quotes there too now if I go back to my browser and hit reload
nothing changes but it's a little better designed right because if I want to change the text alignment to maybe be
right aligned I can now reload the page and voila now it's over there I change it in one place not in three different
places so that would seem to be marginally better design and could we do this any differently
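
A minimal sketch of the inheritance step just described, assuming the same home.html. The text-align pair moves up to the body once, and the header, main, and footer inherit it while keeping their own font sizes.

```html
<!DOCTYPE html>

<html lang="en">
    <head>
        <title>home</title>
    </head>
    <!-- text-align is set once on the parent and inherited by its children -->
    <body style="text-align: center;">
        <header style="font-size: large;">
            John Harvard
        </header>
        <main style="font-size: medium;">
            Welcome to my home page!
        </main>
        <footer style="font-size: small;">
            Copyright &#169; John Harvard
        </footer>
    </body>
</html>
```

Changing center to right in that one place should move all three sections, which is the design benefit the transcript points out.
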
it's not that elegant that it's all just kind of in line with my HTML this generally tends to be bad practice where
you co-mingle your HTML and your CSS especially since some of you might be really good at like laying out the
structure of web pages and the content and the data and you might have a horrible sense of design or just not
care about the Aesthetics you might work with a designer an artist who's much better at all of these fine tunings
aesthetically wouldn't it be nice if you could work on the HTML they could work on the CSS and you don't have to somehow
like literally edit the same lines of code as each other well just like we can move stuff into header files in C or
packages in Python we can do the same in CSS so I'm actually going to go ahead and do this let me get rid of all of
these style attributes and let me now start to practice a convention of not co-mingling CSS with my HTML let me
instead move it into the head of the page in a style tag instead of an attribute this is one of the rare
examples where there are attributes that have the same names as tags and vice versa it's not very common but this one
does exist here's a slightly different syntax for expressing the same key value pairs if I want to apply CSS properties
that is key value pairs to the header of the page I say header and then I use curly braces and inside of those I say
font-size large text-align center then if I want to apply some properties to the main
section of the page I again do font-size say medium and then I can do text-align center then lastly on the footer of the
page I can assign some properties like font-size small and then text-align center semicolon and I don't have to do
anything more in my HTML it all just represents the structure of my page but because of this style tag in the head of
the page the browser knows in advance that the moment it encounters a header tag a main tag or a footer tag it should
apply those properties those styles if I reload the page other than it being recentered now there's no other changes
all we're doing is sort of iteratively improving the design here but now everything's in the top of the file but
there's still a bad design here what could I now do that would be smarter similar problem to before yeah
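
A minimal sketch of the style tag version described above, assuming the same home.html. The rules sit in the head and select tags by name, and the repeated text-align line is left in on purpose because it is the redundancy the question is about.

```html
<!DOCTYPE html>

<html lang="en">
    <head>
        <style>
            /* Type selectors: each rule applies to every tag with that name */
            header
            {
                font-size: large;
                text-align: center;
            }

            main
            {
                font-size: medium;
                text-align: center;
            }

            footer
            {
                font-size: small;
                text-align: center;
            }
        </style>
        <title>home</title>
    </head>
    <body>
        <header>John Harvard</header>
        <main>Welcome to my home page!</main>
        <footer>Copyright &#169; John Harvard</footer>
    </body>
</html>
```
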
okay create a new file with just the CSS I like that let's go there in just one second but even as we're here there's
still a redundancy we can probably chip away at yeah get rid of the text-align center in three different places which
doesn't seem necessary and perhaps someone else if I get rid of text-align center what should I add to my style tag
in order to bring it back but apply it to everything in the page and the page if I scroll down looks like this in HTML
yeah so the body tag so let me go ahead and say body and then in here put text-align center and that now if I
reload the page has no visual effect but it's just better designed because now I factored out that kind of
commonality and so just to make clear what we've been doing here these are all again CSS properties these key value
Pairs and there's different types of ways of using them and there's this whole taxonomy what we've been doing
thus far are what we're going to call type selectors where the type is the name of a tag and so it turns out
there's other ways though to do this and let's head in this direction let's go ahead and maybe write our CSS slightly
differently because you know what would be nice I bet after today once I start creating other files for my homepage or
John Harvard's homepage I might want to have centered text on other pages and I might want to have large text or medium
text or small text it'd be nice if I could reuse these properties again and again and kind of create my own Library
maybe even ultimately putting it in a separate file so let me do this instead of explicitly applying text align Center
to the body let me create a new noun or an adjective rather for myself called centered it has to start with a dot
because what I'm doing is inventing my own class so to speak this has nothing to do with classes in Java or Python
class here is this aesthetic feature and actually let me rename these to be large medium and small what this is doing for
me is it's inventing new words well-named words that I can now use in this file or potentially in other web
pages I make as follows I can now say if I want to Center the whole body I can say class equals centered on the header
tag I can say class equals large on the main tag I can say class equals medium on the footer tag I can say class equals
small but let me take this one step further as you suggested why don't I go ahead now and
let me grab all the CSS copy it to my clipboard let me get rid of the style tag here and create a new file called
home.css and let me just save all of that same text in a separate file ending in .css nothing else no HTML whatsoever
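
A minimal sketch of the class-based rules described above, shown here still inside a style tag so the example fits in one file; per the transcript the same rules were then copied into home.css. The class names centered, large, medium, and small are the ones from the lecture, and the page text is illustrative.

```html
<!DOCTYPE html>

<html lang="en">
    <head>
        <style>
            /* Class selectors start with a dot and apply to any tag with that class */
            .centered
            {
                text-align: center;
            }

            .large
            {
                font-size: large;
            }

            .medium
            {
                font-size: medium;
            }

            .small
            {
                font-size: small;
            }
        </style>
        <title>home</title>
    </head>
    <body class="centered">
        <header class="large">John Harvard</header>
        <main class="medium">Welcome to my home page!</main>
        <footer class="small">Copyright &#169; John Harvard</footer>
    </body>
</html>
```
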
but let me go back to my home.html page and this is one of the most annoyingly named tags because it doesn't really
do what its name suggests link href equals home.css rel equals stylesheet so ideally we would have used the link tag for links
in web pages but this is link in the sort of conceptual sense we're linking this file to this other one so that they
work together using this hyper reference home.css the relationship of that file to this one is that of stylesheet a
stylesheet is a file containing a whole bunch of stylizations a whole bunch of properties as we just did so here too
it's underwhelming the effect if I reload the page nothing changed but now I not only have a better design here
because I can now use those same classes in my second page that I might make my third page my fourth page my bio my you
know resume page whatever it is I'm making on my website here I can reuse those styles by just including one line
of code instead of copying and pasting all of that style stuff into file after file after file and heck if the rest of
the world is really impressed by my centered class and my large and medium and small classes I could bundle this up
let other people on the internet download it and I have my own Library my own CSS library that other people can
use why should you ever invent a centered class again if I already did it for you stupid and small as this one is
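
A minimal sketch of the external stylesheet setup described above, assuming home.css sits in the same folder as home.html. The comment lists what home.css would be expected to contain; the file itself holds only CSS and no HTML.

```html
<!DOCTYPE html>

<!--
    Assumed contents of home.css, saved as a separate file:

    .centered { text-align: center; }
    .large    { font-size: large; }
    .medium   { font-size: medium; }
    .small    { font-size: small; }
-->

<html lang="en">
    <head>
        <!-- One line pulls in the shared classes; any other page can reuse it -->
        <link href="home.css" rel="stylesheet">
        <title>home</title>
    </head>
    <body class="centered">
        <header class="large">John Harvard</header>
        <main class="medium">Welcome to my home page!</main>
        <footer class="small">Copyright &#169; John Harvard</footer>
    </body>
</html>
```
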
but it would be nice now to package this up in a way that's usable by other people as well so this is perhaps the
best design when it comes to CSS use classes where you can use external style sheets where you can but don't use the
style attribute where we began which while explicit starts to get messy quickly especially for large
files all right any questions then on this no all right so that's class selectors when you specify dot something
that means you're selecting all of the tags in the page that have that particular class and applying those
properties so there's a couple of others here just to give you a taste now of what's possible there there's so much
more that you can actually do with HTML and CSS together let me go ahead and open up a few examples that I did here
in advance let me go ahead and open up vs code and let me go ahead and copy um my source 8 directory
give me one sec to grab the source 8 directory for today's lectures so that I can now go into my
browser go into some of the pre-made examples in source 8 and let me open up paragraphs1 here so here's something
it's a little subtle but does anyone notice how this is stylized this is just some generic lorem ipsum
text again but what's noteworthy stylistically a book might do this yeah yeah the first paragraph's a little
bigger why who knows it's just a stylistic thing at the beginning of the chapter the first paragraph is bigger
how did we do that well we can actually explore this in a couple of ways one I can obviously go into vs code and show
you the code but now that we're using Chrome and we're using these developer tools let's again go into them view
developer developer tools and now notice let me turn off the mobile feature and let me move the doc back to the bottom
just so that it's fully wide we looked at the network tab before we looked at the mobile button before now let me
click on elements what's nice about the elements tab is you can see a pretty printed version of the web pages HTML
nicely colorcoded syntax highlighted for you so that you can now henceforth learn from look at the source code the HTML
source code of any web page on the internet notice that my own web page here it's not that interesting there's a
bunch of paragraph tags of Laur some text but notice what I did the very first one I gave an ID to this is
something that you as a web designer can do you can give an ID attribute to any tag in a page to give it a unique
identifier the onus is on you not to reuse the word anywhere else if you reuse it you've screwed up it's
incorrect behavior but I chose an ID of first just so that I have some way of referring to the very first paragraph in
this file if I look in the head of the page and the style tag here notice that I have hash first so just as I use a dot for
classes the world of CSS uses a hash symbol to represent IDs unique IDs and what this is telling the browser is
whatever element has the first ID f i r s t without the hash apply font-size larger to it and that's why the first paragraph
and only the first paragraph is actually stylized if I actually go into vs code now and let me go into my source
directory let me open up paragraphs1.html here's the actual file if I want to change the color of that first paragraph
to green for instance I can do color colon green let me close the developer tools reload the page and now that paragraph
is green as well you don't have to just use words you can use hexadecimal what was the hex code for green in
RGB like no red lots of green no blue so you could do hash 00ff00 using a hash which coincidentally is the same symbol
but it has nothing to do with IDs this is just how Photoshop and web pages represent colors let's go back here and
reload it's the same although it's a slightly different version of green this is pure green here if I want to change
it to red that would be let's see RGB ff0000 and here I can go and reload now its first paragraph is red this actually
gets pretty tedious quickly like if you're a web designer trying to make a website for the first time it actually
might be fun to Tinker with the website before you open up your editor and you start making changes and save and reload
that's just more steps so notice what you can do with developer tools too in Chrome and other browsers when I
highlight over this paragraph under the elements tab notice that one it gets highlighted in blue if I move my cursor
it doesn't get highlighted if I move it it gets highlighted so it's showing me what that tag represents but notice over
here on the right you can also see all of the stylizations of that particular element some of them are built in the
italicized ones here at the bottom means user agent stylesheet that means this is what Google makes all paragraphs look
like by default but in non-italicized here you see hash first which is my code that I just changed and if I want to
start tinkering with colors I can do like #0000ff enter I changed it to blue but notice if I go back to VS Code
I didn't change my original VS Code code this is now purely client side and this is a key detail when I drew that picture
earlier of the browser making a request to the server in the cloud and the response coming back the
browser your Mac your PC your phone has a copy of all the HTML and CSS so you can change it here however you actually
want and for instance you can do this with any website let's go say on a field trip here to how about
stanford.edu so here's Stanford's website as of today let's go ahead here and let's see there's their
admissions page campus life and so forth let me go ahead and view developer tools on Stanford's page developer tools
elements you can see all of their HTML and notice it's collapsed so here is their header here's their main part and
you can I'm using my keyboard shortcuts to just open and close the tags to dive in deeper and deeper suppose you want to
kind of mess with Stanford you can actually like right click on any element of a page or control click inspect and
that's going to jump you automatically to the tag in the elements tab that shows you that link and notice if I
hover over this Li notice Stanford's using a list as an unordered list from left to right
though it doesn't have to be a bulleted list top to bottom they've used CSS to change it to be a list from news events
academics research Healthcare campus admission about well so much for admission that's gone so now if I close
developer tools now it's gone from Stanford's website but of course what have I really
done I've just like mutated my own local copy so this is not hacking even though this might be how they do it on TV and in
the movies it's still there if I reload the page but it's a wonderfully powerful way to one just iterate quickly and try
different things stylistically figure out how you want to design something and two just learn how Stanford did
something so for instance if I right click or control click on admission again go to inspect and let me go to the
LI tag let me keep going up up up up up to the UL tag there's going to be a lot going on here but notice they have
applied all of these CSS properties to that particular UL tag but notice here it's something like this and
we'd have to read more to learn how this works list-style-type none this is how they probably got rid of the bullets and
what you can do is just Tinker like all right well what does this do well let me uncheck it all right didn't really
change anything font weight uncheck this there we go so now the margin is changed the padding around it has changed let's
get rid of this we can just start turning things on and off just to get a sense of how the web page works I'm not
really learning anything here so far let me go to the LI here for uh let's go to the admissions one here um
margin there we go okay so there's a display property in CSS that's apparently effectively changing things
from vertical to horizontal if I turn that off now Stanford's links all look like this and there are those bullets so
again just default styles that they've somehow overridden and a good web designer just knows ultimately how to do
these kinds of things all right I have a couple final building blocks before we'll take one more break and then we'll
dive in with JavaScript to manipulate this stuff programmatically let me go ahead and open up how about paragraphs
two here let me close this tab let me go into paragraphs2.html which is pre-made and this one looks the same except when I go
ahead and inspect this first paragraph notice that I was able to get rid of the ID somehow which is just to say there's
many many ways to solve problems in HTML and CSS just like there are in C and Python let me look in the head and the style of
the page now this is another type of selector that allows us to
specify the paragraph tag that itself happens to be the first child only so you can apply CSS to a very specific
child namely first-child there's also syntax for last-child if just the first one is supposed to look a little
different so here I've just gotten out of the business of creating my own unique identifier and instead I'm using
this type of selector as well well what more can we do let me go into another example here
called link1.html and here we have a very simple page that just says visit Harvard but notice it's purple by
default because we've been to harvard.edu before let's see if we can't maybe stylize Harvard's links to be a
little different let me go into link version two now which looks like this and now Harvard is very red how did I do
that well let me right click on it click inspect and I can start to poke around it looks like my HTML is not at all
noteworthy it's just very simple HTML an anchor tag with an href so let's look at the style let me zoom out and we can
look at it in two different ways we can literally look at the style contents here or we can look at Chrome's pretty
version of it over here it looks like my style sheet in the style tag has changed the color to be red and the text
decoration which is a new thing but it's another CSS property To None notice if I turn that off links on the internet are
underlined by default which tends to be good for familiarity for visibility for accessibility but if it's very obvious
what is text and what is a link maybe you change text decoration to none but maybe watch this maybe the link comes
back the line comes back when you hover over it well let's look at how I did this in style notice that I have
stylization and I put my uh curly braces on the same line here as tends to be convention in CSS color is red text
decoration is none but whenever an anchor tag is hovered over you can change the text decoration to be
back to the default underline so again just little ways of playing around with the Aesthetics of the page once you
understand that really there's just different types of selectors and you might have to remind yourself look them
up occasionally as to what the syntax is but it's just another way of scoping your properties to specific tags let's
look at a version three of this here which adds Yale to the mix if I go to link3.html maybe I want to have
Harvard links red Yale links blue how might I have done this well let's right click and click
inspect and here we might have two links with a couple of techniques just to again emphasize you can do this so many
different ways I gave my Harvard link an ID of Harvard my Yale link an ID of Yale in my CSS if we go to the head of
the page I then did this the tag with the Harvard ID AKA hash Harvard should be red hash Yale should be blue and then any
anchor tag should have no text decoration unless you hover over it at which point it should be underlined and
so if I hover over Harvard it's red underlined Yale it's blue underlined if I want to get rid of the IDS I can do
this a slightly different way let me go into link four same effect but notice I got rid of the IDS now how else can I
express myself well let's look at the CSS here the anchor tag has no text decoration by default unless you're
hovering over it and this is kind of cool this is what we would call on our list here an attribute selector where
you select tags using CSS notation based on an attribute so this is saying go ahead and find any anchor tag whose href
value happens to equal this URL and make it red do the same for Yale and make it blue now this might not be ideal because
if there's something after the slash these equal signs don't work because if it's a different Harvard or different
Yale link this is a little too precise so let me look at version five here of link.html look at this style and I
did this a little smarter this is new syntax and again just the kind of thing you look up star equals means change any
anchor tag whose href contains anywhere in it harvard.edu to red and do the same thing for Yale based on star equals so
star here connotes wild card so search for harvard.edu or yale.edu anywhere in the href and if it's there colorize the link
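
The difference between the exact-match selector and the star-equals selector can be mimicked in plain JavaScript. This is only an analogy, not how a browser implements CSS: `===` stands in for a selector like `a[href="https://www.harvard.edu/"]`, and `String.prototype.includes` stands in for `a[href*="harvard.edu"]`. The function names and the sample URLs are made up for illustration.

```javascript
// Analogy for a[href="..."]: the whole attribute value must be identical.
function matchesExactly(href, value) {
  return href === value;
}

// Analogy for a[href*="..."]: the fragment may appear anywhere in the value.
function matchesAnywhere(href, fragment) {
  return href.includes(fragment);
}

// Illustrative links; the second one has something after the slash.
const links = [
  "https://www.harvard.edu/",
  "https://www.harvard.edu/admissions",
  "https://www.yale.edu/",
];

for (const href of links) {
  console.log(
    href,
    "| exact:", matchesExactly(href, "https://www.harvard.edu/"),
    "| contains:", matchesAnywhere(href, "harvard.edu")
  );
}
```

The second link fails the exact comparison but passes the substring one, which is why the star-equals form is the more forgiving choice for coloring every Harvard link.
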
and again we could do this all day long with diminishing returns to actually achieve the same kind of stylizations in
different ways and as projects just get larger and larger you just have more and more decisions to make and so you have
certain conventions you start to adopt and indeed if I may you have the introduction of what are called
Frameworks ultimately if you're a full-time web developer or you're working for a company doing the same you
might have internal conventions that you adhere to for instance the company might say always use classes don't use IDs or
always use attribute selectors or don't use this and it wouldn't necessarily be as draconian as that but
they might have a style guide of sorts but what many people and many companies do nowadays is they do not come up with
all of their own CSS properties they start with something off the shelf a framework typically a free and open
source framework that just gives them a lot of pretty stylizations for free just by using a third-party library and one of
the most popular ones nowadays is something called Bootstrap that CS50 uses on all of its websites super
popular in industry as well it's at getbootstrap.com which documents the library
that they offer and there's so much documentation here but let me just go to things like uh how about components it
just gives you out of the box the CSS with which you can create little alerts if you've ever noticed on cs50's website
little colorful warnings at the top of the page or call outs to draw your attention to things how did we do that
it's probably a paragraph tag or a tag and maybe we changed the font color we changed the background color right it's a
lot of stuff we could absolutely do from scratch but you know what why reinvent the wheel if we can just use
Bootstrap's so for instance let me just scroll down if you've ever seen on CS50's website a yellow warning alert
like this let me just zoom in on this we are just using HTML like this we're using a div tag which again is an
invisible division a rectangular region of the page but we're using classes called alert and another class called
alert-warning those are classes that the folks at Bootstrap invented they associated certain text
colors and background colors and padding and margin and like other aesthetics with so all we have to do is use those
classes role equals alert just makes clear to like a screen reader that this is an alert that should probably be
recited and whatever's in between the open tag and close tag is what the human would see how do you use something like
Bootstrap well you just read the documentation under getting started there is a link tag you copy paste into
your own so let me do this so in table.html we had code like this let me actually read Bootstrap's documentation
really fast and they tell me dot dot dot copy paste this code I'm going to put this into the head of my page and it's
quite long but notice it's a link tag which I used earlier for my own CSS file the href of which is this CDN link
content delivery Network that's referring to a specific version of bootstrap that's available on this day
and the file that I'm including is called bootstrap.min.css this is an actual file
I can visit with my browser if I open this in a separate tab this is the CSS that bootstrap has made freely available
to us crazy long no whitespace that's because it's been minified just to not waste space by adding lots of white
space and comments but this contains a whole lot hundreds of CSS properties that we can reuse thanks to classes
that they invented if I want to use some JavaScript code I can also copy this script tag but come back to that before
long let me now just make a couple of tweaks to this table if I go into my browser from before this is what it
looked like previously where name and number were bold and centered and then Carter and David were on the left and
the numbers were to the right you know it's fine it's not that pretty but it'd be nice if it were a little prettier
than that so if we add Bootstrap into it notice one thing happens first when I reload the page no longer are Chrome's
default styles used now Bootstrap's default styles are used which is a way of enforcing similarity across Chrome
Edge Firefox Safari and others notice it went from a serif font to a sans-serif font and something cleaner like this it
still looks pretty ugly but let me go into Bootstrap's documentation let me go under their content tab for tables
and if I just kind of start skimming this these are some good-looking tables right like there's some underlining here
some bolder font there's a dark line if I keep going ooh that's getting pretty too if I want to have a
colorful table like I could figure all of this stuff out myself if I want sort of dark mode here uh if I want to have
uh alternating highlights and so forth there's so many different stylizations of tables that I could do myself but I
care about making a phone book not about reinventing these wheels so if I read the documentation closely it turns out
that all I need to do is add Bootstrap's table class to my table tag and watch with a simple reload what my table.html
file now looks like much nicer right might not be what you want but my god with like two lines of code I just
really prettied things up and so here then is the value of using something like a framework it allows you to
actually create much prettier much more user-friendly websites than you might otherwise be able to make on
your own certainly quickly in fact let's iterate one more time on one other example before we introduce a bit of
that code let me go ahead and open up search.html from before which recall looked like this and search.html
in my browser was this very simple Google search and suppose I want to reinvent google.com's UI a bit more
here's a screenshot of google.com on a typical day it's got an about link a store link Gmail images these weird dots
sign in their logo it's not appearing well on the screen here but there's a big text box in the middle and then two
buttons Google search and I'm feeling lucky well could I maybe go about implementing this UI myself using some
HTML some CSS and maybe Bootstrap's help just so I don't have to figure out all of these various stylizations well
here's my starting point in search.html let's go and add in Bootstrap first and foremost so that we have access to all
of their classes that are reusable now and let me go ahead and figure out how to do this well just like Stanford's
site had like a navigation bar using a UL but they changed it from being a bulleted list to being left to right I
bet I can do something like this myself so let me go into the body of my page and first based on Bootstrap's
documentation let me add a div with a class of container-fluid container-fluid is just a class that
comes with Bootstrap that says make your web page fluid that is grow to fill the window so that way it's going to resize
nicely I'm going to go ahead and fix my indentation here if you haven't discovered this yet if you highlight
multiple lines in VS Code you can hit tab and indent them all at once so now I have all of that inside of this div now
just like in Stanford's site let's create an unordered list that has maybe an li with a class of nav-item
and then in here let me go ahead and say
a href equals https://about.google which is the real URL of Google's about page and I'll put the about text in
there then I'm going to close my li tag here and I want to do one other thing because I'm using Bootstrap Bootstrap's
documentation if I read it closely says to add classes to your links called nav-link and text-dark to make it dark
like black or dark gray instead of the default blue all right so I think I have now an about link in a navigation
part of my screen let me go ahead and save this and reload all right so not exactly what I
wanted it's a bulleted list still so I need to override this somehow let me read bootstraps documentation a little
more clearly and let me pretend to do that for time's sake if I go under content whoops if I go under components and I go
to navs and tabs long story short if you want to create a pretty menu like this where your links are from the left to
the right just like Stanford I essentially need HTML like this and this is subtle but I left off this class I
should have added a class called nav on my UL so that was my bad let me go in here and add class equals nav and
then again this class nav-item Bootstrap told me to nav-link text-dark Bootstrap told me to let me go back to my page
here reload and okay still kind of ugly but at least the about link is in the top left hand corner just like it should
be in the real google.com now let me whip up a couple of more links real fast let me go and do a little copy paste so
I bet next week we can avoid this kind of copy paste let me change this link to be store.google.com the text will be
store let me go ahead and create another one here for Gmail so this one's going to go to
officially how about technically it's www.google.com/gmail normally it just redirects and let me grab one more of
these for Google Images and I'm going to paste this whoops I'm going to put this here too
this is going to be images and that URL is imghp is their URL all right let me go
ahead and reload the browser page now it's coming along right about store Gmail images it's not quite what I want
so I'd have to read the documentation to figure out how to maybe nudge one of these over to start right aligning it
and there's a couple of ways to do this but one way is if I want Gmail to move all the way over and push everything
else I can add some margin to the Gmail list item margin start auto this is in Bootstrap's
documentation a way of saying whatever space you have just automatically shove everything apart and now if I reload the
page again now voila Gmail and images are over to the right all right so now we're kind of moving along let me go ahead and
add the big blue button to sign in so here with signin let me go ahead and over in my same nav yeah so let's go
ahead and do one more Li class equals nav item and then inside of this Li tag what am I going to do turns out there is
a class that can turn a link into a button if you say btn for button and then btn-primary makes it blue the
href for this one is going to be https://accounts.google.com/ServiceLogin which is literally where
you go if you click on that big blue button the role of this link is that of button and then sign in is going to be
the text on it if I now reload the page now we're getting even closer although it looks a little stupid notice that
signin is way in the top right hand corner whereas the real google.com has a little bit of margin around it okay
that's an easy fix too let me go back into my HTML here let me add margin -3 this two is a bootstrap thing they have
a class called m- something the something is a number from like one to five I believe that adds just some
amount of white P white space so if I reload now okay it's just a little prettier and now let me accelerate just
to demonstrate how I can take this home let me go ahead and open up my pre-made version of this whereby I added to this
some final flourishes if I go to search2.html I decided to replace their logo with just this image of a cat and notice
that I reimplemented essentially google.com here's a text box here's two buttons even though they're a little
washed out on the screen I even figured out how to get dots that look pretty similar to Google's and if we view
Source you can see how I kind of finished this code if I go to view developer tools and I go to
elements and I go into this div and I go into this div you'll see that here's an image tag for happy cat and I added some
classes there to make it fluid and width 25% of the screen if I go into the form tag this is the same form tag as before
but notice I used button tags this time with btn and btn-light classes and then I stylized them in a
certain way and so in the end result if I want to go ahead and search now for birds and click Google search voila I've
implemented something that's pretty darn close to google.com without even touching raw CSS myself and now here's
the value then of a framework you can just start to use off-the-shelf functionality that someone else created
for you but if you want to make refinements you don't really like the shade of blue that Bootstrap chose or
the gray button or you want to curve things a bit more that's where you can create your own CSS file and do the last
mile sort of fine-tuning things and that tends to be best practice stand on the shoulders of others as much as you can
using libraries and then if you really don't like what the library is doing then use your own skills and
understanding of HTML and CSS to refine things a bit further but still after all of that all of these examples we've done
thus far are still static other than the Google one which searches on the real google.com let's take a final 5 minute
break and we'll give you a sense of what we can do next week onward with JavaScript see you in five all right so
I think it's fair to say we're about to see our very last language next week and final projects are ultimately going to
be about synthesizing so many of these thankfully this language called JavaScript is quite similar
syntactically to both C and Python and indeed if you can imagine doing something in either of those you can
probably do it in some form in JavaScript the most fundamental difference today though is that when you
have written C code and Python code thus far you've done it on the server you've done it in the terminal window
environment and when you run the code it's running in the cloud on the server the difference now today with JavaScript
is even though you're going to write it in the cloud using VS Code recall that when a browser gets the page containing
this code it's going to get a copy of the HTML the CSS and the JavaScript code so JavaScript that we see today is all
going to be executed in the browser on users' own Macs PCs and phones not in the server JavaScript can be used on the
server using an environment called node.js it's an alternative to python or Ruby or Java or other languages we are
using it today client side which is a key difference so in scratch let's do this one last time if you wanted to
create a variable in scratch setting counter equal to zero in JavaScript it's going to look like this you don't
specify the type but you do use the keyword let and there's a few others as well so you'd say let counter equals zero
semicolon if you want to increment that variable by one you in JavaScript could say something like counter equals
counter plus one or you can do it more succinctly with plus equals or the plus plus is back in JavaScript you can now say
counter plus plus semicolon again in scratch if you wanted to do a conditional like this asking if x less than y it looks
pretty much like C the parentheses are unfortunately back the curly braces here are back if you have multiple
statements in particular but syntactically it's pretty much the same as it was for if for if else and even
for if else if else unlike Python it's two words again else if so quite like C nothing new beyond that if you
want to do something forever in scratch you'd use this block in JavaScript you can do it a few ways similar to python
similar to C you say while true in JavaScript booleans are lowercase again just like in C so it's lowercase true if
you want to do something a finite number of times like repeat three times looks almost like C as well the only
difference really is using the word let here instead of int and again you'll use let to create a string or an int or any
other type of variable in JavaScript the browser will figure out what type you mean from context in C we would have
said int instead and that's it for our tour of JavaScript syntax there's bunches of
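
Since the slides are not part of this transcript, here is a minimal sketch of the syntax tour just described. The variable and function names (`counter`, `compare`, `lines`, `ticks`) and the string "meow" are arbitrary choices for illustration, and the forever loop is given a `break` only so that the sketch terminates.

```javascript
// Declare a variable with let, without naming its type.
let counter = 0;

// Three equivalent ways to add one.
counter = counter + 1;
counter += 1;
counter++;

// Conditionals keep C's parentheses and curly braces,
// and "else if" is two words again.
function compare(x, y) {
  if (x < y) {
    return "x is less than y";
  } else if (x > y) {
    return "x is greater than y";
  } else {
    return "x is equal to y";
  }
}

// Repeat three times, using let where C would use int.
const lines = [];
for (let i = 0; i < 3; i++) {
  lines.push("meow");
}

// A forever loop is written while (true), with a lowercase true.
let ticks = 0;
while (true) {
  ticks++;
  if (ticks === 3) {
    break; // only here so the example stops
  }
}

console.log(counter, compare(1, 2), lines, ticks);
```
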
other features but syntactically it's going to be that accessible relatively speaking the power of JavaScript running
in the user's browser is going to be that you can change this thing in memory think about most any website that's at
all interesting today that you use it's typically very interactive and dynamic if you're sitting in front of Gmail on a
laptop or desktop with the browser tab open and someone sends you an email all of a sudden another row appears in your
inbox another row another row how is that implemented honestly it could be an HTML table maybe it's a bunch of divs
top to bottom the point though is you don't have to hit Command-R or Control-R to reload the page to see more email it
automatically appears every few seconds or minutes how is that working when you visit
gmail.com You are downloading not just HTML and CSS with your initial inbox presumably you're downloading some
JavaScript code that is designed to keep talking every second every 10 seconds or something to Gmail servers and they then
are using their code to add another element another element another element to the existing DOM document object
model which is the fancy term for tree in memory that represents HTML so that the web page can continue to update in
real time Google Maps same thing if you click and drag and drag and drag your browser did not download the entire
world to your Mac or PC by default it only downloaded what's in your viewport the rectangular region but when you
click and drag it's going to get some more tiles up there some more images some more images as you keep dragging
using JavaScript again behind the scenes so let's actually use JavaScript to start interacting with Pages how can we
do this we can put the JavaScript code in the head of the page in the body of the page or even Factor it out to a
separate file so let's take a look here is a new version of hello.html that during the break I just added a form to
because it'd be nice if this page didn't just say hello title hello body it said hello David hello Carter hello whoever
uses it I've got a form that I borrowed from some of our earlier code and that form has an input whose ID is name and
that also has a submit button but there's no code in this yet so let's add a little bit of JavaScript code as
follows suppose that when this form is submitted I want to greet the user how can I do that well let's do it the
somewhat messy way first I can add an attribute called onsubmit to the form element and I can say onsubmit call a
function called greet close quotes unfortunately this function doesn't yet exist but I can make it exist but
there's another detail here when the user clicks submit normally forms get submitted to the server I don't want to
do that today I want to just submit the form to the browser keep on the same page and just print on the screen hello
David or so forth so I'm also going to go ahead and say return false and this is a JavaScript way of telling the
browser even when the user tries to submit the form return false like no don't let them actually submit the form
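
Returning false from an inline handler cancels the form's default submission, which is the same effect as calling `preventDefault()` on the event. The sketch below shows that idea outside a browser: a bare `EventTarget` stands in for the form, and a cancelable `Event` named "submit" stands in for the user's click. Both are assumptions made so the code runs on its own (for example in a recent Node.js); in a real page the browser creates the event and the target is the form element.

```javascript
// Stand-in for the <form> element; EventTarget is built into browsers
// and into recent Node.js versions.
const form = new EventTarget();
const greetings = [];

function greet() {
  greetings.push("hello there");
}

form.addEventListener("submit", function (event) {
  greet();                // do call the function...
  event.preventDefault(); // ...but cancel the default action, the real submission
});

// Simulate the user clicking submit. dispatchEvent returns false when
// a handler canceled a cancelable event.
const wentThrough = form.dispatchEvent(new Event("submit", { cancelable: true }));

console.log(greetings);   // [ 'hello there' ]
console.log(wentThrough); // false
```
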
but do call this function called greet in the head of my page I'm going to add a script tag wherein the language is
implicitly JavaScript and has no relationship for those of you who took AP CS with Java just a similarly named
language but no relation I am going to name a function called greet apparently in JavaScript the way you create a
function is you literally say the word function instead of def you don't specify a return type and in this
function I could do something like this alert quote unquote how about hello there initially I'm going to keep
it simple using a built-in function called alert which is not a good user interface there are better ways to do
this but we're doing something simple first let me now go ahead and load this page again it still looks as simple as
before with just a simple text box I'll zoom in to make it bigger I'm going to type my name but I think it's going to
be ignored when I click submit it just says hello there and this is again this is an ugly user interface it literally
says the whole codespace URL of the web page is saying this to you it's really just meant for uh simple interactions
like this for now all right let's have it say hello David somehow well how can I do this well if this element on the
page was given by me a unique ID it'd be nice if just like in CSS I can go grab the value of that text box using code
and I actually can let me go ahead and do this let me store in a variable called name the result of calling a
special function called document.querySelector this querySelector function is JavaScript's version of what we were
doing in CSS to select nodes using hashes or dots or other syntax it's the same syntax so if I want to select the
element whose unique ID is name I can literally just pass in single or double quotes hash name just like in CSS that
gives me the actual node from the tree it gives me one of these rectangles from the DOM the document object model if I
actually want to get at the specific value therein I need to go one step further and say dot value so similar in
spirit to python where we saw a lot of dot notation where you can go inside an object inside of an object that's what's
going on long story short in JavaScript there is a special Global variable called document that you just do stuff
with the document the web page itself one of those functions is called query selector that function returns to you
whatever it is you're selecting and value means go inside of that rectangle and grab the actual text that the human
typed in so if I want to now say hello to that person the syntax is a little different from C and Python I can use
concatenation which actually does exist in Python but we didn't use it much I can go ahead and say quote unquote
hello plus name all right now if I go back to the browser window reload the page to get the latest version of the
code Type in David and click submit now I see hello David not the best website but it does demonstrate how I can start
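
The lookup-then-concatenate step can be sketched as follows. Because there is no page here, a small stand-in object plays the role of the browser's global `document`; `fakeDocument`, its `nodes` table, and the function name `greeting` are hypothetical. In a real page you would pass the actual `document` and hand the result to `alert`.

```javascript
// Stand-in for the browser's document so the sketch runs anywhere.
// In a page, document.querySelector("#name") returns the <input> node.
const fakeDocument = {
  nodes: { "#name": { value: "David" } },
  querySelector(selector) {
    return this.nodes[selector] || null;
  },
};

function greeting(doc) {
  // Same selector syntax as CSS: a hash followed by the element's ID,
  // then .value to reach the text that was typed in.
  const name = doc.querySelector("#name").value;
  // Concatenate strings with +.
  return "hello, " + name;
}

console.log(greeting(fakeDocument)); // hello, David
```
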
to interact with the page but let me stipulate that this commingling of languages is never a good thing it's
fine to use classes but using style equals quote unquote and a whole bunch of CSS that was not going to scale well
once you have lots and lots of properties same here once you have more and more code you don't want to just put
your code inside of this onsubmit handler so there's a better way let's get rid of that onsubmit attribute and
literally never use it again that was for demonstration sake only and let's do this let me move the script tag actually
just below the form but still inside the body so that the script tag exists only after the form tag exists logically just
like in Python your code is read top to bottom left to right and let me now do this let me Define this function called
greet and then let me do this document.querySelector let me select the form on the page it doesn't have a unique ID it
doesn't need to I can just reference it by name form because there's only one of them and let me call this special
function addEventListener this is a function that listens for events now this is actually a term of art within
programming many different languages are governed by events and pretty much any user interface is governed by events
especially phones on phones you have touches and you have uh drags and you have long press and you have pinch and
all of these other gestures on your Mac or PC you have click you have drag you have key down key up as you're moving
your hands up and down on the keyboard This is a non-exhaustive list of all of the events that you can listen for in
the context of web programming and this might be a throwback to scratch where recall scratch let you broadcast events
and we had the two puppets sort of talking to one another via events in the world of web programming game
programming any human physical device these days they're just governed by events and you write code that listens
for these events happening so what do I want to listen for well I want to add an event listener for the submit event and
when that happens I want to call the Greet function like this so this is kind of interesting thank you I have my greet
function as before no changes but I'm adding one line of code down here I'm telling the browser to use
document.querySelector to select the form then I'm adding an event listener specifically for the submit event and
when that happens I call greet notice I am not using parentheses after greet I don't want to call greet right away I
want to tell the browser to call greet when it hears this submit event now let me go ahead
and deliberately I think trip over something here let me type in my name David submit and there we go all right
hello David all right but let's now make this slightly better designed right now I'm defining a function greet
which is fine but I'm only using it in one place and you might recall we stumbled on this in Python where I was
like why are we creating a special function called get value when we're only using it like one line later and
we introduced what type of function in Python the other day yeah so lambda functions anonymous
functions you can actually do this in JavaScript as well if I want to define a function all at once I can do this let
me cut this onto my clipboard paste it over here let me fix all of the alignment let me get rid of the name and
I can actually now do this the syntax is a little weird but using now just these four lines of code I can do this I can
tell the browser to add an event listener for the submit event and then when it hears that call this function
that has no name and unlike Python this function can have multiple lines which is actually a nice thing it looks a
little weird there's a lot of indentation and curly braces going on now but you can think of this as just being run
these two lines of code when the form is submitted but if I want to block the form from actually being submitted I got
to do one other thing and you would only know this from being told it or reading the documentation I need to call this
function e.preventDefault passing in this e argument which is a variable that represents the event more on that
another time that just allows us to prevent whatever the default handling of that particular event is so long
story short this is representative of the type of code you might write in JavaScript whereby you can actually
interact in your code with the user's actual form and we can do interesting things too built into browsers nowadays
is functionality like this so here's a very simple example that has just three buttons in it one red one green one blue
well it turns out using JavaScript you can control the CSS of a page programmatically I can change the
background of the body of the page to Red to green to blue just by listening for clicks on these buttons and then
changing CSS properties just to give you a taste of this if I view the Page's Source similar code here I can select
the red button by an ID that I apparently defined on it right up here I can add an event listener this time not
for submit but for click and when it's clicked I execute this one line of code and this one line of code we haven't
seen before but you can go into the body of the page its style property and you can change its background color to red
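The listener pattern in this demo is not specific to JavaScript. As a rough sketch in Python (used here only to keep the examples in one language), three buttons can each register a function that recolors the page when a click is dispatched. The `Element` class and its `add_event_listener` and `dispatch` methods are hypothetical stand-ins invented for illustration, not browser APIs.

```python
class Element:
    """Toy stand-in for a DOM node (hypothetical; not a browser API)."""

    def __init__(self):
        self.style = {}
        self._listeners = {}

    def add_event_listener(self, event, callback):
        # Store the function itself; it is not called yet.
        self._listeners.setdefault(event, []).append(callback)

    def dispatch(self, event):
        # Simulate the browser noticing an event and calling each listener.
        for callback in self._listeners.get(event, []):
            callback(event)


body = Element()
buttons = {color: Element() for color in ("red", "green", "blue")}


def make_handler(color):
    # Return a function that remembers which color it should apply.
    def handler(event):
        body.style["backgroundColor"] = color
    return handler


for color, button in buttons.items():
    button.add_event_listener("click", make_handler(color))

buttons["green"].dispatch("click")
print(body.style["backgroundColor"])  # green
```

Each handler is stored without being called, which is the same reason greet was passed to the listener without parentheses.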
this is one example of like two different groups not talking to one another in advance in CSS properties that have
two words are usually hyphenated like background-color unfortunately in JavaScript if you do something-something
that's subtraction which is logically nonsensical here so CSS's background-color you can convert to in
JavaScript backgroundColor where you capitalize the C and you get rid of the minus sign what else can we do here well
back in the day there used to be a blink tag and it's one of the few historical examples of a tag that was removed
from HTML because in the late '90s early 2000s this is what the web looked like there was a lot of this kind of stuff
there was even a marquee tag that would move text from left to right over the screen and the web was a very ugly place um I
will admit my very first web page probably used both of these tags but how can we bring it back well this is a
version of the blink tag implemented in JavaScript how I wrote some code in this example that Waits every 500
milliseconds to change the CSS of the page to be visible invisible visible invisible because built into JavaScript
is support for a clock so you can just do something on some sort of schedule let me go ahead and open up this example
autocomplete let me zoom back out in autocomplete.html I whipped up an example that has just a text box but
I also grabbed the dictionary from problem set 5 speller so that if I want to search for something like apple this
searches those 140,000 words using JavaScript to create what we know in the world of the web as autocomplete when
you start searching for something you start to see words that start with that phrase and sure enough if I search
for something like banana here's the three variants of bananas that appear in that file and so forth how is that
working just JavaScript when it finds matching words it's just updating the DOM the tree in the computer's
memory to show more and more text or less and for one final example this is how programs like DoorDash and
Google Maps and Uber Eats and so forth work you have built into browsers today some fancy APIs application programming
interfaces whereby you can ask for information about the user's device for instance here I wrote a program in
geolocation.html that's apparently asking to know my location all right let me go ahead and allow it this time if
that's something you're comfortable with on your own device it's taking a moment because sometimes these things take a
little while to analyze but hopefully in just a moment there are apparently my GPS coordinates and as a final flourish
today for what you can do with a little bit of HTML for your structure CSS for your style and now JavaScript for your
logic which we'll tie in again next week let me go ahead and search Google for those GPS coordinates zoom in here
on Google Maps and if we zoom in in in okay we're pretty close we're not on that street but there oh there it is
actually there's the marker it had put for us we're indeed here in Memorial Hall so all that with JavaScript with
the basic understanding of the DOM the document object model we'll pick up where we left off next week and now add
a backend see you next time all right this is CS50 and this is week
nine and this is kind of it in terms of programming fundamentals today we come rather full circle with so many of
the languages that we've been looking at over the past several weeks and with HTML and CSS and JavaScript last week
we're going to add back into the mix Python and SQL and with that do we have the ability to program for the web and
even though this isn't the only user interface out there increasingly are people certainly using laptops and
desktops and a browser to access applications that people have written but it's also increasingly the way that
mobile apps are written as well there are languages called Swift for iOS there are languages called Java for Android
but coding applications in both of those languages means knowing twice as many languages building twice as many
applications potentially so we're increasingly seeing For Better or For Worse that the world is starting to
really standardize at least for the next some number of years on HTML CSS and JavaScript coupled with other languages
like Python and SQL on the so-called back end and so today we'll tie all of those together and give you the last of
the tools in your toolkit with which to tackle final projects to go off into the real world ultimately and somehow solve
problems with programming but we need an additional tool today and we've sort of outgrown http-server this is
just a program that comes on certain computers that you can install for free happens to be written in a language
called JavaScript but it's a program that we've been using to run a web server in VS Code but you can run it on
your own Mac or PC or anywhere else but all this particular http-server does is serve up static content like HTML files
CSS files JavaScript files maybe images maybe video files but just static content it has no ability to really
interact with the user beyond simple clicks you can create a web form and serve it visually using http-server but
if the human types in input into a form and clicks submit unless you submit it elsewhere to something like google.com
like we did last time it's not actually going to go anywhere because this server can't actually process the requests that
are coming in so today we're going to introduce another type of server that comes with python that allows us to not
only serve web pages but also process user input and recall that all that input is going to come ultimately from
the URL or more deeply inside of those virtual envelopes so here's like the canonical URL we talked about last week
for a random website like www.example.com and I've highlighted the slash just to connote the root of
the web server like the default folder where presumably there's a file called index.html or something else in there
otherwise you might have a more explicit mention of the actual file name file.html you can have folders as you
probably gleaned from the most recent problem set you can have files in folders like this and these are all
examples of what a programmer would typically call a path so it might not just be a single word it might have
multiple slashes and multiple folders and so forth folders and files but this is just more generally known as a path but
there's another term of art that's essentially equivalent that we'll introduce today this is also
synonymously called a route which is maybe a better generic description of what these things are because it turns
out they don't have to map to that is refer to a specific folder or a specific file you can come up with your own
routes in a website and just make sure that when the user visits that you give them a certain web page if they visit
something else you give them a different web page it doesn't have to map to a very specific file as we'll soon see and
if you want to get input from the user just like Google does like Q equals cats you can add a question mark at the end
of this route then the key or the HTTP parameter name that you want to define for yourself and then equals some value
that presumably the human typed in if you have more of these you can put an ampersand and then more key equals
value pairs ampersand repeat repeat repeat the catch though is that using the tools that we had last week
alone we don't really have the ability to parse that is to analyze and extract things like Q equals cats you could have
appended question mark Q equals cats or anything else to any of your URLs in your homepage for problem set 8 but it
doesn't actually do anything useful necessarily unless you use some fancy JavaScript the server is not going to
bother even looking in that for you but today we're going to introduce using a bit of python and in fact we're going to
use a web server implemented in Python instead of using http-server alone to automatically for you look for any key
value pairs after the question mark and then hand them to you in the form of a python dictionary recall that a
dictionary in Python a dict object is just key value pairs that seems like a perfect fit for these kinds of
parameters and you're not going to have to write that code yourself it's going to be handed to you by way of what's
called a framework so this will be the second of two Frameworks really that we look at in the class and a framework is
essentially a bunch of libraries that someone else wrote and a set of conventions therefore for doing things
so those of you who really started dabbling with bootstrap this past week to make your homepages prettier and
nicely laid out you were using a framework why well you're using libraries code that someone else wrote
like all the CSS maybe some of the JavaScript that the bootstrap people wrote for you but it's also a framework
in the sense that you kind of have to go all in like you have to use Bootstrap's classes and you have to kind of lay out
your divs or your spans or your table tags in a sort of bootstrap friendly way and it's not too onerous but you're
following conventions that a bunch of humans standardized on so similarly in the world of python is there another
framework we're going to start using today and whereas bootstrap is used for CSS and JavaScript flask is going to be
used for Python and it just solves a lot of common problems for us it's going to make it easier for us to analyze the
URLs and get at key value pairs it's going to make it easier for us to find files or images that the human wants to
see when visiting our website it's even going to make it easier to send emails automatically like when someone fills
out a form you can dynamically using Code send them an email as well so flask and with it some related libraries is
just going to make stuff like that easier for us and to do this all we have to do is adhere to some pretty
minimalist requirements of this framework we're going to have to create a file for ourselves called app.py this
is where our web app or application is going to live if we have any libraries that we want to use the convention in
the python world is to have a very simple text file called requirements.txt where you list the names of those
libraries top to bottom in that text file similar in spirit to the include or the import statements that we saw in C
and python respectively we're going to have a static folder or static directory which means any files you create that
are not ever going to change like images CSS files JavaScript files they're going to go in this folder and then lastly any
HTML that you write web pages you want the human to see are going to go in a folder called templates so this is again
evidence of what we mean by a framework like do you have to make a web app like this no but if you're using this
particular framework this is what people decided would be the human conventions if you've heard of other Frameworks like
Django or ASP.NET or bunches of others there are just different conventions out there for creating applications Flask is
a very nice micro framework and that's it like all you have to do is kind of adhere to these pretty
minimalist requirements to get some code up and running all right so let's go ahead and make a web app let me go ahead
and switch over to VS Code here and let me practice what I'm preaching here by first creating
app.py and let's go ahead and create an application that very simply maybe says hello to the user so something that
initially is not all that Dynamic pretty static in fact but we'll build on that as we've always done so in app.py what
I'm going to do first is exactly the line of code I had on the screen earlier from flask import Flask with a capital F
second and a lowercase f first and I'm also going to preemptively import a couple of functions render_template and
request more on those in just a bit and then below that I'm going to say go ahead and do this give me a
variable called app that's going to be the result of calling the Flask function and passing in to it this weird
incantation here __name__ so we've seen this a few weeks back when we played around with Python and we had that if main
thing at the bottom of the screen for now just know that underscore underscore name underscore underscore refers to the
name of the current file and so this line here simple as it is tells python hey python turn this file into a flask
application flask is a function that just figures out then how to do the rest the last thing I'm going to do for this
very simple web application is this I'm going to say that I'm going to have a function called index that takes no
arguments and whenever this function is called I want to return the results of rendering a template called
index.html and that's it so let's assume there's a file somewhere haven't created it yet called index.html but
render_template means render this file that is print it to the user's screen so to speak the last thing I'm going to do is
I have to tell flask when to call this index function and so I'm going to tell it to define a route for quote unquote
slash and that's it so let's take a look at what I just created here this is slightly new syntax and it's really the
only weirdness that we'll have today in Python this is what's known in Python as a decorator a decorator is
a special type of function that modifies essentially another function for our purposes just know that on line six this
says Hey python Define a route for Slash the default page on my web application the next two lines seven and eight say
hey python Define a function called index takes no arguments and the only thing you should ever do is return
render template of quote unquote index.html all right so that's it so really the next question naturally
should be like all right well what is in uh index.html well let me go ahead and do
that next let me create a directory called templates practicing again what I preached earlier so I'm going to create
a new empty directory called templates I'm going to go and cd into that directory and then do code of
index.html so here's going to be my index page and I'm going to do a very simple web page doctype html just going
to borrow some stuff from last week html lang equals en I'll close that tag I'll then do a head tag I'll do a
meta tag the name of which is viewport this makes my site recall responsive that is it just grows and shrinks to fit
the size of the device the initial scale for which is going to be one and the width of which is going to be device
width so I'm typing this out I have it printed here this is stuff I typically copy paste but then lastly I'm going to
add in my title which will just be hello for the name of this app and then the body whoops the body of this tag
will be there we go the body of this page rather will just be hello comma world
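Put together, the two files described so far might look roughly like the sketch below. One assumption is made purely so the example runs on its own: templates/index.html is written into a temporary folder from Python and Flask is pointed at it with `template_folder`, whereas in the lecture both are ordinary files sitting next to app.py. The final lines use Flask's test client as a stand-in for visiting the page in a browser.

```python
import tempfile
from pathlib import Path

from flask import Flask, render_template

# Assumption for this sketch only: create templates/index.html in a temporary
# folder so the example is self-contained.
project = Path(tempfile.mkdtemp())
(project / "templates").mkdir()
(project / "templates" / "index.html").write_text(
    """<!DOCTYPE html>

<html lang="en">
    <head>
        <meta name="viewport" content="initial-scale=1, width=device-width">
        <title>hello</title>
    </head>
    <body>
        hello, world
    </body>
</html>
"""
)

# Turn this file into a Flask application, pointing it at the folder above
app = Flask(__name__, template_folder=str(project / "templates"))


@app.route("/")
def index():
    # Render templates/index.html and send it back to the browser
    return render_template("index.html")


if __name__ == "__main__":
    # Stand-in for a browser visit: request "/" with Flask's test client
    response = app.test_client().get("/")
    print(response.status_code)
    print(response.get_data(as_text=True))
```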
very uninteresting and really kind of a regression to where we began last week but let's go now and experiment with
these two files I'm not going to bother with a static folder right now cuz I don't have any other files that I want
to serve up no images no CSS nothing like that and honestly requirements.txt is going to be pretty simple I'm going to
go into requirements.txt and just say make sure the system has access to the Flask library itself all right but that's the
only thing I'm going to add in there for now all right so now I have two files app.py and I have index.html but
index.html thank you is inside of my templates directory so how do I actually start a web server last week I would
have said http-server but http-server is not a Python thing it has no idea about Flask or Python or anything I just
wrote http-server will just spit out static files so if I ran http-server and then I clicked on app.py I would
literally see my Python code it would not get executed because http-server is just for static content but today I'm
going to run a different command called flask run so this framework Flask that I actually pre-installed in advance
so it wasn't strictly necessary that I create that requirements.txt file just yet comes with a program
called flask that takes command line arguments like the word run and when I do that you'll see somewhat similar
output to last week whereby you'll see the URL for your unique preview of that you might see a popup
saying that your application is running on TCP port something or other by default last week we used port 8080
Flask just because prefers port 5000 so that's fine too I'm going to go ahead and open up this URL now and once it
authenticates and redirects me just to make sure I'm allowed to access that particular Port let me zoom in voila
there's the extent of this application if I view Source by right clicking or control clicking there's my HTML that's
been spit out so really I've just reinvented the Wheel from last week because there's no dynamism now nothing
at all but what if I do this let me close the source and let me zoom out so you can see my URL bar let me zoom in
now and I have a very unique cryptic URL but the point is that it ends with nothing or implicitly it ends with Slash
this is just Chrome being a little helpful it doesn't bother showing you a slash even though it's implicitly there but
let me do something explicit like slash question mark name equals David so there's a key value pair that I've
manually typed into my URL bar and hit enter okay nothing happens nothing changes it still says hello world but
the opportunity today is to now dynamically get at the input from that URL and start displaying it to the user
so let me go back over here to my terminal window and code let me move that down to bottom there and what if I
want to say huh hello name ideally I want to say something like I don't want to hard code David because then it's
never going to say hello to anyone else I kind of want to put like a variable name here like name should go
here but it's not an HTML tag so I need some kind of placeholder well here's what I can do if I go back to my Python
code I can now define a variable called name and I can ask Flask to go into the current request into its arguments
that is in the URL as they're called and get whatever the value of the parameter called name is that puts that into a
variable for me and then in render template this is one of those functions that can take more than one argument if
it takes another argument you can pass in the name of any variable you want so if I want to pass in my name I can
literally say name equals name so this is the name of a variable I want to give to the template this is the actual
variable that I want to get the value from and now lastly in my index.html the syntax as of today in flask is to do two
curly braces and then put the name of the variable that you want to plug in so here's what we mean by a template a
template is kind of like a blueprint in the real world where it's like plans to make something this is the plan to make
a web page that has all of this code literally but there's this placeholder with two curly braces here and here that
says go ahead and plug in the value of the name variable right there so in this sense it's similar in spirit to our F
strings or format strings in Python the syntax is a little different eh just because reasonable people disagree
different people different Frameworks come up with different conventions the convention in flask in their templates
is to use two curly braces here the hope is that you the programmer will never want to display two curly braces in your
actual web page but even if you do there's a workaround we can escape that so now let me go ahead and go back to my
browser tab here previously even though I added name equals David to the end of the URL with a question mark it still
said hello world but now hopefully if I made these changes let me go ahead and uh open up my terminal window let me
restart Flask so it loads my changes by default let me go back to my hello tab and click reload so it grabs the page
anew from the server and there we go hello David I can play around now and I can change the URL up here to for
instance Carter zoom out hit enter and now we have something more Dynamic so the new pieces here are in Python we
have some code here that allows us to access programmatically everything that's after the question mark in the
URL and the only thing we have to do to do that is call this function request.args.get you and I don't have to
bother figuring out where's the question mark where is the equal sign where are the ampersands potentially the
framework Flask does all of that for us okay any questions then on these principles thus
far yeah in back why do you need a question mark in
the URL the short answer is just because that is where key value pairs must go if you're making a GET request
from a browser to a server the convention standardized by the HTTP protocol is to put them in the URL
after the so-called route or path then a question mark and it delineates what's part of the route or the path and what's
part of the human input to the right other questions
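To make the role of the question mark concrete, here is a small sketch using only Python's standard library (not Flask) that splits a URL into its path and its key-value pairs. The URL itself is a made-up example. Flask performs equivalent parsing behind the scenes before handing the pairs over as `request.args`.

```python
from urllib.parse import parse_qs, urlsplit

url = "https://www.example.com/search?q=cats&lang=en"  # hypothetical URL

parts = urlsplit(url)
print(parts.path)   # /search
print(parts.query)  # q=cats&lang=en

# Everything to the right of the question mark becomes key-value pairs.
# parse_qs maps each key to a list because a key may appear more than once.
params = parse_qs(parts.query)
print(params)          # {'q': ['cats'], 'lang': ['en']}
print(params["q"][0])  # cats
```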
yeah sure this is this annoying thing about python when you pass in parameters to functions that have names you
typically say something equals something else so let me make a slight tweak here uh how about I say name of person here
this allows me to invent my own variable for my template and assign it the value of name I now though have
to go into my index file and say name of person did I get that right name of person yeah so these two have to match
and so this is just kind of stupid because it's unnecessarily verbose so what typically people do is they just
use the same name as the variable itself even though it looks admittedly kind of stupid but it has two different roles
the thing to the left of the equal sign is the name of the variable you plan to use in the template the thing on the
right is the actual value you're assigning it and this is because it's general purpose I could override this
and I could say something like name always equals Emma no matter what that variable is and now if I go back to my
browser and reload no matter what's in the URL David or Carter it's always okay Emma broke the server what did I do
oh I didn't change my template back there we go let me change that back to be name so that it's name there and
it's name here but I've hardcoded Emma's name so now we're only ever going to see Emma no matter whose name is in the URL
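The two roles of the names on either side of the equal sign can be seen in a compact sketch. It assumes an inline template string rendered with Flask's `render_template_string`, swapped in for `render_template` only so that no templates folder is needed, and it uses Flask's test client in place of a browser.

```python
from flask import Flask, render_template_string, request

app = Flask(__name__)

# Assumption: the template is an inline string so the sketch needs no files.
TEMPLATE = "hello, {{ name_of_person }}"


@app.route("/")
def index():
    # Value of ?name=... from the URL, or None if it is absent
    name = request.args.get("name")
    # Left of "=": the placeholder's name inside the template.
    # Right of "=": the Python variable whose value gets plugged in.
    return render_template_string(TEMPLATE, name_of_person=name)


client = app.test_client()
print(client.get("/?name=David").get_data(as_text=True))   # hello, David
print(client.get("/?name=Carter").get_data(as_text=True))  # hello, Carter
```

Renaming `name_of_person` in only one of the two places reproduces the mismatch that broke the page in the demo, which is why most people simply write `name=name`.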
that's all all right so this is kind of bad user interface if in order to get a greeting for the day you the user have
to manually change the URL which none of us ever do this is not like how web pages work what is the more normal
mechanism for getting input from the user and putting it in that URL automatically
how did we do that last week with Google if you
recall okay so we did make something in order to get the input from the user and specifically what was the tag or the
terminology we used last week sorry a little louder oh
no but yeah so the input tag inside of the form tag so in short forms are of course like how the web works and
how we typically get input from the user whether it's a button or a text box or a drop down menu or something else so
let's go ahead and add that into the mix here so let's enhance this Hello app to do a little something more by this time
just doing this let me get rid of this name stuff and let me just have a very simple index.html file that by default
is going to simply ask the user for some input as follows I'm going to go back into my
index.html and instead of printing out the user's name this is the page I'm going to use to actually get input from
the user so I'm going to create a form tag uh the method I'm going to use for now is going to be quote unquote get
then inside of that form I'm going to have an input tag and I'm going to turn off autocomplete like we did last week
I'm going to turn on autofocus so it puts the cursor in the text box for me I'm going to give the name of this input
the name name not to be too confusing but I'm asking the human for their name so it makes sense that the name of the
input should be quote unquote name uh the placeholder I want the human to see in light gray text will be name with a
capital N just so it's a little grammatical and then type of this text field type of this input is going to be
text then I'm just going to give myself like last week a submit button and I don't care what it says it's just going
to say the default submit terminology let me go ahead now and open up uh my terminal window again let me go to that
same URL so that I can see whoops there we go so that was just cached
from earlier let me go back to that same URL my githubpreview.dev URL and here I have the form and now I can type in
anything I want the catch though is when I click submit where is it going to go well let's be explicit it does have a
default value but let me go into my index.html and let me add an action just like we did last week for Google whereas
previously I said something like www.google.com/search but today we're not going to rely on some third
party I'm going to implement the so-called backend and I'm going to have the user submit this form to a second
route not just slash but how about Slash greet I can make it up whatever I want greet feels like a nice operative word
so slash greet is where the user will be sent when they click submit on this form all right so let's go ahead now and go
back to my browser tab let me go ahead actually and let me reload flask here so that it reloads all of my changes let me
reload this tab so that I get the very latest HTML and indeed quick safety check if I view page Source we indeed
see that my browser has downloaded the latest HTML so it definitely has changed let's go ahead and type in David and
when I click submit here what's going to happen hypotheses what's going to happen
visually functionally however you want to interpret when I click submit
yeah page okay the user's going to go to an empty page pretty good instinct cuz nowhere else have I mentioned /greet it
doesn't seem to exist how's the URL going to change just to be clear what's going to appear suddenly in
the URL yeah 404 no not in the URL specifically
in the URL something's going to get added automatically when I click submit the key value pair right that's
how forms work that's why our Google trick last week worked I sort of recreated a form on my own website and
even though I didn't get around to implementing google.com itself I can still send the information to Google
just relying on browsers standardizing to your question earlier that whenever you submit a form it automatically ends
up after a question mark in the URL if you're using GET so both of you are right this is going to break and all
three of you are right in effect 404 not found you can see it in the tab here that's the error that has come back but
what's interesting and most important the URL did change and it went to /greet?name=David
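What the browser did on submit can be approximated in a few lines with Python's standard library: take the form's action and append the field names and values after a question mark. The helper name `url_after_submit` is made up for this sketch, and a real browser's encoding rules have more edge cases than this.

```python
from urllib.parse import urlencode


def url_after_submit(action, fields):
    """Build the URL a browser would request for a method="get" form."""
    query = urlencode(fields)
    return f"{action}?{query}" if query else action


print(url_after_submit("/greet", {"name": "David"}))           # /greet?name=David
print(url_after_submit("/search", {"q": "cats", "hl": "en"}))  # /search?q=cats&hl=en
print(url_after_submit("/greet", {"name": "David Malan"}))     # /greet?name=David+Malan
```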
I just now need to add some logic that actually looks for that so-called route so let me go back to my app.py let me
define another route for quote unquote /greet and then underneath this let me define another function I'll call
it greet but I could call it anything I want no arguments for now for this and then let me go ahead and do this in my
app.py this time around I do want to get the human's name so let me say request.args.get quote unquote name and let me
store that in a variable called name then let me return a template and you know what I'm going to give myself a new
template greet.html because this has a different purpose it's not a form I want to say hello to the user in this HTML
file and I want to pass into it the name that the human just typed in all right so now if I go up and reload the page
what might happen now other logical check here if I go ahead and hit reload or resubmit the
form what might happen now any instincts let me try so let's try this
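A minimal sketch of the /greet route as dictated above, assuming Flask is installed. The inline template string is a stand-in for greet.html, which has not been written yet at this point in the walkthrough, and Flask's test client stands in for the browser; both are assumptions made so the sketch runs on its own.

```python
from flask import Flask, render_template_string, request

app = Flask(__name__)

# Stand-in for templates/greet.html (assumption: inlined so that the sketch
# runs without a templates directory).
GREET_TEMPLATE = "hello, {{ name }}"


@app.route("/greet")
def greet():
    # request.args holds the key-value pairs that follow "?" in the URL.
    name = request.args.get("name")
    return render_template_string(GREET_TEMPLATE, name=name)


# Exercise the route without starting a server.
client = app.test_client()
response = client.get("/greet?name=David")
print(response.status_code)              # 200
print(response.get_data(as_text=True))   # hello, David
```

In the lecture's version the function returns `render_template("greet.html", name=name)` instead, which requires that file to exist under templates/.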
let's go ahead and reload the page previously it was not found now it's worse and this is the 500 error internal
server error that I promise next week we will all encounter accidentally ultimately but here we have an internal
server error because it's an internal error this means something's wrong with your code so the route was actually found
because it's not a 404 this time but if we go into VS Code here and we look at the console the terminal window you'll see
that oh this is actually a bit misleading uh do I want to do this let me reload this let me reload here oh
stand by come on there we go come on okay here we have this error
here and this is where your terminal window's going to be helpful in your terminal window by default there's typically
going to be helpful stuff like a log of what it is the server is seeing from the browser for instance here's what the
server just saw in purple GET /greet?name=David using HTTP version 1.0 here though is the
status code that the server returned 500 why what's the error well here's where we get these annoying pretty cryptic
Python messages that help50 might ultimately help you with or here we might just have a clue at the bottom and
this is actually pretty clear even though we've never seen this error before what did I screw up here I just
didn't create greet.html right template not found all right so that must be the last piece of the puzzle and again
representative of how you might diagnose problems like these let me go into my terminal window after hitting control C
which cancels or interrupts a process let me go into my templates directory if I type ls I only have index.html so
let's code up greet.html and in this file let's quickly do doctype html open bracket html language equals
English inside of this I'll have the head tag inside of here I'll have the meta the name is viewport the content of
which is see I always forget this too the content of which is initial scale equals one width equals device width
quote unquote title is still going to be we'll call this greet because this is my template and then here in the body
I'm going to have hello comma name so I could have kept around the old version of this but I just recreated
essentially my second template so index.html now is almost the same but the title is different and it has a form
greet.html is almost the same but it does not have a form it just has the hello comma name so let me now go ahead
and rerun in the correct directory you have to run flask wherever app.py is not in your templates directory so let me do
flask run to get back to where I was let me go into my other tab cross my fingers this time that when I go back to slash
and I get index.html's form now I type in David and click submit now we get hello David and now we have kind of a
full-fledged web app that has two different routes slash and slash greet the latter of which takes input like
this and then using a template spits it out but something could go wrong let's see what happens here uh suppose I don't
type anything in let me go here and just click submit now I mean it looks kind of stupid so there's Bunches of ways we
could solve this I could require that the user have input on the previous page I could have some kind of error check
for this but there's another mechanism I can use that I'll just show you it turns out this get function in the context of
HTTP and also in general with python dictionaries you can actually Supply a default value so if there is no name
parameter or no value for a name parameter you can actually give it a default value like this so I'll say
world for instance now let me go back here let me type in nothing again and click submit and hopefully this time
I'll in oops sorry let me restart flask to reload the template let me go ahead and type nothing this time clicking
submit and hopefully we now oh interesting I should have faked this
so suppose I just get rid of name altogether like this
and hit enter now I see hello world and this is a subtlety that I didn't intend to get into here when you have question
mark name equals nothing whoops when you have greet question mark name equals
with nothing after it you actually are giving a value to name it is quote unquote with nothing in between that is different
from having no value at all so allow me to just propose that the error here uh we would want to require this in a
different way and probably the most robust way to do this would be to go in here in my HTML and say that the name
field is required now if I go back to my form after restarting flask here and I go ahead and click reload on my form and
type in nothing and click submit now the browser is going to yell at me but just as a teaser for something we'll be doing
in the next problem set in terms of error checking you should never ever ever rely on client side safety checks
like this because we know from last week that a curious programmer can go to inspect and let me poke around the HTML
here let me go into the body the form okay you say required I say not required you can just delete what's in the DOM in
the browser and now I can go ahead and submit this form and it appears to be broken not a big deal with a silly
little greeting application like this but if you're trying to require that humans actually provide input that is
necessary for the correct operation of the site you don't want to trust that the HTML is not altered by some
adversary all right any questions then on this particular app before we add another feature
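To make the earlier distinction between a missing parameter and an empty one concrete, here is a small sketch, assuming Flask is installed. The route returns a plain string rather than rendering greet.html, and the value is escaped by hand with `markupsafe.escape`; both are simplifications for the sake of a self-contained example.

```python
from flask import Flask, request
from markupsafe import escape

app = Flask(__name__)


@app.route("/greet")
def greet():
    # The default applies only when the "name" key is absent altogether.
    name = request.args.get("name", "world")
    return f"hello, {escape(name)}"


client = app.test_client()

# No name parameter at all: the default is used.
missing = client.get("/greet").get_data(as_text=True)

# "?name=" does supply a value, the empty string, so the default is not used.
empty = client.get("/greet?name=").get_data(as_text=True)

print(repr(missing))  # 'hello, world'
print(repr(empty))    # 'hello, '
```

The same rule holds for an ordinary Python dictionary: `d.get("name", "world")` returns the default only when the key is not present, not when its value happens to be empty.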
any questions here yeah sorry [inaudible] in the index
function sorry would it be a problem if [Music]
what no I mean no this is okay what you should really do is something we're going to do with another example where
I'm going to start error checking things so let me wave my hands at that and propose that we'll solve this better in
just a bit but it's not bad to do what I just did here it's only going to handle one of the scenarios that I was worried
about not all of them all right so even though this is new to most of us here consider index.html my first template
and consider greet.html my second template what might be arguably badly designed even though this might be the
first time you've ever touched web programming like this like what's kind of bad or dumb about this design of
these two templates alone and there's a reason too that I kind of bored us by typing it out that
second time [Music] yeah yeah there's so much repetition I
mean it was kind of deliberately tedious that I was retyping everything the doc type the HTML tag the head tag the title
tag and little things did change along the way like the title and certainly the content of the body but so much of this
I mean almost all of the page is a copy of itself in multiple files and God forbid we have a third template a fourth
template a 100th template for a really big website this is going to get very tedious very quickly and suppose you
want to change something in one place you're going to have to change it now in two three 100 different places instead
so just like in programming more generally we have this ability to factor out commonalities so do you in the
context of web programming and specifically templating have the ability to factor out all of those commonalities
the syntax is going to be a little curious but it functionally is pretty straightforward let me go ahead and do
this let me go ahead and copy the contents of index.html let me go into my templates directory
and code a file that by default is called layout.html and let me go ahead and per your answer copy all of those
commonalities into this file now instead so here I have a file called layout.html I don't want to give every page the
same title maybe uh but for now that's okay I'm going to call everything hello but in the body of the page what I'm
going to do here is just have a placeholder for actual contents that do change so in this layout I'm going to go
ahead and here and just put in the body of my page how about this syntax and this is admittedly new curly brace percent sign block body and
then percent sign close curly brace and then I'm going to do endblock so kind of a curious syntax here but this is
more template syntax the other template syntax we saw before was the two curly braces that's for just plugging in
values there's this other syntax with flask that allows you to say a single curly brace a percent sign and then some
functionality like this defining a block and this one's a little weird because there's like literally nothing between
the closed curly and the open curly brace here but let's see what this can do for us let me now go into my
index.html which is where I kind of borrowed most of that code from and let me focus on what is minimally different
the only thing that's really different in this page title aside is the form so let me go ahead and just cut that form
out to my clipboard let me change the first line of index.html to say this this file is going to extend layout.html
and notice I'm using the curly braces again and this file is going to have its own body block inside of which
is just the HTML that I actually want to make specific to this page and I'll keep my indentation nice and neat here and
let's consider what I've done this is starting to look weird fast and this is now a mix of HTML with templating code
index.html first line now says hey flask this file extends layout.html whatever that is these next lines 3 through 10 say
hey flask here is what I consider my body block to be plug this into the layout's placeholder therefore so if I
now go back to layout.html in layout.html it's almost all HTML by contrast but there is this placeholder and if I want
to put a default value I could say whoops if I want to put a default value I could put a default value there just
in case some page does not have a body block but in general that's not going to be relevant so this is just a
placeholder albeit a little verbose that says plug in the page specific content right here so if I go now into
greet.html this one's even easier I'm going to cut this content and get rid of everything else greet.html too is going to
extend layout.html extends plural and then I'm going to have my body block here simply be this one line of code and
then I'm going to go ahead and end that block here these are not HTML tags this is not HTML syntax technically the
syntax we keep seeing with the curly braces and these now curly braces with percent signs is an example of Jinja
syntax J-I-N-J-A which is a language that some humans invented for this purpose of templating and the people who
invented flask decided we're not going to come up with our own syntax we're going to use these other people's syntax
called Jinja syntax so again there starts to be at this point in the course and really in computing a lot of sharing
now of ideas and sharing of code so flask is using this syntax but other libraries and other languages might also
too all right so now index.html you know is half HTML half templating code Jinja syntax greet.html
is almost all Jinja syntax no tags even but because they both extend layout.html now I think I've kind of improved
the design of this thing if I go back to app.py none of this really needs to change I don't change my templates to
mention layout.html that's already implicit in the fact that we have the extends keyword so now if I go ahead and
open my terminal window go back to the same folder as app.py and do flask run all right my application is running on
Port 5000 let me now go back to the slash route in my browser and hit enter I have this
form again and just as a little check let me view the source of the page that my browser is seeing and there's all of
the code no mention of Jinja no curly braces no percent signs just HTML it's not quite pretty printed in the
same way but that's fine because now we're starting to dynamically generate websites and by that I mean this isn't
quite indented nicely or perfectly that's fine if it's indented in the source code version doesn't matter what
the browser really sees let me now go ahead and type in my name click submit I should see yep hello David let me go
ahead and view the source of this page and we'll see almost the same thing with what's plugged in there so this is now
web programming in the literal sense I did not hardcode a page that says hello comma David hello comma Carter hello
comma Emma I hardcoded a page that has a template with a placeholder and now I'm using actual logic some code in app.py
to actually tell the server what to say to the browser all right any questions
then on where we're at here this is now a web application simple though it is it's no longer just a web site
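The layout, extends, and block mechanism described above can be sketched with Jinja directly, assuming the `jinja2` package is installed (it ships with Flask). Keeping the three templates in a dictionary via `DictLoader` is an assumption made so the example fits in one file; in the lecture they are separate files under templates/, and the form's exact fields here are illustrative.

```python
from jinja2 import DictLoader, Environment

# Assumption: templates live in a dict instead of a templates/ directory.
templates = {
    # Shared skeleton with a placeholder block named "body".
    "layout.html": (
        "<!DOCTYPE html>\n"
        '<html lang="en">\n'
        "<head><title>hello</title></head>\n"
        "<body>{% block body %}{% endblock %}</body>\n"
        "</html>"
    ),
    # Page-specific content only: the form (fields are illustrative).
    "index.html": (
        '{% extends "layout.html" %}\n'
        "{% block body %}"
        '<form action="/greet" method="get">'
        '<input name="name" type="text">'
        '<button type="submit">Greet</button>'
        "</form>"
        "{% endblock %}"
    ),
    # Page-specific content only: the greeting.
    "greet.html": (
        '{% extends "layout.html" %}\n'
        "{% block body %}hello, {{ name }}{% endblock %}"
    ),
}

env = Environment(loader=DictLoader(templates), autoescape=True)

index_page = env.get_template("index.html").render()
greet_page = env.get_template("greet.html").render(name="David")

# The rendered output is plain HTML: no curly braces or percent signs remain.
print(greet_page)
```

Changing the title or adding a stylesheet now means editing layout.html once, rather than every page that shares the skeleton.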
yeah design or memory is it better for design or for memory uh both it's definitely better for design because
truly if we had a third page fourth page I would really start just resorting to copy paste and as you saw with homepage
often in the head of your page you might want to include some CSS files like Bootstrap or something else you
might want to have other information up there if you had to upgrade the version of Bootstrap or you change
libraries or you want to change one of those lines you would literally have to go into like three four 100 different
files to make one simple change so that's bad design and in terms of memory yes theoretically the server because it
knows there's this common layout it can theoretically do some optimizations underneath the hood flask is probably
doing that but not in the mode we're using it we're using it in development mode which means it's typically
reloading things each time other questions on this application anything at all all right so
let me ask a question not just in terms of uh the code design what about the implications for privacy like why is
this maybe not the best design for users how I've implemented this I've used a web form but yeah for some reason you
wanted your name private look at the yeah I mean if you have a nosy sibling or roommate and they have access to your
laptop and they just go trolling through your autocomplete or your history like literally what you typed into a website
is going to be visible not a big deal if it's your name but if it's your password your credit card or anything else that's
mildly sensitive you probably don't want it ending up in the URL at all even if you're in incognito mode or whatnot like
you just don't want to expose yourself or your users to that kind of risk so perhaps we can do better than that and
fortunately this one's actually an easy change let me go into my uh index.html where my form is and in my
form I can just change the method from get to post it's still going to send key value pairs to the server but it's not
going to put them in the URL the upside of which is that we can assuage this privacy concern but I'm going to have to
make one other change too because now if I go ahead and run flask again after making that change and I now reload the
form to make sure I have the latest version and you should be in the habit of going to view source
or developer tools just to make sure that what you're seeing in your browser is what you intend and yes I do see what
I wanted method equals post now let me go ahead and type in David and click submit now I get a different error this
one is HTTP 405 method not allowed well why is that well in my flask application I've only defined a couple of routes so
far one of which is for slash and that worked fine one of which is for slash greet and that used to work fine but
apparently what flask is doing is it only supports get by default so if I want to
change this route to support different methods I can say quote unquote uh post inside of this parameter
here so that now I can actually support post not just get and if I now restart flask so flask run enter and I go
back to this URL let me go back one screen to the form reload the page just to make sure I have the latest even
though nothing there has changed type David and click submit now I should see hello world notice that I'm at the
/greet route but there's no mention of name equals anything in the URL all right so that's kind of an interesting
takeaway right like it's a simple change but whereas get puts things in the URL post does not but it still works so long
as you tweak the back end to look at a post request which kind of means look deeper in the envelope it's not going to
be as simple as looking at the URL itself why shouldn't we just always use [Music]
post like why not use post everywhere any thoughts right because it's kind of obnoxious to be putting any
information in URLs if you're leaving these little breadcrumbs in your history and people can poke around and see what
you've been doing [Music] yeah what do you
think yeah I mean if you get rid of get requests and put nothing in the URL your history like your auto complete gets
pretty less useful right because none of the information is there stored so you can't just go through the menu and hit
enter you'd have to like refill out the form and there's this other something that you can see here let me zoom out
and let me just reload this page notice that you'll get this warning and it'll look different in Safari and Firefox and
Edge and Chrome here confirm form submission so your browser might remember what your inputs were and
that's great but just while you're on the page and this is in contrast to get where the state that is information
like key value pairs is embedded in the URL itself and if you looked at an email I sent earlier today I deliberately
linked to https://www.google.com/search?q=what+time+is+it
this is by definition a get request when you click on it because it's going to grab
the information the key value pair from the URL send it to Google server and it's just going to work and the reason I
sent this via email earlier was I wanted people to very quickly be able to check what is the current time and so I can
sort of automate the process of creating a Google search for you but that you induce when you click that link if
Google did not support get they only supported this the best I could do is send you all to this URL which
unfortunately has no useful information I would have had to add to my email by the way type in the words what time is
it so it's just bad for usability so there too we might have design when it comes to the low-level code but also the
design when it comes to the user experience or UX as a computer scientist would call it just in terms of what you
want to optimize for ultimately so get and post both have their roles it depends on what kind of functionality
you want to provide and what kind of sensitivity there might be around it all right any questions then on this our
first web application super simple just gets someone's name and prints it back out but we kind of now have all the
plumbing with which to create really most anything we want no all right let's go ahead and
take a five minute break and when we come back we'll add to this some first-year intramural sports all right so we
are back and recall that the last thing we just changed was the route to use post instead of get so gone is my
name and any value in the URL but there was kind of a subtle bug or change here that we didn't call out earlier like I
did type David into the form and I did click submit and yet here it is saying hello comma world so that seems to be
broken all of a sudden even though we added support for post but something must be wrong logically it must be the
case here intuitively that if I'm seeing hello world that's the def default value I gave the name variable it must be that
it's not seeing a key called name in request. ARS which is this gives you access to
everything after the URL that's because there's this other thing we should know about which is not just request. args
but request. form these are horribly named but request. ARS is for get requests request. form is for post
requests otherwise they're pretty much functionally the same but the onus is on you the user or the programmer to make
sure you're using the right one so I think if we want to get rid of the world and actually see what I the human typed
in I think I can just change request. RX to request. form still.get still quote unquote name and now if I go ahead and
rerun flask in my terminal window go back to my browser go back to and actually I won't even go back to the
form I will literally just reload command R or control R and what this warning is saying is it's going to
submit the same information to the website when I click continue now I should see hello comma David
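The two changes needed for POST, declaring the method on the route and reading from request.form instead of request.args, can be sketched as follows, assuming Flask is installed. The `/get-only` route and the combined response string are hypothetical, added only so both behaviours are visible in one run.

```python
from flask import Flask, request
from markupsafe import escape

app = Flask(__name__)


@app.route("/get-only")
def get_only():
    # Hypothetical route: without a methods argument it does not accept POST.
    return "ok"


@app.route("/greet", methods=["POST"])
def greet():
    # POST form fields travel in the request body and appear in request.form;
    # request.args still reflects only the URL's query string.
    from_url = request.args.get("name", "world")
    from_body = request.form.get("name", "world")
    return f"args: {escape(from_url)}, form: {escape(from_body)}"


client = app.test_client()

# Posting to a route that was not declared with POST yields 405.
not_allowed = client.post("/get-only", data={"name": "David"})
print(not_allowed.status_code)  # 405

# Posting to /greet: the name is found in request.form, not request.args.
posted = client.post("/greet", data={"name": "David"})
print(posted.get_data(as_text=True))  # args: world, form: David
```

This mirrors the bug in the walkthrough: after switching the form to POST, looking in request.args finds no name key and falls back to the default.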
you too are going to encounter probably all these kinds of like little subtleties but if you focus on really
the first principles of last week like what is HTTP how does a get request work how does a post request work now you
should have a lot of the mental building blocks with which to solve problems like these and let me give you one other
mental model now for what it is we're doing this framework called flask is just an example of many different
frameworks that all implement the same paradigm the same way of thinking and the same way of programming
applications and that's known as MVC model view controller and here's a very simple diagram that represents the
process that you and I have been implementing thus far and actually this is more than we've been
implementing thus far app.py is what a programmer would typically call the controller that's the code you're
writing the so-called business logic that makes all of the decisions decides what to render what values to show and
so forth layout.html index.html and greet.html are the so-called view templates that is the visualizations that the
human actually sees the user interface those things are kind of dumb they pretty much just say plop some values
here all of the hard work is done in app.py so the controller AKA app.py is where your Python code generally is
and your view is where your HTML and your Jinja code your Jinja templating the curly braces the curly braces with
percent signs usually is we haven't added the M to MVC yet model that's going to refer to things like CSV files or
databases the model where do you keep actual data typically long term so we'll come back to that but this picture where
you have one of these uh each of these components kind of intercommunicating with one another is representative of
how a lot of Frameworks work what we're teaching today this week is not really specific to python it's not really
specific to flask even though we're using flask it really is a very common Paradigm that you could Implement in
Java C# or bunches of other languages as well all right so let's now pivot back to VS Code here let me stop
running flask and let me go ahead and create a new folder altogether after closing these files here and let me go
ahead and create a folder called froshims representing freshman intramural sports or first-year intramural sports that I can
now cd into and now I'm going to code an app.py and in anticipation I'm going to create another templates directory this
one in the froshims folder and then in my templates directory I'm going to create a layout.html and I'm just going
to get myself started here froshims will go here I'm just copying my layout from earlier because most of my interesting
work this time is now going to be initially in app.py so what is it we're creating so like literally the very
first thing I wrote as a web application like 20 years ago was a site that literally looked like this um so I was
like a sophomore or junior at the time I'd taken CS50 and a follow-on class only I had no idea how to do web programming
neither of those two courses taught web programming back in the day so I taught myself at the time a language called
Perl and I learned a little something about CSV files and I sort of read enough can't even say Googled enough
because Google didn't come out until a couple of years later read enough online to figure out how to make a web
application so that students on campus first-years could actually register via a website for intramural sports back in my
day you would literally fill out a piece of paper and then walk it across the yard to Wigglesworth Hall one of the
dorms slide it under the door of the proctor or RA and thus you were registered for sports so 1996 1997 like
we could do better by then there was an internet just wasn't really being used much on campus or more generally so
background images that repeat infinitely was kind of in vogue apparently at the time all of this was like images that
I had handmade because we did not have the features that JavaScript and CSS nowadays have so it was really just HTML
and it was really just controller code written not in Python but in Perl and it was really just the same building
blocks that we here already today now have so we'll get rid of all of the imagery and focus more on the
functionality than the aesthetics but let's see if we can't whip up a web application via which someone could
register for one such intramural sport so in app.py let me go ahead and import some familiar things now from flask
let's import capital Flask which is that function we need to kickstart everything render_template
so we have the ability to render that is print out those templates and request so that we have the ability
to get that input from the human uh let me go ahead and create the application itself using this magical incantation
here and then let's go ahead and Define a route uh for slash for instance first uh I'm going to define a function called
index but just to be clear this function could be anything foo bar baz anything else but I tend to name in a manner
that's consistent with what the route is called but you could call it anything you want it's just the function that
will get called for this particular route let me go ahead here and just get things started return render template of
index.html just keep it simple nothing more so there's nothing really Frost I am specific about this here I just want
to make sure I'm doing everything correctly meanwhile I've got my layout okay let me go ahead and uh in my
templates directory code a file called index.html and let's just do extends layout.html at the top just so
that we get benefit from that template and down here I'm just going to say TODO just so that I have something going on
visually to make sure I've not screwed up yet in my froshims directory let me do flask run let me now go back to my
previous URL which used to be my hello example but now I'm serving up the froshims site oh and I'm seeing nothing that's
because I screwed up accidentally what did I do wrong in index.html what am I doing wrong this
file extends layout.html but you left out the yeah I forgot to tell flask what to plug into that layout so I just need
to say block body and then in here I can just say to-do or whatever I want to eventually get around to then end the
block let me end this tag here okay so now looks kind of ugly more cryptic but this is again the essence of doing
templating uh let me now uh restart flask up here let me go back to the page let me reload crossing my fingers this
time and there we go TODO so it's not the application I want but at least I know I have some of the plumbing there
by default all right so if I want the user to be able to register for one of these Sports let's enhance now
index.html to actually have a form that's maybe got like a drop down menu for all of the sports for which you can
register so let me go into this template here and instead of to do let's go ahead and give myself
um how about an H1 tag that just says register so the user knows what it is they're looking at how about a form tag
that's going to use post just because it's not really necessary to put this kind of information in the URL uh the
action for that how about we plan to create a register route so that we're sending information from slash to a
register route so we'll have to come back to that in here let me go ahead and create um how about an input with
autocomplete equals off autofocus on uh how about a name equals name because I'm going to ask the student for their name
using placeholder text of quote unquote name and the type of this box will be text so this is pretty much identical to
before but if you've not seen this yet let's create a select menu a so-called dropdown menu in HTML and maybe the
first option I want to be in there is going to be oh how about the current three sports for the fall which
are basketball and another option's going to be soccer and a third option's going to be
Ultimate Frisbee for first-year intramurals right now so I've got those three options I've got my form I haven't
implemented my route yet but this feels like a good time to go back now and check if my form
has reloaded so let me go ahead and stop and start flask you'll see there's ways to automate the process of restarting
the server that we'll do for you for problem set 9 so you don't have to keep stopping flask let me reload my index
route and okay it's not that pretty though maybe nor was this but it now has at least some functionality where I
can type in my name and then type in the sport now I might be biasing people toward basketball like UX-wise user
experience-wise it's kind of obnoxious to like pre-check basketball but not the others so there's some little tweaks we
can make there let me go back into index.html let me create like an empty option up here that
technically uh this option is not going to have the name of any sport but it's just going to have a word I want the
human to see so I'm actually going to disable this option and make it selected by default but I'm going to say sport up
here and there's different ways to do this this is just one way of creating essentially a whoops option yep that
looks right uh creating a placeholder sport so that the user sees something in the drop down let me go ahead and
restart flask reload the page and now it's just going to be marginally better now you see sport that's checked by
default but you have to check one of these other ones ultimately all right so that's pretty good so let me now type in
David I'll register for Ultimate Frisbee okay I definitely forgot something submit button so let's add
that all right so input type equals submit all right let's put that in restart flask reload okay getting better
submit could be a little prettier recall that we can change some of these HTML attributes the value of this
button should be register maybe just to make things a little prettier let me now reload the page and register
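The registration form assembled step by step above can be sketched as one Flask route, assuming Flask is installed. The markup is inlined with `render_template_string` as a stand-in for index.html and layout.html, and the select's `name="sport"` attribute is inferred from the parameter the server is later said to read; treat both as assumptions rather than the lecture's exact files.

```python
from flask import Flask, render_template_string

app = Flask(__name__)

# Stand-in for templates/index.html (assumption: inlined, layout omitted).
INDEX_TEMPLATE = """
<h1>Register</h1>
<form action="/register" method="post">
    <input autocomplete="off" autofocus name="name" placeholder="Name" type="text">
    <select name="sport">
        <option disabled selected>Sport</option>
        <option>Basketball</option>
        <option>Soccer</option>
        <option>Ultimate Frisbee</option>
    </select>
    <input type="submit" value="Register">
</form>
"""


@app.route("/")
def index():
    return render_template_string(INDEX_TEMPLATE)


# Fetch the page the way a browser would and show the generated HTML.
client = app.test_client()
page = client.get("/")
html = page.get_data(as_text=True)
print(html)
```

The disabled, selected first option acts as a visible placeholder, so no real sport is pre-chosen for the user.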
so now we really have the beginnings of the user interface that I created some years ago to let people actually
register for the sport so let's go now and create maybe the other route that we might need let me go into app.py and in
here if we want to allow the user to register let's do a little bit of error checking which I promise we'd come back
to like what could the user do wrong because assume that they will one they might not type their name two they might
not choose a sport so they might just submit an empty form so that's two things we could check for just so that
we're not storing like bogus entries in our database ultimately so let's create another route called greet /greet and
then in this route let's create a function called greet but it can be called anything we want and then let's go ahead
and in the greet function let's go ahead and validate the submission so a little comment to myself here how about
if there is not a request.form.get("name") value so that is if that function returns nothing like quote unquote or
the special word None in python or request.form.get quote unquote
sport is not in quote unquote what were they basketball the other one was soccer
and the last was Ultimate Frisbee getting a little long but notice the question I'm asking if the user did not
give us a name that is if this function returns the equivalent of false which is quote unquote or literally None if
there's no such parameter or if the sport the user provided is not some value in basketball soccer or Ultimate
Frisbee which I've defined as a python list then let's go ahead and just yell at the user in some way let's return
render_template of failure.html and that's just going to be some error message inside of that file otherwise if
they get this far let's go ahead and confirm registration by just returning render_template quote
unquote success.html all right so couple quick things to do let me first go in and in my
templates directory let's create this failure.html file and this is just meant to be a message to the user that
they failed to provide the information correctly so let me go ahead and in failure.html
not repeat my past mistake so let me extend layout.html and in the block body you are not registered I'll just
yell at them like that so that they know something went wrong and then let me create one other file called
success.html that similarly is mostly just Jinja syntax and I'm just going to say for now even though they're not
technically registered in any database you are registered that's what we mean by success all right so let me go ahead
and back in my froshims directory run flask run let me go back to the form and reload should look the same all right so
now let me not cooperate and just immediately click register impatiently okay what did I do wrong
register oh I'm confusing our two examples all right I spotted the error what did I do
wrong unintentional this is where I am what did I actually invent over
here where did I screw up anyone register not greet thank you so
register not greet I had the last example on my mind so the route should be register ironically the function could be greet
because that actually doesn't matter but to keep ourselves sane let's use one and the same word there let me go
ahead now and start flask as intended let me reload the form just to make sure all is working now let me not cooperate
and be a bad user clicking register oh my God okay another unforced mistake but this one we've seen before notice that by
default routes only support GET so if I want to specifically support POST I have to pass in via a methods parameter a
list of allowed methods that could be GET comma POST but if I have no need for GET in this context I can
just do POST all right now let's do this one last time reload the form to make sure everything's okay click register
and you are not registered so it's catching that all right let me go ahead and at least give them my name register
you are not registered fine I'm going to go ahead and be David with Ultimate Frisbee
register huh okay what did I mean to do
here all right so let's figure this out how to debug something like this which is my third and final unforced
error how can we go about troubleshooting this let's turn this into a teachable moment
all right well first some like safety checks like what did I actually submit let me go ahead and view page Source A
good rule of thumb look at the HTML that you actually sent to the user so here I have an input with a name of name so that's
what I intended that looks okay ah I see it already even though if you've never used a select menu you might not
know what apparently is missing from here that I did have for my text input just
intuitively like logically what's going through my head embarrassingly is like all right if my form thinks that it's
missing a name or a sport how did I create a situation in which name is blank or sport is blank well name I
don't think is going to be blank because I explicitly gave this text field a name of name and that did work last time I've
now got a second input in the form of the select menu but what seems to be missing here that I'm assuming
exists here it's just a dumb mistake I made what might be missing
here if request.form gives you all of the inputs that the user might have typed in let me go into my actual code
here and my form and add name equals sport I just didn't give a name to that input so it exists and the browser doesn't care
it's still going to display the form to you it just hasn't given it a unique name to actually transmit to the server
so now if I'm not going to put my foot in my mouth I think that's what I did wrong and again my process for figuring
that out was looking at my code thinking through logically is this right is this right no I was missing the name there so
let's run flask let's reload the form just to make sure it's all defaults again type in my name and type in
Ultimate Frisbee crossing my fingers extra hard this time and there you are registered so I
did not intend to screw up in that way but that's exactly the right kind of thought process to diagnose
issues like this go back to the basics go back to what HTTP and what HTML forms are all about and just rule things in
and out there's only a finite number of ways I could have screwed that up
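
For reference, the form and register route described above might look roughly like the following. This is a minimal sketch rather than the lecture's actual files: it inlines the HTML with render_template_string (the lecture uses index.html, failure.html, and success.html in a templates folder), and the FORM constant and the plain-text responses are hypothetical stand-ins.

```python
from flask import Flask, render_template_string, request

app = Flask(__name__)

# Hypothetical inline stand-in for templates/index.html.
FORM = """
<form action="/register" method="post">
    <input autocomplete="off" autofocus name="name" placeholder="Name" type="text">
    <select name="sport">
        <option disabled selected value="">Sport</option>
        <option value="Basketball">Basketball</option>
        <option value="Soccer">Soccer</option>
        <option value="Ultimate Frisbee">Ultimate Frisbee</option>
    </select>
    <input type="submit" value="Register">
</form>
"""


@app.route("/")
def index():
    return render_template_string(FORM)


# Routes accept only GET by default, so POST has to be listed explicitly.
@app.route("/register", methods=["POST"])
def register():
    # Validate submission: get() returns None when the key is missing.
    name = request.form.get("name")
    sport = request.form.get("sport")
    if not name or sport not in ["Basketball", "Soccer", "Ultimate Frisbee"]:
        return "You are not registered!"
    # Confirm registration (nothing is stored yet).
    return "You are registered!"
```

Note that the select carries name="sport"; without it the second condition can never pass, which is the bug diagnosed above.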
yeah can you say that a little louder why did name equals sport address the problem well let's first go back to
the HTML previously it was just the reality that I had this user input this drop-down menu but
I never gave it a name but names or more generally key value pairs are how information is sent from a form to the
server so if there's no name there's no key to send even if the human chooses a value it would be like nothing
equals Ultimate Frisbee and that just doesn't work the browser's just not going to send it however in app.py I was
naively assuming that in request.form there would be a name called quote unquote sport it could have been
anything but I was assuming it was sport but I never told the form that and if I really wanted to dig in we could do a
little something more let me go back to the way it was a moment ago let me get rid of the name of the sport dropdown
menu let me rerun flask down here and reload the form itself after it finishes being served and now let me do this view
developer tools and then let me watch the network tab which recall we played around with a little bit last week and
we also played around with curl which let us see the HTTP requests here's another here's what I would have done if
I still wasn't seeing the error and was really embarrassed on stage I would have typed in my name as before I would have
chosen Ultimate Frisbee I would have clicked register and now I would have looked at the HTTP request and I would
click on register here and just like we did last week I would go down to the request down here and there's a whole
lot of stuff that we can typically ignore but here let me zoom in way at the bottom what Chrome's developer tools
are doing for me is it's showing me all of the form data that was submitted so this really would have been my Telltale
clue I'm just not sending the sport even if the human typed it in and logically because I've done this before that must
mean I didn't give the thing a name but another good tool like good programmers web developers are using these kinds of
tools all the time they're not writing bug free code that's not the point to get to the point to get to is being a
good diagnostician I would say in these cases okay other questions on this
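
To make the key-value idea concrete, here is a small standard-library sketch (not from the lecture) of what a submitted form body looks like. The encode_form helper and the field tuples are hypothetical; the point is only that a field with no name contributes no key, so the server never sees its value.

```python
from urllib.parse import parse_qs, urlencode


def encode_form(fields):
    """Encode (name, value) pairs roughly as a browser would, skipping nameless fields."""
    return urlencode([(name, value) for name, value in fields if name])


# The select has no name attribute, so only the text input is sent.
broken = encode_form([("name", "David"), (None, "Ultimate Frisbee")])

# With name="sport" on the select, both pairs are sent.
fixed = encode_form([("name", "David"), ("sport", "Ultimate Frisbee")])

print(broken)  # name=David
print(fixed)   # name=David&sport=Ultimate+Frisbee

# On the server side, a missing key is simply absent.
print(parse_qs(broken).get("sport"))  # None
print(parse_qs(fixed).get("sport"))   # ['Ultimate Frisbee']
```

This is the same information the developer tools network tab shows under the submitted form data.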
yeah sorry a little louder so how would you edit CSS if you have these templates that
process we'll actually see before long is almost going to be the exact same just to give you a teaser for this and
you'll do this in the problem set but we'll give you some distribution code to automate this process you can absolutely
still do something like this link href equals quote unquote styles.css rel equals stylesheet that's one of the
techniques we showed last week the only difference today using flask is that all of your static files by convention
should go in your static folder so the change you would make in your layout would be to say that styles.css is in
your static folder and then if I go into my froshims directory I can create a static folder I can cd into it nothing's
there by default but if I now code a file called styles.css I could now do something like this body and in here I
could say background-color say #ff0000 to make it red let me go ahead now
and restart flask in the froshims directory cross my fingers because I'm doing this on the fly go back to my form and
reload voila now we've tied together last week's stuff as well if I answered the right
question if you want to change one page and not the other in terms of CSS that
depends in that case you might want to have different CSS files for each page if they're that different
you could use different classes in one template than you did in the other there's different ways to do that you
could even have a placeholder in your layout that allows you to plug in the URL of a specific stylesheet in your
individual files but that starts to get more complicated quickly so in short you can absolutely do it but typically I
would say most websites try not to use different styles sheets per page they reuse the Styles as
much as they can okay all right let me go ahead and revert this real quick and let's start to add a little bit more
functionality here I'm going to go ahead and just remove the static folder just so as to not complicate things just yet
and let's go ahead and just play around with a different user interface mechanism in my form here the
drop-down menu is perfectly fine nothing wrong with it but suppose that I wanted to change it to like checkboxes instead
maybe I want students to be able to register for multiple sports instead well it might make sense to clean
this up in a couple of ways and let's do this before we even get into the check boxes there's one subtle bad design here
notice that I've hardcoded basketball soccer and Ultimate Frisbee here and if you recall in app.py I also enumerated
all three of those here and anytime you see like copy paste or the equivalent thereof feels like we could do better so
what if I instead do this what if I instead give myself like a global variable of sports I'll capitalize the
word just to connote that it's meant to be constant even though python does not have constants per se uh the first sport
will be basketball the second will be soccer the third will be uh Ultimate Frisbee now I have one convenient place
to store all of my sports if it changes next semester or next year or whatnot but notice what I could do too I could
now do something like this let me pass into my index template a variable called sports that's equal to that global
variable SPORTS let me go into my index now and this is really now going to hint at the power of templating and Jinja in
this case here let me go ahead and get rid of all three of these hard-coded options and let me show you some
slightly different syntax for sport in sports then endfor we've not seen this endfor syntax it's just
like the endblock syntax but it's as simple as that so you have a start and an end to your block without indentation
mattering watch what I can do here option then in double curly braces sport
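
As a rough illustration of that loop, the sketch below renders the drop-down from a single SPORTS list. It uses the jinja2 package directly rather than Flask's render_template, and the SELECT fragment (including the value attributes) is a hypothetical stand-in for the relevant part of index.html.

```python
from jinja2 import Template

SPORTS = ["Basketball", "Soccer", "Ultimate Frisbee"]

# Hypothetical fragment of index.html: one option per sport, plus the placeholder.
SELECT = Template(
    '<select name="sport">\n'
    '    <option disabled selected value="">Sport</option>\n'
    "{% for sport in sports %}"
    '    <option value="{{ sport }}">{{ sport }}</option>\n'
    "{% endfor %}"
    "</select>"
)

html = SELECT.render(sports=SPORTS)
print(html)
```

Adding a sport to the list is then the only change needed; the template and the validation can both read from SPORTS.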
let me save that let me go back into my terminal window do flask run and if I didn't mess up here let me go back to
this the Red's going to go away because I deleted my CSS and now I still have a sport drop down and all of those sports
are still there I can make one more Improvement now I don't need to mention these same Sports manually in app.py I
can now just say if the user's inputed sport is not in my Global variable Sports and ask the same question and
this is really handy because if there's another sport for instance that gets added like uh say football all I have to
do is change my Global variable and if I reload the form now and look in the drop down boom now I have support for a
fourth Sport and I can keep adding and adding there so here's where templating starts to get really powerful in that
now in this template I'm using Jinja's for loop syntax which is almost identical to python here except you need
the curly brace and the percent sign and you need the weird ending endfor but it's the same idea as in Python
iterating over something with a for Loop let you generate more and more HTML and this is like every website out there for
instance Gmail when you visit your inbox and you see all of this big table of emails you know Google has not hardcoded
your emails manually they have grabbed them from a database they have some kind of for Loop like this and are just
outputting table row after table row or div after div dynamically all right so now let's let's
go ahead and change this maybe to oh how about uh little uh checkboxes or radio buttons so let me go ahead and do this
instead of a select menu I'm going to go ahead and do something like this for each of these Sports let me go ahead and
output not an option but let me go ahead and output an input tag the name for which is quote unquote sport the type of
which is checkbox the value of which is going to be the current sport quote unquote and then afterward I need to
redundantly seemingly output the sport so you see a word next to the checkbox and we'll look at the result of this in
just a moment so it's actually a little simpler than a select menu a drop down menu because now watch what happens if I
reload my form different user interface and you know it's not as pretty but it's going to allow users to sign up for
multiple sports at once now it would seem now I can click on basketball and football and soccer or some other
combination thereof if I view the Page's Source this is again the power of templating I didn't have to type out
four inputs I got them now automatically and these things all have the same name but that's okay it turns out with flask
if it sees multiple values for the same name it's going to hand them back to you as a list if you use the right function
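
The function in question is presumably getlist on request.form. A minimal sketch, assuming checkboxes that all share name="sport" (the response strings here are hypothetical):

```python
from flask import Flask, request

app = Flask(__name__)

SPORTS = ["Basketball", "Soccer", "Ultimate Frisbee"]


@app.route("/register", methods=["POST"])
def register():
    # getlist() returns every value submitted under the same name;
    # get() would return only the first one.
    sports = request.form.getlist("sport")
    if not sports or any(sport not in SPORTS for sport in sports):
        return "You are not registered!"
    return "Registered for: " + ", ".join(sports)
```

With radio buttons instead of checkboxes, the browser sends at most one value, so get is enough again.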
all right but suppose we don't want users registering for multiple sports maybe capacity is an issue let me go ahead
and change this checkbox to a radio button and radio buttons are mutually exclusive so you can only sign up for
one so now once I reload the page there we go it now looks like this and because I've given each of these inputs
the same name quote unquote sport that's what makes them mutually exclusive the browser knows all four of these things
are types of sports therefore I'm only going to let you select one of these things and that's simply because they
all have the same name again if I view page source notice all of them say name equals sport name equals sport name equals
sport but what differs is the value that each one is going to have all right any questions then on
this approach all right well let me go ahead and open a version of this that I made
in advance that's going to now start saving the information so thus far we're not quite at the point of where this
website was which actually allowed the Proctors to see like in a database everyone who had registered for sports
now we're literally telling students you are registered or you are not registered but we're literally doing nothing with
this information so how might we go about implementing this well let me go ahead and close these tabs and let me go
into what I called version three of this in the code for today and let me go into my source 9 directory froshims3 and let me
go ahead and open up app.py so this is a pre-made version I've gotten rid of football in this case but I've added one
thing at the very top what in English does this represent on line seven what would you describe what that thing
is what are we looking at what do you think empty dictionary yeah it's an empty dictionary right REGISTRANTS is
apparently a variable on the left it's being assigned an empty dictionary on the right and a dictionary again is just
key value pairs here again is where dictionaries are just such a useful data structure why because this is going to
allow me to remember that David registered for Ultimate Frisbee Carter registered for soccer Emma registered
for something else you can associate keys with values names with sports assuming a model where you can only register for
one sport for now and so let's see what the logic is that handles this here in my register route in the code I've
pre-made notice that I'm validating the user's name slightly differently from before but same idea I'm using
request.form.get to get the human's name if not name so if the human did not type a name I'm going to output error.html but
notice I've started to make the user interface more expressive I'm telling the user apparently with a message what
they did wrong well how I'm apparently passing to my error template instead of just failure.html a specific message so
let's go down this rabbit hole let me actually go into templates error.html and sure enough here's a new file I
created here that adorably is apparently going to have a grumpy cat as part of the error message but notice what I've
done in my block body I've got an h1 tag that just says error big and bold I then have a paragraph tag that plugs in
whatever the error message is that the controller app.py is passing in and then just for fun I have a picture of a
grumpy cat connoting that there was in fact an error let's keep looking how do I validate sport I do similarly
request.form.get of sport and I store it in a variable called sport if there's no such sport that is the human did not check
any of the boxes then I'm going to render error.html too but I'm going to give a different message missing sport
else if the sport they did choose is not in my SPORTS global variable I'm going to render error.html but complain
differently you gave me an invalid sport somehow they you know like a hacker went into the HTML of the page changed
it to add their own sport like volleyball even though it's not offered they submitted volleyball but that's
okay I'm rejecting it even though they might have maliciously tried to send it to me by changing the DOM locally and
then really the magic is just this I remember that this person is registered by indexing into the REGISTRANTS dictionary
using the name the human typed in as the key and assigning it a value of sport why is this useful well I added one
final route here I have a /registrants route with a registrants function that renders a template called
registrants.html but it takes as input that global variable just like before so let's go down this rabbit hole let me go into
templates registrants.html here's this template it looks a little crazy big but it extends the
layout here comes the body I've got an h1 tag that says registrants big and bold then I've got a table that we saw last
week this has a table head that just says name sport for two columns then it has a table body wherein using this for
loop in Jinja syntax I'm saying for each name in the registrants variable output a table row start tag and end tag
inside of which two table datas two cells table data for name table data for registrants bracket name so it's very
similar to python syntax it essentially is python syntax albeit with these curly braces and the percent sign so the net
effect here is what let me open up my terminal window run flask run let me now go into the form that I pre-made here so
gone is football let me go ahead and type in David let me choose no sport register error missing sport and there
is the grumpy cat so missing sport specifically was outputted all right fine let me go ahead and say no
name but I'll choose basketball register missing name all right let me maliciously now do this right now I'm
hacking let me go into this I'll type my name sure but let me go into the body tag down here let me maliciously go down
to Ultimate Frisbee ah heck with that let's play volleyball change that and change this
to volleyball enter so now I can register for any sport I want to create let me
click register but invalid sport so again that speaks to the power and the need for checking things on the back end
and not trusting users it is that easy to hack websites otherwise if you're not validating data server side all right
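
The three checks just demonstrated can be sketched as one plain Python function. The validate helper is hypothetical (the lecture performs these checks inline in the register route and renders error.html with the message), but the order and the messages follow the demo above.

```python
SPORTS = ["Basketball", "Soccer", "Ultimate Frisbee"]


def validate(name, sport):
    """Return an error message for a bad submission, or None if it is acceptable."""
    if not name:
        return "Missing name"
    if not sport:
        return "Missing sport"
    # Never trust the client: the HTML can be edited in the browser.
    if sport not in SPORTS:
        return "Invalid sport"
    return None


print(validate("David", None))                # Missing sport
print(validate("", "Basketball"))             # Missing name
print(validate("David", "Volleyball"))        # Invalid sport
print(validate("David", "Ultimate Frisbee"))  # None
```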
finally let's just do this for real David is going to register for Ultimate Frisbee clicking register and now the
output's not very pretty but notice I'm at the /registrants route and if I zoom out I have an HTML table two columns name and
sport David and Ultimate Frisbee let me go back to the form let me pretend like Carter walked up to my laptop and
registered for basketball register now we see two rows in this table David Ultimate Frisbee Carter basketball and if
we do this one more time maybe Emma comes along and registers for soccer register all of this information is
being stored in this dictionary now all right so that's great now we have a database albeit in the form of like a
python dictionary but why is this maybe not the best implementation why is it not great
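
A minimal sketch of this dictionary-backed version, under some assumptions: the table markup is inlined via render_template_string instead of living in registrants.html, failures return a plain 400 response instead of error.html, and the redirect to /registrants is inferred from the demo ending up at that route.

```python
from flask import Flask, redirect, render_template_string, request

app = Flask(__name__)

SPORTS = ["Basketball", "Soccer", "Ultimate Frisbee"]

# In-memory "database": lost whenever the server process stops.
REGISTRANTS = {}

# Hypothetical inline stand-in for templates/registrants.html.
TABLE = """
<table>
    <thead><tr><th>Name</th><th>Sport</th></tr></thead>
    <tbody>
    {% for name in registrants %}
        <tr><td>{{ name }}</td><td>{{ registrants[name] }}</td></tr>
    {% endfor %}
    </tbody>
</table>
"""


@app.route("/register", methods=["POST"])
def register():
    name = request.form.get("name")
    sport = request.form.get("sport")
    if not name or sport not in SPORTS:
        return "Invalid registration", 400
    # Remember registrant: name is the key, sport is the value.
    REGISTRANTS[name] = sport
    return redirect("/registrants")


@app.route("/registrants")
def registrants():
    return render_template_string(TABLE, registrants=REGISTRANTS)
```

Because names are the keys, a second David would overwrite the first, and everything disappears when the process exits.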
yeah so we're only storing this dictionary in the computer's memory and
that's great until I hit control-C and kill flask stopping the web server or the server reboots or maybe I close my
laptop or whatever if the server stops running memory is going to be lost right RAM is volatile it's thrown away when
you lose power or stop the program so maybe this isn't the best approach maybe it would be better to use a CSV file and
in fact some 20 years ago that's literally what I did I stored everything in a CSV file but let's skip that step
because we already saw last week or a couple of weeks ago now how we can use SQLite let's see if we can't marry in
some SQL here to store an actual database for the program let me go back here and let me open up say version four
of this which is almost the same but it adds a bit more functionality let me close these tabs and let me open up
app.py now in version 4 so notice it's almost the same but at the top I'm creating a database connection to a
database called froshims.db so that's a database I created in advance so let's go down that rabbit hole what does it
look like let me make my terminal window bigger let me run sqlite3 of froshims.db okay I'm in let's do .schema and let's
just infer what I designed this to be I have a table called registrants which has one two three columns an id column
that's an integer a name column that's text but cannot be null and a sport column that's also text and cannot be null
and the primary key is just id so that I have a unique ID for every registration let's see if there's anyone in there yet
SELECT star FROM registrants okay there's no one in there no one is yet registered for sports so let's go back to the code
and continue on in my code now I've got the same Global variable for validation and generation of my HTML uh looks like
my index route is the same it's dynamically generating uh the menu of sports interestingly we'll come back to
this there's a deregister route that's going to allow someone to uh deregister themselves if they want to um exit the
sport or undo their registration but this is the juicy part here's my new and improved register route still works on
POST so some mild privacy there I'm validating the submission as follows I'm getting the user's inputted name the
user's inputted sport and if there is no name or the sport is not in SPORTS I'm going to render failure.html so I kept
it simple there's no cat in this version it just says failure otherwise recall how we commingled SQL and python before
we're using CS50's SQL library but that just makes it a little easier to execute SQL queries and we're executing
this INSERT INTO registrants name comma sport with two values the name and the sport that came from that HTML form
and then lastly and this is a new function that we're calling out explicitly now flask also gives you
access to a redirect function which is how safetyschool.org harvard.org and those other sites we
played around with last week were all implemented redirecting the user from one place to another this flask function
redirect comes from my just having imported it at the very top of this file it handles the HTTP 301 or 302 or 307
code whatever the appropriate one is it does that for me all right so that's it for registering via
this route let's look at what the /registrants route is here we have a new route for /registrants and instead of
just iterating over a dictionary like before we're getting back let's see db.execute of SELECT star FROM registrants
so that's like literally the programmatic version of what I just did manually that gives me back a list of
dictionaries each of which represents one row in the table then I'm going to render registrants.html passing in literally
that list of dictionaries just like using CS50's library in the past so let's go and look at that template if
I go into templates and open up registrants.html oh okay it's just a table like before and actually let me
change this syntactically for consistency we have a Jinja for loop that iterates over each registrant and
for each of them outputs a table row oh but this is interesting instead of just having two columns with the person's
name and Sport notice that I'm also outputting a full-fledged form all right this is starting to get kind of Juicy so
let's actually go back to my terminal window run flask and actually see what this example looks like now let me
reload the page all right and the homepage looks exactly the same but let me now register for something David
for Ultimate Frisbee register oh damn it let's try this again David
registering for Ultimate Frisbee register okay so good thing I have deregister so this is what it should now
look like I have a page at the route called /registrants that has a table with two columns name and sport David
Ultimate Frisbee oh wait a third column why because if I view the page source notice that it's not the
prettiest UI but for every row in this table I'm also going to be outputting a form just to deregister that user but before
we see how that works let me go ahead and register Carter for instance so Carter we'll give you basketball again
register the table grows now let me go back and let's register Emma for soccer and the table should grow before we look
at that HTML let's go back to my terminal window let me go into
froshims and let me open up with sqlite3 froshims.db and now do SELECT star FROM registrants and whereas previously
when I executed this there were zero people now there's indeed three so now we see exactly what's going on
underneath the hood so let's look at this form now at this page now if I want to unregister deregister one of these
people specifically how do we do this clicking one of those buttons will indeed delete the row from the
database but how do we go about linking a web page with python code with a database like this is the last piece of
the puzzle up until now everything's been with forms and also with URLs but what if the user is not typing anything
in they're just clicking a button well watch this let me go ahead and sniff the traffic which you could be in the habit
of doing now anytime you're curious how a website works let me go to the network Tab and Carter shall we deregister you
from basketball let's deregister Carter and let's see what just happened if I look
at the deregister request notice that it's a post the status code that eventually came back is 302 but let's
look at the request itself all the headers there we'll ignore the only thing that button submits kind of cleverly is
an ID parameter a key equaling two what does two presumably represent or map to like where did this two come
from it doesn't say Carter it doesn't say basketball what is
it the second person that registered so those primary keys that we started talking about a couple of weeks ago why
it's useful to be able to uniquely identify a row in a table here is just one of the reasons why it suffices
for me just to send the ID number of the person I want to delete from the database because I can then have
code like this if I go into app.py and I look at my deregister route now the last of them notice that I've got
this I first go into the form and I get the ID that was submitted hopefully if there was in fact an ID and the form
wasn't somehow empty I execute this line of code DELETE FROM registrants WHERE id equals question mark and then I plug in
that number deleting Carter and only Carter and I'm not using his name because what if we have two people named
Carter two people named Emma or David you don't want to delete both of them that's why these unique IDs are so so
important and here's another reason why you don't want to store some things in URLs suppose we went to this URL
/deregister question mark id equals 3 suppose I maliciously emailed this URL to Emma it
doesn't matter so much what the beginning is but suppose I emailed her this URL /deregister question mark id
equals 3 and I said hey Emma click this and it uses GET instead of POST what did I just trick her into
doing what's going to happen if Emma clicks this yeah you would trick her into
deregistering herself why because if she's logged into this froshims website and the URL contains her ID just because
I'm being malicious and she clicked on it and the website is using GET unfortunately GET URLs are again
stateful they have state information in the URLs and in this case it's enough to delete the user and boom she would have
accidentally deregistered herself and this is pretty innocuous suppose that this was her bank account trying to make
a withdrawal or a deposit suppose that this were some other website a Facebook URL trying to trick her into posting
something automatically here too is another consideration when you should use POST versus GET because GET requests
can be plugged into emails sent via Slack messages text messages or the like and unless there's a prompt saying
are you sure you want to deregister yourself you might blindly trick the user into being vulnerable to what's
called a cross-site request forgery a fancy way of saying you trick them into clicking a link that they shouldn't have
because the website was using GET alone all right any questions then on these building
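
The database-backed register and deregister logic can be sketched as follows. Two substitutions are worth naming: Python's built-in sqlite3 module stands in for CS50's SQL library, and an in-memory database stands in for froshims.db. The register and names helpers are hypothetical; the table layout follows the schema described earlier.

```python
import sqlite3

from flask import Flask, redirect, request

app = Flask(__name__)

# Assumption: an in-memory SQLite database stands in for froshims.db.
db = sqlite3.connect(":memory:", check_same_thread=False)
db.execute(
    "CREATE TABLE registrants ("
    "id INTEGER, name TEXT NOT NULL, sport TEXT NOT NULL, PRIMARY KEY(id))"
)


def register(name, sport):
    """Insert one registrant; the ? placeholders keep user input out of the SQL text."""
    db.execute("INSERT INTO registrants (name, sport) VALUES (?, ?)", (name, sport))
    db.commit()


def names():
    """Return the names currently registered, in id order."""
    return [row[0] for row in db.execute("SELECT name FROM registrants ORDER BY id")]


# POST only: a plain link such as /deregister?id=3 is a GET and gets a 405.
@app.route("/deregister", methods=["POST"])
def deregister():
    registrant_id = request.form.get("id")
    if registrant_id:
        # Delete by primary key, so two people named Carter are never confused.
        db.execute("DELETE FROM registrants WHERE id = ?", (registrant_id,))
        db.commit()
    return redirect("/registrants")


for person, sport in [("David", "Ultimate Frisbee"), ("Carter", "Basketball"), ("Emma", "Soccer")]:
    register(person, sport)
```

Restricting the route to POST stops the emailed-link trick described above; production sites typically also add CSRF tokens, since a malicious page can submit a POST form as well.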
blocks yeah when three columns you
mean the three forward slashes I'm not sure I
follow sorry it's in which file
sorry the other direction
okay keep scrolling more oh this thing okay sorry this is just a this is
a URI that's refer that's typical syntax that's referring to the SQL light protocol so to speak which means use SQL
light to talk to a file locally colon slash is just like you and I see and URLs the third slash essentially means
current folder that's all so it's it's a weird curiosity um but it's typical whenever you're referring to a local
file and not one that's Elsewhere on the internet that's a bit of an oversimplification but that's indeed a
convention sorry for not that not clicking earlier all right let's do one other iteration of frost IMS here just
to show what I was actually doing back in the day which was not only storing these things in CSV files as I recall I
was also automatically generating an email to the proctor in charge of the intramural sports program so that they
would have sort of a running history of people registering and they could easily reply to them as well let me go into
froshims version 5 which I pre-created here and let me go ahead and open up say uh app.py this time and this is some
code that I wrote in advance and it looks a little scary at first glance but I've done the following I have now added
the Flask-Mail library to the picture by adding Flask-Mail to requirements.txt and running a command to automatically
install email support for flask as well and this is a little bit cryptic but it's honestly mostly copy paste from the
documentation what I'm doing here is I'm configuring my flask application with a few configuration variables if you will
this is the Syntax for that app.config is a special dictionary that comes with flask that is automatically created when
you create the app up here on line 9 and I just had to fill in a whole bunch of configuration values for the default
sender address that I want to send email as the default password I want to use to send email the port number the TCP Port
that we talked about last week the mail server I'm going to use Gmail smtp.gmail.com server use TLS this means
use encryption so I set that to True mail username this is going to grab it from my environment so for security
purposes I didn't want to hardcode my own Gmail username and password into the code so I'm actually storing those in
what are called environment variables you'll see more of these in problem set 9 and it's a very common convention on a
server in the world to store sensitive information in the computer's memory so that it can be accessed when your
website is running but not in your source code it's way too easy if you put credentials sensitive stuff in your
source code to post it to GitHub or to screenshot it accidentally or for information to leak out so for today's
purposes know that the os.environ dictionary refers to what are called environment variables and this is like
an out-of-band a special way of defining key value pairs in the computer's memory by running a certain command but that
never show up in your actual code otherwise there would be so many usernames and passwords accidentally
visible on the internet so I've installed this in advance let me see if I can do this correctly let me go over
to another tab in just a moment and here I have on my second screen here John Harvard's inbox it's currently empty and
I'm going to go ahead and register for some Sport AS John Harvard here hopefully so let me go ahead and run
flask run on this version five let me go ahead and reload the main screen not that one let me reload the
main screen here this time clearly I'm asking for name and email so name will be John Harvard jharvard@cs50.harvard.edu
uh he'll register for how about soccer register and if I did this correctly not
only is John Harvard on his screen seeing you are registered but when he checks his email
on this other screen Crossing his fingers that this actually works as a
demonstration and I promise it did right before [Music]
class horrifying I don't think there's a mistake this
time let me try something over here real quick quick but I don't think this is broken it wouldn't have said success if
it were I just tried submitting again so I just did another you are
registered oh I'm really sad right [Music] now what's
that I could check spam but then it's not sure we want to show spam here on the internet that every one of us
gets oh maybe
oh thank you okay wow that was a risky click I worried all right so you are registered
is the email that I sent out and it doesn't have any actual information in it but back in the day it would have
because I included like the students name and their dorm and all the other fields of information that we asked for
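The configuration described a little earlier, where credentials come from os.environ rather than from the source code, might look roughly like the sketch below. It uses only Python's standard library. The key names and the port number are assumptions made for illustration and are not taken from the lecture's actual app.py.

```python
import os


def load_mail_config(env=os.environ):
    """Build mail settings from environment variables instead of source code.

    The key names and the port are assumptions for illustration; they are
    not read from the lecture's actual app.py.
    """
    return {
        # Secrets come from the environment, so they never appear in the repo.
        "MAIL_DEFAULT_SENDER": env["MAIL_DEFAULT_SENDER"],
        "MAIL_USERNAME": env["MAIL_USERNAME"],
        "MAIL_PASSWORD": env["MAIL_PASSWORD"],
        # Non-secret settings can be written directly in the code.
        "MAIL_SERVER": "smtp.gmail.com",
        "MAIL_PORT": 587,  # assumed: the usual port for SMTP with STARTTLS
        "MAIL_USE_TLS": True,
    }
```

Indexing with square brackets means a missing variable raises KeyError when the app starts, which tends to be easier to notice than a login failure later on.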
so let's just take a quick look at how that code might work um I did have to configure Gmail in a certain way to
allow what they call less secure apps using SMTP which is the protocol used for outbound email but besides setting
these things let's look at the register route down here it's actually pretty straightforward in my register route I
validated the submission just like before nothing new there I then confirmed the registration down here
nothing new there all I did was use two new lines of code and it's this easy to automate the sending of emails I
apparently have done it too many times which is why it ended up in spam I created a variable called message I used
a Message function that I must have imported higher up so we'll go back to that here's apparently the subject line
is the first argument and the second argument is the uh named parameter recipients which takes a list of emails
that should get the confirmation email so in brackets I just put the one user's email and then mail.send that message
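As described, the library's Message takes a subject first and a list of recipients. The sketch below is not the lecture's code and does not use Flask-Mail. It is a framework-free illustration, using Python's standard email package, of the kind of message a mail library assembles before handing it to an SMTP server. The addresses and the function name build_confirmation are placeholders.

```python
from email.message import EmailMessage


def build_confirmation(sender, recipients, subject="You are registered!"):
    """Assemble a plain-text confirmation email for a list of recipients."""
    message = EmailMessage()
    message["Subject"] = subject
    message["From"] = sender
    # Several recipients are joined into a single comma-separated header.
    message["To"] = ", ".join(recipients)
    message.set_content("You are registered!")
    return message


# Placeholder addresses; nothing is sent over the network here.
message = build_confirmation("sports@example.com", ["jharvard@example.com"])
print(message["To"])
```

Actually sending it would still need an SMTP connection and credentials, which is the part the mail library and the configuration values take care of.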
so let's scroll back up to see what message uh and what mail actually is mail I think we saw yep mail is this
which I have as a variable because I followed the documentation for this Library you simply configure your
current app with mail support capital M here and if you look up here now on line seven here's the new library from flask_mail
I imported capital Mail capital Message so that I had the ability to create a message and send a mail so such
a simple thing whether you want to confirm things for users you want to do password resets it can be this easy um
to actually generate emails provided you have the requisite access and software installed and just to make clear that I
did add something here let me open up my requirements.txt file and indeed I have both Flask and Flask-Mail ready
to go but I ran the command in advance to actually do that all right any questions then on these examples
here no all right so what other pieces might actually remain for us let me flip over here it turns out that a key
component of most any web application nowadays that we haven't touched on yet but it'll be one of our final flourishes
today is the notion of a session and a session is actually a feature that derives from all of the basics we talked about
today and last week and a session is the technical term for like what you and I know as a shopping cart when you go to
amazon.com and you start adding things to your shopping cart they follow you from page to page to page Heck if you
close your browser come back to the next day they're typically still in your shopping cart which is great for Amazon
because they want your business they don't want you to have to like start from scratch the next day similarly when
you log into any website these days even if it's not an e-commerce thing but it has usernames and passwords you and I
are not in the habit of logging into every darn page we visit on a website typically you log in once and then for
the next hour day week year you stay logged into that website so somehow the website is remembering that you have
logged in and that is being implemented by way of this thing called a session and perhaps a more familiar term that
you might know and worry about called cookies let's go ahead and take one more 5 minute break here and when we come
back we'll look at cookies sessions and these final features all right so the promise now is
that we're going to implement this notion of a session which is going to allow us to like log users in and keep
them logged in and even Implement things like a shopping cart and the overarching goal here is to build an application
that is quote unquote stateful Again State refers to information and something that's stateful remembers
information and in this context the curiosity is that HTTP is technically a stateless protocol like once you visit a
URL http colon slash slash something hit enter the web page is downloaded to your browser like that's it like you can unplug from the
internet you can turn off your Wi-Fi but you still have the web page locally and yet we somehow want to make sure that
the next time you click on a link on that website it doesn't forget who you are or the next thing you add to your
shopping cart it doesn't forget what was already there so we somehow want to make HTTP stateful and we can actually do
this using the building blocks we've seen thus far so concretely here's like a form you might see occasionally but
pretty rarely when you log into Gmail right and I say kind of rarely because most of you don't log into Gmail
frequently you just stay logged in pretty much endlessly in your browser and that's because Google has made the
conscious choice to give you a very long session time maybe a day a week a month a year cuz they don't really want to add
friction to using their tool by making you log in every darn day by contrast there are other applications on campus
including some of CS50's own that make you log in every time because we want to make sure that it's indeed you accessing
the site and not a roommate or friend or someone maliciously so once you do fill out this form how does Google
subsequently know that you are you and when you reload the page even or open a second tab for your same Gmail account
how do they know that you're still David or Carter or Emma or someone else well let's look underneath the hood of what's
going on when you log into Gmail essentially you initially see a form like this using a get request and the
website responds like we saw last week with some kind of HTTP response hopefully 200 okay with the form meanwhile the
website might also respond with an HTTP header that last week we didn't care about this week we now do whenever you
visit a website it is very commonly the case that the website is putting a cookie on your computer and you may
generally know that cookies can be bad and they kind of track you in some way and that's both a a a blessing and a
curse without cookies you could not Implement things like shopping carts and logins as we know them today
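At the level of HTTP, a cookie is just a header sent in each direction. Here is a minimal sketch of that exchange using Python's standard http.cookies module. The cookie name session and the 16-byte token length are assumptions chosen for illustration, and no real network traffic is involved.

```python
import secrets
from http.cookies import SimpleCookie

# Server side: pick a large random value and ask the browser to store it.
token = secrets.token_hex(16)  # 32 hex characters, infeasible to guess
outgoing = SimpleCookie()
outgoing["session"] = token
set_cookie_header = outgoing.output()  # "Set-Cookie: session=<token>"

# Browser side: on each later request the same value comes back in a
# "Cookie:" header, which the server parses to recognize the visitor.
incoming = SimpleCookie()
incoming.load(f"session={token}")
returned = incoming["session"].value

print(set_cookie_header)
print(returned == token)
```

The server never has to trust anything except that the value it receives matches one it previously handed out.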
unfortunately they can also be used for ill purposes like tracking you on every website and serving you ads more
effectively and so forth so with good comes some bad but the basic primitive for us the computer scientist boils down
to just HTTP headers a cookie is typically a big number a big seemingly random value that a server tells your
browser to store in memory or even longer term store on disk so you can think of it like a file that a server is
planting on your computer and the promise that HTTP makes is that if a server sets a cookie on your computer
your browser will present that same cookie or that same value again on every subsequent request so when you visit a website
like Gmail they plop a cookie on your computer with a Set-Cookie header like this with some session equals value some long random value 1 2
3 ABC or something like that and when you then visit another page on gmail.com or on any other website that has set one you send
the opposite header not set cookie but just cookie colon and you send the exact same value it's similar to going to a
club or an amusement park where you pay once you go through the gates once you get checked by security once and then
they you know very often take a like a little stamp and say okay now you can come and go and then for you efficiency
wise if you come back later in the day or later in the evening you can just present your hand you've been stamped
presumably they've already uh you've already paid you've already been searched or whatnot and so it's this
sort of FastTrack ticket back into the club back into the park that's essentially what a cookie is doing for
you whereby it's a way of reminding the website we've already done this you already asked me for my username and
password this is my pass to now come and go now unlike this hand stamp which can you know kind of be easily copied or
transferred or duplicated or you know kept on over multiple days these cookies are really big seemingly random values
letters and numbers so statistically there's no way someone else is just going to guess your cookie value and
pretend to be you it's just very low probability statistically but this is all it boils down to is this agreement
between browser and server to send these values back and forth in this way so when we actually translate this now to
code let's do something like a simple login app let me go into a folder I made in advance today called login and let me
open up uh app.py and let's take a look in here so what's going on a couple of new things up top if I want to
have the ability to stamp my users' hands virtually and implement sessions I'm going to have to import from flask
support for sessions so this is another feature you get for free by using a framework and not having to implement
all this yourself and from the flask_session library I'm going to import Session capital S why I'm going to
configure the session as follows long story short there's different ways to implement sessions the server can store
the data that goes with these cookies in a database in a file in memory in RAM in other places too we are telling it to store it on
uh the server's hard drive so in fact whenever you use sessions as you will for problem set 9 you'll actually see a
folder suddenly appear called flask_session inside of which are the cookies essentially for any users or friends or
yourself who've been visiting your particular application so I'm setting it to use the file system and I don't want
them to be permanent cuz I want when you close your browser the session to go away they could be made to be permanent
and last much longer then I tell my app to support sessions and that's it for now let's see what this application
actually does before we dissect the code let me go over to my terminal window run flask run and then let me go ahead and
reload my preview URL give it a second to kick back in let me go ahead and open my URL come
on oops let me go ahead too long of a break there we go so this website simply has a login form
there's no password though I could certainly add that and check for that too it just asks for your name so I'm
going to log in as myself David and click login and now notice I'm currently at the /login route but notice this if
I try to go to the default route just slash which is where most websites live by default notice that I magically get
redirected to login so somehow my code knows hey if you're not logged in you're going to /login instead let me type in
my name David and click login and now notice I am back at slash Chrome is sort of annoyingly hiding it but this is the
same thing as just a single slash and now notice it says you are logged in as David log out what's kind of cool is
notice if I reload the page it still knows that if I create a second Tab and go to the same URL it still knows that I
could even um I could keep doing this in multiple tabs it's still going to remember me on both of them as being
logged in as David so how does that work especially when I click log out then I get uh forgotten altogether all right so
let's see how this works and it's some basic building blocks under my slash route notice I have this if there is no
name in the session redirect the user to /login so these two lines together are what implement that automatic
redirection using HTTP 301 or 302 automatically it's handled for me with these two lines otherwise show
index.html all right let's go down that rabbit hole what's in index.html well if I look in uh my let me look look in my
templates folder uh for my login demo and look at templates/index.html
all right so what's going on here I extend layout.html I have a block body and then I've got some other
syntax so we haven't seen this yet but it's more Jinja stuff which again is almost identical to Python if there's a
name in the session variable then literally say you are logged in as curly braces session bracket name and then
notice this I've got a simple HTML link to log out via /logout else if there is no name in the session then it
apparently says you are not logged in and it leads me to an HTML link to /login and then endif so again Jinja
does not rely on indentation recall that HTML and CSS don't really care about indentation only the human does but in
code with Jinja you need these end tags endblock and endfor and endif to make super obvious that you're done with that
thought so session is just this magic variable that we now have access to because we've included these two lines
of code and these that handle that whole process of stamping every user's hand with a different unique identifier if I
made my codespace public and I let all of you visit the exact same URL all of you would be logged out by default you
could all type your own names individually all log in at the same URL using different sessions and in fact I
would then see if I go into my terminal window here and my login directory notice the flask_session directory I
mentioned and if I CD into that and type LS notice that I had two tabs open or actually I think I started the server
twice I have two files in there I would ultimately have one file for every one of you and that's what's beautiful about
sessions is it creates the illusion of per user storage inside of my session is my name inside of your session so to
speak is your name and the same is going to apply to shopping carts ultimately as well let's see how login works here my
login route supports both get and post so I could play around if I want and notice this this login route is kind of
interesting as follows if the user got to this route via post my inference is that they must have submitted a form why
because that's how I'm going to design the HTML form in a second and if they did submit the form via post I'm
going to store in the session at the name key whatever the human's name is and then I'm going to redirect them back
to slash otherwise I'm going to show them the login form so this is what's kind of cool if I go to this login form
which lives at literally /login by default when you visit a URL like that you're visiting it via get and so that's
why I see the form however notice this the form very cleverly submits to itself like the one route login submits to its
same self /login but it uses post when you submit the form and this is a nice way of having one route but for two
different types of operations or views when I'm just there visiting /login via a URL it shows me the form but if I
submit the form then this logic these three lines kick in and this just avoids my having to have both an index route
and a greet route for instance I can just have one route that handles both get and post how about log out what does
this do well it's as simple as this change whatever name is in the session to be none which is Python's version of
like null essentially and then redirect the user back to slash because now in index.html I will not notice a name
there anymore this will be false and so I'll tell the user instead you are not logged in so as much as I want to say
this is simple though I realize there's a bunch of steps involved this is the essence of every website on the internet
that has usernames and passwords and we skipped the password step more on that in problem set nine but
this is how every website out there remembers that you're logged in and how this works ultimately is that as soon as
you use in Python lines like this and lines like this Flask takes care of stamping the virtual hand of all of your
users and whenever flask sees the same cookie coming back from a user it grabs the appropriate file from that folder
loads it into the session Global variable so that your code is now unique to that user and their name
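To make the hand-stamp idea concrete, here is a framework-free sketch of what a session library does on the app's behalf. It is not Flask's implementation: an in-memory dictionary stands in for the session files on disk, and the function names and string return values are invented for illustration.

```python
import secrets

# Server-side storage: one dictionary per visitor, keyed by cookie value.
# An in-memory dict stands in for the files a session library might write.
SESSIONS = {}


def get_session(cookie):
    """Return (cookie, session), issuing a new cookie for unknown visitors."""
    if cookie not in SESSIONS:
        cookie = secrets.token_hex(16)
        SESSIONS[cookie] = {}
    return cookie, SESSIONS[cookie]


def index(cookie):
    """Mimic "/": redirect to /login unless a name is in the session."""
    cookie, session = get_session(cookie)
    if not session.get("name"):
        return cookie, "redirect:/login"
    return cookie, f"You are logged in as {session['name']}."


def login(cookie, method, form=None):
    """Mimic "/login": GET shows the form, POST stores the submitted name."""
    cookie, session = get_session(cookie)
    if method == "POST":
        session["name"] = form.get("name")
        return cookie, "redirect:/"
    return cookie, "login form"


def logout(cookie):
    """Mimic "/logout": forget the name, then send the user home."""
    cookie, session = get_session(cookie)
    session["name"] = None
    return cookie, "redirect:/"


# Two visitors log in and each receives a different cookie value.
david, _ = login(None, "POST", {"name": "David"})
carter, _ = login(None, "POST", {"name": "Carter"})
print(index(david)[1])
print(index(carter)[1])
```

Because every lookup goes through the cookie value, each visitor sees only their own dictionary, which is the illusion of per-user storage described here.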
let's do one other example with sessions here that'll show how we might use these now for shopping carts let me go into
the store example here let me go ahead and run this thing first if I run store in my same Tab and go back over here
we'll see a very ugly e-commerce site that just sells seven different books here but each of these books has a
button via which I can add it to my cart all right well where are these books coming from well let's kind of poke
around let me go into my terminal window again let me go into this example which is called store and let me open up
uh index.ht whoops let's open up how about books.html which is the default one not
index this time so if I look here notice that the route that we just saw uses a for loop in Jinja to iterate over a
whole bunch of books apparently and it outputs in an H2 tag the title of the book and then another one of these forms
so that's kind of interesting let's go back one step let's go ahead and open up app.py because that must be excuse me
what's kicking all of this off notice that this file is importing session support it's configuring sessions down
here but it's also connecting to a store.db file so it's adding some SQLite and notice this in my slash route
I'm selecting star from books which is going to give me a list of dictionaries each of which represents a row in the books table
and I'm going to pass that list of books into my books.html template which is why this for loop works the way it does
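The "list of dictionaries" shape can be reproduced with Python's built-in sqlite3 module. This is a sketch under assumptions: the lecture uses its own database helper and a real store.db, whereas the in-memory database and the two sample titles below are made up.

```python
import html
import sqlite3

# An in-memory database stands in for store.db; the rows are sample data.
db = sqlite3.connect(":memory:")
db.row_factory = sqlite3.Row  # lets each row be converted to a dictionary
db.execute("CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT NOT NULL)")
db.executemany(
    "INSERT INTO books (title) VALUES (?)",
    [("First Book",), ("Second Book",)],
)

# One dictionary per row, with the column names as keys.
books = [dict(row) for row in db.execute("SELECT * FROM books ORDER BY id")]

# Roughly what the template's for loop does with that list.
for book in books:
    print(f"<h2>{html.escape(book['title'])}</h2>")
```

Each dictionary's keys match the column names, which is why a template can refer to a book's title and id by name.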
let's look at this actual database let me increase my terminal window and do sqlite3 of store.db .schema will show
me everything there's not much there it's a table called books with two columns id and title let's do
select star from books semicolon there are the seven books each of which has a unique ID and you might see where this
is going if I go to the UI and I look at each of these buttons for add to cart just like Amazon might have notice that
each of these buttons is just a form and what's magical here just like deregister even though I didn't highlight it at the
time there's another type of input that allows you to specify a value without the human being able easily to change it
instead of type equals text or type equals submit type equals hidden will put the value in the form but not reveal
it to the user so that's how I'm saying that the ID of this book is one the ID of this book is two the ID of this book
is three and so forth and each of these forms then will submit apparently to /cart using post and that would seem to
be what adds things to cart so let's try this let me click on one or two of these let's add the first book add to cart
here's my cart notice my route changed to /cart all right let's go back and let's add the book number two there
we have that one and let's skip ahead to the seventh book Deathly Hallows and now we have all three books here so what
does the cart route do at /cart well let's look if I go back to my terminal window look at app.py and look at /cart
okay there's a lot going on here let's see so the /cart route supports both get or post which is the
nice way to consolidate things into one URL all right this is interesting if there is not a quote unquote cart key in
session we haven't technically seen the syntax but long story short these lines here do ensure that the cart exists what
do I mean by that it makes sure that there's a cart key in the session Global variable and it's by default going to be
an empty list why that just means you have an empty shopping cart but if the user visits this route via post and the
user did provide an ID they didn't muck with the form in any way and like try to hack into the website they gave me an ID
then I'm going to use this syntax if session bracket cart is a list recall from a couple of weeks ago that append just
adds something to the list so I'm going to add the ID to the list and return the user to cart otherwise if the user is at
/cart via get implicitly we just do this select star from books where ID is in and this might be syntax you recall
from pset 6 it lets you look for multiple IDs all at once because if I have a list of IDs in my cart I
can get all of those books at once so long story short what has happened here I am storing in the cart the books that
I myself have added to my cart my browser is sending the same hand stamp again and again which is how this
website knows that it's me adding these books to my cart and not you or not Carter or not Emma indeed if all of us
visited the same long URL and I made it public and allowed that then we would all have our own illusions of our own
separate carts and each of those carts in practice would just be stored in this flask
session directory on the server so that the server can keep track of each of us using again these cookie values that are
being sent back and forth via these headers all right I know that's a lot but again it's just the new python way
of just leveraging those HTTP headers from last week in a clever way any questions before we look at one final
set of examples yeah so I think you're asking about using the
get and post in the same function so this is just a nice uh aesthetic if you will if I had to have separate routes
for get and post I mean it literally might mean I need twice as many routes in my file and that just starts to get a
little Annoying um and these days too in terms of user experience this is you know maybe only appeals to The Geek in
us but like having clean URLs is actually a thing like you don't want to have lots of words in the URL it's nice
if the URLs are nice and succinct and canonical if you will so it's nice if I can centralize all of my shopping cart
functionality in /cart only and not in multiple routes one for get one for post it's a little you know nitpicky of
me but this is commonly done so what this code here means is that this route this function henceforth will
support both get requests and post requests but then I kind of need to distinguish between whether it's get or
post coming in because if it's a get request I want to show the cart if it's a post request I want to update the cart
and the simplest way to do that is just to check this value here in the request variable that we imported from flask up
above you can check what is the current type of request is it a get is it a post or is it something else
altogether there are other verbs if it's a post that must mean because I created the web form that uses post that the
user clicked the add to cart button otherwise if it's not post it's implicitly going to be logically get
then I just want to show the user the contents of the cart and I use these lines instead so it's just one way of
avoiding having two routes for two different HTTP verbs you can combine them so long as you have a check like
this if I really wanted to be pedantic I could do this uh elif request.method equals get this would be
more symmetric but it's not really necessary because I know there's only two
possibilities hope that helps all right let's do one final set of examples here that's going to tie the
last of these features together to something that you probably see quite often in real world applications and
that For Better or For Worse is now going to involve tying back in some JavaScript from last week the goal at
hand of these examples is not to necessarily Master how you yourself would write the python code the SQL code
the JavaScript code but just to give you a mental model for how these different languages work so that for final
projects especially if you do want to add JavaScript functionality much more interactive user interface you at least
have like the bare bones of a mental model for how you can tie these languages together even though our Focus
generally has been more on Python and SQL than on JavaScript from last week let me go ahead and open up uh an
example called shows version zero of this and let me do flask run and let me go into my URL here and see what this
application looks like by default this has just a simple query text box with a search button let's take a look at the HTML
that just got sent to my browser all right there's not much going on here at all so there's a form whose action is
/search it's going to submit via get it's going to use a q parameter just like Google it seems and submit it so this
actually looks like the Google form we did last week so let's see what what goes on here let me search for something
like cat enter okay so it looks like all right so this is actually a somewhat familiar file what I've gone ahead and
done is I've grabbed all of the titles of TV shows from a couple of weeks ago when we first introduced SQL and I
loaded them into this demo so that you can search by keyword for any word you want I just searched for cat if we were
to do this again with dog we would see all the titles of TV shows that contain dog as a substring somewhere and so
forth so this is a traditional way of doing this just like in Google it uses /search?q=cat or
q=dog and so forth how does that work well let's just take a quick look at app.py here let me go into my uh zero
example here shows0 and open up app.py and see what's going on all right very simple here's the form that's kind
of how we started today and here is the /search route well what's going on here this gets a little interesting so I
first select a whole bunch of shows by doing this select star from shows where title like question mark and then I'm
using some percent signs from SQL on both the left and the right and I'm plugging in whatever the user's input
was for q if I didn't use like and I used equal instead I could get rid of these uh percent
signs but then it would have to be a show called cat or called dog as opposed to it being like cat or like dog this
whole line returns to me a list of dictionaries Each of which represents a show in the database and then I'm
passing all of those shows to a template called search.html so let's just follow that breadcrumb let's open up shows uh
sorry search.html all right so this is where templating gets kind of cool so I just passed back hundreds of results
potentially but the only thing I'm outputting is an unordered list and using a Jinja for loop an li tag
containing the titles of each of those shows and just to prove that this is indeed a familiar data set and I
actually simplified it a bit if I look at shows.db with SQLite I threw away all the other stuff like ratings and
actors and everyone else and I just have for instance select uh select star from shows limit 10 just so we can see 10 of
them there's 10 of the shows from that database so that's all that's in the database itself so it would look like
this is a pretty vanilla web application it uses get it submits it to the server the server spits out a response and that
response then looks like this which is a huge number of Li tags one for each cat or one for each dog match but everything
else comes from a layout.html all the stuff at the top and at the bottom all right so these days though we're in the
habit of seeing autocomplete and you start typing something and you don't have to hit uh submit you don't have to
click a button you don't have to go to a new page web applications nowadays are much more dynamic so let's take a look
at this version one of this thing let me go into shows1 and close my previous tabs and run flask run in here
and it's almost the same thing but watch the behavior change a little bit I'm reloading the form there's no button now
so gone is the need for a submit button I want to implement autocomplete now so let's go ahead and type in c okay
there's every show that starts with c or has c in it rather a there's every show that has ca in it t there's every show with cat in
it I can start it again and do dog but notice how instantaneous it was and notice my URL never changed there's no
/search route and it's just immediate like with every keystroke it is searching again and again and again
that's kind of a nice ux user experience because it's immediate this is what users are used to these days but if I
look at the source code here notice that in the source code there's just an empty UL by default but there is some fancy
JavaScript code so let's see what's going on here this JavaScript code is doing the following uh let me zoom in a
little bit more this JavaScript code is first selecting with querySelector which you
used this past week quote unquote input all right so that's just getting the text box then it's adding an event
listener to that input for the input event we didn't talk about this last week but literally when you provide any
kind of input by typing by pasting by any other user interface mechanism it triggers an event called input so similar
to key press or key up I then have a function no worries about this async function for now then what do I do
inside of this all right so this is new and this is the part that let's just focus on the ideas and not the syntax
JavaScript nowadays comes with a function called fetch that allows you to get or post information to a server
without reloading the whole page you can sort of secretly do it inside of the page what do I want to fetch /search?q= followed by
whatever the value of that input is when I get back a response I want to get the text of that
response and store it in a variable called shows and I'm deliberately bouncing around ignoring special words
like await and await here but for now just focus on what came back a response came back from the server I'm getting
the text from it storing it in a variable called shows what am I then doing I'm using querySelector to select my ul
which is empty by default and I'm changing its innerHTML to be equal to the shows that came back from the server
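The server side of that fetch is not spelled out at this point, so here is a minimal Python sketch of what a /search route's logic might look like, assuming it runs a LIKE query with percent-sign wildcards and responds with just the list items that the page then places into its ul. It uses only the standard library; the in-memory table, the sample titles, and the function names make_demo_db, search and render_items are hypothetical stand-ins, not the lecture's actual code.

```python
import sqlite3
from html import escape


def make_demo_db():
    """Build a tiny in-memory stand-in for shows.db (the rows are made up)."""
    db = sqlite3.connect(":memory:")
    db.row_factory = sqlite3.Row
    db.execute("CREATE TABLE shows (id INTEGER PRIMARY KEY, title TEXT)")
    db.executemany(
        "INSERT INTO shows (title) VALUES (?)",
        [("Cat People",), ("Dog Days",), ("The Catch",), ("Office Hours",)],
    )
    return db


def search(db, q):
    """Return shows whose title contains q, as a list of dictionaries."""
    if not q:
        return []
    # The % wildcards make this "like cat" rather than "exactly cat".
    rows = db.execute(
        "SELECT id, title FROM shows WHERE title LIKE ? ORDER BY id",
        ("%" + q + "%",),
    )
    return [dict(row) for row in rows]


def render_items(shows):
    """Render only the li tags: partial HTML, with no ul, head or body."""
    return "".join(f"<li>{escape(show['title'])}</li>" for show in shows)


if __name__ == "__main__":
    db = make_demo_db()
    # Simulate the request that fires on each keystroke of "cat".
    for typed in ("c", "ca", "cat"):
        print(typed, "->", render_items(search(db, typed)))
```

Passing the user's text as a ? placeholder, rather than pasting it into the SQL string, is the design choice worth copying here.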
so let's poke around here's where again developer tools are quite powerful let me go ahead and reload this page to get
rid of everything and let me now open up inspect let me go to the Network tab and let's just sniff the traffic going
between my browser and server I'm going to search for C notice that immediately triggered an HTTP request to
/search?q=c so I didn't even finish my cat thought but notice what came back a bunch of
response headers but let's actually click on the raw response this is literally the response
from the server just a whole bunch of li tags no ul no html no title no body nothing just li tags and we can actually
simulate this let me manually go to that same URL /search?q=c enter we are just going to get back
this stuff which if I view source it's not even a complete web page the browser
is trying to show it to me as a complete web page with bullets but it's really just partial HTML but that's perfect
because this is literally what I essentially want my Python code to copy paste into the otherwise empty ul tag
and that's what this JavaScript code then here is doing once it gets back that response from the server it's using
these lines of code to plug all of those li's into the ul after the fact again changing the so-called DOM but there's a
slightly better way to do this because honestly this is not the best design because if you've got a hundred shows or
more you're sending all of these tags unnecessarily like why do I need to send all of these stupid HTML tags why don't
I just create those when I'm ready to create them well here's the final flourish whenever making a web
application nowadays where client and server keep talking to one another Google Maps does this Gmail does this
like literally every cool application nowadays you load the page once and then it keeps on interacting with you without
reloading or having to change the URL let's actually use a format called JSON JavaScript Object Notation which is to
say there's just a better more efficient better designed way to send that same data I'm going to go into shows two now
and do flask run and I'm going to go back to my page here the user interface is exactly the same and it still works
exactly the same here's C CA CAT and so forth but let's see what's coming back now if I go to /search?q=cat
enter notice that I get this crazy looking syntax but the fact that it's so compact is
actually kind of a good thing this is actually going to let me format it a little nicer well or a little worse this
is what's called JavaScript Object Notation in JavaScript a square bracket means here comes an array in
JavaScript a curly bracket says here comes an object AKA a dictionary and you might
recall that you can now have keys and values in JavaScript notation using colons like this so long story
short cryptic as this is to you and me and not very human friendly it's very machine friendly because for every title
in that database I get back its ID and its title its ID and its title its ID and its title and this is a very generic
format that an API an application programming interface might return to you and this is how APIs nowadays work you
get back very raw textual data in this format JSON format and then you can write code that actually
programmatically turns that JSON data into any language you want for instance HTML so here's the third and final
version of this program I again select my input I again listen for input I then when I get input call this
function I fetch /search?q= whatever that input was C or CA or CAT I then wait for the response but instead
of getting text I'm calling this other function that comes with JavaScript these days called json that just parses
that it turns it into a dictionary for me or really a list of dictionaries for me and stores it in a variable called
shows and this is where you start to see the convergence of HTML with JavaScript let me initialize a variable called html
to nothing quote unquote using single quotes but I could also use double quotes this is JavaScript syntax for a
loop let me iterate over every ID in the shows list that I just got back from the server that big chunk of JSON data let
me create a variable called title that's equal to the title of the show at that ID but for reasons we'll come
back to let me replace a couple scary characters then let me dynamically add to this variable an li tag the actual
title and a close li tag and then very lastly after this for loop let me update the ul's innerHTML to be the HTML I just
created on the fly so in short don't worry too much about the syntax because you won't need to use this unless you
start playing with more advanced features uh quite soon but what we're doing is with JavaScript we're creating
a bigger and bigger and bigger string of HTML containing all of the open brackets the li tags the close brackets but
we're just grabbing the raw data from the server and so in fact in problem set 9 you're going to use a real world
third-party API application programming interface for which you sign up the data you're going to get back from that API
is not going to be show titles but actually stock quotes and stock ticker symbols and the prices at
which stocks were last bought or sold and you're going to get that data back in JSON format and you're going to write
a bit of code that's then going to convert that to the requisite HTML on the page so the final result here is
literally the kind of autocomplete that you and I see and take for granted every day and that's ultimately how it works
HTML and CSS are used to present the data your so-called view python might be used to actually send or get the data on
the backend server and then lastly JavaScript is going to be used to make things Dynamic and interactive so I know
that's a whole bunch of building blocks but the whole point of problem set 9 is to tie everything together set the stage
for hopefully a very successful final project why don't we go ahead and wrap up there and we'll see you one last time
next week for emoji [Music] all
right this is CS50 and this is week 10 our very last together and before we dive in today just wanted to
acknowledge how much work we know this course is for everyone we know there's still a tad bit of work
remaining but we do hope ultimately that you're really proud of what you've pulled off over the past few months only
and indeed the final project whatever it is you end up building really is meant to be this Capstone where you're finally
standing on your own there's no distribution code there's not really a specification and really just an
opportunity to take all this knowledge out now for a spin and we do hope it serves you well longer term before we
dive in just want to offer a number of thanks to so much of the team that helps out behind the scenes in
particular um the Memorial Hall team our hosts here who make all of the space and the activities behind the scenes
possible the education Support Services team who helps with audio and video and more and then especially cs50's own team
all here in the darkness helping out in front of the camera behind the camera if we could a huge round of applause for
everyone that makes this possible you might have noticed um that these have been unusual times and we've had
some unusual guests in the front of the room here since we weren't sure what to expect early on as to just what
protocols would be on campus and so we have of course all of these plush figures behind the scenes who have been
helping out uh behind the camera behind the monitors and so forth um and what many of you'll see if you've been
watching right now or in the future these videos online you'll see a lot of backs of heads so that there's a little
bit of characteristic to some of the shots that we have here um but this is actually born of an inspiration that
comes from who will ultimately be today's special guest Jennifer 8 Lee in fact whom we'll meet in just a little
bit um was ultimately uh the good friend of the class that inspired this tradition of using puppetry in some form
in the class here um what I see down below is a shot like this here um and funny enough it seems that with machine
learning what it is nowadays artificial intelligence so to speak on social media and the like like literally no joke I
pulled up Twitter earlier today and among my suggestions for whom I should follow were literally the suggestions
here this is perhaps not surprising though because some weeks back I actually started following
Count von Count whom you might remember from Sesame Street if you're not following him already this is an amazing
account to follow an actual count to follow and it's actually an amazing use of programming so this account
joined in April of 2012 it's got 198,000 followers as of today and what it's been doing for like 9 plus
years is tweeting out a number one per day this morning's was 3,327
yesterday's was 3,326 and so presumably someone's just written a program Python
or something else that's just generating these tweets once a day even more amusing though is that like every tweet
for the past nine years has like 20 or 30 comments on it from people who are following it so perhaps consider
following this same account and the same application of CS as well wanted to also thank CS50's team behind the cameras you
might recall uh the teaching fellows um last year in particular when everything was on Zoom kindly put together this
visualization of TCP/IP and the passing of messages among routers and in turn computers for instance from
Phyllis at bottom right to Brian at top left uh just wanted to thank the team but also reveal to you all that uh these
takes were not perfect by any means and in fact here's just 60 seconds or so of outtakes of us trying to get data from
point A to point B [Music] all right if we could too a round of
applause for all the teaching fellows teaching assistants and course assistants who make the course
possible as well before we now do a bit of review of the semester thought we'd take first a higher level view of where
we've come from recall of course from the syllabus and literally week zero we claimed this that what ultimately matters
in this course is not so much where you end up relative to classmates but where you end up relative to yourself when you
began and we really do mean that there are certainly classmates of yours who have been programming since they were 10
years old but there are two-thirds of your classmates for whom that was not in fact the case and so behind you in front of
you to the left and to the right today are so many classmates who have had a very shared experience with you but the
only person that really matters at the end of the day in terms of how you've progressed in this class truly is where
you in fact began and I realize that with CS and especially this course and with programming assignments especially
it can feel like week after week that you're not really making progress because it might feel like you're
struggling every darn week but that's just really because we kind of keep moving the bar higher and higher pushing
the Finish Line a little further and further ahead because think back to like week one when this for instance whoops
when this alone was hard and you were just trying to get Mario to ascend a pyramid that might look a little
something like this or the week after when you started dabbling with readability two weeks after Mr and Mrs
Dursley of number four Privet Drive and so forth trying to analyze just how complex a sentence like that was and
manipulating strings and characters for the first time and then of course we progressed to deeper dives into
algorithms and actually implementing something that's all too real world these days and implementing electoral
algorithms in a few different forms dabbling thereafter in a bit of forensics a bit of imagery and taking
images like this here and filtering it in a number of ways ultimately understanding hopefully how these things
are implemented underneath the hood so that henceforth when all you're doing is tapping an icon on your phone or
clicking a command on your computer you can infer even if you didn't write that particular code how the thing is likely
working and even if you had started to get your footing then around week four then things escalated quickly further to
data structures but recall for your spell checker you implemented a fairly sophisticated data structure known as a
hash table and even if you struggled to get that working again think back five weeks prior you were just
trying to get for loops to work and variables to work and so with each week realize there was significant progress
and then if you aggregate all these most recent weeks with python and SQL HTML JavaScript and CSS I mean you built your
very own web application and many of you will go on and build something grander for your own final project or Focus
again on C or on python alone or the like but ultimately aggregating all of these Technologies and kind of stitching
together something that you yourself created we might have kind of put some of the foundation there in place but
the end result ultimately is yours so at the end of the day as we promised in week zero this course is really about
computational thinking cleaning up your thought process getting you to think a little more logically more methodically
and to express yourself just as logically and methodically but it's also about in some form critical thinking and
at the end of the day what computer science is is really just taking input producing output ideally correct output
and all the hard stuff is in the middle there but what we do hope you have in your toolkit so to speak is all the
more of a mental model all the more of an understanding of like first principles from which you can derive new
outputs new conclusions based on those inputs and certainly today right there's so much misinformation or miseducation
in the world and just being able to take input and produce proper output in and of itself is a compelling skill and indeed
when you all find yourselves invariably in engineering positions where you're asked to build something because
you now can or perhaps you're in a managerial role where you decide you should build something because you know
people who can I would also start to consider even though the past 10 plus weeks have all been about build this
because we asked you to to really start to consider whether it's for fun for profession for political purposes or the
like should you build something and actually considering now that you have this skill how you can use it most
responsibly and not just make a website do something or make an app do something because it can be done but really start
to ask and ask of others like should we be doing this it's just a skill that you can but don't necessarily have to use
now when it comes to writing some actual code keep in mind that you might continue to evaluate or your employer or
your colleagues might continue to evaluate your code along these same axes these are not cs50 specific correctness
does it do what it's supposed to do design like how well qualitatively is it implemented and then style how
readable is it how pretty is it and these three axes should really guide all of your thinking whether it's for a test
or a project or an open- source project or the like like all three of these things really matter and so if you're in
the mindset of wondering oh do I have to worry about style for this do I have to comment this like the answer is always
yes this is what it means to be a good programmer a good engineer to optimize these kinds of axes now what about sort
of those tools in the toolkit well let's focus on just a couple here uh full circle at the end of the semester
abstraction recall was one of the tools in the toolkit that we proposed is all about taking like complicated problems
complicated ideas and simplifying them to really the essence so you can focus on really just what matters or what
helps you get real work done and then related to that was also this notion of precision even as you abstract things
away you still have to be super precise when you're writing code for a computer or just giving instructions to another
human so that they are implementing your ideas your algorithms correctly and sometimes these two goals abstraction
and precision can rather be at odds with one another and what we thought we'd do is give everyone a sheet of paper
today which you probably received on the way in if not a pen as well if you didn't receive hopefully you or a friend
near you has a sheet of paper and a pen or pencil do go ahead and grab that and we thought we'd uh come full circle to
and see if we can't get a brave volunteer to come up on the stage here and we just need someone to give some
stage directions all right I like it when people start pointing and pointing how about you being pointed at yes yes
you yes come on down well there'll be one more opportunity after this come on down what's your name
Claire Claire okay round of applause for Claire for being so enthusiastic come on over here would you
like to make a quick introduction to the group yeah hey I'm Claire uh yeah that's all you
need to know about me all right so what I'm about to hand Claire is a sheet of paper that has a drawing on it and the
goal at hand is for you all to ultimately follow Claire's hopefully very precise instructions because she's
going to give you step-by-step instructions an algorithm if you will for drawing something on that sheet of
paper all right we're going to keep it in this uh manila envelope so that folks can't see through it but this is what we
would like you to give verbal instructions to the audience to draw and you can say anything you want but you
may not make physical hand gestures or the like and or dip it down so everyone can see it oh that's so true that's so
true all right go ahead step one wait I could say whatever I want related to this problem yes
IA oh my God give them instructions for recreating this picture on their paper okay start with uh
like like um a a square but but it's no hand gestures okay okay
sorry sorry start with a square but it's like a diamond kind of like there's a point on
Top wait I should not be the one doing this okay so it's like a square but yeah start with a a
square okay step two step two is that on one of the sides of the square there's another
square doing really well on the abstraction I don't feel like I'm doing too hot okay does this affect
my grade in any way no no okay yes go on two squares and then there's like another
square but they're like not squares they're like kind of slanted um there's another Square in
between like next next to those squares connecting those squares okay any step four step four is that it should look
like a [Laughter] cube okay so let's go ahead and pause
here pause here let's let's thank Claire for coming on up bravely I'll take this if uh let's go ahead and collect
just a few of these if maybe Carter and Valerie you wouldn't mind helping me grab just a few sheets of paper if you'd
like to volunteer what it is you drew in those seconds just hand it over if you would like no need for a name or
anything like that okay all right very eager thank you okay thank
you all right thank you thank you okay sorry okay that that's that's
plenty let's come on up if you want to oh you want to hand me yours too okay sorry to reach all right so Carter if
you want to meet me up on stage for a second so we have a whole bunch of submissions here that represent what it
was Claire was describing let me go ahead and uh just project here in a moment use my camera
so here we have one let's see Carter feel free to just bring those on up here okay so here we have one I'll hold
up all right so some squares overlapping started to look more like a cube thank you so much uh here maybe in more
primitive form was another one this one kind of started to have wheels which was kind
of and then things started to take shape perhaps at the very end both Big Cube and small cube what it was that Claire
was showing us now if we project it was in fact this and it's actually exactly what Claire you just went through is
actually a perfect example of like why abstraction can be hard and where the line is when you're just trying to
communicate instructions So In fairness might have been nice to just start with we're going to draw a cube and like
here's how because that was kind of a spoiler at the end but that too a cube is an abstraction but it's not very
precise right like how big is the cube at what angle is it rotated how are you looking at it and so when you were
struggling to describe these squares but no they're kind of like diamonds or whatnot I mean that's because of this
tension between what it is you're trying to abstract but what it is you're trying to communicate you could have gone maybe
the complete other direction and maybe have been super precise and not abstract this thing away as a cube but say to
everyone all right everyone put your pen down on the paper now draw a diagonal line to uh say Southwest at 45° now do
another one that's south you could really get into the weeds and tell people to go up down left right of course it
could get a little tricky if they sort of follow the direction incorrectly but it would be hard for us all to know what
it is we're drawing If all we're hearing are these very low-level instructions but that's what you're doing when you're
writing code you might Implement a function called Cube how it works is via those low-level instructions but after
that you just don't care you'd much rather think about it as a cube function maybe with some arguments that speak to
the size or the rotation of it or the like and that's where again abstraction can come in so as we've discussed for so
many weeks now these tradeoffs were manifest even in week zero even if we didn't necessarily put our finger on it
just then why don't we do things in a slightly different direction if we could get one other volunteer okay come on
down I saw your hand first one other volunteer who this time we're going to give the pen
to we're going to give the pen to and what's your name Jonathan come on up so I'm going to make this screen be
drawable in just a moment but what we need you to do first on the honor System is close your eyes all right eyes are
closed everyone else in the audience is about to see the picture that we want you to draw and you all the audience are
going to give Jonathan the step-by-step instructions this time around so eyes stay closed this is what we're going to
want Jonathan to draw so kind of ingrain it in your mind if you need a refresher we can have him close his eyes again but
that's what we want him to draw I'm going to go back to the blank screen all right Jonathan you can open your eyes we
have a blank canvas and now step one what would you like Jonathan to draw first draw a circle I
heard okay it's a little smaller I'm hearing now okay you can move it no don't do that all right let's
we'll do we'll give you one redo use three fingers to delete everything uh three fingers all together yep there we
uh further apart there we go no this back okay I'll do this part yeah okay all right so I heard thank you I heard
draw a circle would anyone like to finish the sentence more precisely a smaller
circle on top medium a medium-size circle at the top all right that's pretty good medium-size circle at the
top and no more deleting after this good all right step two a line straight down
[Music] [Applause] yeah okay good all right that was step
two nicely done what's that step [Music] three line draw a line down from the
bottom to the [Applause] left and
okay good all right next let's go over here next one same
same thing but on the right yes all right uh that's what one two 3 four step
five step yes step five do that again but higher closer to the circle on the right
[Applause] side all we're going to have to go with it step six step six
draw line starting from the neck draw a line down and to the
right you don't like that he's he's what do you want him to do step [Music]
six can't no one do where the other line ends say again where the other line
[Music] and near the vertical line where the other line
ends draw a line that goes down okay couple more Steps step seven seven draw an A horizontally slanting
line from the end of the line you just drew Diagon
[Music] diagonally okay we're resorting to hand gestures now but I think that's what you
mean yes okay good good good all right one or two final steps let's get as close as we
can say hi make him say hi [Applause]
[Music] okay [Music]
hi okay and maybe one final step we'll give him one more say
again [Music] again put one of those lines from hi
to the [Music] circle a line between hi and the
circle all right let's show Jonathan that's pretty darn close let's show him what we had in mind was
this so a round of applause for Jonathan too if we could thank you a bigger round of applause for Jonathan if we
could all right so I mean this is actually there is this thing in computer science known as pair
programming where you actually program with someone else and that's actually not all that dissimilar trying to communicate
your ideas to someone else but notice just all of the ambiguities and it certainly doesn't help that we're in a
big space but all the ambiguities that arise when you're just trying to convey something precisely so this is not
necessarily as constrained as a program would be but it's representative at the end of the day even after all these weeks
this stuff is hard and in fact it's not necessarily um ever going to be completely straightforward because the
problems you're going to try solving down the road presumably if you continue to apply these skills themselves are
just going to get more and more sophisticated but hopefully the the feeling you get from accomplishing
something as a result is just going to rise with them as well before we now do a bit of review just wanted to offer a
few suggestions and an answer to an FAQ which is like what do I do after a class like CS50 typically about half of you
will go on and take one or more other classes in CS which is great building on this kind of foundation and about half
of you will not like this will be it but very likely certainly given how the world is trending will you have
opportunities in the arts humanities social sciences or beyond to just apply programming to data sets to problems in
those own domains and so toward that end we would encourage you to start thinking about how you can transition from what
has been your codespace in the cloud to something client side like using your own Mac and PC here on out so
that you're not reliant on a course's infrastructure a particular website and even though we used a fairly industry
standard tool you can actually get almost all of that stuff running with some effort perhaps on your own Mac and
PC so terminal windows actually come built into macOS if you go to your Applications folder Utilities there is a
program literally called Terminal that has always been there even if you've never used it that will behave
very similar to what VS Code's does as well in the world of Windows you can similarly install a version of the
terminal window software that we used in the cloud too to actually run similar commands like cd and ls and much
more we would encourage you ultimately to learn Git you've been indirectly using Git this semester when you run
certain commands we have been using Git underneath the hood of some of CS50's tools that essentially push your code so
to speak to the cloud to a place like github.com but Git itself is an incredibly powerful
and just useful tool for one backing up your code somewhere else to the cloud which is effectively what we've used it
for but two collaboration so that you can actually share your code more readily with other people and three
building much bigger pieces of software where each of you work on different files different folders or even just
different parts of the same file and then somehow merge all of your handiwork together at the end of the day to build
something much bigger than you as one person could alone VS Code itself now too we've been hosting it in the cloud a
real version of VS Code but it's much more commonly used on people's own Macs and PCs and you can download it onto your
own Mac and PC you might have to jump through a few more hoops to get things like C working though Python is much
easier to get working as well some of the configuration won't be quite the same like your prompt might look a
little different and the like but that's just going to be the case anytime you sit down in the future at a different
system it's going to look and feel a little different to things you've used before but hopefully there'll be enough
familiarities that you get yourself up and running pretty quickly nonetheless hosting a website not necessarily
something you have to do or will do for your final project depending on your proposal but there's lots of ways to
just host your own portfolio page homepage website whatever uh on the internet itself using tools like these
GitHub or Netlify or other tools too most of which have like free student-friendly plans some of these are
indeed paid services but they very often have entry-level plans that are totally fine if it's just you on the internet
and you don't expect having thousands or tens of thousands of users it's a drop in the bucket for these companies and so they
very often have free tiers of service if you want to host something more dynamic something like CS50 Finance that takes
user input and output uses sessions uses databases you might like something like Heroku and for instance we have some
documentation on one of CS50's websites for actually moving your implementation of CS50 Finance over to this third party
application called Heroku so you can actually run it or something like it in the cloud as well here too using a free
tier of service all of these providers these are big cloud providers these days uh Amazon Microsoft Google and others
all have student friendly accounts that you can sign up for during or shortly after you're in school that just give
you uh free compute time and storage uh GitHub itself has this whole student pack that by transitivity gives you
access to a whole bunch of discounts and other things as well so if you're liking this stuff and you just want to like
learn more perhaps over break by playing on your own these then would be some good starting points and as for
just keeping abreast of trends in programming and technology or the like there's so many different blogs and
websites out there but here are just a couple of different subreddits so to speak on Reddit that are very
programming specific Stack Overflow with which you've probably interacted Server Fault which is similar TechCrunch
Y Combinator and other sites too and ultimately we would encourage all of you to stay in touch certainly beyond
today um by the time you finish your final projects we'll have something waiting for you and if you want to stay
engaged either on the teaching staff or just as a lifelong learner of CS and programming by all means check out any
of these URLs here but in just a few weeks' time will you have one of these to your name your very own I took CS50 t-shirt
which we will distribute before long as well and now if we may uh we have an opportunity here to synthesize
the past several weeks of material if you would like to go ahead and open up the URL that we put on the screen
earlier I'll toss it up here again you can use your phone or your laptop you might recall for a previous problem set
we asked you to propose a whole bunch of review questions multiple choice or the like that synthesized the past several
weeks of material we took some of our favorite submissions of those ported them to this Poll Everywhere platform so that
we could interactively see where everyone's minds are at understanding is at and I think you'll find all of these
are written by you and your classmates though we slipped in a few fun ones also written by you along the way if
Carter you want to come on up here to get us ready if you haven't yet opened the website go to this URL here on your
phone or your laptop and let me go ahead and switch us over here before Carter takes control of this machine here
here's that same 2D barcode again feel free to background that now and in just a moment we've got a 20-question quiz
show it's all multiple choice so long as you have internet access whether you're here physically or online right now you
should be able to buzz in within 10 to 20 seconds of seeing a question and I'll read each one aloud I think Carter we're
just about good to go so does everyone have the software up and running on their phone or their laptop if not no
big deal just look on with a friend but otherwise Carter do you want to say hello and tee us up absolutely hi
everyone we're going to go ahead and get started here with our first question speed here matters so our first
question David go ahead what is it all right what does CSS stand for is the first question written by you four
possible options are cascading stylesheet coding stylesheet cascading style system coded style sheet
15 seconds up to 300 responses already both here in person and online give folks a few more seconds
what does CSS stand for these are the four options that were provided three two one Carter cascading stylesheets at
86% is indeed the right answer so congrats to those of you 86% who got that one here's the leader board you all
have fairly random usernames but if your username is on this board here or really any of the 86% of you that just got that
right all of you are currently in the lead but we'll see if this shifts before long question two which best describes
the role of a compiler is our next question debug one's code run the written program distinguish between
functions and arguments turn source code into machine code 300 responses in so far 10 seconds
to go which best describes the role of a compiler 3 seconds just crossed 400 and
Carter turning source code into machine code at 92% some excellent progress there is indeed the correct answer and
indeed more generally a compiler just converts one language to another the use cases we've seen for it have been only
source code to machine code but as you go out into the real world you'll actually find there to be compilers from
one source code language to another source code language that itself might be runnable or
compilable thereafter good job to all of you guests and Carter number three what is the type of argc asks a classmate int
str char float what is the type of
argc all right about 350 responses 7 seconds to go about to cross the 400 threshold in three two one the type of
argc is int is indeed correct but we're now starting to distinguish folks only 55%
there char is not correct you might be thinking of argv in C but even that is not a char it's a char * array or
a char ** in fact so it's not just a char str is in Python but even that too if you were thinking of sys.argv
that would be a list of strs not a single str all right Carter it's the leaderboard all right there are our
guests all in still tied and number four what is the searching efficiency of a balanced binary search
tree Big O of n Big O of n squared Big O of log n Big O of n log n what is the searching efficiency of a
balanced binary search tree the balanced being key because as folks continue buzzing in recall that binary
search trees can degrade devolve into linked lists Big O of log n is correct for 54% all right now people are getting
annoyed but let's keep going number five leaderboard's not yet that interesting more subtle what was the CS50 duck's
Halloween costume he's here in winter dress today thanks to Valerie a skeleton a vampire Frankenstein or a
ghost what was its costume at Halloween a few weeks back answers are coming in a little
slower this time people online are perhaps clicking on the video and vampire is correct at
69% nicely done all right guests are still shuffled in the top oh and we're starting to see some leaders pull ahead
the time in which you buzz in is also taken into account now in C how can we unify several variables of different
types into a single new type trees arrays structs tables oh it got quiet in C how can we unify
several variables of different types into a single new type 8 seconds 400 responses
in 450 and the answer is structs are indeed correct recall that we had a
student struct and we saw structs later on for nodes that allowed us to cluster multiple variables or data types inside
of our own brand new structure that we then typedef'd to a name or shall we see the leaderboard
now all right whoever guest 4045 and 4383 have edged ahead ever so slightly so buzzing in fast can now benefit your
score too next question Carter in Python which of the following statements is false tuples are an ordered immutable
set of data dictionaries associate keywords with values arrays in Python are fixed size Python is an
object-oriented language which of those statements is false
3 seconds answers coming in more slowly but the most popular answer is correct arrays in Python are indeed not
of a fixed size which is why that's false they're not even called arrays they're called lists and recall that
they dynamically grow and Shrink effectively implemented for you as a linked list all right how do we all
right we have a leader whoever 4383 is nicely done what does strcmp return in C
does it return a Boolean an integer a string or a
char what does strcmp return in C used to compare two strings of course recall that it
returns potentially not just true false but ooh an integer is indeed correct does anyone recall why why is it an int
and not just a simple true false why are three values helpful exactly it returns zero if
they're equal or it returns a negative value or a positive value based on whether one string comes before or after the
other ASCIIbetically so to speak based on its ASCII codes the results Carter all right 4383 still doing quite well but
being caught up with here what does David Malan's phone number 949468 750 play when you call
it the Harvard alma mater a parody of Yale's song a recording of David Malan singing Never Gonna Give You
Up feel free to call or text I can't get it now but we have nicely automated that process 4 seconds 400 responses in and
the answer of course is Never Gonna Give You Up thanks to a little programming in a script that our friend
Rongxin wrote that essentially answers the phone automatically and replies with a URL or a song
Carter oh dethroned dethroned 2688 nicely done next
question from which of the following places does malloc get free memory for a program to
use heap stack array or pointer from which of the following places does malloc get free memory for a
program to use answers are a little slower this time 5
seconds and the answer is in okay that's the answer we were given
in the problem set but I think we would beg to differ pretty sure Carter would you go with I would go with the
heap I think it's indeed the heap so this answer is not correct I know I know we just
transcribed what you gave us though let's see how that affects the scores okay 2688 is still doing okay
next question about 10 or so to go suppose I have an unsorted list of items store receipts perhaps should I sort the
items before searching for an element yes you should always sort before searching no you should never sort
before searching if you will be searching the list many times then yes you should sort first if you will be
searching the list many times then no you should not sort first some nuanced replies 5
seconds fewer answers than usual at this point and if you will be searching the list many times then yes you should
sort first an example that we discussed of tradeoffs because if you're just going to do a one-off search and never
again why bother incurring n log n or n squared time to actually sort the thing
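
The trade-off described here can be sketched in Python. This is only an illustration under stated assumptions: the `receipts` values and the function names are hypothetical, and the standard library's `bisect` module stands in for a hand-written binary search.

```python
from bisect import bisect_left


def linear_search(items, target):
    """Check each element in turn: O(n) per search, no sorting required."""
    for item in items:
        if item == target:
            return True
    return False


def binary_search(sorted_items, target):
    """O(log n) per search, but only valid if sorted_items is already sorted."""
    i = bisect_left(sorted_items, target)
    return i < len(sorted_items) and sorted_items[i] == target


# Hypothetical unsorted store receipts (amounts in dollars).
receipts = [42.50, 3.99, 17.25, 8.00, 99.95]

# One-off search: a single O(n) scan is cheaper than paying O(n log n) to sort.
found_once = linear_search(receipts, 17.25)

# Many searches: pay the sorting cost once, then each lookup is O(log n).
sorted_receipts = sorted(receipts)
queries = [3.99, 5.00, 99.95]
results = [binary_search(sorted_receipts, q) for q in queries]
```

With only one query, sorting first costs more than the scan it replaces; with many queries, the one-time sort is amortized across every later lookup.
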
all right some shuffling happening but 2688 nicely done next question when you run the create index command in SQL what
type of data structure do you create arrays B-trees linked lists hash tables when you run create index recall
we did this with like the movie titles the TV show titles to speed things up so that things wouldn't be super long and
linear we did a different data structure all right about 400 responses in and the
answer is indeed B-trees B-trees not to be confused with binary trees a B-tree typically has other children besides two
that pulls the data even higher up from the leaves of the tree could use a hash table could use a linked list but
indeed the technology in databases is generally these things called B-trees certainly in SQLite
Carter oh dethroned but 4179 has now pulled ahead nicely done next question what HTTP status code means I'm a
teapot 0418 0071 128 this recall was an April Fools' joke
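
A minimal sketch of the effect of create index, using Python's built-in `sqlite3` module rather than CS50's library. The `shows` table, its contents, and the name `title_index` are assumptions made up for illustration, and the exact wording of SQLite's query plan varies between versions.

```python
import sqlite3

# In-memory database with a hypothetical table of TV show titles.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE shows (id INTEGER PRIMARY KEY, title TEXT)")
db.executemany(
    "INSERT INTO shows (title) VALUES (?)",
    [(f"Show {i}",) for i in range(1000)],
)


def query_plan(connection, title):
    """Return SQLite's description of how it would look up a title."""
    rows = connection.execute(
        "EXPLAIN QUERY PLAN SELECT * FROM shows WHERE title = ?", (title,)
    ).fetchall()
    # The human-readable detail is the fourth column of each row.
    return " ".join(row[3] for row in rows)


# Without an index SQLite has to scan the whole table (linear time).
plan_before = query_plan(db, "Show 500")

# CREATE INDEX builds a B-tree keyed on title.
db.execute("CREATE INDEX title_index ON shows (title)")

# With the index SQLite can search the B-tree instead of scanning.
plan_after = query_plan(db, "Show 500")

print(plan_before)
print(plan_after)
```

The first plan should describe a scan of `shows`, while the second should mention `title_index`, which is the B-tree doing the work.
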
by technical people uh some years ago that has become part of computing lore it's still there though in the document
in two seconds we'll know that it's 418 indeed let's see how that affected things
4179 is way down on the list 7280 is number one now nicely done what is an example of a SQL injection attack when
someone submits malicious SQL commands via web form physically destroying a computer hardware that stores a SQL
database overwhelming a server with thousands of requests to access a database injection attacks are only in
movies or TV 5 Seconds some fun answers 400 responses about in and indeed when
someone submits malicious SQL commands via a web form because you the programmer are not escaping the input
using the question mark syntax that we've seen using CS50's library or other third-party libraries like it
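
A hedged sketch of the difference the question mark syntax makes, using Python's built-in `sqlite3` module (CS50's library uses the same `?` placeholder idea, but its API is not shown here). The `users` table and the malicious input are hypothetical.

```python
import sqlite3

# In-memory database with a hypothetical users table.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT)")
db.executemany("INSERT INTO users (username) VALUES (?)", [("alice",), ("bob",)])

# Input as it might arrive from a web form.
malicious = "nobody' OR '1'='1"

# Unsafe: the input is pasted into the SQL text, so its quotes change the query
# into  WHERE username = 'nobody' OR '1'='1'  which matches every row.
unsafe_query = f"SELECT username FROM users WHERE username = '{malicious}'"
unsafe_rows = db.execute(unsafe_query).fetchall()

# Safe: the ? placeholder passes the input as data, never as SQL,
# so it is compared literally against the username column.
safe_rows = db.execute(
    "SELECT username FROM users WHERE username = ?", (malicious,)
).fetchall()

print(len(unsafe_rows), len(safe_rows))
```

The unsafe query returns every user, while the parameterized query returns none, because no user is literally named `nobody' OR '1'='1`.
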
Carter 7280 is still the guest to beat nearing the end few more questions how are the elements of an array stored in
memory contiguously in random locations that happen to be available as a linked list as a binary
tree how are the elements of an array stored in memory about five seconds to go almost
have everyone in two one and contiguously is indeed the right answer
back to back to back in random locations that happen to be available is probably describing your use of malloc in the heap
but you would then need a linked list or some other structure to stitch those locations together an array by
definition is contiguous Carter 7280 is hanging on to that lead by about uh 499 points next up is which
SQL query would allow you to select the ID of a specific movie star Zendaya in a table of movie stars select ID where
name equals Zendaya select star ID from movie stars where name equals Zendaya select ID from movie stars where name
equals Zendaya select ID from movie stars where name equals quote unquote Zendaya and I'm spoiling it I should
have read out some quotes earlier or two one second the last one is correct and
indeed this one's almost correct but lacks the single quotes Zendaya is not a SQL keyword it's of
course a string so it does need to be escaped there but 63% of you realize that 7280 is still in the lead I think
we have a few more questions to go why is a hash table faster to search than a linked list even though the runtime for
both is Big O of n the hash table actually has Big O of n squared runtime the hash table optimally has Omega of O
runtime the hash table creates shorter linked lists to search rather than one long linked list the hash table takes less
memory and this was an example of practical versus theoretical differences and indeed that was interesting
with 83% of you buzzing in the hash table creates shorter linked lists ideally if you have a good hash function rather
than one long linked list even though technically it's still in Big O of n 7280 seems to know that and is pulling ahead
of the crowd still a few questions Game of Thrones is a dot dot dot comedy drama historical fantasy documentary
romance sci-fi or all of the above this is written by your classmates recall based
on our sequel week in 5 seconds we'll be reminded that according to our CSV file they were all of the
above okay all of the above all right 7280 did okay with that next question which of the following is
a golden rule when allocating memory every block of memory that you malloc must be freed only memory that you malloc
should be freed do not free a block of memory more than once all of the above more into the nuances of C this
Golden Rule when allocating memory didn't have to worry about this in Python we did in C in two seconds we'll
know that all of the above are indeed things you must do not doing those would be in fact bugs Carter the
leaderboard still doing well 7280 whoever you are last few questions last question in fact last question what do
the binary bulbs on stage spell today the answers could be uh face with medical mask face with
tears of joy snowman without snow or Red Heart what do the binary bulbs on stage spell 6 5 4 3 2 1
the answer is the red heart taking a look at the leaderboard here who's our [Music]
winner the winner is oh guest 3487 a big round of applause for our guest thank you to
Carter so it's nice that there's some opportunity here because recall that in week zero we did start talking about
emoji and really about data and representation and we talked not about just binary but ASCII and then Unicode
and then when we had Unicode we had all of these additional bits that we could play with and we could start to
represent not just letters of the English alphabet as in ASCII but really letters of any human alphabet and even
alphabets that are continuing to develop and indeed this was face with medical mask which we claimed at the time was
just how a Mac or PC or Android phone or iPhone nowadays would interpret and display a pattern of bits like this
this happens to be the four bytes that represent that particular emoji and over time humans have been deciding to use
different patterns for new emojis that might not have existed yesterday and indeed most anytime you
update your Mac or your PC or your phone these days at least on a semiannual basis are you getting some new and
improved emojis and they're not just these faces now they're of course representing different human emotions
different physical objects and ultimately among the Unicode Consortium's goals is to be able to
represent all human languages but were it not for certain groups of people and certain individuals these things
would all rather look fairly similar and indeed today we're so pleased to be joined by an old classmate of mine
Jennifer 8. Lee who was class of '99 here at the college who's gone off to do many many different things in life
prolifically so not only has she been a writer an author a journalist for the New York Times a producer of films like
The Harvard Computers The Search for General Tso and The Emoji Story which focuses on exactly today's topic
Jenny and her colleagues have been involved particularly with championing representation of different
types of people and cultures and languages and these are just a few of the emojis that our friend Jenny has
indeed brought into creation on our phones and laptops Jenny too is the original inspiration for what has become
it seems my Twitter recommendations and all of these puppets I was visiting her in Manhattan one time some years ago she
had on her shelf a couple of puppets known as Muppet Whatnots at the time you could go to FAO Schwarz or the website
for the toy store and you could actually configure your very own Muppets and I thought this was the coolest thing
and literally on the cab ride home from her place was I logging into the website configuring a couple of Puppets a couple
weeks later they arrived and then rather sat on my shelf for a couple of years as I wondered why I had just bought two
Muppets in the back of a cab but brought them into the office at one point a colleague saw them Drew inspiration from
them and now have they been woven really into the fabric of this course in particular and a lot of the course's
pedagogy at least incarnated here just for fun but also in video form as well which is only to say so glad that our
friend Jenny 8. Lee is here for us today to talk about these emoji Jenny hi all right well this is very exciting
I took cs50 in 1994 um to give you a sense one of my blockmates was the first intern for
Netscape if you guys have ever heard of Netscape and I graduated just as Google was coming out like we did not
have Google when we were undergrads so it's an honor obviously to be at CS50 it's also very impressive to see
how David has turned it from an entry-level computer science course into a lifestyle brand that is world-renowned
so it's an honor and I'm going to talk to you today about how an emoji becomes an emoji um so first I'm going to talk
about my journey down the rabbit hole of how I got involved with emoji so this is my friend Yiying Lu she is a designer
famous for designing the Twitter Fail Whale which was like this kind of image that popped up when Twitter went down
which back in the day was rather often so she's Chinese Australian American which is like a weird interesting
combination and so one day we were texting about dumplings because that is what Chinese women do we text about
food and so I sent her this picture of dumpling and then she said yum yum yum yum yum um you know knife and fork knife
and fork knife and fork and then she was like oh I'm surprised that Apple doesn't have a dumpling emoji and I'm like oh
yeah that's kind of weird and you know it's one of those things where you know the thought comes through
your head and then it leaves I I was you know was just sort of an observation but then half an hour later onto my phone
pops up this like dumpling Emoji with hearts actually you can't see it here but it actually had had like blinking
eyes so she called it bling bling dumpling she's a designer so she decided she was going to fix um this like lack
of dumpling Emoji problem and I was actually like really puzzled like how could there be no dumpling Emoji right
because you know I knew that emoji are originally Japanese this by the way was back in 2015 so Japanese foods super
well represented on the emoji keyboard you have ramen you have tempura you have curry you have actually a bento box
curry then tempura you have you even have like kind of slightly weird foods like
um let's see you had these like things on a stick which are fish cakes I discovered then you have this white and
pink swirly thing which is also a fish cake you even have this like triangle thing that looks like it's had a bikini
wax but in essence there were all these foods that were on the board but there was no dumpling right and I was like
dumplings are this kind of universal food like every culture has some version of a dumpling whether or not it's
empanadas or ravioli or um God what else ravioli pierogi momos you know the whole idea is all cultures have
basically found the idea like this concept of like yummy goodness within a carbohydrate shell whether or not it's
baked or steamed or fried so dumplings are Universal Emoji I didn't use them that much but I was like they're also
kind of universal so the fact there was no dumpling Emoji told me like whatever system was in place failed and I
actually had no idea I was like who controls Emoji I'm going to go fix this problem like there is something wrong
with the universe if there's no dumpling emoji and I took it upon myself to like go fix that so I Googled um and I
basically discovered there was this thing you know called the Unicode Consortium which is a nonprofit based in
Mountain View California that when I looked had these like 12 full voting members so this is late
2015 of those 12 nine were multinational US tech companies so there was Oracle IBM Microsoft Adobe
Google Apple Facebook and Yahoo so these were um eight I think and then then you had the German software
company SAP the Chinese company called Huawei and then the government of Oman so these were like basically the people who
were in charge and had full voting power on Unicode so they paid $18,000 a year to have this full voting power which
is a lot of money I was like kind of very indignant on like how this cabal of tech companies basically control this
global curated image-based language on your keyboard so there was a little bit of a kind of loophole which is you
could pay $18,000 a year to have full voting power or you could pay $75 a year as an individual you had
no voting power but you had the ability to sign up for the email list and also show up at the meetings so put in my
credit card got on email list and like was like kind of checking my email one day when there was an invite that
said they were going to have a quarterly meeting and I think this is going to be October
2015 and I looked it was in Sunnyvale I looked at my calendar I looked at you know the point that I was
actually going to be able to be in Silicon Valley at that time so I took a bus to Apple where they were having that
meeting and I don't know completely what I thought I was going to see like I think maybe it was going to be like
maybe like a Sanders Theater or like a little mini Congress like people making Emoji decisions but that was not what it
was basically this is the room um where it happens these in 2015 were the people who were deciding Emoji you know these
were Emoji decision makers which were not like the most demographically kind of um diverse group um they had a sense
of humor about it one guy had a shirt that said shadowy Emoji Overlord and so I decided along with my
friend Yiying Lu to create a group called Emojination whose motto was emoji by the people for the people and it kind of
brought the voice of like the normal world into the decision-making chain so you know we launched a
little campaign about dumpling emojis we made a Kickstarter video let's see dumplings are one of the most universal
cross-cultural foods in the world Georgia has khinkali Japan has gyoza Korea has mandu Italy has ravioli Poland has pierogi Russia
has pelmeni Argentina has empanadas Jewish people have kreplach China has potstickers Nepal and Tibet have
momos yet somehow despite their popularity there is no dumpling emoji in the standard set why is that emoji
exist for pizza tempura sushi spaghetti hot dog and now tacos which Taco Bell takes credit for we need to right this
disparity dumplings are global emoji are global isn't it time we brought them together
oh yeah and while we're at it how about an emoji for Chinese takeout so um so this is Thanksgiving of
2015 I wrote a dumpling Emoji proposal this is it um you know kind of different styles like whether or not it's a
head-on view or slightly diagonal view and so that's Yiying with then one of the co-chairs of the Emoji
Subcommittee and so along with dumpling we also did takeout box we got chopsticks and then fortune cookie which
actually I have to be honest I don't think fortune cookie would have gone in on its own merits were it not on the
coattails of the other three so we got these four through and you know that is how they look today and I have
to say that that dumpling looks really photo realistic in the Apple World unlike the fortune cookie which has like
no SL it looks like a dead Pac-Man I don't know what is going on with that um design but uh so very proud you know I
also did a lot of research on Chinese food in America and wrote a book The Fortune Cookie Chronicles produced a
documentary called The Search for General Tso so like I had a lot of moral authority on the issues of you know
Asian food in America not all things but this one I felt like I had like made a mark on a 2,500 year history of emoji oh
sorry of uh dumplings by moving them into Emoji so it kind of gets into this very complicated thing like how does an
emoji become an emoji and it's actually fairly complex um so let's say you have an idea
for an emoji you write a proposal and then you submit it to the Emoji Subcommittee that like
debates and thinks about it uh sometimes they have feedback and they kick it back to you and if so then it you have to
revise it and it kind of goes around and around in a circle and and then they once they're happy with it they kick it
to the full Unicode technical committee which is sort of like sort of a governing body within Unicode on things
Technical and encoding so what are the kinds of things that impact um whether an emoji can be an emoji so one um is
there popular demand is it frequently requested um and at this point one of the very
crude ways that we measure is if you search for it on Google does it have more than 500 million kind of like results
which is what elephant gets in English and that's sort of like a median like elephant is like kind of right in the
middle of like popular emoji not popular emoji so we use that as a benchmark it's a plus if there's multiple
usages and meanings for example like sloth that was an emoji that we did you know it's both literally
kind of an emoji of an animal but it also has lots of connotations so if something has lots of multiple meanings
that kind of gives it a bump um one thing is visually distinctive like does it work at little tiny Emoji sizes and
that's actually really hard because there's some things that I think could have been Emoji but don't completely um
work when you try to shrink it down and I'll give some examples of that later and then kind of uh filling the gap or
completeness is another Factor so for a long time we had Red Heart yellow heart green heart blue heart Purple Heart
there was no orange heart and so there was a gay designer from Adobe who was like actually very heartbroken
by that so he had been substituting the pumpkin to get the orange to get the rainbow and so he proposed an orange
heart and that was you know obviously at that point you're like yes that will complete a set um and another thing is
is it already something that you know one of the companies um has and therefore everyone else um is going to
like adopt it and so a good example for that is um the binary I think it was a non-gender binary Emoji the pink blue
and white flag so I have to say WhatsApp is by far one of the most Rogue um platforms so they just like randomly
like added it one day and we just noticed it and we're like oh God given that they have to do it now we have to build into
the set um so factors of exclusion or against inclusion to be more PC sometimes if it's too specific
or narrow um that works against being included so poutine which the Canadians love was kind of really specific and I
know it's really important to the Canadians but it just kind of didn't have enough sort of global appeal um if
it's redundant so an example for that is a couple years ago Butterball proposed like a roasted turkey Emoji but we
already had like an unroasted live emoji of a turkey so it wasn't clear that we needed the cooked version to go with
like the live version so that didn't pass um not visually discernable so this one's actually really tricky um and
knocks out a lot of things so it knocked out kimchi for example really hard to do kimchi at emoji sizes like how is it
in the jar is it like you know just sort of in a little bowl so kimchi kind of died on that another one
that was really hard was cave Emoji actually um really hard at Emoji sizes and then this is interesting no logos
Brands deities or celebrities um and this is a new policy we just introduced which is no more flags flags were
killing us in terms of all kinds of complicated reasons and there is much regret that we ever added flags and um
and and lots of politics so at this point we're no more Flags so um once it kind of gets passed
into the full Unicode Technical Committee the proposal gets voted on like once a year and then they pass all
the emoji for the next year we just actually did that a couple weeks ago and it takes a while gets sent to all the um
companies like apple Google Adobe Facebook and then they add it to all your devices and and then Tada it takes
about 18 to 24 months from when you first have your proposal to when it lands onto your devices so Emojination
has worked on a bunch of emoji and so we've kind of shepherded this through so one of the interesting questions is why
is it that Unicode controls emoji so a lot of it has to do with the history of emoji
they were originally popularized in Japan one of the initial sets is from 1999 from Docomo these
were actually recently collected by the Museum of Modern Art in New York City um and so all the the Japanese um vendors
had these like little glyphs that they added to their character set and the main problem is like if you were on Docomo you
had like you know one set if you were on SoftBank you had another set so no matter what you could only
kind of text the people who are on your platform not across platforms and that was a real big problem when Apple and
Google started introducing smartphones into Japan and there was sort of this kind of understanding and expectation
that if you did something in your smartphone you also want it to show up in email and be sent into you know into
the ether and someone else is supposed to get the same image that you sent so that was not the case so in 2007
they went to Unicode and asked them to like basically unify the emoji set and Unicode is interesting because its
mission is to enable everyone uh speaking every language on Earth to be able to use their language
on computers and smartphones and they actually see this as a human right because at a certain point if your
language cannot be captured digitally it's going to disappear so you know they spend a lot of time doing Chinese Arabic
uh Cyrillic in the very early days um in 2001 they actually had a proposal for Klingon which they did not actually
accept at that point so they have three major projects um they encode characters including Emoji that's actually what
they're most famous for um they also have a bunch of localization resources so um that's like you know in this
country they use this as a currency and they use this kind of um time format and like it's you know whether or not it's
month month date date year year year year in some countries it's you know date date month month year year year
year in other countries so they kind of tell you what country cares about what and then they also then have the
libraries um so that no one's basically programming things from scratch so what's really funny is if you say CLDR
really fast it sounds like seal deer and this really confused one of the girlfriends of one of the engineers why
he was always talking about seal deers and so she uh basically surgically attached a bunch of antlers to this
little guy um and made a seal deer and um so it took three years between 2007 and 2010 uh to introduce the first
Unicode emoji set so these were the ones that kind of came out it took many many years to figure out like how
to reconcile all the different images and like which ones we should include and which ones we shouldn't include um and as
you guys probably know from cs50 a Unicode code point is a unique number assigned to each Unicode character so
you can represent that emoji face with tears of joy as this or this or the binary code
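To make "this or this or the binary code" concrete, here is a small Python sketch of a few ways to write the same character. It assumes the emoji on the slide is face with tears of joy, which Unicode assigns the code point U+1F602; the variable names are illustrative and the slide's exact notation is not known from the transcript.

```python
# Face with tears of joy is the Unicode code point U+1F602.
emoji = "\U0001F602"

# The code point is just a number; these are three spellings of that number.
code_point = ord(emoji)
as_hex = f"U+{code_point:04X}"        # the usual Unicode notation
as_decimal = str(code_point)          # the same number in base 10
as_binary = format(code_point, "b")   # the same number in base 2

# On disk or on the wire the code point is encoded as bytes, here with UTF-8.
utf8_bytes = emoji.encode("utf-8")
utf8_bits = " ".join(format(byte, "08b") for byte in utf8_bytes)

print(as_hex)
print(as_decimal)
print(as_binary)
print(utf8_bytes.hex())
print(utf8_bits)
```

The code point and its UTF-8 bytes are different things: the first is the number Unicode assigns, the second is one particular way of storing that number.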
Emoji are just kind of hanging out on your phone after 2010 until 2011 when Apple suddenly made them much um easier
to access on your phone and one of the kind of confusing things of course is like Emoji are very ambiguous and it's
not always clear what they mean and that's one of the great Joys right it's it can be more um there's there's much
more interpretation on on on in terms between the sender and the uh receiver so if you actually look if you start
doing that on Google the the autocompletes are like what does it mean when a guy sends it to you what does it
mean when a girl sends it to you and um clearly many many people have been confused by that emoji when it's been
sent to them so who can propose emoji and the short answer is basically anyone uh there's a Google form that is open
between April and August um so the hijab Emoji actually was originally proposed by a 15-year-old girl who is Saudi
Arabian but lived in Germany uh Rayouf Alhumedhi who actually got into Harvard and then chose Stanford so I've always given
her a hard time about that I know I'm kind of on that I was like um so she wrote the proposal and it got
through and she's actually the subject of um the documentary that we put together called The Emoji Story we also
have a group of Argentinians who fought really hard for the mate emoji their national drink and then there was um
this nonprofit for girls advocacy that really wanted a menstruation emoji and they sent in this bloody Underpants uh
proposal which is like really terrible I'll be honest so we kind of worked with them and got blood drop uh which
actually has done statistically pretty well we were kind of surprised actually how
popular it is um the skin tone emoji were actually proposed not from within Unicode clearly it was done by a mom
from Houston who's also an entrepreneur cuz her daughter came home one day and said um I'd really like an emoji
that looks like me and her mom Katrina Parrott was like that's great honey what's an emoji and so she actually
had worked in procurement with NASA and so she understood forms and proposals and she actually was the one uh we
should thank for having like five skin tones today um woman's flat shoe and sort of the the one piece bathing suit
and as opposed to just the you know uh yellow you know teeny weeny yellow polka dot bikini is a mother of three now four
who um just wrote that cuz she was very offended that all of the shoe Emoji had high heels for women um I actually
really like this guy some random guy in Germany came up with this uh Emoji as we like to say it's a coar Emoji he wrote a
proposal and it got accepted because it was a really good proposal then you even have governments the Finnish
government like literally the Finnish government their equivalent of the uh Department of State uh proposed a sauna
Emoji which these are the images and I think they're really ugly for I mean they're all there's there's so many
problems with this Emoji but we helped them as Emojination first we like got rid of the club feet and then you know
gave them you know sort of examples like you know do you want them to hold the ladle do you want like the the sort of
steam around it do you want like it um you know with like like clothing or not clothing we actually did um a little you
know a little bit of a towel for the more modest in us so it got passed and then the way it ended up uh is basically
person in a steamy room so this is how it kind of evolved so you can see that is what Finland kind of submitted that
is what we submitted and then that is how it's ended up on your phone and that is basically supposed to mean sauna
Emoji um so one of the questions is like why do I care so much about emoji and representation of emoji and a lot of it
has to do with the fact that I grew up speaking Chinese and like going to Sunday school or Saturday um Chinese
school and as you can see there's sort of like some really interesting parallels between modernday emoji and
like Chinese radicals and characters from a long time ago so this is fire this is mouth this is tree this is moon
this is Sun uh you can mix and match them in Chinese as well so one of the interesting ones is like you know two
trees together basically makes a forest you have like a sun and a moon together and
that means bright in Chinese it's kind of fun then um this one's fun right so it's basically a pig
underneath a roof so you're like oh maybe that means Farm or like I don't know like a barn or
some kind of like animal thing but actually that in Chinese means home jiā or family so like home is where your pigs
are which I think says a lot about society and um what people cared about way back in ancient China um this is one
of my favorites so this is a character for woman or female nǚ and it I guess it kind of looks like this like you know
she's like curtsying or something so um super interesting character if you like grow up like you know writing your
characters you know so um so this is a woman underneath a roof and you're like oh that might mean like wife or family
or something but um it actually doesn't it means peace ān so the idea is like things are at peace when the woman is
under a roof which I always thought kind of like I felt like kind of weird about that growing up um another one is okay
there's a woman and then you have a child or Boy Child specifically so you're like oh that
might mean family or mother or something but actually it means good so the standard for good in ancient China was a
woman with a Boy Child which I thought was also you know as a six-year-old was I found problematic as well um and all
kinds of interesting things in Chinese use the female radical so three women together means evil this one means
greedy this one means slave this one means jealous this one means betrayal or
adultery which I think is interesting so in case you want to bring this to your favorite 10-year-old we have a Chinese
like an emoji uh kids book coming out from MIT Press in the fall called Hanmoji so it's from MITeen Press it's
super fun so it's a lot of these Concepts were like a little bit more rigorous and um this idea of like gender
in emoji was really important to a bunch of us as we were kind of working through the issues so for a long time you know
on the Emoji Keyboard there are all kinds of jobs you could have as a man like you could be a police officer you
could be a detective you could be a Buckingham Palace guard you could even be Santa you could be Black Santa right
but as of 2015 if you were a woman there were only four jobs you could have on the emoji keyboard so you could be a
princess you could be a bride you could be a dancer or you could be a Playboy bunny so those are your like four choices and
um so we worked really hard on like trying to diversify what women could be um and one of the ways we did it was
through this idea of like combining emoji so in emoji land there's something called a ZWJ a zero width joiner and a lot
of emoji that you see are actually glued together so the rainbow plus flag is how you get rainbow flag and this is
actually how we worked on um introducing a bunch of the um the occupations in emoji land so a lot of these are like
you know the chef is a woman plus um like the frying pan or a teacher is a woman plus or a man actually plus a
school um and so one of the interesting things is you can actually have um as a result of all the gender parity stuff
we actually had to make male and female versions of all the emoji cuz some of them originally were passed as like man in
tuxedo and now because we had gendered versions of everything we now have woman in tuxedo I don't know if you noticed
there's also man in a wedding dress to kind of complement the woman in a wedding dress um there's now actually
also bearded woman I don't know if you've noticed that so it gets interesting cuz originally at a certain
point we had passed women breastfeeding and then there was like all of this like complaints coming into Unicode about
what about men as caretakers you can't actually tell she's breastfeeding it's more just like she's holding it so
people are like what about the man as a caretaker like paternity leave and so um there is now like man kind
of nursing the child and um the other kind of way you can combine um the emoji is through skin
tones so unfortunately those are not through ZWJs this is through an older kind of technology where
all the skin tones are basically the yellow character plus like a little square box at the end we call them skin tone
modifiers and um one of the things that we worked on at Emojination which is one of the hard ones was to
create um the interracial couples and we worked on that with Tinder which really cared about it because apparently which
I thought was interesting when you introduce online dating into a community the rates of interracial marriage go up
um and there's a pretty interesting academic paper that kind of systematically looks at the rollout
through different countries and different um communities so it was really nice to see
it introduced on the phone one of my friends cried um in terms of Emojination emoji we've worked on a lot
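The two combining mechanisms described above, the zero width joiner and the skin tone modifiers, can be sketched in a few lines of Python. This is only an illustration: the constant names and the describe helper are made up for this sketch, the code points are the ones Unicode assigns, and the holding-hands couple at the end is just one example of a sequence that uses both mechanisms, not necessarily the exact emoji shown in the talk.

```python
import unicodedata

ZWJ = "\u200D"    # zero width joiner, the "glue" between emoji
VS16 = "\uFE0F"   # variation selector-16, asks for emoji-style rendering

WOMAN = "\U0001F469"
MAN = "\U0001F468"
COOKING = "\U0001F373"      # the frying pan
SCHOOL = "\U0001F3EB"
WHITE_FLAG = "\U0001F3F3"
RAINBOW = "\U0001F308"
HANDSHAKE = "\U0001F91D"
WAVING_HAND = "\U0001F44B"
LIGHT_SKIN = "\U0001F3FB"        # skin tone modifier, Fitzpatrick type 1-2
MEDIUM_DARK_SKIN = "\U0001F3FE"  # skin tone modifier, Fitzpatrick type 5


def describe(sequence):
    """Return the Unicode name of every code point in an emoji sequence."""
    return [unicodedata.name(ch, f"U+{ord(ch):04X}") for ch in sequence]


# ZWJ sequences: separate emoji glued together into one picture.
woman_cook = WOMAN + ZWJ + COOKING
man_teacher = MAN + ZWJ + SCHOOL
rainbow_flag = WHITE_FLAG + VS16 + ZWJ + RAINBOW

# Skin tone: the base character followed directly by a modifier, no ZWJ.
waving_hand_toned = WAVING_HAND + MEDIUM_DARK_SKIN

# Both mechanisms at once: two people, each with their own modifier,
# joined through a handshake.
couple = WOMAN + LIGHT_SKIN + ZWJ + HANDSHAKE + ZWJ + MAN + MEDIUM_DARK_SKIN

for sequence in (woman_cook, man_teacher, rainbow_flag, waving_hand_toned, couple):
    print(sequence, describe(sequence))
```

On a platform whose font does not support a given sequence, the pieces show up side by side, which is a handy way to see what a combined emoji is made of.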
these are just a sampling of the ones that we've um done I really liked let's see DNA I feel really good about Lobster
on behalf of people from Maine um yarn and uh thread for all the people who like knitting there was Bagel emoji on
behalf of like all New Yorkers um this emoji actually which we called microbe was like very sleepy on the keyboard until
2020 and it really had its moment I'm really really kind of proud proud of that one and um there is Yoga Emoji
sponge so these are just a sampling of the ones that we've worked on and this is a sampling of the people who have
contributed you too if you feel really passionate about Emoji could like impact billions of keyboards worldwide so it's
interesting to see in terms of frequency of use it's very power law right so here sort of these are actually
um like order magnitude like so one is half of this two is half of one all the way down um and one of the most stunning
things I was surprised to see is that face with tears of joy by itself is like almost 10% of all emoji sent
9.9% of emoji is just that one character number two is red heart which I guess you guys can see in its binary
form and then it falls off like pretty quickly so I know I'm hearing that face with tears of joy is very Boomer or very
Gen X and that it's uh maybe among you guys it's a little bit kind of um blasé or déclassé at this point um so the
future Emoji we really don't Unicode does not want to be encoding Emoji um and along the way I became a vice chair
of the Unicode Emoji subcommittee so I went from like kind of shaking my my fist at the institution to becoming part
of the institution um so there's one idea this coded hash of arbitrary images can we
create a system where instead of um just using a binary code to represent all different Emoji we actually can do
specific images we create hashes and then like you look and you can look up like by the hash which image you're
looking you know at so that was the idea this is from a Stanford professor didn't really take off then there was
this idea using Wikipedia um or the Wikidata QID numbers which uh I didn't know this until this
proposal came along but everything in Wikipedia has a number um and that allows it to sort of match things
between different languages so in Chinese the page for Obama is matched with the English page with this
you know Arabic page um and that went nowhere so um what I'm going to finish with is telling you what the new Emoji
are you guys are among the first people to hear about this because no one's really paying attention
so this was published a couple weeks ago but like it made no news because you have to be looking at the
Unicode register um so first off more Hearts cuz you guys all love hearts so there's
light blue heart gray heart and pink heart there was kind of debate do we need more pink hearts the answer seems
to be yes um light blue is really interesting cuz in some cultures light blue and dark blue are different colors
in our culture we just call them like versions of blue it's sort of like how in our in English pink and red are
different colors but um In some cultures there isn't a difference between pink and red then there were a bunch of bird
things the wing emoji is coming blackbird and goose I don't really know why um hyacinth uh as a flower this
is like very popular in Iranian culture jellyfish I don't know I'm very suspicious of jellyfish because um they
used they used man of war as one of their like phrases that they searched for and that had a billion I think like
it had a lot of entries and I I feel like those were not about the actual invertebrate like there was something
else going on there but kind of rode in on that um moose on behalf of the Canadians donkey uh on behalf of I guess
the Democrats um so that was interesting cuz like you had to have the donkey look different from a horse and there was a
whole debate like do you want a donkey head or do you want a donkey body do you want donkey with fluffy ears do you want
like all kinds of donkey debate and it was actually originally proposed in 2019 and just got in this year um Ginger uh
and pea pod these are kind of weird like the food things kind of got in in a weird way ginger was good
cuz it also represented root and then wireless got in which is interesting cuz we couldn't use the
phrase Wi-Fi cuz that's actually trademarked by like the Wi-Fi people and then on behalf of um Sikhs Khanda finally got
in it was the largest religion that wasn't already represented on the emoji keyboard and then on behalf of like the
faces shaking face so I don't I'm glad you guys are really
excited by that it is it is unclear to me um like I was not a big proponent of this but your excitement about it makes
me change my mind um then folding hand fan I actually find that one interesting cuz I think it was just like college
students or fresh out of college students were like we want to do a proposal that passes and they were very
opportunistic and just sort of like chose fan and at first they submitted electric fan and then we told them like
oh the longevity for electric fan isn't great even though it's been around for a couple hundred years why don't we go
with the folding hand fan which has a much longer history and then this one is actually a big deal is um afro hair pick
there's a lot of controversy and debate about curly hair which was supposed to represent afro and then Apple
did not do that so everyone else has very afro looking hair Apple just makes it look wavy and so there was like
upsetness that like black hair wasn't represented in the emoji set and so this was a proposal that someone worked on um and
then animals uh sorry not animals instruments maracas and flute um and that's it so in terms of if you have any
questions you can look at emojination.org you can email me for all things emoji um jenny at emojination.org and
remember you guys can actually impact billions of keyboards around the world I mean it's a little bit of impact for
humans but billions so it adds up to a lot and if you have any more questions I'm here and can give you know lots of
answers and questions and I'm really thrilled actually to kind of bring the Emoji um flag waving to such a large
crowd and especially a large you know diverse and very motivated crowd and and um one of the interesting things is
we've kind of like this is I'm not a proponent of this but they've slowly decreased the number of emoji per year
it was like 70 then it was 50 then it was 30 and this year we only did 20 and I'm um I'm a little bit sad about that
but I hope that you know if there's more you know exciting um proposals that can be submitted to Unicode we might be
able to dial that number back up so that is me am I good yay so thank [Applause]
you well thank this is about 20 years late but thank you so much Jenny we have an I took cs50 t-shirt for
you on the way out too we have some cs50 stress balls for you cannot wait to see your final projects come on up if you'd
like to chat with Jenny this was cs50 see you soon [Music]
all right this is cs50 and this is first year family weekend here at Harvard so welcome to all the moms and dads
brothers sisters cousins Aunts Uncles grandparents and Beyond cs50 here is Harvard University's introduction to the
intellectual Enterprises of computer science and the Art of programming and what that means is that what we've been
doing in here over the past several weeks is introducing students to computational thinking the process of
cleaning up one's thoughts and expressing oneself all the more correctly all the more precisely and
ultimately translating those thoughts of course to a computer in the form of programming which is where we've spent
quite a bit of time programming writing code over the past several weeks but toward that end we've also been
equipping students with some basic building blocks uh you might already know if a parent that computers
only somehow speak zeros and ones even if you're not necessarily a computer person yourself or know what that means
but with those zeros and ones can we represent numbers and letters and colors and videos and more and in fact
your child perhaps sitting next to you can perhaps tell you what today's message says here we have 64 light bulbs
on stage and if you look at eight of them at a time there's a pattern of bulbs that are either on or off that if
you know the code so to speak can you actually convert these bits these zeros and ones in light bulb form to today's
particular message now before we begin we thought we'd make this as engaging as interactive as possible um rather than
focus on any assumptions of Prior Computing knowledge you need know nothing today other than how to operate
for instance your own phone or a laptop or desktop or the like and indeed we'll assume a general audience and in this
Halloween week will we also see if we can't scare you a little bit into practicing better practices when it
comes specifically to the security or cyber security of the device you carry with you every day in your pocket use on
your desk on your laptop or Beyond so if you haven't already whether you're here in person or tuning in online go to this
URL here which will lead you to an interactive polling tool any phone or laptop or desktop suffices if it's a
little easier than typing in this URL you can just scan this code with your phone's camera take a moment to just
open your camera and hopefully if you're at a good enough angle and we've made this thing big enough this is a
two-dimensional barcode or QR code embedded in which is that exact same URL we're increasingly seeing this
throughout the world as a mechanism for doing what many of you are doing right now linking the physical world to the
virtual but that URL again is simply this one here and in a moment you'll see on your screen it's okay if you weren't
quite able to get that working feel free to glance to the left or to the right of you for someone else who did let me
go ahead and full screen a question just to ask of everyone here as we focus today on cyber security uh is your phone
secure whether an Android phone an iPhone or anything else if you're holding it in your hand right now here
in person or online you should see three possible answers yes or no or unsure we've got over 300 responses come in
already in a moment I'll flip over and reveal the results and see if we can't see how much work we have to do together
here today few more seconds almost up to 400 answers almost up to 400 it's okay if those keep
coming in I'm going to toggle back and show the results in just a moment here and the results are now in according to
a response rate of over 400 it looks like 36% of you don't need what we're about to do here today which is great
we'll see if we can't poke some holes though in maybe some assumptions you all are making 31% 32% maybe of you are
saying no your phone is not secure so so glad you came and then understandably another third of you
are unsure so in very good company today and we'll see if we can't open the eyes of everyone in each of these disparate
audiences well let's consider first for a moment exactly how we might think about the security of our phones
representative of just any Computing device and in fact everything we discussed today could be extrapolated to
laptops and desktops and servers but all of us being so familiar with phones let's start with phones themselves now
odds are you have on your phone like so many other things in your life a password or a passcode and in fact
without raising your hands and therefore leaking information think to yourself what is my my password or passcode it's
probably four digits it's maybe four letters maybe it's even longer maybe it's even nothing and I think maybe from
the chart earlier we can assume that we have a third of each of those possible responses so a password of course is
this super common mechanism that you and I are all using all the time to keep our devices secure but do passwords keep
things secure like how many of you thinking about your phone right now and that specific password might think it's
secure and if so why do you think it's secure we have at least 33% of you are ready to say that
your password's secure don't want to know it but why might it be in your mind secure why might you think it's secure
or more generally what makes your password secure random it's random okay so it's random so random letters and
numbers and the like and that's great because it's not just a word in the dictionary that someone could guess and
type in downside of course I dare say is that it might take you as well as anyone else quite a bit of time to guess or
figure out what or just to remember what it is if it was indeed random but Randomness is going to be a primitive
that really actually helps us unfortunately you and I and really the whole world are not very good even at
passwords as omnipresent as they are as a defense against adversaries in fact if we look at um if we look at the uh most
common passwords from the past year in 2020 thought we'd share with you some of those results um this is the result of
security researchers having found uh big exploited compromised databases analyzing them for what passwords are in
them and then inferring from that what the most common passwords are that you and I are all using unfortunately in 2020 the most
common password according to one measure was 123456 now funny yes but if you're seeing your password on the
screen already not so funny perhaps the number two password was not much
better number three picture1 presumably for a device or a website that requires that it not just be a word it
have at least one number which these hundreds of thousands of people took literally
password was number four this past year 12345678 111111 really not trying hard there
123123 varying it a little bit 12345 was number eight 1234567890 was number nine and then number 10 in 2020
was senha which any Portuguese speakers here means password so it made the list twice in this case so one
takeaway already today should be if your password's on this list like probably you're in one of those other 33% whereby
we can do better than this why I mean really the obvious if you're in this list there's so many bad guys so to
speak out there that are going to try guessing your password first why because just statistically if they try 123456
or 123456789 they're just going to get into a lot of devices quickly because they're just so commonly used
those passwords you don't want to be on this list ideally you want to be random but we want to somehow balance
Randomness with memorability so that you don't actually keep forgetting your password which of course defeats the
whole point of these things in the first place but in a class like this cs50 and computer science more generally let's be
a little more thoughtful as to what we mean by a device being secure like what does it mean to be secure and can we
even slap some numbers on it so that we can make measurements so that we can ideally compare and contrast one system
versus another one password versus another so it's not just our instincts arguing that my password is better than
these but how can you quantify that perhaps well let's start simply a lot of Android phones and iPhones these days
require minimally that you have like a four-digit passcode you're minimally encouraged to have at least this bar set
so that you're not having no passcode altogether so if you do have a four-digit passcode well let me go ahead and ask
this question how much time might it take to go about cracking so to speak that is figuring
out what a four-digit passcode is in fact let me go ahead if you want to pull up your devices again you should see on the
screen this question now how long might it take to crack that is figure out guess a four-digit passcode
for instance on someone's phone a few seconds a few minutes a few hours a few days thinking here from the adversarial
perspective if someone got a hold of your phone somehow how long do they need to get into your phone if it has a
four-digit passcode few seconds few minutes few hours few days got about 300 responses so far let's give folks
another few seconds here another few seconds here all right up to 350 or so in a moment let me go
ahead and flip screens over to the results so we'll see the preliminary results here and if I now pull this
screen up we see that 50% of you claim that it's going to take only a few seconds and fewer of you about a third or
fewer are saying that it takes a few minutes a few hours and even a few days well let's answer that first
because honestly if it's already a few days or even longer our work here is probably already pretty done
unfortunately the problem with things like four-digit passcodes is that anyone who grabs your phone you step out of the
room you leave it behind you lose it they could certainly mimic your input device and just use their finger
pretending to be you trying 0000 nope 0001 nope
0002 nope and it's a little slow to be fair it would take me a while to count all the way up to
9999 that's 10,000 total possibilities there but let's go ahead and consider exactly how else you could do it for
instance here uh is an example of in computer science what we call a Brute Force attack and just an adversary using
their finger is a Brute Force attack if they're trying all possible passcodes the problem is even if your passcode is
way at the end of the list of numbers eventually they're going to get it by brute force sort of like in yesteryear
using you know a battering ram or the like to brute force your way into a building a castle or the like in
software sense it just means trying all possibilities and you don't even have to just use your finger right anyone with
some programming Savvy who's good with Hardware could maybe do something like this here's a quick video I'll hit play
on no sound but a little bit of a robot that has an Android phone underneath it and it's got a little robotic finger
that's doing the work for you you can step out of the room now as the adversary let the robot do its work
trying 0000 through 9999 and ultimately presumably get into that phone so let's see if we can't quantify
then exactly how fast the human or the robot could get in well how many total possibilities are there that's the right
way to begin thinking about it if you have 10 digits for the first one 0 through 9 and then another 10
possibilities another 10 another 10 the total number of possibilities of course between 0000 and 9999 is 10,000 10 *
10 * 10 * 10 which gives us that much of a search space a universe of possible passcodes to choose among
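That count can be checked two ways in Python: by multiplying, and by actually listing every passcode. This is a minimal sketch rather than anything shown on screen; the search_space function and the candidates list are illustrative names, and it only builds the list of guesses, it does not talk to any device.

```python
from itertools import product
from string import digits  # the string "0123456789"


def search_space(alphabet_size, length):
    """Number of distinct codes of the given length over the alphabet."""
    return alphabet_size ** length


# Multiplying: 10 choices for each of 4 positions.
four_digit_total = search_space(len(digits), 4)

# Listing: every combination of 4 digits, from 0000 up to 9999.
candidates = ["".join(combo) for combo in product(digits, repeat=4)]

print(four_digit_total)
print(len(candidates), candidates[0], candidates[-1])
```

The same function gives the size for any other alphabet or length, which is what makes it possible to compare one kind of passcode against another with numbers instead of instinct.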
you can do even better than your own finger or even that robot anyone in cs50 now who knows a bit of programming in
languages called C or python or anything else could open up a programming window and actually just start writing some
code and so let me do that what you're seeing here uh if a family member is a programming environment called Visual
Studio code that students have been using for the past several weeks up here we have a tabbed window where we can
type our code down here we have what's called a terminal window where I can type commands to make the computer run
that code and then over here is just a menu bar so crack.py means I'm going to write a program to crack that is figure
out passwords using this language called Python and you know even though most cs50 students wouldn't know what code to
start writing they'd have to look up some of what I'm about to do it's only going to be a few lines so I'm going to
go up here and say from string import digits this is a fancy way of saying hey python give me access to all decimal
digits it just avoids my having to type out 0 through 9 manually all right then I'm going to say from itertools uh
import product this is another feature of python that cs50 students for the most part have not yet seen that just
says hey python give me the ability to do like the cross product of a whole bunch of numbers so these 10 times these
10 times these 10 times these 10 and then what am I going to do with that well for each possible passcode in the product of
those digits repeated four times I'm going to go ahead and for now let's just print out what the passcode is in other
words assume that I am now the adversary I don't want to waste time using my finger I don't have a robot that I made
but I am good at writing software and heck I've got like a USB or a lightning cable in my bag that I could connect
your phone to my Mac or PC and I could just have my code that I'm writing now send all the possible codes from laptop
to phone to automate this process just using the little port at the bottom of all of our phones well let me go ahead
and maximize this so-called terminal window which is again where I'm going to run this code and again the question a
moment ago was does it take seconds minutes hours days well let me go ahead and run python of crack.py I'm
pretending for the moment that I did grab that cable from my bag and plug it into the phone hitting enter and it
doesn't uh didn't actually do anything that was not supposed to happen so in cs50 we spend a lot of time introducing
students to bugs uh which are mistakes in programs sometimes not so deliberate let me go ahead and apologize
let me open this file this didn't technically happen Okay python correct there we
okay in cs50 we now will run the code here and I'm going to go ahead and run a command called python of crack.py I had
the file in the wrong location a moment ago and this is the equivalent on a Mac or PC of double clicking an icon here
we go is it seconds uh minutes hours or days barely one second to try all 10,000 possibilities you can't even see them
all on the screen but this printed out 0000 all the way down of course to 9999 plug in that cable and boom the
adversary doesn't need to be in that room for very long in order to get into that phone all right so what would
be better then like clearly four-digit passcodes bad if you have someone in your life who has a finger or a robot or
the ability to write code and unfortunately uh because of us you all have someone in the family with at least
the third of those how might we do better than this what's better than a four-digit passcode anyone yeah six
digits okay so six digits heck or seven digits or eight digits why because that's going to make the of course the
passcode longer which means we're going to have to try more possibilities which doesn't mean that the adversary is
fundamentally stopped but it is going to slow them down it's going to take them more time probabilistically to get to
your passcode and in a sense then increases the cost to the adversary and indeed that's the theme in cyber
security raising the cost to the adversary either financially or TimeWise or the like just like in the real
physical world most of you go home you lock your doors at night you might have invested in a better deadbolt than
another why is that you really just want to be more secure than the house next door you want to make sure that it takes
too much time too much effort too much risk to the adversary to get into your home and that's again what cyber
security is all about to say my phone is secure is sort of nonsensical to say that your phone is more secure than
someone else's that's really a reasonable Fair statement to make so I like this Instinct let's see if we can't
make things a little harder and actually let's go one step further rather than just numbers you've probably noticed on
your phones you can use letters of the alphabet too if you click the right option on the phone you can start typing
in words and letters so how might we do that instead well let's transition to four letter passcodes four letter
passcodes and if we do four letter passcodes uh where the letters of the alphabet for instance are a through z in
English alone let's go ahead and ask this question here if you have four letters of the alphabet so let's not
increase length yet let's just change to a bigger vocabulary now we have a through z
instead of 0 through 9 how many four-letter passcodes are possible how big is that universe that the adversary is going to
have to search via Brute Force so I'm seeing a lot of seven Millions a bunch of 52,000 26,000
10,000s 9999 a few smaller numbers here hopefully it's not this low right because we've already set the bar at
10,000 possibilities for numbers alone hopefully if we've got English letters a through z we can at least do better than
10,000 so I think we'll start to see maybe some of these bars change a little bit but we've got 60% of you proposing 7
million well let's go let's go to the math so here we might have uh a way of thinking about this both uppercase and
lowercase even better if you consider it that way lowercase A through Z uppercase A through Z that's 52 possibilities for
the first character * 52 * 52 * 52 or 52 to the 4th power that indeed gives you 7 million plus possibilities all right
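
The counting argument and the crack.py program described so far can be sketched in Python as follows. This is a reconstruction from the spoken description, not the file shown on screen: the helper names `search_space` and `passcodes` are hypothetical, and only `string.digits`, `string.ascii_letters`, and `itertools.product` are named in the talk.

```python
from itertools import product
from string import ascii_letters, digits


def search_space(alphabet: str, length: int) -> int:
    """Number of distinct passcodes of the given length over the alphabet."""
    return len(alphabet) ** length


def passcodes(alphabet: str, length: int):
    """Lazily yield every passcode in the order a brute-force loop would try them."""
    for combo in product(alphabet, repeat=length):
        yield "".join(combo)


if __name__ == "__main__":
    print(search_space(digits, 4))         # 10 * 10 * 10 * 10
    print(search_space(ascii_letters, 4))  # 52 * 52 * 52 * 52
    # An adversary would send each candidate to the phone instead of printing it.
    for passcode in passcodes(digits, 4):
        print(passcode)
```

Swapping `digits` for `ascii_letters` in the final loop is the only change needed to move from the four-digit search to the four-letter one.
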
well let's now translate this to code that already sounds way better 10,000 versus 7 million this is definitely
going to slow that hacker down well let's consider exactly how fast or slow it might now be let me go into my
crack.py program and let me make a little tweak so that instead of just using digits this time
I'm going to use letters otherwise known as ASCII letters as cs50 students will know that just means familiar uh English
letters of the alphabet and I'm going to change my code to use these ASCII letters four of them still instead of digits
alone and that's the only change now I'm going to pretend to plug my phone that I just stole from someone into a USB or a
lightning cable let me maximize my window just so we can see things a bit more let me run python of crack.py now
and let's consider how long it takes to do 7 million possible codes okay slower slower can't
dramatically just say in one breath that we're done but we're already at the G's and then the H's and it's kind of flying
by you know this is where the adversary is probably getting nervous in the TV show or movie right someone is tiptoeing
around in the other room you don't want them to come in you only have this much time to crack the code and we're at the
RS the s's the t's U's V's so you know this feels like what a minute or so it's a good number of seconds but it's still
pretty brief certainly if someone has the ability to now we got to do the capital letters too certainly if someone
has the ability not to just secretly do it like in Hollywood in the Next Room but just take it with them and do it
over the course of a minute or two at home this seems to be faster sorry this seems to be slower because we're trying
so many more possibilities but you know if the adversary takes your phone has it long enough this doesn't feel like
terribly long so what might be better than this let's take it one step further what might be better than four letters
what what do most websites ask you to add to the mix so special characters right and
those things are darn annoying right because sometimes they even tell you what letters punctuation symbols you
have to use and then you type one and you realize ah it's not on the damn list I mean it's frustrating why well it's
going to raise the bar though to the adversary and that's indeed going to be the goal here again just to increase the
cost or time required for the adversary so that it doesn't finish like it did just now after a couple of minutes
but it's going to keep going and going hopefully such that they're going to lose interest in your phone and go try
to crack into someone else's presumably so let's try this let me now go over to uh how about one other question here and
this question will now just be let's go from four characters how about let's take it one step further and mix the two
ideas here more digits and longer passcodes how many eight character passcodes are possible and by character
as a cs50 student will know I mean number or letter or punctuation symbol now and there's like 32 or so standard
punctuation symbols so we're up to a good set of numbers now how many eight character passcodes do you think are
possible million billion trillion quadrillion or quintillion all of which of course are better than 10,000
possibilities so we're in a whole different space now looks like these answers are coming in a little more
slowly perhaps as folks think about this this is 10 digits plus 52 letters plus 32 punctuation
symbols much more secure it would seem all right we're up to 230 responses give folks another second or
so if you're trying to do the math 10 plus 52 plus 32 that's going to give you 94 possibilities for each of the digits
all right we're just about at our just about at our 350 all right I'm going to toggle over the screen here
going to click over to the results show them in just a second on the screen now and this is an interesting distribution
I think some of you perhaps have the Instinct now of just go for the biggest one um it's not quintillion nice as that
would be um maybe it's quadrillion trillion billion or million we have more of a split there so let's consider the
math so if we've got eight characters and I claim uh that that's 94 possibilities for each 10 digits
uh 52 letters 32 punctuation symbols that's 94 to the 8th essentially and that indeed is six quadrillion
possibilities now that's crazy big at this point I dare say we're pretty safe from the human finger now we're probably
pretty safe from that robot which is going to take a while too but Macs and PCs are pretty darn fast and you know
God forbid the adversary have a big server or use the cloud so to speak and really use a big expensive machine how
long does it take to get into six quadrillion possible passcodes well how might we think about this suppose just
for the sake of discussion it takes the adversary 1 second per code just so we have some unit of measure to start with
1 second per code which means in the worst case the adversary really gets screwed and my passcode is like
9999999 or with a lot of crazy punctuation symbols in it if each passcode takes a second to guess how
long is it going to take the adversary if in the worst case they spend six quadrillion
seconds how many hours or minutes or days or years I'm hearing a lot a lot is in fact
correct I did do the math the adversary if they're lucky and get all this way they're going to be
193,000 years old by the time they get to all of those possible pass codes so this sounds alluring and in fact let's
just change our code one final time just to get a sense of how this might look and behave in this version here let me
go back into my code and let me change this now to use not just ASCII letters but digits and I'm going to add in
punctuation uh for cs50 students there is again this Library called the string library that lets you just import
all of these symbols automatically so we don't have to type out every character on my keyboard manually and then down
here I'm going to take the product of those ASCII letters again plus those digits plus the punctuation repeated
eight times I claim this time I'm going to now increase the size of my window just so we can see more on the screen
rerun the code and this is going to take us you know some hundreds of thousands of years so we won't run to the end of
this demo now we seem to be in a better place all right so what's the takeaway here clearly you should use a passcode a
pass word that's eight characters with letters and numbers and punctuation yes okay now there's a mix here some of
you are saying yes some no how about someone who says no why why no yeah reCAPTCHA reCAPTCHA okay
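
The eight-character search space and the time needed to exhaust it can be checked with a short calculation. This is a sketch under stated assumptions: `years_to_exhaust` is a hypothetical helper, a year is taken as 365 days, and the guessing rates are illustrative rather than measured. At exactly one guess per second the arithmetic comes out on the order of hundreds of millions of years, which is larger than the figure quoted aloud and only strengthens the point; faster rates shrink it proportionally.

```python
from itertools import islice, product
from string import ascii_letters, digits, punctuation

ALPHABET = ascii_letters + digits + punctuation  # 52 + 10 + 32 = 94 symbols
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def years_to_exhaust(alphabet_size: int, length: int, guesses_per_second: float) -> float:
    """Worst-case years to try every passcode at a constant guessing rate."""
    return alphabet_size ** length / guesses_per_second / SECONDS_PER_YEAR


if __name__ == "__main__":
    print(len(ALPHABET) ** 8)  # size of the eight-character search space
    # Illustrative rates only: one guess per second, then faster hardware.
    for rate in (1, 1_000, 1_000_000):
        print(rate, years_to_exhaust(len(ALPHABET), 8, rate))
    # product() is lazy, so peeking at the first few candidates is cheap
    # even though finishing the loop is not.
    for combo in islice(product(ALPHABET, repeat=8), 3):
        print("".join(combo))
```
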
so there's other mechanisms more on that in a second other instincts yeah yes I'm kind of cheating with my
verbal simplification here even this computer is way faster than one code per second so it's not going to be hundreds
of thousands of years might be tens of thousands of years or hundreds of years but it's it's not going to be quite as
dramatic as this so that's a concern C yes so maybe there's other mechanism
so maybe we don't have to be so Extreme as to introduce all of this Randomness as was proposed before cuz honestly
there's this theme in computer science too and really information technology of tradeoffs right sure I can come up with
I can use a really big random password but my God I'm going to end up writing it on my monitor on a Post-It note which
I suspect statistically some of you are guilty of right and you shouldn't necessarily just blame yourself or you
know your colleague who's doing this like this is a symptom perhaps of bad IT policy if we don't have necessarily very
usable systems maybe we shouldn't blame the human for forgetting their very random password maybe we shouldn't
require the human to have a very random password so what could we do a couple of technical mechanisms were just proposed
let's go down this road of how we might try to defend against this and I'll keep this running just for fun in the
background let me switch back over to a visual here now that we've considered that many codes what if we do something
that some of your own phones already have uh that slow the adversary down and some of you might have seen on your
iPhone a screen like this let me zoom in iPhone is disabled try again in 1 minute say does anyone lock themselves out of
their phone like this I have this is not I mean it's embarrassing to admit but it's not leaking any information all
right so many of you have done that already but why is this actually a compelling feature just to be clear
annoying as this might be because you probably don't want your phone locked at the very moment you're trying to get
into it why might it be a good thing yeah uh oh let's let's go somewhere else if if we may yeah in
back sorry it slows down the process it annoys you to be fair like you pay a
bit of this price but it really slows down the adversary now they're going to be able to type in not one code per
second but one code per minute a 60 times difference that's really going to force them to pump the brakes and
unless that adversary is after you specifically odds are they're going to go take someone else's phone or lose
interest because you've raised the bar high enough to their getting in on Android if you do this it depends on the
operating system version here might be something similar on Android too many attempts try again later I mean this is
even more annoying it doesn't even tell you when to try again later but it does slow down the adversary so if you don't
have features like this enabled you should and if you're particularly security conscious or paranoid even
you can even enable a feature on these phones nowadays where they self-destruct so to speak after 10 wrong guesses right
why 10 you know the presumption is among Apple and Google and others that if you type your passcode 10 times wrong you're
probably not who you say you are you're probably someone else although you know if you're a little groggy first thing
in the morning or if you've been out late and having a good time 10 might not be a high enough threshold to
sort of protect your phone from you and so there too is this trade-off again and that's an extreme one if your phone
deletes itself which is what I meant by self-destruct then that might actually be to your detriment unless you
have backups and all of that but that's another uh technology question altogether so there too this theme of
trade-offs you raise the bar to the adversary but you've got to pay the price you're not going to get any such
feature for free all right what's another mechanism that many of us increasingly thankfully are doing um
might be when you log into a website like Gmail to have two-factor authentication sometimes called two-step
authentication I mean how many of you use two-factor two-step authentication with at least one account all right so
that's amazing how many of you use it with all of your accounts all right fewer of us and there too that's not
necessarily the wrong answer right I have a lot of stupid websites that I have accounts on like I bought something
once on them I don't really care about it so there's a judgment call there in terms of what you really care about but
maybe your financial websites your Healthcare websites or anything that's mildly sensitive to you probably should
be raising the bar to the adversary by enabling this so what is this particularly for those of you who didn't
raise your hand someone else what is two-factor or two-step authentication what's two-factor yeah
when you have to use your phone to verify that it's really you yeah so when you have to pull out your phone and
verify that it's really you we in the corporate world you might have a little dongle a key fob on your keychain that's
got a little number on it but generally speaking two Factor authentication is all about indeed a second factor it's
kind of oversimplified as two steps but it's really key technologically that it be a different Factor it is not two
Factor authentication if you just have two passwords that you have to remember because both of those could be forgotten
by you both of those could be stolen by someone else if you write them down on the Post-it note or the like two-factor
authentication is about having a fundamentally different Factor available to you so that the odds that someone get
at something you know like your password and something you have like your phone is just much much smaller than the
threat of just figuring out something you know like a password alone so the factor is something that's fundamentally
different from the other thing and so once you configure this the user typically sees a screen like this for
instance in the context of Gmail the screens vary here at Harvard and Yale uh students are familiar with something
called Duo Mobile which is the exact same idea and they typically use one-time codes six digits thereabouts
and you can only use that code once and the idea is it's texted to you or pushed to your device so that you and only you
can use it does this fundamentally secure your account is this enough to just have a
good password and two Factor authentication does that keep the adversaries out
altogether not if someone what okay not if someone really wants to get in then you have other problems that
are certainly of concern but you do want to ideally keep most adversaries at Bay and there too all we're doing is
like raising the bar right there's nothing stopping someone in physical proximity to me stealing my phone and
getting into all of those accounts I just raised my hand about but you at least protect yourself against the
billions of other potential adversaries in the world that are geographically not near us so you at least narrow the
threat so that's a good thing but what else could we do because I feel like it's not fair for us to say all right
everyone go home start using better passwords longer more complicated because again there's this tradeoff we
don't want to send everyone home essentially with a pad of Post-it notes to then counterbalance what's an
unrealistic expectation so how many of you perhaps with a show of physical hands use a password manager already
this is something practical we can equip you with okay so that was relatively few hands and those of you who are in The
Habit still of memorizing your password or Worse writing down the password there are better Solutions today but here too
there's going to be a caveat there's no clear win necessarily a password manager is a piece of software that you install
on your Mac or PC or your phone that manages your passwords for you and these come either built into the operating
system um Windows has Credential Manager macOS has something called Keychain there's third-party software
like 1Password or LastPass companies and universities often have site licenses so that students in particular
can use these kinds of things for free but the ones that come with your operating system or phone are themselves
already free and not using them is really the missed opportunity here so what is a password manager it's a
program that yes manages your passwords but it does a few things more it generates passwords for you typically I
mean honestly it's been years since I have chosen my own password on a website I instead click a button in my password
manager software or I use a keyboard shortcut to generate something that's eight characters heck maybe 16 24 32
characters long I don't care because the software's job is to manage that password for me that is the software
remembers this crazy long password for me and better yet it comes with a button or
a keyboard shortcut that will automatically fill out forms for me on the web when I say log me in it will
grab my password from my computer plug it in and voila I'm logged in the upside of this is that even if that website is
compromised and my password leaks out I'm not using that password presumably anywhere else because this
software's job is generally to create unique passwords for each website and it's not going to be guessed via a Brute
Force by one of us writing code cuz it's just too long probabilistically you know we're all going to be gone by the time
your computer finishes trying to crack it so what's the downside I mean this sounds great if the software generates
passcodes for you and plugs them in for you where's the downside anyone
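
The password-generation half of what a password manager does can be sketched with Python's `secrets` module. This is a minimal illustration, not any product's implementation: `generate_password` and `fresh_vault` are hypothetical names, the default length of 24 is one of the lengths mentioned above, and a real manager would also encrypt the stored vault under the primary password, which is not modeled here.

```python
import secrets
from string import ascii_letters, digits, punctuation

ALPHABET = ascii_letters + digits + punctuation


def generate_password(length: int = 24) -> str:
    """Pick each character independently using a cryptographically secure RNG."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def fresh_vault(sites: list) -> dict:
    """Map each site to its own independently generated password."""
    return {site: generate_password() for site in sites}


if __name__ == "__main__":
    # The site names are placeholders for illustration.
    for site, password in fresh_vault(["bank.example", "mail.example"]).items():
        print(site, password)
```

Because every site gets its own random password, a leak at one site reveals nothing about the password used at another.
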
yeah eluter Accord yeah if you use someone else's
computer or you're in like a you know a library environment a lab environment you don't have your passwords accessible
now there's a way to mitigate that so long as you sync the same software to your phone you might have to pay another
$1.99 or $20 to have the same software on your phone you can at least mitigate that by sharing the passcodes across
your devices not as user friendly you're still going to have to now manually type out this really long password and that
too is annoying if you get one character wrong but that's one way to mitigate that other
concerns that's maybe the biggest threats I mean you're kind of putting all of your proverbial eggs in the same
basket if someone now gets into my password manager which I should stipulate is supposed to itself have a
really big long password that I do have to remember but only one such long password I mean then I'm really out of
luck now every single account I own is compromised except for those that at least have two Factor unless the
adversary also steals my phone or my key fob other concerns exactly if someone gets physical access
to your device honestly in general all bets are off and this is why some of today's lessons are really important
it's only going to matter when you first lose your phone or someone walks off with your laptop or the like there are
certain things you can do to defend against that inevitability dare say but you want to make sure that if you are
using some of these Solutions like a password manager that that long primary password you use for it is itself really
hard to guess and you know I would say I'm okay with you writing that down even but putting it in like a safe deposit
box or hiding it somewhere in the house that's just very low probability of someone finding cuz the other problem
with putting all of your eggs in one basket if you forget your password then you lose everything and that too seems
like a pretty serious price to pay but this is a constant battle in Computing nowadays usability and security and
finding that inflection point but there too you can be you can be selective right I called out financial information
health information your personal email your calendar anything that's mildly more sensitive to you or important raise
the bar at least on those accounts even if you're not quite ready to go all in on all of these other factors well
let's consider then where we're using these passwords consider just a couple of specific examples email of course
Gmail is the example I used earlier Gmail and email accounts more generally are increasingly offering us features
and in fact there's one that I thought we could highlight as an example of something that as a cs50 student a cs50
family member you should really start viewing the world with a more skeptical eye a little
more paranoid eye and not necessarily just believe things that websites say I mean it's mostly meaningless when a
website says sometimes with a pretty little logo or emblem our website is secure like what does that even mean and
it's again all about relativity and even Gmail I dare say somewhat irresponsibly has this feature in recent years
confidential mode like is anyone if you're using G Suite or Google Apps at work or Workspace nowadays in the habit of
using confidential mode I mean it sounds okay no one's using this so this is great and I worry now that I'm
introducing you to a feature that you shouldn't necessarily use but all this time if you're a Gmail user there is
along the little menu bar an icon that lets you enable confidential mode and later tonight play around for it just
look for it and you'll see exactly the screenshot which I took yesterday according to google recipients won't
have the option to forward copy print or download this email right great for lawyers it would seem great for business
great for private correspondence but why is this perhaps a bit misleading like where should
the skepticism come from here even a company like Google I dare say you know they've probably buried the caveats that
I'm hinting at under the learn more but unfortunately that might be too late yeah I'm
back yeah I mean those of you who know how to take a screenshot that's the simplest way if you don't know how to do
that well here's a phone I can just take a picture of what it is I see on the screen and so these are software
defenses that are in place that essentially disable the forward button disable the print button but honestly as
you probably already know once something is already digital I mean it's out there and there are other ways to get it it
might not be as high quality if you're taking out your phone to do it but you should view things like this with
skepticism and even I when I occasionally receive something like this I kind of roll my eyes but regret that
the user thinks what they're doing is consistent with this language but it isn't necessarily and so indeed in part
from an introduction to computer science you begin to I mean get a little scared from what's going on out there because
there's so many different threats and so many things that you can't in fact do and the onus is unfortunately often on
us users to read between the lines and see what actually is possible here's another one that you might be more in
the habit of using incognito mode or private mode in Chrome or Safari or Firefox or Edge or the like what does uh
incognito mode do if familiar what's incognito mode yeah it doesn't log locally what you're
doing exactly most people here probably generally know about things called cookies even if you're not quite sure
how they work but they're like these little remnants or breadcrumbs you leave behind when visiting websites that allow
the websites to keep track of who you are in some sense according to google here when you're using incognito
mode Chrome won't save your browsing history so that's good um cookies and site data information entered into forms
uh but to their credit they do disclaim that your activity might still be visible to the websites you visit your
employer or School your internet service provider so they're getting better at at least helping you evaluate by giving
more of the facts whether you do or don't want to do this but this doesn't mean that the websites you're visiting
indeed um don't know who you are all of our computers have unique addresses these things called IP addresses that
you might have heard about in cs50 we'll explore these in another week's time your computer is constantly leaking
information that could be used to infer who you were so this is really just best left when you don't want to accidentally
on like a friend's computer or a lab computer remain logged in cuz cookies are typically used to just remember that
you've logged in so if you use a friend's computer you use incognito mode and just close the window boom you're
effectively logged out but even as Google disclaims there's other caveats there too so what else might we
keep in mind how about let's consider one other big one that's another thing to start looking for increasingly in
order to keep yourself secure and this one's a little more technical encryption and as cs50 students will know this is
something you can Implement in code and in fact let me ask this question what does it mean to
encrypt something think back to pset 2 and Caesar and the like let me look a little further back almost any student
hand should theoretically be up here yeah exactly encryption is all about substituting one letter for another and
generally scrambling the appearance of some message up so that the recipient knows how to reverse that process and
see what you actually sent but anyone intervening in between you can't actually see the information between you
so just to impress uh the parents in the rooms uh any students what does this say we're not ending here but this
was cs50 that's what it would say but notice the scramble let me go back and forth back and forth uh in this message
T becomes U H becomes I I becomes J S becomes T this is what we called a few weeks ago in cs50 a rotational Cipher a
Caesar Cipher that literally does as you describe substitutes one letter for the next but it does so in a very predictable
way a becomes b b becomes C and so forth and we also talked weeks ago that you don't have to keep it that simplistic
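
The rotation just described, where T becomes U and H becomes I, is a Caesar cipher with a key of 1. A minimal sketch in Python follows; the function name `caesar` is hypothetical, and this is an illustration of the technique rather than the problem set's solution.

```python
def caesar(text: str, key: int) -> str:
    """Rotate each ASCII letter by key positions; leave other characters unchanged."""
    out = []
    for ch in text:
        if ch.isascii() and ch.isalpha():
            base = ord("A") if ch.isupper() else ord("a")
            out.append(chr((ord(ch) - base + key) % 26 + base))
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    ciphertext = caesar("THIS WAS CS50", 1)
    print(ciphertext)              # letters shifted by one, digits untouched
    print(caesar(ciphertext, -1))  # rotating back by the same key decrypts
```

With only 25 useful keys, an adversary can simply try them all, which is why the shift is described above as very predictable.
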
you can use a bigger mathematical formula to make it at least harder for some adversary to figure out but you and
I as users these days are constantly thankfully using encryption you probably generally know that you should be hoping
for expecting this these days like https is a good thing s means secure literally and any website that has that in its URL
indicates to you that you and the website are having an encrypted a scrambled communication which means if
you type in your password your credit card information anything else personal no one between you
theoretically points A and B should be able to know what it is you've typed into that web page the web page
absolutely can because they have the ability to decrypt that information to reverse the process but
at least encryption is generally a good thing but today let's take that one step further and encourage you all to be
looking for expecting if you will as consumers increasingly in the coming years something better than encryption
alone but end-to-end encryption and you're starting to hear about read about this a little bit more but it's perhaps
a little less familiar someone in the room who's familiar what is end-to-end
encryption let me give folks a moment what is end-to-end encryption okay
yeah good so it's when an app like WhatsApp encrypts a message but it's encrypted all the way to the other side
to the recipient even though Facebook in this case owns WhatsApp even though your message is going through Facebook or
meta servers they do not have theoretically the ability to decrypt your message whatever chat message
you've sent to a friend they are just sending seemingly random zeros and ones all the way to the end user who can then
decrypt it if you're an iPhone user iMessage for instance does this automatically so long as your text
messages are blue and not green that means you're using iMessage in Apple's platform that does this but let's let's
focus perhaps on something that's been all too familiar to most of us over this past year Zoom right Zoom actually took
some flak some months ago because in their marketing literature they were advertising end-to-end encryption they
were not implementing end-to-end encryption at least initially this was probably marketing gone awry not quite
understanding what end-to-end encryption means they were using encryption and what that meant is that if I were having
a meeting with a colleague or you were sitting in on a class with a teacher you might have an encrypted connection all
of you to zoom centrally but they had the ability early on and still now if you leave this feature off to decrypt
that information and see and listen to theoretically anything going on in that meeting or that class room now
technologically, there's not really a good defense against that if you're using that older approach. All it really is is
policy, or, hopefully, there are rules in place, there are contracts in place, that say, well, yeah, that's possible, but don't
do that. End-to-end encryption is a stronger guarantee for you that circumvents that risk altogether by
ensuring that if you're tuning into that class or you're in that meeting, all of the zeros and ones are going through
Zoom's servers, just like Facebook's, but only the end users, only the students and teachers, only the colleague and
colleague, can actually decrypt and see and hear what it is that's being said. And if you're one who schedules Zoom
meetings, you can actually see this. For instance, here's a screenshot I took yesterday, too, of scheduling a Zoom
meeting for today, and you'll see that you can choose the day and the time, the password, ha ha, and also down here the
encryption level. By default it's typically enhanced encryption, which is stupid. Like, enhanced encryption? It's
just encryption, and in fact it's sort of worse encryption than the other checkbox, which is end-to-end encryption. But there's
this little caveat, and here, too, consistent with this reality in computing, there's always a trade-off,
right? It's not all upside and all win. Several features will be automatically disabled when using end-to-end encryption,
including cloud recording and some phone stuff. I mean, that's already kind of a big loss for a class, for instance, or a
conference that wants to keep the sessions. But it kind of makes sense, right? If the data is encrypted between
all of the end users, and therefore Zoom has no eyes into the data, or ears, then it makes sense that they can't record it
for you in the cloud, because it's completely scrambled to them, too. So it's a good primitive to have in place,
but also something that you need to sacrifice for in terms of usability. Well, let me, in our final moments here, flip
back over to where our hacking tool is. It would seem that eight characters is doing really well, because we've still got three
A's at the beginning of this, so that might be, in fact, one takeaway. And in fact, let me flip over and propose three
pieces of homework for everyone here. One, use a password manager: the one that's built into your phone or your operating
system, or pay a little something more for something that you might like a little better. Two, use two-factor
authentication for more of your accounts, maybe not all, but at least more of your accounts, and that's certainly a net
improvement. And then three, use not just encryption but end-to-end encryption. And unfortunately these features are not all
quite as simple as, oh, well, let me just check the box and turn on that something something that's always been available
to me, because it's not always been available, and Zoom, only once they sort of got into trouble for this, did they
acquire some other company that implements this feature and then add it to their software. But as users, as
consumers, as parents, as students, considering choosing one tool or another because of these features is really
something you are empowered to do, and do not use those tools that you don't think meet some threshold of comfort for you.
For more on this and computer science more generally, any of you can take CS50 online at edx.org/cs50. It's been so nice
to see you. Happy to chat one-on-one, but otherwise have a wonderful day here on campus. This was CS50.
[Music] [Music]
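The end-to-end idea described in the lecture can be sketched in Python. This is a toy, not how WhatsApp, iMessage, or Zoom actually implement it: the names (`RelayServer`, `keypair`, `shared_key`, `xor_crypt`, `demo`), the tiny Diffie-Hellman parameters, and the hash-based keystream are all assumptions chosen so the example runs on the standard library alone. It aims to show only the property from the talk: the relay in the middle forwards seemingly random bytes and never holds a key that could decrypt them.

```python
import hashlib
import secrets

# Toy Diffie-Hellman parameters (assumption, illustration only).
# Real systems use vetted libraries and far stronger, authenticated schemes.
P = 2**127 - 1  # a Mersenne prime, far too small for real security
G = 3


def keypair():
    """Return (private, public); the private half never leaves the device."""
    private = secrets.randbelow(P - 3) + 2
    public = pow(G, private, P)
    return private, public


def shared_key(my_private, their_public):
    """Both ends derive the same 32-byte key without ever sending it."""
    secret = pow(their_public, my_private, P)
    return hashlib.sha256(secret.to_bytes(16, "big")).digest()


def xor_crypt(key, data):
    """XOR data with a SHA-256-derived keystream; the same call decrypts.

    Toy cipher: no authentication, and a key must not be reused across messages.
    """
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(d ^ s for d, s in zip(data, stream))


class RelayServer:
    """Hypothetical stand-in for the service's servers: it only forwards."""

    def __init__(self):
        self.seen = []  # everything the server could log

    def forward(self, payload):
        self.seen.append(payload)
        return payload


def demo(message):
    server = RelayServer()
    alice_private, alice_public = keypair()
    bob_private, bob_public = keypair()

    # Public keys travel through the server; private keys do not.
    alice_key = shared_key(alice_private, server.forward(bob_public))
    bob_key = shared_key(bob_private, server.forward(alice_public))

    ciphertext = xor_crypt(alice_key, message)
    delivered = server.forward(ciphertext)
    plaintext = xor_crypt(bob_key, delivered)
    return ciphertext, plaintext, server.seen


if __name__ == "__main__":
    ciphertext, plaintext, seen = demo(b"see you at noon")
    print("server saw:", ciphertext.hex())
    print("recipient read:", plaintext.decode())
```

The sketch also hints at the trade-off mentioned in the talk: because `RelayServer` only ever sees public keys and ciphertext, it has nothing useful to record. One limitation worth noting is that this toy does not authenticate the public keys, so a dishonest relay could swap them; real end-to-end systems add a way to verify who a key belongs to.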
CS50 is Harvard University's introduction to computer science and programming, focusing on computational thinking and problem-solving. The course covers foundational programming concepts, data structures, Python, SQL, web development with HTML, CSS, JavaScript, and essential cybersecurity practices.
The CS50 course begins with the basics of programming, including variables, loops, conditionals, and functions. It emphasizes understanding data structures like arrays and linked lists, and introduces algorithm efficiency through Big O notation, allowing students to grasp the trade-offs between time and space complexity.
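As a rough illustration of what Big O notation captures, the sketch below counts comparisons for a linear search, which grows in proportion to the list size, and a binary search, which grows with its logarithm. The function names and the step counters are assumptions added for illustration and are not taken from the course materials.

```python
def linear_search(items, target):
    """Check each element in turn: O(n) comparisons in the worst case."""
    steps = 0
    for index, value in enumerate(items):
        steps += 1
        if value == target:
            return index, steps
    return -1, steps


def binary_search(items, target):
    """Halve the search range each time: O(log n), but items must be sorted."""
    steps = 0
    low, high = 0, len(items) - 1
    while low <= high:
        steps += 1
        middle = (low + high) // 2
        if items[middle] == target:
            return middle, steps
        if items[middle] < target:
            low = middle + 1
        else:
            high = middle - 1
    return -1, steps


if __name__ == "__main__":
    numbers = list(range(1024))  # already sorted
    print(linear_search(numbers, 1023))
    print(binary_search(numbers, 1023))
```

The faster search is not free: it only works if the data is kept sorted, which is one small instance of the kind of trade-off the course discusses.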
CS50 introduces Python for its simpler syntax and automatic memory management, and SQL for relational database management. Students also learn to use frameworks like Flask and Bootstrap for web development, along with essential tools like Git for version control.
The course covers web development essentials, starting with HTML for page structure, CSS for styling, and JavaScript for interactivity. Students learn about responsive design, accessibility, and how to build dynamic web applications using the Flask framework, including routing and session management.
CS50 emphasizes the importance of strong passwords, two-factor authentication, and the risks of SQL injection. It teaches mitigation techniques such as parameterized queries and encourages the use of debugging tools, password managers, and encryption to enhance security.
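The difference between an injectable query and a parameterized one can be sketched with Python's built-in `sqlite3` module. This is an assumption made for a self-contained example, not necessarily the library used in the course, and the `users` table, its rows, and the function names are hypothetical.

```python
import sqlite3


def make_db():
    """Build a throwaway in-memory database with a hypothetical users table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE users (username TEXT, password_hash TEXT)")
    db.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "hash1"), ("bob", "hash2")],
    )
    return db


def find_user_unsafe(db, username):
    """Vulnerable: user input is pasted straight into the SQL text."""
    query = f"SELECT username FROM users WHERE username = '{username}'"
    return db.execute(query).fetchall()


def find_user_safe(db, username):
    """Parameterized: the ? placeholder keeps input as data, never as SQL."""
    query = "SELECT username FROM users WHERE username = ?"
    return db.execute(query, (username,)).fetchall()


if __name__ == "__main__":
    db = make_db()
    malicious = "nobody' OR '1'='1"
    print("unsafe:", find_user_unsafe(db, malicious))  # leaks every row
    print("safe:  ", find_user_safe(db, malicious))    # matches nothing
```

With the unsafe version, the crafted input rewrites the `WHERE` clause so that it is true for every row. With the placeholder, the same input is treated as an unusual username that simply does not exist.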
CS50 encourages students to continue learning beyond the course by recommending tools like Git and local development environments. It also highlights community resources such as Stack Overflow, Reddit, and official documentation to support ongoing education and problem-solving.
The course uses interactive elements like live polls and quizzes to reinforce learning and assess understanding. It emphasizes comprehension over memorization, ensuring that students engage deeply with the material and can apply their knowledge in practical scenarios.
Heads up!
This summary and transcript were automatically generated using AI with the Free YouTube Transcript Summary Tool by LunaNotes.

