Harvard CS50 (2026) – Full Computer Science University Course
freeCodeCamp.org
If you want to learn about computer
science and the art of programming, this
course is where to start. CS50 is
considered by many to be one of the best
computer science courses in the world.
This is a Harvard University course
taught by Dr. David Malan, and we are proud
to bring it to the freeCodeCamp
channel. Throughout a series of
lectures, Dr. Malan will teach you how to
think algorithmically and solve problems
efficiently. And make sure to check the
description for a lot of extra resources
that go along with the course.
All right.
This is CS50, Harvard University's
introduction to the intellectual
enterprises of computer science and the
art of programming. My name is David
Malan, and this is week zero. And by the
end of today, you'll know not only what
these light bulbs here spell, but so
much more. But why don't we start first
with the elephant
in the room. That is artificial
intelligence, which is seemingly
everywhere over the past few years. And
it's been said that it's going to change
programming. And that's absolutely the
case. It's actually been that way for
the past several years, and it's only going to
be all the more so. But
this is an incredibly exciting time.
This is actually a good thing, I do think,
insofar as now, using AI in any number
of forms, you can ask the computer to
help solve some problem for you. You can
have it find some bug or mistake in your code.
Better still, increasingly you can tell
the AI what additional features you want
to add to your software. And this is
huge because, even though in industry
humans have been programming in some
form for decades, building products and
solutions to problems, the reality is
that you and I as humans have long been
the bottleneck. There are only so many
hours in the day. There are only so many
people on your team or in your company,
and there are so many more bugs that you
want to solve and so many more features
that you want to implement. But at the
same time, you still really need to
understand the fundamentals. And indeed,
a class like CS50 has never really been
about teaching you how to program.
That's actually one of the side effects
of taking a class like this. But the
overarching goal is to teach you how to
think, how to take input and produce
correct output and how to master these
and other tools. And so by the end of
the semester, not only will you be
acquainted with languages like Scratch, which we'll
touch on today if you've not seen it
already, and languages like C and Python and
SQL, HTML, CSS, and JavaScript. You'll
also be able to teach yourself new things
and, ultimately, be able to
tell computers increasingly what it is
you want them to do. But you'll still be
in the driver's seat, so to speak.
You'll be the pilot. You'll be the
conductor. Whatever your preferred
metaphor is. And that's what I think is
so empowering still about learning
introductory material, foundational
material, because you'll know what
you're ultimately talking about and what
you can in fact solve. And we've been
through this before, like when
calculators came out. It's still
valuable, I dare say, all these years
later to still know how to do addition
and subtraction and whatnot. And yet, I
think back on some of my own math
classes. I remember learning so many
darn ways in college to take
derivatives and integrals. And after
like the sixth such process, I sort of
realized, okay, I get it. I get the
idea. Do I really need to know this many
ways? And here too, with AI and with
code, you can increasingly sort of
master the ideas and then lean on a
copilot, an assistant, to actually help you
solve those same problems. So, let's do
some of this ourselves here. In fact,
just to give you a teaser of what you'll
be able to do yourselves before long,
let me go ahead and open up a little
something called Visual Studio Code, aka
VS Code for short. This is popular,
largely open-source, free software
that's used by real-world people in
industry to write code. And it's
essentially a text editor similar to
Notepad, if you're familiar with that, or
TextEdit, kind of like Google Docs but
without boldfacing and underlining and
things like that that you'd find in word
processing programs. And this is CS50's
version thereof. We're going to
introduce you to this all the more next
week. But for now, let's just give you a
taste of what you can do with an
environment like this. So I'm going to
switch over to this program already
running VS Code. And at the bottom
of the screen, you're going to see a
so-called terminal window. Again, more
on that next week. But it's in this
terminal window that I can write
commands that tell the computer what I
want it to do. For instance, let's
suppose just for the sake of discussion
that I want to make my own chatbot, not
ChatGPT or Gemini or Claude. Rather,
let's make our own in some sense. So,
I'm going to code up a program called
chat.py. And you might be familiar with
the language I'm using here: .py means it's
written in Python. And if unfamiliar, you're
in good company. You'll learn that too
within a few weeks. And at the top of
the file here, I can write my code. And
at the bottom of the window
here, I can run my code. So, here's how
relatively easy it is nowadays to write
even your own chatbot using the AI
technologies that we already have. I'm
going to go ahead and type
the following: from openai
import OpenAI. We'll learn what this
means ultimately, but what I'm going to
do is write my own program on top of an
API, application programming interface
that someone else provides, a big
company called OpenAI, and they're
providing features and functionality
that now I can write code against. I'm
going to create a so-called client,
which is to say a program of my own
that's going to use this OpenAI
software. And then I'm going to go ahead
and ask this software for a response.
And I'm going to set that equal to
client.responses.create
whatever all that means. And then inside
of these parentheses, I'm going to say
the following. The input I want to give
to this underlying API is quote unquote
something like "In one sentence,
what is CS50?" much like I would ask
ChatGPT itself. If you're familiar with
things like ChatGPT and AI more
generally nowadays, you know there are
these things called models, which are like
statistical models that ultimately drive
what the AIs can do. I'm going to go
ahead and say model equals quote unquote
gpt-5, which is the latest and greatest
version at least as of today. Now down
in my terminal window I'm going to run a
different command python of chat.py and
so long as I have made no typographical
errors in this program I should be able
to ask OpenAI, not with chatgpt.com but
with my own code for the answer to some
question. But I want to know what the
answer to that question is. So, I
actually want to print out that response
by saying print(response.output_text). In
other words, with these 10 lines, and it's
not even 10 lines because a few of them
are blank, I've implemented my own
chatbot that at the moment is hard-coded,
that is, permanently configured to only
answer one question for me. And let's
see, with a cross of the fingers: "CS50
is Harvard University's introductory
computer science course, the
intellectual enterprises of computer
science and the art of programming,"
weirdly familiar, "covering problem
solving, algorithms, data structures, and
more using languages like C, Python, and
SQL." Okay, interesting. But let's make
the program itself more dynamic. Suppose
you wanted to write code that actually
asks the human what their question is
because we might very quickly want to
learn something more than just this one
question. So up here, I'm going to go
and change my code and type something
like this: prompt = input() with
parentheses. More on this another time,
too. But what I'm going to ask the user
for is to give me an actual prompt. That
is a question that I want this AI to
answer. And down here, what you'll
notice, even if you've never programmed
before, is that I can do something
somewhat intuitive insofar as line
five is now asking the human for input.
Let's just stipulate that this equals
sign means store that answer in a
variable called prompt, where variables
are just like x, y, or z in math. Let's go
ahead and store that in prompt. So the
input I want to give to OpenAI now is
that actual prompt. So, it's a
placeholder containing whatever
keystrokes the human typed in. If I now
run that same command again, python of
chat.py, hit enter, cross my fingers,
I'll see now dynamic prompting. So,
what's a question I might want to ask?
Well, let's just say it again. In one
sentence, whoops, in one sentence, what
is CS50? Question mark. Enter. And now
the answer comes back as probably
roughly the same but a little bit
different, a variant thereof. But maybe
we can distill this even more
succinctly. How about we run it
again? Python of chat.py, and let's say
"In one word, what is CS50?" and see if the
underlying AI obliges.
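For reference, the program typed so far might look roughly like the sketch below. It assumes the official `openai` Python package (version 1 or later) and an `OPENAI_API_KEY` environment variable; `from openai import OpenAI`, `client.responses.create`, and `response.output_text` are the names spoken in the lecture, while the helper `ask` and its `client` parameter are hypothetical additions that let the logic be tried without a network call. The model name `gpt-5` is the one named in the lecture and may change over time.

```python
# chat.py: a sketch of the lecture's chatbot (assumptions listed above).
# `ask` is a hypothetical helper; in lecture these steps are written inline.


def ask(client, prompt, model="gpt-5"):
    """Send one prompt to OpenAI's Responses API and return the reply's text."""
    response = client.responses.create(model=model, input=prompt)
    return response.output_text


def main():
    # Imported here, not at the top, so ask() can be exercised without the package.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    # Hard-coded version: prompt = "In one sentence, what is CS50?"
    # Dynamic version: use whatever keystrokes the human types in.
    prompt = input("Prompt: ")
    print(ask(client, prompt))
```

To try it as a script, you would add a call to `main()` at the bottom of chat.py and run `python chat.py`; as the demo shows, the wording of each answer can differ from run to run.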
And after a pause: "Course." In a word. So
that's not all that incorrect. And maybe
we can have a little fun with this. Now
how about: in one word, which is
better, Harvard
or Stanford? Question mark. Hope you
picked right. Let's see. The answer is
"Depends." Okay, so it would not in fact oblige,
but notice what I keep doing in this
code: I keep providing a prompt as the
human, like "in one sentence," "in one word."
Well, if you want the AI to behave in a
certain way, why don't we just tell the
underlying system to behave in that way
so that I, the human, don't have to keep asking
it "in one sentence," "in one sentence," "in
one word"? So we can actually introduce
one other feature that you'll hear
discussed in industry nowadays. There's
not only a prompt from the user, which
I'm going to now temporarily rename to
user prompt just to make clear it's
coming from the user. I'm going to also
give it what's called a system prompt
by setting this equal to some
standardized instructions that I want
the AI to respect, like "Limit your answer
to one sentence," quote unquote. And now,
in addition to passing in as input the
user prompt, I'm going to actually tell
OpenAI to use these instructions
coming from this other variable called
system prompt. So, in other words, I'm
still using the same underlying service,
but I'm handing it now not only what the
user typed in, but also this
standardized text limit your answer to
one sentence. So, the human like me
doesn't have to do that anymore. Let's
now go back to my terminal, run Python
of chat.py once more, and this time
we'll be prompted, but now I can just ask
"What is CS50?" question mark, and I'll
likely get a correct and similar answer
to before. And indeed: "It's Harvard
University's flagship introductory
computer science course..." dot dot dot. So
that seems spot on too. But now we can have
some fun with this too. And you might
know that these GPTs nowadays have sort
of personalities. You can oblige them
to behave in one way or another.
So why don't we go into our system prompt
here and say something silly like
"Pretend you're a cat." And now let's go
back to the prompt one final time. Run
Python of chat.py. The prompt again will be,
say, "What is CS50?" And with a final
flourish of hitting enter, what do we
get back?
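At this point chat.py sends two things: the user's prompt and a standing system prompt. The sketch below assumes the official `openai` package's Responses API, where the `instructions` parameter carries system-level guidance separately from `input`; mapping the lecture's "system prompt" onto `instructions` is our assumption about what was typed, the names `ask` and `SYSTEM_PROMPT` are hypothetical, and combining the two instructions from the demo into one string is also an assumption.

```python
# A sketch of chat.py with a system prompt. Assumes the `openai` package's
# Responses API; `ask` and SYSTEM_PROMPT are hypothetical names.

# Combines both instructions used in the demo (an assumption).
SYSTEM_PROMPT = "Limit your answer to one sentence. Pretend you're a cat."


def ask(client, user_prompt, system_prompt=SYSTEM_PROMPT, model="gpt-5"):
    """Send the user's prompt along with standing instructions; return the reply text."""
    response = client.responses.create(
        model=model,
        instructions=system_prompt,  # sent on every request, so the user needn't repeat it
        input=user_prompt,           # whatever the human typed
    )
    return response.output_text


def main():
    from openai import OpenAI  # imported lazily so ask() can be tested offline

    client = OpenAI()  # expects OPENAI_API_KEY in the environment
    user_prompt = input("Prompt: ")
    print(ask(client, user_prompt))
```

Keeping the system prompt in its own variable means changing the bot's "personality" is a one-line edit, which is what the cat demo relies on.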
"CS50 is Harvard University's
introductory computer science course
teaching programming, algorithms, data
structures, and problem solving. And
it's available free online. Meow." So
that was enough to coerce this
particular behavior. So this is to say
that with programming, you have the
ability, in like 10 lines of text (not
all of which you might understand yet,
but that's the whole point of a class
like this), to build fairly powerful
things, maybe silly things like this.
But in fact, it's using these same
primitives that CS50 has built its own virtual
rubber duck. And we'll talk more about
this in the weeks to come, but long
story short, in the world of
programming, it's kind of a thing to
keep a rubber duck literally on your
desk or really any inanimate cute object
like this because when you are
struggling with some problem, some bug
or mistake in your code and you don't
have a friend, a teaching assistant, a
parent or someone else who's more
knowledgeable than you about code, well,
you literally are encouraged in
programming circles to like talk to the
rubber duck. And it's through that
process of just verbalizing your
confusion and organizing your thoughts
enough to convey it to another person or
duck, in this case, that so often that
proverbial light bulb goes off and you
realize, ah, I'm being an idiot. Now I hear
in my own thoughts the illogic or the
mistake I'm making, and you solve that
problem as well. So CS50, drawing
inspiration from this, will give to you a
virtual duck in computer form. And in
fact, among the other URLs you'll use
over the course of the semester is this
one here, cs50.ai, which is also built into
that previous URL, cs50.dev. These
are the AIs you can use in CS50 to
solve problems, and you are encouraged to
do so. As you'll see in the course
syllabus, it is not reasonable, it is not
allowed, to use AI-based software other
than CS50's own, be it Claude, Gemini,
ChatGPT, or the like. But it is reasonable and
very much encouraged along the way to
turn not only to humans like me, your
teaching assistant, and others in the
class, but to CS50's own AI-based
software. And what you'll find is that
this virtual duck is designed to behave
as close to a good human tutor as you
might expect from an actual human in the
real world: it knows about CS50 and knows how to
lead you to a solution, ideally without
simply spoiling it and providing it
outright. So with that said, that's sort
of the endgame: to be able to write code
like that and more. But let's really
start back at the beginning and see how
we can get from the zeros and ones that
computers speak all the way back to
artificial intelligence. So computer
science is in the name of the course,
Computer Science 50. But what is that?
Well, it's really just the study of
information. How do you represent it?
How do you process it? And very much
germane to computer science is what the
world calls computational thinking,
which is just the application of ideas
from computer science or CS to problems
generally in the real world. And in
fact, that's ultimately, I dare say,
what computer science really is. It's
about problem solving. And even though
we use computers and you learn how to
program along the way, these are really
just tools and methodologies that you
can leverage to solve problems. Now,
what does that mean? Well, a problem is
perhaps most easily distilled into a
simple picture like this. We've got some
input, which is like the problem we want
to solve, and the output, which is the
goal we want, the solution there, too.
And then somewhere in the middle here is
the proverbial black box, the sort of
secret sauce that gets that input to
output. So this, then, I would say, is in
essence problem solving and thus
computer science. But we have to agree,
especially if we're going to use
devices, Macs, PCs, phones, whatever.
How do we all represent information, the
inputs and the outputs, in some
standardized way? Is it with English? Is
it with something else? Well, you all
probably know, even if you're not
computer people, that at the end of the
day, computers somehow use zeros and ones
entirely. That is their entire alphabet.
And in fact, you might be familiar
already with certain such systems. So
there's unary notation, which means you
essentially use single digits, like
fingers on your hand. In fact,
unary, aka base 1, is something you can
do on your own human hand. So for
instance, with one human hand, how high
can I count?
>> All right, so hopefully 1, 2, 3, 4, 5. And if
you want to count to six or seven
or 10 and so forth, you need to, you
know, take out another hand or your toes
or the like because it's fairly
limiting. But if I think a little
harder, instead of just using unary,
what if I use a different system
instead? What about something like
binary? Well, how high if you think a
little harder can you count on one human
hand?
So "31," says someone who's studied computer
science before. But why is that? It's
kind of hard to imagine, right? Because
1, 2, 3, 4, 5 seem to be the five possible
patterns. But that's only when you're
looking at the totality of fingers that
are actually up. Five in total or four
in total or one or the like. But what if
we take into account the pattern of
fingers that are up and we just
standardize what each of those fingers
represent? So maybe we all agree like a
good computer would too that maybe no
fingers up means the number zero. And if
we want to count to one, let's go with
the obvious. This is now one. But
instead of two being this, which was my
first instinct, maybe two can just be
this. A single second finger up like
this. And that means we could now use
two fingers up to represent three. I'll
propose we can use just one middle
finger up to offend everyone, but
represent four. I could maybe use these
two fingers with some difficulty to
represent five, six, seven. I'm already
up to seven having used only three
fingers. And in fact, if we keep going
higher and higher, I bet I can get as
high as 31: 32 possible combinations,
but the first one is zero, so that's as
high as we can count. So we'll make this
connection in just a moment. But what I
started to do there is something called
base 2. Instead of just having fingers
up or fingers down, I'm taking into
account the positions of those fingers
and giving meaning to like this finger
here, this finger here, this finger here
and so forth. Different weights if you
will. So the binary system is indeed all
that computers understand. And you might be
familiar with some terminology here.
"Binary digit" is not really something
anyone says, but the shorthand
for that is going to be "bit." So you
may have heard of bits, and we'll soon see
bytes and then kilobytes and megabytes
and gigabytes and terabytes and more.
A bit just means a
single binary digit, either a zero or a
one. A zero is perhaps most simply
represented by just keeping a finger down, or, in the world of
computers which have access to
electricity be it from the wall or maybe
a battery. You know what we could do? We
could just decide sort of universally
that when a light bulb is off, that
thing represents a zero. And when the
light bulb is on, that thing's going to
represent a one instead. Now, why is
this? Well, electricity is such a simple
thing, right? It's either flowing or
it's not. And we don't even have to
therefore worry about how much of it is
flowing. And if you vaguely remember
a little bit about voltage, we can sort
of be like zero volts, nothing's there
available for us. Or maybe it's 5 volts
or something else in between. But what's
nice about binary only using zeros and
ones is that it maps really nicely to
the real world by like throwing a light
switch on and off. You can represent
information by just using a little bit
of electricity or the lack thereof. So
what do I mean by this? Well, suppose we
want to start counting using binary
zeros and ones only. Well, let's think
of them metaphorically as like akin to
these light bulbs here. And in fact, let
me grab a few of these light bulbs and
let me propose that if we want to
represent the number zero, well, it
stands to reason that this single light
bulb that is off can be agreed upon as
representing zero. Now, in practice,
computers don't have little light bulbs
inside, but they do have little switches
inside. Millions of tiny little things
called transistors that if turned on can
allow it to capture a little bit of
electricity and effectively turn on a
metaphorical bulb or the switch can go
off. The transistor can go off and
therefore let the electricity dissipate
and you have just now a zero.
Now, I can light this with
some electricity; there's the battery I
mentioned, which is required. But even though we
have some electricity available to
us, I can count only to one. So
how do I go about counting?
Hardware problem. How do I go about
counting higher than one with just a
light bulb?
Yeah. So, I need more of them. So, let
me grab another one here. And now I
could put it next to it. And this, too,
I'll claim is still just the number one.
But if I want to turn two of them on,
well, that would mean I could count to
two. And if I maybe grab another one,
now I can count as high as three. But
wait a minute. I'm doing something wrong
because with three human fingers, how
high were we able to count?
So, up to seven, starting at zero.
So, I've done something wrong here. But
let me be a little more clever
about the pattern that I'm actually
using. Perhaps this can still be one.
But just like my finger went up and only
one finger in the second version of
this, this can be what we represent as
two. Which one do I want to turn on as
three? Your left or your right?
>> So, your right, because now this matches
what I was doing with my fingers a
moment ago. And I claimed we could
represent three like this. If we want to
represent four, that's fine. We have to
turn that off, this off, and this on.
And that's somehow four. And let's go
all the way up to seven. Which ones need
to be on to represent the number seven?
All right. So, all of them here. Now, if
you're not among those who just sort of
naturally said all of them, like what
the heck is going on? How do half the
people in this room know what these
patterns are supposed to be? Well, maybe
you're remembering what I did with my
fingers. But it turns out you're already
pretty familiar with systems like this,
even if you might not have put a name to
it. So in the human world, the real
world, most of us deal every day with
the so-called base 10 system, otherwise
known as decimal, "dec" implying 10,
because in the decimal system you have
10 digits available to you, 0 through 9.
In the binary system, we only had two, "bi"
implying two, so 0 and 1. And in unary we
had just one, a single digit there or
not. So in the decimal system, we just
have more of a vocabulary to play with.
And yet you and I have been doing this
since grade school. So this is obviously
the number 123. But why? It's
technically just three symbols. 1 2 3.
But for most of us, our minds just go,
okay, 123. Pretty obvious, pretty
natural. But at some point, you like me
were probably taught that this is the
ones place and this is the tens place
and this is the hundreds place and so
forth. And the reason that this pattern
of symbols, 1 2 3, is 123 is that we're
all doing some quick mental math and
realizing, well, that's 100 × 1 + 10 × 2 +
1 × 3. Oh, okay. That's how we get 100
+ 20 + 3, which gives us the number we all know
mathematically is 123. Well, it turns
out whether you're using decimal or
binary or other base systems that we'll
talk about later in the course, the
system is still fundamentally the same.
Let's kind of generalize this away.
Here's a three-digit number in some base
system, specifically decimal. And I
know that only because of the
placeholders that I've got on top of
each of these digits. But if we do a
little bit of math here, 1, 10, 100, 1,000,
10,000, and so forth. What's the pattern?
Well, technically this is 10^0, 10^1,
10^2, and so forth. And we're
using 10 because we can use as many as
10 digits under each of those columns.
But if we take some of those digits away
and go from decimal down to binary, the
motivation being it's way easier for a
computer to distinguish electricity
being on or off than coming up with like
10 unique levels of electricity to
distinguish among. You could do it, but it
would be annoying and difficult to build
in hardware. It's so much
simpler to just say on and off. It's a
nice simple world that way. So let's
change the base from 10 to two. And what
does this get us? Well, if we now redo
the math, 2^0 is 1, 2^1 is 2,
and 2^2 is 4. So the
mental math is now about to be
the same, but the columns represent
something a little bit different. So for
instance, if I turn all of these off
again, such that I've got off, off, off,
otherwise known as 0 0 0, it's zero,
because 4 × 0 + 2 × 0 + 1 × 0 still
gives me zero. By contrast, if I turn on
maybe just this one all the way over on
the left, well, that's 4 × 1 (because
on represents 1 and off represents 0),
plus 2 × 0 + 1 × 0, and that
gives me four. And if I turn both of
these others on, such that all three of them
are now on, on, on, aka 1 1 1,
that's 4 × 1 + 2 × 1 + 1 × 1. That then
gives me seven. And we can keep adding
more and more bits to this. In fact, if
we go all the way up numerically,
here's how we would represent in binary
the number you and I know is zero.
Here's how we would represent one.
Here's how we would represent two and
three and four and five. And you can
kind of see in your mind's eye now that,
because I only have zeros and ones and
no twos or threes, not to mention nines,
I'm essentially going to be carrying a
one, as if we were doing
addition. So to go from five to six,
the one ends up in the middle
column. Going to seven here gives us
1 1 1, or on, on, on. How do I represent
eight
using ones and zeros? Yeah,
>> we need to add another digit.
>> Yeah. So we're going to need to add
another digit. We need to throw hardware
at the problem using an additional digit
so that we actually have a column
representing eight. Now, as an aside,
and we'll talk about this before long,
if you don't have an additional digit
available, if your computer doesn't have
enough memory, so to speak, you might
count from 0, 1, 2, 3, 4, 5, 6, 7
and then accidentally end up back at
zero. Because if there's no room to
store the fourth bit, well, all you have
is part of the number. And this is going
to create all sorts of problems then
ultimately in the real world. So let me
go ahead and put these back and propose
that we now have a system, if you agree
to count numbers in this way, via
which we can represent information in
some standard way, and all the device
underneath the hood needs is a bit of
electricity to make this work. It's got
to be able to turn things on, aka use
some transistors, and it's got to be able
to turn those things off so as to
represent zeros instead of ones. But the
reality is that two bits, three bits,
four bits aren't very useful in the real
world, because with three bits you
can count only to seven, with four you can
count to 15. These aren't very big
numbers. So it tends to be more common
to actually use units of measure of
eight bits at a time. A byte is just
that: one byte is eight bits. So if
you've ever used the vernacular of
kilobytes, megabytes, gigabytes, that's
just referring to some number of bits,
but eight of them together compose one
individual byte. So here, for instance, is
a byte's worth of bits. Eight of them
total. I've added all the additional
placeholders. And what number does this
represent in decimal even though you're
looking at eight binary digits?
>> Just zero, because literally every
column is a zero. Now this is a bit more
mental math, unless you know it
already. What if I change all of the
zeros to ones? I turn all eight light
bulbs on. What number is this?
>> Yeah. So 255. Now some of those of you
who didn't get that instantly, that's
fine. You could certainly do the math
manually. I dare say some of you have
some prior knowledge of how to do this
sort of system. But 255 means that if
you start counting at zero and you go
all the way up to 255, okay, that's 256
total possibilities once you include
zero in the total number of patterns of
zeros and ones. And this is just going
to be one of these common numbers in
computer science: 256. Why? Because it's
referring to eight bits, and 2 to
the 8th gives you 256. And so you're going
to commonly see certain values like
that. 256. Back in the day, computers
could only show 256 colors on the
screen. Certain graphics formats
nowadays that you might download can
only use as many as 256 colors because,
as we'll see, they're only using, for
instance, eight bits, and therefore they
can only represent so many colors of the
rainbow as a result. So this then is how
we might go from just zeros and ones
electricity inside of a computer to
storing actual numbers with which we're
familiar. And honestly we can go higher
than 255. What do you need to count
higher than 255? A 9th bit, a 10th bit,
an 11th bit and so forth. And it turns
out a common convention nowadays, and
we'll see this in code too, is to use as
many as 32 bits at a time. So that's a
good chunk of bits. And anyone want to
ballpark how high you can count if
you've got 32 bits available to you?
Oh, fewer people now. Yeah, in the back.
>> Yeah. So, it's roughly 4 billion. And
it's technically only about two billion if you also
want to represent negative numbers, but
we'll revisit that question. But 2 to
the 32nd power is roughly 4 billion.
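These limits all follow from one rule: n bits give 2^n patterns, so counting from zero tops out at 2^n - 1, and one step past the top wraps back to zero when there is no room for another bit. Python's own integers grow as needed, so this minimal sketch simulates fixed-width hardware with a bit mask. The helper names are hypothetical, and the signed range assumes two's complement, the representation most machines use, which the lecture has not covered yet.

```python
def unsigned_max(bits):
    """Largest value `bits` bits can hold when counting from zero: 2**bits - 1."""
    return 2 ** bits - 1


def signed_range(bits):
    """Range if half the patterns represent negatives (two's complement assumed)."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


def increment(value, bits):
    """Add one, keeping only the lowest `bits` bits, as fixed-width hardware would."""
    return (value + 1) & unsigned_max(bits)


for bits in (3, 4, 8, 32, 64):
    print(f"{bits:>2} bits: {2 ** bits:,} patterns, max {unsigned_max(bits):,}")

print(increment(6, 3))   # 7
print(increment(7, 3))   # 0: the would-be fourth bit is lost
print(signed_range(32))  # (-2147483648, 2147483647)
```

The wraparound in `increment(7, 3)` is the "accidentally end up back at zero" problem described earlier, in miniature.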
However, nowadays it's even more common
with the Macs and PCs you might have on
your laps and even your phones nowadays
to use 64 bits, which is a big enough
number that I'm not even sure offhand
how to pronounce it. That's a lot of
permutations. That's 2 to the 64
possible permutations, but that's
increasingly commonplace. And as an
aside, just to dovetail things with our
discussion of AI, among the reasons that
we're living through, over these past few
years especially, this crazy, interesting
time of AI is that computers have
been getting so much faster,
exponentially so over time, and they have so
much more memory available to them.
There's so much data out there on the
internet in particular to train these
models that it's an interesting
confluence of hardware now actually
meeting the mathematics and statistics
that we'll talk about later in the class
that ultimately make tools like the cat
we just built possible. But of course
computers are not all math and in fact
we'll use very little math per se in
this class. And so let's move away
pretty quickly from just zeros and ones
and talk about letters of the alphabet.
Say in English here is the letter A.
Suppose you want to use this letter in
an email, a text message, or any other
program. What is the computer doing
underneath the hood? How can the
computer store a capital letter A in
English? If at the end of the day, all
the computer has access to is a source
of electricity from the wall or from a
battery and it has a lot of switches
that it can turn on and off and treat
the electricity in units of 8 or 32 or
64 or whatever.
How might a computer represent a letter
A?
>> Yeah, we need to give it an identity so
to speak as an integer. In other words,
at the end of the day, if your entire
canvas, so to speak, consists only of
zeros and ones. Like that is going to be
the answer to every question today. You
only have zeros and ones as the solution
to these problems. We just need to agree
what pattern of zeros and ones and
therefore what integer, what number
shall be used to represent the letter A.
And hopefully when we look at that
pattern of zeros and ones in the right
context, we'll indeed see it as an A. So
if we look inside of a computer so to
speak in the context of like a text
messaging program or a word processor or
anything like that, that pattern shall
be interpreted hopefully as a capital
letter A. But if I open up macOS's or
Windows or my phone's calculator
program, I would want that same pattern
of zeros and ones to be interpreted
instead as a number. If I open up
Photoshop, as we'll soon see, I want
that same pattern of zeros and ones to
be interpreted as a color presumably,
not to mention videos and sound and so
forth, but it's all just zeros and ones.
And so, even though I, when writing that
chat program a few minutes ago, didn't
have to worry about telling the
computer, oh, this is text, this is a
number, this is something else, we'll
see as we write code ourselves that you
as the programmer will have control over
telling the computer how to treat some
pattern of zeros and ones, telling it
this is a number, this is a color, this
is a letter, or something else. So how
do we represent the letter A? Well, it
turns out a bunch of humans in a room
years ago decided, ah, this pattern of
zeros and ones shall be known globally
as a capital English letter A. What is
that number if you do the quick mental
math? So indeed, 65, because we had a one
in the 64's place and a one in the ones
place. So 65, that's just sort of it. It
would have been nice if it were just the
number one or maybe the number zero. But
at least after the capital letter A,
they kept things consistent such that if
you want to represent a letter B, it's
going to be 66. Capital letter C, it's
going to be 67. Why? Because the humans
in that room, a bunch of Americans at
the time, standardized on what's called
ASCII, the American Standard Code for
Information Interchange. It doesn't matter
what the acronym represents, but it was
just a mapping. Someone on a piece of
paper essentially started writing down
letters of the alphabet and
corresponding numbers so that computers
subsequently could all speak that same
standard representation. And here's an
excerpt thereof. In this case, we're
seeing seven bits worth, but eventually
we ended up using eight bits in total to
represent letters. And some of these are
fairly cryptic. Maybe more on those
another time. But down here, if we
highlight just one column, we'll see
that indeed on this cheat sheet, 65 is
capital A, 66 is B, 67 is C, and so
forth. So, why don't we do a little
exercise here? What pattern of zeros and
ones do I see here? I've got three
bytes, so three sets of eight bits. And
even though there are no placeholders now
over the columns, what is this first
number?
Yeah. So, we've got the
ones, twos, fours, eights, 16s, 32s, and 64s
columns. So, indeed, this is going to be
the number 72. This is not what
computer scientists spend their day
doing. This is just to reinforce what it
is we just looked at. And I'll spoil it.
So the three numbers are 72, 73, and 33.
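For anyone who wants to check that paper arithmetic by machine, here is a minimal Python sketch (Python is just an illustrative choice; `bits_to_int` is a hypothetical helper name). It adds up the place values of each 8-bit pattern exactly as we did by hand, rightmost bit in the ones place, and cross-checks against Python's built-in `int(..., 2)`. The three bit patterns are written out here from their decimal values, since the slide itself isn't part of the transcript.

```python
def bits_to_int(bits: str) -> int:
    """Sum the place values (1, 2, 4, 8, ...) of every '1' bit; the rightmost bit is the ones place."""
    total = 0
    place = 1  # start at the ones place on the far right
    for bit in reversed(bits):
        if bit == "1":
            total += place
        place *= 2  # each column to the left is worth twice as much
    return total


# The three bytes from the exercise, reconstructed from their decimal values.
patterns = ["01001000", "01001001", "00100001"]
values = [bits_to_int(p) for p in patterns]
print(values)  # [72, 73, 33]

# Cross-check against Python's built-in base-2 parser.
assert values == [int(p, 2) for p in patterns]
```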
And anyone in this room could have done
that if you took out a piece of paper,
figured out what the columns are, and
just did a bit of quick mental or
written math. But this is to say,
suppose that you just got a text message
or an email, and you had the ability
to look underneath the hood of the
computer and see what pattern of zeros
and ones you just received over the
internet. Suppose that pattern of zeros
and ones was three bytes of bits, which
when you do the math are the numbers 72,
73, 33. Well, here's the cheat sheet
again. What message did you just get?
>> Yeah. So, it's HI. Why? Because 72 is
H and 73 is I. Now, some of you said HI
fairly emphatically. Why? Well, 33 turns
out, and you wouldn't know this unless
you looked it up or someone told you, is
an exclamation point. So, literally, if
you were to text someone right now,
if you haven't already, HI exclamation
point in all caps, you would essentially
be sending three bytes of information
somehow over the internet to that
recipient. And because their phone
similarly understands ASCII, because it was
programmed years ago to do so, it knows
to show you HI! and not three numbers
or colors
or something else altogether. So here we
have HI! as three bytes in a row.
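The lookup the phone is doing can be sketched in a few lines of Python, assuming the three bytes arrive exactly as the numbers 72, 73, 33 (the variable names are illustrative). `bytes.decode("ascii")` applies the ASCII chart, and the built-ins `ord` and `chr` convert between a character and its number; for these characters Python's Unicode code points coincide with the ASCII values.

```python
# The three numbers from the example, as they might arrive over the network.
received = bytes([72, 73, 33])

# Interpreted as ASCII text, the same bytes read as a greeting.
message = received.decode("ascii")
print(message)  # HI!

# Going the other way: ord() gives each character's number, chr() maps a number back.
codes = [ord(ch) for ch in message]
print(codes)  # [72, 73, 33]
print(chr(65), chr(66), chr(67))  # A B C
```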
So what else is worth noting here? Well,
there's some fun sort of trivia embedded
even in this cheat sheet. So here again
is A, B, C, D, E, F, G, and so forth, 65 on
down. Let me just highlight over here
the lowercase letters 97 98 99 and so
forth. If I go back and forth, does
anyone notice the consistent pattern
between these two?
>> Yeah. So, the lowercase letters are 32
away from the uppercase letters. Well,
how do we know that? Well, 97 - 65 is
32. And 98 - 66 is also 32. And
that pattern continues. What does this
mean? Well, computers know how to do
this. Most normal humans don't need this
information. But what it means is, if you
are representing in binary, with your
transistors on and off, some
pattern, and this is the pattern
representing capital letter A, which is
why we have a one in the 64's place and
a one in the ones place, how does a
computer go about lowercasing this same
letter? Yeah,
>> Perfect. All the computer has to do is
change this one bit in the 32's place to
a one because that has the effect
mathematically per our discussion of
adding the number 32 to whatever it is.
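Here is a minimal Python sketch of that single-bit trick, assuming plain ASCII letters (the helper names are hypothetical). Turning the 32's bit on lowercases, turning it off uppercases, and flipping it toggles; applied to anything other than the letters A to Z and a to z, the same operation produces unrelated characters, which is why real case conversion checks first.

```python
CASE_BIT = 0b00100000  # the 32's place


def to_lower(ch: str) -> str:
    """Turn on the 32's bit. Only meaningful for ASCII letters A-Z."""
    return chr(ord(ch) | CASE_BIT)


def to_upper(ch: str) -> str:
    """Turn off the 32's bit. Only meaningful for ASCII letters a-z."""
    return chr(ord(ch) & ~CASE_BIT)


def toggle_case(ch: str) -> str:
    """Flip the 32's bit, switching A<->a, B<->b, and so on."""
    return chr(ord(ch) ^ CASE_BIT)


print(format(ord("A"), "08b"), "->", format(ord(to_lower("A")), "08b"))  # 01000001 -> 01100001
print(to_lower("A"), to_upper("h"), toggle_case("i"))  # a H I
```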
So it turns out you can force text from
uppercase to lowercase or back by just
changing a single bit inside of that
pattern of eight bits in total. All
right, why don't we maybe reinforce this
with another quick exercise? We have an
opportunity here, perhaps, to
give you some stress balls right at the
very start of class. Could we get eight
volunteers to come up on stage? Maybe
over here and over here and over here
on the left. Let me go all the way to
the right. Let's see. Okay, the high
hand here. The hand that's highest
there. Yes, we're making eye contact.
How about all the way... wait, let's see.
Let's go here in the crimson sweatshirt.
And how about in the white
shirt here? Come on up. Did I count
correctly? Let's see.
Come on down, the eight of you. I didn't
count right, did I? 1, 2, 3, 4, 5, 6. It's
ironic that I'm not counting correctly.
How about on the left in
gray? Okay. Oh, and okay, in black
here. Come on down. All right.
Hopefully, this is eight. 1, 2, 3, 4, 5, 6, 7.
Okay. Eight. There we go. All
right. So, let's go ahead and do the
following exercise. I've got some sheets
of paper preprinted here. If each of you
indeed want to do exactly what you're
doing and line up from left to right,
each of you is going to represent a
placeholder essentially. So we have over
here the ones place all the way over
here. And then we have the two's place
and the four's place and the eight's, 16's, 32's,
64's, and 128's. And we come bearing a
microphone if each of you wants to say a
quick hello: your name, maybe your dorm
or house, and something besides computer
science that you're studying or want to.
>> Hi, I'm Oh, that's loud. Okay. I'm
Allison. I'm a freshman in Matthews and
I like climbing, and I'm thinking of
CS and econ.
>> Number two.
>> Hi, I'm Lily. I'm in Hurlbut this year
and I'm thinking of doing CS and
government.
>> Nice to meet you.
>> Hi. Hi, I'm Sean. I'm in Canaday Hall
and I'm thinking of doing astrophysics
and CS.
>> Welcome.
>> Hi, I'm Jordan. I'm doing applied math
with a specialization in CS and econ.
And um I'm in Wigglesworth and I like
going to the gym.
>> Okay, nice. 16.
>> Hi, I'm Shiv. I'm studying Macki and I'm
in Canaday.
>> Nice.
>> Hi, I'm Sophia. I'm in... I'm thinking of
doing electrical engineering.
>> Welcome.
>> Hi, my name is Marie and I'm in
Canaday B and I really like CS, physics,
and astrophysics.
>> Hi, I'm Alyssa. I'm in Holworthy. I'm
also thinking of studying math or
physics and I also like to climb.
>> Nice. Welcome to you all. So, on the
backs of their sheets of paper, they
have a little cheat sheet that's
describing what they should do in each
of three rounds. We're going to spell
out together a three-letter word. You all
as the audience have a cheat sheet above
you that maps numbers to letters.
These folks don't necessarily know what
they're spelling. They only know what
they individually are representing. So if
your sheet of paper tells you to
represent a zero in a given round, just
kind of stand there awkwardly, no hands
up. But if you're told on your sheet of
paper to represent a one, just raise a
single hand to make obvious to the
audience that you're representing a one
and not a zero. And the goal here is to
figure out what we are spelling using
this system called ASCII. All right,
round one, execute.
What number is this here?
I'm hearing... you can just shout it out.
What number?
>> 66 or B. So, you're spelling B. All
right, hands down. Round two.
More math.
Feel free to shout it out.
>> Oh, I heard it. Yeah. 79, which is
>> O. Okay, so we have B, O. Hands down.
Third and final round. Execute.
Number?
>> 87.
>> Yes. 87. Which is the letter?
>> W. Which spells
>> BOW. If you want to take your bow now.
>> Ah, okay. Here we go. You guys can keep
those.
Okay. Thank you. All right. You guys can
head back. Thank you to our volunteers
here. Very nicely done. We indeed
spelled out BOW, and that's just because
we all standardized on representing
information in exactly the same way,
which is why when you type BOW on your
phone or your computer, the recipient
sees the exact same thing. But what's
noteworthy in this discussion is that
you can't spell a huge number of words.
Yeah, English, okay, we've got that
covered, but odds are you're noticing,
depending on your own background and what
human languages you read or speak
yourself, that a whole bunch of
symbols might be missing from your
keyboard. For instance, we have accented
characters here. In a lot of Asian
languages, there are so many more glyphs
than we could have even fit in that
cheat sheet of numbers and letters. And
so ASCII is not the only system that the
world uses. It was one of the earliest,
but we've moved on in modern times to a
superset of ASCII that's generally known
as Unicode. And Unicode uses so many more
bits than ASCII that we even have room for
all of these little things that we seem
to send constantly nowadays. These are
obviously images that you might send
with your phone or your computer, but
they're actually characters.
They're technically just patterns of
zeros and ones that have similarly been
standardized around the world to look a
certain way. This is an
emoji keyboard in the sense that you're
sending characters. You're not sending
images per se. The characters are
displayed as images obviously, but
really these are just like characters in
a different font and that font happens
to be very colorful and graphical as
well. So, Unicode, instead of using just
seven or eight bits... Well, if you do the
quick mental math, if ASCII only used
seven or, let's say, eight bits, how many
possible characters can you represent in
ASCII alone?
256. Because if we do that quick mental
math, 2 to the 8th is 256 possibilities,
and that's it. That's enough
for English because you can cram all the
uppercase letters, the lowercase
letters, the numbers, and a whole bunch
of punctuation as well. But it's not
enough for certain other punctuation
symbols, not to mention many other human
languages. And so the Unicode
Consortium's charge in life has been
to come up with a digital representation
of all human language, past, present,
and hopefully future by using not just
seven or eight bits, but maybe 16 bits
per character, 24 bits, or heck, even 32
bits per character. And as before, if
you've got as many as 32 bits available
to you, you can represent what, like 4
billion characters in total. And that's
just one of the reasons why these emoji
have kind of exploded in popularity and
availability. There are just so many darn
patterns. Like, what else are we going
to do with all of these zeros and ones?
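The "how many characters can we represent" arithmetic generalizes: n bits give 2 to the n distinct patterns. A minimal Python sketch (the `patterns` helper is a hypothetical name) tabulating the widths mentioned above; the 32-bit row, about 4.3 billion, is the "like 4 billion" figure. The Unicode standard itself defines far fewer code points than the full 32-bit range could hold; the point here is only how much room the bits provide.

```python
def patterns(bits: int) -> int:
    """Each extra bit doubles the number of distinct patterns."""
    return 2 ** bits


for n in (7, 8, 16, 24, 32):
    print(f"{n:>2} bits -> {patterns(n):,} possible patterns")
```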
But more importantly, emoji have been
designed to really represent people and
places and things and emotions in a way
that transcends human language. But even
then, they're somewhat open to
interpretation. In fact, here's a
pattern of, I think, 32 zeros and ones.
I'm guessing no one's going to do the
quick mental math here, but what
decimal number does this represent if we
in fact do out the math, with the
ones place all the way over on
the right? Well, that's the number
4,036,991,106.
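That number can be reproduced in a few lines of Python. This sketch assumes, since the transcript doesn't say so explicitly, that the 32 bits on the slide are the emoji's UTF-8 encoding (four bytes), read as one unsigned number with the leftmost byte most significant; under that assumption it lands on exactly 4,036,991,106. It also prints Unicode's code point for the same emoji, a different and smaller number, which shows that the exact bits depend on the encoding in use.

```python
emoji = "\U0001F602"  # Face with Tears of Joy

# Assumption: the 32 bits on the slide are this character's UTF-8 encoding.
encoded = emoji.encode("utf-8")
print(encoded.hex(" "))  # f0 9f 98 82

# Read those four bytes as one unsigned 32-bit number, leftmost byte first.
as_number = int.from_bytes(encoded, "big")
print(format(as_number, "032b"))  # the 32 zeros and ones
print(f"{as_number:,}")  # 4,036,991,106

# Unicode's own code point for the character is a different number.
print(hex(ord(emoji)))  # 0x1f602
```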
Who knows what that is? It's not A,
and it's nothing near uppercase or
lowercase A, but it is among the most
popular emoji that you might send
typically on your phone, laptop, or
other device, namely this thing here,
Face with Tears of Joy, which odds are
you've sent or received recently. But
interestingly, even though many of you
might have iPhones and see and send the
same image, you'll notice that if you see
a friend who's got Android or some other
device, or maybe you're using Meta's
Messenger program or Telegram or some
other messaging service, sometimes these
emoji look a little bit different. Why?
Because what Unicode has done is
decide that there shall exist an emoji
known as Face with Tears of
Joy. Then Apple and Google and Microsoft
and others are sort of free to
interpret that as they see fit. So what
you see on the screen here is a recent
version from iOS, Apple's operating
system. Google's version of the same
looks a little something like this. And
on Telegram, if you have animations
enabled, the same idea, Face with Tears
of Joy, is actually animated. But it's
the same pattern of zeros and ones in
each case. But again, they each
essentially have different graphical
fonts to present to you what each of
those images actually is. All right. So,
those are each images. How is the
computer representing them though? At
the end of the day, we've represented
numbers, we've represented letters, but
how about these things here, colors? So,
how do we represent red or green or
blue, not to mention every other color
in between? At the end of the day, we
only have one canvas at our disposal.
Yeah,
so integers is the exact same answer as
before. We just need to agree on what
number do we use for red, what do we use
for green, what do we use for blue, and
we can come up with some standardized
pattern for this. In fact, one of the
most common ways
to do this in the real world is to
use a combination of three colors
together. Some amount of red, some
amount of green, and some amount of
blue, and mix them together to get most
any color of the rainbow that you might
want. This is sort of a picture of
something I grew up with back in the day,
where in middle school, when we'd
watch movies or some kind of show
in class, the projector
screen would be over here.
This is an old-school projector with
three different lenses, one of which
projects some amount of green, some
amount of red, some amount of blue. And
so long as the lenses are correctly
oriented to all point at the same circle
or like rectangular region on the
screen, you would see any number of
colors coming to life in the old school
video. I still remember all these years
later, we would kind of sit and lean up
against it because it was super warm and
you could hear it. An easy way to fall
asleep back in grade school. But we use
the same fundamental color system
nowadays as well, including in modern
programs like Photoshop. So let's
abstract that away and focus on just three
colors, some amount of red, green, and
blue. And let's suppose for the sake of
discussion that we want to mix together
like a medium amount of red, a medium
amount of green, and just a little bit
of blue. For instance,
let's suppose that we'll use 72 for
red, 73 for green,
and 33 for blue, RGB. Now, why
these numbers? Well, in the context of
ASCII or Unicode, which is just a
superset thereof, what does this
spell?
>> HI! But again, if you were instead to
open a file containing these three
numbers, or really these three bytes of
bits, in Photoshop, you would hope that
they're going to be interpreted not as
letters on the screen, but as
the color of a dot on the screen
instead. So it turns out that
typically, when you have three of these
numbers together, each of them is using a
single byte, so eight bits. So you can
have anywhere from 0 to 255 red, 0 to 255
green, and 0 to 255 blue. So zero
is none, 255 is the max. So if we mix
these together, imagine, just like
that projector, consolidating these three
colors into one central point. Anyone
want to guess what you're going to get
if you mix some red, some green, some
blue in those amounts? Way in back?
>> Yeah, you're going to get a dark shade
of yellow. I've brightened it up a
little bit for the projector here, but
you're going to get roughly this shade
of yellow. And we could play with these
numbers all day long and get similar
results if we want to represent
different colors as well. And indeed,
whether it's Photoshop or some other
program, you can actually combine these
amounts in all sorts of ratios to get
different colors. So if you had 0, 0, 0,
so no red, no green, no blue, take a
guess as to what color that's going to
be in the computer.
>> So it's going to be black, like the
absence of all three of those colors.
But if you mix the maximal amount of
each of those 255, red and green and
blue, that's going to give you white.
Now, if any of you have made web pages
before or used programs like Photoshop,
you might have seen numbers like 00 or
FF. Long story short, that's just
another base system, hexadecimal or base 16,
for representing numbers between 0 and 255 as well.
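A minimal Python sketch of how those 00-to-FF numbers show up, assuming the `#RRGGBB` notation used on web pages; `rgb_to_hex` is a hypothetical helper. Each 0 to 255 channel is written as exactly two hexadecimal digits, so the dark yellow, black, and white from above become three six-digit codes.

```python
def rgb_to_hex(red: int, green: int, blue: int) -> str:
    """Write each 0-255 channel as two hexadecimal digits, web-style."""
    for channel in (red, green, blue):
        if not 0 <= channel <= 255:
            raise ValueError(f"channel out of range: {channel}")
    return f"#{red:02X}{green:02X}{blue:02X}"


print(rgb_to_hex(72, 73, 33))     # #484921, the dark yellow from before
print(rgb_to_hex(0, 0, 0))        # #000000, black
print(rgb_to_hex(255, 255, 255))  # #FFFFFF, white
```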
But we'll come back to that mid-semester
when we make some of our own filters
in sort of an Instagram-like way,
manipulating images of our own. So,
where are these colors coming from or
where can we actually see them? Well,
here's just a picture of that same emoji
face with tears of joy. If I kind of
zoom in on that and maybe zoom in again,
you can start to see if you blow it up
enough or if you put your eyes close
enough to the device, sometimes you can
actually see individual dots or squares.
These are generally known as pixels. And
they're just the individual dots that
collectively compose an image. Which is
to say that each of these dots, which
is part of the image, is going to be a
distinct color. Like, this one's yellow,
this one's brown, and then there's a
bunch in between. So you're using
some number of bits to represent each of
those pixels' colors. So, if you imagine
using the RGB system, that's 8 + 8 + 8
bits. So, that's 24 bits or three bytes
just to keep track of the color of each
and every one of these dots. So now, if
you think about having downloaded a GIF
at some point, a PNG file, a
JPEG, or any other file format, it's
usually measured in what file size? Like
megabytes, typically, and that means millions
of bytes. Why? Because if it's a pretty
big photograph or pretty big image, each
of those dots takes up at least three
bytes, it would seem. And if you do out
the math, if you've got millions of dots,
each of which uses three bytes, you're
going to quickly get to megabytes, if
not even larger for things like, say,
videos. But again, it's just patterns of
zeros and ones. And so long as the
programmer knows what they're doing and
tells the computer how to interpret
those zeros and ones, or equivalently,
so long as the software knows to look at
these zeros and ones and interpret them
as numbers or letters or colors, we
should see what we intended to
represent. All right, so that's
colors and images. Now, how
many of you kind of played with
these little flip books as a kid where
they've got like a hundred different
little pictures and you flip through
them really quickly and you see what
looks like animation in book form. Well,
this is essentially a video. So
therefore, what is a video or how can
you think of what a video is? It's just
a whole bunch of like images flying
across the screen either on paper or
digitally nowadays on your phone or your
laptop. And that's kind of nice because
we're sort of composing more interesting
media now based on these lower level
building blocks. And this is going to be
thematic. We literally started with
zeros and ones. We worked our way up to
letters. We then worked our way up to
colors and thus
images. Now we're up at this level of
hierarchy in terms of video because
what's a video? It's like 30 images per
second flying across the screen or maybe
slightly fewer than that. That
collectively tricks our mind into
thinking we are seeing motion pictures.
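Putting the image and video ideas together, here is a back-of-the-envelope Python sketch, assuming the 3-bytes-per-pixel RGB scheme described earlier and a hypothetical 1920 by 1080 resolution at 30 frames per second (both numbers are illustrative choices, not from the lecture). It computes only the uncompressed size; real GIF, PNG, and JPEG files and video formats typically compress, so actual files are much smaller.

```python
BYTES_PER_PIXEL = 3  # 8 bits each of red, green, and blue


def raw_image_bytes(width: int, height: int) -> int:
    """Uncompressed size: every pixel stores one RGB triple."""
    return width * height * BYTES_PER_PIXEL


def raw_video_bytes(width: int, height: int, fps: int, seconds: int) -> int:
    """Uncompressed size: a video is just fps images for every second."""
    return raw_image_bytes(width, height) * fps * seconds


frame = raw_image_bytes(1920, 1080)
second = raw_video_bytes(1920, 1080, 30, 1)
print(f"one frame:  {frame:,} bytes")   # 6,220,800 bytes
print(f"one second: {second:,} bytes")  # 186,624,000 bytes
```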
And that's the old-school term for
movies, but it literally is what it was.
Motion pictures: this film was
showing you roughly 24 to 30 pictures per second, and
it looks like motion even though you're
just looking at images much like this
flip book very quickly one after the
other. What about music? Well, how could
you go about representing musical notes
if again your only ingredients are zeros
and ones? Even if you're not a musician,
how do you represent music like that on
the screen here? Yeah. Okay. So, the
frequency like the tone that you're
actually hearing from the device. What
else might weigh in besides the
frequency of the note? Yeah.
>> So the speed of the note or maybe the
duration like if you think about a
physical piano like how long you're
holding the key down for or not. What
else? So, the amplitude, maybe: how loud,
like how hard did you hit the key
to generate that sound. So let me
propose, at the risk of simplifying, we
could represent each of these notes
using three numbers, maybe 0 to 255 or
some other range that represents the
frequency or the pitch of the note, the
duration, and the loudness. And so long
as the person receiving a file
containing all of those zeros and ones
knows how to interpret them three at a
time, I bet you could share a musical
file with someone else that they could
hear in exactly the same way that you
yourself intended. Let me pause here to
see if there's any questions now because
we've already built our way up from
zeros and ones now to video and sound.
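The simplified music scheme proposed above, three numbers per note, can be sketched in Python. This is a toy format under the lecture's stated simplification, not a real one (real formats, MIDI for instance, are more elaborate); the function names and the note values are hypothetical. The key point is that sender and receiver agree to read the bytes three at a time.

```python
from typing import List, Tuple

Note = Tuple[int, int, int]  # (pitch, duration, loudness), each 0-255


def encode_notes(notes: List[Note]) -> bytes:
    """Flatten each note into three consecutive bytes."""
    return bytes(value for note in notes for value in note)


def decode_notes(data: bytes) -> List[Note]:
    """Read the bytes back three at a time."""
    if len(data) % 3 != 0:
        raise ValueError("data length must be a multiple of 3")
    return [tuple(data[i:i + 3]) for i in range(0, len(data), 3)]


song = [(60, 100, 200), (62, 100, 180), (64, 200, 220)]  # made-up values
data = encode_notes(song)
print(len(data), "bytes")          # 9 bytes
print(decode_notes(data) == song)  # True
```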
>> Yeah, in front.
>> How does the computer differentiate
between what the letter like 65 would be
and then what the number 65?
>> So, how does the computer distinguish
between the letter 65 and the number 65?
It's context-dependent. So, put simply,
and we'll see this as early as next week,
the programmer tells the computer how to
display the information, either as a
number or a letter, or equivalently, once
programmed, the software knows that when
it opens a GIF file or JPEG or something
else, it should interpret those zeros and ones
as colors instead of as, say, a .docx
Microsoft Word file or the like. Other
questions on any of these
representations?
Yeah. In front. Can we
>> go over like the base 10 base 2 thing
like really briefly?
>> Sure. So, can we go over base 10 and
base two? So, base 10 is like literally
the numbers you and I use every day.
It's base 10 in the sense that you have
10 digits at your disposal. 0 through 9.
And any numbers you want to represent in
the real world must be composed using 0
through 9. The binary system or base 2
is fundamentally the same. It's just the
computer doesn't have access to two
through 9. It only has access to zero
and one. But much like the light bulbs I
was displaying here, you can simply
ascribe different weights to each of the
digits. So instead of it being
the ones place, the 10's place,
and the hundred's place, if we more
modestly say the ones place, the two's
place, the four's place, we can use the
same system. In binary, you might need
to use more digits to count as high,
because to write 255 in decimal, you can just write 255.
That's three digits in decimal. But in
binary, we've seen you need to use eight
such digits, which is more, but it's
still much better than unary, which
would have had 255 light bulbs on
instead.
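To make the digit-count comparison concrete, here is a minimal Python sketch (`digits_needed` is a hypothetical helper) that counts how many digits 255 takes in decimal, binary, and unary, treating unary as one mark, or light bulb, per unit.

```python
def digits_needed(value: int, base: int) -> int:
    """Count digits needed to write value in the given base (base 1 = unary tally marks)."""
    if base == 1:
        return value  # unary: one mark (or light bulb) per unit
    count = 1
    while value >= base:
        value //= base
        count += 1
    return count


for base, name in [(10, "decimal"), (2, "binary"), (1, "unary")]:
    print(f"255 in {name}: {digits_needed(255, base)} digits")
print(format(255, "b"))  # 11111111
```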
>> And are binary and base 2
the same thing?
>> Are binary and base 2 the same thing?
Yes. Just like base 10 and decimal are
the same thing as well. And unary and
base 1 are the same thing as well. All
right. So let me just stipulate that
even though we sort of took this tour
quickly, at the end of the day, computers
only have zeros and ones at their
disposal. So again, the answer to any
question as to how can we represent X is
going to somehow involve permuting those
zeros and ones into patterns or
equivalently into the numbers that they
represent. But if we now have a way to
represent all inputs in the world, be it
letters, numbers, images, videos, or
anything else, and to get output from some
problem-solving process, how do we
actually solve problems? Well, the
secret sauce in the middle here is
another term that you've probably heard
in the real world nowadays, which is
that of algorithm: step-by-step
instructions for solving some problem.
So, this ultimately is what computer
science really is about too, is not just
representing information, but somehow
processing it, doing something
interesting with it to actually solve
the problem that you've been provided as
input so you can output the correct
answer. Now, there are all sorts of
algorithms implemented in our phones and
in our Macs and PCs, and that's all
software is. It's an implementation in
code, be it C++ or Java or any of the many
other languages that exist, in code
that the computer understands, but it's
still just step-by-step instructions.
And among the things we'll learn in CS50
is how to express yourself in different
ways to solve problems, not only in
different languages, but using different
methodologies as well. Because as we'll
see, among the reasons we introduce
these several languages is that you don't
just learn more and more languages that
allow you to solve the same problems.
Different languages will allow you to
solve different problems and even save
you time by being better tools for the
job. So here, for instance, on an
iPhone is a bunch of contacts,
which is presumably familiar, where we
might have a whole bunch of friends and
family and whatnot alphabetized by first
name or last name. And suppose we want to
find one such person, like John Harvard,
whose number here might be
+1-949-468-2750.
Feel free to call or text him sometime.
This is the goal of this problem. If
we have our contacts app and I start
typing in John's name by first name or
last name, the autocomplete nowadays
kicks in and it somehow filters the list
down from my 10 friends or 100 friends
or a thousand friends into just the
single directory entry that matches. So
here too, back in the days of RGB
projectors, we had phone books like
this. I'm pleased to say,
thanks to our friend Alexis, this is the
largest phone book that we've used for
this demonstration. This is an old-school
phone book that's essentially the
same thing as our contacts app or
address book nowadays, whereby I've got a
whole bunch of names and numbers
alphabetically sorted by first name or
last name, whatever, and corresponding
to each of those is a number. So, back
in the day and frankly even nowadays in
your phones, how do you go about finding
someone in a phone book or your contacts
app? Well, you could very naively just
start at the beginning and look down and
just turn one page at a time looking for
John Harvard in this case. Now, so long
as I'm paying attention, this
step-by-step process will get me to John
Harvard. Like, this is a correct
algorithm, even though you might kind of
object to how I'm doing this. Why? Like,
what's bad about this algorithm?
>> It's just slow. I mean, this is crazy
slow. If there's like a thousand pages
in this phone book, which looks like
there are, like this could take me as
many as a thousand pages, or maybe he's
roughly in the middle, like 500 pages.
Like, that's crazy. That's really rather
slow, especially if I'm going to do this
again and again. Well, what if I do it a
little smarter? In grade school, I sort of
learned how to count two at a time. So,
2, 4, 6, 8, 10, 12, 14, 16, 18. Again, if I'm
paying attention, I'll get there twice
as fast because I'm counting two at a
time. But is that algorithm step by step
correct?
And I'm seeing no, but why?
>> I might skip over John Harvard. So, just
by bad luck and kind of with 50/50
probability, he's going to be sandwiched
between two of the pages. Now, I don't
have to abort this algorithm
altogether. As soon as I get
past the J section, if we're doing it by
first name, I could just double back one
page and just make sure that I haven't
missed him. So, it's recoverable. And
this algorithm therefore is sort of
twice as fast plus one extra step maybe
to double back. But otherwise, that's arguably
a bug or a mistake in the
algorithm if I don't fix it
intelligently. But what did we do back
in the day? And what does your iPhone or
Android phone do? What they typically do
is they go roughly to the middle, look
physically or virtually down. They see,
"Oh, I'm in the M section." And so,
which side is John Harvard to? To the
left or to the right? So, he's to the
left. So, I could literally now
Jesus Christ.
We talked about this before class that
this might be more Oh my god. There we
go. We can tear the problem in half.
Thank you.
It's been a while. We can tear the
problem in half. We know that John
Harvard is to the left. So, I can throw
half of the problem away, a bit
dramatically, such that I've now gone from
a thousand-page problem to 500 pages
instead. What now can I do? I can go
roughly to the middle here and maybe I'm
in the E section. So, I went a little
too far back to the left, but I kept it
simple and I just divided so that I can
conquer this problem, if you will. And
if I'm in the E section now, is John
Harvard to the left or to the right? To
the right. So I can again Jesus Christ.
Tear the problem in half. And now, thank
you. So now John Harvard again is going
to be in this half. I can throw this
half away. So now I've gone from 1,000
to 500 to 250. And I can repeat, repeat,
repeat down to 125. Half of that, half
of that, half of that until I'm left
with finally just a single page. And
John Harvard is hopefully now on this
page, such that I can call him, or he's not there at
all, at which point this is all sort of
for naught. But what's powerful about each
of those algorithms, the sort of
good, better, and best, is that they all get
the job done, conditional on the second
one having that little fix just to make
sure I don't miss John Harvard between
two pages, but they're fundamentally
different in their efficiency and the
quality of their design. And this is
really representative of one of the
emphases of a class like this. It's not
just about writing correct code or
getting the job done, but doing it well
and doing it quickly. Using the least
amount of CPU or computing resources,
using the minimal amount of RAM, using
the fewest number of people, using the
least amount of money, whatever your
constrained resource is, solving a
problem better. So, for that first algorithm's
step-by-step instructions, if we plot things on a
grid like this, we have on the x-axis a
representation of the size of the
problem. So this would mean a small
problem, like zero pages. This would mean
a big problem, like a thousand pages. And
on the y, or vertical, axis we have some
measurement of time. So this is the
number of seconds or the number of page
turns, whatever your metric actually is.
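Using "pages looked at" as the metric, the three phone-book strategies can be sketched in Python and compared directly. Assumptions, all mine rather than the lecture's: the book is modeled as a sorted list of page numbers standing in for names, the target is the very last page (a worst case for the first two strategies), and the function names are hypothetical. The third strategy, repeatedly halving, is what's commonly called binary search.

```python
def linear_search(pages, target):
    """Algorithm 1: turn one page at a time. Returns (index, pages looked at)."""
    steps = 0
    for i, page in enumerate(pages):
        steps += 1
        if page == target:
            return i, steps
    return -1, steps


def two_at_a_time(pages, target):
    """Algorithm 2: jump two pages at a time; if we pass the target, double back one page."""
    steps = 0
    i = 1
    while i < len(pages):
        steps += 1
        if pages[i] == target:
            return i, steps
        if pages[i] > target:  # went past it, so check the page we skipped
            steps += 1
            return (i - 1, steps) if pages[i - 1] == target else (-1, steps)
        i += 2
    # With an odd number of pages, the last page hasn't been checked yet.
    if i - 1 < len(pages):
        steps += 1
        if pages[i - 1] == target:
            return i - 1, steps
    return -1, steps


def binary_search(pages, target):
    """Algorithm 3: open to the middle and throw away the half that can't contain the target."""
    steps = 0
    lo, hi = 0, len(pages) - 1
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2
        if pages[mid] == target:
            return mid, steps
        if pages[mid] < target:
            lo = mid + 1  # target is to the right
        else:
            hi = mid - 1  # target is to the left
    return -1, steps


for n in (1000, 2000):
    pages = list(range(n))  # page numbers stand in for sorted names
    target = n - 1          # the very last page
    print(f"{n} pages:",
          "one at a time =", linear_search(pages, target)[1],
          "| two at a time =", two_at_a_time(pages, target)[1],
          "| halving =", binary_search(pages, target)[1])
```

Under those assumptions, doubling the book from 1,000 to 2,000 pages doubles the first two counts (1,000 to 2,000, and 500 to 1,000) but adds just one step to the halving strategy (10 to 11).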
So this would be not much time
at all, so fast. This would be a lot of
time, so slow. So what's the
relationship if we just roughly draw
these three algorithms? Well, the first
one is technically a straight line, and
we'll describe that line as n. Why n?
Because if you think of n as the number
of pages, well, there's a
one-to-one relationship in the first
algorithm between how many times I have to
turn the page and how many pages
there actually are. And you can think
about this in the extreme. If I was
looking for someone whose name started
with Z, I might have to go through like
a thousand darn pages to get to that
person whose name started with Z, unless
again I do something hackish and just
kind of cheat and go to the end. If we
execute these algorithms again and again
the same way, that's going to be pretty
slow. The second algorithm was
pretty much twice as fast, plus that one
extra step potentially. But it's still a
straight line, because if there are a
thousand pages and I'm doing two pages
at a time, well, that's like n divided
by two steps, plus one, give or take. So
it's still a straight line, but it's
still better. Notice if this is the size
of the problem, a thousand pages for
instance, the first
algorithm took literally twice as much
time as the second algorithm. So we're
doing better already. But the third
algorithm fundamentally is going to look
something like this. And if you remember
your logarithms, sort of the
opposite of an exponential, this curve
is so much lower and flatter, if you
will, than either of these two
mathematically. More on this another
time. The curve is going to be like log
base 2 of n, or just logarithmic in
nature. But what it means is that it's
growing very very very slowly. It's
still going up. It's never going to
flatline and go perfectly horizontal,
but it goes up very slowly. Why? Well,
if you think about two towns nearby,
like Cambridge on this side of the river
and the town of Allston on the other,
suppose that they still have phone books
like this one, and they merge their
phone books for whatever reason. So,
overnight, we go from a thousand-page
phone book to a 2,000-page phone book.
The first algorithm is going to take
literally twice as long as will the
second one because we're only going
through it one or two pages at a time.
But if the phone book size doubles from
this year, for instance, to next year,
you can kind of in your mind's eye think
about the green line. It's not going to
go up that much higher. Why? Well,
practically speaking, even if the phone
book becomes 2,000 pages long, how
many more times do you have to tear or
divide that problem in half?
>> Just one. Because you're taking a 1,000-page
bite out of it, then a 500, then a
250. You're taking much bigger bites out
of it than just one or two at a time.
And so what computer science, algorithms,
and good design are about is figuring
out the logic via which you can solve
problems not only correctly but
efficiently as well.
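As a rough sketch of those three growth rates, the short Python program below counts worst-case steps for each phone-book strategy. It assumes the person is on the very last page, models the second algorithm's "little fix" as at most one extra step, and models tearing the book in half by rounding up; the function names are made up for illustration.

```python
import math


def linear_steps(pages):
    """Algorithm 1: one page at a time; worst case turns every page."""
    return pages


def two_at_a_time_steps(pages):
    """Algorithm 2: two pages at a time, plus at most one step back
    to check a page that might have been skipped."""
    return math.ceil(pages / 2) + 1


def halving_steps(pages):
    """Algorithm 3: keep dividing the book in half until one page is left."""
    steps = 0
    while pages > 1:
        pages = math.ceil(pages / 2)
        steps += 1
    return steps


for n in (1000, 2000):
    # prints: pages, then steps for algorithms 1, 2, and 3
    print(n, linear_steps(n), two_at_a_time_steps(n), halving_steps(n))
```

Going from 1,000 to 2,000 pages roughly doubles the first two counts (1000 to 2000, 501 to 1001) but adds just one step to the third (10 to 11), which is the slow-growing logarithmic curve.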
And that then gives us these things
called algorithms. And when it comes
time to code, which we're about to do
too, code is just an implementation of
an algorithm in a language the computer
understands. Now, this assumes that
we've come up with some digital way, that
is to say, a zero-and-one-based way, to
represent names and numbers. But
honestly, we already did that. We came up
with ASCII and then Unicode to
represent the names. Representing
numbers is even easier than that. That's
really where we started. So code is just
about taking as input some standardized
representation of names and numbers and
spitting out answers. And that's truly
what iOS and Android are doing. When you
start doing autocomplete, they could be
searching from the top to the bottom,
which is fine if you've only got a few
friends and family in the phone. But if
you've got a thousand or if you've got
10,000 or if it's not a phone book
anymore, it's some database with lots
and lots of data. Well, it stands to
reason that it'd be nice maybe if the
computer kept it all alphabetized just
like that book and jumped to the middle,
then the middle of the middle, then the
middle of the middle of the middle, and
so forth. Why? Because the speed is
going to be much, much faster,
logarithmic in nature and not linear, so
to speak. But we'll revisit
those topics as well. But for now,
before we get into actual code, let's
talk for a moment about pseudocode.
Pseudocode is not one formal thing.
Every human will come up with their own
way of writing pseudocode. It's an
English-like or human-like formulation
of step-by-step instructions, just using
terse, correct English or whatever human
language. So, for instance, if I want to
translate what I did somewhat
intuitively with that phone book, just
dividing in half and dividing in half, into
step-by-step instructions I could hand
to you or to, like, a robot or
something like that, well, step one was
essentially to pick up the phone book,
which I did. Step two was to open to the
middle of the phone book, in the third
and final algorithm. Step three was to look
at the page, as I did. Step four got a
little more interesting. Even though I
didn't verbalize this, presumably I was
asking myself a question. If the person
I'm looking for, John Harvard, is on the
page, then I would have called him right
then. But if he weren't on the page, if
he instead were earlier in the book, as
did happen, well then I'm going to go to
the left, so to speak, but more
methodically, I'm going to open to the
middle of the left half of the book.
Then I'm going to go back to line three.
That's interesting. We'll come back to
that in a moment. But else if the person
is later in the book, well, I'm going to
open to the middle of the right half of
the book and then go back to line three.
Now, let's pause here. Why do I keep
going back to line three? This would
seem to get me doing the same thing
forever endlessly.
But not quite. Why?
>> As soon as you hit the one the on.
>> Yeah. So because I am dividing the
problem in half, for instance, on line
six or line nine implicitly just based
on how I've written this, the problem's
getting smaller and smaller and smaller.
So it's fine if I keep doing the same
logic again and again because if the
problem's getting smaller, eventually
it's going to bottom out and I'm going
to have just one person on that page
that I want to call and so the algorithm
is done. But there is a perverse corner
case, if you will, and this is where
it's ever more important to be precise
when writing code and anticipate what
could go wrong. I should probably ask
one more question in this code, not just
these three. What might that question
be? Yeah.
>> If John Harvard isn't in the book.
>> Yeah. So, if John Harvard is not in the
book, there's this corner case where
what if I'm just wasting my time
entirely and I get to the end of the
phone book and John Harvard's not there.
What should the computer do? Well, as an
aside, if you've ever been using your
Mac or PC or phone and the thing just
freezes or like the stupid little beach
ball starts spinning or something like
that and you're like, what is going on?
Some human at Google or Microsoft or
Apple or the like made a mistake. They
forgot for instance that fourth uncommon
but possible situation wherein if they
don't tell the computer how to handle
it, the computer's effectively going to
freak out and do something undefined
like just hang or reboot or do something
else. So we do want to add this: else,
quit altogether, so that you have well-defined
behavior. And truly, the next
time your computer or phone
spontaneously reboots or dies or does
something wrong, it's probably not your
fault per se. Some other human
elsewhere did not write correct code.
They didn't anticipate cases like these.
But now let's use some terminology here.
There are some salient ideas that we're
going to see in Scratch and C and Python
and these other languages I alluded to
earlier. Everything I've just
highlighted here, henceforth, we're
going to think of as functions.
Functions are verbs or actions that
really get some small piece of work done
for you.
Here though, highlighted is the
beginning of what we'll call
conditionals. A conditional is like a fork
in the road. Do I go this way? Do I go
this way? Or some other way altogether?
How do you decide what road to go down?
We're going to call these questions you
ask yourself Boolean expressions, named
after the mathematician George Boole. And a
Boolean expression is just a question
that has a yes or no answer, or a true or
false answer, or a one or zero answer.
It's a binary state, yes or no,
typically. Otherwise, we have this "go
back to," which is what we're
generally going to call a loop, which
somehow induces cyclical behavior again
and again. And those functions and those
conditionals, Boolean expressions, and
loops, and a few other concepts, are
pretty much what will underlie all of the
code that we write, whether it is in
Scratch, C, or something else altogether.
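Here is one way the phone-book pseudocode, including the extra "else, quit" case, might look in Python. It is a sketch that assumes the phone book is a plain Python list of names already in alphabetical order; the function name `search` and the sample names are hypothetical. The comments point out the function, the loop, the conditionals, and the Boolean expressions just named.

```python
def search(phone_book, name):
    """Binary search: return the index of name in phone_book, or None.

    phone_book is assumed to be a list of names in alphabetical order.
    """
    low, high = 0, len(phone_book) - 1      # the part of the book still in play
    while low <= high:                      # loop: "go back to line 3" while pages remain
        middle = (low + high) // 2          # open to the middle of what's left
        if phone_book[middle] == name:      # conditional on a Boolean expression
            return middle                   # found: "call person"
        elif name < phone_book[middle]:     # person is earlier in the book
            high = middle - 1               # keep only the left half
        else:                               # person is later in the book
            low = middle + 1                # keep only the right half
    return None                             # the corner case: "else, quit"


names = ["Alice", "Bob", "John Harvard", "Zed"]
print(search(names, "John Harvard"))  # 2
print(search(names, "Kelly"))         # None
```

Each pass through the loop throws away about half of what's left, so the problem keeps shrinking and the loop is guaranteed to end; the final `return None` gives well-defined behavior when the name simply isn't there.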
But we need to get to that point and in
fact let's go and infer what this
program here does. At the end of the
day, computers only understand zeros and
ones. So I claim here is a program of
zeros and ones. What does it do?
Anyone
want to guess? I mean, we could spend
all day converting all of these zeros
and ones to numbers, but they're not
going to be numbers if it's code. What
do you think?
>> That's amazing. It does in fact print
hello world.
All right. So, no one except maybe
you and me and a few others in the room
should know, and that was probably a guess,
admittedly, or advancing on the slide.
But why is that? Well, it turns out that
not only do computers standardize
information, data like numbers and
letters and colors and other things,
they also standardize instructions. And
so, if you've heard of companies like
Intel or AMD or Nvidia or others, among
the things they do is they decide as a
company what pattern of zeros and ones
shall represent what functionality. And
it's very low-level functionality. Those
companies and others decide that some
pattern of zeros and ones means add two
numbers together or subtract or
multiply. Another pattern might mean
load information from the computer's
hard drive into memory. Another might
mean store it somewhere else. Another
might mean print something out to the
screen. So nested somewhere in here,
and admittedly I have no idea which pattern
offhand, because it's not interesting enough
to go figure it out at this level, is one that says
print. And somewhere in there, like this
gentleman proposed, I bet we could find
the representation of H, which was 72
and E and L and L and O and everything
that composes hello world. Because, as
it turns out in programming circles, the
very first program that students
typically write is that of hello world.
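To see the "H is 72" idea concretely, this short Python sketch pairs each letter of a greeting with its numeric code and the 8-bit pattern of zeros and ones that represents it. Note that it shows how the text is encoded as data; the actual CPU instructions that mean "print" are vendor-specific machine code and aren't shown here.

```python
def to_bits(text):
    """Pair each character with its code point and its 8-bit binary pattern."""
    return [(ch, ord(ch), format(ord(ch), "08b")) for ch in text]


for ch, code, bits in to_bits("HELLO"):
    print(ch, code, bits)   # first line: H 72 01001000
```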
Now, this one here is written in a much
more intelligible way. Even if you're
not a programmer, odds are if I asked
you, "What does this program do?" you
would have said,
"Oh, hello world." Even though there's a
lot of clutter here, like, no idea what
this is until next week. int main(void).
That looks cryptic. There are these weird
curly braces, which we rarely use in the
real world, but at least I understand a
few words like hello and world. And this
is kind of familiar: printf. It's
not print, but it's probably the same
thing. So, here too is an example of
this hierarchy. Back in the day, in the
earliest days of computers, humans were
writing code by representing zeros and
ones. If you've ever heard your parents
talk about punch cards or the like,
those were effectively representing patterns
that tell the computer what to do or
what to represent, literally as holes
in paper. Well, pretty quickly early on
this got really tedious, only writing
code at such a low level. So, someone
decided, you know what, I'm going to put
in the effort. I'm going to figure out
what patterns of zeros and ones I can
put together so as to be able to convert
something more user friendly to those
zeros and ones. And as a teaser for next
week, that person invented the first
compiler. A compiler is just a program
that translates one language to another.
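Python happens to ship with a built-in compiler you can poke at. This sketch uses the standard `compile()` function and the `dis` module to translate one line of source into a code object and list the lower-level instructions it became. One caveat: CPython translates to bytecode for its own virtual machine rather than directly to the CPU's zeros and ones, and the exact `dis` listing differs between Python versions.

```python
import dis

source = 'print("hello, world")'

# Translate the source text into a code object (CPython bytecode).
code = compile(source, "<hello>", "exec")

dis.dis(code)   # list the lower-level instructions the line was translated into
exec(code)      # run the translated program, which prints: hello, world
```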
And more modernly, this is a language
called C, which we'll spend a few weeks
on together because it's so fundamental
to how the computer works. Even this is
going to get tedious by like week six of
the class. And this is going to get
stupid. This is going to get annoying.
This is going to get cryptic. We're just
going to write print and hello to put hello on the screen,
using a different language
called Python. Why? Because someone
wrote in C a program that can convert
Python (this is a white lie) to C, which
can then be converted to zeros and ones,
and so forth. So in computing there's
this principle of abstraction where we
start with the basics, and thank God we
can all trust that someone else solved
these really hard problems way
long ago. Then they wrote programs to
make it easier. We wrote programs to
make it easier. You can now write code
like I did with the chatbot to make
things even easier. Why? Because OpenAI
and other companies have abstracted away
a lot of the lower level implementation
details. And that's where I think this
stuff gets really exciting. We can stand
on the shoulders of others so long as we
know how to use and assemble these kinds
of building blocks. And speaking of
building blocks, let's start here. Now,
odds are some of you might have started
here in like grade school playing with
Scratch. And it's great for like after
school programs, learning how to
program. And you probably used this
language to make games and graphics and
just maybe playful art or the like. But
in Scratch, which is a graphical
programming language designed about 20
years ago by our friends down the road
at MIT's Media Lab, it represents pretty
much everything we're going to be doing
fundamentally over the next several
weeks in more modern languages like C
and Python, more textual languages, if
you will. I bet I could ask the group
here, what does this program do when you
click a green flag? Well, it says hello
world on the screen. Because with
Scratch, you have the ability to express
yourself with functions and loops and
conditionals and all of this, but by
using drag and drop puzzle pieces. So,
what we're about to do is this. We're
going to go on my screen to
scratch.mit.edu.
It's a browser-based programming
environment, and we're only going to
spend one week, really a few days in
CS50 on this language. But the
overarching goal is, one, to make sure
everyone's comfortable applying some of
these building blocks and actually
developing something that's interesting
and visual and audio as well, and, two, to
give us some visuals that we can
rely on and fall back on when all of
those curly braces and parentheses and
sort of stupid syntax come back, syntax that's
necessary in many languages but can very
quickly become a distraction early on
from the interesting and useful ideas.
So what we're about to see is this in a
browser. This is the Scratch programming
environment and there's a few different
parts of this world. This is the blocks
palette, so to speak. That is to say,
there's a bunch of puzzle pieces or
building blocks that represent functions
and conditionals and variables and loops and
other such constructs. There's going to
be the programming area here where you
can actually write your code by dragging
and dropping these puzzle pieces.
There's a whole world of sprites here.
By default, Scratch is a cat
by design, but you can make Scratch look
like a dog, a bird, a garbage can, or
anything else as we'll soon see. And
then this is the world in which Scratch
itself lives. So Scratch can go up,
down, left, right, and generally be
animated within that world. For the
curious, kind of like high school
geometry class, there's sort of this XY
plane here. So 0, 0 would be in the
middle. 0, 180 is here. 0, -180 is
here. -240, 0 is here, and 240,
0 is here. Generally, you don't need to worry
about the numbers, but they exist so
that when you say up or down, you can
actually tell the program go up one
pixel or 10 pixels or 100 pixels, so that
you have some definition of what this
world actually is. All right, so let's
actually put this to the test. Let me go
ahead here and flip over to the actual
Scratch website, where, once I've logged
in, I'm going to have on my screen that
same user interface via which I can
actually write some code of
my own. Let me go ahead and zoom in on
the screen a little bit here and let's
make the simplest of these programs
first. Maybe a program that simply says
hello world. Now at a glance it's kind
of overwhelming how many puzzle pieces
there are. And honestly, even over 20
years, I've never used them all. And MIT
occasionally adds to it. But the point
is that they're color-coded to resemble
the type of functionality that they
offer. And also, it's meant to be the
sort of thing where you can just kind of
scroll through and get a visual sense of
like what you could do and then figure
out how you might assemble these puzzle
pieces together. So, I'm going to go
under this yellow or orangish category
here to begin with. So, there exists in
the world of Scratch not quite the same
jargon that I'm using now: functions and
conditionals and loops. That's more of
the programmer's way. This is more of
the child-friendly way, but it's really
the same idea. Under events, you have
puzzle pieces that represent things that
can happen while the world is running.
So, for instance, the first one here is
sort of the canonical when the green
flag is clicked. Why is that relevant?
Well, in the two-dimensional world that
Scratch lives in, there's a stop sign,
which means stop, and there's a green
flag, which means go. So, I can
therefore drag one of these puzzle
pieces over here so that when I click
that green flag, the cat will in fact do
something for me. Doesn't really matter
where I drop it, so long as it's
somewhere in the middle here. I'm going
to go ahead and let go. Now, I want the
look of the cat to change. I want to see
like a cartoon speech bubble come out
for now. So, I'm going to go under looks
here. And there's a bunch of different
ways to say things and think things. I'm
going to keep it simple and just drag
this one here. And now notice when I get
close enough to that first puzzle piece,
they're sort of magnetic and they want
to snap together. So I can just let go
and boom, because they're a similar
shape, they will lock together
automatically. And notice too, if I zoom
in here, the white oval, which by
default says hello, is actually editable
by me because it turns out that some
functions can take arguments or more
generally inputs that influence their
behavior. So, if I kind of click or
double click on this, I can change it to
the more canonical hello world or hello
David or hello whatever I want the
message to be. I'm going to go ahead and
zoom out. And now over here at top
right, notice that I can very simply
click the green flag. And I'll have
written my first program in Scratch. I
clicked the green flag, it said go. And
now notice it's sort of stuck on that
because I never said stop saying go. But
that's where I can click the red stop
sign and sort of get the cat back to
where I want it. So think about for just
a moment what it is we just did. On
the one hand, we have a very obvious
puzzle piece that says say and it said
something but it really is a function
and that function does take an input
represented by the white oval here
otherwise known as an argument or a
parameter. But what this really is is
just an input to the function. And so we
can map even this simple Scratch
program onto our model of problem
solving before with an addition of what
we'll call moving forward a side effect.
A side effect in a computer program is
often something that happens visually on
the screen or maybe audibly out of a
speaker. It's something that just kind
of happens as a result of you using a
function like a speech bubble appearing
on the screen. So here, more generally, is
what we claimed represents the
solving of a problem. And let's just
consider what the input is. The input to
this problem say something on the screen
is this white oval here that I typed in.
Hello world. The algorithm, the
step-by-step instructions, aren't
really something I wrote. Our
friends at MIT implemented that purple
say block. So someone there knows how to
get the cat to say something out of its
comical mouth. So the algorithm
implemented in code is really equivalent
to the say function. So a function is
just a piece of functionality
implemented in code which in turn
implements an algorithm. So the algorithm is
sort of the concept, and the function is
actually the incarnation of it in code.
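The say block's shape, one input in and a visible effect out, can be mimicked with a tiny Python function. This is only an analogy: `say` here is a hypothetical stand-in that uses Python's `print` for the speech bubble, and it hands nothing back to the code, so calling it returns `None`.

```python
def say(message):
    """Hypothetical stand-in for Scratch's say block.

    The argument is the input; printing it is the side effect.
    """
    print(message)


result = say("hello, world")   # side effect: hello, world appears on screen
print(result)                  # None, because say returns no value
```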
What's the output? Well, hopefully it's
this side effect: seeing the speech
bubble come out of the cat's mouth like
this. All right, so that's one such
program, but it's always going to play
and look the same. What if I actually
want to prompt the human for their
actual name? Well, let me go back to the
puzzle pieces here. Let me go ahead and
throw this whole thing away. Okay. And
if you want to delete blocks, you can
either right-click or control-click and
choose from a menu. Or you can just drag
them there and sort of let go, and
they'll disappear. I'm going to go back
in and get another event
block, even though I could have reused
that same one. I'm going to go ahead and
go under sensing now. And if I zoom in
over here, you'll see a whole bunch of
things like I can sense distance and
colors. But more pragmatically, I can
use this function in blue, ask
something, and then wait for the answer.
And what's different about this puzzle
piece is that it too is, yes, a function.
It too takes an argument, but instead of
having an immediate side effect like
displaying something on the screen, it's
essentially inside of the computer going
to hand me back the response. It's going
to return a value, so to speak. And a
return value is something that the code
can see, but the human can't. A side
effect is something the human sees, but
a return value is something only the
computer sees. It's like the computer is
handing me back the user's input. So,
how does this work? Well, notice, and
this is a bit strange, this isn't
usually how variables work, but Scratch
too supports variables, and that was a
word I used quickly at the very start
when we were making the chatbot. A
variable, like in math, X, Y, or Z, just
stores some value, but it doesn't have to
store a number. In code, it can store
something like a human name. So, what's going to
happen when I use this puzzle piece is
that once the human types in their name
and hits enter, MIT, or really Scratch,
is going to store the answer, the
so-called return value, in a variable
that's designed to be called answer.
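In Python terms, the ask block behaves like a function with a return value rather than a side effect. The sketch below is hypothetical: `ask` wraps Python's built-in `input()`, and its `read_line` parameter exists only so the example can run without a keyboard by substituting a fake reply. Unlike Scratch, which always puts the reply in `answer`, Python lets you pick the variable name; `answer` is used here just to mirror Scratch.

```python
def ask(question, read_line=input):
    """Hypothetical stand-in for Scratch's ask block.

    Shows the question (via read_line, which defaults to input()) and
    returns whatever the user typed, instead of displaying it.
    """
    return read_line(question + " ")


# Simulate a user typing "David" so the example runs unattended.
answer = ask("What's your name?", read_line=lambda prompt: "David")
print(answer)   # David
```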
But, as we'll see, you can make your own
variables down the line if you want and
call them anything you want. But, let me
go ahead and zoom out. Let me drag this
over here. I'm going to use the default
question, what's your name? But I could
certainly change the text there. And let
me go under looks again. Let me go ahead
and grab the say block and let me go
ahead and say just for consistency like
hello,
okay? And now let me go under
sensing, maybe. How do I want to
say this answer? Well, notice this. The
shapes are important. This too is an
oval even though it's not white but
that's just because it's not editable.
It's going to be handed to me by the ask
function. Let me zoom out and grab a
second say block like this. And notice
it will magnetically clip together. I
don't want to say hello again. So, I
could delete that. But now it's still
the same shape even though it's a little
smaller. Let me go back to sensing. And
notice what can happen here. When you
have values like words inside of a
so-called variable, you can use those
instead of manual input at your
keyboard. And notice it too wants to
magnetically snap into place. It'll grow
to fit that variable because the shape
is the same. And now let's do this. Let
me click the green flag at right. I'm
seeing quote unquote what's your name?
I'm getting a text box this time, like
on a web page for instance. Let me type
in my name and watch closely what comes
out of the cat's mouth as soon as I
click the check mark or hit enter.
Huh. Okay, I got my name right, but let
me do it once more. Let me stop and
start. D-A-V-I-D.
Enter. No, it didn't work. Let me try
one other. Maybe it's my name. Let's try
Kelly. Enter. What's missing? Obviously,
the hello. There's a bug, a mistake
in this program. But what
explains this? Even if you've never
programmed before, intuitively, what
could explain why I'm not seeing hello?
>> Exactly. It's on two different lines.
So, it's doing one after the other. So,
it is happening. It's just that you and I, as
the slowest things in the room, are just
not seeing it in time because it's
happening so darn fast. Because my
computer is so, you know, so new and so
fast, it's happening, but way too
quickly. So, how can we solve this? So
we can solve this in a few different
ways. And this is where, in Scratch at
least, for problem set zero,
you'll have an opportunity to play
around with this. I can scroll around
here and, okay, under control I see
something like wait. So I can just
kind of slow things down. And now notice
too if you hover over the middle of two
blocks if it's the right shape it'll
just snap into the middle too. Or, just
so you know, you can kind of drag things
away to magnetically separate them. But
this might solve this. So let me hit
stop and then start. D-A-V-I-D. Enter.
Hello, David. All right, that was a
little... Let's do maybe two seconds
to see it again. Green flag, D-A-V-I-D.
Enter. Hello,
David. All right, it's working better.
It's sort of more correct because I'm
seeing the hello and the David, but kind
of stupid, right, to see one and then
the other. Wouldn't it be nice to say it
all in one breath, so to speak? Well,
here's where we can maybe compose some
ideas. So, let me get rid of this wait
and the additional block. Let's confine
ourselves to just one say block. But let
me go down to operations where we
haven't been before. And this is
interesting. There's this bigger oval
here that says join two things like
apple and banana. And those are just
random placeholder words that you can
override with anything you want. But
they're both ovals and white, which
means I can edit them. So let me go
ahead and do this. Let me drag this on
top of the say block. And this is just
going to therefore override the hello
I put there. Now I don't want to say
apple or banana, but I do want to say
hello,
and I then want to say my name. Okay, so
now I can go back to sensing, go back to
answer, drag and drop this here. That'll
snap into place. And let me zoom in. Now
what I've done is take a function and on
top of it I've nested another function,
the join function that takes two
arguments or inputs and presumably joins
them together as per its name. So let's
see what this does for us. Let me click
stop and start. I'll type in David
enter. And it's so close. Now, this is
just kind of an aesthetic bug. What have
I done wrong here?
There's no space. So, it looks a little
wrong, but that's an easy fix. I just
need to literally go into the hello
block after the comma, hit the space
bar, so that now when I stop and start
again and type in David, now I see
something that's closer to the grammar
we might typically expect syntactically
here. All right. So, let's model this
after what we just saw earlier. We've
now introduced a so-called return value.
And this return value is something we
can then use in the way we want. It's
not happening immediately like the
speech bubble. It's clearly being passed
to me in some way that I can use to plug
in somewhere else like into that join
block. So if we consider the role
these variables are playing, let's consider
the picture now as follows. If the input
to the first function, the ask block,
is quote-unquote "What's your name?"
that's indeed being fed into the ask
block. And the result this time is not a
speech bubble. It's not some immediate
visual side effect. It is the answer
itself stored in a so-called variable as
represented by this blue oval.
Meanwhile, what I want to do is combine
that answer with some text I came up
with in advance by kind of stacking
these things together. Now, visually in
Scratch, you're stacking them on top,
but it's really that you're passing one
into the other into the other because
much like math, when you have
parentheses and you're supposed to do
what's inside the parentheses and then
work your way out. Same idea here. You
want to join hello and answer together.
And whatever that output is, that then
becomes the input to the say block,
which like in math is outside of the
join block itself. So pictorially, it
might now look like this. There's two
inputs to this story. Hello, comma,
space, and the answer variable. The
puzzle piece in question is join. Its
goal in life had better be to give me
the full phrase that I want. Hello,
David. Let's shift everything over now
because that output is about to become
the input to the say block which itself
will now have the so-called side effect.
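The same nesting can be written out in Python, where the inner call is evaluated first and its return value becomes the outer call's argument, just like the parentheses in math. `join` and `say` are hypothetical stand-ins for the Scratch blocks, and the hard-coded `answer` takes the place of what the ask block would return.

```python
def join(first, second):
    """Stand-in for Scratch's join block: return both strings combined."""
    return first + second


def say(message):
    """Stand-in for Scratch's say block: display the message as a side effect."""
    print(message)


answer = "David"                  # pretend this came back from the ask block

say(join("hello,", answer))       # the earlier bug: prints hello,David
say(join("hello, ", answer))      # the fix: the space lives inside "hello, "
```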
And so this too is what programming, and
in turn computer science, is about:
composing the solutions to
smaller problems into solutions to bigger
problems, using those component pieces.
And that's what each of these puzzle
pieces represents is a smaller problem
that someone else, or maybe even you, has
already solved. Now, we can kind of
spice things up here. If I go back to
Scratch's interface, we don't have to
use just the puzzle piece here. I can do
something like this. Let me go ahead and
drag these apart and get rid of the say
block down here. Just for fun, there are
all these extensions that you can add
over the internet to your own Scratch
environment. And if I go to like text to
speech down here, I can, for instance,
use a speak block instead of a say
block colored here in green. I can now
reconnect the join block in here. And if
we could raise the volume just a little
bit. Let me stop the old version, start
the new version, type in my name, and
hear what Scratch actually sounds like.
>> Hello, David.
>> Okay, not very cat-like, but we can kind
of waste some time on this by like
dragging the set voice to block. And I can
put this anywhere I want above the speak
block. So, I'm just going to put it
here, even though I've already asked a
question. Maybe kitten sounds
appropriate. Let's try again. D-A-V
>> Meow meow.
>> Okay. And then let's see, giant. A little
creepier. Here we go. D-A-V-I-D. And
lastly,
>> Hello, David.
>> All right. A little ransom-like instead.
All right. So, that's just some
additional puzzle pieces, but really
just the same idea, but I like that
we've introduced some sound. So, let's
do this. Let me go ahead and throw away
a lot of those puzzle pieces, leave
ourselves with just the when green flag
clicked, and play around with some other
building blocks that we've seen already
thus far. Let me go ahead, for instance,
under sound, and let's make the cat
actually meow. So, it turns out Scratch
being a cat by default comes with some
sounds by default like meowing. So, if
we go ahead and click the green flag
after programming this program, let's
hear what he sounds like now.
Okay, kind of cute. And if you want
Scratch to meow twice, you can just
play the game again.
And a third time. All right, but that's
going to get a little tedious as cute as
it is. So, I can solve that. Let's just
grab three of the puzzle pieces and just
drag them together and let them connect.
And now click the green flag.
All right. It gets less cute
quickly, but maybe we can slow it down
so that the cat doesn't sound so
hungry. Maybe let me go under, let's
see, under control. Let's grab one of
those. Wait one second and maybe plop a
couple of these in the middle here. That
might help things. And now click the
green flag.
Okay. Still a little hungry, but let's
see if we change it to two. And then I
change it to two down here in both
places. Let's play it again.
Okay, cuter maybe, but now I'm venturing
into badly programmed territory. This is
correct if my goal is to get the cat to
meow three times, pausing in between.
What is bad about this code? Even if
you've never programmed before, though.
Yeah, in the middle.
>> Yeah, I literally had to repeat myself
three times. Essentially copy pasting.
And frankly, I could have been really
lazy and I could right-click or
control-click and I could have chosen
duplicate. But generally, when you copy and
paste code or when you duplicate puzzle
pieces, you're probably doing something wrong.
Why? It's solving the problem correctly,
but it's not well designed, if only
because when I change the number of
seconds, now I had to change it in two
places. So, I had one initially, then I
had to change it to two. And if you just
imagine in your mind's eye having not
like six puzzle pieces but 60 or 600 or
6,000, you're going to screw up
eventually if it's on you to remember to
change something here and here and here
and here. Like you're going to mess up.
It's better to keep things simple and
ideally centralized by factoring out
common functionality. And clearly
playing sound and waiting is something
I'm doing at least twice if not a third
time here as well. So how can we do this
better? Well, remember this thing loops.
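To make the design problem concrete outside Scratch, here is a minimal Python sketch (Python being one of the languages the course turns to later). The function names are hypothetical. Each "meow" or "wait" is recorded as a string rather than played or slept, so the copy-pasted version and the loop version can be compared directly:

```python
def copy_pasted_version():
    """Meow three times with pauses, by duplicating blocks."""
    return [
        "play sound meow until done",
        "wait 2 seconds",   # change the pause here...
        "play sound meow until done",
        "wait 2 seconds",   # ...and remember to change it here too
        "play sound meow until done",
    ]


def loop_version(times=3, seconds=2):
    """Same idea with a loop: each setting lives in exactly one place."""
    actions = []
    for _ in range(times):
        actions.append("play sound meow until done")
        actions.append(f"wait {seconds} seconds")
    return actions


print(loop_version())
```

One small difference: like a repeat block with the wait inside it, the loop also waits once after the last meow, which you would not hear. Changing the pause or the count now means editing one argument instead of hunting down every copy.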
Maybe we can just do something a little
more cyclically. So I tell the computer
to do something once, but I tell it how
many times to do that altogether. So
notice here by coincidence under control
I have a repeat block which doesn't say
loop, but that's certainly the right
semantics. Let me go ahead and drag the
repeat block in and I'll change the 10
to three just for consistency here. I'm
going to go back to sound. I'm going to
go ahead and play sound meow until done
just as before. And just so it's not
meowing too fast under control, I'm
going to grab a wait one second and
keep it inside the loop. And notice that
the loop here is sort of hugging these
puzzle pieces by growing to fill however
many pieces I actually cram in there. So
now if I click play, the effect is going
to be the same, but it's arguably not
only correct, but also well
designed because now if I want to change
the wait, change it in one place. If I
want to change the total number of
times, change it in one place. So I've
modularized the code and made it better
designed in this case. But now this is
silly because even though I want the cat
to meow, it feels like any program in
which I want this cat to meow, I have to
make these same puzzle pieces and
connect them together. Wouldn't it be
nice to invent the notion of meowing
once and then actually have a puzzle
piece called meow? So when I want the
cat to meow, it will just meow. Well, I
can do that, too. Let me scroll down to
my blocks here in pink. I'm going to
click make a block and I'm going to
literally make a new puzzle piece that
MIT didn't think of called meow. And I'm
going to go ahead and click okay. Now I
have in my code area here a define block
which literally means define meow as
follows. So how am I going to do this?
Well, I'm going to propose that meowing
just means to play the sound meow until
done and then wait 1 second. And notice
now I have nothing inside my actual
program which begins when I click the
green flag. But notice at top left
because I made a block called meow, I
now have access to one that I can drag
and drop. So now I can drag meow into this
loop. And per my comment about
abstracting the lower level
implementation details away, I'm going
to sort of unnecessarily dramatically
just move that out of the way. It still
exists. I didn't delete it, but now out
of sight, out of mind. Now, if you agree
with me that meow means for the cat to
make a sound, we've abstracted away what
it means mechanically for the cat to say
that sound. And so, we now have our own
puzzle piece that I can just now use
forever because I invented the meow
block already. Now, I can do one better
than this. It would be nice if I could
just tell the meow block how many times
I want it to meow because then I don't
need to waste time using loops
myself, either. So, let me do this. Let me zoom
out and let me go back to my define
block. Let me right-click or
control-click and just edit it. Or I
could delete it and start over, but I'll
just edit it. And specifically, let me
say, you know what, let's add an input,
otherwise known as an argument, to this
meow block. And we'll call it maybe n
for the number of times I want it to
meow. And just to be super clear, I'm
going to add a label, which has no
functional impact, but it just helps me
remember what this does. So, I'm going
to say meow n times, so that when I see
the puzzle piece, I know what the N
actually represents. If I now click
okay, my puzzle piece looks a little
different at top left. Now it has the
white oval into which I can type or drag
input. Notice down here in the define
block, I now see that same input called
N. So what I can do now is this. Let me
go under control and drag the repeat
block here. And I have to do a little
switcheroo. Let me disconnect this. Plug
it inside of the repeat block. Reconnect
all of this. And I don't want 10. And
heck, I don't even want three down here
anymore. I can drag this input because
it's the right shape. And now declare
that meowing n times means to repeat the
following n times. Play sound meow until
done. Wait one second and keep doing
that n total times. If I now zoom out
and scroll up, notice that my usage of
this puzzle piece has changed such that
I don't actually need the repeat block
anymore. I can disconnect this. And
heck, I can actually right-click or
control-click and delete it, and just use
this under the green flag. Change this
to a three. And now I have the essence
of this meowing program. The
implementation details are out of sight,
out of mind. Once they're correct, I
don't need to worry about them again.
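In a text-based language, a custom block like this is simply a function, and its input n is a parameter. Here is a minimal Python sketch, assuming a hypothetical meow function that records actions instead of playing sounds:

```python
def meow(n, seconds=1):
    """Hypothetical stand-in for the custom "meow n times" block.

    The body plays the role of the "define" script: repeat n times,
    play the sound, then wait. Callers never need to look inside again.
    """
    actions = []
    for _ in range(n):
        actions.append("play sound meow until done")
        actions.append(f"wait {seconds} second")
    return actions


# The "when green flag clicked" script shrinks to a single call
for action in meow(3):
    print(action)
```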
And this is exactly how Scratch itself
works. I have no idea how MIT
implemented the wait block or the
repeat block. Heck, there's a forever
block and there's a few others, but I
don't need to know or care because
they've implemented those building
blocks that I can then use myself.
I don't necessarily know how to build a
whole chatbot, but on top of OpenAI's
API, this web-based service, I can
implement my own chatbot because they've
done the heavy lift of actually
implementing that for me. Well, let's do
just a few more examples here. Let's
bring the cat all the more to life. Let
me throw away the meowing. Let me open
up under when green flag clicked. How
about that forever block that we just
glimpsed? Let me go ahead and now add to
the mix what we called earlier
conditionals which allow us to ask
questions and decide whether or not we
should do something. So under this, let
me go ahead and under forever say if the
following is true. Well, what boolean
expression do I want to ask? Well, let's
implement how about this program and
we'll figure out if it works. Uh under
sensing, I'm going to grab this, uh, very
angled puzzle piece called touching
mouse pointer, that is, the cursor, and
only if that question has a yes answer
do I want to play the sound meow until
done. So let me zoom in here. And in
English,
what is this going to implement? Really,
just describe what this program does,
less arcanely than the code itself.
Yeah.
>> Yeah, if you move the mouse over the cat,
it will make noise. So, it's kind of
like implementing petting a cat, if you
will. So, let me zoom out, click the
green flag, and notice nothing's
happening yet, but notice my puzzle
pieces are highlighted in yellow because
it is in fact still running because it's
doing something forever. And it's
constantly checking if I'm touching the
mouse pointer. And if so,
it's like I just pet the cat. Now, it
stopped until I move the cursor again.
Now, it stopped. If I leave it there,
it's going to keep meowing because it's
going to be stuck in this loop forever.
But it's correct in so far as I'm
petting the cat. Let me do this though.
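The forever loop matters because the condition is re-checked on every pass. Here is a minimal Python sketch of that polling, with two simplifying assumptions that are also noted in the code. First, a finite list of pointer positions, one per frame, stands in for "forever." Second, "touching" is approximated by a rectangle around the cat, whereas Scratch itself checks the sprite's actual shape:

```python
def touching(cat_box, pointer):
    """Assumption: treat the cat as a rectangle (left, bottom, right, top)."""
    left, bottom, right, top = cat_box
    x, y = pointer
    return left <= x <= right and bottom <= y <= top


def pet_the_cat(cat_box, pointer_positions):
    """Return the frames on which the cat meows.

    Assumption: a finite list of pointer positions, one per frame,
    stands in for Scratch's forever loop, which never ends.
    """
    meows = []
    for frame, pointer in enumerate(pointer_positions):
        if touching(cat_box, pointer):   # the "if ... then" block
            meows.append(frame)          # "play sound meow until done"
    return meows


cat = (-40, -40, 40, 40)
path = [(200, 100), (150, 50), (10, 5), (0, 0), (120, 0)]
print(pet_the_cat(cat, path))  # [2, 3]
```

If you check only frame 0, without the loop, this path produces no touch at all. That is exactly the mistake shown next.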
Let me make a mistake this time. Let me
forget about the forever and just do
this. And you might think this is
correct. Let me click the green flag
now. Let me pet the cat. And like
nothing's actually working here. Why
though logically?
Yeah.
>> Yeah. The program's so darn fast. It
already ran through the sequence. And at
the moment in time when I clicked the
green flag, no, I was not touching the
mouse pointer. And so it was too late by
the time I actually moved the cursor
there. But by using the forever block,
which I did correctly the first time,
this ensures that Scratch is constantly
checking the answer to that question. So
if and when I do pet the cat, it will
actually
detect as much. All right, how about a few
final examples before you're on your way
building some of your own first programs
with these building blocks. Let me go
ahead and open up a program that I wrote
in advance in fact about 20 years ago
whereby let me pull this up whereby we
have in this example a program I wrote
called Oscar Time, and this was the
result of our first assignment in this
class whereby when MIT was implementing
Scratch for the very first time we
needed to implement our very own Scratch
program as well. I'm going to go ahead
and full screen it here. The goal is to
drag as much falling trash as you can to
Oscar's trash can before his song ends.
For which one volunteer would be handy
here. Okay. I saw your hand go up
quickly in blue. Yeah. Come on up. All
right. So, you're playing for a stress
ball here, if you will. At some
point, I'm going to talk over what
you're actually playing just so that we
can point out what it is we're trying to
glean from this program. And I'll
stipulate this probably took me like 8 to
12 hours. And as you'll soon see, the
song starts to drive you nuts after a
while because I was trying to
synchronize everything in the game to a
childhood song with which you might be
familiar. Let me go ahead and say hello
if you'd like to introduce yourself.
>> Oh, hello. So, I'm Han and uh I'm a
first year student. I'm pretty excited
for this class.
>> All right, welcome. Well, here is Oscar
Time. If you want to go ahead and take
control of the keyboard, all you'll need
to do is drag and drop trash that falls
from the sky into the trash can.
And it's around this point in the game
where the novelty starts to wear off
because there's like three more minutes
of this game where more and more stuff
starts to fall from the sky. So, Han,
as you continue to play, I'm going to
cut over here. You keep playing. Let's
consider how I implemented this whereby
we'll start at the beginning. The very
first thing I did when implementing
Oscar Time, honestly, was the easy part.
Like I found a lamp post that looked a
little something like this and I made
the so-called costume for the whole
stage. And that was it. The game didn't
do anything. You couldn't play anything.
You'd click the green flag, and nothing
happened. But then I figured out how to
turn the Scratch cat, otherwise known
more generally as a sprite, into a trash
can instead. And so the trash can,
meanwhile, is clearly animated because I
realized that, oh, I can give sprites
like the cat different costumes. So, I
can make the cat not only look like a
trash can, but if I want its lid to go
up, well, that's just another costume.
And if I want to see Oscar popping out,
that's just a third costume. And so, I
made my own simplistic animation. And
you can kind of see it. It's very
jittery step by step by step by creating
the illusion of animation by really just
having a few different images or
costumes on Oscar. Now, I hope you
appreciate how much effort went
into timing each of these pieces of
trash with the specific mention of that
type of piece of trash in the music.
Okay. 20 years later, still clinging.
So, you're doing amazing, by the way.
How do we get the trash to fall in the
first place? Well, at the very beginning
of the game, the trash just started
falling from some random location. What
does it mean for trash to fall from the
sky?
Oh, big climax here.
You got a lot of trash on the ground to
pick up.
There we go. And your final score is
a big round of applause if we could for
Han. Thank you.
Thank you. So just to be clear now,
let's decompose this fairly involved
program that took me a lot of hours to
make into its component parts. So this
is just a sprite. And I figured out
eventually how to change its costume,
change its costume, change its costume
to simulate some kind of animation. And
I also realized that oh, I don't need to
just have one sprite or one cat or trash
can. You can create a second sprite, a
third sprite, and many more. So I just
told the sprite to go to a random
location at Y equals 180 and X equals
something. I think I restricted X to be
in this region, which is why the trash
never falls from over here. I just did a
little bit of math based on that
Cartesian plane that we saw a slide of
earlier. And then I probably had a loop
that told the trash to move a pixel,
move a pixel, move a pixel down, down,
down, down until it eventually hits the
bottom and therefore just stops. So we
can actually see this step by step. And
this is representative of how even for
something like your first problem set
in CS50 and with Scratch specifically,
you might build some of the same. So,
I'm going to go back into, uh, CS50 Studio
for today, which is linked on the
course's website, which has a few
different versions of this and other
programs called Oscar 0 through Oscar
4, where 0 is the simplest. And
truly, I meant it: when I look inside
this program to see my code. Like, this
was it. There was no code because all I
did was put the sprite on the screen and
change it from a cat to a trash can. And
I added a costume for the
stage, so to speak, so that the lamp
post would be fixated there. If I then
go to the next version of code, version
one, so to speak, then I had code that
did this. Now, notice there's a few
things going on here. At bottom left,
you'll see of course the trash can and
then at top right the trash. Here are
the corresponding sprites down here. So,
when Oscar is clicked on here, the trash
can, you see the code I wrote, the
puzzle pieces I dragged for Oscar. And
in a moment, when we click on trash,
you'll see the code I wrote, or the
puzzle pieces I dragged and
dropped, for the trash piece
specifically. So what does Oscar do?
Well, I first switch his costume to
Oscar 1, which I assume is this, the
closed trash can. Then forever Oscar
does the following. If Oscar's touching
the mouse pointer, then change the
costume to Oscar 2. Otherwise, that is
if not touching the mouse pointer,
change the costume to Oscar 1. Well,
what's the implication? Anytime I move
the cursor over the trash can, the lid
just pops up, which was exactly the
animation I wanted to achieve.
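The if/else inside Oscar's forever loop boils down to choosing one of two costumes on every frame. In the minimal Python sketch below, the costume names come from the lecture, but the function and the list of per-frame touch results are hypothetical:

```python
def oscar_costume(touching_mouse_pointer):
    """Pick the trash can's costume for one pass of the forever loop."""
    if touching_mouse_pointer:
        return "Oscar 2"   # lid up
    else:
        return "Oscar 1"   # closed trash can


# Hypothetical sequence: pointer away, over the can twice, then away again
frames = [False, True, True, False]
print([oscar_costume(t) for t in frames])
# ['Oscar 1', 'Oscar 2', 'Oscar 2', 'Oscar 1']
```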
Meanwhile, if we do this and click the
green flag, you can see that in action,
even for this simple version. If I move
the cursor over Oscar, we have the
beginnings of a game, even though
there's no score, there's no music or
anything else, but I've solved one of my
problems. Meanwhile, if I click on the
trash piece here, you'll see no
code has been written for it yet. So, we
move on to Oscar version two and see
inside it. In Oscar version two, when I
click on trash, ah, now there's some
juicy stuff happening here. And in fact,
this trash sprite has two programs or
scripts associated with it. And that's
fine. Each of them starts with when
green flag clicked, which means the
piece of trash will do two things at
once essentially in parallel. The first
thing it will do is set drag mode
to draggable. And that's just a Scratch
thing that lets you actually move the
sprites by clicking on them, making them
draggable. Then it goes to a random X
location between 0 and 240. So yeah,
that must be what I did from the middle
all the way to the right. And I set y
always to 180, which is why the trash
always comes from the sky from the very
top. Then I said forever change your y
by negative one. And here's where it's
useful to know what 180 is, 240 is, and
so forth. Because if I want the trash to
go down, so to speak, that's changing
its Y by a pixel by a pixel by a pixel.
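Those numbers come from Scratch's stage, which runs from -240 to 240 in x and from -180 to 180 in y. Here is a minimal Python sketch of the trash's first script, with hypothetical function names. As a simplification, it stops the trash exactly at y = -180, whereas Scratch's own edge handling also depends on the sprite's size:

```python
import random

TOP = 180       # top edge of the Scratch stage
BOTTOM = -180   # bottom edge (simplified stopping point)


def go_to_top(rng=random):
    """Random x between 0 and 240 (inclusive), y fixed at the top."""
    return rng.randint(0, 240), TOP


def fall(x, y, frames):
    """Change y by -1 once per frame, stopping at the bottom edge."""
    for _ in range(frames):
        if y > BOTTOM:
            y -= 1
    return x, y


x, y = go_to_top()
print(fall(x, y, 10))      # ten pixels below the top
print(fall(x, y, 1000))    # comes to rest at the bottom
```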
And thankfully MIT implemented it such
that if the trash tries to go off the
screen, it will just stop automatically,
even if it's inside of a forever block,
lest you lose control over the sprites
altogether. But in parallel, what's
happening is this. Also, when the green
flag is clicked, uh the trash piece is
doing this too forever. If touching
Oscar, what's it doing in blue here?
Sort of teleporting away. Now, to your
eye, hopefully it looks like it's going
into the trash can. But what does that
mean to go into the trash can? Well, I
just put it back into the sky as though
a new piece of trash is falling. So even
though you saw one piece of trash, two,
three, four, and so forth, it's the same
sprite just acting that out again and
again. So here, if I click play on this
program, you'll see that it starts
falling one pixel at a time. Because
it's draggable, I can sort of pull it
away and move it over to the trash can
like that. And as soon as I do, it seems
to go in, but really it just teleported
to a different X location, still at Y =
180. Again, it's not much of a game yet.
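The trick here is reuse: one sprite plays every piece of trash. Here is a minimal Python sketch of a single frame. It assumes a hypothetical step function that merges the two parallel scripts into one update (Scratch runs them side by side), plus a touching_oscar flag supplied by the caller:

```python
import random


def random_top(rng):
    """Random x between 0 and 240, y = 180: where new trash appears."""
    return rng.randint(0, 240), 180


def step(trash, touching_oscar, rng):
    """Advance the trash sprite by one frame."""
    x, y = trash
    if touching_oscar:
        # "Into the trash can" really means: teleport back to the sky,
        # so the same sprite can play the next piece of trash.
        return random_top(rng)
    return x, y - 1   # otherwise keep falling one pixel


rng = random.Random(50)
trash = random_top(rng)
trash = step(trash, False, rng)   # falls to y = 179
trash = step(trash, True, rng)    # dropped on Oscar: back at y = 180
print(trash)
```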
There's no score. There's no music or
anything, but let's go to Oscar 3 now.
And in Oscar 3, if we scroll over to the
trash, even more is happening here. In
so far as I realized, you know what?
There was kind of an inefficiency before.
Previously, I had these two programs, or
scripts, synonymously, whereby they both went
to the top by going to 0 to 240 for X
and then 180 for Y. And if you noticed,
I used that here and I used that down
here in both programs. Now that too is
kind of stupid because I literally
copied and pasted the same code. So if I
ever want to change that design, I have
to change it in two places and I already
proposed that we frown upon that. So
what did I do in this version? I just
created my own block and I decided to
call my own function go to top. What
does it mean to go to the top? Pick a
random x between those values and fixate
on y = 180 initially. Now in both of
those programs which are otherwise
identical, I just say what I mean. Go to
top. Go to top. And if I really wanted
to, I could drag this out of the way and
never think about it again because now
that functionality exists. So correct,
but arguably better designed. I've now
factored out commonality so as to use
and reuse my code as well. So let's go
up to Oscar version 4 now. And in Oscar
Time version 4, the trash can does a
little something more whereby what have
I added to this mix even though we
haven't dragged this puzzle piece
together before?
Yeah. What's new?
>> Score.
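The explanation that follows describes a score variable that starts at zero and goes up by one each time trash touches Oscar. Here is a minimal Python sketch. It assumes a hypothetical list with one boolean per frame, standing in for the sensing block:

```python
def play(touches):
    """Count trash deposits, one boolean per frame.

    Assumption: a True frame means the trash touched Oscar. Because the
    trash teleports away right after, each True counts exactly once.
    """
    score = 0                 # "set score to 0" when the green flag is clicked
    for touching_oscar in touches:
        if touching_oscar:
            score += 1        # "change score by 1", i.e., increment
    return score


print(play([False, False, True, False, True]))  # 2
```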
>> Yeah. So, it turns out on the left here,
there's a variables category, which
goes beyond the answer variable that we
just automatically get from the ask
block. You can create your own variables
X, Y, Z. But in computing and
programming, it's best to name things,
not silly simple words like X, Y, and Z,
but full-fledged words that say what
they are, like score. So, I'm setting a
score variable to zero. And then any
time the trash is touching Oscar before
it teleports away to the top, I change
the score by one. That is increment the
score by one. And what Scratch does
automatically for me is it puts a little
billboard up here showing me the current
score. So if I now play this game once
more, the score is going to start at
zero. But if I drag this trash over here
and even let it fall in, as soon as it
touches, the score goes to one. And now
if I click and drag again, the score is
going to, as soon as it touches Oscar,
go to two and so forth. And you
saw in the final flourish with Han
playing that once you had the sound and
other pieces of trash, which are just
really other sprites and I just had wait
like a minute, wait two minutes so that
the trumpet would fall at the right
time. I've broken down a fairly involved
program into these basic building
blocks. And when you too write your own
program, that's exactly how you should
approach it. Even if you have these
grand aspirations to do this or that,
start with the simple problems and figure
out what bites you can bite off in
order to make progress. Baby steps if
you will to the final solution. Well,
let's look at one other set of examples
before we have one final volunteer to
come up. And as you'll soon see, it's
tradition in CS50 to end the first class
with cake. So, in a moment, cake will be
served out in the transept. And please
feel free to come up and say hi and ask
questions if you'd like to. Let me go
ahead and open up though a series of
building blocks here via which we can
make so-called Ivy's Hardest Game, which
is one implemented by one of your
predecessors, a former classmate from
CS50. So here we have a whole bunch of
puzzle pieces written by your classmate,
but let me go ahead and zoom in on this
screen. You'll see that this Harvard
crest is my sprite. So it's not a cat,
it's not a trash can, it's a Harvard
crest and it exists in a very simple
two-dimensional world with two walls
next to it. If I click on the green
flag, notice that with my hands here, I
can go up, I can go down, I can go left,
and I can go right. But if I try going
too far right, I get stuck on the wall.
If I go too far left, I get stuck on the
wall. Well, it's sort of the
beginning of any animation or game. But
how do I do this? Well, let me go up
here and propose that the first thing
the Harvard sprite is doing is it's
going to the middle 0 comma 0. And it's
then forever listening for the keyboard
and feeling for walls. Now those are
functions I implemented myself to kind
of describe what I wanted the program to
do. And let's do the shorter one first.
What does it mean to feel for the walls?
Just to ask the question, if you're
touching the left wall, change your x by
one. If you're touching the right wall,
change your x by negative one.
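Both custom blocks are small updates to the crest's coordinates, run once per pass of the forever loop. Here is a minimal Python sketch. It assumes hypothetical walls at x = -100 and x = 100 and models "touching" as reaching that x value; the actual project uses wall sprites. The arrow-key logic follows the listen-for-keyboard script described shortly:

```python
LEFT_WALL, RIGHT_WALL = -100, 100   # hypothetical wall positions


def listen_for_keyboard(x, y, key):
    """Move one pixel in the direction of the pressed arrow key."""
    if key == "up arrow":
        y += 1
    elif key == "down arrow":
        y -= 1
    elif key == "right arrow":
        x += 1
    elif key == "left arrow":
        x -= 1
    return x, y


def feel_for_walls(x, y):
    """Undo a step that ran into a wall, so the crest can't pass it."""
    if x <= LEFT_WALL:       # touching the left wall
        x += 1
    elif x >= RIGHT_WALL:    # touching the right wall
        x -= 1
    return x, y


def play(keys):
    x, y = 0, 0                     # "go to x: 0 y: 0"
    for key in keys:                # one key per pass of the forever loop
        x, y = listen_for_keyboard(x, y, key)
        x, y = feel_for_walls(x, y)
    return x, y


print(play(["left arrow"] * 150))   # (-99, 0): stuck just inside the wall
```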
Why have I defined touching walls in
this weirdly mathematical way? Yeah.
>> Sure. Yeah.
>> Like counteracts the movement.
Otherwise, you're like not moving.
>> Exactly. Because if I've gone so far
right that I'm touching the right wall,
well, I'm already kind of on top of the
wall a little bit. So, I effectively
want the sprite to bounce off of it. And
the easiest way to do that is just to
say back up one pixel as though you
can't go any further. And same for the
left wall. Meanwhile, let me scroll over
to the second script or program that's
running in parallel. It's a little
longer, but it's not more complicated.
What does it mean to listen for the
keyboard? Well, just check. If the key
up arrow is pressed, change Y by one:
the arrow goes up. Else if the key down arrow
is pressed, then change Y by negative 1.
If the key right arrow is pressed, change X by
one, and so forth. So again, this is
where the math and the numbers are
useful because it gives you a world in
which to live: up, down, left, right,
deconstructed into some simple
arithmetic values. All right, so the net
result is that we have a crest living in
this world. Well, let's add a bit of
competition here. And in the second
version of this game, let me go ahead
and full screen it again. Click play.
And now we'll see sort of an enemy
bouncing back and forth autonomously. So
there's no one playing except me. I'm
controlling Harvard. Yale is bouncing on
its own. And nothing bad's going to
happen if it hits me. But it does seem
to be autonomous. So how is this
working? Well, if it's doing this
forever, there's probably a forever loop
involved. So, let's see inside here.
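Here is a minimal Python sketch of the Yale script described next, reduced to one dimension. The walls at x = -100 and 100 are hypothetical. Flipping the sign of a direction variable stands in for "turn 180 degrees":

```python
def yale(frames, steps=1, left_wall=-100, right_wall=100):
    """Return Yale's x position after each pass of the forever loop."""
    x = 0            # "go to x: 0 y: 0"
    direction = 1    # "point in direction 90", i.e., facing right
    positions = []
    for _ in range(frames):
        if x <= left_wall or x >= right_wall:
            direction = -direction   # "turn 180 degrees"
        x += steps * direction       # "move (steps) steps", no matter what
        positions.append(x)
    return positions


print(yale(12, steps=10))
# [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 90, 80]
```

Raising steps is the one-line change that makes Yale faster.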
Let's click not on Harvard, but on the
Yale sprite. And sure enough, if we
focus on this for a moment, we'll see
that the first thing Yale does is go to
0 comma 0. It points in direction 90°,
which just gives you a sense of whether
you're facing left or right or wherever.
And then it forever does the following.
If it's touching the left wall or
touching the right wall, I was a little
clever this time, if I may. I just kind
of turn around 180 degrees, which
effectively bounces me back in the
opposite direction. Otherwise, I go
ahead and no matter what just move one
step. And this is why Yale is always
moving back and forth. So, a quick
question. If I wanted to speed up Yale
and make this beginning of a game
harder, what would I do?
Yeah.
>> Yeah. So, let's have it move like 10
steps at a time, right? This looks like
a much harder game, if you will, like
level 10 now, because it's just moving
so much faster. All right. Well, let's
try a third version of this that adds
another ingredient. Let me full screen
this and click play. And now you'll see
the even smarter MIT homing in on me by
following my actual movements. So, this
is sort of like boss level material now.
And it's just going to follow me. So,
how is this working? Well, it's kind of
a common game paradigm, but what does
this mean? Well, let's see inside here.
Let's click on MIT sprite. It's pretty
darn easy.
It goes to some random position, just to make
it a little interesting lest MIT always
start in the center and then forever
point towards the Harvard logo outline
which is the name the former student
gave to the costume that the sprite is
wearing that looks like a Harvard crest
and then move one step. So, as a corollary
of the previous question, how do we make
the game harder and MIT even faster?
Well, we can change this to be like 10
steps and now you'll see MIT is a little
twitchy because
this is kind of a visual bug. Let me
make it full screen.
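"Point towards" followed by "move n steps" amounts to moving n pixels along the straight line to the target. Here is a minimal Python sketch with hypothetical names. The target is held still so the effect of the step size is easy to see:

```python
import math


def chase(mit, target, steps):
    """Point towards the target, then move `steps` pixels that way."""
    (mx, my), (tx, ty) = mit, target
    dx, dy = tx - mx, ty - my
    distance = math.hypot(dx, dy)
    if distance == 0:
        return mit   # already there (simplification: no direction to point)
    return mx + steps * dx / distance, my + steps * dy / distance


def simulate(start, target, steps, frames):
    positions = [start]
    for _ in range(frames):
        positions.append(chase(positions[-1], target, steps))
    return positions


# Big steps overshoot a stationary target and flip back and forth
print([x for x, _ in simulate((25.0, 0.0), (0.0, 0.0), 10, 5)])
# [25.0, 15.0, 5.0, -5.0, 5.0, -5.0]
```

With steps set to 1, the same start reaches (0.0, 0.0) and stays there. That is why throttling the step size back reduces the twitching.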
Why is this visual glitch happening?
It's literally doing what I told it to
do. It just looks stupid. Yeah.
Say again.
>> Yeah. It's moving so fast that it's sort
of going 10 pixels this way, but then
it kind of overshot me. So then
it's doubling back to follow me again,
and it's doubling back this way. And
because these are such big footsteps, if
you will, it just has this visual effect
of twitching back and forth. So, we
might have to throttle that back a bit
and make it five or two or three instead
of 10 because that's clearly not
desirable gaming behavior here. All
right. Well, let's go ahead and do this.
Let's put them all together just as your
former classmate did when submitting
this actual homework. Uh, the game will
conclude hopefully in an amazing climax
where you've won the game. So, we need
someone, ideally, with really good
hand-eye coordination to play this final game
here. Yeah, your hand went up first, I
think. Okay, come on up. Big round of
applause because this is a lot of
pressure to end on.
All right. So, if you win the game, cake
will be served. If you don't win the
game, there will be no cake.
>> Okay. But introduce yourself in the
meantime.
>> Hi, I'm Jenny Pan, freshman at Hollis
and I'm actually a CS major or
concentration.
>> Nice to meet you. Head to the keyboard
here. This now is the combination of all
of those building blocks and even more
aka Ivy's Hardest Game. You will be in
control, just as I was, of the Harvard
crest. And the goal is to make it to the
exit, which is this gentleman on the
right here. And you'll see there are
multiple levels, and each level
gets a little harder. All right, here we
go.
All right, this is CS50 and this is week
one, our second week together. And
you'll recall that last week, week zero,
we focused on Scratch. Ultimately, this
graphical programming language by which
you can drag and drop puzzle pieces that
interlock together only if it makes
logical sense to do so. And many of you
had actually probably played with that
in like middle school or even prior at
some point. But for our purposes, the
goals of Scratch were to give us sort of
a mental model for some fundamental
constructs that we're going to see again
and again today in C, in a few weeks in
Python, and even thereafter. And those
include things like functions and return
values and arguments and variables
and loops and conditionals and more. And
so even if today feels like a bit of a
fire hose, such as that picture here,
appreciate that a lot of today's ideas
are exactly the same as last week's
ideas, it's just that the syntax is
going to change. It's going to look a
little different. It's going to look a
little scarier. It's going to be harder
to sort of memorize, except with
practice will come that muscle memory,
but the ideas ultimately are going to be
the same. And indeed, this is, if
unfamiliar, uh MIT down the road has a
tradition of hacks whereby students once
a year do something fairly crazy. And at
this point, they happen to connect an
actual working uh drinking fountain to
an actual fire hydrant. And the sign
there, very pixelated, says, "Getting an
education from MIT is like trying to
drink from a fire hose." And that's
indeed how computer science, how
programming, how CS50 will sometimes
feel, but realize that what's going to
be ultimately most important is not
where you feel you are day after day,
but where, 3 months from now, you feel
that you are relative to last week
alone, so-called week zero. So, let's
look back at what week zero looked like.
It looked a little something like this.
The simplest of programs by which we get
get that cat to say hello world. Today,
that same code is going to start to look
a little like this, which was a glimpse
we gave you last week. But this time,
I've deliberately color-coded it to try
to send the message that whereas in
Scratch, we had this yellowish puzzle
piece that sort of kicked things off
that didn't really do anything itself,
but it got the program started, whereas
the real work was done in purple here.
Same is going to be true today whereby
I'm going to wave my hands for a little
bit of time at this yellowish code on
the screen. But what's really going to
have the most effect is this same purple
line here and the white text within. And
we'll break down what all of these lines
mean over the next couple of weeks. But
sometimes we'll wave our hand at details
if we feel it's a little unnecessary at
this point in the story. And in fact,
let me get rid of the color coding for
now. And we'll see that this is the kind
of code in a language called C we're
going to start playing with and using
today and for the next several weeks.
And indeed, it's representative of what
we're going to generally call source
code. So source code is what programmers
write. It's what you write. It's what
you wrote, albeit by dragging and
dropping puzzle pieces. This week
onward, you're going to start using your
keyboard all the more. And you're going
to write source code. So this is code
that we humans can understand with some
training and with some practice. But of
course per last week, what language do
computers ultimately understand? Only...
>> So, binary: zeros and ones. And so you and
I, yes, can write code starting today in
a form that looks a little something
like this, which admittedly might look a
little arcane and cryptic, but it's
certainly better than a whole bunch of
zeros and ones. But we're going to write
in source code. But the machines that we
write code for ultimately only
understand these here, zeros and ones,
which may very well say hello world, but
we're going to call this moving forward
machine code. So machine code is what
the computers understand. Only the
zeros and ones. Source code is what you
and I understand and actually write. So
it stands to reason that we're going to
have to somehow translate one to the
other from source code to machine code.
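That translation step is the job of a compiler. As a loose analogy, and not what the course's C compiler does, Python's built-in compile() also translates source text into lower-level instructions. It produces bytecode for Python's own virtual machine rather than machine code for the CPU. Here is a minimal sketch:

```python
import dis

source = 'print("hello, world")'

# Translate the source text into a code object
code = compile(source, "<hello>", "exec")

# The compiled instructions are raw bytes, i.e., zeros and ones
print(type(code.co_code).__name__)                     # bytes
print(" ".join(f"{b:08b}" for b in code.co_code[:4]))  # first bytes as bits

dis.dis(code)   # a human-readable listing of those instructions
exec(code)      # run it: prints hello, world
```

The exact bytes and listing vary between Python versions. A C compiler, by contrast, produces machine code specific to the CPU it targets.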
And I alluded to this ever so briefly
last week, but we're going to use this
same mental model whereby the source
code we write might be the input to some
problem. The output we want therefrom
is going to be the machine code. So what
we're going to equip you with today
inside of this proverbial black box is a
special piece of software that takes
source code as input, produces machine
code as output, and that type of program
is called a compiler. And there are
lots of different compilers in
the world. We're going to have you use
one of the most popular ones, but it's
simply a piece of software that someone
else wrote that converts one language to
another. Source code, for instance, in a
language called C, to machine code, the
zeros and ones that our Macs, PCs,
phones, and other devices actually
understand. So, where are we going to do
this and how are we going to do this?
So, I promised last week that we'd
introduce you to this here tool, which I
used briefly at the very start of class
to whip up that chatbot. We're going to
use it though not for Python this week,
but indeed for a different language, C.
And indeed, this tool, Visual Studio
Code, or VS Code for short, is super
popular in industry. This is what real
programmers, so to speak, are using all
of the time nowadays. There are absolutely
alternatives. If some of you have
programmed before, you might have used
or experienced different tools, but this
is a very common tool that you'll see
even after CS50. And in fact, it's
something that ultimately you can
install for free on your own Macs and
PCs so that by the end of the course,
you're completely independent of CS50
and any CS50 related tools. But what we
have done for the very start of the
class is essentially provided you with a
cloud-based version of this tool. So all
you need is a web browser on any Mac or
PC or the like so that everything's
pre-installed for you, preconfigured for
you, and you don't have to deal with the
stupid technical support headaches at
the start of the term because it should
just work. But by the end of the term,
once you're a little more comfortable
with technology and with code in
particular, you can absolutely offboard
yourself from this tool: download it,
install it on your own Mac or PC, and
have pretty much the exact same
environment completely under your
control. So, starting today, you're
going to see an interface that looks
quite like this quite often. And we used
this same interface last week ever so
briefly. Moving forward, here's where
we're going to write code. At top right
is where one or more code tabs are going
to appear, similar to any tabbed
environment that you might use. Here,
for instance, is just a screenshot of
the first file we'll create today called
hello.c. The reason it's called hello.c
is because it's in a language called C,
as we soon shall see. No pun intended.
Meanwhile, the code here happens to be
color-coded, not quite in the same way as
you saw before, because I manually made it
look more like Scratch blocks. But among
the features that VS Code and other
programming environments provide is
something called syntax highlighting
whereby you don't worry about or even
think about these colors. But as you
write out code in a recognized language,
tools like VS Code will just color-code
different parts of your code for you
just to make different features jump
out. And we'll see what those features
are over the course of today. But you'll
also spend a good amount of time, as I
briefly did last week, down here in the
bottom right of your screen, the
so-called terminal window, which is
going to be where you run commands for
compiling code and writing code. And in
fact, as we'll see today, you're going
to start using your mouse and clicking a
little bit less. You're going to start
using your keyboard and typing a bit
more. And ultimately, even though if at
first that might feel like a step
backwards to sort of not use something
that's so user friendly, the reality is
most every programmer tends to find
themselves ultimately much more
productive, much more powerful using the
keyboard more often, more quickly than
say a traditional mouse or trackpad
would allow. Meanwhile, we'll see some
familiar features here at left,
like this is where you'll see the files
and folders that we'll create over time.
At far left here is going to be an
activity bar, which is essentially a
modern form of a menu via which you can
open and close things and access other
features. For my purposes, I'll
generally hide the file explorer here, and I'll
generally hide the activity bar here, so that
when we're together, we're focusing
almost entirely on code and commands,
but I'm just typing some quick keyboard
shortcuts to simplify my own user
interface in that way. So, with all that
said, just some terminology. So this
whole collective environment that I'm
describing here is generally what's
known as a graphical user interface.
Why? Well, it's an interface for users
that's graphical in nature with icons
and buttons and the like. Shorthand
notation for this is GUI, pronounced
"gooey." But within this graphical user
interface, as promised, is going to be
that terminal window at bottom right
where I promised we would be typing most
of our commands. And just to give you a
bit more jargon in computing, that's
generally known as a command line
interface or CLI for short, whereby
you're typing commands into that
interface instead. And the world of
computing software is essentially
divided into GUIs and CLIs, and
sometimes a piece of software might have
one of each as well. But without further
ado, why don't we go ahead and focus
entirely first on this here program,
which I dare say is the simplest program
you can write in a language like C and
see how we can actually compile and run
it together. So, I'm going to go over to
VS Code here where I've hidden my file
explorer with all the icons and I've
hidden my activity bar so that I only
have room for tabs of code and the
command prompt at the bottom. I'm
calling this a command prompt because
it's at this dollar sign where I'm going
to run some of my commands. And it's a
dollar sign by convention. It has
nothing to do with currency. It's just a
computing convention. Some systems will
use a caret symbol. Some systems will
use a greater-than symbol or
something else. But it just means type
your commands here. The first such
command I'm going to type is this, code
hello.c, with a single space in between.
I've not used any spaces in the name of
the file. I've not capitalized any
aspect of the file just because this is
convention. Unlike your Mac or PC where
you might be in the habit of naming
files with spaces and capitalization,
generally you'll make your life simpler
by just using lowercase and no spaces at
all. As soon as I hit enter, what you'll
see is that a brand new tab appears
called hello.c with a cursor blinking on
line one. And this is essentially VS
Code waiting for me now to type the
first line of my code. Notice, though,
that the command is complete, whereby I
have another cursor here: if I click in
the terminal window and give it the
foreground, my cursor might blink there
instead. That just means I can type another
command when I am ready. So let's go
ahead and whip up this code. I've
done this many times, so I can type it
fairly quickly. In this tab I'm going
to do #include <stdio.h>, so to
speak, int main(void), then, inside of
so-called curly braces, indenting therein
by four spaces, I'm going to say printf,
quote unquote, hello, world, backslash n,
close quote, semicolon. And voila, I've
written my first program in C. In a class
like this, there's no need to write down each and
every line of code that I write. In fact,
on the course's website will be copies
of everything that we've done, as well as
excerpts therefrom in the course's notes.
You're welcome, but not expected, to
follow along in real time with what I am
typing here. So that's it. Like, I've
written my very first program in C. If I
had done this on an actual Mac or PC
without a command line interface, I
might have a new icon on my desktop, so
to speak, called hello. And ideally, I
could double click on that or tap on it
and run the program. But because I'm in
this specific programming environment
that has a mix of a GUI and a CLI, I
actually need to click down in my
terminal window. And I need to now
compile this program first because at
this point in time, it exists only as
source code. So to do this, I'm going to
compile my code by very aptly saying
make space hello. And I'm pronouncing
the space, but literally I hit the space
bar. Make space hello as it sort of
implies semantically will make a program
called hello. Notice I have not said
hello.c again, because the compiler,
let's call it make for now, even though
that's a bit of a white lie, is going to
infer that if I want to make a program
called hello, it's going to
automatically look for a file called
hello.c in this case. So, a bit of
magic. Enter. And remarkably, anytime
you don't see any output at a command
like this, that's probably a good thing.
Generally speaking, when you see output
when compiling your code, you have done
something wrong. Or in this case, I
might have done something wrong. But no
output is good because what I can now do
and this is a bit cryptic. I can run
this program not by double clicking or
tapping anywhere but by doing dot slash
hello, ./hello, with no spaces. This is a bit
weird, but what the dot slash means is
that, having just made a program called
hello, that program is going to end up in
my current folder. It's somewhere in the
cloud. Yes, more on that in a bit. But
the program called hello is just
somewhere in my current folder. When I
say dot slash, that's like saying go
into the current folder and run the
program therein called hello
specifically. Now, as I often do, I'll
cross my fingers, hope that I didn't
mess this up in any way, and I should
see in a second hello world indeed
printed onto the screen. And so, just to
recap those commands, then: one, I ran
code hello.c, which is a
VS Code-specific thing. code, short for VS Code,
just creates a new file called hello.c.
And then I'm on my way with my own
keyboard. Two, make hello compiles that
source code into machine code, thereby
creating a new file called hello. And
three, to run that program, hello, I type this
strange command, ./hello. But this is
a paradigm, no matter what you call your
programs, that we're going to see again and
again and again. So even if you've not
done something quite like this, it will
very quickly get familiar.
Yes. Questions.
>> When you say make hello, how does the
computer know what part of the code
is ascribed to hello?
>> Good question. When I say make hello,
how does the computer know what part of
the code is ascribed to this program
hello? It literally is going to take the
entire contents of hello.c and turn them
somehow into a program.
>> And does it have to be like named hello?
>> Does it have to be named hello? No. I
could have called it goodbye or
something like my_first_program.c, anything at all,
so long as I change these words here
accordingly.
>> But it needs to be, like, the same name, like it needs to
>> Yes.
>> have, like, green.c and make green or
whatever.
>> Exactly. If you change the name there
you need to change your commands
accordingly. Other questions on these
here steps?
No. All right. So let's tease apart what
it is we just did and like why this code
works in the way that it does. Well, to
recap, in Scratch, we had a program like
this. When the green flag was clicked,
we wanted to say hello world onto the
screen. The code that corresponds to
that is roughly here. And indeed, notice
that the yellowish or orange code lines
up with the when green flag clicked. The
purple code here lines up with the say
block. And the white code inside of here
roughly corresponds to what was in the
white oval that we kept using again and
again last week. So, let's do more of a
one-to-one correspondence. And these
slides are deliberately designed to give
you again that sort of mental model of
taking the same ideas from last week and
just changing the syntax this week
onward. So when we have a function like
this thing here (and recall that a
function is just an action or verb that
sort of accomplishes a small piece of
work), in code, in C specifically, you're
going to type, of course, not a purple
puzzle piece, but the
word print. Well, more technically, printf,
where the f, as we'll soon see, means
formatted, because formatting the printed output
is more powerful than just printing some
raw text alone. Then you can have
parentheses, open and close, left and
right. And notice that it's no accident
that MIT chose an oval for their
input to functions, because it roughly
looks like a pair of parentheses,
one on the left and one on the right.
Meanwhile, what goes inside of the
parentheses in the corresponding C code?
Well, at the end of the day, minimally
hello, world because that's literally
what we want to print to the screen. But
in C, unlike in Scratch, there's a bit
of overhead, a bit of additional syntax
that you've just got to deal with to make
clear to the computer what you want to
print. In particular, you're going to
have to surround everything you want to
print with double quotes to make clear
that hello is not some special function
or variable or something else.
Hello, world is the English phrase that
you want to print. So double quote here,
double quote there means here's the
beginning and the end of what I want to
print. You're also, curiously, going to
put a backslash n,
in most cases, at the end of the word or
words you want to print. We'll take that
away in a moment and see what it does.
And then lastly, and perhaps most
annoyingly in programming circles, you
have to finish your thought with a
semicolon. Much like in English, you
would finish most sentences with a
period instead. And the thing about
programming, with C in particular, is that
if you mess up almost any of
these details I just rattled off,
something's going to go wrong. And so
you're in good company if the very first
program you try to write or try to
compile doesn't work
correctly; it takes
time to develop the muscle memory for spotting all
of these seemingly minor, and actually
minor, details that do matter to the
computer. All right. So if you're
familiar, of course, with the notation
of functions in mathematics, a
function in code is really the same idea
as a function in math, whereby the
function f takes some input, for instance
x, and generally produces some output. So
if you're coming more from that
background, realize that what we're
really doing here is roughly the same.
But in code, recall that we can have
different types of output. So if this is
our grand mental model, and say we've got
a function inside of this black box
that takes arguments, that is to say,
inputs, it can sometimes have side
effects. And recall that side effects
are often visual things that happen as a
result. They display on the screen.
Maybe it comes out of the speaker. It's
something generally ephemeral that just
happens. But it's not necessarily useful
in the same way as another type of
function that we'll return to in just a
bit. But last week, recall that we got
the cat with a speech bubble to
manifest on the screen and say hello
world in that speech bubble when the
input was hello world and the
corresponding function was instead say.
So let's see if we can't now tease apart
what the code we wrote is actually doing
for us bit by bit. So let me go back to
VS Code here and let me propose to break
this in a little way. Let me delete the
backslash n if only because at first
glance who knows or cares what that's
doing. Let's just get rid of it if we
don't understand it. I could now go back
down to my terminal window and I could
do ./hello, Enter, again. But there's
seemingly no change, which is good.
Doesn't seem like I broke it, but I've
kind of misled you here. Why?
Why did nothing seem to change?
I didn't recompile it. So, recall that
the compiler converts source code to
machine code, but I already did that a
couple of minutes ago. If I've changed
the source code, it stands to reason
that I need to recompile the code to
actually see the effects of that. So,
let me do that again: make hello, Enter.
Nothing seems to have gone wrong, but
let me now do ./hello, Enter. And it's
subtle now. And in fact, let me go ahead
and zoom in. It's really just an
aesthetic bug in so far as functionally
the program is still technically
printing hello world. But what's
seemingly wrong? Or put another way,
what did the backslash n apparently do?
Yeah.
>> Yeah. So, it's somehow giving me a new
line. And that's essentially what
backslash n denotes: give me a new
line there. And why was I doing that?
Well, really just for the aesthetics.
Like if this dollar sign represents my
prompt where I type commands. If
anything, it just looks kind of stupid
that I finished a program over here and
then the prompt is on the same line. It
just looks wrong. Even though you could
sort of argue that was my intent, even
though in this case it wasn't. So, what
would the alternative be? Well, what
you're seeing here is what's actually
generally known as escape sequences,
which are sort of special sequences
of symbols, like backslash and n in this
case, that do a little something unusual.
And here's just a non-exhaustive list of
some you'll encounter in the real world
and including in CS50. Backslash n moves
you to a new line. Backslash r is a
so-called carriage return. If you've
ever seen or used an old school
typewriter, this refers to the process
of bringing the typing head back to the
left end. So it sort of moves the cursor
horizontally as opposed to vertically.
This one's interesting. Backslash
double quote.
Why does there exist this pattern,
backslash double quote? Yeah.
>> If you just write double quote, it
closes the
>> Exactly. So recall that phrase we tried
to print out, like hello, world.
If for some reason you didn't want to
say hello, world, but you wanted to say,
sort of snarkily, "hello,
world" in quotes or something like that, well, you
can't put a quote, a quote, a quote, and a
quote and expect the computer to know
which quote corresponds to what. It's
just arguably ambiguous. So if, inside of
double quotes, you actually want to
print actual double quotes, this is an
escape sequence that tells the computer,
this is not some quote delineating
where my thought begins and ends. This
is literally a double quote. And we'll
see other situations in which a single
quote or apostrophe is the same. We'll
see crazy situations in which you want
to print a backslash, but backslash
already has some special meaning. So
there are solutions to all of these
problems. But let's not get too far into
the weeds here. But let me go back to
the code and propose what the
alternative otherwise might have been.
If I didn't know about backslash n, my
instinct to move the cursor to the next
line might have been literally to just
like hit enter or do something like
this, like move the double quote, move
the parenthesis, move the semicolon on
to the next line. But this should start
to rub you the wrong way. For one,
this violates a principle of most
programming languages, in that most
programming languages are line-based. You
sort of start and finish your thought
ideally on the same line, and this runs
afoul of that. And two, even if you're
seeing code for the first time, I'd wager
that this just looks stupid as well. To
sort of move part of your thought to the
next line just looks a little
sloppy. And it is. So C and many other
languages, Python among them, solve this
by giving you these so-called escape
sequences. So if you want a new line
there, you do backslash n and you will
get your new line there. Now, what I said is a
bit of an overstatement, in
that sometimes lines of code will be so
long that they do wrap onto multiple
lines, but generally that's something
that we're going to try to avoid. All
right, what else could go wrong? Well,
let's do this. Let me go ahead and clear
my terminal window, which I can do by
hitting Ctrl+L, or I can literally type
clear. And I'm going to frequently do
this just to keep the screen clear, even
though it has no functional impact. It's
just an aesthetic. Let me do something
else accidentally. Suppose I forgot to
finish my thought and I omitted the
semicolon, but otherwise the code is
perfect. Let me do make hello. Now
enter. Now we're going to see some
output that's a little more arcane. Let
me go ahead and scroll back up here to
make clear that what's just happened is
I ran make hello, but I didn't get back
to another prompt. I don't see
immediately a dollar sign because
there's an error message here that is
almost as long as the code I tried to
write. Not to worry. Let's see. Here is
the name of the file in which the
problem exists. Stands to reason that
it's in hello.c. Here is the line
number in which the problem seems to
exist: line five. And that's helpful
because it lines up with this. And then,
if you care to count, this is the
29th character. So if I count from left
to right around character 29, something
is wrong. Something is missing. So it's
a pretty decent error message. In fact,
it even says expected semicolon after
expression. There's a little green
caret symbol pointing me at the
mistake. So this is
another value of the compiler. Not only
does it know how to convert source
code to machine code, it's also pretty
good at finding mistakes in your code
and trying to draw your attention to
them. So how do I fix this? Well,
assuming you've understood the error
message at this point, you just go
back in, add the semicolon. Let me go
back down to my terminal window. I'm
going to clear it just to clean up the
mess. Let me rerun make hello. And now
we are back in business. And indeed, if
I do ./hello, I've got hello, world back
on the screen. Well, let's make one
other mistake. Suppose that I forgot, as
you sometimes will, to include this line
at the top, which will make more sense
next week, but for now, let's just omit
it and dive right into the code. You
would think this is enough, just
printing out hello world. Well, here,
let me go back down to my terminal
window. Let me do make hello again now.
And I'm going to get a whole different
error message instead. So now the problem is
still with hello.c. That makes sense.
Line three. Okay. So somewhere in there
printf is suddenly the problem, even
though the semicolon is back and the
backslash n is back. So let's keep
reading. Error: call to undeclared
library function printf with type int.
And then this is a whole mouthful. So,
here is an example of an error message
that unless you're sort of conditioned
to know what this means and you've seen
it before, it's quite a bit more cryptic and
unclear like what the solution to the
problem is, especially when the rest of
your code is truly correct. I've just
forgotten something stupid. But how can
I sort of think about this problem?
Well, it turns out that another feature
of C is that it comes with a bunch of
header files: a bunch of files whose
names don't end in .c, but in .h. And
these so-called header files, which end
in .h, contain code that other people
wrote that you can use in your own
programs. So for instance in this
particular case a header file is giving
us access to what's more generally in
computing called a library. A library is
code someone else wrote that you can
use. And I actually used a library last
week when I did that import line and
mentioned OpenAI, the company. I was
actually using a library from that
company that I had automatically
downloaded and installed into my
programming environment in advance of
class because I don't know how to
implement a chatbot without standing on
their shoulders and using a lot of the
code they themselves wrote. Same idea
here. Even though printf is a feature
of C, if you want to use it, you have to
include that library by telling your
program to include the header file that
defines that function. And you only know
this by being taught it or looking it up
in a book or a reference. But in this
case, I wanted to use a header file
called standard I/O dot h, stdio.h.
It is not studio.h.
This is a very common bug online. If
you find yourself typing studio.h, that's a typo;
it's stdio.h.
And in that file then is defined the
printf function. So, if I go back to my
code here, the solution to this problem
truly is to just undo the deletion I
made a moment ago. Because what line one
is now doing for me is it's telling the
compiler, oh, by the way, I didn't write
all the code that I'm about to use.
Please include the definition of printf
from this other file called
stdio.h. And again, you'd only know this by
looking it up in a reference, attending
a lecture or something like that. It's
not obvious otherwise, but these are the
kinds of things you very quickly look
up. So, where do you look them up? Well,
it turns out the ecosystem of C has, you
know, hundreds of books you can buy or
download, many, many, many websites.
Among them is one of CS50's own. And in
fact, the conventional way to look stuff
up for the programming language called C
is to look at the official manual pages
or man pages, for short, for the C
language. Unfortunately, many of them
were written decades ago and they were
certainly written by fairly advanced
programmers and not for a broad
audience. And so what we have done is
imported all of that freely available
documentation, hosted it at our own
URL here, manual.cs50.io,
and we've essentially simplified it for
those less comfortable, those of you who
might be less familiar or less
comfortable with technology, and really
for most people who aren't used to
reading manual pages. It's just useful
to have it written in teaching assistant
like language instead. So for instance
if you go to a URL like this you'll see
CS50's documentation for this official
library, stdio.h, that comes with
C itself. If you go to a URL like this,
you can look up the documentation for
printf itself specifically. So for
instance, let me go ahead and just give
you a teaser for this. If I were to do
the same on my own computer, I might see
the CS50 manual pages here and you'll
see header file by header file a bunch
of frequently used functions in CS50.
We've also filtered the list down from a
massive list to much shorter list so
that you can sort of see what's most
likely useful to you. If you go to a
specific page like standard io.h, you'll
see for instance here just over a
half-dozen functions that we won't touch
on today beyond printf, but that
we'll see in the class over time and that
do useful stuff. For instance, printf
prints to the screen. And we'll see
other functions for opening files,
closing files, and the like because all
of that's related to standard I/O, input
and output. If I go to a specific man
page for a function in this header file, you'll
see the standard formatting for these
pages. So, here's the name of the
function, printf, and it prints to the
screen. You'll see a synopsis, and this
indeed indicates we're in less
comfortable mode. If you want to see the
original, more arcane documentation,
just uncheck that, and you'll see the
original official documentation, but
you'll see a mention of like what header
file this function is defined in so that
you know what file to use in your own
code. You'll see a so-called prototype,
which is just the first line of code
from that function. More on that in just
a little bit. You'll see an English
description. You'll see example code.
Long story short, this is the
authoritative answer. And even though
you have access in this class to the
virtual rubber duck at CS50.AI and in
other forms of it that you'll soon see,
you should also have the tendency and
the instinct, moving forward, to check
the official documentation. And all of
today's AIs are trained on things like
the official documentation. So that's
the source material that any of these
AIs, the duck among them,
are actually relying on. But what we're
also going to see is that besides these
official functions, there's some that
CS50 itself has invented. We use these
really as training wheels for just the
first few weeks of the course and then
we take these training wheels off. But
the reality is in a language like C,
certain stuff is just really hard or
annoying to do. Certainly if you're
learning how to program for the very
first time or at least you are new to C.
We'll eventually show you how to do it
that way. But even if you just want to
get input from the user like a string of
text or a number of some sort, it's
generally not that easy to do in C, at
least in these early days. So for
instance, at this URL here, you can see
documentation for CS50's own library and
CS50's own header file, CS50.h. And
you'll see such functions in the
documentation as these: get_string,
get_int, get_char, and a bunch of others as
well. And we'll touch on those this
week. But it will ultimately be a way of
just getting useful work done quickly by
standing on our shoulders and actually
using functions we wrote to then
solve problems of interest to you. So
let's focus for instance on one of these
first: get_string. A string, in
programming speak, means text: zero or
more characters of text, like h-e-l-l-o,
comma, space, w-o-r-l-d. That is a string
of text in computer speak. And it's
obviously not a number like 50. It's
actual text that you would type on the
keyboard. We'll see then what other
things we want to get. But with this
function, we can start to replicate
another program that we implemented
pretty quickly last week in Scratch. So
recall that in Scratch, this one was a
little more interactive. I used another
blue puzzle piece, ask, to actually get
input from the user. And recall that,
unlike the printf function today
and the say block last week, this time
we still have the same input-output
model, but if we pass in arguments to a
function that we're about to see, you
can get back not just a side effect
sometimes, but a return value, a
useful, reusable value like the person's
name as we'll soon see. All right, so
let's actually do this. If in Scratch
the equivalent was asking the user,
"What's your name?", asking them that and
then waiting for an answer that we can
store in a variable, let me propose that
in C, side by side, it's going to look a
little something like this. At
left we have the Scratch block: the ask
function here, the argument there, too,
and then the "and wait" just means it's
going to wait till the user finishes
typing. If I want to translate this to C
now today moving forward well it looks a
little something like this. The closest
analog in C, thanks to CS50's library, is
going to be a function called
get_string. So there's no C function called
ask. And we deliberately named this
function get_string just to make super
clear what it is you are getting: a
string of text, in this case. And we've
got the parentheses ready to go,
indicative of this white oval for user
input. If I want to prompt the user with
that same phrase, what's your name?
Well, I can just put it inside of those
parentheses. But what next do I need to
add around my user input?
>> The quotation marks.
>> Yeah, I need the quotation marks just to
make clear that these aren't special
individual words. This is a whole phrase
that I want to be displayed to the user.
So, I'm going to indeed put double
quotes around everything. And this is
just an aesthetic. I don't in this case
want to bother moving the cursor to the
next line. Like, I want the user to see
the question and I want the cursor to
just stay there blinking waiting for
their answer. But I don't want the
cursor to be right next to the question
mark. So, I'm deliberately just leaving
a single white space there just to kind
of scooch it over a bit so it looks a
little prettier, at least to my eye.
Now, we're not done yet because we need
to do something with this value. The
get_string function, as we'll soon see, is
going to prompt the user, me in this case, to type
something in, like my name. But where do
I want to put that? Well, MIT has the
answer: put it in a variable called answer.
And you can't rename that in Scratch.
It's just defined as answer. But in C,
what I'm going to need to do is
something like this. If you want to keep
return values around from a function,
you literally use an equal sign and then
to the left of it, you put the name of
the variable into which you want to put
that return value. So in mathematics, we
would use X, Y, and Z as our variables.
Again, in code, as in Scratch, you can
name your variables anything you want.
By convention, they should usually be
lowercase. They should not have spaces
therein, similar to file names. But this
is a pretty good analog now of what's
going on collectively here. But C is a
little more precise. You can't just
give the variable a name. You need to
tell C or really the compiler what type
of value you want to put in this
variable. So if it's a string of text,
you put string. If it's a number, you're
going to put something else. But for
now, it's a string. Per the function's
name, it's going to give me a string.
Now, we're so close to finishing this
comparison. There's one detail missing.
What's still missing from the code here?
Yeah.
>> Yeah. So, we have to finish the thought
lastly with a semicolon. So, if you're
getting to sort of the point already,
like this is one of the reasons why we
start with Scratch: you sort of get
the intuition pretty quickly. And even
though nothing on the right hand side is
particularly hard, there's just all
these stupid little details that you
have to ingrain in yourself over time.
In this case for C, but for many
programming languages, we're going to
see a similar paradigm. But among the
goals of the course too are to show you
how ultimately languages have been
evolving. And so one of the things we'll
see in Python in a few weeks' time is that
some of this syntax actually goes away
because over time humans have gotten
annoyed at older languages like this.
Like why the heck do I have to keep
putting a semicolon when it's clear that
I'm at the end of the line. So we'll see
among languages like Python we can get
rid of some of these same features. But
for now it's just a matter of
remembering what goes where. All right.
So, let's go ahead now and take that
same idea of converting Scratch to C and
actually do something with this code.
Let me go back to VS Code here. I'm
going to keep my file name the same, but
what you'll see on CS50's website is
that we'll add version numbers to each
of the examples that I'm typing out. So,
you can actually see the progression of
these programs, even though we're not
changing the name. And what I'm going to
go ahead and do here, for instance, in
hello.c this time, is the following. I'm
going to go ahead and uh first get rid
of the single hello world. I'm going to
go up here and include this time cs50.h.
So, not one but two header files. And
then inside of my curly braces, inside
the so-called main function, as we'll
soon call it, I'm going to go ahead and
do this. Exactly the same line of code
as on the screen before, I'm going to
get a string prompting the user for
what's your name question mark space
close quote, semicolon. And as an aside,
this will, as we'll soon see, print on the
screen "What's your name?" So that implies
that the get_string function is actually
using printf itself to print out that
message. I do not need to use printf to
display that message on the screen
because I read the documentation for
CS50's get_string function and I just
know that it is using printf for me to
achieve that particular goal. Now let me
do something intuitive but not quite
correct. If I want to print out that
answer so that the expression is going
to be not hello world but hello David or
hello Kelly. Let me go ahead and say
"hello, answer", then \n to move the cursor
down as before, then a semicolon. So this is
not quite right. And even if you've
never programmed before, you can perhaps
see where this is erroneously going. Let
me remake the program because I've
changed the source code and I need new
machine code. Nothing seems to be wrong,
syntactically at least. But if I now do
./hello and hit Enter, you'll see I'm
being prompted: "What's your name?" So I'm
going to go ahead and type in David and
then hit enter. But when I do, if you
know where this is going, what am I
going to see instead?
>> Hello answer. And the computer's just
doing literally what I told it to do. I
said quote unquote print out hello
answer. But obviously that's not the
goal that I have in mind. So how do I
actually work around that? Well, what I
really need to do is achieve the
equivalent of this thing here, which we
did by stacking blocks in Scratch or
nesting them, if you will, one inside of
the other. So, I want to join the
expression hello, space, and that
answer. And it turns out in C, you can't
do it quite like this. Like, there isn't
an analog of the join function, at least
that we'll see today. So, we have to do
this a little bit differently. We can do
it though by maybe telling the computer,
we'll go ahead and print out hello,
comma, space, and then maybe we can give
it like a placeholder to plug in the
name once we know the name. Because when
I'm writing my code, I have no idea
who's going to play this game, me or
Kelly or someone else. So, what if we
use special syntax to indicate where I
want the person's name actually to go?
Let me propose that we now do this.
Instead of printing out, quote unquote,
"hello, answer", let's go ahead and start
printing out something. I've got my
parentheses ready to go, and I did my
semicolon in advance this time. I want to
somehow now say hello, placeholder. And
you would only know this by someone
having told you or a reference online:
percent s is the placeholder for a
string that you don't know when you're
writing the code, but when someone else
is running the code, it will be filled in,
substituted with other input. So,
hello, percent s is the closest we can
get to this. I still need, though, some
other syntax. I do still need those
quotes on the left and the right. Just to
be, uh, aesthetically pleasing, I'm going
to put a \n there at the end to
move the cursor, but now I've left room
in my parentheses for one more thing.
And you can perhaps guess where I'm
going with this. Again, even if you've
never programmed before, this is telling
printf: print out h-e-l-l-o, comma, space,
something. What should I probably pass
in to these parentheses as a second
input so that printf knows what that
something is?
Yeah,
>> the variable.
>> The variable name. So the variable in
which I have the user's name and indeed
the convention is to put a comma after
the quotes and then the name of the
variable that has the value you want to
be substituted for that placeholder. Now
notice there's a collision of syntax and
grammar here. The comma inside of the
quotes is just an English thing. Hello,
comma, so and so. The comma outside of
the quotes is meaningful to C because it
delineates which is the first input, or
argument, on the left and which now is the
second. And we haven't seen this before
in C. Up until now, we've only been
passing one input, but you can pass in
two or three or four. Completely depends
on what the function is designed to
expect. So, let me put this all together
now. Let me go back to VS Code.
Previously, we were literally printing
out answer, but I can change answer to
percent s. I can move my cursor outside
of those quotes, comma, answer, because
that's the name I gave to that variable.
I can go back down to my terminal window
and clear it just to reduce clutter. Let
me do make hello one more time. Seems to
work. ./hello. Enter. D-A-V-I-D. And now
hello,
David is printed.
Okay, questions on any and all of that.
>> I was wondering with the header file,
where is it pulling from?
>> Good question. Where is it pulling these
header files from? So, what you are
seeing here is a graphical user
interface that's somewhere hosted in the
cloud at cs50.dev, the URL I mentioned
last week, and we're going to tease this
apart in just a moment. That software is
running on a computer, and that
computer's got a hard drive or a solid
state drive, like folders of storage.
Those files, cs50.h and stdio.h
and many more, are pre-installed on
the server to which I have connected and
they're stored in a standard place so
that the compiler in particular knows
where to look for them and those are all
things we did in advance for you. Yeah.
>> Why does \n not create, like,
a new line?
>> Why does the \n not create a
new line? So it does. The \n is
essentially being printed here, which has
the effect of pushing the dollar sign to
the next line. Otherwise, the dollar
sign would stay on that second to last
line. Other questions?
>> Why is there no backslash n on this?
>> Good. Uh, why is there no backslash n
over here?
>> Good question. My choice as the
programmer. I just wanted to see the
sentence, what's your name? And I wanted
the user, me, to type my name immediately
after it like this. But I didn't have to
do it that way. I just wanted to show
you the difference.
>> Gotcha. And then also like just
generally when we're like doing the work
should we always write the like first
four lines?
>> Should you always write the first four?
Oh, these. Yes. For today, trust me: do
this, do this, do this, do this. And
next week we'll understand even more
what those lines do. However, slight
caveat: only use cs50.h if you're using
one of our functions. Clearly you don't
need cs50.h if you're just printing
something out as in the first example.
Other questions?
>> ...is dividing the first input and the
second input. I understand that the
second input is what I type as the user.
The first input doesn't really feel like
input for me because that's the question
that you asked. Can you like explain a
little bit why both say input?
>> Correct. So to summarize the question
on the right here, this input is
effectively provided by the user. This
first input though is provided by me.
That's the way it is. So uh these are
both inputs because they're being
provided as inputs to the function. The
origins of those inputs though are
entirely up to what I'm trying to
achieve. The first one I know in advance
like I'm the programmer. I know I wanted
to say hello, someone. The second input
I don't know in advance. So I'm using a
variable to store the value that I'm
going to get when the get_string
function is used later on. But
they're both inputs even though they're
used in different ways. Good question.
Any others?
No. Okay. So, if we now have that done,
well, let's just take a step back to
the first question that was just asked,
about where these files are. Let's
take a look at what it is
we're actually using here. So, it turns
out even though most of you are using
macOS or Windows, there are other
operating systems out there in the
world. iPhones have iOS. iPads have
iPadOS. Android devices have
Android, which is its own operating
system. The operating systems in the
world are the pieces of software that
really just do the most fundamental
operations on a device like booting it
up, shutting it down, sending something
to a printer, displaying something on
the screen, managing windows and icons
and all of that sort of commodity stuff
that is used by other people's software
as well. A very popular operating system
in the programming world and in the
world of servers in the cloud and on the
internet at large is called Linux. And
it's a descendant of something called
Unix, which has been around for quite
some time, and it's what many, if not
most, programmers use, depending on
their environments, insofar as Linux is
very highly performant. Like, you can
support thousands or millions of users
on servers running an operating system
like this. It can have a graphical user
interface, but it tends not to, which
means it can operate more quickly
because it doesn't need all of these
graphics that are really just for humans'
benefit, not necessarily for web
browsers and other devices. And Linux,
insofar as it's usually or often
used as a command line interface, comes
with a whole bunch of commands that
you'll start to use and see over time.
Now I've used a bunch of commands
already. I've used code, which is a VS
Code thing. I have used make, which is
for today's purposes our compiler, but
that's a little white lie that we'll
distill next week. And then I've used
./hello, which is a command I
essentially invented as soon as I
created a program called hello. But
there's a bunch of other ones as well.
For instance, if I want to list the
files in my current folder, I can type
ls, short for list, and hit Enter. If I want to
create a new folder, otherwise known
as a directory, I can use mkdir to make
a directory. If I want to remove a
directory, I can use rmdir. If I
want to remove a file, I can use rm. If
I want to rename a file, I can use mv
for move. If I want to copy a file, cp.
If I want to change directories, change
into a folder, I can use cd. Now, these
too just take a little bit of time and
practice to memorize, and they're
all very terse, insofar as the whole
point of a command line interface is to
let people navigate things quickly. So,
for instance, even though this will be a
bit of a whirlwind, let me go back into
VS Code and let me propose that we play
around with just a few of these commands
so that you've seen me doing it, but
generally speaking, in CS50's problem
sets, we will tell you step by step what
commands to type so that you can achieve
the same results. And then later in the
term we'll stop bothering to remind you
pedantically how to do this and that
because it should come more naturally
eventually. But for instance let me go
ahead and do this. Let me go ahead and
reopen my file explorer at left. Yours
will look a little different. You'll
have a different number as your unique
ID but generally you'll see whatever
files and or folders you've created
already. The first thing I created today
was called hello.c. And then by using
make, I created a second file, I claimed,
called hello. So the reason ./hello works
is because there is in fact a program
called hello in my current folder, ergo
the dot, a program that was created when I compiled
my source code into machine code. Now
suppose for the sake of discussion that
this is going to get messy quickly
because the more programs we create in
class and for problem sets, you're just
going to have a hot mess of files inside
of this one main folder. Well, let's
create subfolders like you might be
inclined to do on your Mac or PC or
Google Drive or whatnot. Well, we can do
this in a bunch of ways. I could
right-click or Control-click on my file
explorer, and I'll see a somewhat
familiar contextual menu, and I can
literally choose new folder, or I can
rename things, or I can move things
around by dragging and dropping them.
But for today, let's focus more on the
CLI, the command line interface. And
again, commands like this. So, let me go
back into VS Code, and let me propose
that we do a few things just as
a tour. First, let me delete the machine
code. I'm done with this example.
I don't really want to keep these bits
around unnecessarily. I'm going to
delete hello with rm. Not hello.c, but hello,
the compiled program. When I type that,
I'll be cautioned: remove regular
file, whatever that means, called hello?
Here, I'm being prompted for a yes no
response. Y suffices. So, I'm going to
hit Y, enter, and watch what happens at
top left. As soon as I use my terminal
window and this command to remove that
file, it disappears. I could have
right-clicked on it or Control-clicked
on it, but this command line interface
achieves the same thing. Now suppose
that for problem set one and future
problem sets, I want to keep, like, every
program I write in its own folder just
to keep myself organized, especially as
the term progresses. Well, let me create
a new folder called hello itself. So I
don't want to create a program called
hello. I want to create a folder
called hello. Well, one way I can do
this per this here cheat sheet is to
make a directory which just means
folder. So, mkdir
hello. Enter. And you'll see at top left
now I indeed have a folder. And it even
has an obvious folder icon next to it.
Now I could cut some corners. I could
click and drag on hello.c and just drop
it into hello. But again, let's stick
with the command line interface. Let me
go ahead now and move, mv for short,
hello.c into hello. So this is the
first command where I'm passing in not
one word after the command, like code
hello.c or make hello. Now I'm typing
two words after the command because the
way the move command is designed is to
expect the origin as the first word and
the destination as the second, so to
speak, whereby if I want to rename hello.c,
sorry, if I want to move hello.c into
the hello folder, I should type it like
this. Now, you can, just so you know,
include a trailing slash, a forward
slash at the end of the destination just
to make clear that you want to put this
into a folder and not just rename
hello.c to hello. But because the hello
folder already exists, Linux knows what
it's doing, and it's just going to
assume that. When I do that, watch what
happens at top left. hello.c seems to
have disappeared. But if I click this
little triangle, ah, there it is. It's
now inside of that folder. But now I've
created kind of a predicament for
myself. Let me clear my terminal window.
And now let me type ls. And when I type
ls for list, you'll see only a folder
called hello. And it's color-coded just
to call it out to your eyes. And there's
a trailing slash just to make obvious
that it's a folder. That's all done
automatically for you by Linux, the
operating system. But wait a minute,
where did my hello program go? Like
where is hello.c? Well, it's in that
folder. So I need to change into that
folder or directory. And here per the
cheat sheet, we have cd for change
directory. So, I can do cd space hello
with or without the slash and hit enter.
And now you'll see this. And it's
admittedly a little cryptic, but my
prompt has now changed. It's still a
dollar sign, but before it is a
constant reminder of what folder I
am in. We adopted this as a
convention. Many systems do the same
thing, though the formatting might be a
little different. This is just to help
you remember where the heck you are
without having to type some other
command to ask the operating system what
folder you are in. So now that I'm here,
if I type ls and hit enter, what should
I see?
Just hello.c, because that's the only
thing in that there folder. So now let's
do maybe one other thing. Let's do make
hello inside of this folder. That is
okay. And notice at top left what just
happened. Now I've got both files back.
All right. Suppose I want to get rid of
one. Well, I can do rm hello again. I
can type y for yes to confirm the
deletion. And now I'm back to where I
just was. Now suppose I want to do yet
other things. Suppose that I'm not
really proud of this version of
hello.c. Let me keep it but rename it. Well, I
can say, how about mv hello.c old.c?
I just want to rename the file. So mv
can be used not only to physically move
a file from one place to another. If you
use it on two file names, it will just
rename the file for you. So there's no
rename command that you need to use
instead. But you know what? Nope. I
regret that. This program was fine.
Let's rename it back. So, let's move old.c
back to hello.c. And watch it at top
left. It just renames the file again.
Let me go ahead and make a backup,
though. So, let me copy with cp hello.c
into a file called, like, backup.c just in
case I screw this up. I want to have a
spare around. Now, you see at top left,
I've got both files. If I now type ls,
you'll see both files. So, what's
happening in the GUI is the exact same
thing that's happening in the CLI. But, you
know what? This was just for
demonstration sake. I don't need any of
this. So, let me remove the backup and
say y for yes. Let me go ahead and move
hello.c out of this folder, which I
could just kind of drag and drop. But
how do I move hello.c to the parent
folder, so to speak? I want to move it
out of this folder. Well, you would only
know this by having been told dot dot is
special notation. That means the
so-called parent folder. So, go back up
in the hierarchy. And now, if it's not
obvious, a single dot, which we have
seen before, means this folder. Two dots
means one step up. There's no triple
dots or quadruple dots. You have to use
different syntax, but more on that
another time. So, watch what happens
when I do move hello.c up into the
parent directory. Notice at top left
that the indentation changed because
it's no longer inside of that same
folder. And heck, now I'm going to go
ahead and do this. I could go back to my
main folder by doing cd dot dot to back
out of this folder. But when in doubt or
if you ever get yourself into a
confusing mess, just type cd alone and hit Enter,
and you'll be magically whisked away to
your default folder, a home directory, so
to speak, even though that too is a bit
of a white lie. That will always lead you
back to where you started when
logging in to cs50.dev, aka VS Code. And
now I can see the folder which happens
to be empty, and the file. So let me go
and do one last command, rmdir hello, to
really undo all of the work such that
we're now back to where the story began.
But the point here is just to
demonstrate that with these basic
fundamental commands, you can do
everything that you've taken for granted
on Macs and PCs for years with a mouse
instead. Questions on any of these here
Linux commands? Yeah.
>> ...files in a folder, how can you, like,
choose which one to open?
>> Really good question. If you have five
different files in a folder, how can
you choose which one to open? Well, you
can certainly do code space and the name
of the file you want to open. Or we're
going to see other tricks like you can
use an asterisk or star for a so-called
wild card and say open everything in
this folder. And you can even use more
precise patterns than that. So over time
once we have more files at our disposal,
I'll be able to do tricks like that as
well too. Yeah.
>> I don't know if I said
it back.
>> Uh-huh.
>> When you, like, deleted the file, was
that hello? Was that hello.c?
>> Sure. So one of the things I did in my
VS code a moment ago was once I was
inside of the hello folder into which I
had put hello.c just for the sake of
discussion. I then recompiled it by
running make hello. And this example is a
little confusing deliberately, insofar
as I've got a file called hello.c
inside of a folder called hello. But
because I compiled hello.c, I then
created a program called hello as well.
But that program hello was inside of a
folder called hello. Which is only to
say that you can totally do this. You
can't have a file and a folder in the
same place named the same thing because
they would collide. Like you can't do
that on a Mac or a PC as well. You have
to have unique names. But you can
certainly put something inside of
another folder without collision. Good
question. All right. So let's introduce
a few more building blocks and a few
more things we can do. So besides these
Linux commands which we'll now start
taking for granted, we have a bunch of
other features of programming
languages that we saw in Scratch. Let's
now translate them to C. So conditionals
were sort of the proverbial fork in the
road enabling you to do this or this or
some other thing based on the answer to
a question, a so-called boolean
expression. Here, for instance, in Scratch
is how we might express: if a variable x
is less than a variable y, we'll go ahead
and say x is less than y. And out of
context, I didn't include it in the slide,
but presumably we've created x and y and
somehow given them values, whatever they
are. But this is just now the conditional
part of the program. In C, the way you
would do the same thing is you would say
if, and then a space, then parentheses,
which have nothing to do with functions.
if is not a function. It is a feature of
C that implements conditionals, just like
this orange block is a feature of
Scratch. Inside of the parentheses, you
put your same boolean expression. So
here too, out of context, if up here I
have defined variables x and y, well, I
can certainly use them in this
conditional, and I can use this less than
operator just like in math class to ask
this question. And the answer, even though
it's a less than sign, is indeed, if you
think about it, going to be true or false,
yes or no. It's a boolean expression. It
either is less than or it is not. All
right. Inside of the curly braces, which
are necessary here, I'm just going to
literally put our old friend printf.
And there's nothing interesting here
except the new phrase x is less than y,
with the \n, the semicolon, and
the parentheses. This, though, is
deliberate: just like in Scratch, the say
is sort of indented and sort of hugged
by the orange if puzzle piece. Similarly,
these curly braces are meant to
sort of imply the same. They're sort of
embracing these lines of code. As an
aside, in C they're not always necessary.
If you have a single line of code, you
can technically omit them. However, what
you'll see in C, and in CS50
in particular, is that we will generally preach
a certain style, like any company in the
real world would do, so that programmers
who are collaborating on code all write
code that looks the same, so that it
doesn't devolve into a mess because
everyone has their own convention. So
this is a convention to which you should
indeed adhere. And then I've indented
four spaces to make clear logically that
this line of code only executes if the
answer to this question is true or yes.
Meanwhile, in Scratch, if we had an
if-else condition, so a two-way fork in the
road: if x is less than y, say so; else,
say x is not less than y. How can I do
that in C? Well, if x less than y,
something; else, something else. And what
goes in between those
curly braces? Well, just two different
printfs: x is less than y, or x is not
less than y. The only new thing here is
we've added else and another pair of
curly braces, just like we've got sort
of two orange shapes hugging those
two purple puzzle pieces there. All
right, how about something a little more
involved? And this looks like it's
escalating quickly, but it's just
because the Scratch puzzle pieces are so
big. If x is less than y, then say x is
less than y. Else if x is greater than
y, then say x is greater than y. Else if
x equals y, then say x is equal to y. How
can we do this in C? Almost the same
idea. If x less than y, else if x greater
than y, else if x equals equals y. Well,
before we reveal what's in the curly
braces, this is not a typo. Why have I
presumably done this, even if you've
never used C before? Yeah.
>> Exactly. The single equal sign, which
we've used already when storing a value
from get_string into a variable like
answer, is technically the assignment
operator. So humans decades ago decided
that when faced with the situation where
they wanted to copy from the right to
the left a return value into a variable,
it made sort of visual sense to use an
equal sign because you want those two
things ultimately to be equal. Even
though you kind of read the code from
right to left in that case, I can only
imagine at some point the same people
were in the room, and they were coming up
with the syntax for conditionals, and
like, oh, shoot, we've already used equals
for assignment. What do we now use for
equality? And the solution in C, as well
as in many other languages, is literally
this. They use two. So this is the
equality operator, whereas a single one
is the assignment operator. And it's just
that Scratch is designed for
kids. No sense in confusing little kids
with equal equal signs. So, Scratch uses
a single equal sign, whereas C and most
languages use a double equal sign. So, a
minor divergence there. What goes in the
curly braces? Nothing all that
interesting, just a bunch more printfs.
But here's an opportunity to distinguish
not only the equivalence of this Scratch
code with C code, but a misdesign
opportunity that we sort of tripped over
if briefly last week. This is arguably
not well designed even though it is
correct.
Why? Yeah,
>> you don't need to ask.
>> Yeah, we don't need to ask this third
boolean expression. Is X equal equal to
Y, so to speak? Well, logically, if
we're using sort of normal person
numbers, it's either less than or
greater than or by default equal to. So,
you're just wasting the computer's time
and in turn the user's time by asking
this third question. So, slightly better
here would be to get rid of the else if and
just have a default case, an else block,
so to speak, that looks like this. If it
stands to reason that there are only three
possibilities, you only really need to
interrogate two of them out of the
three. So, a minor optimization, but you
could imagine doing that again and again
and again in your code. You don't want
to be wasting the computer's or the user's
time if you can improve things like
that. All right. So, now that we have
these equivalences between Scratch code
and C code for these conditionals, well,
what other things can we throw into the
mix? Well, C has a whole bunch of
operators. And just so that you've seen
a list in one place, you've got not only
assignment and less than and greater
than and equality, but a few others here
as well. Now, even though in like
Microsoft Word, in Google Docs, you can
kind of do a greater than or equal to
sign one over the other or less than or
equal to, in C and most languages, you
actually just hit the keyboard twice.
You do the less than and an equal sign,
or you do a greater than and the equal
sign. And that's how you achieve the
notion of greater than or equal to or
less than or equal to. This one
we've seen. Anyone want to guess what
exclamation point equals means?
Otherwise pronounced bang equals. Yeah.
>> Not equal. So generally in programming
you'll see an exclamation point implying
the negation of something else. The
opposite. So you don't want it to be
equal to, you want it to be not equal
to. Now you might think, shouldn't it be
not equal equal? Yes, but they're trying
to save keystrokes. So this is the
negation of that, even though it doesn't
quite look like it should be: just two
characters instead of three. And dot
dot dot, there are many other operators
that we'll encounter in the wild over
time. But it's also worth noting that
C has more than just strings. Strings,
recall, were strings of text, and there are
other types of data that you might
get from a user or store. We've seen
string but we'll actually see a whole
bunch of others. So in C we're going to
see bools themselves, a variable that
can be true or false and that's it. So
very much interrelated with boolean
expressions. A variable itself can be
true or false. We're going to see chars
or characters. So not strings of text
like multiple letters and words and the
like, but just individual characters. C,
unlike some languages, does distinguish
between single characters and multiple
characters. Double, or rather, let's
jump to float. A float is otherwise
known as a floating-point value, which is
just a number that has a decimal point
in it, a real number if you will. But a
float generally uses, nowadays, 32 bits
total to represent those numbers. The
catch with that is that how many total
values can you represent with 32 bits
roughly per last week?
It was one of the few numbers I propose
you remember. It's like roughly 4
billion. But how many real numbers are
there in the world according to math
class?
An infinite number. So we seem to have a
mismatch between what we can represent
in code and how many actual numbers
there are in the world. Okay, so not to
worry if you need more precision like
more significant digits. Well, you can
upgrade your variable, so to speak, from a
float to a double, which uses 64 bits,
which is way more precise, twice as many
bits. But it doesn't fundamentally solve
the problem because really it's still
finite and not infinite. And we'll end
today with a look at what the real world
implications of that are. But besides
floating-point values, there are just simple
integers: 0, 1, 2, and the negatives
thereof. But those conventionally use
32 bits, which means the highest a
computer can count using an int would be
roughly 4 billion. But if you want to
support negative numbers too, the highest is
roughly 2 billion, and you can go all the
way down to negative 2 billion. So that's not
that large nowadays. A long uses 64 bits,
which is a much bigger range of values,
but there too it's still finite. And there's
a bunch of others as well. So these are
just the types of data that we can store
and manipulate in our programs. But one
of those in particular, string,
specifically comes from cs50.h. So among the things
you get by including cs50.h in your code
is access to not only get string but
these other functions as well. And we'll
start to use these in a little bit
whereby you can get integers or chars or
doubles or floats. We don't have a get
bool because it's not typically that useful to just
get a true or false value, but
we could have invented it. We just chose
not to. But we'll frequently use these
here functions that you can access by
using that there header file. But where
are we going to put these values and how
are we going to display them? Well,
it turns out there's more than just percent
s. Percent s was a placeholder for a
string, but if you want to print out
something like a char, a single
character, you're actually going to use
percent c. If you want to print out a
floating-point value, you're going to use
percent f. For an integer, percent i. And for a
long integer, that is, a long, you're
going to use percent li instead. So in
short, there are solutions to all of these
problems. These are not
intellectually interesting details, but
they are useful practical things to
eventually absorb over time. So let's go
ahead and do this. Let's do just a few
more examples together. In a little bit
we'll adjourn for a short
break, during which snacks will be
served every week out in the transept.
But before we get to that, let's
focus on these here variables. So in
Scratch we had the ability to store a
bunch of values in variables that we
could create ourselves by creating new
puzzle pieces. In C you can essentially
achieve the same. So for instance
suppose that in Scratch we wanted to
keep track of someone's score using a
counter. Well, we might create a
variable called counter and set it
initially to zero and then eventually
add one to it, add two to it, and so
forth as they drop trash into the trash
can, for instance. Well, in C, you're
going to do something almost the same.
You can choose the name of your variable
just like I did previously with answer.
You can assign it a value like zero
initially, but per earlier, what more am
I probably going to have to do in C on
the right-hand side here? Yeah?
>> I've got to give it a type. And counter,
insofar as it's numeric, is not
going to be a string of text, and I don't
think I need to worry about decimal
points if I'm just counting the
equivalent on my fingers. So int will
suffice. An int is the go-to numeric type,
at least if two billion or so
values is more than enough for your case,
which this is going to be. Still one
minor thing missing. Yeah?
>> The semicolon, to finish the thought. So
that on the right is the equivalent to
doing this here on the left. Suppose
that in Scratch you wanted to increment
the counter and add one to the score,
add two to the score and so forth. It
might look like this: change counter by
one, implicitly going up, unless you did
negative one, which would go down. In C, you
can do this actually in a few ways. And
this looks a bit wrong at the moment.
How can counter possibly equal counter +
1? This does not mean equality per se.
The single equal sign recall is
assignment and it means take the value
on the right and copy it to the value on
the left or to the variable in this case
on the left. So this takes whatever the
current value of counter is, say zero, adds
one to it, and then stores that one in
the counter variable. So now the value
is one and if you do it again it goes to
two goes to three goes to four and so
forth. But honestly this incrementation
technique is so common that there's
shorthand notation for it. You can also
just do this. It looks a little weird at
first glance, but counter plus equals 1
semicolon does the exact same thing. You
just type fewer keystrokes. And
honestly, doing this is so darn common
in C that you can even do this: counter
plus plus does the exact same thing by
adding one to the variable. There's no
+++ or ++++ or more pluses. It's only
for incrementing individual values by
one. So arguably this version and this
version, albeit more verbose, are a
little more versatile because you can
add two or three or more at a time. And
there are equivalents for
decrementation, using minus minus or
the minus symbol more generally.
All right, so let's actually use
this technique in some code. Let me go
back into VS Code here. Let me close my
file explorer and let's go ahead and
create maybe this time a little
calculator of sorts. Let me propose that
we implement a very baby calculator, or
rather not even a calculator yet. Let's
just compare a few values. So let me
do code compare.c to create a
brand new program called compare. And
then in here I'm going to do a bit of
boilerplate. I'm going to go ahead and
include cs50.h. I'm going to go ahead
and include stdio.h. And I'm going
to go ahead and do int main void.
More on that next week. And then inside
the curly braces, let's use these
new techniques. Let's give myself a
variable called x and set it equal to
the return value of get int, that other
function I promised exists. And let's
prompt the user for a value for x with a
sentence like what's x, question mark, and
then a space just to nudge the cursor
over. Let's get another variable, y. Set
it equal to get int again and ask the
user this time what's y, essentially
using the same function twice but to get
two different values. Now let's go ahead
and do something pretty mindless. If x
is less than y, go ahead and print out,
with printf, x is less than y, backslash
n to move the cursor, close quote,
semicolon. So it's not that interesting
of a program, but it's at least dynamic
in that now I'm prompting the user for
two numbers. So let's do this. Make
compare. Enter. Seems to have worked.
And in fact, I can check that it worked
by typing what command to list the files
in my directory?
ls, for short. And now you'll see I've
got hello.c, but no hello, because I
deleted that with rm a few minutes ago.
I've got compare.c which I just created.
And then I've also got a program called
compare. And the asterisk there is just
a visual indicator that this is
executable. It's a program you can run.
It's not just a simple old file. Even
though I didn't type ls previously with
hello, it would similarly have had an
asterisk next to it in this context. But
you don't see that in the file explorer.
If I now do ./compare, well, let's do
something silly like one for x, two for
y. Okay, x is less than y. Let's do it
again. Dot slash compare, two for x, one
for y. Okay, and I see nothing. Well,
why am I seeing nothing? Well,
logically, I didn't have a condition for
checking for greater than, let alone
equal to. So, let's enhance this a
little bit. Let me go ahead and
minimally say, all right, else, that is, if x is
not less than y, let's go ahead and
print out x is not less than y, backslash
n, close quote, semicolon. So I'm at
least handling that situation too. Let
me clear my terminal window. Do make
compare again. Dot slash compare, one and two,
works exactly the same. Now let me go
ahead and do two and one. There we have
better output. Of course it's not really
complete yet, because if I do dot slash
compare again and do one and one, it'd
be nice to be a little more specific
than x is not less than y. It's not
wrong, but it's not very precise. So I
can add into the mix what we did
earlier, and I can say, okay, well, else if
x is greater than y, say x is greater
than y; else if x equals equals y, go
ahead and print out x is equal to y, backslash
n, close quote. But here too, someone
observed that this is sort of stupidly
inefficient. What line of code should I
actually improve here to tighten this up?
Yeah?
>> instead What else did you just get rid
of?
>> Yeah. So line 17. I think I can just get
rid of that unnecessary question, because
logically that's going to be the case at
this point. And now I can go ahead and
recompile this with make compare, then dot slash
compare again. Enter, one and one. And
now we're back in business, catching all
three of those scenarios there.
Questions on any of these things here?
Why have I deliberately not done this?
Let me rewind just a moment and let me
hide my terminal window just to keep the
emphasis on the code here. Why not do
this and keep my code arguably simpler?
Like why not just ask three questions?
Lines 9, 13, and 17 here.
Yeah. What don't you like?
>> Because then it would check each and
every condition. Um even though for
example the first one might be
fulfilled, it would check the second and
third. That's wasteful.
>> Exactly. It's another example of bad
design, because now, no matter what, you
are asking three questions on lines 9,
13, and 17. Even if x ends up being less
than y from the get-go, you're still
wasting everyone's time by saying,
"Wait, well, is x greater than y?" You
already know that it's not. Is x
equal to y? You already know that
it's not. These three
conditions are mutually
exclusive, and yet you're checking all
three of them no matter what, even
though logically that shouldn't be
necessary. So our previous approach was
actually much better. And in fact, just
to show you the difference
here, let me go back to this very first
version here, whereby I was only checking
that one condition. Is X less than Y?
Well, if you're more of a visual
learner, you can actually draw out what
code looks like in flowchart form. So
here is a drawing of a program that
starts here and ideally stops down here.
And each of these figures in the
middle sort of represents logical
components of the code. Here,
in the diamond, is my Boolean
expression, which represents the start of
the conditional. So if x is less than y,
I have a decision to make: yes or no, true
or false. Well, if it is less than y,
true, let's go ahead and print out
quote unquote x is less than y and then
stop. However, the first version of that
program, recall, just said nothing if it
were not the case that x was less than
y. That's because false just led to the
stop of the program. There's no keyword
stop. There's just no code to
handle that situation. But the second
version of the code when I actually
added an else looked fundamentally a
little different. So now the second version
of that code asked, is x less than y? And
if true, the behavior is exactly the same.
But if it weren't true, if it were instead
false, that's when I got the message x
is not less than y. But in the third
version of the code, with its three conditions,
the picture gets a
little more complicated. Let me zoom
in. Top to bottom, here we have a longer
flowchart, but the questions are really
the same. When I start this program, I
ask, is x less than y? If so, I print
out x is less than y. However, in that
last version of the
program, I was still foolishly asking the
same questions: well, wait a minute, is x
greater than y? Wait a minute, is x
equal to y? And that's the version in
which, again, I had all of that
unnecessary code, which I just undid
here, asking three questions at a time.
Ideally, I don't want to make that
mistake by doing it again and again and
again. So if I instead revert that code
to else if and else if, then my flowchart
looks a little bit different because
notice the sort of shortcuts now. If x is
less than y, true, we do this and we're
done. Super quick. If x is not less than
y, fine, we do ask one more question: is x
greater than y? Well, if so, boom, we
make our way to the end of the program
by just printing that. Only in the
perverse case where x equals equals y
do we check this condition, no, this
condition, no, this condition, and then,
okay, now we can print out x is equal to
y, because it must be, logically. Of
course, as has been observed multiple
times, this last question is a waste of everyone's
time. So we can prune this chart more
and just have one question, two
questions, and that alone tightens up the
program. So again, if you're more of a
visual learner, most any block of code
you can translate to this sort of
pictorial form, but it really just
captures the same logical flow that the
indentation and the syntax and the code
itself is meant to imply. All right, how
about a final exercise with one other
type here? Recall that these are the
types available to us. Actually, two
final examples here before we have a bit
of a break. Here we have a list of types
that we can use. And here we have a list
of functions that we can use. Let's go
ahead and make a program that's
representative of something we do quite
often nowadays, but using a different
type. So, let me go back into VS Code.
Let me close compare.c. Let me reopen my
terminal window and clear it just so we
have a new prompt. And let's go ahead
and create a program called agree.c.
It's all too often nowadays that we have
to like agree to terms and conditions.
To be fair, it's usually in the form of
like a popup and a button that we click,
but we can do this in code at the
command line as well. Let me go ahead
and include, to start, cs50.h, and include
stdio.h. Let me again, for
today's purposes, do int main void, but
we'll reveal next week why we
keep doing that. And now, for a yes/no
answer, it suffices just to ask for a
single char, or character, not a whole
string. So let's do this: char c equals
get char, and let's ask the user, quote
unquote, do you agree, question mark, for
instance. And now I can actually compare
that value for equality with some known
answers. For instance, I could say if c
equals equals quote unquote y, then go
ahead and print out, for instance, agreed,
period, backslash n, close quote, semicolon;
else if c equals equals n in single
quotes, let's go ahead and print out,
for instance, not agreed, period, backslash
n, semicolon. Now, there's still
room for improvement here, but notice
we're just now using the same building
blocks in C in different ways to
solve different problems. But notice on
lines 8 and 12, I've used single quotes,
which I alluded to earlier. Why is that
the case? Why single in this case here?
>> Yeah, it's a single character. And this
is just the way you do it in C. When you
want to compare a single character, you
use chars and you use single quotes.
When you want to use strings of text,
like multiple characters, multiple
words, multiple sentences or paragraphs,
you use strings. So this would seem to
work, but arguably I could be a little
more efficient. If the user doesn't type
y, I mean, frankly, I could just chop
off this else if and make it an else and
just assume that if you don't give me a y
answer, then at least I'm going to
assume the worst and you don't agree.
But even here, the program's not all
that great. Let me go ahead and do make
agree, and then do dot slash agree. And do
I agree? Sure. I'm going to go ahead and
type y. Meanwhile, if I type anything
else, like n, or even, emphatically, no,
that would seem to... whoops. Why did that
not work? Yeah?
>> Exactly. So, among the features of
CS50's functions like get char is that it
will enforce what type of data you're
getting. Because I
used get char, if the user doesn't
cooperate and types in multiple
characters, get char, like some of our
other functions, is just designed to
prompt them again and again and again until
they cooperate. That's useful so that
you don't have to deal with that kind of
error checking. But here I could type N
in uppercase, and that seems to now work.
But that only works because of the else.
Let me go ahead and do this, which is
very reasonable. I'm going to go ahead
and type capital Y, which you would
hope works, but it says not agreed. That feels like a bug at
this point. It's fine if we don't
want to support yes and no and just want
to support y and n. But it's kind of
obnoxious not to support the uppercase
version thereof. So how can we fix this?
Well, let me hide my terminal window.
And I could go in and fix this as
follows. I can say, well, else if c equals
equals quote unquote capital Y, in single
quotes, then I could print out
agreed, period, backslash n, semicolon. And
then I can keep the else. That would
work. That would work there. But what
rubs you the wrong way perhaps about
this solution? Even if you've never
programmed before,
just applying some of the lessons from
last week. Yeah,
>> It's redundant. I mean, I didn't
technically copy and paste, but
line 14 is identical to line 10, so I
might as well have copied and pasted. And
that's generally bad practice. Why?
Well, if I want to change the English
language to say something else in that
case, now I have to change it twice. And
it's just I'm repeating myself, which is
just bad design. So, there are ways to
address this through other types of
operators that we haven't yet seen. If I
want to ask two questions at once,
that's fine. I can do something like
this: well, if c equals equals quote
unquote lowercase y or c equals equals quote
unquote capital Y, I can tighten things
up using so-called logical operators
whereby I am now taking a boolean
expression and composing it from two
smaller boolean expressions. And I care
about the answer to one of those
questions being true. So whether it's
lowercase Y or uppercase Y, this code
now will work. And if it's anything
else, we're going to default to not
agreed. The two vertical bars, which
are probably not a character you type
that often, and whose location varies on
your keyboard depending on whether it's
American English or something else, just
mean logical or. This is not relevant
here, but in some contexts you could also
use two ampersands to connote and. But
this does not make sense here. Why? Why is it
clearly not correct to say and in
between these two clauses? Yeah?
>> Exactly. The variable can't be both
lowercase and uppercase. That just makes
no sense. So, this would be a bug,
but using two vertical bars
here is in fact correct. All right.
Well, let's do one final flourish here.
Besides conditionals, we also had
loops. Recall that a loop is just
something that does something again and
again and again. Here, for instance, in
Scratch is how we might meow three times. In
C, there are going to be a few different
ways to do this. Here is one. You can in
C declare a variable like i, for integer,
or whatever you want to call it and set
it equal to three, the number you care
about. You can then use a loop and the
closest to the repeat block is arguably
a while loop. There is no repeat keyword
in C. So we can't translate this
verbatim, but we could say while i is
greater than zero. Why? Because that's
sort of logically what I want to do. If
I start counting at three, maybe I can
just sort of decrement one at a time and
get down to zero, at which point I can
stop doing this thing. So I'm going to
initialize a variable i
to three, and then I'm going to say
while i is greater than zero, go ahead
and do the following. And at the end of
that loop, before whipping around again,
I'm going to use this line of code,
which we haven't seen, but you can
infer: i minus minus just means subtract one
from i. So this is going to have the
effect of starting at three, going to
two, going to one, going to zero. And as
soon as it goes to zero, this Boolean
expression will no longer be true, and
so the loop will just implicitly stop,
because that's it. So what are we going
to put inside of the curly braces
besides this decrementation? Well, I
think I can get away with just printing
meow. And that will now print one, two, three
times. And yet that's interesting. I
sort of instinctively counted one, two, three,
even though I'm proposing that we count
three, two, one. So can we implement the logic in
the other direction, whereby we count up
from zero instead of down from three?
Well, sure, we just have to make a few
changes. We can set i equal to zero
initially. We can change our Boolean
expression to check that i is less than
three again and again. And on each
iteration of this loop, let's just keep
incrementing i with i++. And at this
point it will have the effect of doing one,
two, three. Three is not less than three, so I
won't put any more fingers up. I will
meow three total times. And
again, if you're a visual person, here's
how we might start counting at zero
initially. Check that i is less than
three, which it is initially. And if so,
we print out meow. Then we increment i,
and we get whisked around again to the
Boolean expression, because that's how
while loops work. You constantly have
the condition being checked again and
again. That's just how C works. As soon
as I've incremented i from 0 to 1 to 2
to 3, three will no longer
be less than three. So the
answer will be false, and the loop will
just stop. So that has the effect of
achieving the same. But it turns out
that looping some number of times is
so darn common that you don't strictly
have to use a while loop. A for loop, so
to speak, is another alternative there
too, whereby the syntax is a little
weird. It's a little harder to memorize,
but it allows you to write slightly less
code because you write more code on a
single line. So the way you read a for
loop is exactly the same in spirit. You
initialize the variable: everything to
the left of this first semicolon.
You then check the condition, and the
computer does all this for you. Is i
less than three? If so, you execute
what's inside of the curly braces, and
then automatically the thing to the
right of the second semicolon happens.
So i gets incremented from zero to one.
In this case, the condition is checked.
Is one less than three? It is. So, we
print meow again. And C increments i to
two. Is two less than three? Yes. So, we
meow again. I gets incremented to three.
Is three less than three? No. So, the
for loop stops. So, it's exactly the
same, but just more magic is happening
in this first line of code here more
than you yourselves have to actually
write. And it's just arguably more
common convention. But both of them are
perfectly correct if you'd like to do
that yourself. So let's go ahead and
actually implement now this
beginning of a cat in VS Code. Let me go
back to VS Code and close agree.c. Let
me reopen my terminal window and create
an actual cat in cat.c. And let's go
ahead and do this initially the wrong
way. Include stdio.h, int main
void. And then inside of main, let's go
ahead and print out quote unquote meow,
backslash n, semicolon. And then, heck,
let me just copy and paste. So this is
obviously the wrong way, the bad way, to
do this, because I'm literally copying
and pasting. But it is correct. If I
want the cat to meow three times, I can
make cat. I can do ./cat and I
get my meow, meow, meow. But let's now
actually use some of those new building
blocks, whereby we converted Scratch to
C. Let me go back into this code, and
I'll do the while loop first. So I could
instead do int i equals 3, since we're
counting down initially. While i is greater
than zero, go ahead and print out
quote unquote meow, backslash n. And then
do I do i plus plus or i minus minus?
i minus minus, because we're starting at
three. Now let me go back to my terminal
window and clear it. Do make cat again.
Dot slash cat, and we get three meows. And
this is now arguably better implemented.
What if I want to flip things around?
Well, I could now change it, maybe do it
the normal person way. I could start
counting at zero. And I can do this so
long as i is less than three. And I can
do this so long as I increment i on each
iteration. Now I can do make cat again.
Dot slash cat. Enter. And that too works. But
there's another way I could do this. If
I want to count like a normal person,
like start counting from one and count
up to and through three, I could start i at one
and check that i is less than four. This is
correct. It would iterate three times.
But it's a little confusing, because now
I have to think about what it means to
be less than four. Okay, that means up to
and including three. I could be a little more
explicit and say we'll do this while i
is less than or equal to three, using yet
another one of those operators. So I can
make cat yet again, dot slash cat, and that
too would work. Now which of these is
correct or best? The convention,
truthfully, is in general in code to
start counting from zero and count
up to but not through the value that you
want. So at least you see the starting
point and the ending point on the screen,
if you will, at the same time. But of
course I can condense all of this a bit
more and turn this whole thing into a
for loop. And I instead could do for
int i equals 0; i less than 3; i++, and
then down here I could do printf
quote unquote meow. And if only because
I typed fewer keystrokes that time, like,
this feels a little nicer. It's a little
tighter and more efficient to create,
even though the effect is the same.
Indeed, when I make this cat and do dot
/cat a final time, this here too gives
me the three meows. So, what could go
wrong? Well, sometimes you might be
inclined to do something forever and we
might have done that in Scratch and
indeed we did when we had some things
bouncing back and forth off of walls and
so forth. You can achieve the same thing
in code. In C, though, there is no forever
block, so we could use a while
loop instead. While suffices, but recall
that the while loop expects a Boolean
expression. And if I want to do
something forever, I essentially need an
expression here that's always true. So I
could do something stupid and
arbitrary, like while three is greater than
two, or while one is less than two. I
mean, make a statement of fact that never
changes; ergo, it's just going to run
forever. But if the whole goal here is
to do something forever and to get this
boolean expression to be true, the
convention in programming is just to
literally say while true. And that
implies and functionally means that you
will do this thing forever unless you
somehow prematurely break out of those
curly braces. More on that before long.
So if I want to meow forever, I could
now just do this. And this would be a
deliberate infinite loop. But unlike a
game where you might want it to keep
going and going and going for some time,
I'm not sure this is going to be the
best thing for us. Let's go ahead and
try this. So let me go ahead here and
include, for good measure, CS50's
library, if only because it too
gives us features like bools.
Here I'm going to go ahead and say while
true, and then inside of my curly braces
I'm just going to print out meow,
backslash n, semicolon. Let's go
ahead here and make cat one final time.
Let me go ahead here and do dot
slash cat. And
this is like the annoying cat game. Just
like meowing, meowing, meowing endlessly.
Like I've now kind of lost control over
my terminal window. And mark my words,
at some point you might do this, too.
But let's go ahead and take a juicy
10-minute break here. We have some
delicious blueberry muffins out in the
transept. Come back in 10 and we'll
figure out how to stop this here cat.
All right, so it's been about 10 minutes
and VS Code is freaking out with
"High codespace CPU utilization
detected. Consider stopping some
processes for the best experience." So
this is what happens when you have,
intentionally or otherwise, an infinite
loop, insofar as I've been printing out
meow endlessly. And I was warned by my
colleague that I probably shouldn't let
this run too long, because we might lose
control over the environment altogether.
But the answer to how to solve this is
going to be Control-C. There are a few
cryptic keystrokes that you can use to
generally interrupt things in this
way. And in fact, if I go back,
you'll see, yeah, I kind of lost control
over my codespace here. I'm going to go
ahead and try to reload the window
altogether. But had I hit Control-C in
time... let's hope this doesn't now go off
the rails...
Control-C would have been our friend. There we
go. And we're back. Okay. So, now that
we've got control over our so-called
codespace again, how can we go about
making our meowing program a little more
dynamic in so far as let's like start
asking the user how many times they want
the cat to meow. Certainly, rather than
do it an infinite number of times and
even rather than do it three times
alone, I think we have all of these
building blocks thus far. So, let me go
ahead and stay in cat.c here and go
ahead and delete the contents of the
body of my main function. And let's
go ahead and do this. Let's give myself
an int. And I'll go ahead and call it n
for number. Though I could be more
verbose than that if I wanted. I'm going
to set it equal to the so-called return
value of get int, which recall is going
to get an integer from the user. And
quote unquote, let's ask the user what's
n just like I asked earlier, what's x
and what's y, where n is the number of
times I want the cat to meow. Now, how
can I use this variable? Well, we have
that building block, too. I could use a
while loop or a for loop. And if I use a
for loop, I could do this. I could
initialize a variable i for integer, set
it equal to zero initially. I could then
do i less than, not three this time, but
n. So I can use that variable as a
placeholder inside of the loop to
indicate that I want to do this n times
instead of three. And on each iteration
through this loop, I can do i++. Of
course, I could be counting down if I
prefer, by using decrementation. But
logically I would say this is canonical:
start at zero and go up to but not
through the value that you actually care
about. And I'll go ahead now and print
out quote unquote meow with a backslash n,
semicolon. Back down to my terminal.
Make cat again. Dot slash cat.
Enter. I'm prompted this time for n. I
can still give it three and I'm going to
get three meows this time. However, if I
run it again with dot /cat and a
different input like four, of course,
I'm going to get four meows instead.
Now, what is get int doing for me? Well,
it does a few things, similar to get char
doing a few things for me. For instance,
suppose that instead of answering this
question correctly with a number n, I
say something random like dog. That is
not an integer, and so the get int
function is designed to reject the
user's input implicitly and just
reprompt again and again. I can try
bird, and it's going to do this again. So
somewhere in the implementation of get
int, there's a loop that we wrote that
does this kind of error checking for
you. But it doesn't do everything,
because an integer is a fairly broad
category of numbers. An int ranges from roughly negative
2 billion through positive 2 billion. And
that's a lot of possibilities. But
suppose I don't want some of those
possibilities. Suppose that it makes no
sense to ask the cat to meow, like,
negative one time. And yet the program
accepts that. It doesn't do anything, or
anything wrong. But I feel like a better-designed
program would say, "No, no, no.
Negative one makes no sense. Let's meow
zero or one or two or more times
instead." So, how can I begin to add
some of my own error checking and coerce
the user to give me the type of input I
want? Well, let me clear my terminal
window and go back up into my code. And
why don't I do something like this?
After getting n, let's just check if n
is less than zero. Because if so, I want
to prompt the user again. And I can
prompt the user again by doing n equals
get_int, quote-unquote what's n question
mark, semicolon. Now what's going on
here? Well on line six I'm doing two
things. I'm getting an integer from the
user and I'm not only storing it in the
variable n. I'm also technically
creating the variable n. So, I didn't
call this out earlier, but on line six,
when you specify the type of a variable
and the name of the variable, you are
creating the variable somewhere in the
computer's memory. And that's necessary
in C to specify the type. If the
variable already exists though, and you
just want to reuse it and change it
later on, it suffices as in line 9 just
to reference it by name. It would be
sort of stupid to specify the type again
because C already knows what type it is
because you told C what it is on line
six. So that's why lines six and nine
are a little bit different. So let's see
how this now works. Let me go back to my
terminal window and remake this cat. Let
me do ./cat again. Let me not
cooperate and type in like negative one
again. And notice I am reprompted this
time. Fine, fine, fine. Let's type in
three. And now it works. But you can
perhaps logically see where this is
going. Let me go ahead and run this
again. ./cat. Type in negative one.
Type in negative one. And huh, it didn't
prompt me again. But that's consistent
with the code. If I hide my terminal
window here, you'll notice that I've got
one maybe two tries to get this question
right. And after that, there's no more
prompting of me. Now, you can kind of
imagine that this is probably not the
best way to do this. If I were to go
inside of line nine and then move the
cursor down and say, "Okay, well, if n
is still less than
zero." Well, let's just do get_int again
and ask what's n question mark. And
heck, okay, if it's still less than
zero, well, let's just keep asking the
same, right? Why is this bad?
I'm repeating myself. I'm essentially
copying and pasting even though I'm
retyping. I mean, this just never ends,
right? Like, how many chances are you
going to give the user? In spirit, you'd
hope that they don't refuse to cooperate
this many times. But really to do this
the right way, we should probably prompt
them potentially as many times as it
takes to get the correct input. So this
is not the right path for us to be going
down. But of course, we have already now
this notion of like a loop whereby we
could just do this in a loop. Ask the
question once and maybe just repeat the
question again, but the same question.
So how might I do this? Well, let me go
ahead and delete all of this. And let me
just try to spell this out logically.
So, I want to get a variable n from the
user. And let's go ahead as follows.
While true. I know how to do infinite
loops now. And even though that created
a problem for me with the cat, I bet we
can sort of terminate the loop
prematurely like I proposed earlier as
follows. I could do this int n equals
get_int and ask the user again what's n
question mark. And then I could do
something like this. If n is less than
zero, well then you know what? Go ahead
and just continue on with the same loop.
Else if it is not the case that n is
less than zero, what do I want to do? I
want to break out of this loop. So this
is new syntax. This is something you can
do in C whereby if n is less than zero,
fine. Continue means go back to the
start of the loop and do the same exact
thing again. Otherwise, if you instead
say break, it means break out of the
loop and go to below whatever curly
brace is associated with that loop. So,
continue essentially brings you to the
top. Break brings you to the bottom, if
you will. So, logically, I think this is
right, but this code curiously isn't
quite going to work and get me a value
for n. Let me go ahead and open my
terminal window again. Let's make this
cat. And, huh, cat.c line 19, character
25, has an error: use of undeclared
identifier n. Well, what does that mean?
Again, cat.c line 19. Let me hide my
terminal window. Highlight line 19. n is
being used in line 19, but I created it
in line 8. And so what's the problem?
Why is it not declared seemingly? Yeah,
>> because you are using like within the
loop that you wrote.
>> Yeah, this is a subtlety, but I'm
creating n inside of this loop. I
mean, literally between the curly braces
on lines 7 and 17. The implication of
which, because of how C works, is that
that variable only exists inside of that
while loop. This is a problem of what's
known as scope. The variable n only
exists inside of the scope of the while
loop in which it was declared. So how do
I actually fix this? Well, I need to
logically somehow declare that variable
n outside of the loop so that it exists
later on in the program as well. And
there's a few different ways I can fix
this, but the best way is probably to
move the declaration of n, so to
speak, the creation of n outside of the
curly braces and maybe kind of squeeze
it in here below line five. So still
inside of main, whatever that is. More
on that next week, but in the same curly
braces as everything else. So I can in
fact do this, and this is where the
syntax gets a little bit different. I
can solve this quite simply as follows.
I can go down to a new line six and just
say int n semicolon and that's it. This
declares a variable called n. It creates
a variable called n. And initially it
doesn't give it any value. So who knows
what's in there. More on that another
time. But now on line 9, I don't need to
recreate it. I just need to assign it a
value. And because now n has been
declared on line six and between the
curly braces on line five and all the
way down on 24. Now n is in scope so to
speak for the entirety of this code that
I've written. So let me reopen my
terminal window and clear that old
error. Let me do make cat again. Now the
error message is gone. Let me go ahead
and do ./cat. What's n? Now I'm back in
business and I can do three for meow
meow meow. Better yet, because I'm
inside of a loop now, watch that I can
do negative 1, negative 1, negative 1,
negative 1, negative 2, negative 350.
Finally, I can cooperate with something
like three. And that works because I'm in a loop
that by design may very well go
infinitely many times until the user
actually cooperates and lets me break
out of that exact loop. Now, I strictly
speaking don't need both continue and
break. I wanted to demonstrate that both
exist, but this is like twice as much
code as I actually need. If logically
I just want to break out of this loop if
and only if n is greater than or equal
to zero because I'm sort of comfortable
with the idea of zero meows but negative
makes no sense. Well, I can just flip
the logic. I can say if n is greater
than or equal to zero then go ahead and
break. And I've tightened up the code
further. I could technically do
something else. I could say something
like if n is less than zero, but wait a
minute. I want to negate that. You can
start to do tricks like this. An
exclamation point with some additional
parentheses. So you can invert the
logic. It's arguably a little hard to
read. Even though that would be
logically correct. So I'm just going to
say more explicitly as before. If n is
greater than or equal to zero, break out
of this here loop. All right. So this is
one way to use an infinite loop. But it
turns out there's another construct that
you can use altogether that is a
feature of C. Instead of using a while
loop and forcing it to be infinite by
using while true and then eventually
manually breaking out of it, there
exists another type of loop altogether
and that's called a do while loop. And
you can literally say the word do which
means do the following. Then you can do
exactly what we did before: n equals
get_int, quote-unquote what's n question
mark. So exactly like before but then
after those curly braces you use a while
keyword. So at the end of the loop
instead of the beginning and that's
where you put your boolean expression. I
want to do all of that while n is less
than zero. So you can kind of invert the
logic and now kind of tighten things up
further by just telling the computer do
the following. What's the following?
Everything in between those curly braces
while n is less than zero. And this
implicitly handles all of the
continuation and all of the breaking by
just doing what you've said. Do this
while this is true. But the difference
between this do while loop and a normal
while loop is literally that the
condition is checked at the bottom
instead of the top. So when you say
while, parenthesis, something, that
question is asked first and then you
proceed. With do while, this condition is only
asked at the very end. And why is this
useful? Well, oftentimes when writing
programs where you want to do something
at least once like you obviously want to
ask the user this question at least
once. There's no point in asking a
question like while true or while
anything else. You should just do it and
then you should do it again if the
expression evaluates to true and tells
you to do something. Now you haven't
played with these loops yet most likely
unless you have programmed before.
There's a fun sort of meme that's
apropos of this moment. So let's see if
this maybe causes a few chuckles. If you
remember Looney Tunes here,
is this funny for people in the know?
There we go. Thank you. Okay, this
doesn't make sense. It eventually will.
And it still might not be funny, but it
will at least make sense. And it
illustrates the difference between a do
while loop and a while loop. The Road Runner is
stopping because he's checking the
condition. While not on edge, he'll run.
But if he is on the edge, he's not going
to proceed further. But of course, the
coyote here, he's going to do running no
matter what. And only then, too late,
does he check. Ha ha. He's still on the
edge. All right. So, ah, thank you. All
right. Now, you're cool. All right. So,
many more memes will now make sense as a
result. But let's go ahead and revisit
this code and maybe do something a
little bit different here whereby we no
longer want to just fuss around with
some of these conditionals and these
loops. Let's actually make the software
a little better designed. And to do
this, we'll revisit an idea that we
touched on last week and had to do with
problem set zero, which was like create
your own function. Like C does not come
with everything you might want. CS50
library is not going to come with
everything you might want. And at the
end of the day, a lot of programming is
about abstracting away your ideas. So
you solve a problem once and then reuse
it, reuse it, reuse it. And heck, you
can package it up in a so-called library
like we have and let other people use it
as well. So here for instance in Scratch
is how we could have implemented the
notion of meowing, namely by getting the cat
to play the sound meow until done. We
abstracted it away and then we had a
magical new puzzle piece called meow. In
C, this is going to be a little weird
today, but next week these details will
start to make more sense. You would
instead do the following. Literally type
void, the name of the function you want
to create, and then void again in
parentheses, as in void meow(void). For now know that this is
the return value of the function. So
void means it returns nothing. This is
the input to or the arguments to the
function. Void means it takes no inputs.
And that makes sense because literally
meow doesn't return anything. It doesn't
take anything. It just meows. It has a
so-called side effect, audible last week.
So this means, hey C, invent a function
called meow that takes no input,
produces no output, but does have a side
effect of printing meow on the screen.
Meanwhile, if I wanted to do something
like this in code last week where I
meowed three times, well, that's fine.
We have the building blocks for this.
And here's where inventing your own
function starts to get more compelling.
I can abstract away the notion of
meowing now. Like, this doesn't come
with C. It doesn't come with the CS50
library. I just created in the previous
code this meow function. So, I can
in code, with a for loop and that new
function meow three times. But I can
abstract this away further. Recall that
the refinement in Scratch last time was
this. I could edit the new function and
I can say it actually does take an input
otherwise known as an argument called n.
And I clarified that this means to meow
some number of times. And then inside of
those Scratch blocks, I repeated n times
the meowing act. Well, in C, I can
achieve the exact same thing. Even
though it's going to look a little more
cryptic, but meow still returns nothing.
It has an audible or visual side effect,
but it doesn't return a value. But this
version does take an input. And this
might look a little weird, but just like
before, when you create a variable in C,
you specify the type and the name. When
you invent your own function in C and it
takes one or more inputs, aka arguments,
you specify the type and the name of
those as well. No semicolons up there,
just inside of the parentheses. And
with practice you'll get used to this
convention. But the rest of this code is
exactly the same, except instead of
three, I'm now using n. So again, I'm
just composing the exact same ideas as
last week, even though it looks way more
cryptic this week, but it will become more
and more familiar with more and more
practice. So how can I go about
implementing this myself? Well, let me
propose that we do something like this.
Let me go back to VS Code here and let
me go ahead and let's really delete most
of the code that I've written inside of
main. And let me just suppose for the
moment that meowing exists. And I'm
going to go ahead and say for the first
version for int i equals zero i less
than three. So we're not going to take
input yet. i++. And then I'm going to
go ahead here and say meow is what I
want this function to do. Now if I
scroll back up, you'll see there's no
definition of meow yet. So I'm going to
invent that too. I'm going to go up here
and say void meow(void). And again
this first version means no input, no
output, just a side effect. And that
side effect super simply is going to be
to say just quote-unquote meow with a
backslash n. And now if I go and open my
terminal window, clear it from before,
do make cat, so far so good. ./cat, we're
back in business, but I've abstracted
the function away. Now, much like last
week where I sort of dramatically
dragged the meow definition way down to
the bottom of the screen just to make
the point that you don't need to see it
anymore. Out of sight, out of mind. Let
me sort of try to do the same here. Let
me highlight and delete that and like go
way way way down arbitrarily just to be
dramatic and paste it near like the
hundredth line of code and scroll back up.
Now out of sight, out of mind. I've
already implemented the idea of meowing.
We don't need to see or talk about it
again. But there is a caveat in C. When
I now clear my terminal and make this
cat, now I've introduced a problem and
there's like more problems it seems than
code. Let me scroll back up to the first
such error and you'll see this on line
nine of cat.c, character 9, there's
an error: call to undeclared function
meow and then something fairly arcane,
but that means that meow is no longer
recognized as an actual function. I know
that it doesn't come from cs50.h, and I
know it doesn't come from stdio.h.
It's just down there. But why is the
compiler being kind of dumb here?
Yeah?
function.
>> Yeah, because insofar as the first
version worked, logically it would
seem that putting it at the bottom was
just a bad idea, because C compilers are
fairly simplistic. Like they won't
proactively do you the favor of like
checking all the way down at the bottom
of the file. They're going to take you
literally. So if meow doesn't exist as
of line 9, that's on you. Like that is
an error. So I could fix this by just
undoing what I did and move it way back
up to the top. But let me argue that in
general when writing C programs, the
main function, which I keep using and
we'll talk more about next week, is
literally meant to be the main part of
your code. And so it kind of stands to
reason that it should be at the top
because when you open the file, it'd be
nice to see the main program that you
care about, the main function. So
there's an argument to be made that it's
a little annoying to have to put my
functions all at the top, which is just
going to push main further and further
down. So there is a solution, and this
is, dare I say, the only time copying and
pasting is appropriate. Let me delete
most of these blank lines which is
unnecessarily dramatic and just move it
below main as over here. The solution
here, though, is to do this: copy the
first line of the meow function, its
so-called signature, and
then just put that one line and only
that one line with a semicolon above
main. And this is what's known as a
prototype. So a prototype is just a bit
of a hint to the compiler, a promise if
you will, that hey compiler, there will
exist a function called meow. It takes
no input and it returns no output
semicolon. And it's on the honor system
that it will eventually exist later in
the file. We'll talk more about this
next week why that works, but this is
sort of a promise to the compiler that
it will eventually be defined. Now, what
I've done here on line four as an aside
is what's generally known as a comment.
I just wanted to put on the screen
exactly what I was verbalizing. Anything
in C that starts with // is a note to
self, like a sticky note in Scratch,
which is just for the human, not for the
computer. And it's a way of reminding
yourself or someone else what's going on
on that line or those lines of code. But
I'll go ahead and delete it for now as
unnecessary, because now if I go back
into my terminal and clear those errors,
make this cat again, now it does work
because the meow function has
been defined exactly where it should be.
And now I can make the new version of
this cat even better. I could change
the function meow to take a variable n
as input for the number of times. And
then in here I could do something like
my for loop, for int i equals zero, i less
than n, i++. And then in this for loop I
can print out quote unquote meow. And
then I'm going to have to change the prototype
too, because I have to copy and repaste
it, if you will, or just manually fix
that. But now I can get rid of all of
this and do meow(3), for instance. And
this now will be the second version of
the Scratch code, if you will. make cat.
./cat. Still going to work exactly the same.
Meow meow meow. But now I've implemented
my own function that does take input
even though it doesn't happen to return
any output.
All right. Questions
on any of these examples just yet? Any
confusion?
All right, let me add one other feature
to this to demonstrate that we can take
not only input but actually produce
output if we want. If I go back into
this code here, let me propose that it's
a little silly to be hard-coding, that is,
fixing in place, three. It'd be nice to get
input from the user. So I could do this.
I could use int n equals get_int and say
something like what's n question mark
and then I could pass n in if only to
demonstrate a couple of things. So one
now the program is dynamic. I'm going to
ask the user how many times to meow and
I'm going to pass in that value n. Now
this deliberately is confusing at the
moment because wait a minute I got n
defined here used here but then
redefined here and then reused here. So
it turns out that even if you create n
up here and use the name n, no other
functions can see it for that same issue
of scope. So for instance, suppose I
didn't quite remember this and I sort of
naively just said void. Meow doesn't
need to take any inputs because, heck,
n is already defined in main.
Let me go ahead and open my terminal and
clear it. Make cat and see what error
comes out here. Well, error cat. Oh,
sorry. I made two mistakes here.
I also have to change the prototype up
here to say void which means again meow
takes no inputs. Let me go ahead now and
rerun make cat. And there we have an
undeclared identifier, again, n. So in cat.c
line 14, which is here, it doesn't like
that I'm using n. But wait a minute I
created n here but for the same logic as
earlier. That's fine. You created n on
line 8. But where does n exist? In what
scope?
Yeah, only between the curly braces,
which is lines seven and 10. So by the
time you get down to 14, it's out of
scope, so to speak. So it just doesn't
work. So the solution is exactly what I
did the first time. I can pass it into
meow as input, and I have to tell C to
expect that input. And I can use the
same name, but arguably that's going to
get confusing sometimes. But let me do
this. Let me go back into my code. Let
me undo this change such that now meow
does take an input, but instead of just
calling it n and using n everywhere for
number, this is crazy. Let's just call
this like times. So meow takes some
number of times and then it uses that
value. And now I'm passing in on line 9
n, but in the context of the meow
function on lines 12 onward, that same
value of n is now referred to as times
because you're passing it in as input
and giving it its own name. And that's
totally your prerogative. It's just a
matter of scope. I mean, I could have
called it m or some other letter of the
alphabet, but times is even more clear
because that's the number of times I
want the cat to meow. But again, the
whole point here is just this matter of
scope.
All right. So, let's take a higher level
look now at some of the things we've
been thinking about and then we'll do a
final deep dive or two on some of the
problems that we can
solve with all of these building blocks
and some of the problems that we're sort
of ignoring for now. So, when it comes
to writing good code, CS50 and really
the world in general tends to focus on
these kinds of axes. Correctness,
design, and style. What does this mean?
Correctness just means does the code
work the way it's supposed to? In the
context of a class, it should do exactly
what the homework assignment aka problem
set tells you to do. In the real world,
it should do exactly what someone
decided the software should do, the
product manager, the CEO, or the like.
Correctness just means it behaves as it
should. That's different though from how
well designed the code might be. And
we've seen that a few times. I've had
some simplistic examples in Scratch and
C that were 100% correct. Like it did
the right thing logically, but I was
wasting the computer's time. I was
wasting the human's time by asking more
boolean expressions than I needed to and
so forth. So design is more about, like
in the world of English, not
only saying things that are correct but
doing it well, like making a good
cogent argument, not just one that
happens to be correct. Style meanwhile
is the third axis on which we might
evaluate the quality of someone's code
and that's more of the aesthetics, like
is everything pretty-printed, that is,
nicely indented? Are variables well-named
and not just called x, y, z
arbitrarily or something like that? So
style matters really to other humans,
not to the computer, but to other
humans. And to illustrate these, you'll
see that in problem set one onward,
you'll be given a number of tools that
you can use. So one of those tools is
called check50. And in each problem set
problem in C and Python and other
languages, you'll be shown how you can
test your own code. And you can
literally run a command that CS50
created called check50. You'll then
specify what's called a slug, which just
means a unique identifier for that
homework problem, and you'll get
quick feedback on whether or not your
code is correct. It doesn't mean it's
well implemented or well-designed or
pretty that is well stylized. But at
least that's the first gauntlet in
getting good code submitted. Design
though is much more subjective. Design
is something you get feedback on from a
human for instance in section or a
teaching assistant or in software. You
can actually see at the top of VS Code there are
a couple of buttons that I haven't yet
used but could. design50 is built on
top of the CS50 duck, whereby if you have
a program open in a tab and you click
design50, you will get ChatGPT-like
advice on how you can improve not the
correctness of that code but the design
of that code, the quality thereof, which
is a bit more subjective and modeled
after what a good teaching assistant
might say. style50, meanwhile, is a
third tool that will provide you with
feedback on the style of your code and
will show you on the left what your code
looks like and on the right what your
code really should look like in so far
as it should be consistent with what
we've taught in class and consistent
with CS50's so-called style guide. And
those of you who have some prior
programming experience undoubtedly won't
like some of CS50's stylistic choices.
And that's going to be the case in the
real world, too. But as I alluded to
earlier, in typical companies, you would
have an official style guide or tool to
which everyone adheres so that
everyone's code actually looks the same
as everyone else's even though people
have contributed different solutions to
problems. So correctness, design, style
is not only how we but really the world
at large tends to evaluate the quality
of code and we do it by way of these
CS50 specific tools here. All right, how
about one final flourish then to this
here program? Back in VS Code, I've got
a correct solution right now. It's
well styled, I'll stipulate, even though
it could stand to have some more
comments. So, for instance, I could do
something like this, like meow some
number of times, a comment to myself. Or
up here I could say something like
get a number from user, just to remind
myself and my TA or my colleague what it
is this code is doing. But what more
could I do in the way of design? Well,
this function here, get_int, will indeed
get me an integer, not just positive
or zero but also negative. And I could go in
and add a bunch of code like before like
I could actually do instead of this line
I could do something like int n
semicolon, do the following. All right. n
equals get_int and then I can say what's
n question mark and then after that I
can do something like while n is less
than zero keep doing that so I can have
a pretty verbose implementation of
getting user input or I can implement
another function of my own that only
gets a positive integer or non-negative
integer from the user. For instance, I
might do something like this. I could
declare, maybe below my
main function, a function like this:
int, how about get_n, and then inside
of this I might say void because I'm not
going to pass in any input. Then inside
of this function is where I'm going to
do int n; do, n equals get_int,
quote-unquote what's n question mark, and
then down here I'm going to do while n
is less than zero but rather than do
something immediately with n because I'm
no longer inside of my so-called main
function. What I'm going to do, which is
new, is return this value n. And notice
that this notion of returning a value,
which is the first time I've done this
explicitly, is consistent with this
little hint here on line 19, which
implies that this get_n function, which
I'm inventing, is going to return not
void, which means nothing, but an
integer. And that's the whole purpose of
this function in life. Now, if I scroll
back down here, I can get rid of this
whole block of code and just say get_n
from the user and then I can immediately
call meow with that value. I need to do
one other thing. I need to highlight
this line of code here and I'm going to
go ahead and add another prototype up
top, which is the only time again for
now that copy-paste is encouraged and
best to do. So, I've invented my own
function get_n. The whole point being now
I have this sort of abstraction here of
a function whose sole purpose in life is
to get me not just an integer but one
that is zero or positive and not
negative. If I open my terminal window,
clear the mess from before, make this
cat, ./cat. What's n? 3. I'm now
back in business. And again, we've
essentially translated from Scratch last
time into C this time exactly how we
might now modularize the code, abstract
away these lower-level details, and
ultimately create my own functions that,
as before, can take arguments, and that
in this case have not just side effects but also
a return value this time.
All right. So as you walked in we had a
little walkthrough of Super Mario
Brothers playing from yesteryear, which
was a side-scrolling game in which Mario
would jump down and go up down left
right and try to collect coins and make
it to the end of the level. There's a
lot of obstacles throughout this kind of
game, whereby the world might look a
little something like this. Like there's
a pit that Mario's got to jump over and
then there's these coins hidden
typically behind these question marks
that he can jump up and hit his head
with and actually accrue points. Now,
we're not going to do anything graphical
just yet. We're leaving graphics behind
for now in the form of Scratch. But with
C, we can implement some of these ideas.
For instance, if I were to write code to
generate just this row of four
question marks, I dare say there's a
bunch of ways we can do this. In other
words, let's see if we can't use all of
today's building blocks to start
implementing our own tiny version of
Super Mario Brothers in a file, say,
called mario.c. So, let me open and
clear my terminal window. Let me run
code mario.c. And let's just try to do
something super simple like print four
question marks in a row. Well, to do
this, I need printf. So, I'm going to
include stdio.h. I'm then going to
do int main void. More on that next
time. And inside of main, my default
function that just automatically as
before gets called for me. I'm going to
print out the simplest possible
implementation just print out four
question marks like that. So no need per
se for a loop just yet. But I think we
can go down that rabbit hole too. Let me
go down into my terminal window. Make
this version of Mario, ./mario.
Enter. And voila, we have a very black
and white version textual version of
four question marks in the sky. Now I'm
kind of cheating here by just hard-
coding four question marks. What if I
wanted not four but three or five or
some other number? Well, we could
do that with a loop too. So let me
change this code here and do something
like this: for int i equals, say, zero; i
less than, say, four for now; i++. Then
inside of this loop I can print out one
question mark at a time, semicolon. Now
let me go back to the bottom. Make this
version of Mario, ./mario. Enter. And
voila. It's not actually correct this
time. So why am I getting a column
instead of a row with this here change?
Yeah.
>> Yeah. So I foolishly included
the backslash n after each question
mark. Okay. So that seems like an easy
fix. Let me get rid of that. Let me now
recompile Mario. Rerun Mario. And now so
close. Now I've just done something
stupid. All right. I need the
backslash n. So, I think I do want this here.
Or
what do you propose instead?
>> Yeah, I should really put the backslash
n outside of the loop. So, once I'm
done printing all of the question marks,
then I get the backslash. And that's
fine, even though we haven't seen this
before. Backslash n is an escape
sequence that you can certainly print by
itself. So, I do quote-unquote backslash
n outside of the loop, below those
curly braces. Now, if I do make mario,
./mario, now I get the four
question marks in a row as well as the
new line at the very end. So, again,
kind of a little baby exercise, but
demonstrative of how you can just take a
few different techniques, a few
different building blocks we've used to
compose a correct solution to what a
moment ago was a brand new problem.
Well, let's try another. So later on in
Super Mario Bros., not in the
underground world but still above ground,
you see a column of bricks like this that
Mario has to jump over. So
how might we make a column? Well, we kind of
had that solution already. And in fact,
if I go back to VS Code here and just
change this version of Mario, I think we
can design this thing to be pretty
much the same, with i less than three
this time. And I do want to put the
\n at the end there. Make mario,
./mario. And albeit textual, I've got my
column of three. Well, let's see, I don't
want question marks. Let's make this a
little better. Maybe we'll use the hash
symbol, because that kind of sort of
looks like a square. So, make mario,
./mario. Okay, now we're back in
business. But let's make it more
interesting by going into Mario's
underground now. And here's the third
and final Mario problem, whereby we want
to implement this 3x3 grid of
bricks circled here. So, this one's
interesting because we've never done
something in two dimensions. I did
horizontal, I did vertical, but we
haven't really composed those ideas into
the same program. So, let me now think a little
harder this time about how I can print
out row, row, row. And this is where, if
you have in your mind's eye any
familiarity with old-school
typewriters, it's kind of the same idea:
you want to print a row of bricks,
then go back to the beginning, a row of
bricks, then go back to the beginning,
and a row of bricks. And that's kind of
what printf has always been doing for
us. It's printing line by line by line
of text. It's not jumping around. So, we
can leverage that perhaps as follows.
Let me go into my main function here.
And if I want to print out something
two-dimensional, let me kind of think
about it as rows and columns. So, maybe
I could do this: for int i equals 0, i
less than 3, i++. Why? Well, I want to
do something three times. Even if I have
no idea where I'm going with this
solution, I at least want to do
something three times, like three rows
of text. But how about this? On each
row, what do I want to do? I want to
print out three things. So I could steal
this idea, like int i equals 0, i less than 3,
i++. And then inside of this loop, let
me just print out one brick at a time.
No new lines yet. One brick at a time.
But there is a bit of a problem here.
It is correct to nest loops in this
way. Totally fine to have an outer loop.
Totally fine to have an inner loop. But
I probably don't want the inner loop's
variable competing with the outer loop's
variable by giving them the same name.
And that's easily fixed. It is pretty
conventional in code, when you want
another integer and it's not i because
you've used it already, to just use j.
So using i and j and k is
generally fine. If you're using l, m, n,
o, like at that point, you're probably
doing something wrong. There's no hard
line, but at some point it gets
ridiculous and you should be coming up
with better variable names. But i and j,
maybe k, are fine. So now what's really
happening? Let me suppose that this
loop is my "for each row," and this one
is my "for each column, I want to print
one brick." Now, this isn't quite correct,
but let me go ahead and make this version
of Mario, run ./mario, and ah, now there's
what? One, two, three. There are nine
bricks there, all on one line. So I'm
close, right? It's supposed to be 3x3,
nine total. What do I want to do, though,
to get this just right?
Yeah, over on the left. On what line
number would you put it, or after what?
Where would I put the new line?
Because I don't think I want to put it
here; I'm going to get myself
into trouble as before.
>> After the what?
>> After 13. Yeah. So, after I finish
printing each brick in the row, column
by column from left to right, I'm going
to go ahead and print out a single new
line here, nothing else. And now, if I
open my terminal, run make mario again,
and ./mario, now we've got it. And it's
not a perfect square like this one is,
because the hash symbols are kind of
taller than they are wide,
but it's pretty darn close. The
takeaway here being, you can certainly
nest these kinds of ideas and compose
them. And honestly, i and j are maybe
making this more confusing than
necessary. I could just give these better
names, like row, and then maybe col for
column, or I can spell it out as column
if that's clearer, just to
make clear to myself, to my TA, to my
colleagues what exactly these variables
represent. And indeed, like an old-school
typewriter, the outer loop is
handling row by row by row. But each
time you're on a row, you first want to
do column, column, column. And that's
what logically the nesting is achieving.
And again, if I do make mario, ./mario,
all I've done is change variable names.
It has no functional effect beyond that.
Now, this is a little more subtle, but
there is a bit of duplication in this
program. There's a bit of magic, and
this is subtle, but does anyone want to
conjecture what still could be improved
here?
What is maybe rubbing you the wrong way?
>> Yeah, I've hardcoded the three here and
here. It's not a big deal. It's like an
in-class exercise. Like, who really
cares if I'm just manually typing three?
But if I want to make this square bigger
and bigger and bigger over time, I'm
going to have to change it in two
different places. And as I've conjectured
last time and today, eventually that's
going to come back to bite you. You're
going to do something stupid, or a
colleague isn't going to realize you
hard-coded three in multiple places.
That's just bad design. So, how could we
fix this? Well, we could just declare a
variable like n, set it equal to three,
and then use n in both places. And
that's pretty darn good. That's better
because now we're reusing the value. But
we can do one better than this. It turns
out that in C, and in many languages too,
there's the notion of a constant. If
you want to store something in a
variable, but you want to signal to the
compiler that this value should never
change, and better still, you want to
prevent yourself, a human, not
to mention a colleague, from accidentally
changing this value, you can declare it
to be constant, or const for short. So if
I go back into VS Code on line five now
and say const int, that means that n is an
integer that has a constant value. So if
I do something stupid later in my code
and try to set n equal to something
else, the compiler won't let me do that.
It will protect me from myself. So, it's
just a slightly better design as well.
All right, questions on any of these
Mario examples, the first of our sort of
real-world problems, albeit simplified
textually?
All right, let's focus lastly on things
we can't really do well with computers,
namely some of the limitations
thereof. So, here is a cheat sheet of
some of the operators we've seen thus
far. We played with some of these for
comparison and for doing some addition
or the like, but here we have addition,
subtraction, multiplication, division,
and the modulo operator, %, which is
essentially the remainder operator: it
gives you the remainder of a division
with a single operator like this. Let's use
some of these to make our own calculator
and see what this calculator can and
can't do for us. So back here in VS
Code, let me open my terminal. Let's go
ahead and create a program called
calculator.c. And in this program, let's
do something super simple initially that
just adds two numbers together. So
let's first include cs50.h so we can
use our get_ functions. Then let's go
ahead and include stdio.h so we
can use printf. Let's just copy-paste
our usual int main(void). And
inside of main, let's do this. Declare a
variable x. Set it equal to get_int. And
let's ask the user, what's x question
mark. Then let's declare another
variable y, set it equal to get_int, and
ask the user, what's y question mark.
Then let's do something super simple,
like give me a third variable. Heck,
we'll call it z. Set it equal to x + y.
And then lastly, let's just print out
the sum of x + y. So this is a super
simple calculator for addition of two
numbers. printf, quote unquote. What's
the answer going to be? Well, it's not
%s. What's the placeholder to use for an
integer?
%i, then \n. And what do I want
to substitute for that placeholder?
Just z in this case. We haven't quite
done this before, but again, it's just the
composition of some of our earlier ideas.
I can go ahead and make calculator,
Enter, ./calculator, Enter. What's x? One.
What's y? Two. And indeed, I get three.
So not a bad calculator. It seems to be
working correctly, but it's maybe not the
best design. It's generally frowned
upon to create a variable like z if
you're only going to use it a moment
later in one place. Like, why are you
wasting my time creating a variable just
to use it once and only once? Sometimes
it's fine if it makes your code more
readable or clearer. And in fact, it
might if I called it sum. That's
arguably a net positive because I'm
making clear to the reader that it's the
sum of two variables. But even then, I'm
quibbling. I could just get rid of that
third variable altogether. And heck, I
could just do x plus y right here.
That's totally fine and reasonable,
especially since it's still a pretty
short line of code. It's not hard for
anyone to read. Feels like a reasonable
call. But this hints again at my comment
on design being subjective. There are no
hard-and-fast rules here. Some of the TAs
might disagree with me, but this
feels fine. It's readable, which is
probably the most important thing
ultimately. Let's make calculator,
./calculator, Enter, 1, 2, and we still get
three. So the code now is still working.
As an aside, if you're starting to
wonder how I type so fast, sometimes I'm
kind of cheating with autocomplete. So
if I know I want to create a program
called calculator and calculator.c
exists, I can start typing c, hit Tab,
and the rest of the file name will
autocomplete if it happens to exist there. Better
still, if I want to go back to previous
commands I've typed, I can actually use
my up and down arrows to go through my
history. So if I go up, up, you'll see
all of the recent commands I typed, and
that saves me time, too. So just little
keyboard shortcuts that speed things
along. All right. Well, let's
do something like this. Not just
addition: why don't we use some
multiplication? So, how about we prompt
the user not for two numbers, but
just one initially, x, and let's go
ahead and multiply x by two. I would
do x asterisk 2, asterisk being the
multiplication operator in C. Let's make
this version of the calculator, ./calculator.
And now, what's x? Let's do 1. So 1 * 2
is 2. Let's do this again. Let's type in
2. 2 * 2 is 4. Let's do this again. 3. 3
* 2 is 6, and so forth. That's fine. It
seems to work. But maybe let's implement
a recent meme from the past year or
two. How about this? Let's see
if you recognize it as we go. So, I'm
going to get rid of this code
altogether. And inside of my calculator,
I'm going to do something like int
dollars equals 1 by default. Then I'm
going to deliberately induce an infinite
loop just for demonstration's sake. Then
I'm going to get a character from the
user using get_char, which gets a single
character. How about I tell the user,
here's $%i, with a US dollar sign before
the %i placeholder, double it and give it
to the next person question mark, if
you're familiar with that one. I'm going
to prompt them for a yes/no answer, but
I'm going to plug in the current number
of dollars so they know what they're
wagering on. Then below this, I'm going
to say if the character the human typed
in equals equals y for yes, then I'm
going to go ahead and do dollars times
equals 2, which recall was our shorthand
notation for doubling something. In this
case, I could more pedantically say
dollars equals dollars times 2. But again,
I can save some keystrokes and do dollars
times equals 2 instead. There's no
++-style trick for doubling, no star-star
trick. You have to do it in this way,
minimally. However, if the user does not
want to double it and give it to the next
person, then let's do an else and just
break out of this infinite loop
altogether. But notice what I've
deliberately done in get_char, similar to
printf: I have included a placeholder.
We implemented get_char and get_int and
get_string just like printf, in that you
can pass in placeholders and plug in
values. Why? Well, again, for the meme's
sake, I want to be able to tell the user
how much money I'm about to hand them
when I ask them the question, do you
want to double it and give it to the
next person? I want them to see the
number. And the dollar sign is just
because we're talking about dollars. The
%i is because we're talking about
integers. All right. If I didn't mess
this up, let's make this version of a
calculator, or meme. So far so good.
./calculator. Enter. Here's $1, which
was the initial value of my dollars
variable on line six. Double it and give
it to the next person? All right, y.
Here's $2. Double it and give it to the
next person? Okay. Okay. Okay. Okay.
Okay. I'm going to do it faster. It's
getting pretty good. You can see the
power of exponential growth.
It's getting pretty high. Let's keep
going. Keep going. Lot of dollars.
Too far.
That does not happen in the memes. What
happened here?
What's going on? Yeah. What do you
think?
>> Exactly. Good intuition. The
computer only has a finite number of
bits allocated to each integer. I
hypothesized earlier that it's usually
32 bits, maybe 64 bits, but it's finite,
which means you can only count so high.
With 32 bits, that's roughly 4 billion
values, but an integer by default can be
negative or positive, so the highest it
can go is roughly 2 billion, and
that's pretty close to what we were
getting here. In effect, we overflowed the
integer in memory. Indeed, integer
overflow is a term of art whereby you
overflow an integer by trying to
store too big of a value in it. And the
reason for this, again to make this
clear: this is a piece of memory inside
of a laptop or a desktop or some other
device. And in these little black chips
is a whole bunch of bits, or really bytes,
that can store information
electronically. But programs allocate those
bits in units of 8, maybe 16, maybe 32,
maybe 64, but finitely many per value.
And whether we're using 32 or 64, you
can only count so high if you have a
finite number of bits. And we've seen
this problem even on a small scale with
our light bulbs last week. If we
have a three-bit number, as represented
by three physical light bulbs or
three tiny transistors in the computer,
I can count from zero to one to two to
three to four to five to six to seven. If
I want to count to eight, though, I need a
fourth bit. But as the red suggests, if
you don't have a fourth bit, for all
intents and purposes, that number is
just zero. Or, as an aside, depending on
how you're representing your number,
sometimes a leading one indicates that
the number itself is negative, which is
why in VS Code, we actually saw both
symptoms. First, we went negative
because we wrapped around logically,
much like that fourth-bit one resulted in
our getting back effectively to zero, and
then we did indeed end up on zero
ultimately. So, how can we chip away at
this? Well, a couple of solutions
perhaps. Let me close my terminal window
here, and instead of using an int, well,
let's just kick the can down the road.
Let's use a long, which is 64 bits here.
So at least we can give away even more
money in this scenario. I can't use %i
and need to use %li now for a
long integer. But I think at this point,
if I go back to VS Code's terminal
window here. Oh, and I quit that program
by hitting Ctrl-C quickly. Now I'm going
to go ahead and do make calculator again,
./calculator. And I'm just going to keep
hitting y. But because I'm using a long
int now, and thus 64 bits, if I do this
long enough, it's going to get crazy
high, much, much higher than before.
High enough that I'm not going to keep
typing y, Enter, because we're never
going to hit the boundary. But
eventually, especially if I did this in
a loop automatically, it would certainly,
oh. Oh, okay. I guess exponential growth
works fast. Okay, so it did work. I
didn't think I was going to hit it
enough times, but the same problem
happened again. We overflowed this long
integer, even using that many bits,
because I was talking so long that I kept
hitting y enough times to overflow even
that long integer. So that too was a
problem, and this truly happens in the
real world. So pictured here is a Boeing
787 from a few years back, long before
all the more recent problems with Boeing
planes, which had a bug that struck after
248 days of continuous power. Continuous
power is kind of a thing in the aviation
industry: time is money, and generally
they want the planes in the air as much
as possible, which means they want them
powered on as much as possible, which
means they don't turn them off at
night. They keep them going and flying.
The New York Times
reported a few years back that a model
787 airplane that has been powered
continuously for 248 days can lose all
alternating current electrical power due
to the generator control units
simultaneously going into failsafe mode.
This condition is caused by a software
counter internal to the GCUs that will
overflow after 248 days of continuous
power. Boeing was, at the time, in the
process of developing a GCU software
upgrade that would remedy the unsafe
condition. So literally what this means
is that the power to these planes would
just shut off if the planes were on for
more than 248 days at a time, and it was
common for planes to stay powered on for
long stretches. Why was this actually
happening, and what was the solution?
Well, what was the short-term fix, since
it took a while for Boeing to fix this?
What would you do if the symptom is
that the plane shuts off mid-flight
after 248 days? Yeah.
>> Turn it off and back on.
>> Literally turn it
off and back on again, much like you've
probably been taught with your phones
and computers and any other electronic
devices that somehow freak out on
occasion. Reboot the plane. Now, why does
that work? Well, anytime you reboot a
phone or a laptop or a plane, all of
those variables get reset to their
default values, which, if it's the first
line of code like in some of my examples,
means a counter gets set back to zero
again, since the code is executed from
top to bottom. So, this effectively
solved the problem, and once they
finally rolled out a fix, you
didn't have to do that anymore. But the
source of the problem is essentially
that they were probably using 32-bit
integers that also allowed negative
values. So they had 31 bits at their
disposal to count positive numbers. And
2 to the 31st hundredths of a second is
roughly 248 days, which means once you
count in hundredths of a second for 248
days, you would overflow the integer, and
the power would shut off, effectively
because the counter wrapped around. So,
there was a lot of technical speak in
that, but it boiled down to just a simple
integer overflow. There's a historical bug in
Pac-Man, if you've ever played it
in any of its forms, whereby you can
play up to level 255, but because there
was a missing if condition that checked
what level you were on, you could
accidentally garble the screen if you
were amazing at Pac-Man, because the game
too would overflow an integer, and
random characters would end up appearing
on the screen. So, it's sort of a
badge of honor to actually hit level 256
in this way because of this bug. But
there are yet other issues we can see
here. And if you don't mind, we might go
a couple minutes over, but let me just
demonstrate what these examples can do
for us here. Let me revamp my
calculator as follows. After hitting
Ctrl-C to kill that and clearing my
terminal window, let me go ahead and get
rid of all of this meme code here,
scrolling down to the inside of main, and
let's just do a couple of things like
this: int x equals get_int, quote unquote,
what's x question mark. Then let's go
ahead and do int y equals get_int, quote
unquote, what's y question mark. Then
let's go ahead and print out just x / y.
So here's %i\n, then x / y.
This would seem to be a calculator now
for division, which I can make as before.
And actually, sorry, there's an error:
missing terminating. Oh, I see. Missing
a double quote. There was an unintended
bug. So, if I make this calculator,
do ./calculator, type in 1, type in
three, I get zero, which is weird. What
if I do instead maybe two and three?
It's zero instead of 0.66 repeating.
What if I do three and three? Well, that
curiously works. But if I do something
like four and three, which would be 1.33,
that too doesn't seem to work. So there's this
other issue in computing, known as
truncation: even when you're trying
to do math whose answer has a decimal
point, if you are dividing
integers, you're going to throw away
everything after the decimal point
unless you're explicitly using the right
data type. And we saw an allusion to
this earlier. So let me actually go in now
and change my values from integers to
floats, change the %i to
%f, and remake this calculator.
Now I can do 1 / 3 and I actually get
back 0.333333, a better response. But
there's another issue latent here, which
happens too in the real world, whereby
I'm going to tweak this %f to be a little
arcane. It turns out you can tell C how
many digits you want to show after the
decimal point by just using a dot and then a
number, like 50, arbitrarily. And contrary
to what you might have learned in grade
school, this calculator would seem to
think, if I run ./calculator with 1
divided by three, that the answer is not
0.3333 infinitely many times. There's all
this random stuff happening at the end.
Long story short, this is because
computers only use finitely many bits,
even to represent floating-point numbers.
And since there are infinitely many real
numbers, you can't possibly represent
every possible floating-point value. So
we're essentially seeing an approximation
of 1/3, not 1/3 precisely. And this too
happens quite a bit in the wild. There's
really no solution to this other than
throwing more bits at the problem, using
a double instead of a float, or at least
somehow trying to detect this and catch
this. That then is what we'd call
floating-point imprecision. But to tie
this together and sort of induce a bit
of fear for the coming years, these
things happen all of the time. Back when
I was finishing school, there was the
so-called Y2K problem, or year 2000
problem, whereby for decades, computers
had been using not four digits to
represent years, but just two, because it
was convenient. It was more efficient
because you use half as much memory to
represent, say, the year 1999, just
using two digits instead of four. Of
course, when the year rolled around
from 1999 to 2000, if you didn't have
all four digits in memory, you might
confuse 2000 with 1900, since 19 was the
presumed century if you're only storing
two digits. So, we screwed that up. And
thankfully, the world scrambled to fix
it. And if you read up on Wikipedia and
news articles from the time, a lot of
people thought the world might very well
end, but it didn't. So, you'd
think we'd have learned our lesson.
Unfortunately, another such problem is
coming up in the year 2038, whereby
historically, since the 1970s,
computers have generally used 32-bit
integers to keep track of time, the date
and the time, by counting how
many seconds have passed since January
1st, 1970. All of the math is just
relative to that date because that's
when computers were really starting to
come onto the scene, if you will.
Unfortunately, a 32-bit integer can only
represent about 4 billion values, or
only about 2 billion going forward if it
also allows negatives, counting from
January 1st, 1970. And so, on the
date January 19th, 2038, we will
overflow a 32-bit counter. And suddenly,
if this problem is not fixed by you or
other people before the year 2038, our
computers and phones and other devices
may very well think it's December 13th,
1901.
So, there are solutions to these
problems. CS50 is all about empowering
you with solutions to these problems.
But if you'd like to scan this
code here, this will add that date to
your Google calendar or your Outlook
calendar. Keep an eye on it. That, though,
is week one for CS50. Problem set one
will be in your hands soon. We'll see
you next time.
One fish. Two fish. Red fish. Blue fish.
>> Congratulations.
Today is your day. You're off to great
places. You're off and away.
>> It was a bright, cold day in April, and
the clocks were striking thirteen. Winston
Smith, his chin nuzzled into his breast
in an effort to escape the vile wind,
slipped quickly through the glass doors
of Victory Mansions, though not quickly
enough to prevent a swirl of gritty dust
from entering along with him.
All right, this is CS50, and this is week
two. And after this dramatic reading, if
we could, a round of applause for our
volunteers.
So we can now take for granted, from week
one, that we have a new way to
express some of the ideas that we first
explored in week zero, like functions and
conditionals and variables and the like.
We're now doing in C what we used to
do in Scratch. Today, we're going to
start to focus on some real-world
problems, now that we can take for granted
that we have that expressiveness. We
have some tools in our toolkit, so we can
actually start to solve some real-world
problems, or at least representative ones.
In particular, the real-world problem that
we're going to start with today and this
week is that of reading levels. Odds are,
when growing up, you read at a certain
level based on your age.
Maybe it was a first-grade level or
fifth-grade level or 10th-grade level or
the like. And that was a function of
just how comfortable you were with the
words in the book or words on the screen
that you were reading. What you've just
heard, thanks to our volunteers, are
passages at three different reading
levels, one from each of these three
volunteers. And in fact, why don't we go
ahead and hear them again and be a little
more thoughtful this time in assessing at
what reading level your classmate is
reading. So, let's start with Leah, if
you'd like to introduce yourself first.
>> Hi, I'm Leah. I'm a first-year in
Holworthy. And here is my little thing.
One fish, two fish, red fish, blue fish.
>> So, at what reading level would you say
Leah reads, based on her recitation
thereof? Yeah, in the front.
>> Kindergarten.
>> Kindergarten. Okay. So, a fairly
young age. And what makes you say
kindergarten?
>> She is speaking in very short phrases
without much complexity.
>> Okay. Very short phrases without much
complexity. And indeed, according to one
scientific measure that we'll explore in
this week's problem set, we
would say that Leah reads before grade
1, so kindergarten would indeed be apt.
But welcome to the stage here. Let's
move on now to Maria, if you'd like to
introduce yourself.
>> Yeah. Hi, I'm Maria. I'm in Stoughton,
thinking of applied math. Um,
congratulations. Today is your day.
You're off to great places. You're off
and away.
>> Another familiar phrase, perhaps. At
what reading level would you say Maria
is?
Well, yeah. Over here.
>> Third grade.
>> And what makes you say second or third
grade?
>> Okay.
>> So, now we're starting to introduce
complexities like rhyming and a bit more
substance to the quote. And indeed,
based on that same measure
that I described earlier, which will
involve a mathematical function that
somehow analyzes what it is Maria just
said, we would conclude that she
read at a third-grade level, or grade
three. Finally, Omar, if you'd like to
introduce yourself and read yours
once more.
>> Okay. Um, so, hi everyone. I'm Omar. Um,
I'm a freshman in Hurlbut, thinking of
doing CompSci, and this is my reading. Um,
it was a bright cold day in April and
the clocks were striking thirteen. Winston
Smith, his chin nuzzled into his breast
in an effort to escape the vile wind,
slipped quickly through the glass doors
of Victory Mansions, though not quickly
enough to prevent the swirl of gritty
dust from entering along with him.
>> All right, that sort of escalated quickly.
What reading level is Omar at, would you
say? Someone else.
What might you say or estimate?
Yes, right here in the front.
>> Eighth grade.
>> Okay, eighth grade. And what made you
say that?
>> More comp,
>> More complex sentences, more complex
words. And indeed, according to that
same measure, this full paragraph of
text now, which indeed has even more
grammar when you see it there on the
screen, would be said to be at grade 10
because of that added complexity. So,
with that said, we're going to need to
be able to somehow sort of crunch these
numbers to determine, given a body of
text, at what reading level it is.
But in order to do that and apply any
metrics to a body of text, we're going
to need to represent that text in memory
using something like strings from last
week. But last week with strings, we
could really only print them out or
display them wholesale on the screen.
I think we're going to need to break
down these various texts, and others like
them, at a finer-grained level. And
indeed, among the goals for today is to
explore exactly that, and also to take the
proverbial hood off of the car to take a
look underneath at how the computer is
actually working, how these things like
strings are actually functioning. So, if
you could join me one last time in a
round of applause for our volunteers.
Thank you so much for helping out. Thank
you guys. Thank you. Thank you to Maria
as well. So among the goals for today,
beyond exploring a representative
problem like this of reading levels, is
going to be another one, which is even
more important and more omnipresent than
reading levels, namely cryptography: the
art of scrambling information, or
specifically encrypting it, so you can
send secure communications. Now, you sort
of take this for granted increasingly
nowadays, that when you send a text
message or perhaps an email or check out
online with a credit card, somehow
or other your information is secure. And
over the coming weeks, we're going to
explore to what extent that is actually
true, and why or why not. Now, with
cryptography, similarly, if we want
to be able to send messages securely,
such that if I want to send a message to
you, I don't want anyone else in the
room to be able to figure out what it is
I have said, even if they physically
intercept that message, which is all too
possible in a digital world, we're going
to need to come up with mechanisms for
actually scrambling information in a
reversible way, so that I can write my
message and somehow scramble it, you can
receive that message even after it's
passed through many other hands, and you
can descramble, or decrypt, that same
message. So for instance, here
on the screen is a message, a fairly
simplistic one, that has somehow been
encrypted. And we'll see by the end of
today and by the end of this week that
this encrypted message, and there's a bit
of a tell at the end there, actually
decrypts to "This is CS50." But
why is going to be the underlying
question, and what additional tools do we
need in our toolkit in order to do that?
Another word on tools. So, up until now,
you've probably experienced some bugs,
whether it was in Scratch or ever more
so in C. In fact, don't feel too bad if
the very first program you wrote in
C didn't even work. You couldn't
even make it, or compile it, until you
went back and fixed some of the code
that you had written. Well, it turns out
that bugs, mistakes in programs, are ever
so commonplace. And even though we've
already provided you with tools like the
virtual rubber duck at CS50.ai, also
embedded into VS Code at CS50.dev,
of whom you can ask questions along the
way, among the goals today are to give
you some lifelong tools for how you can
actually debug software yourself when
you don't have a duck nearby, when you
don't have a TA nearby, let alone any
humans at all. So with debugging,
there's going to be a number of
techniques that we can use all toward an
end of like finding and removing bugs or
mistakes from our software. And perhaps
the person best known for having
popularized this term of bugs is that of
uh Dr. uh Grace Hopper pictured here who
was a rear admiral in the Navy and was
one of the original programmers of the
so-called Harvard Mark1, a very early
mainframe computer that if you wander
across the Charles River over to the
science and engineering complex here at
Harvard, you can actually see part of
this on display still in the lobby. It
was succeeded by the Harvard Mark II.
And on the Harvard Mark II, Dr. Hopper
and her team were known for having put
this note in their log book after having
done some number crunching on the system
there. And if we zoom in, they had found
a problem with the computer this one day
whereby there was literally a bug, a
moth inside of the circuitry of the
computer. And as was written here, first
actual case of bug being found. And ever
since then, do we say ever more so, the
phrase bug and debugging when it comes
to finding and eliminating problems in
our code. So let's start with just that.
In fact, let me go over to VS Code, and
let's deliberately make some mistakes
together that might very well be
reminiscent of some of the mistakes
you've accidentally made thus far, but
along the way give you all the more
tools for finding and solving those
problems in your own code, as opposed to
having to ask someone else, be it
virtual or physical, for help. Let me go
ahead and consciously create in VS Code
a program known to be buggy, called
buggy.c.
And in this program, let's start with some
fairly familiar code. I'm going to begin,
just like we did last week, with
int main(void). More on that before long
today. Inside of my curly braces, I'm
going to say printf("hello, world").
That's it. Now I'm going to go back to my
terminal window here and run make buggy
to make a program from that source code.
But before I do, odds are, even after
just a week of this stuff, you can
probably spot a few mistakes I've made,
a few bugs. What do you see wrong
already? Yeah.
>> Include stdio.h.
>> I didn't include stdio.h, that so-called
header file, which is important because
it tells the compiler that I plan to use
functions declared therein, like printf,
which clearly I'm doing. So let me go in
and include stdio.h.
What else seems to be wrong here? Yeah.
I'm missing a semicolon at the end of
line five here, so I'm going to go ahead
and add that in. And this one is subtle
and arguably not a bug, but maybe an
aesthetic detail. What else have I done
arguably wrong? Yeah, in back.
>> Yeah, I forgot my backslash n, the newline
character, to move the cursor to the
next line so that when I get a new
prompt, it's on a fresh line of its own.
Again, it's more of an aesthetic detail,
but certainly a pretty reasonable thing
to do. So let me go ahead now and
actually run make buggy in my terminal
window, and it indeed compiled. But had
I not fixed those mistakes, I would have
triggered a whole bunch of error
messages as a result. In fact, let's
rewind in time, undo the fixes I just
made, go back to the original form here,
and try running make buggy again. Enter.
And we'll see some scary-looking
messages up here. Let me scroll up to
the top of the output, where we see
buggy.c:3, which means line three.
That's where the problem is right now:
"error: call to undeclared library
function 'printf' with type..." And then
it starts to get a little more
complicated, but I do see clearly that
it's calling my attention to printf. So
hopefully at some point, if not last
week then this week onward, your
instinct will be, "Ah, all right, I'm an
idiot, I forgot the header file in which
printf is actually declared." It's not a
huge deal; it's going to come with
practice. So that's how I might know,
more intuitively, what the solution here
might be. Now, here's another common
mistake. I've just gone in and "fixed"
the header, but I did do something
wrong, and hopefully none of you
actually did this, because it's an
annual FAQ. What did I just do
accidentally wrong?
So it's not studio.h, it's stdio.h, so do
ingrain that one: standard input/output.
The next bug, though, that I haven't yet
fixed is that semicolon. So let me clear
my screen and rerun make buggy. I should
no longer see that first error message.
But I now do see another error message,
on line five: "expected ';' after
expression." All right, that one's pretty
explicit, so I'm going to go ahead and
fix it. But notice that up until now, my
code couldn't compile because of those
two errors; the compiler stopped me by
showing me these error messages. At this
point, though, if I run make buggy,
Enter, it did in fact compile. And yet
it's arguably still buggy, because when
I run ./buggy, I get my prompt on the
wrong line. So this is a distinction now
between a syntax error, a programming
error that outright stops my program
from compiling (sort of a dealbreaker),
versus something that's more of a
logical error: I actually meant to move
the cursor to the next line. And so
there are different types of errors in
the world, as we're seeing here. Of
course, once I add the \n, rerun make
buggy, and then run ./buggy, we're back
in business, with this displaying
exactly what I intended. All right.
Well, let's modify this to look a little
more like something else from last week.
Recall that last week I started to get
someone's name more dynamically. So I
said something like name equals
get_string, a function we introduced,
and I might have passed in something
like "What's your name? " with a space
just to move the cursor over. I know now
that I definitely need to end my thought
with a semicolon. I could try to compile
this with make buggy now, and I'm seeing
a different error message altogether
that you might not have seen yet: on
buggy.c line five, "error: use of
undeclared identifier 'name'."
What now is the mistake that I've made?
Why does it not know?
>> Declare the type.
>> Yeah, I forgot to declare the type of this
variable, which, for those of you with
prior programming experience, is not
something you have to do in some
languages, like Python, for instance. But
in languages like C, C++, Java, and
others, you do in fact need to
explicitly tell the compiler the type of
a variable when you create it in the
computer's memory. And it's not going to
be an int, because of course I don't
want an integer in this case. I want
text, which we now know is called a
string. All right, I think this fixes
that bug. So let me do make buggy again.
And hopefully... huh, an error again,
indicating that my code did not compile
because of line five. I still have an
error, but this time it says "use of
undeclared identifier 'string'; did you
mean 'stdin'?" So this is a bit of a red
herring. The compiler is trying to be
helpful by asking if I meant stdin, or
standard in, but I don't think I
actually do; that's just the most
similar-looking word in the compiler's
own memory. What's the actual mistake
that I've made here? Yeah.
>> You didn't include the CS50 library.
>> Yeah, I didn't include the CS50 header file,
because string, recall, is a feature of
the CS50 library, as are get_string and
get_int and others. So the solution here
is indeed to go up here and, just to be
nitpicky, I tend to alphabetize my header
files. It's not strictly required
technically, but stylistically I find it
nice to be able to skim the header files
alphabetically to see if something is
there or not. I can include cs50.h in
addition to stdio.h, and it's in that
file, cs50.h, that not only is
get_string declared, so that the
compiler knows that it exists, but, it
turns out, so is the word string. So
this is a bit of a white lie, and it's
something we do in the early weeks of
the class. We dug up these old training
wheels from a bicycle, the whole idea
being to sort of keep you up and spare
you all too much complexity early on.
The point of these training wheels, in
the form of the CS50 library, is to let
us ignore what a string really is for
just another week or two, after which we
will peel back that layer, take off
those training wheels, and reveal to you
what is actually going on. So, for now,
strings exist, but they exist because of
the CS50 library. In a couple of weeks,
they're still going to exist, but we're
going to call them by a different name,
as we'll eventually see. But every
software developer in the real world
uses the term string, so this is a
concept that exists. It is not CS50
specific at all.
It's just that in C, the word string
doesn't typically exist unless you make
it so, as we have. All right. So I think
now if I clear my terminal window and
rerun make buggy, it should in fact
compile. And if I run ./buggy, Enter, I
should be able to type in my name. And
now, voila: hello, world. So this is not
a syntax error, because I didn't screw
up my code per se; it compiled.
Everything is grammatically correct, so
to speak, but logically, intellectually,
this is not what I wanted, right? I
wanted it presumably to say "hello,
David." So let's fix one final bug here.
How do I fix this? On what line? How do
I get it to say "hello, David"? Yeah.
>> Yeah. On line seven, I need to add the
string placeholder, the format code, so
to speak: %s. And then one more thing,
someone else? What do I do after this?
Yeah, in back.
>> Yeah, a comma, and then the variable name
that contains the value I want to
substitute in there, which is indeed
name, though I could have called it
anything I want. All right, so now make
buggy, Enter, seems to have compiled
again. Then ./buggy.
Now I type in my name once more, and now
we're back in business. So over the
course of these few exercises, clearly I
meant to make all of these bugs, these
mistakes, but they demonstrate not only
syntax errors, which are going to stop
the compiler in its tracks (you won't
even be able to compile your code until
you fix those things), but also that,
even after that, there could be latent
bugs that seem not to be there until you
actually provide input and see what's
happening at so-called runtime, when
you're running the actual code. And so
here's where it's no longer as easy as
just reading the error message and
figuring out what it means, because no
error message appeared on the screen
when it said "hello, world." We had to
use our own human intellect and realize,
okay, that's clearly not what I wanted.
Had you run CS50's own check50 program
on something like that, we could have
told you that it's not correct by
automatically assessing its correctness.
But the compiler has no idea what you
are trying to achieve logically; it only
knows about the C language itself and
the requisite syntax for writing and
compiling code. So how can we go about
solving logical problems in code? I
would propose that we start to consider
this list here, whereby when you want to
find a logical problem in your code and
better understand what's going on, or
really what's going wrong, printf is
going to be your friend. Up until now,
we've used printf to literally print
"hello, David," "hello, Kelly," or
anything else on the screen.
But you can certainly use printf
temporarily to print out stuff inside of
your program that you want to better
understand. Then, once you understand it
and have solved the problem, you can
delete those temporary lines of code,
recompile, and move on. So let's use
printf as a debugging tool in that sense.
Let me go back over to VS Code here and,
in this same program, buggy.c, delete
everything and start over with a
different sort of bug. I'm going to
include stdio.h at the top. I'm going to
do int main(void) after that.
And then inside main, I'm going to write
a simple for loop that prints out a
stack of three bricks, like we saw in
the world of Mario when Mario needed,
we claimed, to sort of jump over a stack
of bricks. We want to print out just
three of those at the moment. So I'm
going to write for (int i = 0; i <= 3;
i++), with less than or equal to three
because I want three of these. Then
inside of this for loop, I'm going to
quite simply call printf with a hash
symbol to represent the brick, followed
by a newline to move the cursor to the
next line, and a semicolon to complete
the thought. Now, I've deliberately made
a stupid mistake here, but in the
context of a simple enough program that
we can focus on the debugging technique,
not on the obscurity of the bug in
question.
Hopefully, you'll spot the bug in just a
moment, if not already. When I do make
buggy now and ./buggy, I don't get three
bricks. I of course get one, two, three,
four total. So there's a logical bug in
this program, and odds are you can
already spot what it is. But let me
propose that this program is
representative of a type of problem that
you can solve a little more
diagnostically by poking around and
really asking the computer, via printf,
to show you what's going on. And I would
propose that this is one of the most
helpful techniques in a situation like
this, if you're trying to wrap your mind
around why there are four bricks instead
of three. Well, clearly this is related
to the loop somehow. So let's look a
little more thoughtfully at what the
value of i is before we print out each
of those bricks. And I might literally
do something like this temporarily:
printf, quote unquote, "i is %i\n",
close quote, and then pass in i, so that
I can see the value of i right here and
now. Let me now go down into my terminal
window and run make buggy again, then
./buggy, and I'll full-screen my
terminal. Now I get some diagnostic
information at the same time. So when i
is zero, I get a brick. When i is one, I
get another brick. When i is two, I get
another brick. When i is three, I get a
fourth brick. So now I can kind of see
that, okay, my loop is working, but I'm
going too far. I'm going too long.
Now, I can do this even more succinctly.
For what it's worth, I don't need a
whole new printf statement. I could just
go into my existing printf, put my %i
there, maybe with a space just to scooch
things over, and then print out i on
that same line. If I now do make buggy
and ./buggy, okay, now I'm seeing that
I'm printing a hash, a brick, for each
value of i: 0, 1, 2, and also 3. So the
solution, of course, is that I shouldn't
be starting at zero and iterating while
i is less than or equal to three. The
solution is like, ah, I'm an idiot, I
should have said less than three. Or if
I prefer to count starting at one like a
normal person, I could have set i equal
to one and then gone up to and through
three. But as I claimed last week, the
canonical way, the most common way to do
this, is to start counting at zero and
go up to, but not through, the total
value that you have in mind.
But there's going to be another
technique that's worth knowing here. Let
me sort of abstract this away by
whipping up a slightly better variant of
this, as follows. Let me delete this for
loop. Let me assume for the moment that
inside of main, I'm going to ask the
user now for the height of a pyramid,
doing something like this: int h equals
get_int, prompting the user for the
height of this pyramid, or this wall.
And then let's assume there exists a
function called print_column that takes
as input a number, h, which is how many
bricks you want to print. Now, this
function, print_column, does not exist
yet. get_int does exist, but I don't
have access to it. So let me not make
the same mistake twice. What do I need
to add at the top of this file? Yeah.
>> CS50 header file.
>> I need the CS50 header file, because I'm
using the get_int function now, which
again comes from our library, not C. So
let me go ahead and include cs50.h. But
now, print_column: I can invent this
function myself. So let me go ahead and
say void print_column(int height). More
on that in just a moment. And then I'm
going to recreate the loop from before:
for int i = 0; i is less than or equal
to the height, deliberately making that
same mistake as before for now; i++. And
then inside of this for loop, I'm going
to print out a single hash and a newline
to represent that there brick. So now
main can use a function called
print_column. It's going to pass in the
value of h, and then this for loop in
the print_column function is going to
take care of printing this thing for me.
So let me do this again. make buggy,
Enter. So far, so good. ./buggy. Let's
put in a height. I'm going to manually
type a height of three, and I should see
three bricks. But of course, I'm still
seeing four. Now, before we move on, let
me hide my terminal and propose that
it's kind of stylistically bad to put
anything other than your main function
at the top. But recall that if I move my
helper function, print_column (and it's
a helper function insofar as I made it
to help me solve another problem) below
main, I can't recompile and run my code.
Why? The compiler won't let me. Yeah.
>> Exactly. When the compiler gets to line
seven of my code, it's going to abort
compilation because it doesn't know what
print_column is. Why? Because I don't
tell it what it is until line 10. And
this was the one time I proposed that
copy-paste is reasonable: highlight and
copy the very first line of that
function and paste it above main with a
semicolon. That's a so-called function
prototype. It specifies the function's
name, what its inputs are, if any, and
what its output is, if any. More on
these inputs and outputs later on. But
now this is just a more complicated but
more modularized version of this same
program. Let me do make buggy. It still
compiles. ./buggy, type in three, and I
still have that same bug.
But the catch now is that my code has
gotten more complicated. A side effect
of my having abstracted away this idea
of printing a column into a new function
is that there's just more code now to
debug. I could certainly go in there and
start adding printfs, but at some point
printf is going to be a very primitive
tool, and you're going to waste time
adding printfs, recompiling your code,
running your code, changing the printf,
recompiling your code, running your
code. It's going to get very tedious
quickly when you have lots of lines of
code on the screen. So can I actually
step through my code line by line,
maybe like your TA would in a section or
a small class, walking through the code
line by line? You can, because another
tool that you have access to is called
debug50. This is a CS50 command that
will start an industry-standard
debugger. A debugger is a piece of
software, used in the real world, that
lets you literally debug your code by
slowing down or even pausing execution
and walking through your code line by
line. The only reason we call it debug50
is that in VS Code it's a little
annoying to start the debugger, so we
automated the process of starting it.
But everything thereafter has nothing to
do with CS50 and everything to do with
real-world software engineering
techniques. So how do we use this? Let
me go back to VS Code here and propose
that I want to step through this code
line by line, just like we might at a
whiteboard in a smaller class, to figure
out why I'm getting four hashes instead
of three. Well, in my terminal window,
I'm going to run this: debug50 ./buggy.
So debug50 is the command, and it needs
to know what program I want to debug, so
I'm specifying ./buggy, which is the
name of the program I just compiled. I'm
going to get an error, though, the first
time I run this, as will you if you make
the same mistake. I'm about to see this
message here: "Looks like you haven't
set any breakpoints. Set at least one
breakpoint by clicking to the left of a
line number and then rerun debug50." So
what is this really telling me? Well,
the debugger has no idea when and where
I want to pause execution so as to start
walking through my code line by line. It
wants me to tell it where to break, that
is, where to pause, by clicking to the
left of a line number. So let me hide my
terminal for just a moment. You've
probably never done this intentionally,
but if you hover over the space to the
left of your program's line numbers,
you'll see a little red dot, a little
stop sign of sorts. If you actually
click there, that red dot will stay. And
you can see the hover text here saying
"Click to add breakpoint." What I'm
going to do is click to add a breakpoint
at main. main is the entry point to my
program; it's the default function that
gets called. Let's break right away so I
can step through this code line by line.
All right, let me reopen my terminal
window, clear it, and then run debug50
./buggy again. Enter. And now a whole
bunch of stuff is going to happen
quickly on the screen, and then it's
going to clean itself up, because once
the debugger is running and ready to go,
it's going to allow me to start stepping
through my code line by line. So what is
going on?
Well, notice that nothing has happened
in the terminal yet. Why? Because my
code has been paused inside of main; in
particular, it's been paused on the
first real line of code. The curly brace
is uninteresting, and the first line is
essentially just the function's name. So
line 8 is the first juicy line of code
that could possibly do anything useful.
It's been highlighted here in yellow.
The fact that this cursor is here means
that we have broken execution on this
line, but we have not yet executed it,
which is why I don't see anything in the
terminal yet. I definitely don't see
"Height" followed by a colon. Notice
what else has happened here. All of a
sudden, on the left-hand side of the
screen, where your file explorer or the
CS50 duck typically is, we see mention
of variables: you can actually see,
inside of the debugger, the value of any
variable in the computer's memory. Now,
we won't fully understand this just yet,
and we'll come back to it over time, but
weirdly, before line 8 even executes, it
seems that h has a default value of
32,764, which seems to have come from
nowhere.
As an aside, this is what's called a
garbage value, and this is actually why
we have Oscar so omnipresently here. A
garbage value tends to be a default
value inside of a variable that's the
result of that memory having been used
previously for something else. Inside of
your computer, you've got all of this
memory, random access memory, or RAM.
More on that today. And it stands to
reason that my computer, or whatever
cloud server we're using, has been
running for some time. So the bits that
h is going to use might already have
some switches randomly on and off, some
random pattern of bits that happens to
give me 32,764. But the moment this line
of code executes, that value is going to
be changed to what I actually want it to
be, which is whatever the human types
in. Meanwhile, at the bottom here,
you'll see a so-called call stack. More
on this too in the weeks to come, but
you'll see that we've paused in the
function called main in the file called
buggy.c.
So how do I do something useful? Well,
at the very top of the debugger, you'll
see a whole bunch of color-coded icons.
One looks like a play button, and if I
click that, it's just going to continue
execution of my code as though I don't
want to step through it anymore. So I'm
not going to click that just yet. The
second one, a little curved arrow over a
dot, is the so-called Step Over button,
which will step over this line, that is,
execute it, but only one line at a time.
Let's go ahead and do exactly that. So
I'm going to click the Step Over icon,
the second one, the curved arrow with
the dot under it. Click. Now I see in my
terminal window that I'm being prompted
for the height. All right, let's type in
three, just like I did before, and hit
Enter. Now notice what happens:
execution has paused on line 9 instead
of 8, and you'll see that my variable, a
so-called local variable, has the value
of three, as intended. All right. So
far, this isn't all that enlightening,
other than demonstrating that I can
pause execution of my program anytime I
want. So let's now click that Step Over
button again so that we actually print
this column. Click. And there we have
it: four hashes at the bottom of the
screen. Now execution has paused at the
end of the function. This is just my
opportunity to stop, restart, or
continue. I'm just going to click the
play button and let it finish executing.
Unfortunately, that wasn't at all
enlightening, except to confirm for me
that I typed in three and three is what
is in the computer's memory. Not that
interesting yet. So let's do this. Let's
leave the breakpoint on line six, as
before, and rerun the debugger by
running debug50 ./buggy. Let's let it do
its startup thing, which looks a little
messy at first, but now we've
highlighted line 8 again. I'm going to
step over this line, because I do want
to get an int, and type in three again.
Enter. But this time, instead of
stepping over line 9 and just letting
print_column happen, here's where the
debugger gets powerful: let me step into
line 9 and walk through the print_column
function itself, line by line. So let me
click not this button, the curved arrow
over the dot, but the next one, which is
the Step Into button.
Click. And now you'll see that execution
has jumped inside of print_column and
paused on line 14, at which point I can
see at top left the default value of i.
This is some crazy garbage value,
because whatever bits are being used to
store i's value hold some random garbage
from some previous use of that memory.
But as soon as line 14 executes once, I
bet i is going to take on a value of
zero. So let's do that. I'm going to
click Step Over, because I don't need to
step into this line; there are no other
functions there. Step over it, and
immediately at top left, i is now zero.
Now line 16 is highlighted. Let's step
over this. Okay. And notice in the
terminal window: what do you see? The
first of our hashes. Let's step over.
Step over. Second hash, and i is now
one. Step over. Step over. Now we see a
third hash, and i is now two. Step over.
Step over. Okay, there's the symptom of
the bug: four hashes, and yet i is
three. But wait a minute, this is going
to draw my attention now to line 14
before I continue onward. Three is, of
course, less than or equal to three,
which is why I got that fourth hash on
the screen. So at the end of the day,
you still need to exercise some of your
own human intellect to figure out and
understand what's going on. But the
value of this here debugger is that you
can pause, work through things at your
own pace, and poke around inside of your
own code to better understand what's
happening, as opposed to compiling the
program, running it, and having to infer
from the symptoms alone what the source
of the problem might be.
So that was a lot. Let me go ahead and
just let it continue to the end, because
I know what the problem is now: I need
to change the less-than-or-equal-to sign
to a simple less-than sign instead.
Questions, though, on debug50 or any of
these steps? Yeah.
>> I have two questions.
>> Sure.
>> Could you go over what the breakpoint
thing is? And then my second question is
about the garbage value. The second time
you ran it, it still gave that same
garbage value, even though you had
assigned a value to h.
>> Correct. So, in order of your questions,
what again are these breakpoints? The
breakpoint, the little red stop sign
here, just tells the debugger where to
pause execution. So frankly, I didn't
have to pause execution at main. If I
really cared only about debugging
print_column, I could have clicked down
here instead, and then it would have run
main automatically and only paused once
print_column got called. So a breakpoint
is the point at which your code will
break, or pause. As for the garbage
values, I'm oversimplifying exactly
what's going on inside of the computer's
memory. It's not necessarily using
exactly the same memory as before; the
operating system governs exactly how the
memory is laid out. Long story short,
this is actually a significant problem
in a lot of today's systems. It's not
that interesting to me to know that
there was 32,000-whatever, or some
negative number, there. But suppose that
it revealed the password from some other
program or function that had stored
information there. It seems all too
easy, with the debugger, let alone C, to
actually poke around the computer's
memory, and we're going to come back to
that in a couple of weeks. But for now,
it's a garbage value insofar as you
didn't put the value there; it somehow
got there on its own. Other questions?
>> With a for loop, does i become one at the
end of the loop, or at the start of the
next pass?
>> So the question is about the order of
operations for a for loop. The first
time you go through a for loop, the
initialization happens (the stuff before
the first semicolon), and then the
condition, the boolean expression, is
checked. Then everything inside of the
curly braces is executed. Then the
incrementation, or update, happens,
which in this case is i++, and then the
condition, the boolean expression, is
checked again. The code is executed, the
update happens, the condition is checked
again, the code is executed again, and
so it loops like this. The debugger's
graphics are fairly simplistic; it just
highlights the whole line without making
super clear which part is happening. But
that's just the definition of a for
loop. Good question.
Others about debug50 or printf?
All right. Yeah.
>> Can you swap the positions of i++ and the
condition with height?
>> Short answer, no. The first thing is the
initialization, the variable you want to
create and initialize. The second thing
is the actual condition, the so-called
boolean expression. The third thing is
always the update. So they must come in
this order. What you're not seeing is
that you can actually combine multiple
boolean expressions, have multiple
initializations, and have multiple
updates, but we're keeping it simple for
now. And this form is canonical. All
right.
So to make clear, assuming that either
printf or debug50 helped me figure out
where the illogic was in my thoughts, I
now know that the fix here is to just go
and change the less than or equal to to
a simple less than. And if I run the
program again, of course, it's going to
give me the three bricks that I always
wanted instead. But there are other
techniques we can use too. Besides
printf and debug50, you might wonder why
we have a 7-foot duck behind me here and
all of these little rubber ducks on the
floor. Rubber duck debugging, per week
zero, is actually a thing. This was
popularized in a book some years ago, and
the idea is that when you are facing
some bug, some mistake in your program,
or you're just confused on some concept,
there is anecdotal evidence to suggest
that just talking out the problem with
an inanimate object like a rubber duck
on your desk is often enough for that
proverbial light bulb to go off
over your head, because you hear in your
own words what confusion you're having,
what illogical thoughts you're having,
and you don't even need another human or
TA or AI in the room to answer the
problem for you. So in fact, on the way
out today at the end of class, we've got
hundreds of ducks, enough for
everyone to take home with you if you'd
like to use that as another debugging
technique, whether in CS50 or something
else. But of course now, in the age of AI,
you also have the AI-powered virtual
duck at cs50.ai and also in VS Code at
cs50.dev, which really is a mechanism for
asking questions that you don't think
you can solve on your own. So it might
be reasonable to ask the duck, "What
does this error message mean?" if you're
having trouble wrapping your mind around
it, but it's less reasonable to copy and
paste your code into the duck and say,
"What's wrong with my code?" You should
really be meeting the AI halfway. After
all, the point of actually doing
this or any other class is to develop
that muscle memory, develop those mental
models, get some practical skills. So
try hard to walk that line between
asking the duck too much versus
deploying some of these same tools
yourself: printf, debug50, even a physical
rubber duck on your desk, before you
resort to escalating to human
or duck help. All right, so with
those tools added to one's toolkit,
let's actually consider and reveal
what's been going on underneath the hood
since last week. So this was the mental
model we proposed last week, whereby
when you write source code in a language
like C, it's not something that the
computer itself understands natively,
because computers, we saw, only understand
zeros and ones, aka machine code. So the
compiler is the program that we use to
convert your source code to the machine's
code, from C to zeros and ones in this
case. More generally, a compiler is just
a program that translates one language
to another. And in this case we're going
from source code to machine code. So
let's consider what's really happening.
And indeed, this is among the goals of
this week: to take a look at a lower
level so that when you encounter more
interesting, more challenging problems,
you'll understand from so-called first
principles what the computer is actually
doing and supposed to do. So you can
deductively figure things out for
yourself and generally not view
computers as magic. By term's end,
you'll have a fairly bottom-up sense of
how everything works inside of any
computer, laptop, desktop, phone, or the
like these days. So, here's the simplest of
programs that we wrote last week. Even
though there's a lot of syntactic
complexity, as we've seen, the goal is to
get it to machine code, these here
zeros and ones. So how has that been
happening when you just run make since
last week? Well, these are the two
commands that we've typically run after
creating a file like hello.c. We
compile it with make hello, and then we
run it with ./hello. So let's give
ourselves this starting point real quick
just so that we have an example in mind
of exactly what it is we're compiling.
So let me go back to VS Code here, close
out buggy.c, and create a new file,
just like last week, called hello.c,
inside of which is our old friend
stdio.h, then int main(void), and
inside of this, we'll keep it simple, just
printing out hello, world, which again is
my source code in C. How do I now
actually compile that? Well, of course I
can go down to my terminal window: make
hello, ./hello,
and we're off and running. So it was a
bit of a white lie for me to let you
think last week that the compiler
itself is called make. Make is a command
that literally makes your program. It
makes it by compiling it. But make is
not technically the compiler. If we
really want to get nitpicky, the
compiler you've been using is actually
called clang, for C language. And this is
a very popular compiler, freely
available, open source, so to speak. You
can even look at the code other humans
wrote to create the compiler online. And
what make is really doing for us is
essentially automating this command. So
all this time I could have just run
clang hello.c.
But the default file name from clang,
the compiler, weirdly and for historical
reasons, is not going to be hello as you
would hope. It's going to be a.out, for
assembler output. And we don't do this
in week one of the class,
because it just makes things
unnecessarily complex: we're adding
some random name that you just have to
know to type. However, we can do this
now as follows. Let me go back to VS
Code here, clear my terminal,
and type ls. And we'll see everything
we've created thus far: buggy.c, which
when I compiled it, I got buggy, and
hello.c, which I just wrote, and when I
compiled it, I got hello. Let's do this
command now manually, though. Let's use
clang on hello.c and hit enter. That
too seems to work. But if I now type ls,
you'll see a third program, specifically
called a.out, which happens to be the
same as hello. It's just using the
default name instead of my custom name,
hello. And if I do ./a.out,
indeed that too will work. But the
reason we don't do that, certainly in the
first week of the course, is that things
get a little annoying, or sort of
escalate quickly thereafter. So let me
go ahead and change this program as
we've done a few times already. Let me
include cs50.h so that we get access to
get_string. Let me do string name
equals get_string, quote unquote, "What's
your name?", close quote. And
then down here, just like before, let me
add my %s and add in my name. So,
I did that super quickly, but it's the
same program we wrote a few minutes ago,
and it's the same one we wrote last
week. What happens now, though, is as
follows. If I now try to do clang hello.c,
enter, I actually get an error
message, this one perhaps more cryptic
than most. Somehow or other, I have this
error: linker command failed with exit
code 1, because of an undefined
reference to get_string. Now, in the
past when we've seen undefined, or really
undeclared, mentions of get_string, the
problem was just that this line was missing.
This line is clearly here. But the catch
is I'm getting this error message now
because when I run clang on hello.c, I'm
just assuming that clang knows where to
find the CS50 version of get_string. And
that is not the case. Technically, if I
want the compiler to compile this code
for me, what I'm actually going to have
to do is this. Let me go back to my
terminal window here, and I'm going to
say clang hello.c, but I'm then going
to specify -lcs50, which is cryptic at
first glance, but this is telling the
compiler to link in the CS50 library so
that it knows what the zeros and ones
are that belong to the get_string
function. Long story short, if I hit
enter now, the error message has gone
away. If I type ls, I've still got
a.out, but it's a new version thereof.
And if I do ./a.out, now I see the
new behavior where I can type in my name
and see hello, David. Now, this is
getting a little stupid that I keep
using a.out. We can change that as well.
In fact, these commands, as we're
starting to see, support what are called
command-line arguments. And a lot of the
programs we've run already take
command-line arguments. When we run code
hello.c, the so-called command-line
argument to code is hello.c. When I run
make hello, the command-line argument to
make is hello. In other words, the
command-line arguments to a program are
all of the words you're typing in your
terminal after the name of the program
itself, whether it's make or
code or anything else. So this is
to say, when I just ran clang
hello.c -lcs50,
I was passing in two command-line
arguments: hello.c, which is the code I
want to compile, and -lcs50, which means
use the CS50 library, please. But I can
add another to the mix. I can actually
do something like this, whereby I do
clang -o hello, then hello.c, and
then -lcs50, enter. Now that too seems
to work. And if I type ls, I've got all
the same programs as before. So let's go
ahead and get rid of those to make clear
what's going on. I'm going to remove
a.out. I'm going to remove hello. And
just for good measure, I'll remove buggy
as well, so that all I have left in this
folder is source code. So if I type ls,
there are my two files. Let's do this
again: clang -o hello hello.c -lcs50,
enter. Now if I type ls, I don't see
a.out anymore, because, according to
the documentation for clang, the actual
compiler, if you pass -o as
a command-line argument followed by
another word of your choice, you can name
the program anything you want without
having to resort to mv or clicking on it
and typing a new name in manually. So if
I now do ./hello, I see the exact same
version where it's just asking me for my
name and then printing it out. But long
story short, the whole point of this
exercise is that running commands
like this quickly gets very tedious. You
have to remember the order in which
to do it and what the command-line
arguments are. I mean, this is just a silly
waste of time, typically, certainly in
week one of the course, to have to
memorize these kinds of magical commands
to get things working. But for now, know
that when you run make, it's essentially
automating all of that for you and
making it as simple semantically as make
hello or make buggy. What's really
happening is that the make command, because of
the way we've configured cs50.dev for
you, is doing all of this behind the
scenes. And it's not that magical. The
-o hello just means change the file name
to hello when you compile it. The hello.c
just means compile this code. And the
-lcs50 just means use the CS50 library.
That's all.
But that message about linking something
in: there's something juicy going
on there, such that make is in fact
helping us solve a whole bunch
of problems when we compile. In fact,
let me propose that we take a step
back and look at some of the actual code
that we're compiling, and consider
what we actually mean by compiling.
Yes, it's the case that to compile your
code means to go from source code to
machine code. But technically there are a
few more steps involved. When you
compile your code, which has sort
of become the industry term of art, that
really is referring to four separate
processes, all of which are happening in
succession automatically, but each of
which is doing a different thing. So
just once, let's walk through these
several steps. So what is this
preprocessing step? So consider this
program here, which we wrote in brief
last week. We've got #include
<stdio.h>, which is there because we want
to be able to use printf ultimately. We've
then got a prototype for this meow
function. And the meow function does
this: all it does is print out quote
unquote meow followed by a new line.
It takes no input and has no return
value. The main function now has a for
loop that iterates three times, each time
calling the meow function. And we saw
this already earlier today. This line of
code here, the so-called prototype, is
necessary because we need to tell the
compiler that meow exists before we
actually use it here, especially if I
don't get around to implementing it
until later. So this copy-paste of that
first line of code, a so-called
prototype, solves that problem. This is
what the header files are essentially
doing for us. Before I use printf down
here, the compiler needs to know what it
is, what its inputs are, what its
outputs are. Turns out the prototype for
printf is in stdio.h.
And that's what the #include <stdio.h>
line has been doing for us all this time. In
fact, let's take a simpler example that
we keep using here, whereby I'm including
cs50.h and stdio.h, and I'm using
the CS50 get_string function to get
someone's name and put it in a variable
called name, and then I'm printing out
hello, so-and-so. What's going on
now when I preprocess this file by
running make, which in turn runs clang?
Well, the compiler finds on the server's
hard drive the file called cs50.h, goes
inside, and essentially copies and pastes
its contents into my own code,
such that we get the prototype there for
get_string. And we haven't seen this
yet, but it stands to reason that all
this time using get_string, we've been
passing in a prompt like "What's your
name?" and we've been getting back a
string. What's inside the parentheses,
recall, is the input. What's before the
function name is the output, the
so-called return value. What about
stdio.h? It's in that file that
printf's prototype is. So essentially
what the compiler does when
preprocessing this file is it finds
stdio.h somewhere on the server's
hard drive, goes inside, and copies and
pastes those relevant lines of code into
my code as well. It's to avoid me having
to do all of that myself: find the file,
copy-paste it, or manually type out the
prototype. These preprocessor
directives just automate all of that
tedium. So what effectively ends up at
the top of my code after the file's been
preprocessed is that all of those lines
starting with #include are replaced
with the actual contents of those
header files. Now the compiler knows
what get_string is all about and what
printf is all about. That, then, is the
preprocessing step. What does compiling
technically mean? Compiling means taking
that preprocessed code, which again
looks a little something like this, and
converting it into something called
assembly code. And we won't spend much
time in this class on assembly code, but
this is how programmers used to write
code. Before there was C, before there
was Python and Java and all of these
other modern languages, programmers were
writing code like this. Before this
existed, they were programming zeros and
ones into the earliest of mainframe
computers using punch cards and other
technologies, like literally sheets of
paper with holes in them. Not very fun.
Very tedious. So the world invented
this. Also not very fun, very tedious.
So the world invented C. Not that much
fun. So the world invented Python and so
forth. We continue to sort of evolve as
a species with code. But the compiler
technically takes your preprocessed
source code and converts it into
something that looks like this. Cryptic,
and that's to be expected. But there are
some familiar phrases. There's mention
of main. There's mention of get_string.
There's mention of printf. And there's
a bunch of other things: mov and push
and xor and call and these other
commands here. These are the assembly
instructions, human-readable names for
the lowest-level instructions that the
CPU inside of a computer understands.
CPU is the central processing unit, the
thing by Intel or AMD or Apple or other
companies. Those are the lowest-level
commands that the actual hardware inside
of the computer understands. It's just
nicer to be able to write words like main
and for and printf than it would be to
write these much more arcane commands
that you'd have to look up in a manual. So
compiling just takes C code and makes
it a lower-level type of code called
assembly. When I said a.out means
assembler output, that's why: inside of
that file is essentially the output of
an assembler. All right, we're almost
there. What does it mean to assemble a
program, which is step three of the
compilation process? That means
converting assembly code to the actual
zeros and ones we keep talking about. So
if the file is called hello.c, when that
file is assembled, the assembly code
becomes the zeros and ones for your code
in hello.c. But your code is not
everything that composes your final
program. Your code from hello.c
has to be combined with code from CS50's
library and from the standard I/O library
that other humans wrote. I and the team
wrote the CS50 code. Other humans in the
world wrote the printf code in standard
I/O. So essentially the fourth and final
step is to link all of those zeros and
ones together. Somewhere on the server,
there are not just the header files cs50.h
and stdio.h, but also your code, hello.c,
our code, cs50.c, and the code that
contains printf's own implementation.
Bit of a white lie: it's technically not
called stdio.c, but the point
remains ultimately the same. The CS50
and printf files have already been compiled
for you in advance. This is your code. What
the assembly process does is convert your
code into zeros and ones, and then all
three chunks of zeros and ones are
linked together. So if you think back to
when I tried compiling the code without
-lcs50, there was some mention of a linker.
That just means the computer did not
know how to link your code with CS50's
code, because we were missing -lcs50, which
tells the compiler to go find it
somewhere on the hard drive. And the
final step, then, linking, is to combine
all of those zeros and ones into one
bigger blob of zeros and ones. And
that's what's inside your hello program
that you can execute. So long story
short, these four steps are what's been
happening ever since the start of last
week: preprocessing, compiling,
assembling, and linking. But thankfully,
the world of programmers generally just
treats all four of these steps as what
we now know as compiling. It's just a
lot easier to say compile and not worry
about those lower-level details. But
that might reveal better to you what all
of these error messages mean when you
see hints of this kind of terminology.
Questions on any and all of that? From
here on out, we're going to go higher
level rather than lower. Yeah.
>> I don't get the part where we're talking
about, I think it's the assembly process,
when you basically convert it to zeros
and ones. Across the three different
ones, don't the zeros and ones signify
different things, like one signifies text
and another signifies something else? How
does the computer know which 8 bits
correspond to which part?
>> Really good question. How does the
computer know which of those zeros and
ones correspond to data, like numbers or
strings of text, or to actual commands?
We're going to come back to that in week
four of the class. But long story short,
the big blob of zeros and ones we just
saw on the screen actually follows
some pattern, where the bits up top
represent a certain functionality, the
bits on the bottom represent something
else, and they're organized into
patterns. So long story short, we'll
come back to that, but they follow
conventions. It's not just a hot mess of
zeros and ones.
>> Other questions?
>> So the preprocessing step is just
replacing the hashtag lines?
>> Correct. The preprocessing step goes
into the header file and essentially
copies and pastes the contents of it into
your own code so you don't have to waste
time doing that manually yourself. Other
questions?
>> Just out of curiosity, when you're talking
about the compiling step, how it converts
it to assembly code, and you're saying that
the CPU understands all those commands,
is the CPU then converting that into...
>> No. So when you compile your code,
let me pull up the chart
again, when you compile your code you're
going from the C code to the assembly
code, and the patterns you get when you
see the assembly code are specific to a
certain CPU. So long story short, if
you're designing software for iPhones or
for Android devices or Macs or PCs,
you're going to necessarily use a
different compiler, because given the
same C code, you will get different
assembly instructions in the output. And
this is why, back in the day, you couldn't
just take a CD containing a program
for a Mac and run it on a PC, or vice
versa: it's the wrong patterns of
instructions. But the reason we have
all of these annoying layers of
complexity is, one, four
different people can now implement the
notion of compiling. Someone can
implement the preprocessor, someone can
implement the compiler, the assembler,
the linker, and you can actually
collaborate by breaking things down into
these quantized steps. But also, you can
do this step and this step, and then two
different people can write compilers to
output assembly code for
iPhones over here and
Android devices over here. But all of us
can still enjoy using the same language
up here. So there are a lot of reasons for
this complexity. Just understanding it
is useful, and you're not going to need
to use this sort of knowledge day to day,
but it's what enables so much of today's
complexity nonetheless. All right, so a
bit of a flourish now as to what we've
been doing with compiling. Well,
compiling is going ultimately from
source code to machine code. Couldn't
you just kind of reverse the process,
right? If someone wrote really
interesting software like Microsoft Word
or Excel or something like that, well,
when I buy it or download it, I
literally have a copy of all of those
zeros and ones. Couldn't I just kind of
reverse this process and reverse-engineer
someone else's code by
decompiling it? And this is genuinely a
threat. And this comes up in matters of
law and intellectual property, because
the zeros and ones have to be accessible
to you and to your computer. So it's
not a great feeling that someone with
enough time and enough savvy could sort
of reinvent Microsoft Word by just
figuring out what all those zeros and
ones mean. However, it's sort of easier
said than done to reverse-engineer code
from these zeros and ones. For instance,
this pattern of bits on the screen here
did what, did we say last week?
Silly: no normal person should be able
to answer this, but I did say it before.
These zeros and ones print what?
>> It just prints out hello, world. And I
cannot glance at that and figure it out
off the top of my head. But if I
know what architecture, what CPU this
code has been compiled for, and I pay
attention in week four and know the
layout of the zeros and ones,
I could painstakingly figure out
what each of those patterns of zeros and
ones means by breaking them into chunks
of 8 or 16 or 32 or 64 bits, which are common
units of measure that I alluded to last
week. Now, that's going to take a crazy
amount of time. And the sort of
presumption is that if you are smart
enough and capable enough and have
enough free time to do that, it would
probably take you less time to just
implement Microsoft Word the normal way
and rebuild the software. It's
going to take you more time to go in
reverse than it would in the so-called
forward direction. But there are other
subtleties as well. Inside of this code
are not only functions like printf,
but suppose that
it contained a loop, for instance, to
print meow meow meow. Well, we know
already that you can use a for loop
sometimes or you can use a while loop,
but they're functionally equivalent.
It's sort of a stylistic decision which
one you use, whichever one you're more
comfortable with or maybe feels a
little better designed, but you can't
figure out from the zeros and ones
whether it was a while loop or a
for loop, because both can result in the
same pattern of zeros and ones. It's
just a programmer's choice. Which is to
say, you can't even perfectly
reverse-engineer everything, because it's
not going to be obvious from the zeros and
ones what the source code originally
looked like. But again, the bigger
deal-breaker is: if you have that much time
and energy and savvy, just
reimplement Microsoft Word itself. Don't
try to reverse the whole process, which
is going to be much more painstaking and
time-consuming. Now, this is not
true for all languages, and just as a
teaser, in a few weeks' time when we talk
about web programming and another
language called JavaScript, it turns out
that JavaScript source code is actually
sent from web servers to web browsers,
and you can look at the source code of
any website on the internet: harvard.edu,
facebook.com, gmail.com. It's going
to be there. So not all languages, it
turns out, are even compiled, typically.
Sometimes the source code is just
executed by the underlying computer. So
we're just scratching the surface of
some of the implications of all this. In
a little bit, let's take a look
further under the hood at the actual
memory and solve some other problems, but I
think it's now time for Cheez-Its. So
let's go ahead and take a 10-minute
break. Snacks are now served. See
you in 10.
All right, we are back. And up until now,
when we've been writing code, recall
that we have to specify what type
of value we want to put in a variable.
That's why I had to go in and add
string before the word name in my first
bug today. But it turns out C, as we've
kind of seen already, has a whole bunch
of these data types. I rattled these
off last week: bool, int, long, float,
double, char, string. But let's consider
for a moment just how much space each of
these things takes up and see if we
can't help you see what the debugger was
seeing earlier, that is, what is where in
memory. So, a bool, it turns out,
actually takes up one byte, which is
kind of stupid, because technically a
bool, true or false, really only needs
one bit. It just turns out that it's
more efficient and easier to just use a
whole byte, eight bits, even though
seven of them are effectively unused.
So a bool will take up one byte, even
though it's just true or false. An int,
recall, uses four bytes. So if you want
to count really high with an int, the
highest you can go is roughly 4 billion,
we've claimed, unless you want to
represent negative numbers, in which
case the highest is about 2 billion,
because if you want to be able to count
all the way down to negative 2
billion, you've got to kind of split the
difference. A long, meanwhile, is twice
that. It uses eight bytes, which lets you
count up to roughly 9 quintillion, even
if you want to include negative numbers
as well: quite a few more than 2 billion.
Then we had floats, which are real numbers
with decimal points, and their size speaks
to just how precise you can be with
significant digits. A float is four bytes,
but a double gives you twice as
many bits to play with, which
lets you be more precise, even
though at the end of the day, whether
you're using floats or doubles, floating-point
imprecision, as we've seen, is a
fundamental problem for scientific,
financial, and other types of computing
where precision is ever so important. A
char, meanwhile, at least as we've seen
it, is a single byte, using ASCII
characters specifically. And then string
I'll put as a question mark, because a
string totally depends on its length. If
you're storing hi, that's like one,
two bytes. If you're storing hello,
that's like five bytes, and so forth. So
strings depend on how many characters
you actually want to store inside of
them. So where does this go? Well, here
is a picture of a stick of memory, a
DIMM, so to speak, whereby on this
stick of memory, which is slid into your
computer, your laptop, your desktop, or
some other device, there are all these
little black chips that essentially
contain lots of room for zeros and ones.
It's somehow electronic, but inside of
there are all of the zeros and ones that
we can store data in. So if we kind
of zoom in on this, for the sake of
discussion, if this one chip represents
one gigabyte, 1 billion bytes, it stands
to reason that we could slap some
addresses on these bytes, whereby we
could say this is the first byte and
this is the last byte, or more precisely,
this is byte 0, 1, 2, 3, dot dot dot, all
the way up to byte 999,999,999. And it
doesn't matter if it's top-down,
left-right, or any other order. We're just
talking about this conceptually at the
moment. So in fact, let's go ahead and
draw this really as a grid of memory, a
sort of canvas that we can just use to
store types of data like bools and ints
and chars and floats and everything
else. If we are going to use one byte to
store a char, well, you might use
just these eight bits up here, one byte
up here. If you want to store an int,
well, that's four. You might use all four
of these bytes, necessarily contiguous.
You can't just choose random bits all
over the place. When you have a four-byte
value like an int, they're all
going to be contiguous, back to back to
back in memory like this. But if you've got
a long or a double, you might use eight
bytes instead. So truly, when you store
a value in memory, whether it's a little
number or a big number, all you're doing
is using some of the zeros and ones
physically in the computer's hardware
somewhere and letting it permute them,
turn them on and off to represent that
value you're trying to store. All right,
so let's go ahead and abstract away from
the hardware, though, and let's just start
to think of this grid of memory
in zoomed-in form and consider, at a
lower level, what is actually being
stored inside of here. For instance,
suppose that we've got some code like
this containing three scores on
problem sets. You got a 72 on one of
them, a 73 on another, and a 33 on the
third. I've deliberately chosen our old
friends 72, 73, 33, which recall spell HI!,
or together, in the context of colors, are
like a shade of yellow, just so that
we're not adding some new random numbers
to the mix. These are our old friends,
three integers. Well, let's use these in
a program. Let me go over to VS Code
here and let me create with code a
program called scores.c that's just
going to let me quickly calculate my
average score on my problem sets. I'm
going to go ahead and include, as we
often do, stdio.h at the top. I'm
going to do int main(void) after that.
And then inside of my curly braces, I'm
going to do exactly those sample lines
of code. My first score was, let's say,
a 72, my second score was 73, and my
third score was 33. So I've declared
three variables, one for each of my
problem set scores. Now let's calculate
the average. So printf, quote unquote,
average colon, just so I know what I'm
printing. And now I'm going to go ahead
and use maybe %i\n.
And then what I'm going to pass in is a
bit of math. To compute an average,
it's just score 1 plus score 2 plus
score 3 divided by three. And I put the
scores, the numerator, in parentheses,
just like in grade school: I need to do
that operation first before doing the
division. So, just like math class.
Semicolon at the end to finish my
thought. Let's see how this goes: make
scores, enter, ./scores, and it
would seem that my average across these
three problem sets is 72,
which is great, but I don't think
that's actually what I want here. What
have I done wrong? It's unintentional.
Yeah.
>> Yeah. I'm kind of being a little
generous with myself here. I didn't
really factor in my worst score. So that
was accidental. So now let me do this
correctly. make scores dot slashscores
and now okay my average is 59 but I I
beg to differ I'd like to quibble my
score technically I think mathematically
should really be 59 and a3 I'm kind of
being cheated those that third of a
point so what's going on here why am I
only seeing 59 and not my full grade
>> you're using so
it's going to
>> Perfect. Because I'm using integers, when
I divide by three it's going to truncate
everything after the decimal point, which
we touched on at the very end of week
one as an issue with integer
truncation in general. So, one approach
to fix this: I could change my percent i
to percent f, which is the format code,
it turns out, for a float, and that is
what I want to print. So, let's see if
that fix alone is enough: make scores.
Oops, it's not. I got ahead of myself
there. Let me scroll up to the
error: format specifies type double, but the
argument has type int. It turns out you can
use percent f for doubles as well, so
that's why it's saying double, even
though I intended a float in this case.
So, there's a problem here: the
argument has type int even though I'm
using percent f. You're also seeing
mention of percent d here, which is an
alternative to percent i. We typically
encourage you to use percent i, i
for integer, but that is not
the solution to this problem, because I
want my third of a point back. So how
could I go about fixing this? Well the
fundamental problem here is that I'm
trying to format an integer as a float
or even as a double. Well I need to
convert these scores to floats instead.
So, I could go in and change this to
float, this to float, this to float, and
heck, just to be super precise, I could
add a .0 on the end of each of them just
to make super clear these are floats.
But there's another way. I could, for
instance, simply convert my
denominator to 3.0, because it turns out
so long as one operand of the division is a
floating-point value, the other is going to
get promoted, so to speak, to floating
point too, instead of staying an integer. I
don't have to convert all of them. So I
think now if I do make scores and
./scores, ah, there's my third of a
point back.
There's another way to do this, just as
an aside, and we'll see this again down
the line. If you really want to stick
with three, because it's a little weird
semantically to divide by 3.0 (like,
that's an implementation detail, but
you're truly computing an average of
three things), you can technically cast
the three to a float: in parentheses, you
specify the data type that you want
to convert another data type to. And
this too should make the compiler happy.
Aha. ./scores. I get roughly the same
answer. We're seeing some floating-point
imprecision nonetheless, but that
too would achieve the goal here. In
short, that's all just a function of
floating-point arithmetic there. So
what's going on now actually in the
computer's memory? Let me revert back to
the simpler one with just the .0 there. And
let me propose that we consider where
these three things are in memory. Well,
if we treat this as my grid or canvas of
memory, who knows where they're going to
end up? But for the sake of discussion,
let's assume that 72 ended up in the top
left of my computer's memory. I've drawn
it to scale, so to speak, and that this
score one variable is clearly taking up
four bytes of memory, and it's an int.
And that's typically how many bytes are
used for an int. Technically, it depends
on the exact system you're using, but
nowadays it's pretty reasonable to
assume that an integer will be 32 bits
on most modern systems. Score 2 is
probably over there. Score 3 is probably
over there. So, I'm using 12 bytes
total, four bytes for each of these
values. All right, so that's really all
that's going on underneath the hood. I
don't have to worry about this. The
compiler essentially figured out for me
where to put all of these things in
memory. But what really is in memory?
Well, technically each of these
variables, if it's composed
of 32 bits, is really just a pattern of
literally 32 zeros and ones. And I
figured out the pattern here. I crammed
them all into the space there. But you
see here three patterns of 32 bits which
collectively compose those numbers
there. But let's consider design now in
terms of my code. This gets the job
done. It's not that bad or big of a deal
for just calculating the average of
three scores. But this should also start
to rub you the wrong way, from this week
onward, when it comes to design. Like, this
is correct, especially now that I've
clawed back my third of a point, but
this is bad design, using the variables
in this way. Why might you think?
Yeah?
>> you're going to have to type in each
score manually assign variable
individually
>> yeah I'm going to have to type in each
score manually with each passing week
when I get the fourth problem set and
the fifth. I mean, surely people who came
before us came up with a better way to
solve this problem than manually
creating 10 variables, 20 variables,
whatever it is by the end of the
semester. It just feels a little sloppy.
And indeed, that's often the way to
think about the quality of something
that's designed. Think about the
extreme. If you don't have three scores,
but 30 or 300, is this really going to
be the best way to do it? And if you
feel like, no, no, there's got to be a
better way, odds are there is.
Certainly if the language itself is
well designed. So let's consider how
else we might go about solving this.
Well, it turns out we can treat our
canvas of memory, that grid of bytes, as divided
into chunks of memory known as
arrays. An array is a chunk of
contiguous memory back to back to back
whereby if you want to store three
things, you ask the computer for a chunk
of memory for three things. If you want
30, you ask for one chunk of size 30. If
you want even more, you ask for a chunk
of size 300. Chunk is not a term of art.
I'm just using it to colloquially explain
what an array actually is. It's a chunk
or a block of memory that is back to
back to back to back. So what does this
mean in practice? Well, it means that we
can introduce a little bit of new syntax
in C. If I want to create one variable
instead of three and certainly one
variable instead of 30, I can use syntax
like this. Hey compiler, give me a
variable called scores plural. Give me
room for three integers therein. So,
it's a little bit of a weird syntax, but
you specify the type of all of the
values in the array. You specify the
name of the array, scores in this case,
and I pluralized it just semantically
because it makes more sense than calling
it score now. And then in square
brackets, so to speak, you specify how
many integers you want to put into that
chunk of memory. So, this one line of
code now will essentially give me 12
bytes automatically, but they'll all be
referable by the name scores plural. So,
let's go ahead and weave this into some
code as follows. Let me go back to VS
Code here, clear my terminal, and now
let's just whip up the same kind of
program, but get rid of these three
independent variables. And instead,
let's go ahead and just say int scores
plural bracket three. Now, I need a way
to initialize the three values. But this
I can do too. It turns out that if I
want to put three values in this, I just
need slightly new syntax. I can say
scores bracket 0 equals 72, scores
bracket 1 equals 73, scores bracket 2
equals 33. So it's not all that different
from having three variables, but now I
technically have one variable and I am
indexing into it at different locations,
locations 0, 1, and 2. And it's zero
because in computing we always start
counting from zero. So scores
bracket zero is going to be my 72
problem set, scores bracket one is my 73
problem set, and scores bracket two was
my weakest, my 33 pset. Now my
there are no more score one, score two,
score three variables, but there are
scores bracket zero plus scores bracket
one plus. And notice what VS Code is
trying to do for me. It's saving me some
keystrokes. As I type in scores and type
one single bracket, notice it finishes
my thought for me and magically puts the
cursor where I want it so I can put the
two right there and generally save on
keystrokes. But that has nothing to do
with C. It just has to do with VS Code
trying to be helpful. So I think now
if I go down here and do make scores and
./scores, we get the same answer, but
it's arguably better designed because I
now have one variable instead of three,
let alone many more. And in fact, if I
wanted to change the total number of
scores, I can just change what's in that
initial square bracket. So if we
consider what's going on now, if we look
at the computer's memory, it's the same
exact layout, but there are no longer three
variable names. There's one, scores, with scores
bracket zero, scores bracket one, and
scores bracket two. And notice here,
ever more important, an array's values
are indeed contiguous back to back to
back. Now, the screen is only so wide.
So, they kind of wrap around to the next
row of bytes, but the computer has no
notion of up, down, left, right. I mean,
it's just a piece of hardware that's got
lots of available that can be addressed
from the first bite all the way down to
the last bite. The wrapping is just a
visual artifact on this here screen. All
right. So if I've done this now, maybe
we can make this program a little more
dynamic than just hard-coding in my
scores. Let me go in and add the CS50
header library so that we could also use
for instance like get int and start
getting these scores dynamically. So I
could do get int and I could prompt the
user for a score. I could use get int
again and I can prompt the user for
another pset score. I can use get int
a third time and prompt the user for a
third such score. And then pretty much
the rest of my code can stay the same.
Let's do make scores again, then
./scores, 72, 73, 33. And now my
program's a little more interactive.
This doesn't work for just my three
scores; it could work for anyone's scores
in the class. Now this too hints at bad
design. I like my introduction of the
array because I now have one variable
instead of three. But what now might rub
you the wrong way among lines 7, 8,
and 9? Yeah?
>> It's repetitive. I mean, I typed it
manually, but I might as well have just
copied and pasted like literally the
same thing. So, what's a candidate for
fixing this? Like, what programming
construct might clean this up? Yeah,
>> yeah, we could use a for loop or a while
loop or whatever, but a for loop would
get the job done. And that's often my
go-to. So, let's do that instead. Let's
go under my declaration of the array and
do for int i = 0, i less than 3, i++,
which we keep seeing again and again.
Now, how do I index into the array at
the right location? Well, here's where
the square brackets are kind of
powerful. I can just say my scores array
at location i should get an int
from the user, as follows. So now I'm
using get int once inside of a loop, but
because i keeps getting incremented, as
we've done many a time now for meowing
and other goals, I'm putting the first
one at location zero. Why? Because i is
initialized to zero. I'm putting the
second one at location one. Why? Because
I'm going to ++, or increment, i on the
next iteration, then the next iteration.
So, this has the ultimate effect of
putting these three scores at location
zero, one, and two instead of me having
to type all of that out manually. Now, I
don't love how I've done this still. If
we really want to nitpick, this solves
the problem correctly, but it's kind of
got a poor design decision still. It's
got a magic number, as people say. What
is the magic number here and why is it
bad?
Yeah, over here.
>> Yeah, it was a little soft, but I think
the number three is hardcoded in two
places. We've got it on line six, which
is the size of the array, and then again
on line seven, which is how many times I
want to iterate. But those are the exact
same concepts, but it's on the honor
system that I type the number three
correctly both times. So, I think we can
fix this a little better. I could do
something like int n equals 3 and then I
could use n here and then I could use n
here so that now I only change it in one
place. If your eyes are wandering to the
bottom of the program, there's still a
problem here because I've still
hardcoded 0, one, and two, but we'll
come back to that. But this is arguably
a little better. But let's talk a little
bit about style. Typically, when you've got
a variable that should not change its
value, we saw last week that we should
declare it as a constant, and the trick
there is to literally just write const,
for short, in front of the type of the
variable. Now it should not be
changeable by you, by a colleague, a
collaborator, or the like. And typically,
too, by convention, stylistically, to make
visually clear to another programmer
that this is a constant, it's convention
also to capitalize constants, so to
actually use a capital N here in
all places, just to make clear visually
that there's something interesting about
this variable, and indeed it is a
constant that cannot be changed. All
right, with that refinement, I don't
think we've really improved the program
fundamentally. I think we're going to
need to do a bit more work to do this
really well. So, I'm going to do this a
little quickly, but mostly to make the
point that we can make this indeed more
dynamic. So, let me hide my terminal
window there. Let me go ahead now and
get the scores as I already am as
follows here. And let me go ahead and
assume, for the sake of
time that we have a function that exists
already called average and I simply want
to pass in to that average function the
scores whose average I want to
calculate. So average does not exist off
the shelf like I can't just use an
existing library for it. I'm going to
have to implement this thing myself. But
how? All right. Well, let's go ahead and
do this. At the top of my file, I'm
going to go ahead and compute or define
a function called average that takes
in what? An array of numbers. So, this
syntax is going to be a bit new, but the
way I do this is int, say, array, then empty square
brackets. Or, array sounds a little too
generic. Let's just call it numbers, for
instance here. So that says my average
function is going to take as an argument
an array of numbers. This average
function though should return a value
too. And it should return what type of
value from what we've seen thus far?
A number, a float specifically. It could
be int, but then I'm potentially going to get
shortchanged out of my third of a point.
So I think I want to return a float.
Or if you really want precision, you
could return a double just to be really
nitpicky. But that seems excessive here.
All right. Well, now inside of my
average function, how can I calculate
the average? Well, this is just kind of
like a math thing. So, I could declare a
variable called sum and set it equal to
zero. I could then have a for loop
inside of this function for int i gets
zero, i less than, huh? I'm going to
come back to this: the number of numbers
in the array. And then I'm going to do i
++. And then on each iteration, I'm
going to do sum equals whatever the
current sum is plus whatever is in the
numbers array at that location. So I'm
going a little quickly, but again, I'm
just applying the same lesson learned.
Numbers is my array. Numbers bracket i
means go to the i-th location in there. But
if my loop starts at zero, that means go
to location zero and then one and then
two. And heck, if there are more scores in
this array, it's just going to keep
going on up from there because of the
++. But I hesitated here for a couple
of reasons. So I put a to-do here, which
is not a C thing; it's a note to self.
How far do I iterate? Well, if you've
come into CS50 with programming
experience, you can usually just ask an
array, or a list or vector, what its length is
in Java and in Python and the like. You
can't do that in C. So if I want to know
what the length of this array is, I've
got to have the caller tell the function. So I'm
going to additionally propose that this
average function can't just take the
array. It's also going to have to take
another argument, a second input, for
instance, called length that tells me
how long it is. And then down here,
which is where we started the story,
when I use this so-called average
function, I'm going to have to tell the
average function, by passing in N, how
many numbers are in that array. Granted,
it's annoying that you have
to pass in not only the array but also
its size separately, but that's the way it's
done in C. More recent languages have
improved upon this, so you can just
figure out what the length of the array
is, as we'll see in a few weeks in
Python. All right, back to the average
function at hand. I think we're almost
there. This is a little unnecessarily
verbose. Recall that we can tighten this
up by just doing plus equals whatever is
in numbers bracket i. That's just
tightening it up. It's syntactic sugar,
so to speak. And then the last thing I'm
going to do in my average function is
what? Actually calculate the average. So
what is the average? It's just the
numerator. like the sum of all of the
scores divided by the total number of
all of the scores. Well, I've got the
sum. So, I think I just want to do sum
divided by what to get the actual
average now?
>> Yeah.
>> Exactly. Sum divided by length will give
me the average because the sum is the
numerator effectively all of the scores
added together and the denominator is
the length. How many numbers were there
actually? Now, I can't just write this
math expression here. If this is going
to be my function's return value, and
we've done this once or twice before, I
literally say in my average function,
return this value. So, it hands back the
work. I could use print f and just print
it on the screen, but I don't want that
visual side effect. I want to hand it
back so that on line 23, I can simply
calculate the average of those n scores
and let print f use it as the value of
that format code percent f.
All right. I think we are in
reasonably good shape. Let me cross my
fingers now and hope I didn't screw this
up. make scores. Okay. ./scores.
How many do we want to do? So we'll do
72, 73, 33. Enter. And there is... oh, so
close. Average.
I've had a regression. I've made the
same mistake again just in a different
way. I think I saw your hand go up. Why
am I getting 59 and I'm not getting my
third of a point?
>> Yeah, in this return line, on line 11,
right now I'm again stupidly doing
integer divided by integer. That will
make us suffer from integer
truncation, because an integer divided by an
integer yields an integer, with no room for the
decimal point or any numbers thereafter.
So, how do we fix this? Well, I could
change the sum to a float; that would
be reasonable. So then I do a float
divided by the length. Or I could do my
casting trick and convert the
length to a float just for the sake of
floating point arithmetic. There's a
bunch of ways to solve this but I think
I'll go with this one. Now let me do
make scores again, ./scores, 72, 73, 33, and
now I've got albeit with some
imprecision I think enough precision
certainly for like a college grade in
this case 59.33
and so forth. Okay. So what are the
things to actually care about here? So
there's a decent amount of code here.
Most of it is sort of stuff we've seen
before, but the interesting parts I
would propose are this. When you create
your own function that takes an array as
input, you have to take as input the
length of the array. You're not going to
be able to figure it out on your own, as
in newer languages. You also need, of
course, to pass in the array itself. How
do you pass in an array? Well, when
you're defining the function, you
specify the type of values in the array,
whatever you want to name the array
inside of this function, and then you use
empty square brackets like this. You
don't have to put N or some other number
there. All you need to tell the compiler
is that my average function is going to
take some array of values and, via the length argument,
specifically this many. You don't put it inside the
square brackets there. Then when I use
it now, it's just the now-familiar syntax:
when you want to index into your array,
that is, go to location zero or one or
two, you just use square bracket notation
here. But the array itself, recall, was
actually created in main when I did
this line of code here where I said,
give me an array called scores, each of
whose values is going to be an int, and
I want this many of them. And so maybe
the final flourish that I'll add here,
just to be sort of nitpicky, is I keep
saying that main should really go at the
top. Fine, no big deal. Let me highlight
my average function, move it to the
bottom of my file just because, and then
and only then I'll copy and paste that
first line, the so-called prototype, so
that Clang doesn't freak out by not
knowing what the average function is. So
in short, there's seemingly a bunch of
complexity here, but the only
thing that's really new in this one
example is how you pass to a
function an array that already exists
elsewhere: in the function's definition, you declare it not
just by name but with the square brackets there.
Okay,
questions on arrays or any of this new
syntax? Yeah,
>> a bit slow, but
back when you did the whole like average
thing,
>> okay,
>> you said that we could store it as a
float
>> and instead of saying 3.0 was a float,
you just said because 3.0 is a float.
How does it know it's not a double?
>> Oh, uh, how does it know it's not a
double? So, by default, if you just type
a number like 3.0 into your code,
it will be assumed to be a double, just
because raw values, literal numbers
with a decimal point, will be treated by
the compiler as doubles and be allocated
64 bits.
>> So how come you still use percent f?
>> Just because the world did
not need to create a new format code.
Like, percent d is not double; percent d
is decimal integer, but don't worry about
that. We tend not to talk about it too
much in class. Percent i is integer.
Percent f is float, but percent f is
also double. And this is not consistent,
because what's a long? Percent l l i?
What did I say last week? Percent li
gives you a long integer. It's just a
mess. There's no good reason for
this other than historical baggage.
>> Thank you.
>> Sure. I'm not sure if that's reassuring,
but all right. So,
okay. Let's use this knowledge for
something useful now and actually
tease apart how we can use
these skills for good and to
better understand what's going on inside
of the computer as follows. Let me go
over to our grid of memory and this time
let's not store some numbers but let's
store these three lines of code,
these three variables. So three chars,
even though, you know where this is
going, this is not good design
because I've got three stupidly named
variables, c1, c2, c3. But let's make a
point first. The first variable's value
is quote unquote H. Second is I. Third
is exclamation point. Why though am I
using single quotes suddenly instead of
double quotes?
>> It's a character. Chars are single
quotes. Strings are double quotes. And
we'll see the distinction why in a
moment. So for instance, if this is my
grid of memory and this program contains
just three variables, each of them a
char, odds are they'll end up like this
in memory: c1, c2, c3, H, I, exclamation
point. Assuming there's nothing else
going on in my program, they're just
going to end up being back to back to
back in this way, even though that's
not guaranteed. So what does this
really mean is going on? Well, let's go
ahead and poke around. Let me go back to
VS Code here. Let's close scores.c,
reopen my terminal, and let's create a
new program called hi.c and just do
something playful. So let me include
standard io.h at the top. Let me do int
main void after that. And inside of my
curly braces, let's just repeat this: char c1
equals H in caps, char c2 equals I in
caps, and then char c3 equals
an exclamation point.
That's all.
let's actually poke around and see
what's inside the computer's memory. So,
I could do something like this. I could
printf, for instance, percent c, space, percent c, space, percent c,
backslash n, and percent c, it turns out,
means character. So, what do I want to
plug in? c1, c2, and c3, semicolon. So,
let's go ahead and do this: make hi,
Enter, ./hi, and voila, there's my
hi exclamation point. There's no magic
here. Like I'm literally just printing
out three char variables. I don't
need the spaces. If I want to get rid of
those spaces in the word, I can
remake this: make hi, ./hi. And
now we're back in business: HI
exclamation point. But here's where an
understanding of types can give you a
bit of power and sort of satiate some
curiosity. What if I change my percent c
to percent i? Percent i, percent i, percent i. So
int, int, int. Well, it turns out that a char
is really just a number, because it's
stored as a single byte, a number from 0 to 255, namely its ASCII value. So there's
nothing stopping me from telling the
compiler, don't print these as chars,
print them as integers. So let's do make
hi, ./hi. Enter. And that's a
little cryptic. It looks like it's
saying 727,333,
but no, let me add those spaces back in
between each of those placeholders: make
hi again, ./hi, and there are our old
friends 72, 73, 33. It is not necessary in
this case to write (int) in front of each, because the
compiler is smart enough, and printf is
smart enough, that if you hand it a value
that happens to be a char, it's already
going to be treated as an integer,
essentially. So you don't even need to
bother explicitly casting it this way;
we're essentially implicitly casting it
to an integer by using those format
codes as such. All right, so that just
proves that what I've claimed, that
there is this equivalence
between characters and numbers, is
actually the case inside of the
computer's memory. So even though you're
storing hi exclamation point,
technically you're storing three
patterns of eight bits each that give
you these decimal numbers 72, 73, and 33
or specifically these patterns here. All
right, then what is a string? And this
is where things get a little more
interesting. A string, as we've used it, is
like a whole word or a phrase or when we
started class today like a whole
paragraph of text. So that's multiple
values. Now why is that interesting for
us potentially? Well, let's go ahead and
write one line of code as a string. So
here for instance is one line of code
with a string. Let's go ahead and put
that into my program. So I'm going to go
back to VS Code here and clear my
terminal. And I'm going to go ahead and
delete all of this code here for a
moment. And I'm going to do something
like this: string s equals quote unquote
HI!, with double quotes
now. And now, just like in week one, I'm
going to print out percent s backslash n
and print out the value of s. Per earlier,
because string is technically one of our
training wheels for just a few weeks,
I'm going to additionally include cs50.h
at the top so that the compiler knows
what this word string is. All
right, let's go into the terminal: make
hi, ./hi, Enter, and we're back in
business printing that out now as an
entire string. Well, what's going on
inside of the computer's memory this
time? Well, I still have hi exclamation
point, but it's a string now. Well, it
turns out the way that's going to be
laid out in the computer's memory is
exactly like before. There's no mention
of C1, C2, C3 because those variables
don't exist. There's just one variable
S, but it's referring to three bytes of
memory, it would seem. hi exclamation
point. And you can kind of see where
this is going. A string, as a
spoiler, turns out to be actually just what?
An array.
>> It's just going to be an array of
characters. Hence the the dots we're
trying to connect today. So at the
moment though, this is a single variable
s a string. The value of which is hi
exclamation point. But you know what? If
it is in fact an array, I bet we can
start playing around with our new square
bracket notation and see as much in our
actual code. So in fact, let me go ahead
and do this in VS Code. Now let's not
use percent S. Let's use percent C,
percent C, and percent C three times.
Then instead of just S, let's print it
out like it is an array: s bracket zero,
s bracket 1, s bracket 2. Let's go back
to my terminal in VS Code:
make hi, ./hi, and nothing has
changed, but I'm printing it out now one
character at a time because I understand
what's going on underneath the hood. In
this case, I can actually see these
values. Now, let's go ahead and change
the percent C to percent I and add a
space just so it's easier to read.
Percent i space percent i space. I don't
need my casts in parentheses, because
printf is smart enough to do this for
me. make hi again, ./hi. There
again is my 72 73 33. However, that came
from the mere fact that I put in double
quotes hi exclamation point. So, what's
really happening here is it seems that a
string is indeed just an array of
characters.
But how does the computer, when
doing percent s, know what to actually
print? In other words, it stands to
reason that eventually if I've got more
variables, more code, there's going to
be other stuff in the computer's memory.
Why does print f know when using percent
s to stop here and not just keep
printing characters that are over here?
Especially if I did have more variables
and more stuff in memory. Well, let's
take a look at what's just past the end
of this array. Let's go back to VS Code.
And now let's get a little crazy and add
in a fourth percent I. And even though
this shouldn't exist, let's do S bracket
three, which even though it's the number
three, it's the fourth location, but hi
exclamation point is only three values.
So, let's look one location past the end
of this array: make hi, ./hi.
Interesting. It seems, and maybe
it's just luck, good or bad, that the
fourth byte in the computer's memory
seems to be a zero. Well, that's
actually very much by design. And it
turns out if we look a little further by
convention what the compiler will do for
us automatically is terminate, that is,
end, any string we put in double quotes
with a pattern of 8 zero bits. More
succinctly, it's just the number zero,
because if you do out the math, you've
got eight zeros; that gives you zero in
decimal. Or, more technically, the way it's
typically written is this, because it's
not like the number zero that we want to
see on the screen: backslash zero. Similar
to backslash n, it's sort of a special
escape character. This just means
literally 8 zero bits, not the number
zero that you might see in a phone
number or something like that. So even
though we said string s equals quote
unquote HI with an exclamation point,
seemingly three characters, how many
bytes does a string of length three
actually seem to take up in memory?
It's actually going to be four. And
this happens automatically. That's what
the double quotes are doing for you.
They're telling the compiler, "This is
not just a single character. This is a
sequence of characters. Please be sure
to terminate it for me automatically
with a special pattern of 8 bits." And
that special pattern of 8 zits actually
has a name. It's the so-called null
character or null for short. The null
character is just a byte of zero bits,
and it represents the end of a string.
You've actually seen it before, if super
briefly, two weeks ago. Here was our ASCII
chart, and we focused mostly on this
column here and this column here, and
then we looked at the exclamation point
over here. But all this time, over here,
ASCII character zero is NUL, N-U-L, which is just
how you pronounce all eight
zero bits. It's been there this whole
time. So why is it done this way? Well,
how is the computer actually printing
something out in memory? Well, it needs
to know where to stop. printf is pretty
stupid. Odds are, inside of printf
there's just a loop that starts printing
the first character, the next character,
the next character, and it's looking for
the end of the string. Why? Well,
consider what might happen. Suppose
you've got a program that has not just
one string, but two. For instance, two
strings like this. So, in fact, let me
go back to VS Code here, clear my
terminal, and let's just make this
program a little more interesting for a
moment. String t equals quote unquote
by, for instance. And then down here,
let's do two print fs. percent s back
slashn and print out s print f percent s
back slashn print out t. Now to be
clear, percent s means string
placeholder. T and s are just also the
names of the variables. There's no
percent t that we want to use here. All
right, let me go down to my terminal
make high and voila, I get high and by
just like you would have expected last
week. But what's going on inside of the
computer's memory? Well, in so far I
asked I have asked it to create two
variables s and t like this. Odds are
what's happening in the computer's
memory is high is ending up here aka s t
because there's nothing else in this
program is probably going to end up here
b exclamation point but it wraps on this
particular screen. T is taking up 1 2 3
4 five bytes total just as high is
taking up four bytes total because the
compiler is automatically adding for me
the back slashzero the null character to
make clear to other functions where this
string ends.
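printf's real implementation is more involved, but the loop described above can be sketched like this. print_string is a hypothetical stand-in, not a library function: it prints characters until it reaches '\0' and returns how many it printed, so the stopping behavior is easy to check.

```c
#include <stdio.h>

// Hypothetical stand-in for how %s printing works: print one character at a
// time and stop at the null character, which itself is not printed.
int print_string(const char *s)
{
    int i = 0;
    while (s[i] != '\0')
    {
        putchar(s[i]);
        i++;
    }
    return i;  // characters printed, not counting '\0'
}
```

Because the loop stops at the first '\0', two strings stored back to back, as HI! and BYE! are in memory above, print separately rather than running together.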
So what does this mean in real terms,
and why is it zero? Well, why is it zero?
Just because, at the end of the day, all
we have is bits. We've got eight bits to
work with for chars. You've got to pick
some pattern. We could have chosen all
ones. We could have chosen all zeros. We
could have chosen something arbitrary. A
bunch of humans in a room years ago
decided eight zeros will mean the null
character. That's the special character
we use to terminate strings in this way.
Well, what does that mean with our new
syntax? It means we can poke around with
strings as well. So even though that
first variable is s and that second one
is t, you could technically poke around
and access s[0], s[1], s[2], and s[3],
and t[0], t[1], t[2], t[3], and t[4].
So, in fact, if I wanted to dive in
deeply there and actually see that, let
me go back to VS Code here and make a
refinement. I've now got my two strings
here. Down here, just like before, I
could print %c three times for s and %c
four times for t. If I then pass s[0],
s[1], and s[2], and then down here, t[0],
t[1], t[2], and t[3], and I'm doing that
only because the word BYE! is longer
than the word HI!, then when I make hi,
the same principles work even in this
context. But let's add an interesting
twist. If I've got two words in memory,
I could put them in an array, too.
Instead of having s and t, or word one
and word two, I can actually put strings
in an array as well.
So, let's go ahead and do this. Let me
go back to VS Code, and just for fun
now, give me an array called words that's
going to fit two strings. Then in the
first word, words[0], put HI!. Then in
words[1], put BYE!. The only thing new
here is that I'm making an array of
strings now instead of an array of ints.
But all of the syntax is exactly the
same. How can I go about printing these
things? Well, just as before, I can do
printf with %s\n and print out words[0].
Then I can do printf with %s\n and
words[1]. And again, I'm just applying
the same simple syntax that we saw
before. Make hi, ./hi again, and this
sixth version of the program behaves the
same, right? I'm just jumping through
syntactic hoops to demonstrate that these
are just different lenses through which
to look at the exact same idea. And while
a normal person would not do this, we
could think about what's really going on
in memory with arrays of words when those
words themselves are arrays of
characters, because a word is just a
string. So this code here gives us
something like this in memory in that
program a moment ago. This is words[0].
This is words[1]. The only thing that's
different is I'm not calling them s and
t. I've given them one name with two
locations, 0 and 1. Well, if each of
these values is itself a string, and we
said earlier that a string is just an
array, then we can actually think of
these two strings using two sets of
square brackets, even though the syntax
is getting a little crazy: I can index
into my array of words and then index
into the individual letters of that word
by just using more square brackets. And
again, this is just to demonstrate a
point, not because a normal person would
do this.
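A minimal sketch of the array-of-strings idea, again writing char * where CS50 writes string so it compiles without cs50.h. The global words and the function print_word_by_chars are illustrative names; the key point is that words[1][0] is the first character of the second word, and words[1][4] is its terminating '\0'.

```c
#include <stdio.h>

// Two strings in one array; char * is what CS50's string stands for.
// These point at string literals, so read them but don't modify them.
char *words[2] = {"HI!", "BYE!"};

// Prints words[w] one character at a time via words[w][j].
void print_word_by_chars(int w)
{
    for (int j = 0; words[w][j] != '\0'; j++)
    {
        printf("%c", words[w][j]);
    }
    printf("\n");
}
```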
But if I go back to VS Code, instead of
printing out these two strings whole, why
don't I do something like this? printf,
quote unquote, %c%c%c\n. Then let's print
out the first word, but the first
character therein, words[0][0]. Let's
print out the first word, but the second
character therein, words[0][1], then the
first word, but the third character
therein, words[0][2]. And even though
I'm saying first and second and third,
it's 0, 1, and 2 respectively, because
we start counting at zero. And then
lastly here, we can print out the second
word: %c%c%c%c\n, then words bracket...
how do I get to the second word in this
array? words[1]. So words[1][0], the
first character therein; words[1][1],
the second character therein;
words[1][2], the third character therein;
and words[1][3], the last character
therein. And again, this is just to
demonstrate a point, but if I do make hi
now, ./hi, we have full control over
everything that's going on, if you now
agree and understand that an array can be
indexed into with square bracket
notation, as can a string, because a
string is itself just an array. Strings
are arrays, for today's purposes.
Questions, then, on any and all of these
tricks?
No? All right. Yeah, in front.
>> Okay.
How do you like that?
>> How do you establish or create an array?
Well, in the context of this program, if
I go back to VS Code, line six here
gives me an array of size two, an array
of two strings, if you will. The
previous example we were playing with,
which was my scores, uh, whoops, wrong
program, wrong file. If I open up scores.c
as before, this line here, line nine,
gives me an array of n integers.
So, that is what establishes or creates
the array in memory. You specify a name,
the size, and the type.
That's all. And the only thing that's
new today again is the square bracket
notation, which in this context creates
an array of that size. But once it
exists, you can then access that chunk
of memory by using square brackets as
well.
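Declaring an array means giving it a type, a name, and a size, and that size can come from a variable, as with the array of n integers in scores.c. In this sketch, average_scores and its input parameter are hypothetical, standing in for values the lecture's program reads with get_int; it relies on C99 variable-length arrays and assumes n is positive.

```c
// Hypothetical helper: declares an array of n ints (type, name, size),
// writes each element with square brackets, then reads them back.
// Assumes n > 0; a zero-length variable-length array is not allowed.
double average_scores(int n, const int input[])
{
    int scores[n];  // an array of n integers, sized at run time
    for (int i = 0; i < n; i++)
    {
        scores[i] = input[i];  // write element i
    }

    int sum = 0;
    for (int i = 0; i < n; i++)
    {
        sum += scores[i];  // read element i
    }
    return sum / (double) n;
}
```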
Other questions on arrays? Yeah, in
front.
>> Can you put all the values in the array
as you declare it, or do you need to go
index by index?
>> Good question. Do you need to go index
by index to put things inside of an
array? Short answer, no. So, let me open
up scores.c again from before. What I
could have done in an earlier version of
my program would be something like
int scores[] = {72, 73, 33}; all on one
line. And I deliberately didn't show
this because I didn't want to add too
much complexity, but you can use curly
braces in this new way and initialize
the array in one line. And in that case,
you don't even need to specify the size,
because the compiler is not an idiot. If
you've got three numbers on the right, it
can figure out that it only needs three
elements on the left to put them into.
But let me undo that and leave it just as
I did. So, short answer: yes, you can
statically initialize an array if you
know all of the values up front, though
not when you're getting them with
get_int.
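Here is that one-line initialization as a small sketch. The helper score_count is an addition for illustration: dividing sizeof the whole array by sizeof one element recovers the three elements the compiler counted. That trick works only on an actual array like scores, not on a pointer passed into a function.

```c
#include <stddef.h>

// No size in the brackets: the compiler counts the three initializers.
int scores[] = {72, 73, 33};

// Element count = bytes in the whole array / bytes in one element.
// This works on a real array like scores, not on a pointer parameter.
size_t score_count(void)
{
    return sizeof scores / sizeof scores[0];
}
```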
All right. So, if you're on board with
the idea that all a string is is an
array, and that array is always null
terminated, we can now use that knowledge
to solve some simple problems, problems
that others have already solved before
us. So, let me go ahead and close that
file in VS Code and open up another
program here called length.c. And let's
just play around with the length of
strings as follows. Let me include the
CS50 library at the top. Let me include
stdio.h after that. Let me do int main
void after that. And then inside of main,
let's prompt the user for their name by
using get_string and just say "Name: ".
And then after that, let's go ahead and
figure out the length of the person's
name. For D-A-V-I-D, I should get the
answer of five. And for K-E-L-L-Y, we
should get the answer of five. And
hopefully for a longer or shorter name,
we'll get the correct answer as well. So,
how can I go about counting the number
of characters in a string? Well, the
string is just an array, and that array
ends with the null character. There's a
bunch of ways we can do this, but let me
go ahead and do this. Let me create a
variable called n, which eventually will
contain the length of the name. And I'm
going to set it equal to zero because I
don't know anything yet about the length.
I could do this with a for loop, but this
time I prefer to use a while loop. I'm
going to say the following: while
name[n] does not equal \0, go ahead and
add one to the value of n. And then
after all of this, go ahead and print out
the value of n with %i\n. So what's going
on here?
This is easier said when you already know
where you want to go with it, but with
practice, you too can bang this out
pretty quickly. n is going to contain the
length of my string. I have in my loop
here a boolean expression that's just
asking the question: does name at the
current value of n not equal the null
character? In other words, you're asking
yourself, is this character null? Is
this character null? Is this character
null? Is this character null? And if not,
you keep going. You keep going. And this
is kind of a clever trick, because I'm
using n as the index and incrementing it
inside the loop. So when I look at D,
that's not equal to \0, so I increment
n. Now n is one. So I look at name[1].
What's at name[1] if it's my name? A. A
does not equal \0, so it increments n.
What's at location two in D-A-V-I-D? V.
V does not equal \0. So we repeat with
I. We repeat with D. And then we get to
the end of my name, which is the null
character, because the get_string
function put it there automatically for
me. The null character does equal \0, so
n does not get incremented any more. So
at this point in the story, on line 13,
n is still five, because I have not
counted the null character. So I hope I
will see five on the screen. This is just
a very mechanical way of checking,
checking, checking, checking, trying to
figure out through inference how long
the string is, because it's as long as it
takes to get to that \0, the null
character.
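The counting loop from length.c, pulled out into a hypothetical helper named string_length so it can be checked against strlen from string.h. It takes a char * (what CS50's string is under the hood) and, like the loop just described, stops at the first '\0' without counting it.

```c
#include <string.h>

// Hypothetical helper mirroring length.c's while loop: check name[n] and
// increment n until the null character turns up; '\0' itself isn't counted.
int string_length(const char *name)
{
    int n = 0;
    while (name[n] != '\0')
    {
        n++;
    }
    return n;
}

// Cross-check against the library's strlen from string.h.
int agrees_with_strlen(const char *s)
{
    return string_length(s) == (int) strlen(s);
}
```

For instance, string_length("David") and string_length("Kelly") should both be 5, and the empty string should give 0.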
Let's do make length. Enter. ./length.
Type in my name, David. And I indeed get
five. Let's go ahead and do ./length with
Kelly. I indeed get five. And hopefully
for shorter and longer names, I'm going
to get the correct answer, too. In fact,
we can try a corner case: ./length,
Enter, and let's not give it a name at
all. If I just hit Enter here, what
should the length of the person's name
be?
Zero, which is not incorrect. It's
literally true. That's because we're
going to get back, essentially, quote
unquote, the empty string "". But even
though it's quote unquote, in the
computer's memory it's still going to
take up one byte, because the get_string
function will still put null at the end
of the string even if it's got no
characters therein. Now, it turns out
this is not something you need to do
frequently, computing a string's length
with a loop like this. There are better
solutions to this problem. You do not
need to reinvent this wheel yourself,
because in addition to stdio.h and
cs50.h and, as you probably saw in
problem set one, math.h, and perhaps
others, there are other libraries out
there, namely the string library itself.
In fact, if you go into the CS50 manual,
you can look up the documentation for a
header file called string.h, which
contains declarations, that is,
prototypes, for a whole bunch of helpful
functions. In fact, the manual pages for
it are at this URL here. The most
important function, and the one we're
going to use so often for the next few
weeks, is wonderfully called strlen, for
string length. Someone else, literally
decades ago, wrote code that essentially
looks quite like this, but packaged it up
in a function that you and I can use. So
we don't have to jump through these
stupid hoops just to count the length of
a string. We can just ask the strlen
function what the length of a string is.
But odds are, if we looked at the C code
that someone wrote decades ago, it would
indeed look quite like this. So how can I
simplify this program? Well, I can get
rid of all of this code here. I can
include string.h at the top of my file.
And then I quite simply could do
something like this: int length equals
strlen of name. That's going to put the
length in the variable length. Actually,
let's be consistent: int n equals strlen
of name. And then on line nine, let's
print it out. Let's try this: make
length, ./length, David. Okay. Kelly.
Okay. And no one. And zero. It seems to
now be working. So this is a wheel we do
not need to reinvent. And frankly, now,
as a matter of design, I don't really
need the variable n anymore. Recall that
we can nest our functions just like we
did with average before. So let me get
rid of that line and just pass strlen of
name directly to printf, which is
perfectly reasonable here. All right.
Well, what more can we do with this?
Let's consider some other matters of
design. Let me close out length.c and
create another program of our own called
string.c, in which we'll play around now
with this library and others. Let me go
ahead and include cs50.h. Let me go
ahead and include stdio.h. And let me go
ahead and include string.h.
All right, what do I want to do now?
Well, in main, let's go ahead and write a
program that prints a string character by
character, just to demonstrate these
mechanics. So, string s equals get_string,
and I'm going to ask the user for some
input, with a prompt of Input:, because I
just want to play around with any old
string. I'm going to go ahead and
proactively print Output: here, and I'm
deliberately not going to use a newline
character there. Below this, I'm going to
have a for loop, though I could use a
while loop, that says int i equals 0; i
is less than strlen of s, the string I
just got from the human; and increment i
on each iteration. And on each iteration,
print out just one character in that
string, specifically s[i]. And then at
the very bottom of this program, let's
just print a single \n to move the cursor
onto a new line. Long story short, what
have I done? I wrote a stupid little
program that prompts the user for a
string, prints the word Output:
thereafter, and then just prints the word
that they typed in, character by
character by character, until it reaches
the end of the string based on the length
returned by strlen. So, let's go ahead
and run this in my terminal window. I'm
going to do make string, ./string, and
I'll type in my own name from before.
There was a subtlety: I deliberately
wrote two spaces here because, to be
nitpicky, I wanted input and output to
line up perfectly, so you can see what's
happening. Indeed, if I hit Enter here,
now I see the input is David, and the
output is David as well. So that was just
a formatting trick that I foresaw.
Why is this program correct but arguably
not well designed?
It's pretty good in that it's using the
strlen function. I didn't reinvent the
wheel unnecessarily, but there's an
inefficiency that's kind of subtle, and
it relates to how a for loop works. Any
thoughts? This program, I claim, is doing
unnecessary work somewhere.
Yeah.
>> Why do you have to print it character
by character?
>> Okay, that's definitely stupid. You
don't have to output it character by
character. That's just my pedagogical
decision here. So, correct, but not the
answer we're fishing for. There's a
second stupid thing. Yeah.
>> Yes. Every time through this loop, and
this isn't so much my conscious choice
as my mistake, I'm checking the length of
s again and again. Why? Recall how a for
loop works. The initialization happens
once at the very beginning. Then you
check the boolean expression. Then if
it's true, you do the code. Then you do
the update. Then you check the boolean
expression. Then you do the code. Update,
boolean expression, code. But every time
you evaluate this boolean expression,
you're asking, is i less than strlen of
s? And that is a function call. You are
literally calling strlen again and again
and again, and like a crazy person you're
asking the computer, what's the length of
s? What's the length of s? What's the
length of s? It's not going to change.
It's going to be the same no matter what.
So how can we fix this? Well, I could
solve this in a couple of ways. I could,
for instance, down here do int n equals
strlen of s, store it in a variable n,
and just use that. I think that
eliminates the inefficiency, because now
I calculate the length of s once.
It's not going to change, nor is my
variable. So I can now use and reuse that
variable. It's just saving me a little
bit of time, you know, microseconds
maybe. But when you're writing bigger
programs and you're doing things in
loops, if that loop is running not three
times or five, but millions of times, all
of those microseconds, milliseconds might
very well add up. But it turns out
there are some syntactic tricks we can do
too. I alluded to this earlier. If you
want to initialize not one variable but
two, you can actually do it all before
the first semicolon, like that. So now on
line 9, I'm declaring a variable called
i and setting it equal to zero, and I'm
declaring a second variable called n,
also the same type, int, and setting it
equal to the length of s. And now I can
use that again and again. Now, as an
aside, this is a little bit of a white
lie, because smart compilers nowadays are
so advanced that they will often notice
that you're calling strlen again and
again inside of a loop and just fix this
for you, unbeknownst to you. But it's
representative of a class of problems
that you should be able to spot with your
own human eyes and avoid altogether, so
that you don't waste more time and more
compute and, in some sense, more money
than you might otherwise need to. Any
questions on that optimization there?
Yeah.
>> You do not say int again. The
constraint is that you have to use the
same data type for all of your
initialization. So you'd better hope that
you only want ints; otherwise, you've got
to pull it out and do what I did earlier.
Good question.
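To make the repeated-strlen cost visible, this sketch wraps strlen in a hypothetical counted_strlen that tallies its calls. With the length in the condition, a five-character string should trigger six calls: one per check, including the final check that fails. With i and n declared together in the initialization, there should be just one. The sketch uses size_t for both loop variables, since that is strlen's return type, whereas the lecture uses int.

```c
#include <stddef.h>
#include <string.h>

// Hypothetical wrapper around strlen that counts how often it's called.
static int length_calls = 0;

static size_t counted_strlen(const char *s)
{
    length_calls++;
    return strlen(s);
}

// Length in the condition: it's recomputed on every check.
int calls_in_condition(const char *s)
{
    length_calls = 0;
    for (size_t i = 0; i < counted_strlen(s); i++)
    {
        // the body would print s[i] here
    }
    return length_calls;
}

// Length computed once, in the initialization, alongside i.
int calls_when_hoisted(const char *s)
{
    length_calls = 0;
    for (size_t i = 0, n = counted_strlen(s); i < n; i++)
    {
        // the body would print s[i] here
    }
    return length_calls;
}
```

Because counted_strlen has a side effect (the counter), a compiler can't quietly merge those calls the way it sometimes can with plain strlen, so the difference shows up in the counts.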
Others on this?
Yeah.
>> When does it account for spaces?
>> When does it account for spaces? A space
is just a character, ASCII character
number 32. So there's nothing special
about it. It's sort of invisible, but it
is there. It is treated like any other
character. There's no special accounting
whatsoever. The null character, which is
also invisible, is special because printf
and strlen know to look for it as the end
of that value. All right, let's try one
other demonstration of some of these
ideas here. Let me create another file
called, how about, uppercase.c. Let's
write a super simple program that
uppercases a string that the human types
in, and see how we can do this sort of
good, better, and best. So I'm going to
call this file uppercase.c. Inside of
this file, let's use our now friends:
include cs50.h, include stdio.h, and
lastly, how about, string.h. And the goal
here inside of main is going to be to get
a string from the user. So string s
equals get_string, and we're going to
prompt the user with Before:, representing
what it is they typed before we uppercase
everything.
Then I'm going to go ahead after that and
print out, just as a placeholder, After:
and two spaces, just to be nitpicky, so
that the text lines up vertically on the
screen. Now I'm going to do the following:
for int i = 0, n = strlen(s); i < n;
i++, just like before. So I'm just
kicking off a loop that's going to
iterate over the string the human typed
in. Now, if my goal in life is to change
the user's input from lowercase, if
indeed in lowercase, to uppercase, let's
just express that literally: if the
current character in the string, s[i], is
greater than or equal to 'a', and s[i]
is less than or equal to 'z', using
single quotes. This is arguably a very
clever way of asking the question, is it
lowercase? We know from our ASCII chart
from week zero that it has not only
numbers representing all the uppercase
letters but also numbers representing all
the lowercase letters. Lowercase a, for
instance, is 97, and they are all
contiguous thereafter. So we can treat,
just like we did before, chars as ints
and ints as chars, and ask mathematical
questions about these chars, like is s[i]
between 'a' and 'z' inclusive? So if it
is lowercase, and I'll add a comment here
for clarity, "if s[i] is lowercase," what
do we want to do? We want to force it to
uppercase. So here's a little trick I can
do as follows: printf the current
character, but let's do some math on it.
Let's change s[i] by subtracting some
value. Well, what might that value be?
Recall from week zero our ASCII chart
here. And let's focus, for instance, on
the lowercase letters here and the
uppercase letters here. What's the
distance between all upper and lowercase
letters? It's 32, right? And the
lowercase letters are bigger. So it
stands to reason that if I just subtract
32 from a lowercase letter, it's going to
immediately get me to the uppercase
version thereof. So, this is kind of
cool. I can go back to VS Code and
literally subtract the number 32 in this
case, because ASCII is a standard. It's
not going to change.
Else, if the letter is not lowercase,
I'm just going to go ahead and print it
out unchanged, without doing any
mathematics at all to it. And I'll make
that clear with a comment: "else if not
lowercase" makes clear what's going on
there. All right, let me go ahead and
make uppercase in my terminal window,
then ./uppercase. Let's type in my name
all lowercase. And I get back DAVID. Hmm,
minor bug, a couple of bugs actually. Let
me fix my spacing. I think I want another
space after the word After. And at the
very bottom of my program, I think I
want a \n. Now, let's rerun make
uppercase, ./uppercase, Enter, david. And
now it's forcing it all to uppercase.
Meanwhile, if I do it once more and type
in my name capitalized, it's still going
to force everything else to uppercase.
Questions?
>> Your spelling of "After."
>> Oh, I'm an idiot. Okay, thank you. Yes,
I misspelled After; otherwise my
alignment would have worked. So let's do
this again: make uppercase, if only so
that we can prove it's the same, david in
all lowercase. And there we go. That was,
thank you, the intent. All right. So it's
kind of a neat little trick, but this is
kind of tedious, right? Microsoft Word
and Google Docs all have the ability to
toggle case from uppercase to lowercase
or lowercase to uppercase. It's kind of
annoying that you have to write this much
code to achieve something so seemingly
simple and so commonplace.
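A sketch of the subtract-32 approach. Unlike the lecture's program, which prints each character as it goes, this hypothetical uppercase_ascii rewrites a char array in place, so it needs a modifiable array (for example, char name[] = "david"), not a string literal. It relies on ASCII's layout, where 'a' - 'A' is 32.

```c
// Hypothetical helper: uppercases s in place using ASCII arithmetic.
// Lowercase 'a'..'z' sit exactly 32 above 'A'..'Z', so subtracting 32
// converts them; every other character is left unchanged.
// s must be a modifiable array, not a string literal.
void uppercase_ascii(char s[])
{
    for (int i = 0; s[i] != '\0'; i++)
    {
        if (s[i] >= 'a' && s[i] <= 'z')  // is it lowercase?
        {
            s[i] = s[i] - 32;  // same as s[i] - ('a' - 'A') in ASCII
        }
    }
}
```

Given "David 50!", this should produce "DAVID 50!": the space, digits, and punctuation fall outside 'a' through 'z' and pass through untouched.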
Well, it turns out there's a better
approach here, too. In addition to the
string library, there's also the ctype
library, in ctype.h, another header file
with a whole bunch of other useful
functions that relate to characters in
ASCII. So, for instance, if we go ahead
and use this, I'm going to go ahead at the
top of my file here and include ctype.h
now. It turns out there are functions in
it via which I can ask these questions
directly. For instance, in this next
version of the program, I don't need any
of this clever but pretty verbose
comparison. I can just say: if the
islower function, which comes from the
ctype library, passing in s[i], returns
true, then convert the letter to
uppercase by subtracting 32. But you
know, I don't even need to do this mental
math, or math in code. I can also use,
from the ctype library, a function called
toupper, which takes as input a character
like s[i], and let someone else's
function do the work for me. So let me go
back down to my terminal window here. Let
me make uppercase now, ./uppercase, Enter,
and at the Before: prompt type D-A-V-I-D.
This now works too.
But if I really dig into the
documentation for the ctype library,
you'll see that you can just use the
toupper function on any character, and it
will very intelligently only uppercase it
if it is actually lowercase. So someone
else, years ago, wrote the conditional
code that checks if it's between little a
and little z. So knowing this, and you
would see that indeed in the
documentation, I don't even need this
else. I can instead just get rid of this
whole conditional, tighten my code up
significantly here, and simply printf,
using %c, the toupper version of that
same letter, and let the function itself
realize: if it's not lowercase, pass it
through unchanged; if it's lowercase,
change it first and then return it. So
now if I open my terminal window again
and clear it, make uppercase, ./uppercase,
Enter, D-A-V-I-D, and we're back in
business. So again, this is demonstrative
of how, if you find that coding is
becoming tedious or you're solving a
problem that surely someone else has
solved, odds are there is in fact a
library function for it, whether it's
from CS50 or from the standard library,
that you yourselves can use. And unlike
the CS50 library, which is indeed CS50
specific, which is why clang needed to
know about -lcs50, many of these
libraries just automatically work. You
don't need to link in the ctype library,
or many other parts of the standard
library. But for non-standard libraries,
like CS50's training wheels for the first
few weeks, we do need to do that. But
make is configured to do all of that
automatically for you.
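The ctype.h version, sketched as a hypothetical print_uppercase function. toupper returns the uppercase form of a lowercase letter and returns any other character unchanged, so the if/else disappears. The cast to unsigned char is a common precaution, since the ctype functions expect values representable as unsigned char (or EOF).

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

// Hypothetical helper: prints s in uppercase. toupper returns the uppercase
// version of a lowercase letter and returns any other character unchanged,
// so no if/else is needed.
void print_uppercase(const char *s)
{
    for (int i = 0, n = strlen(s); i < n; i++)
    {
        // ctype functions expect an unsigned char value (or EOF), hence the cast.
        printf("%c", toupper((unsigned char) s[i]));
    }
    printf("\n");
}
```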
All right, in our final minutes
together, let's go ahead now and reveal
some of the details we've been sweeping
under the rug about main. I asked in week
one that you just sort of take on faith
that you've got to write the int, you've
got to write the void, and all of that.
Well, let's see why that actually is. So,
main is special insofar as, in C, it is
the function that will be called
automatically after you've compiled and
then run your code. Not all languages
standardize the name of that function,
but C and C++ and Java and certain other
ones do. In this case, here is the most
canonical, simple form of main. We know
that including stdio.h just gives us
access to the prototypes for functions
like printf.
But what's going on with int, and what's
going on with void? Well, void in
parentheses here just means that main,
and in turn all of the programs we've
written up until this moment, do not take
command-line arguments. Literally every
program we've written: ./a.out, ./hello,
./scores, ./ everything else. I have never
once typed another word after the name of
the programs that we've written in class.
That is because every program has void
inside of these parentheses, telling the
computer this program does not take
command-line arguments, words after the
program's name. That is different from
make and code and cd and other commands
that you've typed with words after their
names at the prompt. But it turns out
the other supported syntax for the main
function in C can look like this too,
which at a glance looks like kind of a
mouthful, but it just means that main can
take zero arguments or it can take two.
If it takes two, the first is an integer
and the second is an array of strings. By
convention, those inputs are called argc
and argv. argc is the count of arguments
typed at the prompt, including the
program's name itself. argv is the
argument vector, aka the array of actual
words. In other words, now that we have
the ability to use arrays, we can get
zero or one or two or three or more words
from users at the prompt when they run
our own programs. So what do I mean by
this? We can now write programs that
actually take command-line arguments, as
follows. Let me go into VS Code here and
close our old program, uppercase. Let's
write a new, simpler program here in my
terminal called greet.c, and just greet
the user in a couple of different ways.
So I'm going to include, initially,
cs50.h, and then I'm going to include
stdio.h here. Then I'm going to say int
main void, without introducing anything
new just yet. I'm going to ask the user,
like we did last week, for a return value
from get_string, asking them "What's your
name?" as we've done so many times. Then
I'm going to printf "hello, %s\n",
spitting out their answer as follows.
Same program as last week. Again, I'm
going to make greet. I'm going to run
./greet, and I'm prompted now for my
name. I hit Enter. Notice that I did not
pass any command-line arguments. The only
command I ran was ./greet, no other
words. Let's now use this new trick and
actually let the user type their name
when they're running my program, rather
than waste their time by using get_string
and prompting them. Let me go into my
editor here. Let's get rid of the CS50
library. Let's get rid of my use of
get_string, and let's simply change void
to int argc, then string argv, open
bracket, close bracket. That's all. Down
here, let's simply print out argv[1], for
reasons we'll soon see. The only change
I'm really making, then, is changing the
prototype for main from the first
version, which we've been using for a
week and a bit now, to the second
version, which is the only other version
supported. I'm going to go back to my
terminal window now. Make greet, and darn
it. So close. How do I fix the mistake I
accidentally made? Yeah, in back. Oh, no,
in front.
>> Yes, I should have kept the CS50
library, because it's in the CS50 library
that string is defined. So, include
cs50.h. In week four, we will delete that
line for real and actually show you what
string actually is. I promised at the
start of class that string is a term of
art, but it's not a keyword in C, and
we'll see what it means in a couple of
weeks' time. Okay, let me fix this: make
greet, ./greet, but now, before I even
hit Enter, I'm going to type my actual
name, and when I hit Enter, now I see
hello, David. If I instead do ./greet
Kelly, Enter, now I see hello, Kelly. If
I do nothing, just ./greet, Enter, I just
see hello, (null), which is not the same
null as before. This is N-U-L-L, for
reasons we'll come back to before long,
but clearly printf knows something's
going on: there's no actual word there.
Why, though, did I do argv[1]? Well, it
turns out that, just as a feature of C,
if I change that to argv[0], recompile
this program, and do ./greet and type in
nothing else, I'm going to see something
kind of curious: hello, ./greet.
Because automatically the zero location
in the arg variable will automatically
contain the program's own name. Why is
this useful? If you ever want to do
something self-referential like thanks
for running my program or you want to
show documentation for your program and
the name of your program that it depends
on whatever the file itself is called,
you can use argv bracket zero which will
always contain the program's name no
matter what the file has been named or
renamed to. But we can fix that null
issue now in a couple of ways. So arg c
is the other input that I said now can
exist which is the count of arguments at
the prompt. So if I want to check if the
user actually typed their name, I could
say something like if arg c equals
equals 2. Well then and only then go
ahead and print out their name. Else
let's just do some clever default like
print f quote unquote hello world or
heck nothing at all. This version of the
program now is a little smarter because
when I run make greet and dot /gre of my
name works exactly as intended. But if I
forget and only dot slashgreet it's
going to say hello world. Moreover, if I
don't quite cooperate and I say David
Men enter, it similarly just ignores me
because arg count is not two anymore.
It's now three. So argc contains the
total number of words at the prompt, but
the first one is always the program's
name. Question?
>> Sorry, can you say that once more, a
little louder?
Why is it information that we just have?
>> Oh, so the short answer is just
because of the definition of C. If you
look up the documentation for C, you can
either define main as taking no arguments,
with the word void, or you can specify
that main takes two arguments, and the
compiler and the operating system will
just ensure that, if you declare those two
variables, argc and argv will be filled
with those two values automatically.
Someone else decided that, though; that's
just the way it works. You can't put
three there. You can't put four there.
You can change the names of those
variables but not the types, because of
this convention. So there's one last
feature of main, then: the actual value
it returns. Up until now, every program
I've written starts with int main
something, int main something. What is
that int? We have yet to use it.
Technically, the value that main returns
is what's called an exit status, which is
a numeric status that indicates success
or failure. Numbers
are everywhere in the world of
computing. So for instance here's a
screenshot from Zoom whereby if
something goes wrong with Zoom like you
have bad internet connectivity or
something like that you might see an
error code like 1132. That means nothing
to normal people unless you Google it,
look up the documentation, but it means
something very much to the software
engineers who wrote this code because
they know, oh shoot, 1132 means this
error and they probably have a
spreadsheet or a cheat sheet somewhere
that converts those codes to actually
useful error messages. And frankly, in a
better world, they would just tell you
what the problem is rather than just say
report the problem and mention this
number. That said, on the web, odds are
you're familiar with this number 404,
which is also a weird thing for so many
normal people to know, but this
generally means file not found. It's a
numeric code that signifies that
something has gone wrong. Exit status
isn't quite this, but it's similar in
spirit. In main, you can return a value
like zero or one or two or something
else to indicate whether something was
successful or not. By convention, a
program, or a function like main, returns
zero on success, if all is well. And that
leaves you with a couple hundred other
values (1 through 255, on Linux) to
signify things that can go wrong,
because you could return one to signify
one thing, two to signify another, three
to signify another, and so long as you
have a spreadsheet or a cheat sheet or
something, you can just keep track, as the
programmer, of which error means what. So
what does this mean in real terms? Well,
if I go over to VS Code here, let me
implement a relatively simple program,
our last, called status.c.
So in status.c, I'm going to go ahead
and use the CS50 library at the top, the
standard I/O library at the top, and
then, with our new format, int main(int
argc, string argv[]), I'm going to now
do the following inside of main. If argc
does not equal 2, then I'm going to go
ahead and print out, this time, a
warning. I'm not going to have some silly
default like "hello, world." Let's tell
the user that they didn't use my program
correctly. So I'm going to say printf
"Missing command-line argument", and
we'll assume they know what that means.
Then, to signify an error, I'm going to
say return 1. It could be two, it could
be three, but this is the first possible
error, so I'm going to start simple with
one. Otherwise, if argc does equal 2 and
I get to this part of my code, I'm going
to say "hello, %s\n" and pass in argv
bracket 1, just like before. And just to
be super specific, I'm going to return 0
to tell the computer, the operating
system, that this is success.
Zero signifies success. Any other value
signifies error. Let's make status now.
Let's do ./status. And this is a little
magical, but let me go ahead and
cooperate initially. I'm going to type
in my name, David, and I'm going to see
"hello, David." Most people wouldn't
know this, but among the commands you can
type at your terminal is this one here,
and the TFs and CAs would do something
like this: after running your code, we
can do echo, space, dollar sign, question
mark, that is, echo $?, and we can see,
secretly, the value that your program
returned, zero in this case. Meanwhile, if
we do this again, ./status, and let me
not type my name this time. When I do
this, I see "Missing command-line
argument." What value should the code
have returned, then? One. So let's see:
echo $?. There's the one. So even after
just one week of CS50, if you've ever
wondered how check50 knows if your code
was correct or not, among the ways we
check for that is by checking this
semi-secret status code, this exit
status, which isn't really a secret.
It's just not displayed to normal people,
because it's not all that enlightening
unless you're the software developer who
wrote the code in question. But this
means we could return one in some cases,
or two in other cases, or three or four
in yet others. And these command-line
arguments are sort of everywhere. In
fact, a program I skipped over a moment
ago was going to be this. There's no
academic value to what you're about to
see, but another program that takes
command-line arguments is known as cowsay.
And this is sort of very famous in
computing circles because it's been on
systems for many years. cowsay is a
program that allows you to type in a
word after the prompt, like moo, and it
will print out what's called ASCII art:
an adorable little cow with a speech
bubble that says moo. So, kind of
evocative of Scratch. But it takes
other command-line arguments, not just
the words that you want to come out of
its mouth, but even the appearance that
you want it to have. So for instance, I
can say -f duck and run it again. Enter.
And now I have a cute little duck saying
moo, which is a bit of a bug, so let me
change that to quack, for instance,
instead. And again, no academic value
here. It's just fun to now play with the
various options. But if we really want
to have fun with this, we can do another
one: cowsay -f dragon. And we can say
something like rawr. And now we have
this crazy dragon appearing on the
screen. Which is to say, again, no value
here. It's just fun to play with
command-line arguments sometimes. And how
is cowsay doing this? Well, someone wrote
code, maybe in C or some other language,
using argc and argv and poking around
at their values, and maybe a conditional
that says if the -f value is dragon, then
print this graphic, else if the value is
duck, then print this other one. It all
boils down to the same fundamentals from
week zero: functions and conditionals
and loops and Boolean expressions and
the like. They're just being composed
into more and more interesting things. And
indeed, in closing, among the other
interesting things we'll play with this
week, to come full circle, is
cryptography: the art of scrambling
information so as to have secure
communication, so important nowadays
with passwords and credit card numbers
and personal messages that you might
want to send. We'll have you explore,
through code, some of the algorithms via
which you yourselves can encrypt
information. There are a number of
ways we can do this form of encryption,
and they all boil down to this mental
model. You've got some input, like the
message you want to send, and you want to
encipher it somehow, encrypt it somehow,
so that no one knows what message you've
sent. So you want your plaintext, which
is the human-readable version in English
or any other language, to become
ciphertext ultimately. So the code you'll
be writing this week is what goes inside
of this black box: some kind of cipher, an
algorithm that encrypts information so
that you can do exactly this. Now the
catch is that you can't just give it
plaintext, run it through an algorithm,
and get ciphertext, because you typically
need to somehow have a secret for
encryption to work. Like, if I'm going to
send a message to someone in back, well,
I could just randomize the letters that
I'm writing down. But how would they know
how to reverse that process? Probably
what we need to do is agree in advance
that, you know what, I'm going to change
every A to a B, and every B to a C, and a
C to a D, and a Z to an A. I'll wrap back
around at the end of the alphabet. It's
not very sophisticated, but no middle
school teacher who intercepts two kids
passing notes in class is going to waste
time trying to figure out this cipher.
But it does presuppose that there's a
secret between them, the number one in
that case, because I'm changing every
letter by one place. So how might this
work? Well, if I want to encrypt the word
HI, H-I exclamation point, and my secret
key, which I've agreed on with someone in
advance, is one, I should send the
ciphertext I-J exclamation point. Now,
this is a simple cipher, so I'm not
really encrypting the punctuation, which
may or may not be a good thing, but I am
at least encrypting the alphabetical
letters. But what does the recipient then
have to do to decrypt this message? When
they see on paper I-J exclamation point,
how do they know what I said? Well, they
use that same key but subtract. So B
becomes A, C becomes B, A becomes Z, and
so forth, essentially inverting the key
from positive one to negative one. Of
course, slightly more secure than a key
of one would be 13. And in fact, in
computing circles, 13 has special
significance. ROT13, R-O-T-13, is an
algorithm that's been used for many years
online just to sort of avoid spoilers.
Reddit might do this, or other websites
where they want you to have to make some
effort to see what the message says.
Because 13 is half of 26, rotating by 13
twice gets you back where you started, so
the same operation both encrypts and
decrypts. But it's not all that hard. You
just have to click a button or write the
code that actually does this.
But if you use 13 instead, you wouldn't
get I-J. You'd get U-V, because U and V
are 13 places away from H and I,
respectively. But again, we're not
touching the punctuation. Or we could
send something more personal, like I
love you, and the message comes out like
that. Slightly more secure than that
would be ROT26?
No.
>> No. Why? Because it's the same thing.
It literally rotates all the way around.
A becomes A, B becomes B. So there's a
limit to this. But more seriously, that
speaks to just how strong this
encryption is or is not. Because if you
think about this now from an adversary's
perspective, like the teacher in the
room intercepting the slip of paper, how
much work do they need to do? Well, they
just try all possibilities: key of one,
key of two, key of three, dot dot dot,
key of 25. And at some point, they will
see clearly that they guessed the key,
which means that cipher is not very
secure. Nonetheless, what we're talking
about is historically known as the
Caesar cipher, because back in the day,
when Caesar, by legend, was communicating
with his generals, if you're the first
human on Earth to come up with
encryption, or to come up with this
specific cipher, it doesn't really
matter how simple it is if no one else
knows what's going on. Nowadays, it's
not hard at all to write some C code, or
code in any other language, that could
just brute force its way through this.
So there are much more sophisticated
algorithms nowadays than simple
rotations of letters of the alphabet, as
we'll soon see. But when it comes to
decryption, it really is just a matter
of reversing that process. So this
message here, if we rotate all the
letters in the opposite direction by
subtracting one, will be our final
flourish for today. There's a bit of a
hint there, which will reveal that this
message, our final words for us as the
clock strikes 4:15, is going to be this:
the U becomes T and the I becomes H. Um,
I'm the only one who finds this amusing.
T-H-I-S W-A-S C-S-50. And this was CS50.
We'll see you next time.
All right, this is CS50. This is week
three. And this was an artist's rendition
of what various sorting algorithms look
and sound like. Recall from week zero
that an algorithm is just step-by-step
instructions for solving some problem. To
sort information, as in the real world,
just means to order it, from smallest to
largest or alphabetically or by some
other heuristic. And it's among the
algorithms that we're going to focus on
today, in addition to searching, which of
course is looking for information, as we
did in week zero too. Among the goals
for today is to give you a sense of
certain computer science building
blocks. There are a lot of canonical
algorithms out there that most anyone
who has studied computer science would
know, and that anyone who leads a tech
interview might ask about. But more
importantly, the goal is to give you
different mental models and methodologies
for actually solving problems, by giving
you a sense of how these real-world
algorithms can be translated to actual
computers that you and I can control. We
thought we'd begin today with an
actual algorithm for, sort of, taking
attendance. We of course do this with
scanners outside, but we can do it old
school, whereby I just use my hand or my
mind and start doing 1 2 3 4 5 6 7 8 9
10 11 12 and so forth. That's going to
take quite a few steps, because I've got
to point at and recite a number for
everyone in the room. So I could kind of
do what my grade school teachers taught
me, which is count by twos, which would
seem to be faster. So like 2 4 6 8
10 12 14 16 18 20. And clearly that
sounds and is actually faster. But I
think with a little more intuition and a
little more thought back to week zero, I
dare say we could actually do much
better than that. So, if you don't mind,
I'd like you to humor us by all standing
up in place and thinking of the number
one, if you could, and join us in this
here algorithm. So, stand up in place and
think of the number one. So, at this
point in the story, everyone should be
thinking of the number one. Step two of
this algorithm for you is going to be
this: pair off with someone standing,
add their number to yours, and remember
the sum.
Go.
Okay. At this point in the story,
everyone, except maybe one lone person if
we've got an odd number of people in the
room, is thinking of what number?
>> Two.
>> Okay. So next step: one of you in each
pair should sit down.
Okay, good. Never seen some people sit
down so fast. So those of you who are
still standing, the algorithm's still
going. So the next step for those of you
still standing is this: if still
standing, go back to step two.
Ergo, repeat, or loop, if you could.
And notice, if you've gone back to step
two, that leads you to step three. That
leads some of you to step four, which
leads you back to step two. So this is a
loop.
Keep going. If still standing, pair off
with someone else still standing. Add
together, and then one of you sit down.
So with each passing second, more and
more people should be sitting down
and fewer and fewer are standing. Okay,
almost everyone is sitting down. You're
getting farther and farther away from
each other. That's okay. I can help with
some of the math at the end here.
All right, I see a few of you still
standing, so I'll help out and I'll
join you together. So, I see you in the
middle here. What's your number?
>> 32.
>> 32. Okay, go ahead and sit down, and
I'll pair you off with... What's your
number?
>> 20.
>> Okay, you can go ahead and sit down.
Uh, who's still
You're still standing?
>> 27.
>> 27. Okay, you can sit down.
>> You guys are still adding together.
Who's going to stay standing? Okay.
What's your number?
>> The worst part is doing like arithmetic
across a crowded room, but
>> 27.
>> 27. Also
>> 47.
>> 47. Okay, you can sit down. Is anyone
still standing? Yeah,
>> 15.
>> Nice. 15. Okay, you can sit down. Anyone
still standing?
Okay, so all I've done is sort of
automate the process of pairing people
up at the end here. When I hit Enter, we
should hopefully see... oh, the numbers
are a little... what's going on there?
There we go. When I hit Enter, we'll add
together
all of the numbers that were left. And
if you think about the algorithm that we
just executed, each of you started with
the number one, and then half of you
handed off your number. Then half of you
handed off your number. Then half of you
handed off your number. So theoretically
all of these ones with which we started
should be aggregated into the final
count which if this room weren't so big
would just be in one person's mind and
they would have declared what the total
number of people in the room is. I'm
going to speed that up by hitting enter
on the keyboard. And if your execution
of this algorithm is correct, there
should be
141 people in the room. According to our
old school human though, Kelly, who did
this manually, one at a time, the total
number of people in the room, according
to Kelly, if you want to come on up and
shout it into the microphone, is of
course going to be
>> I don't know, something around 160, I
think.
>> 160. So, not quite the same, but
that's pretty good. Okay, round of
applause for your accuracy.
Okay, so ideally counting one at a time
would have been perfectly correct. So,
we're only off by a little bit. Now,
presumably that's just because of some
bugs in execution of the algorithm.
Maybe some mental math didn't quite go
according to plan. But theoretically,
your third and final algorithm wherein
you all participated should have been
much faster than my algorithm or Kelly's
algorithm whether or not we were
counting one at a time or two at a time.
Why? Well, think back to week zero when
we did the whole phone book example,
which was especially fast in its final
form because we were dividing and
conquering, tearing half of the problem
away, half of the problem away. And even
though it's hard to see in a room like
this, it stands to reason that when all
of you were standing up, we took a big
bite out of the first problem and half
of you sat down, half of you sat down,
half of you sat down, and theoretically
there would have been, if you were
closer in space, one single person
with the final count. So let's see if we
can't analyze this just a little bit by
considering what we did. So here's that
same algorithm here. Recall that this is
how we motivated week zero's demonstration
of the phone book, whether in physical
form or in digital form as you might see
on an iPhone or Android device, looking
for someone, for instance, like John
Harvard, who might be at the beginning,
middle, or end of said phone book. We
analyzed that algorithm just as we can
now analyze this one. So my very first
verbalized algorithm, 1 2 3 4, you could
draw as a straight line, because the
relationship between the number of
people in the room and the amount of time
it takes is linear. It's a straight line:
with each additional person in the room,
it takes me one more step. So if you
think back to high school math, there's
a slope of one there. And so this n,
denoting the number of people in the
room, is indeed a straight line. On the
x-axis, as in week zero, we have the size
of the problem in people, and on the
y-axis the time to solve it, in steps or
seconds or whatever your unit of measure
is. If and when I started counting two at
a time, 2 4 6 8 10 and so forth, that
still is a straight line, because I'm
taking two bites consistently out of the
problem, until maybe the very end, where
there's just one person left. It's still
a straight line, but it's strictly
faster. No matter the size of the
problem, if you sort of draw a line
vertically, you'll see that you hit the
yellow line well before you hit the red
line, because it's moving essentially
twice as fast.
But that third and final algorithm, even
though in reality it felt like it took a
while and I had to kind of bring us to
the exciting conclusion by doing some of
the math, looked much more like our
third and final phone book example.
Because if you think about it from the
opposite perspective, suppose there were
twice as many people in the room. Well,
it would have taken you all,
theoretically, just one more step. Now,
granted, that's one more loop, and there
might be some substeps in there, if you
will, but it's really just fundamentally
one more step. If the number of people in
the room quadrupled, four times as many
people, well, that's two more steps.
Equivalently, the amount of time it
takes to solve the attendance problem
using that third and final algorithm
grows very slowly, because it takes a
huge number of additional people in the
room before you even begin to feel the
impact of that growth. And so today,
indeed, we're going to talk not only
about the correctness of algorithms but
about their design as well, just as we
have with code, because the smarter you
are with your design, the more efficient
your algorithms ultimately are going to
be and the slower their cost is going to
grow. And by cost I mean time, like here;
maybe it's money; maybe it's the amount
of storage space that you need. Any
limited resource is something that we
can ultimately measure, and we're not
going to do it very precisely. Indeed,
we're going to use some broad strokes
and some standard mechanisms for
describing ultimately the running time:
the amount of time it takes for an
algorithm, or in turn code, to actually
run. So, how can we do this? Well, last
week, recall, we set the stage for
talking about something called arrays,
which were the simplest of data
structures inside of a computer, where
you just take the memory in your
computer and you break it up into chunks
and you can store a bunch of integers, a
bunch of strings, whatever, back to back
to back to back. And that's the key
characteristic of an array. It is a
chunk of memory wherein all of the
values therein are back to back to back,
so right next to each other in memory.
So we drew this fairly abstractly by
drawing a grid like this, and I said,
well, maybe this is byte zero and this is
byte one billion, or whatever the total
amount of memory is that you have. We
zoomed in and looked at a little
something like this, a canvas of memory.
We talked about what and where you can
put things. But today let's just assume
that we want seven chunks of memory for
the moment. And inside of them we might
put something like these numbers here. Well,
the interesting thing about computers is
that, if I were to ask you all to find
the number 50 in this array, our minds
quickly see where it is, because we sort
of have this bird's-eye view of the whole
screen, and it's obvious where 50 is. But
the catch with computers and with code
that we write is that really these
arrays, these chunks of memory, are
equivalent to a whole bunch of closed
doors. And the computer can't just have
this bird's-eye view of everything. If
the computer wants to see what value is
at a certain location, it has to do the
metaphorical equivalent of going to that
location, opening the door, and looking,
then closing it and moving on to the
next. That is to say, a computer can
only look at or access one value at a
time. Now, that's in the simplest form.
You can build fancier computers that
theoretically can do more than that, but
all the code we write generally is going
to assume that model. You can't just see
everything at once. You have to go to
each location in these here lockers, if
you will. Starting today, too, when we
talk about the locations in memory, we're
going to use our old zero-indexing
vernacular; that is to say, we start
counting from zero instead of one. So
this will be locker zero, locker one,
locker two, dot dot dot, all the way up
to locker six. So just ingrain in your
mind that if you hear something like
location six, that's actually implying
that there are at least seven total
locations, because we started counting
at zero. So that's intentional.
Um, we don't have yellow lockers in the
real world, so we're going to make this
metaphor red instead. We do have
these lockers here. And suppose that
within these seven lockers, physically on
stage, we've put a whole bunch of money,
Monopoly money, if you will. The
goal initially here is going to be to
search for some specific denomination of
interest and use these physical lockers
as a metaphor for what your computer's
going to do and what your code
ultimately is going to do. If we're
searching for the solution to a problem
like this, the input to the problem at
hand is seven lockers, all of whose
doors are metaphorically closed. The
output we want is a bool: a true or
false answer. Yes, that number is there,
or no, it is not. So inside of this black
box today is going to be the first of
our algorithms, step-by-step instructions
for solving some problem, where the
problem here is to find, among all of
these dollar bills, specifically the $50
bill. If we could get two volunteers to
come on up who are ideally really good
at Monopoly. Okay.
How about over here in front? And how
about, let me look a little farther in
back. Okay, over there in back.
Come on down. All right. As these
volunteers kindly come down to the
stage, we're going to ask them in turn
to search for specifically the $50 bill
that we've hidden in advance. And if
my colleague Kelly could come on up too,
because we're going to do this twice:
once searching with one algorithm and a
second time with another. Let me go
ahead and say hello, if you'd like to
introduce yourselves to the group.
>> Hey, I'm Jose Garcia.
>> Hi, I'm Caitlyn Cow.
>> All right, Jose and Caitlyn. Nice to
meet you both. Come on over, and let me
go ahead and propose, Jose, that the
first algorithm I'd like you to do
is to find the number 50. And let's keep
it simple. Just start from the left and
work your way to the right. And each
time you open a door, stand over
to the side so people can see what's
inside, and just hold the dollar amount
up for the world to see. All right, the
floor is yours. Find us the $50 bill.
20.
>> Shut it.
>> No, that's good. That's good acting,
too. Thank you. No, you can shut it just
like the computer. All right.
No. Very clear. Thank you.
Still no. $10 bill.
Next locker.
$5 bill. Not going well.
$100 bill, but not the one we want.
This one. Hm, $1 bill. Still no 50. Of
course, you've been sort of set up to
fail, but here, amazing. A round of
applause. Jose found the $50 bill.
All right. So, let me ask you, Jose: you
found the $50 bill, but it clearly took
you a long time. Just describe in your
own words, what was your algorithm, even
though I nudged you along?
>> Yeah. So, my algorithm was basically:
walk up to the first door available,
open it, check if the dollar bill was
the dollar bill that I was looking for,
and then put it back, and then go to the
next one.
>> Okay. So, it's very reasonable, because
if the $50 bill were there, Jose was
absolutely going to find it eventually,
if slowly. In the meantime, Kelly's
going to kindly reshuffle the numbers
behind these doors here. And even though
Jose took a long time here, I mean,
would it have been smart to start from
the other end instead, do you think?
>> Um, not necessarily, because we don't
know if the 50 is going to be at that
end.
>> Exactly. So, he could have gotten lucky
if he sort of flouted my advice and
didn't start on the left but instead
started on the right. Boom, he would
have solved this in one step. But in
general that's not really going to work
out. Maybe half the time it will; you'll
get lucky. Half the time it won't. But
that's not really a fundamental change
in the algorithm, whether you go left to
right or right to left. To Jose's point,
if you don't know anything a priori about
the numbers, the best you can probably
do is just go through linearly, left to
right or right to left, so long as
you're consistent. Now, could you have
jumped around randomly?
>> Uh, I guess I could have, but again,
if they weren't in any specified
order, I don't think it would have
helped either.
>> Yeah. So, additionally, if he just
jumped around in random order, he might
get lucky, and it might be in the very
first one; it might have taken fewer
steps ultimately. But presumably you're
going to have to then keep track of
which locker doors you've opened. So
that's going to take some memory or
space. Not a big deal with seven
lockers, but if it's 70 lockers, 700
lockers, even random probably isn't
going to do the best job.
So, let me go ahead and take the mic
away and hand it over to Caitlyn. You
can stay on the stage with us. Caitlyn,
what I'd like you to do is approach this
a little more intelligently by dividing
and conquering the problem, but we're
going to give you an advantage over
Jose. Kelly has kindly sorted the
numbers from smallest to largest, from
left to right.
>> So, accordingly, what's your strategy
going to be?
>> Start in the middle.
>> Okay, please.
And go ahead, as before, and reveal to the
audience what you found. Not the 50, the
20. But what do you know, Caitlyn, at
this point?
>> It'll be in... on the left? Is it left?
>> Correct. So the 20 is going to be to the
left. So where might you go next with
this three-locker problem? Let me
propose that you maybe go to the middle
of the three.
>> There we go. The middle of the middle.
Like, that would have been good. But
let's...
>> Oh no.
>> Oh no. It's a 100 instead. You failed.
But what do you now know?
>> It's in the middle.
>> I should have just let you do that. But
now we have a big round of applause for
Caitlyn for having found the 50 as well.
Okay.
So, the one catch with this particular
demo is that because they know,
presumably, what the Monopoly money
denominations are, because we just did
this exercise and we had the whole cheat
sheet on the board, you probably had
some intuition as to where the 50
was going to be, even though I was
trying to get you to play along. But in
the general case, if you don't know what
the numbers are, or that they're the
specific denominations, but you do know
that they're going from smallest to
largest, going to the middle, then the
middle of the middle, then the middle of
the middle again and again would have
the effect of starting with a big
problem and halving it, halving it,
halving it, just like the phone book as well.
So, thanks to you both. We have these
wonderful parting gifts that we found in
Harvard Square. If you like Monopoly,
you'll love the Cambridge edition, filled
with Harvard Square spots. So, thank you
to you both, and a round of applause for
our volunteers here.
>> All right. So, let's see if we can't
formalize a little bit these two
algorithms known as linear search in so
far as Jose was searching essentially
along a line left to right and binary
search by implying two because we were
having that problem in two again and
again and again. So for instance with
linear search from left to right, or
equivalently right to left, we could
document our pseudo code as follows: for
each door from left to right, if the 50
is behind the door, well, then we're done.
Just return true. That's the Boolean
value, which was the goal of this
exercise: to say yes, here is the 50.
Otherwise, at the very bottom of this
pseudo code, we could just say return
false, because if you get all the way
through the lockers and you have never
once declared true by finding the 50,
you might as well default at the very
end to saying false. I did not find it.
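As a rough sketch, here is how that pseudo code might look as a C function. The name linear_search, the size_t length parameter, and the use of bool from stdbool.h are illustrative choices, not the course's own code. The loop visits indices 0 through n - 1, and return false sits after the loop rather than inside an else.

```c
#include <stdbool.h>
#include <stddef.h>

// Linear search: look behind each door from left to right,
// i.e. indices 0 through n - 1.
bool linear_search(const int doors[], size_t n, int target)
{
    for (size_t i = 0; i < n; i++)
    {
        if (doors[i] == target)
        {
            // Found it; returning ends the function right here.
            return true;
        }
    }

    // Only reached after every door has been checked. Putting this
    // in an "else" inside the loop would give up after the first door.
    return false;
}
```

Given the doors 20, 500, 10, 5, 100, 1, 50, a search for 50 returns true only after checking all seven, while a search for 1000 falls out of the loop and returns false.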
But notice here, just like in week zero
when we talked about pseudo code for
searching the phone book, my indentation
of all things is actually very
intentional. This version of this code
would be wrong if I instead used our old
friend if-else and made this conditional
decision. Why is this code, now in red,
wrong in terms of correctness? Yeah?
>> If it's not behind the first door, it'll
return false.
>> Exactly. Because if the number 50 is not
behind the first door, the else is
telling you right then and there, return
false. But as we've seen in C code,
whenever you return a value, that's
it for the function. It is done doing
its work. And so if you return false
right away, not having looked at the
other six lockers, you may very well get
the answer wrong. So the first version
of the code, where there wasn't an else
but rather this explicit line of code
at the very end that just says, if you
reach this line of code, return false,
addresses that problem. And to be
clear, even though it's right after an
indented return true, when you return a
value, as in C, that's it: execution
stops at that point, at least for the
function or, in this case, the pseudo code
in question. All right, so here's a more
computer sciency way of describing the
same algorithm. And even though it
starts to look a little more arcane, the
reality is when you start using
variables and sort of standard notation,
you can actually express yourself much
more clearly and precisely, even though
it might take a little bit of practice
to get used to. Here is how a computer
scientist would express that exact same
idea. Instead of saying for each door
from left to right, we might throw some
numbers on the table. So for i, a
variable, from the value zero
on up through the value n minus one (that's
what this shorthand notation means), if 50
is behind doors bracket i, so to speak.
So now I'm sort of treating the notion
of doors as an array using our notation
from last week. If 50 is behind doors
bracket i, return true. Otherwise, if you
get through the entirety of that array
of doors, you can still return false. Now
notice here n minus one seems a little
weird because aren't there n doors? Why
do I want to go from 0 to n minus one
instead of 0 to n? Yeah,
>> because zero is the first block.
>> Exactly. If you start counting at zero
and you have n elements, the last one is
going to be addressed as n minus one,
not n, because if it were n, then you'd
actually have n + 1 elements, which is
not what we're talking about. So again,
it's just standard notation, and it's a
little terser this way. It's a little
more succinct, and frankly it's a little
more adaptable to code. And so what
you're going to find is that as our
problem sets and programming challenges
that we assign get a little more
involved, it's often helpful to write
out pseudo code like this using an
amalgam of English and C, and eventually
Python code, because then it's way easier
afterward to just translate your pseudo code
into actual code if you're operating at
this level of detail. All right. So, in
the second algorithm, Caitlyn
kindly searched for 50 again, but Kelly
gave her the advantage of sorting the
numbers in advance. Now, she doesn't
have to just resort to brute force, so
to speak, trying all possible doors from
left to right. She can be a little more
intelligent about it and pick and choose
the lockers she opens. And so, with
binary search, as we call that, we could
write pseudo code for it as follows.
We might say if 50 is behind the middle
door, then go ahead and return true.
Else if it's not behind the middle door,
but 50 is less than that number behind
the middle door, we want to go and
search the left half. So that didn't
happen in Caitlyn's case, because we
ended up going right. So that's just
another branch here. Else, if 50 is greater
than what was at the middle door, we
want to search the right half. But
there's going to be one other condition
here that we should probably consider.
We've covered: is it here? Is it to the
left? Or is it to the right? But there's
another corner case that we'd better
keep track of. What else could happen?
>> If it's not in the array, or really, like,
if we're out of doors. So we can implement
this in a different way. I left myself
some space at the top because I
shouldn't do any of this if there are no
doors to search. So, I should have
this sort of sanity check whereby if
there are no doors left, or no doors to
begin with, let's just immediately
return false. And why is that? Well,
notice that when I say search left half
and search right half, this is
implicitly telling me: just do this
again. Just do this again, but with
fewer and fewer doors. And this is a
technique for solving problems and
implementing algorithms that we're going
to end today's discussion on, because
what seems very colloquial and very
straightforward (okay, search the left
half, search the right half) is actually
a very powerful programming technique
that's going to enable us to write more
elegant code, sometimes less code to
solve problems such as this. And more on
that in just a little bit. But how can
we now formalize this using some of our
array notation? Well, it looks a little
more complicated, but it isn't really.
Instead of asking questions in English
alone, I might say if 50 is behind doors
bracket middle. This pseudo code
presupposes that I did some math and
figured out what the numeric address,
the numeric index, is of the middle
element. And how can I do that? Well, if
I've got seven doors and I divide by
two, what's that? Seven divided by two is
three and a half. Three and a half makes
no sense if I'm using integers to
address this. So maybe we just round
down. So three. So that would be locker
number 0, 1, 2, 3, which, indeed, if you
look at the seven lockers, is in fact the
middle. So this is to say, using some
relatively simple arithmetic, I can
figure out the address, the index, of
the middle door if I know how many
there are and I divide by two and round
down. Meanwhile, if I don't find 50
behind the middle door, let's ask the
question. If 50 is less than the value
at the middle door, then let's search
not the left half per se in the general
sense. More specifically, search doors
bracket zero through doors bracket
middle minus one. Otherwise, if 50 is
greater than the value at the middle
door, go ahead and search doors bracket
middle plus one through doors bracket n
minus one. Now let's consider these in
turn. So searching the left half, as we
described this earlier, seems to line up
with this idea: start searching
from doors bracket zero, the very first
one. But why are we searching up to doors
bracket middle minus one instead of
doors bracket middle?
Yeah?
>> Middle.
>> Yeah, exactly. We already checked the
middle door by asking this previous
question. And so you're just wasting
everyone's time if you divide in half
and still consider that door as
checkable again. And same thing here. We
check middle plus one through the end of
the lockers array because we already
checked the middle one. Same reason,
even though it kind of complicates
the look of the math. But it's really
just using variables and arithmetic to
describe the locations of these same
lockers. But let's consider now what we
mean by running time, the amount of time
it takes for an algorithm to run, and
consider which of these algorithms is
better than the other, and why. So
in general, when talking about running
time, we can actually use pictures like
this. This is not going to be some
very low-level mathematical analysis
where we count up lots of values. It's
going to be broad strokes so that we can
communicate to colleagues, to other
humans, generally whether an algorithm is
better than another and how you might
compare the two. So here, for instance, is
a pictorial analysis of algorithms for two
different problems: the phone book from
week zero and then the attendance taking
from today itself. And let's, generally
as we've done before, sort of label these
things. So the very first algorithm took
n steps in the very worst case if I had
to search the whole phone book or if I
had to count everyone in the room. So
the first algorithm took indeed n steps.
The second algorithm took half as many,
plus one maybe, but we'll keep it simple.
So we'll call that n/2. And the third
and final algorithm, both in week zero
with the phone book and today with
attendance, is technically log base 2 of
n. And if you're a little rusty on your
logarithms, that's fine. Just take on
faith that log base 2 alludes to taking
a problem of size n and dividing it in
half and half and half as many times as
you can until you're left with one
person standing or one page in the phone
book. That's how many times you can
divide in half a problem of size n.
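To put a number on "how many times you can divide in half," here is a small illustrative C function (the name halvings is our own) that keeps cutting n in half, rounding down as integer division does, until one item remains. For any n of at least 1, the count it returns is log base 2 of n, rounded down.

```c
#include <stddef.h>

// Counts how many times a problem of size n can be cut in half,
// rounding down each time, until only one item is left.
// For n >= 1 the result equals floor(log2(n)).
int halvings(size_t n)
{
    int count = 0;
    while (n > 1)
    {
        n /= 2;    // tear the phone book (or the group) in half
        count++;
    }
    return count;
}
```

For example, halvings(7) is 2 (7, then 3, then 1), and halvings(1000000) is 19, so even a million-page phone book is down to a single page after 19 tears.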
Well, it turns out that we're getting a
little more detailed than most computer
scientists care to get when
describing the efficiency of algorithms.
So in fact, we're going to start to use
some common notation. Instead of
worrying precisely, mathematically, about
how many steps today's and the future's
algorithms take, we're going to talk in
broader strokes about how many steps
they are on the order of, and we're going
to use what's called big O notation,
which literally is like a big O and then
some parentheses, and you pronounce it
big O of such and such. So the first
algorithm seems to be in big O of n,
which means it's on the order of n
steps, give or take some. This algorithm
here, you might be inclined to do
something similar. Ah, it's on the order
of n/2 steps, and ah, this one's on
the order of log base 2 of n steps. But
it turns out what we really care about
with algorithms is how the time grows as
the problem itself grows in size. So the
bigger n gets, the more concerned we are
about how efficient our algorithm is, if
only because today's computers are so
darn fast. Whether you're crunching a
thousand numbers or 2,000 numbers,
it's going to take a split second
no matter what. But if you're crunching
a thousand numbers versus a million
numbers versus a billion numbers,
that's where things start to actually be
noticeable by us humans and we really
start to care about these values. So in
general, when using big O notation like
this, you ignore constant factors and
lower-order terms, or, put another way,
you only worry about the
dominant term in whatever mathematical
expression is in question. So big O of n
remains big O of n. Big O of n/2?
Eh, it's really the same thing as
big O of n. Not literally, but
they're both linear in nature. One grows
at this rate, one grows at this rate
instead, but it's for all intents and
purposes the same. They're both growing
at a constant rate. This one too, ah,
it's on the order of log of n, where the
base, who cares? In short, what does
this really mean? Well, imagine in your
mind's eye that we were about to zoom
out on this graph such that instead of
going from 0 to like a million, maybe
now the x-axis is 0 to a billion. And
same thing for the y-axis, 0 to a
million. Let's zoom out. So, you're
seeing 0 to a billion. Well, in your
mind's eye, you might imagine that as
you zoom out, essentially things just
get more and more compressed visually
because you're zooming out and out and
out, but these things still look like
straight lines. This thing still looks
like a curved line, which is to say, as n
gets large, clearly this green
algorithm, whatever it is, is more
appealing, it would seem, than either of
these two algorithms. And if we keep
zooming out, at some point, the ink of
those two straight lines is going to be
so close together that they are, for all
intents and purposes, pretty much the same algorithm.
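A quick worked comparison, with values rounded, makes the zoomed-out picture concrete: n and n/2 grow in lockstep, while log base 2 of n barely moves. The change-of-base identity on the right is also why the base of the logarithm doesn't matter in big O: switching bases only multiplies by a constant.

```latex
\begin{array}{r|r|r}
n & n/2 & \log_2 n \\ \hline
10^3 & 5 \times 10^2 & \approx 10 \\
10^6 & 5 \times 10^5 & \approx 20 \\
10^9 & 5 \times 10^8 & \approx 30
\end{array}
\qquad
\log_2 n = \frac{\log_{10} n}{\log_{10} 2} \approx 3.32 \, \log_{10} n
```

Multiplying n by a thousand multiplies both n and n/2 by a thousand, but adds only about 10 to log base 2 of n.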
So this is to say, computer scientists
don't care about constant factors like
divide by two or base 2 or anything like
that. We look at the most dominant term
that really matters as n gets bigger and
bigger. So that, then, is big O notation,
and it's something we'll start to use
pretty much anytime we
analyze or speak to how good or how bad
some algorithm is. So here's a little
cheat sheet of common running times. So
for instance, here's our friend big O of
n, which means the algorithm takes on
the order of n steps. Here is one
that takes on the order of log n steps.
Here are some others we haven't seen
yet. Some algorithms take n times log n
steps. Some algorithms take n squared
steps. And some algorithms just take one
step, or maybe two steps or four
steps or 10, but a constant number of
steps. So let me ask, of the algorithms
we've looked at thus far, for instance
linear search, being the very first today,
what is the running time of linear
search in big O notation? That is to say,
if there are n lockers
on the stage, how many steps
might it take us to find a number among
those n lockers? Big O of what? Yeah?
>> Big O of n, in fact, is exactly where I
would put linear search. Why? Well, if
you're using linear search, in the very
worst case, for instance, the number
you're looking for, as with Jose, might
be all the way at the end. Now, you might
get lucky. It might not be at the very
end. But generally, it's useful to use
big O notation in the context of
worst-case scenarios, because that really
gives you a sense of how badly this
algorithm could perform if you just get
really unlucky with your data set. So
even though big O really just refers to
an upper bound, like how many steps might
it take, it's generally useful to think
about it in the context of the
worst-case scenario: ah, the number I
care about is actually way over here. But
what about binary search? Even in the
worst case, so long as the data is sorted,
how many steps might binary search take
by contrast?
>> Big O of log n.
>> So binary search we're
going to put here, which is to say that
in general, and especially as n gets
large, binary search is much faster. It
takes much less time. Why? Because
assuming the numbers are sorted, you
will be dividing that problem in half
and half and half, just like with the
phone book in week zero, and you will get
to your solution much faster. Why should
you not use binary search, though, on an
unsorted array of lockers,
like a random set of numbers? Yeah?
>> you could just get rid of the value
because you don't know like what the
inequality is going to be.
>> Exactly. You're making these decisions
based on inequalities, less than or
greater than, but with no rhyme
or reason. You're going left, going
right, but there's no reason to believe
that smaller numbers are this way and
bigger numbers are that way. So, you're
just making incorrect decision after
incorrect decision, so you're probably
going to miss the number altogether. So
binary search on an unsorted array is
just an incorrect usage of the
algorithm. But, like Kelly did, if you
sort the data in advance or you're
handed sorted data, well, then you can
in fact apply binary search perfectly
and much more efficiently.
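Here is one way to sketch the binary search pseudo code in C, assuming a sorted int array and a function that calls itself on the left or right half, in the spirit of "search the left half, search the right half." The name binary_search and the inclusive lo and hi bounds are our own choices. With seven doors, lo = 0 and hi = 6 give a middle index of 3, the same middle locker that dividing seven by two and rounding down gives; writing lo + (hi - lo) / 2 rather than (lo + hi) / 2 is a common habit to avoid integer overflow on very large arrays.

```c
#include <stdbool.h>

// Binary search over doors[lo] .. doors[hi] (inclusive), which must be
// sorted from smallest to largest. Call as binary_search(doors, 0, n - 1, x).
bool binary_search(const int doors[], int lo, int hi, int target)
{
    // No doors left to search (or none to begin with).
    if (lo > hi)
    {
        return false;
    }

    // Integer division rounds down; for lo = 0, hi = 6 this is 3.
    int middle = lo + (hi - lo) / 2;

    if (doors[middle] == target)
    {
        return true;
    }
    else if (target < doors[middle])
    {
        // Search the left half: lo .. middle - 1.
        return binary_search(doors, lo, middle - 1, target);
    }
    else
    {
        // Search the right half: middle + 1 .. hi.
        return binary_search(doors, middle + 1, hi, target);
    }
}
```

On the sorted denominations 1, 5, 10, 20, 50, 100, 500, a search for 50 opens index 3 (the 20), then index 5 (the 100), then index 4 (the 50).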
>> I have a question. Is there ever a case
where linear search is more efficient
just because of the process of sorting the
data yourself?
>> Absolutely. Is linear search sometimes
more efficient if it's going to take you
more time to sort the data and then use
binary search? Absolutely. And that's
going to be one of the design decisions
that underlies any implementation of an
algorithm, because if it's going to take
you some crazy long time to sort, not
seven numbers but 70, 700, 7,000, 7
million, but you only need to search the
data once, then what the heck are you
doing? Why are you wasting time
sorting the data if you only care about
getting an answer once? You might as
well just use linear search, or heck, even
search randomly and hope you get lucky,
if you don't care about reproducing the
same result. Now, in general, that's not
how much of the world works. For
instance, Google's working really hard
to make faster and faster algorithms
because we are not searching Google once
and then never doing it again. We're
doing it again and again and again. So
they can amortize, so to speak, the cost
of sorting data over lots and lots of
searches. But sometimes it's going to be
the opposite. And I think back to
graduate school where I was often
writing code to analyze large sets of
data. And I could have done it the right
way, sort of the CS50 way by fine-tuning
my algorithm and thinking really hard
about my code. But honestly, sometimes
it was easier to just write really bad
but correct code, go to sleep for seven
hours, and then my computer would have
the answer by morning. The downside, as
admittedly happened more than once, is
if you have a bug in your code and you
go to sleep and then seven hours later
you find out that there was a bug,
you've just wasted the entire evening.
So there, too, is sometimes a trade-off when
making those resource decisions. But
that's entirely what today is about:
making informed decisions. And sometimes
maybe it's smarter and wiser to make the
more expensive decision, but
knowingly, not unknowingly. All
right, so there we have our first
two algorithms, but let's consider
another way of describing the efficiency
of an algorithm. Big O is an upper
bound: sort of, how bad can it get in
those cases where maybe the data is
really not working to our advantage?
Omega, a capital omega symbol here, is
used for lower bounds. So, how
lucky might we get in the best case, if
you will? How few steps might an
algorithm take? Well, in this case here,
here's just a cheat sheet of common
runtimes. There's an
infinite number of others, but we'll
generally focus on
functions like these. Let's consider those same
algorithms. So with linear search from
left to right, how few steps might that
algorithm take,
for instance, in the best-case
scenario?
Yeah. Is this hand about to go up?
>> Yeah. So one step. Why? Because maybe
Jose could have gotten lucky and opened
this door and voila, that was the 50. It
didn't play out that way, but it could
have. In the general case, the number
you're looking for could very well be at
the beginning. So we're going to put
linear search at omega of one. So, one
step. Maybe it's technically a few
more than that, but it's a fixed number
of steps that has nothing to do with the
number of lockers. Case in point, if I
gave Jose not seven but 70 lockers, he
could still get lucky and still take
just one step. So omega is our lower
bound. Big O is our upper bound. Ah,
spoiler. What is binary search's lower
bound? Well, apparently it's also omega
of one. But why? That is in fact
correct. Yeah,
>> you could just get lucky again.
>> Same reason. You could get lucky in the
best case, and it's just smack dab in the
middle of all of the data. So the fewest
steps binary search might take
is also actually one. So this is why we
talk about upper bounds and lower bounds,
because you get a sense of
the range of performance. Sometimes it's
going to be super fast, which is great,
but something tells me in the general
case we're not going to get lucky every
time we use an algorithm. So it's
probably going to be closer to those
upper bounds, the big O. Now, as an
aside, there's a third and final
symbol that we use in computer science
to describe algorithms: that of a
capital theta. Capital theta is jargon
you can use when big O and omega happen
to be the same. And we'll see that
today. Not always, but here's a similar
cheat sheet. None of the algorithms thus
far can be described in this way with
theta notation, because their big O
and omega are not the same.
They differed in both of our analyses.
But we'll see at least one example
where it's like, okay, we can describe
this in theta, and that's like saying
twice as much information with your
words to another computer scientist
rather than giving them both the upper
and the lower bounds. The fancy way of
describing all of what we're talking
about here, big O, omega, and theta, is
asymptotic notation. And asymptotic
notation, or asymptotically,
refers to a value getting bigger and
bigger and bigger and bigger, but not
necessarily ever hitting some boundary,
as n gets very large. That, in short, is
what we mean when
we deploy this here asymptotic notation.
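To see these lower and upper bounds concretely, here is a sketch that counts steps instead of returning true or false. The helper names are hypothetical, and the binary version tracks the remaining doors as a half-open range, a[lo] through a[hi - 1], which is just one common bookkeeping choice.

```c
#include <stddef.h>

// Hypothetical instrumented linear search: returns how many doors
// (array elements) it opened before finding target or running out.
size_t linear_search_steps(const int a[], size_t n, int target)
{
    size_t steps = 0;
    for (size_t i = 0; i < n; i++)
    {
        steps++;                  // open door i
        if (a[i] == target)
        {
            break;                // found it; stop opening doors
        }
    }
    return steps;
}

// Hypothetical instrumented binary search over a sorted array: returns how
// many middle doors it opened. The doors still in play are a[lo] .. a[hi - 1].
size_t binary_search_steps(const int a[], size_t n, int target)
{
    size_t steps = 0;
    size_t lo = 0;
    size_t hi = n;
    while (lo < hi)
    {
        size_t middle = lo + (hi - lo) / 2;
        steps++;                  // open the middle door
        if (a[middle] == target)
        {
            break;
        }
        else if (target < a[middle])
        {
            hi = middle;          // keep only the left half
        }
        else
        {
            lo = middle + 1;      // keep only the right half
        }
    }
    return steps;
}
```

On the seven sorted denominations 1, 5, 10, 20, 50, 100, 500, linear search opens 1 door to find 1 (its omega of 1 best case) but all 7 to find 500 or to conclude that 1,000 is absent (its big O of n worst case). Binary search opens 1 door to find 20, which sits in the middle, and just 3 to find 500 or rule out 1,000, in line with big O of log n.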
All right. So, with the first of these
things like linear search, let's
actually kind of make this a bit more
real. Let me actually go over in just
a moment to my other screen here. Okay,
in VS Code, let me go ahead and create a
program called search.c. And in
search.c, let's go ahead and implement a fairly
simple version of linear search
initially. So, let me go ahead and
include, for instance, cs50.h. Let me go
ahead and include stdio.h. Then,
let me go ahead and do int main void. So,
we're not going to bother with any
command line arguments for now. And then
let me go ahead and just give myself an
array of numbers to play with. And we
did this briefly last week in answer to
a question, but I'm going to do it now
concretely rather than use something
more manual to get all of these
numbers into the array. I'm going to say
give me an array called numbers. And the
numbers I want to put in this array
initially are going to be the exact same
denominations we've been playing with.
20, 500, 10, 5, 100, 1, and 50. Again, this is
notation that I alluded to in answer to
a question last week whereby if you want
to statically initialize an array, that
is give it all of your values up front
without having the human type them all
in manually, you can use curly braces
like this. And the compiler is pretty
smart. You don't have to bother telling
the compiler how many numbers you want,
1, 2, 3, 4, 5, 6, 7, because it can obviously
just count how many numbers are in the
curly braces, but you could explicitly
say seven there so long as your counting
is in fact correct. So on line six, this
gives me an array of seven numbers
initialized to precisely that list of
numbers from left to right. All right,
let's ask the human now what number they
want to search for just as I did our two
volunteers and say int n equals get int.
Then let's just ask the user for the
number that they want to search for.
Then let's implement linear search. And
if I want to implement linear search in
terms of the programming constructs
we've seen thus far, what
keyword in C should I use? What
programming technique? Yeah? Yeah. So,
maybe a for loop or a while loop, but a
for loop is kind of my go-to lately.
So, let's do for int i equals zero
because we'll start counting from the
left. i is less than seven, which isn't
great to hardcode, but I'm not going to
use the seven again. So, I think it's
okay in one place for this demo. Then i
plus plus. Then, inside of this loop, let's go
ahead and ask a question, just like Jose
was by opening each of the doors, by
saying if numbers bracket i equals
equals the number we asked about, n, well,
then let's go ahead and print out some
informative message like found, backslash
n. And then, for good measure, like
last week, let's return zero to signify
success. It's sort of equivalent to
returning true, but in main, recall, you
have to return an int. That's why we
revealed at the end of week two that the
return type of main is an int, because
that is what gives the computer its
so-called exit status, which is zero if
all is well or anything other than zero
if something went wrong. And I think
finding the number counts as all is well.
But if we get through that whole loop
and we still haven't printed found or
returned zero, I think we can go ahead and
safely say not found, backslash n, and
then let's just return one as our exit
status to indicate that we didn't find
the actual number. So, in short, I think,
in C, this is linear search. Let me
open up my terminal window again. Let me
make search, Enter. Let me do dot slash search,
Enter. And I'll search for, as I asked
Jose, the number 50. And we indeed found
it at the end. Let me go ahead and rerun
dot slash search. And let's search for
the other number, at the beginning: 20.
That then works. And just to get crazy,
let's search for a number we know not to
be there, like a thousand. And that in
fact is not found. So I think we have an
implementation then of linear search.
But let me pause here and ask if there's
any questions with this here code and
the translation of algorithm to
C? Yeah, in the back?
Why did I not specify the length of the
array? So it is not necessary when
declaring an array and setting it equal
to some known values in advance to
specify in the square brackets how many
you have, because the compiler is
not an idiot. It can literally count the
numbers inside of the curly braces and
just infer that value. You could put it
there, but arguably you're opening up
the possibility that you're going to
miscount and you're going to put seven
here but eight numbers over there or six
numbers there. So it's best not to tempt
fate and just let the compiler do its
thing instead. A good question. Other
questions on this code so far?
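For reference, here is a variation on the search.c demo rewritten as a plain C function so that it doesn't depend on the CS50 library: instead of calling get_int and printing found or not found, the hypothetical find_denomination returns the index where n is found, or -1. It also shows one common way to avoid hardcoding the seven, dividing the array's total size by the size of one element. That trick only works where the array itself is in scope, not on an array parameter, because inside a function such a parameter is really a pointer.

```c
#include <stddef.h>

// Hypothetical function version of the search.c demo, using the lecture's
// denominations. The brackets are left empty, so the compiler counts
// the initializers (7 here).
int find_denomination(int n)
{
    int numbers[] = {20, 500, 10, 5, 100, 1, 50};

    // Total bytes divided by bytes per element gives the element count,
    // so the 7 does not need to be hardcoded.
    size_t count = sizeof numbers / sizeof numbers[0];

    for (size_t i = 0; i < count; i++)
    {
        if (numbers[i] == n)
        {
            return (int) i;   // index where n was found
        }
    }
    return -1;                // not found
}
```

In the demo's main, a result other than -1 would correspond to printing found and returning 0, and -1 to printing not found and returning 1.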
All right, if none, let's go ahead and
maybe convert this linear search to one
that's maybe a little more interesting
that involves like searching for strings
of text. After all, we started the class
in week zero by searching for names in a
phone book like John Harvard. Let's see
if we can't adapt our code for searching
for strings instead of integers. So, in
my code here, let's go ahead and delete
everything inside of main just to give
myself a clean canvas. Let me go ahead
and give myself another array. This one
called, let's just call it strings,
because that's the goal of this exercise. And
set them equal to some familiar pieces
from the game of Monopoly, if you
have played it. So, there's like a
battleship piece in there, there's a
boot in there, there's a cannon in
there, an iron, a thimble, and a top
hat. Though, it does vary nowadays based
on the edition that you have. So kind
of a long array, but I have 1, 2, 3, 4, 5,
6 total values in this array of
strings. Now let's ask the user for a
string. We'll call it s for short. And
say with get string, what string are you
looking for among those six? Then I
think we can do a for loop again, for
int i equals 0, i less than 6, i plus plus. And then
inside of this loop, let's do the same
thing. If, let's say,
strings
bracket i equals equals the string s
that the human typed in, I think we can
go ahead and say printf found, backslash n,
and then, as before, return zero to
signify success. And if we don't, after
that whole for loop, let's printf
not found, backslash n, down here and
return one to signify error. So, it's
really the same thing at the moment,
except that I'm actually using strings
instead of integers. All right, let me
go ahead and open up my terminal window
again and clear it. Let me go ahead and
recompile this code. Make search: it seems
to compile. Okay, let me do dot slash search
and let's go ahead and search for the
first one. How about battleship, Enter?
Huh, not found. All right. Well, let's
see, maybe a typo. Maybe let me search for
something easier to spell: boot. Not
found. That's weird. Both of those are
at the very start of the array. Let's do
dot slash search again and search for top hat.
Enter. Not found. What is going on?
Well, this isn't actually that obvious
as to what I'm doing wrong. But it turns
out that when we actually compare
strings instead of integers in C, we're
actually going to have to use this other
library, at least today, that we saw
briefly last week. Last week we
introduced it because of a function
called strlen, which gives us the
length of a string. It turns out that
string.h also comes, per its
documentation, with another useful
function called strcmp, for string
compare, and its purpose in life is to
actually compare two strings, left and
right, to make sure they are in fact the
same. So for today's purposes, suffice it
to say you cannot use equals equals
to compare two strings, as intuitive
as that might seem. Why is that? Well, for a
computer, it's super easy to compare two
integers because they're either there or
they're not in memory. But with a
string, it's not just a character and
another character. It's a few
characters over here and a few
characters over here. Maybe it's a few,
maybe it's more. You have to compare
each and every character in a string to
make sure they're in fact the same. So,
strcmp does exactly that. Probably,
in the implementation of strcmp from
years ago, someone wrote a while
loop or a for loop that looks at each
string left to right and compares each
and every one of the characters therein
and then gives us back an answer. So how
do we go about using this? Well, to use
strcmp, what I can actually do in
VS Code here is go and change my code as
follows. Instead of using equals equals,
I'm going to actually use this function
per its documentation. I'm going to call
strcmp. Then I'm going to pass in
one of the strings, which is in strings
bracket i. Then I'm going to pass in the
second string, which is s. However,
having read the documentation (and this
is a little non-obvious), it turns out
that strcmp will return zero if the
strings are equal. Otherwise, it's going
to return a positive number or a
negative number. So what I care about
for now is: does strcmp, when given
those two strings, give
me back zero? If so, they are equal and
I'm going to say quote unquote found.
So, let's go ahead and open the terminal
again. Let me go ahead and clear it and
do make search to recompile my code. And
huh, I've done something wrong. Let's
see. Let me scroll up to the very first
line. On line 11: error, call to
undeclared library function strcmp
with type int, and something something,
which gets more complicated after that.
Why is line 11 not working despite what
I just preached? Yeah.
>> Yeah. I just did something stupid. I
didn't include the string.h header
file. So all Clang, our compiler, is
doing when invoked by make is
encountering literally the word strcmp
and not knowing what it is, because
we haven't taught it what it is by
simply saying include string.h at the
top. Okay, let me reopen my terminal
window. Clear that message away. Do make
search again. Now it's compiling. Dot slash
search. Enter. Now I'm going to go ahead
and search as I did before for
battleship. Ah, now it's finding it. Let
me run dot slash search again. Search
for boot. Ah, okay, that's found. Let me
go ahead and search for top hat. That
too is in there. Let me go ahead and
search for something that's not there,
like the number 50. Not in fact found.
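Putting the fix together, here is a sketch of the string version of linear search without the CS50 library, using const char * where the lecture uses string (in CS50's library, string is a synonym for char *). The function name find_piece is illustrative. The key line is the strcmp(...) == 0 test; comparing with == instead would typically compare where the two strings live in memory rather than their characters.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>   // declares strcmp

// Hypothetical linear search over the Monopoly pieces from the demo.
bool find_piece(const char *s)
{
    const char *pieces[] = {"battleship", "boot", "cannon", "iron", "thimble", "top hat"};
    size_t count = sizeof pieces / sizeof pieces[0];

    for (size_t i = 0; i < count; i++)
    {
        // strcmp returns 0 only if the strings match character for character.
        if (strcmp(pieces[i], s) == 0)
        {
            return true;
        }
    }
    return false;
}
```

Because strcmp compares characters exactly, the search is case-sensitive: "boot" is found, but "Boot" is not.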
So I think we've actually fixed that
there problem. But if we go back to this
code for a moment, it's indeed the case
per the documentation that equals equals
0 is what I want to do. Why in the world
would strcmp be designed to return a
positive or a negative number too? It's
not returning true or false. It's
returning one of three possible kinds of values:
zero, negative, or positive.
Why might it be useful? Yeah.
>> Um you could kind of like compare which
of the strings is like greater.
>> Yeah, super clever. So, if you're
passing in two strings, it's great to
know if they're equal. But wouldn't it
be nice if this same function could also
help us sort these strings ultimately
and tell me which one comes first
alphabetically? And technically, it's
not going to be alphabetically. It's
going to be, to use a cute phrase, ASCIIbetically,
because it's actually going to look at
the ASCII values of the characters and do
some quick arithmetic and tell you which
one comes first and which one comes
later, which is enough as we'll
eventually see for actually sorting
these strings as well. So in short, the
documentation will tell me that I should
check not only for zero if I care about
equality, but if I care about
inequality, that is checking if one
comes first or last, I should check
whether something is less than zero or
greater than zero. But for this demonstration
implementing linear search, I don't care
about comparing them for inequality.
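To make the "ASCIIbetical" ordering concrete, here is a small sketch; the helper names `compare_sign` and `comes_before` are hypothetical. The C standard only guarantees the sign of `strcmp`'s result, not a particular value such as -1 or 1, so both helpers look only at the sign. Because uppercase letters have smaller ASCII codes than lowercase ones, `"Zebra"` sorts before `"apple"`, which is not what a dictionary would do.

```c
#include <stdbool.h>
#include <string.h>

// Collapses strcmp's result to exactly -1, 0, or 1.
// Only the sign of strcmp's return value is guaranteed.
int compare_sign(const char *a, const char *b)
{
    int result = strcmp(a, b);
    return (result > 0) - (result < 0);
}

// True if a comes strictly before b in character-code ("ASCIIbetical") order.
bool comes_before(const char *a, const char *b)
{
    return strcmp(a, b) < 0;
}
```

For example, `compare_sign("battleship", "boot")` is -1, because the first difference is `'a'` (97) versus `'o'` (111), and a string that is a prefix of another, like `"boot"` versus `"boots"`, sorts first.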
All I care about is that they are in
fact the same or not in this case. All
right. All right. Well, let's go ahead
and do one other example of sort of
linear search, but let's make the
problem more like the one from week
zero, searching a phone book. So, let
me go back to VS Code here. Close
search.c, and let's make an actual phone
book. So, I'm going to say code
phonebook.c. And then inside of
phonebook.c, let's use our same header
files. So, include cs50.h, include
stdio.h,
and let's include in advance string.h.
Then, as before, let's do int main(void).
No command line arguments today.
Then inside of here, let me give myself
first an array of strings. How about
some names in the phone book? So I'm
going to say string names[] equals, and
then three names just to make a
demonstration: Kelly and David and, say,
John Harvard here. But if it's a phone
book, I need more than just names. So
let me go ahead and give myself another
array: string numbers, open bracket,
close bracket, equals. And now the same
phone numbers we used in week zero for
the three of us: +1-617-495-1000,
the same for both Kelly and me. So
+1-617-495-1000
again. And then as before, if you'd like
to text or call John directly, you can
do so at +1-949-468-2750,
and semicolon. So one question first. I
obviously declared our names to be an
array of strings, because that's what
text is. Why have I also declared phone
numbers to be strings and not integers,
even though a phone number literally
has number in the name of it? Yeah.
>> Yeah. So even though we have phone
numbers in the US, even though we have
Social Security numbers and a bunch of
other things that we call numbers, if
you have non-digits in those
values, you actually have to use
strings. If it's not an actual
integer, but it does have things like
pluses or dashes or parentheses or any
other form of punctuation, as is common
in the US and other countries for phone
numbers in particular, you're going to
actually want to use strings and not
numbers. The same goes for corner cases:
if you're in the habit
back home, if you're not from, say, the
US, of having to dial zero
first to make a local or regional call,
you don't want a leading zero in
an integer, because mathematically, as we
know from grade school, leading
zeros, zeros that come first, have
no mathematical meaning. They're going to
disappear effectively from the
computer's memory unless we store them
as characters in strings in this
way. Okay, with that said, let's go ahead
and ask the human now after having
declared those two arrays for the name
they want to look up the number of. So
let's say string name equals get_string,
and let's go ahead and ask the human
for the name for which to search. Then
let's use a for loop as before: for int i
equals 0, i less than 3, which again for
demonstration purposes I'm just
hardcoding today, i++. And then in the for
loop I'm going to use our new friend
strcmp. If the return value of
strcmp, passing in names[i] and
the name the human typed in, equals
equals zero, signifying that they are in
fact the same, well, that means we found
the location i where the person's name
is. So let's go ahead and print out
"Found". But just to be fun, let's print
out what we found, so %s\n,
and then output there the number,
which is going to be in the
corresponding numbers array at that same
location i. We'll return zero. And at the
very end of this program, let's go ahead
and print out "Not found" if we get that
far, and return one. All right. So, a
little more complexity this time, but
notice I'm comparing the names just like
a normal person would in your iOS app or
your Android app when looking for
someone's name. But what I care about is
getting back the number. So, that's why
two lines later, I'm printing out the
number that I found at location I, not
the name because I already know the
name. All right. In my terminal window,
let's go ahead and make phonebook, then
./phonebook. Let's go ahead and
search for John, whose number is
hopefully indeed exactly that number.
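A sketch of this parallel-array phone book, with the lookup factored into a function so it can run without `get_string`. The function name `lookup` is hypothetical, and `const char *` stands in for the CS50 `string` type. Nothing ties `names[i]` to `numbers[i]` except their shared index, which is the weakness of this design.

```c
#include <stddef.h>
#include <string.h>

// Two parallel arrays: names[i] is assumed to belong with numbers[i].
// The compiler does not enforce that pairing; keeping them aligned is on us.
static const char *names[] = {"Kelly", "David", "John"};
static const char *numbers[] = {"+1-617-495-1000", "+1-617-495-1000", "+1-949-468-2750"};

// Linear search by name; returns the matching number, or NULL if absent.
const char *lookup(const char *name)
{
    // Derive the length from names rather than hard-coding 3;
    // numbers must still have at least as many entries.
    int n = sizeof(names) / sizeof(names[0]);
    for (int i = 0; i < n; i++)
    {
        if (strcmp(names[i], name) == 0)
        {
            return numbers[i]; // same index, different array
        }
    }
    return NULL;
}
```

In the lecture's version the name comes from `get_string` and the result is printed with `printf`; the function boundary here exists only so the search can be exercised on its own.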
So, suffice it to say, this code too
does work. This is a linear search
because I'm searching left to right.
These aren't actually sorted
alphabetically by name, let alone by
number. So I think we're doing well
here, but I don't necessarily love this
implementation. Even if you're new to
programming, what might you not like
about how I've implemented a phone book
in the computer's memory?
Why is this maybe not the best design?
Yeah.
>> Like there's a correspondence between
names and numbers. So like having two
different
>> Okay. Yeah. And I would say so. You're
pointing out that we have this duality.
We've got two arrays. They're the exact
same length. And it just so happens that
location zero's name lines up with
location zero's number and location one
and location two. But we're kind of on
the honor system here whereby the onus
is on us to make sure we don't screw
this up and we make sure we always have
the same number of names and the same
number of numbers and, moreover,
that we get the order
exactly right. We are just trusting that
when we print out the i-th number, so to
speak, it lines up with the i-th name.
So that's fine, and honestly, for three
people, who really cares? It's fine. But
if you think about 30 people, 300, 3
million, well, we're not going to
hardcode them all here, but even in some
database that we'll store them in later
in the course, just trusting
that we're not going to screw this up feels like
asking for trouble. And indeed, a lot of
programming is just that, like not
trusting yourself and definitely not
trusting your colleague not to mess
something up, but programming a bit more
defensively and trying to encapsulate
related information a little more
tightly together and not just assume as
on the honor system that these two
independent arrays will line up. But at
this point, we have no means of solving
this problem unless we give ourselves
just a bit of new functionality and syntax.
So I used this phrase earlier to kick
things off: data structures. It's
how you structure your data in the
computer's memory. Arrays are the
simplest of data structures. They just
store data back to back to back from
left to right, contiguously in memory.
But they all have to be, as we've seen,
the same kinds of values: int, int, or
string, string, string. There's no
mechanism yet for storing an int and a
string together and then another int and
another string together, let alone two
strings, two strings, two strings that
are somehow a little bit different. But
it would be nice if C gave us an actual
data type to store people in a phone
book such that we could create an array
called people inside of which is going
to be a whole bunch of persons if you
will back to back to back and I want two
of them. So wouldn't it be nice if I
could literally use this code in C? Well,
decades ago, when C was invented, they
didn't give us a person data type. All
we have is int and float and char and
bool and string and so forth. Person was
not among the available data types. But
we can invent our own data types, it
turns out. So in C, what we can do, if we
want persons to exist and every person
in the world shall have a name and a
phone number for now, is this:
string name, string number. Now, that's a
decent start, but it's going to be kind
of a stupid implementation if I then
just do string name1, string
name2, string name3, string name4. We've
already started down that
road last week and decided arrays were a
better solution. But here's an
alternative when you want to just store
related data together. I can use these
two keywords in C, typedef struct,
which, albeit terse, just means define a new
type that is a data structure, so
multiple things together. Inside the
curly braces, you literally put the two
things you want to relate together,
string name, string number, and then
outside the curly braces you specify the
name you want to give to this brand-new
custom type that you have invented.
Technically, stylistically, you'll see
that style50 prefers that the name
actually be on the same line as the last
curly brace, which looks a little weird
to me, but that's what industry tends to
do, so so be it. But these several lines
together tell C, invent for me a new
data type called person, and assume that
every person in the world has a string
called name and a string called number.
And now I can use this new data type in
my own code to solve this problem a
little bit better. So, in fact, let me
go ahead and do this as follows. I'm
going to go back to VS Code here. And at
the very top of my code, above main,
just to make this available to not only
main, but maybe any future functions I
write, I'm going to say typedef struct, as
we saw on the screen. Inside of my curly
braces, I'm going to say string name and
string number. And then I'm going to
name this thing person. Now, I'm going
to go about using this and I'm going to
go ahead and delete my previous honor
system approach of having names and
numbers in separate arrays. And I'm
instead going to give myself an array of
people. Uh, we could call it persons,
but I'm trying to be somewhat
grammatically correct. So, I'm going to
say person people[3] to give myself
an array called people inside of which
is room for three persons inside of
which is room for a name and number
each. So, how do I now initialize these
values? So I'm going to hardcode them.
That is type them manually. But you can
imagine using get_string or some
other function to get this data from the
human themselves. I'm going to say go to
the people array at location zero and
access the name field. And this is
syntax we haven't seen yet, but it's not
that hard. You literally use a dot, a
single period to say go inside of that
structure and access the name field, the
name attribute, so to speak. And let's
set that equal to Kelly. Then let's go
into that same array location, people[0],
and set the number for the
zeroth person to be +1-617-495-1000.
Then let's go ahead and do the same
thing for people[1]. Set that
person's name to, for instance, mine,
David. Then let's do people[1].number
equals, quote unquote, the same as
Kelly's, because we're both in the directory.
So +1-617-495-1000.
And then lastly, people[2].name
equals, quote unquote, John, for John
Harvard. people[2].number equals
+1-949-468-2750
in this case. And now the rest of the
code is almost the same. I'm going to
now on the new line 24 still ask the
user what name they want. I'm going to
still iterate from 0 to three because
there's still three elements in this
array even though each has two values
within. And I'm going to compare now not
names but people[i].name
to go access the name of that i-th person
and compare it to the name that the
human has typed in. And when I find that
person I'm going to go into the people
array at location i but print out the
number instead. So all we've done here
is add this dot notation which allows
you to access the inside of a data
structure. And all we've done is
introduce up here some new C keywords
that let you invent your own data types
inside of which you can put most
anything you want. I have chosen a
string name and a string number. All
right, let me go ahead and open my
terminal window and clear it from
before. Let me do make phonebook to
make this version. So far so good.
./phonebook. Enter. I'm going to go ahead
now and search for say John. And I have
again found his number. So this is still
correct. But even though this took more
minutes in terms of the voice over and
it took more lines of code, it's
arguably better designed now because at
people bracket zero is an actual person
and everything about them. At people
bracket one is another person and
everything about them and so forth. This
is what we mean by encapsulate. You can
think of these curly braces as sort of
hugging these data types inside of the
data structure together so as to keep
them together in the computer's memory
as well.
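The struct-based version can be sketched the same way. As before, `const char *` replaces CS50's `string`, and the helper `lookup_person` and the array name `directory` are hypothetical; the closing `} person;` follows the style50 convention described above. The array is filled with a brace initializer, a compact alternative to assigning `people[0].name`, `people[0].number`, and so on one field at a time.

```c
#include <stddef.h>
#include <string.h>

// A new type that keeps each person's name and number together.
typedef struct
{
    const char *name;
    const char *number;
} person;

// Linear search over an array of persons; returns the matching number, or NULL.
const char *lookup_person(const person people[], int n, const char *name)
{
    for (int i = 0; i < n; i++)
    {
        // The dot operator reaches inside one struct to a single field.
        if (strcmp(people[i].name, name) == 0)
        {
            return people[i].number;
        }
    }
    return NULL;
}

// Sample data: each name and number stored side by side in one element.
static const person directory[] = {
    {"Kelly", "+1-617-495-1000"},
    {"David", "+1-617-495-1000"},
    {"John", "+1-949-468-2750"},
};
```

Because each element carries both fields, the name and number can no longer drift out of alignment the way two separate arrays could.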
All right. Well, just to set the stage,
uh, literally as we'll strike the
lockers and put something else up, the
efficiency of binary search as
implemented by Caitlyn was predicated on
Kelly having in advance sorted the
values up front. But of course, we've
only considered now the running time of
searching for information using two
algorithms, and there can be many others
in the real world, but those are two of
the most canonical. We found that binary
search was faster than linear search,
but it required that we sort the data.
So to your question earlier, maybe we
should consider just how expensive it is
in terms of time, money, space, humans
to sort data, especially a lot of data,
and then decide whether or not it's
worth using something like binary search
or perhaps even something else. So the
next problem we'll solve today
ultimately is given a generic input and
output. The input to our next problem is
going to be unsorted data, like
numbers out of order, and the output
should be sorted data. So for
instance, if we pass in 7 2 5 4 1 6 0 3,
I want whatever black box is
implementing my sorting algorithm to
spit out 0 1 2 3 4 5 6 7. So that's
going to be the question we answer. But
first, I think it's time for some
delightful Hello Panda chocolate
biscuits. Uh let's take a 10-minute
break and snacks are now served.
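Before looking at any particular algorithm, it may help to pin down what "sorted" means for the output. The checker below is a hypothetical helper, not something from the lecture: it accepts an array only if every element is no larger than the one after it. A complete correctness check would also confirm that the output holds the same values as the input.

```c
#include <stdbool.h>

// True if numbers[0..n-1] is in non-decreasing order.
// Compares each adjacent pair; i stops at n - 2 so numbers[i + 1] stays in bounds.
bool is_sorted(const int numbers[], int n)
{
    for (int i = 0; i < n - 1; i++)
    {
        if (numbers[i] > numbers[i + 1])
        {
            return false;
        }
    }
    return true;
}
```

Applied to `{7, 2, 5, 4, 1, 6, 0, 3}` it returns false; applied to `{0, 1, 2, 3, 4, 5, 6, 7}` it returns true.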
All right, we are back. And recall that
the cliffhanger on which we left was
how do we go about sorting numbers?
Well, here are some numbers, eight of
them in fact, from 0 to 7, but
currently unsorted. Um, we don't quite
have enough Monopoly boards for
everyone, but we do have some delightful
uh Super Mario Brothers Pez dispensers.
If I could get eight volunteers for this
final demo up here. Oh, and not a lot of
hands. Okay. All right. One, two, three,
four, five, six, and let's go farther
back. Seven, and eight. How about All
right. Come on up. Hopefully I counted
properly. Come on over.
Upon arrival at the stage, go ahead and
grab your favorite illuminated number
and stand in that same order at the
front of the stage if you all could.
Welcome to the stage. All right, grab
your favorite number. Stand in that same
order.
All right,
good. And one, two, three, four, five,
six. I definitely said one through
eight. Who is the number eight then?
Okay, we need an eight. Come on down.
All right. Well, technically we need a
four, but come on down. Yeah. All right,
grab the four and let me start from this
end first if you want to give a quick
hello and a little something about you.
>> Uh, hi, my name is Cameron. I'm a first
year and I want to study mechanical
engineering.
>> Welcome.
>> Hi, I'm Charlotte. I'm also a first year,
and I'm in Canaday F.
>> Welcome.
>> Hi, I'm Ella. I'm also a first year and
I'm in the
>> Hi, I'm Precious. I'm also a first year.
I'm there.
>> Hi, I'm Michael. I'm just an Eventbrite
guest.
>> Yeah.
>> Hi, I'm Marie. I'm a first year and I'm
in Canaday.
>> Welcome.
>> Hi, I'm Rick. I'm a first year and I'm
in Holworthy.
>> Welcome.
>> Nice.
>> I'm Jaden. I'm a first year in
Holworthy, and I really like free stuff.
>> Okay. Well, let's see then if we
can't award all these Super Mario
Brothers Pez dispensers. First,
notice, of course, that all eight of our
volunteers are completely out of order,
but in an ideal world, we would have the
smallest number over here.
Go over there. Number zero. Wait a
minute. Seven. Let's go over here.
Two. Okay. Five. Okay. Make yourselves look
like that.
No Pez? It's okay. All right. So, 7 2 5
4 1 6 0 3.
Okay. We won't do the introductions
again, but now we have a list of numbers
completely out of order. And wouldn't it
be nice if zero were eventually over
here, seven were all the way over there,
and everything else was sorted from
smallest to largest? Well, if you all
could go ahead and sort yourselves from
smallest to largest. Go.
All right. And Jaden, what was your
algorithm for doing that?
>> I know that I have the least number
because I don't think anybody has
a number less than zero. So I put myself
at the last bottom line.
>> Okay. And, I assume, Precious, what was
your algorithm?
>> I knew I had the largest number. So I
just had to be at the end of the
>> Okay, fair. So you guys got the easy
ones. Uh number four. How about
>> I knew three was before me and five was
after me.
>> Nice. So number four didn't actually
have to move coincidentally. But as for
five and three and two and one and six,
they probably had to take into account
some additional information. Who's to
their left? Who's to their right? And it
just kind of worked. But it didn't look
very algorithmic, if you will. It looked
very organic and obviously correct. But
I'm not sure that same approach would
work well if we had not eight, but 80 or
800 or 8,000 pieces of data. So let's
see if we can't formalize this a little
bit. Let me take the mic and if you guys
could reset yourselves to those same
original positions from seven on the
left to three on the right. Let me
propose a couple of algorithms,
canonical ones if you will, but see if
maybe we can't formalize step by step
what to do. So the first one I'm going
to do given all of these numbers is just
try to select the smallest number. Why?
To Jaden's point earlier, I just want to
put the smallest number over here. At
least that's a problem I can solve. It's
very well defined. It's a nice bite out
of the problem. So seven. Okay, smallest
so far. Two, that's that's smaller. So
I'm going to remember that two is the
now smallest number I've seen. Not five,
not four. One is even smaller. So, I'm
going to remember one, not six, zero.
That's pretty good. But I'm going to
check the whole list. Maybe there's
negative one or something like that. But
no, three. So, I'm going to remember
that zero was the smallest element I
found. Let's select Jaden and put Jaden
over here. But before Precious or anyone
else moves, we don't really have room
for you. Like, Precious is in the way
because if this is an array of eight
values for integers, well, we can't just
kind of make room over here because if
you think back to last week, we might
have uh some garbage values there or
something else is going on. We don't
want to change data that doesn't belong
to us. So what to do with precious?
Well, maybe Precious, maybe you can go
over there. So you just take Jaden's
spot and we'll swap these two values
accordingly. Now though, Jaden is in the
right space, which is good because now I
can move on to the second problem.
What's the next smallest element that's
presumably greater than zero? Well, at
the moment, two is the next smallest
element. Not five, not four. Ooh, one is
the next smallest element. I'm going to
remember that. Not six, not seven, not
three. Okay, so number one, if you could
go to the right location, but I'm afraid
we're going to have to evict number two
to make room. All right, let's do this
again. Zero and one are in good shape.
So now I think I can ignore them as
complete. Five is the current smallest.
Nope, four now is. Nope, two now is. Six,
no. Seven, no. Three, no. Okay, so two
is the next smallest. So let's swap two
and five. And now I've solved three out
of the eight problems. Let's do this
again. Four is at the moment the
smallest. Not five, not six, not... Oh,
three is now the smallest. So, let's
swap three and four, which
unfortunately is making the four problem
a little worse. Like, he belongs there,
it would seem, but I think we can fix
that later. So, now half of the list is
sorted. Five is the next smallest. Not six,
not seven. Ah, four. Now, we've got to fix
the four. So, four goes back there. Now,
I messed up the five, but we'll come
back to that. All right. Six. Seven.
Okay. Five. Let's put you where six is.
And now one more mistake to fix. So,
seven. Okay. Six and seven need to swap.
And now I've solved eight problems in
the aggregate. So it's complete. Now to
be fair, my approach is clearly way
slower than your approach, but you all
were working in parallel, whereas I was
doing it more methodically, step by
step. And I dare say my algorithm is
probably going to be more translatable
to code. And indeed, what I just acted
out is what the world would call
selection sort, whereby on each
iteration, each pass in front of the
humans, I was selecting the smallest
element I could find. All right. How
else could I do this, though? So,
let's do something that's maybe a little
more organic like your approach where
you were actually comparing who was next
to you. Go ahead and reset yourselves
one final time to this arrangement.
Seven on the left, three on the right.
And let me propose again to walk through
the list again and again. But let me
focus more narrowly on the problem right
in front of me because I felt like I was
taking a lot of steps back and forth,
back and forth. Maybe we can chip away
at some of that wasted time. Let's
compare seven and two. They're obviously
out of order. So, let's just immediately
swap you two if we could. All right.
Now, seven and five clearly out of
order. Let's swap these two. Seven and
four out of order. Let's swap these two.
Seven and one out of order. Let's swap
these two.
Seven and six out of order. Let's swap
these two. Seven and zero out of order.
Swap these two. Seven and three out of
order. Swap these two. So, a lot of work
for Precious there. But, I've now indeed
solved one of the eight problems.
Moreover, I don't need to keep uh
addressing the seven problem because
notice that Precious has essentially
bubbled her way up to the end of the
list. And indeed, that's going to be the
operative term here. Another algorithm
that computer scientists everywhere know
is called bubble sort, whereby the goal
is to get the biggest elements to just
bubble their way up to the top of or the
end of the list one at a time. Now, am I
done? Well, no. Clearly not. There's
still stuff out of order except for
Precious. Indeed, I have solved one of
these eight problems. And now fine, I'll
go back and I'm just going to try this
same logic again. Two and five, good.
Five and four, nope, swap those. Five
and one, nope, swap those. Five and six
are good. 6 and zero, nope, swap those.
Six and three, nope, swap those. And I
already know that Precious is where she
needs to be. So, I think I'm done with
the second of eight problems. And I'll
do this a little faster now. Two and
four. Four and one, swap. Four and five
are good. Five and zero, swap. Five and
three, swap. And now we solved three
problems. Let me reset. Two and one,
swap. Two and four are good. Four and
zero, swap. Four and three, swap. And
now I've solved half of the problems.
Four out of eight. We're almost done.
One and two are good. Two and zero,
swap. Two and three are good. Okay. And
now we're done with five out of the
eight problems. One and zero swap.
Uh, one and two are good. Those are all
good. And let me just do a final sanity
check. Everything now is sorted. So now
I'm done solving all eight of those
problems. So, you all were wonderful. We
need the numbers back, but Kelly has
some delightful Pez dispensers for you
on the way out. If you want to head that
way, just leave the numbers on the
shelves. And a round of applause for our
eight volunteers for helping to act this
out.
Thank you.
So, let's see if we can't formalize what
these volunteers kindly just did with
us. Starting with the first of those
algorithms. Thank you. Namely, selection
sort. Let's see if we can't slap some
pseudocode on this, thinking of our
humans now more generically as an array.
So we had the first person at location
zero and we had the last person at
location n minus one. And just for
clarity, so that you've seen the
symbology, this obviously is going to
be location n minus 2. This is location n
minus 3, and so forth, until, sort of dot
dot dot, you hit the other end that we've
already written out. So that's just how
we would refer to all of our eight
volunteers' locations, or in this case one,
two, three, four, five, six, seven locations, but with dot dot dot
in the middle, connoting that this can be
a much, much larger array. So here's some
pseudocode for the first algorithm,
selection sort: for i from zero to n
minus one, so from the first element to
the last element, find the smallest
number between numbers[i] and
numbers[n - 1]. In other words, if
you're starting i at zero, look at
every lighted number between
location zero and location n minus one.
When you have found that smallest
element, swap it with the number at
location i, which starts again at zero.
That's how we got, I think, Jaden into
place at the very beginning. Then i, by
nature of how for loops work, gets
updated from 0 to 1, so that we do the
same thing: find the smallest number
between numbers[1], so the second
element, through the eighth element,
because this number is unchanged. n is
the total number of values, so the end
point there is not changing. Once we've
found the second smallest person, we
swap them with location i, aka 1. And
that's how we got the number one into
position, and then the number two, and
then the number three and number four.
So this, then, was selection sort in
pseudocode form. And that allowed us to
actually go through this list again and
again and again in order to find the
next smallest element.
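Here is one way the selection sort pseudocode might translate to C, as a sketch assuming an array of `int`s; the function name `selection_sort` is illustrative. The outer loop stops at n - 2 rather than n - 1: once the first n - 1 positions hold the smallest values, the last one is necessarily the largest, so that final pass would change nothing.

```c
// Sorts numbers[0..n-1] in place, smallest to largest, using selection sort.
void selection_sort(int numbers[], int n)
{
    for (int i = 0; i < n - 1; i++)
    {
        // Track only the index of the smallest value seen so far,
        // like the single "variable in my brain" from the demo.
        int smallest = i;
        for (int j = i + 1; j < n; j++)
        {
            if (numbers[j] < numbers[smallest])
            {
                smallest = j;
            }
        }

        // Swap that value into position i, evicting whatever was there.
        int temp = numbers[i];
        numbers[i] = numbers[smallest];
        numbers[smallest] = temp;
    }
}
```

Run on the volunteers' starting arrangement `{7, 2, 5, 4, 1, 6, 0, 3}`, it leaves the array as `{0, 1, 2, 3, 4, 5, 6, 7}`.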
So what was happening, a little more
methodically, if it helps just to map
that symbology of the bracket notation
and the i's? If this is where we
started, with location i, and we did
everything through location n minus one,
essentially I traversed this whole list
from left to right, literally walking in
front of our volunteers, looking at each
element, and the first element I saw was
seven. At the moment, that was the
smallest element I had found. And who
knows, in a different list maybe seven
would be the smallest element. So I kind
of stored it in a variable in my mind.
But then I checked two and realized, no,
no, two is clearly less than that. Now I'm
going to remember two. Okay. Now I'm
going to remember one when I find it.
Then I'm going to remember zero when I
find it. And then, once I
found Jaden, with the value of zero
lit up, I moved Jaden from that
location to here, and then evicted
Precious, recall, and moved Precious over
to the location that we had freed up.
Why? Why all this sort of back and
forth? Well, you have to assume with an
array that you're not entitled to the
memory over here. You're not entitled to
the memory over here if you've already
decided that you have seven lockers or
eight people. You have to commit to the
computer in advance. That's why we put
the number typically in the square
brackets or the compiler infers from the
curly brackets how big the array
actually is. All right. And suffice it
to say when I went through this again
and again and again, I did the same
thing over and over. Now, you might have
thought me sort of dumb for having asked
the same questions again and again, like
I was surprised to discover the number
one. I was surprised to discover the
number two, even though on my very
first pass I literally looked at all
eight of those numbers. But you have to
think about what memory I'm actually
using. Now, I certainly could have
memorized all of the numbers and where
they are. But I propose that just very
simply I was using like a single
variable in my brain just to keep track
of the then smallest element. And once
I'm done finding that and solving that
problem I moved on to do it again and
again. But that's going to be a
trade-off. And this is going to be
thematic in the coming weeks whereby
well sure you could use more memory and
I could have been smarter about it and
maybe that would have improved or um
hurt the running time of the algorithm.
There's often going to be a trade-off
between how much memory or how much time
you actually use. So we'll discover that
over time. So how fast or slow is
selection sort? Well consider when I had
eight humans on stage I first went
through uh all n of them. But how many
comparisons did I make? Really, I was
doing n minus one comparisons because if
I've got n people, I've got to compare
the smallest number I found against
everyone else. And you compare n people
left to right n minus one times total.
So the first pass I was making I was
asking n minus one questions. Is this
the smallest? Is this the smallest? Is
this the smallest? N minus one times.
Once I solved one problem, when we got
Jaden into Jaden's right place, then I
had one fewer problem. Then one
fewer problem again, and so forth. So, it was
like n - 1 steps plus n - 2 steps plus n - 3
steps plus dot dot dot one final step
once I got to the final of the eight
problems. Now, if you remember the
cheat sheet at the back of your math
books, say, growing up, you'll note
that this series, (n - 1) + (n - 2) + ... + 1,
can be more simply written as n(n - 1)/2. And
if you've not seen that before, just
take on faith that this is identical to
this series of numbers up here. So, now
we can just kind of multiply this out.
That's technically (n^2 - n)/2,
which is great. If we
split that up, that's n^2/2 - n/2. We're getting too into the
weeds. Let's whip out our big O notation
now, whereby we can wave our hands at
the lower order terms and only care about
the biggest, most dominant term. Which
term in this expression, if
you plug in a really big value of n,
is going to matter more? The n
squared, the two, the n, or the two?
Like the n squared? The others
absolutely contribute to the total
value. But if you plug in a really big
value, the dominant force is going to be
this n squared, because that's really going
to blow up the total value. So we can
say that selection sort, when analyzed in
this way, is on the order of n
squared steps, because I'm doing so many
comparisons so many times. So if that's
the case, the question then is: what
is not just its upper bound but
maybe its lower bound, as we'll
eventually see. So for selection sort,
for now, let's stipulate that it's
indeed in big O of n squared. And that's
actually the worst of the algorithms
we've seen. That's way slower than
linear search, because at least linear
search was big O of n. Selection sort is
n squared, which of course is n * n, which
is, and will feel, much, much slower than
that. So what if, though, we consider the
lower bound of selection sort? All
right, maybe it's bad in the worst case,
but maybe it's really good when the
numbers are mostly sorted.
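One way to probe that question is to count comparisons directly. The sketch below is selection sort with a counter added; the name `selection_sort_comparisons` is illustrative. Its inner loop makes (n - 1) + (n - 2) + ... + 1 = n(n - 1)/2 comparisons no matter how the input is arranged, which for the eight volunteers is 8 * 7 / 2 = 28.

```c
// Selection sort that also returns how many element comparisons it made.
// The count depends only on n, never on the initial order of the values.
long selection_sort_comparisons(int numbers[], int n)
{
    long comparisons = 0;
    for (int i = 0; i < n - 1; i++)
    {
        int smallest = i;
        for (int j = i + 1; j < n; j++)
        {
            comparisons++; // one "is this the smallest?" question
            if (numbers[j] < numbers[smallest])
            {
                smallest = j;
            }
        }
        int temp = numbers[i];
        numbers[i] = numbers[smallest];
        numbers[smallest] = temp;
    }
    return comparisons;
}
```

Feeding it an already sorted array and the shuffled `{7, 2, 5, 4, 1, 6, 0, 3}` both yield 28, a small-scale preview of why even the best case is in omega of n squared.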
Unfortunately, this is the same pseudocode
for selection sort. We make no
allowance for checking the list to make
sure it's already sorted. And in fact,
that's kind of a perverse case to
consider for any algorithm. What if the
problem's already solved? How's your
algorithm going to perform? Like, if all
of my volunteers, as they kind of almost
did accidentally, had started lining up
roughly in order. Suppose they literally
had been in order from 0 to 7. Well, my
stupid algorithm would still have me
walking back and forth, back and forth,
back and forth. Why? Because the code
literally tells me to do this this many
times, and every time I do that, find the
smallest element. So it's going to be
sort of a stupid output, because the list
is not going to be changed at all,
but my code is not taking
into account in any way the original
order of the numbers. So this is to say
that, whether we consider the lockers or the humans, the
omega notation for this algorithm, even
in the best case where the data is
already sorted, is, crazily, also n
squared. Now, I could certainly change
the pseudocode, but selection sort as
the world knows it is more of a
demonstrative algorithm, or sort of a
quick and dirty one. Its running time is
going to be in omega of n squared. And now
we can actually deploy our theta
notation: because the big O notation is n
squared and the omega notation is n squared, the
same, we can also say that selection
sort is in theta of n^2, which is not
great, because that's annoyingly slow. So
maybe the solution here is don't do
that. Let's use bubble sort instead. The
second algorithm where I just compared
everyone side by side again and again.
Well, here's some pseudocode for bubble
sort, which you can assume applies to the
same kind of array, from zero on up to n
minus one. Here's one way to write
bubble sort. Repeat the following n
times. For i from 0 to n minus 2, if the
number at location i and the number at
location i + 1 are out of order, swap
them. And there's kind of an elegance to
this algorithm in that, like, that's it.
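Here is one way that pseudocode might translate to C, as a sketch. It makes n - 1 passes (the lecture points out that n - 1 passes are enough, since the last element falls into place for free) and, within each pass, compares numbers[i] with numbers[i + 1] for i from 0 to n - 2. The name bubble_sort and the returned comparison count are illustrative assumptions. For eight numbers it always makes 7 * 7 = 49 comparisons.

```c
#include <stddef.h>

// Bubble sort as in the pseudocode: walk i from 0 to n - 2, swapping
// neighbors numbers[i] and numbers[i + 1] whenever they're out of order.
// Returns the number of comparisons made (a counter added for illustration).
size_t bubble_sort(int numbers[], size_t n)
{
    size_t comparisons = 0;
    if (n < 2)
    {
        return comparisons; // nothing to compare
    }
    // n - 1 passes: once n - 1 numbers are in place, the last one is too
    for (size_t pass = 0; pass < n - 1; pass++)
    {
        // Stop at n - 2 so that numbers[i + 1] never goes past the end
        for (size_t i = 0; i <= n - 2; i++)
        {
            comparisons++;
            if (numbers[i] > numbers[i + 1])
            {
                int tmp = numbers[i];
                numbers[i] = numbers[i + 1];
                numbers[i + 1] = tmp;
            }
        }
    }
    return comparisons;
}
```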
And you just assume that when you go
through the list, with i
from 0 to n minus 2, this is how I was
effectively comparing elements 0 and 1,
1 and 2, 2 and 3, 3 and
4, dot dot dot, 6 and
7. But notice I didn't say 8.
There were eight total people. Why do we
go from 0 to n minus 2 instead of from 0
to n minus 1?
Yeah?
>> We already checked the
last one.
>> Not quite. So it's not that we've
already checked the last one. I'm saying
with this line of code here, we never
even go to n minus 1, technically.
>> If we have n minus 1, it's going to compare
against n, because that's
>> Exactly, because we're doing this simple
arithmetic here. We're checking the current
location i and i + 1. You can think of these
as my left and right hand. Left hand is
pointing at zero. Right hand's pointing
at one. I don't want to do something
stupid and have my left hand point at n
minus one because then my right hand
arithmetically when you add one is going
to point at n which does not exist.
That's beyond the boundary of the array
because the array goes from zero to n
minus one. So just a little bit of a
safety check there to make sure we don't
walk right off the end of the array. But
we do this n times because, recall,
Precious ended up being where 7
needed to be, at the very end of the
list. But that didn't mean there weren't
seven more problems still to
solve, 0 through 6. So I did it again
and I did it again, and, per its name,
bubble sort bubbled the biggest element
up first, then the next biggest, then the
next biggest, and so on,
that is, 7, then 6,
then 5, then 4, and we got lucky on
some of them, but eventually we finished
with 0. So how do we analyze this
thing? Well, as an aside, if you're thinking
that I'm wasting some time, we could also technically
do this n minus 1 times,
because once we get to solving seven
problems, you get the eighth one for
free, because that person is obviously
where they need to go. So when we had
these numbers initially and we were
comparing them with bubble sort, again,
left hand, right hand, treat
this as i and this as i plus 1, we just
kept swapping pairwise numbers if in
fact they were out of order. So all this
pseudocode is saying is what our humans were doing
for us organically. So how do we
actually analyze the running time of
this? Last time I just kind of
spitballed that it was n minus 1
steps plus n minus 2 steps. Well, you
can actually look at pseudocode
sometimes, and if it's neatly written,
you can actually infer from the pseudocode
how many steps each line is going
to take. For instance, how many steps
does this first line take? I mean,
literally n minus 1. The answer is
right there, because it's saying to the
computer, or to me acting it out, repeat
the following n minus 1 times. All
right, so that's helpful. How many
steps does this inner loop
induce? Well, i is going from 0 to n
minus 2. So that's actually n minus 1
total steps, not n. And then this
question here, if numbers bracket i and
numbers bracket i plus 1 are out of order, it's a
single question. It's like our Boolean
expression. We'll call it one step. I mean,
maybe you need to do a bit more work
than that, but it's a constant number of
steps. Doesn't matter how big the list
is. Comparing two numbers is always
going to take the same amount of time.
And then swapping them, oh, I don't
know, it's going to take like one or two
or three steps, but constant. Doesn't
matter what the numbers are; it takes the
same amount of work. So, let's
stipulate, let me rewind, stipulate that
the real things that matter are the
loops. This constant number of steps,
who really cares? But the loops are what
are going to add up as n gets large. So
what we really have, then, is this outer
loop and this inner loop. Think
about our two-dimensional Mario square
from week one. We did something on the
outside and then something on the inside
to get our rows and columns. This is
equivalent to (n - 1) * (n - 1). If we
do our little FOIL method, that's n^2 - n - n +
1; combine like terms, and it's n^2 - 2n + 1. Who
cares? This is ultimately going to be on
the order of big O of
n squared, only because, again, if you ask
yourself, when I plug in a really big
value for n, which of these terms is really
going to contribute most to the answer?
It's obviously going to be n squared again,
and we can ignore the lower order terms.
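Written out as a worked equation, the count for this version of bubble sort (n - 1 passes, each making n - 1 comparisons) is:

```latex
(n-1)(n-1) = n^2 - n - n + 1 = n^2 - 2n + 1 \in O(n^2)
```

For eight volunteers that's 64 - 16 + 1 = 49 comparisons. For a million numbers, the n^2 term is 10^12 while the 2n term is only 2 * 10^6, which is why the lower order terms can be ignored.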
So this doesn't seem to have made any
progress. Selection sort was on the
order of
n squared; bubble sort, based on this analysis,
is also on the order of n squared. Maybe
we'll get lucky in the lower bound.
So on the upper bound, bubble sort
is indeed n squared, as was selection sort.
But with this pseudocode for bubble
sort,
we rather unfortunately were not
doing anything clever to catch that
perverse case where maybe the list was
already sorted. After all, consider if
the list was sorted from 0 to 7. I was
still asking all the same darn
questions. Even if I did no work, I was
going to repeat that n minus one times
back and forth making no swaps but
making all of those comparisons. But
here's an enhancement to bubble sort
that we can add that selection sort
didn't really have room for. I can say
after one pass of this inner loop
walking from left to right, if I made no
swaps, quit. So, put another way, if I
traverse the list from left to right and
make no swaps, I might as well just
terminate the algorithm then, because
there's clearly no more work to be done.
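A sketch of that enhancement in C, again with a comparison counter added purely for illustration (the function name is hypothetical): a bool records whether the current pass swapped anything, and a pass with no swaps ends the sort. On already sorted input of eight numbers it stops after a single pass of 7 comparisons, which is the omega of n behavior.

```c
#include <stdbool.h>
#include <stddef.h>

// Bubble sort with the "if no swaps, quit" enhancement.
// Returns the number of comparisons made (a counter added for illustration).
size_t bubble_sort_early_exit(int numbers[], size_t n)
{
    size_t comparisons = 0;
    if (n < 2)
    {
        return comparisons;
    }
    for (size_t pass = 0; pass < n - 1; pass++)
    {
        bool swapped = false;
        for (size_t i = 0; i <= n - 2; i++)
        {
            comparisons++;
            if (numbers[i] > numbers[i + 1])
            {
                int tmp = numbers[i];
                numbers[i] = numbers[i + 1];
                numbers[i + 1] = tmp;
                swapped = true;
            }
        }
        // A full pass with no swaps means the list is already sorted
        if (!swapped)
        {
            break;
        }
    }
    return comparisons;
}
```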
All right. So based on that
modification, the lower bound of bubble
sort's running time would be said to be
in omega, then, of
n, because I'm minimally going to need to
make one pass through the list. You
can't possibly claim that the list is
sorted unless you actually check it
once. And if there are n elements, you're
going to have to look at all n of them
to make sure that it's in order. But
after that, if you've done no work and
made no swaps, there's no reason to traverse the
list again and again and again. So
bubble sort can be said to be in omega
of n, because indeed we can just
terminate after that single pass if
we've done no work. We can't say
anything about theta, because big O and
omega are not one and the same. But
that does seem to have given us some
savings. Unfortunately, it really only
saves us time when the list is already
or mostly sorted. In the average
case and in the worst case, odds are
both algorithms are going to perform just as
badly, on the order of n squared. In fact,
let's take a look at a visualization
that'll make this a little clearer than
our own humans and voices might have
explained. Here are a bunch of vertical
purple bars made by a friend of ours
in the real world. And this is an
animation that has a bunch of buttons
that lets us execute certain algorithms.
A small bar represents a small number. A
big bar represents a big number. And the
goal is to get them from small numbers
or small bars to big numbers or big bars
left to right. So I'm going to go ahead
and click on selection sort initially.
And what you'll see from left to right
is in pink the current smallest element
that's been discovered, but also in pink
the equivalent of my walking across the
stage left to right again and again and
again trying to find the next smallest
element. And you'll see clearly, just
like when we put Jaden at the far left,
the smallest element ended up over here.
But it might take some time for Precious,
for instance, or number 7 to end up
all the way over on the right, because
with each pass we're really just fixing
one problem at a time, and there are n
problems total, which is giving us on the
order of those n squared steps. And now
the list is getting shorter, so we're at
least saving some work, in that you don't
have to keep touching the elements you
already sorted, just like I didn't. So
now selection sort is complete. Let's
visualize instead bubble sort. So let me
rerandomize the array just so we're
starting with a random order. Now let's
click on bubble sort. And you'll see the
pink bars work a little differently. They
denote which two numbers are being
compared at that moment in time, just
like my left hand and right hand going
left to right. And you'll see that even
though it's not quite as pretty as
selection sort, where I was getting at
least the smallest elements all the way
to the left here, we're just fixing
pairwise problems, but the biggest
elements, like Precious's number 7,
are indeed bubbling their way up to the
top one after the other. But as you can
see, and this is where n squared is sort
of visualizable, we're touching
these elements or looking at them so
many times, again and again. We are
making so many darn comparisons. This is
taking frustratingly long. And this is
only, what, a few dozen bars or numbers?
You can imagine how long this might take
with hundreds, thousands, or millions of
values. I dare say we're going to have
to do better than bubble sort and
selection sort, because we're not even done
yet. I'm just trying to give us the
satisfaction of getting to the end, and
now we are. But neither of those
algorithms seems incredibly performant,
because it's still taking us quite a bit
of time to actually get to that there
solution. So how can we actually do
better than that? Well, we can try
taking a fundamentally different
approach. And this is one technique that
you might have encountered in math or
even in the real world even if you
haven't sort of applied this name to it.
Recursion is a technique in mathematics
and in programming that allows you to
take sort of a fundamentally different
approach to a problem. And in short, a
recursive function is one that's
defined in terms of itself. So if you
had, like, f(x) equals f of something on
the right-hand side of a mathematical
expression, that would be recursive in
that the function is dependent on
itself. More practically, in the world of
programming, a recursive function is a
function that calls itself. So if you
are writing some function in C, and in
that function you call yourself, that is, you
actually have a line of code that says
call that same function by the same
name, that function is recursive. Now
this might feel a little weird because
if a function is calling itself it feels
like this is the easiest way to get into
an infinite loop because why would it
ever stop if the function is calling
itself calling itself calling itself
calling itself? We're going to have to
actually address that kind of problem.
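As a tiny illustration (sum_to is a hypothetical name, not from the lecture), here is a C function defined in terms of itself: the sum 1 + 2 + ... + n is n plus the sum up to n - 1. Two things keep it from calling itself forever: each call passes a smaller number, and the if at the top stops once there's nothing left to add.

```c
// Hypothetical example: sum_to(n) = n + (n - 1) + ... + 1,
// defined in terms of itself, much like f(x) on both sides of an equation.
int sum_to(int n)
{
    // Stop once there's nothing left to add (also handles negative n)
    if (n <= 0)
    {
        return 0;
    }
    // Call ourselves on a smaller problem
    return n + sum_to(n - 1);
}
```

For example, sum_to(4) computes 4 + 3 + 2 + 1 + 0 = 10.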
But in the real world, or
rather in this class already, we've
actually seen an example of
this implicitly, including today as well as in week
zero. So here is that algorithm for
searching the doors of the lockers. And
recall that we did this check at
the very top: if there are no doors
left, return false. If not, we did
these conditions. We said if the
number is behind the middle door, return
true, because we found it. But things got
interesting here, where I said else if
the number is less than the middle door,
then search the left half. Else if the
number is greater than the middle door,
then search the right half. Well, at that
point in time you should be asking me or
yourself, well, how do I search the
left half? How do I search the right
half? Well, here you go. Like, on the
screen right now is a search algorithm.
And even though it says down here search
the left half or search the right half,
which is like, well, how do I do that?
Well, just use the same algorithm again.
And this is how in terms of my voice
over, you end up searching the left half
of the left half or the right half of
the left half or any such combination.
This line here, search left half. This
line here, search right half, is
representative of a recursive call. This
is an algorithm or a function that calls
itself. But why does it not induce an
infinite loop? Like why is it important
that this line and this line are written
exactly as they are so as to avoid this
thing just forever searching aimlessly?
Yeah,
>> there's the condition at which it stops.
>> We do have this condition at which it
stops. But more importantly, what is
happening before I make these recursive
calls?
>> Exactly. I'm recursing, that is, calling
myself, but I'm handing myself a smaller
problem, and a smaller problem, and a smaller
problem. It would be bad if I just
handed myself the exact same number of
doors and just kept saying, "Search
these, search these, search these."
Because you would never make any
progress. But just like our volunteers
earlier, so long as we did divide and
conquer and we search smaller and
smaller numbers of doors, eventually
indeed we're going to bottom out and
either find the number we're looking for
or we're not. So, generally, we're going
to call these kinds of conditions, which
just ask a very obvious question
and want an immediate answer, base cases.
Base cases are generally conditionals
that ask a question to which the answer
is going to be yes or no right then and
there. A recursive case, by contrast,
like these two down here, is when you actually
need to do a bit more work to get to
your final answer. You call yourself, but
with a smaller version of the problem.
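The lockers pseudocode, written recursively in C, might look something like this sketch. The function name, and the choice to describe the remaining doors as the half-open range from low up to (but not including) high, are our assumptions. The two checks that return immediately are the base cases; the two calls on a strictly smaller range are the recursive cases.

```c
#include <stdbool.h>
#include <stddef.h>

// Recursive binary search over sorted numbers[low] .. numbers[high - 1].
bool binary_search(const int numbers[], size_t low, size_t high, int target)
{
    // Base case: no doors left
    if (low >= high)
    {
        return false;
    }
    size_t middle = low + (high - low) / 2;

    // Base case: number is behind the middle door
    if (numbers[middle] == target)
    {
        return true;
    }

    // Recursive cases: search a half that excludes the middle door,
    // so every call gets a smaller problem
    if (target < numbers[middle])
    {
        return binary_search(numbers, low, middle, target);      // left half
    }
    return binary_search(numbers, middle + 1, high, target);     // right half
}
```

To search seven sorted doors, you'd call binary_search(doors, 0, 7, target).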
So we could in fact, in week zero,
have written this sort of similarly. If
you go back in your mind to week zero,
we had more of a procedural approach, so
to speak. When we were searching the
phone book, I proposed that this induced
what we called loops on line 8 and line
11, which just literally said go back to
line 3. And that was more of a
mechanical way of sort of inducing a
loop structure. But if I really wanted
to be elegant, I could have said, well,
you know what? 7 and 8 together really
just mean search the left half. And 10
and 11 together really mean just search
the right half. So let's condense these
pairs of lines into shorter
instructions. Search the left half of
the book. Search the right half of the
book. I can then delete two blank lines
and now I have a recursive algorithm for
searching a phone book. It's a little
less obvious because you have to ask
yourself when you get to line seven or
nine, wait a minute, how do I search the
left half or the right half? And that's
when you need to realize you start the
same algorithm again but with a problem
that's half as large. In week zero, we
took the procedural approach, where we
literally told you what line of code to
go to, but today we're offering a
different formulation, a recursive
approach, where it's more implicit what
you should do. And we'll see now a
couple of examples from the real world,
so to speak. So, here's a screenshot
from Super Mario Brothers 1 on the
original Nintendo Entertainment
System. Let me go ahead and get rid of
some of the distractions, like the
ground and the mountains there. And here
we have a sort of half pyramid, not
unlike the one you implemented in problem
set one. But this is an interesting
real-world physical structure in that you
can define it recursively. Like, what is
a pyramid of height four, if you will?
Well, just to be a little
difficult, a pyramid of height four is
really just a pyramid of height three
plus one more row. Okay. Well, what is a
pyramid of height three? Well, a pyramid
of height three is really just a pyramid
of height two plus one more row. Well,
what's a pyramid of height two? Well, a
pyramid of height two is really just a
pyramid of height one plus one more row.
Well, what's a pyramid of height one? A
single brick on the screen. And I sort
of changed my tone with that last remark
to convey that this could then be our
base case whereby I just tell you what
the thing is without sort of kicking the
can and inviting you to think through
what a smaller structure is plus one
more row. Whereas every other definition
I gave you then of a pyramid of some
height was defined in terms of that same
structure albeit a smaller version
thereof. So we can actually see this
in the real world. Let me go ahead and
pull up one thing here. I'm going to go
to, give me one sec before I flip
over. Here I am on google.com. If you'd
like a little computer science humor
here, if you ever Google search for
recursion and hit enter, you'll see a
joke that computer scientists at Google
find funny.
Haha. One, two laughs. Does anyone see
the joke? I did not make a typo, but
Google's asking me, did I mean
recursion? And if I click on that, I
just get the same haha page. Okay. All
right. That didn't go over well. Anyhow,
so there are these Easter eggs in the
wild everywhere because computer
scientists are the ones that implement
these things. But let's go ahead and
actually implement, for instance, a
version of this in code. Let me go back
over here in a moment to VS Code. And in
VS Code, let me propose that in my
terminal window, I create one of
two final programs. This one's going to
be called iteration.c, just to make
clear that this is the iterative, that is,
loop-based, version of a program whose
purpose in life is to print out a simple
Mario pyramid. I'm going to go ahead and
include cs50.h at the top as well as
stdio.h. I'm not going to need
string.h. I don't need any command-line
arguments today. So this is going to
start off with int main(void). And now I'm
going to go ahead and ask a question,
like, give me a variable called height
of type int, and ask the human for
the height of this Mario-like pyramid.
And then let's assume for the moment
that I've already implemented a function
called draw, whose purpose in life is to
draw a pyramid of that height, semicolon.
So I've abstracted away for the moment
the notion of drawing that pyramid. Now
let's actually implement draw whose
purpose in life again is to print out a
pyramid akin to the one we saw a moment
ago like this here on the screen. Well,
in order to print out a pyramid of a
given height, I think I need to say
void draw(int n), for instance, because
I'm not going to bother returning a
value. I just want this thing to print
something on the screen. So void is the
return type. But I do want to take as
input an integer like the height of the
thing I want to print. I can call this
argument or parameter anything I want.
I'll call it n for number. So how can I
print out a pyramid that again looks
like this? Well, I'll do this quicker
than you might have in problem set one.
But it seems obvious that, like, on the first
row I want one brick. On the second row
I want two. On the third I want three.
On the fourth I want four. So it's
actually a little easier than problem
set one in that it's sloped in a
different direction. So let me go ahead
and do exactly this in code. Let me say
for int i = 0, i less than n, the height, i
++. So this is going to be, really, for
each row of the pyramid. Let me
go ahead now and, in an inner loop, do for
int j = 0, let's do j less than i +
1, for reasons we'll see in a moment, and
then j++. And then inside of this loop,
let's just print out a single hash, no
newline, but at the end of the row, let's
print out a single newline to move the
cursor to the next line. Now, why am I
doing this? Well, this inner loop represents for
each column of the pyramid. And if you think
about it, on the first row, which is row
zero, I actually want to print not zero
bricks, but one brick. So that's why I
want to go ahead here and go from zero
to i + 1 because if i is zero, i + 1 is
1. So my inner loop is going to go from
0 to 1, which is going to give me one
brick. It's a little annoying to think
about the math, but this just makes sure
that I'm actually getting bricks in the
order I want them. And then it's going
to give me two bricks and then three and
then four. And between each of those
rows, it's going to print a new line. So
let's go ahead and do make iteration to
compile this code. Ah, I messed up. Why
do I have a mistake on line
eight of this code? Let me hide my
terminal and scroll back up. It seems
Clang, my compiler, does not like my draw
function. Yeah?
Yeah, I forgot the prototype. So this is
the one and only time where it seems
reasonable to copy-paste. Let's grab the
prototype of that function up here and
go ahead and teach the compiler from the
get-go what this function is going to
look like even though I'm not defining
it now until line 13 onward. All right,
let's go ahead and make iteration again.
Ah. ./iteration. Enter. Let's do a
height of say four. And voila, now I've
got that there pyramid. So, I did it a
little quickly and it's certainly to be
expected if it took you hours on problem
set one to get the other type of pyramid
printed. But the point for today is
really to demonstrate how we can print a
pyramid like this using indeed what I'd
call iteration. Iteration just means
using loops to solve some problem. But
we can alternatively use recursion by
reimplementing our draw function in a
way that's defined in terms of itself.
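For reference, the iterative draw described above for iteration.c might look like this sketch: an outer loop for each row and an inner loop that runs i + 1 times for each column. In the full program, main would also include cs50.h, declare the prototype void draw(int n); above main, and pass draw the height from CS50's get_int; those parts are left out here so the function stands alone.

```c
#include <stdio.h>

// Iterative version: for each row i, print i + 1 bricks, then a newline.
void draw(int n)
{
    // For each row of the pyramid
    for (int i = 0; i < n; i++)
    {
        // For each column: row 0 gets 1 brick, row 1 gets 2, and so on
        for (int j = 0; j < i + 1; j++)
        {
            printf("#");
        }
        printf("\n");
    }
}
```

Calling draw(4) prints rows of 1, 2, 3, and 4 hashes; a height of 0 or less prints nothing, because the outer loop never runs.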
So let me go into my code here and I'm
actually going to leave the prototype
the same. I'm going to leave main the
same. But what I'm going to go ahead and
do is delete all of this iterative code
that's doing things very procedurally
step by step by step with loops. And I'm
instead going to do something like this.
Well, if I want to print a pyramid of
height n, what did I say earlier? Well,
a pyramid of height n is really just a
pyramid of height n minus one plus one
more row. So, how do I implement in code
that idea? Well, let me go back in code
here and say, well, if a pyramid of
height n first requires drawing a
pyramid of height n minus 1, I think I
can just write draw(n - 1), which is kind of
crazy to look at, because you're calling
yourself in yourself, but let's see
where this takes us. Once I have drawn a
pyramid of height n minus 1, that is, of
height 3, for instance, what remains
for me to do is to myself print one more
row. And so to print one more row, I
think I can do that really easily with
fewer loops. I can do for int i = 0, i
less than n, i++, and then very simply in
this loop I can print out a single hash,
one at a time. At the end of this loop, I
can print out a newline, but no more
nesting of loops. So here I've done print
one more row, and here I've done print a
pyramid of height n minus 1.
I'm not quite done yet, but I think this
is consistent with my verbal definition
that a pyramid of height three is a
pyramid of height, sorry, a pyramid of
height four is a pyramid of height three,
which I can implement per line 16, just
draw me a pyramid of height n minus 1,
and then I myself will take the trouble
to print the fourth and final row. But
something's missing in this code. Let me
go ahead and try running it. Let's see
what happens. Make, oh, oh, darn it, I meant
to call this something else. So I'm going
to do this. I'm going to close this
version here. I'm going to rename
iteration.c to recursion.c to make clear
that this version is completely
different. Let me now go ahead and make
the recursion version. And huh, Clang is
noticing that I have screwed up. On line
14, it says error: all paths through
this function will call itself. And
Clang doesn't even want to let me
compile this code, because that would
mean literally just looping forever,
effectively, by calling yourself. So
what am I missing in my code here? If I
open up what we're now calling
recursion.c
in my editor,
what's missing over here? Yeah, I'm
missing a base case. And I can express
this in a few different ways, but I
would propose that before I do any
drawing of anything at all, let's just
ask ourselves if there is anything to
draw. So, how about if n equals zero,
well, then don't do anything; just
return. You don't return a value. When
your return type is void, it means you
don't return anything. So you just
return, or rather, return semicolon. Or
just to be super safe, I could actually
do something like this, which is
arguably better practice just in case I
get into this perverse scenario where
someone hands me a negative number. I
want to be able to handle that and not
print anything either. So just to be
safe, I might say less than or equal to
zero. I'm not doing one, because if I did
do one, then I would want to at least
myself print out one brick, which is
fine, but I'd have to, like, change
all of my code a little bit. So I think
it's safer if my base case is just if n
is less than or equal to zero, you're
done. Don't do anything. And this then
ensures that even though thereafter I
keep calling draw again and again and
again, and the problem's getting smaller
and smaller, from four to three to two to
one, as soon as I hit zero, the function
will finally
return.
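Putting the pieces together, the recursive draw in recursion.c might look like this sketch: the base case first, then the recursive call on n - 1, then the loop that prints the final row of n bricks. As in the lecture, main and the prototype stay the same as in the iterative version and are omitted here.

```c
#include <stdio.h>

// Recursive version: a pyramid of height n is a pyramid of height n - 1
// plus one more row of n bricks.
void draw(int n)
{
    // Base case: nothing to draw (also guards against negative heights)
    if (n <= 0)
    {
        return;
    }

    // Draw a pyramid of height n - 1
    draw(n - 1);

    // Print one more row of n bricks
    for (int i = 0; i < n; i++)
    {
        printf("#");
    }
    printf("\n");
}
```

For positive heights this prints the same pyramid as the iterative version. Without the base case, Clang rejects the function with the "all paths through this function will call itself" error described above.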
So let's go ahead and open up my
terminal and rerun make recursion. This
version did compile this time. ./recursion,
Enter. Let's type in four,
cross my fingers, and this too prints the
exact same thing. And even though it
doesn't look like fewer lines of code, I
would offer that there's an elegance to
what I've just done. Whereas the
iterative version, with all the loops,
was very clunky, like step by step, just
print this and print that and have a
nested loop inside of another, with
this, especially if we distill it into
its essence by getting rid of my
comments like this, and, frankly, getting
rid of the unnecessary curly braces, only
because for single lines in
conditionals you don't need them, this
is arguably a very beautiful
implementation of drawing Mario's
pyramid, even though it's calling itself,
and arguably because it is calling
itself.
Questions then on this idea of recursion
or this implementation of Mario? Yeah.
>> Are there no scope issues involved if
you like?
>> Good question. Are there any scope
issues involved? Short answer, no.
However, the current value of i, for
instance, will not be visible to the
next time the function is called. It
will have its own copy of i, if that's
what you mean. And next week we'll talk
in more detail about what's going on
here. And in fact, I probably can't
break this in class very easily. But it
turns out if I use a very large value
for the height, let's just hit a lot of
zeros and see what happens. That was too
many. Let's see what happens. That's
also too many. Let's see what happens
there.
That's the first time at least I in
class have encountered this error. You
might have encountered this weird bug in
office hours or in your problem set and
that's fine if you did. We'll talk about
what this means next week too. But this
is bad. Like this clearly hints at a
problem in my code. However, the
iterative version of this program would
not have that same error. So this
relates to something involving memory
because it turns out as a little teaser
for next week, each time I call draw,
I'm using a little more memory, a little
more memory, a little more memory, a
little more memory, and my computer only
has so much memory. This program, in its
current form, is using too much memory.
There are workarounds to this, but that
is a trade-off to the elegance we're
gaining in this solution. So, what's the
point of all this? And why did we get
sidetracked by Mario? There's another
sorting algorithm. The third and final
one that we'll consider today that
actually uses recursion to solve the
problem not only elegantly arguably, but
also way faster somehow than bubble sort
and selection sort. And in essence, it
does so by making far fewer comparisons
and wasting a lot less work. It doesn't
keep comparing the same numbers again
and again. Here in its essence is the
pseudo code for merge sort. Sort the
left half of the numbers, sort the right
half of the numbers, then merge the
sorted halves. And this is kind of a
weird implementation of an algorithm
because I'm not really telling you
anything. It seems like you're asking me
how do I sort numbers and I say, well,
sort the left half, sort the right half.
It's like someone being difficult. And
yet implicit in this third line is
apparently some magic. This notion of
merging halves that are somehow already
sorted is actually going to yield a
successful result. As an aside, we're
actually going to need one base case
here, too. So, if you're only given one
number, you might as well quit right
away because there's nothing to do. So,
we'll toss that in there as well. And
base cases are often for zero or one or
some small-sized problem. In this
case, it's a little easier to express it
as one because if you have one element,
it's indeed already sorted. So, what
does it mean to merge two sorted halves?
Well, let's actually consider this. I'm
going to reuse some of these same
numbers here. I'm going to put my one,
my three, my four, and my six on the
left. And these together represent a
list that is indeed sorted of size four.
And then I'm going to put four other
numbers on the right there that are
similarly sorted as well. And by merging
these two lists, I mean start at the
left end of this list, start at the left
end of this list, and just decide one
step at a time which number is the next
smallest. And then I'm going to put it
on the top shelf to make clear what is
sorted. So if my left hand's pointing at
this list, my right hand's pointing at
there, which hand is obviously pointing
to the smaller element, left or right?
The right. So I'm going to grab
this and I'm going to use a little more
space up top here and put the zero in
place. And then I'm going to point to
the next element there. So my left hand
has not moved yet. It's still pointing
at the one. My right hand is pointing at
the two. Which number comes next?
Clearly left. So, I'm going to grab the
one and put it up there and update where
my left hand is pointing. So, now I'm
pointing at the three here and the two
there. What comes next? Obviously the
two. What comes next? Obviously the
three. What comes next? Obviously the
four. What comes next? Obviously the
five. But notice my hands are not going
back and forth, back and forth, back and
forth like any of the algorithms thus
far. I'm just taking baby steps, moving
them only to the right, effectively
pointing at each number
once and only once. What comes next?
Six. And now my left hand is done. What
comes last? The number seven. So what I
just did is what I mean by merge the
sorted halves. If you can somehow get
into a scenario where you've got a small
list sorted and another small list
sorted, it's super easy now to merge
them together using that left right
approach, which I'll claim only takes n
steps. Why? Because every time I asked
you a question, I was taking one bite
out of the problem. There are eight bites
total. I asked you eight questions, or I
would have if I verbalized them all. So,
it's n steps total to merge lists of
that size. So, what then is merge sort?
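The left-hand, right-hand merge just acted out can be sketched in C as follows; the function name and parameters are illustrative assumptions. Each hand only ever moves right, and each number is copied to out exactly once, so merging two sorted lists with n numbers in total takes n steps.

```c
#include <stddef.h>

// Merge two already sorted arrays into out, which must have room for
// n_left + n_right ints. l plays the left hand, r the right hand.
void merge(const int left[], size_t n_left,
           const int right[], size_t n_right, int out[])
{
    size_t l = 0, r = 0, k = 0;

    // While both hands still point at something, take the smaller number
    while (l < n_left && r < n_right)
    {
        if (left[l] <= right[r])
        {
            out[k++] = left[l++];
        }
        else
        {
            out[k++] = right[r++];
        }
    }

    // One hand is done; copy whatever remains under the other
    while (l < n_left)
    {
        out[k++] = left[l++];
    }
    while (r < n_right)
    {
        out[k++] = right[r++];
    }
}
```

Merging {1, 3, 4, 6} with {0, 2, 5, 7}, the two halves from the demonstration, fills out with 0 through 7.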
Merge sort is really all three of these
steps together, only one of which we've
acted out. Two of them are sort of
cyclical in nature. They're recursive by
design. So what does this mean? Well,
let's start with this list of eight
numbers, which is clearly out of order: 6,
3, 4, 1, 5, 2, 7, 0. And let's apply merge sort
to this set of numbers. And I'll do it
digitally here because it'll take
forever to keep moving the numbers up
and down physically. So let's move it to
the top just to give ourselves a little
bit more room. And let me propose that
we apply merge sort. What was the very
first step in merge sort, at least among
the juicy steps we highlighted?
What's the first step in merge sort?
Sort the left half. Yeah. And then the
second step was going to be sort the
right half. And then the third step was
going to be merge the sorted halves. So
let's see what this means by actually
acting it out on these numbers. So
here's my eight numbers. Let's go ahead
and sort the left half. Well, the left
half is obviously going to be the four
numbers on the left. And I'm just going
to pull them out just to draw our
attention to them over here. Now I have
a list of size four and the goal is to
sort the left half. How do I sort a list
of size four?
>> Uh, well, yes, but just be more pedantic.
Like, how do I sort any list using merge
sort?
>> Sort the left half. So let's do just
that. So for a list of size four, how do I
sort this? Well, I'm going to sort the
left half. How do I sort a list of size
two?
>> Sort the left half. All right. Well, I'm
just going to write the six here. How do
I sort a list of size one?
I just don't. I'm done. That was the
so-called base case where I just said
return. Like, I'm done sorting the list.
Okay, so here's the story recap.
Sort the left half. Sort the left half.
Sort the left half. And I just finished
sorting this. So what comes next? Sort
the right half, which is this. And now
I've sorted the left half of the left
half of the left half, which is a big
mouthful. But what do I do as a third
and final step when sorting this list of
size two? Merge them. This part we know
how to do. I point left and right. And I
now take the smallest element first,
which is the three. Then I take the six.
And now this list of size two is sorted.
So if you rewind in your mind's eye,
what step are we on? Well, we have now
sorted the left half of the left half.
So what comes after the left half is
sorted? We sort the right half. So we're
sort of rewinding in time, but that's
okay. I'm keeping track of the steps in
my mind. I want to now sort this list of
size two. How do you sort a list of size
two? Well, you divide it into a list of
size one. How do you sort this? You're
done. You then take the right half
and you sort it. Done. Now you merge the
two sorted halves. So I point at the
four and the one. Obviously the one
comes first, then the four. Now I have
sorted the right half of the left half
of the original numbers. What's the next step?
Now that I have the left and right
halves of this list of size four sorted,
merge those. So, same idea but with fewer
elements. I'm pointing at the three and
the one. Obviously the one comes first. Now
I'm pointing at the three and the four.
Obviously the three comes next. Pointing
at the six and the four. The four comes
next. And now the six comes last. Now I
have sorted the left half. And it's
intentional that 1 3 4 6 is the original
arrangement of the lighted numbers I had
on the shelves a moment ago. All right,
it's a long story, it seems. But what
comes after sorting the left half of
the original list? You sort the right
half. So let's put those
numbers over here. How do I sort a list
of size four? Well, you sort the left
half. How do you sort this thing of size
two? You sort the left half. You sort
the right half. And now you merge those
together. How do I now sort the right
half of the right half? Well, I sort the
left half. I sort the right half. And
then I merge those together. Now I have
sorted the left half and the right half
of the right half of the original
elements. What's next? Merging them:
0 2 5 7. Now we're exactly where we were
originally with the lighted numbers.
I've got 1 3 4 6, the left half, sorted,
and 0 2 5 7, the right half, sorted. What's the
third and final step? Merge those two
halves: of course, 0 1 2 3 4 5 6 and 7.
And hopefully, even though there were a lot
of words coming out of my mouth as I was
acting this out, there wasn't a lot of
back and forth. I definitely wasn't
walking back and forth physically,
and I also wasn't comparing the same
numbers again and again. I was doing
different work at different
conceptual levels, but that was only,
what, three levels total? It wasn't n
levels on the board visually. So where
does this get us with merge sort?
Well, with merge sort, it would seem
that we have an algorithm that I claim
is doing a lot less work. The catch,
though, is that merge sort requires
twice as much space, just as we saw when
I needed two shelves in order to merge
those two lists. So, how much less work
is actually going to be possible? Well,
let's consider the analysis of
the original list and how we might
describe its running time in terms
of big O notation. Hopefully, it's
not ultimately going to be as bad as n squared.
So, here are some breadcrumbs, traces of
every bit of work that we did, as they'd
look if I hadn't kept updating the screen
and deleting numbers once we moved them
around. We started up here. We did the
left half, the left half of the left
half, the right half of the right half,
and then everything else in between. And
you'll see that essentially I took a
list of size eight and I did three
different passes through it: at this
conceptual level, at this conceptual
level, and at this one. And each time I
did that, I had to merge elements
together. And if you kind of think about
it here, I pointed at four elements here
and four elements here. And in total, I
pointed at eight elements. So there were
n steps here for merging. And if you
trust me, I'll claim that on this level
conceptually, there were also eight
steps. I wasn't merging lists of size
four, but I was merging two lists of
size two over here and two more lists of
size two over there. So if you add those
up, those are n total steps, or
merges, if you will. And then down here,
this was sort of silly, but I was
ultimately merging eight single-element
lists all together into the next
conceptual level up.
So from a list of size eight, we sort of
had three levels of work, and on each
level we did n steps of merging. So
where does the three come from? Well, it
turns out, if you have eight elements up
here, the relationship between 8 and 3 is
actually something formulaic, and we can
describe it as log base 2 of n. Why?
Because if n is eight, if you don't mind
doing some logarithms here, log base 2
of 8 is the same thing as log base 2 of
2 to the third power. The log base 2 and
the 2 cancel each other out, which gives you
exactly the number three that I sort of
visualized with those traces on the
screen. Which is to say, irrespective of
the specific value of n, the big O
running time of merge sort is apparently
not n squared but log n times n, or more
conventionally n log n, because you're
doing n things log n times (technically
log base 2, but we generally don't care
about that for big O notation). And indeed,
in big O notation, we would say that
merge sort is on the order of n log n.
That's its big O running time, sort of at
the upper bound. What about the lower
bound? Well, there's no clever
optimization in our current
implementation as there was for bubble
sort. And so it turns out the lower
bound would be omega of n log n, and
therefore it's in theta of n log n as well,
because big O and omega are in fact in
this case one and the same. And if we
actually go back to our visualization
from earlier, give me just a moment to
pull that up here. In our earlier
implementation or an earlier
demonstration of these algorithms, we
had a side-by-side comparison of all the
comparisons. But here, if I go ahead and
randomize it and click merge sort,
you'll see a very different and clearly
faster algorithm. The
computer's speed has not changed, but it's
touching these elements so many fewer
times that it's wasting a lot less time
because of this cleverness where it's
instead dividing and conquering the
problem into smaller and smaller and
smaller pieces. And to give this a final
flourish, since that was, yes, faster but
not necessarily obviously faster than
other things that we've done, how might
we actually compare these things side by
side by side? Well, in our final moments
together, let's go ahead and
dramatically and for no real reason just
dim the lights so that I'll hit play on
a visualization that at the top is going
to show you selection sort with a bunch
of random data. On the bottom, it's going
to show you bubble sort with a
bunch of random data. And in the middle,
it's going to show you merge sort. And the
takeaway ultimately for today is the
appreciable difference in feel between
big O of n squared and now big O of n log n.
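The level-counting argument above can be written out for the n = 8 example as follows. This is only a restatement of the lecture's reasoning, with n squared shown alongside for comparison.

```latex
\begin{aligned}
\text{levels of merging} &= \log_2 n = \log_2 8 = \log_2 2^3 = 3 \\
\text{steps per level} &= n = 8 \\
\text{total merge steps} &\approx n \log_2 n = 8 \times 3 = 24
  \quad\text{versus}\quad n^2 = 8^2 = 64 \\
\text{merge sort} &\in O(n \log n) \text{ and } \Omega(n \log n),
  \text{ so } \Theta(n \log n)
\end{aligned}
```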
All right. The music just makes sorting
more fun. But that's it for today. We
will see you next time.
All right. This is CS50 and this is week
four, the week in which we take off the
proverbial training wheels that have
been the CS50 library and reveal to you
all the more what's going on underneath
the hood of a computer in terms of its
memory. We'll also talk about files and
how you can actually persist information
for a long time, whether it's a file
you've downloaded or, as of today, one that
you've created yourself. But first, I just
wanted to share some artwork that two of
your classmates, Avery and Marie, kindly
made before class, which is a picture
made out of Post-it notes, some
green, some purple, which collectively,
from where you are, looks like what?
>> Yeah. So indeed it's a cat that they
made using only zeros and ones or green
and purple pieces. And in fact, even
though this is fairly low resolution in
that it only has a few pixels this way
and a few pixels this way, it's actually
representative of how computers
actually store images underneath the
hood. So let's actually start there. In
fact, we've had this bowl of stress
balls for some time here on the
lectern. And if we take a beautiful photo
of it, they look a little something like
this. Of course, this too is a finite
resolution. And by resolution, I just
mean how many dots go horizontally and
how many dots go vertically. Multiply
those two together and you get the
number of dots, each of which takes some
number of bytes, so the image might be
kilobytes, megabytes, or heck, if it's a massive
image, it could be even bigger than
that. But it is in fact finite. And if
we zoom in on this image, you start to
see a little more detail. But at the
same time, if you keep zooming in, you
start to see indeed that there's only
finite detail. And when we're really
zoomed in, you start to see actual dots
or pixels as they're called. In fact, on
most any screen, any image you look at,
if you look close enough by pulling your
phone up to your eyes or walking really
close to a TV, you may very well see the
same thing because any image on a screen
like this is represented by hundreds,
thousands, millions of tiny little dots
called pixels. And each of those pixels
has a color, and collectively they give
the appearance of stress balls in this
case or a cat in this case. So, in fact,
among the things we're going to do this
week in the problem set is actually have
you write code with which you can
manipulate your own images, not only
to understand what's going on underneath
the hood but to apply some of today's
most familiar filters, so to speak. In
fact, if we go all the way down here,
you'll see that this image, of course, has
multiple colors. We've got some white
and some red and shades in between. But
let's keep things simple for a moment
and propose that instead of looking at
these dots, we look at these zeros and
ones. And let me propose that in a
picture like this, any zero will be
interpreted as black. Any one will be
interpreted as white accordingly. If you
can see it, what is this a picture of?
>> Oh, a smiley face is, in fact, right.
Because if you kind of focus only on the
zeros and try to ignore those ones, as I
can do here for you, you'll see that
embedded in that image was in fact this
smiley face. Now, this would be a sort
of 1-bit image. You have either a zero
or a one representing each of the colors.
In modern times, we would actually use
16 bits per color, 24 bits per color,
maybe even more. And that's how we can
get every color of the rainbow instead
of just something black and white. But
in effect, what's happening here is that
if you did have a file on your Mac or PC
or phone storing this pattern of zeros
and ones and you opened it up in some
kind of image program, like the Photos
app, it would be depicted to you
visually as simply this grid, X and Y,
where some of the dots are white and some
of the dots are black. All right,
so with that said, what kinds of
representations might be involved here?
Well, we can actually rewind to week
zero. Recall that we talked briefly
about RGB, which just means red, green,
and blue, which is one of the most
common ways to represent colors inside
of a computer. And if any of you have
ever dabbled with Photoshop or similar
editing programs, or if maybe in high
school or earlier you made your own web
pages, odds are you're actually familiar
with a syntax we're going to see a lot
of today. This doesn't add anything
intellectually new. It's just an
introduction to a common convention for
how else we can represent numbers. So,
this is a screenshot of Photoshop's
color picker. Photoshop being a popular
program for editing photos and files.
And you'll see here that my selected
color looks to the human eye like black.
And I've highlighted here how I got
that. I chose black by typing in 000000,
which also, if you look up here, means
that I want zero red, zero green, and
zero blue. And yet, we somehow
translated it to six zeros instead of
just three. Well, if we take a look at
another color like white instead, I
claim that you can represent white in
Photoshop, and today in code, with FFFFFF,
or equivalently 255 red, 255 green, 255
blue. And here, if you think back to
week zero, is maybe a hint at where we're
going with this. If you're using an 8-bit
number, you can count
from zero on up to 255. So recall that
255 is like the biggest number you can
represent with just eight bits. And yet
somehow there's going to be a
relationship between the 255s and these
Fs that we see down here. Let's just run
through a few more. If we wanted to
represent something like red, we're
going to use FF0000. If we want to
represent green, we're going to use
00FF00. And lastly, to represent blue,
we're going to use
0000FF. So what's going on here? And why do
we have this different convention?
Well, it turns out in the context of images
and also memory in general, it's just
human convention or programmer
convention to use this alternate
representation of numbers. Not the
so-called decimal system, but another
one that's not all that far off from
what we've been doing over the past few
weeks. So, here again was the binary
system. You've got just two digits in
your vocabulary, 0 and one. Here is the
familiar decimal system where you've got
10 instead, 0 through 9. Suppose we
wanted a few more digits. Well, we're
sort of out of Arabic numerals here, but
I could toss into the mix like A, B, C,
D, E, and F, either in lowercase or
uppercase. And in fact, that's what
computer scientists do when they want to
have more than just 10 digits available
to them, but as many as 16 digits
available. And in fact, when you want to
use this many digits, you call it
hexadecimal, implying that you've got 16
digits, aka base 16. Now, there's
an infinite number of base systems. We
could do base 3, base 4, base 15, base
17 on up. But this is just one of the
relatively few conventions that are
popular in computing. And let's just
tease it apart because we're going to
see these kinds of numbers a lot. Well,
thankfully, as in week zero, it's
the same old number system with which
you're familiar, with the columns and the
placeholders. It's just that the bases in
those columns mean something a little
different. So instead of using powers of
two or powers of 10, we're going to
use powers of 16 today. So 16 to the 0
of course is 1. 16 to the first power is
16. So we have the ones column, the
16s column and so forth. Meanwhile, if
we wanted to start counting in
hexadecimal, this two-digit number, 00, in
hexadecimal is of course the number you
and I know in decimal as 0 because it's
still just 16 * 0 + 1 * 0. This, 01, in
hexadecimal is how you would represent
one, but you would say "zero one" instead
of just "one" to make clear there are two
digits. This would be 02, 03, 04, 05, 06, 07, 08,
09. Now things get a little interesting.
In the decimal world, we're about to
carry the one and give ourselves two
digits, 1 and 0. But in hexadecimal, you
can keep going. So the next numbers in
hexadecimal are going to be 0A, 0B, 0C, 0D,
0E, 0F. And now things get interesting
again. What probably comes after 0F,
even if you've never seen hex before?
>> So, one zero. You still carry the
one as before. This goes back to zero.
And why is this now appropriate? Well,
how many numbers did we just count
through? We started at 00. We went up
through 0F, and that's a total of 16
combinations. So, the highest we
counted, let me rewind. This number
here, of course, is going to be 1 * F.
But what is F? Well, let's rewind
further. In fact, let's have our little
cheat sheet here. If we want to have
these digits at our disposal, I dare say
that 0 through F stand for 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
15. So F is just going to represent
the number 15. So if we now fast forward
back to where we were, just counting from
00 on up through 0A through 0F, we
land here. This of course is 16 * 0 + 1 *
F, which is 1 * 15. So this is how in
hexadecimal you would represent the number
15. This in hexadecimal is how you would
represent the number 16 instead. 15 to
16. This is not ten; that's how you would
pronounce it in decimal. This is one zero in
hexadecimal because 16 * 1 + 1 * 0 gives
us, of course, 16. Now we could do this
toward infinity, but we won't: 12, 13,
dot dot dot, all the way up to FF. So,
quick mental math: 16 * F + 1 * F. That is to
say, 16 * 15 + 1 * 15 is, any guesses?
>> It is in fact 255. You don't even have
to do the math because if you just think
about where we were going with this,
indeed we saw pairs of Fs in the
Photoshop screenshots, because this is
how a computer would represent the
number you and I know in decimal as 255,
by just using two Fs. So why do we care
about hexadecimal? Well, it turns out that
it's just convenient to use two
hexadecimal digits to represent numbers,
because a single hexadecimal digit can be
used to represent four bits at once. For
instance, let me go ahead and explode
this by putting a little bit of space
between the two digits here. And let's
consider how you would represent F.
Well, if F is 15 and you want to
represent 15 in binary, I think that's
just going to be 1111.
Now, why is that? Well, one in the
eights place plus one in the fours
place plus one in the twos place
plus one in the ones place indeed gives
me 15. So using a single F, I can count
up, as we've seen already, as high as 15.
But of course I've claimed in the past
that it's super common to use eight bits
at a time, or one byte, to represent any
value, because that's just a very useful
common unit of measure. And so in
hexadecimal, if you wanted to represent
four ones, you can say F. If you want to
represent another four ones, you can
just say F, which is to say that F and F
together are the same as eight
ones together, which is how we finally
get to the total of 255, because
this is the ones place, the twos place,
the fours place, the 8s, 16s, 32s, 64s,
128s. And if you group these into
clusters of four bits, you can
represent all of the possibilities from
0 through 15 just using 0 through F. So
with one hex digit you can represent
four bits, which is a long way of saying
it's just convenient, and that's
why the world tends to use hex
when talking about colors and, as we'll
see, memory as well. So, in fact, let's
consider what is meant by memory and
what's going on inside of the computer
when we've been storing values thus far.
Well, here's that canvas of memory. I
proposed last time and before that
that we can sort of
number these bytes arbitrarily but
reasonably. This is byte 0 1 2 3 4 5 6 7
dot dot dot, and maybe this is byte 15.
That's fine. Nothing wrong with that.
But in the real world, any programmer
would actually think of these locations
not in decimal notation but in
hexadecimal notation, just because
it's convenient for the reasons
discussed. So we would actually number
these from zero on up through 9 and then
keep going with A B C D E F and so
forth. So what does that mean for the
other digits? Well, this would be 10.
This would be 11. This would be 12, dot
dot dot. Here now is 19. But here's 1A,
1B, 1C, 1D, 1E, 1F, and so
forth, just using hexadecimal notation.
But there's arguably some ambiguity
here. For instance, if you were just at a
glance to look at this board and
see this address, 10, is that byte 10 or
is that byte 16? It's just non-obvious,
because if you don't know what base
system you're working in, which you
could infer by looking at the rest of
it, it could potentially be ambiguous.
So in the world of hexadecimal, it's super
common to literally prefix any number
you ever write in hexadecimal notation
with 0x. Neither the zero nor the x
means anything per se. It just means what
follows the 0x is a number in hexadecimal
notation, which makes unambiguous the
fact that this is 0x10, which, if you do
the math in decimal again, ends up being
16, not, of course, the number 10. In short,
today you're about to see a lot of 0x's
and a lot of two-digit or four-digit
or eight-digit numbers in hexadecimal notation.
Generally, we don't care what the numbers
translate to. You don't need to do a lot
of math, but it's going to be
commonplace to see syntax like this. All
right, back to sort of normal time. So,
here is a line of code, int n = 50,
wherein we might want to declare a
variable called n and store a number
like 50 in it. Let's actually go ahead
and do this, simple as it probably is,
in a file called, how about, addresses.c.
We're going to play around with computer
addresses. And in addresses.c, I'm going
to do something super simple at first,
whereby I'm going to include
stdio.h. Then I'm going to go ahead and
write int main(void). No command-line
arguments here. And then I'm going to
declare this variable n, set it equal to
the arbitrary but familiar value of 50.
And then, just so that this program does
something mildly useful, let's go ahead
and print out with percent i and a
backslash n that value of n. So nothing new
here. I'm just literally going through
the motions of declaring a variable and
printing its value. So let's do that.
make addresses, Enter, ./addresses.
And hopefully I'll indeed see
the number 50. So, not all that much
going on in the code, but let's consider
what's going on in the computer's
memory. This line of code and the one
after it are giving the results of that
program, but where is that n ending up?
Well, here's my grid of memory. And
let's just suppose for the sake of
discussion that the 50 ends up down
here. Maybe there's other things going
on in my program. So, this part of my
computer's memory is already in use. So,
it's reasonable that it could end up in
this location here. But what is
important is: how many bytes am I
using for n? Apparently,
>> Four. And that's because we've said
integers tend to be four bytes, aka 32
bits. So this is at least to scale, even
though I'm just imagining where it ends
up in memory. So that's where the 50
actually ends up. So when I actually
call printf and pass in n, clearly the
computer is going to that location in
memory and actually printing out that
value. But that value is indeed at a
specific memory address. It's typically not going
to be quite as simple as 0x0 or 0x1 or
some other small number. It's maybe
going to be something arbitrary like
0x123, where I'm just making this up.
It's an easily pronounceable number in
hexadecimal notation. All right. So what
can I use that information for? Well,
thus far this hasn't been useful to us,
but certainly programs we've been
writing have actually been making use of
this. But with a bit more syntax, I can
actually start to see things like this,
not just on the screen, but in code. In
fact, let me propose that we introduce
two new operators in C. So, two new
pieces of syntax. One is a single
ampersand and one is a single asterisk.
And we'll see that the asterisk has a
few different uses, but the ampersand has
a very simple, straightforward one, which
is to just get the address of a variable
in memory. So if you've got a variable
like n, if you prefix it with an ampersand,
as in &n, you can actually ask the computer
at what address this variable is stored.
You can find out if it's indeed 0x123 or
something else altogether. So in fact,
let me go ahead and do this by going
back to my addresses.c program and let's
see if we can print out not the value,
which is obviously going to be 50, but
let's actually print out the address
thereof. So up here in my code, I'm
going to change the n on line six to be
&n instead. And I'm going to go
ahead and make one other change because
yes, N lives at an address. And yes,
that address is technically a number,
but it's conventional not to use percent
I to display that number, but rather
another piece of syntax, which is just a
new format code, which you don't often
need. This is more demonstrative than
useful, I would say. But percent p is
going to be what we use when we want to
print out an address of something in the
computer's memory. So, back to VS
Code. One more change. I'm going to
change my percent i to percent p
instead. So, at this moment, we should
see a version of the program that's not
going to display 50 anymore, but
something like 0x123, though probably a
bigger number than that, because my computer
has way more memory than that address
suggests. So, let's again make
addresses. Let's run ./addresses.
And indeed, this variable at that moment
in time apparently lives somewhere in
the computer's memory at address
0x7FFD3C34ECC. All of those are
hexadecimal digits. It would be painful to
do the mental math to figure out what
the numeric address is. But we're seeing
it indeed in this common hexadecimal
notation, which is not often going to be
useful for us as humans. But the
computer is and has been using this
information for some time. So in fact
what we're about to introduce is
admittedly one of the more complicated
concepts in computing, and in C in
particular, namely a topic called
pointers. And I will say, today more so
than ever might feel like a bit of a
fire hose. In fact, all these years
later, I still remember the day on which
I finally understood this topic, which
was not the day of the lecture in which
it was introduced. It was in, like,
the back right corner of the Eliot
House dining hall. I was sitting down
during office hours with my teaching
fellow and he finally helped that light
bulb go off over my head. So, if some of
this feels a little arcane today, it
just comes with time and with practice
like everything else. So, what is a
pointer? A pointer is going to be a
variable that can store an address. Now,
yes, that address is technically just a
number, like an integer, but we
distinguish between integers that we
care about like 50 and things we might
do math on, and a pointer, which in this
case is just going to be the address of
a value in
memory. So, what does this mean? Well,
we can start to do things like this. I
can declare my variable n as before and
set it equal to the value 50. But I can
actually get the address of n and put
that address in another variable. And
that variable we now call a pointer. So
p is going to be the name of this
variable. It's going to store the
address of n, which we can get using the
ampersand. But there's one more piece of
syntax, which I promised before: this
asterisk here. And the asterisk here
means that this variable p stores the
address of an integer, not an actual
integer per se. It's weird-looking
syntax. It kind of looks like
multiplication, but it isn't. It's just
that the developers of C decades ago decided
to use an asterisk, even though it's
admittedly nonobvious what it's doing.
But in this context, when you see an
asterisk right after a data type like
int, it just means that the variable in
question is not going to be an int per
se, but an address of an integer. Okay,
so let's put this to the test using a
line of this code in my own file here.
Let me propose that we do this. Let me
go back to VS Code here. Let me
introduce this additional variable, int
star p, as it's typically pronounced. Set
that equal to ampersand n, and then do the
exact same thing as before. Let's not
print out ampersand n, but let's actually
print out the value of p itself, because
p is now equivalent to ampersand n. So
let me go back to VS Code. Let me do
make addresses again. And huh, I did
something wrong and stupid here. This
was not meant to be the moral of the
story. What did I do wrong? Yeah.
>> Yeah, I just missed the semicolon. So,
still making those mistakes here. All
right. And let me clear my screen again
and do make addresses, then ./addresses.
And now I should indeed see the address
of n; I just so happen to have temporarily
stored it this time inside of a variable
called p. Now, just so you've seen it,
it turns out that when using this syntax,
a star to declare a so-called
pointer and an ampersand over here to get
the address of something, you might see
different formattings of this in online
references and such. This is the
canonical way to declare a pointer: int,
space, then the star, then, without a
space, the name of the variable, as in int *p.
However, it will also work if the star is
over here, as in int* p, or in the middle,
as in int * p, and you will sometimes see both. But again,
we would recommend stylistically that it
just go here. Admittedly, I think it
would have been clearer if the
star were over here, making clear that
it's related more to the int than it is
to the variable name. But this is simply
the convention. So this means, hey
computer, give me a variable called p
that's going to store the address of an
integer. And the ampersand is just
saying, hey computer, tell me the
address of n. And it's the compiler and
computer themselves that decide where to
put that variable in memory. Questions?
>> Would you get an error if you didn't put
the asterisk? You would. And let's take
a look. So, let me go ahead and clear my
terminal. Let me go ahead and delete the
star before the variable p. Now, let me
go ahead and do make addresses again.
And indeed, I'm getting an error.
Incompatible pointer to integer
conversion initializing int dot dot dot.
And even though that's a lot of big
words, it kind of says what it means.
You're trying to go from a pointer on
the right to an integer on the left,
which is just not appropriate here. Yes,
at the end of the day, they're all
numbers, but it's more properly a
pointer or an address on the right, but
a little old int now incorrectly on the
left. So, the fix there is just to
indeed put it back. Other questions on
this new syntax? Yeah.
you do like
>> indeed. To recap the question, can you
use the address of operator to find the
address of other data types like
strings? Absolutely. And we'll do that
with a couple of examples today as well.
We're just using ints to keep it super
simple initially. Other questions on
these addresses and pointers?
>> So we still use
variables even if they're not integers.
Is that right?
>> Correct. Correct. Even if it's not an
int (good question, and we'll come back to other
data types in a little bit), you're still
going to use the star. That is the same
syntax for everything.
And yes,
>> Can you tell the computer, I want to
store these variables at this address?
>> Oh, yes. Can you tell the computer you
want to store a variable at this
address? That's where we're going in
just a bit. Indeed, now that we have the
ability to find out the address of
something in memory, it stands to reason
that we can go to that address ourselves
and maybe poke around and actually put
values there. And in fact, that's
among our goals for today. So let's
consider how we might get there. So here
now is my canvas of memory, and let me
propose that the number 50 happened to
get stored in the variable n down there
at bottom right, just because, and that's
at 0x123, or in reality a much
larger address, but it's easier and
quicker for us to just pretend it's at
0x123.
What is actually happening in code when
I declare P and put a value there? Well,
recall a moment ago I declared P to be a
pointer to an integer. that is the
address of an integer. So what's
happening in memory is this. If n is
down here and happens to be at address
ox123
when I actually assign p to amperand n
that just literally takes that address
of n and puts it inside of p. Now p as
an aside happens to be pretty big. It
turns out by convention on most systems
a pointer that is a variable that stores
an address is actually going to be eight
bytes large. It's going to be 64 bits.
Why is that? Our computers have so much
darn memory nowadays in the gigabytes
that you need to be able to count higher
than 4 billion. As an aside, if you only
used 32 bits for your pointers, you
could only count recall as high as 4
billion. 4 billion uh is 4 gigabytes
equivalently. That would mean your
computers could not have 8 gigabytes of
memory, 16 gigabytes of memory. Your
servers couldn't have tens of gigabytes
of memories. We use 64 bits or eight
bytes nowadays for pointers because our
computers have that much more memory.
All right. So what is p storing?
Literally just an address like this. So
when we wrote this code just a moment
ago, what the computer did, and has been
doing for the past several weeks, is
literally just finding the location of n
in memory and plopping that value inside
of p, which itself is taking up a bit of
memory, by convention more memory, 8
bytes in this case. The thing is, who
really cares about this level of detail?
Typically, as programmers, it's useful to
understand what's going on, but rarely
are we going to care precisely about
where things are in memory. Today is
really about just kind of looking at
what's going on underneath the hood. So,
in fact, we can abstract away most of my
computer's memory, I would propose,
because at the moment all we care about
is p existing and n existing. So who
really cares what else is going on? And
frankly, I generally am not going to care
that n is at address 0x123, just that it
is at some address. And so when a
programmer or computer scientist talks
about design on, like, a whiteboard, or
frankly in sections and office hours on a
whiteboard, we rarely care what the
actual addresses are. So we generally
abstract the specific address away and
literally represent pointers with arrows
on the screen or on the whiteboard or the
like. This just means that p is a
variable that points to the number 50 in
memory.
Okay. Questions on this mental model for
what a pointer is? It's a pointer in,
like, very much the literal sense.
Okay. So, if you're on board with that,
let me propose that we consider now what
these things look like maybe more
physically. In fact, we've got a couple
of mailboxes here to make clear, with a
little metaphor, that here is a physical
representation of our variable, say p,
labeled as such. Inside of this is
presumably going to be the address of
some actual value. That value, at the end
of the story, is going to be the value
of n, whose address, recall, for
consistency is 0x123.
So what happens when you actually try to
locate a value in memory is analogous to
looking up something inside of these
mailboxes. Think of your computer's
memory as hundreds or thousands of little
mailboxes, maybe more apartment style,
where you've just got rows and columns of
mailboxes, as opposed to individual ones
for single-family homes. Each of those
mailboxes can contain a value, and that
value might be the address of some other
value in memory. And so what's really
happening is that if this is p, not drawn
to scale because they only make mailboxes
so large, inside of p is going to be an
address like 0x123. And just to be
dramatic, since there's a big football
game this weekend, here is a Harvard foam
finger: metaphorically, this pointer is
pointing at that value over there. And
in fact, as you asked a moment ago, can
we actually go to an address in memory?
We don't yet have the syntax for that,
but we're about to. Yes, you can. And in
fact, if I follow what I'm pointing at
and open up this location in memory,
voila, there is the 50 in question. So
anytime we're talking about values, or
about the addresses thereof, you can
think of it analogously as being like
physical mailboxes, one of which might
contain a useful number like 50, one of
which might contain the address of that
value. And we'll now see the syntax to
actually go from one to the other. Let
me go back into VS Code here. In the most
recent version of my program, I was
getting the address of n and storing it
in p, and then I was literally printing
out p itself, and that's when we saw the
big hexadecimal number that is generally
not useful, but maybe interesting to see
that one time. Let me instead, though,
introduce another use of that star, or
asterisk, operator that allows us, as was
asked a moment ago, to actually go to
that address. So in this version of my
program, I'm going to keep n equal to 50.
I'm going to keep p equal to the address
of n. But what I'm now going to do is
show you how, syntactically, I can print
out not p but n, by using p and following
the proverbial foam finger, printing out
%i\n. Now, obviously, I could cheat and
just print out n like in version one,
but that doesn't really demonstrate
anything interesting here. However, if I
only have p at this point in the story,
it turns out you can use the star for
another purpose. If you simply prefix
your variable name with a star, that is
the so-called dereference operator, which
means go to the address in p. So if I now
open up my terminal here, do make
addresses for this version, then
./addresses and Enter, I now get back the
number 50. So what's really happening? On
line five, as has been true for several
weeks now, we have a variable called n
being initialized to the number 50. Then
on line six, I'm declaring p as the
address of some value, an integer
specifically, and putting the address of
n in there. And then on line seven, I'm
actually saying, print out an integer,
%i, as we've done for weeks. But what
integer? Go to the address in p and print
out what you find there. So that's
equivalent, again, to following the foam
finger, which is over there pointing at
the value I actually want to print out
instead.
Okay. So, usefulness? Well, I think we
can get there by taking a look at one of
the little white lies that we've been
telling. In fact, let's turn our
attention to strings, which up until now
have been a sequence of characters in
the computer's memory. A string is a
thing in programming more generally, but
in C, it technically doesn't exist by
this name. You can still use strings in
C, just not by calling them s-t-r-i-n-g
as the actual data type. But let's start
with our familiar code here. Let me go
into addresses.c. Let me add our training
wheels back in for now and include
cs50.h, because in this version of my
addresses program, what I want to do is
declare a string s and set it equal to
"HI!". Then, as we did in week one, let's
go ahead and print out with %s\n that
value of s. So nothing new, nothing
interesting here. So let me just do it
quickly: make addresses, then
./addresses, and we see HI! on the
screen. That has all been something we've
been taking for granted. But let's
consider what is going on underneath the
hood of even that program. The string
we've declared exists somewhere in the
computer's canvas of memory. So
string s = "HI!" might end up somewhere
down here. And I'm going to stop drawing
all of the boxes when not necessary. But
here we have H, I, exclamation point.
And, as we discussed two weeks ago, the
null character, NUL, which just means the
string stops here. So, as a quick
refresher, even though the word is three
characters, it takes up how many bytes?
Four, always, because you need that null
terminator. All right, so maybe that
string could be accessed then by its
name, s. And we've seen this before. s[0]
is the first character, then s[1] and
s[2], and if you want to poke around, you
can go to s[3], but all you'll find there
is that null character, which isn't meant
to be printed. So, three characters
accessible via this array syntax. But we
know now that everything in the
computer's memory is addressable. And
maybe that H just so happened to end up
at 0x123, and the I, the exclamation
point, and the NUL end up at 0x124,
0x125, and 0x126, respectively. It
doesn't matter what these numbers are,
but because strings are sequences of
characters back to back to back in
memory, it must be the case that these
addresses are themselves contiguous, back
to back to back, without gaps inside of
them. That's how a string has always been
stored in memory. It's just an array of
characters. All right, so with that said,
what really is s? We've thought of s, in
every program we've used strings in
before, as just a string, that is, the
sequence of characters, or really the
name of an array. But that's a bit of a
white lie, because what s really is is
going to be a more specific value. Take a
guess: what is actually going to be the
value in s?
>> Yeah, the address of, if I may, that
array. So we've got sort of four possible
answers here: A, B, C, and D. Multiple
choice. Which of those numbers probably
makes sense to store in the variable
called s in order to get to this string?
What is s's value? Yeah?
>> 0x123.
>> Correct. So we don't talk about this
in week one, because it's already hard to
remember semicolons in week one, let
alone to start thinking about what these
specific addresses are. s is a string,
yes. But technically, s is, and has been
since week one, a pointer: the address of
an array of characters in memory,
specifically the address of the first
character, which is sufficient. Why?
Because of the null-terminating
convention we talked about weeks ago,
which tells the computer where the string
ends. The pointer tells the computer
where the string begins. And that's how
you get, using just numbers, zeros and
ones inside of a computer, to store
something as interesting as an actual
string. So let's take a closer look at
this. Let me go into VS Code again and,
just for the sake of discussion, declare
s as before, but instead of printing out
the whole string at once, let's go ahead
and do this: printf, quote unquote, %p\n,
and then let's print out s itself,
initially to see whether it's actually
0x123 or presumably a much bigger number.
Then, after that, let's print out another
address, %p\n, and now I'd like to print
out the address of the first character of
s. But let's not get ahead of ourselves.
Let me go ahead and make addresses, then
./addresses. Okay, there, in this HI!
program, is the address at which the
string itself is stored: 0x5a7143027004.
So, bigger than 0x123. Well, let's now
poke around. What if I want to print out
the address of, how about, the first
character in that string? Well, recall
that s[0] is literally the first
character. That is a char. So with what
syntax could I get the address of the
first character?
Well, we haven't learned all that much
that's new today. It's just a single
ampersand that will get me the address of
that character. If I do this for the next
character, I can see one after another.
And in fact, this string is going to have
four characters in total, including the
null character. So let me copy-paste,
which is generally frowned upon, but not
for a lecture demo, because we're just
trying to do this quickly. Let's print
out the address in s itself, and then,
more specifically, the address of s's
first character, the address of s's
second character, the third, and the
address of that null terminator. All
right, let's go back into make addresses.
Let me go ahead and clear my terminal and
run ./addresses. And we see, if I zoom in
on my terminal here, the following. s
itself contains 0x56199bd00004.
And the address of the first character
in s, aka s[0], is exactly the same
thing. The next character, the I in HI,
is one byte away. The exclamation point
is one more byte away. And the null
terminator is one more byte away. So
again, bigger numbers, but the point is
these are indeed just the actual
addresses of all of these characters in
memory. All right, let me pause for any
questions here. Yeah?
>> Why do you need an ampersand before the
specific characters but not before s?
>> Good question. Why do I need the
ampersand before the specific characters
in s but not before s itself? Think about
what s actually is. I'm claiming for the
moment that s itself is the address of
that whole string, which just so happens,
by design, to be equivalent to the
address of the first character, because
that is the convention humans came up
with decades ago to represent a string.
Now, you might think that you need the
address of every character in the
string. But no, that's why humans decades
ago decided to just terminate every
string in memory with the backslash zero,
or null terminator: because if you give
me the beginning of the string and the
end, I can obviously, with a loop, find
everything else in between. Other
questions? No. All right.
Well, what then is this actual thing in
memory? It turns out that, yes, s is a
string as we've been describing it all
this time. But technically, I think we're
ready to reveal what little white lie
we've been telling, or, if you will, what
abstraction s actually is in the CS50
library. The type you know as string has,
since week one, simply been a synonym for
char *. So what does this really mean?
Well, we saw int *p earlier; here we're
seeing char *s. s is the name of the
variable, and yes, it's a string, but
what is it really? s is the address of a
char. And so, in week one of the course,
in the actual CS50 library, we told this
little white lie by just creating a
synonym in the library that makes char *,
so to speak, the exact same thing as
string, s-t-r-i-n-g, just so that we
don't have to think about this level of
detail, let alone hexadecimal notation
and addresses and pointers and
dereferencing and all of this complexity,
in the first weeks of the course. It
simply abstracts away what a string
actually is. And in fact, we've seen this
technique before, in a more complicated
way. If you recall, last week we claimed
that you could create a phone book, for
instance, using persons, and persons have
names and numbers, and we created our own
type by saying typedef, and that type was
a whole structure, which is the complex
part: a structure containing a name and a
number. And we gave that data type,
ultimately, the name person. So we've
already invented in class our own
make-believe data types to create things
that didn't come with C itself, like a
person. Well, the struct is very specific
to what we were trying to do with the
phone book, but typedef is more generally
useful because it literally allows you to
define your own type. So, for instance,
if we wanted to create a synonym for int,
because we never remember what it is, and
call it integer instead, you could simply
say typedef int integer;
and that would create in your
programming environment a data type
called integer that is literally
equivalent to int. Now, this is not all
that useful. But in the CS50 library, we
do use typedef to tell the computer that
char * should instead be spelled as
string: typedef char *string; and that
just means that string, ever after, is
the same thing as saying char *. So all
this time since week one, I could have
been writing char * instead if I wanted.
And in fact, if I go back to VS Code
here, let's simplify this quite a bit and
go back to the very first version of the
program, wherein I use %s and just print
out s's value itself, the string HI!.
Well, this of course is going to work as
always. It's just going to print out HI!
on the screen. But now, if I get rid of
the CS50 library and try to recompile
this, notice we'll get an error that I
think I've seen before. Here we have, if
I scroll up to the very first line, use
of undeclared identifier 'string'; did
you mean 'stdin'? And no, I don't, and no,
I didn't a couple weeks ago when I
accidentally did that, but the compiler
does not know about the word string at
the moment. Well, that's fine. Even if I
don't have the CS50 library installed on
this computer, I can just get rid of the
word string, which is a concept but not a
keyword in C, and just rename it to
char *. And now, in my terminal window, I
can do make addresses again, then
./addresses, and voila, we're back in
business with no CS50 training wheels
whatsoever, because printf knows, given a
char *, to go to that address and print,
print, print, print until it gets to the
null terminator, and then stop printing.
There's a loop in there that does exactly
that.
Questions on char * or what a string
actually is now?
>> Yeah, in front.
>> Good question. How does printf know to
keep going until it gets to the null
terminator? The format code: I've been
using %s, which means print a string,
instead of %c, which means print a single
character. printf sees that %s and knows
it should use a loop to print out all of
the characters until the null terminator.
If I instead used %c, printf would expect
a single char and print just that one
character.
>> Okay, that makes sense.
>> Other questions?
>> Good question. Why don't I dereference
s in order to print it out? So, let me
try that for just a moment here. Why do I
not have to, now or in any week prior,
write *s here? Because, after all, if s
is the string, I want to go to the string
and print it out. Well, the first answer
is that printf is doing this for you,
because it's being handed the address and
it is going to the address for you. So
that star is somewhere in printf's
implementation. But this is also
incorrect conceptually, because yes, s is
the string, but more technically, as of
today, s is the address of the first
character in the string. So I really want
to provide printf in this case with the
address, not the specific character,
because I want it to treat it as a
string, not a single character. Indeed, I
could use *s if I changed %s to %c, to
print out the single character. All
right. So let's play around just
syntactically for a moment here in VS
Code. Let me propose that we still use
char *s here and then just demonstrate
exactly what's going on. So I'll do
exactly what was just asked. I'll use %c,
and then I'm going to go ahead and print
out, for now, using our old week two
syntax, treating s as an array: s[0],
s[1], and s[2]. And I'm using some
copy-paste just for time's sake. This of
course is not going to do anything all
that interesting, but it is going to
demonstrate that indeed we have H, I,
exclamation point back to back to back in
memory. And if I really want, I could
print it all on one line by, of course,
getting rid of those newlines. But what
more can I do with this syntax? Well, I
could take literally the fact that s is
the address of the first character in
memory. So instead of using this array
notation, which we introduced in week
two, I could technically go to the
address in s. Why? Well, s is the address
of the first character of the string. *s
means go to that address. And voila,
you're at the first character, by
definition of what s is. So I could print
out the first character using *s instead
of s[0]. But how could I get the next
character? Well, here's where we can
actually take advantage of the fact that
pointers and addresses more generally are
in fact numbers, and you can actually do
arithmetic on pointers themselves. In
other words, there is a concept known as
pointer arithmetic, which means, given an
address, you can add to it or subtract
from it. Multiplying or dividing an
address wouldn't make much sense, and C
doesn't allow it, but we can certainly
add numbers to an address. So, for
instance, if I want to print out the
second character of s, that's kind of
equivalent to going to s but then moving
over one character. So maybe I should do
a little bit of pointer arithmetic and
write *(s + 1), with s + 1 in
parentheses, so that, like in math class,
we get the order of operations right. And
then down here I could go to s again, but
wait a minute, I want to go to s plus two
characters, or two bytes, away:
*(s + 2). So now I can do make addresses
down here. Oh, and I did mess up. Oh, new
mistake. Unintentional.
Yep, I forgot my closing parenthesis at
the very end here. So that was just user
error. make addresses again, then
./addresses. And now I indeed see H, I,
exclamation point one more time, using
pointer arithmetic instead of our
familiar array notation. So what is that
array notation? It's what we would
generally call syntactic sugar, which is
a very weird way of saying it's just
nicer syntax. No one wants to write code
that looks like this. It sort of bends
the mind a little bit to read and parse
all of this visually. Just s[0] is much
more straightforward. But what it's
really doing is this. And the compiler is
essentially converting that bracket
notation for us into this more esoteric
but correct version instead.
All right. What else can I do? Well,
just for fun, for some definition of fun,
let's go ahead and print out three
different strings. And recall that a
string is a sequence of characters that
starts at some address. So let's first
print out the sequence of characters that
starts at s. Let's next print out the
sequence of characters that starts at
s + 1. And let's lastly print out the
string that starts at s + 2, just playing
around with the definition of what these
pointers are. Let me do make addresses.
And oh, not my day.
What did I forget? A semicolon. So if it
happens to you, it happens to me, too.
make addresses, then ./addresses. And now
this one's going to be a little curious.
I see HI!, then I!, and then just the
exclamation point. Why? Because I'm
treating a string literally as what it
is, a sequence of characters, but I'm
giving printf the address of the first
character initially, then of the second
character, then of the third. All three
of those statements work because all
three of them happen to be terminated by
the same null character. Even though
printing I! and the exclamation point
alone was not really my intention, that
doesn't stop me from being able to do it
nonetheless.
All right. Well, let's do one other
application of this idea. Let me propose
that we take a look at our computer's
memory here, and let's suppose that we
want to start comparing values, because
we did a lot of that in week one, and
even in week zero, with if and else if
and else and so forth. So let's make this
a little more real and also reveal why,
last week, we had to solve an unexpected
problem using another string function,
namely strcmp, s-t-r-c-m-p. So here, for
instance, are two arbitrary variables in
memory, i and j, and I gave them both the
value of 50, and maybe they indeed end up
there, each of them taking up four bytes.
Recall that last time we weren't able to
compare two strings just by using the ==
operator, even though == works fine for
integers. In fact, let's do that. Let me
go back into VS Code here, close out
addresses, and code up another version of
my compare program from the past. This
time I am going to use the CS50 library
just to keep things simple initially. I'm
going to include both it and the standard
I/O library here. I'm going to give
myself main with no command-line
arguments. And then in main I'm going to
declare exactly what we just saw on the
screen: a variable i set to 50 and a
variable j set to 50. And then we're
going to use our old familiar syntax from
week one: if i == j, then let's go ahead
and print out something like same\n;
else, let's go ahead and print out, quote
unquote, different\n. So, a super simple
program that simply compares two
variables that, yes, are obviously going
to be the same, but let's do this: make
compare, then ./compare. They're in fact
the same.
Okay, so that actually works as
intended. But why didn't it work last
time when we tried comparing strings, the
solution to which was to introduce
strcmp? Well, let's go back to VS Code
and resurrect that buggy example. Instead
of using integers, let's go ahead and do
this. And I'll rename them just by
convention. My first string will be
whatever get_string gives me, so we'll
prompt the user for s. My next string
will be called t, by convention, and I'm
going to ask the user for that too. Then
down here, instead of using i and j,
which are common for integers, I'm just
going to use s and t, which are common
for strings, and just ask literally the
same question as we have in the past. All
right, let me go ahead and do make
compare, and wow, what's the error? Well,
I'll show you the error message. What did
I unintentionally do wrong here?
Yeah, I'm getting a string, but I'm
trying to store it in an int. So that is
just frowned upon. So let me go ahead and
change that to what I should have typed
the first time: give me a string s and a
string t. Now, if I do make compare,
we're back in business. All right, let me
do ./compare, and I'm going to go ahead
and type in, for instance, HI! and HI!,
for s and t respectively, and the program
claims they're different.
Now, we've tripped over this before, and
recall that the solution was indeed to
introduce a function called strcmp. And I
explained at a high level why: you're not
just comparing two values; you've got to
compare character after character after
character. And that's indeed what strcmp
does. So let's go ahead and do that. Let
me go back into this file. Let's go ahead
and include the string library, string.h,
at the top here. And instead of checking
s == t, let's check if strcmp(s, t)
happens to == 0, which, per the
documentation for the function, means
they're equal, as opposed to one coming
before or after the other.
No, I did not get it wrong this time. I
caught it. So how do we actually go ahead
and compare the strings this time? Well,
let me go ahead and do make compare, then
./compare, and now type in exactly the
same thing: HI! and HI!. And now they're
in fact the same. And just to demonstrate
that this isn't just some fluke, I can
type in, for instance, hi and bye, and
those are in fact different. So clearly
strcmp is doing something useful. But
what is it actually doing? Well, first of
all, let's make clear that what was a
string last week is technically a char *
this week. So I can remove that training
wheel. I'm still going to include the
CS50 library because, as we'll see by the
end of class today, get_string and
get_int and all of those get functions
from CS50 are actually still useful,
because it's still a pain in the neck in
C to get user input without using
functions like those. But I'm going to
get rid of the data type that we thought
was called string. This will still work
exactly as before. If I do make compare,
then ./compare, and type in HI! and HI!,
we're indeed seeing that they are the
same. So what's actually going on inside
of the computer's memory with strings?
Well, I would offer that s probably ends
up, like, over here in memory. And then
maybe it actually has its characters down
here. So notice the duality. s, as of
now, is an address, which means it takes
up eight bytes, or 64 bits, but the
actual characters, it turns out, end up
somewhere else in the computer's memory.
And this is what's different about an
int. The int i and the int j both ended
up exactly where the variables were
named. But with strings, the variable
itself contains not the string but the
address of the first character in that
string, which I claim could end up
anywhere else in the computer's memory.
So those addresses might be 0x123, 0x124,
0x125, and 0x126, for instance.
Meanwhile, s is going to contain
literally the address of that first
character. When I create t in memory, it
ends up maybe over there, taking up eight
bytes of its own, and down here ends up
the second thing that I typed in, not at
the same address but at 0x456, 0x457,
0x458, 0x459. Now, if the computer were
really smart and generous, it could
probably notice, oh, wait a minute, you
typed that thing in already; let me just
point you at the other memory. But that's
not how it works. When you call
get_string, you get your own chunk of
memory for whatever the human typed in,
even if by coincidence it's exactly the
same. So t's characters are ending up
here, and s's characters are ending up
here. What value should go in t?
>> Exactly, 0x456, because that's the
address of the first character in t. So
we put 0x456 there. So, at this point in
the story, we have two strings in memory
and two pointers to them, too. And so, in
fact, if we abstract that away, it's
equivalent to s pointing at the chunk of
memory on the left and t pointing at the
chunk of memory on the right. So why was
strcmp actually necessary? Well, in this
case, we wanted to make sure that the
strcmp function was handed the address in
s and the address in t, so that the
strcmp function, written by someone else
decades ago, could use its own for loop
or while loop that essentially starts at
the beginning of each string and compares
them character by character by character.
That's what it's designed to do. By
contrast, when I was using == a few
minutes ago, and also last week,
incorrectly, to compare strings, what was
getting compared? Well, if you literally
compare s == t, that's like asking, does
0x123 == 0x456?
And that's obviously not true, because
those are literally two different
addresses. So the answer I was getting
last week and today was correct: those
addresses are different. But
conceptually, of course, I intended for
the program to compare the actual
characters in the strings, not simply the
addresses thereof. So how do we go about
fixing something like that? Well, using
strcmp ensures that we compare them
character by character, and I don't need
to write my own for loop or while loop;
the strcmp function does that for me. And
we can see this, too. If I go back to VS
Code here, get those two strings, and
just for kicks print them both out using
printf with %p\n for each of them,
passing in s and t respectively, what I
should see is that, even if I type the
exact same thing, we get two different
addresses when I make this version of the
program. Here's my first HI!. Here's my
second. And the two addresses are,
subtly, different. The first one ends in
b0. The second one ends in f0. Both of
those are hexadecimal values.
Questions
on any of that thus far?
Any questions? Oh yeah, question in front.
Yeah. What's that?
>> Really good question. When you create a
pointer in memory, or really when you
allocate a string or an integer in
memory, how does the computer decide
where to put it? It uses different
chunks of memory for different purposes.
And in fact, one of the topics we'll
look at after break today is exactly
that: how a computer decides where to
lay things out. It's often very
intentional, and addresses are often
auto-incremented, so things will go back
to back to back when possible. But over
time, things will start to get messier,
especially in larger programs where
you're adding and removing values
from memory all the time. So more to
come. Other questions on what we have
done here?
All right, before we break, let's do one
other example that elucidates, perhaps,
what can go wrong without understanding
some of these underlying building
blocks. Let's go ahead and
create a program this time that aspires
to copy a string, which seems pretty
reasonable at a glance, because it's
certainly easy to copy two integers. You
just set one equal to the other, but
that's not going to be the case, it
turns out, with copying a string. So,
let me open up, how about, copy.c, a
new program, and I'm going to include a
few libraries at the top. We'll use
cs50.h so that we can still use
get_string conveniently. We're going to
include ctype.h for reasons we'll soon
see, but we saw that a few weeks back.
We'll include stdio.h as always. And
lastly, we'll include string.h.
Inside of my main function, which won't
take any command-line arguments, let's
go ahead as before and declare a string
s, set it equal to the return value of
get_string, and prompt the user. Then
let's go ahead and try to copy
s into a new variable t, just like I
would copy any two variables, using the
assignment operator. Then let's treat
the copy, otherwise known as t now, as an
array, which we're allowed to do per
week 2. So let's say the first character
in t we actually want to set equal to the
uppercase version of that same
character. So this line 12 at the moment
is literally, on the right-hand side,
saying use the toupper function from
the ctype library, which we used a couple
weeks back. Pass in the first character
of the copy, t, and then update the actual
first character of t. So let's
capitalize t but not s. Now at the very
bottom of this program, let's go ahead
and print out the value of s at this
point in time. And then let's print out
the value of t at this point in time.
And
when I go ahead and make this program
called copy and run ./copy, let's type in
Hi!. Uh, no, let's do
it lowercase first. Let's do hi! in
lowercase. Enter. And we'll see,
curiously, that s and t both got
capitalized, even though the only
character I touched was t[0].
I didn't touch s after making this copy.
Now, to be clear, what's going on? Why
don't we remove one of these training
wheels? So string really doesn't
technically exist. It's always been a
char *. And this string is also a
char *. So what's really going on?
Well, more clearly now, s is the address
of the string that the human typed
in. But t is a copy of what? Literally
the address of the thing the human typed
in, which is going to be one and the
same. So in fact, pictorially, you can
think about it this way. If here is my
canvas of memory, and the user is
prompted for s, and the user types in
hi! in lowercase as I did, and it
happens to end up down there, what gets
stored in s is going to be the address
of that memory, which for the sake of
discussion is maybe 0x123. So 0x123 is
what is stored in s. When I then, on my
second line of code, create t, I get
another eight bytes of memory, or 64 bits,
to store a pointer, a char *, aka string.
But what is put in t? Literally the
same 0x123. So abstractly, it's
essentially equivalent to s and t both
pointing to the same chunk of memory. So
when I do t[0] and go to the
zeroth, or first, character of t, that
happens to be the exact same chunk of
memory that s is pointing to. And so
when that lowercase h becomes a capital
H, it's as though both s and t have
changed. And recall too, if you're
enjoying the syntax, if I go back to VS
Code here, I did use array notation, but
I equivalently could have said *t: go to
the address in t, which is the address of
that first character. Functionally, that's
exactly the same. We're just not using
the syntactic sugar now of the square
brackets. That is why hi! is
being capitalized in seemingly both
versions of it, the original and the
copy. So how do we go about fixing this?
Well, we need a couple of new solutions,
namely two new functions here. malloc is
going to be a function that allocates
memory. So memory allocation, aka malloc.
And then free, which is going to be the
opposite: when you're done with
that memory, you can hand it back to the
computer and say use this for something
else. So using these two functions alone,
I dare say we can now solve this problem
in memory by making an actual
copy of the string, copying h, i,
exclamation point, and the null character
elsewhere in memory so that we can
actually manipulate the copy thereof. So
how do I do this? Well, let me go back
to VS Code here. Let me propose that we
get rid of much of what we did earlier,
except we'll keep around the declaration
of s. But now if I want to create a copy
of s, it turns out I'm going to need to
ask the computer for as much memory as s
itself takes up. So hi exclamation point
takes up how many bytes in memory?
Four is correct, because you need the
null character. So how do we figure this
out? You can do this. Let me give myself
another string called t. But we don't
need that white lie anymore: another
char * called t. And set it equal to
not s, which we knew was going to go
wrong. Set it equal to the return value
of this new function malloc, which is
going to return the address of a chunk
of memory for me. How many bytes do I
want? Well, technically I just want four
bytes. So I could do malloc of four. And
that will literally ask the operating
system, running in the cloud in VS Code,
for four bytes of memory somewhere in
that black and yellow grid I keep
drawing on the screen. I don't know
where it's going to be, but I don't care,
because malloc's return value will be the
address of the first byte thereof. Now,
it's a little dumb to hardcode four, not
knowing what the human's going to type
in, but that's okay. We can do this more
dynamically and use our old friend
strlen: ask the computer, what is the
length of s? And then
add one, because we know that we need
an extra byte. Even
though the length of hi! in the real
world is three, we know underneath
the hood we actually need that fourth
byte, hence the plus one. Now, to use
malloc, I actually need to add another
library here, stdlib.h, for standard
library,
and that's going to give me access to
the prototype for, and in turn to, the
malloc function. Now, with this chunk of
memory, it's up to me to copy the string.
So how do I go about copying a string from
s into t? Well, I can do this in a bunch
of ways, but let me propose that we do
it like this. For int i equals zero, i
is less than the string length of s,
whatever that is, i++. And then inside
of this fairly mundane loop, let's just
set the ith value of t equal to the ith
value of s, and copy, literally, very
mechanically, every character from s into
t.
Then down here, let's go ahead and
capitalize just the first character of t
by using toupper as before, with or
without the syntactic sugar. And then at
the very bottom of this program, let's
print out the value of s itself, just for
good measure, to make sure we didn't
screw it up this time. And let's print
out the value of t, just so we see that I
in fact have capitalized t and only t.
But I'm not quite done yet. There's a
design flaw here and a mistake, but it's
subtle. Does anyone want to pluck off
one or the other?
check50 and design50 are not going to
like this. Yeah.
>> We don't actually copy
over the terminating character of
the string.
>> Yes, because strlen always returns the
sort of real-world length of the string:
3, for hi exclamation point. This would
seem to accidentally forget to copy the
null character. So I can fix this in a few
different ways. I could, for instance,
after my loop, actually do
something like t[3] equals single
quotes backslash zero and manually
terminate it myself, because I know it's
got to end with a null character. This
would be frowned upon too. I shouldn't be
hard coding the 3. This is all too sloppy.
So don't do this. What I could instead
do is say go up to and through the
length of s, because if the length of s
is three, but I use less than or equal
to, that loop is going to iterate, of
course, four times, because I'm starting
at zero as always. So that, I think, fixes
that problem. But now the design flaw,
which is subtle, but we've seen it
before. Yeah.
>> Exactly. It's just dumb of me to be
asking the computer what's the length of
s, what's the length of s, what's the
length of s on every iteration. So this
is why we introduced this trick where
you can set another integer variable,
like n, equal to that string length, and
then, after the semicolon, just keep
comparing i against n, which means you're
not calling functions wastefully as
before. All right, if I didn't mess up
anything else, let me go into my
terminal. Let me do, uh, oh, did I mess
something up?
I still... Yes, I did mess something up. I
should have put this back as well. Thank
you. All right. So, let's go ahead and
do make copy. Enter. ./copy. And now
I'm going to go ahead and type in hi! in
all lowercase and hit Enter. And you'll
see now that s is unchanged. It's
printed out again in lowercase, but t is
in fact capitalized here. Now, why is
this? Well, in this case, what's
happened is that I've got s in memory,
but this time, when I allocate t, I
use malloc to get a whole chunk of memory
here that initially just contains who
knows what: garbage values, as we've
called them before. I'll just leave them
as blank here, but it happens to be, for
the sake of discussion, at 0x456, 0x457,
0x458, and 0x459. When I then set t equal
to the return value of malloc, it's as
though t is just pointing to this chunk
of memory. Then, in my own loop, when I go
from zero on up through n, that just
means to copy the h, then the i, then the
exclamation point, and, because of the
equals sign in the <= comparison, also
copy the null character.
So this is getting a little tedious,
though, admittedly. Like, this is a lot of
work just to copy a string.
Could we be doing this a little bit
better? We actually can, because of
the libraries we're including. It turns
out there are functions for copying
strings that come with C. So in fact, if I
go back to VS Code here, I don't actually
need any of this for loop, so long
as I have actually allocated enough
memory for this string, which I do think
I have. I can use a
function literally called strcpy, short
for string copy, and pass in the
destination and the source, in that order.
It almost feels a little backwards, but
that's the way it's done to copy s's bytes
into t. It's easy to mess them up, but
don't mess them up.
Per the documentation, the destination
comes first and then the source
string. So, if I do this now, let's do
make copy. We're good to go. If I do
./copy now and type in hi! in all
lowercase, we still have preserved that
good property. But let me propose that
things can go wrong. And in fact, this
is about to make the program look way
more complicated than feels ideal. But
I've been a little lazy here. There are a
bunch of things that can go wrong, for
which it's worth knowing about the
return values of these here functions.
So all of this time it has been possible
for certain functions we've been using,
get_string among them, to return,
confusingly,
this value NULL. Again, humans
decades ago decided that one thing would
be called NUL. Other humans decided this
new thing would be called NULL. N-U-L,
pronounced "null," is just the null
terminator, backslash zero. It is a single
byte of eight bits, all of which are
zeros. That's been true for a few weeks
now. N-U-L-L happens to be a special
memory address, literally 0x0, at which
nothing is ever supposed to live. So
whenever I describe the top left corner
as "this is address zero, this is one,
this is two," know that humans years ago
decided, you know what, let's just waste
byte location zero and never put anything
there, so that we have a special value
with which we can signal when something
has gone wrong. So humans just decided:
don't use memory address 0x0
specifically, and a few bytes
after it. So what does this mean? Well,
in my code all this time, and since week
one, frankly, things could have gone
wrong. So in VS Code here, I'm using
get_string and I'm using malloc and I'm
using strcpy and all of these printf
statements here, but I'm not actually
adding as many error checks as I should.
So it turns out, if you read the actual
documentation for get_string, which in
fairness we never told you about until
now, in cases of error, get_string can
return NULL. Why would it ever have an
error? Maybe the human types in such a
large paragraph of text that there's no
room in the computer's memory for
everything they've typed in. Well, you
don't want to just get back part of the
text and not know that something went
wrong. get_string is designed to return
a special sentinel value, NULL in all
caps. That just means "I can't oblige. I
can't return you a correct value. Here's
an error instead." So what I should
always have been doing since week one,
but we consciously don't because it adds
just too much overhead, is check if s
equals equals NULL. If so, we should abort
the program altogether and, for instance,
return one as we've done before to
just signify an error, like we cannot
proceed because get_string did not work.
That is true of malloc too. Technically,
we should say if the address in t also
equals NULL, that is 0x0,
we should also return one, because
something went wrong.
So, let's do this one more time. It turns
out that even toupper is taking for
granted the fact that the human typed
in anything at all. What if the human
just hits Enter? Well, that's a valid
string. It's the so-called empty string,
quote unquote. But what is the length of
nothing? It's going to be zero. And
that's problematic, because if you try to
go to t at the first location, what is
actually there? Well, that's actually
the null character, which is not
something you should even try to
capitalize, it would seem. So, what we
should really do here, too, is check:
only if the strlen of s is greater
than zero should you even bother
uppercasing that first character. I
mean, one, at best it makes no sense,
because if there's no string, there's
nothing to uppercase. At worst, I could
break something by touching memory that
I should not. And if I may, there's
another issue. Now, on line 15, I'm
asking the computer for memory, and it's
going to hand me those four bytes. But
technically, I'm never giving them back.
And so, even though this program is so
short that it's going to quit pretty
soon, and it's not a big deal because the
computer will automatically reclaim that
memory, in long-running programs, like
servers or things that are running for a
long time, if you use malloc and ask for
memory but never give it back to the
computer, never free it, so to speak,
your computer might get slower and
slower and slower and slower, essentially
because it's running out of memory. Not
physically, but the computer thinks it's
using all of its memory even if it's not
actively in use. You, as the human, know
best. And so at the end of this program,
when I am completely done with t, you
should similarly call free of t, passing
in the address that you allocated
previously, so that the operating system
gets that memory back. If you don't do
that, it's what's called a memory leak.
If you've ever used a Mac program, a
Windows program, an iPhone or Android
program that somehow is just getting
slower and slower and slower and slower,
that is often a symptom of a human
having messed up and not freeing memory
that they don't actually need anymore.
Questions on NULL or any of these kinds
of checks?
No? All right. Well, as a teaser, in
just a bit, we're going to reveal when
and why things can go terribly wrong by
way of a little bit of claymation from
our friends at Stanford, but it feels
like we're long past a good snack break.
So, why don't we go ahead and have some
oranges and some fruit snacks, and we'll
see you in 10.
All right, we are back. So, with memory,
a lot of things can go wrong. And in
fact, a question came up during the
break about whether or not I should have
also called free on s, which was the
string that I actually got back from
get_string. The short answer is no. This
has been a deliberate choice over the past
several weeks, whereby CS50's
implementation of get_string automatically
frees memory that it has given to you
once it is no longer needed. So that's a
bit of magic underneath the hood. Once
you no longer use those training wheels,
though, that feature goes away. But
because I actually used malloc to get my
memory for t, I did have to free that
specific memory. So the rule of thumb,
quite simply, is if you malloc'd it,
you must free it. If get_string
malloc'd it, you do not have to free it
yourself. But of course, things can go
wrong. And thankfully, there are tools
via which we can find memory-related
errors. And one thing we're going to
show you briefly is another tool called
Valgrind, which is a nice complement to
something like debug50 and printf and
the duck for actually chasing down,
specifically in this case, memory-related
errors. So in fact, let me go over to VS
Code and open up a program I wrote in
advance, because it's just not all that
useful, but it is demonstrative of some
things that can go wrong. And in
memory.c we have this code here. We
include stdio.h and we include
stdlib.h, the latter of which,
recall, is necessary now when you want to
use malloc and in turn free. And inside
of this main function, I'm doing a few
things. I am first allocating three
integers in kind of an interesting way,
because it turns out that malloc takes as
its argument the number of bytes that
you want to get. Now, I know on most
systems an integer is indeed four bytes.
So if I want space for three integers, I
could just do 3 * 4 = 12 and put 12
inside the parentheses here. But that's
generally frowned upon, because it would
make my code less portable to other
systems where an int might not be four
bytes. So it turns out you can use this
operator, sizeof, and actually ask the
computer how big a data type like an
int is on this specific system. For
chars, you'll always get back one. For
ints, you'll usually get back four. And
the same goes for other data types as
well. This is the more dynamic way to ask
that question if you want to get three
integers' worth of memory. What I'm then
going to do is assign, on the left-hand
side, the return value of malloc to this
variable x, just because, and x itself is
a pointer to an integer, more
specifically to this chunk of memory,
which is a sequence of three integers.
This is very arbitrary, and this is only
meant to demonstrate things you can do
incorrectly, ultimately. But this is how
I would dynamically get space for three
integers from malloc and store the
address thereof in x. So it stands to
reason that I could put my first value
at x[1] = 72, my second
value at x[2] = 73, and my third value
at x[3] = 33. Now, if some of this is
rubbing you the wrong way, it's because
this is actually riddled with mistakes
already, some of which are old to us.
What's the first thing I've done wrong?
Even if you have no idea what's going on
with line eight, what about lines 9, 10,
and 11? What did I do wrong?
Yeah.
>> Yeah, my indexing is wrong. Like, we've
known for weeks now that with arrays, or
with array syntax, you always start
counting at zero, then one, then two,
not one, two, three. So that's an issue.
And this is a new detail. But given that
I've used malloc on line 8, what other
mistake have I made in this version of
the program?
What's missing?
Free. So I didn't actually call free. So
this program has a memory leak. It's
asking for memory and never handing it
back. Now, that's pretty good. You know,
a few of us were able to just kind of
eyeball the code and debug it. But
that's not going to be true for all
people, all programs, certainly when the
programs get larger and more
complicated. So the purpose in life of a
program like Valgrind is to help
you spot these kinds of errors. So for
instance, when I run make memory to
compile this program and then do
./memory, at a glance, it
actually seems perfectly fine, if only
because I'm not seeing any errors,
even when I compile it or when I run it.
But I do claim that there are at least
two that we've seen here. It's just
that we're not getting so unlucky that
the program is actually crashing as a
result. So this is a more latent, harder-
to-detect bug. But what I'm going to do
now is this. I'm going to open up my
terminal window in full screen. I'm
going to then do valgrind space
./memory, so as to run the Valgrind
memory checker on this program. So it's
similar to debug50, but the name now is Valgrind.
This isn't a CS50 thing. This is a
common program that programmers use.
When I hit Enter, the output's going to
be atrocious, frankly. It's way
more complicated than it needs to be.
They put this number here, which means
something specific, but it's just stupid
that it's on every line of output. So
it's overwhelming at a glance. But once
you've trained your eyes to look for
useful information, there are a couple of
useful insights here. So one: "Invalid
write of size 4," which apparently is
somehow related to line 11. So let's go
there. Let me just minimize my terminal
window, look at line 11 of memory.c, and
just see which line that was. Okay,
invalid write of size 4. Well, writing
means changing a value. Reading
means accessing a value. So they're sort
of opposites. Invalid write of size
4. Well, here's why it's generally
useful to know roughly how big an int
is: four. You're trying to write
four bytes incorrectly. So why is line
11 invalid?
Just to be clear,
it's because the index is off. I'm
touching memory that I should not. If I
ask the computer for space for three
integers, each of which is four bytes,
that should give me locations 0, 1, and
2, not location 3. So you still
have to know a little something about
programming to be able to make good use
of that information, "invalid write of
size 4." But once you've sort of
trained your mind and your eye to catch
it, like, ah, now I'm an idiot, I have to
go in and fix that problem. But what else
is wrong, based on Valgrind's output here?
So this is kind of worrisome: "LEAK
SUMMARY: definitely lost: 12 bytes in 1
blocks." I don't really know what "1
blocks" means for now, but 12 bytes
should be familiar, because if you
generally remember that an int is four
bytes and you asked for three of them,
oh, there's my 12. So,
somehow I'm losing 12 bytes of memory.
Not in a literal sense, but it means by
the time the program finishes, you have
not returned, or freed, all of the memory
that you asked for. So, this line here
is your hint that you've done something
wrong with respect to 12 bytes in total.
And sometimes you'll see slightly
different output here. For instance, we
see mentioned up here "12 bytes in 1
blocks are definitely lost in loss
record 1 of 1." Very verbose. But the
juicy part is, ah, line 8 is the source of
that error specifically. So there, too,
it's a little bit of a breadcrumb
leading me to the solution for fixing
this. So if I go up here and look at line
8, okay, there's only so much that I
could have done wrong on line 8. If I've
malloc'd the memory on line 8, it sounds
like I do need to free it later on. So
let's fix both of these problems. The
first one is just the indexing issue:
change the 1, 2, 3 to 0, 1, 2. Let's then
fix the second problem by just freeing x
at the very end. And just for good
measure,
this was not caught by Valgrind, because
it doesn't always happen, but there's
one other
scenario that could go wrong, and it
relates to line eight.
What should I be doing?
>> I am doing an array, but recall that we
can use array syntax on chunks of
memory. So technically, what line 8 is
doing is this. It is allocating 12 bytes
of memory from the computer, just
to demonstrate how malloc works, and
it's storing the address of that first
byte in a variable called x. The bracket
notation is just the syntactic sugar
that allows me to change values at x's
address. I could alternatively just use
pointers and say go to x and put 72
there. Go to x + 1 and put 73 there.
Go to x + 2 and put 33 there, using
pointer arithmetic. But those are
identical, and generally, you know,
most people would just use square
bracket notation, because it's just a
little cleaner and easier to read and
write. Okay, but back to this question.
There's still a subtle bug here, based on
our example just before break. What
should you be doing anytime you call
malloc or get_string and a few other
functions, for that matter?
Did I hear the answer? Checking for...
checking for NULL, right? Because if
malloc has an error, if there's not enough
memory for whatever reason, you should
not proceed to touch that memory,
because it might be the NULL address,
that is, 0x0. So what you should really
be checking is, well, if x equals equals
NULL, there's no more work to be done
here. Let's just return one right there.
And only if we get all the way to the
bottom should we maybe return zero to
signify explicitly that there was in
fact a successful operation. All right,
with that said, let's go back to the terminal.
Remake memory. No error messages from
the compiler. ./memory. That too
seems okay, but it was fine the first
time. Let's now run Valgrind. Let me
maximize my window. Run valgrind
./memory. Crossing my fingers as
always. And now this is actually pretty
good. It's much shorter output, even
though it's just as scary at a glance,
but most of this is fluffy and not
very revealing. "HEAP SUMMARY: in use
at exit: 0 bytes in 0 blocks." So it looks
like "All heap blocks were freed -- no
leaks are possible." Heap is a word we'll
come back to, but this means there's
nothing wrong. In fact, zero errors, which
is a good thing. So in short, Valgrind is
among the most arcane programs we're
going to use. Its output was really
designed for those more comfortable, if
you will. But there are still juicy
insights there. If you just kind of look
for things that point you to this
file on this line number, odds are that
will lead you to even the most subtle of
bugs. In fact, another type of bug is
when we do indeed touch memory we
shouldn't. So, let me zoom out on
that, clear my terminal, and let me open
up another program, or maybe write this
one real fast, incorrectly. So, let me
create a program called garbage.c to
demonstrate what we've generally called
garbage values, that is, values that are
still in memory, but I didn't put them
there myself, necessarily. I'm going to
include stdio.h. I'm going to
include stdlib.h. And then I'm
going to go ahead and, actually, no need
for stdlib.h this time. Let's do int
main(void). And inside of main, let's
give myself an array of, like, way too
many exam scores or whatnot. We used to
do just a few, but let's say there are
1,024. Then let's go ahead and do for
int i equals 0, i less than
1024, i++, and in here let's go ahead and
print out, using printf, each of those
scores. Of course, I have clearly
forgotten to do something in this
program, which is what?
I haven't actually put in any scores
there for real. Like, I've asked the
computer to give me an array for 1,024
integers, but I've not used get_int or
even manually typed in any of my quiz
scores, which we did in the past. That's
because I'm intentionally trying to show
us garbage inside of the computer's
memory. What this loop is going to do on
line 8 now is literally print out the
first int, the second int, the third
int, all 1,024 ints, but all of them
should be garbage values, because I myself
haven't put anything in those addresses
yet. So, let's go ahead and make
garbage. Let's go ahead and maximize my
terminal window, just to see more on the
screen. Do ./garbage. It's going to be
super fast output, because the computer
is way faster than I am at reading 1,024
values. There is a lot of garbage output.
So when we talk about garbage values in
the abstract, like here's just some
random zeros, a 25, a 32,000, a negative
number, and so forth, that's because
those are essentially remnants in the
computer's memory of stuff that might
have happened previously, not
necessarily by me in this moment, which
is to say you just shouldn't touch that
memory at all whatsoever. So now we're
seeing garbage values for the actual
first time. Let's consider another
example of a program that does
potentially contain memory errors. And
let's look at this too. This is not
really a useful program. It's meant to
be demonstrative of some of these
concepts. So here we have a program that
takes no command-line arguments. Up here
we've got a pair of lines that declares
two pointers but doesn't yet initialize
them to any values. And that's fine. You
don't have to have an equals sign with
any variable. You just eventually should
assign it some value. But this just
tells the computer, give me a variable x
that's going to store the address of an
int. Give me another variable y that's
going to store the address of another
int. Okay, what happens next? Well, on
this line of code, in this simple
example, we're allocating enough space
for a single integer, just because it's a
stupid exercise. There's no reason to do
this other than to demonstrate how malloc
works for the moment. malloc returns the
address of that chunk of memory. So
that's what goes in x. So x is now
pointing at four bytes of space
somewhere in memory, where it can
certainly put a value. How do we do
that? Well, if you do *x and use the
dereference operator, that means go to
that chunk of memory and put the number
42 there. That's totally valid. This says
go to the address in y and put the
unlucky number 13 there. Unlucky quite
literally, because what is y pointing to
at this
moment?
It's just a garbage address. Why?
Because if you don't initialize y, who
knows what it's going to be pointing to?
Maybe it's zero, maybe it's 25, maybe
it's 32,000 or a negative number, just
like we saw in the previous example. You
have no idea what values are going to be
in x and y unless you yourself put those
values there. So, this is highlighted in
red, because bad things are going to
happen if you try to dereference an
invalid or bogus pointer. Even worse
than just touching variables that
might not have values, if you dereference
an address and try going to some random
place, the computer is generally not
going to like that. And in fact, our
friends at Stanford wonderfully brought
this particular scenario to life, whereby,
even though this example is a bit
contrived just to fit it all on the
screen at once, it is going to be the
case that bad things happen if we don't
check for these values and actually
assign valid values, in the form of, as
we'll see now, some claymation. So here I
give you Binky,
which is a bit of claymation from our
friend Nick Parlante at Stanford. If we
could dim the lights, unnecessarily
dramatically.
>> Hey Binky, wake up. It's time for
pointer fun.
>> What's that? Learn about
pointers? Oh, goody!
>> Well, to get started,
I guess we're going to need a couple
pointers. Okay. This code allocates two
pointers which can point to integers.
>> Okay. Well, I see the two pointers, but
they don't seem to be pointing to
anything.
>> That's right. Initially, pointers don't
point to anything. The things they point
to are called pointees, and setting them
up is a separate step.
>> Oh, right. Right. I knew that. The
pointees are separate. So, how do you
allocate a pointee?
>> Okay. Well, this code allocates a new
integer pointee, and this part sets X to
point to it.
>> Hey, that looks better. So, make it do
something.
>> Okay. I'll dereference the pointer X to
store the number 42 into its pointee. For
this trick, I'll need my magic wand of
dereferencing.
>> Your magic wand of
dereferencing? Uh, that's great.
>> This is what the code looks like. I'll
just set up the number, and...
>> Hey, look, there it goes. So, doing a
dereference on X follows the arrow to
access its pointee, in this case to store
42 in there. Hey, try using it to store
the number 13 through the other pointer,
Y.
>> Okay, I'll just go over here to Y and
get the number 13 set up and then take
the wand of dereferencing and just...
>> Oh, hey, that didn't work. Say, Binky,
I don't think dereferencing Y is a
good idea, 'cause, you know, setting up
the pointee is a separate step, and I
don't think we ever did it.
>> Hmm, good point.
>> Yeah, we allocated the pointer Y, but
we never set it to point to a pointee.
>> Hmm, very observant.
>> Hey, you're looking good there, Binky.
Can you fix it so that Y points to the
same pointee as X?
>> Sure, I'll use my magic
wand of pointer assignment.
>> Is that
going to be a problem like before?
>> No,
this doesn't touch the pointees. It just
changes one pointer to point to the same
thing as another.
>> Oh, I see. Now Y
points to the same place as X. So,
wait, now Y is fixed. It has a pointee.
So, you can try the wand of dereferencing
again to send the 13 over.
>> Okay, here it goes.
>> Hey, look at that.
Now, dereferencing works on Y. And
because the pointers are sharing that
one pointee, they both see the 13.
>> Yeah, sharing. Uh, whatever. So, are we going
to switch places now?
>> Oh, look, we're
out of time.
>> But I can only imagine how
long that took, Nick. But the key detail
was that bad things happened to Binky
when we did this line of code.
Dereferencing an invalid pointer that had
no true value assigned. It was just some
garbage value. Now, what's the solution?
Well, as Nick proposed, just don't do
that. Instead, at least do something
sensible like assign Y equal to X. Not
to make a copy of anything per se, but
to literally point Y at the same
location in memory as X. Then a
line like this is perfectly valid. You
can go to that address, which happens to
be where the 42 is, and that's why in
the claymation we saw that the 42
became a 13 instead. So again, at the end
of the day, this is only demonstrative of
these basic building blocks that we now
have at our disposal, but also of how easy
it is to do things incorrectly. This
is one of those "with great power comes
great responsibility" moments. C is one of the
languages that is incredibly high
performing. It's so close to the
hardware, and you have so much control
over the memory and its operation, that you
can write really good, really fast code.
That's why, even all these decades
later, it's among the most omnipresent
programming languages in the world. At
the same time, you can really screw
things up. So much of today's
software that gets hacked in some way or
crashes for some reason does so, often, because
humans have just missed some simple
mistake like this that happens to relate
to memory. In more modern languages that
we'll soon see, like Python, or, if in
high school you studied Java, you
don't have this much control over the
computer's memory. There are many more
defenses put in place to protect you and
me from ourselves, so to speak. But you
pay a price: some of those languages
tend to be slower and less
performant. Yeah?
>> What is the difference here, now that we're
playing with memory?
>> This will
become clear this week and next. In
fact, some of the examples on which
we'll end today will motivate needing to
have finer-grained control over what's
going on inside of the computer. When
you want to deal with files, for
instance, you're going to need to know a
little something about memory addresses
and where things are. Same when you want to
build structures in memory beyond the
complexity of an array. In fact, next
week we're going to start building
two-dimensional structures in the
computer's memory to represent the
equivalent of a family tree, for
instance, or trees more generally, that
can store data in a more efficient way.
Up until now, all we have is arrays. And
with arrays, you can achieve something
like binary search, but we're going to
see there are things you can't do with
arrays, especially if speed's important.
>> But I was saying, for example, if
you had asked me to do this, say,
last week, I would have just done x
equals 13 or something, just
assigning a variable.
>> Correct. So last week, if you just said
int x = 13 or int y = 42 or whatnot,
that's totally fine. And again, this program's
sole purpose in life is to demonstrate
how you can make mistakes. In and of
itself it's not useful here, but it's
representative of how we're going to
start using this syntax, not only in this
week's problem set but next week as
well.
All right. So, with that claim made that
we can do a lot of damage, let's
consider how pointers and knowledge of
memory addresses can actually solve some
useful problems. Can we get one
volunteer to come on up and help pour a
drink? Come on up. All right. What is
your name?
Come on over.
>> If you want to say a quick hello to the
group.
>> I'm Olivia.
>> Okay. And a little something about
yourself?
>> Oh, I live in Canada.
>> Okay, welcome. Well, come on over here,
Olivia. And we have two glasses.
Well, really three glasses. So, we have
these fancy Ray-Bans that have cameras
built in, whereby we can sort of capture
your point of view. If you're
comfortable, we'll put these on. There are
no lenses in them. The white light will
mean we're recording. Hopefully, a
memorable moment.
This battery too is dead. All right. We
don't have a backup for the backup, so
we're going to pretend that this part
never happened. So,
>> Olivia, we have two glasses here for
you. And I'm going to go ahead and pour
some colored liquid into both. So,
we've got some blue liquid here into
this glass. All right. So, we'll fill
this up here.
And then in this one, we're going to go
ahead and pour this orange liquid. And
at this point in the story, I'm going to
exclaim, "Oh no, I accidentally put the
wrong liquid in the wrong glass. I
got this backwards." So, what I'd like
you to do is swap the values in these
glasses so that the blue goes into that
glass and the orange goes into this
glass
>> without mixing it, or...
>> Without mixing it. So, well, you're
hesitating. Why?
>> Well, it would be hard to do...
>> If you could, talk into the mic.
>> Oh, it would be hard to do
without mixing the two, because you
don't have anywhere to put the other
one.
>> Of course. So, in the real world, this
is not really solvable unless, for
instance, we have a temporary variable,
if you will, like an empty glass in
which to do this. So, here is your third
variable if you want to go ahead now and
get the blue into that one and the
orange into that one. Yeah.
No pressure.
All right. So, we're putting one value
into the temporary variable. We're
putting the other value into the
original variable.
Okay. And now you're probably going to
take... Yep. I'm guessing the temporary
value and put it into the original variable.
And that was very well done. If
maybe we can give Olivia a round of
applause for just that. Thank you. We
have a
little parting gift for you here too. So
the goal here really is to create a
memorable moment of, like, "Oh, remember the
time Olivia tried to swap two values?" She
needed a temporary variable. That's the
takeaway. So why is that? In code, if
we wanted to apply the same principle,
we're going to need somewhere temporary
to put one of those values before we can
make this happen. The catch, though, is
that if we don't do this intelligently,
it's just not going to work in C
unless we take advantage of some of
these new capabilities. So, in fact, I'm
going to go over to VS Code here and I'm
going to open up a program called swap.c
that I wrote in advance, whose purpose in
life is simply to swap two variables'
values. So, I've got stdio.h at
the top so I can use printf. I've got
the prototype for a swap function, which
might as well be Olivia in this
case, that's going to take two inputs, A
and B, or two glasses, and swap their
values. That's ultimately its purpose. Inside
of main, though, I'm going to do this: I'm
going to set two variables, X and Y, equal
to 1 and 2 respectively. I'm then,
just as a point of clarification, going
to print out "x is such and
such, y is such and such." Then I'm going
to call the swap function, aka Olivia, to
swap the values x and y. Then I'm going
to print out "x is this and y is this,"
so that hopefully I'll see that they've
indeed been swapped. At the bottom of
this file, we have the actual swap
function. And as you might expect, it
takes two inputs, A and B, both of which
are integers, though I could have called
them anything I want. The first thing
this function does is grab an empty
glass called temp and put A, the blue
liquid, into it. Then we put into A the
value of B. So we've sort of lost the
value of A at this point, except that we
did make a copy of it in temp. And
then lastly, we put into B the temporary
variable. At the end, the temp
variable is empty, although technically
it still has a copy of the value, but
it's no longer useful because the job is
done. A has become B and B has
become A. So I dare say this is the
literal translation of what Olivia just
did, and I like the logic of it.
However, when I actually run this
program, something goes awry. So let me
go ahead and do make swap, then ./swap.
And I'll maximize my window. I should see,
hopefully, that X is 1, Y is 2, and
then X is 2, and Y is 1.
But no. Even though I literally
translated into code what Olivia did,
this didn't actually seem to work. And
why is that? Well, it turns out that
this version of the program is not
right because of issues of
scope. We've talked about scope
before, generally in the context of
where a variable lives. We've said that
a variable only exists in the most
recent curly braces that you opened up
for it. And that was true. It's just
a colloquial way of describing
what scope is. But scope comes into play
here because it turns out that A and B,
insofar as they are the arguments or
parameters for the swap function,
have a different scope than X and Y. And
that still follows the same definition:
they're inside of different curly braces
than X and Y are. So it seems that I may
very well be swapping A and B, but I'm
not having any impact on X and Y. Why
is that? Well, in C, all this time,
anytime you pass arguments to a
function, you are passing those
arguments by value, so to speak. You're
literally passing copies of the
variables to the function you are
calling. So what does this mean? Well,
more concretely, suppose this is a
photograph of a chunk of memory inside
of the computer, and we zoom in
as we've done before and we abstract
away all of the bytes from top to
bottom. What's really happening inside
of the computer's memory is that we're
using some of it for X and Y and some
other memory for A and B. But how is
that in fact happening? Well, it turns
out, to a question that came up before
the break, that memory in a computer is
actually assigned in a somewhat
deliberate fashion. Let's
think of this rectangle as representing
my computer's whole chunk of memory.
Generally, when you run a
program with ./something, or on a
Mac or PC by double-clicking, or on a
phone by single-tapping, what happens is
all of the zeros and ones that were
compiled by the company or person who
made that program are loaded into the
top of the computer's memory, so to
speak. This is just an artist's rendition.
There's no notion of top or bottom per
se, but the machine code, the
zeros and ones that compose the actual
program, is loaded into this chunk of
memory at the very edge of the
computer's memory. That's where it goes. So,
it's copied from the hard drive or
the SSD, whatever you know it as, the
persistent storage, and it's put there
in the computer's RAM, or random access
memory, which is the faster memory where
programs and files live while you are
using them. Meanwhile, if your program
or the program you're using has any
global variables, global in the sense
that they're defined outside of main and
not inside of main or inside of other
functions, they end up right below that
machine code by convention, just so
they're accessible everywhere.
Meanwhile, there's this big chunk of
memory below that called the heap. The
heap is the chunk of memory that malloc
uses to allocate memory for you. So the
first time you call malloc, it's going to
give you probably this chunk of memory.
The second time, this chunk; the third
time, this chunk, and this chunk, and so
forth, more or less back to back in memory,
but malloc is going to manage all of that
for you. You don't have to worry about
where it's coming from, but it's coming
more generally from this big heap area.
Now, the way computers
are designed, the heap
grows downward,
again, even though there's no notion of
up or down inside of the computer; it
grows in this direction. But it'd be
nice to make use of this other area of
memory, and that's what's called the
stack. The stack is the area of
memory that's used anytime you create
local variables or call functions. So
again, malloc uses memory from up here,
and functions and variables use memory
down here, just because this is what
humans in a room decided years ago is
how the computer's memory would be used.
Therefore, the stack grows upward,
much like stacking trays in a
cafeteria or the dining hall. They go
from bottom to top in this model. All
right. Well, let's consider for the
moment just how the stack is used,
because we're using a main function in
this program and a swap
function in this program. So I claim
that those functions are going to use
memory down here. Well, how are they
going to use it? And how is this in fact
bad for our current goal? Well, when you
call the main function, it uses this
chunk of memory here. Specifically, if
main has any arguments, like command-line
arguments, or if main has any local
variables, they end up down here in
memory. Meanwhile, when main calls
swap, swap gets the next available chunk
of memory above it, so to speak,
and any of its arguments or
local variables end up there. So when
swap is done executing, it's
as though that memory disappears, even
though the zeros and ones are still
there; the computer can just reuse
that same chunk of memory later. Ergo
garbage values. As functions are
called, going up and down conceptually,
that's why you're getting remnants of
previous values in the computer's
memory. But let's focus on main for a
moment in this program. Recall
that I declared two variables, X and Y, X
getting the value 1, Y getting the
value 2, per these two lines of code.
Then I called the swap function. So swap
is going to get its own chunk of memory,
more technically called a frame of
memory. And inside of that frame, it has
two arguments, A and B, and a local
variable called temp. So I'll draw them
as such. When you actually call swap,
passing in X and Y, X and Y are passed
in by value, that is to say by copy. So A
becomes a copy of X and B becomes a copy
of Y. This
prototype for swap just
makes clear that it takes two arguments,
a and b, both of which are integers, in
that same order. So x comma y
lines up with a comma b. So what happens
inside of the swap function if a is a
copy of x and b is a copy of y? Well, at
the moment they're equal to 1 and 2,
respectively. But consider this first
line of code: int temp gets a. So temp
takes on the value of a. Next line of
code: a gets b. So a gets the value of
b, which just happened.
Meanwhile, b gets the value of temp,
which is 1. Now temp still
has a copy of 1. So it's not quite
analogous to the liquid, because
that glass is clearly now empty, but temp
does contain remnants of what it once
did. But the key here is that A and B
have successfully been swapped. If I
were to print out A and B, I would see
that they've been swapped. But what has
obviously not been swapped in this
story? No one has touched X or Y. So
when swap returns, especially since I don't
even print out anything in swap, X and Y
are unchanged. A and B, the copies,
were swapped, but not the original
values. And that's the essence of the
problem with this
simple example of swapping values:
I was passing by value. But as
of today, we now have a solution to this
problem. Before today, if I
asked you to write a function that
swapped two values, you could not
physically do it in code because you had
no way of expressing the solution to
this problem. But now we have the
ability to pass by reference, that is,
use pointers and addresses more
generally to tell the function how to go
to an address and do something there,
and how to go to another address and do
something there. How do I express this
syntactically? It's going to look a
little scary at first glance, but it's
just an application of today's new
building blocks. This bad version of the
program, where a and b are both integers,
just needs to change them to be addresses of
integers. So give the function a sort of
treasure map that leads it to the actual
x and y by saying that a is now not
going to be an int per se, but the
address of an int, and b is going to be the
address of an int. And now to use those
values, you can say the following: int
temp gets whatever is at location a; go
to location a and put whatever is at
location b; go to location b and put in
the temp value. And here is a perfect
example of where this use and overuse of
the star, or asterisk, operator is just
cognitively confusing, frankly,
because we use star for multiplication.
We use it for declaring a pointer. We
use it for dereferencing a pointer.
Ideally, humans years ago would have
come up with another symbol on the US
English keyboard to represent these
different ideas. But this is where we're
at. We're using the star for different
things in different contexts. So, this
star just tells the computer that a is going
to be a pointer, an address of an int.
This one tells the computer that b is going
to be the address of an int. This star,
when there's no data type to the left of
it, means go to that address, as does
every other example thereof. So, what's
happening this time? If we actually look
at the diagram again, X and Y are still
1 and 2 respectively. Swap gets
called. It now gets the values of the
address of X and the address of Y. So
pictorially we might draw that as
follows: A is pointing to X. B is
pointing to Y. I mean, technically it's
like 0x123 and 0x-whatever, but who
cares? We're just going to abstract it
away now with actual arrows, or pointers.
The beauty of this now, then, is if we
look at the swap function, int temp gets
star a. That means start at a and go
there, sort of Chutes and Ladders style, if you're
familiar with the game, and you find the
value 1. So you put the value 1
inside of temp, which is why it's there.
Now, meanwhile, this next line of code: go
to a's address, go to b's address, and
copy the latter to the former. So this
means go to a. This means go to b, where
you find the 2. So put the 2 where a
is pointing. Lastly, go to b and put
temp there. So that's easy. Go to b and
put temp, which is why we now have the
1. And the beauty of this now is that
when swap is done executing, this
memory, this frame, sort of goes away
conceptually. Even though the zeros and
ones are still there, it's done
being used, but we have now mutated the
actual values of X and Y by giving swap
a proverbial treasure map, the
addresses of X and Y, not copies of the
values themselves.
So hopefully this is the beginning of an
answer to "why is this stuff useful?"
You can now solve a whole new class of
problem, and even more next week. Other
questions, though, on any of the syntax,
pictures, or the like?
This is good use of pointers now instead
of bad. All right. So with that new
capability,
let us consider
how things can still go wrong and why,
indeed, with this power comes that
responsibility. Well, even if
the bad version of the code is
fixable via this good version of the
code, we've still left a big glaring
problem in the diagram itself. Designing
something that grows this way toward
something that grows this way
is not going to end well. Why? Because
the more you call malloc, the more memory
that gets used here. The more functions
you call, the more memory that gets used
here. And at some point, they will
collide, because the computer only has a
finite amount of memory. So how do you
avoid this situation? Honestly, you kind of
don't. You just make sure
that you minimize how much memory you're
using by calling malloc only as much as
you need to, and not asking for a
million bytes of memory just because you
might need them. You only allocate the
memory you need, and you try not to call
functions again and again and again and
again and again and again without them
finally returning. So if you ever did
something recursive a couple weeks ago
where you accidentally called a
function that never had a base case,
never divided and conquered and actually
shrunk the problem, you could overflow
the stack, running into the heap, by just
using too many frames of memory. That's
just a mistake on the part of
the programmer. So if you've
ever heard these phrases, which some
of you might have, heap overflow or stack
overflow, there's a very popular website
called Stack Overflow. And this is the
etymology thereof. Stack overflow
refers to this representative big
problem with computers' memory if
you're not mindful of how you're using
it. And this is just
the way it is. If you've got a finite
amount of anything, that resource can
eventually run out, at which point the
program will crash or something else
might very well go wrong. In fact, related
to this are
what are called buffer overflows. A
buffer is generally just a
chunk of memory, like an array, and it
overflows when you put in
too many values, like allocating a
small array and trying to put too many
numbers therein. You
can see this very
simply if we take off the last of our
training wheels. So for instance, these
are the functions in the CS50 library:
get_int, get_string, and so forth.
It's harder to take off these
training wheels because C does not
fundamentally make it that easy to
manage memory yourself. So for instance,
let's focus for just a moment on
get_int. I'm going to go over to VS Code
here in just a second, and let's go ahead
and create a very simple program
called get.c, whose purpose in life is to
just get an integer, much like CS50's own
function. So, in get.c, I'm going to
propose that we write a program that
does a little something like this:
include cs50.h,
include stdio.h, and then inside
of main, let's go ahead and declare an
int n, set it equal to get_int, and
we'll just ask the user for the value of
n. Then let's go ahead and print out n's
value verbatim with printf, passing in a format string
and then comma n. This program is simply
using the get_int function in order to
get an int and store it in n. So let's run
it: make get, ./get. Type in a number
like 50. Seems to work. And yes, I think
this program is correct, even though it
is using the CS50 training wheel of get_int.
Let's stop using get_int, though. It
turns out that you don't have to use
get_int if you instead use a function called
scanf, which scans formatted input, which
just means reads something from the
keyboard into memory. This is
essentially what get_string and get_int are
using, although that too is a bit of an
oversimplification. But let's use it here.
Now is an opportunity to get rid of the
training wheel of the CS50 library
altogether. Down here, let's do this:
instead of using get_int, let's declare a
variable n but not give it a value yet.
Let's now print out just a little prompt
to tell the human what we want:
we want them to type in a value for n. And
now let's use this new function called
scanf and say, scan from the user's
keyboard an integer, represented by
percent i, our old friend the format
code, and please put the integer that
the human types in
in the variable n. This is slightly
buggy, though, because if I want a
function like scanf to be able to change
the value of a variable, just like the
swap function, I can't just pass in n. I
need to pass in the address of n here.
In fact, let's take a moment now to go
into the swap function, which we knew to
be buggy before, and actually update it
to match what we saw on the slides. I
claim that the problem is that we're
passing x and y, 1 and
2, into the swap function, and therefore
we're passing in copies. But what if we
change the swap function to take, indeed,
the address of an int and the address of
an int? Let me change my prototype
accordingly, because that too must be
changed. Then, when I change this
function to take in those pointers, I
need to change my code to dereference
them. But there's one last thing I need
to do. On this line, I'm still calling swap
and passing in x and y, which is literally
the values of x and y. If I want to pass in
the address of x and the address of y,
what other operator do I now need?
The ampersand: &x and &y, to
pass in, sort of, the treasure map, the
pointers to those two variables'
locations. So if I open up my terminal
window now, do make swap on this version,
./swap, and cross my fingers, this new
and improved version of swap, as claimed,
does actually swap the values, the key
being that swap now has access not to x and y
per se but to the addresses of x and y.
So if we now close out swap and go back
to get.c, here is the same principle
applied to scanf. scanf exists; it
comes with C. Its purpose in life here is to
scan an integer from the keyboard and
put it somewhere you want. You can't
just give it the variable name, because
it's going to get a copy of whatever
garbage value is in there. You have to
say, put this answer at
the address of n itself. So lastly, after
this, let me go ahead and print out "n:"
and then percent i again as a
format code, backslash n, comma n. This line is
just my prompt, because I just want the
human to know what they're being asked
for. This line is printing out "n:"
and then the actual value. So the only
interesting part here is that I'm
declaring a variable called n, but I'm
not giving it a value myself; I'm
using scanf instead of get_int to scan,
so to speak, an integer from the keyboard
and put it at the address of n, so that
scanf has access to that variable. So if I
now do make get, without any CS50
library, then ./get,
and type in the number 50, I indeed
see the number spit back at me. And just
to be clear, printf uses these format
codes of percent i and so forth, and scanf
uses essentially the same format codes.
So that's why I'm using percent i in
both places. Both functions, per their
documentation, are designed to do just
that. So this is great. We've gotten rid
of get_int. The catch is that getting rid of
get_string is much, much harder. Why?
Well, let's try another example. Let's
go ahead and try to get a string from
the user instead of just an int. So
we'll call it string s. But wait a
minute. The CS50 library is not included, so
we need to use the actual thing that
this is: char star s means give me a
variable that's going to store a string,
that is, the address of a char.
Let's go ahead and print out a prompt
for s, just for
clarity. Now let's use scanf and scan a
string with percent s and put it at
location s. Then let's go ahead and
print out the value
of s, passing in s. Now
there's something a little bit
different here. Notice that I've
deliberately not used an ampersand before
this s. Why, even though I did before the
n? Yeah.
>> Yeah. So I want to pass in the address
of the string, which is, if I may,
already s. s is by definition the
address of some string; that is what a
char star is. Or rather, it's the address
of a character, but we know already that
if you lead a function to the first character,
it can find the end of the string
thanks to the null character, except that
that's not going to be wholly true here.
But I don't want to do ampersand here
because, if s is an address, doing
ampersand s would be the address of an
address, which is actually a thing called
a pointer to a pointer, but none of that
today. It's going to be correct as
written here. n was an integer, so I
needed the address of it. s is already a
pointer by definition. It's a char star,
so I don't use the ampersand here. But
the problem is this. If I now do make get,
./get, and type in a word
like, how about, hi,
okay, it did work. Let me try something
even bigger, like hi, but let's just hold
this key down a lot. Let's do, how about
this, a really long string. Oh, come on.
Let's type in a really long string,
like hi...
And it's always a gamble to see if I've
done this long enough, but okay, it
didn't break. You'd like to think
that this is correct, but let's go ahead
and do this: valgrind ./get,
Enter. Let me maximize my screen. Oh,
and let me go ahead and type in a
value for s. While Valgrind is running,
I'm going to type in hi, exclamation
point. And now,
let's actually scroll up to
the top of this. A lot
of errors seem to have happened here.
"Use of uninitialised value of size
8." "Use of uninitialised value of
size 8." A lot of stuff is going
wrong here, apparently on, it looks like,
maybe line four, which is quite early in
the program. And in fact, well, actually
that's not it. There are
multiple lines of code here we're having
issues with. But why? Well, let's focus
on the code here alone for a moment.
Line five is giving me what? A variable
called s that's the address of a char.
But what is s right now? What value
is in there?
>> It's a garbage value because there's no
equal sign involved. I'm just saying,
give me space: give me eight bytes,
64 bits, to store the address of a
character. But if I don't use the equal
sign and actually put anything there, it
is in fact just some garbage value. The
printf is uninteresting; it's just
printing out "s:". scanf, though, is saying,
go to this address and store the
characters that the human typed in. But
that means following the wiggly
line that we drew on the screen before,
because we have no idea where s is
pointing. It might be there, there,
there, or there. You're putting the string
at a bogus location in memory. You
haven't actually allocated memory. So
when you then try to print it, you're
again just trusting that you're going to
memory that you control. So what
is the solution here? Well, there are a
few different ways we could solve this.
We could do something like this.
Actually allocate space for like four
bytes so that the human can safely type
in uh so the human can safely type in
high exclamation point with room for the
null character. We could change S to
actually be an array of size four
because we can treat arrays as though
they're addresses and addresses as
though they're arrays. It turns out that
syntactic sugar really goes in both
directions. This too would solve that
problem. Or better still, we wouldn't
use scanf at all because how do I know
how many characters the human's going to
type in? Like this was a question too
that came up during break. Well, high
will fit in four bytes with the null
character. By will not. So maybe I need
five. Well, what if they type in a
longer word? Six. Well, maybe the longer
words, seven. Well, maybe a hundred or
maybe a thousand or 10,000 or 100,000 or
a million. Like, at some point, you've
got to draw a line in the sand and say
you can't type in something longer than
this. And you see this in applications
all the time. Like on the web, you can
only type in so many characters
sometimes into forms. And that's for
various reasons. Among them is this. Get
string though will handle almost an
infinite number of characters because
the way we implemented get string is to
take baby steps through the input. When
you type in a word on the keyboard or
even a paragraph on the keyboard, we get
strings implementers call maloc
essentially again and again and again
and again just asking for one more bite
if we need it, one more bite if we need
it, one more bite so that you don't have
to worry about doing that. The problem
is if you were to write code yourself
without the CS50 library or someone
else's equivalent library, you have to
decide like how many bytes do you want
to allow and you have to trust that the
human is not going to mess around and
type in more values than you actually
expect. So what's happening with all of
these examples thus far is this. If you
think of your memory as a kind of
minefield of garbage values, that wasn't a
problem when n ended up with a
value of 50, because we told scanf to go
to n's address and put the number 50
there, and it fits. That's fine because
an int is always four bytes in this
case. But who knows how many times the human
is going to hit the keyboard when typing
in a string? Could be three or four or a
million or anything else. So when we
declare s here to be a pointer, it takes
up eight bytes, per Oscar the Grouch
here, whereby those are
eight garbage values that collectively
represent that address at the moment,
because we've not assigned it any
other value. So if we tell scanf to
go to this address and store "hi!" or
anything else there, who knows where
it's going to end up in memory? Hence the
squiggly line again, and the program will
quite often crash. It didn't crash this time
because I didn't type in a long enough
string, but it would eventually, if I
tried hard enough, crash, because you're
touching memory that you yourself did
not allocate, as an array, via malloc, or via
some other mechanism. So what is the
solution? Honestly, don't use C for
user input like this unless you're
prepared to implement that complexity
yourself. Use the CS50 library or some
other library. This too is why in two
weeks we're going to switch to Python,
because Python, as do many other
modern languages, makes life so much easier
when it comes to basic things like
getting user input. But those languages
just have code that other humans have
written to solve these problems for you.
So these problems still exist, but they'll be
abstracted away for you. All right,
let's now tie this together with where
we began, which was to convey ultimately
that we want the ability now
to actually access files. And so we
introduce now a topic called file I/O, I/O
for input and output. A file is just a
bunch of bytes that are stored on disk,
where disk might mean a hard drive, the
thing with a spinning platter
with lots of zeros and ones on it, or an
SSD, a solid-state drive, which has no
moving parts and is nowadays generally
where our data is stored long term.
Whereas RAM, random-access memory, the
yellow pictures we've been
drawing, is volatile. That is to say,
when you lose power, or the battery dies,
you lose everything in RAM. A hard
drive or a solid-state drive is
persistent storage, or nonvolatile
storage, which means when the power goes
out, thankfully, you don't lose all of
your documents and essays and so forth,
whether they're on your Mac or PC or
somewhere in the cloud. But we haven't
yet seen any code via which you
yourselves can create files. With
literally every program we've written,
even the phone book example last time,
the names and numbers I typed in
got deleted as soon as the program quit.
With file I/O, though, we
have the ability now to start creating,
saving, editing, and deleting files, much
like you would from the File menu of
Google Docs, Microsoft Word, or the
like. Here are just some of the
functions that come with the programming
language C that allow you to open files,
aka fopen; close files, aka fclose; print to
a file, fprintf; scan from a file, fscanf;
read a file, fread; write to a file, fwrite;
lots of different
functions, some of which we'll explore
this coming week. But why don't we first
use them to solve a problem here in VS
Code. So let me go ahead and close
get.c. Let's go ahead and open up a new
program called phonebook.c, but
implement a persistent version of it,
ultimately, that doesn't just get deleted
from memory when the program quits.
Only because it will
make life easier, let's include the CS50
library still for this. Let's include
stdio.h for this. And let's
include string.h for this. Then, inside
of main, with no command-line arguments,
let's go ahead and open a file called
phonebook.csv.
CSV stands for comma-separated values.
Many of you have probably used them in
the real world. They're like very
lightweight spreadsheets where things
are effectively stored in rows and
columns, where the columns are
represented by just commas between
values. And we'll see this in just a
moment. How do you open a new file
called phonebook.csv?
Well, I'm going to do FILE *file
equals fopen of "phonebook.csv",
and then I'm going to do quote unquote w
for write. So what's going on here? fopen
is opening a file called phonebook.csv,
whether or not it exists yet,
and it's opening it in such a way that I
will be allowed to write to it. Hence
the quote unquote w, which per the
documentation means I can write to
this file and not just read it. The
return value is going to be stored in a
variable called file, all lowercase by
convention, but its type is technically
a struct called FILE in all caps. It's a
little weird. It's among the few things
that are fully capitalized in C. It
doesn't mean it's a constant or anything
like that. It's just how someone
implemented it years ago. This is giving
me a pointer to, essentially, the contents
of that file. That's a bit of a white
lie. Technically, it's giving you a pointer to
a chunk of memory that represents that
file, but for all intents and purposes,
it's a pointer to the file for now. Now,
let's go ahead and ask the user for a
name and number to add to this phone
book. Let's do char *name equals get_string,
prompting the human with quote unquote Name.
And char *number equals get_string,
prompting them with quote unquote Number.
I could be using the string
data type, but I'm trying to at least
remove what training wheels we don't
technically need anymore. And now that
we've got a name and number in
variables, let's print them to the file.
That is, let's save them to the file.
Instead of printf, we're going to use
fprintf, and we're going to specify what
file we want to print to, in case we have
multiple ones open. What do I want to
print? A string, then a comma, then a string,
followed by a newline,
ergo comma-separated values, one row per
line. Then I'm going to pass in the values
name and number, respectively. And now
I'm going to go ahead and
call fclose to close that file so that
it's effectively saved. All right. So
let me go ahead and demonstrate first
that phonebook.csv
doesn't really have anything in it. It's empty
initially. Let me go ahead and scooch it
over to the right here so we can see
both at the same time. I'm now going to
do make phonebook. Enter. So far so
good. ./phonebook, and let me go
ahead and type in, for instance,
my name and 617-495-1000,
and watch the top right of your
screen as the program fprintfs to the file
and fcloses it. All good. All
right, let's run it again because, maybe
like the iOS app or the Android app, I'm
adding new friends to my phone book
here. So I'm going to do ./phonebook,
and I'm going to go ahead and... uh-oh, the top
right just went blank. Well, let's
try this. Kelly, 617-495-1000.
Enter. Okay, she's there. Let me run it
again. ./phonebook, and she's gone. Well, what's
going on here?
It's not persisting, at least as long as
I would like. It seems to be the case
that writing to a file means
literally rewriting the file. So if you
use w, you're going to write to the
file, but literally starting at the
first byte, erasing whatever was there. If you want to be smart
about it and append to the file, well,
per the documentation for fopen, you
instead use quote unquote a for append
instead of quote unquote w for write.
This is a convention in other languages,
too. All right, let's start this over.
Let me go ahead and recompile this
program: make phonebook. Now let me do
./phonebook. I'll type in my name again
first, 617-495-1000.
Enter. So far so good. ./phonebook again. So far
so good. Kelly, 617-495-1000.
Enter. And now we're on our way. In
fact, I can close this file. I can close
this file. I can then open up
phonebook.csv.
And indeed, it has persisted. And in
fact, if I downloaded this file onto my
Mac or my PC, I could then right-click it
or double-click on it and probably open
it in Microsoft Excel or Apple Numbers.
I could import it into Google Sheets or
any number of other spreadsheet tools,
because now I am persisting and writing
files of my own.
Questions on any of the techniques we
just tried out here?
If we really want to be nitpicky,
technically I should fix one bug, or
missed opportunity. If I open up
phonebook.c, I'm going to propose that,
as with any use of pointers and
addresses more generally, here too
something could go wrong. Maybe I'm
just out of space, and so fopen can't
physically open the file for me. So here
too, I should check if file equals
equals NULL and, if so, return 1, and
then maybe at the very bottom here I
return 0 to make clear that if I
get this far, all is well. So in short,
anytime you are dealing now with
pointers, you should be checking the
return values to see if all in fact went
well. Yeah?
>> Yes, everything we are using is part of
stdio.h, which is wonderfully
useful now because it has not just printf
but fprintf and so forth as well. Good
question. Yeah?
>> Yes. So, how are pointers used in
this code? The short answer is you have
to use pointers because this is how C
designed files to work. So we couldn't
really introduce you all to files, file
I/O, in week one or two or three, because
we hadn't yet covered pointers. We'd have had to introduce
this stupid little star character to you, and
you'd be like, "What does this mean?
It's not multiplication." Because the
way file I/O works is that when you open
a file, you are essentially handed the
address of that file in memory. That's
an oversimplification. You're
technically handed the address of a data
structure in memory that references the
file actually on disk. But for all
intents and purposes, as I said, this
gives you a pointer to the contents of
the file. And if you want to write to
the file, you then need to use fprintf
in this case and tell it what file to
write to, so that it can go there and then
store something like this string with
these values plugged in. So in short, in
C, without pointers, you just can't do
file I/O unless it's abstracted away for
you by some library. Good question.
Other questions on file I/O?
All right. Well, let me do one other
example here that's a little reminiscent
of things we see all the time on our
phones and laptops and desktops, like
these progress bars for video
players. And you're all probably
generally familiar with the term
buffering, if only because YouTube and
other apps, when they are slow or you
have a slow internet connection,
might say "buffering..." Well,
what does that mean? A buffer is
just a chunk of memory. More
specifically, it's often an array of
finite size that stores bytes
of stuff. In the context of a
video player, for instance, this red
line here represents how far
through the video you are, and the buffer
is an array that stores the next
few bytes of the video. And ideally, if
you have a fast enough connection, when
you hit play, those bytes keep getting
downloaded and added to the buffer. And
hopefully, you don't finish watching the
bytes that have been downloaded before
more bytes have been downloaded. So a
buffer is just a chunk of memory, or more
specifically an array in a language like
C. Well, just to demonstrate how else
you can do things with file I/O, let me
propose that we write a simple little
program that is our own implementation
of cp, the copy program that
we've used a few times already, which
allows you in your terminal window to
copy one file to another, likening it to
this idea of a progress bar, where byte
by byte you want to do something,
namely, in this case, copy it instead of
watching it. So let me go into VS Code and
code up a program called cp.c. And in
this program, I'm going to go ahead and
include stdio.h at the top. I'm
going to then give myself a main
function that this time does, finally,
take command-line arguments, via int
argc and our old friend string argv[],
which today we can reveal to be
just an array of char stars. In fact, this is how
we could now technically write the
declaration for main, char *argv[], because string no
longer exists without the CS50 library,
per se. So that's really what's been
going on this whole time. Now let me go
ahead and do this. I want to be able to
write a program that actually takes
two command-line arguments: the name of the
file to copy and the name of the new
file to create from it. So let's go
ahead and create a FILE star using the same
syntax as before, called src for short,
source, as is a convention. And let's
open a file using
the file name argv[1], so
the first word the human types, and let's
go ahead and open it in read mode, "r",
because I want to read the source and
write to the destination. My next file,
FILE *dst, destination for short, will
be fopen of argv[2],
quote unquote w for write. Now why one and two
and not zero and one? Because argv[0] is the name
of the program, which is not interesting.
One and two will contain the next two
words that the human types. Now let me
propose that I want to copy this file
from source to destination byte by byte,
similar in spirit to a buffer like this,
where you're just grabbing from the
internet one byte of the video at a time
so as to watch it. In this case I want
to copy it. So how can I do this? Well,
we don't have a data type per se for
representing a byte, eight bits. However,
a common convention is to use
our new friend typedef and simply
declare BYTE to be something specific.
So let me
declare a type called BYTE. And what
is a byte going to be? Well, ideally it
is just a char, because a char, we know, is
one byte or eight bits. But recall that
chars can be treated as integers, and
integers, of course, can be positive and
negative. So even though this is a
little esoteric, technically I want to
define a BYTE to be what we'll call an
unsigned char, where unsigned is probably a
keyword you haven't yet seen. But it
just tells the compiler that this char,
that is, this sequence of eight bits,
cannot be interpreted as a negative
number, because I am not doing any
math with it. These are just raw bytes, or
eight bits. So now down here I can give
myself a BYTE, and I'll call it b for
short. And now I'm going to write a loop,
similar in spirit to what YouTube and
other players are probably doing, which
just iterates over a file byte by byte,
making, in our case, a copy thereof. So
while I am reading from the source into
this byte, one byte at a
time,
go ahead and check that I've read at
least one. So while the return value of
a new function called fread is not equal
to zero, go ahead and
call fwrite, another new function, going to
the address of the byte, grabbing the
size of it, which happens to be one, but
I'll use sizeof for consistency, grabbing
one such byte, and writing it to the
destination. This is a huge mouthful,
admittedly. The last thing
I need to do is close the destination, so
as to save it, and close the original file,
the source. But this huge mouthful,
which you'll get more familiar with in the
next problem set, is essentially saying,
on line 12, while I can read one byte at
a time, write, on line 14, that byte to
the file, implementing essentially this
idea of the red progress bar going byte
to byte to byte, reading one byte at a
time from one file, the source, and
writing to the other, the destination.
And here too, to your question earlier,
like, why pointers? This is the way
file I/O is done. You have to be able to
express "go to this address, go to this
file" if you want to get data from it or
to it. And a minor refinement, too:
technically, when you open files, if
you know they're binary files, that is,
zeros and ones and not ASCII or Unicode
text files, you can tell fopen to
read and write in binary mode, with "rb" and "wb", so
there's no mistaking the bits for
something other than raw data, an image
or otherwise. All right. So if I go
ahead now and do make cp, it so far
compiles. Let's try this out. So here
again is phonebook.csv.
Whoops. Here, that's phonebook.c. Here
again is phonebook.csv with two of us,
David and Kelly. Let's try to make a
copy of this file as follows: ./cp. So
this is my version of the copy program,
not the one that comes with the system.
Let's copy phonebook.csv
into copy.csv.
Enter. Let's now open the copy of
the CSV. Enter. And voila. Thank God,
it actually worked. I have made a
byte-for-byte copy of this file using
syntax that was not available to us
until today. So who cares? And what's
the motivation? Well, it's a lot more
fun to play not just with text files and
these tiny little examples, but
with real-world examples.
And in the next problem set, among the
things you'll do is experiment with BMP
files, bitmap files, which
essentially just means a grid of pixels,
top to bottom, left to right, much like
the cat that our volunteers
created for us at the start of class. With a
bitmap file, you'll store in files
literal sequences of pixels, or dots,
each of which is going to be represented
with a specific color: a red value, a
green value, and a blue value. And among
the things you'll be able to do, given
such beautiful photos as this one of
Weeks Bridge down by the Charles River,
is actually make your own Instagram-like
filters to apply to photos like
this, understanding now, as you do or soon
will, how to iterate
over the file, top to bottom, left to
right, over each of the bytes therein and
somehow mutate the bytes to look a
little bit different. So if this is the
original photo, you might be able to
make it all grayscale by setting each pixel's
R, G, and B to the same value, such as
their average, so that all that's left are
black, white, and gray
tones. You might take that same photo as
input and give it more of a sepia tone,
like an old-school photograph, instead.
You might actually reflect it, that is,
actually put these bytes over here and
these bytes over here, so as to create
the mirror image by reflecting
it over the vertical axis here. Or
you might even blur the image like this.
This is kind of a common feature in a
lot of photo-editing programs, to either
blur or deblur. Well, you can sort of do
a little bit of math and make every
pixel a little fuzzier, kind of
clouding what the human is actually
seeing. Or, if feeling more comfortable, you
can actually write code, now that you
know how to manipulate files and
addresses thereof, that does edge
detection and finds the salient
characteristics of something like the
bridge to distinguish it from the sky,
and actually find edges like
these. So those are just some of the
problems that you're going to solve in
the coming week's problem set,
ultimately manipulating files like these
as well as JPEGs. And the last thing we
thought we'd end on is a sort of
computer science joke, which, for better
or for worse, you're now more
and more able to interpret. So I'll
leave you dramatically with this here
famous joke.
Oh, that's more laughter than usual. All
right, that's it for week four. We will
see you next time.
All right, this is CS50, and this is week
five already, wherein we will focus
today on data structures, which is a
topic we've touched on a little bit in
simple form, but today we'll dive
all the more deeply. And, for better or
for worse, this is our last week on C.
Next week, of course, we transition to
Python, which is a so-called higher-level
programming language, which is really,
frankly, just going to make our lives a
lot easier. We're going to be able to
solve a lot of the same problems so
much more quickly as humans, but the code won't
necessarily, as we'll see, run as fast
on the computer as it might have
if we were still using a lower-level
language like C. So indeed, thematic over
this week and next is going to be a
theme we've seen before: trade-offs.
But before we get there, why don't we
focus on a couple of data structures
that you might encounter in the real
world, namely stacks and queues. Let's
learn some facts about both of these, if
we could dim the lights dramatically.
Once upon a time, there was a guy named
Jack. When it came to making friends,
Jack did not have the knack. So, Jack
went to talk to the most popular guy he
knew. He went up to Lou and asked, "What
do I do?" Lou saw that his friend was
really distressed. "Well," Lou began,
"Just look how you're dressed. Don't you
have any clothes with a different look?"
"Yes," said Jack. "I sure do. Come to my
house and I'll show them to you." So
they went off to Jack's and Jack showed
Lou the box where he kept all his shirts
and his pants and his socks. Lou said,
"I see you have all your clothes in a
pile. Why don't you wear some others
once in a while?" Jack said, "Well, when
I remove clothes and socks, I wash them
and put them away in the box. Then comes
the next morning and up I hop. I go to
the box and get my clothes off the top."
Lou quickly realized the problem with
Jack. He kept clothes, CDs, and books in
a stack. When he reached for something
to read or to wear, he chose the top
book or underwear. Then when he was
done, he would put it right back. Back
it would go on top of the stack. "I know
the solution," said a triumphant Lou. "You
need to learn to start using a queue."
Lou took Jack's clothes and hung them in
a closet. And when he had emptied the
box, he just tossed it. Then he said,
"Now Jack, at the end of the day, put
your clothes on the left when you put
them away. Then tomorrow morning when
you see the sunshine, get your clothes
from the right, from the end of the
line. Don't you see?" said Lou. "It will
be so nice. You'll wear everything once
before you wear something twice." And
with everything in queues in his closet
and shelf, Jack started to feel quite
sure of himself, all thanks to Lou and
his wonderful queue.
All right. Our thanks to Professor
Shannon Duvall at Elon University, who
kindly put together that animation. And
it's meant to paint a picture of a
couple of things that we've all
encountered in the real world. But more
technically, what we just saw were what
are known as abstract data types, whereby
they're data structures in some sense,
but it's really about the design
thereof: what characteristics or
features or functionality these
structures offer, irrespective of how
they are implemented in terms of
lower-level implementation details, which is
to say you can implement, as we'll see,
queues and stacks in any number of ways,
which are going to have real-world
implications for how you can actually
use them and what kinds of problems you
can solve with them. So let's consider,
for instance, queues in the first place.
A queue is something you sort of experience
all the time. Anytime you go to a store
or to some event for which you
have to line up in a so-called queue,
you'd ideally like there to be some
fairness property about that queue, such
that if you got in line first, you get
into the store first, you get to check
out first, or some other such goal.
Meanwhile, the person who got there last
is at the end of the line and
stays at the end of the line and
therefore gets served or enters at
the end. So queues have what a computer
scientist would call a FIFO property:
first in, first out. That is, if you're
the first person in line, you're the
first person to get out of line. And for
many problems, that is a good solution,
certainly if you're concerned with
fairness. But more technically, a queue
has two operations: enqueue,
which is a fancy way of saying getting
in line, and dequeue, a fancy way of saying
getting out of the line from the front
of it. But those two operations, if you
think about them in code, could be
implemented with different actual
details. And by that I mean this here is
one way that we could go about
implementing in C code a queue for a
bunch of people, or persons, who want to
line up for something. For instance,
we'll decree that this queue can hold no
more than 50 people; that's the
physical capacity. And then we define a
structure, which we've done a couple of
times in the past, whereby this structure
has an array of persons that
we'll call people, and that will be as
big as the capacity. So this is an
array of size 50 for 50 such persons.
And then we're going to propose that we
also keep track, in this implementation
of a queue, of the current size of the
queue. So we're going to make a
distinction between the capacity, that is,
how many total people can be there, and
the size, how many people actually
are in line at that moment in time, so
that you know which of the spots in the
array are effectively empty. And we're
going to call that whole structure a queue.
Now, the catch with this particular
implementation in code of a queue is what?
There is inherent in it a limitation,
something you just kind of have to deal
with, and I see you nodding. What's
your instinct for this?
>> For example, 50 students.
>> Okay, well, I think you hit the nail on
the head, in that it's only for 50
students or 50 people, which means if a
51st person wants to get into line, you
literally have no means of remembering
them in this data structure. So how do
you solve that? Well, we could just
recompile our code after changing the 50
to 51 or maybe 500 or 5,000. But
there's this trade-off, because you
could still be undershooting the total
number of people trying to get into,
maybe, a big concert, in the case of an
extreme. But at the same time, if you
overallocate memory, using 5,000
locations in memory, what if only a few
people show up? Now you're just wasting
memory. And certainly, at the end of the
day, you only have a finite amount of
memory in the computer. So you kind of
have to decide a priori, before
compiling your code, how big this
structure is going to be and how much space
you're going to waste. And in the end,
it's all sort of stupid. It would be
ideal if instead we could just grow the
queue as needed and shrink it,
essentially asking the operating system,
as we started doing last week, for more
memory, and then giving it back if we
don't actually need that memory, which
is to say we can't really use an array in
this static sense. And by static, I mean
we're literally deciding in advance, at
compilation time, how big this thing is
going to be. As an aside, this is also a
bit annoying for implementing a queue,
because you have to somehow keep track
of who is at the head of the queue, the
front of the queue, because as you start
plucking people off, you need to
remember who's the next person,
effectively. But there are ways in code
that we could solve this. So let's
consider an alternative to a queue, which
gives us very different properties,
namely a stack. And we saw that in the
animation, whereby Jack used a stack
to put his clothes into a box, so that
every time he got dressed, he
took the sweater from the top, from the
top, from the top, and might never wear
anything other than black as a result,
if he does a wash before he actually
reaches the blue and the red sweaters
there. So a stack, as we've just seen, has
a LIFO property to it: last in, first
out. So if I do a load of laundry and I
plop some more sweaters on this stack,
well, I'm presumably going to use the
last sweater that went in first, as
opposed to creating a mess and
trying to pull the bottommost
sweater out, which is just going to be a
little more effort than
just taking one from
the top. So sometimes last in, first
out doesn't give you the fairness
property you might want for other
problems, but it does give you an
efficiency, a convenience certainly. So
maybe that might be compelling. And
stacks are actually everywhere, too. If
you've checked your email recently, odds
are you've opened up gmail.com or
outlook.com and you've looked at your
inbox. And where does new mail by
default end up? At the top. At the top.
At the top. And I dare say all of us are
guilty of sort of neglecting emails that
fall below the fold or onto the next
page, and focusing only on the
last in and therefore replying to it
first out, which isn't great maybe for
the senders of those emails, but it's
just how those user interfaces are
quite often implemented, unless you
override those default settings. So how
might we implement a stack? Well, we
need to implement more technically two
fundamental operations. The analoges of
NQ and DQ in the world of stacks are
called push, which means push something
onto the top of the stack, and pop,
which means remove something from the
top of the stack also. And the the team
in the cafeterias and dining halls on
campus do this all day long. Any of the
cafeterias or dining halls that have
stacks of trays, of course, you put the
first tray at the bottom and then the
next tray and the next tray and the next
tray. And which tray do all of you pick
up? Well, presumably the one on the very
top because it's even harder to grab the
bottommost tray than it would be for
something like a sweater. As a result,
there's maybe undesirable properties
like maybe no one ever gets to the nasty
tray at the very bottom of the stack
because we're constantly replenishing
the top ones. But thanks to gravity,
like that just happens to be the most
appropriate data structure in the real
world for distributing things like trays
in a cafeteria. So, how might we
implement that idea in code? Well, funny
enough, we can pretty much use the exact
same structure. We could just rename
queue to stack, because at the end of
the day we need to keep track of some
number of people. Maybe people is a
weird sort of analog here, but we kept
everything else the same, so why not
that too? The size is also something we
still need to remember. And it turns out
it's a little easier to implement a
stack in this way, because you can
always remove from the end of the
array, and the first thing that went
into the stack, the first in, can always
stay at location zero, for instance. So
ultimately we could implement it in this
way, but we have the same darn limitation.
You can still only put 50 sweaters, 50
trays, or 50 people into that stack data
structure. So this is just one
implementation approach. But that
doesn't mean it's necessarily a
limitation of stacks and queues. They're
abstract in the sense that we could do
better. We could maybe start to manage
our own memory, move away from
statically defining the total size of
this array, and just start allocating and
deallocating, that is, growing and
shrinking the data structure instead,
which is to say we can make these
abstract data types much less abstract
with actual implementations. Let's
consider an abstract data type that we
saw early on but didn't necessarily give
this name. A dictionary is yet another
abstract data type that's sort of
everywhere in the world, literally so in
the world of dictionaries containing words
and their definitions. And you can think
of a dictionary, really, in the abstract,
if you were to draw it on the
chalkboard, as just a two-column
table whereby on the left is the word
and on the right is the definition. And
if it's a physical book, it's
essentially the same thing, with lots of
columns of words on the left, often
boldfaced, and then the definitions
right next to them. You can also see
this in the context of a phone
book, which is where we began the course
in week zero, where it's essentially a
dictionary of names and numbers instead
of words and definitions. And a computer
scientist would generalize the notion of
a dictionary further and just call the
thing on the left a key and the thing on
the right a value. And these things are
omnipresent in computing. You're
going to start to see them all the more
today, next week, and beyond, in that if
you just want to associate some piece of
data with another piece of data, a
so-called key-value pair, a dictionary
is going to be your go-to data type. But
even these, too, we can implement in
different ways for reasons that we've
already seen. Maybe there's only a
finite size to this dictionary if we're
using an array, and maybe we can do
better than that. And maybe a dictionary
implemented one way is going to be fast,
but implemented another way is
going to be slow. So we'll consider
these other design possibilities today
too, in the context of phone books and
other data structures as well. After
all, suppose you have an iPhone or an
Android phone and Apple or Google decided
that you can have only 50 friends because
they implemented the Contacts app with an
array. That would be an annoying
limitation. So presumably they've done
things a little more dynamically, as
we'll do today. So let's focus on the
first of the data structures we saw back
in week two. That is an array, which,
recall, was just a chunk of memory in
which you can store values back to back
to back, and that was the fundamental definition.
The values are back to back to back, or
contiguous, in memory, and as we've seen,
we generally have to decide in advance
the size of an array. So for instance, if
we want to store three values like 1, 2,
and 3, it might look pictorially like
this. Or in code, let's go ahead and
implement this same idea and take a
moment to whip up our very first program
here, and we'll call it, say, list.c. And
in this program, let's just do something
demonstrative of how you could use
arrays to store three things in memory.
It's quite simply the numbers 1, 2, 3, but
you can imagine it being three people's
names, three sweaters, three people, or
any other piece of data as well. So, I'm
going to go ahead and, at the top of
list.c, include stdio.h. I'm going to
then do int main(void), so no command
line arguments. Then, I'm going to go
ahead and give myself an array of
integers of size three called list, and
that's how we've done that from week
two onward. Then, just for the sake of
discussion, I'm going to hardcode some
representative values. So the first
value, 1, will be at location zero,
because arrays are zero-indexed. Then
I'm going to do the second value, at
location one, which will be 2. And then
the third value will be at location two,
but the value will be 3. Now, just to
prove that we've stored this correctly
in memory, let's just do a quick for
loop: for int i equals 0, i is less
than 3, i++.
And then inside of this for loop, I'm
just going to do a quick printf of
%i\n, printing out the
value of list at location i. So it's not
a useful program per se, but it gives us
an array to play with, and it prints out
what's in it. So hopefully we will
see one, two, and three on the screen.
So let me run make list, then
./list, Enter. And voila, we're on our
way. All right. But what if now we
actually want to change that design
and be like, "Oh, shoot. I now have a
fourth number that I want to store," or I
just bought a fourth sweater, or a fourth
person wants to get in line, or I want to
add a fourth friend to my contacts?
Whatever the scenario might be, it
stands to reason that ideally you would
plop that fourth value right here in
memory so that everything remains
contiguous. You're still using an array.
Your code doesn't really have to change
except for the length. For all
intents and purposes, it's the same
implementation using just a bit more
memory. But recall that when you declare
an array of a fixed size, you are only
promised that chunk of memory,
not necessarily more memory to the
right, to the left, above, or below,
conceptually, because recall that in the
context of your whole computer, you've
got this canvas of memory, all of which
represents bytes here. And there could be
a whole bunch of actual values or
garbage values in memory. So in a more
complicated program, that 1, 2, 3 sure
might end up here. But if I had also
created a string in this program,
"hello, world" might have also ended up
right next to it in memory, which means
I can't just plop the 4 here, because
then, if I'm still using that string
elsewhere in my program, it's no longer
going to say "hello, world," because
you'd be claiming the h's byte as your
own, and that byte does not in
fact belong to your array. Of course,
it looks like there's plenty of other
memory I could use here, because these
garbage values, represented by Oscar, are
not being used. They've been used in the
past, but we can treat garbage values as
memory we could certainly reuse. So,
wouldn't it be nice to maybe just plop
the 1, 2, 3, and 4 in this chunk of
memory over here? And I can totally do
that. But, of course, if I want to do
that, I've got to copy the first three
values over, then put the fourth one
there, and then presumably give back to
the operating system the memory I no
longer need. So that, in fact, when using
arrays, is a perfectly valid solution.
And I think we can go ahead and do this
in our same program. So let me go back
to VS Code here. And instead of
statically allocating memory for this
array, and by static I mean literally
hard-coding the number three here
in a way that is effectively permanent,
let me go ahead and do this
instead. At the top of my code, let me
delete the static allocation of that
array from before. And now let me
leverage my understanding, if still
preliminary, of pointers and memory
management from this past week four to
just dynamically allocate a guess at how
much memory I need initially. So I'm
going to go ahead and use malloc and
allocate space for three integers. But
integers take up a few bytes, usually
four, so just for good
measure I'm going to say times whatever
the size of an int is, and that's the
total number of bytes I want. So
presumably it's going to be 3 * 4 = 12,
but I'm generalizing it. But then recall
that malloc returns the address of that
chunk of memory, the address of the
first byte. So if I want to create an
array effectively called list, I can't
just say int list like this. But what
I could say is that, all right, now my
list variable is actually going to be
the address of an integer, an int *, and
set malloc's return value equal to that.
So in code here, what I've done is, on
the right-hand side, I'm asking the
operating system, please give me 12
contiguous bytes in memory. All of those
bytes, of course, can be numerically
addressed, like 0x123, 0x124, 0x125.
We've had that story before. malloc
by definition returns the address of the
first such byte, and it's on me to
remember that I allocated 12 if need be.
So I'm just storing the address of that
first byte in a pointer called list. But
recall from last week, there's this
functional equivalence we saw between
treating a pointer as an array and
sometimes even treating an array like a
pointer. The C language sort of lets
us do this conversion, if you will.
So what I could do here now is exactly
the same syntax as before. I could say
list bracket 0 gets 1, list bracket 1
gets 2, list bracket 2 gets 3.
And even though I have this fancy new
line inspired by week four, the syntax
thereafter can be exactly the same. Why?
Well, recall that these three lines here
using square-bracket notation are just
syntactic sugar for the stuff we learned
last week. Specifically, instead of
doing list bracket 0, I could much
more arcanely write *list, that is, go to
that address in list and put the number
1 there, please. I can write *(list + 1),
that is, go to the address list + 1 and
put the value 2 there. I could then
finally write *(list + 2), go to the
address list + 2, and put the number 3
there. But this looks ridiculous, and
even an experienced programmer
might not be inclined to do this if,
with fewer keystrokes and more
readable code, they could just do
instead what I did the first time
around, which is functionally the same,
and just treat that chunk of memory as
though it's an array, and the computer
will essentially do the requisite
pointer arithmetic to figure out where
to put 1, 2, and 3. So even
though this is still kind of fresh, hot
off the press from last week, it's
exactly what we tinkered with
last week. So suppose now that some time
passes and I realize, for the sake of the
story, that, oh shoot, I need more than
three integers. I need space for four so
as to achieve this picture in memory.
Well, I could of course just delete
all that code, change the three to a
four, redo the whole thing, recompile
the code, and rerun it. But let me propose
that we write our code in a way that
allows us to change our mind, while the
program is running, about how much memory
we actually need. Case in point: if you
meet someone new, you want to add them
to your phone. Well, you obviously don't
want to have to wait for Apple to
recompile the Contacts app and reboot your
phone just to add one more person. You
want the program just to ask the
operating system for more memory for
that new person. So in this case, let's
just pretend that some time passes and
now I want to go ahead and actually
change my mind and allocate
space for four integers instead. Well, I
could do something like this. I could
just literally say list equals malloc of
4 times sizeof(int), semicolon. I don't
need to redeclare list on line 13 because
it already exists from line five. But
this is bad. What have I done wrong
here in line 13? I've made a poor
decision. Yeah, in front.
>> You
waste all the memory that...
>> Yeah, I'm wasting all of the memory I
had from line five, because I'm
essentially forgetting where it is. If
the list pointer is literally a pointer,
like a foam finger pointing somewhere in
memory, what I'm really doing is saying
point it over here now, but I've
completely lost track of those other
three integers in memory. And that's
what we described last week as a memory
leak, which you could find with
Valgrind. And if you didn't find it and
fix it in code, eventually the computer
and the program could slow down over
time. So this is probably bad. It's not
good to just unilaterally change your
mind and say, "No, no, no, forget about
that memory. Give me a new chunk of
memory," especially if you want to copy
the old memory into the new, just like I
did a bit ago when trying to get the 1, 2,
3 into the bigger chunk of memory that
can fit 1, 2, 3, 4. So, how might I do
this? Well, a temporary variable is kind
of our go-to solution anytime we need to
remember something in addition to
something we already have in mind. So,
let me just give myself a temporary
variable, called tmp by convention for
short, and set the return value of this
malloc call to that. And then what I could
do is something like this. Much like my
print loop earlier, I could do
another for loop and say for int i
equals 0, i is less than 3, i++. And
then in this for loop, I could treat
that new chunk of memory as an array,
setting tmp[i] equal to list[i]. So
these lines here
copy the old list into the new list.
They copy those first three values. And
then what I bet I could do at the bottom
here is just manually go to the fourth
location, which, with zero indexing, is
technically bracket 3, and
set that equal to the number 4. So
these lines here copy the 1, the 2,
and the 3 using a loop. And then
line 20 here at the moment just adds the
fourth value. And again, this is a
stupid sort of way to write code, in that
if you want to put the 4 there, you
should have just done it earlier. I'm
just pretending that some time has
indeed passed in the program, and I've
changed my mind along the way and want
to let the user add some value to
memory. Okay, but before we proceed
further, I dare say that there are some
other mistakes we should clean up. One
of the lessons I preached last week was
that anytime you use malloc, there are
things you should do or check for.
You should always what? You should
always free. So here I'm clearly not
freeing any memory, so I should
definitely do that. And there was one
other rule of thumb with memory. What
should you always do when using malloc?
Yeah.
>> Check to see if NULL came back, which
just means something is wrong, like it's
out of memory or something else went
wrong. And if you don't do that, your
program may very well crash with one of
those segmentation faults that we saw
briefly in the past. So it makes the
code a lot more bloated, but it is good
practice. So let's just check if the
list pointer I get back contains NULL.
If so, there's no point continuing on.
Let's just go ahead and immediately
return 1, because something has indeed
gone wrong. And then down here, under
malloc again, let's do the same. If the
temporary pointer also contains NULL,
let's go ahead and similarly return 1,
or any other nonzero value. But here's a
subtlety, and let me combine your two
ideas. If I immediately return 1 on
line 20 after the second malloc call
fails, what should I still go back and
do first?
Yeah. Yeah. You want to elaborate on
your first instinct?
>> Yeah. We still want to free the first
chunk of memory, because if we execute
line five and all is well, which means
that lines 6, 7, 8, and 9 don't apply,
it's not in fact NULL; we got back
a legitimate value. That means we have a
chunk of memory given to us for three
integers, which means it still exists
down here at lines 19 and 20. So if I'm
ready now to abort this program and
return 1 to signify an error, I first
want to free that original list and say
to the operating system, here's your
memory back. Now, as an aside, strictly
speaking, this is not necessary, because
the moment the program itself quits, the
memory is going to be given back to the
operating system. So when
programs quit, the memory leaks sort of
go away, but your code is still buggy.
And generally we're running software
that doesn't run for a split second but
for minutes, hours, or days continually,
in which case it's best practice to
squash these memory-related bugs now:
check for NULL and free any memory, so
that you never encounter these kinds
of leaks. All right, so let's forge
ahead a little bit more, and let me
propose that after we have done the
copy, we now want to similarly free the
original list. And after freeing the
original list, we want to remember
that the new list is effectively
the chunk we allocated the second time
around. So even though this program is
getting a little long, notice that what
I've just done is I've said, okay, store
in the list variable the address of this
new chunk of memory, so that list now,
with the foam finger, is effectively
pointing here instead of up here. But
before that, I made sure to free what my
finger was pointing at originally, via
the list pointer. All right. Lastly,
let's just scroll down to the bottom of
the code here. I can manually change the
3 to a 4 just to demonstrate that
I've stored all four values in here. And
then at the very end of the program, I
think I have to free the list again,
because now list, the foam finger, is
pointing to the bigger chunk of
memory, the 1, 2, 3, 4. And then I can go
ahead and return 0 at the very end,
because all is hopefully well at this
point. Let me go ahead and open my
terminal window again and make this
version of list. I made a lot of
mistakes here, it seems. Let's scroll up
to the very first error: call to
undeclared library function malloc, dot,
dot, dot. What have I apparently done
wrong or forgotten? What have I done
wrong? Yeah.
In back. Yep. Yeah. So stdlib.h
is where malloc is actually declared.
So let's just add that quickly. Let's go
ahead and include stdlib.h in
addition to stdio.h. Let me clear
my terminal window, rerun make list,
Enter. Now we're good. ./list, and phew,
we see 1, 2, 3, 4. Okay. So at this point
in the story, all we've done is write a
dopey little program that allocates
memory for three integers, 1, 2, and
3, then changes its mind and
allocates more memory for four integers,
freeing the original chunk of memory
after copying the first three integers
into the new memory and adding that
fourth value. But this is kind of a lot
of hoops to jump through, so let me
propose one refinement here. So if, back
in VS Code, we go back into list.c here.
It turns out that at least this loop
isn't strictly necessary, not to mention
the fact that we already have another
loop for just printing the list. If I
want to more cleverly reallocate memory,
it turns out that there's another
function that we didn't talk about last
week, but that is also in stdlib.h,
called realloc, which, as the name kind
of suggests, reallocates memory, but a
little smarter, in that it will try to
grow your existing chunk of memory if it
can, which is going to be super
efficient because then you can just plop
the 4 at the very end. Or, if there
just isn't room there, because maybe
someone else put "hello, world" right
there in memory elsewhere in your
program, it's going to do all of the
copying for you. So what you get back
ultimately is a pointer to the new chunk
of memory, containing all of the
original data as well. However, we're
still going to have to check for NULL.
We're still going to want to free the
original list if something goes wrong
and then return 1. We're still going to
want to add the fourth value, because
realloc has no idea what more we want to
put in the list. But I can in fact delete
my other for loop, whose purpose in life
was just to copy all of those integers
from old into new. All right, that was a
lot. Let me pause for any questions.
>> How does realloc know that it should
reallocate the memory in list? Do you
have to tell it? Like, if you've allocated
a lot before, how does it know which
chunk specifically?
>> Very good question. That's because I
wrote a bug that we didn't trip over,
because I didn't compile this version of
the code. So the question is, how does
realloc know what to reallocate? Well,
according to the documentation, which I
forgot to read, you need to tell
realloc the address of the chunk of
memory that you do want to reallocate.
So the first argument to realloc, which
I admittedly forgot until a moment ago,
is the address of the chunk of memory
that you already malloc'd earlier, so
that it knows to go there and see if
there are indeed some garbage values it
can reclaim at the end of that chunk of
memory, or if it has to wholesale move
things elsewhere in memory to give you
four times the size of an int this time
instead of just three. But still, things
can go wrong, so you still want to check
for this NULL value, because realloc
might not be able to give you enough
memory. Or your memory could just be so
fragmented that even though you want
room for four integers, maybe there's
room for three over here, two over here,
one over here. If there isn't enough
contiguous room for four, realloc too
could fail, and it will return NULL to
signify as much. Other questions on any
of this?
>> Why do we still need the temp variable?
>> Why do we still need the temp variable?
For the same reasons as before. Because
if we just say list equals realloc and
something does go wrong, realloc by
definition will return NULL but not
touch the original memory, in which case
we will have lost track of where that
original chunk of memory is. So we can
never go back to it to print it, to
change it, or to free it. So we have to
use this temporary variable here. Good
question. Other questions? Yeah.
>> Is there a reason...
Is there a reason that we free list
instead of temp? So, let me... So down
here, or further down? Okay, so further
down. Let me scroll down to where we
came from. So here, after we've added
this fourth value to temp, I've gone
ahead and freed list, which at this
point in the story is still pointing to
the original chunk of memory, the 1, 2, 3.
Then I am updating
list as a variable to point to the new
chunk of memory. Then I'm doing my thing
by printing out all of the integers
therein. Then I am freeing what list is
then pointing to. So I'm not technically
freeing the same address in memory
multiple times, because in the
intervening time I'm changing what list
is pointing to.
>> Absolutely,
yes, it would be correct to go ahead
down here and just say temp, because temp
is still in scope. It's still pointing
at the same thing. I would just argue
that that's semantically wrong, because
at this point in the code, list really is
the variable you care about. Temp was
really meant to be a throwaway temporary
variable, and you're asking for trouble
if you use a temporary variable later
than you, the programmer, intended. And
if a colleague did that too, who knows
what you've done with the temp variable
in the meantime. Good questions. Yeah, in
front.
>> Realloc always goes for the memory
space right after your original place?
>> Correct. Realloc will try to give you
more memory in the same location as
before if there's room at the end.
>> The code we made earlier, originally,
instead of realloc...?
>> So realloc will do two potential things
for you. If the computer's memory looks
like this, you're sort of out of luck
growing in place, because realloc can't
give you this byte. However, if it finds,
like, four free bytes down here, for
instance, realloc will not only allocate
those four bytes for you, it will then
copy the data over for you, which is
wonderful, because it just means we don't
need an extra for loop every time we do this.
Yeah, in front.
>> How does it know how much data...?
>> How does it know how much data to...
>> ...copy?
>> How does
realloc know how much data to copy?
Because the operating system, or really
the standard library behind stdlib.h,
keeps track of what memory has been
allocated for you in the past. So when
you pass in that same address, it has
essentially a lookup table, a
dictionary if you will, that tells it
what memory has been allocated already.
So you don't have to worry about that.
>> Yeah. In front.
>> Good question. In other programming
languages, you don't always have to
declare the length of an array. Case in
point: Python, coming next week. That is
because whoever invented that
programming language wrote all of this
kind of code for you. And indeed, that's
one of the goals with our transition
between weeks five and six: to
demonstrate that all of these problems
are still being solved, just not by you
and not by me anymore. We're standing on
the shoulders of other smart people who
have invented not just new code, but
a new language and a new compiler,
or, as we'll see, an interpreter for it,
so that we can hide all of these
lower-level details. Because honestly, as
you can see already, this is an
annoying number of lines of code just to
have a conversation about the numbers 1,
2, 3, 4. In Python, we could reduce this
code to maybe two lines of code, or even
one. It's going to be fun. All
right, so with that said, among
the goals here was to demonstrate that
there are a bunch of ways in which we
can implement these data types. But
let's talk more concretely about what
we'll call data structures, which are
concrete definitions of how you use the
computer's memory to lay stuff out.
Using data structures, you
can implement stacks and queues and
dictionaries and all of these other
things. So we're going to put into your
toolkit today a whole bunch of canonical
data structures that every computer
scientist does and should know, though
you won't necessarily implement all of
them yourself all of the time. But when
you use some feature of Python or Java or
C++ or some other language, you are
typically choosing among implementations
of these data structures that someone
else has written the code for, so that
you can just benefit from the
functionality and the features thereof,
like that FIFO property we talked about,
or LIFO, without having to get into the
weeds too much yourself. So when it comes
to data structures, let's consider that
we have at our disposal now a few pieces
of syntax in C, and we're going to add
just one more today. We have the struct
keyword, which we've seen for a few weeks
now. Whenever we want to invent our own
data structure, we can literally use
struct. We saw in the past that you can
use the dot operator to actually go
inside of a structure to get at a
person's name or their number. And we saw
last week the star operator for
dereferencing a pointer, that is,
dereferencing an address, to actually go
somewhere, like inside of a structure.
Today we're going to see that you can
actually, in some cases, combine the dot
and the asterisk into a single operator
with two characters, ->, that literally
looks like an arrow, and that will help
reflect the yellow and black drawings
that we've done over the past couple of
weeks, where we have an arrow on the
screen pointing somewhere. This literal
arrow in code is going to line up with
that same concept.
So let's introduce the first of our
alternatives to arrays. An array, again,
is a contiguous chunk of memory where
the values are back to back to back.
Among the upsides, it's so fast, because
all the data is right there. We've seen
since week zero that you can do binary
search and just jump around randomly by
doing simple arithmetic to go to
the middle, the middle of the middle, by
just dividing by two a couple of times
and rounding as needed. But the problem
with arrays, to be clear, is that they
are allocated at a fixed size, maybe
three, maybe four, but it is a finite
value, which is problematic, because look
at all the code we had to write just to
resize these things again and again.
Well, what if we sort of try to preempt
that kind of pain and try to just build
up a list by linking it together, no
matter where the values actually are in
memory, and move away from this
constraint that everything has to be
contiguous? After all, as I said a moment
ago, the computer might have plenty of
memory here, here, here, and here that
collectively is more than enough memory,
but none of those individual chunks is
quite as big as you need for an array.
Well, heck, let's at least try to
leverage all of the available memory and
stitch together the data structure, as
opposed to holding firm to this
constraint that the array be back to
back to back and contiguous. So, a linked list is
something you can now build using that
syntax from last week, and a bit more
from today, in your same canvas of
memory. So, for the sake of discussion,
suppose that we want to store first in
our list the number 1. Well, we all know
already that it might very well exist at
an address like 0x123, for the sake of
discussion, but it's somewhere there.
Suppose that you want to store a second
value in memory, but you didn't think
about it initially, and so you weren't
smart enough to put it right next
to the 1, and then the next value next
to that. But you know somehow, from
malloc or similar functions, that you
could put the number 2 over here at
address 0x456, for the sake of
discussion, and similarly there's room
for the number 3 over here at, say,
address 0x789. So already we have a list
of values in memory, but because they're
not contiguous, you can't just do some
trivial ++ trick to go from one to
the other, because they're differing
numbers of bytes apart. They're not just
back to back. So what if we try
to solve that problem in the following
way? Instead of just using a single chunk
of memory for each of these values, let
me waste a little bit of memory, or spend
a little bit of memory, and have some
metadata associated with our data. Data
is the value or values you care about.
Metadata is data that helps you maintain
the data you care about. So let me
propose that we use two chunks of memory
for every value, such that the top of
each of those chunks represents the
actual value we care about: 1, 2, and 3, respectively.
And you can perhaps see where this is
going. The second chunk of memory that
I've allocated to each of these values
could perhaps be a pointer to the next
one, and a pointer to the next one. And
if this is the end, we can put our old
friend 0x0, aka NULL, and just treat that
as the end of the list implicitly. So
even though these things could be
anywhere in memory, by just storing with
each value the address of the next value
in memory, creating effectively a
treasure map or breadcrumbs, however you
want to think of it metaphorically, we
can get from one node to the other. And
indeed, that's going to be a term of art
we start using. A node is just a generic
structure that contains data and
metadata, usually like the number you
care about and a pointer to the next
such node. As an aside, these are not to
scale. An int is typically four bytes; a
pointer, as we've discussed, is typically
eight bytes. But it just
looks prettier to draw them as simple
squares on the screen. So what does this
really mean? Well, who really cares
about 0x123, 0x456, 0x789? We can
really think of this as being
more of a picture with arrows. But to
keep track of this list of three values,
I do propose that we're going to need
one additional value over here. And it's
deliberately just a single square,
because to keep track of this list of
three values, I'm going to use just one
variable, called, say, list, and store in
that variable a pointer, as we defined it
last week: the address of the first
node. Why? Because the first node can
then get me to the second. The second
node can then get me to the third, and so
forth. So what's the upside now? If I
want a fourth value somewhere on the
screen, I could put it here, here, here,
or here, wherever there's enough room,
and just make sure that I update the
arrow to point to that next chunk.
There's no copying of data. 1 2 and
three can stay there now forever until
the program quits and we do actually
free it. But we can just keep adding
adding adding or growing this data
structure in memory. So that is what the
world knows as a linked list. In Python,
to which you were essentially alluding,
a list is actually a resizable array
rather than a linked list. Other
languages call these vectors, but they
are essentially arrays that can be grown
and shrunk automatically, without you
having to worry quite as much about it.
So how does the
code for implementing something like
this work? Well, let me propose that we
have this familiar friend of a person,
which we claimed in past weeks has a
name and a number associated with them.
We know from last week that string is
not technically a keyword in C. So
that's technically just char star name
and char star number, but same idea
otherwise. And this is what we defined
in the past as a person. So this is a
structure we've seen before. I now need
to implement the code equivalent of
these rectangles, each of which has an
integer and then a pointer to the next
such value. So let me propose that we
delete what's inside this structure,
change the name from person to node,
which again is a generic term for a
container of values, and let me propose
that inside of this new node structure,
we put literally an int for the number
we care about. There's going to be my 1
2 3 or four. And then and this is a
little bit new. Let's include in this
structure a pointer to the next such
node. It's a pointer in the sense that
it's an arrow. It's the address of the
next node. So that's why we say node
star. I could call it anything I want,
but semantically calling it next makes
perfect sense because it's the next such
node. But this isn't quite right. For
annoying technical reasons, I need to do
one other thing here, which we've not
done before: give this structure a
temporary name, if you will. So I
literally say struct node up here, even
though I've already said node down here.
Why? Because I technically need to change
this line to say struct node star. Long
story short, why is this necessary? Well,
recall that C and the compiler read your
code top to bottom, left to right. If, in
a previous version of this code, we used
the word node here, but the compiler
never sees the word node until down here,
it's just not going to compile, because
the word literally doesn't exist yet. We
saw this with functions in the past, and
the solution to that was to put the
prototype higher up in the file, and then
it would compile. You can think of this
as somewhat analogous: if I give this
structure a name on this first line, even
if it's redundant with this one, then I
can say struct node inside of these curly
braces, because the compiler has already
seen the word node there. So you just
have to do it this way. So now
that we have this in code, we can kind
of start playing around with actually
storing these things in memory. So let
me propose that we go ahead and do this
by transitioning back to VS code here.
And let's instead of using our array
based implementation, let's implement
the first of our linked lists. And I'm
going to be a bit extreme and delete
pretty much everything inside of main. I
am for convenience now going to include
the CS50 library, not so much for the
char star thing, but because, as we
discussed last week, it's still useful
for getting ints and strings and other
things, which, unless you use scanf, are
much harder and more annoying to get in
C. So let's go ahead and do this.
Outside of main, let's go ahead and
invent this node called
struct node here. Then inside of my
curly braces, we'll give every such node
a number and every such node a pointer
to the next such node. And we'll call
this whole thing node by convention.
Then inside of main, let's go ahead and
do this one step at a time. Let me
propose that to create a linked list.
Initially, it's empty. So how do I
represent an empty linked list? Well, I
could call the variable list and set it
equal to null. But what is the data type
for a linked list? Well, per the picture
that we had up earlier, in so far as all
we need is a single pointer at far left
here to represent the address of the
first node in the list. I dare say all
we need to say is that our list is of
type node star. That is to say, what is
the linked list? Well, it's by definition
the address of the first node in the
list.
So that's the first subtlety here. So
that gives me a picture with no other
nodes. It just gives me a single pointer
initialized to null. Now let's go ahead
and, for parity with the previous example,
just do something three times. So in
this for loop structured exactly as
before, let's go ahead and allocate a
new node, ask the user for a number to
put inside of it and then start
stitching things together so as to
achieve a picture in memory quite like
this. So how am I going to do this?
Well, first I need to allocate a new
node. How do I do that? Well, I can use
our new friend malloc and allocate the
size of a node. I want to store the
address of this chunk of memory
somewhere. And what I'm going to propose
is that we have a temporary variable,
which I'll call n, whose type is
that of a node star. So what am I doing
here? I'm trying to build up this list
in memory. I first have a pointer that
is null, pointing nowhere; no list
exists. I then want to go ahead and
create one new node, store value in it,
and then point my list at that node.
Then I want to do it again and again a
total of three times. So how do we do
this? We allocate space for the size of
a node. However many bytes that's going
to be, it's at least 12, four for the
int and eight for the pointer, and often
16 in practice because of padding, but
who cares? sizeof will answer that
question for me. I'm going
to store the address of this chunk of
memory inside of a temporary variable
called n for node and that's why it has
to be node star because it's going to be
pointing to an actual node. I'm going to
do my quick sanity check. So if n equals
equals null, we can't proceed further.
I'm going to go ahead and just return
one right now. So that's just sort of
boilerplate code you should be in the
habit of doing anytime you're using
malloc. But if all goes well, let's do
this. Let's go to the address in n and
then go inside of that node and change
its number to be whatever the human
wants it to be by using get int and just
prompt the human for their favorite
number. Then let's go to that same node
and update the next field to equal for
now null because all I want to do is
allocate one new node with that number.
That's it.
Then I'm going to need to stitch this
together further, making sure that we
string these nodes together. But let's
clean this up first. This syntax isn't
quite right, because technically, because
of precedence, I need to dereference n
and then go inside of it. However, if
this syntax is looking a little
overwhelming and you have no idea now
what's going on, thankfully in C there's
much simpler syntax, which is this. Go to
the node and go inside it to get the
number. Go to the node and go inside it
to get next. So the arrow notation that I
promised we would now have is the same
thing as using the star operator, the
dereference operator, parenthesizing it,
and then the dot operator, which is just
a pain in the neck to write out all the
time. I dare say n arrow number and n
arrow next is just much simpler. It says
go to n and point at the number field or
the next field, respectively. All right.
So the last thing I'm going to propose
we do and then we'll make this much more
clear in picture form is this. Let's go
ahead and prepend
the node to the list. And by prepend I
mean insert it at the beginning, again
and again. I'm going to say n arrow next
equals list, then update list to equal
n. And then after
all of this mess, I'm going to return
zero. Okay, this was a huge amount of
code, but let me give a quick recap.
Then we'll paint a picture. Here is my
list initially. So the foam finger
is pointing to null, which means the
list is of size zero. There's nothing
there. Then I ask the computer to do
this three times. Give me enough memory
for a new node. Then after checking that
it's not null, put the user's favorite
number in it and update the next field
for the moment to null. Then lastly, go
ahead and prepend this brand new node to
the existing list. And by prepend,
I mean put it at the front. So
n at this moment is pointing to that new
node. And I'm saying, you know what,
whatever the current list is, empty or
otherwise, set the next pointer equal to
the list, whatever that list is, and
then change the list to point at this
new node. So now let's do this more
carefully, step by step, in picture
form. So I'm going to propose that we go
through some of these representative
lines as follows. Here is the first line
of code even without the assignment. If
you just allocate a variable called list
that's a pointer to a node, what you
essentially have is a box of memory that
looks like this. It's a garbage value
though because there's no assignment
operator. So who knows what's inside of
this pointer. That is why in my actual
code I set it equal to null which
effectively creates in memory the same
box but gets rid of Oscar the Grouch and
puts the null value there. So we know
it's not a garbage value. It's a pointer
known as null. So that's what that very
first line of code did in the computer's
memory. The next thing I wanted to do
was allocate enough memory for a node,
not a node star, for a whole node. I
want that whole chunk of a rectangle
given to me in memory. That's going to
return to me the address of the first
byte thereof. And I'm going to store
that in a temporary variable called n.
So at this point in the story, n is
going to be a pointer of its own,
another box that initially sure is going
to be a garbage value, but because I am
using the assignment operator, it's
going to point to that chunk of memory
which malloc, if successful, presumably
allocated for me in the computer's
memory. So n for all intents and
purposes points at that same chunk.
These values are still garbage values
because it's just a chunk of memory. Who
knows what it's been used for before? But
that's why after this line of code, I
took care to get an int from the user
and then initialize the next pointer to
null. So for instance, for the sake of
discussion, let's get rid of get int for
the picture and just say the human typed
in the number one initially. Well,
that's equivalent to putting the one in
the number field by first going to the
address in n, dereferencing it with the
star, and then using the dot notation.
So that means follow the arrow and then
change number to the value one. Or,
equivalently, you can just do the same
thing with the arrow notation. And
thankfully now C syntax lines up with
what the pictures we've been drawing look
like. Go to n, follow the arrow to the
number field. That's literally what the
syntax is telling me. Meanwhile, if I use
that same syntax again for n arrow next,
setting it equal to null, that's like
saying go to n, follow the arrow, and
change the next field in this case to
null, or we'll just blank it out to be
clear. So
at this point in the story, we have
allocated the node. We have stored one
and null. The list is still null. n is
pointing to this, but the whole point of
this exercise is to add this node to the
list. So we need to somehow update this
value, which is why ultimately I'm going
to do something like list equals N. Now
that seems a little weird semantically,
but recall that N is a pointer. That is
the address 0x123, or
wherever that is. So to point list at
the same node, it's equivalent to
setting list equal to n because then
we'll effectively have an arrow
identical from list pointing at that new
node. And at this point, I don't even
care what n is anymore. It was always
meant to be a temporary value. This now
is my list. So even though I did it in
code already, preemptively, in a loop,
the first iteration for that loop
literally created this in memory. Let me
pause before we go through numbers two
and three for any questions
because the VS Code version looks scary.
This is perhaps a little more
bite-sized.
Okay. So, how about we do this twice
more for two and three, respectively.
So, again, inside of our loop, we're
back to this line, which asks the
operating system for enough memory for
the size of a node, stores that address
temporarily in a variable called n. So,
here's our friend Oscar brought back
onto the screen. Maybe the new chunk of
memory is over there. This effectively
points n at that chunk of memory. The
next line of code inside of that loop
that's relevant is this. And we'll get
rid of get int and just pretend that I
literally typed in two. We're going to
go to this version of n, follow the
arrow, go to the number field, and set
that equal to two. The next line of
code, we start at n, follow the
arrow, change the next field to null.
And then same lines as before, we now
need to update list equaling n. But
something's about to go wrong here. If I
update list to point to the same node
that n is pointing at, watch what
happens. I set list equal to n, and n,
because it's temporary, might as well go
away at this point. But
what have I done wrong logically here?
Yeah,
>> you lost the arrow to
>> Yeah, I lost the arrow to the original
node. I have orphaned the first node
because now nothing in my code is
actually pointing at it. I've got,
redundantly, two pointers pointing at
this chunk of memory. So this thing,
even though we obviously as humans can
still see it, we have lost track in code
of where it is, which means that is the
definition of a memory leak. I can never
get that back or give it back to the
operating system until the program
itself finally quits. So, I think I need
to be a little smarter and not do this
line quite like this yet. I think what I
want to do, and I've rewound, so list is
still pointing to the original list. N
is pointing to only the new node. What I
think we need to do is something like
this. And this is why the code was
fairly non-obvious in VS Code at first.
Go to N, follow the arrow, go to the
next field, and here's the cleverness.
Point this pointer to the existing list's
value. So if the existing list is
pointing here, that just means, hey,
point this to the exact same thing
because now I can safely update the list
to point at the same thing as n. So its
arrow now points here. But even when I
get rid of n, I wonderfully have the
whole thing stitched together. And the
metaphor I often think of is around
Christmastime in olden times, when
people would stitch popcorn
together. That's what you're kind of
doing with a thread here. You're trying
to stitch together these nodes or
popcorn kernels if you will such that
one can lead you to the next can lead
you to the next can lead you to the next
but you can never let go of part of that
strand in the process. So here now we
have a list which is great because
notice we haven't touched the one but
we've added the two. We can go ahead in
a moment and add the three but you can
perhaps see where this is going. I'm
kind of doing it backwards by accident
but we'll get there soon. So now let's
allocate a new node run through in our
mind's eye all of those same steps. I'm
going to hopefully end up with a list
that now looks like this. And even
though it's kind of long and stringy,
these values could be anywhere in
memory, but because of these various
pointers, I can jump from one location
to the other, making more efficient use
of everything inside of the computer's
own memory. All right, but of course,
we've got this symptom that I didn't
really intend whereby the whole darn
thing is backwards. But I think that's
kind of okay for now. But I'd like to
propose that we consider how we can now
maybe traverse this thing and actually
print out the values in memory. So let
me go ahead and do this. Let's go
back to VS Code here.
So at this point in the story we've got
the same code that implements that same
idea except I'm using get int just so
that I can dynamically type in the one
the two and the three without having to
hardcode it into the actual code.
Suppose that after doing this exercise,
I actually want to do something
interesting like print the numbers.
Well, we don't have that code yet in
this version of my program. So, let's
bring that back. Last time I did this
just using a for loop and array
notation. And I think I can do that. But
let me propose first that I implement
this idea pictorially. Here's the same
diagram. This is what exists in the
computer's memory. If I want to go ahead
and print out these numbers, albeit in
reverse order, let me propose that we
can do this by giving ourselves another
temporary variable. We'll call it ptr,
pointer for short. And that's like
having another foam finger that points
at the start of the list. So it's not
pointing at list. It points at whatever
list is pointing at, which means here.
Then I can print out the three pretty
easily. So long as I next update pointer
to point to the two, print it out, then
point it to the one, print it out, and
eventually I'm going to realize, oh, I'm
out of nodes because the end of this
list is null. So that's the idea I want
to implement now logically in code.
Create a temporary variable called
pointer. Set it equal to whatever the
list itself is. Print out the value,
update the pointer, print out the value,
update the pointer, print out the value,
update the pointer, realize it's null,
and stop. So in code, it's a relatively
small loop, even though the syntax is
still pretty new since we've only just
started playing with memory since last
week. But what I'm going to do is
exactly what I proposed. I'm going to
create a new pointer called ptr and set
it equal to the list itself. That's like
having another foam finger temporarily
pointing at the first element in the
list. Then what I'm going to do is say
while that temporary variable is not
null, go ahead and traverse the list.
What do I mean by that? Well, let's go
ahead and print out the current element
in the list by using percent i backslash
n and printing out whatever the
pointer is pointing at specifically its
number field. So that is follow the
arrow and print out the number. Then
inside of this loop, I'm going to update
after doing that my temporary variable
called pointer to be equal to pointer
arrow next. And that will have the
effect with just those few lines of code
of implementing precisely this idea. I
first set pointer equal to the list,
which happens to point here first. I
then do my printf, and then I update
pointer to be the value of pointer,
follow the arrow, next. So if this is
0x456, for instance, that is what's now
in pointer, so the arrow effectively
points there. In my while loop, I print
out this number with percent i, and then
I go to the next field, follow the
arrow, and set pointer equal to whatever
is there, 0x789. So I effectively move
the arrow there. Then lastly, I update
pointer to the value of this next field,
which is null, which means pointer
itself is null, which means the while
loop cleverly stops, because I was
supposed to do this whole loop only
while pointer is not null, but pointer
is now null. And just
as an aside, if you prefer the semantics
of a for loop, there's nothing new here
per se. I can do this exact same thing
using a for loop, and it's a little
tighter to implement. Instead of int i
equals zero, as in that old approach, I
can actually use pointers in a for loop
like this: for node star pointer equals
the start of the list; keep doing
something so long as pointer does not
equal null; and on each iteration of this
loop, update the pointer to equal
whatever the pointer's own next field is.
And then inside of this for loop, print
out, using percent i backslash n, the
current pointer's number field,
semicolon. So here
is where again we see the equivalence of
for loops and while loops. What you can
do with one you can do with the other.
This is a little more elegant in that
you can express a whole lot of logic in
one line of the for loop. Frankly I do
think the first version is nonetheless
more readable. So let me undo everything
I just did. On the course's website,
you'll see both of these versions. This
one's a little more pedantic as to what
it's doing step by step. Okay, that,
too, was a lot. Let me
pause here to see if there are any
questions.
And if you're feeling that fire hose,
this is why in a week we transition to
Python, where all of this gets swept
under the rug but is still happening,
just not by us. Questions?
Yeah.
Yeah, really good question. So here
I've been preaching that we don't
want to lose memory. We don't want to
leak memory. And here I am fairly
extravagantly now spending twice as much
memory to maintain this data structure.
That's going to be among the themes with
all of the data structures we talk
about. If we want to gain some benefit
like dynamic growth and shrinking of the
data structure, you got to give me
something. And what you've got to give
me in this case is the ability to use
more space. Um, in a bit today and after
break in particular, we're going to
decide we'd really like these algorithms
to be faster. Well, that's fine, but
you're going to have to give me
something in return. You're going to
have to spend more space to make the
code faster. And so time and space and
financial cost and human time and any
number of other resources are all things
that you need to evaluate as a
programmer or a manager and decide which
is least and/or most important to you.
And right now I don't care about space
as much as I care about the dynamism
that I'm trying to solve first. Other
questions on here? Yeah.
>> Yes. Why am I using pointer instead of
n? Well, yes, I could reuse n at this
point. I deliberately chose to use
pointer for two reasons. One, I'm using
it for a different purpose here. Two,
it's not necessarily the best idea to
use one variable here for a specific
purpose and then reuse the name down
here. Besides, n is out of scope at this
point anyway. So it just makes me
feel better that I have different
variables doing different things, but it
would not break if I did it your way.
Other questions?
Yeah, in back.
>> Are pointers temporary? Not necessarily.
Like the linked list we are building up
in memory exists because we are using
pointers to build this data structure
and to keep it intact for as long as the
program is running. My temporary
variables n and pointer ptr in this case
those are ephemeral and I'm only using
them to kind of stitch things together
temporarily.
A good question. All right. So let's now
motivate why we're spending so much time
sort of stitching these things together
so carefully. Well, here's our little
cheat sheet of common but not exhaustive
running times. Let's consider what the
running time is for some fairly basic
operations like inserting a number into
a linked list, maybe searching for a
number in a linked list, or traversing it,
and also, ultimately, deleting numbers
in a linked list. So here is my list
initially completely empty. And suppose
I go ahead and insert the one, then I
insert the two, then I insert the three
using code like we just wrote. I love
this approach because even though it
looks a little scary at first, this is
probably the simplest way to implement
insertion into a linked list. Why?
Because I'm just constantly prepending
the next element. Prepending,
prepending, which means all of my hard
work is just here at the beginning of
the list. So even if this thing has a
thousand elements in it, I'm only
manipulating some pointers all the way
over here pictorially at the left, which
means it's pretty darn fast. So given
that definition in this picture, what
would you say the big O running time is
of insertion into a linked list when using
my current implementation?
>> Big O of one. Why? Well, it's not
literally one step, but it is a constant
number of steps because if we literally
counted the lines of code I was
executing, it's a few steps to sort of
point one thing up here, point the other
thing down here, then update the third,
and boom, we're done. In particular,
what my current code does not care about
is the whole length of this list. Why?
Because I'm never traversing the whole
thing for the insertion part. I am
obviously for the printing part, but for
the insertion, I'm just prepending again
and again. The downside though of this
approach is that the whole darn thing is
coming out backwards. I'm not doing
anything with regard to the ordering of
these elements, which means what's the
running time of search going to be? For
instance, if I tell you search for like
the number one, find it for me.
What's the running time going to be
there in big O?
Big O of yeah, big O of N because in the
worst case, it's going to be all the way
at the end. And we've seen this scenario
before. So, it's big O of N for
searching. It's definitely big O of N
for traversing or printing. But that
goes without saying. If you want to
print every element, obviously you have
to touch every one of the N elements.
But what about deletion? Suppose I want
to delete an element. That's going to be
in big O of
>> N.
>> Also N. Why? Because again in the worst
case it could be all the way at the end.
So only insertion as currently
implemented is big O of one because we
are exercising full control over where
the new elements go irrespective of what
the actual values are. So things could
escalate quickly here if we do actually
want to start keeping things say in
sorted order because we can no longer
just naively plop things at the very
beginning of the list. I think we need
to start being a little more careful as
to where we put things. So in fact, even
though we're doing okay on insert right
now, we still have big O of N for the
searching and for the deletion, which we
won't do in code, um as well as of
course for traversal. So how else might
we go about building this list? Well,
let me propose that we could maybe
append to the end of the list. Let's try
that and see if it gets us anywhere
better. So here's my list initially,
completely empty, aka null. I go ahead
and insert the number one as before, but
now in this algorithm I'm going to
insert the number two and the number
three. So this is great because now by
chance it ended up beautifully in order.
But that's because I chose the numbers 1
2 3. But we'll come back to that detail.
Let's consider now what the running time
is of this algorithm of insertion using
appending to the list. What's the big O
running time of insertion now?
Big O of N. So it's sort of strictly
worse because now it's always going at
the end. Now I could be a little smart
about it. I could just allocate another
pointer and just always have another
pointer pointing at the end of the list
just as I have a pointer pointing to the
start of the list. That's totally fine
if you're willing to spend one more
pointer which is a drop in the bucket. A
legitimate solution. But where I'd like
to go with this is let's maintain sorted
order no matter the order in which the
numbers are inserted. Whether it's 1 2 3,
3 2 1, 2 1 3, 3 1 2, or whatever order the
human types in the numbers, I want to build the
structure out such that they always end
up in sorted order just so that my
contacts in my iPhone or my Android
phone for instance are sorted as
intended. So how do we go about doing
that? Well here we're still dealing with
some big O. Let's try this. Here's my
list initially empty. Now the user
inserts person number two first. So it
ends up there. Then they insert number
one. I'd like it to go there. person
number four, it goes over there. And
then person number three, it ends up
here. Even though it's sort of obvious
with a piece of paper and pencil how to
stitch this together, this is now an
annoying number of logical steps because
there are so many opportunities where I
could screw up and orphan one or more of
these nodes. But let's consider the
scenarios we might
encounter. Maybe we get lucky and it's
like an empty list and we just have to
insert one new node. That is trivial.
We've done that already. The two was
super easy to implement. The one could
be really easy to implement too because
that involves the prepending scenario
and we've seen that prepending is super
simple. So there are only two other
scenarios to consider. Appending, if
it's a really big number and ends up at
the end, we've talked about but haven't
seen code for. The annoying one, I dare
say, is going to be when the new
number belongs in the middle. But I
propose to think through it this way
because now you just have four problems
to solve, not just one massive ill-defined
problem. You've got scenarios in which
you want to insert a new node into an
empty list, you want to prepend the new
node into the beginning of the list,
append it to the end of the list or
somewhere in the middle. So that's like
four blocks of code in my program. I can
now sort of take the proverbial baby
steps and implement this bit by bit. And
to do this, let me propose that in a
moment I'll switch over to VS Code, but
sort of Julia Child style, I'm going
to open up a pre-made version of the
program that actually gives us a working
solution, albeit initially with some
bugs. So here we have, out of the oven,
this version of list.c. At the top of the
file, I've got my same includes as before.
I've got my same structure as before.
Here I've again got int main(void). I've
got the beginning of my list here,
setting it equal to null, and then for
the sake of discussion I'm going to
insert three values for this example, 1,
2, and 3, by allocating enough room for
a node and storing its address in n. Then
I'm going to make sure, as a sanity
check, that n is not null, and then I'm
going to populate this with the human's
first choice of values. So, let me scroll
down. But as such, there's nothing too
new just yet.
Here we have the lines of code in which
I'm getting an int from the user,
setting next equal to null, and then I'm
prepending no matter what per our
earlier version that we did on the fly
this new node to the list and then
updating the list to point to it. And
then down here, I'm printing the number.
So, this is where we left off, but this
is a pre-made version that's nicely
commented. It's on the course's website
for reference. What I'm not doing now is
intelligently prepending, appending, or
plopping the node in the middle. So, how
do we do that? Let's take a look at this
version of the code. So, everything thus
far is the same. And if I scroll down
besides the new comments, you'll see
that now I'm starting to make some
decisions after I have allocated the new
node and populated its number and next
field. As an aside, I don't strictly
need to initialize the next field to
null because eventually, as we've done
in every past example, I've updated that
next field anyway. However, because this
one might now end up at the end of the
list, and I just want to program
defensively, initializing pointers to
null before you're ready to assign their
value is a good thing in general. So,
here's the first of the questions I'm
going to ask myself. If the list into
which I am inserting this new node is
empty, so it's the beginning of the
story. Super easy. Just set the list
equal to the address of that new node,
and we're done. That's what happened
when I inserted a bit ago the number two
for the very first time. So indeed what
has just happened here is that now the
list previously empty contains only a
node containing two. However, thereafter
there was another scenario. So when we
moved on in our story and added the
number one to the list, well that
happened to end up at the beginning but
it could also end up at the end or in
the middle. So let's break down those
scenarios here too. So here if it is not
the case that the list is empty in that
if condition we're going to end up here
now in the else. What do I want to do
here? Well let's go ahead and for now in
this simplified version append it to the
end of the list so we can see that code.
How do I do this? Well I'm using a for
loop much like the one I had before
which just allows me to traverse the
existing list whether it has one node or
many. And I'm going to ask a question: if
following the current node's pointer
field, its next field, leads me to null, aka
the end of the list, then let's go
ahead and update the end of the list to
actually equal the new node. So in other
words, if I'm sort of following
following following all of the arrows
and I reach a node whose next field is
null, no problem. Update that next field
to point to the new node I want to
insert. Irrespective of the values, I
just want to append this node no matter
what. And then I want to break out of
the loop. Then at the bottom of this
version of the program, it's all quite
the same, printing out the numbers using
the for loop version of my code from
before instead of the while loop, but
they're equivalent. But what I did do in
advance in baking this version of the
program is also go through the motions
of freeing every one of the nodes
afterward, but we'll come back to that.
So this version of the code, just to be
clear, only appends nodes to the list.
It's still not treating things in order.
But we've now seen two of the scenarios
plucked off. The list is empty or it has
numbers and we want to put something at
the end. So let me propose now that I
take out of uh our distribution code
another version of this program that
does that and a bit more. I'm going to
go ahead and open up in just a moment a
new and improved version of list.c. And
now it looks almost the same at the top.
Scrolling down. Scrolling down.
Scrolling down, here's some now familiar
code. If the list is empty, do that
simple thing as before and just set the
list equal to the new node. But here is now
where we're adding some inequalities. If
the number in question belongs at the
beginning of the list, that is, if the
number in the new node, n,
is less than the number in the current
list which is presumed to be the first
node at the moment then go ahead and
update the new node's next field to
point at the existing list and then
update the list to point at this new
node thereby giving us from two in the
list to one and two in the list. To be
clear, if I go back to VS Code here,
what's happened here is because one is
less than two, of course, I'm going to
update the new node's next field to point
to the list. What does this mean? Well,
the new node at this point in the story
is the new node for the number one
because that's the second thing we're
inserting. I'm going to update its next
field to be whatever the list a moment
ago was already pointing at. So this is
the after effect but a moment ago list
was pointing at only the two. So now the
next field of the one points at the two
and then lastly here in this line I
update the list pointer to be the
address of that new node. And here's
where I'll wave my hand a little bit
today because it starts to escalate
quickly. It's useful and it might very
well be useful for problem set five in
particular, but I think more healthily
reviewed step by step at a slower pace.
Here is where I'm asking myself, all
right, if it's not the only element in
the list and it doesn't belong at the
beginning of the list, well, it belongs
somewhere later in the list, which gives
me two final scenarios. Let's figure out
which scenario we're in. Let's use this
for loop to iterate over as many of the
nodes in the list as we
need to. If we get all the way to the
end, because the current node's next pointer
equals null, it's like following the
arrows, following the arrows, and maybe
we're trying to insert the number five.
I've already hit the number four, whose next
is null. Five belongs at the end. So
here we have our promised append code
which is exactly the same as before but
now I'm doing it conditionally if I've
indeed found my way to the end of the
list. And then lastly, let me scroll
down just a little bit. If it's not the
case that the list is empty and it's not
the case that the new node belongs at
the beginning and it's not the case that
the new node belongs at the end, I'm
just somewhere in the middle of the list
because the new number I'm inserting is
less than the one I'm looking at here.
And it's okay to use two arrows, but
I'll wave my hands at that for now.
These three lines, two pointer
manipulations and a break is what's
going to stitch together that three in
between the two and the four. And let me
propose, for lecture's sake, that you take this on
faith that this collectively does stitch
things together properly. But I do think
as you'll see in problem set five, it's
a much better exercise to think through
a little more carefully step by step
because there's just a lot of
fine-tuning of these pointers together
and the order of operations does matter.
But at the very end of this program,
notice this is kind of mindless even
though the syntax is undoubtedly less
familiar. Here is how just like
traversing the whole list to print it
out, we can similarly do one more pass
over the linked list and free every one
of the nodes. But notice it's not quite
as simple as just saying free the whole
list. free is not that smart. malloc is
not that smart. If you have
called malloc one, two, three times, you
really have to call free one, two, three times. You
can't just pass it the beginning of the
linked list and say, you figure out what to
delete, because it has no idea what a linked
list is or what your data structure
actually is. So the reason that this
loop is a little complicated is that
what I'm doing with these three lines is
essentially traversing my list
and making sure that when I'm ready to
delete the one, I have a pointer pointing at
the two, and then I free the one. I
update my pointer to point at the three
and then I delete the two. I update my
pointer to point at the four, then I
delete the three, and then I delete the
four. So, there's a bit of trickery
involved in making sure you don't orphan
things step by step.
Okay, that was a lot. Let me pause here
to see if there are in fact any
questions, even though we're
deliberately waving our hands at some of
those details.
Questions on this? Now, let me add one
final flourish. If we were to really
quibble over this, I mean, my god, we're
up to 80 lines of code already just to
implement the numbers one, two, three,
four. But there are some subtle bugs in
here at the moment. So, for instance,
suppose that something goes wrong with
malloc inside of this for loop here. And
suppose that it's not your first
iteration, something goes wrong on maybe
the second or the third iteration. Why
is this error check suddenly bad as I've
implemented it?
Yeah,
>> I didn't free the memory from the
previous iteration.
>> So this is where
memory management starts to
get really annoying, because if you do
want to practice what I've been
preaching, which is free any memory
you've allocated, and you've already
allocated one, maybe two nodes because
malloc is failing, maybe at the last
iteration here, you have to somehow go
back and free all of that. And that's
fine; we have code at the bottom of
my file here which could traverse
through the existing list and just free
it all. So I could just copy paste that
code, put it into my if condition and
then run that code too to delete the
whole list. But at this point if you're
copying and pasting you're probably
doing something wrong. And so let me
propose as a final version of this, just
for your reference later, version nine
of this file, zero-indexed. Give me
one second to just make a quick copy and
copy it over into list9.c, our last
version of this. We have the following,
whereby now in my main
function I have the exact same code as
before but I've taken the liberty of
implementing an unload function so that
I can call it here as well as at the
bottom of this main function. So I can
unload it here or unload the list there.
And all I've done now is in good form in
terms of design just implement the
notion of deleting a linked list in its
own function. So I could call it any
number of times from any number of
places. But just so you've seen how I
might do that there. All right. So let's
ask the question after all of this. What
is the running time of inserting into a
linked list?
Big O of
say a little big O of
>> N. Damn it. Like that's no better. All
right. What's the running time of
searching a linked list?
>> Big O of N. Damn it. What's the
running time of deleting from a linked
list?
>> Big O of N. So like everything is
literally big O of N. So there's the
price we've suddenly paid. An
hour after we started with arrays, we've gotten
to the point where we can dynamically
grow a linked list, and I dare say,
even though we've not done it and won't
do it today, shrink the linked list by
freeing things that we don't need. So we
have the dynamism, and we can make more
efficient use of memory even if it's
very fragmented and there's a few bytes
here, a few bytes there. But we've paid
this price, because with arrays, recall,
even in our phone book example, we at least
had binary search, the running time for
which was big O of log n. So, my god, not
only are we spending more space, the darn
thing is slower. Surely this is not how
our phone contacts are implemented.
Surely this is not how stacks and queues
are always implemented. And indeed, it's
not. This is just going to be a stepping
stone to now doing a sort of mashup of
data structures, whereby we take the best
features of arrays, the best features of
linked lists, mash them together to get new
and improved data structures. But for
that, we're going to have to have some
cookies first and we'll come back in 10
minutes. Cookies are now served.
All right, we are back. So, let's recap
how we got here and why. So, we started
with our old friends arrays, which we
introduced in week two. And recall that
the whole appeal of arrays was that one,
as all things go, like relatively
simple, certainly now in retrospect, but
more importantly, they were really darn
fast. The fact that arrays are
stored back to back, contiguously in memory,
means that we could do very simple
arithmetic, recall, to figure out
the length of it and then divide by two
to get the middle, divide by two again to
get the middle of the middle, and so
forth. And even though we might have to
deal with a little bit of rounding,
arrays lent themselves to binary search
and thus logarithmic time, so big O of
log n. But today I claim that the
downside of arrays is that you have to
decide in advance how big you want them to
be. And if you guess wrong and
ask for too little memory,
you then have to reallocate memory. And
that's fine. It's solvable with malloc or
realloc. But it's going to take some
amount of time to copy all of the old
memory into the new memory, whether you
do it with a for loop or realloc
does it for you. Meanwhile, we only did
it with like three values, maybe four.
But imagine it being 3 million values
that you now need to allocate more space
for. You're going to waste a huge amount
of time copying 3 million values from
the old location to the new. And so
that's just generally not very
appealing. And so that motivated our
whole discussion of linked lists whereby
now we can create a more dynamic data
structure whereby we only allocate
memory as we need it. So we don't have
to worry about underestimating or
overestimating and therefore wasting
memory. We can just go bit by bit for
each new value. We allocate another
node, another chunk of memory, and the
thing just grows and grows and grows.
But as we saw just before the break, the
downside is even though we're avoiding
the inefficiency of having to move stuff
around in memory, once allocated, the
nodes can stay where they are and we
just update our pointers. All of our
running times for searching, inserting
new elements, deleting old elements
would seem to be big O of N. But why was
that? Well, in the context of a linked
list, recall that it might look a little
something like this, whereby we have a
pointer called list pointing to maybe
four values like this. And suppose that
we do want to uh search for a value.
Now, it's nice because in our latest
version of this linked list, it was
sorted from smallest to largest. And
that was always a precondition of doing
binary search. But even though it's
obvious to our human eyes where the
middle is, it's like roughly over there.
How is the computer going to figure that
out? Or rather, how is the code that you write going to?
Well, unfortunately, the way we've
stitched a linked list together with these
pointers is if you want to find the
middle, you can, but you got to start at
the beginning, traverse the whole thing
to figure out how long it is, then do it
again, and stop halfway through once you
know what the halfway point roughly is.
Then, if you want to search the middle
of the middle, you've essentially got to
do that whole process again. And so, now
just to use binary search, you need to
spend big O of N steps just to even find
the middle. Now, if your mind is kind of
spinning and you're like, well, maybe I
could just kind of cheat and use a
pointer to always point to the middle of
the list. Totally fine. You can spend
some additional space to remember
the middle of the list, the end of the
list. But where does that stop? What if
with binary search, you go not just to
the middle, but the middle of the
middle, the middle of the middle of the
middle, and so on? Are you going to
keep around a pointer to every element?
Because if you do, you're essentially
back to an array if you've got one
location for every other location. So it
just kind of devolves into a mess. Even
though there are some minor optimizations
we could in fact make. We
didn't talk about it yet, but one common
alternative to a singly linked list,
which is what ours is, linked with a single
pointer from node to node, is what computer
scientists call a
doubly linked list, where there are arrows
going both directions, which actually
would have simplified some of the last
code that we looked at, because I wouldn't
have had to look ahead to figure out what I
wanted to free or where I wanted to
insert some value. But that too doesn't
fundamentally change the speed. It just
makes your code a little easier to
write. So in short, with linked lists, we
get dynamism. We can now grow and shrink
things without wasting time copying. But
we've lost hold of our binary search,
and that was very appealing as far back
as week zero when we wanted to do
something quite quickly. So let's see if
we can't make some mashups now: take
some arrays, take some linked lists,
literally mash them together into a sort
of Frankenstein data structure and see
if we can't get some of the speed of
arrays, but the dynamism of linked
lists. And so I give you trees. Think
in your mind's eye about what a
family tree looks like, where you
typically have some parents and then
some children and some grandchildren and
so forth. It's this sort of treelike
structure, even though by convention it's
drawn top down instead of bottom up like
trees in the real world. The top of
that family tree we're going to call
the root of the tree, even though it just so happens
to grow down. A tree is a
very common data structure, and it's
interesting vis-à-vis arrays and linked lists
in that it's the first of our
two-dimensional data structures. An
array is effectively just a single
dimension, from left to right. A
linked list is essentially the same. Even
though in reality it might be up, down,
left, and right in memory, it's still
just one thing stitched together in a
single dimension. A tree now adds a
second dimension. And specifically
useful for us is what we're going to
call binary search trees, which, spoiler,
are going to give us back the
ability to use binary search. But we're
going to store the data a little more
cleverly than in arrays alone. Instead
of storing our data in one dimension in
a binary search tree, we're going to
store in effect in two different
dimensions. And that's going to gain us
some speed. So here for instance is an
array of seven numbers as we might have
seen it back in week uh two when we
first introduced arrays. Let me draw our
attention to the middle element and then
to the middle of the middles and then
the middles of the middles of the
middles just by color coding them
slightly differently. If I were to run
binary search on these numbers or the
lockers that we had on the stage a few
weeks back, I would jump to the middle
then the middle of the middle and so
forth. The catch though is that
implementing it as an array, it's not
going to be very easy to add new values.
Why? Because if I want to add the number
eight or nine or 10, I might get lucky
and there might be room in memory here,
but I might get unlucky, in which case
we've got to start jumping through
those hoops of malloc or realloc and
copying all of this memory
to a new location, which is doable. We
solved it in code, but it's going to be
slow for larger data sets. So can we
avoid that? Well, maybe. I deliberately
color-coded things like this, because let
me propose that instead of storing these
seven values in an array, let's store
them in a family-tree-like structure
like this, where I just kind of exploded
them vertically on the y-axis here. So
now the middle element, the four, is at the
top of this tree. The two and
the six, which were the middles of the middles,
are going to be to the
left and right of the four. And then
these leaf nodes, so to speak (we borrow
a lot of vernacular from the world of
actual trees; these are leaves in the
sense that they themselves have no
children, they're at the edge of the
data structure), are going to be the
middles of the middles of the middles.
But all of the data is still there. I've
just exploded it from one to two
dimensions. And let me propose, now
that we have this technique of using
pointers, which we use in C code but
can depict pictorially with
arrows, that we stitch
together these seven values in memory
using a bunch of pointers whereby now
each of these nodes drawn as a single uh
square for simplicity is going to have
not only an integer associated with it
and not just one pointer but per these
arrows as many as two arrows associated
with it. So our nodes are about to go
from data structures with two things, a
number and a pointer to three things, a
number and two pointers for the left and
right child respectively. And I dare say
now that we have a two-dimensional tree
data structure, consider how you might
find a number therein. Suppose I'm
searching for the number five. Well, I
start at the root of the data structure.
And even though our human eyes obviously
know where we're going, notice what's
important about this binary search tree.
If I go to the root of the
tree, I see the four. Four is obviously
less than five. What does this mean?
This means I can divide and conquer the
problem right off the bat. I know that
five is going to be to the right of this
node, which means effectively, if you
think in your mind's eye about snipping
the branch there, I have just halved the
problem essentially like dividing the
phone book in half. Why? Because I don't
even waste time looking at this subtree,
the left child of the four element.
Meanwhile, if I go from the root to its
right child here, I see the number six.
Five, of course, is less than six. So,
this is effectively like snipping off
that child because I don't need to go
further there because I know a smaller
element is going to be in this
direction. And that's the key property
of a binary search tree. It's not just a
family tree with numbers all over the
place. They follow a certain pattern:
every element is going to be greater
than its left child and less than its
right child (more precisely, greater than everything
in its left subtree and less than everything in its
right subtree), assuming you don't have
identical values. And that property is
actually a recursive one, to borrow
terminology from a couple of weeks back.
Recall that a recursive function is one
that calls itself, and a recursive data
structure, like the pyramid in Mario, is a
data structure that can be defined in
terms of itself. Well, a binary search tree
has a recursive property insofar as if
it applies to this node, it also applies
to this node. Case in point, two is greater
than one, but it's also less than three.
It's true over here. Six is greater than
five but less than seven. And it's
technically true of the leaf nodes
because the definition is at least not
violated there because they don't even
have children themselves. So this is a
binary search tree because of that
pattern. So this then invites the
question, well how long does it take us
to search for a value in a binary search
tree? Well, if the number is five, it's
going to take me one, two steps. But if
there are n elements here, does someone
want to generalize that, either
mathematically or just instinctively?
Big O of
log n. And even if you're not quite sure
how the math works out, anytime you take
a data set and you halve it, halve it, halve
it, we're talking about log base 2 of n
again. And indeed, that's going to
describe the height of this tree. The
height of this tree is essentially log
base 2 of n, because if n is seven, log base 2
of 7 is about 2.8, which is essentially two when
we round down. If we round up,
or if we've got eight elements, log base 2
of 8 is 3, since 2 to the 3rd is 8. So 1,
2, 3. It kind of works out, even if I'm
doing that a bit quickly. The height of
this tree is log base 2 of n, aka big O of
log n. How long does it take to insert?
I think it's going to take log n, because
I can insert over here or over here or
over here, depending on where the number
goes. How long does it take to
delete? I'll claim it's going to take
about the same. So wow, we're back in
business. I've got now the ability to
grow and shrink my data structure
because if I want to insert the number
eight, it's going to go right there. If
I want to insert a number like 5.5,
I can see where I would put it. It's
going to be easy to add new nodes by
just updating the pointers without
copying everything in memory like we had
to for arrays. But there is a downside
here. I've got to concede something. What
price am I paying? What's the
trade-off here to gain that dynamism and
that speed?
>> Each individual node takes more memory.
>> Yeah, I'm literally using at least three times as
much memory now, because even though it's
not depicted here explicitly, each of
these squares represents an integer and
a pointer and another pointer. So that's
like 20 bytes at this point (4 for the int plus 8 for each
pointer on a typical 64-bit system) instead of just four
bytes for each of the integers in an
array. Nowadays, though, space is pretty
cheap. We all have very large Dropbox
folders, iCloud folders, and the like.
So it's not really a big deal to use
that many more bytes. Certainly not a
big deal for seven numbers, but if it's
seven million numbers, maybe this isn't
the best data structure to use, even if
speed is important. You got to decide
ultimately based on your actual use case
what matters more. So in short, a binary
search tree you can kind of think of as
an amalgam of or rather a variant of a
linked list except that every node has
as many as two pointers instead of one,
which is what gives us now this
second dimension. And in fact, this
translates pretty nicely to code. In
fact, if we consider how we implemented
in a linked list a node, recall that it
looked like this where you got a number
in each node and a pointer to the next
element in the linked list. Well, I
think for a binary search tree, we can
sort of borrow this as inspiration, make
a little more room because we need two
pointers instead of one. And I'm just
going to call them the left
pointer and the right pointer. But here
is the three times as much space, give or
take, because I now have three fields
associated with each node: two pieces of metadata
to stitch this thing together and
one piece of data that I actually care
about. All right. Well, if this is
the data structure there, how could I
implement this in code? Well, here's
where recursion again comes into play.
The fact that a binary search tree is
recursive in nature in that what you say
about this node about it being greater
than the left child and less than the
right child can be said of this node and
this node and this node and this node.
You can leverage that beautifully in
code like this. So suppose I'm
implementing a search function in C
whose purpose in life is just to say yes
or no, true or false, the number you're
looking for is in this tree, which might
be a useful thing to check in
an algorithm. Search is going to take
two arguments, I propose: the number
you're searching for and a pointer to
the tree. That is the root of the tree
initially. So how do you actually
traverse this thing in C code? Well, we
can pluck off the easy case first,
the base case: if the tree itself is
null, like if you hand me nothing, I'll
give you your answer right now: false.
There's no number here if the tree
is empty. So that's easy. Otherwise,
consider
the number in the current node. Here,
tree is what's passed in, a pointer to
the root. So if you follow the arrow,
you can get inside of that node and see
its number. If the number you're looking
for is less than that, okay, you want to
what? Snip off the right subtree and dive
down the left subtree. So you search the
tree's left child for the same number.
Else, if the number you're looking for
is greater than that number, you search
the tree's right child for that same
number. And the fourth and final
scenario is what? Well, if the number
you're looking for equals the number in
the current node, you got it. Return
true. And if you recall some of
our past design discussions, it's
sort of a waste of everyone's time to
ask this last question explicitly. Let me
tighten this up design-wise, because
there are only four possible scenarios.
Either there's nothing there, it's to
the left, it's to the right, or you
found it. It's right there. So whether
or not you agree at this point in your
programming career, like there is a
beauty to this code that most
programmers would claim is here and that
it's so relatively elegant whereby
you've defined what the function is.
You've got this base case which is
arguably one of the clunkiest parts. But
the fact that you can just check a value
here and then traverse the exact same
structure but a subset of it by
traversing the left subtree or the right
subtree is like a beautiful application
of recursion. And it allows you to
search for this thing no matter where it
is in the computer's memory. Questions
then on this idea of a binary search
tree or this actual code thereof.
>> And if you don't ask that last question, what if
the number is not there?
>> If the number is not there, we
recurse. So, if we get all the way to the
bottom of the tree such that now I'm at
one of those leaf nodes and that's not
the number I'm looking for, such that
there's no left child and no right
child, we recurse once more on a null pointer,
and the base-case conditional is going to
kick in and I'm going to return false.
But if I find it along the way, whether
it's at the top of the tree or somewhere
in the middle or among the leaves, I
will eventually return true.
Good question. And to be clear, even
though I'm calling this a tree, that's
true certainly for the first time I call
this function because I'm passing in a
pointer to the whole tree structure. But
if you think about it, what's the left
subtree and the right subtree? It's just a
smaller tree. It's like a baby tree
that's attached to this parent node, so
to speak. So it's perfectly reasonable
to just call the search function with
that child because it in turn has a
whole subtree below it, or the right child,
which has the whole subtree below it
instead. All right. So I like this
direction. We've now kind of improved
upon linked lists. We've gained back some
of our performance because we can now
find something in big O of log n
time. I don't love the fact that I'm
using three times as much memory
roughly. That feels like kind of a high
price to pay just to speed things back
up. But let's consider whether or not
this thing is actually going to work as
the data structure gets bigger and
bigger as well. So it looks beautiful
here as written and that's deliberate
because I drew the picture like this and
it's got seven elements in it. But how
did we get to seven elements? Let's
start from the beginning. Suppose that
the tree is initially empty, and suppose
that a human using get_int or some other
technique inserts the first element into
the tree, like the number two, and the
goal is to maintain the binary search
tree property, which means each node has got to be
greater than its left child and less
than its right child. So suppose the
human using get_int or some other
technique next gives me the number one.
No big deal, I plop it right there as the
left child. Suppose they give me the
number three next. No big deal, it goes
right there. I have very deliberately
manipulated this story to work out
beautifully, such that the tree is
smaller, but it's still a binary search
tree and nicely balanced, so to speak. But
what if the user for whatever reason
just gives me a more perverse sequence
of inputs, like the worst-case way
to give me three elements, and suppose
they give me one first? Okay, that's the
root. Then they give me two. Okay,
that's cool. That's like the right
child. But what if they then give me
three? Well, to maintain that binary
search tree property, the three has to go
over here. Suppose perversely they then
give me four, then five, then
six. Imagine in your mind's eye where
this story is going. What have I
accidentally created in memory? A
linked list, which is bad for all the
reasons we discussed before the break,
because even though we're getting the
dynamism, it's devolving into big O of
N. So I've kind of manipulated the
situation here with our original
example of seven elements and
then three elements by making sure that
they were inserted in just the right
order, because unless you are clever
about how you build the tree in memory,
it could very well devolve from a tree
in two dimensions into actually a linked
list in one dimension. And now this is
just a long and stringy tree that does
not violate the binary search tree
definition, but it is surely not
balanced in this case. Now, as an aside,
if you take higher-level courses on
data structures and algorithms, there are
many different alternatives to binary
search trees that actually have baked
into the algorithms a little bit of
rejiggering of the structure so that
really, as soon as you insert this three,
you spend a little bit more time and
clean the situation up. Essentially,
what you do is pivot the thing
around this way so that two becomes the
new root, and then one hangs off of it
and three still hangs off of it. So with
each insertion or deletion, you
rebalance the tree as needed, which does
cost you a bit more time, but it avoids
the thing devolving into big O of N
again. And we won't do that in code. So
this is recoverable, but not if you
implement it naively, as I did, at least
verbally in this story. All right. Well,
can we do better than that? Well, why
might we want to? Well, at this point in
the story, it certainly could devolve
into big O of N, and that's not great.
Certainly for large data sets, it's nice
that we're back to log n, at least if
you take on faith that we could kind of
rebalance this thing as needed and
maintain a logarithmic height for it.
But really the holy grail of data
structures is to achieve something that
is big O of one like constant time
whereby no matter how many numbers or
names or sweaters are in the data
structure it will take just one step or
maybe three steps or even 100 steps but
a number of steps that is completely
independent of how many actual pieces of
data are in the data structure. That is
to say over time it doesn't get any
slower even if you've got tens,
hundreds, thousands, millions of
elements in there already. So how do we
gain something like big O of one
constant time the appeal of which is
reminiscent of our early picture from
week one like this was our early
algorithm for finding someone in a phone
book or counting students in the room
something linear literally straight
lines. This was the logarithmic curve
which especially as you zoom out starts
to get very very appealing time-wise.
Something that's constant time looks
even prettier. It is a straight line at
like the one-step mark or the two-step
mark, whatever the constant number of
steps is. And even though
logarithmic will still grow in
perpetuity, constant time by definition
never changes. And this is what we'd
really like. So when you're searching
for someone in your phone, you're
searching for something on Google,
you're asking a question of ChatGPT, you
get an answer like that in constant time
independent of how much data is actually
in there. Well, let's see how we can do
this. To do this, we're going to at
least need a new building block, a term
of art known as hashing. Hashing sort of
formally takes an infinite domain of
values and maps it to a finite range of
values. So from high school math class,
domain is the input, range is the
output. So an infinite domain to a
finite range is the goal here of
hashing. And we might see this actually
in the real world when you're playing,
you know, games or whatnot or you're
cleaning up after a game. Like, here
are some super jumbo playing cards
that we got online. And suppose that you
want to just get these into sorted
order. You could do this very
painstakingly. There's 52 cards here.
You can kind of lay them all out and
start sifting through them and put the
two over here and the four over here and
the hearts and the clubs and so forth.
Or you can start to look at the cards
and bucketize them first to take a
52-size problem and maybe shrink it down
into four 13-size problems. So here for
instance is where the first diamond
might go, the club here, spade over
here, diamond over here. And I can kind
of just do this again and again
bucketizing literally all of these
values so that I've got a very simple
heuristic that allows me to move the
cards into these buckets each of which
is going to have a subset of the values
and then I've got smaller problems I can
deal with. So dot dot dot assume that I
bucketize all 52 of these values. Then
I've just got four problems remaining.
And I dare say it's a little easier then
because they're all of the same suit and
so I can pretty easily sort it from ace
to king or whatnot because those are
effectively just numbers at that point.
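The card-sorting heuristic above can be sketched in C. This is an illustration under stated assumptions: a hypothetical `card` struct with ranks 1 (ace) through 13 (king) and suits numbered 0 through 3, a hypothetical `bucket_of` "hash function" that simply returns the suit, and the standard library's `qsort` to sort each small bucket once the cards are bucketized.

```c
#include <stdlib.h>

#define SUITS 4
#define RANKS 13

// Hypothetical card: rank 1 (ace) through 13 (king); suit 0 through 3.
typedef struct
{
    int rank;
    int suit;
} card;

// The "hash function": map a card to one of 4 buckets by its suit.
int bucket_of(card c)
{
    return c.suit;
}

// qsort comparator: order cards by rank, ascending.
static int compare_rank(const void *a, const void *b)
{
    const card *x = a;
    const card *y = b;
    return x->rank - y->rank;
}

// Bucketize n cards by suit, then sort each small bucket by rank.
// Assumes a standard deck, so no bucket ever holds more than RANKS cards.
void bucketize(const card *deck, int n, card buckets[SUITS][RANKS], int counts[SUITS])
{
    for (int s = 0; s < SUITS; s++)
    {
        counts[s] = 0;
    }
    for (int i = 0; i < n; i++)
    {
        int b = bucket_of(deck[i]);
        buckets[b][counts[b]++] = deck[i];
    }
    for (int s = 0; s < SUITS; s++)
    {
        qsort(buckets[s], counts[s], sizeof(card), compare_rank);
    }
}
```

The point of the first pass is that each of the four remaining sorts only has to deal with 13 cards instead of 52.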
So hashing refers to, again, taking values
from an infinite domain. In this case, the
domain can be finite, and it is: 52 cards.
But if you were doing it more generally
with numbers, you just have to map it to
a finite range, like 1, 2, 3, 4, a finite
number of buckets of values, at which
point then you can solve the problem a
little differently or a little more
efficiently. So why is this germane?
Well, I would propose that if we want to
start organizing our data in memory
toward an idealistic goal of achieving
constant time, hashing might be one
ingredient for the solution there too.
And generally, the process by which you
decide what input goes to what output
is what's called a hash function. It's a
mathematical function or a function in
code that takes as input a card from a
deck or maybe a word from a dictionary
and outputs a value that represents the
bucket into which it should go. So in
the case of our contacts app for
instance, of course in the GUI of it,
you have all of your friends and family
top to bottom, alphabetically
presumably. You might want to ideally
find someone quite quickly, ideally in
constant time, right? The naive
implementation that Apple or Google
could implement is to just use linear
search. Search through all of your
contacts top to bottom and eventually
you will correctly find the person. But
wouldn't it be nice if they instead used
an array and then they can use binary
search and get you the person in
logarithmic time? That's great. But if
you have a lot of friends and family in
there or a much larger data set,
wouldn't it be nice to just jump to the
answer in one step instead of even
log n steps? So that's our goal. Can we
get close to or actually at constant
time? So with a hash function, we
essentially have our old friend problem
solving here, the inside of which the
algorithm is known as a hash function.
And for instance, if I'm looking at
Mario's number, I might now want to look
for Mario, not top to bottom or not
divide and conquer, jumping around to
the half, the middle of the middle of
the middle. Let me just figure out what
bucket Mario is in. And in the English
alphabet, there are 26 letters of the
alphabet, A through Z, either uppercase
or lowercase. And suppose that I want to
find what bucket Mario is in. Well, much
like these cards and the suits thereof,
wouldn't it make sense that anyone whose
name starts with A goes into the
first bucket and maybe the B's go into
the second bucket and the dot dot dot
Z's go into the last bucket. So, it
stands to reason that if I pass in Mario
to a hash function implemented in C or
some other language, I would like to get
back the number 12 because M is the 13th
letter of the alphabet, but if we start
counting at zero with our buckets, which
are essentially an array, then it's
index location 12 instead of 13.
Similarly, if Luigi is the input, I'd
like to get back the number 11. So, my
hash function somehow takes as input in
this story, a string, and gives me an
integer. I claim there's theoretically
an infinite number of names in the world
in the English language. But there's
only going to be 26 possible answers
from this hash function 0 through 25.
So, that's our infinite domain to our
finite range. Instead of four, it's now
26. All right. So what should we do with
the computer's memory to leverage the
fact that we can very easily bucketize
names based on the first letter of
someone's name? Well, let me propose
that the hash function part of this
arcane as it looks is actually pretty
straightforward. So if you wanted to
translate this idea into C, you can
include ctype.h, which we've used a
few times to get access to
functions like toupper. And this is
just to make sure you can be case
insensitive. Here's my hash function.
It's going to return an int, which is
the goal. Takes a string as input. We'll
call it name. And what does this
function do? Well, it's some
clever arithmetic. It first converts to
uppercase the first letter of that
person's name. So, if it's in all
lowercase, forces it to uppercase. Why?
Because I want to subtract no matter
what 65, aka the ASCII value of capital A,
from this. And I don't want to screw up
the math by doing a lowercase
letter minus a capital. I want capital
minus capital, is all. So this will
return to me a number between 0 and 25
inclusive, because if it's a
name that starts with A, I'm only
looking at the first letter. I'm
subtracting off A, that gives me zero, and
I'm going to return zero as a result.
Dot dot dot. If it's Z, I'm going to
return 25 instead. Now there's no error
checking in here. If you type in
non-English symbols, it's going to
break. So let's just assume for
simplicity this is indeed an English
name that's coming in. I can refine this
a little bit. I'm going to propose
moving forward in our final week here of
C, there are some added defenses you can
put in place when writing code. Like if
you know that you're receiving a name as
input, that is you're passing something
in by reference, there's a danger now
per last week, because now the caller of
this function, whoever's using this
function is telling you where to find
Mario and where to find Luigi's name.
The problem with that is that you could
go to that address and actually change
their name in memory. Even if you're not
supposed to, you're supposed to just use
the name. So you can do something like
const which says you should not be able
to change this value even though I'm not
giving you a copy of it by value. I'm
giving you a reference there too.
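Here is one way to write the hash function described so far in plain C. It is a sketch that assumes the name is non-empty and begins with an English letter, mirroring the lack of error checking in the version described. It takes `const char *` in place of CS50's `string` so the example compiles without the CS50 library, and it returns `int`, as in the first version.

```c
#include <ctype.h>

// First-letter hash: maps a name to bucket 0 (A) through 25 (Z).
// Assumes name is non-empty and starts with an English letter.
// const: this function promises not to modify the caller's string.
int hash(const char *name)
{
    // Uppercase first so "mario" and "Mario" land in the same bucket;
    // subtracting 'A' (65 in ASCII) then yields 0 through 25.
    return toupper((unsigned char) name[0]) - 'A';
}
```

The cast to `unsigned char` is the usual precaution when passing a `char` to `toupper`, whose argument must be representable as an `unsigned char` (or be `EOF`).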
Another refinement here is that a hash
function for an array as the goal should
return a value that's zero or one or two
on up. Never negative. So we can even
more protectively say it's not just an
int, it's an unsigned int. And we talked
briefly about that last week, albeit in
the context of chars. These are just
like minor improvements that make your
code arguably better designed because
you're opening yourself up to fewer
possible mistakes or issues. All right,
so with that said, let's now assume that
we've got this kind of function
implemented and we can now use it to
decide what bucket to put these people's
names into. Well, let's give you what
are called hash tables, which are sort of the
Swiss Army knives of data structures,
the kind of thing that some computer
scientists have been quoted as saying if
they were stuck on a desert island with
only one data structure, this is
probably the one they would want. Why?
It's just really generally useful
because it allows you quite powerfully
to associate keys with values. Which is
to say, to come full circle today, hash tables
are often how you would implement at a
lower level the thing we began class
with talking about dictionaries,
collections of key value pairs. That
after all is what a phone book is. We
call it, you know, names and numbers,
but it's keys and values. That's what an
actual English dictionary is. The Oxford
English dictionary, it's a bunch of
words and definitions or keys and
values. So it's useful in general to be able
to associate one piece of data with
another. Ergo, hash tables. So here's how you
might implement in C a hash table. You
want it to be of size 26 for instance.
So 26 buckets from A to Z, hence the 26.
You want this to be an array and that's
fine. This is an array of four buckets.
I'm going to use an array of 26 buckets
because a hash table too is going to be an
evolution of our linked list mashed
together with an array. So a hash table, in
short, is going to be an array with
linked lists as we'll soon see. Here's
the array. 26 pointers to nodes. So I'm
going to give myself an array of
pointers that is going to store
ultimately a whole bunch of person
objects like this. So for instance,
here's a char star name, char star
number, as we've discussed in the past,
representing a person. These are the
pieces of data I might want to store in
this data structure. However, let's
simplify. Let's not worry about the
phone number because we're not going to
call anyone today. But for a linked list
of persons, I'm going to need to store
let's say the person's name, but also a
pointer to the next such name, to the
next such name, to the next such name.
So again, I'm just deleting number as
being unnecessary detail. But if we're
going to have an array of linked lists,
this is our new definition of node for
this part of class whereby it's not for
a tree. It's now for a hash table. And
we'll see this in action now. Here is my
array of size 26. I drew it vertically,
but who cares? These have always been
artist renditions thereof. It just fits
nicely on the screen this way. This is
location zero. This is location 25. So
any A names should end up over here. Any
Z names should end up down here and so
forth. Let's just generalize this away
as letters of the alphabet for clarity.
That's where all the names are going to
go. So hopefully Mario here, Luigi here,
and everyone else. So what are each of
these squares? They're just pointers to
nodes. Initially, they're all null.
But as soon as I insert Mario into this
so-called hash table, I'm not going to
put him literally here. I'm going to
create a new node in memory, put Mario
there, and then stitch it together.
Because if I get another M name, I'm
going to stitch it together and together
and together again. So for instance,
here comes Mario into this data
structure. So this is a pointer to a
person structure. Here's Luigi. And
here's a third character as well, Peach.
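A minimal C sketch of this array-of-linked-lists structure follows. It assumes the first-letter hash and a node holding only a name and a `next` pointer; the names `table`, `insert`, and `find` are hypothetical. Every bucket starts out NULL because global arrays in C are zero-initialized. New names are prepended to their bucket's list, so insertion never walks the chain, while lookup jumps to one bucket and walks only that chain.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define N 26  // one bucket per letter, A through Z

// A node in one bucket's linked list (its "chain").
typedef struct node
{
    char *name;
    struct node *next;
} node;

// The hash table: an array of 26 pointers, all NULL (empty) at first.
node *table[N];

// First-letter hash; assumes name starts with an English letter.
unsigned int hash(const char *name)
{
    return toupper((unsigned char) name[0]) - 'A';
}

// Insert by prepending to the chain in the name's bucket.
bool insert(const char *name)
{
    node *n = malloc(sizeof(node));
    if (n == NULL)
    {
        return false;
    }
    n->name = malloc(strlen(name) + 1);
    if (n->name == NULL)
    {
        free(n);
        return false;
    }
    strcpy(n->name, name);
    unsigned int bucket = hash(name);
    n->next = table[bucket];
    table[bucket] = n;
    return true;
}

// Look up a name: jump to its bucket, then walk that one chain.
bool find(const char *name)
{
    for (node *p = table[hash(name)]; p != NULL; p = p->next)
    {
        if (strcmp(p->name, name) == 0)
        {
            return true;
        }
    }
    return false;
}
```

Prepending is a design choice, not a requirement: it keeps insertion constant-time at the cost of leaving each chain unsorted.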
That's all working out great. Dot dot
dot. There's a whole bunch of characters
in the Nintendo universe. Here are a lot
of them. Unfortunately, especially if
you're a fan, there are also other names
that do start with M and L and other
letters of the alphabet. So, we're
poised to have what we're going to call
collisions, which is a downside of using
a hash function. If you're going from
something infinite to something finite,
by definition, you're going to have a
heck of a lot of potential collisions
somehow. Multiple M names, multiple L
names, and so forth. So, we've got to
mitigate this somehow. Well, if you meet
someone in the real world whose name
happens to start with M, and you already
are friends with Mario, well, you could
delete Mario from your phone and put
that new person there. But that's kind
of dumb. You could clobber the value,
that is. Or maybe you put the M friend
here. And when that fills up, you put
the M friend here. And then when you
meet someone else whose name starts with
M, you put it here. But then it just
devolves into this mess. At which point
now there's no rhyme or reason as to who
is where. It devolves back into
something linear. If you have to search
the whole darn thing looking for M
friends just because you ran out of
space where you want it. So here's the
beauty of mashing together an array with
a linked list. You hash the name to the
intended location like box 12 here. And
then you just start stringing them
together in a linked list. And hopefully
you don't have too many of those
collisions, but at least now you don't
have to delete or make a mess of the
data structure. So here's another bunch
of names, three starting with L. Here's
a bunch for the other letters of the
alphabet. And it's now just an
array of linked lists. This, then, is
a hash table. So the question to
consider now is this better than an
array? Is this better than a linked
list? Well, I dare say it's better than
a linked list because if it were a
linked list from A to Z, what would be
the running time of searching for
anyone? Well, I'll spoil it. Big O of N.
Because even if it's alphabetically
sorted, you got to start at the
beginning and go all the way through the
list potentially to find someone like
Zelda whose name starts with, of course,
Z. But here we have an array of linked
lists. So what's really the running time
here? It's not quite as bad as n steps
because if you assume a uniform
distribution of names such that the
world of Nintendo maybe has as many M
names as L names as A names as B names,
you could assume that there's a bunch of
chains, a bunch of linked lists here
chained together, but they're all
roughly the same. So maybe you have n
names in your phone book this way, but
each of these lists is only
about 1/26 of that length, because
the n names are spread across 26 lists. So
what's the running time? Well, ideally
we'd move away from linked lists with big
O of N and achieve our constant time.
But uh we have these collisions to worry
about here. Just to be clear, we want to
get from big O of N to something
constant time, but we're not going to
get to constant time if we've got
collisions. If we've got three L names
and a few B names and a few A names, we
can't just jump to that location and
find the person we're looking for. So,
what's the fundamental goal? Well, I
think we want to maybe use a smarter
hash function. And here depicted is an
excerpt from a bigger hash table that is
a much bigger array that assumes that
you're not looking at the first letter
of everyone's name but, apparently,
instead, the first three letters of the
person's name, which just decreases the
probability of collisions because in
this model, I dare say there's no one
else's name in the Nintendo universe
that starts with L-I-N. So now Link has
its own location in memory. And
similarly for Luigi, L-U-I, I believe, is
unique in the Nintendo universe. So we
don't have a collision. Unfortunately,
while this does seem to eliminate
collisions based on this tiny example,
what's the trade-off
or what's the catch? Yeah,
>> use a lot more memory.
>> This is a lot more memory. I mean, kind
of hinted at the fact that I didn't even
fit most of it on the screen anymore.
Here's L A. Here's L U. But what about
all of the other letters of the alphabet
and the other combinations of dot dot
dot dot dot dot all possibilities.
Moreover, some of these just don't make
much sense. At least in English or in
the Nintendo world, I don't think
there's anyone whose name is going to
start with A-A-A or A-A-B or A-A-C or A-A-D
and so forth. We're wasting a
huge amount of space to reduce the
probability of collision. So that's
fine. We might get constant time now,
but at what cost? Well, a heck of a lot
more memory. And so this is one of the
tensions when using a hash table is you
want to come up with a good hash
function that's maybe a little more
sophisticated than the first letter but
not so wasteful that you need a crazy
number of buckets and therefore a huge
amount more memory. So really even with
collisions it's not quite as bad as n
steps, because technically if you have k
buckets where k is like 26 buckets or
four in this case technically if you do
assume that the names are uniformly
distributed over A through Z, the English
alphabet, well, each of those linked lists
is going to be hopefully no bigger than
n / k. So n / 26. But what do we know
about constant factors when doing big
O notation? Big O of N / K. Yes, it's
faster, but asymptotically, that is,
theoretically you're still talking about
big O of N. So here's the tension though
like it's absolutely going to be faster.
It will be like 26 times faster than a
linked list but it's still just big O of
N because it's going to take an amount
of time that's still linear in the size
of the data set. So we seem to have
strayed yet again away from our constant
time search. So can we find this holy
grail? Well, we kind of can if you let
me spend just like a lot more space.
There are tries in the world, which,
weirdly, is short for retrieval,
even though it's pronounced "try." A
trie is a tree made out of arrays, right?
So, at some point, computer scientists
were just mashing things together
Frankenstein style, like linked
lists and arrays, and now we've got
trees and arrays. You too can mash
something together and come up with your
own. Let's look at what a trie actually
is because it is going to get us that
constant time grail. So here is the root
of a trie. You can think of each node in
a trie as really being an array of values
a through z in the case of an English
problem like we've been playing with
here. And what you do is you treat this
array as being indexed from 0 through 25
or equivalently a through z. And you
treat each of those elements as a
pointer to another such node in the trie.
And what you do is implicitly store the
names that you're storing in this data
structure by going to an appropriate
location based on the first letter in
their name and then adding a pointer
that represents the second letter in
their name. Adding a pointer that
represents the third letter of their
name and so forth. So what do I mean by
this? Suppose we want to insert Toad,
one of the characters from the Nintendo
universe first. If we count up where T
is in the alphabet, this pointer here
will be changed from null to a pointer
to a new node that represents the second
letter in Toad's name, which is going to
be, of course, O. Then to insert the A,
we're going to need another node. A is
going to lead me to D. And for
depiction's sake, I'm going to draw in
green, even though this would actually
be a boolean or something like that in
memory that indicates that Toad's name
stops here. So in other words, this trie
in memory has four nodes. Now each of
those nodes is essentially an array of
size 26. But the word Toad is not
actually stored in the data structure
explicitly. There's no char array "Toad", but
it's there implicitly, because the T pointer is
non-null, the O pointer is non-null, the
A pointer is non-null, and the D
pointer is in fact null at this point, plus
there's that green boolean. That's the common technique here. This allows
me to insert other names from
Nintendo's universe like Toadette
because I can continue from here to go
to the E node, to the T node, to the
T node again, and an E node, which I'll
again mark in green. So you can even
have names that are substrings or
equivalently superstrings of each other
by just having all of these various
breadcrumbs along the way. A
non-null pointer here to a non-null to a
non-null pointer here can't mark the end
of a name anymore, since Toadette continues past D. This is where we have to use
a boolean, which indicates that there is a name
in this data structure that ends here
and there's another name that ends here.
Meanwhile, if there's a third name from
the universe like Tom, same idea, but
eventually we can start reusing some of
these arrays, whereby non-null, non-null,
and then a boolean flag here that
says true, a name ends here. Now we're
reusing that same array. So each of the
nodes represents the i-th letter of the
word or the name you're trying to store
in the data structure. And by playing
around with null and non-null and some
booleans, you can implicitly store names
in this structure. Now, it's way too
pictorially difficult to depict lots
and lots of names in this form. So, just
imagine in your mind's eye that there's
dozens, hundreds, thousands of names now
in this data structure, but just more
arrows and more arrays. How do you
actually look someone up in this data
structure? Well, if you want to ask a
question like is Toad in this data
structure, or Toadette, or anyone else, you can simply
start at the root node as we would do
for any tree and you hash on the first
letter of toad's name which gives you
this location and you check is it null?
If not, T is implicitly there. So, you
follow that pointer here and then you
hash the second letter of Toad's name,
an O, and check this pointer. And you
follow that arrow. Then you
hash on the third letter of
Toad's name, A, and you follow that arrow.
Then the fourth letter of Toad's name, D,
and you see, ah, there's a boolean here
represented in green that means Toad is
in this data structure. And notice
what's subtle here. It doesn't matter if
there are three names in this trie or three
million names in this trie. How many
steps did it take me to confirm or deny
that Toad is in this trie? One, two,
three, four, which is arguably constant.
Even though the names can vary, at some
point there's no Nintendo name longer
than what, like 10 characters, 20
characters, maybe 30. I mean, there's
some reasonable bound that is finite
where there's never going to be a name
longer than that because Nintendo's
never going to come up with a crazy long
name for a game. And so, you effectively
have constant time for looking up Toad,
Toadette, Tom, Mario, Luigi, Peach,
any of the other names we've looked at.
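One way to sketch such a trie in C follows, assuming names contain only English letters (case is ignored). Instead of marking the letter's slot green as in the drawing, this common variant puts an `is_name` flag in the node reached after the last letter, which costs one extra node per path. The names `new_node`, `trie_insert`, `trie_contains`, and `free_trie` are hypothetical. Lookup does one array index per letter, so its cost depends on the name's length, not on how many names are stored.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>

#define LETTERS 26

// Each node is an array of 26 child pointers (A through Z) plus a flag
// saying whether a name ends here.
typedef struct node
{
    bool is_name;
    struct node *children[LETTERS];
} node;

// Allocate a node whose children are all NULL and whose flag is false.
node *new_node(void)
{
    node *n = malloc(sizeof(node));
    if (n != NULL)
    {
        n->is_name = false;
        for (int i = 0; i < LETTERS; i++)
        {
            n->children[i] = NULL;
        }
    }
    return n;
}

// Follow (creating as needed) one child per letter, then mark the end.
bool trie_insert(node *root, const char *name)
{
    node *cursor = root;
    for (const char *p = name; *p != '\0'; p++)
    {
        int i = toupper((unsigned char) *p) - 'A';
        if (cursor->children[i] == NULL)
        {
            cursor->children[i] = new_node();
            if (cursor->children[i] == NULL)
            {
                return false;
            }
        }
        cursor = cursor->children[i];
    }
    cursor->is_name = true;
    return true;
}

// One step per letter, regardless of how many names are stored.
bool trie_contains(const node *root, const char *name)
{
    const node *cursor = root;
    for (const char *p = name; *p != '\0'; p++)
    {
        int i = toupper((unsigned char) *p) - 'A';
        cursor = cursor->children[i];
        if (cursor == NULL)
        {
            return false;
        }
    }
    return cursor->is_name;
}

// Recursively free every node reachable from n.
void free_trie(node *n)
{
    if (n == NULL)
    {
        return;
    }
    for (int i = 0; i < LETTERS; i++)
    {
        free_trie(n->children[i]);
    }
    free(n);
}
```

Note that "To" is a prefix of Toad but is not found, because no flag is set where it ends.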
So this is to say a trie allows you to
ask questions like is Toad in this data
set or equivalently what is Toad's phone
number in this data set because if you
assume now that each of these pointers
ultimately is not just a bool saying yes
or no but maybe it's an actual person
structure with a name and a number, you
can store even data like that, your
key-value pairs, where your names are
your keys and your phone numbers are
your values. To make this more clear, then,
here is how we might
represent in C each of these nodes.
It's not quite technically just an
array. It's an array of size 26. We'll
call it children because it represents
the children of that node, of type struct
node star. And then here for instance
for simplicity is that person's number.
If we reintroduce numbers and want to
store in this data structure someone's
phone number as well. So using that data
structure and that kind of code, you
can implement a trie using something as
simple as this. Initially your trie is
just a pointer to a node, one such
struct. We can of course initialize it to
null to make clear that there are no names
in here. But each time we allocate a
node, we can then add another node,
another node, hashing on the first, the
second, the third, the fourth, dot dot
dot, the last character in the person's
name, allocating a node as needed,
flipping that boolean to true or false,
or adding their phone number as a char
star to indicate that we have then found
them. And so of all the data structures
we've looked at today, big O of one is
actually achieved with tries. And yet
curiously for problem set five, you're
not going to implement tries, you're
going to implement hash tables, that sort of
Swiss Army knife of data structures that
like every programmer everywhere knows
about. Why? Like why not use tries very
often in practice? Perhaps
you certainly can, but what's the
trade-off perhaps? Yeah,
>> take up too much memory.
>> It's a huge amount of memory. Things
have escalated since the start of class.
We started with one int. Then we
added an int and a pointer, then an int and
two pointers. Now I'm proposing 26
pointers plus a boolean or a data
structure called person. I mean it's
escalating significantly. And the
biggest catch with a trie, as you might
have imagined with Toad and Toadette and Tom
on the screen, is that there's a huge amount of
wasted memory, just as we saw with a hash
function potentially, but that can be
reined in, as you'll explore in the
problem set. With a trie, most of the
pointers in those arrays are just null
and unused and it just tends to result
in you using way more memory to solve
the problem correctly but in a way that
tends to slow the computer down and just
waste more memory than is useful. That
said, just as we started today, there
are stacks in the real world. There are
queues in the real world. There are even
hash tables in the real world, which you'll
indeed implement in code for problem set
five. Has anyone here ever had a salad
from a restaurant called Sweetgreen in
Harvard Square, and also elsewhere in the US?
like not one, two, like two of us, three
of us. Okay, so not hard to imagine
going to such a store, getting in a
queue and staring at a shelf like this
because what Sweetgreen and similar
restaurants do when you order for pickup
is they hash your salad into a shelf
like this. And so literally at Sweetgreen
might you see some wooden shelves
like this. This is the A through E
bucket, the F through J bucket, the K
through N bucket, and the O through Z
bucket, whereby if your name, like Malan,
happens to be in one of those ranges,
they will hash my salad and put it here.
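The shelf amounts to a hash function with four unevenly sized ranges. Here is a small C sketch with a hypothetical `shelf` function, assuming names begin with an English letter. It only models which shelf a name maps to, not what happens when a shelf fills up. The O-through-Z range is presumably so wide because few names start with letters like Q or Z.

```c
#include <ctype.h>

// Hypothetical model of the pickup shelves: 4 buckets by first letter.
// Bucket 0: A-E, 1: F-J, 2: K-N, 3: O-Z.
// Assumes name is non-empty and starts with an English letter.
int shelf(const char *name)
{
    int c = toupper((unsigned char) name[0]);
    if (c <= 'E')
    {
        return 0;
    }
    if (c <= 'J')
    {
        return 1;
    }
    if (c <= 'N')
    {
        return 2;
    }
    return 3;
}
```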
But of course, even in the real world,
there are some constraints. And what can
go wrong with this here hash table system?
Someone who's been there maybe what can
go wrong? Imagine like the extreme lots
of values here. Yeah. So there's no more
space, right? And this has happened
to me in the past, especially at Sweetgreen
before adopting this system. And they
used to put the A's here, the B's here,
the C's here, the D's here and so forth.
And then someone at some point realized
that they were very frequently
overflowing the A's to the B's and the
B's to the C's. No one was using Q
or Z with any frequency. And so they
were sort of wasting space and running
out of space. So at some point they
decided to like literally remove most of
the letters of the alphabet, make the
buckets bigger and fewer. So now it's
very unlikely that you're going to have
so many K's through N's that you
overflow the shelf. But this is in the
real world a data structure like we've
seen today. And so therefore among the
goals, even as arcane as things seem to
be getting with all the pointer notation
and dereferencing this and that, really
all we're doing in code is implementing
realworld solutions that other people
have already come up with and
translating them to a new domain. And
the very last thing you'll do in C this
week is indeed implement your very own
spell checker whereby we'll give you a
very large file of 100,000 plus English
words. you'll have to come up with a
clever and efficient way to load it up
into memory. And we'll give you tools
that will actually measure how fast or
how slow your code is, how much memory
or how little memory your code uses, so as
to actually compare it against not just
your own but perhaps others as well. So
with that said, we'll end a bit early
today. We'll see you next time.
All right, this is CS50 and this is
already week six wherein we transition
away from C to a programming language
called Python. And that's not to say
that the past several weeks haven't been
among the goals of the course. Indeed,
in learning C, I very much think that
you'll have at the end of this class so
much more of a bottom-up understanding
of how computers work, of how
programming languages work. And in
particular, you'll appreciate and
understand better how Python and Java
and C++ and Swift and so many other
languages are actually doing their thing
nowadays. But recall that we started
with Scratch some weeks ago. When in
Scratch, what was nice was that the
first program we wrote, hello world, was
just all too accessible. All you had to
do was interlock two puzzle pieces in
order to make the cat in that case say
hello world. Well, thereafter, of
course, we transitioned to C. And recall
that in week one, we asked you to take
on faith that you can sort of ignore
that first line and a lot of these
parentheses and the curly braces and
really just focus on the essence of the
program, which clearly is still about
hello world and printing it, albeit
using a different function and a bit of new
syntax. Today, very excitingly, all of
that is truly going to go away and be
distilled into a single line of code
when you indeed want to have the
computer say something like hello world.
And this is what we mean by Python being
a higher level language. So, humans over
the decades learned from earlier
designs, earlier programming languages,
what worked well, what did not.
Computers got faster, computers had more
memory, and so you were able to start
spending more of those resources in
order to have the computer do more for
you. And so, you don't need to be as
pedantic syntactically anymore. You
don't need to write as much code anymore
and frankly you can just start solving
problems of interest to you building
products of interest to you so much more
readily by choosing the right tool for
the job. And so in the real world, if you
continue coding after CS50, like,
sometimes C will be the right tool for
the job, sometimes Python will be the
right tool for the job and sometimes
it's going to be a different language
altogether that you'll never have
studied in school and in fact what's
compelling I think about this week six
much like when I took the class back in
the day, is that after CS50, you'll
have a taste of one, two, maybe a few
different programming languages. And
that's going to be enough to bootstrap
yourself and teach yourself new
languages because you're going to start
to recognize in the real world
similarities with past languages that
you've seen, programming paradigms that
are still sort of with us. And the
syntax, yeah, that's invariably going to
change, but that's the stuff that you
are going to Google or ask ChatGPT or
some other AI about down the line. So
long as you know enough of it to sort of
get real work done, you'll focus mostly
ultimately on the ideas and the problems
you want to solve and less on the
syntax. And so among the goals for this
week and this week's problem set and
really the rest of the course is to get
you more comfortable feeling
uncomfortable in front of your keyboard
because we're not going to give you and
tell you everything you need to know for
a language like Python. You're going to
turn to the documentation. You're going
to turn to the duck and you're going to
learn to teach yourself ultimately a new
language. So let's actually write our
first program and compare and contrast
with how we might do that in C. So
recall that in C, for the first couple
of weeks, we were in the habit of doing
make hello, and make, this build
utility, just kind of magically knew to
look for a file called hello.c and
magically create a program called hello,
which you could then run with ./hello.
And then a week or so later, we revealed
that make is really just automating
compilation of your program with the
actual compiler, clang in this case,
passing it command-line arguments like
-o to specify the output file name,
hello, instead of the default, which
recall was a.out; passing in the name of
the file you want to compile; and
linking in any libraries you might want
beyond the standard ones. But then you
could still run it in exactly the same
way. Starting today, when you write
Python code and then want to run it,
you're simply going to run the Python
program itself. So just as clang is a C
compiler, Python is itself not only a
programming language but a program as
well, and with the Python program, which
understands the Python programming
language, you can run code that you'll
have written in a file called hello.py.
And what this program is doing is a
little bit different from what clang is
doing, but we'll see that difference
before long. But first, let me go over
to VS Code and write our simplest, our
first, of Python programs by running
code hello.py. And then in this file,
without any includes, any int main
voids, I'm simply going to say print,
quote unquote, hello, world, close quote.
All right. Now I'm not going to do make.
I'm instead just going to do Python of
hello.py. Cross my fingers as always and
voila, my first program in Python. So
it's sort of obvious that we got rid of
the hash include. We got rid of the
int main void. No curly braces. Only a
couple of parentheses here. But what
else is different to your eyes that's a
little more subtle here versus C. Yeah.
>> Yeah. So there's no f. So the print
function is a little more human
friendly. It's print instead of printf,
where the f did mean formatted, but
we'll see that we still have that
functionality.
>> No need for the line break.
>> So no need for the line break,
specifically the backslash n. And yet
here's my cursor on the next line. So I
dare say humans over the years realized
we more commonly want a new line than
not. And so they made
the default actually give it to you
automatically. And there's one more
detail. Yeah.
>> No semicolon.
>> So there's no semicolon. So, I finished
my thought at the end of the line, but I
didn't need to explicitly terminate it
with a semicolon. So that's a lot of
salient differences with just one
program, but I'd argue that we got
rid of all of the annoying stuff thus
far anyway, so we can really focus on
what this program itself is doing. But
what's exciting with Python, too, is just
how quickly you can solve certain
problems. And this isn't true of just
Python. It's really any higher level
language than C. In fact, just for fun,
let me go ahead and implement problem
set 5, wherein you're challenged with
implementing the fastest spell checker
possible. So let me go back here to VS
Code. Let's close out hello.py and clear
my terminal window. And let me go ahead
and do this. Let me first split my
terminal by clicking this rectangular
icon over here. And that's going to give
me two terminal windows now left and
right. Because in the first one at left,
I'm going to cd into a directory I came
with today, which is the staff's
solution to problem set 5's spell
checker in C. And on the right-hand side
here, I'm going to cd into another
directory I brought with me today called
Python, inside of which is a translation
of problem set 5 into Python. In
particular, I've implemented in advance
a speller.py file, which is the analog in
Python of speller.c in C. And I've also
prepared a dictionary.py file.
Unfortunately, if we open up
dictionary.py,
you'll see that it's not actually
implemented yet. So in dictionary.py,
let's implement in Python problem set
5 and see how long it takes. Well,
the first thing I'm going to do is
declare a global variable. We'll call it
words. And set that equal to the return
value of a Python function called set,
which essentially gives me a set object,
wherein I can store a whole bunch of
words without duplicates. Python's going
to manage all of that for me. In effect,
it's going to implement what I needed to
implement myself in problem set 5, a
hash table. Now, down here, I'm going to
go ahead and define a function called
check. Pass in as input a parameter
called word because, of course, that's
how it was implemented in C. But notice
a difference already. In Python, we use
a new keyword called def to define a
function. And we don't have to specify
the type of the variable being passed in
word in this case. And we also don't
have to specify a return type for the
function. Now, inside of this check
function, it suffices to do this. I'm
going to return word.lower() in words,
which is effectively a Boolean
expression asking, is the lowercase
version of this word in the set? If so,
return True. Otherwise, return False.
Done with the check function. Now let's
go ahead and define another function
called load, which recall took an
argument of the dictionary that you want
to load into memory. And let's go ahead
now and do this: with open(dictionary)
as file, which effectively opens the
dictionary. As in C we used fopen, in
Python we use open, and this gives it a
variable name of file. Then once that
file is open, I'm going to go ahead and
update that entire set of words, which
starts out empty, by taking the file,
reading the entire contents top to
bottom, left to right, and splitting all
of the lines therein on the newlines
that terminate each of the strings,
effectively updating the set with every
word in that dictionary. Then I'm going
to assume that it all just worked,
because there's a lot less for me to do
myself in Python, and I'm just going to
go ahead and return True, capital T in
Python. Done. Next, let's
go ahead and define that other function
from problem set 5, size, whose purpose
in life was to tell me the size of the
dictionary I had loaded. Well, in
Python, that's pretty easy. I can just
return the length, or len for short, of
the set in which I've stored all those
words. Done. And then lastly, I'm going
to go ahead and define an unload
function, which recall was responsible
for freeing any memory I myself had
allocated. I don't seem to have done any
of that in Python. In fact, that's
managed for me now. So, I'm going to go
ahead and simply say return true because
there's no work to be done. And that's
it. In like 19 lines of code in Python,
most of which are blank lines, I claim I
have reimplemented problem set 5 in
Python. Well, let's take a look now at
the difference. I'm going to go ahead
and reopen my terminal window, and I'm
going to go ahead and maximize it so we
can see more output. And now I'm going
to go ahead and run Python, which is
going to be not only the name of the
language, but the name of the program we
use today to start running our Python
code. And I'm going to run it on
speller.py, which I brought with me
today, specifically on the largest of
problem set 5's files, holmes.txt. Enter.
And as with problem set 5 itself, we'll
see a whole bunch of misspelled words
being printed to the screen. Some of
which might very well be misspelled.
Some of which are just not in the
dictionary. Some of which are simply
possessives of words that are in the
dictionary. But at the very end of this
output, I should see not only how many
words were found, but the total time
involved, which appears to be 1.87
seconds. Not bad, seeing as it only took
me like what, a minute or two to write
the actual code. But there is going to
be a trade-off, as we'll see, even though
it took me much less human time and
was arguably a lot easier to implement
this spell checker in Python than, I
dare say, it was for most everyone in C.
Let's see what that trade-off might be.
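Pieced together from the narration above, dictionary.py comes out roughly as follows. This is a reconstruction, not the staff's actual file; the docstrings and comments are additions. The built-in set does the hashing that problem set 5 asked you to implement by hand.

```python
# dictionary.py: a reconstruction of the Python spell-checker dictionary described above
words = set()  # hash-based, so membership tests are fast on average; no duplicates


def check(word):
    """Return True if the lowercase form of word is in the dictionary."""
    return word.lower() in words


def load(dictionary):
    """Add every line of the dictionary file to the set."""
    with open(dictionary) as file:
        words.update(file.read().splitlines())
    return True


def size():
    """Return the number of words loaded."""
    return len(words)


def unload():
    """Nothing to free by hand: Python manages this memory for us."""
    return True
```

The speller.py driver (not shown) then calls load, check, size, and unload in turn, just as the C version did.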
Over in my left-hand terminal window, in
which I'm in the C directory that I
brought with me as the staff solution in
C to problem set 5, let's go ahead and
make that spell checker. Then let's go
ahead and do ./speller and run it on
the same file, holmes.txt, and see how
long the C implementation takes. Enter.
And we see some of the same output,
which might be slower sometimes just
because of the cloud. There. Total
time spent in the CPU, not necessarily
printing everything to the screen, which
might take longer, is only 1.32 seconds
versus the 1.87 seconds in Python. Now,
while only half a second, that's a
decent percentage of the total amount of
time spent running the spell checker in
each of the windows. And so, that alone
seems to be one of the trade-offs. Even
though it seems to be much faster and,
dare I say, easier to solve a problem
in Python, there are going to be
trade-offs insofar as the code might
very well run slower. And as we'll see
today, that's in large part because C
is, of course, compiled. That's why I
ran make and, in turn, clang. And then
the zeros and ones, the so-called
machine code, is what you're running. In
Python, generally, the computer is
interpreting your code, essentially
reading it top to bottom, left to right,
much like a human in between two other
humans might slowly translate one spoken
language to the other if those two
people don't in fact speak the same
language themselves. So there's a bit of
overhead when using Python, but I will
say that the Python community has been
working on this problem for some time.
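To make the interpreter's work a little more concrete: CPython, the reference implementation behind the python command, first compiles source code into bytecode and then executes that bytecode. The standard library can expose this step. Below is a minimal sketch using the built-in compile() and the dis module; the source string and the "hello.py" filename are made up for illustration.

```python
import dis

source = 'print("hello, world")'

# compile() turns source text into a code object containing CPython bytecode
code = compile(source, "hello.py", "exec")

# dis.dis() prints a human-readable listing of those bytecode instructions
dis.dis(code)

# Instruction names vary between Python versions, so just collect whatever is there
opnames = [instruction.opname for instruction in dis.get_instructions(code)]

# exec() runs the already-compiled bytecode, which prints: hello, world
exec(code)
```

The exact opcodes differ across Python versions, which is why the sketch does not depend on any particular one.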
And so in general, it's not necessarily
going to be as significant a trade-off
because there are certain tricks we can
do. And in fact, underneath the hood,
what the Python language and the
specific interpreter you're using can
technically do for you is semi-secretly
compile your code into something
called bytecode and then run that
bytecode, which is more efficient than
actually reinterpreting it again and
again. But we'll see more of this over
time. For now, let's take a look at
maybe two other problems that we might
solve, dare I say, more easily and more
quickly than we could have in C for
problem set 4. Let me go ahead and
shrink down my terminal window here,
close out dictionary.py, close one of my
terminal windows, and cd back to my main
directory. And let's go ahead and open
up that bridge bitmap photograph that
we used in problem set 4, to which we had
to apply a number of Instagram-like
filters. Well, now let's go
ahead and implement maybe one of those
filters, the blur filter, whose purpose
in life is just to blur this image.
Well, let's see how long this takes. Let
me go ahead and open up, say, blur.py,
which is now going to be a Python
program for blurring images. It's empty
initially, but I can pretty much write
this quite quickly. Now, let me go ahead
and, at the top of this file, write the
Python keyword from, then PIL, for Python
Imaging Library, then import an object
called Image and another one called
ImageFilter: two features of the Python
Imaging Library that are going to make
this so much easier to actually solve.
And then let's go ahead and define a
variable. We'll call it before,
representing the before version of this
image, and set that equal to Image.open,
quote unquote, bridge.bmp, where that of
course is the name of the file we want
to blur. Then let's go ahead and create
a variable called after, representing
the after version of this same image,
and set that equal to before.filter,
open parenthesis, ImageFilter.BoxBlur.
And then, just to be a little dramatic,
I'm going to blur it more so than you
needed to in problem set 4, so we'll see
it more visibly now on the screen. Let's
do an argument of 10. And then at the
very end of this process, let's do
after.save and save it in a file called,
say, out.bmp.
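Assembled from the narration, blur.py amounts to opening the image, filtering it, and saving the result. The sketch below assumes the Pillow package (which provides PIL) is installed. Splitting the work into the helpers box_blur and blur_file is an addition, and both names are hypothetical, so that the filter can also be tried on an in-memory image rather than only on bridge.bmp.

```python
from PIL import Image, ImageFilter


def box_blur(image, radius=10):
    """Return a blurred copy: each pixel becomes the average of a box around it."""
    return image.filter(ImageFilter.BoxBlur(radius))


def blur_file(infile="bridge.bmp", outfile="out.bmp", radius=10):
    """Hypothetical helper mirroring the script: open, blur, save."""
    before = Image.open(infile)
    after = box_blur(before, radius)
    after.save(outfile)
    return after
```

With bridge.bmp in the current directory, calling blur_file() should write out.bmp; a radius of 10 matches the deliberately exaggerated blur.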
Done. So in just four lines of code, I
claim I've implemented in Python the
blur function that we did previously
in C. Let me open my terminal window.
Let me run the Python command this time
on blur.py. Cross my fingers as always.
And indeed, I've made a mistake. Perhaps
even if you've never written Python
before, you can see it. And in fact,
we'll see a number of these errors. Some
intentional, some unintentional. But on
line four, what I intended to do was set
the variable I created, called after,
equal to before.filter. All right,
that's all right. Let's go back down to
my terminal window, clear it to get rid
of all that, and rerun python of
blur.py. Cross my fingers even harder
this time. Nothing bad seems to be
happening indeed. Now, let's go ahead
and open up out.bmp. And before we
reveal that, let's go back to the
original, which is bridge.bmp. And
now dramatically, let's see the blurred
version thereof.
Voila. Hopefully to your eyes, too. It
looks quite a bit blurry. Well, how
about one more flourish? Those of you
who were feeling more comfortable last
week perhaps implemented edge detection
in C. Well, let's see if
we can whip that up quite quickly, too.
Let's go ahead and write a file called
edges.py using that same bridge.bmp
file. And in this file, let's go ahead
and do the following. As before, from
the Python Imaging Library, let's import
the Image feature and the ImageFilter
feature. Then, as before, let's
create a variable called before. Set it
equal to Image.open, passing in
bridge.bmp. So far, the same as
before. Now, let's create a variable
called after. Set it equal to
before.filter, passing in this time
ImageFilter.FIND_EDGES, which is
different from BoxBlur. And by
definition, it's going to find the
edges of this image. And then, as
before, let's do after.save of out.bmp
and just clobber the blurred version of
the file that we just created. All
right, that's it. Let's go ahead and
open up my terminal window now. Let's go
ahead and again run Python, but this
time on edges.py. Cross my fingers real
hard. So far so good. And that was quite
fast. Recall that the bridge.bmp image
looked like this. But now when we open
up this new and improved version of
out.bmp, thanks to Python, in just four
lines of code, we now have all of our
edges detected.
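The edges.py script differs from blur.py only in the filter it hands to filter(). Again this assumes Pillow is installed, and find_edges and edges_file are hypothetical helper names.

```python
from PIL import Image, ImageFilter


def find_edges(image):
    """Return a copy of image filtered with Pillow's built-in FIND_EDGES kernel."""
    return image.filter(ImageFilter.FIND_EDGES)


def edges_file(infile="bridge.bmp", outfile="out.bmp"):
    """Hypothetical helper mirroring the script; overwrites the blurred out.bmp."""
    before = Image.open(infile)
    after = find_edges(before)
    after.save(outfile)
    return after
```

FIND_EDGES is a small 3x3 convolution kernel, so flat regions come out dark and sharp changes in brightness come out bright. That is similar in spirit to the Sobel-based approach in problem set 4, though it is not the same kernel.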
So, what can we then learn from C
itself? Well, C had, of course,
functions. And functions were those
actions or verbs that simply got work
done. And let's go ahead and compare
side by side, much like we did with
Scratch and C, the ideas that, from
today onward, are still going to be the
same, and how they translate to Python.
So, on the left here, we'll now have our
friend Scratch. This, of course, was one
of the first puzzle pieces we saw. It's
the purple say puzzle piece, and it
was a function insofar as it said the
value of its argument, which in this
case is hello world. Well, we've already
seen in Python what this looks like. It
looks similar to the version in C, but
it's no longer printf. There's no longer a
semicolon and there's no longer an
explicit new line. So in Python, it's
quite simply this. Meanwhile, in Python,
there are a whole bunch of libraries as
well. Now, in C we simply had header
files, and those header files give you
access to the prototypes, that is, the
signatures, of the functions that you
want to use from those libraries. Python
uses somewhat different vernacular,
whereby Python has what are called
modules and packages, and a package is
just a collection of modules. But a
module is just a library, in Python
speak, so to speak. So, anytime you hear
someone discussing a module or a package
in Python, they're just talking about
using a library. And that library might
come with the language itself, just
built in as standard, or it might be a
third-party library that you might
download and install yourself, much like
I did a few weeks back when we installed
the cowsay program so that I could
actually have a cow or other animals on
the screen displaying text. So, in C
recall, we had something like this
#include <cs50.h>, which was the header
file pre-installed for you somewhere.
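For comparison, Python's counterpart to #include is the import statement, which comes in two common styles. The sketch below uses the standard library's math module purely for illustration; it isn't part of the lecture. In both styles Python loads the whole module. The from form just controls which names you can refer to directly.

```python
# Style 1: import the whole module and refer to names through it
import math

# Style 2: import just the name(s) you want into your own namespace
from math import sqrt

# Both names refer to the very same function object
roots = (math.sqrt(16), sqrt(16))
print(roots)  # (4.0, 4.0)
```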
But we will have, for at least this
week, an analog of the CS50 library for
C in Python, too, just to make this
transition from C to Python a bit
easier. These, though, are meant to be
training wheels that you can take off
and should take off, you know, even
within a week or so. It's just meant to
smooth that transition and make clear
what's the same and what's different.
So in the CS50 library for Python, we
also have a function called get_string
whose purpose in life is to get a
string. To access it, though, you don't
use #include <cs50.h>. That's a C thing.
In Python, you would say from cs50
import get_string. It's a little more
verbose, but it's also a little more
precise as to what you want from the
library, especially if you only need a
piece of it. So here, for instance, is
now a Scratch program that was a little
more interesting than just printing out
hello world. This was the first program
we wrote that actually got some user
input. So in fact, let me go back to VS
Code and let's see if we can't resurrect
this C program real quickly in the form
of a new hello.c. So I'm going to run
code hello.c, and then in my code tab
I'm going to do #include <cs50.h>,
#include <stdio.h>, and then below
that I'm going to go ahead and whip up
our familiar version of int main(void),
and then inside the curly braces
we'll bring back string, even though we
now know it's char *. We'll call our
variable answer and set it equal to
get_string, asking the user, quote
unquote, what's your name, with a space
just to move the cursor over. I still
need my semicolon in C. And then after
that, recall that back in week one, we
did hello, percent s, backslash n, and
then plugged in the variable answer so
as to see hello, David, hello, Kelly, or
something else. Just to be safe, let me
do make hello. All is well so far.
./hello, type my name. And this version
in C seems to be working. Okay, so in C,
these lines of code here translate
pretty literally to what we just saw,
although we got the answer variable in
Scratch for free. That blue puzzle piece
just existed without us having to create
it. But it's a decent number of hoops to
jump through in order to just get user
input and print it out. Well, in Python,
this is going to get a little more
succinct in that the Python version of
this code is now going to look like
this. printf is now print. The
semicolons are gone. And what else seems
a little bit different?
Yeah.
>> I don't need any placeholders.
>> Yeah. So,
we don't need the percent s anymore. In
fact, I'm curiously using a plus, which
if some of you studied Java or some
other language, you might have actually
seen this before. Even if you've never
seen Python before, you've only seen C
in CS50, you can probably guess what the
plus is doing. Even if you don't know
the technical vocab, what is the
plus probably doing here?
Yeah. So, it's concatenating or joining
together the thing on the left with the
thing on the right. And we actually had
that vernacular in the world of Scratch.
We had the join puzzle piece that joins
hello, space and the value inside of
answer. A plus in Python can do exactly
the same thing. So it's a little more
user friendly than having to anticipate,
oh, let's put the placeholder here and
then come back later and plug in the
variable. Humans over time just realized
that it's a lot easier to sort of do
this in this way than bother with
placeholders. Though you can still use
placeholders for other purposes. Another
subtle difference between the C and
Python version of these two lines.
More subtle than that.
What's missing?
Yeah, in back.
>> Uh, so the backslash n is again gone for
Python. So that sort of happens for free
indeed. And one more difference.
>> You don't need to declare the type of
answer.
>> Yeah, we don't need to declare the type
of answer. Recall that if we rewind in
the C version, you needed to tell the
compiler that this is a string. And last
week, we could have changed string to
char star, but we still had to tell the
compiler what data type we're putting
into that variable. In Python, we can
now get rid of that data type. And
Python will just figure it out from
context. If get_string returns a string,
well then obviously the variable should
store a string. If a function returns an
int, well then obviously the variable
should store an int. And the language is
just doing more of that decision-making
for you just to save you time and save
you thought. There's a subtlety here
though where we can make this program a
little bit different. In fact, let's
whip it up first in Python. Let me go
back to VS Code here. Clear my terminal
and let's go ahead and create a program
again called hello.py. That'll open up
my previous version thereof. And just so
we can see these things side by side,
I'm going to drag that tab over to the
right of VS Code and let go. And now you
can see the C version still on the left
and the Python version at the right.
What I'm going to do here now in my
Python version is change it to be quite
like the version in C now at left. So as
promised, I'm going to do from cs50
import get_string. Then below that I'm
going to say simply answer equals
get_string, quote unquote, what's your
name, question mark, space, no
semicolon. But, whoops, I need the
parentheses. Then on the next line, I'm
going to do print, quote unquote, hello,
comma, space, close quote, plus answer.
Down here, I'm going to go ahead and run
Python of hello.py again. No
compilation step. I'm just going to
interpret it line by line. What's my
name? David. And it seems now to work
exactly the same. Now, it turns out in
Python there are even more ways to solve
problems like this, even trivial
problems like this. So here we're using
the plus sign not as addition per se,
but as the concatenation operator, the
join operation. If you want, though, you
can take advantage of the fact that
print in Python can take more than one
argument (it can take two or three or
four or even zero) by simply changing
the plus to a comma, getting rid of that
seemingly superfluous space, and just
giving print two things to print,
because it turns out, per the
documentation of print, which we'll
eventually see, that when it's given two
or more arguments, it by default
separates them for you with a single
space, and that's something we can
override as well. Which one is better?
I don't know. They're sort of
equivalent. It's such a trivial
difference, but it speaks to the
flexibility that you'll start to have,
whereby the language is a little less
rigid than C was, certainly when it
comes to printing strings. So in fact, if I go
back to VS Code here and I go ahead and
change that plus to a comma and get rid
of the space inside of the quotes, I can
rerun Python of hello.py, type in my
name, and we see exactly the same result
there. But we can take this one step
further. Even though it's going to look
a little cryptic, this is sort of the
more Pythonic way to do things. And that,
too, is actually a term of art: to do
something Pythonically is to do it the
way that most Python programmers would
do it. It's not the only way. It's not
necessarily the right way, but it's sort
of the recommended way in the community.
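The difference between concatenating and passing separate arguments, along with the overridable separator, can be seen side by side. This sketch prints into an in-memory buffer through print's file parameter only so the results can be compared (that parameter is an addition here, not part of the demo), and it uses a hard-coded name in place of real user input.

```python
import io

answer = "David"        # hypothetical stand-in for what the user typed
buffer = io.StringIO()  # collects printed text instead of sending it to the terminal

print("hello, " + answer, file=buffer)        # one argument: a concatenated string
print("hello,", answer, file=buffer)          # two arguments: print inserts sep=" " between them
print("hello,", answer, sep="", file=buffer)  # overriding sep removes that space

lines = buffer.getvalue().splitlines()
print(lines)  # ['hello, David', 'hello, David', 'hello,David']
```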
So here we have that latest version
where I'm passing two arguments to
print. The first is quote unquote hello,
and then the second of which is the
value of answer. I could similarly write
this same program with this crazy
syntax. Takes a little getting used to,
but it turns out it's actually kind of
nice overall. What's obviously
different? Well, one, these weird curly
braces are back. They're not
part of the logic of the program.
They're literally inside of the double
quotes. But you can probably guess
what this does for me, because
there's one other crucial difference.
What else has changed between before and
after?
Yeah, there's this weird f, which is not
part of printf. It's actually inside of
the parentheses and next to the double
quotes. And even this one, when it came
out, was a little weird looking to
people. But this is how you get this
thing to be a formatted string, aka an
f-string, as opposed to it being just a
literal string of text. Now, you can
probably guess what it means to put the
variable's name inside of the curly
braces. It means the value of that
variable is going to be substituted
right there. Similar in spirit to the
percent s in C, but a little more
explicit. With the percent s, you had to
remember that that percent s corresponds
to this variable's value or something
like that, which was just annoying, if
anything. But this
time you have a placeholder in curly
braces that just says what you want
there, that particular value. And what
this means more technically is that the
answer variable will be interpolated by
the interpreter which means its value
will be plugged in right there. So let's
try this. Let me go back over to VS Code
and quite simply on my last line of code
here, let's change the input to print to
be quote unquote hello, and then curly
brace, answer,
then close curly brace, close quote. And
I've done this. This is intentional, but
let's see. Let me go ahead and rerun
Python of hello.py, D-A-V-I-D. What are we
about to see? hello,
{answer}. So this is a bug, but just to
demonstrate what is going on and
what's therefore missing, what did
I forget? Yeah.
>> Yeah, I didn't declare that this is a
so-called f-string or format string. The
fix for this, weirdly, is just to put an
f right there. And now if I rerun Python
of hello.py, type in my name again, and
cross my fingers, now I see that the
variable has indeed been interpolated
and its value plugged in where I wanted
it. All right. It turns out we can take off
one of these training wheels already. I
propose that get_string just exists in
the library just to smooth the
transition, but honestly it's not really
doing anything all that interesting. So
let's take this first training wheel
off. It turns out that Python comes with
a function appropriately named input,
such that if you want to get input from
the human via their keyboard, you can
just use the input function. So we can
already, for this program, get rid of the
CS50 library, because input essentially
behaves just like the get_string
function. So if I go back to my Python
version here, I can change get_string
to input. And I can even go and
delete this training wheel up there.
Rerun Python of hello.py in my
terminal. D-A-V-I-D, Enter, and we're still
in business as well. So input is
generally going to be the way you go
about getting input now from the user.
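Putting the pieces together, the final hello.py reads a name with input and greets the user with an f-string. The sketch below hard-codes the name so it can run without a keyboard; in the demo, answer comes from input("What's your name? "). It also reproduces the missing-f bug from a moment ago.

```python
# In the demo: answer = input("What's your name? ")
answer = "David"  # hard-coded; like input()'s return value, it's a str, with no type declared

plain = "hello, {answer}"       # no f prefix: the braces are just literal characters
formatted = f"hello, {answer}"  # f-string: the value of answer is interpolated

print(plain)      # hello, {answer}
print(formatted)  # hello, David
```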
All right, let me pause here and see if
there's any questions as we try to
bridge these two worlds from C to
Python. Yeah,
>> So in Python, we don't need the main
function. And why is that?
>> Good question. In Python, why don't we
need the main function anymore? Because
clearly that's been omnipresent in, like,
every program we've written thus far,
and here it's absent from all of our
Python programs thus far. It turns out
that humans realized it's just so common
that you want the file you're editing to
be the main part of your program. Like,
why bother adding the additional syntax
of saying int main(void) or something
analogous? It's just easier if you want
to write two lines of code to get some
work done. Why do I have to waste my
time adding all of this boilerplate
code, which we've been doing up until
now? Now, that said, we're going to
bring back main in a little bit,
because it will solve a problem. But
generally speaking, what I'm doing here
is indeed a program, but people in the
real world would also call these
scripts, where a script is like a
lightweight program that pretty much
just reads top to bottom, left to right.
It's really synonymous with writing a
program, but this is again one of the
appeals of a language like Python. You
can just get right in and get out and
get the job done. Even Java has moved in
this direction in recent years, whereby
you don't have to wrap everything in a
class with public static void main, for
those familiar. You can just write
System.out.println and get some work
done.
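Since main is coming back later, here is a hedged preview of what wrapping the same greeting in a function looks like. Python still just runs the file top to bottom: defining main does nothing by itself, so the last line has to call it. The hard-coded name is a hypothetical stand-in for input().

```python
def main():
    # Hypothetical stand-in for: name = input("What's your name? ")
    name = "David"
    print(f"hello, {name}")


# Unlike C, nothing calls main automatically; this line does.
main()
```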
>> Yeah.
>> Is input only for string?
>> Good question. Is input only for a
string? Yes. Right now it will get input
from the user via their keyboard and
you'll get back a string, just like
get_string. And we'll come back to why
that's maybe not a good thing. All
right. So what more might we want to
do at this point? Well, let's tease
apart some differences now with C. So up
until now, every argument we've ever
passed into a function in C and Scratch
for that matter is a so-called
positional parameter. And a parameter is
the same thing as an argument, but
generally when you're looking at the
function from the functions perspective,
it's a parameter that it accepts. But
when you're calling the function and
passing in an input, you call it
typically an argument, but they refer to
essentially the same thing. And all of
the parameters we've been passing into
functions thus far have been positional
in the sense that the order matters: the
first thing, then the second thing, then
the third thing, and so forth. For
instance, with printf, the first thing
has to be the quoted string, maybe with
a placeholder, and then if there's
another argument after the comma, that
can be the second argument, the third
argument, and so forth. But it turns out
Python additionally supports what are
called named parameters, whereby you
don't have to rely only on the order in
which you're enumerating the arguments
to a function. And that's helpful
because some functions, especially in
the real world, when you start using
other people's libraries that have lots
of functionality, they might not take
just one or two arguments. They might
take four arguments, 10 arguments, maybe
even more. And it can just be unwieldy
to have to remember the precise order of
all those arguments. You're just asking
for trouble: you're going to screw up,
or a colleague is going to get the order
out of whack. So with named
parameters, you can actually be explicit
with Python and tell it what argument
you are trying to pass in by giving it
an actual name. So let me go over to VS
Code here and propose that we use this
for really the simplest of programs in
order to override that default new line
that we seem to be getting for free just
by calling print. In other words, let me
go ahead here and clear my terminal
window. Let me close hello.c and focus only
on hello.py for just a moment. And let's
make it much simpler, like the very first
version, and just print out, using
Python's print function, not printf,
quote unquote, hello world, close quote.
And now here I'm going to do Python of
hello.py. Enter. And we still see that
the cursor moves to the next line. The
dollar sign moves to the next line
because I'm automatically getting a new
line. Well, what if you don't want that?
How can you override that behavior?
Well, you can actually use a named
parameter in Python. And I can go up
here and add a second argument. If
it were just something like "this",
that would literally print out the word
this, because it's just another string.
But if I give it a name like end equals
quote unquote, I can override the
default behavior of the Python print
function by changing the value of its
end parameter to be the so-called empty
string, quote unquote, which means
literally there's nothing there. Watch
what happens now. If I run Python of
hello.py and hit enter, the dollar sign
is weirdly, and sort of in an ugly way,
on the same line, just like it was when
I made the mistake in C in week one of
omitting the backslash n.
That is to say, the default value
of this end parameter really is quote
unquote backslash n. And I can make it
explicit by changing my code as such.
I'm going to go ahead and rerun python
of hello.py. And now the cursor is back
on the next line. Now, not that this is
that useful, other than overriding that
default, but you could do fun things
like exclamation point, exclamation
point, exclamation point if you really
want print to be excited to print some
things for you. And if I now run Python
of hello.py a third time, now you see
that it's ending with exclamation point,
exclamation point, exclamation point.
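A minimal sketch of the end named parameter described above. So that every character is visible, it assumes we send print's output to an in-memory io.StringIO buffer through print's file parameter (a detail not shown in the lecture) instead of the terminal, then display the collected text with repr.

```python
import io

# Collect print()'s output in memory so we can inspect every character.
buffer = io.StringIO()

print("hello, world", file=buffer)               # default end="\n"
print("hello, world", end="", file=buffer)       # no trailing newline
print("hello, world", end="!!!\n", file=buffer)  # custom ending, then a newline

output = buffer.getvalue()
print(repr(output))  # 'hello, world\nhello, worldhello, world!!!\n'
```

The second call runs straight into the third because end="" suppressed the newline, which is the same effect as the dollar sign landing on the same line in the terminal.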
It looks a little stupid with the dollar
sign, so you could even toss in a new
line there. Run it yet again, and now we
sort of get both of those. But I
would say the common case is to use that
end named parameter simply to
override it. So how do you learn more
about these kinds of things? Well, you
go to the official documentation for
Python, which is more of a thing than
it is with C. If you want to learn more
about Python and the functions it offers
and the arguments it takes, you go to
the official documentation at
docs.python.org. This is essentially
analogous to the so-called manual pages,
or man pages, that CS50 has a version of,
but there is no one de facto source for
those man pages. Several different
versions of them exist in the wild,
whereas Python itself, as a community,
maintains its own official
documentation. So for instance, if you
go to a specific URL like this ending in
functions.html, you'll see an exhaustive
list of all of the functions that come
with Python besides just the print
function, and we'll see a bunch more
today. If specifically you scroll down
to the print documentation, you'll
see something that's a little arcane
that looks like this. But this is
representative of a Python prototype, if
you will, often also called a signature
that just tells you the name of a
function and then how many and what type
of arguments it takes. So how to read
this? Well, the print function takes
some number of objects. In Python
specifically, this syntax of *objects
just means zero or more objects, whatever
they are, like a number or a string or
something else: the stuff you want to
print out. After that, if you start using
named parameters, you can specify what
the default separator is: the separator
between arguments to print. So, recall
that when I printed quote unquote hello,
comma, followed by answer, the two were
separated automatically for us by a
single space, even without my hitting
the space bar inside of my quotes.
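Here is a small sketch of the sep named parameter from the print signature. The value of answer is a hypothetical stand-in for what the user typed, and, as before, output goes to an io.StringIO buffer via print's file parameter purely so it can be inspected.

```python
import io

answer = "David"  # hypothetical value standing in for the user's input

buffer = io.StringIO()
print("hello,", answer, file=buffer)          # default sep=" " inserts a space
print("hello,", answer, sep="", file=buffer)  # sep="" inserts nothing
print("x", "y", "z", sep=", ", file=buffer)   # any string can separate arguments

print(buffer.getvalue(), end="")
```

Running it should show hello, David, then hello,David, then x, y, z, one per line, since end still defaults to a newline.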
That's because the default value of sep is
in fact a single space. The default
value for end, as promised, is indeed
backslash n. And then there's some other
stuff related to file I/O that print can
also deal with, but more on that perhaps
another time. There's one curiosity
here. In Python, it turns out that you
can use double quotes or single quotes
around strings, whereas in C, it was much
more regimented: double quotes are for
strings, and single quotes are for
chars, that is, single
characters only. It doesn't matter in Python
which one you use so long as you're
consistent. And stylistically, you
should really pick one and go with it.
And the only time you should really
alternate between the two is maybe if
you want to put an apostrophe, like in
some human's name, inside of double quotes
instead of single quotes, or something
like that. But generally you have a
little more flexibility in Python. And
you'll see different conventions in different
communities. The Python community tends to use
single quotes, at least in the documentation. The
JavaScript world tends to use single
quotes. We in CS50 often use double
quotes just for consistency with what we
do in C. But any community or company
would typically have its own style guide
that dictates which one you should use,
if only for consistency.
Questions, then, on this here print
function,
as just representative of all of the
docs that you'll see?
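To make the quoting rules concrete, here is a short sketch. The name O'Brien is just a hypothetical example of a human's name containing an apostrophe.

```python
# Single and double quotes create exactly the same kind of str.
single = 'hello, world'
double = "hello, world"
print(single == double)  # True

# Double quotes let an apostrophe appear without escaping it
# ("O'Brien" is a hypothetical name).
name = "O'Brien"
escaped = 'O\'Brien'     # the single-quoted version needs a backslash
print(name == escaped)   # True

# There is no separate char type: 'c' is just a str of length 1.
letter = 'c'
print(type(letter).__name__, len(letter))  # str 1
```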
All right. Well, let's take a quick look
at variables. We've used these a few
times already, but let's focus in a
little more detail on what's actually
different. In Scratch, if you wanted to
create a variable called counter and set
it equal to zero, you would use this
orange puzzle piece here. In C, you
would do something like this: the type
of the variable, the name of the
variable, and then set it equal to the
initial value, semicolon. In Python, it's
going to be a little similar, but you
can probably guess where we're going
with this. How is this line of code
probably about to change? Yeah?
>> Good. We're not going to bother with int
or the data type more generally. We're
just going to say counter, because
a smart interpreter can just figure
it out from context that you're putting
a zero in there. It's obviously an
integer. And what else is about to go
away? The semicolon. So this is the C
version. And voila, this now is the
Python version. And as silly as
this example is, it's kind of
representative of how languages like
Python just tend to be a little more
programmer friendly, because you just
type less and get the same work done.
All right. So if we wanted to do
something now in Scratch like increment
the counter by one, you would use this
puzzle piece here. In C, we could do
something like this. In Python, it's
going to be almost exactly the same
except of course no semicolon. In C, we
could alternatively do this. And you can
also do this in Python. In C, though,
you could also use what other technique?
>> ++. I'm sorry, but Python has taken
that away from us. So if you got into
the habit of using ++ or --,
that's great. Use them in C all you
want. In Python, they just don't exist.
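The three counter snippets above, written out as a runnable sketch:

```python
counter = 0            # no type declaration, no semicolon

counter = counter + 1  # long form, as in C minus the semicolon
counter += 1           # shorthand form, also valid in Python
# counter++ would be a SyntaxError: Python has no ++ or -- operators.

print(counter)  # 2
```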
So you'll see this += form more commonly
instead. All right. What about
the various types that exist in Python?
Because even though you don't have to
specify the types when declaring your
variables, they do in fact actually
exist underneath the hood. And it's
worth knowing a little something about
them because not knowing will often lead
to some form of bug. So in C, we had
types like bool, char, double,
float, int, long, and string, the last
of which was thanks to the CS50 library.
As of week four, we started
calling a string char * instead,
which is still a data type: the
address of some char. In Python, we're
going to whittle this list down to
essentially a subset of those, whereby we
still have bools, we still have floats, we
still have ints, and we do have strings,
but they're literally called strs, s-t-r.
So it's not a CS50 thing: the Python
community calls strings str. But absent
from this list is any mention of *,
not to mention char *. There are no
pointers in Python. And indeed, as
powerful as I hope you found weeks
four and five to be, I dare say you also
found them incredibly frustrating and
challenging and apt to yield bugs in
your code, because with that power of
memory management comes a whole slew of
potential mistakes that you can make.
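Even without declarations or pointers, every Python value still has one of the types named above. This sketch uses the built-in type function to reveal them; the variable names and values are arbitrary examples.

```python
# No type declarations, yet every value still has a type.
agreed = True
price = 3.5
count = 50
name = "CS50"

for value in [agreed, price, count, name]:
    print(repr(value), type(value).__name__)
```

This should print True bool, 3.5 float, 50 int, and 'CS50' str on separate lines.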
And that's true not just for CS50
students, but for programmers, adult
programmers, full-time programmers
around the world. And so among the other
features of languages like Python is that
they try to take away certain features
of languages like C that were just too
dangerous in the first place. Those features might be
wonderfully powerful and might help you
solve problems more quickly and more
precisely, but if they tend to do more
damage than they're worth, sometimes it's
worth just abstracting those details
away. Similarly, Java has references, as
some of you might know, but does not have
pointers per se. You can't go poking
around arbitrary locations in memory in
the same way that you can with C. So,
let's take some of these data types out
for a spin and see what's the same and
what's different. Let me go back to VS
Code here and let me propose that we
bring back one of our old calculators
from a while back. So, let me clear my
terminal, close hello.py, and let me go
ahead and open up a version of this
program that I brought in advance, which
was our calculator version 0 from back
then. So, just to remind you, one of the
first versions of our calculator used the
CS50 library as well as the standard I/O
library. We simply got an int
using get_int, then got
another int using get_int,
and then we simply performed some
addition. So it was a very trivial
calculator that we did very early on
just to demonstrate some of the
operators and syntax of C. Well, let's
go ahead and try converting this to
Python by creating our own program,
calculator.py. So in my terminal window,
I'm going to run code
calculator.py.
It's going to open another tab which I'm
just going to drag over to the right
just so we can see both side by side. I
won't bother with, say... well, let's do
it for parity here. Let me copy the C code
into the Python file, even though this
will not work in the same way, but let's
keep what we need and get rid of what we
don't. So instead of the double slash for
comments, in Python it turns out the
convention is to use a single hash
symbol like this. So it's a minor
difference. It's half as many
keystrokes, so that's nice. But we're
not going to include anything like this.
But we are going to do from cs50,
import a function that I promised would
exist, called get_int. But we'll soon get
rid of that training wheel as well. We
don't need main or this curly brace. We
don't need this curly brace. And we
don't need all of this indentation as a
result. So I'm going to move all of that
over to the left. I'm going to fix all
of the comments to be Python comments by
changing the slashes to hash symbols. And
now I'm going to change each of these
three lines of code, as you might
expect, to the Python version. So you
probably can guess already, we can get
rid of the int there and the int there.
We can get rid of the semicolon here and
the semicolon here. We can get rid of
the f in printf here. And we can get
rid of the semicolon here. There are a
few different ways we could do this, but
I dare say the simplest is going to be
to get rid of the format code altogether,
that first argument, and just tell
Python to print x + y. That's
probably the most literal
translation of the program at left to
the program at right. Let's reopen the
terminal window and run Python of
calculator.py and hit enter. Let's do
something like x is 1, y is two, and
hopefully we do in fact get three. All
right, so that's all fine and good, but
let's take off one of our training
wheels now. So, let me get rid of our C
version here and focus just for the
moment on Python. Let's take away this C
code. And what was the function we can
use to get user input?
Yeah, it was called... a little louder?
It's just called input. So, let's get
rid of CS50's get_int already and use
input instead. All right. So, this
program is much simpler already. So,
let's go ahead and reopen the terminal
window. Run Python of calculator.py.
Do one again for x, two again for y, and
of course 1 + 2 equals 12.
So what's going on here? Because clearly
this is a step backwards. Yeah.
>> Yeah. So in the context of strings, plus
represents concatenation, the joining of
the two arguments on the left and the right.
That seems to be what's happening here,
because it's not 12 per se. It's more
literally 1 and 2 concatenated together.
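The bug and its fix can be reproduced without typing anything. In this sketch, the hard-coded strings "1" and "2" are hypothetical stand-ins for what input would return.

```python
# input() returns a str, so these hypothetical values stand in for typed input.
x = "1"
y = "2"

print(x + y)            # 12, because + concatenates two strs
print(int(x) + int(y))  # 3, because int() converts each str first
```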
But why is that? Well, apparently the
input function indeed returns a string.
That is the key. Those are the
keystrokes that came back from the user.
They might look like numbers, Arabic
numerals to us, 1 and 2, but they're
being treated as strings. More
technically, underneath the hood
there's some char * stuff going on,
even though we're not using that
same terminology. So intuitively, what's
going to be the solution,
without just reverting to using the
training wheel that is the get_int
function from CS50? Put another way, how
did CS50 probably implement get_int,
might you think?
>> Yeah. So recall that in C we could cast
some data types to other data types.
Typically ints to chars or chars to
ints. It's not quite as simple as
casting in this case because underneath
the hood, as we know from C,
there's a bunch of stuff going on.
There's a 1 and then a
null character. There's a 2 and
then a null character. So it's not
quite as literal as a char to an int or
an int to a char. So, we're going to
more properly convert the string, or
str, to an int. We're not casting, but
converting. And converting just implies
that there's a little more work that has
to be done. But thankfully, Python can
do this for us. In fact, let me go up to
line four here and... well, actually,
let's do this a
couple of ways. Let's first convert the x
value to an integer. Let's convert the y
value to an integer as well. So, funnily
enough, it's very similar syntactically
to casting, but in C, when you cast
something, you wrote the data
type in parentheses. Now, the data type
itself is a function that takes an
argument, which is the str or string
that you want to convert. So, let me go
back to my terminal, do Python of
calculator.py, enter, type in one, type
in two, and now I get back 3 as my
answer. Now, as you might imagine, just
like in C, we can kind of play around
with where we're performing some of
these operations. And this looks, you
know, arguably a little less obvious now
as to what is being added. I really
like the simplicity of x plus y; it just
does what it says. So I could convert
these in other ways. I could say after
line four, you know what, change x to
be the int version of x. But generally
speaking, that's kind of wasting a line
of code by doing something you
could do on a single line. So let me
delete that and instead just say,
well, if I know the return value of the
input function is a str, let's just pass
that output as the input to the int
function. It'd be a little more
Pythonic, so to speak, to just pass the
input function's output as the input to
int, which is really hard to say, but
we've done this in C, just nesting
function calls like this. All right, so
if I run this one more time, Python of
calculator.py, type in one, type in
two. We're back now in business. Now,
what I won't trip over just yet is a
subtlety whereby I'm deliberately
typing in actual numbers like 1 and
2. If you are following along at
home or on your laptop and you were to
type in cat and dog, bad things
will happen. But we'll come back to that
before long. All right. Questions, though,
on any of this conversion of our strings
to our
integers in this case? All right.
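A sketch of the finished calculator logic, with input's return value passed straight into int. The read parameter is a hypothetical hook (it defaults to the built-in input) added only so the function can be exercised without a keyboard; the lecture's calculator.py simply calls input directly at the top level.

```python
def add_two_inputs(read=input) -> int:
    """Prompt for x and y, convert each to an int, and return the sum.

    `read` is a hypothetical hook that defaults to the built-in input();
    it exists only so the function can be exercised without a keyboard.
    """
    # Nested calls: the str that read() returns goes straight into int().
    x = int(read("x: "))
    y = int(read("y: "))
    return x + y


# In calculator.py, you would simply write: print(add_two_inputs())
# Typing something like "cat" instead makes int() raise a ValueError.
```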
Well, what more does Python offer to us?
Well, in addition to these data types,
there's actually going to be a bunch of
others, a few of which we'll actually
use today. In fact, we'll see ranges of
numbers. That's a thing
built into Python. We'll see lists of
numbers, which are going to be like a new
and improved version of an array that
solves all of last week's problems
when we talked about the downsides of
using arrays. There are going to be tuples
for things like x, y coordinates or GPS
coordinates or anything where you have
collections of values. There are going to
be dicts, or dictionaries, whereby you can
have key-value pairs provided to you
without having to write a whole hash
table yourself. And you can have sets,
which you can use to just contain unique
values that you want to
check for membership. And there are
bunches of other data types as well. And
this is where languages like Python
start to get really powerful, because of all
of the data structures we talked about
in C, we really only got an array from the
language itself. Everything
else we had to build, or at least talk
about building, in class. These and
more now come with the language. Meanwhile,
in the CS50 library for Python, just so
you know, there are a whole bunch of
functions. These, though, were the C
versions. In Python, it stands to reason
that we don't need as many, because
there are fewer data types in Python, but
get_float, get_int, and get_string do
all exist in the CS50 library for
Python. You're welcome and encouraged to
use it, because indeed among the goals
for problem set six is going to be to
redo some of your C problem set problems
in Python, where you can look at your own
C code (hopefully you like that
solution) and figure out how to convert
it, line by line essentially, to the
corresponding Python version. But clearly
we've seen ways of taking these training
wheels off quite quickly as well. In
fact, if you wanted to import all three
of those functions for a larger program,
you could do this, just following the
approach that I took already, but you
can also just separate them by commas
like this. Or it turns out you can also
import the whole CS50 library, as you'll
see in some code, and then just access
the functions within with slightly
different syntax as well. All right, how
about another construct from Scratch and
from C, now in Python? In
Scratch, if we wanted to do a comparison
like, is x less than y, where each of
those is a variable, and then say as much,
in C it looked like this. And nicely
enough, you can probably guess already
what's going to change here. The
f is about to go away, the backslash
n is about to go away, the semicolon
is about to go away, but some other
stuff's about to go away as well. Focus
your attention on syntax like
parentheses and curly braces, because in
Python it's just that. So we got rid of
the parentheses because they didn't
really add all that much logically. We
got rid of the curly braces, which
technically we could do in C anytime
there's a single line of code inside of
a conditional, but for consistency,
stylistically, we always used them as
well. Python, though, does not have you
use any of those curly braces at all.
But Python requires that you indent your
code properly. So, if you've ever been
among those who are writing out your
program and everything is just
crazily left-aligned and just a big
mess until style50 swoops in and cleans
it up for you, you're not going to be
able to write Python code like that
anymore. That's been such a societal
problem among programmers, newbies and
professionals alike, that the language
itself requires that if you
want this line of code to execute if
this Boolean expression is true, you've
got to indent this line, by convention
four spaces. You can't be lazy and leave
it all left aligned and sort of fix it
up later. This has made Python code
arguably more readable because of these
language-based requirements. Meanwhile,
let's look at an if-else construct in
Scratch, which looked a little something
like this. In C, it looked like this,
which is kind of a lot of lines just to
express a simple idea. All of those
same things are going to go away,
whereby in Python, it looks like this
instead. And the only other difference
worth calling out is that because you
don't have the curly braces, you do have
a colon, which precedes the subsequent
indentation. Meanwhile, if we've
got an if, else if, else in Scratch, in C,
of course, it looked like this. A lot of
this is going to go away in the flash of
a screen, but there's going to be a
curiosity, which is not in fact a typo.
Notice what happens with the else if.
It's abbreviated elif. And honestly, to
this day, all these years later, I can
never remember if it's elif or elsif,
because different languages use
different shorthand spellings of this
phrase. It's elif in Python, maybe because
that's the most succinct you can
make the two words themselves. But
everything else is effectively the same,
including the additional colon this
time. Okay, questions on any of those
conditionals and syntax? Yeah.
>> So, what language did they code Python in?
>> What a good question. What language did
they code Python in? The interpreter we
are using within VS Code is itself
written in C, aka CPython. However, you
can implement a Python interpreter
in really any language, including machine
code, like raw zeros and ones, if you have
that much free time, or in assembly language,
which we saw briefly weeks ago. You
could write an interpreter for Python in
Python if you really want to be meta
about it or in C++ or in Java. This is
the thing about programming languages.
You can use any language to create a
compiler or interpreter for another
language. What's going to vary is just
how easy or difficult it is and how much
time it therefore takes you. Good
question. Other questions on any of
these here features?
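The if/elif/else syntax described above (no parentheses, no braces, a colon after each condition, and mandatory indentation) can be sketched as a small function. Returning the message instead of printing it is just a choice made here so the results are easy to check.

```python
def compare(x: int, y: int) -> str:
    """Return the same messages the lecture's compare.py prints."""
    # No parentheses or braces: a colon ends each condition, and the
    # indented lines beneath it form the body.
    if x < y:
        return "x is less than y"
    elif x > y:
        return "x is greater than y"
    else:
        return "x is equal to y"


for x, y in [(1, 2), (2, 1), (1, 1)]:
    print(compare(x, y))
```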
All right. Well, let's do something
a little bit different in Python vis-a-vis
C by opening up maybe a comparison
program that we looked at some time ago.
So, let me go back to VS Code here. I'm
going to close my calculator and I'm
going to open up now from my
distribution code today a version of our
comparison program from a while back,
which was essentially version
three thereof, zero-indexed. So this one
has comments, which the very first one in
week one did not. But notice, as a
refresher, what this comparison program
was doing. It was including cs50.h and
stdio.h. It was prompting the user
for two integers, x and y, via get_int. It
was then doing a very simple comparison,
comparing x against y to determine if
it's less than, greater than, or dot dot
dot, the same as, that is,
equal. So just so that we
can go through the motions of converting
one of these to the other, let's do that
side by side. Let me code a program
called compare.py. Let me close my
terminal. Drag the Python version over
to the right here. And without comments
this time, let's just do from cs50
import get_int. Then below that, let's
do x equals get_int and ask the user,
what's x, question mark. Then let's
ask the user for y using get_int, quote,
what's y, question mark. Then below that,
let's do if x less than y, colon. Go
ahead and print quote unquote x is less
than y, close quote. Elif x greater than
y, colon. Go ahead and print quote unquote x is
greater than y. Else, colon, let's go
ahead and print out quote unquote x is
equal to y. So I dare say these are now
equivalent. It's clearly fewer lines
because a lot of the lines at left were
admittedly comments, but also some curly
braces. And there's more syntax, like
parentheses, that we got rid of, too. Let
me open my terminal window. Let me run
Python of compare.py.
We'll type in 1 and 2.
x is less than y. Let's do it
again using 2 and 1. x is greater
than y. Let's do it one last time, 1
and 1. And of course, those two now
are equal to each other. All right. But
why go down this road again? Because
that was kind of a simple exercise. But
recall that we introduced this
comparison of ints because it was so
sort of stupidly simple, even if the
syntax that week was completely new.
But we ran into an issue pretty fast
when we started comparing strings. And
that was a problem we really only fixed
in week four when we finally revealed
what a string actually is. If we focus a
bit more on Python strings, it turns out
that we can solve that problem much more
easily in the world of Python. In fact,
let me go back to VS Code here. Let me
close these two versions of int
comparison. Let me open up at left a
version of my program that I brought
with me here that contains a version
from week four, wherein we finally revealed
that a string is just a char *. But
recall that the solution in week four, as
well as in week one when we first
encountered this problem, was to use strcmp,
a function whose purpose in
life is to compare two strings character
by character by character, using a for
loop or something like that. It
has knowledge, therefore, of how to
navigate pointers and how to look for the
null character, the backslash zero, at the
end. And all of that came from our
friend string.h. Well, how can we go
about implementing the same idea in
Python? Well, let's open up VS Code's
terminal window, open up a new program
called compare.py,
but this time let's get rid of the
integer version thereof. Let's get two
inputs from the user. And I won't even use
any CS50 training wheels. Let's just use
the input function to get s and ask the
user for a value of s. So s, colon, space, close
quote. t equals input, asking
the user for a value of t. And then
let's just ask the question. If s equals
equals t, then print out quote unquote same.
Else, go ahead and print out quote
unquote different. Let me move these
side by side just so you can see the
difference. Notice how much code we have
to write and how much we needed to
understand in order to compare something
as trivial as two strings in C. But in
Python, we're literally just using
equals equals. And let's see if it
actually works. So, Python of
compare.py. Enter. Let's type in maybe
cat for s and dog for t. And those are
in fact different, but we would have
gotten the same answer in C. Let's rerun
Python of compare.py and type in cat.
Type in cat again. And now it's
detecting them as the same. So wonderfully,
Python has solved that seemingly
annoying problem by not taking us so
literally. Don't compare the pointer
against the pointer; compare what a
reasonable programmer probably really
cares about, the values of those strings.
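A minimal sketch of the same-or-different logic, with == doing the character-by-character work that strcmp did in C. Building the second "cat" at runtime with str.join is only a way to make sure we are comparing two separately created strings.

```python
def same_or_different(s: str, t: str) -> str:
    """Mirror the lecture's string version of compare.py."""
    # == compares the characters themselves, much as strcmp(s, t) == 0
    # does in C, rather than comparing two addresses.
    if s == t:
        return "same"
    else:
        return "different"


print(same_or_different("cat", "dog"))  # different
# Build the second "cat" at runtime so it is not simply the same literal.
print(same_or_different("cat", "".join(["c", "a", "t"])))  # same
```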
So the equals equals is doing all of the
work of the for loop or the while loop, iterating
over those things character by character
and actually giving us the answer we
want. So what else gets easier in
Python? Well, let's focus a bit more on
these strings. Let me go back into VS
Code here. Let me close out our two
comparison programs and clear my
terminal. And let me go ahead and open
up a prior program that we wrote, this
one called agree.c. Namely, in the
staff version of the code online, this
was agree2.c, which is where we left
it. Now recall in this C program that we
did the following. Using CS50's
get_char function, we first prompted the user for
a char, hopefully y or n for yes or no,
respectively. And then we used a Boolean
expression, actually the combination
of two, using the two vertical bars to
ask whether the inputted character is
capital Y or the inputted character is
lowercase y. And if so, we went ahead
and printed out that the user agreed.
Otherwise, if they type in anything else
for that character, we simply printed
out not agreed. Well, how can we go
about implementing that same program in
Python? For instance, in a file called
agree.py. Well, let me go ahead and open
up my terminal window again. Let's
create a file called agree.py.
Let me go ahead and drag it over
to the right so we can see these two
things side by side. And let me go ahead
and do this. I'm going to set a variable,
say, called s equal to the return
value of input, quote unquote do you
agree, thereby asking the user the same
question as before. No need to use the
CS50 library, because the input function
here suffices. And instead of using c,
I'm deliberately using s because it
turns out in Python, there is no way to
get a single character per se, but you
can get a string that has a single
character. Indeed, char is not a data
type in Python. But once we have this
input from the user, let's now go ahead
and implement a conditional using one or
more boolean expressions. Well, let's
ask if s equals equals quote unquote
capital Y or s equals equals lowercase
y, then let's go ahead and print out, as
before quote unquote agreed. And now
notice what's different this time. I'm
literally using the word or instead of
the two vertical bars because in the
spirit of Python, things tend to be a
little more English-like, a little more
readable, top to bottom, left to right.
And indeed, or hits that nail on the
head. Otherwise, if it is not a capital
Y or a lowercase y, let's go ahead and
print out quote unquote not agreed. And
that's it for converting this program
from C here into Python. But of course,
this isn't the most robust version of
the program because it would be nice if
the user could type in something like
yes, capitalized maybe in different
ways. So, how might we go about
implementing that? Well, we could do
this in a few ways. I could, of course
(and let's go ahead and get rid of my C
version now and focus just on the
Python), do something like this
and just start ORing together more
possibilities, like or s equals equals quote
unquote yes, or s equals equals quote
unquote YES very emphatically, or and so
forth. But you could imagine that this
doesn't scale very well. If I want to
consider all the possible permutations
maybe of the caps lock key being up or
down, that's quite a few possibilities
to enumerate. So perhaps we could do
this a little bit differently. And in
fact, we can, by storing all of the
possibilities in a so-called list. So
whereas C had, of course, arrays, Python
has what are called lists, which
behave much like the linked lists we explored in
week five: they can dynamically grow and even shrink,
though underneath the hood, CPython actually
implements them as resizable arrays.
Either way, Python does that work for
us. I can simply create a list of values
from the get-go. Or, as we'll eventually
see, I can add things to it, remove
things from it, and all of the
underlying memory gets managed for me.
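Here is a sketch of where this agree logic ends up: membership in a list via the in keyword, combined with the str.lower method, plus a glimpse of a list growing and shrinking. The agrees function and the extra "yep" answer are hypothetical names and values used only for illustration.

```python
def agrees(answer: str) -> bool:
    """Return True if answer is "y" or "yes" in any capitalization."""
    # Lowercase once, then check membership in a list with the `in` keyword.
    return answer.lower() in ["y", "yes"]


print(agrees("y"), agrees("YES"), agrees("n"))  # True True False

# Lists grow and shrink on demand; Python manages the memory for us.
accepted = ["y", "yes"]
accepted.append("yep")   # hypothetical extra answer
accepted.remove("y")
print(accepted)          # ['yes', 'yep']
```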
And in fact, with lists, we get a whole
bunch of features that can make this
possible. But for now, let's use them
simply as statically initialized lists
with values I know from the get-go that
I want. And I'm going to go ahead and do
this in VS Code. I'm going to delete
most of this Boolean expression, the
combination of all of those
phrases. And I'm going to simply say if
s in, using a Python keyword, in,
literally the following list of values:
quote unquote y, quote unquote yes. And
for now, I'm going to use just those
two. But let's see how it works. Let me
open up my terminal window again. Let me
run python of agree.py. Really for the
first time, but let me claim that it
would have worked even in the previous
version. Enter. I'm going to go ahead
and type in lowercase y. And I've
agreed. I'm going to go ahead and run it
again and type in lowercase n. And I've
not agreed. I'm going to go ahead and
run it again. And I'm going to type in
all caps YES, because I really agree.
And yet I don't, because there is a bug
still in this version. Even though up
here in my Python implementation I do
have a list of values that I'm looking
for, Python's going to look literally
for those values, so lowercase y and
lowercase yes. So how can I go about
tolerating different capitalizations by
the user? Well, I can do this in a few
different ways. I could, for instance,
after getting the user's input in a
variable called s, update s to
be s.lower(), which is going to have the
effect of lowercasing the word for me,
and then update the value itself of s.
And now I think this will work even for
an uppercase version. Let me go ahead and
run Python of agree.py, emphatically
type in YES, Enter, and this time I've
agreed, because I forced the user's input
to lowercase and then compared it
against the canonical forms I've written,
which are all lowercase. I could have
done the opposite: I could have forced
the user's input to uppercase and then
enumerated in my Python list, in between
those square brackets, capital Y and
capital YES. Either approach here is
fine. Now technically I don't need this
additional line here. I can go ahead and
delete that line wherein I lowercased it,
and in Python I can actually chain some of
these function calls together by calling
input and then .lower() on its result, so that the return value of
input ultimately gets forced to
lowercase by using lower here.
Alternatively still, I could just
lowercase it at the very moment
I'm actually comparing it, and down here
I could do s.lower()
and then compare the lowercase version
to y or yes. Now
what's really this all about? Well, this
is actually an example of what's
generally known as object-oriented
programming, or OOP for short, whereby in
Python and a lot of other languages
you can have variables and data
types more generally that have not only
values associated with them, like y or
yes, but also functionality built in. In
other words, in C we would have
used a function from the ctype
library called toupper or tolower, and
we would have passed as an argument to
those functions the very character that
we wanted to force to uppercase or to
lowercase. But in Python, and indeed
object-oriented programming languages in
general, the developers behind the
language recognize that sometimes
there's functionality that's inherently
related to the values in question. And
indeed, when we're dealing with strings,
it's pretty reasonable to want to
sometimes uppercase them or lowercase
them, capitalize them, or do any number
of other things. And so, built into the
string type in Python is in fact the
lower function itself, as well as a
whole bunch of others. In fact, at this
URL here, you can see the documentation
for all of the string functions built
into Python. More technically, when a
function is built into a data type and
you access it via this dot notation,
instead of by calling some global
function and passing an argument into
it, you are using what are called
methods. So methods are simply functions
that are inside of objects. And in this
case, the object in question itself is a
string. So what's really happening with
this here example when I'm checking
whether the user has agreed or not is
I'm taking that value, that string s,
which is technically now an object in
memory, and inside of that object is
not only the user's input but some
built-in functionality, otherwise known
now as methods, and those methods were
written by the same people who invented
the string data type itself. This is
just the first of these examples, and
we'll see yet others. But notice the
syntax is actually quite similar to C:
just as in C you used a dot to go
inside of a structure, you can similarly
go inside of an object in Python and
access not just its values
but also these built-in methods.
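A minimal sketch of the agree.py check described above. The helper name `agreed` is hypothetical, and a few sample answers stand in for `input()` so it runs without a prompt. It lowercases once via the string's own `lower` method and then tests membership with `in`:

```python
def agreed(s: str) -> bool:
    # s.lower() calls the lower *method* on the string object s and returns
    # a lowercased copy; s itself is unchanged. Then `in` checks the list.
    return s.lower() in ["y", "yes"]


# In agree.py, s would come from input(); here we try a few sample answers.
for answer in ["y", "n", "YES"]:
    print(answer, "agreed" if agreed(answer) else "not agreed")
```

Uppercasing the input and listing "Y" and "YES" would work equally well; what matters is that both sides of the comparison use the same case.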
All right, how about another comparison
of C to Python again involving strings?
Well, let me go ahead and reopen and
clear my terminal and close out of
agree.py. Let me go ahead and open up a
version of copying strings from a couple
of weeks back whereby we finally started
solving it correctly by doing some
proper memory management. So here in the
staff version of copy5.c, we have not
only a commented version of what we did
a couple weeks back, but also a
reminder of what was involved in
copying strings in C. Recall, for
instance, that we prompted the user in
this example, using CS50's get_string
function, for a string that they wanted
to make a copy of, and then we did some
error checking to make sure
that there was enough memory and nothing
went wrong. Then recall that the right
solution to this problem in C was not to
just use the assignment operator and
assume that s can be copied into t, but
rather to allocate, using malloc, enough
memory for the copy plus one more byte
for the null character, again making
sure that all is well by checking the
return value of that, and then actually
copying, character by character, the
characters from s into the
chunk of memory now known as t. Or,
ultimately, recall we used the built-in
strcpy function, which does all of
that looping for us. Then, when it
came time to capitalize just the copy, we
did a quick sanity check: is the length
of t greater than zero? Otherwise there's
nothing to capitalize. And if so, we went ahead
and used the ctype library's toupper
function, passing as input that specific
character, t[0], and
updating t[0] itself. So
here's an example of procedural
programming in contrast with
object-oriented programming. Again, I'm
passing the argument to be uppercased
into the toupper function, as opposed
to simply going to that character and
asking it, via some dot operator, to
uppercase itself. Then I went
ahead in the C version and printed out
the two strings, I freed the memory
for the copy that I myself had allocated, and
that was it for this program. So, it was
a decent amount of work, recall, in C,
to actually go about just copying a
string. Well, as with so many things in
Python, it's going to be so much easier.
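For comparison, here is a minimal sketch of the Python copy.py that the lecture builds next, with a fixed sample string standing in for `input` so it runs without a prompt. Note that `str.capitalize()` returns a new string (and also lowercases the remaining characters), so nothing like `malloc`, `strcpy`, or `free` is needed:

```python
# In copy.py this would be s = input("s: "); a fixed value keeps the sketch runnable.
s = "cat"

# capitalize() returns a *new* string whose first character is uppercased
# (and whose remaining characters are lowercased); s itself is untouched.
t = s.capitalize()

print(f"s: {s}")
print(f"t: {t}")
```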
Let me go ahead and do this. Let me open
my terminal window. Let me create a file
called copy.py.
Let me move it over to the right hand
side so we can see them side by side.
Closing my terminal window. And let's do
roughly the same. Let's create a
variable called s and set it equal, on
the right-hand side, to the return value of
Python's own input function, because we
don't really need CS50's own get_string
function, and ask the user for s. Then
let's go ahead and create a second
variable called t. Set it equal to
literally s.capitalize(), whose purpose in
life, if we read Python's documentation
for string methods, will be to uppercase
the first letter of the word that the
user has presumably just typed in. Then
I'm going to go ahead and print out as
before the user's input. And I can do
this in a couple of different ways, but
I'm going to use one of our format
strings and say s colon and then
interpolate that variable s by using my
curly braces to say put the value of s
here. Then I'm going to go ahead and
print out t by saying t colon
interpolate its value here inside of
quotes close parenthesis. So let's see
if this works. Let me go ahead now and
run python of copy.py. I'm going to go
ahead and type in say cat in all
lowercase and hit enter. And now notice
s remains in all lowercase, but the copy
alone has indeed been capitalized. All
right. Well, let's take a look at one
other example involving strings uh
between C and Python equivalents. Let
me go ahead and remind us that a few
weeks back, too, we created this
uppercase program whose purpose in life
was to prompt the user, using get_string,
for the before
string. Then it prints out after, because
the purpose in life of this program was
to uppercase all of the characters in
the string, not just capitalize the
first one. So, as you might expect, we
used a loop a few weeks back, and we
iterated from zero on up to the length
of the string, using ++ to increment i
in each iteration, and then each time we
went ahead and printed out one character
at a time. So, strictly speaking, we
didn't change the string from lowercase
perhaps to uppercase. We just changed
each letter to uppercase and printed it
out right away. Well, how might we do
something similar in Python? Well, here
too we have a couple of different
approaches. Let me go ahead and open up
my terminal now and run code of, say,
uppercase.py.
Close my terminal window and let's drag
this to the right so we can see them
side by side. And let's do roughly the
same. Let me create a variable this time
called before and set that equal to the
return value of input, just prompting
the user for that before string. Then,
after that, let's go ahead and print out
preemptively "After:" followed by a couple of spaces,
just to align everything nicely. But let
me not print a new line yet, because I
want to see the
following string on that same line. And
then let's go ahead and do this
analogously to the C version first, but
then tighten things up. Here's how we
can iterate in Python over every
character in a string. I don't need to
bother with I and indexing into the
string or anything like that. Using a
Python for loop, I can simply say: for
each character c in that string called
before, go ahead and print out the
uppercase version of that character, but
don't yet print out a new line. Then, at
the very end of this loop, go ahead and
print out nothing but a new line. Let me
go ahead and open my terminal. Run
Python of uppercase.py.
Enter. Type in cat in all lowercase.
Cross my fingers, and in the after string, each and
every one of the characters is
uppercased. And what's nice about this,
if nothing else, is that this for loop
in Python there on line three is pretty
elegant, whereby you implicitly get
access to each character in the string
because that's how Python knows how to
iterate over a string object. But it
turns out we don't have to do this quite
as analogously in Python as we did in C.
We don't have to do it character by
character in so far as Python is
object-oriented and these strings are
objects, and those objects have methods,
and those methods will actually operate on
the entire string at once, unlike the
more pedantic work we had to do
character by character in C. So in fact
let me go ahead and close the C version
here, clear my terminal and hide it,
and let's go ahead and make this quite a bit
simpler. Let's get rid of the for loop
altogether, and let's
get rid of that print statement
altogether, leaving only the before
variable and getting the user's input.
And now let's create an after variable.
Set it equal to before.upper(), thereby
uppercasing the entire string called
before and assigning the return value to
the after variable. And then let's go
ahead and print, using our old friend the
f-string, "After:", a space, and then
interpolate the value of that after
variable. So now we're down to just three
lines at that. Let me go ahead and
reopen my terminal. Python of
uppercase.py, Enter. Type in cat in all
lowercase. And voila. Now I have
uppercased the cat all at once.
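Both versions of uppercase.py described above can be sketched side by side. A fixed sample string stands in for `input("Before: ")`, and "don't print a new line yet" is expressed here with `print`'s `end` parameter, which is presumably how the lecture's loop suppresses the newline:

```python
before = "cat"  # stands in for input("Before: ")

# Version 1: print "After:  " without a newline, then one uppercased
# character at a time, then finally just the newline.
print("After:  ", end="")
for c in before:  # iterating over a string yields its characters
    print(c.upper(), end="")
print()

# Version 2: a single method call uppercases the whole string at once.
after = before.upper()
print(f"After:  {after}")
```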
All right. Before we take a break for
some Fruit by the Foot, let's go
ahead and take a look at Python's
implementation of loops further. So in
Scratch, recall that we implemented a
loop with something like this. If I
wanted to meow three times on the
screen, I would literally use a repeat
block. In C, it was a little clunkier to
mimic that same idea. We could
declare a variable called i and set
it equal to zero. Then we could ask a
Boolean expression: is i less than
three? If so, print meow and then
increment i using our old ++ friend,
which in Python is now gone. In Python,
we can do this almost the same, except I
don't think we need the data type. I
don't think we need the semicolon. We
don't need the parentheses. while still
exists, but we don't need the curly braces.
And we can't use the ++. We don't
need the f in printf. I mean, we're mostly just
trimming clutter from this here
implementation. So, this is the C
version. This now is the Python version:
a little tighter, a little easier to
read. It's pretty much the minimal
syntax available to get the job done.
So, how can we actually have a cat meow
in this case? Well, let me go into VS
Code and I'll stop doing everything side
by side and just stipulate that we've
done most of these examples previously
in C. For my first cat, well, I could
certainly do it the easy way. Let me
go ahead and create cat.py. And as we
always started in the past, I could
just print meow and then use our old friend,
copy and paste. And this of course was bad for
bunches of reasons, but it gets the job
done. In Python, if I want to do this
with a loop, I can just borrow that same
inspiration from C: set i equal
to zero, then do while i is less than
three, colon, then go ahead and print out
meow, and then go ahead and do i equals, or
rather i += 1, which is maybe the most
succinct way to express that same idea.
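The while-loop version just described fits in four lines. It is a direct translation of the C loop, minus the type, semicolons, parentheses, and curly braces:

```python
i = 0
while i < 3:
    print("meow")
    i += 1  # Python has no ++ operator; += 1 is the idiomatic increment
```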
All right, just to confirm that this
works, Python of cat.py. Enter. Meow
meow meow. All right. So, how else can
we do this? And how can we do this more
Pythonically? This is perfectly correct.
Many people might implement it this way,
but it's not quite as succinct as we
could alternatively do in Python. Yeah.
>> Yeah. So, we could maybe use a for loop.
And in fact, let's go there,
because we don't quite have the same
types of for loops in Python as we did
in C. while loops are essentially the
same, but for loops are a
little bit different and actually a
little bit better. So, let me go into my
code here, delete all four of these
lines, and literally just say for i in
this list of values, [0, 1, 2], colon,
print meow. In other words, in for
loops in Python, you don't have the
parentheses, you don't have the two
semicolons, you don't have the
initialization and the boolean
expression and the update. You just say
a little more English-like for each I in
the following list or for each value of
I in the following list. And what Python
will do for us is automatically on the
first iteration set I equal to zero. On
the second iteration set I to one on the
third iteration set I to two and then
there's only three things in the list.
So that's it. And just as before, with
the y and yes example, where I used
square brackets similar to arrays in C,
I was using a Python list of strings in
that case, here I'm using a Python list
of integers: 0, 1, and 2. And they're
integers in the sense that they have no
quotes around them. So they're obviously
not strings. And I'm printing out meow
this many times. And indeed, if I do
Python of cat.py again, I get meow meow
meow. This is correct. This is arguably
better, at least in the sense that it's
two lines of code instead of four. And
it's arguably more readable as well. But
what do you not like about this perhaps
even if you're only seeing it for the
first time?
>> Yeah, it's going to be a lot more
difficult to do things more than three
times, because recall that in
Scratch, at least, and in C, we had the
ability to express ourselves
literally; in C, we could
just change that three to any number we
want. 30, 300, no big deal. It's a super
simple change, even though it was kind
of annoying to type all of this out.
Well, in Python, yeah, I could do this
and say for i in 0, 1, and 2, just to
mimic the numbers that we'd be setting i
equal to in the C version. Frankly, this
can be any list. It could be 1, 2, 3, or 4, 5, 6,
or cat, dog, bird, or any three things
whatsoever. But I'm just using 0, 1, and
2 for consistency with the way C would
have done it. But slightly better than
this is to use one of those other data
types that was briefly on the screen
earlier. We have not just floats and
ints and strs and lists and tuples. We
also have what are called ranges. And
range is not only a data type in Python,
but also something you can call, like a function,
to get a range of values from
zero on up. So I can change this list of
three values to a call to
range, pass in how many
things I want, and by default, per the
documentation, I'll get back the
numbers 0, 1, and 2. And nicely,
Python's pretty smart about this. It
technically doesn't hand you back all of
the numbers at once, whether it's three
or 30 or 300 or 3 million. It sort of
hands them back to you one at a time. So
you're not using more memory just
because you're doing more iterations. So
now if I do want to iterate four times,
five times, 30 times, 300 times, I again
can just change that single value. And if
you want to be fancy, too, you can skip
numbers. You can count
through just the odd numbers or the even numbers. You
can change the increment, the step size.
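The progression from a hard-coded list to `range`, plus the optional start and step arguments mentioned above, can be sketched as follows. The variable names `evens`, `odds`, and `big` are just for illustration:

```python
# A hard-coded list works, but only for exactly these three values.
for i in [0, 1, 2]:
    print("meow")

# range(3) produces 0, 1, 2 on demand, so 3 -> 300 is a one-number edit.
for i in range(3):
    print("meow")

# range(start, stop, step) can skip values; stop itself is always excluded.
evens = list(range(0, 10, 2))
odds = list(range(1, 10, 2))

# A range object stores only its start, stop, and step rather than every
# number, so even a very large range is cheap to create.
big = range(3_000_000)
```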
But the default and the most canonical
is indeed just to count up like that. So
if I go back to VS Code here and improve
this, I can change that hard-coded list
to just range of three, clear my
terminal, run this cat one more time,
and now I'm back in business as well. In
fact, this is so common, let me throw up
one alternative to this. You'll notice
that in the previous example, both in VS
Code and on the screen, I am not
actually using i in any way. In fact, if
you look back at how we converted the
Scratch to Python code, I'm using i only
because when you use a for loop in
Python, you have to give it a variable
to iterate over some list or range of values. That's
just the way it is. But I'm technically
not using or printing i anywhere, and
that's fine. And so it's arguably
Pythonic, too: if you have a variable
out of necessity, but you're not
actually going to use it for anything
useful, just call it an underscore
instead. Even though this is weird
looking, an underscore is a valid
variable name in Python. So it is
Pythonic to use it just to signal
to yourself later and to colleagues that,
yeah, I'm using a variable because I
have to, but it's not one I'm actually
going to use elsewhere. It's a minor
subtlety and not strictly necessary,
but it's commonly done. All right,
how about a couple final versions of
cats then? So recall that if we wanted
to do something in Scratch forever, we
had a forever block which literally did
that. Well, in C, we couldn't quite
translate that literally. So the closest
approximation was probably this: while
true, whereby you have a Boolean
expression that by definition is always
true. So the loop is never going to
stop, and is thereby infinite, if you wanted to
print out meow meow meow on the screen
ad nauseam. In Python, you can do it
almost the same, but the curly braces
are about to go, the f is about to go,
the backslash n, the semicolon, and the
parentheses. But for whatever reason, in
C, we lowercase true and false, whereas in
Python, we capitalize True and False.
So, a minor subtlety, but it's now
indeed capital T. The indentation
still has to be consistent, and the colon has to
be there as well. So, with that, we can
of course induce, intentionally or
otherwise, some infinite loops. As with
C, you can break out of them if need be
with Control-C to interrupt the process.
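A minimal sketch of the Python `while True` loop. So that it finishes on its own, this version adds a counter and a `break` after three meows. Both are additions for illustration and are not part of the lecture's forever-meowing cat:

```python
count = 0
while True:  # capital T: Python's Booleans are True and False
    print("meow")
    count += 1
    # Without this break, the loop would run until interrupted with Control-C.
    if count == 3:
        break
```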
But let's just see lastly with this cat
how we can make it a little more
abstract like the final versions of our
cat in Scratch and C. So let me propose
to open up here a prior version of
cat that we wrote in
the past. It was version 12 at the
time, which looked a little something
like this. This was one of the final
versions of our cat in C that simply
allowed me, in main, to call a meow
function that took an argument, which is
the number of times I wanted to meow.
This in C is how we implemented that
helper function so to speak that
returned nothing. So its return type was
void but it did take an integer called n
as its input. And then there was a for
loop inside of there that printed meow
that many times. So long story short,
this was how both in Scratch and in C we
invented our own functions. Well, how
can we do this now in Python? Well, let
me bring this version of cat over to the
right here. Delete that previous
version. And let me propose that we do
this: for i in range of three, let's go
ahead and assume for the moment that
there is a meow function
whose purpose in life is to just meow on
the screen. Well, that of course does
not yet exist. So, in Python, I'm going to
use syntax that allows me to define my
own function. And the keyword for this
is literally def, for define, then the name
of the function, and then parentheses if
it takes no arguments. You don't need
the void keyword even if it takes no
inputs. So let's do a simpler version of
the cat first that takes no arguments
and then we'll add back that argument.
How does a cat meow? It literally
just prints meow on the screen. So already
this seems to be an improvement. I've got
like four lines of actual code here
versus like 20 or so on the left-hand
side. Let's go ahead and run Python of
cat.py.
Enter. And we see the first of our
errors which is remarkable because
usually I would have messed up by now.
So here we have in Python the equivalent
of, like, a compiler error message. The
program has not run successfully. It's tried to run.
It's tried to be interpreted, but it
encountered some error. These are
generally called tracebacks, in the
sense that you see a trace back in time
of everything the program was trying to
do just before it failed. So if you've
called a function which called a
function which called a function, you'd
see all of those function calls on the
screen. I've just tried to call one
function. So, it's a relatively short
error. This is clearly a problem. And
here's the type of problem: NameError.
The name meow is not defined.
So, intuitively, even if you're seeing
Python for the first time, why is
meow not defined even though it's
literally defined right there? Yeah.
>> Yeah. As smart as Python is, it's
still kind of naive in that meow doesn't
exist until line four. So, if you try to
use it on line two, that's too soon. All right.
So, in C, we fixed this problem
initially by just kind of hacking things
together: all right, well, let's
just define the function up here and then move
the code that uses it down there. And that's totally
reasonable. And in fact, if I clear my
terminal and rerun Python of cat.py,
we're back in business. But I'd argue
you can only do that so many times,
especially once you've got a bunch of
functions. You don't want to relegate
like the main part of your program,
which really this loop is, to the very
bottom of the screen, if only because
like that's the first thing you care
about. I want to see it at the top of the
screen. And that's the whole point of
putting main at the very top. So what
was the solution in C? The solution in C
was to put the prototype for the
function at the top of the file. That
though is not a thing in Python. You
don't just copy that first line of code,
put it at the top of the file, add a
semicolon, and then it works. Instead,
the Pythonic way to solve this problem
for better or for worse is to actually
put your code in a main function. Main
in Python has no special significance in
this sense. It's just convention to
borrow the name that so many other
languages use as the main function in
those languages. You just wrap your
code in a function called main so that
you're defining main, then you're
defining meow, before you're actually
calling the meow function per se. But I
have made a mistake. If I run Python of
cat.py now, crossing my fingers for good
measure, the program does
nothing.
Why is that?
Yeah. Why is that?
>> Oh, sorry. Go ahead.
>> Yeah, curiously, I never called the main
function. So whereas in C and in Java
and C++ and a bunch of other languages,
main is special. Like main is the
function by definition that is
automatically called. Python has no such
special magic. It's not going to call
main for you just because you created
it. In fact, I didn't even have to name that
function main. It's just a
convention. But the solution is exactly
that: if the problem is that main
wasn't called, then at the bottom of this
file, what I can do is just literally
call main, which we would never have
done in C, but which is conventional to
do in Python. That way, after you've
defined main up here and then defined
meow down here, you can call main,
which in turn will call meow, and at that
point in the story both of those
functions exist. So if I go
down here and run cat.py again, now I see
my meow meow meow. Now let me add one
final flourish, because this version of
the code in C, recall, actually let me
specify how many times I want to meow,
whereas here I actually have my for loop
in main, at the right, and I'm calling
meow that many times. Well, what if I
want to get rid of this loop over here,
de-indent the call to meow here, and pass in
literally the number three here? Well,
in Python, you can just say, in
the definition of a function, that it
takes an argument like n. You don't have
to specify the data type. Python's smart
enough to figure it out. Then in your
function, you can use that, as with for i
in range of n, go ahead and print meow.
So now the right-hand version of this
program is pretty much equivalent to the
left-hand version of this program, as
always using fewer lines of code. Let me
go ahead and run python of cat.py. Meow.
Meow. Meow. We're good. And then let me
make one final change, if only because
most every piece of documentation you see online,
or website tutorial on Python, will
actually have you not just literally
call main at the bottom, but use
this crazy syntax that solves a
problem that we won't trip over in this
class. Typically it's Pythonic to
actually call main after asking the
question if __name__
== "__main__".
This is a stupid mouthful of code
that even I had to think about when I
was typing it out, to check that I got all the
underscores correct. But long story
short, this convention of using a
conditional before you call main allows
you to write more modular code in Python,
so that some of your files don't
actually do anything other than define,
define, define functions that you
can then import into other files you
write. So in short, this is the right
way to do it, even though in CS50 it is
unlikely that we'll trip over this
bug. Questions now on that last piece of
how we define functions in Python. Yeah.
>> Ah, good question and good eye. Why do I
have two blank lines between my functions in
Python? As you will see via style50, it
is Pythonic, that is, Python convention, to
separate functions in your code by two
blank lines, whereas there is no such
convention in C. So I'm trying to be
consistent with what the world does.
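Putting this section's pieces together, here is a minimal sketch of the final cat.py. It follows the lecture's description but is not the staff solution file. `main` is defined first, `meow` takes `n`, the unused loop variable is an underscore, top-level functions are separated by two blank lines, and the `__name__` guard calls `main` only when the file is run directly:

```python
def main():
    meow(3)


def meow(n):
    # _ marks a loop variable we need syntactically but never use.
    for _ in range(n):
        print("meow")


# Call main() only when this file is run directly (python cat.py),
# not when another file imports it just to reuse meow().
if __name__ == "__main__":
    main()
```

Defining `main` above `meow` is fine here because neither body runs until the last line, by which point both functions exist.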
Yeah.
>> If you want to count backwards in a
loop, can you do that? Absolutely. You
could use the range function in a
different way: start
with a larger value and count down.
Or you could alternatively do that with
a while loop. I would say that, yeah, you
can make that work, but you shouldn't;
people just don't do that unless it
does actually solve a problem for you.
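To make the answer above concrete, here are two standard ways to count down with `range`, plus the while-loop alternative. The variable names are illustrative only:

```python
# range(start, stop, step) with a negative step counts down; stop is excluded.
countdown = list(range(3, 0, -1))

# reversed() walks an existing range backwards.
backwards = list(reversed(range(3)))

# The same countdown with a while loop.
n = 3
while_countdown = []
while n > 0:
    while_countdown.append(n)
    n -= 1
```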
Other questions on this?
All right. Well, when we looked at C,
recall there were a bunch of things that
ultimately we couldn't do well. We
ran into issues of floating-point
precision and integer overflow and
truncation and all of these
problems. There's still going to be
some of those, but first let's take a
Fruit by the Foot break, and we'll be
back in 10. Help yourself to seconds
today.
All right, so we're back and let's use
our remaining time together to focus not
only on some of the problems that Python
can solve more readily than C, but also
some of the problems that remain. So
here was a program early on in our
discussion of C that had this weird bug
whereby, when we implemented a relatively
simple calculator to divide two numbers,
x / y, we experienced what we called
truncation at the time, whereby 1 / 3 was
curiously zero and something like 4
/ 3 was curiously one, and we were losing
everything after the decimal point. And
this was true even if we stored the result in
a float, because with truncation, recall,
everything after the decimal point with
integer math is simply discarded. So if
you do int divided by int, you're going
to lose what is after the decimal point.
So let's take a look in Python at
whether this is still actually a
problem. So let me go back into VS Code
here. We'll close out the C version
thereof and let's go ahead and create
our own program called calculator.py.
And in this version, let's modify the
original, which just did some addition,
and have it do some division
instead. I'll get rid of my outdated
comments and now perform division
instead of addition by doing x / y.
Python of calculator.py, let's try 1
and let's try 3. And oh, our
fractions are actually back. So it turns
out in Python, even when you're
manipulating integers, if you divide one
by the other with a single slash, you get back
a floating-point value, even if the result
happens to be a whole number.
And you don't have to jump through
the same hoops that we did before to
actually force things to floats and then
do floating-point arithmetic and so forth.
In fact, if you want the old behavior,
it's still actually there: you can
use two slashes, //, in Python for floor
division, which for nonnegative numbers matches
the integer division we had in C, as opposed to what
we're seeing here. But a typical
programmer, I dare say, nowadays would
want division to behave exactly this
way. So truncation seems to be less
of an issue for us. All right.
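The difference between Python's two division operators can be sketched in a few lines. The last line also shows where `//` parts ways with C. It rounds toward negative infinity, while C's integer division truncates toward zero:

```python
x = 1
y = 3

# A single slash is "true division": int / int always gives a float.
print(x / y)    # 0.3333333333333333
print(4 / 2)    # 2.0, still a float even though the result is whole

# Two slashes is floor division: the result is rounded down.
print(x // y)   # 0
print(4 // 3)   # 1
print(-7 // 2)  # -4, whereas C's -7 / 2 truncates toward zero to -3
```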
Well, what other problems did we
encounter at the time? Well, recall we
had issues of floating point imprecision
whereby even when we divided something
simple like one divided by three and in
grade school we learned that was like
0.333
repeating infinitely many times, we
started seeing weird numbers that were
not 3 at the end of that value back
in the day in C. Unfortunately, that's
a problem that's still with us. In fact,
if I use this same program here, let me
go into VS Code, and instead of printing
out just x / y, let's go ahead and do
this temporarily. Let me give myself a
variable called z and set it equal to x
/ y, only because it'll be a little
easier to see the formatting trick I'm
going to use. Let's go ahead and print
out a format string that prints out z.
And for the moment, let me just claim
that this is going to do the exact
same thing. It's just completely
gratuitous that I'm using an f-string
now as opposed to just printing out z.
But if I do 1 / 3, we're still seeing
0.333.
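The formatting experiment that this passage goes on to describe can be sketched as follows. The final `Fraction` lines are a swapped-in standard-library example of exact arithmetic, not one of the third-party libraries the lecture alludes to:

```python
from fractions import Fraction

z = 1 / 3

# Default formatting shows the shortest string that round-trips to this float.
print(f"{z}")

# Format spec: ":" begins it, ".50" asks for 50 digits after the decimal
# point, and "f" means fixed-point notation, much like %.50f with printf in C.
# The digits past the 16th or so reveal the nearest binary approximation.
print(f"{z:.50f}")

# Fraction stores an exact ratio of two integers, sidestepping
# floating-point imprecision when exactness really matters.
exact = Fraction(1, 3)
print(exact)
```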
But we're only seeing 16 or so
digits here. What if we want to see, like,
50 digits and really start poking around
at what's being represented? Well, the
syntax is a little weird, but in Python,
using an f-string, you can do tricks
similar to what we did with %f
with printf in C. If, after my
variable's name in this set of curly
braces, I put a colon and then a dot,
because I want to see numbers after the
decimal point, then something
arbitrary like 50 to show me 50 digits after
the decimal point, and then f to treat this as a
float, I get {z:.50f}. This is a crazy incantation, I do
think, of a format string; even I am sort
of cheating off of the paper in front of
me. But this is how you format strings if
you want to see them with a little
more precision, or so I think. If I rerun
Python of calculator.py and do 1
divided by 3, darn it, we're still in
the same mess that we were before. Now,
why is this? Well, it's still the case
that I'm running the code on the same
kinds of computers that I did before.
It's still the case that these computers
only have a finite amount of memory. And
so, even though I'm clearly manipulating
floating-point values, Python is only
allocating, say, 64 bits to those float
variables. And so, there's only so much
precision that's possible. What
we're seeing is essentially the closest
representation to an infinite number of
threes that we can represent in
binary using a floating-point
representation. So it's still a
problem, but I do think in Python you'll
find that there are so many more libraries
out there, third-party software that comes
not just with the language itself but
from others, whereby you can use
libraries for more precise scientific
computing that essentially implement
their own versions of floating-point
values, so that you can use not 64 but
128 or more bits when it
really matters to some level of
precision. Thankfully, though, one problem
is at least solved for us, namely integer
overflow. Recall that this was
another problem we ran into, whereby if
you tried counting higher than, say, 4
billion, or even higher than 2 billion if
you're representing negative numbers,
which halves the total range that you have
available to you in the positive range,
we ran into the situation where it
somehow wrapped around, became negative,
and then even ended up being zero as a
result. Well, Python wonderfully
nowadays just gives you more and more
bits as needed if your integers are
getting larger and larger. So this is a
wonderful feature, in that we've at
least addressed one fundamental
limitation we ran into in C, and this
time the language itself provides us a
solution. Python, too, has some pretty handy
features. One of them is what
are called exceptions. And so an
exception in Python is a way of handling
error conditions without relying on
return values alone. Recall that in C,
if you ever wanted to signify that
something went wrong, you had to return
something like, most recently, NULL, N-U-L-L,
which was a special sentinel value;
technically it's just the zero address. By
checking for that, you can make sure that
you know whether you're getting back a valid
pointer or not. And in other functions, if
something went wrong, you might similarly
have to check the return value, maybe
checking for zero or negative one or one
or something like that. But return values
were the only way in C that functions
could communicate back to the programmer
that something went wrong. And this is
problematic because if you imagine
implementing a function that's supposed
to return maybe an integer, whether
positive, negative, or zero, it's kind
of unfortunate sometimes if you have to
steal one of those values and say,
uh-uh, you can't use this value. It's
fine in the world of pointers because
the world decided years ago that we're never
going to use the actual address 0x0,
the zero address. But that's still
technically costing us one or more bytes
of space. And in general, it's a bit
annoying if your function can't truly
return all possible values. Think about
a function like get_string. If something
went wrong in get_string, what do you
want to return? Well, as we saw in the C
version of the CS50 library, we do in fact return NULL
once we introduced that. But in general,
wouldn't it be nice if functions could
somehow signal out of band, so to speak,
that something went wrong? So, by that I
mean this: let's go into a new program
that's inspired by one of our programs
today. And in VS Code, I'm going to go
ahead and close my calculator, open my
terminal window, and create a new
program called integer.py. So in
integer.py, let's just play around with
some integers and see what we can break.
So here, I'll define a variable called
n and set it equal to the return value of the input
function, which comes with Python, just
asking the human for some input. Then
I'm going to go ahead and ask a
question: is the user's input numeric?
And it turns out if you read the
documentation for strings in Python,
they come with not just an upper
function and a lower function, aka methods,
but also an isnumeric function, or method,
that tells you whether or not the string
itself happens to be numeric, that is,
looks like a number. All right. So I
think if I do that, I could then do
something like this. If n is numeric,
I'm going to go ahead and claim that in
fact it is an integer. Else if it's not
numeric, I'm going to claim that it's
not an integer. I have no idea what it
is. Maybe it's cat. Maybe it's dog.
Maybe it's a mix of numbers and letters,
but it's definitely not an integer as
defined by a sequence of decimal digits
in this case. All right, so let's try
this out. Python
of integer.py. Enter. We'll type in one.
That's an integer. We'll type in two.
That's an integer. We'll type in zero.
That's an integer. Type in cat. Not an
integer. So that seems to in fact work.
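For reference, here is a minimal sketch of that first, isnumeric-based version of integer.py. It assumes the check is pulled into a function (the name `describe` and its exact messages are hypothetical) so it can be tried without typing anything; in the lecture's program you would pass it the result of `input()`. One caveat worth knowing: `str.isnumeric()` looks only at characters, so input like `"-1"` is reported as not numeric.

```python
def describe(text: str) -> str:
    """Classify text using str.isnumeric(), as in the first version of integer.py."""
    if text.isnumeric():
        return "Integer."
    else:
        return "Not integer."


# In integer.py this would be something like: print(describe(input("Input: ")))
for sample in ["1", "2", "0", "cat", "-1"]:
    print(sample, describe(sample))
```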
But what if I wanted to immediately
convert this to an int, as we did in the
past? So let me modify this a little
bit here and say instead that n equals
not just input,
asking the user for an integer, or rather,
let's just ask them more generally for
input, but let's assume that we want to
convert this input to an int. And
actually we can go ahead and say integer
here. All right. Well, here I'm going to
go ahead and just print out the claim
that, yep, this is an integer, because if
we get to line two, well, clearly we've
handled the user's input correctly.
In other words, how can I get away from
constantly checking the return values of
functions to make sure they are what I
expect? All right. Well, let's go ahead
and run Python of integer.py now. Enter.
Type in one; it tells me it's an integer.
Type in two; it tells me it's an integer.
Zero tells me it's an integer. Type in
cat. Notice this time what goes wrong.
Whereas last time we saw this kind of
traceback error message, it was a NameError
because I was using the meow
function's name too early. Now I'm getting
a ValueError, which is a different type
of error that relates to "invalid literal
for int() with base 10: 'cat'." Now that's a
mouthful. So unfortunately Python's
error messages aren't all that much
better than clang's error messages. But
clearly the interpreter does not like
the fact that I'm passing something to
int related to base 10, but that's quote
unquote cat. And really, the best you
can do with this kind of error is
realize like, okay, it's clearly the
case that cat is not an integer. So,
it's having trouble converting cat to an
integer. It makes no logical sense. All
right. So, what's the gist of the
problem? Well, I'm just blindly
converting the user's input to an
integer, even if it's not
an integer. Well, all right.
Well, I could rewind to the previous
version of my program, use the
isnumeric function, and then conditionally
convert it, but I'm trying to move away
from constantly checking return values
for errors. And wouldn't it be
nice if I could somehow catch this ValueError
and just deal with it if it
happens? And in fact, you can, with
Python exceptions, which exist in
other languages as well, Java among
them. You have the ability to sort of
listen for errors happening inside of
functions without having to rely on
return values alone. So, let me go back
to VS Code here, clear my terminal just
to simplify things a bit, and let me
literally say to the interpreter: please
try to execute the following two lines
of code, except if something goes wrong,
like a ValueError, in which case go
ahead and print out something like "Not
integer." Wouldn't it be nice if you
could just wrap all of the code you've
written in CS50 thus far with try and
sort of ask the computer politely, like,
please try to execute this code? But
that really is the semantics behind
it: try to execute these lines of code,
except if there's an error, then do this
other thing instead. And therefore, you
don't have to check any return values.
You can just blindly pass the output of
the input function as the input to the
int function, knowing that if something
goes wrong inside of there, Python is
going to execute this other code instead,
the code under except. So let
me go ahead and run Python of integer.py
now. I'll type in one, and that works
because it's trying to execute line two
and succeeding. It's trying to execute
line three and succeeding. So lines four
and five never actually kick in. But if
I try again here with cat, line two is
going to fail. Line three is never going
to get reached because Python is
immediately going to jump to this
exception handler, so to speak, thereby
catching the error, or the exception, and
printing "Not integer" instead. So it's a
little bit of a weird convention. It's
different from what C offers, but a lot
of newer languages nowadays do offer
this because it's a better way of just
writing code that you know should work
99% of the time. But if something does
go wrong, say you run out of memory, or the human types
something wrong in, or something like
that, you can handle all of those
exceptional cases, exceptional in a bad
sense, using this except keyword instead.
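A minimal sketch of the try/except version, again wrapped in a hypothetical `describe` function so it runs without a prompt. The conversion is attempted unconditionally and only a `ValueError` is caught; any other exception would still propagate. Unlike `isnumeric()`, `int()` accepts a leading minus sign and surrounding whitespace, so `"-1"` now counts as an integer.

```python
def describe(text: str) -> str:
    """Try the conversion first and handle failure afterward (try/except)."""
    try:
        n = int(text)  # raises ValueError for text like "cat"
    except ValueError:
        return "Not integer."
    return f"Integer: {n}"


# In integer.py this would be something like: print(describe(input("Input: ")))
for sample in ["1", "cat", "-1"]:
    print(sample, describe(sample))
```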
Questions on any of this here technique?
Yeah?
>> A really good question. In this case, I
used a ValueError. Do I need to define
every possible thing that can go wrong?
Short answer, yes. Now, there aren't
terribly many. There are some standard
ones, and they're all capitalized in this
way, each word capitalized and ending in
Error, like ValueError. You can even
invent your own. And it's good
practice to enumerate the kinds of
things that you think can go wrong.
ValueError is pretty generic, but there
could be memory-related errors, like MemoryError. There
could be file-not-found errors, like FileNotFoundError.
There's a bunch of different exceptions,
all documented in Python, that
you can listen for. That said, as nice
as Python's documentation is overall, it
is not good at documenting, for specific
functions, what exceptions they can
raise. And I've never understood
why, after all of these years, no human
has gone into the documentation and
painstakingly enumerated all of the
possible things that can go wrong.
What's too often the case in the real
world, with some of my own code included,
is that if you encounter an exception that
you didn't think was going to happen,
you go in and improve your code and add
to this list of except clauses: what
else might go wrong? It shouldn't be that
way. And different libraries are better
about documenting these things.
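To illustrate listing several except clauses, and inventing an exception of your own, here is a sketch built on a hypothetical `lookup_score` function and a hypothetical `NegativeScoreError` class; neither comes from the lecture or the CS50 library. Python checks the clauses top to bottom and runs the first one that matches, so a subclass like `NegativeScoreError` has to be listed before its parent, `ValueError`.

```python
class NegativeScoreError(ValueError):
    """Hypothetical custom exception; subclassing ValueError keeps it in that family."""


def parse_score(text: str) -> int:
    score = int(text)  # raises ValueError for text like "cat"
    if score < 0:
        raise NegativeScoreError(f"negative score: {score}")
    return score


def lookup_score(scores: dict, name: str) -> str:
    """Look up and parse a score, with one except clause per anticipated failure."""
    try:
        return f"{name}: {parse_score(scores[name])}"
    except KeyError:  # name is not in the dict at all
        return f"{name}: no score"
    except NegativeScoreError as error:  # must precede ValueError, its parent class
        return f"{name}: {error}"
    except ValueError:  # int() could not convert the text
        return f"{name}: not an integer"


scores = {"Kelly": "72", "David": "cat", "John": "-5"}
for name in ["Kelly", "David", "John", "Brian"]:
    print(lookup_score(scores, name))
```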
All right. Well, with that in mind, let
me propose that in the CS50 library for
Python, get_int and get_float work
just like the C library's, whereby if you
type cat or dog or bird into those
functions, they just reprompt you.
And long story short,
this is the kind of code we wrote in
Python: try to get input from the user,
except if something goes wrong, prompt
them again and again. So, we
too were using precisely these features
even though it wasn't something that was
available to us in C. All right. But
something else that we did in C was play
around with Mario in a few different
forms. And in lecture, recall, a few weeks
back, we experimented with using
some ASCII art, some very simple text, to
print out something like this column of
height 3. Well, how can we go about
printing something like this? Well, I
would propose that if I go back to VS
Code here, close out my integer
examples, and code up a new version of Mario
in mario.py, this one's kind of simple.
I can say something like for i in range
of three, go ahead and print out quote
unquote a hash. Down in my terminal
window, Python of mario.py, and I've got
really the closest analog to three
bricks stacked on top of each other in
this way. But in C, eventually,
our implementation of Mario started to
get a little fancy, and we started to
prompt the user for the height
of the wall, and therefore we could have
not just three but maybe four or even
more bricks being printed. So, let me
actually open up that version from a few
weeks back, whereby from week one we had
a version of Mario that looked like this,
whereby, after including some header
files, we declared in main a variable called
n. Then we saw a new construct at the
time, a do-while loop that just keeps
using get_int, get_int, get_int so long as
n is not one or greater, or equivalently,
so long as n is less than one, and kept
prompting the user again and again. The
reason for having n up here, recall, was
issues of scope. This way, it's
accessible lower in the function as
opposed to being confined to those
curly braces. And then down here we used
a for loop to actually print out that
many hashes. So in short, the do-while
loop solved the problem in C whereby you
want to get user input at least once and
maybe again and again and again if they
don't cooperate the first time. And
that's where do-while loops really shine: do
something at least once and maybe again,
again, and again. Otherwise, it's a
little more annoying to do it with while
loops or for loops. Unfortunately,
Python does not offer a do-while loop.
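Since Python has no do-while loop, a common stand-in is an intentionally infinite `while True` loop that breaks once the input is acceptable. The sketch below is one assumption-laden version of that idea: it adds the try/except that the lecture deliberately skips (so cat or dog just prompts again, much as CS50's `get_int` is described as doing), and its `read` parameter, defaulting to `input`, is a hypothetical addition so the loop can be exercised without a keyboard.

```python
def get_height(read=input) -> int:
    """Prompt until read() yields a positive integer: do at least once, repeat as needed."""
    while True:
        try:
            n = int(read("Height: "))
        except ValueError:
            continue  # not an integer (e.g., "cat"), so prompt again
        if n > 0:
            break  # acceptable answer, so leave the loop
    return n  # n is still in scope here; Python loops don't create a new scope


def print_column(height: int) -> None:
    """Print one "#" per line, stacking the bricks vertically."""
    for i in range(height):
        print("#")


# In mario.py, roughly: print_column(get_height())
```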
And so here, too, we have an opportunity
to introduce you to what the world would
call Pythonic. What is Python's solution
here? Well, on the right-hand side
here in mario.py, let's change this a
little bit and go
ahead and do
while True, with a capital T. Go
ahead and use a variable n. Set it equal to
int of input of
"Height", asking the human for the height
of the wall. And I'm going to just cross
my fingers that they're not going to
type in cat or dog or something that's
not an int. In this case, I'm going to
say if n is greater than zero, that is, a
positive number, that's useful. We can
proceed. I'm going to now break out of
this loop. And then lower in the file,
I'm going to say for i in range of n, go
ahead and print out the hashes. So we
still have that same lesson as before:
the Python version seems to be
shorter, more concise, even if you
ignore the comments on the left-hand
side. And I've completely avoided using
a do-while loop. But there are a few
things that are different nonetheless
that feel like, versus C, they shouldn't even
work. Like, what's weird about this
solution, even though I think it's
actually correct?
Yeah?
>> You have "in" twice.
>> Okay, so it's not correct. That's one
of the first things to point out:
too many prepositions. This was
supposed to say for i in range. Okay.
So, now that this program's correct,
what looks weird to you and probably
could break it? Yeah?
>> Yeah. So, the n variable
seems to be scoped to the while loop, at
least insofar as it's indented inside
the while loop, which feels analogous to
being inside of curly braces in C. And
so it seems weird that I'm presuming to
use n on line six even though it was
only defined on line two. It turns out
this is possible in Python. The issue of
scope that we encountered in C is not as
rigorously enforced, we'll say for today,
such that when you define n up here, you
can actually use it down here. And you
can think of this as being a little
reasonable, because there's no more
specification of what data type n is and
no more semicolon. Just imagine: it would
look kind of stupid if you just put a
bare n there and hit Enter just so it
kind of exists. There's no way to
express the idea of "create this variable
in advance" without actually assigning it
a value, whereas in C we could do that.
So this is in fact okay and correct.
What else is going on here? Well, instead
of a do-while, we're kind of just
implementing the idea of it. I'm
deliberately inducing an
infinite loop, like, do the following
forever, but then as soon as I have the
answer I want, like a positive integer
from the human, break out of this loop.
And this is indeed the Pythonic way to
get user input, because this will
minimally ask the user for a height once
and maybe more and more times. So: no do-while
loops, only while loops and for loops, and
only while loops are really the same as
in C. Even for loops, we've seen, are a
bit different. All right. Well, how
about, instead of just that Mario
example, recall this one, where we wanted
to print, like, four question marks in the
sky side by side. Well, we can do this
in a few different ways. Let me go back
to VS Code, close the C version, and
just completely change mario.py to
implement this. Now, I want four
question marks in the sky. So, I think I
can do something like for i in range of
four, go ahead and just print out quote
unquote question mark. Do you like this?
Python
of mario.py. Should I run it? No.
Why?
This is how I did it in C. Yeah?
>> Yeah. I've got to edit the end value, the
named parameter for the print function,
because otherwise if I hit enter,
they're all on different lines, which is
not the effect I want when all four
question marks are meant to be side by
side. All right. Well, that's an easy
fix. I can pass the named parameter
called end into the print function and set
it equal to quote unquote, with double
quotes or with single quotes. As always,
stylistically, I would be consistent.
So, I'm going to use double quotes even
though the documentation consistently uses
single quotes. Now, I'm going
to rerun Python of mario.py. And
I'm so close. Now they're on the same
line, but the stupid cursor didn't move
to the next line. That's fine. How to
fix this? Well, just logically, I can
put a blank print statement below.
Even though I'm not passing in
any arguments, you get a new line for free
when calling print, so I am getting the
aesthetic effect that I want. So that is
a perfectly reasonable way to do it.
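A minimal sketch of the fix just described. The loop mirrors the lecture's; the `print_row` name and its optional `file` parameter are hypothetical additions (they forward to print's own `file` argument, where `None` means standard output) so the output can also be captured elsewhere.

```python
def print_row(n: int, file=None) -> None:
    """Print n question marks side by side, then a single newline."""
    for i in range(n):
        print("?", end="", file=file)  # end="" replaces print's default "\n"
    print(file=file)  # a bare print() supplies the final newline


print_row(4)  # ????
```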
Now, if you feel yourself becoming a bit
of a geek, though, in learning about
Python and previously C, you can even
solve this problem even more
Pythonically by saying print quote
unquote question mark times 4, using
multiplication, similar in spirit to the
plus operator for concatenation, to
repeat the question mark
four times. So now if I go down here and
run Python of mario.py, I get a very
elegant solution to exactly that same
problem, even more concisely than my
previous version. What if I want to do
something in two dimensions? Well,
recall that we moved to the underground
of Mario Brothers here, and we had, like, a
3x3 grid of bricks. How can we do that?
Well, in C, we had nested for loops
using i and j back in the day. And I
could do the same thing in Python. Let
me go back into VS Code here and let me
do one outer loop, for i in range of
three. Then let me do an inner loop, for
j in range of three. Then let me go
ahead and print out a hash. But let me
learn from my past mistakes. I don't
want to print out a new line every time,
so let's override that default. But
after each row, let's print a new line.
So down here, I can go into mario.py,
run it, and I've got my 3x3 grid of
bricks. I could change this a little bit
and call these row and column, even
though here, too, I'm not
literally using row and column anywhere
explicitly. But semantically it kind of
explains a little more clearly to the
reader what's actually going on, so that
might help. But we could tighten this up
too, right? If I just want to print a
3x3 grid, well, I know that the outer
loop here will iterate three times. And
I know how to very elegantly print
things out with a one-liner. So I could
just print out a hash times three in
this case. And then down here, I can go
to Python of mario.py. And voila, I'm back
in business, too. So it's just sort of
easier to do these kinds of things and
express yourself all the more
succinctly. Well, what else can we do?
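To compare the two approaches side by side, the sketch below builds the grid both ways. Unlike the lecture's code, which prints directly, these hypothetical helpers return strings so the results can be checked against each other. Using `*` with a string and an int repeats the string, which is also what makes `"?" * 4` work for the row of question marks.

```python
def grid_nested(size: int) -> str:
    """Build the grid the C way: an inner loop adds one '#' per column."""
    lines = []
    for row in range(size):
        line = ""
        for column in range(size):
            line += "#"
        lines.append(line)
    return "\n".join(lines)


def grid_short(size: int) -> str:
    """Build the same grid by repeating a string: '#' * 3 is '###'."""
    return "\n".join("#" * size for row in range(size))


print(grid_nested(3))
print(grid_short(3))
print("?" * 4)  # ????
```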
Well, it turns out in Python that, unlike
arrays in C, you can ask lists how long they
are. So you don't have to keep around a
variable of how large an array is. You
can just add stuff to a list and then
ask Python: how long is this list? How
many elements are in it? Case in point,
let me go back to VS Code and close
mario.py, and let's reimplement from a
few weeks back the notion of
calculating the average
quiz score that you might have in a
class. So in score.py, let's go ahead
and create a program that's got a list
called scores of three scores that we've
seen before: 72, 73, and 33. And recall
that we tried a few weeks back in C
to average these together. And to do
that, we had to add them all together.
We had to divide by the total number
of elements in the array. It wasn't
that hard. It was sort of like grade
school arithmetic to calculate an
average. But Python has more functions
available to us, not just length, but
even summation. So let me go ahead and
do this. Let me say that my average
variable shall be the sum of those
scores divided by the length of those
scores. And indeed, per the
documentation, Python has a length
function, len for short, and a sum function,
which adds
together all of the elements in that
list. And so down here now I can say
something like print with an f-string, or
format string, that the average is
whatever that value is. And I don't have
to do any loops or math myself. I can
just call the functions like I could in
Excel or Google Sheets or Apple Numbers.
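Here is roughly what that looks like in score.py, as a sketch using the hard-coded scores from the lecture. `sum()` and `len()` are built-ins; the second print, with a `:.2f` format specifier, is an addition that rounds to two decimal places, in the spirit of the format strings mentioned earlier.

```python
scores = [72, 73, 33]

# sum() adds up the elements and len() counts them: no loop or counter needed
average = sum(scores) / len(scores)

print(f"Average: {average}")      # prints the float with many digits
print(f"Average: {average:.2f}")  # Average: 59.33
```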
Python of score.py,
Enter. And my average is in fact
59.33333, with some weird imprecision
at the end there. And in fact, just for
consistency with our C code, let me
rename this. I'm going to rename score.py
to scores.py, plural. That's going to close
the window. But now at least you'll see
online that we have a program indeed
called scores. Well, this is not that
interesting because I've just hard-coded
my 72, my 73, and 33. What if we want
the human to be able to type them in?
Well, I think we can do that, too. So,
let me actually open up that version of
the file, now pluralized. Let me go ahead
and not initialize the list for the
human, but let me set it equal to an
empty list, just using an open square
bracket and close square bracket, like
an array that has nothing in it. But
this one is literally of size zero at
the moment. And now let me do for i in
range of, let's just for now ask the user
for three scores, even though we could
certainly ask the user how many scores
they want to input and then use that
number instead. So in each of these
iterations, let's ask the user for a
score using something like int of input of
"Score". I'm going to set aside the
reality that if the user types in cat or
dog, the whole thing's going to break,
and therefore I should really add my try
and my except. But I'm going to discard
that error checking and focus only on
the essence of this program for now. Now,
after line three, if I have in a score
variable the user's quiz score, how do I
put it into that array? Well, into that
list? With an array, I had to use
the square bracket notation, keep track
of how big it is, and use, like, bracket i
or something like that. No longer in
Python, because a
list is an object that has not only
data but functions, aka methods,
associated with it. I can just call a
method that comes with every Python list,
called append, and pass in that score,
using that same dot notation as before.
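A sketch of that input-driven version. The function name `collect_scores` and its `read` and `count` parameters are hypothetical, added so the loop can be tested without typing; like the lecture's version, it has no error checking, so non-numeric input raises `ValueError`.

```python
def collect_scores(read=input, count: int = 3) -> list:
    """Start from an empty list and append one int per prompt."""
    scores = []  # size zero to begin with; no capacity chosen in advance
    for i in range(count):
        score = int(read("Score: "))  # no try/except here, as in the lecture
        scores.append(score)  # the list grows by one element
    return scores


# In scores.py, roughly:
# scores = collect_scores()
# print(f"Average: {sum(scores) / len(scores)}")
```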
The rest of my code can stay exactly the
same. If I now run Python of scores.py
and I type in 72, 73, 33 manually,
though, I still get that same average. And
notice, I did not need to decide in
advance how big that list of scores was
going to be. Questions on what we've just
done with lists?
No? All right. Even cooler, for some
definition of cool, is that we can now
implement hash tables or, more
generically, dictionaries, sets of
key-value pairs, by just using a data type
that comes with Python. I claimed last
week that dictionaries,
and hash tables in particular, are
sort of the Swiss army knives of data
structures, in that they just let you
associate some pieces of data with
others. With Python, you do not need to
jump through the hoops that you needed
to with problem set five, implementing
your own spell checker and your own
hash table. You just create a dict object in
Python, a dictionary that gives you the
ability to associate keys with values.
So, case in point, let's do this. Let me
go back into VS Code and close out
scores.py, and let's create a new and
improved version of our phone book in
phonebook.py. Let's go ahead and come
up with a list of names, just to
demonstrate how we could store a bunch
of names in the phone book irrespective
of numbers, and set those equal to, say,
Kelly's name and my name and John
Harvard's name, just by putting three
quoted strings, or strs, inside of this
list. Now let's ask the human, using the
input function, for the name that they
want to search for in this list. And now
let's implement linear search using
Python. I can do this in a bunch of
ways, but one way is to say for each
name, we'll call it n, in names, go ahead
and ask the question: if the name I'm
looking for equals the current name in
the list that I'm iterating over, go
ahead and print out something
generic like "Found" and then break out of
this loop. And let's see if we can find
Kelly or David or John or someone else.
Python of phonebook.py. Enter. Searching
for the name, say David. Enter. And it
was in fact found. Let me go ahead and
search for someone else's name that's
not in there, Brian. And now it's not
found, although it's not all that
enlightening to just ignore the query
altogether. It would be nice to say "Not
found." And here's where, in C, it
would be kind of nonobvious to do this:
if you wanted to print out "Found,"
or, if you get through the whole list and
you still haven't found the user, print
"Not found," you'd have to, like, keep track
with a variable of whether or not you
found the person, or you'd have to return
from the code prematurely just to get
out of it logically. It turns out, somewhat
weirdly but wonderfully usefully, for
loops in Python can have else clauses
associated with them, whereby I can say
down here print "Not found." If I run this
version of the program and search for
someone who's not in the phone book, like
Brian, now I actually see "Not found."
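A minimal sketch of that for...else search. To make it testable, it stores the message in a variable and returns it rather than printing; the `search` name is hypothetical. The else block belongs to the for loop, not to the if, and runs only when the loop finishes without hitting `break`.

```python
def search(names: list, name: str) -> str:
    """Linear search using for...else instead of a separate 'found' flag."""
    for n in names:
        if name == n:
            result = "Found"
            break  # skips the loop's else clause
    else:
        result = "Not found"  # reached only if break never ran
    return result


names = ["Kelly", "David", "John"]
print(search(names, "David"))  # Found
print(search(names, "Brian"))  # Not found
```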
Semantically, it's a little weird, but
essentially what's happening is if you
get through this whole loop and you
never call break, then you've not
actually broken out of the loop, so
you're going to hit the else. And in
that case, you're going to print out "Not
found." And it's such a common thing
to do this kind of bookkeeping, to
keep track of whether or not something
has happened inside of a for loop, and
if so, do this, else do that. Else
literally handles that scenario in
Python. And this is the most C-unlike
thing that we've perhaps seen in terms
of features, at least with regard to
loops. All right. Well, it's great
that I've kind of implemented linear
search, but we did that in C, and
it's getting a little tedious. Can't we
do better? We actually can. Let me clear
my terminal and tighten this up. Instead
of iterating over every name in names,
just like we keep iterating over
integers in ranges, and checking for each
name if it equals the thing we're
looking for, you can actually do
something much more clever. You can just
literally ask Python if the name you're
looking for is in the names list, then
go ahead and print out "Found," else
print "Not found." And so this is where
Python, too, gets kind of cool. In line
five, you have just a simple if
condition with a Boolean expression, name
in names. How does Python know if name
is in names? It uses linear search,
presumably, to search over the whole list
of names looking for what you care about,
and then tells you True or False: did it
find it? You don't have to write the
code to iterate over it with a while
loop or for loop or whatnot. You just
say what you mean. And so here, too, it's
a little more English-like: if name in
names, question mark, then print "Found,"
much more so than it would be
pronounceable in C. So that's one other
cool feature that we now have at our
disposal. What's yet another? Well, when
it comes to dictionary objects, not in C
but in Python, a dict object really
just gives you a set of key-value pairs.
And we've seen this kind of chart before
whereby we might have name and number
and name and number and name and number.
How do we translate this to code?
Because in C, as with problem set 5, it
was going to be quite an undertaking to
be able to store a whole bunch of things
in memory in the form of something like
a hash table. Well, in Python, we can
actually define a dictionary ourselves.
So, these square brackets represent a
list, but I can alternatively use curly
braces for a very new purpose. I'm going
to go ahead and hit Enter just to move
the second curly brace to a new line.
And I am going to now enumerate a bunch
of key-value pairs, namely, quote
unquote Kelly for the first key, colon,
then +1-617-495-1000
as the number. Then I'm going to
go ahead and do quote unquote David for
the second key. And since we both work
here, I'm going to go ahead and just use
that same number as we've done
before. Then a third key for John
Harvard, colon. And for John, we'll use
+1-949-468-2750,
which is fun to call or text. This,
even though it's syntactically a little
different, gives me the equivalent of
this chart here: key-value pairs, where
the keys are the staff names and the
values are the staff numbers. That
implements all of that, a hash table, if
you will, in Python's own syntax. So,
how do I now use this? It turns out I can
actually use it in exactly the same way.
I'm going to go ahead and rename
this now to people, because it contains
not just names but names and numbers. So
I'm going to change this variable down
here to people, too. But notice the
syntax now. I can still ask the human
for a name they want to look up. I can
now still ask if the name is in the
people dictionary. And by definition,
Python's going to interpret that
preposition, in, as meaning: is the
following key in the dictionary? And if
so, the expression is True. But
what's cool about this is more than
just making this work as follows: Python of
phonebook.py. And let's type in David.
And there's my number. Oh, that's not my
number. It just says "Found." Let's run it
again and type in, say, Brian. "Not found."
Okay, that's as expected. But I'd like
to know what my number is, or Kelly's
number, or John's number. Well, that's an
easy fix, too. Inside of this
conditional, I can say something like
this: number equals people, bracket, name.
And we've not seen this before, but we
have seen square brackets in C when we
had arrays. This square bracket notation
is how you indexed into an array to get
a specific value at 0, 1, 2, 3, 4. What's
amazing about dictionaries, not just in
Python but in other languages as well, is that
you can now index into a dictionary just
as you can index into an array. But
whereas with an array you use numeric
indices,
here in dictionaries you use strings as keys.
You can use strings to look up their
corresponding values. So to be clear,
name at this point is given to us by the
human's input. So if I typed in D-A-V-I-D,
name equals "David." So this is like
saying people, square bracket, quote
unquote David: find David's number. That
stores the answer from this two-column
chart in the variable called number. And
all that remains is for me to print it
out, which I can do using an f-string.
Let me go down into my print
statement, change this to an f-string,
add a colon, add the number variable to
be interpolated, rerun this program as
Python of phonebook.py, type in my
name, and there's my number as found.
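Putting those pieces together, a sketch of phonebook.py might look like the following, wrapped in a hypothetical `lookup` function (the lecture reads the name with `input()` instead). The last lines contrast `in` on a list, which scans element by element, with `in` on a dict, which checks keys via hashing and is typically constant time on average.

```python
people = {
    "Kelly": "+1-617-495-1000",
    "David": "+1-617-495-1000",
    "John": "+1-949-468-2750",
}


def lookup(name: str) -> str:
    """Check membership with `in`, then index the dict by its string key."""
    if name in people:  # `in` on a dict tests the keys
        number = people[name]
        return f"Found: {number}"
    return "Not found"


print(lookup("David"))  # Found: +1-617-495-1000
print(lookup("Brian"))  # Not found

names = ["Kelly", "David", "John"]
print("John" in names)   # True: linear search over the list
print("John" in people)  # True: hash-based key lookup
```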
And this is incredibly powerful, and it's why,
again,
hash tables and, in turn, more generally
dictionaries are sort of the Swiss army
knife. Being able just to look up data
with such simple syntax is wonderfully
useful and powerful. And in fact we can
even do more than this. For instance,
let me propose that if you think about
other incarnations of key-value
pairs, you see them all the time, for
instance, in spreadsheets. Like,
here's a screenshot of Google Sheets
whereby I've got the beginnings of a
spreadsheet with names and numbers.
But in this model, I want to actually
associate some metadata with my data. So
the data I care about is the actual
names and numbers. But you could imagine
having a third column like email address
and maybe home address or any number of
other pieces of data associated with
these three people. For now, I've just
got two columns or two attributes, names
and numbers. Each of the rows in a
spreadsheet, as most anyone knows who's
used a spreadsheet before, represents
different records or different pieces of
data, like this is Kelly, this is David,
this is John, and so forth. We can
implement this idea using dictionaries
and lists together. So the syntax is
going to be a little strange at first,
but let me go back to VS Code here and
let me change my people dictionary to
be a people list, between square
brackets. And the elements of this list
now are going to be dictionaries
themselves. I'm going to use some curly
braces inside of these square brackets
and say that the name of one person is
quote unquote Kelly, and the number for
that person is quote unquote
+1-617-495-1000, close quote, then a comma on the
outside of the curly braces. Then I'm
going to have another quote unquote name,
colon, D-A-V-I-D, comma, then another number,
colon, and I'm going to borrow the same phone
number because we both work here. Then,
lastly, a comma and finally quote unquote
name, colon, quote unquote John, and then
lastly a quote unquote number for John,
colon, +1-949-468-2750.
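A sketch of that list of dictionaries, with each dict playing the role of one spreadsheet row. Because `in` on a list compares whole elements, finding a person by name means looping over the rows; the `lookup` helper here is hypothetical and anticipates the loop the lecture goes on to write. Any row could also carry extra keys, such as an email address, without changing the others.

```python
people = [
    {"name": "Kelly", "number": "+1-617-495-1000"},
    {"name": "David", "number": "+1-617-495-1000"},
    {"name": "John", "number": "+1-949-468-2750"},
]


def lookup(name: str) -> str:
    """Scan the rows in order, comparing each row's "name" value."""
    for person in people:
        if person["name"] == name:
            return f"Found: {person['number']}"
    return "Not found"


print(lookup("John"))   # Found: +1-949-468-2750
print(lookup("Brian"))  # Not found
```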
All right. So what's going on here now?
Our people variable is now not just a
simple dictionary with individual
key-value pairs: name, number, name, number,
name, number. We now have a more
generalized way of storing not just a
name or a number but an email address or
a home address or any number of other
values. How? Well, the commas just
separate the key-value pairs now. So, if
I do have email addresses for us, I can
put comma, quote unquote email, colon, and so forth,
and I can just keep adding these
key-value pairs to each of the dictionaries,
because a dictionary is a collection of
key-value pairs. So it stands to reason
that I can associate name with David,
number with the number, email with
malan@harvard.edu, and so forth, effectively
implementing this idea now in the
computer's memory. And at the risk of
significantly oversimplifying, this is
what Google and Microsoft and Apple are
doing with their spreadsheet software.
They have written code that presents to
you a nice table with a graphical user
interface on the screen, but underneath
the hood, what they effectively have is
lists of dictionaries representing each
of those rows. And we're going to come
back to this when we start experimenting
before long with our own databases.
We're going to get back rows of data from
databases, and we are going to store that
data in lists of dictionaries for the
same reason as well. So, how can we use
this? Well, let me hide my terminal for
a second and tweak the program just a
little bit. I'm still going to get the
name of a person to look up their
number. I'm still going to uh how about
iterate over this because I've lost the
ability at least for now to just ask a
question like is this name in the
structure because it's a list I do now
need to iterate a little bit
differently. So I'm going to do for each
person in the people list go ahead and
check is the current person's name equal
to the name I'm looking for and if so go
ahead and create a variable called
number. set it equal to that person's
number and then go ahead and print out
for instance found colon then in my
curly braces that specific number and
then after all that break out of this.
So this is a mouthful, but recall that
it's all the same syntax we've seen
before in smaller parts. Square
brackets mean here comes a
list. What are the elements of this
list? Three dictionaries, back
to back to back, each of which has a key
and a value and another key and a value,
called name and number, respectively. The second
one temporarily has name and number and
email as keys, plus three values, and the
third one has keys of name and number as
well, with their corresponding values. So
when I iterate over each person in the
people list, that means on each iteration
person is going to be set to this
dictionary, then this dictionary, then
this dictionary. On each iteration I'm
asking this question: is the
value of that person's name key equal to
the name I'm looking for? And if so, grab
a variable called number, set it equal to
the value of that person's number key,
and then just print it out. And if I
wanted email instead, I'd tweak the word
number to email. If I want to look up
anything else, I can tweak that code
there. But being able to index into
dictionaries using strings is sort of
the fundamentally powerful new technique
that we have here.
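For reference, here is a minimal sketch of the people list and lookup loop just described, with the phone numbers from the dictation above. The helper name find, the hard-coded name instead of a call to input, and returning None when nobody matches are all assumptions made for this illustration, not the lecture's exact program.

```python
# A list of dictionaries: each dictionary plays the role of one spreadsheet row
people = [
    {"name": "Kelly", "number": "+1-617-495-1000"},
    {"name": "David", "number": "+1-617-495-1000"},
    {"name": "John", "number": "+1-949-468-2750"},
]


def find(name, key="number"):
    """Return person[key] for the first person named `name`, else None.

    `find` is a hypothetical helper for illustration only.
    """
    for person in people:
        # Index into each dictionary using a string key
        if person["name"] == name:
            return person[key]
    return None


name = "John"  # the lecture's program asks the user via input() instead
number = find(name)
if number is not None:
    print(f"Found {number}")
else:
    print("Not found")
```

Calling find(name, "email") would only work once every dictionary has an "email" key; otherwise person[key] raises a KeyError.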
Question now on any of this? Yeah.
>> If both
>> Good question. If you wanted both name
and number on the screen, do you
concatenate? Sure, you could do that. Or
you could pass both to the print function,
separated by a comma, and print them out
that way. Absolutely. However you want
to format it. And actually, just as an
aside, even though this becomes a
little less readable, it's a little
silly that on line 11, I'm declaring a
variable called number only to use it
one line later and then never again.
Technically, with those curly braces and
format strings, I could just take this
code on the right, plug it into those
curly braces, and get rid of this
variable altogether. At some point,
though, f-strings start to get a little
too hard to read, with quotes inside of
quotes. And so I kind of prefer
being a little more pedantic about it,
explicitly putting it in a variable
and then interpolating just that
variable. But you could do it in
different ways still.
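To make that trade-off concrete, here is a small sketch comparing the two styles. It assumes a single dictionary named person standing in for one element of the people list. Note the single quotes around 'number' inside the double-quoted f-string: reusing the same kind of quote inside an f-string is only allowed from Python 3.12 onward.

```python
person = {"name": "David", "number": "+1-617-495-1000"}

# Style 1: copy the value into a variable, then interpolate just that variable
number = person["number"]
explicit = f"Found {number}"

# Style 2: index into the dictionary right inside the curly braces;
# the inner key uses single quotes so it doesn't end the outer string
inline = f"Found {person['number']}"

print(explicit)
print(inline)
```

Both lines print the same text; the first is arguably easier to read, at the cost of one extra variable.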
All right, a couple of final features of
Python that'll get us on our way with
doing other things. It turns out there's a
whole bunch of libraries that come with
the language itself that you nonetheless
have to import. They're not
third party, so you didn't have to install
them. You just need to add them to your
code by importing them. One of them is
sys. And among the things that the sys
library has in Python is the ability to
give you access to command-line
arguments. After all, we've seemingly lost
access to command-line arguments, because
there's no more main, at least by
convention. There's no int main(void).
There's no int main(int argc, string argv[])
going on in our code. But all of that
functionality is still available in a
library called sys. So how do we use
this? Well, let me go back to VS Code
here now and create a relatively
simple program called greet.py, similar
to one from a few weeks back, that's just going to
greet the user using command-line
arguments instead of get_string or the
input function. I'm going to do this by
saying from the sys library import argv.
In this case, argv is essentially just a
list. It is a list of the command-line
arguments that the human has typed. It's
a list, which means you can just ask the
length function, len, what its length is.
So, there's no need for argc anymore. You
can just literally ask argv how long it
is, which is kind of nice. So, I'm going
to say this: if the length of argv
equals 2, which means the human typed
two words at the prompt, let's go
ahead and greet them, assuming that's
their name, and say hello,
and then whatever their name is. Let me
make this a format string. And to be
pedantic, let me create a variable
called name and set it equal to argv
bracket 1, which is going to be the
second word that the human typed in, as
has been our convention in the past.
Else, if they didn't type exactly two
command-line arguments, let's just go
ahead and print out something like hello,
world as a generic greeting. Let me run python of
greet.py. Enter. And you see hello, world,
because I apparently did not type in
exactly two words, even though, arguably, I did. So
let's see where this is going. Let me
rerun python of greet.py, but type in my
name, David, at the command line. Enter.
And huh, I screwed up unintentionally.
What did I do wrong? All right, printf
is not a thing in Python. So that's an easy fix.
Let's delete the f. Let me clear my
terminal window and rerun python of
greet.py space David. Enter. And now I
get hello, David. The only thing that's
weird here is that I typed in three
words at the prompt, and yet I'm checking
for two. It's a bit subtle, but in
Python, argv leaves out the python
interpreter itself. It goes without saying that
you're using the Python interpreter to
run a Python program. So the only things
that are being counted are the words
after the python command itself. So
when I type greet.py and David, that's
two. When I only typed greet.py, that's
one instead.
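Here is a sketch of greet.py along the lines just described. Moving the logic into a function named greeting that takes the argument list is an assumption made for this illustration, so the behavior can be checked without a terminal; run as a script, it simply passes in sys.argv.

```python
from sys import argv


def greeting(args):
    """Build a greeting from a list shaped like sys.argv (hypothetical helper)."""
    # args[0] is the script name (greet.py), so one name means two elements
    if len(args) == 2:
        name = args[1]
        return f"hello, {name}"
    return "hello, world"


if __name__ == "__main__":
    print(greeting(argv))
```

With this sketch, python greet.py David should print hello, David, while python greet.py alone falls back to hello, world.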
All right. So now that I've done that, I
have access to my command-line
arguments. But what about my exit
statuses? This was getting a little low
level, but in recent C programs, we've
had you all returning zero on success and
one on error. Can we still do
that? Well, yes. And in fact, the sys
library is used for that as well. So if
I want to actually add some exit
statuses to a program, to facilitate
check50 and automated tests in the real
world, I can do that with a program
called, let's call this, exit.py. And
in exit.py, I'm similarly going to
import sys, but in a different way.
Let's go ahead and import the
whole library, just to demonstrate how
you can access things inside of it
without explicitly saying from sys
import such and such, as before. Then
I'll check if the length of sys.argv
does not equal 2. So this is a
little bit different, but I'm asking the
same kind of question. If so, I want to
go ahead and print out to the user
"Missing command-line argument,"
which is something we did a while back
as well. And then I want to exit with
code 1: sys.exit(1). Else, if I don't run
into that issue, I'm going to go ahead...
Actually, let's not even bother with an else.
For parity with our C version,
let's do this: print an f-string, quote unquote
hello, comma, then sys.argv bracket 1 in curly
braces, close quote, and then sys.exit(0).
All right, that's a
whole mouthful, but what's really going
on? So, I could have done from sys
import argv, but I don't need to
enumerate every single variable or every
single function that I want from a
library. I can also just more generally
say import the whole library: give me
access to everything, and then I'll tell
you what I want from it later.
Therefore, on line three, I can still
access argv. I just have to scope it to
the sys library. Saying sys.argv,
not just argv, means go inside of that library
and find me argv, rather than importing
it into a variable unto itself in my own
code. Why am I saying not equal to two?
Well, if they don't give me two words
after the interpreter's name, I want to
yell at them, say "Missing command-line
argument," and then exit 1. I'm not
going to give them a default hello, world
anymore. I want them to give me their
name. Meanwhile, if I get this far and I
haven't exited from the program, I can
print out sys.argv bracket 1, which is
going to be David in the example I typed
before. And this means success, so
sys.exit(0) signifies success. It's a bit
more syntax than it was in C, but we
have the exact same functionality
available to us as we have in the past.
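A sketch of exit.py, importing the whole sys module as described. Wrapping the logic in a main function that receives the argument list is an assumption for this illustration; the script itself would end with a call to main(sys.argv). Because sys.exit works by raising SystemExit, a caller such as a test harness can catch it and inspect the status.

```python
import sys


def main(args):
    """Greet the user named in args[1], exiting with status 1 if it's missing."""
    # args mirrors sys.argv: the script name plus any words typed after it
    if len(args) != 2:
        print("Missing command-line argument")
        sys.exit(1)  # nonzero status signals that something went wrong
    print(f"hello, {args[1]}")
    sys.exit(0)  # zero status signals success
```

After running the script in a terminal, echo $? shows the status the program exited with.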
How about one other example that we've
had in the past? Let's convert it to
Python as well, so you have a few more
tools in your toolkit. How about
implementing a version of this phone
book that actually persists? So instead
of hard-coding Kelly and David
and John into it in this way, let's let
the user type in a name and a number,
just like on your iPhone or Android
phone, and add it to a text file, like a
CSV file, as we did before, using
comma-separated values. Well, it turns
out that Python comes with a library to
handle CSV files. We don't need to
hackishly implement our own CSV support
by printing the commas ourselves.
Instead, we can import the csv library.
We can then create, say, a variable called
file, set it equal to open, and open a
file called phonebook.csv
in append mode. So this is almost the
same as C, except it's open instead of
fopen, which we saw a couple of weeks back.
Now let's ask the user, via the input
function, for the name they want to add
to their contacts and the number that
they want to add to their contacts. And
after that, let's go ahead and
do this, which is a bit of
muscle memory to remember: I'm going to
create a variable called writer, but I
could call it anything I want, and set it
equal to csv.writer,
which means there's a function called
writer in the csv library that I'm
accessing this way because I didn't
import it explicitly by name. And I'm
going to pass it that file. This tells
Python, turn that file into a CSV that
can be written to. On the next line of
code, I'm going to literally say
writer.writerow.
writerow is a method, aka function,
associated with this writer object. And
I know that only because I did actually
read the documentation for the csv
library. What do I want to write? Well,
I want to write a list of values, namely
a name and a number. And I'm using
square brackets to tell the writerow
function: here you go, here's a list
of values, two of them, a name and a
number. After all that, I'm going to do
file.close() and just close the whole
file. All right, so where does this
actually get me? Well, let me go ahead
and open up phonebook.csv, which is
initially empty. I'll move this over to
the right-hand side.
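Here's a sketch of that first version of phonebook.py, with the explicit open and close. The function name add_contact and its path parameter are assumptions so the code can be exercised without typing at a prompt; passing newline="" follows the csv module's own advice for files it writes to.

```python
import csv


def add_contact(path, name, number):
    """Append one name,number row to the CSV file at path (hypothetical helper)."""
    file = open(path, "a", newline="")  # "a" = append mode, like "a" with fopen in C
    writer = csv.writer(file)           # wraps the file so we can write CSV rows
    writer.writerow([name, number])     # the list's order decides the column order
    file.close()                        # we must remember to close it ourselves
```

In phonebook.py itself, you might call add_contact("phonebook.csv", input("Name: "), input("Number: ")).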
But when I now run this program with
python of phonebook.py,
Enter, I'll type in, say, Kelly's name.
Enter. +1-617-495-1000.
Enter. And voila, it ends up in the CSV,
using a little bit less code than we had
to last time with C. Let's run it once
more. I'll type in my name, and I'll
again use +1-617-495-1000. Enter. It's
being appended to that file as well. And
one last time, for John: +1-949-468-2750.
Enter. Voila. So it's pretty easy, that
is to say, in Python to start creating
files like this. But this isn't really
Pythonic. Let me in fact close the CSV
file, hide my terminal, and propose that
we can tighten up this code a bit, too. I
don't need to open up the file way up
here. I can go ahead and get my
variables' values this way first. And
in fact, I could have done that code a
little later anyway. But I can do this
in Python: I can say, with the following
file opened, phonebook.csv, in
append mode, and referred to as a
variable called file, do this stuff and
close the file yourself. So this program
is suddenly shorter,
because this one line has the effect of
opening the file for me in append mode and
assigning it to a variable; then it does this stuff,
and as soon as the program's
indentation ends and there's code over
here or no code whatsoever, the file
gets closed for me automatically. This
just helps us avoid memory leaks
and the silly mistakes we've made in C,
when you forget to close a file that
you've opened and you don't
necessarily notice unless you run valgrind
or something on it. Python tries to
avoid this by giving you a new keyword,
with, that doesn't really make sense
semantically except as "with the following
file open," and it will close the file for
you. So those are two of the features
that you get with Python. The
catch, though, is that this CSV is fairly
simplistic. In particular, it's missing
a header row that actually indicates
what is in each of the columns. In fact,
if I go ahead and run code of
phonebook.csv, we'll see again that the
file contains just one row for Kelly,
one for me, and one for John. Whereas, ideally,
it would look a little something more
like this Google Sheets version, which
actually has in its very first row
something like name and number, which
then describes the data therein, after
which are the three actual rows. Now,
the simplest fix here, frankly, would
probably be to just start with name,
comma, number at the top of the file and
then assume that my phonebook.py program
is just going to append, append, append
additional rows to the file containing
the names and numbers, respectively. I
could have done that from the get-go.
And in fact, that would be better than
putting some code inside of phonebook.py
that writes out that specific row,
because after all, if I'm
running this program again and again, I
don't want the header row to appear
again and again and again, unless I
complicate the program a little bit to
ensure that I only do that once. But
assuming that I do go into phonebook.csv
and from the get-go have a file that
contains name and number, we can
actually start to improve upon the
implementation of phonebook.py,
because we can take advantage of that
same header in the file. In fact, let me put these files
side by side here. And then in
phonebook.py, let's go ahead and transition
away from using a writer to using a
so-called dictionary writer, or DictWriter
for short. Capital D, capital W.
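Here is a sketch of where this is heading: the with statement from a moment ago plus a DictWriter that is told the header's field names. It assumes phonebook.csv already starts with a name,number header row, as proposed above; as before, the function name and path parameter are illustrative assumptions.

```python
import csv


def add_contact(path, name, number):
    """Append a row to a CSV file whose header row is name,number (hypothetical helper)."""
    # The with block closes the file automatically when the indentation ends
    with open(path, "a", newline="") as file:
        # fieldnames tells DictWriter which column each key belongs in
        writer = csv.DictWriter(file, fieldnames=["name", "number"])
        # The keys are deliberately out of order: DictWriter places each
        # value by matching its key against fieldnames, not by position
        writer.writerow({"number": number, "name": name})
```

Because each value is placed by key, the code keeps working even if the order of fieldnames changes later.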
And then let me go ahead and specify one
additional argument to this particular
function, namely fieldnames, which I
know exists because I looked it up in
the documentation. And the value of this
argument is supposed to be a list of the
fields that are presumed to exist in the
CSV that we're about to write to. So I'm
going to do quote unquote name, quote
unquote number. The line's a bit long, so
it's scrolling there. But if I scroll
back to the left, we'll see that the
line is otherwise unchanged. But when I
go down now to write each respective
row, notice that I don't have to rely on
this list, which just assumes somewhat
naively that name will always be in the
first column, or column zero, and number
will always be in the second, or column
one. After all, if someone were to move
that data around in the
spreadsheet, using Excel or Google Sheets
or something else, my code would end up
being fairly fragile, because at the
moment it's just assuming blindly that
name goes first, followed by number. But
once we have that header row in there
and tell DictWriter about it, we can
actually now pass in not a list but an
actual dictionary of key-value pairs and
let DictWriter figure out
which column in the file each of those
values should go in. So inside of this
dictionary, I'm going to have one key
called name, the value of which is
indeed the name the user typed in. The
second key is going to be quote
unquote number, the value of which is
the number that the user typed in. And
let me go back now and fix a
typo from earlier. We're only asking the
user for one number, so all this time my
input prompt should have said number, singular,
just for aesthetics. Now notice I have the file ready
to go. Indeed, name and number are there,
which matches the field names I've
provided to my code, and it matches the
keys in the dictionary I'm subsequently
passing to writerow. So let's go ahead
and give this a try. Let me run python
of phonebook.py again with this CSV file,
which is otherwise empty, save for the header.
Enter.
I'm going to now go ahead and type in,
say, the first name, which was Kelly
before, and +1-617-495-1000, and watch what
happens at top right. Kelly and her number
end up in the file, even though I didn't
specify explicitly, as with a list and
numeric indices, which value goes where.
Let's run it once more and put in myself
again: +1-617-495-1000. Enter. And there
again I am. And
lastly, just for good measure, let's go
ahead and put John back in the file with
+1-949-468-2750,
which, if you still haven't called or
texted, do feel free. Enter. And voila,
in phonebook.csv, we have all of those
same rows, and code that's a little more
resilient now against any changes we
might subsequently make there, too. All
right, how about now some final
flourishes using some other features of
Python that we did see a glimpse of some
time ago, namely the ability to install
libraries of our own choice. Up
until now in cs50.dev, we at CS50 have
pre-installed most of what you need,
including back in the earliest
weeks of the class, when we had that cowsay
program that I wrote that was using a
third-party library that I had installed
into my codespace in advance. Well, you
can use a program called pip to install
Python packages into your own codespace,
and, if using your own Mac or PC, onto
your own Mac or PC as well, if those
libraries are freely available as open
source online, in the repository from
which the pip program actually
draws. Let me go back to VS Code and
create a new program
called cow.py. And with this program,
I'm going to go ahead and import that
library, cowsay. And after that, I'm going
to call cowsay.cow with
quote unquote This is CS50, to have a
cute little cow on the screen say
exactly that. Now, in a previous
lecture, I had pre-installed this
library. But suppose I had forgotten to
do so today. Let's see what other type
of error we'll see on the screen. Well,
let me go ahead and run python of
cow.py. Enter. And there's another one
of those tracebacks. This one's a
little more straightforward than the
NameError and the ValueError we saw in
the past. This is literally a
ModuleNotFoundError: No module named cowsay. Well,
this is where the pip command comes in.
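As a small aside, a program can check whether a third-party package is available before importing it, instead of crashing with a ModuleNotFoundError traceback. This sketch uses the standard library's importlib.util.find_spec; the helper name is_installed is hypothetical, and cowsay.cow is the call from cow.py described above.

```python
import importlib.util


def is_installed(module_name):
    """Return True if Python can find a top-level module with this name."""
    # find_spec returns None when the module isn't on Python's search path
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    if is_installed("cowsay"):
        import cowsay
        cowsay.cow("This is CS50")
    else:
        print("No module named cowsay; try: pip install cowsay")
```

This only reports the problem more gently; pip install cowsay is still what actually fixes it.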
If something hasn't been pre-installed
for you in cs50.dev, or in the real
world on whatever system you're using,
you can use pip install cowsay, and,
assuming you've spelled it correctly and
assuming the library is publicly
available, hitting Enter will result in
pip automatically downloading the latest
version, installing it, in this case, into
your codespace, and hopefully solving
that problem. Let me clear my terminal
window and run python of cow.py again,
definitely crossing my fingers. And there
is the most adorable cow. And if we make
the terminal full screen, we'll see that he's
indeed saying This is CS50. Now, that's
just one of the things we can install
with pip. I could also install
libraries onto my own Mac or PC. In
fact, in just a moment, I'm going to
switch over to another computer here,
where I have a terminal window open on
my own actual Mac. And I'm doing this
because I'd like to play around with
some text-to-speech
library functionality, which you can't
really do in cs50.dev, because it's
browser-based, and when you run code in
the cloud, it's not going to pass the
audio along to the speakers on your
laptop or desktop. But if I'm running
Python and my own code on my own
computer, a Mac in this case, or a PC in
someone else's case, I can install that
kind of library, text to speech, and
have my own code on my own computer use
my own speakers to verbalize some string
quite like that. So, how can I go about
doing this? Well, having read some
documentation, I'm going to go ahead and
install with pip a library called
pyttsx3, for Python text-to-speech, version 3.
Hitting Enter goes and finds and
downloads the library as needed,
if it's not already installed, and then
brings me back to my terminal. I'm
going to use an older-school program
here called Vim, or vi, to actually
implement a cow program on this computer,
writing some code using this library without
VS Code, with just another text editor
instead. At the very top of my
file, I'm going to import this library,
pyttsx3, and then I'm going to
use only three lines of code to
synthesize some voice. I'm going to create
a variable called engine and set it equal
to pyttsx3.init(),
because the documentation taught me that
I need to initialize the library the
first time I use it. I can then use this
variable called engine to actually say
something, quite like Scratch, albeit
verbally instead of pictorially: engine.say,
quote unquote This is CS50. And then
lastly I can use engine.runAndWait(),
similar to some Scratch block, so that
the whole expression is actually
verbalized before my program
quits. Now, the first time I run this,
it might take a moment for the library
indeed to initialize itself. But on my
own Mac here, I'm going to run python of
cow.py. If we could raise the volume
just a little bit, hopefully we'll not
see but hear this cow's greeting.
>> This is CS50.
It was very much in a rush to say it,
after initializing for that long.
And if we ran it again and again and
added some optimizations, we could get
it talking much more quickly than that.
But we now have a version of the program
that indeed verbalizes whatever string,
or str, I've passed into it
here.
>> This is CS50.
>> It's really in a rush to finish there.
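The three lines described above might look like this as a sketch. The optional engine parameter is an assumption added for this illustration, so the sequence of calls can be checked with a stand-in object; when it's omitted, the function imports pyttsx3 (installed with pip install pyttsx3) and initializes a real engine, which needs working audio output.

```python
def speak(text, engine=None):
    """Say text aloud with pyttsx3, or with any object offering say() and runAndWait()."""
    if engine is None:
        import pyttsx3           # third-party: pip install pyttsx3
        engine = pyttsx3.init()  # initialize the text-to-speech engine
    engine.say(text)             # queue the words to be spoken
    engine.runAndWait()          # block until all queued speech has played
```

On a computer with speakers, speak("This is CS50") should verbalize the phrase, much as the cow did above.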
All right. But let's try one final
flourish with another library that's fun
to play around with, if only because
it'll motivate some of the things you
can now do in Python yourself. Let me go
into VS Code in my codespace, because
this one does not require my speakers.
I'll close that first version of the cow,
and I'm going to go ahead and create a
QR code generator after installing with
pip a library called qrcode, which I
read about online and which is now installed
in my codespace. I'm going to now go
ahead and create a file called qr.py.
So let's go ahead and code up qr.py,
because I want to generate my own QR codes.
If you've ever generated a QR code before,
you probably just Googled around for some
generator online, for which someone else
wrote the code to generate the QR code. But
I can do that for myself and actually
generate my own images. I'm going to go
ahead and import the library that I just
installed: import qrcode. And then
below that, I'm going to create a
variable called, for instance, image and
set that equal to this library's make
function, qrcode.make, no relation to the make that
we use for C. And I'm going to make a QR
code containing a URL, maybe of one of
the lecture videos. So let's do
https://youtu.be/, the short version, and then
xvFZjo5PgG0, if I got that just right. Then
after that, I'm going to go ahead and
call image.save to save that QR code as a
file called quote unquote qr.png.
And then quote unquote PNG will be the
format, which is Portable Network Graphics,
akin to a JPEG or a GIF, but
with different features. I'm just going
to double-check my writing here, so we
go to the right lecture video, and I
think we are indeed good. And what that
should do after running my code is leave
me with today's final flourish, a PNG
file in my codespace that, when opened, is
going to be a QR code that you can scan
with your phone. So if you'd like to get
ready for this final flourish, I'm going
to go ahead and run python of qr.py and
hit Enter. Thankfully, it worked. I'm
going to now open up qr.png
and close my terminal window. And for
our final moments together here in
week six, after which we'll ultimately
transition to yet more languages and
problems to be solved, here is a final
code for you to scan, from today's
lecture.
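Here is roughly what qr.py amounts to, as a sketch using the qrcode library's make function and the resulting image's save method. Wrapping it in a function with a path parameter, and importing qrcode inside that function, are assumptions made for reuse; depending on the installed version, saving a PNG may also require Pillow, for instance via pip install "qrcode[pil]".

```python
def save_qr(url, path="qr.png"):
    """Encode url as a QR code and save the image to path (hypothetical helper)."""
    import qrcode  # third-party: pip install qrcode
    image = qrcode.make(url)  # build the QR code image for this URL
    image.save(path)          # write the image out to a file, e.g. qr.png
```

In qr.py itself, a single call such as save_qr("https://youtu.be/xvFZjo5PgG0") should leave qr.png in the current folder.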
All right, that's it for today. We'll
see you next time.
All right. This is CS50, and this is
already week seven, wherein we
introduce another programming language,
this time known as Structured Query
Language, or SQL, or "sequel" for short. Now,
SQL, as we'll see, is a different sort of
programming language that allows us to
solve a lot of the same kinds of
problems that we've been dabbling with
over the past several weeks, but arguably
in a lot of contexts it allows us to
solve those problems more easily.
Indeed, among the goals for today are to
demonstrate that sometimes there are
multiple tools that you can use to solve
the same problem, whether it's C or
Python or today's SQL. But we'll
also see that SQL allows us a
different sort of approach to solving
problems. Whereas C, very much so, and
Python, to a large extent, are
procedural programming languages, whereby
you have to write these procedures, or
functions, step by step that tell the
computer what to do, including loops and
conditionals and all of that, SQL is
said to be a declarative programming
language, which is a different sort of
paradigm, whereby when you want to solve
some problem, you essentially declare
what problem you want to solve, or you
declare what question you have, and it's
up to the programming language to figure
out, using loops and conditionals and all
of those lower-level building blocks, how
to get you the answer. So ultimately
today is all about teaching you yet
another language, mostly so that you can
learn, again, to teach yourself new
languages and to appreciate that once
you exit a class like CS50 and are out
there in the real world, it really isn't all
that big a deal to pick up new
programming languages, especially when in
advance you've seen different
programming paradigms: procedural,
object-oriented, and, as of today,
declarative as well. But today ultimately
is also about data, and so to get us
started, we thought we'd collect some
real-world data by asking all of you a
couple of questions. So, if on your
laptop or phone you would like to pull
up this URL here,
it will also exist in just a moment in
QR code form. So, if you'd like to go to
that URL there or simply scan this here
QR code with your phone, that's going to
lead you to a Google form. For those
unfamiliar, Google has lots of tools,
among which is a tool via which
you can ask people questions via forms.
Microsoft has something similar as well.
And at that URL, what you'll soon see is
a form that looks a little something
like this. Among its questions is:
which is your favorite language, at
least among those we've studied thus
far? So go ahead and anonymously answer
the questions you see on this form.
You'll see which is your favorite
language and also which is your favorite
problem in the problem sets thus far. And
meanwhile, as you might know if you've
used Google Forms yourself to collect
data, we can move from questions here to
actual responses. And as people start to
buzz in, we'll see that the data set
here is starting to update in real time.
And Google gives us these nice graphical
user interfaces, or GUIs, via which we
can analyze the data. And so far, Python
is easily the winner, with 70% plus of
you preferring it, 11% of you
wishing we were still in Scratch, and
18% of you in C. And you'll see the responses
are coming in here. But for our purposes
today, what's more interesting than the
actual answers to these questions is how
we can get at the raw data. So among the
things you can do in Google Forms is
quite literally click View in Sheets.
And what this is going
to allow me to do is access the
underlying raw data. Now, because Google
has forms and spreadsheets, they sort of
tied these two products together. But
what's especially nice about Google
Sheets is that I can also download
the raw data as a file. I can download
it as an Excel file, a text file, a PDF.
But for today, we're going to download
it in a very common format known as CSV,
for comma-separated values. And indeed,
I can go to the File menu, then Download,
then Comma-separated values. This is perhaps
the most straightforward, easiest way
to get raw data out of any kind of
tabular data like this and load it into
code that we are about to write. So, if
you haven't buzzed in already, that's
fine. But at this point in time, now
that I've clicked the button, I have
a CSV file in my Mac's Downloads folder,
which, if I go ahead and open it up here, I
can see is indeed this long-named
file ending in Form Responses 1.csv.
I'm going to shorten that file name to
just favorites.csv.
And what I'm going to go ahead and do is
open up VS Code. And in my file
explorer, I'm going to literally just
drag and drop favorites.csv from my Mac.
that's going to have the effect of
uploading the file as it was at that
moment in time so that we can now begin
to write some code using this file. And
VS Code has automatically gone ahead and
opened it up for me. And what you're
looking at here is what we're going to
start to call a flat file database. It's
a very lightweight database in the sense
that it stores a lot of data. And it's a
flat file in the sense that it's
literally just a text file. And by
convention, the way the data is stored
in this file is indeed by separating
values with commas. There are other
conventions as well, but CSV is probably
the de facto standard. But TSV is a
thing for tab separated values, PSV,
which is pipe separated values where you
might have a vertical bar. Essentially,
these file formats try to use a
character that might not appear in the
actual data so as to separate your rows
and columns. So indeed, if I switch back
to VS Code here and we take a look at
the data, you'll see that from Google
Sheets, I've been given three columns.
Timestamp, which was automatically
generated for me, the language, as well
as the problem. And what I see here is
that we had a few respondents buzz in a
little early. Uh very excited for
today's data. But here's the rest of
them from like 1:30 p.m. Eastern onward.
And you'll see separating separated via
commas are effectively three columns of
data. So everything before the first
column represents a time stamp.
Everything between the first and second
comma represents the choice of language
that you all buzzed in with. And then
everything after the second comma
represents the problem. Now it's kind of
uh jagged edges. It doesn't line up in
nice rows and columns because some
answers are longer, some answers are
shorter, but the commas are sufficient
to tell the code we write where one
column ends and the next one begins. So,
how do we go about writing code like
this? If we'd now like to ask some
questions about the data, like what is
the most popular language? What is the
most popular problem? Or conversely, the
least of each of those. Well, we could
look at the original data in Google
Forms, and that's where we got the pie
chart. But how is Google figuring out
what the most popular answers are and
what pie charts it wants to depict?
Well, they probably wrote some code not
unlike what we're about to do. Although,
we'll start with just a command line
environment as always. So, within VS
Code, I'm going to go ahead and do this.
I'm going to go ahead and open up a
program called favorites.py. And let's
write a program whose purpose in life is
to open the CSV file, read it top to
bottom, left to right, and then crunch
some numbers, figure out what the most
popular answers are to those questions.
So, I'm going to go ahead and import a
package that comes with Python, a
library called csv. And
nicely enough, this is just code that
someone else wrote years ago that
figures out how to read data from a
file, separating it via comma, so that
you and I don't have to write all of
that ourselves. Then, I'm going to use
this Pythonic convention: with open, quote
unquote favorites.csv,
as file. Though, if I want to be super
explicit that I intend only to read this
file, which is the default, I'm going to
go ahead and explicitly say quote
unquote r, just like we did in C when
using fopen to open a file in read mode.
And now I'm going to do this. I'm going
to go ahead and say reader equals
csv.reader(file).
So, this is a Python convention
whereby the CSV library comes with a
function called reader that takes as its
sole argument here a file that has
already been opened. And what that
reader will do is figure out where all
of the commas are so that I can iterate
over this reader in a loop and get back
row after row after row without me
having to write all of the code to
figure out where those commas are. So
what I'm going to do in this block of code
is, for each
row in that reader, let's go ahead and
just print out maybe the second column,
which was the language column. So I'm
going to go ahead and say print row
bracket one, because what we'll see is
that this reader, which again comes with
Python, hands me a list, a list, a list for
each of the rows, wherein bracket zero
would represent the first column, bracket
one would represent the second, and bracket
two would represent the third, because
everything is zero-indexed in Python.
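A minimal sketch of what csv.reader hands back, assuming a few hypothetical sample rows (not the actual class responses) read from an in-memory string with io.StringIO instead of from favorites.csv on disk. Each row arrives as a list of strings, so row[1] is the language column, and the header row comes through like any other:

```python
import csv
import io

# Hypothetical stand-in for favorites.csv; the real file has many more rows.
SAMPLE = """Timestamp,language,problem
10/1/2025 13:30:01,Python,Mario
10/1/2025 13:30:05,C,Credit
10/1/2025 13:30:09,Scratch,"Hello, It's Me"
"""


def languages(text):
    """Return row[1], the language column, for every row, header included."""
    reader = csv.reader(io.StringIO(text))
    # Each row is a list: row[0] is the timestamp, row[1] the language,
    # and row[2] the problem. A quoted field may itself contain commas.
    return [row[1] for row in reader]


print(languages(SAMPLE))
```

The first value returned is 'language', the header row, which is exactly the bug that comes up next. For TSV or PSV files, csv.reader also accepts a delimiter argument, such as delimiter="\t" or delimiter="|".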
All right, so let's see what the effect
is here. Let me maximize my terminal
window, run python of favorites.py,
cross my fingers that I got this right,
and voila, there is every language that
was selected by you all in the form, from
top to bottom, by default chronologically.
But there's a bit of a bug, I dare say.
Let me scroll up and up and up in this
output through all of these answers
until I get to the very top where I ran
the program myself, which is here, python
of favorites.py. There's a minor bug
here. What's the bug in the output?
Yeah,
>> Yeah, it accidentally includes the
header, which is a bug in the sense that
I really just wanted to see the
languages, but the code is doing what I
told it to, which is just print out
every row. So, there are a few ways we
could ignore this. Let me go ahead and
minimize my terminal window and let me
go ahead and say, well, you know what?
after we create this reader, let's just
call next on it to skip to the next row,
effectively ignoring the header,
and then begin iterating over everything
thereafter. And so what happens now is
if I remaximize my window, rerun python
of favorites.py
enter and now scroll up again to the
beginning of this incarnation of the
program. You'll see that the very first
thing I see after my program was run was
indeed Python, Python, Python, Python,
and so forth. No more quote unquote
language. So, how is that? Well, this is
a feature we haven't quite seen before
or talked about in much detail, but this
reader is stateful in some sense. And
this was actually true of all of the
file I/O we did in C, whereby when you
were using fread or some other function
to read data from the file, something was
remembering where it was in the file so
that you didn't get the same bytes again
and again and again. It was more like
an old-school cassette tape, if you will,
or the scrubber along the bottom of
any streaming video, whereby when you just
read some data, it grabs the next chunk,
the next chunk, the next chunk, the next
chunk, and something inside of the
computer's memory remembers where it is.
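To make the statefulness concrete, here is a small sketch, again with hypothetical rows in an io.StringIO standing in for the real file. Calling next once consumes the header, the loop picks up from the second row, and a second pass over the same reader finds nothing left:

```python
import csv
import io

# Hypothetical sample rows standing in for favorites.csv.
SAMPLE = """Timestamp,language,problem
10/1/2025 13:30:01,Python,Mario
10/1/2025 13:30:05,C,Credit
"""

reader = csv.reader(io.StringIO(SAMPLE))

# next() pulls one row and advances the reader, so the header is consumed here...
header = next(reader)

# ...and the loop resumes from the row after it.
languages = [row[1] for row in reader]

# The reader remembers it has reached the end, so another pass yields nothing.
leftover = list(reader)

print(header)     # ['Timestamp', 'language', 'problem']
print(languages)  # ['Python', 'C']
print(leftover)   # []
```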
So, next(reader) says skip to the next row. And
thus, when you do for row in reader,
you get everything but the first row
because the reader is stateful. It
remembers where it is in memory.
All right. Well, thus far this
isn't all that useful because all I'm
doing is just printing out the data. But
let's take a step toward making this
program a little more useful. In
particular, let's just be a little more
pedantic and specify that what I'm
really doing here inside of this loop is
figuring out what the current row's
favorite is. So, I'm going to create a
variable called favorite and set that
equal to row bracket one. And then, even
though this doesn't change the
functionality, I'm going to print that
favorite, just because semantically,
stylistically, it's nice to make clear what
row bracket one is by defining a
variable whose name tells me or anyone else
who reads this code in the future what
it's actually doing. All right, but
readers are only so useful. And in fact,
if I were to open up this CSV file,
maybe in Microsoft Excel or Apple
Numbers or Google Sheets, again, you
could imagine someone kind of moving the
data by just dragging one of the columns
to the left or the right such that now
it's no longer timestamp language
problem. Maybe it's timestamp problem
language, or maybe timestamp is all the
way over to the right. You could imagine,
therefore, that the indices we're using, 0,
1, and 2, could be a little fragile,
because if someone changes the data on
me, now my code is just going to break,
because I am blindly assuming that the
second column, aka bracket 1, is going to
be the language column. That might
not be the case. But there's an
alternative to this, and you might recall
having seen this before. I'm going to go
into favorites.py and tweak my code a
little bit, not just to use a reader but
a dictionary reader. So I'm going to
change this to csv.DictReader instead of
just csv.reader. And then the upside of
using a dictionary reader is that every
time I go through this loop reading row
by row by row, each row that I'm handed
by this reader is not going to be a list
anymore that's numerically indexed with
zeros and ones and twos. Each row is
going to be, as you might guess, a
dictionary, which is a collection of key-value
pairs, which means now we can use
words as our indices instead of just
numbers. Which is to say, if I switch
from reader, which gives me lists, to DictReader,
which gives me dictionaries, I can
change this line 10 now and say I
specifically want the language column
wherever it is all the way to the left
or the middle or the right. So in
general using a dictionary reader is
probably just going to be more robust
because it's resilient against changes
in that actual numeric ordering. All
right, let me pause here to see first if
there's any questions on this exercise
whose purpose in life is just to
demonstrate how we can download the CSV
data then iterate over it line by line
without actually analyzing it yet.
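A sketch of why DictReader is more robust, assuming two hypothetical versions of the same data in which someone has dragged the columns into a different order. Because each row is a dictionary keyed by the header names, row["language"] finds the right value either way, and the header never needs to be skipped by hand:

```python
import csv
import io

# Two hypothetical versions of the same responses, with the columns
# in a different order in the second one.
ORIGINAL = """Timestamp,language,problem
10/1/2025 13:30:01,Python,Mario
10/1/2025 13:30:05,C,Credit
"""
REORDERED = """problem,Timestamp,language
Mario,10/1/2025 13:30:01,Python
Credit,10/1/2025 13:30:05,C
"""


def languages(text):
    # DictReader reads the header row itself and uses it for the keys,
    # so there is no need to skip it with next().
    reader = csv.DictReader(io.StringIO(text))
    # Each row is a dict, so look the column up by name, wherever it is.
    return [row["language"] for row in reader]


print(languages(ORIGINAL))   # ['Python', 'C']
print(languages(REORDERED))  # ['Python', 'C']
```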
No. Okay. So let's ask maybe the most
natural question which is like how many
people prefer Python? How many people
prefer C or Scratch in turn? In other
words, how can we recreate in our own
code what Google Forms is doing for us
graphically with those pie charts? Well,
I think what we could do is write some
code logically that essentially relies
on this mental model. What I have here
is an opportunity to use a bunch of key
value pairs because if I want to know
how many instances of Python there are
and C and Scratch, well, those might as
well be three keys, the values of which
are hopefully going to be three numbers
that represent the counts of the
popularity of each of those languages.
So in memory, I essentially want to
construct something that looks like this,
as I would if I were drawing it on a
chalkboard. But recall that this mental
model maps perfectly to the notion of a
Python dictionary because a dictionary
in Python is indeed key value pairs. And
we've seen it already because that's how
the dictionary reader works. But we
could certainly use our own
dictionaries to solve this same problem
ourselves. So the goal at hand is to
count the number of people who said
Python and C and Scratch respectively.
So how to do this? Well, I think what I
could do is... oh, and actually, let me
delete this line. Because we are using a
dictionary reader, we no longer need to
skip the first row. It is automatically
consumed by the dictionary reader for
us. So, this now would be the better
version of the dictionary reader. Let's
go ahead and do this. Let me declare
some variables first that will store for
me the total number of people who said
Python, Scratch, and C respectively. So,
I could say scratch equals 0, c
equals 0, python equals 0. And I could
just set three variables equal to 0, 0,
and 0. If you haven't seen it before,
there are some Pythonic tricks you
can do here. If you've got three
variables that you want to initialize
all at once because it's that simple,
you could alternatively do scratch, c,
python equals 0, 0, 0. This too would
have the intended effect, and it looks a
little better because it's all a simple
one-liner. But what do I want to do now?
Well, down here, let's go ahead and do a
simple conditional before we enhance
this by using an actual dictionary. Let
me go ahead and say if the current
favorite in that reader equals equals
quote unquote Scratch, well, let's go ahead and
increment the scratch variable by doing
plus equals 1, as we saw last time. Else
if, or elif, the favorite in the current row
equals equals quote unquote C, well,
let's go ahead and then increment the c
variable by one. Elif the favorite
equals equals Python, then let's go
ahead and increment the
python variable by one instead. I could
technically get away with saying else
here, but I'm consciously this time not
trying to overoptimize this because if
someone changes the form maybe next
semester and whatnot and we're asking
about a fourth language, I wouldn't want
my code to assume that anything that
isn't Scratch or C must be Python when
there could be some future fourth
language. So, this is a little more
robust and in this case, we'll just
ignore anything that isn't Scratch or C
or Python. All right, at the end of
this, let's go ahead and not just print
out the favorite, but outside of the for
loop, let's go ahead and print out, for
instance, the Scratch count is this.
Then, let's go ahead and print out the C
count is this. And then let's print out
the Python count is this. But, of
course, there's a subtle bug here. Yeah.
Ah, so I didn't format these things as
f-strings. So I need the little f over here
to the left of each of these strings.
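Here is roughly what the three-variable version looks like once the f prefix is added, written as a function so it is easy to check, with a hypothetical sample that includes one language (Rust) the code does not know about. The sample and the function name count_three are illustrative, not the lecture's exact code:

```python
import csv
import io

# Hypothetical sample responses standing in for favorites.csv.
SAMPLE = """Timestamp,language,problem
10/1/2025 13:30:01,Python,Mario
10/1/2025 13:30:05,C,Credit
10/1/2025 13:30:09,Python,Credit
10/1/2025 13:30:12,Scratch,Mario
10/1/2025 13:30:15,Rust,Mario
"""


def count_three(text):
    scratch, c, python = 0, 0, 0
    for row in csv.DictReader(io.StringIO(text)):
        favorite = row["language"]
        if favorite == "Scratch":
            scratch += 1
        elif favorite == "C":
            c += 1
        elif favorite == "Python":
            python += 1
        # Deliberately no else: any other language is simply ignored.
    return scratch, c, python


scratch, c, python = count_three(SAMPLE)
# Without the f prefix, the braces would be printed literally.
print(f"Scratch: {scratch}")
print(f"C: {c}")
print(f"Python: {python}")
```

The Rust row is simply skipped because there is no else branch, which is the robustness argument made above.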
All right, so let me go ahead and
maximize my terminal window, run Python
of this version of favorites.py, and
hopefully what we'll see is not every
row again and again and again, but three
lines of output, giving me the total
counts instead. All right, this seems to
line up with the rough percentages that
we saw coming in earlier on Google
Forms. 109 of you like Python, followed
by 58 of you in C, and 24 of you
preferring Scratch instead. All right,
but why does this perhaps rub you the
wrong way? I already alluded to the fact
that we're going to get rid of this, but
why is this not the best design just
using three variables like this? Yeah,
>> different categories.
>> Yeah, exactly. If we were to add a bunch
more languages, a fourth one, a fifth
one, a sixth one, a 10th one, a 20th
one, like having that many variables is
just certainly going to look unwieldy
and it shouldn't
rub you the right way. At that point, we
should really be graduating to some
proper data structure, whether it was an
array in C or better still in Python, an
actual dictionary. So, let's do that
instead. Let me go ahead and in a newer
version of this file, let's get rid of
these individual variables and let's
just have a generic variable called
counts, for instance, and set it equal
to an empty dictionary. And just using
two curly braces will give me an empty
dictionary. Or if you want to be more
pedantic, you can actually call the dict
function, which will return to you an
empty dictionary. I'd argue though that
most people would probably just use the
pair of curly braces like this to
indicate that here comes a dictionary
for me. Now, how do I use this? Well, I
don't need to update three separate
variables. I think I could just do
something like this. I could say, once
I've determined what the current row's
favorite value is for language, I could
say counts bracket favorite. So, use the
current string as an index into the
dictionary. So, it's going to be quote
unquote Scratch or C or Python, and then
just increment that by one. And then
down here, we don't have these variables
anymore. So, I'm going to go ahead
and instead say, how about this? We'll use
a loop: for each favorite in those
counts, let's go ahead and print out
the favorite and the
count thereof, without any f-string.
Okay. So the only thing that's different
is I'm using a dictionary here which is
essentially the code version of this two
column chart whose keys are going to be
the favorite strings, Scratch or C or
Python, the values of which are going to
be the actual counts, and I'm just doing
some simple math by plus-equaling, or
incrementing, the count each time I see a
certain language. Unfortunately this
code is not quite going to work. Let me
go ahead and run python of favorites.py,
and dang it, there's a KeyError. Let
me minimize the terminal window so we
can see both at once. Why is there a
KeyError, apparently on line 11, wherein I'm
indexing into the counts
dictionary?
What's going on? Yeah,
>> The key doesn't exist yet.
>> Yeah, it's a little subtle, but if this
is like the very first time through the
file, there is no key Python. There is
no key C or scratch because no one has
put them there. And yet recall that plus
equal means you're going to that
location in the dictionary and just
blindly incrementing it. But what is it?
Well, it's effectively a garbage value.
But it's not even that because there's
no actual key there. So we need to do a
little bit of logic here. And we can
solve this in a couple of ways. Well, I
could say something very pedantically
like this. I could just say, well, if
this favorite is in the counts
dictionary, this is the Pythonic way to
ask that question. Is this key in this
dictionary? If so, well, then it's safe
to go ahead and increment it just as
I've done before. But if it's not, what
I think I want to do is set counts
bracket favorite equal to
one instead because either I want to
increment the current count by one or
this is the first time logically I've
seen this favorite so I want to set it
equal to one instead. We could do this a
different way logically just like we
could in C solve problems differently. I
could instead say something like this. I
could get rid of all this code and just
say if favorite not in counts, then I
could say counts bracket favorite equals
zero. So just always initialize it to
zero if it's not there. Now I can safely
blindly update the count by one because
now I know no matter what once I get to
line 13 that count is actually there.
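A sketch of the dictionary-based version with the not-in check, using hypothetical sample rows and a hypothetical helper name, count_languages. The check guarantees the key exists before += runs:

```python
import csv
import io

# Hypothetical sample responses standing in for favorites.csv.
SAMPLE = """Timestamp,language,problem
10/1/2025 13:30:01,Python,Mario
10/1/2025 13:30:05,C,Credit
10/1/2025 13:30:09,Python,Credit
10/1/2025 13:30:12,Scratch,Mario
"""


def count_languages(text):
    counts = {}
    for row in csv.DictReader(io.StringIO(text)):
        favorite = row["language"]
        # The first time a language appears there is no such key yet, and
        # counts[favorite] += 1 alone would raise KeyError, so add it first.
        if favorite not in counts:
            counts[favorite] = 0
        counts[favorite] += 1
    return counts


counts = count_languages(SAMPLE)
for favorite in counts:
    print(favorite, counts[favorite])
```

Dictionaries in Python 3.7 and later iterate in insertion order, so the printed order follows the order in which each language first appears in the file, which is why the output order can differ from the earlier three-variable version.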
All right, so let's see with this
version of the code. Let's go ahead and
clear my terminal window, rerun
python of favorites.py, cross my
fingers, and there we go: Python and
Scratch and C. Interestingly, the order
switched around this time, based on
the order in which I was inserting
things into the dictionary. But we'll
see how we can exercise a bit more
control over that. But about that KeyError:
recall, we discussed
briefly last week that whenever you have
these kinds of tracebacks that refer to
certain exceptions, like exceptionally
bad situations that can happen, you can
also change your code to just try to do
something and then catch the
exception instead. So an alternative way
to do what we initially did would be
this. Instead of just blindly saying go
into the counts dictionary, index into
it at the favorite key, and increment
it by one, what we could do is try to do
that, except if there is a
KeyError, in which case, you know what, go
ahead and just initialize that value to
one instead. So, in short, there are like
four different ways already to solve the
same problem. Whichever way you prefer
is quite reasonable. This is just
another way and arguably another
Pythonic way to do things by trying to
do something but anticipating that
something in fact can go wrong.
>> A while ago you removed
>> a while ago what
>> you removed next reader.
>> Correct. A while ago I removed next(reader),
because that was only necessary
for csv.reader, which treats every row, header
included, the same way. But
when you use a CSV dictionary reader,
that automatically consumes the first
row, because that's how the dictionary
reader knows what the columns will be
called, and so you don't have to skip
over it yourself. A nice enhancement.
Other questions on what we've just done
here.
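The try/except alternative described a bit earlier might look like the following sketch, with hypothetical sample rows and the same hypothetical helper name, count_languages. Rather than checking first, it attempts the increment and handles the KeyError only when a language is seen for the first time:

```python
import csv
import io

# Hypothetical sample responses standing in for favorites.csv.
SAMPLE = """Timestamp,language,problem
10/1/2025 13:30:01,Python,Mario
10/1/2025 13:30:05,C,Credit
10/1/2025 13:30:09,Python,Credit
10/1/2025 13:30:12,Scratch,Mario
"""


def count_languages(text):
    counts = {}
    for row in csv.DictReader(io.StringIO(text)):
        favorite = row["language"]
        try:
            # Optimistically assume the key is already there...
            counts[favorite] += 1
        except KeyError:
            # ...and if it isn't, this is the first sighting, so start at 1.
            counts[favorite] = 1
    return counts


print(count_languages(SAMPLE))
```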
All right, so let me propose that like
writing this amount of code is kind of
annoying just to ask a relatively simple
question like what's the most popular
language in this file, right? It's
been nice, but it's sort of a step backwards
from Google Sheets and Apple
Numbers and Microsoft Excel, where you
could really just like highlight the
column and it would just tell you the
answer, usually in the bottom right-hand
corner, or you could use a function in
one of those spreadsheet tools to ask
the same question. So, it's starting to
feel like, with almost 20 lines of
code, maybe there's a better way.
And I dare say there is. Rather than use
a flat file database, let's graduate
already to what the world calls a
relational database. And a relational
database is simply data in which you
define relations among your data, which
isn't so much relevant now, except that
each timestamp is associated with a
language, which is associated with a
favorite problem as well. But
we'll see that data sets can be much
larger and more
complicated, and it might be valuable if
we can actually express relationships
across multiple pieces of data. In
particular, let's introduce already a
programming language called Structured
Query Language, or SQL for short, aka
"sequel." And SQL essentially only has four
fundamental operations. So even though
we're transitioning into a new language,
by the end of today, we're going to
transition out of the new language
because there's only so much you can do.
Now, as with any language, it's going to
take time and practice to sort of get
the hang of it. But take comfort
in knowing that SQL really just supports
four fundamental operations. And the
acronym that the world uses is indeed
CRUD, which stands for create, read,
update, and delete. That is to say, when
using a relational database, you can
create data, read data, update the data,
or delete data. And that's pretty
comprehensive as to what's possible.
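To make the four CRUD operations concrete, here is a small sketch using Python's built-in sqlite3 module with a throwaway in-memory database rather than the sqlite3 command-line program, and with made-up rows. Create is shown as CREATE TABLE plus INSERT, read as SELECT, then UPDATE and DELETE, and finally DROP TABLE, which removes the whole table rather than individual rows:

```python
import sqlite3

# An in-memory database stands in for favorites.db; the rows are hypothetical.
db = sqlite3.connect(":memory:")

# Create: define a table, then insert rows into it.
db.execute("CREATE TABLE favorites (language TEXT, problem TEXT)")
db.execute("INSERT INTO favorites (language, problem) VALUES ('Python', 'Mario')")
db.execute("INSERT INTO favorites (language, problem) VALUES ('C', 'Credit')")

# Read: SQL spells this SELECT.
before = db.execute("SELECT language, problem FROM favorites").fetchall()

# Update: change existing rows that match a condition.
db.execute("UPDATE favorites SET problem = 'Hello, World' WHERE language = 'C'")

# Delete: remove individual rows that match a condition.
db.execute("DELETE FROM favorites WHERE language = 'Python'")
after = db.execute("SELECT language, problem FROM favorites").fetchall()

# DROP goes further and removes the whole table, not just its rows.
db.execute("DROP TABLE favorites")
db.close()

print(before)
print(after)
```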
Now, what is an actual database? Well,
generally speaking, a database is just a
piece of software that's running on a
computer somewhere inside of which is
stored a whole lot of data. And that
database therefore provides access to
that data at any time, whether to
your local Mac or PC, to a server somewhere
in the cloud, or to a whole cluster of
web servers, which we'll talk about in
the weeks to come as we transition from
command-line tools to the web. Now,
technically in SQL, the commands you
actually use to implement this idea of
creating data, reading data, updating,
and deleting data are almost the same words.
But for whatever reason, the world
chose the command SELECT for
reading data. So we'll
soon see that there's a command in SQL
that lets us select data, which is
equivalent to this idea of reading it,
whereas the other three options refer, of
course, to writing data, that is, changing
data. Technically speaking, we'll be
able to insert data into a database, as
we'll soon see, and we'll also be able to
drop data altogether: not just delete
individual rows, but whole tables, so to
speak, of rows instead. So what does
this all mean? Well, let's go ahead and
do, say, an example of using SQL
to ask some relatively simple questions
and begin to develop some muscle memory
for using this new language. If I were
to manually load a bunch of data into a
proper database for SQL, I would
actually use code like this. I would
literally type create table. Then I'd
come up with the name of the table, aka
sheet, and then I would specify every
column that I want to put in that table.
And here's where the vernacular changes.
So whereas in the world of spreadsheets
you have sheets, tabs that contain rows
and columns, in the world of databases,
you have tables which are just rows and
columns. It's different terminology, but
it refers to conceptually the same
thing. In CS50, we're going to use a
specific version of SQL known as
SQLite, which is like a lightweight
version of SQL that's actually very
commonly used in web applications and in
mobile applications, but it doesn't have
all of the bells and whistles or all of
the scalability of Oracle, SQL
Server, Microsoft Access, Postgres,
MySQL. Those are just product names,
open source and commercial alike, which,
if you've ever heard of them, just represent
bigger, faster versions of SQL
databases. But we'll indeed use the
lightweight version known as
SQLite. And the command we're going to
start to run is quite literally sqlite3,
which is version three of the same
command, which we've pre-installed into
your codespaces for you. So, let's go
ahead and do this. Let me go ahead and
run a command called sqlite3, which is
going to let me create my very first
SQLite database, and I'm going to import
into that database the CSV file that we
downloaded from Google Forms. In other
words, I'm going to load that same data
set into a different program, an actual
database, so that I can use a completely
different programming language to ask
questions about it instead of writing,
as we just did, some Python code. So,
let me go back into VS Code here. Let me
close my CSV file and my Python file.
Let me reopen my terminal window and let
me go ahead and run sqlite3, space, and
then the name I want to give to this
database, which for instance will be
favorites.db, .db for database, by
convention. Enter. I'm going to be
prompted to make sure I want to create
this new file. Y for yes. Enter. And now
I'm inside of the database, at a prompt
that now says sqlite
and then an angle bracket. I'm not
going to be using any .sql
files for now, although you can actually
write SQL code in separate text files.
I'm actually going to use the database's
interactive interpreter to just run all
of the commands I want interactively by
just typing them out: type it out, semicolon, enter;
type it out, semicolon, enter, back and
forth. But you can save all of these
commands, as you'll see in problem set 7,
in files as well. Now, how do I go about
actually importing that CSV file into
this lightweight database? Well, for
this, I'm going to execute three
commands. And any command in SQLite that
starts with a dot is specific to
SQLite, this lightweight version of SQL.
Anything that doesn't start with a dot
is generalizable and will work on most
any SQL database anywhere in the world,
no matter the product you're using. So,
in my SQLite
terminal, I'm going to change my mode to
CSV mode, with .mode csv, just to tell the database that
I want to load some CSV data. I'm going
to then literally .import that data from
a file called favorites.csv, which is
the file we downloaded earlier and then
uploaded to my codespace. And now I have to
specify the name of a table. So, I'm
going to call this table, aka sheet,
favorites, just to keep everything
consistent. And that's it. In the
absence of an error message, everything
probably worked fine. I'm going to do
.quit. That quits out of SQLite. But
what you'll now see if I type ls is that
not only do I have favorites.csv, which
I uploaded, favorites.py, which we wrote
a few minutes ago, but I also now have
favorites.db, which is a database
version of that same file. Now, I can't
actually see what's inside of it, because
if I go ahead and run code
favorites.db, I'm going to see "This file
is not displayed in the text editor
because it is either binary or uses an
unsupported text encoding." This is to be
expected, because this database is stored
essentially in the form of zeros and
ones that the sqlite3 program knows how
to read, but VS Code can't
just show me everything
therein. And generally, storing data in
binary is going to be more efficient
than storing things purely textually,
because we're going to be able to use
various data structures and algorithms
that we've been talking about for weeks
more easily on that binary data. All
right, so let's go ahead now and see
what this import command did. I'm going
to again uh maximize my terminal window.
I'm going to go ahead and run sqlite3
again, passing in favorites.db. Enter.
This time it already exists, so it just
opened it without prompting me. And now
I'm going to go ahead and type another
SQLite-specific command called .schema.
The schema of a database is just the
design of the database. What does it
look like? What are the rows and columns
and tables therein? So if I type
.schema, what I'm going to see is this
SQL command: CREATE TABLE IF NOT EXISTS,
quote unquote favorites, which is the
name of the table. Then in parentheses
there are going to be apparently three
columns, one of which is called
Timestamp, the next of which is called
language, the third of which is called
problem. And each of those columns is
going to be raw text. Now we'll soon see
that it doesn't have to just be text.
But when I use the import command, this
is the default table that SQLite created
for me. Soon we'll see that I can
exercise more control, especially over
the types of data that I'm putting in
this database. But what's really nice
about the import command is it could not
be easier to convert a CSV file to a
SQLite database, so that now, as we're
about to see, we can use SQL on it
instead of Python or any other
language.
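For intuition about what .mode csv and .import accomplish, here is a rough Python approximation. It is an assumption-laden sketch, not how the sqlite3 shell actually implements .import: it treats the first CSV row as column names, creates every column as TEXT, inserts the remaining rows, and then reads back the stored CREATE statement from sqlite_master, which is roughly what .schema prints. The sample data and the helper name import_csv are hypothetical:

```python
import csv
import io
import sqlite3

# Hypothetical stand-in for favorites.csv.
SAMPLE = """Timestamp,language,problem
10/1/2025 13:30:01,Python,Mario
10/1/2025 13:30:05,C,Credit
10/1/2025 13:30:09,Python,Credit
"""


def import_csv(db, text, table):
    """Rough analogue of .import: header row becomes TEXT columns, the rest rows.

    Assumes the table and column names contain no double quotes.
    """
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    db.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({columns})')
    # One ? placeholder per column; sqlite3 fills them from each row list.
    placeholders = ", ".join("?" for _ in header)
    db.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)


db = sqlite3.connect(":memory:")
import_csv(db, SAMPLE, "favorites")

# sqlite_master stores each table's CREATE statement.
schema = db.execute(
    "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = 'favorites'"
).fetchone()[0]
print(schema)
print(db.execute("SELECT COUNT(*) FROM favorites").fetchone()[0])
```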
Okay. So how do we go about getting data
from this database? Well, the first of
our commands that we'll explore is that
one called select. So select data means
to read data from the database. And in
this sense, it's going to be a
declarative language because I'm just
going to declare what data I want to
select from the database. And I'm not
going to worry about opening the file
anymore or iterating over it with a for
loop or a while loop or defining
variables or the like. I'm just going to
select syntactically what I want. So let
me go back to SQLite here. Let me clear
my terminal just to get rid of the past
commands. And let's do the first of
these: SELECT * FROM favorites. And I
regret to say the semicolon is back
for the SQL code we're now writing.
Enter. And we will see a sort of ASCII art
version, even better than the raw
CSV file, of all of the data that was
imported into this table. So select star
from favorites is apparently selecting
everything. So the star in this context
is a wild card of sorts that represents
all of the columns in the table. The
table itself is called favorites. So I'm
selecting all of the columns from the
table called favorites. And here you
have it with sort of simple ASCII art.
first column, second column, third
column, chronologically listed because
that's exactly how it was loaded into
the database. All right, so if star is
wild card, what more can we do? Well, if
you don't care about all of the columns,
you can actually be a little more
specific. So I could say instead, select
just the language column from the
favorites table, semicolon, enter. And
now I have just a single column of data
that shows me one cell for every
submission but not the timestamp or the
favorite problem that that person put
in. Or maybe I want to declare that I want
a couple of columns. So I can say select
language, problem from favorites, since I don't care
about the timestamp,
and now you get two columns
instead. So in short, rather than write
the dozen or so lines of code that we
earlier did with Python to open the file
and then iterate over it with a reader,
we just select what data we want from
this here database. But even more
powerfully, SQL comes with a whole bunch
of functions built in, quite like the
spreadsheet software that you and I are
already familiar with in the real world,
like Excel and Numbers and Google
Sheets. SQLite comes with AVG, COUNT,
DISTINCT, LOWER, MAX, MIN,
UPPER, and so
forth. There's a whole list of them.
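The SELECT variations described above can be tried from Python as well. This sketch assumes an in-memory table shaped like the imported favorites table, filled with a few made-up rows, and also uses one of the listed functions, UPPER:

```python
import sqlite3

# In-memory table with hypothetical rows, shaped like the imported favorites table.
db = sqlite3.connect(":memory:")
db.execute('CREATE TABLE favorites ("Timestamp" TEXT, language TEXT, problem TEXT)')
db.executemany(
    "INSERT INTO favorites VALUES (?, ?, ?)",
    [
        ("10/1/2025 13:30:01", "Python", "Mario"),
        ("10/1/2025 13:30:05", "C", "Credit"),
        ("10/1/2025 13:30:09", "Python", "Credit"),
    ],
)

# * means every column.
everything = db.execute("SELECT * FROM favorites").fetchall()
# Naming columns returns only those columns, in the order given.
languages = db.execute("SELECT language FROM favorites").fetchall()
pairs = db.execute("SELECT language, problem FROM favorites").fetchall()
# Functions such as UPPER transform each value on the way out.
shouted = db.execute("SELECT UPPER(language) FROM favorites").fetchall()

for row in pairs:
    print(row)
```

Without an ORDER BY clause, SQL makes no promise about row order, so it is safest not to rely on the order in which rows come back.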
We'll play around with just a couple of
these. If we want to transform some of
this data, let me go back into VS Code,
clear my SQLite terminal, and suppose
I just want to get the total number of
rows in the favorites table, like how
many people at the moment in time I
downloaded the file, even if not
everyone had quite buzzed in yet, did I
end up with in that file? Well, I could
say select the count of all of the rows
from the favorites table semicolon. And
now I'll get back a single cell, which
tells me that 272 submissions had come in by the
moment I downloaded that file. Suppose I
want to see just to confirm that no one
submitted bogus data. Which languages
were actually among those typed in?
Well, I can select only the distinct
languages that were typed in from the
favorites table. And now I get a unique
list of languages that everyone buzzed
in with irrespective of how many times.
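Likewise for COUNT and DISTINCT, here is a sketch against a small in-memory table of hypothetical rows; with this sample, the queries report 4 rows and 3 distinct languages:

```python
import sqlite3

# In-memory table with hypothetical rows standing in for favorites.db.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE favorites (language TEXT, problem TEXT)")
db.executemany(
    "INSERT INTO favorites VALUES (?, ?)",
    [("Python", "Mario"), ("C", "Credit"), ("Python", "Credit"), ("Scratch", "Mario")],
)

# COUNT(*) counts rows.
total = db.execute("SELECT COUNT(*) FROM favorites").fetchone()[0]
# DISTINCT collapses duplicates, leaving one row per language.
distinct = db.execute("SELECT DISTINCT language FROM favorites").fetchall()
# The two combine to count the unique values.
n_distinct = db.execute("SELECT COUNT(DISTINCT language) FROM favorites").fetchone()[0]

print(total, n_distinct)
print(sorted(distinct))
```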
If I want to get how many
distinct languages there are, if it's
not as obvious as three here, I could
select the count of distinct languages
from the favorites table, and it would
just tell me the answer. Three is the
total number of distinct languages
in those submissions. So again,
it's easy enough to just eyeball this, but
single statements that
read sort of English-like, left to right,
are enabling me to very quickly select the
answers I want to some of these
problems. Well, what more can SQL do?
Well, here are a bunch of other
keywords that we can add to our SQL
commands that allow us to control
further what kind of data we're going to
get back. We're going to be able to
group data by similar values with GROUP BY. We're
going to check for not just string
equality, but for fuzzy matching with LIKE,
checking if something is close to a
string that we're looking for. We can
limit the total number of rows coming
back with LIMIT. We can order, or sort, the data by a
certain column with ORDER BY. And we can actually have
predicates, so to speak, using WHERE,
which is similar in spirit to an if
condition, but a little more succinctly
written. So, for instance, let
me go back to VS Code here. Let me clear
my terminal again, and let me go ahead
and select how many of you answered C as
your favorite language. Without
selecting all of the counts again, let's
just hit the nail on the head. So,
let's select the count of rows from the
favorites table where the language
selected equals quote unquote C
semicolon. And I get back a simple
answer. 58 of you buzzed in with the
answer C. How many of you liked both C
and very specifically the problem called
hello, world? If that was sort of the
extent of your passion for code, let's
go ahead and select the count of star
from favorites where the language you
typed in equals quote unquote C and the
problem you typed in equals quote
unquote hello, world, semicolon. And it
looks like five
of you said your favorite language was C
and your favorite program was hello
world. Great. All right, so it's getting
a little more interesting. What about
the other version of hello, world, where
we called it hello, it's me? Well, that
one's interesting because I think it's
going to break my convention of using
single quotes, which would be the
convention here in SQL. Whenever you're
using a raw string, single quotes would
be the norm. But let's type this out. So,
select count of star from favorites
where language equals quote unquote C,
and the problem this time equals quote
unquote hello, it's me. So, at a glance,
this is probably going to confuse
sqlite3, because does that middle
apostrophe belong to the first quote or
the second one? This is ambiguous. And
this is weird. In C, we would solve this
problem by putting a backslash in front
of it, a so-called escape character.
Different languages have different
conventions. This one's a little weird,
but in SQLite, as in standard SQL, what
you instead do is double the single
quote. So putting two single quotes is
the convention for escaping a single
quote. You just have to remember, or
Google, these kinds of things in the
real world if you forget. Enter. Now I
get back my answer. So it was not the
case that any of you liked both C and
that problem specifically. Well,
what if we want to be a little more
inclusive of either hello problem? Well,
I could do this in this way. Just like
in my Codespaces terminal, I can go up
and down to go back through my history.
Same thing in SQLite. So I can go back
two commands to get up here, and let me
go ahead and write something longer:
where the problem is hello, world or the
problem equals quote unquote hello,
it's (with a double apostrophe) me,
single apostrophe, semicolon. Oh, and a
parenthesis. So it's wrapped onto two
lines here. So, it's a little messy, but
I'm just logically saying where you
buzzed in with C as your language and a
problem of hello, world or a problem of
hello, it's me. Enter. It should be the
same answer
as before because none of you liked
hello, it's me. But I chose this syntax
because I can actually make this a
little cleaner. I can go and delete this
whole parenthetical and just say where
language equals C and the problem is
LIKE quote unquote hello, then a
percent sign, single quote, semicolon.
So this is a little weird too. It's just
how SQL does this instead. But whereas
previously I was using an equal sign to
check for literal string equality, like
literally those problem names, LIKE
allows me to use wildcards. And it's not
a wildcard quite like the previous use
of the asterisk that we saw. When you
are using a wildcard in a string in SQL,
you say percent sign to represent zero
or more characters there. So hello,
followed by a space and a percent sign,
is going to hopefully match this or the
other problem that started with hello.
So let me go ahead now and hit Enter.
The answer is still the same, but it
demonstrates how you could express
yourself a little more generally if you
wanted a pattern match like that.
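Here is a minimal sketch of the WHERE, AND, OR, escaped-apostrophe and LIKE queries above. It uses Python's built-in sqlite3 module and an in-memory table, and the rows are invented so the counts are easy to check by hand. The helper count is a hypothetical convenience, not part of SQLite.

```python
import sqlite3

# Made-up rows standing in for the favorites table.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE favorites (language TEXT, problem TEXT)")
db.executemany(
    "INSERT INTO favorites (language, problem) VALUES (?, ?)",
    [
        ("C", "hello, world"),
        ("C", "hello, world"),
        ("C", "Mario"),
        ("Python", "hello, it's me"),
        ("Python", "DNA"),
    ],
)


def count(sql):
    """Run a SELECT COUNT(*) query and return the single number it yields."""
    return db.execute(sql).fetchone()[0]


# WHERE with = checks for literal string equality.
c_fans = count("SELECT COUNT(*) FROM favorites WHERE language = 'C'")

# AND requires both conditions to hold.
c_hello = count(
    "SELECT COUNT(*) FROM favorites "
    "WHERE language = 'C' AND problem = 'hello, world'"
)

# Inside a single-quoted SQL string, '' stands for one literal apostrophe.
its_me = count("SELECT COUNT(*) FROM favorites WHERE problem = 'hello, it''s me'")

# OR, grouped in parentheses, accepts either problem name...
either = count(
    "SELECT COUNT(*) FROM favorites WHERE language = 'C' AND "
    "(problem = 'hello, world' OR problem = 'hello, it''s me')"
)

# ...and LIKE with % (zero or more characters) expresses the same idea.
pattern = count(
    "SELECT COUNT(*) FROM favorites "
    "WHERE language = 'C' AND problem LIKE 'hello, %'"
)

# With a ? placeholder, the value travels separately from the SQL text,
# so the apostrophe needs no manual escaping.
its_me_param = db.execute(
    "SELECT COUNT(*) FROM favorites WHERE problem = ?", ("hello, it's me",)
).fetchone()[0]

print(c_fans, c_hello, its_me, either, pattern, its_me_param)  # 3 2 1 2 2 1
```

The placeholder version is usually preferable when a value comes from a user, because nothing in that value can be mistaken for SQL syntax.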
Questions now on any of these
techniques? Yeah,
>> Capitalization?
>> Uh, good question. Does it have to be
capitalized when doing string equality?
Yes, case matters with an equal sign,
but not with LIKE. LIKE is
case-insensitive, at least for plain
ASCII letters, so uppercase or lowercase.
>> But like COUNT and everything?
>> Oh. Oh, I see. Good question. So
stylistically in SQL, and this is a
convention certainly for CS50 and also
for a lot of companies and communities
in the world, I would argue you should
uppercase your SQL keywords just to make
them stand out from words that you and I
chose, like the name of the table or the
names of the columns therein. This is
just a convention. I would propose, as
always, to be consistent, but for CS50's
and style50's sake I would propose that
you indeed capitalize like this. And
frankly, it just makes it easier to read
to my eye, because the SQL stuff jumps
out and then the lowercase stuff is
specific to your data set. A good
question.
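To make the answer above concrete, here is a small sketch using Python's sqlite3 module and two invented rows. It shows that = is case-sensitive, that LIKE ignores ASCII case by default in SQLite, and that lowercase keywords still work. The uppercase style is purely a convention.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE favorites (language TEXT, problem TEXT)")
# Two made-up rows that differ only in capitalization.
db.executemany(
    "INSERT INTO favorites VALUES (?, ?)",
    [("C", "Hello, World"), ("c", "hello, world")],
)

# = compares strings exactly, so case matters.
exact = db.execute(
    "SELECT COUNT(*) FROM favorites WHERE language = 'C'"
).fetchone()[0]

# LIKE ignores case for ASCII letters by default in SQLite.
fuzzy = db.execute(
    "SELECT COUNT(*) FROM favorites WHERE language LIKE 'c'"
).fetchone()[0]

# Keywords are case-insensitive; uppercasing them is only a style choice.
same = db.execute(
    "select count(*) from favorites where language like 'c'"
).fetchone()[0]

print(exact, fuzzy, same)  # 1 2 2
```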
All right. How about another
set of keywords that we saw on the
screen earlier, namely GROUP BY?
Well, suppose we have a data set like
this, whereby... how does this go? Happy
Halloween. Whereby here's just an
excerpt from that table. So as far as
languages go, say one of you liked C,
three of you liked Python, and now that
we're introducing SQL, let's imagine
that two of you now like SQL even
better. So that's the extent of the data
set. Wouldn't it be nice to be able to
figure out how many of you like C or
Python or SQL? Well, I could write some
Python code, open the file, iterate over
it using variables, using a dictionary,
and those 20 or so lines of code we
wrote earlier to answer this question.
Wouldn't it be nice to just ask the SQL
language to figure out how many of you
like C, how many of you like Python, how
many of you like SQL? We can do this by
grouping these cells by common values.
Let's group all of the Python rows
together and all of the SQL rows
together. And even though there's just
one, all of the C rows as well. So, how
can we do this? Well, let me go back to
VS Code here and clear my terminal. And
let's do this. Let's select every
language and its respective count as
well from the favorites table. But
before you do any of that, group
everything by language. So this one
takes a little more practice and getting
used to, but this is simply saying: look
at the languages, group all of the
common languages together, and then
figure out what count that gives you for
each group of rows. If I hit Enter
here, we'll get an answer just like the
Python code that took me 20 lines of
code to write earlier. What's really
happening, though, in the database is
something a little bit like this.
Notice, of course, that there's only one
instance of C. There are then three
instances of Python and there are two
instances of SQL. And the table I'm
essentially building groups all of
those by identical values and then spits
out the total counts here. Now on the
screen, it's just one, three, and two.
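Here is the six-row excerpt just described, one C, three Python and two SQL, run through GROUP BY with Python's sqlite3 module. The same tally is then done by hand with collections.Counter, standing in for the dictionary-based Python program from earlier. This is a sketch; the row order is arbitrary.

```python
import sqlite3
from collections import Counter

# The six-row excerpt described above: one C, three Python, two SQL.
rows = ["C", "Python", "Python", "SQL", "Python", "SQL"]

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE favorites (language TEXT)")
db.executemany(
    "INSERT INTO favorites (language) VALUES (?)",
    [(language,) for language in rows],
)

# GROUP BY puts identical languages into one group; COUNT(*) counts each group.
grouped = dict(
    db.execute("SELECT language, COUNT(*) FROM favorites GROUP BY language")
)

# The hand-rolled Python equivalent: tally each value in a dictionary.
by_hand = Counter(rows)

print(sorted(grouped.items()))   # [('C', 1), ('Python', 3), ('SQL', 2)]
print(grouped == dict(by_hand))  # True
```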
In the data set with some 200-plus
responses, we have much larger answers,
including Scratch instead of SQL right
here. But this now sort of speaks to
just how much more convenient it is if
you want to ask a question like that,
especially if the data set is more than
a couple of hundred rows. If your boss
in the real world, for instance, has a
CSV data set and wants you to analyze
the data, well, you can literally
download it, import it into SQLite, run
one command, and boom, you've got this
analysis done, if the extent of it is
just to group the data and figure out
what kinds of counts you have in the
data set. All right, what else can we
do? Well, we can play around with this a
bit more. Let me go back here into VS
Code and propose that we could order
those results in something other than
the default way. So, let's go ahead and
select the language and the count
from the favorites table yet again.
Let's group by language yet again, but
this time let's order by the count
column in descending order. So, it's a
bit more of a mouthful and it takes some
practice to memorize all of the syntax,
but when I hit Enter now, I get back the
same answers, but Python is at the very
top of the list. Now, count star isn't
necessarily all that self-explanatory,
and indeed, it's a little annoying that
I have to write out count star here at
top right as well as in the beginning.
So, it turns out SQL also supports
aliases. So if you want to change the
temporary name of the column to be
something else, like n for number, well
then I can actually define an alias with
the keyword AS, ORDER BY n at the end of
this statement, and then hit Enter and
get back the same results too. And so if
it's not sort of implicitly clear
already, each of these SQL SELECT
commands is essentially giving me back a
temporary table. This is not being saved
anywhere. Like, now it's gone from the
computer's memory once I've actually
gotten my answer. But it's essentially
returning a subset of the tables that do
exist in the computer's memory, because
that's what the import command did for
me. It loaded the whole data set into
memory. And now I have these temporary
tables that just contain the answers to
questions I care about. And if you only
care about the top one language, well,
there's a LIMIT keyword, too. I can
literally just say LIMIT 1 at the end of
that exact same statement. Enter. And
now I've got a single answer to my
question: a single row saying Python was
the most popular, with 190 people
selecting it.
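The sketch below combines ORDER BY ... DESC, the AS alias and LIMIT 1 from this passage, again using Python's sqlite3 module and the made-up six-row sample rather than the real 272 responses. Reading cursor.description confirms that the alias really does rename the column in the temporary result table.

```python
import sqlite3

rows = ["C", "Python", "Python", "SQL", "Python", "SQL"]  # made-up sample
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE favorites (language TEXT)")
db.executemany(
    "INSERT INTO favorites (language) VALUES (?)",
    [(language,) for language in rows],
)

# AS names the COUNT(*) column n, so ORDER BY can refer to it by that alias;
# DESC sorts from the most to the least popular language.
cursor = db.execute(
    "SELECT language, COUNT(*) AS n FROM favorites "
    "GROUP BY language ORDER BY n DESC"
)
columns = [description[0] for description in cursor.description]
ranked = cursor.fetchall()

# LIMIT 1 keeps only the first row of that sorted result.
top = db.execute(
    "SELECT language, COUNT(*) AS n FROM favorites "
    "GROUP BY language ORDER BY n DESC LIMIT 1"
).fetchone()

print(columns)  # ['language', 'n']
print(ranked)   # [('Python', 3), ('SQL', 2), ('C', 1)]
print(top)      # ('Python', 3)
```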
All right, for now I think that's enough
on SELECT. There are a few more
keywords, but it really is just a matter
of composing these building blocks.
Questions, though, on these capabilities
fundamentally?
All right. Well, how about maybe
inserting data instead? So here might be
the canonical way to insert a row into a
table in SQL. You literally say INSERT
INTO, then the name of the table, then
in parentheses the one or more columns
for which you have data, and then
literally the word VALUES, and then in
another set of parentheses a
comma-separated list of the one or more
values that you want to insert into
those there columns. So for
instance, let me go back into VS Code
here. And of course at the time we
circulated this form a few minutes ago,
we had not yet assigned problem set 7.
But in problem set 7 is a problem
called 50ville, which, let's propose,
might very well be someone's favorite in
a week. So let's go ahead and insert
that row now, preemptively. Let's
insert into the favorites table two
columns, language and problem. Why?
Well, I don't really care to figure out
what the timestamp is and the format
thereof. So I'm just going to omit the
timestamp altogether. But the values
I'm going to insert for this new row are
going to be quote unquote SQL, comma,
quote unquote 50ville, close quote,
close parenthesis, semicolon, Enter.
Nothing bad seems to
have happened. Let me go ahead and
select star from favorites just to see
what my data set looks like now. And
indeed at the bottom of the file, or the
bottom of the table, there is that
new row. But what's sort of noteworthy
is that this isn't just blank. There's
our old friend NULL, which is not a NULL
pointer. It's literally the same word,
N-U-L-L, and it refers explicitly to the
absence of data. And this is actually a
nice feature, because if any of you have
ever used Google Sheets, Apple Numbers,
or Microsoft Excel and looked at cells
that are blank, what does it mean if a
spreadsheet cell is blank? Does it mean
there's literally no data there? Does it
mean that you just don't have the data
there, or it's missing in some form?
Well, how do you address that? Well,
maybe you put N/A in English, for not
available, or something like that, but
that's kind of hackish. And if you use
N/A, that might mean that no one can
actually type N/A as their answer. And
so what's nice about SQL and database
languages more generally is that NULL
signifies the conscious omission of
data. It's not just a missing value.
It's consciously not there. It's not
just the empty string, quote unquote,
for instance. So we might see different
examples of that.
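Here is a minimal sketch of the INSERT just performed and of the difference between NULL and the empty string. It uses Python's sqlite3 module and a simplified three-column favorites table, which is an assumption rather than the exact imported schema. Python's sqlite3 module represents NULL as None.

```python
import sqlite3

db = sqlite3.connect(":memory:")
# Assumed, simplified schema for the favorites table.
db.execute("CREATE TABLE favorites (timestamp TEXT, language TEXT, problem TEXT)")

# Only language and problem are named, so the omitted timestamp becomes NULL.
db.execute("INSERT INTO favorites (language, problem) VALUES ('SQL', '50ville')")

# By contrast, an empty string is an actual (if empty) value, not NULL.
db.execute(
    "INSERT INTO favorites (timestamp, language, problem) VALUES ('', 'C', 'Mario')"
)

# Python's sqlite3 module hands NULL back as None.
rows = db.execute(
    "SELECT timestamp, language, problem FROM favorites ORDER BY rowid"
).fetchall()
print(rows)  # [(None, 'SQL', '50ville'), ('', 'C', 'Mario')]

# IS NULL and = '' pick out different rows.
nulls = db.execute(
    "SELECT COUNT(*) FROM favorites WHERE timestamp IS NULL"
).fetchone()[0]
empties = db.execute(
    "SELECT COUNT(*) FROM favorites WHERE timestamp = ''"
).fetchone()[0]
print(nulls, empties)  # 1 1
```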
But what's nice now is that I can
distinguish NULL from other values. And
in fact, if it's not a good idea to
have any NULL data in my data set for
whatever reason, like it just looks like
bogus data and it would be nice to know
who inserted it and when, no problem.
We can also delete data from a table in
SQL. I can DELETE FROM the name of
the table WHERE some condition is true.
So for instance, if I want to delete
that row, I can do this in a couple of
ways, but perhaps the simplest is to
delete from favorites where timestamp
IS NULL, semicolon. So IS, too, is
another SQL keyword here. And that will
go ahead and delete only those rows
where the timestamp is NULL. Enter.
Let's do the same select command as
before. Enter. And voila, that row is
now gone. Be very, very, very careful
with DELETE statements. If I had
foolishly done this, want to guess what
the results would be?
It would delete everything. And you can
Google around and see actual articles
about interns at companies who had way
too much access to a company database
executing something like DELETE FROM
favorites because they forgot the
predicate. They hit Enter too soon, and
boom, all of the data is now gone. So
these are very destructive commands, and
just like in the real world, if you
don't have backups or versions of these
same tables, the data can indeed be lost
forever. So don't do that. Always have
your WHERE, and make sure your WHERE is
correct. All right. Well, let's go ahead
and suppose, let's claim, that
maybe 50ville is going to be a really
popular problem among students, so much
so that it becomes overnight everyone's
favorite problem. Well, we can update
the table as is. Here is the general
syntax for updating rows in a table. You
literally say UPDATE, the name of the
table, the word SET, and then a bunch of
key-value pairs: the column that you
want to update, set equal to the
value that you want to update it to,
WHERE some condition is true. So, what
does this mean concretely? Well, let's
say that we want to change everyone's
favorite to SQL and 50ville. I could do
this: update favorites set language
equal to quote unquote SQL, comma,
problem equal to quote unquote 50ville,
close quote, semicolon. And this is
where, again, it can be dangerous, but
in this case I'm going to go ahead and
hit Enter without any predicate to
filter this. Nothing bad seems to
happen, but if I now do select star from
favorites semicolon, all of you would
seem to like 50ville, and there is no
going back to the previous version of
the table unless I quit out of this and
import the whole CSV again, maybe after
deleting the data entirely. All right.
So, how do I get
rid of all of the data? Well, if you
want to DELETE FROM favorites for real
now, Enter. Select star from favorites.
We can confirm that that was a bad idea.
There's literally no data in the
database anymore, but we can certainly
restore from our actual CSV. So in
short, we've got SELECT, we've got
INSERT, we've got UPDATE, we've got
DELETE, and we've seen CREATE, albeit
automatically generated by sqlite3.
Maybe we'll see DROP. And actually, we
can see DROP now. So recall that if I do
.schema, I can see all of the tables
in this here database. If I do DROP
TABLE favorites semicolon, and now again
.schema, now there is nothing in this
database at all. So that's an even worse
command to run unless you know and
intend what you're doing. Questions,
then, on these CRUD operations:
creating, reading, updating, deleting?
Yeah, here first.
>> Why do you not put quotation marks
around null?
>> So NULL is a special symbol, and if
you put quotation marks around it, you
would literally be looking for the value
N-U-L-L, as though that were the name of
a language or the name of a problem or
something literally in the CSV. We are
looking for the absence of that data
altogether. Yeah.
>> Really good question. If it's so easy to
destroy data like this, are people
actively backing up their data? Short
answer, yes, absolutely. Like, all of
CS50's web apps and the like are
automatically backed up on some
schedule. Even then, we have to decide
what that schedule is. And if it's
daily, for instance, nightly, we could
lose up to like 23 hours 59 minutes of
data. In some cases, companies might
therefore version their data more
tightly, like every 5 minutes or every
minute, although that's going to consume
a lot more space. But there's already
this theme of trade-offs in computing.
You can also implement forms of access
control. SQLite is lightweight; it has
no notion of usernames or passwords. If
you have access to the data, you can
touch everything. But in the real world,
with commercial and open-source software
like Oracle and SQL Server and Postgres
and MySQL, you actually have usernames
and passwords and specific permissions,
so you can give users, interns, the
ability to select data but not update or
delete or insert data, or any
combination thereof. So there are
defenses. Other
questions on these here CRUD commands?
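The sketch below ties together the DELETE, UPDATE and DROP commands and the NULL-quoting question above. It uses Python's sqlite3 module and an in-memory table, so nothing real is at risk, and the rows are invented. One row deliberately holds the four-letter string 'NULL' to show what quoting NULL would match. The count helper is a hypothetical convenience that builds SQL from fixed strings only.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE favorites (timestamp TEXT, language TEXT, problem TEXT)")
db.executemany(
    "INSERT INTO favorites VALUES (?, ?, ?)",
    [
        ("2025-10-31 13:00:00", "Python", "DNA"),  # made-up rows
        ("2025-10-31 13:01:00", "C", "Mario"),
        ("NULL", "C", "hello, world"),  # the four-letter string, not NULL
        (None, "SQL", "50ville"),  # None is stored as a real NULL
    ],
)


def count(where="1"):
    """Count rows matching a WHERE clause (default: every row).

    Building SQL with an f-string is fine here only because every clause
    passed in is a fixed string written in this script.
    """
    return db.execute(f"SELECT COUNT(*) FROM favorites WHERE {where}").fetchone()[0]


# Quoting NULL compares against the text 'NULL', which matches only that string.
quoted = count("timestamp = 'NULL'")

# "= NULL" is never true (any comparison with NULL is NULL), so nothing is deleted.
db.execute("DELETE FROM favorites WHERE timestamp = NULL")
after_equals = count()

# IS NULL is the test that actually matches the missing value.
db.execute("DELETE FROM favorites WHERE timestamp IS NULL")
after_is = count()

# UPDATE with no WHERE rewrites every remaining row; rowcount says how many.
updated = db.execute(
    "UPDATE favorites SET language = 'SQL', problem = '50ville'"
).rowcount

# DROP TABLE removes the table itself, not just its rows.
db.execute("DROP TABLE favorites")
tables = db.execute("SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()

print(quoted, after_equals, after_is, updated, tables)  # 1 4 3 3 []
```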
Okay, let's go ahead and play with some
real-world data. So many of you might be
familiar with IMDb, the Internet Movie
Database, which is a great repository of
data for movies and also TV shows and
actors and the like. And within IMDb's
website, you can actually download
TSV files, tab-separated values files,
that contain a lot of the data from
their website. So we went ahead and did
this. We then converted that TSV data
into a whole bunch of SQL tables so that
we can begin to play with it in the
context of TV shows. However, let's
start first with a question about how
you could go about modeling data for TV
shows themselves. So for instance, in
advance I also created a few
different spreadsheets that just allowed
me to play with how I might model
real-world data. So The Office
is a very popular TV show. The US
version here starred Steve Carell and
others. So think about how IMDb, or
maybe even little old me with a
spreadsheet, might keep track of who
starred in what TV show. Well, I might
just use a Google Sheet like this and in
the first column have a title column,
where this is the title of the show,
like The Office. And then if it stars
one person, I would put Steve Carell in
the next column. But if there was a
second star, I might put Rainn Wilson or
John or Jenna or B.J. Novak here,
column by column by column. And I
could just keep adding show after show
after show after show, one row per show,
and then however many stars are in
there. What might you not like about the
design of this data, though? Or what
might start to look odd?
>> Yeah, it's a little weird that we have
star, star, star. This repetition has
tended to be bad. Anytime we're copying
and pasting, it should rub you the wrong
way. Other observations about it too?
Yeah.
>> Yeah. At the moment I've got one, two,
three, four, five stars, and there are
certainly TV shows with fewer stars and
more, and so, okay, I can add more
columns. I can just keep saying star,
star, star, but then it's going to be a
very ragged data set, a very sparse data
set, where there's going to be a lot of
blank cells for shows that have small
casts, but then a lot of columns for
shows that have large casts. So, it just
feels like this should be rubbing you
the wrong way. It just feels like it's
going to get messy, especially as the
number of stars, let alone shows, gets
larger. All right.
Well, another version of this data
set that I put together is this instead.
So, I didn't like the fact that I was
going to have an arbitrary number of
columns based on the specific show in
question. So, here I scaled back and I
just have a single column for title as
before, but now a single column for
star. And I decided that if a TV show
has multiple stars, well, I just put
each of the stars' names and then to the
left of them specify the show that
they're in. This seems to be a little
better in that I've solved some of the
redundancy problem, but I've kind of
just covered up the hole in a leaky
hose, and now another leak has sprung
up here, which is to say it's still a
bad design. What's bad here?
Yeah,
>> yeah, now I've got The Office, The
Office, The Office, The Office, The
Office. And that too feels like I'm
wasting space. If I manually type this
in, odds are eventually I'm going to
screw up and one of these is going to be
misspelled, which is going to break
something somehow. So this, too, doesn't
feel quite ideal. So the third and final
version I whipped up to model this data,
which is going to lead us to a similar
design in an actual database, looks a
little more arcane, but it's the right
way, at least academically, to do
things, and we'll see that
technologically, too, this is going to
be a big gain. So here I now
have a spreadsheet with three separate
sheets. One is called shows, which is
selected at the moment. Another is
called people, which is not selected
yet, and the third of which is called
stars. What am I doing here? Well,
notice that in the shows sheet, I've
still got the title column, but I've
decided to give The Office a unique ID.
Much like a Harvard student has a unique
ID number, much like an employee in a
company probably has a unique employee
ID, similarly have I given The Office a
unique identifier that happens to be the
same as it is in IMDb. Meanwhile, for
all of the people that exist in the
world of TV shows, for instance, these
five folks, I have their names as well
as unique IDs for them, and those
integers are unique to the people and
have no connection per se to the show
IDs just yet. But the third and final
sheet I've whipped up is going to be a
sort of cross-referencing sheet that
allows me to associate shows with
people. And at a glance, this looks the
most arcane of the three because it's
just numbers.
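The three sheets map naturally onto three tables, and a query can reassemble the human-readable view that the bare integers hide. Here is a minimal sketch with Python's sqlite3 module. The show ID 386676 comes from the spreadsheet above. The person IDs 1 through 5 are hypothetical, and the five names are the cast members mentioned earlier. The title "The Office" is now stored exactly once.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE shows (id INTEGER, title TEXT);
    CREATE TABLE people (id INTEGER, name TEXT);
    -- Each row of stars pairs one show ID with one person ID.
    CREATE TABLE stars (show_id INTEGER, person_id INTEGER);
""")

# 386676 is The Office's ID from the spreadsheet; person IDs 1-5 are made up.
db.execute("INSERT INTO shows VALUES (386676, 'The Office')")
db.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [
        (1, "Steve Carell"),
        (2, "Rainn Wilson"),
        (3, "John Krasinski"),
        (4, "Jenna Fischer"),
        (5, "B.J. Novak"),
    ],
)
db.executemany(
    "INSERT INTO stars VALUES (?, ?)",
    [(386676, person_id) for person_id in range(1, 6)],
)

# Reassemble the readable view: which people starred in The Office?
names = [
    row[0]
    for row in db.execute("""
        SELECT name FROM people WHERE id IN (
            SELECT person_id FROM stars WHERE show_id = (
                SELECT id FROM shows WHERE title = 'The Office'))
        ORDER BY id""")
]

print(names)
```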
It's just integers. But recall from a
moment ago that The Office's unique ID
was 386676. Well, that's how we
associated that show with this person,
who happens to be Steve Carell, and so
forth. Now, at a glance, this is not
very useful to me, the human, unless I
do some fancy spreadsheet stuff like
VLOOKUPs, if familiar, and the like, but
this is a stepping stone to how proper
databases actually store data. What I
have done here is normalize the data by
eliminating all redundancies except for,
at most, some redundant integers. And
why is that? Well, integers, as we know
from our days in C, are going to be of a
fixed length. It's going to be 32 bits,
maybe 64 bits, but it's always going to
be the same number of bits. And that's
nice because anytime you have a fixed
number of bits, it lends itself to
storing things nicely in an array or
doing binary search, because everything
is a predictable distance apart, as
opposed to strings like Steve Carell or
John Krasinski, whose names might vary
in length. These IDs for the title of
the show and these IDs for the persons
are not going to vary in length because
they're all just integers. But of
course, this spreadsheet is now much
less useful, because if I want to figure
out who is in The Office, well, first I
have to figure out what show this is,
then I have to figure out what person
this is and this is and this is. But
that's where SQL is again going to swoop
in and allow us to solve this problem.
And indeed SQL is one of the most common
ways that web applications and mobile
applications today store any amount of
data at scale. They are most likely not
using simple CSV files. They are using
SQLite or MySQL or Postgres or
Oracle or other commercial and
open-source incarnations of SQL
databases, and odds are IMDb might be
using the same as well. All right, so
let's go ahead and
do this. I have created in advance a
file called shows.db that contains
hundreds of thousands of rows of TV
shows and TV stars and other data from
IMDb itself. And in a moment we'll see a
database that, if drawn as a picture,
looks a little something like this.
There is going to be a people table.
There's going to be a shows table.
There's going to be a stars table that
somehow links the two. There's also
going to be a writers table and a
ratings table and a genres table. So
overnight this sort of escalated quickly
from just favorites, which was a single
table, to now a real-world data set that
has six tables. But here is the
relational in relational databases, as
these arrows are meant to imply. Right
now, there are relationships across
these several tables. Case in point,
here is people here. And we'll see in a
moment that a person in the IMDb world
has an ID number, a name, and a year of
birth. A show in the IMDb world has a
unique ID, a title, the year it debuted,
and a total number of episodes. But
there's no mention of people in shows.
There's no mention of shows in people.
But per the arrows, there's going to be
this third table here, stars, that
somehow links show IDs with person IDs.
And this is where relational databases
get really powerful, because you can
solve all of those redundancy concerns
and actually enable yourself to select
data much more quickly instead. But
let's focus on
something simple first. Let's focus just
on the shows table, which pictorially
might look a little something like this.
So, in just a moment, I'm going to go
ahead and reopen VS Code, and instead of
favorites.db, I'm going to go ahead and
open up a file called shows.db, which,
again, I prepared in advance. So, if I
run sqlite3 shows.db and hit Enter, I'm
back at a SQL prompt. Let me go ahead
and type .schema shows just to show you
what command created this here table.
And it got a little more interesting
already. Notice that the table is called
shows and it's got one, two, three, four
columns: an ID for each show, a title
for each show, the year it debuted for
each show, and the number of episodes.
There's also clearly some mention of
types and some other keywords that we
haven't yet talked about. But let's
focus first on just what the data
is. The best way to wrap your mind
around a new data set, if someone hands
you a SQL database or you've imported
a CSV into a SQL database, is to just
select some data. So select star from
shows semicolon.
That's a lot of data flying across the
screen. It's not very easy to see
because some of the show names are
apparently crazy long and so it's
wrapping, but it's still going and going
and going. I'm going to hit Control-C to
interrupt it. Control-C, as with our
terminals in general, is your friend.
Let's run that same command, but just
limit it to the first 10 shows. So,
there are the first 10 shows in the IMDb
database of TV shows. So, we've got 10
rows in this data set going back to, it
looks like, the 1970s, which is roughly
where their data set starts. All right.
So here's the data we have in here.
Well, how much is there? Well, let's go
ahead and check. So, select count star
from shows semicolon.
And now we're talking. There's 250,87
shows in this database. And if I do the
same for people, select count star from
people semicolon. Looks like there are
74,315
TV stars associated with this here data
set. So here too the data is much more
interesting and much more representative
of real-world data. All right. How about
the ratings? IMDb, if unfamiliar, is also
a place where you can go to check the
ratings from users as to whether
something is a good show, a bad show,
or anything in between. So let's do
.schema ratings, and I'll see that,
yeah, there's this table called ratings
that, as we saw briefly on the screen,
has a show ID and then a rating and then
the total number of votes that
contributed thereto, and again some data
types and other syntax that we'll get to
before long. But let me go ahead and
just do select star from ratings limit
10 just to get a sense of what the data
is. That's now what the data looks like
in that table. So to a human at a
glance, not that useful, because you
don't know what those show IDs are. But
in a moment, we're going to see how we
can reconstitute this data by linking
these tables together by way of those
IDs and actually get answers to
questions. So among other things, a SQL
database, or a relational database more
generally, supports one-to-one
relationships, whereby a row in one
table can map to one row in another table.
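Here is a minimal sketch of a one-to-one link between shows and ratings, written with Python's sqlite3 module. The columns follow the description (id and title; show_id, rating and votes) but are a simplified assumption, not the exact shows.db schema, and the three shows are made up. UNIQUE on show_id keeps it one-to-one. SQLite enforces FOREIGN KEY constraints only after PRAGMA foreign_keys = ON. The nested SELECT then maps well-rated show IDs back to titles.

```python
import sqlite3

db = sqlite3.connect(":memory:")
# SQLite only enforces FOREIGN KEY constraints when this pragma is on.
db.execute("PRAGMA foreign_keys = ON")

# Simplified, assumed versions of two IMDb-style tables.
db.executescript("""
    CREATE TABLE shows (
        id INTEGER PRIMARY KEY,
        title TEXT NOT NULL
    );
    CREATE TABLE ratings (
        show_id INTEGER NOT NULL UNIQUE,  -- UNIQUE: one ratings row per show
        rating REAL NOT NULL,
        votes INTEGER NOT NULL,
        FOREIGN KEY (show_id) REFERENCES shows (id)
    );
""")
db.executemany(
    "INSERT INTO shows (id, title) VALUES (?, ?)",
    [(1, "Show A"), (2, "Show B"), (3, "Show C")],  # hypothetical shows
)
db.executemany(
    "INSERT INTO ratings (show_id, rating, votes) VALUES (?, ?, ?)",
    [(1, 8.5, 1000), (2, 4.0, 50), (3, 6.0, 200)],  # hypothetical ratings
)


def rejected(sql):
    """Return True if SQLite refuses the statement with an IntegrityError."""
    try:
        db.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


# A second rating for show 1 would break the one-to-one link (UNIQUE).
duplicate = rejected("INSERT INTO ratings VALUES (1, 9.0, 10)")
# A rating for a show that doesn't exist violates the foreign key.
orphan = rejected("INSERT INTO ratings VALUES (99, 7.0, 10)")

# Inner query: IDs of well-rated shows. Outer query: titles for those IDs.
titles = [
    row[0]
    for row in db.execute(
        "SELECT title FROM shows WHERE id IN "
        "(SELECT show_id FROM ratings WHERE rating >= 6.0) ORDER BY id"
    )
]

print(duplicate, orphan, titles)  # True True ['Show A', 'Show C']
```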
This is in contrast to one-to-many, for
instance. So one-to-one means one row
over here somehow relates to one row
over here. Again, the relational in
relational database. How might we go
about seeing this? Well, first here's
a tour of the data types that SQLite
supports. Whereas in C we had a
somewhat similar list, and in Python
that list went away, at least with
regard to explicit types, in SQL we're
back to explicitly stating, when
creating our tables, what the types of
those columns are.
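SQLite treats a column's declared type as an affinity and records, value by value, how each value was actually stored. This minimal sketch uses Python's sqlite3 module to declare one column of each type named here and then asks typeof() what was stored. The table name samples and its values are made up for illustration.

```python
import sqlite3

db = sqlite3.connect(":memory:")
# One column per declared type: INTEGER, NUMERIC, REAL, TEXT, BLOB.
db.execute("CREATE TABLE samples (i INTEGER, n NUMERIC, r REAL, t TEXT, b BLOB)")
db.execute(
    "INSERT INTO samples VALUES (?, ?, ?, ?, ?)",
    (42, 7, 6.5, "The Office", b"\x00\x01"),  # Python bytes become a BLOB
)
db.execute("INSERT INTO samples (t) VALUES ('only text')")  # the rest are NULL

# typeof() reports how SQLite actually stored each value.
first = db.execute(
    "SELECT typeof(i), typeof(n), typeof(r), typeof(t), typeof(b) "
    "FROM samples WHERE rowid = 1"
).fetchone()
second = db.execute("SELECT typeof(i) FROM samples WHERE rowid = 2").fetchone()

print(first)   # ('integer', 'integer', 'real', 'text', 'blob')
print(second)  # ('null',)
```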
So you have integers; you have numeric,
which is more of a catch-all for things
like times and dates and other useful
real-world data. You have real numbers,
which are like floats with decimal
points. You have text, which we've seen
already. And then you have blobs, which
is a great name, which stands for binary
large objects. You can actually store
raw zeros and ones, like files, in the
database. Generally, storing files that
way is frowned upon, but there are
certain times where you do want to store
binary data and not pure text. That's it
for SQLite. There are only these five
types. In other commercial and
open-source SQL databases like Oracle
and MySQL and Postgres and the same
names I keep rattling off, you have even
more data types than these. So that's
among the additional features you get by
using other databases as well. There are
a few keywords, though, that are worth
noting in SQL. You can specifically say
when creating a table that this column
cannot be NULL. If you don't want
timestamp, for instance, to ever allow
for NULL values, you can literally
specify when creating that table that
this column cannot be NULL. And if I try
to insert data into that table with a
NULL value, such as by not providing a
timestamp, the insertion will fail. And
so here's where things are different
from just writing Python code or
certainly using a spreadsheet.
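Here is a minimal sketch of that built-in defense, using Python's sqlite3 module and a hypothetical variant of the favorites table whose timestamp column is declared NOT NULL. Omitting the timestamp makes SQLite reject the insert with an IntegrityError, and the table keeps only the valid row.

```python
import sqlite3

db = sqlite3.connect(":memory:")
# Hypothetical variant of favorites whose timestamp may never be NULL.
db.execute(
    "CREATE TABLE favorites ("
    "timestamp TEXT NOT NULL, language TEXT, problem TEXT)"
)
db.execute("INSERT INTO favorites VALUES ('2025-10-31 13:00:00', 'C', 'Mario')")

try:
    # Omitting timestamp would leave it NULL, which the constraint forbids.
    db.execute("INSERT INTO favorites (language, problem) VALUES ('SQL', '50ville')")
    rejected = False
except sqlite3.IntegrityError as error:
    rejected = True
    print("insert failed:", error)

count = db.execute("SELECT COUNT(*) FROM favorites").fetchone()[0]
print(rejected, count)  # True 1
```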
You can actually have built-in defenses
so that neither you nor anyone else
messes up your data by accidentally
inserting bogus or blank data. You can
further say that things must be unique.
So every element, every cell in a
column, must be unique, to ensure that
you can't accidentally put in two things
with the same ID: two Harvard IDs, two
employee IDs that are duplicates. You
can avoid that altogether. But more
importantly, relational databases
support these two concepts, primary keys
and foreign keys. And this is where the
magic really starts to happen. A primary
key is the unique identifier for a
table. It is the column of values that
uniquely identify every row. So it's
probably going to be the show ID, the
person ID, the Harvard ID, the employee
ID. Anytime you have a value, often
numeric, often integral, that uniquely
identifies rows, you simply call that a
primary key. When that same ID appears
in another table for cross-referencing
purposes, you refer to it instead as a
foreign key, because that same key is
over there in another table, thus
foreign. But they refer to one and the
same thing: in the context of the table
in which it's defined, it's primary; if
it appears in some other table, it is
now considered foreign. All right. So,
how can we make
use of this? Well, let me go ahead and
propose that we execute a few SQL
commands as follows. If I wanted to
start asking questions about ratings, I
could do something like this: select
star from ratings where the rating
suggests a good show. So, let's call it
6.0 or higher. But let's just limit this
to the first 10 shows that meet that
threshold. Enter. So here I now have a
temporary table that gives me three
columns from the ratings table: show ID,
which is for the moment a useless
identifier because I don't know what
show it corresponds to, but also the
rating value and the number of votes
that contributed thereto. Well, how
might I actually get to the shows that
are highly rated at 6.0 or higher?
Well, I don't need to select star. If
all I care about is these 10, I can
whittle this same command down to just
selecting the ratings. Sorry, not the
ratings: I can whittle this table down
to just selecting the show IDs. So this
is the answer to the question: what are
10 TV shows whose ratings are 6.0 or
higher? Well, from the table, these are
the first 10 that come back. How do I
now select the shows that correspond to
these values? Here's where things can be
done a few different ways. I could
select everything I know from the shows
table where the ID of the show is in the
following set. I'm going to open a
parenthesis and then, just for
readability, I'm going to hit Enter. The
dot dot dot and angle bracket just mean
I'm continuing my thought. It's not
executing the command yet. What is the
query I now want to run? Well, it's
going to be a nested query. I can now do
the same thing as before: select the
show ID from the ratings table where the
rating is really good, greater than or
equal to 6.0. But let's then limit the
total number of results to just 10. So
here, just like in grade-school
math, we have parentheses. So the first
thing that's going to be executed is the
thing inside the parentheses. So this is
going to get me every show ID from the
ratings table that has a really good
rating of 6.0 or higher. That's going to
return to me a column of values. I'm
then going to say select star from the
shows table where the ID of the show is
in that list of values, but only show me
10 of those. So
what I should now see is much more
useful data, namely 10 shows that are
highly rated. Enter. And indeed I get
back these 10 shows, all of whose ratings
are 6.0 or higher. If I only
care about the title, that too I
can do. So let's do this again. Instead
of selecting star, let's select title
from shows where the ID of the show is
in the following parenthetical: select
show ID from ratings where the rating is
greater than or equal to 6.0. Close my
parenthesis. Limit to 10. Enter. And I
see the same shows, but now the query
hits the nail on the head: just give me
the titles of those highly rated shows.
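The nested query just described can be sketched with Python's sqlite3 module. The two miniature tables, their titles (Show A, Show B, Show C), and their ratings are hypothetical stand-ins for the IMDb data, not excerpts of it.

```python
import sqlite3

# Hypothetical miniature versions of the shows and ratings tables.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE shows (id INTEGER, title TEXT NOT NULL, PRIMARY KEY(id));
    CREATE TABLE ratings (
        show_id INTEGER NOT NULL UNIQUE,
        rating REAL NOT NULL,
        votes INTEGER NOT NULL,
        FOREIGN KEY(show_id) REFERENCES shows(id)
    );
    INSERT INTO shows (id, title) VALUES (1, 'Show A'), (2, 'Show B'), (3, 'Show C');
    INSERT INTO ratings (show_id, rating, votes) VALUES (1, 8.5, 500), (2, 4.0, 90), (3, 6.0, 30);
""")

# The inner query runs first and yields a column of show IDs;
# the outer query keeps only shows whose id is IN that column.
titles = db.execute("""
    SELECT title FROM shows
    WHERE id IN (
        SELECT show_id FROM ratings WHERE rating >= 6.0
    )
    LIMIT 10
""").fetchall()

for (title,) in titles:
    print(title)
```

Because the comparison is `>=`, the show rated exactly 6.0 is included. There is no ORDER BY, so the order of the two titles is not something to rely on.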
Of course, I might want to do this
differently. In other words, here are
10 titles. Well, what are
the ratings? That's why you go to
IMDb or Rotten Tomatoes or the like. You
want to see the titles and the actual
ratings together. Well, it turns
out we're going to need another
technique to do that, namely an ability
to join two tables. And in fact, just as
a teaser for this, if we want to start
playing around with some real data, here
might be, for instance, excerpts from
two tables. Here's the shows table at
left. Here's the ratings table at right,
or a subset thereof. If I want to figure
out what the rating is for a given show,
wouldn't it be nice if I could somehow
line these two tables up together
such that, just like the tips of my
fingers, I line up this value with its
corresponding value over here, a
cross-reference of sorts? Well, just for the
sake of discussion, let me just
visually flip this around, though that
does nothing technically underneath the
hood. Let me just scooch them together
now after highlighting the common
values. Well, wouldn't
it be nice to take the shows table and
join it with the ratings table in such a
way that those IDs all line up?
We're going to have the ability to do
just this. This is a lot already,
and this isn't the sort of cliffhanger
I'd wanted to end on, because who cares about
joins, but it's going to be cool.
Let's take our 10-minute Halloween candy
break and come back in 10 for the next part.
All right, we are back. So, recall that where
we left off was essentially here. We had
these two tables: the shows table at
left and the ratings table at right. And
the motivation here was how do we
actually associate shows with their
respective ratings, because the ratings,
of course, are not in the shows table. As
an aside, they could be. In fact,
because this is meant to demonstrate a
one-to-one relationship, whereby every show
has one rating, we could have just put
the rating and the number of votes into
the shows table. But we chose not to
because IMDb actually stores their
ratings as a separate TSV file. And so
what we tried to do, for parity with that, is
import into a ratings table only the
very TSV file that we had downloaded
from them. But putting them in the shows table would be a
solution too. So at this point in
the story, we've got the shows table
here. We've got the ratings table over
here. We've noticed that there are
commonalities: there are show IDs that
appear in both tables. And in fact, to
use some of the new vernacular, this,
the ID column here, is the primary key.
This is that same value, but in this
context it's known as a foreign key
because it's in some other table. And
that's going to be how we link these two
things together. So, how do we select
for not just The Office, but maybe every
TV show its respective rating? Well,
let's go back to VS Code and, at my
SQLite prompt, let me go ahead and do
this. Select star from the shows table.
But let's go ahead and join the shows
table with the ratings table. How do I
want to join these two tables together?
We'll do so on the shows table's ID
column being equal to the ratings table's
show ID column, and then go ahead and
filter the results in the following way,
where the rating we care about should
still be greater than or equal to 6.0,
and let's limit this to the first 10
results. So, it's a bit more of a
mouthful, but what I'm doing is
selecting everything from the result of
joining shows and ratings on this column
with this column. And the rest of the
predicate is as before. So, JOIN is
going to do literally that: join these
two tables as I have prescribed. When I
go ahead here and hit Enter, now that I
have my semicolon, I get back a complete
table containing everything from the
shows table and everything from the ratings
table, with those unique identifiers
lined up. Indeed, if you look at the
primary key over here, the ID column,
it's 62614 and so on. Over here, you have
show ID, which came from the ratings
table: 62614
and so on. So, we've taken two tables
and really joined them together, but
we're only seeing a subset because I
limited it to 10 such rows. Now, of
course, most of this data doesn't seem
very interesting if my whole goal is
just to know what the ratings are for
these shows. Well, let's go ahead and in
code achieve this sort of result. Let's
literally join these tables together.
Let's get rid of the redundancy
altogether. And then really, let's whittle
it down to just a title column and a
rating column. So, how do we do that?
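A minimal sketch of that two-column join with Python's sqlite3 module follows. The tables and values are hypothetical; the `ON` clause is what lines up each show's primary key with the matching foreign key in ratings.

```python
import sqlite3

# Hypothetical miniature tables; ratings rows are inserted out of order on purpose.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE shows (id INTEGER, title TEXT NOT NULL, PRIMARY KEY(id));
    CREATE TABLE ratings (
        show_id INTEGER NOT NULL UNIQUE,
        rating REAL NOT NULL,
        votes INTEGER NOT NULL,
        FOREIGN KEY(show_id) REFERENCES shows(id)
    );
    INSERT INTO shows (id, title) VALUES (1, 'Show A'), (2, 'Show B'), (3, 'Show C');
    INSERT INTO ratings (show_id, rating, votes) VALUES (3, 6.0, 30), (1, 8.5, 500), (2, 4.0, 90);
""")

# JOIN pairs each shows row with the ratings row whose show_id matches its id;
# WHERE filters, and the SELECT list keeps just two columns.
rows = db.execute("""
    SELECT title, rating
    FROM shows
    JOIN ratings ON shows.id = ratings.show_id
    WHERE rating >= 6.0
    LIMIT 10
""").fetchall()

for title, rating in rows:
    print(title, rating)
```

Even though the ratings rows were inserted in a different order from the shows rows, the join matches them by ID, not by position.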
Well, in code, I'm going to go ahead and
select more specifically the title of
every show and the rating of every show
from the shows table, but I'm going to
join it with the ratings table on
shows.id equaling ratings.show_id. And as
before, I'm going to filter it to where
rating is greater than or equal to 6.0
and limit it to 10 such results. Enter. And now I
have a nice, simple temporary table that
in one column has the titles of these
shows and on the right-hand side has the
ratings of the shows, even though those
two data sets were completely separate
in two separate tables. Indeed, if we
think back to where this data came from,
what we've been focusing on is the shows
table, and we've joined it with the
ratings table. Here's the primary key
for shows. Here's the foreign key for
ratings. And by convention, notice that
we've adopted a certain
approach. Anything that's called ID here
implies that it's a primary key.
Anything that's something underscore ID
implies that it's a foreign key. And the
convention we adopted, which is actually
quite common, is if the table is called
shows, plural, we call the foreign key
show_id, singular. Different companies,
different communities will have
different practices, but we've been
consistent across all of these tables
with our underscore and lowercase
conventions. Yeah?
>> I'm just curious how these IDs are all generated and relate to
each other properly.
>> Really good question. How are all these
IDs generated, and how do they relate to each other
properly? Well, in our case, I have no
idea. The Internet Movie Database people
came up with these unique identifiers
somehow, and we simply incorporated
them into our data set. In practice,
what they probably did, and what you will
do, for instance, in future problem sets
when generating data, is just assign
an arbitrary integer, starting at one,
then two, then three, then four, then five,
and you just let it auto-increment all
the way up, and you let the database
ensure that you never have duplicate
values.
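The auto-incrementing IDs described in that answer can be sketched in SQLite as follows. This relies on SQLite's behavior for a column declared `INTEGER PRIMARY KEY`; the people table and its names are hypothetical.

```python
import sqlite3

db = sqlite3.connect(":memory:")

# In SQLite, a column declared INTEGER PRIMARY KEY is filled in
# automatically when you omit it from an INSERT.
db.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

for name in ["Alice", "Bob", "Carol"]:  # hypothetical names
    db.execute("INSERT INTO people (name) VALUES (?)", (name,))

ids = [row[0] for row in db.execute("SELECT id FROM people ORDER BY id")]
print(ids)  # [1, 2, 3]

# Trying to reuse an existing ID fails, so duplicates cannot sneak in.
try:
    db.execute("INSERT INTO people (id, name) VALUES (1, 'Dave')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(duplicate_rejected)  # True
```

SQLite also accepts an `AUTOINCREMENT` keyword on such a column, which additionally prevents the IDs of deleted rows from ever being reused; other database systems have their own syntax for auto-incrementing keys.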
>> Yeah.
>> Just to clarify, the dot dot dot and
arrow symbol is only there to make it
look better, right? There's no...
>> Correct. The dot dot dot and angle
bracket that you keep seeing is just the
continuation prompt, which means I have
prematurely hit Enter, deliberately,
because I want to move everything onto
the next line so it doesn't wrap in an ugly way
onto multiple lines. It is not SQL syntax.
It's specific to sqlite3, and it's
just a continuation of the thought.
That's all. Good observation. Yeah?
>> When you limit it to 10, which ones is it showing?
>> Good question. When you limit something
to 10, for instance, which ones do you
get? You just get the first 10
matching rows. If you don't use
the ORDER BY keywords, they'll typically
come back in the order in which they're stored in
those tables, though SQL itself doesn't guarantee any particular order.
And so you're just seeing,
arbitrarily, the first 10 that match that
predicate, which is rating greater than
or equal to 6.0. We have not ordered it
by rating. So I'm not necessarily getting the
10.0 shows. I'm just getting
the first 10 shows that are 6.0 or higher.
And the point of that is just that I
want it to fit on the screen rather than
see hundreds of thousands of answers.
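To get the highest-rated shows rather than an arbitrary 10, one option is to add `ORDER BY` before `LIMIT`. A small sketch with Python's sqlite3 module, using hypothetical titles and ratings:

```python
import sqlite3

# Hypothetical miniature tables for illustration.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE shows (id INTEGER, title TEXT NOT NULL, PRIMARY KEY(id));
    CREATE TABLE ratings (show_id INTEGER NOT NULL UNIQUE, rating REAL NOT NULL);
    INSERT INTO shows (id, title) VALUES
        (1, 'Show A'), (2, 'Show B'), (3, 'Show C'), (4, 'Show D');
    INSERT INTO ratings (show_id, rating) VALUES
        (1, 6.5), (2, 9.0), (3, 4.0), (4, 7.5);
""")

# Without ORDER BY, LIMIT just keeps whichever matching rows come back first.
# With ORDER BY rating DESC, LIMIT keeps the highest-rated matches.
top_two = db.execute("""
    SELECT title, rating
    FROM shows
    JOIN ratings ON shows.id = ratings.show_id
    WHERE rating >= 6.0
    ORDER BY rating DESC
    LIMIT 2
""").fetchall()

print(top_two)  # [('Show B', 9.0), ('Show D', 7.5)]
```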
Okay. So you might recall now that there
were certainly other tables besides
these. So let's look at the broader
scheme, not just shows and ratings;
let's focus on genres, if only because
genres is interesting: it's no
longer a one-to-one relationship. After all,
why would a show have multiple
ratings? It sort of has its own rating.
But a show could certainly belong to
multiple genres. You could imagine a
show being a comedy and a drama, or a
musical and a comedy, or any other number
of combinations of one or more genres.
And so the way we've chosen to implement
that here, too, is with a separate table
called genres, which is not perfect.
There are going to be some redundancies
here that we have not yet eliminated.
But it does indicate that we can go
ahead and have multiple such values
associated with each and every show. So
how do we get there? Let's focus just on
this. Let's go back in just a moment to
VS Code and take a look at the
schema for genres now. In genres, we
have the following: a table called
genres, which has two columns: a show
ID, which is an integer that cannot be
null, and a genre, which is text, which
also cannot be null. And now, for the first
time, let's actually use some of the
vernacular we've introduced. Here we
have an example, explicitly in SQL, that
specifies, when creating this table, that
the show ID column shall be a
foreign key that references the shows
table's ID column. And admittedly, I think
the syntax for creating tables is a bit
of a mouthful. I often have to
look it up to remember the order
of everything. But here we have the
columns listed first and then these key
constraints: a foreign key referencing
this primary key over here. And in fact,
let's rewind to look at the shows table
now to see from whence we
came. So if I do .schema shows,
which we've done before but waved our
hand at, then we'll indeed see that
shows has a primary key called ID, which
is an integer. How do I know that?
Because the very last thing in the
parentheses says that the ID column in
this table is a primary key. Then we see
that the title is text and can't be null.
The year is numeric, which, again, I
described as sort of a catch-all for
other real-world numeric types that
aren't purely integers or real
numbers per se. Episodes is an integer.
Both of those apparently can be null,
because maybe IMDb just doesn't have
that data for some older shows, but the
primary key is indeed specified here.
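The foreign-key relationship in the genres schema can be sketched like this. The CREATE TABLE statements approximate the schemas described above and may differ in detail from the course's actual database; the Catweasel row uses the values mentioned in this lecture, while the ID 99999 is a made-up, nonexistent show. One SQLite-specific caveat: foreign keys are only enforced after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

db = sqlite3.connect(":memory:")

# SQLite only enforces FOREIGN KEY constraints once this pragma is enabled
# (per connection); without it, the REFERENCES clause is not checked.
db.execute("PRAGMA foreign_keys = ON")

# Approximation of the shows and genres schemas.
db.executescript("""
    CREATE TABLE shows (
        id INTEGER,
        title TEXT NOT NULL,
        year NUMERIC,
        episodes INTEGER,
        PRIMARY KEY(id)
    );
    CREATE TABLE genres (
        show_id INTEGER NOT NULL,
        genre TEXT NOT NULL,
        FOREIGN KEY(show_id) REFERENCES shows(id)
    );
""")

db.execute("INSERT INTO shows VALUES (63881, 'Catweasel', 1970, 26)")

# One show may appear in genres many times (no UNIQUE on show_id).
db.execute("INSERT INTO genres VALUES (63881, 'Adventure')")
db.execute("INSERT INTO genres VALUES (63881, 'Comedy')")

# A genre for a show ID that doesn't exist in shows is rejected.
try:
    db.execute("INSERT INTO genres VALUES (99999, 'Comedy')")  # hypothetical bad ID
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

genre_count = db.execute("SELECT COUNT(*) FROM genres").fetchone()[0]
print(genre_count, orphan_rejected)  # 2 True
```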
And just for thoroughness, let me
distinguish genres from ratings now. If
I do .schema ratings again, which we
waved our hand at earlier, it's very similar
in spirit to genres in that there's a show
ID column that references the
shows table and then some other columns.
In genres, that was genre. In this case, we had
rating and votes, which are a real and
an integer, respectively. But notice this
one additional constraint here. I
deliberately specified that show ID in
the ratings table must be unique. That
is to say, you cannot have the same show
ID more than once in the ratings table.
Why? Because I indeed wanted a one-to-one
relationship, and it would not be
one-to-one if there were multiple show IDs that
correspond to one ID in the shows
table itself. But in genres, we're going to
allow duplicates.
And so there's no mention of unique
there. All right. So where does this get
us? Well, let me go back into my
terminal here after clearing all of
that. And let's go ahead and just see
the data to make it feel a
little more real. So select star from
genres limit 10, just to see the
first 10. All right. So it looks like
there are some comedies, adventures,
comedies, family, action, sci-fi, and so
forth. Well, let's go ahead and look up
just one show's information. In fact, I
saw this number, this ID, before. How
about we just look up this show. What
is this adventure show? 63881. So
select star from shows where ID equals
63881 semicolon. Okay. So this is the
show called Catweasel from 1970, which had
26 episodes in total, and that was indeed
its unique identifier. So that's all
fine and good if I want to see something
about that specific show. But as before,
how do I associate Catweasel in this
case with all of its genres? Well,
instead of it being a one-to-one
relationship necessarily, maybe Catweasel
is not just an adventure. Maybe
it's also a comedy and a family show.
And indeed, if I go back to the results
just now, you'll see that 63881
indeed lines up with adventure, comedy,
and family. And then the ID changes to
be about some other show. So, how do I
select these three answers to the
question, what genre is Catweasel?
Well, for this, we need to talk about
one-to-many relationships and how we can
get those back. Well, let's go ahead and
do this now in my terminal. Let me go
ahead and say the following: select
genre from the genres table where the
show ID equals just that 63881, which
I'm now starting to memorize. And I get back adventure,
comedy, and family. So, that's the
answer to the question, but this
certainly isn't the best way to do this,
where you have to look up the
unique ID for the show you care about,
then copy-paste it or memorize and type
it out into this query just to get the
genres. It would be nice to just ask all
of this in one breath. Well, we can do
this, even though it's a bit more
verbose. This time, I'm going to instead
say select genre from genres where the
show ID I care about equals, and now I'm
just going to hit Enter so as to move
this nested query inside of parentheses,
and I'm going to say, well, I don't know
off the top of my head what the unique
ID is for Catweasel, but I can ask the
database: select the ID from the shows
table where the title of the show equals
Catweasel. And this now obviates the
need for me to memorize or copy-paste
that unique ID. I'll hit Enter and close
my parenthesis. I'm going to go
ahead then and type semicolon, Enter.
And now I get back the exact same
answers, but without having to know or
care about these numeric values. And
that's kind of the point here. Even
though the database itself, or the actual
IMDb website, needs to use these unique
identifiers to store everything in the
database, we humans, generally speaking,
should not need to know or care what these
identifiers are. They're just meant to
implement this notion of relationships,
these cross-references. And so here we
see an example where you can ask the
question you care about without worrying
about any of the underlying numbers or
even seeing them in the result. All right.
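The nested version of the Catweasel lookup might look like this in Python's sqlite3 module. Only Catweasel's ID and genres come from this lecture; the Show A and Drama rows are hypothetical filler. The `?` placeholder passes the title in as a parameter.

```python
import sqlite3

# Miniature tables; only the Catweasel rows reflect the lecture.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE shows (id INTEGER, title TEXT NOT NULL, PRIMARY KEY(id));
    CREATE TABLE genres (show_id INTEGER NOT NULL, genre TEXT NOT NULL);
    INSERT INTO shows (id, title) VALUES (63881, 'Catweasel'), (1, 'Show A');
    INSERT INTO genres (show_id, genre) VALUES
        (63881, 'Adventure'), (63881, 'Comedy'), (63881, 'Family'), (1, 'Drama');
""")

# Inner query: look up the ID from the title, so no one has to memorize 63881.
# Outer query: fetch every genre row with that show_id.
genres = db.execute("""
    SELECT genre FROM genres
    WHERE show_id = (
        SELECT id FROM shows WHERE title = ?
    )
""", ("Catweasel",)).fetchall()

for (genre,) in genres:
    print(genre)
```

The `=` works because the inner query returns exactly one ID. SQLite quietly uses only the first row if such a scalar subquery returns several, so `IN` is the safer choice when more than one ID is possible.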
Well, how else might we
go about doing this? Well, let me
propose that we join these two tables
and ask the question in a slightly
different way. So, here's an excerpt
from the shows table. Here's an excerpt
from the genres table. And clearly we
could do something like we did before
for ratings, where we could line these
two up and kind of join them together.
Just for the sake of discussion, let me
flip these columns around, though that
has no technical significance. And now
we can clearly see 63881 appears there
and here. The difference, though, because
now this is a one-to-many relationship,
is that it's not quite as simple as just
joining the rows together. I need to
kind of join it here and here and here.
And the database can do this for you,
albeit at some cost in redundancy. So
what I'm going to observe is that these
IDs are all the same: primary key in
this context, foreign key in this
context. Well, I'm going to start to
join them together here, but it's not
possible to return a temporary table
that's just outright missing data. You
have to have the same number of
columns in every row of the grid. So what
the database is going to do, if I join
these two tables together and they are
participating in a one-to-many
relationship with each other, is
duplicate the data that's necessary
to make every row look the same.
The downside is it might indeed be taking up
some additional space, unless the
database is smart and somehow using
pointers or something like that
underneath the hood to avoid the
redundancy. But for my purposes, this is
actually quite nice, because if I iterate
over these rows, as I could in Python,
as we'll eventually see, it's just nice
to have all the data you care about in
each and every row, even though it's
clearly redundant. But the data is not
being stored redundantly in the database.
It's just temporarily being presented to
me with this redundancy. So, what
do I really want to have happen? Well, I
really care about actually joining these
two tables together and ultimately just
getting back the title and the genre
respectively. So, let me go ahead, in my
VS Code here, and do select title and
genre from the shows table, but let's
join it this time with the genres table on
shows.id equaling genres.show_id. So
that's quite the same as with ratings.
Then, where the ID equals, just for time's
sake, 63881, which I know is Catweasel, but
I could certainly use a nested query if
I wanted to do this as before. Enter.
And I get back Catweasel's three genres.
And if I were to loop over this data in
some kind of Python code, I would
have access to the title and genre with
each iteration, which I claim is useful.
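Here is a sketch of the one-to-many join and of looping over its rows in Python, using a hypothetical miniature copy of the shows and genres tables in which only the Catweasel rows reflect this lecture. It also checks that the join and a nested query agree on the genres.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE shows (id INTEGER, title TEXT NOT NULL, PRIMARY KEY(id));
    CREATE TABLE genres (show_id INTEGER NOT NULL, genre TEXT NOT NULL);
    INSERT INTO shows (id, title) VALUES (63881, 'Catweasel'), (1, 'Show A');
    INSERT INTO genres (show_id, genre) VALUES
        (63881, 'Adventure'), (63881, 'Comedy'), (63881, 'Family'), (1, 'Drama');
""")

# JOIN repeats the show's title once per matching genre row,
# so every row in the result carries both pieces of data.
rows = db.execute("""
    SELECT title, genre
    FROM shows
    JOIN genres ON shows.id = genres.show_id
    WHERE shows.id = ?
""", (63881,)).fetchall()

for title, genre in rows:
    print(title, genre)

# The nested-query version returns the same genres, just without the titles.
nested = db.execute("""
    SELECT genre FROM genres
    WHERE show_id = (SELECT id FROM shows WHERE title = 'Catweasel')
""").fetchall()

print(sorted(g for _, g in rows) == sorted(g for (g,) in nested))  # True
```

The repeated title in each row is the temporary redundancy described above; nothing is stored twice in the tables themselves.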
But if I don't care about that and I
just really want to select the genres, I
can do this with joins too. Let me just
select the genre from shows, joining it
with genres on shows.id equaling genres.show_id,
where the ID is Catweasel's, 63881.
And now I get back just that answer. So
in short, what have we just seen? One,
you can join two tables together and
whittle down the temporary table to just
the data you care about. Or if you
prefer, and if I scroll back up in my
history here, you could take a
fundamentally different approach and
still get the same answer by simply
using a nested query. I would say, as you
learn SQL for the first time, I think
it's quite often easier to just do
multiple nested queries, because you sort
of work your way from the inside out,
taking baby steps toward a solution to the
problem. If the problem in question is
give me all of the genres for a specific
TV show, well, because I know how the data is laid out
in the database, first I need to know the
unique ID of the show I care about.
Fine, that's pretty straightforward, and
hence this inner query. Once you have
that, you can parenthesize it, and on the
outside, you can now ask the question
to which you really want the answer,
which is what is the genre that lines up
with that show ID, one or more times. So
in short, nested queries are probably easier,
certainly when learning SQL for the
first time, but quite powerful are these
join queries, which achieve the
exact same result, especially if I were
to generalize away the 63881 and do a
nested query here. Sometimes you want a
join; sometimes nested queries suffice.
>> How does SQL do all these searches?
>> Oh my goodness. How does SQL do all of
these searches? What's its time
complexity? We'll talk about that toward
the end of today. In the most naive
implementation, the database is essentially just
doing linear search from the top of the
table all the way to the bottom.
However, we as the programmers are going
to have the ability to optimize those
queries so that the database can
actually do something closer to binary
search, and in general we'll be able to
achieve much better performance as a
result. A really good question. All
right, let's go back to the big
diagram of this data set. We've looked
now at shows and ratings. We've looked
at shows and genres. Let's now focus on
the juiciest part, the part that
associates shows with people, that is,
who stars in what. Thinking back now to
what I was mocking up in the Google
Sheet at the very start, whereby I wanted
to somehow be able to associate The
Office with Steve Carell and John
Krasinski and Jenna Fischer and so forth,
the right way, I claim,
is going to be like this. Here's my
people table, which has a primary key of
ID and then the name of each person and
their birth year, if known. Then we have
the shows table, which we keep talking
about, which again has a primary key, a
title, and the year and episodes thereof. And
then the stars table is somewhat new,
because now, when it comes to people
starring in TV shows, we have a third and
final type of relationship: a many-to-many
relationship. Why? Because it's
certainly the case that one person can
be in multiple shows. And it's certainly
the case that some shows have multiple
people, hence many-to-many. So this is
the third and final relationship, where,
just to recap, ratings was one-to-one, genres
was one-to-many, and now stars is going
to be many-to-many. All right, let's dive
in. So these queries will be a bit more
verbose, but again they're going to
follow this principle of taking
baby steps to the answer we care about.
Let me go back into VS Code here and
suppose I want to find out everything
we know about The Office. So,
select star from shows where title
equals quote unquote The Office
semicolon. Well, that's interesting.
There's a whole bunch of Offices. There
was the UK version. There are a few other
variants, but the one we're probably
talking about with these stars is the
one that started in 2005 with 188
episodes. That's the US version, in fact.
So, let me be a little more precise. Let
me select everything I know from the
shows table where the
title equals The Office and year equals
2005, so we don't confuse our answers
with the other versions of The Office.
Now, how do I go about selecting all of
the people who starred in that version
of The Office? Well, I already have an
answer to the question of what is the ID
of that version of The Office because
it's right there in front of me. And in
fact, I can narrow my query more
precisely. Let's just select the ID from
the shows table where the title is The
Office and the year is 2005: 386676.
Now, I could lazily just copy-paste that
or memorize it, but we're going to do
this query more dynamically. Next,
though, I want to figure out who is in that
show. So, if I have a show ID, I want to
figure out who's in it. But how do I get
to the people and the names of those
people? I have to logically go through
this cross-referencing stars
table. So, here's where this query is
going to be a bit meatier than the past
ones, in that we need to do a bit more
work than before. All right. Well,
what's the work I need to do? Let me go
ahead now and do the following. Select
all of the person IDs that are
associated with this show ID. So, how do
I do that? Select person ID from the
stars table where the show ID equals, and
I could lazily copy-paste this, but
let's avoid that. Where the show ID
equals, let me now, in parentheses, do
this: select ID from shows where title
equals quote unquote The Office and year
equals 2005, and then close my
parenthesis, semicolon. So what am I
doing? I'm taking a second baby step, if
you will. The innermost query inside the
parentheses is just, again, dynamically
figuring out the unique ID of The Office
I care about. The outer query is now
figuring out all of the person IDs
associated with that show, as per the
stars table. And the stars table has
only two columns: show ID and person ID.
That's how the linkage is done, just with
those integers. Enter. I now have a
column of person IDs for the people starring
in that version of The Office. So how do
I take this one final step if I really
care about their names and not
their random person IDs? Well, I could
go ahead and select the name from the
people table where that person's ID is
in the following set. So when I'm
dealing with a single value, I just use
equals for equality. But when I'm
dealing with a whole result set, a whole
column of answers, I use the keyword
IN in SQL instead. So where the person's
ID is in the following data set. Well,
let's do the same query as before.
Select all the person IDs from the stars
table where the show ID I care about
equals, because there's only one show I
care about. I'm going to further
parenthesize this. Select ID from shows
where title equals quote unquote The
Office and year equals 2005.
Enter. I'll close my parenthesis.
Enter. I'll close my parenthesis.
Semicolon. And now, from the inside out,
I've taken three baby steps. The
innermost one just gets me the show ID.
The second one, in the middle, gets me all
of the related person IDs. And the last
one is really the final flourish: get me
all of the names of these people based
on those IDs. Enter. And now we see all
of the stars in this show, beyond even
the subset that we've been playing with
visually on the screen.
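The three nested steps above can be sketched with Python's sqlite3 module. The IDs (1, 2, and 10 through 13) are made up and the tables are a tiny hand-built sample, not IMDb's data; the 2001 row stands in for the UK version of The Office.

```python
import sqlite3

db = sqlite3.connect(":memory:")
# Made-up IDs and a tiny sample; the real IMDb tables are far larger.
db.executescript("""
    CREATE TABLE shows (id INTEGER, title TEXT NOT NULL, year NUMERIC, PRIMARY KEY(id));
    CREATE TABLE people (id INTEGER, name TEXT NOT NULL, PRIMARY KEY(id));
    CREATE TABLE stars (
        show_id INTEGER NOT NULL,
        person_id INTEGER NOT NULL,
        FOREIGN KEY(show_id) REFERENCES shows(id),
        FOREIGN KEY(person_id) REFERENCES people(id)
    );
    INSERT INTO shows VALUES (1, 'The Office', 2005), (2, 'The Office', 2001);
    INSERT INTO people VALUES
        (10, 'Steve Carell'), (11, 'John Krasinski'),
        (12, 'Jenna Fischer'), (13, 'Ricky Gervais');
    INSERT INTO stars VALUES (1, 10), (1, 11), (1, 12), (2, 13);
""")

# Innermost: the show's ID. Middle: that show's person IDs. Outermost: names.
names = db.execute("""
    SELECT name FROM people
    WHERE id IN (
        SELECT person_id FROM stars
        WHERE show_id = (
            SELECT id FROM shows
            WHERE title = 'The Office' AND year = 2005
        )
    )
""").fetchall()

for (name,) in names:
    print(name)
```

Filtering on year as well as title is what keeps the 2001 row's star out of the result.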
Okay, that's a lot. Let me pause here
and see if there are any questions. Yeah?
>> This outermost query is what gives me
the names. But that query needs to know
the IDs of the people whose names
you want. So the middle query actually
gets all of those person IDs. But to get
those person IDs, I need to know the
show ID. So the innermost query, this
one, gets me the show ID of The Office
itself.
All right. So, at the risk of
overwhelming you, there are other ways you
can solve the same problem. But I do
claim that nested selects are
probably conceptually and pragmatically
the easiest way. But let's also solve
this problem by doing a few joins, just
so you've seen it. Actually, before we
do a join, let's flip the
question around first. How about all of
the shows that Steve Carell has starred
in besides The Office? So, let me select
everything I know from the people table
where the name of the person equals
quote unquote Steve Carell, semicolon.
All right, there seems to be only one
Steve Carell in IMDb, born in 1962.
That's all nice and good. What I really
care about is his ID. So, I'm going to
narrow this down to selecting just
his ID. Now, I could memorize or copy-paste
136797, but I don't need to do that.
Let's just use this as part of a nested
query. Let's now select all of the show
IDs from the stars table that are
related to Steve Carell's person
ID. So where person ID equals, and I
could copy-paste this, but that's
generally frowned upon. So let's not do
that. Let's just set it equal to a
nested query where I do the same thing
as before: select ID from people where
name equals Steve Carell. Then close my
parenthesis, semicolon. All right. He's
been in a lot of TV shows, but this is
not useful because I have no idea what
all of these integers are. So, the final
flourish: select the title from the
shows table where the ID of the show I
care about is in this
parenthetical list. Well, what's that
parenthetical list? Well, select the
show ID from stars where the person ID
equals Steve Carell's. What is his ID?
Well, I didn't memorize it. So, I'm
going to select ID from people where the
name of the person I care about is Steve
Carell, quote unquote. Close this
parenthesis. Close this
parenthesis. Semicolon. Enter. And now I
see all of Steve Carell's shows. And even
though we're doing this in a
black-and-white command-line environment, think
about what the actual IMDb is doing with
both of these queries. If you go to
IMDb.com and search for Steve Carell,
even though there's going to be a lot of
colors and pretty pictures and whatnot,
you'll probably get, in some form, a list
of all of Steve Carell's shows. Or if you
search for The Office, you'll get a list,
in some form, of all of the stars
therein. I would claim, then, that if IMDb.com
is using SQL, which is likely,
though I don't know for sure, they are executing
queries just like we did. And when you
type into the search box something like
The Office or Steve Carell, they're
essentially just copy-pasting your user
input into a prefabbed SQL query that
they wrote in advance so as to get you
the answers that you actually care
about. So this is how a lot of today's
websites and mobile apps are actually
working. The programmer comes up with
sort of the template for the queries you
might ask and then you supply the actual
data you're searching for. All right,
how about now, as promised, a couple of
other ways to implement these
many-to-many queries, this time
by using joins. If I know I need to
involve the shows table, the people
table, and the stars table, I can
actually do this all in one breath
without any nested queries. Select for
me the title from the shows table, but
let's join that with the stars table on
shows.id equaling stars.show_id.
And let's additionally join the result
with people on stars.person_id
equaling people.id. In other words,
if you know conceptually that you've got
these three tables and you want to somehow
combine them without using nested
selects, just figure out how to line
them all up. So again, I'm selecting
from the shows table, but I'm joining it
with the stars table by lining up the
shows table's primary key with the stars
table's foreign key. And I'm lining it up
with the people table by lining up the
stars table's other foreign key with the people
table's primary key. I'm just kind of
logically connecting all of the things I
know to be related. And lastly, let's
just say where the name I care about
equals quote unquote Steve
Carell, semicolon. It's a little slow
for now, and this speaks to the question
that was asked earlier: how is the
database doing this? Well, slowly,
apparently, by default, unless we
optimize it. I got back essentially the
same results, although there is some
duplication, which relates to the
filling in of blanks with duplicate data
that I described earlier. But let me
show you one other technique too. But
again, I would certainly encourage you,
for problem set seven, to focus on nested
queries when you can, because they're a
little conceptually simpler. If I care
about the titles of those shows, I could
select title from the shows table,
the stars table, and the people table all
at once in one breath. But I want to do
so where the shows table's primary key
equals the stars table's foreign key,
and the people table's primary key equals
the stars table's other foreign key, and the
name I care about is Steve Carell. In
other words, this is just a third way to
express the exact same idea, by doing
implicit joins: selecting data
from all three tables, as per this
comma-separated list of table names, but
telling the database with your
predicate, the WHERE clause, how you want
to line all of those tables up. If I hit
Enter here and cross my fingers, I should
get back the same results as well,
albeit with duplication, which I didn't
see in the nested queries. Okay, that
too was a mouthful. Let me pause here
for questions.
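For comparison, here is a minimal sketch of the three approaches above (a nested query, explicit JOINs, and an implicit join in the WHERE clause) using Python's built-in sqlite3 module instead of the sqlite3 command-line program. The table and column names follow the lecture's shows/people/stars schema, but the rows are a tiny made-up sample, not the real IMDb data.

```python
import sqlite3

# Tiny, made-up stand-in for the lecture's shows/people/stars schema.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE shows (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE stars (show_id INTEGER, person_id INTEGER);
    INSERT INTO shows VALUES (1, 'The Office'), (2, 'Show B'), (3, 'Show C');
    INSERT INTO people VALUES (10, 'Steve Carell'), (11, 'Someone Else');
    INSERT INTO stars VALUES (1, 10), (1, 11), (2, 11), (3, 10);
""")

# 1. Nested queries: find the person's id, then their show ids, then titles.
nested = db.execute("""
    SELECT title FROM shows WHERE id IN
        (SELECT show_id FROM stars WHERE person_id IN
            (SELECT id FROM people WHERE name = 'Steve Carell'))
""").fetchall()

# 2. Explicit joins: line up primary keys with foreign keys via JOIN ... ON.
explicit = db.execute("""
    SELECT title FROM shows
    JOIN stars ON shows.id = stars.show_id
    JOIN people ON stars.person_id = people.id
    WHERE name = 'Steve Carell'
""").fetchall()

# 3. Implicit joins: list all three tables, line them up in the WHERE clause.
implicit = db.execute("""
    SELECT title FROM shows, stars, people
    WHERE shows.id = stars.show_id
    AND people.id = stars.person_id
    AND name = 'Steve Carell'
""").fetchall()

# Without ORDER BY, row order isn't guaranteed, so compare sorted results.
print(sorted(nested) == sorted(explicit) == sorted(implicit))  # True
```

With real data, the two join versions can return the same title more than once whenever several stars/people rows match the same show; SELECT DISTINCT title would collapse those duplicates.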
Yeah?
>> [inaudible] to do that?
>> Correct. In order to do this, you as the
programmer must know the internal
structure of the database, which is
quite often the case, whether you
created the database yourself or you
work with a colleague who designed the
schema for the database. That said, I
think your question is hinting at
the challenge: I really need to
know the underlying implementation
details, when really all I care about is
the answers to my questions. In code
nowadays, quite often there are
object-relational mappings, or
ORMs for short, whereby you can
use libraries that understand the
underlying database schema. You as the
programmer do not need to, because the library
figures out how to do all of the joins
for you. So in CS50 we're introducing
everyone to a bottom-up understanding
of how these joins work, but that too
can be easily automated because of those
schemas. Yeah?
>> I just noticed that when you're
typing, you indent. Is indentation
important in SQL?
>> Good question. Is indentation in SQL
important? Technically, no. But like with
any of the languages we've talked about
thus far, it is good for the humans, and
certainly good for the students in a
context like this. Of the languages we've
looked at, Python is the most
rigorous, whereby indentation, and the
consistency thereof, very much matters. In SQL,
I'm just trying to pretty-print things
to make them easy to grok visually. All
right. So those last two queries were
arguably kind of slow, whereas with my
nested queries, I actually got lucky and,
boom, I got the answer quite
quickly. Those joins seem to be a step
backwards, in that it was taking more
time to get back the same data that I
actually cared about. But that's
something we can actually chip away at.
It turns out that one of the other
values of a relational database vis-à-vis
something like a spreadsheet is that you
can actually tell the database in
advance how to optimize for certain
queries. This is not the case for
spreadsheets. If you have a lot of data
in Google Sheets or Microsoft
Excel or Apple Numbers, tens of
thousands of rows, hundreds of thousands
of rows, millions of rows, your
computer's going to slow to a crawl. And
at some point, those software packages
are just going to say, "Sorry, file is
too big." And they're certainly not
going to be terribly fast at searching
the data. But with a SQL database, and
relational databases more generally, you
are as much the architect of it as you
are the user of it. And so
you can tell the database in advance if
you want to optimize for certain queries,
like SELECT statements. So for instance,
let me go back to VS Code here and, just
for the sake of discussion, let's time
how long it takes to find all of the
shows whose title is The Office. I'm
going to use a SQLite command called
.timer, and I'm going to set it to on.
This is now going to tell me,
for every command I run, how long it
took. I'm going to now select everything
from the shows table where the title of
the show equals quote unquote The Office,
close quote, semicolon, Enter. And that
query took, in "real" time, 0.042
seconds. That's crazy fast. It's
well under a second; it's truly a
split second. So no big deal. But it's a
fairly simple query. But I bet we could
optimize even this. Now, why would you
want to optimize queries that are
already pretty fast? Well, because they might be
very commonly executed. And I dare
say someone going to imdb.com and
searching for The Office or any TV show
is the common case: people are
looking for TV shows, movies, actors,
and so forth. So it'd be nice to use as
little time as possible to answer those
questions. Why? One, it
makes for happier customers and users,
because you're getting them the answer
faster. Two, it saves you money, because
presumably if you've spent $1,000 on a
server and that server has a certain
amount of RAM and a certain speed of CPU, or
brain, it can only do so many searches
per unit of time, per second, per
minute, or the like. So wouldn't it be
nice if all of those searches were faster,
using less time? Then you can handle not
a thousand users at once, but 2,000
users or 5,000 users, all with the same
hardware. So there are certainly
upsides there. Well, how can I go about
optimizing a query? Well, I can create
my own index. This is another use of the CREATE
keyword in SQL, whereby I can tell the
database to optimize for searches on a
specific table and specific columns
therein. I say CREATE INDEX, then I
come up with a name for the index,
whatever I want, then ON, the name of the table
that I want to index, and then, in
parentheses, the columns that I want to
optimize for. So what does this mean in
real terms? Well, let's go back to VS
Code here and let me create an index
called, for instance, title_index, though
the name doesn't matter, on the shows
table, using the title column. In
other words, tell the database, please
expedite searches on the shows table's
title column. After all, that's what I
just searched on. Enter. Now, that took
a moment, almost half a second, but
that's an index that
only has to be created once. If I do a
lot of inserts, updates, and deletes, it might
actually take a little bit of time
over the course of using the database to
maintain that index. But for now,
creating the index is a one-time operation.
But watch what happens now if I
scroll up in my history and go to the
exact same query as before, which
previously took 0.042
seconds, which, yes, is fast, but not
nearly as fast as the new version, which
takes 0.001
seconds instead, more than an order of magnitude
faster. So I can handle 42 times as
many users on the same database, so to
speak, as I could have previously, just
by building this index. So what actually
is an index? Well, we come full circle
to discussions from week five of
the class. An index in a database is
very often created using what's called a
B-tree. This is not a binary tree. A B-tree
is its own distinct structure
that's very similar in spirit, in that
it's fairly shallow, because most of the
nodes have children, but each node doesn't
necessarily have just two children. It might
have many more. And in fact, the
more children the nodes have, the
higher up you can pull all of the
leaf nodes and the shorter you can make
the height of the tree. So this is just
a generic representation of a B-tree.
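To put rough numbers on "more children means a shorter tree," here is a back-of-the-envelope sketch. It assumes an idealized, completely full tree in which every node has exactly `fanout` children and each leaf holds one key. Real B-trees pack many keys into each node and are not always full, so treat the numbers as illustrative only.

```python
def levels_needed(n_keys: int, fanout: int) -> int:
    """Smallest height h with fanout ** h >= n_keys, i.e. the number of
    levels an idealized full tree needs to reach n_keys leaves."""
    height, reachable = 0, 1
    while reachable < n_keys:
        reachable *= fanout  # each extra level multiplies the reachable leaves
        height += 1
    return height


for fanout in (2, 10, 100):
    print(fanout, levels_needed(1_000_000, fanout))
# 2 20
# 10 6
# 100 3
```

Under these assumptions, finding one title among a million means walking about 20 levels with two children per node but only about 3 with a hundred, versus up to a million rows for a linear scan.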
But what this implies is that when I am
now searching for titles like The
Office, the database doesn't have to fall back on
the default behavior, which is to start at
the top and use linear search all the
way to the bottom. Because it has proactively
built up an index, thanks to my
command, it now has a tree-like
structure storing those titles that
allows it to find the same data in logarithmic
time, whether it's log base 2 or some
other base, much more
quickly. And that's how we went from 0.042
seconds to 0.001
seconds in this case here.
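Here is a minimal sketch of the same experiment in Python's sqlite3 module, assuming a made-up shows table padded with filler rows. Dot-commands like .timer only exist in the sqlite3 command-line program, so this times queries with time.perf_counter instead and uses SQLite's EXPLAIN QUERY PLAN to check whether the index is actually used. Timings vary by machine, so the code prints them rather than promising specific numbers.

```python
import sqlite3
import time

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE shows (id INTEGER PRIMARY KEY, title TEXT)")
# Made-up filler rows, so that a linear scan has real work to do.
db.executemany("INSERT INTO shows (title) VALUES (?)",
               ((f"Show {i}",) for i in range(100_000)))
db.execute("INSERT INTO shows (title) VALUES ('The Office')")

QUERY = "SELECT * FROM shows WHERE title = ?"


def plan(sql, *args):
    # Each EXPLAIN QUERY PLAN row ends with a human-readable detail string.
    return [row[-1] for row in db.execute("EXPLAIN QUERY PLAN " + sql, args)]


def timed(sql, *args):
    start = time.perf_counter()
    rows = db.execute(sql, args).fetchall()
    return rows, time.perf_counter() - start


plan_before = plan(QUERY, "The Office")       # a SCAN: linear search
rows_before, seconds_before = timed(QUERY, "The Office")

db.execute("CREATE INDEX title_index ON shows (title)")

plan_after = plan(QUERY, "The Office")        # a SEARCH using title_index
rows_after, seconds_after = timed(QUERY, "The Office")

print(plan_before, f"{seconds_before:.6f}s")
print(plan_after, f"{seconds_after:.6f}s")
```

The exact wording of the plan text differs slightly across SQLite versions, but the switch from a SCAN of the table to a SEARCH that names the index is the thing to look for.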
Questions then on these here indexes?
No. All right. Well, let's propose that
we can combine some of today's ideas. It
turns out that now we're getting to the
point in the course where you're not
just choosing between this language and
another. You're generally using a suite
of languages to solve problems. And
indeed, in the coming weeks of the
class, when we transition to web-based
applications, you're going to use a bit
of Python, a bit of
SQL, a bit of
JavaScript, and two other languages
called HTML and CSS. You might be using
five different languages at a time
just to build one application. Why?
Because some of them are better for the
job than others. And indeed, that's the
ecosystem in which real-world software
development is done. Well, to bridge
these worlds, we have a version of the CS50
library, recall, for Python, which has
functions like get_string, even though
that's not that useful because it's just
like the input function, but also get_int
and get_float. But also, in the CS50
library for Python, we have a module
that specifically makes it easier to use
SQL from Python code. After all,
wouldn't it be nice if I could get the
best of both worlds and implement, say,
an interactive program in Python
that uses SQL to actually get back data?
Or I could build a website that allows
people to search for TV shows or TV
stars and actually get that data from a
database, but use Python to generate the
web pages themselves. Well, we have some
documentation for this library here, but
I'm going to go ahead and use it in real
time to show you how much more easily
you can solve certain problems by using
each tool for what it's good at. So
let's go back to VS Code here. Let me
exit out of SQLite and get back to my
normal terminal. And let me go ahead and,
let's say, minimize
my terminal here.
Actually, let's go ahead and open up
favorites.py, which is where we left off
before. Recall that in the last
version of favorites.py, we had simply
used a dictionary to keep
track of how many of you said Python or
C or Scratch. And when I last ran this
program with python favorites.py,
the answer looked like this. Now notice
that it's not sorted alphabetically,
otherwise C would be first. And it's
also not sorted numerically, otherwise C
would be second. So it would be nice in
Python to exercise some control
over this. But I stopped doing
that before because it gets very
annoying quickly. And by this I mean the
following. Let me go back into VS Code
here and into favorites.py. If I
wanted to sort the counts here, I
could do this. I could change my
loop from for favorite in
counts to for favorite in sorted(counts). So
this is actually not too bad thus far. I
can sort dictionaries pretty
readily. Now let me make my terminal
a little bit taller so we can see both results.
If I run the
program now, you'll see that it's sorted
alphabetically by key. So apparently
when you use the sorted function in
Python and pass it a dictionary, you get back
the dictionary's keys in sorted order, and you can
still use each key to look up its value
in that dictionary. So that's nice if
that's my goal, but maybe that's
not really my goal. Here's how,
alternatively, I could sort by value: the
190, the 58, and the 24. I can still use
the sorted function, but I need to tell
Python to use a key, a sorting key, namely
the counts dictionary's get function.
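A minimal sketch of the sorting options described here, using a hypothetical counts dictionary filled in with the lecture's totals (190, 58, and 24) rather than one built from favorites.csv. Calling sorted on a dictionary returns its keys; key=counts.get sorts those keys by their values, and reverse=True, which comes up next in the lecture, flips the order.

```python
# Hypothetical stand-in for the dictionary favorites.py builds from the CSV.
counts = {"Scratch": 24, "C": 58, "Python": 190}

by_key = sorted(counts)                    # ['C', 'Python', 'Scratch']
by_value = sorted(counts, key=counts.get)  # ['Scratch', 'C', 'Python']
by_value_desc = sorted(counts, key=counts.get, reverse=True)

for favorite in by_value_desc:
    print(favorite, counts[favorite])
# Python 190
# C 58
# Scratch 24
```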
And then if I run it again, I now
see it's sorted by value. But darn it,
it's now sorted in the opposite order. I
see Scratch at 24, then 58, then 190. If
I want to reverse it, well, then I have
to go up here and add another named
parameter, reverse=True. I can
run it another time, and now I get the
result I care about. Long story short,
it's just very annoying to have to
use that amount of code to
answer relatively simple questions. And
this is why we transitioned for much
of today to a declarative language like
SQL that just lets me select what I care
about in that data. So let me again go
back into my database version with
sqlite3 favorites.db. I'll maximize
my terminal window. What did we do
before? Well, we can select from the
database,
let's see, favorite, comma,
COUNT(*) FROM favorites GROUP BY
favorite, semicolon. Whoops.
Oh,
sorry. What did we do? We did SELECT
language, COUNT(*) FROM
favorites GROUP BY favorite. Oh, damn
it. What happened? Oh, we deleted it.
See, this is why you don't use the
DELETE or DROP command. So I'm not
going to demonstrate this again, but
recall from before the break that when we last
selected this information, we used
GROUP BY to group by
the language in question, and we got back
all the counts. But then we were very
easily able to reorder things by
just using ORDER BY and then
sorting in ascending order or,
for instance, descending order instead.
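Here is a minimal sketch of that GROUP BY / ORDER BY query, run from Python's sqlite3 module on a tiny made-up sample shaped like favorites.csv (a language column and a problem column; the real file has more rows and may have more columns). Python's csv module stands in for the sqlite3 program's .mode csv and .import commands.

```python
import csv
import io
import sqlite3

# Made-up sample in the shape of favorites.csv.
CSV_TEXT = """language,problem
Python,"hello, world"
C,Mario
Python,Credit
Scratch,Mario
Python,"hello, it's me"
C,Credit
"""

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE favorites (language TEXT, problem TEXT)")
# Roughly what .mode csv plus .import do: one INSERT per CSV row.
reader = csv.DictReader(io.StringIO(CSV_TEXT))
db.executemany("INSERT INTO favorites VALUES (:language, :problem)", reader)

results = db.execute("""
    SELECT language, COUNT(*) AS n
    FROM favorites
    GROUP BY language
    ORDER BY n DESC
""").fetchall()
print(results)  # [('Python', 3), ('C', 2), ('Scratch', 1)]
```

The AS n alias lets ORDER BY refer to the count by name instead of repeating COUNT(*).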
Well, now let's actually combine these
worlds of Python and SQL to
first write a program that does just
that. But to do this, we're going to
need to restore that database. So let's
go ahead and do this. Let's remove
favorites.db, which is just a file in
my account. Let's run
sqlite3 favorites.db to create a
new version thereof. Let's now
change the mode, as we did earlier in
class, to CSV with .mode csv. Let's now .import
favorites.csv into a table called
favorites. And now let's do .quit. And
when I do ls, okay, favorites.db is back,
in addition to today's
other files. Now let me go ahead and run
sqlite3 favorites.db and, just as a
sanity check, SELECT * FROM favorites,
semicolon. There's all of the data back,
minus the additions and deletions that
we ourselves made earlier manually. And
now, in SQL, let's
do SELECT language,
COUNT(*) FROM favorites
GROUP BY language,
but let's ORDER BY COUNT(*) in
descending order. That's one of the
last commands we ran with this file, and
there is the answer in a single line of
code instead of some 17 lines of code,
plus or minus some whitespace. Can
we now merge these two ideas? Well,
let's see how to do this. Let's go back
into favorites.py here and make a new
and improved version of it that
uses SQL and no dictionary, no for loop,
no try/except, or any of this. Instead,
let's go ahead and, from CS50's own
library, import a SQL function, which will
give me access to this functionality.
Let's create a variable called db, by
convention, though I could call it anything
I want, and set it equal to CS50's SQL
function, passing to that function
the path to the database file I want to
open. This is a little weird, but the
syntax here is sqlite, without the 3, then
colon, slash, slash, slash,
favorites.db, that is, sqlite:///favorites.db.
This syntax, otherwise known as a
URI, is going to allow us to use the SQLite
protocol in order to
open up favorites.db, which is the very
file I was just experimenting with
manually in my terminal. Here now is how
I can execute a SQL query in Python
using CS50's library. Now, as an aside,
even though this is indeed meant to be a
training wheel, CS50's library is just
easier to use than a lot of the
real-world libraries that make this
possible. So because we spend
relatively little time on this, we're
still using this training wheel
here. Give me a variable called rows,
because I want to get back all of the
rows from this table that contain those
languages, and do db.execute.
The only function that's useful in the
CS50 library for SQL is this execute
function, which allows me to write
literally a line of SQL, like SELECT
language, COUNT(*) FROM favorites
GROUP BY language ORDER BY COUNT(*)
DESC. Just to make my life
easier, I'm going to add that alias
trick that we saw before, AS n, to
rename the count to n. And
then here I can just do ORDER BY n
instead. It's a little long, but notice
that now I'm using SQL as a string that
I'm passing as an argument to this
db.execute
function. So at the very end of this,
I've got to close my quote and close my
parenthesis, so as to use one language,
in effect, inside of another. Now, assuming I
do get back a temporary table's rows with
that line of code on line five, let's do
this. For each row in rows, go ahead and
do the following. Create a variable
called language and set it equal to row,
bracket, quote unquote language. Then create
another variable called n, for instance,
and set it equal to row, bracket, quote unquote n.
And then let's just go ahead and print
out language and n, respectively. So what
does CS50's library do? By design, it returns
a list of rows. Each of those
rows is a dictionary of key-value pairs.
So when I do for row in rows, this is
just iterating over a list of values,
and we've done that over the past couple
of weeks. Inside of this loop, I'm just
temporarily creating two variables,
language and n, to show you that each
row is indeed a dictionary, which means
I can index into it using strings like
quote unquote language and quote unquote
n, because those are the columns that I
selected using this query up above.
Strictly speaking, I don't even need
these variables. I can just get rid of
them and, a little more succinctly,
pass in row["language"] and then
row["n"] instead. So let me go
down to my terminal window here, exit
out of SQLite, run python
favorites.py in this form, Enter, and I
get back, it would seem,
the same exact answer: 190, 58, and 24 in
this case. Questions now on this
co-mingling
of languages?
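In the lecture it is CS50's library that hands back a list of dictionaries. As a rough analogue using only the standard library, the sketch below defines a hypothetical execute helper (not CS50's actual implementation) that runs a SELECT through Python's sqlite3 module and turns each row into a dictionary keyed by column name, so the AS n alias becomes the key "n". The rows are made up.

```python
import sqlite3


def execute(db: sqlite3.Connection, sql: str, *args) -> list[dict]:
    """Hypothetical helper, loosely mimicking CS50's db.execute for SELECTs:
    bind any ? placeholders to args and return one dict per row."""
    cursor = db.execute(sql, args)
    columns = [column[0] for column in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


# Made-up rows standing in for favorites.db.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE favorites (language TEXT, problem TEXT)")
db.executemany("INSERT INTO favorites VALUES (?, ?)", [
    ("Python", "Mario"), ("C", "Credit"), ("Python", "Credit"),
    ("Scratch", "Mario"), ("Python", "hello, world"), ("C", "Mario"),
])

rows = execute(db, "SELECT language, COUNT(*) AS n FROM favorites "
                   "GROUP BY language ORDER BY n DESC")
for row in rows:
    print(row["language"], row["n"])
# Python 3
# C 2
# Scratch 1
```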
All right, how about one final thing?
Once we have the ability to use
Python, we can in fact make things
interactive. So for instance, let me
close my terminal temporarily. Let me go
ahead and now ask for some user input.
So after opening the database, let's do
this. Let's ask the human, using Python's
input function or, equivalently, CS50's
get_string function, for their favorite
TV show and store it in a variable
called favorite. Then let's do a SQL query that
selects that data: rows equals
db.execute of
SELECT, and let's see how many people
selected this favorite problem, rather,
not TV show. How about favorite problem
from our favorites data set? So SELECT
COUNT(*) AS n FROM the favorites
table WHERE the problem in question
equals, well, now I need to put the user's
input. I don't know what that is yet,
because they haven't typed it in yet.
So what I'm going to do is use
a placeholder, favorite in curly braces, close
quote, and make this whole thing an
f-string. Then I'm going to go down here,
and I don't need to iterate, because
ideally I'm just getting back a single
answer: how many people chose this
problem as their favorite? So I'm going
to say that the row I care about is
simply the first row. Rows is a
list, so rows bracket zero is the first
and only row in that list. And then
let's go ahead and print out row, bracket, quote
unquote n. Let's see the result here and
then see what happens. Let me put some
single quotes here and single quotes
here, around the placeholder. Let me open my terminal. Let me run
python favorites.py,
and I'll say hello, world. Enter. And as
before, at the start of class, 42 of you
like that. However, this is not, not, not
how you should ever write SQL code in
Python. What could go wrong with this
code?
Nothing went wrong a moment ago, but
what could go wrong?
Yeah, the user input. How so?
>> True. I don't know what those are yet,
but we're about to go there. What, even
more simplistically, could go wrong by
plugging in the user's input here? Yeah?
>> Like hello?
>> Exactly. If I inputted the other problem
we played with, "hello, it's me," the
apostrophe in "it's," if interpolated
right here, is clearly going to confuse
the single quotes, such that who knows
what's going to come back. Now, in the
best case, the code might just not work
and I'll get some kind of error on
the screen, which is not great for the
user, because the program is not going to
be useful. There's no user-friendly
error message. But in the worst case,
the user could do something incredibly
malicious if you are simply,
blindly trusting user input and plugging
their input into a SQL query that you
yourself constructed. Why? What if the
user types something crazy like the word
DELETE or DROP or UPDATE, or any of those
destructive commands that we saw earlier,
and somehow tricks your code into
executing maybe the SELECT but then
an additional query, like a
DELETE? Maybe they type in a semicolon
and then DELETE, or a semicolon and then
DROP, or something like that. This is the
biggest threat when taking user input and
trusting it in the context of databases.
And it's called, as one of your
classmates knows already,
a SQL injection attack. A SQL
injection attack is the ability for an
adversary, or an unknowing user, to
somehow inject their own SQL code into your queries, and in turn into your database.
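To make the apostrophe problem concrete, here is a minimal sketch, assuming a made-up favorites table and a hypothetical count_unsafe function that builds its query with an f-string, which is exactly the pattern to avoid. Plain input works, but the apostrophe in "hello, it's me" closes the SQL string early, and SQLite rejects the malformed query.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE favorites (language TEXT, problem TEXT)")
db.executemany("INSERT INTO favorites VALUES (?, ?)", [
    ("Python", "hello, world"), ("C", "hello, world"), ("C", "hello, it's me"),
])


def count_unsafe(favorite: str) -> int:
    # DON'T do this: the user's input is pasted directly into the SQL text.
    sql = f"SELECT COUNT(*) FROM favorites WHERE problem = '{favorite}'"
    return db.execute(sql).fetchone()[0]


print(count_unsafe("hello, world"))  # 2
try:
    # Becomes ... WHERE problem = 'hello, it's me', with the quote closed early.
    count_unsafe("hello, it's me")
except sqlite3.OperationalError as error:
    print("query failed:", error)
```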
A SQL injection attack, then, might look
something like this in the real world.
Here, for instance, is the login
screen for github.com. They do
actually use SQL, among other languages,
underneath the hood, I believe, though not
necessarily for this, but suppose they
did. When logging into github.com,
you're prompted for your username or
email address and then, of course, your
password. Well, what if I know a little
something about SQL, and suppose, for the
sake of discussion, GitHub is using
SQLite, which they're not, because
it's not meant for massive
data sets like this. But suppose
they are. And just to be malicious, I
type in my username, malan@harvard.edu,
but then I add a single quote and then
dash dash. Well, the single quote is
there, me being an adversary in the
story, because maybe I can confuse their
code by closing their quotes sooner than
they intended. And we haven't talked
about this yet, but it turns out that
dash dash in SQL starts a comment. So
it's like hash in Python or slash slash in C.
This in SQL means ignore everything to
the right. That alone can be used fairly
maliciously, as follows. Here, for
instance, could be the code that GitHub
is using underneath the hood, whereby
they might have some Python code, and,
heck, maybe they're using the CS50
library, that executes this pre-made
query: SELECT * FROM the users table
WHERE the username equals this question
mark AND the password equals this
question mark, passing in username and
password, for instance. This code, we'll soon see, is
actually the right way to do it. But if they
were instead trusting the username and password I
typed in and just plugging them right
in, they could indeed be vulnerable to
a SQL injection attack. For
instance, suppose they were doing it with f-strings,
like I started to in my version of
favorites.py. Same thing: SELECT *
FROM users WHERE username equals this
username AND password equals this
password, and the little f here means
here's a format string. What could go
wrong? Well, let me actually paste in
the malan@harvard.edu, single quote, dash
dash text here. Notice that this single
quote and this single quote are meant to
surround the username, and same thing
for the password there. But watch what
happens when I type in my data:
malan@harvard.edu, single quote. So this would
seem to finish the thought prematurely,
and then it says dash dash, and so that
just means ignore everything else. And
so the effect here is essentially to
gray out all of that stuff, because it's
effectively been commented out. So what
GitHub ends up doing, accidentally, in
this case is SELECT * FROM users
WHERE username is malan@harvard.edu,
irrespective of what his password
actually is. And if you assume that down
here they've got some conditional logic
like, well, if we get back some rows, that
means that Malan is in fact a registered
user, so go ahead and log him in (we don't
know what the code looks like, so it's
dot dot dot), you've just enabled anyone
on the internet to log in as me or
anyone else just by suffixing their
input with a single quote and dash dash.
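Here is a minimal, self-contained sketch of that login bypass, assuming a hypothetical users table with a plaintext password column (a simplification for the demo; real systems store password hashes) and a hypothetical login_unsafe function that builds its query with an f-string.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (username TEXT, password TEXT)")
# Plaintext password only to keep the demo short; real systems store hashes.
db.execute("INSERT INTO users VALUES ('malan@harvard.edu', 'correct horse')")


def login_unsafe(username: str, password: str) -> bool:
    # VULNERABLE: user input is interpolated straight into the SQL string.
    sql = (f"SELECT * FROM users WHERE username = '{username}' "
           f"AND password = '{password}'")
    return len(db.execute(sql).fetchall()) > 0


print(login_unsafe("malan@harvard.edu", "wrong guess"))         # False
# The ' closes the username early and -- comments out the password check:
#   SELECT * FROM users WHERE username = 'malan@harvard.edu'--' AND ...
print(login_unsafe("malan@harvard.edu'--", "anything at all"))  # True
```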
And that's the least of our concerns. If
we additionally went in there and, maybe
instead of dash dash, we put a semicolon and
then DELETE FROM users or DROP TABLE users, we
could cause massive havoc on their
database. This happens all the time.
Even now, in the current year, you can
Google around and see examples of
companies that have not properly
sanitized user input. And it's not
just the intern. It's random people
on the internet accessing or
destroying their data maliciously. So
what is the solution to a problem like
this? Well, one, do not use format
strings in Python to simply plug in user
input. But the more important lesson is:
never trust users' input. Either they're
going to do something accidentally or
they're going to do something
maliciously, and you do not want that to
happen. So the solution, then, is to use a
library. Almost always use a library.
This is not a wheel you should reinvent
yourself. And by library, I mean
something like this. If you instead use
a library like CS50's and, rather than
f-strings, you use question marks, as you'll
see in a moment, what will happen is
this. When the user goes and types in
malan@harvard.edu, single quote, dash dash,
that's fine. Let them put weird,
scary characters like single quotes in
their input. The library will take
charge of escaping user input. So
anything dangerous in their input will
be changed, for example, from one single quote to two,
because we saw earlier today that that's
how you escape that character. And that
means that now what you have, in
effect, is that my username is apparently
malan@harvard.edu,
apostrophe, dash dash, and that's my username.
Well, that's obviously not a real email
address. It's not a real username. This
is just going to return false. No rows
are actually going to come back. And the
way to do this now in our favorites
example, analogously, is in VS Code here
to go up into this
execute line. Don't use an f-string. Change the
value we compare problem against to be a question-mark placeholder
instead, and then pass into this execute
function one or more arguments that will
be substituted in for that question
mark. And this is not a CS50 thing. This
is an industry convention whereby you
quite often use literally a question
mark. That means that whatever this
variable's value is will get plugged
into that question mark for you, and the
single quotes will be added. Any
dangerous characters will be escaped for
you. And at that point, you can trust
that the user can type in anything they
want. Your code is not going to break.
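And here is the placeholder version of both earlier examples, again a sketch with made-up tables and hypothetical login_safe and count_safe functions. Note that Python's sqlite3 module binds ? values separately from the SQL text rather than escaping quotes the way the lecture describes for CS50's library, but the effect is the same: the input is treated purely as data.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (username TEXT, password TEXT)")
db.execute("INSERT INTO users VALUES ('malan@harvard.edu', 'correct horse')")
db.execute("CREATE TABLE favorites (language TEXT, problem TEXT)")
db.executemany("INSERT INTO favorites VALUES (?, ?)",
               [("Python", "hello, world"), ("C", "hello, it's me")])


def login_safe(username: str, password: str) -> bool:
    # ? placeholders: the values are bound as data, never parsed as SQL,
    # so a ' or -- in the input is just another character.
    rows = db.execute(
        "SELECT * FROM users WHERE username = ? AND password = ?",
        (username, password),
    ).fetchall()
    return len(rows) > 0


def count_safe(favorite: str) -> int:
    return db.execute("SELECT COUNT(*) FROM favorites WHERE problem = ?",
                      (favorite,)).fetchone()[0]


print(login_safe("malan@harvard.edu'--", "anything at all"))  # False
print(login_safe("malan@harvard.edu", "correct horse"))        # True
print(count_safe("hello, it's me"))                            # 1
```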
You can actually see hints of this in
the real world. If you've ever gone to a
website and they tell you that you can't use
certain characters, in passwords, for
instance: all of us probably
intuitively know that you should have
pretty long, hard-to-guess passwords
with letters and numbers and punctuation
symbols. Sometimes websites very
stupidly prohibit you from using certain
punctuation symbols, which should drive
you nuts, because there's no
computational reason to
put the onus on the user to sanitize
their own input. But quite likely those
websites have learned part of
this lesson: they know some
characters can be dangerous in SQL, like
semicolons or single quotes or the like,
and they just don't want you to ever
type those in, even though there are
solutions to this problem. Use a library
that someone else, smarter than you or
with more history of writing code than
you, has written, one that's open source, so that
many people have seen it and banged on
it over the years, so that this problem
is not something you're vulnerable to.
Questions, then, on what these here SQL
injection attacks
are all about? Yeah?
>> If you're telling the user what not
to use, you're also telling them what
system you're using, and so maybe that...
>> Good point. So by also telling people
what characters they shouldn't use,
you're leaking information, because a
smart adversary might think, oh, well, if
they don't want me using that symbol,
they're probably using this language or
this technology. Yes, no good comes from
telling the world more information than
they need to know. So that's another
good paranoia to have. How about one
other issue before we come full circle
to the SQL injection attacks. There's
another challenge with relational
databases, and with SQL itself, namely
race conditions. This isn't so much a
problem when I'm writing a little
program here on my own computer. But
when you're running SQL code on a
database in the real world, in the cloud,
where you have many different servers
talking to that database and many
different users talking to those web
servers, as is going to be the case at
Meta and Google and Microsoft and any
number of popular companies nowadays, and
even some of CS50's own apps, which use
centralized SQL databases, if
multiple people are trying to do the
same thing on them at the same time,
like submitting their homework or running check50, we
too are vulnerable to what are called
race conditions. So what is a race
condition? Well, the way I learned this,
back in the day, when taking a course on
databases and operating systems more
generally, was to think of a scenario
like this. Maybe in your dorm, you and
your roommate have a little dorm fridge,
and you're both in the habit of really
liking to drink milk, as the story was
told to us. And so maybe one of you
comes home from class one day, and you
get to your room, look in the
fridge, and there's no milk in there. And so
you decide to walk across the street to
CVS or some other store to get milk.
Meanwhile, your roommate comes home from
their class, opens the fridge, and
is like, "Oh, we're out of milk. Let
me go to the store, too." And for the
sake of the story, they go to a
different store altogether, so that you
don't run into each other, which would have
solved the problem. So now both of
you are on your way to a store to get
milk. Time passes. You both come home.
One of you puts a jug of milk in the
fridge. The other one gets home and is
like, "Ah, damn it. We already got
milk. I can't fit this milk in the
fridge," or, "Now it's too much milk. We
don't really like milk this much. It's
going to go bad." A very bad outcome
here. Having too much milk is the moral
of the story. But what's the, what, stupid
story? What's the real
takeaway? Why did we find ourselves in a
situation where we ended up with
too much milk?
>> We didn't know what the other person...
>> We didn't know what the other person was
doing. And to really geek out on this,
we inspected the state of a variable
that was in the process of being updated
by someone else. And this is a thing in
computing as far back as Scratch. Recall
that with Scratch, you could have multiple
scripts running at the same time for a
single sprite, because Scratch, in effect,
is multi-threaded. You can have a single
sprite doing multiple things in parallel
by having those multiple scripts.
Similarly, here your room is sort of
multi-threaded, because you have two
independent beings who can both go to
the store and solve the same problem in
parallel. The problem, though, is that if
one is not aware that the other is doing
that work already, you might make poor
decisions. So in the real world, what
should the first roommate have done
after inspecting the state of the
refrigerator and realizing, "Oh, we're
out of milk"? Okay, call the other
roommate, or maybe, more simply, put a
note on the door, or maybe
dramatically lock the refrigerator
somehow. And in fact, that's a term of
art in databases: you can use a
database lock so that, if you are in the
process of updating a value in the
database, you lock it so that no one else
can inspect that value
and potentially make a poor decision. So
when might this actually happen in the
real world rather than the contrived
milk example. So there are a lot of
social media posts nowadays that are
quite popular. To this day, as of today,
this is still the most popular Instagram
post for instance. And imagine when this
was first posted, hundreds, thousands,
hundreds of thousands of people might
have all been clicking the heart icon
essentially at the same time. Now, Meta
uh the company behind Instagram
presumably has lots and lots of
different servers, but let's suppose for
the sake of discussion they have a
single database, which is not true, but
the danger is still there. Even with
multiple databases, many different web
servers are talking to the same
database. And suppose those servers are
using Python code and, hey, the CS50
library, code that might look a
little something like this, in order to
decide how to update the total number of
likes for an Instagram post. The first
line of code running on Meta's servers
might say this: get these rows as
follows. Execute a query like select the
current number of likes from the posts
table where the ID of the post is
whatever it is, 123456 or whatever.
Notice no SQL injection attacks are
possible here because I'm using the
placeholder, not an f-string. Then the
next line of code running on Meta's server
maybe just stores in a variable, just to
make the code more readable, the first
row's likes column. So again, with the
CS50 library in the story, rows is a list
of dictionaries, so this is the first
such element in the list, and this is the
likes column in the temporary table we
just selected. Lastly,
what do we want to do? Well, we want to
increment, essentially ++, that total. So we
update the posts table, setting the number
of likes equal to this question mark
where the ID equals this question mark.
We didn't see this before, but the
CS50 library indeed supports multiple
arguments after the SQL string. I'm
going to update the number of likes to
be likes plus one, plugging in the same
ID of that post. So in short, take on
faith that it's quite common, in
order to achieve one small goal like
updating the number of likes, that you
might need to do two database
queries or three lines of code. Now if
these lines of code are executing on
multiple web servers, you could
certainly imagine that if people are
hitting the like button pretty much
at the same time, maybe one server is
going to execute this first line of code
and it's going to get its answer. Maybe
there's a hundred likes at this point in
the story. And then just by chance on
another server, this line of code is
also executed, but it too gets the same
answer. There's currently a hundred
likes. Meanwhile, the first server in
the story continues to do its execution
of code such that it updates the number
of likes from 100 to 101. But because
the other server was essentially running
the same code in parallel, it's going to
make the same mathematical decision and
update the number of likes
from 100 to 101. But at this
point in the story, the number of likes
should obviously be 102, so
we've lost data. And that's one of the
dangers of a race condition:
you'll end up with an inaccurate result.
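To make the lost update concrete, here is a minimal sketch that uses Python's built-in sqlite3 module instead of the CS50 library. The posts table, its id and likes columns, and post ID 123456 are assumptions for illustration. Because it runs in one process, it reproduces the unlucky timing by hand: both "servers" read the count before either one writes.

```python
import sqlite3

# In-memory database standing in for a (hypothetical) posts table.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, likes INTEGER NOT NULL)")
db.execute("INSERT INTO posts (id, likes) VALUES (?, ?)", (123456, 100))
db.commit()


def read_likes(post_id):
    # Step 1 of read-modify-write: SELECT the current count.
    row = db.execute("SELECT likes FROM posts WHERE id = ?", (post_id,)).fetchone()
    return row[0]


def write_likes(post_id, likes):
    # Step 2: write back a value computed from what was read earlier.
    db.execute("UPDATE posts SET likes = ? WHERE id = ?", (likes, post_id))
    db.commit()


# Simulate the bad interleaving: both "servers" read before either writes.
likes_seen_by_server_a = read_likes(123456)       # 100
likes_seen_by_server_b = read_likes(123456)       # also 100
write_likes(123456, likes_seen_by_server_a + 1)   # sets 101
write_likes(123456, likes_seen_by_server_b + 1)   # sets 101 again, not 102

final_likes = read_likes(123456)
print(final_likes)  # 101: one like was lost
```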
And a company like Meta doesn't
want to go losing data like likes like
this. That actually drives
engagement and so forth. And so
that's genuinely a technical, if not also a
business, problem as well. So it's
analogous to sort of the milk problem,
but actually at scale. So what's the
solution? There's a bunch of different
ways, but conceptually, we just want to
lock the database when this logic is
being executed such that when one server
is updating the number of likes, no one
else should be allowed to update the
like count at the same time. Now, that's
a little crazy for someone as big as
Meta because you're really just
serializing all of these likes and
slowing things down. So there's more
fine-grained control nowadays, via what
are called transactions, where you can
essentially lock not the whole table and
certainly not the whole database, but
just the row in question, for instance.
And so you would use commands in SQL
like BEGIN TRANSACTION and then execute
the lines of code that you want. And
then when you're ready to commit,
that is, save the changes, you use the COMMIT
command. But if something goes wrong or
you get interrupted, you can actually
roll back the whole thing with ROLLBACK. What this
kind of code does in effect, by using
more verbose CS50 and Python code
like this, is ensure that those
three lines of code inside, or
technically the two database queries
inside, will either both be executed or
not executed at all. They will not be
interrupted. And that's the fundamental
solution to this problem, analogous to
putting a lock on the fridge,
leaving a note, or calling your roommate,
preventing them from making the same
decision themselves.
Questions, then, on these race conditions
or the solutions, again, even though this
won't be germane for CS50, simply using
techniques like locks and what we called
transactions?
No? All right, then a final moment to end
on. We would not be a computer science
course if we didn't introduce you to a
few pieces of CS canon. Here is a
sort of meme that's circulated for years
when it comes to optical character
recognition, or OCR, at toll booths
trying to detect your license plate
automatically.
This is someone trying to have a funny
old time tricking the city into deleting
their database altogether. Because if
you're just scanning this off of
someone's license plate or the front of the
car and just blindly plugging it in
without sanitizing their input, escaping
their input with something like a good
library, you might very well drop the
entire database. As an aside, someone
did something similar too, where I think
they made their license plate NULL,
which just confused the heck out of the
system, too, because the programmers
didn't understand why null was all over
the place when lights were being run and
whatnot. And lastly, a very famous
character in the world of XKCD, as far as
computer science circles go, is this.
So we'll end as we've done before on an
awkward silence as you process this here
canonical CS joke.
>> Now you too know who Bobby Tables is.
All right, that's it for week seven.
We'll see you next time.
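The Bobby Tables joke is about SQL injection. The following sketch, using Python's sqlite3 module and a hypothetical students table, contrasts building SQL with an f-string against passing the input through a ? placeholder, the same style of placeholder used in the likes example earlier. The unsafe string is only printed, never executed.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE students (name TEXT)")

# The name from the comic: input crafted to end one statement and start another.
name = "Robert'); DROP TABLE students;--"

# Unsafe: pasting input into the SQL text turns part of the input into SQL.
unsafe_sql = f"INSERT INTO students (name) VALUES ('{name}')"
print(unsafe_sql)  # INSERT INTO students (name) VALUES ('Robert'); DROP TABLE students;--')

# Safe: the ? placeholder sends the input separately, so it is stored as plain data.
db.execute("INSERT INTO students (name) VALUES (?)", (name,))
db.commit()

stored = db.execute("SELECT name FROM students").fetchall()
print(stored)  # the table still exists, and the odd name is stored verbatim
```

Python's sqlite3 execute also happens to refuse to run more than one statement at a time, but that is a property of this particular driver, not a defense to rely on; placeholders are.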
All right. This is CS50 and this is our
lecture on artificial intelligence or
AI. Particularly for all of those family
members who are here in the audience
with us for the first time. In fact,
for those students among us, maybe a
round of applause for all of the family
members who have come here today to join
you.
Nice. So nice to see everyone. And as
CS50 students already know, it's sort of
a thing in programming circles to
have a rubber duck on your desk. Indeed,
a few weeks back, we gave one to all
CS50 students. And the motivation is to
have something to talk to in the
presence of a bug or mistake in your
code, or confusion you're having when it
comes to solving some problem. And the
idea is that, in the absence of a
friend, family member, or TA of whom you
can ask questions, you literally
verbalize your confusion, your question,
to this inanimate object on your desk.
And in that process of verbalizing your
own confusion and explaining yourself,
quite often does that proverbial light
bulb go off over your head and voila,
problem is solved. Now, as CS50 students
also know, we sort of virtualized that
rubber duck over the past few years and
most recently in the form of this guy
here. So, in students' programming
environment within CS50, a tool called
Visual Studio Code at a URL of CS50.dev,
they have a virtual rubber duck
available to them at all
times. And early on in the very first
version of this rubber duck, it was a
chat window that looked like this. And
if students had a question, they could
simply type into the chat window
something like, "I'm hoping you can help
me solve a problem." And for multiple
years, all the CS50 duck did was respond
with one, two, or three quacks. We
have anecdotal evidence to suggest that
that alone was enough for answering
students' questions, because it was in
that process of actually typing out
the confusion that you realize, oh, I'm
doing something silly, and you figure it
out on your own. But of course, now that
we live in an age of ChatGPT and Claude
and Gemini and all of these other AI-based
tools, it came as no surprise, perhaps,
when in 2023 this same duck started
responding to students in English. And
that now is the tool that they have
available, which is in effect meant to be
a less helpful version of ChatGPT, one
that doesn't just spoil answers outright
but tries to guide students to solutions,
akin to any good teacher or tutor. And so
today's lecture is indeed on just that
and the underlying building blocks that
make possible this here rubber duck and
all of the AI with which we're all
increasingly familiar, namely generative
artificial intelligence: using this
technology known as AI to generate
something, whether that's images or
sounds or video or text. And in fact,
what we thought we'd do to get everyone
involved early on is this: if you have a
phone by your side, go ahead and
scan this QR code here,
and that's going to lead you to a
polling station where you can buzz in
with some answers. CS50's preceptor
Kelly is going to kindly join me here on
stage to help run the keyboard. And what
we're about to do is play a little game
and see just how good we humans are
right now at distinguishing AI from
reality. And so we'll borrow some data
from the New York Times, which a
couple of years back actually published
some examples of AI and not AI, and
we'll see just how good this
technology has gotten. So here we have
two photographs on the screen. In a
moment, you'll be asked on your phone,
if you were successful in scanning that
code, which one of these is AI, left or
right.
So hopefully on your phone here, if you
want to go ahead and swipe to the next
screen, we'll activate the poll here. In
a moment, you should see on your phone a
prompt inviting you to select left or
right.
And feel free to raise your hand if
you're not seeing that. But it looks
like the responses are coming in. And at
the risk of spoiling, it looks like 70%
plus of you think the answer is the one on
the right. Kelly, maybe we could
swipe back to the two photographs. In
this particular case, yes, it was in
fact the one on the right. Maybe it
looked a little too good or maybe a
little too unreal. Maybe. Let's see
maybe a couple of other examples. So,
same QR code. No need to rescan. Let's
go ahead and pull up these two examples.
Now, two photographs, same question.
Which of these is AI? Left or right?
left
or right.
All right, let's take a look at the
chart. The responses are coming
in a little closer in this case, but a
majority of you think the answer is in
fact left here, though 5% of you were
truthfully admitting that you're unsure.
But Kelly, if you want to swipe back to
the photos, the answer this time was in
fact a trick question. They were both in
fact AI, which perhaps speaks to just
how good this technology is already
getting. Neither of these faces exists
in the real world. It was synthesized
based on lots of training data. So, two
photographs that look like humans but do
not in fact exist. How about one more?
This time focusing on text, which will
be the focus, of course, underlying
our duck. Did a fourth grader write this
or the new chatbot? Here are two final
examples. Same code as before, so no
need to rescan. And here are the texts.
Essay one. I like to bring a yummy
sandwich and a cold juice box for lunch.
And sometimes I'll even pack a tasty
piece of fruit or a bag of crunchy
chips. As we eat, we chat and laugh and
catch up on each other's day. Dot dot
dot. Essay two. My mother packs me a
sandwich, a drink, fruit, and a treat.
When I get into a lunchroom, I find an
empty table and sit there and eat my
lunch. My friends come and sit down with
me. Dot dot dot. The question now,
lastly, is which of these is AI? One or
two?
Essay one or two? The bars here are
duking it out. Looks like a
majority of you say essay one. Let's go
back to the text. Could one of you who
says essay one raise a quick hand?
Why essay
one? Yeah.
>> Okay. And so essay two looks more like what
you would write. And can I ask what grade
you are in?
>> A fifth grader. So is this a new fifth
grader or not? The answer here in fact
is that essay one is the AI because
indeed essay 2 is more akin to what a
fourth or if I may a fifth grader would
write. And I dare say there are maybe
some telltale signs. I'm not sure a
typical fourth grader or fifth grader
would catch up on each other's day in
the vernacular that we see in essay one.
But suffice it to say this game is not
something we can keep playing in the years
to come, because it's just going to get
too hard to discern whether something is
AI-generated or not. And so among our goals
for today is really to give you a better
sense of not just how technologies like
this duck and these games that we've
played here with images and text work,
but really what the underlying
principles of artificial intelligence are,
principles that frankly have been with us and have
been developing for decades and
have really now come to a head in recent
years thanks to advances in research,
thanks to all the more cloud computing,
thanks to all the more memory and
disk space, and the sheer volume of information
that we have at our disposal
that can be used to train all of these
here technologies. So this here
duck is built on a fairly complicated
architecture that looks a little
something like this where here's a
student using one of CS50's tools.
Here's a website with which CS50
students are familiar, called CS50.AI,
where we the staff wrote a bunch of code
that actually talks to what are called
APIs, application programming
interfaces, third-party services by
companies like Microsoft and OpenAI that
really have been doing the hard work of
developing these models, as well as some
local sauce that we
at CS50 add into the mix to make
the duck's answers specific to CS50
itself. But what we've essentially been
doing is something with which
you might be familiar, in part: prompt
engineering, which has started popping up,
for better or for worse, on LinkedIn
profiles everywhere. And prompt
engineering really is not so much a
form of engineering as it is a form of
asking good questions and being detailed
in your question, giving context to the
underlying AI so that the answer, with
high probability, is what you want back.
And so there are two terms in this world
of prompt engineering that are worth
knowing about, and CS50 has leveraged
both of these to implement that duck. We
for instance wrote what's called a
system prompt: instructions
written by us humans, often in English,
that sort of nudge the underlying AI
technology to have a certain personality
or a specific domain of expertise. For
instance, we CS50 have written a system
prompt essentially that looks like this.
In reality, it's like a lot of lines
long nowadays, but the essence of it is
this. You are a friendly and supportive
teaching assistant for CS50.
You are also a rubber duck. (That is
sufficient to turn an AI into a rubber
duck, it turns out.) Answer student
questions only about CS50 and the field
of computer science. Do not answer
questions about unrelated topics. Do not
provide full answers to problem sets, as
this would violate academic honesty.
Answer this question: and after
that preamble, if you will, aka system
prompt, we effectively copy and paste
whatever question a student has typed in,
otherwise known as a user prompt. And
that is why the duck behaves like a duck
in our case and not a cat or a dog or a
PhD, but rather something that's been
attuned to the particular goals we
have pedagogically in the course. And in
fact, those of you who are CS50 students
might recall from quite some weeks ago
in week zero when we first introduced
the course to the class, we had code
that we whipped up that day that
ultimately looked a little something
like this. And I'll walk through it
briefly line by line. But now, on the
heels of having studied some Python in
CS50, this here code that I whipped up
in the first lecture might now make a
bit more sense. In that first lecture,
we imported OpenAI's own library code
that a third-party company wrote to make
it possible for us to implement code on
top of theirs. We created a variable
called client in week zero, and this gave
us access to the OpenAI client, that is,
software that they wrote for us. We then
defined in week zero a user prompt, which
came from the user using the input
function, with which CS50 students are
now familiar. And then we defined this
system prompt that day, where I said,
"Limit your answer to one sentence.
Pretend you're a..." cat, I think,
was the persona of the day. And then we
used some more arcane code here. But
in essence, we created a variable called
response, which was meant to represent
the response from OpenAI's server. We used
client.responses.create, which is
a function or method that OpenAI gives
us that allows us to pass in three
arguments: the input from the user, that
is, the user prompt; the instructions from
us, that is, the system prompt; and then
the specific model or version of AI that
we wanted to use. And the last thing we
did that day was print out
response.output_text,
and that's how we were able to answer
questions like "What is CS50?" or the like.
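To see how a system prompt and a user prompt fit together, here is a minimal sketch that builds the three arguments described above without contacting any server. The model name is a hypothetical placeholder, and the system prompt is an abridged version of the one quoted earlier, not CS50's actual prompt.

```python
# Hypothetical model name; substitute one your API account can use.
MODEL = "gpt-4o-mini"

# Abridged version of the system prompt described above.
SYSTEM_PROMPT = (
    "You are a friendly and supportive teaching assistant for CS50. "
    "You are also a rubber duck. "
    "Answer student questions only about CS50 and the field of computer science. "
    "Do not answer questions about unrelated topics. "
    "Do not provide full answers to problem sets, as this would violate academic honesty."
)


def build_request(user_prompt):
    """Pair the fixed system prompt with whatever the student typed."""
    return {
        "model": MODEL,
        "instructions": SYSTEM_PROMPT,  # system prompt: written once, by us
        "input": user_prompt,           # user prompt: changes every time
    }


request = build_request("What is CS50?")
print(request["instructions"])
print(request["input"])
```

With the openai package installed and an API key configured, these keyword arguments could be passed along as OpenAI().responses.create(**request), with the reply read from the result's output_text, as in the week zero code described above.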
So, we've seen all of that before, but
we didn't talk about that week exactly
how it was working or what more we could
actually do with it. And so, in fact,
what I thought we'd do today is peel
back a layer that we've not allowed into
the course up until now. And indeed, you
still cannot use this feature until the
very end of the class in CS50 when you
get to your final projects, at which
point you are welcome and encouraged to
use VS Code in this particular way.
So, here again is VS Code. For those
unfamiliar, this is the programming
environment we use here with students.
And let me open up some code that was
assigned to students a couple of weeks
back, namely a spell checker that they
had to implement in C. So I came in
advance with a folder called speller.
And inside of this folder, I have code
that all students had that week,
called dictionary.c. And in this file,
which will not look familiar to many of
you if you've not taken weeks 0 through
7 up until now, we did have some
placeholders for students. So, long story
short, students had to answer a few
questions, that is, write code to do this
to-do, this to-do, this to-do, and one
more. There were four functions or
blanks that students needed to fill in
with code. And I dare say it took most
students 5 hours, 10 hours, 15 hours,
something in that very broad range. Let
me show you now how, using AI, you,
the aspiring programmers, can soon start to
write code all the more quickly, not by
just choosing a different language but
by using these AI-based
technologies beyond the duck itself. So
what I've done here on the right hand
side of VS code is enabled a feature
that CS50 disables for all students from
the start of the course, called Copilot.
This is very similar in spirit to
products from Google and Anthropic
and other companies as well. But this is
the one that comes from Microsoft and, in
turn, GitHub here, and it too gives me
sort of a chat window here, and this is
just one of its features. For instance,
if I wanted, to get started, to implement
the check function, I could just ask it
to do that: "Implement the check function,
and how about using a hash table in C." I'm
going to go ahead and hit Enter. Now
it's going to work. It's using as
reference, that is, context, the very file
that I've opened, which is dictionary.c
here. Copilot in general, as well
as a lot of AI tools, is familiar with
CS50 itself because it's been freely
available as OpenCourseWare for years.
What you see it doing here is
essentially thinking, though that's a bit
of an overstatement. It's not really
thinking. It's trying to find patterns
in the problem I want to
solve among all of the training data
that it's seen before and come up with a
pretty good answer. So for today's
purposes, I'm going to wave my hand at
the ChatGPT-like explanation of what to
do that has appeared at right. But
what's juiciest to look at here is on
the left: if I now scroll down,
highlighted in green is all of the
suggested code for implementing this
here check function. Now, it might not be
the way you implemented it yourself, but
I do dare say this has hints of exactly
what you probably did when it came to
implementing a hash table. And in
fact, I can go ahead and keep all of this
code if I like how it looks. Let's
assume that's all correct there. It
might be the case that I now want to
implement the load function. So how
about "Now implement the load function," Enter,
as simple as that. And what data is
being used? Well, a few different
things. It says one reference, so it's
indeed using this one file. But there are
also what are called comments in the
code, with which all students are now
familiar: these // comments in gray
that are giving English hints as to what
this function is supposed to do. There's
implicit information as to what the
inputs to these functions, otherwise
known as arguments, are meant to be, and what
the outputs are meant to be. So the
underlying AI, called Copilot here, kind
of has a decent number of hints, and,
much like for a good TA or good software
engineer, that's enough context to figure
out how to fill in those blanks. And so
here too, if I scroll down now, we'll see
in green some suggested code via which
it could solve that same problem as
well, the load function. And I dare say
I've been talking for far fewer minutes
than CS50 students spent actually coding
the solution from scratch to this here
problem. So I'll go ahead and click
keep. I'll assume that it's correct. But
that's actually quite a big assumption.
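For readers who have not done the speller problem, here is a minimal Python sketch of the idea behind load, check, and size: a hash table whose buckets are lists of words. It is neither Copilot's output nor the course's C solution, and the 26-bucket, first-letter hash is an assumption chosen for simplicity. (The C version must also free its memory in unload, which Python's garbage collector handles here.)

```python
N = 26  # number of buckets: one per letter of the alphabet (an assumption)


def hash_word(word):
    """Bucket index from the word's first letter, case-insensitively."""
    return (ord(word[0].lower()) - ord("a")) % N


class Dictionary:
    """A chained hash table: each bucket is a list of words."""

    def __init__(self):
        self.buckets = [[] for _ in range(N)]
        self.count = 0

    def load(self, lines):
        """Add one word per line; `lines` can be a list or an open file."""
        for line in lines:
            word = line.strip().lower()
            if word:
                self.buckets[hash_word(word)].append(word)
                self.count += 1

    def check(self, word):
        """True if the word is in the dictionary, ignoring case."""
        if not word:
            return False
        word = word.lower()
        # Only the one bucket the word hashes to needs to be searched.
        return word in self.buckets[hash_word(word)]

    def size(self):
        return self.count


d = Dictionary()
d.load(["cat\n", "dog\n", "apple\n"])
print(d.check("Cat"), d.check("cow"), d.size())  # True False 3
```

With only 26 buckets, each lookup still scans a fairly long list; more buckets and a better hash function shorten those lists.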
And those of you wondering, why have
we been learning all of this if I
could just ask it in English to do my
homework for me? I mean, there's a lot to
be said for the muscle memory that
hopefully you feel you've been
developing over the past several weeks.
The reality is if you don't have an eye
for what you're looking at, there's no
way you're going to be able to
troubleshoot an issue in here, explain
it to someone else, make marginal
changes or the like. And yet, what's
incredibly exciting even to someone like
me, all of the staff, friends of mine in
the industry, is that this kind of
functionality in AI amplifies your
capabilities as a programmer sort of
overnight. Once you have that
vocabulary, that muscle memory for doing
it yourself, the AI can just take it
from there and get rid of all of the
tedium, allow you to focus at the
whiteboard with the other humans on sort
of the overarching problems that you
want to solve and leave it to this AI to
actually solve problems for you. A fun
exercise too might be to go back at
term's end and try solving any number of
the course's assignments. For instance,
let me go ahead and do this. In my
terminal window here, I'm going to go
back to my main directory. I'm going to
create an empty file called Mario.c
that has nothing in it. And I'm going to
go ahead in my chat window here and say,
please implement a program in C that
prints a left aligned pyramid of bricks
using hash symbols for bricks and use
the CS50 library to ask the user for a
non-negative height as an integer.
Period. I dare say that's essentially
the English description of what was, for
CS50 this year, problem set one: to
implement a program called Mario.c. This
too is sort of doing its thing. It's
using one reference. It's working. It
knows as a hint that this file is called
Mario.c. And it's seen a lot of those in
its training data over time. There's an
English explanation of what I should do.
And the CS50 students in the room
probably recognize the sort of basic
structure here of using a do-while loop to
prompt the user for a height using the
CS50 library, which has been included,
then printing a left-aligned pyramid using
some kind of loop and boom, we are done.
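The Mario prompt above translates into a short program. This is a Python rendition rather than the C that Copilot generated, with a plain input() loop standing in for the CS50 library's get_int; the function names get_height and pyramid are hypothetical.

```python
def get_height(ask=input):
    """Keep asking until the user types a non-negative integer.

    Mirrors the do-while idea: ask at least once, repeat while the answer is bad.
    `ask` defaults to input() but can be swapped out for testing.
    """
    while True:
        try:
            height = int(ask("Height: "))
        except ValueError:
            continue  # not an integer: ask again
        if height >= 0:
            return height


def pyramid(height):
    """Left-aligned pyramid: row i has i hash symbols."""
    return "\n".join("#" * row for row in range(1, height + 1))


print(pyramid(3))
```

Running print(pyramid(get_height())) tries it interactively. Keeping the drawing logic in pyramid separate from the prompting logic makes it easy to check with fixed heights.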
And these are fairly bite-sized problems.
As you'll see as you get to term's end
with your final project, which is a
fairly open-ended opportunity to apply
your newfound knowledge and savvy with
programming itself to a problem of
interest, AI will allow you to implement
far grander projects, far greater
projects than have been possible to date,
certainly in just the few weeks we have
to do it, because of this
amplification of your own abilities. So
with that promise, let's talk about how
in the heck any of this is actually
working. I clearly just generated a
whole lot of stuff and that's how we
began the story with the generation of
those images and those two essays by
kids. But what is generative artificial
intelligence or really what is AI
itself? And these are some of the
underlying building blocks that aren't
going anywhere anytime soon and indeed
have led us as a progression to the
capabilities you just saw. So spam, we
sort of take for granted now that in our
Gmail inboxes or Outlook inboxes, most
of the spam just ends up in a folder.
Well, there's not some human at
Microsoft or Google sort of manually
labeling the messages as they come in,
deciding spam or not spam. They're
figuring out using code and nowadays
using AI that looks like spam and
therefore I'm going to put it in the
spam folder, which is probably correct
99% of the time, but indeed there's
potentially a failure rate. Other
applications might include handwriting
recognition. Certainly Microsoft and
Google don't know the handwriting
style of all of us here in this room,
but their systems have been trained on enough other
humans' handwriting styles that odds are
your handwriting and mine look similar
to someone else's. And so with very high
probability, they could recognize
something like Hello World here as
indeed that same digital text. All of us
are into streaming services nowadays,
Netflix and the like. Well, they're
getting pretty darn good at knowing if I
watched X, I might also like Y. Why?
Well, because of other things I've
watched before and maybe upvoted and
downvoted. Maybe because of other things
people have watched who like similar
movies or TV shows to me. So that too is
AI. There's no if, else if, else if,
else construct for every movie or TV
show in their database. It's sort of
figuring out much more organically,
dynamically what you and I might like.
And then all these voice assistants
today, Siri, Alexa, Google Assistant,
and the like. Those too don't recognize
your voice or necessarily know what
questions you're going to ask it.
There's no massive if, else if construct that has
all possible questions in the world just
waiting for you or me to ask it. That
too, of course, is dynamically
generated. But that's getting a bit
ahead of ourselves. Let's rewind in
time. Some of the parents in the
audience might remember this here game,
among the first arcade games in the
world, namely Pong. And so this was a
black and white game whereby there's two
players, a paddle on the left, a paddle
on the right, and then using some kind
of joystick or track ball, they can move
their paddles up and down, and the goal
is to bounce the ball back and forth and
ideally catch it every time. Otherwise,
you lose a point. This is just an
animated GIF, so there's nothing really
dramatic to watch. It's going to stay at
15 against 12, just looping again and
again. Nothing interesting is going to
happen, but this is a nice example of a
game that lends itself to solving it
with code. And indeed, it's been in our
vernacular for years to play against not
just the computer, but the CPU, the
central processing unit, or really the
AI. And yet, AI does not need to be
nearly as sophisticated as the tools we
now see. For instance, here's a
successor to Pong known as Breakout.
Similar in spirit, but there's just one
paddle and one ball, and the goal is to
bounce the ball off of these colorful
bricks, and you get more and more points
depending on how high up you can get the
ball. All of us as humans, even if
you've never played this old school
game, probably have an instinct as to
where we should move the paddle. If the
ball just left it going this way, which
direction should I move the paddle? I
mean, probably to the left. And indeed,
that'll catch it on the way down. So,
you and I just made a decision that's
fairly instinctive, because it's been
ingrained in us. But we could sort of
take all the fun out of the game and
start to quantify it or describe it a
little more algorithmically, step by
step. In fact, decision trees are a
concept from economics, strategic
thinking, and computer science as well.
That's one way of solving this problem
in such a way that you will always play
this game well if you just follow this
algorithm. So, for instance, how might
we implement code, or a decision-making
process, for something like Breakout?
Well, you ask yourself first, is the
ball to the left of the paddle? If so,
you know where we're going: go
ahead and move the paddle left. But what
if the answer were no? Well,
you probably don't just blindly move the paddle
to the right. What should you
then ask?
>> Are we right below the ball?
>> Are you right below the ball? If the
ball's coming right at you, you don't
want to just naively go to the right and
then risk missing it. So, there's
another question to ask. Is the ball to
the right of the paddle? And that's a
yes no question. If yes, well then okay,
move it to the right. But if not, you
should probably stay exactly where you
are and don't move the paddle. All
right, so that's fairly deterministic,
if you will. And we can map it to
code using pseudocode in, say, a class
like CS50. We can say in a loop, well,
while the game is ongoing, if the ball's
to the left of the paddle, then move the
paddle left; else if the ball's to
the right of the paddle (sorry for the
typo there), move the paddle right;
else just don't move the paddle. And so
these decision trees, as we drew them,
have a perfect mapping to code, or really
pseudocode in this particular case,
which is to say that's how the people who
implemented the Breakout game or the
Pong game, who implemented a computer
player surely coded it up. It was as
straightforward as that. But how about
something like tic-tac-toe, which some
of you might have played on the way in
for just a moment on the scraps of paper
that you might have had. Here we
have a tic-tac-toe board with two O's
and two X's. For those unfamiliar, this
game tic-tac-toe, otherwise known as
noughts and crosses, is a matter of
going back and forth, X's and O's
between two people. And the goal is to
get three O's in a row or three X's in a
row, either vertically, horizontally, or
diagonally. So this is a game here in
mid-progress. Well, let's consider how
you could solve the game of tic-tac-toe
like a computer, like an AI might.
Well, you could ask yourself, can I get
three in a row on this turn? Well, if
yes, well, play in the square to get
three in a row. It's as straightforward
as that. If you can't, though, what
should you ask? Well, can my opponent
get three in a row on their next turn?
Because if so, you should probably at
least block their move next, so at least
you don't lose. Now, this game,
tic-tac-toe, as relatively simple as it
is, gets a little harder to play when
it's not obvious where you should go.
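The first two questions in that decision process, win if you can and block if you must, can be sketched in Python as follows. The board representation (a list of nine cells, " " for empty) and the function names are assumptions for illustration.

```python
# Board: 9 cells indexed 0-8, left to right, top to bottom; " " is empty.
LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),  # rows
    (0, 3, 6), (1, 4, 7), (2, 5, 8),  # columns
    (0, 4, 8), (2, 4, 6),             # diagonals
]


def completing_square(board, player):
    """Return a square that gives `player` three in a row, or None."""
    for line in LINES:
        cells = [board[i] for i in line]
        if cells.count(player) == 2 and cells.count(" ") == 1:
            return line[cells.index(" ")]
    return None


def choose_move(board, me, opponent):
    # 1. Can I get three in a row on this turn? Then play there.
    square = completing_square(board, me)
    if square is not None:
        return square
    # 2. Can my opponent get three in a row next turn? Then block it.
    square = completing_square(board, opponent)
    if square is not None:
        return square
    # 3. Otherwise the answer isn't obvious, and these two rules have nothing to say.
    return None


board = ["O", "O", " ",
         " ", " ", "X",
         " ", "X", " "]
board[3] = "X"  # X now holds squares 3, 5, and 7 would be too many; keep it simple
board[5] = " "
print(choose_move(board, "X", "O"))  # 2: X can't win, so it blocks O
```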
Now, all of us as humans, if you grew up
playing this game, probably had
heuristics you used, like you really
like the middle or you like the top
corner or something like that. So, we
probably can make our next move
quickly, but is it optimal? And I dare
say, if back in childhood or more
recently you've ever lost a game of
tic-tac-toe, you're just bad at
tic-tac-toe, because logically there's no
reason you should ever lose a game of
tic-tac-toe if you're playing optimally.
At worst you should force a tie but at
best you should win the game. So think
of that the next time you play
tic-tac-toe and lose like you're doing
something wrong. But in your defense,
it's because the question mark is sort
of not obvious: how do I answer it
when the answer is not right in front of
me to move for the win or move for the
block? Well, one algorithm you could
have been using all of these years is
called minimax. And as the name suggests,
it's all about minimizing something and
or maximizing something else. So here
too, let's take a bit of fun out of the
game and turn it into some math, but
relatively simple math. So here we have
three representative tic-tac-toe boards.
O has won here, X has won here, and the
middle is a tie. It doesn't matter how we
score these boards, but we need a
consistent system. So I'm going to
propose that anytime O wins, the score of
the game is -1. Anytime X wins,
the score of the game is +1.
And anytime nobody wins, the score is
0. So at this point these three
boards have the values -1, 0,
and 1. The goal, therefore, in this
game of tic-tac-toe is for X to
maximize its score, because 1 is the
biggest value available, and O's goal in
life is to minimize its score. So that's
how we take the fun out of the game. We
turn it into math where one player just
wants to maximize and the other just wants
to minimize the score. All right, so a
quick sanity check here. Here's a
board. It's not color-coded. What is the
value of this board?
>> One, because X has in fact won straight
down the middle. So an X win is 1, an
O win is -1, and a tie is 0. So
now, with those principles in place,
let's see how we go about figuring out
where we should play in tic-tac-toe. Now,
here's a fairly easy configuration.
There are only two moves left. It's not
hard to figure out how to win or tie
this game, but let's use it for
simplicity. It's O's turn, say.
So where can O go? Well, that
invites the question: what is the
value of the board? Or rather, how do we
minimize the value of the board for O
to win? O can go in one of two
places, top left or bottom middle. Which
way should O go? Well, if O goes in top
left, we should consider what's the
value of this board. Is it minimal?
Well, let's see. If O goes here, X is
obviously going to go here. X is
therefore going to win. So the value of
this board is going to be 1. Now
since there's only one way logically to
get from this configuration to that one,
we might as well say the value of this
board, by transitivity, is 1. And so O
probably doesn't want to go there,
because that's the maximal score
and O wants to minimize. Over here,
though, if O goes bottom middle, then
X is going to go top left. And now
no one has won. So the value of this
board is thus
>> Zero. We might as well treat this as
0 too, because that's the only way to get
there logically. So now O, more
mathematically and logically, can decide:
do I want an end point of 1 or an end
point of 0? Well, 0 is the
better option because it's less than
1 and thus it's the minimal
possibility. So O is going to go
in the bottom middle and at least force
a tie. And so that's where you see
evidence that if you humans are ever
losing the game of tic-tac-toe, you have
not followed that logic. But you
could probably do it if there are just two
moves left. So let's go ahead
and rewind to three moves
left here. There are three blanks, and
I've kind of zoomed out. The catch is
that the decision tree gets a lot bigger
the more moves that are left.
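Searching that decision tree all the way down is what minimax does. Here is a minimal Python sketch under stated assumptions: the board is a tuple of nine cells in row-major order, and the helper names (winner, play, minimax, count_games) are hypothetical. It follows the rule the lecture spells out in pseudocode: with X to move, take the move with the highest score; with O to move, take the lowest. Caching results with functools.lru_cache is an added optimization, not part of the lecture. The board in the usage lines is one configuration consistent with the two-moves-left example (blanks at top left and bottom middle), not necessarily the exact slide.

```python
from functools import lru_cache

# Hypothetical layout: a 9-tuple in row-major order of "X", "O", or " ".
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
         (0, 4, 8), (2, 4, 6)]              # diagonals


def winner(board):
    """Return "X" or "O" if that player has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


def play(board, i, player):
    """Return a new board with `player` placed in cell i."""
    return board[:i] + (player,) + board[i + 1:]


@lru_cache(maxsize=None)
def minimax(board, player):
    """Return (score, best_move) with `player` to move.

    Scores follow the lecture's convention: X wins = 1, O wins = -1, tie = 0.
    X picks the highest-scoring move; O picks the lowest.
    """
    w = winner(board)
    if w is not None:
        return (1 if w == "X" else -1), None
    moves = [i for i in range(9) if board[i] == " "]
    if not moves:
        return 0, None  # board is full and nobody won: a tie
    opponent = "O" if player == "X" else "X"
    # Score each move by assuming both sides play optimally afterward.
    scored = [(minimax(play(board, i, player), opponent)[0], i) for i in moves]
    pick = max if player == "X" else min
    return pick(scored, key=lambda pair: pair[0])


@lru_cache(maxsize=None)
def count_games(board, player):
    """Count every distinct sequence of moves from here to a win or a full board."""
    if winner(board) is not None or " " not in board:
        return 1
    opponent = "O" if player == "X" else "X"
    return sum(count_games(play(board, i, player), opponent)
               for i in range(9) if board[i] == " ")


# Two moves left, O to move:  _ X O / O X X / X _ O
board = (" ", "X", "O",
         "O", "X", "X",
         "X", " ", "O")
print(minimax(board, "O"))               # (0, 7): bottom middle forces a tie
print(minimax((" ",) * 9, "X")[0])       # 0: perfect play from an empty board ties
print(count_games((" ",) * 9, "X"))      # 255168
```

The last line reproduces the 255,168 possible games quoted in the lecture, which is why drawing the full tree by hand is impractical even for a game this small.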
It gets bigger and bushier, since each
level branches once for every remaining
blank: three blanks already means up to
3 × 2 × 1 = 6 possible sequences of moves.
And that's fine if you have
the luxury of writing it down on a piece
of paper. But if you're doing this in
your head while playing against a
fifth grader, if I may, you're probably
not drawing out all of the various
boards and configurations, trying to
play optimally. You're going with
some instinct. And your instincts might
not be aligned with a
tried-and-true algorithm like minimax that
will ideally get you to win the game, but
at least will let you force a tie if you can't
win. But tic-tac-toe is not that hard. I
mean, how many different ways are there
to play tic-tac-toe? We could write a
computer program to play
tic-tac-toe pretty much optimally. We could use
pseudocode like this: if the player is X, for
each possible move, calculate the score
for the board at that point in time and
then choose the move with the highest
score. So you just try all
possibilities mathematically and then
you make the decision. Most of us in our
heads are not doing that, but we could.
Else, if the player is O, do essentially
the same thing, but choose the minimal
possible score. So that's the code for
implementing tic-tac-toe. How many ways
are there to play tic-tac-toe though?
Well, 255,168,
which means if we were to draw that
tree, it would be pretty darn big, and it
would take you quite a bit of time to
think through all those
possibilities. So, in your defense,
you're maybe not that bad at
tic-tac-toe. It's just a harder game than you
thought. But what about games
with which we as adults might be more
familiar? What about the game of
chess, which is often used as a measure
of how smart a computer is, whether
it's IBM's Deep Blue back in the day
or something else? Well, if
we consider even just the first four
moves of chess, by which I mean
white goes and black goes, and then they
each go three more times, so four
pairwise moves, how many different ways
are there to play chess? Well, it turns
out about 85 billion, just to get the game
started. And that's a lot of decisions
to consider and then make. How about the
game of Go, if familiar? Consider the
first four moves: 266 quintillion
possibilities. And this is where we,
as humans, even with our modern
PCs and Macs and phones, kind of have to
throw up our hands, because I don't have
this many bytes of memory in my
computer. I don't have this many hours
left in my life to actually crunch all
of those numbers and figure out the
solution. And so AI comes in
where it's no longer as simple as just
writing if-elses and loops, and no
longer as simple as just trying all
possibilities. You instead need to write
code that doesn't solve the problem
directly but in some sense indirectly.
You write code so that the computer
figures out how to win, perhaps by
showing it configurations of the board
that are a good place to be in, that is,
promising, and maybe showing it boards
that it doesn't want to find itself in
because they're
going to lead it to lose. In other
words, you train it, but not necessarily
exhaustively. And this is what we mean
nowadays by machine learning: writing
code via which machines learn how to
solve problems, generally by being
trained on massive amounts of data and
then, given new problems, looking for
patterns via which they can apply that
past training data to the problem at
hand. Reinforcement learning is one
way to think about this. In fact, we as
humans use reinforcement learning, which
is a type of machine learning, sort of all
of the time. A fun demonstration to watch
here involves pancakes.
So let me go ahead and pull up
a short recording here of an actual
researcher in a lab who's trying to
teach a robot how to flip
pancakes. So, we'll see here in this
video that the robot has an arm
that can go up, down, left, right. This,
of course, is the human, the researcher,
and he's just going to show the robot
one or more times how to flip a
pancake,
and he crosses his fingers and, okay, it seems
to have done it well. Does it again. Not
quite the same, but pretty good. And now
he's going to let the robot just try to
figure out how to flip that pancake
after having trained it a few
different times. The first few times,
odds are the robot's not going to do
super well, because it really doesn't
understand what the human just did or
what the whole purpose is. But, and
here's the key detail with reinforcement
learning, behind the scenes, the human
is probably rewarding the robot when it
does a good job: the better
it flips, the more it gets rewarded, say
by hitting a key and giving it a
point, or giving it the
digital equivalent of a cookie. Or,
conversely, every time the robot screws
up and drops the pancake on the floor,
it gets a proverbial slap on the wrist,
a punishment so that it does less of
that behavior the next time. And those of
you who are parents, which
today many of you are, odds are,
whether it's via this or maybe just
verbal approval or reprimands, have
probably trained children at some
point to do more of one thing and less
of another. And what you're seeing in
the backdrop there is just a
quantization of the movements into X, Y, and
Z coordinates so that it can do more of
the X's and the Y's and the Z's that led
it to some kind of reward. And now, after
some 50 trials, the robot
seems to be getting better and better,
such that, like a good human (we'll see
if I can do this without embarrassing
myself), it can flip the thing. That's
pretty good. I've been
doing this a long time. Okay,
so we've seen, then, how you might
use reinforcement learning in that kind of
domain. Let's take an example that's
familiar to those of you who are gamers.
Anytime you've played a game where
there's some kind of map or a world that
you need to explore up, down, left,
right, maybe you're trying to get to the
exit. So here simplistically is the
player at the yellow dot. Here for
instance in green is the exit of the map
and you want to get to that point. And
maybe somewhere else in this world
there are a lot of lava pits, and you
don't want to fall into a lava pit,
because you lose a life or you lose a
point or there's some penalty or
punishment associated with that. Well,
we, with this bird's-eye view, can
obviously see how to get to the green
dot. But if you're playing a game like
Zelda or something like that, all you
can do is move up, down, left, right,
and sort of hope for the best. So, let's
do just that. Suppose the yellow dot
just randomly chooses a direction and
goes to the right. Well, now we can
take away a life, take away a point,
or effectively punish it so that it
knows not to do that. And so long as the
player has a bit of memory, either
the human player or the code that's
implementing this, it can record that with a dark red
line, which means don't do that again,
because that didn't lead to a good
outcome. So maybe the next time the
yellow dot goes this way and this way,
and then, ah, it didn't realize that that's
actually the same lava pit. But that's
fine. Use a little bit more memory and
remind me not to do that, because I just
lost a second life in this story, and
maybe it goes this way next time. Ah,
now I need to remember not to do that.
But effectively, I'm either being
punished for doing the wrong thing or,
as we'll soon see, being rewarded for
doing more of the successful thing. And
just by chance, maybe I finally make my
way to the exit in this way. And so I
can be rewarded for that: now I get 100
points, or whatever it is, the high
score. So now, as per these green lines,
I can just follow that path again and
again, and I can always win this game.
Kind of like me nowadays, some 30 years
later, playing Super Mario Brothers:
I can get through all the warp
levels because I know where everything
is, because for some reason that's still
stored in my brain. Is this the best way
to play? Am I as good at Super Mario
Brothers as I might think?
What's bad about this solution? Yeah.
>> Exactly. Yeah. I've moved many more
times than I need to. And just for fun
today, what grade are you in?
>> Uh, seventh.
>> Seventh grade. Wonderful. So the seventh
grader's observation is exactly that:
we could have taken a shorter path,
which is essentially that way, albeit
making some straight moves. But we're
never going to find that shorter path,
never going to get the highest
score possible, if I just keep naively
following my well-trodden path. And so
how do we break out of that mold? And
you can see this even in the real world.
Another sort of personal example: I'm
the type of person, for some reason, who,
if I go to a restaurant for the first
time and choose a dish off the menu that I
really like, will never again order
anything else off that menu other than
that dish, because I know it is good. But
there could be something even better on
the menu, but I'm never going to explore
that because I'm sort of fixed in my
ways, as some of you from the smiles
might be too. But what if we took
advantage of exploring just a little
bit? There's this principle of
exploring versus exploiting when it
comes to using artificial intelligence
to solve problems. Up until now, I've
just been exploiting knowledge I already
have. Don't go through the red walls. Do
go through the green walls. Exploit,
exploit, exploit, and I will get to a
final solution. But what if I just
sprinkle in a little bit of randomness
along the way? Say 10% of the time,
as represented by this epsilon variable:
I, as the computer in the story, generate
a random number between zero and one,
and if it's less than epsilon, 0.1,
which is going to happen 10% of the
time, I'm going to make a random move
instead of one that I know will get me
closer to the exit. Otherwise, I'll
indeed make the move with the highest
value. Now, this isn't necessarily
going to win me the game that first
time, but if I play it again and again
and insert some of this
randomness, I might very well find a
better solution and therefore be a
better player, a better winner overall.
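The red-line and green-line memory, plus the epsilon trick, is roughly what reinforcement learning algorithms formalize. Below is a minimal Python sketch using tabular Q-learning, a standard technique swapped in here for illustration (the lecture doesn't name a specific algorithm). Everything concrete is an assumption: the 3x4 grid, the rewards (+1 for the exit, -1 for lava), the learning rate alpha = 0.5, the discount gamma = 0.9 (which makes shorter paths worth more), and epsilon = 0.1, matching the 10% above.

```python
import random

# Hypothetical 3x4 world: S = start, L = lava pit, E = exit, . = open floor.
GRID = ["S...",
        ".L.L",
        "...E"]
START = (0, 0)
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def step(state, move):
    """Apply one move and return (next_state, reward, game_over).

    Walking off the edge of the map leaves the player where it was.
    """
    r, c = state
    dr, dc = MOVES[move]
    nr, nc = r + dr, c + dc
    if not (0 <= nr < len(GRID) and 0 <= nc < len(GRID[0])):
        nr, nc = r, c
    cell = GRID[nr][nc]
    if cell == "L":
        return (nr, nc), -1.0, True   # punishment: fell into lava
    if cell == "E":
        return (nr, nc), 1.0, True    # reward: reached the exit
    return (nr, nc), 0.0, False


def choose_move(q, state, epsilon, rng):
    """Epsilon-greedy: explore at random with probability epsilon, else exploit."""
    if rng.random() < epsilon:
        return rng.choice(list(MOVES))
    best = max(q[(state, m)] for m in MOVES)
    # Break ties at random so an untrained player doesn't always go the same way.
    return rng.choice([m for m in MOVES if q[(state, m)] == best])


def train(episodes=2000, epsilon=0.1, alpha=0.5, gamma=0.9, seed=0):
    """Tabular Q-learning: nudge each (state, move) value toward what it led to."""
    rng = random.Random(seed)
    cells = [(r, c) for r in range(len(GRID)) for c in range(len(GRID[0]))]
    q = {(s, m): 0.0 for s in cells for m in MOVES}
    for _ in range(episodes):
        state = START
        for _ in range(50):  # cap how long one attempt can wander
            move = choose_move(q, state, epsilon, rng)
            nxt, reward, done = step(state, move)
            future = 0.0 if done else max(q[(nxt, m)] for m in MOVES)
            # Reward (or punishment) now, plus discounted value of where we landed.
            q[(state, move)] += alpha * (reward + gamma * future - q[(state, move)])
            state = nxt
            if done:
                break
    return q


def greedy_path(q, max_steps=20):
    """Replay the learned values with no exploration, like following the green lines."""
    state, path = START, [START]
    for _ in range(max_steps):
        move = max(MOVES, key=lambda m: q[(state, m)])
        state, _, done = step(state, move)
        path.append(state)
        if done:
            break
    return path


q = train()
print(greedy_path(q))  # the (row, col) cells visited from S to E
```

With epsilon set to 0, the agent would tend to keep replaying whatever route it stumbled on first; the occasional random move is what gives it a chance to discover shorter ones, and the discount gamma is what makes a shorter route score higher.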
If, just 10% of the time, I ordered
something else off the menu, I might
find that there's an amazing dish out
there that otherwise I wouldn't have
discovered. And so indeed, using that
approach, we can finally find a more
optimal path through the maze, a
shorter one, presumably therefore
maximizing our score and doing even
better than we might have by just
exploiting the same knowledge. You
can see this even in the game of
Breakout, especially if you write a
solution in code to play this game for
you. Let me go ahead and pull up another
video recording of an AI playing
Breakout. What this AI is doing is
essentially figuring out, maybe more
intelligently than you or I could, how
to play this game optimally. And what
we'll see here is that, just like the
pancake-flipping robot, there's some
notion of scoring and rewards and
penalties here. So right now, the
paddle is just doing random stuff. It
doesn't really know how to play the game
yet, but after 200 episodes it realizes
that, oh, my score goes up if I hit the
ball and it goes down if I
miss it. It's still a little
twitchy. It doesn't quite understand
what it's supposed to do and why. But if
you do it again and again and again and
it's rewarded and/or punished enough,
you'll see that it starts to get pretty
good, closer to what a good human
might do. But here's where the algorithm
gets a little creepy. If you let it play
long enough, or if you and I, the humans,
play long enough, you might find a
certain trick to the game. I dare say
the AI seems a bit scarily
sentient, in that it turns out if you're
smart enough to break through to that top
row, you can let the game just play
itself for you and maximize your score
without even touching the ball.
That's something I do find a little creepy:
that it figured out how to do that
without being told. But it's just a
logical continuation of rewarding it for
good behavior and punishing it for bad
behavior. So the next time you have an
occasion to play Breakout, consider that
kind of strategy: rather than doing
more of the work yourself, let the
computer do it for you instead. Well,
what else is there to consider in this
world of AI in the context of machine
learning? Well, there's specifically a
category of learning that's supervised.
And we've been using this for years. And
in fact, our first example of spam early
on was certainly supervised. Why?
Because it was you and I who were
putting the email into the spam
folder. To this day, maybe once a day, I
hit the keyboard shortcut in Gmail to
say, "Ah, this is spam. You should have
caught this." And that is training
Google's algorithm further, assuming
it's not just little old me, but maybe
thousands of people tagging that same
kind of email as spam. That's supervised
learning, in that there's a human in the
loop doing at least something. So
spam detection is one example.
But the catch is that labeling data in
that way manually just doesn't scale
very well. That would be akin to having
someone at Google or Microsoft labeling
every email or someone at Netflix doing
the same for all of the videos out
there. It's expensive in terms of
human labor. And there are certainly problems
out there with so much data that it's just
not realistic for humans to label
millions of pieces of data, billions of
pieces of data. We've got to move to an
unsupervised model. And so this is where
the world starts to consider deep
learning, solving problems using code
whereby you don't even have humans in
the loop in quite the same way. And
neural networks, inspired by the world of
biology, are the inspiration for
what is the state of the art, even
underlying today's rubber duck and, more
generally, these things called large
language models, like ChatGPT and the
like. So here, pictured somewhat
abstractly, is a neuron, which is
something in the human body that
transmits a signal, say from left to
right, electrically. And if you have
multiple neurons, they can
intercommunicate, so that if I
think a thought, then I know how to
raise my hand, because some kind of
message has gone electrically from my
head to this extremity here. So that's
in essence what I remember from nth
grade biology. But as computer
scientists, we abstract all of
this away. So instead of
drawing these two as neurons,
let's just start drawing neurons as
these little circles. And if they have
connective tissue of sorts between them,
we'll just draw a straight line, an
edge, between them. So this is what a
computer scientist would call a graph.
If you have two such neurons over here
leading to one neuron here,
you can think of this as being like
maybe two inputs to a problem and
one output there. With that, we can represent
the notion of problem solving, which is
what CS50 and intro courses more
generally are all about. So let's solve
a problem with a neural network without
necessarily training it in advance, just
letting it figure out how to answer this
question. Here's a very simple
two-dimensional world, an XY grid, and here
are two dots. And the dots in this world
are either blue or they are red. But I
have no idea yet what makes a dot blue
or red. However, if you train me on
those two dots, I bet I could come up
with predictions, especially if you let
me label this world in terms of x
coordinates on the horizontal,
y-coordinates on the vertical, and then
you know what? We can think of this
neural network very simply as
representing the x coordinate here, the
y-coordinate here, and the answer I want
to get is quote unquote red or blue or
zero or one or true or false, however
you want to think of the representation.
So, how do I get from a specific
(x, y) coordinate to a prediction of color if
I only know the coordinates? Well, from
the get-go, maybe the best I can do
is just divide the world into blue dots
on the left and red dots on the right: a
best-fit line, if you will, based on
very minimal data. Of course, if you
give me a third dot, it's going to be
pretty easy to realize that I was a
little too hasty. That line is not
vertical. So, maybe we pivot the line
this way. And now I'm back in business.
Now, I can predict with higher
probability, based on x and y, what color the
next dot will be. Give me enough of
these dots, and I can come up with a pretty
good best-fit line. It's not perfect,
and here's a hint at why AI is not
perfect, but 99% of the time, maybe I'll
be able to predict correctly. And I can
do even better if you let me squiggle
the line a little bit and make it
more than just a simple slope. So,
what is it we're really doing with
implementing this neural network, albeit
simplistically with just three neurons?
Well, essentially, we're trying to come
up with three values, three parameters,
an A, a B, and a C. And what do those
represent? Well, really just a solution
to this formula. The line we drew
can be represented, if you think back to
high school math, with a formula
along these lines, whereby it's A times x
plus B times y plus some constant C,
that is, Ax + By + C. And we
can just arbitrarily conclude that if
that value gives me a number greater
than zero, we predict it's going to be blue;
otherwise, we predict it's going to be red.
We can map our
mathematics, just like with tic-tac-toe,
to the actual problem we care about by
defining the world in this way. And so if
you give me enough data points,
I can come up with
answers for that A, that B, that C, the
so-called parameters of the neural network.
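One classic way to find A, B, and C from labeled dots is the perceptron learning rule, a single artificial neuron, swapped in here for illustration rather than as the method behind the lecture's slides. This minimal Python sketch uses the lecture's rule (blue if Ax + By + C > 0, otherwise red); the eight training points are hypothetical.

```python
# Hypothetical training data: (x, y) points labeled "blue" or "red".
DATA = [((2, 0), "blue"), ((0, 2), "red"), ((3, 1), "blue"), ((1, 3), "red"),
        ((4, 2), "blue"), ((0, 4), "red"), ((3, 0), "blue"), ((1, 4), "red")]


def predict(params, point):
    """Blue if A*x + B*y + C > 0, otherwise red."""
    a, b, c = params
    x, y = point
    return "blue" if a * x + b * y + c > 0 else "red"


def train(data, epochs=100):
    """Perceptron rule: whenever a point is misclassified, nudge A, B, C toward it."""
    a = b = c = 0.0
    for _ in range(epochs):
        mistakes = 0
        for (x, y), label in data:
            target = 1 if label == "blue" else -1
            if target * (a * x + b * y + c) <= 0:  # wrong side of the line (or on it)
                a += target * x
                b += target * y
                c += target
                mistakes += 1
        if mistakes == 0:  # every training point is on the correct side
            break
    return a, b, c


params = train(DATA)
print(params)                    # (2.0, -2.0, 0.0), i.e. the line x = y
print(predict(params, (10, 0)))  # blue
print(predict(params, (0, 10)))  # red
```

For data that a straight line can separate, this rule is guaranteed to stop making mistakes eventually; the "squiggly" boundaries mentioned above need more neurons and layers than this single one.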
Now, in reality, neural networks are not
composed of three neurons and a
couple of edges. They look a little
something more like this. And in
practice, they've got billions of these
things here on the screen, in which
case pretty much every one of these
edges represents some mathematical value
that was derived from lots and
lots of training data. And whereas I,
the computer scientist, might know what
these neurons over here represent,
because those are my inputs, three in
this case, and I, the computer
scientist, know what this one represents
at the end, if you took the hood
off of this thing and looked inside the
neural network, even though there'd be
millions or billions of numbers in
there, I can't tell you what this neuron
represents or why this edge has this
weight. It's because of the massive
amount of training data that that's just
how the math works out. And if you feed
me more data, I might change some of
those parameters more. So the graph
ultimately might look quite different,
but my inputs and my outputs are going
to be what I use to solve that
problem. So if you want to predict
rainfall from humidity and pressure, you
can have two inputs giving that one
output. Advertising dollars spent in a
given month might predict sales, again
just by having trained on such
volumes of data. And when we get now
full circle to something like CS50's
rubber duck and large language models
like Claude and Gemini and ChatGPT, what's
really happening? This is all hot off
the press in recent years; screenshotted
here are some of the research
papers that have driven a lot of this
advancement. From
OpenAI, say, you have a generative
pre-trained transformer, which is a lot
to say, but there's the GPT in ChatGPT.
Essentially, this is a neural network
that's been trained on large volumes of
textual information, and it gives us the
interactive chat feature that we have in
the class and that we all have more generally
in ChatGPT itself. So, an example of what
is actually happening underneath the
hood of these GPTs. Well, here's a
paragraph that, up until recent years, was
a hard paragraph to complete at the
dot dot dot: "Massachusetts is a
state in the New England region of the
northeastern United States. It borders
on the Atlantic Ocean to the east. The
state's capital is ..." Now,
most anyone living in Massachusetts
probably knows that answer. And if this
AI has been trained on lots and
lots of data, there are probably a lot of
people who say Massachusetts in one part of
a sentence and then the answer, which I
won't say yet, in the other part
of the sentence. But in this example,
given that the question we're asking is
so far from some of the useful
keywords, up until recently this was a
hard problem to solve, because there was
so much distance. Moreover, there are
these nouns that are being used to
substitute for the proper noun. We
suddenly start calling it a state, and we
call it a state down here. And it wasn't
necessarily obvious to AIs that we're
talking about the same thing, as it would be if it
were just "City, State," where you'd have
much more proximity. So in a nutshell,
what we now do, especially to solve
problems like these, is first break
down a sentence, or the training data or
input alike, into an array or a list
of the words themselves. We come up with
a representation of each of these words.
For instance, the word Massachusetts, if
you encode it in a certain way, is
going to be represented with an array or
vector of numbers, floating-point values,
so many that the word Massachusetts
in one model would use these 1,536
floating-point numbers to represent
Massachusetts, essentially in an
n-dimensional space. So not just an XY
plane, but somewhere virtually
out there. And then, and this has been the
key to these GPTs, attention is
calculated based on all of that data,
whereby in this picture the thicker
lines imply more of a relationship
between those two words. So
Massachusetts and state are inferred as
having a thicker line, a higher
attention from one word to the other,
whereas words like "a," "is," and
"the" have thinner lines because they're
just not as much signal to the AI as to
what the answer to this question is.
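The "thicker line" idea corresponds to attention weights. Here is a minimal Python sketch of scaled dot-product attention, the form used in transformers, with heavy simplifications that are assumptions for illustration: the four-number word vectors are made up (real models learn much longer ones, such as the 1,536 numbers above), and real models first pass each vector through learned query and key projections, which this sketch skips.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # subtracting the max avoids overflow
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(query, keys):
    """Scaled dot-product attention: how strongly one word attends to each other word."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    return softmax(scores)


# Hypothetical 4-number word vectors, invented for this example.
vectors = {
    "Massachusetts": [0.9, 0.8, 0.1, 0.0],
    "is":            [0.0, 0.1, 0.9, 0.1],
    "a":             [0.1, 0.0, 0.1, 0.9],
    "state":         [0.8, 0.9, 0.0, 0.1],
}
others = ["Massachusetts", "is", "a"]
weights = attention_weights(vectors["state"], [vectors[w] for w in others])
for word, weight in zip(others, weights):
    print(f"{word}: {weight:.2f}")
```

Because "state" and "Massachusetts" point in similar directions, their dot product is largest, so "Massachusetts" receives the biggest share of the weight: the thick line in the picture.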
Meanwhile, when you then feed that
sentence, like "the state's capital is," one
word per neuron here, the goal is to get
the answer to that question. And even
here, this is a way smaller
representation than the actual neural
network would be. But in effect, all
these LLMs, large language models, are
just statistical models: what
is the highest-probability word that it
should spit out at the end of this
paragraph, based on all of the Reddit
posts and Google search results and
encyclopedias and Wikipedia articles that it's
found and trained on online? Well, the
answer hopefully will be Boston. But of
course, 1% of the time, maybe less than
that, the answer might not be correct.
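Picking "the highest-probability word" versus occasionally picking a less likely one can be sketched in a few lines of Python. The probabilities below are hypothetical, invented for illustration; real models score tens of thousands of possible tokens and typically expose extra settings, such as temperature, that this sketch omits.

```python
import random

# Hypothetical next-word probabilities after "...The state's capital is".
NEXT_WORD = {"Boston": 0.99, "Cambridge": 0.006, "Worcester": 0.004}


def most_likely(probabilities):
    """Greedy decoding: always pick the single highest-probability word."""
    return max(probabilities, key=probabilities.get)


def sample(probabilities, rng):
    """Sampling: pick words in proportion to their probabilities, so rare ones sometimes win."""
    words = list(probabilities)
    return rng.choices(words, weights=[probabilities[w] for w in words], k=1)[0]


rng = random.Random(50)
print(most_likely(NEXT_WORD))  # Boston
draws = [sample(NEXT_WORD, rng) for _ in range(10_000)]
print(sum(d != "Boston" for d in draws) / len(draws))  # about 1% of draws, on average
```

Greedy decoding always answers Boston here, while sampling occasionally produces a wrong answer, which is one way a statistically reasonable model can still say something incorrect.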
And even CS50's own duck is fallible,
even though we've written lots of code
to try to put downward pressure on those
mistakes. And those mistakes are what
we'll call, lastly, hallucinations, where
the AI just makes something up, perhaps
because some crazy human on the internet
made something up and it was interpreted
as authoritative, or just by bad luck:
because of a bit of that exploration, 10%
of the time, 1% of the time, the AI sort
of veered this way in the large language
model, in the neural network, and spit out
an answer that just in fact is not
correct. And so I thought I'd end for
today on this final note, a poem with
which many of us might have grown up,
from Shel Silverstein, about the
homework machine, which years ago
somehow sort of predicted the state we
would be in with these AI machines. He
said, "The homework machine, oh, the
homework machine, most perfect
contraption that's ever been seen. Just
put in your homework, then drop in a
dime, snap on the switch, and in ten
seconds' time, your homework comes out
quick and clean as can be. Here it is:
nine plus four, and the answer is three.
Three? Oh, me. I guess it's not as perfect as I
thought it would be." This, then, was CS50.
See you next time.
All right, this is CS50, and this is
already week 8. Up until now, of
course, in so many of our problem sets,
we've been writing command-line
code in a black-and-white terminal
window, and everything is very keyboard-based,
very textual. But of course,
the apps that you and I are using
every day are in the form of a web
browser and on our phones. And so today,
and really for the rest of the semester,
we transition to using all of the
building blocks that we've been
accumulating over the past few weeks, but
redeploying them in the context of web
apps and, for your final project, for
instance, if you so choose, even mobile
apps as well. So today we're going to
understand how the internet that we use
every day actually works. We're going to
introduce you to a language called HTML,
which is the language in which web pages
are written; a language called CSS, which
is the language with which web pages are
styled; and then, lastly, JavaScript,
which of those is the only actual
programming language. Even though
we'll spend relatively little time on it,
you'll see that syntactically and
functionally it's very similar to C, to
Python, and to languages indeed that have
come before. All right. So we use the
internet every day. So what exactly is
it? Well, in the simplest form,
we've got networks in the world, and
networks are interconnections of
computers, whether with wires or
wirelessly. You have a network at home
nowadays, for the most part. You
certainly have a network on a campus
like this. In corporations, you have
networks: interconnections of
computers. As soon as you start
networking the networks, if not
networking the networks of networks, you
have, in effect, the internet: this
global interconnection of computers,
servers, devices, and so many other
things nowadays that we take
for granted every day. But how does it
actually work, and where did it come
from? Well, if we rewind to 1969,
the internet in its original form was really
something known as ARPANET, after the
Advanced Research Projects Agency, a
project from the Department of Defense
that was designed to interconnect
what limited supercomputers we had back
then that were otherwise geographically
inaccessible to so many researchers and
others. The internet, or ARPANET, really
just looked like this, with UCLA and just
a few other nodes, so to speak,
interconnected somehow. Just a year
or so later did we have Harvard and MIT
and others on the East Coast. And if we
fast-forward now to today, of course, we
can find and route data most anywhere in
the world. And in fact the world is now
filled with these things called routers.
A router is just a computer, a server,
that routes data up, down, left, right,
geographically. And of course, in the
real world, it might go out this wire
here, out this wire here, out this wire,
or out this wire. And in fact, just to
make more real what we're about to be
talking about when we talk about
networks of computers and eventually the
internet, we engaged some of our
teaching fellows over the past few years
to perform a little skit of sorts for
us using Zoom, if you will. Consider
each of the teaching fellows, or humans,
you're about to see as
representing a router, a device on the
internet whose purpose in life is to
route data. And what they're routing is
what we're going to start calling
packets, packets of information, which
metaphorically you can think of as just
like a little white envelope like this
that we use to send things via snail
mail, via the US Postal Service, or beyond
that, internationally. So I give you, in
just 60 seconds or so, what it means to
send a packet on the internet, for
instance from Phyllis in the bottom
right-hand corner to a familiar face,
Brian, at top left. If we could dim the
lights, if only to be dramatic.
Thank you. Sure, we can clap for that.
And we actually should clap for that,
because you're seeing the final
version, which looked kind of perfect,
but they were all smiling and clapping
because it took us so many damn takes to
actually get the coordination of
that correct. But for now, assume that
it was in fact correct. Notice
that among the takeaways from even
that little skit is that the packet, the
envelope from Phyllis to Brian, could
have taken any number of paths. It could
have gone up and then to the left. It
could have gone left and then up. It
could have zigzagged and the like. And
that's actually representative of how
the world now looks because of so many
wires and so many wireless connections.
There are actually a lot of ways that data
can travel from point A to point B. And
it turns out it's not even necessarily
going to take the shortest distance. It
might take the least expensive
route, or perhaps just the result
of how some humans, or some
servers automatically, have configured
the routes to get from point A
to point B. So let's consider how the
data is actually getting there. So long
story short, all of those routers, and
indeed all devices on the internet,
including the ones in your pocket or on
your laps, speak a language, more
technically a protocol, nowadays known as
TCP/IP. And this is actually a pair of
protocols, where a protocol is a set of conventions
that governs how computers behave on the
internet. In the human world, we have
protocols as well. For instance, when I
meet someone for the first time, I very
often instinctively sort of extend my
hand, just sort of hoping that they too
will extend their hand and shake. And
that's a human protocol in that it
governs how two people in that case
intercommunicate. Well, servers have the
same kinds of protocols, but they're all
text-based or bit-based instead of, of
course, physical. But TCP and IP
are two different protocols that solve
two different problems. And let's focus
on the last of them first. So IP, short
for Internet Protocol, is simply a
protocol that gives all of us
a unique address in the world. In other
words, there are these things called IP
addresses. It's a numeric address that
literally every computer in the world
has in order to uniquely identify it.
Case in point, in the real world, we
have addresses too. For instance, in
this building here, Memorial Hall, we're
at 45 Quincy Street, Cambridge,
Massachusetts 02138 USA. And
theoretically that unique identifier
should get an envelope in the physical
world to this location from any other in
the real world. IP, as applied to the
internet, just means that similarly
devices, Macs, PCs, phones, and
everything else on the internet have a
unique identifier as well, known as an IP
address. It's a number, but it's
typically formatted in dotted decimal
notation, so to speak. So it's something
dot something dot something dot
something. And just as a bit of trivia,
each of these number signs represents a
value from 0 to 255. So there are four
such values, apparently. And just doing
some quick week zero math, if each of
those values can be 0 to 255, how many
bits is an IP address, presumably?
>> So eight bits per number. And how many
was this?
>> So 32 bits, because if you're counting
from 0 to 255, well, that's 256 total
possibilities. That's 2 to the 8th,
which means 8 bits. 8 bits, 8 bits, 8
bits, 8 bits. So IP addresses are 32 bits.
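To make the 8-bits-per-number arithmetic concrete, here is a minimal Python sketch that packs four dotted-decimal values into one 32-bit number and unpacks it again, cross-checking against the standard library's `ipaddress` module. The function names `to_int` and `to_dotted` are hypothetical, made up for illustration, and 1.2.3.4 and 5.6.7.8 are just sake-of-discussion addresses.

```python
import ipaddress


def to_int(dotted: str) -> int:
    """Pack four 0-255 values into one 32-bit integer, 8 bits apiece."""
    octets = [int(part) for part in dotted.split(".")]
    if len(octets) != 4 or not all(0 <= o <= 255 for o in octets):
        raise ValueError(f"not a dotted-decimal IPv4 address: {dotted!r}")
    value = 0
    for octet in octets:
        value = (value << 8) | octet  # make room for 8 more bits, then fill them
    return value


def to_dotted(value: int) -> str:
    """Unpack a 32-bit integer back into four dotted-decimal values."""
    if not 0 <= value < 2**32:
        raise ValueError("an IPv4 address is exactly 32 bits")
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))


if __name__ == "__main__":
    for address in ("1.2.3.4", "5.6.7.8", "255.255.255.255"):
        n = to_int(address)
        assert n == int(ipaddress.IPv4Address(address))  # agrees with the stdlib
        assert to_dotted(n) == address                    # and round-trips
        print(f"{address:>15} = {n:>10} = {n:032b}")
```

The binary column makes the four 8-bit groups visible side by side.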
A little trivia that's germane only
insofar as it does kind of limit how many
total devices we could seem to have in
the world. If you've got only 32 bits,
how high can you count, roughly?
>> Two.
>> So 2 to the 32nd power, which we've
generally ballparked as 4 billion, which
is to say you can have 4 billion devices
total, it would seem, on the internet,
which is a big number. But there are also
a lot of humans nowadays, and odds
are most everyone in this room has at
least two devices to their name, maybe a
phone and a laptop with which you're
taking the course, maybe even more
devices thanks to the Internet of Things,
like smart home devices. We have so many
IP addresses being assigned to things.
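As a quick check on that ballpark, the snippet below just raises 2 to the number of address bits, using the 32-bit width from above and the 128-bit width of IPv6 that the lecture turns to next. It counts raw bit patterns only; some ranges are reserved for special purposes, so the number of usable public addresses is somewhat smaller.

```python
IPV4_BITS = 32   # four 8-bit numbers
IPV6_BITS = 128  # the wider IPv6 address

ipv4_total = 2**IPV4_BITS
ipv6_total = 2**IPV6_BITS

if __name__ == "__main__":
    print(f"IPv4: 2**{IPV4_BITS} = {ipv4_total:,}")  # 4,294,967,296, about 4 billion
    print(f"IPv6: 2**{IPV6_BITS} = {ipv6_total:.3e}")
    print(f"IPv6 offers 2**{IPV6_BITS - IPV4_BITS} times as many")
```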
So long story short, the world is
gradually transitioning from this
version here, IPv4,
to IPv6, which instead of using 32
bits is actually using 128 bits, which
is crazy large and gives us more than
enough IP addresses for the foreseeable
future. To be fair, we've been talking
about this for like 20, 30 years,
transitioning from v4 to v6, and it's
still gradually in motion. But for
simplicity in the class and in general,
we'll still use IPv4, if only because
it's a little easier to wrap your mind
around. Now, this is admittedly a pretty
arcane diagram. But this is the diagram,
ASCII art, if you will, that's in the
official specification of what we mean
by an IP datagram. More
colloquially, this is what a packet
actually looks like. Now, what are we
looking at? Well, you're just looking at
a grid of bits. So this here
represents 32 bits total, where this is
bit zero and that's bit 31, zero-indexed,
all the way over there. And then each
row represents 32 more bits, 32 more
bits, 32 more bits, which is to say
anytime a computer like Phyllis sends an
envelope of information on the internet,
it contains at least this information, a
whole bunch of bits broken down into
bytes. Now, the only ones we'll really
care about today are this one here,
source address, which is to say when
Phyllis sends that packet, she writes
her source address, her IP address,
something or other,
on the outside of the envelope, so to
speak. And she also puts Brian's IP
address, whatever that is, something
else,
on the outside of the envelope as well.
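For the curious, here is a hedged Python sketch of reading those two fields out of an IPv4 header with the standard `struct` and `socket` modules. It assumes the fixed 20-byte header with no options, in which the source address occupies bytes 12 through 15 and the destination bytes 16 through 19. `build_header` and `read_addresses` are hypothetical helper names, and the checksum is deliberately left as zero, so this is not a header a real network would accept.

```python
import socket
import struct

# Fixed 20-byte IPv4 header in network (big-endian) byte order:
# version/IHL, type of service, total length, identification,
# flags/fragment offset, TTL, protocol, header checksum,
# source address, destination address.
IPV4_HEADER = struct.Struct("!BBHHHBBH4s4s")


def build_header(src: str, dst: str, payload_len: int = 0) -> bytes:
    """Build an illustrative header; the checksum is left as 0 for simplicity."""
    version_ihl = (4 << 4) | 5  # IPv4, header length of 5 rows of 32 bits
    return IPV4_HEADER.pack(
        version_ihl, 0, IPV4_HEADER.size + payload_len, 0, 0,
        64,  # time to live
        6,   # protocol number 6 means TCP
        0,   # checksum, not computed here
        socket.inet_aton(src),
        socket.inet_aton(dst),
    )


def read_addresses(header: bytes) -> tuple[str, str]:
    """Return (source, destination) as dotted-decimal strings."""
    fields = IPV4_HEADER.unpack(header[: IPV4_HEADER.size])
    version = fields[0] >> 4
    if version != 4:
        raise ValueError(f"expected IPv4, got version {version}")
    return socket.inet_ntoa(fields[8]), socket.inet_ntoa(fields[9])


if __name__ == "__main__":
    # Phyllis (5.6.7.8) sending to Brian (1.2.3.4), as in the running example.
    header = build_header("5.6.7.8", "1.2.3.4")
    print(len(header), "bytes:", read_addresses(header))
```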
There's a whole bunch of other bits
involved, which are useful, but we'll
wave our hands at those for today. But
that really speaks to what's actually
happening. And if we do this
metaphorically in the real world, it's
kind of like taking out that envelope.
And for instance, if Brian's IP address
is 1.2.3.4
for the sake of discussion, Phyllis, in
advance of our filming that bit, would
have written something like 1.2.3.4
in the middle of the envelope, just like
we would in the real world. But
presumably, she wants Brian to be able
to reply, to acknowledge receipt or send
his own message. So she's also going to
put her IP address, for instance, in the
top left corner of the envelope,
5.6.7.8
for the sake of discussion, so that
Brian knows, when he writes out his own
packet of information, to whom to
reply. But at the end of the
day, it's all just bits being sent in
a specific pattern, and there is formal
documentation of the order in which
all of those bits will actually be sent
out on the wire or wirelessly. So in
short, IP ensures that all of us have
unique IP addresses via which data can
go from us or to us. But that's only one
problem. Nowadays, of course, servers
can do so many other things. They can do
email and chat and video conferencing,
game servers, and who knows what. And it
would be nice if a single server
could do multiple things. And
in fact, that's very much the case
with single servers nowadays. A server is
just a term of art for a computer used
to serve information to other people. By
contrast, our laptops and our desktops are
generally clients, because they only
serve one of us, not multiple people.
But these are just terms of art.
We're describing, at the end of the day,
still computers. IP only ensures that we
can uniquely address computers on the
internet. But there's another protocol
in TCP/IP, namely the TCP portion, that
allows computers to uniquely identify
services that they're offering to the
rest of the world. So for instance, TCP
allows a computer to
distinguish whether it has received a
packet that's an email or received a
packet that's a chat message or a piece
of a video conference or the like, which
is to say there's more than just IP
addresses on the outside of these
envelopes. There are also what are
called port numbers as well,
similarly numeric values
that usually range from
zero on up into the low
thousands, and they're standardized. For
instance, if you are requesting a web
page using http://,
with which all of us are
presumably familiar, unbeknownst to you,
on the outside of the virtual envelope
that your computer subsequently sends is
the port number 80, because when the
server receives that, it knows, oh, this
human is requesting a web page and not,
for instance, their email or something
else. Or nowadays, if you're using HTTPS,
where the S denotes secure in the URL,
you're actually using port 443, which is
just an arbitrary number that a bunch of
humans in a room decided on years ago to
standardize what goes on the outside of
an envelope. So just to be more clear,
when Phyllis is sending a request
to Brian, and if Phyllis, for instance, is
the client, just a human using a computer,
and Brian in this story is now a web
server, better yet a secure web server
that's somehow encrypting or scrambling
the information to keep it secure, well,
on the outside of this envelope, after
Brian's IP address, which was 1.2.3.4,
Phyllis is also going to write the
number 443 so that when Brian receives
and opens this envelope, he knows what
he's looking at: a request for a web
page and not an email or a chat message
or something else. Moreover, we can
continue the story just a little bit
further. Phyllis also writes on the
envelope not only her IP address, 5.6.7.8,
but some number as well in that top
left-hand corner, whatever it happens to
be, which is a port number via which
Brian can reply to her. In this way,
Phyllis can in effect have multiple tabs
open, be using Zoom and some chat
software or something else, running
multiple programs on her computer, and
the internet packets are all coming in,
but her computer knows to which tabs or
applications those packets belong. So,
if you really want to geek out, here's
what this thing looks like. This is just
the layout of bits for TCP as well,
which is to say, in addition to the
dozens of bits we looked at a moment ago
that standardize what IP is putting on
the outside of the envelope, TCP is
adding 16 bits that specify a source port
number, which means you can indeed have
tens of thousands of possible port
numbers, 16 more for a destination port number, and
a bunch of other stuff, including this
so-called sequence number, which happens
to be a 32-bit value, which is actually
pretty important, because quite often
messages sent on the internet
are pretty large. And it would be
nice if one person downloading a big
image or one person downloading or
streaming a movie doesn't mean that
no one else on the internet can do
something else at that moment in time.
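In the same spirit, here is a minimal sketch of the TCP fields just described, again assuming the fixed 20-byte header without options: two 16-bit port numbers followed by the 32-bit sequence number. The port-to-service table includes only the two ports mentioned in this lecture (80 and 443); `parse_tcp_header` and the sample source port 50000 are hypothetical.

```python
import struct

# Fixed 20-byte TCP header in network (big-endian) byte order:
# source port (16 bits), destination port (16), sequence number (32),
# acknowledgment number (32), data offset and flags (16),
# window size (16), checksum (16), urgent pointer (16).
TCP_HEADER = struct.Struct("!HHIIHHHH")

# Only the two standardized ports mentioned in this lecture.
WELL_KNOWN_PORTS = {80: "HTTP", 443: "HTTPS"}


def parse_tcp_header(segment: bytes) -> dict:
    """Return the ports, the sequence number, and a guess at the service."""
    src, dst, seq, *_rest = TCP_HEADER.unpack(segment[: TCP_HEADER.size])
    return {
        "source_port": src,
        "destination_port": dst,
        "sequence_number": seq,
        "service": WELL_KNOWN_PORTS.get(dst, "unknown"),
    }


if __name__ == "__main__":
    # Hypothetical segment from Phyllis's browser (port 50000) to a secure
    # web server (port 443); the remaining fields are zero or typical values.
    data_offset = 5 << 12  # header is 5 rows of 32 bits; no flags set
    segment = TCP_HEADER.pack(50000, 443, 1, 0, data_offset, 65535, 0, 0)
    print(parse_tcp_header(segment))
    print("highest possible port number:", 2**16 - 1)
```

Looking up the destination port is what lets one server tell a web request apart from, say, an email.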
So for the sake of discussion, suppose
that this very happy cat here is a very
large JPEG, for instance, a very large
graphical file. It would be nice, let's
say, if when Phyllis is trying to send
or receive an image as large as this,
it's not just in one massive envelope
that's going to prevent a whole bunch of
other users from similarly using the
internet at that moment in time. So, at
the risk of a bit of heresy, we can
actually tear this cat in half and
fragment it, really. And then inside of
Phyllis's envelope, or equivalently
Brian's reply, depending on where this
cat is coming from or going to, part of
that cat can go in this envelope. And
now, say, in the bottom left-hand corner
of this envelope, Phyllis or Brian could
write the sequence number in question:
one out of four, two out of four, three
out of four, four out of four, so that
when this and hopefully the other
packets arrive at their destination, the
recipient's computer can check, okay,
this was a really big file in this case.
Do I have all of the parts? That can
be inferred from the so-called sequence
number, which we've represented there in
the memo field of the envelope. There's
a bunch of other stuff that can go on
here too, including prioritization of
data as well. But ultimately, TCP
just allows servers to handle multiple
types of services and also allows them to
receive data reliably, because if, for
instance, a recipient only gets two out
of the four packets or three out of the
four packets, the fact that there's a
sequence number involved is enough
information for that recipient to say to
the sender, hey, I'm missing one or two
or three or more packets. Please resend
them. So in short, TCP guarantees
delivery by just doing some bookkeeping
on the outside of these envelopes. So in
short, IP allows us to uniquely identify
computers, and TCP guarantees delivery
and allows us to multiplex, so to speak,
among multiple services on the same
device. Any questions on this jargon
thus far? Because today's filled with
acronyms, unfortunately.
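The cat-tearing bookkeeping can be simulated in a few lines of Python. This is a toy model of the envelope metaphor, not real TCP: actual TCP sequence numbers count bytes rather than numbering packets "one out of four," and retransmission is driven by acknowledgments and timers. The names `fragment`, `missing`, and `reassemble` are hypothetical.

```python
import random


def fragment(payload: bytes, count: int) -> list[tuple[int, int, bytes]]:
    """Split payload into at most `count` (sequence, total, chunk) triples."""
    if not payload:
        return []
    size = -(-len(payload) // count)  # ceiling division, so no bytes are dropped
    chunks = [payload[i : i + size] for i in range(0, len(payload), size)]
    return [(seq, len(chunks), chunk) for seq, chunk in enumerate(chunks, start=1)]


def missing(packets: list[tuple[int, int, bytes]]) -> list[int]:
    """Sequence numbers the recipient would ask the sender to resend."""
    if not packets:
        return []
    total = packets[0][1]
    received = {seq for seq, _, _ in packets}
    return [seq for seq in range(1, total + 1) if seq not in received]


def reassemble(packets: list[tuple[int, int, bytes]]) -> bytes:
    """Put the chunks back in sequence order, refusing if any are absent."""
    gaps = missing(packets)
    if gaps:
        raise ValueError(f"still missing packets {gaps}")
    return b"".join(chunk for _, _, chunk in sorted(packets))


if __name__ == "__main__":
    cat = b"a very large JPEG of a very happy cat"
    packets = fragment(cat, 4)   # one out of four ... four out of four
    random.shuffle(packets)      # packets may take different routes
    lost = packets.pop()         # ...and one might not arrive at all
    print("please resend:", missing(packets))
    packets.append(lost)         # the sender resends it
    assert reassemble(packets) == cat
```

Because each piece carries both its own number and the total, the recipient can spot a gap without knowing anything else about the file.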
Questions on IP, TCP, or anything
else? Okay, so seeing none, as
promised, let's do yet another acronym.
So, it would be pretty tedious if
Phyllis and Brian and all of us humans
actually had to type IP addresses into
our browsers when visiting websites.
And in fact, most of us never do that.
Instead, we go to google.com or
harvard.edu, actual domain names, so
to speak, which are so much easier for
us humans to remember than these
arbitrary IP addresses that are either
automatically assigned to computers or
manually configured by humans
configuring servers. But there's another
acronym in the world, and there's another
technology used on the internet, namely
DNS, for Domain Name System. And this is
just a certain type of server that every
home has, even if you didn't know it.
Every campus has one; every company has one.
There are so many DNS servers around the
world, but their purpose in life, quite
simply, is to translate what you and I
know as domain names, like google.com,
harvard.edu, and the like, into their
corresponding IP addresses. And so in
short, inside of these DNS servers is
essentially a two-column table or
spreadsheet, however you want to think
about it, whereby here are all of the
domain names in the world, and here are all
of the corresponding IP addresses in the
world. And so when your Mac or PC or
phone is trying to
access google.com or harvard.edu,
that device, certainly when it's first
booted up, has no idea
what the IP address is for that server.
It's not the case that Apple or Google
are pre-installing billions of IP
addresses inside of our devices. But
your device is smart enough to ask the
local network at home, on campus, or at
work, well, what is the IP address of
google.com? What is the IP address of
harvard.edu? Then what your Mac, PC, or
phone actually does upon getting that
answer from one of these local DNS
servers is write the corresponding
IP address on the outside of that
envelope. So it's a wonderfully useful
service that just makes the internet
more useful for you and me, because
we can use names instead of IP addresses
as well. Technically, these things are
called fully qualified domain names.
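Conceptually, that two-column table is just a dictionary from names to addresses. Below is a toy Python sketch, not how a real DNS server is built: the entries are hypothetical and use 192.0.2.0/24, a block reserved for documentation, and the optional `fallback` stands in for asking some other server, with the answer cached afterward. `TABLE` and `resolve` are made-up names. Passing Python's real `socket.gethostbyname` as the fallback would consult your operating system's actual resolver, which needs a network connection.

```python
from typing import Callable, Optional

# Hypothetical entries. 192.0.2.0/24 is reserved for documentation,
# so these are placeholder addresses, not real servers.
TABLE: dict[str, str] = {
    "www.example.com": "192.0.2.10",
    "mail.example.com": "192.0.2.25",
}


def resolve(name: str, fallback: Optional[Callable[[str], str]] = None) -> str:
    """Look a name up in the table; otherwise ask `fallback` and cache the answer."""
    key = name.lower().rstrip(".")  # names are case-insensitive; drop a trailing dot
    if key in TABLE:
        return TABLE[key]
    if fallback is None:
        raise LookupError(f"no address known for {name!r}")
    address = fallback(key)
    TABLE[key] = address  # remember it for next time
    return address


if __name__ == "__main__":
    print(resolve("WWW.Example.com."))  # 192.0.2.10
    # To consult the operating system's real resolver (needs a network):
    #   import socket
    #   print(resolve("harvard.edu", fallback=socket.gethostbyname))
```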
Where do they come from? Well, some of
you might actually have your own
personal website. You might have gone
through this process. It's actually not
that hard to get your own domain name.
You can go to any number of what are
called internet registrars and pay them
some money, and it's essentially on a
rental basis. So you rent a domain name
for a year or maybe three or five years
at a time, and they can automatically
bill you. The domain name might be as
little as a dollar per year or thousands
of dollars per year, depending on whether
someone has scooped it up and is maybe
squatting or the like. But all you do,
ultimately, is pay someone money, and they
give you the rights to use that domain
name. And then what you do, technically,
is configure some DNS server
somewhere in the world to know what the
eventual IP address is for your server
that's going to serve up your domain
name's web pages. And long story short,
with DNS, I say that you have one in
your home and at your work and on your
campus because it's a very hierarchical
kind of structure. There are out
there somewhere these so-called root
servers that essentially know where to look up
the IP addresses of all of the
.coms, for instance, or all of the .us or
the like. But my Mac doesn't know that,
and so my Mac might actually ask that
root server, what is that IP address? But
more efficiently, my Mac is better
still going to ask the local network
first. When I'm at home, it asks my home
DNS server, which is built into the
little home router that you've got
somewhere in there, or if you're on
campus, it asks Harvard's DNS server. And
this whole design is recursive, to borrow
a term from a few weeks ago, in that if
my computer doesn't know the answer to
what's the IP address for this domain,
and if Harvard doesn't know the answer, it
eventually gets escalated to those
so-called root servers, but then cached,
that is, remembered, by all of these other
DNS servers along the way. So, it's a
very elegant hierarchical design, but at
the end of the day, it's just doing
this. It's a big cheat sheet of domain
names to IP addresses, and the server is
responding for us. All right, one more
acronym. So, how do I know what my Mac's
IP address should be? How do I know what
my phone's IP address should be? How
do I know what the IP address is of the
DNS server of whom I should be asking
any of these questions? How do I know
the IP address of the router to which to
hand my data off? There are a lot
of assumptions built into the story
we've been telling. And the answer,
unfortunately, is yet another acronym:
DHCP is the solution to all of those
problems. And it wasn't always. You
know, back in my day, we used to have to
manually type in what our computer's IP
address was, based on what some human
told us it would be. We had to type in
our DNS server and type in our router
address. But now DHCP is just
yet another server running in your home
network, running on campus, running in
your corporate network, whose purpose in
life is to answer questions of the form,
what is my IP address? Which is to say,
when you boot up your Mac, your PC, your
phone for the first time, it essentially
broadcasts a message like, hello world,
what's my IP address? And hopefully
there's one such DHCP server on that
local network, wired or wireless, that
will respond, based on how Harvard or
Comcast or Verizon or someone at home
has configured it, to tell you what your
device's IP address is, what the IP is of
your local router, what the IP address
is or are of your DNS servers, and the
like. And so this is why things just
work nowadays once you've connected to
a Wi-Fi network or physically
plugged in. Dynamic Host Configuration
Protocol didn't always exist. Wonderful
that it now does. All right, enough sort
of outside of the envelope stuff.
Everything else today will be a deeper
dive inside of this envelope
to look at what actually are the
messages that we are sending and receiving,
how we're structuring the web pages
and designing everything that comes back
from the server to the client. And let's
dive in, then, to this acronym HTTP, which
you've been typing for years, or seeing
for years, even though you don't really
have to type it anymore, because browsers
just assume that this is what you want.
But HTTP is another protocol, Hypertext
Transfer Protocol, whose purpose in life
is to request web pages and receive web
pages. As a protocol, it just
standardizes what goes inside of
that envelope when you're trying to use
the web. There are different protocols
for email, different protocols for Zoom,
different protocols for Discord, and any
number of other internet services. We'll
focus predominantly today on HTTP, which
happens to use ports 80 and 443, among
others, as we saw. So let's see what
HTTP is all about, or HTTPS, the
corresponding secure version thereof. So
here is a canonical URL, in that it
has a whole bunch of components. Let's
consider what some of the jargon is that
we're going to start taking for granted.
So if you go to https://www.example.com/,
you are implicitly requesting the root
of that website. Root just means the
default directory, the default folder, if
you will. And that's what the yellow
highlighted slash here means, like,
give me the default web page.
Technically speaking, what you're going
to receive in your browser, unbeknownst
to you, is an actual file. By
convention, it's a file called
index.html,
maybe index.htm, or any number of other
files. But it would be pretty stupid if
we as humans all had to type out the
actual file name that we want. So the
server by default is just going to
return the root of the website to you. If,
though, you're inside of a folder, or you
do actually click on a link that leads
you to a file, you might very well have
at the end of this domain name a full
path as well, which might contain zero
or more folder names and
zero or one file name as well.
In fact, it could be explicitly
/file.html or /folder/ or /folder/file.html.
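Python's standard library can split URLs like these into parts with `urllib.parse.urlsplit`. The sketch below also labels the host name, domain, and top-level domain, pieces of jargon this lecture walks through next. That labeling is a naive assumption (with three or more labels, the first is the host and the last two are the domain) that breaks for names like www.example.co.uk, and `describe` is a hypothetical helper.

```python
from urllib.parse import urlsplit


def describe(url: str) -> dict:
    """Label the parts of a URL using the jargon from this lecture."""
    parts = urlsplit(url)
    labels = (parts.hostname or "").split(".")  # hostname comes back lowercased
    return {
        "scheme": parts.scheme,                       # "https" or "http"
        # Naive assumption: with 3+ labels, the first one is the host name.
        "host_name": labels[0] if len(labels) > 2 else "",
        "domain": ".".join(labels[-2:]),              # e.g. "example.com"
        "tld": labels[-1],                            # e.g. "com"
        "path": parts.path or "/",                    # an empty path means the root
    }


if __name__ == "__main__":
    for url in ("https://www.example.com/", "https://www.example.com/folder/file.html"):
        print(describe(url))
```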
You've probably seen thousands of these
over time, even if you haven't really
given it much thought. So we, from today
onward, will be creating all of this
stuff here, but we need to understand
what's going on to the left too. So here
is the so-called domain name, or more
properly the fully qualified domain name,
and it has a few different parts too. So
this is technically the domain name as
we all refer to it, something.com.
.com means commercial, and that .com is more
specifically known as a top-level domain, or
TLD. Back in the day, there were only a
few of these: .gov, .com, .net,
.org, .edu, and a bunch of others. Now,
there are hundreds, if not thousands, of
them. Many of them aren't really used
prominently in the wild, but there are
some not on that original list. For instance,
CS50 uses .io a lot, which doesn't mean
input/output. It's actually a two-letter
country code that has been
essentially rented to us and anyone else
using that same TLD, because in the
English-speaking world, .io actually
sounds kind of cool. It kind of
connotes input and output. .tv is
another one that actually belongs to a
country but in fact also sounds like,
in English, television, and so that too
has been used as well. But in general,
there are top-level domains like these.
Some of them now are full words; some of
them are two characters, denoting that they
belong to a country. They are, indeed, the sort of
top-level categorization of
all of these websites. Meanwhile, many
URLs, but not all, also have something to
the left of the domain name known as a
host name, which, technically speaking,
refers to the name of the server that
you're requesting specifically. It
doesn't have to be literally one server.
www can refer to dozens, hundreds, or
thousands of servers. Indeed, if you go
to any popular website like gmail.com or
the like, even though you only have one
domain name, somehow or other,
technologically, it is referring to
clusters of hundreds or thousands of
servers that ensure that they can handle
all of the customers that might visit
that site. And then lastly, there's
the scheme, or the protocol in use
specifically. And for our discussion
today, it's always going to be HTTPS,
which is ideal because it's secure and
encrypted somehow. But it can also
indeed be http://. So that's it. That's
just the jargon with which you
should be familiar when it comes to URLs
like these. And what we'll be doing
today is actually creating content that
lives at URLs like that and serving it
up to us. But what do the messages
that are going inside of these envelopes
ultimately look like? The URLs
are just getting us to the
right place. But how do we express, in
some form of code, that we want this
file from this server, using encryption
in this way? Well, inside of the virtual
envelopes that Phyllis was sending to
Brian, and that he would have ultimately sent
back to her, are messages that look like
this: GET, POST, and a bunch of other
verbs, if you will. So, HTTP supports a
bunch of operations, or verbs, namely
GET, POST, and a few others. And it was
the first of these that Phyllis
would have put inside of her envelope
initially in order to get a web page,
like a cat, from Brian. Specifically,
inside of the envelope, she would have
had a textual message. It's not code per
se. There are no functions or loops or
variables or anything like that. It's a
protocol just in the sense that humans
years ago standardized what messages
should appear inside of those envelopes
if you want to get a web page from a
server. So for instance, if Brian in
this story is now suddenly harvard.edu,
specifically www.harvard.edu,
Phyllis's envelope would have contained
a message saying GET, in all caps, then /
if she just wants the root or the
default page from Brian's server, then the
version of HTTP that she's using, for
instance, version 2. And she would
also specify, just in case Brian is
multitasking and serving up websites for
different domain names on the same
physical box, which actual host she
wants, and maybe a bunch of other lines
as well. And hopefully, if all goes well,
Brian would have responded with an
envelope of his own containing an HTTP
response in answer to her HTTP request.
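Here is a minimal Python sketch of what those textual messages look like, written in HTTP/1.1's plain-text form. HTTP/2, the version in the example above, carries the same information in binary frames, though tools like curl display it as text. `build_request` and `parse_response_head` are hypothetical helpers, and the sample response is made up for illustration rather than captured from harvard.edu.

```python
def build_request(host: str, path: str = "/") -> str:
    """An HTTP/1.1 GET request: a request line, a Host header, a blank line."""
    return f"GET {path} HTTP/1.1\r\nHost: {host}\r\n\r\n"


def parse_response_head(raw: str) -> tuple[str, int, str, dict[str, str]]:
    """Split a response into its version, status code, reason, and headers."""
    head = raw.split("\r\n\r\n", 1)[0]  # the headers end at the first blank line
    status_line, *header_lines = head.split("\r\n")
    parts = status_line.split(" ", 2)
    version, code = parts[0], int(parts[1])
    # HTTP/2 has no reason phrase; curl prints such lines as e.g. "HTTP/2 200".
    reason = parts[2] if len(parts) > 2 else ""
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # names are case-insensitive
    return version, code, reason, headers


if __name__ == "__main__":
    print(build_request("www.harvard.edu"), end="")
    # Made-up response for illustration, not captured from a real server.
    sample = "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<!DOCTYPE html>"
    print(parse_response_head(sample))
```

The Host header is what lets one physical server tell which of its websites Phyllis wants.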
And Brian's envelope would have
contained a textual message that just
confirms what version of HTTP he's
using, a status code, which is an arcane
number that just indicates in this case
that everything is okay, all is well,
and he would specify the type of content
he's sending back to her in his own
envelope, because it could be HTML (more
on that later today). It could be a
JPEG, it could be a GIF, it could be any
number of other file formats. And this
is just a hint to Phyllis's browser as
to what's going to be inside of that
envelope she is getting back.
And then maybe a bunch of other
stuff as well. So even though some of
these underlying
implementation details might visually be
new to you if you've never really
thought about them, it turns out we, as
aspiring programmers, can actually see
and poke around with these building
blocks and ultimately, today, take
advantage of them. So you're about to
see a program called curl, which
stands for client URL. It's installed
on Linux systems like cs50.dev. It
also comes with Macs and PCs quite
frequently, or you can easily install it.
And essentially, it's a headless browser
that allows you to pretend to be a
browser and grab the response from a
server by actually
sending the contents of an envelope like
this. So for instance, if I want to
pretend to be a browser and request
harvard.edu, I can type this in my
cs50.dev terminal window. And let me go
ahead and maximize its size and do the
following: curl -I, which specifically
is only going to show me the headers,
the text that we were just talking
about, and it's not going to fetch any of
the contents of Harvard's website. So:
curl -I https://www.harvard.edu/.
So if I were typing this into a browser,
I would actually see Harvard's homepage.
In this case, I'm just going to see the
contents of the envelope as black and
white text on the screen, specifically
only the first few lines, the so-called
headers that the server is responding
with, just as I claimed Brian would to
Phyllis. I hit Enter, and there are indeed
more lines than I had on my slide, but
you can see that everything is in fact
200 OK. This is a convention: 200
means all is indeed okay. There's a
bunch of other information here,
including the date and time at which
this response came back. Here's that
content-type, text/html, and then some
other details and a whole bunch of other
information as well. So that's one way
of seeing what's going on underneath the
hood. Well, what other responses might
come back? Well, it turns out that 200
OK is the best possible outcome, but
there are a bunch of other
outcomes that are possible as well. For
instance, sometimes you'll get not 200
but 301, which means moved permanently.
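Python ships a table of these codes as `http.HTTPStatus`, which is handy for turning a number like 301 into its standard phrase. The `status_class` helper below is hypothetical; it simply groups codes by their first digit, the same 2xx, 3xx, 4xx, and 5xx families this section goes on to describe.

```python
from http import HTTPStatus


def status_class(code: int) -> str:
    """Classify a status code by its first digit."""
    return {
        1: "informational",
        2: "success",
        3: "redirection (go elsewhere)",
        4: "client error",
        5: "server error",
    }.get(code // 100, "unknown")


if __name__ == "__main__":
    for code in (200, 301, 404, 500):
        print(code, HTTPStatus(code).phrase, "->", status_class(code))
```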
Colloquially speaking, what
does this mean? Well, if a server
responds to a browser with a numeric
code of 301, that means that the browser
is supposed to go to this location
instead. It's sort of like putting a
detour sign on the server that says
there's nothing for you here; go over
here to this location instead. And now
notice, in this example, it's telling the
user to go to https://www.harvard.edu/.
That's actually what I typed
before, so I would not have seen that
myself. But if I go back to VS Code here
and run the exact same command, but
try to visit the insecure version
of Harvard's website, http://, which
just means that anyone else on the
internet can technically see what it is
I am now doing with my browser, which
might not be desirable, and hit Enter, this time
Harvard's server does not just tell me 200
OK. It actually says 301 Moved
Permanently. And if I read lower in these
lines, there indeed is the location to
which I should actually go, and it's a
subtle difference. It's forcing me to go
to HTTPS instead, without actually
showing me the contents of Harvard's
website. So nowadays, you and I don't
even have to think about this. You and I
are surely not even in the habit of
typing http://
or https://.
But the browser is ensuring in this case
that you are redirected, so to speak,
automatically to the secure version of
that site instead. Now, there are other
status codes, and in fact, even if you
never realized it before now, what
numeric code do you
sometimes see on the internet when
something goes wrong? 404. So 404 is a
weirdly public, arcane error
number, or status code, that just means
file not found. And we can simulate this
as follows. For instance, if I, in my
terminal window, do curl -I https://www.harvard.edu/cats,
I'll suppose that Harvard has a whole
department dedicated to cats, which it
does not. But if I hit Enter here,
you'll see that I get an HTTP 404
status code, which just means the
web page does not in fact exist. And if I
visited https://www.harvard.edu/cats
in my browser, I would presumably see
some error page that may or may not show
me visually 404. But many websites, most
websites, for better or for worse,
reveal this number, so much so that most
everyone in this room is probably
familiar with 404, even though its
origin is this very low-level, arcane
status code buried in the HTTP headers
inside of envelopes like these. There's
a whole bunch of others, if you'd like
some fun facts. 200 is indeed OK.
301 is Moved Permanently. There's a
bunch of other 300 ones that all relate
to going elsewhere. 400s generally mean
that you as the user have somehow done
something wrong, or, next week, as we start
writing code that talks to web servers,
maybe your code has done something wrong
when requesting a website. 500s are
really bad. They mean the server is
messed up somehow. Either it's not
available or the programmer made some
bug in their code such that it's
crashing with, for instance, something
like an internal server error. We
included 418, which is not actually a
thing, but it was a fun sort of
April Fools' joke years ago where a
bunch of humans thought it would be
funny to write up a whole specification
for what it means for a server to
respond with a number of 418. Inside
joke, not funny at the moment, but it
is sort of part of internet lore
nowadays. We can have a little bit of
fun with this, maybe at the
expense of our dear friends down the
road. For years now, someone has
been paying for the following
behavior. Let me go back to VS Code
here in my terminal window. Let me do
curl -I http://safetyschool.org.
Have you ever been there, perhaps?
Well, let me actually go to
http://safetyschool.org in my browser
and, just for fun, hit Enter. Oh my
goodness, look at where we are. So, how
is this implemented? Well, if I finish
what I began over here by just looking
at the HTTP headers inside of the
envelope that comes back from
safetyschool.org, you'll see that
for like 20 years, presumably some
Harvard alum has been paying the bill to
rent this domain name just to have this
trick implemented, such that 301 Moved
Permanently has been directing people ever
since to yale.edu. There's a bunch of
others if you go down the rabbit hole of
looking on Reddit and the like: Stanford,
Berkeley; there's a healthy competition
on the East Coast and West Coast. But it all
boils down to a very arcane understanding
of how HTTP works, the protocol that
governs how data is sent from web
browsers to web servers. Now, you can of
course use curl for connecting to URLs
in the context of something like CS50.
You could have been doing stuff
like this all the time, though, with your
actual browser. So, I'm using Chrome
here, but most any browser nowadays has
the ability to give you developer tools
uh natively, which is to say somewhere
there should be an a menu option that
lets you use developer tools that are
conducive to someone who knows a bit of
programming to poking around underneath
the hood of the browser and see what's
going on. For instance, I'm going to go
ahead and open up a new window here, and
I'm going to rightclick on the
background, or I can go to the
appropriate menu in Chrome's dot dot dot
menu, and I'm going to go to inspect,
which pulls up what we're going to call
developer tools. I'm doing it incognito
mode for reasons we'll see next week.
This has the effect of clearing
automatically any of my cookies, my
browser history, because most anytime I
do something with the web browser today,
I want to pretend like I'm doing it for
the very first time so that the behavior
is exactly as we suspect. uh expect. So
down here, now that I've opened up the
so-called developer tools in Chrome, and
they look almost the same in Safari and
Edge and a bunch of other browsers as
well, I will see a tab called Elements,
which shows me all of the elements of
this web page once it appears, including
the so-called HTML code we're about to
write. I can see a Console where error
messages might sometimes appear, similar
in spirit to the terminal window in VS
Code. I can also see, under Network, the
connections that the browser is making
to the server. And that's where I
thought we'd focus our attention first.
Here I have a brand new browser window.
I'm clicking on Network over here.
Just to make sure we can see everything
without it getting automatically
deleted, I've clicked on Preserve log
and Disable cache just so that it
behaves exactly as expected. And now
let's go up here for the first time in
this incognito window and go to
http://safetyschool.org.
Enter. And you'll see a whole bunch of
output, including this warning in this
particular mode. This is increasingly
common nowadays for websites that do not
support HTTPS, which this alum hasn't
been paying for. You'll typically get a warning
that says you might not
want to do this because the whole world,
at least the whole world between you and
point B, might know what it is you're
accessing on the web. I can go ahead and
pass through this. In fact, once I do
that and click on Continue to site, we'll
see even more output at the bottom, a
whole bunch of output that's kind of
overwhelming. Notice at bottom left
here, just going to safetyschool.org
resulted in 61 HTTP requests, in effect,
61 envelopes going back and forth. I'm
going to focus, though, on the ones at the
very top here. When we finally
clicked through that warning, I got
back a response from the server, having
visited safetyschool.org, and here is
Chrome's presentation of the same
information that curl was showing me in
my terminal window. The message that
came back was 301 Moved Permanently. The
method, or verb, being used was GET.
There are some mentions of the IP
address in question here and a whole
bunch of other stuff that we'll wave our
hands at for today. So all of this time
you could have seen the same. Let's try this
with some cats. Let me click on the
little Ghostbusters symbol to clear
everything down in the developer
tools. Let me zoom out, and this time let
me go to https://www.harvard.edu/cats,
which, recall, did not exist
according to curl. If I hit enter, I do
see a web page. It's interesting that
Harvard has chosen to fairly arcanely
reveal to all visitors 404, which means
nothing to most people except insofar as it's the status
code. But if I scroll through all of
the 59 requests that were involved in
just displaying this very graphical page
and go back to the top, you'll see, by
clicking on the first row, for cats
itself, that I used GET to get that
URL, /cats, and in the end it was indeed
404 Not Found. So you can sort of have
all this fun on your own by just poking
underneath the hood at what your browser
has been hiding from you all of this
time.
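Status codes like these are standardized, and Python's standard library ships a table of them in the http.HTTPStatus enum. A small sketch, assuming Python 3.9 or later (the version that added 418 to the enum); describe is a hypothetical helper that pairs each code's official reason phrase with its class, which is determined by the first digit.

```python
from http import HTTPStatus


def describe(code: int) -> str:
    """Return e.g. '404 Not Found (client error)' for a status code."""
    # HTTPStatus raises ValueError for codes it doesn't know about.
    phrase = HTTPStatus(code).phrase
    classes = {
        1: "informational",
        2: "success",
        3: "redirection",
        4: "client error",
        5: "server error",
    }
    return f"{code} {phrase} ({classes[code // 100]})"


for code in (200, 301, 404):
    print(describe(code))
# 200 OK (success)
# 301 Moved Permanently (redirection)
# 404 Not Found (client error)
```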
All right. Any questions now before we
dive in?
No? All right. Well, that's the Network
tab. Let's look at some of the others
and see how we can start writing this
stuff ourselves. Let me go to stanford.edu.
Enter. A whole bunch of things will fly
across the screen, but this time I'm
going to go to the Elements tab. And
what we're about to dive into is an
actual language, not a programming
language, a markup language called HTML,
hypertext markup language, whose purpose
in life is just to tell browsers what to
display on the screen. So here is all of
the so-called HTML that some human or
humans or software at Stanford wrote in
order to create Stanford's homepage,
which as of today looks lovely like
this. The interesting thing, though,
about the code that Stanford has written
to generate this website is that it's
being sent to me as a copy. And this is
quite unlike the code we've been writing
thus far. When you wrote code in
Scratch, it was sort of there in the
browser and stored on MIT's servers. When
you wrote C code and ran it, it was
inside of the code space and not given
to any user who might access it. The way
the web works though is a little bit
different. Inside of those envelopes are
literally copies of what's on the server
being sent to the browser. And so it's
your browser, the so-called client,
that's actually reading that code, HTML
in this case, top to bottom, left to
right, and figuring out how to display
it. It's not executed on the server per
se. Now, that story is going to change a
bit next week when we start using Python
to dynamically generate HTML so that
we're not writing all of this code by
hand after this week, but for now,
everything you see was the result of the
browser executing code that Stanford
wrote. The implication of that is that
we can have a bit of fun with these same
developer tools. For instance, if I
control-click or right-click on something
like the word Stanford in the
middle of their homepage and choose that
same Inspect option, what's nice about
these developer tools is it's going to
jump to the very line of code that
created that Stanford brand name in the
middle of the web page. And this is a
wonderful teaching and learning tool
because in the days to come, when you're
trying to learn more and more HTML, you
can literally do this for any website on
the internet and understand how it is
someone implemented a design, for
instance, that you really like, and you
can learn from other websites how
they've constructed the same. So over
here you'll see that the word Stanford
is just in the source code of this page,
in the so-called HTML, and, you know, just
for fun I can change it to Harvard. Hit
enter, and now Stanford's website looks
like we've been there and
hacked it. Of course, it's not that easy
to hack Stanford's website. What have I
presumably only done just now?
I've changed my local copy of that
particular website. So, if I just click
on the reload icon, I'll actually see
that Stanford's website, for better, for
worse, still looks like that. But this
speaks to now the control that we have
within our browser to actually
manipulate and learn from what it is
that's going on underneath the hood. So,
let's dive into this language called
HTML, hypertext markup language. It's
not a programming language, which means
we're going to fly through it even
quicker than usual because it really
just contains some basic building blocks
that do have some interesting
intellectual design under them, but for
the most part, it ultimately becomes an
exercise in just looking up
other tags that exist, reading the
documentation, and figuring out how you can
use them to implement other features in
websites. So, let's take a look at
perhaps the simplest of web pages and
specifically glean from it what tags
are and what attributes are, really the
only two terms of art that are going to
be germane for this particular
language. No loops, no conditionals, no
variables, no complexity really other
than basic building blocks like these.
So here is HTML for the simplest of
websites. This is like a mini version of
what Stanford's team presumably wrote
on their server, but it's only like a
dozen lines of code instead of hundreds
or thousands, however long that website
was. Any web page written today,
assuming it's using the latest version
of HTML, which happens to be version
five as of today, begins with code
that looks like this. This kind of code
will presumably be stored in a file
called file.html,
index.html, Stanford.html, whatever
the file is actually named. This is
simply what's going to be inside of the
contents. You could save this file on
your own Mac, open it up, and your
browser would open it, but you're going
to be the only one in the world that can
actually see the contents of that web
page if it's just on your Mac or just on
your PC. So, we of course are going to
be writing HTML on a server so that not
just you, but in theory, especially for
your final project, anyone in the world
with an internet connection can access
the same. So, we, within the context of
cs50.dev, are going to start using
this new command, http-server, whose
purpose in life is just to serve up
files via HTTP. Now, there's kind of an
interesting design going on here because
if we use cs50.dev,
otherwise known as GitHub Codespaces,
there's already a web server running on
that website, because when you go to
cs50.dev and log in, you get
redirected to some longer URL. You're using
a web application, aka VS Code, that
allows you to write code in the cloud.
Now, that application by default is
running on ports 80 and 443. So, it
doesn't matter if you start with HTTP or
HTTPS; both will work. But that means
that the code that we write today and
that you write for the next problem set or
for your final project can't live at
port 80 or port 443, because GitHub, the
company that hosts this, is already
using those default standard ports. But
we can use any number of other port
numbers. I claimed earlier there are tens
of thousands of numbers that we could
use. So that's what we're actually going
to do. So let me go back to VS Code
here. Let me shrink down my terminal
window. Let me create a first file today
called, for instance, hello.html.
Enter. And now I've got an empty tab as
usual. I'm going to very quickly whip up
the exact same contents that we just
saw. So an angle bracket, an
exclamation point, DOCTYPE html, close
bracket, then open bracket html, close bracket, and
notice the autocomplete kicked in for
this particular language. So I don't
have to type everything myself. Inside
of this tag, so to speak, I'm now going
to put a head tag, inside of which is
going to be a title tag. I'm going to
say something like hello, title, just to
be quick. And then down here, below those
lines, I'm going to put a so-called body
tag, inside of which is hello, body, just
for some quick text. And that's it. This
is now a file inside of my code space.
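Since this same boilerplate gets copied into every page today, here is a small Python sketch, a preview of generating HTML from code rather than anything CS50 provides, that fills a title and body into that skeleton. TEMPLATE and render_page are hypothetical names; the title is escaped so that characters like < can't be mistaken for tags, while body_html is inserted as-is, on the assumption that it is already HTML.

```python
from html import escape

TEMPLATE = """<!DOCTYPE html>

<html lang="en">
    <head>
        <title>{title}</title>
    </head>
    <body>
        {body}
    </body>
</html>
"""


def render_page(title: str, body_html: str) -> str:
    """Fill the HTML5 boilerplate with a title and some body markup."""
    # escape() turns <, >, & and quotes into entities like &lt;
    return TEMPLATE.format(title=escape(title), body=body_html)


print(render_page("hello, title", "hello, body"))
```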
And there's no command to just compile
or run this in the terminal because the
goal is going to be to open this HTML
file with a browser. If I want to do
that in another browser tab, I need to
tell my code space to serve that
file via HTTP. So, the simplest way to
do this is as follows: http-server,
Enter. You're going to see a whole bunch
of text on the screen. You're going to
see a green button hopefully pop up that
says Open in Browser, which is going to
allow you to open up, and I'll zoom in,
the contents of the current folder with
a web browser. My URL has changed to be
different from what it was a moment ago.
What we're looking at at the moment is
just a directory listing, a directory
index, of all of the files in my code space.
I came in advance today with my own
folder of code, like we usually do,
src8, which contains all of today's
pre-made examples. But here is the file
I just created a moment ago. And if I
click on hello.html,
I now see the simplest of web
pages. It's a little underwhelming, but
clearly here's hello, body, which takes
up like 95% of the screen, the so-called
viewport, which is just a big
rectangular region of the screen, and
there's the title in the tab up there.
So, if you've ever wondered or cared
where the content in a web
page comes from, well, here's the body
content. Here's the head, or the title
content. And then everything else is
just sort of icing on the cake. So, I've
written at this point a file called
hello.html.
It has yielded this effect of having
something in the head of the page and
something in the body. But let's
actually tease apart what just happened.
At the start of any file written in
this language called HTML, the latest
version thereof, five, it literally just
starts with this. And this is just the
kind of thing you memorize or copy and
paste: open bracket, exclamation point,
DOCTYPE html, close bracket, over there.
It looks a little bit different because
we're not going to use, for the most part,
the exclamation point syntax anywhere
else unless we're using an HTML comment.
So HTML has comments just like Python, C,
and other languages. But let's focus
really on this juicier part. Here we
have what's known as an element in
HTML. An element includes a start tag
and an end tag, or equivalently an open
tag and a close tag. So here, for
instance, is syntax that essentially is
going to tell the browser, when my
browser reads this file top to bottom,
left to right, "Hey browser, here comes the
HTML of my page, and the language in
which the contents of this page are
written is English." So html, all
lowercase, is the name of the tag, so to
speak, and equivalently the name of the
element. And lang is what's going to be
called an attribute, which just modifies
the default behavior of the element,
and quote unquote en is the value
thereof, which is the shorthand notation
for English, and there are shorthand
notations for most every human language
as well. So you have a tag name and an
attribute with a value. And we've seen
these things so many times, these key-value
pairs, in the context of
dictionaries or hash tables or any number of
other contexts. Key-value pairs in HTML
are separated by an equal sign, with the
value typically quoted in this way,
with double quotes or single quotes, as long as you're
consistent. Then notice at the end of
this file, as per the indentation,
there's something symmetrical down
here that has the effect of closing the
tag, or ending the tag. And this
effectively tells the browser, "Hey
browser, that's it for my HTML."
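The key-value analogy can be made concrete with Python's standard-library html.parser module, which hands each start tag's attributes to your code as a list of (name, value) pairs that drop neatly into a dict. TagLister is a hypothetical name for this sketch, and the HTML fed to it is a one-line version of hello.html.

```python
from html.parser import HTMLParser


class TagLister(HTMLParser):
    """Collect each start tag's name along with its attributes as a dict."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs, e.g. [("lang", "en")]
        self.tags.append((tag, dict(attrs)))


parser = TagLister()
parser.feed('<html lang="en"><head><title>hello, title</title></head></html>')
print(parser.tags)  # [('html', {'lang': 'en'}), ('head', {}), ('title', {})]
```

Quoting the value with single quotes instead, as in lang='en', produces the same pair, which matches the point that either style works as long as you're consistent.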
Meanwhile, everything else follows a
similar paradigm inside of those two
tags. Here is a head tag that says, "Hey
browser, here comes the head of my page,"
and "Hey browser, that's it for the head of
the page." Inside of the
head: "Hey browser, here comes the title,"
and "that's it for the title." Well, what is the title?
"hello, title," just as I wrote in my
code space. Same story for body: "Hey
browser, here comes the body of the
page," the 95% of the screen, and "that's it
for the body." And what's in the body is
exactly that. The indentation is nice
and pretty-printed. I've used four
spaces, as we commonly do, but that's not strictly
necessary. In fact, in my own code
space, I didn't even bother putting
these on three separate lines. I just
did one line. That's fine because, as
we'll see, browsers typically ignore
whitespace. But I've done it there,
as we often do, just to ensure that
things are pretty-printed and therefore
readable by us humans. Let me call your
attention to one other thing on the
screen. Up until now, before every
lecture, I've been hiding a whole bunch
of tabs in my terminal window. But
today, I left enabled one that you've
probably seen but not cared about
before, namely Ports. And it's under
this Ports tab that you can actually see
a real incarnation of a TCP port. By
default, when you run the command
http-server, it serves up my current folder's
content on its own web server, its own
HTTP server, but not using the default
port 80 or 443, because GitHub is already
using those on cs50.dev for their
product. Instead, by default, we've chosen
another common developer port number,
8080, which is interesting only insofar
as it's 80 twice. It's a human
convention; it could have been any
number of thousands of other
possibilities. This line here is
just telling me that I am
apparently running a server on port
8080. And if I click on there, too, I can
manually open the same tab. But that's
what the green button was doing for me.
It was informing me, "Hey, you've just
started a web server on this port. Do
you want to open a new tab with the
contents thereof?"
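The http-server command itself isn't written in Python, but Python's standard library can do something comparable. A rough sketch, assuming only the http.server module: make_server is a hypothetical helper that serves the files in a folder on a given port, defaulting to the same developer convention of 8080; passing 0 instead asks the operating system for any free port.

```python
from functools import partial
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def make_server(port: int = 8080, directory: str = ".") -> ThreadingHTTPServer:
    """Create (but don't start) an HTTP server for the files in directory."""
    # SimpleHTTPRequestHandler serves files and directory listings;
    # partial() pins down which folder it should serve from.
    handler = partial(SimpleHTTPRequestHandler, directory=directory)
    # An empty host string means "listen on all network interfaces".
    return ThreadingHTTPServer(("", port), handler)


# To actually serve (this blocks until you press Ctrl-C):
#     server = make_server()
#     server.serve_forever()
```

From a terminal, python -m http.server 8080 does much the same thing in one line, including a directory listing like the one above when a folder has no index.html.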
So this is the picture we're now
painting. Let me pull back up the code
that we just wrote, and let me propose
that what we've really done is build a
tree in the browser's memory. So we've kind
of come full circle with week five,
when we talked about trees and other
hierarchical structures. Assume
that the document can be represented
with a node that looks a bit like an
oval up here, which just represents the
whole contents of the file. Well, it
starts with a single root element by
convention, the html element. And your
page can have only one of those
elements. Inside of the html element
can be a head element and a body element. And in
this case, the head element, recall, had a
title element, which in turn had the actual text
thereof, which was hello, title.
Meanwhile, the body had just the text
thereof as well. And so when I keep
saying that the browser is downloading
the file, for instance, hello.html, and
reading it top to bottom, left to right,
it's doing literally that, but somehow
or other, it's using malloc or its equivalent in whatever
language it's written in to allocate
node, node, node, node, node,
populating that tree in your browser's
memory, or RAM, a data structure quite
like that. So, it's all sort of germane
to where we've been before.
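To make that tree concrete, here is a simplified Python sketch that builds a similar tree while reading hello.html top to bottom. It uses the standard library's html.parser, not anything a real browser uses, and it assumes every tag is explicitly closed (so an element with no end tag would throw it off) and skips whitespace-only text. Node, TreeBuilder, HELLO, and show are illustrative names.

```python
from html.parser import HTMLParser

HELLO = """<!DOCTYPE html>

<html lang="en">
    <head>
        <title>hello, title</title>
    </head>
    <body>
        hello, body
    </body>
</html>
"""


class Node:
    """One node in the tree: an element's tag name, or a quoted text string."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []


class TreeBuilder(HTMLParser):
    """Grow a tree of Nodes as the parser reads the HTML top to bottom."""

    def __init__(self):
        super().__init__()
        self.root = Node("document")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        self.current = node  # descend into the new element

    def handle_endtag(self, tag):
        # Naively climb back up; assumes start and end tags are balanced.
        if self.current.parent is not None:
            self.current = self.current.parent

    def handle_data(self, data):
        text = data.strip()
        if text:  # skip whitespace-only text between tags
            self.current.children.append(Node(f'"{text}"', self.current))


def show(node, depth=0):
    """Print the tree, indenting each level by four spaces."""
    print("    " * depth + node.name)
    for child in node.children:
        show(child, depth + 1)


builder = TreeBuilder()
builder.feed(HELLO)
builder.close()
show(builder.root)
```

Run as is, this prints document at the top, html beneath it, then head (containing title and its text "hello, title") and body (containing "hello, body"), each level indented one step further, mirroring the picture of the tree.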
Before we now take, I think, a snack, are
there any questions
about what we've just seen?
Anything at all? I shouldn't have prefaced
this with "the only thing between us and
snacks is these questions."
No? All right, snack time.
See you in 10.
All right,
so we are back and pretty much
everything we do here on out will look
structurally like this. And we're just
going to introduce a few more tags and a
few more attributes to give you a sense
of some of the basic building blocks of
most any website out there. And you'll
find pretty quickly that it starts to
get kind of tedious writing it out. In
fact, I will resort to some copy paste
today just to kind of speed things up.
But this is going to motivate indeed
next week when we reintroduce Python as
well as SQL, to actually automate
generation of HTML as well. So all of
today's websites and many of today's
mobile apps are written in HTML. But
people are decreasingly writing this
kind of stuff by hand. Rather, they are
writing code that generates precisely
what we're going to learn. So
understanding the fundamentals will
still be useful so we know what code to
write next week and beyond. So let me go
back into VS Code here. And what I'm
going to go ahead and do is open up
another terminal window so that I can
leave HTTP server running in this first
terminal window. And what I'm going to
go ahead and propose that we do is
implement a web page that has not just a
single line of text, but maybe some
paragraphs. So I'm going to call this
paragraphs.html.
That's going to open up a new tab. And
here's where I'm going to save some
time. I'm going to go back to hello.html
and just select all and copy and
paste this as the beginning of this
file. But what I'll start doing is just
changing the title of each page to match
the file name. So this is going to be my
paragraphs example. And instead of
saying just hello, body, let's actually
have a few paragraphs of text. I'd
rather not waste time writing even full
paragraphs of text. So let's actually
open up the duck, log in, and, for
instance, just ask it for a quick
helping hand here: Write three
paragraphs about
computer science. I don't really care
what the output is. All I want is some
dynamically generated text to save me
some keystrokes. And here we have an
educational answer, too, even
though all we really care about today is
the fact that this is three chunks of
text. Hopefully, that's all quite
accurate. All right, I'm going to go
ahead and highlight all of that. Go back
into my paragraphs.html tab. Paste it
inside of the body. It's so long, the
paragraphs, that the text scrolls. I can
at least clean this up slightly. I'm
going to go ahead and just indent it
twice just so that at least it's pretty
printed inside of the body. And now I'm
going to go back to my other tab which
represents the contents of hello.html.
I'm going to click back which is going
to show me that same directory listing
again which now has a new file
paragraphs.html and I'm going to click
it so as to see these three paragraphs
of text.
What looks wrong? Yeah,
>> Paragraphs.
>> There are no paragraphs. It's just one big
blob of text. It's the same text, but
buried in there is the end of the first
paragraph and the start of the next, and
same for the third. So, what's going on?
Well, apropos of my comment earlier
about browsers not really caring about
whitespace, you can put all the white
space you want there. It's just going to
ignore it in this particular case. All
it's going to give me, minimally, is a
single space between each of these
paragraphs of text. So, HTML is very
pedantic. If you want there to be
more paragraphs, you need to tell the
browser: put a paragraph here, put a
paragraph there. And the way to do this,
thankfully, isn't all that hard. I'm
going to go inside of the body here and
I'm going to simply open a tag called
p, for paragraph.
Notice that VS Code in this particular
case is a little annoying because it's
trying to finish my thought, but it
doesn't know that I already wrote this
text. So, I'm just going to delete what
it automatically generated. And then I'm
going to manually indent this. And I'm
going to do the same thing again for the
other paragraphs. Up here, I'm going to
open the paragraph tag. I'm going to
delete temporarily the close tag so that
I can actually put it below that chunk
of text here. Indent this and then down
here. And this would have been easier if
I had just done it right the first time. I'm
going to do the same thing with the
third and final paragraph. So now what
we in effect have three times in a row
is "Hey browser, here comes a paragraph,"
then the first paragraph, then "Hey browser,
that's it for the paragraph." Then "Hey browser,
here comes a paragraph" and "that's it for the
paragraph." Then "Hey browser, here comes a
paragraph" once more. So, three times in total:
open, close, open, close, open, close.
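Why the p tags matter: by default, a browser displays any run of whitespace in ordinary text (spaces, tabs, newlines) as a single space, which is what turned the pasted paragraphs into one blob. Here is a rough Python approximation of that default behavior (CSS white-space: normal), ignoring subtleties like non-breaking spaces and preformatted text; collapse_whitespace and WHITESPACE_RUN are illustrative names.

```python
import re

# HTML's "ASCII whitespace": space, tab, line feed, form feed, carriage return.
WHITESPACE_RUN = re.compile(r"[ \t\n\f\r]+")


def collapse_whitespace(text: str) -> str:
    """Approximate how a browser displays plain text with no tags around it."""
    return WHITESPACE_RUN.sub(" ", text).strip()


pasted = "First paragraph.\n\n    Second paragraph.\n\n\tThird."
print(collapse_whitespace(pasted))  # First paragraph. Second paragraph. Third.
```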
Now, if I go back to the browser,
nothing appears to have changed yet, but
that's because I'm looking at a copy that
was downloaded a moment ago in that
virtual envelope. So, this is why, among
other reasons, we hit reload on web
pages to get the latest version. And
voila, now we have three actual
paragraphs. The whitespace is
inserted automatically by the browser,
but it's at least prettier to the eye
now. So, that then is the paragraph tag.
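Next week's theme of generating HTML with code can be previewed here. A minimal Python sketch, assuming that paragraphs in the source text are separated by blank lines: to_paragraphs (a hypothetical helper) wraps each chunk in its own p element, escaping characters like < and & along the way.

```python
import re
from html import escape


def to_paragraphs(text: str) -> str:
    """Wrap each blank-line-separated chunk of text in <p>...</p>."""
    # A blank line is a newline, optional whitespace, then another newline.
    chunks = re.split(r"\n\s*\n", text)
    return "\n".join(
        f"<p>{escape(chunk.strip())}</p>" for chunk in chunks if chunk.strip()
    )


print(to_paragraphs("Computer science is...\n\nAlgorithms are...\n\nData is..."))
```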
So, useful, of course, if we have
paragraphs of text. What are some other
tags we might introduce? Well, maybe
you're writing a paper or a blog post or
the like. It's pretty typical to want
headings of sections of the page. Maybe
chapters and then sections and then
subsections or the like. HTML can help
with this too. So, let me go into my
terminal window again, create a file
called, how about, headings.html.
And then in this file, let me similarly
go back to hello.html and copy and paste it
into headings. I'm going to close
paragraphs because we're done with that.
And I'm just going to change the title
now to headings. And inside of the body
here, what I'm going to go ahead and do
is, you know, it would have been nice
to have some of that same text. So
let me go back one step. Let me grab the
paragraphs and paste them into this new
file, and let me rename the title to headings to
make clear which file we're in. And now
let me go ahead and propose that
it would be nice if I made clear that
this is the first paragraph. So I'm
going to use the H1 tag, which is the
heading 1 tag. And I'm just going to
say One for the sake of discussion. And
down here, I'm going to say H2 and say
Two for the sake of discussion. And down
here, H3, Three, because I don't really care
what these things are called. I just want
to demonstrate the functionality. If I
go back to my other tab now, back to the
directory listing, there's my brand new
file headings.html. And it's the same
paragraphs, but now you have some big
bold text that looks reminiscent of the
chapter heading, the section heading,
the subsection heading, and the like, or
the kind you might see on a news site or a
blog site or the like. So you've got H1
through H6, from biggest and boldest to
smaller but still bold. And the
browser decides on all of those settings
for us. But it also makes semantically
clear to me that probably the most
important thing on the page, at least to
begin with, is that H1 tag, and then
everything else is like supporting
paragraphs or arguments or whatever the
case might be. There's a hierarchy
implicit there. All right. What are some
other things we can do with web pages?
Well, let me open my terminal window
again, and why don't we code up, how about,
a list of values, because lists are
everywhere on the internet. So, let me
open up list.html and then close my
terminal. Uh, I'll go ahead and start
with that same file, headings.html,
paste it into list, change the name
here. Let's delete everything I did. And
again, the only reason I'm copying and
pasting is just to avoid writing out the
same boilerplate code again and again
with the HTML tag, head tag, body tag,
and so forth. Let's focus on the new
stuff. The new stuff in this example
will be a list of values like the words
fu, bar, and baz, which much like a
mathematician might go with xyz as
placeholders, computer scientists would
typically reach for words like fu, bar,
and baz when nonsensical placeholders.
And this looks like a list of three
values, one after the other. Of course,
if I go back into my directory index,
click on list, how many list items am I
going to see per line?
Yeah. Well, it's going to be just one
big blob of text here, too. It doesn't
matter if it looks like a list. It is
just going to be text after text after
text separated by a single space, not
the multiple lines I had. So, here too,
we've got to be pretty pedantic. If I
want a list of values, I need to use a
tag that conveys that. And the tag I'll
use first is going to be ul for
unordered list, which gives me a
bulleted list. And then inside of this
unordered list, I claim we're going to
have a whole bunch of list items or li
for short, like foo, like bar, like
baz, or any other things that you want to
put in your list. If I now go back to my
other tab, reload, now you get the
familiar bulleted lists that you might
see in any number of websites, Google
Docs or the like. How does Google Docs
do it underneath the hood? Well, they're
just using a UL tag and some LI tags
inside of that to give you the bulleted
list that's just happening automatically
when you click the appropriate button in
something like Google Docs, which at the
end of the day is just a website. Well,
what if I want to number these things?
Well, if I go back to VS Code, I could
certainly just start numbering them like
1, 2, 3, which is fine, but honestly,
computers can count, and with loops,
pretty quickly. Also, it's a little
annoying: if I want to go back in later
and insert something between some of
those elements, I then have to renumber
everything manually. I mean, this is one
of the things computers are good at. So,
take a guess. If I want not an unordered
list, but an ordered list that is
numbered, what might you change? Yes, OL
is a good bet. Let's change both the
open tag and the close tag. Let me go
back to my second tab. Reload.
And now we have it: one, two, and
three. And you can actually make a whole
table of contents. You can use
sub-bullets or sub-numbering. Anything you can do
in, like, a table of contents, HTML can do
for you automatically here. Well, what
about tabular data? Laying out data in
kind of rows and columns. Well, we can
do that, too. Let me go ahead and open
up a new file, how about table.html.
Let me go ahead then, in this file,
copy and paste as before, just so I have
some boilerplate. Let's get rid of
everything in the body. And then let's
just manually whip up a little table
like this: open bracket table. Inside of
the table tags, I'm going to have a TR
tag, for table row. Inside of this table
row, I'm going to have a TD, a table data tag,
which is going to have the number one.
I'm going to give myself another, two,
and another, three. Outside of that table row,
I'm gonna have another table row. And
I'm gonna create maybe four. And now I'm
going to do five. And now I'm gonna do
six. And you can perhaps see where this
is going. After this, I'm going to do
one more table row, a little
tediously: seven, how about eight, how
about nine.
And then lastly, just to make it look a
little familiar, a final table row, how
about with a TD of an asterisk, and then
how about a zero, and lastly, how about
a pound symbol. Any guesses as to
what we're making in HTML here?
Like a telephone keypad? Yeah. So, let me
close the old file and go back over
to the browser. Click
back. There's my new file, table.html.
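Lists and tables follow the same pattern: a container element (ul, ol, or table, with tr inside it) wrapping repeated children (li or td). A minimal Python sketch of generating both from data; to_list, to_table, and KEYPAD are hypothetical names, and the output omits the indentation a human would add, which, as we've seen, the browser ignores anyway.

```python
from html import escape


def to_list(items, ordered=False):
    """Build a bulleted <ul> or, if ordered, a numbered <ol> of <li> items."""
    tag = "ol" if ordered else "ul"
    lis = "".join(f"<li>{escape(str(item))}</li>" for item in items)
    return f"<{tag}>{lis}</{tag}>"


def to_table(rows):
    """Build a <table> with one <tr> per row and one <td> per cell."""
    trs = "".join(
        "<tr>" + "".join(f"<td>{escape(str(cell))}</td>" for cell in row) + "</tr>"
        for row in rows
    )
    return f"<table>{trs}</table>"


# The telephone keypad from above, one inner list per table row.
KEYPAD = [["1", "2", "3"], ["4", "5", "6"], ["7", "8", "9"], ["*", "0", "#"]]

print(to_list(["foo", "bar", "baz"]))  # <ul><li>foo</li><li>bar</li><li>baz</li></ul>
print(to_table(KEYPAD))
```

Note that to_list never writes the numbers 1, 2, 3 itself: with ordered=True it only switches the container to ol, and the browser does the counting, so inserting an item later requires no renumbering.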
And it's not going to be very pretty,
but I dare say that's exactly what you
see when you pull up the phone app and
you start dialing a number. It's sort of
a numeric keypad laid out automatically
for me in rows and columns. Now, this
one's a little underwhelming. Let me
open up a file that I made in advance of
class today. In my favorites folder
here, I'm going to go ahead and copy a
pre-made example. I'm going to open up
this file called favorites0.html.
And what you'll see here is a slightly
more complicated table, still with a
table tag, but this time with a thead
tag, for table head, and then a tbody tag,
inside of which are all of those rows.
And I know this just by having read the
documentation. And then notice this:
inside of the first TR in the thead,
there are three THs, table headings,
timestamp, language, and problem, which
might sound a little familiar from when we
last collected data from everyone via
that Google form. Well, let's go ahead
and spoil what this is. Let me go back
to the directory index. There is this
pre-made file, favorites.html, and
arguably a more compelling use of a
table. Now, we have an HTML table
containing all of the form submissions
that you all submitted the other
day when we were asking you your
favorite language and your favorite
problem. It's not super pretty, but
indeed it's in rows and columns. And so,
it's reminiscent of the HTML that Google
is using in the actual Google Sheets
software to lay out a sheet of data for
you in those same rows and columns. All
right. Well, let's do something that's a
little more visually interesting. Let me
go back to VS Code here and close out
those last two. And how
about we do something with images?
Well, I brought again, inside of
today's code, our same
bridge that we keep opening up in class,
this week's bridge. It looks a
little something... whoops. It looks a
little something like this. Here, though,
is just the raw image. How could I
include an image in a web page that I
serve up on the internet? Well, let's go
ahead and try this. Let me close the
PNG itself. Let me copy this and create
a new file called, how about, image.html.
Hide my terminal. Copy and paste that. Just
quickly change the title to image so we
know where we are. And inside of the
body of this page, let's go and embed
that image so that we can include not
just the image, but if we want
paragraphs of text around it, headings
as well. Heck, maybe a table, any other
features that we've seen already. I'm
going to say img, which is image for
short, then src, for source, equals quote
unquote bridge.png. And then I'm going
to close the tag here. Now I'm going to
go back to my other tab. Go back into my
directory index. Here's my brand new
file, image.html. And this too isn't
going to look all that different from
the actual image because I have no other
content. But when I click on this,
you'll see that there is the full screen
image. And it's even a little too big to
fit in my viewport in the body of the
page. But we can fix something like that
later. I've embedded in this website
precisely that image. But I should do a
little bit better here. In fact, if the
image is slow to load or if someone
is visually impaired and doesn't know
what they're looking at, it would be
nice to have some alternative text that
something like screen reader software
could recite. So, there's another
attribute for this tag specifically
called alt for alternative. And I can
put something like Harvard University to
at least give the user a textual
description of what kind of photo
they're looking at. You'll also see that
text if indeed the image is slow to load
or if it's broken, like missing
altogether. You won't see a 404; you'll
see something like a broken image icon, but at
least with some explanatory text as to
what the developer intended you to see
at that point. It's not going to change
at all if I reload here by going back to
image.html, but again, a screen reader
or an astute viewer would see that
ultimately in the browser. But there's
something different, and this isn't a
mistake for once. What have I done
differently, but apparently not wrong? I
claim
something new or noteworthy about this
particular image tag. Yeah.
>> Yeah. There's no close tag. There's
no open bracket, slash, img, which is the
pattern we followed for every other tag
like closing the HTML tag, the head tag,
the body tag and so forth. I just don't
see any end tag here. And it's just not
necessary. Turns out there are certain
HTML tags that are empty elements,
which is to say it doesn't make semantic
sense to start and end them. Like, an image
is either there or it's not. And so
some tags just don't require an end tag
if it's sort of obvious to the browser
that the image should go there. So img
is one such tag. And then I
noticed I'm missing the lang attribute here,
which isn't strictly necessary because
I've got no textual content, but just
for consistency, let me go back and put
that in as before. Meanwhile,
the image is exactly as it would appear
on the screen, but it doesn't have to be
just an image we embed. We can do
something with like video. So, let me go
ahead and open up a file called
video.html.
Let me copy paste some of that starter
code. Change this to video. And instead
of the image tag, as you might imagine,
there's also a video tag. It's a little
more involved, but per the
documentation, I know I can do this
video. And then inside of the video tag,
I can actually have multiple sources
just in case the browser might want
different versions or different
resolutions or qualities thereof.
And this, somewhat confusingly, is an
actual tag called source, not shortened,
but stupidly this tag has an attribute
called src, which is shortened, that
equals the name of the file you want to
embed. I came with, among today's examples,
a video file called video.mp4, which is
a small video that you can embed. And I
can tell the browser what type of video
it is, to be clear. The convention
here for the content type is to say the type
of this video is video/mp4, an MPEG-4 video. There
are other features, though, for the video
tag. In fact, when you see a video on
a page, you can very often see like a
play icon, a pause icon, maybe some
other controls. Well, it turns out you
can put an HTML attribute on the video
tag literally called controls that will
enable those. If you don't turn them on,
there's no way to start and stop
the video, or rather to see those
controls visually. This way, the user
actually sees them. But this attribute
is a little bit different from others.
It doesn't actually need a value. It
just has to be present and the browser
will know when it sees the word
controls, oh, I should turn on the
controls feature. And for good measure,
especially in today's world of
advertisements everywhere, if you want
the video to play automatically,
potentially, without annoying
the user, you might want to mute it by
default as well. So another attribute,
per the documentation for the video tag,
is muted, so that you can start the video muted,
and only when the user clicks on
it might they actually start to hear
something. But of course these are
fairly basic examples of media inside of
pages. Let's actually do what the H
in HTML is meant to imply: the hypertext,
the ability to link from one page to
another. That is a feature we haven't
yet seen. So let me go ahead and do
this. And just for completeness,
let me go back into hello.html, because I
completely forgot the lang
attribute, even though that's mostly
there for SEO, search engine
optimization, or for tools like Google
Translate, or screen readers, that
therefore know what language they're
working with. Let me go into my
terminal window here and let's create
another file called link.html, which
demonstrates exactly that, the ability
to link from one web page to another.
Let's go ahead here and change the title
to link so I know where I am. And in the
body of this page, let's go ahead and
create what's called a hypertext reference,
or hyperlink. I'll encourage people
on this page to visit the actual Harvard
website. So let's do Visit, how about,
Harvard, period, just to demonstrate where
we're beginning. If I go back into this
directory index, click on link.html.
This, of course, is not yet a link, so I
should probably make it one. Well,
instead of just saying visit Harvard,
maybe I should say harvard.edu. Go back
to the other tab. Reload. And it's
harvard.edu, and I can select and
highlight it, but it's not clickable.
It's not underlined like a link. All
right. Well, maybe I need to do
www.harvard.edu.
Reload. Still nothing happening. All
right. Well, maybe I need the full URL
with the scheme, https://,
and maybe the slash at the end. Reload
again and nothing's happening. So here
too, HTML is pedantic. Like it will not
create a link for you unless you tell it
to create a link. And the fact that when
you post on social media nowadays or in
Google Docs, things are automatically
hyperlinked for you, that's a feature
implemented in code, very often in Python
or JavaScript or something else, where
some human wrote code that looks for
patterns in the input you've typed in,
and if it looks like you've typed a URL,
it will automatically link it for you.
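A minimal Python sketch of that kind of autolinking, assuming a deliberately simple rule for what "looks like a URL" (an http:// or https:// prefix followed by non-space characters). Real sites use more elaborate rules, and the names here are illustrative only:

```python
import html
import re

# Assumed, deliberately simple rule for "looks like a URL": http:// or https://,
# then non-space characters, not ending in trailing punctuation like "." or ")".
URL_PATTERN = re.compile(r"https?://[^\s<>\"]*[^\s<>\".,;:!?)]")


def autolink(text: str) -> str:
    """Escape user text for HTML and wrap anything URL-like in an <a href> tag."""
    parts = []
    last = 0
    for match in URL_PATTERN.finditer(text):
        # Escape the plain text before this URL so a stray < or & can't inject HTML.
        parts.append(html.escape(text[last:match.start()]))
        url = html.escape(match.group(0))  # also escapes quotes, safe inside href="..."
        parts.append(f'<a href="{url}">{url}</a>')
        last = match.end()
    parts.append(html.escape(text[last:]))
    return "".join(parts)


print(autolink("Visit https://www.harvard.edu/ today"))
```

Under this rule a bare harvard.edu isn't linked, much like the page above; escaping the surrounding text matters because what the user typed can't be trusted.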
But what are those websites doing for
you automatically? Well, they're doing
this. If you want to have a tag, a link
here to Harvard's website, you use open
bracket a, for anchor, then href, for hypertext
reference. Set that equal to the URL to
which you want to link. Close the tag,
and then in between the open tag and the
close tag, put the actual text you want
the user to click on. So now if I go back to this
page and reload, now I have what looks
like my original attempt, just Visit
Harvard, but it's a hyperlink. And this
is super subtle, but if I hover over
that underlined word, which is blue by
default, you'll actually see in the
browser's bottom lefthand corner where
you're going to be whisked away to, even
though that's all too subtle. But this
now looks like I intended, an actual
hyperlink to Harvard. In fact, I could
make the visible text the full URL, but it would be
a little redundant. And even though it
seems like you shouldn't have to do
this, this is indeed how HTML works. The
href attribute is where you're going to
go. The text inside of the open and
close tag is what the user will see. So
if you want them to see the full URL,
you got to put it there. And now I can
see the full URL to where I'm being led.
But here's where you can actually
introduce discussions of cybersecurity.
How might this feature be
abused, do you think? This stupidly
simple feature. Yeah.
>> Have it display something, but actually...
>> Yeah, you could have it display one thing
but lead somewhere else, and it
wouldn't be that hard for the adversary
who's maybe tricked you into visiting
their web page to say you're actually
going to go to yale.edu instead of
Harvard. But if I reload the page, it
doesn't look any different. Unless the
viewer is astute enough to look at this
tiny little text in the bottom of the
screen or just click on the link and be
whisked away to the wrong destination.
That can be problematic. Like, this is a
nice haha sort of prank. But you could
certainly imagine doing this with
paypal.com addresses or any number of
banks or anything where you're trying to
collect personal information from
someone. And if the resulting website
looks quite like the website they're
expecting, but it's
actually your copy thereof, it's all too
easy to wage what are called phishing
attacks, P-H-I-S-H-I-N-G, which means to
lead someone to what looks like the real
site, but is not. Typically, to get
their username, their password, their
credit card information, or something
else. But it boils down to just these
basic building blocks like this.
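One version of that trick can be caught mechanically: when the visible text of a link itself looks like a domain or URL but the href points somewhere else. Here's a rough Python sketch using the standard library's HTMLParser; it's a heuristic for illustration, not a real phishing filter, and the class and function names are hypothetical:

```python
from __future__ import annotations

from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collects (href, visible text) pairs for every <a> tag on a page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[tuple[str, str]] = []
        self._href: str | None = None
        self._text: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href") or ""
            self._text = []

    def handle_data(self, data):
        if self._href is not None:  # only collect text while inside an <a>
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def suspicious_links(page: str) -> list[tuple[str, str]]:
    """Flag links whose visible text names a different host than the href leads to."""
    collector = LinkCollector()
    collector.feed(page)
    flagged = []
    for href, text in collector.links:
        if " " in text or "." not in text:
            continue  # prose like "Visit Harvard" can't be compared to a host name
        shown = urlparse(text if "://" in text else "https://" + text).hostname
        actual = urlparse(href).hostname
        if shown != actual:
            flagged.append((text, href))
    return flagged


print(suspicious_links('<a href="https://www.yale.edu/">https://www.harvard.edu/</a>'))
```

Because it compares host names exactly, it would also flag harvard.edu text pointing at www.harvard.edu, and it can't judge prose like "Visit Harvard" at all, which is part of why phishing is hard to catch automatically.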
Questions, then, on any of these building
blocks that we've seen thus far? Yeah.
>> I think I might have gotten lost in the
earlier portion.
>> Sure.
>> How did you get it to open
up? Like, did you run the file in
>> Oh, good question. How did I get it to
open up? So, let me rewind. So, the very
first thing we did after creating
hello.html was open a terminal
window, and specifically I ran a command,
http-server, which
starts my own web server in my
codespace, but not on the default ports, 80
and 443, because that's what cs50.dev is
already using. Instead, it chose, by our
design, 8080, which is commonly used by
developers when making websites. Then I
just kind of hid my terminal because
it's not interesting to see constantly.
But that web server is still
running in my code space. And anytime
I'm saying let's go back to this tab, I
am now visiting a different URL that was
the result of my clicking on that green
button which led me to my own website.
If you ever get lost or close that tab
by accident, no big deal. If you go to
the ports tab of your terminal, you can
actually hover over this and click on
that same URL and open up the contents
of your own site instead.
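The lecture uses the http-server command; as a swapped-in illustration only, here's roughly the same idea with Python's standard library, serving a folder's files over HTTP on port 8080 by default. The start_server helper is hypothetical:

```python
import functools
import http.server
import threading


def start_server(directory: str, port: int = 8080) -> http.server.ThreadingHTTPServer:
    """Serve the files in `directory` over HTTP on `port`, in a background thread."""
    # SimpleHTTPRequestHandler serves files (and directory listings) from `directory`.
    handler = functools.partial(http.server.SimpleHTTPRequestHandler, directory=directory)
    # 8080 by default, mirroring the lecture; 80 and 443 are the HTTP/HTTPS defaults.
    server = http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

From a terminal, python3 -m http.server 8080 does much the same for the current directory; with the sketch above, call server.shutdown() to stop it.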
>> Fluffy meme. Yes, these are randomly
generated names by GitHub, which is the
company that hosts VS Code in this way.
And they do this to ensure uniqueness
without it being some arcane sequence of
random letters and numbers. They
concatenate random English words
together. A good question. All right.
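A toy version of that naming scheme, assuming made-up word lists and a simple retry-until-unused rule; GitHub's actual vocabulary and algorithm aren't described here, so everything below is hypothetical:

```python
from __future__ import annotations

import random

# Hypothetical word lists; the real service's vocabulary and name format may differ.
ADJECTIVES = ["fluffy", "sturdy", "curly", "fuzzy", "shiny"]
NOUNS = ["meme", "waffle", "robot", "train", "pancake"]


def unique_name(taken: set[str], rng: random.Random, attempts: int = 100) -> str:
    """Join a random adjective and noun; retry until the result isn't already taken."""
    for _ in range(attempts):
        name = f"{rng.choice(ADJECTIVES)}-{rng.choice(NOUNS)}"
        if name not in taken:
            taken.add(name)
            return name
    # With only 25 combinations this can happen; bigger word lists make it unlikely.
    raise RuntimeError("no unused name found; add more words or a random suffix")


rng = random.Random()
taken: set[str] = set()
print([unique_name(taken, rng) for _ in range(3)])
```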
So, what else can we do here? Well, let
me propose that there's a bit more you
can do with even these URLs. Here, of
course, is the scheme and the host name
and the domain and the TLD. But after
the domain, things can get a little more
interesting than just folder names and
file names. In fact, it's quite common
to see URLs that have somewhere in them
a question mark and then a bunch of
key-value pairs, which are this
omnipresent computer science thing, it
seems, including in the context of URLs,
whereby if you want to pass an input to a
web server, one means by which you can do
that is literally in the URL itself. So
for instance, if you visit google.com
and you want to search for something,
you and I are all in the habit of course
of just typing into a search box. But
how is that search box actually getting
the data into Google's servers? Well,
it's via these URLs. And if there's not
one input, but two inputs, the URL might
be a bit longer, and there might be one
or more ampersands in the URL that just
separate more key-value pairs. And it
turns out we can see this in the real
world as follows. Let me go back to VS
Code here. Let me open up a new tab,
and let me open up google.com. And
I'm just going to hit enter on the
shortest way of saying it. So, I get to
Google's homepage here. Even
though notice I ended up at some longer
form of the URL. In fact, I'm going to
delete everything else from the URL
that's not relevant to us today. It's
still forcibly coming back. So, Google
is somehow trying to track me by putting
that in there. That's fine. All I'm
going to do is search for cats. Now,
there's a whole bunch of other
functionality that's clearly happening,
like autocomplete, and it's trying to
figure out what results or words I might
want. I'm just going to go ahead and hit
enter. And this is all to say that
notice if I zoom in on the URL at the
top of my screen, it's a crazy long URL
because Google probably is doing a bunch
of tracking and advertising and
analytics technologically, none of which
is relevant to us today. But notice
after www.google.com,
there's /search, which is the path on
their server, the search program that
someone there has written. There's a
question mark and then there is an HTTP
parameter as these things are called the
more precise name for key value pairs in
URLs. This is an HTTP parameter. Its
value after the equal sign is in fact
cats. All this other stuff I have no
idea what it is. I'm going to just
delete it and hit enter and it stays
gone. But I still get cats in my search
results. So this I would argue is sort
of the canonically shortest form of a
Google URL that's useful. In fact, if I
want to search for dogs instead, I don't
have to use the search box. I can
literally manually make my own URL, hit
enter, and if I zoom out, there are
Google search results about dogs. So,
this URL, too, is sort of the essence
of how URLs work, and specifically of the
GET verb, which was that keyword in all
caps that I claimed was inside of the
envelope, and it's what Phyllis's
browser was sending, and it's what my
browser has been sending through all of
these examples. But here's where things
now can get interesting. If I know how
Google's server works, its backend, the
part that knows all about cats and dogs
on the internet, I can implement my own
front end by just knowing a bit of HTML.
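To make those key-value pairs concrete, here's a small sketch using Python's urllib.parse: one function pulls the HTTP parameters out of a URL, and the other builds the kind of URL a browser requests when a form is submitted with the GET method. The q parameter is the one observed above; the function names and the extra page parameter are hypothetical:

```python
from __future__ import annotations

from urllib.parse import parse_qs, urlencode, urlsplit


def query_params(url: str) -> dict[str, list[str]]:
    """Return the HTTP parameters, the key-value pairs after the '?', of a URL."""
    return parse_qs(urlsplit(url).query)


def build_get_url(action: str, fields: dict[str, str]) -> str:
    """Build the URL a browser would request for a form submitted with method get."""
    # urlencode joins pairs with '&' and encodes spaces as '+', as form submission does.
    return f"{action}?{urlencode(fields)}"


print(query_params("https://www.google.com/search?q=cats&page=2"))  # page is hypothetical
print(build_get_url("https://www.google.com/search", {"q": "dogs"}))
```

parse_qs returns a list per key because the same key may legitimately appear more than once in a URL.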
So, let me actually go back into VS Code
here. Let me go into my second
terminal, which is blank, and let me go
ahead and create something called
search.html.
I'm going to go ahead and copy my
original code, close link, and paste it
here. Hide my terminal. Call this thing
search, and then inside of the body of
this page I'm going to make my own
version of Google. I'm going to use
a form tag, and in that form I'm going to
specify an input tag whose name is going
to be exactly equal to what I saw Google
uses, q, which happens to stand for query.
I am then going to add another
input. Actually, let's say the type of this first box,
this input, is going to be text. The type
of this next one is going to be a submit
button. And then that's it. Let me
go back into my other tab. Go back into
my directory listing. Click on
search.html. And this is not pretty, but
it is the beginning of my very own
search engine. Unfortunately, if I type
in cats, notice what happens. My URL
changes such that it's search.html
question mark q equals cats. I know
nothing about cats. I don't have a
database of cats. I haven't done any
backend work, just the front end. The
front end is what the user sees. The
back end is what provides data to the
front end. But why don't I tell this
form not to submit to me. Instead, let's say
that its action should actually be
https://www.google.com/search,
which is the URL that I saw in my
browser. I'm just inferring how Google
works. I'm going to be pedantic, even
though this is the default, and
say the method I want my form to use is
get. Confusingly, it's conventionally written lowercase
here, even though inside of the envelope
it will be all caps. And then I'm going
to go back to this page. Reload after
going back. And you'll see the same
exact box, but when I search now for
cats, submit, notice my URL changes to
Google's own. It's like voila. Like I
just implemented my own Google without
doing the actual hard part. I've
actually just done the simpler front
end. And there are a few other things I
can do here that are sort of nice. I can
change the type to be a search box. I
can change the value of my button, not
to be the default, which notice was
submit. I can say Google search. And I
can keep tweaking this to make it even
prettier and prettier here. Now in my
version there's a box that suggests cats.
Notice that it's trying to complete my
thought. I can actually go back into the
form. I can say autocomplete equals off
to turn off that feature. So now if I
click in this box and type Oh,
autocomplete equals off. Why is it still
there?
>> Did I forget to refresh? Oh, thank you.
I forgot to refresh. Hence my point. So
you always have to reload after making a
change. And now the autocomplete feature
is off. And this other little thing,
it's subtle, but this little X that will
just clear the whole thing. That is
simply the result of having changed text
to search for the type of that box.
There are other things you can do too for
accessibility or user-friendliness. I
can add autofocus here, for instance,
as an attribute without any
value. If I now reload this page, notice
that the cursor is automatically
blinking in the text box, which is a
marginal change, but much easier for me
to now type cats without having to
stupidly click in the box in order to
actually foreground it so I can type
input. So, suffice it to say, this is
not really the business that Google is
in. They do much more on the back end
than they do on the front end. But with
just these basic building blocks, I can
implement the beginnings of the same
website. In fact, let me do one other
flourish. You'll see that that text box
is blank. Not clear what I might want to
do. Well, there's another attribute I
can use: placeholder, equals something
like query, so I can at least tell the user
what to search for. If I reload again,
now I see, in gray text, query,
instructions so that I roughly know what
to type. So all these things that
you see every day on websites are really
as easy as just coding up some HTML like
that. But what else can we do with HTML?
Well, it turns out this is a topic for
another longer day too. There exist in
computing what are called regular
expressions which is a fancy way of
describing patterns which are quite
useful when you want to validate input.
For instance, if you want the user to
have to type in an email address with
the at sign, with the TLD, and so forth,
it would be nice to make sure that they
get a warning if they try to skip that
field or they mistype something in it as
well. In the world of regular
expressions, known in short as regexes,
you have a whole bunch of
documentation here that, in a nutshell,
will introduce you to some pretty
powerful syntax that we won't spend much
time on at all today, but it's syntax
that exists not only in the world of
the web, but in Python and so many other
languages as well. So consider this just
a quick crash course. If you want to
define a pattern in, say, a website that
ensures that the user types in an email
address, you can use these textual
building blocks, whereby, in the world of
regular expressions, a single dot
represents any character. If you don't
care what the character is, dot,
confusingly, doesn't represent a period;
it represents any character. Star
represents zero or more times. Plus
means one or more times. Question mark
means zero or one time, if you want
something to be there or not. Curly
braces with a number, like {n}, mean exactly n
times, and you can even have a range of
values instead, like {n,m}. And then you can use
square brackets and some other syntax to
say I want the user to type in any of
these characters, or digits in this case.
Or you can do ranges like this: I want
them to type in any decimal digit
between 0 and 9, [0-9]. Or \d
represents any digit, and backslash capital
D, \D, means anything that's not a digit.
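Those building blocks can be tried out directly. This Python sketch checks each one with re.fullmatch, which requires the whole string to match, much as HTML's pattern attribute does; HTML patterns use JavaScript's regex syntax, which agrees with Python's for these basics. The matches helper and the sample strings are illustrative:

```python
import re


def matches(pattern: str, text: str) -> bool:
    """True only if the *whole* text matches, which is how HTML's pattern attribute behaves."""
    return re.fullmatch(pattern, text) is not None


# (pattern, text, expected) triples, one or two per building block
CASES = [
    (r"h.t", "hat", True),         # .      any single character
    (r"h.t", "ht", False),         #        ...but exactly one of them
    (r"ab*c", "ac", True),         # *      zero or more of the preceding item
    (r"ab+c", "ac", False),        # +      one or more
    (r"colou?r", "color", True),   # ?      zero or one
    (r"\d{3}", "617", True),       # {n}    exactly n times
    (r"\d{2,4}", "12345", False),  # {n,m}  between n and m times
    (r"[0-9]+", "2026", True),     # [0-9]  any character in the range
    (r"[aeiou]", "e", True),       # [...]  any one of the listed characters
    (r"\D+", "cs", True),          # \D     anything that's not a digit
    (r"a\.b", "a.b", True),        # \.     a literal period...
    (r"a\.b", "axb", False),       #        ...and nothing else
]

for pattern, text, expected in CASES:
    assert matches(pattern, text) == expected, (pattern, text)
print("all cases behave as described")
```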
Long story short, humans over the years
have come up with shorthand notation
known as regular expressions via which
you can define patterns. This is useful
because if I wanted to make a web page
that does in fact require that someone
type in say an email address, I can
enforce that to some extent. If I go
back to my browser here and into VS
Code, let me go ahead and create a new
file called say register.html to be
representative of registering for some
website. I'll change the title here real
quick. I'm going to keep the form, but
in this case, I'm not going to bother
with Google anymore. So, let's make it a
bit simpler than before. And let's go
ahead and do this. Inside of the form,
I'm going to have an input. I'm
going to have the name of this input be
email because that's what I'm
collecting. I'm going to have a
placeholder be quote unquote email so
the user knows what to type in. And
I'm going to go ahead here and have
a pattern as well. So actually,
let's say type equals text,
but I'm going to specify additionally a
pattern. So the pattern I want the user
to type in, in between these quotes, is
going to be any character one or more
times, that is to say their username,
then an at sign, then any character one
or more times, then literally a
period. We didn't see this on the
screen, but just like in C, when you want
to escape special characters, if you want
literally a period in their input,
like the dot in harvard.edu, you can say
backslash period to mean a literal
period, and then the TLD,
edu. Now, before trying it out,
let me go ahead and give myself a button,
and just so you've seen it, there's also
a button element in HTML, which is
similar in spirit to the submit button
we saw a moment ago. Let me go back to
my directory listing, go into
register.html,
and let me go ahead and just type in
something like mail as my name, click Register, and you'll
see "Please match the requested format."
So I have not satisfied it properly
until I actually type in something like
a full address ending in .edu, and now it's happy. Alternatively, it's
a little tedious to actually type in
these patterns. So, there are some
shorthands for them. I can actually get
rid of this pattern. And if I read the
documentation for HTML, there is
actually an input of type email which
just does all of that pattern matching
for you. But the scary thing is that
it's actually pretty involved to
validate email addresses. I did a very
simplified version of username at
domain.tld.
This is the regular expression that some
browsers use to validate email addresses,
because even though mine is relatively
simple, it turns out
there's a crazy amount of syntax that is
valid in email addresses. And this is
where regular expressions get scary. But
for our purposes today, they're a thing
that exists. You might find them useful
in HTML. You might find them useful in
Python. They're incredibly useful when
it comes to extracting information from
web pages. If you're analytically
minded, you like the world of data
science, you like to gather and
analyze data, you can use regular
expressions not just to validate data
but to find patterns of data in actual
websites or documents and extract that
data so as to perform operations or
analysis on them. So, a wonderfully useful,
if complicated, tool. The catch, though, is
this. Notice that here I'm still
required to type in a valid email
address when I click Register, and I'm getting even
more explicit information this time
because I used type equals email. The
catch though with web pages is that
they're not to be trusted in so far as
this HTML came from the server and is
downloaded onto the user's Mac or PC or
phone where they have a copy thereof. I
can open up developer tools as I did
before by right-clicking or
control-clicking and choosing inspect or
whatever the menu option might be. I can
go into the elements of this page,
literally the HTML, and if I don't want
to type in email, I want to just type in
any old text and see if I can break your
site, I can just change it. And now
there is no such warning. Which is to
say, even though you will encounter certain features, not
just today but over the coming weeks as
you play with HTML,
they are not to be trusted in general
when it comes to security. And just like
our discussion in the world of SQL and
SQL injection attacks, this is one of
the attack vectors. If two people are
working on a website, one person's
implementing the database stuff, one
person's implementing the HTML, and the
database person's like, "Oh, I don't
need to worry about escaping characters
because we're using the
pattern attribute in the HTML." Bad idea
because it's this easy to hack a
website, disable features that have been
written for the site by just literally
deleting them in your own copy. So,
we'll see next week how we can defend
against this on the server side, but the
point now is just not to trust the
user's input at all.
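Here's a minimal sketch of what re-checking on the server might look like in Python, assuming a SQLite table named registrants (hypothetical) and the same .+@.+\.edu pattern as register.html. It's not the course's own code, just an illustration of two defenses: validate again on the server, and pass input to SQL as a parameter rather than pasting it into the query:

```python
import re
import sqlite3

# The same rule as the pattern attribute in register.html: something, "@",
# something, then a literal ".edu". fullmatch mirrors the whole-value matching
# that HTML's pattern attribute performs.
EMAIL_PATTERN = re.compile(r".+@.+\.edu")


def register(db: sqlite3.Connection, email: str) -> bool:
    """Re-validate on the server, then store the address with a parameterized query."""
    if EMAIL_PATTERN.fullmatch(email) is None:
        return False  # never assume the browser's checks actually ran
    # The ? placeholder makes sqlite3 treat the value as data, not SQL,
    # so quotes and semicolons in the input can't alter the statement.
    db.execute("INSERT INTO registrants (email) VALUES (?)", (email,))
    db.commit()
    return True


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE registrants (email TEXT)")
print(register(db, "mail"))                                     # False: no @ sign
print(register(db, "x'); DROP TABLE registrants; --@evil.edu"))  # True: matches, stored as plain text
```

The second call shows why the pattern alone isn't a defense: that string satisfies it, and only the parameterized query keeps it harmless.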
All right. How can we be sure our HTML
is right? Well, there's a bunch of ways,
but one tool that's worth knowing about
is this one here, validator.w3.org,
which is a website by the group that
essentially standardizes this and other
languages. If I click on their Validate
by Direct Input tab, and I quickly go back
into VS Code, let me grab the
simplest of my examples, hello.html, I
can just copy paste that into their
website. Click check and they have
written code to validate that the HTML I
have written is in fact correct.
Anything I've opened that needs to be
closed has been closed. I don't have any
stupid typos or missing brackets or
quote marks. This is a wonderfully
useful tool just to validate that your
code is syntactically correct. Even
though it might still look like a mess
visually on the screen, this will at
least check for you the underlying HTML.
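The core of that open-versus-closed check can be sketched with Python's HTMLParser and a stack. This is a much simplified, stricter stand-in, not the validator's actual algorithm: it knows HTML's void elements like img and input, but it would complain about end tags HTML lets you omit, like </p>. The class and function names are hypothetical:

```python
from __future__ import annotations

from html.parser import HTMLParser

# HTML's void (empty) elements: they never take an end tag.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}


class TagBalanceChecker(HTMLParser):
    """Tracks open tags on a stack and records end tags that don't match."""

    def __init__(self) -> None:
        super().__init__()
        self.stack: list[str] = []
        self.errors: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_ELEMENTS:
            self.stack.append(tag)  # must be closed later

    def handle_startendtag(self, tag, attrs):
        # "<img ... />" style: in HTML the slash doesn't close non-void tags.
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS:
            self.errors.append(f"</{tag}> is not allowed; <{tag}> is a void element")
        elif self.stack and self.stack[-1] == tag:
            self.stack.pop()
        else:
            expected = f"</{self.stack[-1]}>" if self.stack else "nothing"
            self.errors.append(f"unexpected </{tag}>; expected {expected}")


def check_tags(page: str) -> list[str]:
    """Return a list of problems; an empty list means every tag was closed properly."""
    checker = TagBalanceChecker()
    checker.feed(page)
    checker.close()
    leftovers = [f"<{tag}> is never closed" for tag in reversed(checker.stack)]
    return checker.errors + leftovers


print(check_tags("<p><b>hi</p>"))
```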
All right. So, up until now, everything
I've done has been pretty boring. It's
black and white. The pages are fairly
simplistic. Turns out we can take things
the final mile using another language
altogether. Namely, something called
CSS, which is the second of our three
languages today. This, too, is not a
programming language, although
curiously, they keep adding more and
more features that are making it more
and more like a programming language,
but more on that another time. This
stands for cascading stylesheets. And
whereas HTML is all about the skeleton
of a website, the structure thereof, CSS
is like the skin, the aesthetics
thereof, the final mile that actually
allows you to control the positioning of
things more precisely, the colors, the
font sizes, all of the aesthetics. It
lets you do the finer touches on the
website. And with CSS, we have slightly
different syntax, but frankly, it just
boils down to even more key value pairs.
And as with HTML, we'll give you a taste
of the basic structure and principles
underlying CSS. There are so many key-
value pairs that are possible that we
certainly won't do them justice today,
but it's the kind of thing where you
ultimately look it up in a reference, a
book, a website, or the like to pick
up even more than these techniques.
Well, let's do this. Let me propose that
in a moment we're going to see what are
called properties. This is CSS's jargon
for key-value pairs. Why do we have yet
another word? Because a different group
of humans in a different room came up
with this language versus the other
people. But it's just key-value pairs,
known now as properties instead of as
attributes, as in HTML itself. There's going
to be different ways we can define
properties and this is kind of a laundry
list of some of them and we'll see them
in context. But in short, CSS is just
going to allow us to slap a whole bunch
of key value pairs on our HTML elements
to make them hopefully look prettier or
be more precisely controlled
aesthetically. So, in my HTML, thus far,
we've generally had something that looks
like this. Turns out, if I want to start
using some CSS, I can introduce, as
we'll see, a so-called style tag in the
head of my page. And inside of that
style tag, I can put these so-called key
value pairs. Or, as we'll soon see too,
if I want to factor them out and put
them into a separate file, I can
actually use a link tag, which
confusingly has nothing to do with
hyperlinks or clickable text, but just
links in another file. In this case,
styles.css, the relationship of which
shall be that of stylesheet. This is the
sort of copy-paste stuff that you do
where the only thing you really care
about as the developer is the name of
the file in which you're putting your
styles. All right, let's do this. Let me
go back over to VS Code, close out
register.html, open up a new file this
time called home.html, and let me
purport to make a simple homepage for
someone like John Harvard. I'll copy
paste my boilerplate. I'll change the
title here to be, let's say,
home. And then inside of the body of
this page, let's do the simplest web
page possible for someone called John
Harvard. I'm going to say here's a
paragraph of text, in which John Harvard
is going to be the person's name. Here's
another paragraph of text; Welcome to my
homepage will be in the middle of this
page. Then a final paragraph of text,
inside of which is something like a copyright notice
for, how about, John Harvard down here.
So, it's a basic website. It's just
three paragraphs. It's not going to be
pretty, but let's make sure I haven't
done anything wrong. Let me close my
developer tools. Click back. Click home.
And there we have it. The simplest of
pages for John Harvard. Welcome to my
homepage. Copyright John Harvard. Let's
at least start to exercise some control
over this. Let's change the font size
and the alignment of the text. So, back
in VS Code, let's go ahead and add,
for now, actually, not even a style tag,
but a style attribute. I'm going to go
ahead here and type in style
equals quote unquote font-size colon
large semicolon, and then text-align
colon center semicolon. And I apologize,
but semicolons are back in CSS. Then, in
my next paragraph's open tag, let's do
something similar but different: font-size
colon medium, for medium, text-align
colon center semicolon. And then
lastly down here, let's do style equals
quote unquote font-size colon small,
because it's the footer, so who cares?
Text-align colon center semicolon.
Strictly speaking, after the last key-value
pair, otherwise known as a property, you
don't need the semicolon, but just for
consistency, I'll keep them.
All right, let's go back to this
page, reload, and watch. All of the text
a moment ago was left aligned and the
same size. Now, it's a little subtle,
but it's clearly centered, but it's
large, medium, and small, respectively.
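Since a style attribute is just a string of key-value pairs, a toy Python parser shows the structure, including why the final semicolon is optional. It's naive by assumption: it would mis-split values that themselves contain semicolons, such as some quoted strings, and parse_style is a hypothetical name:

```python
from __future__ import annotations


def parse_style(style: str) -> dict[str, str]:
    """Split a style attribute into CSS property/value pairs (toy parser)."""
    properties: dict[str, str] = {}
    for declaration in style.split(";"):
        if not declaration.strip():
            continue  # the semicolon after the last declaration is optional
        name, _, value = declaration.partition(":")
        # Property names are case-insensitive in CSS, so normalize them.
        properties[name.strip().lower()] = value.strip()
    return properties


print(parse_style("font-size: large; text-align: center;"))
# {'font-size': 'large', 'text-align': 'center'}
```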
Even if you've never seen CSS before,
what rubs you wrong about this design,
though, based on all weeks past?
Yeah.
>> Yeah. For every line, I've been
repeating myself with text-align center,
text-align center, text-align center.
And if we really want to nitpick, these
aren't really paragraphs, right? There's
like no phrases or full sentences, let
alone paragraphs. So, it turns out
there's a whole bunch of tags we can use
to lay out a page. And in fact, I'm
going to transition to one that's a
little more generic than paragraphs,
namely div, which is just going to
create a division in the page for me.
And this doesn't have any functional
impact, but semantically it's a little
nicer because it means I've got the
division here for the header, the
division here for the main part, and the
division down here for the footer. It's
just a different way of thinking about
it, as just different rectangular swaths
of the page. But I like your point that
text-align center is kind of stupidly
duplicated all of these times. Let me
actually go ahead and first reload this
change because there is one side effect
that we might want to get back. When I
reload now using divs instead of
paragraphs, well, there goes the nice
white space in between my text. Divs
just give me rectangle after rectangle.
And as an aside, let me control-click or
right-click, open up developer tools yet
again, and notice this other trick with
your Elements tab. Whatever you hover
over at the bottom of your screen will
be color-coded at the top of the screen.
So if I dive into the body by clicking
this little triangle, let me zoom in. At
bottom left, I can now see my own HTML
much more pretty-printed and colorful
down here. If I click on this one or
hover over it, you'll see that the first
div, the rectangular region is
highlighted. Now the second, now the
third. That's all we mean by divisions
of the page. Um, this allows me to see
my copy of it in the browser as opposed
to in the original file. So just another
technique for developer tools. All
right, but I don't like this
duplication, but here is now the C in
CSS. Cascading stylesheets means that if
you want one property or key value pair
to sort of cascade down on all of the
other tags inside of that one, you can
do that. For instance, in the body tag,
I can add my own style attribute here
and put all of that text align center
there. Why? Because the three divs are
children of the body tag, to borrow our
vernacular from family trees and from
trees more generally. So, this too
should work because text align center
should cascade down now on all three of
those children. And indeed, if I reload
the page, nothing visually changes, but
it's arguably now better designed.
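To make the "cascading down" idea concrete, here is a small, hypothetical JavaScript model (not how browsers are implemented): each node may declare its own properties, and a property like text-align that a child doesn't declare falls back to the nearest ancestor that does. Strictly speaking, CSS calls this behavior inheritance; the cascade proper is how conflicting rules get resolved. The node shape `{ tag, style, children, parent }` and the `computedValue` function are assumptions for illustration.

```javascript
// Hypothetical toy model of CSS inheritance (not a browser API).
// Each node has a tag name, its own declared style, and children.
const body = {
  tag: 'body',
  style: { 'text-align': 'center' },
  children: [],
};
for (const size of ['large', 'medium', 'small']) {
  body.children.push({ tag: 'div', style: { 'font-size': size }, children: [], parent: body });
}

// Walk up from a node until some ancestor declares the property,
// mimicking how an inherited property such as text-align reaches children.
function computedValue(node, property) {
  for (let current = node; current !== undefined; current = current.parent) {
    if (property in current.style) {
      return current.style[property];
    }
  }
  return undefined; // a real browser would fall back to the property's initial value
}

for (const div of body.children) {
  console.log(computedValue(div, 'font-size'), computedValue(div, 'text-align'));
}
// large center
// medium center
// small center
```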
All right, what more could we do here?
Well, how about this? It would be nice
to make clear to servers out there, like
search engines, like what's going on in
the page semantically. And the term of
art out there nowadays is the semantic
web, which essentially is about putting
more hints in your HTML so that servers
like search engines kind of know more
so what they're looking at. This is
pretty generic right now. Div, div, div.
But presumably the top of the page is
among the most important things because
that's effectively like the header of
the page. Then the middle div is kind of
the second most important because it's
like the main part of the page and the
footer is like the least important. So
it turns out there are other tags in
HTML besides paragraphs and divs. There
are literally tags like header which
allows me to define the header of the
page, main which allows me to define the
main part of the page and then even
footer which allows me to define that
too. So now if Google and Bing and other
search engines are sort of crawling my
website once it's public, they know that
John Harvard's important because it's in
the header, uh, welcome to my homepage
is important because it's in the main
page. They're probably not going to care
as much about the copyright because it's
in the footer. So it's just providing
more hints to these kinds of services.
Um, moreover, we can do some other
things here. This is kind of a hackish
way to implement a copyright symbol.
HTML also has what are called entities
where I can do this magical
incantation here: ampersand, hash
symbol, 169, semicolon. Notice that VS
Code recognizes this as an HTML entity.
If I go back to this page and notice my
first approach was just parenthesis C
parenthesis. If I reload now, having
used that HTML entity, which I only know
by having looked it up, now I get the
copyright symbol that actually comes in
the font that's being used here.
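A numeric character reference like &#169; just names a Unicode code point in decimal: 169 is U+00A9, the copyright sign. As a hedged sketch, the hypothetical helper below decodes decimal references in plain JavaScript. It is not a full HTML entity decoder, so named entities like &copy; and hexadecimal forms like &#xA9; are left alone, and the footer string is made up for illustration.

```javascript
// Hypothetical helper: decode decimal numeric character references such as
// "&#169;" into the characters they name. Named entities are not handled.
function decodeDecimalReferences(text) {
  return text.replace(/&#(\d+);/g, (match, digits) =>
    String.fromCodePoint(Number(digits))
  );
}

// Made-up footer text, just for illustration.
const footer = 'Copyright &#169; John Harvard';
console.log(decodeDecimalReferences(footer)); // Copyright © John Harvard
console.log((169).toString(16)); // a9, i.e., code point U+00A9
```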
All right, so let's transition now to
this approach whereby I claimed before
that you can actually use a style tag.
And why might we want to do this? Well,
looking back at my code here, this is
sort of hinting at potentially bad
design. Even though there are different
arguments for and against this, right
now I'm sort of co-mingling my data with
my presentation thereof. Like John
Harvard, welcome to my homepage and
copyright such and such is sort of the
data I care about. Um, but I'm sort of
mixing in the stylization of all of this
stuff by putting CSS and HTML in the
same place. So to be clear, all of the
green stuff and even well everything
we've seen thus far, the tags and the
attributes, that's all HTML syntax.
Everything between the quotes is now
CSS. And we've seen something like
this before only in the sense that we've
used SQL inside of Python code. Here
we're using CSS inside of HTML code. But
the CSS syntax is everything thus far
inside of those quote marks. Wouldn't it
be nice to kind of factor that out so
that I can see it all in one place and
better still factor it out ultimately to
another file? And I can do this as
follows. In my home.html, let me get
rid of all of these style attributes and
really go whittle the page down to its
essence whereby I just have the header
main and footer tags inside of which is
that content. It's already easier to
read, at least for me, the human. Inside of
my head tag, though, let me now go up and
say style, and inside of this new style
tag, let me show you another approach for
stylizing the page. Up here is where we
can actually select elements to operate
on using what are called selectors. So
if I want to modify the style of my
page's body, I can do that by typing
body. And then, in curly braces (I'm afraid
they're back in CSS, too), I can put text align
center up here. And the fact that I've
put the word body before those curly
braces just means all of these key value
pairs, one in this case, will operate on
the body. Meanwhile, down here, I can
say the header is going to have font
size colon large. Uh, the main part of
the page is going to have font size
colon medium. And then lastly, the
footer of the page is going to have font
size colon small. You know, definitely
more lines now, which isn't the best,
but the effect now if I go back to my
browser and reload visually is pretty
much the same. I've just relocated all
of those key value pairs elsewhere, but
as a stepping stone now for doing
something a little smarter whereby I now
can lay the foundation for putting
this in another file altogether. But
first, let me note this, too. The fact
that I've tied all of these key value
pairs to specific HTML tags
doesn't really make them very usable or,
rather, reusable. Earlier, I
alluded to the fact that these properties
can be applied via different kinds
of selectors: type selectors, class selectors,
ID selectors, attribute selectors. Let's
just give you a little taste of this.
What do we mean? Well, suppose that I
want to generically be able to use text
align center without associating it
only with the body. Maybe I want to use
this for a larger project where I want
to center many things on the page. I
can define my own keyword, like the word
centered, which doesn't exist per se, but
if I prefix it with a dot, what I've just
created is what's called a CSS class. And
a class is just a set of key value pairs,
that is, properties, that you can associate with
any HTML tags. Meanwhile, if I want this
key value pair to be associated with the
notion of large, I can define dot large. I can
define dot medium, and I can define dot small
down here. The motivation for this is
that now, in my page, if I want to
center the body, oops, let me fix my own
typo. If I want to center the body, I
can say please use the class known as
centered on this tag. And then on the
header, I can say please use the class
known as large on this tag. And then
please use the class called medium here.
And then lastly, use the class called
small here. So now in the spirit of a
lot of the modularization we did in
Scratch and in C and Python of making your
own functions, classes aren't functions,
but they are a way to encapsulate one or
more properties and use or reuse them
anywhere you want in a web page. It's
not that impressive
here in this short one, but it lays the
foundation for doing much more
interesting things soon down the road.
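One way to picture what a class buys you, sketched in JavaScript under simplifying assumptions: a stylesheet maps class selectors like .centered and .large to bundles of properties, and an element's declared styles are the merge of the bundles for whichever classes its class attribute lists. The `stylesheet` object and `declaredStyles` function are hypothetical. Real browsers also weigh specificity, not just rule order, when rules conflict.

```javascript
// Hypothetical model of a stylesheet: class selector -> properties.
const stylesheet = {
  '.centered': { 'text-align': 'center' },
  '.large': { 'font-size': 'large' },
  '.medium': { 'font-size': 'medium' },
  '.small': { 'font-size': 'small' },
};

// Given an element's class attribute (space-separated names), merge the
// properties of every matching class rule.
function declaredStyles(classAttribute) {
  const classes = new Set(classAttribute.split(/\s+/).filter(Boolean));
  const styles = {};
  // Apply rules in stylesheet order, so a later rule overrides an earlier one
  // when two classes set the same property (all rules here have equal specificity).
  for (const [selector, properties] of Object.entries(stylesheet)) {
    if (classes.has(selector.slice(1))) {
      Object.assign(styles, properties);
    }
  }
  return styles;
}

console.log(JSON.stringify(declaredStyles('centered large')));
// {"text-align":"center","font-size":"large"}
```

Because the bundles live in one place, any number of tags can reuse .centered without repeating text-align center on each of them.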
In fact, let's take a step in that same
direction. Let me go ahead and now
highlight everything I've put inside of
this style tag
and cut it onto my clipboard. I'm
going to get rid of the style tag
altogether. I'm going to quickly create a
new file, home.css, and I'm just going to
paste all of that stuff in there. And
just to be nitpicky, I'm going to
de-indent it so it's all left aligned.
So all I've done is just move everything
I just wrote into a new file called
home.css.
I'll close that. Out of sight, out of
mind. But what I'm going to do now in
the head instead of a style tag which
contained all of that clutter, I'm going
to say link href equals home.css, and
then this rel attribute, which just means the
relationship of this file to this one
should be that of a stylesheet. And this
tag, too, does not need to be closed. It
just is. And now if I go back here and
reload, still no changes other than the
font tweak from a moment ago. Still no
changes. But now it's better design with
that file completely separate. So where
are we going with this? Well, just to
kind of circle back to something we did
earlier, let me open up my terminal
window. And recall earlier we had this
file like favorites0.html.
And this contained all of the data from
a couple of weeks back that we solicited
via that Google form. And recall a bit
ago when we went into favorites0.html,
I mean, it was just kind of an ugly
table structure. But it turns out in the
world of HTML and
CSS, there are also what we're going to
call frameworks, which is a fancy word
for library. But a framework is sort of
a way of doing something by using
someone else's library. And to do it
their way, you just read their
documentation and then you adopt their
functions in the case of code or you
adopt their CSS classes in the case of
this example. So, one of the most
popular frameworks out there nowadays
and among the simplest and best
documented is one called Bootstrap,
which is a set of CSS classes and
other features that you can use because
it's open source in your own code. And
in fact, all of the documentation is at
this URL here. I read the documentation
before class and I copied really the one
line of code that I need to make
favorites.html
even prettier. So, let me go back into
VS Code and let me copy my pre-made
example from earlier. And you'll see
that in favorites, whoops,
favorites1.html,
I have all of the same code, all of
those lines of everyone's submissions.
But notice I've added now this link tag.
And it's a little longer than the one I
wrote. It's referencing a third-party
website, jsDelivr, which is a CDN,
content delivery network, which is to
say a server that just serves up content
for other people to use. But I copied
that from Bootstrap's own documentation.
And what I did here is the following. I
added a class to my table tag
specifically with a value of table,
followed by a space, then table-striped. Why?
Well, I read Bootstrap's documentation
at that previous URL and I liked the
look of their tables because it lays it
out with nice stripes like white and
gray and white and gray and it sort of
formats everything quite a bit nicer.
So, if I go into this version in my
second tab by going back first and now
opening up favorites1.html: same
exact data, two lines of change, and
voila, now we're talking. This looks
much more like a table that you would
see on any pretty website like your
Gmail inbox or the like, all by simply
changing the CSS and not really the HTML
at all. So, the motivation for
introducing those classes a moment ago
was so that we can have reusability of
code. And better still, we can start to
stand on the shoulders of others by
using code that other people have
written in order to improve the
aesthetics of our own websites
as well. All right, how about a couple
of final flourishes with some style? Let
me close out these examples here and let
me propose to go into how about that
same link example from earlier. So, let
me reopen link.html, which recall had
this phishing attack at the time. I'm
going to revert this to the safe version
and just say visit Harvard at Harvard's
actual URL. Suppose I wanted to stylize
this link beyond the default. Well,
let's see what it looks like by default.
If I go back into link.html, this is
what it looked like before, blue and
underlined by default per the browser's
decision. But I can override that in
any number of ways. To keep things
simple, I'm just going to stay in my
same file now rather than be pedantic
about moving it to another file. And if
I want to stylize the anchor tag, just
as before, I can say a and then in some
curly braces here, I can do something
like this: color colon red, if I want
to make it crimson-like. Let me
go back to my other tab,
click reload, and now we have a red link.
I can really geek out. And if you
remember your hexadecimal codes from our
discussion of images a few weeks back, I
can do hash FF0000,
which is a lot of red, no green, no
blue. And if I go back to my other tab,
click reload, same exact thing. You have
that much control over even the color
codes that you might use. Maybe you
don't like the underlining in this
particular case. Well, that's fine. I
can do something like text decoration
colon none, per the documentation. I can reload
and gone is that underline. Maybe it'd
be nice to hover over the word and then
see the underline. Well, I can do that,
too. Turns out I can have these pseudo-class
selectors whereby I say the name of the
tag, then a colon and a keyword like hover, which
browsers know to recognize. And when I
hover over an anchor, what I want to do
is change the text decoration to
underline temporarily. If I go back to
this tab now, reload, looks the same,
but as I move my cursor over, notice
that it's underlining it for a visual
effect. Let's see what's going on with
my developer tools. If I right click
anywhere and choose Inspect, notice a
detail I haven't shown you before
under the Elements tab here.
Notice if I go down to my link here and
let me just make the right hand pane
here a bit bigger. All this time but
ignored up until now has been this part
of developer tools whereby I can
actually see all of the CSS that applies
to the element I have just selected,
namely this link. And I see here in nice
pretty-printed fashion that I'm using
this color, hash FF0000, and text decoration
none. Why is this useful? Well, one, if
you want to learn from another website
how it's doing its thing, you can just
look at the CSS, but also if you want to
be able to iterate more quickly and just
kind of tinker with things, I can
actually turn the color on and off by
just hovering over the inspector here
and just turn it on and off by clicking
and unclicking. And if I want to just
play around with, oh, maybe
Harvard should be hash 00FF00, enter, I
can make it green instead. So, you can
temporarily change the browser's copy of
your own HTML or CSS just to tinker and
iterate quickly just like I tinkered
with Stanford's own website or at
least my own copy thereof.
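The hexadecimal colors above pack three bytes (red, green, and blue) into six hex digits, so hash FF0000 is (255, 0, 0) and hash 00FF00 is (0, 255, 0). Here's a minimal sketch of that decoding with a hypothetical hexToRgb helper that accepts only the six-digit form. CSS also allows shorthand like #F00 and an optional fourth alpha byte, which this deliberately rejects.

```javascript
// Hypothetical helper: decode a six-digit CSS hex color like "#FF0000"
// into its red, green, and blue components (each 0 to 255).
function hexToRgb(hex) {
  const match = /^#([0-9a-f]{2})([0-9a-f]{2})([0-9a-f]{2})$/i.exec(hex);
  if (match === null) {
    throw new Error(`not a six-digit hex color: ${hex}`);
  }
  const [, red, green, blue] = match;
  return {
    red: parseInt(red, 16),
    green: parseInt(green, 16),
    blue: parseInt(blue, 16),
  };
}

console.log(hexToRgb('#FF0000')); // { red: 255, green: 0, blue: 0 }
console.log(hexToRgb('#00FF00')); // { red: 0, green: 255, blue: 0 }
```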
Lastly, how about in terms of these
selectors? These are using type
selectors that is selecting the name of
the tag. If I want to actually uh affect
one tag specifically, a very common
convention is to give an HTML element a
unique ID. For instance, I'm going to
call this Harvard. And by the honor
system, I should not give any other
element in this page an ID of Harvard.
The motivation is that I can now
uniquely identify this tag by, for
instance, changing this selector to hash Harvard,
which is just the convention for
specifying that it's not a class now.
It's instead an ID. You do not put the
hash, though, in the actual value down
here. And what I can even do down here
is something like hash Harvard colon hover to
scope that hover rule as well. If I now reload,
we're back to the red version and the
same functionality as before. And it's
just a more precise way now to target
your CSS properties to a very specific
element instead.
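Stylesheets (and, later today, querySelector) rely on the same three simple selector forms covered here: a bare tag name, a dot for a class, and a hash for an ID. Here's a hedged sketch of how one such simple selector could be matched against an element, modeling elements as plain objects. The `matchesSimpleSelector` function and the element shape, including the large class on the link, are hypothetical, and real selectors can be combined (like a:hover) in ways this ignores.

```javascript
// Hypothetical element model: tag name, optional id, list of classes.
const link = { tag: 'a', id: 'harvard', classes: ['large'] };

// Match one simple selector: "#id", ".class", or a bare tag name.
function matchesSimpleSelector(element, selector) {
  if (selector.startsWith('#')) {
    // IDs are compared exactly (case-sensitive).
    return element.id === selector.slice(1);
  }
  if (selector.startsWith('.')) {
    return element.classes.includes(selector.slice(1));
  }
  // Tag names in HTML documents are case-insensitive.
  return element.tag.toLowerCase() === selector.toLowerCase();
}

console.log(matchesSimpleSelector(link, 'a'));        // true
console.log(matchesSimpleSelector(link, '#harvard')); // true
console.log(matchesSimpleSelector(link, '.large'));   // true
console.log(matchesSimpleSelector(link, '#yale'));    // false
```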
Okay, that was a lot. Any questions on
any of this thus far?
No. That clear? All right. Well, one
last language for the day. And we do
mean what we say like that is the extent
to which you will learn formally HTML
and CSS like everything else just
follows those exact same patterns. It's
different classes. It's different
attributes. It's different tag names.
All of which can be picked up through
practice, through osmosis, through
references. But that's really it for the
fundamentals. And so our last focus
today is on an actual programming
language that we'll just scratch the
surface of, if only because it's so darn
omnipresent nowadays. Most every website
you use is made from not only HTML and
CSS, but if it's in any way interactive,
odds are it's using JavaScript, a
programming language that is very
commonly used client side whereby humans
write the code on the server, but then
your browser as before downloads it to
the client and then it runs in your own
Mac, your PC or your phone. That said,
JavaScript is also very popular on the
server nowadays. It's not just a
browser-based language. With JavaScript,
what you have most powerfully though is
the ability in memory to mutate this
tree in real time. In other words, think
about even your Gmail inbox or your
Outlook inbox. Typically, you see email
after email after email after email.
Odds are, per today, what HTML tag is
creating that UI of row after row after
row?
Which tag?
Like table tag? Like the table tag,
probably, right? Table row, table row, table
row. But it wouldn't make, well, actually,
this is the way things used to work in
my day. Back in the day, when you visited
not even Gmail, before it existed, but
your email inbox, you would download from
the server a web page containing a table
tag with table rows and table data
elements, and that was your inbox. If you
wanted to see if you got new mail, you
just reloaded the whole page, and it would
download new contents from the server
and show you the new HTML. With
JavaScript,
which has come onto the scene over the
past 20-plus years, you have the ability
to download the data once initially,
then use code to just grab some more
data every 30 seconds or some more data
pretty much anytime an email arrives.
And if this picture here represents not
our super simple hello title, hello body
page, but a whole bunch of table rows
for your existing email, then the moment you
get more email, you can use JavaScript
code to add another node to this tree,
another node to this tree, representing
the table row tag, the table row tag
again and again. So in short, with
JavaScript, you have the ability to
change the tree, otherwise known as the
document object model or DOM for short,
dynamically in order to evolve the web
page. So let's take a quick tour of what
JavaScript does have syntactically and
then I'll just demonstrate some of the
capabilities thereof without dwelling
today on syntax beyond this. So in
Scratch, which is looking pretty good
now, you had conditionals which looked
like this. In JavaScript, it's pretty
much the same as C. The curly braces are
back, at least for bodies of two or more lines.
But indentation doesn't matter
except for style's sake, in
contrast with Python. If you have an
if else, it's going to look the exact
same as in C. If you have if, else if, else,
you have the exact same thing in C.
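Here's a short sketch of the C-like syntax described in this stretch of the lecture: if, else if, and else, plus the let, plus equals, plus plus, and for forms covered just after. The function name `describe`, the word "meow", and the other sample values are made up for illustration.

```javascript
// Hypothetical example function: conditionals look just as they do in C.
function describe(x, y) {
  if (x < y) {
    return 'x is less than y';
  } else if (x > y) {
    return 'x is greater than y';
  } else {
    return 'x is equal to y';
  }
}

// let declares a variable without naming its type.
let counter = 0;
counter = counter + 1; // the verbose way
counter += 1;          // the shorthand
counter++;             // increment, as in C
console.log(counter);  // 3

// A for loop that runs three times, the same as in C except for let.
const lines = [];
for (let i = 0; i < 3; i++) {
  lines.push('meow');
}
console.log(lines.join(' ')); // meow meow meow
console.log(describe(1, 2));  // x is less than y
```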
Different from Python, because this was elif
in Python. Now we're back more
verbosely to else if, as in C.
Variables in JavaScript. Well, here in
Scratch is how you set a variable
counter to zero. In JavaScript, there's
a few ways to do this, but the most
reasonable for now is to let counter
equal zero. So, you don't specify the
type. This is more of a polite way of
asking the browser, please let a
variable called counter exist and set it
equal to zero by default. Semicolons are
back, though, strictly speaking, they're not always required.
Browsers are smart enough to know where
semicolons actually matter, but for our
purposes, assume that they're always
there. How do you change counter by one?
Well, you can do it the pedantic way,
which is a little verbose. You can do
the plus equals trick, or, nicely back in
play, there's plus plus in JavaScript, just
like in C but not in Python. Loops in
JavaScript. Well, in Scratch, if you
want to do things three times, here's
how you would do it in JavaScript. It's
pretty much the same as C except for not
mentioning the data type. Instead, you
use the keyword let here. But otherwise,
this is exactly the same as in C. Uh if
you want to do something forever for
whatever reason in JavaScript, you can
say while true, which is exactly how we
did it in C. If you have a web page like
this, meanwhile,
and you want to insert some JavaScript
into it, you can do it in a few
different ways. You can put a script tag
just like the style tag in the head of
the web page. This can get you into
trouble though for reasons you might
encounter whereby if you put your
JavaScript code up here and you try to
use it to modify the web page but the
web page isn't defined until down here
you can get into a race
condition, really, where the data does not
yet exist. So instead of putting
it there or even in another file, it's
actually pretty common to avoid that
altogether by putting your script code
or your script tag at the end of the
page just before the end of the body to
ensure that all of the web page exists
already. This is similar in spirit to
the def issues we saw in Python or the
prototype issues we saw in C. There are
a bunch of solutions, though, to this
problem. But let's now take some
JavaScript for an actual spin and use VS
Code to write some of it as follows.
In VS Code, let me go ahead and close
link.html,
open up my terminal temporarily, and,
actually, let's just
improve the very file, hello.html, that
I have here in front of me, and actually
have it be more interactive and give me
sort of a popup on the screen when I
type in my name. So, let's start as
follows. First, let's go ahead and
change this just to hello, just for
short. And in the body of this page,
let's give myself a form. And in this
form, let's give myself an input.
We'll turn off autocomplete just to
avoid distractions. We'll turn on
autofocus to save me a click. I'm going
to give this HTML element an ID uniquely
of name. A placeholder also of name just
so the human knows what to do. And the
type of this field shall be text. In
other words, I want to create a program,
like in week zero and week one, where I type in
my name and see hello, such and such. I'm
going to give myself another input, a submit
button, with input type equals submit. I
don't really care what the button says,
but I do care now. When I go back to my
other tab, close my developer tools, go
back into hello.html, I now have
something that looks like this. It looks
similar to our search example for cats,
but now I'm asking the user for their
name along with the submit button. But
what I want to have happen is when I
type in David and click submit, I want
to see hello David somewhere on the
screen. Well, how can I do this? Well, a
few different ways, but JavaScript
allows me to do things like this. And
for upcoming problem sets, you won't
necessarily have to write JavaScript
like this. So consider this a whirlwind
tour, not so much something to
ingrain. Here I can add a new attribute
to the form tag called onsubmit, which
as the name suggests means call the
following function when this form is
submitted. Well, what function do I want
to call? I'm going to call it a greet
function. And that's it for now. How do
I define a greet function? Well, I
could, among other places, put this
inside of the head of my page in a
script tag. I can define a function in
JavaScript by literally saying function
and then the name of the function and
then in parenthesis any arguments there
too. I'm not going to have any. And then
in curly braces, I can actually define
the meat of that function. And for
instance, I can do this: let name
equal the following, document.querySelector.
And now what I want to do is
this. Document is a global variable that
just comes with JavaScript in the
browser that allows me to write code
involving the whole document, the web
page itself. querySelector is a fancy
name for a function that lets me select
specific elements of the page using CSS
selectors. So the very same syntax we saw
with names and with dots and with hash
symbols a moment ago are back in play
for JavaScript here. So if I want to
create a variable that stores the name
that the human typed in, what I can do
is pass to querySelector a selector for
that element, which is quote unquote
hash name, where hash just means ID. But
the reason I'm using name is because the
unique identifier I put here is name. If
I change this to foo nonsensically,
that's fine. I just have to change this
to foo up here. So I'm in full control
over what is called what. But if I want
to get the value that the user typed
into that box, I now do .value. And we've
seen these dots before. In C, they were
for accessing structs. In Python, they
were for accessing contents of objects.
So this just means use the document
global variable, use the querySelector
function or method inside of it, get the
element whose unique ID is name, and
then go inside of that text box and give
me its value. So it's a very long-winded
way of saying store the user's input in
a variable called name. But what's nice
now, even though this is going to be a
bit ugly, is I can then use a built-in
JavaScript function called alert. And I
can say something like hello, close
quote, then plus, which we've seen
before in Python, and concatenate with
it the value of name. Now, this isn't
quite complete, and for reasons I'm going
to wave my hand at, I also need to add,
annoyingly, return false down here
because otherwise if I click submit,
yes, the greet function will get called,
but the browser will still try to submit
the form to a server which is going to
interrupt my own code. So, long story
short, this is a bit of a hackish
approach for now to just making sure
that the only thing that happens when I
submit this form is that my function is
called. Now, if I didn't screw anything
up, I should now see after reloading
this page a prompt for my name. I'll
type it in and when I click submit, I
should see an ugly but functional alert
box pop up with dynamically generated
text, namely hello, David. I say it's
ugly because by convention, Chrome shows
you the full URL or the domain name of
the website in question, which is my
randomly generated one, which does look
stupid. So, we can do better than this.
But the point is now that I have written
code in JavaScript to listen for the
submission of this form and, when that
happens, call that greet function. And
this is generally the paradigm of
JavaScript. There exists in the context
of websites a whole bunch of events that
can happen. And this is a word we
haven't used since week zero in Scratch.
Recall that in Scratch you have events
like when green flag clicked and when
the green flag is clicked you can do
something in response. Same thing in the
world of web programming. Here are just
some of the events that can happen in a
web page. Like the user can change
something, click on something, drag
something, key up (lifting a finger off a key),
put the mouse down, or other things.
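The listen-then-respond pattern can be tried outside a browser: DOM elements get addEventListener from the EventTarget interface, and Node.js (version 15 and later) also provides EventTarget and Event as globals. In this sketch, a bare EventTarget stands in for the form and the typed name is hardcoded, both assumptions for illustration. One hedged note: with addEventListener, returning false from the listener does not cancel the event; calling event.preventDefault() is the standard way to stop a real form from being submitted.

```javascript
// Stand-in for a <form> element. In browsers, elements inherit
// addEventListener and dispatchEvent from EventTarget.
const form = new EventTarget();
let lastGreeting = null;

// Hypothetical stand-in for whatever the user typed into the name box.
const typedName = 'David';

form.addEventListener('submit', (event) => {
  // Cancel the default action (for a real form, sending it to a server).
  event.preventDefault();
  lastGreeting = 'hello, ' + typedName;
});

// Simulate the browser firing a cancelable submit event.
const notCanceled = form.dispatchEvent(new Event('submit', { cancelable: true }));

console.log(lastGreeting); // hello, David
console.log(notCanceled);  // false, because the listener called preventDefault()
```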
What I'm listening for is the submission
of a form, which is cool because in
JavaScript then you can essentially
write code that listens for any number
of these events and then does something
when it happens. Consider after all in
Gmail, if you click the little refresh
icon within Gmail itself to get new
mail, it runs some JavaScript code that,
it turns out, talks to Google's servers,
gets more email, and updates your page. If
you click and drag on Google Maps to see
like higher up geographically, well,
what's happening? Some JavaScript code
is listening for your mouse going down
and dragging so as to go fetch more
tiles, more rectangular pictures of the
map wherever you're trying to drag. So,
anything that's interactive in websites
nowadays like that is using JavaScript
by just listening for things that you or
someone else might actually do. Well,
let me go ahead and start opening some
pre-made examples just to give you a
sense of the other syntax that is in use
today with JavaScript. I'm going to go
ahead and open up a version of this
hello program called hello2.html,
which is different in that I'm
practicing what I preached earlier by
putting the script tag at the bottom of
the page just to ensure that the form
and everything inside of it already
exists for sure by the time this code
executes. Moreover, what I'm getting out
of is the business of using the onsubmit
attribute. So, just as I tried to get my
CSS out of my HTML and put it elsewhere,
similarly, I'm trying to get my
JavaScript code like the greet function
out of the HTML and putting it down
here. Now, why is this useful? This is a
big mouthful, but it just follows a
general pattern as follows.
document.querySelector, quote unquote
form, is just getting a reference to the
actual form element in the page. So if
you imagine in your mind's eye that this
is drawn out as a tree in the computer's
memory, this is just getting me a
pointer to the form node in that tree.
Haven't seen this before, but it kind of
does what it says. addEventListener is
a function or method that you can call
on any element that just tells it to
listen subsequently for this event and
when that event is heard, submit in this
case, call the following anonymous
function, otherwise known as a lambda
function. But long story short, this
syntax just means when submit happens on
that element, execute the code between
these curly braces. What happens? alert,
quote unquote hello, plus document.querySelector
of hash name, dot value. I didn't bother with
the variable this time. This does
exactly the same thing, but is a purely
JavaScript solution without using the
onsubmit attribute. And we show you this
only because especially for final
projects, you might want to do something
like add event listener to make like
maybe a drop-down menu or some
interactive clickable thing in your
website that just listens for one of
these events to happen before actually
executing some code. Notice I've
been conventionally using single quotes
in JavaScript because that's just a
thing in the JavaScript community to
generally prefer single quotes over
double quotes. Why? Well, it means
people in JavaScript are hitting the
Shift key much less than the
rest of the world to get double quotes.
It's just a convention. So long as
you're consistent, either is fine,
assuming you don't have actual
apostrophes in your text and such. Let me
show you one other convention. Instead
of putting my code at the bottom of my
page just before the body ends, it is
also alternatively conventional as in
hello3.html, to still
maybe put the script tag at the top of
the page, but to additionally have this
magical line whereby you add an event
listener before you do anything else
that listens for this crazy, weirdly
named event called DOMContentLoaded.
Now that you've heard of the DOM briefly:
DOM, the document object model, just means
the tree in memory. This is just the
fancy way of saying when that tree is
loaded, go ahead and do the following.
And this ensures that when a browser
reads all of this code top to bottom,
left to right, this code won't actually
be executed until the whole DOM is
loaded into the computer's memory. That
whole tree is built. So that's all
that's being referred to there. The rest
of the code is actually exactly the
same. So, what more can we do? Well,
just so you've seen it, I can delete all
of that code, move it to a file called
something like hello.js. And in the fourth
version of this example, I'm back to
just HTML, because I can put all of the
fancy complexity elsewhere and reference it from my script
tag here, factoring that code out into
hello4.js, but the code is otherwise, I
claim, unchanged.
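The DOMContentLoaded listener in hello3.html can sit at the top of the page because of ordering: registering a listener runs nothing yet, and the callback waits until the event fires. Here's a hedged simulation of that ordering in Node.js, where a plain EventTarget named `fakeDocument` (a hypothetical stand-in for the browser's document) receives an event we dispatch ourselves. A browser fires DOMContentLoaded on its own once the HTML has been parsed into the DOM.

```javascript
// Hypothetical stand-in for the browser's document; a real page uses `document`.
const fakeDocument = new EventTarget();
const log = [];

// Registering the listener runs nothing yet; the callback is only stored.
fakeDocument.addEventListener('DOMContentLoaded', () => {
  log.push('listener: the whole tree exists now, safe to querySelector');
});

log.push('parser: still building the rest of the page');

// In a browser, this happens automatically once the HTML is parsed.
fakeDocument.dispatchEvent(new Event('DOMContentLoaded'));

console.log(log.join('\n'));
```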
All right, this is a lot. I know it's
quick, but do the general principles
make sense? Like just listening for
events and running some code in
response? That's really all we're
talking about, à la week zero with
Scratch.
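As a rough analogy only, here is that listen-then-respond pattern sketched in Python rather than in the browser. The `ToyElement` class and its `add_event_listener` and `dispatch` methods are hypothetical names invented for this sketch, loosely echoing JavaScript's `addEventListener`; in a real page, the browser does the dispatching for you.

```python
from collections import defaultdict
from typing import Callable


class ToyElement:
    """Hypothetical stand-in for a page element that can have listeners."""

    def __init__(self) -> None:
        # Maps an event name (e.g. "click") to the handlers registered for it
        self._listeners: dict[str, list[Callable[[dict], None]]] = defaultdict(list)

    def add_event_listener(self, event: str, handler: Callable[[dict], None]) -> None:
        # Just remember the handler; nothing runs until the event happens
        self._listeners[event].append(handler)

    def dispatch(self, event: str, detail: dict) -> None:
        # The event happened: call every handler registered for it, in order
        for handler in self._listeners[event]:
            handler(detail)


log: list[str] = []
button = ToyElement()
button.add_event_listener("click", lambda detail: log.append(f"clicked {detail['id']}"))

button.dispatch("click", {"id": "red"})
button.dispatch("keyup", {"key": "D"})  # no keyup listeners, so nothing happens
print(log)  # ['clicked red']
```

The point mirrors the JavaScript version: registering a handler and running it are separate steps, and the code only runs when the matching event occurs.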
All right. Well, let me let things
escalate just a little bit. And this
time I'll open the demo first. Let me go
ahead and open up hello5.html, which I
wrote in advance, which, okay, this is
definitely starting to look like a
mouthful, but in a moment it'll make a
bit more sense. Let me go ahead into my
other tab here. Click back. Go into
source 8, which is all of my pre-made
examples. And I said we're in hello5
now. And in hello5, there's no submit
button, because watch this fanciness when
I search for something like, uh,
David as a full-fledged word there.
Notice it's just happening inside of the
web page. Moreover, if you poke around,
let me right-click on the page. Let me
inspect to open my developer tools. Let
me expand the body down here. And
actually, let me reload the page. So,
notice by default, this is what my web
page looks like. It's just got an empty
paragraph tag for some reason. But watch
what happens at the bottom of the
screen. And I'll zoom in a bit more.
When I start typing my name, like D, and
then let me expand this triangle, you
see it beginning: A, V, I, D. When I say
that JavaScript can mutate the DOM, the
actual tree in memory, that's what
you're seeing. You're seeing the
pretty-printed, color-coded version of that
tree in memory. And how is it working?
Well, if we go back to the code here,
well, let me wave my hands at this first
line. This just means don't do this
until the whole DOM is loaded. Let's
look at this line, which means give me a
variable called input, and set that
equal to, okay, the input tag on the
page, the text box, and then do what?
Well, take that input, add an event
listener that's forever listening for
key up, like my finger going up off the
keyboard, and when that happens, call
the following function, which has no
name, but that just means call these
lines inside of the curly braces. Well,
what happens inside of those curly
braces? Well, here's a variable called
name. And this is just pointing at the
paragraph tag. Apparently, I'm checking
this question: if input.value,
which is implicitly like saying if input.value
does not equal quote unquote, then
go ahead and set the innerHTML
of that name
variable equal to hello, comma,
input.value. Now, this is crazy syntax,
and I'm showing it just because you'll
see it in documentation online. This is
similar in spirit to Python's f-strings.
It's ugly syntax with dollar signs and
curly braces and, worse yet, backticks.
However, this is a manifestation of
really the JavaScript community
presumably deciding that if you want the
language to evolve, you have to make
sure you're backwards compatible with
old versions of the language. So, they
chose characters and syntax that
probably do not appear already in the
wild. That's why sometimes things look
uglier, I would surmise, than otherwise.
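For comparison, here is the hello5 logic sketched in Python, where an f-string plays the role of the JavaScript template literal. The function name `greet` is hypothetical; in the lecture's version, this check runs in JavaScript on every keyup and the result is written into the paragraph's innerHTML.

```python
def greet(value: str) -> str:
    """Mirror the keyup handler: greet by name only if the text box is non-empty."""
    # An empty string is falsy in Python, much as in JavaScript,
    # so this plays the role of the implicit "if (input.value)" check
    if value:
        # f-string: Python's analogue of the template literal `hello, ${input.value}`
        return f"hello, {value}"
    return "hello, whoever you are"


print(greet("David"))  # hello, David
print(greet(""))       # hello, whoever you are
```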
But long story short, this just means if
there's input there, go ahead and say
hello, input. Otherwise, it says by
default, hello, whoever you are. And in
fact, if I go back here and delete my
name, watch what happens. It goes back
to that default. So, here is just an
example of listening for keystrokes
going up and down and making sure that
the page responds accordingly. How about
something else? Let me go back into my
directory listing. Let me open up
background.html, which I wrote in
advance. It's super simple, but this is
the first of like an interactive website
that has three buttons labeled R, G, and
B. As you might imagine, clicking on R
does that. G does this, B does that.
Well, how is this working? This is the
first example now where you can use
JavaScript code to alter CSS
dynamically. So, let me reload the page.
So, it's back to white. Let me open
developer tools and watch what's
happening now on the body tag
specifically. Initially, there's no
stylization on the body other than the
browser's default margins and whatnot
over here. But watch what happens at
bottom right when I click on the R
button. You see that all of a sudden
background color red was dynamically
added. Now it's green, now it's blue.
And notice the HTML at bottom left is
changing too. So somehow I am listening
for clicks and then changing CSS in
response. So if I go back to VS Code,
let's close hello5.html. Let's open up
source 8's version of
background.html. And in here, it's a bit
of a mouthful, but the HTML is simple.
Here's three buttons. And because I
wanted them to be uniquely identifiable,
I gave them all IDs of red, green, and
blue, respectively. And then this code
is a bit of copy paste. And frankly, I
could probably avoid that if I were more
elegant. But just to be pedantic, here's
what's happening. Here's a variable
called body that's just getting the body
element, the node in the tree at that
moment in time. And then these three
lines of code, their purpose in life is
to handle the red clicks. How? Well,
we're telling the document to select the
element whose ID is red, listen for the
click event, and whenever that happens,
do this: body, which is the same
variable as before, .style, which we
haven't seen before, but any element can
have a style property associated with it
in JavaScript, .backgroundColor equals
quote unquote red. And the other blocks
of code are exact same thing for green
and for blue. The whole point here is
we're now listening for clicks on
buttons and changing not the contents of
the button but rather the style
of the whole page. As an aside, this is
a curiosity. This is what's
known as camel case, whereby, like a
camel has a hump in the middle, this
word has a hump in the middle, like
capital C all of a sudden, to separate
the two words. In CSS, recall, it was a
moment ago background-color. Anyone
want to guess why this is not how you
write it in JavaScript?
Anything with hyphens in CSS is changed
to camel case in JavaScript.
>> Uh it's not related to comments. It's
simpler than that. Yeah.
>> Yeah. Right. Like left hand wasn't
talking to right hand and people
realize, oh damn it. Like this now means
background minus color which is not a
thing because minus is indeed just like
in C and in Python a mathematical
operator. So, the world decided to
reconcile this problem by just
capitalizing uh the character that would
otherwise be where the hyphen is. Well,
little CSS trivia. All right, what else
can we do? How about a couple of final
examples here? So, what more can we do
with CSS? So, back in my day, too, we
had a tag called the HTML blink tag,
which is among the few tags in the world
of HTML that has actually been
deprecated, that is, removed from the
language. Like, no one removes things
from languages generally, but the blink
tag was so hideous, followed only by the
marquee tag, whereby my own homepage as
a freshman had, like, "Welcome to my
homepage" just moving across the screen
like this from left to right for no good
reason, like an ugly marquee on,
like, digital signage nowadays. But
we can bring it back as follows. So if I
close out my developer tools, go back
into my source 8 directory and open up
blink.html. This is what the blink tag used
to do back in the day. Now, this version
is implemented instead in JavaScript
code as follows. I have a function here
called blink, which I'm apparently
calling every once in a while. Uh, how
is that happening? Well, let's scroll
down. Here's my HTML, super simple.
Literally just says hello world. But
notice this. There's another global
variable we haven't seen in JavaScript
called window. That refers to like the
general window, not necessarily the
contents of the page, where you can call
a method called setInterval. And you
can tell that method, setInterval, to
call a specific function every number of
milliseconds. So if I want to call blink
every 500 milliseconds, that's the line
of code that I use. If I scroll up to
now this function, let's see how blink
is implemented both now and perhaps back
in the day. Well, body is a variable
here that's just pointing to the body
node in the DOM. And this is a big
mouthful, but if that body's style's
visibility property in CSS is quote
unquote hidden, then change that body's
style's visibility property to be
visible. Otherwise, change it to be
hidden instead. Here too, I don't
understand why left hand and right hand
weren't talking to one another. You
would think that the opposite of visible
would be invisible, but in CSS, the
opposite of visible is hidden. Just have
to memorize stupid things like that. But
what's this really doing? It's just
changing the CSS from hidden to visible.
Hidden to visible every 500
milliseconds. So in fact, here's what you're
seeing in the blink if I inspect
this page too. And now notice, it's kind
of fun just to watch it. You can see the
HTML at bottom left and the CSS at
bottom right just automatically changing
because I'm doing that every 500
milliseconds. All right. How about one
other? Well, autocomplete. Well, we saw
a step toward this with my hello, David
example a moment ago. Super common
though in Google and like every website
now to automatically try to finish your
thought. How is that happening? Well,
that's not just HTML and CSS. That is
also some JavaScript thrown into the
mix. So, for instance, let me go into my
terminal and open up source 8's example
called autocomplete.html.
And here I am going to borrow a file
called large.js, which is just a massive
dictionary. I'll open that too if you're
curious. large.js is just a huge
JavaScript array. Arrays are back,
containing all of the words from problem
set five, the spell-checking problem set,
where you had 100,000-plus words
in a file given to you in C. Now we've
converted that to JavaScript by using a
global variable like this in the code
here. What's happening? Well, apparently
there's going to be a text box at the
top of the page that we see.
Then there's an empty unordered list. So
an empty bulleted list. And then there's
this code down here. I'm apparently
creating a variable called input that's
referencing that text box. I'm then
listening for keyup, just like we've
done before. And then I'm doing this.
I'm setting a variable called HTML equal
to quote unquote nothing. So an empty
string. And then I'm checking does the
input text box have any value
implicitly. If so, what am I doing? This
is kind of cool. It's a bit of Python
and C together syntactically: for each
word in the words array. JavaScript uses
the keyword of instead of in like
Python, but so be it. What I'm doing now
in JavaScript is saying, if that
current word in that big file of 100,000
words starts with whatever the user
typed in, go ahead and add to that HTML
string, using plus equals, which is just
concatenation as we've seen with plus before,
the following: an li tag inside of which
is that specific word. And so in effect,
what you're seeing now is what
almost every website nowadays does.
They're not manually writing HTML like
we've been doing much of today. They're
writing code that dynamically generates
HTML because the programmers understand
what HTML is. They understand that
unordered lists have li children. And so
using this string that I've highlighted,
they're creating LI element after LI
element for the purpose of changing the
inner HTML of the UL element to be the
value of that variable. And this is a
very long way of saying how is
autocomplete implemented in general.
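Here is that same loop sketched in Python for comparison. `WORDS` is a tiny, hypothetical stand-in for the 100,000-plus words in large.js, and the call to `html.escape` is an extra precaution not shown in the lecture's JavaScript.

```python
from html import escape

# Tiny hypothetical stand-in for the huge words array in large.js
WORDS = ["cat", "catch", "cats", "catskill", "dog"]


def suggestions_html(typed: str, words: list[str]) -> str:
    """Build one <li> per word that starts with what the user typed."""
    html = ""
    if typed:  # like checking input.value implicitly
        for word in words:  # JavaScript: for (let word of words)
            if word.startswith(typed):
                # += concatenates, just as in the JavaScript version
                html += f"<li>{escape(word)}</li>"
    return html


print(suggestions_html("cats", WORDS))  # <li>cats</li><li>catskill</li>
```

In the browser, a string like this is what gets assigned to the ul element's innerHTML after every keyup.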
Well, just like this, if I search for
cats by typing in C, there's every word
in that 100,000-word dictionary that starts
with C. Then A, T, S, and there's every word
that starts with C-A-T-S. Meanwhile,
watch what happens underneath the hood.
If I open up my inspect tab again and I
go to my body, inside of this is the
empty UL, but watch as soon as I start
typing something like C. Now I can
expand the triangle because there is an
LI element that's been created for every
one of the words that match. As I do
A, T, S, now I've got just four of them. And
there is cats, there is catskill, and
so forth. So anytime you go to
google.com, like we did earlier,
and start searching
for cats, where are all of those search
results coming from? Someone wrote
JavaScript that's listening for key up
or the like and then dynamically
populating an unordered list or in this
case a much prettier list of the
matching results. And the final example
that we thought we'd leave you with, and
again the whole purpose of introducing
JavaScript is to give you a taste of its
syntax and its relative familiarity, but
also the power
with which you can leverage it to make
websites so much more interactive. And
in fact, with Bootstrap, you don't just
get CSS you can use, you have a whole
set of JavaScript functionality. So you
can have drop-down menus and the like.
For instance, among the
things you'll use for an upcoming
problem set and perhaps your final
project is something that looks a little
like this. In bootstrap.html,
here's a whole bunch of code that I
literally copied and pasted from
Bootstrap's documentation. And it's just
like boilerplate code for a corporate
website that has features with pricing
and disabled menu options as well, just
for the sake of discussion. And then
here, if I go back into this example,
you'll see fairly simple website that
looks like this. A so-called navbar with
all of the main menu options of like a
corporate website. And notice if you
start to resize the window, which I'll
do here, and put it into sort of mobile
mode because it's so narrow now, thanks
to JavaScript, it's listening for clicks
on this hamburger menu and revealing the
menu options that way. This is quite
like how CS50's own website works and so
many other websites out there. But the
last one we thought we'd show is this: you're
so in the habit of using Google Maps or
Uber Eats or any number of apps that
need to know your location. That too is
exposed through JavaScript quite simply.
Let me go ahead and, in geolocation.html,
open up the following code,
whereby,
super simple, even though there are some new
functions, there exists another global
variable in JavaScript in browsers
called navigator, which has a property,
an object called geolocation,
which has a function called getCurrentPosition
that takes an argument, which is
just an anonymous function which means
call this code when you're ready to know
the coordinates, because it might take
a while to figure out your GPS
coordinates. And once you do, this simple
example is just going to write to the
document, that is, the rectangular page,
the position's latitude that comes back
and the position's longitude that comes
back. So to see this in action, let me
go ahead and open up that second tab.
Go back into
geolocation.html.
Notice, for privacy's sake, it's
asking me to approve this. So I'm going
to say allow this time. There are
apparently my laptop's GPS coordinates.
And if I go to Google Maps, I can
actually paste this in here. Enter. And
looks like, if we zoom in, in, in, okay, I'm
not technically outside, so it's only
accurate to a degree of precision, but it's
probably mapping to one of the Wi-Fi
access points that's on that corner of
the building. So, we're pretty darn
close, pretty much close enough to get
me my food or my ride here. And a
final note: now that you've seen a
little bit of JavaScript, let me go
ahead and open up just 60 final seconds
of just how much effort it
took us to put not only this lecture
together, but particularly that example
of the teaching fellows passing packets.
Everything, we like to think, is very
finely polished here, but here's a
little bit of behind the scenes and
these final 60 seconds together. If we
could dim the lights before we adjourn.
>> Off you go.
Offering. Okay,
Josh. Nice.
Helen. Oh,
Bentimony. No. Oh, wait.
That was amazing. Josh
um Sophie
Amazing.
That was perfect.
>> I think I
over to you all.
>> Oh, nice guy.
That was amazing. Thank you all.
>> So good.
>> All right, that's it for CS50. We'll see
you next time.
All right, this is CS50. This is already
week nine. And I dare say this week is
the most representative of what you'll
be doing after the class if you so
choose to program in the future and
tackle some project that's new to you.
In fact, the closest to this week was
perhaps week six wherein we didn't
really introduce all that many new
concepts but really translated them from
C into Python. And so this week in
particular, the goal is to really
synthesize the past 10 weeks of class,
drawing upon a lot of the building
blocks that are hopefully now
metaphorically in your toolbox, and to give
you an opportunity now to apply those
ideas to new problems. In particular,
web programming. So every day you and I
are using the web in some form. Every
day you and I are using mobile apps in
some form. And we said last week that
the languages underlying a lot of those
applications are HTML and CSS for
the layout and aesthetics, and then also
in part JavaScript for a lot of the
client side interactivity that you might
experience nowadays. Well, today we come
full circle and bring back a serverside
component whereby we'll again write some
Python, we'll again write some SQL code
and use it to make our own full-fledged
web applications and in turn if you so
choose mobile applications as for your
final project as well. So up until now
when we did anything with the web, you
ran this command last week, http-server,
which literally did just that. It
spawned a so-called HTTP server that is
a web server whose purpose in life is
just to serve up content from like your
current folder, any files therein, any
folders therein. And so all of the URLs
generally followed a certain format. So
if your URL were example.com/, that slash really
just denotes the root of the web server,
and so in there, typically by default, you
would see a directory index. We'll see
today that that goes away because
generally when you visit something.com/
you want to see the actual website, not
the contents of everything in the
server. So we'll see how to address
that. But the URLs up until now have
been of a form like file.html literally
referencing a file in that folder or
folder slash which just means whatever
is inside of that folder or
folder/file.html
or dot dot dot. You can nest these
things however long that you want. And
recall that more generally we said that
you're referring to some kind of path on
the server, where the path is a series
of folders ending in perhaps a file
name. So today we're going to generalize
that at least in terms of nomenclature
and start talking more about routes
because essentially in web programming
we are going to exercise a lot more
control over what is in the URL. So back
in the day it referred to literally a
file on the server and as recently as
last week the URLs referred to literally
a file on the server. However, we'll see
in code that we can actually just parse
this that is analyze what is after the
domain name in a URL and just use this
as generic input to the server to figure
out what kind of output to produce.
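To make the idea of a route as generic input concrete, here is a toy dispatch table in plain Python. This is a hypothetical sketch, not how Flask is actually implemented: the path after the domain name is just a string that the server looks up to decide which function produces the output.

```python
from typing import Callable


def index() -> str:
    return "hello, world"


def search() -> str:
    return "search results would go here"


# Hypothetical routing table: path -> function that produces the response
ROUTES: dict[str, Callable[[], str]] = {
    "/": index,
    "/search": search,
}


def handle(path: str) -> tuple[int, str]:
    """Return (status code, body) for a requested path."""
    view = ROUTES.get(path)
    if view is None:
        # No file or function matches this route
        return 404, "Not Found"
    return 200, view()


print(handle("/"))         # (200, 'hello, world')
print(handle("/nowhere"))  # (404, 'Not Found')
```

Nothing here touches the filesystem; /search works even though no search.html exists, which is the shift from paths to routes.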
We're going to see the same convention
though. If you want to pass in specific
parameters, key-value pairs, we'll
use a question mark after our so-called
route, then key equals value. And then if
there's another one or more, we'll just
separate them by ampersands. And to do
all of this, we're going to recall the
insides of those virtual envelopes.
Recall that if we did something like on
google.com to search for cats, what was
really being sent to the server was a
request for /search, which, notice, is not
search.html. There's no folder per se
there. This is just the name of a
program really running on Google
servers. And that's going to be the
so-called route that we ourselves start
programming today. And ?q=cats
just meant that the query
parameter, the input from the web form, is
going to contain, in this particular
example, the word cats. So how are we
going to do all this? So we could
implement our own web server in C. It
would be a nightmare to like use a
language as low-level as C and actually deal
with something as high level as writing
code for the web. We're instead going to
use Python for the most part if only
because it's much higher level. But even
then, we would probably if we wanted to
do this thing from scratch, we would
have to write a lot of Python code to
like analyze the insides of these
envelopes, figure out what inputs are
being passed to the server, and then
figure out how to access that in Python
code. It's just a lot of work to just
get a web application up and working.
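As a taste of that commodity parsing work, Python's standard library can already split a request target like /search?q=cats into its route and its key-value pairs. This sketch uses `urllib.parse`; the extra `lang=en` parameter is made up purely to show the ampersand separator.

```python
from urllib.parse import parse_qs, urlsplit

# Request target as it might appear inside the "envelope";
# the lang=en parameter is hypothetical, added to show the & separator
target = "/search?q=cats&lang=en"

parts = urlsplit(target)
route = parts.path              # the part before the question mark
params = parse_qs(parts.query)  # each key maps to a list of values

print(route)         # /search
print(params["q"])   # ['cats']
print(params)        # {'q': ['cats'], 'lang': ['en']}
```

Values come back as lists because the same key may legally appear more than once in a query string.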
And so what the world generally does is
they don't reinvent the wheel of writing
their own web server. Rather, they use
an off-the-shelf fairly generic web
server or application server as it might
be called. And we for instance are going
to use something called Flask. Now Flask
is a framework as the world would say or
more specifically a micro framework
which just means it's a library of code
that other people wrote to make it
easier for us to implement web
applications. So they took the time to
figure out how to handle get requests on
a server, post requests on a server,
figure out how to extract key value
pairs from URLs, the sort of commodity
stuff that like literally every web
application on the internet has to do
anyway. So we don't have to retrace
those steps ourselves. What this will
allow us to do is only implement the
problems that we care about by using
this framework. And to be clear, a
framework much like Bootstrap is not
only a library that someone else has
written for you, but it's like a set of
conventions that you follow in order to
use the library in their recommended
way. So it's more of a generic term that
includes library and a set of
conventions. And how do you know how to
use either? You just read the
documentation or take a class in which
we're about to give you an introduction
to some of this right here. So instead
of running today http-server
to start a web server that just serves
up static content, files and folders, in
our account, we're instead going to run
the command, moving forward, flask
run, and this is going to look for code
that we've written in our current
directory, and if it is in accordance
with the conventions to which I'm
alluding by using the so-called
framework, then it's going to start our
web application on some TCP port, for
instance, 8080, as we discussed last week.
To do this, all we have to have in our
current folder is, minimally, a file
called app.py by default. This is
hinting at an application in the
language called Python. And what code we
put in there we'll soon see. And then
ideally we would have another text file
called requirements.txt by convention,
inside of which is just one per line the
name of all of the libraries that we
want this web application to include. In
other words, if I go over here to VS
Code, if I don't have such a file,
that's fine, but I want to use a
framework like Flask. Recall our pip
command for installing Python packages:
I could just say pip install flask,
Enter, and that would go ahead and
install the Flask framework or library
for me, just like we did a few weeks ago
with installing the silly little cowsay
library as well. I've already done that
in advance, and better still, I've
come with my code
today: both of these files, app.py and
requirements.txt. And in fact, if I go
ahead and create one just for fun here,
all you need do in a requirements.txt
file is literally put the name of
the library that you want to include, and
then you run pip in a slightly different
way, pip install -r requirements.txt, to install that library or any other
libraries that are in that file as well.
So let me wave my hands at
requirements.txt moving forward.
It just means what libraries do you want
to use with this web application so you
don't have to remember or memorize them
and type them all out manually. All
right. So what's going to go inside of
app.py? Well, the minimal amount of code
that we can write to make our own web
application that does something like
print out hello world to my browser
could look like this. Now, there's a bit
of new syntax here, but not all that
much today moving forward. The very
first line just says from flask import
Flask, which is a weird way of just
saying give me access to the Flask
library. Capitalization now matters. And
so, the package that we're using is
called flask lowercase, but we want to
have access to a special function in
there called flask capital F. So this is
sort of a copy paste line. The next
one's a little weird looking, but it
essentially says give me a variable
called app and turn this file into a
flask application. We haven't seen this
in a few weeks, but there was that weird
if conditional that we put at the bottom
of some of our Python code a few weeks
back that just said if, dot dot dot,
and it mentioned in there __name__: if __name__
equals equals "__main__".
So we've seen an allusion to __name__. For
our purposes, __name__ just refers to
whatever the name of this file here is.
No matter what I call it, you can sort
of access the current file by way of
this special global variable. So this
line collectively just means turn this
file into a flask application and store
the result in a variable called app. So
I can now do stuff with flask. And what
am I going to do? Well, down here, let
me first point out a familiar syntax.
I'm defining a function that I called
index by convention, but I could have
called it anything I want whose sole
purpose in life is just to return quote
unquote hello world, which is the super
simple output this web app is going to
display. But, and this is the new
syntax, I'm using here, what's generally
called a Python decorator, which is a
type of function that essentially
affects the behavior of the function
right after it. So, by saying @app.route,
quote unquote slash, this is
telling the Flask framework: associate
this index function with this route, the
single forward slash. And that's how
we're going to take over the default
behavior of the slash portion of the URL
by telling it to return whatever this
function returns. And we'll see this in
action now. So let me go over here say
to VS Code. And within VS Code, I'm
going to whip up exactly that
application in a file called uh app.py.
Just so as to combine this and some
subsequent examples, maybe in the same
folder, I'm going to first create a
directory or folder called hello. I'm
going to go into that hello folder. I'm
going to go ahead and recreate that same
requirements file just for good measure
to tell the world that I want to use the
flask library here. And then I'm
additionally going to create now app.py.
And I'll type this fairly quickly, but
I'm just reciting what we saw a moment
ago. From the Flask package, import the
Flask function, lowercase F, capital F,
respectively. Then give me a variable
called app. Set it equal to that
function call passing in the name of
this file, whatever it actually is. And
then lastly, let's go ahead and use
@app.route, quote unquote slash, which
says, hey, Python, whatever the next
function is, associate it with this
slash route. And so I'm going to define
that function. I could call it anything
I want, foo or bar or baz. But in so far
as slash represents the index of the
website, like the default page, I'm just
going to go ahead and call it by
convention index and then return for now
hello, world. And that's it. So whereas
last week when I was writing code in
HTML files, I was making web pages, now
I've created what we'll call a web
application. And it's an application in
the sense that there's actually some
logic going on there. There's some
functions, there could be some
conditionals, there's clearly a
variable, there could be loops, and all
of the sort of stuff we've seen in
Scratch, C, and Python as well. We'll
now see back in this Python file. So,
how do we now run this? Well, let me go
back into my terminal window here, and
I'll clear it just for good measure. I'm
going to go ahead and run flask run
enter. I'm going to see some cryptic
looking output, but there's that
familiar pop-up with the green button
that wants to open up this application.
Whereas http-server uses 8080 by
default, Flask uses port 5000 by
default. And here we have it. I've just
opened up my second tab, and we spent a
lot of time there last week. This is the
server I'm running, not on port 8080,
but on port 5000 today. And there is the
contents of what was spit out by my very
first application. Now, even though the
browser is rendering this like it is a
web page, notice this. If I
right-click or control-click
anywhere on the screen and go to view
page source, you'll see that there's no
actual HTML on this page. It's literally
a single line of text, hello, world. If
I close that and right-click or
control-click again and go to inspect
like we did last week to open up
developer tools, you'll see that the
browser has actually filled in some
blanks here for me by just rendering as
it should the minimal possible web page.
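For reference, here is the minimal app.py just described. The `test_client()` check at the bottom is an addition, not something shown in lecture: it asks Flask for / directly, so you can confirm the route returns only the text hello, world without opening a browser.

```python
from flask import Flask

# Turn this file into a Flask application
app = Flask(__name__)


@app.route("/")
def index():
    # Whatever this function returns is sent to the browser for "/"
    return "hello, world"


if __name__ == "__main__":
    # Request "/" through Flask's test client instead of running `flask run`
    response = app.test_client().get("/")
    print(response.status_code, response.get_data(as_text=True))
```

When started with flask run, the module is imported rather than run directly, so the check at the bottom is skipped.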
But the content I actually sent to the
web browser is only literally hello,
world. So how can I actually send a web
page of my own rather than letting the
browser do something like this? Well, I
could go ahead and close that and go
back to my application. I'm going to go
ahead now and hide the terminal just
because the server is still running. And
what I'm going to go ahead and do here
is well, nothing's really stopping me
from returning not just a string of
text, but a string of HTML. And this
might not look pretty, but let me go
ahead and do open bracket !DOCTYPE html
close bracket, then html, then head, then
title. And I'll just title this, for
instance, hello to keep it simple. Slash
title, slash head, open bracket
body, hello, world, slash body, slash
html, close quote. And I used single
quotes in this case, but I could have
just as easily used double quotes. But
that's a full-fledged web page. Like,
that's the minimal amount of content we
saw last week. Actually, you know what, for
good measure, let's actually add lang
equals quote unquote en. So it's actually
fortuitous that I used single quotes,
because now I have some double quotes
inside. And even though this is not
pretty-printed, it's just one massive
mouthful of HTML, all along one line.
When I now go back to the browser,
reload the page as by clicking here, and
then view page source again, here's what
my browser received this time. Indeed,
it's the full-fledged HTML. And in fact,
if I close that tab and reopen developer
tools via inspect, now we'll see in the
tab absolutely everything that I sent
over, including a title, including the
lang equals en. And had I typed even
more, we would have seen that, too. All
right. So, what was the point of this
exercise? It feels as though I've
really just taken more time, added more
complexity to achieve literally what I
could have done last week by just
creating index.html
myself without any Python code. But I
dare say what we're trying to do is lay
the foundation for a full-fledged
interactive website that maybe has forms
that we can submit to the application
that allows us to generate not just one
page, but maybe two or three or any
number. So what you're seeing here is
sort of the beginning of google.com's
search application or gmail.com itself
or facebook.com or any web application
you can think of begins with a little
code that theoretically looks a little
something like this. But this is kind of
stupid to put HTML hardcoded no less in
one long string here inside of my
application. Let's try to factor this
out. That was a lesson we preached last
week about sort of factoring out our
JavaScript, factoring out our CSS. We
can do the same thing with our actual
HTML here. And so what I'm actually
going to do is import not only the Flask
function, but also another function that
per its documentation comes with Flask
called render_template, with an
underscore in between. This is a
function whose purpose in life is to
render a template, so to speak, of HTML.
We'll see what we mean by template in
just a bit. But down here, what I'm
going to do is now delete all of that
code. And let me just assume that I'm
going to put that same code in a file
called index.html, just like I did
last week. So let's instead return the
return value of render template of quote
unquote index.html.
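For reference, here is a minimal, self-contained sketch of app.py at this stage, assuming Flask is installed (it is the dependency listed in requirements.txt). So that it runs anywhere, the sketch writes index.html into a temporary templates folder and passes that folder to Flask via template_folder; in the lecture's setup, templates/ simply sits next to app.py and Flask finds it without that argument.

```python
import os
import tempfile

from flask import Flask, render_template

# Setup for this sketch only: create a templates/ folder containing index.html.
# In a normal project, templates/ sits next to app.py and Flask finds it by default.
TEMPLATE_DIR = os.path.join(tempfile.mkdtemp(), "templates")
os.makedirs(TEMPLATE_DIR)
with open(os.path.join(TEMPLATE_DIR, "index.html"), "w") as f:
    f.write(
        "<!DOCTYPE html>\n"
        '<html lang="en">\n'
        "    <head>\n"
        "        <title>hello</title>\n"
        "    </head>\n"
        "    <body>\n"
        "        hello, world\n"
        "    </body>\n"
        "</html>\n"
    )

app = Flask(__name__, template_folder=TEMPLATE_DIR)


@app.route("/")
def index():
    # Load templates/index.html and send its contents back to the browser.
    return render_template("index.html")


if __name__ == "__main__":
    # Exercise the route without starting a server.
    response = app.test_client().get("/")
    print(response.status_code)
    print(response.get_data(as_text=True))
```

Running flask run in a directory containing an ordinary app.py and templates/ folder serves the same page; the test client is used here only to try the route without a server.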
Now that file does not yet exist.
Indeed, if I go into my terminal window,
create a second terminal just so I can
leave the server running but still see
what's going on. I'm going to cd into
that same hello directory, type ls to
list my files, and I only see app.py
and requirements.txt. But it turns out
per Flask's documentation, if you want
to create your own HTML files, you
simply have to add a directory that by
convention is called templates. And
that's it. So in addition to app.py and
requirements.txt, I need a folder called
templates. So let's go back into VS
Code and run mkdir templates. Capitalization
matters: all lowercase. Now, let me go
ahead and cd into templates and run the
code command and create a file called
index.html in the templates folder. And
then super quickly, let me hide this.
Let me whip up that same page again:
DOCTYPE html, html lang equals quote unquote
en, close bracket, head, close bracket,
title, close bracket, hello, and then down
here body, close bracket, hello, world. So
autocomplete is helping me type quickly.
But now I have a file with my HTML that
this application I claim is going to
spit out automatically for me. So let's
see the effect. Let me go back into my
other browser tab. Let me close the
developer tools and let me quite simply
just click reload. And no apparent
change. It's working exactly as it did
before, but I've laid the foundation for
making a much more useful layout of my
files so that I can actually keep my
logic, my Python code, and my HTML a bit
separate from that. All right. Well, how
can we make this into something even
more interesting? Well, let's start to
take some actual user input for
instance. So, wouldn't it be nice if I
could pass in via the URL something like
q equals cats, but maybe something like
name equals David or name equals Kelly
and actually see the name that's being
output. In other words, let me zoom in
up here and let me pretend like this
happened automatically. Let me do
question mark, name equals David.
Enter. Well, it would be nice if I saw
hello, David, I'd propose, rather
than just hello, world. So, how do I
actually get access to everything after
the question mark? Well, here is where a
framework like Flask and any number of
alternatives starts to shine. It gives
me that answer automatically. And
so it turns out in Flask, once you've
imported it, you have access to a special
global variable, as we'll call it, called
request.args,
where args just means the arguments or
the parameters that were passed in to
this HTTP request. So how do we use
this? Well, let me go back to VS Code
here. And at the very top line, in
addition to importing Flask, capital F,
render template, let's also import
request, which is a global variable that
comes with the Flask framework. And then
I'm going to use it as follows. I'm
going to go ahead and add a second
argument to the render template function
where I'm going to say placeholder
equals request. Actually, let me not do
that yet. Let me first create a variable,
name equals request.args. And then let
me go ahead and get the name key from
the arguments. And then down here, let's
go ahead and pass in placeholder equals
name. So what am I doing here on line 8?
I'm creating a variable called name. I'm
storing in that the value that's in the
request global variable in what's
apparently a dictionary called args,
specifically the name key therein. So if
the thing after the question mark name
equals is David, this should give me
David. If it's Kelly, it should give me
Kelly instead. Then what I'm doing is
rendering this template called
index.html, but I'm additionally passing
in some named parameters. We talked
briefly about that in week six when we
introduced the idea that Python can take
not only a comma-separated list of
arguments, but some of which can have
names. So I'm proposing that one such
name of an argument to this render
template function can be placeholder for
instance. Now, at the moment, this code
isn't going to do anything useful. If I
go back indeed to the other tab, click
reload after zooming in, even with my
name in the URL, you'll see that we
still see hello, world. But here's where
things now get interesting. And here too
is what we mean by template. If I go
back into VS Code, open up index.html
again, and instead of putting the word
world there, what I'd like to see is not
hello world, but hello, placeholder. But
of course, if I literally type that, I'm
going to see literally placeholder
unless I surround placeholder with pairs
of curly braces like this. And by using
these pairs of curly braces, I'm telling
Flask that I want to interpolate, so to
speak, that variable. I want to
substitute in its value. So this is yet
another syntax. In Python, we saw
f-strings. In C, we saw percent s
with printf. In an HTML
file, when using Flask specifically, we
use these pairs of curly braces to denote
this is indeed a placeholder whose value
should be plugged in. So now let's go
back over to the second tab. Recall if I
zoom in that passed in already to this
URL is question mark name equals David.
And this time when I click reload,
voila, now I see my actual name. And
unlike the JavaScript examples last week
which were doing everything client side,
notice here if I right-click or
control-click and view page source,
what's noteworthy today is that David in
this case literally came from the
server. This was not rendered client
side. The server sent this HTML and
specifically this text. So, if I go back
to the same tab here, zoom in and change
David for instance to Kelly, what I
should see instead when I hit enter is
hello, Kelly. And indeed, if I go back
to the source code and reload the page
there, I should see in the view page
source that the server sent indeed
hello, Kelly. So, it's in this sense
that it's an application. The URL is
providing input to the application by
way of this URL format, the so-called
GET string that's being
passed in. And if I look at the code
that I'm running, app.py is the code
that's running. It is grabbing that name
from the URL. I am then passing it into
my index.html file and then my HTML file
is plugging the actual value in for me.
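A minimal sketch of this version, assuming Flask is installed. To avoid needing a file on disk, it uses Flask's render_template_string with an inline string standing in for templates/index.html; the lecture's code calls render_template on the real file instead. Like the lecture's code at this point, it indexes request.args directly, so it expects ?name= to be in the URL.

```python
from flask import Flask, render_template_string, request

app = Flask(__name__)

# Inline stand-in for templates/index.html; Jinja fills in {{ placeholder }}.
INDEX_HTML = """<!DOCTYPE html>
<html lang="en">
    <head>
        <title>hello</title>
    </head>
    <body>
        hello, {{ placeholder }}
    </body>
</html>"""


@app.route("/")
def index():
    # For /?name=David, request.args holds the key "name" with value "David".
    name = request.args["name"]
    # The keyword argument's name must match the placeholder in the template.
    return render_template_string(INDEX_HTML, placeholder=name)


if __name__ == "__main__":
    client = app.test_client()
    print(client.get("/?name=David").get_data(as_text=True))
    print(client.get("/?name=Kelly").get_data(as_text=True))
```

Because the substitution happens on the server, the HTML that reaches the browser already contains hello, David, which is why view page source shows it.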
And so what's going on with for instance
these curly braces? Well, here too is
where we're actually using a library.
And installed along with Flask is another library
called Jinja. And Jinja is what's called
a templating library. And there are so
many templating libraries in the world.
Jinja is actually fairly simple, which
is nice, and which is why Flask uses it.
And for now, you can just think of Jinja
as being the library that knows how to
interpolate variables inside of pairs of
curly braces. So why are we introducing
yet another framework, another library? Well,
the folks who implemented Flask
decided that it was not worth their time
reinventing the wheel of a templating
language, a language via which you can
figure out what values to plug in where.
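To underline that the curly-brace syntax comes from Jinja rather than from Flask itself, this tiny sketch uses the jinja2 package directly (it is installed as one of Flask's dependencies), with no web server involved.

```python
from jinja2 import Template

# A one-line template with a single placeholder, in Jinja's {{ }} syntax.
greeting = Template("hello, {{ name }}")

if __name__ == "__main__":
    # render() substitutes each keyword argument for the matching placeholder.
    print(greeting.render(name="David"))  # hello, David
    print(greeting.render(name="Kelly"))  # hello, Kelly
```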
So they just lean on another library
that someone else wrote years prior so
as to not reinvent that wheel
themselves. And that's all that's going
on with a framework. In this case, it's
using perhaps multiple libraries
instead. All right. So what then is a
template? So this then is a template.
What you're looking at here, hello,
placeholder, is a template in the sense
that it's kind of the blueprint for the
web page I want the user to see, but
it's going to be dynamically generated
using indeed this blueprint by plugging
in the value of placeholder inside of
those pairs of curly braces. And so
that's why index.html starting today is
in a folder called templates because
this is not just static HTML like the
stuff we wrote last week. This is
the blueprint for the actual
HTML that we want the browser to spit
out. But there's a bug here. Notice
what's going to happen here. If I go up
to this URL and I get rid of the name
altogether, for instance, I just visit
the slash route without any key value
pairs and hit enter. This is sort of bad:
Bad Request. It's an HTTP 400. In fact,
if you look at the tab, here's another
HTTP status code that we probably
haven't seen before. But 400 just means
the user did something wrong by not
passing in the parameter that was
expected. Well, that's a little bad
design if like the user has to manually
type in things to the URLs. Like no
human actually does that. That's not
good for business or customers in
general. So I can go back into app.py
and just make a little bit of
conditional code here. And here, too, is
where we see what makes this an
application and not just a static page.
Instead of just blindly getting the name
here, I could instead do something like
this. Well, if the name parameter is in
request.args, and this is just Python
syntax for asking if this key is in this
dictionary, then I'm going to go ahead
and define name and set it equal to
request.args quote unquote name. Else,
if there is no name in the request,
well, then I might as well give some
default value like name equals quote
unquote world. And that alone logically
makes sure that I only try to access
request.args name if the key is
actually there. So, if I go back to the
browser now, reload without anything
else in the URL. Now, we're back in
business and it's saying hello, world.
But if I go up to the URL bar and add
name equals David, enter, that too now
works. So, it's a web application in the
sense that not only does it have
function calls as well as a variable,
but now we've got some conditional logic
with boolean expressions as well.
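The original and fixed versions can be compared side by side with Flask's test client. In this sketch, assuming Flask is installed, /strict is a hypothetical route that indexes request.args directly, as the code did before the fix, while / uses the conditional just described; an abbreviated inline string again stands in for templates/index.html.

```python
from flask import Flask, render_template_string, request

app = Flask(__name__)

# Abbreviated stand-in for templates/index.html.
INDEX_HTML = "<body>hello, {{ placeholder }}</body>"


@app.route("/strict")
def strict():
    # Hypothetical route mirroring the earlier code: indexing a missing key
    # makes Flask answer with 400 Bad Request.
    name = request.args["name"]
    return render_template_string(INDEX_HTML, placeholder=name)


@app.route("/")
def index():
    # Only read the key if it is actually in the query string.
    if "name" in request.args:
        name = request.args["name"]
    else:
        name = "world"
    return render_template_string(INDEX_HTML, placeholder=name)


if __name__ == "__main__":
    client = app.test_client()
    print(client.get("/strict").status_code)  # 400
    print(client.get("/").get_data(as_text=True))
    print(client.get("/?name=David").get_data(as_text=True))
```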
All right, questions on anything we've
done thus far because it was a lot all
at once. Questions thus far? Yeah.
>> Good question. Let's try that. What if I
just did question mark name equals
nothing? Well, let me go back to that
other tab. Uh, delete the name David and
hit enter. And I indeed see hello,
nothing. Why? Because the name key is
provided now. It just doesn't have a
value. And so the conditional has the
same answer. Well, yes, name is in
request.args, but there's just no value
associated with it. And here again is
the value or a hint at the value of
using a framework like flask. The fact
that I can just import the request
global variable and then ask questions
like is this parameter in this
dictionary means I don't have to write
any of the code that figures out
what the URL looks like and breaks it apart
at the question mark and the equal
signs and any ampersands therein. That's
all sort of generic logic that every web
application has to do. So again, Flask
is sort of doing that lift for me and I
can just focus on the logic that I
actually care about. All right. Well, a
quick convention here. I've used
the word placeholder here just to kind
of hit the nail on the head and make
clear this is a placeholder, but frankly
it's a little more readable
stylistically to not just put hello
generic placeholder, but to say
something like hello, name so that a
colleague or even myself looking at this
file down the line knows that okay,
we're trying to print out the user's
name here. That's fine. You can change
the name of these variables to be
anything you want. And even though it
looks weird, it's conventional in Flask
to do something like this. Name equals
name. But each of these names means
something different. This is the name of
the placeholder that I'm going to put in
my actual template. This is the value
that I actually want to give it. And it
just keeps me a little saner by just
reusing the same name instead of calling
it placeholder or placeholder 1,
placeholder 2, placeholder 3, or
something generic like that. Now it's
just a little clearer, even though it looks
weird to say name equals name. Again,
that just allows me to do this in my
template. All right. Well, what more can
I do after that? Well, let me propose
that we can actually go in and simplify
this code a little bit. It turns out
this is so common to just ask a question
as to whether the parameter is there and
then do something with it or not that
flask comes with some logic to do this.
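The one-line version described next relies on the get method of request.args, which behaves like a Python dictionary's get: it returns the value for a key, or the supplied default if the key is absent. Here is a minimal sketch, assuming Flask is installed, that also adopts the name=name convention from above. As the earlier question about ?name= showed, an empty value still counts as present, so the default applies only when the key is missing altogether.

```python
from flask import Flask, render_template_string, request

app = Flask(__name__)

# Abbreviated stand-in for templates/index.html, now using {{ name }}.
INDEX_HTML = "<body>hello, {{ name }}</body>"


@app.route("/")
def index():
    # Value of ?name=..., or "world" if the key is absent entirely.
    name = request.args.get("name", "world")
    # Left of the equals sign: the template's placeholder. Right: this variable.
    return render_template_string(INDEX_HTML, name=name)


if __name__ == "__main__":
    client = app.test_client()
    for url in ["/", "/?name=David", "/?name="]:
        print(url, client.get(url).get_data(as_text=True))
```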
And in fact, I can get rid of all four
of these lines. Just go ahead and with
confidence declare a variable called
name, set it equal to request.args,
but on that so-called dictionary,
use a function called get that comes
with it, which technically doesn't
relate to the GET verb used by
HTTP. This just means literally get me
the following. And if you want to get
the parameter called name, you literally
just say quote unquote name. However, in
case there is no name parameter, you can
also give this function a default value
like world. And so now we've collapsed
that exact same logic from four lines into
one. So this gets
me the HTTP parameter called name. But
if it's not there, it gives me a default
value of world. So that no matter what,
this name variable has what I care
about. Indeed, if I go back over here,
let's type in how about name equals
David again. Enter. That's there. If I
type in no name, enter. That too is
now working as well. All right. Well,
let's see if we can refine this a bit
more. Let me propose that in our next
version of this. Let's introduce a
second route. So two URLs. Much like
Google has many different URLs as does
most any web application. At the moment,
I'm doing everything in my slash route.
So how might I move away from this?
Well, let me go ahead and not only add a
second route, but an actual form via
which the user can type in their
name. So to do this, let me propose that
in index.html, instead of just
printing out the user's name and
trusting that they're going to have
typed their name in manually to the URL,
which again is not normal behavior,
let's actually show the user a form via
which they can do exactly that. So
here's my form tag. Let's say the
method I'm going to use is get so that I
see everything in the URL. Let's give
myself an input whose name is
name because this is the human's name.
And notice, somewhat confusingly, this
name on the left is the HTML
attribute that we saw last week. So,
it's different from what we just did in
Python, even though they're all called
the same thing. The type of this input
is going to be text. And let's go ahead
and make this a little more user
friendly. Let's put some placeholder
text called name, so the human knows
what to type in. Let's go ahead and
disable autocomplete just so we don't
see previous input into this text box.
And let's autofocus it so that the
cursor is blinking in the text box by
default. Then lastly, let's go ahead and
have a button the type of which is
submit. So that clicking this button
actually submits the form. And I'm just
going to call this button like greet
because I want the user to be able to
greet themselves by clicking this
button. Now I should specify action. The
only other time we used action is when
we actually went to
https://www.google.com/search.
That's not relevant today because I'm
trying to print hello, world, not search
for cats and such. But this is where I,
too, have control. If I want to submit
this form to a specific location in
my web application, action is where I can
specify it. So why don't I pretend that
there exists a route in my application
called /greet, and if you go to
example.com/greet
question mark name equals David, this now
will greet the user with hello David for
instance, but /greet does not exist.
If we go back to app.py, literally the
only route that currently exists is
single slash, but I can change that. I
can go into my app.py as I have here
and below this function, I can go ahead
and define app.route quote unquote /greet
and just invent any route that I want. I
can then define a function that will be
called whenever that route is visited.
By convention, to keep myself sane, I'm
going to call the function the same
thing as the route, but you don't have
to do this. It's just to minimize
decisions I have to make. And then in
this function, what I'm going to do is
this. Return render template greet.html,
which doesn't exist yet, but that's a
problem to be solved. And then I can
pass in the name of the user. I'm going
to go ahead and save myself a line of
code and just say request.args.get
quote unquote name, world. In other
words, strictly speaking, I don't need
that variable on its own line. This has
the effect of what we already did in
index, but I'm doing it all in one
elegant one-liner. And now in index, in
so far as I want the index of the site
to just show the user the form via which
they can type in their name, this one's
easy now. Render template quote unquote
index.html
and return that template. So to recap,
here's index.html, which is now a
form instead of a template for hello,
such and such. App.py is going to return
that template whenever I visit the index
or slash of the page. And then this
greet route is going to handle the case
of printing out greet.html passing in
the user's name. All right, I think I'm
not quite good to go yet, but let's try
this out. Let me go back to my browser
tab, reload, and there we have it. I
have a web form now instead of
the hello, so-and-so. I'm going to go ahead
and type in my name. And notice the URL
at the moment, even though Chrome is
hiding it, technically it's there slash,
but Chrome and most browsers today sort
of hide as much stuff as they can if
it's not all that intellectually
interesting. But watch what happens to
the URL when I click greet. It
automatically sends me to /greet
question mark name equals David. And
this is just like the way the forms
worked last week when we recreated our
own version of Google in search.html,
because the action there was
google.com/search.
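Putting these pieces together, here is a sketch of the two-route version, assuming Flask is installed. Abbreviated inline strings stand in for templates/index.html (the form) and templates/greet.html. Because the form's action is /greet and its method is get, submitting it requests /greet?name=David, which the test client below imitates directly.

```python
from flask import Flask, render_template_string, request

app = Flask(__name__)

# Stand-in for templates/index.html: a form that submits to /greet via GET.
INDEX_HTML = """<form action="/greet" method="get">
    <input autocomplete="off" autofocus name="name" placeholder="Name" type="text">
    <button type="submit">Greet</button>
</form>"""

# Stand-in for templates/greet.html.
GREET_HTML = "hello, {{ name }}"


@app.route("/")
def index():
    # Just show the form.
    return render_template_string(INDEX_HTML)


@app.route("/greet")
def greet():
    # Whatever the form submitted, with "world" as a fallback.
    name = request.args.get("name", "world")
    return render_template_string(GREET_HTML, name=name)


if __name__ == "__main__":
    client = app.test_client()
    print(client.get("/").get_data(as_text=True))
    # Equivalent to typing David into the form and clicking Greet.
    print(client.get("/greet?name=David").get_data(as_text=True))
```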
The user was whisked away to Google's
server. Today I stay on the same server
because the action I used was quite
simply /greet, which is assumed to be
on my own server. But clearly I screwed
something up, because I have a big
internal server error in front of me, as
you soon will too, odds are, as you dive
into this. 500 is the status code that
means it's your fault somehow. Now why is
that? Well, it's unclear from this
generic black and white message.
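The same failure can be reproduced in a few lines. This sketch, assuming Flask is installed, points Flask at an empty temporary templates folder so that greet.html genuinely does not exist. With debugging off (the default here), the client receives only a generic 500, while Flask logs the full traceback, ending in jinja2.exceptions.TemplateNotFound, to the terminal.

```python
import tempfile

from flask import Flask, render_template

# An empty templates folder: greet.html has not been created yet.
app = Flask(__name__, template_folder=tempfile.mkdtemp())


@app.route("/greet")
def greet():
    # Raises jinja2.exceptions.TemplateNotFound because the file is missing.
    return render_template("greet.html")


if __name__ == "__main__":
    response = app.test_client().get("/greet")
    # The browser sees only "Internal Server Error"; the details go to the log.
    print(response.status_code)
```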
However, because I'm the developer, I
can go back to VS Code, open my terminal
window, and recall that I have two
terminals open now. One that I can type
stuff in, the other of which is still
running from before. Let me open up that
one. And you'll see if I maximize my
terminal window, a whole bunch of scary
error messages here. But the relevant
one is probably going to be, let's see,
down here: a TemplateNotFound
error, jinja2.exceptions.TemplateNotFound:
greet.html. So there's a lot
of esoteric error messages here, more so
than usual, but the simple fact is that
I just screwed up and I did not create
greet.html. So file not found by the
server. So the user doesn't see all that
complexity. That's deliberate by design.
It's generally not good for
cybersecurity if you're revealing to the
user all of the error messages that are
happening on your server, because maybe
that suggests they can hack in some way
by taking advantage of those
error messages and the information
implicit in them. But they are there in
your terminal window to actually see and
diagnose. So how do I fix this? Well,
not a problem. Let me shrink my terminal
window back down. Let me code a file
called greet.html.
And in greet.html, let's create the
template via which I'm going to greet
the user, which ironically is the exact
same as what index.html used to be. So,
let me recreate that real quick: DOCTYPE
html. Let me close my terminal.
html lang equals en, head, title,
hello, body, hello, and
here's my placeholder: hello, name. So,
to be clear, the index.html template
doesn't have any curly braces or
anything dynamic. It just spits out the
HTML for the form. greet.html spits
out HTML and the actual greeting. And
it's app.py that decides which of these
to show the user. Either index.html if
they visit the slash route or greet.html
if they somehow find their way to the
/greet route, which they will
automatically by simply submitting that
form. All right, so let's go back into
this internal server error and go back
to the form. Nothing has changed with
the form, but now when I type in David
and click greet, not only will the URL
change to be /greet question mark
name equals David, I actually now see
the content that I expected a moment
ago. All right. Well, now it's an
opportunity to critique. I have these
two templates open, index.html and
greet.html. And even if you've never
done web programming before and even if
you'd never done HTML before last week,
what is bad about this design
intuitively?
>> Say again.
>> Abstraction.
>> Abstraction in what sense?
Yes. So that's exactly the hangup I
have here. There's a lot of duplication.
And technically I didn't copy paste
though I might as well have because
notice as I very hintingly go back and
forth almost every line of code in these
files is the same except for the form
which is there or not there, or the hello,
comma. All of the boilerplate HTML,
namely everything I just highlighted
here, lines one through seven in
greet.html, is duplicated. And this is
what we really start to mean by a
template. Like wouldn't it be nice if we
could factor out all of that HTML that's
common to both files, put it in
literally a template that both routes
can use so that I can write that
boilerplate code once instead of again
and again. Because imagine, in your mind's
eye, well, if I have three routes or
four routes or five routes, I'm going to
be like typing the same darn HTML three,
four, five times. That's got to be dumb
and that's got to be solvable as we've
seen in other languages as well. So, let
me indeed go ahead and try to improve
this. And the syntax is a little weird,
but it's the kind of thing you get used
to quite quickly. I'm going to go ahead
and create a third HTML file now by
going back to my terminal window inside
still my templates directory. And by
convention, this file is going to be
called layout.html. Why this? That's
what the flask documentation tells you
to do. So, in layout.html, I can
pull all of my boilerplate HTML, the
stuff that is invariant and doesn't
change. So, here we go. DOCTYPE html,
html tag, lang equals en, close bracket,
open bracket head, open bracket title.
We'll call it hello for all of the
pages. Open bracket body. And here's
where it gets interesting. The body is
the only thing that has been changing in
these two examples. In index.html,
it was a web form. In greet.html,
it was just a simple string of hello, so
and so. So, what I want to tell Flask is
that everything in the body will just be
a dynamic block of code. And the syntax
for that, which takes a little
getting used to but is also sort of
copy-pasteable, is {% block body %}, using percent
signs this time. And because I don't
want any such body in the template, I'm
going to literally close this block as
follows. And here you see another
example of sort of HTML like syntax but
instead of using angle brackets, Jinja,
the templating library that Flask
uses, uses curly brace and percent sign
to open the tag and then the opposite to
close it. So what you really have here
are two Jinja tags, as we'll call them.
This one is called block and I'm
defining an arbitrary name here. I could
have called it foo bar or baz but
because I want this block to refer to
the body of the page by convention I'm
going to call it body. And then this
weird syntax which is used in some other
languages too just means end whatever
block you just began. And so again you
just see reasonable people disagreeing.
The people who invented HTML use nice
angled brackets and words like these.
The people who came up with Jinja used
curly braces and percent signs. Why?
Well, odds are these are not normal
symbols that a human would type when
writing code, at least in HTML. So
they just chose something that probably
wouldn't collide with actual syntax the
human wants to use. So that's it for the
template. This is
essentially a blueprint that doesn't
have just a placeholder for a single
word or value like name. I can put a
whole chunk of code here now instead.
And how do I do that? Well, let me go
into index.html, which at
the moment is a little duplicative in
that it's got all of this boilerplate.
So you know what? I'm going to go ahead
and delete everything that is already in
my layout both above and below that web
form. And now I'm going to use a bit
more Jinja syntax. This, too, takes a
little while to memorize or copy paste.
But if I want index.html
to use the layout.html
blueprint, I can simply say extends
layout.html
and then close tag using percent sign
close bracket here. And then if what I
want to plug into that layout is the
following code, I can say as before
block body, and then down here I can
say
end block. And that's it. And just to be
a little nitpicky, I'm going to
de-indent that slightly. And now, even
though it looks like web pages suddenly
look a lot uglier (well, they do, because
this is weird-looking syntax), I
have now distilled index.html into its
essence. This is the only thing that
changes vis-à-vis the greeting page. And so
I've put my HTML here that I care about.
I've said to Flask, this is what
index.html's
body block shall be. Where to put it?
Well, put it into that particular
layout.html file. And so the logic for
greet.html is the same thing. It's going
to look just as weird, but again, you
get used to it. Let's go ahead and
delete everything that's boilerplate in
greet.html, both above and below. Up at
the top, let's tell Flask that
greet.html, too, extends layout.html.
And let's go ahead and say to Flask that
the block called body shall be this
for greet.html.
And the end of this block is now down
here. And just to be nitpicky, I'll
de-indent that too. So again, the pages
look a little weirder now, but it's
going to follow a paradigm that we just
see again and again, such that the only
juicy stuff is what's inside of that
body block. So now, if I go back to my
layout, it looks exactly like this. This
indeed is a placeholder, not just for a
single variable like name or the
placeholder we did before. This is the
placeholder for a whole block of code
that came from a file, not from a
variable. And so if I go back into my
other tab here, go click back to go back
to the web form and reload. Notice that
I have the familiar looking form. But if
I now look at view page source,
notice everything
that came to the browser from the
server. Here's that boilerplate up
here. Here's that boilerplate down
here. And here's the stuff that's unique
to this page. And recall too,
aesthetically I de-indented it, which is
why it's now no longer pretty printed in
what the browser sees. Like that's okay.
There's no reason to obsess over the
indentation and the pretty printing of
what the browser sees. Ultimately, the
reason I did this indentation is because
arguably when I'm in VS Code here and I
look at index.html,
this is clearly indented inside of the
body block just so I know what's part of
that block. The browser does not care
about superfluous whitespace or lack
thereof.
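For reference, here is a sketch of the three templates as they now stand, rendered with the jinja2 package directly so that the example needs no files or server. DictLoader, a real Jinja loader that looks templates up by name in a dictionary, stands in for the templates folder; Flask performs the same extends-and-block resolution when render_template loads the actual files.

```python
from jinja2 import DictLoader, Environment

# In-memory stand-ins for templates/layout.html, index.html, and greet.html.
TEMPLATES = {
    "layout.html": (
        "<!DOCTYPE html>\n"
        '<html lang="en">\n'
        "    <head>\n"
        "        <title>hello</title>\n"
        "    </head>\n"
        "    <body>\n"
        "        {% block body %}{% endblock %}\n"
        "    </body>\n"
        "</html>\n"
    ),
    "index.html": (
        '{% extends "layout.html" %}\n'
        "\n"
        "{% block body %}\n"
        '    <form action="/greet" method="get">\n'
        '        <input autocomplete="off" autofocus name="name" placeholder="Name" type="text">\n'
        '        <button type="submit">Greet</button>\n'
        "    </form>\n"
        "{% endblock %}\n"
    ),
    "greet.html": (
        '{% extends "layout.html" %}\n'
        "\n"
        "{% block body %}\n"
        "    hello, {{ name }}\n"
        "{% endblock %}\n"
    ),
}

# DictLoader resolves template names, including the one in extends, from the dict.
env = Environment(loader=DictLoader(TEMPLATES))

if __name__ == "__main__":
    print(env.get_template("index.html").render())
    print(env.get_template("greet.html").render(name="David"))
```

Each child template supplies only its body block; the DOCTYPE, head, and title come from layout.html in both cases.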
All right, questions on what we've just
done here, which is to truly take this
template out for a spin and now remove
what redundancies I had accidentally
introduced.
Questions?
No. Okay. Amazing. All right. Well,
let's go ahead and look at this URL
again. I'm not liking the fact that
every example we've done thus far
involves putting my name or Kelly's name
right there in the URL bar. Well, why is
that? Well, if I have like a nosy
sibling and they sit down at my browser,
they're going to see like every URL I
visited, including whose name was
greeted. Now, that's not all that big a
deal, but now imagine it's a username
and a password that the form is
submitting or a credit card number that
the form is submitting or just search
terms that you don't want the world
knowing you're searching for. They're
going to end up in the URL bar. Why? If
you are using method equals get for the
form, that's how get works. It literally
puts all of the HTTP parameters in the
URL, which is wonderfully useful if it's
sort of low-stakes stuff like the
Google search box, or if you just want to
be able to hyperlink directly to a URL
like this. In other words, if I put this
into an anchor tag open bracket a href
and a URL like this, I could deep link a
user to a web page that just always says
hello, David. So get strings contain all
of the requisite information to render a
page for the user. But this isn't really
good for privacy. So recall that there's
not only get, but there's also something
called post. And post is just a
different HTTP verb that, essentially,
with respect to those virtual envelopes
from last week, sort of puts the
information more deeply inside of the
envelope such that it's not written
right there in the URL bar, but it's
still accessible by the server. So if I
do this, watch what happens. Let me uh
go back into VS Code. Let me go back
into index.html which has the form. And
let me quite simply change the method
from get to post. And now let me go back
to my other browser tab. Back to the
form and reload so that the form knows
that the method has changed. Now type in
David and click greet. And before I do
that, let me zoom in on the URL bar.
Notice that the URL does change. I'm at
slashgreet, but I haven't revealed to
the world or to anyone with physical
access to my browser what URL I just
searched for. All they know is that I
went to /greet, but not the key value
pair or pairs that were passed in. Of
course, this clearly hasn't worked. I've
got an HTTP status code of 405, which
means method not allowed. That's because
flask by default when defining routes
simply assumes that you want get instead
of post. Now, get is good for the
default page. In fact, when I go back
here, this is equivalent to me visiting
the slash route just in the browser. So,
I want my index to generally support
get, but the greet route should support
post. And the simplest way to do this is
to pass in another argument to the route
function, which we haven't needed before
because the default is get. And I can
instead tell Flask a comma-separated list
of the HTTP methods that I want this
route to support. So if I wanted to
support just post, I can pass in a list
containing just post. And recall that
Python uses square brackets for lists,
which are its version of arrays in C.
Now by default, this argument is
methods equals a list of just get. And that's why the
only thing supported a moment ago was
get. That's why I'm now changing it to
be post instead. I have to make one
other change though. It turns out if you
read the documentation when accessing
HTTP parameters via post instead of get
you move from using request.orgs to
request form. This is completely
unintuitive that request.orgs is get and
request.form is post because they all
come from forms. So it's bad naming
admittedly. So you just kind of have to
remember request.orgs is used for get.
Request form is used for post. So all I
need to do further is change this to be
request.form
and that's it. Now my web application
will support a web form submitting to it
via post instead of get. Let me go ahead
and type in my name. Now I'll zoom in.
Notice that the URL will again change to
/greet with no parameters evident. But I
will be greeted this time because the
server knew to look deeper into that
envelope for those key value pairs
instead. And just to be now uh sort of
diagnostic about this, let me go back
once more. Let me right-click or
Control-click on the page and go to
inspect. Here's where developer tools
can be super useful as well. I'm going
to go in here and I'm going to go ahead
and clear this. And now I'm going to
type in David again and I'm going to
click greet. But because I have the
network tab open like we played with
last week, it's going to show me all of
the requests going from my browser to the
server, which is going to be useful here
because not only do I see, okay, it
obviously worked because I got back a
200, but if I click on this diagnostic
output, I can actually go to the payload
tab here and I'll see that the form data
that was submitted was name, the value
of which was David. So you can see what
you're submitting. So you can do this
today: if you log into some
website, Gmail or otherwise, you can
actually see all of the data that your
own browser is submitting to the server,
even if it's using post, because the
browser that you control can, of course,
see it all.
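For reference, here is a minimal single-file sketch of the two routes as they now stand, assuming Flask is installed. To keep it self-contained, the templates are inline strings rendered with Flask's `render_template_string` rather than the lecture's templates/index.html and greet.html files; the markup itself is an illustrative stand-in.

```python
from flask import Flask, render_template_string, request

app = Flask(__name__)

# Inline stand-ins for templates/index.html and templates/greet.html (assumption)
INDEX = """
<form action="/greet" method="post">
  <input autocomplete="off" autofocus name="name" placeholder="Name" type="text">
  <button type="submit">Greet</button>
</form>
"""
GREET = "hello, {{ name }}"


@app.route("/")
def index():
    # No methods argument, so this route accepts only GET (Flask's default)
    return render_template_string(INDEX)


@app.route("/greet", methods=["POST"])
def greet():
    # Parameters sent via POST are read from request.form, not request.args
    return render_template_string(GREET, name=request.form.get("name", "world"))
```

Flask's built-in test client can confirm the behavior described above: posting a name to /greet yields a greeting, while a plain GET to /greet now produces a 405.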
All right, any questions now on this
transition from get to post?
Kind of on a roll or not going so well.
We'll see. All right, so what more can
we do with this? Well, let's give
ourselves a couple more building blocks
before we transition to actually
implementing some real world problems as
I did years ago with one such example.
Suppose that I don't like this direction
I'm going in, insofar as every time I
have a page with a form, it submits to
another route altogether. Because in your
mind's eye, just kind of extrapolate.
Well, if I have two forms on my page, I
now need four routes. If I have three
forms, I need six routes. It seems a
little annoying that you use one route
just to show the form and another route
to process the form. This is going to
get annoying over time because it's like
twice as many routes as might be ideal.
So, is there a way to get kind of the
best of both worlds and combine these
two routes into one so that everything
related to greeting the user all happens
in one place? Well, you can as follows.
What I'm going to go ahead and do is
delete my greet route altogether and
most of my index route. But I'm going to
ask a question. I'm going to first say
that the methods that the index route
supports now shall be both get and post,
as a comma-separated list there. And then
inside of my index route I can simply
ask a question of the form: if the
request that is submitted to the server
has a method of post, then assume that
the form was submitted. This is just a
Python comment, a note to self that I'm
going to come back to in a moment. Else,
if the request method is not post... I
could technically say elif
request.method == "GET", then,
but this is kind of dumb because I only
support two verbs. So I might as well
just assume, for efficiency, that else
handles the get implicitly. Then go ahead
and assume that no form was submitted, so
show the form. So just notes to self as to
what I want to do. So how do I show the
form? Well this line was easy. return
render template of index.html.
If though the form was submitted, what
do I want to do? Well, just as before,
let's return render template greet.html
passing in a name value of
request.form.get
quote unquote name else a default value
of world. So, the exact same logic from
each of the two functions a moment ago,
but I've now combined them into one by
just using some conditional logic and
just asking the server if the user got
here via post, well, the only way they
could have gotten here via post is by
having clicked that button and submitted
the form. So, let's just go ahead and
greet them. Else, if they got here via
get by just typing in example.com or
whatever the actual URL is, let's go
ahead and show them the template. So,
it's still good design in that I have a
separate template for each of these
pieces of functionality that is only
minimally different, but I'm sort of
deciding which of those to show based on
the actual logic in this here app. All
right, so this is almost perfect except
for one bug. What else needs to change
if I've just combined my greet route and
this default slash route as well?
Yeah.
Yeah. So, in the form in
index.html, recall that there's an
action attribute that specifies to what
URL you want to submit this. Well,
let me go back to index.html. It can't
be /greet anymore because that doesn't
exist. So, I'm just going to delete the
word greet and submit it to slash
instead, which has the same effect as
just omitting the action entirely. If you
don't specify an action, the form submits
to the very location that it came from.
But if you want to be pedantic and even
more clear, just specifying that the
action of this form is now just slash
will work here, too. All right, so let's
test it. Let's go back to the other tab.
Back to the form, reload. It's blank
now. I type in David. Click greet. And
this, too, is working. But again, if I go
back and reload, get is working as well.
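Here is one way the combined route might look, again as a single-file sketch with inline stand-in templates (an assumption, not the lecture's actual files). The form's action is now "/", and the single route accepts both methods and branches on `request.method`.

```python
from flask import Flask, render_template_string, request

app = Flask(__name__)

# Inline stand-ins for the two templates (assumption); the form now submits to "/"
INDEX = """
<form action="/" method="post">
  <input autocomplete="off" autofocus name="name" placeholder="Name" type="text">
  <button type="submit">Greet</button>
</form>
"""
GREET = "hello, {{ name }}"


@app.route("/", methods=["GET", "POST"])
def index():
    if request.method == "POST":
        # Only a form submission arrives via POST, so greet the user
        return render_template_string(GREET, name=request.form.get("name", "world"))
    # Otherwise it was a GET (the only other method we expect), so show the form
    return render_template_string(INDEX)
```

One route now handles both showing and processing the form, so the number of routes no longer doubles with each new form.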
But there's nothing ending up in the URL
because I'm now using post, which again
tends to be a good thing for privacy
reasons as well. Let me show one final
flourish before we transition to
something real-world motivated. If I go
into app.py, for a while now, I've been
passing in this default value of world,
which is fine, especially if it's
something short and sweet. That's the
default value. But I can actually put a
bit of conditional logic in my template
as well. So, in fact, let me go into
greet.html and trust that I will
now be passed in a name variable. But I
can decide for myself in the template
whether I want to say hello name or if
it's blank hello world instead. And how
might I do this? Well, I can always say
hello, but then I'm going to use some
Jinja syntax that we haven't seen yet.
But it turns out in Jinja, the
templating language that Flask uses, you
can use Python-like syntax too. And you
can ask questions like, well, if the
name variable has a value, then go
ahead and output the value of that name.
Else, if the name variable does not have
a value, go ahead and output a literal
value like world. And then down here,
endif. So Jinja, again, is a little
weird in that it says endblock and endif,
but that's the way it is. But even
though this looks a little weird, it's
just a nice clever way of putting a bit
of logic into my template. If the
name has a value, so it's not empty or
None, go ahead and display it, hence the
curly braces. Else go ahead and
literally say world. Why is it not
problematic, and you can see the dots
here, that there's all of this white
space after the word hello?
Otherwise this would seem to create
quite a messy paragraph or phrase of
text in terms of white space. But
>> HTML ignores superfluous white
space. So anything more than a single
space just gets canonicalized or
collapsed into a single space. And we
saw that, recall, last week accidentally
when I had those three paragraphs of
text from the duck, but I
wanted them deliberately to be separate
paragraphs and they weren't, because all
of that white space was ignored until I
actually introduced the paragraph tag
instead. So this just moves some of that
logic now to the templates. So for all
this logic and more, here's the official
documentation for Flask and specifically
Jinja's own documentation, but for the
most part, we've seen what's possible
already. And I promised a real world
example. So here now it is. So uh back
when I took CS50 as a sophomore, there
was no web programming in the class. And
frankly, there was barely any web
actually in the world because it was all
so new, HTML and the like. But it was
my sophomore spring, maybe, or junior
fall that I also got involved in the
freshman intramural sports program, or
Frosh IMs for short. And back in the
day, we would walk from, say, Matthews
Hall to Wigglesworth, freshman year at
least, to register for sports by filling
out what was called a sheet of paper, and
then you would go to the proctor's dorm
room and slide it under their door
or through the mail slot, and that's how
we registered for sports. It was sort of
ripe for disruption before that was even
a phrase. And so one of the very first
projects I took on myself personally
after taking CS50 was to figure out how
web programming worked. And Python
wasn't really a thing yet,
nor were half of the topics we've been
talking about thus far. But at the time
I learned a programming language called
Perl. I learned a little something
about CSV files, which we did a couple of
weeks back too. And I built this, the
freshman intramural sports website, via
which you could click on a bunch of
links and get some information. But most
importantly, you could register for
sports by typing in your name,
selecting the sport for which you wanted
to register, clicking submit, and no
longer having to walk across Harvard Yard
with a piece of paper to actually
register for sports.
So, we thought we'd use this as sort of
the beginning of a motivation for how we
can now solve problems using web-based
interfaces and code. And also what
not to do: background images that
repeat like this are not really in
fashion anymore, nor arguably in 1997.
But let's leave that as a cliffhanger
and come back in 10 minutes, after a
snack, with a re-implementation of the
Frosh IMs website. All right, we are back.
So among the goals now are to recreate the
beginnings of a site like this for Frosh
IMs, whereby we want to enable students
to visit a form, fill out that form
and submit it to a server and then
register. And we'll dispense with all of
the amazing graphics and such and keep
it fairly simplistic and core HTML. So
let's go ahead and do this. Back here in
VS Code, I've gotten ready now for this
next set of examples. And in particular,
I've created in advance a directory
called froshims, with app.py,
requirements.txt, and templates, which
are essentially the same as the ones we
just created, but I stripped out the
hello and greeting specific stuff. I'm
going to go ahead in this terminal and
do flask run. So, I get the server up
and running again on port 5000. And then
I'm going to go ahead and open up
another terminal here as I did before.
cd into froshims in that terminal, where
I'll see the exact same files and I'll
give you a quick tour of what I created
in advance. So here in app.py is quite
simply the simplest of applications that
just renders the index.html template
with an expectation in a moment that
we're going to make it more interesting
than that. Meanwhile, if I open my
terminal again and open up
requirements.txt, it just mentions
Flask, but it's already installed. So no
more to say about that for now. Now, let
me go ahead lastly and open up
the templates folder. There are two
files there, the first of which is
layout.html, which looks almost the
same, except I did add a slightly more
user-friendly tag to the head of the
page, which you might not have seen
before. This is a meta tag that
essentially you can copy and paste into
templates of your own that helps the
content of a page resize to be mobile
friendly. In fact, without this line, if
you were to develop problem set 9 or
your final project for the web and then
try to access the site on a phone,
everything might look quite a bit too
small, font sizes and more. This line
tends to help the browser resize the page
dynamically so that it actually matches
the width of the device itself. For
instance, a phone versus a laptop or
desktop. But otherwise, everything else
is the same there, including the
placeholder for the body block that I've
defined here on line 9. Lastly, there's
one more file that at the moment doesn't
do anything all that interesting except
is ready to contain the contents of the
registration form for Frosh IMs. So,
let's go ahead and start with actually
that. Let me quickly whip up a form that
minimally gives the user something that
they can submit to the server to
register for sports and then we'll
improve upon it a bit iteratively. So,
here inside of the body of index.html,
which is going to extend the actual
layout, the blueprint we already
created, I'm going to have a quick title
for the page, like register, just to make
clear to the student what they need to
do, using the h1 tag, which is big and
bold. Then I'm going to go ahead and
have a form tag whose action is
going to be anything I want, but since I
want the user to register, I'm going to
have it go to /register, which makes
more sense semantically than greet now
because we're doing something else. The
method I'm going to have the student use
is post, if only because they don't want
their roommates knowing what they
visited in their browser. So this way it
will tuck the HTTP parameters deeper in
that virtual envelope so it's not stored
in the browser's history. Inside of this
form, I'm going to have minimally an
input box for the student's name. So
I'll call that aptly name and set name
equal to name in my HTML. The type of
this text box will be exactly that text.
And then just to make it a little more
user friendly, I'm going to add a
placeholder of name so they know what to
do. I'm going to go ahead and turn
off autocomplete in case multiple
roommates want to register from the same
computer. And then we'll turn on
autofocus to put the cursor in that name
box. And then, and you didn't see this
last week, if you've ever wondered
how drop-down menus are implemented in
HTML, if you've never done this
yourself, those drop-down menus on web
pages are called select menus. And if I
want the user to select a sport to
register for, I'm going to name this
select menu sport. And this is an
alternative to just having a generic
text box where we have the students type
in the sport they want to register for,
which would be fraught with
typographical errors and changes in
capitalization. A drop-down menu, of
course, standardizes what the human can
select. So inside of this drop-down I'm
going to have a few options, the
first of which will be basketball,
for instance, the second of which will
be soccer, and the third of which will be
ultimate frisbee, which I think were the
first three sports with which we debuted
back in the day. Now these option tags can
take some attributes. By default they will
take on the value of whatever words are
typed in between the open and close
tags. But just to be pedantic I'm going
to make clear that the value of
selecting this option shall be
basketball. But I could change it to be
something else if I so chose. The value
of this selection will be soccer and the
value of this last option will be
ultimate frisbee just in case I want to
store something else in my database
ultimately. Now that is a complete
index.html, I think. So if I go back to
my browser tab, which previously was
showing me the hello program, and since I
stopped and restarted Flask (and you can
stop Flask by just hitting Control-C, for
interrupting it), I'm going to reload the
page, and I should now see, okay, a
slightly more interesting form with a
name box where the cursor is blinking
and then a select menu, a drop-down,
with three options. Now it's a little
presumptuous of me to select basketball
by default and in fact this is kind of
inviting user error if they type in
their name don't really think about it
and now register for basketball
accidentally. So I'm going to make a
couple of improvements here. I'm
actually gonna have essentially a blank
option at the top whose value is nothing
and I'm gonna have it just labeled
sport. And just to be super clear, I'm
going to select this value by default.
So the option tag in HTML supports not
only a value attribute, but it turns out
a selected attribute, which if present
means that's the option that will be
selected by default. So if we go back
now to this page and reload to get a new
copy of the HTML, looks a little better.
I still have the name at left, but the
sport menu now looks like this. So, it's
a little more clear what I want them to
do from this drop-down. And that sport
option deliberately won't have a value
on the back end. And theoretically, this
will help me determine if they actually
selected a sport or just clicked
register and ignored the drop-down.
But I do need a way for them to
register ideally by clicking a button.
So, I'm going to add a button, the type
of which is submit. And then I'm going
to have this button's label be register.
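Assembled, the registration form might look something like the following, served here by a one-route Flask app via `render_template_string` so it runs standalone. The capitalized option values are illustrative assumptions; in the real app this markup would live in templates/index.html, inside the body block of layout.html.

```python
from flask import Flask, render_template_string

app = Flask(__name__)

# Inline stand-in for templates/index.html (assumption: the real file extends layout.html)
INDEX = """
<h1>Register</h1>
<form action="/register" method="post">
  <input autocomplete="off" autofocus name="name" placeholder="Name" type="text">
  <select name="sport">
    <!-- Blank, pre-selected choice so no sport is registered by accident -->
    <option selected value="">Sport</option>
    <option value="Basketball">Basketball</option>
    <option value="Soccer">Soccer</option>
    <option value="Ultimate Frisbee">Ultimate Frisbee</option>
  </select>
  <button type="submit">Register</button>
</form>
"""


@app.route("/")
def index():
    return render_template_string(INDEX)
```

Because the first option's value is empty, a student who never touches the drop-down submits an empty sport, which the server can then detect.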
So now if I go back to the form once
more, reload, I now have I think a
complete form, albeit not very pretty,
via which David can register, for
instance, for basketball by clicking
register. And ah darn it, I have a 404
not found. But why is that?
Why is nothing yet found? Why is
/register not found? Yeah,
>> what's that?
>> I haven't... Well, I haven't linked the
option to anything. I think the form has
been linked. Whoops. The form is telling
the browser to go to /register. So,
this is correct behavior. But if we go
to app.py, there's no route defined
for /register. So, of course, it's not
found, because there's an infinite number
of routes that don't exist, and /register
is currently among those. So, I can
define that myself. I can say app.route,
quote unquote /register. I do want to
use post, so I need to proactively say
that the methods this route will
support will indeed be post instead of
the default of get. I'm going to define
an actual function to call when this
route is used. And by convention I'm
going to call it just register, even
though I could call it anything I want.
And inside of my register function, well,
for now I'm going to cheat a little bit.
I'm going to at least check that the user
has given me a name and a sport. So how
can I express this? Well, because I have
already imported the request global
variable that comes with Flask, I can
ask questions of it. And I can say
something like, if it is not the case
that request.form.get("name")
has a value, or if it's
not the case that
request.form.get("sport")
has a value, then let's go ahead and
give the user a warning of sorts.
I'll return render template of a file
called failure.html.
This doesn't exist yet, but no big deal.
Let me go back into my terminal. Let me
go into templates and create a file
called failure.html.
And in this file, I'm going to say that
it extends
layout.html.
And then it has a block body, inside of
which is going to be something
super trivial for now, just to get us
going. This failure page is simply
going to say "You are not registered!"
and then endblock. So
that's it. Just sort of an error page
that now exists. I'm going to close it
out of sight, out of mind. But I think
this now will work. If it is not the
case that the user gave us a name or
it's not the case that the user gave us
a sport, we will show this error
message. Otherwise, if all seems to be
well, for now, we're not going to do
anything useful with the information,
but I'm going to go ahead and return
render template of success.html,
which is simply going to assume that the
user was successfully registered. So,
let's whip that up quickly. I'm
going to go ahead and code up
success.html.
Inside of this file, which will
similarly extend layout.html,
there's a body block
that quite simply says, how about, "You
are registered!" and we'll just pretend
that it is so. And endblock. So that's it.
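Here is a sketch of the register route at this stage, assuming inline strings in place of failure.html and success.html. It checks only that both fields are present and non-empty; nothing is stored yet.

```python
from flask import Flask, render_template_string, request

app = Flask(__name__)

# Inline stand-ins for templates/failure.html and templates/success.html (assumption)
FAILURE = "You are not registered!"
SUCCESS = "You are registered!"


@app.route("/register", methods=["POST"])
def register():
    # Missing fields come back as None and blank ones as "", and both are falsy
    if not request.form.get("name") or not request.form.get("sport"):
        return render_template_string(FAILURE)
    # Nothing is saved yet; just confirm
    return render_template_string(SUCCESS)
```

The blank Sport option submits an empty string, so it fails the `not` check just like a field that was never sent.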
In short, I want the two templates that
show failure or success respectively. So
I think now in app.py, we're in better
shape. I now have a register route that
will get called if post is used to visit
it. And I'm going to check request.form,
which is where you get the post
variables from. Check whether name or
sport is provided. And I'm going to
render a template accordingly. So let's
try this. Let me go back to my other tab
and go back to the form. Let me type in
my name, David, but no sport. Click
register, and I have an internal server
error, which was not intended. So, let's
figure out how to diagnose this. So, it
seems to be the case that I'm at
/register. That was intended, but
something clearly went wrong. So, let's
go back. Now, I could just kind of stare
at my code endlessly, but recall that
there should be some hints in my
terminal window that's running Flask.
So, let me go back to my other terminal,
and there it is. Unexpected char double
quote at line 11. Well, look, sounds
like user error. So, that is in
failure.html.
And you can kind of see it because Flask
is like underlining it literally for me.
What did I do that was stupid?
Yeah, I just didn't close my quote. So,
amateur hour here. So, let me go into... I
do need to open it after all,
ironically. So, let's go ahead in my
other terminal, open up failure.html.
And there it is. One stupid character
away from correctness. All right, let's
close this again. Go back to the other
tab. Let's try this again. David as my
name but no sport. Register. Okay, you
are not registered. I don't know why,
but I know I'm not registered. Let's try
it again with no name,
but yes, a sport. Click register. You
are not registered. All right, just for
good measure, let's give no name and no
sport. You are not registered. So, that
seems to be working. Let's now
cooperate. Let's go ahead and register
as David for basketball. Cross my
fingers. Damn it. And internal server
error. Let's try to learn from my past
mistakes. Let's open this up and eyeball it.
I did it twice, even though that was not
copy-paste. So 0 for 2. All right,
let's go back here. Notice now I can
actually just click reload because the
browser is smart enough to remember what
I just posted to the server. So if I
click reload, you'll be prompted to
confirm the form submission, lest you be
doing this on a website with your credit
card or something where you don't want
to send it twice. But in this case, I'm
fine with sending my name and basketball
twice. So I'm going to click continue.
And this time it worked telling me that
I'm actually registered. So I'm not
doing anything with the students data,
but at least I am validating that they
gave me some input. Now there's a catch
here. The catch, of course, with HTML is
that it's all executed client-side.
And so for instance, suppose that a
student is really upset that we only
offer basketball, soccer, and ultimate
frisbee. And maybe they really want to
register for volleyball even though
we're not offering volleyball. Well,
there's arguably like a security
vulnerability here where technically my
code right now will tolerate any user
input even if it's not in that dropdown
because after all, let me go ahead and
right-click or Control-click on my web
page and open up the developer tools.
Let me go into the form as sort of a
hacker-type student. Let me go into the
select menu and, okay, no big deal. If I
want volleyball to exist, well,
I just need to know a little HTML. I'm
going to right-click on that element and
click Edit as HTML. This literally lets
me start editing the HTML of the page.
I'm going to give myself my own option.
Option value equals volleyball. Close
bracket volleyball. Uh, enter. And now
when I close developer tools, woohoo, I
can register for volleyball if I want.
So let's select volleyball. Type in
maybe Kelly is hacking the site.
Register. And she is registered for
volleyball, apparently. All right. So the
takeaway here is: do not trust user
input, ever, for reasons we've already
seen when we discussed SQL, and all the
more so now that we're
dealing with the web, because who knows
what users are going to do, accidentally,
foolishly, or even, in Kelly's case here,
maliciously, trying to pass data that we
did not expect. So what would be the
defense against this? Like this is just
how HTML works and assume that I'm
actually registering Kelly for sports
now and somehow she's now signed up for
volleyball in our database. What would a
solution be logically here?
Yeah.
>> Yeah. So maybe do some server-side
validation. So don't just blindly check
that we have a value from the user.
Actually check that it's one of those
sports. So if I go back to app.py, I
could do this in a few ways. And maybe
my first instinct would be this. Let's
check for the name and do this. But
let's also do this, like
request.form.get, quote unquote sport. And
actually, let's put this in a variable
just to make it even easier to type. So,
sport equals this. If sport, how about,
does not equal, what was it? Basketball,
and sport does not equal soccer,
and sport does not equal quote unquote
ultimate frisbee, then render an error.
So, return render template, quote
unquote failure.html.
So, now if I go back to this form and
try to register as Kelly again, you are
not registered. So, I somehow caught her
because volleyball of course is not in
the list of sports that I put there. But
what might you not like about this
approach?
Even if you've never done web stuff
before, what's bad about this?
>> Yeah, I have to hardcode every single
sport now, not only in app.py, to check
for the validity on the server of what
the human has typed in. But recall
that the drop-down itself came from
index.html. So I now, in duplicate, have
to put all of the sports there too.
So this just seems bad, to have
duplication. And so better might be to
do something more like this at the top
of my file here. Why don't I go ahead
and just give myself a global variable
which in the context of this web app is
perfectly reasonable. So I can access it
anywhere. Let's call it SPORTS, in all
caps, just to note that this is a global
variable and a constant, even though Python
does not have constants in the sense that C
does with const. This is sort of on the honor
system. If you see a variable in all
caps like this, just don't mess with it.
Use it, but don't mess with it. So,
inside of the square brackets, this is
going to be a list of the sports that I
do want to support. So, basketball,
soccer,
ultimate frisbee, and that's it. Now,
instead of doing all of this, what I can
instead ask is a simpler question like
this. If sport not in SPORTS, then go
ahead and return render template quote
unquote failure.html.
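Because the membership check is ordinary Python, it can be sketched without Flask at all. The helper below, `is_valid`, is hypothetical; it simply bundles the checks the register route performs, using a SPORTS list whose exact values are assumed to match the drop-down.

```python
# Single source of truth for allowed sports (all caps: treat as a constant)
SPORTS = ["Basketball", "Soccer", "Ultimate Frisbee"]


def is_valid(name, sport):
    """Hypothetical helper mirroring the server-side checks in register()."""
    # Reject a missing or blank name
    if not name:
        return False
    # Reject anything not in the list, including None, "" and hand-edited values
    if sport not in SPORTS:
        return False
    return True
```

A hand-edited value like Volleyball fails the `not in` test exactly as it does on the server, and so does a missing sport, since `None` is not in the list either.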
And I can actually tighten this up a
little bit. I don't need two calls to
failure.html. Why don't I just borrow
this code and say, or sport not in
SPORTS, render a failure. And now I've
tightened this up quite a bit more, but
I'm essentially using Python to just ask,
is the sport that... Oops, sorry, I deleted
too much. Sport equals... actually, let's
just tighten it up further. The sport
variable no longer exists, so let's do
request.form.get, quote unquote sport.
So if the sport
that the human typed in or selected from
the drop-down somehow is not in this
global list of possible sports, well,
then it's a failure. Don't let Kelly or
whoever register. But if I now
have this global variable, I can be a
bit smarter in my template. I don't need
to manually write out all three of these
sports here. Instead, I think I can be
smart about this. When I render
index.html itself, why don't I just pass
in a variable called sports, for
instance, set it equal to the value of
that global list. And then in my
template, and here's where templating
again gets interesting and starts to
save you time. Let me go into
index.html, delete all but the
default value, the blank one, and do
something like this. Jinja, it turns out,
also supports loops, like Python: for
sport in sports, using the curly braces
and the percent signs. I can now
dynamically generate as many options as
I want. So option value equals quote
unquote the current sport, close
quote, close bracket, sport. So it's a
little redundant, but again this is just
how HTML is. This is what the human
sees. This is the value that gets
submitted to the server, in case you want
one to differ from the other. And then
below that option line, I can say
endfor, which is a bit weird, but that's
how it works in Jinja to stop that loop. So
this is kind of powerful. Now if I have
three sports, 30 sports, all of the
options will be dynamically generated by
this template. And so now we're starting
to save ourselves time and I can
centrally manage all the sports by just
updating this global list here in
app.py. So, let's go back to the
browser, back to the form, reload,
and you'll see that the drop-down
thankfully still works the same way, but
all of those options were dynamically
generated. Indeed, if I view page source
from my browser, you'll see, and there's
some extra white space there because
the loop was adding some white space on
each iteration, I still have the three
sports, but not volleyball, as was my
intention. So now if Kelly even
tries hacking this version of the site
by going in here and typing
in volleyball manually, the
logic will still catch it, because only
those three sports are in that list. So
it's perfectly fine for me now to
register for basketball, because it's
among the sports in that list.
Questions on any of these here
techniques?
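To see what the loop produces without running the whole app, the sketch below renders a stand-in for the select menu with Jinja directly (Jinja is installed alongside Flask). In the app itself, the list would be passed in via something like `render_template("index.html", sports=SPORTS)`; the template text here is an assumption modeled on the description above.

```python
from jinja2 import Template

SPORTS = ["Basketball", "Soccer", "Ultimate Frisbee"]

# Stand-in for the select menu in index.html: one <option> per sport via a Jinja loop
SELECT = Template("""<select name="sport">
    <option selected value="">Sport</option>
    {% for sport in sports %}
        <option value="{{ sport }}">{{ sport }}</option>
    {% endfor %}
</select>""")

html = SELECT.render(sports=SPORTS)
print(html)
```

Printing `html` also shows the extra white space mentioned above, left behind by the `{% for %}` and `{% endfor %}` lines on each iteration; browsers collapse it, so it is harmless.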
All right, how about another type of
form? So, select menus are nice, but you
also might see radio buttons on
websites, which are the mutually
exclusive little circles that you can
select to choose one or another option.
Let me go back to index.html and
just show you how those can be created
as well. Instead of using a select menu,
it turns out we can create a whole bunch
of inputs of radio button type, as follows,
one for each sport. So for sport in sports,
let's go ahead and output,
in between this tag and the endfor, the
following: input type equals radio,
and let's give it a name. The name of
this radio button
is going to be sport, and the value of
the current input is going to be quote
unquote sport. And the word that the
human's going to see is, as before, sport.
So notice it's just another type of
input. Previously we've seen text for
instance two lines above. We also saw
last time search. We saw email. There's
a bunch of text input types. This one
though is going to display as a radio
button instead. And the human is going
to see this label here. If I now go back
to my other browser tab and click back,
click reload on the form. I should see
it's not pretty, but it's a radio button
in the sense that these are mutually
exclusive. How does the browser know
that I should only be allowed to select
one of them? Well, because I used the
same name for each of those radio
buttons. It knows that means mutual
exclusivity. In fact, if I view page
source in the browser, you'll see that
all three of the inputs that were
dynamically generated, type equals
radio, type equals radio, type equals
radio, also have identical names. And so
that's just how that works. And that's
the only change necessary. If I now go
ahead and type in my name, David,
select basketball, and click register,
we're still up and running, because what
the server gets is still exactly the same
inside of request.form.
You can still access
name or sport no matter what type of
input it was in the user's own browser.
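The radio-button variant can be checked the same way. This sketch, again a stand-in template rendered directly with Jinja rather than the lecture's actual index.html, emits one input per sport, all sharing name="sport", which is what makes the browser treat them as mutually exclusive.

```python
from jinja2 import Template

SPORTS = ["Basketball", "Soccer", "Ultimate Frisbee"]

# Stand-in radio-button loop: every input shares name="sport", so only one can be chosen
RADIOS = Template("""{% for sport in sports %}
    <input name="sport" type="radio" value="{{ sport }}"> {{ sport }}
{% endfor %}""")

html = RADIOS.render(sports=SPORTS)
print(html)
```

Since the name is still sport, the server reads the choice with `request.form.get("sport")` exactly as before.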
Questions on these techniques?
All right. Well, it's kind of
obnoxious that when you don't do
something right in this website, like
forget your name but do select a sport,
all you are told is, generically, you are
not registered. Like, it'd be nice and
much more user-friendly, better UX, user
experience, so to speak, to actually
tell the user what's wrong so they can
actually fix the problem. Now, there's a
bunch of ways we can do this, but I'm
going to propose that we go ahead and do
this. Let's create a template called
error.html, whose purpose in life is
just to tell the user a little something
more about what they did wrong. So, I'm
going to go back into my terminal window
here. I'm going to code up a file called
error.html.
Enter. And I'm going to go ahead and
before as before extend uh layout.html,
learning from my past mistakes and
closing that quote. Then I'm going to go
ahead and do body block down here. And
then inside of this block body, I'm
going to go ahead and have just some
simple text like an H1 tag that just
says error to the user. then a paragraph
tag that's going to contain some error
message to be determined. Uh and then uh
that's it for now. So I've got the
template for an error message screen.
Let me go back into app.py now and add
some logic, because app.py does know
what's wrong. It's just that at the
moment we're very generically returning
a failure template instead of something
more precise. But if I know that the
user hasn't given me their name, well,
let me say so in an error message. So,
let's actually get rid of these two
lines and be a little more specific.
How about we validate the user's name
first? So, name = request.form.get("name").
That just gives me a
variable containing the user's name. If
they didn't give me a name, which I can
express with just if not name, like if
name is blank or None, then let me go
ahead and return render_template of that
error template, but let's pass in a
specific message like "Missing name." And
so by passing in another argument to
this template called message, I can
trust that Flask will dynamically output
that message where I tell it to, using
the familiar double curly braces. Meanwhile,
let's go ahead and validate not just the
name, but also the sport. I can do this in
a couple of ways. Let's do this. So
sport = request.form.get("sport"). Then
in here, let's say if there's no sport,
go ahead and return
render_template("error.html",
message="Missing sport"). So quite
like name. But we can be more specific
now, too. If the sport they did give me
is not in the global SPORTS list, well,
then it's Kelly trying to register for
volleyball again. So let's return
render_template of error.html, but this
time the message shall be "Invalid sport"
or something like that. So, we're being
ever more clear. Otherwise they are
presumably confirmed, because we got this
far logically. So, if I go back to the
other browser tab, go back to the form,
type in no name, and just click register.
Okay, what did I do wrong accidentally?
So, let's go back to VS Code, open my
terminal, and look at the first terminal
window, where flask run is running:
"Encountered unknown tag 'body'." So I
did something stupid in error.html.
So let's go into error.html,
and, "body block." Oh, that's subtle.
I just transposed the words. It's
supposed to be "block body." That was
dumb. All right. Block body. I think
that's correct. So let's go back to the
browser. Let's reload. It's prompting me
to reconfirm that I want to submit the
exact same form, which, recall, had no
name and no sport. But now I see an error
in a good way. This is not an internal
server error. This is my error: "Missing
name." Now, it's not super user-friendly,
but it's at least more explanatory than
"You are not registered." All right, let's
go back. Let's give it a name, but no
sport. Register. Ah, "Missing sport."
Let's go back. Let's give it a sport,
but no name. "Missing name," as before.
And if I took the time to actually hack
the HTML and do what Kelly did before and
add volleyball, it would similarly say
"Invalid sport" in this case, too, because
it's not in that same list.
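The three checks just described can be sketched without Flask as a plain function that returns the first applicable message. This is a stand-in under stated assumptions: `validate`, `register`, and the `string.Template` page are hypothetical, substituting for `render_template("error.html", message=...)` and Jinja's `{{ message }}`.

```python
from string import Template

# Hypothetical list of valid sports, standing in for the global SPORTS
SPORTS = ["Basketball", "Soccer", "Ultimate Frisbee"]

# Hypothetical stand-in for templates/error.html; the real app uses Jinja
# (the messages here are fixed strings, so no HTML escaping is needed)
ERROR_PAGE = Template("<h1>Error</h1>\n<p>$message</p>")


def validate(form):
    """Return an error message, or None if the registration looks valid."""
    name = form.get("name")
    if not name:
        return "Missing name"
    sport = form.get("sport")
    if not sport:
        return "Missing sport"
    if sport not in SPORTS:  # e.g., volleyball added by hacking the HTML
        return "Invalid sport"
    return None


def register(form):
    message = validate(form)
    if message:
        return ERROR_PAGE.substitute(message=message)
    return "You are registered!"


print(register({"name": "", "sport": ""}))
print(register({"name": "Kelly", "sport": "Volleyball"}))
```

Checking in this order means a form missing both fields reports "Missing name" first, matching what the demo showed.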
All right, questions on this technique?
All right. Well, it's all fine and good
to have a registration site that does
this, but it's literally just throwing
out the information. And what I did
years ago was actually even cut a corner
initially: I think I wrote code that just
sent an automatic email to the proctor
running Frosh IMs containing the
person's name and the sport for which
they registered. But that was very
quickly replaced by a better feature,
namely actually storing the data on the
server itself and keeping track of it,
rather than just sending it off via email.
So let's do a first pass at actually
storing information on everyone who has
registered for sports. Well, let me go
up here and create another global
variable to make my life easier, called
REGISTRANTS, and set it equal to open
curly brace, close curly brace.
What do these two characters represent,
especially if empty?
What data type is this? It's a
dictionary. So, it's a Python dict. So,
you could similarly say dict() explicitly,
with open and close parentheses. But it's
more Pythonic generally to just use two
curly braces. This is just giving me an
empty dictionary. Why? Well, I want to
store the two things I'm collecting about
all of the students, their name and the
sport for which they registered. So, key
and value: name and sport. So, how can I
go about doing this? Well, it's pretty
trivial. Down here in my register
function, recall that I'm just kind of
naively saying you're registered even
though I'm not doing anything with their
name or sport. But that's easy. Let's
remember the student for real now. So in
that REGISTRANTS dictionary, let's
go ahead and index into it using the
student's name, David or Kelly or
whoever, and set that equal to the sport
for which they registered. And now
notice the name is coming, as before,
from request.form.get.
The sport is similarly coming from that
function. And so this is just
remembering that key-value pair. So
that's all fine and good. It's in the
computer's memory. How do we actually
see it? Well, wouldn't it be nice after
you register if you could see the actual
registrants of the website, certainly if
you're the proctor trying to run the
sports? Well, yes. So, let's
go down here and let's create another
route like /registrants, which is just
going to give me a list of everyone
who's registered. Let's define a
function called registrants, though I
could call it anything I want. And this
one's going to be relatively simple.
Let's render a template called
registrants.html, which will soon exist,
and pass in all of the registrants that
are in that global dictionary. And again,
I can call this placeholder anything I
want, but insofar as it contains the
registrants, I'm setting registrants
equal to the REGISTRANTS global
dictionary. So let's go now into my
terminal window and create
registrants.html, really the
beginnings of an actual Frosh IMs
website that's going to show the proctor
who has now registered. So let me go
into this terminal and run code
registrants.html
and close the terminal. Let's try to get
this right, finally: extends "layout.html",
close quote, close bracket there. Then
let's do block body, in the right order.
Then endblock down here. And then
inside of the block here, this is going
to be a bit more of a mouthful, but
let's use some of our HTML from last
week. We'll give an h1 tag that says
"Registrants" so the proctor knows what
they're looking at. Then let's put this
in a table, for instance, with two
columns, names and sports. So a table
tag followed by a thead tag for the table
heading. Then that heading is going
to contain just a single row, tr. And
that row has th tags, table headings,
one of which (and actually I'll make it
tighter) is Name. The other of which is
going to be Sport. So these are the
column headings, the table headings, th
tags for short. After the head of the
table, let's go ahead and do a tbody
for table body. And inside of here, this
is where Jinja comes into play. I can say
for each name in the registrants
placeholder that was plugged in, and,
proactively, endfor. What do I want to do
on each iteration? Well, I think I want
to output table row, table row, table row.
And in here I can do tr and then, inside
of that, a td, table data, for the cell on
the left, putting in the student's name,
which is coming from this for loop just
like in Python. And then one more td,
namely the registrants placeholder
indexed into at that name, which, because
it's a dictionary, will give me the sport
for that student's name. And then I
think we're good to go. And in fact,
to hark back to week five, when we
were talking about stacks, your
Gmail or Outlook inbox is essentially a
stack with the newest emails on top. And
when we started talking last week about
HTML, I hypothesized that it's just
row after row after row after row.
Here is what Google and Microsoft
and others are probably doing. Anytime
you have tabular information in a page,
they've got some data in memory, like the
registrants, and they're just using code
like this, as we just did in Jinja, to
output table row, table row, table row.
Imagine this is your email instead. Same
exact idea. And now we have the ability
to express that kind of logic. So let's
go back now into
the browser. Click reload on the form.
Let's register, for instance, David for
basketball. Click register. It claims
I'm registered. But hopefully now I'm
legitimately registered, because that
variable is storing it in memory. And in
fact, let's go ahead and go now not to
/register but, and I'll zoom in at the
top, to /registrants, and hit Enter. And
we will see a very ugly but functional
HTML table containing two columns, name
and sport, in the so-called thead, beneath
which David and basketball are present.
Moreover, let's now go back to that form
and try registering Kelly, for
instance, for soccer. Click register. Now
let's manually go to /registrants again.
Now Kelly and David are both in the
server's memory.
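Here is a minimal sketch of the in-memory approach, assuming the same dict shape (student's name as key, sport as value). `table_rows` is a hypothetical helper that does by hand what the Jinja for loop in registrants.html does, and `html.escape` stands in for the autoescaping Flask enables for .html templates.

```python
from html import escape

# In-memory store: key = student's name, value = sport
REGISTRANTS = {}


def register(name, sport):
    """Remember one key-value pair, like REGISTRANTS[name] = sport in app.py."""
    REGISTRANTS[name] = sport


def table_rows(registrants):
    """Build one <tr> per key-value pair, mirroring the template's for loop."""
    rows = []
    for name in registrants:
        rows.append(
            f"<tr><td>{escape(name)}</td><td>{escape(registrants[name])}</td></tr>"
        )
    return "\n".join(rows)


register("David", "Basketball")
register("Kelly", "Soccer")
print(table_rows(REGISTRANTS))
```

One consequence of keying by name: registering the same name twice overwrites the earlier sport.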
Questions then on what this example is
now doing or how it's achieving these
results? Yeah.
>> Really good question. If you wanted to
restrict the registrants page to only
certain people, ideally you would have a
password on it. And in fact, one of
the next examples we'll do in a few
minutes is a login page for exactly
that reason. Right now, it's just sort of
the honor system, that only the proctor
in question goes to this URL. But just
for the sake of discussion, actually,
suppose that you did want the
registration list to be public, if only
to hype up who has already
registered. Well, it's not very good to
just tell people to go to the
/registrants URL. We can actually link
them to that in a few different ways. So,
for instance, let's go to
success.html.
So let me open up success.html.
It just says "You are registered." I can
do something like this:
<a href="/registrants">See who else
registered.</a> I have control now over
my HTML and the routes, so /registrants
will exist. So, this will
create a nice little HTML link that
links me to that route. So, let's try
this. So, let's go back to the form over
here. Let's go ahead and register
John for Ultimate Frisbee and register.
All right. And now we see "You are
registered. See who else registered." And
if I hover over this, it's super small,
but it shows me the link in the
bottom-left corner. And if I click it,
indeed, here now is John at the bottom
of this table. And just to be clear, if
I view page source in the browser, you
see all of the tr elements that we
dynamically generated on the server side
before they were sent as such to the
browser. All
right. What if we wanted to do something
slightly more elegant here? Well, I
don't have to just use this HTML hack.
Why don't I just show the user who
has registered automatically? And this
is kind of a cool feature of web apps as
well. In addition to importing Flask,
render_template, and request, I'm going
to also import a function called
redirect that comes with Flask. And
indeed, rather than just show
success.html,
I'm going to go ahead and return the
result of redirecting the user to
/registrants. So to be clear, I'm in my
register route, and instead of showing
them the success page anymore, which I
might as well delete at this point, I'm
just going to redirect them to this list
of everyone who is registered, including
themselves. So, if I go back over here
and type in someone like Doug, who maybe
will play basketball with me, and click
register, watch what happens to the URL
at the very top of the screen: I'm
automatically whisked away to
/registrants in this case. Now, I made a
change to the code, though, and so the
server actually was smart enough to
reload. So, Doug is now the only one
registered. And this actually hints
at a problem we should really solve.
In fact, let's do this real fast.
Let me go ahead and register myself
again for basketball. Register. Now,
it's Doug and David. The catch, though,
is if this server ever goes offline,
maybe because it needs to be updated or
it crashes or it reboots, or when you hit
Control-C and get back to your terminal,
the Flask server is no longer running,
which means that global variable called
REGISTRANTS, in all caps, is gone. It's
like free. The memory has been freed.
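Flask's `redirect` hides the mechanics, but a redirect is just a 3xx status (Flask's default is 302) plus a `Location` header. The sketch below is a framework-free assumption, a bare WSGI callable rather than the lecture's app.py, that keeps registrants in a module-level dict. The `call` helper is hypothetical test scaffolding that invokes the app without a server, and clearing the dict stands in for a restart.

```python
import io
from urllib.parse import parse_qs

# Process-local store: it disappears whenever the process stops or reloads
REGISTRANTS = {}


def app(environ, start_response):
    """A bare WSGI app with /register (POST) and /registrants (GET)."""
    path = environ["PATH_INFO"]
    method = environ["REQUEST_METHOD"]
    if path == "/register" and method == "POST":
        length = int(environ.get("CONTENT_LENGTH") or 0)
        form = parse_qs(environ["wsgi.input"].read(length).decode("utf-8"))
        name = form.get("name", [""])[0]
        sport = form.get("sport", [""])[0]
        if name and sport:
            REGISTRANTS[name] = sport
        # Redirect instead of rendering a success page
        start_response("302 Found", [("Location", "/registrants")])
        return [b""]
    if path == "/registrants" and method == "GET":
        text = "\n".join(f"{n}: {s}" for n, s in REGISTRANTS.items())
        start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
        return [text.encode("utf-8")]
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"Not Found"]


def call(method, path, body=b""):
    """Invoke the app directly, without a server; return (status, headers, body)."""
    environ = {
        "REQUEST_METHOD": method,
        "PATH_INFO": path,
        "CONTENT_LENGTH": str(len(body)),
        "wsgi.input": io.BytesIO(body),
    }
    captured = {}

    def start_response(status, headers):
        captured["status"] = status
        captured["headers"] = dict(headers)

    payload = b"".join(app(environ, start_response))
    return captured["status"], captured["headers"], payload


status, headers, _ = call("POST", "/register", b"name=Doug&sport=Basketball")
print(status, headers["Location"])
print(call("GET", "/registrants")[2].decode())
```

After a real restart the module would be loaded again, so `REGISTRANTS` would start out empty, just as the demo showed.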
So, if I were to rerun Flask now, as
would happen automatically if the server
itself rebooted, well, this is not great,
because if I go back to the registrants
page and click reload, no one has
registered. And in fact, that's what
happened with Doug a moment ago. Because
I changed my actual app.py, Flask was
smart enough to realize, oh wait, the
code has changed, I better reload the
program, which gave me a brand-new,
empty version of that global
dictionary. So what would clearly be
better than storing registrants in
memory, in RAM, in a variable in the
server?
Yeah. So, in an actual database. And
so here, too, is where everything kind of
comes full circle and connects again. So
let me go back into app.py here. And
I generally like the logic of what I've
done. I don't like the fact that I'm
just storing my registrants inside of
this global variable, which is, again,
just in the computer's volatile memory.
Let's actually put this in a database
instead. So, let me go up here and get
rid of this global dictionary and let me
do something a little smarter up here.
Let me import from CS50's own library
the SQL function that we've used before.
And again, even though we've been taking
off almost all of CS50's training
wheels, the reality is that using CS50's
SQL library, even through final projects,
just makes using SQL in Python so much
easier. But there are certainly
third-party libraries you can use. Let
me go down now and, in addition to
creating my app, let's create a
database, db for short, setting that
equal to SQL("sqlite:///froshims.db"),
and those three slashes are not a typo.
And let's assume that the database shall
be called froshims.db. More on that in a
moment. And then down here, now that I
have a database variable, let's not
remember the student by storing them in
this dictionary. Let's actually execute
a line of SQL. So, db.execute,
INSERT INTO... Well, wait a minute. What
am I going to insert them into? Not to
worry. I came prepared for this. So, let
me go ahead and maximize my terminal
window and then run sqlite3 on a file
called froshims.db. And this is a file I
made in advance, but it's super simple.
In fact, if I type .schema just to
see the design of this database, you'll
see that in advance I created a table in
this database called registrants. It has
a column called id, a column called
name, and a column called sport. And the
primary key of this table is the
id value, which is just an integer. And
now notice I have some constraints here.
I want the user to give me a name and a
sport. So I've specified that it's not
just TEXT, it's TEXT NOT NULL. That is,
NULL values should not be possible to
put in here. All right. So, let me go
ahead and exit out of sqlite3. Let me go
back into my code editor here. And now I
know what to insert into: INSERT INTO
the table called registrants. What?
Well, I want to insert the name
of the student and the sport for which
they registered. And the values,
therefore, that I want to insert are
going to be whatever came from the
POST request. Here's where you do not
want to make yourself vulnerable to SQL
injection attacks. No f-strings in
here, just plugging the student's input
in blindly. This is where and why we
use these question-mark placeholders in
both CS50's library and many
libraries in the real world, to
specify that I want the library to
properly escape the user's input, so
that any scary characters, like
apostrophes or semicolons or the like,
can't change the SQL itself.
So, I'm going to pass in name and sport.
And this one line has the effect of, as
you recommended, storing the
registration in an actual database on
the server, not just in volatile
change one thing. This line here is no
longer valid because there's no global
variable there via which we can get all
of the registrants. But that's no big
deal. Here's how most web apps would do
this. I'm going to define a variable
called registrance and set it equal to
DB execute of select star from
registrance. It's as easy as that to
just get all of the registrants from my
database. And down here, there's no
longer an all capitalized variable, but
there is a lowercase one registrance.
So, to be clear, in my register route, I
am inserting the user into the database.
And in my registrance route, I am
selecting the users from the database.
And then the rest of the code, I think,
can stay the same. So, let's go back to
Frosh IMs here. Go back to the form.
Let's register David for basketball.
Register. Ah, I did screw up. You're
seeing some weirdness here. What are you
actually seeing? There's one user
registered. Not intentional. But what
does this syntax suggest we're looking
at? This is a dictionary. Recall that
the db.execute method that comes with
CS50's SQL library gives you a list of
dictionary objects when you SELECT.
And so because there's only one
registrant at the moment, you're seeing
the dictionary for my registration, which
is not what I want to show here. And I
forgot: I need to also go back into the
registrants
template to tweak my syntax as
follows. Let me go back into VS Code
here. Let me go into registrants.html.
And because I am passing in now not a
dictionary but a list of dictionaries, I
just need to think about the problem a
little bit differently. So my syntax
here is going to be as follows.
For each registrant
in that registrants list of
dictionaries, go ahead and display the
current registrant's name, and go ahead
and display the current registrant's
sport. In other words, I'm using Python
syntax, which works as well in Jinja
here. This iterates over the list of
registrants, each of which is a
dictionary. So I'm using dictionary
syntax now to index into the name key of
the registrant dict object and the
sport key of the same. So now let me go
back to my browser and I'm just going to
go ahead and reload the registrants page
without resubmitting the form. Now there
it is: David and basketball. And now
let's go back to the form and register a
couple more people. Kelly for soccer.
Register. Notice we're at the
/registrants route. Kelly is indeed still
registered. Let me go back to this and
let's register John for Ultimate Frisbee.
Register. Let's go ahead and kill the
Flask server by going to my first
terminal window and hitting Control-C.
And now let me go ahead and rerun Flask,
which was bad before. That's how Doug
ended up the only registrant last time.
But this time, if I go back to the
registrants page and immediately click
reload, even though the server is running
anew in memory, the database is
persistent, which was the whole point of
using SQL from week seven onward. And
let's do one more for good measure. If I
go back to the form, we'll register Doug
so he can play basketball with me, too.
And we even have Doug now in the
database. It's an ugly-looking table, but
the data is in fact all there.
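The same store-then-reconnect behavior can be sketched with Python's built-in `sqlite3` module in place of CS50's `SQL` class. This is a deliberate substitution: CS50's `db.execute` takes placeholder values as extra arguments rather than a tuple. The table mirrors the schema described earlier, and the temporary file path is an assumption for illustration.

```python
import os
import sqlite3
import tempfile

# Mirrors the schema described: id primary key, name and sport TEXT NOT NULL
SCHEMA = """
CREATE TABLE IF NOT EXISTS registrants (
    id INTEGER,
    name TEXT NOT NULL,
    sport TEXT NOT NULL,
    PRIMARY KEY(id)
)
"""


def connect(path):
    """Open (or create) the database file and make sure the table exists."""
    db = sqlite3.connect(path)
    db.row_factory = sqlite3.Row  # rows can then be converted to dicts
    db.execute(SCHEMA)
    return db


def register(db, name, sport):
    # ? placeholders: values travel separately from the SQL text, so a name
    # containing quotes or semicolons cannot change the statement itself
    db.execute("INSERT INTO registrants (name, sport) VALUES (?, ?)", (name, sport))
    db.commit()


def all_registrants(db):
    # A list of dicts, similar in shape to what CS50's db.execute returns for SELECT
    return [dict(row) for row in db.execute("SELECT * FROM registrants")]


path = os.path.join(tempfile.mkdtemp(), "froshims.db")
db = connect(path)
register(db, "David", "Basketball")
register(db, "Kelly", "Soccer")
db.close()  # analogous to stopping the server

db = connect(path)  # reconnecting finds the rows still on disk
print(all_registrants(db))
```

Trying to insert a `None` name raises `sqlite3.IntegrityError` because of the NOT NULL constraint, so the database itself backs up the validation in app.py.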
All right, questions now on this
improvement, which is getting closer and
closer to what the actual Frosh IMs
website did so many years ago?
All right. Well, let me propose this
now. We have this table of registrants.
Suppose that maybe Kelly was not
very sportsmanlike when she played
soccer last time. So, we want to
deregister Kelly from soccer. That is,
nope, we're going to reject your
registration. Let's think for a moment
about the design here. Here's an
HTML table containing names and sports.
And wouldn't it be nice if we could add
a button that would let me deregister
Kelly, or anyone for that matter? When I
click on that button, what information
should ideally be sent from the browser
to the server to remove someone like
Kelly from the database?
>> ID.
>> Yeah. The ID of the person. And you're
proposing ID instead of name. Why?
>> The ID uniquely identifies them in that
SQL table.
>> Exactly. The ID uniquely identifies the
user in the SQL table. So, in fact,
let's see this real quick. If I go back
to VS Code, we'll revisit essentially
a week seven issue here. Let me go back
into my second terminal, where I can
again run sqlite3 after maximizing my
terminal. Before, I just ran .schema
to see what the table is. Now I'm going
to literally run SELECT * FROM
registrants in sqlite3, and we'll see a
little ASCII table of all four of us who
registered. But we also see the unique
ID, and the value of the unique ID,
recall from week seven, is that it's the
so-called primary key. It is the value
that uniquely identifies users as
minimally as possible, and that's a good
thing, because if we have another Kelly
registering for Frosh IMs, we don't want
to deregister the wrong Kelly or both
Kellys. We want only the Kelly with an ID
of two. So somehow the button we add to
the registrants page should contain in
it the ID of the person we want to
delete. Because if you do pass the ID of
the person that you want to delete to
the server, the server can execute some
kind of DELETE statement using that ID
number and delete just that row. So
there are a few
ways we can do this, but let me propose
that we proceed as follows. In our
registrants route, which is where we can
currently see all of these users, let's
go ahead and output an ugly but
functional form for each of those users.
So, let me go ahead and minimize this
and hide my terminal window. And in
registrants.html, let's go ahead and just
do this. In addition to outputting every
registrant's name and sport, let's also
output a third column whose purpose in
life is to contain an HTML form. The
action of that form will be a route like
/deregister, and the method we're going
to use is going to be post, just so that
we don't accidentally store personally
identifying information in a URL or
such. This form is going to have a
button, the type of which is submit, and
the button is going to say "Deregister."
And I could now implement the ID in a
couple of ways. I could do input name
equals "id", type equals "text". And now
if I go back to my other browser tab and
reload, I should see a button for every
one of these registrants. And I do. But
this is kind of like the honor system,
where I just let the user type in the ID
of whom they want to delete. And it's
sort of weird that I have multiple forms
in that case. But here is where
dynamically generating HTML can get
pretty useful. Let's change the type
of this input to hidden and set the
value of this input to be whatever
the current registrant's ID actually is,
storing this in here. And let's not
confuse the quotes, so we'll use
single quotes on the outside instead. So
inside of this value, I'm putting the
current user's ID. So, if I go back now,
notice that the text boxes are going to
disappear, but the buttons will not. But
all of that information is still there.
If I right-click or Control-click and
open up my developer tools, or rather,
let's view page source, because it's just
a bit bigger. Notice that David and Kelly
and John and everyone else here has the
same HTML as before, plus another column
containing a form that contains a... I
somehow messed up still. Why is this
blank? So, this is still not good.
Ah, thank you. I accidentally pluralized
this, but it should be registrant,
because I'm inside of this for loop and
each iteration gives me a variable
called registrant. So, user error on my
part. So, let's go ahead and
dramatically do this again. Let me view
page source of the same page and scroll
down a bit. Thankfully, there is now, for
every one of these registrants, a hidden
ID: one for me, two for Kelly, and I
bet if we keep scrolling, we'll see
three for John and four for Doug. So,
now this form has enough information,
even though there's no user input other
than the clicking of the button, to tell
the server whom to delete. So, how do we
delete the user from that particular
registrants table? Well, I think we
just need to add a route. So, let me go
back into VS Code here, into app.py, and
let's go ahead and create another route.
We'll put it up here, below index.
So, @app.route("/deregister"),
and now def deregister(),
though I could call it anything I want.
And how do I do this? Well, let's first
get the ID from the form:
id = request.form.get("id").
Let's do a bit of a sanity check here.
So if there is an ID, and it's not blank
for some reason, go ahead and do
db.execute("DELETE FROM registrants
WHERE id = ?", id), passing in the
user's actual ID. And then, no matter
what, let's go ahead and redirect the
user back to the registrants page so
that we can hopefully see the result of
that change. So again, I'm just using a
bit of SQL per week 7. I'm using a
placeholder, the question mark,
passing in the actual ID from the form.
And I'm only doing this if there is an
ID that was passed in. And I'm letting
the database actually do the deletion.
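Putting the hidden input and the route together, here is a standard-library sketch of the same idea, assuming an in-memory SQLite table in which two different students are named Kelly. `deregister_form` and `deregister` are hypothetical stand-ins for the template markup and the Flask route, and `sqlite3` substitutes for CS50's library.

```python
import sqlite3
from urllib.parse import parse_qs

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE registrants (id INTEGER, name TEXT NOT NULL, "
    "sport TEXT NOT NULL, PRIMARY KEY(id))"
)
# Two different students who happen to share the name Kelly
db.executemany(
    "INSERT INTO registrants (name, sport) VALUES (?, ?)",
    [("David", "Basketball"), ("Kelly", "Soccer"), ("Kelly", "Basketball")],
)
db.commit()


def deregister_form(registrant_id):
    """One small form per row; the hidden input carries the primary key."""
    return (
        '<form action="/deregister" method="post">'
        f'<input name="id" type="hidden" value="{int(registrant_id)}">'
        '<button type="submit">Deregister</button>'
        "</form>"
    )


def deregister(body):
    """Handle a POSTed body such as 'id=2' by deleting exactly that row."""
    registrant_id = parse_qs(body).get("id", [""])[0]
    if registrant_id:  # sanity check: skip missing or blank IDs
        db.execute("DELETE FROM registrants WHERE id = ?", (registrant_id,))
        db.commit()


for row_id, name, sport in db.execute("SELECT id, name, sport FROM registrants"):
    print(name, sport, deregister_form(row_id))

deregister("id=2")  # removes only the Kelly whose id is 2
print(db.execute("SELECT id, name, sport FROM registrants ORDER BY id").fetchall())
```

Deleting by id rather than by name is what keeps the other Kelly registered.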
All right, so let's try to do this.
Let's go back to the browser here.
Reload the /registrants page for good
measure. Let's decree that Kelly is now
deregistered by clicking this button. And
oh, so close:
"Method Not Allowed" at the /deregister
route. What did I do wrong?
Let me go back to the code. What's wrong
with my deregister route?
Well, what method is the form using? If
I go back to registrants.html, the form
is using post.
>> Yeah. So, I need to override the
default, which is GET. So, I need to go
up here again and just add an
argument, methods=["POST"], that is,
a list containing only POST now
instead of GET. All right, let's go back
to the form and go back. And now let's
try to deregister Kelly. She's gone.
Let's get rid of me now. I'm gone. And
indeed, if I go back to VS Code, open my
terminal, maximize it, and SELECT * FROM
registrants again, you'll see that
the two of us are indeed gone in this
case.
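The "Method Not Allowed" error comes from each Flask route accepting only GET unless told otherwise (Flask also adds HEAD and OPTIONS automatically). The tiny dispatcher below is a hypothetical illustration of that rule, not Flask's implementation: `route`, `dispatch`, and `ROUTES` are made-up names, and the handlers just return status codes.

```python
from http import HTTPStatus

# Hypothetical route table: path -> (allowed methods, handler)
ROUTES = {}


def route(path, methods=("GET",)):
    """Register a handler; as with Flask, only GET is allowed unless stated."""
    def decorator(handler):
        ROUTES[path] = (set(methods), handler)
        return handler
    return decorator


def dispatch(method, path):
    if path not in ROUTES:
        return HTTPStatus.NOT_FOUND
    allowed, handler = ROUTES[path]
    if method not in allowed:
        return HTTPStatus.METHOD_NOT_ALLOWED  # 405, what the browser showed
    return handler()


@route("/registrants")
def registrants():
    return HTTPStatus.OK


@route("/deregister", methods=["POST"])
def deregister():
    # Changes data, so it only answers POST; a clicked link (GET) is refused
    return HTTPStatus.FOUND  # i.e., redirect back to /registrants


print(dispatch("GET", "/deregister"), dispatch("POST", "/deregister"))
```

Restricting a data-changing route to POST also means a plain link to it cannot trigger the change, which is the concern raised next.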
Questions now on this technique? Because
now we have most of the plumbing in
place for adding people to a database
and deleting people from a database. It's
very similar in spirit now to most any
website that has this kind of
interactivity.
All right, subtle question. I
deliberately, in my
registrants.html file, used post, as we
just discovered, instead of get. Why,
though? Because it wasn't that strong an
argument that I hinted at earlier,
like, well, I don't want Kelly's ID
to end up in my URL bar, or mine. IDs
are not really personally
identifiable. They're just opaque
integers at the moment. But why would it
be bad if you could delete people by
using the get method?
So this is kind of subtle, but the catch
with using get is that, by definition,
you can visit that resource, that route,
by just typing in a URL or following a
hyperlink. So, for instance, suppose an
adversary were to type a URL like,
oh, I don't know, /deregister?id=4
and then send me
this URL in an email, or send this URL
in an email to the proctor who's running
the Frosh IMs program. If that proctor
simply clicks naively on this link, as my
code is implemented now, if I've used
get instead of post, what's going to
happen?
>> Doug gets deregistered, just because
the proctor followed a link in their
email. And this is hinting at the kinds
of phishing attacks that are possible,
too. It's bad design. Generally, when you
are using get requests, that is, just
simple URLs that are clickable or
typable, they should not have the effect
of changing data on the server. Post is
much better, if only because you can't
just click a link and have a post happen.
To induce a post request, you almost
always have to click a button. So, at
least in this case, the proctor would
have to receive an email, click on a
link, and then they would see a web
page like this that clearly has a button
labeled "Deregister" or the like, which
is an additional layer of protection. And
there are even more attacks that you can
wage by supporting get. So in general,
post requests are preferred anytime
there's anything remotely personally
identifiable or remotely destructive,
like actually changing data in the
database like this. All right. Well,
what more can or should we do with Frosh
IMs, perhaps? Well, let's see. Maybe one
or so final flourishes here. Suppose I
want to go ahead and make those error
messages a little more interesting.
Let's do that for just a second. Let me
go back to my other browser tab
here. Let's go back to the registration
page where the form is, and let's
deliberately not cooperate and just
click register so that I get an error
about missing name. Well, wouldn't it be
nice if we made this a little more
user-friendly by including, like, an
image on the page, as is commonly the
case? Well, we can certainly include
images in websites using the img tag, but
the catch is we actually have to be a
little more clever about how we store the
image on the server in order for this to
work. So, for instance, let me go into
that error page. We don't need
success.html open anymore, and we don't
need layout.html or index.html anymore.
Let's focus on error.html. And suppose
that I did want to include, with the
error message, a grumpy cat on the
screen. Well, ideally I would just do
open bracket, img, src equals,
and then something like cat.jpeg, where
cat.jpeg is the name of an image of a cat
in this current folder. And just to be
clear, let's have alternative text of
"grumpy cat" for screen readers or slow
connections.
Okay, this, unfortunately, is not going
to work. Let's go over here and induce
the same error by just reloading and
submitting the same form. And you'll see,
indeed, a broken image, because that
image, cat.jpeg, does not exist, but we
do at least see the alternative text.
Well, I did come prepared with a cat
already. And so, let me go ahead and grab
this cat from another folder. And this
cat is going to exist in
a file called cat.jpeg. And indeed, if I
type ls now, after having grabbed a copy
of that cat, it exists alongside app.py.
Seems good. Let's go back to the browser
here. Let's reload. And we should see...
ah, still no cat. Well, why is this?
Well, this is a side effect of using the
framework as well. It turns out for
organizational sake, any images you want
to display on a page or any CSS files or
JavaScript files that you want to embed
in a page, if they're static assets,
should actually be in a folder called
static. And by static, that just means
unchanging. You or someone else wrote
them once and they're not dynamic in the
way that app.py is. So, I'm actually
going to use my mv command and move
cat.jpeg into the static folder. Indeed,
if I type ls now, cat is gone, but it is
in the static folder. And now if I go
back over here, I think we'll be good
except that I do need to go into
error.html and say that the source of
this image is actually in
/static/cat.jpeg
to make clear it's in that folder. And
so indeed when I now reload the page
once more, I see a very grumpy cat at
least accompanying my error message. But
there is a difference here. When
accessing the static directory, I
have to be explicit. Notice, though, that this
whole time we have never once mentioned
the templates directory. The
render_template function, to be clear, knows
automatically to look in the templates
folder for your template. You do not, and
you should not, say something like
templates here. You simply specify the
name of the file. But in the
HTML template, you do actually have to
include, as I did, /static in the HTML.
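A minimal Flask sketch of the same idea, assuming the default project layout (templates/ and static/ next to app.py). Instead of hard-coding /static/cat.jpeg, Flask's url_for("static", filename=...) can build the path to a static file. The /error route and the inline template are hypothetical stand-ins for error.html; render_template_string is used only so the sketch fits in one file, whereas render_template("error.html") would look in templates/ automatically.

```python
from flask import Flask, render_template_string, url_for

# By default Flask reads templates from ./templates and serves ./static at /static
app = Flask(__name__)

# Hypothetical stand-in for templates/error.html
ERROR_TEMPLATE = """
<h1>{{ message }}</h1>
<img alt="grumpy cat" src="{{ url_for('static', filename='cat.jpeg') }}">
"""


@app.route("/error")
def error():
    # render_template("error.html", ...) would load templates/error.html;
    # render_template_string keeps this sketch in a single file
    return render_template_string(ERROR_TEMPLATE, message="missing name")


with app.test_request_context():
    # Builds the same path that was typed by hand above
    print(url_for("static", filename="cat.jpeg"))  # /static/cat.jpeg
```

Either spelling reaches the same file; url_for simply keeps working if the static URL prefix is ever reconfigured.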
All right, let's do one final flourish
with the actual code. Suppose that it's
time to modernize and let people
register not just for one sport as per
the radio buttons, but multiple sports.
It's a little obnoxious to make me go
back and fill out my name again and
again and again if I want to register
once, twice, three times for sports. So,
why don't we go ahead and, in terms of
UI, change those radio buttons to
checkboxes? That's a very easy fix. Let
me go into my templates folder and
into index.html where this form is.
And if I want to change radio buttons to
checkboxes, I literally just change the type from radio
to checkbox. If I go back to the browser
here and reload, you'll see the familiar
checkboxes now, which are not mutually
exclusive. It lets me check multiple
ones, thereby registering for multiple
sports at once. But my logic has to
change a tiny little bit here whereby if
I want to go ahead and get all of the
sports for which the user is registered,
well, that logic has to change in
app.py. So where is my register route?
Down here. And we haven't touched this
in a while, but recall that the register
route here has a validate-name chunk
of code, a validate-sport chunk of code,
and, most recently, the INSERT INTO
chunk of code as well. But if the user
is registering for multiple sports, I'm
okay with having one row per sport, even
though I'm sure we could do better than
that. But how do I iterate over all of
the sports that the user gave me? Well,
I need to change my validation code here
a little bit. If you know the user can
select multiple values as with
checkboxes, you're going to use
request.form.getlist
and then the name of the parameter
that you want to get the values of. And
then this is going to give me back a
list of values. So I'm going to go ahead
and semantically change my code to say
sports, because I'm expecting zero or
more sports now instead of one. So if
there are no sports, we're going to just
say missing sport. Heck, missing sports.
But then I can't simply do this. I
can't just ask whether the sport for which
the user registered is in that list or not,
because they might have given me two
sports or three. So logically I should
really check all of the sports that the
human checked for me, and I should
probably do something like this instead.
So for each sport
in the sports that the user submitted, go
ahead and ask the question: if that
sport is not in SPORTS, the list of valid sports, then go ahead
and output invalid sport. So it's just a
bit of tedium here. We're just adding a
bit of logic, but this way I'm iterating
over every checkbox that the user
checked and making sure they didn't do
what Kelly did earlier and sort of make
up her own sport and submit that to me
among all of the others. But this now
should work. Let's try. Let's reload.
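The validation just described might look like this, sketched with Python's built-in sqlite3 module rather than the CS50 library. In the real route, sports would come from request.form.getlist("sport") (assuming the checkboxes keep name="sport"); here it is an ordinary list argument. The function name register, the SPORTS values, the error strings, and the registrants columns are assumptions for illustration. Every value is checked before anything is inserted, so one forged value rejects the whole submission, and then one row is inserted per checked sport, as the next step of the lecture does.

```python
import sqlite3

# Allowed values; these particular sports are an assumption for illustration
SPORTS = ["Basketball", "Soccer", "Ultimate Frisbee"]


def register(db, name, sports):
    """Validate a name and a list of checked sports, then insert one row per sport.

    `sports` plays the role of request.form.getlist("sport") in Flask.
    Returns an error message, or None on success.
    """
    if not name:
        return "missing name"
    if not sports:
        return "missing sports"
    # Check every submitted value, so one forged value rejects the whole form
    for sport in sports:
        if sport not in SPORTS:
            return "invalid sport"
    # Only after everything validated: one INSERT per checked box
    for sport in sports:
        db.execute("INSERT INTO registrants (name, sport) VALUES (?, ?)", (name, sport))
    db.commit()
    return None


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE registrants (id INTEGER PRIMARY KEY, name TEXT NOT NULL, sport TEXT NOT NULL)")

print(register(db, "David", ["Basketball", "Soccer"]))     # None
print(register(db, "Kelly", ["Basketball", "Quidditch"]))  # invalid sport
print(db.execute("SELECT name, sport FROM registrants ORDER BY id").fetchall())
# [('David', 'Basketball'), ('David', 'Soccer')]
```

Passing the list in as an argument keeps the logic testable without a browser; inside a Flask route the same function could be called with request.form.get("name") and request.form.getlist("sport").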
Oh, and then actually one other line
here. We also need to change it down here:
for each sport in sports, we'd better
execute that INSERT multiple
times. So, let's see what happens. Let's
go ahead and register David for, actually,
let's see who's in the database
still. So, registrants. So we've got John
and Doug. No David or Kelly. So let's
re-register David for basketball and
soccer. Click register. And now I'm
indeed registered for both. And I
observe that it's kind of bad design
that I'm just inserting myself twice
into the database. So let me go ahead
and open up the froshims database one
last time. Let me do a
SELECT * FROM registrants.
You'll see, too, that David and David are
both there. What would be a better
design here to get rid of the redundancy
and to know that I'm the same person
ideally?
Yeah.
>> Yeah. I should probably have an ID for
the person as well. This is going
to complicate it more than we want to
play with today, but instead of just a
registrants table, I should probably
have like a students table that has an
ID for every student and the name of
every student and then change this table
as we've seen with the IMDb database and
others. I should really be storing the
IDs of the students, the Harvard IDs if
you will, and not just their names like
this. So, there's room for improvement,
but the point here is just how we can
actually use checkboxes and get back
multiple items from folks.
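A sketch of that better design, again with Python's sqlite3 and hypothetical table names (students and registrations; the lecture's database only has registrants): each person is stored once with an ID, and each registration refers to that ID instead of repeating the name.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE students (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE registrations (
    student_id INTEGER NOT NULL REFERENCES students(id),
    sport TEXT NOT NULL,
    PRIMARY KEY (student_id, sport)  -- a student can't register twice for one sport
);
""")

# Store David once, then reference his ID for each sport
cursor = db.execute("INSERT INTO students (name) VALUES (?)", ("David",))
david_id = cursor.lastrowid
for sport in ["Basketball", "Soccer"]:
    db.execute("INSERT INTO registrations (student_id, sport) VALUES (?, ?)", (david_id, sport))
db.commit()

# A JOIN recovers the human-readable view of who registered for what
rows = db.execute("""
    SELECT students.name, registrations.sport
    FROM students JOIN registrations ON students.id = registrations.student_id
    ORDER BY registrations.sport
""").fetchall()
print(rows)  # [('David', 'Basketball'), ('David', 'Soccer')]
```

The name now appears once, so renaming a student is a single UPDATE, and the composite primary key rejects duplicate registrations.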
All right,
that was a lot. Questions on where we're
now at.
All right, to make the coding a little
less tedious, what we're going to do is
look at a few final examples that have
sort of come pre-made, and we'll walk
through the code, pointing out only
what's different as opposed to some of
the boilerplate that we keep seeing.
Where we left off now, recall, is that
we have app.py, which is all of our
logic; requirements.txt, which just
enumerates the libraries that we want to
use in the project; static, which now
contains any static files like cats or
JavaScript or CSS; and templates, which
contains our actual templates. It's
worth noting that we're actually
following a fairly common paradigm. This
is not specific to Flask. The
paradigm that
we've essentially been implementing is
this. If this shape over here
represents the human or the user, they
keep interacting with what the world
generally calls a view. A view is the
term of art that just describes the
user interface. But that view
is generated by a certain type of code,
namely controller logic. So app.py is
technically what the world would call
controller logic, or business logic, to
use an industry term. And that
controller code, aka app.py, is
generating one or more views. So the
views that we're referring to here are
everything in your templates. Those
are your views. But there's a third
piece of the puzzle that we just
introduced, which is generally called a
model. Initially my model was just a
stupidly simple dictionary in memory,
and that evolved eventually into
froshims.db. So your model is generally
your persistent data, like where you're
storing data related to the application.
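To make the three roles concrete outside of Flask, here is a toy, standard-library-only sketch; every function name is invented for illustration. The model owns the data and its queries, the view only turns data into HTML, and the controller handles a request by asking the model for data and choosing a view. In the lecture's apps, app.py plays the controller, templates/ holds the views, and froshims.db is the model.

```python
import html
import sqlite3


# --- Model: persistent data and the queries over it ---
def make_model():
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE registrants (name TEXT, sport TEXT)")
    db.executemany("INSERT INTO registrants VALUES (?, ?)",
                   [("John", "Soccer"), ("Doug", "Basketball")])
    return db


def all_registrants(db):
    return db.execute("SELECT name, sport FROM registrants ORDER BY name").fetchall()


# --- View: turns data into HTML; no queries, no decisions about what to fetch ---
def registrants_view(rows):
    items = "".join(f"<li>{html.escape(name)} ({html.escape(sport)})</li>" for name, sport in rows)
    return f"<ul>{items}</ul>"


# --- Controller: handles a "request", asks the model, picks a view ---
def registrants_controller(db):
    rows = all_registrants(db)
    return registrants_view(rows)


print(registrants_controller(make_model()))
# <ul><li>Doug (Basketball)</li><li>John (Soccer)</li></ul>
```

As the lecture notes, real views still contain loops and conditionals, so the boundaries are a mindset rather than a hard rule.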
And even though the picture doesn't lend
itself to pronouncing it in the right
order, this is what's known as the MVC
paradigm: model, view, controller. And it's
a very common way of developing web apps
by just thinking about the different
problems you need to solve with this
kind of nomenclature. Like I've got to
implement my controller which does all
of the logic, all of the variables,
functions, conditionals, loops, and so
forth. I've got to implement the view
which contains everything the user sees
and interacts with like the HTML. And
I've got to eventually implement the
model, which is all of the backend
data, the database and such. The catch, though, is
that this is not a clean line, because
clearly in views we've seen variables,
we've seen loops, we've seen
conditionals. So this is just a general
mindset to have, and in the real world, if
you ever explore web apps again, you
are henceforth familiar with what's
known as this MVC model. But now let's
solve some other real world problem. So
here's what you see on the occasion that
you sign into something like Gmail or
really any other website that asks for a
username and then eventually a password
or some such thing. This is just a web
form. It looks a lot prettier than mine
because they're using some fancy CSS to
make things blue and nicely indented and
so forth, but it's just HTML underneath
the hood with probably an input type
equals text to give me this text box. Of
course, when you log into Gmail after
providing your password, somehow Gmail
remembers often for days, weeks even
that you have logged in already. Now,
how is that actually working? Well, when
you first log into a site like Gmail and
click submit or the next button in this
case, presumably the browser is
submitting in a virtual envelope, so to
speak, a message like this to Google's
servers: POST /something to
accounts.google.com, which happens to be
the domain that Google typically uses
for this. And inside of this, the dot
dot dot is your username and password
and anything else that might be
submitted to the server. Ideally, the
server responds to you with 200 OK,
like, here is your inbox. OK, you
logged in successfully. But the server also,
underneath the hood, every time you've
been logging into Gmail, has been
planting a cookie on your computer. And
you might be generally familiar with
cookies. They have kind of a bad rap
because they're used
quite frequently for tracking, for
advertising, and really kind of
keeping eyes on you in some way. But in
their basic form, they're just a feature
of HTTP, which is wonderfully useful
because it solves some typical problems.
This is another HTTP header, Set-Cookie, that is
usually inside of those virtual
envelopes that come back from servers to
browsers. In addition to telling the
browser what the type of content is in
the envelope, it might tell the browser,
please set the following cookie. A
cookie is just a key-value pair. It
might be something like session
literally equals some value. And that
value is usually a random string that
might look like 123456 or something like
that, but it's a unique identifier. Or
naively, if Google implemented cookies
poorly, they could technically tell your
browser to store a cookie on your
computer containing your username and
password. Why? So that tomorrow when you
open up Gmail, you're not prompted again
with the stupid form to log in. Gmail
already knows that you're
logged in. And your browser can do that
by just sending the same cookie it got
yesterday to the server. Now, it is
generally bad to use cookies to store usernames
and passwords, because it's
putting very precious data in the
browser's memory, and any sibling or
roommate who walks over to your browser
can now find your username and password
by just poking around your cookies. So
generally what servers do is more like
this screenshot here, whereby all the
server does is put a big random
value on your computer somewhere,
essentially a text file containing a big
random value, and that is equivalent
essentially to a handstamp. Like,
if you go into a bar or a club or an
amusement park, generally you show your
ticket once when you go in, and then
thereafter you just show your hand if
you want to be able to come and go again
and again. So right now my hand has not
yet been stamped. We have this nice
smiley face sticker here. I might put a
smiley face on my hand, and then anytime I
want to go back into the bar or club or
amusement park, they know,
oh, we already checked who you are,
presumably the very first time that you
came in. That's all cookies are
effectively doing: putting a
virtual handstamp in your browser.
The next time you go
to Gmail and click on a link or click on
an email, your browser, unbeknownst to
you, will send a GET request that looks
like this but also contains a line like
Cookie colon and then that same key-value
pair. It's like presenting your
handstamp again and again every time you
open an email or click on a link in
Gmail. This Cookie header is what the
browser sends. This Set-Cookie header is
what the server sends. So Set-Cookie is the
act of stamping your hand. Cookie is the
act of presenting your hand. And that
effectively is how browsers and servers
remember who you are. This is how
advertisers generally remember who you
are because at one point or other they
put a cookie on your computer and
unbeknownst to you, you're going to this
website, this website, this website and
your browser has been presenting this
handstamp all this time, so advertisers
know, oh, that's David again, that's
David again, and that's David again,
because they're seeing the same
handstamp. And so one of the reasons
last week, for instance, that I kept opening
things in incognito mode, which you might
use generally if you want to do
something private and not have it be
saved in the computer's memory, is
that incognito mode gets rid of all
of your cookies when you close the
window, effectively wiping off the
handstamp before the next time you go to that
same website. So that's all a cookie is.
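The handstamp exchange can be sketched with Python's standard http.cookies module. This illustrates the two headers only; it is not how Gmail or Flask build them. The server emits a Set-Cookie header carrying a random ID (here from secrets.token_hex), and on later requests the browser sends back only the key-value pair in a Cookie header, which the server parses. The Path and HttpOnly attributes are extra details not discussed above.

```python
import secrets
from http.cookies import SimpleCookie

# Server: stamp the hand with an unguessable random ID
session_id = secrets.token_hex(16)  # 32 hexadecimal characters
outgoing = SimpleCookie()
outgoing["session"] = session_id
outgoing["session"]["path"] = "/"       # send it back for every page on the site
outgoing["session"]["httponly"] = True  # keep it away from page JavaScript
set_cookie_header = outgoing.output(header="Set-Cookie:")
print(set_cookie_header)

# Browser (simulated): later requests carry only the key-value pair
cookie_header = "session=" + session_id

# Server: parse the Cookie header to read the handstamp back
incoming = SimpleCookie()
incoming.load(cookie_header)
print(incoming["session"].value == session_id)  # True
```

Because the value is random rather than a username and password, anyone who reads it learns nothing about the account by itself, though they could still present the same stamp while it remains valid.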
It's a key-value pair that can be
planted on your computer, but it's a
wonderfully powerful mechanism for
implementing, and this is the juiciest
idea for today, I'd argue, what are
called sessions. Sessions are this
feature whereby browsers and servers
seem to have a persistent connection to each
other, even though HTTP is what we'll
call stateless. Stateless just means
that you don't have a constant
connection to the server when you are
using a website. That's not always
true, and nowadays you sometimes do have
a persistent connection, but
cookies allow you to close your laptop,
even shut down your computer, come back
the next day, and still have the illusion
of being connected just as you were the
previous day, because of this virtual
presentation of handstamps. So a session,
more concretely, you can think of in
Python as a dictionary of key-value
pairs that you can associate with each
and every user. That is to say, when I
log into a website that is using
sessions implemented with cookies, it
can store any number of key-value pairs
about me in the server's memory, and my
presentation of the handstamp will
ensure that it knows which
key-value pairs to associate with me. Let
me go back into VS Code here and let me
cd into a directory with which I came,
which is called login, which is just
going to be a relatively simple Flask
application that demonstrates how you
can implement the ability to log into a
website. And we'll keep it super simple
with just usernames, no passwords. But
as you'll see in problem set 9, we'll
add some passwords to the mix as well.
If I type ls inside of this login
directory, you'll see some familiar
friends, app.py, requirements.txt, and
templates. But let me draw our attention
to one other library we're going to now
start using called Flask-Session.
Flask-Session is just a third-party
library that gives us the ability to use
cookie-based sessions in our application and not have
to know or understand any of the
screenshots we just saw of HTTP
requests. It sort of suffices to
stipulate, okay, someone figured out how
cookies work. I just want to use them
now as a feature so that when a user
uses my website, I can associate data
with them, like who they are, what their
username is, and therefore that they've
logged in. So, let's go ahead and close
requirements.txt and open up app.py in
this case. Here is an implementation of
a program whose purpose in life is to
enable me to log in. And in fact, before
we walk through the
code, let me do this: in this
terminal, let's do flask run. And I
already hit Control-C in my other
terminal window a moment ago. Let me
now go into my other tab up here and
reload the slash route, which is now
going to be this login app instead of
froshims. All this website does by
default is tell me, first, you are not
logged in, but here's a link to log in.
It's a little small, but if you look in
the bottom left-hand corner of my browser
right now, it's a URL that ends with
/login. And in fact, I can see that
more clearly if I view page source in
the browser. Here is the only thing I'm
really seeing in this web app so far.
But notice what happens now. If I click
on login, the route in my URL just
changed to /login. I'm again keeping it
simple with just usernames, no
passwords, but I'm going to log in as
David and click login. But first, let me
show you the code. In view page source,
I have a form that submits to /login using
the post method. The only thing about
that form that's
interesting is it's got a text box and a
login button, same as we've seen before.
So, let's click it. Now, I click login.
And notice I get whisked away back to
the original route, the slash route,
even though Chrome is hiding the slash
from me. And the website somehow knows
that I'm logged in as David. In fact, if
I open up my page source in the browser,
I'll see that now it doesn't say you are
not logged in. It says I am logged in as
David. And it's now giving me apparently
conditionally a logout link. So I argue
this is representative now of any
website that lets you log in and out of
it. So how does this work? Well, in my
login app here,
what do we have in app.py? The
following. I've got from flask import
Flask, redirect, render_template, request,
and a new one, session, which you can
essentially think of as a dictionary
where you can store key value pairs for
each and every user, and Flask will make
sure that your code has a different copy
of session for every user that visits.
You can just treat it as though you only
have one user, but Flask will ensure
that when a user visits, they get their
own copy of session, their own copy of
session, their own copy of session
essentially to store whatever you want.
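Conceptually, a server-side session store is a dictionary of dictionaries: the random ID in each browser's cookie selects that user's own dictionary. The following in-memory sketch uses a hypothetical SessionStore class (not Flask's API) to show why each visitor seems to get their own copy of session; Flask-Session's filesystem backend persists the same kind of per-ID data to files instead of memory.

```python
import secrets


class SessionStore:
    """Toy server-side session store: cookie value -> that user's own dict."""

    def __init__(self):
        self._sessions = {}

    def get(self, session_id):
        """Return (session_id, session dict), issuing a new ID if needed."""
        if session_id not in self._sessions:
            session_id = secrets.token_hex(16)  # a fresh "handstamp"
            self._sessions[session_id] = {}
        return session_id, self._sessions[session_id]

    def clear(self, session_id):
        """Forget everything stored for this browser (what logging out does)."""
        self._sessions.pop(session_id, None)


store = SessionStore()

# Two browsers arrive without cookies; each gets its own ID and its own dict
david_id, david_session = store.get(None)
kelly_id, kelly_session = store.get(None)
david_session["name"] = "David"
kelly_session["name"] = "Kelly"

# On a later request, David's browser presents its cookie and gets his data back
_, same_session = store.get(david_id)
print(same_session["name"])  # David
```

The code only ever touches "a session dict"; which dict it gets depends entirely on the ID the browser presents, which is the illusion Flask provides.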
This next line here, I just need to copy and
paste: from flask_session import Session,
with a capital S. This line is the same: turn
this file into a Flask app. This stuff
is new and fine to copy and paste. This just
says configure this app to use sessions
by storing the session data on the server as
files instead of in a database or
somewhere else. But this is the default
that we use for our examples. All right,
what's going on here? Well, in my slash
route, I've got an index function whose
purpose in life seems to be to render a
template called index.html and then pass
in a name placeholder, which is the
value of session.get("name").
So whatever name is stored in the
session if any that gets passed into the
template. So let's go down this rabbit
hole. Let me open up index.html.
Interesting. So here is the logic that
implemented those two different versions
of the homepage that we saw. If the name
has a value, so if it's not empty, we
saw you are logged in as such and such.
Here's a logout link. If though there
was no name, as happens by default
before you even log in, you see you are
not logged in. Here's a link to log in.
So that's all the homepage is: it's
conditional logic checking if there is
in fact a user logged in. All right.
Well, let's go back to app.py. How does
the login work? Well, if you find your
way to the login route, then I'm asking
a question. If the user got here via
post, they probably got here by clicking
the login button that I gave them. So,
let's store in the session dictionary
the word name and make the value of that
key this value here where what I've just
highlighted is whatever the user typed
into the form whether it's David, Kelly,
John or anyone else. That's what comes
back from the form and I'm just storing
that in the session which again is like
this special global variable that you
get one per user and it's implemented
underneath the hood by way of cookies or
these handstamps. Then I just
redirect the user to the slash route.
Otherwise, if the request method wasn't
POST, that means the user just newly
visited /login on example.com or whatever my
website is. That's why I show them
login.html. All right, let's go down
that rabbit hole. Let's open up
login.html.
It's pretty simple. It's just a stupid
form that has a text box and a submit
button. But the most important part is
that, as we saw in the browser, it
submits to /login, the route we just saw.
All right, if I go back to here, how do
you log out? Well, we didn't actually
click this, but here is how you can
delete the contents of the session and
actually log the user out. You just call
session.clear(). And so, in fact, if I go
back over here and click log out, how
does the server know that I've logged
out? Well, that route very quickly (you
didn't even see the URL bar change)
logged me out by clearing the whole
session. And so, the cookie that was
planted on my computer was essentially
deleted at this point in time. Or
really, the server-side data that's
associated with that cookie was deleted.
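Putting the pieces together, here is a minimal, hedged reconstruction of the login app's three routes. One deliberate swap: it uses Flask's built-in session, which keeps the data in a signed cookie and requires app.secret_key, instead of Flask-Session's server-side files, so the sketch needs only Flask. The inline templates are hypothetical stand-ins for index.html and login.html.

```python
from flask import Flask, redirect, render_template_string, request, session

app = Flask(__name__)
# Assumption: Flask's built-in signed-cookie sessions instead of Flask-Session,
# which is why a secret key is needed; use a long random value in practice
app.secret_key = "replace-with-a-random-secret"

# Hypothetical stand-in for templates/index.html
INDEX = """
{% if name %}
    You are logged in as {{ name }}. <a href="/logout">Log out</a>.
{% else %}
    You are not logged in. <a href="/login">Log in</a>.
{% endif %}
"""

# Hypothetical stand-in for templates/login.html
LOGIN = """
<form action="/login" method="post">
    <input autocomplete="off" autofocus name="name" placeholder="Name" type="text">
    <button type="submit">Log In</button>
</form>
"""


@app.route("/")
def index():
    # None until this browser's session has a name in it
    return render_template_string(INDEX, name=session.get("name"))


@app.route("/login", methods=["GET", "POST"])
def login():
    if request.method == "POST":
        # Remember the submitted name in this browser's session only
        session["name"] = request.form.get("name")
        return redirect("/")
    # A plain GET just shows the form
    return render_template_string(LOGIN)


@app.route("/logout")
def logout():
    # Forget everything stored for this browser, then go home
    session.clear()
    return redirect("/")
```

Flask's test client keeps cookies between requests, so it can walk through the log in, view, and log out cycle without a browser.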
So, I'm no longer seeing it at all. So,
that's kind of it. Like, if you log into
a website, whether it's Facebook or
Gmail or Outlook or anything else, like
that's effectively how they're logging
you in, but of course, they're adding
into the mix some passwords and other
security as well. All right, how about
one other example? Let me go back into
VS Code here and let me go into my first
terminal, hit Control-C to kill this login
example. Let me type cd to go back and
then cd store to implement the
simplest of web stores like some kind of
e-commerce site that has an actual
shopping cart implemented. Let me do
flask run inside of this directory. Open
up my other terminal window. And in my
other terminal window, I'm going to go
cd to go back and then go into store
here where I'm going to see some
familiar files, namely app.py
requirements.txt, but a database file
this time in addition to my templates.
Well, let's see what's inside of that
database. Let me go ahead and run sqlite3
store.db, then .schema to see what's
in the database. Ah, this is like a
bookstore, like the very first version of
amazon.com, if you will. And the table
has two columns, an ID column and a
title column, for all of the books that
this store shall sell. Well, what are
those books? SELECT * FROM books;
Okay, so this is a bookstore
that sells only five books, among them
The Hitchhiker's Guide to the Galaxy and
sequels. All right. So, wouldn't it be
nice if we have a website that displays
everything in this catalog and lets me
like add things to my cart? And in fact,
here is maybe the better metaphor for
what a session is. A session essentially
gives you the ability to implement a
shopping cart like this where the
shopping cart of course in the real
world is specific to each user. Like if
I'm on Amazon.com and Kelly's on
Amazon.com and both logged in, we
obviously don't see the contents of each
other's carts. And that's because we
have separate cookies on our hands. And
so Flask or whatever Amazon is using
creates the illusion that we each have
our own global dictionary called session
in which Amazon can store any key value
pairs it wants like what's in our
shopping cart. So let's try this. Let me
go back to my other browser and reload.
So I'll now see not the login example
but the bookstore example. And it's
super ugly because I whipped it up using
the simplest of HTML. But you'll see
here every one of the books in the
database plus an add to cart button. And
even if again you're sort of new to all
this web programming, there's not all
that much you can do with HTML except
use forms maybe with some hidden
elements to achieve this result. So here
we have the H1 tag with books. Here's an
H2 which is big and bold but not quite
as big. Here's the form. Here's the
button for The Hitchhiker's
Guide to the Galaxy. As an aside, because
there's like a curly quote or an
apostrophe in the book's name, you'll see
an HTML entity here that Flask is
outputting for me, even though it's not
there, visually, in the database. So,
what does the button do for Hitchhiker's
Guide to the Galaxy? Well, it's a form
whose action is /cart, presumably
because I want to add it to my cart,
using the post method. I've got an input
whose name equals id, the type of which is
hidden, the value of which is 1. And
fast forward: 2, 3, 4. So just like the
deregister example for Kelly, similarly,
each book is going to be addable to a
cart, instead of removable, by using that
unique ID. And indeed, every form has an
add to cart button. So what's happening
then on the server? Well, let's take a
look at the other tab here. If I go back
into VS Code, let's
minimize the terminal
window here, and let's open up, inside of
store, our template for
index.html, which is sort of the entry
point. Oh, which is not there. Let's
open up app.py first and figure out
what's going on. So at the top we have
some imports including our SQL library.
We have an app variable being created, a
DB variable being created using that
same store.db. We've got this
boilerplate code which just again
enables cookies and stores the contents
on the local file system instead of in a
database. Ah here's the interesting
beginning point. How did I see that big
page with all the books and the buttons?
Well, for the slash route, we've got
this function that first uses some SQL
to get all of the books from the
database: SELECT * FROM books. And
then, ah, there's no index.html because
I called it books.html in this case, just
because. And I set the books placeholder
equal to the value of the books
variable. All right, let's go down this
rabbit hole now. Let's open up the
templates folder's books.html file. Okay,
so here we have that H1 with books, and
then we have a for loop, which is going
to output, for every book, an H2 tag and a
form tag, again and again and
again, each of which has a hidden input whose value
equals the current book's ID, but the
title in the H2, of course, is the title
of the book, which is more human
friendly. So what happens when I
actually click on add to cart for the
Hitchhiker's Guide to the Galaxy? Well,
I should indeed see that now that one
book has been added. And if I go back
and add another, like The Restaurant at
the End of the Universe, I now have two
books in my cart. So, where is that data
actually being stored? Well, if we go
back to VS Code here, hide the
terminal, and focus on the cart route.
The cart route, because it supports POST
in addition to GET, is also doing this
for me. Well, first it's checking with
some logic here. If there is no cart in
the session, go ahead and create a key
called cart and set it equal to an empty
list. In other words, I can put any
key-value pairs into the session that I
want. So, if I want my shopping cart to
effectively be a list of all of the
books that the user has added to their
cart, it stands to reason that my cart
by default should just be an empty list
when they first arrive. However, if the
user has clicked submit in order to get
here, well, I'm going to do this. I'm
going to get the ID of the book that
they've submitted via that form. And if
it indeed exists and it's not someone
like Kelly messing around and sending me
invalid parameters, I am going to append
to the cart list in the session the book
ID. And then I'm just going to redirect
the user to the cart. And anytime you do
a redirect, the browser follows it using GET, not
POST. And so when I come back to this
cart route later, I'm not going to be
using POST. I'm going to be using GET,
which means this chunk of code here is
executed. I have a variable called
books, set equal to the results of
doing SELECT * FROM books WHERE id IN
the following parenthesized list of IDs.
Recall that IN is the operator that
lets me match multiple IDs if I so
choose. And then I'm rendering cart.html
with those books. And if I go
back to the application, that's the reason
I'm seeing two elements here. And indeed,
if I view
page source, I'll see two list
items inside of an ordered list, or a
numbered list, containing the contents
then of that shopping cart. All right.
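The cart logic can be sketched without Flask by treating a plain dictionary as one user's session; the function names and the three sample rows are assumptions. One difference from the lecture's code: there the whole list is passed to a single IN (?) placeholder, which the CS50 library supports, but Python's built-in sqlite3 module does not, so this sketch builds one ? per ID while still keeping the values out of the SQL text.

```python
import sqlite3

# Sample catalog (a made-up subset of the store's books)
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT NOT NULL)")
db.executemany("INSERT INTO books (id, title) VALUES (?, ?)", [
    (1, "The Hitchhiker's Guide to the Galaxy"),
    (2, "The Restaurant at the End of the Universe"),
    (3, "Life, the Universe and Everything"),
])


def add_to_cart(session, book_id):
    """What POST /cart does: create the cart on first use, then append the ID."""
    if "cart" not in session:
        session["cart"] = []
    if book_id:
        session["cart"].append(book_id)


def cart_books(db, session):
    """What GET /cart does: fetch every book in the cart with one IN query."""
    ids = session.get("cart", [])
    if not ids:
        return []
    # One ? per ID, e.g. "?, ?" for two books; the values are bound separately
    placeholders = ", ".join("?" for _ in ids)
    query = f"SELECT id, title FROM books WHERE id IN ({placeholders}) ORDER BY id"
    return db.execute(query, ids).fetchall()


session = {}  # stand-in for one user's Flask session
add_to_cart(session, 1)
add_to_cart(session, 2)
for book_id, title in cart_books(db, session):
    print(book_id, title)
```

Keeping only IDs in the session and re-querying on each GET means the titles always come from the database, which stays the single source of truth.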
So, if we now have the ability to use
sessions to remember who has logged in
and we have the ability with sessions to
remember what someone has added to their
shopping cart, what else can we do with
web applications more generally, even if
not using sessions? Well, let me go
ahead and close this tab here. Let me go
back to VS Code here. Close out these
two examples and let's do a final set of
examples that demonstrate what we can do
with some real world data and a web
application. I have lastly a directory
called shows which is evocative of our
use of IMDb in the past. And I'm going
to go ahead into my first terminal
window, hit Control-C, and call your
attention to one thing before we move
on. Every time I have executed a SQL
query inside of my code, in my first
terminal window, where Flask is running,
you'll see, either in green for success
or yellow or red for some issues, the
actual SQL commands that are
being sent to your database. This is
useful if you mess something up at some
point related to a database query. You
can actually see in the terminal where
you're running flask run exactly what
SQL command was sent to the database, to
try to troubleshoot errors that way.
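The CS50 library prints those queries for you. If you use Python's built-in sqlite3 module instead, a rough analogue is Connection.set_trace_callback, which calls a function with the text of each SQL statement as it runs. This is a sketch of that standard-library feature, not of how the CS50 library logs; the log_sql function is hypothetical.

```python
import sqlite3

executed = []  # the text of every statement SQLite reports running


def log_sql(statement):
    # Record and print each statement, much like a query log in the terminal
    executed.append(statement)
    print("SQL:", statement)


db = sqlite3.connect(":memory:")
db.set_trace_callback(log_sql)

db.execute("CREATE TABLE shows (id INTEGER PRIMARY KEY, title TEXT)")
db.execute("INSERT INTO shows (title) VALUES (?)", ("The Office",))
db.execute("SELECT * FROM shows WHERE title = ?", ("The Office",)).fetchall()
```

Passing None to set_trace_callback turns the logging off again.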
Otherwise, you're just flying blind when
actually interacting only with the web
browser. But for now, let me go ahead
and clear that away and cd back to my
default directory and cd now into shows
where if I type ls, we'll see a whole
bunch of files: app.py, requirements.txt,
and this time shows.db, which is the
very same database that we had in past
weeks when we played with some of the
very large number of shows in the
Internet Movie Database. And what does
app.py do here? Well, it implements the
simplest of programs. This gives me
access first to shows.db with some
boilerplate up top. If I scroll down
here, you'll see that there's an
index.html template that's rendered by
default. And then apparently there's a
search route which is akin to what
Google does for us when we searched for
cats and dogs in the past. But for the
first time I'm implementing my own
search engine for TV shows, not for dogs
and cats. But what does this search
route do? Well, it uses a shows variable,
and it executes the SQL SELECT * FROM
shows WHERE title = ?,
and it passes in, just like Google does,
the q parameter for query. And then it
renders a template called search.html,
passing in those shows as a
placeholder. In other words, what does
this do? Well, let me go back over
to the store tab here.
Change the URL to just slash. And
because I'm no longer
running the store, I do want to go ahead
and run, in my first terminal window,
flask run to start off the shows
application instead. So if I now go back
to that tab, now that the new server is
running, what I see here now is the
simplest of search boxes like our Google
example asking for a query, but this
time I can search for things with which
I'm more familiar, like The Office,
capital T, capital O, search. And what I
get back, not that enlightening,
is the title of every show that matches
exactly that. If I go ahead and view
page source, you'll see that what was
generated was an unordered list of
Offices that are in the database. And
recall there's the British one, the
American one, and a bunch of others as
well. However, this form does not always work.
If I type in something like the office
in lowercase and search, I get no results in that case,
which isn't so much a bug as it is just
a lack of features here. And so, let me
actually go into VS Code here, and let
me propose that we come up with a better
version of this code. So, in fact, I'm
going to go into the pre-made examples
with which I came today. I'm going to go
into the next version of shows here, run
flask run here, reload the application
over here, and now show you that the
office in lowercase does actually work.
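The difference between the two versions can be reproduced with Python's sqlite3 and a few made-up rows. With =, the whole title must match exactly, and text comparison is case-sensitive under SQLite's default collation. With LIKE and a % on each side, the input may appear anywhere in the title, and SQLite's LIKE ignores case for ASCII letters. The wildcards are concatenated onto the parameter value, not into the SQL string, so the ? placeholder still guards against injection. The function names are hypothetical.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE shows (id INTEGER PRIMARY KEY, title TEXT NOT NULL)")
# Made-up sample rows standing in for shows.db
db.executemany("INSERT INTO shows (title) VALUES (?)",
               [("The Office",), ("The Office",), ("Parks and Recreation",), ("Inside the Office",)])


def search_exact(q):
    # First version: the whole title must equal q, case included
    return db.execute("SELECT title FROM shows WHERE title = ?", (q,)).fetchall()


def search_like(q):
    # Improved version: % allows anything before or after q,
    # and SQLite's LIKE ignores case for ASCII letters by default
    return db.execute("SELECT title FROM shows WHERE title LIKE ? ORDER BY id",
                      ("%" + q + "%",)).fetchall()


print(search_exact("the office"))  # []
print(search_like("the office"))   # [('The Office',), ('The Office',), ('Inside the Office',)]
```

One side effect: a % or _ typed by the user also acts as a wildcard, which is usually acceptable for a search box.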
Moreover, it searches for anything that
mentions the office. So if you had to
guess how this might be implemented
underneath the hood, well, if I open up
my other terminal window, go into
that same directory, shows1, and open
up this version of app.py, you'll see
that instead of using a simple query
like before, I'm now using the LIKE
keyword here, because I'm checking that
it is LIKE "the office". And notice, this is
a bit clever here, or a bit confusing at
first glance. The placeholder I want is a
question mark, but I don't want to just
search for the user's input. I want to
tolerate zero or more characters to the
left via the SQL wildcard and zero or
more characters to the right. So I'm
concatenating onto the user's input a
percent sign here and a percent sign here,
because, recall from our week seven on
SQL, this just means look for anything
that has T-H-E space O-F-F-I-C-E in it, case-insensitively,
no matter where that string
is in the text. How did it know to
render that, though, as this bulleted list
of all of these Offices? Well, let me go
into my terminal here and open up
search.html, which is the template that
the search route is using. And you'll
see that I'm just iterating with a
Jinja for loop over each of those shows and
then outputting a list item for each of
those matches, effectively just as I did
before. But there's this other technique
I can use altogether, and it's generally
going to open up more possibilities for
us in final projects, if not beyond: that of
creating essentially my own API. Rather
than just make a web app that spits
out the entire HTML page that I want the
user to see, wouldn't it be nice if I
could just start to create routes that
spit out the data that I want, so that I,
or even some third party making a
website with the same data, can integrate
my application into their own? And
indeed, an API is an application
programming interface. It's
essentially web-based functions you can
call to get data from someone else's
services, generally using HTTP. And you
can return the data in any number of
formats: in text format, in HTML format,
or in something called JSON format, which
is short for JavaScript Object Notation,
which looks a little something like this,
quite like Python lists and
dictionaries combined. But notice here,
with a wave of the hand, there's a whole
bunch of key-value pairs in this
particular example of all of the Offices
that are in IMDb's database. And so I
wanted to show us these final versions
of this same shows application that
work a little bit differently. If I go
into, say, the shows2 example here, now, whoops,
let's go ahead and exit out
of the previous copy of Flask, go into shows2,
and run flask run inside of it. Notice
here that if I go back to this web form
now, there is no more search
button, because this is meant to be
highly interactive, and I can search for
T-H-E space O-F-F-I-C-E.
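With autocomplete, each keystroke triggers a fresh GET request carrying everything typed so far. A sketch of that sequence of request URLs, assuming a hypothetical `/search` route that reads a query parameter named `q` (both names are assumptions for illustration); the standard library's `urlencode` handles the encoding, turning a space into `+` and `&` into `%26`.

```python
from urllib.parse import urlencode


def keystroke_urls(typed, route="/search"):
    """Return the URL requested after each keystroke (hypothetical route and parameter)."""
    # One request per prefix of what the user has typed: "t", "th", "the", ...
    return [f"{route}?{urlencode({'q': typed[:i]})}" for i in range(1, len(typed) + 1)]


print(keystroke_urls("the o"))
# ['/search?q=t', '/search?q=th', '/search?q=the', '/search?q=the+', '/search?q=the+o']
```

Each of those URLs becomes its own round trip to the server, which is why the Network tab fills up as you type.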
And you'll notice that this is
effectively autocomplete, which we saw a
taste of last week with JavaScript, which
I am in fact using here. But how is this
working? Well, let me reload and open up
my developer tools. And in developer
tools, let's watch the Network tab this
time, because when I type in something
like T, you'll see that my web page
suddenly made a request to my own
/search route. And if I click on my
developer tools and look at the response
that came back, you'll see that the
/search route spit out not a full web
page, but just a whole bunch of LI tags.
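Here is a rough sketch of what such a fragment-only response contains. It assumes a hypothetical helper, `render_li_fragment`, in place of the lecture's search.html template. The helper emits one `<li>` per matching title and nothing else. It escapes each title with `html.escape`, much as Flask autoescapes variables in `.html` templates, though the exact escaped form of quote characters may differ slightly.

```python
import html


def render_li_fragment(titles):
    """Return only <li> elements: no <html>, <body>, or <ul> around them (hypothetical helper)."""
    # Escape each title so characters like < and & cannot inject markup
    return "\n".join(f"<li>{html.escape(title)}</li>" for title in titles)


print(render_li_fragment(["The Office", "Law & Order"]))
# <li>The Office</li>
# <li>Law &amp; Order</li>
```

Whoever calls the route then has to wrap that string in their own `<ul>`, so the response only makes sense as list items.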
Now, why is that? Well, let me go back
to VS Code and open up app.py in my other
terminal. And in app.py,
scrolling down to search, you'll see
that when I get shows from the database,
I'm still using search.html, which
previously extended my layout and
plugged in that whole unordered
list. But this time, if I go into this
version of search.html, you'll see
that I'm only spitting out raw HTML,
because I'm assuming that maybe someone,
myself included, wants to use /search
just to get a whole bunch of list
items that they can put into their own
unordered list, or ul tag. And so what's
effectively happening over here is that every
time I type a letter, notice at bottom
left, another HTTP request goes across
the internet, and another HTTP request, and
each of those is returning the set of LI
elements that line up with the query
that I've typed in. But this is a little
sloppy, arguably, insofar as I'm
returning a chunk of HTML out of
context, and I'm dictating to the user
that they have to use list items.
Wouldn't it be nice to just send the raw
data? And I can do that, too. Let me go
back into VS Code here and look at our
final example, shows3, inside of
which is a version of this code that now
returns that so-called JavaScript Object
Notation. And if I go into shows3,
run flask run, go back over now to my
browser tab, click reload, and
now search for, say, T and click on
that row, notice that in the Response tab
of my developer tools, I'm getting back
a whole bunch of juicy information: a
massive JavaScript Object Notation chunk
of data. Notice the square bracket means
here comes a list, or an array, and the curly brace means
here comes a dictionary, or dict. And indeed,
that's what I'm seeing. This looks like
Python, but it's technically JavaScript,
specifically JavaScript Object
Notation. This just means this is the
juicy data I'm getting back from the
server. And if you now think way back to
week zero and even our family weekend
lecture on AI, where I
was writing code that talked to OpenAI's
so-called API to get responses from our
server-side cat, they were sending us
JavaScript Object Notation like this, and
I was just grabbing the data that I
actually cared about, namely the cat's
actual response. And so in this case, if
I open up app.py in my other terminal window
here, you'll see in my
search route that instead of returning a
template, I'm using a crazily named
function called jsonify, which is just
another function that comes with Flask
itself. It has the effect of taking the
list of Python dictionaries that came
back from my SQL database and converting it to JSON
in such a way that I then can serve
it to anyone on the internet, myself
included, as a service, so that I and
they can use my own data to implement
their own web applications. So
that's sort of it for web programming.
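Flask's `jsonify` turns Python data, such as the list of dictionaries a query returns, into a JSON response. The sketch below uses the standard library's `json` module as a stand-in so that it runs without Flask, and the rows are hypothetical. It shows both sides of the exchange. The server serializes rows to JSON text. A client then parses that text back and keeps only the field it cares about, much as the AI demo kept only the model's reply.

```python
import json

# Hypothetical rows, shaped like the list of dicts a SQL query might return
rows = [
    {"id": 1, "title": "The Office", "year": 2005},
    {"id": 2, "title": "The Office", "year": 2001},
]

# Server side: the list becomes a JSON array ([...]); each dict becomes a JSON object ({...})
body = json.dumps(rows)

# Client side: parse the JSON text back into lists and dicts, then keep just the titles
data = json.loads(body)
titles = [show["title"] for show in data]
print(titles)  # ['The Office', 'The Office']
```

Inside an actual Flask route you would typically `return jsonify(rows)` instead of calling `json.dumps` by hand. `jsonify` also sets the response's `Content-Type` to `application/json`.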
Ultimately, you now have all of the
building blocks from week zero onward to
make your own web applications and, if
you so choose for final projects, your
own mobile applications. Even if this
too, like everything else, has felt like
a bit of a fire hose, it is the
process of specking out, proposing, and executing
your own final project that will make
all of this feel much more comfortable
and familiar. And you'll look back on so
many of the past weeks as useful
building blocks. But this, then, was
your CS50 education, weeks zero through
nine. We have just one more left next
week. So we'll see you then.
All right, this is CS50 week 10, the
very end. And we will end today's class
just as we ended week zero, with a
little bit of cake outside in the
transept. But over these past 10-plus
weeks, if you've been feeling like it
was that proverbial fire hose sort of
hitting you in the face with so much new
content, so many new skills, so many new
challenges, realize that you're in
very good company. And we can officially
declare nonetheless that if you started
the class among those less comfortable,
you are officially, after today, no longer
less comfortable. You're at least
somewhere in between. And if you were in
between, you're more comfortable. And if
you were more comfortable, you're
perhaps now most comfortable among those
here. But keep in mind, as per CS50's
syllabus, what ultimately matters in
this course is not so much where you end
up relative to your classmates, but
where you end up relative
to where you yourself began. And that's
taken into account come final projects,
come final grades. But most importantly,
what's really most important
educationally in general is that delta
from week zero to, in our case here, now
week 10. So if it's any reassurance,
something I like to bring up around this
time is just how badly I did in CS50 on,
like, the very first problem set. Like, I
didn't even get hello, world right
somehow in the fall of 1996. So here's a
photograph of my homework for
assignment one. It was a program to
print hello, world on the screen. I was
incredibly detailed with my comments,
even commenting that main is main, which
is not the way you're supposed to
program, even telling the TF where
my file ended, which is not really
necessary. And I got minus two for not
even following directions correctly.
So take some comfort in that. Even if by
problem set nine you're still getting
points off, you're hopefully, at least
in my case, in some very good company.
It only gets better and easier and
faster in time. But the whole course
ultimately has really been about this
picture, right? Problem solving is
computer science. You have inputs,
which is the problem to be solved. You
have the outputs that you want to get
to, which is presumably the solution
there, too. And inside of that
proverbial black box are these
algorithms, step-by-step instructions
for solving some problem. And I pulled
up my own notes from CS50's first
lecture some 25-plus years ago, too, where
I wrote down this in my horrible
handwriting, to this day. But I noted
that what an algorithm is, is a precise
sequence of steps for getting something
done, which is pretty much what we now
say. I noted that programming itself,
as we have for weeks now, is the process
of taking an algorithm and putting it
into a language that a computer can
process, and that's what you've done in
Scratch and C and Python and SQL and
JavaScript and anything in between.
And most important, at least my takeaway
that day, when it comes to algorithms, is
precision and correctness. And
indeed those are points we've made,
perhaps not as emphatically, over the
past several weeks as well. But we
thought we'd see just how much those two
lessons in particular have sunk in
by doing a bit of an exercise, some CS50
Pictionary, in this, our last lecture altogether
this term, for which to
begin we need one brave volunteer to
come on up on stage.
Who would like to volunteer?
Who? How about, okay, over here. We never
call on the middle section.
Come on up. Come on up. A round of
applause for being so brave. Nice.
All right, come on over.
And in just a moment, let's go ahead and
do introductions. First, if you want to
come over to the middle of the
stage and introduce yourself to the
world.
>> Hi, I'm Gia. I'm a freshman.
>> All right. Nice. Nice to meet you. Thank
you for joining us. So, what we're about
to do is Gia is going to look at my screen,
where there's going to be a picture on a
white screen. All of you presumably have
a white sheet of paper in front of you
that you grabbed on the way in. If you
don't, just grab one from a friend or
your binder or the like. And if you
really don't, that's okay, too. But
hopefully everyone has a pen or pencil,
or someone near you does. And what, Gia,
we're going to ask you to do is program
the audience to draw what it is you see
on the screen. You can say anything you
want, but you may not use any physical
gestures or the like. Verbal programming
only.
>> Okay.
>> All right. Come on over to the lectern,
and in just a moment only Gia will
see what is actually here on the screen.
So,
step one for your audience.
Okay. So, the first thing that you need
to do is draw two lines right next to
each other. Two vertical lines.
Okay.
>> Okay.
>> Step two.
>> Step two. Once you have done that, you
need to draw three dots. One on above
those two vertical lines, one right in
the middle between those two vertical
lines, and one at on the bottom of these
three vertical lines, but beneath those
two vertical lines. Yeah. So, three
dots.
>> Okay. Step three. Step three is on the
top of the left vertical line, you're
going to connect a line from that
position to the top dot that you drew.
And then on the top of the right
vertical line, you're going to connect
that position to the top dot that you
drew.
>> All right, step four
>> is remember that top left position?
You're going to connect that to the
middle dot that you drew. And then the
top right of the vertical line at the
Yes. You're going to connect that to the
middle dot of the line that you drew.
>> Got it?
>> And then step five, on the bottom left
of your left vertical line, you're going
to connect that position to the bottom
dot that you drew. And then on the
bottom right of the right vertical line,
you're going to connect that position to
the bottom dot that you drew.
And now from the middle dot to the
bottom dot, you should have no line in
between that. And you can now draw a
line between those two dots.
>> Step six and the last.
>> I think you should be done.
>> All right. A round of applause, then, for
our programmer. Let me give you a little
something, if you want to take a seat. So now what
Kelly and I are going to do is very
quickly collect your execution of this
program and we'll see just how it went
with Gia as the programmer. If you want
to just reach out and hand me or Kelly
over there any of your handwritings. We
don't need all of them. Just a
representative sample will suffice. If
you're proud of your work, extend your
hand quite a bit. Okay. Very proud.
Okay.
>> Okay.
>> Okay. Okay. One more. One more. That's
okay. All right. All right, I'm going to
run back to the stage. Okay, it's okay
if we didn't grab yours.
All right.
All right. Thank you to Kelly for
grabbing these as well. So, without
having seen any of these, here is how
you all interpreted Gia's instructions.
So, here's one interpretation.
Okay. Perhaps similar or different from
your own. Here's another: several
vertical lines, question mark.
Okay. Here is
a very narrow one.
All right.
And
let's see if we got any other
variants thereof. Actually, the rest of
them are pretty consistent. So, Gia, if
it's any reassurance, I'm seeing a lot
of ones that look like this. Here's
another that looks like this. And
here's yet another that looks like this.
So, if you're wondering where we're
going with this, if I go ahead and
reveal what it was Gia was looking at on
the screen, she was in fact having you
draw this here cube. So, some of the
takeaways here. So, suffice to say, not
all of that went well. But why was
that? Well, I dare say it was very easy
to get confused, I think, Gia, by some of
your words, because you had in your
mind's eye exactly what it was you were
drawing. And of course, it was right
there on the screen. But we didn't
leverage, at least in Gia's instructions,
any abstractions. I dare say it might
have been a little bit easier for all of
us if maybe she had just teed things up
by saying, "All right, everyone, we're
going to draw a cube," for instance,
which is indeed an abstraction over
these lower-level details that she was
focusing on. But perhaps there could
have been another approach altogether,
which is even more pedantic. For
instance, a lot of the earliest drawing
programs, and even worlds like Scratch,
sort of take for granted that you have a
coordinate system, like x's and y's, and
you can go up, down, left, and right.
Just saying, "Hey,
we'll draw a cube," could be subject
to interpretation: is the cube like
this, or is it like this, rotated? So we
still would have needed more information
than just "a cube" from Gia. But here,
maybe an alternative approach would have
been to really get into the weeds and
say, "Put your pen at the top of the
page and then draw a straight line to
the southwest, for instance, and then
draw another line of the same distance
to the south, and then to the southeast,
and so forth." And it could have been in
terms of degrees, or it could be
directional in that way, but it might
not have been clear to anyone what it
was we were drawing until enough of the
lines suddenly appeared on the page, and
then, voila, you see that we've been
drawing a cube this whole time. So the
degree to which we're precise, and the
level of abstraction at which
we operate, are incredibly important.
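The "get into the weeds" alternative amounts to a tiny drawing language: a starting point plus a list of directions and distances. A sketch of that idea follows. It assumes a hypothetical grid in which each compass step moves one unit along x, y, or both, so diagonal steps are not quite the same length as straight ones. The `STEPS` table and `trace` function are illustrative names. The sketch shows why such a program is unambiguous: everyone who executes it lands on the same points, even before anyone recognizes the shape.

```python
# Hypothetical unit steps on an x/y grid; y grows upward, as on Scratch's stage
STEPS = {
    "N": (0, 1), "S": (0, -1), "E": (1, 0), "W": (-1, 0),
    "NE": (1, 1), "NW": (-1, 1), "SE": (1, -1), "SW": (-1, -1),
}


def trace(start, moves):
    """Follow (direction, distance) moves from start; return every corner visited."""
    x, y = start
    points = [(x, y)]
    for direction, distance in moves:
        dx, dy = STEPS[direction]
        x, y = x + dx * distance, y + dy * distance
        points.append((x, y))
    return points


# "Put your pen at the top of the page, go southwest, then south, then southeast..."
print(trace((0, 10), [("SW", 2), ("S", 4), ("SE", 2)]))
# [(0, 10), (-2, 8), (-2, 4), (0, 2)]
```

The trade-off is the one the exercise surfaced: these moves leave nothing to interpretation, but nothing in them says "cube" either, so a comment naming the abstraction still helps the reader.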
That's true whether it's for another human to
understand us, for an AI to understand
us nowadays, or anything in between. All
right, why don't we go ahead and flip
things around a bit for this? Why
don't we go ahead and get one more
volunteer to do something a little
different here on stage? One more. Okay,
how about here on the aisle? Come on
down. Round of applause for this brave
volunteer. Come on down.
All right. So, in this exercise, we're
going to flip things around. So, you all
will be giving the instructions verbally
by just shouting them out. And our
volunteer, whose name is
>> Presley.
>> Preston.
>> Presley.
>> Presley. Presley, you want to say a
quick introduction?
>> Yeah. My name is Presley. I'm a
freshman living in Stoughton House.
>> Nice. Well, welcome. Come on over to
the easel here. And we have a
black marker for Presley here. And the
only thing that we ask is that you not
look up or behind you, because the answer
is going to be right there on the
screen. But everyone else is welcome to
look up or over to the TV screen. And if
you want to go ahead and face the easel
here, as you draw, just make sure to
kind of open up after each stroke of
the pen so that everyone can see what
you have done. All right. So no looking
up as of now because what the audience
is about to do is to program you to draw
this on the screen. Oh, way to encourage
him. Okay. So, step one, feel free to
just raise your hand and we'll shout
them out.
>> Oh, I heard draw a circle over here.
>> But not too big. I heard over here
a stick figure.
>> Good abstraction. You're going to end up
drawing a stick figure.
But we should probably be a little more
helpful than that. So, let's do the hand
thing just so we can be more precise and
not overwhelm Presley. There was a hand
over here. Yeah. And back.
>> Draw a line down.
>> Draw a line down from the circle.
Presley
>> from the bottom
>> from the bottom of the circle.
Okay, someone else.
>> Actually, let me rewind. Sorry.
Say it again.
>> Draw two diagonal lines from the line
you just drew.
>> Well, I don't think the audience likes
this. Wait, let's Oh,
>> okay.
Okay, that's what we were told. Next
step, someone else.
>> Good one. Okay. Extend the original
vertical line to be about the same
height as the circle.
>> Okay. Yeah, that's good. Good feedback.
All right. Someone else. Next step.
Next step. Yes.
Draw two diagonal lines from the bottom
of the line.
>> Nice.
Draw two diagonal lines from the bottom
of that line that look like legs. Good
use of detail and abstraction.
Okay, nice. Next step.
>> Anyone? We're close. Yeah, over here.
line
>> on the left. So, you're going to draw a
speech bubble to the left of the head
with the word Hi, capital H, with a
short line.
>> No bubble, just Hi.
>> And you wanted to clarify one other
detail. And then a line from Hi to the
face.
>> A line from Hi to the face
>> with space in between.
Okay. No, you're doing great. It's okay,
Presley. Okay. Hang in there. Okay.
Final step or two.
Next step.
Anyone at all.
>> Feel free to shout it out.
>> Adjust the arms to make them look like
they're running.
>> Adjust the arms to make them look like
they're running.
Good luck.
>> Draw a perpendicular line from the left
arm.
>> Oh, I like that. Draw a perpendicular
line from the left arm
>> to the bottom
>> to the bottom.
>> Okay. And lastly, one final step.
>> Same side as
Yeah, it's permanent. Uh,
I think we need a final touch on the
other arm. Maybe. Yes. One final step.
>> Anyone?
>> Draw a perpendicular line diagonally
to the left
>> of the arm
>> of the right arm.
Just a little bit.
>> Just a little bit.
>> All right. I think we've
withheld our applause long enough.
Presley, if you want to take a step back
and look at what they were trying to
get you to draw: a round of applause.
So, here too. Here you go, for your
dorm room if you would like. Okay. And a
little Super Mario as well. All right.
So, here too. Um, I think you were the
problem this time. Round of applause for
Presley.
And of course, since it's, you know,
permanent ink, it's easy to sort of go
off the rails early on and make a
mistake. But I think that was actually a
nice mix of low-level details, like the
directions of the lines and the lengths
thereof, and also some abstractions,
because I do dare say someone shouting
out that it was to be a stick figure gave
him a much more helpful mental model. So
that might be sort of the comment on
top of the function, but when we really
got into the weeds of implementing that
function, it was more akin to step-by-step
instructions for solving this here
particular problem. So my thanks to
Presley for bearing with us with that
one as well. So beyond this, where have
we been up until now? So if we look
back at the past several weeks, this is
sort of the trajectory on which we've
been. We started with Scratch, from
scratch literally, in the very first
week, the goal of which was to introduce
you to some of those procedural
fundamentals, like what a loop is and a
conditional and Boolean expressions and
variables, which have pretty much
recurred in different forms and
different languages over the weeks since.
Thereafter we transitioned to a more
traditional language, C, which many of you
will never use again, and admittedly even
I only use it for like a month or two of
the year during CS50 itself. The intent
was for it to be this incredibly foundational
language that so many other languages
today are built on top of. Case in
point, the interpreter that you might
use for Python itself can be written in
C. And that speaks to how we sort of
talked about bootstrapping from one
language to another, from low level to
high level and beyond. Arrays and
algorithms, memory
and data structures: all of that is
sort of omnipresent in computing, in
programming, and the like, even though
you might not need to worry as much in modern
languages like Python
about managing your own memory, because
good programmers, better programmers, have
figured out how to solve those problems
for you in the language itself or in the
libraries that you're using. You can
take for granted now that you at least
know what a hash table is, what a linked
list is, what the trade-offs are among
those, and what the running times are. And
that's what computer scientists and
software engineers think about and talk
about and whiteboard about in the real
world when trying to apply
algorithms of their own to real-world
problems or implementing real-world
products. And then, of course, over the
past few weeks we've sort of used that
as a stepping stone to talk about very
modern programming paradigms, most
recently web programming. And even
though we didn't use it explicitly in
the class, mobile programming is
increasingly based on HTML and CSS and
JavaScript, which might be something
some of you will tackle for your own
final projects. And you can't escape now
using or seeing or leveraging, somehow,
artificial intelligence. And among the
goals for today is to at least point you
in the direction of tools that, now
having finished problem set 9, you are
welcome and encouraged to use for your
final project, so that you can build all
the more, and all the more
successfully, than even some of your
predecessors just a few years ago could
have, now that your own work and your own
know-how can be amplified by the impact
of AI itself. This, of course, now
brings us to today, the end, but we wanted
to give you a sense of where you can go
from here on out. So with your final project,
the intent of the
final project is for it to be the very first of
hopefully many projects that you decide
to spec out for yourself. Every
problem set thus far has been written by
me and the team, and you've been
following our instructions step by step.
The final project takes all of those
training wheels off. And even though you
are welcome and encouraged to borrow
code from, say, problem set 9 if you want
to do something web-based, or even
earlier if you want to do something
that's more similar to past psets, the goal is to
make it ultimately your own. And even, if
you want, start with a completely empty
window and just a blinking prompt and
build something of your own. Set
out for yourself, as you've seen in the
specification, a good goal, which you
intend to meet no matter what; a better
goal, which is a bit more of a stretch;
and a best goal, which in practice
rarely ever happens with software. To
this day, 25-plus years since taking CS50
myself, even I
consistently underappreciate just how
long it sometimes takes to solve
problems. But that's beginning to go
away, at least to some extent, thanks to
AI, where now you essentially
have a junior colleague next to you who
can help solve bugs for you, point you
in the right direction, even tackle
features as well. All that we ask
for this final project is that you build
something of interest to you, that you
solve an actual problem, that you impact
campus, or that you, as we say in the
spec, change the world. Try to
achieve something, try to create
something that outlives the course
itself over these final few weeks of the
class, and even continue on with it if
you'd like in January and beyond.
For now, this: the so-called CS50
charades, for which we need two teams of
three. So, if you're sitting there in a
group of three friends total, or
we'll form one up here live. So, come on
up as our first volunteer. We need five
more volunteers. Feel free to volunteer
the persons next to you. Three in a
row. How about two more over here? One.
And how about two on the end? Come on
up. All right. And a round of applause
for these six volunteers here. And
all right, let me give you one
microphone.
Let me give you second microphone. And
Kelly, if you want to come on up as
well. I think these three seem to know
each other already. So, we'll have them
be one team. If you guys want to be
another team as well, come on up. Uh,
let me take one microphone actually for
the other team. All right. And how about
quick introductions to this team here.
And first, we need a team name from you
all. You haven't had time to think about
this.
>> Team A. Okay. So, team A is who?
>> Uh, I'm Leah. I'm a first year and I'm
in Holworthy.
>> Welcome.
>> My name is Stephen. I'm a freshman in
Canaday F.
>> I'm Charlotte. I'm a freshman and I'm
also in Canaday F.
>> All right, let's do introductions on the
other team as well. You are going to be
team
>> Awesome Sauce.
>> Awesome sauce. Okay. Versus team A. Uh,
if you want to go ahead and introduce
yourselves here.
>> Hi, my name is Jenny Pan. I'm a freshman
in Hollis.
>> Hi, my name is Noah. I'm a freshman in
Hurlbut.
>> And hi, my name is Marie, and I'm a
freshman. Sorry, I'm a freshman in
Canaday.
>> All right, welcome to both of our teams
here. And among the goals now (let's
leave one microphone with each team)
is to play a bit of charades, whereby one
of you in a moment is going to be
responsible for acting out a word that
you see on the screen. So, we're going
to put on this screen and this screen
over here some term that relates to CS50
somehow, and that person's goal over the
course of 60 seconds is going to be to
act that out in such a way that their
teammates can hopefully guess what the
word is. We'll give you 60 seconds at a
time. Kelly has kindly offered to keep
score. Um, and if you solve it in fewer
than 60 seconds, we got another word for
you and another word. And we'll see how
many points you can accrue over the
course of those 60 seconds. And
depending on how this goes, we'll do
maybe one or two rounds in total.
Questions.
>> How many skips do we get?
>> How many skips do you get? I guess you
can skip uh as many as you want until we
run out of questions.
>> Oh. Oh,
>> but try not to run through all of our
questions. All right. Any questions
though beyond that? All right. So, if
you guys want to step off stage over
there, why don't we have team A begin?
So, one of you, Leah, if you're holding
the mic, if you want to be the charader,
let's go ahead and have you stand here
so you can see the screen. And we only
ask that you two not look up because the
answer is going to be right there.
>> All right. And you should just shout out
uh the word that Leah is acting out.
Question.
>> Acting-only charades?
>> Speaking?
>> Yeah. Yeah, you can't speak, because that
would kind of defeat the point. So, yes,
just acting out. Just acting out
physically. All right.
>> I'm going to go over here. Give me just
a moment to get the slides ready with
your questions. And Leah, the first
clue. Oh, and Kelly's going to be timing
you. 60 seconds to accrue as many points
as you can. All right, here we go. Go.
Act that out.
>> Oh, that was weird. Thank you. Sorry.
Yes. Act out. This is CS50. All right.
No. Act this out. Please go.
>> Loop. calling a recursion.
>> Yes. One point
>> coming
>> An array? Linked list?
>> abstraction
>> snake.
>> Python. Python.
>> Yes. Python
>> duck. The duck.
>> Nice.
>> Binary.
Uh
>> one zero
>> binary digit bit
>> byte
>> one zero. It's definitely binary, ASCII.
>> Want to pass?
>> Linked list, array.
>> Yes. Array
>> loop.
>> Yes. Loop
>> time. time. All right. Very nicely done.
All right. Five is the score to beat.
So, if you guys want to step over here,
if uh one of you has the mic, go ahead
and assume the same roles.
Five is the score to beat.
All right. Five is the score to beat.
All right. Here we go. Final round.
First word. And you guys just make sure
you don't look up.
Go. Head
node
>> algorithm
>> input
algorithm
>> these are hard
No.
>> Sure. You have to act it out. Act it
out.
>> Oh, they go.
Run time. Run time. What's that?
>> Tree.
>> Yes. Tree.
>> Next one.
>> Oh my god.
>> Next one.
>> Binary search.
>> Binary boolean. No.
A merge sort? Call. Phone call.
>> function.
>> It was binary search, wasn't it?
>> What was binary
>> phone? Oh, that's time. All right, but a
round of applause for our team awesome
sauce.
>> Okay, we have some parting prizes
for you, your very own Super Mario Pez,
for you guys as well. I'm glad we
squared away the ability to pass
on a question, though, so thank you for
that. All right, so admittedly, pretty
hard. Our thanks to all of these
volunteers for playing that out. Allow
me to turn our attention back here in
just a moment to where else we can go
from here. So up until now
we've been using Visual Studio Code for
CS50 at the URL cs50.dev. Recall that this
is just an adaptation of a commercial
tool called GitHub Codespaces, which is
like a cloud-based version of Visual
Studio Code itself, or VS Code, which is
a largely open-source tool from
Microsoft that's incredibly popular in
the industry. Which is to say, even though
we have the CS50 library in there, and we
turned off by default some of the menu
options, and we disabled AI, it is the
tool that so many programmers around the
world do use every day to write code. So
you have been learning all this time
sort of industry standards in that
sense. It is now time, if you so choose
(though you are welcome to keep using this
for your final project if you're feeling more
comfortable with it), to drop the
"for CS50" and actually install on your own
Mac or PC Visual Studio
Code itself. You can go to this URL
here. It's fairly straightforward to
install it. But invariably you'll
probably run into some technical support
headaches, depending on the language that
you're trying to use with it. For
instance, if you're trying to use it
with Python, you'll probably also have
to download and install Python onto your
computer, at least if you want the latest
version. And just know a priori that
sometimes stuff just happens and it just
doesn't work, and you have to Google or
ask ChatGPT, and that's fine, and
honestly that's kind of normal. But this
is also why we don't do any of this in
week zero of the class, so that we can
focus on hello, world and Mario and Cash
and Credit and get into the interesting
parts of computing and programming, and
not frustrate you so with
technical support challenges. But now,
given that all of you are somewhere in
between or among those more comfortable,
you're now ready to sort of deal
with those same technical challenges
yourself. But who knows, maybe it will go
perfectly smoothly. You can also go to
CS50's own documentation, because if you
want to be able to use all of the same
software that CS50 has pre-installed, you
can use a technology known as
containerization, with a tool called
Docker, and actually run a CS50
environment on your Mac or PC, or even in
the cloud, but still run VS Code on your
own Mac or PC. Among the upsides
are that you're not necessarily dependent
on the cloud. You can do
everything offline, which is useful
in general. You can do things more
quickly sometimes if you're using the
full capabilities of your own computer
and not just a browser. So this is
generally how programmers approach
their code using something like VS Code
or alternative products. And in fact
there's a bunch of others out there but
perhaps the trendiest right now are
these three here. Not just Visual uh
Studio Code itself um but a tool called
Cursor, another one called Windsurf.
There's dozens of other text editors,
often known as integrated development
environments, which tend to have even
more features that you can download for
free or commercially on your own Macs,
PCs, and the like. Uh, but you can't go
wrong transitioning from CS50 to VS Code
on your own Mac or PC, if only because
you're already familiar with it. As for
the command line, those of you with
Macs might have found somewhere in your
Utilities folder a program called
Terminal. If not, poke around there
later today and you'll see that all this
time you've had a command-line interface
available to you on macOS. Windows has
something similar as well. They don't
necessarily come with all of the same
tools that we've been using within
cs50.dev, but if you're a Mac user and
you go to this URL here, or you're a
Windows user and you go to this URL
here (if you're a Linux user, you
probably know all of this already, so
there's no URL for you), you can
install some of those same tools on your
Mac or PC and feel all the more at home
doing things in a command line as
well. Git: this is something that we
actually abstract away in CS50.
It is essentially the de facto
standard nowadays for collaborating with
other people, using a central cloud
server to share your code with
them, and for versioning your code so
that you keep track of multiple versions
thereof and the changes you've made. Go
to this URL here if you would like, and
you'll see a tutorial by CS50's own
Brian Yu introducing you to actual Git,
because we've been abstracting away this
particular tool by doing it all
automatically for you. If you've ever
gone through your timeline in cs50.dev
to roll back to previous versions of
your code, we're just using Git, but
we're automatically running this command
for you. If you want to collaborate with
partners on your final project, you can
use Git.
However, I would encourage you to
alternatively use Visual Studio Code's
Live Share feature, which allows one of
you to log into your codespace, click
some buttons, and then share access to
your codespace with the friend or
partner with whom you're working
on the project, and you can both, in real
time, like Google Docs, edit the code or
different files therein using that
one codespace. That's a little easier
than getting onboarded with Git, at least.
Hosting a website: if this proves of
interest for your final project, or even
after the course, and it's a static
website, two popular places to go, if
only because they offer free tiers, are
GitHub Pages, which you can use to
host just HTML, CSS, and JavaScript
with no Python, no Flask, no backend,
or Netlify, a popular company nowadays
too that has an entry-level account
for which you can sign up for free.
If you just want to have, like, a
portfolio website, if you're an artist
or a programmer and just want to have
static content that you write once and
deploy, these are good starting points,
but not the only ones. Hosting a web app.
So this list gets even
longer. And all of these recommendations
are essentially curated by the
teaching staff, so they're all
opinionated, but these are perhaps the
most common places you can go.
Amazon, Microsoft, Google, Cloudflare:
they all have student-type accounts. So
if you use your .edu email address, for
instance, or some other form of proving
your status as a current student, you
can generally sign up for discounts and
free access to a lot of these same
services without having to pay
while you're just learning along the
way. GitHub has something similar called
the Student Developer Pack. And then a
couple of other companies for hosting
web apps that have been popular are
Heroku, Vercel, and bunches of others.
By web app we mean not just HTML, CSS,
and JavaScript but maybe some Python,
maybe some JavaScript on the server,
maybe Ruby, yet another language, or any
number of others: when you actually need
a backend in addition to the frontend,
and maybe a database as well, this
would be the place to start, whether it's
at the CS50 Hackathon or beyond. And
nowadays, this is a slide that didn't
even need to exist a couple of years ago:
asking AI. Again, for your final projects
you are welcome and encouraged to
amplify your own productivity with AI,
not by having it do the work for you, but
by moving away from the duck, which by
design has been fairly limited and meant
to be a good teacher but not necessarily
one that's going to be a good partner
when it comes to building your final project.
So ChatGPT, Claude, Gemini, GitHub Copilot,
OpenAI Codex, and v0 are all
popular tools right now that you might
want to play around with. The easiest of
these to use, perhaps, if you're not
already familiar with, say, ChatGPT, would be
GitHub Copilot, only because you can
enable it within your CS50 codespace by
following our own documentation at
cs50.readthedocs.io, where we'll tell
you the sequence of steps via which you
can re-enable AI, now that you're allowed
to for your final project, and turn on
all of those features that were disabled
by default. And then there are still
humans out there, though it remains to be
seen just how popular these websites will be
in the years to come, for better or for
worse. Among the places that
programmers and technophiles have gone
for years are Reddit, Stack Overflow, and
Server Fault, where there's a rich
history of questions and answers that,
ironically, all of those AIs have been
trained on, which unfortunately means
some of these might eventually be driven
out of business, in some sense, if
we're all just turning only to AI. But
when you actually want that human
component, these are still good places
to go. And then news: two of the
many places you can go for news in
technology, computing, and computer science
more broadly are TechCrunch, which is
still a good one, and Hacker News, and
you might have some of
your own popular choices as well. And
then, with some bias: take other
classes like CS50. Besides this
undergraduate class, CS50 has a rich history
now, over the past decade, of creating all
the more open courseware: courses in
more Python, more SQL, a language called
R, cybersecurity, game development,
and more. All of those are linked at
this URL here, edx.org/cs50,
where you need not pay or sign up beyond
auditing the course, and all of the
content is freely available. It's something
for winter break, for instance, if you
want to dive a little more deeply into
some subject for the sake of your final
project, your professional aspirations,
or even just to prepare for spring term.
And then over the coming weeks, too,
CS50 itself will be soliciting
applications for becoming a teaching
fellow (TF) or a course assistant (CA).
If you would like to get all the more
involved as a teacher of CS50 next fall,
do follow the application link
that we will soon circulate via
email. And do stay in touch, too, if
you just enjoy answering other people's
questions or seeing what the pulse of
computing is. At this URL here
is a whole bunch of CS50's own
communities, largely in social media,
via which you can follow along at home
in the months and years to come. So,
a few thanks before we play one final game
altogether, to all of the people
who have been making this course
possible: our friends at Memorial
Hall, who bring us into this
beautiful space and make it possible for
us to have, of all things, a class in such
a space; our friends at ESS, who help
with the audio each and every week in
CS50; the restaurant Changa down the
road, where we hope you'll continue to visit
our friends (Wesley Chen is a good
friend of ours and the manager, so please
tell him you're from CS50, and I'm sure
he'll be delighted to see you); and
then CS50's own team, most of whom are
in the back there or sitting next to you
with cameras, without whom the course
wouldn't be possible. And of course
CS50's own teaching fellows and CAs,
just a few of whom posed here for this
photo. If I could invite you all to give
everyone here a round of applause, my
thanks to all of them.
And then, of course, the CS50 duck
should be thanked as well. Okay.
Thanks to CS50's own Rongxin Liu and
some of our own former teaching fellows
and students who have been behind the
development of that duck that
you've gotten to know over these past
several months. All right, if Kelly
could join me again on screen, the only
thing between us and cake is a final
game, namely a quiz show in which all of
you can partake. Here we go. Question
one. What is the largest number an 8-bit
unsigned binary number can represent?
256, 128, 255, or one?
Starting strong, and keep in mind all of
these questions came from you all,
because we asked you recently for the review
questions that are now on the screen.
Again, the timer is ticking, and the most
popular answer was 255,
which I think, if we click once more,
we'll confirm was in fact the correct
answer. So why is that, and why is it not
256? Well, if we start counting from zero,
as we always have, that's consuming one
of the 256 possibilities. So the largest
number that we can represent with 8 bits,
unsigned, which means no
negative numbers involved, is indeed
going to be 255.
Treasure that information now, always.
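The arithmetic behind that answer can be checked in a few lines. This is a minimal Python sketch (Python is chosen here only for illustration, and the helper name `max_unsigned` is hypothetical, not something from the course): eight bits give 2^8 = 256 patterns, and because one pattern is spent on zero, the largest value is 2^8 - 1 = 255.

```python
def max_unsigned(bits: int) -> int:
    """Largest value an unsigned integer of `bits` bits can hold.

    `bits` bits give 2**bits distinct patterns. One of them represents 0,
    so the largest value is 2**bits - 1.
    """
    return 2**bits - 1


patterns = 2**8                # 256 distinct 8-bit patterns: 0 through 255
largest = max_unsigned(8)      # 255
all_ones = int("11111111", 2)  # every one of the eight bits set to 1

print(patterns, largest, all_ones)  # 256 255 255
```

The same formula gives 4,294,967,295 for 32 unsigned bits.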
All right, next question from Kelly.
Which issue is at the center of the year
2038 problem, which hopefully you added
to your Google calendars a few weeks
back? Integer overflow, malicious
inputs, SQL injection attacks, or memory
leak?
Which of those is at the core of the
year 2038 problem?
All right, let's go ahead and reveal the
number one answer, with 92% of you saying
integer overflow, which is in fact correct,
because we're still in the habit of
using signed 32-bit integers to keep track of
time in seconds since the so-called epoch,
which was January 1st, 1970. And
unfortunately, we humans aren't great at
planning ahead. And so we're going to run out of
permutations of 32 bits on January 19,
2038, unless everyone
upgrades their computers to 64-bit
counters, which thankfully most every
piece of modern hardware nowadays is
using already: your Macs, your PCs, and
your phones. So hopefully this will
really be a non-event, but hopefully you'll
think of us in CS50 in 10-plus
years when your Google calendar
reminder goes off. Question three: which
of the following is not a step of
compiling? Linking, pre-processing,
assembling, or interpreting?
A bit more of a challenge. Which of these
is not a step of compiling?
All right, almost 200 responses coming
in.
All right, why don't we go ahead and
reveal the most popular answer, with 54%
of you saying interpreting, which is in fact
correct. Recall that we talked about
compiling. Compiling itself is just one
of several steps. There is in fact the
pre-processing step, which takes care of
any of the lines in C that start with a
hash symbol, like #include, #define, and the
like. That's pre-processing. There
was then compiling, which actually converted your
code into assembly code. There was then
the assembler, which would take
it down further to machine code. And then
linking,
which 29% of you chose. The linking
step, recall, was taking your zeros and
ones and combining them with, say, the CS50
library's zeros and ones and maybe the
standard I/O library's zeros and ones,
linking them all together to give you
one executable program, like hello
itself. All right, next question. What
does a pointer store? The name of a
variable, the memory address of a
value, the size of a value, or the value
of a variable?
Think for a moment.
What does a pointer store?
All right, about 200 responses in, and
yes, the memory address of a variable,
with 96% of you confirming as much. That
is correct. Question five.
What is the running time of linear
search? Big O of 1, big O of N, big O of
N squared, or big O of N log N? Linear
search running time.
And recall that with something like
search, you could get lucky. But since big
O is the upper bound on our running
time, you might not get lucky: you might hit the
end of the list that you're searching.
And so the running time of linear search
is of course big O of N. It might be
omega of one, but not big O of one, at
least if we're considering what the
worst-case scenarios might be. All
right, on to question six. What
data structure follows the first-in,
first-out principle? A queue, a linked list, a
stack, or a hash table? First in, first
out, aka FIFO.
Which of these is FIFO?
All right. First in, first out is in
fact a queue, as you would hope if you're
getting in line for a restaurant or a
store. You'd hope that if you're the
first one in line, you're going to be
the first one out, equitably speaking.
And so it is in fact a queue. The
opposite of that, in some sense, then
would have been a stack, whereby, when you
think about the cafeteria trays, the
first one in is actually the
last one out. So LIFO instead for a
stack. All right, question seven. Which
operator returns the memory address of a
variable? An asterisk, a dollar sign, an
ampersand, or a hyphen and a greater-than
sign?
Presumably in C,
which returns the memory address of a
variable?
All right, let's see what everyone
thinks.
So the most popular and correct answer
is the ampersand. This is the address-of
operator. The asterisk, recall, in most
contexts is the opposite of that. That's
the dereference operator, which means
go to an address. The dollar sign is not
an operator in C. The hyphen and greater-than
sign, though, is similar in
spirit to a combination of the star
operator and the dot operator, which
means to dereference a pointer and follow it
to something inside of a struct,
typically. All right, question eight.
Which SQL command is used to remove
duplicate rows from a result set?
Remove, unique, distinct, or clean?
We didn't spend a huge amount of time on
these keywords,
but only one of them applies here. A
result set is just the answers that you
get back when doing your SELECT. And if
you want to filter out duplicates, you
can in fact say DISTINCT.
Distinct is correct. Unique is also a
keyword in SQL, but that is for when you
want to define in your schema that a
column's values are going to be unique,
like an email address column instead.
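To see the difference between the two keywords, here is a minimal sketch using Python's built-in `sqlite3` module with an in-memory database. The `students` table, its columns, and the sample rows are hypothetical; the point is that DISTINCT removes duplicate rows from a SELECT's result set, while UNIQUE is a schema constraint that rejects a duplicate value at insert time.

```python
import sqlite3

# In-memory database; the table, columns, and rows are hypothetical.
db = sqlite3.connect(":memory:")

# UNIQUE is a schema constraint: no two rows may share an email address.
db.execute("CREATE TABLE students (email TEXT UNIQUE, house TEXT)")
db.executemany(
    "INSERT INTO students (email, house) VALUES (?, ?)",
    [
        ("alice@example.edu", "Adams"),
        ("bob@example.edu", "Adams"),
        ("carol@example.edu", "Eliot"),
    ],
)

# Without DISTINCT, the result set contains one row per student.
all_houses = [row[0] for row in db.execute("SELECT house FROM students ORDER BY house")]

# DISTINCT filters duplicate rows out of the result set.
distinct_houses = [
    row[0] for row in db.execute("SELECT DISTINCT house FROM students ORDER BY house")
]

# A second row with an existing email violates the UNIQUE constraint.
try:
    db.execute(
        "INSERT INTO students (email, house) VALUES (?, ?)",
        ("alice@example.edu", "Eliot"),
    )
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(all_houses)       # ['Adams', 'Adams', 'Eliot']
print(distinct_houses)  # ['Adams', 'Eliot']
print(rejected)         # True
db.close()
```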
Distinct is how you filter out
duplicates in your selects. All right,
question nine. We're past the halfway
mark. What does an HTTP status code of 418
signify? Not found, I'm a teapot,
forbidden, or unauthorized?
418.
If you know this one, moving
forward, you'll be considered among the
CS
elite.
Answers are coming in a little slower,
but I'm a teapot is correct, which is
not actually a serious or useful
technology. It was in fact an April
Fools' joke years ago, where a bunch of
computer scientists got together in a
room and wrote out an entire
specification (RFC 2324) for what it means
for a server to return 418, I'm a teapot. All
right, number 10. Where does malloc
dynamically allocate memory from? The
heap, the stack, global variables, or
assembly?
All right,
the heap is in fact correct. That's the sort
of top part of the memory, even though
top and bottom make no actual technical
sense; it's just our artist's rendition
thereof. The stack, recall, is what is
used when functions are being called.
Every time a function is called, it gets
a so-called frame on the stack. That's
where your local variables and your
arguments get put. But if in C you use
malloc, the memory does in fact come from the
heap. Question 11, in C: if you allocate memory with
malloc but forget to call free, what
problem can occur? A memory leak, a
segmentation fault, a stack overflow, or
all of the above?
If you allocate memory with malloc but
forget to call free, what problem can
occur?
All right, the most popular answer is in
fact memory leak, which is correct.
You could imagine scenarios in which you
also get a segmentation fault and/or a
stack overflow, but those aren't direct
consequences of not calling free. Those are
generally the consequence of using too
much memory, for instance, or
doing something else wrong with your
memory. So interrelated, yes, but in
terms of not calling free for each
malloc, a leak is what's going to happen by
definition. All right, well done there.
Next question, which is 12.
Whose web page does this domain name
give you? safetyschool.org. Is it Harvard
University? Is it Princeton University?
Is it Yale University? Or Columbia
University?
All right. Recall that this was in the
context of our HTTP redirections.
Yes. Interesting. Yes. In fact, it's Yale
University. Some alum has been paying
like $10 a year for like 20 years for
this joke. If you visit safetyschool.org,
it returns an HTTP 301 response with a
Location header which says the location is in fact
yale.edu.
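A 301 redirect like this one can be reproduced locally. The following is a minimal Python sketch using only the standard library (`http.server` and `http.client`): a throwaway server on 127.0.0.1 answers every GET with status 301 and a Location header, and because `http.client` does not follow redirects, the client sees the raw response that a browser would act on. The target URL `https://yale.edu/` is an assumption standing in for whatever the real domain returns; the sketch never contacts safetyschool.org.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Assumed redirect target, for illustration only.
TARGET = "https://yale.edu/"


class RedirectHandler(BaseHTTPRequestHandler):
    """Answers every GET with 301 Moved Permanently and a Location header."""

    def do_GET(self):
        self.send_response(301)
        self.send_header("Location", TARGET)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format, *args):
        pass  # silence the default per-request logging to stderr


# Port 0 lets the operating system pick any free port.
server = HTTPServer(("127.0.0.1", 0), RedirectHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# http.client never follows redirects, so we observe the 301 itself.
conn = http.client.HTTPConnection("127.0.0.1", server.server_port, timeout=5)
conn.request("GET", "/")
response = conn.getresponse()
status = response.status
location = response.getheader("Location")
response.read()
conn.close()

server.shutdown()
server.server_close()

print(status, location)  # 301 https://yale.edu/
```

A browser receiving this response would immediately request the URL in the Location header, which is why visiting the domain appears to land on the other site.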
All right, 13, three to go. What is the
purpose of DNS? To encrypt data sent
over the dark web, to find the nearest
coffee shop for you, to protect your
location against hackers, or to translate
domain names into IP addresses?
What is the purpose of DNS? If helpful,
Domain Name System.
All right, we're at about the 200 mark, and the
correct answer is indeed translating domain names
into IP addresses. A DNS server is one that
is on your home network, on your ISP's
network, on your campus's network, or your
corporate network, and that just answers
questions like that for you. All right,
second-to-last question. Which of the
following is not a built-in SQL feature
to tackle race conditions? BEGIN
TRANSACTION, COMMIT, ROLLBACK, or
ENROLL?
We talked ever so briefly about this in
the context of ending up with too much
milk, recall,
and the correct answer is
indeed ENROLL. The other three, even
though you didn't have to use them for
problem set seven or nine, are indeed
features of SQL, but ENROLL is not
a thing. All right. And the very last
question, and try to answer this as
quickly as you can: what does Professor
Malan say at the beginning of every CS50
lecture? "Welcome to Harvard's computer
science class," "Hello everyone, ready to
code?", "All right, this is CS50,"
or "Let's get started with some
programming"?
All of these questions were in fact
written by you all.
All right. And the correct answer, with,
I'm pretty sure, 98% of you saying so,
is "All right, this is CS50." And all
right, this was CS50. Cake is now
served.
Full transcript without timestamps
If you want to learn about computer science and the art of programming, this course is where to start. CS50 is considered by many to be one of the best computer science courses in the world. This is a Harvard University course taught by Dr. David Men and we are proud to bring it to the free code camp channel. Throughout a series of lectures, Dr. Men will teach you how to think algorithmically and solve problems efficiently. And make sure to check the description for a lot of extra resources that go along with the course. All right. This is This is CS50, Harvard University's introduction to the intellectual enterprises of computer science and the arts of programming. My name is David Men and this is week zero. And by the end of today, you'll know not only what these light bulbs here spell, but so much more. But why don't we start first with the uh the elephant or the elephant in the room. That is artificial intelligence, which is seemingly everywhere over the past few years. And it's been said that it's going to change programming. And that's absolutely the case. It's been that way actually for the past several years is only going to get to be the case all the more. But this is an incredibly exciting time. This is actually a good thing I do think in so far as now using AI in any number of forms. You can ask the computer to help solve some problem for you. You can find some bug or mistake in your code. Better still increasingly you can tell the AI what additional features you want to add to your software. And this is huge because even in industry for years, humans have been programming in some form for decades, building products and solutions to problems, the reality is that you and I as humans have long been the bottleneck. There's only so many hours in the day. There's only so many people on your team or in your company and there's so many more bugs that you want to solve and so many more features that you want to implement. 
But at the same time, you still really need to understand the fundamentals. And indeed, a class like this CS50 has never been about teaching you how to program. Like that's actually one of the side effects of taking a class like this. But the overarching goal is to teach you how to think, how to take input and produce correct output and how to master these and other tools. And so by the end of the semester, not only you will be not only will you be acquainted with languages like Scratch, which we'll touch on today if you've not seen it already, languages like C and Python and SQL, HTML, CSS, and JavaScript. You'll be able to teach yourself new things ultimately, and ultimately be able to tell computers increasingly what it is you want it to do. But you'll still be in the driver's seat, so to speak. You'll be the pilot. You'll be the conductor. Whatever your preferred metaphor is. And that's what I think is so empowering still about learning introductory material, foundational material, because you'll know what you're ultimately talking about and what you can in fact solve. And we've been through this before, like when calculators came out. It's still valuable, I dare say, all these years later to still know how to do addition and subtraction and whatnot. And yet, I think back on some of my own math classes. I remember learning so many darn ways in college how to take derivatives and integrals. And after like the six process of that, I sort of realized, okay, I get it. I get the idea. Do I really need to know this many ways? And here too, with AI and with code, can you increasingly sort of master the ideas and then lean on a a co-pilot assistant to actually help you solve those same problems. So, let's do some of this ourselves here. In fact, just to give you a teaser of what you'll be able to do yourselves before long, let me go ahead and open up a little something called Visual Studio Code, aka VS Code for short. 
This is popular largely open- source or free software that's used by real world people in industry to write code. And it's essentially a text editor similar to Notepad if you're familiar with that or text edit kind of like Google Docs but no boldf facing and underlining and and things like that that you'd find in word processing programs. And this is CS50's version thereof. We're going to introduce you to this all the more next week. But for now, let's just give you a taste of what you can do with an environment like this. So I'm going to switch over to this program already running VS Code. And in this uh bottom of the screen, you're going to see a so-called terminal window. Again, more on that next week. But it's in this terminal window that I can write commands that tells the computer what I want it to do. For instance, let's suppose just for the sake of discussion that I want to make my own chatbot, not chat GPT or Gemini and Claude, like let's make our own in some sense. So, I'm going to code up a program called chat.py. And you might be familiar that I using a language here.py is it's just called Python. And if unfamiliar, you're in good company. You'll learn that too within a few weeks. And at the top of the file here, I can write my code. And at the bottom of the file of the window here, I can run my code. So, here's how relatively easy it is nowadays to write even your own chatbot using the AI technologies that we already have. I'm going to go ahead and type a command like import uh uh I'm going to go ahead and type the following from OpenAI. import open AI. We'll learn what this means ultimately, but what I'm going to do is write my own program on top of an API, application programming interface that someone else provides, a big company called OpenAI, and they're providing features and functionality that now I can write code against. I'm going to create a so-called client, which is to say a program of my own that's going to use this OpenAI software. 
And then I'm going to go ahead and ask this software for a response. And I'm going to set that equal to client.responses.create whatever all that means. And then inside of these parenthesis I'm going to say the following. The input I want to give to this underlying API is quote unquote something like in one sentence what is CS50? Much like I would ask chatpt itself. If you're familiar with things like chat GPT and AI more generally nowadays, you know there's this thing called models which are like statistical models that ultimately drive what the AIs can do. I'm going to go ahead and say model equals quote unquote gpt5 which is the latest and greatest version at least as of today. Now down in my terminal window I'm going to run a different command python of chat.py and so long as I have made no typographical errors in this program I should be able to ask openai not with chatgpt.com but with my own code for the answer to some question. But I want to know what the answer to that question is. So, I actually want to print out that response by saying print response output text. In other words, these 10 lines, and it's not even 10 lines because a few of them are blank, I've implemented my own chatbot that at the moment is hard-coded that is permanently configured to only answer one question for me. And let's see, with the cross of the fingers, CS50 is Harvard University's introductory computer science course, the intellectual enterprises of computer science and the art of programming. weirdly familiar covering problems solving algorithms, data structures, and more using languages like C, Python, and SQL. Okay, interesting. But let's make the program itself more dynamic. Suppose you wanted to write code that actually asks the human what their question is because very quickly might we want to learn something more than just this one question. So up here, I'm going to go and change my code and type something like this. Type prompt equals input with parenthesis. 
More on this another time, too. But what I'm going to ask the user for is to give me an actual prompt. That is a question that I want this AI to answer. And down here, what you'll notice, even if you've never programmed before, is that I can do something somewhat intuitive in so far as line five is now asking the human for input. Let's just stipulate that this equal sign means store that answer in a variable called prompt where variables just like in math x, y, or z. Let's go ahead and store that in prompt. So the input I want to give to open ai now is that actual prompt. So, it's a placeholder containing whatever keystrokes the human typed in. If I now run that same command again, python of chat.py, hit enter, cross my fingers, I'll see now dynamic prompting. So, what's a question I might want to ask? Well, let's just say it again. In one sentence, whoops, in one sentence, what is CS50? Question mark. Enter. And now the answer comes back as probably roughly the same but a little bit different a variant thereof. But maybe we can distill this even more succinctly. How about let's run it again. Python of chat.py and let's say in one word what is CS50 and see if the underlying AI obliges. And after a pause course in a word. So that's not all that incorrect. And maybe we can have a little fun with this. 
Now how about in one word which is which is better maybe Harvard or Stanford question mark hope you picked right let's see the answer is depends okay so would not in fact oblige but notice what I keep doing in this code I keep providing a prompt as the human like in one sentence in one word well if you want the AI to behave in a certain A why don't we just tell the underlying system to behave in that way so I the human don't have to keep asking it in one sentence in one sentence in one word so we can actually introduce one other feature that you'll hear discussed in industry nowadays which is not only a prompt from the user which I'm going to now temporarily rename to user prompt just to make clear it's coming from the user I'm going to also give our what's called a system prompt by setting this equal to some standardized instructions that I want the AI to respect like limit your answer to one sentence, quote unquote. And now, in addition to passing in as input the user prompt, I'm going to actually tell Open III to use these instructions coming from this other variable called system prompt. So, in other words, I'm still using the same underlying service, but I'm handing it now not only what the user typed in, but also this standardized text limit your answer to one sentence. So, the human like me doesn't have to do that anymore. Let's now go back to my terminal. run Python of chat.py Pi once more and this time we'll be prompted but now I can just ask what is CS50 question mark and I'll likely get a correct and similar answer to before and indeed it's Harvard University's flagship introductory computer science course dot dot dot so seems spot on too but now we can have some fun with this too and you might know that these GPTs nowadays have sort of personalities you can make them obliged to behave in one way or another why don't we go into our system prompt here and say something silly like pretend You're a cat. And now let's go back to the prompt one final time. 
Run Python of chat.py. Prompt again will be say what is CS50? And with a final flourish of hitting enter, what do we get back? CS50 is Harvard University's introductory computer science course teaching programming algorithms, data structures, and problem solving. And it's available free online. Meow. So that was enough to coersse this particular behavior. So this is to say that with programming, you have the ability in like 10 lines of text, not all of which you might understand yet, but that's the whole point of a class like this to build fairly powerful things, maybe silly things like this, but in fact, it's using these same primitives that CS50 has its own virtual rubber duck. And we'll talk more about this in the weeks to come, but long story short, in the world of programming, it's kind of a thing to keep a rubber duck literally on your desk or really any inanimate cute object like this because when you are struggling with some problem, some bug or mistake in your code and you don't have a friend, a teaching assistant, a parent or someone else who's more knowledgeable than you about code, well, you literally are encouraged in programming circles to like talk to the rubber duck. And it's through that process of just verbalizing your confusion and organizing your thoughts enough to convey it to another person or duck in this case that so often that proverbial light bulb goes off and you realize ah I'm being an idiot now I hear in my own thoughts the ill logic or the mistake I'm making and you solve that problem as well. 
So CS50 drawing inspiration from this will give to you a virtual duck in computer form and in fact among the other URLs you'll use over the course of the semester is that here cs50.ai AI which is also built into that previous URL cs50.dev dev whereby these are the AIS you can use in CS50 to solve problems and you are encouraged to do so as you'll see in the course syllabus it is not reasonable it is not allowed to use AI based software other than CS50's own be it claw Gemini chat GPT or the like but it is reasonable and very much encouraged along the way to turn not only to humans like me your teaching assistant and others in the class but to CS50's own AI based software and what you'll find is that this virtual duck is designed to behave as close to a good human tutor as you might expect from an actual human in the real world knows about CS50 knows how to lead you to a solution ideally without simply spoiling it and providing it outright. So with that said that's sort of the endgame to be able to write code like that and more. But let's really start back at the beginning and see how we can't get from zeros and ones that computers speak all the way back to artificial intelligence. So computer science is the in the name of the course computer science 50. But what is that? Well, it's really just the study of information. How do you represent it? How do you process it? And very much gerine to computer science is what the world calls computational thinking, which is just the application of ideas from computer science or CS to problems generally in the real world. And in fact, that's ultimately, I dare say, what computer science really is. It's about problem solving. And even though we use computers, you learn how to program along the way, these are really just tools and methodologies that you can leverage to solve problems. Now, what does that mean? Well, a problem is perhaps most easily distilled into a simple picture like this. 
We've got some input, which is like the problem we want to solve, and the output, which is the goal we want, the solution thereto. And then somewhere in the middle here is the proverbial black box, the sort of secret sauce that gets us from that input to that output. So this, I would say, is in essence problem solving, and thus computer science. But we have to agree, especially if we're going to use devices, Macs, PCs, phones, whatever: how do we all represent information, the inputs and the outputs, in some standardized way? Is it with English? Is it with something else? Well, you all probably know, even if you're not computer people, that at the end of the day, computers somehow use zeros and ones entirely. That is their entire alphabet. And in fact, you might be familiar already with certain such systems. Consider unary notation, which means you essentially use single digits, like fingers on your hand. Unary, aka base 1, is something you can do on your own human hand. So for instance, with one human hand, how high can I count? >> All right, so hopefully 1, 2, 3, 4, 5. And if you want to count to six and up to 10 and so forth, you need to, you know, take out another hand or your toes or the like, because it's fairly limiting. But if I think a little harder, instead of just using unary, what if I use a different system instead? What about something like binary? Well, how high, if you think a little harder, can you count on one human hand? So, 31, says someone who's studied computer science before. But why is that? It's kind of hard to imagine, right? Because 1, 2, 3, 4, 5 seem to be the five possible patterns. But that's only when you're looking at the totality of fingers that are actually up: five in total, or four in total, or one, or the like. But what if we take into account the pattern of fingers that are up, and we just standardize what each of those fingers represents?
So maybe we all agree, like a good computer would too, that no fingers up means the number zero. And if we want to count to one, let's go with the obvious. This is now one. But instead of two being this, which was my first instinct, maybe two can just be this: a single second finger up like this. And that means we could now use two fingers up to represent three. I'll propose we can use just one middle finger up, to offend everyone, but represent four. I could maybe use these two fingers, with some difficulty, to represent five, and so on to six and seven. I'm already up to seven having used only three fingers. And in fact, if we keep going higher and higher, I bet I can get as high as 31, for 32 possible combinations, but the first one was zero. So that's as high as we can count. We'll make this connection in just a moment, but what I started to do there is something called base 2. Instead of just having fingers up or fingers down, I'm taking into account the positions of those fingers and giving meaning to this finger here, this finger here, this finger here, and so forth: different weights, if you will. So the binary system is indeed all computers understand. And you might be familiar with some terminology here. "Binary digit" is not really something anyone says, but the shorthand for that is going to be "bit." So if you've heard of bits, and we'll soon see bytes and then kilobytes and megabytes and gigabytes and terabytes and more, a bit just means a single binary digit, either a zero or a one. A zero is perhaps most simply represented by just keeping a finger down. Or, in the world of computers, which have access to electricity, be it from the wall or maybe a battery, you know what we could do? We could just decide, sort of universally, that when a light bulb is off, that thing represents a zero. And when the light bulb is on, that thing's going to represent a one instead. Now, why is this? Well, electricity is such a simple thing, right?
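To make the finger-counting idea concrete, here is a minimal Python sketch. It assumes five fingers with the rightmost finger as the ones place, like a written binary number; `WEIGHTS` and `fingers_to_number` are illustrative names, not anything from the course.

```python
# Positional weights for five fingers, leftmost finger first (an assumed layout:
# the rightmost finger is the ones place, as in a written binary number).
WEIGHTS = [16, 8, 4, 2, 1]


def fingers_to_number(fingers_up):
    """Sum the weights of the fingers that are up (a list of five booleans)."""
    return sum(weight for weight, up in zip(WEIGHTS, fingers_up) if up)


print(fingers_to_number([False, False, False, False, False]))  # 0
print(fingers_to_number([False, False, False, True, True]))    # 3
print(fingers_to_number([False, False, True, False, False]))   # 4 (middle finger alone)
print(fingers_to_number([True, True, True, True, True]))       # 31
print(2 ** len(WEIGHTS))                                       # 32 patterns in total
```

Because zero uses up one of the 32 patterns, the highest count is 31.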
It's either flowing or it's not. And so we don't even have to worry about how much of it is flowing. And if you vaguely remember a little bit about voltage, it might be zero volts, nothing's there available for us, or maybe it's 5 volts, or something else in between. But what's nice about binary only using zeros and ones is that it maps really nicely to the real world, like throwing a light switch on and off. You can represent information by just using a little bit of electricity or the lack thereof. So what do I mean by this? Well, suppose we want to start counting using binary zeros and ones only. Let's think of them metaphorically as akin to these light bulbs here. And in fact, let me grab a few of these light bulbs and propose that if we want to represent the number zero, well, it stands to reason that a single light bulb that is off can be agreed upon as representing zero. Now, in practice, computers don't have little light bulbs inside, but they do have little switches inside: millions of tiny little things called transistors that, if turned on, can allow the computer to capture a little bit of electricity and effectively turn on a metaphorical bulb. Or the switch, the transistor, can go off and therefore let the electricity dissipate, and you have just now a zero. There's the battery I mentioned; it's required. So even though we might have some electricity available to us, I can therefore count only to one. But how do I go about counting... hardware problem... how do I go about counting higher than one with just one light bulb? Yeah. So, I need more of them. So let me grab another one here, and now I could put it next to the first. And this, too, I'll claim, is still just the number one. But if I turn two of them on, well, that would mean I could count to two. And if I maybe grab another one, now I can count as high as three. But wait a minute.
I'm doing something wrong, because with three human fingers, how high was I able to count? So, seven in total, starting at zero. So I've done something wrong here. But let me be a little more clever about the pattern that I'm actually using. Perhaps this can still be one. But just like only my second finger went up in the second version of this, this can be what we represent as two. Which one do I want to turn on as three? Your left or your right? >> So, your right, because now this matches what I was doing with my fingers a moment ago, and I claimed we could represent three like this. If we want to represent four, that's fine. We have to turn that off, this off, and this on. And that's somehow four. And let's go all the way up to seven. Which ones need to be on to represent the number seven? All right. So, all of them here. Now, if you're not among those who just sort of naturally said "all of them," you might be thinking, what the heck is going on? How do half the people in this room know what these patterns are supposed to be? Well, maybe you're remembering what I did with my fingers. But it turns out you're already pretty familiar with systems like this, even if you might not have put a name to it. So in the human world, the real world, most of us deal every day with the so-called base-10 system, otherwise known as decimal, "dec" implying 10, because in the decimal system you have 10 digits available to you, 0 through 9. In the binary system, we only had two, "bi" implying two: 0 and 1. And in unary we had just one, a single digit, there or not. So in the decimal system, we just have more of a vocabulary to play with. And yet you and I have been doing this since grade school. So this is obviously the number 123. But why? It's technically just three symbols: 1, 2, 3. But for most of us, your mind's eye goes, okay, 123. Pretty obvious, pretty natural.
But at some point, you, like me, were probably taught that this is the ones place and this is the tens place and this is the hundreds place and so forth. And the reason that this pattern of symbols, 1 2 3, is 123 is that we're all doing some quick mental math and realizing, well, that's 100 * 1 + 10 * 2 + 1 * 3. Oh, okay. That's how we get 100 + 20 + 3, which gives us the number we all know mathematically is 123. Well, it turns out whether you're using decimal or binary or other base systems that we'll talk about later in the course, the system is still fundamentally the same. Let's generalize this away. Here's a three-digit number in some base system, specifically in decimal. And I know that only because of the placeholders that I've got on top of each of these digits. But if we do a little bit of math here, 1, 10, 100, 1,000, 10,000, and so forth, what's the pattern? Well, technically this is 10^0, 10^1, 10^2, and so forth. And we're using 10 because we can use as many as 10 digits under each of those columns. But let's take some of those digits away and go from decimal down to binary, the motivation being that it's way easier for a computer to distinguish electricity being on or off than to come up with like 10 unique levels of electricity to distinguish among. You could do it, but it would be annoying and difficult to build in hardware. It's so much simpler to just say on and off. It's a nice, simple world that way. So let's change the base from 10 to 2. And what does this get us? Well, if we now redo the math, 2^0 is 1, 2^1 is 2, and 2^2 is 4. So the mental math is about to be the same, but the columns represent something a little bit different. So for instance, if I turn all of these off again, such that I've got off, off, off, otherwise known as 000, it's zero, because 4 * 0 + 2 * 0 + 1 * 0 still gives me zero.
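The "multiply each digit by its column's weight and add" rule works the same in any base. Here is a minimal Python sketch of it. `digits_to_value` is a hypothetical helper written for illustration. Python's built-in `int(text, base)` does the same job for strings.

```python
def digits_to_value(digits, base):
    """Interpret a list of digits (most significant first) in the given base."""
    total = 0
    # The rightmost digit gets base**0, the next one base**1, and so on.
    for position, digit in enumerate(reversed(digits)):
        if not 0 <= digit < base:
            raise ValueError(f"digit {digit} is not valid in base {base}")
        total += digit * base ** position
    return total


print(digits_to_value([1, 2, 3], 10))  # 123 = 100*1 + 10*2 + 1*3
print(digits_to_value([0, 0, 0], 2))   # 0   = 4*0 + 2*0 + 1*0
print(digits_to_value([1, 1, 1], 2))   # 7   = 4*1 + 2*1 + 1*1
print(int("111", 2))                   # 7, via Python's built-in parser
```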
By contrast, if I turn on maybe just this one all the way over on the left, well, that's 4 * 1, because on represents 1 and off represents 0, plus 2 * 0 + 1 * 0. That gives me four. And if I turn both of these others on, such that all three of them are now on, on, on, aka 1, 1, 1, that's 4 * 1 + 2 * 1 + 1 * 1. That then gives me seven. And we can keep adding more and more bits to this. In fact, if we go all the way up numerically, here's how we would represent in binary the number you and I know as zero. Here's how we would represent one. Here's how we would represent two, and three, and four, and five. And you can kind of see in your mind's eye now, because I only have zeros and ones, and no twos or threes, not to mention nines, that I'm essentially going to be carrying a one if we were to be doing some math. So to go from five to six, that's why the one ends up in the middle column. Going to seven here gives us now 111, or on, on, on. How do I represent eight using ones and zeros? Yeah? >> We need to add another digit. >> Yeah. So we're going to need to add another digit. We need to throw hardware at the problem, using an additional digit so that we actually have a column representing eight. Now, as an aside, and we'll talk about this before long, if you don't have an additional digit available, if your computer doesn't have enough memory, so to speak, you might count 0, 1, 2, 3, 4, 5, 6, 7 and then accidentally end up back at zero, because if there's no room to store the fourth bit, well, all you have is part of the number. And this is going to create all sorts of problems, ultimately, in the real world. So let me go ahead and put these back and propose that we now have a system, if you agree to count numbers in this way, via which we can represent information in some standard way, and all the device underneath the hood needs is a bit of electricity to make this work.
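The "accidentally end up back at zero" problem can be simulated by keeping only the lowest three bits after each addition. The sketch below assumes a hypothetical 3-bit counter, and `increment` is an illustrative name. The extra bit that would carry into the eights column is simply discarded.

```python
def increment(value, bits):
    """Add 1 to value, keeping only the lowest `bits` bits (the carry is dropped)."""
    return (value + 1) % (2 ** bits)


value = 0
for _ in range(9):
    # Show each count alongside its 3-bit pattern: 0 = 000 up to 7 = 111, then 0 again.
    print(f"{value} = {value:03b}")
    value = increment(value, 3)
```

With a fourth bit available (`increment(7, 4)`), the same addition correctly yields 8.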
It's got to be able to turn things on, aka use some transistors, and it's got to be able to turn those things off so as to represent zeros instead of ones. But the reality is that two bits, three bits, four bits aren't very useful in the real world, because even with three bits you can count to seven, and with four you can count to 15. These aren't very big numbers. So it tends to be more common to actually use units of measure of eight bits at a time. A byte is just that: one byte is eight bits. So if you've ever used the vernacular of kilobytes, megabytes, gigabytes, that's just referring to some number of bits, but eight of them together compose one individual byte. So here, for instance, is a byte's worth of bits, eight of them total. I've added all the additional placeholders. And what number does this represent in decimal, even though you're looking at eight binary digits? >> Just zero, 'cause literally every column is a zero. Now, this is a bit more mental math, unless you know it already: what if I change all of the zeros to ones? I turn all eight light bulbs on. What number is this? >> Yeah. So, 255. Now, those of you who didn't get that instantly, that's fine. You could certainly do the math manually. I dare say some of you have some prior knowledge of this sort of system. But 255 means that if you start counting at zero and you go all the way up to 255, okay, that's 256 total possibilities once you include zero in the total number of patterns of zeros and ones. And this is just going to be one of these common numbers in computer science: 256. Why? Because it's referring to eight of something: 2 to the 8th gives you 256. And so you're going to commonly see certain values like that, 256. Back in the day, computers could only show 256 colors on the screen.
Certain graphics formats nowadays that you might download can only use as many as 256 colors because, as we'll see, they're only using, for instance, eight bits, and therefore they can only represent so many colors of the rainbow as a result. So this, then, is how we might go from just zeros and ones, electricity inside of a computer, to storing actual numbers with which we're familiar. And honestly, we can go higher than 255. What do you need to count higher than 255? A 9th bit, a 10th bit, an 11th bit, and so forth. And it turns out a common convention nowadays, and we'll see this in code too, is to use as many as 32 bits at a time. So that's a good chunk of bits. Anyone want to ballpark how high you can count if you've got 32 bits available to you? Oh, fewer people now. Yeah, in the back. >> Yeah. So, it's roughly 4 billion. And it's technically roughly 2 billion if you also want to represent negative numbers, but we'll revisit that question. But 2 to the 32nd power is roughly 4 billion. However, nowadays it's even more common, with the Macs and PCs you might have on your laps and even your phones, to use 64 bits, which is a big enough number that I'm not even sure offhand how to pronounce it. That's a lot of possibilities: 2 to the 64 possible patterns. But that's increasingly commonplace. And as an aside, just to dovetail things with our discussion of AI, among the reasons we're living through this crazy, interesting time of AI over these past few years is that computers have been getting so much faster, exponentially so over time, and they have so much more memory available to them. And there's so much data out there, on the internet in particular, to train these models. So it's an interesting confluence of hardware now actually meeting the mathematics and statistics that we'll talk about later in the class, which ultimately make tools like the cat we just built possible.
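The counts for 8, 32, and 64 bits can be checked directly. This Python sketch uses a hypothetical helper called `value_ranges`. Its signed range assumes two's complement, the convention nearly all modern hardware uses to represent negative numbers. That is where "roughly 2 billion" comes from: half the patterns go to negatives.

```python
def value_ranges(bits):
    """Return (pattern count, unsigned range, signed range) for a bit width."""
    patterns = 2 ** bits
    unsigned = (0, patterns - 1)
    # Assumes two's complement: half of the patterns represent negative numbers.
    signed = (-(patterns // 2), patterns // 2 - 1)
    return patterns, unsigned, signed


for bits in (8, 32, 64):
    patterns, unsigned, signed = value_ranges(bits)
    print(f"{bits} bits: {patterns:,} patterns, "
          f"unsigned 0 to {unsigned[1]:,}, signed {signed[0]:,} to {signed[1]:,}")
```

For 32 bits this gives 4,294,967,296 patterns, with a signed maximum of 2,147,483,647. For 64 bits it gives 18,446,744,073,709,551,616 patterns.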
But of course, computers are not all math, and in fact we'll use very little math per se in this class. And so let's move away pretty quickly from just zeros and ones and talk about letters of the alphabet. Say, in English, here is the letter A. Suppose you want to use this letter in an email, a text message, or any other program. What is the computer doing underneath the hood? How can the computer store a capital letter A in English, if at the end of the day all the computer has access to is a source of electricity from the wall or from a battery, and it has a lot of switches that it can turn on and off, treating the electricity in units of 8 or 32 or 64 or whatever? How might a computer represent a letter A? >> Yeah, we need to give it an identity, so to speak, as an integer. In other words, at the end of the day, your entire canvas, so to speak, consists only of zeros and ones. Like, that is going to be the answer to every question today. You only have zeros and ones as the solution to these problems. We just need to agree on what pattern of zeros and ones, and therefore what integer, what number, shall be used to represent the letter A. And hopefully, when we look at that pattern of zeros and ones in the right context, we'll indeed see it as an A. So if we look inside of a computer, so to speak, in the context of a text messaging program or a word processor or anything like that, that pattern shall be interpreted, hopefully, as a capital letter A. But if I open up macOS's or Windows's or my phone's calculator program, I would want that same pattern of zeros and ones to be interpreted instead as a number. If I open up Photoshop, as we'll soon see, I want that same pattern of zeros and ones to be interpreted as a color, presumably, not to mention videos and sound and so forth. But it's all just zeros and ones.
And so, even though I, when writing that chat program a few minutes ago, didn't have to worry about telling the computer, oh, this is text, this is a number, this is something else, we'll see as we write code ourselves that you, as the programmer, will have control over telling the computer how to treat some pattern of zeros and ones: telling it this is a number, this is a color, this is a letter, or something else. So, how do we represent the letter A? Well, it turns out a bunch of humans in a room years ago decided, ah, this pattern of zeros and ones shall be known globally as a capital English letter A. What is that number, if you do the quick mental math? So, indeed, 65, because we have a one in the 64s place and a one in the ones place. So 65, that's just sort of it. It would have been nice if it were just the number one, or maybe the number zero. But at least after the capital letter A, they kept things consistent, such that if you want to represent a letter B, it's going to be 66. Capital letter C is going to be 67. Why? Because the humans in this room, a bunch of Americans at the time, standardized on what's called ASCII, the American Standard Code for Information Interchange. It doesn't matter what the acronym represents; it was just a mapping. Someone on a piece of paper essentially started writing down letters of the alphabet and corresponding numbers so that computers subsequently could all speak that same standard representation. And here's an excerpt thereof. In this case, we're seeing seven bits' worth, but eventually we ended up using eight bits in total to represent letters. And some of these are fairly cryptic; maybe more on those another time. But down here, if we highlight just one column, we'll see that, indeed, on this cheat sheet, 65 is capital A, 66 is B, 67 is C, and so forth. So, why don't we do a little exercise here? What pattern of zeros and ones do I see here? I've got three bytes, so three sets of eight bits.
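In Python, `ord` gives the number a character is stored as, and `chr` goes the other way. For these letters the numbers match the ASCII cheat sheet, because Unicode keeps ASCII's first 128 values. `to_bits` is a small illustrative helper, not part of any library.

```python
def to_bits(ch):
    """Return the 8-bit binary pattern for a character's number."""
    return format(ord(ch), "08b")


for letter in "ABC":
    print(letter, ord(letter), to_bits(letter))
# A 65 01000001
# B 66 01000010
# C 67 01000011

print(chr(65))  # A
```

In 65's pattern, `01000001`, the only ones are in the 64s place and the ones place.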
And even though there are no placeholders now over the columns, what is this number? Yeah. So, we've got the ones, twos, fours, eights, 16s, 32s, 64s columns. So, indeed, this is going to be the number 72. This is not what computer scientists spend their day doing; this is just to reinforce what it is we just looked at. And I'll spoil it: the numbers are 72, 73, 33. And anyone in this room could have done that if you took out a piece of paper, figured out what the columns are, and just did a bit of quick mental or written math. But this is to say, suppose that you just got a text message or an email, and you had the ability to look underneath the hood of the computer and see what pattern of zeros and ones you just received over the internet. Suppose that pattern of zeros and ones was three bytes of bits, which, when you do the math, are the numbers 72, 73, 33. Well, here's the cheat sheet again. What message did you just get? >> Yeah. So, it's HI. Why? Because 72 is H and 73 is I. Now, some of you said "hi" fairly emphatically. Why? Well, 33, it turns out, and you wouldn't know this unless you looked it up or someone told you, is an exclamation point. So, literally, if you were to text someone right now, if you haven't already, HI in all caps with an exclamation point, you would essentially be sending three bytes of information somehow over the internet to that recipient. And because their phone similarly understands ASCII, because it was programmed years ago to do so, it knows to show you HI! and not a number, three numbers no less, or colors, or something else altogether. So here we have HI! as three numbers in a row. What else is worth noting here? Well, there's some fun sort of trivia embedded even in this cheat sheet. So here again is A, B, C, D, E, F, G, and so forth, 65 on down. Let me just highlight over here the lowercase letters: 97, 98, 99, and so forth.
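Decoding a received message is two lookups. First, read each byte's bits as a number. Then look that number up in the ASCII table. The exact bits on the slide aren't reproduced in the transcript, so the strings below are derived from 72, 73, and 33. `decode_ascii_bits` is a hypothetical name.

```python
def decode_ascii_bits(bit_strings):
    """Turn 8-bit strings into numbers, then those numbers into ASCII characters."""
    numbers = [int(bits, 2) for bits in bit_strings]
    text = "".join(chr(n) for n in numbers)
    return numbers, text


message = ["01001000", "01001001", "00100001"]  # three bytes of bits
numbers, text = decode_ascii_bits(message)
print(numbers)  # [72, 73, 33]
print(text)     # HI!
```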
If I go back and forth, does anyone notice the consistent pattern between these two? >> Yeah. So, the lowercase letters are 32 away from the uppercase letters. Well, how do we know that? Well, 97 - 65 is, yeah, 32. 98 - 66 is, okay, 32. And that pattern continues. What does this mean? Well, computers know how to do this; most normal humans don't need this information. But what it means is this: suppose you are representing some pattern in binary, with your transistors on and off, and this is the pattern representing capital letter A, which is why we have a one in the 64s place and a one in the ones place. How does a computer go about lowercasing this same letter? Yeah? >> Perfect. All the computer has to do is change this one bit in the 32s place to a one, because that has the effect, mathematically, per our discussion, of adding the number 32 to whatever it is. So it turns out you can force text from uppercase to lowercase, or back, by just changing a single bit inside of that pattern of eight bits in total. All right, why don't we reinforce this with another quick exercise? We have an opportunity here to give you some stress balls right at the very start of class. Could we get eight volunteers to come up on stage? Maybe over here and over here and over here on the left. Let me go all the way on the right. Let's see. Okay, the high hand here. The hand that's highest there. Yes, we're making eye contact. How about all the way, wait, let's see. Let's go here in the crimson sweatshirt. And how about in the white shirt here? Come on up. Did I count correctly? Let's see. Come on down, the eight of you. I didn't count right, did I? 1, 2, 3, 4, 5, 6. It's ironic that I'm not counting correctly. Eight here. How about on the left in gray? Okay. Oh, and okay, in black here. Come on down. All right. Hopefully this is eight. 1, 2, 3, 4, 5, 6, 7. Okay, eight. There we go. All right. So, let's go ahead and do the following exercise.
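Here is a minimal Python sketch of the single-bit case trick. Setting the 32s bit lowercases a letter, and clearing it uppercases one. This only holds for the ASCII letters A to Z and a to z. Applied to other characters, such as `!` or `@`, it produces unrelated characters. Real programs use `str.lower()` and `str.upper()` instead. The function names here are illustrative.

```python
CASE_BIT = 0b00100000  # the 32s place


def to_lower(ch):
    """Set the 32s bit: 'A' (65) becomes 'a' (97). ASCII letters only."""
    return chr(ord(ch) | CASE_BIT)


def to_upper(ch):
    """Clear the 32s bit: 'a' (97) becomes 'A' (65). ASCII letters only."""
    return chr(ord(ch) & ~CASE_BIT)


print(to_lower("A"))                        # a
print(to_upper("z"))                        # Z
print("".join(to_lower(c) for c in "HI"))   # hi
```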
I've got some sheets of paper preprinted here. If each of you wants to do exactly what you're doing and line up from left to right, each of you is going to represent a placeholder, essentially. So we have over here the ones place, all the way over here. And then we have the twos place, and the fours place, and the eights, 16s, 32s, 64s, 128s. And we come bearing a microphone, if each of you wants to say a quick hello: your name, maybe your dorm or house, and something besides computer science that you're studying or want to. >> Hi, I'm, oh, that's loud. Okay. I'm Allison. I'm a freshman in Matthews, and I like climbing, and I'm thinking of CS and econ. >> Number two. >> Hi, I'm Lily. I'm in Hurlbut this year, and I'm thinking of doing CS and government. >> Nice to meet you. >> Hi, I'm Sean. I'm in Canaday Hall, and I'm thinking of doing astrophysics and CS. >> Welcome. >> Hi, I'm Jordan. I'm doing applied math with a specialization in CS and econ. And I'm in Wigglesworth, and I like going to the gym. >> Okay, nice. 16. >> Hi, I'm Shiv. I'm studying MechE, and I'm in Canaday. >> Nice. >> Hi, I'm Sophia. I'm thinking of doing electrical engineering. >> Welcome. >> Hi, my name is Marie, and I'm in Canaday B, and I really like CS, physics, and astrophysics. >> Hi, I'm Alyssa. I'm in Holworthy. I'm also thinking of studying math or physics, and I also like to climb. >> Nice. Welcome to you all. So, on the backs of their sheets of paper, they have a little cheat sheet that describes what they should do in each of three rounds. We're going to spell out together a three-letter word. You all, as the audience, have a cheat sheet above you that maps numbers to letters. These folks don't necessarily know what they're spelling. They only know what they individually are representing. So if your sheet of paper tells you to represent a zero in a given round, just kind of stand there awkwardly, no hands up.
But if you're told on your sheet of paper to represent a one, just raise a single hand to make obvious to the audience that you're representing a one and not a zero. And the goal here is to figure out what we are spelling using this system called ASCII. All right, round one, execute. What number is this here? I'm hearing... you can just shout it out. What number? >> 66, or B. So, you're spelling B. All right, hands down. Round two. More math. Feel free to shout it out. >> Oh, I heard it. Yeah, 79, which is >> O. Okay, so we have B, O. Hands down. Third and final round. Execute. Number 87. >> Yes, 87, which is the letter >> W, which spells >> BOW. If you want to take your bow now. >> Ah, okay. Here we go. You guys can keep those. Okay. Thank you. All right. You guys can head back. Thank you to our volunteers here. Very nicely done. We indeed spelled out BOW, and that's just because we all standardized on representing information in exactly the same way, which is why, when you type B on your phone or your computer, the recipient sees the exact same thing. But what's noteworthy in this discussion is that you can't spell a huge number of words. Like, yeah, English, okay, we've got that covered. But odds are you're noticing, depending on your own background and what human languages you read or speak yourself, that a whole bunch of symbols might be missing from your keyboard. For instance, we have accented characters here, and in a lot of Asian languages there are so many more glyphs than we could have even fit in that cheat sheet of numbers and letters. And so ASCII is not the only system that the world uses. It was one of the earliest, but we've moved on in modern times to a superset of ASCII that's generally known as Unicode. And Unicode uses so many more bits than ASCII that we even have room for all of these little things that we seem to send constantly nowadays. These are obviously images that you might send with your phone or your computer, but they're technically actually characters.
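The "superset" relationship is easy to see in Python, whose `ord` returns a character's Unicode code point. ASCII letters keep their old numbers. Accented letters and emoji get numbers beyond ASCII's range, so encoding them as ASCII fails. `describe` is an illustrative helper, and the characters chosen are just examples.

```python
def describe(ch):
    """Return the character's code point as U+ notation, as a decimal, and whether it fits in ASCII."""
    code_point = ord(ch)
    return f"U+{code_point:04X}", code_point, code_point < 128


for ch in ["A", "\u00e9", "\U0001F602"]:  # capital A, accented e, face with tears of joy
    print(describe(ch))
# ('U+0041', 65, True)
# ('U+00E9', 233, False)
# ('U+1F602', 128514, False)

try:
    "\u00e9".encode("ascii")
except UnicodeEncodeError as err:
    print("ASCII cannot represent it:", err.reason)
```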
They're technically just patterns of zeros and ones that have similarly been standardized around the world to look a certain way. This is an emoji keyboard, but you're sending characters; you're not sending images per se. The characters are displayed as images, obviously, but really these are just like characters in a different font, and that font happens to be very colorful and graphical as well. So, Unicode uses more than just seven or eight bits. If you do the quick mental math, if ASCII only used seven or, let's say, eight bits, how many possible characters can you represent in ASCII alone? 256. Because if we do that quick mental math, 2 to the 8th is 256 possibilities, and that's it. That's enough for English, because you can cram in all the uppercase letters, the lowercase letters, the numbers, and a whole bunch of punctuation as well. But it's not enough for certain other punctuation symbols, not to mention many other human languages. And so the Unicode Consortium's charge in life has been to come up with a digital representation of all human language, past, present, and hopefully future, by using not just seven or eight bits, but maybe 16 bits per character, 24 bits, or, heck, even 32 bits per character. And per before, if you've got as many as 32 bits available to you, you can represent, what, like 4 billion characters in total. And that's just one of the reasons why these emoji have kind of exploded in popularity and availability. There are just so many darn patterns. Like, what else are we going to do with all of these zeros and ones? But more importantly, emoji have been designed to really represent people and places and things and emotions in a way that transcends human language. But even then, they're somewhat open to interpretation. In fact, here's a pattern of, I think, 32 zeros and ones.
I'm guessing no one's going to do the quick mental math here, but what decimal number does this represent, if we do in fact work out the math, with the ones place all the way over on the right? Well, that's the number 4,036,991,106. Who knows what that is? It's not A, and it's nothing near uppercase or lowercase A, but it is among the most popular emoji that you might send, typically on your phone, laptop, or other device: namely, this thing here, face with tears of joy, which odds are you've sent or received recently. But interestingly, even though many of you might have iPhones and see and send the same image, you'll notice that if you see a friend who's got Android or some other device, or maybe you're using Meta's Messenger program or Telegram or some other messaging service, sometimes these emoji look a little bit different. Why? Because what Unicode has done is decide there shall exist an emoji known as face with tears of joy. Then Apple and Google and Microsoft and others are sort of free to interpret that as they see fit. So what you see on the screen here is a recent version from iOS, Apple's operating system. Google's version of the same looks a little something like this. And on Telegram, if you have animations enabled, the same idea, face with tears of joy, is actually animated. But it's the same pattern of zeros and ones in each case. They each essentially have different graphical fonts to present to you what each of those images actually is. All right. So, those are each images. How is the computer representing them, though? At the end of the day, we've represented numbers, we've represented letters, but how about these things here, colors? So, how do we represent red or green or blue, not to mention every other color in between? At the end of the day, we only have one canvas at our disposal. Yeah, so integers: the exact same answer as before.
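The transcript doesn't say which encoding the slide showed. This sketch assumes the 32-bit pattern is the UTF-8 encoding of face with tears of joy (code point U+1F602). Under that assumption, reading the four bytes as one number reproduces the decimal value above. The code point itself is a different, smaller number.

```python
emoji = "\U0001F602"  # FACE WITH TEARS OF JOY, Unicode code point U+1F602

encoded = emoji.encode("utf-8")              # the bytes actually sent or stored
as_integer = int.from_bytes(encoded, "big")  # read those 4 bytes as one number

print(len(encoded) * 8)            # 32 (bits)
print(encoded.hex())               # f09f9882
print(format(as_integer, "032b"))  # the 32 zeros and ones
print(as_integer)                  # 4036991106
print(ord(emoji))                  # 128514, the code point itself
```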
We just need to agree on what number we use for red, what we use for green, and what we use for blue, and we can come up with some standardized pattern for this. In fact, one of the most common ways to do this in the real world is to use a combination of three colors together: some amount of red, some amount of green, and some amount of blue, mixed together to get most any color of the rainbow that you might want. This is a picture of something I grew up with back in the day, in middle school, when we'd watch movies or some kind of show in class. The projector screen would be over here, and this is an old-school projector with three different lenses: one projects some amount of green, one some amount of red, one some amount of blue. And so long as the lenses are correctly oriented to all point at the same circle or rectangular region on the screen, you would see any number of colors coming to life in the old-school video. I still remember, all these years later, we would kind of sit and lean up against it because it was super warm, and you could hear it hum; an easy way to fall asleep back in grade school. But we use the same fundamental color system nowadays as well, including in modern programs like Photoshop. So let's abstract that away and focus on just three colors: some amount of red, green, and blue. And let's suppose, for the sake of discussion, that we want to mix together a medium amount of red, a medium amount of green, and just a little bit of blue. For instance, let's suppose that we'll use 72 of red, 73 of green, and 33 of blue: RGB. Now, why these numbers? Well, in the context of ASCII or Unicode, which is just a superset thereof, what does this spell? >> HI!
But again, if you were instead to open a file containing these three numbers, or really these three bytes of bits, in Photoshop, you would hope that they're going to be interpreted not as letters on the screen but as the color of a dot on the screen instead. So it turns out that typically, when you have three of these numbers together, each of them is using a single byte, so eight bits. So you can have from 0 to 255 of red, 0 to 255 of green, and 0 to 255 of blue, where 0 is none and 255 is the max. So if we mix these together, imagine, just like that projector, consolidating these three colors into one central point. Anyone want to guess what you're going to get if you mix some red, some green, some blue in those amounts way back? >> Yeah, you're going to get a dark shade of yellow. I've brightened it up a little bit for the projector here, but you're going to get roughly this shade of yellow. And we could play with these numbers all day long to represent different colors as well. Indeed, whether it's Photoshop or some other program, you can combine these amounts in all sorts of ratios to get different colors. So if you had 0 0 0, no red, no green, no blue, take a guess as to what color that's going to be in the computer. >> So it's going to be black, like the absence of all three of those colors. But if you mix the maximal amount of each of those, 255 red and green and blue, that's going to give you white. Now, if any of you have made web pages before or used programs like Photoshop, you might have seen numbers like 00 or FF. Long story short, that's just another base system, hexadecimal, for representing numbers between 0 and 255 as well. But we'll come back to that mid-semester when we make some of our own filters, in sort of an Instagram-like way, manipulating images of our own. So where are these colors coming from, or where can we actually see them?
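The hexadecimal notation mentioned above can be previewed with a small Python sketch. `to_hex` and `from_hex` are hypothetical helper names; the `#RRGGBB` format is the common web convention, with each one-byte channel written as two hex digits.

```python
def to_hex(red: int, green: int, blue: int) -> str:
    """Format an RGB triple (each 0-255) as a web-style hex color."""
    for value in (red, green, blue):
        if not 0 <= value <= 255:
            raise ValueError("each channel must fit in one byte (0-255)")
    return f"#{red:02X}{green:02X}{blue:02X}"

def from_hex(code: str) -> tuple[int, int, int]:
    """Parse '#RRGGBB' back into three integers."""
    code = code.lstrip("#")
    return tuple(int(code[i:i + 2], 16) for i in (0, 2, 4))

print(to_hex(72, 73, 33))     # #484921, the dark yellow
print(to_hex(0, 0, 0))        # #000000, black
print(to_hex(255, 255, 255))  # #FFFFFF, white
print(from_hex("#484921"))    # (72, 73, 33)
```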
Well, here's just a picture of that same emoji, face with tears of joy. If I zoom in on that and maybe zoom in again, you can start to see, if you blow it up enough or put your eyes close enough to the device, individual dots or squares. These are generally known as pixels, and they're just the individual dots that collectively compose an image. Each of these dots is going to be a distinct color: this one's yellow, this one's brown, and then there's a bunch in between. Well, you're using some number of bits to represent each of those pixels' colors. So if you imagine using the RGB system, that's 8 + 8 + 8 bits, so 24 bits, or three bytes, just to keep track of the color of each and every one of these dots. So now, if you think about having downloaded a GIF at some point, a PNG file, a JPEG, or any other file format, its size is usually measured in what? Megabytes, typically, which means millions of bytes. Why? Because if it's a pretty big photograph or image, each of those dots takes up at least three bytes, it would seem. And if you do out the math, if you've got thousands or millions of dots, each of which uses three bytes, you're going to quickly get to megabytes, if not even larger for things like, say, videos. But again, it's just patterns of zeros and ones. So long as the programmer knows what they're doing and tells the computer how to interpret those zeros and ones, and equivalently, so long as the software knows to look at these zeros and ones and interpret them as numbers or letters or colors, we should see what we intended to represent. All right, so that's colors and images. How many of you played with these little flip books as a kid, where they've got like a hundred different little pictures and you flip through them really quickly and see what looks like animation in book form?
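The file-size arithmetic, as a rough Python sketch. It assumes an uncompressed image at 3 bytes per pixel and a 1920 by 1080 resolution chosen only for illustration; real GIF, PNG, and JPEG files are compressed, so actual file sizes are usually smaller.

```python
BYTES_PER_PIXEL = 3  # 8 bits each for red, green, and blue

def raw_image_bytes(width: int, height: int) -> int:
    """Uncompressed size of a width x height RGB image, in bytes."""
    return width * height * BYTES_PER_PIXEL

size = raw_image_bytes(1920, 1080)
print(size)                     # 6220800
print(size / 1_000_000, "MB")   # 6.2208 MB
```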
Well, this is essentially a video. So what is a video, or how can you think of what a video is? It's just a whole bunch of images flying across the screen, either on paper or, digitally nowadays, on your phone or your laptop. And that's kind of nice, because we're composing more interesting media based on these lower-level building blocks. And this is going to be thematic. We literally started with zeros and ones. We worked our way up to letters. We then worked our way up to colors and thus images. Now we're up at this level of the hierarchy, video, because what's a video? It's like 30 images per second flying across the screen, or maybe slightly fewer than that, which collectively tricks our mind into thinking we are seeing motion. "Motion pictures" is the old-school term for movies, and it literally is what it was: film was showing you roughly 24 pictures per second, and it looks like motion even though you're just looking at images, much like this flip book, very quickly one after the other. What about music? How could you go about representing musical notes if again your only ingredients are zeros and ones? Even if you're not a musician, how might you represent music like that on the screen here? Yeah. Okay. So the frequency, like the tone that you're actually hearing from the device. What else might weigh in besides the frequency of the note? Yeah. >> So the speed of the note, or maybe the duration: if you think about a physical piano, how long you're holding the key down for. What else? So the amplitude, maybe: how loud, like how hard did you hit the key to generate that sound. So let me propose, at the risk of oversimplifying, that we could represent each of these notes using three numbers, maybe 0 to 255 or some other range, representing the frequency or pitch of the note, its duration, and its loudness.
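A minimal Python sketch of the simplified note format proposed above. The three-byte layout (pitch, duration, loudness) is the lecture's thought experiment, not a real audio format, and `encode_notes` and `decode_notes` are hypothetical names; the key idea is that the receiver must read the bytes three at a time.

```python
# Hypothetical format: each note is three bytes, (pitch, duration, loudness), 0-255 each.
def encode_notes(notes: list[tuple[int, int, int]]) -> bytes:
    """Flatten the notes into one stream of bytes."""
    return bytes(value for note in notes for value in note)

def decode_notes(data: bytes) -> list[tuple[int, int, int]]:
    """Read the stream back three bytes at a time."""
    if len(data) % 3 != 0:
        raise ValueError("data must contain whole notes (a multiple of 3 bytes)")
    return [tuple(data[i:i + 3]) for i in range(0, len(data), 3)]

melody = [(60, 4, 100), (62, 4, 90), (64, 8, 110)]
data = encode_notes(melody)
print(list(data))                    # [60, 4, 100, 62, 4, 90, 64, 8, 110]
print(decode_notes(data) == melody)  # True
```

Reading the same bytes two or four at a time would "work" without error but produce the wrong music, which is why both sides must agree on the format.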
And so long as the person receiving a file containing all of those zeros and ones knows to interpret them three at a time, I bet you could share a musical file with someone else that they could hear in exactly the same way that you yourself intended. Let me pause here to see if there are any questions, because we've already built our way up from zeros and ones to video and sound. >> Yeah, in front. >> How does the computer differentiate between what the letter, like, 65 would be and the number 65? >> So how does the computer distinguish between the letter 65 and the number 65? It's context-dependent. Put simply, and we'll see this as early as next week, the programmer tells the computer how to display the information, either as a number or as a letter. Or equivalently, once programmed, the software knows that when it opens a GIF file or JPEG or something else, it should interpret those zeros and ones as colors instead of as, say, a .docx Microsoft Word file or the like. Other questions on any of these representations? Yeah, in front. >> Can we go over the base 10, base 2 thing really briefly? >> Sure. So can we go over base 10 and base 2? Base 10 is literally the numbers you and I use every day. It's base 10 in the sense that you have 10 digits at your disposal, 0 through 9, and any number you want to represent in the real world must be composed using 0 through 9. The binary system, or base 2, is fundamentally the same. It's just that the computer doesn't have access to 2 through 9; it only has access to 0 and 1. But much like the light bulbs I was displaying here, you can simply ascribe different weights to each of the digits. So instead of the ones place, the tens place, and the hundreds place, if we more modestly say the ones place, the twos place, the fours place, we can use the same system. In binary, you might need to use more digits to count as high, because to represent 255 in decimal, you can just write 255.
That's three digits in decimal. But in binary, we've seen you need to use eight such digits, which is more, but it's still much better than unary, which would have needed 255 light bulbs on instead. >> And are binary and base 2 the same thing? >> Are binary and base 2 the same thing? Yes, just like base 10 and decimal are the same thing, and unary and base 1 are the same thing as well. All right. So let me just stipulate that, even though we took this tour quickly, at the end of the day computers only have zeros and ones at their disposal. So the answer to any question as to how we can represent X is going to somehow involve permuting those zeros and ones into patterns, or equivalently into the numbers that they represent. But if we now have a way to represent all inputs in the world, be it letters, numbers, images, videos, or anything else, and to get output from some problem-solving process, how do we actually solve problems? Well, the secret sauce in the middle here is another term that you've probably heard in the real world nowadays, namely algorithm: step-by-step instructions for solving some problem. So this ultimately is what computer science is really about, too: not just representing information, but somehow processing it, doing something interesting with it, to actually solve the problem that you've been provided as input so you can output the correct answer. Now, there are all sorts of algorithms implemented in our phones and in our Macs and PCs, and that's all software is. It's an implementation in code, be it C++ or Java or any other language the computer understands, but it's still just step-by-step instructions. And among the things we'll learn in CS50 is how to express yourself in different ways to solve problems, not only in different languages but using different methodologies as well.
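Positional notation works the same way in any base; only the place values change. Here is a small Python sketch: `to_base` is a hypothetical helper, while `int(text, 2)` and `chr` are built-ins. The last lines echo the earlier question about context: the same value 65 prints as a number or as the letter A depending on how we ask for it.

```python
def to_base(n: int, base: int) -> str:
    """Write a non-negative integer in `base` (2 to 10) using repeated division."""
    if not 2 <= base <= 10:
        raise ValueError("this sketch only handles bases 2 through 10")
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        digits.append(str(n % base))  # remainder is the next digit, right to left
        n //= base
    return "".join(reversed(digits))

print(to_base(255, 10), to_base(255, 2))  # 255 11111111
print(int("11111111", 2))                 # 255 = 128+64+32+16+8+4+2+1
print(len("1" * 255))                     # 255 marks in unary

# Same bits, different interpretation: number or letter?
value = 65
print(value, chr(value), to_base(value, 2))  # 65 A 1000001
```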
Because, as we'll see, among the reasons we introduce these several languages is that you don't just learn more and more languages that allow you to solve the same problems. Different languages will allow you to solve different problems, and even save you time by being better tools for the job. So here, for instance, on an iPhone is a bunch of contacts, which is presumably familiar, where we might have a whole bunch of friends and family and whatnot alphabetized by first name or last name. Suppose we want to find one such person, like John Harvard, whose number here might be +1-949-468-2750. Feel free to call or text him sometime. This is the goal of this problem. If we have our contacts app and I start typing in John's name, by first name or last name, the autocomplete nowadays kicks in and somehow filters the list down from my 10 friends or 100 friends or 1,000 friends to just the single directory entry that matches. So, back in the days of that RGB projector, we had phone books like this here. I'm pleased to say, thanks to our friend Alexis, this is the largest phone book that we've used for this demonstration. This is an old-school phone book that's essentially the same thing as our contacts app or address book nowadays, whereby I've got a whole bunch of names alphabetically sorted by first name or last name, whatever, and corresponding to each of those is a number. So, back in the day and frankly even nowadays on your phones, how do you go about finding someone in a phone book or your contacts app? Well, you could very naively just start at the beginning, look down, and turn one page at a time, looking for John Harvard in this case. Now, so long as I'm paying attention, this step-by-step process will get me to John Harvard. This is a correct algorithm, even though you might object to how I'm doing this. Why? What's bad about this algorithm? >> It's just slow. I mean, this is crazy slow.
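The page-by-page approach is what programmers call linear search. A minimal Python sketch, assuming a phone book modeled as an alphabetized list with one name per page (the `Person 0000` names are made up, and the function name is ours):

```python
from typing import Optional

def search_one_at_a_time(pages: list[str], target: str) -> tuple[Optional[int], int]:
    """Turn one page at a time; return (index of target or None, pages looked at)."""
    turns = 0
    for index, name in enumerate(pages):
        turns += 1
        if name == target:
            return index, turns
    return None, turns

book = [f"Person {i:04d}" for i in range(1000)]  # hypothetical 1,000-page book

print(search_one_at_a_time(book, "Person 0999"))   # (999, 1000): worst case, every page
print(search_one_at_a_time(book, "John Harvard"))  # (None, 1000): not in this book
```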
If there are like a thousand pages in this phone book, which looks like there are, this could take me as many as a thousand page turns, or, if he's roughly in the middle, like 500. That's crazy. That's really rather slow, especially if I'm going to do this again and again. Well, what if I do it a little smarter? In grade school, I learned how to count two at a time. So 2, 4, 6, 8, 10, 12, 14, 16, 18. Again, if I'm paying attention, I'll get there twice as fast because I'm counting two at a time. But is that algorithm, step by step, correct? And I'm seeing no, but why? >> I might skip over John Harvard. >> So just by bad luck, and with roughly 50/50 probability, he's going to be sandwiched between two of the pages I flip past together. Now, I don't have to abort this algorithm altogether. As soon as I get past the J section, if we're doing it by first name, I could just double back one page and make sure that I haven't missed him. So it's recoverable, and this algorithm therefore is about twice as fast, plus maybe one extra step to double back. But that's arguably a bug, or a mistake, in the algorithm if I don't fix it intelligently. But what did we do back in the day? And what does your iPhone or Android phone do? What they typically do is go roughly to the middle and look, physically or virtually, down. They see, "Oh, I'm in the M section." So which side is John Harvard to, the left or the right? He's to the left. So I could literally now... oh my god. We talked about this before class, that this might be more... There we go. We can tear the problem in half. Thank you. It's been a while. We can tear the problem in half. We know that John Harvard is to the left, so I can, rather dramatically, throw half of the problem away, such that I've now gone from a 1,000-page problem to 500 pages instead. What now can I do? I can go roughly to the middle here, and maybe I'm in the E section.
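The two-at-a-time variant, including the fix of doubling back one page, might look like this in Python. Again this is a sketch over a hypothetical one-name-per-page book, counting how many pages get looked at; the function name is ours.

```python
from typing import Optional

def search_two_at_a_time(pages: list[str], target: str) -> tuple[Optional[int], int]:
    """Flip two pages at a time; if we overshoot, double back one page.

    Returns (index of target or None, number of pages looked at).
    """
    turns = 0
    i = 0
    while i < len(pages):
        turns += 1
        if pages[i] == target:
            return i, turns
        if pages[i] > target:
            break  # overshot: the target can only be on the page we skipped
        i += 2
    # The bug fix: check the one page we may have jumped over (index i - 1), if any.
    back = i - 1
    if i > 0 and back < len(pages):
        turns += 1
        if pages[back] == target:
            return back, turns
    return None, turns

book = [f"Person {i:04d}" for i in range(1000)]  # hypothetical alphabetized book
print(search_two_at_a_time(book, "Person 0500"))  # (500, 251)
print(search_two_at_a_time(book, "Person 0501"))  # (501, 253): found by doubling back
print(search_two_at_a_time(book, "Person 0999"))  # (999, 501): worst case, about n/2 + 1
```

Without the doubling-back step, the search for `Person 0501` would wrongly report that the name is missing.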
So I went a little too far back to the left, but I kept it simple and just divided so that I can conquer this problem, if you will. And if I'm in the E section now, is John Harvard to the left or to the right? To the right. So I can again tear the problem in half. Thank you. So now John Harvard is going to be in this half, and I can throw this half away. So now I've gone from 1,000 to 500 to 250. And I can repeat, repeat, repeat down to 125, half of that, half of that, half of that, until I'm finally left with just a single page. And John Harvard is hopefully now on this page, such that I can call him, or he's not there at all, at which point this was all for naught. But what's powerful about each of those algorithms, the sort of good, better, and best, is that they all get the job done, conditional on the second one having that little fix to make sure I don't miss John Harvard between two pages, but they're fundamentally different in their efficiency and the quality of their design. And this is really representative of one of the emphases of a class like this. It's not just about writing correct code or getting the job done, but doing it well and doing it quickly: using the least amount of CPU or computing resources, the minimal amount of RAM, the fewest number of people, the least amount of money, whatever your constrained resource is, to solve a problem better. So consider those algorithms plotted on a grid like this. On the x-axis, we have a representation of the size of the problem. So this would mean a small problem, like zero pages, and this would mean a big problem, like a thousand pages. And on the y, or vertical, axis we have some measurement of time, the number of seconds or the number of page turns, whatever your metric actually is. So this would be not much time at all, so fast, and this would be a lot of time, so slow.
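The tearing process, counted in Python. This sketch keeps the larger half at each step (the worst case), which is why 1,000 pages takes 10 tears, that is, log base 2 of 1,000 (about 9.97) rounded up. The helper name `halvings` is ours.

```python
def halvings(pages: int) -> list[int]:
    """Pages remaining after each tear, until one page is left."""
    sizes = [pages]
    while pages > 1:
        pages = (pages + 1) // 2  # worst case: the target is in the larger half
        sizes.append(pages)
    return sizes

print(halvings(1000))
# [1000, 500, 250, 125, 63, 32, 16, 8, 4, 2, 1]
print(len(halvings(1000)) - 1, "tears")  # 10 tears
```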
So what's the relationship if we just roughly draw these three algorithms? Well, the first one is technically a straight line, and we'll describe it as n, because if you think of n as the number of pages, there's a one-to-one relationship in the first algorithm between how many times I have to turn the page and how many pages there actually are. And you can think about this in the extreme. If I were looking for someone whose name started with Z, I might have to go through like a thousand darn pages to get to that person, unless again I do something hackish and just kind of cheat and go to the end. If we execute these algorithms again and again the same way, that's going to be pretty slow. The second algorithm was pretty much twice as fast, plus potentially that one extra step. But it's still a straight line, because if there are a thousand pages and I'm going two pages at a time, that's like n divided by 2 steps, plus 1, give or take. But it's still better. Notice, if this is the size of the problem, a thousand pages for instance, the first algorithm takes literally twice as much time as the second algorithm. So we're doing better already. But the third algorithm fundamentally is going to look something like this. And if you remember your logarithms, sort of the opposite of an exponential, this curve is so much lower and flatter, if you will, than either of these two, mathematically. More on this another time. The curve is going to be like log base 2 of n, or just logarithmic in nature. What it means is that it's growing very, very, very slowly. It's still going up; it's never going to flatline and go perfectly horizontal, but it goes up very slowly. Why?
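Roughly, in symbols, with n the number of pages (these are approximate worst-case step counts summarizing the three curves, not exact formulas from the lecture):

```latex
\begin{aligned}
\text{one page at a time:}  &\quad T_1(n) = n \\
\text{two pages at a time:} &\quad T_2(n) \approx \frac{n}{2} + 1 \\
\text{halving each time:}   &\quad T_3(n) \approx \log_2 n \\[4pt]
n = 1000: &\quad T_1 = 1000,\quad T_2 \approx 501,\quad T_3 \approx \log_2 1000 \approx 9.97 \\
\text{doubling the book:} &\quad \log_2(2n) = \log_2 n + 1
\end{aligned}
```

The last line is the algebra behind the next point: doubling the number of pages doubles the first two counts but adds only one step to the third.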
Well, think about two towns nearby, like Cambridge on this side of the river and Allston on the other. Suppose that they still have phone books like this one and they merge their phone books for whatever reason. So overnight, we go from a 1,000-page phone book to a 2,000-page phone book. The first algorithm is going to take literally twice as long, as will the second one, because we're only going through it one or two pages at a time. But if the phone book size doubles from this year to next year, you can, in your mind's eye, think about the green line. It's not going to go up that much higher. Why? Well, practically speaking, even if the phone book becomes 2,000 pages long, how many more times do you have to tear or divide that problem in half? >> Just one. Because you're taking a 1,000-page bite out of it, then a 500, then a 250, you're taking much bigger bites out of it than just one or two pages at a time. And so what computer science and algorithms and good design are about is figuring out the logic via which you can solve problems not only correctly but efficiently as well. And that then gives us these things called algorithms. And when it comes time to code, which we're about to do too, code is just an implementation, in a language the computer understands, of an algorithm. Now, this assumes that we've come up with some digital way, that is to say a zeros-and-ones-based way, to represent names and numbers. But honestly, we already did that. We came up with ASCII and then Unicode to represent the names. Representing numbers is even easier than that; that's really where we started. So code is just about taking as input some standardized representation of names and numbers and spitting out answers. And that's truly what iOS and Android are doing. When you start typing and autocomplete kicks in, they could be searching from the top to the bottom, which is fine if you've only got a few friends and family in the phone.
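A small Python sketch of the doubling argument, using the same approximate worst-case counts as above; the helper name `steps` is ours.

```python
import math

def steps(n: int) -> tuple[int, int, int]:
    """Approximate worst-case steps for the three phone-book algorithms on n pages."""
    one_at_a_time = n
    two_at_a_time = n // 2 + 1         # plus one step to double back
    halving = math.ceil(math.log2(n))  # tears until one page remains
    return one_at_a_time, two_at_a_time, halving

for n in (1000, 2000):
    print(n, steps(n))
# 1000 (1000, 501, 10)
# 2000 (2000, 1001, 11)
```

Merging the two towns' books doubles the first two counts, but the halving strategy needs only one more tear.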
But if you've got a thousand or 10,000, or if it's not a phone book anymore but some database with lots and lots of data, it stands to reason that it'd be nice if the computer kept it all alphabetized, just like that book, and jumped to the middle, then the middle of the middle, then the middle of the middle of the middle, and so forth. Why? Because the speed is going to be much, much faster: logarithmic in nature and not linear, so to speak. But we'll revisit those topics as well. For now, before we get into actual code, let's talk for a moment about pseudocode. Pseudocode is not one formal thing; every human will come up with their own way of writing it. It's an English-like, or human-like, formulation of step-by-step instructions, just using terse, correct English or whatever human language. So, for instance, suppose I want to translate what I did somewhat intuitively with that phone book, dividing in half and dividing in half, into step-by-step instructions that I could hand to you or, nowadays, to a robot or something like that. Well, step one was essentially to pick up the phone book, which I did. Step two, in the third and final algorithm, was to open to the middle of the phone book. Step three was to look at the page, as I did. Step four got a little more interesting. Even though I didn't verbalize this, presumably I was asking myself a question: if the person I'm looking for, John Harvard, is on the page, then I would have called him right then. But if he weren't on the page, if he instead were earlier in the book, as did happen, well then I'm going to go to the left, so to speak; more methodically, I'm going to open to the middle of the left half of the book. Then I'm going to go back to line three. That's interesting; we'll come back to that in a moment. But else, if the person is later in the book, I'm going to open to the middle of the right half of the book and then go back to line three.
Now, let's pause here. Why do I keep going back to line three? This would seem to have me doing the same thing endlessly. But not quite. Why? >> As soon as you hit the one page... >> Yeah. Because I am dividing the problem in half, for instance on line six or line nine, implicitly, just based on how I've written this, the problem's getting smaller and smaller and smaller. So it's fine if I keep doing the same logic again and again, because if the problem's getting smaller, eventually it's going to bottom out, and I'm going to have just one page with the person I want to call, and so the algorithm is done. But there is a perverse corner case, if you will, and this is where it's ever more important to be precise when writing code and to anticipate what could go wrong. I should probably ask one more question in this code, not just these three. What might that question be? Yeah. >> Whether John Harvard is in the book. >> Yeah. So what if John Harvard is not in the book? There's this corner case where I'm just wasting my time entirely, and I get to the end of the phone book and John Harvard's not there. What should the computer do? Well, as an aside, if you've ever been using your Mac or PC or phone and the thing just freezes, or the stupid little beach ball starts spinning or something like that, and you're like, what is going on? Some human at Google or Microsoft or Apple or the like made a mistake. They forgot, for instance, some fourth uncommon but possible situation, and if they don't tell the computer how to handle it, the computer's effectively going to freak out and do something undefined, like just hang or reboot or do something else. So we do want to add this: else, quit altogether. That way you have well-defined behavior. And truly, the next time your computer or phone spontaneously reboots or dies or does something wrong, it's probably not your fault per se. Some other human elsewhere did not write correct code.
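The pseudocode maps almost line for line onto what programmers call binary search. A Python sketch, assuming the phone book is an alphabetized list of names; the function name `find` is ours, and the final `return None` plays the role of the "else, quit" line, so a missing name ends the loop instead of hanging.

```python
from typing import Optional

def find(book: list[str], target: str) -> Optional[int]:
    """Binary search over an alphabetized list, following the pseudocode."""
    low, high = 0, len(book) - 1      # the part of the book still in play
    while low <= high:                # loop: "go back to line three"
        middle = (low + high) // 2    # open to the middle
        name = book[middle]           # look at the page
        if name == target:            # Boolean expression: is the person on the page?
            return middle             # call the person
        elif target < name:           # person is earlier in the book
            high = middle - 1         # keep the left half
        else:                         # person is later in the book
            low = middle + 1          # keep the right half
    return None                       # else: quit (the person is not in the book)

book = sorted(["Maria", "Alice", "John Harvard", "Zed", "Bob"])
print(book)                        # ['Alice', 'Bob', 'John Harvard', 'Maria', 'Zed']
print(find(book, "John Harvard"))  # 2
print(find(book, "David"))         # None
```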
Those programmers didn't anticipate cases like these. But now let's apply some terminology here. There are some salient ideas that we're going to see in Scratch and C and Python and these other languages I alluded to earlier. Everything I've just highlighted here, henceforth, we're going to think of as functions. Functions are verbs or actions that get some small piece of work done for you. Highlighted here, though, is the beginning of what we'll call conditionals. A conditional is like a fork in the road: do I go this way, or this way, or some other way altogether? How do you decide which road to go down? We're going to call these questions you ask yourself Boolean expressions, named after the mathematician George Boole. A Boolean expression is just a question that has a yes-or-no answer, or a true-or-false answer, or a one-or-zero answer; it's a binary state. Otherwise, we have this "go back to" line, which is what we're generally going to call a loop, something that induces cyclical behavior again and again. And those functions, conditionals, Boolean expressions, and loops, plus a few other concepts, are pretty much what will underlie all of the code that we write, whether it's in Scratch, C, or something else altogether. But we need to get to that point. In fact, let's go and infer what this program here does. At the end of the day, computers only understand zeros and ones. So I claim here is a program of zeros and ones. What does it do? Anyone want to guess? I mean, we could spend all day converting all of these zeros and ones to numbers, but they're not going to be numbers if it's code. What do you think? >> That's amazing. It does in fact print hello, world. All right. So no one, except maybe you and me and a few others in the room, should know, and that was probably a guess, admittedly, or reading ahead on the slide. But why is that?
Well, it turns out that not only do computers standardize information, data like numbers and letters and colors and other things; they also standardize instructions. So if you've heard of companies like Intel or AMD or Nvidia or others, among the things they do is decide as a company what pattern of zeros and ones shall represent what functionality. And it's very low-level functionality. Those companies and others decide that some pattern of zeros and ones means add two numbers together, or subtract, or multiply. Another pattern might mean load information from the computer's hard drive into memory. Another might mean store it somewhere else. Another might mean print something out to the screen. So nested somewhere in here, and admittedly I have no idea which pattern, because it's not interesting enough to go figure it out at this level, is something that says print. And somewhere in there, like this gentleman proposed, I bet we could find the representations of the letters H and E and L and L and O and everything else that composes hello, world. Because, as it turns out, in programming circles the very first program that students typically write is that of hello, world. Now, this one here is written in a much more intelligible way. Even if you're not a programmer, odds are if I asked you what this program does, you would have said, "Oh, hello, world," even though there's a lot of clutter here that we won't explain until next week, like int main(void). That looks cryptic. There are these weird curly braces, which we rarely use in the real world, but at least I understand a few words like hello and world. And this is kind of familiar: printf. It's not print, but it's probably the same thing. So here too is an example of this hierarchy. Back in the day, in the earliest days of computers, humans were writing code by representing zeros and ones.
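To make "standardized instructions" concrete, here is a toy machine in Python. The opcodes (`LOAD`, `ADD`, `PRINT_CHAR`, `HALT`), their bit patterns, and the two-byte instruction layout are entirely made up for illustration; real instruction sets from companies like Intel and AMD are far larger and encoded differently. The program reuses 72, 73, and 33 from earlier to print HI!.

```python
# Hypothetical instruction set (NOT any real CPU's): each instruction is two bytes,
# an opcode followed by an operand.
HALT, LOAD, ADD, PRINT_CHAR = 0b0000_0000, 0b0000_0001, 0b0000_0010, 0b0000_0011

def run(program: bytes) -> str:
    """Execute the toy machine code and return everything it printed."""
    accumulator = 0  # the machine's single working register
    output = []
    pc = 0           # program counter: index of the next instruction
    while pc < len(program):
        opcode, operand = program[pc], program[pc + 1]
        pc += 2
        if opcode == HALT:
            break
        elif opcode == LOAD:
            accumulator = operand
        elif opcode == ADD:
            accumulator = (accumulator + operand) % 256  # stay within one byte
        elif opcode == PRINT_CHAR:
            output.append(chr(accumulator))  # interpret the number as a character
        else:
            raise ValueError(f"unknown opcode {opcode:08b}")
    return "".join(output)

program = bytes([
    LOAD, 72, PRINT_CHAR, 0,  # 'H'
    ADD, 1, PRINT_CHAR, 0,    # 72 + 1 = 73, 'I'
    LOAD, 33, PRINT_CHAR, 0,  # '!'
    HALT, 0,
])
print(" ".join(f"{b:08b}" for b in program))  # the "program of zeros and ones"
print(run(program))                            # HI!
```

Everything about this machine is a convention: change what `0b0000_0011` means and the same bits do something else entirely.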
If you've ever heard your parents talk about punch cards or the like, you're effectively representing patterns that tell the computer what to do or what to represent, literally holes in paper. Well, pretty quickly early on, writing code only at such a low level got really tedious. So someone decided, you know what, I'm going to put in the effort. I'm going to figure out what patterns of zeros and ones I can put together so as to be able to convert something more user-friendly to those zeros and ones. And as a teaser for next week, that person invented the first compiler. A compiler is just a program that translates one language to another. And more modernly, this is a language called C, which we'll spend a few weeks on together because it's so fundamental to how the computer works. Even this is going to get tedious by like week six of the class. This is going to get stupid, this is going to get annoying, this is going to get cryptic, and we're just going to write print("hello") on the screen, using a different language called Python. Why? Because someone wrote in C a program that can convert Python (this is a white lie) to C, which can then be converted to zeros and ones, and so forth. So in computing there's this principle of abstraction, where we start with the basics, and thank god we can all trust that someone else solved these really hard problems way long ago. Then they wrote programs to make it easier. We wrote programs to make it easier. You can now write code like I did with the chatbot to make things even easier. Why? Because OpenAI and other companies have abstracted away a lot of the lower-level implementation details. And that's where I think this stuff gets really exciting. We can stand on the shoulders of others so long as we know how to use and assemble these kinds of building blocks. And speaking of building blocks, let's start here. Now, odds are some of you might have started here in grade school, playing with Scratch.
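Python itself is a handy example of translation between languages. In CPython, the reference implementation (written in C), source code is first compiled to bytecode, which an interpreter then executes; that is close to, though not the same as, the simplified "converted to C" story above. A quick look using the built-in `compile` function and the standard `dis` module, whose listing varies across Python versions:

```python
import dis

source = 'print("hello, world")'
# Translate the source text into a CPython code object (bytecode plus constants).
code = compile(source, "<hello>", "exec")

print(type(code).__name__)                     # code
print("hello, world" in code.co_consts)        # True: the string travels with the bytecode
print(len(code.co_code), "bytes of bytecode")  # exact count varies by version
dis.dis(code)   # human-readable listing of the bytecode instructions
exec(code)      # run it: prints hello, world
```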
It's great for after-school programs and for learning how to program. You probably used this language to make games and graphics and maybe playful art or the like. Scratch is a graphical programming language designed about 20 years ago by our friends down the road at MIT's Media Lab, and it represents pretty much everything we're going to be doing fundamentally over the next several weeks in more modern, more textual languages, if you will, like C and Python. I bet I could ask the group here: what does this program do when you click a green flag? Well, it says hello, world on the screen. Because with Scratch, you have the ability to express yourself with functions and loops and conditionals and all of this, but by using drag-and-drop puzzle pieces. So what we're about to do is this. We're going to go on my screen to scratch.mit.edu. It's a browser-based programming environment, and we're only going to spend one week, really a few days, in CS50 on this language. But the overarching goal is, one, to make sure everyone's comfortable applying some of these building blocks and actually developing something that's interesting and visual and audible as well, but also to give us some visuals that we can rely on and fall back on when all of those curly braces and parentheses and sort of stupid syntax come back, syntax that's necessary in many languages but can very quickly become a distraction early on from the interesting and useful ideas. So what we're about to see is this, in a browser. This is the Scratch programming environment, and there are a few different parts of this world. This is the blocks palette, so to speak. That is to say, there's a bunch of puzzle pieces, or building blocks, that represent functions and conditionals and loops and other such constructs. There's the programming area here, where you can actually write your code by dragging and dropping these puzzle pieces. There's a whole world of sprites here.
By default, Scratch is a cat, by design, but you can make Scratch look like a dog, a bird, a garbage can, or anything else, as we'll soon see. And then this is the world in which Scratch itself lives, the stage. So Scratch can go up, down, left, right, and generally be animated within that world. For the curious, kind of like high school geometry class, there's this x-y plane here. So (0, 0) would be in the middle. (0, 180) is here at the top. (0, -180) is here at the bottom. (-240, 0) is here on the left, and (240, 0) is on the right. Generally, you don't need to worry about the numbers, but they exist so that when you say up or down, you can actually tell the program go up one pixel or 10 pixels or 100 pixels, so that you have some definition of what this world actually is. All right, so let's actually put this to the test. Let me go ahead and flip over to the actual Scratch website, whereby I'm going to have on my screen that same user interface, once I've logged in, via which I can write some code of my own. Let me go ahead and zoom in on the screen a little bit here, and let's make the simplest of these programs first, maybe a program that simply says hello, world. Now, at a glance, it's kind of overwhelming how many puzzle pieces there are. And honestly, even over 20 years, I've never used them all, and MIT occasionally adds to them. But the point is that they're color-coded to reflect the type of functionality that they offer. And also, it's meant to be the sort of thing where you can just scroll through, get a visual sense of what you could do, and then figure out how you might assemble these puzzle pieces together. So I'm going to go under this yellow or orangish category here to begin with. The world of Scratch doesn't use quite the same jargon I'm using now, functions and conditionals and loops. That's more the programmer's way; this is more of the child-friendly way, but it's really the same idea.
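The stage's coordinate system, as a small Python sketch. The bounds (x from -240 to 240, y from -180 to 180) are Scratch's stage dimensions; `move` is a hypothetical helper that simply clamps to those bounds, which is simpler than, and somewhat different from, the rule Scratch actually uses to keep sprites partly visible at the edges.

```python
# Scratch's stage is 480 x 360 with (0, 0) at the center.
X_MIN, X_MAX = -240, 240
Y_MIN, Y_MAX = -180, 180

def move(x: int, y: int, dx: int, dy: int) -> tuple[int, int]:
    """Hypothetical helper: move a sprite by (dx, dy) but keep it on the stage."""
    new_x = max(X_MIN, min(X_MAX, x + dx))
    new_y = max(Y_MIN, min(Y_MAX, y + dy))
    return new_x, new_y

print(move(0, 0, 10, 0))     # (10, 0): 10 pixels to the right
print(move(0, 0, 0, 500))    # (0, 180): stopped at the top edge
print(move(0, 0, -1000, 0))  # (-240, 0): stopped at the left edge
```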
Under Events, you have puzzle pieces that represent things that can happen while the world is running. So, for instance, the first one here is the canonical when green flag clicked. Why is that relevant? Well, in the two-dimensional world that Scratch lives in, there's a stop sign, which means stop, and there's a green flag, which means go. So I can drag one of these puzzle pieces over here so that when I click that green flag, the cat will in fact do something for me. It doesn't really matter where I drop it, so long as it's somewhere in the middle here. I'm going to go ahead and let go. Now, I want the look of the cat to change. I want to see, like, a cartoon speech bubble come out for now. So, I'm going to go under Looks here. And there's a bunch of different ways to say things and think things. I'm going to keep it simple and just drag this one here. And now notice, when I get close enough to that first puzzle piece, they're sort of magnetic, and they want to snap together. So I can just let go and, boom, because they're a similar shape, they will lock together automatically. And notice, too, if I zoom in here, the white oval, which by default says hello, is actually editable by me, because it turns out that some functions can take arguments, or more generally inputs, that influence their behavior. So, if I click or double-click on this, I can change it to the more canonical hello world or hello David or hello whatever I want the message to be. I'm going to zoom out. And now, over here at top right, notice that I can very simply click the green flag. And I'll have written my first program in Scratch. I clicked the green flag, it said go. And now notice it's sort of stuck on that, because I never said stop saying go. But that's where I can click the red stop sign and get the cat back to where I want it. So think for just a moment about what it is we just did.
So on the one hand, we have a very obvious puzzle piece that says say, and it said something, but it really is a function, and that function does take an input, represented by the white oval here, otherwise known as an argument or a parameter. But what this really is is just an input to the function. And so we can map even this simple Scratch program onto our model of problem solving from before, with the addition of what we'll call, moving forward, a side effect. A side effect in a computer program is often something that happens visually on the screen or maybe audibly out of a speaker. It's something that just kind of happens as a result of you using a function, like a speech bubble appearing on the screen. So here, more generally, is what we claimed represents the solving of a problem. And let's just consider what the input is. The input to this problem, say something on the screen, is this white oval here that I typed in: hello world. The algorithm, the step-by-step instructions, is not really something I wrote; our friends at MIT implemented that purple say block. So someone there knows how to get the cat to say something out of its comical mouth. So the algorithm, implemented in code, is really equivalent to the say function. A function is just a piece of functionality implemented in code, which in turn implements an algorithm. So the algorithm is sort of the concept, and the function is the incarnation of it in code. What's the output? Well, hopefully it's this side effect: seeing the speech bubble come out of the cat's mouth like this. All right, so that's one such program, but it's always going to play and look the same. What if I actually want to prompt the human for their actual name? Well, let me go back to the puzzle pieces here. Let me go ahead and throw this whole thing away. Okay. And if you want to delete blocks, you can either right-click or control-click and choose from a menu.
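Since the course will soon move to textual languages like C and Python, here is a rough Python analogue of the input, algorithm, output picture for the say block. The function name `say` is a hypothetical stand-in for Scratch's block, with `print` playing the role of the speech bubble.

```python
def say(message):
    """Mimic Scratch's say block: the visible result is a side effect."""
    print(message)  # side effect: text appears on the screen
    # No return statement, so Python implicitly returns None.


say("hello, world")  # input: "hello, world"; output: a line on the screen
```

The caller gets nothing back here; the only result is what appears on screen, which is exactly what makes it a side effect.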
Or you can just drag them back to the palette and let go, and they'll disappear. I'm going to go back in and get another event block, even though I could have reused that same one. I'm going to go under Sensing now. And if I zoom in over here, you'll see a whole bunch of things, like I can sense distance and colors. But more pragmatically, I can use this function in blue, ask something and wait for the answer. And what's different about this puzzle piece is that it, too, is a function. It too takes an argument, but instead of having an immediate side effect like displaying something on the screen, it's essentially going to hand me back the response, inside of the computer. It's going to return a value, so to speak. And a return value is something that the code can see but the human can't. A side effect is something the human sees, but a return value is something only the computer sees. It's like the computer is handing me back the user's input. So, how does this work? Well, notice, and this is a bit strange, this isn't usually how variables work, but Scratch, too, supports variables, and that was a word I used quickly at the very start when we were making the chatbot. A variable, like X, Y, or Z in math, just stores some value, but it doesn't have to store a number. In code, it can store, like, a human's name. So, what's going to happen when I use this puzzle piece is that once the human types in their name and hits Enter, MIT, or really Scratch, is going to store the answer, the so-called return value, in a variable that's automatically called answer. But, as we'll see, you can make your own variables down the line if you want and call them anything you want. But let me zoom out. Let me drag this over here. I'm going to use the default question, what's your name?, but I could certainly change the text there. And let me go under Looks again.
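The difference between a side effect and a return value can be sketched in Python as follows. The `ask` function and its `read` parameter are hypothetical; `read` defaults to Python's built-in `input()` and exists here only so the example can run without a human at the keyboard.

```python
def ask(question, read=input):
    """Mimic Scratch's ask block: return what the user typed.

    `read` defaults to Python's built-in input(); the parameter is a
    hypothetical hook so the function can be tried without a keyboard.
    """
    return read(question)  # a return value, not a side effect


def fake_keyboard(prompt):
    """Stand-in for a human who always types David."""
    return "David"


answer = ask("What's your name? ", read=fake_keyboard)  # stored in a variable
print(answer)  # David
```

Calling `ask("What's your name? ")` with no second argument would wait for real keyboard input instead, much as Scratch shows a text box.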
Let me grab the say block and, just for consistency, say hello, okay? And now let me think about how I want to say this answer. Well, notice this. The shapes are important. This, too, is an oval, even though it's not white, but that's just because it's not editable. It's going to be handed to me by the ask function. Let me zoom out and grab a second say block like this. And notice it will magnetically clip together. I don't want to say hello again, so I could delete that. But now it's still the same shape, even though it's a little smaller. Let me go back to Sensing. And notice what can happen here. When you have values like words inside of a so-called variable, you can use those instead of manual input at your keyboard. And notice it too wants to magnetically snap into place. It'll grow to fit that variable because the shape is the same. And now let's do this. Let me click the green flag at right. I'm seeing, quote unquote, what's your name? I'm getting a text box this time, like on a web page, for instance. Let me type in my name and watch closely what comes out of the cat's mouth as soon as I click the check mark or hit Enter. Huh. Okay, I got my name right, but let me do it once more. Let me stop and start: D-A-V-I-D, Enter. No, it didn't work. Let me try one other. Maybe it's my name. Let's try Kelly. Enter. What's missing? Obviously, the hello. There's a bug, a mistake in this program. But what explains this? Even if you've never programmed before, intuitively, what could explain why I'm not seeing hello? >> Exactly. It's on two different lines, so it's doing one after the other. So it is happening. It's just that you and I, the slowest things in the room, are not seeing it in time because it's happening so darn fast. Because my computer is so new and so fast, it's happening, but way too quickly. So, how can we solve this? We can solve this in a few different ways.
And this is where, in Scratch, at least for problem set zero, you'll have an opportunity to play around with this. I can scroll around here and, okay, under Control, I see something like wait. So I can just kind of slow things down. And notice, too, if you hover over the middle of two blocks, if it's the right shape, it'll just snap into the middle too. Or, just so you know, you can drag things away to magnetically separate them. But this might solve this. So let me hit stop and then start: D-A-V-I-D, Enter. Hello, David. All right, that was a little... let's do maybe two seconds to see it again. Green flag, D-A-V-I-D, Enter. Hello, David. All right, it's working better. It's sort of more correct because I'm seeing the hello and the David, but it's kind of stupid, right, to see one and then the other. Wouldn't it be nice to say it all in one breath, so to speak? Well, here's where we can compose some ideas. So, let me get rid of this wait and the additional block. Let's confine ourselves to just one say block. But let me go down to Operators, where we haven't been before. And this is interesting. There's this bigger oval here that says join two things, like apple and banana. And those are just random placeholder words that you can override with anything you want. But they're both ovals and white, which means I can edit them. So let me do this. Let me drag this on top of the say block, which will therefore override the hello I put there. Now, I don't want to say apple or banana, but I do want to say hello, and then I want to say my name. Okay, so now I can go back to Sensing, go back to answer, and drag and drop this here. That'll snap into place. And let me zoom in. Now what I've done is take a function and, on top of it, I've nested another function, the join function, which takes two arguments or inputs and presumably joins them together, as per its name. So let's see what this does for us. Let me click stop and start.
I'll type in David, Enter. And it's so close. Now, this is just kind of an aesthetic bug. What have I done wrong here? There's no space. So it looks a little wrong, but that's an easy fix. I just need to literally go into the hello block after the comma and hit the space bar, so that now, when I stop and start again and type in David, I see something that's closer to the grammar we might typically expect. All right. So, let's model this after what we saw earlier. We've now introduced a so-called return value. And this return value is something we can then use in the way we want. It's not happening immediately like the speech bubble. It's clearly being passed to me in some way that I can use to plug in somewhere else, like into that join block. So if we consider the role these variables are playing, let's consider the picture now as follows. The input to the first function, the ask block, is, quote unquote, what's your name?, and that's indeed being fed into the ask block. And the result this time is not a speech bubble. It's not some immediate visual side effect. It is the answer itself, stored in a so-called variable, as represented by this blue oval. Meanwhile, what I want to do is combine that answer with some text I came up with in advance by kind of stacking these things together. Now, visually in Scratch, you're stacking them on top, but really you're passing one into the other into the other, much like in math, when you have parentheses and you're supposed to do what's inside the parentheses and then work your way out. Same idea here. You want to join hello and answer together. And whatever that output is then becomes the input to the say block, which, like in math, is outside of the join block itself. So pictorially, it might now look like this. There are two inputs to this story: hello, comma, space, and the answer variable. The puzzle piece in question is join.
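Nesting join inside say works like nested function calls in a text-based language. In this hypothetical Python sketch, `join` and `say` mimic the Scratch blocks; the inner call is evaluated first, and its return value becomes the argument to the outer call.

```python
def join(first, second):
    """Mimic Scratch's join block: return the two strings glued together."""
    return first + second


def say(message):
    """Mimic Scratch's say block: show the message as a side effect."""
    print(message)


answer = "David"  # pretend this value came back from the ask block
say(join("hello, ", answer))  # inner call runs first, like parentheses in math
```

Passing `"hello,"` without the trailing space reproduces the aesthetic bug from the lecture: the result is `hello,David`.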
Its goal in life had better be to give me the full phrase that I want: Hello, David. Let's shift everything over now, because that output is about to become the input to the say block, which itself will now have the so-called side effect. And so this, too, is what programming, and in turn computer science, is about: composing solutions to bigger problems out of the solutions to smaller problems. And that's what each of these puzzle pieces represents: a smaller problem that someone else, or maybe even you, has already solved. Now, we can kind of spice things up here. If I go back to Scratch's interface, we don't have to use just the puzzle pieces here. I can do something like this. Let me drag these apart and get rid of the say block down here. Just for fun, there are all these extensions that you can add over the internet to your own Scratch environment. And if I go to, like, Text to Speech down here, I can, for instance, use a speak block instead of a say block, colored here in green. I can now reconnect the join block in here. And if we could raise the volume just a little bit. Let me stop the old version, start the new version, type in my name, and hear what Scratch actually sounds like. >> Hello, David. >> Okay, not very cat-like, but we can kind of waste some time on this by dragging in the set voice to block. And I can put this anywhere I want above the speak block. So, I'm just going to put it here, even though I've already asked a question. Maybe kitten sounds appropriate. Let's try again. D-A-V-I-D. >> Meow meow. >> Okay. And then let's see, giant. A little creepier. Here we go. D-A-V-I-D. And lastly, >> Hello, David. >> All right. A little ransom-like instead. All right. So, those are just some additional puzzle pieces, really the same idea, but I like that we've introduced some sound. So, let's do this.
Let me throw away a lot of those puzzle pieces, leave ourselves with just the when green flag clicked, and play around with some other building blocks that we've seen thus far. Let me go, for instance, under Sound, and let's make the cat actually meow. So, it turns out Scratch, being a cat by default, comes with some sounds by default, like meowing. So, if we click the green flag after programming this program, let's hear what he sounds like now. Okay, kind of cute. And if you want Scratch to meow twice, you can just play the game again. And a third time. All right, but that's going to get a little tedious, as cute as it is. So, I can solve that. Let's just grab three of the puzzle pieces, drag them together, and let them connect. And now click the green flag. All right. It gets less cute quickly, but maybe we can slow it down so that the cat doesn't sound so hungry. Let me go under Control. Let's grab one of those wait one second blocks and maybe plop a couple of these in the middle here. That might help things. And now click the green flag. Okay. Still a little hungry, but let's see if we change it to two. And then I change it to two down here in both places. Let's play it again. Okay, cuter maybe, but now I'm venturing into badly programmed territory. This is correct if my goal is to get the cat to meow three times, pausing in between. What is bad about this code, even if you've never programmed before? Yeah, in the middle. >> Yeah, I literally had to repeat myself three times, essentially copy-pasting. And frankly, I could have been really lazy and right-clicked or control-clicked and chosen duplicate. But generally, when you copy and paste code or duplicate puzzle pieces, you're probably doing something wrong. Why? It's solving the problem correctly, but it's not well designed.
If only because when I changed the number of seconds, I had to change it in two places. So, I had one initially, then I had to change it to two. And if you just imagine in your mind's eye having not six puzzle pieces but 60 or 600 or 6,000, you're going to screw up eventually if it's on you to remember to change something here and here and here and here. You're going to mess up. It's better to keep things simple and ideally centralized by factoring out common functionality. And clearly, playing a sound and waiting is something I'm doing at least twice, if not a third time here as well. So how can we do this better? Well, remember this thing called loops. Maybe we can just do something a little more cyclically. So I tell the computer to do something once, but I tell it how many times to do that altogether. So notice here, by coincidence, under Control I have a repeat block, which doesn't say loop, but that's certainly the right semantics. Let me drag the repeat block in, and I'll change the 10 to three just for consistency here. I'm going to go back to Sound. I'm going to play sound meow until done, just as before. And just so it's not meowing too fast, under Control I'm going to grab a wait one second and keep it inside the loop. And notice that the loop here is sort of hugging these puzzle pieces, growing to fit however many pieces I actually cram in there. So now if I click play, the effect is going to be the same, but it's arguably not only correct but also well designed, because now if I want to change the wait, I change it in one place. If I want to change the total number of times, I change it in one place. So I've modularized the code and made it better designed in this case. But now this is silly, because even though I want the cat to meow, it feels like in any program in which I want this cat to meow, I have to make these same puzzle pieces and connect them together.
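The same redesign, three copied blocks replaced by one repeat loop, looks like this in a rough Python sketch. `play_meow` is a hypothetical placeholder that records a "meow" in a list instead of playing audio, and the `pause` parameter stands in for the wait block so the example can run instantly.

```python
import time


def play_meow(log):
    """Hypothetical stand-in for "play sound meow until done": records the sound."""
    log.append("meow")


def meow_three_times(log, pause=1):
    # One loop instead of three copies: the count (3) and the pause
    # each live in exactly one place, so changing either is a one-line edit.
    for _ in range(3):
        play_meow(log)
        time.sleep(pause)  # like Scratch's "wait 1 seconds"


sounds = []
meow_three_times(sounds, pause=0)  # pause=0 only so the example runs instantly
print(sounds)  # ['meow', 'meow', 'meow']
```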
Wouldn't it be nice to invent the notion of meowing once and then actually have a puzzle piece called meow? So when I want the cat to meow, it will just meow. Well, I can do that, too. Let me scroll down to My Blocks here in pink. I'm going to click Make a Block, and I'm going to literally make a new puzzle piece that MIT didn't think of, called meow. And I'm going to click OK. Now I have in my code area here a define block, which literally means define meow as follows. So how am I going to do this? Well, I'm going to propose that meowing just means to play the sound meow until done and then wait 1 second. And notice that now I have nothing inside my actual program, which begins when I click the green flag. But notice at top left, because I made a block called meow, I now have access to one that I can drag and drop. So now I can drag meow into this loop. And per my comment about abstracting the lower-level implementation details away, I'm going to sort of unnecessarily dramatically just move that definition out of the way. It still exists. I didn't delete it, but now it's out of sight, out of mind. Now, if you agree with me that meow means for the cat to make a sound, we've abstracted away what it means mechanically for the cat to make that sound. And so we now have our own puzzle piece that I can just use forever, because I invented the meow block already. Now, I can do one better than this. It would be nice if I could just tell the meow block how many times I want it to meow, because then I don't need to bother with loops myself either. So, let me do this. Let me zoom out and go back to my define block. Let me right-click or control-click and just edit it. I could delete it and start over, but I'll just edit it. And specifically, let me say, you know what, let's add an input, otherwise known as an argument, to this meow block. And we'll call it maybe n, for the number of times I want it to meow.
And just to be super clear, I'm going to add a label, which has no functional impact but just helps me remember what this does. So, I'm going to say meow n times, so that when I see the puzzle piece, I know what the n actually represents. If I now click OK, my puzzle piece looks a little different at top left. Now it has the white oval into which I can type or drag input. Notice down here in the define block, I now see that same input called n. So what I can do now is this. Let me go under Control, grab the repeat block here, and do a little switcheroo. Let me disconnect this, plug it inside of the repeat block, and reconnect all of this. And I don't want 10. And heck, I don't even want three down here anymore. I can drag the n input in because it's the right shape. And now I've declared that meowing n times means to repeat the following n times: play sound meow until done, wait one second, and keep doing that n total times. If I now zoom out and scroll up, notice that my usage of this puzzle piece has changed such that I don't actually need the repeat block anymore. I can disconnect this. And heck, I can right-click or control-click and delete it, and just use this under the green flag. Change this to a three. And now I have the essence of this meowing program. The implementation details are out of sight, out of mind. Once they're correct, I don't need to worry about them again. And this is exactly how Scratch itself works. I have no idea how MIT implemented the wait block or the repeat block. Heck, there's a forever block and a few others, but I don't need to know or care, because they've implemented those building blocks and I can then build on top of them myself. I don't necessarily know how to build a whole chatbot, but on top of OpenAI's API, this web-based service, I can implement my own chatbot because they've done the heavy lifting of actually implementing that for me. Well, let's do just a few more examples here.
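The custom "meow n times" block corresponds, roughly, to defining your own function with a parameter. In this hypothetical Python sketch, `meow(n, ...)` hides the loop inside its definition, so the caller only states how many times; `play_meow` again just records a sound in a list rather than playing one.

```python
import time


def play_meow(log):
    """Hypothetical stand-in for "play sound meow until done": records the sound."""
    log.append("meow")


def meow(n, log, pause=1):
    """Our own "meow n times" block: the loop is hidden inside the definition."""
    for _ in range(n):
        play_meow(log)
        time.sleep(pause)  # like "wait 1 seconds"


# The caller just says what it means; the details are out of sight.
sounds = []
meow(3, sounds, pause=0)
print(len(sounds))  # 3
```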
Let's bring the cat all the more to life. Let me throw away the meowing. Under when green flag clicked, how about that forever block that we just glimpsed? Let me now add to the mix what we called earlier conditionals, which allow us to ask questions and decide whether or not we should do something. So under this, let me go under forever and say if the following is true. Well, what Boolean expression do I want to ask? Well, let's implement this program, and we'll figure out if it works. Under Sensing, I'm going to grab this very angled puzzle piece called touching mouse pointer, that is, the cursor, and only if that question has a yes answer do I want to play the sound meow until done. So let me zoom in here. In English, what is this going to implement? Just describe what this program does, less arcanely than the code itself. Yeah? >> Yeah, if you move the mouse over the cat, it will make noise. So, it's kind of like implementing petting a cat, if you will. So, let me zoom out and click the green flag, and notice nothing's happening yet, but notice my puzzle pieces are highlighted in yellow because it is in fact still running, because it's doing something forever. And it's constantly checking if I'm touching the mouse pointer. And if so, it's like I just pet the cat. Now it stopped, until I move the cursor again. Now it stopped. If I leave it there, it's going to keep meowing, because it's going to be stuck in this loop forever. But it's correct, insofar as I'm petting the cat. Let me do this, though. Let me make a mistake this time. Let me forget about the forever and just do this. And you might think this is correct. Let me click the green flag now. Let me pet the cat. And nothing's actually working here. Why, though, logically? Yeah. >> Yeah. The program's so darn fast, it already ran through the sequence. And at the moment in time when I clicked the green flag, no, I was not touching the mouse pointer.
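Why the forever block matters can be sketched in Python by simulating time as a list of hypothetical sensor readings (is the mouse touching the cat at each moment?). The real forever block never stops; the finite list here is an assumption that lets the sketch end. `check_once` mimics the buggy version, which looks only at the moment the green flag is clicked.

```python
def pet_the_cat(readings):
    """Check "touching mouse pointer?" at every moment, as the forever block does.

    `readings` is a hypothetical list of True/False sensor values, one per
    moment; a real forever loop never ends, but this list is finite.
    """
    sounds = []
    for touching in readings:      # "forever"
        if touching:               # "if touching mouse pointer? then"
            sounds.append("meow")  # "play sound meow until done"
    return sounds


def check_once(readings):
    """The buggy version: without forever, only the first moment is checked."""
    return ["meow"] if readings[0] else []


readings = [False, False, True, True, False]  # the mouse arrives a bit later
print(pet_the_cat(readings))  # ['meow', 'meow']
print(check_once(readings))   # []
```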
And so it was too late by the time I actually moved the cursor there. But by using the forever block, which I did correctly the first time, Scratch is constantly checking the answer to that question. So if and when I do pet the cat, it will actually detect as much. All right, how about a few final examples before you're on your way building some of your own first programs with these building blocks? Let me open up a program that I wrote in advance, in fact about 20 years ago. Let me pull this up: a program I wrote called Oscar Time. This was the result of our first assignment in this class, back when MIT was implementing Scratch for the very first time; we needed to implement our very own Scratch program as well. I'm going to full-screen it here. The goal is to drag as much falling trash as you can to Oscar's trash can before his song ends. For which one volunteer would be handy here. Okay, I saw your hand go up quickly, in blue. Yeah, come on up. All right. So, you're playing for a stress ball here, if you will. At some point, I'm going to talk over what you're actually playing, just so that we can point out what it is we're trying to glean from this program. And I'll stipulate this probably took me like 8 to 12 hours. And as you'll soon see, the song starts to drive you nuts after a while, because I was trying to synchronize everything in the game to a childhood song with which you might be familiar. Go ahead and say hello, if you'd like to introduce yourself. >> Oh, hello. So, I'm Han, and I'm a first-year student. I'm pretty excited for this class. >> All right, welcome. Well, here is Oscar Time. If you want to go ahead and take control of the keyboard, all you'll need to do is drag and drop trash that falls from the sky into the trash can. Papa heat.
And it's around this point in the game where the novelty starts to wear off, because there are like three more minutes of this game, where more and more stuff starts to fall from the sky. So, Han, as you continue to play, I'm going to cut over here. You keep playing. Let's consider how I implemented this, starting at the beginning. The very first thing I did when implementing Oscar Time, honestly, was the easy part. I found a lamp post that looked a little something like this, and I made it the so-called costume for the whole stage. And that was it. The game didn't do anything. You couldn't play anything. You'd click the green flag and nothing happened. But then I figured out how to turn the Scratch cat, otherwise known more generally as a sprite, into a trash can instead. And the trash can, meanwhile, is clearly animated, because I realized that, oh, I can give sprites like the cat different costumes. So I can make the cat not only look like a trash can, but if I want its lid to go up, well, that's just another costume. And if I want to see Oscar popping out, that's just a third costume. And so I made my own simplistic animation. And you can kind of see it. It's very jittery, step by step by step, creating the illusion of animation by really just having a few different images, or costumes, on Oscar. Now, I hope you appreciate how much effort went into timing each of these pieces of trash with the specific mention of that type of trash in the music. Okay. 20 years later, still clinging. So, you're doing amazing, by the way. How do we get the trash to fall in the first place? Well, at the very beginning of the game, the trash just started falling from some random location. What does it mean for trash to fall from the sky? Oh, big climax here. You've got a lot of trash on the ground to pick up. There we go. And your final score is... A big round of applause, if we could, for Han. Thank you. Thank you.
So just to be clear, let's decompose this fairly involved program that took me a lot of hours to make into its component parts. So this is just a sprite. And I figured out eventually how to change its costume, change its costume, change its costume to simulate some kind of animation. And I also realized that, oh, I don't need to have just one sprite, one cat or trash can. You can create a second sprite, a third sprite, and many more. So I just told the trash sprite to go to a random location at Y equals 180 and X equals something. I think I restricted X to be in this region, which is why the trash never falls from over here. I just did a little bit of math based on that Cartesian plane that we saw a slide of earlier. And then I probably had a loop that told the trash to move a pixel, move a pixel, move a pixel, down, down, down, until it eventually hits the bottom and therefore just stops. So we can actually see this step by step. And this is representative of how, even for something like your first problem set in CS50, and with Scratch specifically, you might build some of the same. So, I'm going to go back into the CS50 Studio for today, which is linked on the course's website and which has a few different versions of this and other programs, called Oscar 0 through Oscar 4, where zero is the simplest. And truly, I meant it: when I look inside this program to see my code, this was it. There was no code, because all I did was put the sprite on the screen and change it from a cat to a trash can. And I added a costume for the stage, so to speak, so that the lamp post would be fixed there. If I then go to the next version of code, version one, so to speak, then I had code that did this. Now, notice there are a few things going on here. At bottom left, you'll see, of course, the trash can, and then at top right, the trash. Here are the corresponding sprites down here.
So, when Oscar, the trash can, is clicked on here, you see the code I wrote, the puzzle pieces I dragged, for Oscar. And in a moment, when we click on trash, you'll see the puzzle pieces I dragged and dropped for the trash piece specifically. So what does Oscar do? Well, I first switch his costume to Oscar 1, which I assume is the closed trash can. Then, forever, Oscar does the following. If Oscar's touching the mouse pointer, then change the costume to Oscar 2. Otherwise, that is, if not touching the mouse pointer, change the costume to Oscar 1. Well, what's the implication? Anytime I move the cursor over the trash can, the lid just pops up, which was exactly the animation I wanted to achieve. Meanwhile, if we click the green flag, you can see that in action, even for this simple version. If I move the cursor over Oscar, we have the beginnings of a game, even though there's no score, no music, or anything else, but I've solved one of my problems. Meanwhile, if I click on the trash piece here, you'll see no code has been written for it yet. So, let's move on to Oscar version 2 and see inside it. In Oscar version 2, when I click on trash, ah, now there's some juicy stuff happening here. In fact, this trash sprite has two programs, or scripts, associated with it. And that's fine. Each of them starts with when green flag clicked, which means the piece of trash will do two things at once, essentially in parallel. The first thing it will do is set drag mode to draggable. And that's just a Scratch thing that lets you actually move the sprites by clicking on them, making them draggable. Then it goes to a random X location between 0 and 240. So yeah, that must be what I did: from the middle all the way to the right. And I set Y always to 180, which is why the trash always comes from the sky, from the very top. Then I said forever change your Y by negative one.
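The trash's falling behavior in Oscar 2 can be approximated in a short Python sketch. `drop_trash` is a hypothetical name, the bounded `ticks` loop stands in for forever, and stopping exactly at y = -180 is a simplification of how Scratch keeps sprites on the stage.

```python
import random


def drop_trash(ticks, rng=random):
    """Sketch of Oscar 2's falling trash, under simplifying assumptions.

    Starts at a random x between 0 and 240 and at y = 180, then moves down
    1 pixel per tick, never going below the bottom edge (y = -180).
    """
    x = rng.randint(0, 240)    # "go to x: (pick random 0 to 240) y: 180"
    y = 180
    for _ in range(ticks):     # stands in for "forever", bounded so it ends
        y = max(y - 1, -180)   # "change y by -1", but never below the stage
    return x, y


print(drop_trash(10))  # x varies from run to run; y is 170
```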
And here's where it's useful to know what 180 is, what 240 is, and so forth. Because if I want the trash to go down, so to speak, that's changing its Y by a pixel, by a pixel, by a pixel. And thankfully, MIT implemented it such that if the trash tries to go off the screen, it will just stop automatically, even if it's inside of a forever block, lest you lose control over the sprites altogether. But in parallel, what's happening is this. Also, when the green flag is clicked, the trash piece is doing this too, forever. If touching Oscar, what's it doing in blue here? Sort of teleporting away. Now, to your eye, hopefully it looks like it's going into the trash can. But what does it mean to go into the trash can? Well, I just put it back into the sky, as though a new piece of trash is falling. So even though you saw one piece of trash, two, three, four, and so forth, it's the same sprite just acting that out again and again. So here, if I click play on this program, you'll see that it starts falling one pixel at a time. Because it's draggable, I can sort of pull it away and move it over to the trash can like that. And as soon as I do, it seems to go in, but really it just teleported to a different X location, still at Y = 180. Again, it's not much of a game yet. There's no score, no music, or anything, but let's go to Oscar 3 now. And in Oscar 3, if we scroll over to the trash, even more is happening here, insofar as I realized, you know what, there was kind of an inefficiency before. Previously, I had these two programs, or scripts, whereby both went to the top by picking a random X from 0 to 240 and then 180 for Y. And if you noticed, I used that here and I used that down here, in both programs. Now, that too is kind of stupid, because I literally copied and pasted the same code. So if I ever want to change that design, I have to change it in two places, and I already proposed that we frown upon that. So what did I do in this version?
I just created my own block, and I decided to call my own function go to top. What does it mean to go to the top? Pick a random x between those values and fixate on y = 180 initially. Now in both of those programs, which are otherwise identical, I just say what I mean: go to top, go to top. And if I really wanted to, I could drag this out of the way and never think about it again, because now that functionality exists. So it's correct, but arguably better designed. I've now factored out commonality so as to use and reuse my code as well. So let's go up to Oscar version 4 now. And in Oscar Time version 4, the trash can does a little something more, whereby what have I added to this mix, even though we haven't dragged this puzzle piece together before? Yeah. What's new? >> Score. >> Yeah. So, it turns out on the left here, there's a variables category, which goes beyond the answer variable that we just automatically get from the ask block. You can create your own variables: x, y, z. But in computing and programming, it's best to name things not with silly simple words like x, y, and z, but with full-fledged words that say what they are, like score. So, I'm setting a score variable to zero. And then any time the trash is touching Oscar, before it teleports away to the top, I change the score by one. That is, I increment the score by one. And what Scratch does automatically for me is put a little billboard up here showing me the current score. So if I now play this game once more, the score is going to start at zero. But if I drag this trash over here and even let it fall in, as soon as it touches, the score goes to one. And now if I click and drag again, as soon as it touches Oscar, the score is going to go to two, and so forth. And as you saw in the final flourish with Han playing, once you had the sound and other pieces of trash, which are really just other sprites, I just had them wait, like, a minute or two minutes so that the trumpet would fall at the right time.
I've broken down a fairly involved program into these basic building blocks. And when you too write your own program, that's exactly how you should approach it. Even if you have these grand aspirations to do this or that, start with the simple problems and figure out what bites you can bite off in order to make progress: baby steps, if you will, to the final solution. Well, let's look at one other set of examples before we have one final volunteer come up. And as you'll soon see, it's tradition in CS50 to end the first class with cake. So, in a moment, cake will be served out in the transept. And please feel free to come up and say hi and ask questions if you'd like to. Let me go ahead and open up, though, a series of building blocks here via which we can make so-called Ivy's Hardest Game, which was implemented by one of your predecessors, a former classmate from CS50. So here we have a whole bunch of puzzle pieces written by your classmates, but let me go ahead and zoom in on this screen. You'll see that this Harvard crest is my sprite. So it's not a cat, it's not a trash can, it's a Harvard crest, and it exists in a very simple two-dimensional world with two walls next to it. If I click on the green flag, notice that with my hands here, I can go up, I can go down, I can go left, and I can go right. But if I try going too far right, I get stuck on the wall. If I go too far left, I get stuck on the wall. Well, it's sort of the beginning of any animation or game. But how do I do this? Well, let me go up here and propose that the first thing the Harvard sprite does is go to the middle, 0 comma 0. And it's then forever listening for the keyboard and feeling for walls. Now, those are functions I implemented myself to kind of describe what I wanted the program to do. And let's do the shorter one first. What does it mean to feel for the walls? Just to ask the question: if you're touching the left wall, change your x by one.
If you're touching the right wall, change your x by negative one. Why have I defined touching walls in this weirdly mathematical way? Yeah. >> Sure. Yeah. >> Like, it counteracts the movement. Otherwise, you're like not moving. >> Exactly. Because if I've gone so far right that I'm touching the right wall, well, I'm already kind of on top of the wall a little bit. So, I effectively want the sprite to bounce off of it. And the easiest way to do that is just to say back up one pixel, as though you can't go any further. And same for the left wall. Meanwhile, let me scroll over to the second script, or program, that's running in parallel. It's a little longer, but it's not more complicated. What does it mean to listen for keyboard? Well, just check: if the key up arrow is pressed, change y by one. Up arrow, go up. Else if the key down arrow is pressed, then change y by negative one. If the key right arrow is pressed, change x by one, and so forth. So again, this is where the math and the numbers are useful, because they give you a world in which to live: up, down, left, right, deconstructed into some simple arithmetic values. All right, so the net result is that we have a crest living in this world. Well, let's add a bit of competition here. And in the second version of this game, let me go ahead and full screen it again. Click play. And now we'll see sort of an enemy bouncing back and forth autonomously. So there's no one playing except me. I'm controlling Harvard. Yale is bouncing on its own. And nothing bad's going to happen if it hits me, but it does seem to be autonomous. So how is this working? Well, if it's doing this forever, there's probably a forever loop involved. So, let's see inside here. Let's click not on Harvard, but on the Yale sprite. And sure enough, if we focus on this for a moment, we'll see that the first thing Yale does is go to 0 comma 0. It points in direction 90°, which just gives you a sense of whether you're facing left or right or wherever.
And then it forever does the following. If it's touching the left wall or touching the right wall, and I was a little clever this time, if I may, I just kind of turn around 180 degrees, which effectively bounces me back in the opposite direction. Otherwise, I go ahead and, no matter what, just move one step. And this is why Yale is always moving back and forth. So, a quick question. If I wanted to speed up Yale and make this beginning of a game harder, what would I do? Yeah. >> Yeah. So, let's have it move like 10 steps at a time, right? This looks like a much harder game, if you will, like level 10 now, because it's just moving so much faster. All right. Well, let's try a third version of this that adds another ingredient. Let me full screen this and click play. And now you'll see the even smarter MIT homing in on me by following my actual movements. So, this is sort of like boss-level material now. And it's just going to follow me. So, how is this working? Well, it's kind of a common game paradigm, but what does this mean? Well, let's see inside here. Let's click on the MIT sprite. It's pretty darn easy. Go to some random position, just to make it a little interesting, lest MIT always start in the center, and then forever point towards the Harvard logo outline, which is the name the former student gave to the costume that the sprite is wearing that looks like a Harvard crest, and then move one step. So, as a corollary of the previous question, how do we make the game harder and MIT even faster? Well, we can change this to be like 10 steps, and now you'll see MIT is a little twitchy, because this is kind of a visual bug. Let me make it full screen. Why is this visual glitch happening? It's literally doing what I told it to do. It just looks stupid. Yeah. Say again. >> Yeah. It's moving so fast that it's sort of going 10 pixels this way, but then it kind of overshot me. So then it's doubling back to follow me again, and it's doubling back this way.
And because these are such big footsteps, if you will, it just has this visual effect of twitching back and forth. So, we might have to throttle that back a bit and make it five or two or three instead of 10, because that's clearly not desirable gaming behavior here. All right. Well, let's go ahead and do this. Let's put them all together, just as your former classmate did when submitting this actual homework. The game will conclude, hopefully, in an amazing climax where you've won the game. So, we need someone, ideally with really good hand-eye coordination, to play this final game here. Yeah, your hand went up first, I think. Okay, come on up. Big round of applause, because this is a lot of pressure to end on. All right. So, if you win the game, cake will be served. If you don't win the game, there will be no cake. >> Okay. But introduce yourself in the meantime. >> Hi, I'm Jenny Pan, freshman at Hollis, and I'm actually a CS major, or concentration. >> Nice to meet you. Head to the keyboard here. This now is the combination of all of those building blocks and even more, aka Ivy's Hardest Game. You will be in control, just as I was, of the Harvard crest. And the goal is to make it to the exit, which is this gentleman on the right here. And you'll see there are multiple levels, where each level gets a little harder. All right, here we go. All right, this is CS50, and this is week one, our second week together. And you'll recall that last week, week zero, we focused on Scratch, this graphical programming language by which you can drag and drop puzzle pieces that interlock together only if it makes logical sense to do so. And many of you had probably actually played with that in middle school or even earlier at some point. But for our purposes, the goals of Scratch were to give us sort of a mental model for some fundamental constructs that we're going to see again and again: today in C, in a few weeks in Python, and even thereafter.
And those include things like functions and return values and arguments and variables and loops and conditionals and more. And so even if today feels like a bit of a fire hose, such as that picture here, appreciate that a lot of today's ideas are exactly the same as last week's ideas; it's just that the syntax is going to change. It's going to look a little different. It's going to look a little scarier. It's going to be harder to sort of memorize, except that with practice will come that muscle memory, but the ideas ultimately are going to be the same. And indeed, if unfamiliar, MIT down the road has a tradition of hacks, whereby students once a year do something fairly crazy. And at this point, they happened to connect an actual working drinking fountain to an actual fire hydrant. And the sign there, very pixelated, says, "Getting an education from MIT is like trying to drink from a fire hose." And that's indeed how computer science, how programming, how CS50 will sometimes feel. But realize that what's going to be ultimately most important is not where you feel you are day after day, but where, 3 months from now, you feel that you are relative to last week, so-called week zero. So, let's look back at what week zero looked like. It looked a little something like this: the simplest of programs, by which we get that cat to say hello world. Today, that same code is going to start to look a little like this, which was a glimpse we gave you last week. But this time, I've deliberately color-coded it to try to send the message that in Scratch, we had this yellowish puzzle piece that sort of kicked things off and didn't really do anything itself, but it got the program started, whereas the real work was done in purple here. The same is going to be true today, whereby I'm going to wave my hands for a little bit of time at this yellowish code on the screen.
But what's really going to have the most effect is this same purple line here and the white text within. And we'll break down what all of these lines mean over the next couple of weeks. But sometimes we'll wave our hands at details if we feel they're a little unnecessary at this point in the story. And in fact, let me get rid of the color coding for now. And we'll see that this is the kind of code, in a language called C, that we're going to start playing with and using today and for the next several weeks. And indeed, it's representative of what we're going to generally call source code. So source code is what programmers write. It's what you write. It's what you wrote, albeit by dragging and dropping puzzle pieces. This week onward, you're going to start using your keyboard all the more, and you're going to write source code. So this is code that we humans can understand with some training and with some practice. But of course, per last week, what language do computers ultimately understand? Only... >> So, binary, zeros and ones. And so you and I, yes, can write code starting today in a form that looks a little something like this, which admittedly might look a little arcane and cryptic, but it's certainly better than a whole bunch of zeros and ones. So we're going to write in source code. But the machines that we write code for ultimately only understand these here, zeros and ones, which may very well say hello world, but we're going to call this, moving forward, machine code. So machine code is what the computers understand: only the zeros and ones. Source code is what you and I understand and actually write. So it stands to reason that we're going to have to somehow translate one to the other, from source code to machine code. And I alluded to this ever so briefly last week, but we're going to use this same mental model, whereby the source code we write might be the input to some problem. The output we want therefrom is going to be the machine code.
So what we're going to equip you with today, inside of this proverbial black box, is a special piece of software that takes source code as input and produces machine code as output, and that type of program is called a compiler. And there are bunches of different compilers in the world. We're going to have you use one of the most popular ones, but it's simply a piece of software that someone else wrote that converts one language to another: source code, for instance, in a language called C, to machine code, the zeros and ones that our Macs, PCs, phones, and other devices actually understand. So, where are we going to do this, and how are we going to do this? So, I promised last week that we'd introduce you to this here tool, which I used briefly at the very start of class to whip up that chatbot. We're going to use it, though, not for Python this week, but indeed for a different language, C. And indeed, this tool, Visual Studio Code, or VS Code for short, is super popular in industry. This is what real programmers, so to speak, are using all of the time nowadays. There are absolutely alternatives. If some of you have programmed before, you might have used or experienced different tools, but this is a very common tool that you'll see even after CS50. And in fact, it's something that ultimately you can install for free on your own Macs and PCs, so that by the end of the course, you're completely independent of CS50 and any CS50-related tools. But what we have done for the very start of the class is essentially provide you with a cloud-based version of this tool. So all you need is a web browser on any Mac or PC or the like, and everything's preinstalled for you, preconfigured for you, and you don't have to deal with the stupid technical support headaches at the start of the term, because it should just work. But by the end of the term, once you're a little more comfortable with technology and with code in particular, you can absolutely offboard yourself from this tool.
Install it, download it on your own Mac or PC, and have pretty much the exact same environment completely under your control. So, starting today, you're going to see an interface that looks quite like this quite often. And we used this same interface last week ever so briefly. Moving forward, here's where we're going to write code. At top right is where one or more code tabs are going to appear, similar to any tabbed environment that you might use. Here, for instance, is just a screenshot of the first file we'll create today, called hello.c. The reason it's called hello.c is because it's in a language called C, as we soon shall see, no pun intended. Meanwhile, the code here happens to be color-coded, not quite in the same way as you saw before, because I manually made it look more like Scratch blocks. But among the features that VS Code and other programming environments provide is something called syntax highlighting, whereby you don't worry about or even think about these colors. As you write out code in a recognized language, tools like VS Code will just color-code different parts of your code for you, just to make different features jump out. And we'll see what those features are over the course of today. But you'll also spend a good amount of time, as I briefly did last week, down here in the bottom right of your screen, in the so-called terminal window, which is going to be where you run commands for compiling code and writing code. And in fact, as we'll see today, you're going to start using your mouse and clicking a little bit less. You're going to start using your keyboard and typing a bit more. And ultimately, even if at first that might feel like a step backwards, to sort of not use something that's so user friendly, the reality is most every programmer tends to find themselves ultimately much more productive, much more powerful, using the keyboard more often and more quickly than, say, a traditional mouse or trackpad would allow.
Meanwhile, we'll see some somewhat familiar features here at left: this is where you'll see the files and folders that we'll create over time. At far left here is going to be an activity bar, which is essentially a modern form of a menu via which you can open and close things and access other features. For my purposes, I'll generally hide this part here and this part here, so that when we're together, we're focusing almost entirely on code and commands; I'm just typing some quick keyboard shortcuts to simplify my own user interface in that way. So, with all that said, just some terminology. This whole collective environment that I'm describing here is generally what's known as a graphical user interface. Why? Well, it's an interface for users that's graphical in nature, with icons and buttons and the like. Shorthand notation for this is GUI, pronounced "gooey." But within this graphical user interface, as promised, is going to be that terminal window at bottom right, where I promised we would be typing most of our commands. And just to give you a bit more jargon in computing, that's generally known as a command-line interface, or CLI for short, whereby you're typing commands into that interface instead. And the world of computing software is essentially divided into GUIs and CLIs, and sometimes a piece of software might have one of each as well. But without further ado, why don't we go ahead and focus entirely first on this here program, which I dare say is the simplest program you can write in a language like C, and see how we can actually compile and run it together. So, I'm going to go over to VS Code here, where I've hidden my file explorer with all the icons and I've hidden my activity bar, so that I only have room for tabs of code and the command prompt at the bottom. I'm calling this a command prompt because it's at this dollar sign where I'm going to run some of my commands. And it's a dollar sign by convention.
It has nothing to do with currency. It's just a computing convention. Some systems will use a caret symbol, some a greater-than symbol, or something else. But it just means type your commands here. The first such command I'm going to type is this: code hello.c, with a single space in between. I've not used any spaces in the name of the file. I've not capitalized any aspect of the file, just because this is convention. Unlike on your Mac or PC, where you might be in the habit of naming files with spaces and capitalization, generally you'll make your life simpler by just using lowercase and no spaces at all. As soon as I hit enter, what you'll see is that a brand new tab appears called hello.c, with a cursor blinking on line one. And this is essentially VS Code waiting for me now to type the first line of my code. Notice, though, that the command is complete there, whereby I have another cursor here: if I click in the terminal window and give it focus, my cursor might blink there instead. That just means I can type another command when I am ready. So let's go ahead and whip up this code. I've done this many times, so I can type it fairly quickly, but in this tab I'm going to type #include <stdio.h>, so to speak, then int main(void), then, inside of so-called curly braces, indenting therein by four spaces, I'm going to say printf, quote unquote, hello, world, backslash n, close quote, semicolon. And voila, I've written my first program in C. In a class like this, there's no need to write down each and every line of code that I write. In fact, on the course's website will be copies of everything that we've done, as well as excerpts therefrom in the course's notes, but you're welcome, though not expected, to follow along in real time with what I am typing here. So that's it. I've written my very first program in C.
If I had done this on an actual Mac or PC without a command-line interface, I might have a new icon on my desktop, so to speak, called hello. And ideally, I could double-click on that or tap on it and run the program. But because I'm in this specific programming environment that has a mix of a GUI and a CLI, I actually need to click down in my terminal window. And I now need to compile this program first, because at this point in time, it exists only as source code. So to do this, I'm going to compile my code by very aptly saying make space hello. And I'm pronouncing the space, but literally I hit the space bar. Make hello, as it sort of implies semantically, will make a program called hello. Notice I have not said hello.c again, because the compiler, let's call it make for now, even though that's a bit of a white lie, is going to infer that if I want to make a program called hello, it should automatically look for a file called hello.c in this case. So, a bit of magic. Enter. And remarkably, anytime you don't see any output at a command like this, that's probably a good thing. Generally speaking, when you see output when compiling your code, you have done something wrong, or in this case, I might have done something wrong. But no output is good, because what I can now do, and this is a bit cryptic, is run this program not by double-clicking or tapping anywhere but by typing dot slash hello, with no spaces. And this is a bit weird, but what the dot slash means is that, having just made a program called hello, that program is going to end up in my current folder. It's somewhere in the cloud, yes; more on that in a bit. But the program called hello is just somewhere in my current folder. When I say dot slash, that's like saying go into the current folder and run the program therein called hello specifically.
Now, as I often do, I'll cross my fingers, hope that I didn't mess this up in any way, and I should see in a second hello world indeed printed onto the screen. And so, just to recap those commands: one, I ran code hello.c, which is a VS Code-specific thing. Code, short for VS Code, just creates a new file called hello.c, and then I'm on my way with my own keyboard. Make hello compiles that source code into machine code, thereby creating a new file called hello. And to run that program, hello, I type this strange command, ./hello. But this is a paradigm we're going to see again and again and again, no matter what you call your programs. So even if you've not done something quite like this, it will very quickly get familiar. Yes, questions. >> When you say make hello, how does the computer know what part of the code is ascribed to hello? >> Good question. When I say make hello, how does the computer know what part of the code is ascribed to this program hello? It literally is going to take the entire contents of hello.c and turn them somehow into a program. >> And does it have to be named hello? >> Does it have to be named hello? No. I could have called it goodbye or anything else, anything at all, so long as I change these words here accordingly. >> But it has to be, like, the same thing? Like it needs to >> Yes. >> have, like, green.c and make green or whatever. >> Exactly. If you change the name there, you need to change your commands accordingly. Other questions on these here steps? No. All right. So let's tease apart what it is we just did and why this code works in the way that it does. Well, to recap, in Scratch, we had a program like this. When the green flag was clicked, we wanted to say hello world onto the screen. The code that corresponds to that is roughly here.
And indeed, notice that the yellowish or orangish code lines up with the when green flag clicked. The purple code here lines up with the say block. And the white code inside of here roughly corresponds to what was in the white oval that we kept using again and again last week. So, let's do more of a one-to-one correspondence. And these slides are deliberately designed to give you, again, that sort of mental model of taking the same ideas from last week and just changing the syntax this week onward. So when we have a function like this thing here, and recall that a function is just an action or verb, it sort of accomplishes a small piece of work. In code, in C specifically, you're going to type, of course, not a purple puzzle piece, but you're going to say the word print. Well, more technically printf, where the f, as we'll soon see, means format the printed output, because this is more powerful than just printing some raw text alone. Then you can have parentheses, open and close, left and right. And notice that it's no accident that MIT chose an oval for their input to functions, because it roughly looks like the start of a parenthesis and a parenthesis on left and right. Meanwhile, what goes inside of the parentheses in the corresponding C code? Well, at the end of the day, minimally hello, world, because that's literally what we want to print to the screen. But in C, unlike in Scratch, there's a bit of overhead, a bit of additional syntax that you've just got to deal with to make clear to the computer what you want to print. In particular, you're going to have to surround everything you want to print with double quotes, to make clear that hello is not some special function or variable or something else; hello, world is the English phrase that you want to print. So double quote here, double quote there means here's the beginning and the end of what I want to print. You're also, curiously, going to put a backslash n in most cases at the end of the word or words you want to print.
We'll take that away in a moment and see what it does. And then lastly, and perhaps most annoyingly in programming circles, you have to finish your thought with a semicolon, much like in English you would finish most sentences with a period instead. And the thing about programming, with C in particular, is that if you mess up almost any of these details I just rattled off, something's going to go wrong. And so you're in good company: the very first program you try to write or compile, odds are, might not work correctly, because you'll only develop over time the muscle memory for spotting all of these seemingly minor, and actually minor, details that do matter to the computer. All right. So if you're familiar, of course, with the notation of functions in mathematics, a function in code is really the same idea as a function in math, whereby the function f takes some input, for instance x, and generally produces some output. So if you're coming more from that background, realize that what we're really doing here is roughly the same. But in code, recall that we can have different types of output. So if this is our grand mental model, and say we've got a function inside of this black box that takes arguments, that is to say, inputs, it can sometimes have side effects. And recall that side effects are often visual things that happen as a result. They display on the screen. Maybe something comes out of the speaker. It's something generally ephemeral that just happens. But it's not necessarily useful in the same way as another type of function that we'll return to in just a bit. But last week, recall that we got the cat with a speech bubble to manifest on the screen and say hello world in that speech bubble when the input was hello world and the corresponding function was instead say. So let's see if we can't now tease apart what the code we wrote is actually doing for us, bit by bit.
So let me go back to VS Code here and propose to break this in a little way. Let me delete the backslash n, if only because at first glance who knows or cares what that's doing. Let's just get rid of it if we don't understand it. I could now go back down to my terminal window and do ./hello, enter, again. But there's seemingly no change, which is good. Doesn't seem like I broke it, but I've kind of misled you here. Why? Why did nothing seem to change? I didn't recompile it. So, recall that the compiler converts source code to machine code, but I already did that a couple of minutes ago. If I've changed the source code, it stands to reason that I need to recompile the code to actually see the effects of that. So, let me do that again. Make hello, enter. Nothing seems to have gone wrong, but let me now do ./hello, enter. And it's subtle now. In fact, let me go ahead and zoom in. It's really just an aesthetic bug, insofar as functionally the program is still technically printing hello world. But what's seemingly wrong? Or, put another way, what did the backslash n apparently do? Yeah. >> Yeah. So, it's somehow giving me a new line. And that's essentially what the backslash n denotes: give me a new line there. And why was I doing that? Well, really just for the aesthetics. If this dollar sign represents my prompt where I type commands, it just looks kind of stupid that my program finished over here and then the prompt is on the same line. It just looks wrong, even though you could sort of argue that was my intent, even though in this case it wasn't. So, what would the alternative be? Well, what you're seeing here is what's generally known as an escape sequence: a special sequence of symbols, like backslash and n in this case, that does a little something unusual. And here's just a non-exhaustive list of some you'll encounter in the real world, including in CS50. Backslash n moves you to a new line.
Backslash r is a so-called carriage return. If you've ever seen or used an old-school typewriter, this refers to the process of bringing the typing head back to the left end. So it sort of moves the cursor horizontally as opposed to vertically. This one's interesting: backslash double quote. Why does there exist this pattern, backslash double quote? Yeah. >> If you just write a double quote, it closes the >> Exactly. So recall that phrase we tried to print out, like hello, world. If for some reason you didn't want to say just hello, world, but you wanted to say, sort of snarkily, hello, with world in quotes, or something like that, well, you can't put a quote, a quote, a quote, and a quote and expect the computer to know which quote corresponds to what. It's just ambiguous. So if, inside of double quotes, you actually want to print actual double quotes, this is an escape sequence that tells the computer: this is not some quote delineating where my thought begins and ends; this is literally a double quote. And we'll see other situations in which a single quote, or apostrophe, is the same. We'll even see crazy situations in which you want to print a backslash, but backslash already has some special meaning. So there are solutions to all of these problems, but let's not get too far into the weeds here. Let me go back to the code and propose what the alternative otherwise might have been. If I didn't know about backslash n, my instinct to move the cursor to the next line might have been literally to just hit enter, or do something like this: move the double quote, move the parenthesis, move the semicolon onto the next line. But this should start to rub you the wrong way. And indeed, this violates a principle of most programming languages, in that most programming languages are line-based: you sort of start and finish your thought, ideally, on the same line. And this runs afoul of that.
And two, even if you're seeing code for the first time, I'd assume this looks stupid as well: moving part of your thought to the next line just looks a little sloppy. And it is. So C and many other languages, Python among them, solve this by giving you these so-called escape sequences. If you want a new line there, you write backslash n and you will get your new line there. Now, what I said is a bit of an overstatement, in that sometimes lines of code will be so long that they do wrap onto multiple lines, but generally that's something we're going to try to avoid. All right, what else could go wrong? Let me go ahead and clear my terminal window, which I can do by hitting Ctrl+L, or I can literally type clear. I'm going to do this frequently just to keep the screen clean, even though it has no functional impact. It's just aesthetic. Now let me do something else accidentally. Suppose I forgot to finish my thought and omitted the semicolon, but otherwise the code is perfect. Let me run make hello, Enter. Now we're going to see some output that's a little more arcane. Let me scroll back up here to make clear what's just happened: I ran make hello, but I didn't get back to another prompt. I don't immediately see a dollar sign, because there's an error message here that is almost as long as the code I tried to write. Not to worry. Let's see. Here is the name of the file in which the problem exists; it stands to reason that it's in hello.c. Here is the line number in which the problem seems to exist, line five. And that's helpful, because it lines up with this. And then, if you care to count, this is the 29th character. So if I count from left to right, around character 29, something is wrong. Something is missing. So it's a pretty decent error message. In fact, it even says expected ';' after expression. And there's a little green caret symbol pointing me at the mistake.
So this is another value of the compiler. Not only does it know how to convert source code to machine code, it's also pretty good at finding mistakes in your code and trying to draw your attention to them. So how do I fix this? Assuming you've understood the error message at this point, you just go back in and add the semicolon. Let me go back down to my terminal window. I'm going to clear it just to clean up the mess. Let me rerun make hello. And now we are back in business. And indeed, if I run ./hello, I've got hello, world back on the screen. Well, let's make one other mistake. Suppose that I forgot, as you sometimes will, to include this line at the top, which will make more sense next week. For now, let's just omit it and dive right into the code. You would think this is enough, just printing out hello, world. Let me go back down to my terminal window and run make hello again. And I'm going to get a whole different error message instead. The problem is still with hello.c; that makes sense. Line three. Okay, so somewhere in there, printf is suddenly the problem, even though the semicolon is back and the backslash n is back. So let's keep reading: error: call to undeclared library function 'printf' with type int, and so forth, which is a whole mouthful. So here is an example of an error message that, unless you're conditioned to know what it means and you've seen it before, is quite a bit more cryptic, and it's unclear what the solution to the problem is, especially when the rest of your code is truly correct. I've just forgotten something stupid. So how can I think about this problem? Well, it turns out that another feature of C is that it comes with a bunch of header files: files whose names don't end in .c, but end in .h. And these so-called header files, which end in .h, contain code that other people wrote that you can use in your own programs.
So for instance, in this particular case, a header file is giving us access to what's more generally called in computing a library. A library is code someone else wrote that you can use. And I actually used a library last week when I wrote that import line and mentioned OpenAI, the company. I was using a library from that company that I had downloaded and installed into my programming environment in advance of class, because I don't know how to implement a chatbot without standing on their shoulders and using a lot of the code they themselves wrote. Same idea here. Even though printf is a feature of C, if you want to use it, you have to include that library by telling your program to include the header file that declares that function. And you only know this by being taught it or looking it up in a book or a reference. In this case, I wanted to use a header file called standard I/O dot h, spelled stdio.h. It is not studio.h. This is a very common bug online. If you find yourself typing studio.h, that's a typo; it's stdio.h. And in that file is declared the printf function. So if I go back to my code here, the solution to this problem truly is to just undo the deletion I made a moment ago, because what line one is now doing for me is telling the compiler: by the way, I didn't write all the code that I'm about to use; please include the declaration of printf from this other file called stdio.h. And again, you'd only know this by looking it up in a reference, attending a lecture, or something like that. It's not obvious otherwise, but these are the kinds of things you very quickly look up. So where do you look them up? It turns out the ecosystem of C has hundreds of books you can buy or download, and many, many websites. Among them is one of CS50's own.
In fact, the conventional way to look stuff up for the programming language called C is to look at the official manual pages, or man pages for short. Unfortunately, many of them were written decades ago, and they were certainly written by fairly advanced programmers, not for a broad audience. So what we have done is import all of that freely available documentation, host it at our own URL here, manual.cs50.io, and simplify it for those less comfortable, those of you who might be less familiar with technology, and really for most people who aren't used to reading manual pages. It's just useful to have it written in teaching-assistant-like language instead. So for instance, if you go to a URL like this, you'll see CS50's documentation for the official header stdio.h that comes with C itself. If you go to a URL like this, you can look up the documentation for printf itself specifically. Let me give you a teaser. If I were to do the same on my own computer, I'd see the CS50 manual pages here, and you'll see, header file by header file, a bunch of functions frequently used in CS50. We've also filtered the list down from a massive one to a much shorter one, so that you can see what's most likely useful to you. If you go to a specific page like stdio.h, you'll see, for instance, just over a half dozen functions that we won't touch on today beyond printf, but that we'll see in the class over time, and that do useful stuff. For instance, printf prints to the screen. And we'll see other functions for opening files, closing files, and the like, because all of that's related to standard I/O, input and output. If I go to a specific man page from this header file, you'll see the standard formatting for these pages. So here's the name of the function, printf, and it prints to the screen.
You'll see a synopsis, and this indeed indicates we're in less-comfortable mode. If you want to see the original, more arcane documentation, just uncheck that, and you'll see the original official documentation. You'll see a mention of what header file this function is declared in, so that you know what file to include in your own code. You'll see a so-called prototype, which is just the first line of code from that function. More on that in just a little bit. You'll see an English description. You'll see example code. Long story short, this is the authoritative answer. And even though you have access in this class to the virtual rubber duck at cs50.ai, and in other forms that you'll soon see, you should also have the tendency and the instinct moving forward to check the official documentation. All of today's AIs are trained on things like the official documentation, so that's the source material that any of these AIs, the duck among them, are actually relying on. But what we're also going to see is that besides these official functions, there are some that CS50 itself has invented. We use these really as training wheels for just the first few weeks of the course, and then we take these training wheels off. The reality is that in a language like C, certain stuff is just really hard or annoying to do, certainly if you're learning how to program for the very first time, or at least if you're new to C. We'll eventually show you how to do it that way. But even if you just want to get input from the user, like a string of text or a number of some sort, it's generally not that easy to do in C, at least in these early days. So for instance, at this URL here, you can see documentation for CS50's own library and CS50's own header file, cs50.h. And you'll see such functions in the documentation as get_string, get_int, get_char, and a bunch of others as well. And we'll touch on those this week.
But it will ultimately be a way of just getting useful work done quickly, by standing on our shoulders and using functions we wrote to solve problems of interest to you. So let's focus first on one of these: get_string. A string, in programming speak, means text: zero or more characters of text, like h-e-l-l-o, comma, space, w-o-r-l-d. That is a string of text in computer speak. And it's obviously not a number like 50; it's actual text that you would type on the keyboard. We'll see what other things we want to get later. But with this function, we can start to replicate another program that we implemented pretty quickly last week in Scratch. Recall that in Scratch, this one was a little more interactive. I used another blue puzzle piece, ask, to actually get input from the user. And recall that, unlike the printf function today and the say block last week, this time we still have the same input-output model, but if we pass arguments to a function like the one we're about to see, you can get back not just a side effect, but a return value: a useful, reusable value, like the person's name, as we'll soon see. All right, so let's actually do this. In Scratch, the equivalent was asking the user "What's your name?" and then waiting for an answer that we can store in a variable. Let me propose that in C, side by side, it's going to look a little something like this. At left, we have the Scratch block: the ask function, here is its argument, and then "and wait" just means it's going to wait until the user finishes typing. If I want to translate this to C, it looks a little something like this. The closest analog in C, thanks to CS50's library, is going to be a function called get_string. There's no C function called ask. And we deliberately named this function get_string just to make super clear what it is you are getting: a string of text, in this case.
And we've got the parentheses ready to go, indicative of this white oval for user input. If I want to prompt the user with that same phrase, what's your name, I can just put it inside of those parentheses. But what next do I need to add around my user input? >> You need the quotation marks. >> Yeah, I need the quotation marks, just to make clear that these aren't special individual words. This is a whole phrase that I want to be displayed to the user. So I'm going to put double quotes around everything. And this is just aesthetic: in this case I don't want to bother moving the cursor to the next line. I want the user to see the question and the cursor to just stay there blinking, waiting for their input. But I don't want the cursor right next to the question mark, so I'm deliberately leaving a single space there, just to scooch it over a bit so it looks a little prettier, at least to my eye. Now, we're not done yet, because we need to do something with this value. The get_string function, as we'll soon see, is going to prompt the user, me, to type something in, like my name. But where do I want to put that? Well, in Scratch, MIT has the answer: it's put in a variable called answer. And you can't rename that in Scratch; it's just defined as answer. But in C, what I'm going to need to do is something like this. If you want to keep the return value of a function around, you literally use an equals sign, and then to the left of it you put the name of the variable into which you want to put that return value. In mathematics, we would use x, y, and z as our variables. Again, in code, as in Scratch, you can name your variables anything you want. By convention, they should usually be lowercase, and they should not have spaces therein, similar to file names. So this is a pretty good analog now of what's going on collectively here. But C is a little more precise: you can't just give the variable a name.
You need to tell C, or really the compiler, what type of value you want to put in this variable. So if it's a string of text, you put string. If it's a number, you're going to put something else. But for now, it's a string; per the function's name, it's going to give me a string. Now we're so close to finishing this comparison. There's one detail missing. What's still missing from the code here? Yeah? >> Yeah. So we lastly have to finish the thought with a semicolon. If you're sort of getting the point already, this is one of the reasons why we start with Scratch: you get the intuition pretty quickly. And even though nothing on the right-hand side is particularly hard, there are all these stupid little details that you have to ingrain in yourself over time, in this case for C, though for many programming languages we're going to see a similar paradigm. Among the goals of the course, too, is to show you how languages have ultimately been evolving. One of the things we'll see in Python in a few weeks' time is that some of this syntax actually goes away, because over time humans have gotten annoyed at older languages like this: why the heck do I have to keep putting a semicolon when it's clear that I'm at the end of the line? So we'll see that in languages like Python we can get rid of some of these same features. But for now, it's just a matter of remembering what goes where. All right. So let's take that same idea of converting Scratch to C and actually do something with this code. Let me go back to VS Code here. I'm going to keep my file name the same, but what you'll see on CS50's website is that we'll add version numbers to each of the examples that I'm typing out, so you can see the progression of these programs, even though we're not changing the name. And what I'm going to do here, in hello.c this time, is the following.
I'm going to first get rid of the single hello, world line. I'm going to go up here and include, this time, cs50.h. So not one but two header files. And then inside of my curly braces, inside the so-called main function, as we'll soon call it, I'm going to do this. Exactly the same line of code as on the screen before: I'm going to get a string, prompting the user with what's your name, question mark, space, close quote, semicolon. As an aside, this will, we'll soon see, print what's your name on the screen. That implies that the get_string function is itself using printf to print out that message. I do not need to use printf to display that message, because I read the documentation for CS50's get_string function, and I know that it uses printf for me to achieve that particular goal. Now let me do something intuitive but not quite correct. I want to print out the answer, so that the output is going to be not hello, world but hello, David or hello, Kelly. So let me print hello, answer, then backslash n to move the cursor down as before, then a semicolon. This is not quite right, and even if you've never programmed before, you can perhaps see where this is erroneously going. Let me remake the program, because I've changed the source code and I need new machine code. Nothing seems to be wrong syntactically. But if I now run ./hello and hit Enter, you'll see I'm being prompted: what's your name? I'm going to type in David and hit Enter. But when I do, if you know where this is going, what am I going to see instead? >> Hello answer. >> And the computer's just doing literally what I told it to do. I said, quote unquote, print out hello, answer. But obviously that's not the goal that I have in mind. So how do I actually work around that?
Well, what I really need to do is achieve the equivalent of this thing here, which we did in Scratch by stacking blocks, or nesting them, if you will, one inside of the other. I want to join hello, comma, space, and that answer. And it turns out in C, you can't do it quite like this. There isn't an analog of the join block, at least not that we'll see today. So we have to do this a little bit differently. We can do it, though, by telling the computer: go ahead and print out hello, comma, space, and then give it a placeholder to plug in the name once we know the name. Because when I'm writing my code, I have no idea who's going to play this game: me, or Kelly, or someone else. So what if we use special syntax to indicate where I want the person's name to go? Let me propose that we now do this. Instead of printing out, quote unquote, hello, comma, answer, let's start printing out something. I've got my parentheses ready to go, and I did my semicolon in advance this time. I want to somehow now say hello, placeholder. And you would only know this by someone having told you, or from a reference online: %s is the placeholder for a string that you don't know when you're writing the code, but that, when someone else is running the code, will be filled in and substituted with other input. So hello, %s is the closest we can get to this. I still need some other syntax, though: I do need those quotes on the left and the right. And just to be aesthetically pleasing, I'm going to put a backslash n there at the end to move the cursor. But now I've left room in my parentheses for one more thing, and you can perhaps guess where I'm going with this. Again, even if you've never programmed before: this is telling printf to print out h-e-l-l-o, comma, space, something. What should I probably pass into these parentheses as a second input so that printf knows what that something is? Yeah? >> The variable.
>> The variable name. So, the variable in which I have the user's name. And indeed, the convention is to put a comma after the quotes and then the name of the variable that has the value you want substituted for that placeholder. Now notice there's a collision of syntax and grammar here. The comma inside of the quotes is just an English thing: hello, comma, so-and-so. The comma outside of the quotes is meaningful to C, because it delineates which is the first input, or argument, to the left, and which now is the second. We haven't seen this before in C. Up until now, we've only been passing one input, but you can pass in two or three or four; it completely depends on what the function is designed to expect. So let me put this all together now. Let me go back to VS Code. Previously, we were literally printing out answer, but I can change answer to %s. I can move my cursor outside of those quotes and type comma, answer, because that's the name I gave to that variable. I can go back down to my terminal window and clear it just to reduce clutter. Let me run make hello one more time. Seems to work. ./hello, Enter. D-A-V-I-D. And now hello, David is printed. Okay, questions on any and all of that? >> I was wondering, with the header file, where is it pulling from? >> Good question. Where is it pulling these header files from? What you are seeing here is a graphical user interface hosted somewhere in the cloud at cs50.dev, the URL I mentioned last week, and we're going to tease this apart in just a moment. That software is running on a computer, and that computer's got a hard drive or a solid-state drive, with folders of storage. Those files, cs50.h and stdio.h and many more, are pre-installed on the server to which I have connected, and they're stored in a standard place so that the compiler in particular knows where to look for them. Those are all things we did in advance for you. Yeah? >> Why does backslash n not create a new line?
>> Why does the backslash n not create a new line? It does: backslash n is essentially being printed here, which has the effect of pushing the dollar sign to the next line. Otherwise, the dollar sign would stay on that second-to-last line. Other questions? >> Why is there no backslash n on this one? >> Good question. Why is there no backslash n over here? My choice as the programmer. I just wanted to see the sentence, what's your name, and I wanted the user, me, to type my name immediately after it, like this. But I didn't have to do it that way; I just wanted to show you the difference. >> Gotcha. And then also, just generally, when we're doing the work, should we always write the first four lines? >> Should you always write the first four? Oh, these. Yes. For today, trust me: do this, do this, do this, do this. And next week we'll understand even more about what those lines do. However, slight caveat: only use cs50.h if you're using one of our functions. Clearly, you don't need cs50.h if you're just printing something out, as in the first example. Other questions? >> [The comma] is dividing the first input and the second input. I understand that the second input is what I type as the user. The first input doesn't really feel like input for me, because that's the question that you asked. Can you explain a little bit why both are called inputs? >> Correct. So, to summarize the question: this input on the right here is effectively provided by the user. This first input, though, is provided by me. That's the way it is. These are both inputs because they're being provided as inputs to the function. The origins of those inputs, though, are entirely up to what I'm trying to achieve. The first one I know in advance; I'm the programmer. I know I want to say hello, someone. The second input I don't know in advance, so I'm using a variable to store the value that I'm going to get when the get_string function is called later on, when the program runs.
But they're both inputs, even though they're used in different ways. Good question. Any others? No? Okay. So now that we have that done, let's take a step back to the first question that was asked, about where these files are, and look at what it is we're actually using here. It turns out, even though most of you are using macOS or Windows, there are other operating systems out there in the world. Phones have iOS. iPads have iPadOS. Android devices have Android, which is its own operating system. Operating systems are the pieces of software that do the most fundamental operations on a device, like booting it up, shutting it down, sending something to a printer, displaying something on the screen, managing windows and icons, and all of that sort of commodity stuff that is used by other people's software as well. A very popular operating system in the programming world, and in the world of servers in the cloud and on the internet at large, is called Linux. It's a descendant of something called Unix, which has been around for quite some time, and it's what many programmers use, depending on their environments, insofar as Linux is very highly performant: you can support thousands or millions of users on servers running an operating system like this. It can have a graphical user interface, but it tends not to, which means it can operate more quickly, because it doesn't need all of these graphics that are really just for humans' benefit, not necessarily for web browsers and other devices. And Linux, insofar as it's often used via a command-line interface, comes with a whole bunch of commands that you'll start to use and see over time. Now, I've used a bunch of commands already. I've used code, which is a VS Code thing. I have used make, which is for today's purposes our compiler, but that's a little white lie that we'll distill next week.
And then I've used ./hello, which is a command I essentially invented as soon as I created a program called hello. But there are a bunch of other ones as well. For instance, if I want to list the files in my current folder, I can type ls, for short, and hit Enter. If I want to create a new folder, otherwise known as a directory, I can use mkdir, to make a directory. If I want to remove a directory, I can use rmdir. If I want to remove a file, I can use rm. If I want to rename a file, I can use mv, for move. If I want to copy a file, cp. If I want to change directories, that is, change into a folder, I can use cd. Now, these just take a little bit of time and practice to memorize, and they're all very terse, insofar as the whole point of a command-line interface is to let people navigate things quickly. So even though this will be a bit of a whirlwind, let me go back into VS Code and propose that we play around with just a few of these commands, so that you've seen me doing it. Generally speaking, in CS50's problem sets, we will tell you step by step what commands to type so that you can achieve the same results. And then later in the term, we'll stop bothering to remind you pedantically how to do this and that, because it should come more naturally eventually. For instance, let me go ahead and reopen my file explorer at left. Yours will look a little different; you'll have a different number as your unique ID, but generally you'll see whatever files and/or folders you've created already. The first thing I created today was called hello.c. And then, by using make, I created a second file, I claimed, called hello. So the reason ./hello works is that there is in fact a program called hello in my current folder, ergo the dot, which was created when I compiled my source code into machine code.
Now suppose, for the sake of discussion, that this is going to get messy quickly, because the more programs we create in class and for problem sets, the more you're going to have a hot mess of files inside of this one main folder. Well, let's create subfolders, like you might be inclined to do on your Mac or PC or Google Drive or whatnot. We can do this in a bunch of ways. I could right-click or Control-click in my file explorer, and I'll see a somewhat familiar contextual menu, and I can literally choose New Folder, or I can rename things, or I can move things around by dragging and dropping them. But for today, let's focus more on the CLI, the command-line interface, and commands like these. So let me go back into VS Code and propose that we do a few things, just as a tour. First, let me delete the machine code. I'm done with this example, and I don't really want to keep these bits around unnecessarily. I'm going to delete hello with rm hello. Not hello.c, but hello, the compiled program. When I type that, I'll be cautioned: remove regular file, whatever that means, called hello? Here, I'm being prompted for a yes/no response, and y suffices. So I'm going to hit y, Enter, and watch what happens at top left. As soon as I use my terminal window and this command to remove that file, it disappears. I could have right-clicked or Control-clicked on it, but this command-line interface achieves the same thing. Now suppose that for problem set one and future problem sets, I want to keep every program I write in its own folder, just to keep myself organized, especially as the term progresses. Let me create a new folder called hello itself. So I don't want to create a program called hello; I want to create a folder called hello. One way I can do this, per this here cheat sheet, is to make a directory, which just means folder. So mkdir hello, Enter. And you'll see at top left that I now indeed have a folder.
And it even has an obvious folder icon next to it. Now, I could cut some corners: I could click and drag hello.c and just drop it into hello. But again, let's stick with the command-line interface. Let me go ahead now and move, mv for short, hello.c into hello. This is the first command where I'm passing in not one word after the command, as with code hello.c or make hello. Now I'm typing two words after the command, because the move command is designed to expect the origin as the first word and the destination as the second, so to speak. So if I want to move hello.c into the hello folder, I type mv hello.c hello. Now, just so you know, you can include a trailing forward slash at the end of the destination, to make clear that you want to put this into a folder and not just rename hello.c to hello. But because the hello folder already exists, Linux knows what it's doing and will assume that's what you want. Watch what happens at top left. hello.c seems to have disappeared. But if I click this little triangle, ah, there it is. It's now inside of that folder. But now I've created kind of a predicament for myself. Let me clear my terminal window, and now let me type ls. When I type ls, for list, you'll see only a folder called hello. It's color-coded just to call it out to your eyes, and there's a trailing slash just to make obvious that it's a folder. That's all done automatically for you by Linux, the operating system. But wait a minute, where did my hello program go? Where is hello.c? Well, it's in that folder, so I need to change into that folder, or directory. And here, per the cheat sheet, we have cd, for change directory. So I can do cd space hello, with or without the slash, and hit Enter. And now you'll see this.
It's admittedly a little cryptic, but my prompt has now changed. It's still a dollar sign, but before it is a constant reminder of what folder I am in. We adopted this as a convention; many systems do the same thing, though the formatting might be a little different. This is just to help you remember where the heck you are without having to type some other command to ask the operating system what folder you are in. So now that I'm here, if I type ls and hit Enter, what should I see? Just hello.c, because that's the only thing in that there folder. So now let's do one other thing. Let's run make hello inside of this folder. That's okay. And notice at top left what just happened: now I've got both files back. All right, suppose I want to get rid of one. I can run rm hello again, type y for yes to confirm the deletion, and now I'm back to where I just was. Now suppose I want to do yet other things. Suppose that I'm not really proud of this version of hello.c. Let me keep it but rename it. I can say mv hello.c old.c; I just want to rename the file. So mv can be used not only to physically move a file from one place to another; if you use it on two file names, it will just rename the file for you. So there's no separate rename command that you need to use. But you know what? Nope, I regret that. This program was fine. Let's rename it back: mv old.c hello.c. And watch the top left: it just renames the file again. Let me make a backup, though. Let me copy, with cp, hello.c into a file called, like, backup.c, just in case I screw this up. I want to have a spare around. Now you see at top left that I've got both files. If I now type ls, you'll see both files. So what's happening in the GUI is exactly what's happening in the CLI. But you know what? This was just for demonstration's sake. I don't need any of this. So let me remove the backup, and say y for yes.
Let me go ahead and move hello.c out of this folder, which I could do by just dragging and dropping it. But how do I move hello.c to the parent folder, so to speak? I want to move it out of this folder. Well, you would only know this by having been told: dot dot is special notation that means the so-called parent folder, so go back up in the hierarchy. And if it's not obvious, a single dot, which we have seen before, means this folder. Two dots means one step up. There are no triple dots or quadruple dots; you have to use different syntax for that, but more on that another time. So watch what happens when I run mv hello.c .. to move it up into the parent directory. Notice at top left that the indentation changed, because it's no longer inside of that same folder. And heck, now I'm going to go ahead and do this. I could go back to my main folder by doing cd .. to back out of this folder. But when in doubt, or if you ever get yourself into a confusing mess, just type cd and Enter alone, and you'll be magically whisked away to your default folder, a home directory, so to speak, even though that too is a bit of a white lie. That will always lead you to where you start when logging in to cs50.dev, aka VS Code. And now I can see the folder, which happens to be empty, and the file. So let me do one last command, rmdir hello, to really undo all of the work, such that we're now back to where the story began. But the point here is just to demonstrate that with these basic, fundamental commands, you can do everything that you've taken for granted on Macs and PCs for years with a mouse instead. Questions on any of these Linux commands here? Yeah. >> If you have multiple files in a folder, how can you choose which one to open? >> Really good question. If you have five different files in a folder, how can you choose which one to open? Well, you can certainly do code, space, and the name of the file you want to open.
Or, as we'll see, there are other tricks: you can use an asterisk, or star, as a so-called wildcard and say open everything in this folder. And you can even use more precise patterns than that. So over time, once I have more files at my disposal, I'll be able to do tricks like that as well. Yeah. >> I don't know if I missed it. >> Uh-huh. >> When you deleted the file, was that hello or hello.c? >> Sure. So one of the things I did in VS Code a moment ago was, once I was inside of the hello folder, into which I had put hello.c just for the sake of discussion, I recompiled it by running make hello. And this example is a little confusing, deliberately, insofar as I've got a file called hello.c inside of a folder called hello. But because I compiled hello.c, I then created a program called hello as well. That program, hello, was inside of a folder called hello, which is only to say that you can totally do this. What you can't have is two things, say a file and a folder, with the same name in the same folder, because they would collide; you can't do that on a Mac or a PC either. You have to have unique names. But you can certainly put something with the same name inside of another folder without collision. Good question. All right. So let's introduce a few more building blocks and a few more things we can do. Besides these Linux commands, which we'll now start taking for granted, we have a bunch of other features of programming languages that we saw in Scratch. Let's now translate them to C. Conditionals were sort of the proverbial fork in the road, enabling you to do this or this or some other thing based on the answer to a question, a so-called boolean expression.
Here, for instance, in Scratch is how we might express: if a variable x is less than a variable y, go ahead and say x is less than y. Out of context (I didn't include it on the slide), presumably we've created x and y and somehow given them values, whatever they are, but this is just the conditional part of the program. In C, the way you would do the same thing is you would say if, then a space, then parentheses, which have nothing to do with functions. if is not a function; it is a feature of C that implements conditionals, just like this orange block is a feature of Scratch. Inside of the parentheses, you put your same boolean expression. So here too, out of context, if up here I have defined variables x and y, I can certainly use them in this conditional, and I can use this less-than operator just like in math class to ask this question. And the answer, even though it's a less-than sign, is indeed, if you think about it, going to be true or false, yes or no. It's a boolean expression: x either is less than y or it is not. All right. Inside of the curly braces, which we'll use here, I'm just going to literally put our old friend printf. And there's nothing interesting here except the new phrase, x is less than y, with the backslash n, the semicolon, and the parentheses. This indentation, though, is deliberate: just like in Scratch, where the say block is indented and hugged by the orange if puzzle piece, these curly braces are meant to imply the same. They're sort of embracing these lines of code. As an aside, in C they're not always necessary: if you have a single line of code, you can technically omit them. However, in C, and in CS50 in particular, we will generally preach a certain style, like any company in the real world would, so that programmers who are collaborating on code all write code that looks the same and it doesn't devolve into a mess because everyone has their own convention.
So this is a convention to which you should indeed adhere. And then I've indented four spaces to make clear, logically, that this line of code only executes if the answer to this question is true, or yes. Meanwhile, in Scratch, we had an if-else condition, a two-way fork in the road: if x is less than y, say so; else say x is not less than y. How can I do that in C? Well: if x is less than y, something; else, something else. And what goes in between those curly braces? Just two different printfs: x is less than y, or x is not less than y. The only new thing here is we've added else and another pair of curly braces, just like we've got two orange shapes hugging those two purple puzzle pieces there. All right, how about something a little more involved? This looks like it's escalating quickly, but it's just because the Scratch puzzle pieces are so big. If x is less than y, then say x is less than y. Else if x is greater than y, then say x is greater than y. Else if x equals y, then say x is equal to y. How can we do this in C? Almost the same idea: if x is less than y, else if x is greater than y, else if x equals equals y. Well, before we reveal what's in the curly braces: this is not a typo. Why have I presumably done this, even if you've never used C before? Yeah. >> Exactly. The single equal sign, which we've used already when storing a value from get_string into a variable like answer, is technically the assignment operator. So humans decades ago decided that when they wanted to copy a return value from the right into a variable on the left, it made visual sense to use an equal sign, because you want those two things ultimately to be equal, even though you kind of read the code from right to left in that case. I can only imagine that at some point the same people were in the room coming up with the syntax for conditionals, and it was like, oh shoot, we've already used equals for assignment.
What do we now use for equality? The solution in C, as well as in many other languages, is literally this: they use two. So this is the equality operator, whereas a single one is the assignment operator. Scratch differs just because it's designed for kids; no sense in confusing little kids with double equal signs. So Scratch uses a single equal sign, whereas C and most languages use a double equal sign. So, a minor divergence there. What goes in the curly braces? Nothing all that interesting, just a bunch more printfs. But here's an opportunity to highlight not only the equivalence of this Scratch code with C code, but also a design flaw that we tripped over, if briefly, last week. This is arguably not well designed, even though it is correct. Why? Yeah. >> You don't need to ask. >> Yeah, we don't need to ask this third boolean expression: is x equal equal to y, so to speak? Logically, if we're using sort of normal numbers, x is either less than y or greater than y or, by default, equal to it. So you're just wasting the computer's time, and in turn the user's time, by asking this third question. Slightly better here would be to get rid of the else if and just have a default case, an else block, so to speak, that looks like this. If it stands to reason that there are only three possibilities, you only really need to interrogate two of them out of the three. It's a minor optimization, but you could imagine doing that again and again and again in your code. You don't want to be wasting the computer's or the user's time if you can improve things like that. All right. So now that we have these equivalences between Scratch code and C code for conditionals, what other things can we throw into the mix? Well, C has a whole bunch of operators. And just so that you've seen a list in one place, you've got not only assignment and less than and greater than and equality, but a few others here as well.
Now, even though in Microsoft Word or Google Docs you can type a greater-than-or-equal-to sign (≥), one symbol over the other, or a less-than-or-equal-to sign (≤), in C and most languages you actually just hit the keyboard twice. You type a less-than sign and an equal sign, or a greater-than sign and an equal sign. And that's how you express greater than or equal to, or less than or equal to. This one, equality, we've seen. Anyone want to guess what exclamation point equals means? Otherwise pronounced bang equals. Yeah. >> Not equal. >> So generally in programming, you'll see an exclamation point implying the negation of something else, the opposite. So you don't want it to be equal to; you want it to be not equal to. Now, you might think it should be not equal equal. Logically, yes, but they're trying to save keystrokes. So this is the negation of equality, even though it doesn't quite look like it should be: just two characters instead of three. And, dot dot dot, there are many other operators that we'll encounter in the wild over time. But it's also worth noting that C has more types than just strings. Strings, recall, were strings of text, and there are other types of data that you might get from a user or store. We've seen string, but we'll actually see a whole bunch of others. So in C we're going to see bools: a variable that can be true or false, and that's it. So, very much interrelated with boolean expressions, a variable itself can be true or false. We're going to see chars, or characters: not strings of text like multiple letters and words and the like, but just individual characters. C, unlike some languages, does distinguish between single characters and multiple characters. Let's skip double for a moment and jump to float. A float is otherwise known as a floating-point value, which is just a number that has a decimal point in it, a real number, if you will. A float nowadays generally uses 32 bits total to represent those numbers.
The catch is this: how many total values can you represent with 32 bits, roughly, per last week? It was one of the few numbers I proposed you remember: roughly 4 billion. But how many real numbers are there in the world, according to math class? An infinite number. So we seem to have a mismatch between what we can represent in code and how many actual numbers there are in the world. Not to worry: if you need more precision, that is, more significant digits, you can upgrade your variable, so to speak, from a float to a double, which uses 64 bits. That's way more precise, with twice as many bits, but it doesn't fundamentally solve the problem, because it's still finite and not infinite. And we'll end today with a look at what the real-world implications of that are. Besides floating-point values, there are simple integers: 0, 1, 2, and the negatives thereof. Those conventionally use 32 bits, which means that if an int only had to represent non-negative numbers, the highest it could count would be roughly 4 billion. But because an int also represents negative numbers, it only goes up to roughly 2 billion, and all the way down to roughly negative 2 billion. So that's not that large nowadays. A long uses 64 bits, which gives a much bigger range of values, but there too it's still finite. And there are a bunch of others as well. So these are just the types of data that we can store and manipulate in our programs. But one of these in particular, string, comes specifically from cs50.h. And among the things you get by including cs50.h in your code is access not only to get_string but to these other functions as well. We'll start to use these in a little bit, whereby you can get ints or chars or doubles or floats. We don't have a get_bool, because it's not typically useful to get just a true or false value, but we could have invented it. We just chose not to. But we'll frequently use these here functions that you can access by including that there header file.
But where are we going to put these values, and how are we going to display them? Well, it turns out there's more than just %s. %s was a placeholder for a string, but if you want to print out a char, a single character, you're going to use %c. If you want to print out a floating-point value, you're going to use %f. For an integer, %i, and for a long integer, that is, a long, you're going to use %li instead. So in short, there are solutions to all of these problems. These are not intellectually interesting details, but they are useful, practical things to absorb over time. So let's go ahead and do just a few more examples together. In a little bit, we'll take a short break, during which snacks will be served, as every week, out in the transept. But before we get to that, let's focus on these here variables. In Scratch, we had the ability to store values in variables that we could create ourselves by creating new puzzle pieces. In C, you can essentially achieve the same. For instance, suppose that in Scratch we wanted to keep track of someone's score using a counter. We might create a variable called counter, set it initially to zero, and then eventually add one to it, add two to it, and so forth, as they drop trash into the trash can, for instance. Well, in C, you're going to do something almost the same. You can choose the name of your variable, just like I did previously with answer. You can assign it a value like zero initially. But per earlier, what more am I probably going to have to do in C on the right-hand side here? Yeah. >> I've got to give it a type. And a counter, insofar as it's numeric, is not going to be a string of text, and I don't think I need to worry about decimal points if I'm just counting the equivalent on my fingers.
So int will suffice. int is the go-to type for numbers, at least if two billion or so values is more than enough for your case, which this is going to be. Still, one minor thing is missing. Yeah. >> The semicolon, to finish the thought. So that on the right is the equivalent of doing this here on the left. Now suppose that in Scratch you wanted to increment the counter, adding one to the score, adding two to the score, and so forth. It might look like this: change counter by one, implicitly going up, unless you used a negative number, which would go down. In C, you can do this in a few ways. And this looks a bit wrong at the moment: how can counter possibly equal counter + 1? This does not mean equality per se. The single equal sign, recall, is assignment, and it means take the value on the right and copy it to the variable on the left. So this takes whatever the current value of counter is, zero, adds one to it, and then stores that one in the counter variable. So now the value is one, and if you do it again, it goes to two, goes to three, goes to four, and so forth. But honestly, this incrementation technique is so common that there's shorthand notation for it. You can also just do this. It looks a little weird at first glance, but counter += 1; does the exact same thing; you just type fewer keystrokes. And honestly, doing this is so darn common in C that you can even do this: counter++ does the exact same thing, adding one to the variable. There's no +++ or more pluses; ++ is only for incrementing a value by exactly one. So arguably the other two versions, albeit more verbose, are a little more versatile, because you can add two or three or more at a time. And there are equivalents for decrementing: -=, --, or the minus sign more generally. All right, so let's actually use this technique in some code. Let me go back into VS Code here.
Let me close my file explorer, and let's go ahead and create maybe a little calculator of sorts. Or rather, not even a calculator yet; let's just compare a few values. So let me run code compare.c to create a brand-new program called compare. And then in here I'm going to write a bit of boilerplate. I'm going to include cs50.h, I'm going to include stdio.h, and I'm going to write int main(void); more on that next week. Then inside the curly braces, let's use these new techniques. Let's give myself a variable called x and set it equal to the return value of get_int, that other function I promised exists. And let's prompt the user for a value for x with a sentence like "What's x?" and then a space, just to nudge the cursor over. Let's get another variable, y, set it equal to get_int again, and ask the user this time "What's y?", essentially using the same function twice but to get two different values. Now let's do something pretty mindless. If x is less than y, go ahead and print out, with printf, x is less than y, backslash n to move the cursor, close quote, semicolon. So it's not that interesting of a program, but it's at least dynamic, in that now I'm prompting the user for two numbers. So let's do make compare, Enter. Seems to have worked. And in fact, I can check that it worked by typing what command to list the files in my directory? ls, for short. And now you'll see I've got hello.c, but no hello, because I deleted that with rm a few minutes ago. I've got compare.c, which I just created. And then I've also got a program called compare. The asterisk there is just a visual indicator that this is executable: it's a program you can run, not just a simple old file. Even though I didn't type ls previously with hello, it would similarly have had an asterisk next to it in this context.
But you don't see that in the file explorer. If I now run ./compare, let's do something silly, like 1 for x and 2 for y. Okay: x is less than y. Let's do it again: ./compare, 2 for x, 1 for y. Okay, and I see nothing. Why am I seeing nothing? Well, logically, I didn't have a condition checking for greater than, let alone equal to. So let's enhance this a little bit. Let me minimally add an else, meaning x is not less than y, and go ahead and print out x is not less than y, backslash n, close quote, semicolon. So I'm at least handling that situation too. Let me clear my terminal window, do make compare again, and run ./compare: 1 and 2 works exactly the same. Now let me do 2 and 1. There we have better output. Of course, it's not really complete yet, because if I run ./compare again and do 1 and 1, it'd be nice to be a little more specific than x is not less than y. It's not wrong, but it's not very precise. So I can add into the mix what we did earlier: else if x is greater than y, say x is greater than y; else if x equals equals y, go ahead and print out x is equal to y, backslash n, close quote. But here too, someone observed that this is sort of stupidly inefficient. What line of code should I actually improve here to tighten this up? Yeah. >> Instead of else if, just else? >> Yeah. So line 17. I think I can just get rid of that unnecessary question, because logically that's going to be the case at this point. And now I can recompile this with make compare and run ./compare again. Enter, 1 and 1. And now we're back in business, catching all three of those scenarios. Questions on any of these things here? Why have I deliberately not done this? Let me rewind just a moment and hide my terminal window, just to keep the emphasis on the code here. Why not do this and keep my code arguably simpler?
Like, why not just ask three questions, on lines 9, 13, and 17 here? Yeah. What don't you like? >> Because then it would check each and every condition, even though, for example, the first one might be fulfilled. It would still check the second and third. That's wasted... >> Exactly. It's another example of bad design, because now, no matter what, you're asking three questions, on lines 9, 13, and 17. Even if x ends up being less than y from the get-go, you're still wasting everyone's time by asking, wait, is x greater than y? You already know that it's not. Is x equal to y? You already know that it's not. These three conditionals are now independent of one another, so you're checking all three of them no matter what, even though the conditions are mutually exclusive and, logically, that shouldn't be necessary. So our first approach was actually quite a bit better. And in fact, just to show you the difference here, let me go back to the very first version, whereby I was only checking that one condition: is x less than y? If you're more of a visual learner, you can actually draw out what code looks like in flowchart form. So here is a drawing of a program that starts here and ideally stops down here. And each of these figures in the middle represents a logical component of the code. Here in the diamond is my boolean expression, which represents the start of the conditional. Is x less than y? I have a decision to make: yes or no, true or false. If it is less than y, true, let's go ahead and print out, quote unquote, x is less than y, and then stop. However, the first version of that program, recall, just said nothing if it were not the case that x was less than y. That's because false just led to the stop of the program. There's no keyword stop; there's just no code to handle that situation. But the second version of the code, when I actually added an else, looked fundamentally a little different.
So the second version of that code asked: is x less than y? If true, the behavior is exactly the same. But if it weren't true, if it were instead false, that's when I got the message x is not less than y. In the third version of the code, where I added the if, else if, else if, the picture gets a little more complicated. Let me zoom in. Top to bottom here, we have a longer flowchart, but the questions are really the same. When I start this program, I ask: is x less than y? If so, I print out x is less than y. However, in that last version of the program, the one with three separate ifs, I was still foolishly asking the other questions anyway: wait a minute, is x greater than y? Wait a minute, is x equal to y? And that's the version in which, again, I had all of that unnecessary code, which I just undid here, asking three questions every time. Ideally, I don't want to make that mistake again and again and again. So if I instead revert that code to if, else if, else if, then my flowchart looks a little bit different, because notice the shortcuts now. If x is less than y, true, we do this and we're done. Super quick. If x is not less than y, fine, we do ask one more question: is x greater than y? If so, boom, we make our way to the end of the program by just printing that. Only in the perverse case where x equals equals y do we check this condition (no), this condition (no), and this condition, and then, okay, now we can print out x is equal to y, because logically it must be. Of course, as has been observed multiple times, that third question is a waste of everyone's time. So we can prune this chart further to at most two questions, and that alone tightens up the program. So again, if you're more of a visual learner, you can translate most any block of code into this sort of pictorial form, but it really just captures the same logical flow that the indentation and syntax of the code itself are meant to imply. All right, how about a final exercise with one other type here?
Recall that these are the types available to us. Actually, two final examples here before we have a bit of a break. Here we have a list of types that we can use, and here we have a list of functions that we can use. Let's go ahead and make a program that's representative of something we do quite often nowadays, but using a different type. So let me go back into VS Code. Let me close compare.c, reopen my terminal window, and clear it, just so we have a new prompt. And let's go ahead and create a program called agree.c. It's all too often nowadays that we have to agree to terms and conditions. To be fair, it's usually in the form of a popup and a button that we click, but we can do this in code at the command line as well. To start, let me include cs50.h and include stdio.h. Let me again, for today's purposes, write int main(void), but we'll reveal next week why we keep doing that. And now, for a yes/no answer, it suffices just to ask for a single char, or character, not a whole string. So let's do this: char c equals get_char, and let's ask the user, quote unquote, "Do you agree?" for instance. And now I can actually compare that value for equality with some known answers. For instance, I could say if c equals equals y, in single quotes, then go ahead and print out, for instance, agreed, period, backslash n, close quote, semicolon. Else if c equals equals n, in single quotes, let's go ahead and print out, for instance, not agreed, period, backslash n, semicolon. Now, there's still room for improvement here, but notice we're just using the same building blocks in C in different ways to solve different problems. But notice on lines 8 and 12, I've used single quotes, which I alluded to earlier. Why is that the case? Why single quotes in this case here? >> Yeah, it's a single character. And this is just the way you do it in C. When you want to compare a single character, you use chars, and you use single quotes.
When you want to use strings of text, like multiple characters, multiple words, multiple sentences or paragraphs, you use strings and double quotes. So this would seem to work, but arguably I could be a little more efficient. If the user doesn't type y, I mean, frankly, I could just chop off this else if and make it an else, and just assume that if you don't give me a y answer, I'm going to assume the worst: you don't agree. But even here, the program's not all that great. Let me go ahead and do make agree and then ./agree. And do I agree? Sure, I'm going to type y. Meanwhile, if I type anything else, like n, or even, emphatically, no, that would seem to... whoops. Why did that not work? Yeah. >> Exactly. Among the features of CS50's functions, like get_char, is that they enforce what type of data you're getting. So because I used get_char, if the user doesn't cooperate and types in multiple characters, get_char, like some of our other functions, is designed to prompt them again and again and again until they cooperate. That's useful so that you don't have to deal with that kind of error checking. But here I could type N in uppercase, and that seems to work now. But that only works because of the else. Let me go ahead and do this, which is very reasonable: I'm going to type capital Y, which you would hope works. It doesn't, and that feels like a bug at this point. It's fine if we don't want to support yes and no; we just want to support y and n. But it's kind of obnoxious not to support the uppercase versions thereof. So how can we fix this? Well, let me hide my terminal window. I could go in and fix this as follows: else if c equals equals capital Y, in single quotes, then print out agreed, period, backslash n, semicolon, and then keep the else. That would work. But what rubs you the wrong way, perhaps, about this solution?
Even if you've never programmed before, just apply some of the lessons from last week. Yeah. >> It's redundant. I mean, I didn't technically copy and paste, but line 14 is identical to line 10, so I might as well have copied and pasted. And that's generally bad practice. Why? Well, if I want to change the English to say something else in that case, now I have to change it twice. I'm repeating myself, which is just bad design. So there are ways to address this through other types of operators that we haven't yet seen. If I want to ask two questions at once, that's fine. I can do something like this: if c equals equals y, in single quotes, or c equals equals capital Y, in single quotes. I can tighten things up using so-called logical operators, whereby I take a boolean expression and compose it from two smaller boolean expressions. And I care about at least one of those questions being true. So whether it's lowercase y or uppercase Y, this code will now work. And if it's anything else, we're going to default to not agreed. So the two vertical bars, which are probably not characters you type that often, and whose location on your keyboard varies depending on whether it's American English or something else, just mean logical or. It's not relevant here, but in some contexts you could also use two ampersands to connote and. But and does not make sense here. Why is it clearly not correct to say and in between these two clauses? Yeah. >> Exactly. The variable can't be both lowercase and uppercase; that just makes no sense. So that would be a bug, but using two vertical bars here is in fact correct. All right, let's do one final flourish here. Besides conditionals, we had loops. Recall that a loop is just something that does something again and again and again. Here, for instance, in Scratch, is how we might meow three times. In C, there are going to be a few different ways to do this. Here is one.
In C, you can declare a variable, like i, for integer, or whatever you want to call it, and set it equal to three, the number you care about. You can then use a loop, and the closest thing to the repeat block is arguably a while loop. There is no repeat keyword in C, so we can't translate this verbatim, but we could say while i is greater than zero. Why? Because that's logically what I want to do: if I start counting at three, maybe I can just decrement one at a time and get down to zero, at which point I can stop doing this thing. So I'm going to initialize a variable, i, to three, and then I'm going to say while i is greater than zero, go ahead and do the following. And at the end of that loop, before whipping around again, I'm going to use this line of code, which we haven't seen, but you can infer: i minus minus, i--, just means subtract one from i. So this is going to have the effect of starting at three, going to two, going to one, going to zero. And as soon as it goes to zero, this boolean expression will no longer be true, and so the loop will just implicitly stop. So what are we going to put inside of the curly braces besides this decrementation? Well, I think I can get away with just printing meow, and that will now print one, two, three times. And yet that's interesting: I instinctively counted one, two, three, even though I'm proposing that we count three, two, one. So can we implement the logic in the other direction, whereby we count up from zero instead of down from three? Sure, we just have to make a few changes. We can set i equal to zero initially. We can change our boolean expression to check that i is less than three, again and again. And on each iteration of this loop, let's just keep incrementing i with i++. This will have the effect of doing one, two, three, and then three is not less than three, so I won't put any more fingers up. I will meow three times in total.
And again, if you're a visual person, here's how we start counting at 0 initially. Check that i is less than 3, which it is initially. If so, we print out meow. Then we increment i, and we get whisked around again to the Boolean expression, because that's how while loops work: the condition is checked again and again. As soon as I've incremented i from 0 to 1 to 2 to 3, i will no longer be less than 3. So the answer will be false, and the loop will stop. So that achieves the same result. But it turns out that looping some number of times is so darn common that you don't strictly have to use a while loop. A for loop, so to speak, is another alternative, whereby the syntax is a little weird. It's a little harder to memorize, but it lets you write slightly less code because more of it fits on a single line. The way you read a for loop is exactly the same in spirit. You initialize the variable: that's everything to the left of the first semicolon. You then check the condition, and the computer does all of this for you. If i is less than 3, you execute what's inside the curly braces, and then automatically the thing to the right of the second semicolon happens. So i gets incremented from 0 to 1. The condition is checked: is 1 less than 3? It is. So we print meow again. And C increments i to 2. Is 2 less than 3? Yes. So we meow again. i gets incremented to 3. Is 3 less than 3? No. So the for loop stops. So it's exactly the same, but more magic happens in this first line of code, so you yourself have to write less. And it's arguably the more common convention. But both are perfectly correct if you'd like to use either. So let's go ahead and actually implement the beginning of a cat in VS Code. Let me go back to VS Code and close agree.c.
Let me reopen my terminal window and create an actual cat in cat.c. And let's do this initially the wrong way: #include <stdio.h>, int main(void), and then inside of main, let's print out "meow\n" with a semicolon. And then, heck, let me just copy and paste it. This is obviously the wrong way, the bad way, to do this, because I'm literally copying and pasting. But it is correct. If I want the cat to meow three times, I can run make cat, then ./cat, and I get my meow, meow, meow. But let's now use some of those new building blocks with which we converted Scratch to C. Let me go back into this code, and I'll do the while loop first. I could instead have done int i = 3, if we count down initially, and while i is greater than 0, print out "meow\n". And then do I do i++ or i--? i--, because we're starting at 3. Now let me go back to my terminal window and clear it. make cat again, ./cat, and we get three meows. And this is arguably better implemented. What if I want to flip things around and do it the normal-person way? I could start counting at 0, keep going so long as i is less than 3, and increment i on each iteration. Now I can run make cat again, ./cat, Enter, and that too works. But there's another way. If I want to count like a normal person, starting from 1 and counting up to and through 3, I could start i at 1 and keep going while i is less than 4. This is correct; it would iterate three times. But it's a little confusing, because now I have to think about what it means to be less than 4. Okay, that means up to 3. I could be a little more explicit and keep going while i is less than or equal to 3, using yet another one of those operators. So I can make cat yet again, run ./cat, and that too would work. Now, which of these is correct or best?
The convention, truthfully, is generally to start counting from 0 and count up to, but not through, the value you want. That way you see the starting point and the ending point on the screen at the same time, if you will. But of course I can condense all of this a bit more and turn the whole thing into a for loop. I could instead write for (int i = 0; i < 3; i++), and then down here print out "meow". If only because I typed fewer keystrokes that time, this feels a little nicer. It's a little tighter and more efficient to write, even though the effect is the same. Indeed, when I make this cat and run ./cat a final time, this too gives me three meows. So what could go wrong? Well, sometimes you might be inclined to do something forever, and indeed we did that in Scratch when we had things bouncing back and forth off of walls and so forth. You can achieve the same thing in code. In C we can use a while loop, but there is no forever block, so while suffices. But recall that the while loop expects a Boolean expression, and if I want to do something forever, I essentially need an expression that's always true. So I could do something arbitrary like while 3 is greater than 2, or while 1 is less than 2, a statement of fact that never changes. Ergo, it's just going to run forever. But if the whole goal is to do something forever, the convention in programming is to literally say while (true). That means, functionally, that you will do this thing forever unless you somehow break out of those curly braces prematurely. More on that before long. So if I want to meow forever, I could just do this, and this would be a deliberate infinite loop.
But unlike a game, where you might want it to keep going and going for some time, I'm not sure this is going to be the best thing for us. Let's try it anyway. Let me include, for good measure, CS50's library, if only because it too gives us features like bools. Here I'll say while (true), and inside my curly braces I'll just print out "meow\n" with a semicolon. Let's make cat one final time and run ./cat. And this is like the annoying cat game: meowing, meowing, meowing endlessly. I've now kind of lost control over my terminal window. And mark my words, at some point you might do this too. But let's take a juicy 10-minute break here. We have some delicious blueberry muffins out in the transept. Come back in 10, and we'll figure out how to stop this here cat. All right, so it's been about 10 minutes, and VS Code is freaking out: "High codespace CPU utilization detected. Consider stopping some processes for the best experience." This is what happens when you have, intentionally or otherwise, an infinite loop, insofar as I've been printing meow endlessly. I was warned by my colleague that I probably shouldn't let this run too long, because we might lose control over the environment altogether. But the answer to how to stop this is Control-C. There are a few cryptic keystrokes you can use to interrupt things, and this is one of them. In fact, if I go back, you'll see, yeah, I kind of lost control over my codespace here. I'm going to try reloading the window altogether. But had I hit Control-C in time (let's hope this doesn't now go off the rails), it would have been our friend. There we go, and we're back. Okay.
So now that we've got control over our so-called codespace again, how can we make our meowing program a little more dynamic, say, by asking the user how many times they want the cat to meow, rather than meowing an infinite number of times, or even only three times? I think we have all of the building blocks for this already. So let me stay in cat.c and delete the contents of my main function. Let's give myself an int, and I'll call it n, for number, though I could be more verbose than that if I wanted. I'll set it equal to the so-called return value of get_int, which, recall, gets an integer from the user. And, quote unquote, let's ask the user "What's n?", just like earlier I asked what's x and what's y, where n is the number of times I want the cat to meow. Now, how can I use this variable? Well, we have that building block too. I could use a while loop or a for loop. If I use a for loop, I could initialize a variable i, for integer, and set it equal to 0 initially. I could then check that i is less than not 3 this time, but n. So I can use that variable as a placeholder inside the loop to indicate that I want to do this n times instead of three. And on each iteration through the loop, I do i++. Of course, I could count down if I preferred, using decrementation. But logically I'd say this is canonical: start at 0 and go up to, but not through, the value you actually care about. Now I'll print out "meow\n" with a semicolon. Back down to my terminal: make cat again, ./cat, Enter. I'm prompted this time for n. I can still give it 3, and I get three meows. And if I run ./cat again with a different input, like 4, of course I get four meows instead.
Now, what is get_int doing for me? Well, a few things, similar to get_char. For instance, suppose that instead of answering this question correctly with a number n, I type something random like dog, which is not an integer. The get_int function is designed to reject the user's input implicitly and just reprompt again and again. I can try bird, and it does the same. So somewhere in the implementation of get_int there's a loop, which we wrote, that does this kind of error checking for you. But it doesn't do everything, because an integer is a fairly broad category of numbers: a huge range running from very negative to very positive, and that's a lot of possibilities. Suppose I don't want some of them. It makes no sense to ask the cat to meow, like, negative one time. And yet the program accepts that. It doesn't do anything, or anything wrong, but I feel like a better-designed program would say, "No, no, no. Negative one makes no sense. Let's meow zero or one or two or more times instead." So how can I begin to add some of my own error checking and coerce the user to give me the type of input I want? Let me clear my terminal window and go back up into my code. After getting n, let's check if n is less than 0, because if so, I want to prompt the user again. I can prompt the user again with n = get_int("What's n? "); Now, what's going on here? On line 6 I'm doing two things. I'm getting an integer from the user, and I'm not only storing it in the variable n, I'm also technically creating the variable n. I didn't call this out earlier, but on line 6, when you specify the type of a variable and the name of the variable, you're creating the variable somewhere in the computer's memory. And in C it's necessary to specify the type.
If the variable already exists, though, and you just want to reuse it and change it later on, it suffices, as on line 9, to reference it by name. It would be redundant to specify the type again, because C already knows what type it is: you told C on line 6. That's why lines 6 and 9 are a little different. So let's see how this works. Let me go back to my terminal window and remake this cat. Let me run ./cat again, not cooperate, and type in negative 1. Notice I'm reprompted this time. Fine, fine, fine, let's type in 3, and now it works. But you can perhaps see logically where this is going. Let me run ./cat again. Type in negative 1. Type in negative 1. And huh, it didn't prompt me again. But that's consistent with the code. If I hide my terminal window, you'll notice I've got two tries to get this question right, and after that there's no more prompting. Now, you can imagine this is probably not the best way to do this. I could go below line 9 and say, okay, if n is still less than 0, let's just call get_int again and ask "What's n?". And heck, if it's still less than 0, let's keep asking the same, right? Why is this bad? I'm repeating myself. I'm essentially copying and pasting, even though I'm retyping. This just never ends, right? How many chances are you going to give the user? In spirit, you'd hope they don't refuse to cooperate this many times. But to do this the right way, we should prompt them as many times as it takes to get correct input. So this is not the right path to go down. But of course we already have the notion of a loop, whereby we could just do this in a loop.
Ask the question once and then maybe repeat the same question again. So how might I do this? Let me delete all of this and try to spell it out logically. I want to get a variable n from the user, so let's do the following: while (true). I know how to do infinite loops now, and even though that created a problem for me with the cat, I bet we can terminate the loop prematurely, as I proposed earlier. I could do int n = get_int("What's n? "). And then I could do something like this: if n is less than 0, then, you know what, just continue with the same loop. Else, if it's not the case that n is less than 0, what do I want to do? I want to break out of this loop. This is new syntax. In C, if n is less than 0, fine: continue means go back to the start of the loop and do the same exact thing again. If you instead say break, it means break out of the loop and go to just below the closing curly brace associated with that loop. So continue essentially brings you to the top; break brings you to the bottom, if you will. Logically I think this is right, but curiously this code isn't going to compile and get me a value for n. Let me open my terminal window again and make this cat. And huh: cat.c, line 19, character 25, error: use of undeclared identifier n. Well, what does that mean? Again, cat.c line 19. Let me hide my terminal window and highlight line 19. n is being used on line 19, but I created it on line 8. So what's the problem? Why is it seemingly not declared? Yeah? >> Because you're using it within the loop that you wrote. >> Yeah, this is a subtlety, but I'm creating n inside of this loop, literally between the curly braces on lines 7 and 17.
The implication, because of how C works, is that the variable only exists inside that loop. This is a problem of what's known as scope: the variable n only exists inside the scope of the while loop in which it was declared. So how do I fix this? I need to declare the variable n outside of the loop so that it exists later on in the program as well. There are a few ways to fix this, but the best is probably to move the declaration of n, so to speak, the creation of n, outside of the curly braces, maybe squeezing it in below line 5. So it's still inside main, whatever that is (more on that next week), but in the same curly braces as everything else. This is where the syntax gets a little different, but I can solve this quite simply. I can go down to a new line 6 and just say int n; and that's it. This declares a variable called n; it creates a variable called n. Initially it doesn't give it any value, so who knows what's in there. More on that another time. But now on line 9 I don't need to recreate it; I just need to assign it a value. And because n has now been declared on line 6, between the curly braces on line 5 and line 24, n is in scope, so to speak, for the entirety of this code. Let me reopen my terminal window and clear that old error. make cat again, and now the error message is gone. Let me run ./cat. "What's n?" Now I'm back in business, and I can type 3 for meow, meow, meow. Better yet, because I'm inside a loop now, watch: I can type negative 1 again and again, and other negative numbers, and finally cooperate with something like 3. That's because I'm in a loop that, by design, may go on infinitely many times until the user actually cooperates and lets me break out of it. Now, strictly speaking, I don't need both continue and break.
I wanted to demonstrate that both exist, but this is twice as much code as I actually need. Logically, I just want to break out of this loop if and only if n is greater than or equal to 0, because I'm comfortable with the idea of zero meows, but negative makes no sense. So I can just flip the logic: if n is greater than or equal to 0, then break. And I've tightened up the code further. I could technically do something else: I could say something like if n is less than 0, but wait a minute, I want to negate that. You can start to do tricks like this: an exclamation point with some additional parentheses, so you invert the logic. Even though that would be logically correct, it's arguably a little hard to read. So I'm just going to say more explicitly, as before: if n is greater than or equal to 0, break out of this here loop. All right, so this is one way to use an infinite loop. But it turns out there's another construct altogether in C. Instead of using a while loop, forcing it to be infinite with while (true), and then eventually breaking out of it manually, there's another type of loop called a do while loop. You literally say the word do, which means do the following. Then you do exactly what we did before: n = get_int("What's n? "). Exactly like before, but after those curly braces you use the while keyword, at the end of the loop instead of the beginning, and that's where you put your Boolean expression. I want to do all of that while n is less than 0. So you can invert the logic and tighten things up further by just telling the computer: do the following, everything between those curly braces, while n is less than 0. And this implicitly handles all of the continuing and breaking by just doing what you've said.
Do this while this is true. The difference between this do while loop and a normal while loop is literally that the condition is checked at the bottom instead of the top. When you say while, parenthesis, something, that question is asked first and then you maybe proceed. Here, the condition is only asked at the very end. Why is this useful? Well, oftentimes when writing programs you want to do something at least once. You obviously want to ask the user this question at least once. There's no point in asking first with while (true) or while anything else. You should just do it, and then do it again if the expression evaluates to true. Now, you haven't played with these loops yet, most likely, unless you've programmed before. There's a fun meme that's apropos of this moment, so let's see if it causes a few chuckles. If you remember Looney Tunes, is this funny for people in the know? There we go. Thank you. Okay, if this doesn't make sense yet, it eventually will. It still might not be funny, but it will at least make sense. It illustrates the difference between a while loop and a do while loop. The Road Runner stops because he's checking the condition first: while not on the edge, he'll run, but if he is on the edge, he won't proceed further. The coyote, meanwhile, is going to do the running no matter what, and only then, too late, does he check. Haha, he's past the edge. All right, so, ah, thank you. Now you're cool. Many more memes will now make sense as a result. But let's revisit this code and do something a little different, whereby we no longer just fuss around with conditionals and loops. Let's make the software a little better designed. To do this, we'll revisit an idea we touched on last week that had to do with problem set 0, which was creating your own function.
C does not come with everything you might want. The CS50 library is not going to come with everything you might want. And at the end of the day, a lot of programming is about abstracting away your ideas: you solve a problem once and then reuse it, and reuse it, and reuse it. And heck, you can package it up in a so-called library, like we have, and let other people use it as well. So here, for instance, in Scratch is how we implemented meowing by getting the cat to play the sound meow until done. We abstracted it away, and then we had a magical new puzzle piece called meow. In C (this is going to be a little weird today, but next week these details will start to make more sense), you would instead do the following. Literally type void, the name of the function you want to create, and then void again in parentheses. For now, know that the first void is the return value of the function: void means it returns nothing. The void in parentheses is the input, or arguments, to the function: void means it takes no inputs. That makes sense, because meow literally doesn't return anything and doesn't take anything. It just meows. It has a so-called side effect, audible last week. So this means: hey, C, invent a function called meow that takes no input, produces no output, but does have the side effect of printing meow on the screen. Meanwhile, if I wanted to do something like last week's code, where I meowed three times, that's fine. We have the building blocks for this. And here's where inventing your own function starts to get more compelling. I can abstract away the notion of meowing now. This doesn't come with C. It doesn't come with the CS50 library. I just created this meow function in the previous code. So I can, in code, with a for loop and that new function, meow three times. But I can abstract this away further. Recall that the refinement in Scratch last time was this.
I could edit the new function and say it actually does take an input, otherwise known as an argument, called n. And I clarified that this means to meow some number of times. Then inside those Scratch blocks, I repeated the meowing n times. Well, in C I can achieve the exact same thing, even though it's going to look a little more cryptic. meow still returns nothing. It has an audible or visual side effect, but it doesn't return a value. But this version does take an input. This might look a little weird, but just like before, when you create a variable in C you specify the type and the name. When you invent your own function in C and it takes one or more inputs, aka arguments, you specify the type and the name of each of those as well. No semicolons there, just inside the parentheses. You'll get used to this convention with practice. The rest of this code is exactly the same, except instead of 3 I'm now using n. So again, I'm just composing the exact same ideas as last week, even though it looks way more cryptic this week, but it will become more and more familiar with practice. So how can I go about implementing this myself? Let me go back to VS Code here and delete most of the code I've written inside main. Let me suppose for the moment that meow exists. For the first version I'll write for (int i = 0; i < 3; i++), so we're not taking input yet. Then inside the loop I'll call meow, which is what I want this function to do. Now, if I scroll back up, you'll see there's no definition of meow yet. So I'm going to invent that too. Up here I'll write void meow(void). Again, this first version means no input, no output, just a side effect. And that side effect, super simply, is to print "meow\n".
Now if I open my terminal window, clear it from before, and run make cat, so far so good. ./cat, and we're back in business, but I've abstracted the function away. Now, much like last week, when I dramatically dragged the meow definition way down to the bottom of the screen just to make the point that you don't need to see it anymore (out of sight, out of mind), let me try to do the same here. Let me highlight and delete it, go way, way down arbitrarily, just to be dramatic, paste it around the hundredth line of code, and scroll back up. Now, out of sight, out of mind: I've already implemented the idea of meowing, and we don't need to see or talk about it again. But there's a caveat in C. When I clear my terminal and make this cat, I've introduced a problem, and it seems there are more errors than lines of code. Let me scroll back up to the first such error, and you'll see that on line 9 of cat.c, character 9, there's an error: call to undeclared function meow, and then something fairly arcane. That means meow is no longer recognized as an actual function. I know it doesn't come from cs50.h, and I know it doesn't come from stdio.h. It's just down there. So why is the compiler being kind of dumb here? Yeah? >> ...function. >> Yeah. Insofar as the first version worked, it would seem that putting it at the bottom was just a bad idea, because C compilers are fairly simplistic. They won't proactively do you the favor of checking all the way down at the bottom of the file. They take you literally. So if meow doesn't exist as of line 9, that's on you. That is an error. I could fix this by just undoing what I did and moving it back up to the top. But let me argue that, in general, when writing C programs, the main function, which I keep using and which we'll talk more about next week, is literally meant to be the main part of your code.
And so it stands to reason that it should be at the top, because when you open the file it'd be nice to see the main program you care about, the main function. So there's an argument to be made that it's a little annoying to have to put all my functions at the top, which just pushes main further and further down. There is a solution, and this is, dare I say, the only time copying and pasting is appropriate. Let me delete most of these blank lines, which are unnecessarily dramatic, and just move meow to right below main. The solution is to copy the first line of the meow function, its so-called signature, and then put that one line, and only that one line, with a semicolon above main. This is what's known as a prototype. A prototype is just a hint to the compiler, a promise if you will: hey compiler, there will exist a function called meow; it takes no input and returns no output; semicolon. And it's on the honor system that it will eventually exist later in the file. We'll talk more next week about why that works, but it's a promise to the compiler that the function will eventually be defined. Now, what I've done here on line 4, as an aside, is what's generally known as a comment. I just wanted to put on the screen exactly what I was verbalizing. Anything in C that starts with // is a note to self, like a sticky note in Scratch, which is just for the human, not for the computer. It's a way of reminding yourself or someone else what's going on in that line or those lines of code. But I'll delete it for now, since it's unnecessary. Now if I go back to my terminal, clear those errors, and make this cat again, it does work, because the meow function has been declared, via the prototype, exactly where it should be. And now I can make the new version of this cat even better. I could change the function meow to take a variable n as input, for the number of times.
Then in here I could write my for loop, for (int i = 0; i < n; i++), and in this for loop I can print out "meow". I also have to change the prototype, because I have to copy and re-paste it, if you will, or just manually fix it. Now I can get rid of all of this in main and call meow(3), for instance. This is now the second version of the Scratch code, if you will. make cat, and it still works exactly the same: meow, meow, meow. But now I've implemented my own function that does take input, even though it doesn't happen to return any output. All right, questions on any of these examples just yet? Any confusion? All right, let me add one other feature to this to demonstrate that we can take not only input but actually produce output, if we want. Back in this code, let me propose that it's a little silly to be hard-coding, that is, fixating on, 3. It'd be nice to get input from the user. So I could write int n = get_int("What's n? "), and then pass n into meow, if only to demonstrate a couple of things. One, the program is now dynamic: I'm going to ask the user how many times to meow, and I'm going to pass in that value n. Now, this is deliberately confusing at the moment, because wait a minute: I've got n defined here and used here, but then redefined here and reused here. It turns out that even if you create n up here in main and use the name n, no other function can see it, because of that same issue of scope. So, for instance, suppose I didn't quite remember this and naively said void: meow doesn't need to take any inputs, because heck, n is already defined in main. Let me open my terminal, clear it, run make cat, and see what error comes out. Well, error... oh, sorry, I made two mistakes here. I also have to change the prototype up here to say void, which means, again, meow takes no inputs. Let me now rerun make cat.
And there we have an undeclared identifier again: n. So in cat.c, line 14, which is here, it doesn't like that I'm using n. But wait a minute, I created n here. That's fine, by the same logic as earlier: you created n on line 8, but where does n exist? In what scope? Yeah, only between the curly braces, which is lines 7 through 10. So by the time you get down to line 14, it's out of scope, so to speak. So it just doesn't work. The solution is exactly what I did the first time: pass it into meow as input, and tell C to expect that input. I can use the same name, but arguably that's going to get confusing sometimes. So let me do this. Let me go back into my code and undo this change, so that meow does take an input. But instead of calling it n and using n everywhere for number, which is crazy, let's call it times. So meow takes some number of times and then uses that value. Now I'm passing in n on line 9, but in the context of the meow function, on lines 12 onward, that same value is referred to as times, because you're passing it in as input and giving it its own name. That's totally your prerogative; it's just a matter of scope. I could have called it m or some other letter of the alphabet, but times is even clearer, because it's the number of times I want the cat to meow. But again, the whole point here is this matter of scope. All right, so let's take a higher-level look now at some of the things we've been thinking about, and then we'll do a final deep dive or two on some of the problems we can solve with all of these building blocks, and some of the problems we're ignoring for now. When it comes to writing good code, CS50, and really the world in general, tends to focus on these axes: correctness, design, and style. What does this mean? Correctness just means: does the code work the way it's supposed to?
In the context of a class, it should do exactly what the homework assignment, aka problem set, tells you to do. In the real world, it should do exactly what someone, the product manager, the CEO, or the like, decided the software should do. Correctness just means it behaves as it should. That's different, though, from how well designed the code might be. And we've seen that a few times. I've had some simplistic examples in Scratch and C that were 100% correct. Logically they did the right thing, but I was wasting the computer's time, and the human's time, by asking more Boolean expressions than I needed to and so forth. So design is more like, in the world of English, not only saying things that are correct but saying them well, making a good, cogent argument, not just one that happens to be correct. Style, meanwhile, is the third axis on which we might evaluate the quality of someone's code, and that's more about aesthetics: is everything pretty-printed, that is, nicely indented? Are variables well named, and not just called x, y, and z arbitrarily or something like that? Style really matters to other humans, not to the computer. And to help with all three, you'll see that in problem set 1 onward, you'll be given a number of tools. One of those tools is called check50. In each problem in C and Python and other languages, you'll be shown how you can test your own code. You can literally run a command that CS50 created called check50, then specify what's called a slug, which just means a unique identifier for that homework problem, and you'll get quick feedback on whether or not your code is correct. It doesn't mean it's well designed or pretty, that is, well styled. But at least that's the first gauntlet in getting good code submitted. Design, though, is much more subjective.
Design is something you get feedback on from a human, for instance in section or from a teaching assistant, or from software. You can actually see at the top of VS Code a couple of buttons that I haven't yet used but could. design50 is built on top of the CS50 duck, whereby if you have a program open in a tab and you click design50, you'll get ChatGPT-like advice on how you can improve not the correctness of that code but the design of that code, the quality thereof, which is a bit more subjective and is modeled after what a good teaching assistant might say. style50, meanwhile, is a third tool that provides feedback on the style of your code, showing you on the left what your code looks like and on the right what it really should look like, insofar as it should be consistent with what we've taught in class and with CS50's so-called style guide. Those of you who have some prior programming experience undoubtedly won't like some of CS50's stylistic choices, and that's going to be the case in the real world, too. But as I alluded to earlier, in typical companies you would have an official style guide or tool to which everyone adheres, so that everyone's code looks the same as everyone else's, even though different people have contributed solutions to different problems. So correctness, design, and style are how not only we but really the world at large tends to evaluate the quality of code, and we do it by way of these CS50-specific tools. All right, how about one final flourish, then, to this program? Back in VS Code, I've got a correct solution right now. It's well styled, I'll stipulate, even though it could stand to have some more comments. For instance, I could add a comment like "Meow some number of times" as a note to myself. Or up here I could say something like "Get a number from user", just to remind myself and my TA or my colleague what this code is doing.
But what more could I do in the way of design? Well, this function here, get_int, will indeed get me an integer, but not just a positive integer or zero; it could also be negative. I could go in and add a bunch of code like before. Instead of this line, I could do something like int n, semicolon, then do the following: n equals get_int, with the prompt "What's n?", and after that, while n is less than zero, keep doing that. So I can have a pretty verbose implementation of getting user input, or I can implement another function of my own that only gets a positive or non-negative integer from the user. For instance, I might declare, maybe below my main function, a function like this: int get_n, and inside the parentheses I'll say void, because I'm not going to pass in any input. Then inside of this function is where I'm going to declare int n and do: n equals get_int, quote unquote, "What's n?", and then down here, while n is less than zero. But rather than do something immediately with n, because I'm no longer inside my so-called main function, what I'm going to do, which is new, is return this value n. And notice that this notion of returning a value, which is the first time I've done this explicitly, is consistent with this little hint here on line 19, which implies that this get_n function, which I'm inventing, is going to return not void, which means nothing, but an int. And that's the whole purpose of this function in life. Now, if I scroll back down here, I can get rid of this whole block of code and just get n from the user with get_n, and then immediately call meow with that value. I need to do one other thing: highlight this line of code here and add another prototype up top, which, again, is for now the only time copy-paste is encouraged and best to do.
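Here is a sketch of the get_n idea that can run without the CS50 library or a keyboard. The names fake_get_int, script, and script_pos are hypothetical stand-ins for get_int, assumed here only so the re-prompting do-while loop can be exercised with a fixed list of answers; in the lecture's version, the loop body would call get_int("What's n? ") instead.

```c
// Hypothetical stand-in for the user's keyboard: a scripted list of answers.
static const int *script;
static int script_pos;

// Hypothetical replacement for get_int: returns the next scripted answer.
static int fake_get_int(void)
{
    return script[script_pos++];
}

// Prototype, so main (or any function above the definition) can call get_n.
int get_n(void);

// Keeps asking until the answer is zero or positive, then returns it.
int get_n(void)
{
    // Declared outside the do block so it's still in scope in the while condition.
    int n;
    do
    {
        n = fake_get_int();
    }
    while (n < 0);
    return n;
}
```

Declaring n before the do block matters: a variable declared inside those curly braces would be out of scope by the time the while condition checks it.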
So I've invented my own function, get_n. The whole point being, now I have this abstraction of a function whose sole purpose in life is to get me not just an integer but one that is zero or positive and not negative. If I open my terminal window, clear the mess from before, and run make cat and then ./cat: what's n? 3. I'm now back in business. And again, we've essentially translated from Scratch last time into C this time exactly how we might modularize the code, abstract away these lower-level details, and ultimately create our own functions that, as before, can take arguments, but in this case have not a side effect but rather a return value. All right. So as you walked in, we had a little walkthrough of Super Mario Brothers playing, a side-scrolling game from yesteryear in which Mario would jump up and down, move left and right, try to collect coins, and make it to the end of the level. There are a lot of obstacles throughout this kind of game, whereby the world might look a little something like this: there's a pit that Mario's got to jump over, and then there are these coins hidden, typically, behind these question marks that he can jump up and hit with his head to accrue points. Now, we're not going to do anything graphical just yet; we're leaving graphics behind for now in the form of Scratch. But with C, we can implement some of these ideas. For instance, if I were to write code to generate just this row of four question marks, I dare say there's a bunch of ways we can do it. In other words, let's see if we can use all of today's building blocks to start implementing our own tiny version of Super Mario Brothers in a file called, say, mario.c. So let me open and clear my terminal window and run code mario.c. And let's just try to do something super simple, like print four question marks in a row. Well, to do this, I need printf, so I'm going to include stdio.h.
I'm then going to do int main(void); more on that next time. And inside of main, my default function that, as before, automatically gets called for me, I'm going to write the simplest possible implementation: just print out four question marks like that. So no need per se for a loop just yet, but I think we can go down that rabbit hole too. Let me go down into my terminal window, make this version of mario, then ./mario, Enter. And voila, we have a very black-and-white, textual version of four question marks in the sky. Now, I'm kind of cheating here by just hard-coding four question marks. What if I wanted not four but three or five or some other number? Well, we could do that with a loop too. So let me change this code and do something like this: for int i equals, say, 0, i less than, say, 4 for now, i++. Then inside of this loop I can print out one question mark at a time, semicolon. Now let me go back to the bottom, make this version of mario, ./mario, Enter. And voila, it's not actually correct this time. So why am I getting a column instead of a row with this change? Yeah? >> Yeah. So I foolishly included the backslash n after each question mark. Okay, so that seems like an easy fix. Let me get rid of that, recompile mario, and rerun mario. And now, so close, but now I've just done something stupid: I need the backslash n. So I think I do want it here. Or what do you propose instead? >> Yeah, I should really put the backslash n outside of the loop. So once I'm done printing all of the question marks, then I print the backslash n. And that's fine, even though we haven't seen this before: backslash n is an escape sequence that you can certainly print by itself. So I print quote unquote backslash n outside of the loop, below those curly braces. Now if I do make mario, ./mario, I get the four question marks in a row as well as the newline at the very end.
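Here is a sketch of that loop-based row, wrapped in a function. The name print_row and its width parameter are assumptions for illustration; the lecture writes the loop directly in main with 4. The function adds up printf's return values (the number of characters each call printed), which makes it easy to confirm that exactly one newline comes after the loop.

```c
#include <stdio.h>

// Prints `width` question marks on one line, then a single newline.
// Returns the total number of characters printed.
int print_row(int width)
{
    int printed = 0;
    for (int i = 0; i < width; i++)
    {
        printed += printf("?");
    }
    // The newline goes after the loop, so it's printed once, not once per '?'.
    printed += printf("\n");
    return printed;
}
```

Calling print_row(4) prints ???? followed by one newline.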
So again, kind of a little baby exercise, but demonstrative of how you can take a few different techniques, a few different building blocks we've used, and compose a correct solution to what a moment ago was a brand-new problem. Well, let's try another. Later on in Super Mario Brothers, when you go into sort of the underground world, or rather, it's still above ground, you see a column of bricks like this that he has to jump over. So how might we make a column? Well, we kind of had that solution already. In fact, if I go back to VS Code here and just change this version of mario, I think we can design this thing pretty simply: the same loop, but with i less than 3, and I do want to put the backslash n at the end there. make mario, ./mario. And albeit textual, I've got my column of three... let's see, I don't want question marks. Let's make this a little better; maybe we'll use the hash symbol, because that kind of sort of looks like a square. So make mario, ./mario. Okay, now we're back in business. But let's make it more interesting by going into Mario's underground now. Here's the third and final Mario problem, whereby we want to implement this 3x3 grid of bricks circled here. This one's interesting because we've never done something in two dimensions. I did horizontal and I did vertical, but we haven't really composed those ideas into one. So let me think a little harder this time about how I can print out row, row, row. And this is where, if you have in your mind's eye any familiarity with old-school typewriters, it's the same idea: you want to print a row of bricks, then go back to the beginning, a row of bricks, then go back to the beginning, and a row of bricks. And that's what printf has always been doing for us: it prints line by line by line of text. It's not jumping around. So we can leverage that, perhaps, as follows. Let me go into my main function here.
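For contrast with the row, here is a sketch of the column. The function name print_column and its height parameter are hypothetical; the key difference from the row is that the newline now belongs inside the loop, giving one line per brick.

```c
#include <stdio.h>

// Prints a vertical column of `height` bricks, one '#' per line.
// Returns the total number of characters printed.
int print_column(int height)
{
    int printed = 0;
    for (int i = 0; i < height; i++)
    {
        // Newline inside the loop: each brick gets its own line.
        printed += printf("#\n");
    }
    return printed;
}
```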
If I want to print out something two-dimensional, let me think about it as rows and columns. So maybe I could do this: for int i equals 0, i less than 3, i++. Why? Well, I want to do something three times. Even if I have no idea where I'm going with this solution, I at least want to do something three times, like three rows of text. But how about this: on each row, what do I want to do? I want to print out three things. So I could steal this idea, like for int i equals 0, i less than 3, i++, and then inside of this loop, just print out one brick at a time. No newlines yet, one brick at a time. But there is a bit of a problem here. It's correct to nest loops in this way; it's totally fine to have an outer loop and totally fine to have an inner loop. But I probably don't want the inner loop's variable competing with the outer loop's variable by giving them the same name. And the fix is easy: it's pretty conventional in code, when you want another integer and can't use i because you've used it already, to use j. So using i and j and k is generally fine. If you're using l, m, n, o, at that point you're probably doing something wrong. There's no hard line, but at some point it gets ridiculous and you should be coming up with better variable names. But i and j, maybe k, is fine. So now what's really happening? Let me suppose that this is my "for each row", and this is my "for each column, print one brick". Now, this isn't quite correct, but let me go ahead and make this version of mario, ./mario, and ah, now there's what? One, two, three... there are nine bricks there. So I'm close, right? It's supposed to be 3x3, nine total. What do I want to do, though, to get this just right? Yeah, over on the left. Yeah. On what line number would you put it, or after which line? Where would I put the newline? Because I don't think I want to put it here, or I'm going to get myself into trouble as before. How about in back? >> After the what? >> After 13. Yeah.
So after I finish printing each brick in the row, column by column from left to right, I'm going to print out, I think, a single newline here, nothing else. And now if I open my terminal and run make mario again, ./mario, now we've got it. It's not a perfect square like this one is, because the hashes are kind of more vertical than they are horizontal, but it's pretty darn close. The takeaway here being: you can certainly nest these kinds of ideas and compose them. And honestly, i and j are maybe making this more confusing than necessary. I could just give these better names, like row, and then maybe col for column, or I can spell out column if that's clearer, just to make clear to myself, to my TA, to my colleagues what exactly these variables represent. And indeed, like an old-school typewriter, the outer loop is handling row by row by row. But each time you're on a row, you first want to do column, column, column. And that's what, logically, the nesting is achieving. And again, if I do make mario, ./mario, all I've done is change variable names; it has no functional effect beyond that. Now, this is a little more subtle, but there is a bit of duplication in this program, a bit of magic. Does anyone want to conjecture what still could be improved here? What is maybe rubbing you the wrong way? >> Yeah, I've hard-coded the 3 here and here. It's not a big deal; it's an in-class exercise. Who really cares if I'm just manually typing 3? But if I want to make this square bigger and bigger over time, I'm going to have to change it in two different places. And as I've conjectured last time and today, eventually that's going to come back and bite you. You're going to do something stupid, or a colleague isn't going to realize you hard-coded 3 in multiple places. It's just bad design. So how could we fix this?
Well, we could just declare a variable like n, set it equal to 3, and then use n in both places. And that's pretty darn good; it's better because now we're reusing the value. But we can do one better. It turns out in C, and in many languages too, there's the notion of a constant, for when you want to store something in a variable but signal to the compiler that this value should never change. Better still, you want to prevent yourself, a human, not to mention a colleague, from accidentally changing this value. So you can declare it to be constant, or const for short. If I go back into VS Code on line 5 now and say const int, that means n is an integer that has a constant value. If I do something stupid later in my code and try to set n equal to something else, the compiler won't let me. It will protect me from myself. So it's just a slightly better design as well. All right, questions on any of these Mario examples, the first of our sort of real-world problems, albeit simplified textually? All right, let's focus lastly on things we can't really do well with computers, namely some of their limitations. Here is a cheat sheet of some of the operators we've seen thus far. We've played with these in comparisons and in doing some addition or the like, but here we have addition, subtraction, multiplication, division, and the modulo operator, which is essentially the remainder operator, which you can compute with a single operator like this. Let's use some of these to make our own calculator and see what this calculator can and can't do for us. So back here in VS Code, let me open my terminal and create a program called calculator.c. And in this program, let's do something super simple initially that just adds two numbers together. So let's include first cs50.h so we can use our get functions.
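Putting the grid together: here is a sketch of the nested loops with descriptive names and a const size, in the spirit of the lecture's final version. Wrapping it in a function named print_bricks that returns the number of characters printed is an assumption for illustration; the lecture's code lives directly in main.

```c
#include <stdio.h>

// Prints the 3x3 block of bricks and returns the number of characters printed.
int print_bricks(void)
{
    // const: the compiler rejects any later attempt to assign to n.
    const int n = 3;
    // n = 4;  // uncommenting this line makes the program fail to compile

    int printed = 0;
    for (int row = 0; row < n; row++)
    {
        // Inner loop: one brick per column, all on the current line.
        for (int column = 0; column < n; column++)
        {
            printed += printf("#");
        }
        // End of the row: move to the next line, like a typewriter's carriage return.
        printed += printf("\n");
    }
    return printed;
}
```

Changing the size of the square now means editing exactly one line, the declaration of n.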
Then let's include stdio.h so we can use printf. Let's just copy-paste our usual int main(void). And inside of main, let's declare a variable x and set it equal to get_int, asking the user "What's x?" Then let's declare another variable y, set it equal to get_int, and ask the user "What's y?" Then let's do something super simple, like give me a third variable. Heck, we'll call it z. Set it equal to x + y. And then lastly, let's just print out the sum of x + y. So this is a super simple calculator for addition of two numbers: printf, quote unquote. What's the placeholder going to be? Well, it's not %s; that was for strings earlier. What's the placeholder to use for an integer? %i, backslash n. And what do I want to substitute for that placeholder? Just z in this case. We haven't quite done this before, but again, it's just the composition of some of our earlier ideas. I can go ahead and make calculator, Enter, ./calculator, Enter. What's x? 1. What's y? 2. And indeed I get 3. So not a bad calculator; it seems to be working correctly. But it's maybe not the best design. It's generally frowned upon to create a variable like z if you're only going to use it a moment later in one place. Why are you wasting my time creating a variable just to use it once and only once? Sometimes it's fine if it makes your code more readable or clearer. In fact, it might if I called it sum; that's arguably a net positive, because I'm making clear to the reader that it's the sum of two variables. But even then, I'm quibbling. I could just get rid of that third variable altogether and, heck, just do x + y right here. That's totally fine and reasonable, especially since it's still a pretty short line of code. It's not hard for anyone to read, so it feels like a reasonable call. But this hints again at my comment on design being subjective: there are no hard-and-fast rules here.
Some of the TAs might disagree with me, but this feels fine. It's readable, which is probably the most important thing, ultimately. Let's make calculator, ./calculator, Enter, 1, 2, and we still get 3. So the code is still working. As an aside, if you're starting to wonder how I type so fast, sometimes I'm kind of cheating with autocomplete. If I want to run a program called calculator and calculator.c exists, I can type just the first few letters and hit Tab to autocomplete the rest of the file name, if it happens to exist there. Better still, if I want to go back to previous commands I've typed, I can use my up and down arrows to go through my history. If I go up, up, you'll see all of the recent commands I typed, and that saves me time, too. So just little keyboard shortcuts that speed things along. All right, well, let's do something like this: not just addition, but some multiplication. How about we prompt the user not for two numbers but just one initially, x, and multiply x by 2? I would do x * 2, the asterisk being the multiplication operator in C. Let's make this version of calculator, ./calculator. And now, what's x? Let's do 1. 1 times 2 is 2. Let's do this again with 2: 2 times 2 is 4. Again with 3: 3 times 2 is 6, and so forth. That's fine; it seems to work. But maybe let's implement a recent meme from the past year or two. Let's see if you recognize it as we go. I'm going to get rid of this code altogether, and inside of my calculator, I'm going to do something like int dollars equals 1 by default. Then I'm going to deliberately induce an infinite loop, just for demonstration's sake. Then I'm going to get a character from the user using get_char, which gets a single character, and say something like this.
How about I tell the user "Here's $%i. Double it and give it to the next person?", with a US dollar sign before the %i, if you're familiar with that one. I'm going to prompt them for a yes/no answer, but I'm going to plug in the current number of dollars so they know what they're wagering. Then below this, I'm going to say: if the character the human typed in equals equals 'y' for yes, then I'm going to do dollars *= 2, which, recall, was our shorthand notation for doubling something. In this case I could more pedantically say dollars = dollars * 2, but again, I can save some keystrokes and do dollars *= 2 instead. There's no ++-style shortcut for multiplication, no asterisk-asterisk trick; you have to do it this way, minimally. However, if the user does not want to double it and give it to the next person, let's do an else and just break out of this infinite loop altogether. But notice what I've deliberately done in get_char: similar to printf, I have included a placeholder. We implemented get_char and get_int and get_string just like printf, in that you can pass in placeholders and plug in values. Why? Well, again, for the meme's sake, I want to be able to tell the user how much money I'm about to hand them when I ask, "Do you want to double it and give it to the next person?" I want them to see the number. The dollar sign is just because we're talking about dollars; the %i is because we're talking about integers. All right, if I didn't mess this up, let's make this version of calculator, or meme. So far so good. ./calculator, Enter. Here's $1, which was the initial value of my dollars variable on line 6. Double it and give it to the next person? All right, y. Here's $2. Double it and give it to the next person? Okay. Okay. Okay. Okay. Okay. I'm going to do it faster. It's getting pretty good. You can see the power of exponentiation. It's getting pretty high. Let's keep going.
Keep going. Lot of dollars. Too far. That does not happen in the memes. What happened here? What's going on? Yeah, what do you think? >> Exactly, good intuition. Because the computer only has a finite number of bits allocated to each integer. I hypothesized earlier that it's usually 32 bits, maybe 64 bits, but it's finite, which means you can only count so high, and with 32 bits that's roughly 4 billion. Or, again, since an integer by default can be negative or positive, it's roughly 2 billion, and that's pretty close to what we were getting here. In fact, we overflowed the integer in memory. Integer overflow is a term of art whereby you overflow an integer by trying to store too big a value in it. And the reason for this, to make it clear: this is a piece of memory inside of a laptop or a desktop or some other device. In these little black chips are a whole bunch of bits, or really bytes, that can store information electronically. But they allocate those bits in units of 8, maybe 16, maybe 32, maybe 64, but finitely many per value. And whether we're using 32 or 64, you can only count so high if you have a finite number of bits. We've seen this problem even on a small scale with our light bulbs last week. If we have a three-digit number, as represented by three physical light bulbs or three tiny transistors in the computer, I can count from 0 to 1 to 2 to 3 to 4 to 5 to 6 to 7. If I want to count to 8, though, I need a fourth bit. But as the red suggests, if you don't have a fourth bit, for all intents and purposes that number is just 0. Or, as an aside, depending on how you're representing your number, sometimes a leading 1 indicates that the number itself is negative, which is why in VS Code we actually saw both symptoms. First we went negative because we wrapped around logically, much like that carried 1 resulted in our getting back effectively to 0, and then we did indeed end up on 0 ultimately.
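The light-bulb example can be simulated with unsigned arithmetic. This sketch (the function name increment_bits is hypothetical) keeps only the lowest `bits` bits of a counter, the way three bulbs can hold only three bits, so counting past the largest value wraps to 0. It uses unsigned int on purpose: in C, wraparound is well defined for unsigned types, whereas overflowing a signed int, like the lecture's dollars, is technically undefined behavior, even though in practice it often wraps to a negative number, as we just saw.

```c
// Adds 1 to `value` as if only its lowest `bits` bits existed.
// Assumes a 32-bit unsigned int and 0 < bits < 32.
unsigned int increment_bits(unsigned int value, int bits)
{
    // Mask with `bits` ones, e.g., bits == 3 gives binary 111, or 7.
    unsigned int mask = (1u << bits) - 1u;
    // Any carry out of the top kept bit is simply discarded.
    return (value + 1u) & mask;
}
```

With three bits, 5 becomes 6, but 7 (binary 111) becomes 0, because the fourth bit that would hold the carry doesn't exist.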
So how can we chip away at this? Well, a couple of solutions, perhaps. Let me close my terminal window here, and instead of using an int, let's just kick the can down the road and use a long, which is 64 bits, so at least we can give away even more money in this scenario. I can't use %i and need to use %li now for a long integer. And at this point, if I go back to VS Code's terminal window... oh, and I quit that program by hitting Ctrl-C quickly. Now I'm going to make calculator again, ./calculator, and just keep hitting y. Because I'm using a long now, and thus 64 bits, if I do this long enough, it's going to get crazy high, much, much higher than before. High enough that I'm not going to keep hitting y, Enter, because we're never going to hit the boundary. But eventually, especially if I did this in a loop automatically, it would certainly... oh. Oh, okay. I guess exponentiation works fast. Okay, so it did work. I didn't think I was going to hit it enough times, but the same problem happened again. We overflowed this long integer even using that many bits, because I was talking so long that I kept hitting y enough times to overflow even that. So that too was a problem, and this truly happens in the real world. Pictured here is a Boeing 787 from a few years back, long before all the more recent problems with Boeing planes, whereby after 248 days of continuous power... which is kind of a thing in the aviation industry: time is money, and generally they want the planes in the air as much as possible, which means they want them powered on as much as possible, which means they don't turn them off at night. They keep them going and flying.
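Here's a sketch of why a long buys more doublings but still runs out. Rather than multiplying until something breaks, it checks against the limits in limits.h before each doubling, which avoids signed overflow entirely. The function names are hypothetical, and the exact counts depend on the platform: with a 32-bit int and a 64-bit long, as on typical Linux and macOS systems, they come out to 30 and 62.

```c
#include <limits.h>
#include <stdio.h>

// Counts how many times $1 can be doubled while still fitting in an int.
int doublings_as_int(void)
{
    int dollars = 1;
    int count = 0;
    // Check before multiplying: doubling anything above INT_MAX / 2 would overflow.
    while (dollars <= INT_MAX / 2)
    {
        dollars *= 2;
        count++;
    }
    printf("int stops at $%i\n", dollars);
    return count;
}

// Same idea with a long; note %li rather than %i when printing a long.
int doublings_as_long(void)
{
    long dollars = 1;
    int count = 0;
    while (dollars <= LONG_MAX / 2)
    {
        dollars *= 2;
        count++;
    }
    printf("long stops at $%li\n", dollars);
    return count;
}
```

Either way, a fixed number of bits means a fixed ceiling; switching to long only moves it higher.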
After 248 days... the New York Times reported a few years back that a Model 787 airplane that has been powered continuously for 248 days can lose all alternating current electrical power due to the generator control units (GCUs) simultaneously going into failsafe mode. This condition is caused by a software counter internal to the GCUs that will overflow after 248 days of continuous power. Boeing was, at the time, in the process of developing a GCU software upgrade that would remedy the unsafe condition. So literally what this means is that the power to these planes would just shut off if the planes were on for more than 248 days at a time, and keeping planes powered on continuously was common practice. Why was this actually happening, and what was the solution? Well, the short-term fix, because it took a while for Boeing to fix this, was what? What would you do if the symptom is that the plane shuts off mid-flight after 248 days? Yeah? >> Turn it off and back on. Literally turn it off and back on again, much like you've probably been taught with your phones and computers and any other electronic devices that somehow freak out on occasion. Reboot the plane. Now, why does that work? Well, anytime you reboot a phone or a laptop or a plane, all of those variables get reset to their default values, which, if they're set on the first lines of code, like in some of my examples, means back to zero again, since the code is executed from top to bottom. So this effectively solved the problem, and when they finally rolled out a fix, you didn't have to do that anymore. But the source of the problem is essentially that they were probably using 32-bit integers that also allowed negative values, so they had 31 bits at their disposal to count positive numbers.
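The 248-day figure can be checked with a little arithmetic. A signed 32-bit counter has 31 bits for non-negative values, so it overflows after 2^31 ticks. This sketch (the function name days_until_overflow is hypothetical) converts that many ticks into days for a given tick rate: at 100 ticks per second, that is, hundredths of a second, it comes to about 248.6 days.

```c
// Days until a signed 32-bit counter overflows, if it starts at 0 and is
// incremented `ticks_per_second` times every second.
double days_until_overflow(double ticks_per_second)
{
    const double max_ticks = 2147483648.0; // 2^31: the first count that no longer fits
    const double seconds_per_day = 60.0 * 60.0 * 24.0;
    return max_ticks / ticks_per_second / seconds_per_day;
}
```

Counting in tenths of a second, by contrast, would take about 2,485 days to overflow, which is why the 248-day figure points to a counter of hundredths of a second.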
And 248 days is roughly 2^31 hundredths of a second, which means once you count in hundredths of a second for 248 days, you overflow a 32-bit signed integer, and the power would effectively shut off because something ended up going to zero. So there was a lot of marketing speak, or technical speak, in that, but it boiled down to a simple integer overflow. There's a historical bug in Pac-Man, too, if you've ever played it in any of its forms, whereby you can play up to level 255, but because there was a missing if condition that checked what level you were on, you could accidentally garble the screen if you were amazing at Pac-Man, because they too would overflow an integer, and random characters would end up appearing on the screen. So it's sort of a badge of honor to actually hit level 256 in this way because of this bug. But there are yet other issues we can see here. If you don't mind, we might go a couple of minutes over, but let me just demonstrate what these examples can do for us. Let me revamp my calculator as follows, clearing my terminal window after hitting Ctrl-C to kill that program. Let me get rid of all of this meme code, scroll down to the inside of main, and just do a couple of things like this: int x equals get_int, quote unquote, "What's x?" Then int y equals get_int, quote unquote, "What's y?" Then let's print out just x / y: printf with %i, backslash n, and x / y. This would seem to be a calculator now for division, which I can make as before. And actually, sorry: "missing terminating" character. Oh, sorry, I'm missing a double quote. That was an unintended bug. So if I make this calculator and do ./calculator, type in 1, type in 3, I get 0, which is weird. What if I do 2 and 3 instead? It's 0 instead of roughly 0.67. What if I do 3 and 3? Well, that curiously works.
But if I do something like 4 and 3, which should be 1.33, that too doesn't seem to work: I get 1. So there's this other issue in computing, when you have finite numbers of bits, known as truncation, whereby even when you're trying to do floating-point math, that is, with a decimal point, if you're using integers you're going to throw away everything after the decimal point unless you're explicitly using the right data type. And we saw an allusion to this earlier. If I actually go in now and change my values from integers to floats, change the %i to a %f, and remake this calculator, now I can do 1 / 3 and actually get back 0.333333 as the response. But there's another issue latent here, which happens too in the real world, whereby I'm going to tweak this %f to be a little arcane. It turns out you can tell C how many digits you want to show after the decimal point, how much precision, if you will, by just using a dot and then a number, like 50, arbitrarily. And contrary to what you might have learned in grade school, this calculator, ./calculator, 1 divided by 3, would seem to think the answer is not 0.3333 with infinitely many threes: there's all this seemingly random stuff happening at the end. Long story short, this is because computers only use finitely many bits, even to represent floating-point numbers. And since there are infinitely many real numbers, you can't possibly represent every possible floating-point value. So we're essentially seeing the closest approximation of 1/3 that a float can store. And this too happens quite a bit in the wild. There's really no solution to this other than throwing more bits at the problem, using a double instead of a float, or at least somehow trying to detect and catch this. That, then, is what we'd call floating-point imprecision. But to tie this together and sort of induce a bit of fear for the coming years: these things happen all of the time.
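Here is a sketch contrasting truncation and floating-point imprecision. The lecture switches its variables to float; real_divide shows an alternative, assuming you'd rather keep int inputs: cast one operand to double before dividing. All three function names are hypothetical. Note that neither float nor double stores 1/3 exactly; double just gets closer because it has more bits.

```c
#include <stdio.h>

// Integer division: C discards everything after the decimal point.
int int_divide(int x, int y)
{
    return x / y;
}

// Casting one operand to double makes C do floating-point division instead.
double real_divide(int x, int y)
{
    return (double) x / y;
}

// Prints 1/3 as a float and as a double with 50 digits after the decimal point,
// and returns how far apart the two approximations are.
double third_float_vs_double(void)
{
    float f = 1.0f / 3.0f;
    double d = 1.0 / 3.0;
    printf("float:  %.50f\n", f);
    printf("double: %.50f\n", d);
    double difference = (double) f - d;
    return difference < 0 ? -difference : difference;
}
```

The two printed lines agree only for the first several digits, after which each shows the nearest value its own number of bits can represent.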
Back when I was finishing school, there was the so-called Y2K problem, or year 2000 problem, whereby for decades computers had been using not four digits to represent years but just two, because it was convenient. It was more efficient because you used half as much memory to represent, say, the year 1999 as just 99. Of course, when the year rolled around from 1999 to 2000, if you only had those two digits in memory, you might confuse 2000 with 1900, which was the presumption if you're only storing two digits. So we screwed that up. Thankfully, the world scrambled, and if you read up on Wikipedia and news articles from the time, everyone thought the world might very well end, but it didn't. So you'd think we'd have learned our lesson. Unfortunately, another such problem is coming up in the year 2038, whereby historically, since the 1970s, computers have generally used 32-bit integers to keep track of the date and time by counting how many seconds have passed since January 1st, 1970. All of the math is relative to that date because that's when computers were really starting to come onto the scene, if you will. Unfortunately, there are only about 4 billion values you can count to, or about 2 billion if you also need negatives, starting from January 1st, 1970. And so on January 19th, 2038, we will overflow a 32-bit counter. And suddenly, if this problem is not fixed by you or other people before then, our computers and phones and other devices may very well think it's December 13th, 1901. So there are solutions to these problems, and CS50 is all about empowering you with solutions to these problems. But if you'd like to scan this here code, this will add that date to your Google calendar or your Outlook calendar. Keep an eye on it. That, though, is week one for CS50. Problem set one will be in your hands soon. We'll see you next time. One fish. Two fish.
Red fish. Blue fish. >> Congratulations. Today is your day. You're off to great places. You're off and away. >> It was a bright, cold day in April, and the clocks were striking 13. Winston Smith, his chin nuzzled into his breast in an effort to escape the vile wind, slipped quickly through the glass doors of Victory Mansions, though not quickly enough to prevent a swirl of gritty dust from entering along with him. >> All right, this is CS50, and this is week two. And if we could, after this dramatic reading, a round of applause for our volunteers. So we can now take for granted from week one that we have a new way to express some of the ideas we first explored in week zero, like functions and conditionals and variables and the like. We're now doing in C what we used to do in Scratch. Today, what we're going to start to focus on is real-world problems, now that we can take for granted that we have that expressiveness. We have some tools in our toolkit and can actually start to solve some real-world problems, or at least problems representative thereof. In particular, the real-world problem we're going to start with today and this week is that of reading levels. Odds are, when growing up, you read at a certain level based on your age. Maybe it was a first-grade level or a fifth-grade level or a 10th-grade level or the like. And that was a function of just how comfortable you were with the words in the book or on the screen you were reading. What you've just heard, thanks to our volunteers, are three different reading levels, one per volunteer. In fact, why don't we hear them again and be a little more thoughtful this time in assessing at what reading level each of your classmates is reading. So let's start with Leah, if you'd like to introduce yourself first. >> Hi, I'm Leah. I'm a first-year in Holworthy. And here is my little thing. One fish, two fish, red fish, blue fish.
>> So, at what reading level would you say Leah reads, based on her recitation thereof? Yeah, in the front. >> Kindergarten. >> Kindergarten. Okay. So, a fairly young age. And what makes you say kindergarten? >> She's speaking in very short phrases without much complexity. >> Okay. Very short phrases without much complexity. And indeed, according to one scientific measure that we'll explore in this week's problem set, we would say that Leah reads before grade 1, so kindergarten would indeed be apt. But welcome to the stage here. Let's move on now to Maria, if you'd like to introduce yourself. >> Yeah. Hi, I'm Maria. I'm in Stoughton, thinking of applied math. Congratulations. Today is your day. You're off to great places. You're off and away. >> Another familiar phrase, perhaps. At what reading level would you say Maria is? Yeah, over here. >> Third grade. >> And what makes you say second or third grade? >> Okay. >> So now we're starting to introduce complexities like rhyming and a bit more substance to the quote. And indeed, based on that same measure I described earlier, which will involve a mathematical function that somehow analyzes what Maria just said, we would conclude that she read at a third-grade level, or grade 3. Finally, Omar, if you'd like to introduce yourself and read yours once more. >> Okay. So, hi everyone. I'm Omar. I'm a freshman at Earl, but thinking of doing CompSci, and this is my reading. It was a bright cold day in April, and the clocks were striking 13. Winston Smith, his chin nuzzled into his breast in an effort to escape the vile wind, slipped quickly through the glass doors of Victory Mansions, though not quickly enough to prevent the swirl of gritty dust from entering along with him. >> All right, that sort of escalated quickly. What reading level is Omar at, would you say? Someone else. What might you estimate? Yes, right here in the front. >> Eighth grade.
>> Okay, eighth grade. And what made you say that? >> More complex sentences, more complex words. >> And indeed, according to that same measure, this full paragraph of text, which has even more grammar when you see it there on the screen, would be said to be at grade 10 because of that added complexity. So with that said, we're going to need to be able to crunch these numbers somehow to determine, given a body of text, at what reading level someone is. But in order to do that and apply any metrics to a body of text, we're going to need to represent that text in memory, using something like strings from last week. Last week, though, we could really just print strings out or display them wholesale on the screen. I think we're going to need to break down these various texts, and others like them, at a finer-grained level. And indeed, among the goals for today is to explore exactly that, and also to take the proverbial hood off of the car and look underneath at how the computer is actually working, how things like strings are actually functioning. So if you could join me one last time in a round of applause for our volunteers. Thank you so much for helping out. Thank you, guys. Thank you to Maria as well. So among the goals for today, beyond exploring a representative problem like reading levels, is another problem that is even more important and more omnipresent than reading levels, namely cryptography: the art of scrambling information, or specifically encrypting it, so you can send secure communications. Now, you increasingly take for granted nowadays that when you send a text message or perhaps an email, or check out online with a credit card, somehow or other your information is secure. And over the coming weeks, we're going to explore to what extent that is actually true, and why or why not.
Now, with cryptography, similarly, we want to be able to send messages securely: if I send a message to you, I don't want anyone else in the room to be able to figure out what I've said, even if they physically intercept that message, which is all too possible in a digital world. We're going to need to come up with metrics and mechanisms for actually scrambling information in a reversible way, so that I can write my message and somehow scramble it, and you can receive that message, even after it's passed through many other hands, and unscramble, or decrypt, that same message. So for instance, here on the screen is a message, a fairly simplistic one, that has somehow been encrypted. And we'll see by the end of today and by the end of this week that this encrypted message (and there's a bit of a tell at the end there) actually decrypts to "This is CS50." But why is going to be the underlying question, and what additional tools do we need in our toolkit in order to do that? Another word on tools. So up until now, you've probably experienced some bugs, whether in Scratch or even more so in C. In fact, don't feel too bad if the very first program you wrote in C didn't even work. You couldn't even make it, or compile it, until you went back and fixed some of the code you had written. Well, it turns out that bugs, mistakes in programs, are ever so commonplace. We've already provided you with tools like the virtual rubber duck at CS50.ai, also embedded into VS Code at CS50.dev, of whom you can ask questions along the way. But among the goals today is to give you some lifelong tools for debugging software yourself when you don't have a duck nearby, when you don't have a TA nearby, let alone any humans at all. So with debugging, there are going to be a number of techniques we can use, all toward the end of finding and removing bugs, or mistakes, from our software.
And perhaps the person best known for having popularized this term bug is Dr. Grace Hopper, pictured here, who was a rear admiral in the Navy and one of the original programmers of the so-called Harvard Mark I, a very early mainframe computer. If you wander across the Charles River over to the Science and Engineering Complex here at Harvard, you can actually see part of it still on display in the lobby. It was succeeded by the Harvard Mark II. And on the Harvard Mark II, Dr. Hopper and her team were known for having put this note in their logbook after having done some number crunching on the system there. If we zoom in, they had found a problem with the computer one day whereby there was literally a bug, a moth, inside the circuitry of the computer. And as was written here, "First actual case of bug being found." Ever since then, we've used the words bug and debugging ever more when it comes to finding and eliminating problems in our code. So let's start with just that. In fact, let me go over to VS Code and deliberately make some mistakes together that might very well be reminiscent of some of the mistakes you've accidentally made thus far, and along the way give you all the more tools for finding and solving those problems in your own code, as opposed to having to ask someone else, be it virtual or physical, for help. Let me consciously, in VS Code, create a program known to be buggy called buggy.c. In this program, let's start with some fairly familiar code. I'm going to start just like we did last week with int main(void). More on that today before long. Inside of my curly braces, I'm going to say printf("hello, world"). That's it. Now I'm going to go back to my terminal window and run make buggy to make a program from that source code.
But before I do, odds are, even after just a week of this stuff, you can probably spot a few mistakes I've made, a few bugs. What do you see wrong already? Yeah? >> Include standard... >> I didn't include stdio.h, that so-called header file, which is important because it tells the compiler that I plan to use functions therein, like printf, which clearly I'm doing. So let me go in and include stdio.h. What else seems to be wrong here? Yeah. I'm missing a semicolon at the end of line five here, so I'm going to add that in. And this is subtle, and arguably not a bug but an aesthetic detail: what else have I arguably done wrong? Yeah, in back. >> Yeah, I forgot my backslash n, the newline character, to move the cursor to the next line so that when I get a new prompt, it's on a fresh line of its own. Again, more of an aesthetic, but certainly a pretty reasonable thing to do. So let me now run make buggy in my terminal window, and it indeed compiled. But up until then, had I not fixed those mistakes, I would have triggered a whole bunch of bugs, a whole bunch of error messages as a result. In fact, let's rewind in time, undo the fixes I just made, go back to the original form, and try running make buggy again. Enter. We'll see some scary-looking messages. Let me scroll up to the top of the output, where we see buggy.c and line three. That's where the problem is right now: error: call to undeclared library function 'printf' with type..., and then it starts to get a little more complicated, but I do see clearly that it's calling my attention to printf.
So hopefully at some point, if not last week then from this week onward, your instinct will be: ah, all right, I'm an idiot, I forgot the header file in which printf is actually declared. It's not a huge deal; it's going to come with practice. So that's how I might more intuitively know what the solution here might be. Now, here's another common mistake: I've just gone in and fixed that, but I did do something wrong, and hopefully none of you actually did this, because it's an annual FAQ. What did I just accidentally do wrong? It's not studio.h, it's stdio.h, so do kind of ingrain that one: standard input/output. The next bug, though, that I haven't yet fixed is that semicolon. So let me clear my screen and rerun make buggy. I should no longer see that first error message. But I now see another error message on line five: expected ';' after expression. All right, that one's pretty explicit, so I'm going to go ahead and fix it. But notice that up until now, my code wouldn't compile because of those two errors; the compiler stopped and showed me these error messages instead. At this point, though, if I run make buggy, Enter, it did in fact compile. And yet it's arguably still buggy, because when I run ./buggy, I get my prompt on the wrong line. So this is a distinction now between a syntax error, a programming error that outright stops my program from compiling, sort of a dealbreaker, versus something that's more of a logical error: I actually meant to move the cursor to the next line. So there are different types of errors in the world, as we're seeing here. Of course, if I rerun make buggy and then ./buggy, we're back in business, with this displaying exactly what I intended. All right. Well, let's modify this to look a little more like something else from last week. Recall that last week I started to get someone's name more dynamically.
So I said something like name = get_string(...), and that was a function we introduced. And I might have said something like this: "What's your name? ", with a space just to move the cursor over. I know now I definitely need to end my thought with a semicolon. I could try to compile this with make buggy now, and I'm seeing a different error message altogether that you might not have seen yet: buggy.c, line five, error: use of undeclared identifier 'name'. What is the mistake I've made now? Why does it not know? >> You didn't declare the type. >> Yeah, I forgot to declare the type of this variable, which, for those of you with prior programming experience, is not something you have to do in some languages, like Python. But in languages like C, C++, Java, and others, you do in fact need to explicitly tell the compiler that you want to instantiate a variable, that is, create a variable in the computer's memory, by telling it its type. And it's not going to be an int, because I don't want an integer in this case, of course. I want text, which we now know to be called a string. All right, I think this fixes that bug. So let me do make buggy again. And hopefully... huh, a fatal error this time, again indicating that my code did not compile because of line five. I still have an error, but this time it says: use of undeclared identifier 'string'; did you mean 'stdin'? So this is a bit of a red herring. The compiler is trying to be helpful by asking whether I meant stdin, but I don't think I actually do; that just happens to be the most similar-looking word in the compiler's own memory. What's the actual mistake I've made here? Yeah? >> You didn't include the CS50 library. >> Yeah, I didn't include the CS50 header file, because string, recall, is a feature of the CS50 library, as are get_string and get_int and others. So the solution here is indeed to go up here and, just to be nitpicky, I tend to alphabetize my header files.
It's not strictly required technically, but stylistically I find it nice to be able to skim the header files alphabetically to see if something is there or not. I can include cs50.h in addition to stdio.h, and it's in that file, cs50.h, that not only is get_string declared, so that the compiler knows it exists, but so, it turns out, is the word string. So this is a bit of a white lie, and this is something we do in the early weeks of the class. We dug up these old training wheels from a bicycle, the whole idea being to keep you upright and spare you all too much complexity early on. The point of these training wheels, in the form of the CS50 library, is to let us ignore what a string really is for just another week or two, after which we will peel back that layer, take off those training wheels, and reveal to you what is actually going on. So for now, strings exist, but they exist because of the CS50 library. In a couple of weeks, they're still going to exist, but we're going to call them by a different name, as we'll eventually see. But everyone in the real world, every software developer, uses the word string, so this is a concept that exists; it is not CS50-specific at all. It's just that in C, the type string doesn't typically exist unless you make it so, as we have. All right. So I think now if I clear my terminal window and rerun make buggy, it should in fact compile. And if I run ./buggy, Enter, I should be able to type in my name. And now, voila: hello, world. So this is now not a syntax error, because I didn't screw up my code per se; it compiled. Everything is grammatically correct, so to speak, but logically, intellectually, this is not what I wanted. I presumably wanted it to say hello, David. So let's fix one final bug here. How do I fix this? On what line? How do I get it to say hello, David? Yeah. >> Yeah. On line seven, I need to use the string placeholder, the format code so to speak, %s.
And then one more thing, someone else? What do I do after this? Yeah, in back. >> Yeah, a comma, and then the name of the variable that contains the value I want to substitute in there, which is indeed name, though I could have called it anything I want. All right, so now make buggy, Enter, seems to have compiled again. Then ./buggy. Now I type in my name once more, and we're back in business. So over the course of these few exercises, clearly I meant to make most if not all of these bugs, these mistakes, but they demonstrate not only syntax errors, which are just going to stop the compiler in its tracks (you won't even be able to compile your code until you fix those things), but also that, even after that, there could be latent bugs that seem not to be there until you actually provide input and see what's happening at so-called runtime, when you're running the actual code. And so here's where it's no longer as easy as reading the error message and figuring out what it means, because no error message appeared on the screen when it said hello, world. We had to use our own human intellect and realize, okay, that's clearly not what I wanted. Had you run CS50's own check50 program on something like that, we could have told you that it's not correct by automatically assessing its correctness. But the compiler has no idea what you're trying to achieve logically; it only knows about the language C itself and the requisite syntax for writing and compiling code. So how could we go about solving logical problems in code? I would propose that we start to consider this here list, whereby when you want to find a logical problem in your code and better understand what's going on, or really what's going wrong, printf is going to be your friend. Up until now, we've used printf to literally print on the screen hello, David, hello, Kelly, or anything else.
But you can certainly use printf temporarily to print out stuff inside of your program that you want to better understand. And then, once you understand it and once you've solved some problem, you can delete those temporary lines of code, recompile, and move on. So let's use printf as a debugging tool in that sense. Let me go back over to VS Code and, in this same program, buggy.c, delete everything and start over with a different sort of bug. I'm going to include stdio.h at the top. I'm going to write int main(void) after that. And then inside main, I'm going to write a simple for loop that just prints out a stack of three bricks, like we saw in the world of Mario when Mario needed to, we claimed, jump over a stack of bricks. We want to print out just three of those at the moment. So I'm going to say for (int i = 0; i <= 3; i++), with less than or equal to because I want three of these. Then, inside of this for loop, I'm going to quite simply do printf("#\n"): a hash symbol to represent the brick, followed by a newline to move the cursor to the next line, and a semicolon to complete the thought. Now, I've deliberately made a stupid mistake here, but in the context of a simple enough program that we can focus on the debugging technique, not on the obscurity of the bug in question. Hopefully you'll spot the bug in just a moment, if you haven't already. When I do make buggy now and ./buggy, I don't get three bricks; I of course get one, two, three, four total. So there's a logical bug in this program, and odds are you can already spot what it is. But let me propose that this program is representative of a type of problem that you can solve a little more diagnostically, by poking around and really asking the computer, via printf, to show you what's going on.
And I would propose that one of the most helpful techniques in a situation like this, if you're trying to wrap your mind around why there are four bricks instead of three, is this: clearly the problem is related to the loop somehow, so let's look a little more thoughtfully at what the value of i is before we print out each of those bricks. I might literally do something like this temporarily: printf("i is %i\n", i);, printing right here and now the value of i just so that I can actually see it. Let me now go down into my terminal window, run make buggy again and then ./buggy, and I'll full-screen my terminal. Now I get some diagnostic information at the same time. So when i is 0, I get a brick. When i is 1, I get another brick. When i is 2, I get another brick. When i is 3, I get a fourth brick. So now I can see that, okay, my loop is working, but I'm going too far, too long. Now, I can do this even more succinctly, for what it's worth. I don't need a whole new printf statement. I could just go into my existing printf, put my %i there, and then maybe a space just to scooch things over, and print out i on that same line. If I now do make buggy and ./buggy, okay, now I'm seeing that I'm printing a hash, a brick, for each value of i: 0, 1, 2, and also 3. So the solution, of course, is that I shouldn't start at zero and iterate while i is less than or equal to three. The solution is: ah, I'm an idiot, I should have said less than three. Or, if I prefer to count starting at one like a normal person, I could have set i equal to 1 and then gone up to and through 3. But as I claimed last week, the canonical way, the most common way to do this, is to start counting at zero and go up to, but not through, the total value you have in mind. But there's going to be another technique that's worth knowing here.
Let me sort of abstract this away by whipping up a slightly better variant as follows. Let me delete this for loop and assume for the moment that, inside of main, I'm going to ask the user now for the height of a pyramid. I'm going to do something like this: int h = get_int(...), prompting the user for the height of this pyramid, or this wall. And then let's assume there exists a function called print_column that takes as input a number, h, which is how many bricks you want to print. Now, print_column does not exist yet. get_int does exist, but I don't have access to it yet. So let me not make the same mistake twice. What do I need to add at the top of this file? Yeah? >> CS50 header file. >> I need the CS50 header file, because I'm using the get_int function now, which again comes from our library, not from C. So let me include cs50.h. But print_column I can invent myself. So let me write void print_column(int height), with its input in parentheses; more on that in just a moment. And then I'm going to recreate the loop from before: for (int i = 0; i <= height; i++). So for now, I'm deliberately making that same mistake as before. Then, inside of this for loop, I'm going to print out a single hash and a newline to represent that there brick. So now main can use a function called print_column. It's going to pass in the value of h, and then the for loop in print_column is going to take care of printing this thing for me. So let me do this again: make buggy, Enter. So far so good. ./buggy. Let's put in a height. I'm going to manually type a height of 3, and I should see three bricks, but of course I'm still seeing four. Now, before we move on, let me hide my terminal and propose that it's stylistically bad to put anything other than your main function at the top.
But recall that if I move my helper function, print_column, below main (and it's a helper function insofar as I made it to help me solve another problem), I can't recompile and run my code now. Why? The compiler won't let me. Yeah. >> Exactly. When the compiler gets to line seven of my code, it's going to abort compilation because it doesn't know what print_column is. Why? Because I don't tell it what it is until line 10. And this was the only time I proposed that copy-paste is reasonable: highlight and copy the very first line of that function, and paste it above main with a semicolon. That's a so-called function prototype. It specifies the function's name, what its inputs are, if any, and what its output is, if any. More on these inputs and outputs later on. But now this is just a more complicated but more modularized version of the same program. Let me do make buggy. It still compiles. Then ./buggy, type in 3, and I still have that same bug. But the catch now is that my code has gotten more complicated. The upshot of my having abstracted away this idea of printing a column into a new function is that there's just more code now to debug. I could certainly go in there and start adding printfs, but at some point printf is going to be a very primitive tool, and you're going to waste more time adding printfs, recompiling your code, running your code, changing the printf, recompiling your code, running your code. It's going to get very tedious quickly when you have lots of lines of code on the screen. So can I actually step through my code line by line, maybe like your TA would in a section or a small class, walking through the code? You can, because another tool you have access to is called debug50. This is a CS50 command that will start an industry-standard debugger.
And a debugger is a piece of software, used in the real world, that literally lets you do that: debug your code by letting you slow down or even pause execution and walk through your code line by line. The only reason we call it debug50 is that in VS Code it's a little annoying to start the debugger, so we automated the process of starting it; everything thereafter has nothing to do with CS50 and everything to do with real-world software engineering techniques. So how do we use this? Let me go back to VS Code and propose that I want to step through this code line by line, just like we might at a whiteboard in a smaller class, to figure out why I'm getting four hashes instead of three. Well, in my terminal window, I'm going to run debug50 ./buggy. debug50 is the command, and it needs to know what program I want to debug, so I'm specifying ./buggy, the program I just compiled. I'm going to get an error the first time I run this, though, as will you if you make the same mistake. I'm about to see this message: Looks like you haven't set any breakpoints. Set at least one breakpoint by clicking to the left of a line number and then rerun debug50. So what is this really telling me? Well, the debugger has no idea when and where I want to pause execution so as to start walking through my code line by line. It wants me to tell it where to break, that is, where to pause, by clicking next to a line number. So let me hide my terminal for just a moment. You've probably never done this intentionally, but if you hover over the space to the left of your program's line numbers, you'll see a little red dot, a little stop sign of sorts. If you actually click there, that red dot will stay. And you can see the hover text here saying Click to add breakpoint. What I'm going to do is click to add a breakpoint at main.
main is the entry point to my program. It's the default function that gets called. Let's break right away so I can step through this code line by line. All right, let me reopen my terminal window, clear it, and run debug50 again with ./buggy, Enter. Now a whole bunch of stuff is going to happen quickly on the screen, and then it's going to clean itself up, because once the debugger is running and ready to go, it's going to let me start stepping through my code line by line. So what is going on? Well, notice nothing has happened in the terminal yet. Why? Because my code has been paused inside of main; in particular, it's been paused on the first real line of code. The curly brace is uninteresting, and the first line is essentially just the function's name. So line 8 is the first juicy line of code that could possibly do anything useful, and it's been highlighted here in yellow. The fact that this cursor is here means that we have broken execution on this line but have not yet executed it, which is why I don't see anything in the terminal yet. I definitely don't see Height followed by a colon. Notice what else has happened here. All of a sudden, on the left-hand side of the screen, where your file explorer or the CS50 duck typically is, we see mention of variables. You can actually see, inside of the debugger, what the value of any variable in the computer's memory happens to be. Now, you might not quite understand this yet, and we'll come back to it over time, but weirdly, before line 8 even executes, it seems that h has a default value of 32,764, which seems to have come from nowhere. As an aside, this is what's called a garbage value, and this is actually why we have Oscar so omnipresently here. A garbage value tends to be a default value inside of a variable that's the result of that memory having been used previously for something else.
Inside of your computer, you've got all of this memory, random access memory or RAM, more on that today. And it stands to reason that my computer, or whatever cloud server we're using, has been running for some time. So the bits that h is going to use might already have some random switches on and off, some random pattern of bits that happens to give me 32,764. But the moment this line of code executes, that value is going to be changed to what I actually want it to be, which is whatever the human types in. Meanwhile, at the bottom here, you'll see a so-called call stack. More on this too in the weeks to come, but you'll see that we've paused in the function called main in the file called buggy.c. So how do I do something useful? At the very top of the debugger, you'll see a whole bunch of color-coded icons. One looks like a play button, and if I click that, it's just going to continue execution of my code as though I don't want to step through it anymore, so I'm not going to click that just yet. The second, a little curved arrow over a dot, is the so-called step over button, which executes the current line and then pauses again, one line at a time. Let's do exactly that. I'm going to click the step over icon, the second one, the curved arrow with the dot under it. Click. Now I see in my terminal window the prompt for height. Let's type in 3, just like I did before, and hit Enter. Notice what happens: execution has paused on line 9 instead of 8, and my variable, a so-called local variable, has the value 3 as intended. So far, this isn't all that enlightening, other than demonstrating that I can pause execution of my program anytime I want. So let's click that step over button again so that we actually print this column. Click. And there we have it: four hashes at the bottom of the screen.
Now execution has paused at the end of the function. This is just my opportunity to stop, restart, or continue, so I'm going to click the play button and let it finish executing. Unfortunately, that wasn't really all that enlightening, except to confirm for me that I typed in 3 and 3 is what's in the computer's memory. Not that interesting yet. So let's do this. Let's leave the breakpoint on line 6 as before and rerun the debugger with debug50 space ./buggy. Let's let it do its startup thing, which looks a little messy at first, and now line 8 is highlighted again. I'm going to step over this line because I do want to get an int. I'll type in 3 again and hit Enter. But this time, instead of stepping over line 9 and just letting print_column happen, and this is where the debugger gets powerful, let me step into line 9 and walk through the print_column function itself line by line. So let me click not this button, the curved arrow over the dot, but the next one, which is the step into button. Click. Now you'll see that execution has jumped inside of print_column and paused on line 14, at which point I can see at top left what the default value of i is. It's some crazy garbage value, because whatever bits are being used to store i's value have some random garbage from some previous use of that memory. But as soon as line 14 executes once, I bet i is going to take on a value of 0. So let's do that. I'm going to click step over, because I don't need to step into this line; there are no other functions being called there. Step over it, and immediately at top left, i is now 0. Now line 16 is highlighted. Let's step over this. Okay, and notice in the terminal window, what do you see? The first of our hashes. Step over. Step over. Second hash, and i is now 1. Step over. Step over. Now we see a third hash, and i is now 2. Step over. Step over.
Okay, there's the symptom of the bug: four hashes, and yet i is 3. But wait a minute, this draws my attention to line 14 before I continue onward. Three is, of course, less than or equal to three, which is why I got that fourth hash on the screen. So at the end of the day, you still need to exercise some of your own human intellect to figure out and understand what's going on. But the value of this debugger is that you can pause, work through things at your own pace, poke around inside of your own code, and better understand what's happening, as opposed to compiling the program, running it, and having to infer from the symptoms alone what the source of the problem might be. So that was a lot. Let me just let it continue to the end, because I know what the problem is: I need to change the less-than-or-equal-to sign to a simple less-than instead. Questions on debug50 or any of these steps? Yeah? >> I have two questions. >> Sure. >> Could you go over what the breakpoint thing is? And my second question was about the garbage. The second time you ran it, it still gave that same garbage value, even though you had assigned to h. >> Correct. So, in order of your questions, what again are these breakpoints? The breakpoint, the little red stop sign here, just tells the debugger where to pause execution. So frankly, I didn't have to pause execution at main. If I really care about debugging print_column, I could have clicked down here instead, and then it would have just run main automatically and only paused once print_column gets called. So a breakpoint is where your code will break, the point at which it will pause. As for the garbage values, I'm oversimplifying exactly what's going on inside of the computer's memory. It's not necessarily using exactly the same memory as before; the operating system governs exactly how memory is laid out.
Long story short, this is actually a significant problem in a lot of today's systems. It's not that interesting to me to know that there was 32,000-whatever, or the negative number. But suppose that value revealed the password of some other program or function that had stored information there. It seems all too easy, with the debugger, let alone C, to poke around the computer's memory, and we're going to come back to that in a couple of weeks. But for now, it's a garbage value insofar as you didn't put the value there; it somehow got there on its own. Other questions? >> In a for loop, does i become 1 at the end of the first pass or at the start of the next one? >> So the question is about the order of operations for a for loop. The first time you go through a for loop, the initialization happens, the stuff before the first semicolon, and then the condition, the Boolean expression, is checked. Then everything inside of the curly braces is executed. Then the increment or update happens, which in this case is i++, and then the condition, the Boolean expression, is checked again. The code is executed, the update happens, the condition is checked again, the code is executed, and so it loops like this. The debugger's graphics are fairly simplistic: it just highlights the whole line without making super clear which part is happening, but that's just the definition of a for loop. Good question. Others about debug50 or printf? All right. Yeah? >> Can you change the positions of i++ and the height condition? >> Short answer, no. The first thing is the initialization, the variable you want to create and initialize. The second thing is the actual condition, the so-called Boolean expression. The third thing is always the update. So they must come in this order. What you're not seeing is that you can actually have multiple Boolean expressions, multiple initializations, and multiple updates, but we're keeping it simple for now.
And this is canonical. All right, so to be clear, assuming that either printf or debug50 helped me figure out where the flaw in my logic was, I now know that the fix is just to change the less-than-or-equal-to to a simple less-than. And if I run the program again, of course, it's going to give me the three bricks I always wanted. But there are other techniques we can use too. Besides printf and debug50, you might wonder why we have a seven-foot duck behind me here, and all of these little rubber ducks on the floor. Rubber duck debugging, per week zero, is actually a thing. It was popularized in a book some years ago, and the idea is that when you're facing some bug, some mistake in your program, or you're just confused about some concept, there's anecdotal evidence to suggest that just talking through the problem with an inanimate object, like a rubber duck on your desk, is often enough for that proverbial light bulb to go off over your head, because you hear in your own words what confusion you're having and what illogical thoughts you're having, and you don't even need another human or TA or AI in the room to answer the question for you. So in fact, on the way out today at the end of class, we've got hundreds of ducks, enough for everyone to take one home if you'd like to use that as another debugging technique, whether in CS50 or something else. But of course, now in the age of AI, you also have the AI-powered virtual duck at cs50.ai and also in VS Code at cs50.dev, which really is a mechanism for asking questions that you don't think you can solve on your own. So it might be reasonable to ask the duck, "What does this error message mean?" if you're having trouble wrapping your mind around it, but it's less reasonable to copy and paste your code into the duck and ask, "What's wrong with my code?" You should really be meeting the AI halfway.
After all, the point of doing this or any other class is to develop that muscle memory, develop those mental models, and get some practical skills. So try hard to walk that line between asking the duck too much and deploying some of these same tools yourself, printf, debug50, even a physical rubber duck on your desk, before you resort to escalating to human or duck help. All right, so with those tools added to your toolkit, let's consider and reveal what's been going on underneath the hood since last week. This was the mental model we proposed last week, whereby when you write source code in a language like C, it's not something the computer itself understands natively, because computers, we saw, only understand zeros and ones, aka machine code. So the compiler is the program we use to convert your source code to machine code, from C to zeros and ones in this case. More generally, a compiler is just a program that translates one language to another, and in this case we're going from source code to machine code. So let's consider what's really happening. Indeed, among the goals of this week is to take a look at a lower level, so that when you encounter more interesting, more challenging problems, you'll understand from so-called first principles what the computer is actually doing and is supposed to do. That way you can deductively figure things out for yourself and generally not view computers as magic, as in "I don't know how this works." By term's end, you'll have a fairly bottom-up sense of how everything works inside of any computer, laptop, desktop, phone, or the like these days. So here's the simplest of programs that we wrote last week, even though there's a lot of syntactic complexity, as we've seen. The goal is to get it to machine code, these zeros and ones here. So how has that been happening when you just run make since last week?
Well, these are the two commands we've typically run after creating a file like hello.c. We compile it with make hello, and then we run it with ./hello. So let's give ourselves this starting point real quick, just so that we have an example in mind of exactly what it is we're compiling. Let me go back to VS Code here, close out buggy.c, and create a new file, just like last week, called hello.c, inside of which is our old friend stdio.h, int main(void), and then inside of this we'll keep it simple and just print out "hello, world", which again is my source code in C. How do I now compile that? Of course, I can go down to my terminal window, run make hello, then ./hello, and we're off and running. But it was a bit of a white lie to let you think last week that the compiler itself is called make. make is a command that literally makes your program; it makes it by compiling it. But make is not technically the compiler. If we really want to get nitpicky, the compiler you've been using is actually called clang, for "C language." This is a very popular compiler, freely available and open source, so to speak: you can even look online at the code other humans wrote to create it. What make is really doing for us is essentially automating this command. All this time, I could have just run clang space hello.c. But the default file name from clang, weirdly and for historical reasons, is not going to be hello, as you would hope. It's going to be a.out, for "assembler output." And we don't do this in week one of the class because it makes things unnecessarily complex to add some random name that you just have to know to type. However, we can do this now as follows. Let me go back to VS Code here, clear my terminal, and type ls, and we'll see everything we've created thus far: buggy.c, which, when I compiled it, gave me buggy, and hello.c, which I just wrote.
And when I compiled it, I got hello. Let's do this manually now, though. Let's run clang on hello.c and hit Enter. That too seems to work. But if I now type ls, you'll see a third program, specifically called a.out, which happens to be the same as hello; it's just using the default name instead of my custom name, hello. And if I run ./a.out, indeed, that too will work. But the reason we don't do that, certainly in the first week of the course, is that things get a little annoying, or escalate quickly, thereafter. So let me change this program, as we've done a few times already. Let me include cs50.h so that we get access to get_string. Let me do string name equals get_string, quote unquote, "What's your name?", close quote. And then down here, just like before, let me add my %s and pass in name. I did that super quickly, but it's the same program we wrote a few minutes ago, and it's the same one we wrote last week. What happens now, though, is as follows. If I now try clang hello.c, Enter, I actually get an error message, this one perhaps more cryptic than most: "linker command failed with exit code 1," because of an undefined reference to get_string. Now, in the past, when we've seen undefined, or really undeclared, mentions of get_string, the problem was just that we were missing this #include line. This line is clearly here. The catch is that I'm getting this error message now because when I run clang on hello.c, I'm assuming that clang knows where to find CS50's implementation of get_string, and that is not the case. Technically, if I want the compiler to compile this code for me, what I'm going to have to do is this. Let me go back to my terminal window here, and I'm going to say clang hello.c, but I'm then going to specify -lcs50, which is cryptic at first glance, but this tells the compiler to link in the CS50 library so that it knows what the zeros and ones are that belong to the get_string function. Long story short, if I hit Enter now, the error message has gone away. If I type ls, I've still got a.out, but it's a new version thereof. And if I run ./a.out, now I see the new behavior, where I can type in my name and see "hello, David." Now, this is getting a little silly that I keep using a.out, and we can change that as well. In fact, these commands, as we're starting to see, support what are called command-line arguments, and a lot of the programs we've run already take them. When we run code space hello.c, the so-called command-line argument to code is hello.c. When I run make hello, the command-line argument to make is hello. In other words, the command-line arguments to a program are all of the words you type in your terminal after the name of the program itself, whether it's make or code or anything else. So when I just ran clang hello.c -lcs50, I was passing in two command-line arguments: hello.c, which is the code I want to compile, and -lcs50, which means use the CS50 library, please. But I can add another to the mix. I can do something like this, whereby I run clang -o hello, then hello.c, and then -lcs50, Enter. That too seems to work, and if I type ls, I've got all the same programs as before. So let's get rid of those to make clear what's going on. I'm going to remove a.out, I'm going to remove hello, and just for good measure, I'll remove buggy as well, so that all I have left in this folder is source code. If I type ls, there are my two files. Let's do this again: clang -o hello hello.c -lcs50, Enter.
Now if I type ls, I don't see a.out anymore, because, according to the documentation for clang, the actual compiler, if you pass -o as a command-line argument followed by another word of your choice, you can name the program anything you want, without having to resort to mv or clicking on the file and typing a new name in manually. So if I now run ./hello, I see the exact same version, where it's just asking me for my name and then printing it out. But long story short, the whole point of this exercise is that running commands like this quickly gets very tedious. You have to remember the order in which to type things and what the command-line arguments are. It's typically just a waste of time, certainly in week one of the course, to have to memorize these kinds of magical commands to get things working. So for now, know that when you run make, it's essentially automating all of that for you and making it as simple as make hello or make buggy. What's really happening is that the make command, because of the way we've configured cs50.dev for you, is doing all of this behind the scenes. And it's not that magical. -o hello just means name the program hello when you compile it. hello.c just means compile this code. And -lcs50 just means use the CS50 library. That's all. But that message about linking something in hints that there's something juicy going on, such that make is in fact helping us solve a whole bunch of problems when we compile. In fact, let me propose that we take a step back, look at some of the actual code we're compiling, and consider what we actually mean by compiling. Yes, to compile your code means to go from source code to machine code, but technically there are a few more steps involved.
Technically, "compiling" has become the industry term of art for what is really four separate processes, all of which happen in succession automatically, but each of which does a different thing. So just once, let's walk through these steps. What is this preprocessing step? Consider this program here, which we wrote in brief last week. We've got #include <stdio.h>, which is there because we want to be able to use printf ultimately. We've then got a prototype for this meow function, and the meow function does this: all it does is print out, quote unquote, "meow" followed by a newline. It takes no input and has no return value. The main function now has a for loop that iterates three times, each time calling the meow function. We saw this already earlier today. This line of code here, the so-called prototype, is necessary because we need to tell the compiler that meow exists before we actually use it here, especially if I don't get around to implementing it until later. So this copy-paste of that first line of code, a so-called prototype, solves that problem. And this is what the header files are essentially doing for us. Before I use printf down here, the compiler needs to know what it is, what its inputs are, and what its outputs are. It turns out the prototype for printf is in stdio.h, and that's what that line of code has been doing for us all this time. In fact, let's take a simpler example that we keep using, whereby I'm including cs50.h and stdio.h, I'm using CS50's get_string function to get someone's name and put it in a variable called name, and then I'm printing out "hello," such and such. What's going on now when I preprocess this file by running make, which in turn runs clang? The compiler finds, on the server's hard drive, the file called cs50.h, goes inside, and essentially copies and pastes its contents into my own code.
That way we get the prototype for get_string there. We haven't looked at that prototype yet, but it stands to reason: all this time using get_string, we've been passing in a prompt like "What's your name?" and getting back a string. What's inside the parentheses, recall, is the input; what's before the function name is the output, the so-called return value. What about stdio.h? It's in that file that printf's prototype is. So essentially what the compiler does when preprocessing this file is find stdio.h somewhere on the server's hard drive, go inside, and copy and paste those relevant lines of code into my code as well. This spares me from having to do all of that myself: find the file, copy and paste it, or manually type out the prototype. These preprocessor directives just automate all of that tedium. So after the file has been preprocessed, each of those lines that starts with a hash symbol followed by include has effectively been replaced with the actual contents of that header file. Now the compiler knows what get_string is all about and what printf is all about. That, then, is the preprocessing step. What does compiling technically mean? Compiling means taking that preprocessed code, which again looks a little something like this, and converting it into something called assembly code. We won't spend much time in this class on assembly code, but this is how programmers used to write code. Before there was C, before there was Python and Java and all of these other modern languages, programmers were writing code like this. And before this existed, they were programming zeros and ones into the earliest mainframe computers using punch cards and other technologies, literally sheets of paper with holes in them. Not very fun, very tedious. So the world invented this. Also not very fun, very tedious. So the world invented C. Not that much fun. So the world invented Python, and so forth.
We continue to evolve as a species with code. But the compiler technically takes your preprocessed source code and converts it into something that looks like this. It's cryptic, and that's to be expected. But there are some familiar phrases: there's mention of main, there's mention of get_string, there's mention of printf, and there's a bunch of other things, like mov and push and xor and call. These are assembly instructions, the lowest-level instructions that the CPU inside of a computer understands. The CPU is the central processing unit, the chip made by Intel or AMD or Apple or other companies, and these are the lowest-level commands that the actual hardware inside the computer understands. It's just nicer to be able to write words like main and for and printf than to write these much more arcane commands that you'd have to look up in a manual. So compiling just takes C code and turns it into a lower-level type of code called assembly. When I said a.out means "assembler output," that's why: inside of that file is essentially the output of an assembler. All right, we're almost there. What does it mean to assemble a program, which is step three of the compilation process? It means converting assembly code to the actual zeros and ones we keep talking about. So if the file is called hello.c, when that file is assembled, the assembly code becomes the zeros and ones for your code in hello.c. But your code is not everything that composes your final program. Your code from hello.c has to be combined with code from CS50's library and from the standard I/O library, which other humans wrote. I and the team wrote the CS50 code; other humans in the world wrote the printf code in standard I/O. So the fourth and final step is to link all of those zeros and ones together. Somewhere on the server there are not just the header files cs50.h and stdio.h, but your code, hello.c, our code, cs50.c, and the code that contains printf's own implementation. That's a bit of a white lie, since it's technically not called stdio.c, but the point remains ultimately the same. Those library files have already been compiled for you in advance. This, then, is your code. The assembly process converts it into zeros and ones, and then all three chunks of zeros and ones are linked together. So if you think back to when I tried compiling the code without -lcs50, there was mention of a linker. That error just means the computer did not know how to link your code with CS50's code, because we were missing -lcs50, which tells the compiler to go find that library somewhere on the hard drive. The final step, then, linking, combines all of those zeros and ones into one bigger blob of zeros and ones, and that's what's inside the hello program that you can execute. So long story short, these four steps are what's been happening ever since the start of last week: preprocessing, compiling, assembling, and linking. But thankfully, the world of programmers generally treats all four of these steps as what we now know as compiling. It's just a lot easier to say compile and not worry about those lower-level details. But this might better reveal to you what some error messages mean when you see hints of this kind of terminology. Questions on any and all of that? From here on out, we're going to go higher level rather than lower. Yeah? >> I don't get the part about, I think, the assembly process, when you basically convert everything to zeros and ones. Across the three different chunks, don't the zeros and ones signify different things, like some signify text and others signify something else? How does the computer know which 8 bits correspond to which part? >> Really good question.
How does the computer know which of those zeros and ones correspond to data, like numbers or strings of text, versus actual commands? We're going to come back to that in week four of the class. But long story short, what we just saw on the screen, that big blob of zeros and ones, actually follows some pattern, whereby the bits up top represent certain functionality, the bits on the bottom represent something else, and they're organized into patterns. So we'll come back to that, but they follow conventions. It's not just a hot mess of zeros and ones. >> Other questions? >> So the preprocessing step is just replacing the hashtag lines? >> Correct. The preprocessing step goes into the header file and essentially copies and pastes its contents into your own code, so you don't have to waste time doing that manually yourself. Other questions? >> Just out of curiosity, when you're talking about the compiling step, how it converts to assembly code, and you're saying that the CPU understands all those commands, is the CPU then converting that into... >> No. So, let me pull up the chart again. When you compile your code, you're going from the C code to the assembly code, and the patterns you get in the assembly code are specific to a certain CPU. So long story short, if you're designing software for iPhones or Android devices or Macs or PCs, you're necessarily going to use a different compiler, because given the same C code, you will get different assembly instructions in the output. And this is why, back in the day, you couldn't just take a CD containing a program for a Mac and run it on a PC, or vice versa: it's the wrong patterns of instructions. But the reason we have all of these annoying layers of complexity is, one, four different people can now implement the notion of compiling.
Someone can implement the preprocessor, someone the compiler, someone the assembler, someone the linker, and you can actually collaborate by breaking things down into these quantized steps. But also, you can do this step and this step once, and then two different people can write compilers that output assembly code for, say, iPhones over here and Android devices over here, and all of us can still enjoy using the same language up here. So there are a lot of reasons for this complexity. Just understanding it is useful; you're not going to need this sort of knowledge day to day, but it's what enables so much of today's complexity nonetheless. All right, so a bit of a flourish now as to what we've been doing with compiling. Compiling ultimately goes from source code to machine code. Couldn't you just reverse the process? If someone wrote really interesting software, like Microsoft Word or Excel, well, when I buy it or download it, I literally have a copy of all of those zeros and ones. Couldn't I just reverse this process and reverse engineer someone else's code by decompiling it? This is genuinely a threat, and it comes up in matters of law and intellectual property, because the zeros and ones have to be accessible to you and to your computer. So it's not a great feeling that someone with enough time and enough savvy could reinvent Microsoft Word just by figuring out what all those zeros and ones mean. However, it's easier said than done to reverse engineer code from these zeros and ones. For instance, this pattern of bits on the screen here, what did we say last week it does? Silly, no normal person should be able to answer this, but I did say it before: these zeros and ones print what? >> It just prints out "hello, world". >> And I cannot glance at that and figure it out off the top of my head.
But if I know what architecture, what CPU, this code has been compiled for, and I pay attention in week four and know how the zeros and ones are laid out, I could painstakingly figure out what each of those patterns of zeros and ones means by breaking them into chunks of 8 or 16 or 32 or 64 bits, which are common units of measure that I alluded to last week. Now, that's going to take a crazy amount of time. And the presumption is that if you're smart enough and capable enough and have enough free time to do that, it would probably take you less time to just implement Microsoft Word the normal way and rebuild the software. It's going to take you more time to go in reverse than in the so-called forward direction. But there are other subtleties as well. Inside of this code are not only functions like printf, but suppose it contained a loop, for instance, to print meow meow meow. We know already that you can sometimes use a for loop or a while loop, and they're functionally equivalent. It's a stylistic decision which one you use, whichever one you're more comfortable with or feels a little better designed, but you can't figure out from the zeros and ones whether it was a while loop or a for loop, because both can result in the same pattern of zeros and ones. It's just a programmer's choice. Which is to say, you can't even perfectly reverse engineer everything, because it's not going to be obvious from the zeros and ones what the source code originally looked like. But again, the bigger deal breaker is that if you have that much time and energy and savvy, you might as well reimplement Microsoft Word itself instead of trying to reverse the whole process, which is going to be much more painstaking and time-consuming.
Now, this is not true for all languages. Just as a teaser, in a few weeks' time, when we talk about web programming and another language called JavaScript, it turns out that JavaScript source code is actually sent from web servers to web browsers, and you can look at the source code of any website on the internet, harvard.edu, facebook.com, gmail.com; it's going to be there. So not all languages, it turns out, are even compiled ahead of time. Sometimes the source code is just executed, or interpreted, by the underlying computer. So we're just scratching the surface of some of the implications of all this. In a little bit, let's take a look further under the hood at the actual memory and solve some other problems, but I think it's now time for Cheez-Its. So let's go ahead and take a 10-minute break. Snacks are now served. See you in 10. All right, we are back. Up until now, when we've been writing code, recall that we have to specify what type of value we want to put in a variable. That's why I had to go in and add string before the word name in my first bug today. But it turns out C, as we've kind of seen already, has a whole bunch of these data types. I rattled these off last week: bool, int, long, float, double, char, string. Let's consider for a moment just how much space each of these takes up, and see if we can't help you see what the debugger was seeing earlier, that is, what is where in memory. So a bool, it turns out, actually takes up one byte, which is kind of stupid, because technically a bool, true or false, really only needs one bit. It just turns out that it's more efficient and easier to use a whole byte, eight bits, even though seven of them are effectively unused. So a bool will take up one byte, even though it's just true or false. An int, recall, uses four bytes.
So if you want to count really high with an int, the highest you can go is roughly 4 billion, we've claimed, unless you want to represent negative numbers, in which case the highest is roughly 2 billion, because if you want to be able to count all the way down to negative 2 billion, you've got to split the difference. A long, meanwhile, is twice that. It uses eight bytes, which lets you count up to roughly 9 quintillion, and that's if you want to include negative numbers as well; that's quite a few more than 4 billion. Then we had floats, which are real numbers with decimal points, and their size speaks to just how precise you can be with significant digits. A float is four bytes by default, but a double gives you twice as many bits to play with, which lets you be more precise. Even so, at the end of the day, whether you're using floats or doubles, floating-point imprecision, as we've seen, is a fundamental problem for scientific, financial, and other types of computing where precision is ever so important. A char, meanwhile, at least as we've seen it, is a single byte, using ASCII characters specifically. And then string I'll put as a question mark, because the size of a string totally depends on its length. If you're storing HI, that's like two bytes. If you're storing HELLO, that's like five bytes, and so forth. So strings depend on how many characters you actually want to store inside of them. So where does all this go? Well, here is a picture of a stick of memory, a DIMM, so to speak. On this stick of memory, which slides into your laptop, your desktop, or some other device, there are all these little black chips that essentially contain lots of room for zeros and ones. It's somehow electronic, but inside of there are all of the zeros and ones that we can store data in.
So if we kind of zoom in on this, and for the sake of discussion this one chip represents, say, one gigabyte, 1 billion bytes, it stands to reason that we could slap some addresses on these bytes, whereby we could say this is the first byte and this is the last byte, or more precisely, this is byte 0, 1, 2, 3, dot, dot, dot, all the way up to roughly byte 1 billion. And it doesn't matter if it's top, down, left, right, or any other order; we're just talking about this conceptually at the moment. So in fact, let's go ahead and draw this as a grid of memory, a sort of canvas that we can use to store types of data like bools and ints and chars and floats and everything else. If we're going to use one byte to store, like, a char, well, you might use just these eight bits up here, one byte. If you want to store an int, well, that's four bytes. You might use all four of these bytes, and they're necessarily contiguous; you can't just choose random bits all over the place. When you have a four-byte value like an int, its bytes are all going to be contiguous, back to back to back in memory like this. But if you've got a long or a double, you might use eight bytes instead. So truly, when you store a value in memory, whether it's a little number or a big number, all you're doing is using some of the zeros and ones physically in the computer's hardware somewhere and turning them on and off to represent the value you're trying to store. All right, so let's abstract away from the hardware and start to think of this grid of memory in zoomed-in form, and consider at a lower level what is actually being stored inside of here. For instance, suppose that we've got some code like this containing three scores on, like, problem sets. You got a 72 on one of them, a 73 on another, and a 33 on the third.
I've deliberately chosen our old friends 72, 73, 33, which, recall, spell HI! or, together in the context of colors, are like a shade of yellow, just so that we're not adding some new random numbers to the mix. These are our old friends, three integers. Well, let's use these in a program. Let me go over to VS Code here and create, with code scores.c, a program that's just going to let me quickly calculate my average score on my problem sets. I'm going to go ahead and include, as we often do, stdio.h at the top. I'm going to do int main(void) after that. And then inside of my curly braces, I'm going to do exactly those sample lines of code. My first score was, let's say, a 72, my second score was 73, and my third score was 33. So I've declared three variables, one for each of my problem set scores. Now let's calculate the average. So printf, quote unquote, "Average:", just so I know what I'm printing. And now I'm going to go ahead and use maybe %i\n. And then what I'm going to pass in is a bit of math. To compute an average, it's just score1 plus score2 plus score3, divided by 3. And I put the scores, the numerator, in parentheses, just like in grade school, because I need to do that operation first, before doing the division. So, just like math class, semicolon at the end to finish my thought. Let's see how this goes. make scores, Enter, ./scores, and it would seem that my average across these three problem sets is 72, which is great, but I don't think that's actually what I want here. What have I done wrong? It's unintentional. Yeah? >> Yeah. I'm being a little generous with myself here. I didn't really factor in my worst score. So that was accidental. So now let me do this correctly.
make scores, ./scores, and now, OK, my average is 59. But I beg to differ. I'd like to quibble: my score, technically, mathematically, should really be 59 and a third. I'm being cheated out of that third of a point. So what's going on here? Why am I only seeing 59 and not my full grade? >> You're using... so it's going to... >> Perfect. Because I'm using integers, when I divide by 3 it's going to truncate everything after the decimal point, which we touched on at the very end of week one as an issue with truncation in general. So one approach to fix this: I could change my %i to %f, which is the format code, it turns out, for a float, and that is what I want to print. So let's see if that fix alone is enough. make scores. Oops, it's not. I got ahead of myself there. Let me scroll up to the error: format specifies type double, but the argument has type int. It turns out you use %f for doubles as well, so that's why it's saying double, even though I intended a float in this case. So there's a problem here: the argument has type int even though I'm using %f. You're also seeing mention of %d here, which is an alternative to %i. We typically encourage you to use %i, i for integer, but that is not the solution to this problem, because I want my third of a point back. So how could I go about fixing this? Well, the fundamental problem is that I'm trying to format an integer as a float, or even as a double. I need to convert these scores to floats instead. So I could go in and change this to float, this to float, this to float, and heck, just to be super precise, I could add a .0 to the end of each of them, just to make super clear these are floats. But there's another way.
I could, for instance, simply convert my denominator to 3.0, because it turns out that so long as you involve one floating-point value in your math, the whole thing gets promoted, so to speak, to floating-point values instead of integers. I don't have to convert all of them. So I think now if I do make scores, ./scores, ah, there's my third of a point back. There's another way to do this, just as an aside, and we'll see this again down the line. If you really want to stick with 3, because it's a little weird semantically to divide by 3.0 (that's an implementation detail, but you're truly computing an average of three things), you can technically cast the 3 to a float by writing float in parentheses before it. That is, you specify in parentheses the data type that you want to convert another value to. And this too should make the compiler happy. Aha. ./scores. I get roughly the same answer. We're seeing some floating-point imprecision nonetheless, but that too would achieve the goal here. In short, that's all just a function of floating-point arithmetic. So what's actually going on now in the computer's memory? Let me revert to the simpler version with just 3.0 there, and let me propose that we consider where these three things are in memory. If we treat this as my grid or canvas of memory, who knows where they're going to end up? But for the sake of discussion, let's assume that 72 ended up in the top left of my computer's memory. I've drawn it to scale, so to speak: this score1 variable is clearly taking up four bytes of memory, because it's an int, and that's typically how many bytes are used for ints. Technically, it depends on the exact system you're using, but nowadays it's pretty reasonable to assume that an integer will be 32 bits on most modern systems. score2 is probably over there. score3 is probably over there. So I'm using 12 bytes total, four bytes for each of these values.
All right, so that's really all that's going on underneath the hood, and I don't have to worry about it; the compiler essentially figured out for me where to put all of these things in memory. But what really is in memory? Technically, each of these variables, if it's composed of 32 bits, is really just a pattern of literally 32 zeros and ones. I figured out the patterns here and crammed them all into the space there, so you see three patterns of 32 bits, which collectively compose those numbers. But let's consider design now in terms of my code. This gets the job done, and it's not that bad or big a deal for just calculating the average of three scores. But from this week onward, this should also start to rub you the wrong way when it comes to design. This is correct, especially now that I've clawed back my third of a point, but using the variables in this way is bad design. Why might you think? Yeah? >> You're going to have to type in each score manually and assign each variable individually. >> Yeah, I'm going to have to type in each score manually. With each passing week, when I get the fourth problem set and the fifth, I mean, surely people who came before us came up with a better way to solve this problem than manually creating 10 variables, 20 variables, whatever it is by the end of the semester. It just feels a little sloppy. And indeed, that's often the way to think about the quality of a design: think about the extreme. If you don't have three scores but 30 or 300, is this really going to be the best way to do it? And if you feel like, no, no, there's got to be a better way, odds are there is, certainly if the language itself is well designed. So let's consider how else we might go about solving this. It turns out we can treat our canvas of memory, that grid of bytes, as chunks of memory known as arrays.
An array is a chunk of contiguous memory, back to back to back, whereby if you want to store three things, you ask the computer for a chunk of memory for three things. If you want 30, you ask for one chunk of size 30. If you want even more, you ask for a chunk of size 300. Chunk is not a term of art; I'm just using it to colloquially explain what an array actually is: a chunk or block of memory whose values are back to back to back. So what does this mean in practice? It means that we can introduce a bit of new syntax in C. If I want to create one variable instead of three, and certainly one variable instead of 30, I can use syntax like this: hey compiler, give me a variable called scores, plural, with room for three integers therein. It's a slightly weird syntax, but you specify the type of all of the values in the array, then the name of the array, scores in this case, which I pluralized just because semantically it makes more sense than calling it score now. And then in square brackets, you specify how many integers you want to put into that chunk of memory. So this one line of code will essentially give me 12 bytes automatically, but they'll all be referable by the name scores, plural. So let's weave this into some code as follows. Let me go back to VS Code here, clear my terminal, and whip up the same kind of program, but get rid of these three independent variables. Instead, let's just say int scores[3]. Now I need a way to initialize the three values, but this I can do too. It turns out that if I want to put three values in this, I just need slightly new syntax.
I can say scores[0] = 72, scores[1] = 73, scores[2] = 33. So it's not all that different from having three variables, but now I technically have one variable, and I'm indexing into it at different locations, locations 0, 1, and 2. It's 0 because in computing we always start counting from zero. So scores[0] is going to be my 72 problem set, scores[1] is my 73 problem set, and scores[2] was my weakest, my 33 pset. Now my syntax down here has to change, because there are no more score1, score2, score3 variables, but there are scores[0] plus scores[1] plus... and notice what VS Code is trying to do for me. It's saving me some keystrokes. As I type in scores and type one single bracket, notice it finishes my thought for me and magically puts the cursor where I want it, so I can put the 2 right there and generally save on keystrokes. But that has nothing to do with C; it just has to do with VS Code trying to be helpful. So I think now if I go down here and do make scores, ./scores, we get the same answer, but it's arguably better designed, because I now have one variable instead of three, let alone many more. And in fact, if I wanted to change the total number of scores, I can just change what's in that initial square bracket. So if we consider what's going on now in the computer's memory, it's the same exact layout, but there are no longer three variable names. There's one: scores[0], scores[1], and scores[2]. And notice here, ever more important, an array's values are indeed contiguous, back to back to back. Now, the screen is only so wide, so they kind of wrap around to the next row of bytes, but the computer has no notion of up, down, left, right. It's just a piece of hardware that's got lots of available bytes that can be addressed from the first byte all the way down to the last byte.
The wrapping is just a visual artifact on this here screen. All right, so now that I've done this, maybe we can make this program a little more dynamic than just hard-coding my scores. Let me go in and include the CS50 library's header, cs50.h, so that we can also use, for instance, get_int and start getting these scores dynamically. So I could use get_int and prompt the user for a score. I could use get_int again and prompt the user for another pset score. I can use get_int a third time and prompt the user for a third such score. And then pretty much the rest of my code can stay the same. Let's do make scores again, ./scores, 72, 73, 33. And now my program's a little more interactive. It doesn't work just for my three scores; it could work for anyone's scores in the class. Now, this too hints at bad design. I like my introduction of the array, because I now have one variable instead of three. But what might now rub you the wrong way among lines 7, 8, and 9? Yeah? >> It's repetitive. I mean, I typed it manually, but I might as well have just copied and pasted literally the same thing. So what's a candidate for fixing this? What programming construct might clean this up? Yeah? >> Yeah, we could use a for loop or a while loop or whatever, but a for loop would get the job done, and that's often my go-to. So let's do that instead. Let's go under my declaration of the array and do for (int i = 0; i < 3; i++), which we keep seeing again and again. Now, how do I index into the array at the right location? Here's where the square brackets are kind of powerful. I can just say that my scores array at location i should get an int from the user. So now I'm using get_int once, inside of a loop, but because i keeps getting incremented, as we've done many a time now for meowing and other goals, I'm putting the first one at location 0. Why? Because i is initialized to 0. I'm putting the second one at location 1.
Why? Because I'm going to increment i with ++ on the next iteration, and then the next iteration. So this has the ultimate effect of putting these three scores at locations 0, 1, and 2, instead of me having to type all of that out manually. Now, I still don't love how I've done this. If we really want to nitpick, this solves the problem correctly, but it still has a poor design decision: a magic number, as people say. What is the magic number here, and why is it bad? Yeah, over here. >> Yeah, it was a little soft, but I think the number 3 is hard-coded in two places. We've got it on line 6, which is the size of the array, and then again on line 7, which is how many times I want to iterate. Those are the exact same concept, but it's on the honor system that I type the number 3 correctly both times. So I think we can do a little better. I could do something like int n = 3, and then use n here and n here, so that now I only change it in one place. If your eyes are wandering to the bottom of the program, there's still a problem there, because I've still hard-coded 0, 1, and 2, but we'll come back to that. Still, this is arguably a little better. But let's talk a little bit about style. Typically, when you've got a variable that should not change its value, we saw last week that we should declare it as a constant, and the trick there is to literally just write const, for short, in front of the type of the variable. Now it can't be changed by you, by a colleague, a collaborator, or the like. And by convention, to make visually clear to another programmer that this is a constant, you capitalize constants, so you'd actually use a capital N here in all places, just to make clear visually that there's something interesting about this variable: it is indeed a constant that cannot be changed.
All right, with that refinement, I don't think we've really improved the program fundamentally. We're going to need to do a bit more work to do this really well. I'm going to do this a little quickly, mostly to make the point that we can indeed make this more dynamic. So let me hide my terminal window there. Let me go ahead and get the scores as I already am. And let me assume, for the sake of time, that we have a function that already exists called average, and I simply want to pass to that average function the scores whose average I want to calculate. Now, average does not exist off the shelf; I can't just use an existing library for it. I'm going to have to implement this thing myself. But how? All right, let's go ahead and do this. At the top of my file, I'm going to define a function called average that takes in what? An array of numbers. This syntax is going to be a bit new, but the way I do this is int, then a name, say array, followed by empty square brackets. Array sounds a little too generic, so let's just call it numbers, for instance. That says my average function is going to take as an argument an array of numbers. This average function should return a value too. And it should return what type of value, from what we've seen thus far? A number, a float specifically. It could be an int, but then I'm potentially going to get shortchanged my third of a point. So I think I want to return a float. Or if you really want precision, you could return a double, just to be really nitpicky, but that seems excessive here. All right, now inside of my average function, how can I calculate the average? This is just kind of a math thing. I could declare a variable called sum and set it equal to 0. I could then have a for loop inside of this function: for int i gets 0, i less than, huh... I'm going to come back to this... the number of numbers in the array.
And then I'm going to do i++. And then on each iteration, I'm going to do sum equals whatever the current sum is plus whatever is in the numbers array at that location. I'm going a little quickly, but again, I'm just applying the same lesson learned. numbers is my array. numbers[i] means go to the i-th location in there. If my loop starts at 0, that means go to location 0 and then 1 and then 2. And heck, if there are more scores in this array, it's just going to keep going up from there because of the ++. But I hesitated here for a couple of reasons, so I put a TODO here, which is not a thing in C; it's a note to self. How far do I iterate? Well, if you've come into CS50 with prior programming experience, in Java and in Python and the like you can usually just ask an array what its length is. You can't do that in C. So if I want to know the length of this array, I've got to have the caller tell the function. So I'm going to additionally propose that this average function can't just take the array. It's also going to have to take another argument, a second input, called, for instance, length, that tells me how long the array is. And then down here, which is where we started the story, when I use this so-called average function, I'm going to have to tell it how many numbers are in that array by passing in N. It's just annoying that you have to pass in not only the array but also its size separately, but that's the way it's done in C. More recent languages have improved upon this, so you can just figure out the length of the array, as we'll see in a few weeks in Python. All right, back to the average function at hand. I think we're almost there. This is a little unnecessarily verbose. Recall that we can tighten this up by just doing sum += numbers[i]. That's just syntactic sugar, so to speak. And then the last thing I'm going to do in my average function is what?
Actually calculate the average. So what is the average? It's just the numerator, the sum of all of the scores, divided by the total number of scores. I've got the sum, so I think I just want to do sum divided by what, to get the actual average now? >> Yeah. >> Exactly. Sum divided by length will give me the average, because the sum is effectively the numerator, all of the scores added together, and the denominator is the length, how many numbers there actually were. Now, I can't just write this math expression here. If this is going to be my function's return value, and we've done this once or twice before, I literally say in my average function, return this value. So it hands back the work. I could use printf and just print it on the screen, but I don't want that visual side effect. I want to hand it back so that on line 23 I can simply calculate the average of those N scores and let printf use it as the value for that format code, %f. All right, I think we are in reasonably good shape. Let me cross my fingers now and hope I didn't screw this up. make scores. OK. ./scores. How many do we want to do? We'll do 72, 73, 33. Enter. And there's my average... oh, so close. I've had a regression. I've made the same mistake again, just in a different way. I think I saw your hand go up. Why am I getting 59 and not my third of a point? >> Yeah, in this return line on line 11, right now I'm again stupidly doing integer divided by integer. That will make us suffer from integer truncation, because int divided by int produces an int, so the fraction is gone before the result is ever converted to the float we return. So how do we fix this? Well, I could change the sum to a float; that would be reasonable, so then I'd be doing a float divided by the length. Or I could do my casting trick and convert the length to a float just for the sake of floating-point arithmetic.
There are a bunch of ways to solve this, but I think I'll go with this one. Let me now do make scores again, ./scores, 72, 73, 33, and now I've got, albeit with some imprecision, I think enough precision, certainly for a college grade in this case: 59.33 and so forth. OK. So what are the things to actually care about here? There's a decent amount of code here. Most of it is stuff we've seen before, but the interesting parts, I would propose, are these. When you create your own function that takes an array as input, you have to take as input the length of the array as well. You're not going to be able to figure it out correctly from the array alone, as you can in more modern languages. You also need, of course, to pass in the array itself. How do you pass in an array? When you're defining the function, you specify the type of the values in the array, whatever you want to name the array inside of this function, and then empty square brackets, like this. You don't have to put N or some other number there. All you need to tell the compiler is that my average function is going to take some array of values; how many values there are comes in separately, via length, not inside the square brackets. Then when I use it, it's just the now-familiar syntax: when you want to index into your array, that is, go to location 0 or 1 or 2, you just use square bracket notation here. But the array itself, recall, was actually created in main, when I did this line of code here, where I said give me an array called scores, each of whose values is going to be an int, and I want this many of them. And so maybe the final flourish I'll add here, just to be sort of nitpicky, is this: I keep saying that main should really go at the top. Fine, no big deal. Let me highlight my average function and move it to the bottom of my file, and then, and only then, copy and paste its first line, the so-called prototype, above main, so that Clang doesn't freak out by not knowing what the average function is.
So in short, there's seemingly a bunch of complexity here, but the only thing that's really new in this example is how you pass to a function an array that already exists elsewhere: in the function's definition, you declare the parameter with square brackets. OK, questions on arrays or any of this new syntax? Yeah? >> A bit slow, but back when you did the whole average thing... >> OK. >> You said that we could store it as a float, and then you said it works because 3.0 is a float. How does it know it's not a double? >> Oh, how does it know it's not a double? So by default, if you just type a number like 3.0 into your code, it will be assumed to be a double, just because raw values, literal numbers with a decimal point, are treated by the compiler as doubles and allocated 64 bits. >> So how come you still use percent f? >> Just because the world did not need to create a new format code. Like, %d is not double; %d is decimal integer, but don't worry about that; we tend not to talk about it too much in class. %i is integer. %f is float. But %f is also double. And this is not consistent, because what's a long? What did I say last week? %li gives you a long integer. It's just a mess. There's no good reason for this other than historical baggage. >> Thank you. >> Sure. I'm not sure if that's reassuring, but all right. So let's use this knowledge for something useful now, to better understand what's going on inside of the computer, as follows. Let me go over to our grid of memory, and this time, let's not store some numbers, but let's store these three lines of code, these three variables.
So, three chars. You know where this is going: this is not good design, because I've got three stupidly named variables, c1, c2, c3, but let's make a point first. The first variable's value is quote unquote H. The second is I. The third is an exclamation point. Why, though, am I suddenly using single quotes instead of double quotes? >> It's a character. Chars are single quotes. Strings are double quotes. And we'll see why the distinction matters in a moment. So for instance, if this is my grid of memory and this program contains just three variables, each of them a char, odds are they'll end up like this in memory: c1, c2, c3, H, I, exclamation point. Assuming there's nothing else going on in my program, they're just going to end up back to back to back in this way, even though that isn't guaranteed. So what does this really mean is going on? Let's poke around. Let me go back to VS Code here, close scores.c, reopen my terminal, and create a new program called hi.c, just to do something playful. Let me include stdio.h at the top. Let me do int main(void) after that. And inside of my curly braces, let's just repeat this: char c1 = 'H', in caps; char c2 = 'I', in caps; and then char c3 = '!', an exclamation point. That's all. Now let's actually poke around and see what's inside the computer's memory. I could do something like this: printf with %c %c %c\n, and %c, it turns out, means character. So what do I want to plug in? c1, c2, and c3, semicolon. So let's go ahead and do this: make hi, Enter, ./hi, and voila, there's my HI!. There's no magic here; I'm literally just printing out three char variables. I don't need the spaces, though. If I want to get rid of those spaces between the letters, I can remake this: make hi, ./hi. And now we're back in business: HI!
But here's where an understanding of types can give you a bit of power and sort of satiate some curiosity. What if I change my %c to %i, %i, %i? So int, int, int. Well, it turns out that a char is really just a number, because it's stored as an 8-bit value, its ASCII code. So there's nothing stopping me from telling the compiler: don't print these as chars, print them as integers. So let's do make hi, ./hi, enter. And that's a little cryptic. It looks like it's saying 727333, but no; let me add those spaces back in between each of those placeholders. make hi again, ./hi, and there are our old friends, 72, 73, 33. It is not necessary in this case to cast each value with (int), because the compiler is smart enough, and printf is smart enough, that if you hand it a value that happens to be a char, it already knows it's essentially an integer. So you don't even need to bother explicitly casting it; we're essentially implicitly casting it to an integer by using those format codes as such. All right, so that just proves that what I've claimed is the case: this equivalence between characters and numbers really exists inside of the computer's memory. So even though you're storing HI!, technically you're storing three patterns of eight bits each that give you these decimal numbers, 72, 73, and 33, or specifically these patterns here. All right, then what is a string? And this is where things get a little more interesting. A string, as we've used it, is like a whole word or a phrase or, as when we started class today, a whole paragraph of text. So that's multiple values. Now why is that potentially interesting for us? Well, let's go ahead and write one line of code with a string. So here, for instance, is one line of code with a string. Let's put that into my program. So I'm going to go back to VS Code here and clear my terminal, and I'm going to delete all of this code here for a moment.
And I'm going to do something like this: string s = "HI!", with double quotes now. And now, just like in week one, I'm going to print out %s\n and pass in the value of s. Per earlier, because string is technically one of our training wheels for just a few weeks, I'm going to additionally include cs50.h at the top so that the compiler knows what this word string is. All right, let's go into the terminal: make hi, ./hi, enter, and we're back in business, printing that out now as an entire string. Well, what's going on inside of the computer's memory this time? I still have HI!, but it's a string now. Well, it turns out the way that's going to be laid out in the computer's memory is exactly like before. There's no mention of c1, c2, c3, because those variables don't exist. There's just one variable, s, but it's referring to three bytes of memory, it would seem: H, I, exclamation point. And you can kind of see where this is going. A string, as a spoiler, turns out to be actually just what? >> An array. >> It's just going to be an array of characters, hence the dots we're trying to connect today. So at the moment, though, this is a single variable, s, a string, the value of which is HI!. But you know what? If it is in fact an array, I bet we can start playing around with our new square bracket notation and see as much in our actual code. So let me go ahead and do this in VS Code. Now let's not use %s. Let's use %c, %c, and %c, three times. Then, instead of just s, let's print it out like it is an array: s[0], s[1], s[2]. Let's go back to my terminal in VS Code: make hi, ./hi, and nothing has changed, but I'm printing it out now one character at a time, because I understand what's going on underneath the hood. In this case, I can actually see these values.
Now let's go ahead and change the %c to %i and add a space, just so it's easier to read: %i space %i space %i. I don't need casts in parentheses, because printf is smart enough to do this for me. make hi again, ./hi. There again are my 72, 73, 33. However, that came from the mere fact that I put HI! in double quotes. So what's really happening here is that a string is indeed just an array of characters. But how does the computer, when doing %s, know what to actually print? In other words, it stands to reason that eventually, if I've got more variables, more code, there's going to be other stuff in the computer's memory. How does printf know, when using %s, to stop here and not just keep printing characters that are over here, especially if I did have more variables and more stuff in memory? Well, let's take a look at what's just past the end of this array. Let's go back to VS Code. And now let's get a little crazy and add in a fourth %i. And even though this shouldn't exist, let's do s[3], which, even though it's the number 3, is the fourth location, whereas HI! is only three values. So let's look one location past the end of this array. make hi, ./hi. Interesting. It seems, and maybe it's just luck, good or bad, that the fourth byte in the computer's memory is a zero. Well, that's actually very much by design. It turns out that, by convention, what the compiler will do for us automatically is terminate, that is, end, any string we put in double quotes with a pattern of 8 zero bits. More succinctly, it's just the number zero, because if you do out the math, eight zero bits give you zero in decimal. Or, more technically, the way it's typically written is \0, because it's not the digit zero that we'd want to see on the screen. \0, similar to \n, is a special escape sequence.
This just means literally 8 zero bits, not the digit zero that you might see in a phone number or something like that. So even though we said string s = "HI!", seemingly three characters, how many bytes does a string of length three actually take up in memory? It's actually going to be four. And this happens automatically. That's what the double quotes are doing for you. They're telling the compiler: this is not just a single character, this is a sequence of characters; please be sure to terminate it for me automatically with a special pattern of 8 bits. And that special pattern of 8 zero bits actually has a name. It's the so-called null character, or NUL for short. The null character is just a byte of zero bits, and it represents the end of a string. You've actually seen it before, if super briefly, two weeks ago. Here was our ASCII chart, and we focused mostly on this column here and this column here, and then we looked at the exclamation point over here. But all this time, over here, ASCII character 0 is NUL, N-U-L, which is just what we call all eight zero bits. It's been there this whole time. So why is it done this way? Well, how is the computer actually printing something out from memory? It needs to know where to stop. printf is pretty stupid. Odds are, inside of printf, there's just a loop that prints the first character, the next character, the next character, and it's looking for the end of the string. Why? Well, consider a program that has not just one string but two, for instance, two strings like this. So let me go back to VS Code here, clear my terminal, and make this program a little more interesting for a moment: string t = "BYE!", for instance. And then down here, let's do two printfs: printf("%s\n", s) and printf("%s\n", t).
Now, to be clear, %s means string placeholder. s and t are just the names of the variables; there's no %t that we want to use here. All right, let me go down to my terminal, make hi, ./hi, and voila, I get HI! and BYE!, just like you would have expected last week. But what's going on inside of the computer's memory? Well, insofar as I have asked it to create two variables, s and t, like this, odds are that HI!, aka s, is ending up here, and t, because there's nothing else in this program, is probably going to end up here, B-Y-E-exclamation point, though it wraps on this particular screen. t is taking up 1, 2, 3, 4, 5 bytes total, just as HI! is taking up four bytes total, because the compiler is automatically adding for me the \0, the null character, to make clear to other functions where each string ends. So what does this mean in real terms, and why is it zero? Well, why zero? Just because, at the end of the day, all we have is bits. We've got eight bits to work with for chars. You've got to pick some pattern. We could have chosen all ones. We could have chosen all zeros. We could have chosen something arbitrary. A bunch of humans in a room years ago decided eight zeros would mean the null character. That's the special character we use to terminate strings in this way. Well, what does that mean with our new syntax? It means we can poke around with strings as well. So even though that first variable is s and the second one is t, you could technically access s[0], s[1], s[2], and s[3], and t[0], t[1], t[2], t[3], and t[4]. So if I wanted to dive in deeply there and actually see that, let me go ahead and do this. Back in VS Code here, let me make a refinement. I've now got my two strings here. I could go, for instance, down here, just like before, and print s with %c three times and t with %c four times.
And then I pass in s[0], s[1], s[2], and then down here, t[0], t[1], t[2], t[3], and I'm doing that only because the word BYE! is longer than the word HI!. If I do make hi, the same principles work even in this context. But let's add an interesting twist. If I've got two words in memory, I could put them in an array too. Instead of having s and t, or word1 and word2, I can actually put strings in an array, too. So let's go ahead and do this. Let me go back to VS Code, and just for fun now, give me an array called words that's going to fit two strings. Then in the first location, words[0], put HI!. Then in words[1], put BYE!. The only thing new here is that I'm making an array of strings now instead of an array of ints, but all of the syntax is exactly the same. How can I go about printing these things? Well, just as before, I can do printf("%s\n") and print out words[0]. Then I can do printf("%s\n") with words[1]. And again, I'm just applying the same simple syntax that we saw before. ./hi again, the sixth version of this program, right? I'm just jumping through these syntactic hoops to demonstrate that these are different lenses through which to look at the exact same idea. And while a normal person would not do this, we could think about what's really going on in memory with arrays of words when those words themselves are arrays of characters, because a word is just a string. So this code here gives us something like this in memory in that program a moment ago. This is words[0]. This is words[1]. The only thing that's different is that I'm not calling them s and t. I've given them one name with two locations, 0 and 1.
Well, each of these values is itself a string, and I said earlier that a string is just an array. So we can actually think of these two strings, even though the syntax is getting a little crazy, using two sets of square brackets: I can index into my array of words and then index into the individual letters of that word by just using more square brackets. And again, this is just to demonstrate a point, not because a normal person would do this. But if I go back to VS Code, instead of printing out these two strings, why don't I do something like this: printf("%c%c%c\n"). Then let's print out the first word's first character, the first word's second character, and the first word's third character. And even though I'm saying first, second, and third, it's 0, 1, and 2 respectively, because we start counting at zero. And then lastly here, we can print out the second word: %c, %c, %c, %c, \n, then words bracket... how do I get to the second word in this array? words[1], and its first character; words[1], and its second character; words[1], and its third character; words[1], and its last character. And again, this is just to demonstrate a point, but if I do make hi now, ./hi, we have full control over everything that's going on, if you now agree that an array can be indexed into with square bracket notation, as can a string, because a string is itself just an array. Strings are arrays, for today's purposes. Questions, then, on any and all of these tricks? No? All right. Yeah, in front. >> How do you, like, create an array? >> How do you establish or create an array? Well, in the context of this program, if I go back to VS Code, line six here gives me an array of size two, an array of two strings, if you will.
In the previous example we were playing with, which was my scores... whoops, wrong program, wrong file. If I open up scores.c as before, this line here, line nine, gives me an array of n integers. So that is what establishes or creates the array in memory. You specify a type, a name, and a size. That's all. And the only thing that's new today, again, is the square bracket notation, which in this context creates an array of that size. But once it exists, you can then access that chunk of memory by using square brackets as well. Other questions on arrays? Yeah, in front. >> Can you put all the values in the array as you declare it, or do you need to go in index by index? >> Good question. Do you need to go index by index to put things inside of an array? Short answer, no. So let me open up scores.c again, and what I could have done in an earlier version of my program would be something like this: {72, 73, 33}. I deliberately didn't show this because I didn't want to add too much complexity, but you can use curly braces in this new way and initialize the array in one line. And in that case, you don't even need to specify the size, because the compiler is not an idiot: if you've got three numbers on the right, it knows that it needs only three elements on the left to put them into. But let me undo that and leave it just as I did. So, short answer: yes, you can statically initialize an array if you know all of the values up front, though not when you're getting them with get_int. All right. So if you're on board with the idea that a string is just an array, and that array is always null-terminated, we can now use that knowledge to solve some simple problems, problems that others have already solved before us. So let me go ahead and close that file in VS Code. Let me open up another program here called length.c, and let's just play around with the length of strings, as follows. Let me include the CS50 library at the top.
Let me include stdio.h after that. Let me do int main(void) after that. And then inside of main, let's prompt the user for their name by using get_string and just say "Name: " today. And then after that, let's figure out the length of the person's name. Like, for D-A-V-I-D, I should get the answer of five. And for K-E-L-L-Y, we should get the answer of five. And hopefully for a longer or shorter name, we'll get the correct answer as well. So how can I go about counting the number of characters in a string? Well, the string is just an array, and that array ends with the null character. There's a bunch of ways we can do this, but let me go ahead and do this. Let me create a variable called n, which eventually will contain the length of the name, and I'm going to set it equal to zero, because I don't know anything yet about the length. I could do this with a for loop, but I prefer this time to use a while loop. I'm going to say the following: while the person's name at location n does not equal '\0', go ahead and add one to the value of n. And then after all of this, go ahead and print out, with %i\n, the value of n. So what's going on here? This is easier said when you already know where you want to go with it, but with practice, you too can bang this out pretty quickly. n is going to contain the length of my string. I have in my loop here a Boolean expression that's just asking the question: does name at the current value of n not equal the null character? In other words, you're asking yourself: is this character null? Is this character null? Is this character null? And if not, you keep going. You keep going. And this is kind of a clever trick, because I'm using n as both the counter and the index, and incrementing it inside the loop. So when I look at D, that's not equal to \0, so I increment n. Now n is one. So I look at name[1]. What's at name[1] if it's my name? A. A does not equal \0, so it increments n.
What's at location two in D-A-V-I-D? V. V does not equal \0. So we repeat with I. We repeat with D. And then we get to the end of my name, which is the null character, because get_string put it there automatically for me. The null character does equal \0, so n does not get incremented any more. So at this point in the story, on line 13, n is still five, because I have not counted the null character. So I hope I will see five on the screen. This is just a very mechanical way of checking, checking, checking, checking, to figure out through inference how long the string is, because it's as long as it takes to get to that \0, the null character. So let's do make length, enter, ./length. Type in my name, David, and I indeed get five. Let's go ahead and do ./length, Kelly. I indeed get five. And hopefully for shorter and longer names, I'm going to get the right answer, too. In fact, we can try a corner case: ./length, enter. Let's not give it a name at all. If I just hit enter here, what should the length of the person's name be? Zero, which is correct; it's literally true. That's because we're going to get back, essentially, "", quote unquote. But even though it's quote unquote, in the computer's memory it's still going to take up one byte, because get_string will still put the null character at the end of the string even if it's got no characters therein. Now, it turns out that counting a string's length with a loop like this is not something you need to do yourself. There are better solutions to this problem. You do not need to reinvent this wheel, because, in addition to stdio.h and cs50.h and, as you probably saw in problem set one, math.h and perhaps others, there are other libraries out there, namely the string library itself.
In fact, if you go into the CS50 manual, you can look up the documentation for a header file called string.h, which contains declarations, that is, prototypes, for a whole bunch of helpful functions. In fact, the manual pages for it are at this URL here. The most important function, and the one we're going to use so often for the next few weeks, is wonderfully called strlen, for string length. Someone else, literally decades ago, wrote code that essentially looks quite like this but packaged it up in a function that you and I can use. So we don't have to jump through these stupid hoops just to count the length of a string; we can just ask the strlen function what the length of a string is. But odds are, if we looked at the C code that someone wrote decades ago, it would indeed look quite like this. So how can I simplify this program? Well, I can get rid of all of this code here. I can include string.h at the top of my file. And then I could quite simply do something like this: int length = strlen(name). That's going to put the length in the variable length. Actually, let's be consistent: int n = strlen(name). And then on line nine, let's print it out. Let's try this: make length, ./length, David. Okay. Kelly. Okay. And no one, and zero. It seems to now be working. So this is a wheel we do not need to reinvent. And frankly, now, as a matter of design, I don't really need the variable n anymore. Recall that we can nest our functions, just like we did with average before. So let me get rid of that line and just pass strlen(name) to printf directly, which is perfectly reasonable here. All right, well, what more can we do with this? Let's consider some other matters of design. Let me close out length.c and create another program of our own called string.c, in which we'll play around now with this library and others. Let me go ahead and include cs50.h. Let me go ahead and include stdio.h. And let me go ahead and also include string.h.
All right, what do I want to do now? Well, in int main(void), inside of main, let's write a program that prints a string character by character, just to demonstrate these mechanics. So string s = get_string, and I'm going to prompt the user with "Input: ", because I just want to play around with any old string. Then I'm going to go ahead and proactively print "Output: " here, deliberately without a newline character. Below this, I'm going to have a for loop, though I could use a while loop, that says int i = 0, i is less than strlen(s), the length of the string I just got from the human, and increment i on each iteration. And on each iteration, print out just one character in that string, specifically s[i]. And then at the very bottom of this program, let's just print a single \n to move the cursor onto a new line. Long story short, what have I done? I wrote a stupid little program that prompts the user for a string, prints the word "Output" thereafter, and then prints the word they typed in, character by character by character, until it reaches the end of the string based on the length returned by strlen. So let's go ahead and run this in my terminal window. I'm going to do make string, ./string, and I'll type in my own name as before. There was a subtlety: I deliberately wrote two spaces after "Input:" because, to be nitpicky, I wanted input and output to line up perfectly, so you can see what's happening. Indeed, if I hit enter here, now I see the input is David, and the output is David as well. So that was just a formatting trick that I foresaw. Why is this program correct but arguably not well-designed? It's pretty good in that it's using the strlen function; I didn't reinvent the wheel unnecessarily. But there's an inefficiency that's kind of subtle, and it relates to how a for loop works. Any thoughts? This program, I claim, is doing unnecessary work somewhere. Yeah?
>> Why do you have to print it character by character? >> Okay, that's definitely stupid. You don't have to output it character by character; that's just my pedagogical decision here. So, correct, but not the answer we're fishing for. There's a second stupid thing. Yeah? >> Yes. Every time through this loop, and this isn't so much my conscious choice as my mistake, I'm checking the length of s again and again. Why? Because recall how a for loop works. The initialization happens once at the very beginning. Then you check the Boolean expression. Then, if it's true, you do the code. Then you do the update. Then you check the Boolean expression, then you do the code; update, Boolean expression, code. But every time you evaluate this Boolean expression, you're asking: is i less than the strlen of s? And this is a function call. You are literally calling strlen again and again and again, and like a crazy person, you're asking the computer: what's the length of s? What's the length of s? What's the length of s? It's not going to change; it's going to be the same no matter what. So how can we fix this? Well, I could solve this in a couple of ways. I could, for instance, before the loop do int n = strlen(s), store the length in a variable n, and just use n in the loop's condition. I think that eliminates the inefficiency, because now I calculate the length of s once. It's not going to change, nor is my variable, so I can now use and reuse that variable. It's just saving me a little bit of time, microseconds maybe. But when you're writing bigger programs and you're doing things in loops, if that loop is running not three or five times but millions of times, all of those microseconds and milliseconds might very well add up. But it turns out there's a syntactic trick we can use, too. I alluded to this earlier. If you want to initialize not one variable but two, you can actually do it all before the first semicolon, like that.
So now, on line 9, I'm declaring a variable called i and setting it equal to zero, and I'm declaring a second variable called n, also of the same type, int, and setting it equal to the length of s. And now I can use n again and again. Now, as an aside, this is a bit of a white lie, because compilers nowadays are so advanced that they can often notice you're calling strlen again and again inside of a loop and just fix this for you, unbeknownst to you. But it's representative of a class of problems that you should be able to spot with your own human eyes and avoid altogether, so that you don't waste more time and more compute, and in some sense more money, than you otherwise need to. Any questions on that optimization? Yeah? >> You do not say int again. The constraint is that you have to use the same data type for all of the variables you initialize there. So you'd better hope that you only want ints; otherwise, you've got to pull it out and do what I did earlier. Good question. Others on this? Yeah? >> How does it account for spaces? >> How does it account for spaces? A space is just ASCII character number 32, so there's nothing special about it. It's sort of invisible, but it is there, and it's treated like any other character. There's no special accounting whatsoever. The null character, which is also invisible, is special because printf and strlen know to look for it as the end of the string. All right, let's try one other demonstration of some of these ideas. Let me create another file called, how about, uppercase.c. Let's write a super simple program that uppercases a string that the human types in and see how we can do this in ways that are good, better, and best. So I'm going to call this file uppercase.c. Inside of this file, let's use our now-friends: include cs50.h, include stdio.h, and then, lastly, how about string.h.
And the goal here inside of main is going to be to get a string from the user. So string s = get_string, and we're going to prompt the user with "Before: ", representing what they typed before we uppercase everything. Then I'm going to go ahead and print out, just as a placeholder, "After:" with two spaces, just to be nitpicky, so that the text lines up vertically on the screen. Now I'm going to do the following: for (int i = 0, n = strlen(s); i < n; i++), just like before. So I'm just kicking off a loop that's going to iterate over the string the human typed in. Now, if my goal in life is to change the user's input from lowercase, if indeed it's lowercase, to uppercase, let's just express that literally: if the current character in the string, s[i], is greater than or equal to 'a', and s[i] is less than or equal to 'z', using single quotes. This is arguably a very clever way of asking the question: is it lowercase? We know from our ASCII chart from week zero that it has not only numbers representing all the uppercase letters but also numbers representing all the lowercase letters. Lowercase a, for instance, is 97, and they're all contiguous thereafter. So we can treat, just like we did before, chars as ints and ints as chars, ask mathematical questions about these chars, and say: is s[i] between a and z, inclusive? So if it is lowercase, and I'll add a comment here for clarity, what do we want to do? We want to force it to uppercase. So here's a little trick I can do: printf the current character, but let's do some math on it. Let's change s[i] by subtracting some value. Well, what might that value be? Recall from week zero our ASCII chart here, and let's focus, for instance, on the lowercase letters here and the uppercase letters here. What's the distance between each lowercase letter and its uppercase partner? It's 32, right?
And the lowercase letters are bigger. So it stands to reason that if I just subtract 32 from a lowercase letter, it's going to get me immediately to the uppercase version thereof. So this is kind of cool. I can go back to VS Code and literally subtract the number 32 in this case, because ASCII is a standard; it's not going to change. Else, if the letter is not lowercase, I'm just going to print it out unchanged, without doing any mathematics to it at all. And I'll add a comment, "else if not lowercase," to make clear what's going on there. All right, let me go ahead and make uppercase in my terminal window, then ./uppercase. Let's type in my name, all lowercase, and I get back DAVID. Hmm, minor bug. A couple of bugs, actually. Let me fix my spacing; I think I want another space after the word After. And at the very bottom of my program, I think I want a \n. Now let's rerun: make uppercase, ./uppercase, enter, d-a-v-i-d. And now it's forcing it all to uppercase. Meanwhile, if I do it once more and type in my name capitalized, it's still going to force everything else to uppercase. Questions? >> Your spacing for the After. >> Oh, I'm an idiot. Okay, thank you. Yes, I misspelled After; otherwise my alignment would have worked. So let's do this again: make uppercase, if only so that we can prove it's the same, d-a-v-i-d in all lowercase. And there we go. That was, thank you, the intent. All right, so it's kind of a neat trick, but this is kind of tedious, right? Microsoft Word and Google Docs both have the ability to toggle case from uppercase to lowercase or lowercase to uppercase. It's kind of annoying that you have to write this much code to achieve something so seemingly simple and so commonplace. Well, it turns out there's a better approach here, too.
In addition to the string library, there's also the ctype library, in ctype.h, another header file with a whole bunch of other useful functions that relate to characters in ASCII. So let's go ahead and use it as follows. At the top of my file here, I'm going to go ahead and include ctype.h. It turns out there are functions via which I can ask these questions without doing the math myself. For instance, in this next version of the program, I don't need any of this clever but pretty verbose math. I can just say: if the islower function, which comes from the ctype library, returns true when passed s[i], then convert the letter to uppercase by subtracting 32. But you know what? I don't even need to do this mental math, or math in code. I can also use, from the ctype library, a function called toupper, which takes as input a character like s[i], and let someone else's function do the work for me. So let me go back down to my terminal window here: make uppercase, now ./uppercase, enter, Before: d-a-v-i-d. This now works too. But if I really dig into the documentation for the ctype library, you'll see that you can use the toupper function on any character, and it will very intelligently uppercase it only if it is actually lowercase. Someone else, years ago, wrote the conditional code that checks whether it's between little a and little z. So knowing this, and you would indeed see this in the documentation, I don't even need this else. I can instead get rid of this whole conditional, tighten my code up significantly, and simply printf, using %c, the toupper version of that same letter, letting the function itself figure it out: if it's not lowercase, it passes it through unchanged; if it's lowercase, it changes it first and then returns it. So now, if I open my terminal window again and clear it, make uppercase, ./uppercase, enter, d-a-v-i-d, and we're back in business.
So again, this is demonstrative of how, if you find that coding is becoming tedious or you're solving a problem that surely someone else has solved, odds are there is in fact a library function, whether it's from CS50 or from the standard library, that you yourselves can use. And unlike the CS50 library, which is indeed CS50-specific, which is why Clang needed to know about -lcs50, many of these libraries just automatically work. You don't need to link in the ctype library, and the same goes for many others. But for non-standard libraries, like CS50's training wheels for the first few weeks, we do need to do that. Make is configured to do all of that automatically for you, though. All right, in our final minutes together, let's go ahead now and reveal some of the details we've been sweeping under the rug about main. I asked in week one that you just sort of take on faith that you've got to do the int, you've got to do the void, and all of that. Well, let's see why that actually is. So main is special insofar as, in C, it is the function that will be called automatically after you've compiled and then run your code. Not all languages standardize the name of that function, but C and C++ and Java and certain other ones do. In this case, here is the most canonical, simple form of main. We know that including stdio.h just gives us access to the prototypes for functions like printf. But what's going on with int, and what's going on with void? Well, void in parentheses here just means that main, and in turn all of the programs we've written up until this moment, do not take command-line arguments. Literally every program we've written, ./a.out, ./hello, ./scores, and everything else: I have never once typed another word after the name of the programs that we've written in class.
That is because every program has had void inside of these parentheses, telling the computer this program does not take command-line arguments, that is, words after the program's name. That is different from make and code and cd and other commands that you've typed with words after their names at the prompt. But it turns out the other supported syntax for the main function in C can look like this too, which at a glance is kind of a mouthful, but it just means that main can take zero arguments or it can take two. If it takes two, the first is an integer and the second is an array of strings. By convention, those inputs are called argc and argv. argc is the count of words typed at the prompt, including the program's name itself. argv is the argument vector, aka the array of the actual words. In other words, now that we have the ability to use arrays, we can get zero or one or two or three or more words from users at the prompt when they run our own programs. So what do I mean by this? We can now write programs that actually take command-line arguments, as follows. Let me go into VS Code here and close our old program, uppercase. Let's write a new, simpler program called greet.c that just greets the user in a couple of different ways. I'm going to include cs50.h initially, and then I'm going to include stdio.h here. Then I'm going to say int main(void), without introducing anything new just yet. I'm going to ask the user, as we did last week, for a return value from get_string, asking them What's your name?, as we've done so many times. Then I'm going to call printf with "hello, %s\n", spitting out their answer. Same program as last week. Again, I'm going to make greet. I'm going to run ./greet, and I'm prompted now for my name. I hit Enter. Notice that I did not take any command-line arguments. The only command I ran was ./greet, no other words.
Let's now use this new trick and actually let the user type their name when they're running my program, rather than waste their time by using get_string and prompting them. Let me go into my editor here. Let's get rid of the CS50 library. Let's get rid of my use of get_string, and let's simply change void to int argc, then string argv[], open bracket, close bracket. And down here, let's simply print out argv[1], for reasons we'll soon see. The only change I'm really making, then, is changing the prototype for main from the first version, which we've been using for a week and a bit now, to the second version, which is the only other version the C standard requires every compiler to support. I'm going to go back to my terminal window now. Make greet, and darn it, so close. How do I fix the mistake I accidentally made? Yeah, in back. Oh, no, in front. >> Yes, I should have kept the CS50 library, because it's in the CS50 library that string is defined. So, include cs50.h. In week four, we will delete that line for real and actually show you what string actually is. I promised at the start of class that string is a term of art, but it's not a keyword in C; we'll see what it means in a couple of weeks' time. Okay, let me fix this. make greet, ./greet, but now, before I even hit Enter, I'm going to type my actual name, and when I hit Enter, I see hello, David. If I instead do ./greet Kelly, Enter, I see hello, Kelly. If I do nothing, like ./greet, Enter, I just see hello, (null), which is not the same NUL as before, N-U-L; this is N-U-L-L, for reasons we'll come back to before long. But clearly printf knows something's going on: there's no actual word there. Why, though, did I do argv[1]? Well, it turns out that, just as a feature of C, if I change that to argv[0], recompile this program, run ./greet, and type in nothing else, I'm going to see something kind of curious: hello, ./greet. That's because the zeroth location in the argv variable automatically contains the program's own name.
Why is this useful? If you ever want to do something self-referential, like thanks for running my program, or you want to show documentation for your program that depends on whatever the file itself is called, you can use argv[0], which will always contain the program's name, no matter what the file has been named or renamed to. But we can fix that null issue now in a couple of ways. argc is the other input that I said can now exist, which is the count of arguments at the prompt. So if I want to check whether the user actually typed their name, I could say something like if argc == 2, then and only then go ahead and print out their name. Else, let's just do some clever default, like printf of hello, world, or heck, nothing at all. This version of the program is now a little smarter, because when I run make greet and then ./greet with my name, it works exactly as intended. But if I forget and only run ./greet, it's going to say hello, world. Moreover, if I don't quite cooperate and type ./greet David Malan, Enter, it similarly just ignores me, because the argument count is not two anymore. It's now three. So argc contains the total number of words at the prompt, but the first one is always the program's name. Question? >> Sorry, can you say that once more, a little louder? Why is it information that we just have, or...? >> Oh, so the short answer is just because of the definition of C. If you look up the documentation for C, you can either define main as taking no arguments, with the word void, or you can specify that main takes two arguments, and the compiler and the operating system will just ensure that, if you provide those two, the variables argc and argv will be filled with those two values automatically. Someone else decided that; that's just the way it works. You can't portably put three there. You can't put four there. You can change the names of those variables, but not the types, because of this convention.
So there's one last feature of main, then, and it's the actual value it returns. Up until now, every program I've written starts with int main something, int main something. What is that int? We have yet to use it. Technically, the value that main returns is a so-called exit status, which is a numeric status that indicates success or failure. Numeric codes like this are everywhere in the world of computing. For instance, here's a screenshot from Zoom whereby, if something goes wrong with Zoom, like you have bad internet connectivity or something like that, you might see an error code like 1132. That means nothing to normal people unless you Google it or look up the documentation, but it means something very much to the software engineers who wrote this code, because they know, oh shoot, 1132 means this error, and they probably have a spreadsheet or a cheat sheet somewhere that converts those codes to actually useful error messages. And frankly, in a better world, they would just tell you what the problem is rather than say report the problem and mention this number. That said, on the web, odds are you're familiar with the number 404, which is also a weird thing for so many normal people to know, but it generally means file not found. It's a numeric code that signifies that something has gone wrong. An exit status isn't quite this, but it's similar in spirit. In main, you can return a value like zero or one or two or something else to indicate whether something was successful or not. By convention, a program, or a function like main, returns zero on success, if all is well. That leaves you with a couple hundred other values for the things that can go wrong (on Linux and other Unix-like systems, the exit status the shell sees ranges from 0 to 255), because you could return one to signify one thing, two to signify another, three to signify another, and so long as you have a spreadsheet or a cheat sheet or something, you can keep track, as the programmer, of which error means what. So what does this mean in real terms?
Well, if I go over to VS Code here, let me implement a relatively simple program, our last, called status.c. In status.c, I'm going to go ahead and use the CS50 library at the top and the standard I/O library at the top, and then, with our new format, int main(int argc, string argv[]), inside of main I'm going to do the following. If argc does not equal 2, then I'm going to go ahead and print out, this time, a warning. I'm not going to have some silly default like hello, world. Let's tell the user that they didn't use my program correctly. So I'm going to printf Missing command-line argument, and we'll assume they know what that means. Then, to signify an error, I'm going to say return 1. It could be two, it could be three, but this is the first possible error, so I'm going to start simple with one. Otherwise, if argc does equal 2 and I get to this part of my code, I'm going to print hello, %s\n and pass in argv[1], just like before. And just to be super specific, I'm going to return 0 to tell the computer, the operating system, that this is a success. Zero signifies success; any other value signifies an error. Let's make status now. Let's do ./status, and this is a little magical, but let me go ahead and cooperate initially. I'm going to type in my name, David, and I'm going to see hello, David. Most people wouldn't know this, but among the commands you can type at your terminal is this one here, and the TFs and I, the TAs and I, would do something like this: after running your code, we can do echo $? (echo, space, dollar sign, question mark), and we can see, secretly, the value that your program returned, zero in this case. Meanwhile, if we do this again, ./status, let me not type my name this time. When I do this, I see Missing command-line argument. What value should the code have returned, then? One. So let's see: echo $?. There's the one.
So even after just one week of CS50, if you've ever wondered how check50 knows whether your code was correct or not, among the ways we check is by looking at this semi-secret status code, this exit status, which isn't really a secret. It's just not displayed to normal people, because it's not all that enlightening unless you're the software developer who wrote the code in question. But this means we could return one in some cases, or two in other cases, or three or four in yet others. And these command-line arguments are sort of everywhere. In fact, a program I skipped over a moment ago was going to be this. There's no academic value to what you're about to see. But another program that takes command-line arguments is known as cowsay, and it's sort of famous in computing circles because it's been on systems for many years. cowsay is a program that allows you to type in a word after the program's name, like moo, and it will print out what's called ASCII art: an adorable little cow with a speech bubble that says moo. So it's kind of evocative of Scratch, but it takes other command-line arguments, not just the words that you want to come out of its mouth, but even the appearance that you want it to have. For instance, I can say cowsay -f duck moo and run it again. Enter. And now I have a cute little duck saying moo, which is a bit of a bug. So let me change that to quack instead. And again, no academic value here; it's just fun to play with the various options. But if we really want to have fun with this, we can do another one: cowsay -f dragon, and we can say something like rawr. And now we have this crazy dragon appearing on the screen. Which is to say, again, no value here; it's just fun to play with command-line arguments sometimes. And how is cowsay doing this?
Well, someone wrote code, maybe in C or some other language, using argc and argv, poking around at their values, with maybe a conditional that says if the -f value is dragon, then print this graphic; else if the value is duck, then print this other one. It all boils down to the same fundamentals from week zero: functions and conditionals and loops and Boolean expressions and the like. They're just being composed into more and more interesting things. And indeed, in closing, among the other interesting things we'll play with this week, to come full circle, is cryptography, the art of scrambling information so as to have secure communication. It's so important nowadays, with passwords and credit card numbers and personal messages that you might want to send, and we'll have you explore through code some of the algorithms via which you yourselves can encrypt information. There are a number of ways to do this form of encryption, and they all boil down to this mental model. You've got some input, like the message you want to send, and you want to encipher it somehow, encrypt it somehow, so that no one else knows what message you've sent. So you want your plaintext, which is the human-readable version in English or any other language, to become ciphertext ultimately. The code you'll be writing this week goes inside of this black box: some kind of cipher, an algorithm that encrypts information so that you can do exactly this. Now, the catch is that you can't just run plaintext through an algorithm and get ciphertext out, because typically you need to somehow have a secret for encryption to work. If I'm going to send a message to someone in back, well, I could just randomize the letters that I'm writing down. But how would they know how to reverse that process? Probably what we need to do is agree in advance that, you know what, I'm going to change every A to a B, every B to a C, a C to a D, and a Z to an A. I'll wrap back around at the end of the alphabet.
It's not very sophisticated, but no middle school teacher who intercepts two kids passing notes in class is going to waste time trying to figure out this cipher. It does presuppose, though, that there's a secret between them, the number one in this case, because I'm changing every letter by one place. So how might this work? Well, if I want to encrypt the word HI, H-I exclamation point, and my secret key, which I've come up with in advance with someone, is one, I should send the ciphertext I-J exclamation point. Now, this is a simple cipher, so I'm not encrypting the punctuation, which may or may not be a good thing, but I am encrypting at least the alphabetical letters. What does the recipient then have to do to decrypt this message? When they see IJ! on paper, how do they know what I said? Well, they use that same key but subtract. So B becomes A, C becomes B, A becomes Z, and so forth, essentially inverting the key from positive one to negative one. Of course, slightly more secure than a key of one would be 13. And in fact, in computing circles, 13 has special significance. ROT13, R-O-T-13, is an algorithm that's been used for many years online just to avoid spoilers. Reddit might do this, or other websites where they want you to have to make some effort to see what a message says. But it's not all that hard; you just have to click a button or write the code that actually does it. If you use 13 instead, you wouldn't get IJ. You'd get UV, because U and V are 13 places away from H and I, respectively. But again, we're not touching the punctuation. Or we could send something more personal, like I love you, and the message comes out like that. Slightly more secure than that would be ROT26, right? >> No. >> No. Why? Because it's the same thing. It literally rotates all the way around: A becomes A, B becomes B. So there's a limit to this.
But more seriously, that speaks to just how strong this encryption is or is not. If you think about this now from an adversary's perspective, like the teacher in the room intercepting the slip of paper, how much work do they need to do? Well, they just try all possibilities: key of one, key of two, key of three, dot dot dot, key of 25. At some point, they will see clearly that they've guessed the key, which means this cipher is not very secure. Nonetheless, what we're talking about is historically known as the Caesar cipher, because back in the day, by legend, Caesar used it when communicating with his generals. If you're the first human on Earth to come up with encryption, or with this specific cipher, it doesn't really matter how simple it is if no one else knows what's going on. Nowadays, it's not hard at all to write some code in C or any other language that could just brute-force its way through this. So there are much more sophisticated algorithms nowadays than simple rotations of the letters of the alphabet, as we'll soon see. But when it comes to decryption, it really is just a matter of reversing that process. So this message here, if we rotate all the letters in the opposite direction by subtracting one, will be our final flourish for today. There's a bit of a hint there, which will reveal this message, our final words as the clock strikes 4:15: the U becomes T, and the I becomes H. Um, I'm the only one who finds this amusing. T-H-I-S W-A-S C-S-50. And this was CS50. We'll see you next time.
[Music]
All right, this is CS50. This is week three. And this was an artist's rendition of what various sorting algorithms look and sound like.
Recall from week zero that an algorithm is just step-by-step instructions for solving some problem. To sort information, as in the real world, just means to order it, from smallest to largest, or alphabetically, or by some other heuristic. Sorting is among the algorithms we're going to focus on today, in addition to searching, which of course is looking for information, as we did in week zero too. Among the goals for today is to give you a sense of certain computer science building blocks. There are a lot of canonical algorithms out there that most anyone who has studied computer science would know, and that anyone who leads a tech interview might ask about. But more importantly, the goal is to give you different mental models and methodologies for actually solving problems, by giving you a sense of how these real-world algorithms can be translated to actual computers that you and I can control. We thought we'd begin today with an actual algorithm for taking attendance. We of course do this with scanners outside, but we can do it old school, whereby I just use my hand or my mind and start doing 1 2 3 4 5 6 7 8 9 10 11 12 and so forth. That's going to take quite a few steps, because I've got to point at and recite a number for everyone in the room. So I could do what my grade school teachers taught me, which is count by twos, which would seem to be faster: 2 4 6 8 10 12 14 16 18 20. And clearly that sounds, and is, faster. But with a little more intuition and a little more thought back to week zero, I dare say we could do much better than that. So, if you don't mind, I'd like you to humor us by all standing up in place, thinking of the number one, and joining us in this here algorithm. So, stand up in place and think of the number one. At this point in the story, everyone should be thinking of the number one. Step two of this algorithm for you is going to be this: pair off with someone standing.
Add their number to yours and remember the sum. Go. Okay. At this point in the story, everyone, except maybe one lone person if we've got an odd number of people in the room, is thinking of what number? >> Two. Okay. So, next step: one of you in each pair should sit down. Okay, good. Never seen some people sit down so fast. So for those of you who are still standing, the algorithm's still going. The next step for those of you still standing is this: if still standing, go back to step two. Ergo, repeat, or loop, if you could. And notice, if you've gone back to step two, that leads you to step three. That leads some of you to step four, which leads you back to step two. So this is a loop. Keep going. If still standing, pair off with someone else still standing, add together, and then one of you sit down. So with each passing second, more and more people should be sitting down, and fewer and fewer are standing. Okay, almost everyone is sitting down. You're getting farther and farther away from each other. That's okay. I can help with some of the math at the end here. All right, I see a few of you still standing, so I'll help out and join you together. So, I see you in the middle here. What's your number? >> 32. >> 32. Okay, go ahead and sit down, and I'll pair you off with... what's your number? >> 20. Okay, you can go ahead and sit down. Who's still... you're still standing? >> 27. >> 27. Okay, you can sit down. >> You guys are still adding together? Who's going to stay standing? Okay. What's your number? >> The worst part is doing arithmetic across a crowded room, but... >> 27. >> 27. Also... >> 47. >> 47. Okay, you can sit down. Is anyone still standing? Yeah? >> 15. >> Nice, 15. Okay, you can sit down. Anyone still standing? Okay, so all I've done is sort of automate the process of pairing people up at the end here. When I hit Enter, we should hopefully see... oh, the numbers are a little... what's going on there? There we go.
When I hit Enter, we'll add together all of the numbers that were left. And if you think about the algorithm that we just executed, each of you started with the number one, and then half of you handed off your number. Then half of those remaining handed off theirs, and then half again. So, theoretically, all of the ones with which we started should be aggregated into the final count, which, if this room weren't so big, would end up in just one person's mind, and they would have declared the total number of people in the room. I'm going to speed that up by hitting Enter on the keyboard. And if your execution of this algorithm is correct, there should be 141 people in the room. According to our old-school human, though, Kelly, who did this manually, one at a time, the total number of people in the room, if you want to come on up and shout it into the microphone, is of course going to be... >> I don't know, something around 160, I think. >> 160. So, not quite the same. Okay, but that's pretty good. Okay, round of applause for your accuracy. Ideally, counting one at a time would have been perfectly correct, so we're only off by a little bit. Presumably that's just because of some bugs in the execution of the algorithm; maybe some mental math didn't quite go according to plan. But theoretically, your third and final algorithm, wherein you all participated, should have been much faster than my algorithm or Kelly's algorithm, whether we were counting one at a time or two at a time. Why? Well, think back to week zero, when we did the whole phone book example, which was especially fast in its final form because we were dividing and conquering, tearing half of the problem away, then half of the problem away again.
And even though it's hard to see in a room like this, it stands to reason that when all of you were standing up, we took a big bite out of the problem at first: half of you sat down, then half of you sat down, then half of you sat down, and theoretically, if you were closer together in space, there would have been one single person with the final count. So let's see if we can analyze this just a little bit by considering what we did. Here's that same algorithm. Recall, this is how we motivated week zero's demonstration of the phone book, in either physical or digital form, as you might see on an iPhone or Android device, looking for someone like John Harvard, who might be at the beginning, middle, or end of said phone book. We analyzed that algorithm just as we can now analyze this one. In my very first verbalized algorithm, 1 2 3 4, you could draw it as a straight line, because the relationship between the number of people in the room and the amount of time it takes is linear. With each additional person in the room, it takes me one more step. So if you think back to high school math, there's a slope of one there. And so this n, denoting the number of people in the room, is indeed a straight line. On the x-axis, as in week zero, we have the size of the problem in people, and on the y-axis the time to solve it, in steps or seconds or whatever your unit of measure is. If and when I started counting two at a time, 2 4 6 8 10 and so forth, that is still a straight line, because I'm taking two bites consistently out of the problem, until maybe the very end, where there's just one person left. It's still a straight line, but it's strictly faster. No matter the size of the problem, if you draw a line vertically, you'll see that you hit the yellow line well before you hit the red line, because it's moving essentially twice as fast.
But that third and final algorithm, even though in reality it felt like it took a while and I had to bring us to the exciting conclusion by doing some of the math, looked much more like our third and final phone book example. Because if you think about it from the opposite perspective, suppose there were twice as many people in the room. Well, it would have taken you all, theoretically, just one more step. Now, granted, one more loop, and there might be some substeps in there, if you will, but it's really just fundamentally one more step. If the number of people in the room quadrupled, four times as many people, well, that's two more steps. Equivalently, the amount of time it takes to solve the attendance problem using that third and final algorithm grows very slowly, because it takes a huge number of additional people in the room before you even begin to feel the impact of that growth. And so today, indeed, we're going to talk not only about the correctness of algorithms but also about their design, just as we have with code, because the smarter you are with your design, the more efficient your algorithms are ultimately going to be, and the more slowly their cost is going to grow. And by cost I mean time, like here, but maybe it's money, maybe it's the amount of storage space that you need; any limited resource is something we can ultimately measure. We're not going to do it very precisely. Indeed, we're going to use some broad strokes and some standard mechanisms for describing, ultimately, the running time, the amount of time it takes for an algorithm, or in turn code, to actually run. So, how can we do this? Well, last week, recall, we set the stage by talking about something called arrays, which were the simplest of data structures inside of a computer, where you take the memory in your computer, break it up into chunks, and store a bunch of integers, a bunch of strings, whatever, back to back to back to back.
And that's the key characteristic of an array: it is a chunk of memory wherein all of the values therein are back to back to back, that is, right next to each other in memory. We drew this fairly abstractly, as a grid like this, and I said, well, maybe this is byte zero and this is byte 1 billion, or whatever the total amount of memory is that you have. We zoomed in and looked at a little something like this, a canvas of memory. We talked about what you can put where. But today let's just assume that we want seven chunks of memory for the moment, and inside of them we might put something like these numbers here. Now, the interesting thing about computers is this: if I were to ask you all to find the number 50 in this array, our minds quickly see where it is, because we have this bird's-eye view of the whole screen and it's obvious where 50 is. But the catch with computers, and with the code that we write, is that these arrays, these chunks of memory, are really equivalent to a whole bunch of closed doors. The computer can't just have this bird's-eye view of everything. If the computer wants to see what value is at a certain location, it has to do the metaphorical equivalent of going to that location, opening the door and looking, then closing it and moving on to the next. That is to say, a computer can only look at, or access, one value at a time. Now, that's the simplest form. You can build fancier computers that theoretically can do more than that, but all the code we write is generally going to assume that model: you can't just see everything at once; you have to go to each location in these here lockers, if you will. Starting today, too, when we talk about locations in memory, we're going to use our old zero-indexing vernacular; that is to say, we start counting from zero instead of one. So this will be locker zero, locker one, locker two, dot dot dot, all the way up to locker six.
So just ingrain in your mind that if you hear something like location six, that actually implies there are at least seven total locations, because we started counting at zero. That's intentional. We don't have yellow lockers in the real world, so we're going to make this metaphor red instead. We do have these lockers here. And suppose that within these seven lockers physically on stage we've put a whole bunch of money, Monopoly money, if you will, and the goal initially is going to be to search for some specific denomination of interest, using these physical lockers as a metaphor for what your computer and your code are ultimately going to do. If we're searching for the solution to a problem like this, the input to the problem at hand is seven lockers, all of whose doors are metaphorically closed. The output we want is a bool: a true or false answer, yes or no, that number is there or it is not. So inside of this black box today is going to be the first of our algorithms, step-by-step instructions for solving some problem, where the problem here is to find, among all of these dollar bills, specifically the $50 bill. Could we get two volunteers to come on up who are ideally really good at Monopoly? Okay, how about over here in front? And let me look a little farther in back. Okay, over there in back. Come on down. All right. As these volunteers kindly come down to the stage, we're going to ask them in turn to search for specifically the $50 bill that we've hidden in advance. And if my colleague Kelly could come on up too, because we're going to do this twice, once with one algorithm and a second time with another. Let me go ahead and say hello, if you'd like to introduce yourselves to the group. >> Hey, I'm Jose Garcia. >> Hi, I'm Caitlyn Cow. >> All right, Jose and Caitlyn. Nice to meet you both.
Come on over, and let me propose, Jose, that the first algorithm I'd like you to do is to find the number 50. And let's keep it simple: just start from the left and work your way to the right. Each time you open a door, stand over to the side so people can see what's inside, and just hold the dollar amount up for the world to see. All right, the floor is yours. Find us the $50 bill. >> 20. >> Shut it? >> No, that's good. That's good acting, too. Thank you. No, you can shut it, just like the computer. All right. No. Very clear. Thank you. Still no. $10 bill. Next locker. $5 bill. Not going well. $100 bill, but not the one we want. This one. A $1 bill. Still no 50. Of course, you've been sort of set up to fail, but here, amazing. A round of applause. Jose found the $50 bill. All right. So let me ask you, Jose: you found the $50 bill, but it clearly took you a long time. Just describe in your own words what your algorithm was, even though I nudged you along. >> Yeah. So my algorithm was basically: walk up to the first door available, open it, check if the dollar bill was the one I was looking for, put it back, and then go to the next one. >> Okay. That's very reasonable, because if the $50 bill were there, Jose was absolutely going to find it eventually, if slowly. In the meantime, Kelly's going to kindly reshuffle the numbers behind these doors. And even though Jose took a long time here, wouldn't it have been smart to start from the other end instead, do you think? >> Um, not necessarily, because we don't know if the 50 is going to be at that end. >> Exactly. He could have gotten lucky if he had flouted my advice and started not on the left but on the right. Boom, he would have solved this in one step. But in general that's not really going to work out. Maybe half the time you'll get lucky; half the time you won't.
But that's not really a fundamental change in the algorithm, whether you go left to right or right to left. To Jose's point, if you don't know anything a priori about the numbers, the best you can probably do is just go through them linearly, left to right or right to left, so long as you're consistent. Now, could you have jumped around randomly? >> Uh, I guess I could have, but again, if they weren't in any specified order, I don't think it would have helped either. >> Yeah. If he jumped around in random order, he might get lucky, and it might be behind the very first door he opened, so it might take fewer steps ultimately. But presumably you'd then have to keep track of which locker doors you've already opened. That's going to take some memory, or space; not a big deal with seven lockers, but with 70 lockers or 700 lockers, even random probably isn't going to do the best job. So let me take the mic away and hand it over to Caitlyn. You can stay on the stage with us. Caitlyn, what I'd like you to do is approach this a little more intelligently by dividing and conquering the problem, but we're going to give you an advantage over Jose: Kelly has kindly sorted the numbers from smallest to largest, left to right. So, accordingly, what's your strategy going to be? >> Start in the middle. >> Okay, please. Go ahead, as before, and reveal to the audience what you found. Not the 50: the 20. But what do you know at this point, Caitlyn? >> Everything on the left is less. >> Correct. Everything to the left of the 20 is smaller, so the 50 must be to the right. So where might you go next with this three-locker problem? Let me propose that you go to the middle of those three. >> There we go. The middle of the middle. That would have been good. But let's... >> Oh no. >> Oh no, it's a 100 instead. You failed. But what do you now know? >> It's in the middle. >> I should have just let you go there. But now a big round of applause for Caitlyn for having found the 50 as well.
Okay. So the one catch with this particular demo is that because they presumably know what the Monopoly money denominations are, since we just did this exercise and had the whole cheat sheet on the board, you probably had some intuition as to where the 50 was going to be, even though I was trying to get you to play along. But in the general case, if you don't know what the numbers are, only that they go from smallest to largest, going to the middle, then the middle of the middle, then the middle of the middle again and again, would have the effect of starting with a big problem and halving it, halving it, halving it, just like the phone book. So, thanks to you both. We have these wonderful parting gifts that we found in Harvard Square: if you like Monopoly, you'll love the Cambridge edition, filled with Harvard Square spots. So thank you to you both, and a round of applause for our volunteers here. >> All right. So let's see if we can formalize these two algorithms a bit: linear search, insofar as Jose was searching essentially along a line from left to right, and binary search, "bi" implying two, because we were halving the problem again and again and again. So, for instance, with linear search from left to right, or equivalently right to left, we could document our pseudocode as follows. For each door from left to right, if the 50 is behind the door, well, then we're done: just return true. That's the Boolean value that was the goal of this exercise, to say yes, here is the 50. Otherwise, at the very bottom of this pseudocode, we could just say return false. Because if you get all the way through the lockers and you have never once declared true by finding the 50, you might as well default at the very end to saying false: I did not find it.
But notice here, just like in week zero when we talked about pseudocode for searching the phone book, my indentation is actually very intentional. This version of the code would be wrong if I instead used our old friend if-else to make this conditional decision. Why is this code, now in red, wrong in terms of correctness? Yeah? >> If it's not behind the first door, it'll return false. >> Exactly. If the number 50 is not behind the first door, the else is telling you, right then and there, to return false. But as we've seen in C code, whenever you return a value, that's it for the function: it is done doing its work. So if you return false right away, not having looked at the other six lockers, you may very well get the answer wrong. The first version of the code, where there wasn't an else but rather this explicit line at the very end that says, if you reach this line, return false, addresses that problem. And to be clear, even though that line comes right after an indented return true, when you return a value, as in C, that's it: execution stops at that point, at least for the function, or in this case the pseudocode in question. All right, so here's a more computer-sciencey way of describing the same algorithm. Even though it starts to look a little more arcane, when you use variables and standard notation you can actually express yourself much more clearly and precisely, even if it takes a little practice to get used to. Here is how a computer scientist would express that exact same idea. Instead of saying for each door from left to right, we might throw some numbers on the table: for i, a variable, from the value 0 on up through the value n minus 1, which is what this shorthand notation means, if 50 is behind doors[i], so to speak.
So now I'm treating the notion of doors as an array, using our notation from last week. If 50 is behind doors[i], return true. Otherwise, if you get through the entirety of that array of doors, you can still return false. Now notice, n minus 1 seems a little weird here, because aren't there n doors? Why do I want to go from 0 to n minus 1 instead of 0 to n? Yeah? >> Because zero is the first block. >> Exactly. If you start counting at zero and you have n elements, the last one is going to be addressed as n minus 1, not n, because if it were n, you'd actually have n + 1 elements, which is not what we're talking about. So again, this is just standard notation, and it's a little terser this way, a little more succinct, and frankly a little more adaptable to code. What you're going to find is that as our problem sets and programming challenges get a little more involved, it's often helpful to write out pseudocode like this, using an amalgam of English and C, and eventually Python, because then it's way easier afterward to translate your pseudocode into actual code if you're operating at this level of detail. All right. So in the second algorithm, Caitlyn kindly searched for 50 again, but Kelly gave her the advantage of sorting the numbers in advance. Now she doesn't have to resort to brute force, so to speak, trying all possible doors from left to right. She can be a little more intelligent about it and pick and choose the lockers she opens. And so with binary search, as we call that, we could write pseudocode as follows. If 50 is behind the middle door, then go ahead and return true. Else, if it's not behind the middle door but 50 is less than the number behind the middle door, we want to go and search the left half. That didn't happen in Caitlyn's case, because we ended up going right, so that's just another branch here.
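The linear-search pseudocode above, "for i from 0 to n minus 1, if 50 is behind doors[i], return true; return false", translates fairly directly into C. This is a sketch with hypothetical function names; it also includes the incorrect if-else variant from the slide in red, so the early-return bug can be seen side by side.

```c
#include <stdbool.h>

// Linear search: check doors[0] through doors[n - 1], left to right.
bool linear_search(const int doors[], int n, int target)
{
    for (int i = 0; i < n; i++)
    {
        if (doors[i] == target)
        {
            return true;  // found it, so stop immediately
        }
    }
    return false;  // only reached after every door has been checked
}

// The incorrect if-else variant: it gives up after looking behind
// only the first door, because either branch returns.
bool buggy_search(const int doors[], int n, int target)
{
    for (int i = 0; i < n; i++)
    {
        if (doors[i] == target)
        {
            return true;
        }
        else
        {
            return false;  // bug: returns before checking doors[1] onward
        }
    }
    return false;
}
```

With the lecture's unsorted lockers, `buggy_search` finds the 20 in the first locker but misses the 50 in the last one, while `linear_search` finds both.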
Else, if 50 is greater than what was at the middle door, we want to search the right half. But there's one other condition that we should probably consider. So far we've asked: is it here? Is it to the left? Is it to the right? But there's another corner case that we'd better keep track of. What else could happen? >> If it's not in the array. >> Or, really, if we're out of doors. So we can implement this a little differently. I left myself some space at the top, because I shouldn't do any of this if there are no doors to search. I should have a sanity check whereby, if there are no doors left, or no doors to begin with, we just immediately return false. And why is that? Well, notice that when I say search left half and search right half, this is implicitly telling me: just do this again, but with fewer and fewer doors. This is a technique for solving problems and implementing algorithms that we're going to end today's discussion on, because what seems very colloquial and very straightforward, okay, search the left half, search the right half, is actually a very powerful programming technique that will enable us to write more elegant code, and sometimes less code, to solve problems such as this. More on that in just a little bit. But how can we now formalize this using our array notation? It looks a little more complicated, but it isn't really. Instead of asking questions in English alone, I might say: if 50 is behind doors[middle]. This pseudocode presupposes that I did some math and figured out the numeric index of the middle element. And how can I do that? Well, if I've got seven doors and I divide by two, what's that? 7 divided by 2 is three and a half. Three and a half makes no sense if I'm using integers as indices, so maybe we just round down, to 3. That would be locker number 0, 1, 2, 3, which, if you look at the seven lockers, is indeed the middle.
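The "divide by two and round down" step comes for free in C, because integer division truncates toward zero, which for nonnegative numbers is the same as rounding down. A small sketch with hypothetical helper names; the second helper, which finds the middle of a subrange `doors[low]` through `doors[high]`, uses a common idiom that is an addition here rather than something shown in the lecture.

```c
// Index of the middle door among n doors (n >= 0).
// 7 / 2 is 3 in C: the fractional .5 is discarded.
int middle_index(int n)
{
    return n / 2;
}

// Middle of the inclusive subrange low..high, written as
// low + (high - low) / 2 so that low + high can't overflow an int.
int middle_of_range(int low, int high)
{
    return low + (high - low) / 2;
}
```

For the seven lockers, `middle_index(7)` gives 3; after ruling out lockers 0 through 3, `middle_of_range(4, 6)` gives 5, the middle of the remaining three.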
So, using some relatively simple arithmetic, I can figure out the index of the middle door if I know how many doors there are: divide by two and round down. Meanwhile, if I don't find 50 behind the middle door, let's ask: if 50 is less than the value at the middle door, then let's search not just the left half in the general sense but, more specifically, doors[0] through doors[middle - 1]. Otherwise, if 50 is greater than the value at the middle door, go ahead and search doors[middle + 1] through doors[n - 1]. Let's consider these in turn. Searching the left half, as we described it earlier, lines up with starting from doors[0], the very first one. But why are we searching only up to doors[middle - 1] instead of doors[middle]? Yeah? >> The middle. >> Yeah, exactly. We already checked the middle door by asking the previous question, so you're just wasting everyone's time if you divide the problem in half and still consider that door as checkable again. Same thing here: we check middle + 1 through the end of the lockers array, because we already checked the middle one. That complicates the look of the math a bit, but it's really just using variables and arithmetic to describe the locations of the same lockers. But let's consider now what we mean by running time: the amount of time it takes for an algorithm to run. And let's consider which of these algorithms is better than the other, and why. In general, when talking about running time, we can actually use pictures like this. This is not going to be some very low-level mathematical analysis where we count up lots of values. It's going to be broad strokes, so that we can communicate to colleagues, to other humans, generally whether one algorithm is better than another and how you might compare the two.
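Here is a sketch of the binary-search pseudocode as a recursive C function (the name and the `low`/`high` parameters are assumptions for illustration). It follows the lecture's structure: return false if no doors are left, return true if the middle door has the target, otherwise search `doors[low]` through `doors[middle - 1]` or `doors[middle + 1]` through `doors[high]`. It assumes the array is sorted in ascending order.

```c
#include <stdbool.h>

// Recursive binary search over doors[low]..doors[high], inclusive.
// Precondition: doors is sorted from smallest to largest.
bool binary_search(const int doors[], int low, int high, int target)
{
    // No doors left to search
    if (low > high)
    {
        return false;
    }

    int middle = low + (high - low) / 2;

    if (doors[middle] == target)
    {
        return true;
    }
    else if (target < doors[middle])
    {
        // Search the left half: doors[low] through doors[middle - 1]
        return binary_search(doors, low, middle - 1, target);
    }
    else
    {
        // Search the right half: doors[middle + 1] through doors[high]
        return binary_search(doors, middle + 1, high, target);
    }
}
```

On Kelly's sorted lockers `{1, 5, 10, 20, 50, 100, 500}`, searching for 50 with `binary_search(sorted, 0, 6, 50)` opens 20, then 100, then 50, the same sequence as in Caitlyn's demo.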
So here, for instance, is a pictorial analysis of the algorithms for two different problems: the phone book from week zero and the attendance taking from today. And let's label these things, as we've done before. The very first algorithm took n steps in the very worst case, if I had to search the whole phone book or count everyone in the room. The second algorithm took half as many steps, plus one maybe, but we'll keep it simple and call that n/2. And the third and final algorithm, both in week zero with the phone book and today with attendance, technically takes log base 2 of n steps. If you're a little rusty on your logarithms, that's fine. Just take on faith that log base 2 of n alludes to taking a problem of size n and dividing it in half and half and half, as many times as you can, until you're left with one person standing or one page of the phone book. That's how many times you can divide a problem of size n in half. Well, it turns out that we're getting a little more detailed than most computer scientists care to get when describing the efficiency of algorithms. So we're going to start using some standard notation: instead of worrying precisely, mathematically, about how many steps today's and future algorithms take, we're going to talk in broader strokes about how many steps they are on the order of, using what's called big O notation, which literally is a big O and then some parentheses, and you pronounce it big O of such and such. So the first algorithm seems to be in big O of n, which means it's on the order of n steps, give or take. For this algorithm here, you might be inclined to do something similar: ah, it's on the order of n/2 steps, and this one's on the order of log base 2 of n steps. But it turns out what we really care about with algorithms is how the time grows as the problem itself grows in size.
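The "divide in half as many times as you can" description of log base 2 of n can be checked with a few lines of C. This sketch (the function name is hypothetical) counts halvings with integer division; for any n of at least 1 the result equals the floor of log base 2 of n.

```c
// Count how many times a problem of size n can be halved
// (rounding down) before only one item is left.
// For n >= 1 this equals floor(log2(n)).
int halvings(long n)
{
    int count = 0;
    while (n > 1)
    {
        n /= 2;  // tear the problem in half
        count++;
    }
    return count;
}
```

`halvings(1000)` is 9, `halvings(1000000)` is 19, and `halvings(1000000000L)` is 29, so roughly a billion phone-book pages shrink to one page in about 30 tears, whereas n/2 would still be 500 million steps.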
So the bigger n gets, the more concerned we are about how efficient our algorithm is, if only because today's computers are so darn fast. Whether you're crunching a thousand numbers or 2,000 numbers, it's going to take a split second no matter what. But if you're crunching a thousand numbers versus a million versus a billion, that's where things start to be noticeable to us humans and we really start to care about these values. So in general, when using big O notation like this, you ignore lower-order terms and constant factors; equivalently, you only worry about the dominant term in whatever mathematical expression is in question. So big O of n remains big O of n. Big O of n/2? Eh, it's really the same thing as big O of n. It's not literally the same, but they're both linear in nature: one grows at this rate, one grows at this rate instead, but for all intents and purposes they're the same, since both grow at a constant rate. This one too: it's on the order of log n, where the base is, who cares, because logarithms of different bases differ only by a constant factor. In short, what does this really mean? Imagine in your mind's eye that we were about to zoom out on this graph, such that instead of going from 0 to, say, a million, the x-axis now goes from 0 to a billion, and the same for the y-axis. Let's zoom out, so you're seeing 0 to a billion. In your mind's eye, you might imagine that as you zoom out, things just get more and more compressed visually, because you're zooming out and out and out, but these two still look like straight lines, and this one still looks like a curve. Which is to say, as n gets large, clearly this green algorithm, whatever it is, seems more appealing than either of the other two. And if we keep zooming out, at some point the ink of those two straight lines is going to be so close together that they are, for all intents and purposes, pretty much the same algorithm.
So this is to say, computer scientists don't care about lower-order terms or constant factors, like the divide by two or the base 2 of a logarithm. We look at the most dominant term, the one that really matters as n gets bigger and bigger. That, then, is big O notation, and it's something we'll use pretty much any time we analyze or speak to how good or how bad some algorithm is. So here's a little cheat sheet of common running times. Here's our friend big O of n, which means the algorithm takes on the order of n steps. Here is one that takes on the order of log n steps. And here are some others we haven't seen yet: some algorithms take n log n steps, some take n squared steps, and some take just one step, or maybe two steps, or four, or 10, but a constant number of steps. So let me ask: of the algorithms we've looked at thus far, linear search being the very first today, what is the running time of linear search in big O notation? That is to say, if there are n lockers on the stage, how many steps might it take us to find a number among those n lockers? Big O of...? Yeah? >> Big O of n. >> Big O of n is in fact exactly where I would put linear search. Why? Well, if you're using linear search, in the very worst case the number you're looking for, as with Jose, might be all the way at the end. You might get lucky, and it might not be at the very end, but generally it's useful to use big O notation in the context of worst-case scenarios, because that really gives you a sense of how badly an algorithm could perform if you just get really unlucky with your data set.
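One way to see the worst case directly is to instrument linear search so it reports how many doors it opened. This is a hypothetical variant for illustration: it returns the index of the target (or -1 if absent) and writes the step count through an extra `steps` pointer.

```c
// Linear search that also counts how many doors it opens.
// Returns the index of target, or -1 if it isn't there;
// *steps receives the number of doors opened.
int linear_search_steps(const int doors[], int n, int target, int *steps)
{
    *steps = 0;
    for (int i = 0; i < n; i++)
    {
        (*steps)++;  // opening door i
        if (doors[i] == target)
        {
            return i;
        }
    }
    return -1;
}
```

On the lecture's lockers `{20, 500, 10, 5, 100, 1, 50}`, searching for 50 takes 7 steps (the worst case, n), searching for a missing value also takes 7, and searching for 20 takes just 1.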
So even though big O really just refers to an upper bound, how many steps an algorithm might take, it's generally useful to think about it in the context of the worst-case scenario: ah, the number I care about is actually way over here. But what about binary search? Even in the worst case, so long as the data is sorted, how many steps might binary search take by contrast? >> Big O of log n. >> So binary search we're going to put here, which is to say that in general, and especially as n gets large, binary search is much faster: it takes much less time. Why? Because, assuming the numbers are sorted, you will be dividing the problem in half and half and half, just like with the phone book in week zero, and you will get to your solution much faster. Why should you not use binary search, though, on an unsorted array of lockers, like a random set of numbers? Yeah? >> You could just get rid of the value, because you don't know what the inequality is going to be. >> Exactly. You're making these decisions based on inequalities, less than or greater than, but with unsorted data there's no rhyme or reason to them. You're going left, going right, but there's no reason to believe that smaller numbers are this way and bigger numbers are that way. So you're just making incorrect decision after incorrect decision, and you're probably going to miss the number altogether. Binary search on an unsorted array is just an incorrect usage of the algorithm. But if, like Kelly did, you sort the data in advance, or you're handed sorted data, then you can in fact apply binary search correctly and much more efficiently. >> I have a question. Is there ever a case where linear search is more efficient, just because of the process of sorting the data yourself? >> Absolutely. Is linear search sometimes more efficient if it's going to take you more time to sort the data and then use binary search? Absolutely.
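To see why binary search is simply incorrect on unsorted data, here is an iterative variant of the same idea (the name `binary_search_iter` is hypothetical). Run on the lecture's unsorted lockers, the less-than and greater-than decisions lead it away from values that are actually present.

```c
#include <stdbool.h>

// Iterative binary search over doors[0]..doors[n - 1].
// Only correct if doors is sorted in ascending order.
bool binary_search_iter(const int doors[], int n, int target)
{
    int low = 0;
    int high = n - 1;
    while (low <= high)
    {
        int middle = low + (high - low) / 2;
        if (doors[middle] == target)
        {
            return true;
        }
        else if (target < doors[middle])
        {
            high = middle - 1;  // discard the right half
        }
        else
        {
            low = middle + 1;   // discard the left half
        }
    }
    return false;
}
```

On `{20, 500, 10, 5, 100, 1, 50}`, searching for 20 reports false even though 20 sits in locker 0: the middle door holds 5, so the search heads right and never looks back. On the sorted `{1, 5, 10, 20, 50, 100, 500}`, the same search finds 20 on the first look.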
And that's going to be one of the design decisions that underlies any implementation of an algorithm. If it's going to take you some crazy long time to sort not seven numbers but 70, 700, 7,000, or 7 million, but you only need to search the data once, then what the heck are you doing? Why are you wasting time sorting the data if you only care about getting an answer once? You might as well just use linear search, or heck, even search randomly and hope you get lucky, if you don't care about reproducing the same result. Now, in general, that's not how much of the world works. For instance, Google works really hard on faster and faster algorithms, because we are not searching Google once and then never again; we're doing it again and again and again. So they can amortize, so to speak, the cost of sorting data over lots and lots of searches. But sometimes it's going to be the opposite. I think back to graduate school, where I was often writing code to analyze large sets of data. I could have done it the right way, sort of the CS50 way, by fine-tuning my algorithm and thinking really hard about my code. But honestly, sometimes it was easier to write really bad but correct code, go to sleep for seven hours, and have my computer's answer by morning. The downside, as admittedly happened more than once, is that if you have a bug in your code and you go to sleep, and seven hours later you find out there was a bug, you've just wasted the entire evening. So there, too, there's a trade-off when making those resource decisions. But that's entirely what today is about: making informed decisions. Sometimes it may be smarter and wiser to make the more expensive decision, but not unknowingly; at least do it knowingly. All right, so there we have our first two algorithms. But let's consider another way of describing the efficiency of an algorithm. Big O is an upper bound.
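The amortization idea, pay to sort once and then answer many lookups quickly, can be sketched with the C standard library's `qsort` and `bsearch` from `<stdlib.h>`. The wrapper names `sort_once` and `lookup` are hypothetical. Note that the C standard requires `bsearch`'s array to be sorted but does not mandate a particular algorithm; implementations typically use binary search.

```c
#include <stdbool.h>
#include <stdlib.h>

// Comparison function for qsort and bsearch: negative, zero, or
// positive as *a is less than, equal to, or greater than *b.
static int compare_ints(const void *a, const void *b)
{
    int x = *(const int *) a;
    int y = *(const int *) b;
    return (x > y) - (x < y);  // avoids the overflow risk of x - y
}

// Pay the sorting cost once...
void sort_once(int values[], size_t n)
{
    qsort(values, n, sizeof values[0], compare_ints);
}

// ...then answer many lookups against the now-sorted array.
bool lookup(const int values[], size_t n, int target)
{
    return bsearch(&target, values, n, sizeof values[0], compare_ints) != NULL;
}
```

If the data will be searched only once, calling `sort_once` first may cost more than a single linear search; the trade-off pays off only as the number of lookups grows.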
It's sort of how bad it can get in those cases where the data is really not working to our advantage. Omega, a capital omega symbol here, is used for lower bounds: how lucky might we get in the best case, if you will? How few steps might an algorithm take? Here's a cheat sheet of common running times, even though there's an infinite number of others; we'll generally focus on functions like these. Let's consider those same algorithms. With linear search from left to right, how few steps might that algorithm take in the best-case scenario? Yeah? Is this hand about to go up? Yeah. So, one step. Why? Because Jose could have gotten lucky, opened this door, and voila, that was the 50. It didn't play out that way, but it could have. In general, the number you're looking for could very well be at the beginning. So we're going to put linear search at omega of 1: one step, and maybe technically a few more than that, but a fixed number of steps that has nothing to do with the number of lockers. Case in point: if I gave you not seven but 70 lockers, he could still get lucky and take just one step. So omega is our lower bound; big O is our upper bound. Spoiler: what is binary search's lower bound? Apparently it's also omega of 1. But why? That is in fact correct. Yeah? >> You could just get lucky again. >> Same reason: you could get lucky in the best case, and the number is smack dab in the middle of all of the data. So the fewest number of steps binary search might take is also one. This is why we talk about upper and lower bounds: you get a sense of the range of performance. Sometimes it's going to be super fast, which is great, but something tells me that in general we're not going to get lucky every time we use an algorithm, so it's probably going to be closer to the upper bound, the big O.
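The best case for binary search, omega of 1, shows up if we count steps the same way. This hypothetical instrumented version returns the target's index (or -1) and records how many middle doors it opened; it assumes ascending order.

```c
// Binary search that counts how many doors it opens.
// Precondition: doors is sorted in ascending order.
// Returns the index of target, or -1; *steps receives the count.
int binary_search_steps(const int doors[], int n, int target, int *steps)
{
    int low = 0;
    int high = n - 1;
    *steps = 0;
    while (low <= high)
    {
        int middle = low + (high - low) / 2;
        (*steps)++;  // opening the middle door
        if (doors[middle] == target)
        {
            return middle;
        }
        else if (target < doors[middle])
        {
            high = middle - 1;
        }
        else
        {
            low = middle + 1;
        }
    }
    return -1;
}
```

On `{1, 5, 10, 20, 50, 100, 500}`, searching for 20 takes 1 step (it's smack dab in the middle), while searching for 50 or for a missing value like 2 takes 3 steps, which is floor(log2 7) + 1.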
Now, as an aside, there's a third and final symbol that we use in computer science to describe algorithms: a capital theta. Capital theta is jargon you can use when big O and omega happen to be the same. We'll see that today, though not always. Here's a similar cheat sheet. None of the algorithms thus far can be described with theta notation, because their big O and omega are not the same: they differed in both of our analyses. But we'll see at least one example where we can describe an algorithm with theta, and that's like conveying twice as much information in your words to another computer scientist, rather than giving them both the upper and the lower bounds. The fancy way of describing all of what we're talking about here, big O, omega, and theta, is asymptotic notation. "Asymptotically" refers to a value getting bigger and bigger and bigger but not necessarily ever hitting some boundary; in short, how an algorithm behaves as n gets very large is what we mean when we deploy this asymptotic notation. All right. So with the first of these, linear search, let's make this a bit more real. Let me go over to my other screen here. Okay, in VS Code, let me create a program called search.c. And in search.c, let's implement a fairly simple version of linear search initially. So let me include, for instance, cs50.h. Let me include stdio.h. Then let me declare int main(void); we're not going to bother with any command-line arguments for now. And then let me give myself an array of numbers to play with. We did this briefly last week in answer to a question, but I'm going to do it now concretely, rather than use something more manual to get all of these numbers into the array. I'm going to say give me an array called numbers.
And the numbers I want to put in this array initially are the exact same denominations we've been playing with: 20, 500, 10, 5, 100, 1, and 50. Again, this is notation that I alluded to in answer to a question last week, whereby if you want to statically initialize an array, that is, give it all of its values up front without having the human type them all in manually, you can use curly braces like this. And the compiler is pretty smart. You don't have to tell the compiler how many numbers you want, seven, because it can obviously just count how many numbers are in the curly braces, but you could explicitly say 7 there, so long as your counting is in fact correct. So on line 6, this gives me an array of seven numbers, initialized to precisely that list of numbers from left to right. All right, now let's ask the human what number they want to search for, just as I did our two volunteers, and say int n = get_int, prompting the user for the number they want to search for. Then let's implement linear search. If I want to implement linear search in terms of the programming constructs we've seen thus far, what keyword in C should I use? What programming technique? Yeah? Yeah, so maybe a for loop or a while loop, but a for loop is kind of my go-to lately. So let's do for (int i = 0, because we'll start counting from the left; i < 7, which isn't great to hardcode, but I'm not going to use the 7 again, so I think it's okay in one place for this demo;
then i++). Then inside of this loop, let's ask a question, just like Jose did by opening each of the doors: if numbers[i] == n, the number we asked about, well, then let's print out some informative message like found\n, and then, for good measure, like last week, let's return 0 to signify success. It's sort of equivalent to returning true, but in main, recall, you have to return an int. That's why we revealed at the end of week two that the return type of main is int: that is what gives the computer its so-called exit status, which is 0 if all is well, or anything other than 0 if something went wrong. I think finding the number counts as all is well. But if we get through that whole loop and still haven't printed found or returned 0, I think we can safely print not found\n and then return 1 as our exit status, to indicate that we didn't find the number. So in short, I think in C this is linear search. Let me open up my terminal window again. Let me run make search, Enter. Let me do ./search, Enter. And I'll search for, as I asked Jose, the number 50. And we indeed found it, at the end. Let me rerun ./search, and let's search for the other number, at the beginning: 20. That works too. And just to get crazy, let's search for a number we know not to be there, like 1,000. And that in fact is not found. So I think we have an implementation of linear search. But let me pause here and ask if there are any questions about this code and the translation of the algorithm to C. Yeah, in the back? Why did I not specify the length of the array? It's not necessary, when declaring an array and setting it equal to some known values in advance, to specify in the square brackets how many you have, because the compiler is not an idiot: it can literally count the numbers inside of the curly braces and infer that value.
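Here is a sketch of the same search.c logic with the loop moved out of `main` into a hypothetical `search_status` function, so it can be exercised without the CS50 library's `get_int`. It returns the same 0/1 exit status that `main` does. As an addition not shown in the lecture, it lets the code infer the length with `sizeof` instead of hardcoding 7, just as the compiler infers it from the curly braces.

```c
#include <stdio.h>

// The lecture's denominations; the compiler counts the initializers (7).
static const int numbers[] = {20, 500, 10, 5, 100, 1, 50};

// Total bytes divided by bytes per element gives the element count,
// so the loop bound stays correct if numbers are added or removed.
#define NUMBERS_COUNT (sizeof numbers / sizeof numbers[0])

// Mirrors main in search.c: print a message and return an exit
// status, 0 if n was found and 1 if not.
int search_status(int n)
{
    for (size_t i = 0; i < NUMBERS_COUNT; i++)
    {
        if (numbers[i] == n)
        {
            printf("Found\n");
            return 0;
        }
    }
    printf("Not found\n");
    return 1;
}
```

In the full program, `main` could simply `return search_status(get_int("Number: "));`, keeping the exit-status convention from the lecture.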
You could put it there, but arguably you're opening up the possibility that you'll miscount and put 7 here but eight numbers over there, or six numbers. So it's best not to tempt fate; just let the compiler do its thing. A good question. Other questions on this code so far? All right, if none, let's convert this linear search into one that's maybe a little more interesting, one that involves searching for strings of text. After all, we started the class in week zero by searching for names like John Harvard in a phone book. Let's see if we can adapt our code to search for strings instead of integers. So in my code here, let's delete everything inside of main, just to give myself a clean canvas. Let me give myself another array, this one called, let's just call it strings, because that's the goal of this exercise, and set it equal to some familiar pieces from the game of Monopoly, if you've played. There's a battleship piece in there, a boot, a cannon, an iron, a thimble, and a top hat, though it does vary nowadays based on the edition that you have. So it's kind of a long array, but I have six total values in this array of strings. Now let's ask the user for a string, which we'll call s for short, and ask with get_string what string they're looking for among those six. Then I think we can do a for loop again: for (int i = 0; i < 6; i++). And inside of this loop, let's do the same thing: if strings[i] == s, the string the human typed in, I think we can print found\n and then, as before, return 0 to signify success. And if we don't find it, after that whole for loop let's printf not found\n down here and return 1 to signify an error. So it's really the same thing at the moment, except that I'm using strings instead of integers.
All right, let me go ahead and open up my terminal window again and clear it. Let me go ahead and recompile this code with make search. It seems to compile. Okay, let me do ./search, and let's go ahead and search for the first one. How about battleship? Enter. Huh, not found. All right, well, maybe a typo. Let me search for something easier to spell: boot. Not found. That's weird. Both of those are at the very start of the array. Let's do ./search again and search for top hat. Enter. Not found. What is going on? Well, it isn't actually that obvious what I'm doing wrong. But it turns out that when we compare strings instead of integers in C, we're going to have to use this other library, at least today, that we saw briefly last week. Last week we introduced it because of a function called strlen, which gives us the length of a string. It turns out that string.h also comes, per its documentation, with another useful function called strcmp, for string compare, and its purpose in life is to compare two strings, left and right, to determine whether they are in fact the same. So for today's purposes, suffice it to say you cannot use == to compare two strings, as intuitive as that might seem. Why is that? Well, for a computer, it's super easy to compare two integers, because each is just a single value in memory. But a string is not just one character. It's a few characters over here and a few characters over there. Maybe it's a few, maybe it's more. You have to compare each and every character in a string to make sure they're in fact the same. So strcmp does exactly that. Probably, in the implementation of strcmp from years ago, someone wrote a while loop or a for loop that looks at each string left to right, compares each and every one of the characters therein, and then gives us back an answer. So how do we go about using this?
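To make the character-by-character idea concrete, here is a sketch of such a loop. `strings_equal` is a hypothetical name, not part of string.h, and it only answers "same or not," whereas the real `strcmp` returns an int whose sign also encodes order. As for why `==` misbehaves: in C, applying `==` to two strings compares where they live in memory, not their characters.

```c
#include <stdbool.h>

// Hypothetical helper: compare two C strings one character at a time,
// roughly the kind of loop imagined inside strcmp (equality only).
bool strings_equal(const char *s, const char *t)
{
    int i = 0;
    // Advance while s has not ended and both strings agree at position i.
    while (s[i] != '\0' && s[i] == t[i])
    {
        i++;
    }
    // Equal only if both strings stopped on the same character,
    // which happens exactly when both reached '\0' together.
    return s[i] == t[i];
}
```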
Well, to use strcmp, what I can do in VS Code here is change my code as follows. Instead of using ==, I'm going to use this function per its documentation. I'm going to call strcmp. Then I'm going to pass in one of the strings, which is in strings[i]. Then I'm going to pass in the second string, which is s. However, having read the documentation (and this is a little non-obvious), it turns out that strcmp will return 0 if the strings are equal. Otherwise, it's going to return a positive number or a negative number. So what I care about for now is: does the return value of strcmp, when given those two strings, give me back 0? If so, they are equal, and I'm going to say quote unquote "found". So let's go ahead and open the terminal again. Let me go ahead and clear it and do make search to recompile my code. And huh, I've done something wrong. Let me scroll up to the very first line: on line 11, "error: call to undeclared library function strcmp with type int..." and something, something, which gets more complicated after that. Why is line 11 not working despite what I just preached? Yeah? >> Yeah. I just did something stupid. I didn't include the string.h header file. So all clang, our compiler, is doing when invoked by make is encountering literally the word strcmp and not knowing what it is, because we haven't taught it what it is by simply saying #include <string.h> at the top. Okay, let me reopen my terminal window, clear that message away, and do make search again. Now it's compiling. ./search, Enter. Now I'm going to go ahead and search, as I did before, for battleship. Ah, now it's finding it. Let me run ./search again and search for boot. Ah, okay, that's found. Let me go ahead and search for top hat. That too is in there. Let me go ahead and search for something that's not there, like the number 50. Not, in fact, found. So I think we've actually fixed that there problem.
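Here is a sketch of the corrected search, written against the standard library only. It assumes `const char *` in place of CS50's `string` (a `typedef` for `char *`) and a hypothetical helper, `search_strings`, that returns the matching index or -1 instead of printing.

```c
#include <string.h>

// Linear search over an array of strings, comparing with strcmp.
// strcmp returns 0 exactly when its two arguments hold the same characters.
int search_strings(const char *strings[], int length, const char *s)
{
    for (int i = 0; i < length; i++)
    {
        if (strcmp(strings[i], s) == 0)
        {
            return i;
        }
    }
    return -1;
}
```

Called on the six Monopoly pieces, a search for "battleship" would return 0 and a search for "50" would return -1.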
But if we go back to this code for a moment, it's indeed the case, per the documentation, that == 0 is what I want. Why in the world would strcmp be designed to return a positive or a negative number too? It's not returning true or false. It's returning one of three possible kinds of values: zero, negative, or positive. Why might that be useful? Yeah? >> You could kind of compare which of the strings is, like, greater. >> Yeah, super clever. So if you're passing in two strings, it's great to know if they're equal. But wouldn't it be nice if this same function could also help us sort these strings ultimately and tell me which one comes first alphabetically? And technically, it's not going to be alphabetically. It's going to be, cute phrase, ASCIIbetically, because it's going to look at the ASCII values of the characters, do some quick arithmetic, and tell you which one comes first and which one comes later, which is enough, as we'll eventually see, for sorting these strings as well. So in short, the documentation tells me that I should check not only for 0 if I care about equality, but, if I care about inequality, that is, checking whether one comes first or last, I should check whether the return value is less than 0 or greater than 0. But for this demonstration implementing linear search, I don't care about comparing them for inequality. All I care about is whether they are in fact the same or not. All right. Well, let's go ahead and do one other example of linear search, but let's make the problem more like that in week zero of searching a phone book. So let me go back to VS Code here, close search.c, and let's make an actual phone book. So I'm going to say code phonebook.c. And then inside of phonebook.c, let's use our same header files. So #include <cs50.h>, #include <stdio.h>, and let's include, in advance, string.h. Then, as before, let's do int main(void). No command-line arguments today.
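The sign of `strcmp`'s return value is exactly what a sorting routine needs. As a sketch, the C standard library's `qsort` accepts a comparison function with the same negative/zero/positive contract; the wrapper `compare_strings` and the helper `sort_strings` are hypothetical names for illustration. Note that only the sign of `strcmp`'s result is guaranteed, not its magnitude.

```c
#include <stdlib.h>
#include <string.h>

// qsort hands the comparator pointers to two array elements. Each element
// here is a const char *, so each argument points to a const char *.
static int compare_strings(const void *a, const void *b)
{
    const char *left = *(const char *const *) a;
    const char *right = *(const char *const *) b;
    // Negative if left sorts first, positive if right does, 0 if equal.
    return strcmp(left, right);
}

// Hypothetical helper: sort an array of strings in ASCIIbetical order.
void sort_strings(const char *strings[], size_t length)
{
    qsort(strings, length, sizeof strings[0], compare_strings);
}
```

Sorting {"top hat", "boot", "Thimble", "cannon"} this way puts "Thimble" first, because the ASCII code for uppercase T (84) is smaller than that for lowercase b (98): ASCIIbetical rather than alphabetical order.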
Then inside of here, let me give myself first an array of strings. How about some names in the phone book? So I'm going to say string names[] equals, and then three names, just to make a demonstration: Kelly and David and, say, John Harvard here. But if it's a phone book, I need more than just names. So let me go ahead and give myself another array: string numbers, open bracket, close bracket, equals. And now the same phone numbers we used in week zero for the three of us: +1-617-495-1000, the same for both Kelly and me. And then, as before, if you'd like to text or call John directly, you can do so at +1-949-468-2750, and semicolon. So, one question first. I obviously declared names to be an array of strings because that's what text is. Why have I also declared phone numbers to be strings and not integers? After all, a phone number literally has "number" in its name. Yeah? >> Yeah. So even though we have phone numbers in the US, social security numbers, and a bunch of other things that we call numbers, if there are non-digits in those values, you have to use strings. If it's not an actual integer but has things like pluses or dashes or parentheses or any other form of punctuation, as is common in the US and other countries for phone numbers in particular, you're going to want to use strings and not numbers,
as well as for corner cases: if you're not from, say, the US, and you're in the habit back home of dialing 0 first to make a local or regional call, you don't want a leading zero in an integer, because, as we know from grade school, leading zeros, zeros that come first, have no mathematical meaning. They're effectively going to disappear from the computer's memory unless we store them as characters in strings in this way. Okay. With that said, let's go ahead and ask the human now, after having declared those two arrays, for the name whose number they want to look up. So let's say string name = get_string, and let's go ahead and ask the human for the name for which to search. Then let's use a for loop as before: for int i = 0; i < 3, which again, for demonstration purposes, I'm just hard-coding today; i++. And then in the for loop, I'm going to use our new friend strcmp. If the return value of strcmp, passing in names[i] and the name the human typed in, == 0, signifying that they are in fact the same, well, that means we found the location i where the person's name is. So let's go ahead and print out "found". But just to be fun, let's print out what we found: "found %s\n", and then output there the number, which is going to be in the corresponding numbers array at that same location i, and return 0. And at the very end of this program, let's go ahead and print out "not found" if we get that far, and return 1. All right. So, a little more complexity this time, but notice I'm comparing the names just like a normal person would in your iOS app or your Android app when looking for someone's name. But what I care about is getting back the number. So that's why, two lines later, I'm printing out the number that I found at location i, not the name, because I already know the name. All right.
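Here is a minimal sketch of the parallel-array lookup as a function, assuming a hypothetical helper, `lookup`, that returns the matching number (or `NULL` if the name is absent) instead of printing and returning an exit status. The phone numbers are the ones from the demo, written with dashes, which is one more reason they must be strings.

```c
#include <stddef.h>
#include <string.h>

// Two "parallel" arrays: names[i] is supposed to line up with numbers[i].
// Returns the number for the given name, or NULL if the name is absent.
const char *lookup(const char *names[], const char *numbers[], int length, const char *name)
{
    for (int i = 0; i < length; i++)
    {
        if (strcmp(names[i], name) == 0)
        {
            // Reuse index i in the other array: correct only if the
            // two arrays were kept in the same order (the honor system).
            return numbers[i];
        }
    }
    return NULL;
}
```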
In my terminal window, let's go ahead and make phonebook, then ./phonebook. Let's go ahead and search for John, whose number is hopefully indeed exactly that number. So suffice it to say, this code, too, does work. This is a linear search because I'm searching left to right. These aren't sorted alphabetically by name, let alone by number. So I think we're doing well here, but I don't necessarily love this implementation. Even if you're new to programming, what might you not like about how I've implemented a phone book in the computer's memory? Why is this maybe not the best design? Yeah? >> Like, there's a correspondence between names and numbers, so having two different... >> Okay, yeah. So you're pointing out that we have this duality. We've got two arrays. They're the exact same length, and it just so happens that location zero's name lines up with location zero's number, and likewise for locations one and two. But we're kind of on the honor system here, whereby the onus is on us not to screw this up: to make sure we always have the same number of names and the same number of numbers and, moreover, that we get the order exactly right. We are just trusting that when we print out the i-th number, so to speak, it lines up with the i-th name. That's fine, and honestly, for three people, who really cares? But if you think about 30 people, 300, 3 million, well, we're not going to hard-code them all here, but even in some database that we'll store them in later in the course, just trusting that we're not going to screw this up feels like asking for trouble. And indeed, a lot of programming is just that: not trusting yourself, and definitely not trusting your colleague, not to mess something up, but programming a bit more defensively and trying to encapsulate related information more tightly together, rather than just assuming, on the honor system, that these two independent arrays will line up.
But at this point, we have no means of solving this problem unless we give ourselves a bit of new functionality and syntax. So I used this phrase earlier to kick things off: data structures. It's how you structure your data in the computer's memory. Arrays are the simplest of data structures. They just store data back to back to back, from left to right, contiguously in memory. But the values all have to be, as we've seen, of the same type: int, int, int or string, string, string. There's no mechanism yet for storing an int and a string together and then another int and another string together, let alone pairs of strings that somehow belong together. But it would be nice if C gave us an actual data type to store people in a phone book, such that we could create an array called people inside of which is going to be a whole bunch of persons, if you will, back to back to back, and I want two of them. So wouldn't it be nice if I could literally use this code in C? Well, decades ago, when C was invented, they didn't give us a person data type. All we have is int and float and char and bool and string and so forth. Person was not among the available data types. But it turns out we can invent our own data types. So in C, if we want persons to exist, and every person in the world shall have, for now, a name and a phone number, we can do this: string name; string number. Now, that's a decent start, but it's going to be kind of a stupid implementation if I then just do string name1, string name2, string name3, string name4. We already started down that road last week and decided arrays were a better solution. But here's an alternative when you want to store related data together. I can use these two keywords in C, typedef struct, which, albeit terse, just means define a new type that is a data structure.
So, multiple things together. Inside the curly braces, you literally put the two things you want to relate together, string name and string number, and then outside the curly braces you specify the name you want to give to this brand-new custom type that you have invented. Stylistically, you'll see that style50 prefers that the name be on the same line as the closing curly brace, which looks a little weird to me, but that's what industry tends to do, so so be it. But these several lines together tell C: invent for me a new data type called person, and assume that every person in the world has a string called name and a string called number. And now I can use this new data type in my own code to solve this problem a little bit better. So in fact, let me go ahead and do this as follows. I'm going to go back to VS Code here, and at the very top of my code, above main, just to make this available not only to main but to any future functions I write, I'm going to say typedef struct, as we saw on the screen. Inside of my curly braces, I'm going to say string name and string number, and then I'm going to name this thing person. Now I'm going to go about using this, and I'm going to delete my previous honor-system approach of having names and numbers in separate arrays. Instead, I'm going to give myself an array of people. We could call it persons, but I'm trying to be somewhat grammatically correct. So I'm going to say person people[3] to give myself an array called people, inside of which is room for three persons, inside each of which is room for a name and a number. So how do I now initialize these values? I'm going to hard-code them, that is, type them manually, but you can imagine using get_string or some other function to get this data from the human themselves. I'm going to say: go to the people array at location zero and access the name field. This is syntax we haven't seen yet, but it's not that hard.
You literally use a dot, a single period, to say go inside of that structure and access the name field, the name attribute, so to speak. And let's set that equal to Kelly. Then let's go into that same array location, people[0], and set the number for the zeroth person to be +1-617-495-1000. Then let's do the same thing for people[1]. Set that person's name to, for instance, mine, David. Then let's do people[1].number equals quote unquote the same as Kelly's, because we're both in the directory, so +1-617-495-1000. And then lastly, people[2].name equals quote unquote John, for John Harvard, and people[2].number equals +1-949-468-2750 in this case. And now the rest of the code is almost the same. I'm now, on the new line 24, still going to ask the user what name they want. I'm still going to iterate from 0 up to 3, because there are still three elements in this array, even though each has two values within. And I'm going to compare now not names[i] but people[i].name, to access the name of the i-th person, and compare it to the name that the human has typed in. And when I find that person, I'm going to go into the people array at location i but print out the number instead. So all we've done here is add this dot notation, which allows you to access the inside of a data structure, and introduce up here some new C keywords that let you invent your own data types, inside of which you can put most anything you want. I have chosen a string name and a string number. All right, let me go ahead and open my terminal window and clear it from before. Let me do make phonebook to make this version. So far, so good. ./phonebook, Enter. I'm going to go ahead now and search for, say, John. And I have again found his number. So this is still correct.
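Here is the same lookup rewritten around a struct, as a sketch. It assumes `const char *` fields in place of CS50's `string`, and a hypothetical `find_number` helper; the `typedef struct { ... } person;` shape, with the type name on the closing-brace line, follows the lecture.

```c
#include <stddef.h>
#include <string.h>

// A new data type: every person has a name and a number, kept together.
typedef struct
{
    const char *name;
    const char *number;
} person;

// Linear search over an array of persons; returns the number or NULL.
const char *find_number(const person people[], int length, const char *name)
{
    for (int i = 0; i < length; i++)
    {
        // Dot notation reaches inside the i-th person.
        if (strcmp(people[i].name, name) == 0)
        {
            return people[i].number;
        }
    }
    return NULL;
}
```

Because each name now travels with its own number, there is no second array to keep in sync.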
But even though this took more minutes in terms of the voiceover and took more lines of code, it's arguably better designed now, because at people[0] is an actual person and everything about them, at people[1] is another person and everything about them, and so forth. This is what we mean by encapsulate. You can think of these curly braces as sort of hugging these data types together inside of the data structure so as to keep them together in the computer's memory as well. All right. Well, just to set the stage, literally, as we strike the lockers and put something else up: the efficiency of binary search as implemented by Caitlyn was predicated on Kelly having sorted the values in advance. But of course, we've only considered the running time of searching for information using two algorithms, and there can be many others in the real world, but those are two of the most canonical. We found that binary search was faster than linear search, but it required that we sort the data. So, to your question earlier, maybe we should consider just how expensive it is, in terms of time, money, space, and humans, to sort data, especially a lot of data, and then decide whether or not it's worth using something like binary search, or perhaps even something else. So the next problem we'll solve today, in terms of a generic input and output, is this: the input is going to be unsorted data, like numbers out of order, and the output should be sorted data. So for instance, if we pass in 7 2 5 4 1 6 0 3, I want whatever black box is implementing my sorting algorithm to spit out 0 1 2 3 4 5 6 7. So that's the question we'll answer. But first, I think it's time for some delightful Hello Panda chocolate biscuits. Let's take a 10-minute break, and snacks are now served. All right, we are back. And recall that the cliffhanger on which we left was: how do we go about sorting numbers?
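The output half of that problem statement can be checked mechanically. As a sketch, a hypothetical `is_sorted` helper confirms that each element is no larger than its right-hand neighbor; it verifies a result rather than producing one.

```c
#include <stdbool.h>

// Hypothetical checker: true if numbers[0..length-1] is in
// non-decreasing order, i.e., no element exceeds its right neighbor.
bool is_sorted(const int numbers[], int length)
{
    for (int i = 0; i + 1 < length; i++)
    {
        if (numbers[i] > numbers[i + 1])
        {
            return false;
        }
    }
    return true;
}
```

For the example above, `is_sorted` is false for 7 2 5 4 1 6 0 3 and true for 0 1 2 3 4 5 6 7.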
Well, here are some numbers, eight of them in fact, from 0 to 7, but currently unsorted. We don't quite have enough Monopoly boards for everyone, but we do have some delightful Super Mario Bros. Pez dispensers. Could I get eight volunteers for this final demo up here? Oh, not a lot of hands. Okay. All right. One, two, three, four, five, six, and let's go farther back: seven and eight. All right, come on up. Hopefully I counted properly. Come on over. Upon arrival at the stage, go ahead and grab your favorite illuminated number and stand in that same order at the front of the stage, if you all could. Welcome to the stage. All right, grab your favorite number. Stand in that same order. All right, good. And one, two, three, four, five, six... I definitely said one through eight. Who is number eight, then? Okay, we need an eight. Come on down. All right, well, technically we need a four, but come on down. Yeah. All right, grab the four, and let me start from this end first, if you want to give a quick hello and a little something about you. >> Hi, my name is Cameron. I'm a first-year and I want to study mechanical engineering. >> Welcome. >> Hi, I'm Charlotte. I'm also a first-year and I'm in Canaday F. >> Welcome. >> Hi, I'm Ella. I'm also a first-year and I'm in the... >> Hi, I'm Precious. I'm also a first-year. I'm there. >> Hi, I'm Michael. I'm just an Eventbrite guest. >> Yeah. >> Hi, I'm Marie. I'm a first-year and I'm in Canaday. >> Welcome. >> Hi, I'm Rick. I'm a first-year and I'm in Holworthy. >> Welcome. >> Nice. >> I'm Jaden. I'm a first-year in Holworthy, and I really like free stuff. >> Okay. Well, let's see, then, if we can't award all these Super Mario Bros. Pez dispensers. First notice, of course, that all eight of our volunteers are completely out of order. But in an ideal world, we would have the smallest number over here. Go over there. Number zero. Wait a minute. Seven, let's go over here. Two. Okay. Five. Okay.
Make yourselves look like that. No Pez? It's okay. All right. So: 7 2 5 4 1 6 0 3. Okay, we won't do the introductions again, but now we have a list of numbers completely out of order. And wouldn't it be nice if zero were eventually over here, seven were all the way over there, and everything else was sorted from smallest to largest? Well, if you all could go ahead and sort yourselves from smallest to largest. Go. All right. And Jaden, what was your algorithm for doing that? >> I know that I have the least number, because I don't think anybody has a number less than zero, so I put myself at the bottom of the line. >> Okay. And I assume, Precious, what was your algorithm? >> I knew I had the largest number, so I just had to be at the end of the... >> Okay, fair. So you two got the easy ones. Number four, how about you? >> I knew three was before me and five was after me. >> Nice. So number four didn't actually have to move, coincidentally. But as for five and three and two and one and six, they probably had to take into account some additional information: who's to their left, who's to their right. And it just kind of worked. But it didn't look very algorithmic, if you will. It looked very organic, and obviously correct, but I'm not sure that same approach would work well if we had not 8 but 80 or 800 or 8,000 pieces of data. So let's see if we can't formalize this a little bit. Let me take the mic, and if you could all reset yourselves to those same original positions, from seven on the left to three on the right. Let me propose a couple of algorithms, canonical ones if you will, and see if we can't formalize step by step what to do. So the first thing I'm going to do, given all of these numbers, is just try to select the smallest number. Why? To Jaden's point earlier, I just want to put the smallest number over here. At least that's a problem I can solve. It's very well defined. It's a nice bite out of the problem. So, seven. Okay, smallest so far.
Two, that's smaller. So I'm going to remember that two is now the smallest number I've seen. Not five, not four. One is even smaller, so I'm going to remember one. Not six. Zero, that's pretty good, but I'm going to check the whole list. Maybe there's a negative one or something like that. But no, three. So I'm going to remember that zero was the smallest element I found. Let's select Jaden and put Jaden over here. But before Precious or anyone else moves: we don't really have room for you. Precious is in the way, because if this is an array of eight integers, well, we can't just make room over here, because, if you think back to last week, we might have some garbage values there, or something else is going on. We don't want to change data that doesn't belong to us. So what to do with Precious? Well, Precious, maybe you can go over there. You just take Jaden's spot, and we'll swap these two values accordingly. Now, though, Jaden is in the right spot, which is good, because now I can move on to the second problem. What's the next smallest element, presumably greater than zero? Well, at the moment, two is the next smallest element. Not five, not four. Ooh, one is the next smallest element. I'm going to remember that. Not six, not seven, not three. Okay, so number one, if you could go to the right location; but I'm afraid we're going to have to evict number two to make room. All right, let's do this again. Zero and one are in good shape, so now I think I can ignore them as complete. Five is the current smallest. Nope, four now is. Nope, two now is. Six, no. Seven, no. Three, no. Okay, so two is the next smallest. So let's swap two and five. And now I've solved three out of the eight problems. Let's do this again. Four is at the moment the smallest. Not five, not six, not... oh, three is now the smallest. So let's swap four and three, which unfortunately is making the four problem a little worse.
Like, he belongs there, it would seem, but I think we can fix that later. So now half of the list is sorted. Five is the next smallest. Six and seven. Ah, four. Now we've got to fix the four. So four goes back there. Now I've messed up the five, but we'll come back to that. All right. Six. Seven. Okay. Five, let's put you where six is. And now one more mistake to fix. So, seven. Okay, six and seven need to swap. And now I've solved eight problems in the aggregate, so it's complete. Now, to be fair, my approach is clearly way slower than your approach, but you all were working in parallel, whereas I was doing it more methodically, step by step. And I dare say my algorithm is probably going to be more translatable to code. And indeed, what I just acted out is what the world would call selection sort, whereby on each iteration, each pass in front of the humans, I was selecting the smallest element I could find. All right. How else could I do this, though? Let's do something that's maybe a little more organic, like your approach, where you were comparing yourselves with whoever was next to you. Go ahead and reset yourselves one final time to this arrangement: seven on the left, three on the right. And let me propose, again, to walk through the list again and again, but let me focus more narrowly on the problem right in front of me, because I felt like I was taking a lot of steps back and forth, back and forth. Maybe we can chip away at some of that wasted time. Let's compare seven and two. They're obviously out of order, so let's just immediately swap you two, if we could. All right. Now seven and five, clearly out of order. Let's swap these two. Seven and four, out of order. Let's swap these two. Seven and one, out of order. Let's swap these two. Seven and six, out of order. Let's swap these two. Seven and zero, out of order. Swap these two. Seven and three, out of order. Swap these two. So, a lot of work for Precious there, but I've now indeed solved one of the eight problems.
Moreover, I don't need to keep addressing the seven problem, because notice that Precious has essentially bubbled her way up to the end of the list. And indeed, that's going to be the operative term here. Another algorithm that computer scientists everywhere know is called bubble sort, whereby the goal is to get the biggest elements to bubble their way up to the top, or the end, of the list one at a time. Now, am I done? Well, no, clearly not. There's still stuff out of order, except for Precious. Indeed, I have solved only one of these eight problems. So fine, I'll go back and just try this same logic again. Two and five, good. Five and four, nope, swap those. Five and one, nope, swap those. Five and six are good. Six and zero, nope, swap those. Six and three, nope, swap those. And I already know that Precious is where she needs to be, so I think I'm done with the second of eight problems. And I'll do this a little faster now. Two and four, good. Four and one, swap. Four and five are good. Five and zero, swap. Five and three, swap. And now we've solved three problems. Let me reset. Two and one, swap. Two and four are good. Four and zero, swap. Four and three, swap. And now I've solved half of the problems, four out of eight. We're almost done. One and two are good. Two and zero, swap. Two and three are good. Okay, and now we're done with five out of the eight problems. One and zero, swap. One and two are good. Those are all good. And let me just do a final sanity check: everything now is sorted. So now I'm done solving all eight of those problems. So, you all were wonderful. We need the numbers back, but Kelly has some delightful Pez dispensers for you on the way out, if you want to head that way. Just leave the numbers on the shelves. And a round of applause for our eight volunteers for helping to act this out. Thank you. So let's see if we can't formalize what these volunteers kindly just did with us, starting with the first of those algorithms.
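Bubble sort as acted out above can be sketched in C as follows. This version, with the hypothetical name `bubble_sort`, assumes an int array and shrinks each pass by one position, since, as with Precious, the largest remaining value has already bubbled to the end.

```c
// Bubble sort: walk the list comparing neighbors, swapping any pair
// that is out of order. After each pass, the largest remaining value
// sits at position `end`, so the next pass can stop one spot earlier.
void bubble_sort(int numbers[], int length)
{
    for (int end = length - 1; end > 0; end--)
    {
        for (int i = 0; i < end; i++)
        {
            if (numbers[i] > numbers[i + 1])
            {
                int temp = numbers[i];
                numbers[i] = numbers[i + 1];
                numbers[i + 1] = temp;
            }
        }
    }
}
```

Run on 7 2 5 4 1 6 0 3, the first pass leaves 2 5 4 1 6 0 3 7, matching the first walk-through above.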
Thank you. Namely, selection sort. Let's see if we can't slap some pseudocode on this, thinking of our humans now, more generically, as an array. So we had the first person at location 0 and the last person at location n minus 1. And just for clarity, so that you've seen the symbology, this is going to be location n minus 2, this is location n minus 3, and so forth, until, sort of dot dot dot, you hit the other end that we've already written out. So that's just how we would refer to all of our eight volunteers' locations, or in this case seven written-out locations, with the dot dot dot in the middle connoting that this can be a much, much larger array. So here's some pseudocode for the first algorithm, selection sort: for i from 0 to n minus 1, so from the first element to the last element, find the smallest number between numbers[i] and numbers[n - 1]. In other words, if you're starting i at 0, look at every lighted number between location 0 and location n minus 1. When you have found that smallest element, swap it with the number at location i, which starts, again, at 0. That's how we got, I think, Jaden into place at the very beginning. Then i, by nature of how for loops work, gets updated from 0 to 1, so that we do the same thing: find the smallest number between numbers[1] and numbers[n - 1], so the second element through the eighth element. n is the total number of values, so the end point there is not changing. Once we've found the second-smallest person, we swap them with location i, aka 1. And that's how we got the number one into position, and then the number two, and then the number three, and number four. So this, then, was selection sort in pseudocode form. And that allowed us to go through this list again and again and again in order to find the next smallest element.
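A C translation of that pseudocode might look like this sketch (the function name `selection_sort` is our own choice). The outer loop stops at n - 2 rather than n - 1 because, by the time only the last element remains, it is necessarily the largest, so that final pass would have nothing to compare.

```c
// Selection sort, following the pseudocode: for each i, find the smallest
// value among numbers[i] through numbers[n - 1] and swap it into position i.
void selection_sort(int numbers[], int n)
{
    for (int i = 0; i < n - 1; i++)
    {
        // A single "variable in my brain": where the smallest value so far is.
        int smallest = i;
        for (int j = i + 1; j < n; j++)
        {
            if (numbers[j] < numbers[smallest])
            {
                smallest = j;
            }
        }
        // Swap, evicting whoever was at i into the slot just freed up.
        int temp = numbers[i];
        numbers[i] = numbers[smallest];
        numbers[smallest] = temp;
    }
}
```

On 7 2 5 4 1 6 0 3, the first swap exchanges 7 and 0, just as Jaden and Precious traded places.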
So what was happening, a little more methodically, if it helps just to map that symbology of the bracket notation and the i's? If this is where we started, with location i, we considered everything up to location n minus 1. Essentially, I traversed this whole list from left to right, literally walking in front of our volunteers, looking at each element, and the first element I saw was seven. At the moment, that was the smallest element I had found. And who knows, in a different list maybe seven would be the smallest element. So I kind of stored it in a variable in my mind. But then I checked two and thought: no, two is clearly less than that, so now I'm going to remember two. Okay, now I'm going to remember one when I find it. Then I'm going to remember zero when I find it. And then what I did, once I found Jaden, with the value of zero lit up, was move Jaden from that location to here, and then evict Precious, recall, and move Precious over to the location that we had freed up. Why all this sort of back and forth? Well, you have to assume with an array that you're not entitled to the memory over here, nor to the memory over there, if you've already decided that you have seven lockers or eight people. You have to commit to the computer in advance. That's why we typically put the number in the square brackets, or the compiler infers from the curly braces how big the array is. All right. And suffice it to say, when I went through this again and again and again, I did the same thing over and over. Now, you might have thought me sort of dumb for asking the same questions again and again, as though I were surprised to discover the number one, surprised to discover the number two, even though on my very first pass I literally looked at all eight of those numbers. But you have to think about what memory I'm actually using. Now, I certainly could have memorized all of the numbers and where they are.
But I propose that I was very simply using a single variable in my brain just to keep track of the then-smallest element. And once I was done finding that and solving that problem, I moved on to do it again and again. But that's going to be a trade-off, and this is going to be thematic in the coming weeks: sure, I could have used more memory and been smarter about it, and maybe that would have improved or hurt the running time of the algorithm. There's often going to be a trade-off between how much memory and how much time you actually use. So we'll discover that over time. So how fast or slow is selection sort? Well, consider that when I had eight humans on stage, I first went through all n of them. But how many comparisons did I make? Really, I was doing n minus 1 comparisons, because if I've got n people, I've got to compare the smallest number I've found against everyone else, and comparing n people left to right takes n minus 1 comparisons total. So on the first pass I was asking n minus 1 questions. Is this the smallest? Is this the smallest? Is this the smallest? n minus 1 times. Once I solved one problem, when we got Jaden into Jaden's right place, then I had one fewer problem. Then one fewer problem still, and so forth. So it was like n minus 1 steps, plus n minus 2 steps, plus n minus 3 steps, plus dot dot dot, plus one final step once I got to the last of the eight problems. Now, if you remember the cheat sheet at the back of your math books growing up, you'll note that this series here can be more simply written as n times (n minus 1), all divided by 2. And if you've not seen that before, just take on faith that this is identical to the series up here. So now we can multiply this out: that's technically n squared minus n, all divided by 2, which is n squared over 2 minus n over 2. But we're getting too into the weeds.
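One way to check the arithmetic is to instrument selection sort with a counter. The sketch below is a hypothetical variant, not code from the lecture: it sorts exactly as the pseudocode describes but also tallies every "is this the smallest?" question. Because the loop bounds depend only on i and n, the tally is the same for any input of size n, namely (n - 1) + (n - 2) + ... + 1 = n(n - 1)/2.

```c
// Hypothetical instrumented selection sort: sorts numbers[0..n-1]
// and returns how many comparisons it made along the way.
long selection_sort_counting(int numbers[], int n)
{
    long comparisons = 0;
    for (int i = 0; i < n; i++)
    {
        int smallest = i;
        for (int j = i + 1; j < n; j++)
        {
            // One question per remaining element: is this the smallest?
            comparisons++;
            if (numbers[j] < numbers[smallest])
            {
                smallest = j;
            }
        }
        int tmp = numbers[i];
        numbers[i] = numbers[smallest];
        numbers[smallest] = tmp;
    }
    return comparisons;
}
```

With eight numbers, this reports 7 + 6 + 5 + 4 + 3 + 2 + 1 = 28 comparisons, which matches 8 × 7 / 2, whether the input starts shuffled or already sorted.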
Let's whip out our big O notation now, whereby we can wave our hands at the lower-order terms and only care about the biggest, most dominant term. Mathematically, in this expression, if you plug in a really big value of n, which is going to matter more, the n squared over 2 or the n over 2? The n squared. The others absolutely contribute to the total value, but if you plug in a really big value, the dominant force is going to be this n squared, because that's really going to blow up the total value. So we can say that selection sort, when analyzed in this way, is on the order of n squared steps, because I'm doing so many comparisons so many times. If that's the case, the question then is what is not just its upper bound but maybe its lower bound, as we'll eventually see. So for now, let's stipulate that selection sort is indeed in big O of n squared. And that's actually the worst of the algorithms we've seen. That's way slower than linear search, because at least linear search was big O of n. Selection sort is n squared, which of course is n times n, which is, and will feel, much, much slower than that. So what if we consider the lower bound of selection sort? All right, maybe it's bad in the worst case, but maybe it's really good when the numbers are mostly sorted. Unfortunately, this is the same pseudocode for selection sort, and we make no allowance for checking whether the list is already sorted. In fact, that's kind of a perverse case to consider for any algorithm: what if the problem's already solved? How's your algorithm going to perform? Suppose my volunteers, as they kind of almost did accidentally, had started lining up in order, literally from 0 to 7. Well, my stupid algorithm would still have me walking back and forth, back and forth, back and forth. Why? Because the code literally tells me to do this this many times, and every time I do it, to find the smallest element.
So it's going to be sort of a stupid outcome, because the list is not going to be changed at all, but my code does not take into account in any way the original order of the numbers. So this is to say that, whether we consider the lockers or the humans, the omega notation for this algorithm, even in the best case where the data is already sorted, is crazily also n squared. Now, I could certainly change the pseudocode, but selection sort as the world knows it is more of a demonstrative algorithm, or sort of a quick and dirty one. Its running time is going to be in omega of n squared. And now we can actually deploy our theta notation: because the big O notation is n squared and the omega notation is n squared, one and the same, we can also say that selection sort is in theta of n squared, which is not great, because that's annoyingly slow. So maybe the solution here is: don't do that. Let's use bubble sort instead, the second algorithm, where I just compared everyone side by side again and again. Well, here's some pseudocode for bubble sort, which you can assume applies to the same kind of array, from zero on up to n minus 1. Here's one way to write bubble sort. Repeat the following n times: for i from 0 to n minus 2, if the number at location i and the number at location i plus 1 are out of order, swap them. And there's kind of an elegance to this algorithm in that, like, that's it. When you go through the list with i from 0 to n minus 2, this is how I was effectively comparing elements 0 and 1, 1 and 2, 2 and 3, 3 and 4, dot dot dot, 6 and 7. But notice I didn't say eight. There were eight total people. Why do we go from 0 to n minus 2 instead of from 0 to n minus 1? Yeah? >> We already checked the last one. >> Not quite. It's not that we've already checked the last one. I'm saying that with this line of code here, we never even go to n minus 1.
Technically. >> If we had n minus 1, it would compare against n, because that's... >> Exactly, because we're doing this simple arithmetic here: we're checking the current location i and i plus 1. You can think of these as my left and right hands. Left hand is pointing at zero; right hand's pointing at one. I don't want to do something stupid and have my left hand point at n minus 1, because then my right hand, arithmetically, when you add one, is going to point at n, which does not exist. That's beyond the boundary of the array, because the array goes from zero to n minus 1. So it's just a little bit of a safety check there to make sure we don't walk right off the end of the array. But we do this n times because, recall, Precious ended up where seven needed to be, at the very end of the list, but that didn't mean there weren't seven more problems still to solve, 0 through 6. So I did it again, and I did it again, and, per its name, bubble sort bubbled the biggest element up first, then the next biggest, then the next biggest, then the next biggest, that is, seven, then six, then five, then four, and we got lucky on some of them, but eventually we finished with zero. So how do we analyze this thing? Well, as an aside, if you're thinking that I'm wasting some time, we could also technically do this n minus 1 times, because once we've solved seven problems, you get the eighth one for free, since that person is obviously where they need to go. So when we had these numbers initially and were comparing them with bubble sort, again, left hand, right hand, treat this as i and this as i plus 1, and we just kept swapping pairwise numbers if in fact they were out of order. So all this is saying is what our humans were doing for us organically. So how do we actually analyze the running time of this? Last time I just kind of spitballed that it was n minus 1 steps plus n minus 2 steps.
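Here is a minimal C sketch of that bubble sort, using the n minus 1 repetitions mentioned as an aside (the last element comes for free). The function name is illustrative. The inner condition i < n - 1 means i stops at n - 2, so the "right hand" numbers[i + 1] never points past numbers[n - 1].

```c
// Bubble sort: repeat n - 1 times; on each pass, compare neighbors
// numbers[i] and numbers[i + 1] for i from 0 to n - 2, swapping if out of order.
void bubble_sort(int numbers[], int n)
{
    for (int pass = 0; pass < n - 1; pass++)
    {
        // i < n - 1 keeps i + 1 inside the array's bounds
        for (int i = 0; i < n - 1; i++)
        {
            if (numbers[i] > numbers[i + 1])
            {
                int tmp = numbers[i];
                numbers[i] = numbers[i + 1];
                numbers[i + 1] = tmp;
            }
        }
    }
}
```

After the first pass, the biggest value has bubbled up to the last location; after the second, the next biggest sits just before it, and so on.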
Well, you can actually look at pseudocode sometimes and, if it's neatly written, infer from the pseudocode how many steps each line is going to take. For instance, how many steps does this first line take? Literally n minus 1. The answer is right there, because it's saying to the computer, or to me acting it out, repeat the following n minus 1 times. All right, so that's helpful. How many steps does this inner loop induce? Well, you're going from 0 to n minus 2, so that's actually n minus 1 total steps, not n. And then this question here, if numbers bracket i and numbers bracket i plus 1 are out of order, is a single question. It's like our Boolean expressions, so we'll call it one step. Maybe you need to do a bit more work than that, but it's a constant number of steps. It doesn't matter how big the list is; comparing two numbers is always going to take the same amount of time. And then swapping them, oh, I don't know, is going to take like one or two or three steps, but a constant number. It doesn't matter what the numbers are; it takes the same amount of work. So let me rewind and stipulate that the things that really matter are the loops. These constant numbers of steps, who really cares? The loops are what are going to add up as n gets large. So if this is the outer loop and this is the inner loop, think about our two-dimensional Mario square from week one: we did something on the outside and then something on the inside to get our rows and columns. This is equivalent to n minus 1 times n minus 1. If we do our little FOIL method, that's n squared minus n minus n plus 1; combine like terms, and it's n squared minus 2n plus 1. Who cares? This is ultimately going to be in big O of n squared, only because, again, if you ask yourself which of these terms is really going to contribute most to the answer when you plug in a really big value for n, it's obviously going to be n squared, and we can ignore the lower-order terms.
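Written out as a worked equation, the loop-counting argument above multiplies the outer loop's n minus 1 passes by the inner loop's n minus 1 comparisons, treating each compare-and-maybe-swap as a constant amount of work:

```latex
\underbrace{(n-1)}_{\text{outer loop}} \times \underbrace{(n-1)}_{\text{inner loop}}
  = n^2 - n - n + 1
  = n^2 - 2n + 1
  \in O(n^2)
```

The -2n and +1 are the lower-order terms that big O lets us drop.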
So this doesn't seem to have made any progress. Selection sort was on the order of n squared, and bubble sort, based on this analysis, is also on the order of n squared. Maybe we'll get lucky with the lower bound. So on the upper bound, bubble sort is indeed in big O of n squared, as was selection sort. But with this pseudocode for bubble sort, we unfortunately were not doing anything clever to catch that perverse case where maybe the list was already sorted. After all, consider if the list was sorted from 0 to 7: I was still asking all the same darn questions. Even if I did no work, I was going to repeat that n minus 1 times, back and forth, making no swaps but making all of those comparisons. But here's an enhancement to bubble sort that selection sort didn't really have room for. I can say: after one pass of this inner loop, walking from left to right, if I made no swaps, quit. Put another way, if I traverse the list from left to right and make no swaps, I might as well terminate the algorithm then and there, because there's clearly no more work to be done. All right. So based on that modification, the lower bound of bubble sort's running time would be said to be in omega of n, because I'm minimally going to need to make one pass through the list. You can't possibly claim that the list is sorted unless you actually check it once, and if there are n elements, you're going to have to look at all n of them to make sure they're in order. But after that, if you've made no swaps, there's no reason to traverse the list again and again and again. So bubble sort can be said to be in omega of n, because we can terminate after that single pass if we've done no work. We can't say anything about theta, because big O and omega are not one and the same here. But that does seem to have given us some savings. Unfortunately, it really only saves us time when the list is already, or mostly, sorted.
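The "if no swaps, quit" enhancement is a one-flag change. This sketch is an illustrative variant, not the lecture's own code: it returns the number of passes it made, purely so the best and worst cases are visible. On an already sorted list it makes exactly one pass (n minus 1 comparisons, hence omega of n); on a reversed list it still needs all n minus 1 passes.

```c
#include <stdbool.h>

// Bubble sort that stops after a full pass with no swaps.
// Returns how many passes it made (for illustration only).
int bubble_sort_early_exit(int numbers[], int n)
{
    int passes = 0;
    for (int pass = 0; pass < n - 1; pass++)
    {
        bool swapped = false;
        passes++;
        for (int i = 0; i < n - 1; i++)
        {
            if (numbers[i] > numbers[i + 1])
            {
                int tmp = numbers[i];
                numbers[i] = numbers[i + 1];
                numbers[i + 1] = tmp;
                swapped = true;
            }
        }

        // A left-to-right pass with no swaps means the list is sorted
        if (!swapped)
        {
            break;
        }
    }
    return passes;
}
```

For the eight sorted numbers 0 through 7 this returns 1; for 7 down to 0 it returns 7.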
But in the average case and in the worst case, odds are both algorithms are going to perform just as badly, on the order of n squared. In fact, let's take a look at a visualization that'll make this a little clearer than our own humans and voices might have. Here is a bunch of vertical purple bars, made by a friend of ours in the real world, and this is an animation with a bunch of buttons that let us execute certain algorithms. A small bar represents a small number; a big bar represents a big number. And the goal is to order them from small numbers, or small bars, to big numbers, or big bars, left to right. So I'm going to go ahead and click on selection sort initially. And what you'll see from left to right is, in pink, the current smallest element that's been discovered, but also, in pink, the equivalent of my walking across the stage left to right again and again and again, trying to find the next smallest element. And you'll see clearly that, just like when we put Jaden at the far left, the smallest element ended up over here. But it might take some time for Precious, for instance, or number seven, to end up all the way over on the right, because with each pass we're really just fixing one problem at a time, and there are n problems total, which is giving us on the order of those n squared steps. The unsorted part of the list is getting shorter, so at least we don't have to keep touching the elements we've already sorted, just as I didn't on stage. So now selection sort is complete. Let's instead visualize bubble sort. Let me re-randomize the array just so we're starting with a random order. Now let's click on bubble sort. And you'll see the pink bars work a little differently: they connote which two numbers are being compared at that moment in time, just like my left hand and right hand going left to right.
And you'll see that even though it's not quite as pretty as selection sort, where I was at least getting the smallest elements all the way to the left, here we're just fixing pairwise problems, but the biggest elements, like Precious's number seven, are indeed bubbling their way up to the top, one after the other. But as you can see, and this is where n squared is sort of visualizable, we're touching or looking at these elements so many times, again and again. We are making so many darn comparisons. This is taking frustratingly long, and this is only, what, a few dozen bars or numbers? You can imagine how long this might take with hundreds, thousands, or millions of values. I dare say we're going to have to do better than bubble sort and selection sort, because we're not even done yet. I'm just letting it run to give you the satisfaction of getting to the end, and now we are. But neither of those algorithms seems incredibly performant, because it's still taking us quite a bit of time to actually get to that there solution. So how can we actually do better than that? Well, we can try taking a fundamentally different approach, one that you might have encountered in math or even in the real world, even if you haven't applied this name to it. Recursion is a technique in mathematics and in programming that allows you to take a fundamentally different approach to a problem. In short, a recursive function is one that's defined in terms of itself. So if you had, like, f of x equals f of something on the right-hand side of a mathematical expression, that would be recursive, in that the function is dependent on itself. More practically, in the world of programming, a recursive function is a function that calls itself. So if you are writing some function in C, and in that function you actually have a line of code that calls that same function by the same name, that function is recursive.
Now, this might feel a little weird, because if a function is calling itself, it feels like this is the easiest way to get into an infinite loop. Why would it ever stop if the function is calling itself, calling itself, calling itself, calling itself? We're going to have to actually address that kind of problem. But in this class already, we've implicitly seen an example of this, today as well as in week zero. So here is that algorithm for searching the lockers' doors. Recall that we did this check at the very top: if there are no doors left, return false. If not, we did these conditions. We said if the number is behind the middle door, return true, because we found it. But things got interesting here, where I said else if the number is less than the middle door, then search the left half; else if the number is greater than the middle door, then search the right half. Well, at that point in time, you should be asking me, or yourself, how do I search the left half? How do I search the right half? Well, here you go. On the screen right now is a search algorithm. And even though it says down here search the left half or search the right half, which begs the question of how to do that, we'll just use the same algorithm again. And this is how, in terms of my voiceover, you end up searching the left half of the left half, or the right half of the left half, or any such combination. This line here, search left half, and this line here, search right half, are representative of recursive calls. This is an algorithm, or a function, that calls itself. But why does it not induce an infinite loop? Why is it important that this line and this line are written exactly as they are, so as to avoid this thing just searching aimlessly forever? Yeah? >> There's the condition at which it stops. >> We do have this condition at which it stops.
But more importantly, what is happening before I make these recursive calls? >> Exactly. I'm recursing, that is, calling myself, but I'm handing myself a smaller problem, and a smaller problem, and a smaller problem. It would be bad if I just handed myself the exact same number of doors and kept saying, "Search these, search these, search these," because you would never make any progress. But just like with our volunteers earlier, so long as we divide and conquer and search smaller and smaller numbers of doors, eventually we're going to bottom out and either find the number we're looking for or not. So generally, we're going to call these kinds of conditions, the ones that ask a very obvious question and want an immediate answer, base cases. Base cases are generally conditionals that ask a question to which the answer is going to be yes or no right then and there. A recursive case, by contrast, like these two down here, is when you actually need to do a bit more work to get to your final answer: you call yourself, but with a smaller version of the problem. So we could in fact have written week zero's algorithm similarly. If you go back in your mind to week zero, we had more of a procedural approach, so to speak. When we were searching the phone book, I proposed that lines 8 and 11 induced what we called loops, because they literally said go back to line 3. And that was more of a mechanical way of inducing a loop structure. But if I really wanted to be elegant, I could have said, well, you know what? Lines 7 and 8 together really just mean search the left half, and lines 10 and 11 together really just mean search the right half. So let's condense these pairs of lines into shorter instructions: search the left half of the book; search the right half of the book. I can then delete the two now-blank lines, and now I have a recursive algorithm for searching a phone book.
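The locker-search pseudocode, with its two base cases and two recursive cases, might translate into C roughly as below. This is a sketch under assumptions not in the lecture: the "doors" are a sorted int array, and the hypothetical parameters low and high (inclusive) mark which doors are still in play. Every recursive call passes a strictly smaller range, which is why it eventually hits "no doors left".

```c
#include <stdbool.h>

// Recursive binary search over numbers[low..high] (inclusive).
// Assumes numbers is sorted in increasing order.
bool search(const int numbers[], int low, int high, int target)
{
    // Base case: no doors left
    if (low > high)
    {
        return false;
    }

    int middle = low + (high - low) / 2;

    // Base case: the number is behind the middle door
    if (numbers[middle] == target)
    {
        return true;
    }

    // Recursive cases: search a half that excludes the middle door
    if (target < numbers[middle])
    {
        return search(numbers, low, middle - 1, target);  // left half
    }
    return search(numbers, middle + 1, high, target);     // right half
}
```

A call such as search(numbers, 0, n - 1, 50) searches all n doors; computing middle as low + (high - low) / 2 is a common habit that avoids overflow on very large indices.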
It's a little less obvious, because you have to ask yourself, when you get to line 7 or 9, wait a minute, how do I search the left half or the right half? And that's when you need to realize that you start the same algorithm again, but with a problem that's half as large. In week zero, we took the procedural approach, where we literally told you what line of code to go to, but today we're offering a different formulation, a recursive approach, where it's more implicit what you should do. And we'll see now a couple of examples from the real world, so to speak. So here's a screenshot from the original Super Mario Bros. on the Nintendo Entertainment System. Let me go ahead and get rid of some of the distractions, like the ground and the mountains there. And here we have a sort of half pyramid, not unlike the one you implemented in problem set one. But this is an interesting real-world physical structure, in that you can define it recursively. What is a pyramid of height four, if you will? Well, just to be a little difficult, a pyramid of height four is really just a pyramid of height three plus one more row. Okay, well, what is a pyramid of height three? A pyramid of height three is really just a pyramid of height two plus one more row. Well, what's a pyramid of height two? A pyramid of height two is really just a pyramid of height one plus one more row. Well, what's a pyramid of height one? A single brick on the screen. And I changed my tone with that last remark to convey that this could be our base case, whereby I just tell you what the thing is, without kicking the can and inviting you to think through what a smaller structure is plus one more row. Every other definition I gave you of a pyramid of some height, by contrast, was defined in terms of that same structure, albeit a smaller version thereof. So we can actually see this in the real world. Let me go ahead and pull up one thing here.
I'm going to go to... give me one sec before I flip over. Here I am on google.com. If you'd like a little computer science humor, if you ever Google search for recursion and hit Enter, you'll see a joke that computer scientists at Google find funny. Ha ha. One, two laughs. Does anyone see the joke? I did not make a typo, but Google's asking me, did I mean recursion? And if I click on that, I just get the same page again. Okay. All right. That didn't go over well. Anyhow, there are these Easter eggs in the wild everywhere, because computer scientists are the ones who implement these things. But let's go ahead and actually implement, for instance, a version of this in code. Let me go back over here to VS Code, and in my terminal window let me create one of two final programs. This one's going to be called iteration.c, just to make clear that this is the iterative, that is, loop-based, version of a program whose purpose in life is to print out a simple Mario pyramid. I'm going to go ahead and include cs50.h at the top as well as stdio.h. I'm not going to need string.h, and I don't need any command-line arguments today, so this is going to start off with int main(void). And now I'm going to go ahead and give myself a variable called height of type int and ask the human for the height of this Mario-like pyramid. And then let's assume for the moment that I've already implemented a function called draw, whose purpose in life is to draw a pyramid of that height, semicolon. So I've abstracted away for the moment the notion of drawing that pyramid. Now let's actually implement draw, whose purpose in life, again, is to print out a pyramid akin to the one we saw a moment ago, like this here on the screen. Well, in order to print out a pyramid of a given height, I think I need to say void draw(int n), for instance, because I'm not going to bother returning a value.
I just want this thing to print something on the screen, so void is the return type. But I do want to take as input an integer, like the height of the thing I want to print. I can call this argument or parameter anything I want; I'll call it n, for number. So how can I print out a pyramid that, again, looks like this? Well, I'll do this quicker than you might have in problem set one. It seems obvious that on the first row I want one brick, on the second row I want two, on the third I want three, and on the fourth I want four. So it's actually a little easier than problem set one, in that it's sloped in a different direction. So let me go ahead and do exactly this in code. Let me say for (int i = 0; i < n; i++), where n is the height. This is going to be, really, for each row of the pyramid. Let me go ahead now, in an inner loop, and write for (int j = 0; j < i + 1; j++), for reasons we'll see in a moment, and then inside of this loop let's just print out a single hash, no newline. But at the end of the row, let's print out a single newline to move the cursor to the next line. Now, why am I doing this? Well, this inner loop represents each column of the pyramid. And if you think about it, on the first row, which is row zero, I actually want to print not zero bricks but one brick. So that's why I want to go from zero to i + 1, because if i is zero, i + 1 is 1, so my inner loop is going to go from 0 to 1, which is going to give me one brick. It's a little annoying to think about the math, but this just makes sure that I'm getting the number of bricks I want on each row. Then it's going to give me two bricks, and then three, and then four, and between each of those rows, it's going to print a newline. So let's go ahead and do make iteration to compile this code. Ah, I messed up. Why do I have a mistake on line eight of this code? Let me hide my terminal and scroll back up. It seems Clang, my compiler, does not like my draw function. Yeah?
Yeah, I forgot the prototype. So this is the one and only time where it seems reasonable to copy-paste. Let's grab the prototype of that function up here and teach the compiler from the get-go what this function is going to look like, even though I'm not defining it until line 13 onward. All right, let's go ahead and make iteration again. Ah, ./iteration. Enter. Let's do a height of, say, four. And voila, now I've got that there pyramid. So I did it a little quickly, and it's certainly to be expected if it took you hours in problem set one to get the other type of pyramid printed. But the point for today is really to demonstrate how we can print a pyramid like this using what I'd call iteration. Iteration just means using loops to solve some problem. But we can alternatively use recursion, by reimplementing our draw function in a way that's defined in terms of itself. So let me go into my code here. I'm going to leave the prototype the same, and I'm going to leave main the same. But what I'm going to go ahead and do is delete all of this iterative code that's doing things very procedurally, step by step by step, with loops. And I'm instead going to do something like this. If I want to print a pyramid of height n, what did I say earlier? A pyramid of height n is really just a pyramid of height n minus 1 plus one more row. So how do I implement that idea in code? Well, if a pyramid of height n first requires drawing a pyramid of height n minus 1, I think I can just write draw(n - 1), which is kind of crazy to look at, because you're calling yourself in yourself, but let's see where this takes us. Once I have drawn a pyramid of height n minus 1, that is, of height three, for instance, what remains for me to do is to print one more row myself. And to print one more row, I think I can do that really easily with fewer loops.
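Here is a minimal sketch of iteration.c's draw. It makes one assumption that differs from the lecture, purely so the output can be checked without a terminal: instead of calling printf, this hypothetical version appends each "#" and "\n" to a caller-supplied char buffer with strcat, and main (with cs50.h's get_int) is left out. The prototype mirrors the fix for the compiler error.

```c
#include <string.h>

// Prototype up top, so code above the definition (such as main) can call draw
void draw(char *out, int n);

// Iterative version: appends a left-aligned pyramid of height n to out.
// Assumption: out already holds a valid string (e.g. "") and has room for
// n * (n + 1) / 2 bricks, n newlines, and the terminating '\0'.
void draw(char *out, int n)
{
    // For each row of the pyramid
    for (int i = 0; i < n; i++)
    {
        // For each column: row i gets i + 1 bricks
        for (int j = 0; j < i + 1; j++)
        {
            strcat(out, "#");
        }
        // End the row
        strcat(out, "\n");
    }
}
```

To see it, one could write char buf[64] = ""; draw(buf, 4); and then printf("%s", buf); with stdio.h included, which would show rows of one, two, three, and four hashes.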
I can do for (int i = 0; i < n; i++), and then, very simply, in this loop I can print out a single hash, one at a time. At the end of this loop I can print out a newline, but no more nesting of loops. What I've done here is print one more row, and here I've drawn a pyramid of height n minus 1. I'm not quite done yet, but I think this is consistent with my verbal definition that a pyramid of height four is a pyramid of height three, which I can implement per line 16 (just draw me a pyramid of height n minus 1), and then I myself will take the trouble to print the fourth and final row. But something's missing in this code. Let me go ahead and try running it and see what happens. make... oh, darn it, I meant to call this something else. So I'm going to close this version here and rename iteration.c to recursion.c, to make clear that this version is completely different. Let me now go ahead and make the recursion version. And huh, Clang is noticing that I have screwed up. On line 14, it says error: all paths through this function will call itself. Clang doesn't even want to let me compile this code, because it would effectively loop forever by calling itself. So what am I missing in my code here? If I open up what we're now calling recursion.c in my editor, what's missing here? Yeah, I'm missing a base case. And I can express this in a few different ways, but I would propose that before I do any drawing of anything at all, let's just ask ourselves if there is anything to draw. So how about if n equals zero, then don't do anything, just return. You don't return a value: when your return type is void, it means you don't return anything, so you just write return followed by a semicolon.
Or, just to be super safe, I could actually do something like this, which is arguably better practice, just in case I get into the perverse scenario where someone hands me a negative number. I want to be able to handle that and not print anything either. So just to be safe, I might say less than or equal to zero. I'm not checking for one, because if I did, I would want to at least print out one brick myself, which is fine, but I'd have to rejigger all of my code a little bit. So I think it's safer if my base case is just: if n is less than or equal to zero, you're done; don't do anything. And this then ensures that even though I thereafter keep calling draw again and again and again, the problem is getting smaller and smaller, from four to three to two to one, and as soon as I hit zero, the function will finally return. So let's go ahead and open up my terminal and rerun make recursion. This version did compile this time. ./recursion, Enter, let's type in four, cross my fingers, and this too prints the exact same thing. And even though it doesn't amount to fewer lines of code, I would offer that there's an elegance to what I've just done. The iterative version, with all the loops, was very clunky: step by step, just print this and print that, and have one loop nested inside another. But with this, especially if we distill it into its essence by getting rid of my comments like this, and frankly by getting rid of the unnecessary curly braces, since for single lines in conditionals you don't need them, this is arguably a very beautiful implementation of drawing Mario's pyramid, even though it's calling itself, and arguably because it is calling itself. Questions, then, on this idea of recursion or this implementation of Mario? Yeah? >> Are there any scope issues involved? >> Good question. Are there any scope issues involved? Short answer, no.
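The recursive version might be sketched as follows, again with an assumption that differs from the lecture so the result can be checked: this hypothetical draw appends to a caller-supplied char buffer via strcat rather than printing with printf. The base case uses n <= 0, so zero and negative heights draw nothing, and each recursive call is handed a pyramid one row shorter.

```c
#include <string.h>

// Recursive version: a pyramid of height n is a pyramid of height n - 1
// plus one more row of n bricks.
// Assumption: out already holds a valid string (e.g. "") and has room for
// n * (n + 1) / 2 bricks, n newlines, and the terminating '\0'.
void draw(char *out, int n)
{
    // Base case: nothing to draw for a height of zero or less
    if (n <= 0)
    {
        return;
    }

    // Recursive case: draw the smaller pyramid first...
    draw(out, n - 1);

    // ...then add one more row ourselves
    for (int i = 0; i < n; i++)
    {
        strcat(out, "#");
    }
    strcat(out, "\n");
}
```

Each pending call waits on the stack until the smaller pyramid is finished, which is why a huge height can exhaust memory in a way the loop-based version would not.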
However, the current value of i, for instance, will not be visible the next time the function is called; each call will have its own copy of i, if that's what you mean. And next week we'll talk in more detail about what's going on here. In fact, I probably can't break this in class very easily, but it turns out, if I use a very large value for the height... let's just type a lot of zeros and see what happens. That was too many. Let's see what happens. That's also too many. Let's see what happens there. That's the first time, at least in class, that I've encountered this error. You might have encountered this weird bug in office hours or in your problem set, and that's fine if you did. We'll talk about what this means next week too. But this is bad; this clearly hints at a problem in my code. However, the iterative version of this program would not have that same error. So this relates to something involving memory, because it turns out, as a little teaser for next week, each time I call draw, I'm using a little more memory, a little more memory, a little more memory, and my computer only has so much memory. This program in its current form is using too much memory. There are workarounds to this, but that is a trade-off for the elegance we're gaining in this solution. So what's the point of all this, and why did we get sidetracked by Mario? There's another sorting algorithm, the third and final one that we'll consider today, that actually uses recursion to solve the problem, not only arguably elegantly, but also somehow way faster than bubble sort and selection sort. In essence, it does so by making far fewer comparisons and wasting a lot less work: it doesn't keep comparing the same numbers again and again. Here, in its essence, is the pseudocode for merge sort: sort the left half of the numbers, sort the right half of the numbers, then merge the sorted halves.
And this is kind of a weird implementation of an algorithm because I'm not really telling you anything. It's as if you're asking me how to sort numbers and I say, well, sort the left half, sort the right half. It's like someone being difficult. And yet implicit in this third line is apparently some magic: this notion of merging halves that are somehow already sorted is going to yield a successful result. As an aside, we're actually going to need one base case here too. If you're only given one number, you might as well quit right away because there's nothing to do. So we'll toss that in there as well. Base cases are often for zero or one or some similarly small-sized problem. In this case, it's a little easier to express it as one, because if you have one element, it's indeed already sorted. So what does it mean to merge two sorted halves? Well, let's actually consider this. I'm going to reuse some of these same numbers here. I'm going to put my one, my three, my four, and my six on the left. These together represent a list of size four that is indeed sorted. Then I'm going to put four other numbers on the right there that are similarly sorted as well. By merging these two lists, I mean: start at the left end of this list, start at the left end of that list, and just decide one step at a time which number is the next smallest. Then I'm going to put it on the top shelf to make clear what is sorted. So if my left hand's pointing at this list and my right hand's pointing at that one, which hand is obviously pointing to the smaller element, left or right? The right. So I'm going to grab this, use a little more space up top here, and put the zero in place. Then I'm going to point to the next element there. My left hand has not moved yet; it's still pointing at the one. My right hand is pointing at the two. Which number comes next? Clearly, left.
So I'm going to grab the one, put it up there, and update where my left hand is pointing. Now I'm pointing at the three here and the two there. What comes next? Obviously the two. What comes next? Obviously the three. What comes next? Obviously the four. What comes next? Obviously the five. But notice my hands are not going back and forth, back and forth, back and forth as in any of the algorithms thus far. I'm just taking baby steps, moving them only to the right, effectively pointing at each number once and only once. What comes next? Six. And now my left hand is done. What comes last? The number seven. So what I just did is what I mean by merging the sorted halves. If you can somehow get into a scenario where you've got one small list sorted and another small list sorted, it's super easy to merge them together using that left-right approach, which I'll claim takes only n steps. Why? Because every time I asked you a question, I was taking one bite out of the problem. There are eight bites total. I asked you eight questions, or I would have if I'd verbalized them all. So it's n steps total to merge lists of that size. So what, then, is merge sort? Merge sort is really all three of these steps together, only one of which we've acted out. The other two are sort of cyclical in nature; they're recursive by design. So what does this mean? Well, let's start with this list of eight numbers, which is clearly out of order: 6 3 4 1 5 2 7 0. Let's apply merge sort to this set of numbers. I'll do it digitally here, because it would take forever to keep moving the numbers up and down physically. So let's move it to the top just to give ourselves a little more room, and let me propose that we apply merge sort. What was the very first step in merge sort, at least among the juicy steps we highlighted? Sort the left half. Yeah. And then the second step was going to be sort the right half.
And then the third step was going to be merge the sorted halves. So let's see what this means by actually acting it out on these numbers. Here are my eight numbers. Let's go ahead and sort the left half. Well, the left half is obviously going to be the four numbers on the left, and I'm just going to pull them out to draw our attention to them over here. Now I have a list of size four, and the goal is to sort it. How do I sort a list of size four? >> Uh, b... Well, yes, but just be more pedantic: how do I sort any list using merge sort? >> Sort the left half. So let's do just that. Given a list of size four, how do I sort it? Well, I'm going to sort the left half. How do I sort a list of size two? >> Sort the left half. All right. Well, I'm just going to write the six here. How do I sort a list of size one? I just don't. I'm done. That was the so-called base case where I just said return: I'm done sorting the list. Okay, so here's the story recap: sort the left half, sort the left half, sort the left half. And I just finished sorting this. So what comes next? Sort the right half, which is this. And now I've sorted the left half of the left half of the left half, which is a big mouthful. But what do I do as a third and final step when sorting this list of size two? Merge them. This part we know how to do. I point left and right, and I take the smallest element first, which is the three. Then I take the six. And now this list of size two is sorted. So if you rewind in your mind's eye, what step are we on? Well, we have now sorted the left half of the left half. So what comes after the left half is sorted? We sort the right half. So we're sort of rewinding in time, but that's okay; I'm keeping track of the steps in my mind. I want to now sort this list of size two. How do you sort a list of size two? Well, you divide it into lists of size one. How do you sort this one? You're done. You then take the other, right half and you sort it. Done.
Now you merge the two sorted halves. So I point at the four and the one. Obviously the one comes first, then the four. Now I have sorted the right half of the left half of the original numbers. What's the next step? Now that I have the left and right halves of this list of size four sorted, merge those. Same idea, but with fewer elements. I'm pointing at the three and the one. Obviously the one comes first. Now I'm pointing at the three and the four. Obviously the three comes next. Pointing at the six and the four, the four comes next. And now the six comes last. Now I have sorted the left half. And it's intentional that 1 3 4 6 is the original arrangement of the lighted numbers I had on the shelves a moment ago. All right, it's a long story, it seems. But what comes after sorting the left half of the original list? You sort the right half. So let's put those numbers over here. How do I sort a list of size four? Well, you sort the left half. How do you sort this thing of size two? You sort the left half. You sort the right half. And now you merge those together. How do I now sort the right half of the right half? Well, I sort the left half. I sort the right half. And then I merge those together. Now I have sorted the left half and the right half of the right half of the original elements. What's next? Merging them: 0, 2, 5, and 7. Now we're exactly where we were originally with the lighted numbers. I've got 1 3 4 6, the left half, sorted, and 0 2 5 7, the right half, sorted. What's the third and final step? Merge those two halves.
Of course, 0 1 2 3 4 5 6 and 7. And hopefully, even though a lot of words came out of my mouth as I was acting this out, there wasn't a lot of back and forth. I definitely wasn't walking back and forth physically, and I also wasn't comparing the same numbers again and again. I was doing different work at different conceptual levels, but that was only, what, three levels total? It wasn't n levels on the board visually. So where does this get us with merge sort? Well, with merge sort, it would seem that we have an algorithm that I claim is doing a lot less work. The catch, though, is that merge sort requires twice as much space, just as we saw when I needed two shelves in order to merge those two lists. So how much less work is actually going to be possible? Well, let's consider the analysis of the original list and how we might describe its running time in terms of big O notation. Hopefully, it's not going to be as bad as n^2 ultimately. So here are some breadcrumbs: if I hadn't kept updating the screen and deleting numbers once we moved them around, here are traces of every bit of work that we did. We started up here. We did the left half, the left half of the left half, the right half of the right half, and then everything else in between. And you'll see that essentially I took a list of size eight and did three different passes through it: at this conceptual level, at this conceptual level, and at this one. Each time I did that, I had to merge elements together. If you think about it here, I pointed at four elements here and four elements here. In total, I pointed at eight elements. So there were n steps here for merging. And if you trust me, I'll claim that on this conceptual level, there were also eight steps. I wasn't merging lists of size four, but I was merging two lists of size two over here and two more lists of size two over there.
So if you add those up, those are n total steps, or merges, if you will. And then down here, this was sort of silly, but I was ultimately merging eight single-element lists up into the next conceptual level. So from a list of size eight, we had three levels of work, and on each level we did n steps of merging. So where does three come from? Well, it turns out that if you have eight elements up here, the relationship between 8 and 3 is something formulaic, and we can describe it as log base 2 of n. Why? Because if n is eight, and if you don't mind doing some logarithms here, log base 2 of 8 is the same thing as log base 2 of 2 to the third power. The log base 2 and the power of 2 cancel each other out, which gives you exactly the number three that I visualized with those traces on the screen. Which is to say, irrespective of the specific value of n, the big O running time of merge sort is apparently not n^2 but log n times n, or more conventionally n log n, because you're doing n things log n times. Technically that's base 2, but we generally don't care about the base in big O notation. And indeed, in big O notation we would say that merge sort is on the order of n log n; that's its big O running time, its upper bound. What about the lower bound? Well, there's no clever optimization in our current implementation as there was for bubble sort. And so it turns out the lower bound is Omega of n log n, and therefore merge sort is in Theta of n log n as well, because big O and Omega are in this case one and the same. And if we go back to our visualization from earlier, give me just a moment to pull that up here. In our earlier demonstration of these algorithms, we had a side-by-side comparison of all the comparisons. But here, if I go ahead and randomize it and click merge sort, you'll see a very different and clearly faster algorithm.
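The arithmetic behind the three levels, for the n = 8 example acted out above, can be written out as follows (the total of 24 counts one step per element placed at each level, under the lecture's assumption of n merge steps per level):

```latex
\begin{aligned}
\text{levels}            &= \log_2 n = \log_2 8 = \log_2 2^3 = 3 \\
\text{steps per level}   &= n = 8 \\
\text{total merge steps} &= n \log_2 n = 8 \times 3 = 24 \\
\text{running time}      &\in O(n \log n), \quad \Omega(n \log n), \quad \Theta(n \log n)
\end{aligned}
```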
Even though the computer's speed has not changed, it's touching these elements so many fewer times. It's wasting a lot less time because of this cleverness, whereby it instead divides and conquers the problem into smaller and smaller and smaller pieces. And to give this a final flourish, since that was, yes, faster, but not necessarily obviously faster than other things we've done, how might we compare these things side by side by side? Well, in our final moments together, let's go ahead, dramatically and for no real reason, and dim the lights so that I can hit play on a visualization. At the top, it's going to show you selection sort with a bunch of random data. On the bottom, it's going to show you bubble sort with a bunch of random data. And in the middle, it's going to show you merge sort. The takeaway ultimately for today is the appreciable, felt difference between big O of n^2 and now big O of n log n. [music] All right. The music just makes sorting more fun. But that's it for today. We will see you next time. All right. This is CS50, and this is week four, the week in which we take off the proverbial training wheels that have been the CS50 library and reveal to you all the more what's going on underneath the hood of a computer in terms of its memory. We'll also talk about files and how you can persist information for a long time, whether it's a file you've downloaded or, as of today, one that you've created yourself. But first, I just wanted to share some artwork that two of your classmates, Avery and Marie, kindly made before class, which is a picture made out of Post-it notes, some green, some purple, which collectively, from where you are, looks like what? >> Yeah. So indeed it's a cat that they made using only zeros and ones, or green and purple pieces.
And in fact, even though this is fairly low resolution, in that it only has a few pixels this way and a few pixels this way, it's actually representative of how computers store images underneath the hood. So let's actually start there. We've had this bowl of stress balls here on the lectern for some time. If we take a beautiful photo of it, it looks a little something like this. Of course, this too has a finite resolution. By resolution, I just mean how many dots go horizontally and how many dots go vertically. Multiply those two together and you get the total number of dots, and, depending on how many bits each dot uses, some number of bytes: maybe kilobytes, megabytes, or, heck, if it's a massive image, even more than that. But it is in fact finite. If we zoom in on this image, you start to see a little more detail. But if you keep zooming in, you start to see that there's only finite detail. And when we're really zoomed in, you start to see actual dots, or pixels as they're called. In fact, on most any screen, with any image you look at, if you look closely enough by pulling your phone up to your eyes or walking really close to a TV, you may very well see the same thing, because any image on a screen like this is represented by hundreds, thousands, millions of tiny little dots called pixels. Each of those pixels has a color, and collectively they give the appearance of stress balls in this case or a cat in that case. So among the things we're going to do in this week's problem set is have you write code with which you can manipulate your own images, not only to understand what's going on underneath the hood but to apply some of today's most familiar filters, so to speak. If we go all the way down here, you'll see that this image of course has multiple colors. We've got some white and some red and shades in between. But let's keep things simple for a moment and propose that instead of looking at these dots, we look at these zeros and ones.
And let me propose that in a picture like this, any zero will be interpreted as black and any one will be interpreted as white. If you can see it, what is this a picture of? >> Oh, smiley face is in fact right. If you focus only on the zeros and try to ignore the ones, as I can do here for you, you'll see that embedded in that image was in fact this smiley face. Now, this would be a sort of one-bit image: you have either a zero or a one representing the color of each dot. In modern times, we would actually use 16 bits, 24 bits, maybe even more for each pixel's color, and that's how we can get every color of the rainbow instead of just black and white. But in effect, if you had a file on your Mac or PC or phone storing this pattern of zeros and ones and you opened it in some kind of image program, like the Photos app, it would be depicted to you visually as simply a grid, x and y, where some of the dots are white and some of the dots are black. All right, so with that said, what kinds of representations might be involved here? Well, we can rewind to week zero. Recall that we talked briefly about RGB, which just means red, green, and blue, one of the most common ways to represent colors inside of a computer. And if any of you have ever dabbled with Photoshop or similar editing programs, or if in high school or earlier you made your own web pages, odds are you're already familiar with a syntax we're going to see a lot of today. This doesn't add anything intellectually new; it's just an introduction to a common convention for how else we can represent numbers. So this is a screenshot of Photoshop's color picker, Photoshop being a popular program for editing photos and files. You'll see here that my selected color looks to the human eye like black. And I've highlighted here how I got that: I chose black by typing in 0 0 0.
Which also, if you look up here, means that I want zero red, zero green, and zero blue. And yet we somehow translated it to six zeros instead of just three. Well, if we take a look at another color like white instead, I claim that you can represent white in Photoshop, and today in code, with FFFFFF, or equivalently 255 red, 255 green, 255 blue. And here, if you think back to week zero, is maybe a hint at where we're going with this. If you're using an 8-bit number, you can count from zero on up to 255. Recall that 255 is the biggest number you can represent with just eight bits. And yet somehow there's going to be a relationship between the 255s and these Fs that we see down here. Let's just run through a few more. If we wanted to represent something like red, we're going to use FF0000. If we want to represent green, we're going to use 00FF00. And lastly, to represent blue, we're going to use 0000FF. So what's going on here, and why do we have this different convention? Well, it turns out that in the context of images, and also memory in general, it's just human convention, or programmer convention, to use this alternate representation of numbers. Not the so-called decimal system, but another one that's not all that far off from what we've been doing over the past few weeks. So here again was the binary system. You've got just two digits in your vocabulary, 0 and 1. Here is the familiar decimal system, where you've got 10 instead, 0 through 9. Suppose we wanted a few more digits. Well, we're sort of out of Arabic numerals here, but I could toss into the mix A, B, C, D, E, and F, either in lowercase or uppercase. And in fact, that's what computer scientists do when they want to have not just 10 digits available to them but as many as 16. When you use this many digits, you call it hexadecimal, implying that you've got 16 digits, a.k.a. base 16. Now, there's an infinite number of base systems.
We could do base 3, base 4, base 15, base 17, on up. But this is just one of the relatively few conventions that are popular in computing. Let's tease it apart, because we're going to see these kinds of numbers a lot. Thankfully, as in week zero, it's the same old number system you're familiar with, with the columns and the placeholders. It's just that the bases in those columns mean something a little different. Instead of using powers of two or powers of 10, today we're going to use powers of 16. So 16 to the 0 is of course 1. 16 to the first power is 16. So we have the ones column, the 16s column, and so forth. Meanwhile, if we wanted to start counting in hexadecimal, this two-digit number, 00, is of course the number you and I know in decimal as 0, because it's still just 16 * 0 + 1 * 0. This, in hexadecimal, is how you would represent one, but you would say 01 instead of just 1 to make clear there are two digits. This would be 02, 03, 04, 05, 06, 07, 08, 09. Now things get a little interesting. In the decimal world, we're about to carry the one and give ourselves two digits, 1 and 0. But in hexadecimal, you can keep going. So the next numbers in hexadecimal are going to be 0A, 0B, 0C, 0D, 0E, 0F. And now things get interesting again. What probably comes after 0F, even if you've never seen hex before? >> One zero. You still carry the one as before, and this goes back to zero. And why is this now appropriate? Well, how many numbers did we just count through? We started at 00 and went up through 0F, and that's a total of 16 combinations. So the highest we counted, let me rewind, this number here, is of course going to be 1 * F. But what is F? Well, let's rewind further. In fact, let's have our little cheat sheet here. If we want to have these digits at our disposal, I dare say they map to 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15. So F is just going to represent the number 15.
So if we now fast forward back to where we were, counting from zero on up through 0A through 0F, we land here. This of course is 16 * 0 + 1 * F, which is 1 * 15. So this is how in hexadecimal you would represent the number 15. This, in hexadecimal, is how you would represent the number 16 instead. 15 to 16. This is not ten; that's how you would pronounce it in decimal. This is "one zero" in hexadecimal, because 16 * 1 + 1 * 0 gives us, of course, 16. Now, we could do this toward infinity, but we won't: 12, 13, dot dot dot, all the way up to FF. So, quick mental math: 16 * F + 1 * F, that is to say 16 * 15 + 1 * 15, is, any guesses? >> It is in fact 255. You don't even have to do the math, because if you just think about where we were going with this, indeed we saw pairs of Fs in the Photoshop screenshots, because this is how a computer would represent the number you and I know in decimal as 255, by just using two Fs. So why do we care about hexadecimal? Well, it turns out that it's just convenient to use two hexadecimal digits to represent numbers, because a single hexadecimal digit can represent four bits at once. For instance, let me go ahead and explode this by putting a little bit of space between the two digits here, and let's consider how you would represent F. Well, if F is 15 and you want to represent 15 in binary, I think that's just going to be 1111. Now, why is that? Well, one in the eights place plus one in the fours place plus one in the twos place plus one in the ones place indeed gives me 15. So using a single F, I can count, as we've seen already, as high as 15. But of course I've claimed in the past that it's super common to use eight bits at a time, or one byte, to represent any value, because that's just a very useful, common unit of measure. And so in hexadecimal, if you wanted to represent four ones, you can say F.
If you want to represent another four ones, you can just say F, which is to say that F and F together are the same as eight ones together, which is how we finally get to the total of 255, because this is the ones place, the twos place, the fours place, the 8s, 16s, 32s, 64s, and 128s. But if you group these into clusters of four bits each, you can represent all of the possibilities from 0 through 15 just using 0 through F. So with one hex digit you can represent four bits, which is a long way of saying it's just convenient, which is why the world tends to use hex when talking about colors and, as we'll see, memory as well. So let's consider what is meant by memory and what's going on inside of the computer when we've been storing values thus far. Well, here's that canvas of memory. I proposed last time, and before that, that we can number these bytes arbitrarily but reasonably. This is byte 0, 1, 2, 3, 4, 5, 6, 7, dot dot dot, and maybe this is byte 15. That's fine; nothing wrong with that. But in the real world, any programmer would actually think of these locations not in decimal notation but in hexadecimal notation, just because it's convenient for the reasons discussed. So we would actually number these from zero on up through 9 and then keep going with A, B, C, D, E, F, and so forth. So what does that mean for the other digits? Well, this would be 1 0. This would be 1 1. This would be 1 2, dot dot dot. Here now is 1 9. But here's 1A, 1B, 1C, 1D, 1E, 1F, and so forth, just using hexadecimal notation. But there's arguably some ambiguity here. For instance, if you were to glance at this board and see this address 1 0, is that byte 10 or is that byte 16? It's just non-obvious, because if you don't know what base system you're working in, which you could infer by looking at the rest of it, it could potentially be ambiguous.
So in the world of hexadecimal, it's super common to literally prefix any number you ever write in hexadecimal notation with 0x. The zero doesn't mean anything per se, nor does the x. It just means that what follows the 0x is a number in hexadecimal notation, which makes unambiguous the fact that this is 0x10, which, if you do the math in decimal, ends up being 16, not of course the number 10. In short, today you're about to see a lot of 0x's and a lot of two-digit or four-digit or eight-digit numbers in hexadecimal notation. Generally we don't care what the numbers translate to. You don't need to do a lot of math, but it's going to be commonplace to see syntax like this. All right, back to sort of normal times. So here is a line of code, int n = 50, wherein we might want to declare a variable called n and store a number like 50 in it. Let's actually go ahead and do this, simple as it probably is, in a file called, how about, addresses.c. We're going to play around with computer addresses. In addresses.c, I'm going to do something super simple at first, whereby I'm going to include stdio.h. Then I'm going to write int main(void), with no command-line arguments here. Then I'm going to declare this variable n and set it equal to the arbitrary but familiar value of 50. And then, just so that this program does something mildly useful, let's go ahead and print out, with %i and a backslash n, that value of n. So nothing new here. I'm just literally going through the motions of declaring a variable and printing its value. So let's do that: make addresses, Enter, ./addresses. And hopefully I'll indeed see the number 50. So not all that much is going on in the code, but let's consider what's going on in the computer's memory. This line of code and the one after it give the results of that program, but where is that n ending up? Well, here's my grid of memory, and let's just suppose for the sake of discussion that the 50 ends up down here.
Maybe there are other things going on in my program, so this part of my computer's memory is already in use. So it's reasonable that it could end up in this location here. But what's important is: how many bytes am I using for n? Apparently? >> Four. And that's because we've said integers tend to be four bytes, a.k.a. 32 bits. So this is at least to scale, even though I'm just imagining where it ends up in memory. So that's where the 50 actually ends up. So when I call printf and pass in n, clearly the computer is going to that location in memory and printing out that value. But that value is indeed at a specific memory address. It's typically not going to be as simple as 0x0 or 0x1 or some other small number. It may be something arbitrary like 0x123, where I'm just making this up; it's an easily pronounceable number in hexadecimal notation. All right. So what can I use that information for? Well, thus far this hasn't been useful to us, but certainly the programs we've been writing have been making use of it. With a bit more syntax, I can start to see things like this, not just on the screen but in code. In fact, let me propose that we introduce two new operators in C, two new pieces of syntax. One is a single ampersand and one is a single asterisk. We'll see that the asterisk has a few different uses, but the ampersand has a very simple, straightforward one, which is to get the address of a variable in memory. So if you've got a variable like n and you prefix it with an ampersand, as in &n, you can ask the computer: at what address is this variable stored? You can find out if it's indeed 0x123 or something else altogether. So let me go ahead and do this by going back to my addresses.c program, and let's see if we can print out not the value, which is obviously going to be 50, but the address thereof.
So up here in my code, I'm going to change the n on line six to &n instead. And I'm going to make one other change, because yes, n lives at an address, and yes, that address is technically a number, but it's conventional not to use %i to display that number but rather another piece of syntax, a new format code, which you won't often need. This is more demonstrative than useful, I would say. But %p is what we use when we want to print out the address of something in the computer's memory. So, back to VS Code. One more change: I'm going to change my %i to %p instead. So at this moment, we should see a version of the program that's not going to display 50 anymore but something like 0x123, though probably a bigger number than that, because my computer has way more memory than that address suggests. So let's again make addresses. Let's run ./addresses. And indeed, this variable at that moment in time apparently lives somewhere in the computer's memory at address 0x7FFD3C34ECC. All of those are hexadecimal digits. It would be painful to do the mental math to figure out what the numeric address is, but we're indeed seeing it in this common hexadecimal notation, which is not often going to be useful for us as humans. But the computer is using, and has been using, this information for some time. So what we're about to introduce is admittedly one of the more complicated concepts in computing, and in C in particular, namely a topic called pointers. And I will say that today, more so than ever, might feel like a bit of a fire hose. In fact, all these years later, I still remember the day on which I finally understood this topic, which was not the day of the lecture in which it was introduced. It was in the back right corner of the Eliot House dining hall. I was sitting down during office hours with my teaching fellow, and he finally helped that light bulb go off over my head.
So if some of this feels a little arcane today, it just comes with time and with practice, like everything else. So what is a pointer? A pointer is going to be a variable that can store an address. Now, yes, that address is technically just a number, like an integer, but we distinguish between integers that we care about, like 50, that we might do math on, and a pointer, which in this case is just going to be the address of a value in memory. So what does this mean? Well, we can start to do things like this. I can declare my variable n as before and set it equal to the value 50. But I can also get the address of n and put that address in another variable. And that variable we now call a pointer. So p is going to be the name of this variable. It's going to store the address of n, which we can get using the ampersand. But there's one more piece of syntax, which I promised before: this asterisk here. The asterisk here means that this variable p stores the address of an integer, not an actual integer per se. It's weird-looking syntax. It kind of looks like multiplication, but it isn't. The developers of C decades ago just decided to use an asterisk, even though it's admittedly non-obvious what it's doing. But in this context, when you see an asterisk right after a data type like int, it just means that the variable in question is not going to be an int per se, but the address of an int. Okay, so let's put this to the test using this line of code in my own file here. Let me go back to VS Code. Let me introduce this additional variable, int star p, as it's typically pronounced, set that equal to &n, and then do the exact same thing as before. Let's not print out &n, but let's print out the value of p itself, because p is now equivalent to &n. So let me go back to VS Code and do make addresses again. And huh, I did something wrong and stupid here.
This was not meant to be the moral of the story. What did I do wrong? Yeah. >> Yeah, I just missed the semicolon. So, still making those mistakes here. All right. Let me clear my screen again, do make addresses, then ./addresses. And now I should indeed see the address of n; I just so happened to store it temporarily this time inside of a variable called p. Now, just so you've seen it, when using a star to declare a so-called pointer and an ampersand to get the address of something, you might see different formatting of this in online references and such. This is the canonical way to declare a pointer: int, a space, then the star, then, without a space, the name of the variable, as in int *p. However, it will also work, and you will sometimes see, with the star right after the int, as in int* p, or with the star in the middle, as in int * p. But again, we would recommend stylistically that it go right before the variable name. Admittedly, I think it would have been clearer if the star went next to the int, making clear that it's related more to the int than to the variable name. But this is simply the convention. So this means, hey computer, give me a variable called p that's going to store the address of an integer. And the ampersand is just saying, hey computer, tell me the address of n. And it's the compiler and computer itself that decide where to put that variable in memory. Questions? >> Would you get an error if you didn't put the asterisk? >> You would. Let's take a look. Let me clear my terminal, delete the star before the variable p, and do make addresses again. And indeed, I'm getting an error: incompatible pointer to integer conversion initializing int... And even though that's a lot of big words, it kind of says what it means. You're trying to go from a pointer on the right to an integer on the left, which is just not appropriate here.
Yes, at the end of the day, they're all numbers, but it's more properly a pointer, or an address, on the right, and a plain old int, incorrectly, on the left. So the fix there is just to put the star back. Other questions on this new syntax? Yeah? >> You do, like... >> Indeed. To recap the question, can you use the address-of operator to find the address of other data types, like strings? Absolutely, and we'll do that with a couple of examples today as well. We're just using ints to keep it super simple initially. Other questions on these addresses and pointers? >> So we still use the star for variables even if they're not integers? Is that right? >> Correct. Even if it's not an int, and we'll come back to other data types in a little bit, you're still going to use the star. It's the same syntax for everything. And yes? >> Can you tell the computer, I want to store these variables at this address? >> Oh, yes. Can you tell the computer you want to store a variable at this address? That's where we're going in just a bit. Now that we have the ability to find out the address of something in memory, it stands to reason that we can go to that address ourselves, maybe poke around, and actually put values there. In fact, that's among our goals for today. So let's consider how we might get there. Here now is my canvas of memory, and let me propose that the number 50 happened to get stored in the variable n down there at bottom right, just because. In reality its address is probably much larger, but it's easier and quicker for us to just pretend it's at 0x123. What is actually happening in code when I declare p and put a value there? Well, recall a moment ago I declared p to be a pointer to an integer, that is, the address of an integer. So what's happening in memory is this. If n is down here and happens to be at address 0x123, when I assign &n to p, that literally takes the address of n and puts it inside of p.
Now p, as an aside, happens to be pretty big. It turns out that, by convention on most systems, a pointer, that is, a variable that stores an address, is going to be eight bytes large: 64 bits. Why is that? Our computers have so much darn memory nowadays, in the gigabytes, that you need to be able to count higher than 4 billion. If you only used 32 bits for your pointers, you could only count, recall, as high as about 4 billion, and 4 billion bytes is equivalently 4 gigabytes. That would mean your computers could not have 8 gigabytes or 16 gigabytes of memory, and your servers couldn't have tens of gigabytes of memory. We use 64 bits, or eight bytes, nowadays for pointers because our computers have that much more memory. All right. So what is p storing? Literally just an address like this. So when we wrote this code a moment ago, what the computer did, and has been doing for the past several weeks, is literally just finding the location of n in memory and plopping that value inside of p, which itself takes up some memory, by convention 8 bytes in this case. The thing is, who really cares about this level of detail? As programmers, it's useful to understand what's going on, but rarely are we going to care precisely where things are in memory. Today is really just about looking at what's going on underneath the hood. So in fact, I would propose, we can abstract away most of my computer's memory, because at the moment all we care about is p existing and n existing. Who really cares what else is going on? And frankly, I generally don't care that n is at address 0x123 specifically, just that it's at some address. And so when programmers and computer scientists talk about design on a whiteboard, frankly in sections and office hours too, we rarely care what the actual addresses are.
So we generally abstract the specific address away and literally represent pointers with arrows on the screen or on the whiteboard. This just means that p is a variable that points to the number 50 in memory. Okay, questions on this mental model for what a pointer is? It's a pointer in very much the literal sense. Okay. So if you're on board with that, let me propose that we consider what these things look like more physically. In fact, we've got a couple of mailboxes here to make this clear with a little metaphor. Here is a physical representation of our variable, say p, labeled as such. Inside of it is presumably going to be the address of some actual value. That value, at the end of the story, is going to be the value of n, which, recall, for consistency is at address 0x123. So what happens when you actually try to locate a value in memory is analogous to looking something up inside of these mailboxes. Think of your computer's memory as hundreds or thousands of little mailboxes, maybe more apartment style, where you've got rows and columns of mailboxes, as opposed to individual ones for single-family homes. Each of those mailboxes can contain the address of some value in memory. So if this is p, not drawn to scale because they only make mailboxes so large, inside of p is going to be an address like 0x123. And just to be dramatic, since there's a big football game this weekend, here is a Harvard foam finger: metaphorically, this pointer is pointing at that value over there. And in fact, as you asked a moment ago, can we actually go to an address in memory? We don't yet have the syntax for that, but we're about to. Yes, you can. And if I follow what I'm pointing at and open up this location in memory, voila, there is the 50 in question.
So anytime we're talking about values or about their addresses, you can think of them as being like physical mailboxes, one of which might contain a useful number like 50, another of which might contain the address of that value. And we'll now see the syntax to actually go from one to the other. Let me go back into VS Code here. In the most recent version of my program, I was getting the address of n, storing it in p, and then literally printing out p itself. That's when we saw the big hexadecimal number that is generally not useful, but maybe interesting to see that one time. Let me instead introduce another use of that star, or asterisk, operator that allows us, as was asked a moment ago, to actually go to that address. So in this version of my program, I'm going to keep n equal to 50. I'm going to keep p equal to the address of n. But what I'm now going to do is show you how, syntactically, I can print out not p but n, by way of p, following the proverbial foam finger, using %i\n. Now, obviously, I could cheat and just print out n, like in version one, but that doesn't really demonstrate anything interesting here. However, if I only have p at this point in the story, it turns out you can use the star for another purpose. If you simply prefix your variable name with a star, as in *p, that is the so-called dereference operator, which means go to the address in p. So if I now open up my terminal here, do make addresses for this version, then ./addresses and Enter, I now get back the number 50. So what's really happening? On line five, as has been true for several weeks now, we have a variable called n being initialized to the number 50. Then on line six, I'm declaring p as the address of some value, an integer specifically, and putting the address of n in there.
And then on line seven, I'm saying print out an integer with %i, as we've done for weeks. But what integer? Go to the address in p and print out what you find there. So that's equivalent, again, to the foam finger, which is over there pointing at the value I actually want to print out instead. Okay. So what's the use of all this? Well, I think we can get there by taking a look at one of the little white lies we've been telling. In fact, let's turn our attention to strings, which up until now have been sequences of characters in the computer's memory. A string is a thing in programming more generally, but in C, it technically doesn't exist by that name. You can still use strings in C, just not by calling them string, s-t-r-i-n-g, as the actual data type. But let's start with our familiar code here. Let me go into addresses.c. Let me add our training wheels back in for now and include cs50.h, because in this version of my addresses program, what I want to do is declare a string s and set it equal to "hi!". Then, as we did in week one, let's print out that value of s with %s\n. So nothing new, nothing interesting here. Let me just do it quickly: make addresses, then ./addresses, and we see hi! on the screen. So that has all been something we've been taking for granted. But let's consider what is going on underneath the hood of even that program. The string we've declared exists somewhere in the computer's canvas of memory. So string s = "hi!" might end up somewhere down here. I'm going to stop drawing all of the boxes when not necessary. But here we have h, i, exclamation point, and, as we discussed two weeks ago, the null character, NUL, written \0, which just means the string stops here. So as a quick refresher, even though the word is three characters, it takes up how many bytes? Four, always, because you need that null terminator.
All right, so that string can be accessed by its name, s. And we've seen this before: s[0] is the first character, then s[1] and s[2]. And if you want to poke around, you can go into s[3], but all you'll find there is the null character, and the computer will sort of remind you that you don't really want to look there at that point. So, three characters accessible via this array syntax. But we know now that everything in the computer's memory is addressable. Maybe that h just so happened to end up at 0x123, and the i, the exclamation point, and the null terminator end up at 0x124, 0x125, and 0x126, respectively. It doesn't matter what these numbers are, but because strings are sequences of characters back to back to back in memory, it must be the case that these addresses are themselves contiguous, back to back to back, without gaps. That's how a string has always been stored in memory. It's just an array of characters. All right, so with that said, what really is s? In every program in which we've used strings before, we've thought of s as just a string, that is, the sequence of characters, or really the name of an array. But that's a bit of a white lie, because s is really going to be a more specific value. Take a guess: what is actually going to be the value in s? >> Yeah, the address of, if I may, that array. So we've got sort of four possible answers here: A, B, C, and D. Multiple choice. Which of those numbers probably makes sense to store in the variable called s in order to get to this string? What is s's value? Yeah? >> 0x123 is correct. We don't talk about this in week one because it's already hard enough to remember semicolons in week one, let alone to start thinking about what these specific addresses are. s is a string, yes. But technically, s is, and has been since week one, a pointer: the address of an array of characters in memory.
The address, specifically, of the first character, which is sufficient. Why? Because of the null-terminating convention we talked about weeks ago, which tells the computer where the string ends. The pointer tells the computer where the string begins. And that's how, using just numbers, zeros and ones, inside of a computer, you can store something as interesting as an actual string. So let's take a closer look at this. Let me go into VS Code again, and just for the sake of discussion, let me declare s as before, but instead of printing out the whole string at once, let's do this: printf with "%p\n", printing out s itself initially, to see whether it's actually 0x123 or, presumably, a much bigger number. Then after that, let's print out another address with %p\n, and now I'd like to print out the address of the first character of s. But let's not get ahead of ourselves. Let me make addresses and run ./addresses. Okay, there, in this hi! program, is the address at which the string itself is stored: 0x5A7143027004. So, bigger than 0x123. Well, let's now poke around. What if I want to print out the address of the first character in that string? Well, recall that s[0] is literally the first character. That is a char. So with what syntax could I get the address of the first character? We haven't learned all that much that's new today. It's just a single ampersand, as in &s[0], that will get me the address of that character. If I do this for the next character, I can see one after another. In fact, this is going to have four characters in total, including the null character. So let me copy and paste, which is generally frowned upon, but fine for a lecture demo because we're just trying to do this quickly. Let's print out the address of s itself,
and then, more specifically, the address of s's first character, the address of s's second character, third, and the address of that null terminator. All right, let's go back and make addresses. Let me clear my terminal and run ./addresses. And if I zoom in on my terminal here, we see the following. s itself contains 0x56199BD00004. And the address of the first character in s, aka s[0], is exactly the same thing. The next character, the i in hi!, is one byte away. The exclamation point is one more byte away. And the null terminator is one more byte away. So again, bigger numbers, but the point is that these are indeed just the actual addresses of all of these characters in memory. All right, let me pause for any questions here. Yeah? >> Why do you need the ampersand for the specific characters but not for s? >> Good question. Why do I need the ampersand before the specific characters in s but not before s itself? Think about what s actually is. I'm claiming for the moment that s itself is the address of that whole string, which just so happens, by design, to be equivalent to the address of the first character, because that is the convention humans came up with decades ago to represent a string. Now, you might think that you need the address of every character in the string. But no; that's why humans decades ago decided to terminate every string in memory with the backslash zero, or null terminator, because if you give me the beginning of the string and mark its end, I can obviously, with a loop, find everything else in between. Other questions? No? All right. Well, what then is this actual thing in memory? It turns out that yes, s is a string as we've been describing it all this time. But technically, I think we're ready to reveal the little white lie we've been telling, or, if you will, what the abstraction called string actually is in the CS50 library.
The type you've known as string since week one has all this time simply been a synonym for char *. So what does this really mean? Well, we saw int *p earlier; here we're seeing char *s. s is the name of the variable, and yes, it's a string, but what is it really? s is the address of a char. And so, in the actual CS50 library, we've told this little white lie since week one by creating a synonym that makes char *, so to speak, the exact same thing as string, s-t-r-i-n-g, just so that we don't have to think about this level of detail, let alone hexadecimal notation and addresses and pointers and dereferencing and all of this complexity, in the first weeks of the course. It simply abstracts away what a string actually is. And in fact, we've seen this technique before in a more complicated way. If you recall, last week we claimed that you could create a phone book, for instance, using persons, and persons have names and numbers, and we created our own type by saying typedef. That type was a whole structure, which is the complex part: a structure containing a name and a number, and we ultimately gave that data type the name person. So we've already invented in class our own make-believe data types for things that didn't come with C itself, like a person. Now, the struct is very specific to what we were trying to do with the phone book, but typedef is more generally useful because it literally allows you to define your own type. So, for instance, if we wanted to create a synonym for int called integer, because we never remember what int is, you could simply say typedef int integer. And that would create in your programming environment a data type called integer that is literally equivalent to int. Now, this is not all that useful.
But in the CS50 library, we do use typedef, as in typedef char *string;, to tell the computer that char * can instead be spelled as string. And that just means that string, ever after, is the same thing as saying char *. So all of this time since week one, I could have been doing exactly that if I wanted. In fact, if I go back to VS Code here, let's simplify this quite a bit and go back to the very first version of the program, wherein I use %s and just print out s's value itself, the string hi!. This, of course, is going to work as always: it's just going to print out hi! on the screen. But now, if I get rid of the CS50 library and try to recompile this, notice we'll get an error that I think we've seen before. If I scroll up to the very first line: use of undeclared identifier string; did you mean stdin? No, I don't, and no, I didn't a couple of weeks ago when I accidentally did that. The compiler simply does not know about the keyword string at the moment. Well, that's fine, even if I don't have the CS50 library installed on this computer. I can just get rid of the word string, which is a concept but not a keyword in C, and rename it to char *. And now in my terminal window, I can do make addresses again, ./addresses, and voila, we're back in business with no CS50 training wheels whatsoever, because printf knows, given a char *, to go to that address and print, print, print, print until it gets to the null terminator, and then stop printing. There's a loop in there that does exactly that. Questions on char * or what a string actually now is? >> Yeah. In front. >> Good question. How does printf know to keep going until it gets to the null?
It's the format code, because I've been using %s, which means print a string, instead of %c, which means print a single character. printf sees that %s and effectively says, oh, I should use a loop to print out all of the characters until the null terminator. If I instead passed in %c, it would stop after a single character. >> Okay, that makes sense. >> Other questions? >> Good question. Why don't I dereference s in order to print it out? Let me try that for just a moment here. Why don't I have to, now or in any week prior, write *s here? Because after all, if s is the string, I want to go to the string and print it out. Well, the first answer is that printf is doing this for you: it's being handed the address, and it goes to the address for you. So that star is somewhere in printf's implementation. But this is also incorrect conceptually, because yes, s is the string, but more technically, as of today, s is the address of the first character in the string. So I really want to provide printf in this case with the address, not the specific character, because I want it to treat it as a string, not a single character. So I could use *s if I changed %s to %c, to print out just that single character. All right, so let's play around syntactically for just a moment here in VS Code. Let me propose that we still use char *s here and then demonstrate exactly what's going on. So I'll do exactly what was just asked: I'll use %c, and I'm going to print out, for now, using our old week 2 syntax, treating s as an array: s[0], s[1], and s[2]. And I'm using some copy-paste just for time's sake. This, of course, is not going to do anything all that interesting, but it is going to demonstrate that indeed we have h, i, exclamation point, back to back to back in memory.
And if I really want, I could print it all on one line by getting rid of those newlines. But what more can I do with this syntax? Well, I could take literally the fact that s is the address of the first character in memory. So instead of using this array notation, which we introduced in week two, I could technically go to the address in s. Why? Well, s is the address of the first character of the string, and *s means go to that address. And voila, you're at the first character, by definition of what s is. So I could print out the first character using *s instead of s[0]. How could I print the next one? Well, here's where we can take advantage of the fact that pointers, and addresses more generally, are in fact numbers, and you can do arithmetic on pointers themselves. In other words, there's a concept known as pointer arithmetic, which means that given an address, you can add to it or subtract from it. Multiplying or dividing an address would be weird in most cases, and C doesn't actually allow it, but we can certainly add numbers to an address. So, for instance, if I want to print out the second character of s, that's equivalent to going to s but then moving over one character. So maybe I should do a little bit of pointer arithmetic and write *(s + 1), with parentheses so that, like in math class, we do order of operations correctly. And then down here I could go to s again, but wait a minute, I want to go two characters, or two bytes, away, so *(s + 2). So now I can do make addresses down here. Oh, and I did mess up. Oh, new mistake, unintentional. Yep, I forgot my parenthesis at the very end here. So that was just user error. make addresses again, ./addresses, and now I indeed see h, i, exclamation point one more time, using pointer arithmetic instead of our familiar array notation. So what is that array notation? It's what we would generally call syntactic sugar, which is a very weird way of saying it's just nicer syntax.
After all, no one wants to write code like *(s + 1). It sort of bends the mind a little bit to read and parse all of this visually. Just s[0] is much more straightforward. But what it's really doing is this: the computer is essentially converting that bracket notation for us into this more esoteric but correct version instead. All right, what else can I do? Well, just for fun, for some definition of fun, let's print out three different strings. Recall that a string is a sequence of characters that starts at some address. So let's first print out the sequence of characters that starts at s. Let's next print out the sequence of characters that starts at s + 1. And let's lastly print out the string that starts at s + 2, just playing around with the definition of what these pointers are. Let me do make addresses. And oh, not my day. What did I forget? A semicolon. So if it happens to you, it happens to me, too. make addresses, ./addresses, and now this one's going to be a little curious. I see hi!, then i!, and then just !. Why? Because I'm treating a string literally as what it is, a sequence of characters, but I'm giving printf the address of the first character initially, then of the second character, then of the third. All three of those statements work because all three of those sequences happen to be terminated by the same null character. Even though printing i! and ! alone was not really my intention, nothing stops me from doing it nonetheless. All right. Let me propose one other application of this idea. Let's take a look at our computer's memory here and suppose that we want to start comparing values, because in week one, and even in week zero, we did a lot of that with if and else if and else and so forth.
So let's make this a little more real and also reveal why last week we had to solve an unexpected problem using another string function, namely strcmp, s-t-r-c-m-p. So here, for instance, are two arbitrary variables in memory, i and j. I gave them both the value 50, and maybe they indeed end up there, each taking up four bytes. Last time, recall, we weren't able to compare two values in memory just by using the == operator, unless those values were actually integers. In fact, let's do that. Let me go back into VS Code here, close out addresses, and code up another version of my compare program from the past. This time I am going to use the CS50 library just to keep things simple initially, so I'm going to include both it and the standard I/O library here. I'm going to give myself main with no command-line arguments. And then in main I'm going to declare exactly what we just saw on the screen: a variable i set to 50 and a variable j set to 50. And then we're going to use our old familiar syntax from week one: if i == j, print out something like same\n; else, print out different\n. So it's a super simple program that compares two variables that are obviously going to be the same, but let's do this: make compare, ./compare. They're in fact the same. Okay, so that actually works as intended. But why didn't it work last time when we tried comparing strings, the solution to which was to introduce strcmp? Well, let's go back to VS Code and resurrect that buggy example. Instead of using integers, let's do this, and I'll rename the variables just by convention. My first string will be whatever get_string gives me, so we'll prompt the user for s.
My next string will be called t, by convention, and I'm going to ask the user for that. Then down here, instead of using i and j, which are common for integers, I'm going to use s and t, which are common for strings, and ask literally the same question as we have in the past. All right, let me do make compare, and wow, what's the error? Well, I'll show you the error message. What did I unintentionally do wrong here? Yeah, I'm getting a string, but I'm trying to store it in an int. So this is just frowned upon. Let me change that to what I should have typed the first time: give me a string s and a string t. Now, if I do make compare, we're back in business. All right, let me do ./compare, and I'm going to type in, for instance, hi! and hi!, for s and t respectively, and the program claims they're different, even though they're obviously identical. Now, we've tripped over this before, and recall that the solution was indeed to introduce a function called strcmp. And I explained at a high level that that's because you're not just comparing two values; you've got to compare character after character after character, and that's indeed what strcmp does. So let's do that. Let me go back into this file and include the string library at the top here. And instead of doing s == t, let's check whether the string comparison of s and t equals 0, which, per the documentation for the function, means they're equal, rather than one coming before or after the other. No, I did not get it wrong this time. I caught it. So how do we actually compare the strings this time? Let me do make compare, ./compare, and now type in exactly the same thing: hi! and hi!. And now they're in fact the same. And just to demonstrate that this isn't some fluke, I can type in hi and bye, for instance, and those are in fact different.
So clearly strcmp is doing something useful. But what is it actually doing? Well, first of all, let's make clear that what was a string last week is technically a char * this week. So I can remove that training wheel. I'm still going to include the CS50 library, because, as we'll see by the end of class today, get_string and get_int and all of those get functions from CS50 are actually still useful: it's a pain in the neck in C to get user input without functions like those. But I'm going to get rid of the data type that we thought was called string. This will still work exactly as before. If I do make compare, ./compare, and type in hi! and hi!, we're indeed seeing that they are the same. So what's actually going on inside the computer's memory with strings? Well, I would offer that s probably ends up over here in memory, and then maybe its characters are down here. So notice the duality: s, as of now, is an address, which means it takes up eight bytes, or 64 bits, but the actual characters, it turns out, end up somewhere else in the computer's memory. And this is what's different about an int. The int i and the int j both ended up exactly where the variables were named. But with strings, the variable itself contains not the string but the address of the first character in that string, which I claim could end up anywhere else in the computer's memory. So those addresses might be 0x123, 0x124, 0x125, and 0x126, for instance. Meanwhile, s is going to contain literally the address of that first character. When I create t in memory, it ends up maybe over there, taking up eight bytes of its own, and down here ends up the second thing I typed in, not at the same address but at 0x456, 0x457, 0x458, and 0x459. Now, if the computer were really smart and generous, it could probably notice, oh wait a minute, you typed that thing in already, let me just point you at the other memory. But that's not how it works.
When you call get_string, you get your own chunk of memory for whatever the human typed in, even if by coincidence it's exactly the same. So t's characters are ending up here; s's characters are ending up here. What value should go in t? >> Exactly, 0x456, because that's the address of the first character in t. So we put 0x456 there. So at this point in the story, we have two strings in memory and two pointers there too. And so in fact, if we kind of abstract that away, it's equivalent to s pointing at the chunk of memory on the left and t pointing at the chunk of memory on the right. So why was string comparison actually necessary? Well, in this case, we wanted to make sure that the strcmp function was handed the address in s and the address in t, so that the strcmp function, written by someone else decades ago, could use its own for loop or while loop that essentially starts at the beginning of each string and compares them character by character by character. That's what it's designed to do. By contrast, when I was using == a few minutes ago, and also last week, incorrectly, to compare strings, what was getting compared? Well, if you literally compare s == t, that's like asking, does 0x123 equal 0x456? And that's obviously not true, because those are literally two different addresses. So, the answer I was getting last week and today was correct: those addresses are different. But conceptually, of course, I intended for the program to compare the actual characters in the strings, not simply the addresses thereof. So how do we go about fixing something like that? Well, using strcmp ensures that we can actually compare them character by character, and I don't need to write my own for loop or while loop. The strcmp function does that for me. And we can see this too.
If I go back to VS Code here, get those two strings, and, just for kicks, print them both out using printf with %p and a backslash n, once for s and once for t, what I should see is that even if I type the exact same thing, we're going to get two different addresses when I make this version of the program. Here's my first hi. Here's my second. And the two addresses are, subtly, very much different: the first one ends in b0, the second one ends in f0, both of which are hexadecimal values. Questions on any of that thus far? Oh yeah, question in front. What's that? >> Really good question. When you create a pointer in memory, or really when you allocate a string or an integer in memory, how does the computer decide where to put it? It uses different chunks of memory for different purposes. And in fact, one of the topics we'll look at after break today is exactly that: how a computer decides where to lay things out. It's often very intentional, and addresses are often auto-incremented, so things will go back to back to back when possible, but over time things will start to get messier, especially in larger programs where you're adding and removing values from memory all the time. So more to come. Other questions on what we have done here? All right, before we break, let's do one other example that elucidates perhaps what can go wrong without understanding some of these underlying building blocks. Let's go ahead and create a program this time that aspires to copy one string into another, which seems pretty reasonable at a glance, because it's certainly easy to copy one integer into another: you just set one equal to the other. But that's not going to be the case, it turns out, with copying a string. So, let me open up a new program, copy.c, and I'm going to include a few libraries at the top.
We'll use cs50.h so that we can still use get_string conveniently. We're going to include ctype.h for reasons we'll soon see, but we saw that a few weeks back. We'll include stdio.h as always. And lastly, we'll include string.h. Inside of my main function, which won't take any command-line arguments, let's go ahead as before and declare a string equal to get_string and just prompt the user for a variable s. Then let's go ahead and try to copy s into a new variable t, just like I would copy any two variables, using the assignment operator. Then let's treat the copy, otherwise known as t now, as an array, which we're allowed to do per week 2. So let's say we want to set the first character in t equal to the uppercase version of that same character. So line 12, at the moment, is on the right-hand side using the toupper function from the ctype library, which we used a couple weeks back: pass in the first character of the copy, t, and then update the actual first character of t. So let's capitalize t but not s. Now at the very bottom of this program, let's go ahead and print out the value of s at this point in time, and then print out the value of t at this point in time. And when I go ahead and make this program called copy and run ./copy, let's type in hi exclamation point. No, let's do it in lowercase first: hi in lowercase. Enter. And we'll see, curiously, that s and t both got capitalized, even though the only character I touched was t[0]. I didn't touch s after making this copy. Now, to be clear, what's going on? Why don't we remove one of these training wheels? So string really doesn't technically exist; it's always been a char *. And this string is also a char *. So what's really going on? Well, more clearly now, s is the address of the string that the human typed in. But t is a copy of what? Literally the address of the thing the human typed in, which is going to be one and the same.
So in fact, pictorially, you can think about it this way. If here is my canvas of memory, and the user is prompted for s and types in hi in lowercase, as I did, and it happens to end up down there, what gets stored in s is going to be the address of that memory, which for the sake of discussion is maybe 0x123. So 0x123 is what is stored in s. When I then, on my second line of code, create t, I get another eight bytes of memory, or 64 bits, to store a pointer, a char *, aka string. But what is put in t? Literally what's in s, 0x123. So abstractly it's essentially equivalent to s and t both pointing to the same chunk of memory. So when I do t[0] and go to the zeroth, or first, character of t, that happens to be the exact same chunk of memory that s is pointing to. And so when that lowercase h becomes a capital H, it's as though both s and t have changed. And recall too, if you're enjoying the syntax, if I go back to VS Code here, I did use array notation, but I equivalently could have said *t: go to the address in t, that is, the address of that first character, which is functionally exactly the same. We're just not using the syntactic sugar of the square brackets now. That is why hi is being capitalized for seemingly both versions of it, the original and the copy. So how do we go about fixing this? Well, we need a couple of new solutions, namely two new functions here. malloc is going to be a function that allocates memory, so memory allocation, aka malloc. And then free is going to be the opposite: when you're done with memory, you can hand it back to the computer and say, use this for something else. So using these two functions alone, I dare say we can now solve this problem by making an actual copy of the string, copying hi exclamation point and the null character elsewhere in memory, so that we can actually manipulate the copy thereof. So how do I do this? Well, let me go back to VS Code here.
Let me propose that we get rid of much of what we did earlier, except we'll keep around the declaration of s. But now if I want to create a copy of s, it turns out I'm going to need to ask the computer for as much memory as s itself takes up. So hi exclamation point takes up how many bytes in memory? Four is correct, because you need the null character. So how do we figure this out? You can do this. Let me give myself another string called t, but we don't need that white lie anymore: another char * called t, and set it equal not to s, which we knew was going to go wrong, but to the return value of this new function malloc, which is going to return the address of a chunk of memory for me. How many bytes do I want? Well, technically I just want four bytes, so I could do malloc of 4. And that will literally ask the operating system, running in the cloud in VS Code, for four bytes of memory somewhere in that black and yellow grid I keep drawing on the screen. I don't know where it's going to be, but I don't care, because malloc's return value will be the address of the first byte thereof. Now, it's a little dumb to hard-code four, not knowing what the human's going to type in, but that's okay. We can do this more dynamically: use our old friend strlen to ask the computer what the length of s is, and then add one, because we know that we need an extra byte. Even though the length of hi! in the real world is three, we know that underneath the hood we need that fourth byte, hence the plus one. Now to use malloc, I actually need to add another library here, stdlib.h, for standard library, and that's going to give me access to the prototype for, and in turn the malloc function. Now with this chunk of memory, it's up to me to copy the string. So how do I go about copying a string from s into t? Well, I can do this in a bunch of ways, but let me propose that we do it like this.
For int i equals zero, i is less than the string length of s, whatever that is, i++. And then inside of this fairly mundane loop, let's just set the ith value of t equal to the ith value of s and copy, literally very mechanically, every character from s into t. Then down here, let's go ahead and capitalize just the first character of t by using toupper as before, with or without the syntactic sugar. And then at the very bottom of this program, let's print out the value of s itself, just for good measure, to make sure we didn't screw it up this time. And let's print out the value of t, just so we see that I have in fact capitalized t and only t. But I'm not quite done yet. There's a design flaw here and a mistake, but it's subtle. Does anyone want to pluck off one or the other? check50 and design50 are not going to like this. Yeah. >> We don't actually copy over the terminating character of the string. >> Yes, because strlen always returns the sort of real-world length of the string: hi exclamation point is 3. This would seem to accidentally forget to copy the null character. So I can fix this in a few different ways. I could, for instance, after my loop, do something like t[3] = '\0', single quotes, backslash zero, and manually terminate it myself, because I know it's got to end with a null character. This would be frowned upon too: I shouldn't be hard-coding that index. This is all too sloppy, so don't do this. What I could instead do is go up to and through the length of s, because if the length of s is three but I use less than or equal to, the loop is of course going to iterate four times, because I'm starting at zero as always. So that, I think, fixes that problem. But now the design flaw, which is subtle, but we've seen it before. Yeah. Exactly. It's just dumb of me to be asking the computer, what's the length of s, what's the length of s, what's the length of s, on every iteration.
So this is why we introduced this trick where you can set another integer variable, like n, equal to that string length, and then after the semicolon just keep comparing i against n, which means you're not calling functions wastefully as before. All right, if I didn't mess up anything else, let me go into my terminal. Let me do... oh, did I mess something up? Yes, I did mess something up. I should have put this back as well. Thank you. All right. So, let's go ahead and do make copy. Enter. ./copy. And now I'm going to go ahead and type in hi in all lowercase and hit Enter. And you'll see now that s is unchanged; it's printed out again in lowercase, but t is in fact capitalized here. Now, why is this? Well, in this case, I've got s in memory, but this time, when I allocate t, I use malloc to get a whole chunk of memory here that initially just contains who knows what: garbage values, as we've called them before. I'll just leave them blank here, but it happens to be, for the sake of discussion, at 0x456, 0x457, 0x458, and 0x459. When I then set t equal to the return value of malloc, it's as though t is just pointing to this chunk of memory. Then in my own loop, when I go from zero on up through n, that means copying the h, then the i, then the exclamation point, and, because of the equal sign, also the null character. Admittedly, though, this is getting a little tedious; this is a lot of work just to copy a string. Could we be doing this a little bit better? We actually can, because of the libraries we're including: it turns out there are functions for copying strings that come with C. So in fact, if I go back to VS Code here, I don't actually need this for loop at all, so long as I have allocated enough memory for this string, which I do think I have. I can literally use a function called strcpy, short for string copy, and pass in the destination and the source, in that order.
It almost feels a little backwards, but that's the way it's done to copy s's bytes into t. It's easy to mix them up, but don't. Per the documentation, the destination comes first and then the source string. So, if I do this now, let's do make copy. We're good to go. If I do ./copy now and type in hi in all lowercase, we have still preserved that good property. But let me propose that things can go wrong. And in fact, this is about to make the program look way more complicated than feels ideal. But I've been a little lazy here. There's a bunch of things that can go wrong, for which it's worth knowing about the return values of these functions. So all of this time, it has been possible for certain functions we've been using, get_string among them, to return, confusingly, this value, NULL. Again, humans decades ago decided that one thing would be called NUL; other humans decided this new thing would be called NULL. NUL, N-U-L, pronounced null, is just the null terminator, backslash zero. It is a single byte of eight bits, all of which are zeros. That's been true for a few weeks now. NULL, N-U-L-L, happens to be a special memory address, literally 0x0, at which nothing is ever supposed to live. So whenever I describe the top-left corner as, this is address zero, this is one, this is two, well, humans years ago decided, you know what, let's just waste byte location zero and never put anything there, so that we have a special value to signal that something has gone wrong. So humans just decided: don't use memory address 0x0 specifically, and a few bytes after it. So what does this mean? Well, in my code, all this time and since week one, frankly, things could have gone wrong. So in VS Code here, I'm using get_string and I'm using malloc and I'm using strcpy and all of these print statements here, but I'm not actually adding as many error checks as I should.
So it turns out, if you read the actual documentation for get_string, which in fairness we never told you about until now, in cases of error, get_string can return NULL. Why would it ever have an error? Maybe the human types in such a large paragraph of text that there's no room in the computer's memory for everything they've typed in. Well, you don't want to just get back part of the text and not know that something went wrong. So get_string is designed to return a special sentinel value, NULL, in all caps. That just means: I can't oblige; I can't return you a correct value; here's an error instead. So what I should always have been doing since week one, but we consciously don't because it adds just too much overhead, is check if s == NULL, and if so, abort the program altogether, for instance by returning 1, as we've done before, to signify an error, as in, we cannot proceed because get_string did not work. That is true of malloc too: technically we should say if the address in t also equals NULL, that is, 0x0, we should also return 1, because something went wrong. So, let's do this one more time. It turns out that even toupper here is taking for granted that the human typed in anything at all. What if the human just hits Enter? Well, that's a valid string. It's the so-called empty string, quote unquote. But what is the length of nothing? It's going to be zero. And that's problematic, because if you go to t at the first location, what is actually there? Well, that's actually the null character, which is not something you should even try to capitalize, it would seem. So, what we should really do here, too, is check: only if the strlen of s is greater than zero should you even bother uppercasing that first character. I mean, at best, it makes no sense, because if there's no string, there's nothing to uppercase. At worst, I could break something by touching memory that I should not. And if I may, there's another issue.
Now, on line 15, I'm asking the computer for memory, and it's going to hand me those four bytes. But technically, I'm never giving them back. Even though this program is so short that it's going to quit pretty soon, so it's not a big deal, since the computer will automatically reclaim that memory, in long-running programs, like servers or other things that run for a long time, if you use malloc and ask for memory but never give it back to the computer, never free it, so to speak, your computer might get slower and slower and slower, essentially because it's running out of memory. Not physically, but the computer thinks it's using all of its memory, even if it's not actively in use. You, as the human, know best. And so at the end of this program, when I am completely done with t, I should similarly call free of t, passing in the address that I allocated previously, so that the operating system gets that memory back. If you don't do that, it's what's called a memory leak. If you've ever used a Mac program, a Windows program, an iPhone or Android program that somehow is just getting slower and slower and slower, that is often a symptom of a human having messed up and not freeing memory that they don't actually need anymore. Questions on NULL or any of these kinds of checks? No? All right. Well, as a teaser, in just a bit we're going to reveal when and why things can go terribly wrong by way of a little bit of claymation from our friends at Stanford, but it feels like we're long past a good snack break. So, why don't we go ahead and have some oranges and some fruit snacks, and we'll see you in 10. All right, we are back. So, with memory, a lot of things can go wrong. And in fact, a question came up during the break about whether or not I should have also called free on s, which was the string that I actually got back from get_string. The short answer is no.
This has been a deliberate choice over the past several weeks, whereby CS50's implementation of get_string automatically frees memory that it has given to you once it is no longer needed. So that's a bit of magic underneath the hood; once you no longer use those training wheels, though, that feature goes away. But because I actually used malloc to get my memory for t, I did have to free that specific memory. So the rule of thumb, quite simply, is: if you malloc'd it, you must free it. If get_string malloc'd it, you do not have to free it yourself. But of course, things can go wrong. And thankfully, there are tools via which we can find memory-related errors. And one thing we're going to show you briefly is another tool called Valgrind, which is a nice complement to things like debug50 and printf and the duck for chasing down, specifically in this case, memory-related errors. So in fact, let me go over to VS Code and open up a program I wrote in advance, because it's just not all that useful, but it is demonstrative of some things that can go wrong. And in memory.c we have this code here. We include stdio.h, and we include stdlib.h, the latter of which, recall, is necessary now when you want to use malloc and, in turn, free. And inside of this main function, I'm doing a few things. I am first allocating three integers in kind of an interesting way, because it turns out that malloc takes as its argument the number of bytes that you want to get. Now I know on most systems an integer is indeed four bytes. So if I want space for three integers, I could just do 3 * 4 is 12 and put 12 inside the parentheses here. But that's generally frowned upon, because it would make my code less portable to other systems where an int might not be four bytes. So it turns out you can use this operator, sizeof, and actually ask the computer how big a data type like an int is on this specific system. For chars, you'll always get back one. For ints, you'll usually get back four.
And the same goes for other data types as well, but this is the more dynamic way to ask that question if you want to get three integers' worth of memory. What I'm then going to do is assign, on the left-hand side, the return value of malloc to this variable x, just because, and x itself is a pointer to an integer, more specifically to this chunk of memory, which is a sequence of three integers. This is very arbitrary, and it's only meant to demonstrate things you can do incorrectly, ultimately. But this is how I would dynamically get space for three integers from malloc and store the address thereof in x. So it stands to reason that I could put my first value at x[1] = 72, my second value at x[2] = 73, and my third value at x[3] = 33. Now if some of this is rubbing you the wrong way, it's because this is actually riddled with mistakes already, some of which are old to us. What's the first thing I've done wrong? Even if you have no idea what's going on with line 8, what about lines 9, 10, and 11? What did I do wrong? Yeah. >> Yeah, my indexing is wrong. We've known for weeks now that with arrays, or with array syntax, you always start counting at zero, then one, then two, not one, two, three. So that's an issue. And this is a new detail, but given that I've used malloc on line 8, what other mistake have I made in this version of the program? What's missing? free. So I didn't actually call free, and so this program has a memory leak: it's asking for memory and never handing it back. Now that's pretty good. A few of us were able to just kind of eyeball the code and debug it. But that's not going to be true for all people or all programs, certainly when the programs get larger and more complicated. So a program like Valgrind's purpose in life is to help you spot these kinds of errors.
So for instance, when I run make memory to compile this program and then do ./memory, at a glance it actually seems perfectly fine, if only because I'm not seeing any error messages, whether I compile it or run it. But I do claim that there are at least two bugs that we've seen here. It's just that we're not getting so unlucky that the program actually crashes as a result. So this is a more latent, harder-to-detect bug. But what I'm going to do now is this. I'm going to open up my terminal window in full screen. I'm going to then do valgrind space ./memory so as to run the Valgrind memory checker on this program. So it's similar to debug50, but the name now is Valgrind. This isn't a CS50 thing; this is a common program that programmers use. When I hit Enter, the output's going to be atrocious, frankly. It's way more complicated than it needs to be. They put this number here, which means something specific, but it's just stupid that it's on every line of output. So it's overwhelming at a glance. But once you've trained your eyes to look for useful information, there are a couple of useful insights here. So one: invalid write of size 4, which apparently is somehow related to line 11. So let's go there. Let me just minimize my terminal window, look at line 11 of memory.c, and just see which line that was. Okay, invalid write of size 4. Well, writing means changing a value; reading means accessing a value, so they're sort of opposites. Invalid write of size 4: well, here's why it's generally useful to know how big an int is, namely four. You're trying to write four bytes incorrectly. So why is line 11 invalid, just to be clear? Because the index is off: I'm touching memory that I should not. If I ask the computer for space for three integers, each of which is four bytes, that gives me locations 0, 1, and 2, not location 3.
So you still have to know a little something about programming to be able to make good use of that information, invalid write of size 4, but once you've sort of trained your mind and your eye to catch it, it's like, ah, I'm an idiot, I have to go in and fix that problem. But what else is wrong, based on Valgrind's output here? This is kind of worrisome: leak summary, definitely lost: 12 bytes in 1 blocks. I don't really know what "1 blocks" means for now, but 12 bytes should be familiar, because if you remember that an int is generally four bytes and you asked for three of them, oh, there's my 12. So, somehow I'm losing 12 bytes of memory. Not in a literal sense, but it means that by the time the program finishes, you have not returned, or freed, all of the memory that you asked for. So, this line here is your hint that you've done something wrong with respect to 12 bytes in total. And sometimes you'll see slightly different output here. For instance, we see mentioned up here that 12 bytes in 1 blocks are definitely lost in loss record 1 of 1. Very verbose. But the juicy part is, ah, that line 8 is the source of that error specifically. So there too, it's a little bit of a breadcrumb leading me to the solution for fixing this. So if I go up here, I look at line 8. Okay, there's only so much that I could have done wrong on line 8. If I've malloc'd the memory on line 8, it sounds like I do need to free it later on. So let's fix both of these problems. The first one is just the indexing issue: change the 1, 2, 3 to 0, 1, 2. Let's then fix the second problem by just freeing x at the very end. And just for good measure, there's something Valgrind did not catch, because it doesn't always happen: one other scenario that could go wrong, and it relates to line 8. What should I be doing? >> I am doing an array, but recall that we can use array syntax on chunks of memory. So technically what line 8 is doing is this.
It is allocating 12 bytes of memory from the computer, just because, just to demonstrate how malloc works, and it's storing the address of the first byte in a variable called x. The bracket notation is just the syntactic sugar that allows me to change values at x's address. I could alternatively just use pointers and say go to x and put 72 there, go to x + 1 and put 73 there, go to x + 2 and put 33 there, using pointer arithmetic, where each + 1 moves one int, not one byte, further along. But those are identical, and generally most people would just use square bracket notation, because it's just a little cleaner and easier to read and write. Okay, but back to this question. There's still a subtle bug here, based on our example just before break. What should you be doing anytime you call malloc or get_string and a few other functions, for that matter? Did I hear the answer? Checking for NULL, right? Because if malloc has an error, because there's not enough memory for whatever reason, you should not proceed to touch that memory, because it might be the NULL address, that is, 0x0. So what you should really be checking is, well, if x == NULL, there's no more work to be done here; let's just return 1. And only if we get all the way to the bottom should we return 0 to signify explicitly that the operation was in fact successful. All right, with that said, let's go back down here. Remake memory. No error messages from the compiler. ./memory. That too seems okay, but it was fine the first time. Let's now run Valgrind. Let me maximize my window and run valgrind ./memory, crossing my fingers as always. And now this is actually pretty good. It's much shorter output, even though it's just as scary at a glance, and most of this is fluff and not very revealing. Heap summary: in use at exit: 0 bytes in 0 blocks. So it looks like all heap blocks were freed; no leaks are possible. Heap is a word we'll come back to, but this means there's nothing wrong.
In fact, zero errors, which is a good thing. So in short, Valgrind is among the most arcane programs we're going to use. Its output was really designed for those more comfortable, if you will. But there are still juicy insights there. If you just look for things that point you to a specific file and line number, odds are that will lead you to even the most subtle of bugs. In fact, another type of bug is when we do indeed touch memory we shouldn't. So, let me zoom out on that, clear my terminal, and open up another program, or maybe write this one real fast, incorrectly. So, let me create a program called garbage.c to demonstrate what we've generally called garbage values, that is, values that are still in memory but that I didn't necessarily put there myself. I'm going to include stdio.h. I'm going to include stdlib.h. And then, actually, no need for stdlib this time. Let's do int main(void). And inside of main, let's give myself an array of way too many exam scores or whatnot. We used to do just a few, but let's say there are 1,024. Then let's go ahead and do for int i = 0, i < 1024, i++, and in here, let's go ahead and print out, using printf, each of those scores. Of course, I have clearly forgotten to do something in this program, which is what? I haven't actually put any scores there for real. I've asked the computer, give me an array for 1,024 integers, but I've not used get_int or even manually typed in any of my quiz scores, which we did in the past. That's because I'm intentionally trying to show us garbage inside of the computer's memory. What this loop on line 8 is going to do now is literally print out the first int, the second int, the third int, all 1,024 ints, but all of them should be garbage values, because I myself haven't put anything at those addresses yet. So, let's go ahead and make garbage.
Let's go ahead and maximize my terminal window just to see more on the screen. Do ./garbage. It's going to be super fast output, 1,024 values in all. There is a lot of garbage output. So when we talk about garbage values in the abstract, like here's just some random zeros, a 25, a 32,000, a negative number, and so forth, that's because those are essentially remnants in the computer's memory of stuff that might have happened previously, not necessarily by me in this moment, which is to say you just shouldn't touch that memory at all whatsoever. So now we're seeing garbage values for the actual first time. Let's consider another example of a program that does contain potential memory errors. And let's look at this too. So this is not really a useful program; it's meant to be demonstrative of some of these concepts. So here we have a program that takes no command-line arguments. Up here we've got a pair of lines that declares two pointers but doesn't yet initialize them to any values. And that's fine. You don't have to have an equal sign with every variable; you just eventually should assign it some value. But this just tells the computer: give me a variable x that's going to store the address of an int, and give me another variable y that's going to store the address of another int. Okay, what happens next? Well, on this line of code, in this simple example, we're allocating enough space for a single integer, just because it's a stupid exercise. There's no reason to do this other than to demonstrate how malloc works for the moment. malloc returns the address of that chunk of memory, so that's what goes in x. So x is now pointing at four bytes of space somewhere in memory where it can certainly put a value. How do we do that? Well, if you do *x, using the dereference operator, that means go to that chunk of memory and put the number 42 there. That's totally valid.
This next line says go to the address in y and put the unlucky number 13 there. Unlucky quite literally, because what is y pointing to at this moment? Just a garbage address. Why? Because if you don't initialize y, who knows what it's pointing to? Maybe it's 0, maybe it's 25, maybe it's 32,000 or a negative number, just like we saw in the previous example. You have no idea what values are going to be in x and y unless you yourself put those values there. So this line is highlighted in red, because bad things are going to happen if you try to dereference an invalid or bogus pointer. It's even worse than just reading variables that might not have values: if you dereference an address and try going to some random place, the computer is generally not going to like that. And in fact, our friends at Stanford wonderfully brought this particular scenario to life. Even though this example is a bit contrived, just to fit it all on the screen at once, it is the case that bad things happen if we don't assign valid values to our pointers before using them, as we'll see now in the form of some claymation. So here I give you Binky, a bit of claymation from our friend Nick Parlante at Stanford. If we could dim the lights, unnecessarily dramatically. >> Hey Binky, wake up. It's time for pointer fun. What's that? Learn about pointers? Oh goody! Well, to get started, I guess we're going to need a couple pointers. Okay. This code allocates two pointers which can point to integers. >> Okay. Well, I see the two pointers, but they don't seem to be pointing to anything. >> That's right. Initially, pointers don't point to anything. The things they point to are called pointees, and setting them up is a separate step. >> Oh, right, right. I knew that. The pointees are separate. So how do you allocate a pointee? >> Oh, thanks. >> Okay. Well, this code allocates a new integer pointee, and this part sets x to point to it. >> Hey, that looks better.
So make it do something. >> Okay. I'll dereference the pointer x to store the number 42 into its pointee. For this trick, I'll need my magic wand of dereferencing. Your magic wand of dereferencing? Uh, that's great. This is what the code looks like. I'll just set up the number, and hey, look, there it goes. So doing a dereference on x follows the arrow to access its pointee, in this case to store 42 in there. Hey, try using it to store the number 13 through the other pointer, y. Okay, I'll just go over here to y and get the number 13 set up, and then take the wand of dereferencing and just... Oh, hey, that didn't work. Say, uh, Binky, I don't think dereferencing y is a good idea, 'cause, uh, you know, setting up the pointee is a separate step and, uh, I don't think we ever did it. Hmm, good point. >> Yeah, we allocated the pointer y, but we never set it to point to a pointee. Hmm, very observant. >> Hey, you're looking good there, Binky. Can you fix it so that y points to the same pointee as x? Sure, I'll use my magic wand of pointer assignment. Is that going to be a problem like before? No, this doesn't touch the pointees. It just changes one pointer to point to the same thing as another. Oh, I see. Now y points to the same place as x. So wait, now y is fixed. It has a pointee. So you can try the wand of dereferencing again to send the 13 over. Okay, here it goes. Hey, look at that. Now dereferencing works on y. And because the pointers are sharing that one pointee, they both see the 13. Yeah, sharing. Uh, whatever. So are we going to switch places now? Oh, look, we're out of time. I can only imagine how long that took, Nick. But the key detail was that bad things happened to Binky when we executed this line of code: dereferencing an invalid pointer that had no true value assigned. It was just some garbage value. Now what's the solution? Well, as Nick proposed, just don't do that. Instead, at least do something sensible, like assign y equal to x, that is, y = x.
Not to make a copy of anything per se, but to literally point y at the same location in memory as x. Then a line like *y = 13 is perfectly valid: you go to that address, which happens to be the same one holding the 42, and that's why in the claymation we saw the 42 become a 13 instead. So again, at the end of the day, this is only demonstrative of these basic building blocks that we now have at our disposal, but also of how easy it is to do things incorrectly. This is one of those "with great power comes great responsibility" situations. C is one of the languages that is incredibly high-performing. It's so close to the hardware, and you have so much control over the memory and operation, that you can write really good, really fast code. And that's why, even all these decades later, it's among the most omnipresent programming languages in the world. At the same time, you can really screw things up. So much of today's software that gets hacked in some way or crashes for some reason fails because humans have missed some simple mistake like this that happens to relate to memory. More modern languages that we'll soon see, like Python, or Java if you studied it in high school, don't give you this much control over the computer's memory. There are many more defenses put in place to protect you and me from ourselves, so to speak. But you pay a price: some of those languages tend to be slower and less performant. Yeah? >> What is the difference here, now that we're playing with memory? >> This will become clear this week and next. In fact, some of the examples on which we'll end today will motivate needing finer-grained control over what's going on inside of the computer. When you want to deal with files, for instance, you're going to need to know a little something about memory addresses and where things are, as you will when you want to build structures in memory beyond the complexity of an array.
In fact, next week we're going to start building two-dimensional structures in the computer's memory to represent the equivalent of a family tree, for instance, or trees more generally, which can store data in a more efficient way. Up until now, all we have is arrays. With arrays you can achieve something like binary search, but we're going to see there are things you can't do with arrays, especially if speed is important. >> But I was saying, for example, if you had asked me to do this last week, I would have just written x equals 13 or something, like assigning a variable. >> Correct. Last week, if you just said int x = 13 or int y = 42 or whatnot, totally fine. And again, this program's sole purpose in life is to demonstrate how you can make mistakes. In and of itself it's not useful, but it's representative of how we're going to start using this syntax, not only in this week's problem set but next week as well. All right. So with that claim made that we can do a lot of damage, let's consider how pointers and knowledge of memory addresses can actually solve some useful problems. Can we get one volunteer to come on up and help pour a drink? Come on up. All right. What is your name? Come on over. >> If you want to say a quick hello to the group? >> I'm Olivia. >> Okay, and a little something about yourself? >> Oh, um, I live in Canada. >> Okay, welcome. Well, come on over here, Olivia. We have two glasses. Well, really three glasses. We also have these fancy Ray-Bans that have cameras built in, whereby we can sort of capture your point of view. If you're comfortable, we'll put these on. There are no lenses in them. The white light will mean we're recording, hopefully a memorable moment. This battery too is dead. All right. We don't have a backup for the backup, so we're going to pretend that this part never happened. So, Olivia, we have two glasses here for you.
And I'm going to go ahead and pour some colored liquid into both. So we've got some blue liquid here in this glass. All right, we'll fill this up here. And then in this one, we're going to pour this orange liquid. And at this point in the story, I'm going to exclaim, "Oh no, I accidentally put the wrong liquid in the wrong glass. I got this backwards." So what I'd like you to do is swap the values in these glasses, so that the blue goes into that glass and the orange goes into this glass. >> Without mixing it, or...? >> Without mixing it. So, you're hesitating. Why? >> Well, it would be hard to do... >> If you could talk into the mic. >> Oh, it would be hard to do without mixing the two, because you don't have anywhere to put the other one. >> Of course. So in the real world, this is not really solvable unless, for instance, we have a temporary variable, if you will, like an empty glass in which to do this. So here is your third variable, if you want to go ahead now and get the blue into that one and the orange into that one. No pressure. All right. So we're putting one value into the temporary variable. We're putting the other value into the original variable. Okay. And now you're probably going to take, yep, I'm guessing, the temporary value and put it into the original variable. That was very well done. Maybe we can give Olivia a round of applause for just that. Thank you. We have a little parting gift for you here too. So the goal here really is to create a memorable moment: remember the time Olivia tried to swap two values and needed a temporary variable. That's the takeaway. So why does this matter in code? If we want to apply the same principle, we're going to need somewhere temporary to put one of those values before we can make the swap happen. The catch, though, is that if we don't do this intelligently, it's just not going to work in C, unless we take advantage of some of these new capabilities.
So in fact, I'm going to go over to VS Code here and open up a program called swap.c that I wrote in advance, whose purpose in life is simply to swap two variables' values. I've got stdio.h at the top so I can use printf. I've got the prototype for a swap function, which might as well be Olivia in this case, that's going to take two inputs, a and b, or two glasses, and swap their values. That's ultimately its purpose. Inside of main, though, I'm going to set two variables, x and y, equal to 1 and 2 respectively. I'm then, just as a point of clarification, going to print out that x is such-and-such and y is such-and-such. Then I'm going to call the swap function, aka Olivia, to swap the values of x and y. Then I'm going to print out x is this and y is this, so that hopefully I'll see that they've indeed been swapped. At the bottom of this file, we have the actual swap function. As you might expect, it takes two inputs, a and b, both of which are integers. I could have called them anything I want. The first thing this function does is grab an empty glass called temp and put a, or the blue liquid, into it. Then we put into a the value of b. So we've sort of lost the value of a at this point, except that we did make a copy of it in temp. And then lastly, we put into b the temporary variable. At the end, the temp variable is empty, so to speak. Technically it still has a copy of the value, but it's no longer useful, because the job is done: a has become b, and b has become a. So I dare say this is the literal translation of what Olivia just did, and I like the logic of it. However, when I actually run this program, something goes awry. Let me go ahead and do make swap, then ./swap, and I'll maximize my window. I should see, hopefully, that x is 1 and y is 2, and then x is 2 and y is 1. But no. Even though I literally translated into code what Olivia did, this didn't actually seem to work. And why is that?
Well, it turns out that this version of the program is not right, in fact because of issues of scope. We've talked about scope before, generally in the context of where a variable lives. We've said that a variable only exists in the most recent curly braces that you opened up for it. And that was true; it's just a colloquial way of describing what scope is. But scope comes into play here because a and b, insofar as they are the arguments or parameters of the swap function, have a different scope than x and y. That still follows the same definition: they're inside of different curly braces than x and y are. So I may very well be swapping a and b, but I'm not having any impact on x and y. Why is that? Well, in C, all this time, any time you pass arguments to a function, you are passing those arguments by value, so to speak. You're literally passing copies of the variables to the function you are calling. What does this mean? More concretely, if this is a photograph of a chunk of memory inside of the computer, and we zoom in as we've done before and abstract away all of the bytes from top to bottom, what's really happening inside of the computer's memory is that we're using some of it for x and y and some other memory for a and b. But how does that in fact happen? Well, to a question that came up before the break, memory in a computer is actually assigned in a somewhat deliberate fashion. Let's think of this rectangle as representing my computer's whole chunk of memory. Generally, when you run a program, whether with ./something, or on a Mac or PC by double-clicking, or on a phone by single-tapping, all of the zeros and ones that were compiled by the company or person who made that program are loaded into the top of the computer's memory, so to speak. This is just an artist's rendition.
There's no notion of top or bottom per se, but the program is loaded into this chunk of memory at the very edge of the computer's memory, aka machine code: the zeros and ones that compose the actual program. That's where they go. They're copied from the hard drive or the SSD, whatever you know it as, the persistent storage, and put in the computer's RAM, or random-access memory, which is the faster memory where programs and files live while you are using them. Meanwhile, if your program, or the program you're using, has any global variables, global in the sense that they're defined outside of main and not inside of main or other functions, they end up right below that machine code by convention, just so they're accessible everywhere. Meanwhile, there's this big chunk of memory below that called the heap. The heap is the chunk of memory that malloc uses to allocate memory for you. So the first time you call malloc, it's probably going to give you this chunk of memory; the second time, this chunk; the third time, this chunk, and this chunk, and so forth, back to back to back in memory. But malloc is going to manage all of that for you. You don't have to worry about where it's coming from, only that it's coming, more generally, from this big heap area. It turns out that, the way computers are designed, the heap sort of grows downward. Again, there's no notion of up or down inside of the computer, but it grows in this direction. It'd be nice to make use of this other area of memory too, and that's what's called the stack. The stack is the area of memory that's used any time you create local variables or call functions. So again, malloc uses memory from up here, and functions and their variables use memory down here, just because this is what humans in a room decided years ago is how the computer's memory would be used. Therefore, the stack grows sort of vertically, much like stacking trays in a cafeteria or the dining hall.
They go from bottom to top in this model. All right. Let's consider for the moment just how the stack is used, because we're using a main function and a swap function in this program. I claim that those functions are going to use memory down here. Well, how are they going to use it, and how is this in fact bad for our current goal? When you call the main function, it uses this chunk of memory here. Specifically, if main has any arguments, like command-line arguments, or any local variables, they end up down here in memory. Meanwhile, when main calls swap, swap gets the next available chunk of memory above it, so to speak, and any of its arguments or local variables end up there. When swap is done executing, it's as though that memory disappears. The zeros and ones are still there, but the computer can now reuse that same chunk of memory later. Ergo, garbage values: as functions get called and return, going up and down conceptually, you get remnants of previous values in the computer's memory. But let's focus on main for a moment. In main in this program, recall that I declared two variables, x and y, x getting the value 1 and y getting the value 2, per these two lines of code. Then I called the swap function. So swap is going to get its own chunk of memory, more technically called a frame of memory. Inside of that frame, it has two arguments, a and b, and a local variable called temp. So I'll draw them as such. When you actually call swap, passing in x and y, x and y are passed in by value, that is to say, as copies. So a becomes a copy of x, and b becomes a copy of y. This prototype for swap just makes clear that it takes two arguments, a and b, both of which are integers, in that same order. So x, y lines up with a, b. So what happens, then, inside of the swap function, if a is a copy of x and b is a copy of y?
Well, at the moment they're equal to 1 and 2 respectively. But consider this first line of code: int temp = a. So temp takes on the value of a. Next line of code: a = b, so a gets the value of b. Meanwhile, b = temp, so b gets the value of temp. Now, temp still has a copy of 1. So it's not quite analogous to the liquid, because that glass is clearly now empty, whereas temp still contains remnants of what it once held. But the key here is that a and b have successfully been swapped. If I were to print out a and b, I would see that they've been swapped. But what has obviously not been swapped in this story? No one has touched x or y. When swap returns, especially since I don't even print anything in swap, x and y are unchanged. The copies, a and b, were swapped, but not the original values. And that's the essence of the problem with this simple example of swapping values: I was passing by value. But as of today, we have a solution to this problem. Before today, if I had asked you to write a function that swapped two values, you could not physically do it in code, because you had no way of expressing the solution. But now we have the ability to pass by reference, that is, to use pointers and addresses more generally to tell the function how to go to one address and do something there, and how to go to another address and do something there. How do I express this syntactically? It's going to look a little scary at first glance, but it's just an application of today's new building blocks. This bad version of the program, where a and b are both integers, just needs to change so that they are addresses of integers. In other words, give the function a sort of treasure map that leads it to the actual x and y by saying that a is now not going to be an int per se, but the address of an int, and b is going to be the address of an int. And now, to use those values, you can say the following.
int temp = *a: store in temp whatever is at location a. *a = *b: go to location a and put there whatever is at location b. *b = temp: go to location b and put in the temp value. And here is a perfect example of where this use, and overuse, of the star, or asterisk, operator is frankly cognitively confusing, because we use star for multiplication, we use it for declaring a pointer, and we use it for dereferencing a pointer. Ideally, humans years ago would have come up with another symbol on the US English keyboard to represent these different ideas. But this is where we're at: we're using the star for different things in different contexts. So this star, in int *a, just tells the computer that a is going to be a pointer, the address of an int. This one, in int *b, tells the computer that b is going to be the address of an int. A star with no data type to the left of it means go to that address, as does every other example thereof. So what's happening this time? If we look at the diagram again, x and y are still 1 and 2 respectively. swap gets called, and it now gets the address of x and the address of y. Pictorially, we might draw that as follows: a is pointing to x, and b is pointing to y, where the 2 is. Technically the addresses are something like 0x123 and 0x12-whatever, but who cares? We're just going to abstract them away now with actual arrows, or pointers. The beauty of this, then, is that if we look at the swap function, int temp = *a means start at a and go there, sort of Chutes and Ladders style, if you're familiar with the game, and you find the value 1. So you put the value 1 inside of temp, which is why it's there. Meanwhile, this next line of code says go to a's address, go to b's address, and copy the latter to the former. This star means go to a; this one means go to b, where you find the 2. So put the 2 where a is pointing. Lastly, go to b and put temp there. That's easy: go to b and put temp there, which is why we now have the 1.
And the beauty of this now is that when swap is done executing, this frame of memory sort of goes away conceptually. The zeros and ones are still there, but the frame is done being used, and we have now mutated the actual values of x and y by giving swap a proverbial treasure map of the addresses of x and y, not copies of their values. So hopefully this is the beginning of an answer to why this stuff is useful: you can now solve a whole new class of problems, and even more next week. Other questions, though, on any of the syntax, pictures, or the like? This is a good use of pointers now, instead of a bad one. All right. So with that new capability, let us consider how things can still go wrong and why, indeed, with this power comes that responsibility. Even if the bad version of the code is fixable via this good version, we've still left a big, glaring problem in the diagram itself. Designing something that grows this way toward something that grows that way is not going to end well. Why? Because the more you call malloc, the more memory gets used up here. The more functions you call, the more memory gets used down here. And at some point they will collide, because the computer only has a finite amount of memory. So how do you avoid this situation? Honestly, you kind of don't. You just make sure that you minimize how much memory you're using, by calling malloc only as much as you need to and not asking for a million bytes of memory just because you might need them. You only allocate the memory you need. And you try not to call functions again and again and again without them finally returning.
So if you ever did something recursive a couple of weeks ago where you accidentally called a function that never had a base case, that never divided and conquered and actually shrank the problem, you could overflow the stack into the heap, so to speak, by just using too many frames of memory. That's just a mistake by the programmer in the program itself. So if you've ever heard the phrases heap overflow or stack overflow, as some of you might have, and there's a very popular website called Stack Overflow, this is the etymology thereof. Stack overflow refers to this representative big problem with a computer's memory if you're not mindful of how you're using it. And this is just the way it is: if you've got a finite amount of anything, that resource can eventually run out, at which point the program will crash or something else might very well go wrong. In fact, these are more specific examples of what are called buffer overflows. A buffer is generally just a chunk of memory, like an array, and a buffer overflow happens when it gets overflowed with too many values, like allocating a small array and trying to put too many numbers therein. And in fact, you can see this very simply if we take off the last of our training wheels. For instance, these are the functions in the CS50 library: get_int, get_string, and so forth. It's harder to take off these training wheels, because C does not fundamentally make it that easy to manage memory yourself. So, for instance, let's focus for just a moment on get_int. I'm going to go over to VS Code here in just a second and create a very simple program called get.c, whose purpose in life is just to get an integer, much like CS50's own function. So in get.c, I'm going to propose that we write a program that does a little something like this.
Include cs50.h, include stdio.h, and then inside of main, let's declare an int n, set it equal to get_int, and just ask the user for the value of n. Then let's print out n's value verbatim, by doing printf with quote-unquote %i, then comma, n. This program is simply using the get_int function to get an int and store it in n. So let's run it: make get, ./get. Type in a number like 50. Seems to work. And yes, I think this program is correct, even though it is using the CS50 training wheel of get_int. Let's stop using get_int, though. It turns out that you don't have to use get_int if you instead use a function called scanf, which scans formatted input, which just means it reads something from the keyboard into memory. This is essentially what get_string and get_int use, although that too is a bit of an oversimplification. But let's use it here now as an opportunity to get rid of the training wheel of the CS50 library altogether. Down here, instead of using get_int, let's declare a variable n but not give it a value yet. Let's now print out a little prompt, just to tell the human what we want: we want them to type in a value for n. And now let's use this new function called scanf and say: scan from the user's keyboard an integer, represented by %i, our old friend the format code, and please put the integer that the human types in the variable n. This is slightly buggy, though, because if I want a function like scanf to be able to change the value of a variable, just like the swap function, I can't just pass in n. I need to pass in the address of n here. In fact, let's take a moment now to go into the swap function, which we knew to be buggy before, and actually update it to match what we saw on the slides. I claim that the problem is that we're originally passing x and y, 1 and 2, into the swap function, and therefore we're passing in copies.
But what if we change the swap function to take, indeed, the address of an int and the address of an int? Let me change my prototype accordingly, because that too must be changed. Then, when I change this function to take in those pointers, I need to change my code to dereference them. But there's one last thing I need to do. On this line, I'm still passing x and y to swap, which is literally the values of x and y. If I want to pass in the address of x and the address of y, what other operator do I now need? The ampersand: &x and &y, to pass in sort of the treasure map, the pointers to those two variables' locations. So if I open up my terminal window now, do make swap on this version, then ./swap, and cross my fingers, this new and improved version of swap, as claimed, does actually swap the values, the key being that swap now has access not to x and y per se but to the addresses of x and y. So if we now close out swap and go back to get.c, here is the same principle applied to scanf. scanf comes with C, and its purpose in life here is to scan an integer from the keyboard and put it wherever you want. You can't just give it the variable name, because it's going to get a copy of whatever garbage value is in there. You have to say: put this answer at the address of n itself. So lastly, after this, let me print out "n: " and then %i again as the format code, backslash n, comma, n. This line is just my prompt, because I want the human to know what they're being asked for. This line is printing out n colon and then the actual value. So the only interesting part here is that I'm declaring a variable called n, not giving it a value myself, and using scanf instead of get_int to scan, so to speak, an integer from the keyboard and put it at the address of n, so that scanf has access to that variable. So if I now do make get, without any CS50 library, then ./get, and type in the number 50, I indeed see the number spit back at me.
And just to be clear, printf uses these format codes, %i and so forth, and scanf uses essentially the same format codes. That's why I'm using %i in both places; both functions, per their documentation, are designed to do just that. So this is great: we've gotten rid of get_int. The catch is that getting rid of get_string is much, much harder. Why? Well, let's try another example: let's try to get a string from the user instead of just an int. We'd call it string s, but wait a minute: the CS50 library is not included, so we need to use what string actually is. So char *s means give me a variable that's going to store a string. Let's print out a prompt for s, just for clarity. Now let's use scanf to scan a string with %s and put it at location s. Then let's print out the value of s, passing in s. Now, there's something a little bit different here. Notice that I've deliberately not used an ampersand before this s. Why, even though I did before the n? Yeah? >> Yeah. So I want to pass in the address of the string, which is, if I may, already s. s by definition is the address of some string; that is what a char * is. Or rather, it's the address of a character, but we know already that if you lead a function to the first character, it can find the end of the string thanks to the null character, except that that's not going to be wholly true here. But I don't want an ampersand here, because if s is an address, &s would be the address of an address, which is actually a thing called a pointer to a pointer, but none of that today. It's going to be correct as written here. n was an integer, so I needed its address. s is already a pointer by definition; it's a char *, so I don't use the ampersand here. But the problem is this. If I now do make get, then ./get, and type in a word like, how about, hi!, okay, it did work.
Let me try something even bigger. Let's just hold this key down a lot. Let's type in a really long string. Oh, come on. It's always a gamble to see if I've done this long enough, but okay, it didn't break. You'd like to think that this is correct, but let's do this: valgrind ./get, Enter. Let me maximize my screen, and let me type in a value for s. While Valgrind is running, I'm going to type in hi!. And now let's scroll up to the top of this. A lot of errors seem to have happened here: "Use of uninitialised value of size 8," "Use of uninitialised value of size 8." A lot of stuff is going wrong here, apparently on what looks like maybe line 4, which is quite early in the program. Well, actually, that's not it; we're having issues with multiple lines of code here. But why? Let's focus on the code alone for a moment. Line 5 is giving me what? A variable called s, the address of a char. But what is s right now? What value is in there? >> It's a garbage value, because there's no equal sign involved. I'm just saying give me space: give me eight bytes, 64 bits, to store the address of a character. But if I don't use the equal sign and actually put anything there, it is in fact just some garbage value. The printf is uninteresting; it's just printing out "s: ". scanf, though, is saying go to this address and store the characters that the human typed in. But that means following the wiggly line that we drew on the screen before, because we have no idea where s is pointing. It might be there, there, there, or there. You're putting the string at a bogus location in memory; you haven't actually allocated any memory. So when you then try to print it, you're just trusting that you're going to memory that you control. So what is the solution here?
Well, there are a few different ways we could solve this. We could do something like this: actually allocate space for, say, four bytes, so that the human can safely type in hi! with room for the null character. We could change s to actually be an array of size four, because we can treat arrays as though they're addresses and addresses as though they're arrays. It turns out that syntactic sugar really goes in both directions. This too would solve that problem. Or, better still, we wouldn't use scanf at all, because how do I know how many characters the human is going to type in? This was a question too that came up during break. Well, hi! will fit in four bytes with the null character. bye! will not. So maybe I need five. Well, what if they type in a longer word? Six. Well, maybe longer words need seven. Well, maybe a hundred, or maybe a thousand, or 10,000, or 100,000, or a million. At some point, you've got to draw a line in the sand and say you can't type in something longer than this. And you see this in applications all the time. On the web, you can sometimes only type so many characters into forms. And that's for various reasons, among them this one. get_string, though, will handle almost any number of characters, because of the way we implemented it: it takes baby steps through the input. When you type in a word, or even a paragraph, on the keyboard, we, the implementers of get_string, essentially call malloc again and again and again, just asking for one more byte if we need it, one more byte if we need it, one more byte, so that you don't have to worry about doing that. The problem is, if you were to write code yourself without the CS50 library or someone else's equivalent library, you'd have to decide how many bytes you want to allow, and you'd have to trust that the human is not going to mess around and type in more characters than you actually expect.
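The first fix mentioned here, allocating four bytes up front, might look like the following sketch. The helper name read_short_word is hypothetical, and it reads from any FILE * (pass stdin for the keyboard) so that it can be tested. The key detail is the field width in %3s, which caps how many characters fscanf may store so the fourth byte remains for '\0'; with a bare %s, a long word would still overrun the allocation.

```c
#include <stdio.h>
#include <stdlib.h>

// Reads one word from in into freshly allocated memory of exactly 4 bytes:
// up to 3 characters plus the terminating '\0'. The width in "%3s" stops
// fscanf from writing past the allocation; longer words are cut short.
// Returns NULL if allocation or reading fails. The caller must free().
char *read_short_word(FILE *in)
{
    char *s = malloc(4);
    if (s == NULL)
    {
        return NULL;
    }
    if (fscanf(in, "%3s", s) != 1)
    {
        free(s);
        return NULL;
    }
    return s;
}
```

Called with stdin, typing hi! yields "hi!", while typing hello yields just "hel", with the rest left unread in the input.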
So what's happening with all of these examples thus far? If you think of your memory as kind of a minefield of garbage values, that wasn't a problem with n, because we told scanf to go to n's address and put the number 50 there, and it fits. That's fine because an int is always four bytes in this case. But who knows how many times the human is going to hit the keyboard when typing in a string? Could be three, or four, or a million, or anything else. So when we declare s here to be a pointer, it takes up eight bytes, which, per Oscar the Grouch here, are eight garbage values that collectively represent that address at the moment, because we've not assigned it any other value. So if we tell scanf to go to this address and store hi or anything else there, who knows where it's going to end up in memory, hence the squiggly line again, and the program will quite often crash. I didn't trigger a crash because I didn't type in a long enough string, but if I tried hard enough, it would eventually crash, because you're touching memory that you yourself did not allocate as an array, via malloc, or by some other mechanism. So what is the solution? Honestly, don't use C for user input like this unless you're prepared to implement that complexity yourself. Use the CS50 library or some other library. This too is why in two weeks we're going to switch to Python, because Python makes life so much easier when it comes to basic things like getting user input, as do many other modern languages. But those languages just have code that other humans have written to solve these problems for you. So these problems still exist, but they'll be abstracted away for you. All right, let's now tie this together with where we began, which was to convey that ultimately we want the ability to actually access files. And so we now introduce a topic called file I/O, I/O for input and output.
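For contrast, here is a rough sketch of the grow-as-you-go idea behind a function like get_string: read one character at a time and ask realloc for one more byte as needed. The name read_line is hypothetical, and this is not the CS50 library's actual source; practical implementations often grow the buffer in bigger steps (for example, doubling its size) so as not to call realloc for every character.

```c
#include <stdio.h>
#include <stdlib.h>

// Reads one line of any length from in, growing the buffer by one byte per
// character with realloc. A sketch of the idea only, not CS50's get_string.
// Returns NULL at end of file or if memory runs out. Caller must free().
char *read_line(FILE *in)
{
    char *buffer = NULL;
    size_t length = 0;
    int c;
    while ((c = fgetc(in)) != EOF && c != '\n')
    {
        // Room for this character plus the terminating '\0'.
        char *bigger = realloc(buffer, length + 2);
        if (bigger == NULL)
        {
            free(buffer);
            return NULL;
        }
        buffer = bigger;
        buffer[length++] = (char) c;
    }
    if (buffer == NULL)
    {
        // Nothing read: either end of file or an empty line.
        if (c == EOF)
        {
            return NULL;
        }
        buffer = malloc(1);
        if (buffer == NULL)
        {
            return NULL;
        }
    }
    buffer[length] = '\0';
    return buffer;
}
```

Unlike the four-byte version, this never needs a line in the sand: a 1,000-character line simply causes 1,000 small reallocations.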
A file is just a bunch of bytes stored on disk, where disk might mean a hard drive, the thing that spins around with a platter with lots of zeros and ones on it, or an SSD, a solid-state drive, which has no moving parts and is nowadays generally where our data is stored long term. RAM, random-access memory, the yellow pictures we've been drawing, is by contrast volatile. That is to say, when you lose power, or the battery dies, you lose everything in RAM. A hard drive or solid-state drive is persistent, or nonvolatile, storage, which means that when the power goes out, thankfully, you don't lose all of your documents and essays and so forth, whether they're on your Mac or PC or somewhere in the cloud. But we haven't yet seen any code via which you yourselves can create files. Literally every program we've written, even the phone book example last time, when I typed in names and numbers, lost its data as soon as the program quit. With file I/O, though, we now have the ability to start creating, saving, editing, and deleting files, much like you would from the File menu of Google Docs, Microsoft Word, or the like. Here are just some of the functions that come with the C programming language: they allow you to open files (fopen), close files (fclose), print to a file (fprintf), scan from a file (fscanf), read a file (fread), and write to a file (fwrite). Lots of different functions, some of which we'll explore this coming week. But why don't we first use them to solve a problem here in VS Code. So let me go ahead and close get.c. Let's go ahead and open up a new program called phonebook.c, but ultimately implement a persistent version of it that doesn't just get deleted from memory when the program quits. Only because it will make life easier, let's still include the CS50 library for this. Let's include stdio.h for this. And let's include string.h for this.
Then inside of main, with no command-line arguments, let's go ahead and open a file called phonebook.csv. CSV stands for comma-separated values. Many of you have probably used them in the real world. They're like very lightweight spreadsheets, where things are effectively stored in rows and columns, with the columns represented by just commas between values. And we'll see this in just a moment. How do you open a new file called phonebook.csv? Well, I'm going to do FILE *file = fopen("phonebook.csv", "w"), with "w" for write. So what's going on here? fopen is opening a file called phonebook.csv, whether or not it exists yet, and it's opening it in such a way that I will be allowed to write to it. Hence the quote-unquote w: per the documentation, it means I can write to this file and not just read it. The return value is going to be stored in a variable called file, all lowercase by convention, but its type is technically a struct called FILE, in all caps. It's a little weird. It's among the few things that are fully capitalized in C. It doesn't mean it's a constant or anything like that; it's just how someone implemented it years ago. This is giving me a pointer to, essentially, the contents of that file. That's a bit of a white lie. It's technically giving you a pointer to a chunk of memory that represents that file, but for all intents and purposes, it's a pointer to the file for now. Now let's go ahead and ask the user for a name and number to add to this phone book. Let's do char *name = get_string, with a prompt of quote-unquote name, to prompt the human for that. Then char *number, prompting them for that the same way. And I could be using the string data type, but I'm trying to at least remove what training wheels we don't technically need anymore. And now that we've got a name and number in variables, let's print them to the file. That is, let's save them to the file.
Instead of printf, we're going to use fprintf, and we're going to specify what file we want to print to, in case we have multiple ones open. What do I want to print? A string, followed by a comma, followed by a string, followed by a newline: ergo, comma-separated values, one row per line. Then I'm going to pass in the values name and number, respectively. And now I'm going to go ahead and call fclose to close that file so that it's effectively saved. All right. So let me go ahead and demonstrate first that phonebook.csv has nothing in it yet; it's empty initially. Let me go ahead and scooch it over to the right here so we can see both at the same time. I'm now going to do make phonebook. Enter. So far so good. ./phonebook, and let me go ahead and type in, for instance, let's see, my name and 617-495-1000, and watch the top right of your screen as the program fprintfs to the file and fcloses it. All good. All right, let's run it again, because maybe, like in the iOS app or the Android app, I'm adding new friends to my phone book here. So I'm going to do ./phonebook, and I'm going to go ahead and... uh-oh, the top right just went blank. Well, let's try this: Kelly, 617-495-1000. Enter. Okay, she's back. Let me run it again. ./phonebook: gone. Well, what's going on here? It's not persisting, at least not as long as I would like. It seems that writing to a file means literally rewriting the file. So if you use w, you're going to write to the file, but literally starting at the first byte. If you want to be smart about it and append to the file, well, per the documentation for fopen, you instead use quote-unquote a, for append, instead of quote-unquote w, for write. This is a convention in other languages, too. All right, let's start this over. Let me go ahead and recompile this program: make phonebook. Now let me do ./phonebook. I'll type in my name again first, 617-495-1000. Enter. So far so good. ./phonebook. So far so good. Kelly, 617-495-1000. Enter. And now we're on our way.
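The difference between "w" and "a" is easy to check in code. This sketch, with hypothetical helper names save_entry and count_rows, writes one name,number row per call using the same fopen, fprintf, and fclose sequence described here, plus a NULL check on fopen. Opening with "w" twice leaves one row; opening with "a" adds rows to the end.

```c
#include <stdio.h>

// Writes one "name,number" row to the CSV file at path.
// Mode "w" truncates the file first, so each call replaces earlier rows;
// mode "a" appends to the end instead.
// Returns 0 on success, 1 if the file could not be opened.
int save_entry(const char *path, const char *mode, const char *name, const char *number)
{
    FILE *file = fopen(path, mode);
    if (file == NULL)
    {
        return 1;
    }
    fprintf(file, "%s,%s\n", name, number);
    fclose(file);
    return 0;
}

// Counts the rows (newline-terminated lines) in a text file,
// or returns -1 if the file cannot be opened.
int count_rows(const char *path)
{
    FILE *file = fopen(path, "r");
    if (file == NULL)
    {
        return -1;
    }
    int rows = 0;
    int c;
    while ((c = fgetc(file)) != EOF)
    {
        if (c == '\n')
        {
            rows++;
        }
    }
    fclose(file);
    return rows;
}
```

Calling save_entry twice with "w" and then count_rows returns 1; a further call with "a" brings it to 2.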
In fact, I can close this file. I can close this file. I can then open up phonebook.csv. And indeed, it has persisted. And in fact, if I downloaded this file onto my Mac or PC, I could then right-click or double-click on it and probably open it in Microsoft Excel or Apple Numbers. I could import it into Google Sheets or any number of other spreadsheet tools, because now I am persisting data by writing files of my own. Questions on any of the techniques we just tried out here? If we really want to be nitpicky, technically I should fix one bug, or missed opportunity, if I open up phonebook.c. I'm going to propose that, as with any use of pointers and addresses more generally, here too something could go wrong. Maybe I'm just out of space, and so fopen can't physically open the file for me. So here too I should check if file == NULL, and if so, return 1. And then maybe at the very bottom here I return 0 to make clear that if I get this far, all is well. So in short, any time you are dealing with pointers, you should be checking return values to see if all in fact went well. Yeah? >> Yes, everything we are using is part of stdio.h, which is wonderfully useful now because it has not just printf but fprintf and so forth as well. Good question. Yeah? >> Yes. So, how are pointers used in this code? The short answer is you have to use pointers because this is how C designed files to work. So we couldn't really introduce you all to files, file I/O, in week one or two or three, because we'd have had to introduce this stupid little character, the star, to you, and you'd be like, "What does this mean? It's not multiplication." Because the way file I/O works is that when you open a file, you are essentially handed the address of that file in memory. That's an oversimplification. You're technically handed the address of a data structure in memory that references the file actually on disk.
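Reading the rows back is the mirror image of writing them. Here is a minimal sketch, assuming each row is exactly name,number with no commas inside a name and no field longer than 63 characters; load_entries is a hypothetical name. It uses fscanf with scansets: %63[^,] reads up to the next comma and %63[^\n] reads up to the end of the line, while the leading space in the format skips the newline left over from the previous row. As in the lecture, it checks fopen's return value for NULL.

```c
#include <stdio.h>

// Reads "name,number" rows from a CSV file like phonebook.csv into the
// caller's arrays and returns how many rows were read (at most max).
// Assumes no commas inside fields and fields under 64 characters.
// Returns -1 if the file cannot be opened.
int load_entries(const char *path, char names[][64], char numbers[][64], int max)
{
    FILE *file = fopen(path, "r");
    if (file == NULL)
    {
        return -1;
    }
    int count = 0;
    // fscanf returns 2 only when both fields of a row were read.
    while (count < max &&
           fscanf(file, " %63[^,],%63[^\n]", names[count], numbers[count]) == 2)
    {
        count++;
    }
    fclose(file);
    return count;
}
```

A real CSV reader would also need to handle quoted fields, which this sketch ignores.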
But for all intents and purposes, as I said, this gives you a pointer to the contents of the file. And if you want to write to the file, you then need to use fprintf in this case and tell it what file to write to, so it can go there and store something like this string with these values plugged in. So in short, in C, without pointers, you just can't do file I/O unless it's abstracted away for you by some library. Good question. Other questions on file I/O? All right. Well, let me do one other example here that's a little reminiscent of things we see all the time on our phones and laptops and desktops, like these progress bars in video players. And you're all probably generally familiar with the term buffering, if only because YouTube and other apps, when they are slow or you have a slow internet connection, might say buffering dot dot dot. Well, what does that mean? Well, a buffer is just a chunk of memory. More specifically, it's often an array of finite size that stores bytes of stuff. In the context of a video player, for instance, there's this red line here, which represents how far through the video you are, and behind it is an array that stores the next few bytes of the video. And ideally, if you have a fast enough connection, when you hit play, those bytes keep getting downloaded and added to the buffer. And hopefully you don't finish watching the bytes that have been downloaded before more bytes have arrived. So a buffer is just a chunk of memory, or more specifically an array in a language like C.
Well, just to demonstrate what else you can do with file I/O, let me propose that we write a simple little program that is our own implementation of cp, the copy program that we've used a few times already, which allows you in your terminal window to copy one file to another. Liken it to this idea of a progress bar, where byte by byte you want to do something, namely, in this case, copy it instead of watch it. So let me go into VS Code and code up a program called cp.c. In this program, I'm going to go ahead and include stdio.h at the top. I'm going to then give myself a main function that this time finally does take command-line arguments, via int argc and our old friend string argv[], which today we can now reveal to be also just a char *. In fact, this is how we could now technically write the declaration for main, because string no longer exists without the CS50 library per se. So that's really what's been going on this whole time. Now let me go ahead and do this. I want to be able to write a program that takes two command-line arguments: the name of the file to copy and the name of the new file to create from it. So let's go ahead and create a file pointer, using the same syntax as before, called src, short for source, as is a convention. And let's open a file using the file name argv[1], so the first word the human types, and let's go ahead and open it in read mode, because I want to read the source and write to the destination. My next file, FILE *dst, dst short for destination, will be fopen of argv[2], with quote-unquote w for write. Now why one and two and not zero and one? argv[0] is the name of the program, which is not interesting. argv[1] and argv[2] will contain the next two words that the human types. Now let me propose that I want to copy this file from source to destination byte by byte, similar in spirit to a buffer like this, where you're just grabbing one byte of the video at a time from the internet so as to watch it.
In this case, I want to copy it. So how can I do this? Well, we don't have a data type per se for representing a byte, eight bits. However, a common convention is to use our new friend typedef and simply declare a byte type to be something specific. So let me declare a type called BYTE. And what is a byte going to be? Well, ideally it's just a char, because a char, we know, is one byte, or eight bits. But recall that chars can be treated as integers, and integers, of course, can be positive or negative. So even though this is a little esoteric, technically I want to define a byte to be what we'll call an unsigned char, which is probably a keyword you haven't yet seen. It just tells the compiler that this char, that is, this sequence of eight bits, cannot be interpreted as a negative number, because I am not doing anything with math. These are just raw bytes, or eight bits. So now down here I can give myself a byte, and I'll call it b for short. And now I'm going to write a loop, similar in spirit to what YouTube and other players are probably doing, which just iterates over a file byte by byte, in our case making a copy thereof. So while I am reading into this byte, the size of one byte, one at a time, from this file, the source, go ahead and check that I've read at least one. So while the return value of a new function called fread is not equal to zero, go ahead and call fwrite, another new function, going to the address of the byte, grabbing the size of it, which happens to be one, though I'll use sizeof for consistency, grabbing one such byte, and writing it to the destination. This is a huge mouthful, admittedly. The last things I need to do are close the destination, so as to save it, and close the original file, the source. But this huge mouthful, which you'll get more familiar with in the next problem set, is essentially saying on line 12: while I can read one byte at a time, write, on line 14, that byte to the file.
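Putting the pieces of cp.c together, a sketch might look like this. It is written as a function, copy_file (a hypothetical name), so it can be tested without command-line arguments; a main would call copy_file(argv[1], argv[2]) after checking that argc == 3. Unlike the code as typed in the lecture, it also checks each fopen for NULL and opens both files in binary mode, the refinement mentioned shortly afterward.

```c
#include <stdio.h>

// A byte: eight bits, unsigned so the value is never treated as negative.
typedef unsigned char BYTE;

// Copies the file at src_path to dst_path one byte at a time.
// Returns 0 on success, 1 if either file cannot be opened.
int copy_file(const char *src_path, const char *dst_path)
{
    // "rb" and "wb" open in binary mode, so bytes are copied untouched.
    FILE *src = fopen(src_path, "rb");
    if (src == NULL)
    {
        return 1;
    }
    FILE *dst = fopen(dst_path, "wb");
    if (dst == NULL)
    {
        fclose(src);
        return 1;
    }

    BYTE b;
    // fread returns how many items it read: 1 while bytes remain,
    // 0 at end of file (or on a read error).
    while (fread(&b, sizeof(b), 1, src) != 0)
    {
        fwrite(&b, sizeof(b), 1, dst);
    }

    // Close the destination so its contents are saved, then the source.
    fclose(dst);
    fclose(src);
    return 0;
}
```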
This implements essentially the idea of the red progress bar going byte to byte to byte, reading one byte at a time from one file, the source, and writing it to the other, the destination. And here too, to your question earlier: why pointers? This is the way file I/O is done. You have to be able to express go to this address, go to this file, if you want to get data from it or put data into it. And a minor refinement, too: technically, when you open files, if you know they're binary files, that is, zeros and ones and not ASCII or Unicode text files, you can tell fopen to write and read in binary mode, with "wb" and "rb", so there's no mistaking the bits for something other than raw data, an image or otherwise. All right. So if I go ahead now and do make cp, it so far compiles. Let's try this out. So here again is phonebook.csv. Whoops, that's phonebook.c. Here again is phonebook.csv, with two of us, David and Kelly. Let's try to make a copy of this file as follows: ./cp. So this is my version of the copy program, not the one that comes with the system. Let's copy phonebook.csv into copy.csv. Enter. Let's now open the copy of the CSV. Enter. And voila. Thank god, it actually worked. I have made a byte-for-byte copy of this file using syntax that was not available to us until today. So who cares, and what's the motivation? Well, it's a lot more fun to work not just with text files and these tiny little examples, but to actually play with real-world examples. And in the next problem set, among the things you'll do is experiment with BMP files, bitmap files, which essentially just means a grid of pixels, top to bottom, left to right, much like the cat that our volunteers created for us at the start of class. With a bitmap file, you'll store in files literal sequences of pixels, or dots, each of which is going to be represented with a specific color: a red value, a green value, and a blue value.
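Copying one byte per fread call works, but every call has some overhead. A buffer, like the one behind a video player's progress bar, lets you move many bytes per call. This sketch, with a hypothetical copy_stream function and an arbitrarily chosen 512-byte buffer, reads up to a buffer's worth at a time and writes exactly as many bytes as fread reported, since the last chunk is usually only partly full.

```c
#include <stdio.h>

// Size of the buffer: an arbitrary choice for illustration.
#define BUFFER_SIZE 512

// Copies everything from src to dst through a fixed-size buffer.
// Both streams are assumed to be open already, ideally in binary mode
// ("rb" and "wb"). Returns the total number of bytes copied.
size_t copy_stream(FILE *src, FILE *dst)
{
    unsigned char buffer[BUFFER_SIZE];
    size_t total = 0;
    size_t n;
    // fread fills up to BUFFER_SIZE bytes and reports how many it got.
    while ((n = fread(buffer, 1, sizeof(buffer), src)) > 0)
    {
        // Write exactly the bytes that were read, no more.
        fwrite(buffer, 1, n, dst);
        total += n;
    }
    return total;
}
```

In cp.c, main could open argv[1] with "rb" and argv[2] with "wb" and pass both streams to copy_stream.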
And among the things you'll be able to do, given such beautiful photos as this one of the Weeks Bridge down by the Charles River, is actually make your own Instagram-like filters to apply to photos like this, now that you understand, or soon will, how to iterate over the file top to bottom, left to right, over each of the bytes therein, and somehow mutate those bytes to look a little bit different. So if this is the original photo, you might be able to make it all grayscale by changing the R's, the G's, and the B's to simpler values that are just black, white, and gray tones. You might take that same photo as input and give it more of a sepia tone instead, like an old-school photograph. You might actually reflect it, putting these bytes over here and these bytes over here, so as to create a mirror image of the photo by reflecting it over the vertical axis. Or you might even blur the image like this. It's a common feature in a lot of photo-editing programs to either blur or deblur. Well, you can do a little bit of math and make every pixel a little fuzzier, kind of clouding what the human is actually seeing. Or, if feeling more comfortable, now that you know how to manipulate files and addresses thereof, you can actually write code to do edge detection: find the salient characteristics of something like the bridge to distinguish it from the sky, and actually find edges like these. So those are just some of the problems that you're going to solve in the coming week's problem set, ultimately manipulating files like these as well as JPEGs. And the last thing we thought we'd end on is a sort of computer science joke, which, for better or for worse, you're now more and more able to interpret. So I'll leave you dramatically with this here famous joke. Oh, that's more laughter than usual. All right, that's it for week four. We will see you next time.
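As a taste of these filters, here is a sketch of grayscale and reflection applied to a single row of pixels. The pixel struct is hypothetical; the problem set's BMP code defines its own types, and real BMP files also carry headers and row padding that this sketch ignores. Grayscale sets red, green, and blue to their rounded average, so brightness is kept but color is lost; reflection swaps pixels from opposite ends of the row.

```c
#include <stdint.h>

// Hypothetical pixel type: one byte each for red, green, and blue.
typedef struct
{
    uint8_t red;
    uint8_t green;
    uint8_t blue;
} pixel;

// Grayscale: set all three channels to their average, rounded to the
// nearest integer ((sum + 1) / 3 rounds sum / 3 to nearest for integers).
void grayscale(pixel row[], int width)
{
    for (int i = 0; i < width; i++)
    {
        int sum = row[i].red + row[i].green + row[i].blue;
        uint8_t avg = (uint8_t) ((sum + 1) / 3);
        row[i].red = avg;
        row[i].green = avg;
        row[i].blue = avg;
    }
}

// Reflect: swap pixels from both ends of the row, moving inward, which
// mirrors the row across its vertical axis.
void reflect(pixel row[], int width)
{
    for (int i = 0; i < width / 2; i++)
    {
        pixel tmp = row[i];
        row[i] = row[width - 1 - i];
        row[width - 1 - i] = tmp;
    }
}
```

Applying either function to every row of an image, top to bottom, filters the whole picture.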
All right, this is CS50, and this is already week five, wherein we will focus today on data structures, a topic we've touched on a little bit in simple form, but today we'll dive all the more deeply. And for better or for worse, this is our last week on C. Next week, of course, we transition to Python, which is a so-called higher-level programming language, which is really, frankly, just going to make our lives a lot easier. We're going to be able to solve a lot of the same problems, but so much more quickly as humans, though not necessarily, as we'll see, as quickly when the computer runs the code as if we were still using a lower-level language like C. So indeed, thematic over this week and next is going to be the theme we've seen before of trade-offs. But before we get there, why don't we focus on a couple of data structures that you might encounter in the real world, namely stacks and queues. Let's learn some facts about both of these. If we could dim the lights dramatically. Once upon a time, there was a guy named Jack. When it came to making friends, Jack did not have the knack. So Jack went to talk to the most popular guy he knew. He went up to Lou and asked, "What do I do?" Lou saw that his friend was really distressed. "Well," Lou began, "just look how you're dressed. Don't you have any clothes with a different look?" "Yes," said Jack. "I sure do. Come to my house and I'll show them to you." So they went off to Jack's, and Jack showed Lou the box where he kept all his shirts and his pants and his socks. Lou said, "I see you have all your clothes in a pile. Why don't you wear some others once in a while?" Jack said, "Well, when I remove clothes and socks, I wash them and put them away in the box. Then comes the next morning and up I hop. I go to the box and get my clothes off the top." Lou quickly realized the problem with Jack. He kept clothes, CDs, and books in a stack.
When he reached for something to read or to wear, he chose the top book or underwear. Then when he was done, he would put it right back. Back it would go on top of the stack. "I know the solution," said a triumphant Lou. "You need to learn to start using a queue." Lou took Jack's clothes and hung them in a closet. And when he had emptied the box, he just tossed it. Then he said, "Now, Jack, at the end of the day, put your clothes on the left when you put them away. Then tomorrow morning, when you see the sunshine, get your clothes from the right, from the end of the line. Don't you see?" said Lou. "It will be so nice. You'll wear everything once before you wear something twice." And with everything in queues in his closet and shelf, Jack started to feel quite sure of himself. All thanks to Lou and his wonderful queue. All right. Our thanks to Professor Shannon Duvall at Elon University, who kindly put together that animation. It's meant to paint a picture of a couple of things that we've all encountered in the real world. But more technically, what we just saw are what are known as abstract data types: they're data structures in some sense, but it's really about the design thereof, what characteristics or features or functionality these structures offer, irrespective of how they are implemented in terms of lower-level details. Which is to say, you can implement, as we'll see, queues and stacks in any number of ways, which are going to have real-world implications for how you can actually use them and what kinds of problems you can solve with them. So let's consider, for instance, queues in the first place. A queue is something you experience all the time. Any time you go to a store or some event for which you have to line up, you're in a so-called queue. You'd ideally like there to be some fairness property about that queue, such that if you got in line first, you get into the store first, you get to check out first, or some other such goal.
Meanwhile, the person who got there last is at the end of the line, stays at the end of the line, and therefore gets served or enters last. So queues have what a computer scientist would call a FIFO property: first in, first out. That is, if you're the first person in line, you're the first person to get out of line. And for many problems, that is a good solution, certainly if you're concerned with fairness. But more technically, a queue has what we'll call two operations: enqueue, which is a fancy way of saying getting in line, and dequeue, a fancy way of saying getting out of line from the front of it. But if you think about it in code, could those two operations be implemented with different actual details? And by that I mean, this here is one way that we could go about implementing, in C code, a queue for a bunch of people, or persons, who want to line up for something. So for instance, we'll decree that this queue can hold no more than 50 people; that's the physical capacity. And then we define a structure, which we've done a couple of times in the past, whereby this structure has an array of persons that we'll call people, which will be as big as the capacity. So this is an array of size 50 for 50 such persons. And then we're going to propose that we also keep track, in this implementation of a queue, of the current size of the queue. So we're going to make a distinction between the capacity, how many total people can be there, and the size, how many people are actually in line at that moment in time, so that you know which of the spots in the array are effectively empty. And we're going to call that whole structure a queue.
Now, what's the catch with this particular implementation of a queue in code? There is inherent in it a limitation, something you just kind of have to deal with, and I see you nodding. What's your instinct for this? >> For example, 50 students. >> Okay, well, I think you hit the nail on the head, in that it's only for 50 students, or 50 people, which means if a 51st person wants to get into line, you literally have no means of remembering them in this data structure. So how do you solve that? Well, we could just recompile our code after changing the 50 to 51, or maybe 500, or 5,000. But there's this trade-off, because you could still be undershooting the total number of people trying to get into, say, a big concert, in the extreme case. At the same time, if you overallocate memory, using 5,000 locations, what if only a few people show up? Now you're just wasting memory. And certainly, at the end of the day, you only have a finite amount of memory in the computer. So you have to decide a priori, before compiling your code, how big this structure is going to be and how much space you're going to waste. And in the end, it's all sort of stupid. It would be ideal if instead we could just grow and shrink the queue as needed, essentially asking the operating system, as we started doing last week, for more memory and then giving it back if we don't actually need it. Which is to say, we can't really use an array in this static sense. And by static, I mean we're literally deciding in advance, at compilation time, how big this thing is going to be. As an aside, this is also a bit annoying for implementing a queue, because you have to somehow keep track of who is at the head of the queue, the front of the queue, because as you start plucking people off, you need to remember who's effectively the next person. But there are ways in code that we could solve this.
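Here is a minimal sketch of the fixed-capacity queue described above, with one answer to the head-tracking problem: store the index of the front person and wrap indices around with %, an approach often called a circular buffer. The person struct (just a name here) and the function signatures are hypothetical. Like the lecture's version, it still cannot hold a 51st person.

```c
#include <stdbool.h>

#define CAPACITY 50

// Hypothetical stand-in for the lecture's person type.
typedef struct
{
    const char *name;
} person;

// Array-based queue: head is the index of the front person, size is how
// many people are in line. Indices wrap around with %, so dequeuing never
// requires shifting everyone forward.
typedef struct
{
    person people[CAPACITY];
    int head;
    int size;
} queue;

// Adds p to the back of q. Returns false if the queue is full.
bool enqueue(queue *q, person p)
{
    if (q->size == CAPACITY)
    {
        return false;
    }
    q->people[(q->head + q->size) % CAPACITY] = p;
    q->size++;
    return true;
}

// Removes the front person of q into *p. Returns false if q is empty.
bool dequeue(queue *q, person *p)
{
    if (q->size == 0)
    {
        return false;
    }
    *p = q->people[q->head];
    q->head = (q->head + 1) % CAPACITY;
    q->size--;
    return true;
}
```

Enqueueing Jack and then Lou and dequeueing twice yields Jack first: first in, first out.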
So let's consider an alternative to a queue that gives us very different properties, namely a stack. We saw that in the animation, whereby Jack used a stack to put his clothes into a box, so that every time he got dressed, he took the sweater from the top, from the top, from the top, and might never wear anything other than black as a result, if he does a wash before he actually reaches the blue and the red sweaters there. So a stack, as we've just seen, has a LIFO property to it: last in, first out. So if I do a load of laundry and I plop some more sweaters on this stack, well, I'm presumably going to use the last sweater that went in first, as opposed to creating a mess and pulling the bottommost sweater out, which is going to be a little more effort than just taking one from the top. So sometimes last in, first out doesn't give you the fairness property you might want for other problems, but it certainly does give you an efficiency, a convenience. So maybe that might be compelling. And stacks are actually everywhere, too. If you've checked your email recently, odds are you've opened up gmail.com or outlook.com and looked at your inbox. And where does new mail end up by default? At the top. At the top. At the top. And I dare say all of us are guilty of neglecting emails that fall below the fold or onto the next page, and of focusing only on the last in, and therefore replying to it first out, which maybe isn't great for the senders of those emails, but it's just how those user interfaces are quite often implemented, unless you override those default settings. So how might we implement a stack? Well, more technically, we need to implement two fundamental operations. The analogues of enqueue and dequeue in the world of stacks are called push, which means push something onto the top of the stack, and pop, which means remove something from the top of the stack.
And the teams in the cafeterias and dining halls on campus do this all day long. In any of the cafeterias or dining halls that have stacks of trays, of course, you put the first tray at the bottom, and then the next tray, and the next tray, and the next tray. And which tray do all of you pick up? Well, presumably the one on the very top, because it's even harder to grab the bottommost tray than it would be for something like a sweater. As a result, there are maybe undesirable properties, like maybe no one ever gets to the nasty tray at the very bottom of the stack, because we're constantly replenishing the top ones. But thanks to gravity, that just happens to be the most appropriate data structure in the real world for distributing things like trays in a cafeteria. So how might we implement that idea in code? Well, funny enough, we can pretty much use the exact same structure. We could just rename queue to stack, because at the end of the day we need to keep track of some number of people. Maybe people is a weird sort of analog here, but we kept everything else the same, so why not that? The size is also something we still need to remember. And it turns out it's a little easier to implement a stack this way, because you can always add to and remove from the end of the array, and the first thing that went into the stack, the first in, can always stay at location zero, for instance. So ultimately we could implement it in this way, but we have the same darn limitation: you can still only put 50 sweaters, 50 trays, 50 people into that stack data structure. So this is just one implementation approach. But that doesn't mean this is necessarily a limitation of stacks and queues. They're abstract in the sense that we could do better. We could start to manage our own memory, move away from statically defining the total size of this array, and just start allocating and deallocating, that is, growing and shrinking the data structure instead.
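The stack version of the same idea is even simpler, since pushes and pops both happen at the end of the array and whatever sits at index 0 never has to move. As before, person and the push and pop signatures are hypothetical, and the capacity is still fixed at 50.

```c
#include <stdbool.h>

#define CAPACITY 50

// Hypothetical stand-in for the lecture's person type.
typedef struct
{
    const char *name;
} person;

// Array-based stack: people[0] is the bottom, people[size - 1] the top.
typedef struct
{
    person people[CAPACITY];
    int size;
} stack;

// Puts p on top of s. Returns false if the stack is full.
bool push(stack *s, person p)
{
    if (s->size == CAPACITY)
    {
        return false;
    }
    s->people[s->size] = p;
    s->size++;
    return true;
}

// Removes the top person of s into *p. Returns false if s is empty.
bool pop(stack *s, person *p)
{
    if (s->size == 0)
    {
        return false;
    }
    s->size--;
    *p = s->people[s->size];
    return true;
}
```

Pushing Jack and then Lou and popping twice yields Lou first: last in, first out.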
Which is to say, we can make these abstract data types much less abstract with actual implementations. Let's consider an abstract data type that we saw early on but didn't necessarily give this name. A dictionary is yet another abstract data type that's sort of everywhere in the world, literally in the world of dictionaries containing words and their definitions. And you can think of a dictionary in the abstract, if you were to draw it on the chalkboard, as really just a two-column table whereby on the left is the word and on the right is the definition. And if it's a physical book, it's essentially the same thing, with lots of columns of words on the left, often boldfaced, and then the definitions right next to them. You can also see this in the context of a phone book, which is where we began the course in week zero, where it's essentially a dictionary of names and numbers instead of words and definitions. And a computer scientist would generalize the notion of a dictionary further and just call the thing on the left a key and the thing on the right a value. And these things are omnipresent in computing, and you're going to start to see them all the more today, next week, and beyond, in that if you just want to associate some piece of data with another piece of data, a so-called key-value pair, a dictionary is going to be your go-to data type. But even these we can implement in different ways for reasons that we've already seen. Like, maybe there's only a finite size to this dictionary if we're using an array. Maybe we can do better than that. And maybe a dictionary implemented one way is going to be fast, and implemented another way is going to be slow. So we'll consider these other design possibilities today too, in the context of phone books and other data structures as well.
After all, if you have an iPhone or an Android phone and Apple or Google decided that you can have only 50 friends because they implemented the Contacts app with an array, that would be an annoying limitation. So presumably they've done things a little more dynamically, as we'll do today. So let's focus on the first of the data structures we saw back in week 2, that is, an array, which recall was just a chunk of memory where you can store values back to back to back. That was the fundamental definition: the values are back to back to back, or contiguous, in memory, and as we've seen, we generally have to decide the size of an array in advance. So for instance, if we want to store three values like 1, 2, and 3, it might look pictorially like this. Or in code, let's go ahead and implement this same idea and take a moment to whip up our very first program here, and we'll call it, say, list.c. And in this program, let's just do something demonstrative of how you could use arrays to store three things in memory. It's quite simply the numbers 1, 2, 3, but you can imagine it being three people's names, three sweaters, three people, or any other piece of data as well. So I'm going to go ahead and, at the top of list.c, include stdio.h. I'm going to then do int main(void), so no command-line arguments. Then I'm going to go ahead and give myself an array of integers of size three called list, and that's how we've done that from week two onward. Then, just for the sake of discussion, I'm going to hardcode some representative values. So the first value, 1, will be at location zero, because arrays are zero-indexed. Then I'm going to do the second value, which will be 2, at location one. And then the third value will be at location two, but the value will be 3. Now, just to prove that we've stored this correctly in memory, let's just do a quick for loop: for int i = 0; i < 3; i++.
And then inside of this for loop, I'm just going to do a quick printf of "%i\n", printing out the value of list at location i. So it's not a useful program per se, but it gives us an array to play with and prints out what's in it. So hopefully we will see 1, 2, and 3 on the screen. So let me make this list program, ./list, Enter, and voila, we're on our way. All right, but what if now we actually want to change that design and be like, "Oh, shoot. I now have a fourth number that I want to store," or I just bought a fourth sweater, or a fourth person wants to get in line, or I want to add a fourth friend to my contacts? Whatever the scenario might be, it stands to reason that ideally you would plop that fourth value right here in memory so that everything remains contiguous. You're still using an array. Your code doesn't really have to change except for the length. For all intents and purposes, it's the same implementation, just using a bit more memory. But recall that when you declare an array of a fixed size, you are only promised that chunk of memory, not necessarily more memory to the right, to the left, above, or below, conceptually. Because recall, in the context of your whole computer, you've got this canvas of memory, each square of which here represents a byte. And there could be a whole bunch of actual values or garbage values in memory. So in a more complicated program, that 1, 2, 3 sure might end up here. But if I had also created a string in this program, h-e-l-l-o, comma, world might have ended up right next to it in memory. Which means I can't just plop the 4 here, because then, if I'm still using that string elsewhere in my program, it's now going to say "4ello, world" instead of "hello, world", because you're just claiming the h, that byte, as your own, which does not in fact belong to your array.
Of course, it looks like there's plenty of other memory I could use here, because these garbage values, represented by Oscar, are not being used. They've been used in the past, but we treat garbage values as memory we can reuse. So wouldn't it be nice to just plop the 1, 2, 3, and 4 in this chunk of memory over here? And I can totally do that. But of course, if I want to do that, I've got to copy the first three values over and then put the fourth one there, and then presumably give back to the operating system the memory I no longer need. So that, in fact, when using arrays, is a perfectly valid solution. And I think we can go ahead and do this in our same program. So let me go back to VS Code here, and instead of statically allocating memory for this array (and by static I mean literally hardcoding the number three here in a way that is effectively permanent), let me go ahead and do this instead. At the top of my code, let me delete the static allocation of that array from before. And now let me leverage my understanding, if still preliminary, of pointers and memory management from this past week four to dynamically allocate a guess at how much memory I need initially. So I'm going to go ahead and use malloc and allocate space for three integers. But integers take up a few bytes, usually four, so just for good measure I'm going to multiply by whatever the size of an int is, which gives the total number of bytes I want. So presumably it's going to be 3 * 4 = 12, but I'm generalizing it. But then recall that malloc returns the address of that chunk of memory, the address of the first byte. So if I want to create an array, effectively, called list, I can't just do int list like this yet. But what I can say is that my list variable is actually going to be the address of an integer, int *list, and set it equal to malloc's return value.
So in code here, what I've done on the right-hand side is ask the operating system: please give me 12 contiguous bytes in memory. All of those bytes, of course, can be numerically addressed, like 0x123, 0x124, 0x125; we've had that story before. malloc by definition returns the address of the first such byte, and it's on me to remember that I allocated 12, if need be. So I'm just storing the address of that first byte in a pointer called list. But recall from last week that there's this functional equivalence between treating a pointer as an array and sometimes even treating an array like a pointer. The C language sort of lets us do this conversion, if you will. So what I can do here now is use quite the same syntax as before. I can say list[0] gets 1, list[1] gets 2, list[2] gets 3. And even though I have this fancy new line inspired by week four, the syntax thereafter can be exactly the same. Why? Well, recall that these three lines using square-bracket notation are just syntactic sugar for the stuff we learned last week. Specifically, instead of doing list[0], I could much more arcanely say *list = 1: go to that address in list and put the number 1 there, please. I can say *(list + 1) = 2: go to the address list + 1 (which C scales by the size of an int, so it's really four bytes further along) and put the value 2 there. And finally, *(list + 2) = 3: go to the address list + 2 and put the number 3 there. But this looks ridiculous, and even an experienced programmer might not be inclined to do this if, with fewer keystrokes and more readable code, they could just do what I did the first time around, which is functionally the same, and treat that chunk of memory as though it's an array. The computer will essentially do the requisite pointer arithmetic to figure out where to put 1, 2, and 3. So even though this is still kind of fresh, hot off the press from last week, it's exactly what we tinkered with last week.
So suppose now that some time passes and I realize, for the sake of the story, that, oh shoot, I need more than three integers. I need space for four, so as to achieve this picture in memory. Well, I could of course just delete all that code, change the 3 to a 4, redo the whole thing, recompile the code, and rerun it. But let me propose that we write our code in a way that lets us change our mind, while the program is running, about how much memory we actually need. Case in point: if you meet someone new, you want to add them to your phone. Well, you obviously don't want to have to wait for Apple to recompile the Contacts app and reboot your phone just to add one more person. You want the program just to ask the operating system for more memory for that new person. So in this case, let's just pretend that some time passes and now I want to actually change my mind and allocate space for four integers instead. Well, I could do something like this. I could just say literally list = malloc(4 * sizeof(int));. I don't need to redeclare list on line 13 because it already exists from line 5. But this is bad, because what have I done wrong here on line 13? I've made a poor decision. Yeah, in front? >> You, like, waste all the memory that... >> Yeah, I'm wasting all of the memory I had from line 5, because I'm essentially forgetting where it is. If the list pointer is literally a pointer, like a foam finger pointing somewhere in memory, what I'm really doing is saying point it over here now, but I've completely lost track of those other three integers in memory. And that's what we described last week as a memory leak, which you could find with Valgrind. And if you didn't find it or fix it in code, eventually the computer and the program would slow down over time. So this is probably bad. It's not good to just unilaterally change your mind and say, "No, no, no, forget about that memory. Give me a new chunk of memory."
especially if you want to copy the old memory into the new, just like I did a bit ago when trying to get the 1, 2, 3 into the bigger chunk of memory that can fit 1, 2, 3, 4. So how might I do this? Well, a temporary variable is kind of our go-to solution any time we need to remember something in addition to something we already have in mind. So let me just give myself a temporary variable called tmp, by convention, for short, and assign the return value of this malloc call to that. And then what I can do is something like this. Much like my print statement earlier, I can do another for loop and say for int i = 0; i < 3; i++. And then in this for loop, treating that new chunk of memory as an array, I can set tmp[i] equal to list[i]. So these lines here copy the old list into the new list; they copy those first three values. And then what I bet I can do at the bottom here is just manually go to the fourth location, which, when you zero-index, is technically bracket three, and set that equal to the number 4. So these lines here copy the 1, the 2, and the 3 using a loop, and then line 20 here at the moment just adds the fourth value. And again, this is a stupid sort of way to write code, in that if you want to put the 4 there, you should have just done it earlier. I'm just pretending that some time has indeed passed in the program and I've changed my mind along the way and want to let the user add some value to memory. Okay, but before we proceed further, I dare say that there are some other mistakes we should clean up. One of the lessons I preached last week was that any time you use malloc, there's something you should always do. What? You should always free. So here I'm clearly not freeing any memory, so I should definitely do that. And there was one other rule of thumb with memory. What should you always do when using malloc? Yeah?
>> Check to see if NULL came back, which just means something is wrong, like it's out of memory or something else went wrong. And if you don't do that, your program may very well crash with one of those segmentation faults that we saw briefly in the past. So it makes the code a lot more bloated, but it is good practice. So let's just check: if the list pointer I get back contains NULL, there's no point continuing on. Let's just go ahead and immediately return 1, because something has indeed gone wrong. And then down here, under malloc again, let's do the same. If the temporary pointer also contains NULL, let's go ahead and similarly return 1 or any other nonzero value. But here's a subtlety, and let me combine your two ideas. If I immediately return 1 on line 20 after the second malloc call fails, what should I still go back and do first? Yeah. Yeah. You want to elaborate on your first instinct? >> Yeah. I want to still free the first chunk of memory, because if we execute line 5 and all is well, that means lines 6, 7, 8, and 9 don't apply. Like, it's not in fact NULL; we got back a legitimate value. That means we have a chunk of memory given to us for three integers, which means it still exists down here at lines 19 and 20. So if I'm ready now to abort this program and return 1 to signify an error, I first want to free that original list and say to the operating system, here's your memory back. Now, as an aside, strictly speaking, this is not necessary, because the moment the program itself quits, the memory is given back to the operating system. So when programs quit, memory leaks sort of go away, but your code is still buggy. And generally we're running software that doesn't run for a split second but for minutes, hours, days, continually, in which case it's best practice to squash these memory-related bugs now. Check for NULL and free any memory, so that you never indeed encounter these kinds of leaks.
All right, so let's forge ahead a little bit more, and let me propose that after we've done the copy, we now want to similarly free the original list. And after freeing the original list, we want to remember that the new list is effectively the chunk we allocated the second time around. So even though this program is getting a little long, notice that what I've just done is say, okay, store in the list variable the address of this new chunk of memory, so that list, like a foam finger, is now effectively pointing here instead of up here. But before that, I made sure to free what my finger was pointing at originally, via the list pointer. All right, lastly, let's just scroll down to the bottom of the code here. I can manually change the 3 to a 4 in my printing loop just to demonstrate that I've stored all four values in here. And then at the very end of the program, I think I have to free the list again, because now list, the foam finger, is pointing to the bigger chunk of memory, the 1, 2, 3, 4. And then I can go ahead and return 0 at the very end, because all is hopefully well at this point. Let me go ahead and open my terminal window again and make this version of list. I made a lot of mistakes here, it seems. Let's scroll up to the very first: call to undeclared library function malloc, dot dot dot. What have I apparently done wrong or forgotten? Yeah, in back? Yep. Yeah. So stdlib.h is where malloc is actually declared. So let's just add that quickly. Let's go ahead and include stdlib.h in addition to stdio.h. Let me clear my terminal window and rerun make list, Enter. Now we're good. ./list, and phew, we see 1, 2, 3, 4. Okay, so at this point in the story, all we've done is write a dopey little program that allocates memory for three integers: 1, 2, and 3.
then changes our mind and allocates more memory for four integers, freeing the original chunk of memory after copying the first three integers into the new memory and adding that fourth value. But this is kind of a lot of hoops to jump through, and let me propose one refinement here. So if we go back into list.c in VS Code, it turns out that at least this loop isn't strictly necessary, not to mention the fact that we already have another loop just for printing the list. If I want to reallocate memory more cleverly, it turns out that there's another function that we didn't talk about last week, also in stdlib.h, called realloc, which, as the name suggests, reallocates memory, but a little more smartly, in that it will try to grow your existing chunk of memory if it can, which is going to be super efficient because then you can just plop the 4 at the very end. Or, if there just isn't room there, because maybe something else, like that hello, world, is sitting right there in memory, it's going to do all of the copying for you. So what you get back ultimately is a pointer to the new chunk of memory, containing all of the original data as well. However, we're still going to have to check for NULL. We're still going to want to free the original list if something goes wrong and then return 1. We're still going to want to add the fourth value, because realloc has no idea what more we want to put in the list. But I can in fact delete my other for loop, whose purpose in life was just to copy all of those integers from old into new. All right, that was a lot. Let me pause for any questions. >> How does realloc know that it should reallocate the memory in list? Should you tell it? Like, if you have a lot of memory allocated before, how does it know specifically? >> Very good question. That's because I wrote a bug that we didn't trip over, because I didn't compile this version of the code. So the question is, how does realloc know what to realloc?
Well, according to the documentation, which I forgot to read, you need to tell realloc the address of the chunk of memory that you do want to realloc. So the first argument to realloc, which I admittedly forgot until a moment ago, is the address of the chunk of memory that you already malloced earlier, so that it knows to go there and see whether there are indeed some garbage values it can reclaim at the end of that chunk of memory, or whether it has to wholesale move things elsewhere in memory to give you four times the size of an int this time instead of just three. But things can still go wrong, so you still want to check for this NULL value, because realloc might not be able to give you enough memory, or your memory could just be so fragmented that even though you want four bytes, maybe there are three bytes over here, two bytes over here, one byte over here. If there aren't four contiguous bytes, realloc too could fail, and it will return NULL to signify as much. Other questions on any of this? >> Why do we still need the temp variable? >> Why do we still need the temp variable? For the same reasons as before, because if we just say list = realloc(...) and something does go wrong, realloc by definition will return NULL but not touch the original memory, in which case we have now lost track of where that original chunk of memory is. So we can never go back to it to print it, to change it, to free it. So we have to use this temporary variable here. Good question. Other questions? Yeah. >> Is there a reason that we free list instead of temp? So, let me see. Down here, or further down? Okay, so further down; let me scroll down to where we came from. So here, after we've added this fourth value to temp, I've gone ahead and freed list, which at this point in the story is still pointing to the original chunk of memory, the 1, 2, 3. Then I am updating list as a variable to point to the new chunk of memory.
Then I'm doing my thing by printing out all of the integers therein. Then I am freeing what list is then pointing to. So I'm not technically freeing the same address in memory multiple times, because in the intervening time I'm changing what list is pointing to. >> Absolutely, yes. It would be correct to go ahead down here and just say temp, because temp is still in scope; it's still pointing at the same thing. I would just argue that that's semantically wrong, because at this point in the code, list is really the variable you care about. Temp was really meant to be a throwaway temporary variable, and you're asking for trouble if you use a temporary variable later than you, the programmer, intended. And if a colleague did that too, who knows what you've done with the temp variable in the meantime. Good questions. Yeah, in front? >> Does realloc always go for the memory space right after your original place? >> Correct. realloc will try to give you more memory in the same location as before, if there's room at the end. >> So it does what the code we wrote earlier did, instead of us writing it? >> So realloc will do one of two things for you. If the computer's memory looks like this, you're sort of out of luck in place, because realloc can't give you this byte. However, if it finds, like, four bytes down here, for instance, realloc will not only allocate those four bytes for you, it will then copy the data over to them for you, which is wonderful, because it just means we don't need an extra for loop every time we do this. Yeah, in front? >> How does it know how much data to copy? >> How does realloc know how much data to copy? Because the operating system, and you can think of it as the standard library, stdlib.h, keeps track of what memory has been allocated for you in the past. So when you pass in that same address, it essentially has a lookup table, a dictionary if you will, that tells it what memory has been allocated already.
So you don't have to worry about that. >> Yeah, in front? >> Good question. In other programming languages, you don't always have to declare the length of an array; case in point, Python, coming next week. That is because someone else who invented that programming language wrote all of this kind of code for you. And indeed, that's one of the goals of our transition between weeks five and six: to demonstrate that all of these problems are still being solved, just not by you and not by me anymore. We're standing on the shoulders of other smart people who have invented not just new code, but a new language and a new compiler, or, as we'll see, an interpreter for it, so that we can hide all of these lower-level details. Because honestly, as you can see already, this is an annoying number of lines of code just to have a conversation about the numbers 1, 2, 3, 4. In Python, we could reduce this code to two lines of code, maybe one. It's going to be fun. All right, so with that said, among the goals here was to demonstrate that there are a bunch of ways in which we can implement these data types. But let's talk more concretely about what we'll call data structures, which are concrete definitions of how you use the computer's memory to lay stuff out. And using data structures, you can implement stacks and queues and dictionaries and all of these other things. So we're going to put into your toolkit today a whole bunch of canonical data structures that every computer scientist does and should know, but that you won't necessarily implement all of the time yourself.
But when you use some feature of Python or Java or C++ or some other language, you are typically choosing among implementations of these data structures that someone else has written the code for, so that you can just benefit from their functionality and features, like that FIFO property we talked about, or LIFO, without having to get into the weeds too much yourself. So when it comes to data structures, let's consider that we now have at our disposal a few pieces of syntax in C, and we're going to add just one more today. We've had the struct keyword for a few weeks now: whenever we want to invent our own data structure, we can literally use struct. We saw in the past that you can use the dot operator to actually go inside of a structure to get at a person's name or their number. And we saw last week the star operator for dereferencing a pointer, dereferencing an address, to actually go somewhere, like inside of a structure. Today we're going to see that, in some cases, you can combine the dot and the asterisk into a single two-character operator, ->, that literally looks like an arrow, and that will help reflect the yellow and black drawings we've done over the past couple of weeks, where we have an arrow on the screen pointing somewhere. This literal arrow in code is going to line up with that same concept. So let's introduce the first of our alternatives to arrays. An array, again, is a contiguous chunk of memory where the values are back to back to back. Among the upsides, it's fast, because all the data is right there. We've seen since week zero that you can do binary search and jump around randomly by doing simple arithmetic to go to the middle, the middle of the middle, by just dividing by two a couple of times and rounding as needed.
But the problem with arrays, to be clear, is that they are statically allocated to be a specific size, maybe three, maybe four, but a finite value, which is problematic, because look at all the code we had to write just to resize these things again and again. Well, what if we try to preempt that kind of pain and just build up a list by linking it together, no matter where the values actually are in memory, and move away from this constraint that everything has to be contiguous? After all, as I said a moment ago, the computer might have plenty of memory here, here, here, and here that collectively is more than enough, but none of those individual chunks is quite as big as you need for an array. Well, heck, let's at least try to leverage all of the available memory and stitch together the data structure, as opposed to holding firm to this constraint that the array be back to back to back and contiguous. So a linked list is something you can now build in your same canvas of memory, using that syntax from last week and a bit more today. For the sake of discussion, suppose that we want to store first in our list the number 1. Well, we all know already that it might very well exist at an address like 0x123, for the sake of discussion, but it's somewhere there. Suppose that you want to store a second value in memory, but you didn't think about it initially, and so you weren't smart enough to put it right next to the 1, and then the next value next to that. But you know somehow, from malloc or similar functions, that you could put the number 2 over here at address 0x456, for the sake of discussion, and similarly there's room for the number 3 over here at, say, address 0x789. So already we have a list of values in memory, but because they're not contiguous, you can't just do some trivial ++ trick to go from one to the next, because they're differing numbers of bytes apart. They're not simply back to back.
So what if we try to solve that problem in the following way? Instead of just using one square of memory for each of these values, let me waste, or rather spend, a little bit of memory and have some metadata associated with our data. Data is the value or values you care about. Metadata is data that helps you maintain the data you care about. So let me propose that we use two chunks of memory for every value, such that the top of each pair represents the actual value we care about, 1, 2, and 3, respectively. And you can perhaps see where this is going. The second chunk of memory that I've allocated for each of these values could be a pointer to the next one, and a pointer to the next one. And if this is the end, we can put our old friend 0x0, aka NULL, and just treat that as the end of the list implicitly. So even though these things could be anywhere in memory, by storing with each value the address of the next value in memory, effectively creating a treasure map or breadcrumbs, however you want to think of it metaphorically, we can get from one node to the other. And indeed, that's a term of art we're going to start using. A node is just a generic structure that contains data and usually metadata, like the number you care about and a pointer to the next such node. These are not to scale, as an aside. An int is typically four bytes, and a pointer, as we've discussed, is typically eight bytes on modern systems, but it just looks prettier to draw them as simple squares on the screen. So what does this really mean? Well, who really cares about 0x123, 0x456, 0x789? We can really think of this as more of a picture with arrows. But to keep track of this list of three values, I do propose that we're going to need one additional value over here.
And it's deliberately just a single square, because to keep track of this list of three values, I'm going to use just one variable, called, say, list, and store in that variable a pointer, as we defined it last week: the address of the first node. Why? Because the first node can then get me to the second, the second node can then get me to the third, and so forth. So what's the upside now? If I want a fourth value somewhere on the screen, I could put it here, here, here, or here, wherever there's enough room, and just make sure that I update the arrow to point to that next chunk. Update the arrow to point to the next chunk. There's no copying of data. 1, 2, and 3 can stay there until the program quits or we actually free them. But we can just keep adding, adding, adding, growing this data structure in memory. So that is what the world knows as a linked list. In Python, to which you were essentially alluding, a list likewise grows and shrinks automatically, though in CPython, the standard implementation, a list is actually a resizable array under the hood rather than a linked list. Other languages call these vectors, and they too are essentially arrays that can be grown and shrunk automatically, without you having to worry quite as much about it. So how does the code for implementing something like this work? Well, let me propose that we start from this familiar friend, a person, which we said in past weeks has a name and a number associated with them. We know from last week that string is not actually a keyword in C, so that's technically just char *name and char *number, but the same idea otherwise. And this is what we defined in the past as a person. So this is a structure we've seen before. I now need to implement the code equivalent of these rectangles, each of which has an integer and then a pointer to the next such value.
So let me propose that we delete what's inside this structure and change the name from person to node, which again is a generic term for a container of values. Inside of this new node structure, we put literally an int for the number we care about. That's going to be my 1, 2, 3, or 4. And then, and this is a little bit new, let's include in this structure a pointer to the next such node. It's a pointer in the sense that it's an arrow: it's the address of the next node. That's why we say node *. I could call it anything I want, but semantically calling it next makes perfect sense because it points to the next such node. But this isn't quite right. For annoying technical reasons, I need to do one other thing here, which we've not done before: give this structure a temporary name, if you will. So I literally say struct node up here, even though I've already said node down here. Why? Because I technically need to change this line to say struct node *. Long story short, why is this necessary? Well, recall that the compiler reads your code top to bottom, left to right. If we used the word node inside the braces, the compiler wouldn't see the word node defined until down here at the end, so the code just wouldn't compile, because at that point the word literally doesn't exist yet. We saw this with functions in the past, and the solution then was to put the prototype higher up in the file. You can think of this as somewhat analogous: if I give this structure a name on this first line, even if it's redundant with the one at the end, then I can say struct node inside of these curly braces, because the compiler has already seen the word node. So you just have to do it this way. Now that we have this in code, we can start playing around with actually storing these things in memory. So let me go ahead and do this by transitioning back to VS Code here.
Instead of using our array-based implementation, let's implement the first of our linked lists. I'm going to be a bit extreme and delete pretty much everything inside of main. I am, for convenience, going to include the CS50 library, not so much for its string type, but because, as we discussed last week, it's still useful for getting ints and strings, which are otherwise much harder and more annoying to get in C unless you use scanf. So, outside of main, let's go ahead and invent this node, called struct node here. Inside of my curly braces, we'll give every such node a number and a pointer to the next such node, and we'll call this whole thing node by convention. Then inside of main, let's go one step at a time. When we create a linked list, it's initially empty. So how do I represent an empty linked list? Well, I could call the variable list and set it equal to NULL. But what is the data type for a linked list? Per the picture that we had up earlier, all we need is a single pointer, at far left here, to represent the address of the first node in the list. So I dare say all we need to say is that our list is of type node *. That is to say, what is a linked list? It's by definition the address of the first node in the list. So that's the first subtlety here. That gives me a picture with no nodes; it just gives me a single pointer initialized to NULL. Now, for parity with the previous example, let's just do something three times. In this for loop, structured exactly as before, let's allocate a new node, ask the user for a number to put inside of it, and then start stitching things together so as to achieve a picture in memory quite like this. So how am I going to do this? Well, first I need to allocate a new node. How do I do that?
Well, I can use our new friend malloc and allocate the size of a node. I want to store the address of this chunk of memory somewhere, so I propose that we have a temporary variable, which I'll call n, whose type is node *. So what am I doing here? I'm trying to build up this list in memory. I first have a pointer that is NULL, pointing nowhere: no list exists. I then want to create one new node, store a value in it, and then point my list at that node. Then I want to do it again and again, a total of three times. So how do we do this? We allocate space for the size of a node. However many bytes that's going to be (at least 12, since it's four for the int and eight for the pointer, and in practice typically 16 once the compiler pads the structure for alignment), who cares? sizeof will answer that question for me. I'm going to store the address of this chunk of memory inside of a temporary variable called n, for node, and that's why it has to be node *: it's going to point to an actual node. Then I do my quick sanity check: if n == NULL, we can't proceed further, so I just return 1 right away. That's boilerplate code you should be in the habit of writing anytime you use malloc. But if all goes well, let's go to the address in n, go inside of that node, and change its number to whatever the human wants by using get_int to prompt them for their favorite number. Then let's go to that same node and set its next field to NULL for now, because all I want to do is allocate one new node with that number. That's it. Then I'm going to need to stitch this together further. So, after cleaning this up a bit, all we need to do now is make sure that we string these nodes together.
This syntax isn't quite right because, technically, due to operator precedence, I need to dereference n first, as in (*n), and only then go inside of it with the dot. However, if this syntax is looking a little overwhelming, thankfully there's much simpler syntax in C: n->number means go to the node and go inside it to get the number, and n->next means go to the node and go inside it to get next. So the arrow notation that I promised we would now have is the same thing as using the star operator, the dereference operator, parenthesizing it, and then using the dot operator, which is just a pain in the neck to write out all the time. I dare say n->number and n->next are just much simpler. They say go to n and follow the arrow to the number field or the next field, respectively. All right. So the last thing I propose we do, and then we'll make this much more clear in picture form, is prepend the node to the list. By prepend, I mean insert it at the beginning, again and again. I'm going to say n->next = list, then update the list by setting it equal to n. And then, after all of this, I return 0. Okay, that was a huge amount of code, so let me give a quick recap before we paint a picture. Here is my list initially: the foam finger is pointing to NULL, which means the list is of size zero; there's nothing there. Then I ask the computer to do this three times: give me enough memory for a new node, and, after checking that it's not NULL, put the user's favorite number in it and set the next field to NULL for the moment. Then, lastly, prepend this brand-new node to the existing list, that is, put it at the front. So n, at this moment, is pointing to that new node.
And I'm saying, you know what: whatever the current list is, empty or otherwise, set the new node's next pointer equal to the list, and then change the list to point at this new node. So now let's do this more carefully, step by step, in picture form, going through some of these representative lines. Here is the first line of code, even without the assignment. If you just declare a variable called list that's a pointer to a node, what you essentially have is a box of memory that looks like this. It's a garbage value, though, because there's no assignment operator, so who knows what's inside of this pointer. That is why in my actual code I set it equal to NULL, which creates the same box in memory but gets rid of Oscar the Grouch and puts the NULL value there. So we know it's not a garbage value; it's the pointer known as NULL. That's what that very first line of code did in the computer's memory. The next thing I wanted to do was allocate enough memory for a node, not a node *, but a whole node. I want that whole rectangle given to me in memory. malloc is going to return to me the address of its first byte, and I'm going to store that in a temporary variable called n. So at this point in the story, n is a pointer of its own, another box that initially would hold a garbage value, but because I am using the assignment operator, it's going to point to the chunk of memory that malloc, if successful, allocated for me. So n, for all intents and purposes, points at that chunk. The values inside are still garbage values because it's just a chunk of memory; who knows what it's been used for before? That's why, after this line of code, I took care to get an int from the user and then initialize the next pointer to NULL.
So, for the sake of discussion, let's get rid of get_int for the picture and just say the human typed in the number 1 initially. That's equivalent to putting 1 in the number field by first going to the address in n and then dereferencing it, using the star and the dot notation, respectively. That means follow the arrow and then change number to the value 1. Equivalently, you can use the arrow notation, and thankfully now the C syntax lines up with what the pictures we've been drawing look like: go to n, follow the arrow to the number field. That's literally what the syntax is telling me. Meanwhile, if I use that same syntax again, n->next = NULL, that's like saying go to n, follow the arrow, and change the next field, in this case, to NULL, or we'll just blank it out to be clear. So at this point in the story, we have allocated the node and stored 1 and NULL in it. The list is still NULL, and n is pointing to this node, but the whole point of this exercise is to add this node to the list. So we need to somehow update list, which is why ultimately I'm going to do something like list = n. That might seem a little weird semantically, but recall that n is a pointer: it's the address of the node, 0x123 or wherever that is. So to point list at the same node, we just set list equal to n, because then we'll effectively have an identical arrow from list pointing at that new node. At this point, I don't even care what n is anymore. It was always meant to be a temporary variable. This now is my list. So even though I already did it in code, preemptively, in a loop, the first iteration of that loop literally created this in memory. Let me pause before we go through numbers 2 and 3 for any questions, because the VS Code version looks scary. This is perhaps a little more bite-sized. Okay. So how about we do this twice more, for 2 and 3, respectively.
So, again, inside of our loop, we're back to this line, which asks the operating system for enough memory for the size of a node and stores that address temporarily in a variable called n. So here's our friend Oscar brought back onto the screen. Maybe the new chunk of memory is over there. This effectively points n at that chunk of memory. The next relevant line of code inside of that loop is this one, and we'll get rid of get_int and just pretend that I literally typed in 2. We're going to go to this version of n, follow the arrow, go to the number field, and set it equal to 2. In the next line of code, we start at n, follow the arrow, and change the next field to NULL. And then, same as before, we now need to update list to equal n. But something's about to go wrong here. If I update list to point to the same node that n is pointing at, watch what happens. I set list equal to n, and because n is temporary, it might as well go away at this point. But what have I done wrong logically here? Yeah? >> You lost the arrow to... >> Yeah, I lost the arrow to the original node. I have orphaned the first node, because now nothing in my code is actually pointing at it. I've got, in duplicate, two pointers pointing at this one new chunk of memory. So even though we as humans can obviously still see that first node, we have lost track of it in code, which is the definition of a memory leak. I can never get that memory back or give it back to the operating system until the program itself finally quits. So I think I need to be a little smarter and not do this line quite like this yet. What I want to do, and I've rewound, so list is still pointing to the original list and n is pointing to only the new node, is something like this. And this is why the code was fairly non-obvious in VS Code at first. Go to n, follow the arrow, go to the next field, and here's the cleverness: point this pointer at whatever the existing list is pointing at.
So if the existing list is pointing here, that just means, hey, point this at the exact same thing, because now I can safely update the list to point at the same thing as n. So its arrow now points here. And even when I get rid of n, I wonderfully have the whole thing stitched together. The metaphor I often think of is how, around Christmas time in olden times, people would stitch popcorn together on a thread. That's kind of what you're doing here: you're trying to stitch together these nodes, or popcorn kernels if you will, such that one can lead you to the next, which can lead you to the next, and so on, but you can never let go of part of that strand in the process. So here now we have a list, which is great, because notice we haven't touched the 1 but we've added the 2. We can go ahead in a moment and add the 3, but you can perhaps see where this is going: I'm kind of doing it backwards by accident, but we'll get there soon. So now let's allocate a new node and run through, in our mind's eye, all of those same steps. I'm going to hopefully end up with a list that now looks like this. Even though it's kind of long and stringy, these values could be anywhere in memory, but because of these various pointers, I can jump from one location to the other, making more efficient use of everything inside of the computer's memory. All right, but of course, we've got this symptom that I didn't really intend, whereby the whole darn thing is backwards. I think that's kind of okay for now. But I'd like to propose that we consider how we can now traverse this thing and actually print out the values in memory. So let's go back to VS Code here. At this point in the story, we've got the same code that implements that same idea, except I'm using get_int so that I can dynamically type in the 1, the 2, and the 3 without having to hard-code them into the actual code.
Suppose that after doing this exercise, I actually want to do something interesting, like print the numbers. Well, we don't have that code yet in this version of my program, so let's bring it back. Last time I did this just using a for loop and array notation, and I think I can do something similar. But let me propose first that I implement this idea pictorially. Here's the same diagram. This is what exists in the computer's memory. If I want to print out these numbers, albeit in reverse order, let me propose that we do this by giving ourselves another temporary variable. We'll call it ptr, pointer for short. That's like having another foam finger that points at the start of the list. So it's not pointing at list; it points at whatever list is pointing at, which means here. Then I can print out the 3 pretty easily. So long as I next update the pointer to point to the 2, I print it out, then point it to the 1 and print it out, and eventually I'm going to realize, oh, I'm out of nodes, because the end of this list is NULL. So that's the idea I want to implement now logically in code: create a temporary variable called ptr, set it equal to whatever the list itself is, print out the value, update the pointer, print out the value, update the pointer, print out the value, update the pointer, realize it's NULL, and stop. In code, it's a relatively small loop, even though the syntax is still pretty new, since we only started playing with memory last week. What I'm going to do is exactly what I proposed. I'm going to create a new pointer called ptr and set it equal to the list itself. That's like having another foam finger temporarily pointing at the first element in the list. Then I'm going to say: while that temporary variable is not NULL, go ahead and traverse the list. What do I mean by that?
Well, let's print out the current element in the list using %i\n, printing whatever the pointer is pointing at, specifically its number field, ptr->number. That is, follow the arrow and print out the number. Then, still inside of this loop, after printing, I'm going to update my temporary variable ptr to be equal to ptr->next. With just those few lines of code, that implements precisely this idea. I first set ptr equal to list, which happens to point here first. I then do my printf, and then I update ptr to be the value of ptr->next, that is, follow the arrow and grab the next field. So if that next field is 0x456, for instance, that is what's now in ptr, and the arrow effectively points there. On the next pass through the loop, I print out this number with %i, then follow the arrow to the next field and set ptr equal to whatever is there, 0x789, so I effectively move the arrow there. Then, lastly, I update ptr to the value of this last next field, which is NULL, which means ptr itself is now NULL. And that means the while loop cleverly stops, because I was supposed to keep looping only while ptr is not NULL, and ptr is now NULL. And just as an aside, if you prefer the semantics of a for loop, there's nothing new here per se. I can do this exact same thing using a for loop, and it's a little tighter. Instead of int i = 0 as in that old approach, I can actually use pointers in a for loop like this: for node *ptr equals the start of the list; keep going so long as ptr != NULL; and on each iteration of this loop, update ptr to equal whatever its own next field is.
And then, inside of this for loop, print out using %i\n the current pointer's number field, semicolon. So here again we see the equivalence of for loops and while loops: what you can do with one, you can do with the other. This version is a little more elegant in that you can express a whole lot of logic in one line of the for loop. Frankly, though, I do think the first version is more readable, so let me undo everything I just did. On the course's website you'll see both of these versions; the first is a little more pedantic about what it's doing step by step. Okay, that too was a lot. Let me pause here to see if there are any questions. And if you're feeling like that fire hose, this is why we transition to Python in a week, where all of this gets swept under the rug; it's still happening, just not by us. Questions? Yeah. Yeah, really good question. So here I've been preaching that we don't want to lose memory, we don't want to leak memory, and here I am fairly extravagantly spending at least twice as much memory to maintain this data structure. That's going to be among the themes with all of the data structures we talk about. If we want to gain some benefit, like dynamic growth and shrinking of the data structure, you've got to give me something. And what you've got to give me in this case is more space. In a bit today, and after break in particular, we're going to decide we'd really like these algorithms to be faster. Well, that's fine, but you're going to have to give me something in return: you're going to have to spend more space to make the code faster. So time and space, financial cost and human time, and any number of other resources are all things that you need to evaluate as a programmer or a manager, deciding which is least and/or most important to you. And right now I don't care about space as much as I care about the dynamism that I'm trying to achieve first. Other questions on here? Yeah.
>> Yes. Why am I using ptr instead of n? Well, yes, I could reuse n at this point. I deliberately chose to use ptr for two reasons. One, I'm using it for a different purpose here. Two, it's not necessarily the best idea to use one variable up here for a specific purpose and then reuse the name down here; besides, n is out of scope at this point anyway. So it just makes me feel better to have different variables doing different things, but it would not break if I did it your way. Other questions? Yeah, in back. >> Are pointers temporary? Not necessarily. The linked list we are building up in memory exists because we are using pointers to build this data structure and to keep it intact for as long as the program is running. My temporary variables, n and ptr in this case, are ephemeral, and I'm only using them to stitch things together temporarily. A good question. All right. So let's now motivate why we're spending so much time stitching these things together so carefully. Here's our little cheat sheet of common, but not exhaustive, running times. Let's consider the running time of some fairly basic operations, like inserting a number into a linked list, searching for a number in a linked list, traversing it, and ultimately deleting numbers from a linked list. So here is my list, initially completely empty. Suppose I insert the 1, then the 2, then the 3, using code like we just wrote. I love this approach because, even though it looks a little scary at first, this is probably the simplest way to implement insertion into a linked list. Why? Because I'm just constantly prepending the next element: prepending, prepending, which means all of my hard work happens here at the beginning of the list.
So even if this thing has a thousand elements in it, I'm only manipulating some pointers all the way over here, pictorially, at the left, which means it's pretty darn fast. So, given that definition and this picture, what would you say the big O running time is of insertion into a linked list using my current implementation? >> Big O of one. Why? Well, it's not literally one step, but it is a constant number of steps: if we literally counted the lines of code I was executing, it's a few steps to point one thing up here, point the other thing down here, and then update the third, and boom, we're done. In particular, what my current code does not care about is the length of the list, because I'm never traversing the whole thing for the insertion part. I am, obviously, for the printing part, but for insertion I'm just prepending again and again. The downside of this approach, though, is that the whole darn thing comes out backwards. I'm not doing anything with regard to the ordering of these elements, which means, what's the running time of search going to be? For instance, if I tell you to search for the number 1, what's the running time going to be in big O? Big O of... yeah, big O of n, because in the worst case it's going to be all the way at the end. We've seen this scenario before. So it's big O of n for searching. It's definitely big O of n for traversing or printing, but that goes without saying: if you want to print every element, obviously you have to touch every one of the n elements. But what about deletion? Suppose I want to delete an element. That's going to be in big O of... >> n. >> Also n. Why? Because, again, in the worst case it could be all the way at the end. So only insertion, as currently implemented, is big O of one, because we are exercising full control over where the new elements go, irrespective of what the actual values are.
So things could escalate quickly here if we actually want to keep things, say, in sorted order, because we can no longer just naively plop things at the very beginning of the list. We need to start being a little more careful about where we put things. So even though we're doing okay on insert right now, we still have big O of n for searching and for deletion, which we won't do in code, as well as, of course, for traversal. So how else might we go about building this list? Let me propose that we could append to the end of the list instead. Let's try that and see if it gets us anywhere better. Here's my list, initially completely empty, aka NULL. I insert the number 1 as before, but now, with this algorithm, I insert the number 2 and then the number 3 after it. This is great, because now, by chance, it ended up beautifully in order. But that's only because I chose the numbers 1, 2, 3; we'll come back to that detail. Let's consider the running time of this algorithm for insertion, appending to the list. What's the big O running time of insertion now? Big O of n. So it's strictly worse, because now we always have to walk to the end. Now, I could be a little smart about it. I could allocate another pointer and always keep it pointing at the end of the list, just as I have a pointer pointing to the start of the list. That's totally fine if you're willing to spend one more pointer, which is a drop in the bucket, and it's a legitimate solution. But where I'd like to go with this is to maintain sorted order no matter the order in which the numbers are inserted. Whether it's 1 2 3, 3 2 1, 2 1 3, 3 1 2, or whatever order the human types in the numbers, I want to build the structure such that they always end up in sorted order, just so that my contacts in my iPhone or my Android phone, for instance, are sorted as intended. So how do we go about doing that?
Well, here we're still dealing with some big O. Let's try this. Here's my list, initially empty. Now the user inserts number 2 first, so it ends up there. Then they insert number 1; I'd like it to go there. Number 4 goes over there. And then number 3 ends up here. Even though it's sort of obvious with a piece of paper and pencil how to stitch this together, this now takes an annoying number of logical steps, because there are so many opportunities where I could screw up and orphan one or more of these nodes. But let's consider the scenarios we might encounter. Maybe we get lucky and it's an empty list and we just have to insert one new node. That is trivial; we've done that already. The 2 was super easy to implement. The 1 could be really easy to implement too, because that involves the prepending scenario, and we've seen that prepending is super simple. So there are only two other scenarios to consider: appending, if it's a really big number that ends up at the end, which we've talked about but haven't seen code for; and, the annoying one I dare say, when the new number belongs in the middle. But I propose to think through it this way because now you just have four problems to solve, not one massive, ill-defined problem. You've got scenarios in which you want to insert a new node into an empty list, prepend the new node to the beginning of the list, append it to the end of the list, or insert it somewhere in the middle. So that's like four blocks of code in my program, and I can now take the proverbial baby steps and implement this bit by bit. To do this, in a moment I'll switch over to VS Code, but, sort of Julia Child style, I'm going to open up a pre-made version of the program that actually gives us a working solution, albeit initially with some bugs.
So here we have, out of the oven, this version of list.c. At the top of the file I've got the same includes as before, and I've got the same structure as before. Here I've again got int main(void), and I've got the beginning of my list here, set equal to NULL. Then, for the sake of discussion, I'm going to insert three values for this example, 1, 2, and 3, by allocating enough room for a node and storing its address in n. Then, as a sanity check, I make sure that n is not NULL, and then I populate the node with the human's choice of value. So let me scroll down. As such, there's nothing too new just yet. Here we have the lines of code in which I'm getting an int from the user, setting next equal to NULL, and then, per our earlier version that we did on the fly, prepending this new node to the list no matter what and updating the list to point to it. And then down here, I'm printing the numbers. So this is where we left off, but this is a pre-made version that's nicely commented; it's on the course's website for reference. What I'm not doing yet is intelligently prepending, appending, or plopping the node in the middle. So how do we do that? Let's take a look at this version of the code. Everything thus far is the same. And if I scroll down, besides the new comments, you'll see that now I'm starting to make some decisions after I have allocated the new node and populated its number and next fields. As an aside, in the earlier prepend-only examples I didn't strictly need to initialize the next field to NULL, because n->next = list overwrote it anyway. Now, however, this node might end up at the end of the list, where its next field must be NULL, and in general it's good defensive programming to initialize pointers to NULL before you're ready to assign their actual value. So here's the first of the questions I'm going to ask myself: is the list into which I am inserting this new node empty? Then we're at the beginning of the story, and it's super easy.
Just set the list equal to the address of that new node, and we're done. That's what happened a bit ago when I inserted the number 2 for the very first time: the list, previously empty, now contains only a node containing 2. Thereafter, however, there was another scenario. When we moved on in our story and added the number 1 to the list, it happened to end up at the beginning, but a new number could also end up at the end or in the middle. So let's break down those scenarios here too. If it is not the case that the list is empty, we end up here in the else. What do I want to do here? Well, for now, in this simplified version, let's append the new node to the end of the list so we can see that code. How do I do this? I'm using a for loop much like the one I had before, which lets me traverse the existing list whether it has one node or many. And I ask a question: if following the current node's next field leads me to NULL, aka the end of the list, then update that last node's next field to point at the new node. In other words, if I'm following all of the arrows and reach a node whose next field is NULL, no problem: update that next field to point to the new node I want to insert. Irrespective of the values, I just want to append this node no matter what. And then I break out of the loop. At the bottom of this version of the program, it's all quite the same, printing out the numbers using the for loop version of my code from before instead of the while loop, but they're equivalent. What I did do in advance, in baking this version of the program, is also go through the motions of freeing every one of the nodes afterward, but we'll come back to that. So this version of the code, just to be clear, only appends nodes to the list. It's still not keeping things in order.
But we've now plucked off two of the scenarios: the list is empty, or it has numbers and we want to put something at the end. So let me propose that I take out of our distribution code another version of this program that does that and a bit more. I'm going to open up, in just a moment, a new and improved version of list.c. It looks almost the same at the top. Scrolling down, scrolling down, scrolling down, here's some now familiar code. If the list is empty, do that simple thing as before and just set the list equal to the new node. But here is where we're adding some inequality. If the number in question belongs at the beginning of the list, that is, if the number in the new node n is less than the number in the current list, which is presumed to be the first node at the moment, then update the new node's next field to point at the existing list, and then update the list to point at this new node, thereby going from 2 in the list to 1 and 2 in the list. To be clear, if I go back to VS Code here: because 1 is less than 2, of course, I'm going to update the new node's next field to point to the list. What does this mean? The new node at this point in the story is the node for the number 1, because that's the second thing we're inserting. I update its next field to be whatever the list was already pointing at. This picture is the after effect, but a moment ago, list was pointing at only the 2. So now the next field of the 1 points at the 2, and then, lastly, in this line I update the list pointer to be the address of that new node. And here's where I'll wave my hand a little bit today, because it starts to escalate quickly. It's useful, and it might very well be useful for problem set 5 in particular, but I think it's more healthily reviewed step by step at a slower pace.
Here is where I'm asking myself: all right, if it's not the only element in the list and it doesn't belong at the beginning of the list, it belongs somewhere later in the list, which gives me two final scenarios. Let's figure out which scenario we're in. Let's use this for loop to iterate over as many of the nodes in the list as we need to. Suppose we get all the way to the end, because the current node's next pointer equals NULL. It's like following the arrows, following the arrows, and maybe we're trying to insert the number 5. I've already hit the number 4, and after it I hit NULL, so 5 belongs at the end. So here we have our promised append code, which is exactly the same as before, but now I'm doing it conditionally, only if I've indeed found my way to the end of the list. And then lastly, let me scroll down just a little bit. If it's not the case that the list is empty, and it's not the case that the new node belongs at the beginning, and it's not the case that the new node belongs at the end, then I'm somewhere in the middle of the list, because the new number I'm inserting is less than the one I'm looking at here. It's okay to use two arrows in a row, but I'll wave my hands at that for now. These three lines, two pointer manipulations and a break, are what stitch that 3 in between the 2 and the 4. For lecture's sake, take on faith that this collectively does stitch things together properly. But as you'll see in problem set 5, it's a much better exercise to think this through more carefully step by step, because there's a lot of fine-tuning of these pointers together, and the order of operations does matter. At the very end of this program, notice this part is kind of mindless, even though the syntax is undoubtedly less familiar. Just like traversing the whole list to print it out, we can do one more pass over the linked list and free every one of the nodes.
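A minimal sketch of the four-scenario sorted insertion walked through above (empty list, beginning, end, middle), assuming the same `node` type. The helper name `insert_sorted` is hypothetical; the lecture's version is written inline in `main`. Note the two arrows in `ptr->next->number`, which peek at the node after the current one.

```c
#include <stdbool.h>
#include <stdlib.h>

typedef struct node
{
    int number;
    struct node *next;
} node;

// Hypothetical helper: insert number so the list stays sorted, smallest first.
// Returns false if malloc fails.
bool insert_sorted(node **list, int number)
{
    node *n = malloc(sizeof(node));
    if (n == NULL)
    {
        return false;
    }
    n->number = number;
    n->next = NULL;

    // Scenario 1: the list is empty
    if (*list == NULL)
    {
        *list = n;
    }
    // Scenario 2: the new number belongs at the beginning
    else if (n->number < (*list)->number)
    {
        n->next = *list;
        *list = n;
    }
    else
    {
        for (node *ptr = *list; ptr != NULL; ptr = ptr->next)
        {
            // Scenario 3: reached the last node, so append
            if (ptr->next == NULL)
            {
                ptr->next = n;
                break;
            }
            // Scenario 4: the new number belongs between ptr and ptr->next
            if (n->number < ptr->next->number)
            {
                n->next = ptr->next;  // first, link n to the rest of the list
                ptr->next = n;        // only then splice n in after ptr
                break;
            }
        }
    }
    return true;
}
```

Swapping the two lines in scenario 4 would overwrite `ptr->next` before it is copied, orphaning the rest of the list, which is why the order of operations matters.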
But notice that freeing a list is not quite as simple as just saying free the whole list. free is not that smart. malloc is not that smart. Even though you've called malloc one, two, three times, you have to call free one, two, three times. You can't just pass in the beginning of the linked list and say, you figure out what to delete, because free has no idea what a linked list is or what your data structure actually is. So the reason this loop is a little complicated is that, with these three lines, I'm traversing my list while making sure that when I'm ready to free the 1, I already have a pointer pointing at the 2, and then I free the 1. I update my pointer to point at the 3, and then I free the 2. I update my pointer to point at the 4, then I free the 3, and then I free the 4. So there's a bit of trickery involved in making sure you don't orphan nodes step by step. Okay, that was a lot. Let me pause here to see if there are in fact any questions, even though we're deliberately waving our hands at some of those details. Questions on this? Now, let me add one final flourish. If we were to really quibble, my god, we're up to 80 lines of code already just to implement the numbers 1, 2, 3, 4. But there are some subtle bugs in here at the moment. For instance, suppose that something goes wrong with malloc inside of this for loop here, and suppose it's not your first iteration: something goes wrong on maybe the second or the third iteration. Why is this error check suddenly bad as I've implemented it? Yeah, I didn't free the memory from the previous iterations.
So this is where memory management starts to get really annoying. If you want to practice what I've been preaching, which is to free any memory you've allocated, and you've already allocated one, maybe two nodes by the time malloc fails, maybe on the last iteration, you have to somehow go back and free all of that. And that's fine: we have code at the bottom of my file that traverses the existing list and frees it all. So I could just copy and paste that code into my if condition and run it there too, to delete the whole list. But if you're copying and pasting, you're probably doing something wrong. So let me propose a final version of this, just for your reference later: version nine of this file, list9.c, counting from zero. Give me one second to make a quick copy of our last version. Here we have the following: in my main function, I have the exact same code as before, but I've taken the liberty of implementing an unload function so that I can call it here, in the error check, as well as at the bottom of main. So I can unload the list here or unload it there. All I've done, in good form in terms of design, is implement the notion of deleting a linked list in its own function, so I can call it any number of times from any number of places. Just so you've seen how I might do that. All right. So let's ask the question after all of this: what is the running time of inserting into a linked list? In big O? >> N. Damn it. That's no better. All right, what's the running time of searching a linked list? >> Big O of N. Damn it. What's the running time of deleting from a linked list? >> Big O of N. So literally everything is big O of N. That's the price we've suddenly paid.
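A minimal sketch of the idea behind the final version: factor the freeing loop into an `unload` function (the name the lecture uses) and call it from the malloc error check too, so nodes from earlier iterations are not leaked. The `build_list` helper and its parameters are hypothetical, standing in for the loop in the lecture's `main`.

```c
#include <stddef.h>
#include <stdlib.h>

typedef struct node
{
    int number;
    struct node *next;
} node;

// Free every node in a list, one call to free per node.
// Save the next pointer BEFORE freeing, because a freed node must not be read.
void unload(node *list)
{
    while (list != NULL)
    {
        node *next = list->next;
        free(list);
        list = next;
    }
}

// Hypothetical helper: build a list by prepending each value in turn.
// If malloc fails partway through, free the nodes from earlier iterations
// so nothing leaks, then report failure by returning NULL.
node *build_list(const int values[], size_t count)
{
    node *list = NULL;
    for (size_t i = 0; i < count; i++)
    {
        node *n = malloc(sizeof(node));
        if (n == NULL)
        {
            unload(list);
            return NULL;
        }
        n->number = values[i];
        n->next = list;  // prepend: the new node points at the old front
        list = n;
    }
    return list;
}
```

One design caveat of this sketch: an empty input also yields NULL, so a real program would need a separate way to signal failure.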
An hour after we started with arrays, we've gotten to the point where we can dynamically grow a linked list and, I dare say, even though we've not done it and won't do it today, shrink the linked list by freeing things we don't need. So we have the dynamism, and we can make more efficient use of memory even if it's very fragmented, with a few bytes here and a few bytes there. But we've paid a price, because with arrays, recall our phone book example, we at least had binary search, the running time of which was big O of log n. So, my god, not only are we spending more space, the darn thing is slower. Surely this is not how our phone contacts are implemented. Surely this is not how stacks and queues are always implemented. And indeed it's not. This is just going to be a stepping stone to a sort of mashup of data structures, whereby we take the best features of arrays and the best features of linked lists and mash them together to get new and improved data structures. But for that, we're going to have to have some cookies first, and we'll come back in 10 minutes. Cookies are now served. All right, we are back. So let's recap how we got here and why. We started with our old friends arrays, which we introduced in week two. Recall that the whole appeal of arrays was, one, that they're relatively simple, certainly now in retrospect, but more importantly, they're really darn fast. Because arrays are stored back to back, contiguously in memory, we could do very simple arithmetic to figure out the length, divide by two to get the middle, divide by two again to get the middle of the middle, and so forth. Even though we might have to deal with a little bit of rounding, arrays lent themselves to binary search and thus logarithmic time, big O of log n. But today I claim that the downside of arrays is that you have to decide in advance how big you want one to be.
And if you guess wrong and ask for too little memory, you then have to reallocate memory. That's solvable with malloc or realloc, but it's going to take some amount of time to copy all of the old memory into the new memory, whether you do it with a for loop or realloc does it for you. We only did it with three values, maybe four. But imagine it being 3 million values that you now need to allocate more space for: you're going to waste a huge amount of time copying 3 million values from the old location to the new. That's just generally not very appealing. And that motivated our whole discussion of linked lists, whereby we can create a more dynamic data structure in which we allocate memory only as we need it. We don't have to worry about underestimating, or overestimating and therefore wasting memory. We can just go bit by bit: for each new value, we allocate another node, another chunk of memory, and the thing just grows and grows and grows. We also avoid the inefficiency of moving stuff around in memory: once allocated, the nodes can stay where they are, and we just update our pointers. But as we saw just before break, the downside is that all of our running times, for searching, inserting new elements, and deleting old elements, would seem to be big O of N. Why was that? Recall that a linked list might look a little something like this, whereby we have a pointer called list pointing to maybe four values. Suppose that we want to search for a value. It's nice that our latest version of this linked list was sorted from smallest to largest, because that was always a precondition of doing binary search. But even though it's obvious to our human eyes where the middle is, roughly over there, how is the computer going to figure that out? How is the code that you write going to figure that out?
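The copying cost mentioned above can be sketched by growing an array by hand: allocate a bigger block, copy every old element, then free the old block. The helper name `append_to_array` is hypothetical. `realloc(list, (size + 1) * sizeof(int))` does the same job and can sometimes avoid the copy if there happens to be room in place, but in general the work is proportional to the number of elements.

```c
#include <stdlib.h>

// Hypothetical helper: return a new array holding the old size elements
// plus value at the end, or NULL if malloc fails (the old array is then
// untouched and still owned by the caller).
int *append_to_array(int *list, size_t size, int value)
{
    int *tmp = malloc((size + 1) * sizeof(int));
    if (tmp == NULL)
    {
        return NULL;
    }
    // This copy is the O(n) cost: 3 million elements means 3 million copies
    for (size_t i = 0; i < size; i++)
    {
        tmp[i] = list[i];
    }
    tmp[size] = value;
    free(list);
    return tmp;
}
```

Usage: after `int *bigger = append_to_array(numbers, 3, 4);` succeeds, `numbers` has been freed and only `bigger` should be used.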
To find the middle of a linked list, unfortunately, the way we've stitched it together with these pointers means you've got to start at the beginning and traverse the whole thing to figure out how long it is, then do it again and stop halfway through once you know roughly where the halfway point is. Then, if you want to search the middle of the middle, you've essentially got to do that whole process again. So just to use binary search, you need to spend big O of N steps even to find the middle. Now, if your mind is kind of spinning and you're thinking, well, maybe I could cheat and use a pointer that always points to the middle of the list: totally fine. You can spend some additional space to remember the middle of the list, or the end of the list. But where does that stop? With binary search, you go not just to the middle, but to the middle of the middle, the middle of the middle of the middle, and so on. Are you going to keep around a pointer to every element? If you do, you're essentially back to an array, with one pointer for every other location. So it just kind of devolves into a mess, even though there are some minor optimizations we could in fact make. Here's one we haven't talked about yet. Ours is a singly linked list, linked with a single pointer from node to node, and one common alternative is what computer scientists call a doubly linked list, with arrows going in both directions. That would actually have simplified some of the last code we looked at, because I wouldn't have to look ahead to figure out what I want to free or where I want to insert some value. But that too doesn't fundamentally change the speed; it just makes your code a little easier to write. So, in short, with linked lists we get dynamism: we can now grow and shrink things without wasting time copying. But we've lost hold of our binary search.
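A minimal sketch of why even locating the middle of a linked list is O(n): without random access, it takes one full pass to count the nodes and a second pass to walk halfway. The helper name `middle` is hypothetical.

```c
#include <stddef.h>

typedef struct node
{
    int number;
    struct node *next;
} node;

// Hypothetical helper: return the middle node of a list (NULL if empty).
node *middle(node *list)
{
    // Pass 1: count the nodes, O(n)
    size_t length = 0;
    for (node *ptr = list; ptr != NULL; ptr = ptr->next)
    {
        length++;
    }

    // Pass 2: walk halfway, another O(n / 2) steps
    node *ptr = list;
    for (size_t i = 0; i < length / 2; i++)
    {
        ptr = ptr->next;
    }
    return ptr;
}
```

With an array, the same answer is a single index computation (`length / 2`), which is exactly what binary search relies on.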
Binary search was very appealing as far back as week zero, when we wanted to do something quite quickly. So let's see if we can make some mashups now: take some arrays, take some linked lists, literally mash them together into a sort of Frankenstein data structure, and see if we can get some of the speed of arrays with the dynamism of linked lists. And so I give you trees. Think in your mind's eye about what a family tree looks like, where you typically have some parents, then some children, then some grandchildren, and so forth. It's this sort of tree-like structure, even though by convention it's drawn top down instead of bottom up like trees in the real world. The top of that family tree we're going to call the root of the tree; it just so happens to grow down. A tree is a very common data structure, and it's interesting vis-à-vis arrays and linked lists in that it's the first of our two-dimensional data structures. An array is effectively just a single dimension, from left to right. A linked list is essentially the same: even though in reality its nodes might be up, down, left, and right in memory, it's still just one thing stitched together in a single dimension. A tree now adds a second dimension. Specifically useful for us is what we're going to call a binary search tree, which, spoiler, is going to give us back the ability to use binary search. But we're going to store the data a little more cleverly than in arrays alone: instead of storing our data in one dimension, in a binary search tree we store it, in effect, in two different dimensions. And that's going to gain us some speed. Here, for instance, is an array of seven numbers as we might have seen it back in week two, when we first introduced arrays. Let me draw our attention to the middle element, then to the middles of the middles, and then to the middles of the middles of the middles, just by color-coding them slightly differently.
If I were to run binary search on these numbers, or on the lockers we had on the stage a few weeks back, I would jump to the middle, then the middle of the middle, and so forth. The catch, though, is that implemented as an array, it's not going to be very easy to add new values. Why? If I want to add the number 8 or 9 or 10, I might get lucky and there might be room in memory right here, but I might get unlucky, in which case we've got to start jumping through those hoops of malloc or realloc and copying all of this memory to a new location. That's doable, and we solved it in code, but it's going to be slow for larger data sets. So can we avoid that? Well, maybe I deliberately color-coded things like this, because let me propose that instead of storing these seven values in an array, we store them in a family-tree-like structure like this, where I've just kind of exploded them vertically along the y-axis. So now the middle element, the 4, is at the top of this tree. The 2 and the 6, which were the middle elements after the middle, are to the left and right of the 4. And then these leaf nodes, so to speak, are the middles of the middles of the middles. We borrow a lot of vernacular from the world of actual trees: these are leaves in the sense that they themselves have no children and sit at the edge of the data structure. All of the data is still there; I've just exploded it from one dimension to two. And now that we have this technique of using pointers, which we've used in C code but can also depict pictorially with arrows, let me propose that we stitch together these seven values in memory using a bunch of pointers, whereby each of these nodes, drawn as a single square for simplicity, has not only an integer associated with it, and not just one pointer, but, per these arrows, as many as two pointers.
So our nodes are about to go from data structures with two things, a number and a pointer, to three things: a number and two pointers, for the left and right child respectively. And now that we have a two-dimensional tree data structure, consider how you might find a number therein. Suppose I'm searching for the number 5. I start at the root of the data structure. Even though our human eyes obviously know where we're going, notice what's important about this binary search tree. If I go to the root of the tree, I see the 4. 4 is obviously less than 5. What does this mean? It means I can divide and conquer the problem right off the bat. I know that 5 is going to be to the right of this node, which means, effectively, if you think in your mind's eye about snipping the branch there, I've just halved the problem, much like dividing the phone book in half. Why? Because I don't even waste time looking at this subtree, the left child of the 4. Meanwhile, if I go from the root to its right child here, I see the number 6. 5, of course, is less than 6, so this is effectively like snipping off that right child: I don't need to go further there, because I know a smaller element is going to be in this direction. And that's the key property of a binary search tree. It's not just a family tree with numbers all over the place; the numbers follow a certain pattern.
Every element is going to be greater than its left child and less than its right child, assuming you don't have identical values. And that property is actually a recursive one, to borrow terminology from a couple of weeks back. Recall that a recursive function is one that calls itself, and a recursive data structure, like the pyramid in Mario, is one that can be defined in terms of itself. Well, the binary search tree property is recursive in so far as, if it applies to this node, it also applies to this node. Case in point: 2 is greater than 1 but also less than 3. It's true over here: 6 is greater than 5 but less than 7. And it's technically true of the leaf nodes, because the definition is at least not violated there, since they don't have any children themselves. So this is a binary search tree because of that pattern. This then invites the question: how long does it take us to search for a value in a binary search tree? If the number is 5, it's going to take me one, two steps. But if there are n elements here, does someone want to generalize that, either mathematically or just instinctively? Big O of log n. Even if you're not quite sure how the math works out, anytime you take a data set and halve it, halve it, halve it, we're talking about log base 2 of n again. And indeed, that describes the height of this tree. The height of this tree is essentially log base 2 of n: if n is 7, log base 2 of 7 is about 2.8, which is essentially 2 when we round down, the two steps from the root to a leaf. If we round up, or if we've got 8 elements, log base 2 of 8 is 3, since 2 to the 3rd is 8, which matches the three levels: 1, 2, 3. It kind of works out, even if I'm doing that a bit quickly. The height of this tree is log base 2 of n, aka big O of log n. How long does it take to insert? I think it's going to take log n as well, because I can insert over here or over here or over here, depending on where the number goes. How long does it take to delete? I'll claim it's going to take about the same.
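To make the height arithmetic above a bit more concrete, here is a short derivation, assuming a perfectly balanced tree and counting height in levels (so a single node has height 1). Since a search visits at most one node per level, the number of levels bounds the number of comparisons.

$$
\begin{aligned}
\text{max nodes in } h \text{ levels} &= 1 + 2 + 4 + \cdots + 2^{h-1} = 2^h - 1 \\
n \le 2^h - 1 \;&\Longrightarrow\; h \ge \log_2(n + 1) \\
n = 7:\quad h &= \log_2 8 = 3 \text{ levels, i.e. } 2 \text{ steps from root to leaf}
\end{aligned}
$$

So for the seven-number tree, a search makes at most 3 comparisons, and in general the balanced height grows as $O(\log n)$.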
So wow, we're back in business. I now have the ability to grow and shrink my data structure, because if I want to insert the number 8, it's going to go right there. If I want to insert, like, 5.5, I can see where I would put it. It's going to be easy to add new nodes just by updating the pointers, without copying everything in memory like we had to for arrays. But there is a downside here; I've got to concede something. What price am I paying? What's the trade-off here to gain that dynamism and that speed? >> Each individual node takes more memory. >> Yeah, I'm literally using three times as much memory now because, even though it's not depicted here explicitly, each of these squares represents an integer and a pointer and another pointer. So that's something like 20 bytes of memory per node (a 4-byte int plus two 8-byte pointers on a typical 64-bit system) instead of just 4 bytes for each of the integers in an array. Nowadays, though, space is pretty cheap. We all have very large Dropbox folders, iCloud folders, and the like, so it's not really a big deal to use that many more bytes. Certainly not a big deal for seven numbers, but if it's seven million numbers, maybe this isn't the best data structure to use, even if speed is important. You've got to decide, ultimately, based on your actual use case, what matters more. So, in short, you can think of a binary search tree as an amalgam of, or rather a variant of, a linked list in which every node has as many as two pointers instead of one, which is what gives us this second dimension. And in fact, this translates pretty nicely to code. Recall how we implemented a node in a linked list: it looked like this, with a number in each node and a pointer to the next element in the linked list. Well, for a binary search tree, we can borrow this as inspiration and make a little more room, because we need two pointers instead of one.
And I'm just going to call the two pointers left and right, for the left child and the right child. But here is the three times as much space, give or take, because I now have three fields associated with each node: two pieces of metadata, plus the one piece of data I actually care about, to stitch this thing together. All right. If this is the data structure, how could I implement search in code? Here's where recursion again comes into play. A binary search tree is recursive in nature: what you can say about this node, that it's greater than its left child and less than its right child, can be said of this node and this node and this node and this node. You can leverage that beautifully in code like this. Suppose I'm implementing a search function in C whose purpose in life is just to say yes or no, true or false, the number you're looking for is in this tree, which might be a useful thing to check in an algorithm. search is going to take two arguments, I propose: the number you're searching for and a pointer to the tree, that is, initially, to the root of the tree. So how do you actually traverse this thing in C code? We can pluck off the easy case first, the base case: if the tree itself is NULL, that is, if you hand me nothing, I'll give you your answer right now: false. There's no number here if the tree is empty. So that's easy. Otherwise, consider whether the number you're looking for is less than the number in the current node. tree is what's passed in, a pointer to the root, so if you follow the arrow, you can get inside of that node and see its number. If the number you're looking for is less than that, you want to what? Snip off the right subtree and dive down the left subtree. So you search the tree's left child for the same number. Else, if the number you're looking for is greater than that number, you search the tree's right child for that same number. And the fourth and final scenario is what?
Well, if the number you're looking for equals the number in the current node, you've got it: return true. And if you recall some of our past design discussions, it's sort of a waste of everyone's time to ask this last question explicitly. Let me tighten this up design-wise, because there are only four possible scenarios: either there's nothing there, it's to the left, it's to the right, or you found it, right there, so a final else suffices. Whether or not you agree at this point in your programming career, there is a beauty to this code that most programmers would recognize, in that it's so relatively elegant. You've defined what the function is. You've got this base case, which is arguably one of the clunkiest parts. But the fact that you can just check a value here and then traverse the exact same structure, only a subset of it, by traversing the left subtree or the right subtree, is a beautiful application of recursion. And it allows you to search for this number no matter where it is in the computer's memory. Questions, then, on this idea of a binary search tree or this actual code thereof? >> And if you don't ask that question explicitly, what if the number is not there? >> If the number is not there, recall: if we get all the way to the bottom of the tree, such that I'm at one of those leaf nodes and that's not the number I'm looking for, there's no left child left and no right child left, so the next call is handed NULL, this base-case conditional kicks in, and I return false. But if I find it along the way, whether at the top of the tree, somewhere in the middle, or among the leaves, I will eventually return true. Good question. And to be clear, I'm calling this argument a tree, and that's certainly true the first time I call this function, because I'm passing in a pointer to the whole tree structure. But if you think about it, what's the left subtree or the right subtree? It's just a smaller tree.
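A minimal sketch of the recursive search just described, assuming a tree node with one `number` and two child pointers named `left` and `right`. The parameter order here (tree first, then number) is this sketch's choice; the four branches mirror the four scenarios from the lecture, with the final test for equality collapsed into an `else`.

```c
#include <stdbool.h>
#include <stddef.h>

// A node in a binary search tree: one number and two child pointers
typedef struct node
{
    int number;
    struct node *left;
    struct node *right;
} node;

// Return true if number is somewhere in the tree rooted at tree
bool search(node *tree, int number)
{
    // Base case: an empty (sub)tree contains nothing
    if (tree == NULL)
    {
        return false;
    }
    // Smaller numbers can only be in the left subtree
    else if (number < tree->number)
    {
        return search(tree->left, number);
    }
    // Larger numbers can only be in the right subtree
    else if (number > tree->number)
    {
        return search(tree->right, number);
    }
    // Neither smaller nor larger, so it must be equal: found it
    else
    {
        return true;
    }
}
```

Each recursive call is handed a child pointer, which is itself the root of a smaller tree, so the same four checks apply all the way down until either a match or NULL.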
A subtree is like a baby tree attached to its parent node, so to speak. So it's perfectly reasonable to call the search function with the left child, because it in turn has a whole subtree below it, or with the right child, which has a whole subtree below it instead. All right. So I like this direction. We've now kind of improved upon linked lists: we've gained back some of our performance, because we can now find something in big O of log n time. I don't love the fact that I'm using roughly three times as much memory; that feels like kind of a high price to pay just to speed things back up. But let's consider whether this thing is actually going to work as the data structure gets bigger and bigger as well. It looks beautiful here as drawn, and that's deliberate, because I drew the picture like this and it's got seven elements in it. But how did we get to seven elements? Let's start from the beginning. Suppose that the tree is initially empty, and suppose that a human, using get_int or some other technique, inserts the first element into the tree, like the number 2. The goal is to maintain the binary search tree property, which means each node has to be greater than its left child and less than its right child. Suppose the human next gives me the number 1. No big deal: I plop it right there as the left child. Suppose they give me the number 3 next. No big deal: it goes right there. I have very deliberately manipulated this story to work out beautifully, such that the tree is smaller but still a binary search tree, and nicely balanced, so to speak. But what if the user, for whatever reason, gives me a more perverse sequence of inputs, the worst-case scenario for three elements? Suppose they give me 1 first. Okay, that's the root. Then they give me 2. Okay, that's cool; that's the right child. But what if they then give me 3? To maintain the binary search tree property, the 3 has to go over here, as the right child of the 2.
Suppose, perversely, they then give me 4, then 5, then 6. Imagine in your mind's eye where this story is going. What have I accidentally created in memory? A linked list, which is bad for all the reasons we discussed before the break, because even though we're getting the dynamism, it's devolving into big O of N. So I kind of manipulated the situation with the original examples, seven elements and then three elements, by making sure that they were inserted in just the right order. Unless you are clever about how you build the tree in memory, it could very well devolve from a tree in two dimensions into a linked list in one dimension. This is just a long and stringy tree that does not violate the binary search tree definition, but it is surely not balanced. Now, as an aside, if you take higher-level courses in data structures and algorithms, you'll see many alternatives to binary search trees that have baked into their algorithms a little bit of rejiggering of the structure, so that as soon as you insert this 3, you spend a little more time and clean the situation up. Essentially, you pivot the thing around so that 2 becomes the new root, with 1 hanging off of it on one side and 3 still hanging off of it on the other. So with each insertion or deletion, you rebalance the tree as needed, which does cost you a bit more time, but it avoids the thing devolving into big O of N again. We won't do that in code. So this is recoverable, but not if you implement it naively, as I did, at least verbally, in this story. All right. Can we do better than that? Why might we want to? At this point in the story, it certainly could devolve into big O of N, and that's not great. Certainly for large data sets, it's nice that we're back to log n.
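A minimal sketch contrasting the two insertion orders from the story, assuming naive insertion with no rebalancing (the helper names `insert` and `height` are hypothetical). Inserting 4, 2, 6, 1, 3, 5, 7 yields a balanced tree of 3 levels; inserting 1 through 7 in order yields a "stringy" tree of 7 levels, which is effectively a linked list.

```c
#include <stdlib.h>

typedef struct node
{
    int number;
    struct node *left;
    struct node *right;
} node;

// Hypothetical naive insertion (no rebalancing): returns the possibly new root.
// Duplicates are ignored; if malloc fails, the tree is returned unchanged.
node *insert(node *tree, int number)
{
    if (tree == NULL)
    {
        node *n = malloc(sizeof(node));
        if (n != NULL)
        {
            n->number = number;
            n->left = NULL;
            n->right = NULL;
        }
        return n;
    }
    if (number < tree->number)
    {
        tree->left = insert(tree->left, number);
    }
    else if (number > tree->number)
    {
        tree->right = insert(tree->right, number);
    }
    return tree;
}

// Height in levels: an empty tree has height 0, a single node has height 1
int height(const node *tree)
{
    if (tree == NULL)
    {
        return 0;
    }
    int left = height(tree->left);
    int right = height(tree->right);
    return 1 + (left > right ? left : right);
}
```

Because search time is bounded by the height, the sorted insertion order turns an O(log n) search into an O(n) one; self-balancing trees avoid this by rotating nodes after insertions and deletions.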
We're back to log n, at least, if you take on faith that we could rebalance this thing as needed and maintain a logarithmic height for it. But really, the holy grail of data structures is to achieve something that is big O of 1, constant time, whereby no matter how many numbers or names or sweaters are in the data structure, an operation takes just one step, or maybe three steps, or even 100 steps: a number of steps that is completely independent of how many actual pieces of data are in the data structure. That is to say, over time it doesn't get any slower, even if you've got tens, hundreds, thousands, or millions of elements in there already. So how do we gain something like big O of 1, constant time? Its appeal is reminiscent of our early picture from our very first week. This was our early algorithm for finding someone in a phone book or counting students in the room, something linear, literally straight lines. This was the logarithmic curve, which, especially as you zoom out, starts to get very, very appealing time-wise. Something that's constant time looks even prettier: it's a flat line at the one-step mark or the two-step mark, whatever the constant number of steps is. And even though logarithmic will still grow in perpetuity, constant time by definition never changes. That's what we'd really like. When you're searching for someone in your phone, searching for something on Google, or asking ChatGPT a question, you'd like to get an answer like that, in constant time, independent of how much data is actually in there. So let's see how we can do this. To do this, we're going to need at least one new building block, a term of art known as hashing. Hashing, sort of formally, takes an infinite domain of values and maps it to a finite range of values. From high school math class, domain is the input and range is the output. So an infinite domain to a finite range is the goal here of hashing.
And we might see this in the real world when you're playing games or cleaning up after a game. Here are some super jumbo playing cards that we got online. Suppose that you want to get these into sorted order. You could do this very painstakingly. There are 52 cards here. You could lay them all out, start sifting through them, and put the two over here and the four over here and the hearts and the clubs and so forth. Or you can start to look at the cards and bucketize them first, to take a problem of size 52 and shrink it down into four problems of size 13. So here, for instance, is where the first diamond might go, a club here, a spade over here, another diamond over here. And I can do this again and again, literally bucketizing all of these values, so that I've got a very simple heuristic that lets me move the cards into these buckets, each of which is going to have a subset of the values, and then I've got smaller problems I can deal with. So, dot dot dot, assume that I bucketize all 52 of these values. Then I've just got four problems remaining. And I dare say it's a little easier then, because each pile is all of the same suit, so I can pretty easily sort it from ace to king, because those are effectively just numbers at that point. So hashing refers to, again, taking values from an infinite domain and mapping them to a finite range, like 1, 2, 3, 4, a finite number of buckets. In this case, the domain of 52 cards happens to be finite, but if you were doing this more generally with numbers, it would be infinite. At that point, you can solve the problem a little differently or a little more efficiently. So why is this germane? Well, I would propose that if we want to start organizing our data in memory toward an idealistic goal of achieving constant time, hashing might be one ingredient of the solution there, too.
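The card strategy above is essentially a bucket sort keyed on suit. Here is a minimal C sketch, assuming a hypothetical `card` struct whose suit runs from 0 to 3 and whose rank runs from 1 (ace) to 13 (king). The names `bucket_sort_by_suit` and `by_rank` are illustrative, and the standard library's `qsort` stands in for sorting each smaller pile.

```c
#include <stdlib.h>

// Hypothetical card representation: suit 0..3, rank 1..13 (ace = 1, king = 13).
typedef struct
{
    int suit;
    int rank;
} card;

// Compare two cards by rank, for qsort.
static int by_rank(const void *a, const void *b)
{
    const card *x = a;
    const card *y = b;
    return x->rank - y->rank;
}

// Bucketize cards by suit, sort each bucket by rank, then concatenate
// the buckets back into deck. Assumes n <= 52 and every suit is 0..3.
void bucket_sort_by_suit(card deck[], int n)
{
    card buckets[4][52];
    int counts[4] = {0};

    // First pass: the "hash" of a card is simply its suit.
    for (int i = 0; i < n; i++)
    {
        int s = deck[i].suit;
        buckets[s][counts[s]++] = deck[i];
    }

    // Each bucket is a smaller problem: sort it by rank alone.
    int k = 0;
    for (int s = 0; s < 4; s++)
    {
        qsort(buckets[s], counts[s], sizeof(card), by_rank);
        for (int j = 0; j < counts[s]; j++)
        {
            deck[k++] = buckets[s][j];
        }
    }
}
```

The single pass over the deck does the bucketizing; after that, each pile is only about a quarter of the original problem.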
Generally, we're going to call the process by which you decide which input goes to which output a hash function. It's a mathematical function, or a function in code, that takes as input a card from a deck, or maybe a word from a dictionary, and outputs a value that represents the bucket into which it should go. In the case of our contacts app, for instance, the GUI shows all of your friends and family from top to bottom, alphabetically presumably, and you might want to find someone quite quickly, ideally in constant time, right? The naive implementation that Apple or Google could use is just linear search: search through all of your contacts top to bottom, and eventually you will correctly find the person. But wouldn't it be nice if they instead used a sorted array, so that they could use binary search and get you the person in logarithmic time? That's great. But if you have a lot of friends and family in there, or a much larger data set, wouldn't it be nice to just jump to the answer in one step instead of even log n steps? So that's our goal. Can we get close to, or actually achieve, constant time? With a hash function, we essentially have our old friend, the picture of problem solving with inputs and outputs, and the algorithm inside of it is known as a hash function. For instance, if I'm looking for Mario's number, I might now want to look for Mario not top to bottom, and not by dividing and conquering, jumping to the middle, the middle of the middle, the middle of the middle of the middle. Let me just figure out what bucket Mario is in. The English alphabet has 26 letters, A through Z, either uppercase or lowercase. Suppose that I want to find what bucket Mario is in. Well, much like these cards and their suits, wouldn't it make sense that anyone whose name starts with A goes into the first bucket, the B's go into the second bucket, and, dot dot dot, the Z's go into the last bucket?
So it stands to reason that if I pass Mario to a hash function implemented in C or some other language, I would like to get back the number 12, because M is the 13th letter of the alphabet, but if we start counting our buckets at zero, since they're essentially an array, then it's index location 12 instead of 13. Similarly, if Luigi is the input, I'd like to get back the number 11. So my hash function, in this story, takes a string as input and gives me an integer. I claim there's theoretically an infinite number of possible names in the English language, but there are only going to be 26 possible answers from this hash function, 0 through 25. So that's our infinite domain mapped to our finite range. Instead of four buckets, it's now 26. All right. So what should we do with the computer's memory to leverage the fact that we can very easily bucketize names based on the first letter of someone's name? Well, let me propose that the hash function part of this, arcane as it looks, is actually pretty straightforward. If you wanted to translate this idea into C, you can include ctype.h, which we've used a few times, to get access to functions like toupper. That's just to make sure the function is case-insensitive. Here's my hash function. It's going to return an int, which is the goal. It takes a string as input, which we'll call name. And what does this function do? Well, it's some clever arithmetic. It first converts the first letter of that person's name to uppercase, so if it's lowercase, it's forced to uppercase. Why? Because no matter what, I want to subtract 65, aka the ASCII value of capital A, from it, and I don't want to screw up the math by doing a lowercase letter minus a capital. I want capital minus capital, is all. So this will return to me a number between 0 and 25 inclusive, because if it is a name that starts with A, I'm only looking at the first letter.
I'm subtracting off A, which gives me zero, so I'm going to return zero. Dot dot dot. If it's Z, I'm going to return 25 instead. Now, there's no error checking in here. If you type in non-English symbols, it's going to break. So let's just assume, for simplicity, that this is indeed an English name that's coming in. I can refine this a little bit. Moving forward in our final week of C, I'm going to propose some added defenses you can put in place when writing code. If you know that you're receiving a name as input, that is, something passed in by reference, there's a danger, per last week, because the caller of this function, whoever's using it, is telling you where in memory to find Mario's or Luigi's name. The problem with that is that you could go to that address and actually change their name in memory, even though you're not supposed to; you're supposed to just use the name. So you can add const, which says you should not be able to change this value, even though I'm not giving you a copy of it by value; I'm giving you a reference to it. Another refinement is that a hash function meant to index into an array should return a value that's zero or one or two on up, never negative. So we can, even more protectively, say it returns not just an int but an unsigned int. We talked briefly about that last week, albeit in the context of chars. These are minor improvements that make your code arguably better designed, because you're opening yourself up to fewer possible mistakes or issues. All right, with that said, let's now assume that we've got this kind of function implemented and can use it to decide what bucket to put these people's names into. Well, let me give you what are called hash tables, which are sort of the Swiss Army knives of data structures,
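Putting the hash function and both refinements together, a minimal sketch in C might look like the following. It assumes a `const char *` parameter in place of CS50's `string` type (which is just `char *`) so that it compiles without the CS50 library, and, like the lecture's version, it assumes the name is non-empty and starts with an English letter.

```c
#include <ctype.h>

// Map a name to one of 26 buckets by its first letter: A/a -> 0, ..., Z/z -> 25.
// No error checking: name must be non-empty and start with A-Z in either case.
// const promises not to modify the caller's string; unsigned documents that
// the result is never negative.
unsigned int hash(const char *name)
{
    // toupper makes the function case-insensitive; subtracting 'A'
    // (ASCII 65) shifts the result into the range 0 through 25.
    return toupper((unsigned char) name[0]) - 'A';
}
```

With this, `hash("Mario")` yields 12 and `hash("luigi")` yields 11, matching the bucket numbers above.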
the kind of thing that some computer scientists have been quoted as saying they'd want if they were stuck on a desert island with only one data structure. Why? It's just really useful in general, because it allows you, quite powerfully, to associate keys with values. Which is to say, to come full circle today, hash tables are often how you would implement, at a lower level, the thing we began class talking about: dictionaries, collections of key-value pairs. That, after all, is what a phone book is. We call it names and numbers, but it's keys and values. That's what an actual English dictionary is, too. The Oxford English Dictionary is a bunch of words and definitions, or keys and values. So it's useful in general to be able to associate one piece of data with another. Ergo, hash tables. So here's how you might implement a hash table in C. You want it to be of size 26, for instance: 26 buckets from A to Z, hence the 26. You want this to be an array, and that's fine. With the cards we had four buckets; here I'm going to use an array of 26 buckets, because a hash table, too, is going to be an evolution of our linked list mashed together with an array. So a hash table, in short, is going to be an array of linked lists, as we'll soon see. Here's the array: 26 pointers to nodes. So I'm going to give myself an array of pointers that is ultimately going to store a whole bunch of person objects like this. Here, for instance, is a char *name and a char *number, as we've discussed in the past, representing a person. These are the pieces of data I might want to store in this data structure. However, let's simplify. Let's not worry about the phone number, because we're not going to call anyone today. But for a linked list of persons, I'm going to need to store the person's name and also a pointer to the next such name, which points to the next such name, and so on. So again, I'm just deleting number as unnecessary detail.
But if we're going to have an array of linked lists, this is our new definition of node for this part of class, whereby it's not for a tree; it's now for a hash table. And we'll see this in action now. Here is my array of size 26. I drew it vertically, but who cares? These have always been artist's renditions. It just fits nicely on the screen this way. This is location zero. This is location 25. So any A names should end up over here, any Z names should end up down here, and so forth. Let's just label these with the letters of the alphabet for clarity. That's where all the names are going to go. So hopefully Mario goes here, Luigi here, and everyone else. So what is each of these squares? Just a pointer to a node. Initially, they're all null. But as soon as I insert Mario into this so-called hash table, I'm not going to put him literally here. I'm going to create a new node in memory, put Mario there, and then stitch it in, because if I get another M name, I'm going to stitch that in, and another, and another. So, for instance, here comes Mario into this data structure. This is a pointer to a person structure. Here's Luigi. And here's a third character as well, Peach. That's all working out great. Dot dot dot. There's a whole bunch of characters in the Nintendo universe. Here's a lot of them. Unfortunately, as you know if you're a fan, there are also other names that start with M and L and other letters of the alphabet. So we're poised to have what we're going to call collisions, which are a downside of using a hash function. If you're going from something infinite to something finite, by definition you're going to have a heck of a lot of potential collisions: multiple M names, multiple L names, and so forth. So we've got to mitigate this somehow.
Well, if you meet someone in the real world whose name happens to start with M, and you're already friends with Mario, you could delete Mario from your phone and put that new person there. But that's kind of dumb; you'd be clobbering the value. Or maybe you put the M friend here. And when that fills up, you put the next M friend here. And then when you meet someone else whose name starts with M, you put them here. But then it just devolves into a mess, at which point there's no rhyme or reason as to who is where. It devolves back into something linear if you have to search the whole darn thing looking for M friends just because you ran out of space where you wanted them. So here's the beauty of mashing together an array with a linked list. You hash the name to the intended location, like box 12 here, and then you just start stringing the names together in a linked list. Hopefully you don't have too many of those collisions, but at least now you don't have to delete anyone or make a mess of the data structure. So here's another bunch of names, three starting with L. Here's a bunch for the other letters of the alphabet. It's now an array of linked lists. This, then, is a hash table. So the question to consider now is: is this better than an array? Is this better than a linked list? Well, I dare say it's better than a linked list, because if it were a single linked list from A to Z, what would be the running time of searching for anyone? I'll spoil it: big O of n. Even if it's alphabetically sorted, you've got to start at the beginning and potentially go all the way through the list to find someone like Zelda, whose name starts with, of course, Z. But here we have an array of linked lists. So what's really the running time here?
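Here is a minimal C sketch of the chained hash table described above: an array of 26 pointers to nodes, with each bucket a linked list of names. The helper names `insert`, `find`, and `unload` are hypothetical and not the problem set's required API. As assumptions of this sketch, names are copied onto the heap so the table owns them, each new node is prepended to its chain so insertion takes constant time however long the chain is, and lookups compare names case-sensitively even though bucket selection is not.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define N 26

// A node in one bucket's linked list (the phone number is omitted, as above).
typedef struct node
{
    char *name;
    struct node *next;
} node;

// 26 buckets; global, so every pointer starts out NULL.
node *table[N];

// First-letter hash: assumes a non-empty name starting with A-Z in either case.
unsigned int hash(const char *name)
{
    return toupper((unsigned char) name[0]) - 'A';
}

// Copy the name into a new node and prepend it to its bucket's chain.
bool insert(const char *name)
{
    node *n = malloc(sizeof(node));
    if (n == NULL)
    {
        return false;
    }
    n->name = malloc(strlen(name) + 1);
    if (n->name == NULL)
    {
        free(n);
        return false;
    }
    strcpy(n->name, name);

    unsigned int bucket = hash(name);
    n->next = table[bucket];
    table[bucket] = n;
    return true;
}

// Jump straight to one bucket, then walk only that (hopefully short) chain.
bool find(const char *name)
{
    for (node *ptr = table[hash(name)]; ptr != NULL; ptr = ptr->next)
    {
        if (strcmp(ptr->name, name) == 0)
        {
            return true;
        }
    }
    return false;
}

// Free every chain and reset every bucket to NULL.
void unload(void)
{
    for (int i = 0; i < N; i++)
    {
        node *ptr = table[i];
        while (ptr != NULL)
        {
            node *next = ptr->next;
            free(ptr->name);
            free(ptr);
            ptr = next;
        }
        table[i] = NULL;
    }
}
```

Luigi and Link both hash to bucket 11, so they collide, but neither overwrites the other: they simply share that bucket's chain.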
It's not quite as bad as n steps, because if you assume a uniform distribution of names, such that the world of Nintendo has roughly as many M names as L names as A names as B names, then you have a bunch of chains, a bunch of linked lists here, but they're all roughly the same length. So maybe you have n names in your phone book, but each of these lists is only about 1/26 of that length, because you've got 26 buckets. So what's the running time? Well, ideally we'd move away from linked lists with big O of n and achieve our constant time, but we have these collisions to worry about. Just to be clear, we want to get from big O of n to constant time, but we're not going to get to constant time if we've got collisions. If we've got three L names and a few B names and a few A names, we can't just jump to a location and find the person we're looking for. So what's the fundamental goal? Well, I think we want to use a smarter hash function. Depicted here is an excerpt from a bigger hash table, that is, a much bigger array, that assumes you're not looking at just the first letter of everyone's name but instead at the first three letters, which decreases the probability of collisions, because in this model, I dare say, no one else's name in the Nintendo universe starts with L-I-N. So now Link has its own location in memory. And similarly for Luigi: L-U-I, I believe, is unique in the Nintendo universe. So we don't have a collision. Unfortunately, while this does seem to eliminate collisions in this tiny example, what's the trade-off, or what's the catch? Yeah? >> It uses a lot more memory. >> This is a lot more memory. I mean, I kind of hinted at that by not even fitting most of it on the screen anymore. Here's L-A. Here's L-U.
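One way to express the three-letter idea in C, as a hypothetical sketch, is to treat each of the first three letters as a base-26 digit. It assumes names of at least three English letters; `hash3` and `BUCKETS` are illustrative names. The number of buckets jumps from 26 to 26^3 = 17,576, which is exactly the memory trade-off being discussed.

```c
#include <ctype.h>

// Hypothetical wider hash: the first three letters select one of
// 26 * 26 * 26 = 17,576 buckets instead of 26.
#define BUCKETS (26 * 26 * 26)

// Assumes the name has at least three letters, all A-Z in either case.
unsigned int hash3(const char *name)
{
    unsigned int index = 0;
    for (int i = 0; i < 3; i++)
    {
        // Treat each letter as a base-26 digit: "AAA" -> 0, "ZZZ" -> 17575.
        index = index * 26 + (toupper((unsigned char) name[i]) - 'A');
    }
    return index;
}
```

For example, `hash3("Link")` is 11 * 676 + 8 * 26 + 13 = 7,657 and `hash3("Luigi")` is 7,964, so the two no longer collide, but most of the 17,576 buckets (AAA, AAB, and so on) would sit empty.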
But what about all of the other letters of the alphabet and all the other combinations, dot dot dot, all the possibilities? Moreover, some of these just don't make much sense. At least in English or in the Nintendo world, I don't think there's anyone whose name is going to start with AAA or AAB or AAC or AAD and so forth. We're wasting a huge amount of space to reduce the probability of collisions. So, fine, we might get constant time now, but at what cost? A heck of a lot more memory. And so this is one of the tensions when using a hash table: you want to come up with a good hash function that's a little more sophisticated than the first letter, but not so wasteful that you need a crazy number of buckets and therefore a huge amount more memory. So really, even with collisions, it's not quite as bad as n steps, because if you have k buckets, where k is 26 here (or four, in the case of the cards), and if you assume that the names are uniformly distributed over A through Z, then each of those linked lists is hopefully going to be no longer than n / k, so n / 26. But what do we know about big O notation? We only keep the highest-order term and ignore constant factors. Big O of n / k, yes, is faster, but asymptotically, that is, theoretically, you're still talking about big O of n. So here's the tension. It's absolutely going to be faster, something like 26 times faster than a single linked list, but it's still big O of n, because it's going to take an amount of time that's still linear in the size of the data set. So we seem to have strayed yet again from our constant-time search. Can we find this holy grail? Well, we kind of can, if you let me spend a lot more space. There are tries in the world; "trie," weirdly, is short for retrieval, even though we don't pronounce it that way. A trie is a tree made out of arrays.
So, at some point, computer scientists were just mashing things together Frankenstein-style, like linked lists and arrays, and now we've got trees and arrays. You, too, can mash something together and come up with your own. Let's look at what a trie actually is, because it is going to get us that constant-time grail. Here is the root of a trie. You can think of each node in a trie as really being an array of values A through Z, in the case of an English problem like the one we've been playing with. You treat this array as being indexed from 0 through 25, or equivalently A through Z, and you treat each of those elements as a pointer to another such node in the trie. Then you implicitly store names in this data structure by going to the appropriate location based on the first letter of a name, adding a pointer that represents the second letter of the name, adding a pointer that represents the third letter, and so forth. So what do I mean by this? Suppose we want to insert Toad, one of the characters from the Nintendo universe, first. If we count up to where T is in the alphabet, this pointer here will be changed from null to a pointer to a new node that represents the second letter of Toad's name, which is, of course, O. Then, to insert T-O-A, we're going to need another node. A is going to lead me to D. And for depiction's sake, I'm going to draw the D in green, even though this would actually be a boolean or something like that in memory that indicates that Toad's name stops here. So, in other words, this trie in memory has four nodes. Each of those nodes is essentially an array of size 26. But the word Toad is not actually stored in the data structure explicitly. There's no char * pointing at the string "Toad" anywhere. It's stored implicitly, because the T pointer is non-null, the O pointer is non-null, the A pointer is non-null, and the D entry is marked as the end of a name, even though its pointer is still null at this point. That's the common technique here.
This allows me to insert other names from Nintendo's universe, like Toadette, because I can continue from here to an E node, to a T node, to another T node, and to an E node, which I'll again mark in green. So you can even have names that are substrings, or equivalently superstrings, of each other, just by leaving all of these breadcrumbs along the way: a non-null pointer here, to a non-null, to a non-null, to a... well, that pointer can't be null anymore, because Toadette continues through it. This is where we have to use a boolean, which indicates that there is a name in this data structure that ends here, and there's another name that ends here. Meanwhile, if there's a third name from the universe, like Tom, it's the same idea, but now we can start reusing some of these arrays: non-null, non-null, and then a boolean flag here that says true, a name ends here. We're reusing those same arrays. So each level of nodes represents the i-th letter of the word or name you're trying to store in the data structure. And by playing around with null and non-null pointers and some booleans, you can implicitly store names in this structure. Now, it's way too difficult pictorially to depict lots and lots of names in this form. So just imagine in your mind's eye that there are dozens, hundreds, thousands of names in this data structure, with just more arrows and more arrays. How do you actually look someone up in this data structure? Well, if you want to ask a question like, is Toad in this data structure, or anyone else, you simply start at the root node, as we would for any tree, and you hash on the first letter of Toad's name, which gives you this location, and you check: is it null? If not, T is implicitly there. So you follow that pointer here, then you hash the second letter of Toad's name, an O, and check this pointer. And you follow that arrow.
Then you hash on the third letter of Toad's name, A, and you follow that arrow. Then the fourth letter of Toad's name, D, and you see, ah, there's a boolean here, represented in green, which means Toad is in this data structure. And notice what's subtle here. It doesn't matter if there are three names in this trie or three million names in this trie. How many steps did it take me to confirm or deny that Toad is in this trie? One, two, three, four, which is arguably constant. Even though the lengths of names vary, there's no Nintendo name longer than, what, 10 characters, 20 characters, maybe 30? There's some reasonable, finite bound, because Nintendo is never going to come up with a crazy long name for a character. And so you effectively have constant time for looking up T-O-A-D, Toadette, Tom, Mario, Luigi, Peach, or any of the other names we've looked at. So this is to say, a trie allows you to ask questions like, is Toad in this data set, or, equivalently, what is Toad's phone number in this data set? Because if you assume that each of these markers is ultimately not just a bool saying yes or no, but maybe an actual person structure with a name and a number, you can store data like that, too: your key-value pairs, where the names are your keys and the phone numbers are your values. To make this more clear, here is how we might represent each of these nodes in C. It's not technically just an array. It's a struct containing an array of size 26, which we'll call children, because it represents the children of that node, each of type struct node *. And then here, for simplicity, is that person's number, if we reintroduce numbers and want to store someone's phone number in this data structure as well. So using that data structure and that kind of code, you can implement a trie with something as simple as this: initially, your trie is just a pointer to a node,
one such struct. We can, of course, initialize it to null to make clear that there are no names in here. But each time we insert a name, we can allocate another node, and another node, hashing on the first, the second, the third, the fourth, dot dot dot, the last character in the person's name, allocating nodes as needed and then flipping that boolean to true, or adding their phone number as a char *, to indicate that we have then found them. And so, of all the data structures we've looked at today, big O of 1 is actually achieved with tries. And yet, curiously, for problem set five, you're not going to implement tries. You're going to implement hash tables, that Swiss Army knife of data structures that every programmer everywhere knows about. Why? Why not use tries very often in practice? You certainly can, but what's the trade-off, perhaps? Yeah? >> They take up too much memory. >> It's a huge amount of memory. Things have escalated since the start of class. We started with one int. Then we added an int and a pointer, then an int and two pointers. Now I'm proposing 26 pointers plus a boolean or a structure called person. It's escalating significantly. And the biggest catch with a trie, as you might have imagined with Toad and Toadette and Tom on the screen, is that there's a huge amount of wasted memory. We saw that potential with a hash function, too, but there it can be reined in, as you'll explore in the problem set. With a trie, most of the pointers in those arrays are just null and unused, and it tends to result in your using way more memory to solve the problem correctly, in a way that can slow the computer down and that wastes more memory than is useful. That said, just as we started today, there are stacks in the real world. There are queues in the real world. There are even hash tables in the real world, which you'll indeed implement in code for problem set five.
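A minimal trie sketch in C follows, under a few stated assumptions. Names contain only English letters. A boolean `is_word` stands in for the green marker (a `char *number` field could replace it to store phone numbers). That flag lives in the node reached after a name's last letter, a common variation on drawing it on the last letter's slot. The function names `new_node`, `insert`, `contains`, and `free_trie` are hypothetical.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>

#define ALPHABET 26

// Each node is an array of 26 child pointers plus an end-of-name flag.
typedef struct node
{
    struct node *children[ALPHABET];
    bool is_word;
} node;

// Allocate a node with every child NULL and is_word false.
static node *new_node(void)
{
    node *n = malloc(sizeof(node));
    if (n != NULL)
    {
        for (int i = 0; i < ALPHABET; i++)
        {
            n->children[i] = NULL;
        }
        n->is_word = false;
    }
    return n;
}

// Walk the name letter by letter, creating nodes only where needed,
// then mark where the name ends. Assumes letters A-Z in either case.
bool insert(node *root, const char *name)
{
    node *cur = root;
    for (const char *p = name; *p != '\0'; p++)
    {
        int i = toupper((unsigned char) *p) - 'A';
        if (cur->children[i] == NULL)
        {
            cur->children[i] = new_node();
            if (cur->children[i] == NULL)
            {
                return false;
            }
        }
        cur = cur->children[i];
    }
    cur->is_word = true;
    return true;
}

// One step per letter, regardless of how many names are stored.
bool contains(const node *root, const char *name)
{
    const node *cur = root;
    for (const char *p = name; *p != '\0'; p++)
    {
        cur = cur->children[toupper((unsigned char) *p) - 'A'];
        if (cur == NULL)
        {
            return false;
        }
    }
    return cur->is_word;
}

// Recursively free every node in the trie.
void free_trie(node *root)
{
    if (root == NULL)
    {
        return;
    }
    for (int i = 0; i < ALPHABET; i++)
    {
        free_trie(root->children[i]);
    }
    free(root);
}
```

Note that "Toa" is not found even though its path exists, because no name ends there. Also note that every node costs 26 pointers, most of them null, which is the memory cost described above.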
Has anyone here ever had a salad from a restaurant called Sweetgreen, in Harvard Square or elsewhere in the US? Like, not one... two of us, three of us. Okay, so it's not hard to imagine going to such a store, getting in a queue, and staring at a shelf like this, because what Sweetgreen and similar restaurants do when you order for pickup is hash your salad into a shelf like this. So literally, at Sweetgreen, you might see wooden shelves like this. This is the A through E bucket, the F through J bucket, the K through N bucket, and the O through Z bucket, whereby if your name, like Malan, happens to be in one of those ranges, they will hash my salad and put it here. But of course, even in the real world, there are some constraints. What can go wrong with this hash table system? Someone who's been there, maybe: what can go wrong? Imagine the extreme, lots of values here. Yeah. So there's no more space, right? And this has happened to me in the past, especially at Sweetgreen before they adopted this system. They used to put the A's here, the B's here, the C's here, the D's here, and so forth. And then someone at some point realized that they were very frequently overflowing the A's into the B's and the B's into the C's, while no one was using Q or Z with any frequency. So they were wasting space and running out of space at the same time. At some point, they decided to literally remove most of the letters of the alphabet and make the buckets bigger and fewer. So now it's very unlikely that you're going to have so many K's through N's that you overflow the shelf. But this is, in the real world, a data structure like the ones we've seen today. And so, among the goals: even as arcane as things seem to be getting with all the pointer notation and dereferencing this and that, really all we're doing in code is implementing real-world solutions that other people have already come up with and translating them to a new domain.
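The restaurant's shelves amount to a hash function with just four buckets of uneven letter ranges. A hypothetical C sketch of that mapping follows; the name `shelf` is illustrative. It assumes names start with an English letter and relies on A through Z being contiguous in ASCII.

```c
#include <ctype.h>

// Hypothetical four-shelf hash: 0 = A-E, 1 = F-J, 2 = K-N, 3 = O-Z.
// Assumes the name starts with an English letter, in either case.
unsigned int shelf(const char *name)
{
    char c = toupper((unsigned char) name[0]);
    if (c <= 'E')
    {
        return 0;
    }
    if (c <= 'J')
    {
        return 1;
    }
    if (c <= 'N')
    {
        return 2;
    }
    return 3;
}
```

Grouping letters unevenly like this trades a few long "chains" for fewer empty shelves, the same bucket-count tension as with the three-letter hash.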
And the very last thing you'll do in C this week is indeed implement your very own spell checker, whereby we'll give you a very large file of 100,000-plus English words. You'll have to come up with a clever and efficient way to load it into memory. And we'll give you tools that measure how fast or slow your code is, and how much or how little memory it uses, so that you can compare it not just against your own earlier attempts but perhaps against others' as well. So with that said, we'll end a bit early today. We'll see you next time. All right, this is CS50, and this is already week six, wherein we transition away from C to a programming language called Python. That's not to say that the past several weeks haven't been among the goals of the course. Indeed, in learning C, I very much think that you'll have, at the end of this class, so much more of a bottom-up understanding of how computers work and how programming languages work. In particular, you'll appreciate and understand better how Python and Java and C++ and Swift and so many other languages are actually doing their thing nowadays. But recall that we started with Scratch some weeks ago. What was nice in Scratch was that the first program we wrote, hello world, was all too accessible. All you had to do was interlock two puzzle pieces in order to make the cat, in that case, say hello world. Thereafter, of course, we transitioned to C. And recall that in week one, we asked you to take on faith that you could ignore that first line and a lot of these parentheses and curly braces and really just focus on the essence of the program, which is clearly still about printing hello world, albeit using a different function and a bit of new syntax. Today, very excitingly, all of that is truly going to go away and be distilled into a single line of code when you want the computer to say something like hello world.
And this is what we mean by Python being a higher-level language. Humans over the decades learned from earlier designs, earlier programming languages, what worked well and what did not. Computers got faster, computers had more memory, and so you were able to start spending more of those resources to have the computer do more for you. So you don't need to be as pedantic syntactically anymore. You don't need to write as much code anymore, and, frankly, you can start solving problems of interest to you and building products of interest to you so much more readily by choosing the right tool for the job. In the real world, if you continue coding after CS50, sometimes C will be the right tool for the job, sometimes Python will be, and sometimes it's going to be a different language altogether that you'll never have studied in school. In fact, what's compelling, I think, about this week six, much like when I took the class back in the day, is that after CS50, you'll have a taste of one, two, maybe a few different programming languages. And that's going to be enough to bootstrap yourself and teach yourself new languages, because you're going to start to recognize in the real world similarities with languages you've seen, programming paradigms that are still with us. And the syntax, yeah, that's invariably going to change, but that's the stuff you're going to Google or ask ChatGPT or some other AI about down the line. So long as you know enough of it to get real work done, you'll ultimately focus mostly on the ideas and the problems you want to solve and less on the syntax. So among the goals for this week, this week's problem set, and really the rest of the course is to get you more comfortable feeling uncomfortable in front of your keyboard, because we're not going to tell you everything you need to know about a language like Python. You're going to turn to the documentation.
You're going to turn to the duck, and you're going to learn to teach yourself, ultimately, a new language. So let's actually write our first program and compare and contrast it with how we might do that in C. Recall that in C, for the first couple of weeks, we were in the habit of doing make hello, and make, this build utility, just kind of magically knew to look for a file called hello.c and to create a program called hello, which you could then run with ./hello. A week or so later, we revealed that make is really just automating the compilation of your program with the actual compiler, clang in this case, passing it command-line arguments like -o to specify the output file name, like hello, instead of the default, which, recall, was a.out, passing in the name of the file you want to compile, and linking in any libraries you want beyond the standard ones. But then you could still run the program in exactly the same way. Starting today, when you write Python code and want to run it, you're simply going to run the Python program itself. So just as clang is a C compiler, Python is itself not only a programming language but a program as well, and with the python program, which understands the Python programming language, you can run code that you've written in a file called hello.py. What this program is doing is a little bit different from what clang is doing, but we'll see that difference before long. First, let me go over to VS Code and write the simplest, our first, of Python programs by doing code hello.py. And then in this file, without any includes or any int main(void), I'm simply going to say print, quote unquote, hello, world, close quote. All right. Now I'm not going to do make. I'm instead just going to do python hello.py, cross my fingers as always, and voila, my first program in Python. So it's sort of obvious that we got rid of the #include.
We got rid of the int main(void). No curly braces. Only a couple of parentheses here. But what else is different to your eyes that's a little more subtle here versus C? Yeah. >> Yeah. So there's no f. So the print function is a little more human-friendly. It's print instead of printf, where the f did mean formatted, but we'll see that we still have that functionality. >> No need for the line break. >> So no need for the line break, specifically the backslash n. And yet here's my cursor on the next line. So I dare say humans over the years realized we more commonly want a newline than not, and so they made the default actually give it to you automatically. And there's one more detail. Yeah. >> No semicolon. >> So there's no semicolon. I finished my thought at the end of the line, but I didn't need to explicitly terminate it with a semicolon. That's all of these salient differences with just one program, but I'd argue that we got rid of all of the annoying stuff thus far anyway, so we can really focus on what this program itself is doing. But what's exciting with Python, too, is just how quickly you can solve certain problems. And this isn't true of just Python. It's really any higher-level language than C. In fact, just for fun, let me go ahead and implement problem set 5, wherein you're challenged with implementing the fastest spell checker possible. So let me go back here to VS Code. Let's close out hello.py and clear my terminal window. Let me first split my terminal by clicking this rectangular icon over here, and that's going to give me two terminal windows, left and right. In the first one, at left, I'm going to cd into a directory I came with today, which is the staff's solution to problem set 5's spell checker in C. And on the right-hand side here, I'm going to cd into another directory I brought with me today, called python, inside of which is a translation of problem set 5 into Python.
In particular, I've implemented in advance a speller.py file, which is the analog in Python of speller.c in C. And I've also prepared a dictionary.py file. Unfortunately, if we open up dictionary.py, you'll see that it's not actually implemented yet. So in dictionary.py, let's implement problem set 5 in Python and see how long it takes. Well, the first thing I'm going to do is declare a global variable. We'll call it words, and set that equal to the return value of a Python function called set, which essentially gives me a set object wherein I can store a whole bunch of words without duplicates. Python's going to manage all of that for me. In effect, it's going to implement what I needed to implement myself in problem set 5: a hash table. Now, down here, I'm going to go ahead and define a function called check, passing in as input a parameter called word, because of course that's how it was implemented in C. But notice a difference already. In Python, we use a new keyword called def to define a function. We don't have to specify the type of the variable being passed in, word in this case, and we also don't have to specify a return type for the function. Now, inside of this check function, it suffices to do this: return word.lower() in words, which is effectively a Boolean expression asking, is the lowercase version of this word in the set? If so, return True; otherwise, return False. Done with the check function. Now let's go ahead and define another function called load, which recall took an argument: the dictionary that you want to load into memory. And let's go ahead now and do this: with open(dictionary) as file. Whereas in C we used fopen, in Python we use open, and this gives the opened file a variable name of file.
Then, once that file is open, I'm going to go ahead and update that entire set of words, which starts out empty, by taking the file, reading the entire contents top to bottom, left to right, and splitting all of the lines therein on the newlines that terminate each of the strings, effectively updating the set with every word in that dictionary. Then I'm going to assume that it all just worked, because there's a lot less for me to do myself in Python, and I'm just going to go ahead and return True, capital T in Python. Done. Next, let's go ahead and define that other function from problem set 5, size, whose purpose in life was to tell me the size of the dictionary I had loaded. Well, in Python, that's pretty easy. I can just return the length, or len for short, of the set in which I've stored all those words. Done. And then lastly, I'm going to go ahead and define an unload function, which recall was responsible for freeing any memory I myself had allocated. I don't seem to have done any of that in Python. In fact, that's managed for me now. So I'm going to go ahead and simply say return True, because there's no work to be done. And that's it. In like 19 lines of code in Python, most of which are blank lines, I claim I have reimplemented problem set 5 in Python. Well, let's take a look now at the difference. I'm going to go ahead and reopen my terminal window, and I'm going to go ahead and maximize it so we can see more output. And now I'm going to go ahead and run python, which is not only the name of the language but also the name of the program we use today to run our Python code. And I'm going to run it on speller.py, which I brought with me today, specifically on the largest of problem set 5's texts, holmes.txt. Enter. And as with problem set 5 itself, we'll see a whole bunch of misspelled words being printed to the screen, some of which might very well be misspelled, some of which are just not in the dictionary.
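The dictionary.py walkthrough above might look something like the following. This is a sketch reconstructed from the narration, not the staff's actual file; it assumes, as in problem set 5, that the dictionary file holds one word per line and that speller.py calls functions named check, load, size, and unload.

```python
# dictionary.py: a sketch of problem set 5's four functions in Python,
# reconstructed from the lecture narration (not the official staff file).

# A set stores each word at most once; Python manages the hash table for us.
words = set()


def check(word):
    """Return True if the lowercase form of word is in the dictionary."""
    return word.lower() in words


def load(dictionary):
    """Add every line of the dictionary file to the set."""
    with open(dictionary) as file:
        # read() returns the whole file as one str; splitlines() splits it
        # on the newline that ends each word.
        words.update(file.read().splitlines())
    return True


def size():
    """Return how many distinct words were loaded."""
    return len(words)


def unload():
    """Nothing to free by hand: Python manages this memory for us."""
    return True
```

Note that because a set ignores duplicates, size() counts distinct words, which matches problem set 5's dictionaries, where each word appears once.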
Some of which are simply possessives of words that are in the dictionary. But at the very end of this output, I should see not only how many words were found but also the total time involved, which appears to be 1.87 seconds. Not bad, seeing as it only took me, like, what, a minute or two to write the actual code. But there is going to be a trade-off, as we'll see. Even though it took me much less human time, and it was arguably a lot easier to implement this spell checker in Python than I dare say it was for most everyone in C, let's see what that trade-off might be. Over in my left-hand terminal window, I'm in the C directory, which I brought with me as the staff solution in C to problem set 5. Let's go ahead and make that spell checker. Then let's go ahead and do ./speller and run it on the same file, holmes.txt, and see how long the C implementation takes. Enter. And we see some of the same output. It might be slower sometimes just because of the cloud. There: total time spent in the CPU, not counting printing everything to the screen, which might take longer, is only 1.32 seconds versus the 1.87 seconds in Python. Now, while that's only about half a second, it's a decent percentage of the total time spent running the spell checker: the Python version took roughly 40 percent longer than the C version. And so that alone seems to be one of the trade-offs. Even though it seems to be much faster and, dare I say, easier to implement a solution in Python, there's going to be a trade-off insofar as the code might very well run slower. And as we'll see today, that's in large part because C is, of course, compiled. That's why I ran make and, in turn, clang. And then the zeros and ones, the so-called machine code, are what you're running.
In Python, generally, the computer is interpreting your code, essentially reading it top to bottom, left to right, much like a human in between two other humans might slowly translate one spoken language to the other if those two people don't in fact speak the same language themselves. So there's a bit of overhead when using Python, but I will say that the Python community has been working on this problem for some time. And so in general, it's not necessarily going to be as significant a trade-off, because there are certain tricks we can do. In fact, underneath the hood, the Python language, and the specific interpreter you're using, can technically, semi-secretly, compile your code for you into something called bytecode and then run that bytecode, which is more efficient than reinterpreting your source code again and again. But we'll see more of this over time. For now, let's take a look at maybe two other problems that we might solve, dare I say, more easily and more quickly than we could have in C for problem set 4. Let me go ahead and shrink down my terminal window here, close out dictionary.py, close one of my terminal windows, and cd back to my main directory. And let's go ahead and open up that bridge bitmap photograph that we used in problem set 4, to which you had to apply a number of Instagram-like filters. Well, now let's go ahead and implement maybe one of those filters, the blur filter, whose purpose in life is just to blur this image. Let's see how long this takes. Let me go ahead and open up, say, blur.py, which is now going to be a Python program for blurring images. It's empty initially, but I can pretty much write this quite quickly. Now, let me go ahead and, at the top of this file, write the Python keyword from, followed by PIL, for Python Imaging Library, and then import an object called Image and another one called ImageFilter.
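To peek at the bytecode that CPython, the standard Python interpreter, compiles a function into, the standard library's dis module can list the instructions. This is a small illustrative sketch: the exact instruction names differ from one Python version to the next, so any listing you see is specific to your interpreter.

```python
import dis


def greet(name):
    """A tiny function whose bytecode we can inspect."""
    return "hello, " + name


# dis.get_instructions yields one Instruction object per bytecode operation.
instructions = list(dis.get_instructions(greet))
for instruction in instructions:
    print(instruction.opname, instruction.argrepr)

# The compiled bytecode itself is stored on the function's code object as raw bytes.
print(type(greet.__code__.co_code))
```

Calling dis.dis(greet) instead prints a fuller, human-readable disassembly of the same instructions.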
In particular, these are two features of the Python Imaging Library that are going to make this so much easier to actually solve. And then let's go ahead and define a variable. We'll call it before, representing the before version of this image, and set that equal to Image.open, quote unquote, bridge.bmp, where that of course is the name of the file we want to blur. Then let's go ahead and create a variable called after, representing the after version of this same image, and set that equal to before.filter, open parenthesis, ImageFilter.BoxBlur, and then, just to be a little dramatic, I'm going to blur it more than you needed to in problem set 4, so we'll see it more visibly on the screen: let's do an argument of 10. And then, at the very end of this process, let's do after.save and save it in a file called, say, out.bmp. Done. So in just four lines of code, I claim I've implemented in Python the blur function that we previously implemented in C. Let me open my terminal window. Let me run the python command this time on blur.py. Cross my fingers as always. And indeed, I've made a mistake. Perhaps even if you've never written Python before, you can see it. And in fact, we'll see a number of these errors, some intentional, some unintentional. But on line 4, what I intended to do was set that variable I created, called after, equal to before.filter. All right, that's all right. Let's go back down to my terminal window, clear it to get rid of all that, and rerun python blur.py. Cross my fingers even harder this time. Nothing bad seems to be happening, indeed. Now let's go ahead and open up out.bmp. And before we reveal that, let's go back to the original, which is bridge.bmp. And now, dramatically, let's see the blurred version thereof. Voila. Hopefully, to your eyes too, it looks quite a bit blurry. Well, how about one more flourish? Those of you who were feeling more comfortable last week perhaps implemented edge detection in C.
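Pillow's ImageFilter.BoxBlur(radius) sets each pixel to the average of the pixels in a square box extending radius pixels in each direction. The sketch below applies that same averaging idea in plain Python, on a grayscale grid of ints rather than a real BMP, so it runs without Pillow installed. It is not Pillow's implementation. Like problem set 4's blur, it simply skips neighbors that fall outside the image, whereas Pillow may treat the borders differently. The function name box_blur is hypothetical.

```python
def box_blur(pixels, radius=1):
    """Return a new grid in which each value is the rounded average of the
    values within `radius` rows and columns of it. Neighbors outside the
    grid are skipped, as in problem set 4's blur."""
    height = len(pixels)
    width = len(pixels[0])
    blurred = []
    for row in range(height):
        new_row = []
        for col in range(width):
            total = 0
            count = 0
            # Clamp the box to the grid's boundaries.
            for r in range(max(0, row - radius), min(height, row + radius + 1)):
                for c in range(max(0, col - radius), min(width, col + radius + 1)):
                    total += pixels[r][c]
                    count += 1
            new_row.append(round(total / count))
        blurred.append(new_row)
    return blurred


# A single bright pixel spreads out into its neighbors.
print(box_blur([[0, 0, 0], [0, 36, 0], [0, 0, 0]]))
```

With radius=1 this is the 3x3 box from problem set 4. The lecture's Pillow version instead passes 10 to ImageFilter.BoxBlur for a much stronger blur.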
Well, let's see if we can whip that up quite quickly, too. Let's go ahead and write a file called edges.py, using that same bridge.bmp file. And in this file, let's go ahead and do the following. As before, from the Python Imaging Library, let's import the Image feature and the ImageFilter feature. Then, as before, let's create a variable called before. Set it equal to Image.open, passing in bridge.bmp. So far, the same as before. Now let's create a variable called after. Set it equal to before.filter, passing in this time ImageFilter.FIND_EDGES, which is different from BoxBlur. And by definition, it's going to find the edges of this image. And then, as before, let's do after.save of out.bmp and just clobber the blurred file that we just created. All right, that's it. Let's go ahead and open up my terminal window now. Let's go ahead and again run python, but this time on edges.py. Cross my fingers real hard. So far so good. And that was quite fast. Recall that the bridge.bmp image looked like this. But now, when we open up this new and improved version of out.bmp, thanks to Python, in just four lines of code, we now have all of our edges detected. So what can we then learn from C itself? Well, C had, of course, functions. And functions were those actions or verbs that simply got work done. And let's go ahead and compare side by side, much like we did with Scratch and C, the ideas that, today onward, are still going to be the same, and how they translate to Python. So on the left here, we'll now have our friend Scratch. This, of course, was one of the first puzzle pieces we saw. It's a purple puzzle piece saying say, and it was a function insofar as it said the value of its argument, which in this case is hello, world. Well, we've already seen in Python what this looks like. It looks similar to the version in C, but it's no longer printf, there's no longer a semicolon, and there's no longer an explicit newline.
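Pillow's ImageFilter.FIND_EDGES applies its own fixed 3x3 kernel. For comparison, here is a plain-Python sketch of the Sobel operator used by problem set 4's edges filter, again on a grayscale grid of ints rather than a real BMP. It is a swapped-in technique, not what Pillow does internally. Following the problem set's convention, pixels beyond the border count as black (0), and each result is capped at 255. The names GX, GY, and sobel_edges are hypothetical.

```python
import math

# Sobel kernels: GX responds to horizontal change, GY to vertical change.
GX = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
GY = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_edges(pixels):
    """Return edge strengths for a grayscale grid of ints in 0..255.
    Pixels beyond the border count as 0 (black); results are capped at 255."""
    height = len(pixels)
    width = len(pixels[0])
    edges = []
    for row in range(height):
        new_row = []
        for col in range(width):
            gx = gy = 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    r, c = row + dr, col + dc
                    value = pixels[r][c] if 0 <= r < height and 0 <= c < width else 0
                    gx += GX[dr + 1][dc + 1] * value
                    gy += GY[dr + 1][dc + 1] * value
            # Combine both gradients into one magnitude, then cap it.
            new_row.append(min(255, round(math.sqrt(gx ** 2 + gy ** 2))))
        edges.append(new_row)
    return edges


# A sharp dark-to-bright boundary between columns 1 and 2 yields strong edges.
step = [[0, 0, 200, 200] for _ in range(3)]
print(sobel_edges(step)[1])
```

In a flat region away from the border both gradients cancel to 0, which is why only boundaries light up in the output.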
So in Python, it's quite simply this. Meanwhile, in Python, there are a whole bunch of libraries as well. In C we had header files, and those header files give you access to the prototypes, that is, the signatures, of the functions that you want to use from those libraries. Python uses somewhat different vernacular, whereby Python has what are called modules and packages, and a package is just a collection of modules. But a module is just a library in Python speak, so to speak. So anytime you hear someone discussing a module or a package in Python, they're just talking about using a library. And that library might come with the language itself, just built in as standard, or it might be a third-party library that you might download and install yourself, much like I did a few weeks back when we installed the cowsay program so that I could actually have a cow or other animals on the screen display text. So in C, recall, we had something like this, #include <cs50.h>, which was the header file pre-installed for you somewhere. But we will have, for at least this week, an analog of the CS50 library for C in Python too, just to make this transition from C to Python a bit easier. These, though, are meant to be training wheels that you can take off, and should take off, even within a week or so. They're just meant to smooth that transition and make clear what's the same and what's different. So in the CS50 library for Python, we also have a function called get_string, whose purpose in life is to get a string. To access it, though, you don't use #include <cs50.h>. That's a C thing. In Python, you would say from cs50 import get_string. It's a little more verbose, but it's also a little more precise as to what you want from the library, since only the names you list become directly available in your own code. So here, for instance, is a Scratch program that was a little more interesting than just printing out hello, world.
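Since the CS50 library for Python may not be installed everywhere, here is the same import syntax shown with math, a module from Python's standard library. It is a small sketch of the two import styles. Note that either style loads the whole module; from ... import only controls which names your file can use directly.

```python
import sys

# Style 1: import the whole module, then reach its functions through its name.
import math

# Style 2: import one specific name from a module into this file.
from math import sqrt

print(math.sqrt(16))  # 4.0
print(sqrt(16))       # 4.0

# Both names refer to the very same function object...
print(math.sqrt is sqrt)  # True
# ...and the entire math module was loaded either way.
print("math" in sys.modules)  # True
```

The lecture's from cs50 import get_string follows the second style, assuming the CS50 library for Python is installed.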
This was the first program we wrote that actually got some user input. So in fact, let me go back to VS Code, and let's see if we can't resurrect this C program real quickly in the form of a new hello.c. So I'm going to run code hello.c, and then in my code tab I'm going to do #include <cs50.h> and #include <stdio.h>, and then below that I'm going to go ahead and whip up our familiar int main(void). And then inside the curly braces, we'll bring back string, even though we now know it's char *. We'll call our variable answer, set it equal to get_string, and ask the user, quote unquote, What's your name?, with a space just to move the cursor over. I still need my semicolon in C. And then after that, recall that back in week 1 we did printf of "hello, %s\n" and then plugged in the variable answer, so as to see hello, David, hello, Kelly, or something else. Just to be safe, let me do make hello. All is well so far. ./hello, type my name, and this version in C seems to be working. Okay, so in C, these lines of code here translate pretty literally to what we saw in Scratch, although we got the answer variable in Scratch for free. That blue puzzle piece just existed without our having to create it. But it's a decent number of hoops to jump through just to get user input and print it out. Well, in Python, this is going to get a little more succinct, in that the Python version of this code is now going to look like this. printf is now print. The semicolons are gone. And what else seems a little bit different? Yeah. >> I don't need any placeholders. >> Yeah. So we don't need the %s anymore. In fact, I'm curiously using a plus, which, if some of you studied Java or some other language, you might have actually seen before. Even if you've never seen Python before, and you've only seen C in CS50, you can probably guess what the plus is doing. Even if you don't know the technical vocab, what is the plus probably doing here? Yeah.
So it's concatenating, or joining together, the thing on the left with the thing on the right. And we actually had that vernacular in the world of Scratch. We had the join puzzle piece that joins hello, space, and the value inside of answer. A plus in Python can do exactly the same thing. So it's a little more user-friendly than having to anticipate, oh, let's put the placeholder here and then come back later and plug in the variable. Humans over time just realized that it's a lot easier to do it this way than to bother with placeholders, though you can still use placeholders for other purposes. Another subtle difference between the C and Python versions of these two lines? More subtle than that. What's missing? Yeah, in back. >> Uh, so the backslash n is again gone for Python. So that sort of happens for free, indeed. And one more difference? >> You don't need to declare the type of answer. >> Yeah, we don't need to declare the type of answer. Recall that if we rewind to the C version, you needed to tell the compiler that this is a string. And last week, we could have changed string to char *, but we still had to tell the compiler what data type we're putting into that variable. In Python, we can now get rid of that data type, and Python will just figure it out from context. If get_string returns a string, well, then obviously the variable should store a string. If a function returns an int, well, then obviously the variable should store an int. And the language is just doing more of that decision-making for you, to save you time and save you thought. There's a subtlety here, though, where we can make this program a little bit different. In fact, let's whip it up first in Python. Let me go back to VS Code here, clear my terminal, and let's go ahead and create a program again called hello.py. That'll open up my previous version thereof. And just so we can see these things side by side, I'm going to drag that tab over to the right of VS Code and let go.
And now you can see the C version still on the left and the Python version at the right. What I'm going to do here now in my Python version is change it to be quite like the version in C at left. So, as promised, I'm going to do from cs50 import get_string. Then below that, I'm going to say simply answer = get_string, quote unquote, What's your name?, question mark, space, no semicolon. Whoops, I need to close that parenthesis. Then on the next line, I'm going to do print, quote unquote, hello, comma, space, close quote, plus answer. Down here, I'm going to go ahead and run python hello.py again. No compilation step: it's just going to be interpreted line by line. What's my name? David. And it seems now to work exactly the same. Now, it turns out in Python there are even more ways to solve problems like this, even trivial problems like this. Here we're using the plus sign not as addition per se, but as the concatenation operator, the join operation. If you want, though, you can take advantage of the fact that print in Python can take more than one argument. It can take two or three or four or even zero. You simply change the plus to a comma, get rid of that now-superfluous space, and give print two things to print, because it turns out, per the documentation of print, which we'll eventually see, that when it's given two or more arguments, by default it separates them for you with a single space, and that's something we can override as well. Which one is better? Like, I don't know; they're sort of equivalent. It's such a trivial difference, but it speaks to the flexibility that you'll start to have, whereby the language is a little less rigid than C was, certainly when it comes to printing strings. So in fact, if I go back to VS Code here and I go ahead and change that plus to a comma and get rid of the space inside of the quotes, I can rerun python hello.py, type in my name, and we see exactly the same result there.
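The two equivalent print calls above can be compared side by side. In this sketch, a hard-coded answer stands in for what the user would type, so the example runs without any input.

```python
answer = "David"  # stands in for get_string("What's your name? ")

# Concatenation: + joins two strs, so the space must be inside the quotes.
print("hello, " + answer)  # hello, David

# Two arguments: print separates them with its default separator, one space.
print("hello,", answer)    # hello, David
```

The plus version builds a single string before printing. The comma version hands print two separate objects and lets print insert the space.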
But we can take this one step further. Even though it's going to look a little cryptic, this is sort of the more Pythonic way to do things. And that, too, is actually a term of art: to do something Pythonically is to do it the way that most Python programmers would do it. It's not the only way. It's not necessarily the right way, but it's sort of the recommended way in the community. So here we have that latest version, where I'm passing two arguments to print. The first is quote unquote hello, comma, and the second is the value of answer. I could similarly write this same program with this crazy syntax. It takes a little getting used to, but it turns out it's actually kind of nice overall. What's obviously different? Well, for one, these weird curly braces are back. They're not part of the logic of the program. They're literally inside of the double quotes. But you can probably guess what this does for me, because there's one other crucial difference. What else has changed between before and after? Yeah, there's this weird f, which is not part of printf. It's actually inside of the parentheses and right next to the double quotes. And even this, when it came out, was a little weird-looking to people. But this is how you get this thing to be a formatted string, aka an f-string, as opposed to just a literal string of text. Now, you can probably guess what it means to put the variable's name inside of the curly braces. It means the value of that variable is going to be substituted right there. It's similar in spirit to the %s in C, but a little more explicit. With the %s, you had to remember that that %s corresponds to this variable's value, which was just annoying, if anything. But this time you have a placeholder in curly braces that says exactly what you want there: that particular value.
And what this means more technically is that the answer variable will be interpolated by the interpreter, which means its value will be plugged in right there. So let's try this. Let me go back over to VS Code, and quite simply, on my last line of code here, let's change the input to print to be quote unquote hello, comma, space, then curly brace, answer, close curly brace, close quote. And I've done this intentionally, but let's see. Let me go ahead and rerun python hello.py. D-A-V-I-D. What are we about to see? hello, {answer}, curly braces and all. So this is a bug, but just to demonstrate what is going on: what's missing? What did I forget? Yeah. >> Yeah, I didn't declare that this is a so-called f-string, or format string. The fix for this, weirdly, is just to put an f right there. And now if I rerun python hello.py, type in my name again, and cross my fingers, I see that the variable has indeed been interpolated and its value plugged in where I wanted it. All right. Turns out we can take off one of these training wheels already. I propose that get_string exists in the library just to smooth the transition, but honestly it's not really doing anything all that interesting. So let's take this first training wheel off. It turns out that Python comes with a function appropriately named input, such that if you want to get input from the human via their keyboard, you can just use the input function. So we can already, for this program, get rid of the CS50 library, because input essentially behaves just like the get_string function. So if I go back to my Python version here, I can change get_string to input. And I can even go and delete this training wheel up there. Rerun python hello.py in my terminal. D-A-V-I-D, Enter, and we're still in business as well. So input is generally going to be the way you go about getting input from the user now. All right, let me pause here and see if there are any questions as we try to bridge these two worlds from C to Python.
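The f-string bug and its fix can be shown in two lines. Here a hard-coded answer again stands in for what input("What's your name? ") would return, since input hands back whatever the user typed as a str.

```python
answer = "David"  # stands in for input("What's your name? ")

# Without the f prefix, the braces are just characters inside the string.
print("hello, {answer}")   # hello, {answer}

# With the f prefix, {answer} is replaced by the variable's value.
print(f"hello, {answer}")  # hello, David
```

Anything inside the braces of an f-string is evaluated as a Python expression, so f"{answer.upper()}" would also work.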
Yeah. >> So in Python, we don't need the main function. Why is that? >> Good question. In Python, why don't we need the main function anymore? After all, it's been omnipresent in, like, every program we've written thus far, and yet it's absent from all of our Python programs thus far. It turns out that humans realized it's just so common that you want the file you're editing to be the main part of your program. Why bother adding the additional syntax of int main(void) or something analogous? If you just want to write two lines of code to get some work done, why waste your time adding all of this boilerplate code, which we've been doing up until now? Now, that said, we're going to bring back main in a little bit, because it will solve a problem. But generally speaking, what I'm doing here is indeed a program, but people in the real world would also call these scripts, where a script is a lightweight program that pretty much just runs top to bottom, left to right. It's really synonymous with writing a program, but this is, again, one of the appeals of a language like Python: you can just get right in, get the job done, and get out. Even Java has moved in this direction in recent years, where you don't have to put everything in a class with public static void main, for those familiar. You can just write System.out.println and get some work done. >> Yeah. >> Is input only for strings? >> Good question. Is input only for a string? Yes. Right now it will get input from the user via their keyboard, and you'll get back a string, just like with get_string. And we'll come back to why that's maybe not a good thing. All right. So what more might we want to do at this point? Well, let's tease apart some differences now with C. Up until now, every argument we've ever passed into a function in C, and in Scratch for that matter, has been a so-called positional parameter.
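One reason it matters that input always returns a str is that numbers typed by the user arrive as text. In this sketch, two hard-coded strings stand in for what input would return, to show the resulting bug and the usual fix of converting with int().

```python
# Pretend these came from input(); input() always returns a str.
x = "1"
y = "2"

# + on two strs concatenates rather than adds.
print(x + y)            # 12

# Converting each str to an int first gives arithmetic addition.
print(int(x) + int(y))  # 3
```

int() raises a ValueError if the text isn't a valid integer, such as "cat", so real programs typically need to handle that case.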
A parameter is essentially the same thing as an argument. When you're looking at the function from the function's perspective, it's a parameter that it accepts, but when you're calling the function and passing in an input, you typically call it an argument. They refer to essentially the same thing. And all of the parameters we've been passing into functions thus far have been positional, in the sense that the order matters: the first thing, then the second thing, then the third thing, and so forth. For instance, with printf, the first thing has to be the quoted string, maybe with a placeholder, and then if there's another argument after the comma, that can be the second argument, the third argument, and so forth. But it turns out Python additionally supports what are called named parameters, whereby you don't have to rely only on the order in which you're enumerating the arguments to a function. And that's helpful because some functions, especially in the real world, when you start using other people's libraries that have lots of functionality, might not take just one or two arguments. They might take four arguments, 10 arguments, maybe even more. And it can just be unwieldy to have to remember the precise order of all those arguments. You're just asking for trouble: you're going to screw up, or a colleague is going to get the order out of whack. So with named parameters, you can be explicit with Python and tell it which argument you are trying to pass in by giving it an actual name. So let me go over to VS Code here and propose that we use this for really the simplest of programs, in order to override that default newline that we seem to be getting for free just by calling print. In other words, let me go ahead here and clear my terminal window. Let me close hello.c and focus only on hello.py for just a moment.
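To see positional and named arguments side by side, here is a hypothetical function, describe_pet, made up purely for illustration. Two of its parameters have default values.

```python
def describe_pet(name, species="cat", age=1):
    """Hypothetical function used only to illustrate named arguments."""
    return f"{name} is a {age}-year-old {species}"


# Positional: the arguments must follow the parameters' order.
print(describe_pet("Tom", "dog", 3))  # Tom is a 3-year-old dog

# Named: each argument says which parameter it's for, so order no longer matters.
print(describe_pet(age=3, name="Tom", species="dog"))  # Tom is a 3-year-old dog

# Mixing is allowed (positional first); unnamed parameters keep their defaults.
print(describe_pet("Tom", age=5))  # Tom is a 5-year-old cat
```

Python requires any positional arguments to come before named ones in a call, which is why "Tom" appears first in the last example.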
And let's make it much simpler, like the very first version, and just print out, using Python's print function, not printf, quote unquote hello, world, close quote. And now here I'm going to do python hello.py. Enter. And we still see that the cursor moves to the next line. The dollar sign moves to the next line because I'm automatically getting a newline. Well, what if you don't want that? How can you override that behavior? Well, you can use a named parameter in Python. I can go up here and add a second argument. If it were just something like "this", that would literally print out the word this, because it's just another string. But if I give it a name, like end equals quote unquote, I can override the default behavior of the Python print function by changing the value of its end parameter to be the so-called empty string, quote unquote, which means there's literally nothing there. Watch what happens now. If I run python hello.py and hit Enter, the dollar sign is, weirdly and in sort of an ugly way, on the same line, just like it was when I made the mistake in C in week 1 of omitting the backslash n. That is to say, the default value of this end parameter really is quote unquote backslash n. And I can make that explicit by changing my code as such. I'm going to go ahead and rerun python hello.py, and now the cursor is back on the next line. Not that this is all that useful, other than for overriding that default, but you could do fun things like exclamation point, exclamation point, exclamation point, if you really want print to be excited to print some things for you. And if I now run python hello.py a third time, you see that it's ending with exclamation point, exclamation point, exclamation point. That looks a little stupid with the dollar sign right after it, so you could even toss in a newline there. Run it yet again, and now we sort of get both of those.
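The three end variations from the demo can be captured as text, which makes the exact characters visible. This sketch uses the standard library's io.StringIO and contextlib.redirect_stdout to collect what print writes, then shows it with repr so the newlines appear as \n.

```python
import contextlib
import io

buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    print("hello, world")               # default end is "\n"
    print("hello, world", end="")       # nothing at all afterward
    print("hello, world", end="!!!\n")  # custom ending, then a newline

output = buffer.getvalue()
print(repr(output))  # 'hello, world\nhello, worldhello, world!!!\n'
```

Because the second call printed no newline, the third call's text lands on the same line, just as the dollar sign did in the terminal.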
But I would say the common case is to use that end named parameter simply to override the newline. So how do you learn more about these kinds of things? Well, you go to the official documentation for Python, which is more of a thing than it was with C. If you want to learn more about Python and the functions it offers and the arguments they take, you go to the official documentation at docs.python.org. This is essentially analogous to the so-called manual pages, or man pages, that CS50 has a version of, but there is no one de facto source for those man pages. Several different versions of them exist in the wild, whereas Python itself, as a community, maintains its own official documentation. So for instance, if you go to a specific URL like this, ending in functions.html, you'll see an exhaustive list of all of the built-in functions that come with Python besides just print, and we'll see a bunch more of them today. If you scroll down specifically to the documentation for print, you'll see something that's a little arcane that looks like this. But this is representative of a Python prototype, if you will, also often called a signature, which tells you the name of a function and then how many and what types of arguments it takes. So how to read this? Well, the print function takes some number of objects. In Python specifically, this syntax of *objects just means zero or more objects, whatever those are, like a number or a string or something else: the stuff you want to print out. After that, if you start using named parameters, you can specify a separator other than the default, the separator between the arguments to print. Recall that when I did quote unquote hello, comma, and then answer, the two were separated automatically for us by a single space, even without my hitting the space bar inside of my quotes. That's because the default value of sep is in fact a single space. The default value for end, as promised, is indeed backslash n.
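Recent versions of the Python docs list print's signature as print(*objects, sep=' ', end='\n', file=None, flush=False). To make that notation concrete, here is a hypothetical helper, format_like_print, with the same shape for *objects, sep, and end. It returns the text print would write instead of printing it. It is an illustration, not how print is actually implemented.

```python
def format_like_print(*objects, sep=" ", end="\n"):
    """Hypothetical helper mirroring the shape of print's signature.
    *objects gathers any number of positional arguments into a tuple;
    sep and end come after it, so they can only be passed by name."""
    return sep.join(str(obj) for obj in objects) + end


print(repr(format_like_print("hello,", "David")))         # 'hello, David\n'
print(repr(format_like_print(1, 2, 3, sep=", ", end=".\n")))  # '1, 2, 3.\n'
print(repr(format_like_print()))                          # '\n'
```

One consequence of this design is that any parameter listed after *objects is keyword-only. That is why print("a", "b", " ") prints three objects rather than treating the third as a separator.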
And then there's some other stuff related to file I/O that print can also deal with, but more on that perhaps another time. There's one curiosity here. In Python, it turns out that you can use double quotes or single quotes around strings, whereas in C it was much more regimented: double quotes are for strings, and single quotes are for chars, that is, single characters only. It doesn't matter in Python which one you use, so long as you're consistent. Stylistically, you should really pick one and go with it. The only time you should really alternate between the two is maybe if you want to put an apostrophe, say in some human's name, inside of double quotes instead of single quotes, or something like that. But generally you have a little more flexibility in Python, and different communities have different conventions. The Python community tends to use single quotes, at least in the documentation. The JavaScript world tends to use single quotes. We in CS50 often use double quotes just for consistency with what we do in C. But any community or company would typically have its own style guide that dictates which one you should use, if only for consistency. Questions, then, on this here print function, as just representative of all of the docs that you'll see? All right. Well, let's take a quick look at variables. We've used these a few times already, but let's focus in a little more detail on what's actually different. In Scratch, if you wanted to create a variable called counter and set it equal to zero, you would use this orange puzzle piece here. In C, you would do something like this: the type of the variable, the name of the variable, and then set it equal to the initial value, semicolon. In Python, it's going to be a little similar, but you can probably guess where we're going with this. How is this line of code probably about to change? Yeah? >> Good. We're not going to bother with int, or the data type more generally.
We're just going to say counter, because a smart interpreter can figure out from context that you're putting a zero in there. It's obviously an integer. And what else is about to go away? The semicolon. So this is the C version, and voila, this now is the Python version. As silly as this example is, it's kind of representative of how languages like Python tend to be a little more programmer-friendly, because you type less and get the same work done. All right. So if we wanted to do something now in Scratch, like increment the counter by one, you would use this puzzle piece here. In C, we could do something like this. In Python, it's going to be almost exactly the same except, of course, no semicolon. In C, we could alternatively do this, and you can also do this in Python. In C, though, you could also use what other technique? >> Plus plus. I'm sorry, but Python has taken that away from us. So if you got into the habit of using plus plus or minus minus, that's great; use them in C all you want. In Python, they just don't exist, so you'll see plus equals more commonly instead. All right. What about the various types that exist in Python? Even though you don't have to specify the types when declaring your variables, they do in fact exist underneath the hood. And it's worth knowing a little something about them, because not knowing will often lead to some form of bug. In C, we had types like these: bool, char, double, float, int, long, and string. The last of those came from the CS50 library, and we eventually started calling it char star instead, which it still is: a data type for the address of some char. In Python, we're going to whittle this list down to essentially a subset of those. We still have bools, we still have floats, we still have ints, and we do have strings, but they're literally called strs, s-t-r. So that's not a CS50 thing; the Python community calls strings strs.
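The variable, increment, and type points above can be checked directly. This is a minimal sketch that uses only built-ins. It tries to compile the string `"counter++"` to confirm that the C-style increment is rejected, and it uses `type()` to show that types still exist even though none were declared:

```python
# Variables need no declared type and no semicolon.
counter = 0
counter = counter + 1
counter += 1          # the common shorthand in place of C's counter++
print(counter)        # 2

# counter++ is simply not valid Python syntax.
try:
    compile("counter++", "<example>", "exec")
    increment_allowed = True
except SyntaxError:
    increment_allowed = False
print(increment_allowed)  # False

# Types still exist at runtime; type() reveals them.
values = [True, 3.14, 50, 'hi', "hi"]
type_names = [type(value).__name__ for value in values]
print(type_names)     # ['bool', 'float', 'int', 'str', 'str']

# Single and double quotes build the same str; a one-character string is still a str.
print('hi' == "hi", type("c").__name__)  # True str
```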
But absent from this list is any mention of star, not to mention char star. There are no pointers in Python. And indeed, as powerful as I'd hope you found weeks four and five to be, I dare say you also found them incredibly frustrating and challenging, and prone to yielding bugs in your code. With that power of memory management comes a whole slew of potential mistakes that you can make. And that's true not just for CS50 students, but for adult programmers, full-time programmers around the world. And so among the other features of languages like Python is that they try to take away certain features of languages like C that were just too dangerous in the first place. Those features might be wonderfully powerful, and might help you solve problems more quickly and more precisely. But if they tend to do more damage than they're worth, sometimes it's worth just abstracting those details away. Similarly, Java has references, as some of you might know, but does not have pointers per se. You can't go poking around arbitrary locations in memory in the same way that you can with C. So let's take some of these data types out for a spin and see what's the same and what's different. Let me go back to VS Code here and propose that we bring back one of our old calculators from a while back. Let me clear my terminal, close hello.py, and open up a version of this program that I brought in advance, which was our calculator version 0 from back then. Just to remind you, one of the first versions of our calculator used the CS50 library as well as the standard I/O library. We simply got an int using get_int, got another int using get_int, and then performed some addition. So it was a very trivial calculator that we did very early on, in week one, just to demonstrate some of the operators and syntax of C. Well, let's go ahead and try converting this to Python by creating our own program, calculator.py.
So in my terminal window, I'm going to write code of calculator.py. It's going to open another tab, which I'm just going to drag over to the right so we can see both side by side. Let me copy the C code into the Python file. Even though this will not work in the same way, let's keep what we need and get rid of what we don't. Instead of the double slash for comments, it turns out the convention in Python is to use a single hash symbol, like this. It's a minor difference, and it's half as many keystrokes, so that's nice. We're not going to include anything like this, but we are going to do from cs50 import get_int, a function that I promised would exist, though we'll soon get rid of that training wheel as well. We don't need main or this curly brace. We don't need this curly brace. And we don't need all of this indentation as a result, so I'm going to move all of that over to the left. I'm going to fix all of the comments to be Python comments by changing the slashes to hash symbols. And now I'm going to change each of these three lines of code, as you might expect, to the Python version. You can probably guess already: we can get rid of the int there and the int there. We can get rid of the semicolon here and the semicolon here. We can get rid of the f in printf here. And we can get rid of the semicolon here. There are a few different ways we could handle the print, but I dare say the simplest is to get rid of the format code, that first argument, altogether and just tell Python to print x + y. That's probably the most literal translation of the program at left to the program at right. Let's reopen the terminal window, run python of calculator.py, and hit Enter. Let's do something like x is 1 and y is 2, and hopefully we do in fact get 3. All right, so that's all fine and good, but let's take off one of our training wheels now.
So let me get rid of our C version here and focus just on Python for the moment. Let's take away this C code. And what was the function we can use to get user input? Yeah? A little louder? It's just called input. So let's get rid of CS50's get_int already and use input instead. All right, this program is much simpler already. Let's go ahead and reopen the terminal window, run python of calculator.py, type 1 again for x and 2 again for y, and of course 1 + 2 equals 12. So what's going on here? Because clearly this is a step backwards. Yeah? >> Yeah. In the context of strings, plus represents concatenation, the joining of the two arguments on the left and the right. That seems to be what's happening, because it's not 12 per se; it's more literally 1 and 2 concatenated together. But why is that? Well, apparently the input function indeed returns a string. That is the key. Those are the keystrokes that came back from the user. They might look like numbers to us, the Arabic numerals 1 and 2, but they're being treated as a string. More technically, there's some char star stuff going on underneath the hood, even though we're not using that same terminology. So intuitively, what's going to be the solution, without just reverting to the training wheel that is CS50's get_int function? Put another way, how might you think CS50 implemented get_int? >> Yeah. So recall that in C we could cast some data types to other data types, typically ints to chars or chars to ints. It's not quite as simple as casting in this case, because, thanks to our knowledge of C, we know there's a bunch of stuff going on underneath the hood. There's a 1 and there's a null character; there's a 2 and there's a null character. So it's not quite as literal as a char to an int or an int to a char. So we're going to more properly convert the string, or the str, to an int. We're not casting, but converting.
And converting just implies that there's a little more work that has to be done. But thankfully, Python can do this for us. In fact, let me go up to line four here. Well, actually, let's do this a couple of ways. Let's first convert the x value to an integer, and let's convert the y value to an integer as well. Funny enough, it's very similar syntactically to casting. But in C, when you cast something, you wrote the data type in parentheses. Now the data type itself is a function that takes an argument, which is the str, or string, that you want to convert. So let me go back to my terminal, do python of calculator.py, Enter, type in 1, type in 2, and now I get back my answer of 3. Now, as you might imagine, just like in C, we can play around with where we perform some of these operations. And this arguably makes it a little less obvious now what is being added. I really like the simplicity of x + y, which just does what it says. So I could convert these in other ways. After line four, I could say, you know what, change x to be the int version of x. But generally speaking, that's kind of wasting a line of code on something you could do on a single line. So let me delete that. If I know the return value of the input function is a str, let's instead pass that output as the input to the int function. It'd be a little more Pythonic, so to speak, to pass the input function's output as the input to int, which is really hard to say, but we've done this in C, just nesting function calls like this. All right, so if I run python of calculator.py one more time, type in 1, and type in 2, we're back in business. Now, what I won't trip over just yet is a subtlety: I'm deliberately typing in actual numbers like 1 and 2. But if you're following along at home or on your laptop and you type in cat and dog, bad things will happen.
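Here is a minimal sketch of the pitfall and the fix just described. The `calculator` function and its `read` parameter are hypothetical. `read` defaults to the built-in `input` and exists only so the function can be exercised without a keyboard. The final lines show the "bad things" for non-numeric text: `int()` raises a `ValueError`.

```python
# The pitfall: input() returns strings, and + concatenates strings.
x = "1"
y = "2"
print(x + y)              # 12
print(int(x) + int(y))    # 3


def calculator(read=input):
    """Return the sum of two integers typed by the user.

    `read` defaults to the built-in input(); it is a hypothetical
    parameter that exists only so the function can be tested.
    """
    x = int(read("x: "))  # input's output is passed straight to int()
    y = int(read("y: "))
    return x + y


# int() raises ValueError for text that is not an integer, such as "cat".
try:
    int("cat")
except ValueError as error:
    print("ValueError:", error)
```

In calculator.py itself you would simply call `print(calculator())`.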
But we'll come back to that before long. All right, questions, though, on any of this conversion of our strings to integers? No? All right. Well, what more does Python offer us? In addition to these data types, there are going to be a bunch of others, a few of which we'll actually use today. In fact, we'll see ranges of numbers; that's a thing built into Python. We'll see lists of numbers, which are like a new and improved version of an array that solves all of last week's problems, when we talked about the downsides of using arrays. There are going to be tuples, for things like x, y coordinates or GPS coordinates or anything where you have collections of values. There are going to be dicts, or dictionaries, which give you key-value pairs without your having to write a whole hash table yourself. And you can have sets, which you can use to contain unique values that you just want to check for membership. And there are bunches of other data types as well. This is where languages like Python start to get really powerful. Of all of the data structures we talked about in C, we really only got an array from the language itself; everything else we had to build, or at least talk about building, in class. These and more now come with the language. Meanwhile, in the CS50 library for Python, just so you know, there are a whole bunch of functions. These, though, were the C versions. In Python, it stands to reason that we don't need as many, because there are fewer data types in Python, but get_float, get_int, and get_string do all exist in the CS50 library for Python.
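Each of the built-in types just listed can be created in a line or two. This sketch uses only built-ins. The coordinate values and dictionary contents are hypothetical examples:

```python
numbers = range(3)                       # a range: 0, 1, 2, produced on demand
scores = [72, 73, 33]                    # a list: grows and shrinks with no malloc/realloc
scores.append(95)
point = (42.37, -71.12)                  # a tuple: a fixed group of values, e.g., coordinates
sounds = {"cat": "meow", "dog": "woof"}  # a dict: key-value pairs, no hand-written hash table
unique = {1, 2, 2, 3}                    # a set: the duplicate 2 is stored only once

print(list(numbers))             # [0, 1, 2]
print(scores)                    # [72, 73, 33, 95]
print(point[0])                  # 42.37
print(sounds["cat"])             # meow
print(2 in unique, len(unique))  # True 3
```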
You're welcome and encouraged to use the CS50 library, because indeed among the goals for problem set six is to redo some of your C problem set problems in Python. You can look at your own C code, which hopefully you like as a solution, and figure out how to convert it, essentially line by line, to the corresponding Python version. But clearly we've seen ways of taking these training wheels off quite quickly as well. In fact, if you wanted to import all three of those functions for a larger program, you could follow the approach I took already, but you can also just separate them by commas, like this. Or it turns out you can import the whole CS50 library, as you'll see in some code, and then access the functions within using slightly different syntax. All right, how about another construct from Scratch and from C, now in Python? In Scratch, if we wanted to do a comparison like is x less than y, where each of those is a variable, and then say as much, it looked like this in C. Nicely enough, you can probably guess already what's going to change here. The f is about to go away, the backslash n is about to go away, and the semicolon is about to go away, but some other stuff is about to go away as well. Focus your attention on syntax like parentheses and curly braces, because in Python it's just this. We got rid of the parentheses because they didn't really add all that much logically. We got rid of the curly braces, which technically we could do in C anytime there's a single line of code inside of a conditional, but for stylistic consistency we always used them. Python, though, does not have you use any of those curly braces at all. But Python requires that you indent your code properly.
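The indentation requirement is enforced when Python compiles your code, before anything runs. This sketch passes two versions of the same conditional to the built-in `compile`. The helper name `is_valid_python` is hypothetical. The unindented version is rejected with an `IndentationError`, which is a subclass of `SyntaxError`:

```python
def is_valid_python(source):
    """Return True if source compiles, False on a syntax or indentation error."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:  # IndentationError is a subclass of SyntaxError
        return False
    return True


unindented = "x = 1\ny = 2\nif x < y:\nprint('x is less than y')\n"
indented = "x = 1\ny = 2\nif x < y:\n    print('x is less than y')\n"

print(is_valid_python(unindented))  # False: the body of the if is not indented
print(is_valid_python(indented))    # True: four spaces mark the body
```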
So, have you ever written out a program with everything crazily left-aligned, just a big mess, until style50 swoops in and cleans it up for you? You're not going to be able to write Python code like that anymore. That's been such a societal problem among programmers, newbies and professionals alike, that the language itself requires it. If you want this line of code to execute when this Boolean expression is true, you've got to indent the line, by convention four spaces. You can't be lazy, leave it all left-aligned, and fix it up later. These language-based requirements have arguably made Python code more readable. Meanwhile, let's look at an if-else construct in Scratch, which looked a little something like this. In C, it looked like this, which is kind of a lot of lines just to express a simple idea. All of those same things are going to go away, and in Python it looks like this instead. The only other difference worth calling out is that, because you don't have the curly braces, you do have a colon, which precedes the subsequent indentation. Meanwhile, if we've got an if, else if, else in Scratch, it of course looked like this in C. A lot of this is going to go away in the flash of a screen, but there's going to be a curiosity, which is not in fact a typo. Notice what happens with the else if. It's abbreviated elif. And honestly, to this day, all these years later, I can never remember if it's elif or else if, because different languages use different shorthand spellings of this phrase. It's elif in Python, because that's maybe the most succinct you can make the two words. But everything else is effectively the same, including the additional colon. Okay, questions on any of those conditionals and syntax? Yeah? >> So what language did they code Python in? >> What a good question. What language did they code Python in?
The interpreter we are using within VS Code is itself written in C, a.k.a. CPython. However, you can implement a Python interpreter in really any language. That includes machine code, raw zeros and ones, if you have that much free time, or assembly language, which we saw briefly weeks ago. You could write an interpreter for Python in Python, if you really want to be meta about it, or in C++, or in Java. This is the thing about programming languages: you can use any language to create a compiler or interpreter for another language. What's going to vary is how easy or difficult it is, and therefore how much time it takes you. Good question. Other questions on any of these here features? No? All right. Well, let's do something a little bit different in Python vis-à-vis C by opening up a comparison program that we looked at some time ago. Let me go back to VS Code here. I'm going to close my calculator and open up, from my distribution code today, a version of our comparison program from a while back, essentially version three, zero-indexed. This one has comments, which the very first one in week one did not. But notice, as a refresher, what this comparison program was doing. It was including cs50.h and stdio.h. It was prompting the user for two integers, x and y, via get_int. It was then doing a very simple comparison of x against y, to determine if x is less than, greater than, or equal to y. So just so that we can go through the motions of converting one of these to the other, let's do that side by side. Let me code a program called compare.py, close my terminal, and drag the Python version over to the right here. Without comments this time, let's just do from cs50 import get_int. Then below that, let's do x equals get_int and ask the user, quote unquote, what's x, question mark. Then let's ask the user for y using get_int, quote unquote, what's y, question mark.
Then below that, let's do if x less than y, colon, and print quote unquote x is less than y, close quote. Elif x greater than y, colon, print quote unquote x is greater than y. Else, colon, let's print out quote unquote x is equal to y. So I dare say these are now equivalent. It's clearly fewer lines. A lot of the lines at left were admittedly comments, but some were also curly braces, and we got rid of other syntax, like parentheses, too. Let me open my terminal window and run python of compare.py. We'll type in 1 and 2: x is less than y. Let's do it again using 2 and 1: x is greater than y. Let's do it one last time: 1 and 1. And of course, those two are now equal to each other. All right, but why go down this road again, since that was kind of a simple exercise? Recall that we introduced this comparison of ints because it was sort of stupidly simple, even if the syntax that week was completely new. But we ran into an issue pretty fast when we started comparing strings. And that was a problem we really only fixed in week four, when we finally revealed what a string actually is. If we focus a bit more on Python strings, it turns out that we can solve that problem much more easily in the world of Python. In fact, let me go back to VS Code here. Let me close these two versions of int comparison and open up at left a program I brought with me from week four, where we finally revealed that a string is just a char star. Recall that the solution in week four, as well as earlier when we first encountered this problem, was to use strcmp. That's a function whose purpose in life is to compare two strings character by character by character, using a for loop or something like that. It therefore has knowledge of how to navigate pointers and how to look for the null character, the backslash zero, at the end.
And all of that came from our friend string.h. Well, how can we go about implementing the same idea in Python? Let's open up VS Code's terminal window and open up a new program called compare.py, but this time let's get rid of the integer version. Let's get two inputs from the user, and I won't even use any CS50 training wheels. Let's just use the input function to get s, asking the user for a value of s with the prompt quote unquote s, colon, space, close quote. Then t equals input, asking the user for a value of t. And then let's just ask the question: if s equals equals t, print out quote unquote same; else, print out quote unquote different. Let me move these side by side just so you can see the difference. Notice how much code we had to write, and how much we needed to understand, in order to compare something as trivial as two strings in C. But in Python, we're literally just using equals equals. Let's see if it actually works: python of compare.py, Enter. Let's type in maybe cat for s and dog for t. Those are in fact different, but we would have gotten the same answer in C. Let's rerun python of compare.py, type in cat, and type in cat again. And now it detects them as the same. So wonderfully, Python has solved that seemingly annoying problem of being taken too literally. It doesn't compare the pointer against the pointer. Instead, it compares what a reasonable programmer probably really cares about, the values of those strings. So the equals equals is doing all of the work of the for loop or while loop, iterating over those strings character by character and giving us the answer we want. So what else gets easier in Python? Well, let's focus a bit more on these strings. Let me go back into VS Code here, close out our two comparison programs, and clear my terminal. And let me go ahead and open up a prior program that we wrote, called agree.c. Namely, in the staff version of the code online, this was agree2.c, which is where we left it.
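Both versions of compare.py can be sketched as plain functions. Here they take their values as arguments rather than calling get_int or input, and they return their message instead of printing it, so both are easy to test. The function names `compare_ints` and `compare_strs` are hypothetical. The second string is built separately with `str.join`, yet `==` still reports the two strings as the same, because it compares characters rather than addresses:

```python
def compare_ints(x, y):
    """Mirror the if/elif/else in the first compare.py."""
    if x < y:
        return "x is less than y"
    elif x > y:
        return "x is greater than y"
    else:
        return "x is equal to y"


def compare_strs(s, t):
    """Mirror the second compare.py: == compares characters, not addresses."""
    if s == t:
        return "same"
    else:
        return "different"


print(compare_ints(1, 2))  # x is less than y
print(compare_ints(2, 1))  # x is greater than y
print(compare_ints(1, 1))  # x is equal to y

s = "cat"
t = "".join(["c", "a", "t"])       # built separately, yet still "same"
print(compare_strs(s, t))          # same
print(compare_strs("cat", "dog"))  # different
```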
Now recall what we did in this C program. First, using CS50's get_char function, we prompted the user for a char, hopefully y or n for yes or no, respectively. Then we used a Boolean expression, actually a combination of two, with the two vertical bars asking whether the inputted character is capital Y or lowercase y. If so, we printed out that the user agreed. Otherwise, if they typed in anything else for that character, we simply printed out not agreed. Well, how can we go about implementing that same program in Python, for instance in a file called agree.py? Let me go ahead and open up my terminal window again and create a file called agree.py. Let me drag it over to the right so we can see these two things side by side. And let me go ahead and do this. I'm going to set a variable, say s, equal to the return value of input, quote unquote, do you agree, thereby asking the user the same question as before. There's no need to use the CS50 library, because the input function here suffices. And instead of using c, I'm deliberately using s, because it turns out there is no way in Python to get a single character per se. You can, however, get a string that has a single character. Indeed, char is not a data type in Python. Once we have this input from the user, let's implement a conditional using one or more Boolean expressions. Let's ask if s equals equals quote unquote capital Y or s equals equals lowercase y, and if so, print out, as before, quote unquote agreed. And notice what's different this time. I'm literally using the word or instead of the two vertical bars. In the spirit of Python, things tend to be a little more English-like, a little more readable, top to bottom, left to right. And indeed, or hits that nail on the head.
Otherwise, if it is not a capital Y or a lowercase y, let's go ahead and print out quote unquote not agreed. And that's it for converting this program from C into Python. But of course, this isn't the most robust version of the program. It would be nice if the user could type in something like yes, maybe capitalized in different ways. So how might we go about implementing that? Well, we could do this in a few ways. Let's get rid of my C version now and focus just on the Python. I could do something like this and just start or-ing together more possibilities, like or s equals equals quote unquote yes, or s equals equals quote unquote YES, very emphatically, and so forth. But you can imagine that this doesn't scale very well. If I want to consider all the possible permutations of the Caps Lock key being up or down, that's quite a few possibilities to enumerate. So perhaps we could do this a little bit differently. And in fact we can, by storing all of the possibilities in a so-called list. Whereas C had arrays, Python has what are called lists. Like the linked lists we explored in week five, they can dynamically grow and even shrink, although underneath the hood CPython actually implements them as resizable arrays. That growing and shrinking is what Python does for us. I can simply create a list of values from the get-go. Or, as we'll eventually see, I can add things to it and remove things from it, and all of the underlying memory gets managed for me. In fact, lists come with a whole bunch of features that make this possible. But for now, let's simply use them as statically initialized lists, with values I know from the get-go that I want. I'm going to go ahead and do this in VS Code. I'm going to delete most of this Boolean expression, the combination of all of those phrases.
And I'm going to simply say if s is in, using the Python keyword in, literally the following list of values: quote unquote y, quote unquote yes. For now, I'm going to use just those two. But let's see how it works. Let me open up my terminal window again and run python of agree.py. This is really the first time I'm running it, but let me claim that it would have worked even in the previous version. Enter. I'm going to type in lowercase y, and I've agreed. I'm going to run it again and type in lowercase n, and I've not agreed. I'm going to run it again and type in all caps YES, because I really agree. And yet I'm told I haven't agreed, because there's still a bug in this version. Even though I do have a list of values that I'm looking for up here in my Python implementation, Python is going to look literally for those values: lowercase y and lowercase yes. So how can I go about tolerating different capitalizations by the user? Well, I can do this in a few different ways.
For instance, after getting the user's input in a variable called s, I could update s to be s.lower(). That lowercases the word for me and then updates the value of s itself. And now I think this will work even for an uppercase version. Let me run python of agree.py, emphatically type in YES, and hit Enter. This time I've agreed, because I forced the user's input to lowercase and then compared it against the canonical forms I've written, which are all lowercase. I could have done the opposite: I could have forced the user's input to uppercase and then enumerated capital Y and capital YES in my Python list, between those square brackets. Either approach here is fine. Now, technically I don't need this additional line here. I can delete the line in which I lowercased s, because in Python I can chain some of these function calls together by tacking .lower() onto the call to input. That way, the return value of input ultimately gets forced to lowercase. Alternatively still, I could lowercase it at the very moment I'm comparing it: down here, I could do s.lower() and compare that lowercase version to y or yes. Now, what's this really all about? Well, this is actually an example of what's generally known as object-oriented programming, or OOP for short. In Python and a lot of other languages now, variables, and data types more generally, can have not only values associated with them, like y or yes, but also functionality built in. In other words, in C we would have used a function from the ctype library called toupper or tolower. We would have passed to that function, as an argument, the very character that we wanted to force to uppercase or to lowercase.
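The successive versions of agree.py described above can be compared side by side in a short sketch. The function names `agrees_v1`, `agrees_v2`, and `agrees_v3` are hypothetical. Each takes the user's answer as an argument instead of calling input:

```python
def agrees_v1(s):
    # First version: `or` replaces C's ||
    return s == "Y" or s == "y"


def agrees_v2(s):
    # Second version: membership test with `in` over a list (case-sensitive)
    return s in ["y", "yes"]


def agrees_v3(s):
    # Final version: lowercase first, then test membership
    return s.lower() in ["y", "yes"]


for answer in ["y", "Y", "YES", "no"]:
    print(answer, agrees_v1(answer), agrees_v2(answer), agrees_v3(answer))
```

In agree.py itself, the chained form would read `s = input("Do you agree? ").lower()`, after which the plain `s in ["y", "yes"]` test is enough.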
Well, in Python, and indeed in object-oriented programming languages in general, the developers behind the language recognized that some functionality is inherently related to the values in question. When we're dealing with strings, it's pretty reasonable to want to uppercase them, lowercase them, capitalize them, or do any number of other things. And so the lower function itself is in fact built into the string type in Python, as are a whole bunch of others. In fact, at this URL here, you can see the documentation for all of the string functions built into Python. More technically, you may access a function that is built into a data type via this dot notation, instead of calling some global function and passing an argument into it. When you do, you are using what are called methods. Methods are simply functions that are inside of objects, and in this case, the object in question is a string. So what's really happening in this example, when I check whether the user has agreed or not? I'm taking that value, the string s, which is technically now an object in memory. Inside of that object is not only the user's input but also some built-in functionality, otherwise known as methods. Those methods were written by the same people who invented the string data type itself. This is just the first of these examples, and we'll see yet others. But notice the syntax is actually quite similar to C. Just as in C you could go inside of a structure, in Python you can similarly go inside of an object and access not just its values but also these built-in methods. All right, how about another comparison of C to Python, again involving strings? Let me go ahead and reopen and clear my terminal and close out of agree.py. Let me open up a version of copying strings from a couple of weeks back, when we finally started solving that problem correctly by doing some proper memory management.
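To see the method idea in action, here is a minimal sketch using a few of the built-in `str` methods. Calling `str.upper(s)` through the type gives the same result as `s.upper()`, which is a bit like the procedural, function-style call you would make in C. Because Python strings are immutable, methods such as `capitalize` return a new string and leave the original untouched:

```python
s = "cat"
t = s.capitalize()    # a method, called on the string itself via dot notation

print(f"s: {s}")      # s: cat  (strings are immutable, so s is unchanged)
print(f"t: {t}")      # t: Cat

# The same method reached through the type, like calling a function on an argument.
print(str.upper(s) == s.upper())  # True

# capitalize() also lowercases the rest of the string.
print("cAT".capitalize())         # Cat
```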
So here in the staff version of copy5.c, we have not only a commented version of what we did a couple of weeks back, but also a reminder of what was involved in copying strings in C. Recall, for instance, that in this example we used CS50's get_string function to prompt the user for a string that they wanted to make a copy of. We then did some error checking to make sure that there was enough memory and that nothing went wrong. Recall that the right solution to this problem in C was not to use the assignment operator and assume that s can be copied into t. Rather, it was to use malloc to allocate enough memory for the copy, plus one more byte for the null character. Again, we made sure that all was well by checking the return value of that. Then we actually copied the characters from s, character by character by character, into the chunk of memory now known as t. Ultimately, recall, we used the built-in strcpy function, which does all of that looping for us. Then, when it came time to capitalize just the copy, we did a quick sanity check: is the length of t greater than zero? Otherwise there's nothing to capitalize. And if so, we used the ctype library's toupper function, passing as input that specific character, t bracket zero, and updating t bracket zero itself. So here's an example of procedural programming in contrast with object-oriented programming. Again, I'm passing the argument to be uppercased into the toupper function. I'm not simply going to that character and asking it, via some dot operator, to uppercase itself. Then, in the C version, I printed out the two strings and freed the memory that I myself had allocated, and that was it for this program. So it was a decent amount of work, recall, in C, just to copy a string. Well, as with so many things in Python, it's going to be so much easier. Let me go ahead and do this. Let me open my terminal window.
Let me create a file called copy.py. Let me move it over to the right-hand side so we can see them side by side, closing my terminal window. And let's do roughly the same. Let's create a variable called s and set it equal to, on the right-hand side, the return value of Python's own input function, because we don't really need CS50's get_string function, and ask the user for s. Then let's go ahead and create a second variable called t and set it equal to literally s.capitalize(), whose purpose in life, if we read Python's documentation for string methods, is to uppercase the first letter of the word that the user has presumably just typed in. Then I'm going to go ahead and print out, as before, the user's input. I can do this in a couple of different ways, but I'm going to use one of our format strings and say "s:" and then interpolate that variable s by using my curly braces to say, put the value of s here. Then I'm going to go ahead and print out t by saying "t:" and interpolating its value inside of the quotes, close parenthesis. So let's see if this works. Let me go ahead now and run python of copy.py. I'm going to type in, say, cat in all lowercase and hit Enter. And now notice s remains in all lowercase, but the copy alone has indeed been capitalized. All right. Well, let's take a look at one other example involving strings in C and Python. Let me go ahead and remind us that a few weeks back, too, we created this uppercase program, whose purpose in life was to prompt the user, using get_string, for a string, labeled as the before string. Then it prints out "after," because the purpose in life of this program was to uppercase all of the characters in the string, not just capitalize the first one. So, as you might expect, we used a loop a few weeks back, and we iterated from zero on up to the length of the string, using ++ to increment i in each iteration, and each time we went ahead and printed out one character at a time.
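A sketch of what copy.py might look like, assuming a hard-coded "cat" in place of input() so it runs non-interactively. Because Python strings are immutable, capitalize() hands back a new string, so there is no memory to allocate or free by hand.

```python
s = "cat"  # stand-in for s = input("s: ")

# No malloc, strcpy, or free: capitalize() returns a brand-new string
# and leaves s itself unchanged
t = s.capitalize()

print(f"s: {s}")
print(f"t: {t}")
```

Note that capitalize() also lowercases the remaining letters, so "cAT" becomes "Cat".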
So, strictly speaking, we didn't change the string from lowercase to uppercase. We just changed each letter to uppercase and printed it out right away. Well, how might we do something similar in Python? Here, too, we have a couple of different approaches. Let me go ahead and open up my terminal now and run code uppercase.py. Close my terminal window, and let's drag this to the right so we can see them side by side. And let's do roughly the same. Let me create a variable this time called before, set that equal to the return value of input, and just prompt the user for that before string. Then, after that, let's go ahead and print out preemptively "After:" followed by a couple of spaces, just to align everything nicely. But let me not print a new line yet, because I want to see the following string on that same line. And then let's go ahead and do this analogously to the C version first, but then tighten things up. Here's how we can iterate in Python over every character in a string. I don't need to bother with i and indexing into the string or anything like that. Using a Python for loop, I can simply say: for each character c in that string called before, go ahead and print out the uppercase version of that character, but don't yet print out a new line. And at the very end of this loop, go ahead and print out nothing but a new line. Let me go ahead and open my terminal, run python of uppercase.py, hit Enter, and type in cat in all lowercase. Cross my fingers. And after, each and every one of the characters is uppercased. And what's nice about this, if nothing else, is that this for loop in Python there on line three is pretty elegant, whereby you implicitly get access to each character in the string, because that's how Python knows how to iterate over a string object. But it turns out we don't have to do this quite as analogously in Python as we did in C.
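The character-by-character version might look roughly like this. The lecture doesn't name how the newline is suppressed; this sketch assumes print's end parameter, which is the standard way to do it, and hard-codes the input.

```python
before = "cat"  # stand-in for before = input("Before: ")

# end="" suppresses print's trailing newline so everything stays on one line
print("After:  ", end="")
for c in before:          # iterating over a str yields one character at a time
    print(c.upper(), end="")
print()                   # finally, print nothing but a newline
```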
We don't have to do it character by character, insofar as Python is object-oriented: these strings are objects, those objects have methods, and those methods can operate on the entire string at once, unlike the more pedantic work we had to do character by character in C. So in fact, let me go ahead and close the C version here, clear my terminal and hide it, and let's make this much simpler. Let's get rid of the for loop altogether, and let's get rid of that print statement altogether, leaving only the before variable and getting the user's input. And now let's create an after variable and set it equal to before.upper(), thereby uppercasing the entire string called before and assigning the return value to the after variable. And then let's go ahead and print, using our old friend the format string, "After:", a space, and then the interpolated value of after. So now we're down to just three lines at that. Let me go ahead and reopen my terminal, run python of uppercase.py, hit Enter, and type in cat in all lowercase. And voila. Now I have uppercased the whole cat all at once. All right. Before we take a break for some Fruit by the Foot, let's go ahead and take a look at Python's implementation of loops further. So in Scratch, recall that we implemented a loop with something like this: if I wanted to meow three times on the screen, I would literally use a repeat block. In C, it was a little clunkier to mimic that same idea. We could declare a variable called i and set it equal to zero. Then we could ask a Boolean expression: is i less than three? If so, print meow and then increment i using our old ++ friend, which in Python is now gone. In Python, we can do this almost the same, except I don't think we need the data type. I don't think we need the semicolon. We don't need the parentheses. The while keyword still exists, but we don't need the curly braces. And we can't use the ++. We don't need the f in printf.
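The tightened version, sketched under the same assumption of a hard-coded input: one method call replaces the whole loop.

```python
before = "cat"  # stand-in for before = input("Before: ")

# upper() returns a new string with every character uppercased
after = before.upper()
print(f"After:  {after}")
```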
I mean, we're mostly just trimming clutter from this implementation. So this is the C version, and this now is the Python version: a little tighter, a little easier to read. It's pretty much the minimal syntax available to get the job done. So how can we actually have a cat meow in this case? Well, let me go into VS Code, and I'll stop doing everything side by side and just stipulate that we've done most of these examples previously in C. In my first cat, I could certainly do it the easy way. Let me go ahead and create cat.py, and like we always started in the past, I could just print meow and then use our old friend copy-paste. This, of course, was bad for bunches of reasons, but it gets the job done. In Python, if I want to do this better, I can just borrow that same inspiration from C: set i equal to zero, then do while i < 3, colon, then go ahead and print out meow, and then go ahead and do i += 1, which is maybe the most succinct way to express that same idea. All right, just to confirm that this works: python of cat.py, Enter. Meow, meow, meow. All right. So how else can we do this, and how can we do this more Pythonically? This is perfectly correct, and many people might implement it this way, but it's not quite as succinct as we could alternatively be in Python. Yeah? >> Yeah. So we could maybe use a for loop. And in fact, let's go there, because we don't quite have the same types of for loops in Python as we did in C. While loops are essentially the same, but for loops are actually a little bit different and actually a little bit better. So let me go into my code here, delete all four of these lines, and literally just say for i in this list of values, [0, 1, 2], colon, print meow. In other words, in for loops in Python, you don't have the parentheses, you don't have the two semicolons, you don't have the initialization and the Boolean expression and the update.
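Both meowing loops described above, side by side in one sketch. Python has no ++ operator, so the while version increments with += 1.

```python
# Version 1: a while loop, translated almost line for line from C
i = 0
while i < 3:
    print("meow")
    i += 1  # Python has no ++ operator

# Version 2: a for loop over an explicit list of values
for i in [0, 1, 2]:
    print("meow")
```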
You just say, a little more English-like, for each i in the following list, or for each value of i in the following list. And what Python will do for us is automatically, on the first iteration, set i equal to zero; on the second iteration, set i to one; on the third iteration, set i to two; and then there are only three things in the list, so that's it. And so, just as before with the y and yes example, where I used square brackets similar to arrays in C, I was using a Python list of strings in that case. Here I'm using a Python list of integers: 0, 1, and 2. And they're integers in the sense that they have no quotes around them, so they're obviously not strings. And I'm printing out meow this many times. And indeed, if I do python of cat.py again, I get meow, meow, meow. This is correct. This is arguably better, at least in the sense that it's two lines of code instead of four, and it's arguably more readable as well. But what do you not like about this, perhaps, even if you're only seeing it for the first time? >> Yeah, it's going to be a lot more difficult to do things more than three times, because recall, in Scratch at least, and in C, we had the ability to express ourselves literally; in C, we could just change that three to any number we want. 30, 300, no big deal. It's a super simple change, even though it was kind of annoying to type all of that out. Well, in Python, yeah, I could do this and say for i in [0, 1, 2] just to mimic the numbers that we'd be setting i equal to in the C version. Frankly, this can be any list. It could be 1, 2, 3, or 4, 5, 6, or cat, dog, bird, or any three things whatsoever. But I'm just using 0, 1, and 2 for consistency with the way C would have done it. But slightly better than this is to use one of those other data types that was briefly on the screen earlier. We have not just floats and ints and strs and lists and tuples. We also have what are called ranges.
And range is not only a data type in Python but, more literally, a function that you can call to get a range of values from zero on up. So I can change this list of three values to a call to a function called range, pass in how many things I want, and by default, per the documentation, I'll get back the numbers 0, 1, and 2. And nicely, Python's pretty smart about this. It technically doesn't hand you back all of the numbers at once, whether it's three or 30 or 300 or 3 million. It hands them back to you one at a time, so you're not using more memory just because you're doing more iterations. So now if I want to iterate four times, five times, 30 times, 300 times, I can again just change the single value. And if you want to be fancy, too, you can skip numbers: you can count through only the odd numbers or only the even numbers by changing the increment, or step. But the default and most canonical usage is indeed just to count up by one like that. So if I go back to VS Code here and improve this, I can change that hard-coded list to just range(3), clear my terminal, run this cat one more time, and now I'm back in business as well. In fact, this is so common, let me throw up one alternative to this. You'll notice that in the previous example, both in VS Code and on the screen, I am not actually using i in any way. In fact, if you look back at how we converted the Scratch to Python code, I'm using i only because when you use a for loop in Python, you have to give it a variable for each value in some list or range of values. That's just the way it is. But I'm technically not using or printing i anywhere. And that's fine. And so it's arguably Pythonic, too: if you have a variable out of necessity but you're not actually going to use it for anything, just call it an underscore instead. Even though this is weird-looking, an underscore is a valid name for a variable in Python.
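A sketch of the range variations just described: the default count from zero, a step argument to skip numbers, and the underscore convention for an unused loop variable. The particular bounds (up to 10, step 2) are arbitrary choices for illustration.

```python
# range(3) yields 0, 1, 2 one at a time, rather than building a list up front
for i in range(3):
    print("meow")

# range(start, stop, step): skip numbers by changing the step;
# stop is exclusive, so 10 itself is never produced
odds = list(range(1, 10, 2))   # [1, 3, 5, 7, 9]
evens = list(range(0, 10, 2))  # [0, 2, 4, 6, 8]

# Pythonic convention: _ signals "I need a loop variable but won't use it"
for _ in range(3):
    print("meow")
```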
So it is Pythonic to use this just to signal to yourself later, and to colleagues, that yes, I'm using a variable because I have to, but it's not one I'm actually going to use elsewhere. It's a minor subtlety and not strictly necessary, but it's commonly done. All right, how about a couple of final versions of cats, then? Recall that if we wanted to do something forever in Scratch, we had a forever block, which literally did that. Well, in C, we couldn't quite translate that literally, so the closest approximation was probably this, while (true), whereby you have a Boolean expression that by definition is always true. So the loop is never going to stop; it's thereby infinite, which is what you'd want if you wanted to print out meow, meow, meow on the screen ad nauseam. In Python, you can do it almost the same, but the curly braces are about to go, as are the f in printf, the backslash n, the semicolon, and the parentheses. And for whatever reason, in C we lowercase true and false, whereas in Python we capitalize True and False. So it's a minor subtlety, but it's now indeed a capital T, the indentation has to be consistent, and the colon has to be there as well. So with that, we can of course induce, intentionally or otherwise, some infinite loops. As with C, you can break out of them if need be with Control-C to interrupt the process. But let's see, lastly, with this cat, how we can make it a little more abstract, like the final versions of our cat in Scratch and C. So let me propose to open up here a prior version of cat that we wrote in the past. It was version 12 at the time, which looked a little something like this. This was one of the final versions of our cat in C, which simply allowed me in main to call a meow function that took an argument, namely the number of times I wanted to meow. This in C is how we implemented that helper function, so to speak, which returned nothing, so its return type was void, but it did take an integer called n as its input.
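In practice, a while True loop in code usually ends via a break statement rather than Ctrl-C. This sketch goes slightly beyond what the lecture shows: it uses a hypothetical counter to stop after three meows so that it terminates.

```python
count = 0  # hypothetical counter, only here so the sketch terminates
while True:  # capital T: Python's Boolean literals are True and False
    print("meow")
    count += 1
    if count == 3:
        break  # exit the otherwise-infinite loop
```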
And then there was a for loop inside of there that printed meow that many times. So long story short, this was how, both in Scratch and in C, we invented our own functions. Well, how can we do this now in Python? Let me bring this version of cat over to the right here and delete that previous version. And let me propose that we do this: for i in range(3), and let's go ahead and assume for the moment that there is a meow function whose purpose in life is to just meow on the screen. Well, that of course does not exist. So in Python, I'm going to use a trick that allows me to define my own function. The keyword for this is literally def, for define, then the name of the function, and then parentheses, empty if it takes no arguments. You don't need the void keyword even if it takes no inputs. So let's do a simpler version of the cat first that takes no arguments, and then we'll add back that argument. How does a cat meow? It literally just says meow on the screen. So already this seems to be an improvement: I've got like four lines of actual code here versus like 20 or so on the left-hand side. Let's go ahead and run python of cat.py. Enter. And we see the first of our errors, which is remarkable, because usually I would have messed up by now. So here we have in Python the equivalent of, like, a compiler error message. The program has not run successfully. It's tried to run, it's tried to be interpreted, but it encountered some error. These are generally called tracebacks, in the sense that you see a trace back in time of everything the program was trying to do just before it failed. So if you've called a function, which called a function, which called a function, you'd see all of those function calls on the screen. I've just tried to call one function, so it's a relatively short error. This is clearly a problem, and here's the type of problem: NameError. The name meow is not defined.
So intuitively, even if you're seeing Python for the first time, why is meow not defined even though it's literally defined right there? Yeah? >> Yeah. As smart as Python is, it's still kind of naive, in that meow doesn't exist until line four. So if you try to use it on line two, that's too soon. All right. So in C, we first fixed this problem by just kind of hacking things together: all right, let's just define the function up here and move the code that uses it down there. And that's totally reasonable. And in fact, if I clear my terminal and rerun python of cat.py, we're back in business. But I'd argue you can only do that so many times, especially once you've got a bunch of functions. You don't want to relegate the main part of your program, which really this loop is, to the very bottom of the file, if only because that's the first thing you care about; I want to see it at the top of the screen. And that's the whole point of putting main at the very top. So what was the solution in C? The solution in C was to put the prototype for the function at the top of the file. That, though, is not a thing in Python. You don't just copy that first line of code, put it at the top of the file, add a semicolon, and then it works. Instead, the Pythonic way to solve this problem, for better or for worse, is to actually put your code in a main function. Main in Python has no special significance in this sense. It's just a convention to borrow the name that so many other languages use for their main function. You just wrap your code in a function called main, so that you're defining main, then defining meow, before you're actually using the meow function per se. But I have made a mistake. If I run python of cat.py now, crossing my fingers for good measure, the program does nothing. Why is that? Yeah, why is that? >> Oh, sorry. Go ahead. >> Yeah, curiously, I never called the main function.
So whereas in C and in Java and C++ and a bunch of other languages, main is special (main is, by definition, the function that is automatically called), Python has no such special magic. It's not going to call main for you just because you created it. In fact, I didn't even have to name that function main. It's just a convention. But the solution is exactly that: if the problem is that main wasn't called, what I can do at the bottom of this file is literally call main, which we would never have done in C, but which is conventional to do in Python. So after you've defined main up here and then defined meow down here, you can call main, which in turn will call meow, and at that point in the story both of those functions exist. So if I go down here and run python of cat.py again, now I see my meow, meow, meow. Now let me add one final flourish, because the C version of this code, recall, actually let me specify how many times I want to meow, whereas here I have my for loop in main and am calling meow that many times. Well, what if I want to get rid of this loop over here, de-indent the call to meow, and pass in literally the number three? Well, in Python, you can just say inside of the definition of a function that it takes an argument like n. You don't have to specify the data type; Python's smart enough to figure it out. Then in your function, you can use that, as with for i in range(n), go ahead and print meow. So now the right-hand version of this program is pretty much equivalent to the left-hand version, as always using fewer lines of code. Let me go ahead and run python of cat.py. Meow, meow, meow. We're good.
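Putting the pieces together, the final cat.py might look something like this sketch (the exact lecture file isn't reproduced here). main is defined first for readability, and the call to main at the bottom runs only after both functions exist.

```python
def main():
    meow(3)


def meow(n):
    # n has no declared type; Python works out types at run time
    for _ in range(n):
        print("meow")


# Python does not call main automatically, so call it explicitly.
# By the time this line runs, both main and meow have been defined.
main()
```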
And then let me make one final change, if only because most documentation and website tutorials you see online about Python will have you not just literally call main at the bottom; you'll see this crazy syntax that solves a problem we won't trip over in this class. Typically, it's Pythonic to call main only after asking the question if __name__ == "__main__". This is a stupid mouthful of code that even I had to think about when I was typing it out, to check that I got all the underscores correct. But long story short, this convention of using a conditional before you call main allows you to write more modular code in Python, so that some of your files don't actually do anything other than define, define, define functions that you can then import into other files you write. So in short, this is the right way to do it, even though in CS50 it's unlikely that we'll trip over this bug. Questions now on that last piece of how we define functions in Python? Yeah? >> Ah, good question and good eye. Why do I have two blank lines between my functions in Python? As you will see via style50, it is Pythonic, that is, Python convention, to separate functions in your code by two lines, whereas there is no such convention in C. So I'm trying to be consistent with what the world does. Yeah? >> If you want to count backwards in a loop, can you do that? Absolutely. You could use the range function in a different way: start with a much larger value and count down. You could alternatively do that with a while loop. I would say that, yeah, you can make that work, but you shouldn't; people just don't do that unless it does actually solve a problem for you. Other questions on this? All right. Well, when we looked at C, recall there were a bunch of things that ultimately we couldn't do well. We ran into issues like floating-point imprecision and integer overflow and truncation and all of these other problems.
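A minimal sketch combining the two ideas from this exchange: the if __name__ == "__main__": guard, and counting backwards by passing a negative step to range. The countdown from 3 is an arbitrary example.

```python
def main():
    # range(start, stop, step) with a negative step counts backwards:
    # 3, 2, 1 (stop is exclusive, so 0 is never printed)
    for i in range(3, 0, -1):
        print(i)


# __name__ is "__main__" only when this file is run directly,
# not when another file imports it
if __name__ == "__main__":
    main()
```

If another file does import this one, main is defined but not run, which is what lets a file act as a library of functions.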
There are still going to be some of those, but first let's take a Fruit by the Foot break, and we'll be back in 10. Help yourself to seconds today. All right, so we're back, and let's use our remaining time together to focus not only on some of the problems that Python can solve more readily than C, but also on some of the problems that remain. So here was a program early on in our discussion of C that had this weird bug, whereby when we implemented a relatively simple calculator to divide two numbers, x / y, we experienced what we called truncation at the time, whereby 1 / 3 was curiously 0, and something like 4 / 3 was curiously 1, and we were losing everything after the decimal point. And this was true even if we tried storing the result in a float, because with truncation, recall, everything after the decimal point in integer math is simply discarded. So if you divide an int by an int, you're going to lose what is after the decimal point. So let's take a look in Python at whether this is still actually a problem. Let me go back into VS Code here. We'll close out the C version thereof, and let's go ahead and create our own program called calculator.py. In this version, let's modify the original, which just did some addition, and instead have it do some division. I'll get rid of my outdated comments and now perform division instead of addition by doing x / y. Python of calculator.py: let's try 1, and let's try 3. And oh, our fractions are actually back. So it turns out in Python, even when you're manipulating integers, if you divide one by the other and the result logically should be a floating-point value, that's what you're in fact going to get back. And you don't have to jump through the same hoops that we did before to actually force things to floats and then do floating-point arithmetic and so forth. In fact, if you want the old behavior, it's still actually there.
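A quick sketch of the difference between the two division operators, using the same inputs as the lecture. The values in the comments are what Python 3 prints.

```python
x = 1
y = 3

# In Python 3, / is true division: int / int yields a float
print(x / y)    # 0.3333333333333333

# // is floor division, giving C-like whole-number results for positive ints
print(x // y)   # 0
print(4 // 3)   # 1
```

One caveat: // floors rather than truncates, so for negative operands it differs from C. In Python, -4 // 3 is -2, whereas integer division of -4 by 3 in C gives -1.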
And you can use two slashes in Python to get the old integer division, as opposed to what we're seeing here. But a typical programmer nowadays, I dare say, would want it to behave in exactly this way. So truncation seems to be less of an issue for us. All right. Well, what other problems did we encounter at the time? Recall we had issues of floating-point imprecision, whereby even when we divided something simple like 1 by 3, which in grade school we learned was 0.333, repeating infinitely many times, we started seeing weird digits that were not 3 at the end of that value back in the day in C. Unfortunately, that's a problem that's still with us. In fact, if I use this same program here, let me go into VS Code, and instead of printing out just x / y, let's go ahead and do this temporarily. Let me give myself a variable called z and set it equal to x / y, only because it'll be a little easier to see the formatting trick I'm going to use. Let's go ahead and print out a format string that prints out z. And for the moment, let me just claim that this is going to do the exact same thing; it's completely gratuitous that I'm using an f-string now as opposed to just printing out z. If I do 1 / 3, we're still seeing 0.333..., but we're only seeing 16 or so digits here. What if we want to see, like, 50 digits and really start poking around at what's being represented? Well, the syntax is a little weird, but in Python, using an f-string, you can do tricks similar to what we did with %f in printf in C. After my variable's name in this set of curly braces, I add a colon and then a dot, because I want to see numbers after the decimal point, then something arbitrary like 50, to show me 50 digits after the decimal point, and then an f to treat this as a float.
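The format specification described above, as a small sketch. With 50 digits requested, only the first 16 or so come out as 3s; after that, the digits of the stored binary approximation show through.

```python
x = 1
y = 3
z = x / y

print(f"{z}")       # default formatting: 0.3333333333333333

# Format spec :.50f means fixed-point notation with 50 digits after the point
print(f"{z:.50f}")  # later digits are not 3: z is not exactly one third
```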
This is a crazy incantation of a format string, I do think; even I am sort of cheating off the paper in front of me. But this is how you format values if you want to see them with a little more precision, or so I think. If I rerun python of calculator.py and do 1 divided by 3: darn it, we're still in the same mess that we were before. Now, why is this? Well, it's still the case that I'm running the code on the same kinds of computers that I did before. It's still the case that these computers only have a finite amount of memory. And so even though I'm clearly manipulating floating-point values, Python is only allocating, say, 64 bits to those float variables, and so there's only so much precision that's possible. And so what we're seeing is essentially the closest representation to an infinite number of threes that we can represent in binary using a floating-point representation. So it's still a problem. But I do think in Python you'll find that there are so many more libraries out there, third-party software that comes not just with the language itself but from others, whereby you can use libraries for more precise scientific computing that essentially implement their own versions of floating-point values, so that you can use not 64 but 128 or more bits when a certain level of precision really matters. Thankfully, though, one problem is at least solved for us, namely integer overflow. Recall that this was another problem we ran into, whereby if you try counting higher than, say, 4 billion, or even higher than 2 billion if you're also representing negative numbers, which halves the range available to you for positive values, we ran into the situation where the count somehow wrapped around, became negative, and even ended up being zero as a result. Well, Python wonderfully nowadays just gives you more and more bits as needed if your integers are getting larger and larger.
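To see Python's arbitrary-precision integers in action, here is a sketch that steps past the largest value a signed 32-bit int can hold in C and keeps going.

```python
# A signed 32-bit int in C tops out at 2**31 - 1, i.e. 2,147,483,647
big = 2**31 - 1
print(big + 1)   # 2147483648: no wraparound to a negative number

# Python simply allocates more bits as the value grows
print(2**100)    # 1267650600228229401496703205376
print((2**100).bit_length())  # 101
```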
So this is a wonderful feature, in that we've at least addressed one fundamental limitation we ran into in C, and this time the language itself provides us a solution. Python also has some pretty handy features. One of them is what are called exceptions. An exception in Python is a way of handling error conditions without relying on return values alone. Recall that in C, if you ever wanted to signify that something went wrong, you had to return something like, most recently, NULL, N-U-L-L, which was a special sentinel value. Technically it's just the zero address, and by checking for that, you can make sure that you're getting back a valid pointer or not. And in other functions, if something went wrong, you might similarly have to check the return value, maybe checking for 0 or -1 or 1 or something like that. But return values were the only way in C that functions could communicate back to the programmer that something went wrong. And this is problematic, because if you imagine implementing a function that's supposed to return an integer, whether positive, negative, or zero, it's kind of unfortunate if you have to steal one of those values and say, uh-uh, you can't use this value. It's fine in the world of pointers, because the world decided years ago we're never going to use the actual address 0x0, the zero address. But that's still technically costing us one or more bytes of space. In general, it's a bit annoying if your function can't truly return all possible values. Think about a function like get_string. If something went wrong in get_string, what do you want to return? Well, as we saw in CS50's C library, we do in fact return NULL, once we introduced that. But in general, wouldn't it be nice if functions could somehow signal out of band, so to speak, that something went wrong? By that I mean this: let's go into a new program that's inspired by one of our programs today.
In VS Code, I'm going to go ahead and close my calculator, open my terminal window, and create a new program called integer.py. So in integer.py, let's just play around with some integers and see what we can break. Here I'll define a variable called n and set it equal to the return value of the input function, which comes with Python, just asking the human for some input. Then I'm going to go ahead and ask a question: is the user's input numeric? It turns out, if you read the documentation for strings in Python, they come not just with upper and lower functions, aka methods, but also with an isnumeric method that tells you whether or not the string itself happens to be numeric, that is, looks like a number. All right. So if I do that, I could then do something like this: if n is numeric, I'm going to go ahead and claim that in fact it is an integer. Else, if it's not numeric, I'm going to claim that it's not an integer. I have no idea what it is. Maybe it's cat, maybe it's dog, maybe it's a mix of numbers and letters, but it's definitely not an integer as defined by a sequence of decimal digits in this case. All right, so let's try this out. Python of integer.py, Enter. We'll type in 1: that's an integer. We'll type in 2: that's an integer. We'll type in 0: that's an integer. Type in cat: not an integer. So that seems to in fact work. But what if I wanted to immediately convert this to an int, as we did in the past? Let me modify this a little bit and say instead that n equals not just the return value of input, asking the user more generally for input, but that value converted to an int by passing it to int. And actually, we can go ahead and say integer here. All right. Now I'm going to go ahead and just print out the claim that, yep, this is an integer, because if we get to line two, clearly we've handled the user's input correctly.
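A sketch of the isnumeric check applied to several sample strings, including a few the lecture doesn't try. Looping over test strings is an illustration choice, not the lecture's integer.py.

```python
# isnumeric() is a str method: True only if the string is non-empty
# and every character in it is numeric
for text in ["1", "2", "0", "cat", "-1", "3.5", ""]:
    if text.isnumeric():
        print(f"{text!r}: integer")
    else:
        print(f"{text!r}: not integer")
```

Notice that "-1" fails the check, since the minus sign isn't a numeric character.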
In other words, how can I get away from constantly checking the return values of functions to make sure they are what I expect? All right. Well, let's go ahead and run python of integer.py now. Enter. Type in 1: it tells me it's an integer. Type in 2: it tells me it's an integer. 0 tells me it's an integer. Type in cat, and notice this time what goes wrong. Last time we saw this kind of traceback error message, it was a NameError, because I was using the meow function's name too early. Now I'm getting a ValueError, which is a different type of error, one that relates to invalid literal for int() with base 10: 'cat'. Now, that's a mouthful. So unfortunately, Python's error messages aren't all that much better than Clang's error messages. But clearly the interpreter does not like the fact that I'm passing something to int, related to base 10, that's quote-unquote cat. And really, the best you can do with this kind of error is realize, okay, it's clearly the case that cat is not an integer, so it's having trouble converting cat to an integer; it makes no logical sense. All right. So what's the gist of the problem? Well, I'm just blindly converting the user's input to an integer, even if it's not an integer. Well, all right, I could rewind to the previous version of my program, use the isnumeric method, and then conditionally convert it, but I'm trying to move away from constantly checking return values for errors. And wouldn't it be nice if I could somehow catch this ValueError and just deal with it if it happens? And in fact, you can, with Python exceptions, which exist in other languages as well, Java among them. You have the ability to sort of listen for errors happening inside of functions without having to rely on return values alone.
So let me go back to VS Code here, clear my terminal just to simplify things a bit, and literally say to the interpreter: please try to execute the following two lines of code, except if something goes wrong, like a ValueError, in which case go ahead and print out something like "Not integer." Wouldn't it be nice if you could just wrap all of the code you've written in CS50 thus far with try and ask the computer politely to please try to execute this code? But that really is the semantics behind it: try to execute these lines of code, except if there's an error, then do this other thing instead. Therefore, you don't have to check any return values. You can just blindly pass the output of the input function as the input to the int function, knowing that if something goes wrong inside of there, Python is going to execute the code in the except clause instead. So let me run python integer.py now. I'll type in 1, and that works because Python tries to execute line two and succeeds, then tries to execute line three and succeeds. So lines four and five never actually kick in. But if I try again here with cat, line two is going to fail. Line three is never going to be reached, because Python is immediately going to jump to this exception handler, so to speak, thereby catching the error, or the exception, and printing "Not integer" instead. So it's a little bit of a weird convention, and it's different from what C offers, but a lot of newer languages nowadays do offer this, because it's a better way of writing code that you know should work 99% of the time. If something does go wrong, like running out of memory or the human typing in something wrong, you can handle all of those exceptional cases, exceptional in a bad sense, using this except keyword. Questions on this technique? Yeah. >> A really good question. In this case, I used a ValueError.
Do I need to define every possible thing that can go wrong? Short answer, yes. Now, there aren't terribly many. There are some standard ones, and they're all capitalized in this way, with a capital letter at the start of each word, ending in Error, like ValueError. You can even invent your own. And it's good practice to enumerate the kinds of things that you think can go wrong. ValueError is pretty generic, but there could be memory-related errors. There could be file-not-found-related errors. There's a bunch of different exceptions, all documented in Python, that you can listen for. That said, as nice as Python's documentation is overall, it is not good at documenting, for specific functions, what exceptions they can raise. And I've never understood why, after all of these years, no human has gone into the documentation and painstakingly enumerated all of the possible things that can go wrong. What's too often the case in the real world, with some of my own code included, is that if you encounter an exception that you didn't think was going to happen, you go in, improve your code, and add to your list of except clauses. It shouldn't be that way, and different libraries are better about documenting these things. All right. With that in mind, let me propose that in the CS50 library for Python, get_int and get_float work just like they do in the C library, whereby if you type cat or dog or bird into those functions, they just reprompt you. And long story short, this is the kind of code we wrote in Python: try to get input from the user, except if something goes wrong, prompt them again, and again. So we too were using precisely these features, even though they weren't available to us in C. All right. But something else that we did in C was play around with Mario in a few different forms.
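Here is a rough sketch of both ideas from this discussion: catching ValueError instead of checking return values, and a reprompting loop in the spirit of the CS50 library's get_int. This is an illustration under assumptions, not the library's actual source. The names parse_int and get_int here, and the read parameter, are hypothetical; read defaults to the built-in input and exists only so the loop can be exercised without a keyboard.

```python
def parse_int(text):
    """Return text converted to an int, or None if it isn't a valid integer."""
    try:
        return int(text)
    except ValueError:
        # e.g. int("cat") raises ValueError: invalid literal for int() with base 10
        return None


def get_int(prompt, read=input):
    """Keep prompting until the user types something int() accepts.

    A sketch of the idea behind CS50's get_int, not its real implementation.
    """
    while True:
        try:
            return int(read(prompt))
        except ValueError:
            pass  # not an integer, so loop around and prompt again
```

Unlike isnumeric, int() accepts a leading minus sign and surrounding whitespace, so parse_int("-5") gives -5 rather than None.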
Recall that in lecture a few weeks back, we experimented with some ASCII art, some very simple text, to print out something like this pyramid of height 3. Well, how can we go about printing something like this? Let me go back to VS Code here, close out my integer examples, and code up a new version of Mario in mario.py. This one's kind of simple. I can say something like for i in range(3), go ahead and print out quote unquote a hash. Down in my terminal window, python mario.py, and I've got really the closest analog to three bricks stacked on top of each other. But in C, eventually, our implementation of Mario started to get a little fancy: we started to prompt the user for the height of the wall, and therefore we could have not just three but maybe four or even more bricks printed. So let me open up that version from a few weeks back. In week one, we had a version of Mario that looked like this: after including some header files, we declared in main a variable called n. Then we saw a new construct at the time, a do-while loop that just keeps calling get_int again and again so long as n is not 1 or greater, equivalently so long as n is less than 1, prompting the user again and again. The reason for declaring n up here, recall, was scope: it's therefore accessible lower in the function, as opposed to being confined to those curly braces. And then down here we used a for loop to actually print out that many hashes. So in short, the do-while loop solved the problem in C whereby you want to get user input at least once and maybe again and again if they don't cooperate the first time. That's where do-while loops really shine: do something at least once and maybe again and again. Otherwise, it's a little more annoying to do it with while loops or for loops. Unfortunately, Python does not offer a do-while loop.
And so here, too, we have an opportunity to introduce you to what the world would call Pythonic. What is Python's solution here? On the right-hand side, in mario.py, let's change this a little bit and write while True, with a capital T. Inside the loop, use a variable n and set it equal to int(input("Height: ")), asking the human for the height of the wall. I'm just going to cross my fingers that they're not going to type in cat or dog or something else that's not an int. Then I'm going to say if n is greater than zero, that is, a positive number, that's useful. We can proceed, so I'm going to break out of this loop. Then, lower in the file, I'm going to say for i in range(n), go ahead and print out the hashes. So we still have that same lesson as before: the Python version seems to be shorter, more concise, even if you ignore the comments in the C version on the left-hand side. And I've completely avoided using a do-while loop. But there are a few things that are different nonetheless that feel like, versus C, they shouldn't even work. What's weird about this solution, even though I think it's actually correct? Yeah. >> I have two. >> Okay, so it's not correct. That's one of the first things to point out: too many prepositions. This was supposed to say for i in range. Okay, so now that this program's correct, what looks weird to you and probably could break it? Yeah. >> Yeah. So the n variable seems to be scoped to the while loop, at least insofar as it's indented inside the while loop, which feels analogous to being inside of curly braces in C. So it seems weird that I'm presuming to use n on line six even though it was only defined on line two. It turns out this is possible in Python. The issue of scope that we encountered in C is not enforced in the same way, we'll say for today, such that when you define n up here, you can actually use it down here.
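A minimal sketch of the while True and break pattern for prompting for a height, assuming, as in the lecture, that the user types something int() can convert. The function name get_height and its read parameter are hypothetical; read defaults to input and is there only so the loop can be tested without a keyboard.

```python
def get_height(read=input):
    """Prompt until the user gives a positive height, do-while style."""
    while True:
        n = int(read("Height: "))  # assumes the input converts cleanly
        if n > 0:
            break  # got a positive integer, so stop asking
    # n is still usable here, after the loop, even though it was
    # assigned inside the loop's indented block
    return n
```

In mario.py itself, for i in range(get_height()): print("#") would then print that many bricks.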
And you can think of this as being somewhat reasonable: since there's no more specification of a variable's data type and no more semicolon, it would look kind of silly if you just wrote a bare n on a line of its own just so it kind of exists. There's no way to express the idea of creating this variable in advance without actually assigning it a value, whereas in C we could do that. So this is in fact okay and correct. What else is going on here? Instead of a do-while loop, we're kind of just implementing the idea of one. I'm deliberately inducing an infinite loop, like do the following forever, but as soon as I have the answer I want, like a positive integer from the human, I break out of this loop. And this is indeed the Pythonic way to get user input, because this will ask the user for a height at least once and maybe more times. So there are no do-while loops, only while loops and for loops, and only while loops are really the same as in C; even for loops, as we've seen, are a bit different. All right. Well, how about, instead of just that Mario example, recall this one, where we wanted to print four question marks in the sky side by side. We can do this in a few different ways. Let me go back to VS Code, close the C version, and completely change mario.py to implement this. Now, I want four question marks in the sky. So I think I can do something like for i in range(4), go ahead and just print out quote unquote question mark. Do you like this? python mario.py. Should I run it? No. Why? This is how I did it in C. Yeah. >> Yeah. I've got to change the end value, the named parameter for the print function, because otherwise, when I hit Enter, they're all on different lines, which is not the effect I want when all four question marks are meant to be side by side. All right. Well, that's an easy fix. I can pass the named parameter called end into the print function.
Set it equal to quote unquote, with double quotes or with single quotes, that is, an empty string. As always, stylistically, I would be consistent, so I'm going to use double quotes even though the documentation consistently uses single quotes. Now I'm going to rerun python mario.py. And I'm so close. Now they're on the same line, but the stupid cursor didn't move to the next line. That's fine. How to fix this? Logically, I can just put a blank print statement below. Even though I'm not passing in any arguments, you get a new line for free when calling print, so I am getting the aesthetic effect that I want. That is a perfectly reasonable way to do it. Now, if you feel yourself becoming a bit of a geek in learning about Python and previously C, you can solve this problem even more Pythonically by saying print("?" * 4), using multiplication, similar in spirit to the plus operator for concatenation, to repeat the question mark four times. So now if I go down here and run python mario.py, I get a very elegant solution to exactly that same problem, even more concise than my previous version. What if I want to do something in two dimensions? Recall that we moved to the underground of Mario Brothers here, and we had like a 3x3 grid of bricks. How can we do that? In C, we had nested for loops, using i and j back in the day, and I could do the same thing in Python. Let me go back into VS Code here and write one outer loop, for i in range(3), then an inner loop, for j in range(3), then print out a hash. But let me learn from my past mistakes: I don't want to print out a new line every time, so let's override that default. After each row, though, let's print a new line. So down here, I can run python mario.py, and I've got my 3x3 grid of bricks. I could change this a little bit and call these variables row and column.
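The printing variations above can be compared side by side in a small sketch. The function names are hypothetical, and print's file= parameter is used only so each variant's output can be captured in an io.StringIO buffer and compared; in mario.py itself you would just print to the terminal. The grid section also includes a tighter variant that prints each whole row with "#" * n.

```python
import io


def sky_loop(n, out):
    """n question marks on one line, one print call per character."""
    for i in range(n):
        print("?", end="", file=out)  # end="" suppresses the newline
    print(file=out)  # a bare print supplies the final newline


def sky_oneliner(n, out):
    """Same output, using string repetition."""
    print("?" * n, file=out)


def grid_nested(n, out):
    """An n-by-n grid of bricks using nested loops."""
    for row in range(n):
        for column in range(n):
            print("#", end="", file=out)
        print(file=out)  # newline after each row


def grid_oneliner(n, out):
    """Same grid, one print per row."""
    for row in range(n):
        print("#" * n, file=out)


def capture(func, n):
    """Run one of the functions above and return what it printed."""
    buffer = io.StringIO()
    func(n, buffer)
    return buffer.getvalue()
```

Both pairs produce identical text; the one-liners just let string repetition do the inner loop's work.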
Even so, I'm not literally using row and column anywhere explicitly, but semantically, those names might explain a little more clearly to the reader what's actually going on, so that might help. But we could tighten this up, too, right? If I just want to print a 3x3 grid, I know that the outer loop will iterate three times, and I know how to very elegantly print things out with a one-liner. So I could just print out a hash times three in this case. Then down here, I can run python mario.py, and voila, I'm back in business, too. So it's just sort of easier to do these kinds of things and express yourself all the more succinctly. Well, what else can we do? It turns out that in Python, unlike with arrays, you can ask lists how long they are. So you don't have to keep around a variable recording how large an array is. You can just add stuff to a list and then ask Python: how long is this list? How many elements are in it? Case in point, let me go back to VS Code, close mario.py, and reimplement from a few weeks back the notion of calculating the average quiz score that you might have in a class. In score.py, let's create a program that's got a list called scores, with three scores that we've seen before: 72, 73, and 33. Recall that a few weeks back, we tried in C to average these together. To do that, we had to add them all together and divide by the total number of elements. It wasn't that hard; it was sort of like grade-school arithmetic to calculate an average. But Python has more functions available to us, not just length but even summation. So let me say that my average variable shall be the sum of those scores divided by the length of those scores. And indeed, per the documentation, Python has a length function, len for short, and a sum function, which adds together all of the elements in that list.
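The averaging step can be sketched in a few lines, using the three scores from the lecture; average is a hypothetical helper name. The .2f format specifier is one common way to round away the long tail of digits that floating-point division produces, if you'd rather not see it.

```python
def average(scores):
    """Mean of a list of numbers, using the built-in sum() and len()."""
    return sum(scores) / len(scores)


scores = [72, 73, 33]
print(f"Average: {average(scores)}")      # full float, imprecision and all
print(f"Average: {average(scores):.2f}")  # rounded to two decimal places
```

One thing this sketch doesn't guard against: for an empty list, len(scores) is 0, so the division raises ZeroDivisionError.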
So down here, I can say something like print, with an f-string, or format string, that the average is whatever that value is. And I don't have to do any loops or math myself. I can just call the functions, like I could in Excel or Google Sheets or Apple Numbers. python score.py, Enter. And my average is in fact 59.333 repeating, with some weird floating-point imprecision at the end there. And just for consistency with our C code, let me rename this file from score.py to scores.py, plural. That's going to close the window, but now at least you'll see online that we have a program indeed called scores.py. Well, this is not that interesting, because I've just hard-coded my 72, my 73, and my 33. What if we want the human to be able to type those in? I think we can do that, too. So let me open up that version of the file, now pluralized. Let me not initialize the list for the human, but instead set it equal to an empty list, using just an open square bracket and a close square bracket, like an array that has nothing in it; this one is literally of size zero at the moment. Now let me do for i in range, and let's just for now ask the user for three scores, even though we could certainly ask the user how many scores they want to input and then use that number instead. In each of these iterations, let's ask the user for a score using something like int(input("Score: ")). I'm going to set aside the reality that if the user types in cat or dog, the whole thing's going to break, and therefore I should really add my try and my except. I'm going to discard that error checking and focus only on the essence of this program for now. After line three, if I have in a score variable the user's quiz score, how do I put it into that list? With an array, I had to use the square bracket notation, keep track of how big it is, and use something like bracket i.
No longer in Python, because a list is an object that has not only data but also functions, aka methods, associated with it. I can just call a method that comes with every Python list, called append, and pass in that score, using that same dot notation as before. The rest of my code can stay exactly the same. If I now run python scores.py and type in 72, 73, 33 manually, I still get that same average. And notice I did not need to decide in advance how big that list of scores was going to be. Questions on what we've just done with lists? No? All right. Even cooler, for some definition of cool, is that we can now implement hash tables, or more generically dictionaries, sets of key-value pairs, just by using a data type that comes with Python. I claimed last week that dictionaries, and hash tables in particular, are sort of the Swiss Army knives of data structures, in that they just let you associate some piece of data with others. With Python, you do not need to jump through the hoops that you needed to in problem set 5, implementing your own spell checker and your own hash table. You just create a dict object in Python, a dictionary that gives you the ability to associate keys with values. Case in point: let me go back into VS Code, close scores.py, and create a new and improved version of our phone book in phonebook.py. Let's come up with a list of names, just to demonstrate how we could store a bunch of names in the phone book irrespective of numbers, and set it equal to, say, Kelly's name and my name and John Harvard's name, just by putting three quoted strings, or strs, inside of this list. Now let's ask the human, using the input function, for the name that they want to search for in this list. And now let's implement linear search using Python.
I can do this in a bunch of ways, but one way is to say for each name, we'll call it n, in names, ask the question: if the name I'm looking for equals the current name in the list that I'm iterating over, print out something generic like "Found" and then break out of this loop. Let's see if we can find Kelly or David or John or someone else. python phonebook.py. Enter. Searching for the name, say, David. Enter. And it was in fact found. Let me search for someone else's name that's not in there, Brian. And now it's not found, but the program just silently ignores the question altogether; it would be nice to say "Not found." And here's where, in C, it would be kind of nonobvious to do this. If you wanted to print out found, or, if you got through the whole list and still hadn't found the person, print not found, you'd have to keep track with a variable of whether or not you found the person, or you'd have to return from the code prematurely just to get out of it logically. It turns out, somewhat weirdly but wonderfully usefully, that for loops in Python can have else clauses associated with them, whereby I can say down here, print "Not found." If I run this version of the program and search for someone who's not in the phone book, like Brian, now I actually see "Not found." Semantically, it's a little weird, but essentially what's happening is that if you get through this whole loop and you never call break, then you've not actually broken out of the loop, so you're going to hit the else, and in that case, you're going to print out "Not found." This kind of bookkeeping, keeping track of whether or not something has happened inside of a for loop, and if so do this, else do that, is such a common thing, and else literally handles that scenario in Python. This is perhaps the most un-C-like feature that we've seen, at least with regard to loops. All right.
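The for ... else version of linear search can be sketched as a function that returns its message instead of printing it, so it's easy to check. The name search is hypothetical; the names are the ones from the lecture. The else clause runs only when the loop finishes without hitting break, which includes the case of an empty list.

```python
def search(names, name):
    """Linear search using for ... else (hypothetical helper)."""
    for n in names:
        if name == n:
            result = "Found"
            break  # skips the else clause below
    else:
        # reached only if the loop ran to completion without a break
        result = "Not found"
    return result


names = ["Kelly", "David", "John"]
```

In phonebook.py, print(search(names, input("Name: "))) would reproduce the behavior described above.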
Well, it's great that I've implemented linear search, but we did that in C, and it's getting a little tedious. Can't we do better? We actually can. Let me clear my terminal and tighten this up. Instead of iterating over every name in names, just like we keep iterating over integers in ranges, and checking whether each name equals the one we're looking for, you can do something much more clever. You can just literally ask Python if the name you're looking for is in the names list, then print out "Found," else print "Not found." And this is where Python, too, gets kind of cool. In line five, you have just a simple if condition with a Boolean expression: name in names. How does Python know if name is in names? It presumably uses linear search over the whole list of names, looking for what you care about, and then tells you true or false depending on whether it found it. You don't have to write the code to iterate over it with a while loop or for loop or whatnot. You just say what you mean. And so here, too, it's a little more English-like: if name in names, then print found, much more pronounceable than it would be in C. So that's one other cool feature that we now have at our disposal. What's yet another? When it comes to dictionary objects in Python, a dict really just gives you a set of key-value pairs. We've seen this kind of chart before, whereby we might have name and number, and name and number, and name and number. How do we translate this to code? In C, as with problem set 5, it was going to be quite an undertaking to store a whole bunch of things in memory in the form of something like a hash table. In Python, we can actually define a dictionary ourselves. These square brackets represent a list, but I can alternatively use curly braces for a very new purpose. I'm going to hit Enter just to move the closing curly brace to a new line.
And I am going to now enumerate a bunch of key-value pairs: namely, quote unquote Kelly for the first key, colon, then +1-617-495-1000 as the number. Then I'm going to do quote unquote David for the second key, and since we both work here, I'm going to use that same number, as we've done before. Then a third key for John Harvard, colon, and for John, we'll use +1-949-468-2750, which is fun to call or text. Even though it's syntactically a little different, this gives me the equivalent of this chart here: key-value pairs, where the keys are the staff names and the values are the staff numbers. That implements all of that, a hash table, if you will, in Python's own syntax. So how do I now use this? It turns out I can use it in exactly the same way. I'm going to generalize the variable's name to people, because it now contains not just names but names and numbers, so I'll change this variable down here to people, too. But notice the syntax now. I can still ask the human for a name they want to look up. I can still say if the name is in the people dictionary. By definition, Python is going to interpret that preposition, in, as meaning: is the following a key in the dictionary? And if so, the expression evaluates to true. Let's see that this works: python phonebook.py, and let's type in David. And there's my number. Oh, that's not my number. It just says found. Let's run it again and type in, say, Brian. Not found. Okay, that's as expected. But I'd like to know what my number is, or Kelly's number, or John's number. That's an easy fix, too. Inside of this conditional, I can say something like number = people[name]. We've not seen this before with dictionaries, but we have seen square brackets in C, when we had arrays. That square bracket notation is how you indexed into an array to get a specific value, at index 0, 1, 2, 3, 4.
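A small sketch pulling together the membership test with in and the dictionary lookup with square brackets, using the names and numbers from the lecture; lookup is a hypothetical helper. For a dict, in checks the keys, not the values. Dictionary lookups are hash-based, so they typically take constant time on average, whereas in on a list scans it element by element.

```python
names = ["Kelly", "David", "John"]

people = {
    "Kelly": "+1-617-495-1000",
    "David": "+1-617-495-1000",
    "John": "+1-949-468-2750",
}


def lookup(name):
    """Return a message for name, checking membership before indexing."""
    if name in people:  # for a dict, `in` tests the keys
        number = people[name]  # index by a string key instead of an int
        return f"Found: {number}"
    else:
        return "Not found"
```

Indexing with a missing key, as in people["Brian"], raises KeyError, which is why the in check comes first; people.get("Brian") is an alternative that returns None instead.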
What's amazing about dictionaries, not just in Python but in other languages as well, is that you can index into a dictionary just as you can index into an array. But whereas with an array you use numeric indices, in dictionaries you can use string keys: you can use strings to look up their corresponding values. To be clear, name at this point is given to us by the human's input. So if I typed in D-A-V-I-D, name equals "David", and this is like saying people, square bracket, quote unquote David: find David's number. That stores the answer from this two-column chart in the variable called number. All that remains is for me to print it out, which I can do using an f-string. Let me go down into my print statement, change it to an f-string, add a colon, add the number variable to be interpolated, and rerun this program as python phonebook.py. Type in my name, and there's my number, as found. This is incredibly powerful, and it's why, again, hash tables, and in turn more generally dictionaries, are sort of the Swiss Army knife: being able to look up data with such simple syntax is wonderfully useful. And in fact, we can do even more than this. If you think about other incarnations of key-value pairs, you see them all the time, for instance in spreadsheets. Here's a screenshot of Google Sheets, whereby I've got the beginnings of a spreadsheet with names and numbers. But in this model, I want to actually associate some metadata with my data. The data I care about is the actual names and numbers, but you could imagine having a third column like email address, and maybe home address, or any number of other pieces of data associated with these three people. For now, I've just got two columns, or two attributes: names and numbers.
Each of the rows in a spreadsheet, as most anyone who's used a spreadsheet before knows, represents a different record or piece of data: this is Kelly, this is David, this is John, and so forth. We can implement this idea using dictionaries and lists together. The syntax is going to be a little strange at first, but let me go back to VS Code here and change my people dictionary to be a people list, between square brackets. The elements of this list are now going to be dictionaries themselves. I'm going to use some curly braces inside of these square brackets and say that the name of one person is quote unquote Kelly, and the number for that person is quote unquote +1-617-495-1000, close quote, then a comma on the outside of the curly braces. Then I'm going to have another dictionary with quote unquote name, colon, David, comma, then number, colon, and I'm going to borrow the same phone number, because we both work here. Then lastly a comma, and finally quote unquote name, colon, quote unquote John, and a quote unquote number for John, colon, +1-949-468-2750. All right. So what's going on here now? Our people variable is now not just a simple dictionary with individual key-value pairs, name, number, name, number, name, number. We now have a more generalized way of storing not just a name or a number but an email address or a home address or any number of other values. How? The commas just separate the key-value pairs. So if I do have email addresses for us, I can put comma, quote unquote email, colon, like malan@harvard.edu, and I can just keep adding these key-value pairs to each of the dictionaries, because a dictionary is a collection of key-value pairs. So it stands to reason that I can associate name with David, number with the number, email with malan@harvard.edu, and so forth, effectively implementing this spreadsheet idea in the computer's memory.
At the risk of significantly oversimplifying, this is what Google and Microsoft and Apple are doing with their spreadsheet software. They have written code that presents to you a nice table with a graphical user interface on the screen, but underneath the hood, what they effectively have is lists of dictionaries representing each of those rows. And we're going to come back to this before long, when we start experimenting with our own databases: we're going to get back rows of data from databases, and we are going to store that data in lists of dictionaries for the same reason. So how can we use this? Let me hide my terminal for a second and tweak the program just a little bit. I'm still going to get the name of a person to look up their number. But I've lost, at least for now, the ability to just ask a question like: is this name in the structure? Because it's a list of dictionaries, I now need to iterate a little bit differently. So for each person in the people list, I'm going to check whether the current person's name equals the name I'm looking for, and if so, create a variable called number, set it equal to that person's number, and then print out, for instance, "Found:" followed by, in my curly braces, that specific number, and then, after all that, break out of this loop. This is a mouthful, but recall that it's all the same syntax we've seen before, in smaller parts. The square brackets mean here comes a list. What are the elements of this list? Dict, dict, dict: three dictionaries back to back to back, each of which has keys called name and number with their respective values. The second one, temporarily, has name and number and email as keys, plus three values, and the third one has keys of name and number as well, with their corresponding values.
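The list-of-dictionaries version of the phone book might look like the following sketch, with one dictionary per "row." find_number is a hypothetical helper, and it returns the number (or None) rather than printing, so the search can be checked directly. Unlike the plain dictionary, name in people would no longer work here, because the list's elements are dicts, not strings.

```python
people = [
    {"name": "Kelly", "number": "+1-617-495-1000"},
    {"name": "David", "number": "+1-617-495-1000"},
    {"name": "John", "number": "+1-949-468-2750"},
]


def find_number(people, name):
    """Linear search over a list of dicts; return the matching number or None."""
    for person in people:  # person is one dict per iteration
        if person["name"] == name:
            return person["number"]  # index into that dict by key
    return None
```

A caller could then print, say, f"Found: {number}" when the result isn't None, mirroring the program described above.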
So when I iterate over each person in the people list, on each iteration person is going to be set to this dictionary, then this dictionary, then this dictionary. On each iteration, I'm asking: is the value of that person's name key equal to the name I'm looking for? And if so, grab a variable called number, set it equal to the value of that person's number key, and then just print it out. If we wanted email instead, I'd change the word number to email. If you want to look up anything else, you can tweak that code there. But being able to index into dictionaries using strings is the fundamentally powerful new technique that we have here. Questions now on any of this? Yeah. >> If both... >> Good question. If you wanted both name and number on the screen, do you concatenate? Sure, you could do that. Or you could print them out by passing both to the print function, separated by a comma. Absolutely, however you want to format it. And actually, just as an aside, even though this becomes a little less readable, it's a little silly that on line 11 I'm declaring a variable called number only to use it one line later and then never again. Technically, with those curly braces and format strings, I could just take this code on the right, plug it into those curly braces, and get rid of this variable altogether. At some point, though, f-strings start to get a little too hard to read, with quotes inside of quotes. So I kind of prefer being a little more pedantic about it, explicitly putting the value in a variable and then interpolating just that variable. But you could do it in different ways still. All right, a couple of final features of Python that'll get us on our way with doing other things. It turns out there's a whole bunch of libraries that come with the language itself that you nonetheless have to import. They're not third-party; you didn't have to install them.
You just need to add them to your code by importing them. One of them is sys. Among the things that the sys library in Python offers is access to command-line arguments. After all, we've seemingly lost access to command-line arguments, because there's no more main, at least by convention. There's no int main(void), and no int main(int argc, string argv[]) going on in our code. But all of that functionality is still available in a library called sys. So how do we use this? Let me go back to VS Code here and create a relatively simple program called greet.py, similar to one from a few weeks back, that's just going to greet the user using command-line arguments instead of get_string or the input function. I'm going to do this by saying from sys import argv. In this case, argv is essentially just a list: a list of the command-line arguments that the human has typed. Because it's a list, you can just ask the len function what its length is. So there's no need for argc anymore; you can literally ask argv how long it is, which is kind of nice. So I'm going to say this: if the length of argv equals 2, which means the human typed two words at the prompt, let's greet them, assuming that's their name, and say hello and then whatever their name is. Let me make this a format string. And to be pedantic, let me create a variable called name and set it equal to argv[1], which is going to be the second word that the human typed in, as has been our convention in the past. Else, if they didn't type exactly two command-line arguments, let's just print out something generic like "hello, world." Let me run python greet.py. Enter. And I see "hello, world," because apparently I did not type in exactly two words, and yet I did. So let's see where this is going. Let me rerun python greet.py, but type in my name, David, at the command line. Enter. And huh, I screwed up unintentionally.
What did I do wrong? All right, printf is not a thing in Python. So that's an easy fix. Let's delete that. Let me clear my terminal window and rerun python of greet.py space David. Enter. And now I get hello, David. The only thing that's weird here is that I typed in three words at the prompt, and yet I'm checking for two. It's a bit subtle, but with Python and argv, the Python interpreter itself is ignored. It goes without saying that you're using the Python interpreter to run a Python program, so the only things being counted are the words after python itself. So when I type greet.py and David, that's two. When I only typed greet.py, that's one instead. All right. So now I have access to my command-line arguments. What about my exit statuses? This was getting a little low-level, but in recent C programs we've had you all returning 0 on success and returning 1 on error. Can we still do that? Well, yes. And in fact, the sys library is used for that as well. So if I want to add some exit statuses to a program, to facilitate check50 and automated tests in the real world, I can do that with a program called, let's call this, exit.py. And in exit.py, I'm similarly going to import sys, but in a different way. Let's go ahead and import the whole library, just to demonstrate how you can access things inside of it without explicitly saying from sys import such and such, as before. If the length of sys.argv (so this is a little bit different, but I'm asking the same kind of question) does not equal 2, I want to go ahead and print out to the user missing command-line argument, which is something we did a while back as well. And then I want to exit with code 1: sys.exit(1). Else, if I don't run into that issue, I'm going to go ahead... Actually, let's not even bother with an else. For parity with our C version, let's do this.
Print, using an f-string, quote-unquote hello, comma, then sys.argv bracket 1 in curly braces, close quote, and then sys.exit(0). All right, that's a whole mouthful, but what's really going on? I could have done from sys import argv, but I don't need to enumerate every single variable or every single function that I want from a library. I can also just more generally import the whole library: give me access to everything, and I'll tell you what I want from it later. Therefore, on line three, I can still access argv. I just have to scope it to the sys library. So saying sys.argv, not just argv, means go inside of that library and find me argv there, as opposed to a variable unto itself in my own code. Why am I saying not equal to 2? Well, if they don't give me two words after the interpreter's name, I want to yell at them, say missing command-line argument, and then exit 1. I'm not going to give them a default hello, world anymore. I want them to give me their name. Meanwhile, if I get this far and I haven't exited from the program, I can print out sys.argv bracket 1, which is going to be David in the example I typed before. And this means success, so sys.exit(0) signifies success. It's more syntax than it was in C, but we have the exact same functionality available to us as we have in the past. How about one other example that we've had in the past? Let's convert it to Python as well, so you have a few more tools in your toolkit. How about implementing a version of this phone book that actually persists? So instead of hard-coding Kelly and David and John into it in this way, let's actually let the user type in a name and a number, just like on your iPhone or Android phone, and add it to a text file, like a CSV file as we did before, using comma-separated values. Well, it turns out that Python comes with a library to handle CSV files. We don't need to hackishly implement our own CSV support by printing the commas ourselves. Instead, we can import the csv library.
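Before the CSV example, here is a minimal sketch of the exit.py program just described. Wrapping the logic in a main(argv) function is our assumption, not the lecture's structure; it just makes the argument list easy to pass in. Note that argv[0] is the script's name, which is why python exit.py David yields a list of length 2.

```python
import sys


def main(argv):
    """Greet argv[1], exiting with status 1 unless exactly one argument was given."""
    # argv[0] is the script name, e.g. ["exit.py", "David"] for: python exit.py David
    if len(argv) != 2:
        print("Missing command-line argument")
        sys.exit(1)  # nonzero status signals an error, as in C
    print(f"hello, {argv[1]}")
    sys.exit(0)  # 0 signals success


# In exit.py itself, the last line would be: main(sys.argv)
```

sys.exit works by raising SystemExit, so the status reaches the shell: after running python exit.py with no name, echo $? should print 1.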
We can then create, say, a variable called file, set it equal to open, and open a file called phonebook.csv in append mode. So this is almost the same as C, except it's open instead of fopen, which we saw a couple of weeks back. Now let's ask the user, via the input function, for the name they want to add to their contacts and the number that they want to add to their contacts. And then after that, let's go ahead and do this, which is a bit of muscle memory to remember: I'm going to create a variable called writer, though I could call it anything I want, and set it equal to csv.writer, which means there's a function called writer in the csv library that I'm accessing with the csv. prefix, because I didn't import it explicitly by name. And I'm going to pass it that file. This tells Python to turn that file into a CSV that can be written to. In the next line of code, I'm going to literally say writer.writerow. writerow is a method, aka a function, associated with this writer object, and I know that only because I did actually read the documentation for the csv library. What do I want to write? Well, I want to write a list of values, namely a name and a number. And I'm using square brackets to tell the writerow function: here you go, here's a list of values, two of them, a name and a number. After all that, I'm going to do file.close() and just close the whole file. All right, so where does this actually get me? Well, let me go ahead and open up phonebook.csv, which is initially empty. I'll move this over to the right-hand side. Now I run this program with python of phonebook.py. Enter. I'll type in, say, Kelly's name. Enter. +1-617-495-1000. Enter. And voila, it ends up in the CSV, using a little bit less code than we needed last time with C. Let's run it once more. I'll type in my name, and I'll again use +1-617-495-1000. Enter. It's being appended to that file as well. And one last time for John: +1-949-468-2750. Enter. Voila. So it's pretty easy.
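One concrete reason to let the csv library handle the commas rather than printing them ourselves: a value can itself contain a comma. This sketch writes to an in-memory io.StringIO buffer instead of phonebook.csv (an assumption, purely so the output is easy to inspect), and the name "Malan, David" is a made-up example.

```python
import csv
import io

# An in-memory stand-in for phonebook.csv.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["name", "number"])
# This field contains the delimiter, so the writer wraps it in quotes
# rather than letting the comma split it into two columns.
writer.writerow(["Malan, David", "+1-617-495-1000"])
print(buffer.getvalue())

# Reading it back recovers the original two fields per row.
rows = list(csv.reader(io.StringIO(buffer.getvalue())))
print(rows)
```

The default dialect also ends each row with \r\n; when writing to a real file, the csv documentation recommends opening it with newline="" so those line endings are handled correctly.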
That is to say, it's pretty easy in Python to start creating files like this. But this isn't really Pythonic. Let me in fact close the CSV file, hide my terminal, and propose that we can tighten up this code a bit too. I don't need to open up the file way up here. I can go ahead and get my variables' values this way first. And in fact, I could have done that code a little later anyway, but I can do this in Python: I can say, with the following file opened, phonebook.csv in append mode, and referred to as a variable called file, do this stuff, and close the file yourself. So this program is suddenly significantly shorter, because this one line has the effect of opening the file for me in append mode, assigning it to a variable, doing this stuff, and then, as soon as the indented block ends and there's code over here or no code whatsoever, the file gets closed for me automatically. This just helps us avoid memory leaks and the kind of silly mistakes we've made in C, where you forget to close a file that you opened and you don't necessarily notice unless you run valgrind or something on it. Python tries to avoid this by giving you a new keyword, with, which doesn't really make sense semantically until you read it as "with the following file open, do this," and it will close the file for you. So those are two among the features that you sort of get with Python. The catch, though, is that this CSV is fairly simplistic. In particular, it's missing a header row that actually indicates what is in each of the columns. In fact, if I go ahead and run code phonebook.csv, we'll see again that the file contains just one row for Kelly, one for me, and one for John. Whereas, ideally, it would look a little something more like this Google Sheets version, which actually has, in the very first row, something like name and number, which then describes the data therein, after which are the three actual rows.
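The with-based version described above might look like the following sketch. The add_contact function and the temporary-file demo are illustrative assumptions (the lecture calls input() instead), and newline="" follows the csv module's documented advice for files it writes.

```python
import csv
import os
import tempfile


def add_contact(path, name, number):
    """Append one name,number row to the CSV file at path."""
    # The with block closes the file automatically when it ends,
    # so there's no explicit file.close().
    with open(path, "a", newline="") as file:
        writer = csv.writer(file)
        writer.writerow([name, number])


# Demo on a temporary file rather than a real phonebook.csv.
path = os.path.join(tempfile.mkdtemp(), "phonebook.csv")
add_contact(path, "Kelly", "+1-617-495-1000")
add_contact(path, "David", "+1-617-495-1000")

with open(path, newline="") as file:
    rows = list(csv.reader(file))
print(rows)
print(file.closed)  # True: leaving the with block closed the file
```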
Now, the simplest fix here, frankly, would probably be to just start with name, comma, number at the top of the file and then assume that my phonebook.py program is just going to append, append, append additional rows to the file containing the names and numbers respectively. I could have done that from the get-go. And in fact, that would be better than putting some code inside of phonebook.py that writes out that specific row, because after all, if I'm running this program again and again, I don't want the header row to appear again and again and again, unless I complicate the program a little bit to ensure that I only write it once. But assuming that I do go into phonebook.csv and, from the get-go, have a file that contains name and number, we can actually start to improve upon the implementation of phonebook.py, because we can take advantage of that same header with a writer that knows about those column names. In fact, let me put these files side by side here. And then in phonebook.py, let's go ahead and transition away from using a writer to using a so-called dictionary writer, or DictWriter for short: capital D, capital W. And then let me go ahead and specify one additional argument to this particular function, namely fieldnames, which I know exists because I looked it up in the documentation. The value of this argument is supposed to be a list of the fields that are presumed to exist in the CSV that we're about to write to. So I'm going to do quote-unquote name, quote-unquote number. The line's a bit long, so it's scrolling there, but if I scroll back to the left, we'll see that the line is otherwise unchanged. But when I go down now to write each respective row, notice that I don't have to rely on this list, which just assumes somewhat naively that name will always be in the first column, or column 0, and number will always be in the second, or column 1.
After all, if someone were to move that data around, say in the spreadsheet using Excel or Google Sheets or something else, my code would end up being fairly fragile, because at the moment it's just assuming blindly that name goes first, followed by number. But once we have that header row in there and tell DictWriter about it, we can actually now pass in not a list but an actual dictionary of key-value pairs, and let the dictionary writer figure out in which column of the file those values should go. So inside of this dictionary, I'm going to have one key called name, the value of which is indeed the name the user typed in. The second key is going to be quote-unquote number, the value of which is the number that the user typed in. And let me go back now and actually fix a typo from earlier: we're only asking the user for one number, so all this time my input prompt should have asked for just one number, aesthetically. Now notice I have the file ready to go. Indeed, name and number are there; that matches the field names I've provided to my code, and it matches the key-value pairs that I'm subsequently passing to writerow. So let's go ahead and give this a try, with this CSV file that's otherwise empty, save for the header, by running python of phonebook.py. Enter. I'm going to now go ahead and type in, say, the first name, which was Kelly before, then +1-617-495-1000, and watch what happens at top right. Kelly and her number end up in the file, even though I didn't explicitly specify, as with a list or numeric indices, which value goes where. Let's run it once more and put in myself again. +1-617-495-1000. Enter. And there again I am. And lastly, just for good measure, let's go ahead and put John back in the file with +1-949-468-2750, which, if you still haven't called or texted, do feel free. Enter.
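A sketch of the DictWriter version. One addition here is our assumption, not the lecture's: instead of typing the header by hand, it calls writeheader() only when the file is missing or empty, which is one way to handle the "only once" complication mentioned earlier. The file path and the add_contact helper are illustrative.

```python
import csv
import os
import tempfile

FIELDNAMES = ["name", "number"]


def add_contact(path, name, number):
    """Append a row, letting DictWriter place each value by column name."""
    # Assumption: write the header only if the file doesn't exist yet or is empty.
    needs_header = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="") as file:
        writer = csv.DictWriter(file, fieldnames=FIELDNAMES)
        if needs_header:
            writer.writeheader()
        # Key order here doesn't matter; fieldnames decides the column order.
        writer.writerow({"number": number, "name": name})


path = os.path.join(tempfile.mkdtemp(), "phonebook.csv")
add_contact(path, "Kelly", "+1-617-495-1000")
add_contact(path, "John", "+1-949-468-2750")

with open(path, newline="") as file:
    print(file.read())
```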
And voila: in phonebook.csv we have all of those same rows, and code that's a little more resilient now against any changes we might subsequently make there, too. All right, how about some final flourishes using some other features of Python that we did see a glimpse of some time ago, namely the ability to install libraries of our own choice. Up until now in cs50.dev, we at CS50 have pre-installed most of what you need, including back in the earliest weeks of the class, when we had that cowsay program I wrote that was using a third-party library I had installed into my codespace in advance. Well, you can use a program called pip to install Python packages into your own codespace, and, if you're using your own Mac or PC, onto your own Mac or PC as well, provided those libraries are freely available as open source online and in the repository from which the pip program actually draws. Let me go back to VS Code and create a new program called cow.py. In this program, I'm going to import that library, cowsay. And after that, I'm going to call cowsay.cow with quote-unquote This is CS50 to have a cute little cow on the screen say exactly that. Now, in a previous lecture, I had pre-installed this library. But suppose I had forgotten to do so today. Let's see what other type of error we'll see on the screen. Well, let me go ahead and run python of cow.py. Enter. And there's another one of those tracebacks. This one's a little more straightforward than the NameError and the ValueError we saw in the past. This is literally a ModuleNotFoundError: No module named cowsay. Well, this is where the pip command comes in.
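The ModuleNotFoundError above can also be anticipated in code. This sketch uses importlib.util.find_spec, which returns None when a top-level module can't be found, without actually importing it. The list of names checked is just an example, whether cowsay shows up as installed depends on your environment, and the pip hint assumes the package name matches the module name, which isn't always true.

```python
import importlib.util


def has_module(name):
    """Return True if a top-level module called name is importable here."""
    return importlib.util.find_spec(name) is not None


for name in ["csv", "sys", "cowsay"]:
    status = "installed" if has_module(name) else f"missing (try: pip install {name})"
    print(f"{name}: {status}")

try:
    import cowsay  # third-party; absent until installed with pip
except ModuleNotFoundError as error:
    print(error)  # e.g. No module named 'cowsay'
```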
If something hasn't been pre-installed for you in cs50.dev, or in the real world on whatever system you're using, you can run pip install cowsay. Assuming you've spelled it correctly and the library is publicly available, hitting Enter will result in pip automatically downloading the latest version, installing it (in this case into your codespace), and hopefully solving that problem. Let me clear my terminal window and run python of cow.py again. Definitely cross my fingers. And there is the most adorable cow. And if we full-screen the terminal, we'll see that he's indeed saying This is CS50. Now, that's just one of the things we can install with pip. I could also install libraries onto my own Mac or PC. In fact, in just a moment, I'm going to switch over to another computer here where I have a terminal window open on my own actual Mac. And I'm doing this because I'd like to play around with some text-to-speech library functionality, which you can't really do in cs50.dev, because it's browser-based, and when you run code in the cloud, it's not going to pass the audio along to the speakers on your laptop or desktop. But if I'm running Python and my own code on my own computer, a Mac in this case, or a PC in someone else's case, I can install that kind of library, text-to-speech, and have my own code on my own computer use my own speakers to verbalize some string quite like that. So how can I go about doing this? Well, having read some documentation, I'm going to go ahead and install with pip a library called pyttsx3, for Python text-to-speech, version 3.
Hitting Enter goes and finds and downloads the library as needed, if it's not already installed, and then brings me back to my terminal. And I'm going to use an older-school program here called Vim, or vi, to implement a cow program on this computer, whereby I'm going to write some code using this library without VS Code, with just another text editor instead. To do this, at the very top of my file, I'm going to import this library for Python text-to-speech, so pyttsx3, for version 3. And then I'm going to use only three lines of code to synthesize some voice. I'm going to create a variable called engine and set it equal to pyttsx3.init(), because the documentation taught me that I need to initialize the library the first time I use it. I can then use this variable called engine to actually say something, quite like Scratch, albeit verbally instead of pictorially, with engine.say and quote-unquote This is CS50. And then lastly, I can use engine.runAndWait(), similar to some Scratch block, so that the whole expression is actually verbalized before my program quits. Now, the first time I run this, it might take a moment for the library to initialize itself. But on my own Mac here, I'm going to run python of cow.py. If we could raise the volume just a little bit, hopefully we'll not see but hear this cow's greeting. >> This is CS50. It was very much in a rush to say it, after initializing for that long. And if we ran it again and again and added some optimizations, we could get it talking much more quickly than that. But we now have a version of the program that indeed verbalizes whatever string, or str, I've passed into it here. >> CS50. >> It's really in a rush to finish there. All right. But let's try one final flourish with another library that's fun to play around with, if only because it'll motivate some of the things you can now do in Python yourself. Let me go into VS Code in my codespace, because this one does not require my speakers.
I'll close that first version of the cow, and I'm going to go ahead and create a QR code generator, after installing with pip a library called qrcode, which I read about online and which is now installed in my codespace. I'm going to now go ahead and create a file called qr.py. So let's go ahead and run code qr.py, because I want to generate my own QR codes. If you've ever generated a QR code before, you're probably in the habit of just Googling around for some generator online, for which someone else wrote the code to generate the QR code. But I can do that for myself and actually generate my own images. I'm going to go ahead and import the library that I just installed: import qrcode. And then below that, I'm going to create a variable called, for instance, image and set it equal to this library's make function, qrcode.make. No relation to the make that we use for C. And I'm going to make a QR code containing a URL, maybe of one of the lecture videos. So let's do https://youtu.be/, the short version, and then xvFZjo5PgG0, if I got that just right. Then after that, I'm going to go ahead and call image.save to save that QR code as a file called quote-unquote qr.png, and then quote-unquote PNG will be the format, which is Portable Network Graphics, akin to a JPEG or a GIF but with different features. I'm just going to double-check my writing here so we go to the right lecture video, and I think we are indeed good. And what that should do, after running my code, is leave me with today's final flourish: a PNG file in my codespace that, when opened, is going to be a QR code that you can scan with your phone. So if you'd like to get ready for this final flourish, I'm going to go ahead and run python of qr.py and hit Enter. Thankfully, it worked. I'm going to now open up qr.png and close my terminal window.
And for our final moments together here in week 6, after which we'll ultimately transition to yet more languages and problems to be solved, here is a final code for you to scan, of today's lecture. All right, that's it for today. We'll see you next time. All right. This is CS50, and this is already week 7, wherein we introduce another programming language, this time known as Structured Query Language, or SQL for short. Now, SQL, as we'll see, is a different sort of programming language that allows us to solve a lot of the same kinds of problems that we've been dabbling with over the past several weeks, but arguably, in a lot of contexts, it allows us to solve those problems more easily. Indeed, among the goals for today is to demonstrate that sometimes there are multiple tools you can use to solve the same problem, whether it's C or Python or today's SQL. But we'll also see that SQL allows us a different sort of approach to solving problems. Whereas C very much, and Python to a large extent, are procedural programming languages, whereby you have to write procedures, functions, step by step, that tell the computer what to do, including loops and conditionals and all of that, SQL is said to be a declarative programming language, which is a different sort of paradigm, whereby when you want to solve some problem, you essentially declare what problem you want to solve, or what question you have, and it's up to the programming language to figure out, using loops and conditionals and all of those lower-level building blocks, how to get you the answer.
So ultimately, today is all about teaching you yet another language, mostly so that you can learn, again, to teach yourself new languages, and to appreciate that once you exit a class like CS50 and are out there in the real world, it really isn't all that big a deal to pick up new programming languages, especially when you've already seen different programming paradigms, like procedural, like object-oriented, and, as of today, declarative as well. But today is ultimately also about data. And so to get us started, we thought we'd collect some real-world data by asking all of you a couple of questions. So if, on your laptop or phone, you would like to pull up this URL here, it will also exist in just a moment in QR code form. So go to that URL there, or simply scan this here QR code with your phone, and that's going to lead you to a Google Form. For those unfamiliar, Google has lots of tools, among which is one via which you can ask people questions via forms. Microsoft has something similar as well. And at that URL, what you'll soon see is a form that looks a little something like this. Among those questions is which is your favorite language, at least among those we've studied thus far. So go ahead and anonymously answer the questions you see on this form. You'll see which is your favorite language, and also which is your favorite problem in the problem sets thus far. And meanwhile, as you might know if you've used Google Forms yourself to collect data, we can move from questions here to actual responses. And as people start to buzz in, we'll see that the data set here is starting to update in real time. And Google gives us these nice graphical user interfaces, or GUIs, via which we can analyze the data. And so far, Python is easily the winner, with 70% plus of you preferring it, 11% of you wishing we were still in Scratch, and 18% of you in C. And you'll see the responses are still coming in here.
But for our purposes today, what's more interesting than the actual answers to these questions is how we can get at the raw data. Among the things you can do in Google Forms is quite literally click on View in Sheets, and what this is going to allow me to do is access the underlying raw data. Now, because Google has forms and spreadsheets, they've sort of tied these two products together. But what's especially nice about Google Sheets is that I can also download the raw data as a file. I can download it as an Excel file, a text file, a PDF. But for today, we're going to download it in a very common format known as CSV, for comma-separated values. And indeed, if I go to the File menu, then Download, then Comma-separated values: this is perhaps the most straightforward, easiest way to get raw data out of any kind of tabular data like this and load it into code that we are about to write. So if you haven't buzzed in already, that's fine. But at this point in time, now that I've clicked the button, I have a CSV file in my Mac's Downloads folder, which, if I go ahead and open it up here, I can see is indeed this long-named file, Favorites - Form Responses 1.csv. I'm going to shorten that file name to just favorites.csv. And what I'm going to go ahead and do is open up VS Code, and in my file explorer, I'm going to literally just drag and drop favorites.csv from my Mac. That's going to have the effect of uploading the file as it was at that moment in time, so that we can now begin to write some code using this file. And VS Code has automatically gone ahead and opened it up for me. What you're looking at here is what we're going to start to call a flat-file database. It's a very lightweight database, in the sense that it can store a lot of data, and it's a flat file in the sense that it's literally just a text file. And by convention, the way the data is stored in this file is indeed by separating values with commas.
There are other conventions as well, but CSV is probably the de facto standard. TSV is also a thing, for tab-separated values, as is PSV, pipe-separated values, where you might have a vertical bar. Essentially, these file formats try to use a character that might not appear in the actual data, so as to separate your columns. So indeed, if I switch back to VS Code here and we take a look at the data, you'll see that from Google Sheets I've been given three columns: Timestamp, which was automatically generated for me, the language, as well as the problem. And what I see here is that we had a few respondents buzz in a little early, very excited for today's data. But here's the rest of them, from like 1:30 p.m. Eastern onward. And you'll see that, separated via commas, there are effectively three columns of data. So everything before the first comma represents a timestamp. Everything between the first and second comma represents the choice of language that you all buzzed in with. And then everything after the second comma represents the problem. Now, it has kind of jagged edges. It doesn't line up in nice rows and columns, because some answers are longer and some answers are shorter, but the commas are sufficient to tell the code we write where one column ends and the next one begins. So how do we go about writing code like this, if we'd now like to ask some questions about the data, like what is the most popular language? What is the most popular problem? Or, conversely, the least popular of each of those? Well, we could look at the original data in Google Forms, and that's where we got the pie chart. But how is Google figuring out what the most popular answers are and what pie charts it wants to depict? Well, they probably wrote some code not unlike what we're about to write, although we'll start with just a command-line environment, as always. So within VS Code, I'm going to go ahead and open up a program called favorites.py.
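The CSV, TSV, and PSV conventions mentioned above differ only in the delimiter, which Python's csv.reader accepts as a keyword argument. In this sketch the sample rows are made up, and io.StringIO stands in for real files.

```python
import csv
import io

# The same made-up table in three flat-file conventions.
SAMPLES = {
    ",": "language,problem\nPython,Hello\n",     # CSV
    "\t": "language\tproblem\nPython\tHello\n",  # TSV
    "|": "language|problem\nPython|Hello\n",     # PSV
}

for delimiter, text in SAMPLES.items():
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    print(repr(delimiter), rows)
```

All three print the same two rows, since only the separator character changed.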
Let's write a program whose purpose in life is to open the CSV file, read it top to bottom, left to right, and then crunch some numbers and figure out what the most popular answers are to those questions. So I'm going to go ahead and import a module that comes with Python, a library called csv. And nicely enough, this is just code that someone else wrote years ago that figures out how to read data from a file, separating it via commas, so that you and I don't have to write all of that ourselves. Then I'm going to use this Pythonic convention: with open, quote-unquote favorites.csv, as file. Though, if I want to be super explicit that I intend only to read this file, which is the default, I'm going to go ahead and explicitly say quote-unquote r, just like we did in C when using fopen to open a file in read mode. And now I'm going to do this: reader equals csv.reader(file). This is a Python convention whereby the csv library comes with a function called reader that takes as its sole argument here a file that has already been opened. And what that reader will do is figure out where all of the commas are, so that I can iterate over this reader in a loop and get back row after row after row, without having to write all of the code to figure out where those commas are myself. So what I'm going to do in this block of code is, for each row in that reader, just print out maybe the second column, which was the language column. So I'm going to say print(row[1]), because what we'll see is that this reader, which again comes with Python, hands me a list for each of the rows, wherein bracket 0 would represent the first column, bracket 1 would represent the second, and bracket 2 would represent the third, because everything is zero-indexed in Python.
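Here is a self-contained sketch of this first favorites.py. Since the class's favorites.csv isn't available here, it first writes a small stand-in file to a temporary directory (made-up rows, using the Timestamp, language, problem header described above). As in the lecture, the header row gets printed too.

```python
import csv
import os
import tempfile

# Made-up stand-in for the downloaded favorites.csv.
SAMPLE = (
    "Timestamp,language,problem\n"
    "10/1/2025 13:30:00,Python,Hello\n"
    "10/1/2025 13:30:05,C,Mario\n"
    "10/1/2025 13:30:09,Scratch,Scratch\n"
)
path = os.path.join(tempfile.mkdtemp(), "favorites.csv")
with open(path, "w", newline="") as file:
    file.write(SAMPLE)

languages = []
with open(path, "r", newline="") as file:
    reader = csv.reader(file)
    for row in reader:
        # Each row is a list: row[0] timestamp, row[1] language, row[2] problem.
        print(row[1])
        languages.append(row[1])
```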
All right, so let's see what the effect is here. Let me maximize my terminal window, run python of favorites.py, cross my fingers that I got this right, and voila: there is every language that was selected by you all in the form, from top to bottom, by default chronologically. But there's a bit of a bug, I dare say. Let me scroll up and up and up in this output, through all of these answers, until I get to the very top, where I ran the program myself, which is here: python of favorites.py. There's a minor bug here. What's the bug in the output? Yeah? >> Yeah, it accidentally includes the header, which is a bug in the sense that I really just wanted to see the languages, but the code is doing what I told it to, which is to print out every row. So there are a few ways we could ignore this. Let me go ahead and minimize my terminal window and say, well, you know what? After we create this reader, let's just skip to the next row, effectively ignoring the first, and then begin iterating over everything thereafter. And so what happens now is, if I re-maximize my window, rerun python of favorites.py, Enter, and now scroll up again to the beginning of this incarnation of the program, you'll see that the very first thing I see after my program was run is indeed Python, Python, Python, Python, and so forth. No more quote-unquote language. So how is that? Well, this is a feature we haven't quite seen before or talked about in much detail, but this reader is stateful in some sense. And this was actually true of all of the file I/O we did in C, whereby when you were using fread or some other function to read data from a file, something was remembering where it was in the file, so that you didn't get the same bytes again and again and again.
It was more like an old-school cassette tape, if you will, or the scrubber along the bottom of any streaming video, whereby when you read some data, it grabs the next chunk, the next chunk, the next chunk, and something inside of the computer's memory remembers where it is. So this says skip to the next row. And thus, when you do for row in reader, you get everything but the first row, because the reader is stateful: it remembers where it is in memory. All right. Well, thus far this isn't all that useful, because all I'm doing is just printing out the data. But let's take a step toward making this program a little more useful. In particular, let's just be a little more pedantic and specify that what I'm really doing inside of this loop is figuring out what the current row's favorite is. So I'm going to create a variable called favorite and set that equal to row bracket 1. And then, even though this doesn't change the functionality, I'm going to print that favorite, just because semantically, stylistically, it's nice to make clear what row bracket 1 is by defining a variable that tells me, or anyone else who reads this code in the future, what it actually represents. All right, but readers are only so useful. In fact, if I were to open up this CSV file, maybe in Microsoft Excel or Apple Numbers or Google Sheets, you could imagine someone moving the data around by just dragging one of the columns to the left or the right, such that now it's no longer timestamp, language, problem. Maybe it's timestamp, problem, language, or maybe timestamp is all the way over to the right.
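The header-skipping version described above can be sketched with next(reader), which consumes one row and leaves the reader positioned after it; that is the statefulness in action. The io.StringIO buffer and its rows are made-up stand-ins for the real file.

```python
import csv
import io

SAMPLE = "Timestamp,language,problem\nt1,Python,Hello\nt2,C,Mario\n"  # made-up rows

reader = csv.reader(io.StringIO(SAMPLE))
header = next(reader)  # consume the header row; the reader remembers its place
favorites = []
for row in reader:  # resumes right after the header
    favorite = row[1]  # name the value for readability
    print(favorite)
    favorites.append(favorite)

# The reader is now exhausted: iterating again yields nothing.
leftover = list(reader)
```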
You could imagine, therefore, that the indices we're using, 0, 1, and 2, could be a little fragile, because if someone changes the data on me, my code is just going to break: I am blindly assuming that the second column, aka bracket 1, is going to be the language column, but that might not be the case. There's an alternative to this, though, and you might recall having seen it before. I'm going to go into favorites.py and tweak my code a little bit to use not just a reader but a dictionary reader. So I'm going to change this to DictReader instead of just reader. And the upside of using a dictionary reader is that every time I go through this loop, reading row by row by row, each row that I'm handed by this reader is not going to be a list anymore that's numerically indexed with zeros and ones and twos. Each row is going to be, as you might guess, a dictionary, which is a collection of key-value pairs, which means now we can use words as our indices instead of just numbers. Which is to say, if I switch from reader, which gives me lists, to DictReader, which gives me dictionaries, I can change line 10 now and say I specifically want the language column, wherever it is: all the way to the left, in the middle, or on the right. So in general, using a dictionary reader is probably just going to be more robust, because it's resilient against changes in the actual ordering of the columns. All right, let me pause here to see first if there are any questions on this exercise, whose purpose in life is just to demonstrate how we can download the CSV data and then iterate over it line by line without actually analyzing it yet. No? Okay. So let's ask maybe the most natural question, which is: how many people prefer Python? How many people prefer C, or Scratch, in turn? In other words, how can we recreate in our own code what Google Forms is doing for us graphically with those pie charts? Well, I think what we could do is write some code that logically relies on this mental model.
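A sketch of the DictReader switch. To illustrate the robustness point, the made-up sample deliberately puts the language column last; row["language"] still finds it. No header-skipping is needed, because DictReader consumes the first row as its keys.

```python
import csv
import io

# Made-up rows with the columns in a different order than the lecture's file.
SAMPLE = "Timestamp,problem,language\nt1,Hello,Python\nt2,Mario,C\n"

favorites = []
for row in csv.DictReader(io.StringIO(SAMPLE)):
    favorite = row["language"]  # looked up by header name, not by position
    print(favorite)
    favorites.append(favorite)
```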
What I have here is an opportunity to use a bunch of key-value pairs. If I want to know how many instances of Python there are, and of C and Scratch, those might as well be three keys, and their values will be three numbers representing the popularity of each of those languages. So in memory, I essentially want to construct something that looks like this, as I would if I were drawing it on a chalkboard. And recall that this mental model maps perfectly to the notion of a Python dictionary, because a dictionary in Python is indeed key-value pairs. We've seen it already, because that's how the dictionary reader works. But we can certainly use our own dictionaries to solve this same problem ourselves. So the goal at hand is to count the number of people who said Python and C and Scratch, respectively. How to do this? Oh, actually, let me first delete this line. Because we're using a dictionary reader, we no longer need to skip the first row; it's automatically consumed by the dictionary reader for us. So this is now the better version, with the dictionary reader. Let me declare some variables first that will store the total number of people who said Python, Scratch, and C, respectively. I could say scratch equals 0, c equals 0, python equals 0, setting three variables to 0, 0, and 0. If you haven't seen it before, there are some Pythonic tricks you can do here. If you've got three variables that you want to initialize all at once, because it's that simple, you could alternatively write scratch, c, python = 0, 0, 0. This too would have the intended effect, and it looks a little better because it's a simple one-liner. But what do I want to do now? Well, down here, let's do a simple conditional before we enhance this with an actual dictionary.
Let me say: if the current favorite from that reader equals equals "Scratch", then let's increment the scratch variable by doing += 1, as we saw last time. Else if the favorite in the current row equals equals "C", let's increment the c variable by 1. Else if the favorite equals equals "Python", then let's increment python by 1 instead. I could technically get away with saying else here, but this time I'm consciously not trying to over-optimize. Suppose someone changes the form next semester and we're asking about a fourth language: I wouldn't want my code to assume that anything that isn't Scratch or C must be Python. So this is a little more robust, and in this case we'll just ignore anything that isn't Scratch or C or Python. All right, at the end of this, let's not print out each favorite. Instead, outside of the for loop, let's print out, for instance, the Scratch count, then the C count, and then the Python count. But of course, there's a subtle bug here. Yeah? Ah, I didn't format these as f-strings. So I need the little f to the left of each of these strings. All right, let me maximize my terminal window and run python of this version of favorites.py. Hopefully what we'll see is not every row again and again, but three lines of output giving me the total counts instead. All right, this seems to line up with the rough percentages that we saw coming in earlier on Google Forms: 109 of you like Python, followed by 58 of you with C, and 24 of you preferring Scratch. All right, but why might this rub you the wrong way?
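A minimal sketch of the three-variable version, using hypothetical sample data. One row names "SQL", so you can see that languages other than Scratch, C, and Python are ignored rather than silently counted as Python. The name count_languages is illustrative.

```python
import csv
import io

# Hypothetical sample; "SQL" stands in for a language the form didn't ask about.
SAMPLE = """Timestamp,language,problem
10/31/2025 13:00:01,Python,"Hello, World"
10/31/2025 13:00:02,C,Mario
10/31/2025 13:00:03,Python,Tideman
10/31/2025 13:00:04,Scratch,Scratch
10/31/2025 13:00:05,SQL,50ville
"""


def count_languages(file):
    """Tally Scratch, C, and Python with one variable apiece."""
    scratch, c, python = 0, 0, 0  # tuple assignment initializes all three at once
    for row in csv.DictReader(file):
        favorite = row["language"]
        if favorite == "Scratch":
            scratch += 1
        elif favorite == "C":
            c += 1
        elif favorite == "Python":
            python += 1
        # Any other language is deliberately ignored rather than counted as Python.
    return scratch, c, python


scratch, c, python = count_languages(io.StringIO(SAMPLE))
print(f"Scratch: {scratch}")
print(f"C: {c}")
print(f"Python: {python}")
```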
I already alluded to the fact that we're going to get rid of this, but why is using three variables like this not the best design? Yeah? >> Different categories. >> Yeah, exactly. If we were to add a bunch more languages, a fourth, a fifth, a sixth, a tenth, a twentieth, having that many variables is going to look unwieldy, and it shouldn't rub you the right way. At that point, we should really graduate to some proper data structure: an array in C or, better still in Python, an actual dictionary. So let's do that instead. In a newer version of this file, let's get rid of these individual variables and instead have a generic variable called counts, for instance, set equal to an empty dictionary. Just using two curly braces will give me an empty dictionary. Or if you want to be more pedantic, you can call the dict function, which will return an empty dictionary. I'd argue, though, that most people would just use the pair of curly braces to indicate that here comes a dictionary. Now, how do I use this? Well, I don't need to update three separate variables. Once I've determined the current row's favorite language, I can say counts bracket favorite. That is, use the current string as an index into the dictionary: it's going to be "Scratch" or "C" or "Python". Then just increment that by 1. And down here, we don't have those variables anymore, so instead I'll use a loop: for each favorite in counts, print out the favorite and the count thereof, this time without any f-string formatting. Okay.
So the only thing that's different is that I'm using a dictionary, which is essentially the code version of this two-column chart. Its keys are going to be the favorite strings, "Scratch" or "C" or "Python", and its values are going to be the actual counts. And I'm doing some simple math, incrementing the count with += each time I see a certain language. Unfortunately, this code is not quite going to work. Let me run python of favorites.py and, dang it, there's a KeyError. Let me minimize the terminal window so we can see both at once. Why is there a KeyError, apparently on line 11, where I'm indexing into the counts dictionary? What's going on? Yeah? >> The key doesn't already exist. >> Yeah, it's a little subtle. The very first time through the file, there is no key "Python". There's no key "C" or "Scratch", because no one has put them there. And yet recall that += means you're going to that location in the dictionary and blindly incrementing it. But what's there? You might think it's effectively a garbage value, but it's not even that, because there's no actual key there. So we need a little bit of logic, and we can solve this in a couple of ways. I could say something very pedantic like this: if this favorite is in the counts dictionary, which is the Pythonic way to ask whether a key is in a dictionary, then it's safe to increment it, just as I've done before. But if it's not, I want to set counts[favorite] equal to 1 instead. Either I want to increment the current count by 1, or this is logically the first time I've seen this favorite, so I want to set it to 1. We could do this a different way logically, just as in C we could solve problems differently. I could instead say something like this.
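The sketch below covers the dictionary version with the explicit in check described above, using hypothetical sample data. The last few lines reproduce the KeyError you get when you increment a key that was never set.

```python
import csv
import io

# Hypothetical sample data standing in for favorites.csv.
SAMPLE = """Timestamp,language,problem
10/31/2025 13:00:01,Python,"Hello, World"
10/31/2025 13:00:02,C,Mario
10/31/2025 13:00:03,Python,Tideman
10/31/2025 13:00:04,Scratch,Scratch
"""


def count_languages(file):
    """Count each language using a dictionary instead of separate variables."""
    counts = {}
    for row in csv.DictReader(file):
        favorite = row["language"]
        if favorite in counts:
            counts[favorite] += 1  # safe: the key already exists
        else:
            counts[favorite] = 1  # first time seeing this language
    return counts


counts = count_languages(io.StringIO(SAMPLE))
for favorite in counts:
    print(favorite, counts[favorite])

# Without the check, the very first increment fails because the key is missing.
empty = {}
try:
    empty["Python"] += 1
except KeyError as error:
    print("KeyError:", error)
```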
I could get rid of all this code and just say: if favorite not in counts, then set counts[favorite] equal to 0. So always initialize it to 0 if it's not there. Now I can safely, blindly update the count by 1, because I know that no matter what, once I get to line 13, that key is actually there. All right, let's see this version of the code. Let me clear my terminal window, rerun python of favorites.py, cross my fingers, and there we go: Python and Scratch and C. Interestingly, the order switched around this time, based on the order in which I was inserting things into the dictionary, but we'll see how we can exercise a bit more control over that. Let me propose, though, that we revisit that KeyError. Recall that we discussed briefly last week that whenever you have these kinds of tracebacks that refer to certain exceptions, exceptionally bad situations that can happen, you can also change your code to try to do something and then catch the exception. So an alternative to what we initially did would be this. Instead of blindly going into the counts dictionary, indexing into it at the favorite key, and incrementing by 1, we could try to do that, except if there's a KeyError, in which case we initialize that value to 1 instead. So in short, there are already several ways to solve the same problem, and whichever you prefer is quite reasonable. This is just another, arguably Pythonic, way to do things: try to do something, but anticipate that something can in fact go wrong. >> A while ago, you removed... >> A while ago, what? >> You removed next(reader). >> Correct. A while ago I removed next(reader), because it was only necessary for the plain CSV reader, which hands back every row, header included, as data.
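The two alternatives just described can be written side by side as small functions over a hypothetical list of favorites. The function names are illustrative. Both produce the same counts. The final loop shows one way to control the output order. Python dictionaries preserve insertion order (guaranteed since Python 3.7), so sorting by count has to be explicit.

```python
def count_with_default(languages):
    """Initialize a missing key to 0, then increment unconditionally."""
    counts = {}
    for favorite in languages:
        if favorite not in counts:
            counts[favorite] = 0
        counts[favorite] += 1
    return counts


def count_with_try(languages):
    """Attempt the increment and handle the missing-key case as an exception."""
    counts = {}
    for favorite in languages:
        try:
            counts[favorite] += 1
        except KeyError:
            counts[favorite] = 1
    return counts


# Hypothetical list of favorites, as if already read from the CSV.
languages = ["Python", "C", "Python", "Scratch"]
print(count_with_default(languages))
print(count_with_try(languages))

# Dictionaries keep insertion order, so sort explicitly to list the most popular first.
counts = count_with_try(languages)
for favorite in sorted(counts, key=counts.get, reverse=True):
    print(favorite, counts[favorite])
```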
But when you use a CSV dictionary reader, it automatically consumes the first row, because that's how the dictionary reader knows what the columns are called, so you don't have to skip over it yourself. A nice enhancement. Other questions on what we've just done here? All right, so let me propose that writing this amount of code is kind of annoying just to ask a relatively simple question like, what's the most popular language in this file? It's sort of a step backwards from Google Sheets and Apple Numbers and Microsoft Excel, where you could just highlight the column and it would tell you the answer, usually in the bottom right-hand corner. Or you could use a function in one of those spreadsheet tools to ask the same question. So with almost 20 lines of code, it's starting to feel like maybe there's a better way. And I dare say there is. Rather than use a flat-file database, let's graduate already to what the world calls a relational database. A relational database is simply one in which you define relations among your data. That isn't so relevant now, except that this timestamp is associated with this language, which is associated with this favorite problem. But we'll see that data sets can be much larger and more complicated, and it might be valuable to express relationships across multiple pieces of data. In particular, let's introduce a programming language called Structured Query Language, or SQL for short, aka "sequel". SQL essentially has only four fundamental operations. So even though we're transitioning into a new language, by the end of today we're going to transition out of it, because there's only so much you can do. Now, as with any language, it's going to take time and practice to get the hang of it. But take comfort in knowing that SQL really supports just four fundamental operations.
And the acronym the world uses is CRUD, which stands for create, read, update, and delete. That is to say, when using a relational database, you can create data, read data, update data, or delete data, and that's a pretty comprehensive list of what's possible. Now, what is an actual database? Generally speaking, a database is just a piece of software running on a computer somewhere, inside of which is stored a whole lot of data. That database provides you with access to that data at any time. It might be on your local Mac or PC, somewhere in the cloud, or serving a whole cluster of web servers, which we'll talk about in the weeks to come as we transition from command-line tools to the web. Now, technically, the SQL commands you actually use to create, read, update, and delete data are almost the same as those four words. But for whatever reason, the world chose the command SELECT for reading data. So we'll soon see that there's a command in SQL that lets us select data, which is equivalent to reading it. The other three operations refer, of course, to writing data, that is, changing data. Technically speaking, we'll be able to INSERT data into a database, as we'll soon see. We'll also be able to DROP data altogether: not just delete individual rows, but whole tables of rows. So what does this all mean? Let's do an example of using SQL to ask some relatively simple questions and begin to develop some muscle memory for this new language. If I were to manually load a bunch of data into a proper SQL database, I would use code like this. I would literally type CREATE TABLE, then come up with the name of the table, aka sheet, and then specify every column I want in that table. And here's where the vernacular changes.
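One way to try the four CRUD operations without a separate database file is Python's built-in sqlite3 module with an in-memory database. The sketch below assumes a favorites table with three TEXT columns and uses a single hypothetical row. UPDATE is included to complete CRUD, although this part of the lecture does not demonstrate it.

```python
import sqlite3

# In-memory database for illustration; the lecture uses a file named favorites.db.
db = sqlite3.connect(":memory:")

# Create a table (the database analogue of a sheet) with three text columns.
db.execute("CREATE TABLE favorites (Timestamp TEXT, language TEXT, problem TEXT)")

# Create a row: SQL's verb for this is INSERT.
db.execute("INSERT INTO favorites (language, problem) VALUES ('Python', 'Mario')")

# Read: SQL's verb for this is SELECT.
after_insert = db.execute("SELECT language, problem FROM favorites").fetchall()
print(after_insert)

# Update existing rows in place.
db.execute("UPDATE favorites SET language = 'C' WHERE problem = 'Mario'")
after_update = db.execute("SELECT language, problem FROM favorites").fetchall()
print(after_update)

# Delete rows; DROP TABLE favorites would remove the whole table instead.
db.execute("DELETE FROM favorites WHERE problem = 'Mario'")
remaining = db.execute("SELECT COUNT(*) FROM favorites").fetchone()[0]
print(remaining)
db.close()
```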
Whereas in the world of spreadsheets you have sheets, or tabs, that contain rows and columns, in the world of databases you have tables, which are just rows and columns. It's different terminology, but conceptually it's the same thing. In CS50, we're going to use a specific version of SQL known as SQLite. It's a lightweight version of SQL that's actually very commonly used in web applications and mobile applications. But it doesn't have all of the bells and whistles, or all of the scalability, of Oracle, SQL Server, Microsoft Access, Postgres, or MySQL. Those are just product names, open-source and commercial alike, and if you've heard of them, they just represent bigger, faster SQL databases. We'll use the lightweight version known as SQLite. And the command we're going to run is quite literally sqlite3, which is version 3 of the program, and which we've preinstalled in your codespace for you. So let's do this. Let me run sqlite3, which is going to let me create my very first SQLite database. I'm going to import into that database the CSV file that we downloaded from Google Forms. In other words, I'm going to load that same data set into a different program, an actual database, so that I can use a completely different programming language to ask questions about it, instead of writing Python code as we just did. So let me go back into VS Code here, close my CSV file and my Python file, and reopen my terminal window. Then I'll run sqlite3, space, and the name I want to give this database, which by convention will be favorites.db, for database. Enter. I'm prompted to confirm that I want to create this new file. Y for yes. Enter. And now I'm inside of the database, at a prompt that says sqlite followed by an angle bracket.
I'm not going to use any .sql files for now, although you can write SQL code in separate text files. Instead, I'm going to use the database's interactive interpreter to run all of the commands I want by typing them out: type it out, semicolon, Enter; type it out, semicolon, Enter; back and forth. But you can save all of these commands in files as well, as you'll see in problem set 7. Now, how do I actually import that CSV file into this lightweight database? For this, I'm going to execute three commands. Any command in SQLite that starts with a dot is specific to SQLite, this lightweight version of SQL. Anything that doesn't start with a dot is generalizable and will work on most any SQL database anywhere in the world, no matter the product you're using. So in my SQLite terminal, I'm going to change my mode to CSV with .mode csv, just to tell the database that I want to load some CSV data. I'm then going to literally import that data with .import from a file called favorites.csv, which is the file we downloaded earlier and then uploaded to my codespace. And now I have to specify the name of a table, so I'm going to call this table, aka sheet, favorites, just to keep everything consistent. And that's it. In the absence of an error message, everything probably worked fine. I'm going to type .quit, which quits out of SQLite. Now if I type ls, you'll see favorites.csv, which I uploaded, and favorites.py, which we wrote a few minutes ago. But I also now have favorites.db, which is a database version of that same file. Now, I can't actually see what's inside of it. If I run code favorites.db, I'm going to see "This file is not displayed in the text editor because it is either binary or uses an unsupported text encoding."
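The .mode csv and .import dot-commands belong to the sqlite3 shell, not to SQL itself. The sketch below does roughly the same job from Python: it turns the header row into TEXT columns and inserts the remaining rows. It is a hypothetical equivalent for illustration, not how .import is implemented. The helper name import_csv and the sample data are assumptions.

```python
import csv
import io
import sqlite3

# Hypothetical stand-in for favorites.csv.
SAMPLE = """Timestamp,language,problem
10/31/2025 13:00:01,Python,"Hello, World"
10/31/2025 13:00:02,C,Mario
10/31/2025 13:00:03,Scratch,Scratch
"""


def import_csv(db, file, table):
    """Roughly mimic .mode csv + .import: header row -> TEXT columns, rest -> rows."""
    reader = csv.reader(file)
    header = next(reader)
    # Names are interpolated into the SQL, so this is only safe for trusted headers.
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    db.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({columns})')
    placeholders = ", ".join("?" for _ in header)
    db.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)
    db.commit()


db = sqlite3.connect(":memory:")  # use "favorites.db" instead to get a file on disk
import_csv(db, io.StringIO(SAMPLE), "favorites")
print(db.execute("SELECT COUNT(*) FROM favorites").fetchone()[0])
```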
This is to be expected. This database is stored essentially in the form of 0s and 1s that the sqlite3 program knows how to read, but it's not something VS Code can simply display. And generally, storing data in binary is going to be more efficient than storing it purely as text, because we'll be able to use the various data structures and algorithms we've been talking about for weeks more easily on that binary data. All right, so let's see what this import command did. I'm going to maximize my terminal window again and run sqlite3 again, passing in favorites.db. Enter. This time it already exists, so it just opens it without prompting me. And now I'm going to type another SQLite-specific command, .schema. The schema of a database is just its design. What does it look like? What are the tables, rows, and columns therein? So if I type .schema, I see this SQL command: CREATE TABLE IF NOT EXISTS "favorites", which is the name of the table. Then, in parentheses, there are apparently three columns: the first called "Timestamp", the next called "language", and the third called "problem". And each of those columns is of type TEXT. We'll soon see that columns don't have to be just text, but this is the default table that SQLite creates for me when I use the import command. Soon we'll see that I can exercise more control, especially over the types of data I'm putting in this database. But what's really nice about the import command is that it could not be easier to convert a CSV file to a SQLite database. Now, as we're about to see, we can use SQL on it instead of Python or any other language. Okay. So how do we go about getting data out of this database? The first command we'll explore is the one called SELECT.
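Outside the sqlite3 shell, you can get roughly the same information that .schema prints by querying SQLite's built-in sqlite_master table, which stores each table's CREATE statement. PRAGMA table_info lists each column's name and declared type. The sketch recreates a table shaped like the one .import produced. SQLite may normalize the stored statement, so the printed text can differ slightly from what the shell shows.

```python
import sqlite3

db = sqlite3.connect(":memory:")
# A table shaped like the one .import created: three TEXT columns.
db.execute('CREATE TABLE "favorites" ("Timestamp" TEXT, "language" TEXT, "problem" TEXT)')

# SQLite keeps each table's CREATE statement in its sqlite_master table.
schema = db.execute(
    "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = 'favorites'"
).fetchone()[0]
print(schema)

# PRAGMA table_info yields one row per column: (cid, name, type, notnull, default, pk).
columns = [(row[1], row[2]) for row in db.execute("PRAGMA table_info(favorites)")]
print(columns)
```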
Selecting data means reading data from the database. And in this sense, SQL is a declarative language: I'm just going to declare what data I want to select from the database. I'm not going to worry anymore about opening the file, iterating over it with a for loop or a while loop, defining variables, or the like. I'm just going to state syntactically what I want. So let me go back to SQLite here and clear my terminal to get rid of the past commands. And let's do the first of these: SELECT * FROM favorites; and I regret to say that the semicolon is back for the SQL code we're now writing. Enter. We see a sort of ASCII-art version of all of the data that was imported into this table, which is even nicer than the raw CSV file. So SELECT * FROM favorites is apparently selecting everything. The star in this context is a wildcard of sorts that represents all of the columns in the table, and the table itself is called favorites. So I'm selecting all of the columns from the table called favorites. And here you have it, in simple ASCII art: first column, second column, third column. The rows are listed chronologically, because that's exactly how they were loaded into the database. All right, so if star is a wildcard, what more can we do? If you don't care about all of the columns, you can be a little more specific. I could instead say SELECT language FROM favorites; Enter. Now I have just a single column of data, with one cell for every submission, but not the timestamp or the favorite problem that person entered. Or I can declare that I want a couple of columns: SELECT language, problem FROM favorites; since I don't care about the timestamp. And now I get two columns instead.
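The three SELECT variants just shown can be run from Python's sqlite3 module against a small hypothetical table. Each query returns a list of tuples, one per row. The test compares sorted results, because SQL only guarantees row order when a query includes ORDER BY.

```python
import sqlite3

# Hypothetical rows standing in for the imported survey data.
ROWS = [
    ("10/31/2025 13:00:01", "Python", "Hello, World"),
    ("10/31/2025 13:00:02", "C", "Mario"),
    ("10/31/2025 13:00:03", "Scratch", "Scratch"),
]

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE favorites (Timestamp TEXT, language TEXT, problem TEXT)")
db.executemany("INSERT INTO favorites VALUES (?, ?, ?)", ROWS)

# * is a wildcard for every column.
everything = db.execute("SELECT * FROM favorites").fetchall()

# Name one column, or several separated by commas.
languages = db.execute("SELECT language FROM favorites").fetchall()
pairs = db.execute("SELECT language, problem FROM favorites").fetchall()

for row in pairs:
    print(row)
```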
So in short, rather than write the dozen or so lines of Python we wrote earlier to open the file and iterate over it with a reader, we just select the data we want from this here database. But even more powerfully, SQL comes with a whole bunch of built-in functions, much like the spreadsheet software you and I already use in the real world, such as Excel and Numbers and Google Sheets. SQLite comes with AVG, COUNT, DISTINCT, LOWER, MAX, MIN, UPPER, and so forth. There's a whole list of them, and we'll play around with just a couple. Let me go back into VS Code and clear my SQLite terminal. Suppose I just want the total number of rows in the favorites table. In other words, how many submissions did I end up with at the moment I downloaded the file, even if not everyone had quite buzzed in yet? I could say SELECT COUNT(*) FROM favorites; and now I get back a single cell telling me that 272 submissions had come in when I downloaded that file. Suppose I want to confirm that no one submitted bogus data. Which languages were actually entered? I can select only the distinct languages from the favorites table, and now I get a unique list of the languages everyone buzzed in with, irrespective of how many times. What if I want to know how many distinct languages there are, in case it's not as obvious as three here? I can SELECT COUNT(DISTINCT language) FROM favorites; and it tells me the answer: 3 is the total number of distinct languages in those submissions. Again, it's easy enough to eyeball this. But very quickly, with single statements that read sort of like English, left to right, I can select the answers I want to some of these questions. Well, what more can SQL do?
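The following sketch runs COUNT(*), DISTINCT, and COUNT(DISTINCT ...) on five hypothetical rows. The DISTINCT query adds ORDER BY so that its output order is predictable.

```python
import sqlite3

# Hypothetical survey rows.
ROWS = [
    ("10/31/2025 13:00:01", "Python", "Hello, World"),
    ("10/31/2025 13:00:02", "C", "Mario"),
    ("10/31/2025 13:00:03", "Python", "Tideman"),
    ("10/31/2025 13:00:04", "Scratch", "Scratch"),
    ("10/31/2025 13:00:05", "C", "Hello, World"),
]

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE favorites (Timestamp TEXT, language TEXT, problem TEXT)")
db.executemany("INSERT INTO favorites VALUES (?, ?, ?)", ROWS)

# Total number of rows.
total = db.execute("SELECT COUNT(*) FROM favorites").fetchone()[0]

# Each language once, regardless of how often it appears.
distinct = [
    row[0]
    for row in db.execute("SELECT DISTINCT language FROM favorites ORDER BY language")
]

# How many different languages there are.
how_many = db.execute("SELECT COUNT(DISTINCT language) FROM favorites").fetchone()[0]

print(total, distinct, how_many)
```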
Here are a bunch of other keywords we can add to our SQL commands to further control what kind of data we get back. We can GROUP BY similar values. We can check not just for string equality but for fuzzy matching with LIKE, checking whether something resembles a string we're looking for. We can LIMIT the total number of rows coming back. We can ORDER BY, or sort, the data by a certain column. And we can have predicates, so to speak, using WHERE, which is similar in spirit to an if condition but more succinctly written. So, for instance, let me go back to VS Code, clear my terminal again, and find out how many of you answered C as your favorite language. Rather than selecting all of the counts again, let's just hit the nail on the head: SELECT COUNT(*) FROM favorites WHERE language = 'C'; and I get back a simple answer. 58 of you buzzed in with C. How many of you liked both C and, very specifically, the problem called Hello, World, if that was the extent of your passion for code? Let's SELECT COUNT(*) FROM favorites WHERE language = 'C' AND problem = 'Hello, World'; and it looks like five of you said your favorite language was C and your favorite problem was Hello, World. Great. All right, so it's getting a little more interesting. What about the other version of hello, world, the one we called Hello, It's Me? That one's interesting, because I think it's going to break my convention of using single quotes. Whenever you write a raw string in SQL, single quotes are the norm. But let's type this out: SELECT COUNT(*) FROM favorites WHERE language = 'C'.
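A minimal sketch of WHERE on its own and WHERE combined with AND, run against hypothetical rows. The string literals sit inside the SQL with single quotes, as in the lecture.

```python
import sqlite3

# Hypothetical survey rows.
ROWS = [
    ("10/31/2025 13:00:01", "C", "Hello, World"),
    ("10/31/2025 13:00:02", "C", "Mario"),
    ("10/31/2025 13:00:03", "Python", "Hello, World"),
]

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE favorites (Timestamp TEXT, language TEXT, problem TEXT)")
db.executemany("INSERT INTO favorites VALUES (?, ?, ?)", ROWS)

# WHERE keeps only the rows for which the condition is true.
c_count = db.execute(
    "SELECT COUNT(*) FROM favorites WHERE language = 'C'"
).fetchone()[0]

# AND requires both conditions to hold.
c_hello = db.execute(
    "SELECT COUNT(*) FROM favorites WHERE language = 'C' AND problem = 'Hello, World'"
).fetchone()[0]

print(c_count, c_hello)
```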
And the problem this time equals 'Hello, It's Me'. At a glance, this is probably going to confuse sqlite3. Does that middle apostrophe belong to the first quote or the second? This is ambiguous. In C, we would solve this problem by putting a backslash in front of it, a so-called escape character. Different languages have different conventions, and this one's a little weird: in SQLite, what you instead do is double the single quote. So putting two single quotes in a row is the convention for escaping a single quote. It's just one of those things you've got to remember, or look up in the real world if you forget. Enter. Now I get back 0. So it was not the case that any of you liked both C and that problem specifically. What if we want to be a little more inclusive of either hello problem? Just like in my codespace's terminal, I can press up and down to move through my history, and the same goes for SQLite. So I can go back two commands to get up here and write something longer: where the problem is 'Hello, World' or the problem equals 'Hello, It''s Me', with the doubled apostrophe, semicolon... oh, and parentheses. It's wrapped onto two lines here, so it's a little messy. But I'm just logically saying: where you buzzed in with C as your language and a problem of Hello, World or a problem of Hello, It's Me. Enter. It's the same answer as before, because none of you liked Hello, It's Me. But I chose this syntax because I can actually make it a little cleaner. I can delete this whole parenthetical and just say WHERE language = 'C' AND problem LIKE 'Hello, %'; with a percent sign inside the single quotes. This is a little weird too; it's just how SQL does it. Previously, I was using an equal sign to check for literal string equality, literally those problem names. LIKE, by contrast, allows me to use wildcards.
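The sketch below runs the three queries just described against hypothetical rows: a doubled single quote, the parenthesized OR, and LIKE with the % wildcard. The last query shows a Python-side alternative that the lecture doesn't cover. With a ? placeholder, the sqlite3 module passes the value for you, so you don't escape the apostrophe by hand.

```python
import sqlite3

# Hypothetical survey rows.
ROWS = [
    ("10/31/2025 13:00:01", "C", "Hello, World"),
    ("10/31/2025 13:00:02", "Python", "Hello, It's Me"),
    ("10/31/2025 13:00:03", "C", "Mario"),
    ("10/31/2025 13:00:04", "C", "Hello, World"),
]

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE favorites (Timestamp TEXT, language TEXT, problem TEXT)")
db.executemany("INSERT INTO favorites VALUES (?, ?, ?)", ROWS)


def count(sql, params=()):
    """Run a COUNT(*) query and return the single number it produces."""
    return db.execute(sql, params).fetchone()[0]


# Two single quotes in a row stand for one literal apostrophe.
c_me = count("SELECT COUNT(*) FROM favorites WHERE language = 'C' AND problem = 'Hello, It''s Me'")

# Parentheses group the OR so that AND applies to both alternatives.
c_either = count(
    "SELECT COUNT(*) FROM favorites WHERE language = 'C' "
    "AND (problem = 'Hello, World' OR problem = 'Hello, It''s Me')"
)

# % matches zero or more characters, so this catches both hello problems.
c_like = count("SELECT COUNT(*) FROM favorites WHERE language = 'C' AND problem LIKE 'Hello, %'")

# Placeholder alternative: no manual escaping needed.
any_me = count("SELECT COUNT(*) FROM favorites WHERE problem = ?", ("Hello, It's Me",))

print(c_me, c_either, c_like, any_me)
```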
And it's not quite the same kind of wildcard as the asterisk we used earlier. When you use a wildcard in a string in SQL, you write a percent sign to represent zero or more characters. So 'Hello, %' is going to match either problem that starts with "Hello," followed by a space. Let me hit Enter. The answer is still the same, but it demonstrates how you can express yourself a little more generally if you want a pattern match like that. Questions now on any of these techniques? Yeah? >> Does capitalization matter? >> Good question. Does the capitalization have to match when checking string equality? Yes, but not with LIKE. LIKE is case-insensitive, at least for ASCII letters by default in SQLite, so uppercase or lowercase will match. >> But what about COUNT and everything? >> Oh, I see. Good question. Stylistically, you uppercase your SQL keywords. That's a convention certainly for CS50, and also for a lot of companies and communities in the world. It makes the keywords stand out from the words you and I chose, like the name of the table or the names of the columns therein. This is just a convention. I'd propose, as always, that you be consistent, but for CS50's and style50's sake, I'd propose that you indeed capitalize like this. And frankly, it just makes the code easier to read to my eye, because the SQL keywords jump out, and the lowercase words are specific to your data set. A good question. All right, how about another set of keywords we saw on the screen earlier, namely GROUP BY? Well, suppose we have a data set like this. Happy Halloween. Here's just an excerpt from that table. As languages go, say one of you liked C and three of you liked Python. And now that we're introducing SQL, let's imagine that two of you now like SQL even better.
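A short check of the case-sensitivity answer above, using one hypothetical row. In SQLite, = compares strings exactly, while LIKE ignores case for ASCII letters by default. Wrapping a column in LOWER, one of the built-in functions listed earlier, is another way to compare case-insensitively.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE favorites (Timestamp TEXT, language TEXT, problem TEXT)")
# One hypothetical row with mixed-case text.
db.execute("INSERT INTO favorites VALUES ('10/31/2025 13:00:01', 'C', 'Hello, World')")


def count(where):
    """Count rows matching a WHERE clause (the clause is trusted, hard-coded text)."""
    return db.execute(f"SELECT COUNT(*) FROM favorites WHERE {where}").fetchone()[0]


exact = count("problem = 'hello, world'")            # = is case-sensitive
like = count("problem LIKE 'hello, world'")          # LIKE ignores ASCII case by default
lowered = count("LOWER(problem) = 'hello, world'")   # normalize case explicitly

print(exact, like, lowered)
```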
So that's the extent of the data set. Wouldn't it be nice to figure out how many of you like C or Python or SQL? I could write some Python code, open the file, and iterate over it using variables or a dictionary, the 20 or so lines of code we wrote earlier, to answer this question. But wouldn't it be nicer to just ask SQL to figure out how many of you like C, how many like Python, and how many like SQL? We can do this by grouping these cells by common values. Group all of the Python rows together, all of the SQL rows together, and, even though there's just one, all of the C rows as well. So how can we do this? Let me go back to VS Code here and clear my terminal. Let's SELECT each language and its respective count, COUNT(*), FROM favorites, but before doing any of that, GROUP BY language. This one takes a little more practice to get used to. It's simply saying: look at the languages, group all of the identical languages together, and then tell me the count for each group of rows. If I hit Enter here, we get an answer just like the Python code that took me 20 lines to write earlier. What's really happening inside the database is something like this. Notice that there's only one row with C, three rows with Python, and two rows with SQL. The table I'm essentially building groups all of those by identical values and then spits out the total counts. On the screen here, it's just 1, 3, and 2. In the actual data set, with some 200-plus responses, we get much larger answers, with Scratch in place of SQL. But this shows just how much more convenient SQL is for a question like that, especially if the data set is more than a couple of hundred rows.
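The excerpt just described can be reproduced in a few lines: one C row, three Python rows, and two SQL rows. The timestamps and problem names are hypothetical. GROUP BY collapses identical languages into one row each, and COUNT(*) then counts the rows in each group.

```python
import sqlite3

# Hypothetical excerpt: one C, three Python, two SQL.
ROWS = [
    ("10/31/2025 13:00:01", "C", "Mario"),
    ("10/31/2025 13:00:02", "Python", "Hello, World"),
    ("10/31/2025 13:00:03", "SQL", "50ville"),
    ("10/31/2025 13:00:04", "Python", "Tideman"),
    ("10/31/2025 13:00:05", "Python", "Mario"),
    ("10/31/2025 13:00:06", "SQL", "50ville"),
]

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE favorites (Timestamp TEXT, language TEXT, problem TEXT)")
db.executemany("INSERT INTO favorites VALUES (?, ?, ?)", ROWS)

# One output row per distinct language, with the size of that group.
groups = dict(
    db.execute("SELECT language, COUNT(*) FROM favorites GROUP BY language").fetchall()
)
print(groups)
```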
Suppose your boss in the real world has a CSV data set and wants you to analyze it. You can literally download it, import it into SQLite, run one command, and boom, you've got this analysis done, if the extent of it is just to group the data and figure out what counts you have. All right, what else can we do? Let me go back into VS Code and propose that we order those results differently from the default. So let's select the language and the count from the favorites table yet again, and group by language yet again. But this time, ORDER BY COUNT(*) DESC, that is, in descending order. It's a bit more of a mouthful, and it takes some practice to memorize all of the syntax. But when I hit Enter now, I get back the same answers, with Python at the very top of the list. Now, COUNT(*) isn't all that self-explanatory as a column name, and it's a little annoying that I have to write out COUNT(*) at the end as well as at the beginning. So it turns out SQL also supports aliases. If you want to change the temporary name of the column to something else, like n for number, you can define an alias with the keyword AS. Then I write ORDER BY n at the end of the statement, hit Enter, and get back the same results. And in case it's not clear already, each of these SQL SELECT commands is essentially giving me back a temporary table. It isn't saved anywhere; it's gone from the computer's memory once I've gotten my answer. But it's essentially returning a subset of the tables that do exist in the database, because that's what the import command did for me: it loaded the whole data set into the database. And now I get these temporary tables containing just the answers to the questions I care about.
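Here is a sketch of ORDER BY ... DESC, first with COUNT(*) written out twice and then with the alias n, on the same hypothetical excerpt. The cursor's description attribute confirms that the alias renames the result column.

```python
import sqlite3

# Same hypothetical excerpt: one C, three Python, two SQL.
ROWS = [("C",), ("Python",), ("SQL",), ("Python",), ("Python",), ("SQL",)]

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE favorites (Timestamp TEXT, language TEXT, problem TEXT)")
db.executemany("INSERT INTO favorites (language) VALUES (?)", ROWS)

# Sort the groups by their counts, largest first.
by_count = db.execute(
    "SELECT language, COUNT(*) FROM favorites GROUP BY language ORDER BY COUNT(*) DESC"
).fetchall()

# AS gives the count column a temporary name that ORDER BY can reuse.
cursor = db.execute(
    "SELECT language, COUNT(*) AS n FROM favorites GROUP BY language ORDER BY n DESC"
)
aliased = cursor.fetchall()
column_names = [column[0] for column in cursor.description]

print(by_count)
print(column_names, aliased)
```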
And if you only care about the top language, there's a LIMIT keyword, too. I can literally just say LIMIT 1 at the end of that exact same statement. Enter. And now I've got a single answer to my question: a single row saying Python was the most popular, with 190 people selecting it. All right, for now I think that's enough on SELECT. There are a few more keywords, but it really is just a matter of composing these building blocks. Questions, though, on these capabilities fundamentally? All right. Well, how about inserting data instead? Here might be the canonical way to insert a row into a table in SQL. You literally say INSERT INTO, then the name of the table, then in parentheses the one or more columns for which you have data, then literally the word VALUES, and then, in another set of parentheses, a comma-separated list of the one or more values that you want to insert into those there columns. So for instance, let me go back into VS Code here. Of course, at the time we circulated this form a few minutes ago, we had not yet assigned Problem Set 7. But in Problem Set 7 is a problem called 50ville, which, let's propose, might very well be someone's favorite in a week. So let's go ahead and insert that row now, preemptively. Let's insert into the favorites table two columns, language and problem. Why only two? Well, I don't really care to figure out what the timestamp is and the format thereof, so I'm just going to omit the timestamp altogether. The values I'm going to insert for this new row are 'SQL' and '50ville', close parenthesis, semicolon, Enter. Nothing bad seems to have happened. Let me go ahead and SELECT * FROM favorites just to see what my data set looks like now. And indeed, at the bottom of the table there is that new row. But what's noteworthy is that its timestamp isn't just blank.
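Here is a minimal sketch of both steps from this passage, assuming a hypothetical three-column favorites table in an in-memory database. First, LIMIT 1 keeps only the most popular language. Second, an INSERT that names only language and problem leaves the omitted timestamp as NULL, which Python's sqlite3 module hands back as None. The ? placeholders are the sqlite3 module's way of passing values in, not part of the SQL typed at the sqlite3 prompt.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE favorites (timestamp TEXT, language TEXT, problem TEXT)")
# Hypothetical existing responses.
db.executemany(
    "INSERT INTO favorites (timestamp, language, problem) VALUES (?, ?, ?)",
    [("t1", "Python", "Hello"), ("t2", "Python", "Mario"), ("t3", "C", "Mario")],
)

# LIMIT 1 keeps only the first row of the sorted result: the top language.
top = db.execute(
    "SELECT language, COUNT(*) AS n FROM favorites "
    "GROUP BY language ORDER BY n DESC LIMIT 1"
).fetchone()

# Insert only two columns; SQLite fills the omitted timestamp with NULL.
db.execute(
    "INSERT INTO favorites (language, problem) VALUES (?, ?)", ("SQL", "50ville")
)
new_row = db.execute(
    "SELECT timestamp, language, problem FROM favorites WHERE problem = '50ville'"
).fetchone()

print(top)      # ('Python', 2)
print(new_row)  # (None, 'SQL', '50ville'): NULL arrives in Python as None
```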
There's our old friend NULL, which is not a null pointer. It's literally the same word, N-U-L-L, but here it refers explicitly to the absence of data. And this is actually a nice feature. If any of you have ever used Google Sheets, Apple Numbers, or Microsoft Excel and looked at cells that are blank, what does it mean if a spreadsheet cell is blank? Does it mean there's literally no data there? Does it mean that you just don't have the data, or that it's missing in some form? How do you address that? Maybe you put N/A, in English for "not available," or something like that, but that's kind of hackish. And if you use N/A, that might mean that no one can actually type N/A as their answer. So what's nice about SQL, and database languages more generally, is that NULL signifies the conscious omission of data. It's not just a missing value; it's consciously not there. It's not the empty string, quote unquote, for instance. So we might see different examples of that. But what's nice now is that I can distinguish NULL from other values. And in fact, suppose I decide it's not a good idea to have any data in my data set that is NULL, for whatever reason: it just looks like bogus data, and it would be nice to know who inserted it and when. No problem. We can also delete data from a table in SQL: DELETE FROM the name of the table WHERE some condition is true. So for instance, if I want to delete that row, I can do this in a couple of ways, but perhaps the simplest is DELETE FROM favorites WHERE timestamp IS NULL, semicolon. IS, too, is another SQL keyword here, and that will delete only those rows where the timestamp is NULL. Enter. Let's do the same SELECT command as before. Enter. And voila, that row is now gone. Be very, very, very careful with DELETE statements. If I had foolishly run DELETE FROM favorites with no WHERE clause at all, want to guess what the result would be? It would delete everything.
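Why does this passage use IS NULL rather than = NULL? In SQL, comparing anything to NULL with = yields NULL rather than true, so an = NULL condition matches no rows at all. The sketch below checks that, then runs the targeted DELETE. It assumes a hypothetical two-row favorites table in an in-memory database, accessed through Python's sqlite3 module.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE favorites (timestamp TEXT, language TEXT, problem TEXT)")
db.executemany(
    "INSERT INTO favorites (timestamp, language, problem) VALUES (?, ?, ?)",
    [("t1", "Python", "Hello"), (None, "SQL", "50ville")],  # hypothetical rows
)

# = NULL evaluates to NULL (not true) for every row, so nothing matches.
equals_null = db.execute(
    "SELECT COUNT(*) FROM favorites WHERE timestamp = NULL"
).fetchone()[0]

# IS NULL is the test for the absence of a value.
is_null = db.execute(
    "SELECT COUNT(*) FROM favorites WHERE timestamp IS NULL"
).fetchone()[0]

# Delete only the rows whose timestamp is missing; rowcount reports how many.
deleted = db.execute("DELETE FROM favorites WHERE timestamp IS NULL").rowcount
remaining = db.execute("SELECT language FROM favorites").fetchall()

print(equals_null, is_null, deleted, remaining)  # 0 1 1 [('Python',)]
```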
And you can Google around and see actual articles about interns at companies who had way too much access to a company database executing something like DELETE FROM favorites because they forgot the predicate. They hit Enter too soon, and boom, all of the data is gone. So these are very destructive commands, and just like in the real world, if you don't have backups or versions of these same tables, the data can indeed be lost forever. So don't do that. Always have your WHERE, and make sure your WHERE is correct. All right. Well, let's claim that 50ville is going to be a really popular problem among students, so much so that it becomes everyone's favorite problem overnight. We can update the table as is. Here is the general syntax for updating rows in a table. You literally say UPDATE, the name of the table, the word SET, and then a bunch of key-value pairs: each column that you want to update, set equal to the value that you want to update it to, WHERE some condition is true. So what does this mean concretely? Let's say that we want to change everyone's favorite to SQL and 50ville. I could do this: UPDATE favorites SET language = 'SQL', problem = '50ville'; and this is where, again, it can be dangerous, but in this case I'm going to hit Enter without any predicate to filter this. Nothing bad seems to happen, but if I now do SELECT * FROM favorites; all of you would seem to like 50ville, and there is no going back to the previous version of the table unless I quit out of this and import the whole CSV again, maybe after deleting the data entirely. All right. So how do I get rid of all of the data? Well, if you want to DELETE FROM favorites for real now, Enter. SELECT * FROM favorites. We can confirm that that was a bad idea. There's literally no data in the database anymore, but we can certainly restore it from our actual CSV.
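The danger described here shows up in the number of rows each UPDATE touches. This sketch runs on a hypothetical three-row table through Python's sqlite3 module. A WHERE clause limits the change to the matching row, while the same kind of statement without a WHERE clause rewrites every row.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE favorites (language TEXT, problem TEXT)")
db.executemany(
    "INSERT INTO favorites (language, problem) VALUES (?, ?)",
    [("C", "Mario"), ("Python", "Hello"), ("SQL", "50ville")],  # hypothetical rows
)

# With a WHERE clause, only matching rows change.
targeted = db.execute(
    "UPDATE favorites SET problem = ? WHERE language = ?", ("50ville", "Python")
).rowcount

# Without a WHERE clause, every row in the table changes.
everything = db.execute(
    "UPDATE favorites SET language = 'SQL', problem = '50ville'"
).rowcount

rows = db.execute("SELECT language, problem FROM favorites").fetchall()
print(targeted, everything, rows)
```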
So in short, we've got SELECT, we've got INSERT, we've got UPDATE, we've got DELETE, and we've seen CREATE, albeit automatically generated by sqlite3. Maybe we'll see DROP. And actually, we can see DROP now. Recall that if I do .schema, I can see all of the tables in this here database. If I do DROP TABLE favorites; and now again .schema, there is nothing in this database at all. So that's an even worse command to run unless you know and intend what you're doing. Questions, then, on these CRUD operations: creating, reading, updating, deleting? Yeah, here first. >> Why do you not put quotation marks around NULL? So NULL is a special symbol, and if you put quotation marks around it, you would literally be looking for the text N-U-L-L, which maybe was the name of a language or the name of a problem or something literally in the CSV. We're looking for the absence of data altogether. Yeah. >> Really good question. If it's so easy to destroy data like this, are people actively backing up their data? Short answer, yes, absolutely. All of CS50's web apps and the like are automatically backed up on some schedule. Even then, we have to decide what that schedule is. And if it's daily, nightly for instance, we could lose up to 23 hours and 59 minutes of data.
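At the sqlite3 prompt, .schema prints the CREATE statements that SQLite keeps in its internal sqlite_master table. This sketch reads that table directly through Python's sqlite3 module, before and after DROP TABLE. The schema helper is a hypothetical, rough approximation of .schema, not the command itself.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE favorites (language TEXT, problem TEXT)")

def schema(connection):
    """Hypothetical helper: roughly what .schema shows, the stored CREATE TABLE text."""
    return [
        sql
        for (sql,) in connection.execute(
            "SELECT sql FROM sqlite_master WHERE type = 'table'"
        )
    ]

before = schema(db)
db.execute("DROP TABLE favorites")  # removes the table definition and all of its rows
after = schema(db)

print(before)
print(after)  # []
```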
In some cases, then, companies might version their data more frequently, like every five minutes or every minute, although that's going to consume a lot more space; there's already this theme of trade-offs in computing. You can also implement forms of access control. SQLite is lightweight; it has no notion of usernames or passwords, so if you have access to the data, you can touch everything. But in the real world, with commercial and open-source software like Oracle, SQL Server, Postgres, and MySQL, you actually have usernames and passwords and specific permissions, so you can give users, interns for instance, the ability to select data but not to update, delete, or insert data, or any combination thereof. So there are defenses. Other questions on these here CRUD commands? Okay, let's go ahead and play with some real-world data. Many of you might be familiar with IMDb, the Internet Movie Database, which is a great repository of data for movies and also TV shows and actors and the like. And on IMDb's website, you can actually download TSV files, tab-separated values, that contain a lot of the data from that there website. So we went ahead and did this. We then converted that TSV data into a whole bunch of SQL tables so that we can begin to play with it in the context of TV shows. However, let's start first with a question about how you could go about modeling data for TV shows themselves. For instance, in advance I also created a few different spreadsheets that let me play with how I might model data, real-world data at that. The Office is a very popular TV show; the US version here starred Steve Carell and others. So think about how IMDb, or even little old me with a spreadsheet, might keep track of who starred in what TV show. I might just use a Google Sheet like this and, in the first column, have a title column, where this is the title of the show, like The Office.
And then if it stars one person, I would put Steve Carell in the next column. But if there was a second star, I might put Rainn Wilson, or John or Jenna or B.J. Novak here, column by column by column. And I could just keep adding show after show after show, one row per show, with however many stars are in it. What might you not like about the design of this data, though? Or what might start to look odd? >> Yeah, it's a little weird that we have star, star, star. This kind of repetition has tended to be bad; anytime we're copying and pasting, it should rub you the wrong way. Other observations about it, too? Yeah. >> Yeah. At the moment I've got one, two, three, four, five stars, and there are certainly TV shows with fewer stars and with more. So, okay, I can add more columns. I can just keep saying star, star, star, but then it's going to be a very ragged, very sparse data set, with a lot of blank cells for shows that have small casts but a lot of columns for shows that have large casts. So this should be rubbing you the wrong way; it's going to get messy, especially as the number of stars, let alone shows, gets larger. All right. Another version of this data set that I put together is this instead. I didn't like the fact that I was going to have an arbitrary number of columns based on the specific show in question. So here I scaled back: I have a single column for title as before, but now also a single column for star. And I decided that if a TV show has multiple stars, I put each star's name on its own row and, to the left of it, specify the show that they're in. That seems a little better, in that I've solved some of the redundancy problem, but I've kind of just covered up one hole in a leaky hose, and now another leak has sprung up here, which is to say there's still bad design. What's bad here?
Yeah, >> Yeah, now I've got The Office, The Office, The Office, The Office, The Office. And that, too, feels like I'm wasting space. If I manually type this in, odds are eventually I'm going to screw up and one of these is going to be misspelled, which is going to break something somehow. So this, too, doesn't feel quite ideal. The third and final version I whipped up to model this data, which is going to lead us to a similar design in an actual database, looks a little more arcane, but it is the right way, at least academically, to do things, and we'll see that technologically, too, this is going to be a big gain. Here I now have a spreadsheet with three separate sheets. One is called shows, which is selected at the moment. Another is called people, which is not selected yet, and the third is called stars. What am I doing here? Well, notice that in the shows sheet, I've still got the title column, but I've decided to give The Office a unique ID, much like a Harvard student has a unique ID number, and much like an employee in a company probably has a unique employee ID. I've given The Office a unique identifier that happens to be the same as its ID in IMDb. Meanwhile, for all of the people that exist in the world of TV shows, for instance these five folks, I have their names as well as unique IDs for them, and those integers are unique to the people and have no connection per se to the show IDs just yet. But the third and final sheet I've whipped up is a sort of cross-referencing sheet that allows me to associate shows with people. At a glance, this looks the most arcane of the three because it's just numbers, just integers. But recall from a moment ago that The Office's unique ID was 386676. Well, that's how we associated that show with this person, who happens to be Steve Carell, and so forth.
Now, at a glance, this is not very useful to me, the human, unless I do some fancy spreadsheet stuff like VLOOKUPs, if familiar, and the like, but it is a stepping stone to how proper databases actually store data. What I have done here is normalize the data by eliminating all redundancies except for, at most, some repeated integers. And why are repeated integers acceptable? Well, integers, as we know from our days in C, are of finite length. It's going to be 32 bits, maybe 64 bits, but it's always the same number of bits. And that's nice, because anytime you have a fixed number of bits, it lends itself to storing things nicely in an array or doing binary search, because everything is a predictable distance apart. That's not true of strings like Steve Carell or John Krasinski, since names vary in length. These IDs for the shows and these IDs for the people are not going to vary in length because they're all just integers. But of course, this spreadsheet is now much less useful, because if I want to figure out who is in The Office, first I have to figure out what show this is, then I have to figure out who this person is, and this one, and this one. But that's where SQL is again going to swoop in and allow us to solve this problem. And indeed, SQL is one of the most common ways that web applications and mobile applications today store any amount of data at scale. They are most likely not using simple CSV files. They are using SQLite or MySQL or Postgres or Oracle or other commercial and open-source incarnations of SQL databases, and odds are IMDb might be using the same as well. All right, so let's go ahead and do this. I have created in advance a file called shows.db that contains hundreds of thousands of rows of TV shows and TV stars and other data from IMDb itself. In a moment we'll see a database that, if drawn as a picture, looks a little something like this. There is going to be a people table. There's going to be a shows table.
There's going to be a stars table that somehow links the two. There's also going to be a writers table and a ratings table and a genres table. So this escalated quickly, from favorites, which was a single table, to a real-world data set that has six tables. But here is the "relational" in relational databases, as these arrows are meant to imply: there are relationships across these several tables. Case in point, here is people. We'll see in a moment that a person in the IMDb world has an ID number, a name, and a year of birth. A show in the IMDb world has a unique ID, a title, the year it debuted, and a total number of episodes. But the people table makes no mention of shows, and the shows table makes no mention of people. Per the arrows, though, there's this third table here, stars, that links show IDs with person IDs. And this is where relational databases get really powerful, because you can solve all of those redundancy concerns and actually enable yourself to select data much more quickly instead. But let's focus on something simple first: just the shows table, which pictorially might look a little something like this. In just a moment, I'm going to reopen VS Code, and instead of favorites.db, I'm going to open a file called shows.db, which again I arrived with in advance. If I run sqlite3 shows.db and hit Enter, I'm back at a SQL prompt. Let me type .schema shows just to show you what command created this here table. And it's already a little more interesting. Notice that the table is called shows and it's got one, two, three, four columns: an ID for each show, a title for each show, the year each show debuted, and the number of episodes. There's also clearly some mention of types and some other keywords that we haven't yet talked about. But let's focus first on just what the data is.
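As a preview, here is a minimal sketch of the three-sheet design (shows, people, stars) as three SQL tables. It recovers "who is in The Office?" by following the integers from title to show ID to person IDs to names. 386676 is The Office's ID from the lecture. The person IDs, the exact column definitions, and the use of JOIN here are assumptions for illustration; joins are covered properly later in this lecture.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE shows (id INTEGER, title TEXT NOT NULL, PRIMARY KEY(id));
CREATE TABLE people (id INTEGER, name TEXT NOT NULL, PRIMARY KEY(id));
CREATE TABLE stars (
    show_id INTEGER NOT NULL,
    person_id INTEGER NOT NULL,
    FOREIGN KEY(show_id) REFERENCES shows(id),
    FOREIGN KEY(person_id) REFERENCES people(id)
);
""")

# 386676 comes from the lecture; the person IDs 1 through 5 are hypothetical.
db.execute("INSERT INTO shows (id, title) VALUES (386676, 'The Office')")
db.executemany("INSERT INTO people (id, name) VALUES (?, ?)", [
    (1, "Steve Carell"), (2, "Rainn Wilson"), (3, "John Krasinski"),
    (4, "Jenna Fischer"), (5, "B.J. Novak"),
])
# Each row of stars is just a pair of integers linking one show to one person.
db.executemany(
    "INSERT INTO stars (show_id, person_id) VALUES (?, ?)",
    [(386676, person_id) for person_id in range(1, 6)],
)

# Follow the integers from a title to the names of that show's stars.
names = [name for (name,) in db.execute("""
    SELECT people.name FROM shows
    JOIN stars ON shows.id = stars.show_id
    JOIN people ON stars.person_id = people.id
    WHERE shows.title = 'The Office'
    ORDER BY people.id
""")]
print(names)
```

The names are stored once and the title is stored once. Only the small, fixed-size integers in stars repeat.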
The best way to wrap your mind around a new data set, if someone hands you a SQL database or you've imported a CSV into one, is just to select some data. So SELECT * FROM shows; That's a lot of data flying across the screen. It's not very easy to see, because some of the show names are apparently crazy long and so they're wrapping, but it's still going and going. I'm going to hit Control-C to interrupt it. Control-C, as with our terminals in general, is your friend. Let's run that same command but limit it to the first 10 shows. So there are the first 10 shows in the IMDb database of TV shows: 10 rows, going back, it looks like, to roughly the 1970s, which is where their data set starts. All right. So that's the data we have in here. But how much is there? Let's check: SELECT COUNT(*) FROM shows; And now we're talking. There are 250,87 shows in this database. And if I do the same for people, SELECT COUNT(*) FROM people; it looks like there are 74,315 TV stars associated with this here data set. So here, too, the data is much more interesting and much more representative of real-world data. All right. How about the ratings? IMDb, if unfamiliar, is also a place where you can check users' ratings of whether something is a good show, a bad show, or anything in between. So let's do .schema ratings, and I'll see that, yes, there's this table called ratings that, as we saw briefly on the screen, has a show ID, a rating, and the total number of votes that contributed thereto, plus some data types and other syntax that we'll get to before long. Let me just do SELECT * FROM ratings LIMIT 10; to get a sense of what the data is. That's what the data looks like in that table. To a human at a glance, it's not that useful, because you don't know what those show IDs are.
But in a moment, we're going to see how we can reconstitute this data by linking these tables together by way of those IDs and actually get answers to questions. Among other things, a SQL database, or a relational database more generally, supports one-to-one relationships, whereby a row in one table relates to exactly one row in another table. This is in contrast to one-to-many, for instance. So one-to-one means one row over here relates to one row over here; again, the relational in relational database. How might we go about seeing this? Well, first, here's a tour of the data types that SQLite supports. In C we had a somewhat similar list, and in Python that list went away, at least with regard to explicit types. In SQL, we're back to explicitly stating, when creating our tables, what the types of those columns are. You have INTEGER; you have NUMERIC, which is more of a catch-all for things like times and dates and other useful real-world data; you have REAL, numbers like floats with decimal points; you have TEXT, which we've seen already; and then you have BLOB, which is a great name and stands for binary large object. You can actually store raw zeros and ones, like files, in the database. Generally, storing files there is frowned upon, but there are certain times when you do want to store binary data and not pure text. That's it for SQLite: there are only these five types. In other commercial and open-source SQL databases, like Oracle and MySQL and Postgres, the same names I keep rattling off, you have even more data types than these. So that's among the additional features you get by using other databases. There are a few keywords, though, that are worth noting in SQL. You can specifically say, when creating a table, that a column cannot be NULL. If you don't want timestamp, for instance, ever to allow NULL values, you can literally specify when creating that table that this column is NOT NULL.
And if I then try to insert data into that table with a NULL value, as by not providing a timestamp, the insertion will fail. Here's where things differ from just writing Python code or certainly from using a spreadsheet: you can have built-in defenses so that neither you nor anyone else messes up your data by accidentally inserting bogus or blank data. You can further say that values must be UNIQUE. That is, every cell in a column must be unique, so that you can't accidentally insert two things with the same ID, like two duplicate Harvard IDs or two duplicate employee IDs. You can avoid that altogether. But more importantly, relational databases support two concepts, primary keys and foreign keys, and this is where the magic really starts to happen. A primary key is the unique identifier for a table: the column of values that uniquely identifies every row. So it's probably going to be the show ID, the person ID, the Harvard ID, the employee ID. Anytime you have a value, often numeric, often an integer, that uniquely identifies rows, you call that a primary key. When that same ID appears in another table for cross-referencing purposes, you refer to it instead as a foreign key, because that same key is over there in another table, thus foreign. Both refer to one and the same thing: in the table in which it's defined, it's primary; if it appears in some other table, it's considered foreign. All right. So how can we make use of this? Let me propose that we execute a few SQL commands as follows. If I wanted to start asking questions about ratings, I could do something like this: SELECT * FROM ratings WHERE rating is, let's say for a good show, 6.0 or higher, but let's just LIMIT this to 10 shows that meet that threshold. Enter. So here I now have a temporary table that gives me three columns from the ratings table.
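Here is a sketch of these defenses in action, using Python's sqlite3 module with tables loosely modeled on the shows and ratings tables described in the lecture. The exact column set and the sample row are assumptions. Note that SQLite enforces FOREIGN KEY constraints only after PRAGMA foreign_keys = ON is run on the connection. Each rejected statement raises sqlite3.IntegrityError.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when this is on
db.executescript("""
CREATE TABLE shows (id INTEGER, title TEXT NOT NULL, PRIMARY KEY(id));
CREATE TABLE ratings (
    show_id INTEGER NOT NULL UNIQUE,
    rating REAL NOT NULL,
    FOREIGN KEY(show_id) REFERENCES shows(id)
);
""")
db.execute("INSERT INTO shows (id, title) VALUES (1, 'Show A')")  # hypothetical show
db.execute("INSERT INTO ratings (show_id, rating) VALUES (1, 7.5)")

def rejected(sql):
    """Return True if a constraint makes the database refuse this statement."""
    try:
        db.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False

results = [
    rejected("INSERT INTO shows (id) VALUES (2)"),                       # title is NOT NULL
    rejected("INSERT INTO shows (id, title) VALUES (1, 'Show B')"),      # duplicate primary key
    rejected("INSERT INTO ratings (show_id, rating) VALUES (1, 8.0)"),   # show_id is UNIQUE
    rejected("INSERT INTO ratings (show_id, rating) VALUES (99, 8.0)"),  # no show has id 99
]
print(results)  # [True, True, True, True]
```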
Show ID, which is for the moment a useless identifier because I don't know what show it corresponds to, but also the rating value and the number of votes that contributed thereto. How might I actually get to the shows that are highly rated, at 6.0 or higher? Well, I don't need to SELECT *. If all I care about is these 10, I can whittle this same command down to selecting just the show IDs. So this is the answer to the question: what are 10 TV shows whose ratings are 6.0 or higher? From the table, these are the first 10 that come back. How do I now select the shows that correspond to these values? Here's where things can be done a few different ways. I could SELECT * FROM the shows table WHERE the id of the show is IN the following set. I'm going to type an open parenthesis and then, just for readability, hit Enter. The dot-dot-dot and angle bracket just mean I'm continuing my thought; it's not executing the command yet. What is the query I now want to run? It's going to be a nested query, and I can do the same thing as before: SELECT show_id FROM ratings WHERE rating >= 6.0, close parenthesis. But let's then limit the total number of results to just 10. Just like in grade-school math, we have parentheses, so the first thing that's going to be executed is the thing inside the parentheses. That gets me every show ID from the ratings table with a good rating of 6.0 or higher, and it returns a column of values. I'm then saying SELECT * FROM shows WHERE the id of the show is IN that list of values, but only show me 10 of those. So what I should now see is much more useful data, namely 10 shows that are highly rated. Enter.
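Here is the nested query from this passage, run through Python's sqlite3 module against three hypothetical shows and ratings instead of the real shows.db. The column names follow the lecture's schema. Like the lecture's query, it has no ORDER BY, so the sketch sorts the titles in Python before printing.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE shows (id INTEGER, title TEXT NOT NULL, PRIMARY KEY(id));
CREATE TABLE ratings (
    show_id INTEGER NOT NULL UNIQUE,
    rating REAL NOT NULL,
    votes INTEGER NOT NULL
);
""")
# Hypothetical rows standing in for the IMDb data.
db.executemany("INSERT INTO shows (id, title) VALUES (?, ?)",
               [(1, "Show A"), (2, "Show B"), (3, "Show C")])
db.executemany("INSERT INTO ratings (show_id, rating, votes) VALUES (?, ?, ?)",
               [(1, 8.1, 120), (2, 4.3, 35), (3, 6.0, 900)])

# Conceptually, the parenthesized subquery yields a column of show IDs,
# and the outer query keeps only the shows whose id is IN that column.
titles = [title for (title,) in db.execute("""
    SELECT title FROM shows
    WHERE id IN (SELECT show_id FROM ratings WHERE rating >= 6.0)
    LIMIT 10
""")]
print(sorted(titles))  # ['Show A', 'Show C']
```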
And indeed I get back these 10 shows, all of whose ratings are quite a bit higher. If I only care about the title, that too I can do. So let's do this again. Instead of selecting star, let's SELECT title FROM shows WHERE id IN (SELECT show_id FROM ratings WHERE rating >= 6.0), close my parenthesis, LIMIT 10. Enter. And I see the same shows, but with the nail hit on the head: just the titles of those shows. Of course, I might want to do this differently. Here are 10 titles, but what are their ratings? That's why you go to IMDb or Rotten Tomatoes or the like: you want to see the actual ratings, not just the titles. It turns out we're going to need another technique to do that, namely the ability to join two tables. And just as a teaser for this, here might be excerpts from two tables. Here's the shows table at left. Here's the ratings table at right, or a subset thereof. If I want to figure out what the rating is for a given show, wouldn't it be nice if I could somehow line these two tables up so that, just like the tips of my fingers, each value here lines up with its corresponding value over there, a cross-reference of sorts? Just for the sake of discussion, let me visually flip this around, though that does nothing technically underneath the hood, and let me scooch them together after highlighting the common values. Wouldn't it be nice to take the shows table and join it with the ratings table in such a way that those IDs all line up? We're going to have the ability to do just this. This is a lot already, and this isn't the sort of cliffhanger I'd wanted to end on, because who cares about joins, but it's going to be cool.
But let's take our 10-minute Halloween candy break and come back in 10. All right, we are back. Recall that we left off essentially here. We had these two tables, the shows table at left and the ratings table at right. And the motivation was: how do we actually associate shows with their respective ratings, given that the ratings are not in the shows table? As an aside, they could be. In fact, because this is meant to demonstrate a one-to-one relationship, whereby every show has one rating, we could have just put the rating and the number of votes into the shows table. But we chose not to, because IMDb provides its ratings as a separate TSV file, so, to keep parity with that, we imported into a ratings table just the TSV file that we had downloaded from them. Putting the ratings into the shows table would be a fine solution, too. So at this point in the story, we've got the shows table here and the ratings table over here. We've noticed that there are commonalities: there are show IDs that appear in both tables. In fact, to use some of the new vernacular, this is the primary key, the id column here. This is that same value, but in this context it's known as a foreign key because it's in some other table. And that's how we're going to link these two things together. So how do we select, not just for The Office but for every TV show, its respective rating? Let's go back to VS Code, and at my SQLite prompt, let me do this: SELECT * FROM the shows table, but JOIN the shows table with the ratings table. How do I want to join these two tables together? We'll do so ON the shows table's id column being equal to the ratings table's show_id column, then filter the results WHERE the rating is still greater than or equal to 6.0, and LIMIT this to 10 results.
So it's a bit more of a mouthful, but what I'm doing is selecting everything from the result of joining shows and ratings on this column with this column, and the rest of the predicate is as before. JOIN is going to do literally that: join these two tables as I have prescribed. When I hit Enter, now that I have my semicolon, I get back a complete table containing everything from the shows table and everything from the ratings table, with those unique identifiers lined up. Indeed, if you look at the primary key over here, the id column, it's 62614..., and over here you have show_id, which came from the ratings table, 62614.... So we've taken two tables and really joined them together, but we're only seeing a subset because I limited it to 10 such rows. Now, of course, most of this data isn't very interesting if my whole goal is just to see what the ratings are for these shows. So let's achieve this sort of result in code: join these tables together, get rid of the redundancy altogether, and whittle the result down to just a title column and a rating column. How do we do that? I'm going to SELECT, more specifically, the title of every show and the rating of every show FROM the shows table, but JOIN it with the ratings table ON shows.id = ratings.show_id. As before, I'm going to filter WHERE rating >= 6.0 and LIMIT it to 10 such results. Enter. And now I have a nice, simple temporary table that has the titles of these shows in one column and their ratings in the other, even though those two data sets were in two completely separate tables. Indeed, if we think back to where this data came from, we've been focusing on the shows table, and we've joined it with the ratings table. Here's the primary key in shows. Here's the foreign key in ratings.
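Here is the two-column JOIN from this passage, run through Python's sqlite3 module on the same kind of hypothetical shows and ratings rows used in a toy setup, not the real shows.db. ON shows.id = ratings.show_id lines up each primary key with its matching foreign key, so each title sits next to its rating.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE shows (id INTEGER, title TEXT NOT NULL, PRIMARY KEY(id));
CREATE TABLE ratings (
    show_id INTEGER NOT NULL UNIQUE,
    rating REAL NOT NULL,
    votes INTEGER NOT NULL
);
""")
# Hypothetical rows standing in for the IMDb data.
db.executemany("INSERT INTO shows (id, title) VALUES (?, ?)",
               [(1, "Show A"), (2, "Show B"), (3, "Show C")])
db.executemany("INSERT INTO ratings (show_id, rating, votes) VALUES (?, ?, ?)",
               [(1, 8.1, 120), (2, 4.3, 35), (3, 6.0, 900)])

# Pair each show with the ratings row whose show_id matches its id,
# then keep just the two columns we care about.
rows = db.execute("""
    SELECT shows.title, ratings.rating FROM shows
    JOIN ratings ON shows.id = ratings.show_id
    WHERE ratings.rating >= 6.0
    LIMIT 10
""").fetchall()
print(sorted(rows))  # [('Show A', 8.1), ('Show C', 6.0)]
```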
And notice that, by convention, we've adopted a certain approach. Anything that's called id implies that it's a primary key. Anything that's something_id implies that it's a foreign key. And the convention we adopted, which is actually quite common, is that if the table is called shows, plural, we call the foreign key show_id, singular. Different companies and different communities will have different practices, but we've been consistent across all of these tables with our underscore and lowercase conventions. Yeah. >> I'm just curious how these IDs all get generated and relate to each other properly. >> Really good question. How do all these IDs get generated and relate to each other properly? Well, in our case, I have no idea. The Internet Movie Database people came up with these unique identifiers somehow, and we simply incorporated them into our data set. In practice, what they probably did, and what you will do in future problem sets when generating data, is just assign an arbitrary integer, starting at one, then two, then three, then four, then five, let it auto-increment all the way up, and let the database ensure that you never have duplicate values. >> Yeah. >> Just to clarify, the dot-dot-dot and arrow symbol are only there to make it look better, right? Like, there's no... >> Correct. The dot-dot-dot and angle bracket that you keep seeing is just the continuation prompt. It means I have deliberately hit Enter early because I want to move everything onto the next line, so that it doesn't wrap messily across multiple lines. It is not SQL syntax; it's specific to sqlite3, and it's just a continuation of the thought. Good observation. Yeah. >> When you limit it to 10, how does it decide which ones to show? >> Good question. When you limit something to 10, for instance, which ones do you get? You just get literally the first 10 rows from the table.
And so, if you don't use the ORDER BY keywords, the rows will typically come back in the same order in which they came from those tables. And so you're just seeing, arbitrarily, the first 10 that match that predicate, which is rating greater than or equal to six. We have not ordered it by rating, so I'm not necessarily getting the 10.0 shows. I'm just getting the first 10 shows that are rated six or higher. And the point of that is just that I want it to fit on the screen rather than see hundreds of thousands of answers. Okay. So you might recall now that there were certainly other tables besides these. So let's look at the broader scheme: not just shows and ratings, but let's focus on genres, if only because genres is interesting in that it's no longer a one-to-one relationship. After all, why would a show have multiple ratings? It sort of has its own rating. But a show could certainly belong to multiple genres. You could imagine a show being a comedy and a drama, or a musical and a comedy, or any other number of combinations of one or more genres. And so the way we've chosen to implement that here, too, is with a separate table called genres, which is not perfect. There are going to be some redundancies here that we have not yet eliminated. But it does indicate that we can go ahead and have multiple such values associated with each and every show. So how do we get there? Let's focus just on this. Let's go back in just a moment to VS Code and take a look at the schema for genres. In genres, we have the following: a table called genres, which has two columns, a show_id, which is an integer that cannot be null, and a genre, which is text that also cannot be null. And now, for the first time, let's actually use some of the vernacular we've introduced. Here we have an example, explicitly in SQL, that specifies, when creating this table, that the show_id column shall be a foreign key that references the shows table's id column.
And admittedly, I think the syntax for creating tables is a bit of a mouthful. I often have to look it up to remember the order of everything. But here we have the columns listed first and then these key constraints: a foreign key referencing this primary key over here. And in fact, let's rewind and look at the shows table now to see from whence we came. So if I do .schema shows, which we've done before but waved our hand at, then we'll indeed see that shows has a primary key called id, which is an integer. How do I know that? Because the very last thing in the parentheses says that the id column in this table is a primary key. Then we see that the title is text and can't be null. The year is numeric, which again I described as sort of a catch-all for other real-world numeric types that aren't purely integers or real numbers per se. Episodes is an integer. Both of those apparently can be null, because maybe IMDb just doesn't have that data for some older shows, but the primary key is indeed specified here. And just for thoroughness, let me now distinguish genres from ratings. If I do .schema ratings again, which we waved our hand at earlier, it's very similar in spirit to genres in that there's a show_id column that references the shows table, plus some other columns. In genres, that was genre. In this case, we have rating and votes, which are a real and an integer, respectively. But notice this one additional constraint here. I deliberately specified that show_id in the ratings table must be unique. That is to say, you cannot have the same show_id more than once in the ratings table. Why? Because I indeed wanted a one-to-one relationship, and it would not be one-to-one if multiple rows in ratings corresponded to one id in the shows table itself. But in genres, we're going to allow duplicates, and so there's no mention of UNIQUE there. All right. So where does this get us?
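Here is a rough sketch of how the UNIQUE constraint distinguishes the one-to-one ratings table from the one-to-many genres table. The schemas are modeled on the ones described above, but the rows and the try_insert helper are hypothetical.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE shows (id INTEGER, title TEXT NOT NULL, PRIMARY KEY(id));
    -- one-to-one: UNIQUE means each show_id may appear at most once
    CREATE TABLE ratings (
        show_id INTEGER NOT NULL UNIQUE,
        rating REAL NOT NULL,
        votes INTEGER NOT NULL,
        FOREIGN KEY(show_id) REFERENCES shows(id)
    );
    -- one-to-many: no UNIQUE, so a show can have many genres
    CREATE TABLE genres (
        show_id INTEGER NOT NULL,
        genre TEXT NOT NULL,
        FOREIGN KEY(show_id) REFERENCES shows(id)
    );
    INSERT INTO shows VALUES (1, 'Example Show');
    INSERT INTO ratings VALUES (1, 8.0, 100);
    INSERT INTO genres VALUES (1, 'Comedy'), (1, 'Drama');
""")

def try_insert(sql):
    """Hypothetical helper: True if the insert succeeds, False if a constraint rejects it."""
    try:
        db.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False

second_rating_ok = try_insert("INSERT INTO ratings VALUES (1, 9.0, 5)")
third_genre_ok = try_insert("INSERT INTO genres VALUES (1, 'Family')")
print(second_rating_ok, third_genre_ok)  # False True
```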
Well, let me go back into my terminal here after clearing all of that, and let's go ahead and just look at the data to wrap our minds around it a little more concretely. So, select star from genres limit 10, just to see the first 10. All right. So it looks like there are some comedies, adventures, comedies, family, action, sci-fi, and so forth. Well, let's go ahead and look up just one show's information. In fact, I saw this number, this ID, before. How about we just look up this show? What is this adventure show? 63881. So, select star from shows where id equals 63881, semicolon. Okay. So this is the show called Catweazle from 1970, which had 26 episodes in total, and that was indeed its unique identifier. So that's all fine and good if I want to see something about that specific show. But as before, how do I associate Catweazle, in this case, with all of its genres? Well, this isn't necessarily a one-to-one relationship: maybe Catweazle is not just an adventure. Maybe it's also a comedy and a family show. And indeed, if I go back to the results just now, you'll see that 63881 indeed lines up with adventure, comedy, and family, and then the ID changes to be about some other show. So, how do I select these three answers to the question, what genres does Catweazle belong to? Well, for this, we need to talk about one-to-many relationships and how we can get those back. Let's go ahead and do this now in my terminal. Let me go ahead and say the following: select genre from the genres table where the show_id equals just that 63881, which I'm now starting to memorize: adventure, comedy, and family. So, that's the answer to the question, but this certainly isn't the best way to do this, where you have to look up the unique ID for the show you care about, then copy-paste it or memorize it and type it out into this query just to get the genres. It would be nice to just ask all of this in one breath. Well, we can do this, even though it's a bit more verbose.
I'm going to instead this time say select genre from genres where the show_id I care about equals, and now I'm just going to hit Enter so as to move this nested query inside of parentheses, and I'm going to say, well, I don't know off the top of my head what the unique ID is for Catweazle, but I can ask the database: select the id from the shows table where the title of the show equals Catweazle. And this now obviates the need for me to memorize or copy-paste that unique ID. I'll hit Enter and close my parenthesis. I'm going to go ahead then and type a semicolon, Enter. And now I get back the exact same answers, but without having to know or care about these numeric values. And that's kind of the point here. Even though the database behind the actual IMDb website needs to use these unique identifiers to store everything, we humans, generally speaking, should not need to know or care what these identifiers are. They're just meant to implement this notion of relationships, these cross-references. And so here we see an example where you can ask the question you care about without worrying about any of the underlying numbers or even seeing them in the result. All right. Well, how else might we go about doing this? Let me propose that we join these two tables and ask the question in a slightly different way. So, here's an excerpt from the shows table. Here's an excerpt from the genres table. And clearly we could do something like we did before for ratings, where we line these two up and kind of join them together. Just for the sake of discussion, let me flip these columns around, though that has no technical significance. And now we can clearly see 63881 appears there and here. The difference, though, because this is now a one-to-many relationship, is that it's not quite as simple as just joining the rows together. I need to join it here and here and here. And the database can do this for you, albeit at some cost in redundancy.
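Both versions of the genre lookup, the hard-coded one and the nested one, can be reproduced with a minimal sketch like this. It assumes a two-table stand-in containing only Catweazle's row and its three genres as they appeared in the lecture; everything else in the real database is omitted.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE shows (id INTEGER PRIMARY KEY, title TEXT NOT NULL, year NUMERIC, episodes INTEGER);
    CREATE TABLE genres (show_id INTEGER NOT NULL REFERENCES shows(id), genre TEXT NOT NULL);
    INSERT INTO shows VALUES (63881, 'Catweazle', 1970, 26);
    INSERT INTO genres VALUES (63881, 'Adventure'), (63881, 'Comedy'), (63881, 'Family');
""")

# Hard-coded: we must already know (or copy-paste) the show's id.
by_id = [genre for (genre,) in db.execute(
    "SELECT genre FROM genres WHERE show_id = 63881"
)]

# Nested: the inner query looks the id up by title for us.
nested = [genre for (genre,) in db.execute("""
    SELECT genre FROM genres
    WHERE show_id = (SELECT id FROM shows WHERE title = 'Catweazle')
""")]

print(sorted(nested))  # ['Adventure', 'Comedy', 'Family']
```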
So what I'm going to observe is that these IDs are all the same: primary key in this context, foreign key in this context. Well, I'm going to start to join them together here, but it's not possible to return a temporary table that's just outright missing data. You have to have the same number of columns in every row of the grid. So what the database is going to do, if I join these two tables together and they are participating in a one-to-many relationship with each other, is duplicate the data that's necessary to make every row look the same. The downside is it might indeed be taking up some additional space, unless the database is smart and somehow using pointers or something like that underneath the hood to avoid the redundancy. But for my purposes, this is actually quite nice, because if I iterate over these rows, as I could in Python, as we'll eventually see, it's just nice to have all the data you care about in each and every row, even though it's clearly redundant. And the data is not being stored redundantly in the database. It's just temporarily being presented to me with this redundancy. So, what do I really want to have happen? Well, I really care about joining these two tables together and ultimately just getting back the title and the genre, respectively. So, let me go ahead in VS Code here and do select title and genre from the shows table, but let's join it this time with the genres table on shows.id equaling genres.show_id. So that's the same as with ratings. Then where the id equals, just for time's sake, 63881, which I know is Catweazle, though I could certainly use a nested query if I wanted to, as before. Enter. And I get back Catweazle's three genres. And if I were to loop over this data in some kind of Python code, I would have access to the title and genre with each iteration, which I claim is useful. But if I don't care about that and I just want to select the genres, I can do this with joins, too.
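To see the redundancy that a one-to-many join introduces, here's a sketch with the same tiny stand-in data as before (only Catweazle and its three genres, an assumption for illustration). Notice that the title is repeated in every row of the result, even though it's stored only once in shows.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE shows (id INTEGER PRIMARY KEY, title TEXT NOT NULL, year NUMERIC, episodes INTEGER);
    CREATE TABLE genres (show_id INTEGER NOT NULL REFERENCES shows(id), genre TEXT NOT NULL);
    INSERT INTO shows VALUES (63881, 'Catweazle', 1970, 26);
    INSERT INTO genres VALUES (63881, 'Adventure'), (63881, 'Comedy'), (63881, 'Family');
""")

# One shows row matches three genres rows, so the joined result has three rows
# and the show's title is duplicated in each of them.
rows = db.execute("""
    SELECT title, genre FROM shows
    JOIN genres ON shows.id = genres.show_id
    WHERE shows.id = 63881
""").fetchall()

for title, genre in rows:
    print(title, genre)
```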
Let me just select genre from shows, joining genres on shows.id equaling genres.show_id, where the id is Catweazle's, 63881. And now I get back just that answer. So in short, what have we just seen? One, you can join two tables together and whittle the temporary table down to just the data you care about. Or, if you prefer, and if I scroll back up in my history here, you could take a fundamentally different approach and still get the same answer by simply using a nested query. I would say, as you learn SQL for the first time, it's quite often easier to just do multiple nested queries, because you work your way from the inside out, taking baby steps toward the solution. If the problem in question is give me all of the genres for a specific TV show, well, first, because I know how the data is laid out in the database, I need to know the unique ID of the show I care about. Fine, that's pretty straightforward, hence this inner query. Once you have that, you can parenthesize it, and on the outside you can now ask the question to which you really want the answer, which is: what is the genre that lines up with that show_id, one or more times? So in short, nested queries are probably easier, certainly when learning SQL for the first time, but quite powerful are these join queries, which achieve the exact same result, especially if I were to generalize away the 63881 and do a nested query here. Sometimes you want a join; sometimes nested queries suffice. >> How does SQL do all these searches? >> Oh my goodness. How does SQL do all of these searches? What's its time complexity? We'll talk about that toward the end of today. In the most naive implementation, SQL is essentially just doing linear search from the top of the table all the way to the bottom.
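As a sanity check that the two approaches agree, this sketch runs the join version, with the hard-coded 63881 generalized away into a nested query, alongside the pure nested version, on the same assumed stand-in data.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE shows (id INTEGER PRIMARY KEY, title TEXT NOT NULL, year NUMERIC, episodes INTEGER);
    CREATE TABLE genres (show_id INTEGER NOT NULL REFERENCES shows(id), genre TEXT NOT NULL);
    INSERT INTO shows VALUES (63881, 'Catweazle', 1970, 26);
    INSERT INTO genres VALUES (63881, 'Adventure'), (63881, 'Comedy'), (63881, 'Family');
""")

# JOIN, with the id looked up by a nested query instead of typed in by hand.
joined = sorted(genre for (genre,) in db.execute("""
    SELECT genre FROM shows
    JOIN genres ON shows.id = genres.show_id
    WHERE shows.id = (SELECT id FROM shows WHERE title = 'Catweazle')
"""))

# Purely nested: inside out, no JOIN at all.
nested = sorted(genre for (genre,) in db.execute("""
    SELECT genre FROM genres
    WHERE show_id = (SELECT id FROM shows WHERE title = 'Catweazle')
"""))

print(joined == nested)  # True
```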
However, we as the programmers are going to have the ability to optimize those queries so that the database can do something closer to binary search, and in general we'll be able to achieve much better performance as a result. A really good question. All right, let's go back to the big flowchart of this data set. We've now looked at shows and ratings. We've looked at shows and genres. Let's now focus on the juiciest part, the part that associates shows with people, that is, who stars in what. Think back now to what I was mocking up in the Google Sheet at the very start, whereby I wanted to somehow be able to associate The Office with Steve Carell and John Krasinski and Jenna Fischer and so forth. The right way, I claim, is going to be like this. Here's my people table, which has a primary key of id and then the name of each person and their birth year, if known. Then we have the shows table, which we keep talking about, which again has a primary key, a title, a year, and a number of episodes. And then the stars table is somewhat new, because when it comes to people starring in TV shows, we have a third and final type of relationship, a many-to-many relationship. Why? Because it's certainly the case that one person can be in multiple shows, and it's certainly the case that some shows have multiple people; hence many-to-many. So this is the third and final relationship where, just to recap, ratings was one-to-one, genres was one-to-many, and now stars is going to be many-to-many. All right, let's dive in. These queries will be a bit more verbose, but again, they're going to follow this principle of taking baby steps toward the answer we care about. Let me go back into VS Code here, and suppose I want to find out everything we know about The Office. So, select star from shows where title equals, quote unquote, The Office, semicolon. Well, that's interesting. There's a whole bunch of Offices. There was the UK version.
There are a few other variants, but the one we're probably talking about with these stars is the one that started in 2005 with 188 episodes. That's the US version, in fact. So, let me be a little more precise. Let me select everything I know from the shows table where the title equals The Office and the year equals 2005, so we don't confuse our answers with the other versions of The Office. Now, how do I go about selecting all of the people who starred in that version of The Office? Well, I already have an answer to the question of what the id of that version of The Office is, because it's right there in front of me. And in fact, I can narrow my query more precisely. Let's just select the id from the shows table where the title is The Office and the year is 2005: 386676. Now, I could lazily just copy-paste that or memorize it, but we're going to do this query more dynamically. Next, though, I want to figure out who is in that show. So, if I have a show ID, I want to figure out who's in it. But how do I get to the people and the names of those people? I have to logically go through the cross-referencing stars table. So, here's where this query is going to be a bit meatier than the past ones, in that we need to do a bit more work than before. All right. Well, what's the work I need to do? Let me go ahead now and do the following: select all of the person_ids that are associated with this show_id. So, how do I do that? Select person_id from the stars table where the show_id equals, and I could lazily copy-paste this, but let's avoid that. Where the show_id equals, let me now, in parentheses, do this: select id from shows where title equals, quote unquote, The Office and year equals 2005, and then close my parenthesis, semicolon. So what am I doing? I'm taking a second baby step, if you will. The innermost query, inside the parentheses, is just again dynamically figuring out the unique ID of the Office I care about.
The outer query is now figuring out all of the person_ids associated with that show, as per the stars table. And the stars table has only two columns, show_id and person_id. That's how the linkage is done, just with those integers. Enter. I now have a column of person_ids of the people starring in that version of The Office. So how do I take this one final step if I really care about their names and not their arbitrary person_ids? Well, I could go ahead and select the name from the people table where that person's id is in the following set. So when I'm dealing with a single value, I just use equals for equality. But when I'm dealing with a whole result set, a whole column of answers, I use the preposition IN in SQL instead. So, where the person's id is IN the following data set. Well, let's do the same query as before. Select all the person_ids from the stars table where the show_id equals (because there's only one show I care about), and I'm going to further parenthesize this: select id from shows where title equals, quote unquote, The Office and year equals 2005. Enter. I'll close my parenthesis. Enter. I'll close my parenthesis. Semicolon. And now, from the outside in, I've taken three baby steps. The innermost one just gets me the show ID. The second one, in the middle, gets me all of the related person_ids. And the last one is really the final flourish: get me all of the names of these people based on those IDs. Enter. And now we see all of the stars in this show, beyond even the subset that we've been playing with visually on the screen. Okay, that's a lot. Let me pause here and see if there are any questions. Yeah? >> This outermost query is what gives me the names. But that query needs to know the IDs of the people whose names you want. So the middle query actually gets all of those person_ids. But to get those person_ids, I need to know the show_id. So the innermost query, this one, gets me the show ID of The Office itself. All right.
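The three baby steps can be sketched end to end. The schema below is a guess at a minimal version of the people, shows, and stars tables; the IDs 386676 and 136797 come from the lecture, while the other rows (including a second, earlier version of The Office and a hypothetical "Someone Else") are made up so that the filters have something to exclude.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT NOT NULL, birth NUMERIC);
    CREATE TABLE shows (id INTEGER PRIMARY KEY, title TEXT NOT NULL, year NUMERIC);
    -- the join table: each row links one person to one show
    CREATE TABLE stars (
        show_id INTEGER NOT NULL REFERENCES shows(id),
        person_id INTEGER NOT NULL REFERENCES people(id)
    );
    INSERT INTO shows VALUES (386676, 'The Office', 2005), (1, 'The Office', 2001);
    INSERT INTO people VALUES (136797, 'Steve Carell', 1962), (2, 'John Krasinski', NULL),
                              (3, 'Jenna Fischer', NULL), (4, 'Someone Else', NULL);
    INSERT INTO stars VALUES (386676, 136797), (386676, 2), (386676, 3), (1, 4);
""")

# Inside out: show id -> person ids (via stars) -> names.
# "=" compares against one value; "IN" compares against a whole column of values.
names = sorted(name for (name,) in db.execute("""
    SELECT name FROM people WHERE id IN (
        SELECT person_id FROM stars WHERE show_id = (
            SELECT id FROM shows WHERE title = 'The Office' AND year = 2005
        )
    )
"""))

print(names)  # ['Jenna Fischer', 'John Krasinski', 'Steve Carell']
```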
So, at the risk of overwhelming you, here are other ways you can solve the same problem. But I do claim that nested selects are probably conceptually and pragmatically the easiest way. Let's also solve this problem by doing a few joins, just so you've seen it. Actually, before we do a join, let's flip the question around first. How about all of the shows that Steve Carell has starred in besides The Office? So, let me select everything I know from the people table where the name of the person equals, quote unquote, Steve Carell, semicolon. All right, there seems to be only one Steve Carell in IMDb, born in 1962. That's all nice and good. What I really care about is his id. So, I'm going to narrow this down to selecting just his id. Now, I could memorize or copy-paste 136797, but I don't need to do that. Let's just use this as part of a nested query. Let's now select all of the show_ids from the stars table that are related to Steve Carell's person_id. So, where person_id equals, and I could copy-paste this, but that's generally frowned upon, so let's not do that. Let's just set it equal to a nested query where I do the same thing as before: select id from people where name equals Steve Carell. Then close my parenthesis, semicolon. All right. He's been in a lot of TV shows, but this is not useful, because I have no idea what all of these integers are. So, the final flourish: select the title from the shows table where the id of the shows I care about is in this parenthetical list. Well, what's that parenthetical list? Well, select the show_id from stars where the person_id equals Steve Carell's. What is his id? Well, I didn't memorize it, so I'm going to select id from people where the name of the person I care about is, quote unquote, Steve Carell. Close this parenthesis. Close this parenthesis. Semicolon. Enter. And now I see all of Steve Carell's shows.
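Flipping the question around works the same way, just starting from people instead of shows. In this sketch, "Another Show" and "Unrelated Show" are hypothetical titles invented for illustration; they are not claims about anyone's actual credits.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT NOT NULL, birth NUMERIC);
    CREATE TABLE shows (id INTEGER PRIMARY KEY, title TEXT NOT NULL, year NUMERIC);
    CREATE TABLE stars (
        show_id INTEGER NOT NULL REFERENCES shows(id),
        person_id INTEGER NOT NULL REFERENCES people(id)
    );
    INSERT INTO people VALUES (136797, 'Steve Carell', 1962), (3, 'Jenna Fischer', NULL);
    -- 'Another Show' and 'Unrelated Show' are hypothetical titles
    INSERT INTO shows VALUES (386676, 'The Office', 2005), (2, 'Another Show', NULL),
                             (5, 'Unrelated Show', NULL);
    INSERT INTO stars VALUES (386676, 136797), (2, 136797), (386676, 3), (5, 3);
""")

# Inside out: person id -> show ids (via stars) -> titles.
titles = sorted(title for (title,) in db.execute("""
    SELECT title FROM shows WHERE id IN (
        SELECT show_id FROM stars WHERE person_id = (
            SELECT id FROM people WHERE name = 'Steve Carell'
        )
    )
"""))

print(titles)  # ['Another Show', 'The Office']
```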
And even though we're doing this in a black-and-white command-line environment, think about what the actual IMDb is doing with both of these queries. If you go to IMDb.com and search for Steve Carell, even though there's going to be a lot of colors and pretty pictures and whatnot, you'll probably get, in some form, a list of all of Steve Carell's shows. Or if you search for The Office, you'll get a list, in some form, of all of the stars therein. I could claim, then, that if imdb.com is using SQL, which it very likely is, though not necessarily, they are executing queries just like we did. And when you type something like The Office or Steve Carell into the search box, they're essentially just copy-pasting your user input into a prefabricated SQL query that they wrote in advance, so as to get you the answers you actually care about. So this is how a lot of today's websites and mobile apps actually work. The programmer comes up with a sort of template for the queries you might ask, and then you supply the actual data you're searching for. All right, how about now, as promised, a couple of other ways to implement these queries based on many-to-many relationships, but using joins? If I know I need to involve the shows table, the people table, and the stars table, I can actually do this all in one breath without any nested queries. Select for me the title from the shows table, but let's join that with the stars table on shows.id equaling stars.show_id. And let's additionally join the people table on stars.person_id equaling people.id. In other words, if you know conceptually that you've got these three tables and you want to combine them without using nested selects, just figure out how to line them all up. So again, I'm selecting from the shows table, but I'm joining it with the stars table by lining up the shows table's primary key with the stars table's foreign key.
And I'm lining it up with the people table by lining up the stars table's other foreign key with the people table's primary key. I'm just logically connecting all of the things I know to be related. And lastly, let's just say where the name I care about equals, quote unquote, Steve Carell, semicolon. It's a little slower for now, and this speaks to the question that was asked earlier: how is the database doing this? Well, slowly, apparently, by default, unless we optimize it. I got back essentially the same results, although there is some duplication as a result, which relates to the filling in of duplicated data that I alluded to earlier. But let me show you one other technique, too. Again, though, I would encourage you, certainly for problem set 7, to focus on nested queries when you can, because they're a little conceptually simpler. If I care about the titles of those shows, I could select title from the shows table and the stars table and the people table all at once, in one breath, but do so where the shows table's primary key equals the stars table's foreign key, and the people table's primary key equals the stars table's other foreign key, and the name I care about is Steve Carell. In other words, this is just a third way to express the exact same idea, by doing implicit joins: selecting data from all three tables, as per this comma-separated list of table names, but telling the database, with your predicate, the WHERE clause, how you want to line all of those tables up. If I hit Enter here and cross my fingers, I should get back the same results as well, albeit with duplication, which I didn't see in the nested queries. Okay, that too was a mouthful. Let me pause here for questions. Yeah? >> ...to do that? >> Correct. In order to do this, you as the programmer must know the internal structure of the database, which is quite often the case, whether you created the database yourself or you work with a colleague who designed the schema for the database.
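Here's a sketch of the explicit three-table JOIN and the implicit comma-separated join side by side, with the name passed in through a ? placeholder. The placeholder is one way to realize the "prewritten template plus user input" idea: sqlite3 substitutes the value for you rather than pasting raw text into the SQL. The data is the same hypothetical stand-in as in the previous sketch, and the titles function is an illustrative helper, not part of any library.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT NOT NULL, birth NUMERIC);
    CREATE TABLE shows (id INTEGER PRIMARY KEY, title TEXT NOT NULL, year NUMERIC);
    CREATE TABLE stars (
        show_id INTEGER NOT NULL REFERENCES shows(id),
        person_id INTEGER NOT NULL REFERENCES people(id)
    );
    INSERT INTO people VALUES (136797, 'Steve Carell', 1962), (3, 'Jenna Fischer', NULL);
    -- hypothetical titles for illustration
    INSERT INTO shows VALUES (386676, 'The Office', 2005), (2, 'Another Show', NULL),
                             (5, 'Unrelated Show', NULL);
    INSERT INTO stars VALUES (386676, 136797), (2, 136797), (386676, 3), (5, 3);
""")

# Explicit joins: primary key to foreign key, one table at a time.
EXPLICIT = """
    SELECT title FROM shows
    JOIN stars ON shows.id = stars.show_id
    JOIN people ON stars.person_id = people.id
    WHERE name = ?
"""

# Implicit joins: list all three tables, line them up in the WHERE clause.
IMPLICIT = """
    SELECT title FROM shows, stars, people
    WHERE shows.id = stars.show_id
      AND people.id = stars.person_id
      AND name = ?
"""

def titles(query, name):
    """Run a prewritten query template, passing the user's input as a parameter."""
    return sorted(title for (title,) in db.execute(query, (name,)))

print(titles(EXPLICIT, "Steve Carell"))
print(titles(IMPLICIT, "Steve Carell"))
```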
That said, I think your question is hinting at the challenge: I really need to know the underlying implementation details, when really all I care about is the answers to my questions. In code nowadays, quite often, there are object-relational mappings, ORMs for short, whereby you can use libraries that understand the underlying database schema. You as the programmer do not need to, because the library figures out how to do all of the joins for you. So in CS50, we're introducing everyone to the bottom-up understanding of how these joins work, but that, too, can be easily automated because of those schemas. Yeah? >> I just noticed that when you're typing, you indent. Is indentation important in SQL? >> Good question. Is indentation in SQL important? Technically, no. But as with any of the languages we've talked about thus far, it is good for the humans, and certainly good for students in a context like this. Python, of the languages we've looked at, is the most rigorous, whereby indentation, and the consistency thereof, very much matters. In SQL, I'm just trying to pretty-print things to make them easy to grok visually. All right. So those last two queries were arguably kind of slow, whereas with my nested queries, I actually got lucky, and just, boom, I got the answer quite quickly. Those joins seemed to be a step backwards, in that they took more time to get back the same data that I actually cared about. But that's something we can actually chip away at. It turns out that one of the other values of a relational database vis-à-vis something like a spreadsheet is that you can tell the database in advance how to optimize for certain queries. This is not the case for spreadsheets. If you have a lot of data in Google Sheets or Microsoft Excel or Apple Numbers, tens of thousands of rows, hundreds of thousands of rows, millions of rows, your computer's going to slow to a crawl.
And at some point, those software packages are just going to say, "Sorry, file is too big." And they're certainly not going to be terribly fast at searching the data. But with a SQL database, and relational databases more generally, you are as much the architect of it as you are the user of it. And so you can tell the database in advance if you want to optimize for certain queries, like SELECT statements. So, for instance, let me go back to VS Code here, and just for the sake of discussion, let's time how long it takes to find all of the shows whose title is The Office. I'm going to use a SQLite command called .timer, and I'm going to set it to on. This is now going to tell me, for every command I run, how long it took. I'm going to now select everything from the shows table where the title of the show equals, quote unquote, The Office, close quote, semicolon, Enter. And that query took, let's say in real terms, 0.042 seconds. That's crazy fast. It's less than a second; I mean, it's truly a split second. So, no big deal. But it's a fairly simple query, and I bet we could optimize even this. Now, why would you want to optimize queries that are already pretty fast? Well, because they may be very commonly executed, and I dare say someone going to imdb.com and searching for The Office or any TV show is the common case. People are looking for TV shows, movies, actors, and so forth. It'd be nice to spend as little time as possible answering those questions. Why? One, it makes for happier customers and users, because you're getting them the answer faster. Two, it saves you money, because presumably, if you've spent $1,000 on a server, and that server has a certain amount of RAM and a certain speed CPU, or brain, it can only do so many searches per unit of time, per second, per minute, or the like. So, wouldn't it be nice if all of those searches were faster, using less time?
So, you can handle not 1,000 users at once, but 2,000 users or 5,000 users, all with the same hardware. So, there are certainly upsides there. Well, how can I go about optimizing a query? I can create my own index, another use of the CREATE keyword in SQL, whereby I can tell the database to optimize for searches on a specific table and specific columns therein. I say CREATE INDEX and then come up with a name for the index, whatever I want, then ON and the name of the table that I want to index, and then, in parentheses, the columns that I want to optimize for. So what does this mean in real terms? Let's go back to VS Code here, and let me create an index called, for instance, title_index, though the name doesn't matter, on the shows table, using the title column. In other words, tell the database: please expedite searches on the shows table's title column. After all, that's what I just searched on. Enter. Now, that took a moment, almost half a second, but that's an index that only has to be created once. If I do a lot of updates and deletes, it might take a little bit of time over the course of using the database to maintain that index. But for now, creating the index is a one-time operation. But watch what happens now if I scroll up in my history and go to the exact same query as before, which previously took 0.042 seconds. Yes, that's fast, but not nearly as fast as the new version, which takes 0.001 seconds instead, more than an order of magnitude faster. So I can handle 42 times as many users on the same database, so to speak, as I could have previously, just by building this index. So what actually is an index? Well, we come full circle to discussions in week five of the class. An index in a database is very often created using what's called a B-tree. This is not a binary tree.
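Timing with .timer on is a feature of the sqlite3 command-line program, so rather than time anything, this sketch asks SQLite, via EXPLAIN QUERY PLAN, how it intends to run the query before and after CREATE INDEX. The 10,000 filler rows are made up, and the exact wording of the plan varies across SQLite versions, but typically it changes from a SCAN of the whole table to a SEARCH using title_index.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE shows (id INTEGER PRIMARY KEY, title TEXT NOT NULL, year NUMERIC)")
db.executemany(
    "INSERT INTO shows (title, year) VALUES (?, ?)",
    [(f"Show {i}", 2000 + i % 20) for i in range(10_000)],  # made-up filler rows
)

QUERY = "SELECT * FROM shows WHERE title = 'The Office'"

def plan(sql):
    """Return SQLite's description of how it would execute sql (the plan's detail column)."""
    return " ".join(row[3] for row in db.execute("EXPLAIN QUERY PLAN " + sql))

before = plan(QUERY)  # typically a linear SCAN of every row
db.execute("CREATE INDEX title_index ON shows (title)")
after = plan(QUERY)   # typically a SEARCH that uses the B-tree index

print(before)
print(after)
```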
A B-tree is its own distinct structure that's very similar in spirit, in that it's fairly shallow, because most of the nodes have children, but a node doesn't necessarily have two children. It might have more children. And in fact, the more children the nodes have, the higher up you can pull all of the leaf nodes, and the shorter you can make the height of the tree. So this is just a generic representation of a B-tree. But what this implies is that when I am now searching for titles like The Office, the database doesn't have to do the default behavior, which is to start at the top and use linear search all the way to the bottom. If it has proactively built up an index in memory, thanks to my command, it now has a tree-like structure storing those titles that allows it to find the same data much more quickly, in logarithmic time, whether it's log base 2 or some other base. And that's how we went from 0.042 to 0.001 seconds in this case here. Questions, then, on these here indexes? No? All right. Well, let's propose that we can combine some of today's ideas. It turns out that now we're getting to the point in the course where you're not just choosing between this language and another. You're generally using a suite of languages to solve problems. And indeed, in the coming weeks of the class, when we transition to web-based applications, you're going to use a bit of Python, a bit of SQL, a bit of JavaScript, and two other languages called HTML and CSS. You might be using five different languages at a time just to build one application. Why? Because some of them are better for the job than others. And indeed, that's the ecosystem in which real-world software development is done. Well, to make this bridge, we have a version of the CS50 library, recall, for Python, which has functions like get_string, even though it's not that useful because it's just like the input function, but also get_int and get_float.
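The claim that more children per node means a shorter tree can be checked with a bit of arithmetic. This is an idealized model, assuming every node has exactly the given number of children, and not a description of how SQLite actually lays out its B-trees: it computes the smallest height h for which fanout to the power h reaches one million keys.

```python
def levels_needed(n_keys, fanout):
    """Smallest height h with fanout**h >= n_keys, for an idealized tree
    in which every internal node has exactly `fanout` children."""
    height, capacity = 0, 1
    while capacity < n_keys:
        capacity *= fanout
        height += 1
    return height

for fanout in (2, 10, 100):
    print(fanout, levels_needed(1_000_000, fanout))
# 2 20
# 10 6
# 100 3
```

Going from 2 children per node to 100 shrinks the number of levels to walk from 20 to 3, which is the sense in which a wider tree's logarithm has "some other base".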
But also, in the CS50 library for Python, we have a module that specifically makes it easier to use SQL from Python code. After all, wouldn't it be nice if I could get the best of both worlds and implement, say, an interactive program in Python that uses SQL to actually get back data? Or I could build a website that allows people to search for TV shows or TV stars and actually get that data from a database, but use Python to generate the web pages themselves. Well, we have some documentation for this library here, but I'm going to go ahead and use it in real time to show you how much more easily you can solve certain problems by using each tool for what it's good at. So, let's go back to VS Code here. Let me exit out of SQLite and get back to my normal terminal. And let me go ahead and, let's say, minimize my terminal here. Actually, let's go ahead and open up favorites.py, which is where we left off before. And recall that in the last version of favorites.py, we had simply used a dictionary to keep track of how many of you said Python or C or Scratch. And when I last ran this program with python favorites.py, the answer looked like this. Now notice that it's not sorted alphabetically, otherwise C would be first. And it's also not sorted numerically, otherwise C would be second. So it would be nice in Python to maybe exercise some control over this. But I sort of stopped doing that before because it gets very annoying quickly. And by this I mean the following. Let me go back into VS Code here and into favorites.py. If I wanted to sort the counts here, I could do this: I could change my loop from for favorite in counts to for favorite in sorted(counts). So, this is actually not too bad thus far. I can sort dictionaries pretty readily. So, now, if I run this, and let me make my terminal a little bit taller so we can see both results, you'll see that it's sorted alphabetically by key.
So apparently when you pass a dictionary to Python's sorted function, you get back its keys in sorted order, and you can still look up each key's value as you iterate. So that's nice if that's my goal, but maybe that's not really my goal. Here's how, alternatively, I could sort by value: the 190, the 58, and the 24. I can still use the sorted function, but I need to tell Python to use a sorting key, namely the counts dictionary's get function. And then if I run it again, I now see it's sorted by value. But darn it, it's now sorted in the opposite order: I see Scratch at 24, then 58, then 190. If I want to reverse it, then I have to go up here and add another named parameter, reverse=True. I can run it another time, and now I get the result I care about. Long story short, it's just very annoying to need that much code to answer relatively simple questions. And this is why we transitioned for much of today to a declarative language like SQL that just lets me select what I care about in that data. So if I go back into my database version with sqlite3 favorites.db and maximize my terminal window, what did we do before? Well, we can SELECT, let's see, favorite, COUNT(*) FROM favorites GROUP BY favorite; whoops. Oh, sorry. What did we do? We did SELECT language, COUNT(*) FROM favorites GROUP BY language. Oh, damn it. What happened? Oh, we deleted it. See, this is why you don't use the DELETE or DROP command. So I'm not going to demonstrate this again, but recall that before break, when we last selected this information, we used GROUP BY to group by the language in question, and we got back all of the counts. Then we were very easily able to reorder things by just using ORDER BY, sorting in ascending order or, for instance, descending order instead.
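The two Python sorts described above, by key and then by value, look roughly like this. The counts are illustrative: the lecture's totals were 190, 58, and 24, but which language got which total is an assumption here.

```python
# Illustrative counts; the language-to-total mapping is assumed.
counts = {"Python": 190, "Scratch": 24, "C": 58}

# sorted() on a dict iterates over its keys, so this is alphabetical by key.
for favorite in sorted(counts):
    print(favorite, counts[favorite])

# key=counts.get sorts each key by its value; reverse=True puts the largest first.
for favorite in sorted(counts, key=counts.get, reverse=True):
    print(favorite, counts[favorite])
```

Without reverse=True, the second loop would list the smallest count first, which is the "opposite order" surprise described above.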
Well, now let's actually combine these worlds of Python and SQL to write, first, a program that does just that. But to do this, we're going to need to restore that database. So let's go ahead and do this. Let's remove favorites.db, which is just a file in my account. Let's go ahead and run sqlite3 favorites.db to create a new version thereof. Let's now go ahead and change my mode to CSV, as we did earlier in class. Let's now import favorites.csv into a table called favorites. And now let's do .quit. And when I do ls, okay, now favorites.db is back, in addition to today's other files. Now let me go ahead and run sqlite3 favorites.db again. And just as a sanity check, SELECT * FROM favorites; there's all of the data back, minus the changes that we ourselves made earlier manually. And let's go ahead and, in SQL, do SELECT language, COUNT(*) FROM favorites GROUP BY language, but let's ORDER BY COUNT(*) in descending order. That's one of the last commands we ran with this file. And there is the answer in a single line of code instead of some 17 lines of code, plus or minus some whitespace. Can we now merge these two ideas? Well, let's see how to do this. Let's go back into favorites.py here and make a new and improved version of it that actually uses SQL: no dictionary, no for loop, no try/except, or any of this. Instead, let's go ahead and, from CS50's own library, import a SQL function, which will give me access to this functionality. Let's create a variable called db by convention, though I could call it anything I want, and set it equal to CS50's SQL function, passing to that function the path to the database file I want to open. This is a little weird, but the syntax here is sqlite (without the 3), then a colon, then three slashes, then favorites.db: sqlite:///favorites.db. This syntax, otherwise known as a URI, is going to allow us to use the SQLite protocol to open up favorites.db, which is the very file I was just experimenting with manually in my terminal. Here now is how I can execute a SQL query in Python using CS50's library. Now, as an aside, this is indeed meant to be a training wheel: CS50's library is just easier to use than a lot of the real-world libraries that make this possible. So because we spend relatively little time on this, we're still using this training wheel here. Give me a variable called rows, because I want to get back all of the rows from this table that contain those languages, and do db.execute. The only function that's useful in the CS50 library for SQL is this execute function, which allows me to write literally a line of SQL like SELECT language, COUNT(*) FROM favorites GROUP BY language ORDER BY COUNT(*) DESC. Just to make my life easier, I'm going to add that alias trick that we saw before: AS n, to rename the count to n. And then here I can just do ORDER BY n instead. It's a little long, but notice that I'm now using SQL as a string that I'm passing as an argument to this db.execute function. So at the very end of this, I've got to close my quote and close my parenthesis so as to use one language, in effect, inside of another. Now, assuming I do get back a temporary table's rows with that line of code on line five, let's do this. For each row in rows, go ahead and do the following. Create a variable called language and set it equal to row["language"]. Then create another variable called n, for instance, and set it equal to row["n"]. And then let's just go ahead and print out language and n, respectively. So what does CS50's library do? It returns, by design, a list of rows. Each of those rows is a dictionary of key-value pairs. So when I do for row in rows, this is just iterating over a list of values. And we've done that over the past couple of weeks.
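The restore-and-query steps above can be approximated without the CS50 library, which may not be installed everywhere. This sketch swaps in Python's built-in sqlite3 and csv modules: a tiny, made-up CSV string stands in for favorites.csv, executemany plays roughly the role of the shell's .mode csv and .import, and sqlite3.Row lets each row be indexed by column name. Converting each row with dict() mirrors the list of dictionaries that CS50's db.execute returns.

```python
import csv
import io
import sqlite3

# Made-up stand-in for favorites.csv (the real file has many more rows).
CSV_TEXT = """language,problem
Python,"hello, world"
Python,Mario
Python,"hello, world"
C,Credit
C,"hello, world"
Scratch,"Hello, It's Me"
"""

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE favorites (language TEXT, problem TEXT)")
# Roughly what `.mode csv` then `.import favorites.csv favorites` does in the shell.
db.executemany(
    "INSERT INTO favorites (language, problem) VALUES (:language, :problem)",
    csv.DictReader(io.StringIO(CSV_TEXT)),
)

# sqlite3.Row supports row["column"]; dict(row) turns it into a plain dictionary.
db.row_factory = sqlite3.Row
cursor = db.execute(
    "SELECT language, COUNT(*) AS n FROM favorites GROUP BY language ORDER BY n DESC"
)
rows = [dict(row) for row in cursor]  # a list of dictionaries, one per row

for row in rows:
    print(row["language"], row["n"])
```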
Inside of this loop, I'm just temporarily creating two variables, language and n, to show you that each row is indeed a dictionary, which means I can index into it using strings like "language" and "n", because those are the columns that I selected using this query up above. Strictly speaking, I don't even need these variables. I can just get rid of them and, a little more succinctly, pass in row["language"] and then row["n"] instead. So let me go down to my terminal window here, exit out of SQLite, run python favorites.py in this form, hit Enter, and I get back, it would seem, the same exact answer: 190, 58, and 24 in this case. Questions now on this co-mingling of languages? All right, how about one final thing? Once we have the ability to use Python, we can in fact make things interactive. So for instance, let me close my terminal temporarily and go ahead and ask for some user input. After opening the database, let's do this: let's ask the human, using Python's input function or, equivalently, CS50's get_string function, for their favorite TV show and store it in a variable called favorite. Then let's do a SQL query that selects that data: rows = db.execute with a SELECT, and let's see how many people selected this favorite... problem, rather. Not TV show; how about favorite problem from our favorites data set. So SELECT COUNT(*) AS n FROM favorites WHERE problem equals... well, now I need to put in the user's input. I don't know what that is yet, because they haven't typed it in yet. So what I'm going to go ahead and do is use a placeholder, {favorite}, close the quote, and make this whole thing an f-string. Then I'm going to go down here, and I don't need to iterate, because ideally I'm just getting back a single answer: how many people chose this problem as their favorite? So I'm going to say that the row I care about is simply the first row. So, rows is a list.
So, rows[0] is the first and only row in that list. And then let's go ahead and print out row["n"]. Let's see the result here and then see what happens. Let me put some single quotes here and single quotes here. Let me open my terminal, run python favorites.py, and type hello, world. Enter. And as before at the start of class, 42 of you like that one. However, this is not, not, not how you should ever write SQL code in Python. What could go wrong with this code? Nothing went wrong a moment ago, but what could go wrong? Yeah, the user input. How so? >> True. I don't know what those are yet, but we're about to go there. What, even more simply, could go wrong by plugging in the user's input here? Yeah? >> Like Hello. >> Exactly. If I input the other problem we played with, Hello, It's Me, where there's an apostrophe in "It's", that apostrophe, if interpolated right here, is clearly going to confuse the single quotes such that who knows what's going to come back. Now, in the best case, the code might just not work and I'll get some kind of error on the screen, which is not great for the user, because the program is not going to be useful. There's no user-friendly error message. But in the worst case, the user could do something incredibly malicious if you are blindly trusting user input and plugging it into a SQL query that you yourself constructed. Why? What if the user types something crazy like the word DELETE or DROP or UPDATE, or any of those destructive commands that we saw earlier, and somehow tricks your code into executing maybe the SELECT but then, eventually, an additional query like a DELETE? Maybe they type in a semicolon and then DELETE, or a semicolon and then DROP, or something like that. This is the biggest threat when taking user input and trusting it in the context of databases. And it's called, as one of your classmates already knows, a SQL injection attack.
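Here is a minimal sketch of the f-string problem described above, using Python's built-in sqlite3 module and a toy favorites table (the rows are made up). The function name count_favorite_unsafe is hypothetical. A normal title works, but the apostrophe in Hello, It's Me ends the SQL string literal early, so SQLite rejects the query.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE favorites (language TEXT, problem TEXT)")
db.executemany(
    "INSERT INTO favorites (language, problem) VALUES (?, ?)",
    [("Python", "hello, world"), ("C", "hello, world"), ("Scratch", "Hello, It's Me")],
)


def count_favorite_unsafe(db, favorite):
    # UNSAFE: the user's input is pasted straight into the SQL text.
    query = f"SELECT COUNT(*) AS n FROM favorites WHERE problem = '{favorite}'"
    return db.execute(query).fetchone()[0]


print(count_favorite_unsafe(db, "hello, world"))  # 2

try:
    # Becomes ... WHERE problem = 'Hello, It's Me', which is not valid SQL.
    count_favorite_unsafe(db, "Hello, It's Me")
except sqlite3.OperationalError as error:
    print("query broke:", error)
```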
A SQL injection attack is the ability for an adversary, or an unknowing user, to somehow inject code into your database query. A SQL injection attack, then, might look something like this in the real world. Here, for instance, is the login screen for github.com. They do actually use SQL, among other languages, underneath the hood, I believe, though not necessarily for this, but suppose they did. When logging into github.com, you're prompted for your username or email address and then, of course, your password. Well, what if I know a little something about SQL? And suppose, for the sake of discussion, GitHub is using SQLite, which they're not, because it's not meant for massive data sets like this. But suppose they are. And just to be malicious, I type in my username, malan@harvard.edu, but then I add a single quote and then dash dash. Well, the single quote is there, me being the adversary in this story, because maybe I can confuse their code by closing their quotes sooner than they intended. And we haven't talked about this yet, but it turns out that dash dash (--) in SQL starts a comment. So it's like # in Python or // in C: it means ignore everything to the right. That alone can be used fairly maliciously as follows. Here, for instance, could be the code that GitHub is using underneath the hood, whereby they might have some Python code (and heck, maybe they're using the CS50 library) that executes this pre-made query: SELECT * FROM users WHERE username = ? AND password = ?, passing in username and password, for instance. This code, we'll soon see, is actually the right way to do it. But if they are trusting the username and password I typed in and just plugging them right in, they could indeed be vulnerable to a SQL injection attack. Suppose they were doing it with f-strings, like I started to in my version of favorites.py. Same thing: SELECT * FROM users WHERE username = this username AND password = this password, and the little f here means this is a format string. What could go wrong? Well, let me actually paste in the malan@harvard.edu'-- text here. Notice that this single quote and this single quote are meant to surround the username, and the same goes for the password there. But watch what happens when I type in my data: malan@harvard.edu, then a single quote. That would seem to finish the thought prematurely. And then it says dash dash, which just means ignore everything else. So the effect here is essentially to gray out all of that stuff, because it's effectively been commented out. So what GitHub ends up doing, accidentally, in this case is selecting * from users where username is malan@harvard.edu, irrespective of what his password actually is. And if you assume that down here they've got some conditional logic like, well, if we get back some rows, that means Malan is in fact a registered user, so go ahead and log him in (we don't know what the code looks like, so it's dot dot dot), you've just enabled anyone on the internet to log in as me, or as anyone else, just by suffixing their input with a single quote and dash dash. And that's the least of our concerns. If we additionally went in there and, instead of dash dash, put a semicolon and then DELETE FROM users or DROP TABLE users, we could cause massive havoc on their database. This happens all the time. Even now, in the current year, you can Google around and see examples of companies that have not properly sanitized user input. And it's not just the intern who gets at the data; it's random people on the internet accessing or destroying it maliciously. So what is the solution to a problem like this? Well, one, do not use format strings in Python to simply plug in user input. But the more important lesson is: never trust users' input.
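The login bypass walked through above can be reproduced in a few lines. This sketch uses Python's built-in sqlite3 module and a toy users table; the function names login_unsafe and login_safe and the stored password are hypothetical. The f-string version lets the trailing -- comment out the password check, while the ? placeholder version treats the whole input as one literal username and finds nothing.

```python
import sqlite3

db = sqlite3.connect(":memory:")
# Toy table; a real site should store password hashes, never plaintext passwords.
db.execute("CREATE TABLE users (username TEXT, password TEXT)")
db.execute("INSERT INTO users VALUES (?, ?)", ("malan@harvard.edu", "correct horse"))


def login_unsafe(username, password):
    # UNSAFE: user input becomes part of the SQL text itself.
    query = f"SELECT * FROM users WHERE username = '{username}' AND password = '{password}'"
    return db.execute(query).fetchall()


def login_safe(username, password):
    # Placeholders: sqlite3 binds the values as data, separate from the SQL text.
    return db.execute(
        "SELECT * FROM users WHERE username = ? AND password = ?",
        (username, password),
    ).fetchall()


attack = "malan@harvard.edu'--"
# Unsafe query becomes: ... WHERE username = 'malan@harvard.edu'--' AND password = ...
print(len(login_unsafe(attack, "wrong guess")))  # 1 row: "logged in" without the password
print(len(login_safe(attack, "wrong guess")))    # 0 rows: no such username
```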
Either they're going to do something accidentally or they're going to do something maliciously, and you do not want that to happen. So the solution, then, is to use a library. Almost always use a library. This is not a wheel you should reinvent yourself. And by library I mean something like this. If you instead use a library like CS50's, and rather than f-strings you use question marks, as you'll see in a moment, what will happen is this. When the user goes and types in malan@harvard.edu'--, that's fine. Let them put weird, scary characters like single quotes in their input. The library will take charge of escaping user input. So anything dangerous in their input will be changed, for instance from one single quote to two, because we saw earlier today that that's how you escape that character in SQL. And that means that, in effect, my username is apparently literally malan@harvard.edu'--. Well, that's obviously not a real email address. It's not a real username. So this query is just going to match nothing: no rows are going to come back. And the way to do this now in our favorites example, analogously, is to go up into this execute line in VS Code. Don't use an f-string. Change the value of problem to be a question-mark placeholder instead, and then pass into this execute function one or more additional arguments that will be substituted in for those question marks. And this is not a CS50 thing. This is an industry convention, whereby you quite often use literally a question mark. That means that whatever this variable's value is will get plugged into that question mark for you, the single quotes will be added, and any dangerous characters will be escaped for you. At that point, you can trust that the user can type in anything they want; your code is not going to break. You can actually see hints of this in the real world.
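The placeholder fix for the favorites example might look like this. It's a sketch with Python's built-in sqlite3 module standing in for CS50's library (both accept ? placeholders followed by the values); the table rows and the function name count_favorite are made up. One difference worth knowing: sqlite3 does not rewrite the quotes itself but binds the value separately from the SQL text, which has the same protective effect.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE favorites (language TEXT, problem TEXT)")
db.executemany(
    "INSERT INTO favorites (language, problem) VALUES (?, ?)",
    [("Python", "hello, world"), ("C", "hello, world"), ("Scratch", "Hello, It's Me")],
)


def count_favorite(favorite):
    # The ? is filled in by sqlite3; `favorite` is passed as data, never spliced
    # into the SQL text, so quotes or dashes inside it can't change the query.
    row = db.execute(
        "SELECT COUNT(*) AS n FROM favorites WHERE problem = ?", (favorite,)
    ).fetchone()
    return row[0]


print(count_favorite("hello, world"))    # 2
print(count_favorite("Hello, It's Me"))  # 1: the apostrophe is harmless now
print(count_favorite("x' OR '1'='1"))    # 0: just an odd, nonexistent title
```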
If you've ever gone to a website and they tell you, for passwords for instance, that you can't use certain characters: all of us probably intuitively know that you should have pretty long, hard-to-guess passwords with letters and numbers and punctuation symbols. Sometimes websites very stupidly prohibit you from using certain punctuation symbols, which should drive you nuts, because there's no computational reason to put the onus on the user to sanitize their own input. But quite likely those websites have kind of learned part of this lesson: they know some characters can be dangerous in SQL, like semicolons or single quotes or the like, and they just don't want you to ever type those in. But there are better solutions to this problem: use a library that someone else, smarter than you and with more history of writing code than you, has written, one that's open source so that many people have seen it and banged on it over the years, so that this problem is not something you're vulnerable to. Questions, then, on what these here SQL injection attacks are all about? Yeah? >> I guess if you're telling the user what not to use, you're also telling them what system you're using, and so maybe that... >> Good point. By also telling people what characters they shouldn't use, you're leaking information, because a smart adversary might know, oh well, if they don't want me using that symbol, they're probably using this language or this technology. Yes, no good comes from telling the world more information than they need to know. So that's another good paranoia to have. How about one other issue before we come full circle to SQL injection attacks? There's another challenge with relational databases and with SQL itself, namely race conditions. This isn't so much a problem when I'm writing a little program here on my own computer.
But when you're running SQL code on a database in the real world, in the cloud, where you have many different servers talking to that database and many different users talking to those web servers, as is going to be the case at Meta and Google and Microsoft and any number of popular companies nowadays (and even some of CS50's own apps use centralized SQL databases, where multiple people might be trying to do the same thing at the same time, like submitting their homework or running check50), we too are vulnerable to what are called race conditions. So what is a race condition? Well, the way I learned this back in the day, when taking a course on databases and operating systems more generally, was to think of a scenario like this. Maybe in your dorm, you and your roommate have a little dorm fridge, and you're both in the habit of really liking to drink milk, as the story was told to us. And so maybe one of you comes home from class one day, you get to your room, look in the fridge, and there's no milk in there. So you decide to walk across the street to CVS or some other store to get milk. Meanwhile, your roommate comes home from their class, opens the fridge, and is like, "Oh, we're out of milk. Let me go to the store, too." And for the sake of the story, they go to a different store altogether, so that you don't run into each other and thereby solve the problem. So now both of you are on your way to a store to get milk. Time passes. You both come home. One of you puts a jug of milk in the fridge. The other one gets home and is like, "Ah, damn it. We already got milk. I can't fit this milk in the fridge," or now it's too much milk. We don't really like milk this much. It's going to go bad. A very bad outcome here: having too much milk is the moral of the story. But stupid story aside, what's the real takeaway? Why did we find ourselves in a situation where we ended up with too much milk?
>> We didn't know what the other person... >> We didn't know what the other person was doing. And to really geek out on this, we inspected the state of a variable that was in the process of being updated by someone else. And this is a thing in computing as far back as Scratch. Recall that with Scratch, you could have multiple scripts running at the same time for a single sprite, because Scratch, in effect, is multi-threaded: you can have a single sprite doing multiple things in parallel by having those multiple scripts. Similarly, your room is sort of multi-threaded, because you have two independent beings who can both go to the store and solve the same problem in parallel. The problem, though, is that if one is not aware that the other is doing that work already, you might make poor decisions. So, in the real world, what should the first roommate have done after inspecting the state of the refrigerator and realizing, "Oh, we're out of milk"? Okay, call the other roommate, or maybe more simply put a note on the door, or maybe, dramatically, lock the refrigerator somehow. And in fact, that's a term of art in databases: a database lock, so that if you are in the process of updating a value in the database, you lock it so that no one else can inspect that value and potentially make a poor decision. So when might this actually happen in the real world, rather than in the contrived milk example? There are a lot of social media posts nowadays that are quite popular. As of today, this is still the most popular Instagram post, for instance. And imagine when this was first posted: hundreds, thousands, hundreds of thousands of people might have all been clicking the heart icon essentially at the same time. Now, Meta, the company behind Instagram, presumably has lots and lots of different servers, but let's suppose for the sake of discussion that they have a single database, which is not true, but the danger is still there.
Even with multiple databases, all of these different web servers are talking to the same data. And suppose those servers are using Python code (and hey, the CS50 library) that might look a little something like this in order to update the total number of likes for an Instagram post. The first line of code running on Meta's servers might say this: get these rows by executing a query like SELECT likes FROM posts WHERE id = ?, where the ID of the post is whatever it is, 123456, whatever. Notice there's no SQL injection attack possible here, because I'm using the placeholder, not an f-string. Then the next line of code running on Meta's server maybe just stores in a variable, to make the code more readable, the first row's likes column. So again, it's the CS50 library in this story: rows is a list of dictionaries, so this is the first such element in the list, and this is the likes column of the temporary table we just selected. Lastly, what do we want to do? Well, we want to ++ that total, essentially. So we UPDATE posts SET likes = ? WHERE id = ?. We haven't seen this yet, but the CS50 library indeed supports multiple arguments after the SQL string: I'm going to update the number of likes to be likes + 1, plugging in the same ID of that post. So in short, take on faith that it's quite common that, to achieve one small goal like updating the number of likes, you might need two database queries, or three lines of code. Now, if these lines of code are executing on multiple web servers, you could certainly imagine that if people are hitting the like button at pretty much the same time, maybe one server is going to execute this first line of code and get its answer. Maybe there are a hundred likes at this point in the story.
And then, just by chance, on another server this line of code is also executed, but it too gets the same answer: there are currently a hundred likes. Meanwhile, the first server in the story continues executing its code such that it updates the number of likes from 100 to 101. But because the other server was essentially running the same code in parallel, it's going to make the same mathematical decision and also update the number of likes from 100 to 101. But at this point in the story, the number of likes should obviously be 102, so we've lost data. That's one of the dangers of a race condition: you'll end up with an inaccurate result. And a company like Meta doesn't want to go losing data like likes, because that actually drives engagement and so forth. So that's genuinely a technical problem, if not a business problem as well. It's analogous to the milk problem, but actually at scale. So what's the solution? There's a bunch of different ways, but conceptually, we just want to lock the database while this logic is being executed, such that when one server is updating the number of likes, no one else is allowed to update the like count at the same time. Now, that's a little crazy for someone as big as Meta, because you're really just serializing all of these likes and slowing things down. So there's more fine-grained control nowadays, namely transactions, whereby you can essentially lock not the whole table, and certainly not the whole database, but just the row in question, for instance. And so you would use commands in SQL like BEGIN TRANSACTION, then execute the lines of code that you want, and then, when you're ready to commit, that is, save, you use the COMMIT command. But if something goes wrong or you get interrupted, you can actually roll back the whole thing with ROLLBACK.
And what this kind of code does, in effect, by using more verbose CS50 and Python code like this, is ensure that those three lines of code inside (or, technically, the two database queries inside) will either both be executed or not at all. They will not be interrupted. And that's the fundamental solution to this problem, analogous to putting a lock on the fridge, leaving a note, or calling your roommate to prevent them from making the same decision themselves. Questions, then, on these race conditions and their solutions? Again, even though this won't be germane for CS50, the solutions are techniques like locks and what we called transactions. No? All right, then, a final moment to end on. We would not be a computer science course if we didn't introduce you to a few pieces of CS canon. Here is a sort of meme that's circulated for years when it comes to optical character recognition (OCR) at toll booths trying to detect your license plate automatically. This is someone trying to have a funny old time tricking the city into deleting their database altogether. Because if you're just scanning this off of someone's license plate or the front of their car and blindly plugging it in, without sanitizing or escaping the input with something like a good library, you might very well drop the entire database. As an aside, someone did something similar where, I think, they made their license plate read NULL, which just confused the heck out of the system too, because the programmers didn't understand why null was showing up all over the place when red lights were being run and whatnot. And lastly, a very famed character in the world of XKCD, as computer science circles go, is this. So we'll end, as we've done before, on an awkward silence as you process this here canonical CS joke. >> Now you too know who Bobby Tables is. All right, that's it for week seven. We'll see you next time.

All right. This is CS50, and this is our lecture on artificial intelligence, or AI.
Particularly for all of those family members who are here in the audience with us for the first time. In fact, for those students among us, maybe a round of applause for all of the family members who have come here today to join you. Nice. So nice to see everyone. And as CS50 students already know, it's sort of a thing in programming circles to have a rubber duck on your desk. Indeed, a few weeks back, we gave one to all CS50 students. The motivation is to have something to talk to in the presence of a bug or mistake in your code, or some confusion you're having when it comes to solving some problem. The idea is that, in the absence of a friend, family member, or TA of whom you can ask questions, you literally verbalize your confusion, your question, to this inanimate object on your desk. And in that process of verbalizing your own confusion and explaining yourself, quite often does that proverbial light bulb go off over your head, and voila, the problem is solved. Now, as CS50 students also know, we virtualized that rubber duck over the past few years, most recently in the form of this guy here. So, in students' programming environment within CS50, a tool called Visual Studio Code at a URL of CS50.dev, they have a virtual rubber duck available to them at all times. In the very first version of this rubber duck, it was a chat window that looked like this. If students had a question, they could simply type into the chat window something like, "I'm hoping you can help me solve a problem." And for multiple years, all the CS50 duck did was respond with one, two, or three quacks. We have anecdotal evidence to suggest that that alone was enough for answering students' questions, because it was in that process of actually typing out the confusion that you'd realize, oh, I'm doing something silly, and figure it out on your own.
But of course, now that we live in an age of ChatGPT and Claude and Gemini and all of these other AI-based tools, it came as no surprise, perhaps, when in 2023 this same duck started responding to students in English. That now is the tool that they have available, which is in effect meant to be a less helpful version of ChatGPT: one that doesn't just spoil answers outright but tries to guide students to solutions, akin to any good teacher or tutor. And so today's lecture is indeed on just that, and on the underlying building blocks that make possible that there rubber duck and all of the AI with which we're all increasingly familiar, namely generative artificial intelligence: using this technology known as AI to generate something, whether that's images or sounds or video or text. And in fact, to get everyone involved early on, if you have a phone by your side, go ahead and scan this QR code here, and that's going to lead you to a polling station where you can buzz in with some answers. CS50's preceptor Kelly is going to kindly join me here on stage to help run the keyboard. What we're about to do is play a little game and see just how good we humans are right now at distinguishing AI from reality. We'll borrow some data from The New York Times, which a couple of years back actually published some examples of AI and not AI, and we'll see just how good this technology has gotten. So here we have two photographs on the screen. In a moment, if you were successful in scanning that code, you'll be asked on your phone which one of these is AI: left or right. So hopefully, if you go ahead and swipe to the next screen on your phone, we'll activate the poll here. In a moment, you should see on your phone a prompt inviting you to select left or right. Feel free to raise your hand if you're not seeing that. But it looks like the responses are coming in.
And at the risk of spoiling, it looks like 70% plus of you think it is the one on the right. Kelly, maybe we could swipe back to the two photographs. In this particular case, yes, it was in fact the one on the right. Maybe it looked a little too good, or maybe a little too unreal. Let's see a couple of other examples. Same QR code, so no need to rescan. Let's go ahead and pull up these two examples. Now, two photographs, same question: which of these is AI, left or right? All right, let's take a look at the chart and see the responses coming in. It's a little closer in this case, but a majority of you think the answer is in fact left here, though 5% of you are truthfully admitting that you're unsure. But Kelly, if you want to swipe back to the photos: the answer this time was a trick question. They were both in fact AI, which perhaps speaks to just how good this technology is already getting. Neither of these faces exists in the real world; each was synthesized based on lots of training data. So, two photographs that look like humans but do not in fact exist. How about one more? This time we're focusing on text, which will be the focus, of course, underlying our duck. Did a fourth grader write this, or the new chatbot? Here are two final examples. Same code as before, so no need to rescan. And here are the texts. Essay one: "I like to bring a yummy sandwich and a cold juice box for lunch. And sometimes I'll even pack a tasty piece of fruit or a bag of crunchy chips. As we eat, we chat and laugh and catch up on each other's day..." Essay two: "My mother packs me a sandwich, a drink, fruit, and a treat. When I get into a lunchroom, I find an empty table and sit there and eat my lunch. My friends come and sit down with me..." The question now, lastly, is which of these is AI? Essay one or essay two? The bars here are duking it out. Looks like a majority of you say essay one.
Let's go back to the text. One of you who says essay one, if you want to raise a quick hand: why essay one? Yeah. >> Okay. And so essay two looks more like what you would write. And can I ask what grade you are in? >> A fifth grader. So is this a fifth grader or not? The answer here is that essay one is the AI, because indeed essay two is more akin to what a fourth or, if I may, a fifth grader would write. And I dare say there are some telltale signs. I'm not sure a typical fourth or fifth grader would "catch up on each other's day," the vernacular we see in essay one. But suffice it to say this game is not something we'll be able to play for long in the years to come, because it's just going to get too hard to discern whether something is AI-generated or not. And so among our goals for today is to give you a better sense not just of how technologies like this duck, and the images and text in the games we just played, work, but really of the underlying principles of artificial intelligence, which frankly have been with us and developing for decades and have now come to a head in recent years thanks to advances in research, thanks to all the more cloud computing, and thanks to all the more memory and disk space and the sheer volume of information we have at our disposal that can be used to train all of these here technologies. So this here duck is built on a fairly complicated architecture that looks a little something like this, where here's a student using one of CS50's tools.
Here's a website with which CS50 students are familiar, called cs50.ai, where we the staff wrote a bunch of code that talks to what are called APIs, application programming interfaces: third-party services by companies like Microsoft and OpenAI, which really have been doing the hard work of developing these models, plus some local sauce that we in CS50 add into the mix to make the duck's answers specific to CS50 itself. But what we've essentially been doing is something with which you might be familiar, at least in part: prompt engineering, which has started popping up, for better or for worse, on LinkedIn profiles everywhere. Prompt engineering is really not so much a form of engineering as it is a form of asking good questions and being detailed in your question, giving context to the underlying AI so that the answer is, with high probability, what you want back. There are two terms in this world of prompt engineering that are worth knowing about, and CS50 has leveraged both of them to implement the duck. We, for instance, wrote what's called a system prompt: instructions written by us humans, often in English, that nudge the underlying AI technology to have a certain personality or a specific domain of expertise. For instance, we in CS50 have written a system prompt that essentially looks like this. In reality, it's many lines long nowadays, but the essence of it is this: "You are a friendly and supportive teaching assistant for CS50. You are also a rubber duck." (And that, it turns out, is sufficient to turn an AI into a rubber duck.) "Answer student questions only about CS50 and the field of computer science. Do not answer questions about unrelated topics. Do not provide full answers to problem sets, as this would violate academic honesty."
"Answer this question:" And after that preamble, if you will, aka the system prompt, we effectively copy and paste whatever question a student has typed in, otherwise known as a user prompt. That is why the duck behaves like a duck in our case, and not a cat or a dog or a PhD, but rather something that's been attuned to the particular goals we have pedagogically in the course. In fact, those of you who are CS50 students might recall from quite some weeks ago, in week zero, when we first introduced the course, that we had code we whipped up that day that ultimately looked a little something like this. I'll walk through it briefly, line by line, because now, on the heels of having studied some Python in CS50, this here code from the first lecture might make a bit more sense. In that first lecture, we imported OpenAI's own library, code that a third-party company wrote to make it possible for us to implement code on top of theirs. We created a variable called client, and this gave us access to the OpenAI client, that is, software that they wrote for us. We then defined a user prompt, which came from the user via the input function with which CS50 students are now familiar. And then we defined the system prompt, where I said, "Limit your answer to one sentence. Pretend you're a..." and a cat, I think, was the persona of the day. Then we used a bit more arcane code. In essence, we created a variable called response, which was meant to represent the response from OpenAI's servers. We used client.responses.create, a function, or method, that OpenAI gives us that allows us to pass in three arguments.
The input from the user, that is, the user prompt; the instructions from us, that is, the system prompt; and the specific model, or version of AI, that we wanted to use. The last thing we did that day was print out response.output_text, and that's how we were able to answer questions like "What is CS50?" So we've seen all of that before, but we didn't talk that week about exactly how it was working or what more we could do with it. And so what I thought we'd do today is peel back a layer that we've not allowed into the course up until now. Indeed, you still cannot use this feature until the very end of the class, when you get to your final projects, at which point you are welcome and encouraged to use VS Code in this particular way. So here again is VS Code. For those unfamiliar, this is the programming environment we use here with students. Let me open up some code that was assigned to students a couple of weeks back, namely a spell checker that they had to implement in C. I came in advance with a folder called speller, and inside of this folder is a file that all students had that week, called dictionary.c. In this file, which will not look familiar to many of you if you've not taken weeks zero through seven, we had some placeholders for students. Long story short, students had to fill in a few blanks, that is, write code for this TODO, this TODO, this TODO, and one more. There were four functions, or blanks, that students needed to fill in with code, and I dare say it took most students 5, 10, or 15 hours, something in that very broad range. Let me show you now how, using AI, you, the aspiring programmers, can soon start to write code all the more quickly, not just by choosing a different language but by using these AI-based technologies beyond the duck itself.
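The week-zero program recapped above can be sketched roughly as follows. This is a hedged reconstruction, not the lecture's exact code: it assumes the `openai` Python package (v1 or later) and an `OPENAI_API_KEY` environment variable, and the model name `gpt-4o-mini` is chosen only for illustration. The helper `ask` is a hypothetical name; it takes the client as a parameter so that it can be tried without a network connection.

```python
"""Sketch of the week-zero chatbot: a system prompt plus a user prompt."""

# System prompt: our instructions that give the AI a persona (paraphrased from lecture).
SYSTEM_PROMPT = "Limit your answer to one sentence. Pretend you're a cat."

# Assumed model name, for illustration only.
MODEL = "gpt-4o-mini"


def ask(client, user_prompt: str) -> str:
    """Send one user prompt to OpenAI's Responses API and return the reply text.

    `client` should be an `openai.OpenAI()` instance; it is passed in rather than
    created here so the function can be tried with a stand-in object.
    """
    response = client.responses.create(
        input=user_prompt,           # the user prompt
        instructions=SYSTEM_PROMPT,  # the system prompt
        model=MODEL,                 # which model (version of AI) to use
    )
    return response.output_text


def main() -> None:
    """Prompt the human for a question and print the model's answer."""
    from openai import OpenAI  # requires `pip install openai` and OPENAI_API_KEY

    client = OpenAI()
    print(ask(client, input("Prompt: ")))
```

Calling `main()` makes a real network request; `ask` on its own just forwards the three arguments the lecture describes and returns `response.output_text`.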
So what I've done here on the right-hand side of VS Code is enable a feature called Copilot, which CS50 disables for all students from the start of the course. It is very similar in spirit to products from Google and Anthropic and other companies as well, but this is the one that comes from Microsoft and, in turn, GitHub, and it too gives me a sort of chat window here, which is just one of its features. For instance, if I wanted to get started by implementing the check function, I could just ask it to do that: "Implement the check function using a hash table in C." I'm going to go ahead and hit Enter. Now it's going to work. It's using as reference, that is, context, the very file that I've opened, which is dictionary.c here. Copilot, as well as a lot of AI tools, is familiar with CS50 itself because CS50 has been freely available as OpenCourseWare for years. What you see it doing here is essentially thinking, though that's a bit of an overstatement. It's not really thinking. It's trying to find patterns between the problem I want to solve and all of the training data it's seen before, and come up with a pretty good answer. For today's purposes, I'm going to wave my hand at the ChatGPT-like explanation of what to do that has appeared at right. What's juiciest to look at is on the left: if I now scroll down, highlighted in green is all of the suggested code for implementing this here check function. It might not be the way you implemented it yourself, but I dare say it has hints of exactly what you probably did when it came to implementing a hash table. I can go ahead and keep all of this code if I like how it looks. Let's assume that's all correct. Now suppose I want to implement the load function. So how about "Implement the load function," Enter, as simple as that. And what data is being used? Well, a few different things. It says one reference.
So it's indeed using this one file. But there are also what are called comments in the code, with which all students are now familiar: these lines beginning with slashes, in gray, that give English hints as to what each function is supposed to do. There's also implicit information as to what the inputs to these functions, otherwise known as arguments, are meant to be, and what the outputs are meant to be. So the underlying AI, Copilot here, has a decent number of hints, and much like a good TA or good software engineer, that's enough context to figure out how to fill in those blanks. And so here too, if I scroll down, we'll see in green some suggested code via which it could solve that problem as well, namely the load function. And I dare say I've been talking for far fewer minutes than CS50 students spent coding the solution to this here problem from scratch. So I'll go ahead and click Keep. I'll assume that it's correct, but that's actually quite a big assumption. And for those of you wondering why we have been learning all of this if I could just ask it in English to do my homework for me: there's a lot to be said for the muscle memory that hopefully you feel you've been developing over the past several weeks. The reality is, if you don't have an eye for what you're looking at, there's no way you're going to be able to troubleshoot an issue in here, explain it to someone else, make marginal changes, or the like. And yet what's incredibly exciting, even to someone like me, to all of the staff, and to friends of mine in industry, is that this kind of functionality, and AI more generally, amplifies your capabilities as a programmer sort of overnight. Once you have that vocabulary, that muscle memory from doing it yourself, the AI can take it from there and get rid of all of the tedium, allowing you to focus at the whiteboard with the other humans on the overarching problems you want to solve, and leave it to this AI to actually solve those problems for you.
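For readers who haven't seen speller, here is a rough idea of what "using a hash table" means for load and check. This is a hypothetical Python sketch, not CS50's dictionary.c and not Copilot's output: the 26-bucket, first-letter hash and the `Dictionary` class are assumptions chosen for simplicity, and Python lists stand in for the linked lists a C version would chain in each bucket.

```python
"""A minimal hash-table spell checker, loosely modeled on speller's load and check."""

N_BUCKETS = 26  # assumption: one bucket per first letter, the simplest possible hash


def hash_word(word: str) -> int:
    """Map a word to a bucket index by its first letter, ignoring case."""
    if word and word[0].isascii() and word[0].isalpha():
        return ord(word[0].lower()) - ord("a")
    return 0  # hypothetical fallback bucket for empty or non-letter words


class Dictionary:
    def __init__(self) -> None:
        # Each bucket is a list, playing the role of a linked list in C (chaining).
        self.table = [[] for _ in range(N_BUCKETS)]
        self.count = 0

    def load(self, lines) -> None:
        """Insert each word (one per line, lowercased) into its bucket."""
        for line in lines:
            word = line.strip().lower()
            if word:
                self.table[hash_word(word)].append(word)
                self.count += 1

    def check(self, word: str) -> bool:
        """Return True if word is in the dictionary, case-insensitively.

        Only one bucket is searched, which is the point of hashing.
        """
        word = word.lower()
        return word in self.table[hash_word(word)]

    def size(self) -> int:
        """Number of words loaded."""
        return self.count
```

A real dictionary file can be passed straight to `load`, for example `d.load(open("dictionary.txt"))`, where the filename is hypothetical. More buckets and a better hash spread words out and shorten each chain.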
A fun exercise, too, might be to go back at term's end and try solving any number of the course's assignments. For instance, let me go ahead and do this. In my terminal window here, I'm going to go back to my main directory and create an empty file called mario.c that has nothing in it. And I'm going to go ahead in my chat window here and say: "Please implement a program in C that prints a left-aligned pyramid of bricks, using hash symbols for bricks, and use the CS50 library to ask the user for a non-negative height as an integer." I dare say that's essentially the English description of what was, for CS50 this year, problem set one: implement a program called mario.c. This too is sort of doing its thing. It's using one reference. It's working. It knows, as a hint, that this file is called mario.c, and it's seen a lot of those in its training data over time. There's an English explanation of what I should do. And the CS50 students in the room probably recognize the basic structure here: a do-while loop to prompt the user for a height using the CS50 library, which has been included, then printing a left-aligned pyramid using some kind of loop, and boom, we are done. These are fairly bite-sized problems. As you'll see at term's end with your final project, which is a fairly open-ended opportunity to apply your newfound knowledge of and savvy with programming to a problem of interest, this amplification of your own abilities will allow you to implement far grander projects than has been possible to date, certainly in just the few weeks we have to do it. So with that promise, let's talk about how in the heck any of this is actually working. I clearly just generated a whole lot of stuff, and that's how we began the story, with the generation of those images and those two essays. But what is generative artificial intelligence, or really, what is AI itself?
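For comparison, here is the gist of that pyramid program. The lecture's version is in C with the CS50 library's get_int; to keep the examples here in one language, this is a hypothetical Python sketch that uses the built-in `input` and re-prompts in a loop, playing the role of C's do-while.

```python
def pyramid(height: int) -> list[str]:
    """Return the rows of a left-aligned pyramid: row i has i '#' bricks."""
    return ["#" * row for row in range(1, height + 1)]


def get_height() -> int:
    """Keep prompting until the user types a non-negative integer (like a do-while)."""
    while True:
        try:
            height = int(input("Height: "))
        except ValueError:
            continue  # not an integer, so ask again
        if height >= 0:
            return height


def main() -> None:
    for row in pyramid(get_height()):
        print(row)
```

Separating `pyramid` from the prompting makes the drawing logic easy to test on its own; `main()` ties the two together.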
These are some of the underlying building blocks that aren't going anywhere anytime soon and indeed have led us, as a progression, to the capabilities you just saw. Take spam. We sort of take for granted now that in our Gmail or Outlook inboxes, most of the spam just ends up in a folder. Well, there's not some human at Microsoft or Google manually labeling messages as they come in, deciding spam or not spam. They're figuring out, using code and nowadays using AI, that a message looks like spam and should therefore go in the spam folder, which is probably correct 99% of the time, though there's potentially a failure rate. Other applications include handwriting recognition. Certainly Microsoft and Google don't know the handwriting style of every one of us in this room, but their systems have been trained on enough other humans' handwriting that odds are your handwriting and mine look similar to someone else's. And so, with very high probability, they can recognize something like "Hello, world" here as that same digital text. All of us are into streaming services nowadays, Netflix and the like. Well, they're getting pretty darn good at knowing that if I watched X, I might also like Y. Why? Because of other things I've watched before and maybe upvoted or downvoted, and maybe because of what other people have watched who like movies or TV shows similar to the ones I like. So that too is AI. There's no if-else-if-else construct for every movie or TV show in their database. It's figuring out much more organically and dynamically what you and I might like. And then there are all the voice assistants today: Siri, Alexa, Google Assistant, and the like. They too don't recognize your voice, or know in advance what questions you're going to ask, via some massive if-else-if that has all possible questions in the world just waiting for you or me to ask them. That too, of course, is dynamically generated.
But that's getting a bit ahead of ourselves. Let's rewind in time. Some of the parents in the audience might remember this here game, among the first arcade games in the world, namely Pong. It was a black-and-white game with two players, a paddle on the left and a paddle on the right; using some kind of joystick or trackball, they move their paddles up and down, and the goal is to bounce the ball back and forth and ideally return it every time. Otherwise, you lose a point. This is just an animated GIF, so there's nothing really dramatic to watch. It's going to stay at 15 to 12, looping again and again. Nothing interesting is going to happen, but this is a nice example of a game that lends itself to being solved with code. And indeed, it's been in our vernacular for years to play against not just the computer but the CPU, the central processing unit, or really the AI. And yet AI does not need to be nearly as sophisticated as the tools we now see. For instance, here's a successor to Pong known as Breakout. It's similar in spirit, but there's just one paddle and one ball, and the goal is to bounce the ball off of these colorful bricks, and you get more and more points depending on how high up you can get the ball. All of us as humans, even if you've never played this old-school game, probably have an instinct as to where we should move the paddle. If the ball just left it going this way, which direction should I move the paddle? Probably to the left, and indeed, that'll catch it on the way down. So you and I just made a decision that's fairly instinctive, ingrained in us, but we could take all the fun out of the game and start to quantify it, or describe it a little more algorithmically, step by step. In fact, decision trees are a concept from economics, strategic thinking, and computer science as well.
That's one way of solving this problem such that you will always play this game well if you just follow the algorithm. So, for instance, how might we implement code, or a decision-making process, for something like Breakout? Well, you ask yourself first: is the ball to the left of the paddle? If so, you know where we're going: move the paddle left. But what if the answer is no? You don't just blindly move the paddle to the right, probably. What should you then ask? >> Are we right below the ball? >> Are you right below the ball? If the ball's coming right at you, you don't want to naively go to the right and then risk missing it. So there's another question to ask: is the ball to the right of the paddle? That's a yes/no question. If yes, then okay, move it to the right. But if not, you should probably stay exactly where you are and not move the paddle. All right, so that's fairly deterministic, if you will, and we can map it to code using pseudocode in, say, a class like CS50. In a loop, while the game is ongoing: if the ball's to the left of the paddle, move the paddle left; else if the ball's to the right of the paddle (sorry for the typo there), move the paddle right; else, don't move the paddle. And so these decision trees, as we drew them, have a perfect mapping to code, or really pseudocode in this particular case, which is to say that's how the people who implemented Breakout or Pong, and the computer player therein, surely coded it up. It was as straightforward as that. But how about something like tic-tac-toe, which some of you might have played for just a moment on the way in, on the scraps of paper you might have had? Here we have a tic-tac-toe board with two O's and two X's. For those unfamiliar, tic-tac-toe, otherwise known as noughts and crosses, is a matter of going back and forth, X's and O's, between two people.
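The Breakout pseudocode above maps almost line for line onto Python. This is a minimal sketch under simplifying assumptions: positions are plain x-coordinates, the paddle moves one unit per frame, and a list of ball positions stands in for "while the game is ongoing". The names `paddle_move` and `play` are hypothetical.

```python
def paddle_move(ball_x: int, paddle_x: int) -> int:
    """The decision tree: -1 = move left, +1 = move right, 0 = don't move."""
    if ball_x < paddle_x:    # is the ball to the left of the paddle?
        return -1
    elif ball_x > paddle_x:  # is the ball to the right of the paddle?
        return 1
    else:                    # the ball is right above us: stay put
        return 0


def play(ball_xs, paddle_x: int) -> list[int]:
    """Apply the decision tree once per frame; return the paddle's positions."""
    positions = []
    for ball_x in ball_xs:  # stands in for "while the game is ongoing"
        paddle_x += paddle_move(ball_x, paddle_x)
        positions.append(paddle_x)
    return positions
```

Every situation is covered by an explicit branch, which is exactly why no learning is needed here.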
The goal is to get three O's in a row or three X's in a row, vertically, horizontally, or diagonally. So this is a game here in mid-progress. Let's consider how you could solve the game of tic-tac-toe like a computer, like an AI, might. You could ask yourself: can I get three in a row on this turn? If yes, play in the square that gets three in a row. It's as straightforward as that. If you can't, though, what should you ask? Can my opponent get three in a row on their next turn? Because if so, you should probably block their move, so that at least you don't lose now. But tic-tac-toe, relatively simple as it is, gets a little harder to play when it's not obvious where you should go. All of us as humans, if you grew up playing this game, probably had heuristics you used, like you really like the middle or the top corner or something like that. So we can probably make our next move quickly, but is it optimal? And I dare say, if back in childhood or more recently you've ever lost a game of tic-tac-toe, you're just bad at tic-tac-toe, because logically there's no reason you should ever lose a game of tic-tac-toe if you're playing optimally. At worst you should force a tie, and at best you should win. So think of that the next time you play tic-tac-toe and lose: you're doing something wrong. But in your defense, it's because the question mark is not obvious: how do I answer it when the answer, move for the win or move for the block, is not right in front of me? Well, one algorithm you could have been using all these years is called minimax. And as the name suggests, it's all about minimizing something and/or maximizing something else. So here too, let's take a bit of the fun out of the game and turn it into some math, but relatively simple math. Here we have three representative tic-tac-toe boards: O has won here, X has won here, and the middle is a tie.
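The win-then-block questions above translate directly into code. This is a hedged Python sketch, assuming a board stored as a list of 9 cells (indices 0 to 8, row by row) with `" "` for an empty square; `winning_square` and `choose_move` are hypothetical names. When neither question applies, it returns `None`, which is exactly the "?" case that minimax addresses.

```python
# The eight ways to get three in a row, as board indices.
LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),  # rows
    (0, 3, 6), (1, 4, 7), (2, 5, 8),  # columns
    (0, 4, 8), (2, 4, 6),             # diagonals
]


def winning_square(board, player):
    """Return an empty square that gives `player` three in a row, or None."""
    for line in LINES:
        cells = [board[i] for i in line]
        if cells.count(player) == 2 and cells.count(" ") == 1:
            return line[cells.index(" ")]
    return None


def choose_move(board, me, opponent):
    """Can I win this turn? Else, can my opponent win next turn? Then block."""
    square = winning_square(board, me)
    if square is not None:
        return square
    return winning_square(board, opponent)  # None means: not obvious, think harder
```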
It doesn't matter how we score these boards, but we need a consistent system. So I'm going to propose that anytime O wins, the score of the game is -1; anytime X wins, the score is +1; and anytime nobody wins, the score is 0. So these boards have the values -1, 0, and 1. The goal in tic-tac-toe now is for X to maximize its score, because 1 is the biggest value available, and O's goal in life is to minimize its score. That's how we take the fun out of the game: we turn it into math, where one player just wants to maximize the score and the other just wants to minimize it. All right, a quick sanity check. Here's a board; it's not color-coded. What is the value of this board? >> One, because X has in fact won, straight down the middle. So an X win is 1, an O win is -1, and otherwise it's a tie, 0. Now let's see how, with those principles in place, we go about figuring out where to play in tic-tac-toe. Here's a fairly easy configuration. There are only two moves left. It's not hard to figure out how to win or tie this game, but let's use it for simplicity. It's O's turn, say. So where can O go? That invites the question: what is the value of the board, or how does O minimize the value of the board? O can go in one of two places, top left or bottom middle. Which way should O go? Well, if O goes top left, we should consider what the value of that board is. Is it minimal? If O goes here, X is obviously going to go here, and X is therefore going to win. So the value of that final board is 1. And since there's only one way logically to get from this configuration to that one, we might as well call the value of this board, by transitivity, 1 as well. So O probably doesn't want to go there, because that's a maximal score and O wants to minimize.
Over here, though, if O goes bottom middle, then X is going to go top left, and now no one has won. So the value of this board is thus >> Zero. We might as well treat this one as 0 too, because that's the only way to get there logically. So now O, more mathematically and logically, can decide: do I want an endpoint of 1 or an endpoint of 0? Well, 0 is the better option, because it's less than 1 and thus the minimal possibility. So O is going to go bottom middle and at least force a tie. And that's the evidence that if you humans are ever losing at tic-tac-toe, you have not followed this here logic. You could probably do it if there are just two moves left. But the catch is, let's rewind to three moves left here. There are three blanks, and I've kind of zoomed out. The catch is that the decision tree gets a lot bigger the more moves are left. It gets bigger and bushier, multiplying in width with each additional move left. That's fine if you have the luxury of writing it down on a piece of paper. But if you're doing this in your head while playing against a fifth grader, if I may, you're probably not drawing out all of the various boards and configurations, trying to play optimally. You're going with some instinct, and your instincts might not be aligned with a tried-and-true algorithm like minimax, which will ideally get you to win the game, or at least force a tie if you can't win. But tic-tac-toe is not that hard. I mean, how many different ways are there to play tic-tac-toe? We could write a computer program to play tic-tac-toe optimally. We could use code like this: if the player is X, for each possible move, calculate the score for the board at that point, and then choose the move with the highest score. So you just try all possibilities mathematically, and then you make the decision.
Most of us are not doing that in our heads, but we could. Else, if the player is O, you essentially do the same thing but choose the move with the minimal possible score. So that's the code for implementing tic-tac-toe. How many ways are there to play tic-tac-toe, though? Well, 255,168, which means if we were to draw that tree, it would be pretty darn big, and it would take you quite a bit of time to think through all those possibilities. So in your defense, you're maybe not that bad at tic-tac-toe; it's just a harder game than you thought. But what about games with which we might as adults be more familiar? What about chess, which is often used as a measure of how smart a computer is, whether it was IBM's Deep Blue back in the day or something else? Well, if we consider even just the first four moves of chess, whereby I mean white goes and black goes, and then they each go three more times, so four pairwise moves, how many different ways are there to play? It turns out some 85 billion, just to get the game started. That's a lot of decisions to consider and then make. How about the game of Go, if familiar? Consider the first four moves: 266 quintillion possibilities. And this is where we as humans, and even our modern PCs and Macs and phones, kind of have to throw up our hands, because I don't have this many bytes of memory in my computer, and I don't have this many hours left in my life to crunch all of those numbers and figure out the solution. And so AI comes in where it's no longer as simple as writing if-elses and loops, and no longer as simple as trying all possibilities. You instead need to write code that doesn't solve the problem directly but, in some sense, indirectly. You write code so that the computer figures out how to win.
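Here is one way the minimax pseudocode above could look as runnable Python, using the scoring convention from earlier (+1 for an X win, -1 for an O win, 0 for a tie). It is a sketch, not the lecture's code: boards are assumed to be 9-character strings with `" "` for empty squares, and `functools.lru_cache` memoizes positions so the full game tree is cheap to search. The hypothetical `count_games` helper enumerates every sequence of moves that ends in a win or a full board, which is one way to check the 255,168 figure.

```python
from functools import lru_cache

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
         (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]


def winner(board: str):
    """Return "X" or "O" if that player has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


def score(board: str) -> int:
    """Score a finished board: +1 if X won, -1 if O won, 0 for a tie."""
    w = winner(board)
    return 1 if w == "X" else -1 if w == "O" else 0


def other(player: str) -> str:
    return "O" if player == "X" else "X"


@lru_cache(maxsize=None)
def minimax(board: str, player: str) -> int:
    """Value of `board` with `player` to move, if both sides play optimally."""
    if winner(board) is not None or " " not in board:
        return score(board)
    values = [minimax(board[:i] + player + board[i + 1:], other(player))
              for i, cell in enumerate(board) if cell == " "]
    # X chooses the highest score; O chooses the lowest.
    return max(values) if player == "X" else min(values)


def best_move(board: str, player: str) -> int:
    """Index of the empty square that minimax prefers for `player`."""
    moves = [i for i, cell in enumerate(board) if cell == " "]
    pick = max if player == "X" else min
    return pick(moves, key=lambda i: minimax(board[:i] + player + board[i + 1:], other(player)))


@lru_cache(maxsize=None)
def count_games(board: str = " " * 9, player: str = "X") -> int:
    """Count distinct games: move sequences that end in a win or a full board."""
    if winner(board) is not None or " " not in board:
        return 1
    return sum(count_games(board[:i] + player + board[i + 1:], other(player))
               for i, cell in enumerate(board) if cell == " ")
```

For a two-moves-left position like the one in lecture, say `" XOOXXX O"` with O to move, `best_move` picks square 7 (bottom middle), because square 0 lets X complete the middle column.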
Perhaps you do so by showing it configurations of the board that are a good place to be in, that is, promising, and maybe showing it boards that it doesn't want to find itself in, because they're going to lead it to lose. In other words, you train it, but not necessarily exhaustively. And this is what we mean nowadays by machine learning: writing code via which machines learn how to solve problems, generally by being trained on massive amounts of data and then, given new problems, looking for patterns via which they can apply that past training data to the problem at hand. Reinforcement learning is one way to think about this. In fact, we as humans use reinforcement learning, which is a type of machine learning, sort of all of the time. A fun demonstration to watch here involves pancakes. Let me go ahead and pull up a short recording of an actual researcher in a lab who's trying to teach a robot how to flip pancakes. We'll see in this video that there's a robot with an arm that can go up, down, left, right. This, of course, is the human, the researcher, and he's going to show the robot, one or more times, how to flip a pancake. He crosses his fingers and, okay, it seems to have done it well. He does it again. Not quite the same, but pretty good. And now he's going to let the robot try to figure out how to flip that pancake on its own after having trained it just a few times. The first few times, odds are the robot's not going to do super well, because it really doesn't understand what the human just did or what the whole purpose of it is. But, and here's the key detail with reinforcement learning, behind the scenes the human is probably rewarding the robot when it does a good job: the better it flips, the more it gets rewarded, say by hitting a key and giving it a point, the digital equivalent of a cookie.
Or conversely, every time the robot screws up and drops the pancake on the floor, it gets a proverbial slap on the wrist, a punishment, so that it does less of that behavior the next time. Any of you who are parents, which by definition today many of you are, have odds are trained children at some point, whether with something like this or just verbal approval and reprimands, to do more of one thing and less of another. What you're seeing in the backdrop there is a quantization of the movements, the X, Y, and Z coordinates, so that the robot can do more of the X's and Y's and Z's that led it to some kind of reward. And now, after some 50 trials, the robot seems to be getting better and better, such that, like a good human (we'll see if I can do this without embarrassing myself), it can flip the thing. That's pretty good. That was pretty... I've been doing this a long time. Okay, so we've seen how you might reinforce learning in that kind of domain. Let's take an example that's familiar to those of you who are gamers: any game where there's some kind of map or world that you need to explore, up, down, left, right, maybe trying to get to the exit. So here, simplistically, is the player at the yellow dot. Here, in green, is the exit of the map, and you want to get to that point. And maybe somewhere else in this world there are a lot of lava pits, and you don't want to fall into one, because you lose a life or a point, or there's some other penalty or punishment associated with that. We, with this bird's-eye view, can obviously see how to get to the green dot. But if you're playing a game like Zelda or something like that, all you can do is move up, down, left, right and sort of hope for the best. So let's do just that. Suppose the yellow dot randomly chooses a direction and goes to the right.
Well, now we can take away a life, take away a point, or effectively punish it, so that it knows: don't do that. So long as the player has a bit of memory, whether it's the human player or the code implementing this, we can mark that move with a dark red line, meaning don't do that again, because it didn't lead to a good outcome. So maybe next time the yellow dot goes this way and this way and then, ah, didn't realize that's actually the same lava pit. But that's fine. Use a little bit more memory and remind me: don't do that, because I just lost a second life in this story. Maybe it goes this way next time. Ah, now I need to remember: don't do that either. Effectively, I'm being punished for doing the wrong thing, or, as we'll soon see, rewarded for doing more of the successful thing. And just by chance, maybe I finally make my way to the exit this way, and so I can be rewarded for that. Now I got 100 points or whatever it is, the high score. So now, as per these green lines, I can just follow that path again and again, and I can always win this game, kind of like me nowadays, some 30 years later, playing Super Mario Brothers: I can get through all the warp levels because I know where everything is, since for some reason that's still stored in my brain. Is this the best way to play? Am I as good at Super Mario Brothers as I might think? What's bad about this solution? Yeah. >> Exactly. Yeah, I've moved many more times than I need to. And just for fun today, what grade are you in? >> Seventh. >> Seventh grade. Wonderful. So the seventh grader's observation is exactly that: we could have taken a shorter path, essentially that way, albeit making some straight moves. And we're never going to find that shorter path, and never going to get the highest score possible, if I just keep naively following my well-trodden path. So how do we break out of that mold? You can see this even in the real world.
Another sort of personal example: I'm the type of person who, for some reason, if I go to a restaurant for the first time, choose a dish off the menu, and really like it, will never again order anything else off that menu, because I know that dish is good. But there could be something even better on the menu, and I'm never going to explore it because I'm sort of fixed in my ways, as some of you, judging from the smiles, might be too. But what if we took advantage of exploring just a little bit? There's this principle of exploring versus exploiting when it comes to using artificial intelligence to solve problems. Up until now, I've just been exploiting knowledge I already have. Don't go through the red walls. Do go through the green walls. Exploit, exploit, exploit, and I will get to a final solution. But what if I sprinkle in a little bit of randomness along the way? Maybe 10% of the time, as represented by this epsilon variable, I, as the computer in the story, generate a random number between zero and one, and if it's less than epsilon, which is going to happen 10% of the time, I make a random move instead of one that I know will get me closer to the exit. Otherwise, I make the move with the highest value. Now, this isn't necessarily going to win me the game that first time, but if I play it enough times and insert some of this randomness, I might very well find a better solution and therefore be a better player overall. If just 10% of the time I ordered something else off the menu, I might find an amazing dish that I otherwise wouldn't have discovered. And so indeed, using that approach, we can finally find a more optimal path through the maze, a shorter one, presumably therefore maximizing our score and doing even better than we would have by just exploiting the same knowledge.
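The explore-versus-exploit rule just described fits in a few lines of Python. This is a minimal sketch, not the code behind the lecture's demo: the names `choose_move` and `update`, the dictionary of move values for a single spot on the map, and the step size of 0.5 are all hypothetical choices for illustration.

```python
import random


def choose_move(values, epsilon=0.1, rng=random):
    """Epsilon-greedy choice among moves.

    `values` maps each move (e.g. "up") to how good we currently believe it is.
    With probability `epsilon` we explore (random move); otherwise we exploit.
    """
    if rng.random() < epsilon:
        # Explore: roughly 10% of the time when epsilon is 0.1.
        return rng.choice(list(values))
    # Exploit: the move with the highest known value.
    return max(values, key=values.get)


def update(values, move, reward, step_size=0.5):
    """Nudge a move's value toward the reward (or punishment) it just produced."""
    values[move] += step_size * (reward - values[move])


# Hypothetical values for the moves available from one spot on the map.
values = {"up": 0.0, "down": 0.0, "left": 0.0, "right": 0.0}
update(values, "right", -10)  # fell into the lava: punish "right"
update(values, "up", 100)     # reached the exit: reward "up"
print(values["right"], values["up"])     # -5.0 50.0
print(choose_move(values, epsilon=0.0))  # up (pure exploitation)
```

Setting `epsilon` to 0 reproduces the always-order-the-same-dish behavior. A fuller agent would keep one such table per position on the map and also account for future rewards (as Q-learning does), which this sketch leaves out.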
So you can see this even in the game of Breakout, especially if you write code to play the game for you. Let me pull up another video recording of an AI playing Breakout. What this AI is doing is essentially figuring out, maybe more intelligently than you or I could, how to play this game optimally. And just like with the pancake-flipping robot, there's some notion of scoring and rewards and penalties here. Right now, the paddle is just doing random stuff. It doesn't really know how to play the game yet, but it realizes after 200 episodes that, oh, my score goes up if I hit the ball, and it goes down equivalently if I miss it. It's still a little twitchy. It doesn't quite understand what it's supposed to do and why. But if you do this again and again and it's rewarded and/or punished enough, you'll see that it starts to get pretty good, closer to what a good human might do. But here's where the algorithm gets a little creepy. If you let it play long enough, or if you and I, the humans, play long enough, you might find a certain trick to the game. I dare say the AI becomes a bit scarily sentient, in that it turns out that if you're smart enough to break through to the top row, you can let the game play itself for you and maximize your score without even touching the ball. I do find it a little creepy that it figured out how to do that without being told. But it's just a logical continuation of rewarding it for good behavior and punishing it for bad behavior. So the next time you have occasion to play Breakout, consider that kind of strategy: as opposed to doing more of the work yourself, let the computer do it for you instead. Well, what else is there to consider in this world of AI in the context of machine learning? There's specifically a category of learning that's supervised, and we've been using it for years.
And in fact, our first example of spam early on was certainly supervised. Why? Because it was you and I who were putting the email into the spam folder. To this day, maybe once a day, I hit the keyboard shortcut in Gmail to say, "Ah, this is spam. You should have caught this." And that trains Google's algorithm further, assuming it's not just little old me but maybe thousands of people tagging that same kind of email as spam. That's supervised learning, in that there's a human in the loop doing at least something. So spam detection might be one of those. But the catch is that labeling data manually in that way just doesn't scale very well. That would be akin to having someone at Google or Microsoft label every email, or someone at Netflix do the same for all of the videos out there. It's expensive in terms of human power. And there are certainly problems out there with so much data that it's just not realistic for humans to label millions, billions of pieces of it. We've got to move to an unsupervised model. And so this is where the world starts to consider deep learning, solving problems using code whereby you don't even have humans in the loop in quite the same way. Neural networks, inspired by the world of biology, are the inspiration for what is the state of the art, even underlying today's rubber duck and, more generally, these things called large language models like ChatGPT and the like. So pictured here, somewhat abstractly, is a neuron, something in the human body that transmits a signal electrically, say from left to right. And if you have multiple neurons, they can intercommunicate, so that if I think a thought, I know how to raise my hand, because some kind of message has gone electrically from my head to this extremity here. That's in essence what I remember from nth-grade biology. But as computer scientists, we sort of abstract all of this away.
So instead of drawing these as neurons, let's just start drawing neurons as little circles. And if they have connective tissue of sorts between them, we'll just draw a straight line, an edge, between them. This is what a computer scientist would call a graph. If you have two such neurons over here leading to one neuron here, you can think of this as maybe two inputs to a problem and one output. So we can represent the notion of problem solving, which is what CS50 and intro courses more generally are all about. So let's solve a problem with a neural network without necessarily training it in advance, just letting it figure out how to answer this question. Here's a very simple two-dimensional world, an XY grid, and here are two dots. The dots in this world are either blue or red, but I have no idea yet what makes a dot blue or red. However, if you train me on those two dots, I bet I could come up with predictions, especially if you let me label this world in terms of x-coordinates on the horizontal and y-coordinates on the vertical. And then, you know what? We can think of this neural network very simply as representing the x-coordinate here, the y-coordinate here, and the answer I want to get is quote-unquote red or blue, or zero or one, or true or false, however you want to think of the representation. So how do I get from a specific XY coordinate to a prediction of color if I only know the coordinates? Well, from the get-go, maybe the best I can do is just divide the world into blue dots on the left and red dots on the right, a best-fit line, if you will, based on very minimal data. Of course, if you give me a third dot, it's going to be pretty easy to realize that I was a little too hasty. That line is not vertical. So maybe we pivot the line this way, and now I'm back in business. Now I can predict with higher probability, based on X and Y, what color the next dot will be.
Give me enough of these dots, and I can come up with a pretty good best-fit line. It's not perfect, and here's a hint at why AI is not perfect, but 99% of the time, maybe I'll be able to predict correctly. And I can do even better if you let me squiggle the line a little bit and make it more than just a simple slope. So what is it we're really doing when implementing this neural network, albeit simplistically, with just three neurons? Essentially, we're trying to come up with three values, three parameters: an A, a B, and a C. And what do those represent? Really just a solution to this formula. The line we drew can be represented, if you think back to high school math, with a formula along these lines: a times x, plus b times y, plus some constant c. And we can just arbitrarily conclude that if that value gives me a number greater than zero, predict it's going to be blue; otherwise, predict it's going to be red. Just like with tic-tac-toe, we can map our mathematics to the actual problem we care about by defining the world in this way. And so if you give me enough data points, I can come up with answers for that A, that B, that C, the so-called parameters of the neural network. Now, in reality, neural networks are not composed of just three neurons and a couple of edges. They look a little something more like this. In practice, they've got billions of these things here on the screen, in which case pretty much every one of these edges represents some mathematical value that was contrived based on lots and lots of training data. I, the computer scientist, might know what these neurons over here represent, because those are my inputs, three in this case, and I know what this one represents at the end.
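To make the A, B, C idea concrete, here is a hedged Python sketch. The prediction rule (a times x, plus b times y, plus c greater than zero means blue) comes from the lecture; how the parameters get learned does not, so this sketch swaps in the classic perceptron update rule as one simple way to fit a line. The dots are made-up training data, and `predict` and `train` are hypothetical names.

```python
def predict(a, b, c, x, y):
    """Return "blue" if a*x + b*y + c > 0, otherwise "red"."""
    return "blue" if a * x + b * y + c > 0 else "red"


def train(points, epochs=1000):
    """Learn a, b, c with the perceptron rule (one simple way to fit a line)."""
    a = b = c = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y, color in points:
            if predict(a, b, c, x, y) != color:
                # Tilt and shift the line toward classifying this dot correctly.
                sign = 1 if color == "blue" else -1
                a += sign * x
                b += sign * y
                c += sign
                mistakes += 1
        if mistakes == 0:  # every training dot is on the correct side of the line
            break
    return a, b, c


# Made-up dots: blue ones toward the left, red ones toward the right.
points = [(1, 3, "blue"), (2, 5, "blue"), (1, 1, "blue"),
          (5, 2, "red"), (6, 4, "red"), (4, 0, "red")]
a, b, c = train(points)
print(a, b, c)
print(predict(a, b, c, 0, 2))  # prediction for a new dot
```

The perceptron is only guaranteed to settle on a line when the two colors can be separated by a straight line, as they can here; squigglier boundaries call for more neurons.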
If you sort of took the hood off this thing and looked inside the neural network, even though there'd be millions or billions of numbers in there, I can't tell you what this neuron represents or why this edge has this weight. It's because of the massive amount of training data that that's just how the math works out. And if you feed it more data, some of those parameters might change further. So the graph ultimately might look quite different, but my inputs and my outputs are what I use to solve the problem. So if you want to predict, say, rainfall from humidity and pressure, you can have two inputs giving that one output. Or advertising dollars spent in a given month might predict sales, again by having trained on such volumes of data. And when we come full circle now to something like CS50's rubber duck and large language models like Claude and Gemini and ChatGPT, what's really happening? This is all hot off the press. Screenshotted here are some of the research papers that have driven a lot of this advancement in recent years. From OpenAI, for instance, you have a generative pre-trained transformer, which is a lot to say, but there's the GPT in ChatGPT. Essentially, this is a neural network that's been trained on large volumes of textual information, and it gives us the interactive chat feature that we have in the class and that we all have more generally in ChatGPT itself. So here's an example of what's actually happening underneath the hood of these GPTs. Here's a paragraph that, up until recent years, was kind of hard to complete where it ends with dot, dot, dot: "Massachusetts is a state in the New England region of the northeastern United States. It borders on the Atlantic Ocean to the east. The state's capital is..." Now, most anyone living in Massachusetts probably knows that answer.
But if this AI has just been trained on lots and lots of data, there are probably a lot of people who say Massachusetts in one part of a sentence and then the answer, which I won't say yet, in the other part. But in this example, given that the question we're asking is so far from some of the useful keywords, up until recently this was a hard problem to solve, because there was so much distance. Moreover, there are these nouns being used to substitute for the proper noun. We suddenly start calling it a state, and we call it a state again down here, and it wasn't necessarily obvious to AIs that we're talking about the same thing, as it would be if it were just "city, state," where you'd have much more proximity. So in a nutshell, what we now do to solve problems like these is first break down a sentence, the training data and the input alike, into an array or list of the words themselves. We then come up with a representation of each of these words. For instance, the word Massachusetts, if you encode it in a certain way, is going to be represented with an array or vector of numbers, floating-point values. So many, in fact, that one model would use these 1,536 floating-point numbers to represent Massachusetts, essentially in an n-dimensional space, so not just an XY plane but somewhere sort of virtually out there. And then, and this has been the key to these GPTs, an attention is calculated based on all of that data, whereby in this picture the thicker lines imply more of a relationship between two words. So Massachusetts and state are inferred as having a thicker line, a higher attention from one word to the other, whereas our "a"s and our "is"es and our "the"s have thinner lines, because they just don't give the AI as much signal as to what the answer to this question is. Meanwhile, when you then feed in that sentence, "The state's capital is," one word per neuron here, the goal is to get the answer to that question.
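The "thicker line means more attention" idea can be sketched with scaled dot products followed by a softmax, the core arithmetic of transformer attention. This is a toy under stated assumptions: the three-number vectors below are invented stand-ins for the 1,536-number representations mentioned above, `attention_weights` is a hypothetical name, and real models also multiply each vector by learned query, key, and value matrices, which this sketch omits.

```python
import math


def attention_weights(query, keys):
    """Softmax of scaled dot products: how strongly `query` attends to each key."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    top = max(scores)  # subtract the max before exponentiating, for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]  # weights are positive and sum to 1


# Invented 3-dimensional "embeddings" (real ones have far more dimensions).
vectors = {
    "Massachusetts": [0.9, 0.1, 0.3],
    "state": [0.8, 0.2, 0.4],
    "the": [0.1, 0.9, 0.0],
}
words = list(vectors)
weights = attention_weights(vectors["Massachusetts"], [vectors[w] for w in words])
for word, weight in zip(words, weights):
    print(f"{word:>13}: {weight:.2f}")
```

With these made-up numbers, "state" ends up with a thicker line from "Massachusetts" than "the" does, because its vector points in a more similar direction.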
And even this is a much smaller representation than the actual neural network would be. But in effect, all of these LLMs, large language models, are just statistical models: what is the highest-probability word to spit out at the end of this paragraph, based on all of the Reddit posts and Google search results and encyclopedias and Wikipedia pages that the model has found online and been trained on? Well, the answer hopefully will be Boston. But of course, 1% of the time, maybe less than that, the answer might not be correct. Even CS50's own duck is fallible, even though we've written lots of code to try to put downward pressure on those mistakes. And those mistakes are what we'll call, lastly, hallucinations, where the AI just makes something up, perhaps because some crazy human on the internet made something up and it was interpreted as authoritative, or just by bad luck, because with a bit of that exploration, 10% of the time or 1% of the time, the AI sort of veered this way in the large language model, in the neural network, and spit out an answer that is in fact not correct. And so I thought I'd end for today on this final note, a poem with which many of us might have grown up, from Shel Silverstein, about the homework machine, which years ago somehow sort of predicted the state we would be in with these AI machines. He said, "The Homework Machine, oh, the Homework Machine, most perfect contraption that's ever been seen. Just put in your homework, then drop in a dime, snap on the switch, and in ten seconds' time, your homework comes out, quick and clean as can be. Here it is: 'nine plus four?' and the answer is 'three.' Three? Oh me... I guess it's not as perfect as I thought it would be." This, then, was CS50. See you next time. All right, this is CS50, and this is already week 8.
Up until now, of course, in so many of our problem sets, we've been writing command-line code in a black-and-white terminal window, and everything is very keyboard-based, very textual. But of course, the apps that you and I use every day are in the form of a web browser or on our phone. So today, and really for the rest of the semester, we transition to using all of the building blocks that we've been accumulating over the past few weeks, but redeploying them in the context of web apps and, for your final project, if you so choose, even mobile apps as well. Today we're going to understand how the internet that we use every day actually works. We're going to introduce you to a language called HTML, which is the language in which web pages are written; a language called CSS, which is the language with which web pages are stylized; and then lastly JavaScript, which of those is the only actual programming language. Even though we'll spend relatively little time on it, you'll see that syntactically and functionally it's very similar to C, to Python, and to other languages that have come before. All right. So we use the internet every day. What exactly is it? Well, in the simplest form, we've got networks in the world, and networks are interconnections of computers, whether with wires or wirelessly. You have a network at home nowadays, for the most part. You certainly have a network on a campus like this. In corporations, you have networks. So, interconnections of computers. As soon as you start networking the networks, if not networking the networks of networks, you have in effect the internet: this global interconnection of computers, servers, devices, and so many other things that nowadays we literally take for granted every day. But how does it actually work, and where did it come from?
Well, if we rewind to 1969, the internet in its original form was really something known as ARPANET, for the Advanced Research Projects Agency, a project from the Department of Defense that was designed to interconnect the limited supercomputers we had back then, which were otherwise geographically inaccessible to so many researchers and others. The internet, or ARPANET, really just looked like this, with UCLA and just a few other nodes, so to speak, interconnected somehow. Just a year or so later, we had Harvard and MIT and others on the East Coast. And if we fast-forward to today, of course, we can route data most anywhere in the world. In fact, the world is now filled with these things called routers. A router is just a computer, a server, that routes data up, down, left, right, geographically. In the real world, it might send data out this wire here, out this wire here, out this wire, or out this wire. And just to make more real what we're about to talk about when we talk about networks of computers and eventually the internet, we engaged some of our teaching fellows over the past few years to perform a little skit of sorts for us using Zoom, if you will. Consider each of the teaching fellows, the humans you're about to see, as representing a router, a device on the internet whose purpose in life is to route data. And what they're routing is what we're going to start calling packets, packets of information, which metaphorically you can think of as just a little white envelope like this, the kind we use to send things via snail mail, via the US Postal Service or internationally beyond. So I give you, in just 60 seconds or so, what it means to send a packet on the internet, for instance from Phyllis in the bottom right-hand corner to a familiar face, Brian, at top left. If we could dim the lights, if only to be dramatic. Thank you. Sure, we can clap for that.
And we actually should clap for that, because you're seeing the sort of final version, which looked kind of perfect, but they were all smiling and clapping because it took us so many damn takes to actually get the coordination correct. But for now, assume that it was in fact correct. Notice what's among the takeaways from even that little skit: the packet, the envelope from Phyllis to Brian, could have taken any number of paths. It could have gone up and then to the left. It could have gone left and then up. It could have zigzagged and the like. And that's actually representative of how the world now looks, because there are so many wires and so many wireless connections. There are a lot of ways that data can travel from point A to point B, and it turns out it's not even necessarily going to take the shortest distance. It might take the least expensive one, or the path might just be the result of how some humans, or some servers automatically, have configured the routes from point A to point B. So let's consider how the data is actually getting there. Long story short, all of those routers, and indeed all devices on the internet, including the ones in your pocket or on your laps, speak a language, more technically a protocol, nowadays known as TCP/IP. This is actually a pair of protocols, where a protocol is a set of conventions that governs how computers behave on the internet. In the human world, we have protocols as well. For instance, when I meet someone for the first time, I very often instinctively extend my hand, hoping that they too will extend their hand and shake. That's a human protocol, in that it governs how two people in that case intercommunicate. Well, servers have the same kinds of protocols, but they're all text-based or bit-based instead of, of course, physical. TCP and IP are two different protocols that solve two different problems. Let's focus on the last of them first.
So IP, short for Internet Protocol, is simply a protocol that gives all of us a unique address in the world. In other words, there are these things called IP addresses, numeric addresses that literally every computer on the internet has in order to uniquely identify it. In the real world, we have addresses too. For instance, this building here, Memorial Hall, is at 45 Quincy Street, Cambridge, Massachusetts 02138, USA, and theoretically that unique identifier should get an envelope in the physical world to this location from any other. IP, as applied to the internet, just means that devices, Macs, PCs, phones, and everything else on the internet, similarly have a unique identifier, known as an IP address. It's a number, but it's typically formatted in dotted-decimal notation, so to speak, so it's something dot something dot something dot something. And just as a bit of trivia, each of these placeholders represents a value from 0 to 255, and there are apparently four such values. Doing some quick week-zero math, if each of those values can be 0 to 255, how many bits is an IP address, presumably? >> So eight bits per number. And how many numbers were there? >> So 32 bits, because if you're counting from 0 to 255, that's 256 total possibilities. That's 2 to the 8th, which means 8 bits, 8 bits, 8 bits, 8 bits. So IP addresses are 32 bits, a little trivia that's germane only insofar as it does limit how many total devices we could seem to have in the world. If you've got only 32 bits, how high can you count, roughly? >> Two. >> So 2 to the 32nd power, which we've generally ballparked as 4 billion, which is to say you can have 4 billion devices total, it would seem, on the internet. That's a big number, but there are also a lot of humans nowadays, and odds are most everyone in this room has at least two devices to their name, maybe a phone and the laptop with which you're taking the course.
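The four-numbers-of-8-bits arithmetic can be checked in Python. This sketch packs a dotted-decimal address into a single 32-bit integer and back; `ip_to_int` and `int_to_ip` are hypothetical names, and Python's standard `ipaddress` module already does the same job (for example, `int(ipaddress.IPv4Address("1.2.3.4"))`).

```python
def ip_to_int(address):
    """Pack a dotted-decimal IPv4 address such as "1.2.3.4" into one 32-bit integer."""
    octets = [int(part) for part in address.split(".")]
    if len(octets) != 4 or not all(0 <= octet <= 255 for octet in octets):
        raise ValueError(f"not an IPv4 address: {address!r}")
    value = 0
    for octet in octets:
        value = (value << 8) | octet  # each number contributes exactly 8 bits
    return value


def int_to_ip(value):
    """Unpack a 32-bit integer back into dotted-decimal notation."""
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))


print(ip_to_int("1.2.3.4"))                       # 16909060
print(int_to_ip(16909060))                        # 1.2.3.4
print(ip_to_int("255.255.255.255") == 2**32 - 1)  # True: the largest 32-bit value
print(2**32)                                      # 4294967296 possible IPv4 addresses
```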
Maybe even more devices, thanks to the Internet of Things, like smart home devices. We have so many IP addresses being assigned to things. So long story short, the world is gradually transitioning from this version, IPv4, to IPv6, which instead of 32 bits uses 128 bits, which is crazy large and gives us more than enough IP addresses for the foreseeable future. To be fair, we've been talking about transitioning from v4 to v6 for like 20 or 30 years, and it's still gradually in motion. But for simplicity, in class and in general, we'll still use IPv4, if only because it's a little easier to wrap your mind around. Now, this is admittedly a pretty arcane diagram, but this is the diagram, ASCII art, if you will, that's in the official specification of what we mean by an IP datagram. More colloquially, this is what a packet actually looks like. So what are we looking at? Just a grid of bits. This row represents 32 bits total, where this is bit zero and that's bit 31, zero-indexed, all the way over there. And each subsequent row represents 32 more bits, 32 more bits, 32 more bits. Which is to say, anytime a computer like Phyllis's sends an envelope of information on the internet, it contains at least this information: a whole bunch of bits broken down into bytes. Now, the only fields we'll really care about today are this one here, source address, and this one, destination address. When Phyllis sends that packet, she writes her source address, her IP address, on the outside of the envelope, so to speak. And she also puts Brian's IP address, whatever that is, on the outside of the envelope as well. There's a whole bunch of other bits involved, which are useful, but we'll wave our hands at those for today. That really speaks to what's actually happening. If we do this metaphorically in the real world, it's kind of like taking out that envelope.
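The diagram just described is the fixed 20-byte part of an IPv4 header, with the source address in bytes 12 through 15 and the destination address in bytes 16 through 19. This hedged sketch uses Python's standard `struct` and `socket` modules to pack one such header and read the two addresses back out. The names `build_header` and `read_addresses` are hypothetical, the header checksum is simply left at zero rather than computed, and options are omitted, so this illustrates the layout rather than producing a packet a real network would accept.

```python
import socket
import struct

# Fixed 20-byte IPv4 header (no options), fields in specification order, network byte order.
IPV4_HEADER = struct.Struct("!BBHHHBBH4s4s")


def build_header(source, destination, payload_length, protocol=6):
    """Pack a minimal IPv4 header; protocol 6 means the payload is TCP."""
    version_ihl = (4 << 4) | 5  # version 4, header length of 5 32-bit words
    total_length = IPV4_HEADER.size + payload_length
    return IPV4_HEADER.pack(
        version_ihl, 0, total_length,  # version/IHL, type of service, total length
        0, 0,                          # identification, flags/fragment offset
        64, protocol, 0,               # time to live, protocol, checksum (not computed here)
        socket.inet_aton(source),      # source address: 4 bytes
        socket.inet_aton(destination), # destination address: 4 bytes
    )


def read_addresses(header):
    """Return (source, destination) as dotted-decimal strings."""
    fields = IPV4_HEADER.unpack(header[:IPV4_HEADER.size])
    return socket.inet_ntoa(fields[8]), socket.inet_ntoa(fields[9])


header = build_header("5.6.7.8", "1.2.3.4", payload_length=100)  # Phyllis -> Brian
print(len(header))             # 20
print(read_addresses(header))  # ('5.6.7.8', '1.2.3.4')
```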
For instance, if Brian's IP address is 1.2.3.4, for the sake of discussion, Phyllis, in advance of our filming that bit, would have written something like 1.2.3.4 in the middle of the envelope, just like we would in the real world. But presumably she wants Brian to be able to reply, to acknowledge receipt or send his own message. So she's also going to put her IP address, for instance 5.6.7.8, for the sake of discussion, in the top left corner of the envelope, so that Brian knows, when he writes out his own packet of information, to whom to actually reply. But at the end of the day, it's all just bits being sent in a specific pattern, and there is formal documentation of the order in which all of those bits will actually be sent out on the wire or wirelessly. So in short, IP ensures that all of us have unique IP addresses via which data can go from us or to us. But that's only one problem. Nowadays, of course, servers can do so many other things. They can do email and chat and video conferencing, game servers, and who knows what. And it would be nice if a single server could do multiple things. In fact, that's very much the case for servers nowadays, and a server is just a term of art for a computer used to serve information to other people. By contrast, our laptops and desktops are generally clients, because they only serve one of us, not multiple people. But these are just terms of art; we're still describing computers at the end of the day. IP only ensures that we can uniquely address computers on the internet. But there's another protocol in TCP/IP, namely the TCP portion, that allows computers to uniquely identify the services that they're offering to the rest of the world.
For instance, TCP allows a computer to distinguish whether it has received a packet that's an email, or a packet that's a chat message, or a piece of a video conference, or the like, which is to say there's more than just IP addresses on the outside of these envelopes. There are also what are called port numbers. These are similarly numeric values, usually in the range of zero on up into the low thousands, and they're standardized. For instance, if you are requesting a web page using http://, with which all of us are presumably familiar, then unbeknownst to you, on the outside of the virtual envelope that your computer sends is the port number 80, so that when the server receives it, it knows, oh, this human is requesting a web page and not, for instance, their email or something else. Or nowadays, if you're using HTTPS, where the S in the URL denotes secure, you're actually using port 443, which is just an arbitrary number that a bunch of humans in a room decided on years ago to standardize what goes on the outside of an envelope. So just to be more concrete: suppose Phyllis is sending a request to Brian, where Phyllis is the client, just a human using a computer, and Brian in this story is now a web server, better yet, a secure web server that's somehow encrypting or scrambling the information to keep it secure. Well, on the outside of this envelope, after Brian's IP address, which was 1.2.3.4, Phyllis is also going to write the number 443, so that when Brian receives and opens this envelope, he knows what he's looking at: a request for a web page and not an email or a chat message or something else. Moreover, we can continue the story just a little bit further. Phyllis also writes on the envelope not only her IP address, 5.6.7.8, but some number as well in that top left-hand corner, whatever it happens to be, which is a port number via which Brian can reply to her.
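Putting the envelope into code: a hedged Python sketch of a request carrying source and destination IP addresses and port numbers, plus a server that picks a service by destination port and replies by swapping the two ends. The `Envelope` class, the `handle` function, and Phyllis's port 50123 are hypothetical; ports 25, 80, and 443 are the standard ones for email (SMTP), HTTP, and HTTPS. Because TCP port fields are 16 bits wide, the sketch rejects anything outside 0 to 65535.

```python
from dataclasses import dataclass

# A few standardized ("well-known") port numbers.
SERVICES = {25: "email (SMTP)", 80: "web page (HTTP)", 443: "secure web page (HTTPS)"}


@dataclass(frozen=True)
class Envelope:
    source_ip: str
    source_port: int
    destination_ip: str
    destination_port: int
    contents: str

    def __post_init__(self):
        # Port numbers are 16-bit values in TCP, so 0 through 65535.
        for port in (self.source_port, self.destination_port):
            if not 0 <= port <= 65535:
                raise ValueError(f"invalid port: {port}")

    def reply(self, contents):
        """Swap both ends so the answer goes back to whoever sent this envelope."""
        return Envelope(self.destination_ip, self.destination_port,
                        self.source_ip, self.source_port, contents)


def handle(envelope):
    """A server looks at the destination port to decide which service is wanted."""
    service = SERVICES.get(envelope.destination_port, "unknown service")
    return envelope.reply(f"{service} response")


request = Envelope("5.6.7.8", 50123, "1.2.3.4", 443, "GET /")  # Phyllis -> Brian
response = handle(request)
print(response.destination_ip, response.destination_port)  # 5.6.7.8 50123
print(response.contents)  # secure web page (HTTPS) response
```

Because the reply is addressed to Phyllis's own port, her computer can hand it to the right tab or application.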
In this way, Phyllis can in effect have multiple tabs open, be using Zoom and some chat software or something else, running multiple programs on her computer, and the internet packets are all coming in, but her computer knows to which tabs or applications those packets belong. So if you really want to geek out, here's what this thing looks like. This is just the sequence of bits for TCP, which is to say that, in addition to the dozens of bits we looked at a moment ago that standardize what IP puts on the outside of the envelope, TCP adds 16 bits that specify a source port number, which means you can indeed have tens of thousands of possible port numbers, plus a destination port number and a bunch of other stuff, including this so-called sequence number, which happens to be a 32-bit value. That's actually pretty important, because quite often messages sent on the internet are pretty large, and it would be nice if one person downloading a big image, or downloading or streaming a movie, doesn't mean that no one else on the internet can do something else at that moment in time. So for the sake of discussion, suppose that this very happy cat here is a very large JPEG, a very large graphical file. It would be nice if, when Phyllis is trying to send or receive an image as large as this, it's not just in one massive envelope that's going to prevent a whole bunch of other users from similarly using the internet at that moment in time. So, at the risk of a bit of heresy, we can actually tear this cat apart, fragment it really. And then, inside of Phyllis's envelope, or equivalently Brian's reply, depending on where this cat is coming from or going to, part of that cat can go in this envelope. And now, say in the bottom left-hand corner of this envelope, Phyllis or Brian could write the sequence number in question: one out of four, two out of four, three out of four, four out of four.
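The torn-cat idea, splitting a file into labeled pieces and later checking which pieces arrived, can be sketched in Python. This is a simplification in the lecture's own "one out of four" style: real TCP sequence numbers count bytes of the stream rather than pieces, and `fragment` and `reassemble` are hypothetical names.

```python
import random


def fragment(data, count):
    """Split `data` into `count` pieces, each labeled (number, total), like "1 of 4"."""
    size = -(-len(data) // count)  # ceiling division, so no bytes are left over
    return [(i + 1, count, data[i * size:(i + 1) * size]) for i in range(count)]


def reassemble(packets):
    """Return (data, []) if every piece arrived, else (None, missing_numbers)."""
    if not packets:
        raise ValueError("no packets received")
    total = packets[0][1]
    received = {number: piece for number, _, piece in packets}
    missing = [n for n in range(1, total + 1) if n not in received]
    if missing:
        return None, missing  # the recipient would ask the sender to resend these
    return b"".join(received[n] for n in range(1, total + 1)), []


cat = b"pretend these bytes are a very large JPEG of a happy cat"
packets = fragment(cat, 4)
random.shuffle(packets)  # packets may take different routes and arrive out of order
data, missing = reassemble(packets)
print(data == cat)  # True

lost = [p for p in packets if p[0] != 3]  # suppose piece 3 of 4 never arrived
print(reassemble(lost))  # (None, [3])
```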
That way, when this packet and, hopefully, the other packets arrive at their destination, the recipient's computer can check: okay, this was a really big file in this case; do I have all of the parts? Yes, as can be inferred from the so-called sequence number, which we've represented in that memo field of the envelope. There's a bunch of other stuff that can go on here too, including prioritization of data. But ultimately TCP allows servers to handle multiple types of services, and it also allows them to receive data reliably, because if, for instance, a recipient only gets two out of the four packets, or three out of the four, the fact that there's a sequence number involved is enough information for that recipient to say to the sender, hey, I'm missing one or two or more packets, please resend them. So in short, TCP guarantees delivery by just doing some bookkeeping on the outside of these envelopes. To summarize, then, IP allows us to uniquely identify computers, and TCP guarantees delivery and allows us to multiplex, so to speak, among multiple services on the same device. Questions on this jargon thus far, because today's filled with acronyms, unfortunately? Questions on IP, TCP, or anything else? Okay, so seeing none, as promised, let's do yet another acronym. It would be pretty tedious if Phyllis and Brian and all of us humans actually had to type IP addresses into our browsers when visiting websites, and in fact most of us never do that.
Instead, we go to google.com or harvard.edu or some actual domain name, so to speak, which is so much easier for us humans to remember than these arbitrary IP addresses that are either automatically assigned to computers or manually configured by humans setting up servers. But there's another acronym in the world, and another technology used on the internet, namely DNS, for domain name system. This is just a certain type of server that every home has, even if you didn't know it; every campus has one, every company has one. There are so many DNS servers around the world, but their purpose in life, quite simply, is to translate what you and I know as domain names, like google.com, harvard.edu, and the like, into their corresponding IP addresses. And so, in short, inside of these DNS servers is essentially something like a two-column table or spreadsheet, however you want to think about it, whereby here are all of the domain names in the world, and here are all of the corresponding IP addresses. So when your Mac or PC or phone is trying to access google.com or harvard.edu, that device, certainly when it's first booted up, has no idea what the IP address is for that server. It's not the case that Apple or Google are pre-installing billions of IP addresses inside of our devices. But your device is smart enough to ask the local network at home, on campus, or at work: what is the IP address of google.com? What is the IP address of harvard.edu? Then, upon getting that answer from one of these local DNS servers, your Mac, PC, or phone writes the corresponding IP address on the outside of that envelope. So it's a wonderfully useful service that just makes the internet easier for you and me to use, because we can use names instead of IP addresses. Technically, these names are called fully qualified domain names. Where do they come from? Well, some of you might actually have your own personal website.
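Conceptually, that two-column table is just a dictionary from names to addresses. The sketch below is a toy, not how DNS actually stores or transmits records; the addresses come from the 192.0.2.0/24 block reserved for documentation, so they are placeholders rather than the real addresses of these sites, and `lookup` is a hypothetical helper.

```python
from typing import Optional

# A hypothetical slice of the "two-column table": domain name -> IP address.
# 192.0.2.0/24 is reserved for documentation, so these are placeholders,
# not the real addresses of these sites.
DNS_TABLE = {
    "google.com": "192.0.2.1",
    "harvard.edu": "192.0.2.2",
}

def lookup(domain: str) -> Optional[str]:
    """Return the IP address for a domain name, or None if the table lacks it."""
    # Domain names are case-insensitive, and a trailing dot marks the DNS root.
    return DNS_TABLE.get(domain.lower().rstrip("."))

print(lookup("Harvard.edu"))  # 192.0.2.2
```

To ask your operating system's real resolver instead, Python's `socket.gethostbyname(name)` performs an actual lookup, which of course requires network access.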
You might have gone through this process. It's actually not that hard to get your own domain name. You can go to any number of what are called internet registrars and pay them some money, essentially on a rental basis. So you rent a domain name for a year, or maybe three or five years at a time, and they can automatically bill you. The domain name might cost as little as a dollar per year, or thousands of dollars per year, depending on whether someone has scooped it up and is maybe squatting on it or the like. But all you do, ultimately, is pay someone money, and they give you the rights to use that domain name. And then what you do, technically, is configure some DNS server somewhere in the world to know the IP address of the server that's going to serve up your domain's web pages. And long story short, I say that you have a DNS server in your home, at your work, and on your campus because DNS is a very hierarchical kind of structure. Out there somewhere are these so-called root servers at the top of that hierarchy, through which you can essentially find the IP addresses of all of the .coms, for instance, or all of the .us domains, or the like. My Mac doesn't know all of that, and so my Mac could ask the root servers directly, but more efficiently, my Mac is going to ask the local network first. When I'm at home, it asks my home DNS server, which is built into the little home router that you've got somewhere, or if I'm on campus, it asks Harvard's DNS server. And this whole design is recursive, to borrow a term from a few weeks ago, in that if my computer doesn't know the answer to what's the IP address for this domain, and Harvard doesn't know the answer either, the question eventually gets escalated to those so-called root servers, but the answer is then cached, that is, remembered, by all of these other DNS servers along the way. So it's a very elegant hierarchical design, but at the end of the day, it's just doing this.
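Here is a toy model of that escalate-then-cache behavior. It is a simplification, assuming each server either knows an answer or asks exactly one parent; real DNS instead walks from the root to the TLD servers to the domain's own authoritative servers, and cached answers expire after a time-to-live. The class name `Resolver`, the server names, and the placeholder address are all hypothetical.

```python
from typing import Optional

class Resolver:
    """A toy DNS server that knows some answers and can ask a parent server."""

    def __init__(self, name: str, records: dict[str, str], parent: Optional["Resolver"] = None):
        self.name = name
        self.records = dict(records)       # answers this server knows itself
        self.cache: dict[str, str] = {}    # answers remembered from upstream
        self.parent = parent
        self.upstream_queries = 0          # how often we had to escalate

    def resolve(self, domain: str) -> Optional[str]:
        if domain in self.records:
            return self.records[domain]
        if domain in self.cache:
            return self.cache[domain]
        if self.parent is None:
            return None                    # nobody above us to ask
        self.upstream_queries += 1
        answer = self.parent.resolve(domain)
        if answer is not None:
            self.cache[domain] = answer    # remember it for next time
        return answer

root = Resolver("root", {"example.com": "192.0.2.10"})
campus = Resolver("campus", {}, parent=root)
laptop = Resolver("laptop", {}, parent=campus)

laptop.resolve("example.com")  # escalates laptop -> campus -> root
laptop.resolve("example.com")  # answered from the laptop's own cache
```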
It's a big cheat sheet of domain names to IP addresses, and the server is responding for us. All right, one more acronym. So how do I know what my Mac's IP address should be? How do I know what my phone's IP address should be? How do I know the IP address of the DNS server I should be asking any of these questions? How do I know the IP address of the router to which I should hand my data off? There are a lot of assumptions built into the story we've been telling. And the answer, unfortunately, is yet another acronym: DHCP is the solution to all of those problems. And it wasn't always. Back in my day, we used to have to manually type in what our computer's IP address was, based on what some human told us it would be. We had to type in our DNS server and our router's address. But now, DHCP is just yet another server, running in your home network, on campus, or in your corporate network, whose purpose in life is to answer questions of the form, what is my IP address? Which is to say, when you boot up your Mac, your PC, or your phone for the first time, it essentially broadcasts a message like, hello world, what's my IP address? And hopefully there's one such DHCP server on that local network, wired or wireless, that will respond, based on how Harvard or Comcast or Verizon or someone at home has configured it, to tell you what your device's IP address is, what the IP address is of your local router, what the IP address or addresses are of your DNS servers, and the like. And so this is why things just work nowadays once you've connected to a Wi-Fi network or physically plugged in. The dynamic host configuration protocol didn't always exist; it's wonderful that it now does. All right, enough of the outside-of-the-envelope stuff.
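A toy Python model of what a DHCP server hands out might look like this. It assumes a hypothetical class `ToyDhcpServer`, a made-up 192.168.1.0/29 home network, and a device ID standing in for a device's hardware (MAC) address; the real protocol's broadcast message exchange and expiring leases are not modeled here.

```python
import ipaddress
from typing import Optional

class ToyDhcpServer:
    """Hands out IP addresses plus router and DNS settings from a fixed pool."""

    def __init__(self, network: str, router: str, dns_servers: list[str]):
        # hosts() skips the network and broadcast addresses; the router keeps its own.
        self.pool = [str(ip) for ip in ipaddress.ip_network(network).hosts()
                     if str(ip) != router]
        self.router = router
        self.dns_servers = dns_servers
        self.leases: dict[str, str] = {}   # device ID (stand-in for a MAC address) -> IP

    def request(self, device_id: str) -> Optional[dict]:
        """Answer "what's my IP address?" for one device, reusing any earlier lease."""
        if device_id not in self.leases:
            if not self.pool:
                return None                # pool exhausted: no address to give
            self.leases[device_id] = self.pool.pop(0)
        return {"ip": self.leases[device_id], "router": self.router, "dns": self.dns_servers}

server = ToyDhcpServer("192.168.1.0/29", router="192.168.1.1", dns_servers=["192.168.1.1"])
laptop = server.request("laptop")  # gets 192.168.1.2, plus router and DNS settings
phone = server.request("phone")    # gets 192.168.1.3
```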
Everything else today will be a deeper dive inside of this envelope, to look at what the messages actually are that we're sending and receiving, and how we structure web pages and design everything that comes back from the server to the client. So let's dive in, then, to this acronym HTTP, which you've been typing or seeing for years, even though you don't really have to type it anymore because browsers just assume that's what you want. HTTP is another protocol, hypertext transfer protocol, whose purpose in life is to request and receive web pages. As a protocol, it just standardizes what goes inside of that envelope when you're trying to use the web. There are different protocols for email, for Zoom, for Discord, and for any number of other internet services. We'll focus predominantly today on HTTP, which by default uses port 80, and on HTTPS, the corresponding secure version thereof, which uses port 443, as we saw. So here is a canonical URL, in that it has a whole bunch of components. Let's consider some of the jargon that we're going to start taking for granted. If you go to https://www.example.com/, you are implicitly requesting the root of that website. Root just means the default directory, the default folder, if you will. And that's what the yellow-highlighted slash here means: give me the default web page. Technically speaking, what you're going to receive in your browser, unbeknownst to you, is an actual file. By convention, it's a file called index.html, maybe index.htm, or any number of other names. But it would be pretty tedious if we humans all had to type out the actual file name that we want. So by default the server is just going to return the default page at the root of the website.
If, though, you're inside of a folder, or you do actually click on a link that leads you to a file, you might very well have at the end of this domain name a full path as well, which might contain zero or more folder names and zero or one file name. In fact, it could be explicitly /file.html, or /folder/, or /folder/file.html. You've probably seen thousands of these over time, even if you haven't really given them much thought. From today onward we'll be creating all of this stuff here, but we need to understand what's going on to the left, too. So here is the so-called domain name, or more properly the fully qualified domain name, and it has a few different parts too. This is technically the domain name as we all refer to it: something.com, where com means commercial, and that com is more specifically known as a top-level domain, or TLD. Back in the day there were only a few of these: .gov, .com, .net, .org, .edu, and a bunch of others. Now there are hundreds, if not thousands, of them. Many of them aren't really used prominently in the wild, but some that weren't on that original list are, like .io, which CS50 uses a lot. That doesn't mean input/output; it's actually a two-letter country code that has essentially been rented to us and to anyone else using that same TLD, because in the English-speaking world, io actually sounds kind of cool; it does indeed connote input and output. .tv is another one that actually belongs to a country but also, in English, suggests television, so that too has been widely used. In general, there are top-level domains like these: some of them now are full words, and some of them are two characters, denoting that they belong to a country. They are, indeed, the top-level categorization of all of these websites. Meanwhile, many URLs, but not all, also have something to the left of the domain name, known as a host name, which technically speaking refers to the name of the specific server that you're requesting.
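Python's standard `urllib.parse.urlsplit` can pull apart a URL like the one on the slide. The helper `describe` below is hypothetical, and its split of the host into host name, domain name, and TLD naively assumes the domain is the last two labels, which breaks for names like example.co.uk.

```python
from urllib.parse import urlsplit

def describe(url: str) -> dict[str, str]:
    """Break a URL into the pieces named in the lecture (naive about domains)."""
    parts = urlsplit(url)
    fqdn = parts.hostname or ""                # fully qualified domain name, lowercased
    labels = fqdn.split(".")
    return {
        "scheme": parts.scheme,                # e.g. "https"
        "fqdn": fqdn,                          # e.g. "www.example.com"
        "host_name": ".".join(labels[:-2]),    # e.g. "www" ("" if there is none)
        "domain": ".".join(labels[-2:]),       # e.g. "example.com"
        "tld": labels[-1],                     # e.g. "com"
        "path": parts.path or "/",             # an empty path means the root, "/"
    }

print(describe("https://www.example.com/folder/file.html"))
```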
It doesn't have to be literally one server. www can refer to dozens, hundreds, or thousands of servers. Indeed, if you go to any popular website like gmail.com or the like, even though there's only one domain name, somehow or other, technologically, it's referring to clusters of hundreds or thousands of servers that ensure they can handle all of the customers who might visit that site. And then lastly, there's the scheme, or specifically the protocol in use. For our discussion today it's always going to be https:, which is ideal because it's secure and encrypted somehow, but it can also be http:. So that's it. That's just the jargon with which you should be familiar when it comes to URLs like these. And what we'll be doing today is actually creating content that lives at URLs like that and serving it up to ourselves. But what do the messages that go inside of these envelopes ultimately look like? The URLs are just getting us to the right place. How do we express, in some form of code, that we want this file from this server, using encryption in this way? Well, inside of the virtual envelopes that Phyllis was sending to Brian, and that he would ultimately have sent back to her, are messages that look like this: GET, POST, and a bunch of other verbs, if you will. HTTP supports a bunch of operations, or verbs, namely GET, POST, and a few others. And it was the first of these, GET, that Phyllis would have put inside of her envelope initially in order to get a web page, like a cat, from Brian. Specifically, inside of the envelope, she would have had a textual message. It's not code per se. There are no functions or loops or variables or anything like that. It's a protocol just in the sense that humans years ago standardized what messages should appear inside of those envelopes if you want to get a web page from a server.
So for instance, if Brian in this story is now suddenly harvard.edu, specifically www.harvard.edu, Phyllis's envelope would have contained a message saying GET, in all caps, then a slash if she just wants the root, or default page, from Brian's server, then the version of HTTP that she's using, for instance, version 2. And she would also specify which actual host she wants, just in case Brian is multitasking and serving up websites for different domain names on the same physical box, and maybe a bunch of other lines as well. And hopefully, if all goes well, Brian would have responded with an envelope of his own containing an HTTP response to her HTTP request. Brian's envelope would have contained a textual message that confirms what version of HTTP he's using; a status code, which is an arcane number that in this case indicates that everything is okay, all is well; and the type of content he's sending back to her, because it could be HTML (more on that later today), it could be a JPEG, it could be a GIF, it could be any number of other file formats. This is just a hint to Phyllis's browser as to what's inside of the envelope she's getting back. And then maybe a bunch of other stuff as well. So even though some of these underlying implementation details might be new to you if you've never really thought about them, it turns out that we, as aspiring programmers, can actually see and poke around with these building blocks and ultimately, today, take advantage of them. You're about to see a program called curl, which you can think of as a tool for connecting to URLs. It's installed on Linux systems like cs50.dev. It also comes with Macs and PCs quite frequently, or you can easily install it.
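A minimal sketch of those request and response messages as text, using hypothetical helpers `build_request` and `parse_response_head`. One assumption worth flagging: the sketch uses the HTTP/1.1 text format with CRLF line endings, since that's what the slide's layout mirrors; HTTP/2 carries the same fields but encodes them in binary frames on the wire.

```python
def build_request(host: str, path: str = "/") -> str:
    """The text of a minimal GET request: request line, Host header, blank line."""
    return f"GET {path} HTTP/1.1\r\nHost: {host}\r\n\r\n"

def parse_response_head(text: str) -> tuple[str, int, dict[str, str]]:
    """Return (version, status code, headers) from a response's head."""
    head = text.split("\r\n\r\n", 1)[0]          # headers end at the blank line
    status_line, *header_lines = head.split("\r\n")
    fields = status_line.split(" ", 2)           # version, code, optional reason phrase
    version, code = fields[0], int(fields[1])
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # header names are case-insensitive
    return version, code, headers

print(build_request("www.harvard.edu"), end="")
reply = "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<!DOCTYPE html>"
print(parse_response_head(reply))  # ('HTTP/1.1', 200, {'content-type': 'text/html'})
```

The Host header is what lets one physical server tell apart requests for different domain names, which is exactly the multitasking case described above.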
And essentially it's a headless browser: it lets you pretend to be a browser and grab the response from a server by actually sending the contents of an envelope like this. So for instance, if I want to pretend to be a browser and request harvard.edu, I can type this in my cs50.dev terminal window. Let me go ahead and maximize its size and do the following: curl -I, with a capital I, which is only going to show me the headers, the text that we were just talking about, and not any of the actual contents of Harvard's website. So: curl -I https://www.harvard.edu/. If I were typing this URL into a browser, I would actually see Harvard's homepage. In this case, I'm just going to see the contents of the envelope as black-and-white text on the screen; specifically, only the first few lines, the so-called headers that the server is responding with, just as I claimed Brian would respond to Phyllis. I hit Enter, and there are indeed more lines than I had on my slide, but you can see that everything is in fact 200 OK. This is a convention: 200 means all is indeed okay. There's a bunch of other information here, including the date and time at which this response came back. Here's that content-type, text/html, and then a whole bunch of other information as well. So that's one way of seeing what's going on underneath the hood. Well, what other responses might come back? It turns out that 200 OK is the best possible outcome, but there are a bunch of other possible outcomes as well. For instance, sometimes you'll get not 200 but 301, which means Moved Permanently. And what does this mean? Well, if a server responds to a browser with a numeric code of 301, the browser is supposed to go to the location given in the response instead. It's sort of like putting a detour sign on the server that says there's nothing for you here; go over to this location instead.
And notice in this example, it's telling the user to go to https://www.harvard.edu/. That's actually what I typed before, so I would not have seen that myself. But if I go back to VS Code here, let's run the exact same command but try to visit the insecure version of Harvard's website, http://www.harvard.edu/, which means that anyone else on the internet can technically see what it is I'm now doing, which might not be desirable. I hit Enter, and this time Harvard's server does not just tell me 200 OK; it actually says 301 Moved Permanently. And if I read lower in these lines, there indeed is the location to which I should actually go. It's a subtle difference: it's forcing me to go to https instead, without actually showing me the contents of Harvard's website. So nowadays you and I don't even have to think about this; surely you and I are not even in the habit of typing http: or https:. But the server is ensuring in this case that your browser is redirected, so to speak, automatically to the secure version of that site instead. Now, there are other status codes, and in fact, even if you never realized it before now, what numeric code do you sometimes see on the internet when something goes wrong? 404. So 404 is a weirdly public, arcane error number, or status code, that just means file not found. And we can simulate this as follows. Suppose that Harvard has a whole department dedicated to cats, which it does not, and in my terminal window I run curl -I https://www.harvard.edu/cats. If I hit Enter here, you'll see that I get an HTTP 404 status code, which just means that page does not in fact exist. And if I visited https://www.harvard.edu/cats in my browser, I would presumably see some error page that may or may not show me 404 visually. But many websites, most websites, for better or for worse, reveal this number.
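Here is a sketch of what a browser does with that 301: follow the Location header until it gets a non-redirect answer. The responses are hard-coded and hypothetical, mirroring the curl output described above, so no real network requests are made; `fetch` is a made-up name, and like real clients it caps the number of redirects to avoid loops.

```python
# Hypothetical canned responses keyed by URL: (status code, Location header).
# They mimic what the lecture saw with curl; no real network requests are made.
FAKE_RESPONSES = {
    "http://www.harvard.edu/": (301, "https://www.harvard.edu/"),
    "https://www.harvard.edu/": (200, None),
    "https://www.harvard.edu/cats": (404, None),
}

def fetch(url: str, max_redirects: int = 5) -> tuple[int, str]:
    """Follow 3xx redirects the way a browser would; return (status, final URL)."""
    for _ in range(max_redirects + 1):
        status, location = FAKE_RESPONSES.get(url, (404, None))
        if 300 <= status < 400 and location:
            url = location          # the detour sign: try this URL instead
            continue
        return status, url
    raise RuntimeError("too many redirects")

print(fetch("http://www.harvard.edu/"))       # (200, 'https://www.harvard.edu/')
print(fetch("https://www.harvard.edu/cats"))  # (404, 'https://www.harvard.edu/cats')
```

Note that curl itself only shows you the 301 by default; it follows redirects like a browser only when you pass its -L option.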
So much so that most everyone in this room is probably familiar with 404, even though its origin is this very low-level, arcane status code buried in the HTTP headers inside of envelopes like these. There are a whole bunch of others, if you'd like some fun facts. 200 is indeed OK. 301 is Moved Permanently. There are a bunch of other 300-level codes that all relate to going elsewhere. 400-level codes generally mean that you, the user, have somehow done something wrong, or, next week, as we start writing code that talks to web servers, maybe your code has done something wrong when requesting a website. 500s are really bad. They mean the server is messed up somehow: either it's not available, or the programmer made some bug in their code such that it's crashing with, for instance, an internal server error. We included 418, which isn't something real servers actually use; it was a sort of April Fools' joke years ago, where a bunch of humans thought it would be funny to write up a whole specification for what it means for a server to respond with a 418. It's an inside joke, not funny at the moment, but it's part of internet lore nowadays. We can have a little bit of fun with this, maybe at the expense of our dear friends down the road. For years now, someone has been paying for the following behavior. Let me go back to VS Code here, in my terminal window, and run curl -I http://safetyschool.org. Has anyone ever been? Well, let me actually go to http://safetyschool.org in my browser and, just for fun, hit Enter. Oh my goodness, look at where we are. So how is this implemented? Well, if I finish what I began over here in my terminal, looking at the HTTP headers that came back inside of the envelope, it turns out that for something like 20 years, presumably some Harvard alum has been paying the bill to rent this domain name just to have this trick implemented, such that 301 Moved Permanently has been directing people ever since to yale.edu.
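Since the first digit carries the meaning, a small lookup captures the pattern. This sketch uses Python's real `http.HTTPStatus` enum for the standard reason phrases; the `describe_status` helper and the one-line class descriptions are our own paraphrases of the lecture, not part of any specification.

```python
from http import HTTPStatus

CLASSES = {
    2: "success",
    3: "redirection (go elsewhere)",
    4: "client error (something wrong with the request)",
    5: "server error (something wrong on the server)",
}

def describe_status(code: int) -> str:
    """Pair a status code's standard reason phrase with its class (its first digit)."""
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:                  # not a code Python's HTTPStatus knows
        phrase = "Unknown"
    return f"{code} {phrase}: {CLASSES.get(code // 100, 'other')}"

for code in (200, 301, 404, 500):
    print(describe_status(code))
```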
There are a bunch of others if you go down the rabbit hole of looking on Reddit and the like, involving Stanford, Berkeley; there's a healthy competition on the East Coast and West Coast. But it all boils down to a fairly arcane understanding of how HTTP works, the protocol that governs how data is sent between web browsers and web servers. Now, you can of course use curl for connecting to URLs in the context of something like CS50, but you could have been doing stuff like this all the time with your actual browser. I'm using Chrome here, but most any browser nowadays has developer tools built in, which is to say somewhere there should be a menu option for developer tools that let someone who knows a bit of programming poke around underneath the hood of the browser and see what's going on. For instance, I'm going to go ahead and open up a new window here, and I'm going to right-click on the background (or I can go to the appropriate entry in Chrome's dot-dot-dot menu) and choose Inspect, which pulls up what we're going to call developer tools. I'm doing this in incognito mode, for reasons we'll see next week. This has the effect of automatically clearing any of my cookies and my browser history, because most any time I do something with the web browser today, I want to pretend like I'm doing it for the very first time, so that the behavior is exactly as we expect. So down here, now that I've opened up the so-called developer tools in Chrome (and they look almost the same in Safari and Edge and a bunch of other browsers as well), I see a tab called Elements, which shows me all of the elements of this web page once it appears, including the so-called HTML code we're about to write. I can see a Console, where error messages might sometimes appear, similar in spirit to the terminal window in VS Code. I can also see the network connections that the browser is making to the server.
And that's where I thought we'd focus our attention first. Here I have a brand new browser window. I'm clicking on Network over here. Just to make sure we can see everything without it getting automatically deleted, I've clicked on Preserve log and Disable cache, so that it behaves exactly as expected. And now let's go up here for the first time in this incognito window and go to http://safetyschool.org. Enter. You'll see a whole bunch of output, including this warning in this particular mode. This is increasingly common nowadays for websites that do not support HTTPS, which this alum hasn't been paying for: you'll typically get a warning saying you might not want to do this, because the whole world, or at least everyone between you and point B, might know what it is you're accessing on the web. I can go ahead and pass through this. In fact, once I do that and click on Continue to site, we'll see even more output at the bottom, a whole bunch of output that's kind of overwhelming. Notice at bottom left here that just going to safetyschool.org resulted in 61 HTTP requests, in effect 61 envelopes going back and forth. I'm going to focus, though, on the ones at the very top here: when we finally clicked through that warning and got back a response from the server, having visited safetyschool.org, here is Chrome's presentation of the same information that curl was showing me in my terminal window. The status that came back was 301 Moved Permanently. The verb, or method, being used was GET. There are some mentions of the IP address in question here and a whole bunch of other stuff that we'll wave our hands at for today. So all this time you could have seen the same. Let's try this with some cats. Let me click on the little Ghostbusters-like symbol to clear everything down in the developer tools. Let me zoom out, and this time let me go to https://www.harvard.edu/cats, which, recall, did not exist according to curl.
If I hit Enter, I do see a web page. It's interesting that Harvard has chosen to fairly arcanely reveal 404 to all visitors, which means nothing except insofar as it's the status code. But if I scroll through all of the 59 requests that were involved in displaying this very graphical page and go back to the top, you'll see, by clicking on the first row, for cats itself, that I used GET to request that URL, /cats, and it was indeed 404 Not Found. So you can have all this fun on your own by just poking underneath the hood at what your browser has been hiding from you all of this time. All right, any questions now before we dive in? No? All right. Well, that's the Network tab. Let's look at some of the others and see how we can start writing this stuff ourselves. Let me go to stanford.edu. Enter. A whole bunch of things will fly across the screen, but this time I'm going to go to the Elements tab. What we're about to dive into is an actual language, not a programming language but a markup language, called HTML, hypertext markup language, whose purpose in life is just to tell browsers what to display on the screen. So here is all of the so-called HTML that some human or humans or software at Stanford wrote in order to create Stanford's homepage, which as of today looks lovely, like this. The interesting thing, though, about the code that Stanford has written to generate this website is that it's being sent to me as a copy. And this is quite unlike the code we've been writing thus far. When you wrote code in Scratch, it was sort of there in the browser and stored on MIT's servers. When you wrote C code and ran it, it was inside of your code space and not given to any user who might access it. The way the web works, though, is a little bit different: inside of those envelopes are literally copies of what's on the server, being sent to the browser.
And so it's your browser, the so-called client, that's actually reading that code, HTML in this case, top to bottom, left to right, and figuring out how to display it. It's not executed on the server per se. Now, that story is going to change a bit next week, when we start using Python to dynamically generate HTML so that we're not writing all of this code by hand, but for now, everything you see was the result of the browser interpreting code that Stanford wrote. The implication is that we can have a bit of fun with these same developer tools. For instance, if I Control-click or right-click on something like the word Stanford in the middle of their homepage and choose that same Inspect option, these developer tools will jump to the very line of code that created that Stanford brand name in the middle of the web page. And this is a wonderful teaching and learning tool, because in the days to come, when you're trying to learn more and more HTML, you can literally do this for any website on the internet and understand how someone implemented a design that you really like, for instance, and you can learn from other websites how they've constructed the same. So over here you'll see that the word Stanford is just in the source code of this page, in the so-called HTML, and just for fun, I can change it to Harvard. Hit Enter, and now Stanford's website looks like we've been there and hacked it. Of course, it's not that easy to hack Stanford's website. What have I presumably done just now? I've only changed my local copy of that particular website. So if I just click on the reload icon, I'll see that Stanford's website, for better or for worse, still looks like it did. But this speaks to the control that we have within our browser to actually manipulate, and learn from, what's going on underneath the hood. So let's dive into this language called HTML, hypertext markup language.
It's not a programming language, which means we're going to fly through it even quicker than usual, because it really just contains some basic building blocks. There is some interesting intellectual design underneath them, but for the most part, learning HTML ultimately becomes an exercise in looking up other tags that exist, reading the documentation, and figuring out how you can use them to implement other features in websites. So let's take a look at perhaps the simplest of web pages and specifically glean from it what tags are and what attributes are, really the only two terms of art that are going to be germane for this particular language. No loops, no conditionals, no variables, no real complexity other than basic building blocks like these. So here is HTML for the simplest of websites. This is like a mini version of what Stanford's team presumably wrote on their server, but it's only about a dozen lines of code instead of hundreds or thousands, however long that website was. Any web page written today, assuming it's using the latest version of HTML, which happens to be version 5 as of today, begins with code that looks like this. This kind of code will presumably be stored in a file called file.html, index.html, stanford.html, whatever the file is actually named; this is simply what's going to be inside of it. You could save this file on your own Mac and open it up, and your browser would display it, but you would be the only one in the world who can actually see that web page if it's just on your Mac or just on your PC. So we of course are going to be writing HTML on a server, so that not just you but, in theory, especially for your final project, anyone in the world with an internet connection can access the same. So within the context of cs50.dev, we're going to start using a new command, http-server, whose purpose in life is just to serve up files via HTTP.
Now, there's kind of an interesting design going on here, because if we use cs50.dev, otherwise known as GitHub Codespaces, there's already a web server running there: when you go to cs50.dev, log in, and get redirected to some longer URL, you're using a web application, aka VS Code, that allows you to write code in the cloud. That application by default is running on ports 80 and 443, so it doesn't matter whether you start with http or https; both will work. But that means the code that we write today, and that you write for the next problem set or for your final project, can't live at port 80 or port 443, because GitHub, the company that hosts this, is already using those default standard ports. But we can use any number of other port numbers. I claimed earlier there are tens of thousands of numbers that we could use, so that's what we're actually going to do. So let me go back to VS Code here. Let me shrink down my terminal window. Let me create a first file today called, for instance, hello.html. Enter. And now I've got an empty tab as usual. I'm going to very quickly whip up the exact same contents that we just saw: an angled bracket, an exclamation point, DOCTYPE html, close bracket; then open bracket, html, close bracket, and notice the autocomplete kicked in for this particular language, so I don't have to type everything myself. Inside of this tag, so to speak, I'm now going to put a head tag, inside of which is going to be a title tag. I'm going to say something like hello, title, just to be quick. And then down here, below those lines, I'm going to put a so-called body tag, inside of which is hello, body, just for some quick text. And that's it. This is now a file inside of my code space. There's no command to compile or run this in the terminal, because the goal is going to be to open this HTML file with a browser. If I want to do that in another browser tab, I need to tell my code space to serve that file via HTTP.
So the simplest way to do this is as follows: http-server, Enter. You're going to see a whole bunch of text on the screen. You're going to see a green button hopefully pop up that says Open in Browser, which is going to allow you to open up the contents of the current folder with a web browser; I'll zoom in. My URL has changed to be different from what it was a moment ago. What we're looking at at the moment is just a directory listing, a directory index, of all of the files in my code space right now. I came in advance today with my own folder of code, like we usually do, src8, which contains all of today's pre-made examples. But here is the file I just created a moment ago. And if I click on that hello.html, I see the simplest of web pages. It's a little underwhelming, but clearly here's hello, body, which takes up like 95% of the screen, the so-called viewport, which is just the big rectangular region of the screen in which the page is drawn, and there's the title in the tab up there. So if you've ever wondered or cared where the content in a web page comes from, well, here's the body content, and here's the head, or title, content. Everything else is just sort of icing on the cake. So at this point I've written a file called hello.html, and it has yielded this effect of having something in the head of the page and something in the body. But let's actually tease apart what just happened. Any file written in this language called HTML, the latest version thereof, 5, literally just starts with this, and this is just the kind of thing you memorize or copy-paste: open bracket, exclamation point, DOCTYPE html, close bracket. It looks a little bit different from everything else, because for the most part we're not going to use the exclamation point syntax anywhere else, unless we're writing an HTML comment. HTML has comments, just like Python, C, and other languages. But let's focus really on this juicier part.
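For comparison, Python's standard library can do much the same job as http-server. The sketch below, with a hypothetical helper `make_server`, serves a folder's files (or a directory listing, if there's no index.html) over HTTP on a chosen port, 8080 by default to match the lecture. It is a sketch, not CS50's http-server itself.

```python
import functools
import http.server

def make_server(directory: str, port: int = 8080) -> http.server.ThreadingHTTPServer:
    """Create (but don't start) an HTTP server for the files in `directory`."""
    # SimpleHTTPRequestHandler serves files, and directory listings, from `directory`.
    handler = functools.partial(http.server.SimpleHTTPRequestHandler, directory=directory)
    # "" binds to all network interfaces; port 0 would let the OS pick a free port.
    return http.server.ThreadingHTTPServer(("", port), handler)
```

To actually serve the current folder, you'd call `make_server(".").serve_forever()` and stop it with Control-C; from a terminal, `python3 -m http.server 8080` is the equivalent built-in one-liner.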
Here we have what's known as an element in HTML. An element includes a start tag and an end tag, or equivalently an open tag and a close tag. Here, for instance, is syntax that essentially tells the browser, when it reads this file top to bottom, left to right: hey, browser, here comes the HTML of my page, and the language in which the contents of this page are written is English. html, all lowercase, is the name of the tag, so to speak, and equivalently the name of the element. lang is what's called an attribute, which just modifies the default behavior of the element, and quote-unquote en is the value thereof, the shorthand notation for English; there are shorthand notations for most every human language as well. So you have a tag name and an attribute with a value. And we've seen these things so many times: key-value pairs, in the context of dictionaries or hash tables or any number of other contexts. Key-value pairs in HTML are separated by an equal sign, with the value typically quoted, using double quotes or single quotes, but consistently. Then notice at the end of this file, as per the indentation, there's something symmetric down here that has the effect of closing or ending the tag. This effectively tells the browser, "Hey, browser, that's it for my HTML." Meanwhile, everything else follows a similar paradigm inside of those two tags. Here is a head tag that says, "Hey, browser, here comes the head of my page," and later, "Hey, browser, that's it for the head of the page." Inside of the head: "Hey, browser, here comes the title," and "That's it for the title." And what is the title? hello, title, just as I wrote in my codespace. Same story for body: "Hey, browser, here comes the body of the page," the 95% of the screen, and then, "That's it for the body." What's in the body is exactly that text. The indentation is nice and pretty-printed; I've used four spaces, as we commonly do, though that's not strictly necessary.
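The element structure just described (a start tag carrying key="value" attributes, some content, and a matching end tag) is regular enough that a few lines of code can produce it. Here is a small Python sketch; the helper name `element` is hypothetical, and it inserts `content` verbatim, assuming the caller passes trusted, already-valid HTML so that elements can nest.

```python
from html import escape


def element(tag: str, content: str = "", **attributes: str) -> str:
    """Return <tag key="value" ...>content</tag>.

    Attribute values are HTML-escaped; `content` is inserted as-is so that
    elements built by earlier calls can be nested inside later ones.
    """
    attrs = "".join(
        f' {key}="{escape(value, quote=True)}"' for key, value in attributes.items()
    )
    return f"<{tag}{attrs}>{content}</{tag}>"


# Rebuild hello.html: the title nests in the head, and head plus body nest in html.
title = element("title", "hello, title")
head = element("head", title)
body = element("body", "hello, body")
page = "<!DOCTYPE html>" + element("html", head + body, lang="en")
print(page)
```

Running it prints the whole page on one line, which a browser renders the same as the indented version, since the extra whitespace between tags doesn't matter here.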
In fact, in my own codespace, I didn't even bother putting these on three separate lines; I just used one line. That's fine because, as we'll see, browsers typically ignore extra whitespace. But I've done it here, as we often do, to ensure that things are pretty-printed and therefore readable by us humans. Let me call your attention to one other thing on the screen. Up until now, before every lecture, I've been hiding a whole bunch of tabs in my terminal window. But today I left enabled one that you've probably seen but not cared about before, namely Ports. It's under this Ports tab that you can see a real incarnation of a TCP port. When you run the command http-server, it serves up my current folder's contents on its own web server, its own HTTP server, but not using the default port 80 or 443, because GitHub is already using those on cs50.dev for their product. Instead, by default, we've chosen another common developer port number, 8080, which is interesting only insofar as it's 80 twice; it's a human convention, and it could have been any of thousands of other possibilities. This line here is just telling me that I am apparently running a server on port 8080, and if I click on it, I can manually open the same tab. That's what the green button was doing for me. It was informing me: hey, you've just started a web server on this port; do you want to open a new tab with the contents thereof? So this is the picture we're now painting. Let me pull back up the code that we just wrote and propose that what we've really done is build a tree in the browser's memory. So we've come full circle to week five, when we talked about trees and other hierarchical structures. Suppose the whole contents of the file, the document, is represented by a node, drawn as an oval up here. It starts with a single root element, by convention the html element.
Your page can have only one of those elements. Inside of the html element can be a head element and a body element. In this case, the head, recall, had a title element, which in turn contained the actual text thereof, hello, title. Meanwhile, the body had just its text as well. So when I keep saying that the browser is downloading the file, for instance hello.html, and reading it top to bottom, left to right, it's doing literally that, but somehow or other it's using malloc, or the equivalent in whatever language it's written in, to allocate node, node, node, node, node, and populating a tree in your browser's memory, or RAM, a data structure quite like that. So it's all germane to where we've been before. Before we take a snack, are there any questions about what we've just seen? Anything at all? I shouldn't have prefaced this with "the only thing between us and snacks is these questions." No? All right, snack time. See you in 10. All right, so we are back, and pretty much everything we do from here on out will look structurally like this. We're just going to introduce a few more tags and a few more attributes to give you a sense of some of the basic building blocks of most any website out there. You'll find pretty quickly that it starts to get kind of tedious writing it out; in fact, I'll resort to some copy-paste today just to speed things up. But this is going to motivate next week, when we reintroduce Python as well as SQL to automate the generation of HTML. All of today's websites and many of today's mobile apps are written in HTML, but people are increasingly not writing this kind of stuff by hand. Rather, they're writing code that generates precisely what we're going to learn. So understanding the fundamentals will still be useful, so that we know what code to write next week and beyond. Let me go back into VS Code here.
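To make the "node, node, node" idea concrete, here is a rough Python sketch that reads hello.html top to bottom with the standard library's `html.parser.HTMLParser` and links up one node per element or run of text. It is a simplification under stated assumptions: real browsers follow the far more involved HTML parsing algorithm, and the class and function names (`Node`, `TreeBuilder`, `show`) are hypothetical.

```python
from html.parser import HTMLParser

# Elements that never have an end tag, so the builder must not descend into them.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}


class Node:
    """One node of the tree: an element (by tag name) or a "#text" node."""
    def __init__(self, name, parent=None, text=""):
        self.name = name
        self.parent = parent
        self.text = text
        self.children = []


class TreeBuilder(HTMLParser):
    """Build a simplified document tree while reading HTML top to bottom."""
    def __init__(self):
        super().__init__()
        self.document = Node("document")
        self.current = self.document

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        if tag not in VOID_ELEMENTS:
            self.current = node  # later content nests inside this element

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS:
            return
        node = self.current
        while node is not self.document and node.name != tag:
            node = node.parent
        if node is not self.document:
            self.current = node.parent  # climb back out of the closed element

    def handle_data(self, data):
        if data.strip():  # ignore whitespace-only runs between tags
            self.current.children.append(Node("#text", self.current, data.strip()))


def show(node, depth=0):
    """Print the tree with four spaces of indentation per level."""
    label = repr(node.text) if node.name == "#text" else node.name
    print("    " * depth + label)
    for child in node.children:
        show(child, depth + 1)


PAGE = """<!DOCTYPE html>
<html lang="en">
    <head>
        <title>hello, title</title>
    </head>
    <body>
        hello, body
    </body>
</html>"""

builder = TreeBuilder()
builder.feed(PAGE)
builder.close()
show(builder.document)
```

The printed outline mirrors the picture on screen: document, then html, with head (containing title and its text) and body (containing its text) as children.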
What I'm going to do is open up another terminal window, so that I can leave http-server running in this first terminal window. And I propose that we implement a web page that has not just a single line of text, but maybe some paragraphs. I'm going to call this paragraphs.html. That opens up a new tab, and here's where I'm going to save some time. I'm going to go back to hello.html, select all, and copy-paste it as the beginning of this file. But I'll start changing the title of each page to match the file name, so this is going to be my paragraphs example. And instead of saying just hello, body, let's actually have a few paragraphs of text. I'd rather not waste time writing even full paragraphs of text, so let's open up the duck, log in, and ask it for a quick helping hand here: "Write three paragraphs about computer science." I don't really care what the output is; all I want is some dynamically generated text to save me some keystrokes. And here we have an educational answer, too, even though all we care about today is the fact that this is three chunks of text. Hopefully that's all quite accurate. I'm going to highlight all of that, go back into my paragraphs.html tab, and paste it inside of the body. The paragraphs are so long that the text scrolls, but I can at least clean this up slightly by indenting it twice, so that it's pretty-printed inside of the body. Now I'm going to go back to my other tab, which represents the contents of hello.html. I'm going to click Back, which shows me that same directory listing again, now with a new file, paragraphs.html, and I'm going to click it to see these three paragraphs of text. What looks wrong? Yeah? >> Paragraphs. >> There are no paragraphs. It's just one big blob of text.
It's the same text, but buried in there is the end of the first paragraph and the start of the next, and the same for the third. So what's going on? Apropos of my comment earlier about browsers not really caring about whitespace, you can put all the whitespace you want there; the browser is just going to ignore it in this particular case. All it's going to give me, minimally, is a single space between each of these paragraphs of text. So HTML is very pedantic. If you want there to be paragraphs, you need to tell the browser: put a paragraph here, put a paragraph there. Thankfully, the way to do this isn't all that hard. I'm going to go inside of the body here and simply open a tag called p, for paragraph. Notice that VS Code in this particular case is a little annoying, because it's trying to finish my thought, but it doesn't know that I already wrote this text. So I'm just going to delete the close tag it automatically generated, and then manually indent this. I'm going to do the same thing for the other paragraphs. Up here, I'm going to open the paragraph tag, temporarily delete the close tag so that I can put it below that chunk of text instead, indent this, and close it down here. This would have been easier if I'd just done it right the first time. I'm going to do the same thing with the third and final paragraph. So now what we in effect have, three times in a row, is: hey, browser, here comes a paragraph; then the first paragraph; hey, browser, that's it for the paragraph. Hey, browser, here comes a paragraph; that's it for the paragraph. Hey, browser, here comes a paragraph; that's it for the paragraph. So three times in total: open, close, open, close, open, close. Now, if I go back to the browser, nothing appears to have changed yet, but that's because I'm looking at a copy that was downloaded a moment ago in that virtual envelope. This is why, among other reasons, we hit reload on web pages: to get the latest version.
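The "one big blob" effect comes from whitespace collapsing: in ordinary text, a browser renders each run of spaces, tabs, and newlines as a single space. A minimal Python approximation follows; the function name `collapse_whitespace` is hypothetical, and it ignores the finer points of CSS `white-space` handling (for example, `<pre>` elements preserve whitespace).

```python
import re

# HTML treats space, tab, newline, carriage return, and form feed as whitespace.
# Deliberately not \s, which would also match non-breaking spaces.
HTML_WHITESPACE = re.compile(r"[ \t\n\r\f]+")


def collapse_whitespace(text: str) -> str:
    """Approximate how ordinary text renders: each whitespace run becomes one space."""
    return HTML_WHITESPACE.sub(" ", text).strip()


source = """
        First paragraph ends here.

        Second paragraph starts here.
"""
print(collapse_whitespace(source))
```

The blank line between the two "paragraphs" disappears, which is why the text needs p tags to tell the browser where each paragraph begins and ends.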
And voila, now we have three actual paragraphs. The whitespace between them is inserted automatically by the browser, but it's at least prettier to the eye now. So that, then, is the paragraph tag, useful of course if we have paragraphs of text. What are some other tags we might introduce? Maybe you're writing a paper or a blog post or the like. It's pretty typical to want headings for sections of the page: maybe chapters, then sections, then subsections. HTML can help with this too. Let me go into my terminal window again and create a file called headings.html. In this file, let me similarly go back to hello.html and copy-paste it into headings. I'm going to close paragraphs because we're done with that, and change the title now to headings. Inside of the body here, it would have been nice to have some of that same text, so let me go back one step, grab the paragraphs, and paste them into this new file. Let me rename the title to headings to make clear which file we're in. Now, wouldn't it be nice if I made clear that this is the first paragraph? So I'm going to use the h1 tag, which is the heading 1 tag, and I'm just going to say One for the sake of discussion. Down here, I'm going to use h2 and say Two, and down here, h3 and Three, because I don't really care what these things are called; I just want to demonstrate the functionality. If I go back to my other tab now, back to the directory listing, there's my brand-new file, headings.html. It's the same paragraphs, but now you have some big bold text reminiscent of a chapter heading, a section heading, a subsection heading, the kind you might see on a news site or a blog. You've got h1 through h6, from biggest and boldest to smaller but still bold.
And the browser decides on all of those settings for us. But the headings also provide some semantic clarity: probably the most important thing on the page, at least to begin with, is that h1 heading, and everything else is supporting paragraphs or arguments or whatever the case might be. There's a hierarchy implicit there. All right, what are some other things we can do with web pages? Let me open my terminal window again and code up a list of values, because lists are everywhere on the internet. Let me open up list.html and then close my terminal. I'll start with that same file, headings.html, paste it into list, change the title here, and delete everything I did in the body. Again, the only reason I'm copying and pasting is to avoid writing out the same boilerplate code again and again with the html tag, head tag, body tag, and so forth. Let's focus on the new stuff. The new stuff in this example will be a list of values like the words foo, bar, and baz. Much like a mathematician might reach for x, y, and z as placeholders, computer scientists typically reach for words like foo, bar, and baz as nonsensical placeholders. This looks like a list of three values, one after the other. Of course, if I go back into my directory index and click on list, how many list items am I going to see per line? Yeah: it's going to be just one big blob of text here too. It doesn't matter that it looks like a list in my code; it is just going to be text after text after text, separated by a single space, not the multiple lines I had. So here too we've got to be pretty pedantic. If I want a list of values, I need to use a tag that conveys that. The tag I'll use first is ul, for unordered list, which gives me a bulleted list. Then, inside of this unordered list, we're going to have a whole bunch of list items, or li for short, like foo, bar, and baz, or any other things you want to put in your list.
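As a preview of generating HTML with code rather than typing it, here is a short Python sketch that wraps each item in li tags inside a ul. The function name `html_list` and its `ordered` flag are hypothetical; passing `ordered=True` emits an ol instead, which the browser numbers automatically.

```python
from html import escape


def html_list(items: list[str], ordered: bool = False) -> str:
    """Wrap each item in <li> tags inside a <ul> (bullets) or <ol> (numbers)."""
    tag = "ol" if ordered else "ul"
    lines = [f"<{tag}>"]
    for item in items:
        # escape() keeps characters like < and & from being read as markup.
        lines.append(f"    <li>{escape(item)}</li>")
    lines.append(f"</{tag}>")
    return "\n".join(lines)


print(html_list(["foo", "bar", "baz"]))
```

Because the loop writes one li per item, inserting a new item means changing only the Python list; nothing needs renumbering by hand.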
If I now go back to my other tab and reload, you get the familiar bulleted list that you might see on any number of websites, Google Docs, or the like. How does Google Docs do it underneath the hood? They're just using a ul tag and some li tags inside of that to give you the bulleted list that appears automatically when you click the appropriate button in something like Google Docs, which at the end of the day is just a website. What if I want to number these things? If I go back to VS Code, I could certainly just start numbering them 1, 2, 3, which is fine, but honestly, computers can count, and with loops, pretty quickly. It's also a little annoying: if I want to go back in later and insert something between some of those elements, I then have to renumber everything manually. This is one of the things computers are good at. So take a guess: if I want not an unordered list but an ordered list, that is, a numbered one, what might you change? Yes, ol is a good bet. Let's change both the open tag and the close tag. Let me go back to my second tab and reload, and now we have it: 1, 2, and 3. You can even build a whole table of contents this way, with sub-bullets or sub-numbering. Pretty much anything you can do in a table of contents, HTML can do for you automatically. What about tabular data, laying out data in rows and columns? We can do that too. Let me open up a new file, table.html. In this file, I'll copy-paste as before, just so I have some boilerplate, and get rid of everything in the body. Then let's manually whip up a little table like this: open bracket table. Inside of the table tags, I'm going to have a tr tag, for table row. Inside of this table row, I'm going to have a td tag, for table data, which is going to contain the number 1. I'm going to give myself another with 2, and another with 3.
Outside of that table row, I'm going to have another table row, and I'm going to create maybe 4, then 5, then 6. You can perhaps see where this is going. After this, I'm going to do one more table row, a little tediously: 7, 8, 9. And then lastly, just to make it look a little familiar, a final table row with a td containing an asterisk, then a 0, and lastly a pound symbol. Any guesses as to what we're making in HTML here? A telephone keypad, yeah. Let me close the old file and go back over to the browser, click Back, and there's my new file, table.html. It's not going to be very pretty, but I dare say that's exactly what you see when you pull up the phone app and start dialing a number: a numeric keypad laid out automatically in rows and columns. Now, this one's a little underwhelming, so let me open up a file that I made in advance of class today. From my favorites folder here, I'm going to copy a pre-made example, this file called favorites0.html. What you'll see here is a slightly more complicated table, still with a table tag, but this time with a thead tag, for table head, and then a tbody tag, for table body, inside of which are all of those rows. I know this just by having read the documentation. And notice this: inside of the first tr in the thead, there are three th elements, table headings: timestamp, language, and problem, which might sound a little familiar from when we last collected data from everyone via that Google Form. Let's go ahead and spoil what this is. Let me go back to the directory index. There is this pre-made file, favorites.html, and arguably a more compelling use of a table.
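Typing the keypad by hand is exactly the tedium that code can remove. Here is a brief Python sketch that emits one tr per row and one td per cell; the function name `html_table` is hypothetical, and it produces only plain rows, without the thead, tbody, and th structure used in the favorites example.

```python
from html import escape


def html_table(rows: list[list[str]]) -> str:
    """Render rows of cells as a <table>: one <tr> per row, one <td> per cell."""
    lines = ["<table>"]
    for row in rows:
        lines.append("    <tr>")
        for cell in row:
            lines.append(f"        <td>{escape(cell)}</td>")
        lines.append("    </tr>")
    lines.append("</table>")
    return "\n".join(lines)


keypad = [["1", "2", "3"], ["4", "5", "6"], ["7", "8", "9"], ["*", "0", "#"]]
print(html_table(keypad))
```

The output is the same four rows of three cells built by hand above, just indented uniformly.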
Now we have an HTML table containing all of the form submissions that you all clicked in the other day, when we asked you your favorite language and your favorite problem. It's not super pretty, but indeed it's in rows and columns, reminiscent of the HTML that Google uses in the actual Google Sheets software to lay out a sheet of data for you in those same rows and columns. All right, let's do something a little more visually interesting. Let me go back to VS Code here and close out those last two files. How about something with images? Inside of today's code, I brought the same bridge photo that we keep opening up in class, this week's bridge. It looks a little something like this; here, though, is just the raw image. How could I include an image in a web page that I serve up on the internet? Let's try this. Let me close the PNG itself, copy this, and create a new file called image.html. I'll hide my terminal, copy-paste, and quickly change the title to image so we know where we are. Inside of the body of this page, let's embed that image, so that we can include not just the image but, if we want, paragraphs of text around it, headings as well, heck, maybe a table, or any other features we've seen already. I'm going to write img, which is image for short, then src, source for short, equals quote-unquote bridge.png, and then close the tag. Now I'm going to go back to my other tab and into my directory index. Here's my brand-new file, image.html. This too isn't going to look all that different from the actual image, because I have no other content. But when I click on it, you'll see the full-screen image. It's even a little too big to fit in the viewport, the body of the page, but we can fix something like that later.
I've embedded precisely that image in this website. But I should do a little better here. If the image is slow to load, or if someone is visually impaired and doesn't know what they're looking at, it would be nice to have some alternative text that something like screen reader software could recite. So there's another attribute specifically for this tag called alt, for alternative, and I can put something like Harvard University to at least give the user a textual description of what kind of photo they're looking at. You'll also see that text if the image is slow to load or if it's broken, like missing altogether: you won't see a 404; you'll see a broken-image icon, but at least with some explanatory text as to what the developer intended you to see at that point. Nothing visible changes if I reload here by going back to image.html, but again, a screen reader, or an astute viewer, would ultimately see that text in the browser. But there's something different here, and this isn't a mistake for once. What have I done differently, but apparently not wrong? What's new or noteworthy about this particular img tag? Yeah? >> Yeah, there's no close tag. There's no open bracket, slash, img, which is the pattern we followed for every other tag, like closing the html tag, the head tag, the body tag, and so forth. I just don't see any end tag here, and it's just not necessary. It turns out certain HTML elements are empty elements, which is to say it doesn't make semantic sense to start and end an image. It's either there or it's not. So some tags don't take an end tag, because it's obvious to the browser that the image just goes there, and img is one such tag. And I notice I'm missing the lang attribute here, which isn't strictly necessary because I've got no textual content, but just for consistency, let me go back and put that in as before.
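Because alt text matters for screen readers and broken images, it's easy to check for programmatically. Here is a rough Python sketch using the standard library's `HTMLParser`; the names `MissingAltFinder` and `images_missing_alt` are hypothetical. It flags only img tags with no alt attribute at all, since an explicitly empty `alt=""` is the conventional way to mark a purely decorative image.

```python
from html.parser import HTMLParser


class MissingAltFinder(HTMLParser):
    """Collect the src of every <img> that has no alt attribute."""
    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        # img is an empty (void) element, so its start tag is all there is.
        if tag == "img":
            attributes = dict(attrs)
            if "alt" not in attributes:
                self.missing.append(attributes.get("src"))


def images_missing_alt(html: str) -> list:
    """Return the src values of images lacking alt text."""
    finder = MissingAltFinder()
    finder.feed(html)
    finder.close()
    return finder.missing


page = '<img src="bridge.png"><img alt="Harvard University" src="bridge.png">'
print(images_missing_alt(page))
```

Only the first tag is reported, since the second carries the Harvard University description added above.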
Meanwhile, the image appears in the page exactly as it would on its own, but it doesn't have to be just an image we embed. We can do something with video, for instance. Let me open up a file called video.html, copy-paste some of that starter code, and change the title to video. Instead of the img tag, as you might imagine, there's also a video tag. It's a little more involved, but per the documentation, I know I can write video, and then inside of the video tag I can have multiple sources, just in case the browser might want different versions, different resolutions or qualities thereof. Somewhat confusingly, this is an actual tag called source, not shortened, but, annoyingly, this tag has an attribute called src, which is shortened, that equals the name of the file you want to embed. I came with today's examples with a video file called video.mp4, which is a small video that you can embed. I can also tell the browser what type of video it is, to be clear, and the convention here, the content type, is to say that the type of this video is video/mp4, an MPEG-4 video. There are other features for the video tag, though. When you see a video on a page, you can very often see a play icon, a pause icon, maybe some other controls. It turns out you can put an HTML attribute on the video tag literally called controls that will enable those. If you don't turn them on, the user has no visible way to start and stop the video; this way, the user actually sees those controls. But this attribute is a little different from others: it doesn't need a value. It just has to be present, and when the browser sees the word controls, it knows: oh, I should turn on the controls feature. And for good measure, especially in today's world of advertisements everywhere, if you want the video to potentially play automatically, or at least not annoy the user, you might want to mute it by default as well.
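Boolean attributes like controls and muted are written as a bare name: present means on, absent means off. Here is a small Python sketch of a start-tag renderer that follows that rule; the function name `start_tag` and its True/False convention are hypothetical choices for illustration. Note that source, like img, is an empty element with no end tag.

```python
from html import escape


def start_tag(name: str, **attributes) -> str:
    """Render a start tag.

    True  -> bare boolean attribute (present, no value), e.g. controls
    False or None -> attribute omitted entirely
    anything else -> key="value", HTML-escaped
    """
    parts = [name]
    for key, value in attributes.items():
        if value is True:
            parts.append(key)
        elif value is not False and value is not None:
            parts.append(f'{key}="{escape(str(value), quote=True)}"')
    return "<" + " ".join(parts) + ">"


video = "\n".join([
    start_tag("video", controls=True, muted=True),
    "    " + start_tag("source", src="video.mp4", type="video/mp4"),
    "</video>",
])
print(video)
```

This prints a video element with bare controls and muted attributes wrapping a single source element that points at video.mp4 with type video/mp4.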
So another attribute, per the documentation for the video tag, lets you start the video muted, and only when the user clicks on it might they actually start to hear something. But of course these are fairly basic examples of media inside of pages. Let's actually do what the H is meant to imply in HTML: hypertext, the ability to link from one page to another. That's a feature we haven't yet seen. Just for completeness, let me go back into hello.html, because I completely forgot the lang attribute there, even though it mainly helps search engines (SEO, search engine optimization), screen readers, and tools like Google Translate know what language the page is in. Now let me go into my terminal window and create another file called link.html, which demonstrates exactly that, the ability to link from one web page to another. Let's change the title to link so I know where I am. In the body of this page, let's create what's called a hyperreference, or hyperlink. I'll encourage people on this page to visit the actual Harvard website, so let's write Visit Harvard, period, just to demonstrate where we're beginning. If I go back into the directory index and click on link.html, this of course is not yet a link, so I should probably make it one. Maybe instead of just saying Visit Harvard, I should say harvard.edu. Go back to the other tab, reload, and it's harvard.edu: I can click and highlight it, but it's not clickable, not underlined like a link. All right, maybe I need www.harvard.edu. Reload; still nothing happening. Maybe I need the full URL with the scheme, https, and maybe the slash at the end. Reload again, and nothing's happening. So here too, HTML is pedantic: it will not create a link for you unless you tell it to.
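Since HTML won't link a bare URL on its own, sites that do auto-link your text must run code over it first. Here is a minimal Python sketch of that idea, assuming a deliberately simple pattern (http:// or https:// followed by non-space characters); the names `URL_PATTERN` and `autolink` are hypothetical, and real auto-linkers handle far more cases, such as scheme-less www addresses and trailing punctuation.

```python
import re
from html import escape

# Assumption: anything starting with http:// or https:// up to the next
# whitespace, quote, or angle bracket counts as a URL.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")


def autolink(text: str) -> str:
    """Escape plain text as HTML, wrapping anything that looks like a URL in <a href>."""
    pieces = []
    last = 0
    for match in URL_PATTERN.finditer(text):
        pieces.append(escape(text[last:match.start()]))  # text before the URL
        url = escape(match.group(), quote=True)
        pieces.append(f'<a href="{url}">{url}</a>')
        last = match.end()
    pieces.append(escape(text[last:]))  # text after the last URL
    return "".join(pieces)


print(autolink("Visit https://www.harvard.edu/ today"))
```

The output is the same sentence with the URL wrapped in an anchor whose href and visible text match, which is precisely the markup the lecture builds by hand next.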
And the fact that, when you post on social media nowadays or type in Google Docs, things are automatically hyperlinked for you is a feature implemented in code, very often Python or JavaScript or something else, where some human wrote code that looks for patterns in the input you've typed, and if it looks like you've typed a URL, it automatically links it for you. But what are those websites doing for you automatically? They're doing this. If you want a link here to Harvard's website, you use open bracket, a, for anchor, then href, for hyperreference, set equal to the URL to which you want to link. Close the tag, and then, between the open tag and the close tag, put the actual words you want to turn into a link. Now if I go back to this page and reload, I have what looks like my original attempt, just Visit Harvard, but it's a hyperlink. And this is super subtle, but if I hover over that underlined word, which is blue by default, you'll see in the browser's bottom left-hand corner where you're going to be whisked away to. So this now looks like I intended: an actual hyperlink to Harvard. I could also display the full URL, though it would be a little redundant, and even though it seems like you shouldn't have to, this is indeed how HTML works: the href attribute is where you're going to go, and the text inside of the open and close tags is what the user will see. So if you want them to see the full URL, you've got to put it there too. Now I can see the full URL to where I'm being led. But here's where we can introduce discussions of cybersecurity. How might this stupidly simple feature be abused, do you think? Yeah?
>> Have it display something, but actually... >> Yeah, you could have it display one thing but lead somewhere else, and it wouldn't be that hard for an adversary who's maybe tricked you into visiting their web page to send you to yale.edu instead of Harvard. If I reload the page, it doesn't look any different, unless the viewer is astute enough to look at this tiny little text at the bottom of the screen, or just clicks on the link and is whisked away to the wrong destination. That can be problematic. This is a nice ha-ha sort of prank, but you could certainly imagine doing this with paypal.com addresses or any number of banks, or anywhere you're trying to collect personal information from someone. And if the resulting website looks quite like the website they're expecting, but it's actually your copy thereof, it's all too easy to wage what are called phishing attacks, P-H-I-S-H-I-N-G, which means leading someone to what looks like the real site but is not, typically to get their username, their password, their credit card information, or something else. But it boils down to just basic building blocks like this. Questions, then, on any of these building blocks we've seen thus far? Yeah? >> I think I might have gotten lost in the earlier portion. >> Sure. >> How did you get it to open up? Like, did you run the file in... >> Oh, good question. How did I get it to open up? Let me rewind. The very first thing we did after creating hello.html was open a terminal window, and specifically I ran a command, http-server, which starts my own web server in my codespace, but not on the default ports 80 and 443, because that's what cs50.dev is already using. Instead, by our design, it chose 8080, which is commonly used by developers when making websites. Then I just hid my terminal, because it's not interesting to see constantly.
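Returning to the phishing trick described above, where the visible text of a link says one address and its href points to another: here is a rough Python sketch of one defensive check, collecting each anchor's href and visible text with `HTMLParser` and flagging links whose text looks like a URL for a different host. The names `LinkCollector` and `suspicious_links` are hypothetical, and the heuristic is intentionally narrow: text like "Visit Harvard" that isn't itself a URL is never flagged.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class LinkCollector(HTMLParser):
    """Collect (href, visible text) pairs for every <a> element."""
    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> we are inside, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href") or ""
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def suspicious_links(html: str) -> list:
    """Flag links whose visible text is a URL for a different host than the href."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    flagged = []
    for href, text in collector.links:
        if text.startswith(("http://", "https://")):
            if urlsplit(text).hostname != urlsplit(href).hostname:
                flagged.append((text, href))
    return flagged


page = '<a href="https://www.yale.edu/">https://www.harvard.edu/</a>'
print(suspicious_links(page))
```

For the prank link, the result pairs the displayed Harvard URL with the actual Yale destination; an honest link whose text and href agree produces an empty list.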
But that web server is still running in my codespace. And any time I say, let's go back to this tab, I'm visiting a different URL, the one I got by clicking that green button, which led me to my own website. If you ever get lost or close that tab by accident, no big deal. If you go to the Ports tab of your terminal window, you can hover over this and click that same URL to open up the contents of your own site again. >> Fluffy meme? >> Yes, these are randomly generated names from GitHub, which is the company that hosts VS Code in this way. They do this to ensure uniqueness without the name being some arcane sequence of random letters and numbers: they concatenate random English words together. Good question. All right, so what else can we do here? Let me propose that there's a bit more you can do with these URLs. Here, of course, is the scheme, and the hostname, and the domain, and the TLD. But after that, things can get a little more interesting than just folder names and file names. It's quite common to see URLs that have somewhere in them a question mark followed by a bunch of key-value pairs, that omnipresent computer science idea, now in the context of URLs. If you want to pass input to a web server, one means by which you can do that is literally in the URL itself. For instance, if you visit google.com and want to search for something, you and I are in the habit, of course, of just typing into a search box. But how is that search box actually getting the data to Google's servers? It's via these URLs. And if there's not one input but two, the URL might be a bit longer, with one or more ampersands separating the additional key-value pairs. It turns out we can see this in the real world as follows. Let me go back to VS Code here, open up a new tab, and open up google.com.
I'm just going to hit Enter on the shortest way of writing it, and I get to Google's homepage here, even though notice I ended up at some longer form of the URL. In fact, if I delete everything else from the URL that's not relevant to us today, it forcibly comes back, so Google is presumably trying to track me by putting that in there. That's fine. All I'm going to do is search for cats. There's a whole bunch of other functionality clearly happening, like autocomplete, as it tries to figure out what results or words I might want. I'm just going to hit Enter. And notice, if I zoom in on the URL at the top of my screen, it's a crazy long URL, because Google is probably doing a bunch of tracking and advertising and analytics, none of which is technologically relevant to us today. But notice that after www.google.com there's /search, which is the path on their server, the search program that someone there has written. Then there's a question mark, and then there's an HTTP parameter, as these things are called, the more precise name for key-value pairs in URLs. This HTTP parameter's key is q, and its value, after the equal sign, is in fact cats. All this other stuff, I have no idea what it is. I'm going to delete it and hit Enter, and it stays gone, but I still get cats in my search results. So this, I would argue, is sort of the canonically shortest form of a useful Google URL. In fact, if I want to search for dogs instead, I don't have to use the search box. I can literally make my own URL manually, hit Enter, and if I zoom out, there are Google search results about dogs. So this URL is sort of the essence of how URLs work, and specifically of the GET verb, which was that keyword in all caps that I claimed was inside of the envelope; it's what Phyllis's browser was sending, and it's what my browser has been sending through all of these examples. But here's where things can get interesting.
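Trimming Google's long URL down to just ?q=cats can also be done with code. Here is a small Python sketch using `urllib.parse` from the standard library; the helper name `keep_only` is hypothetical, and the extra parameters in `long_url` are made-up placeholders standing in for the tracking and analytics values seen in the browser, not Google's actual parameters.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit


def keep_only(url: str, *keys: str) -> str:
    """Rebuild url keeping only the named query parameters (first value of each)."""
    parts = urlsplit(url)
    params = parse_qs(parts.query)  # e.g. {"q": ["cats"], ...}
    kept = {key: params[key][0] for key in keys if key in params}
    # Same scheme, host, and path; trimmed query; no fragment.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


# Hypothetical long URL: only q=cats is taken from the lecture.
long_url = "https://www.google.com/search?q=cats&sca_esv=abc123&source=hp"
print(keep_only(long_url, "q"))
```

This prints the canonical short form from the lecture, https://www.google.com/search?q=cats; searching for dogs just means changing the value after q=.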
If I know how Google's server works, its back end, the part that knows all about cats and dogs on the internet, I can implement my own front end by just knowing a bit of HTML. So, let me go back into VS Code here. Let me go into my second terminal, which is blank, and let me go ahead and create something called search.html. I'm going to copy my original code, close link.html, and paste it here. Hide my terminal. Call this thing search, and then inside of the body of this page, I'm going to make my own version of Google. I'm going to use a form tag, and in that form I'm going to specify an input tag whose name is going to be exactly what I saw Google use, q, which happens to stand for query. The type of this input, this box, is going to be text. I'm then going to add one more input, and the type of this one is going to be submit, for a button. And that's it. Let me go back into my other tab, go back into my directory listing, and click on search.html. This is not pretty, but it is the beginning of my very own search engine. Unfortunately, if I type in cats, notice what happens. My URL changes such that it's search.html?q=cats. But I know nothing about cats. I don't have a database of cats. I haven't done any back-end work, just the front end. The front end is what the user sees; the back end is what provides data to the front end. So why don't I tell this form not to submit to me. Let's say instead that its action should be https://www.google.com/search, which is the URL that I saw in my browser. I'm just inferring how Google works. I'm also going to be pedantic, even though this is the default, and say that the method I want my form to use is get. Confusingly, it's conventionally written in lowercase here, even though inside of the envelope it will be all caps. And then I'm going to go back to this page and reload after going back.
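What the browser does with a `method="get"` form can be approximated in a few lines of Python:

- take each input's `name` as a key and its contents as the value,
- URL-encode the pairs,
- append them to the form's `action` after a question mark.

This is a simplified sketch, not the browser's exact algorithm. `form_get_url` is a hypothetical helper, and it assumes the action URL has no query string of its own.

```python
from urllib.parse import urlencode

def form_get_url(action: str, fields: dict[str, str]) -> str:
    """Approximate the URL requested when a method="get" form is submitted.

    Each input's name becomes a key and its contents the value; the pairs are
    URL-encoded, joined with '&', and appended to the action after a '?'.
    Simplification: assumes the action URL has no query string of its own.
    """
    return action + "?" + urlencode(fields)

print(form_get_url("https://www.google.com/search", {"q": "cats"}))
# https://www.google.com/search?q=cats
print(form_get_url("https://www.google.com/search", {"q": "cute cats"}))
# https://www.google.com/search?q=cute+cats
```

Note that a space becomes `+`, which is how form data is conventionally encoded in URLs.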
And you'll see the same exact box, but when I search now for cats and submit, notice my URL changes to Google's own. Voila. It's like I just implemented my own Google without doing the actual hard part; I've just done the simpler front end. And there are a few other things I can do here that are sort of nice. I can change the type to be search. I can change the value of my button so it's not the default, which, notice, was Submit; I can say Google Search instead. And I can keep tweaking this to make it prettier and prettier. Now in my version, there's a box that already suggests cats: notice that it's trying to complete my thought. I can go back into the form and say autocomplete equals off to turn off that feature. So now if I click in this box and type... Oh, autocomplete equals off. Why is it still there? >> Did I forget to refresh? Oh, thank you. I forgot to refresh. Hence my point: you always have to reload after making a change. And now the autocomplete feature is off. And this other little thing is subtle, but this little X will just clear the whole box. That is simply the result of having changed the type of that box from text to search. There are other things you can do, too, for accessibility or user-friendliness. I can add autofocus here, for instance, as an attribute without any value. If I now reload this page, notice that the cursor is automatically blinking in the text box. That's a marginal change, but it makes it much easier for me to type cats without having to click in the box first to bring it into focus. So, suffice it to say, this is not really the business that Google is in; they do much more on the back end than on the front end. But with just these basic building blocks, I can implement the beginnings of the same website. In fact, let me do one other flourish. You'll see that that text box is blank, so it's not clear what I might want to do.
Well, there's another attribute I can use: placeholder equals something like query, so I can at least tell the user what to type. If I reload again, I now see the word query in gray text, so that I roughly know what to type. So all these things that you see every day on websites are really as easy as coding up some HTML like that. But what else can we do with HTML? Well, this is a topic for another, longer day, too, but there exist in computing what are called regular expressions. That's a fancy way of describing patterns, which are quite useful when you want to validate input. For instance, if you want the user to type in an email address with the at sign, with the TLD, and so forth, it would be nice to make sure that they get a warning if they try to skip that field or mistype something in it. In the world of regular expressions, known in short as regexes, there's a whole bunch of documentation here that in a nutshell will introduce you to some pretty powerful syntax. We won't spend much time on it at all today, but it's syntax that exists not only in the world of the web, but in Python and so many other languages as well. So consider this just a quick crash course. If you want to define a pattern in, say, a website that ensures that the user types in an email address, you can use these textual building blocks. In the world of regular expressions, a single dot represents any character, if you don't care what the character is. Confusingly, a dot doesn't represent a period; it represents any character. Star means zero or more times. Plus means one or more times. Question mark means zero or one time, if you want something to be there or not. Curly braces with a number n mean exactly n times, and you can even have a range of values instead. And then you can use square brackets and some other syntax to say I want the user to type in any of these characters, or digits in this case.
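Python's `re` module uses the same building blocks, so a quick way to get a feel for them is to test small patterns against small strings. This sketch uses `re.fullmatch`, which requires the entire string to match. The helper `matches` and the sample strings are just illustrations.

```python
import re

def matches(pattern: str, text: str) -> bool:
    """True only if the *whole* text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

examples = [
    (r"c.t", "cat"),           # . is any one character: True
    (r"c.t", "ct"),            # ...but exactly one character: False
    (r"ab*", "a"),             # * is zero or more b's: True
    (r"ab+", "a"),             # + is one or more b's: False
    (r"colou?r", "color"),     # ? is zero or one u: True
    (r"[0-9]{3}", "617"),      # {3} is exactly three digits: True
    (r"[0-9]{2,4}", "12345"),  # {2,4} is two to four digits: False
    (r"[aeiou]+", "aei"),      # [...] is any one of the listed characters: True
]

for pattern, text in examples:
    print(f"{pattern:12} {text:8} {matches(pattern, text)}")
```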
Or you can do ranges like this, [0-9]: I want them to type in any decimal digit between 0 and 9. Or backslash d represents any digit, and backslash capital D means anything that's not a digit. Long story short, humans over the years have come up with shorthand notation, known as regular expressions, via which you can define patterns. This is useful because if I want to make a web page that does in fact require that someone type in, say, an email address, I can enforce that to some extent. If I go back to my browser here and into VS Code, let me go ahead and create a new file called, say, register.html, to be representative of registering for some website. I'll change the title here real quick. I'm going to keep the form, but in this case I'm not going to bother with Google anymore, so let's make it a bit simpler than before. Inside of the form, I'm going to have an input, and I'm going to have the name of this input be email, because that's what I'm collecting. I'm going to have a placeholder be quote unquote email so the user knows what to type in. And I'm going to say type equals text, but I'm going to additionally specify a pattern. The pattern I want the user to type in, in between these quotes, is going to be any character one or more times, that is to say their username; then an at sign; then any character one or more times; then literally a period. We didn't see this on the screen, but just like in C when you want to escape special characters, if you want literally a period in their input, like the dot in harvard.edu, you can say backslash period to mean a literal period. And then the TLD, edu. So that's what this pattern means. Let me also give myself a button; just so you've seen it, there's also a button element in HTML, which is similar in spirit to the submit button we saw a moment ago.
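Here is the same email pattern tried in Python, as a rough stand-in for what the browser does with the `pattern` attribute. Browsers check that attribute against the input's whole value, which is why the sketch uses `re.fullmatch` rather than `re.search`. The function name and sample addresses are hypothetical.

```python
import re

# The lecture's pattern: any character one or more times, an @, any character
# one or more times, a literal period (escaped as \.), then edu.
EMAIL_PATTERN = r".+@.+\.edu"

def satisfies_pattern(value: str, pattern: str = EMAIL_PATTERN) -> bool:
    # The pattern must match the entire value, so use fullmatch, not search.
    return re.fullmatch(pattern, value) is not None

print(satisfies_pattern("mail"))                 # False: no @ and no .edu
print(satisfies_pattern("someone@example.edu"))  # True
print(satisfies_pattern("someone@example.com"))  # False: doesn't end in .edu

# \d is any digit and \D is any non-digit:
print(re.findall(r"\d", "CS50"))  # ['5', '0']
print(re.findall(r"\D", "CS50"))  # ['C', 'S']
```

Because `.` matches any character, including `@`, this pattern is deliberately loose: `a@b@c.edu` would satisfy it too.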
Let me go back to my directory listing and go into register.html. Let me type in just mail as my name, click Register, and you'll see "Please match the requested format." So I have not satisfied the pattern until I actually type in a complete address ending in .edu, and now it's happy. Alternatively, since it's a little tedious to type in these patterns, there are some shorthands for them. I can get rid of this pattern, and if I read the documentation for HTML, there is actually an input of type email, which does all of that pattern matching for you. But the scary thing is that it's actually pretty involved to validate email addresses. I did a very simplified version of username@domain.tld. This is the regular expression that some browsers use to validate email addresses, because even though mine is relatively simple, it turns out there's a crazy amount of syntax that is valid in email addresses. And this is where regular expressions get scary. But for our purposes today, they're a thing that exists. You might find them useful in HTML. You might find them useful in Python. They're incredibly useful when it comes to extracting information from web pages. Maybe you're analytically minded, you like the world of data science, you like to gather and analyze data. Then you can use regular expressions not just to validate data but to find patterns of data in actual websites or documents, and to extract that data so as to perform operations or analysis on it. So it's a wonderfully useful, if complicated, tool. The catch, though, is this. Notice that here I'm still required to type in a valid email address before I can register, and I'm getting even more explicit information this time because I used type equals email. The catch with web pages, though, is that they're not to be trusted, insofar as this HTML came from the server and has been downloaded onto the user's Mac or PC or phone, where they have a copy thereof.
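As a taste of the "extracting information" use case, here is a Python sketch that pulls every email-like string out of a block of text with `re.findall`. The pattern is a simplified assumption for illustration. It is far less permissive than real email syntax allows, and the sample text is made up.

```python
import re

# Hypothetical, simplified pattern: a local part of letters, digits, dots,
# underscores, pluses, or hyphens; an @; then a domain ending in a 2+ letter TLD.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)*\.[A-Za-z]{2,}")

text = """
Questions? Write to staff@example.edu or help@cs.example.org.
Not an address: user@localhost
"""

print(EMAIL.findall(text))  # ['staff@example.edu', 'help@cs.example.org']
```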
I can open up developer tools as I did before by right-clicking or control-clicking and choosing Inspect, or whatever the menu option might be. I can go into the elements of this page, literally the HTML. If I don't want to type in an email address, if I want to type in any old text and see if I can break your site, I can just change it. And now there is no such warning. You'll encounter certain features of HTML, not just today but over the coming weeks as you play with it, and those features are not to be trusted in general when it comes to security. And just like our discussion of SQL and SQL injection attacks, this is one of the attack vectors. Suppose two people are working on a website: one person's implementing the database stuff, one person's implementing the HTML. And the database person's like, "Oh, I don't need to worry about escaping characters because we're using the pattern attribute in the HTML." Bad idea. It's this easy to hack a website and disable features that have been written for the site, by literally deleting them in your own copy. So, we'll see next week how we can defend against this on the server side, but the point for now is just not to trust the user's input at all. All right. How can we be sure our HTML is right? Well, there's a bunch of ways, but one tool that's worth knowing about is this one here, validator.w3.org, a website by the W3C, the group that essentially standardizes this and other languages. I click on their Validate by Direct Input tab, quickly go back into VS Code, and grab the simplest of my examples, hello.html. I can just copy and paste that into their website. Click Check, and they have written code to validate that the HTML I have written is in fact correct: anything I've opened that needs to be closed has been closed, and I don't have any stupid typos or missing brackets or quote marks. This is a wonderfully useful tool just to validate that your code is syntactically correct.
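The validator is far more thorough than this, but the "anything opened has been closed" part of its job can be sketched with Python's standard `html.parser` module and a stack. The `TagBalanceChecker` class and `check` function are hypothetical. They are also stricter than HTML itself, which lets some end tags (such as `</p>` or `</li>`) be omitted. Treat this as an illustration, not a replacement for validator.w3.org.

```python
from html.parser import HTMLParser

# HTML's void elements, which never take a closing tag.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}

class TagBalanceChecker(HTMLParser):
    """Collect unclosed start tags and stray end tags using a stack."""

    def __init__(self) -> None:
        super().__init__()
        self.stack: list[str] = []
        self.problems: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_ELEMENTS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS:
            return  # e.g. <br/>, which HTMLParser reports as a start plus an end
        if tag not in self.stack:
            self.problems.append(f"stray </{tag}>")
            return
        # Anything opened after this tag but not yet closed was never closed.
        while self.stack[-1] != tag:
            self.problems.append(f"unclosed <{self.stack.pop()}>")
        self.stack.pop()

def check(markup: str) -> list[str]:
    checker = TagBalanceChecker()
    checker.feed(markup)
    checker.close()
    # Whatever is still open at the end of the document was never closed.
    return checker.problems + [f"unclosed <{tag}>" for tag in checker.stack]

good = "<!DOCTYPE html><html><head><title>hello</title></head><body>hello, body</body></html>"
bad = "<!DOCTYPE html><html><head><title>hello</title></head><body>hello, body</html>"
print(check(good))  # []
print(check(bad))   # ['unclosed <body>']
```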
Even though it might still look like a mess visually on the screen, this will at least check the underlying HTML for you. All right. So, up until now, everything I've done has been pretty boring. It's black and white. The pages are fairly simplistic. It turns out we can take things the final mile using another language altogether, namely something called CSS, which is the second of our three languages today. This, too, is not a programming language. Curiously, though, they keep adding features that make it more and more like one, but more on that another time. It stands for Cascading Style Sheets. Whereas HTML is all about the skeleton of a website, the structure thereof, CSS is like the skin, the aesthetics thereof. It's the final mile that lets you control the positioning of things more precisely, the colors, the font sizes, all of the aesthetics. It lets you do the finer touches on the website. With CSS, we have slightly different syntax, but frankly, it just boils down to even more key-value pairs. And as with HTML, we'll give you a taste of the basic structure and principles underlying CSS. There are so many possible key-value pairs that we certainly won't do them justice today. It's the kind of thing where you ultimately look it up in a reference, a book, a website, or the like to pick up even more than these techniques. Well, let's do this. In a moment, we're going to see what are called properties. This is CSS's jargon for key-value pairs. Why do we have yet another word? Because a different group of humans in a different room came up with this language. But they're just key-value pairs, now known as properties instead of as attributes, as in HTML itself. There are going to be different ways we can define properties, and this is kind of a laundry list of some of them, which we'll see in context.
But in short, CSS is just going to allow us to slap a whole bunch of key-value pairs on our HTML elements to make them hopefully look prettier or be more precisely controlled aesthetically. So, in my HTML thus far, we've generally had something that looks like this. It turns out, if I want to start using some CSS, I can introduce, as we'll see, a so-called style tag in the head of my page, and inside of that style tag I can put these key-value pairs. Or, as we'll soon see too, if I want to factor them out and put them into a separate file, I can use a link tag. Confusingly, it has nothing to do with hyperlinks or clickable text; it just links in another file, in this case styles.css, the relationship of which shall be that of stylesheet. This is the sort of copy-paste stuff where the only thing you really care about as the developer is the name of the file in which you're putting your styles. All right, let's do this. Let me go back over to VS Code, close out register.html, and open up a new file, this time called home.html. Let me purport to make a simple homepage for someone like John Harvard. I'll copy and paste my boilerplate. I'll change the title here to be, let's say, home. And then inside of the body of this page, let's do the simplest web page possible for someone called John Harvard. Here's a paragraph of text in which John Harvard is going to be the person's name. Here's another paragraph of text: Welcome to my homepage will be in the middle of this page. Then a final paragraph of text, inside of which is, how about, copyright (c) John Harvard down here. So, it's a basic website, just three paragraphs. It's not going to be pretty, but let's make sure I haven't done anything wrong. Let me close my developer tools. Click back. Click home. And there we have it: the simplest of pages for John Harvard. Welcome to my homepage. Copyright John Harvard.
Let's at least start to exercise some control over this. Let's change the font size and the alignment of the text. So, back in VS Code, let's add, for now, not even a style tag, but a style attribute. I'm going to type in style equals quote unquote font-size colon large semicolon, and then text-align colon center semicolon. And I apologize, but semicolons are back in CSS. Then, in my next paragraph's open tag, let's do something similar but different: font-size colon medium, for medium, then text-align colon center semicolon. And then lastly down here, let's do style equals quote unquote font-size colon small, because it's the footer, so who cares, and text-align colon center semicolon. Strictly speaking, after the last key-value pair, otherwise known as a property, you don't need the semicolon, but just for consistency, I'll keep it. All right, let's go back to this page, reload, and watch. All of the text a moment ago was left-aligned and the same size. Now, it's a little subtle, but it's clearly centered, and it's large, medium, and small, respectively. Even if you've never seen CSS before, what rubs you wrong about this design, based on all weeks past? Yeah. >> Yeah. For every line, I've been repeating myself with text-align center, text-align center, text-align center. And if we really want to nitpick, these aren't really paragraphs, right? There are no phrases or full sentences, let alone paragraphs. So, it turns out there's a whole bunch of tags we can use to lay out a page. In fact, I'm going to transition to one that's a little more generic than paragraphs, namely div, which is just going to create a division in the page for me. This doesn't have any functional impact, but semantically it's a little nicer. It means I've got a division here for the header, a division here for the main part, and a division down here for the footer. It's just a different way of thinking about it.
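A style attribute's value is just a list of `property: value` pairs separated by semicolons. So a rough Python sketch can turn it into a dictionary, which makes the "it's all key-value pairs" point literal. The function name `parse_declarations` is made up for illustration, and its splitting is deliberately naive.

```python
def parse_declarations(style: str) -> dict[str, str]:
    """Turn a CSS declaration list, like a style attribute's value, into a dict.

    Naive sketch: it splits on ';' and on the first ':', so it does not handle
    semicolons or colons inside quoted strings or url(...) values.
    """
    properties = {}
    for declaration in style.split(";"):
        if not declaration.strip():
            continue  # skip the empty piece after an optional trailing semicolon
        name, _, value = declaration.partition(":")
        properties[name.strip().lower()] = value.strip()
    return properties

print(parse_declarations("font-size: large; text-align: center;"))
# {'font-size': 'large', 'text-align': 'center'}
print(parse_declarations("font-size: small; text-align: center"))
# {'font-size': 'small', 'text-align': 'center'}
```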
These are just different rectangular swaths of the page. But I like your point that text-align center is kind of stupidly duplicated all of these times. Let me first reload this change, though, because there is one side effect, and we might want back what we lose. When I reload now, using divs instead of paragraphs, well, there goes the nice whitespace in between my text. Divs just give me rectangle after rectangle. And as an aside, let me control-click or right-click and open up developer tools yet again. Notice this other trick with your Elements tab: whatever you hover over at the bottom of your screen will be color-coded at the top of the screen. So I dive into the body by clicking this little triangle, and let me zoom in. At bottom left, I can now see my own HTML down here, much more pretty-printed and colorful. If I click on this one or hover over it, you'll see that the first div, the rectangular region, is highlighted. Now the second, now the third. That's all we mean by divisions of the page. This allows me to see my copy of it in the browser as opposed to in the original file. So that's just another technique for developer tools. All right, but I don't like this duplication, and here now is the C in CSS. Cascading Style Sheets means that you can have one property, or key-value pair, sort of cascade down onto all of the other tags inside of one tag. For instance, in the body tag, I can add my own style attribute here and put text-align center there. Why? Because the three divs are the children of the body tag, to borrow our vernacular from family trees and from trees more generally. So this, too, should work, because text-align center should cascade down now onto all three of those children. And indeed, if I reload the page, nothing visually changes, but it's arguably now better designed. All right, what more could we do here? Well, how about this?
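The way `text-align: center` on the body reaches the three divs can be modeled as a walk down a tree. Each child starts from its parent's passed-down properties and then applies its own. In CSS's own terminology this particular behavior is called inheritance; "the cascade" refers to how conflicting rules are resolved. This Python sketch is a simplified model with a hypothetical `computed_styles` function and a hand-picked set of inherited properties.

```python
from typing import Optional

# Properties this sketch passes from parent to child. In real CSS, text-align,
# font-size, and color are inherited by default; many others, like margin, are not.
INHERITED = {"text-align", "font-size", "color"}

def computed_styles(node: dict, parent_style: Optional[dict] = None) -> list[tuple[str, dict]]:
    """Walk a tiny document tree, returning (tag, style) pairs in document order."""
    parent_style = parent_style or {}
    # Start from whatever the parent passes down, then apply this node's own properties.
    style = {k: v for k, v in parent_style.items() if k in INHERITED}
    style.update(node.get("style", {}))
    results = [(node["tag"], style)]
    for child in node.get("children", []):
        results.extend(computed_styles(child, style))
    return results

page = {
    "tag": "body",
    "style": {"text-align": "center"},
    "children": [
        {"tag": "div", "style": {"font-size": "large"}},
        {"tag": "div", "style": {"font-size": "medium"}},
        {"tag": "div", "style": {"font-size": "small"}},
    ],
}

for tag, style in computed_styles(page):
    print(tag, style)
```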
It would be nice to make clear to servers out there, like search engines, what's going on in the page semantically. The term of art nowadays is the semantic web. That's essentially about putting more hints in your HTML so that servers like search engines know more about what they're looking at. This is pretty generic right now: div, div, div. But presumably the top of the page is among the most important things, because that's effectively the header of the page. Then the middle div is kind of the second most important, because it's the main part of the page, and the footer is the least important. So it turns out there are other tags in HTML besides paragraphs and divs. There are literally tags like header, which allows me to define the header of the page; main, which allows me to define the main part of the page; and even footer, which allows me to define that, too. So now suppose Google and Bing and other search engines are crawling my website once it's public. They know that John Harvard is important because it's in the header, and Welcome to my homepage is important because it's in the main part of the page. They're probably not going to care as much about the copyright, because it's in the footer. So it's just providing more hints to these kinds of services. Moreover, we can do some other things here. This is kind of a hackish way to implement a copyright symbol. HTML also has what are called entities. I can type this magical incantation here: ampersand, hash symbol, 169, semicolon, that is, &#169;. Notice that VS Code recognizes this as an HTML entity. If I go back to this page, notice that my first approach was just parenthesis C parenthesis. If I reload now, having used that HTML entity, which I only know by having looked it up, I get the copyright symbol that actually comes in the font that's being used here. All right, so let's transition now to the approach I mentioned before, whereby you can use a style tag.
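To see that `&#169;` is simply a number naming a character, Python's standard `html` module can decode and encode entities. This is only an illustration of the idea, using sample strings of our own.

```python
import html

# &#169; is a numeric character reference: 169 is the code point of ©.
print(html.unescape("Copyright &#169; John Harvard"))  # Copyright © John Harvard
print(html.unescape("&#xA9; &copy;"))  # © ©  (hexadecimal and named forms)
print(ord("\u00a9"))  # 169

# Going the other way, html.escape turns special characters into entities.
print(html.escape("<p>Tom & Jerry</p>"))  # &lt;p&gt;Tom &amp; Jerry&lt;/p&gt;
```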
And why might we want to do this? Well, looking back at my code here, this is hinting at potentially bad design. There are different arguments for and against this, but right now I'm sort of commingling my data with my presentation thereof. John Harvard, Welcome to my homepage, and copyright such and such are the data I care about. But I'm mixing in the stylization of all of this stuff by putting CSS and HTML in the same place. So to be clear, all of the green stuff, and really everything we've seen thus far, the tags and the attributes, is HTML syntax. Everything between the quotes is now CSS. We've seen this kind of thing before only in the sense that we've used SQL inside of Python code; here we're using CSS inside of HTML code. But the CSS syntax is everything inside of those quote marks. Wouldn't it be nice to factor that out so that I can see it all in one place, and better still, factor it out ultimately into another file? I can do this as follows. In my home.html, let me get rid of all of these style attributes and really whittle the page down to its essence. Now I just have the header, main, and footer tags, inside of which is that content. It's already easier to read, at least for me, the human. Inside of my head tag now, though, let me go up and say style. Inside of this new style tag, let me show you another approach for stylizing the page. Up here is where we can select elements to operate on, using what are called selectors. So if I want to modify the style of my page's body, I can do that by typing body. And then, I'm afraid curly braces are back in CSS, too, and I can put text-align center inside of them. The fact that I've put the word body before those curly braces just means all of these key-value pairs, one in this case, will operate on the body. Meanwhile, down here, I can say the header is going to have font-size colon large.
The main part of the page is going to have font-size colon medium. And then lastly, the footer of the page is going to have font-size colon small. There are definitely more lines now, which isn't the best. But if I go back to my browser and reload, the effect is visually pretty much the same. I've just relocated all of those key-value pairs elsewhere. That's a stepping stone for doing something a little smarter: it lays the foundation for putting this in another file altogether. But first, let me note this, too. The fact that I've associated all of these key-value pairs with specific HTML tags doesn't really make them very reusable. Earlier I alluded to the fact that these properties can be applied via different kinds of selectors: type selectors, class selectors, ID selectors, attribute selectors. Let me give you a little taste of this. What do we mean? Well, suppose that I want to be able to use text-align center generically, without associating it only with the body. Maybe I want to use this for a larger project where I want to center many things on the page. I can define my own keyword, like the word centered, which doesn't exist per se. But if I prefix it with a dot, what I've just created is what's called a CSS class. A class is just a set of key-value pairs, properties, that you can associate with any HTML tags. Meanwhile, if I want this key-value pair to be associated with the notion of large, I can define .large; I can define .medium; and I can define .small down here. The motivation for this is what I can now do in my page. If I want to center the body (oops, let me fix my own typo), I can say please use the class known as centered on this tag. And then on the header, I can say please use the class known as large on this tag. And then please use the class called medium here. And then lastly, use the class called small here.
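How a class rule attaches to a tag can be modeled in a few lines of Python. This hypothetical sketch:

- matches only single type selectors (like `header`) and class selectors (like `.large`);
- treats the `class` attribute as a space-separated list;
- ignores specificity entirely, simply letting later rules override earlier ones.

```python
# Hypothetical stylesheet: (selector, properties) pairs mirroring the lecture's classes.
STYLESHEET = [
    (".centered", {"text-align": "center"}),
    (".large", {"font-size": "large"}),
    (".medium", {"font-size": "medium"}),
    (".small", {"font-size": "small"}),
]

def selector_matches(selector: str, tag: str, classes: set[str]) -> bool:
    """Match one simple selector: a class selector like '.large' or a type selector like 'header'."""
    if selector.startswith("."):
        return selector[1:] in classes
    return selector == tag

def declared_properties(tag: str, class_attr: str = "") -> dict[str, str]:
    """Collect the properties from every rule that matches this element."""
    classes = set(class_attr.split())  # class="large centered" means two classes
    properties: dict[str, str] = {}
    for selector, declarations in STYLESHEET:
        if selector_matches(selector, tag, classes):
            properties.update(declarations)  # later rules override earlier ones here
    return properties

print(declared_properties("body", "centered"))       # {'text-align': 'center'}
print(declared_properties("header", "large"))        # {'font-size': 'large'}
print(declared_properties("div", "large centered"))  # {'text-align': 'center', 'font-size': 'large'}
```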
So this is in the spirit of a lot of the modularization we did in Scratch and in C and Python by making our own functions. Classes aren't functions, but they are a way to encapsulate one or more properties and use or reuse them anywhere you want in a web page. It's not that impressive in this short example, but it lays the foundation for doing much more interesting things soon down the road. In fact, let's take a step in that same direction. Let me highlight everything I've put inside of this style tag and cut it onto my clipboard. I'm going to get rid of the style tag altogether. I'm going to quickly create a new file, home.css, and I'm just going to paste all of that stuff in there. And just to be nitpicky, I'm going to de-indent it so it's all left-aligned. So all I've done is move everything I just wrote into a new file called home.css. I'll close that. Out of sight, out of mind. Now, in the head, instead of a style tag containing all of that clutter, I'm going to say link href equals home.css. Then I'll add this rel attribute, which just means the relationship of that file to this one should be that of a stylesheet. And this tag, too, does not need to be closed. It just is. And now if I go back here and reload, there are still no changes, other than the font tweak from a moment ago. But now it's better design, with that file completely separate. So where are we going with this? Well, just to circle back to something we did earlier, let me open up my terminal window. Recall that earlier we had this file, favorites0.html, which contained all of the data from a couple of weeks back that we solicited via that Google Form. And recall that a bit ago, when we went into favorites0.html, it was just kind of an ugly table structure. But it turns out that in the world of HTML and CSS, there are also what we're going to call frameworks, which is a fancy word for library.
But a framework is sort of a way of doing something by using someone else's library. To do it their way, you just read their documentation and then adopt their functions, in the case of code, or their CSS classes, in the case of this example. One of the most popular frameworks out there nowadays, and among the simplest and best documented, is one called Bootstrap. It's a set of CSS classes and other features that, because it's open source, you can use in your own code. All of the documentation is at this URL here. I read the documentation before class, and I copied really the one line of code that I need to make favorites.html prettier. So, let me go back into VS Code and copy my premade example from earlier. And you'll see that in favorites (whoops) favorites1.html, I have all of the same code, all of those lines of everyone's submissions. But notice I've now added this link tag. It's a little longer than the one I wrote, and it references a third-party website, jsDelivr. That's a CDN, a content delivery network, which is to say a server that just serves up content for other people to use. I copied that from Bootstrap's own documentation. And then I did the following: I added a class attribute to my table tag with a value of table, followed by a space, then table-striped. Why? Well, I read Bootstrap's documentation at that previous URL, and I liked the look of their tables. They lay things out with nice stripes, white and gray and white and gray, and they format everything quite a bit more nicely. So, let me go into this version in my second tab by going back first and now opening up favorites1.html. Same exact data, two lines of change, and voila, now we're talking. This looks much more like a table that you would see on any pretty website, like your Gmail inbox or the like, all by simply changing the CSS and not really the HTML at all.
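Since Bootstrap does the styling, the HTML side only needs a table whose `class` attribute is `table table-striped`. Here is a hedged sketch of generating such markup from data; the Python function `table_html` is hypothetical, and so are the sample rows. It escapes each cell with `html.escape` so user-submitted text cannot inject tags. It also wraps the rows in `thead` and `tbody`, the structure Bootstrap's own table examples use.

```python
import html

def table_html(header: list[str], rows: list[list[str]]) -> str:
    """Render a header and rows as an HTML table carrying Bootstrap's table classes."""
    lines = ['<table class="table table-striped">', "  <thead>", "    <tr>"]
    lines += [f"      <th>{html.escape(cell)}</th>" for cell in header]
    lines += ["    </tr>", "  </thead>", "  <tbody>"]
    for row in rows:
        # Escape every cell so text like "<script>" is shown, not executed.
        cells = "".join(f"<td>{html.escape(cell)}</td>" for cell in row)
        lines.append(f"    <tr>{cells}</tr>")
    lines += ["  </tbody>", "</table>"]
    return "\n".join(lines)

# Hypothetical sample data standing in for the form submissions.
print(table_html(["language", "problem"], [["Python", "Hello, World"], ["C", "Mario"]]))
```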
So, the motivation for introducing those classes a moment ago was so that we can have reusability of code. And better still, we can start to stand on the shoulders of others by using code that other people have written in order to improve the aesthetics of our own websites as well. All right, how about a couple of final flourishes with some style? Let me close out these examples here and go into that same link example from earlier. So, let me reopen link.html, which, recall, had this phishing attack at the time. I'm going to revert this to the safe version and just say visit Harvard at Harvard's actual URL. Suppose I wanted to stylize this link beyond the default. Well, let's see what it looks like by default. If I go back into link.html, this is what it looked like before: blue and underlined by default, per the browser's decision. But I can override that in any number of ways. To keep things simple, I'm going to stay in this same file rather than be pedantic about moving the CSS to another file. If I want to stylize the anchor tag, just as before, I can say a, and then in some curly braces here I can do something like color colon red, to make it crimson-like. Let me go back to my other tab and click reload. And now we have a red link. I can really geek out, too. If you remember your hexadecimal codes from our discussion of images a few weeks back, I can do hash FF0000, which is a lot of red, no green, no blue. And if I go back to my other tab and click reload, it's the same exact thing. You have that much control over even the color codes that you might use. Maybe you don't like the underlining in this particular case. Well, that's fine. I can do something like text-decoration colon none, per the documentation. I can reload, and gone is that underline. Maybe it'd be nice to hover over the word and then see the underline. Well, I can do that, too.
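Each pair of hex digits in a color code is one byte, 00 through FF, giving the red, green, and blue values respectively. Here is a small Python sketch of that conversion. `hex_to_rgb` is a hypothetical helper that also accepts CSS's three-digit shorthand, where `#F00` means `#FF0000`.

```python
def hex_to_rgb(code: str) -> tuple[int, int, int]:
    """Convert a CSS hex color like '#FF0000' into (red, green, blue), each 0 to 255."""
    digits = code.lstrip("#")
    if len(digits) == 3:  # shorthand: each digit is doubled, so #F00 means #FF0000
        digits = "".join(d * 2 for d in digits)
    if len(digits) != 6:
        raise ValueError(f"expected 3 or 6 hex digits, got {code!r}")
    # Each two-digit slice is one base-16 byte.
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))

print(hex_to_rgb("#FF0000"))  # (255, 0, 0): a lot of red, no green, no blue
print(hex_to_rgb("#00FF00"))  # (0, 255, 0)
print(hex_to_rgb("#F00"))     # (255, 0, 0)
```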
It turns out I can have these pseudo-selectors. I say the name of the tag, then a colon and a keyword like hover, which browsers know to recognize. And when I hover over an anchor, what I want to do is change the text decoration to underline temporarily. If I go back to this tab now and reload, it looks the same. But as I move my cursor over it, notice that it's underlining it for a visual effect. Let's see what's going on with my developer tools. If I right-click anywhere and choose Inspect, notice a detail under the Elements tab that I haven't shown you before. Let me go down to my link here and make the right-hand pane a bit bigger. All this time, but ignored until now, has been this part of developer tools, where I can see all of the CSS that applies to the element I have just selected, namely this link. And I see here, in nice pretty-printed fashion, that I'm using color #FF0000 and text-decoration none. Why is this useful? Well, one, if you want to learn from another website how it's doing its thing, you can just look at the CSS. But also, you might want to iterate more quickly and just tinker with things. I can turn the color on and off by hovering over the inspector here and clicking and unclicking. And if I want to play around with, oh, maybe Harvard should be #00FF00, enter, I can make it green instead. So, you can temporarily change the browser's copy of your own HTML or CSS just to tinker and iterate quickly, just like I tinkered with Stanford's own website, or at least my own copy thereof. Lastly, how about in terms of these selectors? These are type selectors, that is, selecting by the name of the tag. If I want to affect one tag specifically, a very common convention is to give an HTML element a unique ID. For instance, I'm going to call this harvard.
And by the honor system, I should not give any other element in this page an ID of harvard. The motivation is that I can now uniquely identify this tag, for instance by changing this selector to #harvard, where the hash is just the convention for specifying that it's not a class now; it's instead an ID. You do not put the hash in the actual value of the id attribute down here, though. And what I can even do down here is something like #harvard:hover to scope that as well. If I now reload, we're back to the red version and the same functionality as before. It's just a more precise way now to target your CSS properties to a very specific element instead. Okay, that was a lot. Any questions on any of this thus far? No? All clear? All right. Well, one last language for the day. And we do mean what we say: that is the extent to which you will learn HTML and CSS formally. Everything else just follows those exact same patterns. It's different classes. It's different attributes. It's different tag names. All of which can be picked up through practice, through osmosis, through references. But that's really it for the fundamentals. And so our last focus today is on an actual programming language that we'll just scratch the surface of, if only because it's so darn omnipresent nowadays. Most every website you use is made from not only HTML and CSS, but if it's in any way interactive, odds are it's also using JavaScript, a programming language that is very commonly used client-side, whereby humans write the code and store it on the server, but then your browser, as before, downloads it to the client, and then it runs on your own Mac, your PC, or your phone. That said, JavaScript is also very popular on the server nowadays. It's not just a browser-based language. What you have most powerfully in JavaScript, though, is the ability to mutate this tree in memory in real time. In other words, think about even your Gmail inbox or your Outlook inbox. Typically, you see email after email after email after email.
Odds are, per today, what HTML tag is creating that UI of row after row after row? Which tag? >> The table tag? >> Like the table tag, probably, right? Table row, table row, table row. Well, actually, this is the way things used to work back in my day. When you visited, not even Gmail, since it didn't exist yet, but your email inbox, you would download from the server a web page containing a table tag with table rows and table data elements, and that was your inbox. If you wanted to see if you got new mail, you just reloaded the whole page, and it would download new contents from the server and show you the new HTML. With JavaScript, which has come onto the scene over the past 20-plus years, you have the ability to download the data once initially, then use code to just grab some more data every 30 seconds, or pretty much anytime an email arrives. Suppose this picture represents not our super simple page with a title of hello and a body of hello, but a whole bunch of table rows for your existing email. The moment you get more email, you can use JavaScript code to add another node to this tree representing another table row tag, and again and again. So in short, with JavaScript, you have the ability to change the tree, otherwise known as the document object model, or DOM for short, dynamically in order to evolve the web page. So let's take a quick tour of what JavaScript has syntactically, and then I'll just demonstrate some of the capabilities thereof without dwelling today on syntax beyond this. So in Scratch, which is looking pretty good now, you had conditionals which looked like this. In JavaScript, it's pretty much the same as C. The curly braces are back, at least for two or more lines. But indentation doesn't matter except for the style thereof, in contrast with Python. If you have an if else, it's going to look exactly the same as in C. If you have if, else if, else, you have the exact same thing as in C.
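To make "add another node to this tree" concrete without a browser, here is a toy model in which plain JavaScript objects stand in for DOM nodes. The names createNode, addChild, and onNewEmail, and the email subjects, are hypothetical; a real page would use the browser's own document.createElement and appendChild instead.

```javascript
// A toy model of a DOM tree using plain objects. This is NOT the browser's
// DOM API; it only illustrates adding nodes to a tree at runtime.
function createNode(tag, text = '') {
  return { tag, text, children: [] };
}

function addChild(parent, child) {
  parent.children.push(child);
  return child;
}

// A tiny "inbox": a table whose rows are emails.
const table = createNode('table');
addChild(table, createNode('tr', 'Welcome to CS50'));

// When new mail "arrives", add another row node to the existing tree
// instead of reloading the whole page.
function onNewEmail(subject) {
  addChild(table, createNode('tr', subject));
}

onNewEmail('Problem Set 8 released');
console.log(table.children.map((row) => row.text));
// [ 'Welcome to CS50', 'Problem Set 8 released' ]
```

The page never reloads in this model; only the tree grows, which is the same idea as the inbox that updates itself.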
This is different from Python, because this was elif in Python. Now we're back, more verbosely, to else if, as in C. Variables in JavaScript? Well, here in Scratch is how you set a variable counter to zero. In JavaScript, there are a few ways to do this, but the most reasonable for now is to let counter equal zero. So, you don't specify the type. This is more of a polite way of asking the browser, please let a variable called counter exist and set it equal to zero by default. Semicolons are back, too. Strictly speaking, they're not always required, because JavaScript can usually infer where a statement ends, but for our purposes, assume that they're always there. How do you change counter by one? Well, you can do it the pedantic way, which is a little verbose. You can do the += trick. Or, nicely back in play, there's ++ in JavaScript, just like in C but not in Python. Loops in JavaScript? Well, in Scratch, if you want to do something three times, you use a repeat block. In JavaScript, it's pretty much the same as in C, except for not mentioning the data type; instead, you use the keyword let here. But otherwise, this is exactly the same as in C. If you want to do something forever, for whatever reason, in JavaScript you can say while true, which is exactly how we did it in C. Meanwhile, if you have a web page like this and you want to add some JavaScript to it, you can do it in a few different ways. You can put a script tag, just like the style tag, in the head of the web page. This can get you into trouble, though, for reasons you might encounter: if you put your JavaScript code up here and you try to use it to modify the web page, but the web page isn't defined until down here, you can get into a race condition, really, where the data does not yet exist.
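Here are the constructs from this quick tour collected into one runnable sketch. The values and the 'meow' string are illustrative only; the point is the syntax, which mirrors C apart from let in place of a type.

```javascript
// Variables: no type, just let.
let counter = 0;
counter = counter + 1; // the verbose way
counter += 1;          // compound assignment
counter++;             // increment, as in C (but not Python)

// Do something three times: C's for loop, with let instead of int.
const lines = [];
for (let i = 0; i < 3; i++) {
  lines.push('meow');
}

// if / else if / else, spelled as in C (Python would say elif).
function compare(x, y) {
  if (x < y) {
    return 'x is less than y';
  } else if (x > y) {
    return 'x is greater than y';
  } else {
    return 'x is equal to y';
  }
}

// A "forever" loop, as in C; break exits it so this sketch terminates.
let ticks = 0;
while (true) {
  ticks++;
  if (ticks === 5) {
    break;
  }
}

console.log(counter);       // 3
console.log(lines.length);  // 3
console.log(compare(1, 2)); // x is less than y
```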
So, instead of putting it there, or even in another file, it's actually pretty common to avoid that problem altogether by putting your script tag at the end of the page, just before the end of the body, to ensure that all of the web page exists already. This is similar in spirit to the def issues we saw in Python or the prototype issues we saw in C. There are bunches of solutions, though, to this here problem. But let's now take some JavaScript for an actual spin and use VS Code to write some of it. In VS Code, let me go ahead and close link.html, open up my terminal temporarily, and let's just improve the very file, hello.html, that I have here in front of me, and actually have it be more interactive and give me sort of a pop-up on the screen when I type in my name. So, let's start as follows. First, let's go ahead and change the title just to hello, for short. And in the body of this page, let's give myself a form. And in this form, let's give myself an input. We'll turn off autocomplete just to avoid distractions. We'll turn on autofocus to save me a click. I'm going to give this HTML element a unique ID of name, a placeholder also of Name just so the human knows what to do, and the type of this field shall be text. In other words, I want to create a program like those of week zero and week one, where I type in my name and see hello, so-and-so. I'm going to give myself a submit button, too, with input type equals submit. I don't really care what the button says, but I do care now, when I go back to my other tab, close my developer tools, and go back into hello.html, that I now have something that looks like this. It looks similar to our search example for cats, but now I'm asking the user for their name, along with the submit button. But what I want to have happen is when I type in David and click submit, I want to see hello, David somewhere on the screen. Well, how can I do this?
Well, in a few different ways, but JavaScript allows me to do things like this. And for upcoming problem sets, you won't necessarily have to write JavaScript like this, so consider this a whirlwind tour, not so much something to ingrain. Here I can add a new attribute to the form tag called onsubmit, which, as the name suggests, means call the following function when this form is submitted. Well, what function do I want to call? I'm going to call it a greet function, and that's it for now. How do I define a greet function? Well, among other places, I could put it inside of the head of my page in a script tag. I can define a function in JavaScript by literally saying function, then the name of the function, and then in parentheses any arguments thereto. I'm not going to have any. And then in curly braces, I can actually define the meat of that function. For instance, I can do this: let name equal the following, document.querySelector, which I'll explain in a moment. document is a global variable that just comes with JavaScript in the browser that allows me to write code involving the whole document, the web page itself. querySelector is a fancy name for a function that lets me select specific elements of the page using CSS selectors. So the very same syntax we saw with tag names and with dots and with hash symbols a moment ago is back in play for JavaScript here. So if I want to create a variable that stores the name that the human typed in, what I can do is pass to querySelector a selector for that element, which is quote-unquote #name, where the hash just means ID. And the reason I'm using name is because the unique identifier I put down here is name. If I changed this to foo, nonsensically, that's fine; I'd just have to change this to #foo up here, too. So I'm in full control over what is called what. But if I want to get the value that the user typed into that box, I now do .value. And we've seen these dots before. In C, they were for accessing structs.
In Python, they were for accessing the contents of objects. So this just means: use the document global variable, use the querySelector function, or method, inside of it, get the element whose unique ID is name, and then go inside of that text box and give me its value. So it's a very long-winded way of saying store the user's input in a variable called name. But what's nice now, even though this is going to be a bit ugly, is that I can then use a built-in JavaScript function called alert. And I can say something like quote-unquote hello, comma, close quote, then plus, which we've seen before in Python, to concatenate with it the value of name. Now, this isn't quite complete, for reasons I'm going to wave my hand at. I also need to add, annoyingly, return false down here, because otherwise, if I click submit, yes, the greet function will get called, but the browser will still try to submit the form to a server, which is going to interrupt my own code. So, long story short, this is a bit of a hackish approach for now to making sure that the only thing that happens when I submit this form is that my function is called. Now, if I didn't screw anything up, I should see after reloading this page a prompt for my name. I'll type it in, and when I click submit, I should see an ugly but functional alert box pop up with dynamically generated text, namely hello, David. I say it's ugly because, by convention, Chrome shows you the full URL or the domain name of the website in question, which is my randomly generated one, which does look stupid. So, we can do better than this. But the point is that I have now written code in JavaScript to listen for the submission of this form and, when that happens, call that there function. And this is generally the paradigm of JavaScript. There exists in the context of websites a whole bunch of events that can happen. And this is a word we haven't used since week zero in Scratch.
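Here is a minimal sketch of the greet function as narrated, with the message built by a separate, hypothetical greeting helper so that part can run outside a browser. The markup in the comment approximates the form described above; its exact attributes are an assumption.

```javascript
// Builds the text the alert shows (hypothetical helper).
function greeting(name) {
  return 'hello, ' + name;
}

// greet as narrated: read the text box whose id is "name", then alert.
// It only works in a browser, where document and alert exist, with
// markup along the lines of (assumed):
//   <form onsubmit="greet(); return false;">
//     <input autocomplete="off" autofocus id="name" placeholder="Name" type="text">
//     <input type="submit">
//   </form>
// The `return false` keeps the browser from also submitting the form.
function greet() {
  const name = document.querySelector('#name').value;
  alert(greeting(name));
}

console.log(greeting('David')); // hello, David
```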
Recall that in Scratch you have events like when green flag clicked, and when the green flag is clicked, you can do something in response. Same thing in the world of web programming. Here are just some of the events that can happen in a web page. The user can change something, click on something, drag something, release a key (keyup), press a mouse button (mousedown), or do other things. What I'm listening for is the submission of a form, which is cool, because in JavaScript you can essentially write code that listens for any number of these events and then does something when one happens. Consider, after all, Gmail: if you click the little refresh icon within Gmail itself to get new mail, it runs some JavaScript code that, it turns out, talks to Google's servers, gets more email, and updates your page. If you click and drag on Google Maps to see what's higher up on the map, well, what's happening? Some JavaScript code is listening for your mouse going down and dragging, so as to go fetch more tiles, more rectangular pictures of the map, wherever you're trying to drag. So, anything that's interactive in websites nowadays like that is using JavaScript by just listening for things that you or someone else might actually do. Well, let me go ahead and start opening some pre-made examples just to give you a sense of the other syntax that is in use today with JavaScript. I'm going to go ahead and open up a version of this hello program called hello2.html, which is different in that I'm practicing what I preached earlier by putting the script tag at the bottom of the page, just to ensure that the form and everything inside of it already exists for sure by the time this code executes. Moreover, what I'm getting out of is the business of using the onsubmit attribute. So, just as I tried to get my CSS out of my HTML and put it elsewhere, similarly, I'm trying to get my JavaScript code, like the greet function, out of the HTML and put it down here. Now, why is this useful?
This is a big mouthful, but it just follows a general pattern, as follows. document.querySelector, quote-unquote form, is just getting a reference to the actual form element in the page. So if you imagine in your mind's eye that this is drawn out as a tree in the computer's memory, this is just getting me a pointer to the form node in that tree. We haven't seen this before, but it kind of does what it says: addEventListener is a function, or method, that you can call on any element that just tells it to listen subsequently for this event and, when that event is heard, submit in this case, call the following anonymous function, otherwise known as a lambda function. But long story short, this syntax just means when submit happens on that element, execute the code between these curly braces. What happens? alert, quote-unquote hello, comma, plus document.querySelector('#name').value. I didn't bother with the variable this time. This does exactly the same thing, but it's a purely JavaScript solution without using the onsubmit attribute. And we show you this only because, especially for final projects, you might want to use something like addEventListener to make, like, maybe a drop-down menu or some interactive clickable thing in your website that just listens for one of these events to happen before actually executing some code. Notice, too, that I've been conventionally using single quotes in JavaScript, because that's just a thing in the JavaScript community: to generally prefer single quotes over double quotes. Why? Well, it means people in JavaScript are hitting the Shift key much less than the rest of the world, who need it to type double quotes. It's just a convention. So long as you're consistent, either is fine, conditional on not having actual apostrophes in your text and such. Let me show you one other convention.
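The addEventListener pattern isn't tied to forms. Browsers and Node.js (version 15 or later) both provide EventTarget and Event, so the sketch below uses a bare EventTarget as a stand-in for the form and runs without a page. It also calls event.preventDefault(), a common alternative to return false for stopping the browser's default submission; that choice is an assumption here, not necessarily what hello2.html does. In a real page, the target would be document.querySelector('form').

```javascript
// A plain EventTarget stands in for the page's form element.
const form = new EventTarget();
const log = [];

// Listen for 'submit' and run an anonymous function when it is heard.
form.addEventListener('submit', function (event) {
  log.push('hello, David');
  // Tell the browser not to carry out the event's default action
  // (submitting the form to a server). Only cancelable events honor this.
  event.preventDefault();
});

// Simulate the user submitting the form. dispatchEvent returns false
// when a listener called preventDefault on a cancelable event.
const proceeded = form.dispatchEvent(new Event('submit', { cancelable: true }));

console.log(log);       // [ 'hello, David' ]
console.log(proceeded); // false
```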
Instead of putting my code at the bottom of my page, just before the body ends, it is also conventional, as in hello3.html, to still put the script tag at the top of the page but to additionally have this magical line whereby you add an event listener, before you do anything else, that listens for this crazy, weirdly named event called DOMContentLoaded. But now that you've heard DOM briefly: DOM, document object model, just means the tree in memory. This is just the fancy way of saying when that tree is loaded, go ahead and do the following. And this ensures that when a browser reads all of this code top to bottom, left to right, this code won't actually be executed until the whole DOM is loaded into the computer's memory, that is, until that whole tree is built. So that's all that's being referred to there. The rest of the code is actually exactly the same. What more can we do? Well, just so you've seen it, I can delete all of that code and move it to a file called, like, hello.js. And in the fourth version of this example, I'm back to just HTML, because I can put all of the fancy complexity in the file my script tag references, factoring that code out into hello4.js, but the code is otherwise, I claim, unchanged. All right, this is a lot. I know it's quick, but do the general principles make sense? Like just listening for events and running some code in response? That's really all we're talking about, à la week zero with Scratch. All right. Well, let me let things escalate just a little bit. And this time I'll open the demo first. Let me go ahead and open up hello5.html, which I wrote in advance, which, okay, is definitely starting to look like a mouthful, but in a moment it'll make a bit more sense. Let me go into my other tab here, click back, and go into src8, which has all of my pre-made examples. And I said we're in hello5 now.
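One way to package the DOMContentLoaded idea so it works no matter where the script tag ends up is a small helper like the hypothetical onReady below (not a built-in). It checks document.readyState, because if the event has already fired, a newly added listener would never run.

```javascript
// Run setup code once the DOM tree has been built. onReady is a
// hypothetical helper, not part of any browser API.
function onReady(callback) {
  if (typeof document === 'undefined') {
    // No DOM at all (e.g., running under Node.js): nothing to wait for.
    callback();
  } else if (document.readyState === 'loading') {
    // The HTML is still being parsed, so wait until the tree exists.
    document.addEventListener('DOMContentLoaded', callback);
  } else {
    // DOMContentLoaded has already fired; run the code right away.
    callback();
  }
}

let ready = false;
onReady(function () {
  ready = true;
});
console.log(ready); // true when run under Node.js
```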
And in hello5, there's no submit button, because watch this fanciness when I type something like David, as a full-fledged word there. Notice it's just happening inside of the web page. Moreover, if you poke around, let me right-click on the page and choose Inspect to open my developer tools. Let me expand the body down here. And actually, let me reload the page. So, notice by default, this is what my web page looks like. It's just got an empty paragraph tag, for some reason. But watch what happens at the bottom of the screen, and I'll zoom in a bit more, when I start typing my name, like D. And then let me expand this triangle. You see it filling in, A, V, I, D. When I say that JavaScript can mutate the DOM, the actual tree in memory, that's what you're seeing. You're seeing the pretty-printed, color-coded HTML version of that tree in memory. And how is it working? Well, if we go back to the code here, let me wave my hands at this first line. This just means don't do this until the whole DOM is loaded. Let's look at this line, which means give me a variable called input, and set that equal to, okay, the input tag on the page, the text box. And then do what? Well, take that input and add an event listener that's forever listening for keyup, like my finger going up off the keyboard, and when that happens, call the following function, which has no name, but that just means run these lines inside of the curly braces. Well, what happens inside of those curly braces? Here's a variable called name, and this is just pointing at the paragraph tag. Apparently, I'm asking this question: if input.value. So this is like saying if input.value does not equal quote-unquote, the empty string, just implicitly. If so, go ahead and set the innerHTML of that name variable equal to hello, comma, and then input.value. Now, this is crazy syntax, and I'm showing it just because you'll see it in documentation online. This is similar in spirit to Python's f-strings.
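The logic in hello5.html reduces to a choice between two strings, one of them built with a template literal. This sketch pulls that choice into a hypothetical messageFor function and assumes a page with one input and one empty p element. It sets textContent rather than the lecture's innerHTML, which keeps any angle brackets a user types from being interpreted as HTML.

```javascript
// Template literals (backticks) interpolate ${...} much like Python's
// f-strings. An empty string is falsy, so `if (value)` means
// "if the text box isn't empty".
function messageFor(value) {
  if (value) {
    return `hello, ${value}`;
  }
  return 'hello, whoever you are';
}

// Browser wiring, guarded so this sketch also runs under Node.js.
if (typeof document !== 'undefined') {
  document.addEventListener('DOMContentLoaded', function () {
    const input = document.querySelector('input');
    const paragraph = document.querySelector('p');
    input.addEventListener('keyup', function () {
      paragraph.textContent = messageFor(input.value);
    });
  });
}

console.log(messageFor('David')); // hello, David
console.log(messageFor(''));      // hello, whoever you are
```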
It's ugly syntax, with dollar signs and curly braces and, worse yet, backticks. However, this is a manifestation of the JavaScript community, presumably, deciding that if you want the language to evolve, you have to make sure you're backwards compatible with old versions of the language. So, they chose characters and syntax that probably did not appear already in the wild. That's why sometimes things look uglier, I would surmise, than they otherwise would. But long story short, this just means if there's input there, go ahead and say hello, input. Otherwise, it says by default hello, whoever you are. And in fact, if I go back here and delete my name, watch what happens. It goes back to that default. So, here is just an example of listening for keystrokes going up and down and making sure that the page responds accordingly. How about something else? Let me go back into my directory listing. Let me open up background.html, which I wrote in advance. It's super simple, but this is the first example of an interactive website that has three buttons, labeled R, G, and B. As you might imagine, clicking on R does that, G does this, B does that. Well, how is this working? This is the first example now where you can use JavaScript code to alter CSS dynamically. So, let me reload the page so it's back to white. Let me open developer tools and watch what's happening now on the body tag specifically. Initially, there's no stylization on the body other than the browser's default margins and whatnot over here. But watch what happens at bottom right when I click on the R button. You see that all of a sudden background-color: red was dynamically added. Now it's green, now it's blue. And notice the HTML at bottom left is changing too. So somehow I am listening for clicks and then changing CSS in response. So if I go back to VS Code, let's close hello5.html and open up src8's version of background.html. In here, it's a bit of a mouthful, but the HTML is simple. Here are three buttons.
And because I wanted them to be uniquely identifiable, I gave them IDs of red, green, and blue, respectively. And then this code is a bit of copy-paste, and frankly, I could probably avoid that if I were more elegant. But just to be pedantic, here's what's happening. Here's a variable called body that's just getting the body element, the node in the tree, at that moment in time. And then these three lines of code, their purpose in life is to handle the clicks on red. How? Well, we're telling the document to select the element whose ID is red, listen for the click event, and whenever that happens, do this: body, which is the same variable as before, then .style, which we haven't seen before, but any element can have a style property associated with it in JavaScript, then .backgroundColor equals quote-unquote red. And the other blocks of code are the exact same thing for green and for blue. The whole point here is we're now listening for clicks on buttons and changing not the contents of the button but rather the style of the whole page. As an aside, this is a curiosity. This is what's known as camel case, whereby, just as a camel has a hump in the middle, this word has a hump in the middle, a capital C all of a sudden, to separate the two words. In CSS, recall, it was a moment ago background-color. Anyone want to guess why this is not how you write it in JavaScript? Anything with a hyphen in CSS is changed to camel case in JavaScript. >> It's not related to comments. It's simpler than that. Yeah? >> Yeah, right. The left hand wasn't talking to the right hand, and people realized, oh, damn it, this now means background minus color, which is not a thing, because minus is indeed, just like in C and in Python, a mathematical operator. So, the world decided to reconcile this problem by dropping the hyphen and capitalizing the letter that follows it. Well, a little CSS trivia. All right, what else can we do? How about a couple of final examples here?
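The naming rule in this answer is mechanical: drop the hyphen and capitalize the letter after it. The hypothetical helper below applies that rule. Browsers also accept the hyphenated name through body.style.setProperty('background-color', 'red'), which sidesteps the minus-sign problem entirely.

```javascript
// Why background-color becomes backgroundColor in JavaScript:
// `body.style.background-color` would parse as a subtraction,
// (body.style.background) - color. This hypothetical helper applies the
// hyphen-to-camel-case rule to a CSS property name.
function cssToCamelCase(property) {
  return property.replace(/-([a-z])/g, (_, letter) => letter.toUpperCase());
}

console.log(cssToCamelCase('background-color')); // backgroundColor
console.log(cssToCamelCase('text-decoration'));  // textDecoration
```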
So, what more can we do with CSS? Back in my day, too, we had a tag called the blink tag, which is among the few tags in the world of HTML that have actually been deprecated, that is, removed from the language. Like, no one removes things from languages generally, but the blink tag was so hideous. It was followed only by the marquee tag, whereby my own homepage as a freshman had, like, "Welcome to my homepage" just moving across the screen like this, from left to right, for no good reason, like an ugly marquee, like on digital signage nowadays. But we can bring it back as follows. So if I close out my developer tools, go back into my src8 directory, and open up blink.html, this is what the blink tag used to do back in the day. Now, this version is implemented instead in JavaScript code, as follows. I have a function here called blink, which I'm apparently calling every once in a while. How is that happening? Well, let's scroll down. Here's my HTML, super simple; it literally just says hello, world. But notice this. There's another global variable in JavaScript we haven't seen, called window, that refers to, like, the general window, not necessarily the contents of the page, on which you can call a method called setInterval. And you can tell that method, setInterval, to call a specific function every given number of milliseconds. So if I want to call blink every 500 milliseconds, that's the line of code that I use. If I scroll up now to this function, let's see how blink is implemented, both now and perhaps back in the day. Well, body is a variable here that's just pointing to the body node in the DOM. And this is a big mouthful, but if that body's style's visibility property in CSS is quote-unquote hidden, then change that body's style's visibility property to visible. Otherwise, change it to hidden instead. Here, too, I don't understand why the left hand and the right hand weren't talking to one another.
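Here is a sketch of blink.html's logic as just described, with the toggle pulled into a hypothetical nextVisibility function so it can be checked without a browser. The setInterval call is guarded so that it only runs where window exists.

```javascript
// In CSS, the opposite of 'visible' is 'hidden', not 'invisible'.
function nextVisibility(current) {
  return current === 'hidden' ? 'visible' : 'hidden';
}

function blink() {
  const body = document.querySelector('body');
  body.style.visibility = nextVisibility(body.style.visibility);
}

// Only schedule it in a browser: call blink every 500 milliseconds.
if (typeof window !== 'undefined') {
  window.setInterval(blink, 500);
}

console.log(nextVisibility('hidden'));  // visible
console.log(nextVisibility('visible')); // hidden
```

Before the first tick, body.style.visibility is an empty string (nothing has been set inline yet), so the first call hides the page and the next one shows it again.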
You would think that the opposite of visible would be invisible, but in CSS, the opposite of visible is hidden. You just have to memorize stupid things like that. But what's this really doing? It's just changing the CSS from hidden to visible, hidden to visible, every 500 milliseconds. So in fact, here's what you're seeing in the blink if I inspect this page, too. And now notice, it's kind of fun just to watch it. You can see the HTML at bottom left and the CSS at bottom right just automatically changing, because I'm doing that every 500 milliseconds. All right. How about one other? Well, autocomplete. We saw a step toward this with my hello, David example a moment ago. It's super common, though, in Google and, like, every website now to automatically try to finish your thought. How is that happening? Well, that's not just HTML and CSS. That is also some JavaScript thrown into the mix. So, for instance, let me go into my terminal and open up src8's example called autocomplete.html. And here I am going to borrow a file called large.js, which is just a massive file; I'll open that too, if you're curious. large.js is just a huge JavaScript array (arrays are back) containing all of the words from problem set five, the spell-checking problem set, where you were given a file of over 100,000 words, back in C. Now we've converted that to JavaScript by using a global variable like this in the code here. What's happening? Well, apparently there's going to be a text box at the top of the page that we see. Then there's an empty unordered list, so an empty bulleted list. And then there's this code down here. I'm apparently creating a variable called input that's referencing that text box. I'm then listening for keyup, just as we've done before. And then I'm doing this: I'm setting a variable called html equal to quote-unquote nothing, so an empty string. And then I'm checking, does the input text box have any value, implicitly? If so, what am I doing?
This is kind of cool. It's a bit of Python and C together, syntactically: for each word in the words array. JavaScript uses the keyword of here instead of in, as in Python (JavaScript's own for...in loops over keys, not values), but so be it. What I'm doing now in JavaScript is saying: if the current word in that big file of 100,000 words starts with whatever the user typed in, go ahead and add to that html string, using +=, which is just concatenation (we've seen plus before), the following: an li tag, inside of which is that specific word. And so in effect, what you're seeing now is what almost every website nowadays does. They're not manually writing HTML like we've been doing for much of today. They're writing code that dynamically generates HTML, because the programmers understand what HTML is. They understand that unordered lists have li children. And so, using this string that I've highlighted, they're creating li element after li element for the purpose of changing the innerHTML of the ul element to be the value of that variable. And this is a very long way of answering: how is autocomplete implemented in general? Well, just like this. If I search for cats by typing in C, there's every word in that 100,000-word dictionary that starts with C. Then A, T, S, and there's every word that starts with C-A-T-S. Meanwhile, watch what happens underneath the hood. If I open up my Inspect tab again and go to my body, inside of this is the empty ul. But watch: as soon as I start typing something like C, now I can expand the triangle, because there is an li element that's been created for every one of the words that match. As I type A-T-S, now I've got just four of them. And there is cats, there is catskill, and so forth. So anytime you go to google.com, like we did earlier, and start searching for cats, where are all of those search results coming from?
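Here is a sketch of the autocomplete loop as described: for...of over the words, startsWith to test the prefix, and += to build up li tags. The five-word array is a stand-in for large.js, and suggestionsHtml is a hypothetical name; the browser wiring is guarded so the sketch also runs under Node.js.

```javascript
// A tiny stand-in for large.js's 100,000+ words.
const words = ['cat', 'catalog', 'cats', 'catsup', 'dog'];

// Build one <li> per word that starts with what the user typed.
function suggestionsHtml(list, typed) {
  let html = '';
  if (typed) {
    for (const word of list) {
      if (word.startsWith(typed)) {
        html += `<li>${word}</li>`;
      }
    }
  }
  return html;
}

// Browser wiring (assumes one <input> and one empty <ul> on the page).
if (typeof document !== 'undefined') {
  const input = document.querySelector('input');
  input.addEventListener('keyup', function () {
    document.querySelector('ul').innerHTML = suggestionsHtml(words, input.value);
  });
}

console.log(suggestionsHtml(words, 'cats')); // <li>cats</li><li>catsup</li>
```

Scanning every word on every keystroke is fine for a demo; a site with far more data would more likely ask a server for matches.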
Someone wrote JavaScript that's listening for keyup or the like and then dynamically populating an unordered list, or in this case a much prettier list, of the matching results. Now for the final examples we thought we'd leave you with. Again, the whole purpose of introducing JavaScript is to give you a taste of its syntax and its relative familiarity, but also of the power with which you can leverage it to make websites so much more interactive. And in fact, with Bootstrap, you don't just get CSS you can use; you get a whole set of JavaScript functionality, so you can have drop-down menus and the like. For instance, among the things you might use for an upcoming problem set and perhaps your final project is something that looks a little like this. In bootstrap.html, here's a whole bunch of code that I literally copied and pasted from Bootstrap's documentation. It's just boilerplate code for a corporate website that has features, pricing, and disabled menu options as well, just for the sake of discussion. And then here, if I go back into this example, you'll see a fairly simple website that looks like this: a so-called navbar with all of the main menu options of a corporate website. And notice, if you start to resize the window, which I'll do here, and put it into sort of mobile mode because it's so narrow now, then, thanks to JavaScript, it's listening for clicks on this hamburger menu and revealing the menu options that way. This is quite like how CS50's own website works, and so many other websites out there. But the last one we thought we'd show: you're so in the habit of using Google Maps or Uber Eats or any number of apps that need to know your location. That, too, is exposed through JavaScript quite simply.
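In browsers, that location comes from navigator.geolocation.getCurrentPosition, which takes a callback that runs once a position is available, and only after the user allows it. The sketch below assumes a hypothetical formatPosition helper and a made-up test position. The lecture's example writes the coordinates to the document; this sketch sets the body's text instead.

```javascript
// position.coords.latitude and .longitude are the browser's fields;
// formatPosition itself is a hypothetical helper.
function formatPosition(position) {
  return `${position.coords.latitude}, ${position.coords.longitude}`;
}

// Guarded: Node.js has no navigator.geolocation, so this is skipped there.
if (typeof navigator !== 'undefined' && navigator.geolocation) {
  navigator.geolocation.getCurrentPosition(function (position) {
    document.body.textContent = formatPosition(position);
  });
}

// A made-up position object, shaped like the browser's, for a quick check.
console.log(formatPosition({ coords: { latitude: 42.37, longitude: -71.12 } }));
// 42.37, -71.12
```

A "latitude, longitude" string like this one can be pasted into a map search box.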
Let me go ahead and, in geolocation.html, open up the following code, which is super simple even though it has some new functions. There exists another global variable in JavaScript, in browsers, called navigator, which has a property, an object called geolocation, which has a function called getCurrentPosition, which takes an argument that's just an anonymous function. That means: call this code when you're ready to report the coordinates, because it might take a while to figure out your GPS coordinates. And once it does, this simple example is just going to write to the document, that is, the rectangular page, the position's latitude that comes back and the position's longitude that comes back. So to see this in action, let me go ahead and open up that second tab and go back into geolocation.html. Notice, for privacy's sake, it's asking me to approve this. So I'm going to say Allow this time. There, apparently, are my laptop's GPS coordinates. And if I go to Google Maps, I can actually paste these in here. Enter. And it looks like, if we zoom in, in, in... Okay, I'm not technically outside, so it's only accurate to a degree, but it's probably mapping to one of the Wi-Fi access points on that corner of the building. So, we're pretty darn close, pretty much close enough to get me my food or my ride here. And a final note, now that you've seen a little bit of JavaScript: let me go ahead and open up just 60 final seconds of how much effort it took us to put not only this lecture together, but particularly that example of the teaching fellows passing packets, which, like everything here, we like to think is very finely polished. But here's a little bit of behind the scenes in these final 60 seconds together, if we could dim the lights before we adjourn. >> Off you go. Offering. Okay, Josh. Nice. Helen. Oh, Bentimony. No. Oh, wait. That was amazing. Josh, Sophie. Amazing. That was perfect. >> I think I over to you all. >> Oh, nice guy.
That was amazing. Thank you all. >> So good. >> All right, that's it for CS50. We'll see you next time. All right, this is CS50, and this is already week nine. I dare say this week is the most representative of what you'll be doing after the class if you so choose to program in the future and tackle some project that's new to you. In fact, the closest to this week was perhaps week six, wherein we didn't really introduce all that many new concepts but really translated them from C into Python. And so this week in particular, the goal is to synthesize the past ten weeks of class, drawing upon a lot of the building blocks that are hopefully now, metaphorically, in your toolbox, and to give you an opportunity to apply those ideas to new problems, in particular web programming. Every day you and I are using the web in some form. Every day you and I are using mobile apps in some form. And we said last week that the languages underlying a lot of those applications are HTML and CSS for the layout and aesthetics, and then JavaScript for a lot of the client-side interactivity that you might experience nowadays. Well, today we come full circle and bring back a server-side component, whereby we'll again write some Python, we'll again write some SQL, and use it to make our own full-fledged web applications and, in turn, if you so choose, mobile applications for your final project as well. Up until now, when we did anything with the web, you ran the command http-server, which literally did just that. It spawned a so-called HTTP server, that is, a web server whose purpose in life is just to serve up content from your current folder: any files therein, any folders therein. And so all of the URLs generally followed a certain format. If your URL were example.com/, that slash really just denotes the root of the web server, and there, by default, you would typically see a directory index.
We'll see today that that goes away, because generally when you visit something.com/ you want to see the actual website, not the contents of everything on the server. So we'll see how to address that. But the URLs up until now have been of a form like /file.html, literally referencing a file in that folder, or /folder/, which just means whatever is inside of that folder, or /folder/file.html, and so on. You can nest these things as deeply as you want. And recall that, more generally, we said that you're referring to some kind of path on the server, where the path is a sequence of folders ending, perhaps, in a file name. So today we're going to generalize that, at least in terms of nomenclature, and start talking more about routes, because in web programming we are going to exercise a lot more control over what is in the URL. Back in the day, and as recently as last week, the path referred literally to a file on the server. However, we'll see in code that we can instead parse, that is, analyze, whatever comes after the domain name in a URL and use it as generic input to the server to figure out what kind of output to produce. We're going to keep the same convention, though: if you want to pass in specific parameters, key-value pairs, we'll put a question mark after our so-called route, then key=value, and if there are more pairs, we'll just separate them with ampersands. To do all of this, recall the insides of those virtual envelopes. If we searched for cats on google.com, what was really being sent to the server was a request for /search, which notice is not search.html. There's no such file or folder per se. This is just the name of a program really running on Google's servers, and that's the kind of route that we ourselves start programming today.
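To make the split between a route and its parameters concrete, here is a small sketch using only Python's standard library (urllib.parse), not Flask. It separates a URL into its path, which is what we're calling the route, and its key-value parameters. The hl=en pair is just an illustrative second parameter.

```python
from urllib.parse import parse_qs, urlsplit


def split_route(url: str) -> tuple[str, dict[str, list[str]]]:
    """Separate a URL into its route (the path) and its query parameters."""
    parts = urlsplit(url)
    # parse_qs maps each key to a list of values, since a key may appear
    # more than once, as in ?tag=a&tag=b
    return parts.path, parse_qs(parts.query)


route, params = split_route("https://www.google.com/search?q=cats&hl=en")
print(route)   # /search
print(params)  # {'q': ['cats'], 'hl': ['en']}
```

A framework does this kind of parsing for every request, which is exactly the commodity work we'd rather not rewrite ourselves.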
The ?q=cats part just meant that the query parameter, the input from the web form, was going to contain, in this particular example, the word cats. So how are we going to do all this? We could implement our own web server in C, but it would be a nightmare to use a language as low-level as C to deal with something as high-level as writing code for the web. We're instead going to use Python for the most part, if only because it's much higher-level. But even then, if we wanted to do this from scratch, we would have to write a lot of Python code to analyze the insides of these envelopes, figure out what inputs are being passed to the server, and then figure out how to access them in Python code. It's a lot of work just to get a web application up and working. And so what the world generally does is not reinvent the wheel by writing its own web server. Rather, people use an off-the-shelf, fairly generic web server, or application server as it might be called. We, for instance, are going to use something called Flask. Flask is a framework, as the world would say, or more specifically a microframework, which just means it's a library of code that other people wrote to make it easier for us to implement web applications. They took the time to figure out how to handle GET requests and POST requests on a server and how to extract key-value pairs from URLs, the sort of commodity stuff that literally every web application on the internet has to do anyway, so we don't have to retrace those steps ourselves. What this framework allows us to do is implement only the parts of the problem that we care about. And to be clear, a framework, much like Bootstrap, is not only a library that someone else has written for you, but also a set of conventions that you follow in order to use that library in the recommended way.
So it's a more generic term that includes a library and a set of conventions. And how do you know how to use either? You just read the documentation, or take a class like this one, in which we're about to give you an introduction to some of this. So instead of running http-server today to start a web server that just serves up static content, the files and folders in our account, we're instead going to run, moving forward, the command flask run. This is going to look for code that we've written in our current directory, and if it is in accordance with the conventions to which I'm alluding by using this so-called framework, it's going to start our web application on some TCP port, much like the port 8080 we discussed last week. To do this, all we need in our current folder is, minimally, a file called app.py by default. The name hints at an application written in the language called Python, and what code we put in there we'll soon see. And then, ideally, we would have another text file called requirements.txt by convention, inside of which is just the name of each library that we want this web application to include, one per line. In other words, if I go over here to VS Code, if I don't have such a file, that's fine, but I want to use a framework like Flask. Recall our pip command for installing Python packages. I could just type pip install flask, Enter, and that would go ahead and install the Flask framework, or library, for me, just like we did a few weeks ago when installing the silly little cowsay library. I've already done that in advance, and better still, I've come with both of these files today, app.py and requirements.txt. In fact, if I go ahead and create one just for fun here, all you need do in a requirements.txt file is literally put the name of the library that you want to include, and then run pip in a slightly different way, namely pip install -r requirements.txt, to install that library and any other libraries listed in that file.
So let me wave my hands at requirements.txt moving forward. It just specifies which libraries you want to use with this web application, so you don't have to remember or memorize them and type them all out manually. All right. So what's going to go inside of app.py? Well, the minimal amount of code that we can write to make our own web application that does something like print out hello, world to my browser could look like this. There's a bit of new syntax here, but not all that much today. The very first line just says from flask import Flask, which is a weird way of saying give me access to the Flask library. Capitalization matters: the package that we're using is called flask, lowercase, but we want access to a special function in there called Flask, capital F. So this is sort of a copy-paste line. The next one's a little weird-looking, but it essentially says give me a variable called app and turn this file into a Flask application. We haven't seen this in a few weeks, but there was that weird if conditional that we put at the bottom of some of our Python code a few weeks back, which mentioned __name__: if __name__ == "__main__". So we've seen an allusion to __name__. For our purposes, __name__ just refers to the name of this file here. No matter what I call it, you can access the current file's name by way of this special global variable. So this line, app = Flask(__name__), collectively just means turn this file into a Flask application and store the result in a variable called app. Now I can do stuff with Flask. And what am I going to do? Well, down here, let me first point out some familiar syntax. I'm defining a function that I called index by convention, though I could have called it anything I want, whose sole purpose in life is to return "hello, world", the super simple output this web app is going to display.
But, and this is the new syntax, I'm using here what's generally called a Python decorator, which is a type of function that essentially affects the behavior of the function right after it. By writing @app.route("/"), I'm telling the Flask framework to associate this index function with this route, the single forward slash. That's how we're going to take over the default behavior of the slash portion of the URL: by telling it to return whatever this function returns. Let's see this in action now. Let me go over here to VS Code, and within VS Code, I'm going to whip up exactly that application in a file called app.py. Just to keep this separate from some subsequent examples, I'm going to first create a directory, or folder, called hello, and I'm going to go into that hello folder. I'm going to recreate that same requirements file just for good measure, to tell the world that I want to use the Flask library here. And then I'm additionally going to create app.py. I'll type this fairly quickly, but I'm just reciting what we saw a moment ago: from the flask package, import the Flask function, lowercase f and capital F, respectively. Then give me a variable called app and set it equal to that function call, passing in the name of this file, whatever it actually is. And then lastly, let's write @app.route("/"), which says, hey, Python, whatever the next function is, associate it with this slash route. So I'm going to define that function. I could call it anything I want, foo or bar or baz, but insofar as slash represents the index of the website, the default page, I'm going to call it, by convention, index and then return, for now, "hello, world". And that's it. So whereas last week, when I was writing code in HTML files, I was making web pages, now I've created what we'll call a web application.
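Here's a toy sketch of the decorator idea in plain Python. It is not how Flask is implemented internally, and route, routes, and handle are hypothetical names, but it shows how a decorator can record "call this function for this path" while leaving the decorated function itself unchanged.

```python
from typing import Callable

# A toy routing table: path -> function that produces the response body
routes: dict[str, Callable[[], str]] = {}


def route(path: str):
    """Return a decorator that registers the decorated function for path."""
    def decorator(func: Callable[[], str]) -> Callable[[], str]:
        routes[path] = func
        return func  # the function itself is returned unchanged
    return decorator


@route("/")
def index() -> str:
    return "hello, world"


def handle(path: str) -> tuple[int, str]:
    """Look up the handler for a path; unregistered paths get a 404."""
    if path in routes:
        return 200, routes[path]()
    return 404, "Not Found"


print(handle("/"))       # (200, 'hello, world')
print(handle("/greet"))  # (404, 'Not Found')
```

The @route("/") line runs once, when index is defined, which is why merely defining the function is enough to make the route exist.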
And it's an application in the sense that there's actually some logic going on there. There are some functions; there could be some conditionals; there's clearly a variable; there could be loops, all of the sort of stuff we've seen in Scratch, C, and Python, now back in this Python file. So how do we run this? Let me go back into my terminal window here, and I'll clear it just for good measure. I'm going to run flask run, Enter. I see some cryptic-looking output, but there's that familiar pop-up with the green button that wants to open up this application. Whereas http-server uses port 8080 by default, Flask uses port 5000 by default. And here we have it. I've just opened up my second tab, where we spent a lot of time last week. This is the server I'm running, not on port 8080 but on port 5000 today, and there are the contents of what was spit out by my very first application. Now, even though the browser is rendering this like it is a web page, notice this. If I right-click or Control-click anywhere on the screen and go to View Page Source, you'll see that there's no actual HTML on this page. It's literally a single line of text: hello, world. If I close that, right-click or Control-click again, and go to Inspect, like we did last week, to open up developer tools, you'll see that the browser has actually filled in some blanks for me, rendering, as it should, the minimal possible web page. But the content I actually sent to the web browser is only, literally, hello, world. So how can I send a web page of my own rather than letting the browser do something like this? Well, let me close that and go back to my application. I'm going to hide the terminal now, even though the server is still running. And what I'm going to do here is... well, nothing's stopping me from returning not just a string of text, but a string of HTML.
This might not look pretty, but let me type an open angle bracket, !DOCTYPE html, a close angle bracket, then html, then head, then title, and I'll title this, for instance, hello, to keep it simple. Then /title, /head, body, hello, world, /body, /html, and close the quotes. I used single quotes in this case, but I could have just as easily used double quotes. That's a full-fledged web page, the minimal amount of content we saw last week. Actually, you know what, for good measure, let's also add lang="en". So it's actually fortuitous that I used single quotes, because now I have some double quotes inside. And even though this is not pretty-printed, just one massive mouthful of HTML all along one line, that's fine. When I now go back to the browser, reload the page by clicking here, and then view the page source again, here's what my browser received this time: indeed, the full-fledged HTML. In fact, if I close that tab and reopen developer tools via Inspect, we'll now see absolutely everything that I sent over, including a title and the lang="en". Had I typed even more, we would have seen that too. All right. So what was the point of this exercise? It feels as though I've just taken more time and added more complexity to achieve literally what I could have done last week by creating index.html myself, without any Python code. But I dare say what we're trying to do is lay the foundation for a full-fledged interactive website, one that maybe has forms that we can submit to the application and that can generate not just one page, but two or three or any number. So what you're seeing here is sort of the beginning of google.com's search application, or gmail.com itself, or facebook.com. Any web application you can think of begins with a little code that, theoretically, looks a little something like this.
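For a peek beneath the framework, here's a sketch of a bare WSGI application, the standard Python interface (PEP 3333) that servers and frameworks like Flask build on. It returns the same one-line HTML page with a Content-Type of text/html so that the browser knows to render it as a page. The fake_request helper is hypothetical, just a way to call the app directly without starting a server.

```python
PAGE = (
    '<!DOCTYPE html><html lang="en"><head><title>hello</title></head>'
    "<body>hello, world</body></html>"
)


def app(environ, start_response):
    """A bare WSGI application that serves one HTML page at the / route."""
    if environ["PATH_INFO"] == "/":
        start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
        return [PAGE.encode("utf-8")]
    start_response("404 Not Found", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"Not Found"]


def fake_request(path: str) -> tuple[str, str]:
    """Call the app the way a server would and collect the status and body."""
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    body = b"".join(app({"PATH_INFO": path, "REQUEST_METHOD": "GET"}, start_response))
    return captured["status"], body.decode("utf-8")


print(fake_request("/")[0])      # 200 OK
print(fake_request("/nope")[0])  # 404 Not Found
```

To serve it for real, the standard library's wsgiref.simple_server.make_server("", 8000, app).serve_forever() would do (port 8000 is an arbitrary choice), though flask run handles all of this for you.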
But it's kind of silly to hard-code HTML, no less in one long string, inside of my application. Let's factor it out. That was a lesson we preached last week when factoring out our JavaScript and our CSS, and we can do the same thing with our actual HTML here. So I'm going to import not only the Flask function but also another function that, per its documentation, comes with Flask, called render_template, with an underscore in between. This is a function whose purpose in life is to render a template, so to speak, of HTML. We'll see what we mean by template in just a bit. Down here, I'm going to delete all of that code and assume that I'm going to put that same code in a file called index.html, just like I did last week. So let's instead return the return value of render_template("index.html"). Now, that file does not yet exist. Indeed, let me go into my terminal window and create a second terminal, just so I can leave the server running but still see what's going on. I'm going to cd into that same hello directory and type ls to list my files, and I only see app.py and requirements.txt. But it turns out, per Flask's documentation, that if you want to create your own HTML files, you simply have to add a directory that by convention is called templates. And that's it. So in addition to app.py and requirements.txt, I need a folder called templates. Let's go back into VS Code and run mkdir templates. Capitalization matters: it's all lowercase. Now let me cd into templates, run the code command, and create a file called index.html in the templates folder. Then, super quickly, let me hide this and whip up that same page again: DOCTYPE html, html lang equals quote unquote en, head, title, hello, and then down here, body, hello, world. Autocomplete is helping me type quickly.
But now I have a file with my HTML that this application, I claim, is going to spit out automatically for me. So let's see the effect. Let me go back into my other browser tab, close the developer tools, and quite simply click reload. No apparent change: it's working exactly as it did before. But I've laid the foundation for a much more useful layout of my files, so that I can keep my logic, my Python code, separate from my HTML. All right. How can we make this into something even more interesting? Let's start to take some actual user input. Wouldn't it be nice if I could pass in via the URL, instead of q=cats, something like name=David or name=Kelly and actually see that name outputted? In other words, let me zoom in up here and pretend like this happened automatically: let me type ?name=David and hit Enter. It would be nice, I'll propose, if I saw hello, David rather than just hello, world. So how do I actually get access to everything after the question mark? Here is where a framework like Flask, and any number of alternatives, starts to shine: it gives me that automatically. It turns out that in Flask you have access to a special global variable, as we'll call it, called request.args, where args just means the arguments, or parameters, that were passed in with this HTTP request. So how do we use this? Let me go back to VS Code here. On the very top line, in addition to importing Flask, capital F, and render_template, let's also import request, a global variable that comes with the Flask framework. And then I'm going to use it as follows. I'm going to add a second argument to the render_template function, where I'm going to say placeholder=request... Actually, let me not do that yet.
Let me first create a variable, name, set equal to request.args, from which I'll get the "name" key: name = request.args["name"]. And then down here, let's pass in placeholder=name. So what am I doing here on line 8? I'm creating a variable called name and storing in it the value from the request global variable, specifically from what's apparently a dictionary called args, the name key therein. So if what comes after ?name= is David, this should give me David. If it's Kelly, it should give me Kelly instead. Then I'm rendering this template called index.html, but I'm additionally passing in a named parameter. We talked briefly about that in week six, when we introduced the idea that Python functions can take not only a comma-separated list of arguments, but arguments some of which have names. So I'm proposing that one such named argument to this render_template function can be placeholder, for instance. Now, at the moment, this code isn't going to do anything useful. If I go back to the other tab and click reload after zooming in, even with my name in the URL, you'll see that we still see hello, world. But here's where things get interesting, and here too is what we mean by template. If I go back into VS Code and open up index.html again, then instead of the word world there, what I'd like to see is not hello, world, but hello, placeholder. Of course, if I literally type that, I'm going to see, literally, placeholder, unless I surround placeholder with pairs of curly braces, like {{ placeholder }}. By using these pairs of curly braces, I'm telling Flask that I want to interpolate, so to speak, that variable: I want to substitute in its value. So this is yet another syntax. In Python, we saw f-strings. In C, we saw %s when using something like printf. In an HTML file, when using Flask specifically, we use these pairs of curly braces to denote that this is a placeholder whose value should be plugged in.
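To demystify the curly braces, here's a toy renderer in plain Python that swaps each {{ key }} for the value passed in as a named argument. It is only a sketch of the idea, not Jinja: Jinja also supports loops, conditionals, and filters, and by default it renders a missing variable as an empty string, whereas this toy raises an error. Like Flask's default setup for .html templates, the sketch escapes values so that user input can't inject HTML.

```python
import html
import re

# Matches {{ key }}, allowing optional spaces inside the braces
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template: str, **values: str) -> str:
    """Replace each {{ key }} with the HTML-escaped value passed for that key.

    Raises KeyError if the template mentions a key that wasn't passed in.
    """
    def substitute(match: re.Match) -> str:
        return html.escape(values[match.group(1)])

    return PLACEHOLDER.sub(substitute, template)


template = "<body>hello, {{ placeholder }}</body>"
print(render(template, placeholder="David"))  # <body>hello, David</body>
print(render(template, placeholder="<b>"))    # <body>hello, &lt;b&gt;</body>
```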
So now let's go back over to the second tab. Recall, if I zoom in, that ?name=David is already passed in to this URL. And this time, when I click reload, voila, I see my actual name. And unlike the JavaScript examples last week, which did everything client-side, notice that if I right-click or Control-click and view the page source, David in this case literally came from the server. This was not rendered client-side. The server sent this HTML, and specifically this text. So if I go back to the same tab, zoom in, and change David to, for instance, Kelly, what I should see instead when I hit Enter is hello, Kelly. And indeed, if I go back to the source code and reload the page there, I should see in View Page Source that the server indeed sent hello, Kelly. It's in this sense that it's an application. The URL is providing input to the application by way of this format, the so-called GET string that's being passed in. app.py is the code that's running. It grabs that name from the URL, passes it into my index.html file, and then that HTML file, my template, gets the actual value plugged in for me. So what's going on with, for instance, these curly braces? Here too we're actually using a library. Included with Flask is another library called Jinja, and Jinja is what's called a templating library. There are so many templating libraries in the world, and Jinja is actually fairly simple, which is nice and which is why Flask uses it. For now, you can think of Jinja as the library that knows how to interpolate variables inside of pairs of curly braces. So why are we introducing yet another framework, another library? Well, the folks who implemented Flask decided that it was not worth their time to reinvent the wheel of a templating language, a language via which you can specify what values to plug in where.
So they just lean on another library that someone else wrote years prior, so as not to reinvent that wheel themselves. And that's all that's going on with a framework: in this case, it's using multiple libraries under the hood. All right. So what, then, is a template? This is a template. What you're looking at here, hello, {{ placeholder }}, is a template in the sense that it's the blueprint for the web page I want the user to see, but that page is going to be dynamically generated from this blueprint by plugging in the value of placeholder inside of those pairs of curly braces. That's why index.html, starting today, is in a folder called templates: it is not just static HTML like the stuff we wrote last week. It's the blueprint for the actual HTML that we want the server to send to the browser. But there's a bug here. Notice what's going to happen if I go up to this URL and get rid of the name altogether, for instance just visiting the slash route without any key-value pairs, and hit Enter. This is a Bad Request, an HTTP 400. In fact, if you look at the tab, here's an HTTP status code that we probably haven't seen before, but 400 just means the user did something wrong, in this case by not passing in the parameter that was expected. Well, that's a bit of bad design if the user has to type things into the URL manually. No human actually does that, and that's not good for business or customers in general. So I can go back into app.py and add a little bit of conditional code here. And here too is where we see what makes this an application and not just a static page. Instead of blindly getting the name, I could do something like this: if "name" in request.args, which is just Python syntax for asking whether this key is in this dictionary, then I'm going to define name and set it equal to request.args["name"].
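Here's the same check-before-you-index logic, sketched in plain Python with an ordinary dict standing in for request.args (an assumption for illustration; request.args is actually a dictionary-like object). Indexing a missing key raises KeyError; in Flask, doing that on request.args is what produced the 400 Bad Request.

```python
def choose_name(args: dict[str, str]) -> str:
    """Return the name parameter if present, else a default."""
    if "name" in args:
        name = args["name"]
    else:
        name = "world"
    return name


print(choose_name({"name": "David"}))  # David
print(choose_name({}))                 # world

try:
    {}["name"]  # indexing without checking first: the mistake the conditional avoids
except KeyError as error:
    print("KeyError:", error)  # KeyError: 'name'
```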
Else, if there is no name in the request, I might as well give it some default value, like name = "world". That alone logically ensures that I only try to access request.args["name"] if the key is actually there. So if I go back to the browser now and reload without anything else in the URL, we're back in business: it's saying hello, world. And if I go up to the URL bar and add ?name=David, Enter, that works too. So it's a web application in the sense that not only does it have function calls and a variable, but now it's got some conditional logic with Boolean expressions as well. All right, questions on anything we've done thus far, because it was a lot all at once? Questions thus far? Yeah. >> Good question. Let's try that. What if I just did ?name= with nothing after it? Let me go back to that other tab, delete the name David, and hit Enter. And indeed I see hello, followed by nothing. Why? Because the name key is now provided; it just doesn't have a value. So the conditional gives the same answer: yes, name is in request.args, but there's no value associated with it, so name is just the empty string. And here again is a hint at the value of using a framework like Flask. The fact that I can just import the request global variable and ask questions like whether this parameter is in this dictionary means I don't have to write any of the code that figures out what the URL looks like and breaks it apart at the question mark, the equal signs, and any ampersands therein. That's all generic logic that every web application has to do. So again, Flask is doing that heavy lifting for me, and I can focus on the logic that I actually care about. All right. Now, a quick convention here.
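Python's standard library can show what ?name= looks like once parsed. Note that urllib.parse.parse_qs drops blank values unless you ask it to keep them, whereas Flask's request.args, as the demo just showed, keeps the key with an empty string as its value. The last few lines sketch one option, an assumption about what you might want rather than anything Flask does for you, for treating a blank name as missing.

```python
from urllib.parse import parse_qs

query = "name="  # what the browser sends when the name is present but empty

print(parse_qs(query))                          # {}
print(parse_qs(query, keep_blank_values=True))  # {'name': ['']}

# One option if a blank name should count as missing: fall back on any falsy value
args = {"name": ""}
name = args.get("name") or "world"
print(name)  # world
```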
I've used the word placeholder here just to hit the nail on the head and make clear that this is a placeholder, but frankly it's stylistically more readable not to write hello, followed by a generic placeholder, but something like hello, {{ name }}, so that a colleague, or even I, looking at this file down the line knows that, okay, we're trying to print out the user's name here. You can change the names of these variables to be anything you want. And even though it looks weird, it's conventional in Flask to write something like name=name. But each of these names means something different. The one on the left is the name of the placeholder that I'm going to put in my actual template. The one on the right is the value that I actually want to give it. It just keeps me a little saner to reuse the same name instead of calling it placeholder, or placeholder1, placeholder2, placeholder3, or something generic like that. It's a little clearer, even though it looks weird to say name=name. Again, that just allows me to write {{ name }} in my template. All right. What more can I do? Let me propose that we go in and simplify this code a little bit. It turns out it's so common to ask whether a parameter is there and then do something with it or not that Flask comes with some logic for this. In fact, I can get rid of all four of those lines and just, with confidence, declare a variable called name and set it equal to request.args, but on that so-called dictionary, call a function named get that comes with it. That get technically has nothing to do with the GET verb used by HTTP; it just means, literally, get me the following. If you want the parameter called name, you literally just say "name". And in case there is no name parameter, you can also give this function a default value, like "world".
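Here's a sketch of why name=name isn't redundant. The show_context function is a hypothetical stand-in for render_template that simply returns the named arguments it received (Flask's real function renders a template with them instead). The left-hand name becomes the key a template would use; the right-hand name is the Python variable whose value gets passed.

```python
def show_context(template_name: str, **context: str) -> dict[str, str]:
    """Hypothetical stand-in for render_template: return the named arguments."""
    return context


name = "David"
# Left of =: the placeholder's name inside the template.
# Right of =: this Python variable, whose value gets plugged in.
print(show_context("index.html", name=name))         # {'name': 'David'}
print(show_context("index.html", placeholder=name))  # {'placeholder': 'David'}
```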
And so now we've collapsed that exact same logic from four lines into one: name = request.args.get("name", "world"). This gets me the HTTP parameter called name, but if it's not there, it gives me a default value of world, so that no matter what, this name variable has what I care about. Indeed, if I go back over here and type in ?name=David again, Enter, that's there. And if I type in no name, Enter, that works as well. All right. Let's see if we can refine this a bit more. In our next version, let's introduce a second route, so two URLs, much like Google has many different URLs, as does most any web application. At the moment, I'm doing everything in my slash route. So how might I move away from this? Let me add not only a second route, but an actual form via which the user can type in their name. To do this, let me propose that in index.html, instead of just printing out the user's name and trusting that they typed it into the URL manually, which again is not normal behavior, we actually show the user a form via which they can do exactly that. So here's my form tag. Let's say the method I'm going to use is get, so that I see everything in the URL. Let's give myself an input whose name is name, because this is the human's name. And notice, somewhat confusingly, that this name on the left is the HTML attribute that we saw last week, so it's different from what we just did in Python, even though they're all called the same thing. The type of this input is going to be text. And let's make this a little more user-friendly: let's add some placeholder text, Name, so the human knows what to type in. Let's disable autocomplete, just so we don't see previous input to this text box. And let's autofocus it, so that the cursor is blinking in the text box by default.
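To check that the one-liner really matches the four-line conditional, here's a small plain-Python comparison, again with an ordinary dict standing in for request.args (an assumption; the real object is dictionary-like and likewise has a get method that takes an optional default).

```python
def with_conditional(args: dict[str, str]) -> str:
    if "name" in args:
        name = args["name"]
    else:
        name = "world"
    return name


def with_get(args: dict[str, str]) -> str:
    # dict.get returns the default only when the key is absent
    return args.get("name", "world")


for args in ({"name": "David"}, {}, {"name": ""}):
    assert with_conditional(args) == with_get(args)
    print(repr(with_get(args)))  # 'David', then 'world', then ''
```

The empty-string case shows that the two versions agree even there: a present-but-blank key is still "present", so neither falls back to world.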
Then, lastly, let's add a button whose type is submit, so that clicking this button actually submits the form. I'm going to label this button Greet, because I want the user to be able to greet themselves by clicking it. Now I should specify the action. The only other time we used action was when we went to https://www.google.com/search. That's not relevant today, because I'm trying to print hello, world, not search for cats and such. But this is where I too have control: if I want to submit this form to a specific location in my web application, action is where I specify it. So why don't I pretend that there exists a route in my application called /greet, such that if you go to example.com/greet?name=David, it greets the user with hello, David, for instance. But /greet does not exist yet. If we go back to app.py, literally the only route that currently exists is the single slash, but I can change that. In app.py, below this function, I can write @app.route("/greet"), inventing any route that I want, and then define a function that will be called whenever that route is visited. By convention, to keep myself sane, I'm going to call the function the same thing as the route, greet, but you don't have to; it just minimizes the decisions I have to make. Then, in this function, I'm going to return render_template("greet.html"), a file that doesn't exist yet, but that's a problem to be solved, and pass in the name of the user. I'm going to save myself a line of code and just say name=request.args.get("name", "world"). In other words, strictly speaking, I don't need that variable on its own line. This has the same effect as what we already did in index, but all in one elegant one-liner.
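When a form with method="get" is submitted, the browser builds the URL itself: the action, a question mark, then each input's name and value, URL-encoded. Here's a plain-Python sketch of that using urllib.parse.urlencode; form_get_url is a hypothetical helper, and the second name is made up just to show how a space gets encoded.

```python
from urllib.parse import urlencode


def form_get_url(action: str, fields: dict[str, str]) -> str:
    """Mimic a browser submitting a GET form: action + "?" + encoded fields."""
    return f"{action}?{urlencode(fields)}"


print(form_get_url("/greet", {"name": "David"}))     # /greet?name=David
print(form_get_url("/greet", {"name": "Mary Ann"}))  # /greet?name=Mary+Ann
```

On the server side, request.args.get("name", "world") then undoes that encoding, so the greet function sees the plain string.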
And now in index, insofar as I want the index of the site to just show the user the form via which they can type in their name, this one's easy now: render_template, quote unquote, index.html, and return that template. So to recap, here's index.html, which is now a form instead of a template for hello, so-and-so. app.py is going to return that template whenever I visit the index, or slash, of the site. And then this greet route is going to handle the case of rendering greet.html, passing in the user's name. All right, I think I'm not quite good to go yet, but let's try this out. Let me go back to my browser tab, reload, and there we have it. I have a web form now instead of the hello, so-and-so. I'm going to go ahead and type in my name. And notice the URL at the moment: even though Chrome is hiding it, technically the slash is there, but Chrome and most browsers today sort of hide as much stuff as they can if it's not all that intellectually interesting. But watch what happens to the URL when I click greet. It automatically sends me to /greet?name=David. And this is just like the way the forms worked last week when we recreated our own version of Google in search.html, because the action there was google.com/search. The user was whisked away to Google's server. Today I stay on the same server, because the action I used was quite simply /greet, which is assumed to be on my own server. But clearly I screwed something up, because I have a big internal server error in front of me, as you soon will too, odds are, as you dive into this. 500 is the status code that means, somehow, it's your fault as the developer. Now why is that? Well, it's unclear from this generic black and white message. However, because I'm the developer, I can go back to VS Code, open my terminal window, and recall that I have two terminals open now: one that I can type stuff in, the other of which is still running from before. Let me open up that one.
And you'll see, if I maximize my terminal window, a whole bunch of scary error messages here. But the relevant one is probably going to be, let's see, down here: raise TemplateNotFound, jinja2.exceptions.TemplateNotFound: greet.html. So there's a lot of esoteric error messages here, more so than usual, but the simple fact is that I just screwed up and I did not create greet.html. So the file was not found by the server. The user doesn't see all that complexity, and that's deliberate, by design. It's generally not good for cybersecurity if you're revealing to the user all of the error messages that are happening on your server, because maybe that suggests they can hack in some way by taking advantage of those error messages and the information implicit in them. But they are there in your terminal window to actually see and diagnose. So how do I fix this? Well, not a problem. Let me shrink my terminal window back down. Let me run code greet.html to create a file called greet.html. And in greet.html, let's create the template via which I'm going to greet the user, which ironically is the exact same as index.html used to be. So let me recreate that real quick: DOCTYPE html. Let me close my terminal. html lang="en", head, title hello, body, and here's my placeholder: hello, name. So, to be clear, the index.html template doesn't have any curly braces or anything dynamic. It just spits out the HTML for the form. greet.html spits out HTML and the actual greeting. And it's app.py that decides which of these to show the user: either index.html if they visit the slash route, or greet.html if they somehow find their way to the /greet route, which they will automatically by simply submitting that form. All right, so let's go back from this internal server error to the form.
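The split between what the user sees and what the developer sees can be reproduced with Flask's test client. This sketch assumes no templates/greet.html exists next to the script: the route raises TemplateNotFound, the client gets only a generic 500 page, and Flask logs the traceback on the server side (the terminal running flask run, in the lecture's setup).

```python
from flask import Flask, render_template

app = Flask(__name__)  # render_template looks in a templates/ folder next to this file


@app.route("/greet")
def greet():
    # greet.html has not been created yet, so Jinja raises TemplateNotFound here
    return render_template("greet.html", name="David")


if __name__ == "__main__":
    response = app.test_client().get("/greet")
    # The client sees only a generic 500 page; the traceback goes to the server's log
    print(response.status_code)
    print("TemplateNotFound" in response.get_data(as_text=True))
```

This holds only while debug and testing modes are off; with either one on, Flask lets the exception propagate instead of converting it to a 500 response.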
Nothing has changed with the form, but now when I type in David and click greet, not only will the URL change to be /greet?name=David, I actually now see the content that I expected a moment ago. All right. Well, now it's an opportunity to critique. I have these two templates open, index.html and greet.html. And even if you've never done web programming before, and even if you'd never done HTML before last week, what is bad about this design, intuitively? >> Say again. >> Abstraction. >> Abstraction in what sense? >> Yes. So that's exactly the hangup I have here. There's a lot of duplication. And technically I didn't copy-paste, though I might as well have, because notice, as I very hintingly go back and forth, almost every line of code in these files is the same, except for the form, which is there or not there, or the hello, comma. All of the boilerplate HTML, namely everything I just highlighted here, lines one through seven in greet.html, is identical, and this is what we really start to mean by a template. Wouldn't it be nice if we could factor out all of that HTML that's common to both files and put it in literally a template that both routes can use, so that I can write that boilerplate code once instead of again and again? 'Cause imagine in your mind's eye: if I have three routes or four routes or five routes, I'm going to be typing the same darn HTML three, four, five times. That's got to be dumb, and that's got to be solvable, as we've seen in other languages as well. So let me indeed go ahead and try to improve this. The syntax is a little weird, but it's the kind of thing you get used to quite quickly. I'm going to go ahead and create a third HTML file now by going back to my terminal window, still inside my templates directory. And by convention, this file is going to be called layout.html. Why this? That's what the Flask documentation tells you to do. So, in layout.html,
I can put all of my boilerplate HTML, the stuff that is invariant and doesn't change. So here we go: DOCTYPE html, then the html tag with lang="en", then head, then title. We'll call it hello for all of the pages. Then body. And here's where it gets interesting. The body is the only thing that has been changing in these two examples. In index.html, it was a web form. In greet.html, it was just a simple string of hello, so-and-so. So what I want to tell Flask is that everything in the body will just be a dynamic block of code. The syntax for that takes a little getting used to, but it's also sort of copy-pasteable: block body, using percent signs this time. And because I don't want any such body in the layout itself, I'm going to literally close this block as follows. And here you see another example of sort of HTML-like syntax, but instead of using angled brackets, Jinja, the templating library that Flask uses, uses a curly brace and percent sign to open the tag and then the opposite to close it. So what you really have here are two Jinja tags, as we'll call them. This one is called block, and I'm defining an arbitrary name here. I could have called it foo, bar, or baz, but because I want this block to refer to the body of the page, by convention I'm going to call it body. And then this weird syntax, endblock, which is used in some other languages too, just means end whatever block you just began. And so again you just see reasonable people disagreeing. The people who invented HTML used nice angled brackets and words like these. The people who came up with Jinja used curly braces and percent signs. Why? Well, odds are these are not normal symbols that a human would type when writing code, at least in HTML. So they just chose something that probably wouldn't collide with actual syntax the human wants to use. So that's it for the template.
This is now essentially a blueprint that doesn't have just a placeholder for a single word or value like name. I can put a whole chunk of code here now instead. And how do I do that? Well, let me go into index.html, which at the moment is a little duplicative, in that it's got all of this boilerplate. So you know what? I'm going to go ahead and delete everything that is already in my layout, both above and below that web form. And now I'm going to use a bit more Jinja syntax. This too takes a little while to memorize, or copy-paste. But if I want index.html to use the layout.html blueprint, I can simply say extends layout.html, and then close the tag using percent sign, close curly brace. And then, if what I want to plug into that layout is the following code, I can say, as before, block body, and then down here I can say endblock. And that's it. And just to be a little nitpicky, I'm going to de-indent that slightly. And now, even though it looks like web pages suddenly look a lot uglier, well, they do, because this is weird-looking syntax, but I have now distilled index.html into its essence. This is the only thing that changes vis-à-vis the greeting page. And so I've put the HTML here that I care about. I've said to Flask, this is what index.html's body block shall be. Where to put it? Well, put it into that particular layout.html file. And so the logic for greet.html is the same. It's going to look just as weird, but again, you get used to it. Let's go ahead and delete everything that's boilerplate in greet.html, both above and below. Up at the top, let's tell Flask that greet.html too extends layout.html. And let's go ahead and say to Flask that the block called body shall be this for greet.html. And the end of this block is now down here. And just to be nitpicky, I'll de-indent that too.
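Template inheritance can be tried outside Flask with Jinja directly. This sketch assumes in-memory copies of the three templates loaded through Jinja's DictLoader, standing in for the files in templates/; the exact markup is a hypothetical reconstruction. Each child template supplies only its body block, and everything else comes from layout.html.

```python
from jinja2 import DictLoader, Environment

# Hypothetical in-memory copies of templates/layout.html, index.html, and greet.html
TEMPLATES = {
    "layout.html": """<!DOCTYPE html>
<html lang="en">
    <head>
        <title>hello</title>
    </head>
    <body>
        {% block body %}{% endblock %}
    </body>
</html>""",
    "index.html": """{% extends "layout.html" %}

{% block body %}
    <form action="/greet" method="get">
        <input autocomplete="off" autofocus name="name" placeholder="Name" type="text">
        <button type="submit">Greet</button>
    </form>
{% endblock %}""",
    "greet.html": """{% extends "layout.html" %}

{% block body %}
    hello, {{ name }}
{% endblock %}""",
}

# Flask enables autoescaping for .html templates; mirrored here with autoescape=True
env = Environment(loader=DictLoader(TEMPLATES), autoescape=True)

if __name__ == "__main__":
    # The boilerplate comes from layout.html; greet.html contributes only its body block
    print(env.get_template("greet.html").render(name="David"))
```

With Flask, the same three files in templates/ work unchanged, and render_template("greet.html", name=...) performs the same lookup and rendering.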
So again, the pages look a little weirder now, but they follow a paradigm that we'll see again and again, such that the only juicy stuff is what's inside of that body block. So now, if I go back to my layout, it looks exactly like this. This indeed is a placeholder, not just for a single variable like name, as with the placeholder we did before. This is the placeholder for a whole block of code that came from a file, not from a variable. And so if I go back into my other tab here, click back to go back to the web form, and reload, notice that I have the familiar-looking form. But if I now look at View Page Source, notice everything that came from the server. Here's that boilerplate up here. Here's that boilerplate down here. And here's the stuff that's unique to this page. And recall too, aesthetically I de-indented it, which is why it's no longer pretty-printed in what the browser sees. That's okay. There's no reason to obsess over the indentation and pretty printing of what the browser sees. Ultimately, the reason I did this indentation is because, arguably, when I'm in VS Code here and I look at index.html, this is clearly indented inside of the body block, just so I know what's part of that block. The browser does not care about superfluous whitespace or lack thereof. All right, questions on what we've just done here, which is to truly take this template out for a spin and remove the redundancies I had accidentally introduced? Questions? No. Okay. Amazing. All right. Well, let's go ahead and look at this URL again. I'm not liking the fact that every example we've done thus far involves putting my name or Kelly's name right there in the URL bar. Why is that? Well, if I have a nosy sibling and they sit down at my browser, they're going to see every URL I visited, including whose name was greeted.
Now, that's not all that big a deal, but now imagine it's a username and a password that the form is submitting, or a credit card number, or just search terms that you don't want the world knowing you're searching for. They're going to end up in the URL bar. Why? If you are using method="get" for the form, that's how GET works. It literally puts all of the HTTP parameters in the URL, which is wonderfully useful if it's low-stakes, or potentially low-stakes, stuff like the Google search box, or if you just want to be able to hyperlink directly to a URL like this. In other words, if I put this into an anchor tag, open bracket a href, and a URL like this, I could deep-link a user to a web page that just always says hello, David. So GET strings contain all of the requisite information to render a page for the user. But this isn't really good for privacy. So recall that there's not only GET, but there's also something called POST. And POST is just a different HTTP verb that, with respect to those virtual envelopes from last week, essentially puts the information more deeply inside of the envelope, such that it's not written right there in the URL bar, but it's still accessible by the server. So if I do this, watch what happens. Let me go back into VS Code. Let me go back into index.html, which has the form, and let me quite simply change the method from get to post. And now let me go back to my other browser tab, back to the form, and reload, so that the form knows that the method has changed. Now I'll type in David and click greet. And before I do that, let me zoom in on the URL bar. Notice that the URL does change. I'm at /greet, but I haven't revealed to the world, or to anyone with physical access to my browser, what name I just submitted. All they know is that I went to /greet, but not the key-value pair or pairs that were passed in. Of course, this clearly hasn't worked.
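The "deeper in the envelope" idea can be seen by writing out the two requests. This is a deliberately simplified sketch: form_request is a hypothetical helper, and real browsers send many more headers than shown. With GET the key-value pairs are appended to the path, so they show up in the URL bar and history. With POST the path stays clean and the same pairs travel in the request body.

```python
from urllib.parse import urlencode


def form_request(method: str, host: str, action: str, fields: dict[str, str]) -> str:
    """Simplified text of the HTTP request a browser sends when submitting a form."""
    body = urlencode(fields)
    if method.upper() == "GET":
        # GET: parameters ride along in the URL itself
        return f"GET {action}?{body} HTTP/1.1\r\nHost: {host}\r\n\r\n"
    # POST: parameters go in the body, after a blank line following the headers
    return (
        f"POST {action} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Content-Type: application/x-www-form-urlencoded\r\n"
        f"Content-Length: {len(body.encode())}\r\n"
        "\r\n"
        f"{body}"
    )


if __name__ == "__main__":
    print(form_request("get", "example.com", "/greet", {"name": "David"}))
    print(form_request("post", "example.com", "/greet", {"name": "David"}))
```

Either way the server receives name=David; only where it sits in the request differs.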
I've got an HTTP status code of 405, which means method not allowed. That's because Flask, by default, when defining routes, simply assumes that you want GET instead of POST. Now, GET is good for the default page. In fact, when I go back here, this is equivalent to me visiting the slash route just in the browser. So I want my index to generally support GET, but the greet route should support POST. And the simplest way to do this is to pass in another argument to the route function, which we haven't needed before because the default is GET. I can instead tell Flask a comma-separated list of the HTTP methods that I want this route to support. So if I want to support just POST, I can pass in a list containing just "POST". And recall Python uses square brackets for lists, which are its version of arrays in C. Now by default, this argument is methods=["GET"], and that's why the only thing supported a moment ago was GET. That's why I'm now changing it to be POST instead. I have to make one other change, though. It turns out, if you read the documentation, when accessing HTTP parameters via POST instead of GET, you move from using request.args to request.form. It's completely unintuitive that request.args is for GET and request.form is for POST, because they all come from forms. So it's bad naming, admittedly. You just kind of have to remember: request.args is used for GET; request.form is used for POST. So all I need to do further is change this to be request.form, and that's it. Now my web application will support a web form submitting to it via POST instead of GET. Let me go ahead and type in my name. Now I'll zoom in. Notice that the URL will again change to /greet with no parameters evident, but I will be greeted this time, because the server knew to look deeper into that envelope for those key-value pairs instead. And just to be sort of diagnostic about this, let me go back once more. Let me right-click or Control-click on the page and go to Inspect.
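A minimal sketch of the POST-only greet route, again assuming an inline template via render_template_string rather than a templates/greet.html file. It shows the 405 for a plain GET and that request.form, not request.args, holds POST parameters: a name sent only in the query string is not seen by request.form.

```python
from flask import Flask, render_template_string, request

app = Flask(__name__)

# Inline stand-in for the body of templates/greet.html
GREET_HTML = "hello, {{ name }}"


@app.route("/greet", methods=["POST"])
def greet():
    # Only POST is listed, so a plain GET to /greet gets 405 Method Not Allowed
    # POST parameters come from the request body, exposed as request.form
    return render_template_string(GREET_HTML, name=request.form.get("name", "world"))


if __name__ == "__main__":
    client = app.test_client()
    print(client.get("/greet").status_code)  # 405
    print(client.post("/greet", data={"name": "David"}).get_data(as_text=True))  # hello, David
    # A query-string parameter is in request.args, not request.form, so the default is used
    print(client.post("/greet?name=David").get_data(as_text=True))  # hello, world
```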
Here's where developer tools can be super useful as well. I'm going to go in here and clear this. And now I'm going to type in David again and click greet. Because I have the Network tab open, like we played with last week, it's going to show me all of the requests going from my browser to the server, which is useful here, because not only do I see that it obviously worked, since I got back a 200, but if I click on this diagnostic output, I can actually go to the Payload tab here, and I'll see that the form data that was submitted was name, the value of which was David. So you can see what you're submitting. You can do this today: if you log into some website, Gmail or otherwise, you can actually see all of the data that your own keyboard is submitting to the server, even if it's using POST, because the browser that you control can of course see it too. All right, any questions now on this transition from GET to POST? Kind of on a roll, or not going so well? We'll see. All right, so what more can we do with this? Well, let's give ourselves a couple more building blocks before we transition to actually implementing some real-world problems, as I did years ago with one such example. Suppose that I don't like the direction I'm going in, insofar as every time I have a page with a form, it submits to another route altogether. 'Cause in your mind's eye, just kind of extrapolate: if I have two forms on my site, I now need four routes. If I have three forms, I need six routes. It seems a little annoying that you use one route just to show the form and another route to process the form. This is going to get annoying over time, because it's twice as many routes as might be ideal. So is there a way to get the best of both worlds and combine these two routes into one, so that everything related to greeting the user happens in one place? Well, you can, as follows.
What I'm going to go ahead and do is delete my greet route altogether and most of my index route. But I'm going to ask a question. I'm going to first say that the methods that the index route supports now shall be both GET and POST, as a comma-separated list there. And then inside of my index route I can simply ask a question of the form: if the request that is submitted to the server has a method of POST, then assume the form was submitted. This is just a Python comment, a note to self that I'm going to come back to in a moment. Else, if the request method is not POST... So I could technically say elif request.method == "GET", but this is kind of dumb, because I only support two verbs. So I might as well just assume, for efficiency, that else handles the GET implicitly, and then go ahead and assume that no form was submitted, so show the form. These are just notes to self as to what I want to do. So how do I show the form? Well, this line was easy: return render_template of index.html. If, though, the form was submitted, what do I want to do? Well, just as before, let's return render_template of greet.html, passing in a name value of request.form.get, quote unquote, name, else a default value of world. So it's the exact same logic from each of the two functions a moment ago, but I've now combined them into one by using some conditional logic and just asking: did the user get here via POST? Well, the only way they could have gotten here via POST is by having clicked that button and submitted the form, so let's just go ahead and greet them. Else, if they got here via GET, by just typing in example.com or whatever the actual URL is, let's go ahead and show them the form. So it's still good design, in that I have a separate template for each of these pieces of functionality, but I'm deciding which of those to show based on the actual logic in this here app. All right, so this is almost perfect, except for one bug.
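A minimal sketch of the combined route, assuming inline templates trimmed to just their bodies and rendered with render_template_string. The form's action already points at / here, which is the fix discussed next in the lecture.

```python
from flask import Flask, render_template_string, request

app = Flask(__name__)

# Inline stand-ins for the bodies of templates/index.html and templates/greet.html
INDEX_HTML = """<form action="/" method="post">
    <input autocomplete="off" autofocus name="name" placeholder="Name" type="text">
    <button type="submit">Greet</button>
</form>"""

GREET_HTML = "hello, {{ name }}"


@app.route("/", methods=["GET", "POST"])
def index():
    if request.method == "POST":
        # The form was submitted: clicking the button is the only way to arrive via POST
        return render_template_string(GREET_HTML, name=request.form.get("name", "world"))
    # Otherwise this was a GET (e.g., typing the URL), so show the form
    return render_template_string(INDEX_HTML)


if __name__ == "__main__":
    client = app.test_client()
    print(client.get("/").get_data(as_text=True))
    print(client.post("/", data={"name": "David"}).get_data(as_text=True))  # hello, David
```

Because methods lists only GET and POST, any other verb gets 405, which is what makes the bare else a safe stand-in for "this was a GET".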
What else needs to change if I've just combined my greet route and this default slash route? Yeah. Yeah. So, in the form in index.html, recall that there's an action attribute that specifies to what URL you want to submit this. Well, let me go back to index.html. It can't be /greet anymore, because that doesn't exist. So I'm just going to delete the word greet and submit it to slash instead, which has the same effect as just omitting it entirely: if you don't specify an action, the form submits to the very location that it came from. But if you want to be pedantic and even more clear, specifying that the action of this form is now just slash will work here too. All right, so let's test it. Let's go back to the other tab, back to the form, reload. It's blank now. I type in David, click greet, and this too is working. And if I go back and reload, GET is working as well. But there's nothing ending up in the URL, because I'm now using POST, which again tends to be a good thing for privacy reasons. Let me show one final flourish before we transition to something real-world motivated. If I go into app.py, for a while now I've been passing in this default value of world, which is fine, especially if it's something short and sweet. That's the default value. But I can actually put a bit of conditional logic in my template as well. So, in fact, let me go into greet.html and trust that I will now be passed in a name variable. I can decide for myself in the template whether I want to say hello, name or, if it's blank, hello, world instead. And how might I do this? Well, I can always say hello, but then I'm going to use some Jinja syntax that we haven't seen yet. It turns out in Jinja, the templating language that Flask uses, you can use Python-like syntax too. And you can ask questions like: if the name variable has a value, then go ahead and output the value of that name.
Else, if the name variable does not have a value, go ahead and output a literal value like world. And then down here, endif. So Jinja again is a little weird in that it says endblock, endif, but that's the way it is. Even though this looks a little weird, it's just a nice, clever way of putting a bit of logic into my template. If the name has a value, so it's not empty or None, go ahead and display it, hence the curly braces. Else go ahead and literally say world. Why is it not problematic, and you can see the dots here, that there's all of this whitespace after the word hello? Otherwise this would seem to create quite a messy paragraph or phrase of text in terms of whitespace. But... >> HTML ignores superfluous whitespace. So anything more than a single space just gets canonicalized, or collapsed, into a single space. And we saw that, recall, last week accidentally, when I had those three paragraphs of text from the duck: I wanted them deliberately to be separate paragraphs, and they weren't, because all of that whitespace was ignored until I actually introduced the paragraph tag instead. So this just moves some of that logic now into the templates. For all this logic and more, here's the official documentation for Flask and, specifically, Jinja's own documentation, but for the most part, we've seen what's possible already. And I promised a real-world example. So here now it is. Back when I took CS50 as a sophomore, there was no web programming in the class. And frankly, there was barely any web in the world, because HTML and the like were all so new. But it was my sophomore spring, or maybe junior fall, that I also got involved in the freshman intramural sports program, or Frosh IMs for short.
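The if/else described above can be tried with Jinja directly. This sketch assumes greet.html's body looks roughly like GREET_BODY (a hypothetical reconstruction), so app.py could pass request.form.get("name") with no default. The as_browser_shows helper is a very rough stand-in for how a browser collapses runs of whitespace when displaying text; it is not a real rendering engine.

```python
from jinja2 import Environment

# Hypothetical stand-in for greet.html's body: the default now lives in the template
GREET_BODY = """hello,
{% if name %}
    {{ name }}
{% else %}
    world
{% endif %}"""

template = Environment(autoescape=True).from_string(GREET_BODY)


def as_browser_shows(html: str) -> str:
    """Collapse every run of whitespace to one space, roughly as a browser displays text."""
    return " ".join(html.split())


if __name__ == "__main__":
    print(as_browser_shows(template.render(name="David")))  # hello, David
    print(as_browser_shows(template.render(name="")))  # hello, world
    print(as_browser_shows(template.render()))  # hello, world (name undefined)
```

Handling the blank case in the template also covers a submitted-but-empty text box, which a default argument to request.form.get would not, since the key is present with an empty value.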
And back in the day we would walk from, say, Matthews Hall to Wigglesworth, freshman year at least, to register for sports by filling out what was called a sheet of paper, and then you would go to the proctor's dorm room and slide it under their door or through the mail slot, and that's how we registered for sports. It was sort of ripe for disruption before that was even a phrase. And so one of the very first projects I took on myself, personally, after taking CS50 was to figure out how web programming worked. Python wasn't really a thing yet, nor were half of the topics we've been talking about thus far. But at the time I learned a programming language called Perl. I learned a little something about CSV files, which we did a couple of weeks back too. And I built this, the freshman intramural sports website, via which you could click on a bunch of links and get some information. But most importantly, you could register for sports by typing in your name, selecting the sport for which you wanted to register, and clicking submit, and no longer walk across Harvard Yard with a piece of paper to actually register for sports. So we thought we'd use this as the beginning of a motivation for how we can now solve problems with web-based interfaces using code, and also what not to do: background images that repeat like this are not really in fashion anymore, nor, arguably, were they in 1997. But let's leave that as a cliffhanger and come back in 10 minutes, after a snack, to re-implement the Frosh IMs website. All right, we are back. So among the goals now is to recreate the beginnings of a site like this for Frosh IMs, whereby we want to enable students to visit a form, fill out that form, submit it to a server, and thereby register. We'll dispense with all of the amazing graphics and such and keep it fairly simple, core HTML. So let's go ahead and do this. Back here in VS Code, I've gotten ready now for this next set of examples.
And in particular, I've created in advance a directory called froshims, containing app.py, requirements.txt, and templates, which are essentially the same as the ones we just created, but I stripped out the hello- and greeting-specific stuff. I'm going to go ahead in this terminal and do flask run, so I get the server up and running again on port 5000. And then I'm going to go ahead and open up another terminal here, as I did before, and cd into froshims in that terminal, where I'll see the exact same files, and I'll give you a quick tour of what I created in advance. So here in app.py is quite simply the simplest of applications, one that just renders the index.html template, with the expectation that in a moment we're going to make it more interesting than that. Meanwhile, if I open my terminal again and open up requirements.txt, it just mentions Flask, but that's already installed, so no more to say about that for now. Now let me go ahead, lastly, and open up the templates folder. There are two files there, the first of which is layout.html, which looks almost the same, except I did add a slightly more user-friendly tag to the head of the page, which you might not have seen before. This is a tag that you can essentially copy and paste into templates of your own, and it helps the content of a page resize to be mobile-friendly. In fact, without this line, if you were to develop problem set 9 or your final project for the web and then try to access the site on a phone, everything might look quite a bit too small, font sizes and more. This line tends to help the browser resize dynamically so that the page actually matches the device's own width, for instance, a phone versus a laptop or desktop. But otherwise, everything else is the same there, including the placeholder for the body block that I've defined here on line 9.
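A sketch of a Frosh IMs-style layout with the mobile-friendly tag in its head, rendered through Jinja with in-memory templates. The transcript does not read the tag out, so the commonly used viewport meta tag with width=device-width and initial-scale=1 is assumed here; the froshims title and the placeholder index.html body are likewise hypothetical.

```python
from jinja2 import DictLoader, Environment

# Hypothetical in-memory versions of froshims/templates/layout.html and index.html
TEMPLATES = {
    "layout.html": """<!DOCTYPE html>
<html lang="en">
    <head>
        <meta name="viewport" content="initial-scale=1, width=device-width">
        <title>froshims</title>
    </head>
    <body>
        {% block body %}{% endblock %}
    </body>
</html>""",
    # Placeholder body until the registration form is written
    "index.html": """{% extends "layout.html" %}

{% block body %}
    <h1>Register</h1>
{% endblock %}""",
}

env = Environment(loader=DictLoader(TEMPLATES), autoescape=True)

if __name__ == "__main__":
    # Every page that extends layout.html inherits the viewport tag automatically
    print(env.get_template("index.html").render())
```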
Lastly, there's one more file that at the moment doesn't do anything all that interesting, except it's ready to contain the contents of the registration form for Frosh IMs. So let's go ahead and start with that. Let me quickly whip up a form that minimally gives the user something they can submit to the server to register for sports, and then we'll improve upon it iteratively. So here, inside of the body of index.html, which is going to extend the actual layout, the blueprint we already created, I'm going to have a quick title for the page, like register, just to make clear to the student what they need to do, using h1, which is the big and bold tag. Then I'm going to go ahead and have a form tag whose action could be anything I want, but since I want the user to register, I'm going to have it go to /register, which makes more sense semantically than greet now, because we're doing something else. The method I'm going to have the student use is post, if only because they don't want their roommates knowing what they submitted. This way it will tuck the HTTP parameters deeper in that virtual envelope, so they're not stored in the browser's history. Inside of this form, I'm going to have, minimally, an input box for the student's name. So I'll call that, aptly, name, and set name equal to "name" in my HTML. The type of this text box will be exactly that, text. And then, just to make it a little more user-friendly, I'm going to add a placeholder of name so they know what to do. I'm going to go ahead and turn off autocomplete in case multiple roommates want to register from the same computer. And then we'll turn on autofocus to put the cursor in that name box. And then, and you didn't see this last week, if you've ever wondered how drop-down menus are implemented in HTML, if you've never done this yourself, those drop-down menus on web pages are called select menus.
And if I want the user to select a sport to register for, I'm going to call this input sport. This is an alternative to just having a generic text box where we have the students type in the sport they want to register for, which would be fraught with typographical errors and changes in capitalization. A drop-down menu, of course, standardizes what the human can select. So inside of this drop-down I'm going to have a few options, the first of which will be basketball, for instance, the second of which will be soccer, and the third of which, among the first three with which I think we debuted back in the day, was ultimate frisbee. Now these option tags can take some attributes. By default, they take on the value of whatever words are typed in between the open and close tags. But just to be pedantic, I'm going to make clear that the value of selecting this option shall be basketball, though I could change it to be something else if I so chose. The value of this selection will be soccer, and the value of this last option will be ultimate frisbee, just in case I ultimately want to store something else in my database. Now that is a complete index.html, I think. So if I go back to my browser tab, which previously was showing me the hello program, because I stopped and restarted Flask (and you can stop Flask by just hitting Ctrl-C, for interrupt), I'm going to reload the page, and I should now see, okay, a slightly more interesting form with a name box with the cursor blinking there, and then a select menu, a drop-down with three options. Now it's a little presumptuous of me to select basketball by default, and in fact this is kind of inviting user error: if they type in their name and don't really think about it, they now register for basketball accidentally. So I'm going to make a couple of improvements here. I'm actually going to have, essentially, a blank option at the top whose value is nothing, and I'm going to have it just labeled sport.
And just to be super clear, I'm going to select this value by default. The option tag in HTML supports not only a value attribute, but it turns out also a selected attribute, which, if present, means that's the option that will be selected by default. So if we go back now to this page and reload to get a new copy of the HTML, it looks a little better. I still have the name at left, but the sport menu now looks like this. So it's a little more clear what I want them to do with this drop-down. And sport deliberately won't have a value on the back end. Theoretically, this will help me determine whether they actually selected a sport or just clicked register and ignored the drop-down. But I do need a way for them to register, ideally by clicking a button. So I'm going to add a button, the type of which is submit, and I'm going to have this button's label be register. So now if I go back to the form once more and reload, I now have, I think, a complete form, albeit not very pretty, via which David can register, for instance, for basketball by clicking register. And ah, darn it, I have a 404 Not Found. But why is that? Why is nothing yet found? Why is /register not found? Yeah? >> What's that? >> I haven't... well, I haven't linked the option to anything. I think the form has been linked. Whoops. The form is telling the browser to go to /register, so this is correct behavior. But if we go to app.py, there's no route defined for /register. So of course it's not found, because there's an infinite number of routes that don't exist, and /register is currently among those. So I can define that myself. I can say app.route, quote unquote, /register. I do want to use POST, so I need to proactively say that the methods this function will support will indeed be POST, instead of the default of GET. I'm going to define an actual function to call when this route is used. And by convention I'm going to call it just register, even though I could call it anything I want.
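Why the blank, pre-selected Sport option matters can be checked by working out what an untouched form would submit. This is a standard-library-only sketch with a hypothetical reconstruction of the form. UntouchedFormValues is a simplified, hypothetical parser: it handles only text inputs and single selects, and it assumes every option has an explicit value attribute, as in the lecture's form. A select with no option marked selected falls back to its first option, which is why basketball was the accidental default before.

```python
from html.parser import HTMLParser

# Hypothetical reconstruction of the Frosh IMs registration form
FORM_HTML = """<form action="/register" method="post">
    <input autocomplete="off" autofocus name="name" placeholder="Name" type="text">
    <select name="sport">
        <option selected value="">Sport</option>
        <option value="basketball">Basketball</option>
        <option value="soccer">Soccer</option>
        <option value="ultimate frisbee">Ultimate Frisbee</option>
    </select>
    <button type="submit">Register</button>
</form>"""


class UntouchedFormValues(HTMLParser):
    """Collect what an untouched form would submit (text inputs and single selects only)."""

    def __init__(self):
        super().__init__()
        self.values = {}
        self._select = None  # name of the <select> currently open, if any
        self._first = None  # value of that select's first <option>
        self._chosen = None  # value of the <option> marked selected, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "input" and attrs.get("type", "text") == "text":
            # An untouched text box submits its value attribute or "", never the placeholder
            self.values[attrs["name"]] = attrs.get("value", "")
        elif tag == "select":
            self._select, self._first, self._chosen = attrs["name"], None, None
        elif tag == "option" and self._select is not None:
            value = attrs.get("value", "")  # assumes an explicit value attribute
            if self._first is None:
                self._first = value
            if "selected" in attrs:
                self._chosen = value

    def handle_endtag(self, tag):
        if tag == "select" and self._select is not None:
            # With no option marked selected, a single select falls back to its first option
            self.values[self._select] = self._chosen if self._chosen is not None else self._first
            self._select = None


def untouched_values(form_html: str) -> dict:
    parser = UntouchedFormValues()
    parser.feed(form_html)
    parser.close()
    return parser.values


if __name__ == "__main__":
    print(untouched_values(FORM_HTML))  # {'name': '', 'sport': ''}
    without_blank = FORM_HTML.replace('<option selected value="">Sport</option>', "")
    print(untouched_values(without_blank))  # {'name': '', 'sport': 'basketball'}
```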
And inside of my register function, for now I'm going to cheat a little bit: I'm just going to check that the user has given me a name and a sport. So how can I express this? Well, because I've already imported the request global variable that comes with Flask, I can ask questions of it. I can say something like: if it's not the case that request.form.get("name") has a value, or it's not the case that request.form.get("sport") has a value, then let's give the user a warning of sorts. I'll return render_template of a file called failure.html. This doesn't exist yet, but no big deal. Let me go back into my terminal, go into templates, and create a file called failure.html. In this file, I'm going to say that it extends layout.html, and then it has a block body, inside of which is going to be something super trivial for now, just to get us going: this failure page is simply going to say "You are not registered!" and then endblock. So that's it: just an error page that now exists. I'm going to close it, out of sight, out of mind. But I think this will now work. If it's not the case that the user gave us a name, or it's not the case that the user gave us a sport, we'll show this error message. Otherwise, if all seems to be well, we're not going to do anything useful with the information for now, but I'm going to return render_template of success.html, which simply assumes that the user was successfully registered. So let's whip that up quickly. I'm going to code up success.html, which will similarly extend layout.html, and inside of it there's a body block that quite simply says, how about, "You are registered!" We'll just pretend that it is so, and endblock. So that's it. In short, I want two templates that show failure or success, respectively.
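The check described above can be sketched in plain Python, outside of Flask. This is a minimal sketch, not the lecture's actual app.py: the helper `choose_template` is hypothetical, and an ordinary dict stands in for Flask's `request.form` (both return None from `.get()` for a missing key).

```python
# Minimal sketch of the register route's validation, outside of Flask.
# Assumption: `form` is a plain dict standing in for request.form.

def choose_template(form: dict) -> str:
    """Return the template the register route would render (hypothetical helper)."""
    # Both None (field missing) and "" (field left blank) are falsy,
    # so `not` catches either case.
    if not form.get("name") or not form.get("sport"):
        return "failure.html"
    return "success.html"


print(choose_template({"name": "David", "sport": ""}))            # failure.html
print(choose_template({"sport": "Basketball"}))                   # failure.html
print(choose_template({"name": "David", "sport": "Basketball"}))  # success.html
```

Treating an empty string the same as a missing field matters here, because the blank "Sport" option submits an empty value rather than omitting the field.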
So I think app.py is now in better shape. I have a register route that will get called if POST is used to visit it. I'm going to check request.form, which is where you get the POST variables from, check whether a name and a sport were provided, and render a template accordingly. So let's try this. Let me go back to my other tab and go back to the form. Let me type in my name, David, but no sport. Click register, and I have an Internal Server Error, which was not intended. So let's figure out how to diagnose this. It seems that I'm at /register, which was intended, but something clearly went wrong. Now, I could just stare at my code endlessly, but recall that there should be some hints in the terminal window that's running Flask. So let me go back to that terminal, and there it is: unexpected char, a double quote, at line 11. Well, that sounds like user error, and it's in failure.html. You can kind of see it, because Flask is literally underlining it for me. What did I do that was stupid? Yeah, I just didn't close my quote. So, amateur hour here. Ironically, I do need to open that file after all. So in my other terminal, let me open up failure.html, and there it is, one stupid character away from correctness. All right, let's close this again, go back to the other tab, and try this again: David as my name, but no sport. Register. Okay: "You are not registered." I don't know why, but I know I'm not registered. Let's try again with no name, but yes, a sport. Click register: "You are not registered." And just for good measure, let's give no name and no sport: "You are not registered." So that seems to be working. Now let's cooperate and register as David for basketball. Cross my fingers. Damn it, another Internal Server Error. Let's learn from my past mistakes: let's open up success.html and eyeball it.
I made the same mistake twice, even though that wasn't copy-paste. So, zero for two. All right, let's go back. Notice that now I can just click reload, because the browser is smart enough to remember what I just posted to the server. If I click reload, I'll be prompted to confirm the form submission, lest I be doing this on a website with my credit card or something where I don't want to send it twice. But in this case I'm fine with sending my name and basketball twice, so I'm going to click continue. And this time it worked, telling me that I'm registered. I'm not doing anything with the student's data yet, but at least I'm validating that they gave me some input. Now there's a catch here. The catch, of course, with HTML is that it's all executed client-side. Suppose, for instance, that a student is really upset that we only offer basketball, soccer, and ultimate frisbee, and they really want to register for volleyball, even though we're not offering it. Well, there's arguably a security vulnerability here: technically, my code right now will tolerate any user input, even if it's not in that drop-down. After all, let me right-click or Control-click on my web page and open up the developer tools. Let me go into the form as sort of a hacker-type student, into the select menu, and, okay, no big deal: if I want volleyball to exist, I just need to know a little HTML. I'm going to right-click on that element and click Edit as HTML, which literally lets me start editing the HTML of the page. I'm going to give myself my own option: option value equals volleyball, close bracket, Volleyball. Enter. And now when I close the developer tools, woohoo, I can register for volleyball if I want. So let's select volleyball, type in maybe Kelly, who is hacking the site, and register. And she is registered for volleyball, apparently. All right.
So the takeaway here is: do not trust user input, ever, for reasons we already saw when we discussed SQL, and all the more so now that we're dealing with the web, because who knows what users are going to do, accidentally, foolishly, or even, in Kelly's case here, maliciously, trying to pass data that we did not expect. So what would be the defense against this? This is just how HTML works. Assume that I'm actually registering people for sports now and somehow Kelly is signed up for volleyball in our database. What would a solution be, logically? Yeah? >> Yeah. So maybe do some server-side validation: don't just blindly check that we have a value from the user; actually check that it's one of those sports. If I go back to app.py, I could do this in a few ways, and maybe my first instinct would be this: let's keep the check for the name, but let's also check request.form.get("sport"). Actually, let's put that in a variable just to make it easier to type, so sport equals that. If sport does not equal, what was it, basketball, and sport does not equal soccer, and sport does not equal ultimate frisbee, then render an error: return render_template("failure.html"). So now if I go back to this form and try to register as Kelly again: "You are not registered." I caught her, because volleyball, of course, is not in the list of sports that I put there. But what might you not like about this approach? Even if you've never done web stuff before, what's bad about this? >> Yeah, I have to hard-code every single sport now, not only in app.py, to check on the server the validity of what the human has typed in, but recall that the drop-down itself came from index.html, so I also have to put all of the sports there, in duplicate. That duplication just seems bad.
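The first, hard-coded check might look like this as a standalone function. The name `is_invalid_sport` is hypothetical, and the sketch assumes the option values are spelled exactly as shown (Python string comparison is case-sensitive).

```python
def is_invalid_sport(sport):
    """First attempt: one hard-coded comparison per sport (hypothetical helper)."""
    # Every allowed value is repeated here AND in index.html's <option> tags,
    # which is exactly the duplication the lecture objects to.
    return (
        sport != "Basketball"
        and sport != "Soccer"
        and sport != "Ultimate Frisbee"
    )


print(is_invalid_sport("Volleyball"))  # True: the injected option is caught
print(is_invalid_sport("Soccer"))      # False
```

Note that `and` is the right connective: a value is invalid only if it differs from every allowed sport; using `or` would reject everything.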
So better might be to do something more like this at the top of my file: why don't I give myself a global variable, which in the context of this web app is perfectly reasonable, so that I can access it anywhere. Let's call it SPORTS, in all caps, just to note that this is a global variable and a constant. Python doesn't have constants in the sense that C does, but this is sort of the honor system: if you see a variable in all caps like this, just don't mess with it. Use it, but don't mess with it. Inside of the square brackets is going to be a list of the sports that I do want to support: basketball, soccer, ultimate frisbee, and that's it. Now, instead of doing all of those comparisons, I can ask a simpler question: if sport not in SPORTS, then return render_template("failure.html"). And I can tighten this up a little. I don't need two calls that render failure.html; why don't I just borrow this code and say "or sport not in SPORTS," render a failure. Now I've tightened this up quite a bit, and I'm essentially using Python to just ask whether the sport... oops, sorry, I deleted too much. The sport variable no longer exists, so let's tighten it up further and use request.form.get("sport") directly. So if the sport that the human typed in or selected from the drop-down is somehow not in this global list of possible sports, then it's a failure: don't let Kelly, or whoever, register. And now that I have this global variable, I can be a bit smarter in my template, too. I don't need to manually write out all three of these sports there. Instead, when I render index.html itself, why don't I just pass in a variable called sports, for instance, set equal to the value of that global list. And then in my template, and here's where templating again gets interesting and starts to save you time...
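Here is a minimal sketch of the tightened check with a single global list. The helper name `is_valid_registration` is hypothetical, and a plain dict again stands in for `request.form`. One detail worth noticing: a missing sport (None) or a blank one ("") is also not in SPORTS, so the membership test covers the "no sport" case as well.

```python
# Single source of truth for the allowed sports; all caps signals "treat as constant".
SPORTS = ["Basketball", "Soccer", "Ultimate Frisbee"]


def is_valid_registration(form: dict) -> bool:
    """Mirror of: not request.form.get("name") or request.form.get("sport") not in SPORTS,
    negated, so True means the registration passes (hypothetical helper)."""
    # None and "" are not in SPORTS, so no separate "missing sport" check is needed.
    return bool(form.get("name")) and form.get("sport") in SPORTS


print(is_valid_registration({"name": "Kelly", "sport": "Volleyball"}))  # False
print(is_valid_registration({"name": "David", "sport": "Basketball"}))  # True
```

Adding a sport is now a one-line change to SPORTS, and the same list can be handed to the template.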
Let me go into index.html, delete all but the default value, the blank one, and do something like this. Jinja, it turns out, also supports loops like Python's: for sport in sports, using the curly braces and percent signs. Now I can dynamically generate as many options as I want: option value equals, in quotes, the current sport, close bracket, and then sport again. It's a little redundant, but again, this is just how HTML is: this is what the human sees, and this is the value that gets submitted to the server, in case you want one to differ from the other. Then below that option line, I can say endfor, which is a bit weird, but that's how you end a loop in Jinja. So this is kind of powerful. Whether I have three sports or 30 sports, all of the options will be dynamically generated by this template. We're starting to save ourselves time, and I can centrally manage all of the sports by just updating this global list in app.py. So let's go back to the browser, back to the form, and reload, and you'll see that the drop-down thankfully still works the same way, but all of those options were dynamically generated. Indeed, if I view page source in my browser, you'll see (and there's some extra whitespace there, because the loop was adding some whitespace on each iteration) that I still have the three sports, but not volleyball, as was my intention. So now if Kelly even tries hacking this version of the site by going in here and typing in volleyball manually, the logic will still catch it, because only those three sports are in that list. It's perfectly fine for me to register for basketball, though, because it's among the sports in that list. Questions on any of these techniques? All right, how about another type of form?
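To see roughly what the Jinja loop produces, here is a standard-library analogue in Python. It is an illustration only, not how Jinja works internally: the function `render_options` is hypothetical, and `html.escape` stands in for the autoescaping Flask enables by default for `.html` templates.

```python
from html import escape

SPORTS = ["Basketball", "Soccer", "Ultimate Frisbee"]


def render_options(sports):
    """Roughly what {% for sport in sports %} ... {% endfor %} emits (hypothetical helper)."""
    # The blank, pre-selected option has no value, so choosing nothing submits "".
    lines = ['<option selected value="">Sport</option>']
    for sport in sports:
        s = escape(sport)
        # value= is what the server receives; the text between the tags is what the human sees.
        lines.append(f'<option value="{s}">{s}</option>')
    return "\n".join(lines)


print(render_options(SPORTS))
```

Because the options and the server-side check both read from SPORTS, the two can no longer drift apart.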
Select menus are nice, but you also might see radio buttons on websites, which are the mutually exclusive little circles that you can click to choose one option or another. Let me go back to index.html and show you how those can be created as well. Instead of using a select menu, it turns out we can create a whole bunch of inputs of type radio, one for each sport. So, for sport in sports, let's output the following between this tag and the endfor: input type equals radio, and let's give it a name. The name of this radio button is going to be sport, the value of the current input is going to be the current sport, and the word the human is going to see is, as before, the sport. So notice it's just another type of input. Previously we saw text, for instance, two lines above. We also saw search last time, and email; there's a bunch of text input types. This one, though, is going to display as a radio button instead, and the human is going to see this label here. If I now go back to my other browser tab, click back, and reload the form, I should see that it's not pretty, but these are radio buttons in the sense that they're mutually exclusive. How does the browser know that I should only be allowed to select one of them? Because I used the same name for each of those radio buttons, and that means mutual exclusivity. In fact, if I view page source in the browser, you'll see that all three of the inputs that were dynamically generated (type equals radio, type equals radio, type equals radio) also have identical names. That's just how that works, and it's the only change necessary. If I now type in my name, David, select basketball, and click register, we're still up and running, because what the server gets inside of request.form is still exactly the same.
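Why the server can't tell the difference is easier to see from the encoded form data. The sketch below is a deliberately simplified model of browser form submission (an assumption, not the full HTML specification): every control contributes a name/value pair, except that a radio button contributes only if it's checked. The helper `submitted_body` is hypothetical; `urlencode` and `parse_qs` are real `urllib.parse` functions.

```python
from urllib.parse import parse_qs, urlencode


def submitted_body(controls):
    """Simplified model of form encoding (hypothetical helper): each control
    contributes (name, value), but a radio button only if it is checked."""
    pairs = [
        (c["name"], c["value"])
        for c in controls
        if c["type"] != "radio" or c.get("checked")
    ]
    return urlencode(pairs)


SPORTS = ["Basketball", "Soccer", "Ultimate Frisbee"]

# Version 1: a text box plus a <select>, modeled by its chosen value.
with_select = [
    {"type": "text", "name": "name", "value": "David"},
    {"type": "select", "name": "sport", "value": "Basketball"},
]

# Version 2: a text box plus one radio button per sport, all sharing name="sport".
with_radios = [{"type": "text", "name": "name", "value": "David"}] + [
    {"type": "radio", "name": "sport", "value": s, "checked": s == "Basketball"}
    for s in SPORTS
]

print(submitted_body(with_select))  # name=David&sport=Basketball
print(submitted_body(with_radios))  # name=David&sport=Basketball
print(parse_qs(submitted_body(with_radios)))
```

Both versions produce the same `name=...&sport=...` pairs, which is why the register route needed no changes.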
You can still access name or sport, no matter what type of input it was in the user's own browser. Questions on these techniques? All right. Well, it's kind of obnoxious that when you don't do something right on this website, like forget your name but do select a sport, all you're told, generically, is that you are not registered. It'd be much more user-friendly, better UX (user experience), so to speak, to actually tell the user what's wrong so they can fix the problem. Now, there's a bunch of ways we can do this, but I'm going to propose that we create a template called error.html whose purpose in life is just to tell the user a little more about what they did wrong. So I'm going to go back into my terminal window here and code up a file called error.html. Enter. As before, I'm going to extend layout.html, learning from my past mistakes and closing that quote. Then I'm going to add a body block down here, and inside of this block body I'm going to have some simple text: an H1 tag that just says "Error," then a paragraph tag that's going to contain some error message to be determined. And that's it for now. So I've got the template for an error screen. Now let me go back into app.py and add some logic, because app.py does know what's wrong; at the moment we're just very generically returning a failure template instead of something more precise. If I know that the user hasn't given me their name, well, let me say that in the error message. So let's get rid of these two lines and be a little more specific, like this: let's validate the user's name first. name equals request.form.get("name"). That just gives me a variable containing the user's name.
If they didn't give me a name, which I can express with just "if not name," as in, if name is blank or None, then let me return render_template of that error template, but let's pass in a specific message, like "Missing name." By passing this additional argument, called message, to the template, I can trust that Flask will dynamically output that message wherever I tell it to, using the familiar double curly braces. Meanwhile, let's validate not just the name, but also the sport. I can do this in a couple of ways. Let's do this: sport equals request.form.get("sport"). Then, if there's no sport, return render_template("error.html", message="Missing sport"), quite like the name. But we can be more specific now, too. If the sport they did give me is not in the global SPORTS list, then it's Kelly trying to register for volleyball again, so let's return render_template of error.html, but this time the message shall be "Invalid sport," or something like that. So we're being ever more clear. Otherwise, they're presumably confirmed, because we got this far logically. So if I go back to the other browser tab, go back to the form, type in no name, and just click register... okay, what did I do wrong accidentally? Let's go back to VS Code, open my terminal, and open the first terminal window, where flask run is running: "Encountered unknown tag 'body'." So I did something stupid in error.html. Let's go into error.html, and: "body block." Oh, that's subtle. I just transposed the words; it's supposed to be "block body." That was dumb. All right, block body; I think that's correct. So let's go back to the browser and reload. It's prompting me to reconfirm that I want to submit the exact same form, which, recall, had no name and no sport. But now I see an error in a good way. This is not a server error; this is my error: "Missing name."
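The order of checks described above can be captured in one hypothetical helper, `error_message`, that returns the message the route would pass to error.html (as `render_template("error.html", message=...)`), or None if the form is valid. Plain dicts stand in for `request.form`; the exact message strings are assumptions.

```python
SPORTS = ["Basketball", "Soccer", "Ultimate Frisbee"]


def error_message(form: dict):
    """Return the message for error.html, or None if the form is valid (hypothetical helper)."""
    # Checks run in the same order as in the route, so the first problem found wins.
    name = form.get("name")
    if not name:
        return "Missing name"
    sport = form.get("sport")
    if not sport:
        return "Missing sport"
    if sport not in SPORTS:
        return "Invalid sport"
    return None


print(error_message({"name": "", "sport": ""}))                   # Missing name
print(error_message({"name": "Kelly", "sport": "Volleyball"}))    # Invalid sport
print(error_message({"name": "David", "sport": "Basketball"}))    # None
```

Because the name is checked first, a form with neither field reports only "Missing name," which matches what the browser shows after resubmitting the empty form.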
Now, it's not super user-friendly, but it's at least more explanatory than "You are not registered." All right, let's go back, give it a name but no sport, and register: "Missing sport." Let's go back and give it a sport but no name: "Missing name," as before. And if I took the time to actually hack the HTML, as Kelly did before, and add volleyball, it would similarly say "Invalid sport," because volleyball is not in that same list. All right, questions on this technique? All right. Well, it's all fine and good to have a registration site that does this, but it's literally just throwing out the information. Years ago, I actually cut a corner initially: I think I wrote code that just sent an automatic email to the proctor running Frosh IMs containing the person's name and the sport for which they registered. But that was very quickly replaced by a better feature: actually storing the data on the server itself and keeping track of it, rather than just sending it off via email. So let's do a first pass at storing information on everyone who has registered for sports. Let me go up here and create another global variable to make my life easier, called REGISTRANTS, and set it equal to curly brace, close curly brace. What do these two characters represent, especially if empty? What data type is this? It's a dictionary, a Python dict. You could similarly say dict explicitly, open paren, close paren, but it's generally more Pythonic to just use the two curly braces. This just gives me an empty dictionary. Why? Well, I want to store the two things I'm collecting about all of the students: their name and the sport for which they registered. So key-value: name, sport. How can I go about doing this? Well, it's pretty trivial.
Down here in my register function, recall that I'm just naively saying you're registered, even though I'm not doing anything with their name or sport. But that's easy to fix. Let's remember the student for real now. In that REGISTRANTS dictionary, let's index into it using the student's name, David or Kelly or whoever, and set that equal to the sport for which they registered. Notice that the name is coming, as before, from request.form.get, and the sport is similarly coming from that function. So this is just remembering that key-value pair. That's all fine and good; it's in the computer's memory. But how do we actually see it? Wouldn't it be nice if, after you register, you could see the actual registrants of the website, certainly if you're the proctor trying to run the sports? Well, yes. So let's go down here and create another route, like /registrants, which is just going to give me a list of everyone who's registered. Let's define a function called registrants, though I could call it anything I want. This one's going to be relatively simple: let's render a template called registrants.html, which will soon exist, and pass in all of the registrants that are in that global dictionary. Again, I can call this placeholder anything I want, but insofar as it contains the registrants, I'm setting registrants equal to the REGISTRANTS global dictionary. So let's go into my terminal window and create registrants.html, really the beginnings of an actual Frosh IMs website that's going to show the proctor who has registered. Let me go into this terminal, run code registrants.html, and close the terminal. Let's try to get this right, finally: extends "layout.html", close quote, close bracket there. Then block body, in the right order, then endblock down here. And inside of the block, this is going to be a bit more of a mouthful, but let's use some of our HTML from last week.
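A minimal sketch of the in-memory storage step, outside of Flask: a hypothetical `register` helper writes one key-value pair into the global REGISTRANTS dict, as the route does with `request.form.get("name")` and `request.form.get("sport")`. It also shows a side effect of using the name as the key: a second registration under the same name replaces the first, which foreshadows why a unique ID becomes useful later.

```python
REGISTRANTS = {}


def register(name, sport):
    """Remember one key-value pair: the student's name maps to their sport (hypothetical helper)."""
    REGISTRANTS[name] = sport


register("David", "Basketball")
register("Kelly", "Soccer")
print(REGISTRANTS)  # {'David': 'Basketball', 'Kelly': 'Soccer'}

# Because the name is the key, registering the same name again overwrites the old sport.
register("Kelly", "Ultimate Frisbee")
print(REGISTRANTS)  # {'David': 'Basketball', 'Kelly': 'Ultimate Frisbee'}
```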
We'll have an H1 tag that says "Registrants," so the proctor knows what they're looking at. Then let's put this in a table, for instance, with two columns: names and sports. So a table tag, followed by a thead tag for the table heading. That heading is going to contain just a single row, a tr, with a th (table heading) for each column: one of which is Name, and, actually, I'll make it tighter, the other of which is Sport. These are the column headings, the table headings, th tags for short. After the head of the table, let's add a tbody, for table body. And inside of here is where Jinja comes into use. I can say for each name in the registrants placeholder that was plugged in, and proactively add the endfor. What do I want to do on each iteration? I think I want to output table row, table row, table row. So in here I can put a tr, and inside of that a td (table data) for the cell on the left, containing the student's name, which comes from this for loop, just like in Python. Then one more td, namely the registrants placeholder indexed at that name, which, because it's a dictionary, will give me the sport for that student's name. And then I think we're good to go. In fact, just to hark back to something I said earlier, actually in week five, when we were talking about stacks: your Gmail or Outlook inbox is essentially a stack, with the newest emails on top. And when we started talking last week about HTML, I hypothesized that it's just row after row after row. Here is what Google and Microsoft and others are probably doing. Anytime you have tabular information on a page, they've got some data in memory, like the registrants, and they're just using code much like this Jinja to output table row, table row, table row. Imagine this is your email instead; same exact idea. And now we have the ability to express that kind of logic. So let's go back now into the browser.
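For a feel of what the tbody loop emits, here is a rough Python analogue, not Jinja itself. The function `render_rows` is hypothetical; it loops over the dict's keys (names) and looks up each value (sport), just as the template does, and `html.escape` stands in for template autoescaping so a name containing `<` can't inject markup.

```python
from html import escape


def render_rows(registrants: dict) -> str:
    """Roughly what the tbody loop emits: one <tr> per key in the dict (hypothetical helper)."""
    rows = []
    for name in registrants:
        # Left cell: the key (name); right cell: the value (sport) looked up by that key.
        rows.append(f"<tr><td>{escape(name)}</td><td>{escape(registrants[name])}</td></tr>")
    return "\n".join(rows)


print(render_rows({"David": "Basketball", "Kelly": "Soccer"}))
```

Python dicts preserve insertion order, so rows appear in the order students registered.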
Click reload on the form. Let's register, for instance, David for basketball. Click register. It claims I'm registered, but hopefully now I'm legitimately registered, because that variable is storing it in memory. In fact, let's go now not to /register but, I'll zoom in at the top, to /registrants, and hit Enter. And we'll see a very ugly but functional HTML table containing two columns, Name and Sport (the so-called thead), below which David and basketball are present. Moreover, if we go back to that form, register Kelly, for instance, for soccer, click register, and then manually go to /registrants again, now Kelly and David are both in the server's memory. Questions, then, on what this example is doing or how it's achieving these results? Yeah? >> Really good question. If you wanted to restrict the registrants page to only certain people, ideally you would put a password on it. In fact, one of the next examples we'll do in a few minutes is a login page, for exactly that reason. Right now, it's just the honor system: only the proctor in question goes to this URL. But just for the sake of discussion, suppose that you did want the registration list to be public, if only to hype up who has already registered. Well, it's not great to just tell people to go to the /registrants URL. We can actually link them to it in a few different ways. For instance, let me open up success.html. It just says you are registered. I can do something like this: a href equals /registrants (I have control over both my HTML and my routes, so /registrants will exist), and then "See who else registered," period. This creates a nice little HTML link to that route. So let's try this. Let's go back to the form over here, register John for ultimate frisbee, and click register. All right. And now we see "You are registered."
"See who else registered." And if I hover over this, it's super small, but the browser would show me the link's URL in the bottom-left corner. And indeed, here now is John at the bottom of this table. Just to be clear, if I view page source in the browser, you'll see all of the tr elements that we dynamically generated on the server side before they were sent as such to the browser. All right, what if we wanted to do something slightly more elegant? I don't have to use this HTML hack; why don't I just show the user who has registered automatically? This is kind of a cool feature of web apps as well. In addition to importing Flask, render_template, and request, I'm also going to import a function called redirect that comes with Flask. And rather than showing success.html, I'm going to return the result of redirecting the user to /registrants. So to be clear, I'm in my register route, and instead of showing users the success page anymore, which I might as well delete at this point, I'm just going to redirect them to this list of everyone who has registered, including themselves. So if I go back over here, type in someone like Doug, who maybe will play basketball with me, and click register, watch what happens to the URL at the very top of the screen: I'm automatically whisked away to /registrants in this case. I made a change to the code, though, and the server was smart enough to reload, so Doug is now the only one in the "database." And this actually hints at a problem we should really solve. In fact, let's do this real fast: let me register myself again for basketball. Register. Now it's Doug and David. The catch, though, is that if this server ever goes offline, maybe because it needs to be updated, or it crashes, or it reboots (just as when you hit Control-C and get back to your terminal), the Flask server is no longer running, which means that global variable called REGISTRANTS, in all caps, is gone.
The memory has been freed. So if I were to rerun Flask now, as would happen automatically if the server itself rebooted, this is not great, because if I go back to the registrants page and click reload, no one has registered. And in fact, that's what happened with Doug a moment ago: because I changed my actual app.py, Flask was smart enough to realize, oh wait, the code has changed, I'd better reload the program, which gave me a brand-new, empty version of that global dictionary. So what would clearly be better than storing registrants in memory, in RAM, in a variable on the server? Yeah? Yeah, so, in an actual database. And so here, too, is where everything comes full circle and connects again. Let me go back into app.py. I generally like the logic of what I've done; I don't like the fact that I'm storing my registrants inside of this global variable, which, again, is just in the computer's volatile memory. Let's put this in a database instead. So let me get rid of this global dictionary and do something a little smarter up here: let me import, from CS50's own library, the SQL function that we've used before. Even though we've been taking off almost all of CS50's training wheels, the reality is that using CS50's SQL library, even through final projects, just makes using SQL in Python so much easier. But there are certainly third-party libraries you can use, too. Let me go down now and, in addition to creating my app, create a database, db for short, setting it equal to SQL of "sqlite:///" (three slashes, which is not a typo), and let's assume that the database shall be called froshims.db. More on that in a moment. Then down here, now that I have a database variable, let's not remember the student by storing them in this dictionary. Let's actually execute a line of SQL: db.execute of INSERT INTO... Well, wait a minute. What am I going to insert them into? Not to worry; I came prepared for this.
So let me maximize my terminal window and run sqlite3 on a file called froshims.db. This is a file I made in advance, but it's super simple. In fact, if I type .schema, just to see the design of this database, you'll see that I created a table in this database called registrants. It has a column called id, a column called name, and a column called sport, and the primary key of this table is the id value, which is just an integer. Notice, too, that I have some constraints here. I want the user to give me a name and a sport, so I've specified that those columns aren't just TEXT but NOT NULL; that is, null values should not be possible to put in here. All right, let me exit out of sqlite3 and go back into my code editor. Now I know what to insert into: INSERT INTO the table called registrants. What? Well, I want to insert the name of the student and the sport for which they registered. And the values that I want to insert are going to be whatever came from the POST request. Here's where you do not want to make yourself vulnerable to SQL injection attacks: no f-strings in here, just plugging the student's input in blindly. This is where and why we use placeholders, in both CS50's library and many libraries in the real world, to specify that I want the library to properly escape the user's input, so that potentially dangerous characters like apostrophes or semicolons can't be interpreted as SQL. So I'm going to pass in name and sport. This one line has the effect, as you recommended, of storing the registration in an actual database on the server, not just in volatile, temporary memory. But we do have to change one thing. This line here is no longer valid, because there's no global variable via which we can get all of the registrants. But that's no big deal. Here's how most web apps would do this.
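The schema and the parameterized INSERT described above can be tried with Python's built-in `sqlite3` module. This is a sketch under assumptions: it uses `sqlite3` rather than CS50's library (which passes the values as separate arguments instead of a tuple), an in-memory database instead of froshims.db, and a hypothetical `register` helper. The column layout follows the description: an integer id primary key plus NOT NULL name and sport.

```python
import sqlite3

# In-memory database for the demo; the lecture uses a file, froshims.db.
db = sqlite3.connect(":memory:")

# Schema as described: an integer id primary key plus two NOT NULL text columns.
db.execute(
    """
    CREATE TABLE registrants (
        id INTEGER,
        name TEXT NOT NULL,
        sport TEXT NOT NULL,
        PRIMARY KEY(id)
    )
    """
)


def register(name, sport):
    """Insert one registration (hypothetical helper)."""
    # The ? placeholders send the values separately from the SQL text,
    # so user input is treated as data, never as SQL.
    db.execute("INSERT INTO registrants (name, sport) VALUES (?, ?)", (name, sport))


register("David", "Basketball")
register("Kelly'); DROP TABLE registrants; --", "Soccer")  # stored as an ordinary string

print(db.execute("SELECT COUNT(*) FROM registrants").fetchone()[0])  # 2

try:
    register("Kelly", None)  # None becomes NULL, which the NOT NULL constraint rejects
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```

The hostile-looking name ends up as a harmless row, and the NOT NULL constraint acts as a last line of defense even if the route's own validation were skipped.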
I'm going to define a variable called registrants and set it equal to db.execute of SELECT * FROM registrants. It's as easy as that to get all of the registrants from my database. And down here, there's no longer an all-caps variable, but there is a lowercase one, registrants. So to be clear, in my register route I'm inserting the user into the database, and in my registrants route I'm selecting the users from the database. The rest of the code, I think, can stay the same. So let's go back to Frosh IMs here, go back to the form, and register David for basketball. Register. Ah, I did screw up. You're seeing some weirdness here. What are you actually seeing? There's one user registered, but this is not intentional. What does this syntax suggest we're looking at? This is a dictionary. Recall that the db.execute method that comes with CS50's SQL library gives you a list of dictionary objects. Because there's only one registrant at the moment, you're seeing the dictionary for my registration, which is not what I want to show here. I forgot: I also need to go back into the registrants template to tweak my syntax, as follows. Let me go back into VS Code, into registrants.html. Because I'm now passing in not a dictionary but a list of dictionaries, I just need to think about the problem a little differently. So my syntax here is going to be: for each registrant in that registrants list of dictionaries, display the current registrant's name, and display the current registrant's sport. In other words, I'm using Python-style syntax, which works in Jinja here as well. This iterates over the list of registrants, each of which is a dictionary, so I'm using dictionary syntax to index into the name key of the registrant dict, and the sport key of the same.
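With CS50's library, a SELECT comes back as a list of dicts. A rough equivalent with the standard `sqlite3` module (an assumption; this is not CS50's library) sets `row_factory` to `sqlite3.Row` and converts each row with `dict()`, after which the loop has the same shape as the updated template: iterate over registrants, then index each one by "name" and "sport".

```python
import sqlite3

db = sqlite3.connect(":memory:")
# sqlite3.Row lets each row be read by column name.
db.row_factory = sqlite3.Row
db.execute(
    "CREATE TABLE registrants (id INTEGER, name TEXT NOT NULL, sport TEXT NOT NULL, PRIMARY KEY(id))"
)
db.executemany(
    "INSERT INTO registrants (name, sport) VALUES (?, ?)",
    [("David", "Basketball"), ("Kelly", "Soccer")],
)

# Convert each Row to a plain dict: the result is a list of dicts.
registrants = [dict(row) for row in db.execute("SELECT * FROM registrants")]

# Same iteration the updated template performs.
for registrant in registrants:
    print(registrant["name"], registrant["sport"])
```

Iterating the old way (`for name in registrants`) would now hand the loop a whole dict per iteration, which is the kind of raw dictionary output seen in the browser before the template was fixed.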
So now let me go back to my browser and reload the registrants page without resubmitting the form. Now there it is: David and basketball. Let's go back to the form and register a couple more people. Kelly for soccer. Register. Notice we're at the registrants page, and Kelly is indeed registered. Let me go back and register John for ultimate frisbee. Register. Now let's kill the Flask server by going to my first terminal window and hitting Control-C, and then rerun Flask, which was bad before: that's how Doug ended up the only registrant last time. But this time, if I go back to the registrants page and immediately click reload, even though the server is running anew in memory, the database is persistent, which was the whole point of using SQL from week 7 onward. And let's do one more for good measure. If I go back to the form, we'll register Doug so he can play basketball with me, too. And we have Doug now in the database as well. It's an ugly-looking table, but the data is in fact all there. All right, questions now on this improvement, which is getting closer and closer to what the actual Froshims website did so many years ago? All right. Well, let me propose this now. We have this table of registrants. Suppose that Kelly was not very sportsmanlike when she played soccer last time, so we want to deregister Kelly from soccer. That is, nope, we're going to reject your registration. Let's think for a moment about the design here. Here's an HTML table containing names and sports. Wouldn't it be nice if we could add a button that would let me deregister Kelly, or anyone for that matter? When I click on that button, what information should ideally be sent from the browser to the server to remove someone like Kelly from the database? >> ID. >> Yeah, the ID of the person. And you're proposing ID instead of name. Why?
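One way to convince yourself of the persistence point is to simulate a restart: write through one connection, close it (as Control-C would end the server), and read through a fresh one. In this illustrative sketch a temporary file plays the role of `froshims.db`; it is not part of the lecture's code.

```python
import os
import sqlite3
import tempfile

# A temporary file stands in for froshims.db.
path = os.path.join(tempfile.mkdtemp(), "froshims.db")

# "First run" of the server: create the table and store a registration.
conn = sqlite3.connect(path)
conn.execute(
    "CREATE TABLE IF NOT EXISTS registrants "
    "(id INTEGER PRIMARY KEY, name TEXT NOT NULL, sport TEXT NOT NULL)"
)
conn.execute(
    "INSERT INTO registrants (name, sport) VALUES (?, ?)", ("Doug", "Basketball")
)
conn.commit()
conn.close()  # like Control-C: nothing held in Python's memory survives

# "Second run": a brand-new connection still finds the row on disk.
conn = sqlite3.connect(path)
rows = conn.execute("SELECT name, sport FROM registrants").fetchall()
conn.close()
print(rows)
```

A global Python dictionary, by contrast, would start out empty on the second run.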
>> The ID uniquely identifies them in that SQL table. >> Exactly. The ID uniquely identifies the user in the SQL table. In fact, let's see this real quick. If I go back to VS Code, we'll revisit essentially a week 7 issue here. Let me go back into my second terminal, where I can again run sqlite3 after maximizing my terminal. Before, I just ran .schema to see what the table is. Now I'm going to literally run SELECT * FROM registrants in sqlite3, and we'll see a little ASCII table of all four of us who registered. But we also see the unique ID, and recall from week 7 that it's the so-called primary key. It is the value that uniquely identifies users as minimally as possible, and that's a good thing because if we have another Kelly registering for Froshims, we don't want to deregister the wrong Kelly or both Kellys; we want only the Kelly with an ID of 2. So somehow the button we add to the registrants page should contain in it the ID of the person we want to delete. Because if you pass the ID of the person that you want to delete to the server, the server can execute a DELETE statement using that ID number and delete just that row. There are a few ways we can do this, but let me propose that we proceed as follows. On our registrants page, which is where we can currently see all of these users, let's output an ugly but functional form for each of those users. So, let me hide my terminal window. And in registrants.html, let's do this: in addition to outputting every registrant's name and sport, let's also output a third column whose purpose in life is to contain an HTML form. The action of that form will be a route like /deregister, and the method is going to be POST, just so that we don't accidentally store personally identifying information in a URL or such.
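To see why the primary key, rather than the name, is the right value to send, here is a small sketch with two hypothetical registrants named Kelly, again using Python's `sqlite3` as a stand-in for CS50's library. Deleting `WHERE id = ?` removes exactly one row, whereas `WHERE name = ?` would remove both.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE registrants (id INTEGER PRIMARY KEY, name TEXT NOT NULL, sport TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO registrants (name, sport) VALUES (?, ?)",
    [
        ("David", "Basketball"),
        ("Kelly", "Soccer"),          # gets id 2
        ("John", "Ultimate Frisbee"),
        ("Kelly", "Basketball"),      # a different Kelly, id 4
    ],
)

# DELETE FROM registrants WHERE name = 'Kelly' would remove both Kellys.
# Deleting by primary key removes exactly the intended row.
cursor = conn.execute("DELETE FROM registrants WHERE id = ?", (2,))
print(cursor.rowcount)  # number of rows deleted
remaining = conn.execute("SELECT id, name FROM registrants ORDER BY id").fetchall()
print(remaining)
```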
This form is going to have a button, the type of which is submit, and the button is going to say Deregister. I could now implement the ID in a couple of ways. I could do input name equals id, type equals text. Now if I go back to my other browser tab and reload, I should see a button for every one of these registrants, and I do. But this is kind of like the honor system, where I just let the user type in the ID of whomever they want to delete, and it's sort of weird that I have multiple forms in that case. Here is where dynamically generating HTML gets pretty useful. Let's change the type of this input to hidden and set the value of this input to be whatever the current registrant's ID actually is. And to avoid confusing the quotes, we'll use single quotes on the outside instead. So inside of this value I'm putting the current user's ID. If I go back now, notice that the text boxes disappear but the buttons do not, and all of that information is still there. If I right-click or Control-click and open up View Page Source, because it's just a bit bigger than the developer tools, notice that David and Kelly and John and everyone else here have the same HTML as before, plus another column containing a form that contains... I somehow messed up still. Why is this blank? So, this is still not good. Ah, thank you. I accidentally pluralized this, but it should be registrant, because I'm inside of this for loop and each iteration gives me a variable called registrant. So, user error on my part. Let's dramatically do this again. Let me view page source of the same page and scroll down a bit. Thankfully, there is now, for every one of these registrants, a hidden ID: 1 for me, 2 for Kelly, and I bet if we keep scrolling, we'll see 3 for John and 4 for Doug.
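The hidden-input technique can be sketched in plain Python so that the generated HTML is easy to inspect. This is a stand-in for the Jinja template under stated assumptions: the helper names `deregister_form` and `render_table` are hypothetical, and `html.escape` plays the role of Jinja's autoescaping. Note the singular loop variable `registrant`, which is exactly the detail that tripped up the demo.

```python
from html import escape

registrants = [
    {"id": 1, "name": "David", "sport": "Basketball"},
    {"id": 2, "name": "Kelly", "sport": "Soccer"},
]


def deregister_form(registrant):
    """One small form per row; the hidden input carries only that row's ID."""
    return (
        '<form action="/deregister" method="post">'
        f'<input name="id" type="hidden" value="{escape(str(registrant["id"]))}">'
        '<button type="submit">Deregister</button>'
        "</form>"
    )


def render_table(registrants):
    """Build the registrants table with a third column holding each form."""
    rows = []
    for registrant in registrants:  # singular loop variable, plural list
        rows.append(
            f"<tr><td>{escape(registrant['name'])}</td>"
            f"<td>{escape(registrant['sport'])}</td>"
            f"<td>{deregister_form(registrant)}</td></tr>"
        )
    return "<table>\n" + "\n".join(rows) + "\n</table>"


print(render_table(registrants))
```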
So, now this form has enough information, even though there's no user input other than the click of the button, to tell the server whom to delete. So, how do we delete the user from that registrants table? Well, I think we just need to add a route. Let me go back into VS Code here, into app.py, and create another route, which we'll put up here below index: app.route of quote-unquote /deregister, and then def deregister, though I could call the function anything I want. How do I do this? Well, let's first get the ID from the form: id equals request.form.get of quote-unquote id. Let's do a bit of a sanity check here: if there is an ID and it's not blank for some reason, go ahead and do db.execute of DELETE FROM registrants WHERE id = ?, and pass in the user's actual ID. And then, no matter what, let's redirect the user back to the registrants page so that we can hopefully see the result of that change. So again, I'm just using a bit of SQL per week 7. I'm using a placeholder by way of the question mark and passing in the actual ID from the form, I'm only doing this if an ID was passed in, and I'm letting the database actually do the deletion. All right, let's try this. Let's go back to the browser here and reload the /registrants page for good measure. Let's decree that Kelly is now deregistered by clicking this button. And oh, so close: Method Not Allowed at the /deregister route. What did I do wrong? Let me go back to the code. What's wrong with my deregister route? Well, what method is the form using? If I go back to registrants.html, the form is using POST. >> Yeah. So, I need to override the default, which is GET. I need to go up here again and add an argument, methods equals a list containing only POST. All right, let's go back to the form.
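The route's logic can be sketched without Flask as a function that receives the request method and the form data. The name `deregister` matches the route above, but the `(status, location)` return value is a hypothetical simplification of `redirect(...)`. The 405 branch mimics what Flask does when a route lists only `methods=["POST"]`, and the sample rows are assumptions.

```python
import sqlite3


def deregister(method, form, conn):
    """Framework-free sketch of a POST-only /deregister route.

    'form' plays the role of request.form; returns (status, redirect_location).
    """
    if method != "POST":
        # Flask answers 405 Method Not Allowed when a route lists only POST.
        return 405, None
    registrant_id = form.get("id")
    if registrant_id:
        conn.execute("DELETE FROM registrants WHERE id = ?", (registrant_id,))
        conn.commit()
    # Redirect back to the listing no matter what.
    return 302, "/registrants"


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE registrants (id INTEGER PRIMARY KEY, name TEXT NOT NULL, sport TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO registrants (name, sport) VALUES (?, ?)",
    [("David", "Basketball"), ("Kelly", "Soccer")],
)
conn.commit()

print(deregister("GET", {"id": "2"}, conn))   # e.g. a clicked link: nothing deleted
print(deregister("POST", {"id": "2"}, conn))  # the form's button: Kelly removed
names = [name for (name,) in conn.execute("SELECT name FROM registrants")]
print(names)
```

The form delivers the ID as a string; SQLite converts it for the comparison against the integer `id` column, so no explicit `int()` is needed here.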
And now let's try to deregister Kelly. She's gone. Let's get rid of me now. I'm gone. And indeed, if I go back to VS Code, open my terminal, maximize it, and SELECT * FROM registrants again, you'll see that the two of us are indeed gone. Questions now on this technique? Now that we have most of the plumbing in place for adding people to a database and deleting people from a database, it's very similar in spirit to most any website that has this kind of interactivity. All right, a subtle question. In my registrants.html file, I deliberately used POST, as we just discovered, instead of GET. Why, though? It wasn't that strong an argument I hinted at earlier, that I don't want Kelly's ID, or mine, to end up in my URL bar. IDs are not really personally identifiable; they're just opaque integers at the moment. So why would it be bad if you could delete people by using the GET method? This is kind of subtle, but the catch with GET is that, by definition, you can visit that route by just typing in a URL or following a hyperlink. So, for instance, suppose an adversary were to type a URL like /deregister?id=4 and then send that URL in an email to the proctor who's running the Froshims program. If that proctor simply clicks naively on the link, and my code used GET instead of POST, what's going to happen? >> Doug gets deregistered just because the proctor followed a link in their email. And this is hinting at the kinds of phishing attacks that are possible, too. It's bad design: generally, GET requests, that is, simple URLs that are clickable or typeable, should not have the effect of changing data on the server. POST is much better, if only because you can't just click a link and have a POST happen. To induce a POST request, you almost always have to click a button.
So, at least in this case, the proctor would have to receive an email, click on a link, and then see a web page like this that clearly has a button labeled Deregister or the like, which is an additional layer of protection. And there are even more attacks that you can wage against a site that changes data via GET. So in general, POST requests are preferred anytime there's anything remotely personally identifiable or remotely destructive, like actually changing data in the database like this. All right. Well, what more can or should we do with Froshims? Maybe one final flourish here. Let's make those error messages a little more interesting. Let me go back to my other browser tab here, go back to the registration page where the form is, and deliberately not cooperate: just click Register so that I get an error about a missing name. Wouldn't it be nice if we made this a little more user-friendly by including an image on the page, as is commonly the case? We can certainly include images in websites using the img tag, but the catch is that we have to be a little more clever about where we store the image on the server in order for this to work. So, for instance, let me go into that error page. We don't need success.html open anymore, nor layout.html or index.html. Let's focus on error.html. Suppose that I did want to include a grumpy cat alongside the error message. Ideally I would just do open bracket img src equals something like cat.jpeg, where cat.jpeg is the name of a cat image in this current folder. And just to be clear, let's have alternative text of "grumpy cat" for screen readers or slow connections. Okay, this unfortunately is not going to work. Let's go over here and induce the same error by just reloading and submitting the same form.
And you'll see indeed a broken image, because cat.jpeg does not exist here, but we do at least see the alternative text. Well, I did come prepared with a cat already. Let me grab this cat from another folder; it's in a file called cat.jpeg. Indeed, if I type ls now after having grabbed a copy of that cat, it exists alongside app.py. Seems good. Let's go back to the browser here and reload. And we should see... ah, still no cat. Why is this? This is a side effect of using the framework as well. It turns out, for organization's sake, any images you want to display on a page, or any CSS or JavaScript files that you want to embed in a page, if they're static assets, should actually be in a folder called static. By static, that just means unchanging: you or someone else wrote them once, and they're not dynamic in the way that app.py is. So, I'm going to use my mv command and move cat.jpeg into the static folder. Indeed, if I type ls now, cat.jpeg is gone from here, but it is in the static folder. Now if I go back over here, I think we'll be good, except that I do need to go into error.html and say that the source of this image is actually /static/cat.jpeg, to make clear it's in that folder. And so, when I now reload the page once more, I see a very grumpy cat at least accompanying my error message. But there is a difference here. When accessing the static directory, I have to be explicit. Notice that this whole time we have never once mentioned the templates directory. The render_template function, to be clear, knows automatically to look in the templates folder for your template. You do not, and should not, say something like templates/ there; you simply specify the name of the file. But in the HTML template itself, you do have to include, as I did, /static in the path. All right, let's do one final flourish with the actual code.
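Flask serves the `static` folder for you, so the following is only an illustration of the idea and not Flask's implementation: map a URL beginning with `/static/` to a file inside that folder, and refuse anything that would escape it. The function name `static_file` is hypothetical, and a temporary directory stands in for the project's `static/` folder.

```python
import tempfile
from pathlib import Path


def static_file(url_path, static_dir):
    """Map a URL such as /static/cat.jpeg to a file inside static_dir.

    Returns None for URLs outside /static/ or for paths that would escape
    the static folder (for example /static/../app.py).
    """
    prefix = "/static/"
    if not url_path.startswith(prefix):
        return None
    root = Path(static_dir).resolve()
    candidate = (root / url_path[len(prefix):]).resolve()
    if root not in candidate.parents:
        return None
    return candidate


static_dir = tempfile.mkdtemp()  # stands in for the project's static/ folder
print(static_file("/static/cat.jpeg", static_dir))
print(static_file("/cat.jpeg", static_dir))          # not under /static/
print(static_file("/static/../app.py", static_dir))  # escapes the folder
```

This is also why the template must say `/static/cat.jpeg`: only URLs under that prefix are treated as static assets.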
Suppose that it's time to modernize and let people register not just for one sport, as per the radio buttons, but for multiple sports. It's a little obnoxious to make me go back and fill out my name again and again if I want to register for two or three sports. So, why don't we, in terms of UI, change those radio buttons to checkboxes? That's a very easy fix. Let me go into my templates folder and into index.html, where this form is. If I want to change radio buttons to checkboxes, I literally just change radio to checkbox. If I go back to the browser here and reload, you'll see the familiar checkboxes now, which are not mutually exclusive. It lets me check multiple ones, thereby registering for multiple sports at once. But my logic has to change a tiny bit if I want to get all of the sports for which the user registered, and that logic lives in app.py. So where is my register route? Down here. We haven't touched this in a while, but recall that the register route here has a validate-name chunk of code, a validate-sport chunk of code, and, most recently, the INSERT INTO chunk of code as well. If the user is registering for multiple sports, I'm okay with having one row per sport, even though I'm sure we could do better than that. But how do I iterate over all of the sports that the user gave me? Well, I need to change my validation code here a little bit. If you know the user can select multiple values, as with checkboxes, you're going to use request.form.getlist and then the name of the parameter whose values you want. This gives me back a list of values. So I'm going to change my code semantically to say sports, because I'm expecting zero or more sports now instead of one. If there are no sports, we're going to just say missing sport; heck, missing sports. But then I can't simply do this.
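When several checkboxes share one `name`, the browser simply repeats that key in the submitted body, which is why `request.form.getlist` exists. Python's standard `urllib.parse.parse_qs` exposes the same shape. The small `get` and `getlist` helpers below are hypothetical stand-ins for `request.form.get` (first value) and `request.form.getlist` (all values), and the request body is an assumed example.

```python
from urllib.parse import parse_qs

# Body a browser might POST when two checkboxes named "sport" are ticked
body = "name=David&sport=Basketball&sport=Ultimate+Frisbee"
fields = parse_qs(body)  # every key maps to a list of values


def get(fields, key):
    """Like request.form.get: the first value, or None if the key is absent."""
    values = fields.get(key)
    return values[0] if values else None


def getlist(fields, key):
    """Like request.form.getlist: all values, or [] if no box was checked."""
    return fields.get(key, [])


print(get(fields, "sport"))
print(getlist(fields, "sport"))
print(getlist(parse_qs("name=Kelly"), "sport"))
```

Unchecked boxes are not sent at all, so "no sports" shows up as an empty list rather than an empty string.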
I can't just ask whether the sport for which the user registered is in the list of valid sports, because they might have given me two sports or three. So logically I should really check all of the sports that the human gave me, and I should probably do something like this instead: for each sport in the sports that the user submitted, ask the question, if that sport is not in the global SPORTS list of valid sports, then output invalid sport. So it's just a bit of tedium. We're just adding a bit of logic, but this way I'm iterating over every checkbox that the user checked and making sure they didn't do what Kelly did earlier, namely make up her own sport and submit that to me among all of the others. This should now work. Let's try. Let's reload. Oh, and actually, one other line here: we need to do it down here too. For each sport in sports, we'd better execute that INSERT line multiple times. So, let's see what happens. Let's register David for... actually, let's see who's in the database still. In the registrants, we've got John and Doug, no David or Kelly. So let's re-register David for basketball and soccer. Click Register. And now I'm indeed registered for both. And I observe that it's kind of bad design that I'm just inserting myself twice into the database. Let me open up the froshims database one last time and do a SELECT * FROM registrants. You'll see that David and David are both there. What would be a better design here, to get rid of the redundancy and to know that I'm the same person, ideally? Yeah? >> Yeah. I should probably have an ID for the person as well. This is going to complicate things more than we want to play with today, but instead of just a registrants table, I should probably have a students table that has an ID and the name of every student, and then change this table, as we've seen with the IMDb database and others.
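Putting the validation loop and the per-sport inserts together, a framework-free sketch might look like this. `SPORTS` mirrors the global list of valid sports (its exact contents here are an assumption), and the function name `register` and its string return values are hypothetical. It validates every submitted sport before inserting any, so one made-up sport rejects the whole submission.

```python
import sqlite3

SPORTS = ["Basketball", "Soccer", "Ultimate Frisbee"]  # valid choices (assumed)


def register(conn, name, sports):
    """Validate, then insert one row per sport. Returns an error string or None."""
    if not name:
        return "missing name"
    if not sports:
        return "missing sports"
    for sport in sports:
        if sport not in SPORTS:
            return "invalid sport"
    for sport in sports:
        conn.execute(
            "INSERT INTO registrants (name, sport) VALUES (?, ?)", (name, sport)
        )
    conn.commit()
    return None


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE registrants (id INTEGER PRIMARY KEY, name TEXT NOT NULL, sport TEXT NOT NULL)"
)
print(register(conn, "David", ["Basketball", "Soccer"]))
print(register(conn, "Kelly", ["Soccer", "Quidditch"]))  # made-up sport
print(conn.execute("SELECT name, sport FROM registrants ORDER BY id").fetchall())
```

As the discussion notes, storing the name once per sport is redundant; a separate students table referenced by ID would avoid that.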
I should really be storing the IDs of the students, the Harvard IDs if you will, and not just their names like this. So, there's room for improvement, but the point here is just how we can use checkboxes and get back multiple values from folks. All right, that was a lot. Questions on where we're now at? All right, to make the coding a little less tedious, what we're going to do is look at a few final examples that come pre-made, and we'll walk through the code, pointing out only what's different as opposed to the boilerplate that we keep seeing. Where we left off, recall, is that we have app.py, which is all of our logic; requirements.txt, which just enumerates the libraries that we want to use in the project; static, which now contains any static files like cats or JavaScript or CSS; and templates, which contains our actual templates. It's worth noting that we're actually following a fairly common paradigm. This is not specific to Flask. The paradigm that we've essentially been implementing is this. If this shape over here represents the human, or the user, they keep interacting with what the world generally calls a view. A view is the term of art that describes the user interface. But that view is generated by a certain type of code, namely controller logic. So app.py is technically what the world would call controller logic, or business logic, to use an industry term. And that controller code, aka app.py, is generating one or more views. So the views we're referring to here are everything in your templates folder. Those are your views. But there's a third piece of the puzzle that we just introduced, which is generally called a model. Initially my model was just a stupidly simple dictionary in memory, and that evolved eventually into froshims.db. So your model is generally your persistent data, where you're storing the data related to the application.
And even though the picture doesn't lend itself to pronouncing it in the right order, this is what's known as the MVC paradigm: model, view, controller. It's a very common way of developing web apps, by thinking about the different problems you need to solve with this kind of nomenclature. I've got to implement my controller, which does all of the logic: all of the variables, functions, conditionals, loops, and so forth. I've got to implement the view, which contains everything the user sees and interacts with, like the HTML. And I've got to eventually implement the model, which is all of the backend data storage and such. The catch, though, is that this is not a clean line, because clearly in views we've seen variables, loops, and conditionals. So this is just a general mindset to have, and in the real world, if you ever explore web apps again, you are henceforth familiar with what's known as this MVC model. But now let's solve some other real-world problem. Here's what you see when you sign into something like Gmail, or really any other website that asks for a username and then eventually a password or some such thing. This is just a web form. It looks a lot prettier than mine because they're using some fancy CSS to make things blue and nicely indented and so forth, but it's just HTML underneath the hood, with probably an input of type text to give me this text box. Of course, when you log into Gmail after providing your password, somehow Gmail remembers, often for days or even weeks, that you have logged in already. Now, how is that actually working? Well, when you first log into a site like Gmail and click Submit, or the Next button in this case, presumably the browser is submitting, in a virtual envelope so to speak, a message like this to Google's servers: POST /something to accounts.google.com, which happens to be the domain that Google typically uses for this.
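The MVC split can be made concrete with a toy example: a class wraps the database (model), a function produces HTML (view), and another function ties the two together for a request (controller). All names here are hypothetical, and the separation is deliberately tidier than in a real Flask app, where templates also contain loops and conditionals.

```python
import sqlite3
from html import escape


class RegistrantModel:
    """Model: owns the persistent data and the SQL that touches it."""

    def __init__(self, conn):
        self.conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS registrants "
            "(id INTEGER PRIMARY KEY, name TEXT NOT NULL, sport TEXT NOT NULL)"
        )

    def add(self, name, sport):
        self.conn.execute(
            "INSERT INTO registrants (name, sport) VALUES (?, ?)", (name, sport)
        )
        self.conn.commit()

    def all_registrants(self):
        return self.conn.execute(
            "SELECT name, sport FROM registrants ORDER BY id"
        ).fetchall()


def registrants_view(rows):
    """View: turns data into what the user sees (here, an HTML list)."""
    items = "".join(f"<li>{escape(name)}: {escape(sport)}</li>" for name, sport in rows)
    return f"<ul>{items}</ul>"


def registrants_controller(model):
    """Controller: handles the request, asks the model, hands results to a view."""
    return registrants_view(model.all_registrants())


model = RegistrantModel(sqlite3.connect(":memory:"))
model.add("David", "Basketball")
model.add("John", "Ultimate Frisbee")
page = registrants_controller(model)
print(page)
```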
And inside of this, the dot-dot-dot is your username and password and anything else that might be submitted to the server. Ideally, the server responds to you with 200 OK: here is your inbox, you logged in successfully. But underneath the hood, every time you've been logging into Gmail, it has also been planting a cookie on your computer. You might be generally familiar with cookies. They have kind of a bad rap, because they are used quite frequently for tracking, for advertising, and really for keeping eyes on you in some way. But in their basic form, they're just a feature of HTTP, and a wonderfully useful one, because it solves some typical problems. This is another HTTP header that is usually inside of those virtual envelopes that come back from servers to browsers. In addition to telling the browser what type of content is in the envelope, the server might tell the browser: please set the following cookie. A cookie is just a key-value pair. It might be something like session literally equals some value, and that value is usually a random string. It might be 123456 or something like that, but it's a unique identifier. Or, naively, if Google implemented cookies poorly, they could technically tell your browser to store a cookie on your computer containing your username and password. Why? So that tomorrow, when you open up Gmail, you're not prompted again with the stupid form to log in. The site already knows from your browser that you're logged in, and your browser can do that by just sending the same cookie it got yesterday to the server. Now, it's generally bad to use cookies to store usernames and passwords, because it's putting very precious data in the browser's memory, and any sibling or roommate who walks over to your browser can now find your username and password by just poking around your cookies.
So generally what servers do is more like this screenshot here, whereby all the server does is put a big random value on your computer somewhere, essentially a text file containing a big random value, and that is equivalent to a hand stamp. If you go into a bar or a club or an amusement park, generally you show your ticket once when you go in, and thereafter you just show your hand if you want to be able to come and go again and again. Right now my hand has not yet been stamped. We have this nice smiley-face sticker here, so now I might have a smiley face on my hand, and anytime I want to go back into the bar or club or amusement park, they know: oh, we already checked who you are, presumably the very first time that you came in. That's all cookies are effectively doing: putting a virtual hand stamp in your browser. The next time you go to Gmail and click on a link or click on an email, your browser, unbeknownst to you, will send a GET request that looks like this but also contains a line like Cookie: and then that same key-value pair. It's like presenting your hand stamp again and again every time you open an email or click on a link in Gmail. This Cookie header is what the browser sends. This Set-Cookie header is what the server sends. So Set-Cookie is the act of stamping your hand, and Cookie is the act of presenting your hand. And that, effectively, is how browsers and servers remember who you are. This is also how advertisers generally remember who you are, because at one point or other they put a cookie on your computer, and, unbeknownst to you, as you go to this website and this website and this website, your browser has been presenting this hand stamp all this time, so advertisers know: oh, that's David again, that's David again, and that's David again, because they're seeing the same hand stamp.
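Python's standard `http.cookies` module can show both halves of the hand-stamp exchange: the server emitting a `Set-Cookie` header with a random value, and later parsing the `Cookie` header the browser sends back. The cookie name `session` follows the example above; the 16-byte token length is an arbitrary assumption, and the browser side is simulated with a string.

```python
import secrets
from http.cookies import SimpleCookie

# Server, on login: "stamp the hand" with a random, hard-to-guess value.
outgoing = SimpleCookie()
outgoing["session"] = secrets.token_hex(16)
set_cookie_line = outgoing.output(header="Set-Cookie:")
print(set_cookie_line)

# Browser, on every later request: send the same key-value pair back.
cookie_line_value = f"session={outgoing['session'].value}"

# Server, on that request: parse the Cookie header and recognize the stamp.
incoming = SimpleCookie()
incoming.load(cookie_line_value)
print(incoming["session"].value == outgoing["session"].value)
```

Nothing secret travels in the cookie itself; the random value is only useful as a lookup key on the server.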
And so one of the reasons why last week, for instance, I kept opening things in incognito mode, which you might generally use if you want to do something private and not have it saved in the computer's memory, is that incognito mode gets rid of all of your cookies when you close the window, effectively wiping off the hand stamp before the next time you go to that same website. So that's all a cookie is. It's a key-value pair that can be planted on your computer, but it's a wonderfully powerful mechanism for implementing, and this is the juiciest idea for today, I'd argue, what are called sessions. Sessions are this feature whereby browsers and servers seem to have a persistent connection to each other, even though HTTP is what we'll call stateless. Stateless just means that each request stands on its own: you don't have a constant connection to the server when you are using a website. That's not always true; nowadays you sometimes do have a persistent connection. But cookies allow you to close your laptop, even shut down your computer, come back the next day, and still have the illusion of being connected just as you were the previous day, because of this virtual presentation of hand stamps. So, more concretely, you can think of a session in Python as a dictionary of key-value pairs that you can associate with each and every user. That is to say, when I log into a website that is using sessions implemented with cookies, it can store any number of key-value pairs about me in the server's memory, and my presentation of the hand stamp ensures that the server knows which key-value pairs to associate with me. Let me go back into VS Code here and cd into a directory I came with, called login, which is just a relatively simple Flask application that demonstrates how you can implement the ability to log into a website. And we'll keep it super simple, with just usernames and no passwords.
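Conceptually, a server-side session store is a dictionary of dictionaries keyed by the random value in the cookie. The sketch below keeps everything in memory and uses hypothetical names (`SESSIONS`, `get_session`). Flask-Session's filesystem backend persists each user's data to files instead, but the lookup idea is the same.

```python
import secrets

# Hypothetical in-memory store: session ID (the cookie's value) -> that user's data
SESSIONS = {}


def get_session(cookie_value):
    """Return (session_id, session_dict), starting a new session if the ID is unknown."""
    if cookie_value in SESSIONS:
        return cookie_value, SESSIONS[cookie_value]
    session_id = secrets.token_hex(16)
    SESSIONS[session_id] = {}
    return session_id, SESSIONS[session_id]


# First visits: neither browser presents a cookie yet.
david_id, david_session = get_session(None)
david_session["name"] = "David"
kelly_id, kelly_session = get_session(None)
kelly_session["name"] = "Kelly"

# Next day: David's browser presents the same "hand stamp".
_, returning = get_session(david_id)
print(returning["name"])
```

An unknown or forged value simply starts a fresh, empty session rather than exposing someone else's data.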
But as you'll see in problem set 9, we'll add some passwords to the mix as well. If I type ls inside of this login directory, you'll see some familiar friends: app.py, requirements.txt, and templates. But let me draw our attention to one other library we're now going to start using, called Flask-Session. Flask-Session is just a third-party library that gives us the ability to use cookies, and sessions built on them, in our application without having to know or understand any of the screenshots of HTTP requests we just saw. It suffices to stipulate: okay, someone figured out how cookies work; I just want to use them now as a feature, so that when a user uses my website, I can associate data with them, like who they are, what their username is, and therefore that they've logged in. So, let's close requirements.txt and open up app.py in this case. Here is an implementation of a program whose purpose in life is to enable me to log in. In fact, before we walk through the code, let me run flask run in this terminal. I already hit Control-C in my other terminal window a moment ago. Let me now go into my other tab up here and reload the / route, which is now going to be this login app instead of Froshims. All this website does by default is tell me, first, you are not logged in, but here's a link to log in. It's a little small, but if you look in the bottom left-hand corner of my browser right now, it's a URL that ends with /login. I can see that more clearly if I view page source in the browser. Here is the only thing I'm really seeing in this web app so far. But notice what happens now. If I click on log in, the route in my URL just changed to /login. I'm again keeping it simple with just usernames, no passwords, and I'm going to log in as David and click Log In. But first, let me show you the code. In View Page Source, I have a form that submits to /login using the POST method.
The only thing interesting about that form is that it's got a text box and a login button, same as we've seen before. So, let's click it. Now I click Log In. And notice I get whisked away back to the original route, the / route, even though Chrome is hiding the slash from me, but the website somehow knows that I'm logged in as David. In fact, if I open up my page source in the browser, I'll see that now it doesn't say you are not logged in. It says I am logged in as David, and it's now giving me, apparently conditionally, a log out link. So I argue this is representative of any website that lets you log in and out of it. So how does this work? Well, in my login app here, what do we have in app.py? The following. I've got from flask import Flask, redirect, render_template, request, and a new one, session, which you can essentially think of as a dictionary where you can store key-value pairs for each and every user. Flask will make sure that your code has a different copy of session for every user that visits. You can just treat it as though you only have one user, but Flask will ensure that when a user visits, they get their own copy of session, and the next user gets their own copy, and so on, essentially to store whatever you want. This next line here, I just need to copy and paste: from flask_session import Session, with a capital S. This line is the same as before: turn this file into a Flask app. This stuff is new, and fine to copy and paste. It just says: configure this app to use sessions by storing the session data on the server as files, instead of in a database or somewhere else. This is the default that we use for our examples. All right, what's going on here? Well, in my / route, I've got an index function whose purpose in life seems to be to render a template called index.html and pass in a name placeholder whose value is session.get("name").
So whatever name is stored in the session, if any, gets passed into the template. Let's go down this rabbit hole. Let me open up index.html. Interesting. Here is the logic that implemented the two different versions of the homepage that we saw. If the name has a value, that is, if it's not empty, we see you are logged in as such and such, and here's a log out link. If, though, there was no name, as happens by default before you even log in, you see you are not logged in, and here's a link to log in. So that's all the homepage is: conditional logic checking if there is in fact a user logged in. All right. Let's go back to app.py. How does the login work? If you find your way to the login route, then I'm asking a question. If the user got here via POST, they probably got here by clicking the login button that I gave them. So, let's store in the session dictionary the key "name" and make the value of that key this value here, where what I've just highlighted is whatever the user typed into the form, whether it's David, Kelly, John, or anyone else. That's what comes back from the form, and I'm just storing that in the session, which again is like a special global variable that you get one of per user, implemented underneath the hood by way of cookies, or these hand stamps. Then I'm just redirecting to the / route. Otherwise, if the request method wasn't POST, that means the user just newly visited the login page of example.com, or whatever my website is, and that's why I show them login.html. All right, let's go down that rabbit hole. Let's open up login.html. It's pretty simple. It's just a form that has a text box and a submit button, but the most important part is that, as we saw in the browser, it submits to /login, the route we just saw. All right, going back here: how do you log out? We didn't actually click this, but here is how you can delete the contents of the session and actually log the user out. You just call session.clear().
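Stripped of Flask, the three routes boil down to a few lines operating on a per-user `session` dictionary. In this sketch `session` is a plain dict passed in explicitly (Flask provides one per user implicitly), and the tuple return values are hypothetical placeholders for `render_template` and `redirect`; the page text only paraphrases index.html.

```python
def index(session):
    """Mirror of index.html's conditional on session["name"]."""
    name = session.get("name")
    if name:
        return f"You are logged in as {name}. (log out)"
    return "You are not logged in. (log in)"


def login(session, method, form):
    """POST stores the submitted name; GET would show the login form."""
    if method == "POST":
        session["name"] = form.get("name")
        return ("redirect", "/")
    return ("render", "login.html")


def logout(session):
    """Forget everything stored for this user."""
    session.clear()
    return ("redirect", "/")


session = {}  # Flask supplies one of these per user; here it is explicit
print(index(session))
print(login(session, "POST", {"name": "David"}))
print(index(session))
print(logout(session))
print(index(session))
```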
And so, in fact, if I go back over here and click log out, how does the server know that I've logged out? Well, that route, so quickly that you didn't even see the URL bar change, logged me out by clearing the whole session. And so the cookie that was planted on my computer was essentially deleted at this point in time. Or really, the server-side data associated with that cookie was deleted, so I'm no longer seeing it at all. So that's kind of it. If you log into a website, whether it's Facebook or Gmail or Outlook or anything else, that's effectively how they're logging you in, though of course they're adding into the mix passwords and other security as well. All right, how about one other example? Let me go back into VS Code, go into my first terminal, and hit Control-C to kill this login example. Let me type cd to go back and then cd store, which implements the simplest of web stores, some kind of e-commerce site that has an actual shopping cart. Let me do flask run inside of this directory. Then I'll open up my other terminal window, type cd to go back, and go into store here, where I'm going to see some familiar files, namely app.py and requirements.txt, but also a database file this time, in addition to my templates. Well, let's see what's inside of that database. Let me go ahead and run sqlite3 store.db and then .schema to see what's in the database. Ah, this is like a bookstore, like the very first version of amazon.com, if you will. And the table has two columns, an id column and a title column, for all of the books that this store shall sell. Well, what are those books? SELECT * FROM books, semicolon. Okay, so this is a bookstore that sells only five books, among them The Hitchhiker's Guide to the Galaxy and its sequels. All right. So wouldn't it be nice if we had a website that displays everything in this catalog and lets me add things to my cart?
And in fact, here is maybe the better metaphor for what a session is. A session essentially gives you the ability to implement a shopping cart like this, where the shopping cart, as in the real world, is specific to each user. If I'm on amazon.com and Kelly's on amazon.com and we're both logged in, we obviously don't see the contents of each other's carts. And that's because we have separate cookies, or handstamps, on our hands. And so Flask, or whatever Amazon is using, creates the illusion that we each have our own global dictionary called session, in which Amazon can store any key-value pairs it wants, like what's in our shopping cart. So let's try this. Let me go back to my other browser tab and reload. I'll now see not the login example but the bookstore example. And it's super ugly because I whipped it up using the simplest of HTML. But you'll see here every one of the books in the database, plus an Add to Cart button. And even if you're sort of new to all this web programming, there's not all that much you can do with HTML to achieve this result except use forms, maybe with some hidden elements. If I view the page source, here we have the H1 tag with Books. Here's an H2, which is big and bold but not quite as big. Here's the form, with the button for The Hitchhiker's Guide to the Galaxy. As an aside, because there's an apostrophe in the book's name, this here is just an HTML entity that Flask is outputting for me, even though the database itself just stores the apostrophe. So what does the button do for The Hitchhiker's Guide to the Galaxy? Well, it's in a form whose action is /cart, presumably because I want to add it to my cart, and whose method is POST. I've got an input whose name is id, whose type is hidden, and whose value is 1. And if we fast-forward, the next forms' values are 2, 3, 4, and so on. So just like the deregister example with Kelly, each book is going to be addable to a cart, instead of removable, by way of that unique ID. And indeed, every form has an Add to Cart button.
So what's happening then on the server? Well, let's take a look at the other tab here. If I go back into VS Code, let's minimize the terminal window here and look inside of store. Let's open up our template index.html, which is sort of the entry point. Oh, which is not there. Let's open up app.py first and figure out what's going on. So at the top we have some imports, including our SQL library. We have an app variable being created, and a db variable being created using that same store.db. We've got this boilerplate code, which again just enables sessions and stores their contents on the local file system instead of in a database. Ah, here's the interesting starting point. How did I see that big page with all the books and the buttons? Well, for the slash route, we've got this function that first uses some SQL to get all of the books from the database: SELECT * FROM books. And then, ah, there's no index.html because I called it books.html in this case, just because. And I set the books placeholder equal to the value of the books variable. All right, let's go down this rabbit hole now. Let's open up the templates folder's books.html file. Okay, so here we have that H1 with Books, and then we have a for loop, which is going to output, for every book, an H2 tag and a form tag, again and again and again. Each form has a hidden value equal to the current book's ID, while the H2, of course, contains the title of the book, which is more human-friendly. So what happens when I actually click Add to Cart for The Hitchhiker's Guide to the Galaxy? Well, I should indeed see that now one book has been added. And if I go back and add another, like The Restaurant at the End of the Universe, I now have two books in my cart. So where is that data actually being stored? Well, let's go back to VS Code here, hide the terminal, and focus on the cart route.
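In the lecture, books.html is a Jinja template that loops over the rows from SELECT * FROM books. As a rough plain-Python stand-in (the function render_books and the exact markup are assumptions, not the course's files), the loop below emits an H2 and a hidden-input form per book. Python's html.escape turns the straight apostrophe in a title into the entity &#x27;, which is the kind of thing you see in page source; Flask's templates escape values automatically, though the exact entity spelling may differ.

```python
from html import escape


def render_books(books):
    """Build the catalog page body: an <h2> and an "Add to Cart" form per book.

    Each book is a dict with "id" and "title" keys, similar to the list of
    dicts that the CS50 SQL library returns for SELECT * FROM books.
    """
    parts = ["<h1>Books</h1>"]
    for book in books:
        # Escape the title so characters like ' or & can't break the markup.
        parts.append(f"<h2>{escape(book['title'])}</h2>")
        # The hidden input carries the book's unique ID to the /cart route.
        parts.append(
            '<form action="/cart" method="post">'
            f'<input name="id" type="hidden" value="{int(book["id"])}">'
            '<button type="submit">Add to Cart</button>'
            "</form>"
        )
    return "\n".join(parts)


page = render_books([
    {"id": 1, "title": "The Hitchhiker's Guide to the Galaxy"},
    {"id": 2, "title": "The Restaurant at the End of the Universe"},
])
print(page)
```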
The cart route, because it supports POST in addition to GET, is doing this for me. Well, first it's checking with some logic here: if there is no cart in the session, go ahead and create a key called cart and set it equal to an empty list. In other words, I can put any key-value pairs into the session that I want. So if I want my shopping cart to effectively be a list of all of the books that the user has added to it, it stands to reason that the cart by default should just be an empty list when they first arrive. However, if the user clicked submit in order to get here, well, I'm going to do this. I'm going to get the ID of the book that they've submitted via that form. And if it indeed exists, and it's not someone like Kelly messing around and sending me an invalid parameter, I'm going to append the book ID to the cart list in the session. And then I'm just going to redirect the user to the cart. A redirect like this causes the browser to follow up with GET, not POST. And so when I come back to this cart route, I'm not going to be using POST; I'm going to be using GET, which means this chunk of code here is executed. I have a variable called books, set equal to the results of SELECT * FROM books WHERE id IN the following parenthesized list of IDs. Recall that IN is the keyword that lets me match multiple IDs if I so choose. And then I'm rendering cart.html with those books. And if I go back to the application, that's why I'm seeing two elements here. Indeed, if I view the page source, I'll see two list items inside of an ordered, or numbered, list containing the contents of that shopping cart. All right. So if we now have the ability to use sessions to remember who has logged in, and the ability with sessions to remember what someone has added to their shopping cart, what else can we do with web applications more generally, even if not using sessions?
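The cart route's two branches can be approximated with Python's built-in sqlite3 module, using a plain dict in place of Flask's session. This is a hedged sketch, not the lecture's app.py: the sample rows stand in for store.db, the helper names add_to_cart and cart_titles are hypothetical, and the all-digits check is an assumed form of the "make sure it's valid" test. Note how the IN clause needs one ? placeholder per ID in the cart.

```python
from __future__ import annotations

import sqlite3


def make_store() -> sqlite3.Connection:
    """An in-memory stand-in for store.db with a books(id, title) table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT NOT NULL)")
    db.executemany(
        "INSERT INTO books (id, title) VALUES (?, ?)",
        [
            (1, "The Hitchhiker's Guide to the Galaxy"),
            (2, "The Restaurant at the End of the Universe"),
            (3, "Life, the Universe and Everything"),
        ],
    )
    return db


def add_to_cart(session: dict, book_id: str | None) -> None:
    """POST branch: ensure a cart exists, then append the submitted ID if it looks valid."""
    if "cart" not in session:
        session["cart"] = []
    # Hypothetical validation: only accept all-digit IDs, ignoring junk parameters.
    if book_id and book_id.isdigit():
        session["cart"].append(int(book_id))


def cart_titles(db: sqlite3.Connection, session: dict) -> list[str]:
    """GET branch: SELECT the books whose IDs are in the cart via WHERE id IN (...)."""
    ids = session.get("cart", [])
    if not ids:
        return []  # empty cart: nothing to look up
    placeholders = ", ".join("?" for _ in ids)  # one ? per ID, e.g. "?, ?"
    rows = db.execute(f"SELECT title FROM books WHERE id IN ({placeholders})", ids)
    return [title for (title,) in rows]


db = make_store()
session = {}
add_to_cart(session, "1")
add_to_cart(session, "2")
print(cart_titles(db, session))
```

Generating the placeholders, rather than pasting the IDs into the SQL string, keeps the user's input bound as parameters, the same protection the ? placeholders give in the lecture's queries.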
Well, let me go ahead and close this tab here. Let me go back to VS Code, close out these two examples, and do a final set of examples that demonstrate what we can do with some real-world data and a web application. Lastly, I have a directory called shows, which is evocative of our use of IMDb in the past. I'm going to go into my first terminal window, hit Control-C, and call your attention to one thing before we move on. Every time I've executed a SQL query in my code, you'll see in my first terminal window, where Flask is running, the actual SQL commands that were sent to the database, in green for success or in yellow or red for some issues. This is useful if you mess something up at some point related to a database query: you can see in the terminal where you're running flask run exactly what SQL command was sent to the database, and troubleshoot errors that way. Otherwise, you're just flying blind, interacting only with the web browser. But for now, let me go ahead and clear that away, cd back to my default directory, and cd now into shows, where if I type ls, we'll see a whole bunch of files: app.py, requirements.txt, and this time shows.db, which is the very same database we had in past weeks, when we played with the very large number of shows in the Internet Movie Database. And what does app.py do here? Well, it implements the simplest of programs. It gives me access first to shows.db, with some boilerplate up top. If I scroll down here, you'll see that there's an index.html template that's rendered by default. And then apparently there's a search route, which is akin to what Google does for us, as when we searched for cats and dogs in the past. But for the first time I'm implementing my own search engine, for TV shows, not for dogs and cats. So what does this search route do?
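The colored SQL echo in the Flask terminal appears to come from the course's own SQL library. If you're working with Python's built-in sqlite3 module instead, Connection.set_trace_callback offers a similar debugging aid by handing you each statement SQLite executes. This stdlib sketch is an analogue, not what produces the lecture's output; the log list is just an illustrative place to collect statements.

```python
import sqlite3

log = []  # every SQL statement SQLite actually runs is appended here

db = sqlite3.connect(":memory:")
db.set_trace_callback(log.append)  # stdlib hook: called with each executed statement's text
db.execute("CREATE TABLE shows (id INTEGER PRIMARY KEY, title TEXT)")
db.execute("SELECT * FROM shows WHERE title = ?", ("The Office",))

for statement in log:
    print(statement)
```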
Well, it creates a shows variable by executing the SQL SELECT * FROM shows WHERE title = ?, passing in, just like Google does, the q parameter for the query, and then it renders a template called search.html, passing in those shows as a placeholder. In other words, what does this do? Well, let me go back over to the store tab here and change the URL to just slash. Because I'm no longer running the store, I do want to run flask run in my first terminal window to start off the shows application instead. So if I now go back to that tab and reload, what I see is the simplest of search boxes, like our Google example, asking for a query, but this time I can search for things with which I'm more familiar, like The Office, with a capital T and a capital O, and then click search. And what I get back, not all that enlighteningly, is the title of every show that matches exactly that. If I go ahead and view page source, you'll see that what was generated was an unordered list of Offices that are in the database. And recall there's the British one, the American one, and a bunch of others as well. However, this form doesn't work for everything. If I type in something like the office, all lowercase, and search, I get no results in that case, which isn't so much a bug as just a lack of features. And so let me go back into VS Code here and propose that we come up with a better version of this code. In fact, I'm going to go into the premade examples with which I came today, into the next version of shows here, run flask run, reload the application over here, and show you that the office in lowercase does actually work. Moreover, it searches for anything that mentions the office.
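Why does lowercase "the office" find nothing in this first version? In SQLite, = on a TEXT column compares case-sensitively by default. A small reproduction with Python's built-in sqlite3 module, using made-up rows rather than the real shows.db and a hypothetical search_exact helper:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE shows (id INTEGER PRIMARY KEY, title TEXT NOT NULL)")
# Made-up sample rows, not real IMDb data.
db.executemany(
    "INSERT INTO shows (title) VALUES (?)",
    [("The Office",), ("The Office",), ("The Office Hours",)],
)


def search_exact(q: str) -> list:
    """Like the first version: WHERE title = ?, with the user's q bound to the placeholder."""
    rows = db.execute("SELECT title FROM shows WHERE title = ?", (q,))
    return [title for (title,) in rows]


print(search_exact("The Office"))  # ['The Office', 'The Office']
print(search_exact("the office"))  # []
```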
So if you had to guess how this might be implemented underneath the hood: if I open up my other terminal window, go into that same directory, shows1, and open up this version of app.py, you'll see that instead of using a simple equality test like before, I'm now using the LIKE keyword, because I'm checking whether the title is like the office. And notice, this is a bit clever, or a bit confusing at first glance. The placeholder I want is a question mark, but I don't want to search for just the user's input. I want to tolerate zero or more characters to the left, via SQL's wildcard, and zero or more characters to the right, so I'm concatenating a percent sign onto each side of the user's input, as we saw in week seven with SQL. This just means: look, case-insensitively, for anything that has "the office" in it, no matter where that string appears in the title. How did it know to render that, though, as this bulleted list of all of these Offices? Well, let me go into my terminal here and open up search.html, which is the template that the search route is using. You'll see that I'm just iterating over each of those shows with a Jinja for loop and outputting a list item for each of those matches, effectively just as I did before. But there's another technique altogether that's generally going to open up more possibilities for us in final projects, if not beyond: creating essentially my own API. Rather than just make a web app that spits out the entire HTML page that I want the user to see, wouldn't it be nice if I could create routes that spit out just the data I want, so that I, or even some third party making a website with the same data, can integrate my application into their own? And indeed, an API is an application programming interface. In this context, it's essentially a set of web-based functions you can call, generally using HTTP, to get data from someone else's service.
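The shows1 fix can be reproduced the same way. This sketch, again with made-up rows and a hypothetical search_like helper, adds the % wildcards in Python and still binds the whole pattern to a single ? placeholder. SQLite's LIKE ignores case for ASCII letters by default, which is why lowercase input now matches.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE shows (id INTEGER PRIMARY KEY, title TEXT NOT NULL)")
# Made-up sample rows, not real IMDb data.
db.executemany(
    "INSERT INTO shows (title) VALUES (?)",
    [("The Office",), ("Welcome to the Office Party",), ("Parks and Recreation",)],
)


def search_like(q: str) -> list:
    """Like shows1: wrap the input in % wildcards, then bind it to the ? placeholder."""
    pattern = "%" + q + "%"  # zero or more characters on either side
    rows = db.execute("SELECT title FROM shows WHERE title LIKE ?", (pattern,))
    return [title for (title,) in rows]


# Matches both titles containing "the office", regardless of case.
print(search_like("the office"))
```

One caveat: any % or _ the user types will itself act as a wildcard; SQL's ESCAPE clause is the usual way to treat them literally.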
And you can return the data in any number of formats: plain text, HTML, or something called JSON, which is short for JavaScript Object Notation and looks a little something like this, quite like Python lists and dictionaries combined. But notice here, with a wave of the hand, that there's a whole bunch of key-value pairs in this particular example of all of the Offices that are in IMDb's database. And so I wanted to show us these final versions of this same shows application, which work a little bit differently. If I go into, say, the shows2 example here and, whoops, first exit out of the previous copy of Flask, then run flask run inside of shows2: if I go back to this web form now, notice that there is no more search button, because this is meant to be highly interactive, and I can just start typing the office. And you'll notice that this is effectively autocomplete, which we got a taste of last week with JavaScript, which I am in fact using here. But how is this working? Well, let me reload and open up my developer tools. In developer tools, let's watch the Network tab this time, because when I type in something like T, you'll see that my web page suddenly made a request to my own /search route. And if I click on that request in my developer tools and look at the response that came back, you'll see that the /search route spit out not a full web page, but just a whole bunch of LI tags. Now, why is that? Well, let me go back to VS Code and open up app.py in my other terminal. In app.py, scrolling down to search, you'll see that when I get shows from the database, I'm still using search.html, which previously extended my layout and plugged in that whole unordered list.
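What /search returns in shows2 is only a fragment. The lecture produces it with a Jinja template; as a hypothetical plain-Python equivalent, search_fragment below returns bare li elements with no html, body, or ul around them, escaping each title so characters like & don't break the markup. Whatever JavaScript calls /search is then responsible for placing them inside its own ul.

```python
from html import escape


def search_fragment(titles: list) -> str:
    """Return only <li> elements, with no <html>, <body>, or <ul> around them."""
    return "\n".join(f"<li>{escape(title)}</li>" for title in titles)


print(search_fragment(["The Office", "Tom & Jerry"]))
# <li>The Office</li>
# <li>Tom &amp; Jerry</li>
```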
But this time, if I go into this version of search.html, you'll see that I'm only spitting out a fragment of raw HTML, because I'm assuming that maybe someone, myself included, wants to use /search just to get a whole bunch of list items that they can put into their own unordered list, or UL tag. And so what's effectively happening over here is that every time I type a letter, notice at bottom left, another HTTP request goes across the internet, and another, and each of those returns the set of LI elements that line up with the query I've typed in. But this is arguably a little sloppy, insofar as I'm returning a chunk of HTML out of context, and I'm dictating to the user that they have to use list items. Wouldn't it be nice to just send the raw data? I can do that, too. Let me go back into VS Code here and look at our final example, shows3, inside of which is a version of this code that now returns that so-called JavaScript Object Notation. If I go into shows3, run flask run, go back over to my browser tab, click reload, search for, say, T, and click on that request's row, notice that in the Response tab of my developer tools I'm now getting back a whole bunch of juicy information: a massive chunk of JSON data. The square bracket means here comes a list, or array. The curly brace means here comes a dictionary, or dict. And indeed, that's what I'm seeing. This looks like Python, but it's technically JavaScript, specifically JavaScript Object Notation. It's just the juicy data I'm getting back from the server. And if you think way back to week zero, and even our family weekend lecture on AI, where I was writing code that talked to OpenAI's so-called API to get responses for our server-side cat, they were sending us JSON like this, and I was just grabbing the data I actually cared about, namely the cat's actual response.
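Turning database rows into that kind of JSON takes little more than a list of dictionaries and a serializer. The sketch below uses Python's standard json module in place of Flask's jsonify, which performs a similar conversion and also marks the response as application/json. The table columns, sample rows, and the search_json helper are assumptions, not the real shows.db or the lecture's code; ORDER BY id is added only to make the output order predictable.

```python
import json
import sqlite3

db = sqlite3.connect(":memory:")
db.row_factory = sqlite3.Row  # rows can be turned into dicts keyed by column name
db.execute("CREATE TABLE shows (id INTEGER PRIMARY KEY, title TEXT, year INTEGER)")
# Sample rows; the real shows.db schema may differ.
db.executemany(
    "INSERT INTO shows (title, year) VALUES (?, ?)",
    [("The Office", 2001), ("The Office", 2005)],
)


def search_json(q: str) -> str:
    """Return matching shows as a JSON array of objects (a list of dicts in Python)."""
    rows = db.execute(
        "SELECT id, title, year FROM shows WHERE title LIKE ? ORDER BY id",
        ("%" + q + "%",),
    )
    return json.dumps([dict(row) for row in rows])


print(search_json("office"))
# [{"id": 1, "title": "The Office", "year": 2001}, {"id": 2, "title": "The Office", "year": 2005}]
```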
And so in this case, if I open up app.py in my other terminal window here, you'll see in my search route that instead of returning a template, I'm using a crazily named function called jsonify, which is just another function that comes with Flask itself. It takes the list of Python dictionaries that came back from my SQL database and JSONifies it, so that I can then serve it to anyone on the internet, myself included, as a service, so that I and they can use my data to implement our own web applications. So that's sort of it for web programming. Ultimately, you now have all of the building blocks from week zero onward to make your own web applications and, if you so choose for final projects, your own mobile applications. Even if this too, like everything else, has felt like a bit of a fire hose, it's the process of specking out, proposing, and executing your own final project that will make all of this feel much more comfortable and familiar. And you'll look back on so many of the past weeks as useful building blocks. But this, then, was your CS50 education, weeks 0 through 9. We have just one more left, next week. So we'll see you then.
All right, this is CS50 week 10, the very end. And we will end today's class just as we ended week zero, with a little bit of cake outside in the transept. But over these past 10-plus weeks, if you've been feeling like it was that proverbial fire hose hitting you in the face with so much new content, so many new skills, so many new challenges, realize that you're in very good company. And we can officially declare nonetheless that if you started the class among those less comfortable, you are officially, after today, no longer less comfortable. You're at least somewhere in between. And if you were in between, you're now more comfortable.
And if you were more comfortable, you're perhaps now most comfortable among those here. But keep in mind, per CS50's syllabus, that what ultimately matters in this course is not so much where you end up relative to your classmates but where you end up relative to where you yourself began. That's taken into account come final projects and come final grades. But most importantly, that's really what matters educationally in general: that delta from week zero to, in our case here, week 10. So if it's any reassurance, something I like to bring up around this time is just how badly I did in CS50 on the very first problem set. I didn't even get hello, world right somehow, in the fall of 1996. So here's a photograph of my homework for assignment one. It was a program to print hello, world on the screen. I was incredibly detailed with my comments, even commenting that main is main, which is not the way you're supposed to program, and even telling the TF where my file ended, which is not really necessary. And I got minus two for not even following directions correctly. So take some comfort in that. Even if by problem set 9 you're still getting points off, you're hopefully, at least in my case, in some very good company. It only gets better and easier and faster in time. But the whole course ultimately has really been about this picture, right? Problem solving is computer science. You have inputs, which are the problem to be solved. You have the outputs that you want to get to, which are presumably the solution. And inside of that proverbial black box are these algorithms, step-by-step instructions for solving some problem. And I pulled up my own notes from CS50's first lecture some 25-plus years ago, where I wrote this down in my handwriting, which is horrible to this day.
But I noted that an algorithm is a precise sequence of steps for getting something done, which is pretty much what we still say. I noted that programming itself, as we've said for weeks now, is the process of taking an algorithm and putting it into a language that a computer can process, and that's what you've done in Scratch and C and Python and SQL and JavaScript and everything in between. And most important, at least in my takeaway that day, when it comes to algorithms is precision and correctness. Indeed, those are points we've made over the past several weeks as well, if perhaps not as emphatically. But we thought we'd see just how much those two lessons in particular have sunk in by doing a bit of an exercise, some CS50 Pictionary, in this, our last lecture altogether this term. To begin, we need one brave volunteer to come on up on stage. Who would like to volunteer? How about, okay, over here. We never call on the middle of the section. Come on up. Come on up. A round of applause for being so brave. Nice. All right, come on over. In just a moment, let's go ahead and do introductions. First, if you want to come over to the middle of the stage and introduce yourself to the world. >> Hi, I'm Gia. I'm a freshman. >> All right. Nice. Nice to meet you. Thank you for joining us. So what we're about to do is, Gia is going to look at my screen, where there's going to be a picture on a white background. All of you presumably have a white sheet of paper in front of you that you grabbed on the way in. If you don't, just grab one from a friend or your binder or the like. And if you really don't, that's okay, too. But hopefully everyone has a pen or pencil, or someone near you does. And Gia, what we're going to ask you to do is program the audience to draw what it is you see on the screen. You can say anything you want, but you may not use any physical gestures or the like. Verbal programming only. >> Okay. >> All right.
Come on over to the lectern, and in just a moment Gia, and only Gia, will see what is actually here on the screen. So, step one for your audience. Okay. So, the first thing that you need to do is draw two lines right next to each other. Two vertical lines. Okay. >> Okay. >> Step two. >> Step two. Once you have done that, you need to draw three dots. One above those two vertical lines, one right in the middle between those two vertical lines, and one at the bottom of these three vertical lines, but beneath those two vertical lines. Yeah. So, three dots. >> Okay. Step three. Step three is on the top of the left vertical line, you're going to connect a line from that position to the top dot that you drew. And then on the top of the right vertical line, you're going to connect that position to the top dot that you drew. >> All right, step four >> is remember that top left position? You're going to connect that to the middle dot that you drew. And then the top right of the vertical line at the... Yes. You're going to connect that to the middle dot of the line that you drew. >> Got it? >> And then step five, on the bottom left of your left vertical line, you're going to connect that position to the bottom dot that you drew. And then on the bottom right of the right vertical line, you're going to connect that position to the bottom dot that you drew. And now from the middle dot to the bottom dot, you should have no line in between that. And you can now draw a line between those two dots. >> Step six, and the last? >> I think you should be done. >> All right. A round of applause then for our programmer. Let me give you a little something >> if you want to take a seat. So now what Kelly and I are going to do is very quickly collect your execution of this program, and we'll see just how it went with Gia as the programmer. If you want, just reach out and hand me or Kelly over there any of your drawings. We don't need all of them. Just a representative sample will suffice.
If you're proud of your work, extend your hand quite a bit. Okay. Very proud. Okay. >> Okay. >> Okay. Okay. One more. One more. That's okay. All right. All right, I'm going to run back to the stage. Okay, it's okay if we didn't grab yours. All right. All right. Thank you to Kelly for grabbing these as well. So, without having seen any of these, here is how you all interpreted Gia's instructions. So, here's one interpretation. Okay. Perhaps similar to or different from your own. Here's another: several vertical lines and a question mark. Okay. Here's a very narrow one. All right. And let's see if we got any other variants thereof. Actually, the rest of them are pretty consistent. So, Gia, if it's any reassurance, I'm seeing a lot of ones that look like this. Here's another that looks like this. And here's yet another that looks like this. So, if you're wondering where we're going with this, if I go ahead and reveal what it was Gia was looking at on the screen, she was in fact having you draw this here cube. So, some of the takeaways here. Suffice it to say, not all of that went well. But why was that? Well, I dare say it was very easy to get confused, I think, Gia, by some of your words, because you had in your mind's eye exactly what it was you were drawing. And of course, it was right there on the screen. But we didn't leverage, at least in Gia's instructions, any abstractions. I dare say it might have been a little bit easier for all of us if maybe she had just teed things up by saying, "All right, everyone, we're going to draw a cube," for instance, which is indeed an abstraction over these lower-level details that she was focusing on. But perhaps there could have been another approach altogether, which is even more pedantic. For instance, a lot of the earliest drawing programs, and even worlds like Scratch, sort of take for granted that you have a coordinate system, like x's and y's, and you can go up, down, left, and right.
So consider just saying, "Hey, draw a cube." That could be subject to interpretation: is the cube like this, or is it rotated like this? So we still would have needed more information than just "a cube" from Gia. But here, maybe an alternative approach would have been to really get into the weeds and say, "Put your pen at the top of the page and then draw a straight line to the southwest, for instance, and then draw another line of the same distance to the south, and then to the southeast, and so forth." It could have been in terms of degrees, or directionally in that way, but it might not have been clear to anyone what we were drawing until enough of the lines suddenly appeared, and then, voila, you see that we've been drawing a cube this whole time. So the degree to which we're precise and the level of abstraction at which we operate are incredibly important, whether it's for another human to understand us, for an AI to understand us nowadays, or anything in between. All right, why don't we go ahead and flip things around a bit? Why don't we get one more volunteer to do something a little different here on stage? One more. Okay, how about here on the aisle? Come on down. Round of applause for this brave volunteer. Come on down. All right. So, in this exercise, we're going to flip things around. You all will be giving the instructions verbally by just shouting them out. And our volunteer, whose name is >> Presley. >> Preston. >> Presley. >> Presley. Presley, you want to say a quick introduction? >> Yeah. My name is Presley. I'm a freshman living in Stoughton Hall. >> Nice. Well, welcome. Come on over to the easel here. And we have a black marker for Presley here. And the only thing that we ask is that you not look up or behind you, because the answer is going to be right there on the screen. But everyone else is welcome to look up or over to the TV screen.
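The "put your pen here, then draw southwest, then south" approach is essentially turtle-style drawing on an x/y coordinate system. A minimal sketch of executing such instructions, assuming compass directions mapped to angles and every stroke given as a (direction, length) pair; HEADINGS and run_strokes are hypothetical names, not from the lecture:

```python
import math

# Compass directions as angles in degrees, counterclockwise from east (standard math convention).
HEADINGS = {
    "east": 0, "northeast": 45, "north": 90, "northwest": 135,
    "west": 180, "southwest": 225, "south": 270, "southeast": 315,
}


def run_strokes(start, strokes):
    """Execute (direction, length) instructions from a pen position; return line segments."""
    x, y = start
    segments = []
    for direction, length in strokes:
        angle = math.radians(HEADINGS[direction])
        new_x = x + length * math.cos(angle)
        new_y = y + length * math.sin(angle)
        # Round away floating-point noise, e.g. cos(270 degrees) is about -1.8e-16, not 0.
        segments.append(((round(x, 6), round(y, 6)), (round(new_x, 6), round(new_y, 6))))
        x, y = new_x, new_y
    return segments


# "Put your pen at the top, draw a line to the southwest, then the same distance south..."
for segment in run_strokes((0, 10), [("southwest", 2), ("south", 2), ("southeast", 2)]):
    print(segment)
```

Nothing in the output says "cube": each instruction is precise, but the abstraction only emerges once enough segments have been drawn.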
And if you want to go ahead and face the easel here, and as you draw, just make sure to kind of open up after each stroke of the pen so that everyone can see what you have done. All right. So no looking up as of now, because what the audience is about to do is program you to draw this on the screen. Oh, way to encourage him. Okay. So, step one: feel free to just raise your hand and we'll shout them out. >> Oh, I heard draw a circle over here. >> But not too big. I heard over here a stick figure. >> Good abstraction. You're going to end up drawing a stick figure. But we should probably be a little more helpful than that. So, let's do the hand thing just so we can be more precise and not overwhelm Presley. There was a hand over here. Yeah. And back. >> Draw a line down. >> Draw a line down from the circle. Presley >> from the bottom >> from the bottom of the circle. Okay, someone else. >> Actually, let me rewind. Sorry. Say it again. >> Draw two diagonal lines from the line you just drew. >> Well, I don't think the audience likes this. Wait, let's... Oh, >> okay. Okay, that's what we were told. Next step, someone else. >> Good one. Okay. Extend the original vertical line to be about the same height as the circle. >> Okay. Yeah, that's good. Good feedback. All right. Someone else. Next step. Next step. Yes. Draw two diagonal lines from the bottom of the line. >> Nice. Draw two diagonal lines from the bottom of that line that look like legs. Good use of detail and abstraction. Okay, nice. Next step. >> Anyone? We're close. Yeah, over here. line >> on the left. So, you're going to draw a speech bubble to the left of the head with the word "Hi," capital H, with a short line. >> No bubble, just "Hi." >> And you wanted to clarify one other detail. And then a line from "Hi" to the face. >> A line from "Hi" to the face >> with space in between. Okay. No, you're doing great. It's okay, Presley. Okay. Hang in there. Okay. Final step or two. Next step. Anyone at all.
>> Feel free to shout it out. >> Adjust the arms to make them look like they're running. >> Adjust the arms to make them look like they're running. Good luck. >> Draw a perpendicular line from the left arm. >> Oh, I like that. Draw a perpendicular line from the left arm >> to the bottom >> to the bottom. >> Okay. And lastly, one final step. >> Same side as... Yeah, it's permanent. I think we need a final touch on the other arm. Maybe. Yes. One final step. >> Anyone? >> Draw a perpendicular line diagonally to the left >> of the arm >> of the right arm. Just a little bit. >> Just a little bit. >> All right. I think we've withheld our applause long enough. Presley, if you want to take a step back and look at what they were trying to get you to draw. A round of applause. So, here too. Let me... Here you go, for your dorm room if you would like. Okay. And a little Super Mario as well. All right. So, here too, I think you were the problem this time. Round of applause for Presley. And of course, since it's, you know, permanent ink, it's easy to sort of go off the rails early on and make a mistake. But I think that was actually a nice mix of low-level details, like the directions of the lines and the lengths thereof, and also some abstractions, because I do dare say someone shouting out that it was to be a stick figure gave him a much more helpful mental model. So that might be sort of the comments on top of the function, but when we really got into the weeds of implementing that function, it was more akin to step-by-step instructions for solving this here particular problem. So my thanks to Presley for bearing with us on that one as well. So beyond this, where have we been up until now? If we look back at the past several weeks, this is sort of the trajectory we've been on. We started with Scratch, literally from scratch, in the very first week.
The goal of which was to introduce you to some of those procedural fundamentals, like what a loop is and a conditional and Boolean expressions and variables, which have pretty much recurred in different forms and different languages over the weeks since. Thereafter we transitioned to a more traditional language, C, which many of you will never use again, and admittedly even I only use it for like a month or two of the year during CS50 itself. The intent was for it to be this incredibly foundational language that so many other languages today are built on top of. Case in point, the interpreter that you might use for Python itself can be written in C. And that speaks to how we talked about bootstrapping from one language to another, from low-level to high-level and beyond. Arrays and algorithms, memory and data structures: all of that is sort of omnipresent in computing, in programming and the like, even though in modern languages like Python you might not need to worry as much about managing your own memory, because good programmers, better programmers, have figured out how to solve those problems for you in the language itself or in the libraries that you're using. You can take for granted now that you at least know what a hash table is, what a linked list is, what the trade-offs are among those, and what the running times are. And that's what computer scientists and software engineers think about and talk about and whiteboard about in the real world when trying to implement algorithms of their own for real-world problems or implementing real-world products. And then of course over the past few weeks we've used that as a stepping stone to talk about very modern programming paradigms, most recently web programming. And even though we didn't use it explicitly in the class, mobile programming is increasingly based on HTML and CSS and JavaScript, which some of you might tackle for your own final projects.
And you can't escape now using or seeing or leveraging artificial intelligence somehow. Among the goals for today is to at least point you in the direction of tools that, now having finished problem set 9, you are welcome and encouraged to use for your final project, so that you can build all the more, and all the more successfully, than even some of your predecessors just a few years ago could have, now that your own work and your own know-how can be amplified by AI itself. This of course now brings us to today, the end, but I wanted to give you a sense of where you can go from here on out. So with your final project, the intent is for it to be the very first of hopefully many projects that you decide to spec out for yourself. Every problem set thus far has been written by me and the team, and you've been following our instructions step by step. The final project takes all of those training wheels off. And even though you are welcome and encouraged to borrow code from, say, problem set 9 if you want to do something web-based, or even earlier if you want to do something that's more similar to past psets, the goal is to make it ultimately your own. And even, if you want, start with a completely empty window and just a blinking prompt and build something of your own, setting out for yourself, as you've seen in the specification, a good goal, which you intend to meet no matter what, a better goal, which is a bit more of a stretch, and a best goal, which in practice rarely ever happens with software. To this day, 25 or more years since taking CS50 myself, even I consistently underappreciate just how long it sometimes takes to solve problems. But that's beginning to go away, at least to some extent, thanks to AI, where now you essentially have a junior colleague next to you who can help solve bugs for you, point you in the right direction, and even tackle features as well.
All that we ask for this final project is that you build something of interest to you, that you solve an actual problem, that you impact campus, or that you, as we say in the spec, change the world, and try to create something that outlives the course itself over these final few weeks of the class, and even continue on with it if you'd like in January and beyond. For now, this, the so-called CS50 Charades, for which we need two teams of three. So, if you're sitting there in a group of three friends total, or we'll form one up here live. So, come on up as our first volunteer. Need five more volunteers. Feel free to volunteer the person next to you. Three in a row. How about two more over here? One. And how about two on the end? Come on up. All right. And a round of applause for these six volunteers here. All right, let me give you one microphone. Let me give you a second microphone. And Kelly, if you want to come on up as well. I think these three seem to know each other already, so we'll have them be one team. If you guys want to be another team as well, come on up. Let me take one microphone actually for the other team. All right. And how about quick introductions to this team here. And first, we need a team name from you all. You haven't had time to think about this. >> Team A. >> Okay. So, team A is who? >> I'm Leah. I'm a first-year and I'm in Holworthy. >> Welcome. >> My name is Stephen. I'm a freshman in Canaday F. >> I'm Charlotte. I'm a freshman and I'm also in Canaday F. >> All right, let's do introductions on the other team as well. You are going to be team... >> Awesome Sauce. >> Awesome Sauce. Okay. Versus team A. If you want to go ahead and introduce yourselves here. >> Hi, my name is Jenny Pan. I'm a freshman in Hollis. >> Hi, my name is Noah. I'm a freshman in Hurlbut. >> And hi, my name is Marie. Sorry, I'm a freshman in Canaday. >> All right, welcome to both of our teams here.
And among the goals now (let's leave one microphone with each team) is to play a bit of charades, whereby one of you in a moment is going to be responsible for acting out a word that you see on the screen. So, we're going to put some term that relates to CS50 on this screen and this screen over here, and that person's goal over the course of 60 seconds is going to be to act it out in such a way that their teammates can hopefully guess what the word is. We'll give you 60 seconds at a time. Kelly has kindly offered to keep score. And if you solve it in fewer than 60 seconds, we've got another word for you, and another word, and we'll see how many points you can accrue over the course of those 60 seconds. And depending on how this goes, we'll do maybe one or two rounds in total. Questions? >> How many skips do we get? >> How many skips do you get? I guess you can skip as many as you want until we run out of questions. >> Oh. Oh. >> But try not to run through all of our questions. All right. Any questions beyond that? All right. So, if you guys want to step off stage over there, why don't we have team A begin? So, one of you, Leah, if you're holding the mic, if you want to be the charader, let's go ahead and have you stand here so you can see the screen. And we only ask that you two not look up, because the answer is going to be right there. All right. And you should just shout out the word that Leah is acting out. Question? >> Acting only? Charades? >> Speaking? >> Yeah. You can't speak, because that would kind of defeat the point. So, yes, just acting out. Just acting out physically. All right. I'm going to go over here. Give me just a moment to get the slides ready with your questions. And Leah, the first clue. Oh, and Kelly's going to be timing you. 60 seconds to accrue as many points as you can. All right, here we go. Go. Act that out. >> Oh, that was weird. Thank you. Sorry. Yes. Act out "This is CS50." All right. No. Act this out.
Please go. >> Loop. Calling... recursion. >> Yes. One point. >> Coming... >> An array, linked list... >> Abstraction. >> Snake. >> Python. Python. >> Yes. Python. >> Duck. The duck. >> Nice. >> Binary. >> One, zero. >> Binary digit, bit. >> Byte. >> One, zero. It's definitely binary, ASCII. >> Want to pass. >> Linked list, array. >> Yes. Array. >> Loop. >> Yes. Loop. >> Time. Time. All right. Very nicely done. All right. Five is the score to beat. So, if you guys want to step over here, and if one of you has the mic, go ahead and assume the same roles. Five is the score to beat. All right. Here we go. Final round. First word. And you guys just make sure you don't look up. Go. Head, node... >> Algorithm... >> Input, algorithm... >> These are hard. No. >> Sure. You have to act it out. Act it out. >> Oh, there we go. Run time. Run time. What's that? >> Tree. >> Yes. Tree. >> Next one. >> Oh my god. >> Next one. >> Binary search. >> Binary, Boolean. No. A merge sort, call, phone call... >> Function. >> It was binary search, wasn't it? >> What was binary... >> Phone? Oh, that's time. All right, but a round of applause for our team Awesome Sauce. >> Okay, we have some parting prizes for you, your very own Super Mario Pez for you guys as well. I'm glad we squared away the ability to pass on the question, so thank you for that. All right, so admittedly pretty hard. Our thanks to all of these volunteers for playing that out. Allow me to turn our attention back here in just a moment to where else we can go from here. So up until now we've been using Visual Studio Code for CS50 at the URL cs50.dev.
Recall that this is just an adaptation of a commercial tool called GitHub Codespaces, which is like a cloud-based version of Visual Studio Code itself, or VS Code, a largely open-source tool from Microsoft that's incredibly popular in the industry. Which is to say, even though we have the CS50 library in there, and we turned off by default some of the menu options, and we disabled AI, it is the tool that so many programmers around the world do use every day to write code. So all this time you have been learning industry standards in that sense. It is now time, if you so choose (though you are welcome to keep using this for your final project if you feel more comfortable with it), to drop the "for CS50" part and actually install Visual Studio Code itself on your own Mac or PC. You can go to this URL here. It's fairly straightforward to install. But invariably you'll probably run into some technical support headaches depending on the language that you're trying to use with it. For instance, if you're trying to use it with Python, you'll probably also have to download and install Python onto your computer, at least if you want the latest version. And just know a priori that sometimes stuff just happens and it doesn't work, and you have to Google or ask ChatGPT, and that's fine, and honestly that's kind of normal. But this is also why we don't do any of this in week zero of the class, so that we can focus on hello, world and Mario and Cash and Credit and get into the interesting parts of computing and programming, and not frustrate you with technical support challenges. But now, given that all of you are somewhere in between or among those more comfortable, you're now ready to deal with those same technical challenges yourselves. But who knows, maybe it will go perfectly smoothly.
You can go to CS50's own documentation, because if you want to be able to use all of the same software that CS50 has pre-installed, you can use a technology known as containerization, with a tool called Docker, and actually run a CS50 environment on your Mac or PC, or even in the cloud, but still run VS Code on your own Mac or PC. Among the upsides are that you're not necessarily dependent on the cloud. You can do everything offline, which is useful in general. And you can sometimes do things more quickly if you're using the full capabilities of your own computer and not just a browser. So this is generally how programmers approach their code, using something like VS Code or alternative products. And in fact there's a bunch of others out there, but perhaps the trendiest right now are these three here: not just Visual Studio Code itself, but a tool called Cursor and another one called Windsurf. There are dozens of other text editors, often known as integrated development environments, which tend to have even more features, that you can download for free or commercially on your own Macs, PCs, and the like. But you can't go wrong transitioning from CS50 to VS Code on your own Mac or PC, if only because you're already familiar with it. As for the command line, those of you with Macs might have found somewhere in your Utilities folder a program called Terminal. If not, poke around there later today and you'll see that all this time you've had a command-line interface available to you on macOS. Windows has something similar as well. They don't necessarily come with all of the same tools that we've been using within cs50.dev, but if you're a Mac user, you can go to this URL here, or if you're a Windows user, to this URL here. And if you're a Linux user, you probably know all of this already, so there's no URL for you there.
You can install some of those same tools on your Mac or PC and feel all the more at home doing things in a command line as well. Git: this is something that in CS50 we actually abstract on top of. This is essentially the de facto standard nowadays for collaborating with other people, using a central cloud server to share your code with them, and for versioning your code, so that you keep track of multiple versions thereof and the changes that you've made. Go to this URL here if you would like, and you'll see a tutorial by CS50's own Brian Yu introducing you to actual Git, because we've been abstracting away this particular tool by just doing it all automatically for you. If you've ever gone through your timeline in cs50.dev to roll back to previous versions of your code, we're just using Git, but we're automatically running this command for you. If you want to collaborate with partners for your final project, you can use Git. However, I will encourage you to alternatively use Visual Studio Code's Live Share feature, which allows one of you to log into your codespace, click some buttons, and then share access to your codespace with the friend or partner with whom you're working on the project, and you can both, in real time, like Google Docs, edit the code or different files therein using that one codespace. A little easier than getting onboarded with Git, at least. Hosting a website: if this proves of interest for your final project, or even after the course, and if it's a static website, two popular places to go, if only because they offer free tiers, are GitHub Pages, which you can use to just host HTML, CSS, and JavaScript with no Python, no Flask, no backend, or Netlify, a popular company nowadays too that has an entry-level account for which you can sign up for free.
If you just want to have, like, a portfolio website, if you're an artist or a programmer, and you just want to have static content that you write once and deploy, these are good starting points, but not the only ones. Hosting a web app: this list gets even longer. And all of these recommendations are essentially curated by the teaching staff, so they're all opinionated, but these are perhaps the most common places you can go. Amazon, Microsoft, Google, Cloudflare: they all have student-type accounts. So if you use your .edu email address, for instance, or some other form of proving your status as a current student, you can generally sign up for discounts and free access to a lot of these same services without having to pay while you're just learning along the way. GitHub has something similar called the Student Developer Pack. And then a couple of other companies for hosting web apps that have been popular are Heroku, Vercel, and bunches of others. By web app we mean not just HTML, CSS, and JavaScript but maybe some Python, maybe some JavaScript on the server, maybe Ruby, yet another language, or any number of others. When you actually need a backend in addition to the frontend, and maybe a database as well, this would be the place to start, whether it's at the CS50 Hackathon or beyond. And nowadays, this is a slide that didn't even need to exist a couple of years ago: asking AI. Again, for your final projects you are welcome and encouraged to amplify your own productivity with AI, not by having it do the work for you, but by moving beyond the duck, which by design has been fairly limited and meant to be a good teacher, but not necessarily one that's going to be a good partner when it comes to building your final project. So ChatGPT, Claude, Gemini, GitHub Copilot, OpenAI Codex, and v0 are all popular tools right now that you might want to play around with.
The easiest of these to use, perhaps, if you're not already familiar with, say, ChatGPT, would be GitHub Copilot, only because you can enable it within your CS50 codespace by following our own documentation at cs50.readthedocs.io, where we'll tell you the sequence of steps via which you can re-enable AI, now that you're allowed to for your final project, and turn on all of those features that were disabled by default. And then there are still humans out there. It remains to be seen just how popular these websites will be in the years to come, for better or for worse, but among the places that programmers and technophiles have gone for years are Reddit, Stack Overflow, and Server Fault, where there's a rich history of questions and answers that, ironically, all of those AIs have been trained on, which unfortunately means some of these might be driven out of business eventually, in some sense, if we're all just turning only to AI. But when you actually want that human component, these are still good places to go. And then news: two of the many places you can go for news in technology, computing, and computer science more broadly would be TechCrunch, which is still a good one, and Hacker News, and then you might have some of your own popular choices as well. And then, with some bias, take other classes. CS50, besides this undergraduate class, has a rich history now over the past decade of creating all the more OpenCourseWare: courses in more Python, more SQL, a language called R, cybersecurity, game development, and more. All of those are linked at this URL here, edx.org/cs50, where you need not pay or sign up beyond auditing the course, and all of the content is freely available. Something for winter break, for instance, if you want to dive a little more deeply into some subject for the sake of your final project or your professional aspirations, or even just to prepare for spring term.
And over the coming weeks, CS50 itself will be soliciting interest in, and applications for, becoming a teaching fellow (TF) or a course assistant (CA). If you would like to get all the more involved as a teacher of CS50 next fall, do follow the application link that we will soon circulate via email. And do stay in touch, too, if you just enjoy answering other people's questions or seeing what the pulse of computing is. At this URL here is a whole bunch of CS50's own communities, largely in social media, via which you can follow along at home in the months and years to come too. So, a few thanks before we do one final game all together, to all of the people who have been making this course possible. Our friends at Memorial Hall, who bring us into this beautiful space and make it possible for us to have, of all things, a class in such a space. Our friends at ESS, who help with the audio each and every week in CS50. The restaurant Changa down the road; we hope you'll continue to visit our friends there. Wesley Chen is a good friend of ours and the manager, so please tell him you're from CS50 and I'm sure he'll be delighted to see you. And then CS50's own team, most of whom are in back there or sitting next to you with cameras, without whom the course wouldn't be possible. And of course CS50's own teaching fellows and CAs, just a few of whom posed here for this photo. If I could invite you all to give everyone here a round of applause: my thanks to all of them. And then of course the CS50 duck should be thanked as well. Okay. Thanks to CS50's own Rongxin Liu and some of our own former teaching fellows and students who have been behind the development of that duck that you've gotten to know over these past several months. All right, if Kelly could join me again on screen, the only thing between us and cake is a final game, namely a quiz show in which all of you can partake. Here we go. Question one.
What is the largest number an 8-bit unsigned binary number can represent? 256, 128, 255, or 1? Starting strong, and keep in mind all of these questions came from you all, because we recently asked you for the review questions that are now on the screen. Again, the timer is ticking, and the most popular answer was 255, which I think, if we click once more, we'll confirm was in fact the correct answer. So why is that, and why is it not 256? Well, if we start counting from zero, as we always have, that's consuming one of the 256 possibilities. So the largest number that we can represent with 8 bits, unsigned, which means no negative numbers involved, is indeed going to be 255. Treasure that information now always. All right, next question from Kelly. Which issue is at the center of the year 2038 problem, which hopefully you added to your Google Calendars a few weeks back? Integer overflow, malicious inputs, SQL injection attacks, or memory leak? Which of those is at the core of the year 2038 problem? All right, let's go ahead and reveal the number one answer, with 92% of you saying integer overflow, which is in fact correct, because we're still in the habit of using signed 32-bit integers to keep track of time from the so-called epoch, which was January 1st, 1970. And unfortunately, we humans aren't great at planning ahead. And so we're going to run out of permutations of 32 bits by a certain date in the year 2038, unless everyone upgrades their computers to 64-bit counters, which thankfully most every piece of modern hardware nowadays is using already: your Macs, your PCs, and your phones. So hopefully this will really be a non-event, but hopefully you'll think of us in CS50 in 10-plus years when your Google Calendar reminder goes off. Question three: which of the following is not a step of compiling? Linking, preprocessing, assembling, or interpreting? A bit more of a challenge. Which of these is not a step of compiling? All right, almost 200 responses coming in.
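The arithmetic behind the first two answers can be checked with a short Python sketch. It assumes Unix time is stored as a signed 32-bit count of seconds since the epoch, which is the usual setup behind the 2038 problem. The helper names `max_unsigned`, `wrap_int32`, and `last_moment` are hypothetical and chosen only for illustration.

```python
from datetime import datetime, timedelta, timezone

# The Unix epoch: midnight UTC on January 1st, 1970.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def max_unsigned(bits: int) -> int:
    """Largest value an unsigned integer of the given width can hold."""
    # n bits give 2**n patterns; one of them is spent on 0, so the top is 2**n - 1.
    return 2**bits - 1


def wrap_int32(value: int) -> int:
    """Simulate storing a Python int in a signed 32-bit (two's complement) slot."""
    return (value + 2**31) % 2**32 - 2**31


def last_moment(bits: int = 32) -> datetime:
    """Latest moment a signed seconds-since-epoch counter of this width can represent."""
    return EPOCH + timedelta(seconds=2 ** (bits - 1) - 1)


print(max_unsigned(8))  # 255, not 256, because 0 uses one of the 256 patterns
print(last_moment(32))  # 2038-01-19 03:14:07+00:00

# One second later, a signed 32-bit counter wraps around to its most negative value.
overflowed = wrap_int32(2**31 - 1 + 1)
print(EPOCH + timedelta(seconds=overflowed))  # 1901-12-13 20:45:52+00:00
```

With a 64-bit counter, the same limit lands roughly 292 billion years out. That is so far away that `last_moment(64)` simply raises `OverflowError` in Python, because `datetime` only goes up to the year 9999.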
All right, why don't we go ahead and reveal the most popular answer, with 54% of you saying interpreting, which is in fact correct. Recall that when we talked about compiling, compiling itself is just one of several steps. There is in fact the preprocessing step, which takes care of any of the lines in C that start with a hash symbol, like #include, #define, and the like. That's preprocessing. There was then compiling, which actually compiled your code into assembly code. There was then the assembler, which would take it down further to machine code. And then linking, which 29% of you chose. The linking step, recall, was taking your zeros and ones and combining them with, say, CS50's library's zeros and ones and maybe the standard I/O library's zeros and ones, linking them all together to give you one executable program, like hello itself. All right, next question. What does a pointer store? The name of a variable, the memory address of a value, the size of a value, or the value of a variable? Think for a moment. What does a pointer store? All right, about 200 responses in, and yes, the memory address of a variable, with 96% of you confirming as much. That is correct. Question five: what is the running time of linear search? Big O of 1, big O of n, big O of n squared, or big O of n log n? Linear search's running time. And recall that with something like search, you could get lucky. But if big O is the upper bound on our running time, you might not: you might hit the end of the list that you're searching. And so the running time of linear search is of course big O of n. It might be omega of 1, but not big O of 1, at least if we're considering what the worst-case scenarios might be. All right, on to question six. What data structure follows the first-in, first-out principle? A queue, a linked list, a stack, or a hash table? First in, first out, aka FIFO. Which of these is FIFO? All right.
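To see why linear search is in O(n) but also in Ω(1), here is a minimal Python sketch that counts comparisons as it goes. The comparison counter and the sample list are illustrative additions, not something defined in the lecture.

```python
def linear_search(items: list, target) -> tuple[int, int]:
    """Return (index of target, or -1 if absent; number of comparisons made)."""
    comparisons = 0
    for index, item in enumerate(items):
        comparisons += 1
        if item == target:
            return index, comparisons  # lucky: we can stop early
    return -1, comparisons  # unlucky: we looked at every element


numbers = [4, 8, 15, 16, 23, 42]
print(linear_search(numbers, 4))   # (0, 1): best case, a single comparison
print(linear_search(numbers, 50))  # (-1, 6): worst case, n comparisons
```

The best case (target in the first slot) gives the Ω(1) lower bound. A missing target forces all n comparisons, which is the O(n) upper bound.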
First in, first out is in fact a queue, as you would hope if you're getting in line for a restaurant or a store. You'd hope that if you're the first one in line, you're going to be the first one out, equitably speaking. And so it is in fact a queue. The opposite of that, in some sense, would have been a stack, whereby, when you think about the cafeteria trays, the first one in is actually the last one out. So LIFO instead for a stack. All right, question seven. Which operator returns the memory address of a variable? An asterisk, a dollar sign, an ampersand, or a hyphen and a greater-than sign? Presumably in C. Which returns the memory address of a variable? All right, let's see what everyone thinks. So the most popular and correct answer is the ampersand. This is the address-of operator. The asterisk, recall, in most contexts is the opposite of that. That's the dereference operator: it means actually go to an address. The dollar sign is not a thing in C. The arrow operator, though, is similar in spirit to a combination of the star operator and the dot operator, which means to dereference a pointer and access something inside of a struct, typically. All right, question eight. Which SQL command is used to remove duplicate rows from a result set? REMOVE, UNIQUE, DISTINCT, or CLEAN? We didn't spend a huge amount of time on these keywords, but only one of them applies here. A result set is just the answers that you get back when doing your SELECT. And if you want to filter out duplicates, you can in fact say DISTINCT, which is correct. UNIQUE is also a keyword in SQL, but that is for when you want to define in your schema that a column's values are going to be unique, like an email address column. DISTINCT is how you filter out duplicates in your SELECTs. All right, question nine. We're past the halfway mark. What does an HTTP status code of 418 signify? Not Found, I'm a teapot, Forbidden, or Unauthorized? 418. If you know this one, moving forward, you'll be considered among the CS elite.
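Two of these answers can be tried out with Python's standard library. First, `collections.deque` serves as a FIFO queue and a plain list serves as a LIFO stack. Second, an in-memory SQLite database (via `sqlite3`) contrasts `DISTINCT` in a query with a `UNIQUE` constraint in a schema. The `students` table, its columns, and the sample rows are hypothetical and made up for illustration.

```python
import sqlite3
from collections import deque

# FIFO: a queue, like a line at a restaurant.
line = deque()
line.append("first")
line.append("second")
print(line.popleft())  # first

# LIFO: a stack, like cafeteria trays.
trays = []
trays.append("first")
trays.append("second")
print(trays.pop())  # second

# SQL: UNIQUE constrains the schema; DISTINCT filters a result set.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE students (email TEXT UNIQUE, house TEXT)")
db.executemany(
    "INSERT INTO students (email, house) VALUES (?, ?)",
    [
        ("a@example.edu", "Canaday"),
        ("b@example.edu", "Canaday"),
        ("c@example.edu", "Hollis"),
    ],
)

# Duplicate houses are allowed in the table, but DISTINCT collapses them in the result set.
houses = [row[0] for row in db.execute("SELECT DISTINCT house FROM students ORDER BY house")]
print(houses)  # ['Canaday', 'Hollis']

# A duplicate email violates the UNIQUE constraint, so SQLite rejects this INSERT.
try:
    db.execute(
        "INSERT INTO students (email, house) VALUES (?, ?)",
        ("a@example.edu", "Hurlbut"),
    )
except sqlite3.IntegrityError as error:
    print("rejected:", error)
```

The design point is that `UNIQUE` prevents duplicates from ever being stored, whereas `DISTINCT` leaves the stored rows alone and only deduplicates what a particular `SELECT` returns.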
Answers are coming in a little slower, but "I'm a teapot" is correct, which is not actually a useful technology. It was in fact an April Fools' joke years ago, where a bunch of computer scientists got together and wrote out an entire specification for what it means for a server to return 418 I'm a teapot. All right, number 10. Where does malloc dynamically allocate memory from? The heap, the stack, global variables, or assembly? All right, heap is in fact correct. That's the top part of memory, even though top and bottom make no actual technical sense; it's just our artist's rendition thereof. The stack, recall, is what is used when functions are being called. Every time a function is called, it gets a so-called frame on the stack. That's where your local variables and your arguments get put. But if in C you use malloc, the memory does in fact come from the heap. Next question: in C, if you allocate memory with malloc but forget to call free, what problem can occur? A memory leak, a segmentation fault, a stack overflow, or all of the above? If you allocate memory with malloc but forget to call free, what problem can occur? All right, the most popular answer is in fact memory leak, which is correct. You could imagine scenarios in which you also get a segmentation fault and/or a stack overflow, but those aren't direct consequences of not calling free. Those are generally the consequence of using too much memory, for instance, or of doing something wrong with your memory. So interrelated, yes, but in terms of not calling free for each malloc, a leak is what's going to happen by definition. All right, well done there. Next question, which is 12. Whose web page does this domain name give you? safetyschool.org. Is it Harvard University, Princeton University, Yale University, or Columbia University? All right. Recall that this was in the context of our HTTP redirections. Yes. Interesting. Yes.
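The status codes from these questions, along with the 301 used for redirects, are all named in Python's `http.HTTPStatus` enum, which makes for a quick lookup table. This is a minimal sketch, assuming Python 3.9 or later (the version that added 418 to the enum). The `redirect_response` helper and the Location value it is given are hypothetical and illustrative only; they do not reproduce what any particular server actually sends.

```python
from http import HTTPStatus

# The four options from the quiz question, plus the 301 used for redirects.
for code in (404, 418, 403, 401, 301):
    status = HTTPStatus(code)
    print(code, status.name, "-", status.phrase)


def redirect_response(location: str) -> str:
    """Build a minimal raw HTTP/1.1 301 response (hypothetical helper)."""
    status = HTTPStatus.MOVED_PERMANENTLY
    return (
        f"HTTP/1.1 {status.value} {status.phrase}\r\n"
        f"Location: {location}\r\n"
        "\r\n"
    )


print(redirect_response("https://www.yale.edu/"), end="")
```

A browser that receives such a response makes a second request to whatever URL the `Location` header names. That is all a redirect like the one behind question 12 needs.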
In fact, Yale University. Some alum has been paying like $10 a year for like 20 years for this joke. If you visit safetyschool.org, it returns an HTTP 301 response with a Location header saying the page is in fact at yale.edu. All right, 13, three to go. What is the purpose of DNS? To encrypt data sent over the dark web, to find the nearest coffee shop for you, to protect your location against hackers, or to translate domain names into IP addresses? What is the purpose of DNS? If helpful, it stands for Domain Name System. All right, we're at about the 200 mark, and the correct answer is indeed translating domain names into IP addresses. A DNS server is a server on your home network, your ISP's network, your campus's network, or your corporate network that just answers questions like that for you. All right, second-to-last question. Which of the following is not a built-in SQL feature for tackling race conditions? BEGIN TRANSACTION, COMMIT, ROLLBACK, or ENROLL? We talked ever so briefly about this in the context of ending up with too much milk, recall. And the correct answer is indeed ENROLL. The other three, even though you didn't have to use them for problem set 7 or 9, are indeed features of SQL. But ENROLL is not a thing. All right. And the very last question, and try to answer this as quickly as you can. What does Professor Malan say at the beginning of every CS50 lecture? "Welcome to Harvard's computer science class," "Hello everyone, ready to code?", "All right, this is CS50," or "Let's get started with some programming"? All of these questions were in fact written by you all. All right. And the correct answer, I'm pretty sure, with 98% of you saying so, is "All right, this is CS50." And all right, this was CS50. Cake is now served.
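The "too much milk" race from the second-to-last question can be sketched with SQLite through Python's `sqlite3` module. The check ("is there milk?") and the act ("buy milk") happen inside one transaction, and a failure before COMMIT triggers a ROLLBACK. The `fridge` table and the helper names are hypothetical. `BEGIN IMMEDIATE` is SQLite's variant of BEGIN TRANSACTION that takes the write lock up front. This single-connection demo illustrates the commit and rollback mechanics rather than staging a real concurrent race.

```python
import sqlite3


def open_fridge() -> sqlite3.Connection:
    # isolation_level=None keeps the sqlite3 module from issuing its own implicit
    # BEGIN, so the BEGIN/COMMIT/ROLLBACK statements below are explicit.
    db = sqlite3.connect(":memory:", isolation_level=None)
    db.execute("CREATE TABLE fridge (item TEXT PRIMARY KEY, count INTEGER NOT NULL)")
    db.execute("INSERT INTO fridge (item, count) VALUES ('milk', 0)")
    return db


def milk_count(db: sqlite3.Connection) -> int:
    return db.execute("SELECT count FROM fridge WHERE item = 'milk'").fetchone()[0]


def buy_milk_if_needed(db: sqlite3.Connection, fail_midway: bool = False) -> None:
    # Check-then-act inside one transaction: with the write lock held from BEGIN
    # IMMEDIATE, another connection cannot slip its own check in between.
    db.execute("BEGIN IMMEDIATE")
    try:
        if milk_count(db) == 0:
            db.execute("UPDATE fridge SET count = count + 1 WHERE item = 'milk'")
        if fail_midway:
            raise RuntimeError("simulated crash before COMMIT")
        db.execute("COMMIT")
    except Exception:
        db.execute("ROLLBACK")  # undo the half-finished purchase
        raise


db = open_fridge()
try:
    buy_milk_if_needed(db, fail_midway=True)
except RuntimeError:
    pass
print(milk_count(db))  # 0: the interrupted purchase was rolled back
buy_milk_if_needed(db)
buy_milk_if_needed(db)
print(milk_count(db))  # 1: the second call saw milk and did not buy more
```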
These subtitles were extracted using the Free YouTube Subtitle Downloader by LunaNotes.

