LunaNotes

Download Subtitles for Harvard CS50 2026 Computer Science Course

Harvard CS50 (2026) – Full Computer Science University Course

freeCodeCamp.org

[00:00]

If you want to learn about computer

[00:01]

science and the art of programming, this

[00:03]

course is where to start. CS50 is

[00:06]

considered by many to be one of the best

[00:08]

computer science courses in the world.

[00:11]

This is a Harvard University course

[00:13]

taught by Dr. David Malan and we are proud

[00:16]

to bring it to the freeCodeCamp

[00:18]

channel. Throughout a series of

[00:19]

lectures, Dr. Malan will teach you how to

[00:21]

think algorithmically and solve problems

[00:24]

efficiently. And make sure to check the

[00:27]

description for a lot of extra resources

[00:29]

that go along with the course.

[01:14]

All right. This is

[01:20]

This is CS50, Harvard University's

[01:23]

introduction to the intellectual

[01:25]

enterprises of computer science and the

[01:26]

arts of programming. My name is David

[01:28]

Malan and this is week zero. And by the

[01:30]

end of today, you'll know not only what

[01:32]

these light bulbs here spell, but so

[01:34]

much more. But why don't we start first

[01:35]

with the uh the elephant or the elephant

[01:38]

in the room. That is artificial

[01:39]

intelligence, which is seemingly

[01:41]

everywhere over the past few years. And

[01:42]

it's been said that it's going to change

[01:44]

programming. And that's absolutely the

[01:46]

case. It's been that way actually for

[01:47]

the past several years is only going to

[01:49]

get to be the case all the more. But

[01:51]

this is an incredibly exciting time.

[01:53]

This is actually a good thing I do think

[01:55]

in so far as now using AI in any number

[01:57]

of forms. You can ask the computer to

[01:59]

help solve some problem for you. You can

[02:01]

find some bug or mistake in your code.

[02:03]

Better still increasingly you can tell

[02:05]

the AI what additional features you want

[02:07]

to add to your software. And this is

[02:09]

huge because even in industry for years,

[02:11]

humans have been programming in some

[02:12]

form for decades, building products and

[02:14]

solutions to problems, the reality is

[02:16]

that you and I as humans have long been

[02:19]

the bottleneck. There's only so many

[02:20]

hours in the day. There's only so many

[02:22]

people on your team or in your company

[02:24]

and there's so many more bugs that you

[02:26]

want to solve and so many more features

[02:28]

that you want to implement. But at the

[02:30]

same time, you still really need to

[02:32]

understand the fundamentals. And indeed,

[02:34]

a class like this CS50 has never been

[02:36]

about teaching you how to program. Like

[02:38]

that's actually one of the side effects

[02:39]

of taking a class like this. But the

[02:41]

overarching goal is to teach you how to

[02:42]

think, how to take input and produce

[02:44]

correct output and how to master these

[02:46]

and other tools. And so by the end of

[02:48]

the semester, not only you will be not

[02:50]

only will you be acquainted with

[02:52]

languages like Scratch, which we'll

[02:53]

touch on today if you've not seen it

[02:54]

already, languages like C and Python and

[02:57]

SQL, HTML, CSS, and JavaScript. You'll

[03:00]

be able to teach yourself new things

[03:02]

ultimately, and ultimately be able to

[03:04]

tell computers increasingly what it is

[03:06]

you want it to do. But you'll still be

[03:08]

in the driver's seat, so to speak.

[03:09]

You'll be the pilot. You'll be the

[03:11]

conductor. Whatever your preferred

[03:12]

metaphor is. And that's what I think is

[03:14]

so empowering still about learning

[03:16]

introductory material, foundational

[03:17]

material, because you'll know what

[03:19]

you're ultimately talking about and what

[03:20]

you can in fact solve. And we've been

[03:22]

through this before, like when

[03:23]

calculators came out. It's still

[03:25]

valuable, I dare say, all these years

[03:26]

later to still know how to do addition

[03:28]

and subtraction and whatnot. And yet, I

[03:30]

think back on some of my own math

[03:31]

classes. I remember learning so many

[03:33]

darn ways in college how to take

[03:35]

derivatives and integrals. And after

[03:37]

like the sixth process of that, I sort of

[03:39]

realized, okay, I get it. I get the

[03:40]

idea. Do I really need to know this many

[03:42]

ways? And here too, with AI and with

[03:44]

code, can you increasingly sort of

[03:46]

master the ideas and then lean on a a

[03:49]

co-pilot assistant to actually help you

[03:51]

solve those same problems. So, let's do

[03:53]

some of this ourselves here. In fact,

[03:55]

just to give you a teaser of what you'll

[03:56]

be able to do yourselves before long,

[03:59]

let me go ahead and open up a little

[04:00]

something called Visual Studio Code, aka

[04:03]

VS Code for short. This is popular

[04:05]

largely open-source or free software

[04:07]

that's used by real world people in

[04:09]

industry to write code. And it's

[04:10]

essentially a text editor similar to

[04:12]

Notepad if you're familiar with that or

[04:14]

TextEdit, kind of like Google Docs but

[04:16]

no boldfacing and underlining and

[04:18]

things like that that you'd find in word

[04:19]

processing programs. And this is CS50's

[04:21]

version thereof. We're going to

[04:22]

introduce you to this all the more next

[04:24]

week. But for now, let's just give you a

[04:26]

taste of what you can do with an

[04:28]

environment like this. So I'm going to

[04:29]

switch over to this program already

[04:31]

running VS Code. And in this uh bottom

[04:35]

of the screen, you're going to see a

[04:36]

so-called terminal window. Again, more

[04:37]

on that next week. But it's in this

[04:38]

terminal window that I can write

[04:40]

commands that tell the computer what I

[04:41]

want it to do. For instance, let's

[04:43]

suppose just for the sake of discussion

[04:45]

that I want to make my own chatbot, not

[04:48]

ChatGPT or Gemini and Claude, like

[04:50]

let's make our own in some sense. So,

[04:52]

I'm going to code up a program called

[04:54]

chat.py. And you might be familiar that

[04:56]

I'm using a language here; .py means it's just

[05:00]

called Python. And if unfamiliar, you're

[05:01]

in good company. You'll learn that too

[05:03]

within a few weeks. And at the top of

[05:04]

the file here, I can write my code. And

[05:06]

at the bottom of the file of the window

[05:08]

here, I can run my code. So, here's how

[05:11]

relatively easy it is nowadays to write

[05:14]

even your own chatbot using the AI

[05:17]

technologies that we already have. I'm

[05:18]

going to go ahead and type a command

[05:20]

like import uh uh I'm going to go ahead

[05:23]

and type the following: from openai

[05:26]

import OpenAI. We'll learn what this

[05:28]

means ultimately, but what I'm going to

[05:30]

do is write my own program on top of an

[05:33]

API, application programming interface

[05:36]

that someone else provides, a big

[05:37]

company called OpenAI, and they're

[05:39]

providing features and functionality

[05:41]

that now I can write code against. I'm

[05:43]

going to create a so-called client,

[05:44]

which is to say a program of my own

[05:47]

that's going to use this OpenAI

[05:50]

software. And then I'm going to go ahead

[05:51]

and ask this software for a response.

[05:54]

And I'm going to set that equal to

[05:56]

client.responses.create

[05:59]

whatever all that means. And then inside

[06:01]

of these parentheses I'm going to say

[06:03]

the following. The input I want to give

[06:05]

to this underlying API is quote unquote

[06:08]

something like in one sentence

[06:11]

what is CS50? Much like I would ask

[06:13]

ChatGPT itself. If you're familiar with

[06:15]

things like ChatGPT and AI more

[06:17]

generally nowadays, you know there's

[06:18]

this thing called models which are like

[06:19]

statistical models that ultimately drive

[06:21]

what the AIs can do. I'm going to go

[06:23]

ahead and say model equals quote unquote

[06:24]

gpt-5, which is the latest and greatest

[06:27]

version at least as of today. Now down

[06:29]

in my terminal window I'm going to run a

[06:31]

different command python of chat.py and

[06:35]

so long as I have made no typographical

[06:37]

errors in this program I should be able

[06:39]

to ask OpenAI, not with chatgpt.com but

[06:44]

with my own code for the answer to some

[06:46]

question. But I want to know what the

[06:47]

answer to that question is. So, I

[06:49]

actually want to print out that response

[06:51]

by saying print(response.output_text). In

[06:55]

other words, these 10 lines, and it's

[06:57]

not even 10 lines because a few of them

[06:58]

are blank, I've implemented my own

[07:00]

chatbot that at the moment is hard-coded

[07:02]

that is permanently configured to only

[07:04]

answer one question for me. And let's

[07:07]

see, with the cross of the fingers, CS50

[07:10]

is Harvard University's introductory

[07:11]

computer science course, the

[07:12]

intellectual enterprises of computer

[07:14]

science and the art of programming.

[07:15]

weirdly familiar, covering problem

[07:16]

solving, algorithms, data structures, and

[07:18]

more using languages like C, Python, and

[07:19]

SQL. Okay, interesting. But let's make

[07:21]

the program itself more dynamic. Suppose

[07:24]

you wanted to write code that actually

[07:26]

asks the human what their question is

[07:28]

because very quickly might we want to

[07:30]

learn something more than just this one

[07:31]

question. So up here, I'm going to go

[07:33]

and change my code and type something

[07:35]

like this. Type prompt equals input with

[07:39]

parentheses. More on this another time,

[07:41]

too. But what I'm going to ask the user

[07:43]

for is to give me an actual prompt. That

[07:45]

is a question that I want this AI to

[07:47]

answer. And down here, what you'll

[07:49]

notice, even if you've never programmed

[07:51]

before, is that I can do something

[07:52]

somewhat intuitive in so far as line

[07:55]

five is now asking the human for input.

[07:57]

Let's just stipulate that this equal

[07:58]

sign means store that answer in a

[08:00]

variable called prompt where variables

[08:02]

just like in math x, y, or z. Let's go

[08:04]

ahead and store that in prompt. So the

[08:06]

input I want to give to OpenAI now is

[08:09]

that actual prompt. So, it's a

[08:10]

placeholder containing whatever

[08:12]

keystrokes the human typed in. If I now

[08:14]

run that same command again, python of

[08:16]

chat.py, hit enter, cross my fingers,

[08:20]

I'll see now dynamic prompting. So,

[08:23]

what's a question I might want to ask?

[08:24]

Well, let's just say it again. In one

[08:26]

sentence, whoops, in one sentence, what

[08:29]

is CS50? Question mark. Enter. And now

[08:32]

the answer comes back as probably

[08:36]

roughly the same but a little bit

[08:38]

different a variant thereof. But maybe

[08:40]

we can distill this even more

[08:42]

succinctly. How about let's run it

[08:43]

again. Python of chat.py and let's say

[08:45]

in one word what is CS50 and see if the

[08:49]

underlying AI obliges.
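
For reference, the program described so far might look roughly like the sketch below. It assumes the `openai` Python package is installed and that an API key is available in the `OPENAI_API_KEY` environment variable, neither of which the lecture spells out. The model name is written "gpt-5" on the assumption that this is the identifier behind the spoken name, and the helpers `build_request` and `ask` are illustrative wrappers, not names from the original file.

```python
def build_request(prompt, model="gpt-5"):
    """Gather the two arguments the lecture passes to the API."""
    return {"input": prompt, "model": model}


def ask(prompt):
    """Send one prompt to OpenAI's Responses API and return the reply text."""
    # Imported inside the function so that merely loading this file
    # does not require the openai package (an assumption of this sketch).
    from openai import OpenAI

    client = OpenAI()  # reads the API key from OPENAI_API_KEY
    response = client.responses.create(**build_request(prompt))
    return response.output_text
```

In the lecture these lines sit at the top level of chat.py instead of inside functions, so the equivalent of running the file would be something like `print(ask(input("Prompt: ")))`, where the "Prompt: " label is a guess.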

[08:52]

And after a pause: "Course," in a word. So

[08:56]

that's not all that incorrect. And maybe

[08:57]

we can have a little fun with this. Now

[08:59]

how about in one word which is

[09:04]

which is better maybe Harvard

[09:08]

or Stanford question mark hope you

[09:11]

picked right let's see the answer is

[09:16]

"depends." Okay, so it would not in fact oblige,

[09:19]

but notice what I keep doing in this

[09:21]

code I keep providing a prompt as the

[09:23]

human like in one sentence in one word

[09:25]

well if you want the AI to behave in a

[09:27]

certain way, why don't we just tell the

[09:29]

underlying system to behave in that way

[09:31]

so I the human don't have to keep asking

[09:33]

it in one sentence in one sentence in

[09:34]

one word so we can actually introduce

[09:36]

one other feature that you'll hear

[09:38]

discussed in industry nowadays which is

[09:40]

not only a prompt from the user which

[09:42]

I'm going to now temporarily rename to

[09:44]

user prompt just to make clear it's

[09:46]

coming from the user I'm going to also

[09:47]

give it what's called a system prompt

[09:50]

by setting this equal to some

[09:52]

standardized instructions that I want

[09:55]

the AI to respect like limit your answer

[09:59]

to one sentence, quote unquote. And now,

[10:02]

in addition to passing in as input the

[10:05]

user prompt, I'm going to actually tell

[10:07]

OpenAI to use these instructions

[10:10]

coming from this other variable called

[10:13]

system prompt. So, in other words, I'm

[10:15]

still using the same underlying service,

[10:17]

but I'm handing it now not only what the

[10:18]

user typed in, but also this

[10:20]

standardized text limit your answer to

[10:22]

one sentence. So, the human like me

[10:24]

doesn't have to do that anymore. Let's

[10:26]

now go back to my terminal. run Python

[10:27]

of chat.py once more, and this time

[10:30]

we'll be prompted but now I can just ask

[10:32]

what is CS50 question mark and I'll

[10:35]

likely get a correct and similar answer

[10:39]

to before and indeed it's Harvard

[10:41]

University's flagship introductory

[10:42]

computer science course dot dot dot so

[10:44]

seems spot on too but now we can have

[10:46]

some fun with this too and you might

[10:48]

know that these GPTs nowadays have sort

[10:50]

of personalities you can make them

[10:52]

obliged to behave in one way or another

[10:54]

why don't we go into our system prompt

[10:55]

here and say something silly like

[10:57]

pretend you're a cat. And now let's go

[11:00]

back to the prompt one final time. Run

[11:03]

Python of chat.py. Prompt again will be

[11:06]

say what is CS50? And with a final

[11:09]

flourish of hitting enter, what do we

[11:11]

get back?
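
As a rough sketch of the system-prompt version just described: the user's text goes in as `input` and the standing instructions go in as `instructions`, which is the parameter the Responses API offers for this purpose. The `openai` package, an API key in `OPENAI_API_KEY`, the "gpt-5" spelling of the model name, and the helper names `build_request` and `ask` are all assumptions of this example, not details confirmed by the lecture.

```python
def build_request(user_prompt, system_prompt, model="gpt-5"):
    """Pair what the user typed with the standing instructions."""
    return {
        "input": user_prompt,
        "instructions": system_prompt,
        "model": model,
    }


def ask(user_prompt, system_prompt="Limit your answer to one sentence."):
    """Return the model's reply to user_prompt under system_prompt."""
    # Imported here so loading the sketch does not require the package.
    from openai import OpenAI

    client = OpenAI()  # reads the API key from OPENAI_API_KEY
    response = client.responses.create(
        **build_request(user_prompt, system_prompt)
    )
    return response.output_text
```

Changing only the system prompt, for instance to "Limit your answer to one sentence. Pretend you're a cat.", changes the behavior without the user having to retype the instruction each time.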

[11:14]

CS50 is Harvard University's

[11:15]

introductory computer science course

[11:16]

teaching programming algorithms, data

[11:18]

structures, and problem solving. And

[11:19]

it's available free online. Meow. So

[11:21]

that was enough to coerce this

[11:23]

particular behavior. So this is to say

[11:26]

that with programming, you have the

[11:27]

ability in like 10 lines of text, not

[11:30]

all of which you might understand yet,

[11:32]

but that's the whole point of a class

[11:33]

like this to build fairly powerful

[11:35]

things, maybe silly things like this,

[11:37]

but in fact, it's using these same

[11:39]

primitives that CS50 has its own virtual

[11:41]

rubber duck. And we'll talk more about

[11:42]

this in the weeks to come, but long

[11:44]

story short, in the world of

[11:45]

programming, it's kind of a thing to

[11:47]

keep a rubber duck literally on your

[11:49]

desk or really any inanimate cute object

[11:51]

like this because when you are

[11:53]

struggling with some problem, some bug

[11:55]

or mistake in your code and you don't

[11:57]

have a friend, a teaching assistant, a

[11:58]

parent or someone else who's more

[12:00]

knowledgeable than you about code, well,

[12:02]

you literally are encouraged in

[12:03]

programming circles to like talk to the

[12:05]

rubber duck. And it's through that

[12:07]

process of just verbalizing your

[12:08]

confusion and organizing your thoughts

[12:10]

enough to convey it to another person or

[12:13]

duck in this case that so often that

[12:14]

proverbial light bulb goes off and you

[12:16]

realize ah I'm being an idiot now I hear

[12:18]

in my own thoughts the illogic or the

[12:20]

mistake I'm making and you solve that

[12:22]

problem as well. So CS50 drawing

[12:24]

inspiration from this will give to you a

[12:27]

virtual duck in computer form and in

[12:29]

fact among the other URLs you'll use

[12:31]

over the course of the semester is that

[12:32]

here, cs50.ai, which is also built into

[12:35]

that previous URL, cs50.dev, whereby

[12:38]

these are the AIs you can use in CS50 to

[12:41]

solve problems and you are encouraged to

[12:43]

do so as you'll see in the course

[12:44]

syllabus it is not reasonable it is not

[12:46]

allowed to use AI based software other

[12:48]

than CS50's own, be it Claude, Gemini,

[12:51]

ChatGPT, or the like, but it is reasonable and

[12:53]

very much encouraged along the way to

[12:55]

turn not only to humans like me your

[12:57]

teaching assistant and others in the

[12:59]

class but to CS50's own AI based

[13:01]

software and what you'll find is that

[13:02]

this virtual duck is designed to behave

[13:05]

as close to a good human tutor as you

[13:08]

might expect from an actual human in the

[13:10]

real world: it knows about CS50 and knows how to

[13:12]

lead you to a solution ideally without

[13:14]

simply spoiling it and providing it

[13:16]

outright. So with that said that's sort

[13:19]

of the endgame to be able to write code

[13:21]

like that and more. But let's really

[13:23]

start back at the beginning and see how

[13:25]

we can get from zeros and ones that

[13:28]

computers speak all the way back to

[13:30]

artificial intelligence. So computer

[13:32]

science is in the name of the course

[13:34]

computer science 50. But what is that?

[13:35]

Well, it's really just the study of

[13:37]

information. How do you represent it?

[13:39]

How do you process it? And very much

[13:40]

germane to computer science is what the

[13:42]

world calls computational thinking,

[13:43]

which is just the application of ideas

[13:45]

from computer science or CS to problems

[13:49]

generally in the real world. And in

[13:51]

fact, that's ultimately, I dare say,

[13:52]

what computer science really is. It's

[13:54]

about problem solving. And even though

[13:56]

we use computers, you learn how to

[13:58]

program along the way, these are really

[13:59]

just tools and methodologies that you

[14:02]

can leverage to solve problems. Now,

[14:04]

what does that mean? Well, a problem is

[14:06]

perhaps most easily distilled into a

[14:08]

simple picture like this. We've got some

[14:10]

input, which is like the problem we want

[14:11]

to solve, and the output, which is the

[14:13]

goal we want, the solution there, too.

[14:15]

And then somewhere in the middle here is

[14:16]

the proverbial black box, the sort of

[14:18]

secret sauce that gets that output from

[14:20]

input. So, this then I would say in

[14:22]

essence is problem solving and thus

[14:24]

computer science. But we have to agree,

[14:27]

especially if we're going to use

[14:28]

devices, Macs, PCs, phones, whatever.

[14:30]

How do we all represent information, the

[14:32]

inputs and the outputs, in some

[14:34]

standardized way? Is it with English? Is

[14:36]

it with something else? Well, you all

[14:37]

probably know, even if you're not

[14:38]

computer people, that at the end of the

[14:40]

day, computers somehow use zeros and ones

[14:43]

entirely. That is their entire alphabet.

[14:45]

And in fact, you might be familiar

[14:47]

already with certain such systems. So

[14:49]

the unary uh notation, which means you

[14:52]

essentially use single digits like

[14:54]

fingers on your hand. For instance,

[14:55]

unary aka base one is something you can

[14:57]

do on your own human hand. So for

[14:59]

instance, with one human hand, how high

[15:00]

can I count?

[15:02]

>> All right, so hopefully 1 2 3 4 5 and if

[15:05]

you want to count to six and uh to 11

[15:08]

and 10 and so forth, you need to, you

[15:10]

know, take out another hand or your toes

[15:12]

or the like because it's fairly

[15:13]

limiting. But if I think a little

[15:14]

harder, instead of just using unary,

[15:16]

what if I use a different system

[15:18]

instead? What about something like

[15:20]

binary? Well, how high if you think a

[15:22]

little harder can you count on one human

[15:23]

hand?

[15:25]

So 31 says someone who studied computer

[15:27]

science before. But why is that? It's

[15:29]

kind of hard to imagine, right? Because

[15:31]

1 2 3 4 5 seems to be the five possible

[15:34]

patterns. But that's only when you're

[15:36]

looking at the totality of fingers that

[15:37]

are actually up. Five in total or four

[15:39]

in total or one or the like. But what if

[15:41]

we take into account the pattern of

[15:43]

fingers that are up and we just

[15:44]

standardize what each of those fingers

[15:46]

represent? So maybe we all agree like a

[15:49]

good computer would too that maybe no

[15:51]

fingers up means the number zero. And if

[15:53]

we want to count to one, let's go with

[15:54]

the obvious. This is now one. But

[15:57]

instead of two being this, which was my

[15:59]

first instinct, maybe two can just be

[16:02]

this. A single second finger up like

[16:05]

this. And that means we could now use

[16:08]

two fingers up to represent three. I'll

[16:11]

propose we can use just one middle

[16:13]

finger up to offend everyone, but

[16:15]

represent four. I could maybe use these

[16:18]

two fingers with some difficulty to

[16:20]

represent five, six, seven. I'm already

[16:24]

up to seven having used only three

[16:26]

fingers. And in fact, if we keep going

[16:27]

higher and higher, I bet I can get as

[16:30]

high as 31 for 32 possible combinations,

[16:33]

but the first one was zero. So that's as

[16:35]

high as we can count. So we'll make this

[16:36]

connection in just a moment. But what I

[16:38]

started to do there is something called

[16:40]

base 2. Instead of just having fingers

[16:42]

up or fingers down, I'm taking into

[16:44]

account the positions of those fingers

[16:46]

and giving meaning to like this finger

[16:49]

here, this finger here, this finger here

[16:51]

and so forth. Different weights if you

[16:52]

will. So the binary system is indeed all

[16:56]

computers understand. And you might be

[16:58]

familiar with some terminology here.
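
The finger-counting argument can be checked with a few lines of Python. This is a minimal sketch: the function names are made up for illustration, and each finger is modeled as True (up) or False (down), listed from the most significant position to the least.

```python
def unary_max(fingers):
    """In unary, n fingers can count up to n (one finger per unit)."""
    return fingers


def binary_max(fingers):
    """In binary, n fingers give 2**n patterns, so the top value is 2**n - 1."""
    return 2 ** fingers - 1


def fingers_to_number(fingers_up):
    """Convert a sequence of up/down fingers to the number it represents."""
    value = 0
    for up in fingers_up:
        # Shift what we have one position left, then add this finger's bit.
        value = value * 2 + (1 if up else 0)
    return value
```

With five fingers, `unary_max(5)` is 5 while `binary_max(5)` is 31, and a lone middle finger, `fingers_to_number([False, False, True, False, False])`, comes out as 4, matching the demonstration.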

[17:00]

Binary digit is not really something

[17:01]

anyone really says, but the shorthand

[17:03]

for that is going to be bit. So if

[17:06]

you've heard of bits and we'll soon see

[17:08]

bytes and then kilobytes and megabytes

[17:10]

and gigabytes and terabytes and more.

[17:12]

This just refers to a bit meaning a

[17:15]

single binary digit either a zero or a

[17:19]

one. A zero is perhaps most simply

[17:22]

represented by just like turning maybe

[17:24]

keeping a finger down or in the world of

[17:26]

computers which have access to

[17:28]

electricity be it from the wall or maybe

[17:30]

a battery. You know what we could do? We

[17:33]

could just decide sort of universally

[17:35]

that when a light bulb is off, that

[17:38]

thing represents a zero. And when the

[17:39]

light bulb is on, that thing's going to

[17:41]

represent a one instead. Now, why is

[17:44]

this? Well, electricity is such a simple

[17:45]

thing, right? It's either flowing or

[17:47]

it's not. And we don't even have to

[17:49]

therefore worry about how much of it is

[17:51]

flowing. And if you vaguely remember

[17:53]

a little bit about voltage, we can sort

[17:54]

of be like zero volts, nothing's there

[17:56]

available for us. Or maybe it's 5 volts

[17:58]

or something else in between. But what's

[18:00]

nice about binary only using zeros and

[18:03]

ones is that it maps really nicely to

[18:05]

the real world by like throwing a light

[18:07]

switch on and off. You can represent

[18:09]

information by just using a little bit

[18:11]

of electricity or the lack thereof. So

[18:13]

what do I mean by this? Well, suppose we

[18:15]

want to start counting using binary

[18:18]

zeros and ones only. Well, let's think

[18:20]

of them metaphorically as like akin to

[18:22]

these light bulbs here. And in fact, let

[18:24]

me grab a few of these light bulbs and

[18:25]

let me propose that if we want to

[18:27]

represent the number zero, well, it

[18:29]

stands to reason that here a single light

[18:32]

bulb that is off can be agreed upon as

[18:34]

representing zero. Now, in practice,

[18:37]

computers don't have little light bulbs

[18:38]

inside, but they do have little switches

[18:40]

inside. Millions of tiny little things

[18:43]

called transistors that if turned on can

[18:45]

allow it to capture a little bit of

[18:47]

electricity and effectively turn on a

[18:49]

metaphorical bulb or the switch can go

[18:51]

off. the transistor can go off and

[18:53]

therefore let the electricity dissipate

[18:54]

and you have just now a zero.

[18:56]

Unfortunately, even though I can let

[18:59]

some electricity, there's the battery I

[19:02]

mentioned is required. Even though we

[19:04]

might have some electricity available to

[19:06]

us, I can therefore count to one. But

[19:08]

how do I go about counting?

[19:11]

Hardware problem. How do I go about

[19:13]

counting higher than one with just a

[19:16]

light bulb?

[19:18]

Yeah. So, I need more of them. So, let

[19:20]

me grab another one here. And now I

[19:22]

could put it next to it. And this, too,

[19:24]

I'll claim is just still the number one.

[19:26]

But if I want to turn two of them on,

[19:28]

well, that would mean I could count to

[19:30]

two. And if I maybe grab another one,

[19:32]

now I can count as high as three. But

[19:34]

wait a minute. I'm doing something wrong

[19:36]

because with three human fingers, how

[19:37]

high was I able to count?

[19:40]

So, seven in total, starting at zero.

[19:42]

So, I've done something wrong here. But

[19:43]

let me be a little more clever then

[19:45]

about the pattern that I'm actually

[19:46]

using. Perhaps this can still be one.

[19:50]

But just like my finger went up and only

[19:52]

one finger in the second version of

[19:54]

this, this can be what we represent as

[19:58]

two. Which one do I want to turn on as

[20:00]

three? Your left or your right?

[20:03]

>> So, your right, because now this matches

[20:05]

what I was doing with my fingers a

[20:06]

moment ago. And I claimed we could

[20:08]

represent three like this. If we want to

[20:09]

represent four, that's fine. We have to

[20:11]

turn that off, this off, and this on.

[20:15]

And that's somehow four. And let's go

[20:17]

all the way up to seven. Which ones need

[20:19]

to be on to represent the number seven?

[20:21]

All right. So, all of them here. Now, if

[20:23]

you're not among those who just sort of

[20:25]

naturally said all of them, like what

[20:27]

the heck is going on? How do half the

[20:29]

people in this room know what these

[20:30]

patterns are supposed to be? Well, maybe

[20:32]

you're remembering what I did with my

[20:33]

fingers. But it turns out you're already

[20:35]

pretty familiar with systems like this,

[20:37]

even if you might not have put a name to

[20:39]

it. So in the human world, the real

[20:41]

world, most of us deal every day with

[20:43]

the so-called base 10 system, otherwise

[20:45]

known as decimal, "dec" implying 10,

[20:47]

because in the decimal system you have

[20:49]

10 digits available to you, 0 through 9.

[20:52]

In the binary system, we only had two, "bi"

[20:54]

implying two, so 0 and 1. And in unary, we

[20:58]

had just one, a single digit there or

[21:01]

not. So in the decimal system, we just

[21:03]

have more of a vocabulary to play with.

[21:05]

And yet you and I have been doing this

[21:07]

since grade school. So this is obviously

[21:08]

the number 123. But why? It's

[21:11]

technically just three symbols. 1 2 3.

[21:14]

But for most of us, your mind's eye goes,

[21:16]

okay, 123. Pretty obvious, pretty

[21:18]

natural. But at some point, you like me

[21:21]

were probably taught that this is the

[21:22]

ones place and this is the tens place

[21:25]

and this is the hundreds place and so

[21:28]

forth. And the reason that this pattern

[21:30]

of symbols 1 2 3 is 123 is that we're

[21:35]

all doing some quick mental math and

[21:36]

realizing, well, that's 100 * 1 + 10 * 2 +

[21:39]

1 * 3. Oh, okay. There's how we get 100

[21:42]

+ 20 + 3 gives us the number we all know

[21:45]

mathematically is 123. Well, it turns

[21:48]

out whether you're using decimal or

[21:50]

binary or other base systems that we'll

[21:52]

talk about later in the course, the

[21:53]

system is still fundamentally the same.
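
The place-value arithmetic above can be written out as a short sketch. The function names here are hypothetical, and digits are given most significant first; the same code works for base 10 or base 2 because only the column weights change.

```python
def place_value_terms(digits, base):
    """Pair each digit with its column weight, most significant first."""
    n = len(digits)
    return [(base ** (n - 1 - i), d) for i, d in enumerate(digits)]


def to_number(digits, base):
    """Sum weight * digit over every column."""
    return sum(weight * d for weight, d in place_value_terms(digits, base))
```

For the digits 1, 2, 3 in base 10 the terms are (100, 1), (10, 2), and (1, 3), which sum to 123; the same three-column layout in base 2 uses the weights 4, 2, and 1.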

[21:55]

Let's kind of generalize this away.

[21:56]

Here's a three-digit number in some base

[21:59]

system specifically in decimal. And I

[22:01]

know that only because of the

[22:02]

placeholders that I've got on top of

[22:04]

each of these numbers. But if we do a

[22:05]

little bit of math here, 1 10 100 1,000

[22:09]

10,000 and so forth. What's the pattern?

[22:11]

Well, technically this is 10 to the 0, 10

[22:13]

to the 1, 10 to the 2, and so forth. And we're

[22:16]

using 10 because we can use as many as

[22:19]

10 digits under each of those columns.

[22:22]

But if we take some of those digits away

[22:23]

and go from decimal down to binary, the

[22:26]

motivation being it's way easier for a

[22:29]

computer to distinguish electricity

[22:30]

being on or off than coming up with like

[22:34]

10 unique levels of electricity to

[22:36]

distinguish among. You could do it. It

[22:38]

would be annoying and difficult to build

[22:39]

in hardware. You could do it so much

[22:41]

simpler to just say on and off. It's a

[22:45]

nice simple world that way. So let's

[22:47]

change the base from 10 to two. And what

[22:50]

does this get us? Well, if we now do

[22:51]

the math, that's 2 to the 0 is 1. 2

[22:54]

to the 1 is 2. 2 to the 2 is 4. So the

[22:57]

mental math is now about to be

[22:59]

the same, but the columns represent

[23:01]

something a little bit different. So for

[23:03]

instance, if I turn all of these off

[23:05]

again, such that I've got off, off, off,

[23:08]

otherwise known as 0 0 0, it's zero

[23:12]

because it's 4 * 0 + 2 * 0 + 1 * 0 still

[23:17]

gives me zero. By contrast, if I turn on

[23:20]

maybe just this one all the way over on

[23:22]

the left, well, that's four times one

[23:26]

because on represents one and off

[23:28]

represents 0 plus 2 * 0 + 1 * 0, that

[23:31]

gives me four. And if I turn both of

[23:33]

these on, such that all three of them

[23:36]

are now on, on, on, aka 1 1 1,

[23:40]

that's 4 * 1 + 2 * 1 + 1 * 1. That then

[23:45]

gives me seven. And we can keep adding

[23:47]

more and more bits to this. In fact, if

[23:49]

we go all the way up uh numerically,

[23:51]

here's how we would represent in binary

[23:53]

the number you and I know is zero.

[23:55]

Here's how we would represent one.

[23:58]

Here's how we would represent two and

[24:00]

three and four and five. And you can

[24:03]

kind of see in your mind's eye now

[24:04]

because I only have zeros and ones and

[24:06]

no twos or threes, not to mention nines,

[24:09]

I'm essentially going to be carrying a

[24:11]

one in a moment if we were to be doing

[24:12]

some math. So to go from five to six,

[24:15]

that's why the one ends up in the middle

[24:17]

column. To go to seven here gives us now

[24:19]

1 1 1, or on, on, on. How do I represent

[24:22]

eight

[24:24]

using ones and zeros? Yeah,

[24:27]

>> We need to add another digit.
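As a rough sketch of the column arithmetic so far, here it is in Python (a language the lecture has not introduced at this point). The function name `bits_to_decimal` is only illustrative, not something from the course.

```python
def bits_to_decimal(bits: str) -> int:
    """Add up the place values (1, 2, 4, 8, ...) of every bit that is on."""
    total = 0
    # Walk the columns from right to left: ones place, twos place, fours place...
    for position, bit in enumerate(reversed(bits)):
        if bit == "1":
            total += 2 ** position
    return total


print(bits_to_decimal("000"))   # off, off, off
print(bits_to_decimal("100"))   # only the fours column is on
print(bits_to_decimal("111"))   # 4 + 2 + 1
print(bits_to_decimal("1000"))  # a fourth column, worth eight
```

With three bits the largest total is 7, so representing 8 takes the extra column shown in the last call.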

[24:28]

>> Yeah. So we're going to need to add

[24:29]

another digit. We need to throw hardware

[24:31]

at the problem using an additional digit

[24:33]

so that we actually have a column

[24:35]

representing eight. Now, as an aside,

[24:37]

and we'll talk about this before long,

[24:38]

if you don't have an additional digit

[24:41]

available, if your computer doesn't have

[24:43]

enough memory, so to speak, you might

[24:45]

accidentally count from 0 1 2 3 4 5 6 7

[24:49]

and then accidentally end up back at

[24:51]

zero. Because if there's no room to

[24:53]

store the fourth bit, well, all you have

[24:56]

is part of the number. And this is going

[24:58]

to create all sorts of problems then

[25:00]

ultimately in the real world. So let me

[25:02]

go ahead and put these back and propose

[25:04]

that we have a system now, if you agree

[25:08]

to sort of count numbers in this way, via

[25:10]

which we can represent information in

[25:12]

some standard way and all the device

[25:14]

underneath the hood needs is a bit of

[25:16]

electricity to make this work. It's got

[25:18]

to be able to turn things on aka use

[25:20]

some transistors and it's got to be able

[25:21]

to turn those things off so as to

[25:23]

represent zeros instead of ones. But the

[25:26]

reality is like two bits, three bits,

[25:28]

four bits aren't very useful in the real

[25:30]

world because even with three bits you

[25:32]

can count to seven, with four you can

[25:33]

count to 15. These aren't very big

[25:36]

numbers. So it tends to be more common

[25:38]

to actually use units of measure of

[25:40]

eight bits at a time. A byte is just

[25:43]

that: one byte is eight bits. So if

[25:46]

you've ever used the vernacular of

[25:47]

kilobytes, megabytes, gigabytes, that's

[25:49]

just referring to some number of bits.

[25:52]

But eight of them together compose one

[25:55]

individual byte. So here for instance is

[25:58]

a byte's worth of bits. Eight of them

[26:00]

total. I've added all the additional

[26:02]

placeholders. And what number does this

[26:04]

represent in decimal even though you're

[26:06]

looking at eight binary digits?

[26:09]

>> Just zero, because like literally every

[26:10]

column is a zero. Now this is a bit more

[26:13]

of mental math, unless you know it

[26:15]

already. What if I change all of the

[26:16]

zeros to ones? I turn all eight light

[26:18]

bulbs on. What number is this?

[26:21]

>> Yeah. So 255. Now for those of you

[26:24]

who didn't get that instantly, that's

[26:25]

fine. You could certainly do the math

[26:26]

manually. I dare say some of you have

[26:28]

some prior knowledge of how to do this

[26:30]

sort of system. But 255 means that if

[26:34]

you start counting at zero and you go

[26:35]

all the way up to 255, okay, that's 256

[26:39]

total possibilities once you include

[26:41]

zero in the total number of patterns of

[26:44]

zeros and ones. And this is just going

[26:45]

to be one of these common numbers in

[26:47]

computer science. 256. Why? Because it's

[26:50]

referring to eight of something. 2 to

[26:52]

the 8 gives you 256. And so you're going

[26:56]

to commonly see certain values like

[26:57]

that. 256. Back in the day, computers

[26:59]

could only show 256 colors on the

[27:02]

screen. Certain graphics formats

[27:04]

nowadays that you might download can

[27:06]

only use as many as 256 colors because,

[27:08]

as we'll see, they're only using, for

[27:10]

instance, eight bits, and therefore they

[27:13]

can only represent so many colors of the

[27:15]

rainbow as a result. So this then is how

[27:20]

we might go from just zeros and ones

[27:22]

electricity inside of a computer to

[27:24]

storing actual numbers with which we're

[27:26]

familiar. And honestly we can go higher

[27:27]

than 255. What do you need to count

[27:29]

higher than 255? A 9th bit, a 10th bit,

[27:32]

an 11th bit and so forth. And it turns

[27:34]

out common conventions nowadays and

[27:36]

we'll see this in code too is to use as

[27:38]

many as 32 bits at a time. So that's a

[27:42]

good chunk of bits. And anyone want to

[27:43]

ballpark how high you can count if

[27:45]

you've got 32 bits available to you?

[27:50]

Oh, fewer people now. Yeah, in the back.

[27:53]

>> Yeah. So, it's roughly 4 billion. And

[27:55]

it's technically two billion if you also

[27:56]

want to represent negative numbers, but

[27:58]

we'll revisit that question. But 2 to

[28:00]

the 32nd power is roughly 4 billion.
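A small Python sketch can check these counts. The helper names are hypothetical. The halving for negative numbers assumes half of the patterns are set aside for negatives, which the lecture defers to later. The `increment` helper mimics the earlier aside about counting past 7 with only three bits.

```python
def patterns(bits: int) -> int:
    """Number of distinct patterns of zeros and ones that fit in `bits` bits."""
    return 2 ** bits


def largest_unsigned(bits: int) -> int:
    """Highest number you can count to when you start counting at zero."""
    return patterns(bits) - 1


def increment(value: int, bits: int) -> int:
    """Add one, keeping only the lowest `bits` bits (anything higher is lost)."""
    return (value + 1) % patterns(bits)


print(largest_unsigned(3))   # 7
print(largest_unsigned(8))   # 255
print(patterns(8))           # 256
print(patterns(32))          # 4294967296, roughly 4 billion
print(patterns(32) // 2)     # 2147483648, roughly 2 billion
print(increment(7, 3))       # 0: a 3-bit counter wraps back around
```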

[28:03]

However, nowadays it's even more common

[28:05]

with the Macs and PCs you might have on

[28:07]

your laps and even your phones nowadays

[28:08]

to use 64 bits, which is a big enough

[28:11]

number that I'm not even sure offhand

[28:13]

how to pronounce it. That's a lot of

[28:15]

permutations. That's 2 to the 64

[28:17]

possible permutations, but that's

[28:19]

increasingly commonplace. And as an

[28:21]

aside, just to dovetail things with our

[28:22]

discussion of AI, among the reasons that

[28:24]

we're living through over these past few

[28:26]

years, especially this crazy interesting

[28:28]

time of AI, is because computers have

[28:31]

been getting so much faster,

[28:33]

exponentially so over time, they have so

[28:35]

much more memory available to them.

[28:37]

There's so much data out there on the

[28:38]

internet in particular to train these

[28:40]

models that it's an interesting

[28:42]

confluence of hardware now actually

[28:44]

meeting the mathematics and statistics

[28:45]

that we'll talk about later in the class

[28:47]

that ultimately make tools like the cat

[28:49]

we just built possible. But of course

[28:52]

computers are not all math and in fact

[28:53]

we'll use very little math per se in

[28:55]

this class. And so let's move away

[28:57]

pretty quickly from just zeros and ones

[28:59]

and talk about letters of the alphabet.

[29:00]

Say in English here is the letter A.

[29:03]

Suppose you want to use this letter in

[29:05]

an email, a text message, or any other

[29:07]

program. What is the computer doing

[29:08]

underneath the hood? How can the

[29:10]

computer store a capital letter A in

[29:14]

English, if at the end of the day, all

[29:16]

the computer has access to is a source

[29:18]

of electricity from the wall or from a

[29:21]

battery and it has a lot of switches

[29:24]

that it can turn on and off and treat

[29:26]

the electricity in units of 8 or 32 or

[29:28]

64 or whatever.

[29:31]

How might a computer represent a letter

[29:33]

A?

[29:36]

>> Yeah, we need to give it an identity so

[29:38]

to speak as an integer. In other words,

[29:40]

at the end of the day, if your entire

[29:42]

canvas, so to speak, consists only of

[29:44]

zeros and ones. Like that is going to be

[29:46]

the answer to every question today. You

[29:48]

only have zeros and ones as the solution

[29:50]

to these problems. We just need to agree

[29:53]

what pattern of zeros and ones and

[29:54]

therefore what integer, what number

[29:57]

shall be used to represent the letter A.

[30:00]

And hopefully when we look at that

[30:02]

pattern of zeros and ones in the right

[30:04]

context, we'll indeed see it as an A. So

[30:06]

if we look inside of a computer so to

[30:07]

speak in the context of like a text

[30:09]

messaging program or a word processor or

[30:12]

anything like that, that pattern shall

[30:14]

be interpreted hopefully as a capital

[30:15]

letter A. But if I open up macOS's or

[30:17]

Windows or my phone's calculator

[30:19]

program, I would want that same pattern

[30:21]

of zeros and ones to be interpreted

[30:24]

instead as a number. If I open up

[30:26]

Photoshop, as we'll soon see, I want

[30:28]

that same pattern of zeros and ones to

[30:30]

be interpreted as a color presumably,

[30:33]

not to mention videos and sound and so

[30:34]

forth, but it's all just zeros and ones.

[30:36]

And so, even though I, when writing that

[30:38]

chat program a few minutes ago, didn't

[30:41]

have to worry about telling the

[30:42]

computer, oh, this is text, this is a

[30:44]

number, this is something else. We'll

[30:46]

see as we write code ourselves that you

[30:48]

as the programmer will have control over

[30:50]

telling the computer how to treat some

[30:53]

pattern of zeros and ones telling it

[30:55]

this is a number, this is a color, this

[30:56]

is a letter or something else. Um, how

[30:59]

do we represent the letter A? Well,

[31:01]

turns out a bunch of humans in a room

[31:03]

years ago decided, ah, this pattern of

[31:05]

zeros and ones shall be known globally

[31:08]

as a capital letter English A. What is

[31:12]

that number if you do the quick mental

[31:13]

math? So indeed 65 because we had a one

[31:16]

in the 64's place and a one in the ones

[31:19]

place. So 65 that's just sort of it. It

[31:21]

would have been nice if it were just the

[31:22]

number one or maybe the number zero. But

[31:25]

at least after the capital letter A,

[31:27]

they kept things consistent such that if

[31:30]

you want to represent a letter B, it's

[31:32]

going to be 66. Capital letter C, it's

[31:34]

going to be 67. Why? Because the humans

[31:36]

in this room, a bunch of Americans at

[31:38]

the time, standardized on what's called

[31:39]

ASCII, the American Standard Code for

[31:42]

Information Interchange. Doesn't matter

[31:43]

what the acronym represents, but it was

[31:46]

just a mapping. Someone on a piece of

[31:47]

paper essentially started writing down

[31:49]

letters of the alphabet and

[31:50]

corresponding numbers so that computers

[31:53]

subsequently could all speak that same

[31:55]

standard representation. And here's an

[31:57]

excerpt thereof. In this case, we're

[31:59]

seeing seven bits worth, but eventually

[32:00]

we ended up using eight bits in total to

[32:03]

represent letters. And some of these are

[32:05]

fairly cryptic. Maybe more on those

[32:06]

another time. But down here, if we

[32:08]

highlight just one column, we'll see

[32:09]

that indeed on this cheat sheet, 65 is

[32:12]

capital A, 66 is B, 67 is C, and so

[32:16]

forth. So, why don't we do a little

[32:18]

exercise here? What pattern of zeros and

[32:21]

ones do I see here? I've got three

[32:23]

bytes, so three sets of eight bits. And

[32:26]

even though there's no placeholders now

[32:28]

over the columns, what is this

[32:31]

number?

[32:33]

It's 60. Yeah. Yeah. So, we got the

[32:36]

ones, twos, fours, 8s, uh, 16, 32, 64s

[32:42]

column. So, indeed, this is going to be

[32:43]

the number 72. 72. This is not what

[32:46]

computer scientists spend their day

[32:48]

doing. This is just to reinforce what it

[32:49]

is we just looked at. And I'll spoil it.

[32:51]

All of these numbers are 72, 73, 33.

[32:54]

And anyone in this room could have done

[32:56]

that if you took out a piece of paper,

[32:57]

figured out what the columns are, and

[32:59]

just did a bit of quick mental or

[33:01]

written math. But this is to say,

[33:03]

suppose that you just got a text message

[33:04]

or an email, and you had the ability

[33:07]

to look underneath the hood of the

[33:09]

computer and see what pattern of zeros

[33:11]

and ones you just received over the

[33:13]

internet. Suppose that pattern of zeros

[33:14]

and ones was three bytes of bits, which

[33:18]

when you do the math are the numbers 72,

[33:20]

73, 33. Well, here's the cheat sheet

[33:23]

again. What message did you just get?

[33:26]

>> Yeah. So, it's HI. Why? Because 72 is

[33:29]

H and 73 is I. Now, some of you said HI

[33:32]

fairly emphatically. Why? Well, 33 turns

[33:35]

out, and you wouldn't know this unless

[33:36]

you looked it up or someone told you, is

[33:39]

an exclamation point. So, literally, if

[33:41]

you were to text someone like right now,

[33:42]

if you haven't already, HI exclamation

[33:45]

point in all caps, you would essentially

[33:47]

be sending three bytes of information

[33:49]

somehow over the internet to that

[33:51]

recipient. And because their phone

[33:53]

similarly understands ASCII because it was

[33:56]

programmed years ago to do so, it knows

[33:58]

to show you HI exclamation point and not

[34:02]

a number (three numbers, no less) or colors

[34:04]

or something else altogether. So here we

[34:07]

then have HI, three numbers in a row here.
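To make the decoding concrete, here is a minimal Python sketch. The list `received` is a made-up stand-in for the three bytes arriving over the network, written out as 8-bit patterns that correspond to 72, 73, and 33.

```python
# Three bytes as they might arrive, written as patterns of eight bits each.
received = ["01001000", "01001001", "00100001"]

numbers = [int(pattern, 2) for pattern in received]  # binary -> decimal
message = bytes(numbers).decode("ascii")             # decimal -> ASCII characters

print(numbers)  # [72, 73, 33]
print(message)  # HI!
```

The same three numbers would mean something else entirely if a program chose to treat them as plain integers or as colors; `decode("ascii")` is what supplies the "this is text" context.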

[34:10]

Um what else is worth noting here? Well,

[34:12]

there's some fun sort of trivia embedded

[34:15]

even in this cheat sheet. So here again

[34:16]

is A, B, C, D, E, F, G, and so forth, 65 on

[34:20]

down. Let me just highlight over here

[34:23]

the lowercase letters, 97, 98, 99, and so

[34:27]

forth. If I go back and forth, does

[34:30]

anyone notice the consistent pattern

[34:33]

between these two?

[34:35]

>> Yeah. So, the lowercase letters are 32

[34:38]

away from the uppercase letters. Well,

[34:40]

how do we know that? Well, 97 - 65 is

[34:43]

Yeah, 32. Uh, 98 - 66 is, okay, 32. And

[34:48]

that pattern continues. What does this

[34:50]

mean? Well, computers know how to do

[34:51]

this. Most normal humans don't need this

[34:53]

information. But what it means is if you

[34:55]

are representing in binary with your

[34:58]

transistors on and off representing some

[35:00]

pattern and this is the pattern

[35:01]

representing capital letter A, which is

[35:03]

why we have a one in the 64's place and

[35:05]

a one in the ones place. How does a

[35:08]

computer go about lowercasing this same

[35:11]

letter? Yeah,

[35:15]

>> Perfect. All the computer has to do is

[35:17]

change this one bit in the 32's place to

[35:21]

a one because that has the effect

[35:22]

mathematically per our discussion of

[35:24]

adding the number 32 to whatever it is.
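The single-bit trick can be sketched in Python with bitwise operators. `CASE_BIT`, `to_lower`, and `to_upper` are illustrative names, and the sketch assumes its input is one of the 26 English letters; other characters are not handled.

```python
CASE_BIT = 0b00100000  # the 32s place


def to_lower(letter: str) -> str:
    """Turn on the 32s bit of an uppercase ASCII letter."""
    return chr(ord(letter) | CASE_BIT)


def to_upper(letter: str) -> str:
    """Turn off the 32s bit of a lowercase ASCII letter."""
    return chr(ord(letter) & ~CASE_BIT)


print(format(ord("A"), "08b"), ord("A"))  # 01000001 65
print(format(ord("a"), "08b"), ord("a"))  # 01100001 97
print(to_lower("A"), to_upper("a"))       # a A
```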

[35:27]

So it turns out you can force text from

[35:29]

uppercase to lowercase or back by just

[35:31]

changing a single bit inside of that

[35:34]

pattern of eight bits in total. All

[35:37]

right, why don't we maybe reinforce this

[35:38]

with another quick exercise? We have an

[35:40]

opportunity perhaps here for um maybe to

[35:42]

give you some stress balls right at the

[35:44]

very start of class. Could we get eight

[35:45]

volunteers to come up on stage? Maybe

[35:48]

over here and over here and uh over here

[35:51]

on the left. Let me go all the way on

[35:52]

the right. Uh let's see. Okay, the high

[35:55]

hand here. The hand that's highest

[35:57]

there. Yes, we're making eye contact.

[35:58]

How about all the way? Wait, let's see.

[36:00]

Let's go here in the crimson sweatshirt

[36:03]

here. And how about in the the white

[36:04]

shirt here? Come on up. Did I count

[36:06]

correctly? Let's see.

[36:10]

Come on down. The eight of you. I didn't

[36:13]

count right, did I? 1 2 3 4 5 6. It's

[36:17]

ironic that I'm not counting correctly.

[36:18]

Eight here. How about on the left in

[36:20]

gray? Okay. Oh, and uh Okay. In black

[36:23]

here. Come on down. All right.

[36:24]

Hopefully, this is eight. 1 2 3 4 5 6 7.

[36:30]

I pretty. Okay. Eight. There we go. All

[36:32]

right. So, let's go ahead and do the

[36:34]

following exercise. I've got some sheets

[36:36]

of paper preprinted here. If each of you

[36:38]

indeed want to do exactly what you're

[36:39]

doing and line up from left to right,

[36:40]

each of you is going to represent a

[36:42]

placeholder essentially. So we have over

[36:45]

here the ones place all the way over

[36:47]

here. And then we have the two's place

[36:50]

and the four's place and the eights,

[36:54]

16,

[36:56]

32, 64, 128. And we come bearing a

[37:00]

microphone if each of you want to say a

[37:02]

quick hello. your name, maybe your dorm

[37:03]

or house, and something besides computer

[37:05]

science that you're studying or want to.

[37:08]

>> Hi, I'm Oh, that's loud. Okay. I'm

[37:10]

Allison. I'm a freshman in Matthews and

[37:15]

um I like climbing and I'm thinking of

[37:17]

CS and econ.

[37:19]

>> Number two.

[37:20]

>> Hi, I'm Lily. I'm in Hurlbut this year

[37:23]

and I'm thinking of doing CS and

[37:25]

government.

[37:26]

>> Nice to meet you.

[37:27]

>> Hi. Hi, I'm Sean. I'm in Canaday Hall

[37:30]

and I'm thinking of doing astrophysics

[37:32]

and CS.

[37:33]

>> Welcome.

[37:34]

>> Hi, I'm Jordan. I'm doing applied math

[37:36]

with a specialization in CS and econ.

[37:40]

And um I'm in Wigglesworth and I like

[37:43]

going to the gym.

[37:44]

>> Okay, nice. 16.

[37:46]

>> Hi, I'm Shiv. I'm studying MechE and I'm

[37:49]

in Canaday.

[37:50]

>> Nice.

[37:51]

>> Hi, I'm Sophia. I'm in the think of

[37:55]

doing electrical engineering.

[37:57]

>> Welcome. Hi, my name is Marie and I'm in

[38:00]

Canaday B and I really like CS, physics,

[38:03]

and astrophysics.

[38:05]

>> Hi, I'm Alyssa. I'm in Holworthy. I'm

[38:09]

also thinking of studying math or

[38:11]

physics and I also like to climb.

[38:13]

>> Nice. Welcome to you all. So, on the

[38:16]

backs of their sheets of paper, they

[38:18]

have a little cheat sheet that's

[38:19]

describing what they should do in each

[38:20]

of three rounds. We're going to spell

[38:22]

out together a three-letter word. You all

[38:24]

as the audience have a cheat sheet above

[38:26]

you that represents numbers to letters.

[38:28]

These folks don't necessarily know what

[38:30]

they're spelling. They only know what

[38:31]

they individually are spelling. So if

[38:33]

your sheet of paper tells you to

[38:34]

represent a zero in a given round, just

[38:36]

kind of stand there awkwardly, no hands

[38:38]

up. But if you're told on your sheet of

[38:40]

paper to represent a one, just raise a

[38:42]

single hand to make obvious to the

[38:43]

audience that you're representing a one

[38:45]

and not a zero. And the goal here is to

[38:47]

figure out what we are spelling using

[38:48]

this system called ASCII. All right,

[38:51]

round one, execute.

[38:55]

What number is this here?

[39:00]

I'm hearing... You can just shout it out.

[39:02]

What number?

[39:04]

>> 66 or B. So, you're spelling B. All

[39:07]

right, hands down. Round two.

[39:11]

More math.

[39:15]

Feel free to shout it out.

[39:18]

>> Oh, I heard it. Yeah. 79, which is

[39:20]

>> O. Okay, so we have B O. Hands down.

[39:23]

Third and final round. Execute

[39:27]

number

[39:30]

87.

[39:31]

>> Yes. 87. Which is the letter?

[39:34]

>> W. Which spells

[39:36]

>> BOW. If you want to take your bow now.
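The on-stage exercise runs the mapping in the other direction, from letters to bits. A possible Python sketch follows, in which `PLACES` and `hands_up` are invented names standing in for the eight volunteers and their raised hands.

```python
PLACES = [128, 64, 32, 16, 8, 4, 2, 1]  # one volunteer per column, left to right


def hands_up(letter: str) -> list[int]:
    """Return the place values whose bit is 1 in this letter's ASCII code."""
    code = ord(letter)
    return [place for place in PLACES if code & place]


for letter in "BOW":
    # letter, its ASCII number, its 8-bit pattern, and which columns are "on"
    print(letter, ord(letter), format(ord(letter), "08b"), hands_up(letter))
```

For B this reports the 64 and 2 columns, which is the 66 the audience called out in round one.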

[39:39]

>> Ah, okay. Here we go. You guys can keep

[39:41]

those.

[39:44]

Okay. Thanks. All right. You guys can

[39:46]

head back. Thank you to our volunteers

[39:48]

here. Very nicely done. We indeed

[39:50]

spelled out BOW, and that's just because

[39:52]

we all standardized on representing

[39:54]

information in exactly the same way,

[39:56]

which is why when you type B on your

[39:59]

phone or your computer, the recipient

[40:00]

sees the exact same thing. But what's

[40:03]

noteworthy in this discussion is that

[40:05]

you can't spell a huge number of words.

[40:07]

Like, yeah, English, okay, we've got that

[40:09]

covered. But odds are you're noticing,

[40:11]

depending on your own background, what

[40:12]

human languages you read or speak

[40:14]

yourself, um, that a whole bunch of

[40:16]

symbols might be missing from your

[40:17]

keyboard. For instance, we have accented

[40:19]

characters here. In a lot of Asian

[40:21]

languages, there's so many more glyphs

[40:22]

than we could have even fit in that

[40:23]

cheat sheet of numbers and letters. And

[40:26]

so ASCII is not the only system that the

[40:28]

world uses. It was one of the earliest,

[40:30]

but we've moved on in modern times to a

[40:33]

superset of ASCII that's generally known

[40:35]

as Unicode. And Unicode uses so many more

[40:38]

bits than ASCII that we even have room for

[40:41]

all of these little things that we seem

[40:42]

to send constantly nowadays. These are

[40:45]

obviously images that you might send

[40:47]

with your phone or your computer, but

[40:48]

they're technically characters.

[40:51]

They're technically just patterns of

[40:52]

zeros and ones that have similarly been

[40:54]

standardized around the world to look a

[40:57]

certain way, but this is an

[40:59]

emoji keyboard in the sense that you're

[41:01]

sending characters. You're not sending

[41:03]

images per se. The characters are

[41:05]

displayed as images obviously, but

[41:07]

really these are just like characters in

[41:09]

a different font and that font happens

[41:11]

to be very colorful and graphical as

[41:13]

well. So, Unicode instead of using just

[41:16]

seven or eight bits, which if you do the

[41:18]

quick mental math, if ASCII only used

[41:20]

seven or let's say eight bits, how many

[41:23]

possible characters can you represent in

[41:25]

ASCII alone?

[41:27]

256. Because if we do that quick mental

[41:29]

math, 2 to the eighth is 256 possibilities,

[41:31]

like that's it. That's enough

[41:33]

for English because you can cram all the

[41:34]

uppercase letters, the lowercase

[41:36]

letters, the numbers, and a whole bunch

[41:37]

of punctuation as well. But it's not

[41:39]

enough for certain other punctuation

[41:41]

symbols, not to mention many other human

[41:43]

languages. And so the Unicode

[41:45]

Consortium, its charge in life has been

[41:47]

to come up with a digital representation

[41:49]

of all human language, past, present,

[41:52]

and hopefully future by using not just

[41:55]

seven or eight bits, but maybe 16 bits

[41:57]

per character, 24 bits, or heck, even 32

[42:01]

bits per character. And per before, if

[42:03]

you've got as many as 32 bits available

[42:05]

to you, you can represent what, like 4

[42:07]

billion characters in total. And that's

[42:10]

just one of the reasons why these emoji

[42:11]

have kind of exploded in popularity and

[42:13]

availability. There's just so many darn

[42:15]

patterns. Like, what else are we going

[42:17]

to do with all of these zeros and ones?

[42:19]

But more importantly, emoji have been

[42:21]

designed to really represent people and

[42:23]

places and things and emotions in a way

[42:25]

that transcends human language. But even

[42:28]

then, they're somewhat open to

[42:29]

interpretation. In fact, here's a

[42:31]

pattern of I think 32 zeros and ones.

[42:35]

I'm guessing no one's going to do the

[42:36]

quick mental math here, but this

[42:37]

represents what decimal number if we do

[42:39]

in fact do out the math, with that

[42:41]

being the ones place all the way over to

[42:43]

the left. Well, that's the number

[42:44]

4,036,991,106.

[42:47]

Who knows what that is? It's not A, and

[42:50]

it's nothing near A, uppercase or

[42:52]

lowercase, but it is among the most

[42:54]

popular emoji that you might send

[42:56]

typically on your phone, laptop, or

[42:58]

other device, namely this thing here,

[43:00]

face with tears of joy, which odds are

[43:03]

you've sent or received recently. But

[43:06]

interestingly, even though many of you

[43:07]

might have iPhones and see and send the

[43:10]

same image, you'll notice that if you see

[43:12]

a friend who's got Android or some other

[43:14]

device, or maybe you're using, uh, Meta's

[43:16]

Messenger program or Telegram or some

[43:19]

other messaging service, sometimes these

[43:21]

emoji look a little bit different. Why?

[43:23]

Because what Unicode has done is they

[43:25]

decided there shall exist an emoji

[43:28]

known as, excuse me, face with tears of

[43:31]

joy. Then Apple and Google and Microsoft

[43:34]

and others, they're sort of free to

[43:36]

interpret that as they see fit. So what

[43:37]

you see on the screen here is a recent

[43:39]

version from iOS, Apple's operating

[43:41]

system. Google's version of the same

[43:43]

looks a little something like this. And

[43:44]

on Telegram, if you have animations

[43:46]

enabled, the same idea, face with tears

[43:48]

of joy, is actually animated. But it's

[43:50]

the same pattern of zeros and ones in

[43:53]

each case. But again, they each

[43:55]

essentially have different graphical

[43:56]

fonts to present to you what each of

[43:59]

those images actually is. All right. So,

[44:02]

those are each, excuse me, images.
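The lecture does not name the encoding behind that 32-bit pattern. As an assumption, the sketch below uses UTF-8, one common way of storing Unicode text, in which this emoji occupies four bytes. Read as a single integer, those four bytes come out to the four-billion-something number mentioned above.

```python
emoji = "\N{FACE WITH TEARS OF JOY}"

utf8_bytes = emoji.encode("utf-8")
print(len(utf8_bytes))                                       # 4 bytes, so 32 bits
print(" ".join(format(byte, "08b") for byte in utf8_bytes))  # the zeros and ones
print(int.from_bytes(utf8_bytes, "big"))                     # 4036991106
print(hex(ord(emoji)))                                       # 0x1f602, its Unicode code point

# Under UTF-8, different characters take different numbers of bytes.
for character in ["A", "\N{LATIN SMALL LETTER E WITH ACUTE}", emoji]:
    print(repr(character), len(character.encode("utf-8")))
```

How the character is drawn is left to each platform's font; the bytes being sent are the same on every device.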

[44:08]

So, those are each images. How is the

[44:11]

computer representing them though? At

[44:13]

the end of the day, we've represented

[44:15]

numbers, we've represented letters, but

[44:18]

how about these things here, colors? So,

[44:21]

how do we represent red or green or

[44:24]

blue, not to mention every other color

[44:25]

in between? At the end of the day, we

[44:28]

only have one canvas at our disposal.

[44:30]

Yeah,

[44:32]

so integers is the exact same answer as

[44:34]

before. We just need to agree on what

[44:36]

number do we use for red, what do we use

[44:38]

for green, what do we use for blue, and

[44:40]

we can come up with some standardized

[44:42]

pattern for this. In fact, one of the

[44:43]

most common techniques for doing this,

[44:45]

one of the most common

[44:46]

ways to do this in the real world, is to

[44:48]

use a combination of three colors

[44:50]

together. Some amount of red, some

[44:52]

amount of green, and some amount of

[44:54]

blue, and mix them together to get most

[44:56]

any color of the rainbow that you might

[44:57]

want. This is sort of a picture of

[44:59]

something I grew up with back in the day

[45:01]

where in like middle school when we'd

[45:02]

watch movies or some kind of show in

[45:04]

like in class, we would kind of, uh,

[45:07]

the projector screen would be over here.

[45:09]

This is an old-school projector with

[45:11]

three different lenses, one each to

[45:13]

project some amount of green, some

[45:14]

amount of red, some amount of blue. And

[45:16]

so long as the lenses are correctly

[45:18]

oriented to all point at the same circle

[45:21]

or like rectangular region on the

[45:22]

screen, you would see any number of

[45:25]

colors coming to life in the old school

[45:27]

video. I still remember all these years

[45:29]

later, we would kind of sit and lean up

[45:31]

against it because it was super warm and

[45:32]

you could hear it. Easy way to fall

[45:34]

asleep back in grade school. But we use

[45:36]

the same fundamental color system

[45:38]

nowadays as well, including in modern

[45:40]

programs like Photoshop. So let's

[45:42]

abstract that away, focus on just three

[45:44]

colors, some amount of red, green, and

[45:46]

blue. And let's suppose for the sake of

[45:48]

discussion that we want to mix together

[45:50]

like a medium amount of red, a medium

[45:53]

amount of green, and just a little bit

[45:54]

of blue. For instance,

[45:57]

let's suppose that we'll use 72 amount

[46:00]

of red, 73 amount of green, and

[46:04]

33 amount of blue, RGB. Now, why

[46:07]

these numbers? Well, in the context of

[46:09]

ASCII or Unicode, which is just a

[46:11]

superset thereof, what does this

[46:13]

spell?

[46:15]

>> HI. But again, if you were instead to

[46:17]

open a file containing these three

[46:19]

numbers or really these three bytes of

[46:22]

bits in Photoshop, you would hope that

[46:25]

they're going to be interpreted not as

[46:27]

letters on the screen, but as, uh,

[46:30]

the color of a dot on the screen

[46:32]

instead. So it turns out that,

[46:35]

typically, when you have three of these

[46:37]

numbers together, each of them is using a

[46:39]

single byte, so eight bits. So you can

[46:42]

have 0 to 255 of red, 0 to

[46:46]

255 of green, or 0 to 255 of blue. So zero

[46:50]

is none, 255 is the max. So if we mix

[46:53]

these together, imagine that just like

[46:56]

that projector consolidating these three

[46:58]

colors into one central point. Anyone

[47:00]

want to guess what you're going to get

[47:02]

if you mix some red, some green, some

[47:03]

blue in those amounts? In way back.

[47:08]

>> Yeah, you're going to get a dark shade

[47:09]

of yellow. I've brightened it up a

[47:11]

little bit for the projector here, but

[47:12]

you're going to get roughly this shade

[47:14]

of yellow. And we could play with these

[47:15]

numbers all day long and get similar

[47:17]

results if we want to represent

[47:19]

different colors as well. And indeed,

[47:21]

whether it's Photoshop or some other

[47:22]

program, you can actually combine these

[47:24]

amounts in all sorts of ratios to get

[47:27]

different colors. So if you had 0 0 0,

[47:29]

so no red, no green, no blue, take a

[47:31]

guess as to what color that's going to

[47:33]

be in the computer?

[47:34]

>> So it's going to be black, like the

[47:35]

absence of all three of those colors.

[47:37]

But if you mix the maximal amount of

[47:38]

each of those, 255 of red and green and

[47:41]

blue, that's going to give you white.

[47:43]

Now, if any of you have made web pages

[47:45]

before or used programs like Photoshop,

[47:47]

you might have seen numbers like 00 or

[47:50]

FF. Long story short, that's just

[47:52]

another base system for representing

[47:54]

numbers between zero and 255 as well.
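A brief Python sketch of the same idea: three bytes read once as text and once as a color, with the color written in the 00-to-FF notation just mentioned. The helper `to_hex` and the leading `#` (the usual web-page convention) are assumptions added for illustration.

```python
def to_hex(red: int, green: int, blue: int) -> str:
    """Write three 0-255 amounts as two base-16 digits per color."""
    return "#{:02X}{:02X}{:02X}".format(red, green, blue)


pixel = bytes([72, 73, 33])

print(pixel.decode("ascii"))  # HI!      (the bytes read as text)
print(to_hex(*pixel))         # #484921  (the same bytes read as a color)
print(to_hex(0, 0, 0))        # #000000, black
print(to_hex(255, 255, 255))  # #FFFFFF, white
```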

[47:57]

But we'll come back to that mid-semester

[47:59]

when we make some of our own filters uh

[48:01]

in sort of an Instagram-like way,

[48:02]

manipulating images of our own. So,

[48:06]

where are these colors coming from or

[48:07]

where can we actually see them? Well,

[48:09]

here's just a picture of that same emoji

[48:10]

face with tears of joy. If I kind of

[48:12]

zoom in on that and maybe zoom in again,

[48:15]

you can start to see if you blow it up

[48:17]

enough or if you put your eyes close

[48:18]

enough to the device, sometimes you can

[48:20]

actually see individual dots or squares.

[48:23]

These are generally known as pixels. And

[48:26]

they're just the individual dots that

[48:27]

collectively compose an image. Which is

[48:30]

to say that if each of these dots, which

[48:33]

is part of the image, is going to be a

[48:34]

distinct color, like this one's yellow,

[48:37]

this one's brown, and then there's a

[48:38]

bunch in between, well, you're using

[48:40]

some number of bits to represent each of

[48:43]

those pixels' colors. So, if you imagine

[48:46]

using the RGB system, that's 8 + 8 + 8

[48:50]

bits. So, that's 24 bits or three bytes

[48:54]

just to keep track of the color of each

[48:56]

and every one of these dots. So now, if

[48:59]

you think about having downloaded a GIF

[49:00]

at some point, a PNG file, um, a

[49:04]

JPEG or any other file format, it's

[49:06]

usually measured in what file size? Like

[49:08]

megabytes, typically. That means millions

[49:10]

of bytes. Why? Because if it's a pretty

[49:12]

big photograph or pretty big image, each

[49:15]

of those dots takes up at least three

[49:17]

bytes it would seem. And if you do out

[49:19]

the math, if you got thousands of dots,

[49:21]

each of which uses three bytes, you're

[49:23]

going to quickly get to megabytes, if

[49:25]

not even larger for things like say

[49:27]

videos. But again, it's just patterns of

[49:29]

zeros and ones. And so long as the

[49:31]

programmer knows what they're doing and

[49:33]

tells the computer how to interpret

[49:35]

those zeros and ones. And equivalently,

[49:37]

so long as the software knows, look at

[49:39]

these zeros and ones and interpret them

[49:40]

as numbers or letters or colors, we

[49:44]

should see what we intended to

[49:46]

represent. All right, so that's num

[49:48]

that's uh colors and images. What about

[49:51]

how many of you kind of played with

[49:52]

these little flip books as a kid where

[49:54]

they've got like a hundred different

[49:55]

little pictures and you flip through

[49:56]

them really quickly and you see what

[49:58]

looks like animation in book form. Well,

[50:00]

this is essentially a video. So

[50:02]

therefore, what is a video or how can

[50:04]

you think of what a video is? It's just

[50:07]

a whole bunch of like images flying

[50:08]

across the screen either on paper or

[50:10]

digitally nowadays on your phone or your

[50:12]

laptop. And that's kind of nice because

[50:13]

we're sort of composing more interesting

[50:16]

media now based on these lower level

[50:18]

building blocks. And this is going to be

[50:19]

thematic. We literally started with

[50:20]

zeros and ones. We worked our way up to

[50:22]

letters. We then worked our way up to

[50:24]

sort of images and uh colors and thus

[50:27]

images. Now we're up at this level of

[50:29]

hierarchy in terms of video because

[50:31]

what's a video? It's like 30 images per

[50:34]

second flying across the screen or maybe

[50:37]

slightly fewer than that. That

[50:38]

collectively tricks our mind into

[50:40]

thinking we are seeing motion pictures.
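
The arithmetic behind images and video can be sketched as follows. This assumes uncompressed data at three bytes per pixel; real GIF, PNG, JPEG, and video files are compressed, so they are usually much smaller. The 1920 by 1080 dimensions and the 10-second clip are illustrative assumptions, not figures from the lecture.

```python
BYTES_PER_PIXEL = 3  # 8 bits each for red, green, and blue = 24 bits

def image_bytes(width, height):
    """Bytes needed for one uncompressed image of the given size."""
    return width * height * BYTES_PER_PIXEL

def video_bytes(width, height, frames_per_second, seconds):
    """Bytes needed if a video is just a sequence of uncompressed images."""
    return image_bytes(width, height) * frames_per_second * seconds

photo = image_bytes(1920, 1080)
print(photo)  # 6220800, roughly 6 megabytes

clip = video_bytes(1920, 1080, 30, 10)
print(clip)   # 1866240000, roughly 1.9 gigabytes for 10 seconds
```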

[50:42]

And that's the old school term for

[50:44]

movies, but it literally is what it was.

[50:46]

Motion pictures was this: film was

[50:48]

showing you 30 pictures per second and

[50:50]

it looks like motion even though you're

[50:52]

just looking at images much like this

[50:54]

flip book very quickly one after the

[50:56]

other. What about music? Well, how could

[50:58]

you go about representing musical notes

[51:01]

if again your only ingredients are zeros

[51:05]

and ones? Even if you're not a musician,

[51:07]

how do you represent music like that on

[51:09]

the screen here? Yeah. Okay. So, the

[51:12]

frequency like the tone that you're

[51:13]

actually hearing from the device. What

[51:15]

else might weigh in besides the

[51:17]

frequency of the note? Yeah.

[51:20]

>> So the speed of the note or maybe the

[51:21]

duration like if you think about a

[51:23]

physical piano like how long you're

[51:24]

holding the key down for or not. What

[51:26]

else? So the amplitude maybe how loud

[51:29]

like how hard did you hit the keyboard

[51:31]

to generate that sound. So let me

[51:33]

propose at the risk of simplifying we

[51:35]

could represent each of these notes

[51:36]

using three numbers. maybe 0 to 255 or

[51:39]

some other range that represents the

[51:41]

frequency or the pitch of the note, the

[51:43]

duration, and the loudness. And so long

[51:46]

as the person receiving a file

[51:48]

containing all of those zeros and ones

[51:50]

knows how to interpret them three at a

[51:52]

time, I bet you could share uh a musical

[51:55]

file with someone else that they could

[51:57]

hear in exactly the same way that you

[52:00]

yourself intended. Let me pause here to

[52:04]

see if there's any questions now because

[52:06]

we've already built our way up from

[52:07]

zeros and ones now to video and sound.
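
The three-numbers-per-note proposal can be sketched like this. It is a toy encoding invented for illustration, not MIDI or any real audio format, and the note values below are hypothetical.

```python
# Each note is (pitch, duration, loudness), each a number from 0 to 255.
notes = [(60, 4, 100), (62, 4, 100), (64, 8, 127)]  # hypothetical values

def encode(notes):
    """Flatten the notes into raw bytes, three bytes per note."""
    data = bytearray()
    for pitch, duration, loudness in notes:
        data.extend((pitch, duration, loudness))
    return bytes(data)

def decode(data):
    """Read the bytes back three at a time, as the receiver would."""
    return [tuple(data[i:i + 3]) for i in range(0, len(data), 3)]

raw = encode(notes)
print(len(raw))     # 9 bytes for 3 notes
print(decode(raw))  # [(60, 4, 100), (62, 4, 100), (64, 8, 127)]
```

The receiver only recovers the music because both sides agree to read the bytes three at a time, in the same order.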

[52:12]

>> Yeah, in front.

[52:13]

>> How does the computer differentiate

[52:15]

between what the letter like 65 would be

[52:19]

and then what the number 65?

[52:20]

>> So, how does the computer distinguish

[52:22]

between the letter 65 and the number 65?

[52:24]

It's context dependent. So put simply

[52:27]

and we'll see this as early as next week

[52:29]

the programmer tells the computer how to

[52:31]

display the information either as a

[52:33]

number or a letter or equivalently once

[52:36]

programmed the software knows that when

[52:38]

it opens a GIF file or JPEG or something

[52:42]

else to interpret those zeros and ones

[52:45]

as colors instead of as like docx for a

[52:48]

Microsoft Word file or the like. Other

[52:51]

questions on any of these

[52:53]

representations?
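
A small sketch of the context-dependence described above: the same pattern of bits is shown either as a number or as a letter, depending on what the programmer asks for.

```python
value = 0b01000001  # one byte: the bit pattern 01000001

print(value)       # 65  (interpreted as a number)
print(chr(value))  # A   (interpreted as a letter, per ASCII)

raw = bytes([72, 73, 33])   # three bytes
print(list(raw))            # [72, 73, 33]  (as numbers)
print(raw.decode("ascii"))  # HI!           (as text)
```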

[52:56]

Yeah. In front. Can we

[52:56]

>> go over like the base 10 base 2 thing

[52:59]

like really briefly?

[53:00]

>> Sure. So, can we go over base 10 and

[53:02]

base two? So, base 10 is like literally

[53:04]

the numbers you and I use every day.

[53:06]

It's base 10 in the sense that you have

[53:08]

10 digits at your disposal. 0 through 9.

[53:11]

And any numbers you want to represent in

[53:13]

the real world must be composed using 0

[53:15]

through 9. The binary system or base 2

[53:18]

is fundamentally the same. It's just the

[53:20]

computer doesn't have access to two

[53:22]

through 9. It only has access to zero

[53:24]

and one. But much like the light bulbs I

[53:26]

was displaying here, you can simply

[53:28]

ascribe different weights to each of the

[53:31]

digits. So that instead of it being as

[53:33]

much as the ones place, the 10's place,

[53:35]

and the hundred's place, if we more

[53:36]

modestly say the ones place, the two's

[53:38]

place, the four's place, we can use the

[53:40]

same system. In binary, you might need

[53:42]

to use more digits to count as high

[53:46]

because in 255, you can just write 255.

[53:49]

That's three digits in decimal. But in

[53:51]

binary, we've seen you need to use eight

[53:53]

such digits, which is more, but it's

[53:56]

still much better than unary, which

[53:57]

would have had 255 light bulbs on

[54:01]

instead.
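
The weighting of digits can be sketched as below. The function name `place_values` is made up for this example. The same loop works for base 10 (ones, tens, hundreds) and base 2 (ones, twos, fours).

```python
def place_values(digits, base):
    """Add up each digit times the weight of its position."""
    total = 0
    for position, digit in enumerate(reversed(digits)):
        total += int(digit) * base ** position
    return total

print(place_values("255", 10))      # 255: 2*100 + 5*10 + 5*1
print(place_values("11111111", 2))  # 255: 128+64+32+16+8+4+2+1
print(format(255, "b"))             # 11111111
print(len(format(255, "b")))        # 8 digits in binary, versus 3 in decimal
```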

[54:02]

>> And is

[54:04]

binary and like the same thing.

[54:06]

>> Is binary and base 2 the same thing?

[54:08]

Yes. Just like base 10 and decimal are

[54:11]

the same thing as well. And unary and

[54:13]

base 1 are the same thing as well. All

[54:15]

right. So let me just stipulate that

[54:18]

even though we sort of took this tour

[54:19]

quickly at the end of the day computers

[54:20]

only have zeros and ones at their

[54:22]

disposal. So again the answer to any

[54:23]

question as to how can we represent X is

[54:27]

going to somehow involve permuting those

[54:29]

zeros and ones into patterns or

[54:31]

equivalently into the numbers that they

[54:33]

represent. But if we now have a way to

[54:35]

represent all inputs in the world be it

[54:37]

letters, numbers, images, videos,

[54:39]

anything else and get output from some

[54:42]

problem-solving process like how do we

[54:44]

actually solve problems? Well, the

[54:45]

secret sauce in the middle here is

[54:46]

another term that you've probably heard

[54:47]

in the real world nowadays, which is

[54:49]

that of algorithm. Step-by-step

[54:52]

instructions for solving some problem.

[54:54]

So, this ultimately is what computer

[54:56]

science really is about too, is not just

[54:58]

representing information, but somehow

[55:00]

processing it, doing something

[55:01]

interesting with it to actually solve

[55:03]

the problem that you've been provided as

[55:05]

input so you can output the correct

[55:07]

answer. Now, there's all sorts of

[55:09]

algorithms implemented in our phones and

[55:11]

in our Macs and PCs, and that's all

[55:13]

software is. It's an implementation in

[55:15]

code, be it C++ or Java or anything

[55:19]

else. Other languages exist too in code

[55:22]

that the computer understands, but it's

[55:24]

still just step-by-step instructions.

[55:26]

And among the things we'll learn in CS50

[55:27]

is how to express yourself in different

[55:29]

ways to solve problems, not only in

[55:31]

different languages, but using different

[55:33]

methodologies as well. Because as we'll

[55:35]

see, among the reasons we introduce

[55:36]

these several languages is you don't

[55:38]

just learn more and more languages that

[55:40]

allow you to solve the same problems.

[55:42]

Different languages will allow you to

[55:44]

solve different problems and even save

[55:46]

you time by being better tools for the

[55:48]

job. So here for instance on uh an

[55:50]

iPhone is maybe a bunch of contacts

[55:52]

which is presumably familiar where we

[55:54]

might have a whole bunch of friends and

[55:56]

family and whatnot alphabetized by first

[55:58]

name or last name and suppose we want to

[55:59]

find one such person like John Harvard

[56:01]

whose number here might be +1

[56:03]

949-468-2750.

[56:05]

Feel free to call or text him sometime.

[56:07]

Um this is the goal of this problem. If

[56:10]

we have our contacts app and I start

[56:12]

typing in John's name by first name or

[56:14]

last name, the autocomplete nowadays

[56:16]

kicks in and it somehow filters the list

[56:19]

down from my 10 friends or 100 friends

[56:21]

or a thousand friends into just the

[56:22]

single directory entry that matches. So

[56:25]

here too, back in the days of RG&B um

[56:29]

projector, we had uh phone books like

[56:31]

this here too. Um I'm pleased to say

[56:33]

thanks to our friend Alexis, this is the

[56:35]

largest phone book that we've used for

[56:36]

this demonstration. Uh, this is an old

[56:38]

school phone book that's essentially the

[56:40]

same thing as our contacts app or

[56:41]

address book nowadays whereby I've got a

[56:43]

whole bunch of names and numbers

[56:46]

alphabetically sorted by first name or

[56:47]

last name, whatever, and corresponding

[56:49]

to each of those as a number. So, back

[56:51]

in the day and frankly even nowadays in

[56:53]

your phones, how do you go about finding

[56:55]

someone in a phone book or your contacts

[56:57]

app? Well, you could very naively just

[56:59]

start at the beginning and look down and

[57:01]

just turn one page at a time looking for

[57:04]

John Harvard in this case. Now, so long

[57:06]

as I'm paying attention, this

[57:07]

step-by-step process will get me to John

[57:11]

Harvard. Like, this is a correct

[57:12]

algorithm, even though you might kind of

[57:15]

object to how I'm doing this. Why? Like,

[57:18]

what's bad about this algorithm?
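
The one-page-at-a-time approach can be sketched as follows, counting page turns along the way. The phone book here is hypothetical: 1,000 pages with one made-up name per page.

```python
def search_one_page_at_a_time(pages, name):
    """Return (page_index, page_turns); page_index is None if not found."""
    turns = 0
    for index, names_on_page in enumerate(pages):
        turns += 1  # turn to the next page and look
        if name in names_on_page:
            return index, turns
    return None, turns

# Hypothetical phone book: 1,000 pages, one name per page, in sorted order.
pages = [[f"Person {i:04d}"] for i in range(1000)]

print(search_one_page_at_a_time(pages, "Person 0499"))  # (499, 500)
print(search_one_page_at_a_time(pages, "Person 0999"))  # (999, 1000)
```

It always finds the person if they are there, but the number of page turns grows in step with the size of the book.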

[57:21]

>> It's just slow. I mean, this is crazy

[57:22]

slow. If there's like a thousand pages

[57:24]

in this phone book, which looks like

[57:25]

there are, like this could take me as

[57:26]

many as a thousand pages, or maybe he's

[57:28]

roughly in the middle, like 500 pages.

[57:30]

Like, that's crazy. That's really rather

[57:32]

slow, especially if I'm going to do this

[57:33]

again and again. Well, what if I do it a

[57:35]

little smarter? Grade school, I sort of

[57:37]

learned how to count two at a time. So,

[57:38]

2 4 6 8 10 12 14 16 18. Again, if I'm

[57:44]

paying attention, I'll get there twice

[57:46]

as fast because I'm counting two at a

[57:48]

time. But is that algorithm step by step

[57:50]

correct?

[57:51]

And I'm seeing no, but why?
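
The two-at-a-time idea can be sketched as below, including a double-back check for the skipped page that the lecture goes on to describe. This is one possible way to write it, assuming every page is non-empty and the book is sorted. The phone book is hypothetical.

```python
def search_two_at_a_time(pages, name):
    """Return (page_index, page_turns); page_index is None if not found."""
    turns = 0
    for index in range(0, len(pages), 2):
        turns += 1
        page = pages[index]
        if name in page:
            return index, turns
        if page[0] > name:
            # We have gone past where the name belongs,
            # so double back one page to check the one we skipped.
            if index > 0:
                turns += 1
                if name in pages[index - 1]:
                    return index - 1, turns
            return None, turns
    # Ran off the end: with an even page count the last page was skipped.
    if pages and len(pages) % 2 == 0:
        turns += 1
        if name in pages[-1]:
            return len(pages) - 1, turns
    return None, turns

# Hypothetical phone book: 1,000 pages, one name per page, in sorted order.
pages = [[f"Person {i:04d}"] for i in range(1000)]

print(search_two_at_a_time(pages, "Person 0500"))  # (500, 251)
print(search_two_at_a_time(pages, "Person 0501"))  # (501, 253)
```

Without the double-back, the search for "Person 0501" would skip straight past that page and wrongly report that the person is missing.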

[57:55]

>> I might skip over John Harvard. So, just

[57:57]

by bad luck and kind of with 50/50

[57:59]

probability, he's going to be sandwiched

[58:01]

between two of the pages. Now, I don't

[58:03]

have to abort this algorithm

[58:04]

altogether. I could just as soon as I get

[58:06]

past the J section if we're doing it by

[58:08]

first name. I could just double back one

[58:10]

page and just make sure that I haven't

[58:12]

missed him. So, it's recoverable. And

[58:14]

this algorithm therefore is sort of

[58:15]

twice as fast plus one extra step maybe

[58:18]

to double back. But that's arguably

[58:20]

otherwise a bug or a mistake in the

[58:22]

algorithm if I don't fix it

[58:23]

intelligently. But what did we do back

[58:25]

in the day? And what does your iPhone or

[58:26]

Android phone do? What they typically do

[58:28]

is they go roughly to the middle, look

[58:31]

physically or virtually down. They see,

[58:33]

"Oh, I'm in the M section." And so,

[58:35]

which side is John Harvard to? To the

[58:37]

left or to the right? So, he's to the

[58:39]

left. So, I could literally now

[58:44]

Jesus Christ.

[58:47]

We talked about this before class that

[58:49]

this might be more Oh my god. There we

[58:52]

go. We can tear the problem in half.

[58:54]

Thank you.

[58:59]

It's been a while. We can tear the

[59:01]

problem in half. We know that John

[59:03]

Harvard is to the left. So, I can throw

[59:06]

half of the problem away if uh

[59:08]

dramatically such that I've now gone from

[59:10]

a thousand-page problem to 500 pages

[59:13]

instead. What now can I do? I can go

[59:14]

roughly to the middle here and maybe I'm

[59:16]

in the E section. So, I went a little

[59:18]

too far back to the left, but I kept it

[59:19]

simple and I just divided so that I can

[59:21]

conquer this problem, if you will. And

[59:23]

if I'm in the E section now, is John

[59:24]

Harvard to the left or to the right? To

[59:26]

the right. So I can again Jesus Christ.

[59:32]

Tear the problem in half. And now, thank

[59:35]

you. So now John Harvard again is going

[59:38]

to be in this half. I can throw this

[59:39]

half away. So now I've gone from 1,000

[59:41]

to 500 to 250. And I can repeat, repeat,

[59:43]

repeat down to 125. Half of that, half

[59:46]

of that, half of that until I'm left

[59:47]

with finally just a single page. And

[59:50]

John Harvard is hopefully now on this

[59:51]

page such that I can call him or not at

[59:54]

all at which point this is all sort of

[59:55]

for naught. But what's powerful about each

[59:58]

of those algorithms is that the sort of

[60:00]

good better and best like they all get

[60:02]

the job done conditional on the second

[60:04]

one having that little fix just to make

[60:06]

sure I don't miss John Harvard between

[60:07]

two pages but they're fundamentally

[60:10]

different in their efficiency and the

[60:12]

quality of their design. And this is

[60:14]

really representative of one of the

[60:15]

emphases of a class like this. It's not

[60:17]

just about writing correct code or

[60:19]

getting the job done, but doing it well

[60:22]

and doing it quickly. Using the least

[60:24]

amount of CPU or computing resources,

[60:27]

using the minimal amount of RAM, using

[60:29]

the fewest number of people, using the

[60:31]

least amount of money, whatever your

[60:32]

constrained resource is, solving a

[60:35]

problem better. So that first algorithm

[60:37]

step-by-step instructions was all about

[60:40]

doing something like this whereby the

[60:43]

first algorithm if we plot things on a

[60:45]

grid like this we have on the x-axis a

[60:49]

representation of the size of the

[60:50]

problem. So this would mean small

[60:52]

problem like zero pages. This would mean

[60:54]

big problem like a thousand pages. And

[60:56]

on the y or vertical axis we have some

[60:58]

measurement of time. So this is the

[61:00]

number of seconds or the number of page

[61:01]

turns whatever your metric actually is.

[61:04]

So this would be uh not much time at

[61:06]

all, so fast. This would be a lot of

[61:08]

time, so slow. So what's the

[61:10]

relationship if we just roughly draw

[61:12]

these three algorithms? Well, the first

[61:13]

one is technically a straight line. And

[61:15]

we'll describe that as n. The slope is n

[61:17]

because if you think of n as a number

[61:19]

for the number of pages, well, there's a

[61:21]

one-to-one relationship in the first

[61:23]

algorithm as to how many times I have to

[61:25]

turn the page based on how many pages

[61:27]

there actually is. And you can think

[61:29]

about this in the extreme. If I was

[61:30]

looking for someone whose name started

[61:32]

with Z, I might have to go through like

[61:34]

a thousand darn pages to get to that

[61:36]

person whose name started with Z, unless

[61:38]

again I do something hackish and just

[61:40]

kind of cheat and go to the end. If we

[61:42]

execute these algorithms again and again

[61:43]

the same way, that's going to be pretty

[61:45]

slow. But the second algorithm was

[61:47]

pretty much twice as fast plus that one

[61:49]

extra step potentially. But it's still a

[61:51]

straight line because if there's a

[61:53]

thousand pages and I'm dividing the

[61:55]

problem and I'm doing two pages at a

[61:57]

time, well that's like n divided by two

[61:59]

steps plus one give or take. But it's

[62:01]

still a straight line because but it's

[62:04]

still better. Notice if this is the size

[62:06]

of the problem, a thousand pages for

[62:08]

instance, we'll notice that the first

[62:09]

algorithm took literally twice as much

[62:12]

time as the second algorithm. So we're

[62:14]

doing better already. But the third

[62:16]

algorithm fundamentally is going to look

[62:18]

something like this. And if you remember

[62:20]

your logarithm so to speak, sort of the

[62:22]

opposite of an exponential, this curve

[62:24]

is so much lower and flatter, if you

[62:27]

will, than either of these two

[62:29]

mathematically. More on this another

[62:30]

time. The slope is going to be like log

[62:32]

base 2 of n or just logarithmic in

[62:35]

nature. But what it means is that it's

[62:37]

growing very very very slowly. It's

[62:40]

still going up. It's never going to

[62:41]

flatline and go perfectly horizontal,

[62:43]

but it goes up very slowly. Why? Well,

[62:45]

if you think about two towns nearby,

[62:47]

like Cambridge on this side of the river

[62:48]

and the town of Allston on the other,

[62:50]

suppose that they still have phone books

[62:52]

like this one, and they merge their

[62:54]

phone books for whatever reason. So,

[62:55]

overnight, we go from a thousand-page

[62:57]

phone book to a 2,000-page phone book.

[63:00]

The first algorithm is going to take

[63:01]

literally twice as long as will the

[63:03]

second one because we're only going

[63:04]

through it one or two pages at a time.

[63:07]

But if the phone book size doubles from

[63:09]

this year, for instance, to next year,

[63:11]

you can kind of in your mind's eye think

[63:13]

about the green line. It's not going to

[63:15]

go up that much higher. Why? Well,

[63:18]

practically speaking, even if the phone

[63:20]

book becomes 2,000 pages long. Well, how

[63:24]

many more times do you have to tear or

[63:26]

divide that problem in half?

[63:29]

>> Just one. Because you're taking a 1,000

[63:31]

page bite out of it, or a 500, then a

[63:33]

250. You're taking much bigger bites out

[63:35]

of it than just one or two at a time.

[63:38]

And so what computer science and what

[63:40]

algorithms and about good design is

[63:42]

about is figuring out what is the logic

[63:44]

via which you can solve problems not

[63:46]

only correctly but efficiently as well.
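
To make the comparison concrete, here is a rough count of worst-case steps for the three approaches on a 1,000-page and a 2,000-page book. The function names are made up for this sketch, and the counts follow the lecture's approximations (n, n/2 plus one, and repeated halving).

```python
def steps_one_at_a_time(n):
    """One page turn per page in the worst case."""
    return n

def steps_two_at_a_time(n):
    """Half as many turns, plus one to double back."""
    return n // 2 + 1

def steps_halving(n):
    """How many times n pages can be torn in half until one page is left."""
    steps = 0
    while n > 1:
        n //= 2
        steps += 1
    return steps

for pages in (1000, 2000):
    print(pages, steps_one_at_a_time(pages),
          steps_two_at_a_time(pages), steps_halving(pages))
# 1000 1000 501 9
# 2000 2000 1001 10
```

Doubling the book doubles the work for the first two approaches but adds only one more halving for the third.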

[63:50]

And that then gives us these things

[63:51]

called algorithms. And when it comes

[63:53]

time to code, which we're about to do

[63:54]

too, code is just an implementation in

[63:57]

a language the computer understands of

[63:59]

an algorithm. Now this assumes that

[64:01]

we've come up with some digital way that

[64:03]

is to say zero- and one-based way to

[64:05]

represent names and numbers. But

[64:08]

honestly we already did that. We came up

[64:09]

with ASCII and then Unicode to

[64:11]

represent the names. Representing

[64:13]

numbers is even easier than that. That's

[64:14]

really where we started. So code is just

[64:17]

about taking as input some standardized

[64:19]

representation of names and numbers and

[64:21]

spitting out answers. And that's truly

[64:23]

what iOS and Android are doing. When you

[64:25]

start doing autocomplete, they could be

[64:28]

searching from the top to the bottom,

[64:30]

which is fine if you've only got a few

[64:32]

friends and family in the phone. But if

[64:34]

you've got a thousand or if you've got

[64:35]

10,000 or if it's not a phone book

[64:38]

anymore, it's some database with lots

[64:39]

and lots of data. Well, it stands to

[64:41]

reason that it'd be nice maybe if the

[64:43]

computer kept it all alphabetized just

[64:45]

like that book and jumped to the middle,

[64:48]

then the middle of the middle, then the

[64:49]

middle of the middle of the middle, and

[64:50]

so forth. Why? Because the speed is

[64:53]

going to be much much faster,

[64:55]

logarithmic in nature and not linear so

[64:58]

to speak in nature. But we'll revisit

[65:00]

those topics as well. But for now,

[65:02]

before we get into actual code, let's

[65:04]

talk for a moment about pseudo code. So

[65:07]

pseudo code is not one formal thing.

[65:09]

Every human will come up with their own

[65:10]

way of representing pseudo code. It's an

[65:13]

English-like or human-like formulation

[65:15]

of step-by-step instructions just using

[65:17]

terse, correct English or whatever human

[65:20]

language. So, for instance, if I want to

[65:22]

translate what I did somewhat

[65:23]

intuitively with that phone book by just

[65:25]

dividing in half, dividing in half into

[65:27]

step-by-step instructions, I could hand

[65:29]

you or nowadays like a robot or

[65:31]

something like that. Well, step one was

[65:33]

essentially to pick up the phone book,

[65:34]

which I did. Step two was I open to the

[65:37]

middle of the phone book in the third

[65:38]

and final algorithm. Step three was look

[65:40]

at the page as I did. Step four got a

[65:43]

little more interesting. Even though I

[65:44]

didn't verbalize this, presumably I was

[65:46]

asking myself a question. If the person

[65:48]

I'm looking for, John Harvard, is on the

[65:50]

page, then I would have called him right

[65:53]

then. But if he weren't on the page, if

[65:56]

he instead were earlier in the book, as

[65:59]

did happen, well then I'm going to go to

[66:01]

the left, so to speak, but more

[66:02]

methodically, I'm going to open to the

[66:04]

middle of the left half of the book.

[66:06]

Then I'm going to go back to line three.

[66:10]

That's interesting. We'll come back to

[66:11]

that in a moment. But else if the person

[66:12]

is later in the book, well, I'm going to

[66:14]

open to the middle of the right half of

[66:16]

the book and then go back to line three.

[66:20]

Now, let's pause here. Why do I keep

[66:22]

going back to line three? This would

[66:24]

seem to get me doing the same thing

[66:26]

forever endlessly.

[66:29]

But not quite. Why?

[66:31]

>> As soon as you hit the one the on.

[66:34]

>> Yeah. So because I am dividing the

[66:38]

problem in half, for instance, on line

[66:40]

six or line nine implicitly just based

[66:42]

on how I've written this, the problem's

[66:44]

getting smaller and smaller and smaller.

[66:46]

So it's fine if I keep doing the same

[66:47]

logic again and again because if the

[66:49]

problem's getting smaller, eventually

[66:50]

it's going to bottom out and I'm going

[66:52]

to have just one person on that page

[66:54]

that I want to call and so the algorithm

[66:56]

is done. But there is a perverse corner

[66:59]

case, if you will, and this is where

[67:00]

it's ever more important to be precise

[67:02]

when writing code and anticipate what

[67:04]

could go wrong. I should probably ask

[67:07]

one more question in this code, not just

[67:09]

these three. What might that question

[67:14]

be? Yeah.

[67:16]

>> John Harvard is in the book.

[67:18]

>> Yeah. So, if John Harvard is not in the

[67:19]

book, there's this corner case where

[67:21]

what if I'm just wasting my time

[67:22]

entirely and I get to the end of the

[67:24]

phone book and John Harvard's not there.

[67:25]

What should the computer do? Well, as an

[67:27]

aside, if you've ever been using your

[67:28]

Mac or PC or phone and the thing just

[67:30]

freezes or like the stupid little beach

[67:32]

ball starts spinning or something like

[67:33]

that and you're like, what is going on?

[67:35]

Some human at Google or Microsoft or

[67:37]

Apple or the like made a mistake. They

[67:40]

forgot for instance that fourth uncommon

[67:43]

but possible situation wherein if they

[67:45]

don't tell the computer how to handle

[67:46]

it, the computer's effectively going to

[67:48]

freak out and do something undefined

[67:51]

like just hang or reboot or do something

[67:53]

else. So we do want to add this else

[67:56]

quit altogether. So you have well-defined

[67:59]

behavior and truly think that the next

[68:01]

time your computer or phone

[68:02]

spontaneously reboots or dies or does

[68:05]

something wrong, it's probably not your

[68:07]

fault per se. It's some other human

[68:09]

elsewhere did not write correct code.

[68:11]

They didn't anticipate cases like these.
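
The pseudocode above might translate into Python roughly as follows. This is one possible rendering, not the course's own code: the list of names is hypothetical, each list entry stands in for a page, and the "go back to line three" steps become a `while` loop.

```python
def find(book, person):
    """Search a sorted list of names; return the index, or None if absent."""
    left, right = 0, len(book) - 1
    while left <= right:              # "go back to line 3" becomes a loop
        middle = (left + right) // 2  # open to the middle of what remains
        entry = book[middle]          # look at the page
        if entry == person:           # if the person is on the page
            return middle             # call the person
        elif person < entry:          # else if the person is earlier
            right = middle - 1        # keep only the left half
        else:                         # else if the person is later
            left = middle + 1         # keep only the right half
    return None                       # else quit: the person is not there

# Hypothetical, already-sorted contacts.
book = ["Alice", "Bob", "Carol", "Dave", "John Harvard", "Zoe"]

print(find(book, "John Harvard"))  # 4
print(find(book, "Nobody"))        # None
```

The final `return None` plays the role of the fourth case: without it, a search for someone who is not in the book would have no defined outcome.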

[68:14]

But now let's use some terminology here.

[68:16]

There's some salient ideas that we're

[68:17]

going to see in Scratch and C and Python

[68:20]

and these other languages I alluded to

[68:21]

earlier. Everything I've just

[68:23]

highlighted here, henceforth, we're

[68:25]

going to think of as functions.

[68:26]

Functions are verbs or actions that

[68:28]

really get some small piece of work done

[68:31]

for you. Functions are verbs or actions.

[68:34]

Here though, highlighted is the

[68:35]

beginning of what we'll call

[68:36]

conditionals. Conditional is like a fork

[68:38]

in the road. Do I go this way? Do I go

[68:40]

this way? Or some other way altogether.

[68:42]

How do you decide what road to go down?

[68:45]

We're going to call these questions you

[68:47]

ask yourself boolean expressions. Named

[68:50]

after a mathematician, Boole. And a

[68:51]

boolean expression is just a question

[68:53]

that has a yes or no answer or a true or

[68:56]

false answer or a one or zero answer

[68:59]

just it's a binary state yes or no

[69:02]

typically. Otherwise we have this go

[69:04]

back to go back to which is what we're

[69:06]

generally going to call a loop which

[69:08]

somehow induces cyclical behavior again

[69:11]

and again. And those functions and those

[69:13]

conditionals, boolean expressions and

[69:15]

loops and a few other concepts are

[69:16]

pretty much what will underlie all of the

[69:18]

code that we write whether it is in

[69:21]

Scratch, C, or something else altogether.

[69:24]

But we need to get to that point and in

[69:26]

fact let's go and infer what this

[69:29]

program here does. At the end of the

[69:31]

day, computers only understand zeros and

[69:33]

ones. So I claim here is a program of

[69:35]

zeros and ones. What does it do?

[69:39]

Anyone

[69:41]

want to guess? I mean, we could spend

[69:42]

all day converting all of these zeros

[69:44]

and ones to numbers, but they're not

[69:46]

going to be numbers if it's code. What

[69:47]

do you think?

[69:49]

>> That's amazing. It does in fact print

[69:53]

hello world.

[69:55]

All right. So, no one except like maybe

[69:57]

you and me and a few others in the room

[69:58]

should know, and that was probably a guess

[70:00]

admittedly or advancing on the slide.

[70:02]

But why is that? Well, it turns out that

[70:03]

not only do computers standardize

[70:05]

information, data like numbers and

[70:08]

letters and colors and other things,

[70:09]

they also standardize instructions. And

[70:11]

so, if you've heard of companies like

[70:13]

Intel or AMD or Nvidia or others, among

[70:17]

the things they do is they decide as a

[70:18]

company what pattern of zeros and ones

[70:21]

shall represent what functionality. And

[70:23]

it's very low-level functionality. those

[70:25]

companies and others decide that some

[70:27]

pattern of zeros and ones means add two

[70:30]

numbers together or subtract or

[70:32]

multiply. Another pattern might mean

[70:34]

load information from the computer's

[70:35]

hard drive into memory. Another might

[70:37]

mean store it somewhere else. Another

[70:40]

might mean print something out to the

[70:42]

screen. So nested somewhere in here and

[70:44]

admittedly I have no idea which pattern

[70:45]

offhand because it's not interesting enough

[70:47]

to go figure it out at this level says

[70:50]

print. And somewhere in there, like this

[70:52]

gentleman proposed, I bet we could find

[70:54]

the representation of H, which was 72

[70:58]

and E and L and L and O and everything

[71:01]

that composes hello world. Because, as

[71:02]

it turns out in programming circles, the

[71:04]

very first program that students

[71:06]

typically write is that of hello world.
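
As a sketch of the data half of that claim, the text of a greeting can be turned into the numbers and bit patterns that would be embedded somewhere in such a program. This covers only the letters, not the machine instructions, and the exact wording and punctuation of the message here are assumptions.

```python
message = "Hello, world"  # assumed text; the slide's exact string may differ

codes = [ord(ch) for ch in message]                # letters as numbers
bits = " ".join(f"{code:08b}" for code in codes)   # numbers as 8-bit patterns

print(ord("H"))  # 72
print(codes)
print(bits)

# Reading the bits back 8 at a time recovers the text.
decoded = "".join(chr(int(chunk, 2)) for chunk in bits.split())
print(decoded)   # Hello, world
```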

[71:09]

Now, this one here is written in a much

[71:12]

more intelligible way. Even if you're

[71:14]

not a programmer, odds are if I asked

[71:15]

you, what does this program do? you

[71:17]

would have said,

[71:19]

"Oh, hello world." Even though there's a

[71:21]

lot of clutter here, like no idea what

[71:22]

this is until next week. Int main void.

[71:24]

That looks cryptic. There's these weird

[71:26]

curly braces, which we rarely use in the

[71:28]

real world, but at least I understand a

[71:29]

few words like hello and world. And this

[71:32]

is kind of familiar. printf, but it's

[71:34]

not print, but it's probably the same

[71:35]

thing. So, here too is an example of

[71:37]

this hierarchy. Back in the day, in the

[71:40]

earliest days of computers, humans were

[71:42]

writing code by representing zeros and

[71:45]

ones. If you've ever heard your parents

[71:46]

talk about punch cards or the like,

[71:47]

you're effectively representing patterns

[71:49]

that tell the computer what to do or

[71:51]

what to represent, like literally holes

[71:53]

in paper. Well, pretty quickly early on

[71:55]

this got really tedious, only writing

[71:57]

code at such a low level. So, someone

[71:59]

decided, you know what, I'm going to put

[72:00]

in the effort. I'm going to figure out

[72:02]

what patterns of zeros and ones I can

[72:04]

put together so as to be able to convert

[72:07]

something more user friendly to those

[72:10]

zeros and ones. And as a teaser for next

[72:12]

week, that person invented the first

[72:14]

compiler. A compiler is just a program

[72:16]

that translates one language to another.

[72:18]

And more modernly, this is a language

[72:20]

called C, which we'll spend a few weeks

[72:21]

on together because it's so fundamental

[72:24]

to how the computer works. Even this is

[72:26]

going to get tedious by like week six of

[72:28]

the class. And this is going to get

[72:29]

stupid. This is going to get annoying.

[72:30]

This is going to get cryptic. We're just

[72:32]

going to write print hello on the screen

[72:35]

in order to use a different language

[72:36]

called Python. Why? Because someone

[72:38]

wrote in C a program that can convert

[72:42]

Python, this is a white lie, to C which

[72:45]

can then be converted to zeros and ones

[72:47]

and so forth. So in computing there's

[72:49]

this principle of abstraction where we

[72:51]

start with the basics and thank god we

[72:53]

can all trust that someone else solved

[72:54]

these really hard problems way

[72:57]

long ago. Then they wrote programs to

[72:59]

make it easier. We wrote programs to

[73:01]

make it easier. You can now write code

[73:02]

like I did with the chatbot to make

[73:04]

things even easier. Why? Because OpenAI

[73:06]

and other companies have abstracted away

[73:08]

a lot of the lower level implementation

[73:10]

details. And that's where I think this

[73:12]

stuff gets really exciting. We can stand

[73:14]

on the shoulders of others so long as we

[73:15]

know how to use and assemble these kinds

[73:18]

of building blocks. And speaking of

[73:20]

building blocks, let's start here. Now,

[73:22]

odds are some of you might have started

[73:23]

here in like grade school playing with

[73:25]

Scratch. And it's great for like after

[73:26]

school programs, learning how to

[73:28]

program. And you probably used this

[73:30]

language to make games and graphics and

[73:32]

just maybe playful art or the like. But

[73:34]

in Scratch, which is a graphical

[73:36]

programming language designed about 20

[73:38]

years ago by our friends down the road

[73:39]

at MIT's Media Lab, it represents pretty

[73:42]

much everything we're going to be doing

[73:44]

fundamentally over the next several

[73:46]

weeks in more modern languages like C

[73:49]

and Python, more textual languages, if

[73:51]

you will. I bet I could ask the group

[73:53]

here, what does this program do when you

[73:55]

click a green flag? Well, it says hello

[73:58]

world on the screen. Because with

[74:00]

Scratch, you have the ability to express

[74:02]

yourself with functions and loops and

[74:04]

conditionals and all of this, but by

[74:06]

using drag and drop puzzle pieces. So,

[74:09]

what we're about to do is this. We're

[74:10]

going to go on my screen to

[74:11]

scratch.mit.edu.

[74:13]

It's a browser-based programming

[74:14]

environment, and we're only going to

[74:15]

spend one week, really a few days in

[74:18]

CS50 on this language. But the

[74:20]

overarching goal is to one make sure

[74:22]

everyone's comfortable applying some of

[74:24]

these building blocks and actually

[74:25]

developing something that's interesting

[74:26]

and visual and audio as well, but to

[74:29]

also give us some visuals that we can

[74:31]

rely on and fall back on when all of

[74:33]

those curly braces and parentheses and

[74:35]

sort of stupid syntax comes back that's

[74:38]

necessary in many languages but can very

[74:40]

quickly become a distraction early on

[74:42]

from the interesting and useful ideas.

[74:45]

So what we're about to see is this in a

[74:47]

browser. This is the Scratch programming

[74:49]

environment and there's a few different

[74:50]

parts of this world. This is the blocks

[74:52]

palette, so to speak. That is to say,

[74:54]

there's a bunch of puzzle pieces or

[74:55]

building blocks that represent functions

[74:58]

and conditionals and loops and

[75:01]

other such constructs. There's going to

[75:03]

be the programming area here where you

[75:04]

can actually write your code by dragging

[75:06]

and dropping these puzzle pieces.

[75:08]

There's a whole world of sprites here.

[75:10]

By default, Scratch is a cat

[75:13]

by design, but you can make Scratch look

[75:15]

like a dog, a bird, a garbage can, or

[75:17]

anything else as we'll soon see. And

[75:19]

then this is the world in which Scratch

[75:21]

itself lives. So Scratch can go up,

[75:23]

down, left, right, and generally be

[75:25]

animated within that world. For the

[75:26]

curious, kind of like high school

[75:28]

geometry class, there's sort of this XY

[75:30]

plane here. So 0, 0 would be in the

[75:32]

middle. 0, 180 is here. 0, -180 is

[75:36]

here. -240, 0 is here, and positive 240,

[75:40]

0. Generally, you don't need to worry

[75:42]

about the numbers, but they exist. So

[75:44]

that when you say up or down, you can

[75:46]

actually tell the program go up one

[75:48]

pixel or 10 pixels or 100 pixels so that

[75:51]

you have some definition of what this

[75:52]

world actually is. All right, so let's

[75:56]

actually put this to the test. Let me go

[75:58]

ahead here and flip over to in just a

[76:00]

moment the actual Scratch website

[76:04]

whereby I'm going to have on my screen

[76:06]

in just a moment that same user

[76:08]

interface once I've logged in, via

[76:11]

which I can actually write some code of

[76:13]

my own. Let me go ahead and zoom in on

[76:15]

the screen a little bit here and let's

[76:17]

make the simplest of these programs

[76:18]

first. Maybe a program that simply says

[76:20]

hello world. Now at a glance it's kind

[76:22]

of overwhelming how many puzzle pieces

[76:24]

there are. And honestly, even over 20

[76:25]

years, I've never used them all. And MIT

[76:27]

occasionally adds to it. But the point

[76:28]

is that they're color-coded to resemble

[76:31]

the type of functionality that they

[76:33]

offer. And also, it's meant to be the

[76:35]

sort of thing where you can just kind of

[76:36]

scroll through and get a visual sense of

[76:38]

like what you could do and then figure

[76:40]

out how you might assemble these puzzle

[76:41]

pieces together. So, I'm going to go

[76:43]

under this yellow or orangish category

[76:46]

here to begin with. So, there exists in

[76:48]

the world of Scratch not quite the same

[76:50]

jargon that I'm using now. functions and

[76:53]

conditionals and loops. That's more of

[76:55]

the programmer's way. This is more of

[76:56]

the child-friendly way, but it's really

[76:58]

the same idea. Under events, you have

[77:01]

puzzle pieces that represent things that

[77:02]

can happen while the world is running.

[77:06]

So, for instance, the first one here is

[77:07]

sort of the canonical when the green

[77:09]

flag is clicked. Why is that relevant?

[77:11]

Well, in the two-dimensional world that

[77:13]

Scratch lives in, there's a stop sign,

[77:14]

which means stop, and there's a green

[77:16]

flag, which means go. So, I can

[77:18]

therefore drag one of these puzzle

[77:20]

pieces over here so that when I click

[77:22]

that green flag, the cat will in fact do

[77:24]

something for me. Doesn't really matter

[77:26]

where I drop it, so long as it's

[77:27]

somewhere in the middle here. I'm going

[77:29]

to go ahead and let go. Now, I want the

[77:31]

look of the cat to change. I want to see

[77:33]

like a cartoon speech bubble come out

[77:35]

for now. So, I'm going to go under looks

[77:37]

here. And there's a bunch of different

[77:39]

ways to say things and think things. I'm

[77:41]

going to keep it simple and just drag

[77:42]

this one here. And now notice when I get

[77:44]

close enough to that first puzzle piece,

[77:47]

they're sort of magnetic and they want

[77:48]

to snap together. So I can just let go

[77:50]

and boom, because they're a similar

[77:52]

shape, they will lock together

[77:53]

automatically. And notice too, if I zoom

[77:55]

in here, the white oval, which by

[77:58]

default says hello, is actually editable

[78:00]

by me because it turns out that some

[78:03]

functions can take arguments or more

[78:06]

generally inputs that influence their

[78:08]

behavior. So, if I kind of click or

[78:09]

double click on this, I can change it to

[78:11]

the more canonical hello world or hello

[78:13]

David or hello whatever I want the

[78:15]

message to be. I'm going to go ahead and

[78:17]

zoom out. And now over here at top

[78:19]

right, notice that I can very simply

[78:21]

click the green flag. And I'll have

[78:23]

written my first program in Scratch. I

[78:26]

clicked the green flag, it said go. And

[78:28]

now notice it's sort of stuck on that

[78:30]

because I never said stop saying go. But

[78:32]

that's where I can click the red stop

[78:34]

sign and sort of get the cat back to

[78:36]

where I want it. So think about for just

[78:38]

a moment what it is we just did. So on

[78:40]

the one hand we have a very obvious

[78:43]

puzzle piece that says say and it said

[78:45]

something but it really is a function

[78:46]

and that function does take an input

[78:49]

represented by the white oval here

[78:51]

otherwise known as an argument or a

[78:53]

parameter. But what this really is is

[78:55]

just an input to the function. And so we

[78:57]

can map even this simple Scratch

[79:00]

program onto our model of problem

[79:02]

solving before with an addition of what

[79:04]

we'll call moving forward a side effect.

[79:06]

A side effect in a computer program is

[79:09]

often something that happens visually on

[79:11]

the screen or maybe audibly out of a

[79:14]

speaker. It's something that just kind

[79:15]

of happens as a result of you using a

[79:17]

function like a speech bubble appearing

[79:19]

on the screen. So here more generally is

[79:21]

what we claimed it represents the

[79:23]

solving of a problem. And let's just

[79:25]

consider what the input is. The input to

[79:27]

this problem say something on the screen

[79:30]

is this white oval here that I typed in.

[79:32]

Hello world. The algorithm, the

[79:34]

step-by-step instructions are not

[79:36]

something really I wrote like our

[79:37]

friends at MIT implemented that purple

[79:40]

say block. So someone there knows how to

[79:43]

get the cat to say something out of its

[79:45]

comical mouth. So the algorithm

[79:47]

implemented in code is really equivalent

[79:50]

to the say function. So a function is

[79:52]

just a piece of functionality

[79:54]

implemented in code which in turn

[79:57]

implements an algorithm. So algorithm is

[79:58]

sort of the concept and the function is

[80:00]

actually the incarnation of it in code.
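
In a text-based language, the same pieces might look roughly like the sketch below. `say` here is a hypothetical stand-in written for illustration, not Scratch's actual implementation, and printing to a terminal stands in for the speech bubble.

```python
def say(message):
    """Display a message; showing it on screen is the side effect."""
    # The algorithm here is trivial: put the text on the screen.
    print(message)


# "hello, world" is the argument, that is, the input to the function.
say("hello, world")
```

Note that `say` hands nothing back to the code that called it. Everything it does is visible only to the human looking at the screen.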

[80:03]

What's the output? Well, hopefully it's

[80:04]

this side effect seeing the speech

[80:06]

bubble come out of the cat's mouth like

[80:09]

this. All right, so that's one such

[80:12]

program, but it's always going to play

[80:14]

and look the same. What if I actually

[80:16]

want to prompt the human for their

[80:18]

actual name? Well, let me go back to the

[80:21]

puzzle pieces here. Let me go ahead and

[80:23]

throw this whole thing away. Okay. And

[80:24]

if you want to delete blocks, you can

[80:25]

either right-click or control-click and

[80:27]

choose from a menu. Or you can just drag

[80:28]

them there and sort of let go and

[80:30]

they'll disappear. I'm going to go back

[80:32]

in and get another event

[80:34]

block, even though I could have reused

[80:36]

that same one. I'm going to go ahead and

[80:38]

go under sensing now. And if I zoom in

[80:40]

over here, you'll see a whole bunch of

[80:42]

things like I can sense distance and

[80:44]

colors. But more pragmatically, I can

[80:46]

use this function in blue, ask

[80:49]

something, and then wait for the answer.

[80:51]

And what's different about this puzzle

[80:53]

piece is that it too is, yes, a function.

[80:55]

It too takes an argument, but instead of

[80:58]

having an immediate side effect like

[80:59]

displaying something on the screen, it's

[81:02]

essentially inside of the computer going

[81:04]

to hand me back the response. It's going

[81:07]

to return a value, so to speak. And a

[81:10]

return value is something that the code

[81:12]

can see, but the human can't. A side

[81:14]

effect is something the human sees, but

[81:15]

a return value is something only the

[81:17]

computer sees. It's like the computer is

[81:19]

handing me back the user's input. So,

[81:21]

how does this work? Well, notice, and

[81:23]

this is a bit strange. This isn't

[81:24]

usually how variables work, but Scratch

[81:26]

too supports variables, and that was a

[81:28]

word I used quickly at the very start

[81:30]

when we were making the chatbot. A

[81:32]

variable like in math, X, Y, or Z, just

[81:34]

stores some value, but it doesn't have to

[81:36]

store a number. In code, it can store

[81:38]

like a human name. So, what's going to

[81:40]

happen when I use this puzzle piece is

[81:42]

that once the human types in their name

[81:43]

and hits enter, MIT, or really Scratch

[81:46]

is going to store the answer, the

[81:48]

so-called return value in a variable

[81:50]

that's designed to be called answer.
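
A minimal text-based sketch of that distinction, under some assumptions: `ask` and `canned_reader` are hypothetical names invented here, and the `read` parameter exists only so the example can run without a keyboard. Unlike Scratch, which stores the response in its built-in answer variable, a typical Python function hands the value straight back to the caller.

```python
def ask(question, read=input):
    """Show a question and return whatever the reader provides."""
    # `read` defaults to the built-in input(); it is a parameter here
    # only so the sketch can be run without someone typing.
    return read(question + " ")


def canned_reader(prompt):
    """Pretend a human typed a name (an assumption for this demo)."""
    return "David"


# The return value is handed back to the code and stored in a variable.
answer = ask("What's your name?", read=canned_reader)
print(answer)
```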

[81:52]

But, as we'll see, you can make your own

[81:55]

variables down the line if you want and

[81:57]

call them anything you want. But, let me

[81:59]

go ahead and zoom out. Let me drag this

[82:01]

over here. I'm going to use the default

[82:02]

question, what's your name? But I could

[82:04]

certainly change the text there. And let

[82:06]

me go under looks again. Let me go ahead

[82:09]

and grab the say block and let me go

[82:12]

ahead and say just for consistency like

[82:14]

hello,

[82:16]

okay? And now let me go under maybe

[82:19]

sensing I want to say how do I want to

[82:21]

say this answer. Well, notice this. The

[82:23]

shapes are important. This too is an

[82:25]

oval even though it's not white but

[82:26]

that's just because it's not editable.

[82:27]

It's going to be handed to me by the ask

[82:30]

function. Let me zoom out and grab a

[82:32]

second say block like this. And notice

[82:36]

it will magnetically clip together. I

[82:38]

don't want to say hello again. So, I

[82:40]

could delete that. But now it's still

[82:42]

the same shape even though it's a little

[82:43]

smaller. Let me go back to sensing. And

[82:44]

notice what can happen here. When you

[82:46]

have values like words inside of a

[82:49]

so-called variable, you can use those

[82:51]

instead of manual input at your

[82:53]

keyboard. And notice it too wants to

[82:54]

magnetically snap into place. It'll grow

[82:57]

to fit that variable because the shape

[82:58]

is the same. And now let's do this. Let

[83:01]

me click the green flag at right. I'm

[83:03]

seeing quote unquote what's your name?

[83:05]

I'm getting a text box this time, like

[83:06]

on a web page for instance. Let me type

[83:09]

in my name and watch closely what comes

[83:10]

out of the cat's mouth as soon as I

[83:12]

click the check mark or hit enter.

[83:16]

Huh. Okay, I got my name right, but let

[83:18]

me do it once more. Let me stop and

[83:19]

start. D-A-V-I-D.

[83:22]

Enter. No, it didn't work. Let me try

[83:25]

one other. Maybe it's my name. Let's try

[83:27]

Kelly. Enter. What's missing? Obviously,

[83:33]

the hello. There's a bug, a mistake

[83:35]

in this program. But what

[83:37]

explains this? Even if you've never

[83:38]

programmed before, intuitively, what

[83:40]

could explain why I'm not seeing hello?

[83:46]

>> Exactly. It's on two different lines.

[83:47]

So, it's doing one after the other. So,

[83:49]

it is happening. It's just you and I, as

[83:51]

the slowest things in the room are just

[83:53]

not seeing it in time because it's

[83:55]

happening so darn fast. Because my

[83:57]

computer is so, you know, so new and so

[83:59]

fast, it's happening, but way too

[84:00]

quickly. So, how can we solve this? So

[84:02]

we can solve this in a few different

[84:04]

ways. And this is where in Scratch at

[84:06]

least for problem set zero, wherein

[84:07]

you'll have an opportunity to play

[84:09]

around with this. I can scroll around

[84:10]

here and okay under control I see

[84:13]

something like wait. So I can just

[84:14]

kind of slow things down. And now notice

[84:17]

too if you hover over the middle of two

[84:19]

blocks if it's the right shape it'll

[84:20]

just snap into the middle too. Or you

[84:23]

can just so you know kind of drag things

[84:25]

away to magnetically separate them. But

[84:27]

this might solve this. So let me hit

[84:28]

stop and then start. D-A-V-I-D. Enter.

[84:33]

Hello, David. All right, that was a

[84:35]

little... Let's do like maybe two seconds

[84:37]

to see it again. Green flag. D-A-V-I-D.

[84:40]

Enter. Hello,

[84:42]

David. All right, it's working better.

[84:45]

It's sort of more correct because I'm

[84:46]

seeing the hello and the David, but kind

[84:49]

of stupid, right, to see one and then

[84:51]

the other. Wouldn't it be nice to say it

[84:52]

all in one breath, so to speak? Well,

[84:54]

here's where we can maybe compose some

[84:56]

ideas. So, let me get rid of this wait

[84:58]

and the additional block. Let's confine

[84:59]

ourselves to just one say block. But let

[85:02]

me go down to operators, where we

[85:04]

haven't been before. And this is

[85:06]

interesting. There's this bigger oval

[85:08]

here that says join two things like

[85:10]

apple and banana. And those are just

[85:12]

random placeholder words that you can

[85:13]

override with anything you want. But

[85:15]

they're both ovals and white, which

[85:16]

means I can edit them. So let me go

[85:18]

ahead and do this. Let me drag this on

[85:20]

top of the say block. And this is just

[85:22]

going to therefore override the hello

[85:25]

I put there. Now I don't want to say

[85:27]

apple or banana, but I do want to say

[85:29]

hello,

[85:31]

and I then want to say my name. Okay, so

[85:33]

now I can go back to sensing, go back to

[85:35]

answer, drag and drop this here. That'll

[85:39]

snap into place. And let me zoom in. Now

[85:41]

what I've done is take a function and on

[85:43]

top of it I've nested another function,

[85:45]

the join function that takes two

[85:47]

arguments or inputs and presumably joins

[85:50]

them together as per its name. So let's

[85:52]

see what this does for us. Let me click

[85:54]

stop and start. I'll type in David

[85:57]

enter. And it's so close. Now, this is

[86:00]

just kind of an aesthetic bug. What have

[86:01]

I done wrong here?

[86:04]

There's no space. So, it looks a little

[86:05]

wrong, but that's an easy fix. I just

[86:07]

need to literally go into the hello

[86:09]

block after the comma, hit the space

[86:11]

bar, so that now when I stop and start

[86:13]

again and type in David, now I see

[86:15]

something that's closer to the grammar

[86:17]

we might typically expect syntactically

[86:19]

here. All right. So, let's model this

[86:22]

after what we just saw earlier. We've

[86:24]

now introduced a so-called return value.

[86:27]

And this return value is something we

[86:29]

can then use in the way we want. It's

[86:31]

not happening immediately like the

[86:32]

speech bubble. It's clearly being passed

[86:34]

to me in some way that I can use to plug

[86:37]

in somewhere else like into that join

[86:38]

block. So if we consider the role of

[86:40]

these variables playing, let's consider

[86:42]

the picture now as follows. If the input

[86:44]

now to the first function, the ask block

[86:47]

is what's your name? Quote unquote,

[86:49]

that's indeed being fed into the ask

[86:51]

block. And the result this time is not a

[86:53]

speech bubble. It's not some immediate

[86:54]

visual side effect. It is the answer

[86:58]

itself stored in a so-called variable as

[87:01]

represented by this blue oval.

[87:03]

Meanwhile, what I want to do is combine

[87:06]

that answer with some text I came up

[87:08]

with in advance by kind of stacking

[87:10]

these things together. Now, visually in

[87:12]

Scratch, you're stacking them on top,

[87:13]

but it's really that you're passing one

[87:15]

into the other into the other because

[87:17]

much like math when you have the

[87:19]

parenthesis and you're supposed to do

[87:20]

what's inside the parenthesis and then

[87:21]

work your way out. Same idea here. You

[87:24]

want to join hello and answer together.

[87:26]

And whatever that output is, that then

[87:28]

becomes the input to the say block,

[87:30]

which like in math is outside of the

[87:33]

join block itself. So pictorially, it

[87:35]

might now look like this. There's two

[87:37]

inputs to this story. Hello, comma,

[87:39]

space, and the answer variable. The

[87:42]

puzzle piece in question is join. Its

[87:44]

goal in life had better be to give me

[87:46]

the full phrase that I want. Hello,

[87:48]

David. Let's shift everything over now

[87:50]

because that output is about to become

[87:52]

the input to the say block which itself

[87:55]

will now have the so-called side effect.
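
The picture above can be sketched in text form as follows. `join` and `say` are hypothetical stand-ins for the Scratch blocks, and `answer` is hard-coded here in place of a real response from the ask step.

```python
def join(first, second):
    """Return the two pieces of text glued together."""
    return first + second


def say(message):
    """Display the message (the side effect)."""
    print(message)


answer = "David"  # stand-in for what the ask step returned

# join runs first; its return value then becomes the input to say.
greeting = join("hello, ", answer)
say(greeting)

# Equivalent nested form, read from the inside out like parentheses in math:
say(join("hello, ", answer))
```

Leaving the trailing space out of the first argument reproduces the aesthetic bug from a moment ago: `join("hello,", answer)` gives "hello,David".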

[87:58]

And so this too is what programming and

[88:01]

in turn what computer science is about

[88:02]

is composing with the solutions to

[88:05]

smaller problems solutions to bigger

[88:08]

problems using those component pieces.

[88:10]

And that's what each of these puzzle

[88:11]

pieces represents is a smaller problem

[88:13]

that someone else or maybe even you has

[88:16]

already solved. Now, we can kind of

[88:17]

spice things up here. If I go back to

[88:20]

Scratch's interface, we don't have to

[88:21]

use just the puzzle piece here. I can do

[88:23]

something like this. Let me go ahead and

[88:24]

drag these apart and get rid of the say

[88:26]

block down here. Just for fun, there's

[88:28]

all these extensions that you can add

[88:29]

over the internet to your own Scratch

[88:32]

environment. And if I go to like text to

[88:34]

speech down here, I can, for instance,

[88:36]

do a speak block instead of a say

[88:39]

block colored here in green. I can now

[88:41]

reconnect the join block in here. And if

[88:44]

we could raise the volume just a little

[88:45]

bit. Let me stop the old version, start

[88:47]

the new version, type in my name, and

[88:49]

hear what Scratch actually sounds like.

[88:53]

>> Hello, David.

[88:54]

>> Okay, not very cat-like, but we can kind

[88:56]

of waste some time on this by like

[88:58]

dragging the set voice to box. And I can

[89:00]

put this anywhere I want above the speak

[89:04]

block. So, I'm just going to put it

[89:05]

here, even though I've already asked a

[89:06]

question. Maybe kitten sounds

[89:08]

appropriate. Let's try again. D-A-V-I-D.

[89:11]

>> meow meow.

[89:13]

>> Okay. And then let's see, giant. A little

[89:17]

creepier. Here we go. D-A-V-I-D. And

[89:20]

lastly,

[89:21]

>> hello David.

[89:23]

>> All right. Little ransom-like instead.

[89:25]

All right. So, that's just some

[89:26]

additional puzzle pieces, but really

[89:27]

just the same idea, but I like that

[89:29]

we've introduced some sound. So, let's

[89:30]

do this. Let me go ahead and throw away

[89:32]

a lot of those puzzle pieces, leave

[89:33]

ourselves with just the when green flag

[89:35]

clicked, and play around with some other

[89:37]

building blocks that we've seen already

[89:39]

thus far. Let me go ahead, for instance,

[89:41]

under sound, and let's make the cat

[89:43]

actually meow. So, it turns out Scratch

[89:45]

being a cat by default comes with some

[89:47]

sounds by default like meowing. So, if

[89:49]

we go ahead and click the green flag

[89:51]

after programming this program, let's

[89:54]

hear what he sounds like now.

[89:56]

Okay, kind of cute. And if you want

[89:58]

Scratch to meow twice, you can just

[90:00]

play the game again.

[90:03]

And a third time. All right, but that's

[90:05]

going to get a little tedious as cute as

[90:07]

it is. So, I can solve that. Let's just

[90:09]

grab three of the puzzle pieces and just

[90:11]

drag them together and let them connect.

[90:12]

And now click the green flag.

[90:16]

All right. It gets less cute

[90:18]

quickly, but maybe we can slow it down

[90:20]

so that the cat doesn't sound so

[90:21]

hungry. Maybe let me go under uh let's

[90:24]

see under control. Let's grab one of

[90:26]

those. Wait one second and maybe plop a

[90:29]

couple of these in the middle here. That

[90:30]

might help things. And now click the

[90:32]

green flag.

[90:35]

Okay. Still a little hungry, but let's

[90:37]

see if we change it to two. And then I

[90:39]

change it to two down here in both

[90:41]

places. Let's play it again.

[90:46]

Okay, cuter maybe, but now I'm venturing

[90:49]

into badly programmed territory. This is

[90:52]

correct. If my goal is to get the cat to

[90:53]

meow three times, pausing in between.

[90:56]

Sorry, three times pausing in between.

[90:58]

What is bad about this code? Even if

[91:02]

you've never programmed before, though.

[91:04]

Yeah, in the middle.

[91:07]

>> Yeah, I literally had to repeat myself

[91:08]

three times. Essentially copy pasting.

[91:10]

And frankly, I could have been really

[91:11]

lazy and I could right-click or

[91:13]

control-click and I could have chosen

[91:14]

duplicate. But generally, when you copy

[91:17]

paste code or when you duplicate puzzle

[91:19]

pieces, you're probably doing something wrong.

[91:22]

Why? It's solving the problem correctly,

[91:25]

but it's not well designed. If

[91:27]

only because when I change the number of

[91:29]

seconds, now I had to change it in two

[91:30]

places. So, I had one initially, then I

[91:33]

had to change it to two. And if you just

[91:35]

imagine in your mind's eye having not

[91:36]

like six puzzle pieces but 60 or 600 or

[91:40]

6,000, you're going to screw up

[91:42]

eventually if it's on you to remember to

[91:43]

change something here and here and here

[91:45]

and here. Like you're going to mess up.

[91:47]

It's better to keep things simple and

[91:49]

ideally centralized by factoring out

[91:52]

common functionality. And clearly

[91:54]

playing sound and waiting is something

[91:56]

I'm doing at least twice if not a third

[91:58]

time here as well. So how can we do this

[92:00]

better? Well, remember this thing loops.
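
For comparison, here is a text-based sketch of what a loop buys you. `play_meow` is a hypothetical stand-in for the sound block, and the pause is set to 0 only so the example finishes instantly; the lecture's version waits one second.

```python
import time


def play_meow():
    """Stand-in for playing the meow sound until it is done."""
    print("meow")


times = 3          # how many repetitions: changed in exactly one place
pause_seconds = 0  # the lecture uses 1; 0 keeps this sketch instant

count = 0  # tracks how many times the loop body actually ran
for _ in range(times):
    play_meow()
    time.sleep(pause_seconds)
    count += 1
```

The body of the loop is written once, so changing the pause or the number of repetitions means editing a single line rather than hunting down every copy.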

[92:03]

Maybe we can just do something a little

[92:04]

more cyclically. So I tell the computer

[92:06]

to do something once, but I tell it how

[92:08]

many times to do that altogether. So

[92:10]

notice here by coincidence under control

[92:12]

I have a repeat block which doesn't say

[92:15]

loop, but that's certainly the right

[92:16]

semantics. Let me go ahead and drag the

[92:18]

repeat block in and I'll change the 10

[92:20]

to three just for consistency here. I'm

[92:22]

going to go back to sound. I'm going to

[92:23]

go ahead and play sound meow until done

[92:26]

just as before. And just so it's not

[92:28]

meowing too fast under control, I'm

[92:30]

going to grab a wait one second and

[92:32]

keep it inside the loop. And notice that

[92:34]

the loop here is sort of hugging these

[92:36]

puzzle pieces by growing to fill however

[92:39]

many pieces I actually cram in there. So

[92:42]

now if I click play, the effect is going

[92:44]

to be the same, but it's arguably not

[92:45]

only correct, but also well

[92:49]

designed because now if I want to change

[92:52]

the wait, change it in one place. If I

[92:54]

want to change the total number of

[92:55]

times, change it in one place. So I've

[92:57]

modularized the code and made it better

[93:00]

designed in this case. But now this is

[93:02]

silly because even though I want the cat

[93:06]

to meow, it feels like any program in

[93:08]

which I want this cat to meow, I have to

[93:10]

make these same puzzle pieces and

[93:11]

connect them together. Wouldn't it be

[93:13]

nice to invent the notion of meowing

[93:15]

once and then actually have a puzzle

[93:17]

piece called meow? So when I want the

[93:18]

cat to meow, it will just meow. Well, I

[93:21]

can do that, too. Let me scroll down to

[93:23]

my blocks here in pink. I'm going to

[93:26]

click make a block and I'm going to

[93:27]

literally make a new puzzle piece that

[93:29]

MIT didn't think of called meow. And I'm

[93:32]

going to go ahead and click okay. Now I

[93:34]

have in my code area here a define block

[93:38]

which literally means define meow as

[93:41]

follows. So how am I going to do this?

[93:43]

Well, I'm going to propose that meowing

[93:45]

just means to play the sound meow until

[93:48]

done and then wait 1 second. And notice

[93:51]

now I have nothing inside my actual

[93:53]

program which begins when I click the

[93:55]

green flag. But notice at top left

[93:58]

because I made a block called meow, I

[94:00]

now have access to one that I can drag

[94:02]

and drop. So now I can drag meow into this

[94:06]

loop. And per my comment about

[94:09]

abstracting the lower level

[94:11]

implementation details away, I'm going

[94:13]

to sort of unnecessarily dramatically

[94:15]

just move that out of the way. It still

[94:16]

exists. I didn't delete it, but now out

[94:18]

of sight, out of mind. Now, if you agree

[94:21]

with me that meow means for the cat to

[94:23]

make a sound, we've abstracted away what

[94:25]

it means mechanically for the cat to say

[94:28]

that sound. And so, we now have our own

[94:30]

puzzle piece that I can just now use

[94:32]

forever because I invented the meow

[94:33]

block already. Now, I can do one better

[94:36]

than this. It would be nice if I could

[94:37]

just tell the meow block how many times

[94:38]

I want it to meow because then I don't

[94:40]

need to waste time using loops either

[94:42]

myself. So, let me do this. Let me zoom

[94:44]

out and let me go back to my define

[94:48]

block. Let me rightclick or

[94:49]

control-click and just edit it. Or I

[94:51]

could delete it and start over, but I'll

[94:53]

just edit it. And specifically, let me

[94:54]

say, you know what, let's add an input,

[94:57]

otherwise known as an argument, to this

[94:58]

meow block. And we'll call it maybe n

[95:01]

for the number of times I want it to

[95:02]

meow. And just to be super clear, I'm

[95:05]

going to add a label, which has no

[95:06]

functional impact, but it just helps me

[95:07]

remember what this does. So, I'm going

[95:09]

to say meow n times, so that when I see

[95:11]

the puzzle piece, I know what the N

[95:13]

actually represents. If I now click

[95:15]

okay, my puzzle piece looks a little

[95:17]

different at top left. Now it has the

[95:19]

white oval into which I can type or drag

[95:21]

input. Notice down here in the define

[95:24]

block, I now see that same input called

[95:27]

N. So what I can do now is this. Let me

[95:29]

go under control. Drag the repeat

[95:32]

block here. And I have to do a little

[95:34]

switcheroo. Let me disconnect this. Plug

[95:37]

it inside of the repeat block. Reconnect

[95:39]

all of this. And I don't want 10. And

[95:42]

heck, I don't even want three down here

[95:43]

anymore. I can drag this input because

[95:46]

it's the right shape. And now declare

[95:48]

that meowing n times means to repeat the

[95:51]

following n times. Play sound meow until

[95:54]

done. Wait one second and keep doing

[95:56]

that n total times. If I now zoom out

[96:00]

and scroll up, notice that my usage of

[96:03]

this puzzle piece has changed such that

[96:04]

I don't actually need the repeat block

[96:06]

anymore. I can disconnect this. And

[96:09]

heck, I can actually right-click or

[96:11]

control-click and delete it. Just use

[96:13]

this under the green flag. Change this

[96:15]

to a three. And now I have the essence

[96:18]

of this meowing program. The

[96:21]

implementation details are out of sight,

[96:22]

out of mind. Once they're correct, I

[96:24]

don't need to worry about them again.
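The custom block with an input described above maps naturally onto a function with a parameter. The following is a minimal Python sketch of that idea, not Scratch's own implementation; the `play_sound` parameter is a hypothetical stand-in for the "play sound meow until done" block, and the one-second wait is left out so the sketch runs instantly.

```python
def meow(n, play_sound=print):
    """Repeat the meow sound n times, like the custom 'meow n times' block."""
    for _ in range(n):
        # Stand-in for "play sound meow until done".
        play_sound("meow")
        # Scratch would also "wait 1 second" here; omitted in this sketch.


# The caller only states how many meows it wants; the loop is hidden inside.
meow(3)
```

The caller no longer needs its own repeat loop, which is the abstraction the lecture is pointing at.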

[96:26]

And this is exactly how Scratch itself

[96:28]

works. I have no idea how MIT

[96:30]

implemented the wait block or the

[96:32]

repeat block. Heck, there's a forever

[96:33]

block and there's a few others, but I

[96:35]

don't need to know or care because

[96:37]

they've implemented those building

[96:39]

blocks that I can then build on myself.

[96:41]

I don't necessarily know how to build a

[96:43]

whole chatbot, but on top of OpenAI's

[96:46]

API, this web-based service, I can

[96:48]

implement my own chatbot because they've

[96:50]

done the heavy lift of actually

[96:52]

implementing that for me. Well, let's do

[96:54]

just a few more examples here. Let's

[96:56]

bring the cat all the more to life. Let

[96:57]

me throw away the meowing. Let me open

[96:59]

up under when green flag clicked. How

[97:01]

about that forever block that we just

[97:03]

glimpsed? Let me go ahead and now add to

[97:05]

the mix what we called earlier

[97:07]

conditionals which allow us to ask

[97:10]

questions and decide whether or not we

[97:11]

should do something. So under this, let

[97:13]

me go ahead and under forever say if the

[97:17]

following is true. Well, what boolean

[97:21]

expression do I want to ask? Well, let's

[97:23]

implement how about this program and

[97:24]

we'll figure out if it works. Uh under

[97:26]

sensing, I'm going to grab this uh very

[97:30]

angled puzzle piece called touching

[97:31]

mouse pointer. that is the cursor and

[97:34]

only if that question has a yes answer

[97:37]

do I want to play the sound meow until

[97:40]

done. So let me zoom in here and in

[97:42]

English

[97:44]

what is this going to implement really

[97:48]

just describe what this program does

[97:50]

less arcanely as the code itself.

[97:52]

Yeah.

[97:57]

>> yeah if you move the mouse over the cat

[97:59]

it will make noise. So, it's kind of

[98:01]

like implementing petting a cat, if you

[98:03]

will. So, let me zoom out, click the

[98:05]

green flag, and notice nothing's

[98:07]

happening yet, but notice my puzzle

[98:09]

pieces are highlighted in yellow because

[98:11]

it is in fact still running because it's

[98:13]

doing something forever. And it's

[98:14]

constantly checking if I'm touching the

[98:17]

mouse pointer. And if so,

[98:20]

it's like I just pet the cat. Now, it

[98:22]

stopped until I move the cursor again.

[98:24]

Now, it stopped. If I leave it there,

[98:27]

it's going to keep meowing because it's

[98:28]

going to be stuck in this loop forever.
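One way to picture why the forever block matters is to model time as a list of frames, each recording whether the cat is touching the mouse pointer. This is a rough Python sketch under that assumption; the function names and the frame list are hypothetical and are not part of Scratch.

```python
def check_once(frames):
    """Ask the question a single time, at the very first frame."""
    return 1 if frames and frames[0] else 0


def check_forever(frames):
    """Ask the question again on every frame, meowing whenever it is true."""
    meows = 0
    for touching in frames:
        if touching:
            meows += 1
    return meows


# The pointer only reaches the cat on the third and fourth frames.
frames = [False, False, True, True, False]
print(check_once(frames), check_forever(frames))  # prints "0 2"
```

A single check only sees the first frame, so it misses the later touch, while the loop keeps asking and catches it.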

[98:30]

But it's correct insofar as I'm

[98:32]

petting the cat. Let me do this though.

[98:34]

Let me make a mistake this time. Let me

[98:37]

forget about the forever and just do

[98:38]

this. And you might think this is

[98:40]

correct. Let me click the green flag

[98:42]

now. Let me pet the cat. And like

[98:45]

nothing's actually working here. Why

[98:47]

though logically?

[98:49]

Yeah.

[98:51]

>> Yeah. The program's so darn fast. It

[98:53]

already ran through the sequence. And at

[98:54]

the moment in time when I clicked the

[98:56]

green flag, no, I was not touching the

[98:58]

mouse pointer. And so it was too late by

[99:00]

the time I actually moved the cursor

[99:02]

there. But by using the forever block,

[99:04]

which I did correctly the first time,

[99:06]

this ensures that Scratch is constantly

[99:08]

checking the answer to that question. So

[99:10]

if and when I do pet the cat, it will

[99:13]

actually

[99:15]

detect as much. All right, how about a few

[99:18]

final examples before you're on your way

[99:20]

building some of your own first programs

[99:22]

with these building blocks. Let me go

[99:23]

ahead and open up a program that I wrote

[99:25]

in advance in fact about 20 years ago

[99:28]

whereby let me pull this up whereby we

[99:32]

have in this example a program I wrote

[99:34]

called Oscar time and this was the

[99:36]

result of our first assignment in this

[99:38]

class whereby when MIT was implementing

[99:41]

Scratch for the very first time we

[99:42]

needed to implement our very own Scratch

[99:44]

program as well. I'm going to go ahead

[99:46]

and full screen it here. The goal is to

[99:48]

drag as much falling trash as you can to

[99:50]

Oscar's trash can before his song ends.

[99:52]

For which one volunteer would be handy

[99:55]

here. Okay. I saw your hand go up

[99:57]

quickly in blue. Yeah. Come on up. All

[99:59]

right. So, you're playing for a stress

[100:01]

ball here, if you will. At some

[100:03]

point, I'm going to talk over what

[100:04]

you're actually playing just so that we

[100:05]

can point out what it is we're trying to

[100:07]

glean from this program. And I'll

[100:09]

stipulate this probably took me like 8 to

[100:11]

12 hours. And as you'll soon see, the

[100:12]

song starts to drive you nuts after a

[100:14]

while because I was trying to

[100:15]

synchronize everything in the game to a

[100:17]

childhood song with which you might be

[100:18]

familiar. Let me go ahead and say hello

[100:20]

if you'd like to introduce yourself.

[100:22]

>> Oh, hello. So, I'm Han and uh I'm a

[100:26]

first year student. I'm pretty excited

[100:28]

for this class.

[100:28]

>> All right, welcome. Well, here is Oscar

[100:30]

time. If you want to go ahead and take

[100:32]

control of the keyboard, all you'll need

[100:33]

to do is drag and drop trash that falls

[100:36]

from the sky into the trash can.

[101:16]

And it's around this point in the game

[101:17]

where the novelty starts to wear off

[101:19]

because there's like three more minutes

[101:20]

of this game where more and more stuff

[101:21]

starts to fall from the sky. So as Han,

[101:23]

as you continue to play, I'm going to

[101:24]

cut over here. You keep playing. Let's

[101:26]

consider how I implemented this whereby

[101:30]

we'll start at the beginning. The very

[101:32]

first thing I did when implementing

[101:33]

Oscar time honestly was the easy part.

[101:34]

Like I found a lamp post that looked a

[101:37]

little something like this and I made

[101:38]

the so-called costume for the whole

[101:40]

stage. And that was it. The game didn't

[101:42]

do anything. You couldn't play anything.

[101:43]

You hit the green flag, nothing

[101:44]

happened. But then I figured out how to

[101:47]

turn the Scratch cat, otherwise known

[101:49]

more generally as a sprite, into a trash

[101:52]

can instead. And so the trash can,

[101:55]

meanwhile, is clearly animated because I

[101:57]

realized that, oh, I can give sprites

[101:59]

like the cat different costumes. So, I

[102:01]

can make the cat not only look like a

[102:03]

trash can, but if I want its lid to go

[102:05]

up, well, that's just another costume.

[102:07]

And if I want to see Oscar popping out,

[102:09]

that's just a third costume. And so, I

[102:11]

made my own simplistic animation. And

[102:14]

you can kind of see it. It's very

[102:15]

jittery step by step by step by creating

[102:19]

the illusion of animation by really just

[102:20]

having a few different images or

[102:22]

costumes on Oscar. Now, I hope you

[102:23]

appreciate how much effort went

[102:25]

into timing each of these pieces of

[102:26]

trash with the specific mention of that

[102:28]

type of piece of trash in the music.

[102:30]

Okay. 20 years later, still clinging.

[102:32]

So, you're doing amazing, by the way.

[102:34]

How do we get the trash to fall in the

[102:36]

first place? Well, at the very beginning

[102:37]

of the game, the trash just started

[102:39]

falling from some random location. What

[102:41]

does it mean for trash to fall from the

[102:42]

sky?

[102:44]

Oh, big climax here.

[102:48]

You got a lot of trash on the ground to

[102:49]

pick up.

[102:54]

There we go. And your final score is

[102:59]

a big round of applause if we could for

[103:00]

Han. Thank you.

[103:04]

Thank you. So just to be clear now,

[103:07]

let's decompose this fairly involved

[103:09]

program that took me a lot of hours to

[103:11]

make into its component parts. So this

[103:14]

is just a sprite. And I figured out

[103:15]

eventually how to change its costume,

[103:17]

change its costume, change its costume

[103:18]

to simulate some kind of animation. And

[103:20]

I also realized that oh, I don't need to

[103:22]

just have one sprite or one cat or trash

[103:24]

can. You can create a second sprite, a

[103:26]

third sprite, and many more. So I just

[103:28]

told the sprite to go to a random

[103:30]

location at Y equals 180 and X equals

[103:34]

something. I think I restricted X to be

[103:35]

in this region, which is why the trash

[103:37]

never falls from over here. I just did a

[103:39]

little bit of math based on that

[103:40]

Cartesian plane that we saw a slide of

[103:41]

earlier. And then I probably had a loop

[103:44]

that told the trash to move a pixel,

[103:46]

move a pixel, move a pixel down, down,

[103:48]

down, down until it eventually hits the

[103:50]

bottom and therefore just stops. So we

[103:52]

can actually see this step by step. And

[103:54]

this is representative of how even for

[103:56]

something like your first problem set

[103:57]

in CS50 and with Scratch specifically,

[104:00]

you might build some of the same. So,

[104:02]

I'm going to go back into uh CS50 Studio

[104:05]

for today, which is linked on the

[104:06]

course's website, which has a few

[104:08]

different versions of this and other

[104:10]

programs called Oscar 0 through Oscar

[104:13]

4, where zero is the simplest. And

[104:15]

truly, I meant it when I look inside

[104:17]

this program to see my code. Like, this

[104:19]

was it. There was no code because all I

[104:21]

did was put the sprite on the screen and

[104:23]

change it from a cat to a trash can. And

[104:25]

I added a costume uh a costume for the

[104:27]

stage, so to speak, so that the lamp

[104:29]

post would be fixated there. If I then

[104:31]

go to the next version of code, version

[104:34]

one, so to speak, then I had code that

[104:36]

did this. Now, notice there's a few

[104:38]

things going on here. At bottom left,

[104:40]

you'll see of course the trash can and

[104:42]

then at top right the trash. Here are

[104:44]

the corresponding sprites down here. So,

[104:46]

when Oscar is clicked on here, the trash

[104:48]

can, you see the code I wrote, the

[104:50]

puzzle pieces I dragged for Oscar. And

[104:52]

in a moment, when we click on trash,

[104:54]

you'll see the code I wrote or the

[104:56]

puzzle pieces I wrote dragged and

[104:58]

dropped for the trash piece

[104:59]

specifically. So what does Oscar do?

[105:02]

Well, I first switch his costume to

[105:04]

Oscar 1, which I assume is this the

[105:07]

closed trash can. Then forever Oscar

[105:10]

does the following. If Oscar's touching

[105:12]

the mouse pointer, then change the

[105:14]

costume to Oscar 2. Otherwise, that is

[105:17]

if not touching the mouse pointer,

[105:19]

change the costume to Oscar 1. Well,

[105:21]

what's the implication? Anytime I move

[105:23]

the cursor over the trash can, the lid

[105:25]

just pops up, which was exactly the

[105:27]

animation I wanted to achieve.
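The if/else inside Oscar's forever loop boils down to choosing one of two costumes from a single yes/no question. A minimal Python sketch of that decision follows; the costume names `"oscar1"` and `"oscar2"` mirror the ones mentioned in the lecture, while the function name is made up for illustration.

```python
def oscar_costume(touching_mouse_pointer):
    """Pick the open-lid costume only while the pointer is over Oscar."""
    if touching_mouse_pointer:
        return "oscar2"  # lid up
    else:
        return "oscar1"  # lid closed


# Re-evaluating the question on every frame produces the lid animation.
costumes = [oscar_costume(t) for t in [False, True, True, False]]
print(costumes)
```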

[105:29]

Meanwhile, if we do this and click the

[105:31]

green flag, you can see that in action,

[105:33]

even for this simple version. If I move

[105:36]

the cursor over Oscar, we have the

[105:38]

beginnings of a game, even though

[105:39]

there's no score, there's no music or

[105:41]

anything else, but I've solved one of my

[105:43]

problems. Meanwhile, if I click on the

[105:45]

trash piece here, and then you'll see no

[105:48]

code has been written for it yet. So, we

[105:50]

move on to Oscar version two and see

[105:52]

inside it. In Oscar version two, when I

[105:54]

click on trash, ah, now there's some

[105:57]

juicy stuff happening here. And in fact,

[105:59]

this trash sprite has two programs or

[106:02]

scripts associated with it. And that's

[106:04]

fine. Each of them starts with when

[106:05]

green flag clicked, which means the

[106:07]

piece of trash will do two things at

[106:09]

once essentially in parallel. The first

[106:11]

thing it will do is we'll set drag mode

[106:13]

to draggable. And that's just a Scratch

[106:14]

thing that lets you actually move the

[106:15]

sprites by clicking on them, making them

[106:17]

draggable. Then it goes to a random X

[106:19]

location between 0 and 240. So yeah,

[106:22]

that must be what I did from the middle

[106:24]

all the way to the right. And I set y

[106:26]

always to 180, which is why the trash

[106:28]

always comes from the sky from the very

[106:30]

top. Then I said forever change your y

[106:33]

by negative one. And here's where it's

[106:35]

useful to know what 180 is, 240 is, and

[106:37]

so forth. Because if I want the trash to

[106:39]

go down, so to speak, that's changing

[106:41]

its Y by a pixel by a pixel by a pixel.

[106:44]

And thankfully MIT implemented it such

[106:46]

that if the trash tries to go off the

[106:48]

screen, it will just stop automatically,

[106:51]

even if it's inside of a forever block,

[106:52]

lest you lose control over the sprites

[106:54]

altogether. But in parallel, what's

[106:56]

happening is this. Also, when the green

[106:58]

flag is clicked, uh the trash piece is

[107:01]

doing this too forever. If touching

[107:03]

Oscar, what's it doing in blue here?

[107:08]

Sort of teleporting away. Now, to your

[107:11]

eye, hopefully it looks like it's going

[107:13]

into the trash can. But what does that

[107:15]

mean to go into the trash can? Well, I

[107:17]

just put it back into the sky as though

[107:18]

a new piece of trash is falling. So even

[107:21]

though you saw one piece of trash, two,

[107:23]

three, four, and so forth, it's the same

[107:25]

sprite just acting that out again and

[107:27]

again. So here, if I click play on this

[107:31]

program, you'll see that it starts

[107:32]

falling one pixel at a time. Because

[107:36]

it's draggable, I can sort of pull it

[107:37]

away and move it over to the trash can

[107:40]

like that. And as soon as I do, it seems

[107:42]

to go in, but really it just teleported

[107:44]

to a different X location. Still at Y=

[107:47]

180. Again, it's not much of a game yet.
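The two trash scripts described above can be approximated in Python as one per-frame update. This is only a sketch: Scratch runs the two scripts in parallel, whereas this version merges them into a single hypothetical `tick` function, and it does not model Scratch stopping the sprite at the edge of the stage. The ranges (x between 0 and 240, y fixed at 180) come from the lecture; every name is an assumption.

```python
import random

TOP_Y = 180            # y coordinate of the top of the stage, per the lecture
MIN_X, MAX_X = 0, 240  # the trash only appears on the right half


def random_top_position(rng=random):
    """Pick a random x in the allowed range, always at the top."""
    return (rng.randint(MIN_X, MAX_X), TOP_Y)


def tick(position, touching_oscar, rng=random):
    """Advance the trash by one frame."""
    if touching_oscar:
        # "Teleport" back to the sky so it looks like a new piece of trash.
        return random_top_position(rng)
    x, y = position
    return (x, y - 1)  # otherwise fall by one pixel


position = random_top_position()
for _ in range(3):
    position = tick(position, touching_oscar=False)
print(position)
```

Because the same sprite is reused after each teleport, one piece of trash can stand in for an endless stream of them.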

[107:49]

There's no score. There's no music or

[107:50]

anything, but let's go to Oscar 3 now.

[107:52]

And in Oscar 3, if we scroll over to the

[107:56]

trash, even more is happening here. In

[107:58]

so far as I realized, you know what?

[108:00]

There was kind of an inefficiency before.

[108:03]

Previously, I had these two programs or

[108:05]

scripts (a synonym), whereby they both went

[108:07]

to the top by going to 0 to 240 for X

[108:12]

and then 180 for Y. And if you noticed,

[108:14]

I used that here and I used that down

[108:17]

here in both programs. Now that too is

[108:18]

kind of stupid because I literally

[108:19]

copied and pasted the same code. So if I

[108:21]

ever want to change that design, I have

[108:23]

to change it in two places and I already

[108:25]

proposed that we frown upon that. So

[108:26]

what did I do in this version? I just

[108:28]

created my own block and I decided to

[108:30]

call my own function go to top. What

[108:33]

does it mean to go to the top? Pick a

[108:34]

random x between those values and fixate

[108:37]

on y= 180 initially. Now in both of

[108:41]

those programs which are otherwise

[108:42]

identical, I just say what I mean. Go to

[108:45]

top. Go to top. And if I really wanted

[108:46]

to, I could drag this out of the way and

[108:48]

never think about it again because now

[108:50]

that functionality exists. So correct,

[108:53]

but arguably better designed. I've now

[108:55]

factored out commonality so as to use

[108:58]

and reuse my code as well. So let's go

[109:01]

up to Oscar version 4 now. And in Oscar

[109:03]

time version 4, the trash can does a

[109:07]

little something more whereby what have

[109:10]

I added to this mix even though we

[109:11]

haven't dragged this puzzle piece

[109:13]

together before?

[109:15]

Yeah. What's new?

[109:16]

>> Score.

[109:17]

>> Yeah. So, it turns out on the left here,

[109:19]

there's a variables category, which

[109:21]

goes beyond the answer variable that we

[109:24]

just automatically get from the ask

[109:26]

block. You can create your own variables

[109:28]

X, Y, Z. But in computer and

[109:30]

programming, it's best to name things,

[109:31]

not silly simple words like X, Y, and Z,

[109:34]

but full-fledged words that say what

[109:35]

they are, like score. So, I'm setting a

[109:37]

score variable to zero. And then any

[109:40]

time the trash is touching Oscar before

[109:43]

it teleports away to the top, I change

[109:45]

the score by one. That is increment the

[109:47]

score by one. And what Scratch does

[109:49]

automatically for me is it puts a little

[109:51]

billboard up here showing me the current

[109:53]

score. So if I now play this game once

[109:57]

more, the score is going to start at

[109:59]

zero. But if I drag this trash over here

[110:01]

and even let it fall in, as soon as it

[110:03]

touches, the score goes to one. And now

[110:05]

if I click and drag again, the score is

[110:07]

going to as soon as it touches Oscar

[110:09]

going to go to two and so forth. And you

[110:12]

saw in the final flourish with Han

[110:13]

playing that once you had the sound and

[110:15]

other pieces of trash, which are just

[110:17]

really other sprites and I just had wait

[110:19]

like a minute, wait two minutes so that

[110:21]

the trumpet would fall at the right

[110:22]

time. I've broken down a fairly involved

[110:25]

program into these basic building

[110:27]

blocks. And when you too write your own

[110:29]

program, that's exactly how you should

[110:31]

approach it. Even if you have these

[110:32]

grand aspirations to do this or that,

[110:35]

start by the simple problems and figure

[110:37]

out what bites can I uh bite off in

[110:39]

order to make progress. Baby steps if

[110:42]

you will to the final solution. Well,

[110:44]

let's look at one other set of examples

[110:46]

before we have one final volunteer to

[110:48]

come up. And as you'll soon see, it's

[110:49]

tradition in CS50 to end the first class

[110:51]

with cake. So, in a moment, cake will be

[110:54]

served out in the transept. And please

[110:55]

feel free to come up and say hi and ask

[110:56]

questions if you'd like to. Let me go

[110:58]

ahead and open up though a series of

[111:00]

building blocks here via which we can

[111:02]

make so-called Ivy's hardest game which

[111:05]

is one implemented by one of your

[111:07]

predecessors, a former classmate from

[111:09]

CS50. So here we have a whole bunch of

[111:11]

puzzle pieces written by your classmates

[111:13]

but let me go ahead and zoom in on this

[111:15]

screen. You'll see that this Harvard

[111:17]

crest is my sprite. So it's not a cat,

[111:18]

it's not a trash can, it's a harbored

[111:20]

crest and it exists in a very simple

[111:22]

two-dimensional world with two walls

[111:24]

next to it. If I click on the green

[111:26]

flag, notice that with my hands here, I

[111:29]

can go up, I can go down, I can go left,

[111:32]

and I can go right. But if I try going

[111:34]

too far right, I get stuck on the wall.

[111:36]

If I go too far left, I get stuck on the

[111:39]

wall. Well, it's the sort of the

[111:41]

beginning of any animation or game. But

[111:42]

how do I do this? Well, let me go up

[111:44]

here and propose that the first thing

[111:46]

the Harvard sprite is doing is it's

[111:48]

going to the middle 0 comma 0. And it's

[111:51]

then forever listening for the keyboard

[111:54]

and feeling for walls. Now those are

[111:56]

functions I implemented myself to kind

[111:59]

of describe what I wanted the program to

[112:01]

do. And let's do the shorter one first.

[112:03]

What does it mean to feel for the walls?

[112:05]

Just to ask the question, if you're

[112:06]

touching the left wall, change your x by

[112:09]

one. If you're touching the right wall,

[112:11]

change your x by negative one.
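The "feel for walls" block just described is two independent checks that each nudge x by one pixel. A minimal Python sketch, assuming the two touching tests are handed in as booleans (in Scratch they would be sensing blocks); the function name is hypothetical:

```python
def feel_for_walls(x, touching_left_wall, touching_right_wall):
    """Nudge the sprite one pixel away from whichever wall it touches."""
    if touching_left_wall:
        x += 1  # change x by 1
    if touching_right_wall:
        x -= 1  # change x by -1
    return x


print(feel_for_walls(100, False, True))   # touching the right wall
print(feel_for_walls(-100, True, False))  # touching the left wall
```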

[112:15]

Why have I defined touching walls in

[112:18]

this weirdly mathematical way? Yeah.

[112:22]

>> Sure. Yeah.

[112:24]

>> Like counteracts the movement.

[112:26]

Otherwise, you're like not moving.

[112:29]

>> Exactly. Because if I've gone so far

[112:31]

right that I'm touching the right wall,

[112:33]

well, I'm already kind of on top of the

[112:35]

wall a little bit. So, I effectively

[112:37]

want the sprite to bounce off of it. And

[112:39]

the easiest way to do that is just to

[112:40]

say back up one pixel as though you

[112:42]

can't go any further. And same for the

[112:43]

left wall. Meanwhile, let me scroll over

[112:45]

to the second script or program that's

[112:48]

running in parallel. It's a little

[112:49]

longer, but it's not more complicated.

[112:51]

What does it mean to listen for

[112:52]

keyboard? Well, just check. If the key

[112:56]

up arrow is pressed, change Y by one.

[112:59]

Arrow go up. Else if the key down arrow

[113:01]

is pressed, then change Y by negative 1.

[113:03]

Key right arrow is pressed, change X by

[113:05]

one, and so forth. So again, this is

[113:07]

where the math and the numbers are

[113:08]

useful because it gives you a world in

[113:10]

which to live. Up, down, left, right.

[113:12]

deconstructed into some simple

[113:14]

arithmetic values. All right, so the net

[113:17]

result is that we have a crest living in

[113:19]

this world. Well, let's add a bit of

[113:21]

competition here. And in the second

[113:22]

version of this game, let me go ahead

[113:24]

and full screen it again. Click play.
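For reference, the "listen for keyboard" block from the previous version can be sketched as a lookup from key name to a change in x and y. This is an illustrative Python version, not the student's actual script; the lecture spells out up, down, and right, so the left entry is an assumption by symmetry, and the key names are made up.

```python
# Each arrow key maps to a (change in x, change in y) pair.
MOVES = {
    "up": (0, 1),
    "down": (0, -1),
    "right": (1, 0),
    "left": (-1, 0),  # assumed by symmetry; not spelled out in the lecture
}


def listen_for_keyboard(x, y, key):
    """Return the sprite's new position after one key check."""
    dx, dy = MOVES.get(key, (0, 0))  # any other key leaves the sprite in place
    return (x + dx, y + dy)


print(listen_for_keyboard(0, 0, "up"))
```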

[113:26]

And now we'll see sort of an enemy

[113:28]

bouncing back and forth autonomously. So

[113:31]

there's no one playing except me. I'm

[113:33]

controlling Harvard. Yale is bouncing on

[113:34]

its own. And nothing bad's going to

[113:36]

happen if it hits me. But it does seem

[113:38]

to be autonomous. So how is this

[113:40]

working? Well, if it's doing this

[113:42]

forever, there's probably a forever loop

[113:43]

involved. So, let's see inside here.

[113:45]

Let's click not on Harvard, but on the

[113:48]

Yale sprite. And sure enough, if we

[113:50]

focus on this for a moment, we'll see

[113:52]

that the first thing Yale does is go to

[113:54]

0 comma 0. It points in direction 90°,

[113:56]

which just gives you a sense of whether

[113:58]

you're facing left or right or wherever.

[113:59]

And then it forever does the following.

[114:01]

If it's touching the left wall or

[114:03]

touching the right wall, I was a little

[114:05]

clever this time, if I may. I just kind

[114:07]

of turn around 180 degrees, which

[114:09]

effectively bounces me back in the

[114:11]

opposite direction. Otherwise, I go

[114:13]

ahead and no matter what just move one

[114:15]

step. And this is why Yale is always

[114:17]

moving back and forth. So, a quick

[114:18]

question. If I wanted to speed up Yale

[114:20]

and make this beginning of a game

[114:22]

harder, what would I do?
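The Yale script just described can be modeled in one dimension: start in the middle facing right, flip direction whenever a wall is touched, and then always move. The Python below is a rough sketch under that simplification; the wall position `wall_x` and all names are hypothetical, and "touching" is approximated as being at or beyond the wall.

```python
def bounce(steps_per_frame=1, frames=12, wall_x=3):
    """Simulate a sprite bouncing between walls at -wall_x and +wall_x."""
    x, direction = 0, 1  # go to the middle, point right (direction 90)
    path = []
    for _ in range(frames):
        if abs(x) >= wall_x:         # touching the left or right wall
            direction = -direction   # turn around 180 degrees
        x += direction * steps_per_frame  # move, no matter what
        path.append(x)
    return path


print(bounce())
```

The number of steps per frame is the only thing that controls how fast the sprite moves.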

[114:25]

Yeah.

[114:28]

>> Yeah. So, let's have it move like 10

[114:29]

steps at a time, right? This looks like

[114:30]

a much harder game, if you will, like

[114:32]

level 10 now, because it's just moving

[114:34]

so much faster. All right. Well, let's

[114:36]

try a third version of this that adds

[114:37]

another ingredient. Let me full screen

[114:39]

this and click play. And now you'll see

[114:41]

the even smarter MIT homing in on me by

[114:46]

following my actual movements. So, this

[114:48]

is sort of like boss level material now.

[114:51]

And it's just going to follow me. So,

[114:53]

how is this working? Well, it's kind of

[114:55]

a common game paradigm, but what does

[114:57]

this mean? Well, let's see inside here.

[114:59]

Let's click on MIT sprite. It's pretty

[115:01]

darn easy.

[115:03]

go to some random position just to make

[115:05]

it a little interesting lest MIT always

[115:07]

start in the center and then forever

[115:08]

point towards the Harvard logo outline

[115:11]

which is the name the former student

[115:12]

gave to the costume that the sprite is

[115:14]

wearing that looks like a Harvard crest

[115:16]

and then move one step. So, corollary

[115:18]

of the previous question, how do we make

[115:19]

the game harder and MIT even faster?

[115:22]

Well, we can change this to be like 10

[115:24]

steps and now you'll see MIT is a little

[115:27]

twitchy because

[115:30]

this is kind of a visual bug. Let me

[115:31]

make it full screen.

[115:34]

Why is this visual glitch happening?

[115:37]

It's literally doing what I told it to

[115:39]

do. It just looks stupid. Yeah.

[115:45]

Say again.

[115:48]

>> Yeah. It's moving so fast that it's sort

[115:49]

of going 10 pixels this way, but then I

[115:51]

kind of it kind of overshot me. So then

[115:53]

it's doubling back to follow me again,

[115:54]

and it's doubling back this way. And

[115:56]

because these are such big footsteps, if

[115:58]

you will, it just has this visual effect

[116:00]

of twitching back and forth. So, we

[116:01]

might have to throttle that back a bit

[116:02]

and make it five or two or three instead

[116:05]

of 10 because that's clearly not

[116:06]

desirable gaming behavior here. All

[116:09]

right. Well, let's go ahead and do this.
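The twitching described above can be reproduced with a one-dimensional sketch of "point towards, then move". The Python below is an approximation with hypothetical names: the chaser always steps toward the target, so once it is closer than one step it overshoots and doubles back, and the size of that jitter grows with the step size.

```python
def chase(chaser_x, target_x, step, frames):
    """Move toward the target by a fixed step each frame; return the path."""
    positions = []
    for _ in range(frames):
        direction = 1 if target_x >= chaser_x else -1  # "point towards"
        chaser_x += direction * step                   # "move step steps"
        positions.append(chaser_x)
    return positions


print(chase(0, 3, 1, 6))   # [1, 2, 3, 4, 3, 4]: jitter of one pixel
print(chase(0, 3, 10, 4))  # [10, 0, 10, 0]: jitter of ten pixels
```

With a step of 1 the wobble is barely visible; with a step of 10 the sprite jumps back and forth across the target, which matches the visual glitch in the demo.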

[116:10]

Let's put them all together just as your

[116:12]

former classmate did when submitting

[116:13]

this actual homework. Uh, the game will

[116:16]

conclude hopefully in an amazing climax

[116:17]

where you've won the game. So, we need

[116:19]

someone ideally with really good hand

[116:20]

eye coordination to play this final game

[116:23]

here. Yeah, your hand went up first, I

[116:25]

think. Okay, come on up. Big round of

[116:27]

applause because this is a lot of

[116:28]

pressure to end.

[116:34]

All right. So, if you win the game, cake

[116:37]

will be served. If you don't win the

[116:39]

game, there will be no cake.

[116:41]

>> Okay. But introduce yourself in the

[116:43]

meantime.

[116:43]

>> Hi, I'm Jenny Pan, freshman at Hollis

[116:47]

and I'm actually a CS major or

[116:49]

concentration.

[116:50]

>> Nice to meet you. Head to the keyboard

[116:51]

here. This now is the combination of all

[116:54]

of those building blocks and even more

[116:56]

aka Ivy's hardest game. You will be in

[116:58]

control just as I would of the Harvard

[117:00]

crest. And the goal is to make it to the

[117:02]

exit, which is this gentleman on the

[117:03]

right here. And you'll see there's

[117:04]

multiple levels where it's each level

[117:06]

gets a little harder. All right, here we

[117:08]

go.

[118:25]

All right, this is CS50 and this is week

[118:28]

one, our second week together. And

[118:30]

you'll recall that last week, week zero,

[118:32]

we focused on Scratch. Ultimately, this

[118:34]

graphical programming language by which

[118:36]

you can drag and drop puzzle pieces that

[118:37]

interlock together only if it makes

[118:39]

logical sense to do so. And many of you

[118:41]

had actually probably played with that

[118:43]

in like middle school or even prior at

[118:44]

some point. But for our purposes, the

[118:46]

goals of Scratch were to give us sort of

[118:48]

a mental model for some fundamental

[118:50]

constructs that we're going to see again

[118:51]

and again today in C in a few weeks in

[118:54]

Python and even thereafter. And those

[118:55]

include things like functions and return

[118:58]

values and arguments and variables

[119:01]

and loops and conditionals and more. And

[119:03]

so even if today feels like a bit of a

[119:05]

fire hose, such as that picture here,

[119:08]

appreciate that a lot of today's ideas

[119:10]

are exactly the same as last week's

[119:12]

ideas, it's just that the syntax is

[119:14]

going to change. It's going to look a

[119:15]

little different. It's going to look a

[119:16]

little scarier. It's going to be harder

[119:17]

to sort of memorize, except with

[119:19]

practice will come that muscle memory,

[119:21]

but the ideas ultimately are going to be

[119:23]

the same. And indeed, this is, if

[119:24]

unfamiliar, uh MIT down the road has a

[119:27]

tradition of hacks whereby students once

[119:28]

a year do something fairly crazy. And at

[119:30]

this point, they happen to connect an

[119:32]

actual working uh drinking fountain to

[119:35]

an actual fire hydrant. And the sign

[119:37]

there, very pixelated, says, "Getting an

[119:39]

education from MIT is like trying to

[119:41]

drink from a fire hose." And that's

[119:42]

indeed how computer science, how

[119:45]

programming, how CS50 will sometimes

[119:47]

feel, but realize that what's going to

[119:49]

be ultimately most important is not

[119:52]

where you uh feel you are day after day,

[119:54]

but where 3 months from now you feel

[119:57]

that you are relative to last week

[119:59]

alone. so-called week zero. So, let's

[120:01]

look back at what week zero looked like.

[120:03]

It looked a little something like this.

[120:04]

The simplest of programs by which we get

[120:06]

that cat to say hello world. Today,

[120:09]

that same code is going to start to look

[120:11]

a little like this, which was a glimpse

[120:13]

we gave you last week. But this time,

[120:14]

I've deliberately color-coded it to try

[120:16]

to send the message that whereas in

[120:17]

Scratch, we had this yellowish puzzle

[120:20]

piece that sort of kicked things off

[120:22]

that didn't really do anything itself,

[120:23]

but it got the program started, whereas

[120:25]

the real work was done in purple here.

[120:27]

Same is going to be true today whereby

[120:29]

I'm going to wave my hands for a little

[120:31]

bit of time at this yellowish code on

[120:33]

the screen. But what's really going to

[120:35]

have the most effect is this same purple

[120:37]

line here and the white text within. And

[120:39]

we'll break down what all of these lines

[120:40]

mean over the next couple of weeks. But

[120:42]

sometimes we'll wave our hand at details

[120:44]

if we feel it's a little unnecessary at

[120:46]

this point in the story. And in fact,

[120:48]

let me get rid of the color coding for

[120:49]

now. And we'll see that this is the kind

[120:51]

of code in a language called C we're

[120:53]

going to start playing with and using

[120:55]

today and for the next several weeks.

[120:57]

And indeed, it's representative of what

[120:59]

we're going to generally call source

[121:00]

code. So source code is what programmers

[121:02]

write. It's what you write. It's what

[121:04]

you wrote, albeit by dragging and

[121:05]

dropping puzzle pieces. This week

[121:07]

onward, you're going to start using your

[121:08]

keyboard all the more. And you're going

[121:09]

to write source code. So this is code

[121:11]

that we humans can understand with some

[121:13]

training and with some practice. But of

[121:16]

course per last week, what language do

[121:18]

computers ultimately understand? Only

[121:22]

>> so binary zeros and ones. And so you and

[121:25]

I, yes, can write code starting today in

[121:27]

a form that looks a little something

[121:29]

like this, which admittedly might look a

[121:30]

little arcane and cryptic, but it's

[121:33]

certainly better than a whole bunch of

[121:34]

zeros and ones. But we're going to write

[121:36]

in source code. But the machines that we

[121:38]

write code for ultimately only

[121:40]

understand these here, zeros and ones,

[121:42]

which may very well say hello world, but

[121:44]

we're going to call this moving forward

[121:46]

machine code. So machine code is what

[121:48]

the computers understand. Only the

[121:50]

zeros and ones. Source code is what you

[121:52]

and I understand and actually write. So

[121:54]

it stands to reason that we're going to

[121:55]

have to somehow translate one to the

[121:57]

other from source code to machine code.

[122:00]

And I alluded to this ever so briefly

[122:02]

last week, but we're going to use this

[122:04]

same mental model whereby the source

[122:06]

code we write might be the input to some

[122:08]

problem. The output we want therefrom

[122:09]

is going to be the machine code. So what

[122:12]

we're going to equip you with today

[122:13]

inside of this proverbial black box is a

[122:16]

special piece of software that takes

[122:17]

source code as input, produces machine

[122:20]

code as output, and that type of program

[122:22]

is called a compiler. And there's

[122:24]

bunches of different compilers in

[122:25]

the world. We're going to have you use

[122:27]

one of the most popular ones, but it's

[122:28]

simply a piece of software that someone

[122:30]

else wrote that converts one language to

[122:32]

another. Source code, for instance, in a

[122:34]

language called C to machine code, the

[122:37]

zeros and ones that our Macs, PCs,

[122:38]

phones, and other devices actually

[122:40]

understand. So, where are we going to do

[122:42]

this and how are we going to do this?

[122:43]

So, I promised last week that we'd

[122:45]

introduce you to this here tool, which I

[122:47]

used briefly at the very start of class

[122:48]

to whip up that chatbot. We're going to

[122:50]

use it though not for Python this week,

[122:52]

but indeed for a different language, C.

[122:54]

And indeed, this tool, Visual Studio

[122:56]

Code, or VS Code for short, is super

[122:58]

popular in industry. This is what real

[123:00]

programmers, so to speak, are using all

[123:02]

of the time nowadays. There's absolutely

[123:04]

alternatives. If some of you have

[123:05]

programmed before, you might have used

[123:07]

or experienced different tools, but this

[123:09]

is a very common tool that you'll see

[123:10]

even after CS50. And in fact, it's

[123:13]

something that ultimately you can

[123:14]

install for free on your own Macs and

[123:16]

PCs so that by the end of the course,

[123:18]

you're completely independent of CS50

[123:21]

and any CS50 related tools. But what we

[123:24]

have done for the very start of the

[123:25]

class is essentially provided you with a

[123:27]

cloud-based version of this tool. So all

[123:30]

you need is a web browser on any Mac or

[123:32]

PC or the like so that everything's

[123:34]

pre-installed for you, preconfigured for

[123:36]

you, and you don't have to deal with the

[123:37]

stupid technical support headaches at

[123:38]

the start of the term because it should

[123:40]

just work. But by the end of the term,

[123:42]

once you're a little more comfortable

[123:43]

with technology and with code in

[123:45]

particular, you can absolutely offboard

[123:47]

yourself from this tool. Install it,

[123:49]

download it on your own Mac and PC and

[123:51]

have pretty much the exact same

[123:52]

environment completely under your

[123:55]

control. So, starting today, you're

[123:56]

going to see an interface that looks

[123:58]

quite like this quite often. And we used

[124:00]

this same interface last week ever so

[124:01]

briefly. Moving forward, here's where

[124:03]

we're going to write code. At top right

[124:04]

is where one or more code tabs are going

[124:07]

to appear, similar to any tabbed uh

[124:09]

environment that you might use. Here,

[124:11]

for instance, is just a screenshot of

[124:12]

the first file we'll create today called

[124:14]

hello.c. The reason it's called hello.c

[124:17]

is because it's in a language called C,

[124:19]

as we soon shall see. No pun intended.

[124:22]

Meanwhile, the code here happens to be

[124:24]

color-coded, not quite in the same way as

[124:27]

you saw before cuz I manually made it

[124:28]

look more like Scratch blocks. But among

[124:30]

the features that VS Code and other

[124:33]

programming environments provide is

[124:34]

something called syntax highlighting

[124:36]

whereby you don't worry about or even

[124:38]

think about these colors. But as you

[124:40]

write out code in a recognized language,

[124:43]

tools like VS Code will just color code

[124:45]

different parts of your code for you

[124:47]

just to make different features jump

[124:49]

out. And we'll see what those features

[124:50]

are over the course of today. But you'll

[124:52]

also spend a good amount of time, as I

[124:54]

briefly did last week, down here in the

[124:56]

bottom right of your screen, the

[124:57]

so-called terminal window, which is

[124:59]

going to be where you run commands for

[125:01]

compiling code and running code. And in

[125:03]

fact, as we'll see today, you're going

[125:05]

to start using your mouse and clicking a

[125:07]

little bit less. You're going to start

[125:08]

using your keyboard and typing a bit

[125:10]

more. And ultimately, even though if at

[125:12]

first that might feel like a step

[125:13]

backwards to sort of not use something

[125:15]

that's so user friendly, the reality is

[125:17]

most every programmer tends to find

[125:19]

themselves ultimately much more

[125:21]

productive, much more powerful using the

[125:23]

keyboard more often, more quickly than

[125:26]

say a traditional mouse or trackpad

[125:28]

would allow. Meanwhile, we'll see some

[125:29]

somewhat familiar features here at left,

[125:31]

like this is where you'll see the files

[125:32]

and folders that we'll create over time.

[125:34]

At far left here is going to be an

[125:36]

activity bar, which is essentially a

[125:37]

modern form of a menu via which you can

[125:39]

open and close things and access other

[125:41]

features. For my purposes, I'll

[125:42]

generally hide this part here. I'll

[125:44]

generally hide this part here so that

[125:46]

when we're together, we're focusing

[125:47]

almost entirely on code and commands,

[125:49]

but I'm just typing some quick keyboard

[125:51]

shortcuts to simplify my own user

[125:54]

interface in that way. So, with all that

[125:57]

said, just some terminology. So this

[125:59]

whole collective environment that I'm

[126:01]

describing here is generally what's

[126:02]

known as a graphical user interface.

[126:04]

Why? Well, it's an interface for users

[126:06]

that's graphical in nature with icons

[126:08]

and buttons and the like. Shorthand

[126:10]

notation for this is "gooey," GUI for

[126:12]

short. But within this graphical user

[126:14]

interface, as promised, is going to be

[126:16]

that terminal window at bottom right

[126:18]

where I promised we would be typing most

[126:20]

of our commands. And just to give you a

[126:21]

bit more jargon in computing, that's

[126:23]

generally known as a command line

[126:25]

interface or CLI for short, whereby

[126:28]

you're typing commands into that

[126:30]

interface instead. And the world of

[126:32]

computing software is essentially

[126:34]

divided into GUIs and CLIs and

[126:36]

sometimes a piece of software might have

[126:38]

one of each as well. But without further

[126:41]

ado, why don't we go ahead and focus

[126:43]

entirely first on this here program,

[126:45]

which I dare say is the simplest program

[126:47]

you can write in a language like C and

[126:50]

see how we can actually compile and run

[126:52]

it together. So, I'm going to go over to

[126:54]

VS Code here where I've hidden my file

[126:56]

explorer with all the icons and I've

[126:58]

hidden my activity bar so that I only

[127:01]

have room for tabs of code and the

[127:03]

command prompt at the bottom. I'm

[127:05]

calling this a command prompt because

[127:06]

it's at this dollar sign where I'm going

[127:08]

to run some of my commands. And it's a

[127:10]

dollar sign by convention. It has

[127:12]

nothing to do with currency. It's just a

[127:13]

computing convention. Some systems will

[127:15]

use a caret symbol. Some systems will

[127:17]

use a greater-than symbol, rather, or

[127:20]

something else. But it just means type

[127:21]

your commands here. The first such

[127:23]

command I'm going to type is this code

[127:26]

hello.c, with a single space in between.

[127:28]

I've not used any spaces in the name of

[127:31]

the file. I've not capitalized any

[127:32]

aspect of the file just because this is

[127:34]

convention. Unlike your Mac or PC where

[127:36]

you might be in the habit of naming

[127:38]

files with spaces and capitalization,

[127:40]

generally you'll make your life simpler

[127:42]

by just using lowercase and no spaces at

[127:44]

all. As soon as I hit enter, what you'll

[127:46]

see is that a brand new tab appears

[127:48]

called hello.c with a cursor blinking on

[127:51]

line one. And this is essentially VS

[127:53]

Code waiting for me now to type the

[127:55]

first line of my code. Notice though

[127:57]

that the command is complete,

[127:59]

whereby I have another cursor here,

[128:01]

and if I click in the

[128:02]

terminal window and give foreground to

[128:04]

it, my cursor might blink there instead.

[128:06]

That just means I can type another

[128:07]

command when I am ready. So let's go

[128:09]

ahead and whip up this code and I've

[128:11]

done this many times so I can type it

[128:12]

fairly quickly but in this tab I'm going

[128:14]

to do include standard io.h, so to

[128:17]

speak, int main void, then inside of

[128:20]

so-called curly braces, indenting therein

[128:23]

by four spaces, I'm going to say printf,

[128:26]

quote unquote, hello, world, backslash n,

[128:29]

close quote, semicolon. And voila, I've

[128:32]

written my first program in C. In a class

[128:34]

like this, no need to write down each and

[128:36]

every line of code that I write. In fact,

[128:37]

on the course's website will be copies

[128:39]

of everything that we've done, as well as

[128:40]

excerpts therefrom in the course's notes.

[128:43]

But you're welcome, but not expected, to

[128:45]

follow along in real time with what I am

[128:47]

typing here. So that's it. Like I've

[128:50]

written my very first program in C. If I

[128:52]

had done this on an actual Mac or PC

[128:54]

without a command line interface, I

[128:57]

might have a new icon on my desktop, so

[128:58]

to speak, called hello. And ideally, I

[129:01]

could double click on that or tap on it

[129:02]

and run the program. But because I'm in

[129:04]

this specific programming environment

[129:06]

that has a mix of a guey and a CLI, I

[129:09]

actually need to click down in my

[129:10]

terminal window. And I need to now

[129:12]

compile this program first because at

[129:14]

this point in time, it exists only as

[129:16]

source code. So to do this, I'm going to

[129:19]

compile my code by very aptly saying

[129:21]

make space hello. And I'm pronouncing

[129:24]

the space, but literally I hit the space

[129:26]

bar. Make space hello as it sort of

[129:28]

implies semantically will make a program

[129:30]

called hello. Notice I have not said

[129:32]

hello.c again because the compiler,

[129:35]

let's call it make for now, even though

[129:37]

that's a bit of a white lie, is going to

[129:38]

infer that if I want to make a program

[129:40]

called hello, it's going to

[129:42]

automatically look for a file called

[129:44]

hello.c in this case. So, a bit of

[129:47]

magic. Enter. And remarkably, anytime

[129:50]

you don't see any output at a command

[129:52]

like this, that's probably a good thing.

[129:54]

Generally speaking, when you see output

[129:56]

when compiling your code, you have done

[129:58]

something wrong. Or in this case, I

[129:59]

might have done something wrong. But no

[130:01]

output is good because what I can now do

[130:03]

and this is a bit cryptic. I can run

[130:05]

this program not by double clicking or

[130:07]

tapping anywhere but by doing dot slash

[130:10]

hello with no spaces. And this is a bit

[130:13]

weird but what the dot slash means is

[130:15]

that, having just made a program called

[130:18]

hello that program is going to end up in

[130:20]

my current folder. It's somewhere in the

[130:22]

cloud. Yes, more on that in a bit. But

[130:23]

the program called hello is just

[130:25]

somewhere in my current folder. When I

[130:27]

say dot slash, that's like saying go

[130:29]

into the current folder and run the

[130:32]

program therein called hello

[130:34]

specifically. Now, as I often do, I'll

[130:36]

cross my fingers, hope that I didn't

[130:38]

mess this up in any way, and I should

[130:40]

see in a second hello world indeed

[130:43]

printed onto the screen. And so, just to

[130:46]

recap those then commands. One, I ran

[130:48]

code hello.c, which is a VS Code

[130:50]

specific thing. Code, short for VS Code,

[130:52]

just creates a new file called hello.c.

[130:54]

And then I'm on my way with my own

[130:56]

keyboard. Make hello compiles that

[130:59]

source code into machine code thereby

[131:02]

creating a new file called hello. And to

[131:04]

run that program hello, I type this

[131:07]

strange command, ./hello. But this is

[131:10]

a paradigm that, no matter what you call your

[131:12]

programs, we're going to see again and

[131:13]

again and again. So even if you've not

[131:15]

done something quite like this, it will

[131:17]

very quickly get familiar.

[131:20]

Yes. Questions.

[131:23]

>> How when you say make hello, how like

[131:25]

how does how do you how does the

[131:27]

computer know like what part of the code

[131:29]

to what part of the code is ascribed to

[131:32]

hello?

[131:32]

>> Good question. When I say make hello,

[131:34]

how does the computer know what part of

[131:36]

the code is ascribed to this program

[131:37]

hello? It literally is going to take the

[131:40]

entire contents of hello.c and turn them

[131:43]

somehow into a program.

[131:44]

>> And does it have to be like named hello?

[131:47]

>> Does it have to be named hello? No. I

[131:48]

could have called it goodbye or anything

[131:50]

more, my first program dot c, anything at all

[131:52]

so long as I change these words here

[131:56]

accordingly.

[131:56]

>> But it has to like it needs to be like

[131:59]

from the same thing like it needs to

[132:01]

>> Yes.

[132:02]

>> have like green C and make green or

[132:04]

whatever.

[132:04]

>> Exactly. If you change the name there

[132:06]

you need to change your commands

[132:07]

accordingly. Other questions on these

[132:09]

here steps?

[132:14]

No. All right. So let's tease apart what

[132:16]

it is we just did and like why this code

[132:18]

works in the way that it does. Well, to

[132:20]

recap, in Scratch, we had a program like

[132:22]

this. When the green flag was clicked,

[132:24]

we wanted to say hello world onto the

[132:25]

screen. The code that corresponds to

[132:27]

that is roughly here. And indeed, notice

[132:29]

that the yellowish or orangish code lines

[132:31]

up with the when green flag clicked. The

[132:33]

purple code here lines up with the say

[132:35]

block. And the white code inside of here

[132:37]

roughly corresponds to what was in the

[132:38]

white oval that we kept using again and

[132:40]

again last week. So, let's do more of a

[132:42]

one-to-one correspondence. And these

[132:43]

slides are deliberately designed to give

[132:45]

you again that sort of mental model of

[132:48]

taking same ideas from last week and

[132:50]

just changing the syntax this week

[132:52]

onward. So when we have a function like

[132:54]

this thing here and recall that a

[132:56]

function is just an action or verb. It

[132:58]

sort of accomplishes a small piece of

[132:59]

work in code in C specifically you're

[133:03]

going to type of course not a purple

[133:04]

puzzle piece but you're going to say the

[133:06]

word print. Well, more technically printf,

[133:08]

where the f, as we'll soon see, means

[133:10]

format the printed output because this

[133:12]

is more powerful than just printing some

[133:14]

raw text alone. Then you can have

[133:15]

parentheses open and close left and

[133:17]

right. And notice that it's no accident

[133:19]

that MIT chose an oval for their

[133:22]

input to functions because it roughly

[133:24]

looks like the start and end of

[133:27]

parentheses on left and right.

[133:29]

Meanwhile, what goes inside of the

[133:30]

parenthesis in the corresponding C code?

[133:32]

Well, at the end of the day, minimally

[133:34]

hello, world because that's literally

[133:36]

what we want to print to the screen. But

[133:38]

in C, unlike in Scratch, there's a bit

[133:40]

of overhead, a bit of additional syntax

[133:42]

that you just got to deal with to make

[133:44]

clear to the computer what you want to

[133:46]

print. In particular, you're going to

[133:48]

have to surround everything you want to

[133:50]

print with double quotes to make clear

[133:52]

that hello is not some special function

[133:55]

or variable or something else. It's

[133:56]

hello world is the English phrase that

[133:59]

you want to print. So double quote here,

[134:01]

double quote there means here's the

[134:03]

beginning and the end of what I want to

[134:04]

print. You're also curiously going to

[134:06]

put a backslash n

[134:09]

in most cases at the end of the word or

[134:12]

words you want to print. We'll take that

[134:14]

away in a moment and see what it does.

[134:15]

And then lastly, and perhaps most

[134:17]

annoyingly in programming circles, you

[134:19]

have to finish your thought with a

[134:20]

semicolon. Much like in English, you

[134:22]

would finish most sentences with a

[134:23]

period instead. And the

[134:26]

thing about programming is with C in

[134:28]

particular, if you mess up almost any of

[134:30]

these details I just rattled off,

[134:32]

something's going to go wrong. And so

[134:34]

you're in good company. The very first

[134:35]

program you try to write or try to

[134:37]

compile, odds are it might not work

[134:39]

correctly because you'll develop over

[134:41]

time the muscle memory for spotting all

[134:42]

of these seemingly minor and actually

[134:45]

minor details, but that do matter to the

[134:48]

computer. All right. So if you're

[134:51]

familiar of course with the notation in

[134:52]

like mathematics of functions like a

[134:54]

function in code is really the same idea

[134:56]

as a function in math whereby the

[134:58]

function f takes some input for instance

[135:00]

x and generally produces some output. So

[135:03]

if you're coming more from that

[135:04]

background realize that what we're

[135:05]

really doing here is roughly the same

[135:07]

but in code recall that we can have

[135:09]

different types of output. So if this is

[135:11]

our grand mental model and say we've got

[135:14]

a function inside of this black box

[135:16]

that takes arguments, that is to say as

[135:18]

its inputs, it can sometimes have side

[135:20]

effects. And recall that side effects

[135:21]

are often visual things that happen as a

[135:24]

result. They display on the screen.

[135:26]

Maybe it comes out of the speaker. It's

[135:27]

something generally ephemeral that just

[135:29]

happens. But it's not necessarily useful

[135:31]

in the same way as another type of

[135:33]

function that we'll return to in just a

[135:34]

bit. But last week, recall that we got

[135:36]

the cat with a speech bubble to uh

[135:38]

manifest on the screen and say hello

[135:40]

world in that speech bubble when the

[135:42]

input was hello world and the

[135:43]

corresponding function was instead say.

[135:46]

So let's see if we can't now tease apart

[135:50]

what the code we wrote is actually doing

[135:53]

for us bit by bit. So let me go back to

[135:55]

VS Code here and let me propose to break

[135:57]

this in a little way. Let me delete the

[135:59]

backslash n if only because at first

[136:01]

glance who knows or cares what that's

[136:03]

doing. Let's just get rid of it if we

[136:04]

don't understand it. I could now go back

[136:07]

down to my terminal window and I could

[136:09]

do ./hello, enter, again. But there's

[136:12]

seemingly no change, which is good.

[136:15]

Doesn't seem like I broke it, but I've

[136:17]

kind of misled you here. Why?

[136:21]

Why did nothing seem to change?

[136:24]

I didn't recompile it. So, recall that

[136:26]

the compiler converts source code to

[136:29]

machine code, but I already did that a

[136:30]

couple of minutes ago. If I've changed

[136:31]

the source code, it stands to reason

[136:33]

that I need to recompile the code to

[136:36]

actually see the effects of that. So,

[136:38]

let me do that again. Make hello enter.

[136:41]

Nothing seems to have gone wrong, but

[136:43]

let me now ./hello, enter. And it's

[136:47]

subtle now. And in fact, let me go ahead

[136:48]

and zoom in. It's really just an

[136:50]

aesthetic bug in so far as functionally

[136:53]

the program is still technically

[136:54]

printing hello world. But what's

[136:57]

seemingly wrong? Or put another way,

[136:59]

what did the backslash n apparently do?

[137:01]

Yeah.

[137:03]

>> Yeah. So, it's somehow giving me a new

[137:05]

line. And that's essentially what the

[137:06]

backslash n denotes: give me a new

[137:08]

line there. And why was I doing that?

[137:10]

Well, really just for the aesthetics.

[137:11]

Like if this dollar sign represents my

[137:13]

prompt where I type commands. If

[137:15]

anything, it just looks kind of stupid

[137:16]

that I finished a program over here and

[137:18]

then the prompt is on the same line. It

[137:20]

just looks wrong. Even though you could

[137:22]

sort of argue that was my intent, even

[137:23]

though in this case it wasn't. So, what

[137:25]

would the alternative be? Well, what

[137:27]

you're seeing here is what's actually

[137:29]

generally known as an escape sequence,

[137:31]

which are sort of uh special sequences

[137:35]

of symbols like backslash and n in this

[137:38]

case that do a little something unusual.

[137:40]

And here's just a non-exhaustive list of

[137:42]

some you'll encounter in the real world

[137:44]

and including in CS50. Backslash n moves

[137:47]

you to a new line. Backslash r is a

[137:49]

so-called carriage return. If you've

[137:50]

ever seen or used an old school

[137:52]

typewriter, this refers to the process

[137:53]

of bringing the typing head back to the

[137:56]

left end. So it sort of moves the cursor

[137:58]

horizontally as opposed to vertically.

[138:00]

This one's interesting. Backslash

[138:02]

double quote.

[138:04]

Why does there exist this pattern?

[138:08]

Backslash double quote. Yeah.

[138:10]

>> If you just write double quote, it

[138:12]

closes the

[138:14]

>> exactly. So recall that phrase we tried

[138:16]

to type uh print out like hello, world.

[138:19]

If for some reason you didn't want to

[138:20]

say hello world, but you wanted to say

[138:22]

something sort of snarkily, like hello,

[138:24]

"world," or something like that, well, you

[138:27]

can't put a quote a quote a quote and a

[138:29]

quote and expect the computer to know

[138:31]

which quote corresponds to what. It's

[138:33]

just arguably ambiguous. So if inside of

[138:36]

double quotes, you actually want to

[138:38]

print actual double quotes, this is an

[138:41]

escape sequence that tells the computer,

[138:43]

this is not some quote delineating

[138:46]

where my thought begins and ends. This

[138:48]

is literally a double quote. And we'll

[138:50]

see other situations in which a single

[138:52]

quote or apostrophe is the same. We'll

[138:53]

see crazy situations in which you want

[138:55]

to print a backslash, but backslash

[138:57]

already has some special meaning. So

[138:58]

there's solutions to all of these

[138:59]

problems. But let's not get too far into

[139:01]

the weeds here. But let me go back to

[139:03]

the code and propose what the

[139:04]

alternative otherwise might have been.

[139:06]

If I didn't know about backslash n, my

[139:09]

instinct to move the cursor to the next

[139:11]

line might have been literally to just

[139:12]

like hit enter or do something like

[139:15]

this, like move the double quote, move

[139:17]

the parenthesis, move the semicolon on

[139:19]

to the next line. But this should start

[139:20]

to rub you the wrong way. And indeed,

[139:22]

this violates a principle of most

[139:24]

programming languages in that most

[139:26]

programming languages are line-based. You

[139:29]

sort of start and finish your thought

[139:30]

ideally on the same line. And this runs

[139:33]

afoul of that. And two, even if you're

[139:35]

seeing code for the first time, assume

[139:37]

that this just looks stupid as well to

[139:40]

sort of move part of your thought to the

[139:41]

next line, it just looks a little

[139:43]

sloppy. And it is. So C and many other

[139:45]

languages, Python among them, solve this

[139:48]

by giving you these so-called escape

[139:49]

sequences. So if you want a new line

[139:51]

there, you do backslash n and you will

[139:53]

get your new line there. Now, that's a

[139:55]

bit of an overstatement what I said in

[139:57]

that sometimes lines of code will be so

[139:59]

long that they do wrap onto multiple

[140:01]

lines, but generally that's a convention

[140:03]

that we're going to try to avoid. All

[140:06]

right, what else could go wrong? Well,

[140:07]

let's do this. Let me go ahead and clear

[140:09]

my terminal window, which I can do by

[140:10]

hitting Ctrl-L, or I can literally type

[140:13]

clear. And I'm going to frequently do

[140:15]

this just to keep the screen clear, even

[140:17]

though it has no functional impact. It's

[140:18]

just an aesthetic. Let me do something

[140:20]

else accidentally. Suppose I forgot to

[140:23]

finish my thought and I omitted the

[140:25]

semicolon, but otherwise the code is

[140:26]

perfect. Let me do make hello. Now

[140:29]

enter. Now we're going to see some

[140:32]

output that's a little more arcane. Let

[140:34]

me go ahead and scroll back up here to

[140:36]

make clear that what's just happened is

[140:37]

I ran make hello, but I didn't get back

[140:40]

to another prompt. I don't see

[140:41]

immediately a dollar sign because

[140:43]

there's an error message here that is

[140:44]

almost as long as the code I tried to

[140:46]

write. Not to worry. Let's see. Here is

[140:49]

the name of the file in which the

[140:50]

problem exists. Stands to reason that

[140:52]

it's in hello.c. Here is the line

[140:55]

number in which the problem seems to

[140:57]

exist. Line five. And that's helpful

[140:59]

because it lines up with this. And then

[141:01]

if you care to count, this is the

[141:03]

29th character. So if I count from left

[141:05]

to right around character 29, something

[141:08]

is wrong. Something is missing. So it's

[141:10]

a pretty decent error message. In fact,

[141:12]

it even says expected semicolon after

[141:14]

expression. There's a little green

[141:16]

caret symbol pointing me at the

[141:17]

mistake. So this is, again,

[141:20]

another value of the compiler. Not only

[141:22]

does it know how to convert source

[141:24]

code to machine code, it's also pretty

[141:26]

good at finding mistakes in your code

[141:28]

and trying to draw your attention to

[141:30]

them. So how do I fix this? Well,

[141:32]

assuming you've understood the error

[141:34]

message at this point. Well, you just go

[141:35]

back in, add the semicolon. Let me go

[141:38]

back down to my terminal window. I'm

[141:39]

going to clear it just to clean up the

[141:41]

mess. Let me rerun make hello. And now

[141:44]

we are back in business. And indeed, if

[141:46]

I do ./hello, I've got hello world back

[141:49]

on the screen. Well, let's make one

[141:52]

other mistake. Suppose that I forgot, as

[141:54]

you sometimes will, to include this line

[141:57]

at the top, which will make more sense

[141:58]

next week, but for now, let's just omit

[141:59]

it and dive right into the code. You

[142:01]

would think this is enough, just

[142:03]

printing out hello world. Well, here,

[142:05]

let me go back down to my terminal

[142:06]

window. Let me do make hello again now.

[142:09]

And I'm going to get a whole different

[142:10]

error message instead. So now problem is

[142:14]

still with hello.c. That makes sense.

[142:16]

Line three. Okay. So somewhere in there

[142:19]

printf is suddenly the problem even

[142:21]

though the semicolon is back and the

[142:23]

backslash n is back. So let's keep

[142:24]

reading. Error: call to undeclared

[142:27]

library function printf with type int.

[142:30]

And then this is a whole mouthful. So,

[142:32]

here is an example of an error message

[142:34]

that unless you're sort of conditioned

[142:36]

to know what this means and you've seen

[142:37]

it before, it's quite more cryptic and

[142:40]

unclear like what the solution to the

[142:42]

problem is, especially when the rest of

[142:43]

your code is truly correct. I've just

[142:46]

forgotten something stupid. But how can

[142:48]

I sort of think about this problem?

[142:50]

Well, it turns out that another feature

[142:53]

of C is that it comes with a bunch of

[142:55]

header files. A bunch of files whose

[142:58]

names don't end in .c, but end in .h. And

[143:02]

these so-called header files which end

[143:04]

in .h contain code that other people

[143:08]

wrote that you can use in your own

[143:10]

programs. So for instance in this

[143:13]

particular case a header file is giving

[143:15]

us access to what's more generally in

[143:17]

computing called a library. A library is

[143:20]

code someone else wrote that you can

[143:21]

use. And I actually used a library last

[143:23]

week when I did that import line and

[143:25]

mentioned OpenAI, the company. I was

[143:28]

actually using a library from that

[143:30]

company that I had automatically

[143:32]

downloaded and installed into my

[143:34]

programming environment in advance of

[143:35]

class because I don't know how to

[143:37]

implement a chatbot without standing on

[143:38]

their shoulders and using a lot of the

[143:40]

code they themselves wrote. Same idea

[143:42]

here. Even though printf is a feature

[143:45]

of C, if you want to use it, you have to

[143:48]

include that library by telling your

[143:51]

program to include the header file that

[143:55]

defines that function. And you only know

[143:57]

this by being taught it or looking it up

[143:58]

in a book or a reference. But in this

[144:00]

case, I wanted to use a header file

[144:03]

called standard io.h, stdio.h.

[144:07]

Um, it is not studio.h.

[144:10]

This is a very common bug online. Um, if

[144:13]

you find yourself typing studio.h, typo,

[144:16]

it's standard io.h.

[144:18]

And in that file then is defined the

[144:22]

printf function. So, if I go back to my

[144:24]

code here, the solution to this problem

[144:26]

truly is to just undo the deletion I

[144:28]

made a moment ago. Because what line one

[144:30]

is now doing for me is it's telling the

[144:31]

compiler, oh, by the way, I didn't write

[144:33]

all the code that I'm about to use.

[144:36]

Please include the definition of printf

[144:39]

from this other file called standard

[144:41]

io.h. And again, you'd only know this by

[144:43]

looking it up in a reference, attending

[144:44]

a lecture or something like that. It's

[144:46]

not obvious otherwise, but these are the

[144:48]

kinds of things you very quickly look

[144:50]

up. So, where do you look them up? Well,

[144:52]

it turns out the ecosystem of C has, you

[144:55]

know, hundreds of books you can buy or

[144:57]

download, many, many, many websites.

[144:59]

Among them is one of CS50's own. And in

[145:01]

fact, the conventional way to look stuff

[145:04]

up for the programming language called C

[145:05]

is to look at the official manual pages

[145:08]

or man pages for short for the C

[145:11]

language. Unfortunately, many of them

[145:13]

were written decades ago and they were

[145:14]

certainly written by fairly advanced

[145:16]

programmers and not for a broad

[145:17]

audience. And so what we have done is

[145:19]

imported all of that freely available

[145:22]

documentation uh hosted it at our own

[145:24]

URL here manual.cs50.io

[145:27]

and we've essentially simplified it for

[145:28]

those less comfortable those of you who

[145:30]

might be less familiar with less

[145:32]

comfortable with technology and really

[145:33]

for most people who aren't used to

[145:35]

reading manual pages. It's just useful

[145:37]

to have it written in teaching assistant

[145:39]

like language instead. So for instance

[145:41]

if you go to a URL like this you'll see

[145:43]

CS50's documentation for this official

[145:46]

library, standard io.h, that comes with

[145:49]

C itself. If you get a URL like this,

[145:51]

you can look up the documentation for

[145:52]

printf itself specifically. So for

[145:55]

instance, let me go ahead and just give

[145:56]

you a teaser for this. If I were to do

[145:58]

the same on my own computer, I might see

[146:01]

the CS50 manual pages here and you'll

[146:03]

see header file by header file a bunch

[146:06]

of frequently used functions in CS50.

[146:08]

We've also filtered the list down from a

[146:10]

massive list to a much shorter list so

[146:12]

that you can sort of see what's most

[146:14]

likely useful to you. If you go to a

[146:16]

specific page like standard io.h, you'll

[146:18]

see for instance here just over a

[146:20]

half dozen functions that we won't touch

[146:22]

on today beyond printf, but that

[146:24]

we'll see in the class over time that

[146:25]

does useful stuff. For instance, printf

[146:28]

prints to the screen. And we'll see

[146:29]

other functions for opening files,

[146:31]

closing files, and the like because all

[146:33]

of that's related to standard IO input

[146:35]

and output. If I go to a specific man

[146:38]

page for uh this uh header file, you'll

[146:42]

see the standard formatting for these

[146:44]

pages. So, here's the name of the

[146:45]

function, printf, and it prints to the

[146:46]

screen. You'll see a synopsis, and this

[146:48]

indeed indicates we're in less

[146:50]

comfortable mode. If you want to see the

[146:51]

original, more arcane documentation,

[146:53]

just uncheck that, and you'll see the

[146:55]

original official documentation, but

[146:57]

you'll see a mention of like what header

[146:58]

file this function is defined in so that

[147:00]

you know what file to use in your own

[147:02]

code. You'll see a so-called prototype,

[147:04]

which is just the first line of code

[147:07]

from that function. More on that in just

[147:09]

a little bit. You'll see an English

[147:10]

description. You'll see example code.

[147:11]

Long story short, this is the

[147:13]

authoritative answer. And even though

[147:15]

you have access in this class to the

[147:17]

virtual rubber duck at CS50.AI and in

[147:20]

other forms of it that you'll soon see,

[147:22]

you should also have the tendency and

[147:24]

the instinct moving forward to check

[147:26]

the official documentation. And all of

[147:28]

today's AIs are trained on things like

[147:31]

the official documentation. So that's

[147:33]

the source material that any of these

[147:35]

AIs, the duck among them,

[147:37]

are actually relying on. But what we're

[147:40]

also going to see is that besides these

[147:42]

official functions, there's some that

[147:43]

CS50 itself has invented. We use these

[147:46]

really as training wheels for just the

[147:47]

first few weeks of the course and then

[147:49]

we take these training wheels off. But

[147:50]

the reality is in a language like C,

[147:53]

certain stuff is just really hard or

[147:55]

annoying to do. Certainly if you're

[147:56]

learning how to program for the very

[147:58]

first time or at least you are new to C.

[148:00]

We'll eventually show you how to do it

[148:02]

that way. But even if you just want to

[148:03]

get input from the user like a string of

[148:06]

text or a number of some sort, it's

[148:08]

generally not that easy to do in C, at

[148:10]

least in these early days. So for

[148:12]

instance, at this URL here, you can see

[148:13]

documentation for CS50's own library and

[148:16]

CS50's own header file, cs50.h. And

[148:19]

you'll see such functions in the

[148:20]

documentation as these: get_string, get_int,

[148:23]

get_char, and a bunch of others as

[148:25]

well. And we'll touch on those this

[148:27]

week. But it will ultimately be a way of

[148:30]

just getting useful work done quickly by

[148:33]

standing on our shoulders and actually

[148:36]

uh using functions we wrote to then

[148:38]

solve problems of interest to you. So

[148:42]

let's focus for instance on one of these

[148:43]

first: get_string. A string in

[148:46]

programming speak means text. Zero or

[148:49]

more characters of text like h e l l o

[148:51]

comma space w o r l d. That is a string

[148:55]

of text in computer speak. And it's

[148:57]

obviously not a number like 50. It's

[148:59]

actual text that you would type on the

[149:00]

keyboard. We'll see then what other

[149:02]

things we want to get. But with this

[149:04]

function, we can start to replicate

[149:06]

another program that we implemented

[149:08]

pretty quickly last week in Scratch. So

[149:10]

recall that in Scratch, this one was a

[149:12]

little more interactive. I used another

[149:13]

blue puzzle piece ask to actually get

[149:16]

input from the user. And recall that

[149:18]

unlike the printf function today

[149:21]

and the say block last week, this time

[149:24]

we still have the same input output

[149:26]

model, but if we pass in arguments to a

[149:29]

function uh that we're about to see, you

[149:32]

can get back not just a side effect

[149:34]

sometimes, but a return value like a

[149:36]

useful reusable value like the person's

[149:39]

name as we'll soon see. All right, so

[149:41]

let's actually do this. If in Scratch

[149:43]

the equivalent was asking the user,

[149:45]

what's your name? asking them that and

[149:47]

then waiting for an answer that we can

[149:49]

store in a variable. Let me propose that

[149:51]

in C side by side it's going to look a

[149:53]

little something like this instead. At

[149:55]

left we have the Scratch block, the ask

[149:57]

function. Here is the argument there too,

[149:59]

and then "and wait" just means it's

[150:01]

going to wait till the user finishes

[150:02]

typing. If I want to translate this to C

[150:05]

now today moving forward well it looks a

[150:07]

little something like this. The closest

[150:09]

analog in C thanks to CS50's library is

[150:13]

going to be a function called

[150:14]

get_string. So there's no C function called

[150:16]

ask. And we deliberately named this

[150:18]

function get_string just to make super

[150:20]

clear what it is you are getting. A

[150:22]

string of text in this case. And we've

[150:24]

got the parentheses ready to go

[150:25]

indicative of this white oval for user

[150:27]

input. If I want to prompt the user with

[150:30]

that same phrase, what's your name?

[150:32]

Well, I can just put it inside of those

[150:34]

parentheses. But what next do I need to

[150:36]

add around my user input? Um, you did

[150:40]

the quotation marks.

[150:41]

>> Yeah, I need the quotation marks just to

[150:43]

make clear that these aren't special

[150:45]

individual words. This is a whole phrase

[150:48]

that I want to be displayed to the user.

[150:50]

So, I'm going to indeed put double

[150:52]

quotes around everything. And this is

[150:53]

just an aesthetic. I don't in this case

[150:56]

want to bother moving the cursor to the

[150:57]

next line. Like, I want the user to see

[150:59]

the question and I want the cursor to

[151:01]

just stay there blinking waiting for

[151:03]

their prompt. But I don't want the

[151:04]

cursor to be right next to the question

[151:06]

mark. So, I'm deliberately just leaving

[151:08]

a single white space there just to kind

[151:10]

of scooch it over a bit so it looks a

[151:11]

little prettier, at least to my eye.

[151:13]

Now, we're not done yet because we need

[151:15]

to do something with this value. The

[151:18]

get_string function, as we'll soon see, is

[151:19]

going to prompt the user for me to type

[151:21]

something in like my name. But where do

[151:23]

I want to put that? Well, MIT has the

[151:25]

answer put in a variable called answer.

[151:28]

And you can't rename that in Scratch.

[151:30]

It's just defined as answer. But in C,

[151:34]

what I'm going to need to do is

[151:35]

something like this. If you want to keep

[151:37]

return values around from a function,

[151:40]

you literally use an equal sign and then

[151:43]

to the left of it, you put the name of

[151:46]

the variable into which you want to put

[151:48]

that return value. So in mathematics, we

[151:51]

would use X, Y, and Z as our variables.

[151:53]

Again, in code, as in Scratch, you can

[151:55]

name your variables anything you want.

[151:57]

By convention, they should usually be

[151:58]

lowercase. They should not have spaces

[152:00]

therein, similar to file names. But this

[152:02]

is a pretty good analog now of what's

[152:04]

going on collectively here. But C is a

[152:07]

little more precise. You can't just

[152:10]

give the variable a name. You need to

[152:12]

tell C or really the compiler what type

[152:16]

of value you want to put in this

[152:19]

variable. So if it's a string of text,

[152:20]

you put string. If it's a number, you're

[152:22]

going to put something else. But for

[152:23]

now, it's a string. Per the function's

[152:26]

name, it's going to give me a string.

[152:28]

Now, we're so close to finishing this

[152:30]

comparison. There's one detail missing.

[152:33]

What's still missing from the code here?

[152:36]

Yeah.

[152:37]

>> Yeah. So, we have to finish the thought

[152:39]

lastly with a semicolon. So, if you're

[152:41]

getting to sort of the point already,

[152:42]

like this is one of the reasons why we

[152:44]

start with Scratch, you sort of get

[152:45]

the intuition pretty quickly. And even

[152:47]

though nothing on the right hand side is

[152:48]

particularly hard, there's just all

[152:49]

these stupid little details that you

[152:51]

have to ingrain in yourself over time.

[152:53]

In this case for C, but for many

[152:54]

programming languages, we're going to

[152:56]

see the similar paradigm. But among the

[152:58]

goals of the course too are to show you

[152:59]

how ultimately languages have been

[153:01]

evolving. And so one of the things we'll

[153:02]

see in Python in a few weeks' time is that

[153:04]

some of this syntax actually goes away

[153:07]

because over time humans have gotten

[153:09]

annoyed at older languages like this.

[153:10]

Like why the heck do I have to keep

[153:12]

putting a semicolon when it's clear that

[153:13]

I'm at the end of the line. So we'll see

[153:15]

among languages like Python we can get

[153:17]

rid of some of these same features. But

[153:19]

for now it's just a matter of

[153:21]

remembering what goes where. All right.

[153:23]

So, let's go ahead now and take that

[153:25]

same idea of converting Scratch to C and

[153:28]

actually do something with this code.

[153:30]

Let me go back to VS Code here. I'm

[153:32]

going to keep my file name the same, but

[153:33]

what you'll see on CS50's website is

[153:35]

that we'll add version numbers to each

[153:37]

of the examples that I'm typing out. So,

[153:39]

you can actually see the progression of

[153:40]

these programs, even though we're not

[153:42]

changing the name. And what I'm going to

[153:43]

go ahead and do here, for instance, in

[153:46]

hello.c this time, is the following. I'm

[153:50]

going to go ahead and uh first get rid

[153:53]

of the single hello world. I'm going to

[153:55]

go up here and include this time cs50.h.

[153:58]

So, not one but two header files. And

[154:00]

then inside of my curly braces, inside

[154:02]

the so-called main function, as we'll

[154:05]

soon call it, I'm going to go ahead and

[154:06]

do this. Exactly the same line of code

[154:09]

as on the screen before, I'm going to

[154:11]

get a string prompting the user for

[154:13]

what's your name question mark space

[154:15]

close quote semicolon. And as an aside,

[154:19]

this, we'll soon see, will print on the

[154:21]

screen what's your name. So that implies

[154:24]

that the get_string function is actually

[154:26]

using printf itself to print out that

[154:30]

message. I do not need to use printf to

[154:32]

display that message on the screen

[154:33]

because I read the documentation for

[154:35]

CS50's get_string function and I just

[154:36]

know that it is using printf for me to

[154:39]

achieve that particular goal. Now let me

[154:41]

do something intuitive but not quite

[154:43]

correct. If I want to print out that

[154:45]

answer so that the expression is going

[154:47]

to be not hello world but hello David or

[154:49]

hello Kelly. Let me go ahead and say

[154:51]

hello,

[154:53]

answer, backslash n, to move the cursor

[154:56]

down as before, semicolon. So this is

[154:58]

not quite right. And even if you've

[155:00]

never programmed before, you can perhaps

[155:01]

see where this is erroneously going. Let

[155:04]

me remake the program because I've

[155:06]

changed the source code and I need new

[155:07]

machine code. Nothing seems to be wrong

[155:10]

aesthetically, uh, logically, or rather

[155:12]

syntactically. But if I now do

[155:14]

./hello and hit enter, you'll see I'm

[155:17]

being prompted: What's your name? So I'm

[155:18]

going to go ahead and type in David and

[155:20]

then hit enter. But when I do, if you

[155:23]

know where this is going, what am I

[155:24]

going to see instead?

[155:26]

>> Hello answer. And the computer's just

[155:29]

doing literally what I told it to do. I

[155:30]

said quote unquote print out hello

[155:32]

answer. But obviously that's not the

[155:34]

goal that I have in mind. So how do I

[155:36]

actually work around that? Well, what I

[155:38]

really need to do is achieve the

[155:40]

equivalent of this thing here, which we

[155:42]

did by stacking blocks in Scratch or

[155:44]

nesting them, if you will, one inside of

[155:46]

the other. So, I want to join the

[155:48]

expression hello, space, and that

[155:51]

answer. And it turns out in C, you can't

[155:52]

do it quite like this. Like, there isn't

[155:54]

an analog of the join function, at least

[155:56]

that we'll see today. So, we have to do

[155:59]

this a little bit differently. We can do

[156:01]

it though by maybe telling the computer,

[156:03]

we'll go ahead and print out hello,

[156:05]

comma, space, and then maybe we can give

[156:07]

it like a placeholder to plug in the

[156:10]

name once we know the name. Because when

[156:12]

I'm writing my code, I have no idea

[156:13]

who's going to play this game, me or

[156:15]

Kelly or someone else. So, what if we

[156:17]

use special syntax to indicate where I

[156:20]

want the person's name actually to go?

[156:22]

Let me propose that we now do this.

[156:25]

Instead of printing out, quote

[156:28]

unquote, hello comma answer, quote

[156:30]

unquote, let's go ahead and start

[156:32]

printing out something and I got my

[156:34]

parentheses ready to go, and I did my

[156:35]

semicolon in advance. This time I want to

[156:37]

somehow now say hello, placeholder. And

[156:40]

you would only know this by someone

[156:42]

having told you or a reference online:

[156:44]

%s, percent s, is the placeholder for a

[156:46]

string that you don't know when you're

[156:48]

writing the code but when someone else

[156:50]

is running the code it will be filled in

[156:53]

and substituted for other input. So,

[156:55]

hello, %s is the closest we can

[156:57]

get to this. I still need though some

[156:59]

other syntax. I still do need those

[157:02]

quotes on the left and the right. Just to

[157:05]

be, uh, aesthetically pleasing, I'm going

[157:07]

to put a backslash n there at the end to

[157:09]

move the cursor, but now I've left room

[157:11]

in my parentheses for one more thing.

[157:13]

And you can perhaps guess where I'm

[157:14]

going with this. Again, even if you've

[157:15]

never programmed before, this is telling

[157:17]

printf: print out h e l l o comma space

[157:21]

something. What should I probably pass

[157:24]

in to these parentheses as a second

[157:27]

input so that print f knows what that

[157:29]

something is?

[157:31]

Yeah,

[157:31]

>> the variable.

[157:32]

>> The variable name. So the variable in

[157:35]

which I have the user's name and indeed

[157:38]

the convention is to put a comma after

[157:40]

the quotes and then the name of the

[157:42]

variable that has the value you want to

[157:44]

be substituted for that placeholder. Now

[157:47]

notice there's a collision of syntax and

[157:49]

grammar here. The comma inside of the

[157:51]

quotes is just an English thing. Hello,

[157:53]

comma, so and so. The comma outside of

[157:56]

the quotes is meaningful to C because it

[157:59]

delineates which is the first input or

[158:01]

argument to the left and which now is the

[158:04]

second. And we haven't seen this before

[158:05]

in C. Up until now, we've only been

[158:07]

passing one input, but you can pass in

[158:08]

two or three or four. Completely depends

[158:10]

on what the function is designed to

[158:13]

expect. So, let me put this all together

[158:16]

now. Let me go back to VS Code.

[158:18]

Previously, we were literally printing

[158:20]

out answer, but I can change answer to

[158:22]

percent s. I can move my cursor outside

[158:25]

of those quotes, comma, answer, because

[158:28]

that's the name I gave to that variable.

[158:30]

I can go back down to my terminal window

[158:31]

and clear it just to reduce clutter. Let

[158:34]

me do make hello one more time. Seems to

[158:36]

work. ./hello. Enter. D-A-V-I-D. And now

[158:41]

hello,

[158:43]

David is printed.

[158:46]

Okay, questions on any and all of that.

[158:49]

>> I was wondering with the header file,

[158:52]

where is it pulling from?

[158:53]

>> Good question. Where is it pulling these

[158:54]

header files from? So, what you are

[158:57]

seeing here is a graphical user

[158:59]

interface that's somewhere hosted in the

[159:00]

cloud at cs50.dev, the URL I mentioned

[159:03]

last week, and we're going to tease this

[159:04]

apart in just a moment. That software is

[159:08]

running on a computer, and that

[159:09]

computer's got a hard drive or a solid

[159:10]

state drive, like folders of storage.

[159:12]

Those files, cs50.h and stdio.h

[159:15]

and many more, are pre-installed on

[159:17]

the server to which I have connected and

[159:19]

they're stored in a standard place so

[159:21]

that the compiler in particular knows

[159:22]

where to look for them and those are all

[159:24]

things we did in advance for you. Yeah.

[159:27]

>> Why does backslash n not create a new, like,

[159:30]

a new line?

[159:31]

>> Why does the backslash n not create a

[159:33]

new line? So it is. Backslash n is

[159:36]

essentially being printed here which has

[159:38]

the effect of pushing the dollar sign to

[159:40]

the next line. Otherwise, the dollar

[159:42]

sign would stay on that second to last

[159:44]

line. Other questions?

[159:46]

>> Why is there no backslash n on this?

[159:49]

>> Good. Uh, why is there no backslash n

[159:51]

over here?

[159:52]

>> Good question. My choice as the

[159:54]

programmer. I just wanted to see the

[159:55]

sentence, what's your name? And I wanted

[159:57]

the user, me, to type my name immediately

[160:00]

after it like this. But I didn't have to

[160:03]

do it that way. I just wanted to show

[160:04]

you the difference.

[160:05]

>> Gotcha. And then also like just

[160:07]

generally when we're like doing the work

[160:09]

should we always write the like first

[160:11]

four lines.

[160:13]

>> Should you always write the first four?

[160:14]

Oh these. Yes. For today trust me do

[160:18]

this, do this, do this, do this. And

[160:20]

next week we'll understand even more

[160:21]

what those lines do. However, slight

[160:23]

caveat only use cs50.h if you're using

[160:26]

one of our functions. Clearly you don't

[160:28]

need cs50.h if you're just printing

[160:30]

something out as in the first example.

[160:32]

Other questions?

[160:35]

is dividing the first input and the

[160:37]

second input. I understand that the

[160:39]

second input is what I type as the user.

[160:42]

The first input doesn't really feel like

[160:45]

input for me because that's the question

[160:46]

that you asked. Can you like explain a

[160:49]

little bit why both say input?

[160:51]

>> Correct. So to summarize the question

[160:54]

on the right here, this input is

[160:55]

effectively provided by the user. This

[160:57]

first input though is provided by me.

[160:59]

That's the way it is. So uh these are

[161:02]

both inputs because they're being

[161:03]

provided as inputs to the function. The

[161:06]

origins of those inputs though are

[161:08]

entirely up to what I'm trying to

[161:10]

achieve. The first one I know in advance

[161:12]

like I'm the programmer. I know I wanted

[161:13]

to say hello, someone. The second input

[161:17]

I don't know in advance. So I'm using

[161:19]

a variable to store the

[161:21]

value that I'm going to get when the

[161:23]

get_string function is used later on. But

[161:25]

they're both inputs even though they're

[161:27]

used in different ways. Good question.

[161:29]

Any others?

[161:31]

No. Okay. So, if we now have that done,

[161:36]

well, let's just take a step back into

[161:38]

the first question that was just asked

[161:39]

about um where are these files? Let's

[161:41]

take a look back at actually what it is

[161:43]

we're actually using here. So, it turns

[161:45]

out even though most of you are using

[161:46]

macOS or Windows, there's other

[161:48]

operating systems out there in the

[161:50]

world. Phones have iOS. Uh iPads have

[161:53]

iPadOS. Uh Android devices have

[161:56]

Android, which is its own operating

[161:58]

system. The operating systems in the

[162:00]

world are the pieces of software that

[162:02]

really just do the most fundamental

[162:04]

operations on a device like booting it

[162:06]

up, shutting it down, sending something

[162:07]

to a printer, displaying something on

[162:09]

the screen, managing windows and icons

[162:11]

and all of that sort of commodity stuff

[162:13]

that is used by other people's software

[162:15]

as well. A very popular operating system

[162:17]

in the programming world and in the

[162:19]

world of servers in the cloud and on the

[162:21]

internet at large is called Linux. And

[162:23]

it's a descendant of something called

[162:25]

Unix um which has been around for quite

[162:27]

some time and it's what many programmers

[162:30]

most programmers um use depending on

[162:32]

their environments in so far as Linux is

[162:34]

very highly performant like you can

[162:36]

support thousands or millions of users

[162:39]

on servers running an operating system

[162:41]

like this. It tends not to but it can

[162:43]

have a graphical user interface which

[162:45]

just means it can operate more quickly

[162:47]

because it doesn't need all of these

[162:48]

graphics that are really just for humans'

[162:49]

benefit, not necessarily for web

[162:51]

browsers and other devices. And Linux in

[162:54]

so far as it's usually used or often

[162:56]

used as a command line interface comes

[162:59]

with a whole bunch of commands that

[163:00]

you'll start to use and see over time.

[163:02]

Now I've used a bunch of commands

[163:03]

already. I've used code, which is a VS

[163:05]

Code thing. I have used make, which is

[163:08]

for today's purposes our compiler but

[163:10]

that's a little white lie that we'll

[163:11]

distill next week. Uh and then I've used

[163:13]

./hello, which is a command I

[163:16]

essentially invented as soon as I

[163:18]

created a program called hello. But

[163:20]

there's a bunch of other ones as well.

[163:22]

For instance, if I want to list the

[163:24]

files in my current folder, I can type

[163:26]

ls and hit enter for short. If I want to

[163:29]

uh create a new folder, otherwise known

[163:31]

as a directory, I can use mkdir to make

[163:34]

a directory. If I want to remove a

[163:36]

directory, I can use rmdir. If I

[163:38]

want to remove a file, I can use rm. If

[163:40]

I want to rename a file, I can use mv

[163:42]

for move. If I want to copy a file, cp.

[163:45]

If I want to change directories, change

[163:46]

into a folder, I can use cd. Now, these

[163:49]

too just take a little bit of time and

[163:51]

practice to memorize them, and they're

[163:52]

all very terse insofar as the whole

[163:55]

point of a command line interface is to

[163:56]

let people navigate things quickly. So,

[163:59]

for instance, even though this will be a

[164:01]

bit of a whirlwind, let me go back into

[164:03]

VS Code and let me propose that we play

[164:06]

around with just a few of these commands

[164:08]

so that you've seen me doing it, but

[164:10]

generally speaking, in CS50's problem

[164:11]

sets, we will tell you step by step what

[164:13]

commands to type so that you can achieve

[164:16]

the same results. And then later in the

[164:17]

term we'll stop bothering reminding you

[164:19]

pedantically how to do uh this and that

[164:22]

because it should come more naturally

[164:23]

eventually. But for instance let me go

[164:25]

ahead and do this. Let me go ahead and

[164:26]

reopen my file explorer at left. Yours

[164:30]

will look a little different. You'll

[164:31]

have a different number as your unique

[164:32]

ID but generally you'll see whatever

[164:34]

files and or folders you've created

[164:36]

already. The first thing I created today

[164:38]

was called hello.c. And then by using

[164:40]

make I created a second file I claimed

[164:43]

called hello. So the reason ./hello works

[164:46]

is because there is in fact a program

[164:48]

called hello in my current folder, ergo

[164:51]

the dot, that was created when I compiled

[164:54]

my source code into machine code. Now

[164:56]

suppose for the sake of discussion that

[164:58]

this is going to get messy quickly

[165:00]

because the more programs we create in

[165:01]

class and for problem sets, you're just

[165:03]

going to have a hot mess of files inside

[165:05]

of this one main folder. Well, let's

[165:07]

create subfolders like you might be

[165:08]

inclined to do on your Mac or PC or

[165:10]

Google Drive or whatnot. Well, we can do

[165:12]

this in a bunch of ways. I could

[165:15]

right-click or control-click on my file

[165:18]

explorer, and I'll see a somewhat

[165:19]

familiar uh contextual menu, and I can

[165:22]

literally choose new folder, or I can

[165:24]

rename things, or I can move things

[165:26]

around by dragging and dropping them.

[165:28]

But for today, let's focus more on the

[165:29]

CLI, the command line interface. And

[165:31]

again, commands like this. So, let me go

[165:34]

back into VS Code, and let me propose

[165:36]

that we do a few things, just as

[165:38]

a tour. First, let me delete the machine

[165:41]

code. I'm done with this example.

[165:43]

I don't really want to keep these bits

[165:44]

around unnecessarily. I'm going to

[165:46]

delete hello. Not hello.c, but hello.

[165:49]

The compiled program. When I type that,

[165:51]

I'll be cautioned. Remove the regular

[165:53]

file, whatever that means, called hello.

[165:55]

Here, I'm being prompted for a yes no

[165:58]

response. Y suffices. So, I'm going to

[165:59]

hit Y, enter, and watch what happens at

[166:01]

top left. As soon as I use my terminal

[166:04]

window and this command to remove that

[166:07]

file, it disappears. I could have

[166:09]

right-clicked on it or control-clicked

[166:10]

on it, but this command line interface

[166:12]

achieves the same thing. Now suppose

[166:14]

that for problem set one and future

[166:16]

problem sets, I want to keep like every

[166:17]

program I write in its own folder just

[166:19]

to keep myself organized, especially as

[166:21]

the term progresses. Well, let me create

[166:23]

a new folder called hello itself. So I

[166:25]

don't want to create a program called

[166:27]

hello. I want to create a folder

[166:29]

called hello. Well, one way I can do

[166:31]

this per this here cheat sheet is to

[166:33]

make a directory which just means

[166:35]

folder. So, mkdir

[166:37]

hello. Enter. And you'll see at top left

[166:40]

now I indeed have a folder. And it even

[166:42]

has an obvious folder icon next to it.

[166:44]

Now I could cut some corners. I could

[166:45]

click and drag on hello.c and just drop

[166:47]

it into hello. But again, let's stick

[166:49]

with the command line interface. Let me

[166:51]

go ahead now and move, mv for short,

[166:54]

hello.c into hello. So this is the

[166:58]

first command where I'm passing in not

[167:01]

one word after the command like code

[167:04]

hello.c or make hello. Now I'm typing

[167:06]

two words after the command because the

[167:09]

way the move command is designed is to

[167:11]

expect the origin as the first word and

[167:14]

the destination as the second so to

[167:16]

speak, whereby if I want to rename hello.c,

[167:20]

sorry, if I want to move hello.c into

[167:22]

the hello folder I should type like

[167:24]

this. Now, you can, just so you know,

[167:27]

include a trailing slash, a forward

[167:30]

slash at the end of the destination just

[167:33]

to make clear that you want to put this

[167:34]

into a folder and not just rename

[167:36]

hello.c to hello. But because the hello

[167:40]

folder already exists, Linux knows what

[167:42]

it's doing. And it's just going to

[167:43]

assume that when you do that, watch what

[167:45]

happens at top left. hello.c seems to

[167:47]

have disappeared. But if I click this

[167:48]

little triangle, ah, there it is. It's

[167:51]

now inside of that folder. But now I've

[167:53]

created kind of a predicament for

[167:54]

myself. Let me clear my terminal window.

[167:56]

And now let me type ls. And when I type

[167:59]

ls for list, you'll see only a folder

[168:03]

called hello. And it's color-coded just

[168:05]

to call it out to your eyes. And there's

[168:06]

a trailing slash just to make obvious

[168:08]

that it's a folder. That's all done

[168:09]

automatically for you by Linux, the

[168:10]

operating system. But wait a minute,

[168:12]

where did my hello program go? Like

[168:14]

where is hello.c? Well, it's in that

[168:17]

folder. So I need to change into that

[168:19]

folder or directory. And here per the

[168:21]

cheat sheet, we have cd for change

[168:23]

directory. So, I can do cd space hello

[168:26]

with or without the slash and hit enter.

[168:29]

And now you'll see this. And it's

[168:30]

admittedly a little cryptic, but my

[168:32]

prompt has now changed to still be a

[168:35]

dollar sign, but before it is just a

[168:37]

constant reminder of what folder I

[168:39]

am in. We uh adopted this as a

[168:42]

convention. Many systems do the same

[168:43]

thing, though the formatting might be a

[168:44]

little different. This is just to help

[168:46]

you remember where the heck you are

[168:48]

without having to type some other

[168:49]

command to ask the operating system what

[168:51]

folder you are in. So now that I'm here,

[168:54]

if I type ls and hit enter, what should

[168:56]

I see?

[168:59]

Just hello.c because that's the only

[169:01]

thing in that there folder. So now let's

[169:04]

do maybe one other thing. Let's do make

[169:06]

hello inside of this folder. That is

[169:10]

okay. And notice at top left what just

[169:11]

happened. Now I've got both files back.

[169:14]

All right. Suppose I want to get rid of

[169:15]

one. Well, I can do rm hello again. I

[169:18]

can type y for yes to confirm the

[169:19]

deletion. And now I'm back to where I

[169:21]

just was. Now suppose I want to do yet

[169:23]

other things. Suppose that I'm not

[169:24]

really proud of this version of hello.c.

[169:26]

Let me keep it but rename it. Well, I

[169:29]

can say, uh, how about mv hello.c to old.c.

[169:34]

I just want to rename the file. So mv

[169:36]

can be used not only to physically move

[169:38]

a file from one place to another. If you

[169:40]

use it on two file names, it will just

[169:42]

rename the file for you. So there's no

[169:44]

rename command that you need to use

[169:46]

instead. Uh but you know what? Nope. I

[169:48]

regret that. This program was fine.

[169:49]

Let's rename it back. So, let's move old.c

[169:52]

back to hello.c. And watch at top

[169:54]

left. It just renames the file again.

[169:57]

Um, let me go ahead and make a backup

[169:58]

though. So, let me copy with cp hello.c

[170:01]

into a file called like backup.c just in

[170:04]

case I screw this up. I want to have a

[170:05]

spare around. Now, you see at top left,

[170:07]

I've got both files. If I now type ls,

[170:09]

you'll see both files. So, what's

[170:11]

happening in the GUI is the exact same

[170:13]

thing that's happening in the CLI. But, you

[170:15]

know what? This was just for

[170:15]

demonstration sake. I don't need any of

[170:17]

this. So, let me remove the backup, say

[170:19]

y for yes. Let me go ahead and move

[170:21]

hello.c out of this folder, which I

[170:24]

could just kind of drag and drop it. But

[170:26]

how do I move hello.c to the parent

[170:29]

folder, so to speak. I want to move it

[170:31]

out of this folder. Well, you would only

[170:32]

know this by having been told dot dot is

[170:35]

special notation. That means the

[170:37]

so-called parent folder. So, go back up

[170:39]

in the hierarchy. And now, if it's not

[170:42]

obvious, a single dot, which we have

[170:44]

seen before, means this folder. Two dots

[170:47]

means one step up. There's no triple

[170:49]

dots or quadruple dots. You have to use

[170:51]

different syntax, but more on that

[170:52]

another time. So, watch what happens

[170:54]

when I do move hello.c up into the

[170:57]

parent directory. Notice at top left

[171:00]

that the indentation changed because

[171:02]

it's no longer inside of that same

[171:04]

folder. And heck, now I'm going to go

[171:05]

ahead and do this. I could go back to my

[171:07]

main folder by doing cd dot dot to back

[171:10]

out of this folder. But when in doubt or

[171:12]

if you ever get yourself into a

[171:14]

confusing mess, just type cd enter alone

[171:17]

and you'll be magically whisked away to

[171:19]

your default folder, a home directory so

[171:22]

to speak, even though that too is a bit

[171:23]

of a white lie. So that will lead you

[171:26]

always where you're starting when

[171:28]

logging in to cs50.dev aka VS Code. And

[171:31]

now I can see the folder which happens

[171:33]

to be empty and the file. So let me go

[171:35]

and do one last command, rmdir hello, to

[171:39]

really undo all of the work such that

[171:41]

we're now back to where the story began.

[171:43]

But the point here is just to

[171:45]

demonstrate that with these basic

[171:46]

fundamental commands, you can do

[171:48]

everything that you've taken for granted

[171:49]

on Macs and PCs for years with a mouse

[171:53]

instead. Questions on any of these here?

[171:56]

Linux commands. Yeah.

[172:00]

>> Files in a folder, how can you like to

[172:02]

open?

[172:04]

>> Really good question. If you have five

[172:06]

different files in a folder, how can

[172:07]

you choose which one to open? Well, you

[172:09]

can certainly do code space and the name

[172:11]

of the file you want to open. Or we're

[172:13]

going to see other tricks like you can

[172:14]

use an asterisk or star for a so-called

[172:16]

wild card and say open everything in

[172:19]

this folder. And you can even use more

[172:20]

precise patterns than that. So over time

[172:22]

once we have more files at my disposal,

[172:24]

I'll be able to do tricks like that as

[172:25]

well too. Yeah.

[172:27]

>> I don't know if I said

[172:32]

it back.

[172:33]

>> Uhhuh. when you like delete the file was

[172:36]

that hello was that hello.

[172:44]

>> Sure. So one of the things I did in my

[172:46]

VS code a moment ago was once I was

[172:48]

inside of the hello folder into which I

[172:51]

had put hello.c just for the sake of

[172:53]

discussion. I then recompiled it by

[172:55]

running make hello. And this example is a

[172:58]

little confusing deliberately in so far

[173:00]

as I've got a file called hello.c

[173:02]

inside of a folder called hello. But

[173:05]

because I compiled hello.c, I then

[173:08]

created a program called hello as well.

[173:10]

But that program hello was inside of a

[173:13]

folder called hello. Which is only to

[173:15]

say that you can totally do this. You

[173:16]

can't have a file and a folder in the

[173:19]

same place named the same thing because

[173:20]

they would collide. Like you can't do

[173:22]

that on a Mac or a PC as well. You have

[173:23]

to have unique names. But you can

[173:25]

certainly put something inside of

[173:26]

another folder without collision. Good

[173:29]

question. All right. So let's introduce

[173:31]

a few more building blocks and a few

[173:33]

more things we can do. So besides these

[173:35]

Linux commands which we'll now start

[173:37]

taking for granted, we have a bunch of

[173:38]

other features of programming

[173:40]

languages that we saw in Scratch. Let's

[173:41]

now translate them to C. So conditionals

[173:43]

were sort of the proverbial fork in the

[173:45]

road enabling you to do this or this or

[173:48]

some other thing based on the answer to

[173:49]

a question, a so-called boolean

[173:51]

expression. Here for instance in scratch

[173:53]

is how we might express if a variable x

[173:56]

is less than a variable y we'll go ahead

[173:58]

and say x is less than y and out of

[174:01]

context I didn't include it in the slide

[174:02]

presumably we've created x and y and

[174:05]

somehow given them values whatever they

[174:07]

are but this is just now the conditional

[174:09]

part of the program in C the way you

[174:11]

would do the same thing is you would say

[174:13]

if and then a space then parentheses

[174:16]

which have nothing to do with functions

[174:17]

if is not a function it is a feature of

[174:21]

C that implements conditionals just like

[174:23]

this orange block is a feature of

[174:24]

scratch inside of the parenthesis you

[174:26]

put your same boolean expression. So

[174:28]

here too out of context if up here I

[174:30]

have defined variables X and Y well I

[174:33]

can certainly use them in this

[174:34]

conditional and I can use this less than

[174:36]

operator just like in math class to ask

[174:38]

this question and the answer even though

[174:40]

it's a less than sign is indeed if you

[174:42]

think about it going to be true or false

[174:44]

yes or no. It's a boolean expression. It

[174:47]

either is less than or it is not. All

[174:49]

right. Inside of the curly braces which

[174:50]

are necessary here I'm just going to

[174:52]

literally put our old friend printf.

[174:54]

And there's nothing interesting here

[174:56]

except the new phrase x is less than y

[174:58]

with the backslash n, the semicolon, and

[175:00]

the parenthesis. This though is

[175:01]

deliberate just like in Scratch the say

[175:05]

is sort of indented and sort of hugged

[175:07]

by the if orange puzzle piece. Similarly

[175:10]

these curly braces are meant to

[175:12]

sort of imply the same. It's sort of

[175:14]

embracing these lines of code. As an

[175:16]

aside in C they're not always necessary.

[175:19]

If you have a single line of code you

[175:20]

can technically omit them. However, what

[175:22]

you'll see in C as well as in CS50

[175:25]

in particular, we will generally preach

[175:27]

a certain style like any company in the

[175:29]

real world would do so that programmers

[175:31]

who are collaborating on code all write

[175:33]

code that looks the same uh so that it

[175:36]

doesn't uh devolve into a mess because

[175:38]

everyone has their own convention. So

[175:40]

this is a convention to which you should

[175:41]

indeed adhere, and then I've indented

[175:43]

four spaces to make clear logically that

[175:46]

this line of code only executes if the

[175:48]

answer to this question is true or yes.

[175:51]

Meanwhile in Scratch if we had an if

[175:53]

else condition so a two-way fork in the

[175:55]

road. If x is less than y say so else

[175:58]

say x is not less than y. How can I do

[176:00]

that in c? Well if x less than y

[176:03]

something else something else. And what

[176:06]

are the, uh, what goes in between those

[176:08]

curly braces? Well, just two different

[176:10]

printfs. X is less than Y or X is not

[176:13]

less than Y. The only new thing here is

[176:15]

we've added else and another pair of

[176:16]

curly braces, just like we've got sort

[176:18]

of two uh orange uh shapes hugging those

[176:22]

two purple puzzle pieces there. All

[176:24]

right, how about something a little more

[176:25]

involved? And this looks like it's

[176:26]

escalating quickly, but it's just

[176:27]

because the scratch puzzle pieces are so

[176:29]

big. If x is less than y, then say x is

[176:32]

less than y. Else if x is greater than

[176:35]

y, then say x is greater than y. else if

[176:38]

x equals y then say x is equal to y. How

[176:42]

can we do this in C? Almost the same

[176:44]

idea. If x less than y else if x greater

[176:47]

than y else if x equals equals y. Well

[176:51]

before we reveal what's in the curly

[176:53]

braces. This is not a typo. Why have I

[176:56]

presumably done this even if you've

[176:58]

never used C before. Yeah.

[177:06]

>> Exactly. The single equal sign, which

[177:08]

we've used already when storing a value

[177:10]

from get string into a variable like

[177:13]

answer, is technically the assignment

[177:15]

operator. So humans decades ago decided

[177:17]

that when faced with the situation where

[177:19]

they wanted to copy from the right to

[177:21]

the left a return value into a variable,

[177:23]

it made sort of visual sense to use an

[177:25]

equal sign because you want those two

[177:26]

things ultimately to be equal. Even

[177:28]

though you kind of read the code from

[177:29]

right to left in that case, I can only

[177:31]

imagine at some point the same people

[177:33]

were in the room and they were coming up

[177:34]

with the syntax for conditionals and

[177:35]

like oh shoot we've already used equals

[177:38]

for assignment. What do we now use for

[177:40]

equality and the solution in C as well

[177:43]

as in many other languages is literally

[177:45]

this. They use two. So this is the

[177:47]

equality operator whereas a single one

[177:49]

is the assignment operator and it's just

[177:51]

because now Scratch is designed for

[177:53]

kids. No sense in confusing little kids

[177:55]

with equal equal signs. So, Scratch uses

[177:57]

a single equal sign, whereas C and most

[178:00]

languages use double equal sign. So, a

[178:02]

minor divergence there. What goes in the

[178:04]

curly braces? Nothing all that

[178:05]

interesting, just a bunch more printfs.

[178:08]

But here's an opportunity to distinguish

[178:10]

not only the equivalence of this scratch

[178:12]

code with C code, but a misdesign

[178:14]

opportunity that we sort of tripped over

[178:16]

if briefly last week. This is arguably

[178:19]

not well designed even though it is

[178:22]

correct.

[178:24]

Why? Yeah,

[178:25]

>> you don't need to ask.

[178:28]

>> Yeah, we don't need to ask this third

[178:30]

boolean expression. Is X equal equal to

[178:33]

Y, so to speak? Well, logically, if

[178:35]

we're using sort of normal person

[178:36]

numbers, it's either less than or

[178:38]

greater than or by default equal to. So,

[178:41]

you're just wasting the computer's time

[178:43]

and in turn the user's time by asking

[178:45]

this third question. So, slightly better

[178:47]

here would be get rid of the else if

[178:49]

just have a default case, an else block

[178:52]

so to speak, that looks like this. if it

[178:54]

stands to reason that there's only three

[178:55]

possibilities, you only really need to

[178:57]

interrogate two of them out of the

[178:59]

three. So, a minor optimization, but you

[179:00]

could imagine doing that again and again

[179:02]

and again in your code. You don't want

[179:04]

to be wasting the computer or the user's

[179:06]

time if you can improve things like

[179:08]

that. All right. So, now that we have

[179:10]

these equivalences between Scratch code

[179:13]

and C code for these conditionals, well,

[179:15]

what other things can we throw into the

[179:16]

mix? Well, uh C has a whole bunch of

[179:18]

operators. And just so that you've seen

[179:20]

a list in one place, you've got not only

[179:22]

assignment and less than and greater

[179:23]

than and equality, but a few others here

[179:26]

as well. Now, even though in like

[179:28]

Microsoft Word, in Google Docs, you can

[179:30]

kind of do a greater than or equal to

[179:32]

sign one over the other or less than or

[179:34]

equal to, in C in most languages, you

[179:36]

actually just hit the keyboard twice.

[179:38]

You do the less than and an equal sign,

[179:40]

or you do a greater than and the equal

[179:42]

sign. And that's how you achieve the

[179:43]

notion of greater than or equal to or

[179:45]

less than or equal to. Um, this one

[179:47]

we've seen. Anyone want to guess what uh

[179:50]

exclamation point equals means?

[179:51]

Otherwise pronounced bang equals. Yeah.

[179:55]

>> Not equal. So generally in programming

[179:57]

you'll see an exclamation point implying

[180:00]

the negation of something else. The

[180:02]

opposite. So you don't want it to be

[180:03]

equal to, you want it to be not equal

[180:05]

to. Now you might think, shouldn't it be

[180:08]

not equal equal? Yes, but they're trying

[180:11]

to save keystrokes. So this is the

[180:13]

negation of that even though it doesn't

[180:15]

quite look like it should be. just two

[180:18]

characters instead of three. Um, and dot

[180:20]

dot dot there's many other operators

[180:21]

that we'll encounter in the wild over

[180:24]

time. Um, but it's also worth noting

[180:26]

in C more than just strings like strings

[180:29]

recall were strings of text and there's

[180:30]

other types of uh data that you might

[180:33]

get from a user or store. We've seen

[180:35]

string but we'll actually see a whole

[180:37]

bunch of others. So in C we're going to

[180:39]

see bools themselves, a variable that

[180:42]

can be true or false and that's it. So

[180:44]

very much interrelated with boolean

[180:46]

expressions. A variable itself can be

[180:47]

true or false. We're going to see chars

[180:49]

or characters. So not strings of text

[180:52]

like multiple letters and words and the

[180:54]

like but just individual characters. C

[180:56]

unlike some languages does distinguish

[180:58]

between single characters and multiple

[181:00]

characters. Uh double or rather let's

[181:03]

jump to float. A float is otherwise

[181:05]

known as a floating-point value which is

[181:06]

just a number that has a decimal point

[181:08]

in it. a real number if you will, but a

[181:11]

float generally uses nowadays 32 bits

[181:15]

total to represent those numbers. The

[181:17]

catch with that is that how many total

[181:19]

values can you represent with 32 bits

[181:22]

roughly per last week?

[181:25]

It was one of the few numbers I propose

[181:26]

you remember. It's like roughly 4

[181:28]

billion. But how many real numbers are

[181:30]

there in the world according to math

[181:31]

class?

[181:33]

An infinite number. So we seem to have a

[181:34]

mismatch between what we can represent

[181:36]

in code and how many actual numbers

[181:38]

there are in the world. Okay, so not to

[181:40]

worry if you need more precision like

[181:42]

more significant digits. Well, you can

[181:44]

upgrade your variable so to speak from a

[181:46]

float to a double which uses 64 bits

[181:48]

which is way more precise twice as many

[181:50]

bits but it doesn't fundamentally solve

[181:52]

the problem because really it's still

[181:54]

finite and not infinite. And we'll end

[181:56]

today with a look at what the real world

[181:57]

implications of that are. But besides

[181:59]

floating-point values, there are just simple

[182:01]

integers. 0 1 2 and the negatives

[182:03]

thereof. Uh but those conventionally use

[182:06]

32 bits, which means the highest a

[182:07]

computer can count using an int would be

[182:10]

4 billion. But if you want to do

[182:11]

negative numbers, it's going to be

[182:13]

roughly 2 billion. So you can go all the

[182:15]

way to negative 2 billion. So that's not

[182:16]

that large nowadays. A long uses 64 bits,

[182:20]

which is a much bigger range of values,

[182:21]

but there too still finite. And there's

[182:24]

a bunch of others as well. So these are

[182:26]

just the types of data that we can store

[182:29]

and manipulate in our programs. But a

[182:31]

couple of those, uh, a couple of

[182:34]

those one in particular specifically

[182:36]

come from cs50.h. So among the things

[182:40]

you get by including cs50.h in your code

[182:43]

is access to not only get string but

[182:45]

these other functions as well. And we'll

[182:47]

start to use these in a little bit

[182:48]

whereby you can get integers or chars or

[182:51]

doubles or floats. We don't have a get

[182:52]

bool cuz it's not really useful to just

[182:54]

get a true or false value typically, but

[182:57]

we could have invented it. We just chose

[182:58]

not to. But we'll frequently use these

[183:00]

here functions that you can access by

[183:03]

using that there header file. But where

[183:05]

are we going to put these values and how

[183:07]

are we going to display them? Well,

[183:09]

turns out there's more than just percent

[183:11]

s. So percent s was a placeholder for a

[183:13]

string, but if you want to print out

[183:15]

something like a char, a single

[183:17]

character, you're actually going to use

[183:19]

percent c. If you want to print out a

[183:21]

floating-point value, you're going to use

[183:23]

percent f. An integer percent i and a

[183:25]

long integer that is a long, you're

[183:27]

going to use percent li instead. So in

[183:30]

short, there's solutions to all of these

[183:32]

problems. These are not uh

[183:34]

intellectually interesting details, but

[183:36]

they are useful practical things to

[183:39]

eventually absorb over time. So let's go

[183:42]

ahead and do this. Let's do just a few

[183:44]

more examples together. In a little bit

[183:45]

we'll adjourn, uh, for a short

[183:47]

break uh during which uh snacks will be

[183:49]

served every week out in the transept.

[183:51]

But before we get to that, let's uh

[183:53]

focus on these here variables. So in

[183:55]

Scratch we had the ability to store a

[183:57]

bunch of values in variables that we

[184:00]

could create ourselves by creating new

[184:01]

puzzle pieces. In C you can essentially

[184:03]

achieve the same. So for instance

[184:05]

suppose that in Scratch we wanted to

[184:07]

keep track of someone's score using a

[184:09]

counter. Well, we might create a

[184:10]

variable called counter and set it

[184:12]

initially to zero and then eventually

[184:14]

add one to it, add two to it, and so

[184:16]

forth as they drop trash into the trash

[184:18]

can, for instance. Well, in C, you're

[184:20]

going to do something almost the same.

[184:22]

You can choose the name of your variable

[184:24]

just like I did previously with answer.

[184:26]

You can assign it a value like zero

[184:28]

initially, but per earlier, what more am

[184:32]

I probably going to have to do in C on

[184:33]

the right hand side here? Yeah,

[184:37]

>> I've got to give it a type, and a counter,

[184:39]

insofar as it's numeric, is not

[184:40]

going to be a string of text and I don't

[184:43]

think I need to worry about decimal

[184:44]

points if I'm just counting the

[184:46]

equivalent on my fingers. So int will

[184:48]

suffice and int is the go-to number and

[184:50]

at least if two billion plus

[184:52]

values is more than enough for your case

[184:54]

which this is going to be still one

[184:56]

minor thing missing. Yeah.

[184:59]

>> the semicolon to finish the thought. So

[185:01]

that on the right is the equivalent to

[185:02]

doing this here on the left. Suppose

[185:04]

that in Scratch you wanted to increment

[185:05]

the counter and add one to the score,

[185:07]

add two to the score and so forth. It

[185:09]

might look like this. Change counter by

[185:10]

one implicitly going up unless you did

[185:12]

negative which would go down. In C, you

[185:15]

can do this actually in a few ways. And

[185:16]

this looks a bit wrong at the moment.

[185:19]

How can counter possibly equal counter +

[185:22]

one. This does not mean equality per se.

[185:25]

The single equal sign recall is

[185:26]

assignment and it means take the value

[185:28]

on the right and copy it to the value on

[185:31]

the left or to the variable in this case

[185:33]

on the left. So this takes whatever the

[185:35]

current value of counter is zero adds

[185:38]

one to it and then stores that one in

[185:41]

the counter variable. So now the value

[185:43]

is one and if you do it again it goes to

[185:45]

two goes to three goes to four and so

[185:47]

forth. But honestly this incrementation

[185:49]

technique is so common that there's more

[185:51]

shorthand notation for it. You can also

[185:53]

just do this. Looks a little weird at

[185:55]

first glance but counter plus equals 1

[185:57]

semicolon does the exact same thing. You

[185:59]

can just type fewer keystrokes. And

[186:01]

honestly, doing this is so darn common

[186:03]

in C that you can even do this counter

[186:06]

plus plus does the exact same thing by

[186:09]

adding one to the variable. There's no

[186:11]

plus plus plus or more pluses. It's only

[186:14]

for incrementing individual values by

[186:16]

one. So arguably this version and this

[186:19]

version, albeit more verbose, are a

[186:21]

little more versatile because you can

[186:22]

add two or three or more at a time. And

[186:25]

there are equivalents for doing

[186:26]

decrementation and doing minus minus or

[186:29]

the minus symbol more generally in

[186:31]

there. All right, so let's actually use

[186:34]

this technique in some code. Let me go

[186:36]

back into VS Code here. Let me close my

[186:39]

file explorer and let's go ahead and

[186:40]

create maybe this time like a a little

[186:42]

calculator of sorts. Let me propose that

[186:44]

we implement a very baby calculator or

[186:48]

rather not even a calculator yet. Let's

[186:49]

just compare a few values. So let me

[186:51]

do this: code compare.c to create a

[186:54]

brand new program called compare. And

[186:57]

then in here I'm going to do a bit of

[186:58]

boilerplate. I'm going to go ahead and

[187:00]

include cs50.h. I'm going to go ahead

[187:02]

and include stdio.h. And I'm going

[187:05]

to go ahead and uh do int main void.

[187:07]

More on that next week. And then inside

[187:09]

the curly braces, let's use these these

[187:11]

new techniques. Let's give myself a

[187:13]

variable called x and set it equal to

[187:16]

the return value of get int, that other

[187:19]

function I promised exists. And let's

[187:21]

prompt the user for a value for x with a

[187:24]

sentence like what's x question mark and

[187:26]

then a space just to nudge the cursor

[187:28]

over. Let's get another variable y. Set

[187:30]

it equal to get int again and ask the

[187:32]

user this time what's y essentially

[187:34]

using the same function twice but to get

[187:36]

two different values. Now let's go ahead

[187:39]

and do something pretty mindless. If x

[187:41]

is less than y, go ahead and print out

[187:44]

with printf, x is less than y, backslash n

[187:48]

to move the cursor, close quote,

[187:51]

semicolon. So it's not that interesting

[187:53]

of a program, but it's at least dynamic

[187:55]

in that now I'm prompting the user for

[187:57]

two numbers. So let's do this. Make

[187:59]

compare. Enter. Seems to have worked.

[188:01]

And in fact, I can check that it worked

[188:03]

by typing what command to list the files

[188:04]

in my directory.

[188:07]

ls for short. And now you'll see I've

[188:09]

got hello.c, but no hello because I

[188:11]

deleted that with rm a few minutes ago.

[188:13]

I've got compare.c which I just created.

[188:15]

And then I've also got a program called

[188:17]

compare. And the asterisk there is just

[188:19]

a visual indicator that this is

[188:20]

executable. It's a program you can run.

[188:22]

It's not just a simple old file. Even

[188:24]

though I didn't type ls previously with

[188:26]

hello, uh it would have similarly had an

[188:29]

asterisk next to it in this context. But

[188:31]

you don't see that in the file explorer.

[188:33]

If I now do compare, well, let's do

[188:35]

something silly like one for x, two for

[188:37]

y. Okay, X is less than Y. Let's do it

[188:40]

again. Dot slash compare, two for X, one

[188:43]

for Y. Okay, and I see nothing. Well,

[188:47]

why am I seeing nothing? Well,

[188:49]

logically, I didn't have a condition for

[188:51]

checking for greater than, let alone

[188:53]

equal to. So, let's enhance this a

[188:55]

little bit. Let me go ahead and

[188:56]

minimally say, all right, else if X is

[188:59]

not less than Y, let's go ahead and

[189:01]

print out X is not less than Y, backslash n,

[189:03]

close quote, semicolon. So I'm at

[189:05]

least handling that situation too. Let

[189:08]

me clear my terminal window. Do make

[189:10]

compare again. Dot slash compare, one and two

[189:14]

works exactly the same. Now let me go

[189:16]

ahead and do two and one. There we have

[189:19]

better output. Of course it's not really

[189:21]

complete yet because if I do dot slash

[189:23]

compare again and do one and one, it'd

[189:26]

be nice to be a little more specific

[189:28]

than x is not less than y. It's not

[189:29]

wrong but it's not very precise. So I

[189:32]

can add into the mix what we did

[189:34]

earlier and I can say okay well else if

[189:37]

x is greater than y say x is greater

[189:42]

than y else if x equals equals y go

[189:46]

ahead and print out x is equal to y back

[189:50]

slash n, close quote. But here too, someone

[189:52]

observed that this is sort of stupidly

[189:54]

inefficient. What line of code should I

[189:57]

actually improve here to tighten this up?

[190:00]

yeah

[190:00]

>> instead What else did you just get rid

[190:02]

of?

[190:03]

>> Yeah. So line 17. I think I can just get

[190:05]

rid of that unnecessary question because

[190:07]

logically that's going to be the case at

[190:09]

this point. And now I can go ahead and

[190:10]

recompile this with make compare, dot slash

[190:14]

compare again. Enter one and one. And

[190:17]

now we're back in business catching all

[190:19]

three of those situations uh those uh

[190:22]

scenarios there.

[190:25]

Questions on any of these things here?

[190:30]

Why have I deliberately not done this?

[190:32]

Let me rewind just a moment and let me

[190:35]

hide my terminal window just to keep the

[190:36]

emphasis on the code here. Why not do

[190:39]

this and keep my code arguably simpler?

[190:42]

Like why not just ask three questions?

[190:45]

Step nine, step 13, and step 17 here.

[190:50]

Yeah. What don't you like?

[190:51]

>> Because then it would check each and

[190:53]

every condition. Um even though for

[190:54]

example the first one might be

[190:56]

fulfilled, it would check the second and

[190:57]

third. That wasted

[191:00]

Exactly. It's another example of bad

[191:02]

design because now no matter what, you

[191:04]

were asking three questions on lines 9,

[191:07]

13, and 17. Even if X ends up being less

[191:10]

than Y from the get-go, you're still

[191:12]

wasting everyone's time by saying,

[191:14]

"Wait, well, is X greater than Y?" You

[191:16]

already might know that it's not. Is X

[191:18]

equal to Y? You already might know that

[191:20]

it's not. And so these three

[191:21]

conditionals at the moment are mutually

[191:24]

exclusive, and yet you're checking all

[191:27]

three of them no matter what, even

[191:28]

though logically that shouldn't be

[191:30]

necessary. So our first approach was

[191:32]

actually quite better. And in fact, just

[191:34]

to show you the the density difference

[191:36]

here, let me go back to this very first

[191:38]

version here whereby I was only checking

[191:41]

that one condition. Is X less than Y?

[191:43]

Well, if you're more of a visual

[191:44]

learner, you can actually draw out what

[191:46]

code looks like in flowchart form. So

[191:49]

here is a drawing of a program that

[191:52]

starts here and ideally stops down here.

[191:54]

And each of these uh figures in the

[191:56]

middle sort of represent logical

[191:57]

components of the code. Uh here in the

[192:00]

diamond here is my boolean

[192:03]

expression which represents the start of

[192:04]

the conditional. So if x is less than y

[192:06]

I have a decision to make yes or no true

[192:09]

or false. Well if it is less than y

[192:12]

true. Well let's go ahead and print out

[192:13]

quote unquote x is less than y and then

[192:16]

stop. However the first version of that

[192:18]

program recall just said nothing if it

[192:21]

were not the case that x were less than

[192:23]

y. That's because false just led to the

[192:26]

stop of the program. There's no keyword

[192:27]

stop. There's just no code to

[192:30]

handle that situation. But the second

[192:32]

version of the code when I actually

[192:33]

added an else looked fundamentally a

[192:35]

little different. So now second version

[192:38]

of that code asked is X less than Y and

[192:40]

if true behavior is exactly the same.

[192:42]

But if it weren't true, it were instead

[192:44]

false, that's when I got the message X

[192:46]

is not less than Y. But in the third

[192:49]

version of the code where I added the if

[192:51]

else if else if then the picture gets a

[192:54]

little more complicated and let me zoom

[192:55]

in top to bottom here we have a longer

[192:58]

flowchart but the questions are really

[193:00]

the same. When I start this program I

[193:02]

ask, is x less than y? If so, I print

[193:06]

out x is less than y. However, in that

[193:10]

last version of the

[193:11]

program I was still foolishly asking the

[193:13]

same question. Well wait a minute. Is x

[193:15]

greater than y? Wait a minute. is x

[193:17]

equal to y and that's the version in

[193:19]

which again I had all of that

[193:21]

unnecessary code, which I just undid

[193:24]

here asking three questions at a time

[193:26]

ideally I don't want to make that

[193:28]

mistake by doing it again and again and

[193:30]

again so if I instead revert that code

[193:33]

to else if and else if then my flowchart

[193:39]

looks a little bit different because

[193:40]

notice the sort of shortcuts now if x is

[193:43]

less than y true we do this and we're

[193:46]

done Super quick. If X is not less than

[193:48]

Y, fine. We do ask one more question. X

[193:50]

is greater than Y. Well, if so, boom. We

[193:53]

make our way to the end of the program

[193:54]

by just printing that. Only if it's the

[193:56]

perverse case where X equals equals Y.

[193:59]

Do we check this condition? No. This

[194:01]

condition, no. This condition, and then

[194:03]

okay, now we can print out X is equal to

[194:06]

Y because it must be logically. Of

[194:08]

course, it's been observed multiple

[194:09]

times. This is a waste of everyone's

[194:11]

time. So we can prune this chart more

[194:13]

and just have one question, two

[194:15]

questions and that alone tightens up the

[194:18]

program. So again, if you're more of a

[194:19]

visual learner, most any block of code

[194:21]

you can translate to this sort of

[194:23]

pictorial form, but it really just

[194:25]

captures the same logical flow that the

[194:28]

indentation and the syntax and the code

[194:30]

itself is meant to imply. All right, how

[194:35]

about a final exercise with one other

[194:37]

type here? Recall that this is our

[194:39]

available types to us. Actually, two

[194:42]

final examples here before we have a bit

[194:43]

of a break. Here we have a list of types

[194:46]

that we can use. And here we have a list

[194:48]

of functions that we can use. Let's go

[194:50]

ahead and make a program that's

[194:51]

representative of something we do quite

[194:53]

often nowadays, but using a different

[194:54]

type. So, let me go back into VS Code.

[194:56]

Let me close compare.c. Let me reopen my

[194:59]

terminal window and clear it just so we

[195:01]

have a new prompt. And let's go ahead

[195:03]

and create a program called agree.c.

[195:05]

It's all too often nowadays that we have

[195:06]

to like agree to terms and conditions.

[195:08]

To be fair, it's usually in the form of

[195:09]

like a popup and a button that we click,

[195:11]

but we can do this in code at the

[195:12]

command line as well. Let me go ahead

[195:14]

and include to start CS50.h and include

[195:17]

to start standard io.h. Let me again for

[195:20]

today's purposes do int main void, but

[195:22]

we'll reveal next week why we

[195:24]

keep doing that. And now for a yes no

[195:26]

answer, it suffices just to ask for a

[195:28]

single char or character, not a whole

[195:30]

string. So let's do this: char c equals

[195:35]

get_char, and let's ask the user, quote

[195:37]

unquote do you agree question mark for

[195:40]

instance and now I can actually compare

[195:43]

that value for equality with some known

[195:45]

answers for instance I could say if c

[195:49]

equals equals quote unquote y then go

[195:52]

ahead and print out for instance agreed

[195:55]

period, backslash n, close quote, semicolon,

[195:58]

else if c equals equals n in

[196:03]

quotes. Let's go ahead and print out,

[196:04]

for instance, not agreed period back

[196:06]

slash n, semicolon. Now, there's still

[196:09]

room for improvement here, but notice

[196:11]

we're just now using the same building

[196:13]

blocks in C um in different ways to

[196:16]

solve different problems. But notice on

[196:18]

lines 8 and 12, I've used single quotes,

[196:21]

which I alluded to earlier. Why is that

[196:25]

the case? Why single in this case here?

[196:29]

>> Yeah, it's a single character. And this

[196:31]

is just the way you do it in C. When you

[196:33]

want to compare a single character, you

[196:35]

use chars and you use single quotes.

[196:37]

When you want to use strings of text,

[196:39]

like multiple characters, multiple

[196:41]

words, multiple sentences or paragraphs,

[196:43]

you use strings. So this would seem to

[196:46]

work, but arguably I could be a little

[196:48]

more efficient. If the user doesn't type

[196:50]

y, I mean, frankly, I could just chop

[196:52]

off this else if and make it an else and

[196:54]

just assume if you don't give me a Y

[196:56]

answer, then at least I'm going to

[196:58]

assume the worst and you don't agree.

[197:00]

But even here, the program's not all

[197:02]

that great. Let me go ahead and do make

[197:03]

agree and then do dot slash agree. And do

[197:06]

I agree? Sure. I'm going to go ahead and

[197:08]

type y. Meanwhile, if I type anything

[197:10]

else like n or uh even emphatically, no,

[197:14]

that would seem to Whoops. Why did that

[197:16]

not work? Yeah.

[197:20]

>> Exactly. So, among the features of

[197:22]

CS50's functions like get_char is that it

[197:25]

will enforce what type of data you're

[197:27]

getting. So because I

[197:29]

used get_char, if the user doesn't

[197:31]

cooperate and types in multiple

[197:33]

characters, get_char, like some of our

[197:35]

other functions is just designed to

[197:37]

prompt them again and again until

[197:39]

they cooperate. That's useful so that

[197:41]

you don't have to deal with that kind of

[197:43]

error checking. But here I could type n

[197:45]

in uppercase and that seems to now work.

[197:48]

But that only works because of the else.

[197:49]

Let me go ahead and do this which is

[197:51]

very reasonable. I'm going to go ahead

[197:52]

and type y capital y which you would

[197:54]

hope works. That feels like a bug at

[197:57]

this point. Like it's fine if we don't

[197:58]

want to support yes and no. We just want

[198:00]

to support Y and N. But it's kind of

[198:02]

obnoxious not to support the uppercase

[198:04]

version thereof. So how can we fix this?

[198:06]

Well, let me hide my terminal window.

[198:07]

And I could go in and fix this as

[198:09]

follows. I can say well else if C equals

[198:11]

equals quote unquote capital Y in single

[198:14]

quotes. And then I could do print out

[198:16]

agreed, period, backslash n, semicolon. And

[198:19]

then I can do uh else uh that that would

[198:22]

work. That would work there. But what

[198:24]

rubs you the wrong way perhaps about

[198:25]

this solution? Even if you've never

[198:27]

programmed before,

[198:29]

just applying some of the lessons from

[198:31]

last week. Yeah,

[198:33]

>> it's redundant. I mean, I didn't

[198:34]

technically copy and paste, but like

[198:36]

line 14 is identical to line 10, so I

[198:38]

might as well have copied and pasted. And

[198:40]

that's generally bad practice. Why?

[198:42]

Well, if I want to change the English

[198:43]

language to say something else in that

[198:45]

case, now I have to change it twice. And

[198:47]

it's just I'm repeating myself, which is

[198:48]

just bad design. So, there are ways to

[198:51]

address this through other types of

[198:52]

operators that we haven't yet seen. If I

[198:54]

want to ask two questions at once,

[198:56]

that's fine. I can do something like

[198:58]

this. Well, if C equals equals quote

[198:59]

unquote Y or C equals equals quote

[199:03]

unquote capital Y, I can tighten things

[199:05]

up using so-called logical operators

[199:08]

whereby I am now taking a boolean

[199:10]

expression and composing it from two

[199:12]

smaller boolean expressions. And I care

[199:15]

about the answer to one of those

[199:17]

questions being true. So whether it's

[199:19]

lowercase Y or uppercase Y, this code

[199:23]

now will work. And if it's anything

[199:25]

else, we're going to default to not

[199:27]

agreed. So the two vertical bars, which

[199:30]

is probably not a character you type

[199:31]

that often, and it varies where it is on

[199:33]

your keyboard depending whether it's

[199:34]

American English or something else, just

[199:36]

means logical or. This is not relevant

[199:39]

here, but you could also in some context

[199:42]

use two ampersands to connote and. But

[199:45]

this does not make sense. Why? Why is it

[199:48]

clearly not correct to say and in

[199:51]

between these two clauses? Yeah,

[199:55]

>> exactly. The variable can't both be

[199:57]

lowercase and uppercase. That just makes

[199:59]

no sense. So, this would be a bug,

[200:00]

but using two vertical bars

[200:03]

here is in fact correct. All right.

[200:05]

Well, let's do one final flourish here.

[200:08]

Besides conditionals, we had these now

[200:09]

loops. Recall that a loop is just

[200:11]

something that does something again and

[200:12]

again and again. Here, for instance, in

[200:14]

Scratch is how we might meow three times. In

[200:17]

C, there's going to be a few different

[200:18]

ways to do this. Here is one. You can in

[200:21]

C declare a variable like I for integer

[200:24]

or whatever you want to call it and set

[200:26]

it equal to three, the number you care

[200:28]

about. You can then use a loop and the

[200:30]

closest to the repeat block is arguably

[200:32]

a while loop. There is no repeat keyword

[200:34]

in C. So we can't translate this

[200:36]

verbatim, but we could say while I is

[200:39]

greater than zero. Why? Because that's

[200:40]

sort of logically what I want to do. If

[200:42]

I start counting at three, maybe I can

[200:44]

just sort of decrement one at a time and

[200:46]

get down to zero, at which point I can

[200:48]

stop doing this thing. So I'm going to

[200:50]

initialize a variable to I, a variable I

[200:52]

to three, and then I'm going to say

[200:54]

while I is greater than zero, go ahead

[200:56]

and do the following. And at the end of

[200:58]

that loop before whipping around again,

[201:01]

I'm going to use this line of code,

[201:03]

which we haven't seen, but you can

[201:04]

infer. i minus minus just means subtract one

[201:07]

from I. So this is going to have the

[201:09]

effect of starting at three, going to

[201:10]

two, going to one, going to zero. And as

[201:13]

soon as it goes to zero, this boolean

[201:15]

expression will no longer be true. And

[201:17]

so the loop will just implicitly stop

[201:19]

because that's it. So what are we going

[201:21]

to put inside of the curly braces

[201:23]

besides this decrementation? Well, I

[201:25]

think I can get away with just saying

[201:26]

meow. And that will now print 1 2 3

[201:30]

times. And yet that's interesting. I

[201:32]

sort of counted instinctively 1 2 3

[201:35]

even though I'm proposing that we count

[201:37]

3 2 1. So can we implement the logic in

[201:40]

the other direction whereby we count up

[201:41]

from zero instead of down from three.

[201:43]

Well sure we just have to make a few

[201:44]

changes. We can set i equal to zero

[201:47]

initially. We can change our boolean

[201:49]

expression to check that i is less than

[201:51]

three again and again. And on each

[201:54]

iteration of this loop let's just keep

[201:55]

incrementing i with i ++. And at this

[201:58]

point it will have the effect of doing 1

[202:00]

2 3. Three is not less than three. So I

[202:03]

won't put any more fingers up. I will

[202:05]

meow a total of three times. And

[202:08]

again, if you're a visual person, here's

[202:10]

how we might start counting at zero

[202:11]

initially. Check that i is less than

[202:14]

three, which it is initially. And if so,

[202:16]

we print out meow. Then we increment i,

[202:18]

and we get whisked around again to the

[202:20]

boolean expression because that's how

[202:22]

while loops work. You constantly have

[202:24]

the condition being checked again and

[202:26]

again. That's just how C works. As soon

[202:28]

as I've incremented I from 0 to 1 to two

[202:31]

to three, three will eventually not

[202:34]

be less than three. So the

[202:36]

answer will be false. So the loop will

[202:38]

just stop. So that has the effect of

[202:40]

achieving the same. But it turns out

[202:42]

that looping uh some amount of times is

[202:45]

so darn common that you don't strictly

[202:46]

have to use a while loop. A for loop, so

[202:49]

to speak, is another alternative there

[202:51]

too, whereby the syntax is a little

[202:53]

weird. It's a little harder to memorize,

[202:54]

but it allows you to write slightly less

[202:56]

code because you write more code on a

[202:58]

single line. So the way you read a for

[203:00]

loop is exactly the same in spirit. You

[203:02]

initialize the variable: everything to

[203:04]

the left of this first semicolon.

[203:07]

You then check the condition, and the

[203:09]

computer does all this for you. Is i

[203:10]

less than three? If so, you execute

[203:12]

what's inside of the curly braces and

[203:14]

then automatically the thing to the

[203:16]

right of the second semicolon happens.

[203:18]

So I gets incremented from zero to one.

[203:20]

In this case, the condition is checked.

[203:22]

Is one less than three? It is. So, we

[203:24]

print meow again. And C increments I to

[203:28]

two. Is two less than three? Yes. So, we

[203:30]

meow again. I gets incremented to three.

[203:33]

Is three less than three? No. So, the

[203:35]

for loop stops. So, it's exactly the

[203:37]

same, but just more magic is happening

[203:39]

in this first line of code here more

[203:41]

than you yourselves have to actually

[203:43]

write. And it's just arguably more

[203:45]

common convention. But both of them are

[203:47]

perfectly correct if you'd like to do

[203:49]

that yourself. So let's go ahead and

[203:51]

actually implement now this

[203:53]

beginning of a cat in VS Code. Let me go

[203:55]

back to VS Code and close agree.c. Let

[203:57]

me reopen my terminal window and create

[203:59]

an actual cat in cat.c. And let's go

[204:02]

ahead and do this initially the wrong

[204:03]

way. Include standard io.h int main

[204:07]

void. And then inside of main let's go

[204:09]

ahead and print out quote unquote meow

[204:12]

back slashn semicolon. And then heck,

[204:14]

let me just copy paste. So this is

[204:16]

obviously the wrong way, the bad way to

[204:18]

do this because I'm literally copying

[204:19]

and pasting. But it is correct. If I

[204:21]

want the cat to meow three times, I can

[204:23]

make this cat. I can do dot slash cat and I

[204:26]

get my meow meow meow. But let's now

[204:29]

actually use some of those new building

[204:30]

blocks whereby we converted scratch to

[204:32]

C. And let me go back into this code and

[204:34]

I'll do the while loop first. So I could

[204:36]

instead have done int i equals 3. If we

[204:40]

count down initially while I is greater

[204:43]

than zero, then go ahead and print out

[204:46]

quote unquote meow back slashn. And then

[204:49]

do i plus plus or i minus minus?

[204:54]

I minus minus because we're starting at

[204:56]

three. Now let me go back to my terminal

[204:58]

window and clear it. Do make cat again.

[205:01]

Dot slash cat and we get three meows. And

[205:03]

this is now arguably better implemented.

[205:05]

What if I want to flip things around?

[205:08]

Well, I could now change uh maybe do it

[205:11]

the normal person way. I could start

[205:13]

counting at zero. And I can do this so

[205:15]

long as I is less than three. And I can

[205:18]

do this so long as I increment I on each

[205:21]

iteration. Now I can do make cat again.

[205:24]

Dot slash cat. Enter. And that too works. But

[205:27]

there's another way I could do this. If

[205:28]

I want to count like a normal person,

[205:30]

like start counting from one and count

[205:32]

up to and through three, I could do

[205:36]

this. But this is arguably this is

[205:38]

correct. It would iterate three times.

[205:40]

But it's a little confusing because now

[205:41]

I have to think about what it means to

[205:42]

be less than four. Okay, that means

[205:43]

equal to three. I could be a little more

[205:45]

explicit and say we'll do this while I

[205:47]

is less than or equal to three using yet

[205:49]

another one of those operators. So I can

[205:51]

make cat yet again, dot slash cat, and that

[205:54]

too would work. Now which of these is

[205:55]

correct or best? The convention

[205:58]

truthfully is in general in code to

[206:00]

start counting from zero and to count

[206:02]

up to but not through the value that you

[206:05]

want. So at least you see the starting

[206:07]

point and the ending point on the screen

[206:10]

if you will at the same time. But of

[206:12]

course I can condense all of this a bit

[206:13]

more and turn this whole thing into a

[206:15]

for loop. And I instead could do for

[206:17]

int i equals 0, i less than 3, i ++, and

[206:20]

then down here I could do print out

[206:22]

quote unquote meow. And if only because

[206:25]

I typed fewer keystrokes that time like

[206:27]

this feels a little nicer. It's a little

[206:29]

tighter and more uh efficient to create

[206:31]

even though the effect is the same.

[206:33]

Indeed, when I make this cat and do dot

[206:35]

slash cat a final time, this here too gives

[206:39]

me the three meows. So, what could go

[206:42]

wrong? Well, sometimes you might be

[206:45]

inclined to do something forever and we

[206:46]

might have done that in Scratch and

[206:47]

indeed we did when we had some things

[206:49]

bouncing back and forth off of walls and

[206:50]

so forth. You can achieve the same thing

[206:52]

in code. In fact, in C we could use a

[206:54]

while loop, but there is no forever

[206:56]

block. So while suffices, but recall

[206:58]

that the while loop expects a boolean

[207:00]

expression. And if I want to do

[207:02]

something forever, I essentially need an

[207:03]

expression here that's always true. So I

[207:05]

could do something stupid and uh

[207:07]

arbitrary like while two is greater than

[207:10]

one or while one is less than two. I

[207:13]

mean make a statement of fact that never

[207:14]

changes. Ergo, it's just going to run

[207:16]

forever. But if the whole goal here is

[207:18]

to do something forever and to get this

[207:20]

boolean expression to be true, the

[207:22]

convention in programming is just to

[207:23]

literally say while true. And that

[207:25]

implies and functionally means that you

[207:28]

will do this thing forever unless you

[207:30]

somehow prematurely break out of those

[207:33]

curly braces. More on that before long.

[207:35]

So if I want to meow forever, I could

[207:37]

now just do this. And this would be an

[207:39]

infinite deliberate loop. But unlike a

[207:42]

game where you might want it to keep

[207:43]

going and going and going for some time,

[207:44]

I'm not sure this is going to be the

[207:46]

best thing for us. Let's go ahead and

[207:48]

try this. So let me go ahead here and

[207:50]

include for good measure uh CS50's

[207:52]

library if only because um it too is

[207:55]

giving us features like uh bools. Uh

[207:58]

here I'm going to go ahead and say while

[208:00]

true and then inside of my curly braces

[208:03]

I'm just going to print out meow. Let's

[208:05]

go ahead, backslash n, semicolon. Let's go

[208:08]

ahead here and make cat one final time.

[208:12]

Let me go ahead here and do dot

[208:14]

slash cat. And

[208:18]

this is like the annoying cat game. Just

[208:19]

like meowing, meowing meowing endlessly.

[208:21]

Like I've now kind of lost control over

[208:23]

my terminal window. And mark my words,

[208:25]

at some point you might do this, too.

[208:26]

But let's go ahead and take a juicy

[208:27]

10-minute break here. Uh we have some

[208:30]

delicious blueberry muffins out in the

[208:31]

transept. Come back in 10 and we'll

[208:33]

figure out how to stop this here cat.

[208:36]

All right, so it's been about 10 minutes

[208:39]

and like VS Code is freaking out with

[208:41]

high codespace CPU utilization

[208:43]

detected. Consider stopping some

[208:45]

processes for the best experience. So

[208:46]

this is what happens when you have

[208:48]

intentionally or otherwise an infinite

[208:50]

loop in so far as I've been printing out

[208:53]

meow endlessly. And I was warned by my

[208:55]

colleague that I probably shouldn't let

[208:57]

this run too long because we might lose

[208:58]

control over the environment altogether.

[209:00]

But the answer to how to solve this is

[209:03]

going to be control C. So there's a few

[209:05]

cryptic keystrokes that you can use to

[209:07]

generally interrupt things as in this

[209:08]

way. And in fact, if I go back and

[209:10]

you'll see, yeah, I kind of lost control

[209:12]

over my code space here. I'm going to go

[209:14]

ahead and try to reload the window

[209:15]

altogether. But had I hit control C in

[209:18]

time, let's hope this doesn't now go off

[209:21]

the rails.

[209:23]

Control C would have been our friend. There we

[209:27]

go. And we're back. Okay. So, now that

[209:29]

we've got control over our so-called

[209:31]

code space again, how can we go about

[209:33]

making our meowing program a little more

[209:35]

dynamic in so far as let's like start

[209:36]

asking the user how many times they want

[209:38]

the cat to meow. Certainly, rather than

[209:40]

do it an infinite number of times and

[209:42]

even rather than do it three times

[209:44]

alone, I think we have all of these

[209:45]

building blocks thus far. So, let me go

[209:47]

ahead and stay in cat.c here and go

[209:49]

ahead and delete the body of the

[209:51]

contents of my main function. And let's

[209:52]

go ahead and do this. Let's give myself

[209:54]

an int. And I'll go ahead and call it n

[209:56]

for number. Though I could be more

[209:57]

verbose than that if I wanted. I'm going

[209:59]

to set it equal to the so-called return

[210:01]

value of get_int, which recall is going

[210:03]

to get an integer from the user. And

[210:05]

quote unquote, let's ask the user what's

[210:07]

n just like I asked earlier, what's x

[210:09]

and what's y, where n is the number of

[210:11]

times I want the cat to meow. Now, how

[210:13]

can I use this variable? Well, we have

[210:15]

that building block, too. I could use a

[210:16]

while loop or a for loop. And if I use a

[210:19]

for loop, I could do this. I could

[210:21]

initialize a variable i for integer, set

[210:24]

it equal to zero initially. I could then

[210:26]

do I less than not three this time but

[210:29]

n. So I can use that variable as a

[210:31]

placeholder inside of the loop to

[210:33]

indicate that I want to do this n times

[210:35]

instead of three. And on each iteration

[210:37]

through this loop I can do i ++. Of

[210:39]

course I could be counting down if I

[210:40]

prefer uh by using decrementation. But

[210:43]

logically I would say this is canonical.

[210:44]

Start at zero and go up to but not

[210:46]

through the value that you actually care

[210:48]

about. And I'll go ahead now and print

[210:50]

out quote unquote meow with a backslash n

[210:52]

semicolon. Back down to my terminal.

[210:55]

Make this cat again. Dot slash cat.

[210:57]

Enter. I'm prompted this time for n. I

[210:59]

can still give it three and I'm going to

[211:01]

get three meows this time. However, if I

[211:03]

run it again with dot slash cat and a

[211:05]

different input like four, of course,

[211:08]

I'm going to get four meows instead.

[211:10]

Now, what is get_int doing for me? Well,

[211:12]

it does a few things, similar to get_char

[211:14]

doing a few things for me. For instance,

[211:17]

suppose that instead of answering this

[211:19]

question correctly with a number n, I

[211:22]

say something random like dog that is

[211:24]

not an integer. And so the get in

[211:27]

function is designed to reject the

[211:28]

user's input implicitly and just

[211:30]

reprompt again and again. Uh I can try

[211:34]

bird and it's going to do this again. So

[211:35]

somewhere in the implementation of

[211:37]

get_int, there's a loop that we wrote that

[211:39]

does this kind of error checking for

[211:41]

you. But it doesn't do everything

[211:43]

because an integer is a fairly broad

[211:45]

category of numbers. It's like negative

[211:47]

infinity through positive infinity. And

[211:49]

that's a lot of possibilities. But

[211:51]

suppose I don't want some of those

[211:52]

possibilities. Suppose that it makes no

[211:54]

sense to ask the cat to meow like

[211:57]

negative one time. And yet the program

[211:59]

accepts that. It doesn't do anything or

[212:02]

anything wrong. But I feel like a better

[212:04]

designed program would say, "No, no, no.

[212:06]

Negative one makes no sense. Let's meow

[212:08]

zero or one or two or more times

[212:10]

instead." So, how can I begin to add

[212:12]

some of my own error checking and coerce

[212:15]

the user to give me the type of input I

[212:16]

want? Well, let me clear my terminal

[212:18]

window and go back up into my code. And

[212:20]

why don't I do something like this?

[212:21]

After getting n, let's just check if n

[212:25]

is less than zero. Because if so, I want

[212:28]

to prompt the user again. And I can

[212:30]

prompt the user again by doing n equals

[212:33]

get_int, quote unquote, what's n, question

[212:37]

mark semicolon. Now what's going on

[212:39]

here? Well on line six I'm doing two

[212:42]

things. I'm getting an integer from the

[212:44]

user and I'm not only storing it in the

[212:47]

variable n. I'm also technically

[212:49]

creating the variable n. So, I didn't

[212:51]

call this out earlier, but on line six,

[212:53]

when you specify the type of a variable

[212:54]

and the name of the variable, you are

[212:57]

creating the variable somewhere in the

[212:59]

computer's memory. And that's necessary

[213:00]

in C to specify the type. If the

[213:03]

variable already exists though, and you

[213:05]

just want to reuse it and change it

[213:07]

later on, it suffices as in line 9 just

[213:11]

to reference it by name. It would be

[213:12]

sort of stupid to specify the type again

[213:14]

because C already knows what type it is

[213:16]

because you told C what it is on line

[213:19]

six. So that's why lines six and nine

[213:22]

are a little bit different. So let's see

[213:23]

how this now works. Let me go back to my

[213:26]

terminal window and remake this cat. Let

[213:29]

me do dot slash cat again. Let me not

[213:31]

cooperate and type in like negative one

[213:33]

again. And notice I am reprompted this

[213:35]

time. Fine, fine, fine. Let's type in

[213:37]

three. And now it works. But you can

[213:39]

perhaps logically see where this is

[213:41]

going. Let me go ahead and run this

[213:43]

again: ./cat. Type in negative 1.

[213:46]

Type in negative one. And huh, it didn't

[213:49]

prompt me again. But that's consistent

[213:51]

with the code. If I hide my terminal

[213:53]

window here, you'll notice that I've got

[213:55]

one maybe two tries to get this question

[213:58]

right. And after that, there's no more

[214:00]

prompting of me. Now, you can kind of

[214:02]

imagine that this is probably not the

[214:04]

best way to do this. If I were to go

[214:06]

inside of line nine and then move the

[214:08]

cursor down and say, "Okay, well, if n

[214:10]

still, uh, is less than

[214:12]

zero." Well, let's just do get int again

[214:15]

and ask what's n question mark. And

[214:17]

heck, okay, if it's still less than

[214:20]

zero, well, let's just keep asking the

[214:22]

same, right? Why is this bad?

[214:25]

I'm repeating myself. I'm essentially

[214:27]

copying and pasting even though I'm

[214:28]

retyping. I mean, this just never ends,

[214:30]

right? Like, how many chances are you

[214:31]

going to give the user? In spirit, you'd

[214:34]

hope that they don't, uh, refuse to cooperate

[214:36]

this many times. But really to do this

[214:38]

the right way, we should probably prompt

[214:40]

them potentially as many times as it

[214:42]

takes to get the correct input. So this

[214:44]

is not the right path for us to be going

[214:46]

down. But of course, we have already now

[214:48]

this notion of like a loop whereby we

[214:50]

could just do this in a loop. Ask the

[214:52]

question once and maybe just repeat the

[214:55]

question again, but the same question.
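As a rough sketch of the version being set aside here, the code below asks once and re-prompts at most once. It is not the lecture's exact cat.c: `scripted_get_int` and its `answers` array are hypothetical stand-ins for the CS50 library's get int, assumed so the example runs without a keyboard, and `ask_with_one_retry` is an illustrative name.

```c
#include <stdio.h>

// Scripted answers standing in for an uncooperative user (an assumption of
// this sketch, so it runs without a keyboard or the CS50 library).
static const int answers[] = {-1, -1, 3};
static const int answer_count = sizeof(answers) / sizeof(answers[0]);
static int next_answer = 0;

// Hypothetical stand-in for get int: prints the prompt, then returns the next
// scripted answer (repeating the last one if the script runs out).
int scripted_get_int(const char *prompt)
{
    int index = next_answer < answer_count ? next_answer : answer_count - 1;
    next_answer++;
    printf("%s%i\n", prompt, answers[index]);
    return answers[index];
}

// Asks once and re-prompts at most once, like the if-based version.
int ask_with_one_retry(void)
{
    // Type plus name: this both creates n and stores the first answer in it
    int n = scripted_get_int("What's n? ");
    if (n < 0)
    {
        // n already exists, so it is reused by name alone, with no type
        n = scripted_get_int("What's n? ");
    }
    return n;
}
```

With the scripted answers -1, -1, 3, the function hands back -1: the second negative answer slips through because nothing checks it again.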

[214:57]

So how might I do this? Well, let me go

[214:59]

ahead and delete all of this. And let me

[215:01]

just try to spell this out logically.

[215:03]

So, I want to get a variable n from the

[215:06]

user. And let's go ahead as follows.

[215:10]

While true. I know how to do infinite

[215:13]

loops now. And even though that created

[215:14]

a problem for me with the cat, I bet we

[215:16]

can sort of terminate the loop

[215:18]

prematurely like I proposed earlier as

[215:20]

follows. I could do this int n equals

[215:23]

get int and ask the user again what's n

[215:25]

question mark. And then I could do

[215:28]

something like this. If n is less than

[215:31]

zero, well then you know what? Go ahead

[215:33]

and just continue on with the same loop.

[215:36]

Else if it is not the case that n is

[215:38]

less than zero, what do I want to do? I

[215:39]

want to break out of this loop. So this

[215:41]

is new syntax. This is something you can

[215:43]

do in C whereby if n is less than zero,

[215:46]

fine. Continue means go back to the

[215:48]

start of the loop and do the same exact

[215:50]

thing again. Otherwise, if you instead

[215:52]

say break, it means break out of the

[215:54]

loop and go to below whatever curly

[215:57]

brace is associated with that loop. So,

[215:59]

continue essentially brings you to the

[216:00]

top. Break brings you to the bottom, if

[216:03]

you will. So, logically, I think this is

[216:04]

right, but this code curiously isn't

[216:07]

quite going to work and get me a value

[216:09]

for n. Let me go ahead and open my

[216:11]

terminal window again. Let's make this

[216:13]

cat. And, huh, cat.c, line 19, character

[216:19]

25 is an error. Use of undeclared

[216:22]

identifier n. Well, what does that mean?

[216:25]

Again, cat.c, line 19. Let me hide my

[216:27]

terminal window. Highlight line 19. n is

[216:31]

being used in line 19, but I created it

[216:35]

in line 8. And so what's the problem?

[216:39]

Why is it not declared seemingly? Yeah,

[216:44]

>> because you are using like within the

[216:47]

loop that you wrote.

[216:48]

>> Yeah, this is a subtlety, but I'm using

[216:50]

I'm creating n inside of this loop. I

[216:53]

mean, literally between the curly braces

[216:55]

on lines 7 and 17. The implication of

[216:58]

which because of how C works is that

[217:01]

that variable only exists inside of that

[217:05]

while loop. This is a problem of what's

[217:06]

known as scope. The variable n only

[217:10]

exists inside of the scope of the while

[217:14]

loop in which it was declared. So how do

[217:16]

I actually fix this? Well, I need to

[217:19]

logically somehow declare that variable

[217:22]

n outside of the loop so that it exists

[217:24]

later on in the program as well. And

[217:27]

there's a few different ways I can fix

[217:28]

this, but the best way is probably to

[217:31]

move the declaration of n, so to

[217:33]

speak, the creation of n outside of the

[217:36]

curly braces and maybe kind of squeeze

[217:38]

it in here below line five. So still

[217:40]

inside of main, whatever that is. More

[217:41]

on that next week, but in the same curly

[217:44]

braces as everything else. So I can in

[217:47]

fact do this, and this is where the

[217:48]

syntax gets a little bit different. I

[217:50]

can solve this quite simply as follows.

[217:52]

I can go down to a new line six and just

[217:56]

say int n semicolon and that's it. This

[217:59]

declares a variable called n. It creates

[218:01]

a variable called n. And initially it

[218:02]

doesn't give it any value. So who knows

[218:04]

what's in there. More on that another

[218:05]

time. But now on line 9, I don't need to

[218:08]

recreate it. I just need to assign it a

[218:10]

value. And because now n has been

[218:13]

declared on line six and between the

[218:16]

curly braces on line five and all the

[218:19]

way down on 24. Now n is in scope so to

[218:23]

speak for the entirety of this code that

[218:26]

I've written. So let me reopen my

[218:28]

terminal window and clear that old

[218:29]

error. Let me do make cat again. Now the

[218:32]

error message is gone. Let me go ahead

[218:34]

and do ./cat. What's n? Now I'm back in

[218:37]

business and I can do three for meow

[218:38]

meow meow. Better yet, because I'm

[218:40]

inside of a loop now, watch that I can

[218:42]

do negative 1, negative 1, negative 1, negative

[218:44]

1, negative 2, negative 350.
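A minimal sketch of the loop as it stands at this point, with n declared before the loop so that it stays in scope afterward. As assumptions of this sketch: `scripted_get_int` and `answers` are hypothetical stand-ins for the CS50 library's get int so the code runs unattended, `ask_until_non_negative` is an illustrative name, and the value is handed back with return only so the result can be checked.

```c
#include <stdbool.h>
#include <stdio.h>

// Scripted answers standing in for an uncooperative user (an assumption of
// this sketch, so it runs without a keyboard or the CS50 library).
static const int answers[] = {-1, -1, -2, 3};
static const int answer_count = sizeof(answers) / sizeof(answers[0]);
static int next_answer = 0;

// Hypothetical stand-in for get int: prints the prompt, then returns the next
// scripted answer (repeating the last one if the script runs out).
int scripted_get_int(const char *prompt)
{
    int index = next_answer < answer_count ? next_answer : answer_count - 1;
    next_answer++;
    printf("%s%i\n", prompt, answers[index]);
    return answers[index];
}

int ask_until_non_negative(void)
{
    // Declared before the loop so that n is still in scope after the loop
    int n;
    while (true)
    {
        // Assignment only: writing "int n = ..." here would create a second n
        // that exists only between this loop's curly braces
        n = scripted_get_int("What's n? ");
        if (n < 0)
        {
            continue; // back to the top of the loop to ask again
        }
        else
        {
            break; // leave the loop and carry on below its closing brace
        }
    }
    // n is usable here only because it was declared outside the loop
    return n;
}
```

With the scripted answers -1, -1, -2, 3, the loop prompts four times and ends up with 3.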

[218:47]

Finally, I can cooperate with something

[218:49]

like three. That's because I'm in a loop

[218:51]

that by design may very well go

[218:54]

infinitely many times until the user

[218:57]

actually cooperates and lets me break

[218:59]

out of that exact loop. Now, I strictly

[219:02]

speaking don't need both continue and

[219:04]

break. I wanted to demonstrate that both

[219:05]

exist, but this is like twice as much

[219:07]

code as I actually need. If logically

[219:09]

I just want to break out of this loop if

[219:11]

and only if n is greater than or equal

[219:14]

to zero because I'm sort of comfortable

[219:16]

with the idea of zero meows but negative

[219:18]

makes no sense. Well, I can just flip

[219:19]

the logic. I can say if n is greater

[219:23]

than or equal to zero then go ahead and

[219:26]

break. And I've tightened up the code

[219:28]

further. I could technically do

[219:30]

something else. I could say something

[219:33]

like if n is less than zero, but wait a

[219:38]

minute. I want to negate that. You can

[219:39]

start to do tricks like this. An

[219:40]

exclamation point with some additional

[219:42]

parentheses. So you can invert the

[219:44]

logic. It's arguably a little hard to

[219:46]

read. Even though that would be

[219:48]

logically correct. So I'm just going to

[219:49]

say more explicitly as before. If n is

[219:52]

greater than or equal to zero, break out

[219:54]

of this here loop. All right. So this is

[219:57]

one way to use an infinite loop. But it

[219:59]

turns out there's another construct that

[220:00]

you can use altogether that is a

[220:03]

feature of C. Instead of using a while

[220:05]

loop and forcing it to be infinite by

[220:07]

using while true and then eventually

[220:08]

manually breaking out of it, there

[220:10]

exists another type of loop altogether

[220:12]

and that's called a do while loop. And

[220:14]

you can literally say the word do which

[220:16]

means do the following. Then you can do

[220:18]

exactly what we did before n equals get

[220:20]

int quote unquote what's n question

[220:22]

mark. So exactly like before but then

[220:25]

after those curly braces you use a while

[220:29]

keyword. So at the end of the loop

[220:31]

instead of the beginning and that's

[220:32]

where you put your boolean expression. I

[220:34]

want to do all of that while n is less

[220:38]

than zero. So you can kind of invert the

[220:40]

logic and now kind of tighten things up

[220:42]

further by just telling the computer do

[220:44]

the following. What's the following?

[220:46]

Everything in between those curly braces

[220:48]

while n is less than zero. And this

[220:50]

implicitly handles all of the

[220:52]

continuation and all of the breaking by

[220:54]

just doing what you've said. Do this

[220:57]

while this is true. But the difference

[220:59]

between this do while loop and a normal

[221:02]

while loop is literally that the

[221:04]

condition is checked at the bottom

[221:06]

instead of the top. So when you say

[221:09]

while parenthesis something that

[221:11]

question is asked first and then you

[221:13]

proceed maybe this condition is only

[221:16]

asked at the very end. And why is this

[221:18]

useful? Well, oftentimes when writing

[221:20]

programs where you want to do something

[221:21]

at least once like you obviously want to

[221:23]

ask the user this question at least

[221:24]

once. There's no point in asking a

[221:26]

question like while true or while

[221:28]

anything else. You should just do it and

[221:30]

then you should do it again if the

[221:33]

expression evaluates to true and tells

[221:35]

you to do something. Now you haven't

[221:37]

played with these loops yet most likely

[221:39]

unless you have programmed before. Uh

[221:41]

there's a fun sort of meme that's

[221:42]

apropos of this moment. So let's see if

[221:44]

this maybe causes a few chuckles. If you

[221:47]

remember Looney Tunes here,

[221:50]

is this funny for people in the know?

[221:54]

There we go. Thank you. Okay, this

[221:57]

doesn't make sense. It eventually will.
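To make the contrast concrete, here is a small sketch, under stated assumptions, of the do while version of the prompt plus two tiny counters that show where each kind of loop checks its condition. `scripted_get_int`, `answers`, and all of the function names are hypothetical, introduced only so the example runs without the CS50 library or a keyboard.

```c
#include <stdio.h>

// Scripted answers standing in for the user (an assumption of this sketch).
static const int answers[] = {-1, -2, 3};
static const int answer_count = sizeof(answers) / sizeof(answers[0]);
static int next_answer = 0;

// Hypothetical stand-in for get int: returns the next scripted answer
// (repeating the last one if the script runs out).
int scripted_get_int(const char *prompt)
{
    int index = next_answer < answer_count ? next_answer : answer_count - 1;
    next_answer++;
    printf("%s%i\n", prompt, answers[index]);
    return answers[index];
}

// Ask at least once, then keep asking while the answer is negative.
int ask_with_do_while(void)
{
    int n;
    do
    {
        n = scripted_get_int("What's n? ");
    }
    while (n < 0);
    return n;
}

// while: the condition is checked first, so the body may run zero times.
int while_runs(int n)
{
    int runs = 0;
    while (n < 0)
    {
        runs++;
        n++;
    }
    return runs;
}

// do while: the body runs once before the condition is ever checked.
int do_while_runs(int n)
{
    int runs = 0;
    do
    {
        runs++;
        n++;
    }
    while (n < 0);
    return runs;
}
```

Starting from 0, where the condition n < 0 is already false, the while body runs zero times and the do while body runs once. Starting from -2, both run twice.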

[221:58]

And it still might not be funny, but it

[222:00]

will at least make sense. And it

[222:02]

illustrates the difference between a do while loop and a

[222:03]

while loop. Like, the roadrunner is

[222:04]

stopping because he's checking the

[222:06]

condition. While not on edge, he'll run.

[222:08]

But if he is on the edge, he's not going

[222:10]

to proceed further. But of course, the

[222:11]

coyote here, he's going to do running no

[222:14]

matter what. And then, only too late,

[222:16]

does he check, haha, whether he's still on the

[222:18]

edge. All right. So, ah, thank you. All

[222:21]

right. Now, you're cool. All right. So,

[222:24]

many more memes will now make sense as a

[222:26]

result. But let's go ahead and revisit

[222:28]

this code and maybe do something a

[222:30]

little bit different here whereby we no

[222:32]

longer want to just fuss around with

[222:33]

some of these uh conditionals and these

[222:36]

loops. Let's actually make the software

[222:37]

a little better designed. And to do

[222:39]

this, we'll revisit an idea that we

[222:40]

touched on last week that had to do with

[222:42]

problem set zero, which was like create

[222:44]

your own function. Like C does not come

[222:46]

with everything you might want. CS50's

[222:47]

library is not going to come with

[222:48]

everything you might want. And at the

[222:50]

end of the day, a lot of programming is

[222:51]

about abstracting away your ideas. So

[222:54]

you solve a problem once and then reuse

[222:56]

it, reuse it, reuse it. And heck, you

[222:58]

can package it up in a so-called library

[223:00]

like we have and let other people use it

[223:02]

as well. So here for instance in Scratch

[223:04]

is how we could have implemented the

[223:06]

notion of meowing as by getting the cat

[223:08]

to play the sound meow until done. We

[223:10]

abstracted it away and then we had a

[223:12]

magical new puzzle piece called meow. In

[223:14]

C, this is going to be a little weird

[223:16]

today but next week these details will

[223:18]

start to make more sense. You would

[223:20]

instead do the following. Literally type

[223:23]

void the name of the function you want

[223:25]

to create and then void again in

[223:26]

parentheses. For now, know that this is

[223:29]

the return value of the function. So

[223:32]

void means it returns nothing. This is

[223:35]

the input to or the arguments to the

[223:37]

function. Void means it takes no inputs.

[223:39]

And that makes sense because literally

[223:40]

meow doesn't return anything. It doesn't

[223:43]

take anything. It just meows. It has a

[223:46]

so-called side effect audibly last week.

[223:48]

So this means, hey C, invent a function

[223:51]

called meow that takes no input,

[223:53]

produces no output, but does have a side

[223:55]

effect of printing meow on the screen.
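In C, the function just described might look like the sketch below. It assumes printf from stdio.h as the way of producing the side effect, which matches how the transcript later describes saying "meow" with a backslash n.

```c
#include <stdio.h>

// First void: meow returns nothing. Second void: meow takes no inputs.
void meow(void)
{
    // The side effect: print meow, then move to a new line
    printf("meow\n");
}
```

Calling meow() from another function prints one line reading meow and gives nothing back to the caller.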

[223:57]

Meanwhile, if I wanted to do something

[223:59]

like this in code last week where I

[224:02]

meowed three times, well, that's fine.

[224:03]

We have the building blocks for this.

[224:05]

And here's where inventing your own

[224:06]

function starts to get more compelling.

[224:08]

I can abstract away the notion of

[224:09]

meowing now. Like, this doesn't come

[224:11]

with C. It doesn't come with the CS50

[224:12]

library. I just created in the previous

[224:14]

code this meow function. So, I can

[224:17]

in code, with a for loop and that new

[224:19]

function meow three times. But I can

[224:21]

abstract this away further. Recall that

[224:23]

the refinement in Scratch last time was

[224:25]

this. I could edit the new function and

[224:28]

I can say it actually does take an input

[224:29]

otherwise known as an argument called n.

[224:32]

And I clarified that this means to meow

[224:33]

some number of times. And then inside of

[224:35]

those Scratch blocks, I repeated n times

[224:39]

the meowing act. Well, in C, I can

[224:41]

achieve the exact same thing. Even

[224:43]

though it's going to look a little more

[224:44]

cryptic, but meow still returns nothing.

[224:47]

It has an audible or visual side effect,

[224:49]

but it doesn't return a value. But this

[224:51]

version does take an input. And this

[224:54]

might look a little weird, but just like

[224:55]

before, when you create a variable in C,

[224:57]

you specify the type and the name. When

[225:00]

you invent your own function in C and it

[225:02]

takes one or more inputs, aka arguments,

[225:06]

you specify the type and the name of

[225:08]

those as well. No semicolons up there,

[225:11]

just inside of the parentheses. And

[225:12]

you'll get used to with practice this

[225:14]

convention. But the rest of this code is

[225:16]

exactly the same, except instead of

[225:18]

three, I'm now using n. So again, I'm

[225:21]

just composing the exact same ideas as

[225:23]

last week, even though it looks way more

[225:25]

cryptic this week, but it will become more

[225:27]

and more familiar with more and more

[225:29]

practice. So how can I go about

[225:32]

implementing this myself? Well, let me

[225:34]

propose that we do something like this.

[225:37]

Let me go back to VS Code here and let

[225:39]

me go ahead and let's really delete most

[225:42]

of the code that I've written inside of

[225:44]

main. And let me just suppose for the

[225:46]

moment that meowing exists. And I'm

[225:48]

going to go ahead and say for the first

[225:49]

version for int i equals zero i less

[225:52]

than three. So we're not going to take

[225:53]

input yet. i ++. And then I'm going to

[225:56]

go ahead here and say meow is what I

[225:58]

want this function to do. Now if I

[226:00]

scroll back up, you'll see there's no

[226:01]

definition of meow yet. So I'm going to

[226:04]

invent that too. I'm going to go up here

[226:06]

and say void, uh, meow, void. And again

[226:09]

this first version means no input, no

[226:12]

output, just a side effect. And that

[226:14]

side effect super simply is going to be

[226:15]

to say just quote unquote meow with a

[226:17]

backslash n. And now if I go and open my

[226:20]

terminal window, clear it from before,

[226:23]

do make cat, so far so good. ./cat, we're

[226:26]

back in business, but I've abstracted

[226:28]

the function away. Now, much like last

[226:30]

week where I sort of dramatically

[226:32]

dragged the meow definition way down to

[226:34]

the bottom of the screen just to make

[226:35]

the point that you don't need to see it

[226:37]

anymore. Out of sight, out of mind. Let

[226:39]

me sort of try to do the same here. Let

[226:41]

me highlight and delete that and like go

[226:44]

way way way down arbitrarily just to be

[226:46]

dramatic and paste it near like the

[226:48]

hundredth line of code and scroll back up.

[226:50]

Now out of sight, out of mind. I've

[226:51]

already implemented the idea of meowing.

[226:53]

We don't need to see or talk about it

[226:55]

again. But there is a caveat in C. When

[226:58]

I now clear my terminal and make this

[227:00]

cat, now I've introduced a problem and

[227:02]

there are, like, more problems, it seems, than

[227:04]

code. Let me scroll back up to the first

[227:06]

such error and you'll see this on line

[227:09]

nine of cat.c, character 9, there's

[227:11]

an error. Call to undeclared function

[227:14]

meow and then something fairly arcane,

[227:16]

but that means that meow is no longer

[227:18]

recognized as an actual function. I know

[227:21]

that it doesn't come from CS50.h, and I

[227:23]

know it doesn't come from stdio.h.

[227:24]

It's just down there. But why is the

[227:27]

compiler being kind of dumb here? Uh,

[227:30]

yeah.

[227:31]

function.

[227:34]

>> Yeah, because insofar as the first

[227:35]

version worked like logically it would

[227:37]

seem that putting it at the bottom was

[227:39]

just a bad idea because C compilers are

[227:42]

fairly simplistic. Like they won't

[227:45]

proactively do you the favor of like

[227:46]

checking all the way down at the bottom

[227:48]

of the file. They're going to take you

[227:49]

literally. So if meow doesn't exist as

[227:52]

of line 9, that's on you. Like that is

[227:54]

an error. So I could fix this by just

[227:57]

undoing what I did and move it way back

[227:59]

up to the top. But let me argue that in

[228:01]

general when writing C programs, the

[228:03]

main function, which I keep using and

[228:05]

we'll talk more about next week, is

[228:06]

literally meant to be the main part of

[228:08]

your code. And so it kind of stands to

[228:09]

reason that it should be at the top

[228:11]

because when you open the file, it'd be

[228:12]

nice to see the main program that you

[228:14]

care about, the main function. So

[228:16]

there's an argument to be made that it's

[228:17]

a little annoying to have to put my

[228:19]

functions all at the top, which is just

[228:20]

going to push main further and further

[228:22]

down. So there is a solution, and this

[228:24]

is, dare I say, the only time copying and

[228:27]

pasting is appropriate. Let me delete

[228:28]

most of these blank lines which is

[228:30]

unnecessarily dramatic and just move it

[228:33]

below main, as over here. The

[228:38]

solution here, though, is to do

[228:40]

this: copy the first line of the meow

[228:44]

function, its so-called signature, and

[228:47]

then just put that one line and only

[228:49]

that one line with a semicolon above

[228:51]

main. And this is what's known as a

[228:54]

prototype. So a prototype is just a bit

[228:57]

of a hint to the compiler, a promise if

[228:59]

you will, that hey compiler, there will

[229:02]

exist a function called meow. It takes

[229:04]

no input and it returns no output

[229:06]

semicolon. And it's on the honor system

[229:08]

that it will eventually exist later in

[229:10]

the file. We'll talk more about this

[229:11]

next week why that works, but this is

[229:13]

sort of a promise to the compiler that

[229:15]

it will eventually be defined. Now, what

[229:18]

I've done here on line four as an aside

[229:20]

is what's generally known as a comment.

[229:22]

I just wanted to put on the screen

[229:23]

exactly what I was verbalizing. Anything

[229:25]

in C that starts with slash slash is a note to

[229:28]

self, like a sticky note in Scratch,

[229:30]

which is just for the human, not for the

[229:31]

computer. And it's a way of reminding

[229:32]

yourself or someone else what's going on

[229:34]

on that line or those lines of code. But

[229:37]

I'll go ahead and delete it for now as

[229:38]

unnecessary because now if I go back

[229:40]

into my terminal and clear those errors,

[229:42]

make this cat again, now it does work

[229:45]

because the, uh, meow function has

[229:48]

been defined exactly where it should be.
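The layout described here, with a prototype on top, the caller in the middle, and the definition at the bottom, can be sketched roughly as follows. One assumption to flag: the caller is named `run_cat` rather than main, purely so the snippet can be dropped into a file that already has a main; in the lecture's cat.c that role is played by main itself.

```c
#include <stdio.h>

// Prototype: a promise to the compiler that meow will be defined further down
void meow(void);

// Stand-in for the lecture's main function (run_cat is a hypothetical name)
void run_cat(void)
{
    for (int i = 0; i < 3; i++)
    {
        meow();
    }
}

// The definition itself can now live below its caller
void meow(void)
{
    printf("meow\n");
}
```

Without the prototype line, the call to meow inside the loop would be reached before the compiler had seen any mention of meow, which is what the "call to undeclared function" error was reporting.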

[229:51]

And now I can make the new version of

[229:53]

this uh cat even better. I could change

[229:56]

the function meow to take a variable n

[229:59]

as input for the number of times. And

[230:02]

then in here I could do something like

[230:03]

my for loop for int i equals zero i less

[230:07]

than n i ++. And then in this for loop I

[230:12]

can print out quote unquote meow. And

[230:14]

then I'm going to have to change this

[230:16]

too because I have to copy and repaste

[230:17]

it if you will or just manually fix

[230:19]

that. But now I can get rid of all of

[230:21]

this and do meow three for instance. And

[230:25]

this now will be the second version of

[230:27]

the Scratch code, if you will. Make cat,

[230:29]

still going to work exactly the same.

[230:31]

Meow meow meow. But now I've implemented

[230:33]

my own function that does take input

[230:35]

even though it doesn't happen to return

[230:38]

any output.
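A sketch of this second version, in which meow takes the number of times as input and the loop moves inside it. As before, `run_cat` is a hypothetical stand-in for main so the snippet does not define a second main.

```c
#include <stdio.h>

// The prototype has to change along with the definition: meow now takes an int
void meow(int n);

// Stand-in for main (a hypothetical name used only in this sketch)
void run_cat(void)
{
    meow(3);
}

// Still returns nothing, but now takes one input: how many times to meow
void meow(int n)
{
    for (int i = 0; i < n; i++)
    {
        printf("meow\n");
    }
}
```

The caller shrinks to a single line, and the details of how the meowing is repeated are tucked away inside meow.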

[230:41]

All right. Questions

[230:43]

on any of these examples just yet?

[230:47]

Confusion?

[230:50]

All right, let me add one other feature

[230:52]

to this to demonstrate that we can take

[230:53]

not only input but actually produce

[230:56]

output if we want. If I go back into

[230:59]

this code here, let me propose that it's

[231:02]

a little silly to be hard coding that is

[231:04]

fixating three. It'd be nice to get

[231:06]

input from the user. So I could do this.

[231:09]

I could use int n equals get int and say

[231:13]

something like what's n question mark

[231:16]

and then I could pass n in if only to

[231:19]

demonstrate a couple of things. So one

[231:21]

now the program is dynamic. I'm going to

[231:23]

ask the user how many times to meow and

[231:25]

I'm going to pass in that value n. Now

[231:27]

this deliberately is confusing at the

[231:30]

moment because wait a minute I got n

[231:32]

defined here used here but then

[231:34]

redefined here and then reused here. So

[231:38]

it turns out that even if you create n

[231:41]

up here and use the name n, no other

[231:44]

functions can see it for that same issue

[231:46]

of scope. So for instance, suppose I

[231:49]

didn't quite remember this and I sort of

[231:51]

naively just said void, since meow doesn't

[231:53]

need to take any inputs because heck

[231:55]

n is already defined in main.

[231:58]

Let me go ahead and open my terminal and

[232:00]

clear it. Make cat and see what error

[232:03]

comes out here. Well, error cat. Oh,

[232:07]

sorry. I made two mistakes here. So

[232:09]

I also have to change the prototype up

[232:11]

here to say void which means again meow

[232:14]

takes no inputs. Let me go ahead now and

[232:16]

rerun make cat. And there we have an

[232:20]

undeclared identifier again, n. So in cat.c,

[232:23]

line 14 which is here it doesn't like

[232:25]

that I'm using n. But wait a minute I

[232:27]

created n here but for the same logic as

[232:30]

earlier. That's fine. You created n on

[232:32]

line 8. But where does n exist? In what

[232:34]

scope?

[232:38]

Yeah, only between the curly braces,

[232:40]

which are on lines seven and 10. So by the

[232:42]

time you get down to 14, it's out of

[232:44]

scope, so to speak. So it just doesn't

[232:46]

work. So the solution is exactly what I

[232:48]

did the first time. I can pass it into

[232:50]

meow as input, and I have to tell C to

[232:53]

expect that input. And I can use the

[232:55]

same name, but arguably that's going to

[232:57]

get confusing sometimes. But let me do

[232:59]

this. Let me go back into my code. Let

[233:02]

me undo this change such that now meow

[233:04]

does take an input, but instead of just

[233:06]

calling it n and using n everywhere for

[233:07]

number, this is crazy. Let's just call

[233:09]

this like times. So meow takes some

[233:12]

number of times and then it uses that

[233:15]

value. And now I'm passing in on line 9

[233:18]

n, but in the context of the meow

[233:21]

function on lines 12 onward, that same

[233:25]

variable n is now referred to as times

[233:28]

because you're passing it in as input

[233:29]

and giving it its own name. And that's

[233:31]

totally your prerogative. It's just a

[233:34]

matter of scope. I mean, I could have

[233:36]

called it M or some other letter of the

[233:38]

alphabet, but times is even more clear

[233:40]

because that's the number of times I

[233:41]

want the cat to meow. But again, the

[233:44]

whole point here is just this matter of

[233:47]

scope.
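The naming point can be sketched as below: the caller's variable is n, while inside meow the same value goes by times. This is a sketch under assumptions rather than the lecture's file: `scripted_get_int` is a hypothetical stand-in for the CS50 library's get int that always answers 3, and `run_cat` stands in for main.

```c
#include <stdio.h>

void meow(int times);

// Hypothetical stand-in for get int: prints the prompt and always answers 3,
// so the sketch runs without a keyboard or the CS50 library
int scripted_get_int(const char *prompt)
{
    printf("%s3\n", prompt);
    return 3;
}

// Stand-in for main (a hypothetical name used only in this sketch)
void run_cat(void)
{
    // n exists only between these curly braces
    int n = scripted_get_int("What's n? ");
    // Its value is passed in as input, because meow cannot see n itself
    meow(n);
}

// Inside meow, the value that was passed in is known by its own name, times
void meow(int times)
{
    for (int i = 0; i < times; i++)
    {
        printf("meow\n");
    }
}
```

Replacing times with n inside meow, without also naming the parameter n, would bring back the "undeclared identifier" error, since the n created in the caller is out of scope there.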

[233:49]

All right. So, let's take a higher level

[233:51]

look now at some of the things we've

[233:52]

been thinking about and then we'll do a

[233:54]

final deep dive or two on some of the

[233:56]

some of the problems that we can

[233:58]

solve with all of these building blocks

[233:59]

and some of the problems that we're sort

[234:00]

of ignoring for now. So, when it comes

[234:02]

to writing good code, CS50 and really

[234:04]

the world in general tends to focus on

[234:06]

these kinds of axes. Correctness,

[234:08]

design, and style. What does this mean?

[234:10]

Correctness just means does the code

[234:11]

work the way it's supposed to? In the

[234:13]

context of a class, it should do exactly

[234:14]

what the homework assignment aka problem

[234:16]

set tells you to do. In the real world,

[234:18]

it should do exactly what someone

[234:20]

decided the software should do, the

[234:21]

product manager, the CEO, or the like.

[234:23]

Correctness just means it behaves as it

[234:25]

should. That's different though from how

[234:27]

well designed the code might be. And

[234:29]

we've seen that a few times. I've had

[234:31]

some simplistic examples in Scratch and

[234:32]

C that were 100% correct. Like it did

[234:34]

the right thing logically, but I was

[234:36]

wasting the computer's time. I was

[234:38]

wasting the human's time by asking more

[234:39]

boolean expressions than I needed to and

[234:42]

so forth. So design is more about like

[234:44]

in the world of English, like not

[234:46]

only saying things that are correct but

[234:48]

doing it well like in making a good

[234:50]

cogent argument not just one that

[234:52]

happens to be correct. Style meanwhile

[234:55]

is the third axis on which we might

[234:56]

evaluate the quality of someone's code

[234:58]

and that's more of the aesthetics like

[235:00]

is everything pretty printed that is

[235:02]

nicely indented? Are variables

[235:04]

well-named and not just called X, Y, Z

[235:06]

arbitrarily or something like that. So

[235:08]

style matters really to other humans,

[235:10]

not to the computer, but to other

[235:11]

humans. And to illustrate these, you'll

[235:14]

see that in problem set one onward,

[235:16]

you'll be given a number of tools that

[235:17]

you can use. So one of those tools is

[235:19]

called check50. And in each problem set

[235:21]

problem in C and Python and other

[235:23]

languages, you'll be shown how you can

[235:25]

test your own code. And you can

[235:26]

literally run a command that CS50

[235:28]

created called check50. You'll then

[235:30]

specify what's called a slug, which just

[235:32]

means a unique identifier for that

[235:33]

homework problem, and you'll get, uh,

[235:35]

quick feedback on whether or not your

[235:37]

code is correct. It doesn't mean it's

[235:39]

well implemented or well-designed or

[235:41]

pretty that is well stylized. But at

[235:43]

least that's the first gauntlet in

[235:44]

getting good code submitted. Design

[235:47]

though is much more subjective. Design

[235:48]

is something you get feedback on from a

[235:50]

human for instance in section or a

[235:51]

teaching assistant or in software. You

[235:54]

can actually see atop VS Code there's

[235:56]

a couple of buttons that I haven't yet

[235:57]

used but could. design50 is built on

[236:00]

top of the CS50 duck whereby if you have

[236:02]

a program open in a tab, you click

[236:04]

design50, you will get ChatGPT-like

[236:08]

advice on how you can improve not the

[236:10]

correctness of that code but the design

[236:11]

of that code, the quality thereof, which

[236:14]

is a bit more subjective and modeled

[236:15]

after what a good teaching assistant

[236:17]

might say. style50, meanwhile, is a

[236:20]

third tool that will provide you with

[236:21]

feedback on the style of your code and

[236:23]

will show you on the left what your code

[236:24]

looks like and on the right what your

[236:26]

code really should look like insofar

[236:28]

as it should be consistent with what

[236:29]

we've taught in class and consistent

[236:31]

with CS50's so-called style guide. And

[236:33]

those of you who have some prior

[236:34]

programming experience undoubtedly won't

[236:36]

like some of CS50's stylistic choices.

[236:38]

And that's going to be the case in the

[236:40]

real world, too. But as I alluded to

[236:42]

earlier, in typical companies, you would

[236:43]

have an official style guide or tool to

[236:45]

which everyone adheres so that

[236:47]

everyone's code actually looks the same

[236:49]

as everyone else's even though people

[236:50]

have contributed different solutions to

[236:53]

problems. So correctness, design, style

[236:55]

is not only how we but really the world

[236:58]

at large tends to evaluate the quality

[237:00]

of code and we do it by way of these

[237:01]

CS50 specific tools here. All right, how

[237:06]

about one final flourish then to this

[237:09]

here program? Back in VS Code, I've got

[237:12]

a correct solution right now. Um, it's

[237:14]

well styled, I'll stipulate, even though

[237:16]

it could stand to have some more

[237:18]

comments. So, for instance, I could do

[237:19]

something like this, like meow uh some

[237:23]

number of times, a comment to myself. Or

[237:26]

up here I could say something like uh

[237:28]

get uh a number from user just to remind

[237:32]

myself and my TA or my colleague what it

[237:35]

is this code is doing. But what more

[237:37]

could I do in the way of design? Well,

[237:39]

this function here, get_int, will indeed

[237:41]

get me an integer but not just positive

[237:43]

or zero but negative. And I could go in

[237:46]

and add a bunch of code like before like

[237:48]

I could actually do instead of this line

[237:50]

I could do something like int n

[237:53]

semicolon do the following. All right. n

[237:55]

equals get_int and then I can say what's

[237:58]

n question mark and then after that I

[238:00]

can do something like while n is less

[238:03]

than zero keep doing that so I can have

[238:06]

a pretty verbose implementation of

[238:08]

getting user input or I can implement

[238:11]

another function of my own that only

[238:14]

gets a positive integer or non-negative

[238:16]

integer from the user for instance I

[238:19]

might do something like this uh I could

[238:22]

uh declare at the bot uh maybe below my

[238:25]

main function a function like this uh

[238:27]

int uh how about get n and then inside

[238:32]

of this I might say void because I'm not

[238:34]

going to pass in any input then inside

[238:37]

of this function is where I'm going to

[238:38]

do int n, do while, uh, n equals get_int

[238:43]

quote unquote what's n question mark and

[238:46]

then down here I'm going to do while n

[238:48]

is less than zero but rather than do

[238:52]

something immediately with n because I'm

[238:54]

no longer inside of my so-called main

[238:56]

function. What I'm going to do, which is

[238:58]

new, is return this value n. And notice

[239:02]

that this notion of returning a value,

[239:04]

which is the first time I've done this

[239:05]

explicitly, is consistent with this

[239:07]

little hint here on line 19, which

[239:09]

implies that this get n function, which

[239:11]

I'm inventing, is going to return not

[239:13]

void, which means nothing, but an

[239:15]

integer. And that's the whole purpose of

[239:17]

this function in life. Now, if I scroll

[239:19]

back down here, I can get rid of this

[239:21]

whole block of code and just say get n

[239:27]

from the user and then I can immediately

[239:29]

call meow with that value. I need to do

[239:32]

one other thing. I need to highlight

[239:34]

this line of code here and I'm going to

[239:36]

go ahead and add another prototype up

[239:38]

top, which is the only time again for

[239:39]

now that copy paste is encouraged and uh

[239:42]

best to do. So, I've invented my own

[239:45]

function getn. The whole point being now

[239:47]

I have this sort of abstraction here of

[239:49]

a function whose sole purpose in life is

[239:51]

to get me not just an integer but one

[239:54]

that is zero or positive and not

[239:57]

negative. If I open my terminal window,

[240:00]

clear the mess from before, make this

[240:02]

cat, ./cat. What's n? 3. I'm now

[240:06]

back in business. And again, we've

[240:08]

essentially translated from Scratch last

[240:10]

time into C this time. Exactly how we

[240:13]

might modularize now the code, abstract

[240:15]

away these lower level details and

[240:17]

ultimately create my own function that

[240:19]

as before takes not only arguments but

[240:21]

in this case has not only side effects

[240:24]

or doesn't have side effects but rather

[240:25]

a return value this time.

[240:29]

All right. So as you walked in we had a

[240:31]

little walkthrough of Super Mario

[240:32]

Brothers playing from yesteryear, which

[240:34]

was a side-scrolling game in which Mario

[240:36]

would jump down and go up down left

[240:37]

right and try to collect coins and make

[240:39]

it to the end of the level. There's a

[240:40]

lot of obstacles throughout this kind of

[240:42]

game uh whereby the world might look a

[240:44]

little something like this. Like there's

[240:45]

a pit that Mario's got to jump over and

[240:47]

then there's these coins hidden

[240:48]

typically behind these question marks

[240:50]

that he can jump up and hit his head

[240:51]

with and actually accrue points. Now,

[240:53]

we're not going to do anything graphical

[240:54]

just yet. We're leaving graphics behind

[240:56]

for now in the form of Scratch. But with

[240:58]

C, we can implement some of these ideas.

[241:00]

For instance, if I were to write code to

[241:02]

generate just this uh row of four

[241:05]

question marks, I dare say there's a

[241:06]

bunch of ways we can do this. In other

[241:08]

words, let's see if we can't use all of

[241:10]

today's building blocks to start

[241:11]

implementing our own tiny version of

[241:13]

Super Mario Brothers in a file, say,

[241:15]

called mario.c. So, let me open and

[241:18]

clear my terminal window. Let me run

[241:19]

code mario.c. And let's just try to do

[241:22]

something super simple like print four

[241:24]

question marks in a row. Well, to do

[241:26]

this, I need printf. So, I'm going to

[241:28]

include stdio.h. I'm then going to

[241:30]

do int main(void). More on that next

[241:32]

time. And inside of main, my default

[241:35]

function that just automatically as

[241:37]

before gets called for me. I'm going to

[241:38]

print out the simplest possible

[241:40]

implementation just print out four

[241:42]

question marks like that. So no need per

[241:44]

se for a loop just yet. But I think we

[241:47]

can go down that rabbit hole too. Let me

[241:49]

go down into my terminal window. Make

[241:51]

this version of mario, ./mario.

[241:54]

Enter. And voila, we have a very black

[241:56]

and white version textual version of

[241:59]

four question marks in the sky. Now I'm

[242:01]

kind of cheating here by just

[242:03]

hard-coding four question marks. What if I

[242:04]

wanted not four but three or five or

[242:06]

some other number? Well, we could

[242:08]

do that with a loop too. So let me

[242:10]

change this code here and do something

[242:12]

like this: for int i equals, say, zero; i

[242:15]

less than, say, four for now; i++. Then

[242:18]

inside of this loop I can print out one

[242:21]

question mark at a time. Semicolon. Now

[242:24]

let me go back to the bottom. Make this

[242:26]

version of mario, ./mario. Enter. And

[242:28]

voila. It's not actually correct this

[242:32]

time. So why am I getting a column

[242:35]

instead of a row with this here change?

[242:37]

Yeah.

[242:41]

>> Yeah. So I've got I foolishly included

[242:43]

the backslash n after each question

[242:45]

mark. Okay. So that seems like an easy

[242:47]

fix. Let me get rid of that. Let me now

[242:49]

recompile Mario. Rerun Mario. And now so

[242:53]

close. Now I've just done something

[242:55]

stupid. All right. I need the

[242:56]

backslash n. So, I think I do want this here.

[242:59]

Or

[243:01]

what do you propose instead?

[243:07]

>> Yeah, I should really put the backslash

[243:08]

n outside of the loop. So, once I'm

[243:10]

done printing all of the question marks,

[243:12]

then I get the backslash n. And that's

[243:14]

fine, even though we haven't seen this

[243:15]

before. Backslash n is an escape

[243:17]

sequence that you can certainly print by

[243:19]

itself. So, I do quote unquote

[243:21]

backslash n outside of the loop below those

[243:23]

curly braces. Now, if I do make mario,

[243:26]

./mario, now I get the four uh

[243:28]

question marks in a row as well as the

[243:31]

new line at the very end. So, again,

[243:33]

kind of a little baby exercise, but

[243:34]

demonstrative of how you can just take a

[243:36]

few different techniques, a few

[243:38]

different building blocks we've used to

[243:39]

compose a correct solution to what a

[243:41]

moment ago was a brand new problem.

[243:44]

Well, let's try another. So later on in

[243:45]

Super Mario Brothers when you go into

[243:47]

sort of the underground world, you see

[243:48]

or rather it's still above ground, you

[243:50]

see a column of uh bricks like this that

[243:53]

he has to jump over. So those here, how

[243:55]

might we make a column? Well, we kind of

[243:57]

had that solution already. And in fact,

[243:58]

if I go back to VS Code here and just

[244:01]

change this version of Mario, I think we

[244:03]

can design this thing to be pretty

[244:05]

simply the same. I is less than three

[244:07]

though. And I do want to put the

[244:09]

backslash n at the end there. Make mario,

[244:12]

./mario. And albeit textual, I've got my

[244:14]

column of three uh of let's see, I don't

[244:19]

want question marks. Let's make this a

[244:20]

little better. Maybe we'll use the hash

[244:21]

symbol because that kind of sort of

[244:23]

looks like a square. So, make mario,

[244:25]

./mario. Okay, now we're back in

[244:27]

business. But let's make it more

[244:29]

interesting by going into Mario's

[244:30]

underground now. And here's the third

[244:31]

and final Mario problem whereby we want

[244:33]

to implement like this 3x3 grid of

[244:36]

bricks circled here. So, this one's

[244:39]

interesting because we've never done

[244:40]

something in two dimensions. I did

[244:42]

horizontal, I did vertical, but we

[244:44]

haven't really composed those ideas into

[244:46]

the same. So, let me now think a little

[244:49]

harder this time about how I can print

[244:51]

out row, row, row. And this is where if

[244:55]

you have in your mind's eye any

[244:56]

familiarity with like old school

[244:58]

typewriters, it's kind of the same idea

[244:59]

where you want to print a row of bricks,

[245:01]

then go back to the beginning, a row of

[245:03]

bricks, then go back to the beginning,

[245:04]

and a row of bricks. And that's kind of

[245:06]

what printf has always been doing for

[245:07]

us. It's printing line by line by line

[245:09]

of text. It's not jumping around. So, we

[245:12]

can leverage that perhaps as follows.

[245:14]

Let me go into my main function here.

[245:16]

And if I want to print out something

[245:18]

two-dimensional, let me kind of think

[245:20]

about it as rows and columns. So, maybe

[245:22]

I could do this for int i equals 0, i

[245:25]

less than 3, i ++. Why? Well, I want to

[245:28]

do something three times. Even if I have

[245:30]

no idea where I'm going with this

[245:31]

solution, I at least want to do

[245:32]

something three times, like three rows

[245:34]

of text. But how about this? On each

[245:37]

row, what do I want to do? I want to

[245:39]

print out three things. So I could steal

[245:43]

this idea like int i = 0, i less than 3,

[245:46]

i ++. And then inside of this loop, let

[245:50]

me just print out one brick at a time.

[245:52]

No new lines yet. One brick at a time.

[245:55]

But there is a bit of a problem here.

[245:57]

This is correct to nest loops in this

[246:00]

way. Totally fine to have an outer loop.

[246:02]

Totally fine to have an inner loop. But

[246:04]

I probably don't want the inner loop's

[246:06]

variable competing with the outer loop's

[246:09]

variable by giving them the same name.

[246:11]

And that's fine. It is pretty

[246:12]

conventional in code when you want

[246:14]

another integer and it's not I because

[246:16]

you've used it already. Fine. You can

[246:18]

use J. So using I and J and K is

[246:21]

generally fine. If you're using L, M, N,

[246:23]

O, like at that point, you're probably

[246:25]

doing something wrong. There's no hard

[246:26]

line, but at some point it gets

[246:27]

ridiculous and you should be coming up

[246:29]

with better variable names. But I and J,

[246:31]

maybe K is fine. So now what's really

[246:34]

happening? Let me suppose that this is

[246:36]

my uh for each row. This is my for each

[246:41]

column I want to print one brick. Now

[246:45]

this isn't quite correct but let me go

[246:48]

ahead and make this version of mario,

[246:50]

./mario and ah now there's what? One,

[246:53]

two, three. There's nine bricks there.

[246:55]

So I'm close, right? It's supposed to be

[246:57]

3x3. Nine total. What do I want to do

[247:01]

though to get this just right?

[247:04]

Yeah, over on the left. Yeah. What on

[247:07]

what line number would you or afterward?

[247:10]

Uh where would I put the new line?

[247:14]

Because I think I don't want to put it

[247:16]

here because I'm going to get myself

[247:17]

into trouble as before. How about in

[247:18]

back?

[247:21]

>> After the what?

[247:24]

>> After 13. Yeah. So, after I finish

[247:26]

printing each uh brick in the column

[247:29]

from left to right, I'm going to go

[247:31]

ahead and print out I think a single new

[247:34]

line here, nothing else. And now, if I

[247:35]

open my terminal, run make mario again,

[247:38]

./mario. Now, we've got it. And it's

[247:40]

not a perfect square like this one is

[247:43]

because like the hashtags are kind of

[247:44]

more vertical than they are horizontal,

[247:46]

but it's pretty darn close. The e the

[247:48]

takeaway here being you can certainly

[247:50]

nest these kinds of ideas and compose

[247:52]

them. And honestly, i and j is maybe making

[247:54]

this uh more confusing than necessary. I

[247:57]

could just give these better names like

[247:59]

row, row, row, and then maybe col for

[248:04]

column or column. I can spell it out if

[248:06]

that's clearer. Column column just to

[248:08]

make clear to myself, to my TA, to my

[248:12]

colleagues what exactly these variables

[248:14]

represent. And indeed, like an old

[248:16]

school typewriter, the outer loop is

[248:17]

handling row by row by row. But each

[248:20]

time you're on a row, you first want to

[248:22]

do column, column, column, column,

[248:24]

column, column. And that's what

[248:25]

logically the nesting is achieving. And

[248:27]

again, if I do make mario, ./mario, all

[248:30]

I've done is change variable names. It

[248:31]

has no functional effect beyond that.

[248:34]

Now, this is a little more subtle, but

[248:36]

there is a bit of duplication in this

[248:39]

program. There's a bit of magic, and

[248:42]

this is subtle, but does anyone want to

[248:43]

conjecture what still could be improved

[248:46]

here?

[248:50]

What is maybe rubbing you the wrong way?

[248:54]

>> Yeah, I've hardcoded the three here and

[248:56]

here. It's not a big deal. It's like an

[248:58]

in-class exercise. Like, who really

[248:59]

cares if I'm just manually typing three.

[249:02]

But if I want to make this square bigger

[249:04]

and bigger and bigger over time, I'm

[249:06]

going to have to change it in two

[249:06]

different places. And I've conjectured

[249:08]

last time and today eventually that's

[249:10]

going to come back and bite you. You're

[249:11]

going to do something stupid or a

[249:12]

colleague isn't going to realize you

[249:13]

hard-coded three in multiple places.

[249:15]

Like just bad design. So, how could we

[249:18]

fix this? Well, we could just declare a

[249:20]

variable like n, set it equal to three,

[249:23]

and then use n in both places. And

[249:25]

that's pretty darn good. That's better

[249:27]

because now we're reusing the value. But

[249:29]

we can do one better than this. It turns

[249:31]

out in C and in many languages too,

[249:34]

there's the notion of a constant whereby

[249:36]

if you want to store something in a

[249:37]

variable, but you want to signal to the

[249:39]

compiler that this value should never

[249:41]

change. And better still you want to

[249:43]

prevent yourself, a human, not

[249:46]

to mention a colleague from accidentally

[249:48]

changing this value you can declare it

[249:50]

to be constant or const for short. So if

[249:52]

I go back into VS code on line five now

[249:54]

and say const int, that means that n is an

[249:58]

integer that has a constant value. So if

[250:00]

I do something stupid later in my code

[250:02]

and I try to set n equal to something

[250:04]

else the compiler won't let me do that.

[250:06]

It will protect me from myself. So, it's

[250:08]

just a slightly better design as well.

[250:13]

All right, questions on any of these

[250:16]

here, Mario

[250:19]

examples. The first of our sort of real

[250:21]

world problems, albeit simplified

[250:24]

textually.

[250:25]

All right, let's focus lastly on things

[250:28]

we can't really do well with computers.

[250:31]

Uh, namely some of the limitations

[250:33]

thereof. So, here is a cheat sheet of

[250:35]

some of the operators we've seen thus

[250:36]

far. We played with these with

[250:37]

comparison and uh doing some uh addition

[250:40]

or the like but here we have addition,

[250:42]

subtraction, multiplication, division

[250:44]

and the modulo operator which is

[250:46]

essentially the remainder operator which

[250:48]

you can do with a single command uh with

[250:50]

a single operator like this. Let's use

[250:52]

some of these to make our own calculator

[250:54]

and see what this calculator can and

[250:56]

can't do for us. So back here in VS

[250:57]

Code, let me open my terminal. Let's go

[250:59]

ahead and create a program called

[251:01]

calculator.c. And in this program, let's

[251:03]

do something super simple initially that

[251:05]

just like adds two numbers together. So

[251:07]

let's include first uh cs50.h so we can

[251:11]

use our get functions. Then let's go

[251:12]

ahead and include stdio.h so we

[251:14]

can use printf. Let's just copy paste

[251:16]

our usual ma uh int main(void). And

[251:19]

inside of main let's do this. Declare a

[251:21]

variable x. Set it equal to get_int. And

[251:23]

let's ask the user what's x question

[251:26]

mark. Then let's declare another

[251:27]

variable y. Set it equal to get_int and

[251:30]

ask the user what's y question mark.

[251:32]

Then let's do something super simple

[251:34]

like give me a third variable. Heck,

[251:36]

we'll call it z. Set it equal to x + y.

[251:40]

And then lastly, let's just print out

[251:41]

the sum of x + y. So this is a super

[251:44]

simple calculator for addition of two

[251:46]

numbers. printf, quote unquote. What's

[251:49]

the answer going to be? Well, it's not

[251:51]

percent s. This was quick earlier.

[251:54]

What's the placeholder to use for an

[251:55]

integer?

[251:57]

Percent i, backslash n. And what do I want

[252:00]

to substitute for that placeholder?

[252:03]

Just z in this case. We haven't quite

[252:05]

done this before but again it's just the

[252:06]

composition of some of our earlier ideas

[252:08]

I can go ahead and make this calculator

[252:10]

Enter, ./calculator, Enter. What's x is one,

[252:14]

what's y is two and indeed I get three

[252:17]

so not a bad calculator it seems to be

[252:19]

working correctly but it's maybe not the

[252:22]

best design like it's generally frowned

[252:24]

upon to create a variable like z if

[252:28]

you're only going to use it a moment

[252:29]

later in one place. Like why are you

[252:31]

wasting my time creating a variable just

[252:33]

to use it once and only once? Sometimes

[252:34]

it's fine if it makes your code more

[252:36]

readable or clearer. And in fact, it

[252:38]

might if I called it sum. Like that's

[252:40]

arguably a net positive because I'm

[252:41]

making clear to the reader that it's the

[252:43]

sum of two variables. But even then, I'm

[252:45]

quibbling. I could just get rid of that

[252:47]

third variable altogether. And heck, I

[252:49]

could just do x plus y right here.

[252:50]

That's totally fine and reasonable,

[252:52]

especially since it's still a pretty

[252:53]

short line of code. It's not hard for

[252:55]

anyone to read. Feels like a reasonable

[252:57]

call. But this hints at again my comment

[252:59]

on design being subjective. There's no

[253:01]

steadfast rules here. Some of the TAs

[253:03]

might disagree with me, but like h this

[253:05]

feels fine. It's readable, which is

[253:06]

probably the most important thing

[253:08]

ultimately. Let's make this calculator

[253:10]

./calculator, Enter, 1, 2, and we still get

[253:14]

three. So the code now is still working.

[253:16]

As an aside, if you're starting to

[253:18]

wonder how I type so fast, sometimes I'm

[253:20]

kind of cheating with autocomplete. So

[253:22]

if I know I want to create a program

[253:24]

called calculator and calculator.c

[253:26]

exists, I can start typing c

[253:30]

tab and you can hit tab to sort of

[253:32]

autocomplete the rest of the file name

[253:34]

if it happens to exist there. Better

[253:36]

still, if I want to go back to previous

[253:38]

commands I've typed, I can actually use

[253:39]

my up and down arrows to go through my

[253:41]

history. So if I go up up, you'll see

[253:44]

all of the recent commands I typed, and

[253:46]

that saves me time, too. So just little

[253:48]

keyboard shortcuts that speed things

[253:49]

along. All right. All right. Well, let's

[253:51]

do something like this. Not just

[253:52]

addition, why don't we use some

[253:54]

multiplication? So, how about we prompt

[253:56]

the user not for two um numbers, but how

[253:58]

about just one initially x and let's go

[254:01]

ahead and multiply x by two. And I would

[254:03]

do x asterisk 2, which is the

[254:05]

multiplication operator in C. Let's make

[254:08]

this version of the calculator, ./calculator.

[254:11]

And now, what's x? Let's do 1. So 1 * 2

[254:14]

is 2. Let's do this again. Let's type in

[254:16]

2. 2 * 2 is 4. Let's do this again. 3. 3

[254:19]

* 2 is 6. and so forth. That's fine. It

[254:22]

seems to work. But maybe let's implement

[254:23]

like a recent meme from the past year or

[254:26]

two. How about this? Let's uh let's see

[254:29]

if you recognize it as we go. So, I'm

[254:31]

going to get rid of this code

[254:32]

altogether. And inside of my calculator,

[254:33]

I'm going to do something like int

[254:35]

dollars equals 1 by default. Then I'm

[254:38]

going to deliberately induce an infinite

[254:39]

loop just for demonstration sake. Then

[254:42]

I'm going to do a character from the

[254:44]

user and say something like this using

[254:46]

get_char, which gets a single

[254:48]

character. Uh, how about I'll tell the

[254:50]

user here's this many dollars, percent i,

[254:54]

with a US uh dollar sign before it

[254:57]

double it and give to next person

[255:00]

question mark if you're familiar with

[255:01]

that one and I'm going to prompt them

[255:03]

for yes no answer but I'm going to plug

[255:05]

in the current number of dollars so they

[255:06]

know what they're wagering on then below

[255:08]

this I'm going to say if the character

[255:10]

the human typed in equals equals y for

[255:12]

yes then I'm going to go ahead and do

[255:15]

dollars times equals 2 which recall was

[255:17]

our shorthand notation

[255:19]

for doubling something. Uh, in this

[255:22]

case, I could more pedantically say

[255:23]

dollars equals dollars * 2. But again, I can save

[255:26]

some keystrokes and do dollars, uh, times

[255:29]

equals 2 instead. There's no plus plus,

[255:31]

there's no star star trick, asterisk

[255:33]

asterisk trick. You have to do it in

[255:35]

this way uh minimally. However, if the

[255:38]

user does not want to double it and give

[255:39]

it to the next person, then let's do an

[255:41]

else and just break out of this infinite

[255:43]

loop altogether. But notice what I've

[255:45]

deliberately done in get_char similar to

[255:48]

printf. I have included a placeholder.

[255:51]

We implemented get_char and get_int

[255:53]

and get_string just like printf in that

[255:56]

you can pass in placeholders and plug in

[255:58]

values. Why? Well again for the meme

[256:00]

sake I want to be able to tell the user

[256:02]

how much money I'm about to hand them

[256:04]

when I ask them the question. Do you

[256:05]

want to double it and give it to the

[256:06]

next person? I want to see the number.

[256:08]

And the dollar sign is just because

[256:09]

we're talking about dollars. The percent

[256:10]

i is because we're talking about

[256:12]

integers. All right. If I didn't mess

[256:14]

this up, let's make this version of a

[256:16]

calculator or meme. So far so good.

[256:19]

./calculator. Enter. Here's $1, which

[256:22]

was the initial value of my dollars

[256:24]

variable on line six. Double it and give

[256:26]

it to the next person? All right, y.

[256:28]

Here's $2. Double it and give it to the

[256:30]

next person. Okay. Okay. Okay. Okay.

[256:35]

Okay. I'm going to do it faster. It's

[256:37]

getting pretty good. You can see the

[256:38]

power of exponentiation.

[256:41]

It's getting pretty high. Let's keep

[256:43]

going. Keep going. Lot of doll.

[256:47]

Too far.

[256:50]

That does not happen in the memes. What

[256:52]

happened here?

[256:54]

What's going on? Yeah. What do you

[256:55]

think?

[257:03]

>> Exactly. Good intuition. Because the

[257:05]

computer only has a finite number of

[257:06]

bits allocated to each integer. I

[257:08]

hypothesized earlier that it's usually

[257:10]

32 bits, maybe 64 bits, but it's finite,

[257:13]

which means you can only count so high

[257:15]

and it's roughly 4 billion or again an

[257:16]

integer by default can be negative or

[257:18]

positive. So it's roughly 2 billion and

[257:20]

that's pretty close to what we were

[257:22]

getting here. In fact, we overflowed the

[257:24]

integer in memory. In fact, integer

[257:27]

overflow is a term of art whereby you

[257:29]

can overflow an integer by trying to

[257:32]

store too big of a value in it. And the

[257:34]

reason for this is again to make this

[257:36]

clear, this is a piece of memory inside

[257:38]

of a laptop or a desktop or some other

[257:40]

device. And in these little black chips

[257:42]

is a whole bunch of bits or really bytes

[257:44]

that can store information

[257:45]

electronically. But they allocate those

[257:47]

bits in units of 8, maybe 16, maybe 32,

[257:50]

maybe 64, but finitely many per value.

[257:53]

And whether we're using 32 or 64, you

[257:56]

can only count so high if you have a

[257:58]

finite number of bits. And we've seen

[257:59]

this problem even on a small scale with

[258:01]

our flat light bulbs last week. If we

[258:03]

have a three-digit number as represented

[258:05]

by like three physical light bulbs or

[258:07]

three tiny transistors in the computer,

[258:08]

I can count from zero to one to two to

[258:12]

three to four to five to 6 to 7. If I

[258:15]

want to count to eight though, I need a

[258:18]

fourth bit. But as the red suggests, if

[258:21]

you don't have a fourth bit, for all

[258:22]

intents and purposes, that number is

[258:24]

just zero. Or as an aside, depending on

[258:27]

how you're representing your number,

[258:29]

sometimes a leading one indicates that

[258:31]

the number itself is negative, which is

[258:33]

why in VS Code, we actually saw both

[258:35]

symptoms. First, we went negative

[258:36]

because we wrapped around logically,

[258:39]

much like that one resulted in our

[258:41]

getting back effectively to zero, and

[258:43]

then we did indeed end up on zero

[258:46]

ultimately. So, how can we chip away at

[258:48]

this? Well, a couple of solutions

[258:49]

perhaps. Let me close my terminal window

[258:51]

here, and instead of using an int, well,

[258:54]

let's just kick the can down the road.

[258:55]

Let's use a long which is 64 bit. So at

[258:57]

least we can give away even more money

[258:59]

in this scenario. I can't use percent I

[259:01]

and need to use percent li now for a

[259:03]

long integer. But I think at this point

[259:06]

if I go back to VS Code's terminal

[259:08]

window here. Oh, and I quit that program

[259:10]

by hitting Control-C quickly. Uh now I'm going

[259:13]

to go ahead and do make calculator again

[259:15]

./calculator. And I'm just going to keep

[259:17]

hitting Y. But because I'm using a long

[259:20]

int now and thus 64 bits, if I do this

[259:23]

long enough, it's going to get crazy

[259:25]

high and much much higher than before.

[259:28]

High enough that I'm not going to keep

[259:29]

clicking Y enter because we're never

[259:30]

going to hit the boundary. But

[259:31]

eventually, especially if I did this in

[259:33]

a loop automatically, it would certainly

[259:35]

Oh. Oh, okay. I guess exponentiation

[259:38]

works fast. Okay, so it did work. I

[259:40]

didn't think I was going to hit it

[259:41]

enough times, but the same problem

[259:42]

happened again. We overflowed this long

[259:45]

integer even using that many bits

[259:47]

because I was talking so long I kept

[259:49]

hitting y enough times to overflow even

[259:51]

that long integer. So that too was a

[259:53]

problem and this happens truly in the

[259:55]

real world. So pictured here is a Boeing

[259:57]

787 from a few years back, long before

[259:59]

there were all the more recent problems

[260:00]

with Boeing planes, whereby after 248

[260:05]

days of continuous power, which is kind

[260:07]

of a thing in the aviation industry,

[260:08]

like time is money and generally they

[260:10]

want the planes in the air as much as

[260:12]

possible, which means they want them

[260:13]

powered on as much as possible, which

[260:14]

means they don't like turn them off at

[260:16]

night. They keep them going and flying.

[260:18]

After 248 days, the New York Times

[260:20]

reported a few years back that a model

[260:22]

787 airplane that has been powered

[260:25]

continuously for 248 days can lose all

[260:28]

alternating current electrical power due

[260:30]

to the generator control unit

[260:31]

simultaneously going into failsafe mode.

[260:33]

This condition is caused by a software

[260:35]

counter internal to the GCUs that will

[260:37]

overflow after 248 days of continuous

[260:39]

power. Boeing is in the process at the

[260:41]

time of developing a GCU software

[260:43]

upgrade that will remedy the unsafe

[260:45]

condition. So literally what this means

[260:47]

is that the power to these planes would

[260:49]

just shut off if the planes were on for

[260:52]

more than 248 days at a time. And this

[260:54]

was a common thing for planes to be

[260:56]

maximal power. Why was this actually

[260:59]

happening or what was the solution?

[261:01]

Well, the short-term fix because it took

[261:02]

a while for Boeing to fix this was what?

[261:05]

What would you do if the the symptom is

[261:07]

that the plane shuts off mid-flight

[261:09]

after 248 days? Yeah.

[261:11]

>> Turn it off and back on. Literally turn it

[261:14]

off and back on again, much like you've

[261:16]

probably been taught with your phones

[261:17]

and computers and any other electronic

[261:19]

devices that somehow freak out on

[261:20]

occasion. Reboot the plane. Now, why is

[261:22]

that? Well, anytime you reboot a phone

[261:24]

or a laptop or a plane, all of those

[261:27]

variables get reset to their default

[261:29]

values, which if it's the first line of

[261:31]

code, like in some of my examples, gets

[261:33]

set back to zero again. For instance,

[261:35]

the first line of code is executed from

[261:37]

top to bottom. So, this effectively

[261:39]

solved the problem. But when they

[261:40]

finally rolled out a fix, then you

[261:42]

didn't have to do that anymore. But the

[261:43]

source of the problem is essentially

[261:45]

that they were probably using 32-bit

[261:47]

integers, but also negative values. So

[261:49]

they had 31 bits at their disposal to

[261:51]

count to positive numbers. And 248 days

[261:55]

is roughly how long it takes to count 2^31

[261:58]

hundredths of a second, which means once you count in

[262:00]

hundredths of a second for 248 days, you

[262:03]

would overflow an integer and the power

[262:05]

would shut off effectively because

[262:07]

something ended up going to zero. So,

[262:09]

there was a lot of sort of marketing

[262:11]

speak or technical speak in that, but it

[262:12]

boiled down to just a simple integer

[262:14]

overflow. There's a historical bug in

[262:17]

Pac-Man. If you've ever played this uh

[262:19]

in any of its forms, whereby you can

[262:21]

play up to level 255, but because there

[262:24]

was a missing if condition that checked

[262:26]

what level you were on, you could

[262:27]

accidentally garble the screen if you

[262:29]

were amazing at Pac-Man because they too

[262:31]

would overflow an integer and just

[262:33]

random characters would end up appearing

[262:35]

on the screen. So, it's sort of like a

[262:36]

badge of honor to actually hit level 256

[262:39]

in this way because of this bug. But

[262:41]

there's yet other issues we can see

[262:43]

here. And if you don't mind, we might go

[262:44]

a couple minutes over, but let me just

[262:46]

demonstrate what these examples can do

[262:48]

for us here. If I were to revamp my

[262:51]

calculator here as follows by clearing

[262:53]

my terminal window after hitting Control-C to

[262:55]

kill that, let me go ahead and get rid

[262:57]

of all of this meme code here. Scrolling

[262:59]

down to the inside of main, and let's

[263:02]

just do a couple of things like this.

[263:03]

int x equals get int, quote unquote, what's

[263:08]

x question mark. Then let's go ahead and

[263:11]

do int y equals get int, quote unquote,

[263:13]

what's y question mark. Then let's go

[263:15]

ahead and print out just x / y. So

[263:18]

here's a percent i backslash n, x / y.

[263:22]

This would seem to be a calculator now

[263:23]

for division which I can make as before.

[263:26]

And actually sorry I don't want to do

[263:31]

missing terminating. Oh, sorry. Missing

[263:33]

a double quote. There was an unintended

[263:35]

bug. So, if I make this calculator,

[263:36]

do ./calculator, type in 1, type in

[263:38]

three, I get zero, which is weird. What

[263:42]

if I do instead maybe two and three?

[263:45]

It's zero instead of 0.66. What if I do

[263:47]

three and three? Well, that curiously

[263:49]

works. But if I do something like four

[263:51]

and three, which would be 1.33, that too

[263:54]

doesn't seem to work. So there's this

[263:56]

other issue in computing when you have

[263:58]

finite numbers of bits known as

[263:59]

truncation whereby even when you're

[264:01]

trying to do floating-point math like with

[264:03]

a decimal point if you are using an

[264:06]

integer you're going to throw away

[264:08]

everything after the decimal point

[264:10]

unless you're explicitly using the right

[264:12]

data type. And we saw an allusion to

[264:14]

this earlier. If I actually go in now

[264:16]

and change my values from integers to

[264:18]

floats and change the percent i to a

[264:21]

percent f and remake this calculator.

[264:24]

Now I can do 1 / 3 and I actually get

[264:26]

back that their response. But there's

[264:28]

another issue latent here which happens

[264:30]

too in the real world whereby I'm going

[264:32]

to tweak this percent f to be a little

[264:34]

arcane. It turns out you can tell C how

[264:36]

many digits you want to show, how many

[264:38]

significant digits you want, if you

[264:40]

will, by just using a dot and then a

[264:42]

number like 50 arbitrarily. And contrary

[264:44]

to what you might have learned in grade

[264:46]

school, this calculator would seem to

[264:47]

think that, with ./calculator, 1 divided by three

[264:51]

is not 0.3333

[264:54]

infinitely many times. There's all this

[264:56]

random stuff happening at the end. Long

[264:59]

story short, this is because computers

[265:01]

can only use finitely many bits even to

[265:03]

represent floating-point numbers. And if

[265:05]

there's an infinite number of those, you

[265:06]

can't possibly represent every possible

[265:08]

floating-point value. So we're essentially

[265:10]

seeing an approximation of 1/3

[265:12]

imprecisely. But this too happens quite a

[265:15]

bit in the wild. There's really no

[265:17]

solution to this other than by throwing

[265:18]

more bits at the problem using a

[265:20]

double instead of a float or at least

[265:22]

somehow trying to detect this and catch

[265:24]

this. That then is what we'd call

[265:26]

floating-point imprecision. But to tie

[265:28]

this together and sort of induce a bit

[265:30]

of fear and for the coming years these

[265:31]

things happen all of the time. Back when

[265:33]

I was finishing school, there was the

[265:34]

so-called Y2K problem or year 2000

[265:37]

problem whereby for decades, computers

[265:39]

had been using not four digits to

[265:41]

represent years, but just two because it

[265:43]

was convenient. It was more efficient

[265:45]

because you use half as much memory to

[265:46]

represent maybe the year 1999, just

[265:48]

using two digits instead of four. Of

[265:51]

course, when the uh year rolled around

[265:53]

from 1999 to 2000, if you

[265:56]

didn't have these numbers even in

[265:57]

memory, you might confuse 2000 with

[266:01]

1900, which was the presumption if

[266:03]

you're only storing two digits. So, we

[266:05]

screwed that up. And thankfully, the

[266:06]

world scrambled. And if you read up on

[266:07]

Wikipedia and news articles from the

[266:09]

time, everyone thought the world might

[266:10]

very well end, but it didn't. So, you'd

[266:12]

think we'd have learned our lesson.

[266:14]

Unfortunately, another such problem is

[266:15]

coming up in the year 2038 whereby

[266:18]

historically since uh the 70s and prior,

[266:21]

computers have generally used 32-bit

[266:23]

integers to keep track of time, the date

[266:26]

and the time by means of counting how

[266:28]

many seconds have passed since January

[266:30]

1st, 1970. And all of the math is just

[266:33]

relative to that date because that's

[266:34]

when computers were really starting to

[266:36]

come onto the scene, if you will.

[266:37]

Unfortunately, there's only 4 billion

[266:40]

values you can count to or two billion

[266:42]

if you're doing negatives from uh

[266:44]

January 1st, 1970. And so, um on the

[266:47]

date January 19th, 2038, we will

[266:51]

overflow a 32-bit counter. And suddenly,

[266:54]

if this problem is not fixed by you or

[266:56]

other people before the year 2038, our

[266:58]

computers and phones and other devices

[267:00]

may very well think it's December 13th,

[267:03]

1901.

[267:05]

So, there are solutions to these

[267:06]

problems. CS50 is all about empowering

[267:08]

you with solutions to these problems.

[267:10]

But if you'd like to scan this here

[267:11]

code, um, this will add that date to

[267:13]

your Google calendar or your Outlook

[267:15]

calendar. Keep an eye on it. That though

[267:18]

is week one for CS50. Problem set one

[267:20]

will be in your hands soon. We'll see

[267:21]

you next time.

[268:44]

One fish. Two fish. Red fish. Blue fish.

[268:52]

>> Congratulations.

[268:54]

Today is your day. You're off to great

[268:56]

places. You're off and away.

[269:00]

>> It was a bright, cold day in April, and

[269:03]

the clocks were striking 13. Winston

[269:05]

Smith, his chin nuzzled into his breast

[269:08]

in an effort to escape the vile wind,

[269:10]

slipped quickly through the glass doors

[269:11]

of Victory Mansions, though not quickly

[269:14]

enough to prevent a swirl of gritty dust

[269:16]

from entering along with him.

[269:19]

All right, this is CS50 and this is week

[269:22]

two. And if we could after this dramatic

[269:24]

reading, a round of applause for our

[269:26]

volunteers.

[269:30]

So we can now take for granted from week

[269:32]

one that we now have a new way to

[269:34]

express some of the ideas that we first

[269:35]

explored in week zero like functions and

[269:37]

conditionals and variables and the like.

[269:39]

And now we're doing in C what we used to

[269:41]

do in Scratch. Today what we're going to

[269:42]

start to focus on is some real world

[269:44]

problems so that we can take for granted

[269:46]

that we have that expressiveness. We

[269:47]

have some tools in our toolkit and

[269:49]

actually start to solve some realworld

[269:51]

problems, or ones representative thereof. In

[269:54]

particular, the real world problem that

[269:55]

we're going to start today and this week

[269:57]

with is that of reading levels. Odds are

[269:59]

when growing up, you read at a certain

[270:01]

level based on the age at which you were

[270:02]

at. Maybe it was first grade level or

[270:05]

fifth grade level or 10th grade level or

[270:07]

the like. And that was a function of

[270:08]

just how comfortable you were with the

[270:10]

words in the book or words on the screen

[270:12]

that you were reading. What you've just

[270:13]

heard, thanks to our volunteers, are

[270:15]

three different reading levels that each

[270:17]

of these three volunteers reads at. And

[270:19]

in fact, why don't we go ahead and hear

[270:21]

them again and be a little more

[270:22]

thoughtful this time as to assess at

[270:24]

what reading level your classmate is

[270:27]

reading. So, let's start with Leah if

[270:29]

you'd like to introduce yourself first.

[270:30]

Hi, I'm Leah. I'm a first year in

[270:33]

Holworthy. And here is my little thing.

[270:36]

One fish, two fish, red fish, blue fish.

[270:40]

>> So, at what reading level would you say

[270:42]

Leah reads based on her recitation

[270:45]

thereof? Yeah, in the front.

[270:46]

>> Kindergarten.

[270:47]

>> Kindergarten. Okay. Okay. So, a fairly

[270:49]

young age. And what makes you say

[270:50]

kindergarten?

[270:52]

>> She is speaking in very short phrases

[270:55]

without much complexity.

[270:56]

>> Okay. Very short phrases without much

[270:58]

complexity. And indeed, according to one

[271:00]

scientific measure that we'll explore in

[271:02]

this week's problem set, indeed. We

[271:04]

would say that Leah reads before grade

[271:07]

1, so kindergarten would indeed be apt.

[271:09]

But welcome to the stage here. Let's

[271:12]

move on now to Maria if you'd like to

[271:13]

introduce yourself.

[271:14]

>> Yeah. Hi, I'm Maria. I'm in Stoughton

[271:16]

thinking of applied math. Um,

[271:19]

congratulations. Today is your day.

[271:22]

You're off to great places. You're off

[271:24]

and away.

[271:25]

>> Another familiar phrase, perhaps. At

[271:26]

what reading level would you say Maria

[271:28]

is?

[271:30]

Well, yeah. Over here.

[271:33]

>> Third grade.

[271:33]

>> And what makes you say second or third

[271:35]

grade?

[271:39]

>> Okay.

[271:41]

>> So, now we're starting to introduce uh

[271:43]

complexities like rhyming and a bit more

[271:45]

substance to the quote. And indeed,

[271:47]

based on that reading, that same measure

[271:49]

that I described earlier, which will

[271:51]

involve a mathematical function that

[271:52]

somehow analyzes what it is Maria just

[271:54]

said. Indeed, we would conclude that she

[271:56]

read at a third grade level or grade

[271:58]

three. Finally, Omar, if you'd like to

[272:00]

introduce yourself and read once more

[272:01]

yours.

[272:02]

>> Okay. Um, so, hi everyone. I'm Omar. Um,

[272:05]

I'm a freshman at Earl, but thinking of

[272:07]

doing comp sci, and this is my reading. Um,

[272:10]

it was a bright cold day in April and

[272:12]

the clocks were striking 13. Winston

[272:14]

Smith, his chin nuzzled into his breast

[272:16]

in an effort to escape the vile wind,

[272:18]

slipped quickly through the glass doors

[272:20]

of Victory Mansions, though not quickly

[272:22]

enough to prevent the swirl of gritty

[272:23]

dust from entering along with him.

[272:25]

>> All right, sort of escalated quickly.

[272:27]

What reading level is Omar at, would you

[272:29]

say? Someone else.

[272:33]

What might you say or estimate?

[272:37]

Yes, right here in the front.

[272:38]

>> Eighth grade.

[272:39]

>> Okay, eighth grade. And what made you

[272:40]

say that?

[272:45]

more comp,

[272:46]

>> more complex sentences, more complex

[272:48]

words. And indeed, according to that

[272:49]

same measure, this full paragraph of

[272:52]

text now, which indeed has even more

[272:53]

grammar when you see it there on the

[272:54]

screen, would be said to be at grade 10

[272:57]

because of that added complexity. So,

[272:59]

with that said, we're going to need to

[273:01]

be able to somehow sort of crunch these

[273:03]

numbers to determine given a body of

[273:05]

text at what reading level someone is.

[273:07]

But in order to do that and apply any

[273:09]

metrics to a body of text, we're going

[273:11]

to need to represent that text in memory

[273:13]

using something like strings from last

[273:15]

week. But last week with strings, we

[273:17]

could really just print them out or

[273:18]

display them wholesale on the screen.

[273:20]

But I think we're going to need to break

[273:22]

down these various texts and others like

[273:24]

it at a finer grain level. And indeed,

[273:25]

among the goals for today is to explore

[273:27]

exactly that. and also to take the

[273:29]

proverbial hood off of the car to take a

[273:31]

look underneath and how the computer is

[273:34]

actually working, how these things like

[273:35]

strings are actually functioning. So, if

[273:37]

you could join me one last time in a

[273:38]

round of applause for our volunteers.

[273:39]

Thank you so much for helping out. Thank

[273:41]

you guys. Thank you. Thank you to Maria

[273:45]

as well. So among the goals for today

[273:49]

beyond exploring a representative

[273:51]

problem like this of reading levels is

[273:52]

going to be another one which is even

[273:54]

more important and more omnipresent than

[273:56]

reading levels namely cryptography. The

[273:58]

art of scrambling information or

[274:01]

specifically encrypting it so you can

[274:02]

send secure communications. Now you sort

[274:04]

of take this for granted increasingly

[274:06]

nowadays that when you send a text

[274:07]

message or perhaps an email or check out

[274:10]

online with a credit card that somehow

[274:12]

or other your information is secure. And

[274:14]

over the coming weeks, we're going to

[274:15]

explore to what extent that is actually

[274:17]

true and why or why not. Now, with

[274:20]

cryptography, similarly too, if we want

[274:22]

to be able to send messages securely,

[274:25]

such that if I want to send a message to

[274:26]

you, I don't want anyone else in the

[274:28]

room to be able to figure out what it is

[274:32]

I have said, even if they physically

[274:33]

intercept that message, which is all too

[274:36]

possible in a digital world. We're going

[274:37]

to need to come up with metrics and

[274:40]

mechanisms for actually scrambling

[274:42]

information in a reversible way so that

[274:44]

I can write my message somehow scramble

[274:46]

it. You can receive that message even if

[274:48]

after it's passed through many other

[274:49]

hands and you can descramble or decrypt

[274:52]

that same message. So for instance, here

[274:54]

on the screen is a message, a fairly

[274:56]

simplistic one that has somehow been

[274:58]

encrypted. And we'll see by the end of

[275:00]

today and by the end of this week that

[275:01]

this encrypted message and there's a bit

[275:04]

of a tell on the end there actually will

[275:06]

be said to decrypt to this is CS50. But

[275:10]

why is going to be the underlying

[275:11]

question and what additional tools do we

[275:12]

need in our toolkit in order to do that?

[275:16]

Another word on tools. So, up until now,

[275:19]

you've probably experienced some bugs,

[275:20]

whether it was in Scratch or ever more

[275:22]

so in C. In fact, don't feel too bad if

[275:24]

like the very first program you wrote in

[275:26]

C like didn't even work. You couldn't

[275:28]

even make it or compile it until you

[275:30]

went back and fixed some of the code

[275:31]

that you had written. Well, it turns out

[275:34]

that bugs, mistakes in programs are ever

[275:36]

so commonplace. And even though we've

[275:38]

already provided you with tools like the

[275:40]

virtual rubber duck at CS50.ai, also

[275:42]

embedded into VS Code at CS50.dev,

[275:45]

of whom you can ask questions along the

[275:47]

way. Among the goals today are to give

[275:49]

you some lifelong tools at how you can

[275:51]

actually debug software yourself when

[275:53]

you don't have a duck nearby, when you

[275:55]

don't have a TA nearby, let alone any

[275:57]

humans at all. So with debugging,

[275:59]

there's going to be a number of

[276:00]

techniques that we can use all toward an

[276:01]

end of like finding and removing bugs or

[276:04]

mistakes from our software. And perhaps

[276:06]

the person best known for having

[276:07]

popularized this term of bugs is that of

[276:10]

uh Dr. uh Grace Hopper pictured here who

[276:13]

was a rear admiral in the Navy and was

[276:15]

one of the original programmers of the

[276:16]

so-called Harvard Mark I, a very early

[276:19]

mainframe computer that if you wander

[276:21]

across the Charles River over to the

[276:22]

science and engineering complex here at

[276:24]

Harvard, you can actually see part of

[276:25]

this on display still in the lobby. It

[276:28]

was succeeded by the Harvard Mark II.

[276:30]

And on the Harvard Mark II, Dr. Hopper

[276:32]

and her team were known for having put

[276:35]

this note in their log book after having

[276:37]

done some number crunching on the system

[276:39]

there. And if we zoom in, they had found

[276:41]

a problem with the computer this one day

[276:44]

whereby there was literally a bug, a

[276:46]

moth inside of the circuitry of the

[276:48]

computer. And as was written here, first

[276:50]

actual case of bug being found. And ever

[276:53]

since then, do we say ever more so, the

[276:55]

phrase bug and debugging when it comes

[276:57]

to finding and eliminating problems in

[277:00]

our code. So let's start with just that.

[277:02]

In fact, let me go over to VS Code and

[277:03]

let's deliberately make some mistakes

[277:05]

together that might very well be

[277:06]

reminiscent of some of the mistakes

[277:07]

you've accidentally made thus far, but

[277:09]

along the way give you all the more

[277:11]

tools for solving those problems as

[277:13]

opposed to sort of uh having to ask

[277:16]

someone else, be it virtual or physical,

[277:17]

for help and actually find these

[277:19]

mistakes in your own code. Let me go

[277:21]

ahead and consciously in VS Code create

[277:23]

a program known to be buggy called

[277:25]

buggy.c.

[277:27]

And in this program, let's go ahead and

[277:29]

do some fairly familiar code initially.

[277:32]

I'm going to go ahead and start just

[277:33]

like we did last week with int main

[277:35]

void. More on that today before long. Uh

[277:38]

inside of my curly braces, I'm going to

[277:40]

say print f hello,

[277:42]

world. Uh that's it. Now I'm going to go

[277:45]

back to my terminal window here. I'm

[277:46]

going to go ahead and do make buggy to

[277:49]

make a program from that source code.

[277:51]

But before I do, odds are even after

[277:53]

just a week of this stuff, you can

[277:54]

probably spot a few mistakes I've made,

[277:57]

a few bugs. What do you see wrong

[277:59]

already? Yeah,

[278:01]

>> include standard.

[278:02]

>> I didn't include standard io.h, that

[278:04]

so-called header file, which is

[278:06]

important because it tells the compiler

[278:08]

that I plan to use functions therein

[278:11]

like print f, which clearly I'm doing.

[278:13]

So, let me go in and include standard

[278:15]

io.h.

[278:17]

What else seems to be wrong here? Yeah.

[278:21]

I'm missing a semicolon at the end of

[278:23]

line five here. So, I'm going to go

[278:24]

ahead and add that in. And this is

[278:26]

subtle and arguably not a bug, but maybe

[278:28]

an aesthetic detail. What else have I

[278:30]

done arguably wrong? Yeah. In back.

[278:34]

>> Yeah, I forgot my backslash and the new

[278:36]

line character just to move the cursor

[278:39]

to the next line so that when I get a

[278:40]

new prompt, it's on a fresh line of its

[278:42]

own. Again, more of an aesthetic, but

[278:43]

certainly a pretty reasonable thing to

[278:45]

do. So, let me go ahead now and actually

[278:47]

in my terminal window run make buggy.

[278:49]

and it indeed compiled. But up until

[278:51]

then, had I not fixed those mistakes, I

[278:53]

would have triggered a whole bunch of

[278:55]

bugs, a whole bunch of error messages as

[278:57]

a result. In fact, let's rewind in time

[278:59]

and undo the fixes I just made and go

[279:01]

back to the original form here and try

[279:03]

running again. Make buggy. Enter. And

[279:06]

we'll see some scary looking messages up

[279:08]

here. Let me scroll up to the top of the

[279:10]

output here where we see buggy.c:3,

[279:14]

which means line three. That's where the

[279:16]

problem is right now. error call to

[279:18]

undeclared library function print f with

[279:20]

type and then it starts to get a little

[279:22]

more complicated but I do see clearly

[279:24]

that it's calling my attention to print

[279:26]

f. So hopefully at some point if not

[279:28]

last week hopefully this week onward

[279:30]

your instinct will be ah all right I'm

[279:31]

an idiot I forgot the header file in

[279:34]

which print f is actually declared it's

[279:36]

not a huge deal it's going to come with

[279:37]

practice so that's how I might know uh

[279:40]

in more intuitively what in fact uh the

[279:43]

solution here might be now here's

[279:45]

another common mistake that I've just

[279:46]

gone in and fixed but I did do something

[279:48]

wrong and hopefully none of you actually

[279:50]

did this because it's an annual FAQ.

[279:52]

What did I just do accidentally wrong?

[279:55]

So it's not studio.h, it's standard

[279:58]

io.h. So do kind of ingrain that one for

[280:00]

standard input output. The next though

[280:02]

bug that I haven't yet fixed is that

[280:04]

semicolon. So let me clear my screen and

[280:06]

rerun make buggy. I should no longer see

[280:08]

that first error message anymore. But I

[280:10]

now do see another error message on line

[280:13]

five. Expected semicolon after

[280:15]

expression. All right, that one's pretty

[280:16]

explicit. So I'm going to go ahead and

[280:18]

fix this. But notice that up until now,

[280:20]

my code wouldn't have been able to

[280:21]

compile because of those two error

[280:23]

messages. It stopped me by

[280:25]

showing me these errors. But at this

[280:27]

point, if I run make buggy enter, it did

[280:31]

in fact compile. And yet it's arguably

[280:33]

still buggy because when I run

[280:35]

./buggy, I get my prompt on the wrong

[280:37]

line. So this is a distinction now

[280:39]

between a syntax error, something that

[280:42]

or a programming error that outright

[280:45]

stops my program from compiling. It's

[280:46]

sort of a dealbreaker versus something

[280:48]

that's maybe more of a logical error. I

[280:50]

actually meant to move the cursor to the

[280:52]

next line. And so there's different

[280:53]

types of errors in the world as we're

[280:55]

seeing here. Of course, if I rerun make

[280:57]

buggy again, then ./buggy, now we're back in

[281:00]

business hopefully with the intention of

[281:03]

having this uh display exactly that. All

[281:06]

right. Well, let's modify to look a

[281:08]

little more like something else from

[281:09]

last week. Recall that last week I

[281:11]

started to get someone's name more

[281:14]

dynamically. So I said something like

[281:15]

name equals get string. And that was a

[281:18]

function we introduced. And I might have

[281:19]

said something like this. what's your

[281:20]

name? question mark with a space just to

[281:23]

move the cursor over. I know now I

[281:24]

definitely need to end my thought with a

[281:26]

semicolon. I could try and compile this

[281:29]

make buggy now and I'm seeing a

[281:32]

different error message altogether that

[281:34]

you might not have seen yet. So on

[281:35]

buggy.c line five error use of

[281:38]

undeclared identifier name.

[281:41]

What now is the mistake that I've made?

[281:43]

Why does it not know? >> Declare the type.

[281:46]

>> Yeah, I forgot to declare the type of

[281:48]

this variable, which for those of you

[281:50]

with the prior programming experience is

[281:51]

not something you have to do in some

[281:53]

languages like Python for instance. But

[281:55]

in languages like C, C++, Java, and

[281:58]

others, you do in fact need to

[281:59]

explicitly tell the compiler that you

[282:01]

want to instantiate a variable, create a

[282:04]

variable in the computer's memory by

[282:06]

telling it its type. And it's not going

[282:08]

to be an int because I don't want an

[282:10]

integer, of course, in this case. I want

[282:12]

text which we now know to be called

[282:14]

string instead. All right, I think this

[282:16]

fixes that bug. So, let me do make buggy

[282:19]

again. And hopefully, huh, a fatal error

[282:22]

this time. Again, indicating that my

[282:24]

code did not recompile on line five.

[282:27]

Still, I have an error, but this time it

[282:29]

says use of undeclared identifier

[282:32]

string. Did I mean standard in? So, this

[282:35]

is a bit of a red herring. The compiler

[282:36]

is trying to be helpful and saying did I

[282:38]

mean standard in but I don't think I

[282:41]

actually do that just is the most

[282:42]

similar looking word in the compiler's

[282:44]

own memory. What's the actual mistake

[282:46]

that I've made here? Yeah,

[282:50]

>> You didn't include the CS50 library.

[282:52]

>> Yeah, I didn't include the CS50 header

[282:54]

file because string recall is a feature

[282:57]

of the CS50 library as is get string and

[283:00]

get int and others. So the solution here

[283:03]

is indeed to go up here and just to be

[283:06]

nitpicky I tend to alphabetize my header

[283:08]

files. It's not strictly required

[283:09]

technically but stylistically I find it

[283:11]

nice to be able to skim the header files

[283:13]

alphabetically to see if something is

[283:14]

there or not. I can include cs50.h in

[283:17]

addition to standard io.h and it's in

[283:20]

that file cs50.h that not only is get

[283:22]

string declared so that the

[283:25]

compiler knows that it exists it turns

[283:27]

out so is the word string. So this is a

[283:29]

bit of a white lie and this is something

[283:31]

we do in the early weeks of the class.

[283:32]

We dug up these old training wheels from

[283:34]

a bicycle. The whole idea being to sort

[283:36]

of keep you up and avoid you having to

[283:37]

do all too much complexity early on. The

[283:39]

point of these training wheels in the

[283:41]

form of the CS50 library is to let us

[283:43]

kind of ignore what a string really is

[283:46]

for just another week or two after which

[283:48]

we will then uh peel back that layer,

[283:51]

take off those training wheels and

[283:53]

reveal to you what is actually going on.

[283:55]

So, for now, strings exist, but they

[283:58]

exist because of the CS50 library. In a

[284:00]

couple of weeks, they're still going to

[284:01]

exist, but we're going to call them by a

[284:03]

different name, as we'll eventually see.

[284:05]

But everyone in the real world, uh,

[284:07]

every software developer uses the phrase

[284:09]

string. So, this is a concept that

[284:11]

exists. It is not CS50 specific at all.

[284:13]

It's just that in C, the word string

[284:16]

doesn't typically exist unless you make

[284:19]

it so, as we have. All right. So I think

[284:21]

now if I clear my terminal window and

[284:24]

rerun make buggy now it should in fact

[284:27]

compile. And if I run ./buggy, Enter, I

[284:30]

should be able to type in my name. And

[284:31]

now voila hello. So this is now not a

[284:35]

syntax error because I didn't screw up

[284:38]

my code per se like it compiled.

[284:40]

Everything is grammatically correct so

[284:42]

to speak but logically intellectually

[284:45]

this is not what I wanted right I wanted

[284:47]

it presumably to say hello David. So,

[284:49]

let's fix one final bug here. How do I

[284:52]

fix this? On what line?

[284:54]

How do I get it to say, "Yeah, hello,

[284:56]

David."

[285:00]

>> Yeah. On line seven, I need to do the

[285:02]

string placeholder, the format code, so

[285:04]

to speak, percent s. And then one more

[285:05]

thing, someone else. What do I do after

[285:07]

this? Yeah. In back.

[285:12]

>> Yeah. A comma. and then add the variable

[285:14]

name that contains the value I want to

[285:16]

substitute in there which is indeed name

[285:18]

though I could have called it anything I

[285:19]

want. All right, so now make buggy enter

[285:22]

seems to have compiled again. Then ./buggy.

[285:24]

Now I type in my name once more and now

[285:27]

we're back in business. So over the

[285:28]

course of these few exercises, clearly I

[285:31]

meant to make most if not all of these

[285:33]

bugs, these mistakes, but they

[285:35]

demonstrate not only syntax errors,

[285:37]

which are just going to stop the

[285:38]

compiler in its tracks. Like you won't

[285:40]

even be able to compile your code until

[285:42]

you fix those things, but even after

[285:44]

that, there could be these latent bugs

[285:46]

that seem to not be there until you

[285:49]

actually provide input and see what's

[285:52]

actually happening at so-called runtime

[285:54]

when you're running the actual code. And

[285:56]

so here's where it's no longer as easy

[285:58]

as just reading the error message and

[286:00]

figuring out what it means because there

[286:02]

is no error message that appeared on the

[286:04]

screen when it said hello, world. We had

[286:06]

to use our own human intellect and

[286:07]

realize, okay, that's clearly not what I

[286:09]

wanted. Had you run CS50's own check50

[286:11]

program on something like that, we could

[286:13]

have told you that that's not correct by

[286:15]

automatically assessing the correctness

[286:17]

of it. But the compiler has no idea what

[286:20]

you are trying to achieve logically. It

[286:22]

only knows about the language C itself

[286:24]

and the requisite syntax for actually uh

[286:28]

writing and compiling code. So how could

[286:31]

we go about solving logical problems in

[286:34]

code? So I would propose that we start

[286:36]

to consider this here list whereby when

[286:40]

you want to find a logical problem in

[286:42]

your code and better understand what's

[286:44]

going on or really what's going wrong,

[286:46]

printf is going to be your friend. Up

[286:48]

until now we've used printf to literally

[286:50]

print on the screen. Hello David, hello

[286:52]

Kelly or anything else on the screen.

[286:54]

But you can certainly use printf

[286:56]

temporarily to just print stuff out

[286:59]

inside of your program that you might

[287:00]

want to better understand. And then once

[287:02]

you understand it and once you've solved

[287:04]

some problem, fine, then you can delete

[287:06]

those temporary lines of code, recompile,

[287:08]

and move on. So let's use printf as a

[287:11]

debugging tool in that sense. Let me go

[287:13]

back over to VS Code here and let me in

[287:15]

this same program, buggy.c, sort of

[287:18]

delete everything and start over with a

[287:20]

different sort of bug. I'm going to

[287:22]

include stdio.h at the top. I'm

[287:25]

going to do int main(void) after that.

[287:27]

And then inside main, I'm going to do a

[287:29]

simple for loop that just prints out

[287:31]

like a stack of three bricks like we

[287:34]

saw in the world of Mario when Mario

[287:36]

needed to, we claimed, sort of jump over a

[287:39]

stack of bricks. We want to print out

[287:40]

just three of those at the moment. So

[287:42]

I'm going to go ahead and say for int i

[287:45]

equals 0, i is less than or equal to

[287:49]

three, because I want three of these,

[287:51]

i++. Then inside of this for loop, I'm

[287:53]

going to go ahead and quite simply do

[287:55]

printf, hash symbol to represent the

[287:57]

brick followed by a new line to move the

[287:59]

cursor to the next line. Semicolon to

[288:01]

complete the thought. Now, I've

[288:03]

deliberately made a stupid mistake here,

[288:05]

but in the context of a simple enough

[288:07]

program that we can focus on the

[288:08]

debugging technique, not on the

[288:10]

obscurity of the bug in question.

[288:12]

Hopefully, you'll spot the bug in just a

[288:14]

moment, if not already. When I do make

[288:16]

buggy now and ./buggy, I don't get

[288:18]

three bricks. I, of course, get one, two, three,

[288:22]

four total. So, there's a logical bug in

[288:24]

this program. And odds are you can

[288:26]

already spot what it is. But let me

[288:28]

propose that this program is

[288:30]

representative of a type of problem that

[288:32]

you can solve a little more

[288:33]

diagnostically by poking around and

[288:35]

really asking the computer via printf to

[288:37]

show you what's really going on. And I

[288:40]

would propose that one of the most

[288:41]

helpful techniques in a situation like

[288:43]

this if you're trying to wrap your mind

[288:44]

around why are there four bricks instead

[288:46]

of three. Well, clearly this is related

[288:48]

to the loop somehow. So let's look a

[288:51]

little more thoughtfully at what the

[288:53]

value of i is before we print out each

[288:56]

of those bricks. And I might literally

[288:58]

do something like this temporarily. Uh,

[289:00]

printf, quote unquote, i is %i,

[289:05]

backslash n, close quote. And then I

[289:07]

could just print right here and now the

[289:10]

value of i just so that I can actually

[289:12]

see it. Let me now go down into my

[289:13]

terminal window, make buggy again,

[289:16]

./buggy. And now, and I'll full screen my

[289:19]

terminal. I'll get some diagnostic

[289:21]

information at the same time. So when i

[289:23]

is one, I get a brick. When, sorry, when

[289:25]

i is zero, I get a brick. When i is one,

[289:27]

I get another brick. When i is two, I

[289:29]

get another brick. When i is three, I

[289:31]

get a fourth brick. So now I can kind of

[289:33]

see that, okay, my loop is working, but

[289:36]

I'm going too far. I'm going too long.

[289:39]

Now I can do this even more succinctly.

[289:40]

For what it's worth, I don't need a

[289:42]

whole new printf statement. I could

[289:44]

just go into my existing printf, put

[289:46]

my %i there, and then maybe a

[289:48]

space just to scooch things over and

[289:50]

then print out i in that same line. If I

[289:52]

now do make buggy, ./buggy. Okay, now

[289:56]

I'm seeing that I'm printing a hash, a

[289:58]

brick for each value of i from i equals

[290:01]

0, 1, 2, and also 3. So the solution of

[290:05]

course is that I shouldn't be starting

[290:06]

at zero and iterating less than or equal

[290:09]

to three. The solution is like, ah, I'm an

[290:12]

idiot. I should have said less than

[290:14]

three. Or if I prefer to count starting

[290:16]

at one like a normal person, I could

[290:18]

have set i equal to one and then go up

[290:20]

to and through three. But as I claimed

[290:23]

last week, the canonical way, the most

[290:25]

common way to do this is start counting

[290:27]

at zero and go up to, but not through,

[290:29]

the total value that you have in mind.
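
As a minimal sketch of the printf technique just described, the two loops can be put side by side. The function names bricks_buggy and bricks_fixed are hypothetical (they are not from the lecture), and each function returns the number of bricks it printed so the difference is easy to check.

```c
#include <stdio.h>

// Hypothetical helper: the loop as first written, with the off-by-one
// condition (i <= n). Returns how many bricks were printed.
int bricks_buggy(int n)
{
    int printed = 0;
    for (int i = 0; i <= n; i++)
    {
        // Temporary diagnostic: show the value of i next to each brick.
        printf("i is %i: #\n", i);
        printed++;
    }
    return printed;
}

// Hypothetical helper: the same loop with the condition corrected to i < n,
// and the temporary diagnostic removed again.
int bricks_fixed(int n)
{
    int printed = 0;
    for (int i = 0; i < n; i++)
    {
        printf("#\n");
        printed++;
    }
    return printed;
}
```

Calling bricks_buggy(3) prints a diagnostic line for i = 0 through 3, four bricks in all, while bricks_fixed(3) prints three.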

[290:33]

But there's going to be another

[290:34]

technique that's worth knowing here. Let

[290:36]

me go ahead and sort of abstract this

[290:37]

away by whipping up a slightly better

[290:39]

variant of this as follows. Let me go

[290:42]

ahead and delete this for loop. Let me

[290:44]

assume for the moment that inside of

[290:45]

main I'm going to ask the user now for

[290:47]

the height of a pyramid. And I'm going

[290:49]

to do something like this. int h equals

[290:51]

get_int. And let's prompt the user for

[290:54]

the height value of this pyramid or this

[290:56]

wall. And then let's go ahead and assume

[290:58]

there exists a function called

[291:00]

print_column, which takes as input a number h

[291:03]

which is how many bricks you want to

[291:05]

print. Now this function does not exist

[291:07]

yet, print_column. get_int does exist, but

[291:10]

I don't have access to it. So let me not

[291:11]

make the same mistake twice. What do I

[291:12]

need to add at the top of this file?

[291:15]

Yeah,

[291:16]

>> CS50 header file.

[291:17]

>> I need the CS50 header file because I'm

[291:19]

using the get_int function now, which

[291:21]

again comes from our library, not C. So,

[291:23]

let me go ahead and include cs50.h, but

[291:26]

now print_column, I can invent this

[291:28]

function myself. So, let me go ahead and

[291:30]

say void print_column, int height in

[291:35]

parentheses. More on that in just a

[291:36]

moment. And then I'm going to recreate

[291:38]

the loop from before: for int i equals 0,

[291:42]

i is less than or equal to the height.

[291:44]

So I'm going to deliberately for now

[291:45]

make that same mistake as before. i++,

[291:48]

and then inside of this for loop I'm

[291:49]

going to go ahead and print out a single

[291:51]

hash and a new line to represent that

[291:54]

there brick. So now main can use a

[291:57]

function called print_column. It's going

[291:59]

to pass in the value of h and then this

[292:02]

for loop in the print_column function is

[292:04]

going to take care of printing this

[292:05]

thing for me. So, let me do this again.

[292:07]

Make buggy. Enter. So far so good.

[292:10]

./buggy. Let's put in a height. I'm going

[292:12]

to say manually height of three. And I

[292:14]

should see three bricks. But of course,

[292:16]

I'm still seeing four. Now, before we

[292:18]

move on, let me hide my terminal and

[292:20]

propose that this is just kind of

[292:22]

stylistically bad to put anything other

[292:25]

than your main function at the top. But

[292:26]

recall that if I move my helper

[292:29]

function, print_column, and it's a

[292:31]

helper function insofar as I made it

[292:32]

to help me solve another problem. I

[292:35]

can't recompile and run my code now.

[292:37]

Why? The compiler won't let me. Yeah.

[292:45]

>> Exactly. When the compiler gets to line

[292:47]

seven of my code, it's going to abort

[292:49]

compilation because it doesn't know what

[292:51]

print_column is. Why? Because I don't

[292:53]

tell it what it is until line 10. And

[292:56]

this was the only time I proposed that

[292:57]

copy paste is reasonable is to highlight

[293:00]

and copy the very first line of that

[293:02]

function. Paste it above main with a

[293:05]

semicolon. And that's a so-called

[293:06]

function prototype. It specifies what

[293:09]

the name of it is, what its inputs are

[293:11]

if any, and what its output is if any.
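
A minimal sketch of that layout follows, under a few stated assumptions: draw is a hypothetical caller standing in for main, the height is passed in directly rather than read with get_int, and the loop is still the deliberately buggy one from this point in the lecture.

```c
#include <stdio.h>

// Function prototype: the function's name, its input (an int),
// and its output (none, hence void), ended with a semicolon.
void print_column(int height);

// Hypothetical caller standing in for main. It appears above the definition
// of print_column, which compiles only because of the prototype above.
void draw(int h)
{
    print_column(h);
}

// The definition itself can now live below its caller.
void print_column(int height)
{
    // Still deliberately buggy here: <= prints one brick too many.
    for (int i = 0; i <= height; i++)
    {
        printf("#\n");
    }
}
```

Without the prototype line, the compiler would reach the call inside draw before it had seen any declaration of print_column.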

[293:14]

And more on these inputs and outputs

[293:15]

later on. But now this is just a more

[293:17]

complicated but more modularized version

[293:20]

of this same program. Let me do make

[293:22]

buggy. Still compiles. ./buggy. Type

[293:25]

in three and I still have that same bug.

[293:28]

But the catch now is that my code has

[293:30]

gotten more complicated. And the point

[293:32]

of my having abstracted away this idea

[293:35]

of printing a column into a new function

[293:37]

is that there's just more code now to

[293:39]

debug. I could certainly go in there and

[293:41]

start adding printfs, but at some point

[293:43]

printf is going to be a very primitive

[293:45]

tool and you're going to waste more time

[293:47]

adding printfs, recompiling your

[293:49]

code, running your code, changing the

[293:51]

printf, recompiling your code, running

[293:53]

your code. It's going to get very

[293:54]

tedious quickly when you have lots of

[293:56]

lines of code on the screen. So, can I

[293:58]

actually step through my code line by

[294:01]

line? Maybe like your TA would in a

[294:03]

section or a small class line by line

[294:05]

walking through the code. You can

[294:07]

because another tool that you have

[294:09]

access to is that called debug50. So,

[294:13]

this is a CS50 command that will start

[294:16]

an industry standard debugger. And a

[294:18]

debugger is a piece of software that is

[294:21]

used in the real world that literally

[294:22]

lets you do that, debug your code by

[294:24]

letting you slow down or even pause

[294:27]

execution and walk through execution of

[294:30]

your code line by line. The only reason

[294:32]

we call it debug50 is because in VS

[294:34]

Code it's a little annoying to start the

[294:36]

debugger. And so we automated the

[294:38]

process of starting the debugger, but

[294:39]

everything thereafter has nothing to do

[294:41]

with CS50 and everything to do with

[294:43]

real-world software engineering

[294:45]

techniques. So how do we use this? Let

[294:47]

me go back to VS Code here and let me

[294:49]

propose that I want to step through this

[294:51]

code line by line just like we might at

[294:53]

a whiteboard in a smaller class to

[294:55]

figure out why I'm getting four instead

[294:57]

of three hashes. Well, in my terminal

[294:59]

window, what I'm going to go ahead and

[295:01]

do is this: debug50 ./buggy.

[295:06]

So debug50 is the command. It needs to

[295:08]

know what program I want to debug. So

[295:10]

I'm specifying ./buggy,

[295:12]

which is the name of the program I just

[295:14]

compiled. I'm going to get an error

[295:16]

though the first time I run this. Uh, as

[295:18]

will you if you make the same mistake.

[295:20]

I'm about to see this message here.

[295:22]

Looks like you haven't set any

[295:23]

breakpoints. Set at least one breakpoint by

[295:25]

clicking to the left of a line number

[295:27]

and then rerun debug50. So, what is

[295:29]

this really telling me? Well, the

[295:31]

debugger has no idea when and where I

[295:34]

want to pause execution so as to start

[295:36]

walking through my code line by line. It

[295:38]

wants me to tell it where to break. That

[295:40]

is where to pause by clicking on a line

[295:43]

number. So, let me hide my terminal for

[295:44]

just a moment. And you've probably never

[295:46]

done this intentionally, but if you

[295:48]

hover over the space to the left of your

[295:50]

program's line numbers, you'll see a

[295:52]

little red dot, a little stop sign of

[295:54]

sorts. If you actually click on a line

[295:56]

number, that red dot will stay there.

[295:57]

And you can see the hover here saying

[295:59]

click to add breakpoint. What I'm going

[296:01]

to go ahead and do is say click to add a

[296:03]

breakpoint at main. main is the entry

[296:05]

point to my program. It's the default

[296:06]

function that gets called. Let's break

[296:08]

right away so I can step through this

[296:10]

code line by line. All right, let me

[296:12]

reopen my terminal window and clear it

[296:14]

and then run debug50 again with

[296:17]

./buggy, Enter. And now a whole bunch

[296:20]

of stuff is going to happen quickly on

[296:22]

the screen. And then it's going to clean

[296:23]

itself up because once the debugger is

[296:26]

running and ready to go, it's going to

[296:29]

allow me to start stepping through my

[296:31]

code line by line. So what is going on?

[296:34]

Well, notice nothing has happened in the

[296:36]

terminal yet. Why? Because my code has

[296:38]

been paused inside of main. In

[296:40]

particular, it's been paused in the

[296:42]

first real line of code. So the curly

[296:44]

brace is uninteresting. The first line

[296:46]

is just the function's name essentially.

[296:47]

So line 8 is the first juicy line of

[296:49]

code that could possibly do anything

[296:51]

useful. It's been highlighted here in

[296:53]

yellow. And the fact that this

[296:55]

cursor is here means that we have broken

[296:57]

execution on this line, but we have not

[296:59]

yet executed this line, which is why in

[297:01]

the terminal, I don't see anything yet.

[297:03]

I definitely don't see height followed

[297:05]

by colon. Notice what else has happened

[297:07]

here. All of a sudden in the left-hand

[297:09]

side of the screen where your file

[297:10]

explorer typically is or where the CS50

[297:12]

duck typically is, we see mention of

[297:15]

variables. You can actually see inside

[297:17]

of the debugger what the value of any

[297:19]

variable in the computer's memory

[297:21]

happens to be. Now I don't quite

[297:23]

understand this right now. We'll come

[297:25]

back to this over time, but weirdly

[297:27]

before line 8 even executes, it seems

[297:30]

that h has a default value of 32,764,

[297:34]

which seems to have come from nowhere.

[297:36]

As an aside, this is going to be what's

[297:38]

called a garbage value. And this is

[297:40]

actually why we have Oscar so

[297:41]

omnipresently here. A garbage value

[297:44]

tends to be a default value inside of a

[297:46]

variable that's the result of that

[297:48]

memory having been used previously for

[297:50]

something else. Inside of your computer,

[297:52]

you've got all of this memory, random

[297:54]

access memory or RAM. More on that

[297:55]

today. And it stands to reason that

[297:57]

my computer or whatever cloud server

[297:59]

we're using has been running for some

[298:01]

time. So the bits that h is going to use

[298:04]

might already have some random switches

[298:06]

on and off. Some random pattern of bits

[298:08]

that happens to give me 32,764.

[298:11]

But the moment this line of code

[298:12]

executes, that value is going to get

[298:15]

changed to what I actually want it to

[298:16]

be, which is what the human is going to

[298:18]

type in. Meanwhile, at the bottom here,

[298:20]

you'll see a so-called call stack. More

[298:22]

on this too in the weeks to come, but

[298:23]

you'll see that we've paused on the

[298:25]

function called main in the file called

[298:28]

buggy.c.

[298:30]

So, how do I do something useful? Well,

[298:32]

at the very top of the debugger, you'll

[298:34]

see a whole bunch of color-coded icons.

[298:36]

One looks like a play button. And if I

[298:38]

click that, it's just going to continue

[298:39]

execution of my code as though I don't

[298:42]

want to step through it anymore. So, I'm

[298:43]

not going to click that just yet. The

[298:45]

second arrow, which is a little curved

[298:47]

arrow over a dot, is the so-called step

[298:49]

over line, which will mean step over

[298:52]

this line and execute it, but only one

[298:54]

line at a time. Let's go ahead and do

[298:56]

exactly that. So, I'm going to click the

[298:58]

step over icon, the second one, which is

[299:00]

the curved arrow with the dot under it.

[299:03]

Click. Now, I see in my terminal window

[299:05]

height being prompted. All right, let's

[299:07]

go ahead and type in three, just like I

[299:09]

did before, and hit enter. Now, notice

[299:11]

what happens. Execution has paused on

[299:13]

line 9 instead of 8. And you'll see that

[299:17]

my variable, a so-called local variable,

[299:20]

has the value of three as intended. All

[299:22]

right. So far, this isn't all that

[299:24]

enlightening other than demonstrative of

[299:26]

the fact that I can pause execution of

[299:28]

my program anytime I want. So, let's now

[299:30]

click that step over button again so

[299:33]

that we actually print this column.

[299:35]

Click. And there we have it. Four hashes

[299:39]

at the bottom of the screen. Now,

[299:41]

execution has paused at the end of the

[299:42]

function. This is just my opportunity to

[299:44]

either stop or restart or continue. I'm

[299:47]

just going to go ahead and click the

[299:48]

play button and let it finish executing.

[299:50]

Unfortunately, that wasn't really at all

[299:52]

enlightening except to confirm for me

[299:55]

that I typed in three and three is what

[299:57]

is in the computer's memory. Not that

[299:59]

interesting though yet. So, let's do

[300:01]

this. Let's leave the breakpoint on line

[300:02]

six as before. Let's rerun the debugger

[300:05]

by running debug50 ./buggy.

[300:08]

Let's let it do its startup thing, which

[300:10]

looks a little messy at first, but now

[300:11]

we've highlighted line 8 again. I'm

[300:14]

going to go ahead and step over this

[300:15]

line because I do want to get an int.

[300:18]

I'm going to type in three again. Enter.

[300:20]

But this time, instead of stepping over

[300:23]

line 9 and just letting print_column

[300:25]

happen, this is where the debugger gets

[300:27]

powerful. Let me step into line 9 and

[300:31]

walk through the print_column function

[300:33]

itself line by line. So, let me go ahead

[300:35]

and click not this button, which is the

[300:38]

curved arrow over the dot, but the next

[300:40]

one, which is the step into button.

[300:43]

Click. And now you'll see that execution

[300:46]

has jumped inside of print_column and

[300:48]

paused on line 14. At which point I can

[300:51]

see at top left what the default value

[300:53]

of i is. And this is some crazy garbage

[300:56]

value because whatever bits are being

[300:58]

used to store i's value have some random

[301:00]

garbage from some previous use of that

[301:02]

memory. But as soon as line 14 executes

[301:05]

once, I bet i is going to take on a

[301:06]

value of zero. So let's do that. I'm

[301:10]

going to go ahead and click step over

[301:12]

because I don't need to step into this

[301:14]

because there's no other functions

[301:15]

there. Step over it and immediately at

[301:17]

top left i is now zero. Now line 16 is

[301:21]

highlighted. Let's step over this. Okay.

[301:23]

And notice in the terminal window, what

[301:25]

do you see? The first of our hashes.

[301:27]

Let's step over. Step over. Second hash.

[301:30]

And i is now one. Step over. Step over.

[301:34]

Now we see a third hash. And i is now

[301:36]

two. Step over. Step over. Okay, there's

[301:39]

the symptom of the bug. Four hashes and

[301:43]

yet i is three. But wait a minute, this

[301:45]

is going to draw my attention now to

[301:47]

line 14 before I continue onward. Wait a

[301:49]

minute. Three is of course less than or

[301:51]

equal to three, which is why I got that

[301:53]

fourth hash on the screen. So at the end

[301:55]

of the day, like you still need to

[301:56]

exercise some of your own human

[301:58]

intellect to figure out and understand

[302:00]

what's going on. But the value of this

[302:02]

here debugger is that you can pause and

[302:05]

work through things at your own pace and

[302:07]

poke around inside of your own code and

[302:10]

better understand what's happening as

[302:11]

opposed to compiling the program,

[302:13]

running it, and just now having to infer

[302:16]

from the symptoms alone what the source

[302:19]

of the problem might be.

[302:21]

So that was a lot. Let me go ahead here

[302:24]

and just let it continue to the end

[302:25]

because I know what the problem is. Now

[302:27]

I need to change the less than or equal

[302:28]

to sign to a simple less than instead.

[302:32]

Questions though on debug50 or any of

[302:33]

these steps. Yeah,

[302:34]

>> I have two questions.

[302:35]

>> Sure.

[302:36]

>> Could you go over what the breakpoint

[302:39]

thing is? And then my second question

[302:41]

was around the garbage.

[302:43]

The second time you ran it, it still

[302:45]

gave that same garbage value even though

[302:47]

you had assigned to h.

[302:49]

>> Correct. So in order of your questions,

[302:52]

what again are these breakpoints? The

[302:54]

breakpoint or the little red stop sign

[302:56]

here just tells the debugger where to

[302:59]

pause execution. So frankly, I didn't

[303:02]

have to break, pause execution, at main.

[303:04]

If I really care about debugging

[303:06]

print_column, I could have clicked down here

[303:08]

instead and then it would have just run

[303:09]

main automatically and only paused once

[303:12]

print_column gets called. So a

[303:14]

breakpoint is where your code will break, the

[303:15]

point at which it will break. As for the

[303:18]

garbage values, I'm

[303:20]

oversimplifying exactly what's going on

[303:22]

inside of the computer's memory. And

[303:24]

it's not necessarily using exactly the

[303:26]

same memory as before, but the operating

[303:29]

system will govern exactly how the

[303:30]

memory is laid out. Um, this is actually

[303:33]

a significant problem, long story short,

[303:35]

in a lot of today's systems because it's

[303:38]

not that interesting to me to know that

[303:40]

there was 32,000, whatever that number

[303:42]

is, or the negative number. But suppose

[303:44]

that that revealed the password of some

[303:47]

other program or function that had

[303:49]

some information there. It seems all too

[303:51]

easy with the debugger, let alone C, to

[303:54]

actually poke around the computer's

[303:55]

memory. And we're going to come back to

[303:56]

that in a couple of weeks. But for now,

[303:58]

it's a garbage value insofar as you

[303:59]

didn't put the value there. It somehow

[304:01]

got there on its own for now. Other

[304:03]

questions?

[304:04]

>> When you have a for, does the i equal

[304:10]

to one at the end of the for or the

[304:13]

next?

[304:22]

>> Correct. So the question is about the

[304:23]

order of operations for a for loop. So

[304:25]

the first time you go through a for loop

[304:27]

the initialization happens, the stuff

[304:28]

before the first semicolon, and the

[304:31]

condition is actually checked, the

[304:33]

boolean expression. Then everything

[304:34]

inside of the curly braces is executed.

[304:37]

Then the incrementation or update

[304:39]

happens, which in this case is i++, and

[304:42]

then the condition is again checked, the

[304:44]

boolean expression. The code is

[304:45]

executed. The update happens. The

[304:47]

condition again, the code is executed. And

[304:49]

so it starts to loop like this. The

[304:51]

debugger's graphics are fairly

[304:52]

simplistic and it just highlights the

[304:54]

whole line without making super clear

[304:56]

what's happening. But that's just the

[304:57]

definition of a for loop. Good question.
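
The order just described can be made visible with a small sketch. The helper trace_for is hypothetical (not from the lecture): it spells out `for (int i = 0; i < n; i++)` as an equivalent while loop and records one letter per step, I for initialization, C for the condition check, B for the body, and U for the update.

```c
#include <string.h>

// Appends one character to the string in `out` if there is room
// (`cap` is the total size of the buffer, including the terminator).
static void append(char *out, size_t cap, char step)
{
    size_t len = strlen(out);
    if (len + 1 < cap)
    {
        out[len] = step;
        out[len + 1] = '\0';
    }
}

// Hypothetical helper: traces the steps of `for (int i = 0; i < n; i++)`.
void trace_for(int n, char *out, size_t cap)
{
    if (cap == 0)
    {
        return;
    }
    out[0] = '\0';

    int i = 0;                  // initialization: runs exactly once
    append(out, cap, 'I');
    while (1)
    {
        append(out, cap, 'C');  // condition: checked before every pass
        if (!(i < n))
        {
            break;
        }
        append(out, cap, 'B');  // body: the code inside the curly braces
        i++;                    // update: runs after the body
        append(out, cap, 'U');
    }
}
```

For n equal to 2, the recorded trace is ICBUCBUC: one initialization, then check, body, update twice over, and a final check that fails and ends the loop.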

[305:00]

Others about debug50 or printf?

[305:05]

All right. Yeah.

[305:11]

>> Can you change the position of i++ and

[305:13]

height? Short answer, no. The first

[305:15]

thing is the initialization, the

[305:17]

variable you want to create and

[305:18]

initialize. The second thing is the

[305:20]

actual condition, the so-called boolean

[305:21]

expression. The third thing is always

[305:23]

the update. So, it must come in this

[305:25]

order. What you're not seeing is that

[305:26]

you can actually have multiple boolean

[305:28]

expressions, you can have multiple

[305:29]

initializations, you can have multiple

[305:31]

updates, but we're keeping it simple for

[305:33]

now. And this is canonical. All right.
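
As a hedged illustration of that last remark: in C, several initializations or updates are separated by commas, and several Boolean expressions are combined into one condition with operators such as &&. The function steps_until_meet and its parameters are hypothetical, chosen only to show the syntax.

```c
// Hypothetical example: two loop variables, a compound condition, and two
// updates in one for loop. i counts up from 0 while j counts down from n;
// the loop stops when they meet or cross, or when i reaches `limit`.
// Returns the number of passes through the loop body.
int steps_until_meet(int n, int limit)
{
    int steps = 0;
    for (int i = 0, j = n; i < j && i < limit; i++, j--)
    {
        steps++;
    }
    return steps;
}
```

With n equal to 10 and a generous limit, i and j meet at 5 after five passes; with a limit of 3, the second half of the condition ends the loop after three.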

[305:35]

So to make clear, assuming that either

[305:38]

printf or debug50 helped me figure out

[305:41]

where the illogic was in my thoughts, I

[305:44]

now know that the fix here is to just go

[305:46]

and change the less than or equal to to

[305:48]

a simple less than. And if I run the

[305:49]

program again, of course, it's going to

[305:50]

give me the three bricks that I always

[305:53]

wanted instead. But there's other

[305:55]

techniques we can use too. So besides

[305:57]

printf and debug50, you might wonder why

[305:59]

we have a 7-foot duck behind me here. All

[306:02]

of these little rubber ducks on the

[306:03]

floor. So rubber duck debugging per week

[306:04]

zero is actually a thing. Uh this was

[306:06]

popularized in a book some years ago and

[306:08]

the idea is that when you are facing

[306:10]

some bug, some mistake in your program

[306:11]

or you're just confused on some concept.

[306:14]

There is anecdotal evidence to suggest

[306:16]

that just talking out the problem with

[306:19]

an inanimate object like a rubber duck

[306:21]

on your desk is enough often for that

[306:23]

proverbial like light bulb to go off

[306:25]

over your head because you hear in your

[306:27]

own words what confusion you're having,

[306:29]

what illogical thoughts you're having,

[306:31]

and you don't even need another human or

[306:33]

TA or AI in the room to answer the

[306:36]

problem for you. So in fact on the way

[306:37]

out today at the end of class we've got

[306:39]

hundreds of ducks and enough for

[306:40]

everyone to take home with you if you'd

[306:42]

like to use that as another debugging

[306:43]

technique whether in CS50 or something

[306:45]

else. But of course now in the age of AI

[306:47]

you also have the AI-powered virtual

[306:50]

duck at cs50.ai and also in VS Code at

[306:52]

cs50.dev which really is a mechanism for

[306:55]

asking questions that you don't think

[306:57]

you can solve on your own. So, it might

[307:00]

be reasonable to ask the duck, "What

[307:02]

does this error message mean?" If you're

[307:04]

having trouble wrapping your mind around

[307:05]

it, but it's less reasonable to say copy

[307:07]

paste your code into the duck and say,

[307:09]

"What's wrong with my code?" You should

[307:11]

really be meeting the AI halfway. After

[307:13]

all, what's the point of actually doing

[307:14]

this or any other class is to develop

[307:16]

that muscle memory, develop those mental

[307:18]

models, get some practical skills. So

[307:20]

try hard to walk that line between

[307:22]

asking the duck too much versus

[307:24]

deploying some of these same tools

[307:25]

yourself: printf, debug50, even a physical

[307:28]

rubber duck on your desk before you

[307:30]

resort to sort of escalating it to

[307:33]

human-like or duck help. All right, so with

[307:36]

those tools added to one's toolkit,

[307:40]

let's actually consider and reveal

[307:42]

what's been going on underneath the hood

[307:43]

since last week. So this was the mental

[307:46]

model we proposed for last week whereby

[307:48]

when you write source code in a language

[307:49]

like C. It's not something that the

[307:51]

computer itself understands natively

[307:53]

because computers we saw only understand

[307:55]

zeros and ones aka machine code. So the

[307:58]

compiler is the program that we use to

[308:00]

convert your source code to the machine's

[308:03]

code, from C to zeros and ones in this

[308:06]

case. More generally a compiler is just

[308:07]

a program that translates one language

[308:09]

to another. And in this case we're going

[308:11]

from source code to machine code. So

[308:13]

let's consider what's really happening.

[308:15]

And indeed, this is among the goals of

[308:16]

this week is to take a look at a lower

[308:19]

level so that when you encounter more

[308:21]

interesting, more challenging problems,

[308:23]

you'll understand from so-called first

[308:24]

principles what the computer is actually

[308:26]

doing and supposed to do. So you can

[308:28]

deductively figure things out for

[308:31]

yourself and generally not view

[308:32]

computers as, like, magic, or "I don't know

[308:35]

how this works." You'll have a fairly

[308:37]

bottom-up sense of how everything works

[308:39]

by term's end inside of any computer,

[308:42]

laptop, desktop, phone, or the like

[308:44]

these days. So, here's the simplest of

[308:46]

programs that we wrote last week, even

[308:47]

though there's a lot of syntactic

[308:49]

complexity as we've seen. The goal is to

[308:51]

get it to machine code. These here,

[308:53]

zeros and ones. So, how has that been

[308:55]

happening when you just run make since

[308:58]

last week? Well, these are the two

[309:00]

commands that we've typically run after

[309:02]

creating a file like hello.c. We then

[309:04]

compile it with make hello and then we

[309:06]

run it with ./hello. So let's give

[309:08]

ourselves this starting point real quick

[309:10]

just so that we have an example in mind

[309:13]

of exactly what it is we're compiling.

[309:15]

So let me go back to VS Code here. Close

[309:18]

out buggy.c and let's create a new file

[309:20]

just like last week called hello.c

[309:23]

inside of which is our old friend

[309:25]

stdio.h, int main(void), and then

[309:28]

inside of this we'll keep it simple just

[309:30]

printing out hello, world, which again is

[309:32]

my source code in C. How do I now

[309:35]

actually compile that? Well, of course I

[309:38]

can go down to my terminal window, make

[309:40]

hello, ./hello,

[309:42]

and we're off and running. So it was a

[309:44]

bit of a white lie for me to let you

[309:46]

think though that last week the compiler

[309:48]

itself is called make. Make is a command

[309:50]

that literally makes your program. It

[309:53]

makes it by compiling it. But make is

[309:55]

not technically the compiler. If we

[309:56]

really want to get nitpicky, the

[309:58]

compiler you've been using is actually

[310:00]

called clang for C language. And this is

[310:02]

a very popular compiler, freely

[310:04]

available, open source so to speak. You

[310:06]

can even look at the code other humans

[310:08]

wrote to create the compiler online. And

[310:10]

what make is really doing for us is

[310:13]

essentially automating this command. So

[310:15]

all this time I could have just run

[310:17]

clang hello.c.

[310:20]

But the default file name from Clang the

[310:23]

compiler weirdly and for historical

[310:26]

reasons is not going to be hello as you

[310:28]

would hope. It's going to be a.out for

[310:31]

assembler output. And we don't do this

[310:33]

in the first uh in week one of the class

[310:35]

because like this just makes things

[310:37]

unnecessarily complex that we're adding

[310:39]

some random name that you just have to

[310:40]

know to type. However, we can do this

[310:42]

now as follows. Let me go back to VS

[310:45]

Code here. And let me clear my terminal

[310:47]

and type ls. And we'll see everything

[310:49]

we've created thus far: buggy.c, which

[310:52]

when I compiled it, I got buggy. And

[310:54]

hello.c, which I just wrote. And when I

[310:56]

compiled it, I got hello. Let's do this

[310:58]

command now manually, though. Let's use

[311:00]

clang on hello.c and hit enter. That

[311:03]

too seems to work. But if I now type ls,

[311:06]

you'll see a third program specifically

[311:08]

called a.out, which happens to be the

[311:11]

same as hello. It just is using the

[311:12]

default name instead of my custom name,

[311:14]

hello. But if I do ./a.out,

[311:17]

indeed that too will work. But the

[311:20]

reason we don't do that certainly in the

[311:22]

first week of the course is that things

[311:25]

get a little annoying or sort of

[311:26]

escalate quickly thereafter. So let me

[311:29]

go ahead and change this program as

[311:30]

we've done a few times already. Let me

[311:32]

include cs50.h so that we get access to

[311:35]

like get_string. Let me do string name

[311:37]

equals get_string, quote unquote, what's

[311:40]

your name question mark close quote. And

[311:42]

then down here, just like before, let me

[311:45]

add my %s and add in my name. So,

[311:47]

I did that super quickly, but it's the

[311:48]

same program we wrote a few minutes ago,

[311:50]

and it's the same one we wrote last

[311:51]

week. What happens now, though, is as

[311:54]

follows. If I now try to do clang hello.c,

[311:57]

enter, I actually get an error

[312:00]

message. This one perhaps more cryptic

[312:02]

than most. Somehow or other, I have this

[312:05]

error. Linker command failed with exit

[312:07]

code one because of an undefined

[312:09]

reference to get_string. Now, in the

[312:12]

past when we've seen undefined or really

[312:14]

undeclared mentions of get_string, the

[312:16]

problem was just with missing this line.

[312:18]

This line is clearly here. But the catch

[312:21]

is I'm getting this error message now

[312:23]

because when I run clang hello.c, I'm

[312:26]

just assuming that clang knows where to

[312:28]

find the CS50 version of get_string. And

[312:32]

that is not the case. Technically, if I

[312:36]

want the compiler to compile this code

[312:38]

for me, what I'm actually going to have

[312:40]

to do is this. Let me go back to uh my

[312:44]

terminal window here, and I'm going to

[312:46]

say clang hello.c, but I'm then going

[312:50]

to specify -lcs50, which is cryptic at

[312:54]

first glance, but this is telling the

[312:56]

compiler to link in the CS50 library so

[313:00]

that it knows what the zeros and ones

[313:02]

are that belong to the get_string

[313:04]

function. Long story short, if I hit

[313:06]

enter now, the error message has gone

[313:08]

away. If I type ls, I've still got

[313:10]

a.out, but it's a new version thereof.

[313:12]

And if I do ./a.out, now I see the

[313:14]

new behavior where I can type in my name

[313:16]

and see hello, David. Now, this is

[313:19]

getting a little stupid that I keep

[313:20]

using a.out. We can change that as well.

[313:23]

In fact, these commands, as we're

[313:25]

starting to see, support what are called

[313:26]

command line arguments. And a lot of the

[313:28]

programs we've run already take command

[313:30]

line arguments. When we run code

[313:33]

hello.c, the so-called command line

[313:36]

argument to code is hello.c. When I run

[313:39]

make hello, the command line argument to

[313:42]

make is hello. In other words, the

[313:44]

command line arguments to a program are

[313:46]

all of the words you're typing in your

[313:48]

terminal after the name of the program

[313:51]

itself, whether it's make or whether

[313:53]

it's code or anything else. So, this is

[313:56]

to say, when I just ran clang hello.c

[313:59]

-lcs50,

[314:00]

I was passing in two command line

[314:02]

arguments: hello.c, which is the code I

[314:04]

want to compile, and -lcs50, which means

[314:06]

use the CS50 library, please. But I can

[314:09]

add another to the mix. I can actually

[314:11]

do something like this, whereby I do

[314:14]

clang

[314:15]

-o hello, then I can do hello.c, and

[314:20]

then -lcs50, enter. Now that too seems

[314:24]

to work. And if I type ls I've got all

[314:26]

the same programs as before. So let's go

[314:28]

ahead and get rid of those to make clear

[314:29]

what's going on. I'm going to remove

[314:30]

a.out. I'm going to remove hello. And

[314:34]

just for good measure I'll remove buggy

[314:35]

as well. So that all I have left in this

[314:37]

folder is source code. So if I type ls

[314:39]

there's my two files. Let's do this

[314:41]

again: clang -o hello hello.c -lcs50,

[314:48]

enter. Now if I type ls I don't see

[314:51]

a.out anymore because apparently

[314:53]

according to the documentation for clang

[314:55]

the actual compiler, if you pass -o as

[314:59]

a command line argument followed by

[315:00]

another word of your choice you can name

[315:03]

the program anything you want without

[315:05]

having to resort to mv or clicking on it

[315:07]

and typing a new name in manually. So if

[315:10]

I now do ./hello, I see the exact same

[315:12]

version where it's just asking me for my

[315:14]

name and then printing it out. But long

[315:16]

story short, the whole point of this

[315:18]

exercise is that like running commands

[315:20]

like this quickly gets very tedious. You

[315:22]

have to remember like the order in which

[315:23]

to do it, what the command line

[315:24]

arguments are. I mean, this is just a stupid

[315:26]

waste of time typically, certainly in

[315:28]

week one of the course to have to

[315:29]

memorize these kinds of magical commands

[315:31]

to get things working. But for now, know

[315:33]

that when you run make, it's essentially

[315:35]

automating all of that for you and

[315:37]

making it as simple semantically as make

[315:40]

hello or make buggy. But what's really

[315:42]

happening is the make command because of

[315:44]

the way we've configured cs50.dev for

[315:47]

you is doing all of this behind the

[315:49]

scenes. And it's not that magical. This

[315:51]

just means change the file name to hello

[315:53]

when you compile it. This just means

[315:55]

compile this code. And this just means

[315:57]

use the CS50 library. Like, that's all.

[316:02]

But that message about linking something

[316:04]

in there's there's something juicy going

[316:07]

on there such that make is in fact

[316:09]

helping us sort of solve a whole bunch

[316:11]

of problems when we compile and in fact

[316:13]

let me propose that if we take a step

[316:15]

back and look at some of the actual code

[316:17]

that we're compiling. Let's consider

[316:19]

like what we actually mean by compiling.

[316:22]

Yes, it's the case that to compile your

[316:23]

code means to go from source code to

[316:25]

machine code. But technically there's a

[316:27]

few more steps involved. Technically

[316:29]

when you compile your code that's sort

[316:31]

of become the industry term of art that

[316:33]

really is referring to four separate

[316:35]

processes all of which are happening in

[316:37]

succession automatically but each of

[316:39]

which is doing a different thing. So

[316:40]

just once let's walk through these these

[316:43]

several steps. So what is this

[316:45]

pre-processing step? So consider this

[316:47]

program here which we wrote uh in brief

[316:49]

last week. We've got include

[316:51]

stdio.h, which is there because we want to

[316:53]

be able to use printf ultimately. We've

[316:56]

then got a prototype for this meow

[316:57]

function. And the meow function does

[316:59]

this. All it does is print out quote

[317:00]

unquote meow followed by a new line.

[317:02]

Takes no input, returns no return

[317:04]

values. The main function now has a for

[317:07]

loop that iterates three times, each time

[317:09]

calling the meow function. And we saw

[317:11]

this already earlier today. This line of

[317:13]

code here, the so-called prototype is

[317:15]

necessary because we need to tell the

[317:18]

compiler that meow exists before we

[317:20]

actually use it here, especially if I

[317:23]

don't get around to implementing it

[317:24]

until later. So this copy paste of that

[317:26]

first line of code, a so-called

[317:28]

prototype, solves that problem. This is

[317:31]

what the header files are essentially

[317:33]

doing for us. Before I use printf down

[317:36]

here, the compiler needs to know what it

[317:38]

is, what its inputs are, what its

[317:40]

outputs are. Turns out the prototype for

[317:43]

printf is going to be in stdio.h.

[317:47]

And that's what that line of code has

[317:49]

been doing for us all this time. In

[317:51]

fact, let's take a simpler example that

[317:52]

we keep using here whereby I'm including

[317:54]

cs50.h and stdio.h. And I'm using

[317:57]

the CS50 get_string function to get

[318:00]

someone's name and put it in a variable

[318:01]

called name and then I'm printing out

[318:03]

hello, such and such. What's going on

[318:06]

now when I pre-process this file by

[318:08]

running make, which in turn runs clang?

[318:11]

Well, the compiler finds on the server's

[318:13]

hard drive the file called cs50.h, goes

[318:17]

inside and essentially copies and pastes

[318:19]

its contents into my own code.

[318:22]

Meanwhile,

[318:24]

such that we get the prototype there for

[318:26]

get_string. And we haven't seen this

[318:27]

yet, but it stands to reason that all

[318:29]

this time using get_string, we've been

[318:30]

passing in a prompt like what's your

[318:32]

name? And we've been getting back a

[318:34]

string. What's inside the parenthesis,

[318:35]

recall, is the input. What's before the

[318:37]

function name is the output, the

[318:39]

so-called return value. What about

[318:40]

stdio.h? It's in that file that

[318:42]

printf's prototype is. So essentially

[318:45]

what the compiler does when

[318:47]

pre-processing this file is it finds

[318:49]

stdio.h somewhere on the server's

[318:51]

hard drive, goes inside, and copies and

[318:53]

pastes those relevant lines of code into

[318:55]

my code as well. It's to avoid me having

[318:58]

to do all of that myself, find the file,

[319:00]

copy paste it, or manually type out the

[319:02]

prototype. These pre-processor

[319:04]

directives just automate all of that

[319:06]

tedium. So what effectively happens at the

[319:08]

top of my code after the file's been

[319:10]

pre-processed is that all of those hash

[319:12]

symbols followed by include are changed

[319:15]

to contain the actual contents of those

[319:18]

header files. Now the compiler knows

[319:20]

what get string is all about and what

[319:22]

printf is all about. That then is the

[319:24]

pre-processing step. What does compiling

[319:27]

technically mean? Compiling means taking

[319:30]

that pre-processed code, which again

[319:32]

looks a little something like this, and

[319:34]

converting it into something called

[319:36]

assembly code. And we won't spend much

[319:38]

time in this class on assembly code, but

[319:40]

this is how programmers used to write

[319:42]

code. Before there was C, before there

[319:44]

was Python and Java and all of these

[319:46]

other modern languages, programmers were

[319:48]

writing code like this. Before this

[319:50]

existed, they were programming zeros and

[319:52]

ones into the earliest of mainframe

[319:54]

computers using punch cards and other

[319:56]

technologies. Like literally sheets of

[319:57]

paper with holes in them. Not very fun.

[320:00]

Very tedious. So the world invented

[320:01]

this. Also not very fun, very tedious.

[320:04]

So the world invented C. Not that much

[320:06]

fun. So the world invented Python and so

[320:08]

forth. We continue to sort of evolve as

[320:10]

a species with code. But the compiler

[320:13]

technically takes your pre-processed

[320:14]

source code and converts it into

[320:16]

something that looks like this. Cryptic,

[320:18]

and that's to be expected. But there are

[320:19]

some familiar phrases. There's mention

[320:21]

of main. There's mention of get_string.

[320:23]

There's mention of printf. And there's

[320:25]

a bunch of other things: mov and push

[320:27]

and xor and call and these other

[320:29]

commands here. These are the assembly

[320:31]

instructions. Those are the lowest level

[320:33]

instructions that the CPU inside of a

[320:35]

computer understands. CPU is the central

[320:38]

processing unit. The thing by Intel or

[320:40]

AMD or Apple or other companies. Those

[320:43]

are the lowest level commands that the

[320:45]

actual hardware inside of the computer

[320:46]

understands. It's just nicer to be able to

[320:48]

write words like main and for and uh

[320:52]

printf than it would be to write these

[320:54]

much more arcane commands that you'd

[320:56]

have to look up in a manual. So

[320:58]

compiling just takes C code and makes

[321:00]

it a lower level type of code called

[321:03]

assembly. When I said a.out means

[321:06]

assembler output, that's why inside of

[321:08]

that file is essentially the output of

[321:10]

an assembler. All right, we're almost

[321:12]

there. What does it mean to assemble a

[321:14]

program, which is step three of the

[321:16]

compilation process? That means

[321:18]

converting assembly code to the actual

[321:21]

zeros and ones we keep talking about. So

[321:23]

if the file is called hello.c, when that

[321:25]

file is assembled, the assembly code

[321:28]

becomes the zeros and ones for your code

[321:30]

in hello.c. But your code is not

[321:33]

everything that composes your final

[321:34]

program. Your code from hello.c

[321:37]

has to be combined with code from CS50's

[321:41]

library and from the standard I/O library

[321:43]

that other humans wrote. I and the team

[321:45]

wrote the CS50 code. Other humans in the

[321:47]

world wrote the printf code in standard

[321:49]

I/O. So essentially the fourth and final

[321:52]

step is to link all of those zeros and

[321:54]

ones together. Somewhere on the server

[321:57]

there are not just the header files cs50.h

[321:59]

and stdio.h, but your code hello.c,

[322:03]

our code cs50.c, and the code that

[322:07]

contains printf's own implementation.

[322:10]

Bit of a white lie. It's technically not

[322:11]

called stdio.c, but the point

[322:14]

remains ultimately the same. So these

[322:16]

files have already been compiled for you

[322:18]

in advance. This is your code. What the

[322:21]

assembly process does is it converts all

[322:24]

of that into zeros and ones and then all

[322:27]

three chunks of zeros and ones are

[322:29]

linked together. So if you think back to

[322:32]

when I tried compiling the code without

[322:34]

-lcs50, there was some mention of a linker.

[322:37]

That error just means the computer did not

[322:39]

know how to link your code with CS50's

[322:43]

code because we were missing -lcs50, which

[322:46]

tells the compiler to go find it

[322:48]

somewhere on the hard drive. And the

[322:50]

final step then of linking is to combine

[322:52]

all of those zeros and ones into one

[322:54]

bigger blob of zeros and ones. And

[322:56]

that's what's inside your hello program

[323:00]

that you can execute. So long story

[323:02]

short, these four steps are what's been

[323:04]

happening ever since the start of last

[323:06]

week. Pre-processing, compiling,

[323:09]

assembling, and linking. But thankfully,

[323:11]

the world of programmers generally just

[323:13]

treats all four of these steps as what

[323:16]

we know now as compiling. It's just a

[323:19]

lot easier to say compile and not worry

[323:20]

about those lower level details. But

[323:22]

that might reveal better to you what all

[323:24]

of these error messages mean when you

[323:26]

see hints of this kind of terminology.

[323:31]

Questions on any and all of that? From

[323:34]

here on out, we're going to go higher

[323:35]

level than lower. Yeah.

[323:38]

>> I don't get the part with the, like,

[323:40]

when we're talking about com um when I

[323:44]

think it's the assembly process when you

[323:45]

basically convert it to zeros and ones.

[323:47]

>> Um doesn't like across the multiple like

[323:50]

the three different ones. Don't the

[323:51]

zeros and ones signify different things

[323:52]

like one signify text and the other

[323:54]

signify something else. How does the

[323:56]

computer know, like, what 8-bit chunk

[323:59]

corresponds to which part?

[324:00]

>> Really good question. How does the

[324:01]

computer know which of those zeros and

[324:03]

ones corresponds to data like numbers or

[324:06]

strings of text or actual commands?

[324:08]

We're going to come back to that in week

[324:09]

four of the class. But long story short,

[324:12]

what we just saw on the screen, a big

[324:13]

blob of zeros and ones, actually follows

[324:15]

some pattern where the bits up top

[324:17]

represent a certain functionality. The

[324:19]

bits on the bottom represent something

[324:21]

else and they're organized into

[324:22]

patterns. So, long story short, we'll

[324:24]

come back to that, but they follow

[324:25]

conventions. It's not just a hot mess of

[324:27]

like zeros and ones.

[324:29]

>> Other questions?

[324:30]

>> So the preprocessing step is just replacing

[324:32]

the hashtag?

[324:36]

>> Correct. The pre-processing step goes

[324:38]

into the header file and essentially

[324:40]

copies and pastes the contents of it into

[324:42]

your own code so you don't have to waste

[324:44]

time doing that manually yourself. Other

[324:47]

questions?

[324:47]

>> Just out of curiosity, when you're talking about

[324:49]

the compiling step um how it converts it

[324:52]

to assembly code and you're saying that

[324:54]

the CPU understands all those commands.

[324:57]

Is the CPU then converting that into

[325:01]

>> Uh, no. So when you compile your code,

[325:04]

you're going from the, uh, assembly code

[325:07]

to the zeros and ones that... sorry, uh, when

[325:10]

you compile, let me pull up the chart

[325:12]

again. When you compile your code, you're

[325:15]

going from the C code to the assembly

[325:18]

code and the patterns you get when you

[325:21]

see the assembly code are specific to a

[325:23]

certain CPU. So long story short, if

[325:25]

you're designing software for iPhones or

[325:27]

for Android devices or Macs or PCs,

[325:30]

you're going to necessarily use a

[325:31]

different compiler because given the

[325:33]

same C code, you will get different

[325:35]

assembly instructions in the output. And

[325:38]

this is why you can't just take back in

[325:39]

the day like a CD containing a program

[325:42]

from a Mac and run it on a PC or vice

[325:44]

versa because it's the wrong patterns of

[325:46]

instructions. But the reason why we have

[325:48]

all of these annoying layers of

[325:51]

complexity is because one, four

[325:53]

different people can now implement the

[325:54]

notion of compiling. Someone can

[325:55]

implement the pre-processor, someone can

[325:57]

implement the compiler, the assembler,

[325:58]

the linker, and you can actually

[325:59]

collaborate by breaking things down into

[326:01]

these quantized steps. But also you can

[326:04]

do this step, this step, and then two

[326:05]

different people can write compilers to

[326:07]

actually, uh, output assembly

[326:09]

code for like iPhones over here and

[326:11]

Android devices over here. But all of us

[326:13]

can still enjoy using the same language

[326:15]

up here. So there's a lot of reasons for

[326:17]

this complexity. Just understanding it

[326:19]

is useful, but you're not going to need

[326:20]

to use this sort of knowledge day to day,

[326:23]

but it's what enables so much of today's

[326:25]

complexity nonetheless. All right, so a

[326:28]

bit of a flourish now as to what we've

[326:30]

been doing with compiling. Well,

[326:32]

compiling is going ultimately from

[326:34]

source code to machine code. Couldn't

[326:36]

you just kind of reverse the process,

[326:37]

right? If someone wrote really

[326:39]

interesting software like Microsoft Word

[326:41]

or Excel or something like that, well,

[326:43]

when I buy it or download it, like I

[326:45]

literally have a copy of all of those

[326:46]

zeros and ones, couldn't I just kind of

[326:48]

reverse this process and reverse

[326:50]

engineer someone else's code by

[326:52]

decompiling it? And this is genuinely a

[326:55]

threat. And this comes up in matters of

[326:57]

law and intellectual property because

[326:59]

the zeros and ones have to be accessible

[327:01]

to you and to your computer. So, it's

[327:03]

not a great feeling if someone with

[327:04]

enough time and enough savvy could sort

[327:06]

of reinvent Microsoft Word by just

[327:08]

figuring out what all those zeros and

[327:10]

ones mean. However, it's sort of easier

[327:12]

said than done to reverse engineer code

[327:15]

from these zeros and ones. For instance,

[327:17]

this pattern of bits on the screen here

[327:19]

did what did we say last week?

[327:23]

Silly. No normal person should be able

[327:24]

to answer this, but I did say it before.

[327:27]

These zeros and ones print what?

[327:29]

>> It just prints out hello world. And I

[327:31]

cannot glance at that and figure it out

[327:33]

like off the top of my head. But if I

[327:35]

know what architecture, what CPU this

[327:38]

code has been compiled into and I pay

[327:40]

attention in week four and know what the

[327:42]

various layouts of the zeros and ones

[327:43]

are, I could painstakingly figure out

[327:46]

what each of those patterns of zeros and

[327:48]

ones means by breaking them into chunks

[327:50]

of 8 or 16 or 32 or 64, which are common

[327:53]

units of measure that I alluded to last

[327:55]

week. Now, that's going to take a crazy

[327:57]

amount of time. And the sort of

[327:59]

presumption is that if you are smart

[328:01]

enough and capable enough and have

[328:03]

enough free time to do that, it would

[328:04]

probably take you less time to just

[328:06]

implement Microsoft Word the normal way

[328:08]

and just rebuild the software. It's

[328:09]

going to take you more time to go in

[328:10]

reverse than it would in the so-called

[328:12]

forward direction. But there's other

[328:13]

subtleties as well. Inside of this code

[328:16]

is not only commands like print,

[328:18]

functions like printf, but suppose that

[328:20]

it contained a loop for instance to

[328:21]

print meow meow meow. Well, we know

[328:23]

already that you can use a for loop

[328:24]

sometimes or you can use a while loop,

[328:26]

but they're functionally equivalent.

[328:27]

It's sort of a stylistic decision which

[328:29]

one you use, whichever one you're more

[328:30]

comfortable with, or maybe feels a

[328:32]

little better designed, but you can't

[328:34]

figure out from the zeros and ones

[328:36]

whether or not it was a while loop or a

[328:38]

for loop, because it just results in the

[328:41]

same pattern of zeros and ones. It's

[328:43]

just a programmer's choice. Which is to

[328:44]

say, you can't even perfectly reverse

[328:46]

engineer everything because it's not

[328:48]

going to be obvious from the zeros and

[328:50]

ones what the source code originally

[328:52]

looked like. But again the bigger deal

[328:53]

breaker is if you have that much time

[328:55]

and energy and savvy just like

[328:57]

reimplement Microsoft Word itself. Don't

[328:59]

try to reverse the whole process which

[329:00]

is going to be much more painstaking and

[329:02]

time-consuming instead. Now this is not

[329:04]

true for all languages and just as a

[329:06]

teaser in a few weeks time when we talk

[329:07]

about web programming and another

[329:09]

language called JavaScript it turns out

[329:11]

that JavaScript source code is actually

[329:14]

sent from web servers to web browsers

[329:16]

and you can look at the source code of

[329:18]

any website on the internet: harvard.edu,

[329:20]

facebook.com, gmail.com, it's going

[329:23]

to be there. So, not all languages, it

[329:25]

turns out, are even compiled. Typically,

[329:28]

sometimes the source code is just

[329:29]

executed by the underlying computer. So,

[329:32]

we're just scratching the surface of

[329:33]

some of the implications of all this. In

[329:35]

a little bit of time, let's take a look

[329:36]

further under the hood at the actual

[329:37]

memory, solve some other problems, but I

[329:39]

think it's now time for Cheez-Its. So,

[329:40]

let's go ahead and take a 10-minute

[329:42]

break. Uh, snacks are now served. See

[329:44]

you in 10.

[329:46]

All right, we are back. And up until now

[329:48]

when we've been writing code, recall

[329:50]

that we have to specify like what type

[329:51]

of value you want to put in a variable.

[329:53]

Like that's why I had to go in and add

[329:54]

string before the word name in my first

[329:56]

bug today. But it turns out C, as we've

[329:58]

kind of seen already, has a whole bunch

[330:00]

of these data types. Um, I rattled these

[330:02]

off last week. Bool, int, long, float,

[330:04]

double, char, string. But we'll consider

[330:06]

for a moment just how much space each of

[330:08]

these things takes up and see if we

[330:10]

can't help you see what the debugger was

[330:12]

seeing earlier. That is what is where in

[330:14]

memory. So, a bool, it turns out,

[330:16]

actually takes up one byte, which is

[330:18]

kind of stupid because technically a

[330:19]

bool, true or false, really only needs

[330:21]

one bit. It just turns out that it's

[330:24]

more efficient and easier to just use a

[330:25]

whole byte, eight bits, even though

[330:27]

seven of them are effectively unused.

[330:28]

So, a bool will take up one byte, even

[330:30]

though it's just true and false. An int

[330:32]

recall uses four bytes. So, if you want

[330:34]

to count really high with an int, the

[330:36]

highest you can go is roughly 4 billion,

[330:38]

we've claimed, unless you want to

[330:39]

represent negative numbers, in which

[330:41]

case the highest is like 2 billion,

[330:42]

because if you want to be able to count

[330:43]

all the way down to negative two

[330:45]

billion, you got to kind of split the

[330:46]

difference. A long meanwhile is twice

[330:48]

that. It uses eight bytes which is

[330:50]

roughly nine quintillion possibilities

[330:53]

which is quite a few more than 4

[330:55]

billion. Um that is if you want to

[330:57]

include negative numbers as well. Then

[330:58]

we had floats which were real numbers

[331:00]

with decimal points which speak to just

[331:02]

how precise you can be with significant

[331:04]

digits. A float is four bytes by

[331:06]

default, but a double gives you twice as

[331:08]

many bits to play with, which

[331:10]

lets you be more precise. Even

[331:12]

though at the end of the day, whether

[331:13]

you're using floats or doubles, floating

[331:15]

point imprecision, as we've seen, is a

[331:18]

fundamental problem for scientific,

[331:20]

financial, and other types of computing

[331:22]

where precision is ever so important. A

[331:25]

char meanwhile, at least as we've seen

[331:26]

it, is a single byte using ASCII

[331:28]

characters specifically. And then string

[331:30]

I'll put as a question mark because a

[331:32]

string totally depends on its length. If

[331:34]

you're storing hi, that's like one,

[331:36]

two bytes. If you're storing hello,

[331:37]

that's like five bytes and so forth. So,

[331:40]

strings depend on how many characters

[331:42]

you actually want to store inside of

[331:44]

them. So, where does this go? Well, here

[331:46]

is a picture of a stick of memory, uh, a

[331:49]

DIMM, so to speak, whereby on this, uh,

[331:52]

stick of memory, which is slid into your

[331:53]

computer, your laptop, your desktop, or

[331:55]

some other device, there's all these

[331:56]

little black chips that essentially

[331:58]

contain lots of room for zeros and ones.

[332:00]

It's somehow electronic, but inside of

[332:02]

there are all of the zeros and ones that

[332:04]

we can uh store data in. So, if we kind

[332:07]

of zoom in on this, it stands to reason

[332:09]

that for the sake of discussion, if this

[332:11]

one chip represents like one gigabyte, 1

[332:14]

billion bytes, it stands to reason that

[332:16]

we could slap some addresses on these

[332:18]

bytes whereby we could say this is the

[332:20]

first byte and this is the last byte or

[332:22]

more precisely, this is byte 0, 1, 2, 3, dot

[332:26]

dot dot, byte 1 billion. And it doesn't

[332:28]

matter if it's top, down, left, right,

[332:29]

or uh any other order. We're just

[332:31]

talking about this conceptually at the

[332:33]

moment. So in fact, let's go ahead and

[332:34]

draw this really as a grid of memory, a

[332:36]

sort of canvas that we can just use to

[332:38]

store types of data like bools and ints

[332:41]

and chars and floats and everything

[332:42]

else. If we are going to use one byte to

[332:44]

store like a char, well, you might use

[332:47]

just these eight bits up here, one byte

[332:49]

up here. If you want to store an int,

[332:51]

well that's four. You might use all four

[332:53]

of these bytes necessarily contiguous.

[332:55]

You can't just choose random bits all

[332:57]

over the place. When you have a four

[332:59]

byte value like an int, they're all

[333:00]

going to be contiguous back to back to

[333:02]

back in memory like this. But if you got

[333:04]

a long or a double, you might use eight

[333:06]

bytes instead. So truly, when you store

[333:09]

a value in memory, whether it's a little

[333:10]

number or a big number, all you're doing

[333:12]

is using some of the zeros and ones

[333:15]

physically in the computer's hardware

[333:17]

somewhere and letting it permute them,

[333:19]

turn them on and off to represent that

[333:21]

value you're trying to store. All right,

[333:24]

so let's go ahead and abstract away from

[333:25]

the hardware though and let's just start

[333:27]

to think of this grid of memory uh sort

[333:29]

of in zoomed in form and consider more

[333:31]

at a lower level what is actually being

[333:33]

stored inside of here. For instance,

[333:35]

suppose that we've got some code like

[333:37]

this containing three scores on like

[333:39]

problem sets. You got a 72 on one of

[333:41]

them, a 73 on another, and a 33 on the

[333:43]

third. I've deliberately chosen our old

[333:45]

friends 72, 73, 33, which, recall, spell "HI!"

[333:50]

or together in the context of colors is

[333:52]

like a shade of yellow just so that

[333:53]

we're not adding some new random numbers

[333:55]

to the mix. These are our old friends

[333:57]

three integers. Well, let's use these in

[333:59]

a program. Let me go over to VS Code

[334:00]

here and let me create with code a

[334:03]

program called scores.c. That's just

[334:05]

going to let me quickly calculate my

[334:06]

average score on my problem sets. I'm

[334:09]

going to go ahead and include as we

[334:11]

often do, stdio.h at the top. I'm

[334:13]

going to do int main(void) after that.

[334:16]

And then inside of my curly braces, I'm

[334:18]

going to do exactly those sample lines

[334:20]

of code. My first score uh was let's say

[334:24]

a 72, my second score was 73, and my

[334:28]

third score was 33. So I've declared

[334:31]

three variables, one for each of my

[334:32]

problem set scores. Now let's calculate

[334:34]

the average. So printf, quote unquote,

[334:37]

average colon just so I know what I'm

[334:39]

printing. And now I'm going to go ahead

[334:40]

and use maybe percent i, backslash n.

[334:45]

And then what I'm going to pass in is a

[334:47]

bit of math. So to compute an average,

[334:48]

it's just score1 plus score2 plus

[334:51]

score3, divided by three. And I put the

[334:55]

scores, the numerator, in parentheses, just

[334:56]

like in grade school like I need to do

[334:58]

that operation first before doing the

[335:01]

division. So just like math class

[335:02]

semicolon at the end to finish my

[335:04]

thought. Let's see how this goes. Make

[335:06]

scores, Enter, ./scores, and it

[335:09]

would seem that my average across these

[335:11]

three problem sets is 72

[335:15]

which is great, but I don't think

[335:18]

that's actually what I want here. What

[335:23]

have I done wrong? It's unintentional.

[335:26]

Yeah.

[335:29]

>> Yeah. I'm kind of being a little

[335:30]

generous with myself here. I didn't

[335:31]

really factor in my worst score. So that

[335:33]

was accidental. So now let me do this

[335:35]

correctly. make scores, ./scores,

[335:38]

and now, okay, my average is 59, but I

[335:41]

beg to differ I'd like to quibble my

[335:43]

score technically I think mathematically

[335:44]

should really be 59 and a third. I'm kind of

[335:47]

being cheated of that third of a

[335:48]

point so what's going on here why am I

[335:50]

only seeing 59 and not my full grade

[335:54]

>> you're using so

[335:57]

it's going to

[335:59]

>> perfect because I'm using integers when

[336:00]

I divide by three it's going to truncate

[336:02]

everything after the decimal point which

[336:03]

we touched on at the very end of week

[336:05]

one, which is an issue with just

[336:06]

truncation in general. So, one approach

[336:08]

to fix this, I could change my percent i

[336:10]

to percent f, which is the format code,

[336:12]

it turns out, for a float, and that is

[336:14]

what I want to print. So, let's see if

[336:16]

that fix alone is enough. Make scores.

[336:18]

Oops, it's not. I got ahead of myself

[336:21]

there. And let me scroll up to the

[336:22]

error. Format specifies double, but the

[336:25]

argument has type int. Turns out you can

[336:27]

use percent f for doubles as well. So,

[336:29]

that's why I'm saying double, even

[336:30]

though I intended a float in this case.

[336:32]

So, there's a problem here. The

[336:34]

argument has type int even though I'm

[336:37]

passing in percent f. You're seeing

[336:39]

mention of percent d here which is an

[336:41]

alternative to percent i. We typically

[336:43]

encourage you to use percent i because i

[336:45]

for integer, but that is not

[336:47]

the solution to this problem because I

[336:48]

want my third of a point back. So how

[336:51]

could I go about fixing this? Well the

[336:52]

fundamental problem here is that I'm

[336:54]

trying to format an integer as a float

[336:57]

or even as a double. Well I need to

[336:59]

convert these scores to floats instead.

[337:02]

So, I could go in and change this to

[337:04]

float, this to float, this to float, and

[337:07]

heck, just to be super precise, I could

[337:08]

add a .0 on the end of each of them just

[337:11]

to make super clear these are floats.

[337:13]

But there's another way. I could, for

[337:15]

instance, uh, simply convert my

[337:18]

denominator to 3.0 because it turns out

[337:21]

so long as you involve like one float in

[337:24]

your math, the whole thing is going to

[337:26]

get promoted, so to speak, to floating

[337:28]

point values instead of integers. I

[337:30]

don't have to convert all of them. So I

[337:31]

think now if I do make scores,

[337:33]

./scores, now, ah, there's my

[337:36]

third of a point back.

[337:40]

There's another way to do this just as

[337:41]

an aside and we'll see this again down

[337:42]

the line if you really want to stick

[337:44]

with three, because it's a little weird, just

[337:47]

semantically to divide by 3.0 like

[337:50]

that's an implementation detail but

[337:51]

you're truly computing an average of

[337:53]

three things. You can technically cast

[337:55]

the three to a float. In parentheses, you

[337:59]

can specify the data type that you want

[338:00]

to convert another data type to. And

[338:04]

this too should make the compiler happy.

[338:06]

Aha. ./scores. I get roughly the same

[338:09]

answer. We're seeing some floating-point

[338:10]

imprecision though nonetheless. But that

[338:13]

too would achieve the goal here. But

[338:15]

in short, that's all just a function of

[338:18]

floating point arithmetic there. So

[338:20]

what's going on now actually in the

[338:23]

computer's memory? Let me revert back to

[338:24]

the simpler one with just 3.0 there. And

[338:27]

let me propose that we consider where

[338:28]

these three things are in memory. Well,

[338:30]

if we treat this as my grid or canvas of

[338:33]

memory, who knows where they're going to

[338:34]

end up? But for the sake of discussion,

[338:36]

let's assume that 72 ended up in the top

[338:38]

left of my computer's memory. I've drawn

[338:40]

it to scale, so to speak, and that this

[338:42]

score1 variable is clearly taking up

[338:44]

four bytes of memory, and it's an int.

[338:47]

And that's typically how many bytes are

[338:49]

used on systems. Technically, it depends

[338:51]

on the exact system you're using, but

[338:52]

nowadays it's pretty reasonable to

[338:54]

assume that an integer will be 32 bits

[338:57]

on most modern systems. score2 is

[338:59]

probably over there. score3 is probably

[339:01]

over there. So, I'm using 12 bytes

[339:04]

total, four bytes for each of these

[339:06]

values. All right, so that's really all

[339:08]

that's going on underneath the hood. I

[339:09]

don't have to worry about this. The

[339:11]

compiler essentially figured out for me

[339:13]

where to put all of these things in

[339:14]

memory. But what really is in memory?

[339:16]

Well, technically each of these

[339:18]

variables, if it's composed

[339:20]

of 32 bits is really just a pattern of

[339:22]

literally 32 zeros and ones. And I

[339:24]

figured out the pattern here. I crammed

[339:26]

them all into the space there. But you

[339:28]

see here three patterns of 32 bits which

[339:31]

collectively compose those numbers

[339:33]

there. But let's consider design now in

[339:36]

terms of my code. This gets the job

[339:38]

done. It's not that bad or big of a deal

[339:40]

for just calculating the average of

[339:42]

three scores. But this should also start

[339:44]

to rub you the wrong way, this week

[339:46]

onward, when it comes to design. Like, this

[339:48]

is correct, especially now that I

[339:50]

clawed back my third of a point, but

[339:54]

this is bad design using the variables

[339:56]

in this way why might you think

[340:00]

yeah

[340:01]

>> you're going to have to type in each

[340:02]

score manually assign variable

[340:04]

individually

[340:06]

>> yeah I'm going to have to type in each

[340:07]

score manually with each passing week

[340:09]

when I get the fourth problem set and

[340:10]

the fifth I mean surely people who came

[340:13]

before us came up with a better way to

[340:15]

solve this problem than like manually

[340:16]

create 10 variables, 20 variables,

[340:18]

whatever it is by the end of the

[340:20]

semester. It just feels a little sloppy.

[340:22]

And indeed, that's often the the way to

[340:24]

think about the quality of something

[340:25]

that's designed. Think about the

[340:26]

extreme. If you don't have three scores,

[340:28]

but 30 or 300, is this really going to

[340:31]

be the best way to do it? And if you

[340:32]

feel like, no, no, there's got to be a

[340:34]

better way, odds are there is.

[340:36]

Certainly, if the language itself is

[340:37]

well designed, so let's consider how

[340:39]

else we might go about solving this.

[340:41]

Well, it turns out we can treat our

[340:43]

canvas of memory, that grid of bytes,

[340:46]

as chunks of memory known as

[340:49]

arrays. An array is a chunk of

[340:52]

contiguous memory back to back to back

[340:54]

whereby if you want to store three

[340:56]

things, you ask the computer for a chunk

[340:58]

of memory for three things. If you want

[341:00]

30, you ask for one chunk of size 30. If

[341:02]

you want even more, you ask for a chunk

[341:04]

of size 300. Chunk is not a term of art.

[341:06]

I'm just using it to colloquially explain

[341:08]

what an array actually is. It's a chunk

[341:10]

or a block of memory that is back to

[341:12]

back to back to back. So what does this

[341:15]

mean in practice? Well, it means that we

[341:17]

can introduce a little bit of new syntax

[341:18]

in C. If I want to create one variable

[341:21]

instead of three and certainly one

[341:24]

variable instead of 30, I can use syntax

[341:26]

like this. Hey compiler, give me a

[341:29]

variable called scores plural. Give me

[341:33]

room for three integers therein. So,

[341:35]

it's a little bit of a weird syntax, but

[341:38]

you specify the type of all of the

[341:40]

values in the array. You specify the

[341:42]

name of the array, scores in this case,

[341:43]

and I pluralized it just semantically

[341:45]

because it makes more sense than calling

[341:46]

it score now. And then in square

[341:48]

brackets, so to speak, you specify how

[341:50]

many integers you want to put into that

[341:53]

chunk of memory. So, this one line of

[341:55]

code now will essentially give me 12

[341:58]

bytes automatically, but they'll all be

[342:01]

referable by the name scores plural. So,

[342:04]

let's go ahead and weave this into some

[342:06]

code as follows. Let me go back to VS

[342:09]

Code here, clear my terminal, and now

[342:11]

let's just whip up the same kind of

[342:12]

program, but get rid of these three

[342:14]

independent variables. And instead,

[342:15]

let's go ahead and just say int scores

[342:18]

plural bracket three. Now, I need a way

[342:21]

to initialize the three values. But this

[342:23]

I can do too. It turns out that if I

[342:25]

want to put three values in this, I just

[342:27]

need slightly new syntax. I can say

[342:28]

scores bracket 0 equals 72, scores

[342:32]

bracket 1 equals 73 scores bracket 2

[342:35]

equals 33 so it's not all that different

[342:37]

from having three variables but now I

[342:39]

technically have one variable and I am

[342:41]

indexing into it at different locations

[342:44]

location 0, 1, and 2. And it's zero

[342:47]

because we always in computing start

[342:49]

counting from zero. So scores

[342:51]

bracket zero is going to be my 72

[342:53]

problem set scores bracket one is my 73

[342:56]

problem set and scores bracket two was

[342:57]

my weakest, my 33 pset. Now my

[343:01]

syntax down here has to change because

[343:03]

there are no more score1, score2,

[343:05]

score3 variables, but there are

[343:07]

scores bracket zero plus scores bracket

[343:09]

one plus. And notice what VS Code is

[343:11]

trying to do for me. It's saving me some

[343:13]

keystrokes. As I type in scores and type

[343:15]

one single bracket, notice it finishes

[343:18]

my thought for me and magically puts the

[343:19]

cursor where I want it so I can put the

[343:22]

two right there and generally save on

[343:24]

keystrokes. But that has nothing to do

[343:25]

with C. just has to do with VS Code

[343:27]

trying to be helpful. So I think now

[343:30]

if I go down here and do make scores,

[343:33]

./scores, we get the same answer, but

[343:35]

it's arguably better designed because I

[343:37]

now have one variable instead of three,

[343:40]

let alone many more. And in fact, if I

[343:42]

wanted to change the total number of

[343:44]

scores, I can just change what's in that

[343:46]

initial square bracket. So if we

[343:48]

consider what's going on now, if we look

[343:50]

at the computer's memory, it's the same

[343:52]

exact layout, but there's no more three

[343:54]

variable names. There's one scores

[343:56]

bracket zero, scores bracket one, and

[343:59]

scores bracket two. And notice here,

[344:00]

ever more important, an array's values

[344:03]

are indeed contiguous back to back to

[344:06]

back. Now, the screen is only so wide.

[344:08]

So, they kind of wrap around to the next

[344:10]

row of bytes, but the computer has no

[344:12]

notion of up, down, left, right. I mean,

[344:13]

it's just a piece of hardware that's got

[344:15]

lots of available bytes that can be addressed

[344:18]

from the first byte all the way down to

[344:20]

the last byte. The wrapping is just a

[344:21]

visual artifact on this here screen. All

[344:24]

right. So if I've done this now, maybe

[344:26]

we can make this program a little more

[344:28]

dynamic than just hard-coding in my

[344:29]

scores. Let me go in and add the CS50

[344:32]

header library so that we could also use

[344:34]

for instance, like, get_int and start

[344:36]

getting these scores dynamically. So I

[344:38]

could do get_int and I could prompt the

[344:40]

user for a score. I could use get_int

[344:43]

again and I can prompt the user for

[344:45]

another pset score. I can use get_int

[344:47]

a third time and prompt the user for a

[344:50]

third such score. And then pretty much

[344:53]

the rest of my code can stay the same.

[344:55]

Let's do make scores again,

[344:57]

./scores, 72, 73, 33. And now my

[345:01]

program's a little more interactive.

[345:02]

Like this doesn't work for just my three

[345:03]

scores. It could work for anyone's scores

[345:05]

in the class. Now this too hints of bad

[345:07]

design. I like my introduction of the

[345:09]

array because I now have one variable

[345:11]

instead of three. But what now might rub

[345:14]

you the wrong way among lines 7, 8,

[345:17]

and 9? Yeah.

[345:20]

>> It's repetitive. I mean, I typed it

[345:22]

manually, but I might as well have just

[345:23]

copied and pasted like literally the

[345:25]

same thing. So, what's a candidate for

[345:27]

fixing this? Like, what programming

[345:29]

construct might clean this up? Yeah,

[345:32]

>> yeah, we could use a for loop or a while

[345:34]

loop or whatever, but a for loop would

[345:35]

get the job done. And that's often my

[345:37]

go-to. So, let's do that instead. Let's

[345:39]

go under my declaration of the array and

[345:41]

do for int i = 0, i less than 3, i++,

[345:46]

which we keep seeing again and again.

[345:48]

Uh, now how do I index into the array at

[345:51]

the right location? Well, here's where

[345:52]

the square brackets are kind of

[345:54]

powerful. I can just say my scores array

[345:57]

at location i should get an int

[346:01]

from the user as follows. So now I'm

[346:04]

using get_int once inside of a loop, but

[346:07]

because i keeps getting incremented, as

[346:09]

we've done many a time now for meowing

[346:10]

and other goals, I'm putting the first

[346:12]

one at location zero. Why? Because i is

[346:15]

initialized to zero. I'm putting the

[346:17]

second one at location one. Why? Because

[346:19]

I'm going to plus plus, or increment, i on the

[346:22]

next iteration, then the next iteration.

[346:24]

So, this has the ultimate effect of

[346:26]

putting these three scores at location

[346:27]

zero, one, and two instead of me having

[346:31]

to type all of that out manually. Now, I

[346:34]

don't love how I've done this still. If

[346:36]

we really want to nitpick, this solves

[346:38]

the problem correctly, but it's kind of

[346:42]

got a poor design decision still. It's

[346:45]

got a magic number, as people say. What

[346:48]

is the magic number here and why is it

[346:50]

bad?

[346:51]

Yeah, over here.

[346:58]

>> Yeah, it was a little soft, but I think

[346:59]

the number three is hardcoded in two

[347:01]

places. We've got it on line six, which

[347:03]

is the size of the array, and then again

[347:05]

on line seven, which is how many times I

[347:07]

want to iterate. But those are the exact

[347:09]

same concepts, but it's on the honor

[347:12]

system that I type the number three

[347:13]

correctly both times. So, I think we can

[347:15]

fix this a little better. I could do

[347:17]

something like int n equals 3 and then I

[347:21]

could use n here and then I could use n

[347:24]

here so that now I only change it in one

[347:27]

place. If your eyes are wandering to the

[347:29]

bottom of the program, there's still a

[347:30]

problem here because I've still

[347:32]

hardcoded 0, one, and two, but we'll

[347:33]

come back to that. But this is arguably

[347:35]

a little better. But let's talk a little

[347:36]

bit about style. Typically, when

[347:38]

you've got

[347:40]

a variable that should not change its

[347:42]

value, we saw last week that we should

[347:44]

declare it as constant and the trick

[347:46]

there is to literally just write const

[347:48]

for short in front of the type of the

[347:50]

variable and now it should not be

[347:52]

changeable by you by a colleague a

[347:54]

collaborator or the like but typically

[347:56]

too by convention stylistically to make

[347:59]

visually clear to another programmer

[348:01]

that this is a constant it's convention

[348:03]

also to capitalize constants so to

[348:05]

actually use like a capital N here in

[348:07]

all places just to make clear visually

[348:10]

that there's something interesting about

[348:12]

this variable and indeed it is a

[348:15]

constant that cannot be changed. All

[348:17]

right, with that refinement, I don't

[348:19]

think we've really improved the program

[348:21]

fundamentally. I think we're going to

[348:22]

need to do a bit more work to do this

[348:24]

really well. So, I'm going to do this a

[348:26]

little quickly, but mostly to make the

[348:27]

point that we can make this indeed more

[348:29]

dynamic. So, let me hide my terminal

[348:30]

window there. Let me go ahead now and

[348:33]

get the scores as I already am as

[348:36]

follows here. And let me go ahead and

[348:40]

uh assume for the sake of

[348:44]

time that we have a function that exists

[348:47]

already called average and I simply want

[348:49]

to pass in to that average function the

[348:52]

scores whose average I want to

[348:54]

calculate. So average does not exist off

[348:56]

the shelf like I can't just use an

[348:58]

existing library for it. I'm going to

[348:59]

have to implement this thing myself. But

[349:02]

how? All right. Well, let's go ahead and

[349:04]

do this. At the top of my file, I'm

[349:06]

going to go ahead and compute or define

[349:09]

a function called average uh that takes

[349:13]

in what? An array of numbers. So, this

[349:17]

syntax is going to be a bit new, but the

[349:18]

way I do this is int, say, array, brackets,

[349:22]

or array sounds a little too

[349:24]

generic. Let's just call it numbers for

[349:26]

instance here. So that says my average

[349:29]

function is going to take as an argument

[349:32]

an array of numbers. This average

[349:34]

function though should return a value

[349:36]

too. And it should return what type of

[349:38]

value from what we've seen thus far?

[349:42]

A number, a float specifically. It could

[349:44]

be int. But then I'm going to get

[349:45]

shortchanged my third of a point, potentially.

[349:47]

So I think I want it to return a float.

[349:48]

Or if you really want precision, you

[349:50]

could return a double just to be really

[349:51]

nitpicky. But that seems excessive here.

[349:54]

All right. Well, now inside of my

[349:55]

average function, how can I calculate

[349:57]

the average? Well, this is just kind of

[349:58]

like a math thing. So, I could declare a

[350:01]

variable called sum and set it equal to

[350:03]

zero. I could then have a for loop

[350:05]

inside of this function for int i gets

[350:08]

zero, i less than, huh? Uh, I'm going to

[350:11]

come back to this the number of numbers

[350:13]

in the array. And then I'm going to do

[350:15]

i++. And then on each iteration, I'm

[350:17]

going to do sum equals whatever the

[350:19]

current sum is plus whatever is in the

[350:22]

numbers array at that location. So I'm

[350:26]

going a little quickly, but again, I'm

[350:27]

just applying the same lesson learned.

[350:29]

Numbers is my array. Numbers bracket i

[350:32]

means go to the i-th location in there. But

[350:34]

if my loop starts at zero, that means go

[350:36]

to location zero and then one and then

[350:37]

two. And heck, if there's more scores in

[350:39]

this array, it's just going to keep

[350:40]

going on up from there because of the

[350:43]

plus plus. But I hesitated here for a couple

[350:45]

of reasons. So I put a to-do here, which

[350:47]

is not a thing. That's a note to self.

[350:49]

How far do I iterate? Well, if you've

[350:51]

come into CS50 with programming experience

[350:53]

before, you can usually just ask an

[350:54]

array, aka a vector, what its length is

[350:56]

in Java and in Python and the like. You

[350:59]

can't do that in C. So if I want to know

[351:01]

what the length is of this array, I've

[351:04]

got to have the function tell me. So I'm

[351:07]

going to additionally propose that this

[351:09]

average function can't just take the

[351:11]

array. It's also going to have to take

[351:13]

another argument, a second input, for

[351:15]

instance, called length that tells me

[351:18]

how long it is. And then down here,

[351:20]

which is where we started the story,

[351:22]

when I use this so-called average

[351:24]

function, I'm going to have to tell the

[351:26]

average function by passing in n how

[351:29]

many numbers are in that array, just

[351:31]

because this is annoying that you have

[351:33]

to pass in not only the array, but also

[351:35]

its size separately. That's the way it's

[351:37]

done in C. More recent languages have

[351:39]

improved upon this. So you can just

[351:41]

figure out what the length of the array

[351:42]

is as we'll see in a few weeks in

[351:44]

Python. All right, back to the average

[351:47]

function at hand. I think we're almost

[351:48]

there. This is a little unnecessarily

[351:50]

verbose. Recall that we can tighten this

[351:52]

up by just doing plus equals whatever is

[351:56]

in numbers bracket i. That's just

[351:57]

tightening it up. It's syntactic sugar,

[351:59]

so to speak. And then the last thing I'm

[352:01]

going to do in my average function is

[352:03]

what? Actually calculate the average. So

[352:05]

what is the average? It's just the

[352:06]

numerator. like the sum of all of the

[352:08]

scores divided by the total number of

[352:10]

all of the scores. Well, I've got the

[352:12]

sum. So, I think I just want to do sum

[352:15]

divided by what to get the actual

[352:18]

average now?

[352:20]

>> Yeah.

[352:23]

>> Exactly. Sum divided by length will give

[352:25]

me the average because the sum is the

[352:27]

numerator effectively all of the scores

[352:29]

added together and the denominator is

[352:30]

the length. How many numbers were there

[352:33]

actually? Now, I can't just write this

[352:34]

math expression here. If this is going

[352:36]

to be my function's return value, and

[352:39]

we've done this once or twice before, I

[352:41]

literally say in my average function,

[352:43]

return this value. So, it hands back the

[352:45]

work. I could use printf and just print

[352:47]

it on the screen, but I don't want that

[352:48]

visual side effect. I want to hand it

[352:50]

back so that on line 23, I can simply

[352:55]

calculate the average of those n scores

[352:58]

and let printf use it as the value of

[353:01]

that format code percent f.

[353:05]

All right. I think we are in

[353:08]

reasonably good shape. Let me cross my

[353:11]

fingers now and hope I didn't screw this

[353:13]

up. make scores. Okay. ./scores.

[353:17]

How many do we want to do? So we'll do

[353:19]

72 73 33. Enter. And there is Oh, so

[353:24]

close. Average.

[353:27]

I've had a regression. I've made the

[353:29]

same mistake again just in a different

[353:31]

way. I think I saw your hand go up. Why

[353:34]

am I getting 59 and I'm not getting my

[353:37]

third of a point?

[353:45]

>> Yeah, I in this return line on line 11.

[353:48]

Right now, I'm again stupidly doing

[353:50]

integer divided by integer. That will

[353:52]

make us suffer from integer

[353:54]

truncation because if you're returning

[353:55]

an integer, there's no room for the

[353:57]

decimal point or any numbers thereafter.

[353:58]

So, how do we fix this? Well, I could

[354:01]

change the sum to float. like that would

[354:03]

be reasonable. So then I do a float

[354:05]

divided by the length. I could do my

[354:07]

casting trick, like convert the

[354:09]

length to a float just for the sake of

[354:12]

floating point arithmetic. There's a

[354:13]

bunch of ways to solve this but I think

[354:15]

I'll go with this one. Now let me now do

[354:17]

make scores again, ./scores, 72, 73, 33, and

[354:22]

now I've got albeit with some

[354:23]

imprecision I think enough precision

[354:25]

certainly for like a college grade in

[354:27]

this case 59.33

[354:30]

and so forth. Okay. So what are the

[354:32]

things to actually care about here? So

[354:34]

there's a decent amount of code here.

[354:36]

Most of it is sort of stuff we've seen

[354:38]

before, but the interesting parts I

[354:39]

would propose are this. When you create

[354:42]

your own function that takes an array as

[354:44]

input, you have to take as input the

[354:47]

length of the array. You're not going to

[354:48]

be able to figure it out directly, as

[354:50]

in newer languages. You also need, of

[354:53]

course, to pass in the array itself. How

[354:55]

do you pass in an array? Well, when

[354:57]

you're defining the function, you

[354:58]

specify the type of values in the array.

[355:01]

whatever you want to name the array

[355:02]

inside of this function and then you use

[355:04]

empty square brackets like this. You

[355:06]

don't have to put n or some other number

[355:08]

there. All you need to tell the compiler

[355:10]

is that my average function is going to

[355:12]

take some array of values. Specifically

[355:15]

how many, you don't put inside the

[355:17]

square brackets there. Then when I use

[355:19]

it now it's just the now familiar syntax

[355:22]

when you want to index into your array

[355:23]

that is go to location zero or one or

[355:25]

two you just use square bracket notation

[355:27]

here. But the array itself, recall, was

[355:30]

actually created in main when I did

[355:32]

this line of code here where I said,

[355:34]

give me an array called scores, each of

[355:36]

whose values is going to be an int, and

[355:38]

I want this many of them. And so maybe

[355:41]

the final flourish that I'll add here,

[355:43]

just to be sort of nitpicky, is I keep

[355:45]

saying that main should really go at the

[355:47]

top. Fine, no big deal. Let me highlight

[355:49]

my average function, move it to the

[355:51]

bottom of my file just because, and then

[355:54]

and only then I'll copy and paste that

[355:56]

first line, the so-called prototype, so

[355:58]

that Clang doesn't freak out by not

[356:00]

knowing what the average function is. So

[356:03]

in short, there's seemingly a bunch of

[356:05]

complexity here, but the only

[356:07]

thing that's really new in this one

[356:09]

example is this is how you pass to a

[356:11]

function an array that already exists

[356:14]

elsewhere, not just by its name, but with

[356:16]

the square brackets there.

[356:18]

Okay,

[356:20]

questions on arrays or any of this new

[356:23]

syntax? Yeah,

[356:25]

>> a bit slow, but

[356:28]

back when you did the whole like average

[356:30]

thing,

[356:30]

>> okay,

[356:31]

>> you said that we could store it as a

[356:33]

float

[356:34]

>> and instead of saying 3.0 was a float,

[356:36]

you just said because 3.0 is a float.

[356:39]

How does it know it's not a double?

[356:41]

>> Oh, uh, how does it know it's not a

[356:43]

double? So, by default, if you just type

[356:45]

a number like 3.0 into your code,

[356:48]

it will be assumed to be a double just

[356:51]

because um raw values, literal numbers

[356:54]

with a decimal point will be treated by

[356:56]

the compiler as doubles and be allocated

[356:58]

64 bits.

[357:00]

>> So how come you still do percentage?

[357:02]

>> Uh uh just because like the world did

[357:05]

not need to create a new format code

[357:06]

like, percent d is not double. Percent d

[357:09]

is decimal integer but don't worry about

[357:11]

that. We tend not to talk about it too

[357:13]

much in class. Percent I is integer.

[357:14]

Percent F is float. But percent F is

[357:18]

also double. And this is not consistent

[357:20]

because what's a long? Percent l... l i.

[357:25]

What did I say last week? Percent LI

[357:27]

gives you a long integer. It's just a

[357:30]

mess. There's no good reason for

[357:32]

this other than historical baggage.

[357:34]

>> Thank you.

[357:35]

>> Sure. I'm not sure if that's reassuring,

[357:37]

but All right. So,

[357:40]

um

[357:42]

Okay. Let's use this knowledge for

[357:44]

like something useful now and actually

[357:46]

tease apart how we can use

[357:49]

these skills for good and to

[357:51]

better understand what's going on inside

[357:53]

of the computer as follows. Let me go

[357:55]

over to our grid of memory and this time

[357:57]

let's not store some numbers but let's

[357:58]

store like these three lines of code

[358:00]

these three variables. So three chars

[358:02]

even though you know where this is

[358:03]

going like this is not good design

[358:04]

because I got three stupidly named

[358:06]

variables C1 C2 C3 but let's make a

[358:08]

point first. The first variable's value

[358:10]

is quote unquote H. Second is I. Third

[358:13]

is exclamation point. Why though am I

[358:15]

using single quotes suddenly instead of

[358:16]

double quotes?

[358:18]

>> It's a character. Chars are single

[358:20]

quotes. Strings are double quotes. And

[358:23]

we'll see the distinction why in a

[358:24]

moment. So for instance, if this is my

[358:25]

grid of memory and this program contains

[358:27]

just three variables, each of them a

[358:28]

char. Odds are they'll end up like this

[358:31]

in memory: c1, c2, c3, H, I, exclamation

[358:34]

point. Assuming there's nothing else

[358:35]

going on in my program, they're just

[358:37]

going to end up being back to back to

[358:38]

back in this way, even though it might

[358:40]

not always be this way. So what does this

[358:44]

really mean is going on? Well, let's go

[358:46]

ahead and poke around. Let me go back to

[358:48]

VS Code here. Let's close scores.c

[358:51]

reopen my terminal and let's create a

[358:52]

new program called hi.c and just do

[358:55]

something playful. So let me include

[358:57]

stdio.h at the top. Let me do int

[358:59]

main void after that. And inside of my

[359:01]

curly braces, let's just repeat this. Char c1

[359:03]

equals H in caps. Char C2 equals I in

[359:08]

caps. and then char C3 equals

[359:10]

exclamation point, uh, just an

[359:13]

exclamation point. That's all. Now,

[359:15]

let's actually poke around and see

[359:18]

what's inside the computer's memory. So,

[359:19]

I could do something like this. I could

[359:21]

printf, for instance, percent c, percent c,

[359:27]

percent c, backslash n. And percent c, it turns out,

[359:29]

means character. So, what do I want to

[359:31]

plug in? C1, C2, and C3 semicolon. So,

[359:35]

let's go ahead and do this. make hi.

[359:37]

Enter. ./hi, and voila, there's my

[359:41]

hi exclamation point. There's no magic

[359:42]

here. Like I'm literally just printing

[359:44]

out three char variables. I don't

[359:46]

need the spaces. If I want to get rid of

[359:48]

those spaces between the word, I can

[359:50]

remake this. make hi, ./hi. And

[359:53]

now we're back in business. hi

[359:54]

exclamation point. But here's where an

[359:57]

understanding of types can give you a

[359:58]

bit of power and sort of satiate some

[360:00]

curiosity. What if I change my percent C

[360:03]

to percent i? Percent i, percent i. So

[360:07]

int int int. Well, turns out that a char

[360:12]

is really just a number because it's an

[360:13]

ASCII value from 0 to 255. So there's

[360:16]

nothing stopping me from telling the

[360:18]

compiler, don't print these as chars,

[360:20]

print them as integers. So let's do make

[360:22]

hi, ./hi. Enter. And that's a

[360:25]

little cryptic. It looks like it's

[360:26]

saying 727333,

[360:29]

but no, let me add those spaces back in

[360:32]

between each of those placeholders. make

[360:34]

hi again, ./hi. There are our old

[360:38]

friends 72, 73, 33. It is not necessary in

[360:42]

this case to say int int int because the

[360:47]

compiler is smart enough and printf is

[360:49]

smart enough that if you hand it a value

[360:51]

that happens to be a char it knows

[360:53]

already it's going to be an integer

[360:54]

essentially so you don't even need to

[360:56]

bother explicitly casting it this way

[360:58]

we're essentially implicitly casting it

[361:00]

to an integer by using those format

[361:02]

codes as such. All right, so that just

[361:05]

shows that what I've claimed,

[361:06]

that there is this equivalence

[361:08]

between characters and numbers, is

[361:10]

actually the case inside of the

[361:12]

computer's memory. So even though you're

[361:13]

storing hi exclamation point,

[361:14]

technically you're storing three

[361:16]

patterns of eight bits each that give

[361:18]

you these decimal numbers 72, 73, and 33

[361:22]

or specifically these patterns here. All

[361:25]

right, then what is a string? And this

[361:27]

is where things get a little more

[361:28]

interesting. A string, as we've used it, is

[361:29]

like a whole word or a phrase or when we

[361:32]

started class today like a whole

[361:33]

paragraph of text. So that's multiple

[361:35]

values. Now why is that interesting for

[361:38]

us potentially? Well, let's go ahead and

[361:40]

write one line of code as a string. So

[361:42]

here for instance is one line of code

[361:44]

with a string. Let's go ahead and put

[361:45]

that into my program. So I'm going to go

[361:47]

back to VS Code here and clear my

[361:49]

terminal. And I'm going to go ahead and

[361:50]

delete all of this code here for a

[361:53]

moment. And I'm going to do something

[361:55]

like this. String s equals quote unquote

[361:59]

HI, exclamation point, with double quotes

[362:01]

now. And now just like in week one, I'm

[362:04]

going to print out percent s, backslash n,

[362:07]

and print out the value of s. Per earlier,

[362:10]

because string is technically one of our

[362:11]

training wheels for just a few weeks.

[362:13]

I'm going to additionally include cs50.h

[362:16]

at the top so that the compiler knows

[362:18]

about what this word string is. All

[362:21]

right, let's go into the terminal. make

[362:23]

hi, ./hi, Enter, and we're back in

[362:26]

business printing that out now as an

[362:29]

entire string. Well, what's going on

[362:31]

inside of the computer's memory this

[362:32]

time? Well, I still have hi exclamation

[362:35]

point, but it's a string now. Well, it

[362:37]

turns out the way that's going to be

[362:39]

laid out in the computer's memory is

[362:40]

exactly like before. There's no mention

[362:42]

of C1, C2, C3 because those variables

[362:44]

don't exist. There's just one variable

[362:46]

S, but it's referring to three bytes of

[362:48]

memory, it would seem: H, I, exclamation

[362:52]

point. And you can kind of see where

[362:53]

this is going. Like a string, as a

[362:55]

spoiler, it turns out, is actually just what?

[363:00]

>> An array.

[363:00]

>> It's just going to be an array of

[363:02]

characters. Hence the dots we're

[363:04]

trying to connect today. So at the

[363:06]

moment though, this is a single variable

[363:07]

s, a string, the value of which is HI,

[363:10]

exclamation point. But you know what? If

[363:12]

it is in fact an array, I bet we can

[363:14]

start playing around with our new square

[363:15]

bracket notation and see as much in our

[363:18]

actual code. So in fact, let me go ahead

[363:20]

and do this in VS Code. Now let's not

[363:23]

use percent S. Let's use percent C,

[363:26]

percent C, and percent C three times.

[363:29]

Then instead of just S, let's print it

[363:31]

out like it is an array. S bracket zero,

[363:34]

S bracket 1, S bracket 2. Let's go back

[363:38]

to VS Code. Uh my terminal in VS Code,

[363:40]

make hi, ./hi. And nothing has

[363:43]

changed, but I'm printing it out now one

[363:46]

character at a time because I understand

[363:48]

what's going on underneath the hood. In

[363:50]

this case, I can actually see these

[363:52]

values. Now, let's go ahead and change

[363:54]

the percent C to percent I and add a

[363:56]

space just so it's easier to read.

[363:58]

Percent i, space, percent i, space, percent i. I don't

[364:01]

need my casts in parentheses because

[364:03]

printf is smart enough to do this for

[364:04]

me. make hi again, ./hi. There

[364:08]

again is my 72, 73, 33. However, that came

[364:12]

from the mere fact that I put in double

[364:14]

quotes hi exclamation point. So, what's

[364:18]

really happening here is it seems that a

[364:21]

string is indeed just an array of

[364:24]

characters.

[364:25]

But how does the computer know when

[364:27]

doing percent s, what to actually

[364:30]

print? In other words, it stands to

[364:32]

reason that eventually if I've got more

[364:34]

variables, more code, there's going to

[364:35]

be other stuff in the computer's memory.

[364:37]

Why does printf know, when using percent

[364:40]

s to stop here and not just keep

[364:43]

printing characters that are over here?

[364:44]

Especially if I did have more variables

[364:46]

and more stuff in memory. Well, let's

[364:48]

take a look at what's just past the end

[364:51]

of this array. Let's go back to VS Code.

[364:53]

And now let's get a little crazy and add

[364:54]

in a fourth percent I. And even though

[364:58]

this shouldn't exist, let's do S bracket

[365:00]

three, which even though it's the number

[365:02]

three, it's the fourth location, but hi

[365:04]

exclamation point is only three values.

[365:06]

So, let's look one location past the end

[365:10]

of this array. make hi,

[365:13]

./hi. Interesting. It seems, and maybe

[365:17]

it's just luck, good or bad, that the

[365:20]

fourth byte in the computer's memory

[365:22]

seems to be a zero. Well, that's

[365:26]

actually very much by design. And it

[365:27]

turns out if we look a little further by

[365:29]

convention what the compiler will do for

[365:32]

us automatically is terminate that is

[365:35]

end any string we put in double quotes

[365:38]

with a pattern of 8 zero bits. More

[365:42]

succinctly it's just the number zero

[365:44]

because if you do out the math you've

[365:45]

got eight zeros it gives you zero in

[365:46]

decimal or more technically the way it's

[365:49]

typically written is this because it's

[365:51]

not like the number zero that we want to

[365:52]

see on the screen. Backslash zero, similar

[365:55]

to backslash n, is sort of a special

[365:56]

escape character. This just means

[365:59]

literally 8 zero bits not the number

[366:02]

zero that you might see in a phone

[366:03]

number or something like that. So even

[366:06]

though we said string s equals quote

[366:09]

unquote HI with an exclamation point,

[366:11]

seemingly three characters, how many

[366:13]

bytes does a string of length three

[366:17]

actually seem to take up in memory?

[366:20]

It's actually going to be four. And

[366:22]

this happens automatically. That's what

[366:24]

the double quotes are doing for you.

[366:25]

They're telling the compiler, "This is

[366:27]

not just a single character. This is a

[366:28]

sequence of characters. Please be sure

[366:30]

to terminate it for me automatically

[366:33]

with a special pattern of 8 bits." And

[366:37]

that special pattern of 8 zero bits actually

[366:38]

has a name. It's the so-called null

[366:41]

character, or NUL for short. The NUL

[366:45]

character is just a byte of zero bits

[366:48]

and it represents the end of a string.

[366:50]

You've actually seen it before if super

[366:52]

briefly two weeks ago. Here was our ASCII

[366:54]

chart and we focused mostly on like this

[366:56]

column here and this column here and

[366:58]

then we looked at the exclamation point

[366:59]

over here. But all this time over here

[367:02]

ASCII character zero is NUL, N-U-L, which just

[367:06]

means that's how you pronounce all eight

[367:08]

zero bits. It's been there this whole

[367:10]

time. So why is it done this way? Well,

[367:13]

how is the computer actually printing

[367:16]

something out in memory? Well, it needs

[367:18]

to know where to stop. printf is pretty

[367:20]

stupid. Odds are inside of printf

[367:22]

there's just a loop that starts printing

[367:23]

the first character, the next character,

[367:24]

the next character, and it's looking for

[367:26]

the end of the string. Why? Well,

[367:28]

consider what might happen. Suppose

[367:30]

you've got a program that has not just

[367:31]

one string, but two. For instance, two

[367:33]

strings like this. So, in fact, let me

[367:35]

go back to VS Code here, clear my

[367:37]

terminal, and let's just make this

[367:38]

program a little more interesting for a

[367:40]

moment. String t equals quote unquote

[367:43]

bye, for instance. And then down here,

[367:45]

let's do two printfs. Percent s, backslash

[367:47]

n, and print out s. printf, percent s,

[367:51]

backslash n, print out t. Now to be

[367:53]

clear, percent s means string

[367:55]

placeholder. t and s are also just the

[367:57]

names of the variables. There's no

[367:59]

percent t that we want to use here. All

[368:01]

right, let me go down to my terminal

[368:03]

make hi, and voila, I get hi and bye

[368:06]

just like you would have expected last

[368:07]

week. But what's going on inside of the

[368:09]

computer's memory? Well, insofar as I

[368:11]

have asked it to create two

[368:13]

variables s and t like this. Odds are

[368:16]

what's happening in the computer's

[368:17]

memory is that hi is ending up here, aka s. And t,

[368:21]

because there's nothing else in this

[368:22]

program is probably going to end up here

[368:24]

B-Y-E, exclamation point, but it wraps on this

[368:26]

particular screen. T is taking up 1 2 3

[368:30]

4, 5 bytes total, just as hi is

[368:33]

taking up four bytes total because the

[368:36]

compiler is automatically adding for me

[368:38]

the backslash zero, the NUL character, to

[368:41]

make clear to other functions where this

[368:44]

string ends.

[368:47]

So what does this mean in real terms and

[368:50]

why is it zero? Well, why is it zero?

[368:52]

Like, eh, just because at the end of

[368:54]

the day all we have is bits. We've got

[368:55]

eight bits to work with for chars. You

[368:57]

got to pick some pattern. We could have

[368:58]

chosen all ones. We could have chosen

[369:00]

all zeros. We could have chosen

[369:01]

something arbitrary. A bunch of humans

[369:03]

in a room years ago decided eight zeros

[369:06]

will mean the null character. That's the

[369:07]

special character we will use to

[369:09]

terminate strings in this way. Well,

[369:12]

what does that mean with our new syntax?

[369:14]

Well, it means we could poke around with

[369:16]

strings as well. So, even though that

[369:18]

first variable is S and that second one

[369:19]

is T, you could technically poke around

[369:21]

and access S brackets 0 and 1 and 2 and

[369:24]

3; t bracket 0, 1, 2, 3, and 4, and so

[369:27]

forth. So, in fact, if I wanted to dive

[369:29]

in deeply there and actually see that,

[369:32]

well, let me go ahead and do this. Uh,

[369:34]

back in VS Code here, let me make a

[369:36]

refinement here. I've now got, uh, my

[369:38]

two strings here. Um, I could go and,

[369:42]

for instance, down here, just like

[369:44]

before, percent C, percent C, percent C,

[369:47]

percent C, percent C, percent C, percent

[369:50]

C. And if I then do s bracket zero, uh,

[369:55]

s bracket 1, s bracket 2, whoops, two,

[369:59]

and then down here, t bracket zero, t

[370:02]

bracket 1, t bracket 2, t bracket three,

[370:05]

and I'm doing that only because the word

[370:06]

bye is longer than the word hi. If I do

[370:09]

make hi, the same principles work even in

[370:12]

this context here. But let's add an

[370:14]

interesting twist just because if I have

[370:16]

these values in memory here uh as

[370:20]

follows. Well, if I've got two

[370:23]

words in memory, I could use them in an

[370:25]

array too. Instead of having like s and

[370:28]

t or word one and word two, I can

[370:30]

actually put strings in an array, too.

[370:32]

So, let's go ahead and do this. Let me

[370:33]

go back to VS Code. And just for fun

[370:36]

now, let's go ahead and do this. Give me

[370:38]

an array called words that's going to

[370:40]

fit two strings. Then in the first

[370:43]

word, words bracket zero, put hi. Then

[370:46]

in words bracket one, put bye. The only

[370:49]

thing new here is that I'm making an

[370:51]

array of strings now instead of an array

[370:52]

of ints. But all of the syntax is exactly

[370:55]

the same. How can I go about printing

[370:56]

these things? Well, just as before, I

[370:58]

can do printf, percent s, backslash n, and

[371:01]

print out words bracket zero. Then I can

[371:04]

do printf, quote unquote, percent s, backslash n,

[371:07]

words bracket one. And again, I'm just

[371:09]

sort of applying the same simple syntax

[371:12]

that we saw before. ./hi again, for the

[371:15]

sixth version of this program, right?

[371:17]

I'm just sort of jumping through hoops

[371:18]

syntactically to demonstrate that these

[371:20]

are just different lenses through which

[371:22]

to look at the exact same idea. And

[371:25]

while a normal person would not do this,

[371:28]

we could think about what's really going

[371:31]

on in memory with arrays of words when

[371:34]

those words themselves are arrays of

[371:36]

characters. because a word is just a

[371:38]

string. So this code here gives us

[371:40]

something like this in memory in that

[371:42]

program a moment ago. This is words

[371:44]

bracket zero. This is words bracket one.

[371:46]

The only thing that's different is I'm

[371:47]

not calling them s and t. I've given them one

[371:49]

name with two locations 0 and one. Well,

[371:53]

if each of these values is itself a

[371:57]

string, well, we said earlier that a

[371:59]

string is just an array. So we can

[372:01]

actually think of these two strings even

[372:03]

though the syntax is getting a little

[372:04]

crazy using two sets of square bracket

[372:07]

notation where I can index into my array

[372:10]

of words and then index into the

[372:13]

individual letters of that word by just

[372:16]

using more square brackets. And again,

[372:18]

this is just to demonstrate a point, not

[372:19]

because a normal person would do this.

[372:21]

But if I go back to VS Code, instead of

[372:24]

printing out these two strings, why

[372:26]

don't I do something like this? printf,

[372:28]

quote unquote percent C percent C

[372:30]

percent C back slashn. Then let's print

[372:33]

out the first word, but the first

[372:36]

character therein. Let's print out the

[372:38]

first word, but the second character

[372:40]

therein, the first word, but the third

[372:43]

character therein. And even though I'm

[372:44]

saying third and second and first, it's

[372:47]

2, 1, and zero respectively because we

[372:49]

start counting at zero. And then lastly

[372:52]

here, we can print out the second word.

[372:53]

Percent C, percent C, percent C, percent

[372:55]

C, back slashn, then words bracket. How

[372:59]

do I get to the second word in this

[373:00]

array?

[373:02]

Words bracket one, the first character

[373:05]

therein. Words bracket one, the

[373:07]

second character therein. Words

[373:08]

bracket one, the third character

[373:11]

therein. Words bracket one, the last character

[373:14]

therein. And again, this is just to

[373:16]

demonstrate a point but if I do make

[373:18]

hi now, ./hi, we have full

[373:21]

control over everything that's going on

[373:23]

if you now do agree and understand that

[373:26]

an array can be indexed into with square

[373:29]

bracket notation as can a string because

[373:31]

a string is itself just an array. Strings

[373:35]

are arrays, for today's purposes. Then,

[373:39]

questions on any and all of these

[373:42]

tricks.

[373:46]

No. All right. Yeah. In front.

[373:53]

>> Okay.

[374:01]

How do you like that?

[374:04]

>> How do you establish or create an array?

[374:05]

Well, in the context of this program, if

[374:07]

I go back to VS Code, line six here

[374:10]

gives me an array of size two, an array

[374:13]

of two strings, if you will. The

[374:15]

previous example we were playing with,

[374:16]

which was my scores, uh, whoops, wrong

[374:20]

program, wrong file. If I open up

[374:23]

scores.c as before, this line here, line nine,

[374:27]

gives me an array of n integers.

[374:31]

So, that is what establishes or creates

[374:33]

the array in memory. You specify a name,

[374:35]

the size, and the type.

[374:38]

That's all. And the only thing that's

[374:40]

new today again is the square bracket

[374:42]

notation, which in this context creates

[374:44]

an array of that size. But once it

[374:46]

exists, you can then access that chunk

[374:49]

of memory by using square brackets as

[374:51]

well.

[374:55]

Other questions on arrays? Yeah, in

[374:57]

front.

[375:00]

all the values in the array as you

[375:02]

declare it or do you need to go in index

[375:06]

by index to declare?

[375:08]

>> Good question. Do you need to go index

[375:11]

by index to put things inside of an

[375:12]

array? Short answer, no. So, let me open

[375:15]

up again scores.c from before and what I

[375:18]

could have done in an earlier version of

[375:21]

my program would be something like this.

[375:25]

I could have done 72, 73, 33. And I

[375:28]

deliberately didn't show this because I

[375:30]

didn't want to add too much complexity,

[375:31]

but you can use curly braces in this new

[375:33]

way and initialize the array in one

[375:36]

line. And in that case, you don't even

[375:38]

need to specify the size because the

[375:39]

compiler is not an idiot. It can figure

[375:41]

out that if you've got three numbers on

[375:42]

the right, it knows that it only needs

[375:44]

three elements on the left to put them

[375:46]

into. But let me undo that and leave it

[375:48]

just as I did. But short answer, yes.

[375:49]

You can statically initialize an array

[375:51]

if you know all of the values up front

[375:53]

and not when using get int.

[375:56]

All right. So, if you're on board with

[375:58]

the idea that all a string is is an

[376:02]

array and that array is always null

[376:04]

terminated, we can now

[376:08]

use that knowledge to solve some

[376:09]

simple problems and problems that others

[376:12]

have already solved before us. So, let

[376:13]

me go ahead and close that file in VS

[376:15]

Code. Let me go ahead and open up

[376:17]

another program here called length.c.

[376:19]

And let's just play around with the

[376:20]

length of strings as follows. Let me

[376:22]

include the CS50 library at the top. Let

[376:24]

me include standard io after that. Let

[376:27]

me do int main void after that. And then

[376:29]

inside of main, let's prompt the user

[376:32]

for their name by using get_string and

[376:35]

just say name colon today. And then

[376:37]

after that, let's go ahead and figure

[376:39]

out the length of the person's name.

[376:41]

Like D-A-V-I-D, I should get the answer of

[376:43]

five. And K-E-L-L-Y, we should get the

[376:45]

answer of five. And hopefully for a

[376:47]

longer or shorter name, we'll get the

[376:48]

correct answer as well. So, how can I go

[376:51]

about counting the number of characters

[376:54]

in a string? Well, the string is just an

[376:57]

array, and that array ends with the null

[377:00]

character. There's a bunch of ways we

[377:02]

can do this, but let me go ahead and do

[377:03]

this. Let me create a variable called n,

[377:06]

which eventually will contain the length

[377:08]

of the name. And I'm going to set it

[377:10]

equal to zero because I don't know

[377:11]

anything yet about the length. Then, I

[377:14]

can do this with a for loop, but I

[377:16]

prefer this time to use a while loop.

[377:18]

I'm gonna say the following. While the

[377:20]

person's name at that location does not

[377:24]

equal backslash zero,

[377:27]

go ahead and add one to the value of n.

[377:31]

And then after all of this, go ahead and

[377:34]

print out with percent i, backslash n, the

[377:38]

value of n. So what's going on here?

[377:41]

This is easier said when you know

[377:42]

already where you want to go with it,

[377:44]

but with practice, you too can bang this

[377:45]

out pretty quickly. n is going to

[377:47]

contain the length of my string. I have

[377:49]

in my loop here a boolean expression

[377:51]

that's just asking the question, does

[377:53]

name at the current value of n not equal

[377:57]

the null character? In other words,

[377:58]

you're asking yourself, is this

[378:00]

character null? Is this character null?

[378:01]

Is this character null? Is this

[378:02]

character null? And if not, you keep

[378:04]

going. You keep going. And this is kind

[378:06]

of a clever trick because I'm using n

[378:09]

and incrementing it inside the loop. So

[378:11]

when I look at d, that's not equal to

[378:13]

backslash zero. So I increment n. Now n is

[378:16]

one. So I look at name bracket one.

[378:18]

What's at name bracket one if it's my

[378:20]

name? A. A does not equal backslash zero.

[378:23]

So it increments n. What's at location

[378:25]

two in D-A-V-I-D? V. V does not equal

[378:28]

backslash zero. So we repeat with i. We repeat

[378:31]

with d. And then we get to the end of my

[378:34]

name which is the null character because

[378:37]

the get_string function, and C, put it

[378:39]

there automatically for me. The null

[378:42]

character does equal backslash zero. n

[378:45]

does not get incremented any more times.

[378:47]

So at this point in the story on line

[378:49]

13, n is still five because I have not

[378:51]

counted the NUL character. So I

[378:54]

hope I will see five on the screen. This

[378:57]

is just kind of a very mechanical way of

[378:59]

checking checking checking checking

[379:00]

trying to figure out uh through

[379:02]

inference how long the string is because

[379:05]

it's as long as it takes to get to that

[379:07]

backslash zero, the NUL character. So,

[379:09]

let's do make length. Enter. ./length.

[379:13]

Type in my name, David. And I indeed get

[379:15]

five. Let's go ahead and do ./length. Kelly.

[379:18]

I indeed get five. And hopefully for

[379:20]

shorter and longer names, I'm going to

[379:21]

get the exact same thing, too. In fact,

[379:23]

we can try a corner case. ./length.

[379:25]

Enter. Let's not give it a

[379:27]

name at all. If I just hit enter here,

[379:30]

what should the length of the person's

[379:31]

name be?

[379:33]

Zero. Which is not incorrect. It's

[379:35]

literally true. But that's because we're

[379:38]

going to get back essentially quote

[379:39]

unquote. But even though it's quote

[379:41]

unquote in the computer's memory, it's

[379:43]

still going to take up one byte because

[379:45]

the get_string function will still put

[379:47]

null at the end of the string even if

[379:50]

it's got no characters therein. So it

[379:53]

turns out this is not something you need

[379:55]

to do frequently like initializing a

[379:57]

variable using a loop like this. It

[379:59]

turns out there are better solutions to

[380:01]

this problem. You do not need to

[380:02]

reinvent this wheel yourself because it

[380:04]

turns out, in addition to stdio.h

[380:06]

and cs50.h and, as you probably saw in

[380:09]

problem set one, math.h, and perhaps

[380:12]

others. There are other libraries out

[380:14]

there, namely the string library itself.

[380:16]

In fact, if you go into the CS50 manual,

[380:18]

you can look up the documentation for a

[380:20]

header file called string.h, which

[380:22]

contains declarations for, that is,

[380:24]

prototypes for a whole bunch of helpful

[380:26]

functions. In fact, the manual pages for

[380:28]

it are at this URL here. The most

[380:31]

important function and the one we're

[380:32]

going to use so often for the next few

[380:34]

weeks is wonderfully called strlen,

[380:36]

for string length. Someone else

[380:39]

literally decades ago wrote the code

[380:41]

that essentially looks quite like this

[380:44]

but packaged it up in a function that

[380:45]

you and I can use. So we don't have to

[380:48]

jump through these stupid hoops just to

[380:49]

count the length of a string. We can

[380:51]

just ask the string length function what

[380:53]

the length of a string is. But odds are

[380:55]

if we looked at the C code that someone

[380:58]

wrote decades ago, it would look indeed

[381:00]

quite like this. So how can I simplify

[381:02]

this program? Well, I can get rid of all

[381:05]

of this code here. I can include

[381:08]

string.h at the top of my file. And then

[381:11]

I quite simply could do something like

[381:13]

this. int length equals strlen of

[381:17]

name. That's going to put in the

[381:19]

variable length. Actually, let's be

[381:20]

consistent. int n equals strlen of

[381:23]

name. And then on line nine, let's print

[381:25]

it out. Let's try this. make length,

[381:28]

./length. David. Okay, Kelly. Okay,

[381:33]

and no one. And zero. It seems to now be

[381:36]

working. So this is a wheel we do not

[381:38]

need to reinvent. And frankly, now as

[381:40]

a matter of design, I don't really need

[381:42]

the variable n anymore. Recall that we

[381:44]

can nest our functions just like we did

[381:46]

with average before. So let me get rid

[381:47]

of that line and just say strlen of

[381:50]

name is actually perfectly reasonable

[381:52]

here. All right. Well, what more can we

[381:55]

do with this? Well, let's consider some

[381:57]

other matters of design. Let me close

[381:58]

out length.c and let's create another

[382:00]

program of our own called

[382:03]

string.c, in which we'll play around now with

[382:05]

this library and others. Let me go ahead

[382:06]

and include cs50.h.

[382:09]

Let me go ahead and include

[382:11]

stdio.h. Let me go ahead and include also

[382:14]

string.h.

[382:16]

All right, what do I want to now do?

[382:18]

Well, int main void, and inside of main,

[382:20]

let's go ahead and write a program that

[382:23]

prints a string character by character

[382:24]

just to demonstrate these mechanics. So,

[382:26]

string s equals get_string, and I'm going

[382:29]

to ask the user for some input because I

[382:31]

just want to play around with any old

[382:32]

string. I'm going to go ahead and

[382:33]

proactively say output here and I'm

[382:37]

going to go ahead and uh not use a new

[382:40]

line character there deliberately below

[382:42]

this. Now I'm going to have a for loop,

[382:43]

though I could use a while loop that

[382:44]

says int i equals 0, i is less than

[382:47]

strlen of s, the string I just

[382:50]

got from the human, and increment i on

[382:52]

each iteration. And on each iteration,

[382:54]

print out just one character in that

[382:57]

string, specifically at s location i.

[383:00]

And then at the very bottom of this

[383:02]

program, let's just print a single

[383:03]

backslash n to move the cursor onto a

[383:04]

new line. Long story short, what have I

[383:06]

done? I wrote a stupid little program

[383:08]

that prompts the user for a string,

[383:10]

prints the word output thereafter, and

[383:12]

then it just prints the word that they

[383:14]

typed in character by character by

[383:15]

character by character until it reaches

[383:17]

the end of the string based on the

[383:18]

length returned by strlen. So, let's

[383:22]

go ahead and run this in my terminal

[383:23]

window. I'm going to do make string,

[383:26]

./string, and I'll type in my own name as

[383:27]

before. This was a subtlety. I

[383:29]

deliberately wrote two spaces here

[383:31]

because, just to be nitpicky, I

[383:34]

wanted input and output to line up

[383:35]

perfectly. So you can see what's

[383:36]

happening. Indeed, if I do enter here,

[383:39]

now I see input is David. The output is

[383:41]

David as well. So that was just a

[383:42]

formatting trick that I foresaw.
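
The program described here could look roughly like the sketch below. It rests on a few assumptions: the loop is wrapped in a function called print_chars (a name made up for this sketch so it can be called with any string), the parameter is a plain const char * rather than CS50's string so that it compiles without cs50.h, and the (int) cast on strlen is only there to keep the comparison with i free of sign warnings.

```c
#include <stdio.h>
#include <string.h>

// Prints s one character at a time after an "Output: " label.
// Returns the number of characters of s that were printed.
int print_chars(const char *s)
{
    int printed = 0;
    printf("Output: ");

    // strlen(s) sits in the condition, so it is re-evaluated on every check
    for (int i = 0; i < (int) strlen(s); i++)
    {
        printf("%c", s[i]);
        printed++;
    }

    // Move the cursor onto a new line at the very end
    printf("\n");
    return printed;
}
```

Called as print_chars("David") from main, this prints the label followed by the five characters of the name, one printf call per character.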

[383:45]

Why is this program correct but not

[383:47]

arguably well-designed?

[383:50]

It's pretty good in that it's using the

[383:52]

strlen function. I didn't reinvent the

[383:53]

wheel unnecessarily, but there's an

[383:55]

inefficiency that's kind of subtle.

[384:00]

And it relates to how a for loop works.

[384:06]

Any thoughts? This program I claim is

[384:09]

doing unnecessary work somewhere.

[384:13]

Yeah.

[384:14]

>> Why do you have to character?

[384:16]

>> Okay, that's definitely stupid. Um, you

[384:18]

don't have to output a character by

[384:19]

character. That's just my pedagogical

[384:21]

decision here. So, correct, but not the

[384:23]

question we're fishing for. There's a

[384:26]

second stupid thing. Yeah.

[384:34]

>> Yes. Every time through this loop, and

[384:37]

this isn't so much my conscious choice,

[384:40]

but my mistake. I'm checking the length

[384:42]

of S again and again. Why? Because

[384:44]

recall how a for loop works. The

[384:45]

initialization happens once at the very

[384:47]

beginning. Then you check the boolean

[384:49]

expression. Then if it's true, you do

[384:52]

the code. Then you do the update. Then

[384:54]

you check the boolean expression. Then

[384:55]

you do the code. update boolean

[384:57]

expression you do the code but every

[384:59]

time you evaluate this boolean

[385:01]

expression, you're asking: is i

[385:03]

less than the strlen of s? But this

[385:05]

is a function call like you are

[385:06]

literally calling strlen again and again

[385:09]

and again and like a crazy person you're

[385:10]

asking the computer what's the length of

[385:12]

s what's the length of s what's the

[385:13]

length of s it's not going to change

[385:15]

it's going to be the same no matter what

[385:17]

so how can we fix this well I could

[385:20]

solve this in a couple of ways like I

[385:23]

could for instance down here do int n

[385:25]

equals strlen of s and store it in a

[385:28]

variable n and just do that. I think

[385:30]

that eliminates the inefficiency because

[385:32]

now I calculate the length of s once.

[385:35]

It's not going to change nor is my

[385:37]

variable. So I can now use and reuse

[385:39]

that variable. It's just saving me a

[385:41]

little bit of time, you know,

[385:42]

microseconds maybe. But when you're

[385:43]

writing bigger programs and you're doing

[385:45]

things in loops, if that loop is running

[385:46]

not three times or five, but a million

[385:48]

times, uh, millions of times, all of

[385:51]

those microseconds, milliseconds might

[385:53]

very well add up. But it turns out

[385:54]

there's some syntactic tricks we can do

[385:56]

too. I alluded to this earlier. If you

[385:59]

want to initialize not one variable but

[386:02]

two, you can actually do it all before

[386:05]

the first semicolon like that. So now on

[386:08]

line 9, I'm declaring a variable called

[386:09]

i and setting it equal to zero. And I'm

[386:11]

declaring a second variable called n,

[386:14]

also the same type, int, and setting it

[386:16]

equal to the length of s. And now I can

[386:19]

use that again and again. Now, as an

[386:22]

aside, this is a little bit of a white

[386:23]

lie because smart compilers nowadays are

[386:25]

so advanced that they will notice that

[386:27]

you're calling strlen again and again

[386:28]

inside of a loop and they will just fix

[386:30]

this for you unbeknownst to you. But

[386:32]

it's representative of a class of

[386:34]

problems that you should be able to spot

[386:36]

with your own human eyes and avoid

[386:37]

altogether so that you don't waste more

[386:39]

time and more compute and more money in

[386:42]

some sense than you might otherwise need

[386:44]

to in this case. Any questions on that

[386:48]

optimization there? Yeah,

[386:53]

>> You do not say int again. The

[386:55]

constraint is that you have to use the

[386:57]

same data type for all of your

[386:58]

initialization. So, you better hope that

[387:00]

you only want ints; otherwise you've got to

[387:02]

pull it out and do what I did earlier.

[387:04]

Good question.
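
To make the difference between the two loop shapes concrete, here is a small sketch that counts how often the length is actually requested. The counter and the names counted_strlen, calls_when_in_condition, and calls_when_hoisted are all inventions for this illustration, not part of the lecture's code, and plain const char * stands in for CS50's string.

```c
#include <stdio.h>
#include <string.h>

// Hypothetical bookkeeping: how many times the length was asked for
static int calls = 0;

// Hypothetical wrapper around strlen that counts each call
static int counted_strlen(const char *s)
{
    calls++;
    return (int) strlen(s);
}

// Length requested in the condition: once per check of i < length
int calls_when_in_condition(const char *s)
{
    calls = 0;
    for (int i = 0; i < counted_strlen(s); i++)
    {
        putchar(s[i]);
    }
    putchar('\n');
    return calls;
}

// Length requested in the initialization: exactly once
int calls_when_hoisted(const char *s)
{
    calls = 0;
    for (int i = 0, n = counted_strlen(s); i < n; i++)
    {
        putchar(s[i]);
    }
    putchar('\n');
    return calls;
}
```

For a five-character string, the first version asks for the length six times (five checks that pass plus the final one that fails), while the second asks once. Both i and n are declared with the single int at the front, which is why they have to share a type.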

[387:06]

Others on this?

[387:11]

Yeah.

[387:11]

>> When does it account for spaces?

[387:14]

>> When does it account for spaces? A space

[387:17]

is just a character, ASCII character

[387:20]

number 32. So there's nothing special

[387:22]

about it. It's sort of invisible but it

[387:24]

is there. It is treated like any other

[387:27]

character. There's no special accounting

[387:28]

whatsoever. The null character which is

[387:30]

also invisible, is special because printf

[387:33]

and strlen know to look for the end

[387:36]

of that variable the end of that value

[387:38]

as such. All right, let's try one other

[387:41]

demonstration of some of these ideas

[387:43]

here. Let me go into another file

[387:46]

that we'll create called how about

[387:48]

uppercase.c. Let's write a super simple

[387:50]

program that like uppercases a string

[387:52]

that the human types in and see how we

[387:53]

can do this sort of good, better, and

[387:54]

best. So I'm going to call this file

[387:56]

uppercase.c. Inside of this file, let's

[387:58]

use our now friends: include cs50.h.

[388:01]

Let's do include stdio.h. Let's

[388:04]

then include lastly, how about

[388:08]

string.h.

[388:09]

And the goal here inside of main is

[388:12]

going to be to get a string from the

[388:16]

user. So string s equals get_string. And

[388:18]

we're going to ask the user for a before

[388:20]

string representing what it is they

[388:22]

typed before we uppercase everything.

[388:24]

Then I'm going to go ahead after that

[388:26]

and print out just as a placeholder

[388:28]

after and two spaces just to be nitpicky

[388:31]

so that the text lines up vertically on

[388:32]

the screen. Now I'm going to do the

[388:34]

following: for int i = 0, n = strlen

[388:38]

of s; i < n, just

[388:42]

like before; i++. So I'm just kicking

[388:44]

off a loop that's going to iterate over

[388:47]

the string the human typed in. Now if my

[388:49]

goal in life is to change the user's

[388:53]

input from lowercase if indeed in lower

[388:55]

case to uppercase let's just express

[388:58]

that literally. If the current character

[389:01]

in the string, so s bracket i is greater

[389:05]

than or equal to quote unquote a and s

[389:10]

bracket i is less than or equal to quote

[389:13]

unquote z using single quotes. This is

[389:16]

arguably a very clever way of expressing

[389:17]

the question is it lowercase. We know

[389:20]

from our ASCII chart from week zero that

[389:23]

the ASCII chart has not only

[389:25]

numbers representing all the uppercase

[389:27]

letters but also numbers representing

[389:28]

all the lowercase letters. Lowercase a

[389:30]

for instance is 97 and they are all

[389:32]

contiguous thereafter. So we can

[389:34]

actually treat just like we did before

[389:37]

chars as ints and ints as chars and sort

[389:39]

of ask mathematical questions about

[389:41]

these chars and say is s bracket i

[389:44]

between a and z inclusive. So if it is

[389:48]

lowercase and I'll add a comment here

[389:49]

for clarity. If S bracket I is lowercase

[389:53]

what do we want to do? We want to force

[389:54]

it to uppercase. So this is a little

[389:56]

trick I can do as follows. printf the

[389:59]

current character. But let's do some

[390:00]

math on it. Let's change s bracket i by

[390:04]

subtracting some value. What might that

[390:07]

value be? Well recall from week zero our

[390:10]

ASCII chart here. And let's focus for

[390:12]

instance on the lowercase letters here

[390:14]

and the uppercase letters here. What's

[390:16]

the distance between all upper and

[390:18]

lowercase letters? It's 32, right? And

[390:20]

the lowercase letters are bigger. So, it

[390:22]

stands to reason if I just subtract 32

[390:24]

from the lowercase letter, it's going to

[390:26]

immediately get me to the uppercase

[390:28]

version thereof. So, this is kind of

[390:30]

cool. So, I can actually go back to VS

[390:32]

Code and I can literally subtract the

[390:34]

number 32 in this case because ASCII is a

[390:37]

standard. It's not going to change.

[390:39]

Else, if the letter is not lowercase,

[390:42]

I'm just going to go ahead and print it

[390:43]

out unchanged without doing any

[390:47]

mathematics at all to it. And I'll make

[390:49]

clear with a comment. Uh, else if not

[390:53]

lowercase makes clear what's going on

[390:55]

there. All right, let me go ahead and

[390:57]

make uppercase in my terminal window.

[390:58]

./uppercase. Let's type in my name

[391:02]

all lowercase. And I get back DAVID. Huh,

[391:04]

minor bug. Couple bugs actually. Let me

[391:08]

fix my spacing. I think I want another

[391:10]

space after the word after. And at the

[391:12]

very bottom of my program, I think I

[391:14]

want a backslash n. Now, let's rerun

[391:18]

make uppercase, ./uppercase, Enter,

[391:20]

david. And now it's forcing it all to

[391:23]

uppercase. Meanwhile, if I do it once

[391:25]

more and type in name capitalized, it's

[391:28]

still going to force everything else to

[391:30]

uppercase. Questions?

[391:32]

>> Your spacing for the after.

[391:34]

>> Oh, I'm an idiot. Okay, thank you.

[391:37]

Yes. I misspelled after; otherwise my

[391:40]

alignment would have worked.

[391:42]

So let's do this again. Make uppercase

[391:43]

if only so that we can prove it's the

[391:46]

same, david in all lowercase. And there we

[391:48]

go. That was, thank you, the intent. All

[391:51]

right. So it's kind of a little trick

[391:52]

but this is kind of tedious, right? Like

[391:54]

Microsoft Word, Google Docs all have the

[391:56]

ability to toggle case from uppercase to

[391:58]

lowercase or lowercase to uppercase. It's

[392:00]

kind of annoying that you have to write

[392:01]

this much code to achieve something so

[392:03]

simple seemingly and so commonplace.
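
For reference, the hand-rolled approach just walked through can be sketched as follows. The function names upper_by_hand and print_upper_by_hand are made up for this sketch, const char * replaces CS50's string so nothing beyond the standard library is needed, and the whole thing assumes an ASCII character set, as the lecture does.

```c
#include <stdio.h>
#include <string.h>

// Returns the uppercase form of c if c is a lowercase ASCII letter;
// any other character is returned unchanged.
char upper_by_hand(char c)
{
    // If c is lowercase
    if (c >= 'a' && c <= 'z')
    {
        // In ASCII, 'a' is 97 and 'A' is 65, a distance of 32
        return (char) (c - 32);
    }

    // Else if not lowercase
    return c;
}

// Prints s with every lowercase letter forced to uppercase.
void print_upper_by_hand(const char *s)
{
    printf("After:  ");
    for (int i = 0, n = (int) strlen(s); i < n; i++)
    {
        printf("%c", upper_by_hand(s[i]));
    }
    printf("\n");
}
```

The comparison against 'a' and 'z' works only because the lowercase letters are contiguous in ASCII, which is the property the lecture leans on here.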

[392:05]

Well, it turns out there's a better

[392:07]

approach here, too. In addition to there

[392:09]

being the string library, there's also

[392:11]

the ctype library in ctype.h, another

[392:14]

header file, there's a whole bunch of

[392:16]

other functions that are useful that

[392:17]

relate to characters in

[392:21]

ASCII. So, for instance, if we go ahead

[392:23]

and use this as follows, I'm going to go

[392:25]

ahead at the top of my file here and

[392:27]

include now ctype.h. It turns out there's

[392:31]

going to be functions via which I can

[392:34]

actually ask these questions myself. For

[392:36]

instance, in this next version of the

[392:38]

program, I don't need to do any of this

[392:40]

clever but pretty verbose math. I can

[392:43]

just say if the islower function, which

[392:47]

comes from the ctype library, passing in s

[392:49]

bracket i returns true, we'll then

[392:52]

convert the letter to uppercase by

[392:56]

subtracting 32. But you know I don't

[392:57]

even need to do this mental math or math

[392:59]

in code. I can also, from the ctype

[393:02]

library, use a function called toupper,

[393:05]

which takes as input a character like s

[393:07]

bracket i and let someone else's

[393:09]

function do the work for me. So let me

[393:12]

go back down to my terminal window here.

[393:14]

Let me make uppercase now, ./uppercase,

[393:17]

Enter, before: david. This now works too.

[393:20]

But if I really dig into the

[393:21]

documentation for the ctype library,

[393:23]

you'll see that you can just use the

[393:26]

toupper function on any character and it

[393:28]

will very intelligently only uppercase

[393:30]

it if it is actually lowercase. So

[393:32]

someone else years ago wrote the

[393:34]

conditional code that checks if it's

[393:36]

between little A and little Z. So

[393:39]

knowing this, and you would see that

[393:40]

indeed in the documentation, I don't

[393:42]

even need this else. I can instead just

[393:45]

get rid of this whole conditional,

[393:47]

tighten my code up significantly here

[393:49]

and simply say printf, using %c,

[393:53]

the toupper version of that same

[393:55]

letter and let the function itself

[393:57]

realize if it's uppercase pass it

[393:59]

through unchanged if it's lowercase

[394:01]

change it first and then return it. So

[394:03]

now if I open my terminal window again

[394:05]

and clear it, make uppercase,

[394:08]

./uppercase, Enter, david, and we're back

[394:11]

in business. So again, demonstrative of

[394:12]

how if you find that coding is becoming

[394:15]

tedious or you're solving a problem that

[394:16]

like surely someone else has solved,

[394:18]

odds are there is in fact a library

[394:21]

function for it, whether it's from CS50 or

[394:23]

from the standard library that you

[394:24]

yourselves can use. Um and unlike the

[394:27]

CS50 library, which is indeed CS50

[394:30]

specific, which is why Clang needed to

[394:32]

know about -lcs50, many of these

[394:34]

libraries just automatically work. You

[394:37]

don't need to link in the ctype library.

[394:39]

You don't need to link in other

[394:40]

libraries. Um, but non-standard

[394:42]

libraries like CS50's training wheels

[394:44]

for the first few weeks, we do need to

[394:46]

do that. But make is configured to do

[394:48]

all of that automatically for you.
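
The tightened-up version, with the conditional gone, might look like the sketch below. The wrapper name upper and the function print_upper are hypothetical names chosen for this example, and const char * again stands in for CS50's string. The cast to unsigned char is an addition the lecture does not show: toupper is specified for values representable as unsigned char (or EOF), so the cast keeps the call well defined for any char.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

// Thin wrapper around toupper from ctype.h: lowercase letters come back
// uppercased, everything else comes back unchanged.
char upper(char c)
{
    return (char) toupper((unsigned char) c);
}

// Prints s in uppercase without any if/else of our own.
void print_upper(const char *s)
{
    printf("After:  ");
    for (int i = 0, n = (int) strlen(s); i < n; i++)
    {
        printf("%c", upper(s[i]));
    }
    printf("\n");
}
```

Because ctype.h belongs to the standard library, this compiles without any extra linker flag.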

[394:53]

All right, in our final minutes

[394:56]

together, let's go ahead now and reveal

[394:58]

some of the details we've been

[395:00]

sweeping under the rug about

[395:02]

main. I asked in week one that you just

[395:05]

sort of take on faith that you got to do

[395:07]

the void, you got to do the int, you got

[395:08]

to do the void and all of that. Well,

[395:10]

let's see why that actually is. So, main

[395:12]

is special insofar as, in C, it is the

[395:14]

function that will be called

[395:15]

automatically after you've compiled and

[395:18]

then run your code, just because. Not all

[395:20]

languages standardize the name of the

[395:21]

function, but C and C++ and Java and

[395:24]

certain other ones do. In this case,

[395:26]

here is the most canonical simple form

[395:28]

of main. We know that including

[395:31]

stdio.h just gives us access to the

[395:33]

prototypes for functions like printf.

[395:35]

But what's going on with int and what's

[395:38]

going on with void? Well, void in

[395:40]

parentheses here just means that main

[395:42]

and in turn all of the programs we've

[395:44]

written up until this moment do not take

[395:47]

command line arguments. Literally every

[395:49]

program we've written:

[395:51]

./a.out, ./hello, ./scores,

[395:54]

./everything else. I have never

[395:56]

once typed another word after the name

[395:59]

of our programs that we've written in

[396:00]

class. That is because every program has

[396:03]

void inside of these parentheses telling

[396:06]

the computer this program does not take

[396:08]

command line arguments, words after the

[396:11]

program's name. That is different from

[396:13]

make and code and cd and other commands

[396:16]

that you've typed with words after

[396:19]

their names at the prompt. But it turns

[396:21]

out the other supported syntax for the

[396:25]

main function in C can look like this

[396:27]

too, which at a glance looks like kind

[396:29]

of a mouthful, but it just means that

[396:31]

main can take zero arguments or it can

[396:33]

take two. If it takes two, the first is

[396:37]

an integer and the second is an array of

[396:41]

strings. By convention, those inputs are

[396:44]

called argc and argv. argc is the count of

[396:47]

words typed at the prompt, including

[396:49]

the program's name. argv is the

[396:51]

argument vector aka array of actual

[396:55]

words. In other words, now that we have

[396:56]

the ability to use arrays, we can get

[397:00]

zero or one or two or three or more

[397:02]

words from users at the prompt when they

[397:05]

run our own programs. So what do I mean

[397:07]

by this? We can now write programs that

[397:10]

actually have command line arguments as

[397:12]

follows. Let me go into VS Code here and

[397:14]

close our old program uppercase. Let's

[397:17]

write a new simpler program here in my

[397:19]

terminal called greet.c and just greet

[397:22]

the user in a couple of different ways.

[397:24]

So I'm going to include initially cs50.h

[397:27]

and then I'm going to include

[397:28]

stdio.h here. Then I'm going to say int

[397:31]

main(void) without introducing anything

[397:33]

new just yet. I'm going to ask the user

[397:35]

like we did last week for a return value

[397:38]

from get_string, asking them what's your

[397:41]

name as we've done so many times. Then

[397:43]

I'm going to say printf hello, %s,

[397:46]

backslash n, spitting out their answer as

[397:49]

follows. Same program as last week again

[397:51]

I'm going to make greet. I'm going to

[397:53]

say ./greet and I'm prompted now for my

[397:56]

name. I hit enter. Notice that I did not

[397:59]

take any command line arguments. The

[398:01]

only command I ran was ./greet, no

[398:04]

other words. Let's now use this new

[398:07]

trick and actually let the user type

[398:09]

their name when they're running my

[398:10]

program rather than waste their time by

[398:12]

using get_string and prompting them. Let

[398:14]

me go into my editor here. Let's get rid

[398:17]

of the CS50 library. Let's get rid of my

[398:19]

use of get_string and let's simply

[398:22]

change void to int argc, then string

[398:27]

argv open bracket close bracket. That's

[398:30]

all. Down here, let's simply print out

[398:34]

argv bracket 1 for reasons we'll soon

[398:37]

see. The only change then I'm making

[398:40]

really is changing the prototype for

[398:42]

main from the first version which we've

[398:44]

been using for like a week and a bit now

[398:46]

to the second version which is the only

[398:48]

other version supported. I'm going to go

[398:49]

back to my terminal window now. Make

[398:52]

greet and darn it. I shouldn't so close.

[398:58]

How do I fix the

[399:00]

mistake I accidentally made? Yeah, in

[399:02]

back. Oh, no. In front.

[399:07]

>> Yes, I should have kept the CS50 library

[399:09]

because it's in the CS50 library that

[399:11]

string is defined. So, include cs50.h.

[399:14]

In week four, we will delete that line

[399:17]

for real and actually show you what

[399:18]

string actually is. I promised at the

[399:20]

start of class that string is a term of

[399:22]

art, but it's not a keyword in C, and

[399:25]

we'll see what it means in a couple of

[399:26]

weeks' time. Okay, let me fix this. make

[399:28]

greet, ./greet, but now I'm going to

[399:31]

type before I even hit enter my actual

[399:34]

name and when I hit enter now I see

[399:37]

hello, David. If I instead do ./greet

[399:40]

Kelly, Enter, now I see hello, Kelly. If I

[399:44]

do nothing, like ./greet, Enter, I just see

[399:46]

hello, null, which is not the same null as

[399:48]

before, NUL. This is N-U-L-L, for reasons

[399:51]

we'll come back to before long but

[399:53]

clearly printf knows something's going

[399:54]

on there's no actual word there. Why

[399:58]

though did I do argv bracket 1? Well,

[400:01]

it turns out that just as a feature of

[400:03]

C, if I recompile this program and do

[400:06]

./greet and type in nothing else, I'm

[400:12]

going to see something kind of curious.

[400:14]

hello, ./greet.

[400:16]

Because automatically the zero location

[400:19]

in the argv variable will automatically

[400:22]

contain the program's own name. Why is

[400:23]

this useful? If you ever want to do

[400:25]

something self-referential like thanks

[400:26]

for running my program or you want to

[400:28]

show documentation for your program and

[400:30]

the name of your program, which depends

[400:32]

on whatever the file itself is called,

[400:34]

you can use argv bracket zero which will

[400:36]

always contain the program's name no

[400:38]

matter what the file has been named or

[400:41]

renamed to. But we can fix that null

[400:43]

issue now in a couple of ways. So argc

[400:45]

is the other input that I said now can

[400:47]

exist which is the count of arguments at

[400:49]

the prompt. So if I want to check if the

[400:51]

user actually typed their name, I could

[400:52]

say something like if argc equals

[400:54]

equals 2. Well then and only then go

[400:58]

ahead and print out their name. Else

[401:00]

let's just do some clever default like

[401:02]

printf, quote unquote, hello, world, or

[401:05]

heck nothing at all. This version of the

[401:07]

program now is a little smarter because

[401:08]

when I run make greet and ./greet with my

[401:11]

name, it works exactly as intended. But if I

[401:13]

forget and only run ./greet, it's

[401:15]

going to say hello world. Moreover, if I

[401:17]

don't quite cooperate and I say David

[401:19]

Malan, Enter, it similarly just ignores me

[401:22]

because argc is not two anymore.

[401:24]

It's now three. So, argc contains the

[401:27]

total number of words at the prompt,

[401:30]

but the first one is always the

[401:31]

program's name. Question.
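
As a sketch of the logic just described, the check on argc could be factored like this. The helper names name_or_default and greet are hypothetical, introduced only for this example, and char * is used where the lecture writes string, so that cs50.h isn't required. In the lecture's greet.c the same if/else sits directly inside main(int argc, string argv[]).

```c
#include <stdio.h>

// Hypothetical helper: decides whom to greet from the words at the prompt.
// argv[0] is the program's own name, so exactly one extra word means argc == 2.
const char *name_or_default(int argc, char *argv[])
{
    if (argc == 2)
    {
        return argv[1];
    }

    // Zero extra words, or more than one: fall back to a default
    return "world";
}

// Prints the greeting; a main(int argc, char *argv[]) could simply call this.
void greet(int argc, char *argv[])
{
    printf("hello, %s\n", name_or_default(argc, argv));
}
```

With this shape, running the program with a single name greets that name, while running it with no name or with two names falls back to the default, matching the three runs shown above.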

[401:44]

>> Sorry. Can you say that once a little

[401:45]

louder?

[401:55]

Why is it information that we just have

[401:56]

or

[402:05]

>> Oh, so the short answer is just because

[402:07]

like the definition of C, if you look up

[402:10]

the documentation for C, you can either

[402:12]

define main as taking no arguments with

[402:14]

the word void

[402:16]

Or you can specify that main can take

[402:18]

two arguments and the compiler and the

[402:20]

operating system will just ensure that

[402:22]

if you provide those two variables,

[402:25]

argc and argv will be filled with those two

[402:28]

values automatically.

[402:31]

Someone else decided that, though. That's

[402:33]

just the way it works.

[402:34]

You can't put three there. You can't put

[402:36]

four there. You can change the names of

[402:37]

those variables but not the types

[402:40]

because of this convention. So there's

[402:42]

one last feature of main then it's the

[402:44]

actual value it returns. Up until now

[402:46]

every program I've written starts with

[402:48]

int main something. Int main something.

[402:50]

What is that int? We have yet to use it.

[402:52]

Technically the value that main returns

[402:55]

is going to be called a so-called exit

[402:56]

status which is a numeric status that

[402:58]

indicates success or failure. Numbers

[403:00]

are everywhere in the world of

[403:01]

computing. So for instance here's a

[403:02]

screenshot from Zoom whereby if

[403:04]

something goes wrong with Zoom like you

[403:05]

have bad internet connectivity or

[403:06]

something like that you might see an

[403:08]

error code like 1132. That means nothing

[403:10]

to normal people unless you Google it,

[403:13]

look up the documentation, but it means

[403:14]

something very much to the software

[403:16]

engineers who wrote this code because

[403:18]

they know, oh shoot, 1132 means this

[403:20]

error and they probably have a

[403:21]

spreadsheet or a cheat sheet somewhere

[403:23]

that converts those codes to actually

[403:25]

useful error messages. And frankly, in a

[403:27]

better world, they would just tell you

[403:28]

what the problem is rather than just say

[403:30]

report the problem and mention this

[403:31]

number. That said, on the web, odds are

[403:34]

you're familiar with this number 404,

[403:35]

which is also a weird thing for so many

[403:37]

normal people to know, but this

[403:38]

generally means file not found. It's a

[403:40]

numeric code that signifies that

[403:41]

something has gone wrong. Exit status

[403:43]

isn't quite this, but it's similar in

[403:45]

spirit. In main, you can return a value

[403:48]

like zero or one or two or something

[403:50]

else to indicate whether something was

[403:52]

successful or not. By convention, a

[403:55]

program, a function like main, returns

[403:56]

zero on success if all is well. And that

[404:00]

leaves you then with like several

[404:01]

hundred possible things that can go

[404:03]

wrong because you could return one to

[404:05]

signify one thing, two to signify

[404:07]

another, three to signify another, and

[404:08]

so long as you have a spreadsheet or a

[404:10]

cheat sheet or something, you can just

[404:11]

keep track as the programmer as to what

[404:13]

error means what. So what does this mean

[404:16]

in real terms? Well, if I go over to VS

[404:17]

Code here, let me implement a relatively

[404:19]

simple program, our last called

[404:22]

status.c.

[404:23]

So in status.c, I'm going to go ahead

[404:26]

and use the CS50 library at the top, the

[404:29]

standard IO library at the top, and then

[404:32]

inside of int main and with our new uh

[404:36]

format, int argc, string argv, square

[404:41]

brackets inside of main, I'm going to

[404:43]

now do the following. If argc does not

[404:46]

equal 2, then I'm going to go ahead and

[404:49]

print out this time a warning. I'm not

[404:51]

going to have some silly default like

[404:52]

hello world. Let's tell the user that

[404:54]

they didn't use my program correctly. And

[404:55]

I'm going to say printf missing command

[404:58]

line argument. And we'll assume they

[405:01]

know what that means. Then to signify an

[405:04]

error, I'm going to say return one. It

[405:05]

could be two, it could be three, but

[405:07]

this is the first possible error. So I'm

[405:08]

going to start simple with one.

[405:10]

Otherwise, if argc does equal 2 and I

[405:13]

get to this part of my code, I'm going

[405:15]

to say hello, %s, backslash n, and

[405:18]

pass in argv bracket 1 just like before.

[405:21]

And just to be super specific, I'm going

[405:24]

to return zero to tell the computer, the

[405:26]

operating system, that this is success.

[405:28]

Zero signifies success. Any other value

[405:31]

signifies error. Let's make status now.
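
The body of status.c as described can be sketched as below. Here the logic lives in a function named run_status, a hypothetical stand-in whose return value is what main would return as the exit status; in the lecture's file the same statements sit directly in main(int argc, string argv[]). char * is used in place of CS50's string so that no extra library is needed.

```c
#include <stdio.h>

// Hypothetical stand-in for main: returns the exit status main would return.
int run_status(int argc, char *argv[])
{
    // Anything other than exactly one word after the program's name is an error
    if (argc != 2)
    {
        printf("Missing command-line argument\n");
        return 1; // nonzero signals failure
    }

    printf("hello, %s\n", argv[1]);
    return 0; // zero signals success
}
```

A real program would forward to it with return run_status(argc, argv); from main. In a shell, echo $? typed right after the program finishes shows the value that was returned.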

[405:34]

Let's do ./status. And this is a

[405:36]

little magical, but let me go ahead and

[405:39]

cooperate initially. I'm going to type

[405:41]

in my name David. And I'm going to see

[405:43]

hello, David. Most people wouldn't

[405:46]

know this but among the commands you can

[405:47]

type at your terminal are this one here

[405:50]

and the TFs and I, the TAs and I, would

[405:52]

do something like this. We, after running

[405:54]

your code can do echo space dollar sign

[405:57]

question mark and we can see secretly

[405:59]

the return value that your program

[406:02]

returned zero in this case. Meanwhile if

[406:05]

we do this again,

[406:09]

./status, and let me not type my name

[406:11]

this time. When I do this, I see missing

[406:14]

command line argument. What value should

[406:16]

the code have returned, then? One. So

[406:19]

let's see echo dollar sign question

[406:21]

mark. There's the one. So even after

[406:24]

just one week of CS50, if you've ever

[406:25]

wondered how check50 knows if your code

[406:27]

was correct or not, among the ways we

[406:30]

check for that is by checking this

[406:32]

semi-secret status code, this exit

[406:34]

status, which isn't really a secret.

[406:35]

It's just not displayed to normal people

[406:37]

because it's not all that enlightening

[406:38]

unless you're the software developer who

[406:40]

wrote the code in question. But this

[406:42]

means we could return one in some cases

[406:44]

or two in other cases or three or four

[406:47]

in yet others. And these command line

[406:49]

arguments are sort of everywhere. And in

[406:50]

fact, a program I skipped over a moment

[406:52]

ago was going to be this. There's no uh

[406:55]

academic value to what you're about to

[406:56]

see. But uh another program that takes

[406:59]

command line arguments is known as cowsay.

[407:01]

And this is sort of very famous in

[407:02]

computing circles because it's been on

[407:04]

systems for many years. Cowsay is a

[407:06]

program that allows you to type in a

[407:08]

word after the prompt like moo and it

[407:11]

will print out what's called ASCII art.

[407:13]

An adorable little cow with a speech

[407:14]

bubble that says moo. So kind of

[407:16]

evocative of, like, Scratch, but it takes

[407:18]

other command line arguments, not just

[407:20]

the words that you want to come out of

[407:21]

its mouth, but even the appearance that

[407:23]

you want it to have. So for instance, I

[407:25]

can say -f duck and run it again. Enter.

[407:30]

And now I have a little cute duck saying

[407:32]

moo, which is a bit of a bug. So let me

[407:34]

change that to quack for instance

[407:36]

instead. And again no academic value

[407:38]

here. It's just fun to now play with the

[407:40]

various options. But if we really want

[407:41]

to have fun with this, we can do another

[407:43]

one. So cowsay -f dragon. And we can say

[407:46]

something like raar. And now we have

[407:49]

this crazy dragon appearing on the

[407:51]

screen. Which is to say again no value

[407:53]

here. It's just fun to play with command

[407:55]

line arguments sometimes. And how is

[407:57]

cowsay doing this? Well, someone wrote

[407:59]

code maybe in C or some other language

[408:01]

using argc and argv and poking around

[408:04]

at their values and maybe a conditional

[408:05]

that says if the -f value is dragon then

[408:09]

print this graphic else if the value is

[408:10]

duck then print this other one. It all

[408:12]

boils down to the same fundamentals of

[408:14]

week zero of functions and conditionals

[408:16]

and loops and boolean expressions and

[408:17]

the like. It's just being composed into

[408:19]

more and more interesting things. And

[408:21]

indeed in closing among the other

[408:23]

interesting things we'll play with this

[408:24]

week to come full circle is that of

[408:26]

cryptography, the art of scrambling

[408:27]

information so as to have secure

[408:29]

communication. So important nowadays

[408:31]

with passwords and credit card numbers

[408:34]

and personal messages that you might

[408:35]

want to send and we'll have you explore

[408:37]

through code some of the algorithms via

[408:39]

which you yourselves can encrypt

[408:41]

information. And there's a number of

[408:43]

ways we can do this form of encryption

[408:44]

and they all boil down to this mental

[408:46]

model. You've got some input like the

[408:47]

message you want to send and you want to

[408:49]

encipher it somehow, encrypt it somehow

[408:51]

so that no one knows what message you've

[408:53]

sent. So you want your plain text, which

[408:55]

is the human readable version in English

[408:57]

or any other language to become cipher

[408:59]

text ultimately. So the code you'll be

[409:01]

writing this week is inside of this

[409:03]

black box some kind of cipher, an

[409:05]

algorithm that encrypts information so

[409:07]

that you can do exactly this. Now the

[409:10]

catch is that you can't just give it

[409:12]

plain text and run it through an

[409:13]

algorithm and get cipher text because

[409:15]

you need to somehow have a secret

[409:17]

typically for encryption to work. Like

[409:19]

if I'm going to send a message to

[409:20]

someone in back, well, I could just

[409:22]

randomize the letters that I'm writing

[409:24]

down. But how would they know how to

[409:26]

reverse that process? Probably what we

[409:28]

need to do is agree in advance that you

[409:29]

know what, I'm going to change every A

[409:31]

to a B and every B to a C and a C to a D

[409:33]

and a Z to an A. I'll wrap back around

[409:36]

at the end of the uh the alphabet. It's

[409:38]

not very sophisticated, but, you know, no

[409:40]

middle school teacher if they intercept

[409:42]

two kids passing notes in class are

[409:44]

going to waste time trying to figure out

[409:45]

this cipher. But it does presuppose that

[409:48]

there's a secret between them, the

[409:49]

number one in that case, because I'm

[409:50]

changing every letter by one place. So

[409:53]

how might this work? Well, if I want to

[409:55]

encrypt the word hi, hi exclamation

[409:57]

point and my secret key with someone

[409:59]

that I've come up with in advance is

[410:00]

one, I should send the ciphertext I J

[410:05]

exclamation point. Now, this is a simple

[410:06]

cipher, so I'm not really encrypting the

[410:08]

punctuation, which may or may not be a

[410:10]

good thing, but I am encrypting at least

[410:12]

the alphabetical letters. But what does

[410:14]

the recipient then have to do to decrypt

[410:16]

this message? When they see on paper I J

[410:18]

exclamation point, how do they know what

[410:20]

I said? Well, they use that same key but

[410:22]

subtract. So B becomes A, C becomes B, A

[410:26]

becomes Z and so forth. Essentially

[410:28]

inverting the key from positive one to

[410:30]

negative 1. Of course, slightly more

[410:33]

secure than uh a cipher of one, a key of

[410:36]

one would be 13. And in fact, in

[410:38]

computing circles, 13 has special

[410:41]

significance. ROT13, R-O-T 13, is an

[410:44]

algorithm that's been used for many

[410:45]

years online just to sort of avoid

[410:47]

spoilers. Like Reddit might do this or

[410:49]

other websites where they want you to

[410:51]

have to do some effort to see what the

[410:53]

message says. But it's not all that

[410:54]

hard. You just have to click a button or

[410:56]

write the code that actually does this.

[410:58]

But if you use 13 instead, you wouldn't

[411:00]

get uh J uh you wouldn't get I J. You'd

[411:03]

get UV because U and V are 13 places

[411:06]

away from H and I respectively. But

[411:08]

again, we're not touching the

[411:09]

punctuation. Or we could send something

[411:11]

more personal like I love you and the

[411:13]

message comes out like that. Slightly

[411:16]

more secure than that would be ROT26.

[411:18]

No.

[411:20]

>> No. Why? Because it's the same thing. It

[411:23]

literally rotates all the way around. A

[411:25]

becomes a, b becomes b. So there's a

[411:27]

limit to this. But more seriously, that

[411:29]

speaks to just how strong this

[411:32]

encryption is or is not. Because if you

[411:34]

think about this now from an adversary's

[411:36]

perspective, like the teacher in the

[411:37]

room intercepting the slip of paper, how

[411:39]

much work do they need to do? Well, they

[411:41]

just try all possibilities. Key of one,

[411:44]

key of two, key of three, dot dot dot,

[411:46]

key of 25. And at some point, they will

[411:48]

see clearly that they guessed the key,

[411:50]

which means that cipher is not very

[411:52]

secure. Nonetheless, what we're talking

[411:54]

about is historically known as the

[411:55]

Caesar cipher because back in the day,

[411:58]

when Caesar was communicating by uh by

[412:01]

uh by legend uh with his generals, if

[412:04]

you're the first human on Earth to come

[412:06]

up with encryption or come up with this

[412:07]

specific cipher, it doesn't really

[412:08]

matter how not complex it is if no one

[412:10]

else knows what's going on. Nowadays,

[412:13]

it's not hard at all to write some C

[412:15]

code or any other language that could

[412:16]

just brute force their way through this.

[412:18]

So there are much more sophisticated

[412:19]

algorithms nowadays than simple

[412:21]

rotations of letters of the alphabet as

[412:24]

we'll soon see. But when it comes to

[412:25]

decryption, it really is just a matter

[412:27]

of reversing that process. So this

[412:28]

message here, if we rotate all the

[412:31]

letters in the opposite direction by

[412:32]

subtracting one, will be our final

[412:34]

flourish for today. There's a bit of a

[412:35]

hint there which will reveal that this

[412:37]

message and our final words for us as

[412:39]

the clock strikes 4:15 is going to be

[412:42]

the U becomes T and the I becomes H. Um,

[412:47]

this I'm the only one. This is amusing.

[412:49]

H I S W A S C S 50. And this was CS50.

[412:57]

We'll see you next time.

[415:46]

All right, this is CS50. This is week

[415:49]

three. And this was an artist's rendition

[415:52]

of what various sorting algorithms look

[415:54]

and sound like. Recall from week zero

[415:56]

that an algorithm is just step-by-step

[415:58]

instructions for solving some problem to

[416:00]

sort information as in the real world

[416:01]

just means to order it from like

[416:03]

smallest to largest or alphabetically or

[416:05]

some other heuristic. And it's among the

[416:07]

algorithms that we're going to focus on

[416:08]

today in addition to searching which of

[416:10]

course is looking for information as we

[416:12]

did in week zero too. Among the goals

[416:14]

for today are to give you a sense of

[416:15]

certain computer science building

[416:17]

blocks. Like there's a lot of canonical

[416:19]

algorithms out there that most anyone uh

[416:21]

who studied computer science would know,

[416:22]

or that anyone who leads a tech interview

[416:24]

would ask. But more importantly, the

[416:26]

goal is to give you different mental

[416:28]

models for and methodologies for

[416:30]

actually solving problems by giving you

[416:32]

a sense of how these uh real world

[416:34]

algorithms can be translated to actual

[416:36]

computers that you and I can control. We

[416:38]

thought we'd begin today uh with an

[416:40]

actual algorithm for sort of taking

[416:42]

attendance. We of course do this with

[416:43]

scanners outside, but we can do it old

[416:45]

school whereby I just use my hand or my

[416:46]

mind and start doing 1 2 3 4 5 6 7 8 9

[416:50]

10 11 12 and so forth. That's going to

[416:53]

take quite a few steps cuz I've got to

[416:54]

point at and recite a number for

[416:56]

everyone in the room. So I could kind of

[416:58]

do what my like grade school teachers

[416:59]

taught me, which is count by twos, which

[417:01]

would seem to be faster. So like 2 4 6 8

[417:04]

10 12 14 16 18 20. And clearly that

[417:07]

sounds and is actually faster. But I

[417:10]

think with a little more intuition and a

[417:13]

little more thought back to week zero, I

[417:15]

dare say we could actually do much

[417:16]

better than that. So, if you won't mind,

[417:19]

I'd like you to humor us by all standing

[417:22]

up in place and think of the number one

[417:25]

if you could and join us in this here

[417:27]

algorithm. So, stand up in place and

[417:31]

think of the number one. So, at this

[417:34]

point in the story, everyone should be

[417:35]

thinking of the number one. Step two of

[417:38]

this algorithm for you is going to be

[417:40]

this. Pair off with someone standing.

[417:42]

Add their number to yours and remember

[417:45]

the sum.

[417:47]

Go.

[417:53]

Okay. At this point in the story,

[417:56]

everyone except maybe one lone person if

[417:59]

we've got an odd number of people in the

[418:00]

room is thinking of what number?

[418:02]

>> Two. Okay. So next step, one of you in

[418:06]

each pair should sit down.

[418:13]

Okay, good. Never seen some people sit

[418:16]

down so fast. So those of you who are

[418:18]

still standing, the algorithm's still

[418:20]

going. So the next step for those of you

[418:22]

still standing is this. If still

[418:24]

standing, go back to step two.

[418:28]

Ergo, repeat or loop if you could.

[418:43]

And notice if you've gone back to step

[418:45]

two, that leads you to step three. That

[418:46]

leads some of you to step four, which

[418:48]

leads you back to step two. So this is a

[418:51]

loop.

[419:00]

Keep going. If still standing, pair off

[419:02]

with someone else still standing. Add

[419:04]

together and then one of you sit down.

[419:06]

So with each passing second, more and

[419:08]

more people should be sitting down

[419:12]

and fewer and fewer are standing. Okay,

[419:15]

almost everyone is sitting down. You're

[419:16]

getting farther and farther away from

[419:18]

each other. That's okay. I can help with

[419:21]

some of the math at the end here.

[419:25]

All right, I see a few of you still

[419:26]

standing, so I'll help out and I'll I'll

[419:28]

join you together. So, I see you in the

[419:30]

middle here. What's your number?

[419:32]

>> 32.

[419:33]

>> 32. Okay, go ahead and sit down and I'll

[419:36]

pair you off with What's your number?

[419:38]

>> 20. Okay, you can go ahead and sit down.

[419:40]

Uh, who's still

[419:43]

You're still standing?

[419:44]

>> 27.

[419:45]

>> 27. Okay, you can sit down.

[419:48]

>> You guys are still adding together.

[419:49]

Who's going to stay standing? Okay.

[419:50]

What's your number?

[419:52]

>> The worst part is doing like arithmetic

[419:54]

across a crowded room, but

[419:55]

>> 27.

[419:56]

>> 27. Also

[419:57]

>> 47.

[419:57]

>> 47. Okay, you can sit down. Is anyone

[419:59]

still standing? Yeah,

[420:00]

>> 15.

[420:01]

>> Nice. 15. Okay, you can sit down. Anyone

[420:03]

still standing?

[420:06]

Okay, so all I've done is sort of

[420:07]

automate the process of pairing people

[420:09]

up at the end here. When I hit enter, we

[420:12]

should hopefully see Oh, the numbers are

[420:14]

a little What's going on there? There we

[420:16]

go. When I hit enter, we'll add together

[420:19]

all of the numbers that were left. And

[420:22]

if you think about the algorithm that we

[420:23]

just executed, each of you started with

[420:24]

the number one, and then half of you

[420:26]

handed off your number. Then half of you

[420:28]

handed off your number. Then half of you

[420:29]

handed off your number. So theoretically

[420:31]

all of these ones with which we started

[420:33]

should be aggregated into the final

[420:35]

count which if this room weren't so big

[420:36]

would just be in one person's mind and

[420:38]

they would have declared what the total

[420:40]

number of people in the room is. I'm

[420:41]

going to speed that up by hitting enter

[420:42]

on the keyboard. And if your execution

[420:45]

of this algorithm is correct, there

[420:48]

should be

[420:50]

141 people in the room. According to our

[420:52]

old school human though, Kelly, who did

[420:55]

this manually, one at a time, the total

[420:58]

number of people in the room, according

[421:01]

to Kelly, if you want to come on up and

[421:03]

shout it into the microphone, is of

[421:04]

course going to be

[421:05]

>> I don't know, something around 160, I

[421:07]

think.

[421:08]

>> 160. So, not quite the same. Okay, but

[421:10]

that's pretty good. Okay, round of

[421:11]

applause for your your accuracy.

[421:17]

Okay, so ideally counting one at a time

[421:19]

would have been perfectly correct. So,

[421:21]

we're only off by a little bit. Now,

[421:23]

presumably that's just because of some

[421:24]

bugs in execution of the algorithm.

[421:26]

Maybe some mental math didn't quite go

[421:28]

according to plan. But theoretically,

[421:30]

your third and final algorithm wherein

[421:32]

you all participated should have been

[421:33]

much faster than my algorithm or Kelly's

[421:35]

algorithm whether or not we were

[421:36]

counting one at a time or two at a time.

[421:38]

Why? Well, think back to week zero when

[421:40]

we did the whole phone book example,

[421:42]

which was especially fast in its final

[421:44]

form because we were dividing and

[421:45]

conquering, tearing half of the problem

[421:47]

away, half of the problem away. And even

[421:49]

though it's hard to see in a room like

[421:50]

this, it stands to reason that when all

[421:53]

of you were standing up, we took a big

[421:55]

bite out of the first problem and half

[421:56]

of you sat down, half of you sat down,

[421:58]

half of you sat down, and theoretically

[422:00]

there would have been, if you were

[422:01]

closer in in uh space, one single person

[422:04]

with the final count. So let's see if we

[422:06]

can't analyze this just a little bit by

[422:09]

considering what we did. So here's that

[422:11]

same algorithm here. Recall is how we

[422:13]

motivated week zero's demonstration of

[422:15]

the phone book in either digital form as

[422:17]

you might see in an iPhone or Android

[422:19]

device looking for someone for instance

[422:20]

like John Harvard who might be at the

[422:22]

beginning middle or end of said phone

[422:24]

book, but we analyzed that algorithm just

[422:26]

as we can now this one. So in my very

[422:28]

first verbalized algorithm 1 2 3 4 you

[422:32]

could draw that as a straight line

[422:33]

because the relationship between the

[422:34]

number of people in the room and the

[422:36]

amount of time it takes is linear. It's

[422:38]

a straight line with each additional

[422:40]

person in the room. It takes me one more

[422:42]

step. So if you think to sort of high

[422:44]

school math, there's sort of a slope of

[422:45]

one there. And so this n number denoting

[422:48]

number of people in the room is indeed a

[422:50]

straight line. And on the x-axis, as in

[422:52]

week zero, we have the size of the

[422:54]

problem in people and the time to solve

[422:56]

in steps or seconds or whatever your

[422:57]

unit of measure is. If and when I

[422:59]

started counting two at a time, 2 4 6 8

[423:02]

10 and so forth, that still is a

[423:04]

straight line because I'm taking two

[423:06]

bites consistently out of the problem

[423:07]

until maybe the very end where there's

[423:09]

just one person left, but it's still a

[423:11]

straight line, but it's strictly faster.

[423:13]

No matter the size of the problem, if

[423:14]

you sort of draw a line vertically,

[423:16]

you'll see that you hit the yellow line

[423:17]

well before you hit the red line because

[423:19]

it's moving essentially twice as fast.

[423:21]

But that third and final algorithm, even

[423:23]

though in reality it felt like it took a

[423:25]

while and I had to kind of bring us to

[423:27]

the exciting conclusion by doing some of

[423:29]

the math, that looked much more like our

[423:31]

third and final phone book example.

[423:33]

Because if you think about it from an

[423:35]

opposite perspective, suppose there were

[423:36]

twice as many people in the room. Well,

[423:38]

it would have taken you all

[423:40]

theoretically just one more step. Now,

[423:42]

granted, one more loop and there might

[423:43]

be some substeps in there, if you will,

[423:45]

but it's really just fundamentally one

[423:46]

more step. If the number of people in

[423:48]

the room quadrupled, four times as many

[423:50]

people, well, that's two more steps.

[423:53]

Equivalently, the amount of time it

[423:55]

takes to solve the attendance problem

[423:58]

using that third and final algorithm grows

[424:01]

very slowly because it takes a huge

[424:04]

number of more people in the room before

[424:06]

you even begin to feel the impacts of

[424:08]

that uh growth. And so today indeed, as

[424:11]

we talk about not only the correctness

[424:13]

of algorithms, we're going to talk about

[424:14]

the design of algorithms as well. just

[424:16]

as we have code because the smarter you

[424:18]

are with your design the more efficient

[424:21]

your algorithms ultimately are going to

[424:22]

be and the slower their cost is going to

[424:26]

grow and by cost I mean time like here

[424:28]

maybe it's money maybe it's the amount

[424:30]

of storage space that you need any

[424:32]

limited resource is something that we

[424:33]

can ultimately measure and we're not

[424:35]

going to do it very precisely indeed

[424:36]

we're going to use some broad strokes

[424:37]

and some standard mechanisms for

[424:39]

describing ultimately the running time

[424:42]

the amount of time it takes for an

[424:44]

algorithm or in turn code to actually

[424:47]

run. So, how can we do this? Well, last

[424:49]

week recall we set the stage uh for

[424:52]

talking about something called arrays,

[424:53]

which were the simplest of data

[424:55]

structures inside of a computer where

[424:56]

you just take the memory in your

[424:57]

computer and you break it up into chunks

[424:59]

and you can store a bunch of integers, a

[425:01]

bunch of strings, whatever, back to back

[425:03]

to back to back. And that's the key

[425:04]

characteristic for an array. It is a

[425:06]

chunk of memory wherein all of the

[425:08]

values therein are back to back to back.

[425:11]

So, right next to each other in memory.

[425:13]

So we drew this fairly abstractly by

[425:14]

drawing a grid like this and I said well

[425:16]

maybe this is byte zero and this is byte 1

[425:18]

billion, whatever the total amount

[425:20]

of memory is that you have. We zoomed in

[425:22]

and looked at a little something like

[425:23]

this a canvas of memory. We talked about

[425:26]

what and where you can put things. But

[425:28]

today let's just assume that we want 1 2

[425:30]

3 4 5 6 seven chunks of memory for the

[425:33]

moment. And inside of them we might put

[425:35]

something like these numbers here. Well,

[425:37]

the interesting thing about computers is

[425:39]

that even though if I were to ask you

[425:40]

all, find the number 50 in this array. I

[425:43]

mean, our minds quickly see where it is

[425:45]

because we sort of have this bird's eye

[425:47]

view of the whole screen and it's

[425:48]

obvious where 50 is. But the catch with

[425:50]

computers and with code that we write is

[425:54]

that really these arrays, these chunks

[425:56]

of memory are equivalent to a whole

[425:58]

bunch of closed doors. And the computer

[426:00]

can't just have this bird's eye view of

[426:02]

everything. If the computer wants to see

[426:04]

what value is at a certain location, it

[426:07]

has to do the metaphorical equivalent of

[426:09]

going to that location, opening the door

[426:11]

and looking, then closing it and moving

[426:13]

on to the next. That is to say, a

[426:14]

computer can only look at or access one

[426:17]

value at a time. Now, that's in the

[426:19]

simplest form. You can build fancier

[426:20]

computers that theoretically can do more

[426:22]

than that, but all the code we write

[426:23]

generally is going to assume that model.

[426:25]

You can't just see everything at once.

[426:26]

You have to go to each location in these

[426:29]

here lockers, if you will. Starting

[426:31]

today too, when we talk about the

[426:32]

locations in memory we're going to use

[426:34]

our old uh zero indexing uh vernacular

[426:38]

that is to say we start counting from

[426:39]

zero instead of one. So this will be

[426:42]

locker zero locker one locker two dot

[426:44]

dot dot all the way up to locker six. So

[426:46]

just ingrain in your mind that if you

[426:48]

hear something like location six that's

[426:50]

actually implying that there's at least

[426:52]

seven total locations because we started

[426:54]

counting at zero. So that's intentional.

[426:56]

Um we don't have in the real world

[426:57]

yellow lockers. So, we're going to make

[426:59]

this metaphor red instead. We do have

[427:01]

these lockers here. And suppose that

[427:03]

within these seven lockers physically on

[427:04]

stage. We've put a whole bunch of money,

[427:07]

uh, Monopoly money, if you will, but the

[427:09]

goal initially here is going to be to

[427:11]

search for some specific denomination of

[427:13]

interest and use these physical lockers

[427:14]

as a metaphor for what your computer's

[427:16]

going to do and what your code

[427:18]

ultimately is going to do. If we're

[427:19]

searching for the solution to a problem

[427:22]

like this, the input to the problem at

[427:23]

hand is seven lockers, all of whose

[427:25]

doors are metaphorically closed. The

[427:28]

output of which we want to be a bool.

[427:29]

True or false answer. Yes or no? That

[427:32]

number is there or no it is not. So

[427:34]

inside of this black box today is going

[427:36]

to be the first of our algorithms,

[427:38]

step-by-step instructions for solving

[427:39]

some problem where the problem here is

[427:41]

to find among all of these dollar bills

[427:43]

specifically the $50 bill. If we could

[427:46]

get two volunteers to come on up who are

[427:48]

ideally really good at Monopoly. Okay.

[427:50]

How about over here in front? And uh how

[427:52]

about let me look a little farther in

[427:54]

back. Okay. Over here there and back.

[427:56]

Come on down. All right. As these uh

[427:58]

volunteers kindly come down to the

[428:00]

stage, we're going to ask them in turn

[428:02]

to search for specifically the $50 bill

[428:05]

that we've hidden in advance. And if uh

[428:07]

my colleague Kelly could come on up too

[428:09]

because we're going to do this twice.

[428:10]

Once searching uh in one with one

[428:13]

algorithm and a second time with

[428:14]

another. Uh let me go ahead and say

[428:16]

hello if you'd like to introduce

[428:18]

yourselves to the group.

[428:19]

>> Hey, I'm Jose Garcia.

[428:22]

>> Hi, I'm Caitlyn Cow.

[428:24]

>> All right, Jose and Caitlyn. Nice to

[428:25]

meet you both. Come on over and let me

[428:27]

go ahead and propose that Jose um the

[428:30]

first algorithm that I'd like you to do

[428:31]

is to find the number 50. And let's keep

[428:33]

it simple. Just start from the left and

[428:34]

work your way to the right. And with

[428:36]

each time you open the door, stand over

[428:38]

to the side so people can see what's

[428:39]

inside and just hold the dollar amount

[428:41]

up for the world to see. All right, the

[428:43]

floor is yours. Find us the $50 bill.

[428:48]

20.

[428:50]

>> Shut it.

[428:50]

>> No, that's good. That's good acting,

[428:51]

too. Thank you. No, you can shut it just

[428:53]

like the computer. All right.

[428:57]

No. Very clear. Thank you.

[429:03]

Still no. $10 bill.

[429:06]

Next locker.

[429:09]

$5 bill. Not going well.

[429:13]

Uh $100 bill, but not the one we want.

[429:18]

This one. H $1 bill. Still no 50. Of

[429:21]

course, you've been sort of set up to

[429:22]

fail, but here, amazing. A round of

[429:25]

applause. Jose found the $50 bill.

[429:29]

All right. So, let me ask you, Jose, you

[429:31]

found the $50 bill. Um, it clearly took

[429:34]

you a long time. Just describe in your

[429:36]

own words, what was your algorithm, even

[429:37]

though I nudged you along.

[429:38]

>> Yeah. So, my algorithm was basically

[429:40]

walk up to the first door available,

[429:42]

open it, check if the dollar bill was

[429:45]

the dollar bill that I was looking for,

[429:46]

and then put it back, and then go to the

[429:48]

next one.

[429:48]

>> Okay. So, it's very reasonable because

[429:50]

if the $50 bill were there, Jose was

[429:53]

absolutely going to find it eventually,

[429:55]

if slowly. In the meantime, Kelly's

[429:57]

going to kindly reshuffle the numbers

[429:58]

behind these doors here. And even though

[430:00]

Jose took a long time here, I mean, what

[430:02]

if Jose, like, wouldn't it have been smart to

[430:04]

start from the other end instead, do you

[430:06]

think?

[430:07]

>> Um, not necessarily because we don't

[430:09]

know if the 50 is going to be at that

[430:11]

end.

[430:11]

>> Exactly. So, he could have gotten lucky

[430:13]

if he sort of flouted my advice and

[430:15]

didn't start on the left, but instead

[430:16]

started on the right. Boom. he would

[430:18]

have solved this in one step, but in

[430:19]

general that's not really going to work

[430:20]

out. Maybe half the time it will. You'll

[430:22]

get lucky, half the time it won't. But

[430:23]

that's not really a fundamental change

[430:25]

in the algorithm whether you go left to

[430:26]

right, right to left. To Jose's point,

[430:28]

if you don't know anything a priori about

[430:30]

the numbers, the best you can probably

[430:33]

do is just go through linearly left to

[430:35]

right or right to left. So long as

[430:37]

you're consistent. Now, could you have

[430:39]

jumped around randomly?

[430:41]

>> Uh, I guess I could have, but if again,

[430:43]

if they weren't in any like specified

[430:45]

order, I don't think it would have

[430:46]

helped either. Yeah. So, in

[430:48]

addition, if he just jumped around

[430:49]

in random order, he might get lucky

[430:52]

and it might be in the very first one

[430:53]

might have taken fewer steps ultimately,

[430:55]

but presumably you're going to have to

[430:56]

then keep track of like which locker

[430:58]

doors have you opened. So, that's going

[431:00]

to take some memory or space, not a big

[431:02]

deal with seven lockers. But if it's 70

[431:04]

lockers, 700 lockers, even random

[431:06]

probably isn't going to be the best job.

[431:08]

So, let me go ahead and take the mic

[431:09]

away and hand it over to Caitlyn. You

[431:11]

can stay on the stage with us. Caitlyn,

[431:13]

what I'd like you to do is approach this

[431:14]

a little more intelligently by dividing

[431:17]

and conquering the problem, but we're

[431:18]

going to give you an advantage over

[431:20]

Jose. Kelly has kindly sorted the

[431:22]

numbers from smallest to largest from

[431:25]

left to right.

[431:26]

>> So, accordingly, what's your strategy

[431:28]

going to be?

[431:29]

>> Start in the middle.

[431:30]

>> Okay, please.

[431:33]

And go ahead as before and reveal to the

[431:35]

audience what you found. Not the 50, the

[431:37]

20. But what do you know, Caitlyn? At

[431:38]

this point,

[431:38]

>> it'll be in on the left is left.

[431:41]

Correct. So the 20 is going to be to the

[431:43]

left. So where might you go next with

[431:44]

this three locker problem? Let me

[431:47]

propose that you maybe go to the middle

[431:49]

of the three.

[431:51]

>> There we go. The middle of the middle.

[431:53]

Like that would have been good. But

[431:55]

let's

[431:55]

>> Oh no.

[431:56]

>> Oh no. It's a 100 instead. You failed.

[431:58]

But what do you now know?

[431:59]

>> It's in the middle.

[432:01]

>> That I should have just let you. But now

[432:04]

we have a big round of applause for Caitlyn

[432:05]

for having found the 50 as well. Okay.

[432:11]

So, the one catch with this particular

[432:12]

demo is that because they know

[432:14]

presumably what Monopoly money

[432:15]

denominations are because we just did

[432:17]

this exercise and we had the whole cheat

[432:18]

sheet on the board, you probably had

[432:20]

some intuition as to like where the 50

[432:21]

was going to be. even though I was

[432:22]

trying to get you to play along. But in

[432:24]

the general case, if you don't know what

[432:25]

the numbers are and that they're the

[432:27]

specific denominations, but you do know

[432:28]

that they're going from smallest to

[432:30]

largest, going to the middle, then the

[432:32]

middle of the middle, then the middle of

[432:34]

the middle again and again would have

[432:35]

the effect of starting with a big

[432:36]

problem and halving it, halving it, halving

[432:38]

it, just like the phone book as well.

[432:41]

So, thanks to you both. We have these

[432:43]

wonderful parting gifts that we found in

[432:44]

Harvard Square. Uh, if you like

[432:46]

Monopoly, you'll love the Cambridge

[432:47]

edition filled with Harvard Square name

[432:51]

spots. So, but thank you to you both and

[432:52]

a round of applause for our volunteers

[432:54]

here.

[432:59]

>> All right. So, let's see if we can't

[433:01]

formalize a little bit these two

[433:03]

algorithms known as linear search in so

[433:06]

far as Jose was searching essentially

[433:07]

along a line left to right and binary

[433:10]

search, bi implying two, because we were

[433:12]

halving that problem in two again and

[433:14]

again and again. So for instance with

[433:16]

linear search from left to right or

[433:18]

equivalently right to left we could

[433:20]

document our pseudo code as follows. For

[433:23]

each door from left to right if the 50

[433:26]

is behind the door well then we're done.

[433:28]

Just return true. That's the boolean

[433:30]

value which was the goal of this

[433:31]

exercise to say yes here is the 50.

[433:34]

Otherwise at the very bottom of this

[433:36]

pseudo code we could just say return

[433:38]

false. Because if you get all the way

[433:40]

through the lockers and you have never

[433:42]

once declared true by finding the 50,

[433:44]

you might as well default at the very

[433:46]

end to saying false. I did not find it.
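
As a rough sketch, the linear search pseudocode just described might translate to C as follows. It assumes the values behind the doors are stored in an int array; the names linear_search, doors, n, and target are hypothetical and chosen only for illustration. The final return false sits outside the loop, so it runs only after every door has been checked.

```c
#include <stdbool.h>

// Returns true if target is behind any of the n doors, scanning left to right.
bool linear_search(const int doors[], int n, int target)
{
    for (int i = 0; i < n; i++)
    {
        if (doors[i] == target)
        {
            // Found it: the function stops here.
            return true;
        }
    }
    // Reached only if no door held the target.
    return false;
}
```

Calling linear_search(doors, 7, 50) on the seven lockers from the demo would return true, and searching for a value that is not there would return false.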

[433:48]

But notice here, just like in week zero

[433:50]

when we talked about pseudo code for

[433:51]

searching the phone book, my indentation

[433:53]

of all things is actually very

[433:55]

intentional. This version of this code

[433:57]

would be wrong if I instead used our old

[434:00]

friend if else and made this conditional

[434:04]

decision. Why is this code now in red

[434:06]

wrong in terms of correctness? Yeah, if

[434:09]

it's not behind the first door, it'll

[434:10]

return false.

[434:11]

>> Exactly. Because if the number 50 is not

[434:13]

behind the first door, the else is

[434:16]

telling you right then and there, return

[434:17]

false. But as we've seen in C code,

[434:19]

whenever you return a value, like that's

[434:21]

it for the function. It is done doing

[434:23]

its work. And so if you return false

[434:25]

right away, not having looked at the

[434:26]

other six lockers, you may very well get

[434:28]

the answer wrong. So the first version

[434:30]

of the code where there wasn't an else

[434:32]

but rather this implicit line of code at

[434:34]

the very or this explicit line of code

[434:36]

at the very end that just says if you

[434:37]

reach this line of code return false

[434:40]

that addresses that problem and to be

[434:41]

clear even though it's right after an

[434:43]

indented return true when you return a

[434:46]

value as in C that's it like execution

[434:49]

stops at that point at least for the

[434:51]

function or in this case the pseudo code

[434:52]

in question. All right, so here's a more

[434:55]

computer sciency way of describing the

[434:57]

same algorithm. And even though it

[434:58]

starts to look a little more arcane, the

[435:00]

reality is when you start using

[435:01]

variables and sort of standard notation,

[435:03]

you can actually express yourself much

[435:04]

more clearly and precisely, even though

[435:06]

it might take a little bit of practice

[435:07]

to get used to. Here is how a computer

[435:09]

scientist would express that exact same

[435:11]

idea. Instead of saying for each door

[435:13]

from left to right, we might throw some

[435:15]

numbers on the table. So for i a

[435:18]

variable apparently from the value zero

[435:20]

on up through the value n minus one is

[435:23]

what this shorthand notation means if 50

[435:26]

is behind doors bracket i so to speak.

[435:29]

So now I'm sort of treating the notion

[435:30]

of doors as an array using our notation

[435:34]

from last week. If 50 is behind doors

[435:36]

bracket I return true. Otherwise if you

[435:39]

get through the entirety of that array

[435:42]

of doors you can still return false. Now

[435:45]

notice here n minus one seems a little

[435:47]

weird because aren't there n doors? Why

[435:49]

do I want to go from 0 to n minus one

[435:51]

instead of 0 to n? Yeah,

[435:54]

>> because zero is the first block.

[435:56]

>> Exactly. If you start counting at zero

[435:58]

and you have n elements, the last one is

[436:00]

going to be addressed as n minus one,

[436:02]

not n because if it were n, then you

[436:04]

actually have n + one elements, which is

[436:06]

not what we're talking about. So again,

[436:08]

just a standard notation and it's a

[436:09]

little terser this way. It's a little

[436:11]

more succinct and frankly it's a little

[436:13]

more adaptable to code. And so what

[436:15]

you're going to find is that as our

[436:16]

problem sets and programming challenges

[436:19]

that we assign sort of get a little more

[436:20]

involved, it's often helpful to write

[436:22]

out pseudo code like this using an

[436:24]

amalgam of English and C and eventually

[436:27]

Python code because then it's way easier

[436:29]

after to just translate your pseudo code

[436:31]

into actual code if you're operating at

[436:33]

this level of detail. All right. So, in

[436:36]

the second algorithm, uh, where Caitlyn

[436:38]

kindly searched for 50 again, but Kelly

[436:40]

gave her the advantage of sorting the

[436:42]

numbers in advance. Now, she doesn't

[436:43]

have to just resort to brute force, so

[436:45]

to speak, trying all possible doors from

[436:47]

left to right. She can be a little more

[436:49]

intelligent about it and pick and choose

[436:51]

the locker she opens. And so, with

[436:53]

binary search, as we call that, we could

[436:54]

implement the same pseudo code. We could

[436:57]

implement pseudo code for it as follows.

[436:59]

We might say if 50 is behind the middle

[437:02]

door, then go ahead and return true.

[437:05]

Else if it's not behind the middle door,

[437:07]

but 50 is less than that number behind

[437:10]

the middle door, we want to go and

[437:12]

search the left half. So that didn't

[437:14]

happen in Caitlyn's case because we

[437:16]

ended up going right. So that's just

[437:17]

another branch here. Else 50 is greater

[437:19]

than what was at the middle door. We

[437:21]

want to search the right half. But

[437:24]

there's going to be one other condition

[437:26]

here that we should probably consider,

[437:27]

which is what is it here? Is it to the

[437:31]

left? Or is it to the right? But there's

[437:33]

another, a corner case, that we'd better

[437:36]

keep track of. What else could happen?

[437:40]

>> If it's not in the array or really like

[437:42]

we're out of doors, so we can implement

[437:44]

this in a different way. I left myself

[437:45]

some space at the top because I

[437:47]

shouldn't do any of this if there are no

[437:49]

doors to search for. So, I should have

[437:51]

this sort of sanity check whereby if

[437:53]

there's no doors left or no doors to

[437:55]

begin with, let's just immediately

[437:56]

return false. And why is that? Well,

[437:58]

notice that when I say search left half

[438:00]

and search right half, this is

[438:02]

implicitly telling me just do this

[438:04]

again. Just do this again, but with

[438:06]

fewer and fewer doors. And this is a

[438:08]

technique for solving problems and

[438:10]

implementing algorithms that we're going

[438:11]

to end today's discussion on because

[438:13]

what seems very colloquial and very

[438:15]

straightforward. Okay, search the left

[438:17]

half, search the right half is actually

[438:18]

a very powerful programming technique

[438:20]

that's going to enable us to write more

[438:21]

elegant code, sometimes less code to

[438:23]

solve problems such as this. And more on

[438:25]

that in just a little bit. But how can

[438:28]

we now formalize this using some of our

[438:30]

array notation? Well, it looks a little

[438:31]

more complicated, but it isn't really.

[438:34]

Instead of asking questions in English

[438:36]

alone, I might say if 50 is behind doors

[438:38]

bracket middle, this pseudo code

[438:40]

presupposes that I did some math and

[438:42]

figured out what the numeric address,

[438:45]

the numeric index is of the middle

[438:47]

element. And how can I do that? Well, if

[438:48]

I've got seven doors and I divide by

[438:50]

two, what's that? 7 divided by two,

[438:54]

three and a half. Three and a half makes

[438:56]

no sense if I'm using integers to

[438:58]

address this. So maybe we just round

[439:00]

down. So three. So that would be locker

[439:02]

number 0 1 2 3 which indeed if you look

[439:06]

at the seven lockers is in fact the

[439:08]

middle. So this is to say using some

[439:10]

relatively simple arithmetic I can

[439:12]

figure out what the address is the index

[439:14]

is of the middle door if I know how many

[439:16]

there are and I divide by two and round

[439:18]

down. Meanwhile, if I don't find 50

[439:20]

behind the middle door, let's ask the

[439:22]

question. If 50 is less than the value

[439:25]

at the middle door, then let's search

[439:28]

not the left half per se in the general

[439:31]

sense. More specifically, search doors

[439:33]

bracket zero through doors bracket

[439:36]

middle minus one. Otherwise, if 50 is

[439:40]

greater than the value at the middle

[439:41]

door, go ahead and search doors bracket

[439:44]

middle + one through doors bracket n

[439:47]

minus one. Now let's consider these in

[439:49]

turn. So searching the left half as we

[439:51]

described this earlier seems to line up

[439:54]

with this idea, like start searching

[439:56]

from doors bracket zero the very first

[439:57]

one. But why are we searching doors

[440:01]

bracket middle minus one instead of

[440:03]

doors bracket middle.

[440:06]

Yeah

[440:08]

>> middle.

[440:08]

>> Yeah exactly. We already checked the

[440:10]

middle door by asking this previous

[440:11]

question. And so you're just wasting

[440:12]

everyone's time if you divide the half

[440:15]

and still consider that door as

[440:17]

checkable again. And same thing here. We

[440:19]

check middle plus one through the end of

[440:22]

the lockers array because we already

[440:24]

checked the middle one. So same reason

[440:25]

even though it just kind of complicates

[440:27]

the look of the math, but it's really

[440:29]

just using variables and arithmetic to

[440:30]

describe the locations of these same

[440:33]

lockers. But let's consider now what we

[440:35]

mean by running time. The amount of time

[440:37]

it takes for an algorithm to run, and

[440:39]

consider which and why one of these

[440:42]

algorithms is better than the other. So

[440:44]

in general when talking about running

[440:46]

time we can actually use pictures like

[440:47]

this. This is not going to be some like

[440:49]

very low-level mathematical analysis

[440:50]

where we count up lots of values. It's

[440:52]

going to be broad strokes so that we can

[440:54]

communicate to colleagues uh to other

[440:56]

humans generally whether an algorithm is

[440:58]

better than another and how you might

[441:00]

compare the two. So here for instance is

[441:03]

a pictorial analysis of two different

[441:05]

algorithms. It's the phone book from

[441:06]

week zero and then the attendance taking

[441:08]

from today itself. And let's generally

[441:11]

as we've done before sort of label these

[441:13]

things. So the very first algorithm took

[441:14]

n steps in the very worst case if I had

[441:17]

to search the whole phone book or if I

[441:18]

had to count everyone in the room. So

[441:20]

the first algorithm took indeed n steps.

[441:22]

The second algorithm took half as many

[441:24]

plus one maybe but we'll keep it simple.

[441:26]

So we'll call that n /2. And the third

[441:28]

and final algorithm both in week zero

[441:30]

with the phone book and today with

[441:31]

attendance is technically log base 2 of

[441:34]

n. And if you're a little rusty in your

[441:35]

logarithms, that's fine. Just take on

[441:37]

faith that log base 2 alludes to taking

[441:40]

a problem of size n and dividing it in

[441:43]

half and half and half as many times as

[441:44]

you can until you're left with one

[441:46]

person standing or one page in the phone

[441:49]

book. That's how many times you can

[441:50]

divide in half a problem of size n.
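
To make the logarithm idea concrete, here is a small sketch that counts how many times a problem of size n can be halved before only one item is left. The function name halvings is hypothetical, and integer division is assumed, so the result is log base 2 of n rounded down.

```c
// Counts how many times n can be cut in half before reaching 1.
int halvings(int n)
{
    int steps = 0;
    while (n > 1)
    {
        n /= 2;   // integer division drops any remainder
        steps++;
    }
    return steps;
}
```

Under these assumptions, halvings(8) is 3 and halvings(1000000) is 19, which hints at why halving scales so well as n grows.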

[441:53]

Well, it turns out that we're getting a

[441:54]

little more detailed than most computer

[441:56]

scientists tend to care to get uh when

[441:58]

describing the efficiency of algorithms.

[442:01]

So in fact we're going to start to use

[442:03]

some more common notation. Instead of

[442:05]

worrying precisely mathematically about

[442:07]

how many steps today's and the future's

[442:09]

algorithms take. We're going to talk in

[442:10]

broader strokes about how many steps

[442:12]

they are on the order of and we're going

[442:14]

to use what's called big O notation

[442:16]

which literally is like a big O and then

[442:19]

some parenthesis and you pronounce it

[442:20]

big O of such and such. So the first

[442:22]

algorithm seems to be in big O of N

[442:25]

which means uh it's on the order of N

[442:27]

steps give or take some. This algorithm

[442:30]

here, you might be inclined to do

[442:32]

something similar. Ah, it's on the order

[442:34]

of n / two steps and ah, this one's on

[442:36]

the order of log base 2 of n steps. But

[442:39]

it turns out what we really care about

[442:41]

with algorithms is how the time grows as

[442:45]

the problem itself grows in size. So the

[442:48]

bigger n gets, the more concerned we are

[442:51]

over how efficient our algorithm is, if

[442:53]

only because today's computers are so

[442:55]

darn fast. Whether you're crunching a

[442:57]

thousand numbers or 2,000 numbers, like

[442:59]

it's going to take like a split second

[443:01]

no matter what. But if you're crunching

[443:02]

a thousand numbers versus a million

[443:04]

numbers versus a billion numbers, like

[443:06]

that's where things start to actually be

[443:08]

noticeable by us humans and we really

[443:10]

start to care about these values. So in

[443:11]

general, when using big O notation like

[443:14]

this, you ignore lower order terms or

[443:17]

equivalently, you only worry about the

[443:19]

dominant term in whatever mathematical

[443:21]

expression is in question. So big O of N

[443:23]

remains big O of N. Big O of N / two.

[443:26]

Eh, it's the same thing really as like

[443:28]

big O of N. Like it's not really, but

[443:31]

they're both linear in nature. One grows

[443:33]

at this rate, one grows at this rate

[443:34]

instead. But it's for all intents and

[443:36]

purposes the same. They're both growing

[443:38]

at a constant rate. This one too, ah,

[443:41]

it's on the order of log of n where the

[443:43]

base is who cares. In short, what does

[443:46]

this really mean? Well, imagine in your

[443:47]

mind's eye that we were about to zoom

[443:49]

out on this graph such that instead of

[443:52]

going from 0 to like a million, maybe

[443:55]

now the x-axis is 0 to a billion. And

[443:57]

same thing for the y-axis, 0 to a

[443:59]

million. Let's zoom out. So, you're

[444:00]

seeing 0 to a billion. Well, in your

[444:02]

mind's eye, you might imagine that as

[444:04]

you zoom out, essentially things just

[444:06]

get more and more compressed visually

[444:08]

because you're zooming out and out and

[444:09]

out, but these things still look like

[444:10]

straight lines. This thing still looks

[444:12]

like curved lines, which is to say as n

[444:14]

gets large, clearly this green

[444:17]

algorithm, whatever it is, is more

[444:18]

appealing it would seem, than either of

[444:21]

these two algorithms. And if we keep

[444:22]

zooming out, like at some point, the ink

[444:24]

is going to be so close together that

[444:25]

they all are, for all intents and

[444:27]

purposes pretty much the same algorithm.

[444:29]

So this is to say computer scientists

[444:31]

don't care about lower order terms like

[444:33]

divide by two or base 2 or anything like

[444:36]

that. We look at the most dominant term

[444:37]

that really matters as n gets bigger and

[444:39]

bigger. So that then is big O notation

[444:43]

and it's something we'll start to use

[444:45]

pretty much recurringly anytime we

[444:47]

analyze or speak to how good or how bad

[444:49]

some algorithm is. So here's a little

[444:51]

cheat sheet of common running times. So

[444:53]

for instance here's our friend big O of

[444:55]

N which means uh the algorithm takes on

[444:56]

the order of n steps. Uh here is one

[444:59]

that takes on the order of log n steps.

[445:01]

Here are some others we haven't seen

[445:02]

yet. Some algorithms take n times log n

[445:05]

steps. Some algorithms take n squared

[445:07]

steps and some algorithms just take one

[445:09]

step maybe or maybe two steps or four

[445:12]

steps or 10 but a constant number of

[445:14]

steps. So let me ask of the algorithms

[445:16]

we've looked at thus far for instance

[445:19]

linear search being the very first today

[445:22]

what is the running time of linear

[445:24]

search in big O notation that is to say

[445:28]

if there's n people uh if there's n

[445:30]

lockers on the stage how many steps

[445:33]

might it take us to find a number among

[445:36]

those n lockers big O of yeah

[445:41]

>> big O of N in fact is exactly where I

[445:44]

would put linear search. Why? Well, if

[445:46]

you're using linear search in the very

[445:48]

worst case, for instance, the number

[445:50]

you're looking for, as with Jose, might

[445:51]

be all the way at the end. So, you might

[445:53]

get lucky. It might not be at the very

[445:55]

end, but generally, it's useful to use

[445:57]

this big O notation in the context of

[446:00]

worst case scenarios because that really

[446:02]

gives you a sense of how badly this

[446:04]

algorithm could perform if you just get

[446:05]

really unlucky with your data set. So

[446:08]

even though big O really just refers to

[446:10]

an upper bound like how many steps might

[446:12]

it take it's generally useful to think

[446:14]

about it in the context of like the

[446:16]

worst case scenario like ah the number I

[446:17]

care about is actually way over here but

[446:19]

what about binary search even in the

[446:22]

worst case so long as the data is sorted

[446:25]

how many steps might binary search take

[446:27]

by contrast

[446:29]

>> big O of log N so binary search we're

[446:31]

going to put here which is to say that

[446:33]

in general and especially as n gets

[446:35]

large binary search is much faster it

[446:37]

takes much less time. Why? Because

[446:39]

assuming the numbers are sorted, you

[446:41]

will be dividing in half and half and

[446:42]

half just like with the phone book in

[446:44]

week zero that problem and you will get

[446:46]

to your solution much faster. Why should

[446:49]

you not use binary search though on an

[446:52]

unsorted array of lockers

[446:56]

like a random set of numbers? Yeah,

[446:58]

>> you could just get rid of the value

[447:00]

because you don't know like what the

[447:03]

inequality is going to be.

[447:04]

>> Exactly. You're making these decisions

[447:06]

based on inequalities, less than or

[447:08]

greater than, but based on like no rhyme

[447:10]

or reason. You're going left, going

[447:11]

right, but there's no reason to believe

[447:13]

that smaller numbers are this way and

[447:14]

bigger numbers are that way. So, you're

[447:16]

just making incorrect decision after

[447:17]

incorrect decision. So, you're probably

[447:19]

going to miss the number altogether. So,

[447:21]

binary search on an unsorted array is

[447:22]

just incorrect. Incorrect usage of the

[447:24]

algorithm. But, like Kelly did, if you

[447:26]

sort the data in advance or you're

[447:27]

handed sorted data, well, then you can

[447:29]

in fact apply binary search perfectly

[447:31]

and much more efficiently.
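
One way the binary search pseudocode from earlier could look in C is sketched below. It assumes the array is already sorted from smallest to largest, and it tracks the remaining doors with two hypothetical indices, low and high. The names binary_search, doors, and target are likewise illustrative only.

```c
#include <stdbool.h>

// Searches the sorted range doors[low] through doors[high] for target.
bool binary_search(const int doors[], int low, int high, int target)
{
    // No doors left to search.
    if (low > high)
    {
        return false;
    }

    // Middle index, rounded down by integer division.
    int middle = low + (high - low) / 2;

    if (doors[middle] == target)
    {
        return true;
    }
    else if (target < doors[middle])
    {
        // Search the left half, skipping the middle door already checked.
        return binary_search(doors, low, middle - 1, target);
    }
    else
    {
        // Search the right half, again skipping the middle door.
        return binary_search(doors, middle + 1, high, target);
    }
}
```

For seven sorted lockers, the first call would be binary_search(doors, 0, 6, 50), which checks index 3 first. On unsorted data the same code can miss a value that is present, which is why sorting is a precondition.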

[447:33]

>> I have a question. Is there ever a case

[447:35]

where linear search is more efficient

[447:37]

just because the process of sorting the

[447:39]

data yourself?

[447:40]

>> Absolutely. Is linear search sometimes

[447:42]

more efficient if it's going to take you

[447:44]

more time to sort the data and then use

[447:47]

binary search? Absolutely. And that's

[447:49]

going to be one of the design decisions

[447:50]

that underlies any implementation of an

[447:52]

algorithm because if it's going to take

[447:53]

you some crazy long time not to sort

[447:55]

like seven numbers but 70, 700, 7,000, 7

[447:58]

million but you only need to search the

[448:00]

data once then what the heck are you

[448:02]

doing? Like why are you wasting time

[448:03]

sorting the data if you only care about

[448:05]

getting an answer once? You might as

[448:06]

well just use linear search or heck do

[448:09]

it even randomly and hope you get lucky

[448:10]

if you don't care about reproducing the

[448:12]

same result. Now in general that's not

[448:14]

how much of the world works. For

[448:16]

instance, Google's working really hard

[448:18]

to make faster and faster algorithms

[448:19]

because we are not searching Google once

[448:23]

and then never again doing it. We're

[448:25]

doing it again and again and again. So

[448:27]

they can amortize, so to speak, the cost

[448:29]

of sorting data over lots and lots of

[448:32]

searches. But sometimes it's going to be

[448:33]

the opposite. And I think back to

[448:34]

graduate school where I was often

[448:35]

writing code to analyze large sets of

[448:37]

data. And I could have done it the right

[448:39]

way, sort of the CS50 way by fine-tuning

[448:42]

my algorithm and thinking really hard

[448:43]

about my code. But honestly, sometimes

[448:45]

it was easier to just write really bad

[448:47]

but correct code, go to sleep for seven

[448:49]

hours, and then my computer would have

[448:51]

the answer by morning. The downside, as

[448:53]

admittedly happened more than once, is

[448:54]

if you have a bug in your code and you

[448:56]

go to sleep and then seven hours later

[448:57]

you find out that there was a bug,

[448:58]

you've just wasted the entire evening.

[449:00]

So there too, a trade-off sometimes when

[449:02]

making those resource decisions. But

[449:04]

that's entirely what today is about,

[449:05]

making informed decisions. And sometimes

[449:07]

maybe it's smarter and wiser to make the

[449:10]

more expensive decision, but not

[449:11]

unknowingly, at least knowingly. All

[449:13]

right, so there might we have our first

[449:16]

two algorithms, but let's consider

[449:18]

another way of describing the efficiency

[449:20]

of an algorithm. Big O is an upper

[449:23]

bound. Sort of how bad can it get in

[449:25]

these uh cases where maybe the data is

[449:27]

really uh not working to our advantage.

[449:30]

Omega, a capital omega symbol here is

[449:32]

used for lower bounds. So maybe how

[449:34]

lucky might we get in the best case, if

[449:36]

you will. How few steps might an

[449:38]

algorithm take? Well, in this case here,

[449:40]

here's just a cheat sheet of common

[449:42]

runtimes, even though there's an

[449:43]

infinite number of others, but we'll

[449:44]

generally focus on, uh, functions

[449:48]

like these. Let's consider those same

[449:50]

algorithms. So with linear search from

[449:53]

left to right, how few steps might that

[449:56]

algorithm take?

[449:58]

For instance, in like the best case

[450:00]

scenario?

[450:02]

Yeah. Is this hand about to go up?

[450:04]

>> Yeah. So one step. Why? Because maybe

[450:06]

Jose could have gotten lucky and opened

[450:07]

this door and voila, that was the 50. It

[450:10]

didn't play out that way, but it could

[450:12]

have. In the general case, the number

[450:14]

you're looking for could very well be at

[450:15]

the beginning. So we're going to put

[450:16]

linear search at omega of one. So one

[450:20]

step and maybe it's technically a few

[450:22]

more than that, but it's a fixed number

[450:24]

of steps that has nothing to do with the

[450:26]

number of lockers. Case in point, if I

[450:27]

gave you not seven but 70 lockers, he

[450:29]

could still get lucky and still take

[450:31]

just one step. So omega is our lower

[450:33]

bound. Big O is our upper bound. Ah,

[450:36]

spoiler. What is binary search's lower

[450:40]

bound? Well, apparently it's also omega

[450:43]

of one. But why? That is in fact

[450:46]

correct. Yeah,

[450:47]

>> you could just get lucky again.

[450:49]

>> Same reason you could get lucky in the

[450:50]

best case and it's just smack dab in the

[450:52]

middle of all of the data. So the fewest

[450:54]

number of steps binary search might take

[450:55]

is also actually one. So this is why we

[450:58]

talk about upper bound and lower bound

[450:59]

because you get kind of a sense of

[451:01]

the range of performance. Sometimes it's

[451:03]

going to be super fast which is great

[451:05]

but something tells me in the general

[451:07]

case we're not going to get lucky every

[451:08]

time we use an algorithm. So it's

[451:10]

probably going to be closer to those

[451:11]

upper bounds the big O. Now, as an

[451:13]

aside, there's a third and final uh

[451:15]

symbol that we use in computer science

[451:17]

to describe algorithms. That of a

[451:18]

capital theta. Capital theta is jargon

[451:21]

you can use when big O and omega happen

[451:25]

to be the same. And we'll see that

[451:27]

today. Not always, but here's a similar

[451:30]

cheat sheet. None of the algorithms thus

[451:32]

far can be described in this way with

[451:34]

theta notation because they are not all

[451:36]

the same with their big O and omega.

[451:38]

They differed in both of our analyses.

[451:40]

But we'll see at least one example of

[451:42]

one where it's like okay we can describe

[451:43]

this in theta and that's like saying

[451:45]

twice as much information with your

[451:46]

words to another computer scientist

[451:48]

rather than giving them both the upper

[451:50]

and the lower bounds. The fancy way of

[451:52]

describing all of what we're talking

[451:53]

about here big O omega and theta is

[451:56]

asymptotic notation. And asymptotic notation

[451:59]

or asymptotically, uh, refers to a

[452:03]

value getting bigger and bigger and

[452:05]

bigger and bigger but not necessarily

[452:06]

ever hitting some boundary as n gets

[452:09]

very large in short is what we mean when

[452:12]

we deploy this here asymptotic notation.
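
To see the upper and lower bounds in action, here is a sketch that counts how many doors linear search opens before it stops. The function name linear_steps and its parameters are hypothetical. In the best case it opens one door, which is the omega of 1 lower bound, and in the worst case it opens all n, which is the big O of n upper bound.

```c
// Returns how many elements linear search examines before stopping.
int linear_steps(const int numbers[], int n, int target)
{
    int steps = 0;
    for (int i = 0; i < n; i++)
    {
        steps++;   // one door opened
        if (numbers[i] == target)
        {
            return steps;
        }
    }
    // Target absent: every one of the n doors was opened.
    return steps;
}
```

With the lockers in the order 20, 500, 10, 5, 100, 1, 50, searching for 20 takes 1 step while searching for 50 takes all 7.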

[452:15]

All right. So, with the first of these

[452:18]

things like linear search, let's

[452:20]

actually kind of make this a bit more

[452:22]

real. Let me actually go over to in just

[452:24]

a moment uh my other screen here. Okay,

[452:27]

in VS Code, let me go ahead and create a

[452:29]

program called search.c. And in

[452:31]

search.c, let's go ahead and implement a fairly

[452:33]

simple version of linear search

[452:35]

initially. So, let me go ahead and

[452:36]

include, for instance, cs50.h. Let me go

[452:39]

ahead and include stdio.h. Then,

[452:42]

let me go ahead and do int main(void). So,

[452:44]

we're not going to bother with any

[452:45]

command line arguments for now. And then

[452:47]

let me go ahead and just give myself an

[452:49]

array of numbers to play with. And we

[452:51]

did this briefly last week in answer to

[452:53]

a question, but I'm going to do it now

[452:55]

concretely rather than use something uh

[452:58]

more manual to get all of these

[452:59]

numbers into the array. I'm going to say

[453:01]

give me an array called numbers. And the

[453:04]

numbers I want to put in this array

[453:06]

initially are going to be the exact same

[453:07]

denominations we've been playing with.

[453:09]

20, 500, 10, 5, 100, 1, and 50. Again, this is

[453:14]

notation that I alluded to in answer to

[453:16]

a question last week whereby if you want

[453:18]

to statically initialize an array, that

[453:20]

is give it all of your values up front

[453:22]

without having the human type them all

[453:24]

in manually, you can use curly braces

[453:25]

like this. And the compiler is pretty

[453:27]

smart. You don't have to bother telling

[453:29]

the compiler how many numbers you want,

[453:31]

1 2 3 4 5 6 7 because it can obviously

[453:34]

just count how many numbers are in the

[453:36]

curly braces, but you could explicitly

[453:38]

say seven there so long as your counting

[453:40]

is in fact correct. So on line six, this

[453:42]

gives me an array of seven numbers

[453:44]

initialized to precisely that list of

[453:47]

numbers from left to right. All right,

[453:48]

let's ask the human now what number they

[453:50]

want to search for just as I did our two

[453:52]

volunteers and say int n equals get_int.

[453:55]

Then let's just ask the user for the

[453:57]

number that they want to search for.

[453:59]

Then let's implement linear search. And

[454:00]

if I want to implement linear search in

[454:02]

terms of the programming constructs

[454:04]

we've seen thus far like what type what

[454:07]

uh keyword in C should I use? What

[454:10]

programming technique? Yeah. Yeah. So,

[454:13]

maybe a for loop or a while loop, but

[454:14]

for loop is kind of uh my go-to lately.

[454:17]

So, let's do for int i equals zero

[454:19]

because we'll start counting from the

[454:21]

left. I is less than seven, which isn't

[454:23]

great to hardcode, but I'm not going to

[454:25]

use the seven again. So, I think it's

[454:26]

okay in one place for this demo. Then

[454:28]

i++. Then inside of this loop, let's go

[454:32]

ahead and ask a question just like Jose

[454:34]

was by opening each of the doors by

[454:36]

saying if numbers bracket I equals

[454:40]

equals the number we asked about n well

[454:43]

then let's go ahead and print out some

[454:46]

informative message like found, backslash

[454:48]

n, and then for good measure, like

[454:50]

last week let's return zero to signify

[454:53]

success it's sort of equivalent to

[454:55]

returning true but in main recall you

[454:58]

have to return an int. That's why we

[455:00]

revealed at the end of week two the

[455:02]

return type of main is an int because

[455:04]

that is what gives the computer its

[455:05]

so-called exit status which is zero if

[455:08]

all is well or anything other than zero

[455:10]

if something went wrong but I think

[455:11]

finding the number counts as all is well

[455:14]

but if we get through that whole loop

[455:16]

and we still haven't printed found or

[455:19]

return zero I think we can go ahead and

[455:21]

safely say not found, backslash n, and

[455:24]

then let's just return one as our exit

[455:26]

status to indicate that we didn't find

[455:28]

the actual number. So in short I think

[455:30]

in C, this is linear search. Let me

[455:33]

open up my terminal window again. Let me

[455:35]

make search, Enter. Let me do ./search,

[455:40]

enter. And I'll search for as I asked

[455:42]

Jose the number 50. And we indeed found

[455:45]

it at the end. Let me go ahead and rerun

[455:47]

dot slash search. And let's search for

[455:48]

the other number at the beginning 20.

[455:51]

That then works. And just to get crazy,

[455:53]

let's search for a number we know not to

[455:55]

be there like a thousand. And that in

[455:57]

fact is not found. So I think we have an

[455:59]

implementation then of linear search.
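
The program dictated above might look roughly like the sketch below. In the lecture the logic lives in main and the number comes from get_int in cs50.h. Here, as an assumption made only so the snippet can be called without the CS50 library, the same logic is wrapped in a hypothetical function run_search that takes the number as a parameter and returns the exit status.

```c
#include <stdio.h>

// Prints "Found" and returns 0 if n is in the array; otherwise prints
// "Not found" and returns 1, mirroring the exit statuses described above.
int run_search(int n)
{
    int numbers[] = {20, 500, 10, 5, 100, 1, 50};

    // 7 is hardcoded here, as in the lecture demo.
    for (int i = 0; i < 7; i++)
    {
        if (numbers[i] == n)
        {
            printf("Found\n");
            return 0;
        }
    }
    printf("Not found\n");
    return 1;
}
```

Searching for 50 or 20 would print Found and return 0, while searching for 1000 would print Not found and return 1.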

[456:02]

But let me pause here and ask if there's

[456:04]

any questions with this here code and

[456:07]

the translation of algorithm to

[456:11]

C. Yeah, in the back

[456:14]

why I did not specify the length of the

[456:16]

array. So it is not necessary when

[456:19]

declaring an array and setting it equal

[456:22]

to some known values in advance to

[456:25]

specify in the square brackets how many

[456:26]

you have because like the compiler is

[456:28]

not an idiot. It can literally count the

[456:29]

numbers inside of the curly braces and

[456:31]

just infer that value. You could put it

[456:33]

there, but arguably you're opening up

[456:36]

the possibility that you're going to

[456:37]

miscount and you're going to put seven

[456:39]

here but eight numbers over there or six

[456:41]

numbers there. So it's best not to tempt

[456:42]

fate and just let the compiler do its

[456:44]

thing instead. A good question. Other

[456:47]

questions on this code so far?
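
As a small illustration of the answer above, the sketch below leaves the brackets empty and lets the compiler count the initializers. The `sizeof` division is a common C idiom that has not been introduced at this point in the lecture, and the helper name `count_numbers` and the array values are assumptions for illustration.

```c
#include <stddef.h>

// No length in the brackets: the compiler infers 7 from the initializers.
static const int numbers[] = {20, 500, 10, 5, 100, 1, 50};

// sizeof reports sizes in bytes, so dividing the whole array's size by
// one element's size gives the number of elements.
size_t count_numbers(void)
{
    return sizeof(numbers) / sizeof(numbers[0]);
}
```

Because the count is derived from the initializer list, adding or removing a value cannot leave a hand-written length out of date.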

[456:52]

All right, if none, let's go ahead and

[456:54]

maybe convert this linear search to one

[456:56]

that's maybe a little more interesting

[456:58]

that involves like searching for strings

[457:00]

of text. After all, we started the class

[457:02]

in week zero by searching for names in a

[457:04]

phone book like John Harvard. Let's see

[457:05]

if we can't adapt our code for searching

[457:08]

for strings instead of integers. So, in

[457:11]

my code here, let's go ahead and delete

[457:13]

everything inside of main just to give

[457:15]

myself a clean canvas. Let me go ahead

[457:17]

and give me another array. This one

[457:18]

called, let's just call it strings, cuz

[457:21]

that's the goal of this exercise. And

[457:23]

set them equal to some familiar pieces

[457:25]

from the game of Monopoly if you might

[457:27]

have played. So, there's like a

[457:29]

battleship piece in there, there's a

[457:31]

boot in there, there's a cannon in

[457:33]

there, an iron, a thimble, and a top

[457:37]

hat. Though, it does vary nowadays based

[457:38]

on the edition that you have. So kind

[457:40]

of a long array, but I have 1 2 3 4 5

[457:43]

six total values in this array of

[457:46]

strings. Now let's ask the user for a

[457:49]

string. We'll call it s for short. And

[457:51]

say with get string, what string are you

[457:54]

looking for among those six? Then I

[457:56]

think we can do a for loop again: for

[457:58]

int i = 0; i < 6; i++. And then

[458:04]

inside of this loop, let's do the same

[458:06]

thing. If uh let's say

[458:10]

uh strings

[458:13]

bracket i equals equals the string s

[458:17]

that the human typed in. I think we can

[458:19]

go ahead and say printf found, backslash n,

[458:22]

and then as before return zero to

[458:25]

signify success. And if we don't after

[458:27]

that whole for loop, let's printf

[458:30]

not found, backslash n, down here and

[458:32]

return one to signify error. So, it's

[458:34]

really the same thing at the moment,

[458:37]

except that I'm actually using strings

[458:40]

instead of integers. All right, let me

[458:41]

go ahead and open up my terminal window

[458:43]

again and clear it. Let me go ahead and

[458:44]

recompile this code. Make search.c seems

[458:47]

to compile. Okay, let me do ./search

[458:50]

and let's go ahead and search for the

[458:53]

first one. How about battleship enter?

[458:57]

Huh, not found. All right. Well, let's

[459:00]

maybe typo. Maybe let me search for

[459:02]

something easier to spell. boot not

[459:05]

found. That's weird. Both of those are

[459:06]

at the very start of the array. Let's do

[459:08]

./search again and search for top hat.

[459:10]

Enter. Not found. What is going on?

[459:15]

Well, this isn't actually that obvious

[459:17]

as to what I'm doing wrong. But it turns

[459:19]

out that when we actually compare

[459:21]

strings instead of integers in C, we're

[459:24]

actually going to have to use this other

[459:25]

library, at least today, that we saw

[459:27]

briefly last week. Last week we

[459:29]

introduced it because of a function

[459:30]

called strlen, which gives us the

[459:32]

length of a string. Turns out that

[459:33]

string.h also comes per its

[459:35]

documentation with another useful

[459:37]

function called strcmp, for string

[459:39]

compare and its purpose in life is to

[459:42]

actually compare two strings left and

[459:45]

right to make sure they are in fact the

[459:47]

same. So for today's purposes suffice it

[459:49]

to say you cannot use equals equals

[459:51]

apparently to compare two strings

[459:54]

intuitively. Why is that? Well, for a

[459:56]

computer, it's super easy to compare two

[459:57]

integers because they're either there or

[459:59]

they're not in memory. But with a

[460:00]

string, it's not just a character and

[460:02]

another character. It's like a few

[460:05]

characters over here and a few

[460:06]

characters over here. Maybe it's a few,

[460:08]

maybe it's more. You have to compare

[460:10]

each and every character in a string to

[460:13]

make sure they're in fact the same. So,

[460:15]

strcmp does exactly that. Probably

[460:18]

in the implementation of strcmp, from

[460:19]

like years ago someone wrote a while

[460:21]

loop or a for loop that looks at each

[460:23]

string left to right and compares each

[460:25]

and every one of the characters therein

[460:27]

and then gives us back an answer. So how

[460:29]

do we go about using this? Well to use

[460:32]

strcmp, what I can actually do in

[460:34]

VS code here is go and change my code as

[460:38]

follows. Instead of using equals equals

[460:40]

I'm going to actually use this function

[460:42]

per its documentation. I'm going to call

[460:44]

strcmp. Then I'm going to pass in

[460:46]

one of the strings which is in strings

[460:48]

bracket i. Then I'm going to pass in the

[460:50]

second string, which is s. However,

[460:52]

having read the documentation and this

[460:54]

is a little non-obvious. It turns out

[460:56]

that strcmp will return zero if the

[460:59]

strings are equal. Otherwise, it's going

[461:02]

to return a positive number or a

[461:04]

negative number. So what I care about

[461:06]

for now is does the return value of strcmp,

[461:09]

when given those two strings, give

[461:12]

me back zero. If so, they are equal and

[461:15]

I'm going to say quote unquote found.
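
A minimal sketch of the change just described: the comparison inside the loop calls strcmp and checks for a return value of 0. It uses `const char *` in place of the course's `string` type so that it compiles without cs50.h, and the function name `find_string` and the choice to return an index are assumptions for illustration.

```c
#include <string.h>

// Returns the index of the first element equal to target,
// or -1 if no element in strings[0..count - 1] matches.
int find_string(const char *strings[], int count, const char *target)
{
    for (int i = 0; i < count; i++)
    {
        // strcmp returns 0 when both strings hold the same characters.
        if (strcmp(strings[i], target) == 0)
        {
            return i;
        }
    }
    return -1;
}
```

With the six Monopoly pieces in the order given in the lecture, "battleship" is found at index 0 and "top hat" at index 5.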

[461:17]

So, let's go ahead and open the terminal

[461:18]

again. Let me go ahead and clear it and

[461:20]

do make search to recompile my code. And

[461:23]

huh, I've done something wrong. Let's

[461:26]

see. Let me scroll up to the very first

[461:28]

line. In line 11, error call to

[461:31]

undeclared library function strcmp

[461:33]

with type int and something something

[461:35]

which gets more complicated after that.

[461:37]

Why is line 11 not working despite what

[461:40]

I just preached? Yeah.

[461:44]

>> Yeah. I just did something stupid. I

[461:45]

didn't include the string.h header

[461:47]

library. So all clang, our compiler, is

[461:50]

doing when invoked by make is it's

[461:52]

encountering literally the word strcmp

[461:54]

and not knowing what it is because

[461:56]

we haven't taught it what it is by

[461:58]

simply saying include string.h at the

[462:01]

top. Okay, let me reopen my terminal

[462:02]

window. Clear that message away. Do make

[462:05]

search again. Now it's compiling. ./search,

[462:07]

Enter. Now I'm going to go ahead

[462:09]

and search as I did before for

[462:10]

battleship. Ah, now it's finding it. Let

[462:13]

me run dot slash search again. Search

[462:15]

for boot. Ah, okay, that's found. Let me

[462:17]

go ahead and search for top hat. That

[462:19]

too is in there. Let me go ahead and

[462:21]

search for something that's not there,

[462:22]

like the number 50. Not in fact found.

[462:25]

So I think we've actually fixed that

[462:26]

there problem. But if we go back to this

[462:28]

code for a moment, it's indeed the case

[462:30]

per the documentation that equals equals

[462:32]

0 is what I want to do. Why in the world

[462:34]

would strcmp be designed to return a

[462:37]

positive or a negative number too? It's

[462:40]

not returning true or false. It's

[462:41]

returning one of three possible values.

[462:43]

Zero, negative, or positive.
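
To make the three kinds of return value concrete, here is a sketch of how a comparison function in the spirit of strcmp could be written with a simple loop. `compare_strings` is a hypothetical stand-in, not the library's actual implementation, and it always returns -1, 0, or 1, whereas strcmp itself only promises a negative, zero, or positive result.

```c
// Hypothetical illustration of a character-by-character comparison.
// Returns 0 if a and b are identical, a negative value if a comes first
// by character code, and a positive value if b comes first.
int compare_strings(const char *a, const char *b)
{
    int i = 0;

    // Walk both strings while the characters match and a has not ended.
    while (a[i] != '\0' && a[i] == b[i])
    {
        i++;
    }

    // Compare the first pair of characters that differ (or the
    // terminating '\0' if one string is a prefix of the other).
    unsigned char ca = (unsigned char) a[i];
    unsigned char cb = (unsigned char) b[i];
    return (ca > cb) - (ca < cb);
}
```

The sign of the result says which string comes first when characters are ordered by their numeric codes.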

[462:47]

Why might it be useful? Yeah.

[462:50]

>> Um you could kind of like compare which

[462:52]

of the strings is like greater.

[462:53]

>> Yeah, super clever. So, if you're

[462:55]

passing in two strings, it's great to

[462:57]

know if they're equal. But wouldn't it

[462:58]

be nice if this same function could also

[463:00]

help us sort these strings ultimately

[463:02]

and tell me which one comes first

[463:04]

alphabetically. And technically, it's

[463:05]

not going to be alphabetically. It's

[463:07]

going to be, a cute phrase, ASCII-betically,

[463:09]

because it's actually going to look at

[463:10]

the ASCII values of the characters and do

[463:13]

some quick arithmetic and tell you which

[463:14]

one comes first and which one comes

[463:16]

later, which is enough as we'll

[463:18]

eventually see for actually sorting

[463:20]

these strings as well. So in short, the

[463:21]

documentation will tell me that I should

[463:23]

check not only for zero if I care about

[463:26]

equality, but if I care about

[463:27]

inequality, that is checking if one

[463:28]

comes first or last, I should check

[463:30]

whether something is less than zero or

[463:32]

greater than. But for this demonstration

[463:34]

implementing linear search, I don't care

[463:36]

about comparing them uh for inequality.

[463:39]

All I care about is that they are in

[463:41]

fact the same or not in this case. All

[463:45]

right. All right. Well, let's go ahead

[463:46]

and do one other example of sort of

[463:48]

linear search, but let's make the

[463:49]

problem more like that actually in week

[463:51]

zero of searching a phone book. So, let

[463:53]

me go back to VS Code here. Close

[463:56]

search.c and let's make an actual phone

[463:58]

book. So, I'm going to say code

[463:59]

phonebook.c. And then inside of

[464:01]

phonebook.c, let's use our same header

[464:03]

files. So, include cs50.h, include

[464:06]

stdio.h,

[464:08]

and let's include in advance string.h.

[464:12]

Then let's, as before, do int main

[464:15]

void. No command line arguments today.

[464:17]

Then inside of here, let me give myself

[464:19]

first an array of strings. How about

[464:21]

some names in the phone book? So I'm

[464:22]

going to say string names equals and

[464:25]

then three names just to make uh a

[464:27]

demonstration. Kelly and David and say

[464:30]

John Harvard here. But if it's a phone

[464:32]

book, I need more than just names. So

[464:34]

let me go ahead and give myself another

[464:36]

array. string numbers, open bracket,

[464:39]

close bracket equals. And now the same

[464:41]

phone numbers we used in week zero for

[464:43]

the three of us. Uh, +1 617 495 1000. Uh,

[464:48]

same for both Kelly and me. So +1

[464:50]

617 495

[464:53]

1000. And then as before, if you'd like

[464:55]

to text or call John directly, you can

[464:57]

do so at +1 949 468 2750

[465:02]

and semicolon. So one question first. I

[465:07]

obviously declared our names to be an

[465:10]

array of strings because that's what

[465:12]

text is. Why have I also declared phone

[465:15]

numbers to be strings and not integers?

[465:18]

Because a phone number is like literally

[465:20]

a number in the name of it. Yeah.

[465:24]

>> Yeah. So even though we have phone

[465:26]

numbers in the US, even though we have

[465:28]

social security numbers and a bunch of

[465:30]

other things that we call numbers, if

[465:32]

you have other non-digits in those uh in

[465:35]

those values, you have to actually use

[465:38]

strings because if it's not an actual

[465:39]

integer, but it does have things like

[465:40]

pluses or dashes or parentheses or any

[465:42]

other form of punctuation as is common

[465:44]

in the US and other countries for phone

[465:46]

numbers in particular, you're going to

[465:48]

actually want to use strings and not

[465:50]

numbers. as well as for corner cases

[465:52]

like if you're in the habit

[465:53]

back home if you're not from uh say the

[465:55]

US and you actually have to dial zero

[465:57]

first to make like a local regional call

[465:59]

you don't want to have a leading zero in

[466:02]

an integer because mathematically, as we

[466:05]

know from grade school like leading

[466:06]

zeros, zeros that come first, have

[466:08]

no mathematical meaning they're going to

[466:10]

disappear effectively from the

[466:12]

computer's memory unless we store them

[466:14]

in fact as characters in strings in this

[466:16]

way okay with that said let's go ahead

[466:18]

and ask the human now after having

[466:20]

declared those two arrays for the name

[466:23]

they want to look up the number of. So

[466:24]

let's say string name equals get string

[466:27]

and let's go ahead and ask the human uh

[466:29]

for the name for which to search. Then

[466:31]

let's use a for loop as before for int i

[466:34]

equals zero, i less than 3, which again for

[466:37]

demonstration purposes I'm just hard

[466:38]

coding today i ++ and then in the for

[466:40]

loop I'm going to use our new friend

[466:42]

strcmp. If the return value of

[466:45]

strcmp, passing in names bracket i and

[466:49]

the name the human typed in equals

[466:51]

equals zero signifying that they are in

[466:53]

fact the same. Well that means we found

[466:56]

the location i where the person's name

[466:58]

is. So let's go ahead and print out

[467:01]

found. But just to be fun let's print

[467:02]

out whom we found. So percent s back

[467:05]

slashn and then output there the number

[467:09]

which is going to be in the

[467:10]

corresponding numbers array at that same

[467:13]

location I will return zero and at the

[467:15]

very end of this program let's go ahead

[467:17]

and print out not found if we get that

[467:19]

far and return one. All right. So, a

[467:23]

little more complexity this time, but

[467:25]

notice I'm comparing the names just like

[467:27]

a normal person would in your iOS app or

[467:30]

your Android app when looking for

[467:31]

someone's name. But what I care about is

[467:33]

getting back the number. So, that's why

[467:34]

two lines later, I'm printing out the

[467:37]

number that I found at location I, not

[467:40]

the name because I already know the

[467:42]

name. All right. In my terminal window,

[467:44]

let's go ahead and make this phone book

[467:47]

dot /phonebook. Let's go ahead and

[467:48]

search for John, whose number is

[467:50]

hopefully indeed exactly that number.

[467:53]

So, suffice it to say, this code too

[467:55]

does work. This is a linear search

[467:57]

because I'm searching left to right.
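
A minimal sketch of the two-array phone book described above. It uses `const char *` in place of the course's `string` type so that it compiles without cs50.h; the function name `lookup`, the dashes inside the phone numbers, and the choice to return the number rather than print it are assumptions for illustration.

```c
#include <stddef.h>
#include <string.h>

// Two separate arrays: names[i] and numbers[i] are assumed to describe
// the same person, which nothing in the code itself enforces.
static const char *names[] = {"Kelly", "David", "John"};
static const char *numbers[] = {"+1-617-495-1000", "+1-617-495-1000", "+1-949-468-2750"};

// Linear search by name; returns the matching number, or NULL if absent.
const char *lookup(const char *name)
{
    for (int i = 0; i < 3; i++)
    {
        if (strcmp(names[i], name) == 0)
        {
            return numbers[i];
        }
    }
    return NULL;
}
```

The lookup is only correct for as long as both arrays keep the same length and the same order.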

[467:58]

These aren't actually sorted

[468:00]

alphabetically by name or let alone

[468:02]

number. So, I think we're doing well

[468:04]

here, but I don't necessarily love this

[468:09]

implementation. Even if you're new to

[468:11]

programming, what might you not like

[468:14]

about how I've implemented a phone book

[468:17]

in the computer's memory?

[468:21]

Why is this maybe not the best design?

[468:23]

Yeah.

[468:23]

>> Like there's a correspondence between

[468:25]

names and numbers. So like having two

[468:27]

different

[468:28]

>> Okay. Yeah. And I would say so uh you're

[468:30]

pointing out that we have this duality.

[468:32]

We've got two arrays. They're the exact

[468:33]

same length. And it just so happens that

[468:35]

location zero's name lines up with

[468:37]

location zero's number and location one

[468:40]

and location two. But we're kind of on

[468:41]

the honor system here whereby the onus

[468:43]

is on us to make sure we don't screw

[468:45]

this up and we make sure we always have

[468:46]

the same number of names and the same

[468:48]

number of numbers and better and

[468:50]

moreover that we get the order

[468:53]

exactly right. We are just trusting that

[468:55]

when we print out the i-th number, so to

[468:57]

speak, that it lines up with the i-th name.

[469:00]

So that's fine and honestly for three

[469:01]

people who really cares it's fine. But

[469:03]

if you think about 30 people, 300, 3

[469:05]

million, well, we're not going to

[469:06]

hardcode them all here, but even in some

[469:08]

database that we'll store them in later

[469:10]

in the course feels like just trusting

[469:13]

that we're not going to screw this up is

[469:14]

asking for trouble. And indeed, a lot of

[469:16]

programming is just that, like not

[469:18]

trusting yourself and definitely not

[469:19]

trusting your colleague not to mess

[469:21]

something up, but programming a bit more

[469:23]

defensively and trying to encapsulate

[469:25]

related information a little more

[469:27]

tightly together and not just assume as

[469:29]

on the honor system that these two

[469:31]

independent arrays will line up. But at

[469:34]

this point, we have no means of solving

[469:36]

this problem unless we give ourselves

[469:39]

just a bit of new functionality and syntax.

[469:42]

So I used this phrase earlier to kick

[469:43]

things off. data structures. It's like

[469:45]

how you structure your data in the

[469:47]

computer's memory. Arrays are the

[469:48]

simplest of data structures. They just

[469:50]

store data back to back to back from

[469:52]

left to right contiguously in memory.

[469:54]

But they all have to be, as we've seen,

[469:55]

the same kinds of values. Int int or

[469:58]

string string string. There's no

[469:59]

mechanism yet for storing an int and a

[470:02]

string together and then another int and

[470:04]

another string together or let alone two

[470:05]

strings, two strings, two strings that

[470:08]

are somehow a little bit different. But

[470:10]

it would be nice if C gave us an actual

[470:13]

data type to store people in a phone

[470:15]

book such that we could create an array

[470:17]

called people inside of which is going

[470:19]

to be a whole bunch of persons if you

[470:20]

will back to back to back and I want two

[470:23]

of them. So wouldn't it be nice if I

[470:24]

could literally use this code in C. Well

[470:26]

decades ago when C was invented they

[470:28]

didn't give us a person data type. All

[470:30]

we have is int and float and char and

[470:32]

bool and string and so forth. Person was

[470:36]

not among the available data types. But

[470:38]

we can invent our own data types it

[470:40]

turns out. So in C what we can do if we

[470:43]

want persons to exist and every person

[470:46]

in the world shall have a name and a

[470:48]

phone number for now we can do this

[470:50]

string name string number. Now that's a

[470:53]

decent start but it's going to be kind

[470:55]

of a stupid implementation if I then

[470:56]

just do name uh string name one string

[470:59]

name two string name three string name

[471:01]

four. We've already started down that

[471:02]

road last week and decided arrays were a

[471:05]

better solution. But here's an

[471:07]

alternative when you want to just store

[471:09]

related data together. I can use these

[471:11]

two keywords in C, typedef struct,

[471:14]

which, albeit terse, just means define a new

[471:16]

type that is a data structure. So

[471:19]

multiple things together inside the

[471:21]

curly braces you literally put the two

[471:23]

things you want to relate together

[471:25]

string name string number and then

[471:26]

outside the curly braces you specify the

[471:28]

name you want to give to this brand new

[471:30]

custom type that you have invented.

[471:33]

Technically, stylistically, you'll see

[471:34]

that style50 prefers that the name

[471:37]

actually be on the same line as the last

[471:39]

curly brace, which looks a little weird

[471:40]

to me, but that's what industry tends to

[471:42]

do, so so be it. But these several lines

[471:45]

together tell C, invent for me a new

[471:48]

data type called person, and assume that

[471:50]

every person in the world has a string

[471:52]

called name and a string called number.
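
Written out, the definition just described might look like the sketch below. It uses `const char *` in place of the course's `string` type so that it compiles without cs50.h; that substitution is an assumption made for this example.

```c
// Define a new type called person that groups two related values.
typedef struct
{
    const char *name;
    const char *number;
} person;

// An array called people with room for three persons,
// each of which holds its own name and number.
person people[3];
```

The type name follows the closing curly brace on the same line, which is the placement the lecture says style50 prefers.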

[471:55]

And now I can use this new data type in

[471:58]

my own code to solve this problem a

[472:01]

little bit better. So, in fact, let me

[472:03]

go ahead and do this as follows. I'm

[472:05]

going to go back to VS Code here. And at

[472:07]

the very top of my code, above main,

[472:10]

just to make this available to not only

[472:11]

main, but maybe any future functions I

[472:13]

write, I'm going to say typedef struct, as

[472:16]

we saw on the screen. Inside of my curly

[472:18]

braces, I'm going to say string name and

[472:21]

string number. And then I'm going to

[472:22]

name this thing person. Now, I'm going

[472:25]

to go about using this and I'm going to

[472:27]

go ahead and delete my previous honor

[472:30]

system approach of having names and

[472:31]

numbers in separate arrays. And I'm

[472:33]

instead going to give myself an array of

[472:36]

people. Uh, we could call it persons,

[472:38]

but I'm trying to be somewhat

[472:39]

grammatically correct. So, I'm going to

[472:41]

say people bracket three to give myself

[472:43]

an array called people inside of which

[472:46]

is room for three persons inside of

[472:48]

which is room for a name and number

[472:51]

each. So, how do I now initialize these

[472:53]

values? So I'm going to hardcode them.

[472:54]

That is type them manually. But you can

[472:56]

imagine using get string or get or some

[472:58]

other function to get this data from the

[473:00]

human themselves. I'm going to say go to

[473:02]

the people array at location zero and

[473:05]

access the name field. And this is

[473:08]

syntax we haven't seen yet, but it's not

[473:09]

that hard. You literally use a dot, a

[473:11]

single period to say go inside of that

[473:13]

structure and access the name field, the

[473:16]

name attribute, so to speak. And let's

[473:18]

set that equal to Kelly. Then let's go

[473:20]

into that same array location people

[473:23]

bracket zero and set the number for the

[473:26]

zeroth person to be +1 617 495 1000.

[473:32]

Then let's go ahead and do the same

[473:33]

thing for people bracket 1. Set that

[473:36]

person's name to for instance mine

[473:38]

David. Then let's do people bracket 1

[473:41]

number equals quote unquote same as

[473:43]

Kelly cuz we're both in the directory.

[473:44]

So +1 617 495

[473:47]

1000. And then lastly, people bracket

[473:50]

2.name

[473:52]

equals quote unquote John for John

[473:54]

Harvard. People bracket 2 number equals

[473:57]

+ one uh 949

[474:01]

468 275

[474:04]

0 in this case. And now the rest of the

[474:06]

code is almost the same. I'm going to

[474:08]

now on the new line 24 still ask the

[474:10]

user what name they want. I'm going to

[474:12]

still iterate from 0 to three because

[474:14]

there's still three elements in this

[474:15]

array even though each has two values

[474:17]

within. And I'm going to compare now not

[474:20]

names but people bracket i.name

[474:24]

to go access the name of that i-th person

[474:27]

and compare it to the name that the

[474:29]

human has typed in. And when I find that

[474:32]

person I'm going to go into the people

[474:33]

array at location i but print out the

[474:36]

number instead. So all we've done here

[474:38]

is add this dot notation which allows

[474:40]

you to access the inside of a data

[474:42]

structure. And all we've done is

[474:44]

introduce up here some new C keywords

[474:46]

that let you invent your own data types

[474:49]

inside of which you can put most

[474:50]

anything you want. I have chosen a

[474:52]

string name and a string number. All

[474:55]

right, let me go ahead and open my

[474:56]

terminal window and clear it from

[474:57]

before. Let me do make phone book to

[475:00]

make this version. So far so good. Make

[475:02]

phone book. Enter. I'm going to go ahead

[475:04]

now and search for say John. And I have

[475:06]

again found his number. So this is still

[475:08]

correct. But even though this took more

[475:11]

minutes in terms of the voice over and

[475:12]

it took more lines of code, it's

[475:15]

arguably better designed now because at

[475:17]

people bracket zero is an actual person

[475:19]

and everything about them. At people

[475:21]

bracket one is another person and

[475:23]

everything about them and so forth. This

[475:25]

is what we mean by encapsulate. You can

[475:27]

think of these curly braces as sort of

[475:29]

hugging these data types inside of the

[475:32]

data structure together so as to keep

[475:34]

them together in the computer's memory

[475:37]

as well.
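
A minimal sketch of the struct-based phone book described above, using dot notation to set and read each field. It uses `const char *` in place of the course's `string` type so that it compiles without cs50.h; the function name `find_number`, the dashes inside the phone numbers, and returning the number instead of printing it are assumptions for illustration.

```c
#include <stddef.h>
#include <string.h>

// Each person keeps a name and a number together.
typedef struct
{
    const char *name;
    const char *number;
} person;

// Linear search by name; returns the matching number, or NULL if absent.
const char *find_number(const char *name)
{
    person people[3];

    people[0].name = "Kelly";
    people[0].number = "+1-617-495-1000";

    people[1].name = "David";
    people[1].number = "+1-617-495-1000";

    people[2].name = "John";
    people[2].number = "+1-949-468-2750";

    for (int i = 0; i < 3; i++)
    {
        // Compare the i-th person's name, but hand back their number.
        if (strcmp(people[i].name, name) == 0)
        {
            return people[i].number;
        }
    }
    return NULL;
}
```

Because each name travels with its own number inside one element, reordering the array can no longer pair a name with someone else's number.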

[475:38]

All right. Well, just to set the stage,

[475:41]

uh, literally as we'll strike the

[475:44]

lockers and put something else up, the

[475:46]

efficiency of binary search as

[475:48]

implemented by Caitlyn was predicated on

[475:50]

Kelly having in advance sorted the

[475:53]

values up front. But of course, we've

[475:55]

only considered now the running time of

[475:56]

searching for information using two

[475:58]

algorithms, and there can be many others

[476:00]

in the real world, but those are two of

[476:01]

the most canonical. We found that binary

[476:03]

search was faster than linear search,

[476:05]

but it required that we sort the data.

[476:07]

So to your question earlier, maybe we

[476:09]

should consider just how expensive it is

[476:11]

in terms of time, money, space, humans

[476:14]

to sort data, especially a lot of data,

[476:16]

and then decide whether or not it's

[476:18]

worth using something like binary search

[476:19]

or perhaps even something else. So the

[476:21]

next problem we'll solve today

[476:22]

ultimately is given a generic input and

[476:24]

output. The input to our next problem is

[476:26]

going to be unsorted data. So like

[476:28]

numbers out of order, the output of

[476:30]

which should be sorted data. So for

[476:31]

instance, if we pass in 7 2 5 4 1 6 0 3,

[476:35]

I want whatever black box is

[476:37]

implementing my sorting algorithm to

[476:39]

spit out 0 1 2 3 4 5 6 7. So that's

[476:43]

going to be the question we answer. But

[476:45]

first, I think it's time for some

[476:46]

delightful hello pandas, chocolate

[476:47]

biscuits. Uh let's take a 10-minute

[476:49]

break and snacks are now served.

[476:52]

All right, we are back. And recall that

[476:54]

the cliffhanger on which we left was

[476:55]

that how do we go about sorting numbers?

[476:57]

Well, here are some numbers, eight of

[476:59]

them in fact, from 0 to seven. but

[477:01]

currently unsorted. Um, we don't quite

[477:03]

have enough Monopoly boards for

[477:04]

everyone, but we do have some delightful

[477:06]

uh Super Mario Brothers Pez dispensers.

[477:09]

If I could get eight volunteers for this

[477:10]

final demo up here. Oh, and not a lot of

[477:13]

hands. Okay. All right. One, two, three,

[477:16]

four, five, six, and let's go farther

[477:19]

back. Seven, and eight. How about All

[477:21]

right. Come on up. Hopefully I counted

[477:24]

properly. Come on over.

[477:28]

Upon arrival at the stage, go ahead and

[477:29]

grab your favorite illuminated number

[477:32]

and stand in that same order at the

[477:34]

front of the stage if you all could.

[477:38]

Welcome to the stage. All right, grab

[477:40]

your favorite number. Stand in that same

[477:41]

order.

[477:44]

All right,

[477:47]

good. And one, two, three, four, five,

[477:50]

six. I definitely said one through

[477:51]

eight. Who is the number eight then?

[477:53]

Okay, we need an eight. Come on down.

[477:55]

All right. Well, technically we need a

[477:56]

four, but come on down. Yeah. All right,

[477:59]

grab the four and let me start from this

[478:00]

end first if you want to give a quick

[478:02]

hello and a little something about you.

[478:03]

>> Uh, hi, my name is Cameron. I'm a first

[478:06]

year and I want to study mechanical

[478:08]

engineering.

[478:10]

>> Welcome.

[478:11]

>> Hi, I'm Charlotte. I'm also first year

[478:13]

and I'm in Canaday F.

[478:15]

>> Welcome.

[478:16]

>> Hi, I'm Ella. I'm also a first year and

[478:19]

I'm in the

[478:20]

>> Hi, I'm Precious. I'm also a first year.

[478:23]

I'm there.

[478:24]

>> Hi, I'm Michael. I'm just an Eventbrite

[478:26]

guest.

[478:27]

>> Yeah.

[478:28]

>> Hi, I'm Marie. I'm a first year and I'm

[478:30]

in Canaday.

[478:31]

>> Welcome.

[478:32]

>> Hi, I'm Rick. I'm a first year and I'm

[478:34]

in Holworthy.

[478:35]

>> Welcome.

[478:36]

>> Nice.

[478:37]

>> I'm Jaden. I'm a first year in

[478:39]

Holworthy and I really like free stuff.

[478:41]

>> Okay. Well, let's see then uh if we

[478:44]

can't award all these Super Mario

[478:45]

Brothers Pez dispensers. The first

[478:46]

notice, of course, that all eight of our

[478:48]

volunteers are completely out of order,

[478:49]

but in an ideal world, we would have the

[478:51]

smallest number over here.

[478:54]

Go over there. Number zero. Wait a

[478:57]

minute. Seven. Let's go over here.

[479:01]

Two. Okay. F. Okay. Make yourselves look

[479:03]

like that.

[479:06]

No pez. It's okay. All right. So, 725

[479:10]

41603.

[479:12]

Okay. We won't do the introductions

[479:13]

again, but now we have a list of numbers

[479:15]

completely out of order. And wouldn't it

[479:17]

be nice if zero were eventually over

[479:19]

here, seven were all the way over there,

[479:20]

and everything else was sorted from

[479:22]

smallest to largest? Well, if you all

[479:24]

could go ahead and sort yourselves from

[479:26]

smallest to largest. Go.

[479:33]

All right. And Jaden, what was your

[479:35]

algorithm for doing that? Um I

[479:39]

I I I know that I have the least number

[479:41]

because I don't think there anybody has

[479:43]

a number less than zero. So I put myself

[479:44]

at the last bottom line.

[479:46]

>> Okay. And I assume Precious. What was

[479:47]

your algorithm?

[479:49]

>> I knew I had the largest number. So I

[479:51]

just had to be at the end of the

[479:53]

>> Okay, fair. So you guys got the easy

[479:55]

ones. Uh number four. How about

[479:58]

>> I knew three was before me and five was

[480:00]

after me.

[480:01]

>> Nice. So number four didn't actually

[480:02]

have to move coincidentally. But as for

[480:04]

five and three and two and one and six,

[480:06]

they probably had to take into account

[480:08]

some additional information. Who's to

[480:09]

their left? Who's to their right? And it

[480:10]

just kind of worked. But it didn't look

[480:11]

very algorithmic, if you will. It looked

[480:13]

very organic and obviously correct. But

[480:16]

I'm not sure that same approach would

[480:17]

work well if we had not eight, but 80 or

[480:19]

800 or 8,000 pieces of data. So let's

[480:22]

see if we can't formalize this a little

[480:23]

bit. Let me take the mic and if you guys

[480:24]

could reset yourselves to those same

[480:26]

original positions from seven on the

[480:28]

left to three on the right. Let me

[480:31]

propose a couple of algorithms,

[480:32]

canonical ones if you will, but see if

[480:34]

maybe we can't formalize step by step

[480:36]

what to do. So the first one I'm going

[480:38]

to do given all of these numbers is just

[480:40]

try to select the smallest number. Why?

[480:41]

To Jaden's point earlier, I just want to

[480:43]

put the smallest number over here. At

[480:45]

least that's a problem I can solve. It's

[480:47]

very well defined. It's a nice bite out

[480:48]

of the problem. So seven. Okay, smallest

[480:51]

so far. Two, that's smaller. So

[480:53]

I'm going to remember that two is the

[480:54]

now smallest number I've seen. Not five,

[480:57]

not four. One is even smaller. So, I'm

[480:59]

going to remember one, not six, zero.

[481:01]

That's pretty good. But I'm going to

[481:02]

check the whole list. Maybe there's

[481:03]

negative one or something like that. But

[481:05]

no, three. So, I'm going to remember

[481:06]

that zero was the smallest element I

[481:08]

found. Let's select Jaden and put Jaden

[481:10]

over here. But before Precious or anyone

[481:13]

else moves, we don't really have room

[481:14]

for you. Like, Precious is in the way

[481:16]

because if this is an array of eight

[481:19]

values for integers, well, we can't just

[481:21]

kind of make room over here because if

[481:23]

you think back to last week, we might

[481:24]

have uh some garbage values there or

[481:26]

something else is going on. We don't

[481:27]

want to change data that doesn't belong

[481:29]

to us. So what to do with Precious?

[481:31]

Well, maybe Precious, maybe you can go

[481:32]

over there. So you just take Jaden's

[481:34]

spot and we'll swap these two values

[481:36]

accordingly. Now though, Jaden is in the

[481:38]

right space, which is good because now I

[481:40]

can move on to the second problem.

[481:41]

What's the next smallest element that's

[481:43]

presumably greater than zero? Well, at

[481:44]

the moment, two is the next smallest

[481:46]

element. Not five, not four. Ooh, one is

[481:49]

the next smallest element. I'm going to

[481:50]

remember that. Not six, not seven, not

[481:52]

three. Okay, so number one, if you could

[481:55]

go to the right location, but I'm afraid

[481:56]

we're going to have to evict number two

[481:58]

to make room. All right, let's do this

[482:00]

again. Zero and one are in good shape.

[482:02]

So now I think I can ignore them as

[482:04]

complete. Five is the current smallest.

[482:07]

Nope. Four now is. Nope. Two now is. Six.

[482:10]

No. Seven. No. Three. No. Okay, so two

[482:12]

is the next smallest. So let's swap two

[482:14]

and five. And now I've solved three out

[482:16]

of the eight problems. Let's do this

[482:18]

again. Four is at the moment the

[482:19]

smallest. Not five, not six, not Oh,

[482:22]

three is the now smallest. So, let's

[482:23]

swap three. Four and three, which

[482:25]

unfortunately is making the four problem

[482:27]

a little worse. Like he belongs there,

[482:29]

it would seem, but I think we can fix

[482:31]

that later. So, now half of the list is

[482:33]

sorted. Five is the next smallest. Six

[482:35]

and seven. A four. Now, we got to fix

[482:37]

the four. So, four goes back there. Now,

[482:39]

I messed up the five, but we'll come

[482:40]

back to that. All right. Six. Seven.

[482:42]

Okay. Five. Let's put you where six is.

[482:45]

And now one more mistake to fix. So,

[482:47]

seven. Okay. Six and seven need to swap.

[482:50]

And now I've solved eight problems in

[482:52]

the aggregate. So it's complete. Now to

[482:54]

be fair, my approach is clearly way

[482:56]

slower than your approach, but you all

[482:58]

were working in parallel, whereas I was

[483:00]

doing it more methodically, step by

[483:01]

step. And I dare say my algorithm is

[483:03]

probably going to be more translatable

[483:05]

to code. And indeed, what I just acted

[483:07]

out is what the world would call

[483:08]

selection sort, whereby on each

[483:10]

iteration, each pass in front of the

[483:12]

humans, I was selecting the smallest

[483:15]

element I could find. All right.

[483:17]

How else could I do this, though? So,

[483:19]

let's do something that's maybe a little

[483:20]

more organic like your approach where

[483:23]

you were actually comparing who was next

[483:24]

to you. Go ahead and reset yourselves

[483:25]

one final time to this arrangement.

[483:27]

Seven on the left, three on the right.

[483:29]

And let me propose again to walk through

[483:31]

the list again and again. But let me

[483:33]

focus more narrowly on the problem right

[483:35]

in front of me because I felt like I was

[483:36]

taking a lot of steps back and forth,

[483:38]

back and forth. Maybe we can chip away

[483:40]

at some of that wasted time. Let's

[483:42]

compare seven and two. They're obviously

[483:44]

out of order. So, let's just immediately

[483:46]

swap you two if we could. All right.

[483:48]

Now, seven and five clearly out of

[483:50]

order. Let's swap these two. Seven and

[483:52]

four out of order. Let's swap these two.

[483:55]

Seven and one out of order. Let's swap

[483:56]

these two.

[483:58]

Seven and six out of order. Let's swap

[484:00]

these two. Seven and zero out of order.

[484:02]

Swap these two. Seven and three out of

[484:05]

order. Swap these two. So, a lot of work

[484:07]

for Precious there. But, I've now indeed

[484:10]

solved one of the eight problems.

[484:12]

Moreover, I don't need to keep uh

[484:14]

addressing the seven problem because

[484:16]

notice that Precious has essentially

[484:17]

bubbled her way up to the end of the

[484:19]

list. And indeed, that's going to be the

[484:21]

operative term here. Another algorithm

[484:23]

that computer scientists everywhere know

[484:25]

is called bubble sort, whereby the goal

[484:26]

is to get the biggest elements to just

[484:28]

bubble their way up to the top of or the

[484:30]

end of the list one at a time. Now, am I

[484:33]

done? Well, no. Clearly not. There's

[484:36]

still stuff out of order except for

[484:37]

Precious. Indeed, I have solved one of

[484:39]

these eight problems. And now fine, I'll

[484:41]

go back and I'm just going to try this

[484:43]

same logic again. Two and five, good.

[484:45]

Five and four, nope, swap those. Five

[484:47]

and one, nope, swap those. Five and six

[484:50]

are good. Six and zero, nope, swap those.

[484:52]

Six and three, nope, swap those. And I

[484:54]

already know that Precious is where she

[484:56]

needs to be. So, I think I'm done with

[484:57]

the second of eight problems. And I'll

[484:59]

do this a little faster now. Two and

[485:00]

four. Four and one, swap. Four and five

[485:03]

are good. Five and zero, swap. Five and

[485:05]

three, swap. And now we solved three

[485:07]

problems. Let me reset. Two and one,

[485:10]

swap. Two and four are good. Four and

[485:12]

zero, swap. Four and three, swap. And

[485:15]

now I've solved half of the problems.

[485:16]

Four out of eight. We're almost done.

[485:18]

One and two are good. Two and zero,

[485:20]

swap. Two and three are good. Okay. And

[485:22]

now we're done with five out of the

[485:24]

eight problems. One and zero swap.

[485:27]

Uh, one and two are good. Those are all

[485:30]

good. And let me just do a final sanity

[485:32]

check. Everything now is sorted. So now

[485:35]

I'm done solving all eight of those

[485:38]

problems. So, you all were wonderful. We

[485:39]

need the numbers back, but Kelly has

[485:40]

some delightful Pez dispensers for you

[485:42]

on the way out. If you want to head that

[485:43]

way, just leave the numbers on the

[485:44]

shelves. And a round of applause for our

[485:45]

eight volunteers for helping to act this

[485:48]

out.

[485:51]

Thank you.

[485:53]

So, let's see if we can't formalize what

[485:57]

these volunteers kindly just did with

[486:00]

us. Starting with the first of those

[486:01]

algorithms. Thank you. Namely, selection

[486:03]

sort. Let's see if we can't slap some

[486:05]

pseudo code on this, thinking of our

[486:06]

humans now as more generically an array.

[486:09]

So we had the first person at location

[486:11]

zero and we had the last person at

[486:13]

location n minus one. And just for

[486:15]

clarity so that you've kind of seen the

[486:17]

uh symbology this obviously is going to

[486:19]

be location n minus 2. This is location n

[486:22]

minus 3 and so forth until sort of dot

[486:24]

dot dot you hit the other end that we've

[486:26]

already written out. So that's just how

[486:27]

we would refer to all of our eight

[486:29]

volunteers' locations or in this case 1 2

[486:32]

3 4 5 6 seven locations but dot dot dot

[486:34]

in the middle connoting that this can be

[486:35]

a much much larger array. So here's some

[486:38]

pseudo code for the first algorithm

[486:40]

selection sort for i from zero to n

[486:43]

minus one. So from the first element to

[486:45]

the last element find the smallest

[486:48]

number between the numbers bracket i and

[486:51]

numbers bracket n minus one. In other

[486:53]

words, if you're starting I at zero,

[486:56]

look at specifically every lighted

[486:59]

number between location zero and

[487:01]

location n minus one. When you have

[487:04]

found that smallest element, swap it

[487:06]

with the number at location i, which

[487:08]

starts again at zero. That's how we got

[487:10]

I think Jaden into place at the very

[487:12]

beginning. Then i, by nature of how for

[487:15]

loops work, gets updated from 0 to 1.

[487:17]

So that we do the same thing. Find the

[487:19]

smallest number between numbers bracket

[487:21]

one. So the second element through the

[487:23]

eighth element because this number is

[487:25]

unchanged. N is the total number of

[487:27]

values. So the end point there is not

[487:28]

changing. Once we found the second

[487:30]

smallest person, we swap them with

[487:32]

location I aka one. And that's how we

[487:35]

got the number one into position and

[487:37]

then the number two and then the number

[487:38]

three and number four. So this then was

[487:40]

selection sort in pseudo code form. And

[487:42]

that allowed us to actually go through

[487:44]

this list again and again and again in

[487:46]

order to find the next smallest element.
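
The pseudo code just described might translate to C roughly as follows. This is a minimal sketch, not code shown in the lecture: the function name selection_sort and the variable name smallest are assumptions chosen for illustration.

```c
#include <stddef.h>

// Sort numbers[0..n-1] from smallest to largest using selection sort.
void selection_sort(int numbers[], size_t n)
{
    // For i from 0 to n - 1
    for (size_t i = 0; i < n; i++)
    {
        // Find the smallest number between numbers[i] and numbers[n - 1],
        // remembering only the index of the smallest value seen so far
        size_t smallest = i;
        for (size_t j = i + 1; j < n; j++)
        {
            if (numbers[j] < numbers[smallest])
            {
                smallest = j;
            }
        }

        // Swap the smallest number with numbers[i]
        int tmp = numbers[i];
        numbers[i] = numbers[smallest];
        numbers[smallest] = tmp;
    }
}
```

Calling selection_sort on the volunteers' original order, 7 2 5 4 1 6 0 3, with n equal to 8 would leave the array as 0 through 7.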

[487:49]

So what was happening a little more

[487:50]

methodically if it helps just to map

[487:53]

that symbology of the bracket notation

[487:55]

and the i's. If this is where we

[487:57]

started with location I and we did

[487:59]

everything between location N minus one.

[488:02]

Essentially I traversed this whole list

[488:03]

from left to right literally walking in

[488:05]

front of our volunteers looking at each

[488:07]

element and the first element I saw was

[488:09]

seven. At the moment that was the

[488:11]

smallest element I had found. And who

[488:12]

knows in a different list maybe seven

[488:14]

would be the smallest element. So I kind

[488:16]

of stored it in a variable in my mind.

[488:18]

But I checked then two and remembered no

[488:19]

no two is clearly less than. Now I'm

[488:21]

going to remember two. Okay. Now I'm

[488:23]

going to remember one when I find it.

[488:24]

Then I'm going to remember zero when I

[488:26]

find it. And then what I did once I

[488:28]

found Jaden with the value of zero

[488:31]

uh lighted up, I moved that

[488:34]

location to here and then evicted

[488:36]

Precious, recall, and moved Precious over

[488:38]

to that location that we had freed up.
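
That eviction is just a swap within the array's existing memory. As a hedged sketch (the helper name swap and its index parameters are hypothetical, not from the lecture), it could look like this in C, using one temporary variable so that no value is lost and no memory outside the array is touched:

```c
#include <stddef.h>

// Exchange the values at indices a and b of numbers.
// Both indices are assumed to be within the bounds of the array.
void swap(int numbers[], size_t a, size_t b)
{
    int tmp = numbers[a];     // remember one value
    numbers[a] = numbers[b];  // overwrite its old location
    numbers[b] = tmp;         // put the remembered value in the freed-up spot
}
```

With the original order 7 2 5 4 1 6 0 3, swapping indices 0 and 6 puts 0 at the far left and 7 where 0 used to be.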

[488:40]

Why? Why all this sort of back and

[488:42]

forth? Well, you have to assume with an

[488:44]

array that you're not entitled to the

[488:45]

memory over here. You're not entitled to

[488:47]

the memory over here if you've already

[488:49]

decided that you have seven lockers or

[488:51]

eight people. You have to commit to the

[488:53]

computer in advance. That's why we put

[488:54]

the number typically in the square

[488:55]

brackets or the compiler infers from the

[488:58]

curly brackets how big the array

[489:00]

actually is. All right. And suffice it

[489:02]

to say when I went through this again

[489:03]

and again and again, I did the same

[489:05]

thing over and over. Now, you might have

[489:07]

thought me sort of dumb for having asked

[489:10]

the same questions again and again like

[489:11]

I was surprised to discover the number

[489:13]

one. I was surprised to discover the

[489:14]

number two even though on my very

[489:16]

first pass I literally looked at all

[489:18]

eight of those numbers but you have to

[489:20]

think about what memory I'm actually

[489:21]

using. Now I certainly could have

[489:23]

memorized all of the numbers and where

[489:25]

they are. But I propose that just very

[489:27]

simply I was using like a single

[489:28]

variable in my brain just to keep track

[489:30]

of the then smallest element. And once

[489:33]

I'm done finding that and solving that

[489:35]

problem I moved on to do it again and

[489:36]

again. But that's going to be a

[489:38]

trade-off. And this is going to be

[489:39]

thematic in the coming weeks whereby

[489:41]

well sure you could use more memory and

[489:43]

I could have been smarter about it and

[489:44]

maybe that would have improved or um

[489:47]

hurt the running time of the algorithm.

[489:49]

There's often going to be a trade-off

[489:50]

between how much memory or how much time

[489:53]

you actually use. So we'll discover that

[489:55]

over time. So how fast or slow is

[489:59]

selection sort? Well consider when I had

[490:00]

eight humans on stage I first went

[490:02]

through uh all n of them. But how many

[490:05]

comparisons did I make? Really, I was

[490:07]

doing n minus one comparisons because if

[490:09]

I've got n people, I've got to compare

[490:11]

the smallest number I found against

[490:13]

everyone else. And you compare n people

[490:16]

left to right n minus one times total.

[490:18]

So the first pass I was making I was

[490:20]

asking n minus one questions. Is this

[490:22]

the smallest? Is this the smallest? Is

[490:23]

this the smallest? N minus one times.

[490:25]

Once I solved one problem, when we got

[490:27]

Jaden into Jaden's right place, then I

[490:30]

had one fewer problem. Then one fewer

[490:31]

problem and so forth. So, it was

[490:33]

like n - 1 steps plus n - 2 steps plus n

[490:37]

- 3 steps plus dot dot dot one final step

[490:40]

once I got to the final of the eight

[490:42]

problems. Now, if you remember kind of

[490:44]

the cheat sheet at the back of your math

[490:45]

books, uh say growing up, you'll note

[490:47]

that this uh series here can be more

[490:50]

simply written as n * (n - 1) / 2. And

[490:53]

if you've not seen that before, just

[490:55]

take on faith that this is identical to

[490:57]

this series of numbers up here. So, now

[490:59]

we can just kind of multiply this out.
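
Written out, the series and its closed form look like this. The eight-volunteer line is an added worked check, not something stated in the lecture.

```latex
(n-1) + (n-2) + (n-3) + \dots + 1
  = \frac{n(n-1)}{2}
  = \frac{n^2 - n}{2}
  = \frac{n^2}{2} - \frac{n}{2}

% With the eight volunteers (n = 8):
% 7 + 6 + 5 + 4 + 3 + 2 + 1 = 28 = \frac{8 \cdot 7}{2}
```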

[491:01]

So that's technically n^2 minus n all

[491:03]

divided by 2, which is great. If we

[491:06]

multiply that out, that's n^2 / 2

[491:07]

- n / 2. We're getting too into the

[491:09]

weeds. Let's whip out our big O notation

[491:12]

now, whereby we can wave our hands at

[491:14]

the lower order terms and only care about

[491:16]

the biggest most dominant term, which

[491:18]

mathematically in this expression, if

[491:20]

you plug in a really big value of n,

[491:22]

which is going to matter more? The n

[491:24]

squared, the two, the n, or the two?

[491:27]

The n squared. Like, the others

[491:29]

absolutely contribute to the total

[491:31]

value. But if you plug in a really big

[491:33]

value, the dominant force is going to be

[491:35]

this n squared because that's really going

[491:36]

to blow up the total value. So we can

[491:39]

say that selection sort when analyzed in

[491:41]

this way, ah it's on the order of n

[491:44]

squared steps because I'm doing so many

[491:46]

comparisons so many times. So if that's

[491:49]

the case, the question then is um what

[491:53]

is indeed not just its upper bound but

[491:55]

maybe its lower bound as we'll

[491:57]

eventually see. So for selection sort

[491:58]

for now, let's stipulate that it's

[492:00]

indeed in big O of N squared. And that's

[492:02]

actually the worst of the algorithms

[492:03]

we've seen. Like that's way slower than

[492:04]

linear search because at least linear

[492:06]

search was big O of N. Selection sort is

[492:09]

N squared, which of course is N * N, which

[492:11]

is and will feel much much slower than

[492:14]

that. So what if though we consider the

[492:17]

lower bound of selection sort? All

[492:19]

right, maybe it's bad in the worst case,

[492:20]

but maybe it's really good when the

[492:22]

numbers are mostly sorted.

[492:23]

Unfortunately, this is the same pseudo

[492:25]

code for selection sort. We make no

[492:27]

allowance for checking the list to make

[492:29]

sure it's already sorted. And in fact,

[492:31]

that's kind of a perverse case to

[492:33]

consider for any algorithm. What if the

[492:34]

problem's already solved? How's your

[492:36]

algorithm going to perform? Like if all

[492:37]

of my volunteers, as they kind of almost

[492:39]

did accidentally, they started lining up

[492:41]

roughly in order. Suppose they literally

[492:43]

had been in order from 0 to 7. Well, my

[492:45]

stupid algorithm would still have me

[492:47]

walking back and forth, back and forth,

[492:49]

back and forth. Why? because the code

[492:51]

literally tells me do this this many

[492:53]

times and every time I do that find the

[492:56]

smallest element. So it's going to be

[492:57]

sort of a stupid output because the list

[492:59]

is not going to be changed at

[493:01]

all, but my code is not taking

[493:04]

into account in any way the original

[493:07]

order of the numbers. So no matter what

[493:09]

this is to say that if we consider

[493:11]

whether the lockers or the humans the

[493:13]

omega notation for this algorithm even

[493:15]

in the best case where the data is

[493:16]

already sorted is crazily also n

[493:20]

squared. Now I could certainly change

[493:21]

the pseudo code but selection sort as

[493:23]

the world knows it is more of a

[493:25]

demonstrative algorithm or sort of a

[493:27]

quick and dirty one. Its running time is

[493:29]

going to be in omega of n squared. And now

[493:32]

we can actually deploy our theta

[493:33]

notation because the big O notation is n

[493:35]

squared and the omega notation is n squared,

[493:38]

the same. We can also say that selection

[493:39]

sort is in theta of n^2 which is not

[493:42]

great because that's annoyingly slow. So

[493:45]

maybe the solution here is don't do

[493:47]

that. Let's use bubble sort instead. The

[493:48]

second algorithm where I just compared

[493:50]

everyone side by side again and again.

[493:52]

Well, here's some pseudo code for bubble

[493:54]

sort which you can assume applies to the

[493:56]

same kind of array from zero on up to n

[493:58]

minus one. Here's one way to write

[494:00]

bubble sort. Repeat the following n

[494:02]

times. For i from 0 to n minus 2, if the

[494:07]

number at location i and the number at

[494:10]

location i + 1 are out of order, swap

[494:13]

them. And there's kind of an elegance to

[494:15]

this algorithm in that, like, that's it.
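
A direct, hedged translation of that pseudo code into C might look like the sketch below. The function name bubble_sort is an assumption for illustration. The inner loop condition i + 1 < n is one way of writing "for i from 0 to n minus 2", and it also keeps numbers[i + 1] inside the array.

```c
#include <stddef.h>

// Sort numbers[0..n-1] from smallest to largest using bubble sort.
void bubble_sort(int numbers[], size_t n)
{
    // Repeat the following n times
    for (size_t pass = 0; pass < n; pass++)
    {
        // For i from 0 to n - 2, so that i + 1 never goes past n - 1
        for (size_t i = 0; i + 1 < n; i++)
        {
            // If numbers[i] and numbers[i + 1] are out of order, swap them
            if (numbers[i] > numbers[i + 1])
            {
                int tmp = numbers[i];
                numbers[i] = numbers[i + 1];
                numbers[i + 1] = tmp;
            }
        }
    }
}
```

On the first pass over 7 2 5 4 1 6 0 3, the 7 is swapped seven times in a row and ends up at the far right, just as it did on stage.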

[494:17]

And you just assume that when you go

[494:19]

through the list, this is how, for i

[494:21]

from 0 to n minus 2, I was

[494:23]

effectively comparing elements 0 and 1,

[494:26]

one and two, two and three, three and

[494:28]

four, dot dot dot, uh seven, six and

[494:32]

seven. But notice I didn't say eight.

[494:35]

There were eight total people. Why do we

[494:37]

go from 0 to n minus 2 instead of from 0

[494:41]

to n minus one?

[494:43]

Uh yeah. Yeah. We already checked the

[494:47]

last one.

[494:49]

>> Not quite. So it's not that we've

[494:51]

already checked the last one. I'm saying

[494:52]

with this line of code here, we never

[494:54]

even go to N minus one. Technically,

[494:57]

>> If we have n minus one, it is going to compare

[495:00]

against n because that's

[495:02]

>> exactly because we're doing this simple

[495:04]

arithmetic here. We're checking current

[495:05]

location I + 1. You can think of these

[495:08]

as my left and right hand. Left hand is

[495:09]

pointing at zero. Right hand's pointing

[495:10]

at one. I don't want to do something

[495:12]

stupid and have my left hand point at n

[495:14]

minus one because then my right hand

[495:16]

arithmetically when you add one is going

[495:18]

to point at n which does not exist.

[495:20]

That's beyond the boundary of the array

[495:22]

because the array goes from zero to n

[495:24]

minus one. So just a little bit of a

[495:27]

safety check there to make sure we don't

[495:28]

walk right off the end of the array. But

[495:30]

we do this n times because recall that

[495:32]

Precious ended up being where uh seven

[495:35]

needed to be at the very end of the

[495:37]

list. But that didn't mean there weren't

[495:39]

seven uh seven more problems still to

[495:41]

solve. 0 through six. So I did it again

[495:43]

and I did it again and per its name

[495:45]

bubble sort the biggest element bubbled

[495:47]

up first then the next biggest then the

[495:49]

next biggest, then the next

[495:50]

biggest, that is, seven, then six,

[495:52]

then five then four and we got lucky on

[495:54]

some of them but eventually we finished

[495:55]

with zero. So how do we analyze this

[495:58]

thing? Well, we could also technically

[496:00]

do this n minus one times as an aside if

[496:02]

you're thinking through that I'm wasting

[496:03]

some time because we get one for free

[496:05]

once we get to uh solving seven

[496:07]

problems. You get the eighth one for

[496:08]

free because that person is obviously

[496:10]

where they need to go. So when we had

[496:12]

these numbers initially and we were

[496:14]

comparing them with bubble sort again

[496:15]

left hand right hand it's like treat

[496:16]

this as I this is I plus one and we just

[496:19]

kept swapping pair-wise numbers if in

[496:22]

fact they were out of order. So all this

[496:24]

is saying is what our humans were doing

[496:27]

for us organically. So how do we

[496:29]

actually analyze the running time of

[496:30]

this? Last time I just kind of

[496:32]

spitballed that it was n minus one

[496:34]

steps plus n minus two steps. Well, you

[496:36]

can actually look at pseudo code

[496:37]

sometimes and if it's neatly written,

[496:39]

you can actually infer from the pseudo

[496:41]

code how many steps each line is going

[496:44]

to take. For instance, how many steps

[496:45]

does this first line take? I mean like

[496:47]

literally n minus one. The answer is

[496:49]

right there because it's saying to the

[496:50]

computer or to me acting it out, repeat

[496:52]

the following n minus one times. All

[496:54]

right, so that's helpful. How many line

[496:56]

how many steps does this inner loop

[496:59]

induce? Well, i is going from 0 to n

[497:01]

minus 2. So that's actually n minus one

[497:05]

total steps not n. And then this

[497:08]

question here, if numbers bracket i and

[497:10]

numbers bracket i plus 1 are out of order, it's a

[497:13]

single question. It's like our boolean

[497:14]

expression. We'll call it one. I mean,

[497:16]

maybe you need to do a bit of more work

[497:17]

than that, but it's a constant number of

[497:19]

steps. Doesn't matter how big the list

[497:20]

is. Comparing two numbers is always

[497:22]

going to take the same amount of time.

[497:24]

And then swapping them, oh, I don't

[497:25]

know, it's going to take like one or two

[497:28]

or three steps, but constant. Doesn't

[497:29]

matter what the numbers are, it takes the

[497:32]

same amount of work. So, let's

[497:33]

stipulate, let me rewind, stipulate that

[497:35]

the real things that matter are the

[497:37]

loops. These constant number of steps,

[497:39]

who really cares? But the loops are what

[497:41]

are going to add up as n gets large. So

[497:42]

this really then is if this is the outer

[497:44]

loop and this is the inner loop. Think

[497:46]

about our two-dimensional Mario square

[497:48]

from week one. We did something on the

[497:49]

outside and then something on the inside

[497:51]

to get our rows and columns. This is

[497:53]

equivalent to (n - 1) * (n - 1). If we

[497:56]

do our little FOIL method, n^2 - n - n +

[497:59]

1. Combine like terms, n^2 - 2n + 1. Who

[498:03]

cares? This is ultimately going to be on

[498:05]

the order of big O of

[498:09]

N squared only because again if you ask

[498:11]

yourself when I plug in a really big

[498:13]

value for N which of these is really

[498:15]

going to contribute most to the answer

[498:17]

it's obviously going to be n squared again

[498:18]

and we can ignore the lower order terms.
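
For reference, the multiplication just described can be written out as follows. The eight-element line is an added worked check rather than something from the lecture.

```latex
(n-1)(n-1) = n^2 - n - n + 1 = n^2 - 2n + 1 \in O(n^2)

% With eight elements (n = 8): 7 \cdot 7 = 49 = 64 - 16 + 1
```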

[498:20]

So this doesn't seem to have made any

[498:22]

progress. Selection sort was on the

[498:24]

order of N squared, and

[498:26]

bubble sort, based on this analysis,

[498:29]

is also on the order of N squared. Maybe

[498:31]

we're getting lucky in the lower bound.

[498:32]

So on the upper bound for bubble sort,

[498:34]

it's indeed n squared, as was selection sort.

[498:37]

But with this pseudo code for bubble

[498:39]

sort, unfortunately,

[498:41]

we were not

[498:43]

doing anything clever to catch that

[498:45]

perverse case where maybe the list was

[498:47]

already sorted. After all, consider if

[498:49]

the list was sorted from 0 to 7. I was

[498:51]

still asking all the same darn

[498:53]

questions. Even if I did no work, I was

[498:55]

going to repeat that n minus one times

[498:57]

back and forth making no swaps but

[498:59]

making all of those comparisons. But

[499:01]

here's an enhancement to bubble sort

[499:03]

that we can add that selection sort

[499:05]

didn't really have room for. I can say

[499:07]

after one pass of this inner loop

[499:10]

walking from left to right, if I made no

[499:12]

swaps, quit. So put another way, if I

[499:15]

traverse the list from left to right, I

[499:17]

make no swaps, I might as well just

[499:19]

terminate the algorithm then because

[499:21]

there's no more work clearly to be done.
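
One way that "if I made no swaps, quit" enhancement might be sketched in C is below. The function name bubble_sort_early_exit, the swapped flag, and the returned pass count are all assumptions added for illustration. The count is there only so the effect of quitting early can be observed.

```c
#include <stdbool.h>
#include <stddef.h>

// Bubble sort that quits after any pass in which no swaps were made.
// Returns the number of passes made over the array.
size_t bubble_sort_early_exit(int numbers[], size_t n)
{
    size_t passes = 0;

    // At most n - 1 passes are needed; pass + 1 < n is also safe when n is 0
    for (size_t pass = 0; pass + 1 < n; pass++)
    {
        bool swapped = false;
        passes++;

        // Compare each adjacent pair, left to right
        for (size_t i = 0; i + 1 < n; i++)
        {
            if (numbers[i] > numbers[i + 1])
            {
                int tmp = numbers[i];
                numbers[i] = numbers[i + 1];
                numbers[i + 1] = tmp;
                swapped = true;
            }
        }

        // No swaps on this pass means the array is already sorted
        if (!swapped)
        {
            break;
        }
    }

    return passes;
}
```

On an array that is already sorted from 0 to 7, this version makes a single pass of seven comparisons and stops.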

[499:24]

All right. So based on that

[499:26]

modification, the lower bound of bubble

[499:29]

sort's running time would be said to be

[499:31]

in omega, then, of

[499:35]

n because I'm minimally going to need to

[499:38]

make one pass through the list. You

[499:40]

can't possibly claim that the list is

[499:42]

sorted unless you actually check it

[499:43]

once. And if there's n elements, you're

[499:45]

going to have to look at all n of them

[499:46]

to make sure that it's in order. But

[499:48]

after that, if you've done no work and

[499:49]

made no swaps, no reason to traverse the

[499:52]

list again and again and again. So a

[499:54]

bubble sort can be said to be in omega

[499:57]

of n because indeed we can just

[499:59]

terminate after that single pass if

[500:00]

we've done no work. We can't say

[500:02]

anything about theta because they're not

[500:03]

one and the same big O and omega. But

[500:05]

that does seem to have given us some

[500:07]

savings. Unfortunately, it really only

[500:09]

saves us time when the list is already

[500:11]

or mostly sorted. But in the average

[500:13]

case and in the worst case, odds are

[500:15]

they're both going to perform just as

[500:17]

badly, on the order of n squared. In fact,

[500:19]

let's take a look at a visualization

[500:20]

that'll make this a little clearer than

[500:22]

our own humans and voices uh might have

[500:25]

explained. Here is a bunch of vertical

[500:27]

purple bars uh made by a friend of ours

[500:29]

uh in the real world. And this is an

[500:31]

animation that has a bunch of buttons

[500:32]

that lets us execute certain algorithms.

[500:34]

A small bar represents a small number. A

[500:36]

big bar represents a big number. And the

[500:38]

goal is to get them from small numbers

[500:40]

or small bars to big numbers or big bars

[500:42]

left to right. So I'm going to go ahead

[500:44]

and click on selection sort initially.

[500:46]

And what you'll see from left to right

[500:48]

is in pink the current smallest element

[500:52]

that's been discovered, but also in pink

[500:54]

the equivalent of my walking across the

[500:56]

stage left to right again and again and

[500:58]

again trying to find the next smallest

[501:00]

element. And you'll see clearly just

[501:02]

like when we put Jaden at the far left,

[501:04]

the smallest element ended up over here.

[501:06]

But it might take some time for Precious

[501:08]

for instance or number seven to end up

[501:09]

all the way over on the right because

[501:11]

with each pass we're really just fixing

[501:13]

one problem at a time and there's n

[501:16]

problems total which is giving us on the

[501:19]

order of those n squared steps and now

[501:21]

the list is getting shorter so we're at

[501:22]

least doing some work that you don't

[501:24]

have to keep touching the elements you

[501:25]

already sorted, just like I was. So

[501:27]

now selection sort is complete. Let's

[501:29]

visualize instead bubble sort. So let me

[501:31]

rerandomize the array just so we're

[501:33]

starting with a random order. Now let's

[501:34]

click on bubble sort. And you'll see the

[501:36]

pink bars work a little differently. It

[501:38]

connotes which two numbers are being

[501:40]

compared at that moment in time. Just

[501:41]

like my left hand and right hand going

[501:43]

left to right. And you'll see that even

[501:45]

though it's not quite as pretty as

[501:47]

selection sort where I was getting at

[501:48]

least the smallest elements all the way

[501:50]

to the left here, we're just fixing

[501:53]

pairwise problems, but the biggest

[501:55]

elements like Precious's number seven

[501:57]

are indeed bubbling their way up to the

[501:59]

top one after the other. But as you can

[502:03]

see, and this is where n squared is sort

[502:04]

of visualizable, we're touching

[502:07]

these elements or looking at them so

[502:09]

many times again and again. We are

[502:12]

making so many darn comparisons. This is

[502:14]

taking frustratingly long. And this is

[502:16]

only what a few dozen bars or numbers.

[502:19]

You can imagine how long this might take

[502:21]

with hundreds, thousands, or millions of

[502:23]

values. I dare say we're going to have

[502:24]

to do better than bubble sort and

[502:28]

selection sort because we're not done

[502:30]

even yet. Just trying to give the

[502:32]

satisfaction of getting to the end and

[502:33]

now we are. But neither of those

[502:36]

algorithms seems incredibly performant

[502:38]

because it's still taking us quite a bit

[502:40]

of time to actually get to that there

[502:43]

solution. So how can we actually do

[502:45]

better than that? Well, we can try

[502:47]

taking a fundamentally different

[502:49]

approach. And this is one technique that

[502:51]

you might have encountered in math or

[502:53]

even in the real world even if you

[502:54]

haven't sort of applied this name to it.

[502:57]

Recursion is a technique in mathematics

[503:00]

and in programming that allows you to

[503:02]

take sort of a fundamentally different

[503:03]

approach to a problem. And in short, a

[503:05]

recursive function is one that's uh

[503:07]

defined in terms of itself. So if you

[503:09]

had like f of x equals f of something on

[503:12]

the right hand side of a mathematical

[503:13]

expression, that would be recursive in

[503:14]

that the function is dependent on

[503:16]

itself. More practically in the world of

[503:19]

programming a recursive function is a

[503:21]

function that calls itself. So if you

[503:23]

are writing some function in C and in

[503:26]

that function you call yourself you

[503:29]

actually have a line of code that says

[503:30]

call that same function by the same

[503:32]

name. That function is recursive. Now

[503:35]

this might feel a little weird because

[503:37]

if a function is calling itself it feels

[503:38]

like this is the easiest way to get into

[503:40]

an infinite loop because why would it

[503:41]

ever stop if the function is calling

[503:43]

itself calling itself calling itself

[503:44]

calling itself? We're going to have to

[503:45]

actually address that kind of problem.

[503:47]

But in the real world, we've actually or

[503:49]

rather in this class already, we've

[503:51]

actually seen implicitly an example of

[503:53]

this including today as well as in week

[503:55]

zero. So here is that algorithm for

[503:57]

searching the doors of the lockers. And

[504:00]

recall that after we did this check at

[504:02]

the very top, if there are no doors

[504:03]

left, return false. If uh not, we did

[504:07]

these uh conditions. We said if the

[504:10]

number is behind the middle door, return

[504:11]

true cuz we found it. But things got

[504:13]

interesting here where I said else if

[504:15]

the number is less than the middle door

[504:17]

then search the left half. Else if the

[504:20]

number is greater than the middle door

[504:21]

then search the right half. Well at that

[504:23]

point in time you should be asking me or

[504:25]

yourself, well, how do I search the

[504:27]

left half? How do I search the right

[504:29]

half? Well here you go. Like on the

[504:31]

screen right now is a search algorithm.

[504:34]

And even though it says down here search

[504:36]

the left half or search the right half

[504:38]

which is like well how do I do that?

[504:40]

We'll just use the same algorithm again.

[504:42]

And this is how in terms of my voice

[504:44]

over, you end up searching the left half

[504:46]

of the left half or the right half of

[504:48]

the left half or any such combination.

[504:51]

This line here, search left half. This

[504:54]

line here, search right half, is

[504:55]

representative of a recursive call. This

[504:58]

is an algorithm or a function that calls

[505:01]

itself. But why does it not induce an

[505:04]

infinite loop? Like why is it important

[505:06]

that this line and this line are written

[505:09]

exactly as they are so as to avoid this

[505:11]

thing just forever searching aimlessly?

[505:15]

Yeah,

[505:15]

>> there's the condition at which it stops.

[505:17]

>> We do have this condition at which it

[505:19]

stops. But more importantly, what is

[505:21]

happening before I make these recursive

[505:23]

calls?

[505:27]

>> Exactly. I'm recursing that is calling

[505:29]

myself but I'm handing myself a smaller

[505:32]

problem. A smaller problem. A smaller

[505:34]

problem. It would be bad if I just

[505:37]

handed myself the exact same number of

[505:38]

doors and just kept saying, "Search

[505:40]

these, search these, search these."

[505:41]

Because you would never make any

[505:42]

progress. But just like our volunteers

[505:44]

earlier, so long as we did divide and

[505:46]

conquer and we search smaller and

[505:48]

smaller numbers of doors, eventually

[505:49]

indeed we're going to bottom out and

[505:51]

either find the number we're looking for

[505:52]

or we're not. So, generally, we're going

[505:55]

to call these kinds of conditions that

[505:57]

sort of just ask a very obvious question

[505:59]

and want an immediate answer base cases.

[506:02]

Base cases are generally conditionals

[506:04]

that ask a question to which the answer

[506:06]

is going to be yes or no right then and

[506:07]

there. A recursive case by contrast

[506:10]

these two down here is when you actually

[506:12]

need to do a bit more work to get to

[506:14]

your final answer. You call yourself but

[506:17]

with a smaller version of the problem.
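
The locker-search pseudocode above can be sketched in C roughly as follows. This is a minimal sketch, not code from the lecture: the names search, doors, lo, and hi are hypothetical, and the doors are assumed to be a sorted array of ints. The two base cases answer immediately, and the two recursive cases call the same function on half as many doors.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

// Recursively search the sorted array doors[lo..hi) for target.
// All names here are illustrative assumptions, not from the lecture.
bool search(const int doors[], int lo, int hi, int target)
{
    assert(doors != NULL);

    // Base case: no doors left, so the number is not here.
    if (lo >= hi)
    {
        return false;
    }

    int mid = lo + (hi - lo) / 2;

    // Base case: the number is behind the middle door.
    if (doors[mid] == target)
    {
        return true;
    }
    // Recursive case: search the left half (doors before the middle).
    else if (target < doors[mid])
    {
        return search(doors, lo, mid, target);
    }
    // Recursive case: search the right half (doors after the middle).
    else
    {
        return search(doors, mid + 1, hi, target);
    }
}
```

Each recursive call receives a strictly smaller range, which is why the function eventually reaches one of its base cases instead of calling itself forever.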

[506:20]

So we could have in fact in week zero

[506:22]

written this sort of similarly. If

[506:24]

you go back in your mind to week zero

[506:26]

we had more of a procedural approach so

[506:28]

to speak. When we were searching the

[506:29]

phone book, I proposed that this induced

[506:31]

what we called loops on line 8 and line

[506:33]

11, which just literally said go back to

[506:35]

line three. And that was more of a

[506:36]

mechanical way of sort of inducing a

[506:38]

loop structure. But if I really wanted

[506:40]

to be elegant, I could have said, well,

[506:42]

you know what? 7 and 8 together really

[506:44]

just mean search the left half. And 10

[506:46]

and 11 together really mean just search

[506:48]

the right half. So let's condense these

[506:51]

pairs of lines into shorter

[506:53]

instructions. Search the left half of

[506:55]

the book. Search the right half of the

[506:56]

book. I can then delete two blank lines

[506:58]

and now I have a recursive algorithm for

[507:01]

searching a phone book. It's a little

[507:03]

less obvious because you have to ask

[507:04]

yourself when you get to line seven or

[507:05]

nine, wait a minute, how do I search the

[507:07]

left half or the right half? And that's

[507:08]

when you need to realize you start the

[507:10]

same algorithm again but with a problem

[507:12]

that's half as large. In week zero, we

[507:15]

did the procedural approach where we

[507:17]

literally told you what line of code to

[507:18]

go to, but today we're offering a

[507:21]

different formulation, a recursive

[507:22]

approach where it's more implicit what

[507:24]

you should do. And we'll see now a

[507:26]

couple of examples from the real world,

[507:28]

so to speak. So, here's a screenshot

[507:30]

from Super Mario Brothers 1 on the

[507:32]

original Nintendo uh Entertainment

[507:34]

System. Let me go ahead and get rid of

[507:35]

some of the distraction like the um

[507:37]

ground and the mountains there. And here

[507:39]

we have a sort of half pyramid, not

[507:41]

unlike that you implemented in problem

[507:43]

set one. But this is an interesting

[507:45]

real-world physical structure in that you

[507:48]

can define it recursively. Like what is

[507:51]

a pyramid of height four, if you will?

[507:54]

Well, just to be a little uh a little

[507:56]

difficult, a pyramid of height four is

[507:57]

really just a pyramid of height three

[507:59]

plus one more row. Okay. Well, what is a

[508:02]

pyramid of height three? Well, a pyramid

[508:04]

of height three is really just a pyramid

[508:05]

of height two plus one more row. Well,

[508:08]

what's a pyramid of height two? Well, a

[508:10]

pyramid of height two is really just a

[508:11]

pyramid of height one plus one more row.

[508:14]

Well, what's a pyramid of height one? A

[508:17]

single brick on the screen. And I sort

[508:19]

of changed my tone with that last remark

[508:21]

to convey that this could then be our

[508:23]

base case whereby I just tell you what

[508:25]

the thing is without sort of kicking the

[508:27]

can and inviting you to think through

[508:29]

what a smaller structure is plus one

[508:31]

more row. Whereas every other definition

[508:34]

I gave you then of a pyramid of some

[508:35]

height was defined in terms of that same

[508:39]

structure albeit a smaller version

[508:41]

thereof. So we can actually um see this

[508:45]

in the real world. Let me go ahead and

[508:46]

pull up one thing here. I'm going to go

[508:48]

to uh give me one sec before I flip

[508:50]

over. Here I am on google.com. If you'd

[508:53]

like a little computer science humor

[508:54]

here, uh if you ever Google search for

[508:57]

recursion and hit enter, you'll see uh a

[509:01]

joke that computer scientists at Google

[509:03]

find funny.

[509:06]

Haha. One, two laughs. Does anyone see

[509:08]

the joke? I did not make a typo, but

[509:11]

Google's asking me, did I mean

[509:12]

recursion? And if I click on that, I

[509:15]

just get the same haha page. Okay. All

[509:18]

right. That didn't go over well. Anyhow,

[509:20]

so there are these Easter eggs in the

[509:22]

wild everywhere because computer

[509:23]

scientists are the ones that implement

[509:24]

these things. But let's go ahead and

[509:26]

actually um implement, for instance, a

[509:29]

version of this in code. Let me go back

[509:31]

over here in a moment to VS Code. And in

[509:33]

VS Code, let me propose that in my

[509:35]

terminal window, let me create one of

[509:37]

two final programs. This one's going to

[509:39]

be called iteration.c, just to make

[509:41]

clear that this is the iterative that is

[509:43]

loop-based version of a program whose

[509:46]

purpose in life is to print out a simple

[509:48]

Mario pyramid. I'm going to go ahead and

[509:50]

include cs50.h at the top as well as

[509:53]

stdio.h. I'm not going to need

[509:55]

string.h. I don't need any command line

[509:57]

arguments today. So this is going to

[509:58]

start off with int main(void). And now I'm

[510:00]

going to go ahead and ask a question

[510:02]

like uh give me a variable called height

[510:05]

of type integer and ask the human for

[510:07]

the height of this Mario-like pyramid.

[510:10]

And then let's assume for the moment

[510:11]

that I've already implemented a function

[510:13]

called draw whose purpose in life is to

[510:14]

draw a pyramid of that height semicolon.

[510:17]

So I've abstracted away for the moment

[510:19]

the notion of drawing that pyramid. Now

[510:21]

let's actually implement draw whose

[510:23]

purpose in life again is to print out a

[510:25]

pyramid akin to the one we saw a moment

[510:27]

ago like this here on the screen. Well,

[510:29]

in order to print out a pyramid of a

[510:31]

given height, I think I need to say uh

[510:34]

void uh draw(int n), for instance, because

[510:38]

I'm not going to bother returning a

[510:40]

value. I just want this thing to print

[510:41]

something on the screen. So void is the

[510:43]

return type. But I do want to take as

[510:44]

input an integer like the height of the

[510:46]

thing I want to print. I can call this

[510:48]

argument or parameter anything I want.

[510:50]

I'll call it n for number. So how can I

[510:52]

print out a pyramid that again looks

[510:54]

like this? Well, I'll do this quicker

[510:56]

than you might have in problem set one.

[510:57]

But seems obvious that like on the first

[510:59]

row I want one brick. On the second row

[511:01]

I want two. On the third I want three.

[511:03]

On the fourth I want four. So it's

[511:04]

actually a little easier than problem

[511:06]

set one in that it's sloped in a

[511:08]

different direction. So let me go ahead

[511:09]

and do exactly this in code. Let me say

[511:12]

for int i = 0, i less than n, the height, i

[511:16]

++. So this is going to be really for

[511:19]

each row of the pyramid. Let me

[511:23]

go ahead now and in an inner loop for

[511:26]

int j equals zero, let's do j less than i +

[511:32]

1 for reasons we'll see in a moment and

[511:33]

then j++ and then inside of this loop

[511:36]

let's just print out a single hash no

[511:38]

new line but at the end of the row let's

[511:40]

print out a single new line to move the

[511:43]

cursor to the next line. Now why am I

[511:45]

doing this? Well, this represents for

[511:47]

each column of the pyramid. And if you think

[511:51]

about it, on the first row, which is row

[511:53]

zero, I actually want to print not zero

[511:56]

bricks, but one brick. So that's why I

[511:58]

want to go ahead here and go from zero

[512:01]

to i + 1 because if i is zero, i + 1 is

[512:04]

1. So my inner loop is going to go from

[512:06]

0 to 1, which is going to give me one

[512:08]

brick. It's a little annoying to think

[512:10]

about the math, but this just makes sure

[512:11]

that I'm actually getting bricks in the

[512:13]

order I want them. And then it's going

[512:14]

to give me two bricks and then three and

[512:16]

then four. And between each of those

[512:18]

rows, it's going to print a new line. So

[512:20]

let's go ahead and do make iteration to

[512:21]

compile this code. Ah, I messed up. Why

[512:25]

do I have a mistake on line

[512:28]

eight of this code? Let me hide my

[512:30]

terminal and scroll back up. It seems

[512:32]

Clang, my compiler, does not like my draw

[512:34]

function. Yeah.

[512:38]

Yeah, I forgot the prototype. So this is

[512:40]

the one and only time where it seems

[512:41]

reasonable to copy paste. Let's grab the

[512:43]

prototype of that function up here and

[512:45]

go ahead and teach the compiler from the

[512:47]

get-go what this function is going to

[512:49]

look like even though I'm not defining

[512:50]

it now until line 13 onward. All right,

[512:52]

let's go ahead and make iteration again.

[512:55]

Ah, ./iteration, Enter. Let's do a

[512:57]

height of say four. And voila, now I've

[513:00]

got that there pyramid. So, I did it a

[513:02]

little quickly and it's certainly to be

[513:03]

expected if it took you hours on problem

[513:05]

set one to get the other type of pyramid

[513:06]

printed. But the point for today is

[513:08]

really to demonstrate how we can print a

[513:10]

pyramid like this using indeed what I'd

[513:13]

call iteration. Iteration just means

[513:15]

using loops to solve some problem. But

[513:18]

we can alternatively use recursion by

[513:20]

reimplementing our draw function in a

[513:23]

way that's defined in terms of itself.
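
For reference, the loop-based draw function dictated above might look like the following. This is a sketch reconstructed from the spoken description; main and the CS50 library's get_int prompt are left out so the block depends only on the standard library, which is an assumption made here rather than what appears on screen.

```c
#include <stdio.h>

// Iterative version: print a left-aligned pyramid of height n.
void draw(int n)
{
    // For each row of the pyramid
    for (int i = 0; i < n; i++)
    {
        // For each column: row i (counting from 0) gets i + 1 bricks
        for (int j = 0; j < i + 1; j++)
        {
            printf("#");
        }

        // Move the cursor to the next line at the end of the row
        printf("\n");
    }
}
```

Calling draw(4) prints four rows of one, two, three, and four hashes, matching the pyramid shown in the demo.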

[513:25]

So let me go into my code here and I'm

[513:28]

actually going to leave the prototype

[513:30]

the same. I'm going to leave main the

[513:32]

same. But what I'm going to go ahead and

[513:33]

do is delete all of this iterative code

[513:36]

that's doing things very procedurally

[513:38]

step by step by step with loops. And I'm

[513:41]

instead going to do something like this.

[513:44]

Well, if I want to print a pyramid of

[513:46]

height n, what did I say earlier? Well,

[513:49]

a pyramid of height n is really just a

[513:51]

pyramid of height n minus one plus one

[513:54]

more row. So, how do I implement in code

[513:57]

that idea? Well, let me go back in code

[513:59]

here and say, well, if a pyramid of

[514:01]

height n first requires drawing a

[514:03]

pyramid of height n minus one, I think I

[514:06]

can just write this, which is kind of

[514:07]

crazy to look at, cuz you're calling

[514:09]

yourself in yourself, but let's see

[514:12]

where this takes us. Once I have drawn a

[514:14]

pyramid of height n minus one, that is a

[514:17]

height three for instance, what remains

[514:19]

for me to do is to myself print one more

[514:21]

row. And so to print one more row, I

[514:24]

think I can do that really easily with

[514:25]

fewer loops. I can do for int i = 0, i

[514:29]

less than n, i++. And then very simply in

[514:33]

this loop I can print out a single hash,

[514:35]

one at a time. At the end of this loop I

[514:37]

can print out a new line, but no more

[514:39]

nesting of loops. What I've done is print

[514:43]

one more row, and here I've done print a

[514:48]

pyramid of height n minus one.

[514:52]

I'm not quite done yet but I think this

[514:54]

is consistent with my verbal definition

[514:56]

that a pyramid of height three is a

[514:58]

pyramid of height sorry a pyramid of

[514:59]

height four is a pyramid of height three

[515:02]

which I can implement per line 16 just

[515:06]

draw me a pyramid of height n minus one

[515:08]

and then I myself will take the trouble

[515:10]

to print the fourth and final row. But

[515:14]

something's missing in this code. Let me

[515:16]

go ahead and try running it. Let's see

[515:18]

what happens. Make, oh, darn it, I meant

[515:21]

to call this something else. So I'm going

[515:22]

to do this: I'm going to close this

[515:24]

version here. I'm going to rename

[515:26]

iteration.c to recursion.c to make clear

[515:29]

that this version is completely

[515:31]

different. Let me now go ahead and make

[515:33]

the recursion version. And huh, Clang is

[515:36]

noticing that I have screwed up. On line

[515:38]

14, it says error: all paths through

[515:41]

this function will call itself. And

[515:43]

Clang doesn't even want to let me

[515:44]

compile this code because that would

[515:46]

mean literally just forever

[515:48]

loop effectively by calling yourself. So

[515:51]

what am I missing in my code here? If I

[515:53]

open up what we're now calling

[515:54]

recursion.c

[515:56]

in my editor,

[515:58]

what's missing here over here? Yeah, I'm

[516:01]

missing a base case. And I can express

[516:03]

this in a few different ways, but I

[516:04]

would propose that before I do any

[516:06]

drawing of anything at all, let's just

[516:08]

ask ourselves if there is anything to

[516:10]

draw. So, how about if n equals zero,

[516:14]

well then don't do anything, just

[516:16]

return. You don't return a value. When

[516:18]

your return value is void, it means you

[516:19]

don't return anything. So you just

[516:21]

return period or return semicolon. Or

[516:24]

just to be super safe, I could actually

[516:26]

do something like this, which is

[516:27]

arguably better practice just in case I

[516:29]

get into this perverse scenario where

[516:31]

someone hands me a negative number. I

[516:32]

want to be able to handle that and not

[516:34]

print anything either. So just to be

[516:36]

safe, I might say less than or equal to

[516:38]

zero. I'm not doing one because if I did

[516:41]

do one, then I would want to at least

[516:43]

myself print out one brick, which is

[516:44]

fine, but I'd have to, like, change

[516:46]

all of my code a little bit. So I think

[516:48]

it's safer if my base case is just if n

[516:52]

is less than or equal to zero, you're

[516:54]

done. Don't do anything. And this then

[516:56]

ensures that even though thereafter I

[516:58]

keep calling draw again and again and

[517:01]

again and the problem's getting smaller

[517:02]

and smaller from four to three to two to

[517:04]

one, as soon as I hit zero, the function

[517:07]

will finally

[517:09]

return.

[517:11]

So let's go ahead and open up my

[517:12]

terminal. Rerun make recursion to make

[517:14]

this version. It did compile this time.

[517:16]

./recursion, Enter. Let's type in four,

[517:19]

cross my fingers, and this too prints the

[517:22]

exact same thing. And even though it

[517:23]

doesn't look like fewer lines of code I

[517:25]

would offer that there's an elegance to

[517:27]

what I've just done whereas with the

[517:28]

iterative version with all the loops it

[517:30]

was very clunky like step by step just

[517:32]

print this and print that and have a

[517:33]

nested loop inside of another but with

[517:35]

this especially if we distill it into

[517:37]

its essence by getting rid of my

[517:39]

comments like this and frankly I can get

[517:42]

rid of the unnecessary curly braces only

[517:44]

because for single lines in

[517:46]

conditionals you don't need them. Like

[517:48]

this is arguably like a very beautiful

[517:50]

implementation of drawing Mario's

[517:52]

pyramid even though it's calling itself

[517:54]

and arguably because it is calling

[517:57]

itself.
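
Put together, the recursive draw described above might look like this. It is a sketch based on the spoken walkthrough; as an assumption made here, main and the CS50 get_int prompt are omitted so the block needs only the standard library.

```c
#include <stdio.h>

// Recursive version: a pyramid of height n is a pyramid of
// height n - 1 plus one more row.
void draw(int n)
{
    // Base case: nothing to draw (also guards against negative heights)
    if (n <= 0)
    {
        return;
    }

    // Recursive case: print a pyramid of height n - 1
    draw(n - 1);

    // Print one more row of n bricks
    for (int i = 0; i < n; i++)
    {
        printf("#");
    }
    printf("\n");
}
```

Because the recursive call comes before the row is printed, the smallest row is printed first, so draw(4) produces the same four rows as the loop-based version.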

[517:58]

Questions then on this idea of recursion

[518:01]

or this implementation of Mario? Yeah.

[518:05]

>> Are there no scope issues involved if

[518:07]

you like?

[518:09]

>> Good question. Are there any scope

[518:11]

issues involved? Short answer, no.

[518:13]

However, the current value of i, for

[518:16]

instance, will not be visible to the

[518:18]

next time the function is called. It

[518:19]

will have its own copy of i, if that's

[518:22]

what you mean. And we'll next week talk

[518:24]

in more detail about what's going on

[518:25]

here. And in fact, I probably can't

[518:27]

break this in class very easily. But it

[518:30]

turns out if I use a very large value

[518:33]

for height, let's just hit a lot of

[518:34]

zeros and see what happens. That was too

[518:36]

many. Let's see what happens. That's

[518:38]

also too many. Let's see what happens

[518:40]

there.

[518:42]

That's the first time at least I in

[518:43]

class have encountered this error. You

[518:45]

might have encountered this weird bug in

[518:47]

office hours or in your problem set and

[518:48]

that's fine if you did. We'll talk about

[518:49]

what this means next week too. But this

[518:51]

is bad. Like this clearly hints at a

[518:53]

problem in my code. However, the

[518:56]

iterative version of this program would

[518:57]

not have that same error. So this

[518:59]

relates to something involving memory

[519:01]

because it turns out as a little teaser

[519:03]

for next week, each time I call draw,

[519:05]

I'm using a little more memory, a little

[519:06]

more memory, a little more memory, a

[519:08]

little more memory, and my computer only

[519:09]

has so much memory. This program in its

[519:11]

current form is using too much memory.

[519:13]

There are workarounds to this, but that

[519:15]

is a trade-off to the elegance we're

[519:17]

gaining in this solution. So, what's the

[519:19]

point of all this? And how did we get

[519:21]

sidetracked by Mario? There's another

[519:22]

sorting algorithm. The third and final

[519:24]

one that we'll consider today that

[519:26]

actually uses recursion to solve the

[519:28]

problem not only elegantly arguably, but

[519:31]

also way faster somehow than bubble sort

[519:34]

and selection sort. And in essence, it

[519:36]

does so by making far fewer comparisons

[519:38]

and wasting a lot less work. It doesn't

[519:42]

keep comparing the same numbers again

[519:43]

and again. Here in its essence is the

[519:46]

pseudocode for merge sort. Sort the

[519:49]

left half of the numbers, sort the right

[519:51]

half of the numbers, then merge the

[519:52]

sorted halves. And this is kind of a

[519:54]

weird implementation of an algorithm

[519:56]

because I'm not really telling you

[519:57]

anything. It seems like you're asking me

[519:59]

how do I sort numbers and I say, well,

[520:00]

sort the left half, sort the right half.

[520:02]

It's like someone being difficult. And

[520:03]

yet implicit in this third line is

[520:06]

apparently some magic. This notion of

[520:08]

merging halves that are somehow already

[520:10]

sorted is actually going to yield a

[520:13]

successful result. As an aside, we're

[520:15]

actually going to need one base case

[520:16]

here, too. So, if you're only given one

[520:18]

number, you might as well quit right

[520:19]

away because there's nothing to do. So,

[520:21]

we'll toss that in there as well. And

[520:23]

base cases are often for zero or one or

[520:25]

some small-sized problem. In this

[520:28]

case, it's a little easier to express it

[520:29]

as one because if you have one element,

[520:31]

it's indeed already sorted. So, what

[520:34]

does it mean to merge two sorted halves?

[520:36]

Well, let's actually consider this. I'm

[520:37]

going to reuse some of these same

[520:38]

numbers here. I'm going to put my one,

[520:41]

my three, my four, and my six on the

[520:44]

left. And these together represent a

[520:46]

list that is indeed sorted of size four.

[520:50]

And then I'm going to put four other

[520:51]

numbers on the right there that are

[520:52]

similarly sorted as well. And by merging

[520:56]

these two lists, I mean start at the

[520:59]

left end of this list, start at the left

[521:01]

end of this list, and just decide one

[521:02]

step at a time which number is the next

[521:05]

smallest. And then I'm going to put it

[521:06]

on the top shelf to make clear what is

[521:07]

sorted. So if my left hand's pointing at

[521:09]

this list, my right hand's pointing at

[521:11]

there, which hand is obviously pointing

[521:13]

to the smaller element, left or right?

[521:15]

Like the right. So I'm going to grab

[521:17]

this and I'm going to use a little more

[521:18]

space up top here and put the zero in

[521:20]

place. And then I'm going to point to

[521:21]

the next element there. So my left hand

[521:23]

has not moved yet. It's still pointing

[521:24]

at the one. My right hand is pointing at

[521:25]

the two. Which number comes next?

[521:27]

Clearly left. So, I'm going to grab the

[521:29]

one and put it up there and update where

[521:31]

my left hand is pointing. So, now I'm

[521:33]

pointing at the three here and the two

[521:35]

there. What comes next? Obviously the

[521:36]

two. What comes next? Obviously the

[521:39]

three. What comes next? Obviously the

[521:42]

four. What comes next? Obviously the

[521:45]

five. But notice my hands are not going

[521:48]

back and forth, back and forth, back and

[521:49]

forth like any of the algorithms thus

[521:50]

far. I'm just taking baby steps, moving

[521:53]

them only to the right, effectively

[521:55]

pointing at for a final time each number

[521:58]

once and only once. What comes next?

[522:00]

Six. And now my left hand is done. What

[522:02]

comes last? The number seven. So what I

[522:04]

just did is what I mean by merge the

[522:07]

sorted halves. If you can somehow get

[522:09]

into a scenario where you've got a small

[522:12]

list sorted and another small list

[522:14]

sorted, it's super easy now to merge

[522:16]

them together using that left right

[522:18]

approach, which I'll claim only takes n

[522:21]

steps. Why? Because every time I asked

[522:23]

you a question, I was taking one bite

[522:24]

out of the problem. There's eight bites

[522:26]

total. I asked you eight questions or I

[522:28]

would have if I verbalized them all. So,

[522:30]

it's n steps total to merge lists of

[522:32]

that size. So, what then is merge sort?
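
The left-hand/right-hand merging just acted out can be sketched in C as follows. This is an illustration under assumptions, not code from the lecture: the function name merge, its parameters, and the separate output array out (standing in for the top shelf) are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

// Merge two already-sorted arrays into out, which must have room
// for left_len + right_len ints. Names are illustrative assumptions.
void merge(const int left[], size_t left_len,
           const int right[], size_t right_len, int out[])
{
    assert(left != NULL && right != NULL && out != NULL);

    size_t i = 0; // "left hand": next unmerged element of left
    size_t j = 0; // "right hand": next unmerged element of right
    size_t k = 0; // next free slot in out

    // Repeatedly take the smaller of the two elements being pointed at.
    while (i < left_len && j < right_len)
    {
        if (left[i] <= right[j])
        {
            out[k++] = left[i++];
        }
        else
        {
            out[k++] = right[j++];
        }
    }

    // One list is exhausted; copy whatever remains of the other.
    while (i < left_len)
    {
        out[k++] = left[i++];
    }
    while (j < right_len)
    {
        out[k++] = right[j++];
    }
}
```

Each index only ever moves to the right, so every element is copied exactly once, which is where the n steps for merging come from.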

[522:35]

Merge sort is really all three of these

[522:38]

steps together only one of which we've

[522:40]

acted out. Two of which are sort of

[522:42]

cyclical in nature. They're recursive by

[522:44]

design. So what does this mean? Well,

[522:46]

let's start with this list of eight

[522:48]

numbers which is clearly out of order. 6

[522:50]

3 4 1 5 2 7 0. And let's apply merge sort

[522:54]

to this set of numbers. And I'll do it

[522:55]

digitally here because it'll take

[522:57]

forever to keep moving the numbers up

[522:58]

and down physically. So let's move it to

[523:00]

the top just to give ourselves a little

[523:01]

bit more room. And let me propose that

[523:04]

we apply merge sort. What was the very

[523:06]

first step in merge sort? At least that

[523:08]

we highlighted the juicy steps.

[523:11]

What's the first step in merge sort?

[523:14]

Sort the left half. Yeah. And then the

[523:16]

second step was going to be sort the

[523:17]

right half. And then the third step was

[523:19]

going to be merge the sorted halves. So

[523:20]

let's see what this means by actually

[523:22]

acting it out on these numbers. So

[523:23]

here's my eight numbers. Let's go ahead

[523:25]

and sort the left half. Well, the left

[523:27]

half is obviously going to be the four

[523:29]

numbers on the left. And I'm just going

[523:30]

to pull them out just to draw our

[523:32]

attention to them over here. Now I have

[523:34]

a list of size four and the goal is to

[523:37]

sort the left half. How do I sort a list

[523:39]

of size four?

[523:43]

>> Uh, well, yes, but just be more pedantic

[523:46]

like how do I sort any list using merge

[523:49]

sort

[523:50]

>> sort the left half. So let's do just

[523:52]

that. So of a list of size four how do I

[523:55]

sort this? Well I'm going to sort the

[523:56]

left half. How do I sort a list of size

[523:58]

two?

[524:00]

>> Sort the left half. All right. Well I'm

[524:01]

just going to write the six here. How do

[524:03]

I sort a list of size one?

[524:06]

I just don't. I'm done. That was the

[524:07]

so-called base case where I just said

[524:09]

return. Like I'm done sorting the list.

[524:12]

Okay, so here's the story recap.

[524:15]

Sort the left half. Sort the left half.

[524:18]

Sort the left half. And I just finished

[524:20]

sorting this. So what comes next? Sort

[524:23]

the right half, which is this. And now

[524:26]

I've sorted the left half of the left

[524:29]

half of the left half, which is a big

[524:32]

mouthful. But what do I do as a third

[524:34]

and final step when sorting this list of

[524:36]

size two? Merge them. This part we know

[524:39]

how to do. I point left and right. And I

[524:41]

now take the smallest element first,

[524:43]

which is the three. Then I take the six.

[524:45]

And now this list of size two is sorted.

[524:47]

So if you rewind in your mind's eye,

[524:49]

what step are we on? Well, we have now

[524:51]

sorted the left half of the left half.

[524:54]

So what comes after the left half is

[524:57]

sorted? We sort the right half. So we're

[524:59]

sort of rewinding in time, but that's

[525:01]

okay. I'm keeping track of the steps in

[525:03]

my mind. I want to now sort this list of

[525:06]

size two. How do you sort a list of size

[525:08]

two? Well, you divide it into a list of

[525:10]

size one. How do you sort this? You're

[525:12]

done. You then take the other right half

[525:14]

and you sort it. Done. Now you merge the

[525:16]

two sorted halves. So I point at the

[525:18]

four and the one. Obviously the one

[525:19]

comes first, then the four. Now I have

[525:21]

sorted the right half of the uh the

[525:24]

right half of the left half of the

[525:28]

original numbers. What's the next step?

[525:29]

Now that I have the left and right

[525:31]

halves of this list of size four sorted,

[525:35]

merge those. So same idea but with fewer

[525:37]

elements. I'm pointing at the three and

[525:38]

the one. Obviously the one comes first. Now

[525:40]

I'm pointing at the three and the four.

[525:41]

Obviously the three comes next. Pointing

[525:43]

at the six and the four. The four comes

[525:44]

next. And now the six comes last. Now I

[525:47]

have sorted the left half. And it's

[525:49]

intentional that 1 3 4 6 is the original

[525:52]

arrangement of the lighted numbers I had

[525:54]

on the shelves a moment ago. All right,

[525:56]

it's a long story it seems. But what

[525:57]

comes after sorting the left half of

[526:00]

the original list? You sort the right

[526:02]

half. So let's put those

[526:04]

numbers over here. How do I sort a list

[526:06]

of size four? Well, you sort the left

[526:08]

half. How do you sort this thing of size

[526:09]

two? You sort the left half. You sort

[526:11]

the right half. And now you merge those

[526:13]

together. How do I now sort the right

[526:16]

half of the right half? Well, I sort the

[526:19]

left half. I sort the right half. And

[526:22]

then I merge those together. Now I have

[526:25]

sorted the left half and the right half

[526:28]

of the right half of the original

[526:30]

elements. What's next? The merging 0 2 5

[526:34]

and 7. Now we're exactly where we were

[526:37]

originally with the lighted numbers.

[526:38]

I've got 1 3 4 6, the left half, sorted, and

[526:41]

0 2 5 7, the right half, sorted. What's the

[526:43]

third and final step? Merge those two

[526:46]

halves. Of course: 0 1 2 3 4 5 6 and 7.

[526:54]

And hopefully, even though there were a lot

[526:55]

of words coming out of my mouth as I was

[526:56]

acting this out, there wasn't a lot of

[526:58]

back and forth. I definitely wasn't

[527:00]

walking back and forth physically,

[527:01]

and I also wasn't comparing the same

[527:03]

numbers again and again. I was doing sort

[527:05]

of different work at different

[527:06]

conceptual levels, but that was only,

[527:09]

what, three levels total. It wasn't n

[527:11]

levels on the board visually. So where

[527:14]

does this get us with merge sort?

[527:16]

Well, with merge sort, it would seem

[527:18]

that we have an algorithm that I claim

[527:21]

is doing a lot less work. The catch,

[527:23]

though, is that merge sort requires

[527:24]

twice as much space, just as we saw when

[527:26]

I needed two shelves in order to merge

[527:29]

those two lists. So, how much less work

[527:32]

is actually going to be possible? Well,

[527:33]

let's consider sort of the analysis of

[527:36]

the original list and how we might

[527:38]

describe its running time in terms

[527:40]

of this big O notation. Hopefully, it's

[527:42]

not going to be as bad as n^2

[527:44]

ultimately. So, here are some like

[527:46]

breadcrumbs that if I hadn't kept

[527:47]

updating the screen and deleting numbers

[527:49]

once we moved them around, here are sort

[527:51]

of like traces of every bit of work that

[527:53]

we did. We started up here. We did the

[527:55]

left half, the left half of the left

[527:57]

half, the right half of the right half,

[527:58]

and then everything else in between. And

[528:00]

you'll see that essentially I took a

[528:02]

list of size eight and I did three

[528:05]

different passes through it. At this

[528:07]

conceptual level, at this conceptual

[528:09]

level, and at this one. And each time I

[528:10]

did that, I had to merge elements

[528:12]

together. And if you kind of think about

[528:13]

it here, I pointed at four elements here

[528:16]

and four elements here. And in total, I

[528:18]

pointed at eight elements. So there were

[528:20]

n steps here for merging. And if you

[528:22]

trust me, I'll claim that on this level

[528:24]

conceptually, there were also eight

[528:25]

steps. I wasn't merging lists of size

[528:27]

four, but I was merging two lists of

[528:29]

size two over here and two more lists of

[528:31]

size two over there. So if you add those

[528:33]

up, those are n total steps or

[528:35]

merges, if you will. And then down here,

[528:37]

this was sort of kind of silly,

[528:39]

but I was merging ultimately eight

[528:42]

single-element lists all together into the

[528:45]

next level up conceptually.

[528:48]

So from a list of size eight we sort of

[528:49]

had three levels of work and on each

[528:53]

level we did n steps of merging. So

[528:56]

where does three come from? Well, it turns out if you

[528:57]

have eight elements up here the

[528:59]

relationship between 8 and three is

[529:01]

actually something formulaic and we can

[529:02]

describe it as log base 2 of n. Why?

[529:05]

Because if n is eight, if you don't mind

[529:06]

doing some logarithms here, log base 2

[529:09]

of 8 is the same thing as log base 2 of

[529:10]

2 to the third power. The log base 2 and the 2

[529:13]

cancel each other out, which gives you

[529:14]

exactly the number three that I sort of

[529:18]

visualized with those traces on the

[529:20]

screen. Which is to say irrespective of

[529:23]

the specific value of n the big O

[529:25]

running time of merge sort is apparently

[529:28]

not n^2, but log n times n, or more

[529:34]

conventionally n * log n, because you're

[529:37]

doing n things log n times. Technically it's

[529:40]

base 2, but we don't care about that

[529:42]

generally for big O notation. And indeed,

[529:43]

in big O notation we would say that

[529:46]

merge sort is on the order of n log n.

[529:49]

That's its big O running time, sort of at

[529:51]

the upper bound. What about the lower

[529:52]

bound? Well, there's no clever

[529:54]

optimization in our current

[529:55]

implementation as there was for bubble

[529:57]

sort. And so it turns out the lower

[529:59]

bound would be in omega of n log n, and

[530:02]

in theta, therefore, of n log n as well,

[530:04]

because big O and omega are in fact in

[530:07]

this case one and the same. And if we

[530:08]

actually go back to our visualization

[530:10]

from earlier, give me just a moment to

[530:12]

pull that up here. In our earlier

[530:14]

implementation or an earlier

[530:16]

demonstration of these algorithms, we

[530:19]

had a side-by-side comparison of all the

[530:21]

comparisons. But here, if I go ahead and

[530:23]

randomize it and click merge sort,

[530:25]

you'll see a very different and clearly

[530:27]

faster algorithm. The

[530:29]

computer's speed has not changed, but it's

[530:31]

touching these elements so many fewer

[530:34]

times. It's wasting a lot less time

[530:36]

because of this cleverness where it's

[530:38]

instead dividing and conquering the

[530:39]

problem into smaller and smaller and

[530:41]

smaller pieces. And to give this a final

[530:44]

flourish since that was yes faster but

[530:46]

not necessarily obviously faster than

[530:48]

other things that we've done. How might

[530:50]

we actually compare these things side by

[530:53]

side by side? Well, in our final moments

[530:55]

together, let's go ahead and

[530:56]

dramatically and for no real reason just

[530:58]

dim the lights so that I'll hit play on

[531:00]

a visualization that at the top is going

[531:02]

to show you selection sort with a bunch

[531:04]

of random data. On the bottom is going

[531:06]

to show you bubble sort with a

[531:08]

bunch of random data. And in the middle

[531:09]

is going to show you merge sort. And the

[531:11]

takeaway ultimately for today is the

[531:13]

appreciable feel of difference between

[531:17]

big O of N^2 and now big O of N log N.

[531:37]

[Music]

[532:28]

All right. The music just makes sorting

[532:30]

more fun. But that's it for today. We

[532:31]

will see you next time.

[533:52]

All right. This is CS50 and this is week

[533:55]

four, the week in which we take off the

[533:57]

proverbial training wheels that have

[533:59]

been the CS50 library and reveal to you

[534:01]

all the more what's going on underneath

[534:03]

the hood of a computer in terms of its

[534:04]

memory. We'll also talk about files and

[534:06]

how you can actually persist information

[534:08]

for a long time, whether it's a file

[534:09]

you've downloaded or today that you've

[534:11]

created yourself. But first, I just

[534:13]

wanted to share some artwork that two of

[534:14]

your classmates, Avery and Marie, kindly

[534:17]

made before class, which is a picture

[534:20]

made out of Post-it notes, some

[534:22]

green, some purple, which collectively

[534:25]

from where you are looks like what?

[534:28]

>> Yeah. So indeed it's a cat that they

[534:31]

made using only zeros and ones or green

[534:34]

and purple pieces. And in fact, even

[534:36]

though this is fairly low resolution in

[534:38]

that it only has a few pixels this way

[534:40]

and a few pixels this way, it's actually

[534:42]

representative of how computers do

[534:44]

actually store images underneath the

[534:47]

hood. So let's actually start there. In

[534:49]

fact, we've had this bowl of stress

[534:50]

balls for some time here on the

[534:52]

lectern. And if we take a beautiful photo

[534:53]

of it, they look a little something like

[534:55]

this. Of course, this too is a finite

[534:57]

resolution. And by resolution, I just

[534:59]

mean how many dots go horizontally and

[535:02]

how many dots go vertically. Multiply

[535:03]

those two together and you get some

[535:05]

number of bytes, maybe in kilobytes,

[535:07]

megabytes, or heck, if it's a massive

[535:09]

image, it could be even bigger than

[535:10]

that. But it is in fact finite. And if

[535:12]

we zoom in on this image, you start to

[535:14]

see a little more detail. But at the

[535:16]

same time, if you keep zooming in, you

[535:18]

start to see indeed that there's only

[535:19]

finite detail. And when we get really

[535:22]

zoomed in, you start to see actual dots

[535:24]

or pixels as they're called. In fact, on

[535:26]

most any screen, any image you look at,

[535:29]

if you look close enough by pulling your

[535:31]

phone up to your eyes or walking really

[535:33]

close to a TV, you may very well see the

[535:35]

same thing because any image on a screen

[535:36]

like this is represented by hundreds,

[535:39]

thousands, millions of tiny little dots

[535:41]

called pixels. And each of those pixels

[535:43]

has a color that gives it collectively

[535:46]

the appearance of stress balls in this

[535:47]

case or cats in this case. So in fact

[535:50]

among the things we're going to do this

[535:52]

week in the problem set is actually have

[535:53]

you write code via which you can

[535:55]

manipulate your own images, not only

[535:58]

to understand what's going on underneath

[535:59]

the hood but to apply some of today's

[536:01]

most familiar filters so to speak. In

[536:04]

fact if we go all the way down here

[536:06]

you'll see that this image of course is

[536:07]

multiple colors. We've got some white

[536:09]

and some red and shades in between. But

[536:10]

let's keep things simple for a moment

[536:12]

and propose that instead of looking at

[536:13]

these dots, we look at these zeros and

[536:16]

ones. And let me propose that in a

[536:18]

picture like this, any zero will be

[536:20]

interpreted as black. Any one will be

[536:23]

interpreted as white accordingly. If you

[536:26]

can see it, what is this a picture of?

[536:32]

>> Oh, smiley face is in fact right.

[536:34]

Because if you kind of focus only on the

[536:37]

zeros and try to ignore those ones, as I

[536:40]

can do here for you, you'll see that

[536:41]

embedded in that image was in fact this

[536:43]

smiley face. Now, this would be a sort

[536:46]

of one bit image. You either have a zero

[536:48]

or one representing each of the colors.

[536:50]

In modern times, we would actually use

[536:52]

16 bits per color, 24 bits per color,

[536:54]

maybe even more. And that's how we can

[536:56]

get every color of the rainbow instead

[536:57]

of just something black and white. But

[536:59]

in effect, what's happening here is that

[537:00]

if you did have a file on your Mac or PC

[537:03]

or phone storing this pattern of zeros

[537:05]

and ones and you opened it up in some

[537:06]

kind of image program or like the photos

[537:08]

app, it would be depicted to you

[537:10]

visually as this simply a grid X and Y

[537:14]

where some of the dots are white, some

[537:15]

of the dots are black. All right,

[537:18]

so with that said, what kinds of

[537:22]

representations might be involved here?

[537:25]

Well, we can actually rewind to week

[537:26]

zero. Recall that we talked briefly

[537:28]

about RGB, which just means red, green,

[537:30]

and blue, which is one of the most

[537:31]

common ways to represent colors inside

[537:33]

of a computer. And if any of you have

[537:35]

ever dabbled with Photoshop or similar

[537:37]

editing programs, or if maybe in high

[537:39]

school or earlier you made your own web

[537:40]

pages, odds are you're actually familiar

[537:42]

with a syntax we're going to see a lot

[537:44]

of today. This doesn't add anything

[537:46]

intellectually new. It's just an

[537:47]

introduction to a common convention for

[537:49]

how else we can represent numbers. So,

[537:51]

this is a screenshot of Photoshop's

[537:53]

color picker. Photoshop being a popular

[537:55]

program for editing photos and files.

[537:57]

And you'll see here that my selected

[538:00]

color looks to the human eye as black.

[538:02]

And I've highlighted here how I got

[538:03]

that. I chose black by typing in 0 0 0.

[538:09]

Which also, if you look up here, means

[538:11]

that I want zero red, zero green, and

[538:13]

zero blue. And yet, we somehow

[538:16]

translated it to six zeros instead of

[538:18]

just three. Well, if we take a look at

[538:19]

another color like white instead, I

[538:22]

claim that you can represent white in

[538:24]

Photoshop and today in code with FFFFFF

[538:28]

or equivalently 255 red, 255 green, 255

[538:33]

blue. And here, if you think back to

[538:34]

week zero, is maybe a hint at where we're

[538:36]

going with this. If you're using an 8-bit

[538:38]

number, that means you can count

[538:41]

from zero on up to 255. So recall that

[538:44]

255 is like the biggest number you can

[538:46]

represent with just eight bits. And yet

[538:48]

somehow there's going to be a

[538:50]

relationship between the 255s and these

[538:52]

Fs that we see down here. Let's just run

[538:54]

through a few more. If we wanted to

[538:55]

represent something like red, we're

[538:57]

going to use FF0000. If we want to

[539:00]

represent green, we're going to use

[539:03]

00FF00. And lastly, to represent blue,

[539:06]

we're going to use

[539:09]

0000FF. So what's going on here? And why do

[539:11]

we have just this different convention?

[539:12]

Well, turns out in the context of images

[539:15]

and also memory in general, it's just

[539:17]

human convention or programmer

[539:19]

convention to use this alternate

[539:21]

representation of numbers. Not the

[539:22]

so-called decimal system, but another

[539:24]

one that's not all that far off from

[539:26]

what we've been doing over the past few

[539:27]

weeks. So, here again was the binary

[539:29]

system. You've got just two digits in

[539:31]

your vocabulary, 0 and one. Here is the

[539:33]

familiar decimal system where you've got

[539:35]

10 instead, 0 through 9. Suppose we

[539:38]

wanted a few more digits. Well, we're

[539:40]

sort of out of Arabic numerals here, but

[539:42]

I could toss into the mix like A, B, C,

[539:45]

D, E, and F, either in lowercase or

[539:47]

uppercase. And in fact, that's what

[539:50]

computer scientists do when they want to

[539:51]

have more than just 10 digits available

[539:53]

to them, but as many as 16 digits

[539:56]

available. And in fact, when you want to

[539:58]

use this many digits, you call it

[540:01]

hexadecimal, implying that you've got 16

[540:03]

digits, aka base 16. Now, there's

[540:06]

an infinite number of base systems. We

[540:08]

could do base 3, base 4, base 15, base

[540:10]

17 on up. But this is just one of the

[540:12]

relatively few conventions that are

[540:14]

popular in computing. And let's just

[540:16]

tease it apart because we're going to

[540:17]

see these kinds of numbers a lot. Well,

[540:19]

thankfully, like in week zero, it's

[540:21]

the same old number system with which

[540:22]

you're familiar with the columns and the

[540:24]

placeholders. It's just the bases in

[540:27]

those columns mean a little something

[540:28]

different. So instead of using powers of

[540:30]

two or powers of 10, we're going to

[540:32]

today use powers of 16. So 16 to the 0

[540:35]

of course is 1. 16 to the first power is

[540:38]

16. So we have the ones column, the

[540:41]

16s column, and so forth. Meanwhile, if

[540:44]

we wanted to therefore start counting in

[540:46]

hexadecimal, this two-digit number in

[540:49]

hexadecimal is of course the number you

[540:51]

and I know in decimal as 0, because it's

[540:54]

still just 16 * 0 + 1 * 0. This in

[540:57]

hexadecimal is how you would represent

[540:59]

one, but you would say zero one or oh one instead

[541:02]

of just one to make clear there's two

[541:04]

digits. This would be 02 03 04 05 06 07 08

[541:09]

09. Now things get a little interesting.

[541:11]

In the decimal world, we're about to

[541:13]

carry the one and give ourselves two

[541:15]

digits, 1 and 0. But in hexadecimal, you

[541:18]

can keep going. So the next numbers in

[541:20]

hexadecimal are going to be 0A 0B 0C 0D

[541:24]

0E 0F. And now things get interesting

[541:26]

again. What probably comes after 0F,

[541:29]

even if you've never seen hex before?

[541:32]

>> So, 1 0. You still carry the

[541:35]

one as before. This goes back to zero.

[541:36]

And why is this now appropriate? Well,

[541:39]

how many digits, or rather how many

[541:41]

numbers did we just count through? Well,

[541:42]

we started at 00. We went up through

[541:46]

0F. And that's a total of 16

[541:49]

combinations. So, the highest we

[541:51]

counted, let me rewind: this number

[541:53]

here, of course, is going to be 1 * F.

[541:55]

But what is F? Well, let's rewind

[541:57]

further. In fact, let's have our little

[541:58]

cheat sheet here. If we want to have

[542:00]

these digits at our disposal, I dare say

[542:03]

that 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14

[542:11]

15. So F is just going to represent

[542:13]

the number 15. So if we now fast forward

[542:15]

back to where we were just counting from

[542:17]

zero on up through 0A through 0F, we

[542:21]

land here. This of course is 16 * 0 + 1 *

[542:25]

F, which is 1 * 15. So this is how in

[542:27]

hexadecimal you would represent the number

[542:29]

15. This in hexadecimal is how you would

[542:31]

represent the number 16 instead. 15 to

[542:35]

16. This is not "ten." That's how you would

[542:37]

pronounce it in decimal. This is 1 0 in

[542:39]

hexadecimal because 16 * 1 + 1 * 0 gives

[542:42]

us of course 16. Now we could do this

[542:44]

toward infinity, but we won't. 1 2, 1 3,

[542:48]

dot dot dot, all the way up to FF. So

[542:52]

quick mental math: 16 * F + 1 * F, that is to

[542:55]

say 16 * 15 + 1 * 15, is? Any guesses?

[543:02]

>> It is in fact 255. You don't even have

[543:04]

to do the math because if you just think

[543:05]

about where we were going with this,

[543:07]

indeed we saw pairs of Fs in the

[543:09]

Photoshop screenshots, because this is

[543:11]

how a computer would represent the

[543:12]

number you and I know in decimal as 255

[543:15]

by just using two Fs. So why do we care

[543:17]

about hexadecimal? Well, it turns out that

[543:19]

it's just convenient to use two

[543:21]

hexadecimal digits to represent numbers

[543:24]

because a single hexadecimal digit can be

[543:27]

used to represent four bits at once. For

[543:30]

instance, let me go ahead and explode

[543:31]

this by putting a little bit of space

[543:32]

between the two digits here. And let's

[543:34]

consider how you would represent f.

[543:36]

Well, if f is 15 and you want to

[543:38]

represent 15 in binary, I think that's

[543:40]

just going to be 1 1 1 1.

[543:43]

Now, why is that? Well, one in the

[543:45]

eights place plus one in the fours

[543:48]

place plus one in the twos place

[543:51]

plus one in the ones place indeed gives

[543:53]

me 15. So using a single F I can count

[543:55]

up as we've seen already as high as 15.

[543:58]

But of course I've claimed in the past

[543:59]

that it's super common to use eight bits

[544:01]

at a time, or one byte, to represent any

[544:04]

value because that's just a very useful

[544:06]

common unit of measure. And so in

[544:08]

hexadecimal, if you wanted to represent

[544:11]

four ones, you can say F. If you want to

[544:13]

represent another four ones, you can

[544:14]

just say F, which is to say that F and F

[544:18]

together is just like the same as eight

[544:20]

ones together, which is how we finally

[544:22]

get to the total number of 255 because

[544:25]

this is the ones place, the twos place,

[544:27]

the fours place, the 8s, 16, 32, 64,

[544:30]

128. But if you group these into

[544:33]

clusters of four bits alone, you can

[544:35]

represent all of the possibilities from

[544:37]

0 through 15 just using 0 through F. So

[544:40]

with one hex digit you can represent

[544:42]

four bits, which is a long way of saying

[544:44]

it's just convenient for that reason,

[544:46]

which is why the world tends to use hex

[544:47]

when talking about colors and as we'll

[544:50]

see memory as well. So in fact let's

[544:53]

consider what is meant by memory and

[544:56]

what's going on inside of the computer

[544:58]

when we've been storing values thus far.

[544:59]

Well, here's that canvas of memory. I

[545:01]

proposed last time, and before, that

[545:04]

we can sort of

[545:05]

number these bytes arbitrarily but

[545:07]

reasonably. This is byte 0 1 2 3 4 5 6 7

[545:10]

dot dot dot and maybe this is byte 15.

[545:13]

That's fine. Nothing wrong with that.

[545:14]

But in the real world, any programmer

[545:16]

would actually think of these locations

[545:18]

instead not in decimal notation but in

[545:20]

hexadecimal notation, just because

[545:22]

it's convenient for the reasons

[545:24]

discussed. So we would actually number

[545:25]

these from zero on up through 9 and then

[545:28]

keep going with a b c d e f and so

[545:33]

forth. So what does that mean for the

[545:36]

other digits? Well, this would be 1 0.

[545:38]

This would be 1 1. This would be 1 2 dot

[545:40]

dot dot. Here now is 1 9. But here's 1

[545:43]

A, 1 B, 1 C, 1 D, 1 E, 1 F, and so

[545:47]

forth, just using hexadecimal notation.

[545:49]

But there's arguably some ambiguity

[545:50]

here. For instance, if you just at a

[545:52]

glance were to look at this board and

[545:53]

see this address 1 0, is that byte 10 or

[545:58]

is that byte 16? It's just non-obvious

[546:02]

because if you don't know what base

[546:04]

system you're working in, which you

[546:05]

could infer by looking at the rest of

[546:06]

it, it could potentially be ambiguous.

[546:08]

So in the world of hexadecimal, it's super

[546:10]

common to literally prefix any number

[546:12]

you ever write in hexadecimal notation

[546:15]

with 0x. The zero doesn't mean anything

[546:18]

per se, nor the x. It just means what

[546:20]

follows the 0x is a number in hexadecimal

[546:22]

notation, which makes unambiguous the

[546:24]

fact that this is 0x10, which if you do

[546:27]

the math in decimal again ends up being

[546:29]

16 not of course the number 10. In short

[546:33]

today you're about to see a lot of

[546:34]

0x's and a lot of two-digit or four-digit

[546:37]

or eight-digit numbers in hexadecimal notation.

[546:39]

Generally we don't care what the numbers

[546:41]

translate to. You don't need to do a lot

[546:43]

of math, but it's going to be

[546:45]

commonplace to see syntax like this. All

[546:47]

right, back to sort of normal time. So,

[546:50]

here is a line of code, int n = 50,

[546:53]

wherein we might want to declare a

[546:55]

variable called n and store a number

[546:56]

like 50 in it. Let's actually go ahead

[546:58]

and do this, simple now as it probably is,

[547:00]

in a file called, how about, addresses.c.

[547:03]

We're going to play around with computer

[547:05]

addresses. And in addresses.c, I'm going

[547:07]

to do something super simple at first,

[547:09]

whereby I'm going to include

[547:11]

stdio.h. Then I'm going to go ahead and

[547:13]

write int main(void). No command-line

[547:16]

arguments here. And then I'm going to

[547:17]

declare this variable n, set it equal to

[547:20]

the arbitrary but familiar value of 50.

[547:22]

And then just so that this program does

[547:23]

something mildly useful, let's go ahead

[547:25]

and print out with %i and a

[547:27]

backslash n that value of n. So nothing new

[547:31]

here. I'm just literally going through

[547:33]

the motions of declaring a variable and

[547:35]

printing its value. So let's do that.

[547:36]

make addresses, Enter,

[547:39]

./addresses. And hopefully I'll indeed see

[547:42]

the number 50. So, not all that much

[547:44]

going on in the code, but let's consider

[547:45]

what's going on in the computer's

[547:47]

memory. This line of code and the one

[547:49]

after it is giving the results of that

[547:51]

program, but where is that n ending up?

[547:53]

Well, here's my grid of memory. And

[547:55]

let's just suppose for the sake of

[547:56]

discussion that the 50 ends up down

[547:58]

here. Maybe there's other things going

[547:59]

on in my program. So, this part of my

[548:01]

computer's memory is already in use. So,

[548:02]

it's reasonable that it could end up in

[548:04]

this location here. But what is

[548:06]

important is this: how many bytes am I

[548:08]

using for n? Apparently,

[548:10]

>> four. And that's because we've said

[548:12]

integers tend to be four bytes aka 32

[548:14]

bits. So this is at least to scale even

[548:16]

though I'm just imagining where it ends

[548:18]

up in memory. So that's where the 50

[548:20]

actually ends up. So when I actually

[548:22]

call printf and pass in n, clearly the

[548:26]

computer is going to that location in

[548:28]

memory and actually printing out that

[548:30]

value. But that value is indeed at a

[548:34]

specific memory address. It's not going

[548:36]

to be quite as simple as 0x0 or 0x1 or

[548:39]

a small number typically. It maybe is

[548:42]

going to be something arbitrary like

[548:43]

0x123, where I'm just making this up.

[548:45]

It's an easily pronounceable number in

[548:47]

hexadecimal notation. All right. So what

[548:51]

can I use that information for? Well,

[548:54]

thus far this hasn't been useful to us,

[548:56]

but certainly programs we've been

[548:58]

writing have actually been making use of

[549:00]

this. But with a bit more syntax, I can

[549:02]

actually start to see things like this,

[549:03]

not just on the screen, but in code. In

[549:06]

fact, let me propose that we introduce

[549:08]

two new operators in C. So, two new

[549:10]

pieces of syntax. One is a single

[549:12]

ampersand and one is a single asterisk.

[549:15]

And we'll see that the asterisk has a

[549:17]

few different uses, but the ampersand has

[549:20]

a very simple straightforward one, which

[549:22]

is to just get the address of a variable

[549:26]

in memory. So if you've got a variable

[549:27]

like n, if you prefix it with an ampersand,

[549:30]

&n, you can actually ask the computer at

[549:33]

what address is this variable stored.

[549:35]

You can find out if it's indeed 0x123 or

[549:38]

something else altogether. So in fact,

[549:40]

let me go ahead and do this by going

[549:41]

back to my addresses.c program and let's

[549:44]

see if we can print out not the value,

[549:46]

which is obviously going to be 50, but

[549:48]

let's actually print out the address

[549:49]

thereof. So up here in my code, I'm

[549:52]

going to change the n on line six to be

[549:55]

&n instead. And I'm going to go

[549:59]

ahead and make one other change because

[550:01]

yes, n lives at an address. And yes,

[550:04]

that address is technically a number,

[550:06]

but it's conventional not to use

[550:08]

%i to display that number, but rather

[550:10]

another piece of syntax, which is just a

[550:12]

new format code, which you don't often

[550:14]

need. This is more demonstrative than

[550:15]

useful, I would say. But %p is

[550:18]

going to be what we use when we want to

[550:19]

print out an address of something in the

[550:22]

computer's memory. So, back to VS

[550:24]

Code. One more change. I'm going to

[550:25]

change my %i to %p

[550:28]

instead. So, at this moment, we should

[550:31]

see a version of the program that's not

[550:33]

going to display 50 anymore, but

[550:35]

something like 0x123, but probably a

[550:38]

bigger number than that because my computer

[550:39]

has way more memory than that address

[550:41]

suggests. So, let's again make

[550:42]

addresses. Let's run ./addresses.

[550:45]

And indeed, this variable at that moment

[550:47]

in time apparently lives somewhere in

[550:49]

the computer's memory at address

[550:52]

0x7FFD3C34ECC. All of those are

[550:56]

hexadecimal digits. It would be painful to

[550:58]

do the mental math to figure out what

[550:59]

the numeric address is. But we're seeing

[551:01]

it indeed in this common hexadecimal

[551:03]

notation which is not going to be often

[551:05]

useful for us as humans. But the

[551:08]

computer is and has been using this

[551:11]

information for some time. So in fact

[551:13]

what we're about to introduce is

[551:16]

admittedly one of the more complicated

[551:17]

concepts in computing and in C in

[551:20]

particular namely a topic called

[551:21]

pointers. And I will say today more so

[551:23]

than ever might feel like a bit of a

[551:25]

fire hose. In fact, all these years

[551:26]

later, I still remember the day in which

[551:29]

I finally understood this topic, which

[551:31]

was not the day of the lecture in which

[551:33]

it was introduced, but it was in like

[551:35]

the back right corner of the Eliot

[551:36]

House dining hall. I was sitting down

[551:37]

during office hours with my teaching

[551:39]

fellow and he finally helped that light

[551:40]

bulb go off over my head. So, if some of

[551:42]

this feels a little arcane today, it

[551:44]

just comes with time and with practice

[551:46]

like everything else. So, what is a

[551:48]

pointer? A pointer is going to be a

[551:50]

variable that can store an address. Now,

[551:54]

yes, that address is technically just a

[551:55]

number, like an integer, but we

[551:56]

distinguish between integers that we

[551:58]

care about like 50 and things we might

[552:00]

do math on, and a pointer, which in this

[552:02]

case is just going to be the address of

[552:04]

a variable, the address of a value in

[552:08]

memory. So, what does this mean? Well,

[552:10]

we can start to do things like this. I

[552:13]

can declare my variable n as before and

[552:15]

set it equal to the value 50. But I can

[552:17]

actually get the address of n and put

[552:20]

that address in another variable. And

[552:22]

that variable we now call a pointer. So

[552:24]

P is going to be the name of this

[552:26]

variable. It's going to store the

[552:27]

address of N which we can get using the

[552:29]

ampersand. But there's one more piece of

[552:31]

syntax which I promised before. This

[552:33]

asterisk here. And the asterisk here

[552:36]

means that this variable P stores the

[552:40]

address of an integer, not an actual

[552:43]

integer per se. It's weird looking

[552:45]

syntax. It kind of looks like

[552:46]

multiplication, but it isn't. It's just

[552:48]

the developers of C decades ago decided

[552:50]

to use an asterisk, even though it's

[552:52]

admittedly nonobvious what it's doing.

[552:54]

But in this context, when you see an

[552:55]

asterisk right after a data type like

[552:57]

int, it just means that the variable in

[553:00]

question is not going to be an int per

[553:01]

se, but an address of an integer. Okay,

[553:06]

so let's put this to the test using a

[553:08]

line of this code in my own file here.

[553:12]

Let me propose that we do this. Let me

[553:14]

go back to VS Code here. Let me

[553:16]

introduce this additional variable int

[553:19]

star p as it's typically pronounced. Set

[553:21]

that equal to ampersand n and then do the

[553:25]

exact same thing as before. Let's not

[553:26]

print out ampersand n but let's actually

[553:28]

print out the value of p itself because

[553:31]

p is now equivalent to ampersand n. So

[553:34]

let me go back to VS Code. Let me do

[553:37]

make addresses again. And huh, I did

[553:40]

something wrong and stupid here. This

[553:42]

was not meant to be the moral of the

[553:43]

story. What did I do wrong? Yeah.

[553:47]

>> Yeah, I just missed the semicolon. So,

[553:48]

still making those mistakes here. All

[553:50]

right. And let me clear my screen again

[553:51]

and do make addresses, Enter, dot slash addresses.

[553:55]

And now I should indeed see the address

[553:58]

of N, I just so happen to temporarily

[554:01]

store it this time inside of a variable

[554:03]

called P. Now, just so you've seen it,

[554:05]

it turns out that when using this syntax

[554:08]

of using a star to declare a so-called

[554:11]

pointer and ampersand over here to get

[554:13]

the address of something, you might see

[554:15]

in online references and such different

[554:16]

formattings of this. This is the

[554:18]

canonical way to declare a pointer. Int

[554:21]

space, then the star, then without a

[554:24]

space, the name of the variable.

[554:25]

However, it will work and you will

[554:27]

sometimes see that the star is over here

[554:30]

or the star is in the middle. But again,

[554:32]

we would recommend stylistically that it

[554:34]

just go here. Admittedly, I think it

[554:35]

would have been clearer if the

[554:37]

star were over here, making clear that

[554:39]

it's related more to the int than it is

[554:41]

to the variable name. But this is simply

[554:43]

the convention. So this means, hey

[554:45]

computer, give me a variable called p

[554:47]

that's going to store the address of an

[554:50]

integer. And the ampersand is just

[554:51]

saying, hey computer, tell me the

[554:53]

address of n. And it's the compiler and

[554:57]

computer itself that decided where to

[555:00]

put that variable in memory. Questions.

[555:07]

>> Would you get an error if you didn't put

[555:08]

the asterisk? You would. And let's take

[555:10]

a look. So, let me go ahead and clear my

[555:11]

terminal. Let me go ahead and delete the

[555:14]

star before the variable p. Now, let me

[555:16]

go ahead and do make addresses again.

[555:18]

And indeed, I'm getting an error.

[555:20]

Incompatible pointer to integer

[555:22]

conversion initializing int dot dot dot.

[555:24]

And even though that's a lot of big

[555:26]

words, it kind of says what it means.

[555:28]

You're trying to go from a pointer on

[555:31]

the right to an integer on the left,

[555:33]

which is just not appropriate here. Yes,

[555:35]

at the end of the day, they're all

[555:36]

numbers, but it's more properly a

[555:38]

pointer or an address on the right, but

[555:40]

a little old int now incorrectly on the

[555:43]

left. So, the fix there is just to

[555:44]

indeed put it back. Other questions on

[555:47]

this new syntax? Yeah.

[555:52]

you do like

[555:55]

>> indeed. To recap the question, can you

[555:57]

use the address-of operator to find the

[555:58]

address of other data types like

[556:00]

strings? Absolutely. And we'll do that

[556:01]

with a couple of examples today as well.

[556:03]

We're just using ints to keep it super

[556:04]

simple initially. Other questions on

[556:06]

these addresses and pointers.

[556:08]

>> So we still use

[556:10]

variables even if they're not integers.

[556:13]

Is that right?

[556:14]

>> Correct. Correct. Even if it's not an

[556:16]

int in question, we'll come back to other

[556:18]

data types in a little bit. You're still

[556:19]

going to use the star. That is the same

[556:21]

syntax for everything.

[556:23]

And yes,

[556:24]

>> can you tell the computer I want to

[556:26]

store these variables in this address?

[556:28]

>> Oh yes. Can you tell the computer you

[556:30]

want to store a variable in this

[556:32]

address? That's where we're going in

[556:33]

just a bit. Indeed. Now that we have the

[556:35]

ability to find out the address of

[556:37]

something in memory, stands to reason

[556:39]

that we can go to that address ourselves

[556:41]

and maybe poke around and actually put

[556:43]

values there. And in fact, that's

[556:45]

among our goals for today. So let's

[556:47]

consider how we might get there. So here

[556:49]

now is my canvas of memory and let me

[556:51]

propose that the number 50 happened to

[556:53]

get stored in the variable n down there

[556:55]

at bottom right just because and that's

[556:57]

probably 0x123 or in reality a much

[557:00]

larger address but it's easier and

[557:01]

quicker for us to just pretend it's at

[557:03]

0x123.

[557:05]

What is actually happening in code when

[557:08]

I declare P and put a value there? Well,

[557:12]

recall a moment ago I declared P to be a

[557:15]

pointer to an integer. that is the

[557:17]

address of an integer. So what's

[557:18]

happening in memory is this. If n is

[557:20]

down here and happens to be at address

[557:22]

0x123

[557:24]

when I actually assign p to ampersand n

[557:29]

that just literally takes that address

[557:31]

of n and puts it inside of p. Now p as

[557:35]

an aside happens to be pretty big. It

[557:36]

turns out by convention on most systems

[557:38]

a pointer that is a variable that stores

[557:40]

an address is actually going to be eight

[557:42]

bytes large. It's going to be 64 bits.

[557:45]

Why is that? Our computers have so much

[557:46]

darn memory nowadays in the gigabytes

[557:48]

that you need to be able to count higher

[557:50]

than 4 billion. As an aside, if you only

[557:52]

used 32 bits for your pointers, you

[557:55]

could only count recall as high as 4

[557:57]

billion. 4 billion uh is 4 gigabytes

[558:00]

equivalently. That would mean your

[558:02]

computers could not have 8 gigabytes of

[558:04]

memory, 16 gigabytes of memory. Your

[558:06]

servers couldn't have tens of gigabytes

[558:07]

of memory. We use 64 bits or eight

[558:10]

bytes nowadays for pointers because our

[558:12]

computers have that much more memory.

[558:14]

All right. So what is P storing?

[558:16]

Literally just an address like this. So

[558:18]

when we wrote this code just a moment

[558:20]

ago, what the computer did and has been

[558:22]

doing for the past several weeks is

[558:24]

literally just finding the location of N

[558:27]

in memory and plopping that value inside

[558:30]

of P which itself is taking up a bit of

[558:33]

memory but or uh by convention more

[558:35]

memory 8 bytes in this case. The thing

[558:38]

is who really cares about this level of

[558:40]

detail? Typically, as programmers, it's

[558:42]

useful to understand what's going on,

[558:44]

but rarely are we going to care

[558:46]

precisely about where things are in

[558:48]

memory. Today is really about just kind

[558:49]

of looking at what's going on underneath

[558:51]

the hood. So, in fact, we can abstract

[558:53]

away most of my computer's memory, I

[558:54]

would propose, because at the moment,

[558:56]

all we care about is P existing and N

[558:58]

existing. So, who really cares what else

[559:00]

is going on? And frankly generally I am

[559:02]

not going to care that N is at address

[559:05]

0x123 just that it is at an address that

[559:10]

happens to be 0x123. And so the way a

[559:12]

programmer or computer scientist when

[559:14]

talking about design on like a

[559:15]

whiteboard or frankly in sections and

[559:17]

office hours on a whiteboard we rarely

[559:19]

care what the actual addresses are. So

[559:20]

we generally abstract the specific

[559:22]

address away and literally represent

[559:25]

pointers with arrows on the screen or on

[559:28]

the whiteboard or the like. This just

[559:30]

means that P is a variable that points

[559:32]

to the number 50 in memory.

[559:36]

Okay. Questions on this mental model for

[559:38]

what a pointer is. It's a pointer in

[559:41]

like very much the literal sense.

[559:46]

Okay. So, if you're on board with that,

[559:50]

let me propose that we consider now um

[559:53]

what these things look like maybe more

[559:55]

physically. In fact, we've got a

[559:56]

couple of mailboxes here to make clear

[559:58]

with a little metaphor that uh here is a

[560:01]

physical representation of our variable

[560:04]

say P labeled as such. Inside of this is

[560:08]

presumably going to be the address of

[560:10]

some actual value. That value at the end

[560:12]

of the story is going to be the value of

[560:14]

N which recall for consistency is that

[560:16]

address 0x123.

[560:18]

So what happens when you actually try to

[560:22]

uh locate a value in memory is analogous

[560:25]

to sort of looking up something inside

[560:27]

of these mailboxes which if you think of

[560:29]

your computer's memory as hundreds or

[560:31]

thousands of little mailboxes maybe more

[560:33]

apartment style where you've just got

[560:35]

rows and columns of mailboxes as opposed

[560:37]

to individual ones for single family

[560:38]

homes. Each of those mailboxes can

[560:41]

contain the address of some value in

[560:44]

memory. And so what's really happening

[560:46]

is that if this is P, not drawn to scale

[560:48]

because they only make mailboxes so

[560:50]

large. Inside of P is going to be an

[560:53]

address like 0x123. And just to be

[560:55]

dramatic since there's a big football

[560:56]

game this weekend, uh here is a Harvard

[560:59]

foam finger metaphorically like this

[561:01]

pointer is like pointing at that value

[561:03]

over there. And in fact, we're going to

[561:05]

see as you asked a moment ago, can we

[561:07]

actually go to an address in memory? We

[561:09]

don't yet have the syntax for that, but

[561:11]

we're about to. Yes, you can. And in

[561:13]

fact, if I follow what I'm pointing at,

[561:14]

open up this location in memory, voila,

[561:18]

there is the 50 in question. So, anytime

[561:21]

we're talking about values or we're

[561:23]

talking about the addresses thereof, you

[561:25]

can think of it analogously as being

[561:26]

like physical mailboxes, one of which

[561:28]

might contain a useful number like 50,

[561:30]

one of which might contain the address

[561:32]

of that value. And we now have the

[561:34]

syntax we'll see to actually go from one

[561:36]

to the other. Let me actually go back

[561:38]

into VS code here which in the most

[561:40]

recent version of my program what I was

[561:42]

doing was getting the address of N and

[561:44]

storing it in P and then I was literally

[561:46]

printing out P itself and that's when we

[561:48]

saw the big hexadecimal number that is

[561:50]

generally not useful but it's maybe

[561:52]

interesting to see that one time. Let me

[561:54]

instead though introduce another use of

[561:56]

that star or asterisk operator that

[561:59]

allows us as was asked a moment ago to

[562:01]

actually go to that address. So in this

[562:03]

version of my program, I'm going to keep

[562:05]

N equal to 50. I'm going to keep P equal

[562:08]

to the address of N. But what I'm now

[562:11]

going to do is show you how

[562:12]

syntactically I can print out not P, but

[562:17]

N, but by using P, following the

[562:20]

proverbial uh foam finger metaphor by

[562:23]

printing out percent i backslash n and

[562:25]

printing out N instead. Now, obviously,

[562:27]

I could cheat and just say N and print

[562:29]

out N like in version one, but that

[562:31]

doesn't really demonstrate anything

[562:32]

interesting here. However, if I only

[562:35]

have P at this point in the story, it

[562:37]

turns out you can use the star for

[562:39]

another purpose. If you simply prefix

[562:42]

your variable name with a star, that is

[562:45]

the so-called now dereference operator,

[562:48]

which means go to the address in P. So

[562:51]

if I now open up my terminal here, do

[562:53]

make addresses for this version, then

[562:55]

dot slash addresses and Enter, I now get

[562:58]

back the number 50. So what's really

[563:00]

happening in line five, as has been true

[563:02]

for several weeks now, we have a

[563:05]

variable called n being initialized to

[563:08]

the number 50. Then on my next line six,

[563:11]

I'm declaring p as an address of some

[563:14]

value, an integer specifically, and

[563:16]

putting the address of n in there

[563:18]

exactly. And then on line seven, I'm

[563:20]

actually saying print out an integer

[563:22]

percent I as we've done for weeks. But

[563:24]

what integer? Go to the address in P and

[563:29]

print out what you find there. So that's

[563:31]

equivalent again to the foam finger

[563:33]

which is over there pointing at the

[563:35]

address I actually want to print

[563:37]

out instead.

[563:40]

Okay. So

[563:43]

usefulness. Well, I think we can get

[563:44]

there by taking a look at one of our

[563:46]

little white lies that we've been

[563:48]

telling. In fact, let's turn our

[563:49]

attention to strings, which up until now

[563:51]

have been a sequence of characters in

[563:52]

the computer's memory. A string is a

[563:54]

thing in programming more generally, but

[563:56]

in C, it technically doesn't exist by

[563:58]

this name. But you can still use strings

[564:00]

in C, but just not by calling them

[564:03]

string as the actual data type. But let's

[564:06]

let's start with our familiar code here.

[564:09]

Let me go into addresses.c. Let me add

[564:11]

our training wheels in for now and

[564:13]

include cs50.h

[564:15]

because in this version of my addresses

[564:17]

program, what I want to do is declare a

[564:19]

string s and I'm going to set it equal

[564:21]

to hi exclamation point. Then as we

[564:24]

did in week one, let's go ahead and

[564:26]

print out with percent s backslash n

[564:28]

that value of s. So nothing new, nothing

[564:30]

interesting here. So let me just do it

[564:32]

quickly and do make addresses then dot

[564:34]

slash addresses and we see hi on the screen. So

[564:37]

that has all been something we've been

[564:38]

taking for granted. But let's consider

[564:41]

what is going on underneath the hood of

[564:42]

even that program. So the string we've

[564:45]

declared in memory exists somewhere in

[564:47]

the computer's canvas of memory. So

[564:49]

string s equals hi might end up

[564:51]

somewhere down here. And I'm going to

[564:52]

stop drawing all of the boxes when not

[564:54]

necessary. But here we have hi

[564:55]

exclamation point. And as we discussed

[564:57]

two weeks ago, the null character, N-U-L,

[565:01]

which just means the string stops here.

[565:03]

So as a quick refresher, even though the

[565:05]

word is three characters, it takes up

[565:07]

how many bytes? Four. always because you

[565:10]

need that null terminator. All right, so

[565:12]

maybe that string could be accessed then

[565:15]

by its name S. And we've seen this

[565:16]

before. S bracket zero is the first

[565:18]

character. S bracket 1 2 and then if you

[565:21]

want to poke around, you can go into S

[565:22]

bracket 3, but you'll probably see quote

[565:24]

unquote null on the screen or the

[565:26]

compiler will sort of the computer will

[565:27]

sort of remind you that you don't really

[565:29]

want to look there at that point. So,

[565:31]

three characters accessible via this

[565:33]

array syntax. But we know now that

[565:36]

everything in the computer's memory is

[565:37]

addressable. And maybe that H just so

[565:41]

happened to end up at 0x123 and the i

[565:44]

ends up at 0x124, with the rest at 0x125 and 0x126 respectively.

[565:48]

Doesn't matter what these numbers are,

[565:49]

but because strings are sequences of

[565:52]

characters back to back to back in

[565:53]

memory, it must be the case that these

[565:56]

addresses are themselves contiguous back

[565:58]

to back to back without gaps inside of

[566:00]

them. That's how a string has always

[566:02]

been stored in memory. It's just an

[566:04]

array of characters. All right, so with

[566:07]

that said, what really is S? We've

[566:10]

thought of S in every program we've used

[566:12]

strings in before as just a string. Like

[566:14]

that is the sequence of characters or

[566:15]

really it's the name of an array. But

[566:17]

that's a bit of a white lie because what

[566:20]

S really is is going to be a more

[566:23]

specific value. Take a guess what is

[566:26]

actually going to be the value in S.

[566:32]

>> Yeah, the address of if I may that

[566:35]

array. So we've got like sort of four

[566:37]

possible answers here. A, B, C, and D.

[566:39]

Multiple choice. Which of those numbers

[566:41]

probably makes sense to store in the

[566:44]

variable called S in order to get to

[566:47]

this string? What what is S's value?

[566:50]

Yeah.

[566:54]

>> 0x123

[566:55]

is correct. So we don't talk about this

[566:57]

in like week one because like it's

[566:59]

already hard to like remember semicolons

[567:00]

in week one. Like god forbid start

[567:02]

thinking about like what these specific

[567:03]

addresses are. S is a string, yes. But

[567:06]

technically S is and has been since week

[567:08]

one a pointer. The address of an array

[567:12]

of characters in memory. The address

[567:15]

specifically of the first character in

[567:18]

memory which is sufficient. Why? Because

[567:20]

of this null terminating convention that

[567:23]

we talked about weeks ago that tells the

[567:25]

computer where the string ends. The

[567:27]

pointer tells the computer where the

[567:29]

string begins. And that's how you get

[567:32]

using just numbers, zeros and ones

[567:35]

inside of a computer to store something

[567:37]

as interesting as an actual string. So

[567:41]

in fact, let's take a closer

[567:43]

look at this. In fact, let me go into uh

[567:46]

VS Code again and just for the sake of

[567:48]

discussion, let me declare S as before,

[567:50]

but instead of printing out uh the whole

[567:52]

string at once, let's go ahead and do

[567:54]

this. printf uh quote unquote percent p

[567:58]

backslash n

[568:00]

and then let's print out s itself

[568:03]

initially to see whether it's actually

[568:05]

0x123 or presumably a much bigger number

[568:07]

then after that let's print out another

[568:09]

pointer another address rather percent p

[568:13]

backslash n,

[568:15]

and now I'd like to print out the

[568:18]

address of the first character of s but

[568:21]

let's let's not get ahead of ourselves

[568:23]

let me go ahead and make addresses and dot

[568:25]

slash addresses. Okay, there now in this hi

[568:28]

program is the address at which the

[568:31]

string itself is stored.

[568:33]

0x5a7143027004.

[568:36]

So bigger than 0x123. Well, let's now

[568:39]

poke around. What if I were to do this?

[568:41]

What if I want to print out the address

[568:43]

of how about the first character in that

[568:47]

string? Well, at the moment, recall that

[568:49]

s bracket zero is literally the first

[568:52]

character. That is a char. So with what

[568:54]

syntax could I get the address of the

[568:57]

first character?

[568:58]

Well, we haven't learned all that much

[569:00]

that's new today. It's just a single

[569:02]

amperand that will get me the address of

[569:04]

that character. If I do this for the

[569:06]

next character, I can see one after

[569:10]

another. And in fact, this is going to

[569:11]

have four characters in total, including

[569:13]

the null character. So let me copy

[569:15]

paste, which is generally frowned upon,

[569:17]

but not for a lecture demo because we're

[569:18]

just trying to do this quickly. Let's

[569:20]

print out the address of S itself. and

[569:22]

then more specifically the address of

[569:24]

S's first character, the address of S's

[569:26]

second character, third, and the address

[569:29]

of that null terminator. All right,

[569:31]

let's go back into make addresses. Let

[569:34]

me go ahead and clear my terminal and

[569:35]

dot slash addresses. And we see if I

[569:38]

zoom in on my terminal here, the

[569:40]

following. S itself contains

[569:46]

0x56199bd00004.

[569:48]

And the address of the first character

[569:50]

in S, aka S bracket zero, is exactly the

[569:53]

same thing. The next character, the I in

[569:56]

hi is one byte away. The exclamation

[569:58]

point is one more byte away. And the

[570:00]

null terminator is one more byte away.

[570:03]

So again, bigger numbers, but the point

[570:04]

is these are indeed just the actual

[570:07]

addresses of all of these characters in

[570:10]

memory. All right, let me pause for any

[570:12]

questions here. Yeah,

[570:15]

>> why do you need a reference specific

[570:19]

but not S?

[570:20]

>> Good question. Why do I need the

[570:22]

amperand before the specific characters

[570:24]

in S but not S itself? Think what S

[570:28]

actually is. I'm claiming for the moment

[570:29]

that S itself is the address of that

[570:32]

whole string which just so happens by

[570:35]

design to be equivalent to the address

[570:37]

of the first character because that is

[570:39]

the convention humans came up with

[570:40]

decades ago to represent a string. Now

[570:42]

you might think that you need the

[570:43]

address of every character in the

[570:45]

string. But no, that's why humans

[570:46]

decades ago decided to just terminate

[570:48]

every string in memory with the

[570:50]

backslash zero or null terminator

[570:52]

because if you give me the beginning of

[570:53]

the string and the end, I can obviously

[570:54]

with a loop find everything else in

[570:57]

between. Other questions? No. All right.

[571:02]

Well, what is then this actual thing in

[571:06]

memory? Well, it turns out that S is

[571:09]

yes, a string as we've been describing

[571:11]

it. It turns out that yes, S is a string

[571:13]

as we've been describing it all this

[571:15]

time. But technically, I think we're

[571:17]

ready to reveal what little white lie

[571:19]

we've been telling or if you will, what

[571:20]

abstraction S actually is in the CS50

[571:23]

library. The type you know as string

[571:26]

since week one all this time has simply

[571:29]

been a synonym for char star s this is

[571:35]

where

[571:37]

maybe so what does this really mean well

[571:39]

we saw int star p earlier here we're

[571:43]

seeing char star s but what does that

[571:45]

really mean well s is the name of the

[571:46]

variable and yes it's a string but what

[571:48]

is it really s is the address of a char

[571:52]

and so in week one of the course in the

[571:53]

actual CS50 library. We've told this

[571:55]

little white lie by just creating a

[571:56]

synonym in the library that makes char

[571:59]

star so to speak the exact same thing as

[572:02]

string s t r i n g just so that we don't

[572:04]

have to think about this level of detail

[572:05]

let alone hexadecimal notation and

[572:07]

addresses and pointers and dereferencing

[572:08]

and all of this complexity in the first

[572:10]

weeks of the course. It simply abstracts

[572:12]

away what the char what a string

[572:15]

actually is. And in fact we've seen this

[572:18]

technique before in a more complicated

[572:20]

way. In fact, if you recall a couple

[572:22]

lectures uh last week, we actually

[572:24]

claimed that you could create a phone

[572:25]

book for instance using uh persons and

[572:27]

persons have names and numbers and we

[572:29]

created our own type by saying

[572:32]

typedef and that type was a whole

[572:34]

structure which is the complexity part a

[572:36]

structure containing a name and a number

[572:37]

and we gave that data type ultimately

[572:39]

the keyword person. So we've already

[572:41]

invented in class our own make-believe

[572:44]

data types to create things that didn't

[572:46]

come with C itself like a person. Well,

[572:48]

the struct is very specific to what we

[572:50]

were trying to do with the phone book,

[572:52]

but typedef is more generally useful

[572:54]

because it literally allows you to

[572:55]

define your own type. So, for instance,

[572:57]

if we wanted to create a synonym for

[573:00]

int because we never remember what it is

[573:02]

and call it integer instead, you could

[573:04]

simply say typedef int integer.

[573:08]

And that would create in your

[573:10]

programming environment a data type

[573:12]

called integer that is literally

[573:13]

equivalent to int. Now, this is not all

[573:15]

that useful. So instead in the CS50

[573:17]

library, we do use typedef to tell

[573:20]

the computer that char star should instead

[573:23]

be spelled as string semicolon. And that

[573:28]

just means that string ever after is the

[573:31]

same thing as saying char star. So all

[573:33]

of this time since week one, I could

[573:35]

have been doing exactly that if I

[573:37]

wanted. And in fact, if I go back to VS

[573:39]

Code here, let's simplify this quite a

[573:41]

bit and go back to the very first

[573:43]

version of the program wherein I use

[573:45]

percent s and just print out s's

[573:48]

value itself, the string hi. Well,

[573:50]

this of course is going to work as

[573:52]

always as follows. It's just going to

[573:54]

print out hi on the screen. But now,

[573:57]

if I get rid of the CS50 library and try

[573:59]

to recompile this, notice we'll get an

[574:01]

error that I think I've seen before.

[574:04]

Here we have if I scroll up to the very

[574:07]

first line use of undeclared identifier

[574:09]

string did I mean standard in and no I

[574:12]

don't and no I didn't a couple weeks ago

[574:13]

when I accidentally did that but the

[574:16]

compiler does not know about the keyword

[574:18]

string at the moment. Well that's fine

[574:20]

even if I don't have the CS50 library

[574:22]

installed on this computer. I can just

[574:24]

get rid of the word string which is a

[574:26]

concept but not a keyword in C and just

[574:29]

rename it to char star. And now in my

[574:31]

terminal window, I can do make addresses

[574:33]

again, dot slash addresses, and voila,

[574:35]

we're back in business with no CS50

[574:37]

training wheels whatsoever because

[574:40]

printf knows given a char star, go to

[574:43]

that address, print, print, print, print

[574:44]

until you get to the null terminator,

[574:46]

and then stop printing. There's a loop

[574:48]

in there that does exactly that.

[574:52]

questions

[574:54]

on char star or what a string actually

[574:57]

now is.

[575:01]

>> Yeah. In front.

[575:13]

>> Good question. How does printf know to

[575:14]

keep going until it gets to the null?

[575:16]

the format code because I've been using

[575:19]

percent s which means print a string

[575:21]

instead of percent c which means print a

[575:24]

single character. printf sees that

[575:26]

percent s and it was like oh I should

[575:27]

use a loop to print out all of the

[575:30]

characters until the null terminator if

[575:32]

I instead passed in just percent c it

[575:34]

would stop after a single character

[575:36]

>> okay that makes sense

[575:38]

>> other questions

[575:45]

>> Good question. Why don't I dereference

[575:47]

S in order to print it out? So, let me

[575:50]

try that for just a moment here. Why do

[575:52]

I not have to now or any week prior do star S

[575:57]

here? Because after all, if S is the

[575:58]

string, I want to go to the string and

[576:01]

print it out. Well, the first answer is

[576:02]

that printf is doing this for you

[576:04]

because it's being handed the address

[576:06]

and it is going to the address for you.

[576:08]

So, that star is somewhere in printf's

[576:10]

implementation. But this is also

[576:12]

incorrect conceptually because yes s is

[576:15]

the string but more technically today s

[576:18]

is the address of the first character in

[576:21]

the string. So I really want to provide

[576:24]

printf in this case with the address

[576:26]

not the specific character because I

[576:29]

want it to treat it as a string not a

[576:32]

single character indeed. So I could use

[576:34]

the percent s if I change to percent uh

[576:36]

I could use star s if I change to

[576:38]

percent c to print out the single

[576:40]

character. All right. So let's play

[576:43]

around just syntactically for just a

[576:44]

moment here in VS code. Let me propose

[576:46]

that we still use char star s here and

[576:49]

then just demonstrate exactly what's

[576:50]

going on. So I'll do exactly what was

[576:52]

just asked. So I'll use percent c and

[576:54]

then I'm going to go ahead and print out

[576:55]

for now our old week 2 syntax treating s

[576:59]

as an array. So s bracket zero, s

[577:01]

bracket one and s bracket 2. And I'm

[577:03]

using some copy paste just for time

[577:05]

sake. This of course is not going to do

[577:07]

anything all that interesting, but it is

[577:08]

going to demonstrate that indeed we have

[577:10]

h i exclamation point back to back to

[577:13]

back in memory. And if I really want um

[577:15]

I could print it all on one line by

[577:17]

getting rid of of course those new

[577:18]

lines. But what more can I do with this

[577:20]

syntax? Well, I could take literally the

[577:22]

fact that s is the address of the first

[577:25]

character in memory. So instead of using

[577:27]

this array notation which we introduced

[577:29]

in week two, I could technically go to

[577:33]

the address of S. Why? Well, S is the

[577:37]

address of the first character of the

[577:39]

string. Star S means go to that address.

[577:42]

And voila, you're at the first character

[577:45]

by definition of what S is. So I could

[577:48]

print out the first character using

[577:50]

*s instead of s[0]. How could

[577:54]

I do this? Well, here's where we can

[577:56]

actually take advantage of the fact that

[577:59]

pointers and addresses more generally

[578:01]

are in fact numbers and you can actually

[578:05]

do arithmetic on pointers themselves. In

[578:08]

other words, there is a concept known as

[578:09]

pointer arithmetic which means given an

[578:10]

address, you can add to it, subtract from

[578:12]

it. (C itself won't let you multiply or

[578:13]

divide a pointer, and that would be

[578:14]

weird in most cases anyway.) We could

[578:17]

certainly add numbers to an address. So

[578:19]

for instance, if I want to print out the

[578:21]

second character of S, that's kind of

[578:24]

equivalent to going to S but then moving

[578:27]

over one character. So maybe I should do

[578:29]

a little bit of pointer arithmetic and

[578:31]

do s + 1 in parentheses just so that

[578:34]

like in math class we do order of

[578:36]

operations correctly. And then down here

[578:39]

I could go to S again. But wait a

[578:41]

minute, I want to go to S plus two

[578:44]

characters away or two bytes away. So

[578:46]

now I can do make addresses down here.

[578:49]

Oh, and I did mess up. Oh, new mistake.

[578:53]

Unintentional.

[578:56]

Yep, I forgot my parenthesis on the very

[578:58]

end here. So that was just user error.

[579:00]

make addresses again, ./addresses.

[579:03]

And now I indeed see h i exclamation

[579:06]

point one more time using pointer

[579:08]

arithmetic instead of our familiar array

[579:10]

notation. So what is that array

[579:12]

notation? It's what we would generally

[579:13]

call syntactic sugar, which is a very

[579:15]

weird way of saying like it's just nicer

[579:17]

syntax. Like no one wants to write code

[579:19]

that looks like this. It sort of, you

[579:20]

know, bends the mind a little bit to

[579:22]

read and parse all of this visually.

[579:23]

Just s[0] is much more

[579:25]

straightforward. But what it's really

[579:26]

doing is this. And the computer is

[579:30]

essentially converting that bracket

[579:31]

notation for us into this more esoteric

[579:34]

but correct version instead.

[579:37]

All right. What else can I do? Well,

[579:39]

just for fun, for some definition of

[579:41]

fun, let's go ahead and print out three

[579:43]

different strings. And recall that a

[579:45]

string is a sequence of characters that

[579:47]

starts at some address. So, let's first

[579:49]

print out the sequence of characters

[579:51]

that starts at s. Let's next print out

[579:54]

the sequence of characters that starts

[579:55]

at s + 1. And let's lastly print out

[579:58]

the string that starts at s + 2. Just

[580:00]

playing around with the definition of

[580:02]

what these pointers are. Let me do make

[580:04]

addresses.

[580:06]

And oh, not my day.

[580:09]

What did I forget? Semicolon. So if it

[580:12]

happens to you, it happens to me, too.

[580:13]

make addresses, ./addresses. And now

[580:16]

this one's going to be a little curious.

[580:18]

But I see HI!, then I!, and then just an exclamation

[580:22]

point. Why? Because I'm treating a

[580:24]

string literally as what it is, a

[580:25]

sequence of characters, but I'm giving

[580:27]

printf the address of the first

[580:29]

character initially, then of the second

[580:31]

character, then of the third. But all

[580:33]

three of those statements work because

[580:35]

all three of them happen to be

[580:37]

terminated by the same null character.

[580:39]

Even though "I!" and the exclamation point

[580:41]

alone was not really my intention, that

[580:44]

doesn't stop me from being able to do it

[580:46]

nonetheless.

[580:49]

All right. Well, let's do one other

[580:51]

maybe, uh, application of this idea.

[580:54]

Let me propose that we

[580:56]

take a look at our computer's memory

[580:57]

here and let's suppose that we want to

[580:59]

start uh comparing values because in

[581:01]

week one we did a lot of that and we

[581:03]

even in week zero we did a lot of that

[581:04]

with if and else if and else and so

[581:06]

forth. So let's make this a little more

[581:07]

real and also reveal why last week we

[581:10]

had to solve an unexpected problem using

[581:13]

another string function, namely strcmp,

[581:15]

s-t-r-c-m-p. So here, for instance, are two

[581:18]

arbitrary variables in memory, i and j,

[581:20]

and I gave them both the value of 50 and

[581:22]

maybe they indeed end up there each of

[581:24]

them taking up four bytes. Last time

[581:26]

recall that we weren't able to compare

[581:29]

two values in memory just by using the

[581:32]

== operator unless those values

[581:34]

last time were actually integers. In

[581:37]

fact let's do that. Let me go back into

[581:39]

VS Code here, close out addresses, and

[581:42]

let's code up maybe another version of

[581:44]

my compare program from the

[581:46]

past. This time I am going to use the

[581:48]

CS50 library just to keep things simple

[581:51]

initially. I'm going to include both it

[581:53]

and the standard IO library here. I'm

[581:55]

going to give myself main with no

[581:56]

command line arguments. And then in main

[581:58]

I'm going to declare exactly what we

[582:00]

just saw on the screen. A variable i set

[582:02]

to 50, a variable j set to 50. And then

[582:05]

we're going to do our old familiar

[582:06]

syntax from week one. If i ==

[582:09]

j, then let's go ahead and print out

[582:11]

something like "Same\n". Else,

[582:14]

let's go ahead and print out quote

[582:15]

unquote "Different\n". So

[582:19]

super simple program that simply

[582:21]

compares two variables that yes are

[582:23]

obviously going to be the same, but

[582:24]

let's do this. So let's do make compare

[582:27]

./compare. They're in fact the same.

[582:30]

Okay, so that actually works as

[582:32]

intended. But why didn't it work last

[582:35]

time when we tried comparing strings?

[582:37]

The solution to which was actually to

[582:39]

introduce strcmp. Well, let's go back

[582:41]

to VS Code and resurrect that buggy

[582:43]

example initially. In fact, let me go

[582:45]

into VS Code here and instead of using

[582:49]

say integers, let's go ahead and do

[582:53]

this. And I'll rename them just by

[582:54]

convention. So my first string will be

[582:56]

quote unquote, uh, let's do: my first

[583:00]

string will be whatever get_string gives

[583:02]

me. So we'll prompt the user for s. My

[583:04]

next string will be called T by

[583:06]

convention and I'm going to ask the user

[583:09]

for that. Then down here, instead of

[583:11]

using I and J, which are common for

[583:12]

integers, I'm just going to use S and T,

[583:14]

which are common for strings, and just

[583:15]

ask literally the same question as we

[583:18]

have in the past. All right, let me go

[583:19]

ahead and do make compare

[583:23]

and wow, what's the error? Well, I'll

[583:26]

show you the error message. What did I

[583:28]

unintentionally do wrong here?

[583:32]

Yeah, I'm getting a string, but I'm

[583:34]

trying to store it into an int. So, this

[583:35]

is just frowned upon. So, let me go

[583:37]

ahead and change that to what I should

[583:38]

have typed the first time. Give me a

[583:40]

string s and a string t. Now, if I do

[583:42]

make compare, we're back in business.

[583:44]

All right, let me do ./compare. And

[583:46]

I'm going to go ahead and type in, for

[583:47]

instance, uh let's say hi exclamation

[583:50]

point and hi exclamation point, both

[583:53]

for s and t, which are obviously, clearly

[583:57]

different.

[583:58]

Now, we've tripped over this before and

[584:00]

recall that the solution was indeed to

[584:02]

introduce a function called strcmp.

[584:04]

And I explained at a high level. Well,

[584:06]

that's because you're not just comparing

[584:07]

two values. You got to compare character

[584:09]

after character after character. And

[584:10]

that's what indeed strcmp does. So,

[584:12]

let's go ahead and do that. Let me go

[584:14]

back into this file. Let's go ahead and

[584:16]

include the string library at the top

[584:18]

here. And instead of doing s == t, let's

[584:22]

do if the string comparison of s and t

[584:27]

happens to equal zero, which per

[584:29]

the documentation for the function means

[584:30]

they're equal instead of one before or

[584:32]

one after the

[584:35]

other.

[584:37]

No, I did not get it wrong this time. I

[584:39]

caught it. Um, yes. So, how do we

[584:42]

actually go ahead and compare the

[584:43]

strings this time? Well, let me go ahead

[584:45]

and do make compare, ./compare. And

[584:47]

now type in exactly the same thing. Hi

[584:49]

exclamation point. Hi exclamation point.

[584:51]

And now they're in fact the same. And

[584:53]

just to demonstrate that this isn't just

[584:54]

some fluke, I can type in hi for

[584:56]

instance and bye. And those are in fact

[584:58]

different. So clearly strcmp is doing

[585:01]

something useful. But what is it

[585:03]

actually doing? Well, first of all,

[585:04]

let's make clear that what was a string

[585:06]

last week is technically a char *

[585:09]

this week. So I can remove that training

[585:11]

wheel. I'm still going to include the

[585:12]

CS50 library because as we'll see by the

[585:15]

end of class today, get_string and

[585:17]

get_int and all of those get functions from

[585:18]

CS50 are actually still useful because

[585:20]

it's a pain in the neck in C still to

[585:23]

get user input without using functions

[585:25]

like those. But I'm going to get rid of

[585:26]

the data type that we thought was called

[585:29]

string. This will still work exactly as

[585:31]

before. If I do make compare,

[585:33]

./compare and type in hi and hi,

[585:36]

we're indeed seeing that they are now

[585:37]

the same. So, what's actually going on

[585:39]

inside of the computer's memory with

[585:41]

strings? Well, I would offer that S

[585:44]

probably ends up like over here in

[585:46]

memory. And then maybe it actually has

[585:49]

its characters down here. So, notice the

[585:52]

duality. s, as of now, is an address,

[585:54]

which means it takes up eight bytes or

[585:56]

64 bits, but the actual characters, it

[585:58]

turns out, end up somewhere else in the

[586:00]

computer's memory. And this is what's

[586:02]

different about an int. The int i and

[586:04]

the int j both ended up exactly where

[586:06]

the variables were named. But with

[586:08]

strings, the variable itself contains

[586:11]

not the string, but the address of the

[586:13]

first character in that string, which I

[586:15]

claim could end up anywhere else in the

[586:17]

computer's memory. So that those

[586:20]

addresses might be 0x123, 0x124, 0x125, and

[586:22]

0x126, for instance. Meanwhile, s is going

[586:25]

to contain literally the address of that

[586:27]

first character. When I create T in

[586:30]

memory now, it ends up maybe over there

[586:32]

taking up eight bytes of its own. Down

[586:34]

here ends up the second thing that I

[586:36]

typed in not at the same address but at

[586:38]

0x456, 0x457, 0x458, 0x459. Now if the computer

[586:41]

were really smart and generous, it could

[586:43]

probably notice, oh wait a minute, you

[586:44]

typed that thing in already. Let me just

[586:46]

point you at the other memory. But

[586:47]

that's not how it works. When you call

[586:49]

get_string, you get your own chunk of

[586:51]

memory for whatever the human typed in.

[586:52]

Even if by coincidence it's exactly the

[586:54]

same. So T's characters are ending up

[586:57]

here. S's characters are ending up here.

[586:59]

What value should go in T?

[587:04]

>> Exactly, 0x456, because that's the

[587:06]

address of the first character in t.

[587:09]

So we put 0x456 there. So at this point

[587:12]

in the story, we have two strings in

[587:15]

memory and two pointers there too. And

[587:18]

so in fact, if we kind of abstract that

[587:20]

away, it's kind of equivalent to S

[587:22]

pointing at the chunk of memory on the

[587:23]

left and T pointing at the chunk of

[587:25]

memory on the right. So why was string

[587:29]

comparison actually necessary? Well, in

[587:32]

this case, we wanted to make sure that

[587:36]

the strcmp function was handed the

[587:39]

address in s and the address in t, so

[587:42]

that the strcmp function, written by

[587:44]

someone else decades ago actually has

[587:45]

its own for loop or while loop that

[587:47]

essentially starts at the beginning of

[587:49]

each string and compares them character

[587:51]

by character by character by character.

[587:54]

That's what it's designed to do. By

[587:56]

contrast, when I was using ==

[587:58]

a few minutes ago and also last week

[588:00]

incorrectly to compare strings, what was

[588:03]

getting compared? Well, if you literally

[588:05]

compare s == t, that's like saying, does

[588:09]

0x123 == 0x456?

[588:12]

And that's obviously not true because

[588:14]

those are literally two different

[588:15]

addresses. So, the answer I was getting

[588:18]

last week and today was correct. Those

[588:20]

addresses are different. But

[588:22]

conceptually of course I actually

[588:24]

intended for the program to compare the

[588:27]

actual characters in the string not the

[588:30]

simply the addresses thereof. So how

[588:34]

do we go about fixing something like

[588:36]

that? Well, using strcmp ensures that

[588:39]

we can actually go ahead and compare

[588:42]

them character by character and I don't

[588:44]

need to create my own for loop or while

[588:45]

loop. The strcmp function does that

[588:47]

for me. And we can see this too. If I go

[588:49]

back to VS Code here, get those two

[588:51]

strings and just for kicks, go ahead and

[588:53]

print them both out using printf of

[588:55]

%p\n. Then let's go

[588:57]

ahead and print out with %p

[589:00]

again, \n, for each of them

[589:02]

passing in those variables s and t

[589:05]

respectively. What I should see is that

[589:07]

even if I type the exact same thing,

[589:09]

we're going to see two different

[589:10]

addresses when I make this version of

[589:12]

the program. Here's my first hi.

[589:14]

Here's my second. And the two addresses

[589:16]

are, it's subtle, very much different. The

[589:21]

first one ends in B0. The second one

[589:23]

ends in F0. Both of which are hexadecimal

[589:27]

values.

[589:29]

Question

[589:31]

on any of that thus far?

[589:36]

Any questions? Oh yeah, question in front.

[589:39]

Yeah. What's that?

[589:44]

>> Really good question. When you create a

[589:47]

pointer in memory or really when you

[589:48]

allocate a string or an integer in

[589:50]

memory, how does the computer decide

[589:51]

where to put it? It uses different

[589:54]

chunks of memory for different purposes.

[589:56]

And in fact, one of the topics we'll

[589:57]

look at after break today is exactly

[589:59]

that. How a computer decides where to

[590:01]

lay things out. It's often very

[590:02]

intentional and it is often auto

[590:04]

incremented. So they'll go back to back

[590:05]

to back when possible, but over time

[590:08]

things will start to get messier,

[590:09]

especially in larger programs where

[590:11]

you're adding and subtracting values

[590:13]

from memory all the time. So more to

[590:14]

come. Other questions on what we have

[590:16]

done here.

[590:19]

All right, before we break, let's do one

[590:21]

other example that elucidates perhaps

[590:23]

what can go wrong without understanding

[590:26]

some of these underlying building

[590:27]

blocks, whereby let's go ahead and

[590:30]

create a program this time that aspires

[590:31]

to copy two strings, which seems pretty

[590:34]

reasonable at a glance because it's

[590:35]

certainly easy to copy two integers. You

[590:36]

just set one equal to the other, but

[590:38]

that's not going to be the case, it

[590:39]

turns out, with copying a string. So,

[590:41]

let me open up, how about, copy.c, a

[590:45]

new program, and I'm going to include a

[590:46]

few libraries at the top. We'll use

[590:48]

cs50.h so that we can still use

[590:50]

get_string conveniently. We're going to

[590:51]

include ctype.h for reasons we'll soon

[590:55]

see, but we saw that a few weeks back.

[590:57]

We'll include standard IO as always. And

[590:59]

lastly, we'll include string.h.

[591:02]

Inside of my main function, which won't

[591:04]

take any command line arguments. Let's

[591:05]

go ahead as before and declare a string

[591:07]

equal to get_string and just prompt the

[591:09]

user for a variable s. Then let's go

[591:13]

ahead and try to copy

[591:16]

s into a new variable t just like I

[591:18]

would copy any two variables using the

[591:20]

assignment operator. Then let's treat

[591:23]

the copy otherwise known as T now as an

[591:27]

array which we're allowed to do per week

[591:28]

2. So let's say the first character in T

[591:31]

we actually want to set equal to the

[591:32]

uppercase version of that same

[591:34]

character. So this line 12 at the moment

[591:36]

is literally on the right hand side

[591:38]

saying use the toupper function from

[591:40]

the ctype library which we used a couple

[591:42]

weeks back. Pass in the first character

[591:44]

of the copy T and then update the actual

[591:48]

first character of T. So let's

[591:50]

capitalize T but not S. Now at the very

[591:53]

bottom of this program, let's go ahead

[591:54]

and print out the value of S at this

[591:56]

point in time. And then let's print out

[591:58]

the value of T at this point in time.

[592:02]

And

[592:04]

when I go ahead and make this program

[592:06]

called copy and ./copy, let's type in

[592:09]

hi exclamation point. Uh, no, let's do

[592:11]

it lowercase first. Let's do hi in

[592:14]

lowercase. Enter. And we'll see

[592:16]

curiously that S and T both got

[592:21]

capitalized even though the only

[592:22]

character I touched was t[0].

[592:25]

I didn't touch S after making this copy.

[592:28]

Now to be clear what's going on? Why

[592:30]

don't we remove one of these training

[592:32]

wheels? So string really doesn't

[592:34]

technically exist. It's always been a

[592:35]

char *. And this string is also a

[592:38]

char *. So what's really going on?

[592:40]

Well, more clearly now S is the address

[592:44]

of the string that the human typed

[592:47]

in. But T is a copy of what? Literally

[592:51]

the address of the thing the human typed

[592:55]

in which is going to be one and the

[592:56]

same. So in fact pictorially you can

[592:58]

think about it this way. If here is my

[593:00]

canvas of memory and the user is

[593:02]

prompted for S and the user types in

[593:04]

hi in lowercase as I did and it

[593:06]

happens to end up down there. what gets

[593:07]

stored in S is going to be the address

[593:09]

of that memory which for the sake of

[593:10]

discussion is maybe 0x123. So 0x123 is

[593:13]

what is stored in S. When I then on my

[593:16]

second line of code create T, I get

[593:18]

another eight bytes of memory or 64 bits

[593:20]

to store a pointer, char *, aka string.

[593:23]

But what is put in t?

[593:27]

Literally s, 0x123. So abstractly it's

[593:31]

essentially equivalent to S and T both

[593:33]

pointing to the same chunk of memory. So

[593:36]

when I do t[0] and go to the

[593:39]

zeroth or first character of t, that

[593:42]

happens to be the exact same chunk of

[593:43]

memory that s is pointing to. And so

[593:45]

when that lowercase h becomes a capital

[593:47]

h, it's as though both s and t have

[593:52]

changed. And recall too, if you're

[593:54]

enjoying the syntax, if I go back to VS

[593:57]

Code here, I did use array notation, but

[593:59]

I equivalently could have said go to the

[594:02]

address in t, go to the address of that

[594:05]

first character which functionally is

[594:07]

exactly the same. We're just not using

[594:08]

the syntactic sugar now of the square

[594:10]

brackets. That is why hi is actually

[594:13]

being capitalized for seemingly both

[594:16]

versions of it. The original and the

[594:19]

copy. So how do we go about fixing this?

[594:21]

Well, we need a couple of new solutions,

[594:24]

namely two new functions here. malloc is

[594:27]

going to be a function that allocates

[594:28]

memory. So memory allocation, aka malloc,

[594:31]

and then free which is going to be the

[594:33]

opposite which is when you're done with

[594:34]

new memory you can hand it back to the

[594:35]

computer and say use this for something

[594:37]

else. So using these two functions alone

[594:40]

I dare say we can solve now this problem

[594:42]

in memory by making an actual conceptual

[594:45]

copy of the string by copying hi

[594:48]

exclamation point and the null character

[594:50]

elsewhere in memory so that we can

[594:52]

actually manipulate the copy thereof. So

[594:54]

how do I do this? Well, let me go back

[594:55]

to VS Code here. Let me propose that we

[594:58]

get rid of much of what we did earlier

[595:00]

except we'll keep around the declaration

[595:02]

of S. But now if I want to create a copy

[595:06]

of S, it turns out I'm going to need to

[595:08]

ask the computer for as much memory as S

[595:12]

itself takes up. So hi exclamation point

[595:15]

takes up how many bytes in memory?

[595:17]

Four is correct because you need the

[595:19]

null character. So how do we figure this

[595:21]

out? You can do this. Let me give myself

[595:23]

another string called T. But we don't

[595:26]

need that white lie anymore. Another

[595:28]

char star called t and set it equal to

[595:30]

not s which we knew was going to go

[595:33]

wrong. Set it equal to the return value

[595:35]

of this new function malloc, which is

[595:37]

going to return the address of a chunk

[595:39]

of memory for me. How many bytes do I

[595:41]

want? Well, technically I just want four

[595:43]

bytes. So I could do malloc of four. And

[595:45]

that will literally ask the operating

[595:46]

system running in the cloud in VS Code

[595:49]

for four bytes of memory somewhere in

[595:51]

that black and yellow grid I keep

[595:53]

drawing on the screen. I don't know

[595:55]

where it's going to be, but I don't care

[595:56]

because malloc's return value will be the

[595:58]

address of the first byte thereof. Now,

[596:01]

it's a little dumb to hardcode four, not

[596:04]

knowing what the human's going to type

[596:05]

in, but that's okay. We can do this more

[596:07]

dynamically and use our old friend

[596:08]

strlen: ask the computer, what is the

[596:11]

length of S? and then

[596:15]

add one because we know that we need to

[596:18]

additionally have an extra byte even

[596:20]

though the length of hi! in the real

[596:22]

world is three but we know underneath

[596:23]

the hood we actually need that fourth

[596:25]

byte, hence the plus one. Now to use

[596:27]

malloc I actually need to add another

[596:30]

library here, stdlib.h, for standard

[596:32]

library,

[596:34]

and that's going to give me access to

[596:35]

the prototype for, and in turn, the malloc

[596:38]

function. Now with this chunk of memory,

[596:42]

it's up to me to copy the string. So how

[596:44]

do I go about copying a string from S

[596:47]

into T? Well, I can do this in a bunch

[596:48]

of ways, but let me propose that we do

[596:50]

it like this. For int i = 0, i

[596:53]

is less than the string length of s,

[596:55]

whatever that is, i++. And then inside

[596:57]

of this fairly mundane loop, let's just

[596:59]

set the ith value of t equal to the ith

[597:05]

value of s and copy literally very

[597:08]

mechanically every character from s into

[597:12]

t.

[597:14]

Then down here, let's go ahead and

[597:16]

capitalize just the first character of t

[597:20]

by using toupper as before with or

[597:23]

without the syntactic sugar. And then at

[597:25]

the very bottom of this program, let's

[597:26]

print out the value of S itself just for

[597:30]

good measure to make sure we didn't

[597:31]

screw it up this time. And let's print

[597:33]

out the value of T just so we see that I

[597:36]

in fact have capitalized T and only T.

[597:39]

But I'm not quite done yet. There's a

[597:42]

design flaw here and a mistake, but it's

[597:46]

subtle. Does anyone want to pluck off

[597:49]

one or the other?

[597:51]

check50 and design50 are not going to

[597:53]

like this. Yeah. We don't actually copy

[597:55]

over the like terminating character of

[597:57]

the string.

[597:57]

>> Yes, because strlen always returns the

[598:00]

sort of real world length of the string.

[598:02]

For hi exclamation point, 3. This would seem

[598:05]

to accidentally forget to copy the null

[598:08]

character. So I can fix this in a few

[598:10]

different ways. I could for instance at

[598:12]

the bottom of my loop actually do

[598:15]

something like t[3] = single

[598:18]

quotes \0, and manually terminate it

[598:21]

myself because I know it's got to end

[598:22]

with a null character. This would be

[598:23]

frowned upon too. I shouldn't be hard

[598:25]

coding the three. This is all too sloppy.

[598:26]

So don't do this. What I could instead

[598:29]

do is say go up to and through the

[598:31]

length of S because if the length of S

[598:34]

is three, but I use less than or equal

[598:37]

to that thing's going to iterate of

[598:39]

course four times because I'm starting

[598:41]

at zero as always. So that I think fixes

[598:44]

that problem. But now the design flaw

[598:46]

which is subtle but we've seen it

[598:47]

before. Yeah.

[598:53]

Exactly. It's just dumb of me to be

[598:54]

asking the computer what's the length of

[598:56]

s what's the length of s what's the

[598:57]

length of s on every iteration. So this

[598:59]

is why we introduced this trick where

[599:01]

you can set another integer variable

[599:03]

like n equal to that string length and

[599:05]

then after the semicolon just keep

[599:07]

comparing i against n which means you're

[599:10]

not calling functions wastefully as

[599:12]

before. All right if I didn't mess up

[599:15]

anything else let me go into my

[599:17]

terminal. Let me do uh oh did I mess

[599:19]

something up?

[599:24]

Yes, I did mess something up. I

[599:26]

should have put this back as well. Thank

[599:27]

you. All right. So, let's go ahead and

[599:29]

do make copy, Enter, ./copy. And now

[599:32]

I'm going to go ahead and type in hi in

[599:34]

all lowercase and hit enter. And you'll

[599:37]

see now that s is unchanged. It's

[599:40]

printed out again in lowercase, but t is

[599:43]

in fact capitalized here. Now, why is

[599:46]

this? Well, in this case, what's

[599:47]

happened is that I've got S in memory,

[599:50]

but this time when I allocate T, I then

[599:53]

use malloc to get a whole chunk of memory

[599:55]

here that initially just contains who

[599:56]

knows what garbage values as we've

[599:58]

called them before. I'll just leave them

[599:59]

as blank here, but it happens to be for

[600:01]

the sake of discussion at 0x456, 0x457, 0x458, and

[600:04]

0x459. When I then actually set t equal to

[600:07]

the return value of malloc, it's as

[600:09]

though t is just pointing to this chunk

[600:10]

of memory. Then in my own loop when I go

[600:13]

from zero on up through n that just

[600:16]

means to copy the h then the i then the

[600:19]

exclamation point and because of the

[600:20]

equal sign, also copy the null

[600:24]

character as well.

[600:26]

So this is getting a little tedious

[600:28]

though admittedly like this is a lot of

[600:29]

work just to copy a couple of strings.

[600:31]

Could we be doing this a little bit

[600:33]

better? So we actually can because of

[600:35]

the libraries we're including. Turns out

[600:37]

there's functions for copying strings

[600:38]

that come with C. So in fact if I go

[600:40]

back to VS Code here, I don't actually

[600:43]

need any of this for loop here so long

[600:46]

as I have actually allocated enough

[600:48]

memory for this string which I do think

[600:50]

I have. I can actually use literally a

[600:52]

function called string copy, strcpy for

[600:55]

short and pass in the destination and

[600:58]

the source in that order. Almost feels a

[601:00]

little backwards but that's the way it's

[601:01]

done to copy s's bytes into t. It's easy

[601:06]

to mess them up, but don't mess them up.

[601:08]

Per the documentation, the destination

[601:10]

comes first and then the source string

[601:12]

instead. So, if I do this now, let's do

[601:14]

make copy. We're good to go. Uh, if I do

[601:18]

./copy now and type in hi in all

[601:20]

lowercase, we still have preserved that

[601:22]

good property. But let me propose that

[601:25]

things can go wrong. And in fact, this

[601:27]

is about to make the program look way

[601:28]

more complicated than feels ideal. But

[601:30]

I've been a little lazy here. There's a

[601:32]

bunch of things that can go wrong for

[601:34]

which it's worth knowing about the

[601:35]

return values of these here functions.

[601:38]

So all of this time it has been possible

[601:41]

for certain functions we've been using

[601:43]

get_string among them, to return,

[601:45]

confusingly,

[601:46]

this null value, NULL. Again, humans

[601:51]

decades ago decided that one would be

[601:53]

called NUL. Other humans decided this

[601:55]

new thing would be called NULL. NUL,

[601:58]

pronounced "null," is just the null

[602:00]

terminator, \0. It is a single

[602:02]

byte of eight bits all of which are

[602:04]

zeros. That's been true for a few weeks

[602:06]

now. NULL happens to be a special memory

[602:10]

address, literally 0x0, at which nothing

[602:14]

is supposed to ever live. So whenever I

[602:17]

describe the top left corner as this is

[602:19]

address zero, this is one, this is two.

[602:21]

Humans years ago decided, you know what,

[602:23]

let's just waste byte location zero and

[602:26]

never put anything there so that we have

[602:28]

a special value to ensure that we can

[602:31]

signal when something has gone wrong. So

[602:32]

humans just decided don't use memory

[602:34]

address 0x0 specifically, and a few bytes

[602:37]

after it. So what does this mean? Well,

[602:39]

in my code all this time and since week

[602:42]

one, frankly, things could have gone

[602:44]

wrong. So in VS Code here, I'm using

[602:47]

get_string and I'm using malloc and I'm using

[602:49]

strcpy and all of these print

[602:52]

statements here, but I'm not actually

[602:54]

adding as many error checks as I should.

[602:57]

So it turns out if you read the actual

[602:58]

documentation for get_string, which in

[603:00]

fairness we never told you about until

[603:01]

now, in cases of error, get_string can

[603:04]

return NULL. Why would it ever have an

[603:06]

error? If the human types in such a large

[603:09]

paragraph of text maybe that there's no

[603:10]

room in the computer's memory for

[603:12]

everything they've typed in? Well, you

[603:13]

don't want to just get back part of the

[603:14]

text and not know that something went

[603:16]

wrong. get_string is designed to return

[603:18]

a special sentinel value, NULL, in all

[603:21]

caps. That just means I can't oblige. I

[603:23]

can't return you a correct value. Here's

[603:26]

an error instead. So what I should

[603:28]

always have been doing since week one

[603:30]

but we consciously don't because it adds

[603:31]

just too much overhead, is check: if s

[603:35]

== NULL, then we should abort

[603:37]

the program altogether and for instance

[603:39]

like return one as we've done before to

[603:41]

just signify error like we cannot

[603:43]

proceed because get_string did not work.

[603:46]

That is true of malloc, too. Technically, we

[603:48]

should say if the address in t also

[603:50]

equals NULL, that is, 0x0,

[603:53]

we should also return one because

[603:56]

something uh went wrong.

[604:00]

So, let's do this one more time. Turns

[604:03]

out that even toupper is taking for

[604:05]

granted the fact that the humans typed

[604:07]

in anything at all. What if the human

[604:09]

just types enter? Well, that's a valid

[604:11]

string. It's the so-called empty string,

[604:13]

quote unquote. But what is the length of

[604:16]

nothing? It's going to be zero. And

[604:19]

that's problematic because if you try to

[604:21]

go to T at the first location, what is

[604:24]

actually there? Well, that's actually

[604:25]

the null character, which is not

[604:27]

something you should even try to

[604:28]

capitalize, it would seem. So, what we

[604:30]

should really do here, too, is check

[604:32]

only if the strlen of s is greater

[604:36]

than zero should you even bother

[604:38]

uppercasing that first character. I

[604:41]

mean, one, at best, it makes no sense

[604:43]

because if there's no string, there's

[604:44]

nothing to uppercase. At worst, I could

[604:46]

break something by touching memory that

[604:48]

I should not. And if I may, there's

[604:51]

another issue. Now, on line 15, I'm

[604:54]

asking the computer for memory, and it's

[604:56]

going to hand me those four bytes. But

[604:57]

technically, I'm never giving them back.

[604:59]

And so, even though this program is so

[605:00]

short that it's going to quit pretty

[605:02]

soon, and it's not a big deal, the

[605:03]

computer will automatically reclaim that

[605:04]

memory, in long-running programs, like

[605:07]

servers or things that are running for a

[605:08]

long time, if you use malloc and ask for

[605:11]

memory, but never give it back to the

[605:13]

computer, never free it, so to speak,

[605:15]

your computer might get slower and

[605:17]

slower and slower and slower essentially

[605:18]

because it's running out of memory. Not

[605:20]

physically, but the computer thinks it's

[605:22]

using all of its memory even if it's not

[605:24]

actively in use. You as the human know

[605:27]

best. And so at the end of this program

[605:29]

when I am completely done with T, you

[605:32]

should similarly call free on t, passing

[605:35]

in the address that you allocated

[605:37]

previously so that the operating system

[605:39]

gets that memory back. If you don't do

[605:41]

that, it's what's called a memory leak.

[605:42]

If you've ever used a Mac program, a

[605:44]

Windows program, an iPhone or Android

[605:46]

program that somehow is just getting

[605:48]

slower and slower and slower and slower,

[605:51]

that is often a symptom of a human

[605:52]

having messed up and not freeing memory

[605:55]

that they don't actually need anymore.

[606:01]

Questions on null or any of these kinds

[606:04]

of checks?

[606:07]

No. All right. Well, as a teaser, in

[606:11]

just a bit, we're going to reveal when

[606:13]

and why things can go terribly wrong by

[606:15]

way of a little bit of claymation from

[606:17]

our friends at Stanford, but feels like

[606:18]

we're long past a good uh snack break.

[606:20]

So, why don't we go ahead and have some

[606:21]

oranges and some fruit snacks, and we'll

[606:23]

see you in 10.

[606:25]

All right, we are back. So, with memory,

[606:28]

a lot of things can go wrong. And in

[606:30]

fact, a question came up during the

[606:31]

break about whether or not I should have

[606:33]

also called free on s, which was the

[606:36]

string that I actually got back from

[606:38]

get_string. The short answer is no. This has

[606:40]

been a deliberate choice over the past

[606:41]

several weeks whereby the implementation

[606:43]

by CS50 of get_string automatically

[606:46]

frees memory that it has given to you

[606:48]

once it is no longer needed. So that's a

[606:50]

bit of magic underneath the hood. Once

[606:52]

you no longer use that,

[606:54]

though, that feature goes away. But

[606:55]

because I actually used malloc to get my

[606:58]

memory for t I did have to free that

[607:01]

specific memory. So the rule of thumb

[607:02]

quite simply is: if you malloc'd it,

[607:04]

you must free it. If get_string

[607:07]

malloc'd it, you do not have to free it

[607:09]

yourself. But of course, things can go

[607:11]

wrong. And thankfully, there are tools

[607:13]

via which we can find memory related

[607:15]

errors. And one thing we're going to

[607:16]

show you briefly is another tool called

[607:17]

Valgrind, which is a nice complement to

[607:19]

something like debug50 and printf and

[607:22]

the duck for actually chasing down

[607:23]

specifically in this case memory related

[607:26]

errors. So in fact, let me go over to VS

[607:28]

Code and open up a program I wrote in

[607:30]

advance because it's just not all that

[607:32]

useful, but it is demonstrative of some

[607:34]

things that can go wrong. And in

[607:35]

memory.c we have this code here. We

[607:38]

include stdio.h and we include

[607:40]

stdlib.h, the latter of which

[607:42]

recall is necessary now when you want to

[607:44]

use malloc and in turn free. And inside

[607:46]

of this main function I'm doing a few

[607:47]

things. I am first allocating three

[607:51]

integers in kind of an interesting way

[607:54]

because it turns out that malloc takes as

[607:56]

its argument the number of bytes that

[607:58]

you want to get. Now I know on most

[608:00]

systems an integer is indeed four bytes.

[608:02]

So if I want space for three integers, I

[608:04]

could just do 3 * 4 is 12 and put 12

[608:07]

inside the parentheses here. But that's

[608:09]

generally frowned upon because it would

[608:10]

make my code less portable to other

[608:12]

systems where an int might not be four

[608:14]

bytes. So turns out you can use this

[608:16]

operator, sizeof, and actually ask the

[608:19]

computer how big is a data type like an

[608:21]

int on this specific system. And for

[608:24]

chars you'll always get back one. For

[608:26]

ints, you'll usually get back four. And same goes

[608:28]

for other data types as well. But this

[608:30]

is the more dynamic way to ask that

[608:31]

question. If you want to get three uh

[608:34]

integers worth of memory, what I'm then

[608:36]

going to do is assign on the left hand

[608:38]

side the return value of malloc to this

[608:40]

variable x just because and x itself is

[608:43]

a pointer to an integer more

[608:46]

specifically to this chunk of memory

[608:47]

which is a sequence of three integers.

[608:50]

This is very arbitrary and this is only

[608:51]

meant to demonstrate things you can do

[608:54]

incorrectly ultimately. But this is how

[608:56]

I would dynamically get space for three

[608:59]

integers from malloc and store the

[609:02]

address thereof in x. So it stands to

[609:04]

reason that I could put my first value

[609:06]

at x[1] = 72, my second

[609:10]

value uh equaling 73 and my third value

[609:13]

equaling 33. Now if some of this is

[609:15]

rubbing you wrong, well, this is

[609:16]

actually riddled with mistakes

[609:18]

already, some of which are old to us.

[609:20]

What's the first thing I've done wrong?

[609:21]

Even if you have no idea what's going on

[609:23]

with line eight, what about lines 9, 10,

[609:25]

and 11? What did I do wrong?

[609:29]

Yeah.

[609:30]

>> Yeah, my indexing is wrong. Like we've

[609:32]

known for weeks now that with arrays or

[609:34]

with array syntax, you always start

[609:35]

counting at zero, then one, then two,

[609:38]

not one, two, three. So that's an issue.

[609:40]

And this is a new detail. But given that

[609:42]

I've used malloc on line 8, what other

[609:44]

mistake have I made in this version of

[609:46]

the program?

[609:50]

What's missing?

[609:52]

Free. So I didn't actually call free. So

[609:54]

this program has a memory leak. It's

[609:56]

asking for memory and never handing it

[609:58]

back. Now that's pretty good. You know,

[609:59]

a few of us were able to just kind of

[610:00]

eyeball the code and debug it. But

[610:02]

that's not going to be true for all

[610:04]

people, all programs, certainly when the

[610:05]

programs get larger and more

[610:07]

complicated. So a program like

[610:09]

Valgrind's purpose in life is to help

[610:11]

you spot these kinds of errors. So for

[610:13]

instance, when I run make memory to

[610:16]

compile this program and then do

[610:17]

./memory, at a glance, it

[610:19]

actually seems perfectly fine, if only

[610:21]

because I'm not seeing any errors

[610:22]

even when I compile it or when I run it.

[610:25]

But I do claim that there are at least

[610:27]

two that we've seen here. It's just

[610:29]

we're not getting so unlucky that the

[610:31]

program is actually crashing as a

[610:34]

result. So this is a more latent, harder

[610:36]

to detect bug. But what I'm going to do

[610:37]

now is this. I'm going to open up my

[610:39]

terminal window in full screen. I'm

[610:41]

going to then do valgrind, space,

[610:44]

./memory, so as to run the Valgrind memory

[610:47]

checker on this program. So similar to

[610:49]

debug50, but the name now is Valgrind.

[610:51]

This isn't a CS50 thing. This is a

[610:52]

common program that programmers use.

[610:54]

When I hit enter, the output's going to

[610:56]

be atrocious, frankly. It's way

[610:58]

more complicated than it needs to be.

[611:00]

They put this number here, which means

[611:02]

something specific, but it's just stupid

[611:03]

that it's on every line of output. So

[611:04]

it's overwhelming at a glance. But once

[611:06]

you've trained your eyes to look for

[611:08]

useful information, there's a couple of

[611:10]

useful insights here. So one, invalid

[611:13]

write of size 4 that apparently is

[611:15]

somehow related to line 11. So let's go

[611:19]

there. Let me just minimize my terminal

[611:21]

window, look at line 11 of memory.c, and

[611:24]

just see which line that was. Okay,

[611:26]

invalid write of size 4. Well, writing

[611:30]

means like changing a value. Reading

[611:32]

means accessing a value. So they're sort

[611:34]

of opposites. Invalid write of size

[611:35]

four. Well, here's why it's generally

[611:37]

useful to know generally how big an int

[611:39]

is. Like four, you're trying to write

[611:40]

four bytes incorrectly. So why is line

[611:43]

11 invalid?

[611:45]

Just to be clear,

[611:47]

because the index is off like I'm

[611:49]

touching memory that I should not. If I

[611:51]

ask the computer for space for three

[611:53]

integers, each of which is four bytes,

[611:56]

that should give me location 0, one, and

[611:58]

two, not location three. So you still

[612:01]

have to know a little something about

[612:02]

programming to be able to make good use

[612:04]

of that information, invalid write of

[612:06]

size four, but once you've sort of

[612:08]

trained your mind and your eye to catch

[612:10]

it, like, ah, now I'm an idiot, I have to go

[612:12]

in and fix that problem. But what else is

[612:14]

wrong based on Valgrind's output here? So

[612:18]

this is kind of worrisome: leak summary,

[612:20]

definitely lost, 12 bytes in 1 blocks. I

[612:23]

don't really know what one blocks means

[612:25]

for now but 12 bytes should be familiar

[612:27]

because if you generally remember that

[612:28]

an int is four bytes and you ask for

[612:30]

three of them. Oh, there's my 12. So,

[612:32]

somehow I'm losing 12 bytes of memory.

[612:34]

Not in a literal sense, but it means by

[612:36]

the time the program finishes, you have

[612:39]

not returned or freed all of the memory

[612:42]

that you asked for. So, this line here

[612:45]

is your hint that you've done something

[612:46]

wrong with respect to 12 bytes in total.

[612:49]

And sometimes you'll see slightly

[612:50]

different output here. For instance, we

[612:52]

see mentioned up here, 12 bytes in 1

[612:54]

blocks are definitely lost in loss

[612:56]

record 1 of 1. Very verbose. But the juicy

[612:58]

part is, ah, line 8 is the source of

[613:01]

that error specifically. So there too

[613:03]

it's a little bit of a breadcrumb

[613:04]

leading me to the solution for fixing

[613:06]

this. So if I go up here, I look at line

[613:09]

8. Okay, there's only so much that I

[613:11]

could have done wrong on line 8. If I've

[613:13]

malloc'd the memory on line 8, sounds

[613:15]

like I do need to free it later on. So

[613:17]

let's fix both of these problems. The

[613:19]

first one is just the indexing issue.

[613:20]

Change the 1, 2, 3 to 0, 1, 2. Let's then

[613:23]

fix the second problem by just freeing x

[613:25]

at the very end. And just for good

[613:28]

measure,

[613:29]

this was not caught by Valgrind because

[613:31]

it doesn't always happen. But there's

[613:33]

one other

[613:35]

scenario that could go wrong and it

[613:37]

relates to line eight.

[613:40]

What should I be doing?

[613:49]

>> I am doing an array, but recall that we

[613:51]

can use array syntax on chunks of

[613:54]

memory. So technically what line 8 is

[613:57]

doing is this. It is allocating 12 bytes

[614:00]

of memory from the computer just because

[614:01]

just to demonstrate how malloc works and

[614:03]

it's storing the address of that first

[614:05]

byte in a variable called x. The bracket

[614:08]

notation is just the syntactic sugar

[614:10]

that allows me to change values at x's

[614:13]

address. I could alternatively just use

[614:15]

pointers and say go to x and put 72

[614:18]

there. Go to x + one and put 73 there.

[614:24]

go to x + 2 and put 33 there using

[614:29]

pointer arithmetic. But those are

[614:30]

identical and, generally, you know,

[614:33]

most people would just use square

[614:35]

bracket notation because it's just a

[614:36]

little cleaner and easier to read and

[614:38]

write. Okay, but back to this question.

[614:40]

There's still a subtle bug here based on

[614:42]

our example just before break. What

[614:44]

should you be doing anytime you call

[614:46]

malloc and get_string and a few other

[614:49]

functions for that matter?

[614:54]

Did I hear the answer? Checking for

[614:58]

checking for NULL, right? Because if

[615:00]

malloc has an error, there's not enough

[615:03]

memory for whatever reason, you should

[615:05]

not be proceeding to touch that memory

[615:06]

because it might be the null address

[615:09]

that is 0x0. So what you should really

[615:12]

be checking is, well, if x ==

[615:14]

NULL, there's no more work to be done

[615:16]

here. Let's just return one down here.

[615:18]

And only if we get all the way to the

[615:19]

bottom should we maybe return zero to

[615:21]

signify uh explicitly that there is in

[615:24]

fact successful operation. All right,

[615:26]

with that said, let's go back down here.

[615:28]

Remake memory. No error messages from

[615:30]

the compiler. ./memory. That too

[615:33]

seems okay, but it was fine the first

[615:34]

time. Let's now run Valgrind. Let me

[615:36]

maximize my window. Run valgrind

[615:39]

./memory. Crossing my fingers as

[615:42]

always. And now this is actually pretty

[615:44]

good. It's much shorter output even

[615:46]

though it's just as scary at a glance,

[615:47]

but most of this is fluffy and not uh

[615:49]

very revealing. Heap summary: in use

[615:52]

at exit, zero and zero. So it looks like all

[615:54]

heap blocks were freed. No leaks are

[615:56]

possible. Heap is a word we'll come back

[615:58]

to, but this means there's nothing

[616:00]

wrong. In fact, zero errors, which is a

[616:02]

good thing. So in short, Valgrind is

[616:04]

among the most arcane programs we're

[616:05]

going to use. Its output was really

[616:07]

designed for those more comfortable, if

[616:09]

you will. But there's still juicy

[616:10]

insights there. If you just kind of look

[616:12]

for things that lead you to like this

[616:14]

file on this line number, odds are that

[616:16]

will lead you to the most subtle of

[616:18]

bugs. In fact, another type of bug is

[616:20]

when we do indeed touch memory, we

[616:22]

shouldn't. So, let me uh zoom out on

[616:24]

that, clear my terminal, and let me open

[616:25]

up another program or maybe write this

[616:28]

one real fast incorrectly. So, let me

[616:31]

create a program called garbage.c to

[616:34]

demonstrate what we've generally called

[616:36]

garbage values. That is values that are

[616:38]

still in memory, but I didn't put them

[616:39]

there myself necessarily. I'm going to

[616:41]

include stdio.h. I'm going to

[616:43]

include stdlib.h. And then I'm

[616:46]

going to go ahead and, actually, no need

[616:48]

for stdlib this time. Let's do int

[616:50]

main(void). And inside of main, let's

[616:53]

give myself an array of like way too

[616:55]

many exam scores or whatnot. We used to

[616:58]

do just a few, but let's say there's

[616:59]

1,024. Then let's go ahead and do for

[617:02]

int, for int i equals 0, i less than

[617:06]

1024, i++, and in here let's go ahead and

[617:11]

print out, whoops, let's go ahead and

[617:14]

print out using printf each of those

[617:16]

scores. Of course, I have clearly

[617:18]

forgotten to do something in this

[617:20]

program, which is what?

[617:27]

I haven't actually put in any scores

[617:29]

there for real like I've asked the

[617:30]

computer, give me an array for 1,024

[617:34]

integers, but I've not used get_int or

[617:36]

even manually typed in any of my quiz

[617:38]

scores, which we did in the past. That's

[617:40]

because I'm intentionally trying to show

[617:41]

us garbage inside of the computer's

[617:43]

memory. What this loop is going to do on

[617:45]

line 8 now is literally print out the

[617:47]

first int, the second int, the third

[617:48]

int, all 1,024 ints, but all of them should

[617:52]

be garbage values because I myself

[617:54]

haven't put anything in those addresses

[617:56]

yet. So, let's go ahead and make

[617:58]

garbage. Let's go ahead and maximize my

[618:00]

terminal window just to see more on the

[618:02]

screen. Do ./garbage. It's going to be

[618:04]

super fast output because the computer's

[618:05]

way faster than 1,024 values

[618:07]

alone. There is a lot of garbage output.

[618:10]

So when we talk about garbage values in

[618:12]

the abstract like here's just some

[618:13]

random zeros, a 25, a 32,000, a negative

[618:16]

number and so forth, that's because

[618:17]

that's essentially remnants from the

[618:19]

computer's memory of stuff that might

[618:20]

have happened previously, not

[618:22]

necessarily by me in this moment, which

[618:24]

is to say you just shouldn't touch that

[618:26]

memory at all whatsoever. So now we're

[618:29]

seeing garbage values for the actual

[618:31]

first time. Let's consider another

[618:33]

example of a program that

[618:36]

does contain, potentially,

[618:40]

memory errors. And let's look at this

[618:42]

too. So this is not really a useful

[618:44]

program. It's meant to be demonstrative

[618:46]

of some of these concepts. So here we

[618:47]

have a program that takes no command-line

[618:49]

arguments. Up here we've got a

[618:52]

pair of lines that declare two pointers

[618:54]

but don't yet initialize them to any

[618:56]

variables. And that's fine. You don't

[618:57]

have to have an equal sign with any

[618:59]

variable. You just eventually should

[619:02]

assign it some value. But this just

[619:03]

tells the computer, give me a variable X

[619:05]

that's going to store the address of an

[619:06]

int. Give me another variable Y that's

[619:08]

going to store the address of another

[619:09]

int. Okay, what happens next? Well, on

[619:12]

this line of code, in this simple

[619:13]

example, we're allocating enough space

[619:15]

for a single integer just because it's a

[619:18]

stupid exercise. There's no reason to do

[619:19]

this other than to demonstrate how malloc

[619:21]

works for the moment. malloc returns the

[619:24]

address of that chunk of memory. So

[619:27]

that's what goes in X. So X is now

[619:29]

pointing at somewhere in memory four

[619:31]

bytes of space that it can certainly put

[619:33]

a value at. How do we do that? Well, if

[619:36]

you do *x and use the dereference

[619:38]

operator, that means go to that chunk of

[619:40]

memory and put the number 42 there.

[619:41]

That's totally valid. This says go to

[619:44]

the address in Y and put the unlucky

[619:47]

number 13 there. Unlucky quite literally

[619:51]

because what is Y pointing to at this

[619:53]

moment?

[619:55]

It's just the garbage address. Why?

[619:57]

Because if you don't initialize Y, who

[620:00]

knows what it's going to be pointing to?

[620:01]

Maybe it's zero, maybe it's 25, maybe

[620:02]

it's 32,000, a negative number, just

[620:04]

like we saw in the previous example. You

[620:06]

have no idea what values are going to be

[620:09]

in X and Y unless you yourself put those

[620:11]

values there. So, this is highlighted in

[620:13]

red because bad things are going to

[620:15]

happen if you try to dereference an

[620:17]

invalid or a bogus pointer. Even worse

[620:20]

than just touching uh variables that

[620:22]

might not have values, if you dereference

[620:24]

an address and try going to some random

[620:26]

place, the computer is generally not

[620:28]

going to like that. And in fact, our

[620:30]

friends at Stanford wonderfully brought

[620:31]

this particular scenario to life whereby

[620:33]

even though this example is a bit

[620:35]

contrived just to fit it all on the

[620:36]

screen at once, it is going to be the

[620:38]

case that bad things happen if we don't

[620:42]

check for these values and actually

[620:44]

assign valid values in the form of as

[620:46]

we'll see now some claymation. So here I

[620:49]

give you Binky,

[620:53]

which is a bit of claymation from our

[620:55]

friend Nick Parlante at Stanford. If we

[620:57]

could dim the lights unnecessarily

[620:59]

dramatically.

[621:02]

>> Hey Binky, wake up. It's time for

[621:05]

pointer fun. What's that? Learn about

[621:08]

pointers. Oh goody. Well to get started

[621:12]

I guess we're going to need a couple

[621:13]

pointers. Okay. This code allocates two

[621:16]

pointers which can point to integers.

[621:19]

>> Okay. Well, I see the two pointers, but

[621:21]

they don't seem to be pointing to

[621:22]

anything.

[621:23]

>> That's right. Initially, pointers don't

[621:25]

point to anything. The things they point

[621:26]

to are called pointees, and setting them

[621:28]

up is a separate step.

[621:30]

>> Oh, right. Right. I knew that. The

[621:32]

pointees are separate. So, how do you

[621:34]

allocate a pointee?

[621:35]

>> Oh, thanks.

[621:36]

>> Okay. Well, this code allocates a new

[621:38]

integer pointee, and this part sets X to

[621:41]

point to it.

[621:42]

>> Hey, that looks better. So, make it do

[621:44]

something.

[621:45]

>> Okay. I'll dereference the pointer X to

[621:48]

store the number 42 into its pointee. For

[621:51]

this trick, I'll need my magic wand of

[621:53]

dereferencing. Your magic wand of

[621:56]

dereferencing. Uh, that's great.

[622:00]

This is what the code looks like. I'll

[622:02]

just set up the number. And

[622:04]

hey, look, there it goes. So, doing a

[622:07]

dereference on X follows the arrow to

[622:10]

access its pointee, in this case to store

[622:12]

42 in there. Hey, try using it to store

[622:15]

the number 13 through the other pointer

[622:17]

Y. Okay, I'll just go over here to Y and

[622:22]

get the number 13 set up and then take

[622:24]

the wand of dereferencing and just

[622:29]

Oh, hey, that didn't work. Say, uh,

[622:32]

Binky, I don't think dereferencing Y is a

[622:35]

good idea cuz, uh, you know, setting up

[622:37]

the pointee is a separate step and, uh, I

[622:39]

don't think we ever did it. Hm, good

[622:41]

point.

[622:42]

>> Yeah, we we allocated the pointer Y, but

[622:45]

we never set it to point to a pointee. Hm,

[622:48]

very observant.

[622:49]

>> Hey, you're looking good there, Binky.

[622:51]

Can you fix it so that Y points to the

[622:53]

same pointee as X? Sure, I'll use my magic

[622:55]

wand of pointer assignment. Is that

[622:58]

going to be a problem like before? No,

[623:00]

this doesn't touch the pointees. It just

[623:02]

changes one pointer to point to the same

[623:03]

thing as another. Oh, I see. Now Y

[623:07]

points to the same place as X. So, so

[623:09]

wait, now Y is fixed. It has a pointee.

[623:12]

So, you can try the wand of dereferencing

[623:13]

again to send the 13 over.

[623:16]

Okay, here it goes. Hey, look at that.

[623:20]

Now, dereferencing works on Y. And

[623:22]

because the pointers are sharing that

[623:23]

one pointee, they both see the 13. Yeah,

[623:26]

sharing. Uh, whatever. So, are we going

[623:28]

to switch places now? Oh, look, we're

[623:30]

out of time. But I can only imagine how

[623:33]

long that took, Nick. But the key detail

[623:36]

was that bad things happened to Binky

[623:37]

when we did this line of code.

[623:39]

Dereferencing an invalid pointer that had

[623:42]

no true value assigned. It was just some

[623:44]

garbage value. Now what's the solution?

[623:45]

Well, as Nick proposed, just don't do

[623:47]

that. And instead, at least do something

[623:49]

sensible, like assign y equal to x. Not

[623:52]

to make a copy of anything per se, but

[623:54]

to literally point y at the same

[623:56]

location in memory

[623:58]

as x. Then a

[624:00]

line like this is perfectly valid. you

[624:03]

can go to that address which happens to

[624:05]

be the same as the 42 and that's why in

[624:07]

the claymation form we saw that the 42

[624:10]

became a 13 instead. So again at the end

[624:12]

of the day this is only demonstrative of

[624:14]

these basic building blocks that we now

[624:16]

have at our disposal but also how easy

[624:19]

it is to do things incorrectly. So this

[624:22]

is one of those with great power comes

[624:23]

great responsibility. C is one of the

[624:25]

languages that is incredibly high

[624:28]

performing. It's so close to the

[624:29]

hardware that you have so much control

[624:31]

over the memory and operation that you

[624:32]

can write really good, really fast code.

[624:34]

And that's why even all these decades

[624:36]

later, it's among the most omnipresent

[624:38]

programming languages in the world. At

[624:40]

the same time, you can really screw

[624:41]

things up. And when so much of today's

[624:43]

software is hacked in some way or

[624:46]

crashes for some reason, it is often because

[624:47]

humans have just missed some simple

[624:50]

mistake like this that happens to relate

[624:52]

to memory. So more modern languages that

[624:54]

we'll soon see, like Python, and, if in

[624:56]

high school you studied Java, you

[624:58]

don't have this much control over the

[625:00]

computer's memory. There's many more

[625:01]

defenses put in place to protect you and

[625:04]

me from ourselves so to speak. But you

[625:06]

pay the price by some of those languages

[625:08]

tend to be slower and less

[625:11]

performant. Yeah.

[625:27]

What is the difference here that we're

[625:28]

now playing with memory? This will

[625:29]

become clear this week and next. And in

[625:30]

fact, some of the examples on which

[625:32]

we'll end today will motivate needing to

[625:34]

have finer-grained control over what's

[625:36]

going on inside of the computer. When

[625:37]

you want to deal with files, for

[625:39]

instance, you're going to need to know a

[625:40]

little something about memory addresses

[625:41]

and where things are. when you want to

[625:43]

build structures in memory beyond the

[625:45]

complexity of an array. In fact, next

[625:47]

week we're going to start building like

[625:48]

two-dimensional structures in the

[625:50]

computer's memory to represent the

[625:52]

equivalent of like a family tree, for

[625:53]

instance, or trees more generally that

[625:56]

can store data in a more efficient way.

[625:58]

Up until now, all we have is arrays. And

[626:00]

with arrays, you can achieve something

[626:01]

like binary search, but we're going to

[626:03]

see there are things you can't do with

[626:04]

arrays, especially if speed's important.

[626:06]

>> But I I was saying like, for example, if

[626:08]

you were to ask me to do this like say

[626:09]

last week about this, I would be like x

[626:12]

equals like 13 or something like

[626:14]

assigning a variable.

[626:15]

>> Correct. So last week if you just said

[626:17]

int x = 13 or int y = 42 or whatnot,

[626:20]

totally fine. And again, this program's

[626:22]

sole purpose in life is to demonstrate

[626:24]

how you can make mistakes. In and of

[626:26]

itself it is not useful here, but it's

[626:28]

representative of how we're going to

[626:30]

start using this syntax not only in this

[626:32]

week's problem sets but next week as

[626:34]

well.

[626:36]

All right. So, with that claim made that

[626:40]

we can do a lot of damage, let's

[626:42]

consider how pointers and knowledge of

[626:44]

memory addresses can actually solve some

[626:46]

useful problems. Um, can we get one

[626:48]

volunteer to come on up and help pour a

[626:50]

drink? Come on up. All right. What is

[626:53]

your

[626:54]

name?

[626:59]

Come on over.

[627:01]

>> If you want to say a quick hello to the

[627:03]

group.

[627:03]

>> I'm Olivia.

[627:04]

>> Okay. and and a little something about

[627:06]

yourself.

[627:06]

>> Oh, um I live in Canada.

[627:08]

>> Okay, welcome. Well, come on over here,

[627:09]

Olivia. And we have um two glasses.

[627:12]

Well, really three glasses. So, we have

[627:13]

these fancy Ray-Bans that have cameras

[627:15]

built in whereby we can sort of capture

[627:17]

your point of view. If you're

[627:18]

comfortable, we'll put these on. There's

[627:19]

no lenses in them. The white light will

[627:21]

mean we're recording. Hopefully, a

[627:23]

memorable moment.

[627:27]

This battery too is dead. All right. We

[627:29]

don't have a backup for the backup, so

[627:30]

we're going to pretend that this part

[627:31]

never happened. So,

[627:34]

>> Olivia, we have two glasses here for

[627:35]

you. And I'm going to go ahead and pour

[627:37]

uh some colored liquid into both. So,

[627:39]

we've got some blue liquid here into

[627:40]

this glass. All right. So, we'll fill

[627:42]

this up here.

[627:45]

And then in this one, we're going to go

[627:47]

ahead and pour this orange liquid. And

[627:49]

at this point in the story, I'm going to

[627:51]

exclaim, "Oh no, I accidentally put the

[627:53]

wrong liquid in the wrong glass. So, I

[627:55]

got this backwards." So, what I'd like

[627:57]

you to do is swap the values in these

[627:59]

glasses so that the blue goes into that

[628:01]

glass and the the orange goes into this

[628:04]

glass

[628:04]

>> without mixing it or

[628:06]

>> without mixing it. So, well, you're

[628:07]

hesitating. Why?

[628:08]

>> Well, it would be hard to do unless you

[628:10]

can like talk to the mic if you could.

[628:11]

>> Oh, it would be like hard to do um

[628:14]

without mixing the two because like you

[628:16]

don't have anywhere to put the other

[628:18]

one,

[628:18]

>> of course. So, in the real world, this

[628:20]

is not really solvable unless for

[628:21]

instance, we have a temporary variable

[628:23]

if you will, like an empty glass in

[628:24]

which to do this. So, here is your third

[628:26]

variable if you want to go ahead now and

[628:28]

get the blue into that one and the

[628:29]

orange into that one. Yeah.

[628:33]

No pressure.

[628:36]

All right. So, we're putting one value

[628:39]

into the temporary variable. We're

[628:41]

putting the other value into the

[628:44]

original value.

[628:48]

Okay. And now you're probably going to

[628:50]

take Yep. I'm guessing the temporary

[628:52]

value put it into the original variable

[628:59]

and that that was very well done. If

[629:00]

maybe we can give Olivia a round of

[629:02]

applause for just that. Thank you. We

[629:03]

have

[629:05]

little parting gift for you here too. So

[629:09]

goal here really being to create a

[629:10]

memorable moment of like oh remember the

[629:11]

time Olivia tried to swap two values she

[629:13]

needed a temporary variable is the

[629:14]

takeaway. So why is that? In code, if

[629:16]

we wanted to do the same principle,

[629:18]

we're going to need somewhere temporary

[629:20]

to put one of those values before we can

[629:22]

make this happen. The catch is though

[629:24]

that if we don't do this intelligently,

[629:26]

like it's just not going to work in C

[629:28]

unless we take advantage of some of

[629:29]

these new capabilities. So, in fact, I'm

[629:31]

going to go over to VS Code here and I'm

[629:33]

going to open up a program called swap.c

[629:35]

that I wrote in advance whose purpose in

[629:37]

life is simply to swap two variables'

[629:37]

values. So, I've got stdio.h at

[629:43]

the top so I can use printf. I've got

[629:45]

the prototype for a swap function which

[629:47]

is uh might as well be Olivia in this

[629:48]

case that's going to take two inputs A

[629:50]

and B or two uh glasses and swap their

[629:53]

values ultimately is its purpose inside

[629:55]

of main though I'm going to do this I'm

[629:57]

going to set two variables X and Y equal

[629:59]

to one and two respectively I'm then

[630:01]

just as uh point of clarification going

[630:04]

to print out the value of X is such and

[630:05]

such y is such and such then I'm going

[630:08]

to call the swap function aka Olivia to

[630:11]

swap the values x and y then I'm going

[630:13]

to print out x is this and y is this,

[630:15]

so that hopefully I'll see that they've

[630:16]

indeed been swapped. At the bottom of

[630:18]

this file, we have the actual swap

[630:20]

function. And as you might expect, it

[630:22]

takes two inputs, A and B, both of which

[630:24]

are integers. So I could have called

[630:25]

them anything I want. The first thing

[630:27]

this function does is it grabs an empty

[630:29]

glass called temp, puts a or the blue

[630:32]

liquid into it. Then we put into A the

[630:36]

value of B. So we've sort of lost the

[630:38]

value of A at this point except that we

[630:39]

did make a copy of it into temp. And

[630:41]

then lastly, we put into B the temporary

[630:44]

variable. And at the end, the temp

[630:46]

variable is empty. Although technically

[630:48]

it still has a copy of the value, but

[630:49]

it's no longer useful because the job is

[630:51]

done. And A has become B and B has

[630:52]

become A. So I dare say this is like the

[630:55]

literal translation of what Olivia just

[630:56]

did. And I I like the logic of it.

[630:59]

However, when I actually run this

[631:02]

program, something goes awry. So let me

[631:05]

go ahead and do make swap, ./swap. And

[631:09]

I'll maximize my window. I should see

[631:11]

hopefully that X is one, Y is two, and

[631:13]

then X is two, and Y is one.

[631:17]

But no, like even though I literally

[631:20]

translated into code what Olivia did,

[631:23]

this didn't actually seem to work. And

[631:26]

why is that? Well, it turns out that

[631:28]

this version of the program is not

[631:30]

right. In fact, because of issues of

[631:32]

scope. And we've talked about scope

[631:34]

before, generally in the context of like

[631:35]

where a variable lives. We've said that

[631:37]

a variable only exists in like the most

[631:39]

recent curly braces that you opened up

[631:41]

for it. And that was true. It's just

[631:42]

sort of a colloquial way of describing

[631:44]

what scope is. But scope comes into play

[631:46]

here because it turns out that A and B,

[631:50]

in so far as they are the arguments or

[631:51]

parameters for the swap function, they

[631:54]

have a different scope than X and Y. And

[631:57]

that still follows the same definition.

[631:58]

They're inside of different curly braces

[632:00]

than X and Y are. So it seems that I may

[632:03]

very well be swapping A and B, but I'm

[632:06]

not having any impact on X and Y. So why

[632:09]

is that? Well, in C, all this time,

[632:12]

anytime you pass in arguments to a

[632:13]

function, you are passing in those

[632:15]

arguments by value, so to speak. You're

[632:17]

literally passing in copies of the

[632:19]

variables to the function you are

[632:21]

calling. So what does this mean? Well,

[632:24]

more concretely, if like this is a

[632:25]

photograph of a chunk of memory inside

[632:27]

of the computer and we sort of zoom in

[632:29]

as we've done before and we abstract

[632:30]

away all of the bytes from top to

[632:32]

bottom, what's really happening inside

[632:34]

of the computer's memory is that we're

[632:36]

using some of it for X and Y and some

[632:39]

other memory for A and B. But how is

[632:42]

that in fact happening? Well, it turns

[632:43]

out to a question that came up before

[632:45]

the break, memory in a computer is

[632:47]

actually assigned in a somewhat

[632:48]

deliberate fashion. And generally if we

[632:51]

think of this rectangle as representing

[632:52]

my computer's whole chunk of memory.

[632:54]

Generally what happens when you run a

[632:56]

program with dot slash something or on a

[632:57]

Mac or PC by double clicking or on a

[633:00]

phone by single tapping. What happens is

[633:02]

all of the zeros and ones that were

[633:04]

compiled by the company or person who

[633:06]

made that program are loaded into the

[633:08]

top of the computer's memory so to

[633:10]

speak. This is just an artist's rendition.

[633:11]

There's no notion of top or bottom per

[633:12]

se, but it's loaded into this chunk of

[633:14]

memory at the very edge of the

[633:16]

computer's memory, aka machine code: the

[633:18]

zeros and ones that compose the actual

[633:20]

program. That's where they go. So,

[633:22]

they're copied from the hard drive or

[633:23]

the SSD, whatever you know it as, the

[633:25]

persistent storage, and it's put there

[633:26]

in the computer's RAM or random access

[633:28]

memory, which is the faster memory where

[633:30]

programs and files live while you are

[633:32]

using them. Meanwhile, if your program

[633:35]

or the program you're using has any

[633:36]

global variables, global in the sense

[633:38]

that they're defined outside of main and

[633:40]

not inside of main or inside of other

[633:42]

functions, they end up right below that

[633:44]

machine code by convention, just so

[633:46]

they're accessible everywhere.

[633:48]

Meanwhile, there's this big chunk of

[633:50]

memory below that called the heap. The

[633:52]

heap is the chunk of memory that malloc

[633:54]

uses to allocate memory for you. So the

[633:57]

first time you call malloc, it's going to

[633:58]

give you probably this chunk of memory.

[634:00]

The second time this chunk, the third

[634:01]

time, this chunk, and this chunk, and so

[634:02]

forth, back to back to back in memory,

[634:04]

but malloc is going to manage all of that

[634:06]

for you. You don't have to worry about

[634:07]

where it's coming from, but it's coming

[634:09]

more generally from this big heap area.

[634:12]

But it turns out that the way computers

[634:14]

are designed is that the heap of course

[634:17]

sort of grows, and therefore grows downward,

[634:19]

again even though there's no notion of

[634:20]

up down inside of the computer but it

[634:22]

grows in this direction. But it'd be

[634:24]

nice to make use of this other area of

[634:27]

memory and that's what's called the

[634:28]

stack. And the stack is the area of

[634:30]

memory that's used anytime you create

[634:32]

local variables or call functions. So

[634:35]

again, malloc uses memory from up here

[634:38]

and functions and variables use memory

[634:40]

down here just because this is what

[634:42]

humans in a room decided years ago is

[634:44]

how the computer's memory would be used.

[634:45]

Therefore, the stack grows sort of

[634:47]

vertically much like stacking trays in a

[634:49]

cafeteria or the dining hall. They go

[634:51]

from bottom to top in this model. All

[634:53]

right. Well, let's consider for the

[634:55]

moment just how the stack is used

[634:56]

because we're using a main function in

[634:58]

this program. We're using a swap

[634:59]

function in this program. So I claim

[635:00]

that those functions are going to use

[635:02]

memory down here. Well, how are they

[635:04]

going to use it? And how is this in fact

[635:06]

bad for our current goal? Well, when you

[635:10]

call the main function, it uses this

[635:12]

chunk of memory here. Specifically, if

[635:14]

main has any arguments like command line

[635:17]

arguments, or if main has any local

[635:18]

variables, they end up down here in

[635:21]

memory. Meanwhile, when main calls

[635:23]

swap, swap gets the next available chunk

[635:25]

of memory above it, so to speak, in

[635:26]

memory, and any of its arguments or

[635:29]

local variables end up there. So when

[635:32]

main uh when swap is done executing it's

[635:34]

as though that memory disappears even

[635:36]

though the zeros and ones are still

[635:37]

there but the computer can now reuse

[635:39]

that same chunk of memory later. Ergo,

[635:41]

garbage values when functions are being

[635:43]

called going up and down conceptually

[635:45]

that's why you're getting remnants of

[635:46]

previous values in the computer's

[635:48]

memory. But let's focus on main for a

[635:49]

moment. In main in this program, recall

[635:51]

that I declared two variables X and Y. X

[635:54]

getting the value one Y getting the

[635:56]

value two per these two lines of code.

[635:58]

Then I called the swap function. So swap

[636:00]

is going to get its own chunk of memory,

[636:03]

more technically called a frame of

[636:05]

memory. And inside of that frame, it has

[636:07]

two arguments, A and B, and a local

[636:09]

variable called temp. So I'll draw them

[636:11]

as such. When you actually call swap

[636:14]

passing in X and Y, X and Y are passed

[636:16]

in by value, that is to say copy. So A

[636:20]

becomes a copy of X and B becomes a copy

[636:23]

of Y. So when this line of code or

[636:25]

rather this uh prototype for swap just

[636:28]

makes clear that it takes two arguments

[636:30]

a and b both of which are integers in

[636:31]

that same order. So x comma y uh lines

[636:35]

up with a comma b. So what happens then

[636:37]

inside of the swap function if a is a

[636:40]

copy of x and b is a copy of y. Well at

[636:43]

the moment it's equal to one and two

[636:45]

respectively. But consider this first

[636:47]

line of code int temp gets a. So temp

[636:50]

takes on the value of a. Next line of

[636:53]

code, A gets B. So A gets the value of

[636:57]

uh B. Sorry, which just happened.

[636:59]

Meanwhile, B gets the value of temp. So

[637:02]

B gets the value of temp. Now temp still

[637:05]

has a copy of one. So it's not quite

[637:07]

analogous to the liquid because

[637:09]

that glass is clearly now empty, but it

[637:12]

does contain remnants of what it once

[637:13]

did. But the key here is that A and B

[637:15]

have successfully been swapped. If I

[637:18]

were to print out A and B, I would see

[637:20]

that they've been swapped. But what has

[637:22]

obviously not been swapped in this

[637:24]

story? No one has touched X or Y because

[637:26]

when swap returns, especially if I don't

[637:29]

even print out anything in swap, X and Y

[637:32]

are unchanged. So A and B, the copies

[637:35]

were swapped but not the original

[637:37]

values. And that's the essence of the

[637:38]

problem here with this

[637:40]

simple uh example of swapping values

[637:42]

because I was passing by value. But as

[637:45]

of today, we now have a solution to this

[637:47]

problem. Because previously today, if I

[637:49]

asked you to write a function that

[637:50]

swapped two values, you could not

[637:52]

physically do it in code because you had

[637:54]

no way of expressing the solution to

[637:56]

this problem. But now we have the

[637:57]

ability to pass by reference. That is

[637:59]

use pointers and addresses more

[638:01]

generally to tell the function how to go

[638:03]

to an address and do something there.

[638:05]

How to go to another address and do

[638:06]

something there. How do I express this

[638:08]

syntactically? It's going to look a

[638:10]

little scary at first glance, but it's

[638:11]

just an application of today's new

[638:13]

building blocks. This bad version of the

[638:15]

program where a and b are both integers

[638:18]

just needs to change to be addresses of

[638:22]

integers. So give the function a sort of

[638:24]

treasure map that leads it to the actual

[638:26]

x and y by saying that a is now not

[638:29]

going to be an int per se but the

[638:30]

address of an int. b is going to be the

[638:32]

address of an int. And now to use those

[638:34]

values, you can say the following. int

[638:37]

temp gets whatever is at location A, go

[638:41]

to location A and put whatever is at

[638:43]

location B, go to location B and put in

[638:47]

the temp value. And here is a perfect

[638:49]

example of where this use and overuse of

[638:52]

the star or asterisk operator is just

[638:54]

like cognitively confusing frankly

[638:56]

because we use star for multiplication.

[638:58]

We use it for declaring a pointer. We

[638:59]

use it for dereferencing a pointer.

[639:00]

Ideally, humans years ago would have

[639:02]

come up with another symbol on the US

[639:03]

English keyboard to represent these

[639:05]

different ideas. But this is where we're

[639:07]

at. We're using the star for different

[639:09]

things in different contexts. So, this

[639:11]

just tells the computer that A is going

[639:13]

to be a pointer, an address of an int.

[639:15]

This tells the computer that B is going

[639:16]

to be the address of an int. This star

[639:19]

when there's no data type to the left of

[639:21]

it means go to that address, as does

[639:24]

every other example thereof. So, what's

[639:27]

happening this time? If we actually look

[639:29]

at the diagram again, X and Y are still

[639:31]

one and two respectively. Swap gets

[639:32]

called. It gets now the values of the

[639:36]

address of X and the address of Y. So

[639:39]

pictorially we might draw that as

[639:41]

following. A is pointing to X. B is

[639:44]

pointing to Y. I mean technically it's

[639:45]

like 0x123 and 0x12-whatever, but who

[639:48]

cares? We're just going to abstract it

[639:49]

away now with actual arrows or pointers.

[639:52]

The beauty of this now then is if we

[639:54]

look at the swap function, int temp gets

[639:57]

star a that means start at a and go

[639:59]

there, sort of Chutes and Ladders style, if

[640:01]

familiar with the game, and you find the

[640:03]

value one. So you put the value one

[640:05]

inside of temp which is why it's there.

[640:07]

Now meanwhile this next line of code go

[640:09]

to A's address go to B's address and

[640:12]

copy the latter to the former. So this

[640:14]

means go to A. This means go to B where

[640:17]

you find the two. So put the two where A

[640:20]

is pointing. Lastly, go to B and put

[640:23]

temp there. So that's easy. Go to B and

[640:25]

put temp, which is why we now have the

[640:27]

one. And the beauty of this now is that

[640:29]

when swap is done executing, this

[640:32]

memory, this frame sort of goes away

[640:34]

conceptually, even though the zeros and

[640:35]

ones are still there, but it's done

[640:37]

being used, but we have now mutated the

[640:39]

actual values of X and Y by giving them

[640:42]

a proverbial treasure map of the

[640:44]

addresses of X and Y, not copies of the

[640:47]

values themselves.

[640:50]

So hopefully this is the beginning of an

[640:52]

answer to like why is this stuff useful?

[640:54]

You can now solve a whole new class of

[640:56]

problem and even more next week. Other

[640:58]

uh questions though on any of the syntax

[641:01]

pictures or the like.

[641:05]

This is good use of pointers now instead

[641:07]

of bad. All right. So with that new

[641:10]

capability,

[641:12]

let us consider here

[641:16]

how things can still go wrong and why

[641:19]

indeed with this power comes that

[641:21]

responsibility. Well, if you consider

[641:23]

now the bad version of the code is

[641:25]

fixable via this good version of the

[641:26]

code, we've still left a big glaring

[641:28]

problem in the diagram itself. Designing

[641:30]

something that grows this way against

[641:32]

something that grows this way, like this

[641:34]

is not going to end well. Why? Because

[641:35]

the more you call malloc, the more memory

[641:37]

that gets used here. The more functions

[641:39]

you call, the more memory that gets used

[641:40]

here. And at some point, like they will

[641:42]

collide because the computer only has a

[641:44]

finite amount of memory. So how do you

[641:46]

avoid this situation? Like you kind of

[641:49]

don't like you honestly just make sure

[641:51]

that you minimize how much memory you're

[641:52]

using by calling malloc only as much as

[641:54]

you need to and not calling for a

[641:56]

million bytes of memory just because you

[641:57]

might need them. You only allocate what

[641:59]

memory you need. and you try not to call

[642:01]

functions again and again and again and

[642:03]

again and again and again without them

[642:04]

finally returning. So if you ever did

[642:06]

something recursive a a couple weeks ago

[642:08]

where you accidentally maybe called a

[642:10]

function that never had a base case

[642:12]

never divided and conquered and actually

[642:14]

shrunk the problem you could overflow

[642:17]

the stack or equivalently heap by just

[642:20]

using too many frames of memory. So it's

[642:23]

just a mistake in the programmer uh for

[642:25]

the program themselves. So if you've

[642:26]

ever heard these phrases now, which some

[642:28]

of you might have heap overflow or stack

[642:30]

overflow, there's a very popular website

[642:32]

called stack overflow. And this is the

[642:33]

etymology thereof. Like stack overflow

[642:35]

refers to this representative big

[642:37]

problem with computers' memories if

[642:40]

you're not mindful of how you're using

[642:42]

the computer's memory. And this is just

[642:44]

the way it is. If you've got finite

[642:46]

amount of anything, that resource can

[642:48]

eventually run out at which point

[642:50]

program will crash or something else

[642:52]

might very well go wrong. In fact, this

[642:54]

is a more specific example of

[642:56]

what are called buffer overflows. A

[642:58]

buffer overflow is generally just when a

[643:00]

chunk of memory like an array

[643:02]

actually just gets uh overflowed with

[643:05]

too many values, like allocating a

[643:07]

small array and trying to put too many

[643:09]

numbers therein. There's problems that

[643:12]

um and in fact you can see this very

[643:13]

simply if we take off those last of our

[643:15]

training wheels. So for instance these

[643:16]

are the functions in the CS50 library

[643:18]

get int get string and so forth. um

[643:21]

they're harder to take off these

[643:23]

training. It's harder to take off these

[643:25]

training wheels because C does not

[643:27]

fundamentally make it that easy to

[643:30]

manage memory yourself. So for instance,

[643:32]

let's focus for just a moment on get

[643:34]

int. I'm going to go over to VS Code

[643:37]

here in just a second and let's go ahead

[643:39]

and create our very simple program

[643:40]

called get.c whose purpose in life is to

[643:42]

just get an integer much like CS50's own

[643:45]

function. So, in get.c, I'm going to

[643:47]

propose that we write a program that

[643:50]

does a little something like this. Uh,

[643:55]

include cs50.h,

[643:57]

include stdio.h, and then inside

[644:00]

of main, let's go ahead and declare an

[644:03]

int n. Uh, set it equal to get int, and

[644:07]

we'll just ask the user for the value of

[644:08]

n. Then let's go ahead and print out n's

[644:11]

value verbatim back by just doing quote

[644:15]

unquote comma n. This program is simply

[644:18]

using the get int function in order to

[644:19]

get an int and store it in n. So let's run

[644:21]

it. make get, ./get. Type in a number

[644:24]

like 50. Seems to work. And yes, I think

[644:27]

this program is correct even though it

[644:28]

is using the CS50 training wheel of get

[644:30]

int. Let's stop using get int though. It

[644:34]

turns out that you don't have to use get

[644:35]

int if you instead use a function called

[644:37]

scanf which scans formatted input which

[644:40]

just means read something from the

[644:42]

keyboard into memory. This is

[644:44]

essentially what get string and get int are

[644:47]

using although that too is a bit of an

[644:49]

oversimplification but let's use it here

[644:51]

now as an opportunity to get rid of the

[644:54]

training wheel of the CS50 library

[644:56]

altogether and down here let's do this

[644:58]

instead of using get int let's declare a

[645:00]

variable n but not give it a value yet

[645:03]

let's now print out just a little prompt

[645:06]

just to tell the human what we want we

[645:07]

want them to type in a value for n and

[645:10]

now let's use this new function called

[645:12]

scanf and say scan from the user's

[645:14]

keyboard an integer represented by

[645:16]

percent i, our old friend and format

[645:19]

code. And please put the integer that

[645:22]

the human types in

[645:24]

in the variable n. This is slightly

[645:28]

buggy though because if I want a

[645:30]

function like scanf to be able to change

[645:32]

the value of a variable, just like the

[645:34]

swap function, I can't just pass in n. I

[645:39]

need to pass in the address of n here.

[645:42]

In fact, let's take a moment now to go

[645:45]

into the swap function which we knew to

[645:48]

be buggy before and actually update it

[645:50]

to match what we saw on the slides. I

[645:52]

claim that the problem is that we're

[645:53]

passing in originally x and y as one and

[645:55]

two into the swap function but therefore

[645:58]

we're passing in copies. But what if we

[646:00]

change the swap function to take indeed

[646:02]

the address of an int and the address of

[646:03]

an int. Let me change my prototype

[646:05]

accordingly because that too must be

[646:07]

changed. Then when I change this

[646:11]

function to take in those pointers, I

[646:14]

need to change my code to dereference

[646:15]

them. But there's one last thing I need

[646:17]

to do. I'm still on this line of swap

[646:20]

passing in X and Y, which is literally

[646:22]

the values X and Y. If I want to pass in

[646:24]

the address of X and the address of Y,

[646:26]

what other operator do I now need?

[646:30]

the ampersand x and the ampersand y to

[646:34]

pass in sort of the treasure map the

[646:37]

pointer to those two variables

[646:39]

locations. So if I open up my terminal

[646:41]

window now do make swap on this version

[646:44]

./swap, cross my fingers. Now this new

[646:47]

and improved version of swap as claimed

[646:49]

does actually swap the values the key

[646:51]

being swap now has access not to x and y

[646:54]

per se but to the addresses of x and y.

[646:58]

So if we now close out swap and go back

[647:01]

to get, here is the same principle

[647:03]

applied to scanf. If scanf exists and it

[647:06]

comes with c, its purpose in life is to

[647:08]

scan an integer from the keyboard and

[647:10]

put it somewhere you want. You can't

[647:11]

just give it the variable name because

[647:13]

it's going to get a copy of whatever

[647:14]

garbage value is in there. You have to

[647:16]

say put this answer at

[647:20]

the address of n itself. So lastly after

[647:24]

this, let me go ahead and print out n

[647:26]

colon and then percent i again as a

[647:28]

format code, backslash n, comma n. This line is

[647:32]

just my prompt because I just want the

[647:33]

human to know what they're being asked

[647:35]

for. This line is printing out n colon

[647:37]

and then the actual value. So the only

[647:39]

interesting part here is that I'm

[647:41]

declaring a variable called n, but I'm

[647:43]

not giving it a value myself, but I'm

[647:45]

using scanf instead of get int to scan

[647:48]

so to speak an integer from the keyboard

[647:50]

and put it at the address of n. So that

[647:53]

scanf has access to that value. So if I

[647:57]

now do make get without any cs50

[647:59]

library, ./get,

[648:01]

let's type in the number 50, I indeed

[648:03]

see the number spit back at me. And just

[648:05]

to be clear, printf uses these format

[648:07]

codes of percent i and so forth. Scanf

[648:10]

uses essentially the same format code.

[648:12]

So that's why I'm using percent i in

[648:13]

both places. Both functions per their

[648:15]

documentation are designed to do just

[648:17]

that. So this is great. We've gotten rid

[648:19]

of get int. Catch is that getting rid of

[648:22]

get string is much much harder. Why?

[648:24]

Well, let's try another example. Let's

[648:26]

go ahead and try to get a string from

[648:28]

the user instead of just an int. So

[648:30]

we'll call it string s. But wait a

[648:32]

minute. CS50 library is not included. So

[648:34]

we need to use the actual thing that

[648:36]

this is. So char star s means give me a

[648:38]

variable that's going to store a string.

[648:40]

Let's go ahead and print out that prompt

[648:42]

just to prompt the user for s just for

[648:44]

clarity. Now let's use scanf and scan a

[648:48]

string with percent s and put it at

[648:51]

location s. Then let's go ahead and

[648:53]

print out just a reminder that the value

[648:55]

of s is now that passing in s. Now

[648:58]

there's something a little bit

[648:59]

different here. Notice that I've

[649:01]

deliberately not used an ampersand before

[649:04]

this s. Why, even though I did before the

[649:08]

n. Yeah.

[649:14]

>> Yeah. So I want to pass in the address

[649:17]

of the string which is if I may like

[649:19]

already s like s is by definition the

[649:21]

address of some string that is what a

[649:23]

char star is or rather it's the address

[649:25]

of a character but we know already that

[649:27]

if you lead it to the first character

[649:29]

whatever function can find the end of it

[649:31]

thanks to the null character except that

[649:34]

that's not going to be wholly true here

[649:35]

but I don't want to do ampersand here

[649:37]

because if s is an address doing

[649:38]

ampersand s would be the address of an

[649:40]

address which is actually a thing called

[649:42]

a pointer to a pointer, but none of that

[649:44]

today, but it's going to be correct as

[649:46]

written here. N was an integer, so I

[649:49]

needed the address of it. S is already a

[649:51]

pointer by definition. It's a char star,

[649:54]

so I don't use the ampersand here. But

[649:56]

the problem is this. If I now do make get,

[649:59]

./get, and let's type in a word

[650:02]

like how about hi.

[650:05]

Okay, it did work. Let me try something

[650:07]

even bigger like hi. Let's just hold

[650:11]

this down a lot. Uh, let's do how about

[650:13]

this? A really long string. Oh, come on.

[650:18]

Let's type in a really long string

[650:22]

like hi.

[650:26]

And it's always a gamble to see if I've

[650:28]

done this long enough, but okay, it

[650:31]

didn't break. Okay, you'd like to think

[650:33]

that this is correct, but let's go ahead

[650:35]

and do this. Valgrind of, uh, ./get,

[650:40]

enter. Let me maximize my screen. Oh,

[650:43]

uh, and let me go ahead and type in a

[650:45]

value for S. While Valgrind is running,

[650:46]

I'm going to type in hi exclamation

[650:48]

point. And now

[650:51]

lot, uh, let's actually scroll down to

[650:53]

the scroll up to the top of this. A lot

[650:55]

of errors seem to have happened here.

[650:58]

Use of uninitialized value of size

[651:00]

eight. Use of uninitialized value of

[651:02]

size eight. Like a lot of stuff is going

[651:04]

wrong here apparently on it looks like

[651:06]

maybe line four, which is quite early in

[651:08]

the program. And in fact, well, actually

[651:10]

that's not it. Uh, line

[651:13]

multiple lines of code here we're having

[651:15]

issues with. But why? Well, let's focus

[651:17]

on the code here alone for a moment.

[651:19]

Line five is giving me what? A variable

[651:21]

called S. That's the address of a char.

[651:24]

But what is S right now? Like what value

[651:26]

is in there?

[651:28]

>> It's a garbage value because there's no

[651:30]

equal sign involved. I'm just saying

[651:32]

give me space. Like give me eight bytes,

[651:34]

64 bits to store the address of a

[651:36]

character. But if I don't use the equal

[651:38]

sign and actually put anything there, it

[651:40]

is in fact just some garbage value. The

[651:41]

printf is uninteresting. It's just

[651:43]

printing out s colon. Scanf though is saying

[651:46]

go to this address and store the

[651:48]

characters that the human typed in. But

[651:50]

that means like following the wiggly

[651:52]

line that we drew on the screen before

[651:53]

because we have no idea where S is

[651:55]

pointing. It might be there, there,

[651:56]

there, there. You're putting the string

[651:58]

at a bogus location in memory. You

[652:01]

haven't actually allocated memory. So

[652:03]

when you then try to print it, you're

[652:05]

just trusting that you're going to

[652:06]

memory again that you control. So what

[652:09]

is the solution here? Well, there's a

[652:10]

few different ways we could solve this.

[652:12]

We could do something like this.
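
For reference, the fix being typed at this point might look like the following minimal sketch. The helper name read_word and its FILE * parameter are assumptions made for illustration, not code from the lecture. It allocates four bytes and caps the read at three characters so there is always room for the null character. The width in "%3s" goes slightly beyond what is described on screen; it is included so that longer input is cut off rather than written past the buffer.

```c
#include <stdio.h>
#include <stdlib.h>

// Hypothetical helper: reads one word of at most 3 characters from `in`.
// Returns a malloc'd, null-terminated string that the caller must free,
// or NULL if allocation fails or no word could be read.
char *read_word(FILE *in)
{
    // 4 bytes: room for "hi!" plus the terminating '\0'
    char *s = malloc(4);
    if (s == NULL)
    {
        return NULL;
    }

    // The width 3 caps how many characters fscanf stores,
    // so the write always stays inside the 4 allocated bytes
    if (fscanf(in, "%3s", s) != 1)
    {
        free(s);
        return NULL;
    }
    return s;
}
```

Calling read_word(stdin) and typing hi! would return "hi!"; typing hello would return only "hel", which is the line in the sand the lecture goes on to describe.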

[652:14]

Actually allocate space for like four

[652:16]

bytes so that the human can safely type

[652:19]

in uh so the human can safely type in

[652:23]

hi exclamation point with room for the

[652:25]

null character. We could change S to

[652:27]

actually be an array of size four

[652:28]

because we can treat arrays as though

[652:31]

they're addresses and addresses as

[652:32]

though they're arrays. It turns out that

[652:33]

syntactic sugar really goes in both

[652:35]

directions. This too would solve that

[652:37]

problem. Or better still, we wouldn't

[652:39]

use scanf at all because how do I know

[652:42]

how many characters the human's going to

[652:43]

type in? Like this was a question too

[652:45]

that came up during break. Well, hi

[652:46]

will fit in four bytes with the null

[652:48]

character. Bye will not. So maybe I need

[652:52]

five. Well, what if they type in a

[652:53]

longer word? Six. Well, maybe the longer

[652:55]

words, seven. Well, maybe a hundred or

[652:57]

maybe a thousand or 10,000 or 100,000 or

[653:00]

a million. Like, at some point, you've

[653:01]

got to draw a line in the sand and say

[653:03]

you can't type in something longer than

[653:05]

this. And you see this in applications

[653:06]

all the time. Like on the web, you can

[653:08]

only type in so many characters

[653:09]

sometimes into forms. And that's for

[653:11]

various reasons. Among them is this. Get

[653:14]

string though will handle almost an

[653:16]

infinite number of characters because

[653:18]

the way we implemented get string is to

[653:20]

take baby steps through the input. When

[653:22]

you type in a word on the keyboard or

[653:23]

even a paragraph on the keyboard, we, get

[653:25]

string's implementers, call malloc

[653:28]

essentially again and again and again

[653:30]

and again just asking for one more byte

[653:31]

if we need it, one more byte if we need

[653:33]

it, one more byte so that you don't have

[653:34]

to worry about doing that. The problem

[653:36]

is if you were to write code yourself

[653:38]

without the CS50 library or someone

[653:40]

else's equivalent library, you have to

[653:42]

decide like how many bytes do you want

[653:45]

to allow and you have to trust that the

[653:46]

human is not going to mess around and

[653:48]

type in more values than you actually

[653:51]

expect. So what's happening with all of

[653:53]

these examples thus far is that if you

[653:55]

think of your memory as kind of a

[653:56]

minefield of garbage values, it wasn't a

[653:59]

problem when we declared n to have a

[654:01]

value of 50 because we told scanf to go

[654:04]

to that address and put the number 50

[654:06]

there and it fits. That's fine because

[654:08]

an int is always four bytes in this

[654:10]

case. Who knows how many times the human

[654:11]

is going to hit the keyboard when typing

[654:13]

in a string. Could be three or four or a

[654:15]

million or anything else. So when we

[654:18]

declare S here to be a pointer, it takes

[654:21]

up eight bytes per the Oscar the grouch

[654:23]

Oscar is the grouch here whereby that's

[654:26]

eight garbage values that collectively

[654:28]

represent that address at the moment

[654:30]

because we've not assigned it to any

[654:32]

other value. So if we try to tell scanf

[654:35]

go to this address and store hi or

[654:38]

anything else there like who knows where

[654:40]

it's going to end up in memory hence the

[654:41]

squiggly line again and the program will

[654:43]

quite often crash. I didn't get it

[654:45]

because I didn't type in long enough of

[654:46]

a string, but it would eventually, if I

[654:48]

tried hard enough, crash because you're

[654:50]

touching memory that you yourself did

[654:52]

not allocate as an array via malloc or

[654:55]

some other mechanism. So, what is the

[654:58]

solution? Honestly, like don't use C for

[655:01]

user input like this unless you're

[655:03]

prepared to implement that complexity

[655:04]

yourself. Use the CS50 library or some

[655:06]

other library. This too is why in two

[655:08]

weeks we're going to switch to Python

[655:09]

because Python makes life so much easier

[655:12]

when it comes to basic things like

[655:14]

getting user input as do many other

[655:16]

modern languages. But those languages

[655:19]

just have code that other humans have

[655:21]

written to solve these problems for you.
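
The grow-as-you-go idea described above can be sketched as follows. This is not the CS50 library's actual source; read_line is a hypothetical name, and the one-byte-at-a-time growth simply mirrors the spoken description of asking for one more byte whenever it is needed.

```c
#include <stdio.h>
#include <stdlib.h>

// Hypothetical helper: reads one line of any length from `in`.
// Returns a malloc'd string without the trailing newline (caller frees),
// or NULL on allocation failure or when no input remains.
char *read_line(FILE *in)
{
    char *buffer = NULL;
    size_t length = 0;
    int c;

    while ((c = fgetc(in)) != EOF && c != '\n')
    {
        // Ask for room for this character plus the terminating '\0'
        char *bigger = realloc(buffer, length + 2);
        if (bigger == NULL)
        {
            free(buffer);
            return NULL;
        }
        buffer = bigger;
        buffer[length++] = (char) c;
        buffer[length] = '\0';
    }

    // Nothing was read before the end of input
    if (buffer == NULL && c == EOF)
    {
        return NULL;
    }

    // An empty line (just a newline) still yields an empty string
    if (buffer == NULL)
    {
        buffer = malloc(1);
        if (buffer != NULL)
        {
            buffer[0] = '\0';
        }
    }
    return buffer;
}
```

Because the buffer grows with the input, the caller never has to guess a maximum length up front; the cost is one realloc call per character, which a production version would likely reduce by growing in larger steps.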

[655:23]

So these problems exist but they'll be

[655:25]

abstracted away for you. All right,

[655:29]

let's tie this now together with where

[655:30]

we began, which was to convey ultimately

[655:34]

that we want to have uh the ability now

[655:37]

to actually access files. And we

[655:39]

introduce now a topic called file IO. IO

[655:42]

for input and output. A file is just a

[655:44]

bunch of bytes that are stored on disk,

[655:46]

where disk might mean a hard drive, the

[655:47]

thing that spins around with a platter

[655:49]

with lots of zeros and ones on it, or an

[655:50]

SSD, a solid state drive, which has, uh, no

[655:53]

moving parts nowadays and generally

[655:54]

where our data is stored long term.

[655:56]

Whereas RAM, random access memory, the

[655:58]

y, the yellow pictures we've been

[656:00]

drawing, is volatile. That is to say,

[656:02]

when you lose power, the battery dies,

[656:04]

you lose everything in RAM. On a hard

[656:06]

drive or a solid state drive, that's

[656:07]

persistent storage or nonvolatile

[656:09]

storage, which means when the power goes

[656:11]

out, thankfully, you don't lose all of

[656:12]

your documents and essays and so forth,

[656:14]

whether it's on your Mac or PC or

[656:16]

somewhere in the cloud. But we haven't

[656:18]

yet seen any code via which you

[656:19]

yourselves can create files. Like

[656:21]

literally every program we've written,

[656:22]

even the phone book example last time

[656:24]

when I typed in names and numbers, they

[656:26]

got deleted as soon as the program quit

[656:28]

and ended. So with File IO though, we

[656:31]

have the ability now to start creating,

[656:33]

saving, editing, deleting files much

[656:36]

like you would from the file menu of

[656:38]

Google Docs, Microsoft Word, or the

[656:40]

like. Here are just some of the

[656:41]

functions that come with the programming

[656:43]

language C that allow you to open files

[656:46]

aka fopen, close files, aka fclose, print to

[656:50]

a file, scan from a file, read a file,

[656:52]

write to a file, lots of different

[656:54]

functions, some of which we'll explore

[656:55]

this coming week. But why don't we first

[656:56]

use them to solve a problem here in VS

[656:59]

Code. So, let me go ahead and close

[657:00]

get.c. Let's go ahead and open up a new

[657:03]

program called phonebook.c, but

[657:05]

implement a persistent version of it

[657:07]

ultimately that doesn't just get deleted

[657:10]

from memory when the program quits.

[657:12]

Let's go ahead and only because it will

[657:14]

make life easier, let's include the CS50

[657:16]

library still for this. Let's include

[657:18]

standard io.h for this. And let's

[657:20]

include string.h for this. Then inside

[657:24]

of main, no command line arguments,

[657:26]

let's go ahead and open a file called

[657:29]

phonebook.csv.

[657:31]

CSV stands for comma-separated values.

[657:33]

Many of you have probably used them in

[657:35]

the real world. They're like very

[657:36]

lightweight spreadsheets where things

[657:37]

are effectively stored in rows and

[657:39]

columns where the columns are

[657:40]

represented by just commas between

[657:42]

values. And we'll see this in just a

[657:43]

moment. How do you open a new file

[657:46]

called phonebook.csv?

[657:48]

Well, I'm going to do FILE star file

[657:51]

equals fopen of phonebook.csv.

[657:55]

And then I'm going to do quote unquote w

[657:57]

for write. So what's going on here? fopen

[658:00]

is opening a file whether or not it

[658:02]

exists yet called phonebook.csv

[658:05]

and it's opening it in such a way that I

[658:07]

will be allowed to write to it. Hence

[658:09]

the quote unquote w per the

[658:10]

documentation it means I can write to

[658:12]

this file and not just read it. The

[658:14]

return value is going to be stored in a

[658:16]

variable called file. All lowercase by

[658:18]

convention but that file is technically

[658:21]

a struct called FILE in all caps. It's a

[658:25]

little weird. It's among the few things

[658:26]

that is fully capitalized in C. It

[658:27]

doesn't mean it's a constant or anything

[658:28]

like that. It's just how someone

[658:30]

implemented it years ago. This is giving

[658:32]

me a pointer to essentially the contents

[658:35]

of that file. That's a bit of a white

[658:36]

lie. Technically it's giving you a pointer to

[658:38]

a chunk of memory that represents that

[658:40]

file, but for all intents and purposes,

[658:42]

it's a pointer to the file for now. Now,

[658:44]

let's go ahead and ask the user for a

[658:46]

name and number to add to this phone

[658:47]

book. Let's do char star name equals get

[658:50]

string uh quote unquote name to prompt

[658:52]

the human for that. Char star number. Let's

[658:55]

prompt them for that. and do it with

[658:58]

this. And I could be using the string

[659:00]

data type, but I'm trying to at least

[659:01]

remove what training wheels we don't

[659:02]

technically need anymore. And now that

[659:04]

we've got a name and number in

[659:06]

variables, let's print them to the file.

[659:08]

That is, let's save them to the file.

[659:10]

Instead of printf, we're going to use

[659:12]

fprintf. We're going to specify what

[659:14]

file we want to print to in case we have

[659:17]

multiple ones open. What do I want to

[659:20]

print? A string followed by a string

[659:22]

followed by a new line. ergo comma

[659:25]

separated values one after the other per

[659:27]

line. Then I'm gonna pass in the values

[659:30]

name and number respectively. And now

[659:33]

I'm going to go ahead and

[659:36]

do fclose to close that file so that

[659:39]

it's effectively saved. All right. So

[659:41]

let me go ahead and demonstrate first

[659:43]

that phonebook.csv

[659:45]

does not really exist. It's empty

[659:46]

initially. Let me go ahead and scooch it

[659:48]

over to the right here so we can see

[659:49]

both at the same time. I'm now going to

[659:50]

do make phonebook. Enter. So far so

[659:53]

good. Dot slash phonebook and let me go

[659:56]

ahead and type in for instance uh let's

[659:58]

see uh my name 617495

[660:02]

1000 and watch the top right of your

[660:04]

screen as the program f writes to it and

[660:07]

f closes the contents. All good. All

[660:10]

right, let's run it again because maybe

[660:12]

like the iOS app or the Android app, I'm

[660:13]

adding new friends to my phone book

[660:15]

here. So, I'm going to do dot slash phonebook

[660:17]

and I'm going to go ahead and, uh-oh, top

[660:20]

right just got turned blank. Well, let's

[660:23]

try this. Kelly 6174951000.

[660:26]

Enter. Okay, she's back. Let me run it

[660:29]

again. Dot slash phonebook. Gone. Well, what's

[660:32]

going on here?

[660:35]

It's not persisting at least as long as

[660:37]

I would like. It seems to be the case

[660:40]

that like writing to a file means

[660:42]

literally rewrite the file. So if you

[660:44]

use W, you're going to write to the

[660:46]

file, but literally starting at the

[660:48]

first byte. If you want to be smart

[660:49]

about it and append to the file, well,

[660:51]

per the documentation for fopen, you

[660:53]

instead use quote unquote A for append

[660:55]

instead of quote unquote W for write.

[660:57]

This is a convention in other languages,

[660:59]

too. All right, let's start this over.

[661:00]

Let me go ahead and recompile this

[661:02]

program. Make phonebook. Now, let me do

[661:04]

dot slash phonebook. I'll type in my name again

[661:05]

first. 6174951000.

[661:08]

Enter. So far so good. Phonebook. So far

[661:12]

so good. Kelly 6174951000.

[661:16]

Enter. And now we're on our way. In

[661:18]

fact, I can close this file. I can close

[661:20]

this file. I can then open up

[661:22]

phonebook.csv.

[661:24]

And indeed, it has persisted. And in

[661:27]

fact, if I downloaded this file onto my

[661:29]

Mac or my PC, I could then right-click it

[661:31]

or double click on it and probably open

[661:32]

it in Microsoft Excel or Apple Numbers.

[661:34]

I could import it into Google Sheets or

[661:36]

any number of other spreadsheet tools

[661:38]

because now I am persisting and writing

[661:40]

files of my own.
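
Pulled together, the append version of the phone book might look like this minimal sketch. The helper name append_entry and its parameters are assumptions made for illustration; the lecture writes this logic directly in main and gets the name and number from get_string. The NULL check is included here as well, since fopen can fail.

```c
#include <stdio.h>

// Hypothetical helper: appends one "name,number" row to the CSV at `path`.
// Returns 0 on success, 1 if the file could not be opened.
int append_entry(const char *path, const char *name, const char *number)
{
    // "a" appends to the end of the file; "w" would truncate it first
    FILE *file = fopen(path, "a");
    if (file == NULL)
    {
        return 1;
    }

    // One row per call: two strings separated by a comma, then a newline
    fprintf(file, "%s,%s\n", name, number);
    fclose(file);
    return 0;
}
```

Calling append_entry("phonebook.csv", "David", "6174951000") and then append_entry("phonebook.csv", "Kelly", "6174951000") leaves two rows in the file, whereas opening with "w" would keep only the last one.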

[661:43]

Questions on any of the techniques we

[661:46]

just tried out here.

[661:49]

If we really want to be nitpicky, like

[661:51]

technically I should fix one bug or

[661:53]

missed opportunity if I open up

[661:55]

phonebook.c, I'm going to propose that

[661:57]

as with any use of pointers and

[662:00]

addresses more generally. Here too,

[662:02]

something could be wrong like maybe I'm

[662:04]

just out of space and so fopen can't

[662:06]

physically open the file for me. So here

[662:08]

too, I should check if file equals

[662:09]

equals NULL. Okay, fine. Return one and

[662:12]

then maybe at the very bottom here I

[662:13]

return zero to make clear nope nope if I

[662:15]

get this far all is well. So in short

[662:17]

anytime you are dealing now with

[662:19]

pointers you should be checking the

[662:20]

return values to see if all in fact went

[662:24]

well. Yeah

[662:31]

>> yes everything we are using is part of

[662:33]

standard io.h, which is wonderfully

[662:36]

useful now because it has not just printf

[662:38]

but fprintf and so forth as well. Good

[662:42]

questions. Yeah.

[662:49]

>> Yes. So we have how are pointers used in

[662:52]

this code? The short answer is you have

[662:54]

to use pointers because this is how C

[662:56]

designed files to work. So, we couldn't

[662:59]

really introduce you all to files, file

[663:01]

IO in week one or two or three because

[663:04]

we had it. We'd have to introduce like

[663:05]

this stupid little character to you and

[663:06]

you'd be like, "What does this mean?

[663:07]

It's not multiplication." Because the

[663:10]

way file IO works is that when you open

[663:12]

a file, you are essentially handed the

[663:14]

address of that file in memory. That's

[663:16]

an oversimplification. You're

[663:17]

technically handed the address of a data

[663:19]

structure in memory that references the

[663:21]

file actually on disk. But for all

[663:23]

intents and purposes, as I said, this

[663:25]

gives you a pointer to the contents of

[663:27]

the file. And if you want to write to

[663:29]

the file, you need to then use fprintf

[663:32]

in this case, tell it what file to

[663:34]

write to. So you can go there and then

[663:36]

store something like this string with

[663:38]

these values plugged in. So in short, in

[663:40]

C without pointers, you just can't do

[663:42]

file IO unless it's abstracted away for

[663:44]

you by some library. Good question.

[663:47]

Other questions on file IO?

[663:52]

All right. Well, let me do one other

[663:54]

example here that's a little reminiscent

[663:56]

of things we see all the time on our

[663:57]

phones and laptops and desktops, like

[663:59]

these progress bars for like video

[664:00]

players. And you're all probably

[664:01]

generally familiar with the term like

[664:03]

buffering. If only because YouTube and

[664:04]

other apps when they are slow or you

[664:06]

have a slow internet connection, they

[664:07]

might say buffering dot dot dot. Well,

[664:09]

what does that mean? Well, a buffer is

[664:11]

just a chunk of memory. More

[664:12]

specifically, it's often an array that

[664:14]

is only a finite size that stores bytes

[664:16]

of stuff. Well, in the context of a

[664:18]

video player, for instance, this red

[664:19]

line here, which represents you're that

[664:20]

way through that much through the video,

[664:22]

it's an array that stores like the next

[664:24]

few bytes of a video. And ideally, if

[664:25]

you have a fast enough connection, when

[664:27]

you hit play, those bytes keep getting

[664:28]

downloaded and added to the buffer. And

[664:30]

hopefully, you don't finish watching the

[664:33]

bytes that have been downloaded before

[664:34]

more bytes have been downloaded. So, a

[664:36]

buffer is just a chunk of memory or more

[664:38]

specifically an array in a language like

[664:41]

C. Well, just to demonstrate how else

[664:43]

you can do things with file IO, let me

[664:45]

propose that we write a simple little

[664:47]

program that is our own implementation

[664:50]

of the CP program, the copy program that

[664:52]

we've used a few times already that

[664:54]

allows you in your terminal window to

[664:57]

copy one file to another, likening it to

[664:59]

this idea of a progress bar, where byte

[665:02]

by byte, you want to do something,

[665:04]

namely in this case, copy it, not watch

[665:06]

it instead. So, let me go in VS Code and

[665:08]

code up a program called cp.c. And in

[665:11]

this program, I'm going to go ahead and

[665:12]

include standard io.h at the top. I'm

[665:15]

going to then give myself a main

[665:17]

function that this time does take

[665:19]

finally a command line argument via int

[665:22]

argc and our old friend string, uh, argv

[665:28]

which today we can now reveal to be also

[665:32]

just a char star. In fact, this is how

[665:36]

we could now technically write the

[665:37]

declaration for main because string no

[665:39]

longer exists without the CS50 library

[665:41]

per se. So that's really what's been

[665:43]

going on this whole time. Now, let me go

[665:44]

ahead and do this. I want to be able to

[665:46]

write a program that takes two command

[665:48]

line arguments actually. The name of the

[665:50]

file to copy and the name of the new

[665:52]

file to create from it. So let's go

[665:55]

ahead and create a file using the same

[665:57]

syntax as before called src for short,

[666:00]

source as is a convention. And let's

[666:02]

open a file using

[666:05]

uh the file name argv bracket one. So

[666:08]

the first word the human types and let's

[666:11]

go ahead and open it in read mode

[666:14]

because I want to read the source and

[666:16]

write to the destination. My next file

[666:19]

FILE star dst, destination for short, will

[666:21]

be fopen of argv 2,

[666:25]

quote unquote w for write. Now why one and two

[666:28]

and not zero and one? Zero is the name

[666:30]

of the program which is not interesting.

[666:32]

One and two will contain the next two

[666:34]

words that the human types. Now let me

[666:36]

propose that I want to copy this file

[666:38]

from source to destination byte by byte

[666:41]

similar in spirit to a buffer like this

[666:43]

where you're just grabbing from the

[666:44]

internet one byte of the video at a time

[666:46]

so as to watch it. In this case I want

[666:48]

to copy it. So how can I do this? Well

[666:50]

we don't have a data type per se for

[666:52]

representing a byte, eight bits. However,

[666:55]

a common convention is to actually use

[666:57]

our new friend typedef and simply

[666:59]

declare byte to be something significant

[667:01]

or something specific. So, let me

[667:03]

declare a type, uh, called BYTE. And what

[667:06]

is a byte going to be? Well, it ideally

[667:08]

is just a char because a char we know is

[667:10]

one byte or eight bits. But recall that

[667:14]

chars can be treated as integers and

[667:16]

integers of course can be positive and

[667:17]

negative. So even though this is a

[667:19]

little esoteric, technically I want to

[667:21]

define a byte to be what we'll call an

[667:23]

unsigned char, which is probably a

[667:25]

keyword you haven't yet seen. But it

[667:27]

just tells the compiler that this char

[667:29]

that is this sequence of eight bits

[667:31]

cannot be interpreted as a negative

[667:32]

number because I am not doing anything

[667:34]

with math. These are just raw bytes or

[667:36]

eight bits. So now down here I can give

[667:39]

myself a byte and I'll call it b for

[667:42]

short. And now I'm going to write a loop

[667:44]

similar in spirit to what YouTube and

[667:46]

other players are probably doing which

[667:47]

just iterates over a file byte by byte

[667:50]

making in our case a copy thereof. So

[667:52]

while I am reading from this file into

[667:57]

this byte the size of one byte one at a

[668:00]

time into this destination.

[668:04]

Go ahead and check that I've read at

[668:06]

least one. So while the return value of

[668:09]

a new function called fread is not equal

[668:10]

to zero go ahead and

[668:14]

oops sorry source go ahead and call

[668:17]

fwrite, another new function, going to

[668:20]

that address of the byte grabbing the

[668:22]

size of it which happens to be one but

[668:24]

I'll use sizeof for consistency, grab

[668:26]

one such byte and write it to

[668:28]

destination this is a huge mouthful

[668:30]

admittedly the last thing of which I

[668:32]

need to do is close the destination so

[668:35]

as to save it close the original file

[668:37]

the source. Um, but this huge mouthful

[668:40]

which you'll get more familiar with the

[668:42]

next problem set is essentially saying

[668:44]

on line 12 while I can read one byte at

[668:47]

a time, write on line 14 that byte to

[668:51]

the file. Implementing essentially this

[668:53]

idea of the red progress bar going byte

[668:55]

to byte to byte reading one byte at a

[668:56]

time reading from one file the source

[668:59]

writing to the other the destination.
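
As a rough reconstruction of the program being described, here is a minimal sketch of the copy loop. Wrapping it in a function named copy_file is an assumption made for illustration (the lecture writes it in main using argv[1] and argv[2]), and the NULL checks and the "rb"/"wb" binary modes are additions that follow the advice given elsewhere in the lecture.

```c
#include <stdio.h>

// One raw byte; unsigned so its 8 bits are never read as a negative number
typedef unsigned char BYTE;

// Hypothetical helper: copies the file at `src_path` to `dst_path` byte by byte.
// Returns 0 on success, 1 if either file could not be opened.
int copy_file(const char *src_path, const char *dst_path)
{
    // "rb" and "wb" request binary mode for reading and writing
    FILE *src = fopen(src_path, "rb");
    if (src == NULL)
    {
        return 1;
    }

    FILE *dst = fopen(dst_path, "wb");
    if (dst == NULL)
    {
        fclose(src);
        return 1;
    }

    BYTE b;
    // fread returns how many items it read; 0 means end of file (or an error)
    while (fread(&b, sizeof(b), 1, src) != 0)
    {
        fwrite(&b, sizeof(b), 1, dst);
    }

    fclose(dst);
    fclose(src);
    return 0;
}
```

Reading a single byte per fread call keeps the loop easy to follow; passing a larger array as the buffer would copy the same data with far fewer calls.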

[669:01]

And here too to your question earlier

[669:03]

like why why pointers? This is the way

[669:06]

file IO is done. You have to be able to

[669:08]

express go to this address, go to this

[669:10]

file if you want to get data from it or

[669:13]

to it. And a minor refinement too,

[669:15]

technically when you open files, if

[669:17]

you know they're binary files, that is

[669:19]

zeros and ones and not ASCII or Unicode

[669:21]

text files, you can technically tell fopen

[669:23]

to write and read in binary mode. So

[669:26]

there's no mistaking the bits for

[669:28]

something other than raw data, an image

[669:31]

or otherwise. All right. So, if I go

[669:34]

ahead now and do make cp, it so far

[669:37]

compiles. Let's try this out. So, here

[669:40]

again is phonebook.csv.

[669:42]

Whoops. Here, that's phonebook.c. Here

[669:45]

again is phonebook.csv with two of us,

[669:47]

David and Kelly. Let's try to make a

[669:48]

copy of this file as follows. CP. So,

[669:51]

this is my version of the copy program,

[669:53]

not the one that comes with the system.

[669:55]

Let's copy phonebook.csv

[669:57]

into copy.csv.

[670:00]

Enter. Let's open now the copy of

[670:04]

the CSV. Enter. And voila. Thank god

[670:08]

like it actually worked. I have made a

[670:10]

byte-for-byte copy of this file using

[670:12]

syntax that was not available to us

[670:15]

until today. So who cares? And what's

[670:18]

the motivation? Well, it's a lot more

[670:19]

fun to treat not just text files and

[670:21]

these tiny little examples, but to

[670:22]

actually play with real world examples.

[670:24]

And in the next problem set, among the

[670:25]

things you'll do is experiment with BMP

[670:27]

files, bitmapped files, which

[670:29]

essentially just means a grid of pixels

[670:31]

top to bottom, left to right, much like

[670:34]

our cat uh that our volunteers at

[670:36]

class's start created for us. With a

[670:38]

bitmap file, you'll store in files

[670:40]

literal uh sequences of pixels or dots,

[670:43]

each of which is going to be represented

[670:44]

with a specific color, a red value, a

[670:46]

green value, and a blue value. And among

[670:48]

the things you'll be able to do given

[670:49]

such beautiful photos as this one of the

[670:51]

Weeks Bridge down by the Charles River

[670:52]

is actually make your own Instagram-like

[670:54]

filters to apply to photos like

[670:56]

this understanding now as you do or soon

[670:59]

will understand to be able to iterate

[671:02]

over the file top to bottom left to

[671:03]

right over each of the bytes therein and

[671:06]

somehow mutate the bytes to look a

[671:08]

little bit different. So if this is the

[671:09]

original photo, you might be able to

[671:10]

make it all grayscale by changing the

[671:12]

Rs, the G's and the B's to smaller

[671:14]

values somehow that are simpler values

[671:16]

that are just black and white and gray

[671:18]

tones. You might take that same photo as

[671:20]

input and give it more of a sepia tone

[671:21]

like an old school photograph instead.
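
The grayscale idea mentioned a moment ago can be sketched as follows. The pixel struct and the averaging rule are assumptions made for illustration: averaging the three channels is one common way to get a gray tone, and the lecture does not say which formula the problem set expects or how its pixel type is laid out.

```c
#include <stdint.h>

// Hypothetical pixel layout for illustration: one byte per color channel
typedef struct
{
    uint8_t red;
    uint8_t green;
    uint8_t blue;
} pixel;

// Converts `count` pixels to grayscale in place by setting each channel
// to the rounded average of the original three.
void grayscale(pixel *pixels, int count)
{
    for (int i = 0; i < count; i++)
    {
        int sum = pixels[i].red + pixels[i].green + pixels[i].blue;

        // Adding 1 before the integer division by 3 rounds to the nearest value
        uint8_t average = (uint8_t) ((sum + 1) / 3);

        // Equal red, green, and blue values always produce a shade of gray
        pixels[i].red = average;
        pixels[i].green = average;
        pixels[i].blue = average;
    }
}
```

A pure red pixel (255, 0, 0) becomes (85, 85, 85) under this rule, a medium-dark gray.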

[671:23]

You might actually reflect it like

[671:25]

actually put these bytes over here and

[671:27]

these bytes over here so as to create

[671:29]

the inverse of the image by reflecting

[671:31]

it over the vertical axis here. Or

[671:33]

you might even blur the image like this.

[671:36]

This is kind of a common feature in a

[671:37]

lot of photo editing programs to either

[671:39]

blur or deblur. Well, you can sort of do

[671:41]

a little bit of math and make every

[671:43]

pixel a little fuzzier by kind of

[671:45]

clouding what the human is actually

[671:47]

seeing. Or feeling more comfortable, you

[671:49]

can actually write code now that you

[671:51]

know how to manipulate files and

[671:52]

addresses thereof and actually do edge

[671:54]

detection and find the salient

[671:56]

characteristics of something like the

[671:57]

bridge to distinguish it from the sky

[671:59]

and actually find filter-like edges like

[672:02]

these. So, those are just some of the

[672:04]

problems that you're going to solve over

[672:05]

the coming week's problem set and

[672:06]

manipulating ultimately files like these

[672:09]

as well as JPEGs. And the last thing we

[672:11]

thought we'd end on is a sort of

[672:12]

computer science joke which for better

[672:13]

or for worse, you're now getting more

[672:14]

and more able to interpret. So, I'll

[672:18]

leave you dramatically with this here

[672:19]

famous joke.

[672:24]

Oh, that's more laughter than usual. All

[672:26]

right, that's it for week four. We will

[672:28]

see you next time.

[673:47]

All right, this is CS50 and this is week

[673:51]

five already uh wherein we will focus

[673:53]

today on data structures which is a

[673:55]

topic we've touched on a little bit in

[673:57]

in simple form, but today we'll dive

[673:59]

all the more deeply and for better or

[674:01]

for worse this is our last week on C uh

[674:04]

next week of course we transition to

[674:06]

Python which is a so-called higher level

[674:07]

programming language which is really

[674:09]

frankly just going to make our lives a

[674:10]

lot easier we're going to be able to

[674:11]

solve a lot of the same problems but so

[674:13]

much more quickly as humans but not

[674:16]

necessarily as we'll see as fast when we

[674:19]

run the code as the computer might have

[674:21]

if we were still using a lower level

[674:23]

language like C. So indeed thematic over

[674:25]

this week and next is going to be the

[674:26]

theme we've seen before of tradeoffs.

[674:29]

But before we get there, why don't we

[674:30]

focus on a couple of data structures

[674:33]

that you might encounter in the real

[674:35]

world. Uh, namely stacks and queues. Let's

[674:39]

learn some facts about both of these. If

[674:41]

we could dim the lights dramatically.

[674:48]

Once upon a time, there was a guy named

[674:50]

Jack. When it came to making friends,

[674:53]

Jack did not have the knack. So, Jack

[674:55]

went to talk to the most popular guy he

[674:57]

knew. He went up to Lou and asked, "What

[674:59]

do I do?" Lou saw that his friend was

[675:01]

really distressed. "Well," Lou began,

[675:04]

"Just look how you're dressed. Don't you

[675:06]

have any clothes with a different look?"

[675:08]

"Yes," said Jack. "I sure do. Come to my

[675:11]

house and I'll show them to you." So

[675:13]

they went off to Jack's and Jack showed

[675:15]

Lou the box where he kept all his shirts

[675:17]

and his pants and his socks. Lou said,

[675:19]

"I see you have all your clothes in a

[675:21]

pile. Why don't you wear some others

[675:23]

once in a while?" Jack said, "Well, when

[675:26]

I remove clothes and socks, I wash them

[675:28]

and put them away in the box. Then comes

[675:31]

the next morning and up I hop. I go to

[675:33]

the box and get my clothes off the top."

[675:36]

Lou quickly realized the problem with

[675:38]

Jack. He kept clothes, CDs, and books in

[675:40]

a stack. When he reached for something

[675:42]

to read or to wear, he chose the top

[675:45]

book or underwear. Then when he was

[675:47]

done, he would put it right back. Back

[675:49]

it would go on top of the stack. I know

[675:52]

the solution, said a triumphant Lou. You

[675:54]

need to learn to start using a queue.

[675:56]

Lou took Jack's clothes and hung them in

[675:58]

a closet. And when he had emptied the

[676:00]

box, he just tossed it. Then he said,

[676:02]

"Now Jack, at the end of the day, put

[676:05]

your clothes in the left when you put

[676:06]

them away. Then tomorrow morning when

[676:08]

you see the sunshine, get your clothes

[676:10]

from the right, from the end of the

[676:12]

line. Don't you see? said Lou. It will

[676:15]

be so nice. You'll wear everything once

[676:17]

before you wear something twice. And

[676:19]

with everything in queues in his closet

[676:21]

and shelf, Jack started to feel quite

[676:24]

sure of himself. All thanks to Lou and

[676:26]

his wonderful queue.

[676:29]

All right. Our thanks to Professor

[676:30]

Shannon Duvall at Elon University who

[676:32]

kindly put together that animation. And

[676:34]

it's meant to paint a picture of a

[676:35]

couple of things that we've all

[676:36]

encountered in the real world. But more

[676:38]

technically, what we just saw were what

[676:39]

are known as abstract data types whereby

[676:42]

they're data structures in some sense,

[676:43]

but it's really about the design

[676:45]

thereof. What characteristics or

[676:47]

features or functionality these

[676:48]

structures offer irrespective of how

[676:50]

they are implemented in terms of lower

[676:52]

level implementation details, which is

[676:54]

to say you can implement, as we'll see,

[676:56]

queues and stacks in any number of ways,

[676:58]

which are going to have real world

[676:59]

implications for how you can actually

[677:01]

use them and what kinds of problems you

[677:02]

can solve with them. So let's consider

[677:04]

for instance queues in the first place. So

[677:05]

a queue is something you sort of experience

[677:07]

all the time. Anytime you go to a store

[677:09]

or go to some event for which you

[677:12]

have to line up in a so-called queue.

[677:14]

You'd ideally like there to be some

[677:15]

fairness property about that queue such

[677:17]

that if you got in line first you get

[677:19]

into the store first. You get to check

[677:20]

out first or some other such goal.

[677:22]

Meanwhile, the person who got there last

[677:24]

actually is at the end of the line and

[677:26]

stays at the end of the line and

[677:27]

therefore gets served or enters in at

[677:29]

the end. So queues have what a computer

[677:32]

scientist would say is a FIFO property.

[677:34]

First in first out. That is if you're

[677:36]

the first person in line, you're the

[677:37]

first person to get out of line. And for

[677:39]

many problems, that is a good solution.

[677:41]

Certainly if you're concerned with

[677:42]

fairness. But more technically, a queue

[677:46]

has what we'll call two operations: enqueue,

[677:48]

which is a fancy way of saying getting

[677:49]

in line, and dequeue, a fancy way of saying

[677:51]

getting out of the line from the front

[677:53]

of it. But those two operations, if you

[677:55]

think about it in code, could it be

[677:56]

implemented with different actual

[677:59]

details? And by that I mean this here is

[678:02]

one way that we could go about

[678:03]

implementing in C code a queue for a

[678:06]

bunch of people or persons who want to

[678:08]

line up for something. So for instance

[678:10]

we'll decree that this queue can hold no

[678:11]

more than 50 people like that's the

[678:13]

physical capacity and then we define a

[678:15]

structure which we've done a couple of

[678:17]

times in the past whereby this structure

[678:19]

has not only an array of persons that

[678:22]

we'll call people and that will be as

[678:24]

big as is the capacity. So this is an

[678:26]

array of size 50 for 50 such persons.

[678:29]

And then we're going to propose that we

[678:31]

also keep track in this implementation

[678:33]

of a queue of the current size of the

[678:35]

queue. So we're going to make a

[678:36]

distinction between the capacity like

[678:37]

how many total people can be there and

[678:39]

the size like actually how many people

[678:41]

are in line at that moment in time so

[678:43]

that you know which of the spots in the

[678:44]

array are effectively empty. And we're

[678:46]

going to call that whole structure a queue.
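
As a rough sketch of the structure just described (not necessarily the exact code on the lecture slide), an array-backed queue might look like this in C. The `person` type with a single `name` field and the `enqueue` and `dequeue` signatures are assumptions made for illustration, and `CAPACITY` is a `#define` here so that it can size an array inside a struct.

```c
#include <stdbool.h>

#define CAPACITY 50  // fixed at compile time, like the 50 in the lecture

// Hypothetical stand-in for the lecture's person type.
typedef struct
{
    const char *name;
} person;

typedef struct
{
    person people[CAPACITY];  // room for at most CAPACITY persons
    int size;                 // how many are actually in line right now
} queue;

// Get in line at the back. Returns false if the array is already full.
bool enqueue(queue *q, person p)
{
    if (q->size == CAPACITY)
    {
        return false;
    }
    q->people[q->size] = p;
    q->size++;
    return true;
}

// Leave from the front (first in, first out). Everyone else shifts
// forward one spot. Returns false if nobody is in line.
bool dequeue(queue *q, person *out)
{
    if (q->size == 0)
    {
        return false;
    }
    *out = q->people[0];
    for (int i = 1; i < q->size; i++)
    {
        q->people[i - 1] = q->people[i];
    }
    q->size--;
    return true;
}
```

This version shifts the array on every dequeue instead of remembering where the front is, which is only one of several possible designs. Once the line is full, `enqueue` can do nothing but report failure.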

[678:48]

Now the catch with this particular

[678:50]

implementation in code of a queue is that

[678:54]

there is inherent in it a limitation,

[678:57]

something you just kind of have to deal

[678:58]

with. And I see you nodding. What's

[678:59]

your instinct for this

[679:02]

>> for example 50 students

[679:04]

>> okay well I think you hit the nail on

[679:05]

the head in that it's only for 50

[679:07]

students or 50 people which means if a

[679:09]

51st person wants to get into line, you

[679:11]

literally have no means of remembering

[679:13]

them in this data structure so how do

[679:15]

you solve that well we could just

[679:16]

recompile our code after changing the 50

[679:18]

to like 51 or maybe 500 or 5,000. But

[679:22]

there's this trade-off because you

[679:24]

could still be undershooting the total

[679:26]

number of people trying to get into

[679:27]

maybe a big concert in the case of an

[679:29]

extreme. But at the same time, if you

[679:32]

overallocate memory using 5,000

[679:34]

locations in memory, what if only a few

[679:36]

people show up? Now you're just wasting

[679:38]

memory. And certainly at the end of the

[679:39]

day, you only have a finite amount of

[679:40]

memory in the computer. So you kind of

[679:42]

have to decide a priori, like before

[679:44]

compiling your code, how big is this

[679:46]

structure going to be? how much space

[679:48]

are you going to waste? And in the end,

[679:49]

it's all sort of stupid. It would be

[679:51]

ideal if instead we could just grow the

[679:53]

queue as needed and shrink it.

[679:55]

Essentially asking the operating system,

[679:57]

as we started doing last week, for more

[679:58]

memory and then giving it back if we

[680:00]

don't actually need that memory, which

[680:02]

is to say we can't really use an array in

[680:05]

this static sense. And by static, I mean

[680:07]

we're literally deciding in advance at

[680:09]

compilation time how big this thing is

[680:11]

going to be. As an aside, this is also a

[680:13]

bit annoying for implementing a queue

[680:15]

because you have to somehow keep track

[680:17]

of who is at the head of the queue, the

[680:19]

front of the queue, because as you start

[680:21]

plucking people off, you need to

[680:22]

remember who's the next person

[680:23]

effectively. But there are ways in code

[680:25]

that we could solve this. So let's

[680:27]

consider an alternative to a queue which

[680:28]

gives us very different properties,

[680:29]

namely a stack. And we saw that in the

[680:31]

animation whereby Jack used a stack

[680:34]

to put his clothes into a box so that

[680:36]

every time he got dressed he sort of

[680:38]

took the sweater from the top from the

[680:40]

top from the top and might never wear

[680:41]

anything other than black as a result.

[680:43]

If he does a wash before he actually

[680:44]

reaches the blue and the red sweater

[680:46]

there. So a stack as we've just seen has

[680:49]

a LIFO property to it. Last in first

[680:52]

out. So, if I do a load of laundry and I

[680:54]

plop some more sweaters on this stack,

[680:56]

well, I'm presumably going to use the

[680:58]

last sweater that went in first as

[681:00]

opposed to trying to create a mess and

[681:01]

like, you know, pull the bottommost

[681:02]

sweater out, which is just going to be a

[681:04]

little more effort than it would

[681:06]

be otherwise from just taking it from

[681:07]

the top. So, sometimes last in, first

[681:10]

out doesn't give you maybe this fairness

[681:12]

property you might want for other

[681:13]

problems, but it does give you an

[681:15]

efficiency, a convenience certainly. So,

[681:17]

maybe that might be compelling. And

[681:19]

stacks are actually everywhere, too. If

[681:20]

you've checked your Gmail recently, odds

[681:22]

are you've opened up gmail.com or

[681:24]

outlook.com and you've looked at your

[681:25]

inbox. And where does the new mail by

[681:27]

default end up? At the top. At the top.

[681:29]

At the top. And I dare say all of us are

[681:31]

guilty of sort of neglecting emails that

[681:33]

fall below the break or onto the next

[681:35]

page and sort of focusing only on the

[681:37]

last in and therefore replying to it

[681:39]

first out, which isn't great maybe for

[681:41]

the senders of those emails, but it's

[681:43]

just how those user interfaces are

[681:46]

implemented quite often unless you

[681:47]

override those default settings. So how

[681:49]

might we implement a stack? Well, we

[681:51]

need to implement more technically two

[681:53]

fundamental operations. The analogues of

[681:55]

enqueue and dequeue in the world of stacks are

[681:57]

called push, which means push something

[681:58]

onto the top of the stack, and pop,

[682:00]

which means remove something from the

[682:02]

top of the stack also. And the team

[682:05]

in the cafeterias and dining halls on

[682:06]

campus do this all day long. Any of the

[682:08]

cafeterias or dining halls that have

[682:10]

stacks of trays, of course, you put the

[682:12]

first tray at the bottom and then the

[682:13]

next tray and the next tray and the next

[682:14]

tray. And which tray do all of you pick

[682:16]

up? Well, presumably the one on the very

[682:18]

top because it's even harder to grab the

[682:19]

bottommost tray than it would be for

[682:21]

something like a sweater. As a result,

[682:23]

there's maybe undesirable properties

[682:24]

like maybe no one ever gets to the nasty

[682:26]

tray at the very bottom of the stack

[682:28]

because we're constantly replenishing

[682:29]

the top ones. But thanks to gravity,

[682:32]

like that just happens to be the most

[682:33]

appropriate data structure in the real

[682:35]

world for distributing things like trays

[682:37]

in a cafeteria. So, how might we

[682:39]

implement that idea in code? Well, funny

[682:41]

enough, we can pretty much use the exact

[682:43]

same structure. We could just rename queue

[682:46]

to stack because at the end of the day

[682:47]

we need to keep track of some number of

[682:50]

people, and maybe people is a weird

[682:51]

sort of analog here but we kept

[682:53]

everything else the same so why not that

[682:55]

but the size is also something we still

[682:57]

need to remember and it turns out it's a

[682:59]

little easier to implement a stack in

[683:00]

this way because you could always remove

[683:02]

it from the end of the array end of the

[683:04]

array and the first thing that went into

[683:06]

the stack the first in can always stay

[683:09]

at location zero for instance but

[683:11]

ultimately we could implement it in this

[683:13]

way but we have the same darn limitation

[683:14]

You can still only put 50 sweaters, 50

[683:17]

trays, 50 people into that stack data

[683:20]

structure. So this is just one

[683:22]

implementation approach. But that

[683:24]

doesn't mean that's necessarily a

[683:25]

limitation of stacks and queues. They're

[683:27]

abstract in the sense that we could do

[683:29]

better. We could maybe start to manage

[683:31]

our own memory, move away from

[683:32]

statically defining the total size of

[683:34]

this array and just start allocating and

[683:36]

deallocating, that is growing and

[683:38]

shrinking the data structure instead.
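
To make the fixed-size, array-based stack from a moment ago concrete, here is a minimal sketch in C. The `person` type and the `push` and `pop` signatures are assumptions for illustration; only the array-plus-size layout comes from the lecture. Notice that both operations touch only the end of the array, which is why this is a little easier to implement than the queue.

```c
#include <stdbool.h>

#define CAPACITY 50  // same compile-time limit as the queue

// Hypothetical stand-in for whatever is being stacked.
typedef struct
{
    const char *name;
} person;

typedef struct
{
    person people[CAPACITY];
    int size;  // number of items currently on the stack
} stack;

// Put an item on top. Returns false if the stack is full.
bool push(stack *s, person p)
{
    if (s->size == CAPACITY)
    {
        return false;
    }
    s->people[s->size] = p;
    s->size++;
    return true;
}

// Take the most recently pushed item off the top (last in, first out).
// Returns false if the stack is empty.
bool pop(stack *s, person *out)
{
    if (s->size == 0)
    {
        return false;
    }
    s->size--;
    *out = s->people[s->size];
    return true;
}
```

The first item pushed stays at location 0 the whole time, and nothing ever has to shift. The 50-item ceiling is still there, though.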

[683:40]

which is to say we can make these

[683:41]

abstract data types much less abstract

[683:44]

with actual implementations. Let's

[683:46]

consider a data structure that we saw, an

[683:48]

abstract data type that we saw early on

[683:50]

that we didn't necessarily give this

[683:52]

name. A dictionary is yet another

[683:54]

abstract data type that's sort of

[683:55]

everywhere in the world literally in the

[683:57]

world of dictionaries containing words

[683:59]

and their definitions. And you can think

[684:01]

of a dictionary really in the abstract

[684:02]

if you were to draw this on the

[684:04]

chalkboard as really just a two column

[684:05]

table whereby on the left is the word

[684:07]

and on the right is the definition. And

[684:09]

if it's a physical book, it's

[684:10]

essentially the same thing with lots of

[684:12]

columns of words on the left, often

[684:14]

bold-faced, and then the definitions

[684:15]

right next to them. You can also see

[684:17]

this in the context of like a phone

[684:19]

book, which is where we began the course

[684:20]

in week zero, where it's essentially a

[684:22]

dictionary of names and numbers instead

[684:24]

of words and definitions. And a computer

[684:26]

scientist would generalize the notion of

[684:28]

a dictionary further and just call the

[684:30]

thing on the left a key and the thing on

[684:32]

the right a value. And these things are

[684:35]

omnipresent in computing. And you're

[684:37]

going to start to see them all the more

[684:38]

today, next week, and beyond in that if

[684:40]

you just want to associate some piece of

[684:42]

data with another piece of data, a

[684:44]

so-called key value pair, a dictionary

[684:47]

is going to be your go-to data type. But

[684:50]

even these, too, we can implement in

[684:52]

different ways for reasons that we've

[684:54]

already seen. Like maybe there's only a

[684:56]

finite size to this dictionary if we're

[684:58]

using an array. Maybe we can do better

[684:59]

than that. And maybe a dictionary if

[685:01]

implemented one way is going to be fast.

[685:03]

Maybe if implemented another way is

[685:04]

going to be slow. So we'll consider

[685:06]

these other design possibilities today

[685:08]

too in the context of phone books and

[685:10]

other data structures as well. After

[685:12]

all, if you have an iPhone or an Android

[685:15]

phone and Apple or Google only decided

[685:17]

that you can have 50 friends because

[685:19]

they implemented the contacts app in an

[685:21]

array. I mean that would be an annoying

[685:23]

limitation. So presumably they've done

[685:26]

things a little more dynamically as

[685:27]

we'll do today. So let's focus on the

[685:30]

first of the data structures we saw back

[685:31]

in week 2. That is an array which recall

[685:33]

was just a chunk of memory where you can

[685:35]

store values in it back to back to back

[685:38]

and that was the fundamental definition.

[685:39]

The values are back to back to back or

[685:41]

contiguous in memory and as we've seen

[685:43]

we generally have to decide in advance

[685:45]

the size of an array. So for instance if

[685:47]

we want to store three values like 1 2

[685:49]

and three it might look pictorially like

[685:51]

this or in code let's go ahead and

[685:53]

implement this same idea and take a

[685:56]

moment to whip up our very first program

[685:57]

here and we'll call it, say, list.c. And

[686:00]

in this program, let's just do something

[686:02]

demonstrative of how you could use

[686:04]

arrays to store three things in memory.

[686:06]

It's quite simply the numbers 1 2 3, but

[686:08]

you can imagine it being three people's

[686:10]

names, three sweaters, three people, or

[686:12]

any other piece of data as well. So, I'm

[686:14]

going to go ahead and at the top of

[686:16]

list.c include stdio.h. I'm going to

[686:19]

then do int main(void). So, no command

[686:20]

line arguments. Then, I'm going to go

[686:22]

ahead and give myself an array of

[686:24]

integers of size three called list. And

[686:27]

that's how we've done that from week

[686:29]

two onward. Then just for the sake of

[686:31]

discussion, I'm going to hardcode some

[686:32]

representative values. So the first

[686:34]

value will be at location zero because

[686:36]

arrays are zero indexed. Then I'm going

[686:37]

to do the second value which will be

[686:39]

two. And then the third value which will

[686:41]

be at location two, but the value will

[686:44]

be three. Now just to prove that we've

[686:46]

stored this correctly in memory, let's

[686:47]

just do a quick for loop for int i

[686:49]

equals 0, i is less than 3,

[686:54]

i++.

[686:56]

And then inside of this for loop, I'm

[686:58]

just going to do a quick printf of

[686:59]

%i\n, printing out the

[687:02]

value of list at location i. So it's not

[687:05]

a useful program per se, but it gives us

[687:07]

an array to play with. It prints out

[687:08]

what's in it. So hopefully we will

[687:10]

see one, two, and three on the screen.
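
Written out, the code dictated so far would look roughly like the following. The lecture puts these statements directly inside `main` in list.c; in this sketch they are wrapped in a function with the hypothetical name `print_static_list` so that the snippet can be called from other code.

```c
#include <stdio.h>

// Store 1, 2, 3 in a fixed-size array and print each value on its own line.
// Returns 0, mirroring what main would return on success.
int print_static_list(void)
{
    // An array of three ints; the size is decided before compiling.
    int list[3];

    // Arrays are zero-indexed, so the three slots are 0, 1, and 2.
    list[0] = 1;
    list[1] = 2;
    list[2] = 3;

    // Print each value to confirm what is stored.
    for (int i = 0; i < 3; i++)
    {
        printf("%i\n", list[i]);
    }
    return 0;
}
```

Calling `print_static_list()` from `main` should print the three values, one per line.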

[687:12]

So let me make this list program,

[687:15]

./list, enter. And voila, we're on our way

[687:17]

going. All right. But what if now we

[687:20]

actually want to change that design

[687:23]

and be like, "Oh, shoot. I now have a

[687:24]

fourth number that I want to store or

[687:26]

just bought a fourth sweater or a fourth

[687:28]

person wants to get in line or I want to

[687:30]

add a fourth friend to my contacts."

[687:31]

Whatever the scenario might be, it

[687:33]

stands to reason that ideally you would

[687:35]

plop that fourth value right here in

[687:37]

memory so that everything remains

[687:38]

contiguous. You're still using an array.

[687:40]

Your code doesn't really have to change

[687:41]

except for the length. For all

[687:43]

intents and purposes, it's the same

[687:45]

implementation using just a bit more

[687:47]

memory. But recall that when you declare

[687:49]

an array of a fixed size, you only are

[687:52]

getting promised that chunk of memory,

[687:55]

not necessarily more memory to the

[687:57]

right, to the left, above or below

[687:58]

conceptually because recall in the

[688:00]

context of your whole computer, you've

[688:01]

got this canvas of memory, all of which

[688:04]

represent here bytes. And there could be

[688:06]

a whole bunch of actual values or

[688:07]

garbage values in memory. So in a more

[688:10]

complicated program, that 1 2 3 sure

[688:12]

might end up here. But if I also had

[688:14]

created a string in this program, "hello,

[688:17]

world" might have also ended up

[688:19]

right next to it in memory. Which means

[688:21]

I can't just plop the four here because

[688:23]

then if I'm still using that string

[688:24]

elsewhere in my program now it's going

[688:26]

to say something like 4ello, world instead of hello,

[688:28]

world because you're just claiming the h

[688:30]

that byte, as your own, which does not in

[688:33]

fact belong to your array. Of course

[688:35]

there looks like there's plenty of other

[688:37]

memory I could use here because these

[688:38]

garbage values represented by Oscar are

[688:41]

not being used. They've been used in the

[688:42]

past, but we treat garbage values as

[688:44]

memory we could reuse. Certainly. So,

[688:46]

wouldn't it be nice to maybe just plop

[688:48]

the 1 2 3 and four in this chunk of

[688:50]

memory over here? And I can totally do

[688:53]

that. But, of course, if I want to do

[688:54]

that, I got to copy the first three

[688:56]

values over and then put the fourth one

[688:58]

there and then presumably give back to

[689:00]

the operating system the memory I no

[689:02]

longer need. So, that in fact when using

[689:05]

arrays is a perfectly valid solution.

[689:09]

And I think we can go ahead and do this

[689:11]

in our same program. So let me go back

[689:12]

to VS Code here. And instead of

[689:15]

statically allocating memory for this

[689:18]

array and by static I mean literally

[689:20]

hard-coding the number three here

[689:22]

in a way that is permanent,

[689:24]

effectively. Let me go ahead and do this

[689:27]

instead. At the top of my code, let me

[689:29]

delete the static allocation of

[689:32]

that array from before. And now let me

[689:35]

leverage my understanding if still

[689:37]

preliminary of pointers and memory

[689:39]

management from this past week four to

[689:41]

just dynamically allocate a guess at how

[689:44]

much memory I need initially. So I'm

[689:46]

going to go ahead and use malloc and

[689:48]

allocate space for three integers but

[689:50]

integers take up a few bytes and it's

[689:52]

usually four, but just for good

[689:54]

measure I'm going to say times whatever

[689:56]

the size of an int is is the total

[689:58]

number of bytes I want. So presumably

[690:00]

it's going to be 3 * 4 equals 12. But

[690:02]

I'm generalizing it. But then recall

[690:03]

that malloc returns the address of that

[690:06]

chunk of memory, the address of the

[690:08]

first byte. So if I want to create an

[690:11]

array effectively called list, I can't

[690:14]

just do int list like this yet. But what

[690:18]

I could say is that all right now my

[690:20]

list variable is actually going to be

[690:22]

the address of an integer and set

[690:25]

malloc's return value equal to that. So

[690:28]

in code here what I've done is I'm

[690:29]

asking on the right hand side the

[690:30]

operating system please give me 12

[690:33]

contiguous bytes in memory. All of those

[690:35]

bytes of course can be numerically

[690:37]

addressed like 0x123, 0x124, 0x125.

[690:39]

We've had that story before. malloc

[690:42]

by definition returns the address of the

[690:43]

first such byte and it's on me to

[690:45]

remember that I allocated 12 if need be.

[690:48]

So I'm just storing the address of that

[690:50]

first byte in a pointer called list. But

[690:54]

recall from last week, there's this

[690:57]

functional equivalence we saw between

[690:59]

treating a pointer as an array and

[691:02]

sometimes even treating an array like a

[691:04]

pointer. The C language sort of lets

[691:07]

us do this conversion, if you will.

[691:10]

So what I could do here now is quite the

[691:12]

same syntax as before. I could say list

[691:16]

bracket 0 gets one, list bracket one

[691:19]

gets two, list bracket two gets three.

[691:22]

And even though I have this fancy new

[691:24]

line inspired by week four, the syntax

[691:27]

thereafter can be exactly the same. Why?
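
A minimal sketch of this dynamically allocated version follows. The function name `make_list` is hypothetical (the lecture writes these lines inside `main`), and the NULL check is added here as a precaution. The comments show that bracket notation and explicit pointer arithmetic reach the same ints.

```c
#include <stdlib.h>

// Allocate room for three ints on the heap and store 1, 2, 3 in them.
// Returns NULL if malloc fails. The caller is responsible for calling free.
int *make_list(void)
{
    // Ask for 3 * sizeof(int) bytes (typically 12); malloc returns the
    // address of the first byte, which is stored in the pointer list.
    int *list = malloc(3 * sizeof(int));
    if (list == NULL)
    {
        return NULL;
    }

    // Bracket notation works on a pointer just as it does on an array.
    list[0] = 1;       // same as *list = 1;
    list[1] = 2;       // same as *(list + 1) = 2;

    // The more arcane form, written out for comparison.
    *(list + 2) = 3;   // same as list[2] = 3;

    return list;
}
```

Since only an address comes back, it is up to the code that calls `make_list` to remember that the chunk holds three ints.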

[691:30]

Well, recall that these three lines here

[691:32]

using square bracket notation is just

[691:34]

syntactic sugar for the stuff we learned

[691:36]

last week. Specifically, I could instead

[691:39]

of doing list bracket zero, I could much

[691:41]

more arcanely say go to that address in

[691:45]

list and put the number one there,

[691:47]

please. I can say go to the address list

[691:52]

+ one and put the value two there. I

[691:55]

could then say finally go to the address

[691:57]

at list + two and put the number three

[692:00]

there. But this looks ridiculous and

[692:02]

even an experienced programmer

[692:04]

might not be inclined to do this if,

[692:05]

using fewer keystrokes and more

[692:07]

readable code, they could just do

[692:09]

instead what I did the first time

[692:10]

around, which is functionally the same,

[692:13]

and just treat that chunk of memory as

[692:15]

though it's an array. and the computer

[692:18]

will essentially do the requisite

[692:20]

pointer arithmetic to figure out where

[692:22]

to put one, two, and three. So even

[692:24]

though this is still kind of fresh, hot

[692:26]

off the press from last week, it's

[692:27]

exactly the same as we tinkered with

[692:29]

last week. So suppose now that some time

[692:33]

passes and I realize for the sake of the

[692:36]

story that oh shoot, I need more than

[692:38]

three integers. I need space for four so

[692:40]

as to achieve this picture in memory.

[692:43]

Well, I could of course just like delete

[692:44]

all that code, change the three to a

[692:46]

four, redo the whole thing, recompile

[692:48]

the code, rerun it. But let me propose

[692:50]

that we write our code in a way that

[692:51]

allows us to change our mind while the

[692:54]

program is running how much memory we

[692:56]

actually need. And case in point, if you

[692:57]

meet someone new, you want to add them

[692:59]

to your phone. Well, you obviously don't

[693:00]

want to have to wait for Apple to

[693:02]

recompile the contacts app, reboot your

[693:04]

phone just to add one more person. You

[693:06]

want the program just to ask the

[693:07]

operating system for more memory for

[693:10]

that new person. So in this case, let's

[693:12]

just pretend that some time passes and

[693:14]

now I want to go ahead and actually

[693:15]

change my mind and instead allocate

[693:18]

space for four integers instead. Well, I

[693:21]

could do something like this. I could

[693:22]

just say literally list equals malloc of

[693:26]

4*

[693:28]

sizeof(int), semicolon. I don't need to

[693:32]

redeclare list on line 13 because it

[693:34]

already exists from line five. But this

[693:37]

is bad because what have I done wrong

[693:40]

here in line 13? I've made a poor

[693:43]

decision. Yeah, in front.

[693:46]

>> You

[693:47]

like waste all the memory that

[693:49]

>> Yeah, I'm wasting all of the memory I

[693:51]

had from line five because I'm

[693:52]

essentially forgetting where it is. If

[693:55]

the list pointer is literally a pointer,

[693:58]

like a foam finger pointing somewhere in

[693:59]

memory, what I'm really doing is saying

[694:01]

point it over here now, but I've

[694:03]

completely lost track of those other

[694:05]

three integers in memory. And that's

[694:06]

what we described last week as a memory

[694:08]

leak, which you could find with

[694:09]

valgrind. And if you didn't find it or

[694:11]

fix it in code, eventually the computer

[694:12]

and the program would slow down over

[694:14]

time. So this is probably bad. It's not

[694:16]

good to just unilaterally change your

[694:18]

mind and say, "No, no, no, forget about

[694:20]

that memory. Give me a new chunk of

[694:21]

memory." especially if you want to copy

[694:23]

the old memory into the new, just like I

[694:25]

did a bit ago when trying to get the 1 2

[694:27]

3 into the bigger chunk of memory that

[694:30]

can fit 1 2 3 4. So, how might I do

[694:32]

this? Well, a temporary variable is kind

[694:34]

of our go-to solution anytime we need to

[694:35]

remember something in addition to

[694:38]

something we already have in mind. So,

[694:40]

let me just give myself a temporary

[694:41]

variable called tmp by convention for

[694:44]

short and set the return value of this

[694:46]

malloc call to that. And then what I could

[694:49]

do is something like this. Much like my

[694:51]

print statement earlier, I could do

[694:53]

another for loop and say for int i

[694:55]

equals 0, i is less than 3, i++. And

[695:00]

then in this for loop, I could say treat

[695:02]

that new chunk of memory as an array

[695:04]

like we can set the i-th location equal to

[695:08]

the i-th location in list. So these lines

[695:11]

here

[695:13]

copy old list into new list. It copies

[695:17]

those first three values. And then what

[695:19]

I bet I could do at the bottom here is

[695:20]

then just manually I can say go to the

[695:23]

fourth location which when you zero

[695:25]

index is technically bracket three and

[695:27]

set that equal to the number four. So

[695:30]

these lines here copy the one, the two,

[695:33]

and the three using a loop. And then

[695:34]

line 20 here at the moment just adds the

[695:36]

fourth value. And again, this is a

[695:38]

stupid sort of way to write code in that

[695:40]

if you want to put the four there, you

[695:42]

should have just done it earlier. I'm

[695:43]

just pretending that some time has

[695:44]

indeed passed in the program. and I've

[695:46]

changed my mind along the way and I want

[695:48]

to let the user add some value to

[695:50]

memory. Okay, but before we proceed

[695:53]

further, I dare say that there are some

[695:58]

other mistakes we should clean up. One

[696:00]

of the lessons I preached last week was

[696:02]

that anytime you use malloc, what should

[696:04]

you do or check for

[696:08]

is you should always what? You should

[696:10]

always free. So here I'm clearly not

[696:12]

freeing any memory. So I should

[696:14]

definitely do that. And there was one

[696:15]

other rule of thumb with memory. What

[696:17]

should you always do when using malloc?

[696:20]

Yeah.

[696:22]

>> Check to see if null came back, which

[696:24]

just means something is wrong, like it's

[696:26]

out of memory or something else went

[696:27]

wrong. And if you don't do that, your

[696:29]

program may very well crash with one of

[696:31]

those segmentation faults that we saw uh

[696:33]

briefly in the past. So, it makes the

[696:35]

code a lot more bloated, but it is good

[696:37]

practice. So, let's just check if the

[696:38]

list pointer I get back contains null.

[696:41]

There's no point continuing on. Let's

[696:43]

just go ahead and immediately return one

[696:46]

because something has indeed gone wrong.

[696:47]

And then down here under malloc again,

[696:50]

let's do the same. If the temporary

[696:52]

pointer also contains null, now let's go

[696:54]

ahead and similarly return one or any

[696:57]

other nonzero value. But here's a

[696:59]

subtlety and let me combine your two

[697:00]

ideas. If I immediately return one on

[697:03]

line 20 after the second malloc call

[697:06]

fails, what should I still go back and

[697:08]

do first?

[697:12]

Yeah. Yeah. You want to elaborate on

[697:13]

your first instinct?

[697:17]

>> Yeah. I want to still free the first

[697:18]

chunk of memory because if we execute

[697:20]

line five and all is well, which means

[697:22]

that line 6, 7, 8, and 9 don't apply.

[697:25]

Like it's not in fact null. We got back

[697:26]

a legitimate value. That means we have a

[697:29]

chunk of memory given to us for three

[697:31]

integers, which means it still exists

[697:33]

down here at line 19 and 20. So if I'm

[697:36]

ready now to abort this program and

[697:38]

return one to signify error, I first

[697:41]

want to free that original list and say

[697:43]

to the operating system, here's your

[697:45]

memory back. Now, as an aside, strictly

[697:47]

speaking, this is not necessary because

[697:48]

the moment the program itself quits, the

[697:50]

computer is just going to give back the

[697:52]

memory to the operating system. So when

[697:54]

programs quit, the memory leaks sort of

[697:56]

go away, but your code is still buggy.

[697:58]

And generally we're running software

[698:00]

that doesn't run for a split second but

[698:01]

for minutes, hours, days, continually,

[698:04]

in which case it's best practice to

[698:06]

squash these memory related bugs now.

[698:08]

Check for null, free any memory so that

[698:10]

you never indeed encounter these kinds

[698:13]

of leaks. All right, so let's forge

[698:14]

ahead a little bit more and let me

[698:17]

propose that after we have done the

[698:19]

copy, we now want to similarly free the

[698:23]

original list. However, what I think

[698:25]

we're going to want to do, after

[698:25]

freeing the original list, is remember

[698:31]

that the new list is effectively

[698:34]

that which we allocated the second time

[698:36]

around. So even though this program is

[698:38]

getting a little long, notice that what

[698:39]

I've just done is I've said, okay, store

[698:41]

in the list variable the address of this

[698:43]

new chunk of memory. So that list now

[698:45]

with a foam finger is effectively

[698:47]

pointing here instead of up here. But

[698:50]

before that, I made sure to free what my

[698:53]

finger was pointing at originally, the

[698:56]

list pointer. All right. Lastly, let's

[698:59]

just scroll down to the bottom of the

[699:00]

code here. I can manually change the

[699:02]

three to a four just to demonstrate that

[699:04]

I've stored all four values in here. And

[699:06]

then at the very end of the program, I

[699:07]

think I have to free the list again

[699:09]

because now list is pointing, a la the

[699:11]

foam finger, to the bigger chunk of

[699:12]

memory, the 1 2 3 4. And then I can go

[699:15]

ahead and return zero at the very end

[699:17]

because all is hopefully well at this

[699:20]

point. Let me go ahead and open my

[699:22]

terminal window again and make this

[699:23]

version of list. I made a lot of

[699:25]

mistakes here it seems. Let's scroll up

[699:27]

to the very first call to undeclared

[699:30]

library function malloc dot dot dot. What

[699:34]

have I apparently done wrong or

[699:37]

forgotten? What have I done wrong? Yeah.

[699:40]

In back. Yep. Yeah. So stdlib.h

[699:44]

is where malloc is actually declared.

[699:45]

So let's just add that quickly. Let's go

[699:47]

ahead and include stdlib.h in

[699:50]

addition to stdio.h. Let me clear

[699:52]

my terminal window. Rerun make list.

[699:55]

Enter. Now we're good. ./list. And phew,

[699:58]

we see 1 2 3 4. Okay. So at this point

[700:01]

in the story, all we've done is write a

[700:03]

dopey little program that allocates

[700:05]

memory for three integers, 1, 2, and

[700:07]

3, then changes its mind and

[700:09]

allocates more memory for four integers,

[700:11]

freeing the original chunk of memory

[700:13]

after copying the first three integers

[700:15]

into the new memory and adding that

[700:17]

fourth value. But this is kind of a lot

[700:19]

of hoops to jump through. And let me

[700:20]

propose one refinement here. So if back

[700:23]

in VS Code, we go back into list.c here.

[700:26]

It turns out that at least this loop

[700:28]

isn't strictly necessary, not to mention

[700:31]

the fact that we already have another

[700:32]

loop for just printing the list. If I

[700:34]

want to more cleverly reallocate memory,

[700:37]

it turns out that there's another

[700:38]

function that we didn't talk about last

[700:39]

week, but is in stdlib.h too, called

[700:42]

realloc, which as the name kind of

[700:44]

suggests, it reallocates memory, but a

[700:46]

little smarter in that it will try to

[700:49]

grow your existing chunk of memory if it

[700:52]

can, which is going to be super

[700:53]

efficient because then you can just plop

[700:54]

the four at the very end. Or if there

[700:56]

just isn't room there because maybe

[700:58]

someone else put hello world right there

[701:00]

in memory elsewhere in your program.

[701:02]

It's going to do all of the copying for

[701:03]

you. So what you get back ultimately is

[701:05]

a pointer to the new chunk of memory

[701:08]

containing all of the original data as

[701:10]

well. However, we're still going to have

[701:12]

to check for null. We're still going to

[701:15]

want to free the original list if

[701:16]

something goes wrong and then return

[701:18]

one. We're still going to want to add

[701:20]

the fourth value because realloc has no

[701:22]

idea what more we want to put in the

[701:23]

list. But I can in fact delete my other

[701:26]

for loop whose purpose in life was just

[701:28]

to copy all of those integers from old

[701:31]

into new. All right, that was a lot. Let

[701:35]

me pause for any questions.

[701:37]

>> How does realloc know that it should

[701:39]

reallocate the memory in list? Should

[701:41]

you tell like if you have a lot of

[701:44]

before, how does it specifically?

[701:48]

>> Very good question. That's because I

[701:49]

wrote a bug uh that we didn't trip over

[701:51]

because I didn't compile this version of

[701:52]

the code. So the question is how does

[701:54]

realloc know what to realloc? Well,

[701:56]

according to the documentation which I

[701:57]

forgot to read, you need to tell

[701:58]

realloc what the address is of the

[702:02]

chunk of memory that you do want to

[702:03]

realloc. So the first argument to

[702:05]

realloc, which I did admittedly forget

[702:07]

until a moment ago, is to put the

[702:10]

address of the chunk of memory that you

[702:11]

already malloc'd earlier so that it knows

[702:13]

to go there, see if there's indeed some

[702:15]

garbage values it can reclaim at the end

[702:17]

of that chunk of memory or if it has to

[702:19]

wholesale move things elsewhere in

[702:21]

memory to give you four times the size

[702:23]

of the int this time instead of just

[702:25]

three. But still things can go wrong

[702:27]

like you still want to check for this

[702:29]

null value because realloc might not be

[702:30]

able to give you enough memory or your

[702:32]

memory could just be so fragmented that

[702:34]

even though you want four bytes, maybe

[702:36]

there's three bytes over here, two bytes

[702:38]

over here, one byte over here. If there

[702:39]

aren't four contiguous bytes, realloc

[702:42]

too could fail, and it will return null to

[702:45]

signify as much. Other questions on any

[702:47]

of this?
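
As a minimal sketch of the realloc refinement just described: the function name build_list is hypothetical (the lecture's code lives in main and returns 1 on error), but the calls to malloc, realloc, and free are the ones discussed. The first argument to realloc is the address of the existing chunk, and the result goes into a temporary pointer so that a null return does not lose track of the original memory.

```c
#include <stdlib.h>

// Hypothetical helper (not from the lecture): builds the list 1, 2, 3 and
// then grows that same list to 1, 2, 3, 4 using realloc.
// Returns NULL if an allocation fails; otherwise the caller frees the result.
int *build_list(void)
{
    int *list = malloc(3 * sizeof(int));
    if (list == NULL)
    {
        return NULL;
    }
    list[0] = 1;
    list[1] = 2;
    list[2] = 3;

    // Pass the existing address first; realloc either extends the chunk
    // in place or moves the old contents to a bigger chunk elsewhere.
    int *tmp = realloc(list, 4 * sizeof(int));
    if (tmp == NULL)
    {
        // On failure the original chunk is untouched, so it is still ours to free.
        free(list);
        return NULL;
    }

    // On success, only the pointer realloc returned is valid from here on.
    list = tmp;
    list[3] = 4;
    return list;
}
```

A caller would check the result for NULL, print list[0] through list[3], and then call free on it exactly once.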

[702:48]

>> Why do we still need the temp variable?

[702:51]

>> Why do we still need the temp variable?

[702:52]

For the same reasons as before, because

[702:55]

if we just say list equals realloc and

[702:59]

something does go wrong. Realloc by

[703:01]

definition will return null but not

[703:03]

touch the original memory, in which case we

[703:05]

have now lost track of where that

[703:08]

original chunk of memory is. So we can

[703:10]

never go back to it to print it to

[703:11]

change it to free it. So we have to use

[703:13]

this temporary variable here. Good

[703:16]

question. Other questions? Yeah.

[703:18]

>> Is there a reason?

[703:22]

Is there a reason that we free list

[703:24]

instead of temp? Uh, so let me So down

[703:27]

here or further down? Okay, so further

[703:30]

down, let me scroll down to where we

[703:32]

came from. So here after we've added

[703:36]

this fourth value to temp, I've gone

[703:39]

ahead and freed list, which at this

[703:41]

point in the story is still pointing to

[703:43]

the original chunk of memory, the 1 2 3.

[703:46]

Then I am updating

[703:48]

list as a variable to point to the new

[703:50]

chunk of memory. Then I'm doing my thing

[703:52]

by printing out all of the integers

[703:54]

therein. Then I am freeing what list is

[703:57]

then pointing to. So I'm not technically

[703:58]

freeing the same address in memory

[704:01]

multiple times because I'm in the

[704:03]

intervening time moving what list is

[704:06]

pointing to.
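
The order of operations described in this answer matches the earlier version that copies by hand with a second malloc. A rough sketch of that version follows; the function name build_list_by_copying is hypothetical, and the lecture's own code does this inside main. Note that this pattern of freeing the old chunk yourself applies to the manual copy only: after a successful realloc, the old address must not be freed separately.

```c
#include <stdlib.h>

// Hypothetical helper (not from the lecture): builds 1, 2, 3, then moves the
// values by hand into a bigger chunk and adds a 4.
// Returns NULL if an allocation fails; otherwise the caller frees the result.
int *build_list_by_copying(void)
{
    int *list = malloc(3 * sizeof(int));
    if (list == NULL)
    {
        return NULL;
    }
    list[0] = 1;
    list[1] = 2;
    list[2] = 3;

    // Ask for a second, bigger chunk; keep its address in a temporary.
    int *tmp = malloc(4 * sizeof(int));
    if (tmp == NULL)
    {
        // The first chunk still exists, so give it back before bailing out.
        free(list);
        return NULL;
    }

    // Copy the old values into the new chunk, then add the fourth.
    for (int i = 0; i < 3; i++)
    {
        tmp[i] = list[i];
    }
    tmp[3] = 4;

    // Free what list points at now (the original 1 2 3)...
    free(list);

    // ...and only then repoint list at the new chunk.
    list = tmp;
    return list;
}
```

Because list is repointed between the two calls to free (one here, one by the caller), no address is ever freed twice.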

[704:10]

>> Absolutely

[704:13]

Yes, it would be correct to go ahead

[704:15]

down here and just say temp because temp

[704:17]

is still in scope. It's still pointing

[704:18]

at the same thing. I would just argue

[704:20]

that that's semantically wrong because

[704:22]

at this point in the code really list is

[704:25]

the variable you care about. Temp was

[704:27]

really meant to be a throwaway temporary

[704:28]

variable and you're asking for trouble

[704:31]

if you use a temporary variable later

[704:33]

than you the programmer intended. And if

[704:35]

a colleague did that too, who knows what

[704:37]

you've done with the temp variable in

[704:38]

the meantime. Good questions. Yeah, in

[704:41]

front

[704:43]

>> Realloc always goes for, like, the memory

[704:46]

space right after your original place?

[704:48]

>> Correct. Realloc will try to give you

[704:50]

more memory in the same location as

[704:52]

before if there's room at the end.

[704:54]

>> The code we made earlier originally

[704:56]

instead of realloc

[704:59]

>> So realloc will do two potential things for

[705:02]

you. So if the computer's memory looks

[705:04]

like this, you're sort of out of luck

[705:06]

because realloc can't give you this byte.

[705:09]

However, if it finds like four bytes

[705:11]

down here, for instance, realloc will

[705:13]

not only allocate those four bytes for

[705:15]

you, it will then copy the data for you

[705:18]

over to it, which is wonderful because

[705:19]

it just means we don't need an extra for

[705:20]

loop all the time we do this.

[705:24]

Yeah, in front.

[705:25]

>> How does it know how much data?

[705:29]

>> How does it know how much data to

[705:31]

>> copy?

[705:31]

>> Uh because how much how does the how

[705:33]

does realloc know how much data to copy?

[705:36]

Because the operating system and you can

[705:38]

think of it as the standard library

[705:39]

stdlib.h

[705:42]

keeps track of what memory has been

[705:44]

allocated for you in the past. So when

[705:47]

you pass in that same address, it knows

[705:49]

it has essentially a lookup table, a

[705:50]

dictionary if you will, that tells it

[705:52]

what memory has been allocated already.

[705:53]

So you don't have to worry about that.

[705:55]

>> Yeah. In front.

[706:03]

>> Good question. In other programming

[706:04]

languages, you don't always have to

[706:05]

declare the length of an array. Case in

[706:07]

point, Python coming next week. That is

[706:09]

because someone else who invented that

[706:12]

programming language wrote all of this

[706:13]

kind of code for you. And indeed, that's

[706:15]

one of the goals with our transition

[706:16]

between weeks five and six is to

[706:18]

demonstrate that all of these problems

[706:19]

are still being solved, just not by you

[706:21]

and not by me anymore. We're standing on

[706:23]

the shoulders of other smart people who

[706:24]

have invented not just new code, but

[706:26]

like a new language and a new compiler,

[706:28]

or as we'll see, an interpreter for it

[706:30]

so that we can hide all of these lower

[706:32]

level details. Because honestly, as you

[706:33]

can see already, like this is an

[706:34]

annoying number of lines of code just to

[706:36]

have a conversation about the numbers 1

[706:38]

2 3 4. In Python, we could reduce this

[706:40]

code to like two lines of code, one line

[706:42]

of code. It's going to be fun. All

[706:45]

right, so with that said, the uh among

[706:48]

the goals here was to demonstrate that

[706:50]

there are a bunch of ways in which we

[706:51]

can implement these data types, but

[706:53]

let's talk more concretely about what

[706:54]

we'll call data structures, which are

[706:56]

concrete definitions of how you use the

[706:58]

computer's memory to lay stuff out in

[706:59]

memory. and using data structures, you

[707:01]

can implement stacks and queues and

[707:03]

dictionaries and all of these other

[707:04]

things. So, we're going to put into your

[707:06]

toolkit today a whole bunch of canonical

[707:08]

data structures that like every computer

[707:10]

scientist does and should know that you

[707:12]

won't necessarily implement all

[707:14]

of the time yourself. But when you use

[707:15]

some feature of Python or Java or C++ or

[707:18]

some other language, you are choosing

[707:20]

among typically implementations of these

[707:22]

data structures that someone else has

[707:24]

written the code for so that you can

[707:25]

just benefit from the functionality and

[707:27]

the features thereof like that FIFO

[707:29]

property we talked about or LIFO without

[707:31]

having to get into the weeds too much

[707:33]

yourself. So when it comes to data

[707:36]

structures, let's consider that we have

[707:39]

at our disposal now a few new pieces of

[707:41]

syntax in C and we're going to add just

[707:43]

one more today. We saw last week that we

[707:45]

have the struct keyword and we've seen

[707:47]

that for a few weeks now. Whenever we

[707:48]

want to invent our own data structure,

[707:51]

we can use literally struct. We saw in

[707:53]

the past that you can use the dot

[707:55]

operator to actually go inside of a

[707:57]

structure to get at a person's

[707:58]

name or their number. And we saw last

[708:01]

week the star operator for dereferencing

[708:04]

a pointer, dereferencing an address to

[708:05]

actually go somewhere like inside of a

[708:07]

structure wonderfully. Today we're going

[708:10]

to see that you can actually in some

[708:11]

cases combine the dot and the asterisk

[708:14]

into a single operator with two

[708:15]

characters that literally looks like an

[708:17]

arrow and that will help reflect the

[708:19]

yellow and black drawings that we've

[708:21]

done over the past couple of weeks where

[708:23]

we have an arrow on the screen pointing

[708:24]

somewhere. This literal arrow in code is

[708:27]

going to line up with that same concept.

[708:29]

So let's introduce the first of our

[708:31]

alternatives to arrays. An array again

[708:33]

is a contiguous chunk of memory where

[708:35]

the values are back to back to back.
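
To make "back to back to back" concrete, here is a small sketch. Both helper names are hypothetical and not from the lecture; they only illustrate that neighboring array elements sit exactly one int apart in memory, which is why indexing reduces to simple address arithmetic.

```c
#include <stddef.h>

// Hypothetical helper: distance in bytes between two elements of the same array.
ptrdiff_t bytes_between(const int *first, const int *second)
{
    return (const char *) second - (const char *) first;
}

// Hypothetical helper: list[i] written out as "start at list, move i ints over,
// and go there".
int get(const int *list, int i)
{
    return *(list + i);
}
```

For an array such as int list[3] = {1, 2, 3}, bytes_between(&list[0], &list[1]) equals sizeof(int), and get(list, 2) reads the same value as list[2].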

[708:37]

Among the upsides: it's so fast because, like,

[708:39]

all the data is right there. We've seen

[708:40]

since week zero, you can do binary

[708:42]

search and just jump around randomly by

[708:44]

just doing simple arithmetic to go to

[708:46]

the middle, the middle of the middle, by

[708:47]

just dividing by two a couple of times

[708:49]

and rounding as needed. But the problem

[708:51]

with arrays to be clear is that they are

[708:53]

statically

[708:55]

uh, they are statically allocated to

[708:58]

be a specific size maybe three maybe

[709:00]

four but it is a finite value which is

[709:02]

problematic because look at all the code

[709:04]

we had to write just to resize these

[709:06]

things again and again. Well, what if we

[709:08]

sort of try to preempt that kind of pain

[709:12]

and try to just build up a list by

[709:14]

linking it together no matter where the

[709:17]

values actually are in memory and move

[709:19]

away from this constraint that

[709:20]

everything has to be contiguous. After

[709:21]

all, as I said a moment ago, if the

[709:23]

computer has plenty of memory here,

[709:25]

here, here, here, that collectively

[709:28]

is more than enough memory, but none of

[709:30]

those individual chunks is quite as big

[709:32]

as you need for an array. Well, heck,

[709:34]

let's at least try to leverage all of

[709:36]

the available memory and stitch together

[709:38]

the data structure as opposed to really

[709:40]

holding firm this constraint that the

[709:42]

array be back to back to back and

[709:44]

contiguous. So, a linked list is

[709:46]

something you can now build using that

[709:48]

syntax from last week and a bit more

[709:50]

today in your same canvas of memory. So,

[709:52]

that for the sake of discussion, suppose

[709:53]

that we want to store first in our list

[709:56]

the number one. Well, we all know

[709:58]

already that it might very well exist at

[710:00]

an address like 0x123 for the sake of

[710:02]

discussion, but it's somewhere there.

[710:04]

Suppose that you want to store a second

[710:05]

value in memory, but you didn't think

[710:07]

about it initially and so you weren't

[710:08]

smart enough to put it like right next

[710:10]

to the one and then the next value next

[710:12]

to that, but you know somehow from malloc

[710:14]

or similar functions that you could put

[710:16]

the number two over here at address

[710:18]

0x456 for the sake of discussion and

[710:20]

similarly there's room for the number

[710:22]

three over here at say address 0x789.

[710:26]

So already we have a list of values in

[710:28]

memory, but because they're not

[710:30]

contiguous, you can't just do some

[710:32]

trivial ++ trick to go from one to

[710:34]

the other because they're differing

[710:36]

numbers of bytes apart. They're not just

[710:38]

back to back, one byte apart. So what if we try

[710:41]

to solve that problem in the following

[710:44]

way? Instead of just using one byte for

[710:46]

each of these values, let me waste a

[710:48]

little bit of memory or spend a little

[710:50]

bit of memory and have some metadata

[710:52]

associated with our data. So data is

[710:54]

the value or values you care about. Metadata

[710:57]

is data that helps you maintain the data

[710:59]

you care about. So let me propose that

[711:01]

we use two chunks of memory for every

[711:03]

value such that the top of each of those

[711:05]

chunks represents the actual value we

[711:07]

care about 1 2 and three respectively.

[711:09]

And you can perhaps see where this is

[711:11]

going. The second chunk of memory that

[711:12]

I've allocated to each of these values

[711:15]

could perhaps be a pointer to the next

[711:18]

one. A pointer to the next one. And if

[711:20]

this is the end, we can put our old

[711:22]

friend 0x0, aka null, and just treat that

[711:25]

as the end of the list implicitly. So

[711:28]

even though these things could be

[711:29]

anywhere in memory, by just storing with

[711:32]

each value the address of the next value

[711:35]

in memory, creating effectively a

[711:36]

treasure map or breadcrumbs, however you

[711:38]

want to think of it metaphorically, we

[711:40]

can get from one node to the other. And

[711:43]

indeed, that's going to be a term of art

[711:44]

we start using. A node is just a generic

[711:46]

structure that contains data and

[711:48]

metadata usually like the number you

[711:50]

care about and a pointer to the next

[711:52]

such node. Um these are not to scale as

[711:54]

an aside. This is typically four bytes.

[711:56]

A pointer as we've discussed is

[711:57]

technically eight bytes but it just

[711:58]

looks prettier to draw them as simple

[712:00]

squares on the screen. So what does this

[712:02]

really mean? Well, who really cares

[712:03]

about 0x123, 0x456, 0x789? We can

[712:06]

really think of this actually as being

[712:08]

more of a picture with arrows. But to

[712:10]

keep track of this list of three values,

[712:12]

I do propose that we're going to need

[712:14]

one additional value over here. And it's

[712:17]

deliberately just a single square

[712:18]

because to keep track of this list of

[712:20]

three values, I'm going to use just one

[712:22]

variable called say list and store in

[712:25]

that variable a pointer as we defined it

[712:28]

last week, the address of the first

[712:30]

node. Why? Because the first node can

[712:32]

then get me to the second. The second

[712:33]

node can then get me to the third and so

[712:36]

forth. So what's the upside now? If I

[712:38]

want a fourth value somewhere on the

[712:40]

screen, I could put it here, here, here,

[712:42]

here, wherever there's enough room and

[712:44]

just make sure that I update the arrow

[712:47]

to point to that next chunk. Update the

[712:48]

arrow to point to the next chunk.

[712:49]

There's no copying of data. 1 2 and

[712:51]

three can stay there now forever until

[712:53]

the program quits and we do actually

[712:54]

free it. But we can just keep adding

[712:56]

adding adding or growing this data

[712:59]

structure in memory. So that is what the

[713:02]

world knows as a linked list. In Python

[713:06]

to which you were essentially alluding

[713:08]

um a list in Python is indeed a linked

[713:10]

list. Other languages call these vectors

[713:12]

but they are essentially arrays that can

[713:14]

be grown and shrunken automatically

[713:17]

effectively without you having to worry

[713:18]

quite as much about it. So how does the

[713:20]

code for implementing something like

[713:22]

this work? Well, let me propose that we

[713:24]

have this familiar friend of a person,

[713:26]

which we claimed in past weeks has a

[713:29]

name and a number associated with them.

[713:31]

We know from last week that strings are

[713:33]

not technically a thing in C as a

[713:35]

keyword. So that's technically just char

[713:37]

star name and number, but same idea

[713:39]

otherwise. And this is what we defined

[713:41]

in the past as a person. So this is a

[713:43]

structure we've seen before. I now need

[713:45]

to implement the code equivalent of

[713:47]

these rectangles, each of which has an

[713:48]

integer and then a pointer to the next

[713:51]

such value. So let me propose that we

[713:53]

delete what's inside this structure,

[713:55]

change the name from person to node,

[713:57]

which again is a generic term for a

[713:58]

container of values, and let me propose

[714:01]

that inside of this new node structure,

[714:03]

we put literally an int for the number

[714:05]

we care about. There's going to be my 1

[714:07]

2 3 or four. And then and this is a

[714:10]

little bit new. Let's include in this

[714:14]

structure a pointer to the next such

[714:18]

node. It's a pointer in the sense that

[714:20]

it's an arrow. It's the address of the

[714:21]

next node. So that's why we say node

[714:23]

star. I could call it anything I want,

[714:25]

but semantically calling it next makes

[714:27]

perfect sense because it's the next such

[714:28]

node. But this isn't quite right. For

[714:30]

annoying technical reasons, I need to do

[714:32]

one other thing here. I need to

[714:34]

technically and we've not done this

[714:36]

before, give a temporary

[714:39]

name to this structure if you will. So

[714:41]

literally say struct node here even

[714:43]

though I've already said node here. Why?

[714:45]

Because I technically need to change

[714:46]

this line to say struct node star. Long

[714:50]

story short why is this necessary? Well

[714:51]

recall in the past C and the compiler

[714:53]

read your code top to bottom left to

[714:55]

right. Well if in a previous version of

[714:57]

this code we used the word node here but

[714:59]

the compiler never sees the word node

[715:01]

until down here. like it's just not

[715:03]

going to compile because the word

[715:04]

literally doesn't exist. We saw this

[715:05]

with functions in the past. So the

[715:07]

solution to that was to put the

[715:08]

prototype higher up in the file and then

[715:10]

it would compile. Okay, you can think of

[715:12]

this as somewhat analogous whereby if I

[715:14]

give this structure a name on this first

[715:17]

line even if it's redundant to this one

[715:19]

then I can say struct node inside of

[715:21]

these curly braces because the compiler

[715:24]

has already seen the word node there. So

[715:27]

just you have to do it this way. So now

[715:30]

that we have this in code, we can kind

[715:33]

of start playing around with actually

[715:35]

storing these things in memory. So let

[715:38]

me propose that we go ahead and do this

[715:41]

by transitioning back to VS code here.

[715:43]

And let's instead of using our array

[715:45]

based implementation, let's implement

[715:47]

the first of our linked lists. And I'm

[715:48]

going to be a bit extreme and delete

[715:51]

pretty much everything inside of main. I

[715:53]

am for convenience now going to include

[715:55]

the CS50 library not so much for the

[715:57]

char star thing but because as we

[715:59]

discussed last week it's still useful

[716:01]

for getting ints and getting strings and

[716:03]

other things which instead unless you

[716:04]

use scanf are much harder and more

[716:06]

annoying to get in C. So let's go ahead

[716:08]

and do this um outside of main let's go

[716:11]

ahead and invent this node called

[716:14]

struct node here. Then inside of my

[716:15]

curly braces, we'll give every such node

[716:17]

a number and every such node a pointer

[716:22]

to the next such node. And we'll call

[716:24]

this whole thing node by convention.

[716:26]

Then inside of main, let's go ahead and

[716:28]

do this one step at a time. Let me

[716:30]

propose that to create a linked list.

[716:32]

Initially, it's empty. So how do I

[716:34]

represent an empty linked list? Well, I

[716:36]

could call the variable list and set it

[716:38]

equal to null. But what is the data type

[716:41]

for a linked list? Well, per the picture

[716:43]

that we had up earlier, in so far as all

[716:46]

we need is a single pointer at far left

[716:49]

here to represent the address of the

[716:51]

first node in the list. I dare say all

[716:54]

we need to say is that our list is of

[716:56]

type node star. That is to say, what is

[716:58]

the linked list? Well, it's by definition

[717:00]

the address of the first node in the

[717:03]

list.
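
Written out, the structure and the empty list described so far might look like the sketch below. The typedef follows the description above (struct node on the first line so that struct node * can be used inside the braces); the length helper is a hypothetical addition, included only to show that a list really is just the address of its first node, with NULL marking the end.

```c
#include <stddef.h>

// Naming the struct on the first line lets the next field refer to
// struct node before the shorter name node exists.
typedef struct node
{
    int number;
    struct node *next;
} node;

// Hypothetical helper: counts nodes by following next pointers until NULL.
int length(const node *list)
{
    int count = 0;
    const node *ptr = list;
    while (ptr != NULL)
    {
        count++;
        // Go to the node, then read the address stored in its next field.
        ptr = (*ptr).next;
    }
    return count;
}
```

With node *list = NULL; the list is empty and length(list) is 0.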

[717:05]

So that's the first subtlety here. So

[717:07]

that gives me a picture with no other

[717:10]

nodes. It just gives me a single pointer

[717:11]

initialized to null. Now let's go ahead

[717:14]

and for parity with the previous example

[717:16]

just do something three times. So in

[717:18]

this for loop structured exactly as

[717:20]

before, let's go ahead and allocate a

[717:23]

new node, ask the user for a number to

[717:26]

put inside of it and then start

[717:28]

stitching things together so as to

[717:30]

achieve a picture in memory quite like

[717:32]

this. So how am I going to do this?

[717:34]

Well, first I need to allocate a new

[717:35]

node. How do I do that? Well, I can use

[717:37]

our new friend malloc and allocate the

[717:39]

size of a node. I want to store the

[717:44]

address of this chunk of memory

[717:45]

somewhere. And what I'm going to propose

[717:47]

is that we have a temporary variable and

[717:48]

I'll call this n, whose type is

[717:52]

that of a node star. So what am I doing

[717:55]

here? I'm trying to build up this list

[717:57]

in memory so that I first have a pointer

[718:01]

to the list. I first have a pointer

[718:04]

that is null, pointing nowhere. No list

[718:06]

exists. I then want to go ahead and

[718:09]

create one new node, store a value in it,

[718:12]

and then point my list at that node.

[718:15]

Then I want to do it again and again a

[718:18]

total of three times. So how do we do

[718:20]

this? We allocate space for the size of

[718:22]

a node. However many bytes that's going

[718:24]

to be, it's probably going to be 12 cuz

[718:26]

it's four for the int and eight for the

[718:28]

pointer, but who cares? sizeof will

[718:30]

answer that question for me. I'm going

[718:32]

to store the address of this chunk of

[718:34]

memory inside of a temporary variable

[718:36]

called n for node and that's why it has

[718:38]

to be node star because it's going to be

[718:41]

pointing to an actual node. I'm going to

[718:44]

do my quick sanity check. So if n equals

[718:46]

equals null, we can't proceed further.

[718:48]

I'm going to go ahead and just return

[718:50]

one right now. So that's just sort of

[718:52]

boilerplate code you should be in the

[718:53]

habit of doing anytime you're using

[718:54]

malloc. But if all goes well, let's do

[718:57]

this. Let's go to the address in n and

[719:02]

then go inside of that node and change

[719:05]

its number to be whatever the human

[719:07]

wants it to be by using get int and just

[719:09]

prompt the human for their favorite

[719:10]

number. Then let's go to that same node

[719:13]

and update the next field to equal for

[719:16]

now null because all I want to do is

[719:19]

allocate one new node with that number.

[719:21]

That's it.

[719:22]

Then I'm going to need to stitch this

[719:24]

together further. So I'll propose that

[719:28]

all we need do and let's clean this up

[719:30]

first is now make sure that we string

[719:34]

these nodes together. This syntax isn't

[719:37]

quite right because technically because

[719:39]

of precedence I need to dereference, oops, I

[719:42]

need to

[719:43]

uh dereference n and then go inside of

[719:46]

it. I need to dereference n and then go

[719:48]

inside of it. However this syntax if

[719:50]

it's looking a little overwhelming and

[719:52]

you have no idea now what's going on.

[719:54]

Thankfully in C there's much simpler

[719:55]

syntax which is this. Go to the node and

[719:59]

go inside it to get the number. Go to

[720:01]

the node and go inside it to get next.

[720:03]

So the arrow notation that I promised we

[720:05]

would now have is the same thing as

[720:07]

using the star operator, the

[720:09]

dereference operator, parenthesizing it,

[720:11]

then the dot operator, which is just a

[720:13]

pain in the neck to write out all the

[720:14]

time. I dare say n arrow number and n

[720:17]

arrow next is just much simpler. It says

[720:19]

go to n and point at the number field or

[720:22]

the next field respectively. All right.
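
The two notations can be put side by side. In this sketch the function names are hypothetical; the point is only that (*n).number and n->number reach the same field, with the parentheses needed in the first form because the dot binds more tightly than the star.

```c
#include <stddef.h>

typedef struct node
{
    int number;
    struct node *next;
} node;

// Hypothetical helper: fills in a node using star and dot.
// Without the parentheses, *n.number would be parsed as *(n.number).
void set_with_star_dot(node *n, int value)
{
    (*n).number = value;
    (*n).next = NULL;
}

// Hypothetical helper: the same two assignments using the arrow operator.
void set_with_arrow(node *n, int value)
{
    n->number = value;
    n->next = NULL;
}
```

Calling either function on the address of a node leaves that node in the same state.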

[720:26]

So the last thing I'm going to propose

[720:27]

we do and then we'll make this much more

[720:29]

clear in picture form is this. Let's go

[720:31]

ahead and prepend

[720:34]

the node to the list. And by prepend I

[720:37]

mean insert it at the beginning. Insert

[720:38]

it at the beginning. Insert it at the

[720:40]

beginning again and again. I'm going to

[720:41]

say n next equals list. Then update the

[720:46]

list to set equal to n. And then after

[720:49]

all of this mess, I'm going to return

[720:50]

zero. Okay, this was a huge amount of

[720:53]

code, but let me give a quick recap.

[720:54]

Then we'll paint a picture. Here is my

[720:57]

init list initially. So the foam finger

[720:59]

is pointing to null, which means the

[721:00]

list is of size zero. There's nothing

[721:02]

there. Then I ask the computer to do

[721:03]

this three times. Give me enough memory

[721:05]

for a new node. Then after checking that

[721:07]

it's not null, put the user's favorite

[721:10]

number in it and update the next field

[721:12]

for the moment to null. Then lastly, go

[721:15]

ahead and prepend this brand new node to

[721:18]

the existing list. And by

[721:19]

prepend, I mean put it at the front. So

[721:22]

n at this moment is pointing to that new

[721:24]

node. And I'm saying, you know what,

[721:26]

whatever the current list is, empty or

[721:29]

otherwise, set the next pointer equal to

[721:33]

the list, whatever that list is, and

[721:36]

then change the list to point at this

[721:37]

new node. So now let's do this more

[721:40]

carefully, step by step, in picture

[721:42]

form. So I'm going to propose that we go

[721:45]

through some of these representative

[721:46]

lines as follows. Here is the first line

[721:49]

of code even without the assignment. If

[721:51]

you just allocate a variable called list

[721:53]

that's a pointer to a node, what you

[721:55]

essentially have is a box of memory that

[721:57]

looks like this. It's a garbage value

[721:58]

though because there's no assignment

[721:59]

operator. So who knows what's inside of

[722:01]

this pointer. That is why in my actual

[722:03]

code I set it equal to null which

[722:05]

effectively creates in memory the same

[722:07]

box but gets rid of Oscar the Grouch and

[722:09]

puts the null value there. So we know

[722:11]

it's not a garbage value. It's a pointer

[722:13]

known as null. So that's what that very

[722:15]

first line of code did in the computer's

[722:17]

memory. The next thing I wanted to do

[722:19]

was allocate enough memory for a node,

[722:22]

not a node star, for a whole node. I

[722:24]

want that whole chunk of a rectangle

[722:26]

given to me in memory. That's going to

[722:28]

return to me the address of the first

[722:30]

byte thereof. And I'm going to store

[722:31]

that in a temporary variable called n.

[722:33]

So at this point in the story, n is

[722:35]

going to be a pointer of its own,

[722:36]

another box that initially sure is going

[722:38]

to be a garbage value, but because I am

[722:40]

using the assignment operator, it's

[722:42]

going to point to that chunk of memory

[722:45]

which malloc, if successful, presumably

[722:47]

allocated for me in the computer's

[722:49]

memory. So n for all intents and

[722:51]

purposes points at that same chunk.

[722:54]

These values are still garbage values

[722:56]

because it's just a chunk of memory. Who

[722:58]

knows what it's been used for before? But

[722:59]

that's why after this line of code, I

[723:02]

took care to get an int from the user

[723:04]

and then initialize the next pointer to

[723:06]

null. So for instance, for the sake of

[723:07]

discussion, let's get rid of get int for

[723:09]

the picture and just say the human typed

[723:10]

in the number one initially. Well,

[723:12]

that's equivalent to putting the one in

[723:15]

the number field by first going to the

[723:18]

address in n and then dereferencing it

[723:21]

using the star and the dot notation

[723:23]

respectively. So that means follow the

[723:25]

arrow and then change number to the

[723:28]

value one. Then the next line of code or

[723:31]

rather or equivalently you can just do

[723:33]

the same thing. And thankfully now C

[723:35]

syntax lines up with what the pictures

[723:37]

look like we've been drawing. Go to N

[723:39]

follow the arrow to the number field.

[723:41]

That's literally what the syntax is

[723:43]

telling me. Meanwhile, if I use that

[723:45]

same syntax again for N arrow next set

[723:48]

it equal to null. That's like saying go

[723:49]

to N follow the arrow and change the

[723:51]

next field in this case to null. or

[723:53]

we'll just blank it out to be clear. So

[723:56]

at this point in the story, we have

[723:57]

allocated the node. We have stored one

[724:00]

and null there. List is still null. N is

[724:03]

pointing to this, but the whole point of

[724:04]

this exercise is to add this node to the

[724:07]

list. So we need to somehow update this

[724:09]

value, which is why ultimately I'm going

[724:11]

to do something like list equals N. Now

[724:14]

that seems a little weird semantically,

[724:16]

but recall that N is a pointer. That is

[724:18]

the address pointing at 0x123 or

[724:20]

wherever that is. So to point list at

[724:23]

the same node, it's equivalent to

[724:25]

setting list equal to n because then

[724:28]

we'll effectively have an arrow

[724:29]

identical from list pointing at that new

[724:32]

node. And at this point, I don't even

[724:34]

care what n is anymore. It was always

[724:35]

meant to be a temporary value. This now

[724:38]

is my list. So even though I did it in

[724:40]

code already preemptively in a loop,

[724:43]

the first iteration for that loop

[724:45]

literally created this in memory. Let me

[724:48]

pause before we go through numbers two

[724:50]

and three for any questions

[724:54]

because the VS Code version looks scary.

[724:56]

This is perhaps a little more

[724:57]

bite-sized.

[725:01]

Okay. So, how about we do this twice

[725:04]

more for two and three, respectively.

[725:06]

So, again, inside of our loop, we're

[725:08]

back to this line, which asks the

[725:09]

operating system for enough memory for

[725:11]

the size of a node, stores that address

[725:13]

temporarily in a variable called n. So,

[725:16]

here's our friend Oscar brought back

[725:17]

onto the screen. Maybe the new chunk of

[725:19]

memory is over there. This effectively

[725:21]

points n at that chunk of memory. The

[725:24]

next line of code inside of that loop

[725:25]

that's relevant is this. And we'll get

[725:26]

rid of get int and just pretend that I

[725:28]

literally typed in two. We're going to

[725:30]

go to this version of n, follow the

[725:33]

arrow, go to the number field, and set

[725:35]

that equal to two. The next line of

[725:37]

code, we start at n, follow the

[725:38]

arrow, change the next field to null.

[725:41]

And then same lines as before, we now

[725:44]

need to update list equaling n. But

[725:48]

something's about to go wrong here. If I

[725:50]

update list to point to the same node

[725:53]

that n is pointing at, watch what

[725:55]

happens. I set list equal to that n

[725:58]

because it's temporary might as well go

[726:00]

away at this point. But

[726:03]

what have I done wrong logically here?

[726:06]

Yeah,

[726:07]

>> you lost the arrow to

[726:09]

>> Yeah, I lost the arrow to the original

[726:11]

node. I have orphaned the first node

[726:13]

because now nothing in my code is

[726:16]

actually pointing at it. I've got in

[726:18]

duplication two pointers pointing at

[726:20]

this chunk of memory. So this thing,

[726:21]

even though we obviously as humans can

[726:23]

still see it, we have lost track in code

[726:25]

of where it is, which means that is the

[726:26]

definition of a memory leak. I can never

[726:28]

get that back or give it back to the

[726:30]

operating system until the program

[726:31]

itself finally quits. So, I think I need

[726:33]

to be a little smarter and not do this

[726:35]

line quite like this yet. I think what I

[726:37]

want to do, and I've rewound, so list is

[726:39]

still pointing to the original list. N

[726:41]

is pointing to only the new node. What I

[726:43]

think we need to do is something like

[726:45]

this. And this is why the code was

[726:47]

fairly non-obvious in VS Code at first.

[726:49]

Go to N, follow the arrow, go to the

[726:52]

next field, and here's the cleverness.

[726:55]

Point this pointer to the existing list's

[726:58]

value. So if the existing list is

[727:00]

pointing here, that just means, hey,

[727:02]

point this to the exact same thing

[727:04]

because now I can safely update the list

[727:09]

to point at the same thing as n. So its

[727:11]

arrow now points here. But even when I

[727:14]

get rid of n, I wonderfully have the

[727:16]

whole thing stitched together. And the

[727:18]

metaphor I often think of is like around

[727:20]

like Christmas time in olden times when

[727:21]

people would like stitch popcorn

[727:23]

together. That's what you're kind of

[727:24]

doing with a thread here. You're trying

[727:25]

to stitch together these nodes or

[727:27]

popcorn kernels if you will such that

[727:29]

one can lead you to the next can lead

[727:31]

you to the next can lead you to the next

[727:33]

but you can never let go of part of that

[727:35]

strand in the process. So here now we

[727:38]

have a list which is great because

[727:40]

notice we haven't touched the one but

[727:41]

we've added the two. We can go ahead in

[727:43]

a moment and add the three but you can

[727:44]

perhaps see where this is going. I'm

[727:46]

kind of doing it backwards by accident

[727:48]

but we'll get there soon. So now let's

[727:50]

allocate a new node run through in our

[727:52]

mind's eye all of those same steps. I'm

[727:54]

going to hopefully end up with a list

[727:55]

that now looks like this. And even

[727:57]

though it's kind of long and stringy,

[727:59]

these values could be anywhere in

[728:00]

memory, but because of these various

[728:02]

pointers, I can jump from one location

[728:04]

to the other, making more efficient use

[728:07]

of everything inside of the computer's

[728:09]

own memory. All right, but of course,

[728:12]

we've got this symptom that I didn't

[728:13]

really intend whereby the whole darn

[728:15]

thing is backwards. But I think that's

[728:18]

kind of okay for now. But I'd like to

[728:20]

propose that we consider how we can now

[728:22]

maybe traverse this thing and actually

[728:24]

print out the values in memory. So let

[728:26]

me go ahead and do this. Let's go ahead

[728:30]

and how about

[728:33]

let's say let's go back to VS Code here.

[728:35]

So at this point in the story we've got

[728:37]

the same code that implements that same

[728:39]

idea except I'm using get int just so

[728:41]

that I can dynamically type in the one

[728:43]

the two and the three without having to

[728:44]

hardcode it into the actual code.

[728:46]

Suppose that after doing this exercise,

[728:49]

I actually want to do something

[728:50]

interesting like print the numbers.

[728:53]

Well, we don't have that code yet in

[728:55]

this version of my program. So, let's

[728:58]

bring that back. Last time I did this

[728:59]

just using a for loop and array

[729:01]

notation. And I think I can do that. But

[729:03]

let me propose first that I implement

[729:07]

this idea pictorially. Here's the same

[729:09]

diagram. This is what exists in the

[729:10]

computer's memory. If I want to go ahead

[729:12]

and print out these numbers, albeit in

[729:14]

reverse order, let me propose that we

[729:16]

can do this by giving ourselves another

[729:18]

temporary variable. We'll call it ptr,

[729:20]

pointer for short. And that's like

[729:22]

having another foam finger that points

[729:23]

at the start of the list. So it's not

[729:25]

pointing at list. It points at whatever

[729:27]

list is pointing at, which means here.

[729:30]

Then I can print out the three pretty

[729:32]

easily. So long as I next update pointer

[729:34]

to point to the two, print it out. then

[729:36]

point it to the one, print it out, and

[729:38]

eventually I'm going to realize, oh, I'm

[729:39]

out of nodes because the end of this

[729:41]

list is null. So that's the idea I want

[729:43]

to implement now logically in code.

[729:46]

Create a temporary variable called

[729:47]

pointer. Set it equal to whatever the

[729:50]

list itself is. Print out the value,

[729:52]

update the pointer, print out the value,

[729:54]

update the pointer, print out the value,

[729:55]

update the pointer, realize it's null,

[729:58]

and stop. So in code, it's a relatively

[730:01]

small loop, even though the syntax is

[730:03]

still pretty new since we've only just

[730:05]

started playing with memory since last

[730:06]

week. But what I'm going to do is

[730:07]

exactly what I proposed. I'm going to

[730:09]

create a new pointer called ptr and set

[730:12]

it equal to the list itself. That's like

[730:14]

having another foam finger temporarily

[730:16]

pointing at the first element in the

[730:18]

list. Then what I'm going to do is say

[730:20]

while that temporary variable is not

[730:23]

null, go ahead and traverse the list.

[730:27]

What do I mean by that? Well, let's go

[730:29]

ahead and print out the current element

[730:30]

in the list by using percent i back

[730:33]

slashn and printing out whatever the

[730:35]

pointer is pointing at specifically its

[730:38]

number field. So that is follow the

[730:40]

arrow and print out the number. Then

[730:42]

inside of this loop, I'm going to update

[730:45]

after doing that my temporary variable

[730:47]

called pointer to be equal to pointer

[730:49]

arrow next. And that will have the

[730:52]

effect with just those few lines of code

[730:55]

of implementing precisely this idea. I

[730:58]

first set pointer equal to the list

[731:01]

which happens to point here first. I

[731:03]

then do my printf and then I update the

[731:06]

next field rather I update pointer to be

[731:09]

the value of pointer follow the arrow

[731:12]

next. So if this is 0x123 for instance

[731:15]

that is what is now in oh sorry if this

[731:18]

is 0x456 that is what's now in pointer.

[731:21]

So the arrow effectively looks there in

[731:23]

my loop I print out with percent i

[731:25]

this number and then I go to the next

[731:27]

field follow the arrow and then set it

[731:30]

equal to rather whatever this pointer is

[731:34]

here 0x789

[731:36]

set it equal to the pointer there. So I

[731:38]

effectively move the arrow there. Then

[731:40]

lastly, I update ptr to point to the

[731:43]

value of this next field which is null.

[731:46]

Which means effectively pointer itself

[731:47]

is null. Which means the loop

[731:49]

cleverly

[731:51]

stops now because I was supposed to do

[731:53]

this whole loop while pointer is not

[731:56]

null but pointer is now null. And just

[731:59]

as an aside, if you prefer the semantics

[732:01]

of a for loop, there's nothing new here

[732:03]

per se. I can do this exact same thing

[732:06]

using a for loop simply as follows. And

[732:09]

it's a little tighter to implement as

[732:10]

follows. I can say for instead of int i

[732:13]

equals 0 in that old approach. I can

[732:15]

actually use pointers in a for loop like

[732:17]

this. For node star pointer equals the

[732:21]

start of the list. Keep doing something

[732:23]

so long as pointer does not equal null.

[732:25]

And on each iteration of this loop,

[732:27]

update the pointer to equal whatever the

[732:29]

pointer's own next field is. And then

[732:32]

inside of this for loop print out using

[732:35]

percent i backslash n the current

[732:37]

pointer's number field semicolon. So here

[732:40]

is where again we see the equivalence of

[732:42]

for loops and while loops. What you can

[732:44]

do with one you can do with the other.

[732:45]

This is a little more elegant in that

[732:47]

you can express a whole lot of logic in

[732:49]

one line of the for loop. Frankly I do

[732:52]

think the first version is nonetheless

[732:54]

more readable. So let me undo undo undo

[732:56]

undo everything I just did. On the

[732:58]

course's website you'll see both of these

[732:59]

versions. This one's a little more

[733:01]

pedantic as to what it's doing step by

[733:04]

step. Okay, that too was a lot. Let me

[733:07]

pause here to see if there are any

[733:10]

questions.

[733:14]

And if you're feeling like that fire

[733:15]

hose like this is why we transition to

[733:17]

Python where all of this now gets swept

[733:18]

under the rug but is still happening

[733:21]

just not by us in a week. Questions?

[733:23]

Yeah.

[733:29]

Yeah, really good question. So we I I

[733:32]

here I've been preaching like we don't

[733:33]

want to lose memory. We don't want to

[733:34]

leak memory. And here I am fairly

[733:36]

extravagantly now spending twice as much

[733:38]

memory to maintain this data structure.

[733:40]

That's going to be among the themes with

[733:42]

all of the data structures we talk

[733:43]

about. If we want to gain some benefit

[733:46]

like dynamic growth and shrinking of the

[733:48]

data structure, you got to give me

[733:50]

something. And what you've got to give

[733:51]

me in this case is the ability to use

[733:53]

more space. Um, in a bit today and after

[733:55]

break in particular, we're going to

[733:56]

decide we'd really like these algorithms

[733:58]

to be faster. Well, that's fine, but

[734:00]

you're going to have to give me

[734:01]

something in return. You're going to

[734:02]

have to spend more space to make the

[734:04]

code faster. And so time and space and

[734:06]

financial cost and human time and any

[734:08]

number of other resources are all things

[734:10]

that you need to evaluate as a

[734:12]

programmer or a manager and decide which

[734:14]

is least and/or most important to you.

[734:16]

And right now I don't care about space

[734:18]

as much as I care about the dynamism

[734:20]

that I'm trying to solve first. Other

[734:22]

questions on here? Yeah.

[734:29]

>> Yes. Why am I using pointer instead of

[734:31]

n? I Well, yes, I could reuse n at this

[734:35]

point. I deliberately chose to use

[734:36]

pointer for two reasons. One, I'm using

[734:38]

it for different reasons here. Um, two,

[734:41]

it's not necessarily the best idea to

[734:43]

use one variable here for a specific

[734:45]

purpose and then reuse the name down

[734:47]

here besides it's out of scope at this

[734:49]

point anyway. Um, so it just makes me

[734:51]

feel better that I have different

[734:52]

variables doing different things, but it

[734:54]

would not break if I did it your way.

[734:56]

Other questions?

[734:58]

Yeah. And back

[735:02]

>> are pointers temporary? Not necessarily.

[735:03]

Like the linked list we are building up

[735:05]

in memory exists because we are using

[735:08]

pointers to build this data structure

[735:10]

and to keep it intact for as long as the

[735:12]

program is running. My temporary

[735:14]

variables n and pointer ptr in this case

[735:17]

those are ephemeral and I'm only using

[735:18]

them to kind of stitch things together

[735:20]

temporarily.

[735:22]

A good question. All right. So let's now

[735:24]

motivate why we're spending so much time

[735:26]

sort of stitching these things together

[735:27]

so carefully. Well, here's our little

[735:29]

cheat sheet of common but not exhaustive

[735:31]

running times. Let's consider what the

[735:33]

running time is for some fairly basic

[735:35]

operations like inserting a number into

[735:37]

a linked list, maybe searching for a

[735:38]

number in a linked list or traversing it

[735:41]

uh and also deleting ultimately numbers

[735:43]

in a linked list. So here is my list

[735:45]

initially completely empty. And suppose

[735:47]

I go ahead and insert the one, then I

[735:49]

insert the two, then I insert the three

[735:52]

using code like we just wrote. I love

[735:54]

this approach because even though it

[735:56]

looks a little scary at first, this is

[735:58]

probably the simplest way to implement

[736:00]

insertion into a linked list. Why?

[736:02]

Because I'm just constantly prepending

[736:04]

the next element. Prepending,

[736:06]

prepending, which means all of my hard

[736:08]

work is just here at the beginning of

[736:09]

the list. So even if this thing has a

[736:11]

thousand elements in it, I'm only

[736:13]

manipulating some pointers all the way

[736:14]

over here pictorially at the left, which

[736:16]

means it's pretty darn fast. So given

[736:19]

that definition in this picture, what

[736:21]

would you say the big O running time is

[736:22]

of insertion into a linked list when using

[736:26]

my current implementation?

[736:28]

>> Big O of one. Why? Well, it's not

[736:30]

literally one step, but it is a constant

[736:32]

number of steps because if we literally

[736:34]

counted the lines of code I was

[736:35]

executing, it's a few steps to sort of

[736:38]

point one thing up here, point the other

[736:40]

thing down here, then update the third,

[736:42]

and boom, we're done. In particular,

[736:45]

what my current code does not care about

[736:46]

is the whole length of this list. Why?

[736:48]

Because I'm never traversing the whole

[736:50]

thing for the insertion part. I am

[736:52]

obviously for the printing part, but for

[736:54]

the insertion, I'm just prepending again

[736:57]

and again. The downside though of this

[736:58]

approach is that the whole darn thing is

[737:00]

coming out backwards. I'm not doing

[737:02]

anything with regard to the ordering of

[737:03]

these elements, which means what's the

[737:05]

running time of search going to be? For

[737:07]

instance, if I tell you search for like

[737:10]

the number one, find it for me.

[737:13]

What's the running time going to be

[737:14]

there in big O?

[737:17]

Big O of yeah, big O of N because in the

[737:20]

worst case, it's going to be all the way

[737:22]

at the end. And we've seen this scenario

[737:23]

before. So, it's big O of N for

[737:26]

searching. It's definitely big O of N

[737:28]

for traversing or printing. But that

[737:30]

goes without saying. If you want to

[737:31]

print every element, obviously you have

[737:32]

to touch every one of the N elements.

[737:34]

But what about deletion? Suppose I want

[737:36]

to delete an element. That's going to be

[737:37]

in big O of

[737:40]

>> N.

[737:40]

>> Also N. Why? Because again in the worst

[737:42]

case it could be all the way at the end.

[737:44]

So only insertion as currently

[737:46]

implemented is big O of one because we

[737:49]

are exercising full control over where

[737:52]

the new elements go irrespective of what

[737:54]

the actual values are. So things could

[737:56]

escalate quickly here if we do actually

[737:59]

want to start keeping things say in

[738:01]

sorted order because we can no longer

[738:03]

just naively plop things at the very

[738:05]

beginning of the list. I think we need

[738:07]

to start being a little more careful as

[738:09]

to where we put things. So in fact, even

[738:12]

though we're doing okay on insert right

[738:15]

now, we still have big O of N for the

[738:17]

searching and for the deletion, which we

[738:19]

won't do in code, um as well as of

[738:22]

course for traversal. So how else might

[738:24]

we go about building this list? Well,

[738:26]

let me propose that we could maybe

[738:28]

append to the end of the list. Let's try

[738:30]

that and see if it gets us anywhere

[738:31]

better. So here's my list initially,

[738:33]

completely empty, aka null. I go ahead

[738:35]

and insert the number one as before, but

[738:37]

now in this algorithm I'm going to

[738:38]

insert the number two and the number

[738:40]

three. So this is great because now by

[738:43]

chance it ended up beautifully in order.

[738:45]

But that's because I chose the numbers 1

[738:46]

2 3. But we'll come back to that detail.

[738:49]

Let's consider now what the running time

[738:51]

is of this algorithm of insertion using

[738:54]

appending to the list. What's the big O

[738:56]

not big O running time of insertion now?

[739:01]

Big O of N. So it's sort of strictly

[739:03]

worse because now it's always going at

[739:05]

the end. Now I could be a little smart

[739:06]

about it. I could just allocate another

[739:09]

pointer and just always have another

[739:10]

pointer pointing at the end of the list

[739:12]

just as I have a pointer pointing to the

[739:14]

start of the list. That's totally fine

[739:16]

if you're willing to spend one more

[739:17]

pointer which is a drop in the bucket. A

[739:19]

legitimate solution. But where I'd like

[739:21]

to go with this is let's maintain sorted

[739:23]

order no matter the order in which the

[739:26]

numbers are inserted. Whether it's 1 2 3,

[739:28]

3 2 1, 2 1 3, 3 1 2, whatever order the human

[739:32]

types in the numbers I want to build the

[739:34]

structure out such that they always end

[739:35]

up in sorted order just so that my

[739:38]

contacts in my iPhone or my Android

[739:40]

phone for instance are sorted as

[739:41]

intended. So how do we go about doing

[739:44]

that? Well here we're still dealing with

[739:45]

some big O. Let's try this. Here's my

[739:47]

list initially empty. Now the user

[739:50]

inserts person number two first. So it

[739:52]

ends up there. Then they insert number

[739:53]

one. I'd like it to go there. person

[739:56]

number four, it goes over there. And

[739:58]

then person number three, it ends up

[740:00]

here. Even though it's sort of obvious

[740:02]

with a piece of paper and pencil how to

[740:04]

stitch this together, this is now an

[740:06]

annoying number of logical steps because

[740:08]

there are so many opportunities where I

[740:10]

could screw up and orphan one or more of

[740:11]

these nodes. But let's consider the

[740:14]

scenarios that we might

[740:16]

encounter. Maybe we get lucky and it's

[740:18]

like an empty list and we just have to

[740:20]

insert one new node. That is trivial.

[740:21]

We've done that already. The two was

[740:23]

super easy to implement. The one could

[740:25]

be really easy to implement too because

[740:27]

that involves the prepending scenario

[740:29]

and we've seen that prepending is super

[740:30]

simple. So there's only two other

[740:32]

scenarios to consider appending if it's

[740:34]

a really big number and ends up at the

[740:36]

end and we've talked about but haven't

[740:37]

seen code for that. The annoying one I

[740:39]

dare say is going to be when the new

[740:40]

number belongs in the middle. But I

[740:42]

propose to think through it this way

[740:43]

because now you just have four problems

[740:45]

to solve not just one massive ill-defined

[740:48]

problem. You've got scenarios in which

[740:49]

you want to insert a new node into an

[740:51]

empty list. you want to prepend the new

[740:54]

node into the beginning of the list,

[740:55]

append it to the end of the list or

[740:57]

somewhere in the middle. So that's like

[740:58]

four blocks of code in my program. I can

[741:02]

now sort of take the proverbial baby

[741:03]

steps and implement this bit by bit. And

[741:06]

to do this, let me propose that in a

[741:08]

moment I'll switch over to VS Code, but

[741:10]

uh sort of Julia Child style, I'm going

[741:12]

to open up a pre-made version of the

[741:15]

program that actually gives us a working

[741:18]

solution, albeit initially with some

[741:20]

bugs. So here we have out of the oven

[741:23]

this version of list.c. At the top of the

[741:26]

file I've got my same includes as before

[741:29]

I've got my same structure as before

[741:31]

here I've again got in main void I've

[741:33]

got the beginning of my list here

[741:35]

setting it equal to null and then for

[741:36]

the sake of discussion I'm going to

[741:38]

insert three values for this example 1 2

[741:41]

and three by allocating enough room for

[741:44]

a node setting it equal to n then I'm

[741:47]

going to make sure a sanity check that n

[741:48]

is not null and then I'm going to

[741:50]

populate this with the human's first

[741:53]

choice of values. So, let me scroll

[741:55]

down. But as such, there's nothing too

[741:57]

new just yet.

[741:59]

Here we have the lines of code in which

[742:02]

I'm getting an int from the user,

[742:04]

setting next equal to null, and then I'm

[742:07]

prepending no matter what per our

[742:09]

earlier version that we did on the fly

[742:11]

this new node to the list and then

[742:13]

updating the list to point to it. And

[742:15]

then down here, I'm printing the number.

[742:16]

So, this is where we left off, but this

[742:18]

is a pre-made version that's nicely

[742:19]

commented. It's on the course's website

[742:21]

for reference. What I'm not doing now is

[742:23]

intelligently prepending, appending, or

[742:26]

plopping the code in the middle. So, how

[742:28]

do we do that? Let's take a look at this

[742:30]

version of the code. So, everything thus

[742:32]

far is the same. And if I scroll down

[742:35]

besides the new comments, you'll see

[742:38]

that now I'm starting to make some

[742:40]

decisions after I have allocated the new

[742:43]

node and populated its number and next

[742:45]

field. As an aside, I don't strictly

[742:47]

need to initialize the next field to

[742:49]

null because eventually, as we've done

[742:50]

in every past example, I've updated that

[742:52]

next field anyway. However, because this

[742:55]

one might now end up at the end of the

[742:56]

list, and I just want to program

[742:58]

defensively, initializing pointers to

[743:01]

null before you're ready to assign their

[743:03]

value is a good thing in general. So,

[743:05]

here's the first of the questions I'm

[743:07]

going to ask myself. If the list into

[743:09]

which I am inserting this new node is

[743:12]

empty, so it's the beginning of the

[743:13]

story. Super easy. Just set the list

[743:16]

equal to the address of that new node,

[743:18]

and we're done. That's what happened

[743:20]

when I inserted a bit ago the number two

[743:22]

for the very first time. So indeed what

[743:24]

has just happened here is that now the

[743:27]

list previously empty contains only a

[743:29]

node containing two. However, thereafter

[743:32]

there was another scenario. So when we

[743:33]

moved on in our story and added the

[743:35]

number one to the list, well that

[743:37]

happened to end up at the beginning but

[743:38]

it could also end up at the end or in

[743:40]

the middle. So let's break down those

[743:41]

scenarios here too. So here if it is not

[743:44]

the case that the list is empty in that

[743:46]

if condition we're going to end up here

[743:48]

now in the else. What do I want to do

[743:51]

here? Well let's go ahead and for now in

[743:53]

this simplified version append it to the

[743:55]

end of the list so we can see that code.

[743:57]

How do I do this? Well I'm using a for

[743:59]

loop much like the one I had before

[744:01]

which just allows me to traverse the

[744:03]

existing list whether it has one node or

[744:05]

many. And I'm gonna ask a question. If

[744:08]

following the current node's pointer

[744:10]

field, next field leads me to null, aka

[744:14]

the end of the list. Okay, let's go

[744:16]

ahead and update the end of the list to

[744:19]

actually equal the new node. So in other

[744:22]

words, if I'm sort of following

[744:23]

following following all of the arrows

[744:24]

and I reach a node whose next field is

[744:26]

null, no problem. Update that next field

[744:29]

to point to the new node I want to

[744:31]

insert. Irrespective of the values, I

[744:34]

just want to append this node. no matter

[744:36]

what. And then I want to break out of

[744:37]

the code. Then at the bottom of this

[744:39]

version of the program, it's all quite

[744:40]

the same, printing out the numbers using

[744:42]

the for loop version of my code from

[744:44]

before instead of the while loop, but

[744:45]

they're equivalent. But what I did do in

[744:47]

advance in baking this version of the

[744:49]

program is also go through the motions

[744:51]

of freeing every one of the nodes

[744:54]

afterward, but we'll come back to that.

[744:56]

So this version of the code, just to be

[744:58]

clear, only appends nodes to the list.

[745:01]

It's still not treating things in order.

[745:03]

But we've now seen two of the scenarios

[745:05]

plucked off. The list is empty or it has

[745:08]

numbers and we want to put something at

[745:09]

the end. So let me propose now that I

[745:12]

take out of uh our distribution code

[745:15]

another version of this program that

[745:17]

does that and a bit more. I'm going to

[745:19]

go ahead and open up in just a moment a

[745:22]

new and improved version of list.c. And

[745:25]

now it looks almost the same at the top.

[745:28]

Scrolling down. Scrolling down.

[745:30]

Scrolling down, here's some now familiar

[745:32]

code. If the list is empty, do that

[745:34]

simple thing as before and just prepend

[745:36]

it. Uh rather just set it equal to the

[745:39]

list. But here is now where we're adding

[745:41]

some inequality. So if the number in

[745:44]

question belongs at the beginning of the

[745:46]

list. So if the number in the new node n

[745:51]

is less than the number in the current

[745:54]

list which is presumed to be the first

[745:57]

node at the moment then go ahead and

[745:59]

update the new node's next field to

[746:02]

point at the existing list and then

[746:05]

update the list to point at this new

[746:07]

node thereby giving us from two in the

[746:10]

list to one and two in the list. To be

[746:14]

clear, if I go back to VS Code here,

[746:15]

what's happened here is because one is

[746:18]

less than two, of course, I'm going to

[746:21]

update the new node's next field to point

[746:24]

to the list. What does this mean? Well,

[746:25]

the new node at this point in the story

[746:27]

is the new node for the number one

[746:28]

because that's the second thing we're

[746:30]

inserting. I'm going to update its next

[746:33]

field to be whatever the list a moment

[746:36]

ago was already pointing at. So this is

[746:38]

the after effect but a moment ago list

[746:41]

was pointing at only the two. So now the

[746:44]

next field of the one points at the two

[746:47]

and then lastly here in this line I

[746:49]

update the list pointer to be the

[746:51]

address of that new node. And here's

[746:54]

where I'll wave my hand a little bit

[746:55]

today because it starts to escalate

[746:57]

quickly. It's useful and it might very

[746:58]

well be useful for problem set five in

[747:00]

particular, but I think more healthily

[747:02]

reviewed step by step at a slower pace.

[747:05]

Here is where I'm asking myself, all

[747:07]

right, if it's not the only element in

[747:09]

the list and it doesn't belong at the

[747:11]

beginning of the list, well, it belongs

[747:13]

somewhere later in the list, which gives

[747:14]

me two final scenarios. Let's figure out

[747:17]

which scenario we're in. Let's use this

[747:19]

for loop to iterate over all of the as

[747:20]

as many of the nodes in the list as we

[747:22]

need to. If we get all the way to the

[747:25]

end, because our pointer variable now

[747:27]

equals null, it's like following the

[747:29]

arrows, following the arrows, and maybe

[747:30]

we're trying to insert the number five.

[747:32]

I've already hit the number four. I've

[747:33]

hit null. Five belongs at the end. So

[747:36]

here we have our promised append code

[747:39]

which is exactly the same as before but

[747:41]

now I'm doing it conditionally if I've

[747:43]

indeed found my way to the end of the

[747:45]

list. And then lastly, let me scroll

[747:47]

down just a little bit. If it's not the

[747:49]

case that the list is empty and it's not

[747:51]

the case that the new node belongs at

[747:53]

the beginning and it's not the case that

[747:55]

the new node belongs at the end, I'm

[747:57]

just somewhere in the middle of the list

[747:59]

because the new number I'm inserting is

[748:01]

less than the one I'm looking at here.

[748:04]

And it's okay to use two arrows, but

[748:06]

I'll wave my hands at that for now.

[748:08]

These three lines, two pointer

[748:10]

manipulations and a break is what's

[748:12]

going to stitch together that three in

[748:14]

between the two and the four. And let me

[748:17]

propose, for lecture's sake, take this on

[748:19]

faith that this collectively does stitch

[748:21]

things together properly. But I do think

[748:23]

as you'll see in problem set five, it's

[748:24]

a much better exercise to think through

[748:26]

a little more carefully step by step

[748:28]

because there's just a lot of

[748:30]

fine-tuning of these pointers together

[748:32]

and the order of operations does matter.
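A hedged sketch of the four scenarios walked through above: empty list, new smallest value, append at the end, and insert in the middle. The function name `insert_sorted` and its signature are assumptions; the lecture's own file does this inline in `main`.

```c
#include <stdbool.h>
#include <stdlib.h>

typedef struct node
{
    int number;
    struct node *next;
} node;

// Hypothetical helper: inserts `number` so the list stays sorted
// from smallest to largest. Returns false if malloc fails.
bool insert_sorted(node **list, int number)
{
    node *n = malloc(sizeof(node));
    if (n == NULL)
    {
        return false;
    }
    n->number = number;
    n->next = NULL;

    if (*list == NULL)
    {
        // Scenario 1: list is empty
        *list = n;
    }
    else if (n->number < (*list)->number)
    {
        // Scenario 2: belongs at the beginning.
        // Order matters: point at the old list BEFORE moving the list pointer.
        n->next = *list;
        *list = n;
    }
    else
    {
        for (node *ptr = *list; ptr != NULL; ptr = ptr->next)
        {
            if (ptr->next == NULL)
            {
                // Scenario 3: reached the end, so append
                ptr->next = n;
                break;
            }
            if (n->number < ptr->next->number)
            {
                // Scenario 4: belongs in the middle.
                // Two pointer updates and a break, in this order.
                n->next = ptr->next;
                ptr->next = n;
                break;
            }
        }
    }
    return true;
}
```

In the middle case, swapping the two assignments would overwrite `ptr->next` before it is copied, orphaning the rest of the list.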

[748:34]

But at the very end of this program,

[748:36]

notice this is kind of mindless even

[748:37]

though the syntax is undoubtedly less

[748:39]

familiar. Here is how just like

[748:41]

traversing the whole list to print it

[748:43]

out, we can similarly do one more pass

[748:45]

over the linked list and free every one

[748:49]

of the nodes. But notice it's not quite

[748:51]

as simple as just saying free the whole

[748:53]

list. free is not that smart. malloc is

[748:56]

not that smart. And even though you have

[748:58]

called malloc one, two, three times, you

[749:00]

have to really call free. You have to

[749:02]

call free one, two, three times. You

[749:05]

can't just pass at the beginning of the

[749:06]

linked list and say you figure out what to

[749:08]

delete because it has no idea what a linked

[749:10]

list is or what your data structure

[749:11]

actually is. So the reason that this

[749:14]

loop is a little complicated is that

[749:16]

what I'm doing with these three lines is

[749:19]

essentially traversing my list

[749:22]

and making sure that I have a pointer

[749:25]

that when I'm ready to delete the three,

[749:26]

the one, I have a pointer pointing at

[749:28]

the two and then I free the one. I

[749:31]

update my pointer to point at the three

[749:32]

and then I delete the two. I update my

[749:34]

pointer to point at the four, then I

[749:36]

delete the three, and then I delete the

[749:38]

four. So, there's a bit of trickery

[749:40]

involved in making sure you don't orphan

[749:43]

things step by step.
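A minimal sketch of that freeing pass, assuming the same `node` struct as before. The name `free_list` and the returned count are illustrative additions, not part of the lecture's code.

```c
#include <stdlib.h>

typedef struct node
{
    int number;
    struct node *next;
} node;

// Hypothetical helper: frees every node in the list, one free per malloc.
// Returns how many nodes were freed (purely for illustration).
int free_list(node *list)
{
    int freed = 0;
    node *ptr = list;
    while (ptr != NULL)
    {
        // Remember where the next node is BEFORE freeing this one;
        // reading ptr->next after free(ptr) would touch freed memory.
        node *next = ptr->next;
        free(ptr);
        ptr = next;
        freed++;
    }
    return freed;
}
```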

[749:47]

Okay, that was a lot. Let me pause here

[749:50]

to see if there are in fact any

[749:53]

questions, even though we're

[749:55]

deliberately waving our hands at some of

[749:56]

those details.

[750:00]

Questions on this? Now, let me add one

[750:04]

final flourish. If we were to really

[750:07]

quibble over this, I mean, my god, we're

[750:08]

up to 80 lines of code already just to

[750:10]

implement the numbers one, two, three,

[750:11]

four. But there are some subtle bugs in

[750:15]

here at the moment. So, for instance,

[750:18]

suppose that something goes wrong with

[750:20]

malloc inside of this for loop here. And

[750:23]

suppose that it's not your first

[750:24]

iteration, something goes wrong on maybe

[750:26]

the second or the third iteration. Why

[750:29]

is this error check suddenly bad as I've

[750:33]

implemented it?

[750:36]

Yeah,

[750:39]

I didn't free the memory from the

[750:40]

previous iteration. So this is where

[750:42]

like oh like memory management starts to

[750:43]

get really annoying because if you do

[750:45]

want to practice what I've been

[750:46]

preaching which is free any memory

[750:48]

you've allocated and you've already

[750:49]

allocated one maybe two nodes because

[750:51]

malloc is again failing maybe at the last

[750:54]

iteration here you have to somehow go

[750:56]

back and free all of that and that's

[750:57]

fine like we have code at the bottom of

[751:00]

my file here which could traverse

[751:02]

through the existing list and just free

[751:04]

it all. So I could just copy paste that

[751:06]

code, put it into my if condition and

[751:08]

then run that code too to delete the

[751:10]

whole list. But at this point if you're

[751:12]

copying and pasting you're probably

[751:13]

doing something wrong. And so let me

[751:14]

propose as a final version of this just

[751:16]

for your reference later in the ninth

[751:18]

and final in version nine of this file

[751:20]

here zero indexed what we have. Give me

[751:23]

one second to just make a quick copy and

[751:25]

copy it over in list 9. See our last

[751:28]

version of this. We have the following

[751:31]

whereby now in my function uh in my main

[751:36]

function I have the exact same code as

[751:38]

before but I've taken the liberty of

[751:40]

implementing an unload function so that

[751:42]

I can call it here as well as at the

[751:45]

bottom of this main function. So I can

[751:48]

unload it here or unload the list there.
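A sketch of why factoring this out helps, under stated assumptions: `unload` is the name used in the lecture, but its body here, the `build` helper, and the choice to prepend each node are all hypothetical, chosen to keep the example short.

```c
#include <stdlib.h>

typedef struct node
{
    int number;
    struct node *next;
} node;

// Frees every node in the list (assumed shape of the lecture's unload)
void unload(node *list)
{
    while (list != NULL)
    {
        node *next = list->next;
        free(list);
        list = next;
    }
}

// Hypothetical helper: builds a list from an array, prepending each value.
// If malloc fails partway through, it frees what was already allocated
// by calling the same unload function that main would call at the end.
node *build(const int numbers[], int count)
{
    node *list = NULL;
    for (int i = 0; i < count; i++)
    {
        node *n = malloc(sizeof(node));
        if (n == NULL)
        {
            unload(list); // no copy-pasted cleanup loop needed here
            return NULL;
        }
        n->number = numbers[i];
        n->next = list;
        list = n;
    }
    return list;
}
```

Because each value is prepended, `build` returns the numbers in reverse order of the array.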

[751:50]

And all I've done now is in good form in

[751:53]

terms of design just implement the

[751:55]

notion of deleting a linked list in its

[751:56]

own function. So I could call it any

[751:58]

number of times from any number of

[752:00]

places. But just so you've seen how I

[752:02]

might do that there. All right. So let's

[752:04]

ask the question after all of this. What

[752:07]

is the running time of inserting into a

[752:10]

linked list?

[752:13]

Big O of

[752:16]

say a little big O of

[752:19]

>> N. Damn it. Like that's no better. All

[752:21]

right. What's the running time of

[752:22]

searching a linked list?

[752:25]

>> Big O of N. Damn it. Uh what's the

[752:27]

running time of deleting from a linked

[752:28]

list?

[752:29]

>> Big O of N. So like everything is

[752:31]

literally big O of N. So there's the

[752:32]

price we've suddenly paid. We have an

[752:35]

hour after we started with arrays gotten

[752:36]

to the point where we can dynamically

[752:38]

grow in a linked list and I dare say

[752:40]

even though we've not done it and won't

[752:42]

do it today, shrink the linked list by

[752:44]

freeing things that we don't need. So we

[752:45]

have the dynamism and we can make more

[752:47]

efficient use of memory even if it's

[752:48]

very fragmented and there's a few bytes

[752:50]

here a few bytes there but we've paid

[752:52]

this price because with arrays recall

[752:54]

even our phone book example we at least

[752:55]

had binary search the running time for

[752:57]

which was big O of log n. So my god, not

[753:00]

only are we spending more space the darn

[753:02]

thing is slower surely this is not how

[753:04]

our phone contacts are implemented

[753:05]

surely this is not how stacks and queues

[753:07]

are always implemented and indeed it's

[753:09]

not this is just going to be a stepping

[753:10]

stone to now doing a sort of mashup of

[753:13]

data structures whereby we take the best

[753:14]

features of arrays, the best features of

[753:17]

linked lists, mash them together to get new

[753:19]

and improved data structures. But for

[753:21]

that, we're going to have to have some

[753:22]

cookies first and we'll come back in 10

[753:23]

minutes. Cookies are now served.

[753:26]

All right, we are back. So, let's recap

[753:29]

how we got here and why. So, we started

[753:31]

with our old friends arrays, which we

[753:32]

introduced in week two. And recall that

[753:34]

the whole appeal of arrays was that one,

[753:36]

as all things go, like relatively

[753:38]

simple, certainly now in retrospect, but

[753:40]

more importantly, they were really darn

[753:42]

fast. Like arrays in so far as they are

[753:45]

stored back-to-back, contiguously in memory

[753:47]

means that we could do very simple

[753:48]

arithmetic, recall, to like figure out

[753:51]

the length of it and then divide by two

[753:52]

to get the middle divide by two again to

[753:54]

get the middle of the middle and so

[753:55]

forth. And even though we might have to

[753:56]

deal with a little bit of rounding

[753:58]

arrays lent themselves to binary search

[754:00]

and thus logarithmic time so big O of

[754:02]

log n. But today I claim that the

[754:05]

downside of arrays is that you have to

[754:06]

decide in advance how big you want it to

[754:08]

be. And if you guess wrong and it's too

[754:10]

small how much uh memory you ask for,

[754:12]

you then have to reallocate memory. And

[754:14]

that's fine. It's solvable with malloc or

[754:16]

realloc. But it's going to take some

[754:18]

amount of time to copy all of the old

[754:20]

memory into the new memory. Whether you

[754:23]

do it with a for loop or realloc

[754:25]

does it for you. Meanwhile, we only did

[754:27]

it with like three values, maybe four.

[754:29]

But imagine it being 3 million values

[754:31]

that you now need to allocate more space

[754:33]

for. You're going to waste a huge amount

[754:34]

of time copying 3 million values from

[754:36]

the old location to the new. And so

[754:38]

that's just generally not very

[754:40]

appealing. And so that motivated our

[754:42]

whole discussion of linked lists whereby

[754:44]

now we can create a more dynamic data

[754:46]

structure whereby we only allocate

[754:48]

memory as we need it. So we don't have

[754:50]

to worry about underestimating or

[754:52]

overestimating and therefore wasting

[754:54]

memory. We can just go bit by bit for

[754:56]

each new value. We allocate another

[754:58]

node, another chunk of memory, and the

[754:59]

thing just grows and grows and grows.

[755:01]

But as we saw just before break, the

[755:03]

downside is even though we're avoiding

[755:05]

the inefficiency of having to move stuff

[755:07]

around in memory, once allocated, the

[755:09]

nodes can stay where they are and we

[755:10]

just update our pointers. All of our

[755:13]

running times for searching, inserting

[755:15]

new elements, deleting old elements

[755:17]

would seem to be big O of N. But why was

[755:20]

that? Well, in the context of a linked

[755:22]

list, recall that it might look a little

[755:25]

something like this, whereby we have a

[755:27]

pointer called list pointing to maybe

[755:29]

four values like this. And suppose that

[755:31]

we do want to uh search for a value.

[755:34]

Now, it's nice because in our latest

[755:36]

version of this linked list, it was

[755:38]

sorted from smallest to largest. And

[755:39]

that was always a precondition of doing

[755:41]

binary search. But even though it's

[755:43]

obvious to our human eyes where the

[755:44]

middle is, it's like roughly over there.

[755:46]

How is the computer going to figure that

[755:48]

out? How is the code that you write?

[755:50]

Well, unfortunately, the way we've

[755:52]

stitched a linked list together with these

[755:53]

pointers is if you want to find the

[755:55]

middle, you can, but you got to start at

[755:57]

the beginning, traverse the whole thing

[755:59]

to figure out how long it is, then do it

[756:01]

again, and stop halfway through once you

[756:04]

know what the halfway point roughly is.
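A rough sketch of that two-pass approach, assuming the same singly linked `node` struct. The function name `middle` is hypothetical; the point is only that both passes walk the arrows one node at a time, which is where the linear cost comes from.

```c
#include <stddef.h>

typedef struct node
{
    int number;
    struct node *next;
} node;

// Hypothetical helper: returns a pointer to the (roughly) middle node,
// or NULL if the list is empty.
node *middle(node *list)
{
    // Pass 1: traverse the whole list to learn its length
    int length = 0;
    for (node *ptr = list; ptr != NULL; ptr = ptr->next)
    {
        length++;
    }

    // Pass 2: start over and stop halfway through
    node *ptr = list;
    for (int i = 0; i < length / 2; i++)
    {
        ptr = ptr->next;
    }
    return ptr;
}
```

With an array, the same lookup is a single index computation; here it costs on the order of n steps before any comparison happens.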

[756:06]

Then, if you want to search the middle

[756:08]

of the middle, you've essentially got to

[756:09]

do that whole process again. And so, now

[756:12]

just to use binary search, you need to

[756:14]

spend big O of N steps just to even find

[756:16]

the middle. Now, if your mind is kind of

[756:18]

spinning and you're like, well, maybe I

[756:19]

could just kind of cheat and use a

[756:21]

pointer to always point to the middle of

[756:22]

the list. Totally fine. You can spend

[756:25]

some additional space to remember the

[756:26]

the middle of the list, the end of the

[756:28]

list. But where does that stop? What if

[756:30]

with binary search, you go not just to

[756:32]

the middle, but the middle of the

[756:33]

middle, the middle of the middle of the

[756:34]

middle, the middle? Are you going to

[756:35]

keep around a pointer to every element?

[756:37]

Because if you do, you're essentially

[756:39]

back to an array if you've got one

[756:41]

location for every other location. So it

[756:43]

just kind of devolves into a mess. Even

[756:45]

though there's some minor optimizations

[756:47]

we could in fact make. In fact, we

[756:48]

didn't talk about it yet. But one common

[756:50]

alternative to a singly linked list,

[756:52]

which ours is, it's linked with a single

[756:55]

pointer from node to node. Uh computer

[756:57]

scientists also like to talk about

[756:58]

doubly linked lists where there's arrows

[757:00]

going both directions, which actually

[757:02]

would have simplified some of the last

[757:04]

code that we looked at because I don't

[757:06]

have to look ahead to figure out what I

[757:09]

want to free or what and where I want to

[757:11]

insert some value. But that too doesn't

[757:13]

fundamentally change the speed. It just

[757:14]

makes your code a little easier to

[757:16]

write. So in short, with linked lists, we

[757:18]

get dynamism. We can now grow and shrink

[757:20]

things without wasting time copying. But

[757:22]

we've lost hold of our binary search.

[757:25]

And that was very appealing as far back

[757:27]

as week zero when we wanted to do

[757:29]

something quite quickly. So let's see if

[757:32]

we can't make some mashups now. take

[757:34]

some arrays, take some linked lists,

[757:36]

literally mash them together into a sort

[757:38]

of Frankenstein data structure and see

[757:40]

if we can't get some of the speed of

[757:42]

arrays, but the dynamism of linked

[757:44]

lists. And so I give you trees. If you

[757:46]

think about in your mind's eye what a

[757:48]

family tree looks like where you

[757:49]

typically have some parents and then

[757:51]

some children and some grandchildren and

[757:52]

so forth. It's this sort of treelike

[757:54]

structure even though by convention it's

[757:56]

drawn top down instead of bottom up like

[757:58]

trees in the real world. But the top of

[758:00]

that family tree uh we're going to call

[758:02]

the root of the tree. It just so happens

[758:04]

to indeed grow down. But a tree is a

[758:06]

very common data structure and it's

[758:08]

interesting vis-à-vis arrays and linked lists

[758:10]

in that it's the first of our

[758:12]

two-dimensional data structures. An

[758:13]

array is effectively just a single

[758:16]

dimension along from left to right. A

[758:18]

linked list is essentially the same. Even

[758:20]

though in reality it might be up, down,

[758:21]

left, and right in memory. It's still

[758:22]

just one thing stitched together in a

[758:24]

single dimension. A tree adds now a

[758:26]

second dimension. And specifically

[758:28]

useful for us is what we're going to

[758:29]

call binary search trees, which is

[758:33]

spoiler going to give us back the

[758:34]

ability to use binary search. But we're

[758:36]

going to store the data a little more

[758:38]

cleverly than in arrays alone. Instead

[758:40]

of storing our data in one dimension in

[758:42]

a binary search tree, we're going to

[758:44]

store in effect in two different

[758:46]

dimensions. And that's going to gain us

[758:47]

some speed. So here for instance is an

[758:49]

array of seven numbers as we might have

[758:52]

seen it back in week uh two when we

[758:54]

first introduced arrays. Let me draw our

[758:56]

attention to the middle element and then

[758:58]

to the middle of the middles and then

[759:00]

the middles of the middles of the

[759:01]

middles just by color coding them

[759:02]

slightly differently. If I were to run

[759:04]

binary search on these numbers or the

[759:06]

lockers that we had on the stage a few

[759:08]

weeks back, I would jump to the middle

[759:10]

then the middle of the middle and so

[759:11]

forth. The catch though is that

[759:13]

implementing it as an array, it's not

[759:15]

going to be very easy to add new values.

[759:17]

Why? Because if I want to add the number

[759:18]

eight or nine or 10, I might get lucky

[759:21]

and there might be room in memory here,

[759:23]

but I might get unlucky. In which case

[759:24]

then we got to start jumping through

[759:25]

those hoops of malloc or realloc and

[759:27]

all, and copying all of this memory

[759:29]

to a new location which is doable. We

[759:31]

solved it in code but it's going to be

[759:33]

slow for larger data sets. So can we

[759:35]

avoid that? Well maybe I deliberately

[759:37]

color-coded things like this because let

[759:39]

me propose that instead of storing these

[759:41]

seven values in an array, let's store

[759:44]

them in a family-tree-like structure

[759:46]

like this where I just kind of exploded

[759:48]

them vertically on the y-axis here. So

[759:50]

now the middle element, the four, is at the

[759:52]

top of this tree. The two and

[759:55]

the six which were the middle elements

[759:57]

after the middle are going to be to the

[759:59]

left and right of the four. And then

[760:00]

these leaf nodes so to speak. We borrow

[760:02]

a lot of vernacular from the world of

[760:04]

actual trees. These are leaves in the

[760:06]

sense that they themselves have no

[760:07]

children. They're at the edge of the

[760:09]

data structure are going to be the

[760:10]

middles of the middles of the middles.

[760:12]

But all of the data is still there. I've

[760:14]

just exploded it from one to two

[760:16]

dimensions. And let me propose that now

[760:17]

that we have this technique of using

[760:18]

pointers which we use with C code but

[760:21]

you can depict them pictorially with

[760:23]

arrows. Let me propose that we stitch

[760:25]

together these seven values in memory

[760:28]

using a bunch of pointers whereby now

[760:30]

each of these nodes drawn as a single uh

[760:32]

square for simplicity is going to have

[760:34]

not only an integer associated with it

[760:37]

and not just one pointer but per these

[760:39]

arrows as many as two arrows associated

[760:42]

with it. So our nodes are about to go

[760:44]

from data structures with two things, a

[760:46]

number and a pointer to three things, a

[760:49]

number and two pointers for the left and

[760:52]

right child respectively. And I dare say

[760:55]

now that we have a two-dimensional tree

[760:57]

data structure, consider how you might

[761:00]

find a number therein. Suppose I'm

[761:02]

searching for the number five. Well, I

[761:04]

start at the root of the data structure.

[761:05]

And even though our human eyes obviously

[761:07]

know where we're going, notice what's

[761:08]

important about this binary search tree.

[761:11]

If I go to the root of the

[761:12]

tree, I see the four. Four is obviously

[761:14]

less than five. What does this mean?

[761:17]

This means I can divide and conquer the

[761:19]

problem right off the bat. I know that

[761:20]

five is going to be to the right of this

[761:23]

node, which means effectively, if you

[761:25]

think in your mind's eye about snipping

[761:26]

the branch there, I have just halved the

[761:29]

problem essentially like dividing the

[761:31]

phone book in half. Why? Because I don't

[761:33]

even waste time looking at this subtree,

[761:35]

the left child of the four element.

[761:38]

Meanwhile, if I go from the root to its

[761:41]

right child here, I see the number six.

[761:43]

Five, of course, is less than six. So,

[761:45]

this is effectively like snipping off

[761:47]

that child because I don't need to go

[761:49]

further there because I know a smaller

[761:50]

element is going to be in this

[761:52]

direction. And that's the key property

[761:53]

of a binary search tree. It's not just a

[761:56]

family tree with numbers all over the

[761:58]

place. They follow a certain pattern.

[762:01]

every element is going to be greater

[762:04]

than its left child and less than its

[762:07]

right child assuming you don't have

[762:09]

identical values and that property is

[762:11]

actually a recursive one to borrow

[762:13]

terminology from a couple of weeks back

[762:14]

recall that a recursive function is one

[762:16]

that calls itself a recursive data

[762:18]

structure like the pyramid in Mario is a

[762:21]

data structure that can be defined in

[762:23]

terms of itself well binary search tree

[762:25]

is a recursive property in so far as if

[762:27]

it applies to this node it also applies

[762:29]

to this node. Case in point: two is greater

[762:31]

than one but it's also less than three.

[762:34]

It's true over here. Six is greater than

[762:36]

five but less than seven. And it's

[762:38]

technically true of the leaf nodes

[762:39]

because the definition is at least not

[762:41]

violated there because they don't even

[762:42]

have children themselves. So this is a

[762:45]

binary search tree because of that

[762:47]

pattern. So this then invites the

[762:48]

question, well how long does it take us

[762:50]

to search for a value in a binary search

[762:53]

tree? Well, if the number is five, it's

[762:55]

going to take me one two steps. But if

[762:57]

there's n elements here, can someone

[762:59]

want to generalize that either

[763:00]

mathematically or just instinctively?

[763:02]

Big O of

[763:04]

log n. And even if you're not quite sure

[763:06]

how the math works out, anytime you take

[763:08]

a data set and you halve it, halve it, halve

[763:10]

it, we're talking about log base 2 of n

[763:12]

again. And indeed, that's going to

[763:14]

describe the height of this tree. The

[763:15]

height of this tree is essentially log

[763:17]

base 2 of n because if n is seven, it's

[763:21]

going to give me uh essentially two when

[763:24]

we round appropriately. If we round up,

[763:25]

if we've got eight elements, log base 2

[763:27]

of 8, 2 to the 3rd. So that means three. So 1,

[763:29]

2, 3. It kind of works out even if I'm

[763:31]

doing that a bit quickly. The height of

[763:33]

this tree is log base 2 of n, aka big O of

[763:37]

log n. How long does it take to insert?

[763:39]

I think it's going to take log n because

[763:41]

I can insert over here or over here or

[763:43]

over here depending on where the number

[763:44]

goes. Uh how long does it take to

[763:46]

delete? I'll claim it's going to take

[763:47]

about the same. So wow, we're back in

[763:49]

business. I've got now the ability to

[763:52]

grow and shrink my data structure

[763:53]

because if I want to insert the number

[763:55]

eight, it's going to go right there. If

[763:56]

I want to insert the number like 5.5, I

[763:58]

I can see where I would put it. It's

[764:00]

going to be easy to add new nodes by

[764:02]

just updating the pointers without

[764:04]

copying everything in memory like we had

[764:05]

to for arrays. But there is a downside

[764:09]

here. I got to concede something. What

[764:12]

am I what price am I paying? What's the

[764:14]

trade-off here to gain that dynamism and

[764:16]

that speed? But

[764:19]

>> each individual node takes more memory.

[764:21]

>> Yeah, I'm literally using three times as

[764:23]

much memory now because even though it's

[764:24]

not depicted here explicitly, each of

[764:26]

these squares represents an integer and

[764:29]

a pointer and another pointer. So that's

[764:31]

like 16, that's like 20 bytes at this

[764:33]

point of memory instead of just four

[764:35]

bytes for each of the integers in an

[764:37]

array. Nowadays though, space is pretty

[764:39]

cheap. We all have very large Dropbox

[764:40]

folders, iCloud folders, and the like.

[764:42]

So it's not really a big deal to use

[764:44]

that many more bytes. Certainly not a

[764:45]

big deal for seven numbers, but if it's

[764:47]

seven million numbers, maybe this isn't

[764:49]

the best data structure to use, even if

[764:51]

speed is important. You got to decide

[764:53]

ultimately based on your actual use case

[764:55]

what matters more. So in short, a binary

[764:59]

search tree you can kind of think of as

[765:00]

an amalgam of or rather a variant of a

[765:04]

linked list except that every node has

[765:06]

as many as two pointers instead of one,

[765:08]

which is what gives us now this this

[765:10]

second dimension. And in fact, this

[765:12]

translates pretty nicely to code. In

[765:14]

fact, if we consider how we implemented

[765:16]

in a linked list a node, recall that it

[765:19]

looked like this where you got a number

[765:20]

in each node and a pointer to the next

[765:22]

element in the linked list. Well, I

[765:24]

think for a binary search tree, we can

[765:26]

sort of borrow this as inspiration, make

[765:28]

a little more room because we need two

[765:29]

pointers instead of one. And I'm just

[765:31]

going to call the left child the left

[765:32]

pointer and the right pointer. But here

[765:35]

is the three times as much space give or

[765:37]

take because I now have three elements

[765:39]

associated. Two pieces of metadata and

[765:41]

one piece of data that I actually care

[765:43]

about to stitch this thing here

[765:45]

together. All right. Well, if this is

[765:47]

the data structure there, how could I

[765:49]

implement this in code? Well, here's

[765:51]

where recursion again comes into play.

[765:53]

The fact that a binary search tree is

[765:55]

recursive in nature in that what you say

[765:57]

about this node about it being greater

[765:59]

than the left child and less than the

[766:01]

right child can be said of this node and

[766:02]

this node and this node and this node.

[766:04]

You can leverage that beautifully in

[766:06]

code like this. So suppose I'm

[766:08]

implementing a search function in C

[766:10]

whose purpose in life is just to say yes

[766:12]

or no true or false the number you're

[766:14]

looking for is in this tree which might

[766:16]

be a useful thing to uh check uh in a in

[766:19]

an algorithm. Search is going to take

[766:22]

two arguments. I propose the number

[766:23]

you're searching for and a pointer to

[766:26]

the tree. That is the root of the tree

[766:28]

initially. So how do you actually

[766:30]

traverse this thing in C code? Well, we

[766:32]

can pluck off the the easy case first.

[766:34]

The base case if the tree itself is

[766:36]

null. Like if you hand me nothing, I'll

[766:37]

give you your answer right now. False.

[766:39]

Like there's no number here if the tree

[766:42]

is empty. So that's easy. Otherwise, if

[766:44]

the number you're looking for is less

[766:46]

than the number in the current node. So

[766:50]

tree is what's passed in a pointer to

[766:52]

the root. So if you follow the arrow,

[766:53]

you can get inside of that value and see

[766:56]

its number. If the number you're looking

[766:57]

for is less than that, okay, you want to

[767:00]

what? Snip off the right tree and dive

[767:02]

down the left subtree. So you search the

[767:06]

tree's left child for the same number.

[767:09]

Else, if the number you're looking for

[767:10]

is greater than that number, you search

[767:13]

the tree's right child for that same

[767:16]

number. And the fourth and final

[767:17]

scenario is what? Well, if the number

[767:19]

you're looking for equals the number in

[767:21]

the current node, you got it. Return

[767:24]

true. And if you recall some of

[767:27]

our past design discussions, this is

[767:28]

sort of a waste of everyone's time to

[767:30]

ask this question explicitly. Let me

[767:31]

tighten this up design-wise because

[767:33]

there's only four possible scenarios.

[767:35]

Either there's nothing there, it's to

[767:36]

the left, it's to the right, or you

[767:38]

found it. It's right there. So whether

[767:41]

or not you agree at this point in your

[767:43]

programming career, like there is a

[767:45]

beauty to this code that most

[767:47]

programmers would claim is here and that

[767:49]

it's so relatively elegant whereby

[767:51]

you've defined what the function is.

[767:52]

You've got this base case which is

[767:53]

arguably one of the clunkiest parts. But

[767:55]

the fact that you can just check a value

[767:57]

here and then traverse the exact same

[768:00]

structure but a subset of it by

[768:01]

traversing the left subtree or the right

[768:03]

subtree is like a beautiful application

[768:05]

of recursion. And it allows you to uh

[768:08]

search for this thing no matter where it

[768:09]

is in the computer's memory. Questions

[768:12]

then on this idea of a binary search

[768:13]

tree or this actual code thereof.

[768:16]

>> And if you don't ask the question, if

[768:18]

the number is not there,

[768:22]

>> uh, nope. If the number is not there, we

[768:24]

recall. So, if we get all the way to the

[768:27]

bottom of the tree such that now I'm at

[768:28]

one of those leaf nodes and that's not

[768:30]

the number I'm looking for, such that

[768:32]

there's no left child left, no right

[768:33]

child left, this conditional is going to

[768:36]

kick in and I'm going to return false.

[768:38]

But if I find it along the way, whether

[768:40]

it's at the top of the tree or somewhere

[768:41]

in the middle or among the leaves, I

[768:43]

will eventually return true.
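
The four scenarios described above can be sketched in C roughly as follows. This is a minimal sketch, not necessarily the exact code on the lecture slide: the `node` type and its field names (`number`, `left`, `right`) are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

// Assumed node layout: one number plus pointers to two (possibly empty) subtrees
typedef struct node
{
    int number;
    struct node *left;
    struct node *right;
} node;

// Returns true if number appears anywhere in the tree rooted at tree
bool search(node *tree, int number)
{
    // Base case: an empty tree cannot contain the number
    if (tree == NULL)
    {
        return false;
    }
    // Smaller than the current node: only the left subtree can hold it
    else if (number < tree->number)
    {
        return search(tree->left, number);
    }
    // Larger than the current node: only the right subtree can hold it
    else if (number > tree->number)
    {
        return search(tree->right, number);
    }
    // The only remaining possibility is that the numbers are equal
    else
    {
        return true;
    }
}
```

Each recursive call receives a pointer to a smaller tree, so reaching a missing child (a `NULL` pointer) falls into the base case and returns false.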

[768:46]

Good question. And to be clear, even

[768:48]

though I'm calling this a tree, that's

[768:50]

true certainly for the first time I call

[768:51]

this function because I'm passing in a

[768:52]

pointer to the whole tree structure. But

[768:55]

if you think about it, what's the left

[768:56]

subtree and the right subtree? It's just a

[768:58]

smaller tree. It's like a baby tree

[769:00]

that's attached to this parent node, so

[769:02]

to speak. So it's perfectly reasonable

[769:03]

to just call the search function with

[769:05]

that child because it in turn has a

[769:08]

whole subtree below it or the right child

[769:10]

which has the whole subtree below it

[769:13]

instead. All right. So I like this

[769:16]

direction. We've now kind of improved

[769:18]

upon linked lists. We've gained back some

[769:20]

of our performance because we can now

[769:22]

find something with big O of log n

[769:24]

time. I don't love the fact that I'm

[769:26]

using three times as much memory

[769:28]

roughly. That feels like kind of a high

[769:29]

price to pay just to speed things back

[769:31]

up. But let's consider whether or not

[769:33]

this thing is actually going to work as

[769:34]

the data structure gets bigger and

[769:35]

bigger as well. So it looks beautiful

[769:37]

here as written and that's deliberate

[769:38]

because I drew the picture like this and

[769:40]

it's got seven elements in it. But how

[769:42]

did we get to seven elements? Let's

[769:43]

start from the beginning. Suppose that

[769:44]

the tree is initially empty and suppose

[769:46]

that a human using get int or some other

[769:49]

technique inserts the first element into

[769:51]

the list like the number two and the

[769:53]

goal is to maintain the binary search

[769:55]

tree property which means you got to

[769:57]

have it greater than left child less

[769:59]

than the right child. So suppose the

[770:00]

human using get int or some other

[770:02]

technique next gives me the number one

[770:04]

no big deal I plop it right there as the

[770:05]

left child suppose they give me the

[770:07]

number three next no big deal it goes

[770:09]

right there I have very deliberately

[770:11]

manipulated this story to work out

[770:13]

beautifully such that the tree is

[770:14]

smaller but it's still a binary search

[770:17]

tree and nicely balanced so to speak but

[770:20]

what if the user for whatever reason

[770:21]

just gives me a more perverse sequence

[770:23]

of inputs like the worst case scenario

[770:26]

to give me three elements and suppose

[770:27]

they give me one first Okay, that's the

[770:30]

root. Then they give me two. Okay,

[770:32]

that's cool. That's like the right

[770:34]

child. But what if they then give me

[770:36]

three? Well, to maintain that binary

[770:38]

search property, the three has to go

[770:40]

over here. Suppose perversely then they

[770:42]

then give me four, then five, then

[770:44]

six. Imagine in your mind's eye where

[770:45]

this story is going. What have I

[770:47]

accidentally created in memory? Then a

[770:50]

linked list, which is like bad for all the

[770:53]

reasons we discussed before the break

[770:54]

because even though we're getting the

[770:55]

dynamism, it's devolving into big O of

[770:57]

N. So I've kind of manipulated the

[771:00]

situation here with the original

[771:02]

example with seven elements and

[771:03]

then three elements by making sure that

[771:05]

they were inserted in just the right

[771:07]

order. Because unless you are clever

[771:10]

about how you build the tree in memory,

[771:12]

it could very well devolve from a tree

[771:14]

in two dimensions into actually a linked

[771:17]

list in one dimension. And now this is

[771:19]

just a long and stringy tree that does

[771:21]

not violate the binary search tree

[771:23]

definition, but it is surely not

[771:25]

balanced in this case. Now, as an aside,

[771:27]

if you take higher level languages and

[771:29]

data structures and algorithms, there's

[771:30]

many different alternatives to binary

[771:33]

search trees that actually have baked

[771:36]

into the algorithms a little bit of

[771:38]

rejiggering of the structure so that

[771:40]

really as soon as you insert this three,

[771:43]

you spend a little bit more time and

[771:44]

clean the situation up. And essentially

[771:46]

what you do is like pivot the thing

[771:47]

around this way so that two becomes the

[771:50]

new root and then one hangs off of it

[771:52]

and three still hangs off of it. So with

[771:54]

each insertion or deletion, you

[771:56]

rebalance the tree as needed, which does

[771:59]

cost you a bit more time, but it avoids

[772:01]

the thing devolving into big O of N

[772:04]

again. And we won't do that in code. So

[772:06]

this is recoverable, but not if you

[772:08]

implement it naively, as I did, at least

[772:10]

verbally in this story. All right. Well,

[772:15]

can we do better than that? Well, why

[772:17]

might we want to? Well, at this point in

[772:19]

the story, it certainly could devolve

[772:21]

into big O of N, and that's not great.
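
To make the story concrete, here is a minimal sketch of a naive insertion that never rebalances, a helper that measures the tree's height, and a single left rotation of the kind described above (pivoting so that 2 becomes the root). The names `insert`, `height`, and `rotate_left` and the `node` layout are assumptions for illustration; real self-balancing trees decide when to rotate using extra bookkeeping that is not shown here, and freeing the nodes is omitted.

```c
#include <stdlib.h>

// Assumed node layout for the binary search tree
typedef struct node
{
    int number;
    struct node *left;
    struct node *right;
} node;

// Naive insertion: walk down and attach a new leaf, never rebalancing.
// Returns the root of the (sub)tree after insertion.
node *insert(node *tree, int number)
{
    if (tree == NULL)
    {
        node *n = malloc(sizeof(node));
        if (n == NULL)
        {
            return NULL;
        }
        n->number = number;
        n->left = NULL;
        n->right = NULL;
        return n;
    }
    if (number < tree->number)
    {
        tree->left = insert(tree->left, number);
    }
    else if (number > tree->number)
    {
        tree->right = insert(tree->right, number);
    }
    // Duplicates are ignored in this sketch
    return tree;
}

// Number of nodes on the longest path from the root down to a leaf
int height(node *tree)
{
    if (tree == NULL)
    {
        return 0;
    }
    int l = height(tree->left);
    int r = height(tree->right);
    return 1 + (l > r ? l : r);
}

// Left rotation: the right child becomes the new root of this subtree
node *rotate_left(node *root)
{
    if (root == NULL || root->right == NULL)
    {
        return root;
    }
    node *pivot = root->right;
    root->right = pivot->left; // pivot's old left subtree stays between root and pivot
    pivot->left = root;
    return pivot;
}
```

Inserting 2, 1, 3 yields a tree of height 2, while inserting 1, 2, 3 yields a chain of height 3; calling `rotate_left` on that chain's root brings the height back to 2 with 2 at the top.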

[772:23]

Certainly for large data sets, it's nice

[772:25]

that we're back to log n. At least if

[772:26]

you take on faith that we could kind of

[772:28]

rebalance this thing as needed and

[772:30]

maintain a logarithmic height for it.

[772:32]

But really the holy grail of data

[772:34]

structures is to achieve something that

[772:36]

is big O of one like constant time

[772:38]

whereby no matter how many numbers or

[772:40]

names or sweaters are in the data

[772:42]

structure it will take just one step or

[772:45]

maybe three steps or even 100 steps but

[772:48]

a number of steps that is completely

[772:50]

independent of how many actual pieces of

[772:53]

data are in the data structure. That is

[772:55]

to say over time it doesn't get any

[772:57]

slower even if you've got tens,

[772:58]

hundreds, thousands, millions of

[773:00]

elements in there already. So how do we

[773:03]

gain something like big O of one

[773:05]

constant time the appeal of which is

[773:08]

reminiscent of our early picture from

[773:10]

week one like this was our early

[773:12]

algorithm for finding someone in a phone

[773:13]

book or counting students in the room

[773:15]

something linear literally straight

[773:17]

lines. This was the logarithmic curve

[773:18]

which especially as you zoom out starts

[773:20]

to get very very appealing time-wise.

[773:22]

Something that's constant time looks

[773:24]

even prettier. It is a straight line at

[773:27]

like the one-step mark or the two-step

[773:28]

mark, whatever the constant number of

[773:30]

step marks is. And even though

[773:31]

logarithmic will still grow in

[773:33]

perpetuity, constant time by definition

[773:36]

never changes. And this is what we'd

[773:38]

really like. So when you're searching

[773:39]

for someone in your phone, you're

[773:40]

searching for something on Google,

[773:41]

you're asking a question of ChatGPT, you

[773:43]

get an answer like that in constant time

[773:46]

independent of how much data is actually

[773:48]

in there. Well, let's see how we can do

[773:50]

this. To do this, we're going to at

[773:51]

least need a new building block, a term

[773:53]

of art known as hashing. Hashing sort of

[773:55]

formally takes an infinite domain of

[773:58]

values and maps it to a finite range of

[774:00]

values. So from high school math class,

[774:02]

domain is the input, range is the

[774:03]

output. So an infinite domain to a

[774:06]

finite range is the goal here of

[774:08]

hashing. And we might see this actually

[774:10]

in the real world when you're playing,

[774:11]

you know, games or whatnot or you're

[774:13]

cleaning up after a game like here is

[774:14]

here are some super jumbo playing cards

[774:16]

that we got online. And suppose that you

[774:18]

want to just get these into sorted

[774:20]

order. Um you could do this very

[774:21]

painstakingly. There's 52 cards here.

[774:23]

You can kind of lay them all out and

[774:24]

start sifting through them and put the

[774:26]

two over here and the four over here and

[774:28]

the hearts and the clubs and so forth.

[774:29]

Or you can start to look at the cards

[774:31]

and bucketize them first to take a 52-

[774:34]

size problem and maybe uh shrink it down

[774:36]

into four 13-size problems. So here for

[774:39]

instance is where uh the first diamond

[774:41]

might go, the club here, spade over

[774:44]

here, diamond over here. And I can kind

[774:48]

of just do this again and again

[774:50]

bucketizing literally all of these

[774:53]

values so that I've got a very simple

[774:56]

heuristic that allows me to move the

[774:59]

cards into these buckets each of which

[775:01]

is going to have a subset of the values

[775:03]

and then I've got smaller problems I can

[775:04]

deal with. So dot dot dot assume that I

[775:06]

bucketize all 52 of these values. Then

[775:08]

I've just got four problems remaining.

[775:10]

And I dare say it's a little easier then

[775:11]

because they're all of the same suit and

[775:13]

so I can pretty easily sort it from ace

[775:15]

to king or whatnot because those are

[775:17]

effectively just numbers at that point.
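
The card demonstration can be sketched in C as follows. This is only an illustration under stated assumptions: the `card` struct, the encoding of suits as 0 through 3 and ranks as 1 through 13, and the function names are all hypothetical, and the per-bucket sort is a plain insertion sort chosen for brevity.

```c
#define SUITS 4
#define RANKS 13

// Hypothetical encoding: suit is 0..3, rank is 1 (ace) .. 13 (king)
typedef struct
{
    int suit;
    int rank;
} card;

// Drop each card's rank into the bucket for its suit.
// Assumes at most RANKS cards per suit, as in a standard deck.
void bucketize(const card deck[], int n, int buckets[SUITS][RANKS], int counts[SUITS])
{
    for (int s = 0; s < SUITS; s++)
    {
        counts[s] = 0;
    }
    for (int i = 0; i < n; i++)
    {
        int s = deck[i].suit; // the "hash": the suit alone picks the bucket
        buckets[s][counts[s]] = deck[i].rank;
        counts[s]++;
    }
}

// Sort one bucket's ranks in ascending order (insertion sort)
void sort_bucket(int ranks[], int count)
{
    for (int i = 1; i < count; i++)
    {
        int key = ranks[i];
        int j = i - 1;
        while (j >= 0 && ranks[j] > key)
        {
            ranks[j + 1] = ranks[j];
            j--;
        }
        ranks[j + 1] = key;
    }
}
```

After `bucketize`, each of the four buckets holds at most 13 values, so the one 52-card problem has become four much smaller sorting problems.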

[775:19]

So hashing refers to again taking values

[775:21]

from an infinite range. In this case, it

[775:23]

can be finite and it is in this case.

[775:25]

But if you were doing it more generally

[775:27]

with numbers, you just have to map it to

[775:29]

a finite range like 1 2 3 4 finite

[775:33]

number of buckets of values at which

[775:35]

point then you can solve the problem a

[775:36]

little differently or a little more

[775:38]

efficiently. So why is this germane?

[775:41]

Well, I would propose that if we want to

[775:45]

start organizing our data in memory

[775:48]

toward an idealistic goal of achieving

[775:50]

constant time, hashing might be one

[775:53]

ingredient for the solution there too.

[775:54]

And generally, we're going to describe

[775:56]

the process by which you decide what

[775:58]

input goes to what output is namely

[776:00]

what's called a hash function. It's a

[776:02]

mathematical function or a function in

[776:03]

code that takes as input a card from a

[776:06]

deck or maybe a word from a dictionary

[776:08]

and outputs a value that represents the

[776:10]

bucket into which it should go. So in

[776:13]

the case of our contacts app for

[776:14]

instance, of course in the GUI of it,

[776:16]

you have all of your friends and family

[776:17]

top to bottom uh alphabetically

[776:19]

presumably you might want to ideally

[776:22]

find someone quite quickly, ideally in

[776:25]

constant time, right? The naive

[776:27]

implementation that Apple or Google

[776:28]

could implement is just use linear

[776:29]

search. Search through all of your

[776:30]

contacts top to bottom and eventually

[776:32]

you will correctly find the person. But

[776:35]

wouldn't it be nice if they instead use

[776:36]

an array and then they can use binary

[776:38]

search and get you the person in

[776:39]

logarithmic time? That's great. But if

[776:40]

you have a lot of friends and family in

[776:41]

there or a much larger data set,

[776:43]

wouldn't it be nice to just jump to the

[776:45]

answer in one step instead of even log

[776:47]

of n steps? So that's our goal. Can we

[776:49]

get close to or actually at constant

[776:52]

time? So with a hash function, we

[776:54]

essentially have our old friend problem

[776:55]

solving here, the inside of which the

[776:58]

algorithm is known as a hash function.

[776:59]

And for instance, if I'm looking at

[777:01]

Mario's number, I might now want to look

[777:04]

for Mario, not top to bottom or not

[777:06]

divide and conquer, jumping around to

[777:08]

the half, the middle of the middle of

[777:09]

the middle. Let me just figure out what

[777:11]

bucket Mario is in. And in the English

[777:13]

alphabet, there's 26 letters of the

[777:15]

alphabet, A through Z, either uppercase

[777:16]

or lowercase. And suppose that I want to

[777:19]

find what bucket Mario is in. Well, much

[777:21]

like these cards and the suits thereof,

[777:23]

wouldn't it make sense that anyone whose

[777:25]

name starts with A goes into the

[777:26]

first bucket and maybe the B's go into

[777:28]

the second bucket and the dot dot dot

[777:30]

Z's go into the last bucket. So, it

[777:32]

stands to reason that if I pass in Mario

[777:34]

to a hash function implemented in C or

[777:36]

some other language, I would like to get

[777:38]

back the number 12 because M is the 13th

[777:40]

letter of the alphabet, but if we start

[777:42]

counting at zero with our buckets, which

[777:44]

are essentially an array, then it's

[777:46]

index location 12 instead of 13.

[777:48]

Similarly, if Luigi is the input, I'd

[777:50]

like to get back the number 11. So, my

[777:52]

hash function somehow takes as input in

[777:54]

this story, a string, and gives me an

[777:56]

integer. I claim there's theoretically

[777:59]

an infinite number of names in the world

[778:01]

in the English language. But there's

[778:03]

only going to be 26 possible answers

[778:05]

from this hash function 0 through 25.

[778:07]

So, that's our infinite domain to our

[778:09]

finite range. Instead of four, it's now

[778:11]

26. All right. So what should we do with

[778:14]

the computer's memory to leverage the

[778:17]

fact that we can very easily bucketize

[778:19]

names based on the first letter of

[778:22]

someone's name? Well, let me propose

[778:24]

that the hash function part of this

[778:26]

arcane as it looks is actually pretty

[778:28]

straightforward. So if you wanted to

[778:29]

translate this idea into C, you can

[778:31]

include ctype.h, which we've used a

[778:33]

few times, to get access to

[778:35]

functions like toupper. And this is

[778:37]

just to make sure you can be case

[778:38]

insensitive. Here's my hash function.

[778:40]

It's going to return an int, which is

[778:42]

the goal. Takes a string as input. We'll

[778:44]

call it name. And what does this

[778:45]

function do? Well, it's kind of some

[778:47]

clever arithmetic. It first converts to

[778:50]

uppercase. The first letter of that

[778:52]

person's name. So, if it's in all

[778:54]

lowercase, forces it to uppercase. Why?

[778:56]

Because I want to subtract no matter

[778:58]

what 65, aka the ASCII value of capital A,

[779:01]

from this. And I don't want to screw up

[779:02]

the math. If I'm doing like a lowercase

[779:03]

letter minus a capital, I want capital

[779:05]

minus capital is all. So this will

[779:08]

return to me a number between 0 and 25

[779:11]

inclusive because if it is a letter a

[779:15]

name that starts with a. I'm only

[779:17]

looking at the first letter. I'm

[779:18]

subtracting off a that gives me zero and

[779:20]

I'm going to return zero as a result.
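
A minimal sketch of the hash function just described might look like this. It takes a plain `char *` rather than CS50's `string` type so that it compiles without the course library; that substitution, and the cast to `unsigned char` before calling `toupper`, are choices made for this illustration. As in the lecture, there is no error checking: the name is assumed to start with an English letter.

```c
#include <ctype.h>

// Maps a name to a bucket index based on its first letter only.
// Assumes name is non-empty and begins with an English letter A-Z or a-z.
int hash(char *name)
{
    // Force the first letter to uppercase, then subtract 'A' (ASCII 65)
    // so that A maps to 0, B to 1, and so on up to Z mapping to 25
    return toupper((unsigned char) name[0]) - 'A';
}
```

With this sketch, a name starting with M or m hashes to 12, and one starting with L or l hashes to 11.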

[779:23]

Dot dot dot. If it's z, I'm going to

[779:24]

return 25 instead. Now there's no error

[779:26]

checking in here. If you type in uh

[779:28]

non-English symbols, uh it's going to

[779:31]

break. So let's just assume for

[779:32]

simplicity this is indeed an English

[779:34]

name that's coming in. I can refine this

[779:36]

a little bit. I'm going to propose

[779:37]

moving forward in our final week here of

[779:39]

C, there are some added defenses you can

[779:41]

put in place when writing code. Like if

[779:43]

you know that you're receiving a name as

[779:45]

input, that is you're passing something

[779:47]

in by reference, there's a danger now

[779:50]

per last week, because now the caller of

[779:54]

this function, whoever's using this

[779:55]

function is telling you where to find

[779:57]

Mario and where to find Luigi's name.

[779:59]

The problem with that is that you could

[780:00]

go to that address and actually change

[780:02]

their name in memory. Even if you're not

[780:04]

supposed to, you're supposed to just use

[780:05]

the name. So you can do something like

[780:07]

const which says you should not be able

[780:09]

to change this value even though I'm not

[780:11]

giving you a copy of it by value. I'm

[780:13]

giving you a reference there too.

[780:15]

Another refinement here is that a hash

[780:17]

function for an array as the goal should

[780:20]

return a value that's zero or one or two

[780:22]

on up. Never negative. So we can even

[780:24]

more protectively say it's not just an

[780:26]

int, it's an unsigned int. And we talked

[780:28]

briefly about that last week, albeit in

[780:30]

the context of chars. These are just

[780:31]

like minor improvements that makes your

[780:33]

code arguably better designed because

[780:35]

you're opening yourself up to fewer

[780:38]

possible mistakes or issues. All right,

[780:41]

so with that said, let's now assume that

[780:43]

we've got this kind of function

[780:45]

implemented and we can now use it to

[780:46]

decide what bucket to put these people's

[780:50]

names into. Well, let's give you what

[780:52]

are called hash tables, which are sort of the

[780:54]

Swiss army knives of data structures.

[780:56]

the kind of thing that some computer

[780:57]

scientists have been quoted as saying if

[780:58]

they were stuck on a desert island with

[781:00]

only one data structure, this is

[781:01]

probably the one they would want. Why?

[781:04]

It's just really generally useful

[781:06]

because it allows you quite powerfully

[781:08]

to associate keys with values. Which is

[781:11]

to say, to come full circle today, hash tables

[781:13]

are often how you would implement at a

[781:15]

lower level the thing we began class

[781:17]

with talking about dictionaries,

[781:19]

collections of key value pairs. That

[781:21]

after all is what a phone book is. We

[781:23]

call it, you know, names and numbers,

[781:24]

but it's keys and values. That's what an

[781:26]

actual English dictionary is. The Oxford

[781:28]

English dictionary, it's a bunch of

[781:30]

words and definitions or keys and

[781:32]

values. So useful in general to be able

[781:34]

to associate one piece of data with

[781:35]

another. Ergo, hash tables. So here's how you

[781:38]

might implement in C a hash table. You

[781:41]

want it to be of size 26 for instance.

[781:42]

So 26 buckets from A to Z, hence the 26.

[781:46]

You want this to be an array and that's

[781:48]

fine. This is an array of four buckets.

[781:49]

I'm going to use an array of 26 buckets

[781:52]

because a hash table, too, is going to be an

[781:54]

evolution of our linked list mashed

[781:56]

together with an array. So a hash table in

[781:59]

short is going to be an array with

[782:01]

linked lists as we'll soon see. Here's

[782:03]

the array. 26 pointers to nodes. So I'm

[782:07]

going to give myself an array of

[782:09]

pointers that is going to store

[782:11]

ultimately a whole bunch of person

[782:13]

objects like this. So for instance,

[782:15]

here's a char star name, char star

[782:17]

number, as we've discussed in the past,

[782:19]

representing a person. These are the

[782:20]

pieces of data I might want to store in

[782:22]

this data structure. However, let's

[782:25]

simplify. Let's not worry about the

[782:26]

phone number because we're not going to

[782:27]

call anyone today. But for a linked list

[782:29]

of persons, I'm going to need to store

[782:32]

let's say the person's name, but also a

[782:34]

pointer to the next such name, to the

[782:36]

next such name, to the next such name.

[782:38]

So again, I'm just deleting number as

[782:39]

being unnecessary detail. But if we're

[782:41]

going to have an array of linked lists,

[782:44]

this is our new definition of node for

[782:46]

this part of class whereby it's not for

[782:48]

a tree. It's now for a hash table. And

[782:51]

we'll see this in action now. Here is my

[782:53]

array of size 26. I drew it vertically,

[782:55]

but who cares? These have always been

[782:57]

artist renditions thereof. It just fits

[782:58]

nicely on the screen this way. This is

[783:00]

location zero. This is location 25. So

[783:02]

any A names should end up over here. any

[783:05]

uh Z name should end up down here and so

[783:07]

forth. Let's just generalize this away

[783:09]

as letters of the alphabet for clarity.

[783:11]

That's where all the names are going to

[783:12]

go. So hopefully Mario here, Luigi here,

[783:14]

and everyone else. So what are each of

[783:17]

these squares? They're just pointers to

[783:19]

nodes. Initially, all null, I'll claim.

[783:21]

But as soon as I insert Mario into this

[783:24]

so-called hash table, I'm not going to

[783:26]

put him literally here. I'm going to

[783:27]

create a new node in memory, put Mario

[783:29]

there, and then stitch it together.

[783:31]

Because if I get another M name, I'm

[783:33]

going to stitch it together and together

[783:35]

and together again. So for instance,

[783:37]

here comes Mario into this data

[783:39]

structure. So this is a pointer to a

[783:42]

person structure. Here's Luigi. And

[783:44]

here's a third character as well, Peach.
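
The array of 26 node pointers and the stitching-together of names can be sketched as below. This is a hedged illustration rather than the lecture's exact code: the names `BUCKETS`, `table`, `insert`, and `lookup` are assumptions, new nodes are prepended to each list for simplicity, and each node stores the caller's pointer instead of copying the string, so the strings are assumed to outlive the table. The hash function uses the `const` and `unsigned int` refinements mentioned earlier and still assumes names begin with an English letter.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define BUCKETS 26

// A node in one bucket's linked list: a name plus a pointer to the next node
typedef struct node
{
    const char *name;
    struct node *next;
} node;

// The hash table itself: 26 list heads; as a global, all start out NULL
node *table[BUCKETS];

// First-letter hash: A or a maps to 0, ..., Z or z maps to 25
unsigned int hash(const char *name)
{
    return (unsigned int) (toupper((unsigned char) name[0]) - 'A');
}

// Creates a node for name and stitches it onto the front of its bucket's list
bool insert(const char *name)
{
    node *n = malloc(sizeof(node));
    if (n == NULL)
    {
        return false;
    }
    n->name = name;
    unsigned int i = hash(name);
    n->next = table[i]; // the new node points at the old head (or NULL)
    table[i] = n;       // and becomes the new head of this bucket
    return true;
}

// Jumps straight to the right bucket, then walks only that bucket's list
bool lookup(const char *name)
{
    for (node *n = table[hash(name)]; n != NULL; n = n->next)
    {
        if (strcmp(n->name, name) == 0)
        {
            return true;
        }
    }
    return false;
}
```

Inserting Mario, Luigi, and Peach touches buckets 12, 11, and 15 respectively; a second name in the same bucket simply lengthens that bucket's list instead of overwriting anything.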

[783:46]

That's all working out great. Dot dot

[783:48]

dot. There's a whole bunch of characters

[783:50]

in the Nintendo universe. Here's a lot

[783:51]

of them. Unfortunately, especially if

[783:53]

you're a fan, there's also other names

[783:55]

that do start with M and L and other

[783:58]

letters of the alphabet. So, we're

[783:59]

poised to have what we're going to call

[784:01]

collisions, which is a downside of using

[784:03]

a hash function. If you're going from

[784:05]

something infinite to something finite,

[784:07]

by definition, you're going to have a

[784:08]

heck of a lot of potential collisions

[784:11]

somehow. Multiple M names, multiple L

[784:13]

names, and so forth. So, we've got to

[784:14]

mitigate this somehow. Well, if you meet

[784:16]

someone in the real world whose name

[784:18]

happens to start with M, and you already

[784:19]

are friends with Mario, well, you could

[784:20]

delete Mario from your phone and put

[784:22]

that new person there. But that's kind

[784:24]

of dumb. You could clobber the value,

[784:26]

that is. Or maybe you put the M friend

[784:28]

here. And when that fills up, you put

[784:30]

the M friend here. And then when you

[784:32]

meet someone else whose name starts with

[784:33]

M, you put it here. But then it just

[784:34]

devolves into this mess. At which point

[784:36]

now there's no rhyme or reason as to who

[784:38]

is where. It devolves back into

[784:39]

something linear. If you have to search

[784:40]

the whole darn thing looking for M

[784:42]

friends just because you ran out of

[784:44]

space where you want it. So here's the

[784:45]

beauty of mashing together an array with

[784:47]

a linked list. You hash the name to the

[784:51]

intended location like box 12 here. And

[784:54]

then you just start stringing them

[784:55]

together in a linked list. And hopefully

[784:57]

you don't have too many of those

[784:58]

collisions, but at least now you don't

[784:59]

have to delete or make a mess of the

[785:02]

data structure. So here's another bunch

[785:04]

of names, three starting with L. Here's

[785:06]

a bunch for the other letters of the

[785:07]

alphabet. And it's just a linked it's an

[785:10]

array now of linked lists. This then is

[785:13]

a hash table. So the question to

[785:15]

consider now is this better than an

[785:19]

array? Is this better than a linked

[785:22]

list? Well, I dare say it's better than

[785:24]

a linked list because if it were a

[785:25]

linked list from A to Z, what would be

[785:28]

the running time of searching for

[785:29]

anyone? Well, I'll spoil it. Big O of N.

[785:31]

Because even if it's alphabetically

[785:32]

sorted, you got to start at the

[785:34]

beginning and go all the way through the

[785:35]

list potentially to find someone like

[785:37]

Zelda whose name starts with, of course,

[785:39]

Z. But here we have an array of linked

[785:43]

lists. So what's really the running time

[785:45]

here? It's not quite as bad as n steps

[785:49]

because if you assume a uniform

[785:50]

distribution of names such that the

[785:52]

world of Nintendo maybe has as many M

[785:54]

names as L names as A names as B names,

[785:57]

you could assume that there's a bunch of

[785:58]

chains, a bunch of linked lists here

[786:00]

chained together, but they're all

[786:02]

roughly the same. So maybe you have n

[786:04]

names in your phone book this way, but

[786:08]

there these lists are only of size uh

[786:10]

they're only 1/26 of that length because

[786:13]

you've got that many names there. So

[786:15]

what's the running time? Well, ideally

[786:17]

we'd move away from linked lists with big

[786:19]

O of N and achieve our constant time.

[786:22]

But uh we have these collisions to worry

[786:25]

about here. Just to be clear, we want to

[786:29]

get from big O of N to something

[786:31]

constant time, but we're not going to

[786:33]

get to constant time if we've got

[786:34]

collisions. If we've got three L names

[786:36]

and a few B names and a few A names, we

[786:38]

can't just jump to that location and

[786:40]

find the person we're looking for. So,

[786:42]

what's the fundamental goal? Well, I

[786:44]

think we want to maybe use a smarter

[786:45]

hash function. And here depicted is an

[786:47]

excerpt from a bigger hash table that is

[786:50]

a much bigger array that assumes that

[786:52]

you're not looking at the first letter

[786:54]

of everyone's name, but apparently what

[786:55]

instead the first three letters of the

[786:59]

person's name, which just decreases the

[787:02]

probability of collisions because in

[787:04]

this model, I dare say there's no one

[787:06]

else's name in the Nintendo universe

[787:07]

that starts with L I N. So now Link has

[787:10]

its own location in memory. And

[787:13]

similarly for Luigi, LUI I believe is

[787:16]

unique in the Nintendo universe. So we

[787:18]

don't have a collision. Unfortunately,

[787:20]

while this does seem to eliminate

[787:22]

collisions based on this tiny example,

[787:25]

what's the trade-off

[787:27]

or what's the catch? Yeah,

[787:29]

>> use a lot more memory.

[787:30]

>> This is a lot more memory. I mean, kind

[787:32]

of hinted at the fact that I didn't even

[787:34]

fit most of it on the screen anymore.

[787:35]

Here's L A. Here's L U. But what about

[787:39]

all of the other letters of the alphabet

[787:40]

and the other combinations of dot dot

[787:42]

dot dot dot dot all possibilities.
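
To see how much bigger the array gets, here is a minimal sketch of a hash function over the first three letters. The name `hash3` and the base-26 arithmetic are assumptions for illustration, since the lecture shows only a picture of the larger table; the sketch assumes every name has at least three English letters.

```c
#include <ctype.h>

#define LETTERS 26
#define BUCKETS (LETTERS * LETTERS * LETTERS) // 26 * 26 * 26 = 17,576 buckets

// Treats the first three letters as a three-digit number in base 26,
// so "AAA" maps to 0, "AAB" to 1, and "ZZZ" to 17,575.
// Assumes name has at least three characters, all English letters.
unsigned int hash3(const char *name)
{
    unsigned int index = 0;
    for (int i = 0; i < 3; i++)
    {
        unsigned int letter = (unsigned int) (toupper((unsigned char) name[i]) - 'A');
        index = index * LETTERS + letter;
    }
    return index;
}
```

Under this scheme Link and Luigi land in different buckets, but the table needs 17,576 slots instead of 26, most of which (AAA, AAB, and so on) would likely never be used.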

[787:44]

Moreover, some of these just don't make

[787:46]

much sense. At least in English or in

[787:47]

the Nintendo world, I don't think

[787:48]

there's anyone whose name is going to

[787:49]

start with AAA or AAB or AAC or

[787:53]

AAD and so forth. We're wasting a

[787:55]

huge amount of space to reduce the

[787:58]

probability of collision. So that's

[788:00]

fine. We might get constant time now,

[788:02]

but at what cost? Well, a heck of a lot

[788:04]

more memory. And so this is one of the

[788:06]

tensions when using a hash table is you

[788:08]

want to come up with a good hash

[788:10]

function that's maybe a little more

[788:12]

sophisticated than the first letter but

[788:14]

not so wasteful that you need a crazy

[788:16]

number of buckets and therefore a huge

[788:18]

amount more memory. So really even with

[788:22]

collisions it's not quite as bad as n

[788:25]

steps cuz technically if you have k

[788:26]

buckets where k is like 26 buckets or

[788:29]

four in this case technically if you do

[788:31]

assume that the names are uniformly

[788:32]

distributed over a through z the English

[788:34]

alphabet. Well, each of those linked lists

[788:36]

is going to be hopefully no bigger than

[788:37]

n / k. So n / 26. But what do we know

[788:41]

about higher order terms when doing big

[788:43]

O notation? Big O of N / K. Yes, it's

[788:47]

faster, but asymptotically, that is,

[788:49]

theoretically you're still talking about

[788:51]

big O of N. So here's the tension though

[788:54]

like it's absolutely going to be faster.
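
The reasoning can be written out as a short worked equation, assuming the names are spread uniformly and that the number of buckets k is a fixed constant (26 here) that does not grow with n:

```latex
\[
\text{chain length} \approx \frac{n}{k} = \frac{1}{k}\cdot n
\quad\Longrightarrow\quad
O\!\left(\frac{n}{k}\right) = O(n)
\quad \text{for constant } k \ (\text{e.g., } k = 26).
\]
```

Dividing by a constant changes the real-world speed but not the shape of the growth, which is all that big O describes.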

[788:55]

It will be like 26 times faster than a

[788:57]

linked list but it's still just big O of

[789:00]

N because it's going to take an amount

[789:02]

of time that's still linear in the size

[789:05]

of the data set. So we seem to have

[789:07]

strayed yet again away from our constant

[789:10]

time search. So can we find this holy

[789:13]

grail? Well, we kind of can if you let

[789:15]

me spend just like a lot more space.

[789:18]

There are tries in the world, which

[789:19]

weirdly is short for retrieval,

[789:21]

even though we don't say retrival, but a

[789:23]

trie is a tree made out of arrays, right?

[789:27]

So, at some point, computer scientists

[789:28]

were just like mashing things together

[789:30]

Frankenstein style, like linked

[789:31]

lists and arrays, and now we've got uh

[789:34]

trees and arrays. You, too, can mash

[789:37]

something together and come up with your

[789:38]

own. Let's look at what a trie actually

[789:40]

is because it is going to get us that

[789:42]

constant time grail. So here is the root

[789:45]

of a trie. You can think of each node in

[789:47]

a trie as really being an array of values

[789:51]

a through z in the case of an English

[789:53]

problem like we've been playing with

[789:54]

here. And what you do is you treat this

[789:57]

array as being indexed from 0 through 25

[790:00]

or equivalently a through z. And you

[790:02]

treat each of those elements as a

[790:04]

pointer to another such node in the trie.
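
A node of this kind, together with insertion and lookup, can be sketched as follows. This is one common layout and not necessarily the one drawn in the lecture: here the end-of-name marker (`is_name`, a hypothetical field) lives on the node reached after a name's final letter, and the function names are likewise assumptions. Names are assumed to contain only English letters.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>

#define LETTERS 26

// Each node is an array of 26 pointers (A through Z) plus a flag saying
// whether some stored name ends exactly at this node
typedef struct node
{
    bool is_name;
    struct node *children[LETTERS];
} node;

// Allocates a node with no children and no name ending at it
node *create_node(void)
{
    node *n = malloc(sizeof(node));
    if (n == NULL)
    {
        return NULL;
    }
    n->is_name = false;
    for (int i = 0; i < LETTERS; i++)
    {
        n->children[i] = NULL;
    }
    return n;
}

// Stores name implicitly, as a path of non-NULL pointers, one per letter
bool insert(node *root, const char *name)
{
    node *cursor = root;
    for (int i = 0; name[i] != '\0'; i++)
    {
        int index = toupper((unsigned char) name[i]) - 'A';
        if (cursor->children[index] == NULL)
        {
            cursor->children[index] = create_node();
            if (cursor->children[index] == NULL)
            {
                return false;
            }
        }
        cursor = cursor->children[index];
    }
    cursor->is_name = true; // mark that a name stops here
    return true;
}

// Follows one pointer per letter; the work depends on the name's length,
// not on how many names are stored
bool search(const node *root, const char *name)
{
    const node *cursor = root;
    for (int i = 0; name[i] != '\0'; i++)
    {
        int index = toupper((unsigned char) name[i]) - 'A';
        if (cursor->children[index] == NULL)
        {
            return false;
        }
        cursor = cursor->children[index];
    }
    return cursor->is_name;
}
```

Because the letters themselves are never stored, a name exists in the structure only as a chain of non-NULL pointers ending at a marked node, which also lets one name be a prefix of another.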

[790:07]

And what you do is implicitly store the

[790:10]

names that you're storing in this data

[790:12]

structure by going to an appropriate

[790:15]

location based on the first letter in

[790:17]

their name and then adding a pointer

[790:19]

that represents the second letter in

[790:20]

their name. Adding a pointer that

[790:22]

represents the third letter of their

[790:23]

name and so forth. So what do I mean by

[790:24]

this? Suppose we want to insert Toad,

[790:27]

one of the characters from the Nintendo

[790:28]

universe first. If we count up where T

[790:30]

is in the alphabet, this uh pointer here

[790:34]

will be changed from null to a pointer

[790:36]

to a new node that represents the second

[790:39]

letter in Toad's name, which is going to

[790:40]

be, of course, O. Then to insert T-O-A,

[790:44]

we're going to need another node. A is

[790:46]

going to lead me to D. And for uh

[790:48]

depiction sake, I'm going to draw in

[790:50]

green, even though this would actually

[790:51]

be a boolean or something like that in

[790:52]

memory that indicates that Toad's name

[790:55]

stops here. So in other words, this trie

[790:58]

in memory has four nodes. Now each of

[791:00]

those nodes is essentially an array of

[791:02]

size 26. But the word toad is not

[791:05]

actually stored in the data structure

[791:07]

explicitly. There's no char * toad, but

[791:10]

implicitly because the T pointer is

[791:13]

non-null, the O pointer is non-null, the

[791:15]

A pointer is non-null, and the D

[791:17]

pointer is in fact null at this point is

[791:20]

the common technique here. This allows

[791:22]

me to to insert other names from

[791:24]

Nintendo's universe like Toadette

[791:26]

because I can continue from here to go

[791:28]

to the E node to the T node uh to the

[791:31]

T node again and an E node which I'll

[791:33]

again mark in green. So you can even

[791:35]

have names that are substrings or

[791:37]

equivalently superstrings of each other

[791:39]

by just having all of these various

[791:41]

breadcrumbs along the way where again a

[791:44]

non-null pointer here to a non-null to a

[791:46]

non-null to a null pointer here

[791:47]

indicates that or it can't be null at

[791:50]

this point. This is where we have to use

[791:51]

a boolean indicates that there is a name

[791:53]

in this data structure that ends here

[791:55]

and there's another name that ends here.

[791:56]

Meanwhile, if there's a third name from

[791:58]

the universe like Tom, same idea, but

[792:00]

eventually we can start reusing some of

[792:01]

these arrays whereby non-null non-null

[792:05]

null or there's a boolean flag here that

[792:06]

says true, a name ends here. Now we're

[792:09]

reusing that same array. So each of the

[792:11]

nodes represents the i-th letter of the

[792:15]

word or the name you're trying to store

[792:17]

in the data structure. And by playing

[792:19]

around with null and non-null and some

[792:20]

booleans, you can implicitly store names

[792:23]

in this structure. Now, it's way too

[792:27]

uh pictorially difficult to depict lots

[792:29]

and lots of names in this form. So, just

[792:31]

imagine in your mind's eye that there's

[792:32]

dozens, hundreds, thousands of names now

[792:34]

in this data structure, but just more

[792:36]

arrows and more arrays. How do you

[792:39]

actually look someone up in this data

[792:41]

structure? Well, if you want to ask a

[792:43]

question like is Toad in this data

[792:45]

structure or is Toadette in this data

[792:47]

structure or anyone else, you can simply

[792:50]

start at the root node as we would do

[792:51]

for any tree and you hash on the first

[792:54]

letter of toad's name which gives you

[792:56]

this location and you check is it null?

[792:58]

If not, T is implicitly there. So, you

[793:01]

follow that pointer here and then you

[793:03]

hash the second letter of Toad's name,

[793:05]

an O, and check this pointer. And you

[793:07]

follow that arrow. Then you check the

[793:09]

third you hash on the third letter of

[793:11]

Toad's name A and you follow that arrow.

[793:13]

Then the fourth letter of Toad's name D

[793:15]

and you see ah there's a boolean here

[793:17]

represented in green that means Toad is

[793:20]

in this data structure. And notice

[793:21]

what's subtle here. It doesn't matter if

[793:25]

there's three names in this trie or three

[793:27]

million names in this trie. How many

[793:29]

steps did it take me to confirm or deny

[793:31]

that Toad is in this trie? One, two,

[793:34]

three, four, which is arguably constant.

[793:36]

Even though the names can vary, at some

[793:38]

point there's no Nintendo name longer

[793:40]

than what, like 10 characters, 20

[793:42]

characters, maybe 30. I mean, there's

[793:43]

some reasonable bound that is finite

[793:45]

where there's never going to be a name

[793:47]

longer than that because Nintendo's

[793:48]

never going to come up with a crazy long

[793:49]

name for a game. And so, you effectively

[793:52]

have constant time for looking up Toad,

[793:54]

Toadette, Tom, Mario, Luigi, Peach,

[793:57]

any of the other names we've looked at.
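
The insert and lookup walk described above can be sketched in a few lines. This is a minimal sketch, not the course's implementation: the names `Node`, `insert`, `contains`, and `index_of` are made up here, only the letters a through z are handled, and the "a name ends here" flag lives on the node reached after the last letter, so Toad uses the root plus four more nodes, slightly different from the drawing.

```python
ALPHABET_SIZE = 26


class Node:
    def __init__(self):
        # One slot per letter; None plays the role of a null pointer.
        self.children = [None] * ALPHABET_SIZE
        # The "green" marker: True if some name ends at this node.
        self.is_name = False


def index_of(letter):
    """Map a through z (either case) to 0 through 25."""
    return ord(letter.lower()) - ord("a")


def insert(root, name):
    node = root
    for letter in name:
        i = index_of(letter)
        if node.children[i] is None:
            node.children[i] = Node()  # allocate a node only as needed
        node = node.children[i]
    node.is_name = True


def contains(root, name):
    node = root
    for letter in name:
        node = node.children[index_of(letter)]
        if node is None:
            return False  # the path stops before the name does
    return node.is_name


root = Node()
for name in ("Toad", "Toadette", "Tom"):
    insert(root, name)

print(contains(root, "Toad"))  # True
print(contains(root, "Toa"))   # False: the path exists, but no name ends there
```

The loop in `contains` runs once per letter of the name being looked up, regardless of how many other names were inserted.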

[793:59]

So this is to say a trie allows you to

[794:02]

ask questions like is Toad in this data

[794:05]

set or equivalently what is Toad's phone

[794:07]

number in this data set because if you

[794:09]

assume now that each of these pointers

[794:11]

ultimately is not just a bool saying yes

[794:12]

or no but maybe it's an actual person

[794:14]

structure with a name and a number you

[794:17]

can store even uh data like that your

[794:20]

key value pairs where your names are

[794:21]

your keys and your phone numbers are

[794:23]

your values to make this more clear then

[794:25]

here is a data structure how we might

[794:27]

represent in C each of these nodes.

[794:30]

It's not quite technically just an

[794:32]

array. It's an array of size 26. We'll

[794:34]

call it children because it represents

[794:35]

the children of that node of type struct

[794:38]

node star. And then here for instance

[794:39]

for simplicity is that person's number.
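
The node just described, an array of 26 children plus a number, can be mirrored in Python to show the key-value idea. This is a minimal sketch under assumptions: the lecture's version is a C struct, whereas the class and the function names `store` and `lookup` here are hypothetical, and the phone number is a placeholder.

```python
class Node:
    def __init__(self):
        self.children = [None] * 26  # one slot per letter, a through z
        self.number = None           # set only on a node where a name ends


def store(root, name, number):
    """Walk (and build) the path for name, then record the number at its end."""
    node = root
    for letter in name.lower():
        i = ord(letter) - ord("a")
        if node.children[i] is None:
            node.children[i] = Node()
        node = node.children[i]
    node.number = number


def lookup(root, name):
    """Return the stored number, or None if the name is not present."""
    node = root
    for letter in name.lower():
        node = node.children[ord(letter) - ord("a")]
        if node is None:
            return None
    return node.number


phonebook = Node()
store(phonebook, "Toad", "+1-617-555-0100")  # placeholder number
print(lookup(phonebook, "Toad"))
```

Here a missing number doubles as the boolean: a node whose `number` is still `None` means no name ends there.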

[794:41]

If we reintroduce numbers and want to

[794:43]

store in this data structure someone's

[794:45]

phone number as well. So using that data

[794:48]

structure and that kind of uh code you

[794:52]

can implement a trie using something as

[794:54]

simple as this. Initially your trie is

[794:56]

just a pointer to a node. one such uh

[794:59]

struct. We can of course initialize it to

[795:01]

null to make clear that there's no names

[795:02]

in here. But each time we allocate a

[795:04]

node, we can then add another node,

[795:06]

another node, hashing on the first, the

[795:09]

second, the third, the fourth, dot dot

[795:10]

dot, the last character in the person's

[795:13]

name, allocating a node as needed,

[795:14]

flipping that boolean to true or false,

[795:16]

or adding their phone number as a char

[795:18]

star to indicate that we have then found

[795:20]

them. And so of all the data structures

[795:21]

we've looked at today, big O of one is

[795:25]

actually achieved with tries. And yet

[795:28]

curiously for problem set five, you're

[795:30]

not going to implement tries, you're

[795:31]

going to implement hash tables, that sort of

[795:33]

Swiss Army knife of data structures that

[795:34]

like every programmer everywhere knows

[795:36]

about. Why? Like why not use tries very

[795:41]

often in practice? Perhaps

[795:44]

you certainly can, but what's the

[795:47]

trade-off perhaps? Yeah,

[795:50]

>> take up too much memory.

[795:51]

>> It's a huge amount of memory. Things

[795:53]

have escalated since the start of class.

[795:55]

We started with one int. Then we

[795:56]

added an int and a pointer and int and

[795:58]

two pointers. Now I'm proposing 26

[796:00]

pointers plus a boolean or a data

[796:01]

structure called person. I mean it's

[796:03]

escalating significantly. And the

[796:04]

biggest catch with a trie as you might

[796:06]

have imagined with Toad and Toadette and Tom

[796:08]

on the screen there's a huge amount of

[796:09]

wasted memory just as we saw with a hash

[796:12]

function potentially but that can be

[796:13]

reined in as you'll explore in the

[796:15]

problem set. With a trie, most of the

[796:17]

pointers in those arrays are just null

[796:19]

and unused and it just tends to result

[796:21]

in you're using way more memory to solve

[796:23]

the problem correctly but in a way that

[796:25]

tends to slow the computer down and just

[796:27]

waste more memory than is useful. That

[796:29]

said, just as we started today, there

[796:31]

are stacks in the real world. There's

[796:32]

queues in the real world. There are even

[796:36]

hash tables in the real world which you'll

[796:38]

indeed implement in code for problem set

[796:40]

five. Has anyone here ever had a salad

[796:42]

from a restaurant called Sweetgreen in

[796:43]

Harvard Square? also elsewhere in the US

[796:46]

like not one, two, like two of us, three

[796:48]

of us. Okay, so not hard to imagine

[796:51]

going to such a store, getting in a

[796:52]

queue and staring at a shelf like this

[796:54]

because what Sweetgreen and similar

[796:56]

restaurants do when you order for pickup

[796:58]

is they hash your salad into a shelf

[797:01]

like this. And so literally in Sweetgreen

[797:03]

might you see some wooden shelves

[797:05]

like this. This is the A through E

[797:06]

bucket, the F through J bucket, the K

[797:08]

through N bucket and the uh O through Z

[797:10]

bucket whereby if your name like Malan

[797:12]

happens to be in one of those ranges,

[797:13]

they will hash my salad and put it here.
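
The shelf system amounts to a hash function with four buckets. A minimal sketch, assuming names start with a letter A through Z; the function name `shelf_for` and the table `SHELVES` are made up for illustration.

```python
# The four shelf ranges as described: A-E, F-J, K-N, O-Z.
SHELVES = [("A", "E"), ("F", "J"), ("K", "N"), ("O", "Z")]


def shelf_for(name):
    """Hash a name to a shelf index (0 through 3) by its first letter."""
    first = name[0].upper()
    for index, (low, high) in enumerate(SHELVES):
        if low <= first <= high:
            return index
    raise ValueError(f"no shelf for {name!r}")


for name in ("Malan", "Toad", "Alice"):
    low, high = SHELVES[shelf_for(name)]
    print(f"{name}: shelf {low} through {high}")
```

Fewer, wider buckets make collisions more common but overflow of any single shelf less likely, which is the trade-off the restaurant settled on.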

[797:16]

But of course, even in the real world,

[797:17]

there are some constraints. And what can

[797:19]

go wrong with this here hash table system?

[797:23]

Someone who's been there maybe what can

[797:26]

go wrong? Imagine like the extreme lots

[797:27]

of values here. Yeah. So there's no more

[797:30]

space, right? So and this has happened

[797:31]

to me in the past, especially at Sweetgreen

[797:33]

before adopting this system. And they

[797:34]

used to put the A's here, the B's here,

[797:35]

the C's here, the D's here and so forth.

[797:37]

And then someone at some point realized

[797:38]

that they were very frequently

[797:40]

overflowing the A's to the B's and the

[797:42]

B's to the C's. And no one was using Q

[797:44]

or Z with any frequency. And so they

[797:46]

were sort of wasting space and running

[797:47]

out of space. So at some point they

[797:48]

decided to like literally remove most of

[797:50]

the letters of the alphabet, make the

[797:52]

buckets bigger and fewer. So now it's

[797:55]

very unlikely that you're going to have

[797:56]

so many K's through N's that you

[797:59]

overflow the shelf. But this is in the

[798:01]

real world a data structure like we've

[798:03]

seen today. And so therefore among the

[798:05]

goals, even as arcane as things seem to

[798:06]

be getting with all the pointer notation

[798:08]

and dereferencing this and that, really

[798:09]

all we're doing in code is implementing

[798:11]

real-world solutions that other people

[798:14]

have already come up with and

[798:15]

translating them to a new domain. And

[798:17]

the very last thing you'll do in C this

[798:18]

week is indeed implement your very own

[798:20]

spell checker whereby we'll give you a

[798:22]

very large file of 100,000 plus English

[798:24]

words. you'll have to come up with a

[798:26]

clever and efficient way to load it up

[798:28]

into memory. And we'll give you tools

[798:29]

that will actually measure how fast or

[798:31]

how slow your code is, how much memory

[798:33]

or how little memory your code is so as

[798:35]

to actually compare it against not just

[798:37]

your own but perhaps others as well. So

[798:39]

with that said, we'll end a bit early

[798:40]

today. We'll see you next time.

[799:59]

All right, this is CS50 and this is

[800:02]

already week six wherein we transition

[800:04]

away from C to a programming language

[800:06]

called Python. And that's not to say

[800:08]

that the past several weeks haven't been

[800:09]

among the goals of the course. Indeed,

[800:11]

in learning C, I very much think that

[800:13]

you'll have at the end of this class so

[800:15]

much more of a bottom-up understanding

[800:17]

of how computers work, of how

[800:19]

programming languages work. And in

[800:20]

particular, you'll appreciate and

[800:22]

understand better how Python and Java

[800:24]

and C++ and Swift and so many other

[800:26]

languages are actually doing their thing

[800:28]

nowadays. But recall that we started

[800:30]

with Scratch some weeks ago. When in

[800:32]

Scratch, what was nice was that the

[800:34]

first program we wrote, hello world, was

[800:36]

just all too accessible. All you had to

[800:38]

do was interlock two puzzle pieces in

[800:40]

order to make the cat in that case say

[800:42]

hello world. Well, thereafter, of

[800:44]

course, we transitioned to C. And recall

[800:46]

that in week one, we asked you to take

[800:48]

on faith that you can sort of ignore

[800:49]

that first line and a lot of these

[800:50]

parentheses and the curly braces and

[800:52]

really just focus on the essence of the

[800:53]

program, which clearly is still about

[800:55]

hello world and printing it, albeit

[800:58]

using a different function and a bit new

[801:00]

syntax. Today, very excitingly, all of

[801:02]

that is truly going to go away and be

[801:04]

distilled into a single line of code

[801:06]

when you indeed want to have the

[801:07]

computer say something like hello world.

[801:10]

And this is what we mean by Python being

[801:11]

a higher level language. So, humans over

[801:13]

the decades learned uh from earlier

[801:15]

designs, earlier programming languages,

[801:17]

what worked well, what did not.

[801:19]

Computers got faster, computers had more

[801:21]

memory, and so you were able to start

[801:22]

spending more of those resources in

[801:24]

order to have the computer do more for

[801:26]

you. And so, you don't need to be as

[801:27]

pedantic syntactically anymore. you

[801:30]

don't need to write as much code anymore

[801:32]

and frankly you can just start solving

[801:34]

problems of interest to you building

[801:36]

products of interest to you so much more

[801:38]

readily by choosing the right tool for

[801:40]

the job and so in the real world if you

[801:41]

continue coding after CS50 like

[801:43]

sometimes C will be the right tool for

[801:45]

the job, sometimes Python will be the

[801:47]

right tool for the job and sometimes

[801:49]

it's going to be a different language

[801:50]

altogether that you'll never have

[801:51]

studied in school and in fact what's

[801:53]

compelling I think about this week six

[801:56]

much like when I took the class back in

[801:57]

the day is that after CS50, you'll

[802:00]

have a taste of one, two, maybe a few

[802:02]

different programming languages. And

[802:03]

that's going to be enough to bootstrap

[802:05]

yourself and teach yourself new

[802:07]

languages because you're going to start

[802:08]

to recognize in the real world

[802:10]

similarities with past languages that

[802:12]

you've seen, programming paradigms that

[802:13]

are still sort of with us. And the

[802:15]

syntax, yeah, that's invariably going to

[802:17]

change, but that's the stuff that you

[802:18]

are going to Google or ask ChatGPT or

[802:20]

some other AI about down the line. So

[802:22]

long as you know enough of it to sort of

[802:24]

get real work done, you'll focus mostly

[802:26]

ultimately on the ideas and the problems

[802:28]

you want to solve and less on the

[802:29]

syntax. And so among the goals for this

[802:31]

week and this week's problem set and

[802:33]

really the rest of the course is to get

[802:34]

you more comfortable feeling

[802:36]

uncomfortable in front of your keyboard

[802:38]

because we're not going to give you and

[802:39]

tell you everything you need to know for

[802:41]

a language like Python. You're going to

[802:42]

turn to the documentation. You're going

[802:43]

to turn to the duck and you're going to

[802:44]

learn to teach yourself ultimately a new

[802:47]

language. So let's actually write our

[802:50]

first program and compare and contrast

[802:52]

with how we might do that in C. So

[802:55]

recall that in C we were in the habit

[802:56]

for the first couple of weeks and doing

[802:58]

make hello and make this build utility

[803:00]

just kind of magically knew to look for a

[803:01]

file called hello.c and magically to

[803:04]

create a program called hello and then

[803:06]

you could run it with ./hello and then

[803:08]

a week or so later we revealed that make

[803:10]

is really just automating compilation of

[803:13]

your program with the actual compiler

[803:15]

clang in this case and passing it

[803:16]

command line arguments like -o to get a

[803:19]

specific output like the file name hello

[803:21]

instead of the default which recall was

[803:23]

a.out, passing in the name of the

[803:25]

file you want to compile and turning on

[803:27]

any libraries that you might want to

[803:29]

compile into your program link into your

[803:31]

program beyond the standard ones but

[803:33]

then you could still run it in exactly

[803:35]

the same way starting today when you

[803:37]

write Python code and then want to run

[803:39]

it you're simply going to run the Python

[803:42]

program itself so just as clang is a C

[803:45]

compiler uh Python is itself not only a

[803:48]

programming language but a program as

[803:51]

well and with the Python program which

[803:53]

understands the Python programming

[803:55]

language, can you run code that you'll

[803:57]

have written in a file called hello.py.

[803:59]

And what this program is doing is a

[804:01]

little bit different from what clang is

[804:02]

doing, but we'll see that difference

[804:04]

before long. But first, let me go over

[804:05]

to VS Code and let's write our simplest

[804:08]

our first of Python programs by doing

[804:10]

code hello.py. And then in this file

[804:14]

without any includes, any int main

[804:16]

voids, I'm simply going to say print

[804:18]

quote unquote hello, world close quote.
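
Written out, the file being dictated here is a single line. A minimal sketch of hello.py as described, with the comment added for illustration:

```python
# hello.py: the whole program, with no #include, no main, and no curly braces.
print("hello, world")
```

Running `python hello.py` executes it directly, with no separate compile step for you to run.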

[804:21]

All right. Now I'm not going to do make.

[804:23]

I'm instead just going to do Python of

[804:25]

hello.py. Cross my fingers as always and

[804:27]

voila, my first program in Python. So

[804:30]

it's sort of obvious that we got rid of

[804:32]

the uh hash include. We got rid of the

[804:34]

int main void. No curly braces. Only a

[804:36]

couple of parentheses here. But what

[804:37]

else is different to your eyes that's a

[804:39]

little more subtle here versus C. Yeah.

[804:44]

>> Yeah. So there's no F. So the print

[804:46]

function is a little more human

[804:47]

friendly. It's print instead of printf

[804:49]

where the f did mean formatted, but

[804:50]

we'll see that we still have that

[804:51]

functionality.

[804:52]

>> No need for the line break.

[804:54]

>> So no need for the line break,

[804:55]

specifically the backslash n. And yet

[804:57]

here's my cursor on the next line. So I

[804:59]

dare say humans over the years realized

[805:01]

we are more commonly wanting a new line

[805:04]

than we don't want it. And so they made

[805:06]

the default actually give it to you

[805:07]

automatically. And there's one more

[805:09]

detail. Yeah.

[805:11]

>> No semicolon.

[805:12]

>> So there's no semicolon. So, I finished

[805:14]

my thought at the end of the line, but I

[805:15]

didn't need to explicitly terminate it

[805:17]

with a semicolon. This is just with one

[805:19]

program, all of these salient

[805:21]

differences, but I'd argue that we got

[805:22]

rid of all of the annoying stuff thus

[805:24]

far anyway. So, we can really focus on

[805:26]

what this program itself is doing. But

[805:29]

what's exciting with Python, too, is just

[805:31]

how quickly you can solve certain

[805:33]

problems. And this isn't true of just

[805:34]

Python. It's really any higher level

[805:36]

language than C. In fact, just for fun,

[805:40]

let me go ahead and implement Problem

[805:44]

set five wherein you're challenged with

[805:46]

implementing the fastest spell checker

[805:47]

possible. So let me go back here to VS

[805:50]

Code. Let's close out hello.py and clear

[805:52]

my terminal window. And let me go ahead

[805:53]

and do this. Let me first split my

[805:55]

terminal by clicking this rectangular

[805:57]

icon over here. And that's going to give

[805:58]

me two terminal windows now left and

[806:00]

right. Because in the first one at left,

[806:02]

I'm going to CD into a directory I came

[806:04]

with today, which is the staff's

[806:06]

solution to problem set 5's spell

[806:08]

checker in C. And on the right hand side

[806:10]

here, I'm going to CD into another

[806:12]

directory I brought with me today called

[806:13]

Python. Inside of which is a translation

[806:16]

of problem set 5 into Python. In

[806:18]

particular, I've implemented in advance

[806:20]

a speller.py file, which is the analog in

[806:22]

Python of speller.c in C. And I've also

[806:25]

prepared a dictionary.py file.

[806:28]

Unfortunately, if we open up

[806:29]

dictionary.py,

[806:31]

you'll see that it's not actually

[806:32]

implemented yet. So in dictionary.py,

[806:35]

let's implement in Python problem set

[806:37]

five and see how long it takes. Well,

[806:39]

the first thing I'm going to do is

[806:40]

declare a global variable. We'll call it

[806:42]

words. And set that equal to the return

[806:43]

value of a Python function called set,

[806:45]

which essentially gives me a set object,

[806:47]

wherein I can store a whole bunch of

[806:49]

words without duplicates. Python's going

[806:51]

to manage all of that for me. In effect,

[806:53]

it's going to implement what I needed to

[806:55]

implement myself in problem set 5, a

[806:58]

hash table. Now, down here, I'm going to

[807:00]

go ahead and define a function called

[807:01]

check. Pass in as input a parameter

[807:04]

called word because, of course, that's

[807:06]

how it was implemented in C. But notice

[807:08]

a difference already. In Python, we use

[807:10]

a new keyword called def to define a

[807:12]

function. And we don't have to specify

[807:14]

the type of the variable being passed in

[807:16]

word in this case. And we also don't

[807:18]

have to specify a return type for the

[807:20]

function. Now, inside of this check

[807:22]

function, it suffices to do this. I'm

[807:24]

going to return word.lower()

[807:26]

in words, which is effectively a boolean

[807:29]

expression asking, is the lowercase

[807:31]

version of this word in the set? If so,

[807:33]

return true. Otherwise, return false.

[807:36]

done with the check function. Now let's

[807:38]

go ahead and define another function

[807:40]

called load which recall took an

[807:42]

argument of the dictionary that you want

[807:43]

to load into memory. And let's go ahead

[807:45]

now and do this with open dictionary as

[807:48]

file which effectively opens the

[807:50]

dictionary; as in C we used fopen, in Python

[807:53]

we use open and it gives it a variable

[807:56]

name of file. Then once that file is

[807:58]

open, I'm going to go ahead and update

[808:00]

that entire set of words which starts

[808:02]

out empty by taking the file, reading

[808:06]

the entire contents top to bottom, left

[808:08]

to right, and splitting all of the lines

[808:10]

therein on the new lines that terminate

[808:12]

each of the strings, effectively

[808:14]

updating the set with every word in that

[808:16]

dictionary. Then I'm going to

[808:18]

assume that it all just worked because

[808:20]

there's a lot less effort for me to uh

[808:22]

to perform myself in Python. And I'm

[808:24]

just going to go ahead and return true

[808:26]

capital T in Python. Done. Next, let's

[808:29]

go ahead and define that other function

[808:31]

from problem set 5 size whose purpose in

[808:33]

life was to tell me the size of the

[808:34]

dictionary I had loaded. Well, in

[808:36]

Python, that's pretty easy. I can just

[808:37]

return the length or len for short of

[808:40]

the set in which I've stored all those

[808:42]

words. Done. And then lastly, I'm going

[808:44]

to go ahead and define an unload

[808:46]

function, which recall was responsible

[808:48]

for freeing any memory I myself had

[808:50]

allocated. I don't seem to have done any

[808:52]

of that in Python. In fact, that's

[808:54]

managed for me now. So, I'm going to go

[808:55]

ahead and simply say return true because

[808:58]

there's no work to be done. And that's

[809:00]

it. In like 19 lines of code in Python,

[809:03]

most of which are blank lines, I claim I

[809:06]

have reimplemented problem set 5 in

[809:08]

Python. Well, let's take a look now at

[809:10]

the difference. I'm going to go ahead

[809:11]

and reopen my terminal window, and I'm

[809:13]

going to go ahead and maximize it so we

[809:14]

can see more output. And now I'm going

[809:16]

to go ahead and run Python, which is

[809:18]

going to be not only the name of the

[809:19]

language, but the name of the program we

[809:21]

use today to start running our Python

[809:22]

code. And I'm going to run it on

[809:24]

speller.py, which I brought with me

[809:25]

today, specifically on the largest of

[809:28]

problem set 5's files holmes.txt. Enter.
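
For reference, the dictionary.py dictated a moment ago would look roughly like this. It is a sketch reconstructed from the spoken description, so the exact calls are assumptions: in particular, `splitlines()` is one plausible way to split the file on its newlines, and the docstrings are added here.

```python
# Global set of words; Python's built-in set does the hash-table work for us.
words = set()


def check(word):
    """Return True if the lowercase version of word is in the dictionary."""
    return word.lower() in words


def load(dictionary):
    """Read the whole dictionary file and add every line to the set."""
    with open(dictionary) as file:
        words.update(file.read().splitlines())
    return True


def size():
    """Return the number of words loaded."""
    return len(words)


def unload():
    """Nothing to free: Python manages the memory."""
    return True
```

Neither `load` nor `unload` has a failure path here: any problem opening the file would surface as an exception rather than a `False` return value.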

[809:32]

And as with problem set 5 itself, we'll

[809:34]

see a whole bunch of misspelled words

[809:36]

being printed to the screen. Some of

[809:37]

which might very well be misspelled.

[809:38]

Some of which are just not in the

[809:40]

dictionary. Some of which are simply

[809:41]

possessives of words that are in the

[809:43]

dictionary. But at the very end of this

[809:45]

output, I should see not only how many

[809:47]

words were found, but the total time

[809:49]

involved, which appears to be 1.87

[809:52]

seconds. Not bad, seeing as it only took

[809:54]

me like what, a minute or two to write

[809:56]

the actual code. But there is going to

[809:58]

be a trade-off. We'll see. Even though

[810:00]

it took me much less human time and

[810:02]

arguably was a lot easier to implement

[810:04]

this spell checker in Python than I

[810:07]

dare say it was for most everyone in C.

[810:10]

Let's see what that trade-off might be.

[810:12]

over in my lefthand terminal window in

[810:15]

which I'm in the C directory which I

[810:16]

brought with me as the staff solution in

[810:19]

C to problem set 5. Let's go ahead and

[810:21]

make that spell checker. Then let's go

[810:24]

ahead and do ./speller

[810:26]

and run it on the same file uh holmes.txt

[810:30]

and see how long the C implementation

[810:33]

takes. Enter. And we see some of the

[810:35]

same output might be slower sometimes

[810:37]

just because of the cloud. there. Total

[810:39]

time spent in the CPU, not necessarily

[810:41]

printing everything to the screen, which

[810:43]

might take longer, is only 1.32 seconds

[810:46]

versus the 1.87 seconds in Python. Now,

[810:49]

while only half a second, that's a

[810:51]

decent percentage of the total amount of

[810:53]

time spent running the spell checker in

[810:55]

each of the windows. And so, that alone

[810:57]

seems to be one of the trade-offs. Even

[810:59]

though it seems to be much faster and

[811:01]

dare say easier to implement a problem

[811:03]

in Python, there's going to be

[811:05]

trade-offs in so far as the code might

[811:07]

very well run slower. And as we'll see

[811:09]

today, that's in large part because

[811:11]

whereas C is of course compiled. That's

[811:13]

why I ran make and in turn clang. And

[811:15]

then the zeros and ones, the so-called

[811:17]

machine code is what you're running. In

[811:18]

Python, generally the computer

[811:21]

is interpreting your code essentially

[811:23]

reading it top to bottom, left to right,

[811:25]

much like a human in between two other

[811:27]

humans might slowly translate one spoken

[811:30]

language to the other if those two

[811:32]

people don't in fact speak the same

[811:33]

language themselves. So there's a bit of

[811:35]

overhead when using Python, but I will

[811:38]

say that the Python community has been

[811:41]

working on this problem for some time.

[811:42]

And so in general, it's not necessarily

[811:44]

going to be as significant a trade-off

[811:46]

because there are certain tricks we can

[811:47]

do. And in fact, underneath the hood,

[811:49]

what the Python language can do for you

[811:51]

and the specific interpreter you're

[811:52]

using is technically semi-secretly

[811:54]

compile your code for you into something

[811:56]

called byte code and then run that byte

[811:57]

code, which is more efficient than

[811:59]

actually reinterpreting it again and

[812:00]

again. But we'll see more of this over

[812:02]

time. For now, let's take a look at

[812:04]

maybe two other problems that we might

[812:06]

solve, dare say more easily, more

[812:08]

quickly than we could have in C for

[812:10]

problem set 4. Let me go ahead and

[812:12]

shrink down my terminal window here.

[812:14]

Close out dictionary.py. close one of my

[812:16]

terminal windows and cd back to my main

[812:19]

directory. And let's go ahead and open

[812:20]

up that bridge bitmap photograph that

[812:24]

we used in problem set four and had to

[812:25]

apply a number of Instagram-like

[812:27]

filters thereto. Well, now let's go

[812:29]

ahead and implement maybe one of those

[812:30]

filters, the blur filter, whose purpose

[812:33]

in life is just to blur this image.

[812:34]

Well, let's see how long this takes. Let

[812:36]

me go ahead and open up say uh blur.py,

[812:39]

which is now going to be a Python

[812:41]

program for blurring images. It's empty

[812:43]

initially, but I can pretty much write

[812:46]

this quite quickly. Now, let me go ahead

[812:48]

and at the top of this file, write the

[812:51]

Python keyword from PIL for Python image

[812:54]

library. Import an object called Image

[812:58]

and another one called ImageFilter. In

[813:00]

particular, two features of the Python

[813:02]

image library that's going to make this

[813:04]

so much easier to actually solve. And

[813:06]

then let's go ahead and define a

[813:07]

variable. We'll call it before

[813:08]

representing the before version of this

[813:10]

image. And set that equal to Image.open

[813:12]

quote unquote bridge.bmp where that

[813:15]

of course is the name of the file we

[813:17]

want to blur. Then let's go ahead and

[813:18]

create a variable called after

[813:20]

representing the after version of this

[813:23]

same filter and set that equal to before

[813:26]

filter open parenthesis ImageFilter.BoxBlur,

[813:30]

and then just to be a little

[813:32]

dramatic I'm going to blur it more so

[813:33]

than you needed to in problem set four

[813:35]

but we'll see it more visibly now on the

[813:37]

screen. Let's do an argument of 10. And

[813:40]

then at the very end of this process,

[813:42]

let's do after.save and save it in a

[813:44]

file called say out.bmp.
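
The four lines just described might look roughly like the sketch below. It assumes the third-party Pillow package (the maintained fork of PIL) is installed. The function name `blur` and the small generated stand-in image are illustrative assumptions, not part of the lecture; the lecture's own file-based version appears in the comments so that the block runs even without bridge.bmp.

```python
from PIL import Image, ImageFilter


def blur(before, radius=10):
    """Return a box-blurred copy of a Pillow image (radius 10, as in the lecture)."""
    return before.filter(ImageFilter.BoxBlur(radius))


# With the lecture's files, the whole script would be:
#   before = Image.open("bridge.bmp")
#   after = before.filter(ImageFilter.BoxBlur(10))
#   after.save("out.bmp")

# Hypothetical stand-in image: white background with a black square.
sample = Image.new("RGB", (64, 64), "white")
sample.paste("black", (16, 16, 48, 48))

blurred = blur(sample)
print(blurred.size)  # blurring keeps the image dimensions unchanged
```

The larger the radius passed to `BoxBlur`, the more neighboring pixels are averaged together, which is why 10 gives a much more visible blur than the small box used in problem set 4.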

[813:47]

Done. So in just four lines of code, I

[813:50]

claim I've implemented the blur function

[813:52]

now in Python of what we did previously

[813:54]

in C. Let me open my terminal window.

[813:56]

Let me run the Python command this time

[813:58]

on blur.py. Cross my fingers as always.

[814:01]

And indeed, I've made a mistake. Perhaps

[814:04]

even if you've never written Python

[814:05]

before, you can see it. And in fact,

[814:06]

we'll see a number of these errors. Some

[814:08]

intentional, some unintentional. But on

[814:10]

line four, what I intended to do was set

[814:12]

equal to uh before.filter that variable

[814:15]

I created called after. All right,

[814:17]

that's all right. Let's go back down to

[814:18]

my terminal window, clear it to get rid

[814:20]

of all that, and rerun python of

[814:22]

blur.py. Cross my fingers even harder

[814:24]

this time. Nothing bad seems to be

[814:27]

happening indeed. Now, let's go ahead

[814:29]

and open up out.bmp. And before we

[814:32]

reveal that, let's go back to the

[814:33]

original, which is bridge.bmp. And

[814:36]

now dramatically, let's see the blurred

[814:38]

version thereof.

[814:40]

Voila. Hopefully to your eyes, too. It

[814:42]

looks quite a bit blurry. Well, how

[814:44]

about one more flourish? Those of you

[814:45]

who were feeling more comfortable last

[814:47]

week and implemented perhaps uh edges

[814:50]

edge detection in C. Well, let's see if

[814:52]

we can whip that up quite quickly, too.

[814:54]

Let's go ahead and write a file called

[814:56]

edges.py using that same bridge.bmp

[814:59]

file. And in this file, let's go ahead

[815:01]

and do the following. As before, from

[815:03]

the Python image library, let's import

[815:06]

uh the image feature and the image

[815:08]

filter feature. Then, as before, let's

[815:10]

create a variable called before. Set it

[815:12]

equal to image.open, passing in

[815:15]

bridge.bmp. So, so far the same as

[815:17]

before. Now, let's create a variable

[815:19]

called after. Set it equal to before.filter,

[815:22]

passing in this time ImageFilter.FIND_EDGES,

[815:25]

which is different from BoxBlur.

[815:27]

And by definition, it's going to find

[815:29]

the image the edges of this image. And

[815:31]

then after, as before, let's do

[815:33]

after.save of out.bmp and just clobber

[815:36]

the version of the blurred file that we

[815:38]

just created. All right, that's it.

[815:41]

Let's go ahead and open up my terminal

[815:43]

window now. Let's go ahead and again run

[815:45]

Python, but this time on edges.py. Cross

[815:47]

my fingers real hard. So far so good.

[815:49]

And that was quite fast. Recall that the

[815:51]

bridge.bmp image looked like this. But

[815:54]

now when we open up this new and

[815:56]

improved version of out.bmp, thanks

[815:58]

to Python in just four lines of code, we

[816:00]

now have all of our edges detected.
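
The edge-detection version differs from the blur version only in the filter passed in. The sketch below again assumes Pillow is installed; the function name `find_edges` and the generated stand-in image are illustrative assumptions, with the lecture's file-based form shown in comments.

```python
from PIL import Image, ImageFilter


def find_edges(before):
    """Return a copy of a Pillow image with Pillow's built-in edge filter applied."""
    return before.filter(ImageFilter.FIND_EDGES)


# With the lecture's files, the whole script would be:
#   before = Image.open("bridge.bmp")
#   after = before.filter(ImageFilter.FIND_EDGES)
#   after.save("out.bmp")

# Hypothetical stand-in image: white background with a black square.
sample = Image.new("RGB", (64, 64), "white")
sample.paste("black", (16, 16, 48, 48))

edges = find_edges(sample)
print(edges.size, edges.mode)
```

Note that `FIND_EDGES` is a ready-made filter object rather than something you call with an argument, which is why it has no parentheses, unlike `BoxBlur(10)`.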

[816:04]

So, what can we then learn from C

[816:06]

itself? Well, C had, of course,

[816:08]

functions. And functions were those

[816:09]

actions or verbs that simply got work

[816:11]

done. And let's go ahead and compare

[816:13]

side by side, much like we did with

[816:14]

Scratch and C, the ideas that today

[816:17]

onward, are still going to be the same.

[816:18]

And uh how they translate to Python. So,

[816:21]

on the left here, we'll now have our

[816:23]

friend Scratch. This, of course, was one

[816:25]

of the first puzzle pieces we saw. It's

[816:26]

a purple puzzle piece saying say and it

[816:28]

was a function in so far as it said the

[816:31]

value of its argument which in this case

[816:32]

is hello world. Well, we've already seen

[816:34]

in Python what this looks like. It looks

[816:36]

similar to the version in C, but it's no

[816:38]

longer printf. There's no longer a

[816:40]

semicolon and there's no longer an

[816:42]

explicit new line. So in Python, it's

[816:44]

quite simply this. Meanwhile, in Python,

[816:47]

there are a whole bunch of libraries as

[816:49]

well. Now in C we had simply header

[816:52]

files and those header files give you

[816:53]

access to the prototypes of that is the

[816:56]

signatures of the functions that you

[816:57]

want to use from those libraries. Python

[816:59]

uses somewhat different vernacular

[817:00]

whereby Python has what are called

[817:02]

modules and packages and a package is

[817:05]

just a collection of modules. But a a

[817:07]

module is just a library using Python

[817:09]

speak so to speak. So, anytime you hear

[817:12]

someone discussing a module or a package

[817:13]

in Python, they're just talking about

[817:15]

using a library. And that library might

[817:16]

come with the language itself just built

[817:18]

in as standard or it might be a

[817:20]

third-party library that you might

[817:21]

download and install yourself much like

[817:23]

I did a few weeks back when we installed

[817:25]

uh the cowsay program so that I could

[817:27]

actually have a cow or other animals on

[817:30]

the screen display text. So, in C

[817:32]

recall, we had something like this

[817:33]

include CS50.h, which was the header

[817:35]

file pre-installed for you somewhere.

[817:37]

But we will have for at least this week

[817:40]

a analog of the CS50 library in C also

[817:43]

in Python just to make this transition

[817:45]

from C to Python a bit easier. These two

[817:47]

though are meant to be training wheels

[817:49]

that you can take off and should take

[817:50]

off, you know, even within a week or so.

[817:52]

It's just meant to smooth that

[817:54]

transition and make clear what's the

[817:55]

same and what's different. So in the

[817:57]

CS50 library for Python, we also have a

[818:00]

function called get string whose purpose

[818:02]

in life is to get a string. To access it

[818:04]

though, you don't use hash include

[818:06]

cs50.h. That's a C thing. In Python, you

[818:08]

would say from cs50 import get_string.

[818:11]

It's a little more verbose, but it's

[818:13]

also a little more precise as to what

[818:14]

you want from the library, especially if

[818:16]

you don't want the whole thing loaded

[818:17]

into memory. So here, for instance, is

[818:20]

now a Scratch program that was a little

[818:21]

more interesting than just printing out

[818:23]

hello world. This was the first program

[818:25]

we wrote that actually got some user

[818:27]

input. So in fact, let me go back to VS

[818:30]

Code and let's see if we can't resurrect

[818:32]

this C program real quickly in the form

[818:34]

of a new hello.c. So I'm going to run

[818:36]

code of hello.c and then in my ter in my

[818:40]

uh code tab I'm going to do include

[818:42]

cs50.h

[818:44]

include standard io.h and then below

[818:47]

that I'm going to go ahead and whip up

[818:49]

our familiar version of this int main

[818:51]

void and then inside the curly braces

[818:53]

we'll bring back string even though we

[818:55]

now know it's char star. We'll call our

[818:57]

variable answer. Set it equal to get

[818:59]

string. Ask the user quote unquote

[819:01]

what's your name with a space just to

[819:03]

move the cursor over. still need my

[819:05]

semicolon in C. And then after that,

[819:06]

recall back in week one, we did hello,

[819:09]

percent s backslash n and then plugged

[819:12]

in the variable answer so as to see

[819:14]

hello David, hello Kelly or something

[819:16]

else. Just to be safe, let me do make

[819:18]

hello. All is well so far. Dot slash hello,

[819:22]

type my name. And this version in C

[819:24]

seems to be working. Okay, so in C,

[819:27]

these lines of code here translate

[819:29]

pretty literally to what we just saw.

[819:32]

Although we got the answer variable in

[819:34]

Scratch for free. That blue puzzle piece

[819:36]

just existed without our having to create

[819:37]

it. But it's a decent number of hoops to

[819:40]

jump through in order to just get user

[819:41]

input and print it out. Well, in Python,

[819:44]

this is going to get a little more

[819:45]

succinct in that the Python version of

[819:47]

this code is now going to look like

[819:49]

this. printf is now print. The

[819:52]

semicolons are gone. And what else seems

[819:55]

a little bit different?

[819:58]

Yeah.

[820:01]

>> I don't need any placeholders. Yeah. So,

[820:02]

we don't need the percent s anymore. In

[820:05]

fact, I'm curiously using a plus, which

[820:07]

if some of you studied Java or some

[820:08]

other language, you might have actually

[820:09]

seen this before. Even if you've never

[820:11]

seen Python before, you've only seen C

[820:13]

in CS50, you can probably guess what the

[820:16]

plus is doing. Even if you don't know

[820:17]

the the technical vocab, what is the

[820:19]

plus probably doing here?

[820:22]

Yeah. So, it's concatenating or joining

[820:24]

together the thing on the left with the

[820:25]

thing on the right. And we actually had

[820:26]

that vernacular in the world of Scratch.

[820:28]

We had the join puzzle piece that joins

[820:30]

hello, space and the value inside of

[820:32]

answer. A plus in Python can do exactly

[820:35]

the same thing. So it's a little more

[820:37]

user friendly than having to anticipate,

[820:39]

oh, let's put the placeholder here and

[820:41]

then come back later and plug in the

[820:42]

variable. Humans over time just realize

[820:44]

that it's a lot easier to sort of do

[820:46]

this in this way than bother with

[820:48]

placeholders. Though you can still use

[820:49]

placeholders for other purposes. Another

[820:51]

subtle difference between the C and

[820:53]

Python version of these two lines.

[820:57]

More subtle than that.

[821:02]

What's missing?

[821:04]

Yeah, in back.

[821:06]

>> Uh, so the backslash n is again gone for

[821:08]

Python. So that sort of happens for free

[821:11]

indeed. And one more difference.

[821:12]

>> You don't need to declare the type of

[821:14]

answer.

[821:15]

>> Yeah, we don't need to declare the type

[821:17]

of answer. Recall that if we rewind in

[821:19]

the C version, you needed to tell the

[821:21]

compiler that this is a string. And last

[821:23]

week, we could have changed string to

[821:24]

char star, but we still had to tell the

[821:26]

compiler what data type we're putting

[821:28]

into that variable. In Python, we can

[821:30]

now get rid of that data type. And

[821:32]

Python will just figure it out from

[821:34]

context. If get string returns a string,

[821:36]

well then obviously the variable should

[821:37]

store a string. If a function returns an

[821:39]

int, well then obviously the variable

[821:41]

should store an int. And the language is

[821:43]

just doing more of that decision-making

[821:44]

for you just to save you time and save

[821:47]

you thought. There's a subtlety here

[821:49]

though where we can make this program a

[821:50]

little bit different. In fact, let's

[821:51]

whip it up first in Python. Let me go

[821:53]

back to VS Code here. Clear my terminal

[821:55]

and let's go ahead and create a program

[821:57]

again called hello.py. That'll open up

[822:00]

my previous version thereof. And just so

[822:01]

we can see these things side by side,

[822:03]

I'm going to drag that tab over to the

[822:05]

right of VS Code and let go. And now you

[822:08]

can see the C version still on the left

[822:09]

and the Python version at the right.

[822:11]

What I'm going to do here now in my

[822:13]

Python version is change it to be quite

[822:16]

like the version in C now at left. So as

[822:19]

promised I'm going to do from cs50

[822:21]

import get_string. Then below that I'm

[822:24]

going to say simply answer equals get

[822:26]

string quote unquote what's your name

[822:28]

question mark space no semicolon. But

[822:31]

then on the next line what I'm whoops

[822:33]

but uh parenthesis. Then on the next

[822:35]

line, I'm going to do print quote

[822:37]

unquote hello, space close quote plus

[822:42]

answer. Down here, I'm going to go ahead

[822:44]

and run python of hello.py again. No

[822:46]

compilation step. I'm just going to

[822:48]

interpret it line by line. What's my

[822:50]

name? David. And it seems now to work

[822:52]

exactly the same. Now, it turns out in

[822:54]

Python there's even more ways to solve

[822:56]

problems like this, even trivial

[822:57]

problems like this. So here we're using

[822:59]

the plus sign, not as addition per se,

[823:01]

but as the concatenation operator, the

[823:03]

join operation. If you want though you

[823:05]

can take advantage of the fact that

[823:06]

print in Python can take more than one

[823:09]

argument. It can take two or three or

[823:12]

four or even zero by simply changing the

[823:15]

plus to a comma getting rid of that

[823:18]

seemingly superfluous space and just

[823:20]

give print two things to print because

[823:21]

it turns out per the documentation of

[823:23]

print which we'll eventually see it

[823:25]

knows that if it takes one two arguments

[823:28]

by default separate them for you by a

[823:30]

single space and that's something we can

[823:32]

override as well. which one is better

[823:34]

like h like I don't know like they're

[823:36]

sort of equivalent. It's such a trivial

[823:37]

difference but it speaks to the

[823:39]

flexibility that you'll start to have

[823:41]

whereby the language is a little less

[823:42]

rigid than C was certainly when it comes

[823:44]

to printing strings. So in fact if I go

[823:46]

back to VS Code here and I go ahead and

[823:50]

change that plus to a comma and get rid

[823:52]

of the space inside of the quotes. I can

[823:54]

rerun Python of hello.py, type in my

[823:57]

name and we see exactly the same result

[824:00]

there. But we can take this one step

[824:03]

further. Even though it's going to look

[824:04]

a little cryptic, this is sort of the

[824:06]

more Pythonic way to do things. And that

[824:08]

too is actually a term of art to do

[824:10]

something Pythonically is to do it the

[824:12]

way that most Python programmers would

[824:14]

do it. It's not the only way. It's not

[824:16]

necessarily the right way, but it's sort

[824:17]

of the recommended way in the community.
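
Before the next variation, the two versions of print shown so far can be put side by side. This is a minimal sketch: the name is hard-coded as a stand-in for whatever the user would type, and the variable `greeting` is introduced here only for illustration.

```python
answer = "David"  # stand-in for the name the user would type

# Version 1: + concatenates two strings into one,
# so the space has to live inside the quotes.
greeting = "hello, " + answer
print(greeting)

# Version 2: print receives two separate arguments
# and inserts a single space between them by default.
print("hello,", answer)
```

Both lines print the same text. The first builds one string and hands it to print; the second lets print do the joining.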

[824:19]

So here we have that latest version

[824:21]

where I'm passing two arguments to

[824:22]

print. The first is quote unquote hello,

[824:25]

and then the second of which is the

[824:27]

value of answer. I could similarly write

[824:30]

this same program with this crazy

[824:32]

syntax. Takes a little getting used to,

[824:34]

but it turns out it's actually kind of

[824:36]

nice overall. What's obviously

[824:38]

different? Well, one, there's these

[824:40]

weird curly braces are back. They're not

[824:42]

part of the logic of the program.

[824:43]

They're literally inside of the double

[824:46]

quotes. But you can probably guess how

[824:48]

this what this does for me because

[824:50]

there's one other crucial difference.

[824:53]

What else has changed between before and

[824:56]

after?

[824:57]

Yeah, there's this weird f which is not

[824:59]

part of printf. It's actually inside of

[825:02]

the parenthesis and next to the double

[825:04]

quotes. And even this one when this came

[825:05]

out was a little weird looking to

[825:07]

people. But this is how you get this

[825:09]

thing to be a formatted string, aka an F

[825:12]

string, as opposed to it being just a

[825:14]

literal string of text. Now, you can

[825:16]

probably guess what it means to put the

[825:18]

variable's name inside of the curly

[825:20]

braces. It means the value of that

[825:22]

variable is going to be substituted

[825:24]

right there. Similar in spirit to the

[825:26]

percent s in C, but a little more

[825:28]

explicit. With the percent S, you had to

[825:30]

remember that that percent S corresponds

[825:32]

to this variable's value or something

[825:33]

like that, which was just annoying if

[825:35]

anything. But this

[825:38]

time you have a placeholder in curly

[825:40]

braces that just says what you want

[825:42]

there, that particular value. And what

[825:44]

this means more technically is that the

[825:46]

answer variable will be interpolated by

[825:49]

the interpreter which means its value

[825:51]

will be plugged in right there. So let's

[825:53]

try this. Let me go back over to VS Code

[825:56]

and quite simply on my last line of code

[825:59]

here, let's change the input to print to

[826:02]

be quote unquote hello, and then curly

[826:05]

brace answer

[826:08]

then close curly brace close quote. And

[826:11]

I've done this. This is intentional, but

[826:13]

let's see. Let me go ahead and rerun

[826:14]

python of hello.py, D-A-V-I-D. What are we

[826:17]

about to see? Hello,

[826:21]

answer. So this is a bug, but just to

[826:24]

demonstrate like what is going on and

[826:26]

what's therefore missing. What what did

[826:28]

I forget? Yeah.

[826:31]

>> Yeah, I didn't declare that this is a

[826:33]

so-called f-string or format string. The

[826:35]

fix for this, weirdly, is just to put an

[826:37]

F right there. And now if I rerun Python

[826:40]

of hello.py. Type in my name again.

[826:42]

Cross my fingers. Now I see that the

[826:45]

variable has indeed been interpolated

[826:46]

and its value plugged in where I wanted

[826:50]

it. All right. Turns out we can take off

[826:52]

one of these training wheels already. I

[826:54]

I propose that get string just exists in

[826:56]

the library just to smooth the

[826:57]

transition, but honestly it's not really

[826:59]

doing anything all that interesting. So

[827:01]

let's take this first training wheel

[827:02]

off. It turns out that Python comes with

[827:04]

a function appropriately named input

[827:07]

such that if you want to get input from

[827:09]

the human via their keyboard, you can

[827:11]

just use the input function. So we can

[827:13]

already for this program get rid of the

[827:15]

CS50 library because input essentially

[827:17]

behaves just like the get string

[827:19]

function. So if I go back to my Python

[827:21]

version here, I can change get uh get

[827:23]

string to input. And I can even go and

[827:26]

delete this training wheel up there.

[827:28]

Rerun Python of hello.py in my

[827:30]

terminal, D-A-V-I-D, Enter, and we're still

[827:33]

in business as well. So input is

[827:35]

generally going to be the way you go

[827:36]

about getting input now from the user.
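
The f-string bug and its fix, together with the move from get_string to input, can be sketched as follows. The two function names are hypothetical and exist only so the difference is easy to compare; the interactive input line is left as a comment so the block runs without anyone at the keyboard.

```python
def greet(answer):
    # The f prefix makes this a format string:
    # {answer} is replaced by the variable's value.
    return f"hello, {answer}"


def greet_without_f(answer):
    # Without the f prefix, the braces and the word inside them
    # are just ordinary characters in the string.
    return "hello, {answer}"


# In the lecture's hello.py the name comes from the built-in input function:
#   answer = input("What's your name? ")
answer = "David"  # stand-in for what the user would type

print(greet(answer))
print(greet_without_f(answer))
```

The second print reproduces the bug from the demonstration: the curly braces are printed literally because nothing told Python to interpolate them.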

[827:40]

All right, let me pause here and see if

[827:41]

there's any questions as we try to

[827:43]

bridge these two worlds from C to

[827:46]

Python. Yeah,

[827:49]

>> so in Python, we don't need the main

[827:51]

function. And why is that?

[827:53]

>> Good question. In Python, why don't we

[827:55]

need the main function anymore? because

[827:57]

clearly that's been omnipresent in like

[827:59]

every program we've written thus far.

[828:00]

And here we have it in all of our Python

[828:02]

programs thus far absent. It turns out

[828:04]

that humans realize it's just so common

[828:07]

that you want the file you're editing to

[828:09]

be the main part of your program. Like

[828:11]

why bother adding the additional syntax

[828:14]

of saying int main void or something

[828:16]

analogous? It's just easier if you want

[828:18]

to write two lines of code to get some

[828:19]

work done. Why do you have to waste my

[828:21]

time adding all of these this

[828:22]

boilerplate code which we've been doing

[828:25]

up until now. Now that said, we're going

[828:26]

to bring back main in a little bit

[828:28]

because it will solve a problem. But

[828:30]

generally speaking, what I'm doing here

[828:31]

is indeed a program, but people in the

[828:33]

real world would also call these scripts

[828:35]

where a script is like a lightweight

[828:36]

program that pretty much just reads top

[828:38]

to bottom, left to right. It might be

[828:39]

fairly lightweight. It's really

[828:40]

synonymous with writing a program, but

[828:43]

this is again one of the appeals of a

[828:44]

language like Python. You can just get

[828:46]

right in and get out and get the job

[828:47]

done. Even Java has moved to this in

[828:50]

recent years where you don't have to put

[828:51]

everything in a class. Uh public static

[828:53]

void main for those familiar. You can

[828:55]

just write System.out.println and

[828:58]

get some work done.

[829:00]

>> Yeah.

[829:01]

>> Is input only for string?

[829:03]

>> Good question. Is input only for a

[829:04]

string? Yes. Right now it will get input

[829:06]

from the user via their keyboard and

[829:08]

you'll get back a string just like get

[829:10]

string. And we'll come back to why

[829:12]

that's maybe not a a good thing. All

[829:15]

right. So what more might we want to

[829:18]

do at this point? Well, let's tease

[829:20]

apart some differences now with C. So up

[829:23]

until now, every argument we've ever

[829:25]

passed into a function in C and Scratch

[829:28]

for that matter is a so-called

[829:30]

positional parameter. And a parameter is

[829:33]

the same thing as an argument, but

[829:34]

generally when you're looking at the

[829:35]

function from the function's perspective,

[829:37]

it's a parameter that it accepts. But

[829:39]

when you're calling the function and

[829:40]

passing in an input, you call it

[829:42]

typically an argument, but they refer to

[829:44]

essentially the same thing. And all of

[829:45]

the parameters we've been passing into

[829:48]

functions thus far have been positional

[829:50]

in the sense that the order matters. the

[829:53]

first thing, then the second thing, then

[829:54]

the third thing, and so forth. For

[829:55]

instance, with printf, the first thing

[829:57]

has to be the quoted string, maybe with

[829:58]

a placeholder, and then if there's

[830:00]

another argument after the comma, that

[830:02]

can be the second argument, the third

[830:04]

argument, and so forth. But it turns out

[830:06]

Python additionally supports what are

[830:08]

called named parameters, whereby you

[830:11]

don't have to rely only on the order in

[830:13]

which you're enumerating the arguments

[830:15]

to a function. And that's helpful

[830:16]

because some functions, especially in

[830:19]

the real world, when you start using

[830:20]

other people's libraries that have lots

[830:22]

of functionality, they might not take

[830:23]

just one or two arguments. They might

[830:25]

take four arguments, 10 arguments, maybe

[830:28]

even more. And it can just be unwieldy

[830:30]

to have to remember the precise order of

[830:32]

all those arguments. You're just asking

[830:33]

for trouble if you're going to screw up

[830:34]

or a colleague is going to get the order

[830:36]

out of whack. So with named

[830:39]

parameters, you can actually be explicit

[830:41]

with Python and tell it what argument

[830:44]

you are trying to pass in by giving it

[830:47]

an actual name. So let me go over to VS

[830:49]

Code here and propose that we use this

[830:51]

for really the simplest of programs in

[830:54]

order to override that default new line

[830:58]

that we seem to be getting for free just

[831:00]

by calling print. In other words, let me

[831:02]

go ahead here and clear my terminal

[831:04]

window. Let me close hello.c and focus only

[831:07]

on hello.py for just a moment. And let's

[831:09]

make it much simpler like the very first

[831:11]

version and just print out using

[831:13]

Python's print function, not printf,

[831:15]

quote unquote hello world close quote.

[831:18]

And now here I'm going to do Python of

[831:20]

hello.py. Enter. And we still see that

[831:22]

the cursor moves to the next line. The

[831:24]

dollar sign moves to the next line

[831:26]

because I'm automatically getting a new

[831:28]

line. Well, what if you don't want that?

[831:30]

How can you override that behavior?

[831:31]

Well, you can actually use a named

[831:33]

parameter in Python. And I can go up

[831:36]

here and add a second argument that if

[831:39]

it were just something like uh this,

[831:42]

that would literally print out the word

[831:44]

this because it's just another string.

[831:46]

But if I give it a name like end equals

[831:50]

quote unquote, I can override the

[831:53]

default behavior of the Python print

[831:55]

function by changing the value of its

[831:58]

end parameter to be the so-called empty

[832:01]

string, quote unquote, which means

[832:02]

literally there's nothing there. Watch

[832:05]

what happens now. If I run Python of

[832:07]

hello.py and hit enter, the dollar sign

[832:10]

is weirdly and sort of in the ugly way

[832:12]

on the same line, just like it was when

[832:14]

I made the mistake in C in week one of

[832:16]

omitting the backslash n.

[832:18]

That is to say, what the default value

[832:20]

of this end parameter really is is quote

[832:23]

unquote backslash n. And I can make it

[832:25]

explicit by changing my code as such.

[832:27]

I'm going to go ahead and rerun python

[832:29]

of hello.py. And now the cursor is back

[832:32]

on the next line. And not that this is

[832:33]

that useful other than overriding that

[832:36]

default, but you could do fun things

[832:37]

like exclamation point, exclamation

[832:39]

point, exclamation point if you really

[832:40]

want print to be excited to print some

[832:42]

things for you. And if I now run Python

[832:44]

of hello.py a third time, now you see

[832:47]

that it's ending with exclamation point,

[832:49]

exclamation point, exclamation point.
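
The effect of the end parameter can be made visible with a small sketch. The helper `printed` is hypothetical: it uses print's own `file` parameter to capture the exact characters print would write, so that the trailing newline (or its absence) shows up when displayed with `repr`.

```python
import io


def printed(*objects, **options):
    """Return exactly what print would write for these arguments."""
    buffer = io.StringIO()
    print(*objects, file=buffer, **options)
    return buffer.getvalue()


# Default behavior: print appends a newline.
print(repr(printed("hello, world")))

# end="" overrides the default, so nothing follows the text.
print(repr(printed("hello, world", end="")))

# end can be any string, including one that ends with its own newline.
print(repr(printed("hello, world", end="!!!\n")))
```

In an ordinary script you would simply write `print("hello, world", end="")`; the helper is only there to make the invisible newline inspectable.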

[832:50]

Looks a little stupid with the dollar

[832:51]

sign. So you could even toss in a new

[832:53]

line there. Run it yet again. And now we

[832:56]

sort of get both of those there. But I

[832:58]

would say the common case is to use that

[833:00]

end uh named parameter simply to

[833:03]

override it. So how do you learn more

[833:05]

about these kinds of things? Well, if

[833:06]

you go to the official documentation for

[833:09]

Python, which is a thing more so than

[833:11]

with C, like if you want to learn more

[833:13]

about Python and the functions it offers

[833:15]

and the arguments it takes, you go to

[833:17]

the official documentation uh

[833:18]

docs.python.org. This is essentially

[833:20]

analogous to the so-called manual pages

[833:22]

or man pages that CS50 has a version of,

[833:25]

but there is no one de facto source for

[833:27]

those man pages. Several different

[833:29]

versions of them exist in the wild.

[833:30]

Whereas Python itself as a community

[833:32]

maintains its own official

[833:34]

documentation. So for instance, if you

[833:35]

go to a specific URL like this ending in

[833:38]

functions.html, you'll see an exhaustive

[833:40]

list of all of the functions that come

[833:42]

with Python besides just the print

[833:44]

function. And we'll see a bunch of more

[833:46]

today. If specifically you scroll down

[833:48]

to the print uh documentation, you'll

[833:51]

see something that's a little arcane

[833:53]

that looks like this. But this is

[833:55]

representative of a Python prototype, if

[833:58]

you will, often also called a signature

[834:00]

that just tells you the name of a

[834:02]

function and then how many and what type

[834:04]

of arguments it takes. So how to read

[834:07]

this? Well, the print function takes

[834:09]

some number of objects. So in Python

[834:12]

specifically this syntax of star objects

[834:14]

just means zero or more objects whatever

[834:16]

that is like a number or a string or

[834:18]

something else the stuff you want to

[834:20]

print out. After that if you start using

[834:22]

named parameters you can specify what

[834:24]

the default separator is the separator

[834:27]

between arguments to print. So, recall

[834:29]

that when I did quote unquote hello,

[834:32]

comma, quote unquote, uh, or quote

[834:35]

unquote hello, comma, answer, that was

[834:37]

separated automatically for us by a

[834:39]

single space, even without my hitting

[834:41]

the space bar inside of my quotes.

[834:43]

That's because the default value here is

[834:45]

in fact a single space. The default

[834:47]

value for end, as promised, is indeed

[834:50]

backslash n. And then there's some other

[834:51]

stuff related to file IO that print can

[834:53]

also deal with, but more on that perhaps

[834:55]

another time. There's one curiosity

[834:57]

here. In Python, it turns out that you

[834:59]

can use double quotes or single quotes

[835:01]

around strings, where in C, it was much

[835:03]

more regimented. Double quotes are for

[835:05]

strings and single quotes are for

[835:08]

chars, characters only, single

[835:10]

characters. It doesn't matter in Python

[835:13]

which one you use so long as you're

[835:15]

consistent. And stylistically, you

[835:16]

should really pick one and go with it.

[835:18]

And the only time you should really

[835:19]

alternate between the two is maybe if

[835:21]

you want to put like an apostrophe for

[835:23]

some human's name inside of double quote

[835:25]

inside of single quotes or something

[835:27]

like that. But generally you have a

[835:28]

little more flexibility in Python. And

[835:30]

you'll see in different languages Python

[835:32]

community tends to use single quotes at

[835:34]

least in the documentation. The

[835:35]

JavaScript world tends to use single

[835:37]

quotes. Um we in CS50 often use double

[835:39]

quotes just for consistency with what we

[835:41]

do in C. But any uh community or company

[835:45]

would typically have its own style guide

[835:46]

that dictates which one you should use

[835:48]

if only for consistency

[835:51]

questions then on this here print

[835:54]

function

[835:56]

as just representative of all of the

[835:58]

docs that you'll see.
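
As a rough companion to the signature just discussed, the sketch below exercises sep and end together and checks that the two quote styles produce equal strings. The output is collected in an in-memory buffer through print's `file` parameter; the variable names `buffer`, `output`, and `same` are illustrative choices only.

```python
import io

buffer = io.StringIO()

# Default separator: a single space between the two objects.
print("hello,", "David", file=buffer)

# sep="" removes the separator entirely.
print("hello,", "David", sep="", file=buffer)

# sep and end are named parameters, so they can be combined freely.
print("a", "b", "c", sep="-", end="!\n", file=buffer)

# Zero objects is allowed too: only the end string is written.
print(file=buffer)

output = buffer.getvalue()
print(output, end="")

# Single and double quotes create the same kind of string.
same = "hello" == 'hello'
print(same)
```

Because sep and end are passed by name, their order in the call does not matter, which is the advantage of named parameters described above.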

[836:02]

All right. Well, let's take a quick look

[836:03]

at variables. We've used these a few

[836:04]

times already, but let's focus in a

[836:06]

little more detail on what's actually

[836:08]

different. In Scratch, if you wanted to

[836:09]

create a variable called counter and set

[836:11]

it equal to zero, you would use this

[836:12]

orange puzzle piece here. In C, you

[836:14]

would do something like this. The type

[836:16]

of the variable, the name of the

[836:17]

variable, and then set it equal to the

[836:18]

initial value semicolon. In Python, it's

[836:21]

going to be a little similar, but you

[836:23]

can probably guess where we're going

[836:24]

with this. How is this line of code

[836:26]

probably about to change? Yeah,

[836:31]

>> good. We're not going to bother with int

[836:33]

or the data type more generally. We're

[836:34]

just going to say counter cuz obviously

[836:36]

like a smart interpreter can just figure

[836:37]

it out from context that you're putting

[836:39]

a zero in there. It's obviously an

[836:40]

integer. And what else is about to go

[836:42]

away? The semicolon. So this is the C

[836:45]

version. And voila, this now is the

[836:47]

Python version. And this is as silly as

[836:48]

this example is, it's kind of

[836:49]

representative of how languages like

[836:52]

Python just tend to be a little more

[836:53]

programmer friendly because you just

[836:54]

type less and get the same work done.

[836:56]

All right. So if we wanted to do

[836:57]

something now in Scratch like increment

[836:59]

the counter by one, you would use this

[837:01]

puzzle piece here. In C, we could do

[837:03]

something like this. In Python, it's

[837:05]

going to be almost exactly the same

[837:06]

except of course no semicolon. In C, we

[837:09]

could alternatively do this. And you can

[837:11]

also do this in Python. Uh in C though,

[837:16]

you could also do what other technique

[837:19]

>> ++. I'm sorry, but Python has taken

[837:21]

that away from us. So if you got into

[837:22]

the habit of using ++ or --,

[837:24]

that's great. Use them in C all you

[837:26]

want. In Python, they just don't exist.
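
As a rough sketch of the variable and increment syntax walked through here (the variable name counter comes from the lecture; everything else is plain Python):

```python
# No type and no semicolon: the interpreter infers that 0 is an int.
counter = 0

# Either form of increment works in Python, as in C.
counter = counter + 1
counter += 1

# There is no ++ operator; writing counter++ on its own line
# would be rejected as a syntax error.
print(counter)  # 2
```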

[837:28]

So you'll see this more commonly instead

[837:30]

as the heuristic. All right. What about

[837:32]

the various types that exist in Python?

[837:35]

Because even though you don't have to

[837:37]

specify the types when declaring your

[837:39]

variables, they do in fact actually

[837:41]

exist underneath the hood. And it's

[837:43]

worth knowing a little something about

[837:44]

them because not knowing will lead often

[837:46]

to some form of bug. So in C, we had

[837:49]

types like this: bool, char, double,

[837:51]

float, int, long, and string. The last

[837:53]

of which was thanks to the CS50 library.

[837:54]

that last week we would have started

[837:56]

calling, uh, a string char * instead

[837:58]

which still is a data type: the

[838:00]

address of some char. In Python we're

[838:03]

going to whittle this list down to a

[838:04]

subset of those essentially whereby we

[838:06]

still have bools, we still have floats, we

[838:08]

still have ints, and we do have strings,

[838:10]

but they're literally called strs: str.
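
One way to see these four types for yourself is to ask Python what it inferred. This is only an illustrative sketch; the sample values are assumptions chosen for the example:

```python
# One sample value of each type mentioned above.
values = [True, 3.14, 50, "hello"]

# type(v).__name__ gives the name Python uses for each value's type.
names = [type(v).__name__ for v in values]

print(names)  # ['bool', 'float', 'int', 'str']
```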

[838:12]

So it's not a CS50 thing. The Python

[838:14]

community calls strings str. But absent

[838:17]

from this list is any mention of star

[838:19]

not to mention char *. There are no

[838:22]

pointers in Python. And indeed, as

[838:24]

powerful as I'd hope you found uh weeks

[838:27]

four and five to be, I dare say you also

[838:30]

found them incredibly frustrating and

[838:32]

challenging and wont to yield bugs in

[838:35]

your code because with that power of

[838:37]

memory management comes a whole slew of

[838:40]

potential mistakes that you can make.

[838:41]

And that's true not just for CS50

[838:43]

students, but for programmers, adult

[838:44]

programmers, full-time programmers

[838:46]

around the world. And so among the other

[838:48]

features of languages like Python is

[838:50]

they try to take away certain features

[838:52]

of languages like C that were just too

[838:54]

dangerous in the first place. They might be

[838:56]

wonderfully powerful, might help you

[838:57]

solve problems more quickly, more

[838:59]

precisely, but if they tend to do more

[839:01]

damage than they're worth, sometimes it's

[839:03]

worth just abstracting those details

[839:05]

away. Similarly Java has references as

[839:07]

some of you might know but does not have

[839:09]

pointers per se. You can't go poking

[839:11]

around arbitrary locations in memory in

[839:13]

the same way that you can with C. So,

[839:16]

let's take some of these data types out

[839:18]

for a spin and see what's the same and

[839:19]

what's different. Let me go back to VS

[839:20]

Code here and let me propose that we

[839:22]

bring back one of our old calculators

[839:25]

from a while back. So, let me clear my

[839:26]

terminal, close hello.py, and let me go

[839:30]

ahead and open up a version of this

[839:32]

program that I brought in advance, which

[839:34]

was our calculator version 0 from back

[839:37]

then. So, just to remind you, one of the

[839:39]

first versions of our calculator had the

[839:42]

CS50 library as well as the standard IO

[839:44]

library. And then we simply got an int

[839:47]

using get_int in week one. We got

[839:49]

another int in week one using get_int.

[839:51]

And then we simply performed some

[839:53]

addition. So it was a very trivial

[839:54]

calculator that we did very early on

[839:56]

just to demonstrate some of the

[839:57]

operators and syntax of C. Well, let's

[840:00]

go ahead and try converting this to

[840:02]

Python by creating our own program

[840:04]

calculator.py. So in my terminal window,

[840:06]

I'm going to write code of uh

[840:09]

calculator.py.

[840:11]

It's going to open another tab which I'm

[840:13]

just going to drag over to the right

[840:14]

just so we can see both side by side. I

[840:16]

won't bother with uh say well let's do

[840:19]

it for parity here. Let me copy the C code

[840:22]

into the Python file even though this

[840:23]

will not work in the same way but let's

[840:26]

keep what we need and get rid of what we

[840:28]

don't. So instead of the // for

[840:31]

comments, in Python it turns out the

[840:33]

convention is to use a single hash

[840:35]

symbol like this. So it's a minor

[840:38]

difference. It's uh half as many

[840:39]

keystrokes. So that's nice, but we're

[840:41]

not going to include anything like this.

[840:42]

But we are going to do from cs50, let's

[840:44]

import a function that I promised would

[840:46]

exist called get_int. But we'll soon get

[840:48]

rid of that training wheel as well. We

[840:50]

don't need main or this curly brace. We

[840:52]

don't need this curly brace. And we

[840:54]

don't need all of this indentation as a

[840:56]

result. So I'm going to move all of that

[840:57]

over to the left. I'm going to fix all

[840:59]

of the comments to be Python comments by

[841:02]

changing the // to hash symbols. And

[841:05]

now I'm going to change each of these

[841:06]

three lines of code, as you might

[841:08]

expect, to the Python version. So you

[841:10]

probably can guess already, we can get

[841:12]

rid of the int there and the int there.

[841:14]

We can get rid of the semicolon here and

[841:16]

the semicolon here. We can get rid of

[841:18]

the f in printf here. And we can get

[841:21]

rid of the semicolon here. And there's a

[841:23]

few different ways we could do this, but

[841:24]

I dare say the simplest is going to be

[841:26]

to get rid of the format code altogether

[841:28]

and that first argument and just tell

[841:30]

Python to print x + y. So, there's a few

[841:33]

different ways we can do this, but

[841:34]

that's probably the most literal

[841:35]

translation of the program at left to

[841:37]

the program at right. Let's reopen the

[841:39]

terminal window and run Python of

[841:41]

calculator.py and hit enter. Let's do

[841:44]

something like x is 1, y is two, and

[841:46]

hopefully we do in fact get three. All

[841:50]

right, so that's all fine and good, but

[841:53]

let's take off one of our training

[841:55]

wheels now. So, let me get rid of our C

[841:58]

version here and focus just for the

[842:00]

moment on Python. Let's take away this C

[842:02]

code. And what was the function we can

[842:04]

use to get user input?

[842:07]

Yeah, it was called... a little louder?

[842:09]

It's just called input. So, let's get

[842:11]

rid of CS50's get_int already and use

[842:14]

input instead. All right. So, this

[842:17]

program is much simpler already. So,

[842:19]

let's go ahead and reopen the terminal

[842:20]

window. Run Python of calculator.py.

[842:23]

Do one again for x, two again for y, and

[842:26]

of course 1 + 2 equals 12.

[842:31]

So what's going on here? Because clearly

[842:35]

this is a step backwards. Yeah.

[842:39]

>> Yeah. So in the context of strings, plus

[842:42]

represents concatenation, the joining of

[842:44]

two arguments on the left and the right

[842:46]

here. That seems to be what's happening

[842:48]

because it's not 12 per se. It's more

[842:50]

literally one two concatenated together.

[842:53]

But why is that? Well, apparently the

[842:54]

input function indeed returns a string.
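
The surprise can be reproduced without any typing. In this sketch the two literals stand in for what input would hand back if the user typed 1 and 2; that substitution is an assumption made so the example runs on its own:

```python
# Stand-ins for the return values of input(): always str, never int.
x = "1"
y = "2"

# With two strings, + means concatenation rather than addition.
result = x + y

print(result)                 # 12
print(type(result).__name__)  # str
```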

[842:57]

That is the key. Those are the

[842:58]

keystrokes that came back from the user.

[842:59]

They might look like numbers and Arabic

[843:01]

numerals to us, one and two, but it's

[843:04]

being treated as a string. More

[843:06]

technically, like, underneath the hood

[843:07]

there is some char star stuff going on

[843:09]

there even though we're not using that

[843:11]

same terminology. So intuitively, what's

[843:14]

going to be the solution

[843:16]

without just reverting to using the

[843:18]

training wheel that is the get_int

[843:20]

function from CS50? Put another way, how

[843:22]

did CS50 probably implement get_int,

[843:25]

might you think?

[843:29]

>> Yeah. So recall that in C we could cast

[843:31]

some data types to other data types.

[843:33]

Typically ints to chars or chars to

[843:35]

ints. It's not quite as simple as

[843:38]

casting in this case because underneath

[843:40]

the hood, thanks to our knowledge of C,

[843:43]

there's a bunch of stuff going on.

[843:44]

There's probably a one and there's a

[843:46]

null character. There's a two and

[843:48]

there's a null character. So it's not

[843:49]

quite as literal as a char to an int or

[843:52]

an int to a char. So, we're going to

[843:53]

more properly convert the string or the

[843:57]

str to an int. We're not casting, but

[843:59]

converting. And converting just implies

[844:01]

that there's a little more work that has

[844:02]

to be done. But thankfully, Python can

[844:04]

do this for us. In fact, let me go up to

[844:06]

line four here and say, uh, pass the

[844:11]

well, actually, let's do this a

[844:12]

couple ways. Let's first convert the x

[844:15]

value to an integer. Let's convert the y

[844:18]

value to an integer as well. So, funny

[844:20]

enough, it's very similar syntactically

[844:22]

to casting, but in C, when you cast

[844:24]

something, you actually wrote the data

[844:26]

type in parentheses. Now, the data type

[844:28]

itself is a function that takes an

[844:31]

argument, which is the str or string

[844:33]

that you want to convert. So, let me go

[844:35]

back to my terminal, do Python of

[844:37]

calculator.py, enter, type in one, type

[844:39]

in two, and now I get back my three

[844:42]

answer. Now, as you might imagine, just

[844:44]

like in C, we can kind of play around

[844:45]

with where we're performing some of

[844:47]

these operations. And this looks, you

[844:49]

know, arguably a little less obvious now

[844:51]

as to what is being added. So I really

[844:53]

like the simplicity of x plus y just

[844:55]

does what it says. So I could convert

[844:58]

these in other ways. I could say after

[845:00]

line four, you know what, change x to

[845:03]

be the int version of x. But generally

[845:06]

speaking, that's kind of wasting a line

[845:08]

of code by just doing something you

[845:09]

could do on a single line. So let me

[845:11]

delete that and instead just say that

[845:13]

well if I know the return value of the

[845:15]

input function is a str, let's just pass

[845:18]

that output as the input to the int

[845:21]

function and it'd be a little more

[845:22]

Pythonic so to speak to just pass the

[845:26]

input function's output as the input to

[845:29]

int which is really hard to say but

[845:31]

we've done this in C just nesting

[845:33]

function calls like this. All right so

[845:35]

if I run this one more time Python of

[845:38]

calculator.py. Type in one. Type in

[845:39]

two. We're back now in business. Now,

[845:42]

what I won't trip over just yet is a

[845:44]

subtlety whereby I'm deliberately

[845:46]

typing in actual numbers like one and

[845:48]

two, but if you are following along at

[845:50]

home or on your laptop, if you were to

[845:51]

type in cat and dog, like bad things

[845:54]

will happen. But we'll come back to that

[845:56]

before long. All right. Questions though

[845:59]

on any of this conversion of our strings

[846:02]

to our

[846:05]

integers in this case? Oh, all right.
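
To recap the conversion in runnable form, here is a small sketch. The helper name add and its parameter names are hypothetical, introduced only so the example does not wait on keyboard input; in the interactive program the same idea would be written by nesting the calls, as in int(input("x: ")), where the prompt text is also an assumption:

```python
def add(x_text, y_text):
    """Convert two strings to ints, then return their sum."""
    return int(x_text) + int(y_text)


print(add("1", "2"))  # 3

# Text that is not a number cannot be converted; int raises ValueError.
try:
    add("cat", "dog")
except ValueError as error:
    print("conversion failed:", error)
```

The try and except here are just one way to show the failure without crashing the sketch.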

[846:08]

Well, what more does Python offer to us?

[846:10]

Well, in addition to these data types,

[846:13]

there's actually going to be a bunch of

[846:14]

others. A few of which we'll actually

[846:15]

use today. In fact, we'll see ranges of

[846:17]

numbers. That's like that's a thing

[846:18]

built into Python. We'll see lists of

[846:20]

numbers, which is going to be like a new

[846:21]

and improved version of an array that

[846:23]

solves like all of last week's problems

[846:25]

when we talked about the downsides of

[846:27]

using arrays. There's going to be tuples

[846:29]

for things like x, y coordinates or GPS

[846:32]

coordinates or anything where you have

[846:33]

collections of values. There's going to

[846:35]

be dicts, or dictionaries, whereby you can

[846:37]

have key value pairs provided to you

[846:40]

without having to write a whole hash

[846:41]

table yourself. And you can have sets

[846:43]

which you can use to just contain unique

[846:45]

sets of values that you just want to

[846:47]

check for membership. And there's

[846:49]

bunches of other data types as well. And

[846:51]

this is where languages like Python

[846:52]

start to get really powerful because all

[846:54]

of the data structures we talked about

[846:57]

in C, we really only got from the

[846:59]

language itself an array. Everything

[847:01]

else we had to build or at least talk

[847:03]

about building in class. These now and

[847:06]

more come with the language. Meanwhile,

[847:08]

in the CS50 library for Python, just so

[847:10]

you know, there are a whole bunch of

[847:12]

functions. These though were the C

[847:13]

versions. In Python, it stands to reason

[847:15]

that we don't need as many because

[847:16]

there's fewer data types in Python, but

[847:18]

get_float, get_int, and get_string do

[847:20]

all exist in the CS50 library for

[847:22]

Python. You're welcome and encouraged to

[847:23]

use it because indeed among the goals

[847:25]

for problem set six are going to be to

[847:28]

redo some of your C problem set problems

[847:32]

in Python where you can look at your own

[847:34]

C code and hopefully um uh you like that

[847:37]

solution and figure out how to convert

[847:39]

it line by line essentially to the

[847:40]

corresponding Python version but clearly

[847:42]

we've seen ways of taking these training

[847:44]

wheels off quite quickly as well and in

[847:47]

fact if you wanted to import all three

[847:48]

of those functions for a larger program

[847:50]

you could do this just following the uh

[847:53]

approach that I took already, but you

[847:55]

can also just separate them by commas

[847:56]

like this. Or it turns out you can also

[847:59]

import the whole CS50 library as you'll

[848:01]

see in some code and then just access

[848:03]

the functions within with slightly

[848:05]

different syntax as well. All right, how

[848:06]

about another construct from Scratch and

[848:08]

from C, now in fact in Python. So in, uh,

[848:12]

Scratch if we wanted to do a comparison

[848:13]

like is X less than Y where each of

[848:15]

those are variables, then say as much

[848:17]

here. In C, it looked like this. And nicely

[848:21]

enough you can probably guess already

[848:22]

what's going to change here. Like,

[848:24]

the f is about to go away, the

[848:26]

backslash n is about to go away, the semicolon

[848:28]

is about to go away, but some other

[848:30]

stuff's about to go away as well. Focus

[848:32]

your attention on the syntax like

[848:33]

parentheses and curly braces, because in

[848:35]

Python, it's just that. So we got rid of

[848:38]

the parentheses because they didn't

[848:39]

really add all that much logically. We

[848:41]

got rid of the curly braces which

[848:42]

technically we could do in C anytime

[848:45]

there's a single line of code inside of

[848:47]

a conditional but for uh consistency

[848:50]

stylistically we always use them as

[848:52]

well. Python though does not have you

[848:55]

use any of those curly braces at all.

[848:57]

But Python requires that you indent your

[849:00]

code properly. So, if you've ever been

[849:02]

among those who are writing out your

[849:03]

program and like everything is just

[849:05]

crazily like left aligned and just a big

[849:07]

mess until style50 swoops in and cleans

[849:09]

it up for you, you're not going to be

[849:11]

able to write Python code like that

[849:12]

anymore. That's been such a societal

[849:14]

problem among programmers, newbies and

[849:17]

professionals alike, that the language

[849:19]

itself requires logically that if you

[849:21]

want this line of code to execute if

[849:23]

this boolean expression is true, you've

[849:25]

got to indent this line by convention

[849:27]

four spaces. You can't be lazy and leave

[849:30]

it all left aligned and sort of fix it

[849:31]

up later. This has made Python code

[849:33]

arguably more readable because of these

[849:35]

language-based requirements. Meanwhile,

[849:37]

let's look at an if else construct in

[849:39]

Scratch which looked a little something

[849:41]

like this. In C, it looked like this,

[849:43]

which is kind of a lot of lines just to

[849:44]

express the simple idea. All of those

[849:46]

same things are going to go away.

[849:49]

Whereby in Python, it looks like this

[849:51]

instead. And the only other difference

[849:52]

worth calling out is that because you

[849:54]

don't have the curly braces, you do have

[849:55]

a colon which precedes the subsequent

[849:58]

indentation as well. Meanwhile, if we've

[850:00]

got an if else if else in Scratch in C,

[850:03]

of course, it looked like this. A lot of

[850:05]

this is going to go away in the flash of

[850:07]

a screen, but there's going to be a

[850:08]

curiosity, which is not in fact a typo.

[850:11]

Notice what happens with the else if.

[850:13]

It's abbreviated elif. And honestly, to

[850:16]

this day, all these years later, I can

[850:17]

never remember if it's elif or elseif

[850:19]

because different languages use

[850:20]

different shorthand spellings of this

[850:22]

phrase. It's elif in Python. Uh because

[850:25]

that's maybe the most succinct you can

[850:26]

make the two words themselves. But

[850:29]

everything else is effectively the same,

[850:31]

including the additional colon this

[850:32]

time. Okay, questions on any of those

[850:36]

conditionals and syntax. Yeah.

[850:38]

>> So, what language did they code Python?

[850:41]

>> What a good question. What language did

[850:43]

they code Python in? The interpreter we

[850:45]

are using within VS code is itself

[850:49]

written in C, aka CPython. However, you

[850:52]

can implement a Python interpreter

[850:53]

really in any language, including machine

[850:56]

code, like raw zeros and ones, if you have

[850:58]

that much free time, in assembly language,

[851:00]

which we saw briefly weeks ago. You

[851:02]

could write an interpreter for Python in

[851:03]

Python if you really want to be meta

[851:05]

about it or in C++ or in Java. This is

[851:08]

the thing about programming languages.

[851:10]

You can use any language to create a

[851:12]

compiler for or interpreter for another

[851:14]

language. What's going to vary is just

[851:16]

how easy or difficult it is and how much

[851:19]

time it therefore takes you. Good

[851:21]

question. Other questions on any of

[851:24]

these here features?

[851:26]

Oh. All right. Well, let's do something

[851:28]

a little bit uh different in Python vis-à-vis

[851:32]

C by opening up maybe a comparison

[851:35]

program that we looked at some time ago.

[851:37]

So, let me go back to VS Code here. I'm

[851:39]

going to close my calculator and I'm

[851:41]

going to open up now from my uh

[851:44]

distribution code today a version of our

[851:47]

comparison program from a while back

[851:48]

which was essentially the uh version

[851:51]

three, zero-indexed, thereof. So this one

[851:53]

has comments which the very first one in

[851:55]

week one did not. But notice as a

[851:57]

refresher what this comparison program

[851:58]

was doing. It was including cs50.h and

[852:00]

stdio.h. It was prompting the user

[852:02]

for two integers via get_int, x and y. It

[852:05]

was then doing a very simple comparison

[852:07]

comparing X against Y to determine if

[852:09]

it's less than, greater than, or dot dot

[852:12]

dot the same as X and uh the same or

[852:16]

equal to the same. So just so that we

[852:19]

can go through the motions of converting

[852:20]

one of these to the other, let's do that

[852:22]

side by side. Let me code a program

[852:24]

called compare.py. Let me close my

[852:27]

terminal. Drag the Python version over

[852:28]

to the right here. And without comments

[852:31]

this time, let's just do from cs50

[852:33]

import get_int. Then below that, let's

[852:36]

do x equals get_int and ask the user for

[852:39]

what's uh x question mark. Then let's

[852:42]

ask the user for y using get_int, quote

[852:45]

what's y question mark. Then below that,

[852:48]

let's do if x less than y colon. Go

[852:51]

ahead and print quote unquote X is less

[852:53]

than Y. Close quote. elif X greater than

[852:58]

Y. Go ahead and print quote unquote X is

[853:01]

greater than Y. Else colon, let's go

[853:05]

ahead and print out quote unquote X is

[853:07]

equal to Y. So I dare say these are now

[853:11]

equivalent. It's clearly fewer lines

[853:13]

because a lot of the lines at left were

[853:14]

admittedly comments, but also some curly

[853:17]

braces. And there's more syntax like

[853:18]

parentheses that we got rid of, too. Let

[853:20]

me open my terminal window. Let me run

[853:22]

Python of compare.py.

[853:24]

We'll type in one and two. One is less

[853:28]

than uh x is less than y. Let's do it

[853:30]

again using two and one. x is greater

[853:33]

than y. Let's do it one last time. One

[853:35]

and one. And of course, those two now

[853:37]

are equal to each other. All right. But

[853:41]

why go down this road again? Because

[853:43]

that was kind of a simple exercise. But

[853:44]

recall that we introduced this

[853:46]

comparison of ints because it was so

[853:48]

sort of stupidly simple. even if the

[853:50]

syntax at that week was completely new.

[853:52]

But we ran into an issue pretty fast

[853:54]

when we started comparing strings. And

[853:56]

that was a problem we really only fixed

[853:58]

in week four when we finally revealed

[854:00]

what a string actually is. If we focus a

[854:02]

bit more on Python strings, it turns out

[854:05]

that we can solve that problem much more

[854:07]

easily in the world of Python. In fact,

[854:10]

let me go back to VS Code here. Let me

[854:12]

close these two versions of int

[854:14]

comparison. Let me open up at left a

[854:16]

version of my program that I brought

[854:19]

with me here that contains a version

[854:22]

from week four wherein we finally revealed

[854:22]

that a string is just a char *. But

[854:26]

recall that the solution in week four as

[854:29]

well as in week one when we first

[854:31]

encountered this problem was to use

[854:33]

strcmp, a function whose purpose in

[854:36]

life is to compare two strings character

[854:38]

by character by character using a for

[854:39]

loop or something like that. But they

[854:41]

have knowledge therefore of how to

[854:42]

navigate pointers, how to look for the

[854:44]

null character, the \0 at the

[854:45]

end. And all of that came from our

[854:47]

friend string.h. Well, how can we go

[854:50]

about implementing the same idea in

[854:52]

Python? Well, let's open up VS Code's

[854:55]

terminal window, open up a new program

[854:57]

called compare.py,

[854:59]

but this time let's get rid of the

[855:01]

integer version thereof. Let's get two

[855:04]

inputs from the user. And I won't even use

[855:05]

any CS50 training wheels. Let's just use

[855:07]

the input function to get S and ask the

[855:09]

user for a value of S. So S colon close

[855:14]

quote with a space T equals input ask

[855:16]

the user for a variable T. And then

[855:19]

let's just ask the question. If S equals

[855:22]

T, then print out quote unquote same.

[855:25]

Else go ahead and print out quote

[855:28]

unquote different. Let me move these

[855:30]

side by side just so you can see the

[855:31]

difference. Notice how much code we have

[855:34]

to write and how much we needed to

[855:36]

understand in order to compare something

[855:38]

as trivial as two strings in C. But in

[855:40]

Python, we're literally just using

[855:42]

equals equals. And let's see if it

[855:44]

actually works. So, Python of

[855:45]

compare.py. Enter. Let's type in maybe

[855:49]

cat for s and dog for t. And those are

[855:52]

in fact different, but we would have

[855:53]

gotten the same answer in C. Let's rerun

[855:55]

Python of compare.py and type in cat.

[855:58]

Type in cat again. And now it's

[856:00]

detecting them as the same. So wonderfully,

[856:02]

Python has solved that seemingly

[856:04]

annoying problem of not taking us

[856:06]

literally like don't compare the pointer

[856:08]

against the pointer. Compare what a

[856:10]

reasonable programmer probably really

[856:12]

cares about the values of those strings.
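
A short sketch of that behavior, using literals in place of the input calls so it runs unattended (the way t is built here is an assumption chosen only to show that two separately constructed strings still compare equal):

```python
s = "cat"
t = "".join(["c", "a", "t"])  # built separately, same characters

# == compares the characters of the two strings, not their addresses.
print(s == t)      # True
print(s == "dog")  # False

if s == t:
    print("Same")
else:
    print("Different")
```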

[856:14]

So the equal equals is doing all of the

[856:16]

for loop or the while loop iterating

[856:17]

over those things character by character

[856:19]

and actually giving us the answer we

[856:22]

want. So what else gets easier in

[856:24]

Python? Well, let's focus a bit more on

[856:26]

these strings. Let me go back into VS

[856:28]

Code here. Let me close out our two

[856:30]

comparison programs and clear my

[856:31]

terminal. And let me go ahead and open

[856:33]

up a prior program that we wrote that

[856:35]

one called agree.c. And namely in the

[856:38]

staff version of the code online, this

[856:39]

was agree2.c, which is where we left

[856:41]

it. Now recall in this C program that we

[856:43]

did the following. We first using CS50's

[856:46]

get_char function prompted the user for

[856:47]

a char hopefully Y or N for yes or no

[856:50]

respectively. And then we used a boolean

[856:53]

expression and actually the combination

[856:55]

of two using the two vertical bars to

[856:57]

ask whether the inputted character is

[856:59]

capital Y or the inputted character is

[857:01]

lowercase Y. And if so, we went ahead

[857:03]

and printed out that the user agreed.

[857:05]

Otherwise, if they type in anything else

[857:07]

for that character, we simply printed

[857:09]

out not agreed. Well, how can we go

[857:11]

about implementing that same program in

[857:13]

Python? For instance, in a file called

[857:14]

agree.py. Well, let me go ahead and open

[857:17]

up my terminal window again. Let's

[857:18]

create a file called agree.py. not pi as

[857:21]

before. Let me go ahead and drag it over

[857:22]

to the right so we can see these two

[857:24]

things side by side. And let me go ahead

[857:26]

and do this. I'm going to set a variable

[857:29]

say called s uh equal to the return

[857:32]

value of input quote unquote do you

[857:34]

agree thereby asking the user the same

[857:36]

question as before. No need to use the

[857:38]

CS50 library because the input function

[857:40]

here suffices. And instead of using C,

[857:42]

I'm deliberately using S because it

[857:43]

turns out in Python, there is no way to

[857:46]

get a single character per se, but you

[857:48]

can get a string that has a single

[857:51]

character. Indeed, char is not a data

[857:52]

type in Python. But once we have this

[857:55]

input from the user, let's now go ahead

[857:57]

and implement a conditional using one or

[857:59]

more boolean expressions. Well, let's

[858:00]

ask if S equals equals quote unquote

[858:04]

capital Y or S equals equals lowercase

[858:08]

Y, then let's go ahead and print out as

[858:11]

before quote unquote agreed. And now

[858:14]

notice what's different this time. I'm

[858:16]

literally using the word or instead of

[858:18]

the two vertical bars because in the

[858:20]

spirit of Python, things tend to be a

[858:21]

little more English-like, a little more

[858:22]

readable, top to bottom, left to right.
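
The or-based check can be sketched like this. The function name agreed and the exact wording of the two messages are assumptions for illustration; the lecture's file reads s with input and prints the result directly:

```python
def agreed(s):
    """Report agreement only for a single Y, in either case."""
    # The keyword or replaces C's two vertical bars.
    if s == "Y" or s == "y":
        return "Agreed"
    else:
        return "Not agreed"


print(agreed("Y"))  # Agreed
print(agreed("y"))  # Agreed
print(agreed("n"))  # Not agreed
```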

[858:24]

And indeed, or hits that nail on the

[858:26]

head. Otherwise, if it is not a capital

[858:29]

Y or a lowercase Y, let's go ahead and

[858:31]

print out quote unquote not agreed. And

[858:34]

that's it for converting this program

[858:36]

from C here into Python. But of course,

[858:39]

this isn't the most robust version of

[858:40]

the program because it would be nice if

[858:42]

the user could type in something like

[858:44]

yes or, uh, y, capitalized maybe in different

[858:47]

ways. So, how might we go about

[858:49]

implementing that? Well, we could do

[858:51]

this in a few ways. I could of course

[858:53]

and let's go ahead and get rid of my C

[858:54]

version now and focus just on the

[858:56]

Python. I could do something like this

[858:58]

and just start or-ing together more

[858:59]

possibilities like or S equals equals uh quote

[859:03]

unquote yes or S equals equals quote

[859:06]

unquote YES, very emphatically, or and so

[859:09]

forth. But you could imagine that this

[859:11]

doesn't scale very well. If I want to

[859:12]

consider all the possible permutations

[859:14]

maybe of the caps lock key being up or

[859:16]

down, that's quite a few possibilities

[859:18]

to enumerate. So perhaps we could do

[859:20]

this a little bit differently. And in

[859:22]

fact, we can by maybe storing all of the

[859:24]

possibilities in a so-called list. So

[859:27]

whereas C had of course arrays, Python

[859:29]

has what are called lists which

[859:30]

effectively underneath the hood are

[859:32]

indeed linked lists as we explored in

[859:34]

week five. Now a linked list of course

[859:36]

can dynamically grow and even shrink.

[859:38]

And that's indeed what Python does for

[859:40]

us. I can simply create a list of values

[859:43]

from the get-go. Or as we'll eventually

[859:44]

see, I can add things to it, remove

[859:46]

things from it, and all of the

[859:47]

underlying memory gets managed for me.
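
As a small illustration of a list growing and shrinking with no manual memory management, here is a sketch. The extra value "yeah" is a made-up example, and append and remove are the standard list methods rather than anything shown in the lecture so far:

```python
# A list created with values known from the get-go.
answers = ["y", "yes"]

# Grow the list by one element, then shrink it again.
answers.append("yeah")
answers.remove("y")

print(answers)       # ['yes', 'yeah']
print(len(answers))  # 2
```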

[859:49]

And in fact, with lists, we get a whole

[859:51]

bunch of features that can make this

[859:52]

possible. But for now, let's use them

[859:54]

simply as statically initialized lists

[859:56]

with values I know from the get-go that

[859:58]

I want. And I'm going to go ahead and do

[859:59]

this in VS Code. I'm going to delete

[860:02]

most of this boolean expression, the

[860:05]

combination of all of those there

[860:06]

phrases. And I'm going to simply say if

[860:08]

S is in using a Python keyword in,

[860:11]

literally the following list of values

[860:13]

quote unquote Y, quote unquote yes. And

[860:16]

for now, I'm going to use just those

[860:18]

two. But let's see how it works. Let me

[860:19]

open up my terminal window again. Let me

[860:22]

run python of agree.py. Really for the

[860:24]

first time, but let me claim that it

[860:26]

would have worked even in the previous

[860:27]

version. Enter. I'm going to go ahead

[860:29]

and type in lowercase y. And I've

[860:31]

agreed. I'm going to go ahead and run it

[860:32]

again and type in lowercase n. And I've

[860:35]

not agreed. I'm going to go ahead and

[860:36]

run it again. And I'm going to type in

[860:38]

all caps, YES, because I really agree.

[860:40]

And yet I don't because there is a bug

[860:42]

still in this version. So even though up

[860:45]

here in my Python implementation I do

[860:47]

have a list of values that I'm looking

[860:48]

for, Python's going to look literally

[860:50]

for those values. So lowercase Y and

[860:53]

lowercase yes. So how can I go about

[860:55]

tolerating different capitalizations by

[860:58]

the user? Well, I can do this in a few

[861:00]

different ways. I could for instance

[861:02]

after getting the user's input in a

[861:04]

variable called S, I could update S to

[861:06]

be s.lower(), which is going to have the

[861:09]

effect of lowercasing the word for me

[861:11]

and then updating the value itself of s

[861:14]

and now I think this will work even for

[861:16]

an uppercase version let me go ahead and

[861:17]

run python of agree.py, emphatically

[861:19]

type in yes enter and yet this time I've

[861:22]

agreed because I forced the user's input

[861:24]

to lowercase and then I have compared

[861:26]

against the canonical forms I've written

[861:28]

which are all lowercase I could have

[861:29]

done the opposite I could have forced

[861:30]

the user's input to uppercase and then

[861:32]

enumerated in my Python list in between

[861:34]

those square brackets uh capital y and

[861:37]

capital YES, but either approach here is

[861:40]

fine now technically I don't need this

[861:42]

additional line here I can go ahead and

[861:44]

delete that line wherein I lowercased it

[861:46]

and in Python I can actually chain some of

[861:49]

these function calls together by saying

[861:51]

input().lower() so that the return value of

[861:53]

input ultimately gets forced to

[861:55]

lowercase by using lower here. Uh

[861:58]

alternatively still I could just

[861:59]

lowercase it at the very moment

[862:02]

I'm actually comparing it and down here

[862:03]

I could do s.lower().

[862:05]

And then compare the lowercase version

[862:07]

of what's going on uh to y or yes. Now

[862:11]

what's really this all about? Well, this

[862:13]

is actually an example of what's

[862:14]

generally known as object-oriented

[862:16]

programming or OOP for short, whereby in

[862:19]

Python and a lot of other languages

[862:21]

now, you can have variables and data

[862:23]

types more generally that have not only

[862:26]

values associated with them like Y or

[862:28]

yes, but also functionality built in. In

[862:32]

other words, whereas in C, we would have

[862:34]

used a function from, like, the ctype

[862:36]

library called toupper or tolower and

[862:38]

we would have passed as an argument to

[862:40]

those functions the very character that

[862:42]

we wanted to force to uppercase or to

[862:44]

lowercase. Well, in Python and indeed

[862:46]

object-oriented programming languages in

[862:48]

general, the developers behind the

[862:50]

language recognize that sometimes

[862:52]

there's functionality that's inherently

[862:54]

related to the values in question. And

[862:56]

indeed, when we're dealing with strings,

[862:58]

it's pretty reasonable to want to

[862:59]

sometimes uppercase them or lowercase

[863:01]

them, capitalize them, or do any number

[863:03]

of other things. And so, built into the

[863:05]

string type in Python is in fact the

[863:08]

lower function itself, as well as a

[863:10]

whole bunch of others. In fact, at this

[863:11]

URL here, you can see the documentation

[863:13]

for all of the string functions built

[863:15]

into Python. More technically, when a

[863:17]

function is built into a data type and

[863:20]

you access it via this dot notation,

[863:22]

instead of by calling some global

[863:24]

function and passing an argument into

[863:26]

it, you are using what are called

[863:27]

methods. So methods are simply functions

[863:30]

that are inside of objects. And in this

[863:32]

case, the object in question itself is a

[863:35]

string. So what's really happening with

[863:37]

this here example when I'm checking

[863:39]

whether the user has agreed or not is

[863:41]

I'm taking that value that string s

[863:43]

which is technically now an object in

[863:45]

memory, and inside of that object is

[863:47]

not only the user's input but some

[863:49]

built-in functionality otherwise known

[863:51]

now as methods and those methods were

[863:53]

written by the same people who invented

[863:54]

the string data type itself. So this is

[863:57]

just the first of these examples, but

[863:58]

we'll see yet others. But notice the

[864:00]

syntax is actually quite similar to C.

[864:01]

Just as in C, when you wanted to go

[864:03]

inside of a structure, you can similarly

[864:05]

go inside of an object in Python and

[864:08]

access not just the values ultimately,

[864:10]

but also these built-in methods.
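To make the agree.py discussion concrete, here is a minimal sketch of the list-plus-method approach. The helper name `agreed` and the exact output strings are assumptions for illustration, and fixed strings stand in for the user's input so the sketch runs without prompting.

```python
def agreed(s):
    """Return True if s is "y" or "yes" in any capitalization."""
    # s.lower() is a method call: the string object produces a
    # lowercased copy of itself, which is then looked up in the list.
    return s.lower() in ["y", "yes"]


# In agree.py the string would come from input(); these are stand-ins.
for s in ["y", "YES", "n"]:
    if agreed(s):
        print("Agreed.")
    else:
        print("Not agreed.")
```

Lowercasing once and comparing against lowercase forms avoids enumerating every capitalization in the list.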

[864:13]

All right, how about another comparison

[864:14]

of C to Python again involving strings?

[864:17]

Well, let me go ahead and reopen and

[864:19]

clear my terminal and close out of

[864:21]

agree.py. Let me go ahead and open up a

[864:23]

version of copying strings from a couple

[864:26]

of weeks back whereby we finally started

[864:28]

solving it correctly by doing some

[864:30]

proper memory management. So here in the

[864:32]

staff version of copy5.c we have not

[864:34]

only a commented version of what we did

[864:36]

a couple weeks back but we also have a

[864:38]

reminder of what was involved in

[864:40]

copying strings in C. Recall for

[864:43]

instance that we prompted the user in

[864:44]

this example using CS50's get_string

[864:46]

function for a string that they wanted

[864:48]

to make a copy of and then we did some

[864:50]

error checking ultimately to make sure

[864:51]

that there was enough memory and nothing

[864:52]

went wrong. Then recall that the right

[864:54]

solution to this problem in C was not to

[864:57]

just use the assignment operator and

[864:59]

assume that S can be copied into T, but

[865:02]

rather to allocate using malloc enough

[865:04]

memory for the copy plus one more byte

[865:07]

for the null character. Again, making

[865:08]

sure that all is well by checking the

[865:10]

return value of that, and then actually

[865:13]

copying character by character by

[865:15]

character the characters from S into the

[865:18]

chunk of memory now known as T or

[865:20]

ultimately recall we used a built-in

[865:22]

strcpy function, which does all of

[865:24]

that looping for us and then when it

[865:26]

came time to capitalize just the copy we

[865:28]

did a quick sanity check: is the length

[865:30]

of t greater than zero otherwise there's

[865:32]

nothing to capitalize and if so go ahead

[865:34]

and use the ctype library's toupper

[865:36]

function passing as input that specific

[865:39]

character t bracket zero and

[865:41]

updating t bracket zero itself. So

[865:43]

here's an example of procedural

[865:45]

programming in contrast with

[865:46]

object-oriented programming. Again, I'm

[865:48]

passing the argument to be uh uppercased

[865:52]

into the toupper function as opposed

[865:54]

to simply going to that character and

[865:56]

asking it via some dot operator to for

[865:59]

instance uppercase itself. Now I went

[866:02]

ahead in the C version and printed out

[866:03]

the two strings. I freed up my copy of

[866:05]

memory that I myself had allocated and

[866:07]

that was it for this program. So, it was

[866:09]

a decent amount of work, recall, in C,

[866:11]

to actually go about just copying a

[866:13]

string. Well, as with so many things in

[866:15]

Python, it's going to be so much easier.

[866:17]

Let me go ahead and do this. Let me open

[866:18]

my terminal window. Let me create a file

[866:20]

called copy.py.
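The walkthrough that follows builds copy.py step by step. As a preview, the finished file might look roughly like this sketch; the helper name `copy_and_capitalize` is hypothetical, and a fixed string stands in for the return value of input() so the sketch runs without prompting.

```python
def copy_and_capitalize(s):
    # Python strings are immutable, so capitalize() returns a new
    # string and leaves s itself unchanged. No malloc, no free.
    t = s.capitalize()
    return s, t


# In copy.py, s would be the return value of input().
s, t = copy_and_capitalize("cat")
print(f"s: {s}")
print(f"t: {t}")
```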

[866:22]

Let me move it over to the right hand

[866:24]

side so we can see them side by side.

[866:25]

Closing my terminal window. And let's do

[866:27]

roughly the same. Let's create a

[866:29]

variable called s. Set it equal to on

[866:31]

the right hand side the return value of

[866:33]

Python's own input function because we

[866:35]

don't really need CS50's own get_string

[866:37]

function, and ask the user for s. Then

[866:40]

let's go ahead and create a second

[866:42]

variable called t. Set it equal to

[866:44]

literally s.capitalize(), whose purpose in

[866:47]

life, if we read Python's documentation

[866:49]

for string methods, will be to uppercase

[866:52]

the first letter of the word that the

[866:55]

user has presumably just typed in. Then

[866:57]

I'm going to go ahead and print out as

[866:59]

before the user's input. And I can do

[867:01]

this in a couple of different ways, but

[867:03]

I'm going to use one of our format

[867:04]

strings and say s colon and then

[867:06]

interpolate that variable s by using my

[867:08]

curly braces to say put the value of s

[867:11]

here. Then I'm going to go ahead and

[867:12]

print out t by saying t colon

[867:14]

interpolate its value here inside of

[867:16]

quotes close parenthesis. So let's see

[867:19]

if this works. Let me go ahead now and

[867:21]

run python of copy.py. I'm going to go

[867:24]

ahead and type in say cat in all

[867:27]

lowercase and hit enter. And now notice

[867:29]

S remains in all lowercase, but the copy

[867:33]

indeed has been capitalized alone. All

[867:36]

right. Well, let's take a look at one

[867:38]

other example involving strings uh

[867:40]

between C and Python equivalents. Uh let

[867:43]

me go ahead and remind us that a few

[867:45]

weeks back too, we created this

[867:46]

uppercase program whose purpose in life

[867:48]

was to prompt the user using get_string

[867:51]

for a string saying here's the before

[867:52]

string. Then it prints out after, because

[867:55]

the purpose in life of this program was

[867:56]

to uppercase all of the characters in

[867:59]

the string, not just capitalize the

[868:01]

first one. So, as you might expect, we

[868:03]

used a loop a few weeks back and we

[868:05]

iterated from zero on up to the length

[868:07]

of the string using ++ to increment i

[868:10]

in each iteration and then each time we

[868:12]

went ahead and printed out one character

[868:14]

at a time. So, strictly speaking, we

[868:16]

didn't change the string from lowercase

[868:18]

perhaps to uppercase. We just changed

[868:20]

each letter to uppercase and printed it

[868:22]

out right away. Well, how might we do

[868:24]

something similar in Python? Well, here

[868:26]

too we have a couple of different

[868:28]

approaches. Let me go ahead and open up

[868:29]

my terminal now. Run uh code of say

[868:33]

uppercase.py.

[868:35]

Close my terminal window and let's drag

[868:36]

this to the right so we can see them

[868:37]

side by side. And let's do roughly the

[868:40]

same. Let me create a variable this time

[868:42]

called before, uh, set that equal to the

[868:45]

return value of input and just prompt

[868:47]

the user for that before string. Then

[868:50]

after that, let's go ahead and print out

[868:51]

preemptively after colon space space

[868:54]

just to align everything nicely. But let

[868:56]

me not print a new line yet because I

[868:58]

want to go ahead and see uh the

[869:00]

following string on that same line. And

[869:02]

then let's go ahead and do this

[869:04]

analogously to the C version first, but

[869:06]

then tighten things up. Here's how we

[869:08]

can iterate in Python over every

[869:10]

character in a string. I don't need to

[869:11]

bother with I and indexing into the

[869:13]

string or anything like that. I can

[869:15]

using a Python for loop simply say for

[869:17]

each character C in that string called

[869:20]

before go ahead and print out the

[869:23]

uppercase version of that character. But

[869:26]

don't yet print out a new line. But at

[869:28]

the very end of this loop, go ahead and

[869:30]

print out nothing but a new line. Let me

[869:32]

go ahead and open my terminal. Run

[869:34]

Python of uppercase.py.

[869:36]

Enter. Type in cat in all lowercase.

[869:39]

Cross my fingers. And after, each and

[869:41]

every one of the characters is

[869:43]

uppercased. And what's nice about this,

[869:44]

if nothing else, is that this for loop

[869:46]

in Python there on line three is pretty

[869:48]

elegant, whereby you implicitly get

[869:51]

access to each character in the string

[869:52]

because that's how Python knows how to

[869:55]

iterate over a string object. But it

[869:57]

turns out we don't have to do this quite

[869:59]

as analogously in Python as we did in C.
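The character-by-character version of uppercase.py described above might look like this sketch. A fixed string stands in for input(), and the exact label text is an assumption.

```python
before = "cat"  # stand-in for the return value of input()

# Print the label without a trailing new line.
print("After:  ", end="")

# A for loop over a string yields one character at a time.
for c in before:
    print(c.upper(), end="")

# Finish the line with nothing but a new line.
print()
```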

[870:01]

We don't have to do it character by

[870:02]

character in so far as Python is

[870:04]

object-oriented and these strings are

[870:06]

objects and those objects have methods.

[870:09]

Those methods will actually operate on

[870:11]

the entire string at once unlike the

[870:14]

more pedantic work we had to do

[870:15]

character by character in C. So in fact

[870:17]

let me go ahead and close the C version

[870:19]

here uh clear my terminal and hide it

[870:22]

and let's go ahead and make this quite

[870:23]

simpler. Let's get rid of the for loop

[870:26]

altogether, and let's simply, and let's

[870:28]

get rid of that print statement

[870:30]

altogether, leaving only the before

[870:32]

variable and getting the user's input.

[870:34]

And now let's create an after variable.

[870:36]

Set it equal to before.upper(), thereby

[870:39]

uppercasing the entire string called

[870:41]

before and setting the return value to

[870:44]

the after variable. And then let's go

[870:46]

ahead and print using our old friend

[870:48]

f-string, uh, after colon, uh, space and then

[870:51]

interpolate the value of that after

[870:54]

version. So now we're down to just three

[870:56]

lines at that. Let me go ahead and

[870:57]

reopen my terminal. Python of

[870:59]

uppercase.py, Enter. Type in cat in all

[871:02]

lowercase. And voila. Now I have

[871:04]

capitalized the cat all at once.
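The shorter version of uppercase.py can be sketched as follows, again with a fixed string standing in for input().

```python
before = "cat"  # stand-in for the return value of input()

# upper() operates on the entire string at once and returns a new
# string; before itself is left unchanged.
after = before.upper()

print(f"After: {after}")
```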

[871:08]

All right. Before we take a break for

[871:10]

some uh fruit by the foot, let's go

[871:12]

ahead and take a look at Python's

[871:14]

implementation of loops further. So in

[871:17]

Scratch, recall that we implemented a

[871:19]

loop with something like this. If I

[871:20]

wanted to meow three times on the

[871:21]

screen, I would literally use a repeat

[871:23]

block. In C, it was a little clunkier to

[871:25]

mimic that same idea. Like we could

[871:27]

implement a variable uh called I and set

[871:29]

it equal to zero. Then we could ask a

[871:31]

boolean expression, is I less than

[871:33]

three? If so, print meow and then

[871:35]

increment i using our old ++ friend,

[871:37]

which in Python is now gone. In Python,

[871:39]

we can do this almost the same except I

[871:42]

don't think we need the data type. I

[871:44]

don't think we need the semicolon. We

[871:45]

don't need the parenthesis. While still

[871:47]

exists, we don't need the curly braces.

[871:49]

And we can't use the ++. We don't

[871:51]

need the f in printf. I mean, we're mostly just

[871:52]

trimming clutter from this here

[871:54]

implementation. So, this is the C

[871:56]

version. This now is the Python version.

[871:58]

a little tighter, a little easier to

[872:00]

read. It's pretty much the minimal

[872:01]

syntax available to get the job done.

[872:04]

So, how can we actually have a cat meow

[872:06]

in this case? Well, let me go into VS

[872:09]

Code and I'll stop doing everything side

[872:11]

by side and just stipulate that we've

[872:13]

done most of these examples previously

[872:14]

in C. And in my first cat, well, I could

[872:18]

certainly do it the easy way. And let me

[872:19]

go ahead and create cat.py. And like we

[872:22]

always started in the past with, I could

[872:24]

just do meow and then our old friend copy

[872:26]

paste. And this of course was bad for

[872:27]

bunches of reasons, but it gets the job

[872:29]

done. In Python, if I want to do this,

[872:31]

well, I can just borrow that same

[872:32]

inspiration and I could say set I equal

[872:35]

to zero, then do while uh I is less than

[872:39]

three colon, then go ahead and print out

[872:42]

meow and then go ahead and do I equal or

[872:45]

rather i += 1, is maybe the most

[872:48]

succinct way to express that same idea.
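Written out, the while-loop version of cat.py described above is roughly this sketch.

```python
i = 0
while i < 3:
    print("meow")
    i += 1  # Python has no ++ operator, so += 1 is used instead
```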

[872:51]

All right, just to confirm that this

[872:52]

works, Python of cat.py. Enter. Meow

[872:55]

meow meow. All right. So, how else can

[872:57]

we do this? And how can we do this more

[872:59]

Pythonically? This is perfectly correct.

[873:03]

Many people might implement it this way,

[873:05]

but it's not quite as succinct as we

[873:08]

could alternatively do in Python. Yeah.

[873:14]

>> Yeah. So, we could maybe use a for loop.

[873:16]

And in fact, let's let's go there

[873:18]

because we don't quite have the same

[873:19]

types of for loops in Python as we did

[873:21]

in C. while loops are essentially the

[873:23]

same, but for loops are actually a

[873:25]

little bit different and actually a

[873:26]

little bit better. So, let me go into my

[873:28]

code here, delete all four of these

[873:30]

lines, and literally just say for i in

[873:34]

this list of values 0, 1, and 2, colon

[873:38]

print meow. In other words, in for

[873:41]

loops in Python, you don't have the

[873:42]

parentheses, you don't have the two

[873:43]

semicolons, you don't have the

[873:44]

initialization and the boolean

[873:46]

expression and the update. You just say

[873:48]

a little more English-like for each I in

[873:51]

the following list or for each value of

[873:54]

I in the following list. And what Python

[873:55]

will do for us is automatically on the

[873:57]

first iteration set I equal to zero. On

[873:59]

the second iteration set I to one on the

[874:01]

third iteration set I to two and then

[874:03]

there's only three things in the list.

[874:05]

So that's it. And so just as before with

[874:07]

the Y and the yes example where I use

[874:09]

square brackets similar to arrays in C,

[874:12]

I was using a Python list of strings in

[874:15]

that case. Here I'm using a Python list

[874:17]

of integers 0, one, and two. And they're

[874:20]

integers in the sense that they have no

[874:22]

quotes around them. So they're obviously

[874:23]

not strings. And I'm printing out meow

[874:25]

this many times. And indeed, if I do

[874:26]

Python of cat.py again, I get meow meow

[874:29]

meow. This is correct. This is arguably

[874:32]

better, at least in the sense that it's

[874:34]

two lines of code instead of four. And

[874:36]

it's arguably more readable as well. But

[874:40]

what do you not like about this perhaps

[874:43]

even if you're only seeing it for the

[874:44]

first time?

[874:49]

>> Yeah, it's going to be a lot more

[874:50]

difficult to do things more than three

[874:52]

times, because recall that in

[874:54]

Scratch at least, and in C, we had the

[874:56]

ability to either express ourselves

[874:58]

literally or at least in C, we could

[874:59]

just change that three to any number we

[875:01]

want. 30, 300, no big deal. It's a super

[875:04]

simple change, even though it was kind

[875:05]

of annoying to type all of this out.
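For reference, the hard-coded-list version of the loop might be sketched like this. The `seen` list is an assumption added only to make visible which values i takes on.

```python
seen = []
for i in [0, 1, 2]:
    seen.append(i)  # i is 0, then 1, then 2
    print("meow")

print(seen)  # [0, 1, 2]
```

Meowing 300 times this way would mean typing 300 values into the list, which is the scaling problem raised here.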

[875:07]

Well, in Python, yeah, I could do this

[875:09]

and say for i in 0, 1, and 2 just to

[875:12]

mimic the numbers that we'd be setting I

[875:14]

equal to in the C version. Frankly, this

[875:16]

can be any list. It could be 1, 2, 3; 4, 5, 6;

[875:20]

uh cat, dog, bird, or any three things

[875:22]

whatsoever. But I'm just using 0 1 and

[875:24]

two for consistency with the way C would

[875:27]

have done it. But slightly better than

[875:28]

this is to use one of those other data

[875:30]

types that was briefly on the screen

[875:32]

earlier. We have not just floats and

[875:34]

ints and strs and lists and tuples. We

[875:38]

also have what are called ranges. And

[875:40]

range is not only a data type in Python,

[875:42]

but more literally a function that you

[875:44]

can call to get a range of values from

[875:47]

zero on up. So I can change this list of

[875:49]

three values to a function call to a

[875:52]

function called range. Pass in how many

[875:54]

things I want and by default, per the

[875:55]

documentation, I'll get back a list of

[875:57]

numbers 0, 1, and two. And nicely,

[876:00]

Python's pretty smart about this. It

[876:01]

technically doesn't hand you back all of

[876:03]

the numbers at once, whether it's three

[876:04]

or 30 or 300 or 3 million. It sort of

[876:07]

hands them back to you one at a time. So

[876:09]

you're not using more memory just

[876:11]

because you're doing more iterations. So

[876:13]

now if I do want to iterate four times,

[876:15]

five times, 30 times, 300 times. I again

[876:18]

can just change the single value. And if

[876:20]

you want to be fancy too, you can skip

[876:22]

numbers. You can go count all the way

[876:24]

through odd numbers or even numbers. You

[876:26]

can change the incrementation factor.
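A short sketch of range as described above. The particular start, stop, and step values are assumptions chosen for illustration, and list() is used only to display what a range produces.

```python
# Count from 0 up to, but not including, 3.
for i in range(3):
    print("meow")

print(list(range(3)))         # [0, 1, 2]
print(list(range(0, 10, 2)))  # even numbers: [0, 2, 4, 6, 8]
print(list(range(1, 10, 2)))  # odd numbers:  [1, 3, 5, 7, 9]

# A range hands back values one at a time as they are needed, so no
# three-million-element list is built here.
big = range(3_000_000)
print(len(big))  # 3000000
```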

[876:27]

But the default and the most canonical

[876:29]

is indeed just to count up like that. So

[876:31]

if I go back to VS Code here and improve

[876:33]

this, I can change that hard-coded list

[876:36]

to just range of three, clear my

[876:38]

terminal, run this cat one more time,

[876:40]

and now I'm back in business as well. In

[876:43]

fact, this is so common. Let me throw up

[876:45]

one alternative to this. You'll notice

[876:47]

that in the previous example, both in VS

[876:49]

Code and on the screen, um I am not

[876:53]

actually using I in any way. In fact, if

[876:56]

you look back at how we converted the

[876:58]

Scratch to Python code, I'm using I

[877:00]

because when you use a for loop in

[877:02]

Python, you have to give it a variable

[877:05]

in some list or range of values. That's

[877:08]

just the way it is. But I'm technically

[877:09]

not using or printing I anywhere. And

[877:12]

that's fine. And so it's arguably

[877:14]

Pythonic, too. If you have a variable

[877:16]

out of necessity, but you're not

[877:17]

actually going to use it for anything

[877:19]

useful, just call it an underscore

[877:21]

instead. And even though this is weird

[877:22]

looking, an underscore is a valid symbol

[877:25]

for a variable name in Python. So it is

[877:27]

Pythonic to just use this just to signal

[877:30]

to yourself later and to colleagues that

[877:32]

yeah, I'm using a variable because I

[877:33]

have to, but it's not one I'm actually

[877:35]

going to use elsewhere. It's a minor

[877:37]

subtlety and not strictly uh necessary,

[877:40]

but perhaps commonly done. All right,

[877:43]

how about a couple final versions of

[877:46]

cats then? So recall that if we wanted

[877:48]

to do something in Scratch forever, we

[877:50]

had a forever block which literally did

[877:52]

that. Well, in C, we couldn't quite

[877:54]

translate that literally. So the closest

[877:57]

uh approximation was probably this while

[877:59]

true, whereby you have a boolean

[878:01]

expression that by definition is always

[878:02]

true. So the loop is never going to

[878:04]

stop, thereby infinite. If you wanted to

[878:06]

print out meow meow meow on the screen,

[878:08]

ad nauseam. In Python, you can do it

[878:10]

almost the same, but the curly braces

[878:12]

are about to go, the f is about to go,

[878:13]

the backslash n, the semicolon, and the

[878:15]

parenthesis. But for whatever reason, in

[878:19]

C, we lowercase true and false. In

[878:21]

Python, we capitalize true and false.

[878:23]

So, a minor subtlety, but it's now

[878:25]

indeed capital T, but the indentation

[878:27]

has to be the same and the colon has to

[878:29]

be there as well. So, with that, we can

[878:32]

of course induce intentionally or

[878:34]

otherwise some infinite loops. As with

[878:37]

C, you can break out of them if need be

[878:39]

with control C to interrupt the process.
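The underscore convention and the while True loop discussed above can be sketched together. The counter and the break are assumptions added so that this sketch terminates on its own; the lecture's version loops until interrupted.

```python
# An underscore signals that the loop variable is never used.
for _ in range(3):
    print("meow")

# True is capitalized in Python. This loop would run forever if
# nothing broke out of it.
count = 0
while True:
    print("meow")
    count += 1
    if count == 3:
        break  # added here only so the sketch stops
```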

[878:42]

But let's just see lastly with this cat

[878:44]

how we can make it a little more

[878:46]

abstract like the final versions of our

[878:48]

cat in Scratch and C. So let me propose

[878:51]

to open up here uh in a pro version of

[878:55]

cat that we looked at that we wrote in

[878:57]

the past. Uh it was version 12 at the

[879:00]

time which looked a little something

[879:01]

like this. This was one of the final

[879:03]

versions of our cat in C that simply

[879:06]

allowed me in main to call a meow

[879:08]

function that took an argument which is

[879:10]

the number of times I wanted to meow.

[879:12]

This in C is how we implemented that

[879:14]

helper function so to speak that

[879:16]

returned nothing. So its return type was

[879:18]

void but it did take an integer called n

[879:21]

as its input. And then there was a for

[879:23]

loop inside of there that printed meow

[879:25]

that many times. So long story short,

[879:28]

this was how both in Scratch and in C we

[879:30]

invented our own functions. Well, how

[879:32]

can we do this now in Python? Well, let

[879:34]

me bring this version of cat over to the

[879:36]

right here. Delete that previous

[879:37]

version. And let me propose that we do

[879:41]

this. For I in range of three, let's go

[879:45]

ahead and assume for the moment that

[879:46]

there is a meow function in Scratch

[879:48]

whose purpose in life is to just meow on

[879:50]

the screen. Well, that of course does

[879:52]

not exist. So, in Python, I'm going to

[879:54]

use a trick that allows me to define my

[879:57]

own function. And the keyword for this

[879:59]

is literally def, for define, the name

[880:01]

of the function and then parenthesis if

[880:04]

it takes no arguments. You don't need

[880:05]

the void keyword even if it takes no

[880:08]

inputs. So let's do a simpler version of

[880:10]

the cat first that takes no arguments

[880:11]

and then we'll add back that argument.

[880:13]

How do how does a cat meow? It literally

[880:15]

just says meow on the screen. So already

[880:18]

we seem to see an improvement. I've got

[880:19]

like four lines of actual code here

[880:21]

versus like 20 or so on the left-hand

[880:23]

side. Let's go ahead and run Python of

[880:25]

cat.py.

[880:27]

Enter. And we see the first of our

[880:30]

errors which is remarkable because

[880:31]

usually I would have messed up by now.

[880:34]

So here we have in Python the equivalent

[880:36]

of like a compiler error message. The

[880:38]

program has not run. It's tried to run.

[880:41]

It's tried to be interpreted but it

[880:42]

encountered some error. These are

[880:43]

generally called tracebacks in the

[880:45]

sense that you see a trace back in time

[880:47]

of everything the program was trying to

[880:49]

do just before it failed. So if you've

[880:51]

called a function which called a

[880:52]

function which called a function, you'd

[880:53]

see all of those function calls on the

[880:55]

screen. I've just tried to call one

[880:56]

function. So, it's a relatively short

[880:58]

error. This is clearly a problem. And

[881:00]

here's the type of problem. NameError.

[881:02]

The name Meow is not defined.

[881:05]

So, intuitively, even if you're seeing

[881:06]

Python for the first time, why is

[881:09]

meow not defined even though it's

[881:10]

literally defined right there? Yeah.

[881:16]

>> Yeah. As smart as Python is vis-à-vis C, it's

[881:19]

still kind of naive in that meow doesn't

[881:22]

exist until line four. So, if you try to

[881:24]

use it on line two, too soon. All right.
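The ordering problem can be reproduced in a small sketch. The try/except is an assumption added so the sketch keeps running and can show the error message; in the lecture the program simply stops with a traceback.

```python
try:
    for i in range(3):
        meow()  # too soon: meow has not been defined yet
except NameError as error:
    message = str(error)
    print(message)


# By the time Python reaches this line, the loop above has already failed.
def meow():
    print("meow")
```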

[881:26]

So, in C, we fix this problem by

[881:29]

initially just kind of hacking things

[881:32]

together by just all right, well, let's

[881:33]

just define it up here and then move

[881:35]

that down there. And that's totally

[881:37]

reasonable. And in fact, if I clear my

[881:39]

terminal and rerun Python of cat.py,

[881:41]

we're back in business. But I'd argue

[881:43]

you can only do that so many times,

[881:45]

especially once you've got a bunch of

[881:46]

functions. You don't want to relegate

[881:48]

like the main part of your program,

[881:50]

which really this loop is, to the very

[881:51]

bottom of the screen, if only because

[881:53]

like that's the first thing you care

[881:54]

about. I want to see at the top of the

[881:55]

screen. And that's the whole point of

[881:56]

putting main at the very top. So what

[881:58]

was the solution in C? The solution in C

[882:01]

was to put the prototype for the

[882:02]

function at the top of the file. That

[882:05]

though is not a thing in Python. You

[882:07]

don't just copy that first line of code,

[882:09]

put it at the top of the file, add a

[882:10]

semicolon, and then it works. Instead,

[882:14]

the Pythonic way to solve this problem

[882:16]

for better or for worse is to actually

[882:18]

put your code in a main function. Main

[882:22]

in Python has no special significance in

[882:25]

this sense. It's just convention to

[882:27]

borrow the name that so many other

[882:29]

languages use as the main function in

[882:31]

those languages. But you just wrap your

[882:33]

function in a function main so that

[882:35]

you're defining main then you're

[882:37]

defining meow before you're actually

[882:40]

using the meow function per se. But I

[882:43]

have made a mistake. If I run Python of

[882:46]

cat.py now, cross my fingers for good

[882:49]

measure. And now the program does

[882:52]

nothing.
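A sketch of cat.py at about this point. As written in the lecture so far, the file only defines its two functions, which is why nothing is printed. The last line, calling main, is included here so the sketch actually runs.

```python
def main():
    for i in range(3):
        meow()


def meow():
    print("meow")


# Without this call, Python defines main and meow but never runs them.
main()
```

By the time main is called on the last line, both functions exist, so the order in which they were defined no longer matters.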

[882:54]

Why is that?

[882:58]

Yeah. Why is that?

[883:01]

>> Oh, sorry. Go ahead.

[883:04]

>> Yeah, curiously, I never called the main

[883:06]

function. So whereas in C and in Java

[883:09]

and C++ and a bunch of other languages,

[883:11]

main is special. Like main is the

[883:13]

function by definition that is

[883:15]

automatically called. Python has no such

[883:18]

special magic. It's not going to call

[883:19]

main for you just because you created

[883:21]

it. In fact, I didn't even have to call that

[883:22]

main function main. It's just a

[883:24]

convention. But the solution is exactly

[883:27]

that. Well, if the problem is that main

[883:29]

wasn't called at the bottom of this

[883:30]

file, what I can do is just literally

[883:32]

call main, which we would never have

[883:34]

done in C, but this is conventional to

[883:36]

do in Python. So that after you've

[883:39]

defined main up here and then define

[883:41]

meow down here now you can call main

[883:44]

which in turn will call meow but at that

[883:47]

point in the story both of those

[883:49]

functions exist. So if I go

[883:51]

down here and run cat.py again now I see

[883:55]

my meow meow meow. Now let me add one

[883:58]

final flourish because this version of

[883:59]

the code in C recall actually let me

[884:02]

specify how many times I want to meow

[884:04]

whereas here I actually have my for loop

[884:07]

in main at the right and I'm calling

[884:10]

meow that many times. Well, what if I

[884:11]

want to get rid of this loop over here

[884:14]

and de-indent meow here and pass in

[884:17]

literally the number three here. Well,

[884:19]

in Python, you can just say inside of

[884:22]

the definition of a function that it

[884:23]

takes an argument like n. You don't have

[884:26]

to specify the data type. Python's smart

[884:28]

enough to figure it out. Then in your

[884:30]

function, you can use that as with for i

[884:32]

in range of n. Go ahead and print meow.

[884:36]

So now the right-hand version of this

[884:38]

program is pretty much equivalent to the

[884:41]

left-hand version of this program, as

[884:43]

always using fewer lines of code. Let me

[884:45]

go ahead and run python of cat.py. Meow.

[884:47]

Meow. Meow. We're good. And then let me

[884:49]

make one final change if only because

[884:51]

most every documentation you see online

[884:53]

or website tutorials on Python will

[884:55]

actually have you not just literally

[884:57]

call main at the bottom but you'll do

[884:58]

this crazy syntax that solves a

[885:01]

problem that we won't trip over in this

[885:02]

class but typically it's Pythonic to

[885:05]

actually call main after asking the

[885:06]

question if __name__

[885:09]

equals equals quote unquote

[885:14]

__main__. This is a stupid mouthful of code

[885:16]

that even I had to think about when I

[885:18]

was typing it out if I got all the

[885:19]

underscores correct. But long story

[885:21]

short, this convention of using a

[885:23]

conditional before you call main allows

[885:26]

you to write more modular code in Python

[885:29]

so that some of your files don't

[885:31]

actually do anything other than define

[885:33]

functions that you

[885:35]

can then import into other files you

[885:37]

write. So in short, this is the right

[885:39]

way to do it. Even though in CS50 it is

[885:41]

unlikely that we are to trip over this

[885:43]

bug. Questions now on that last piece of

[885:47]

how we define functions in Python. Yeah.
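
Here is a minimal sketch of where cat.py ends up. It is reconstructed from the narration rather than copied from the screen. main is defined first, meow takes n without any declared type, and main is only called at the bottom, behind the __name__ check.

```python
def main():
    # By the time main actually runs, meow has already been defined below.
    meow(3)


def meow(n):
    # n needs no declared type; range(n) yields 0, 1, ..., n - 1.
    for i in range(n):
        print("meow")


# Only call main when this file is run directly (python cat.py),
# not when it is imported by another file.
if __name__ == "__main__":
    main()
```

If another file did import cat, the two functions would be defined but main would not run on its own, which is the modularity the conditional buys you.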

[885:53]

>> Ah good question and good eye. Why do I

[885:55]

have two lines between my functions in

[885:57]

Python? As you will see via style50, it

[886:00]

is Pythonic, that is, Python convention, to

[886:02]

separate functions in your code by two

[886:05]

lines. Whereas there is no such

[886:06]

convention in C. So I'm trying to be

[886:08]

consistent with what the world does.

[886:10]

Yeah.

[886:15]

>> If you want to count backwards in a

[886:16]

loop, can you do that? Absolutely. You

[886:18]

could use the range function in a

[886:19]

different way. Start with

[886:21]

a much larger value and count down.

[886:24]

But you could alternatively do that with

[886:25]

a while loop. I would say that yeah, you

[886:27]

can make that work, but you shouldn't.

[886:29]

People just don't do that unless it

[886:30]

does actually solve a problem for you.
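
One way to count down with range, as a sketch: the three-argument form range(start, stop, step) with a negative step. The helper name countdown is made up for illustration and is not from the lecture.

```python
def countdown(start):
    # range(start, 0, -1) yields start, start - 1, ..., 1 (the stop value 0 is excluded).
    return list(range(start, 0, -1))


for i in countdown(3):
    print(i)
```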

[886:33]

Other questions on this?

[886:36]

All right. Well, when we looked at C,

[886:37]

recall there was a bunch of things that

[886:39]

ultimately like we couldn't do well. We

[886:41]

ran into issues of like floating

[886:42]

point precision and integer overflow and

[886:44]

truncation and like all of these real-world

[886:46]

problems. Um, there's still going to be

[886:48]

some of those, but first let's take a

[886:50]

fruit by the foot break and we'll be

[886:51]

back in 10. Help yourself to seconds

[886:53]

today.

[886:55]

All right, so we're back and let's use

[886:57]

our remaining time together to focus not

[886:59]

only on some of the problems that Python

[887:01]

can solve more readily than C, but also

[887:03]

some of the problems that remain. So

[887:05]

here was a program early on in our

[887:08]

discussion of C that had this weird bug

[887:10]

whereby when we implemented a relatively

[887:12]

simple calculator to divide two numbers

[887:14]

x / y. We experienced what we called

[887:17]

truncation at the time whereby 1 / 3 was

[887:20]

curiously zero and like something like 4

[887:23]

/ 3 was curiously one and we were losing

[887:27]

everything after the decimal point. And

[887:29]

this was true even if we tried using

[887:31]

floats because with truncation recall

[887:34]

everything after the decimal point with

[887:36]

integer math is simply discarded. So if

[887:39]

you do int divided by int you're going

[887:41]

to lose what is after the decimal point.

[887:43]

So let's take a look in Python at

[887:45]

whether this is still actually a

[887:47]

problem. So let me go back into VS Code

[887:49]

here. We'll close out the C version

[887:51]

thereof and let's go ahead and create

[887:53]

our own program called calculator.py.

[887:56]

And in this version, let's modify the

[887:58]

original, which just did some addition,

[888:00]

and instead have it do some division

[888:03]

instead. I'll get rid of my outdated

[888:05]

comments and perform now division

[888:07]

instead of uh addition by doing x / y.

[888:12]

Python of calculator.py, let's try one

[888:14]

and let's try three. And oh, our

[888:17]

fractions are actually back. So it turns

[888:19]

out in Python, even when you're

[888:21]

manipulating integers, if you divide one

[888:24]

by the other, and the result logically

[888:26]

should actually be a floating-point value,

[888:28]

that's what in fact you're going to get

[888:30]

back. And you don't have to jump through

[888:31]

the same hoops that we did before to

[888:33]

actually force things to floats and then

[888:34]

do floating-point arithmetic and so forth.

[888:36]

In fact, if you want the old behavior,

[888:38]

it's still actually there. And you can

[888:40]

use two slashes in Python to use the old

[888:42]

integer division as opposed to what

[888:44]

we're seeing here. But a typical

[888:46]

programmer I dare say nowadays would

[888:48]

want it to behave in exactly the same

[888:49]

way. So truncation seems to be less

[888:52]

therefore of an issue for us. All right.
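
A small sketch of the two division operators. The values are hard-coded here rather than read from the user as calculator.py does. One caveat worth hedging: // is floor division, so it matches C's integer division for positive operands but rounds toward negative infinity for negative ones.

```python
x = 1
y = 3

quotient = x / y   # true division: always produces a float
floored = x // y   # floor division: produces an int when both operands are ints

print(quotient)
print(floored)

# Unlike C, // rounds toward negative infinity rather than toward zero.
print(-7 // 2)
```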

[888:55]

Well, what other problems did we

[888:56]

encounter at the time? Well, recall we

[888:58]

had issues of floating point imprecision

[889:00]

whereby even when we divided something

[889:02]

simple like one divided by three and in

[889:04]

grade school we learned that was like

[889:06]

0.333

[889:08]

repeating infinitely many times, we

[889:09]

started seeing weird numbers that were

[889:11]

not three at the end of that value back

[889:13]

in the day in C. Unfortunately, that's

[889:16]

a problem that's still with us. In fact,

[889:17]

if I use this same program here, let me

[889:20]

go into VS Code and instead of printing

[889:22]

out just X / Y, let's go ahead and do

[889:25]

this temporarily. Let me give myself a

[889:27]

variable called Z and set it equal to X

[889:29]

/ Y only because it'll be a little

[889:32]

easier to see the formatting trick I'm

[889:34]

going to use. Let's go ahead and print

[889:36]

out a format string that prints out Z.

[889:38]

And for the moment, let me just claim

[889:40]

that this is going to do the exact

[889:41]

same thing. It's just completely

[889:42]

gratuitous that I'm using an f-string

[889:44]

now as opposed to just printing out Z.

[889:46]

But if I do 1 / 3, we're still seeing

[889:49]

0.333.

[889:50]

But we're only seeing just over 10 or so

[889:53]

digits here. What if we want to see like

[889:55]

50 digits and really start poking around

[889:57]

at what's being represented? Well, the

[889:59]

syntax is a little weird, but in Python,

[890:02]

using an f-string, you can do tricks

[890:04]

similar to what we did with the percent

[890:05]

f with printf in C. And if after my

[890:08]

variable's name in this uh set of curly

[890:11]

braces, I do a colon and then a dot

[890:13]

because I want to see numbers after the

[890:15]

decimal point and say something

[890:16]

arbitrary like show me 50 digits after

[890:19]

the decimal point and treat this as a

[890:20]

float. This is a crazy incantation I do

[890:23]

think, of a format string; even I am sort

[890:25]

of cheating off of the paper in front of

[890:26]

me but this is how you format strings if

[890:29]

you want to see them with a little uh

[890:31]

more precision or so I think. If I rerun

[890:34]

Python of calculator.py and do one

[890:36]

divided by 3. Darn it, we're still in

[890:38]

the same mess that we were before. Now,

[890:40]

why is this? Well, it's still the case

[890:42]

that I'm running the code on the same

[890:43]

kinds of computers that I did before.
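
A sketch of the formatting trick just described, with 1 and 3 hard-coded instead of prompted for. The second half is an assumption on my part, not something shown in the lecture: Python's standard-library decimal module is one way to ask for more digits than a 64-bit float can hold.

```python
from decimal import Decimal, getcontext

z = 1 / 3

# The colon starts the format specification: .50f means
# "50 digits after the decimal point, formatted as a float".
text = f"{z:.50f}"
print(text)

# Assumption: the decimal module as one higher-precision alternative.
getcontext().prec = 50  # 50 significant digits
precise = Decimal(1) / Decimal(3)
print(precise)
```

The first line printed stops being all threes after roughly 16 digits, while the Decimal result keeps its threes for all 50.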

[890:45]

It's still the case that these computers

[890:46]

only have a finite amount of memory. And

[890:48]

so, even though I'm manipulating clearly

[890:50]

floating-point values, Python is only

[890:52]

allocating, say, 64 bits to those float

[890:56]

variables. And so, there's only so much

[890:59]

precision that's possible. And so what

[891:00]

we're seeing is essentially the closest

[891:03]

representation to an infinite number of

[891:05]

threes that we can represent using

[891:07]

binary using a floating-point

[891:08]

representation therein. So still a

[891:10]

problem but I do think in Python you'll

[891:13]

find that there's so many more libraries

[891:15]

out there, third-party software that comes

[891:17]

not just with the language itself but

[891:18]

from others whereby you can use uh

[891:20]

libraries for more precise scientific

[891:22]

computing that essentially implement

[891:24]

their own versions of floating-point

[891:25]

values so that you can use not 64 but

[891:27]

128 or more bits than that when it

[891:30]

really matters to some level of

[891:32]

precision. Thankfully though one problem

[891:35]

is at least solved for us namely integer

[891:37]

overflow. So recall that this was

[891:39]

another problem we ran into whereby if

[891:41]

you try counting higher than say 4

[891:43]

billion or even higher than 2 billion if

[891:45]

you're representing negative numbers

[891:46]

which halves the total range that you have

[891:48]

available to you in the positive range

[891:50]

we ran into the situation where it

[891:52]

somehow wrapped around became negative

[891:54]

and then even ended up being zero as a

[891:56]

result. Well, Python wonderfully

[891:58]

nowadays just gives you more and more

[892:01]

bits as needed if your integers are

[892:03]

getting larger and larger. So this is a

[892:05]

wonderful feature in that we've at

[892:06]

least addressed one fundamental

[892:08]

limitation we ran into in C and this

[892:10]

time the language itself provides us a

[892:13]

solution. Python, too, has some pretty handy

[892:16]

features as well. One of them is what

[892:18]

are called exceptions. And so an

[892:20]

exception in Python is a way of handling

[892:24]

error conditions without relying on

[892:27]

return values alone. So recall that in C

[892:29]

if you ever wanted to signify that

[892:31]

something went wrong you have to return

[892:33]

like, most recently, NULL, N-U-L-L, which

[892:36]

was a special sentinel value technically

[892:38]

it's just the zero address and by

[892:39]

checking for that you can make sure that

[892:41]

you know if you're getting back a valid

[892:42]

pointer or not and in other functions if

[892:45]

something went wrong you might similarly

[892:47]

have to check the return value maybe

[892:49]

checking for zero or negative one or one

[892:52]

or something like that but return values

[892:54]

were the only way in C that functions

[892:56]

could communicate back to the programmer

[892:59]

that something went wrong. And this is

[893:01]

problematic because if you imagine

[893:03]

implementing a function that's supposed

[893:05]

to return maybe an integer, whether

[893:07]

positive, negative, or zero, it's kind

[893:09]

of unfortunate sometimes if you have to

[893:11]

steal one of those values and say,

[893:13]

uh-uh, you can't use this value. It's

[893:15]

fine in the world of pointers because

[893:16]

the world decided years ago, we're never

[893:18]

going to use the actual address 0x0,

[893:20]

the zero address. But that's still

[893:22]

technically costing us one or more bytes

[893:24]

of space. But in general, it's a bit

[893:26]

annoying if your function can't truly

[893:27]

return all possible values. Think about

[893:30]

a function like get_string. If something

[893:32]

went wrong in get_string, what do you

[893:34]

want to return? Well, we saw in the

[893:36]

CS50 library for C, we do in fact return NULL

[893:39]

once we introduce that. But in general,

[893:42]

wouldn't it be nice if functions could

[893:43]

somehow signal out of band, so to speak,

[893:45]

that something went wrong? So, by that I

[893:48]

mean this, let's go into a new program

[893:50]

that's inspired by one of our programs

[893:52]

today. And in VS Code, I'm going to go

[893:54]

ahead and close my calculator, open my

[893:56]

terminal window, and create a new

[893:57]

program called integer.py. So in

[893:59]

integer.py, let's just play around with

[894:01]

some integers and see what we can break.

[894:03]

So here, I'll define a variable called

[894:05]

n, and set it equal to the input

[894:07]

function, which comes with Python, just

[894:09]

asking the human for some input. Then

[894:12]

I'm going to go ahead and ask a

[894:13]

question. Is the user's input numeric?

[894:16]

And it turns out if you read the

[894:17]

documentation for strings in Python,

[894:20]

they come with not just an upper

[894:22]

function, a lower function, a.k.a. methods,

[894:24]

but also an isnumeric function or method

[894:27]

that tells you whether or not the string

[894:29]

itself happens to be numeric. That is

[894:31]

looks like a number. All right. So I

[894:33]

think if I do that, I could then do

[894:35]

something like this. If n is numeric,

[894:37]

I'm going to go ahead and claim that in

[894:39]

fact it is an integer. Else if it's not

[894:41]

numeric, I'm going to claim that it's

[894:43]

not an integer. I have no idea what it

[894:44]

is. Maybe it's cat. Maybe it's dog.

[894:46]

Maybe it's a mix of numbers and letters,

[894:48]

but it's definitely not an integer as

[894:50]

defined by a sequence of decimal digits

[894:52]

in this case. All right, so let's try

[894:54]

this out. Python

[894:56]

of integer.py. Enter. We'll type in one.

[894:59]

That's an integer. We'll type in two.

[895:01]

That's an integer. We'll type in zero.

[895:03]

That's an integer. Type in cat. Not an

[895:05]

integer. So that seems to in fact work.
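
A sketch of this first version of integer.py. So that it can run without a keyboard, the check is wrapped in a function called describe, a name I made up, instead of calling input directly. Note that str.isnumeric only looks at the characters themselves, so a string like "-1" is reported as not an integer.

```python
def describe(text):
    # str.isnumeric() is True only if every character is numeric
    # and the string is not empty.
    if text.isnumeric():
        return "integer"
    else:
        return "not integer"


for sample in ["1", "2", "0", "cat"]:
    print(sample, describe(sample))
```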

[895:08]

But what if I wanted to immediately

[895:10]

convert this to an int as we did in the

[895:13]

past. And so let me modify this a little

[895:15]

bit here and say instead this n equals

[895:18]

not just input

[895:21]

asking the user for an integer or rather

[895:23]

let's just ask them more generally for

[895:25]

input but let's assume that we want to

[895:27]

convert this input to an int. And

[895:30]

actually we can go ahead and say integer

[895:32]

here. All right. Well, here I'm going to

[895:35]

go ahead and just print out the claim

[895:37]

that yep, this is an integer because if

[895:39]

we get to line two, well, clearly we've

[895:41]

handled uh the user's input correctly.

[895:44]

In other words, how can I get rid of

[895:46]

constantly checking the return val

[895:48]

sorry, how can I get away from

[895:50]

constantly checking the return values of

[895:52]

functions to make sure it is what I

[895:54]

expect. All right. Well, let's go ahead

[895:56]

and run Python of integer.py now. Enter.

[895:58]

Type in one tells me it's an integer.

[896:01]

Type in two tells me it's an integer.

[896:02]

zero tells me it's an integer. Type in

[896:04]

cat. Notice this time what goes wrong.

[896:08]

Whereas last time we saw this kind of

[896:10]

traceback error message, it was a name

[896:12]

error because I was using the meow

[896:14]

function name too early. Now I'm getting

[896:16]

a value error which is a different type

[896:18]

of error that relates to invalid literal

[896:20]

for int with base 10 cat. Now that's a

[896:23]

mouthful. So unfortunately Python's

[896:25]

error messages aren't all that much

[896:26]

better than clang's error messages. But

[896:29]

clearly the interpreter does not like

[896:31]

the fact that I'm passing something to

[896:34]

int related to base 10, but that's quote

[896:36]

unquote cat. And really, the best you

[896:38]

can do with this kind of error is

[896:39]

realize like, okay, it's clearly the

[896:41]

case that cat is not an integer. So,

[896:43]

it's having trouble converting cat to an

[896:45]

integer. It makes no logical sense. All

[896:47]

right. So, what's the gist of the

[896:49]

problem? Well, I'm just blindly

[896:51]

converting the user's input to an

[896:54]

integer, even

[896:57]

if it's not an integer. Well, all right.

[896:58]

Well, I could rewind to the previous

[897:00]

version of my program, use the

[897:01]

isnumeric function, and then conditionally

[897:03]

convert it, but I'm trying to move away

[897:05]

from constantly checking return values

[897:07]

for errors. And wouldn't it be

[897:09]

nice if I could somehow catch this value

[897:12]

error and just deal with it if it

[897:14]

happens? And in fact, you can with

[897:17]

Python exceptions, which exist in

[897:20]

other languages as well, Java among

[897:21]

them. You have the ability to sort of

[897:23]

listen for errors happening inside of

[897:26]

functions without having to rely on

[897:28]

return values alone. So, let me go back

[897:31]

to VS Code here, clear my terminal just

[897:33]

to simplify things a bit, and let me

[897:35]

literally say to the interpreter, please

[897:38]

try to execute the following two lines

[897:41]

of code, except if something goes wrong,

[897:44]

like a value error, in which case go

[897:46]

ahead and print out something like not

[897:49]

integer. So, wouldn't it be nice if you

[897:52]

could just wrap all of the code you've

[897:53]

written in CS50 thus far with try and

[897:56]

sort of ask the computer politely like

[897:57]

please try to execute this code? But

[897:59]

that really is the semantics behind

[898:02]

it. Try to execute these lines of code

[898:04]

except if there's an error then do this

[898:06]

other thing instead. And therefore, you

[898:09]

don't have to check any return values.

[898:11]

you can just blindly pass the output of

[898:14]

the input function as the input to the

[898:16]

int function knowing that if something

[898:18]

goes wrong inside of there, Python is

[898:21]

going to execute this code instead

[898:23]

except when something goes wrong. So let

[898:25]

me go ahead and run Python of integer.py

[898:27]

now. I'll type in one and that works

[898:30]

because it's trying to execute line two

[898:32]

and succeeding. It's trying to execute

[898:34]

line three and succeeding. So lines four

[898:36]

and five never actually kick in. But if

[898:38]

I try again here with cat, line two is

[898:42]

going to fail. Line three is never going

[898:45]

to get reached because Python is

[898:46]

immediately going to jump to this

[898:48]

exception handler, so to speak, thereby

[898:51]

catching the error or the exception and

[898:53]

printing not integer instead. So it's a

[898:56]

little bit of a weird convention. It's

[898:58]

different from what C offers, but a lot

[899:00]

of newer languages nowadays do offer

[899:02]

this because it's a better way of just

[899:04]

writing code that you know should work

[899:06]

99% of the time. But if something does

[899:08]

go wrong out of memory, the human types

[899:11]

something wrong in or something like

[899:12]

that, you can handle all of those

[899:14]

exceptional cases, exceptional in a bad

[899:17]

sense, using this except keyword instead.
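
A sketch of the try and except version of integer.py. As an assumption for the sake of a runnable example, the text to convert is passed in as a parameter rather than read with input, and the function name check is invented here.

```python
def check(text):
    try:
        # If int() raises ValueError, the rest of the try block is skipped.
        n = int(text)
        return "integer"
    except ValueError:
        # The exception handler: runs only when the conversion failed.
        return "not integer"


print(check("1"))
print(check("cat"))
```

Unlike the isnumeric version, this one also accepts input such as "-1", because int itself decides what counts as an integer.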

[899:22]

Questions on any of this here technique?

[899:28]

Yeah,

[899:34]

>> a really good question. In this case, I

[899:35]

used a value error. Do I need to define

[899:38]

every possible thing that can go wrong?

[899:40]

Short answer, yes. Now, there aren't

[899:42]

terribly many. There's some standard

[899:43]

ones and they're all capitalized in this

[899:45]

way. Capital letter, capital letter,

[899:47]

something Error, typically. You can even

[899:48]

invent your own. Um, and it's good

[899:51]

practice to enumerate the kinds of

[899:52]

things that you think can go wrong.

[899:54]

Value error is pretty generic, but there

[899:56]

could be memory related errors. There

[899:57]

could be file not found related errors.

[899:59]

There's a bunch of different exceptions

[900:00]

that are all documented in Python that

[900:02]

you can listen for. That said, as nice

[900:04]

as Python's documentation is overall, it

[900:08]

is not good at documenting for specific

[900:10]

functions what exceptions they can

[900:12]

throw. And I've never understood this

[900:14]

after all of these years that no human

[900:15]

has gone into the documentation and

[900:17]

painstakingly enumerated all of the

[900:19]

possible things that can go wrong.

[900:21]

What's too often the case in the real

[900:22]

world with some of my own code included

[900:24]

is if you encounter an exception that

[900:26]

you didn't think was going to happen,

[900:28]

you go in and improve your code and add

[900:29]

to this list of except clauses. What

[900:32]

else might go wrong? Shouldn't be that

[900:34]

way. And different libraries are better

[900:36]

about documenting these things.
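
A sketch of what a list of except clauses can look like, including one invented exception. Everything here is hypothetical and chosen only for illustration: the load_height function, the NegativeHeightError class, and the idea of reading a height from a file. FileNotFoundError and ValueError are real built-in exceptions.

```python
class NegativeHeightError(Exception):
    """Hypothetical custom exception, defined only for this example."""


def load_height(path):
    try:
        with open(path) as file:
            height = int(file.read())
        if height < 0:
            raise NegativeHeightError(f"height is negative: {height}")
    except FileNotFoundError:
        # Raised by open() when the path does not exist.
        return "file not found"
    except ValueError:
        # Raised by int() when the file's contents are not an integer.
        return "not an integer"
    except NegativeHeightError as error:
        # Our own exception, raised a few lines above.
        return str(error)
    return f"height is {height}"
```

Each clause names one kind of failure, so an exception you did not anticipate still surfaces as a traceback instead of being silently swallowed.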

[900:38]

All right. Well, with that in mind, let

[900:41]

me propose that in the CS50 library for

[900:44]

Python, get_int and get_float, they work

[900:46]

just like the C library whereby if you

[900:48]

type in cat or dog or bird into those

[900:50]

functions, they just reprompt you. They

[900:52]

just reprompt you. And long story short,

[900:54]

this is the kind of code we wrote in

[900:55]

Python. Try to get input from the user

[900:57]

except if something goes wrong, prompt

[900:59]

them again, prompt them again. So, we

[901:01]

too were using precisely these features

[901:03]

even though it wasn't something that was

[901:05]

available to us in C. All right. But

[901:08]

something else that we did in C was play

[901:09]

around with Mario in a few different

[901:10]

forms. And in lecture recall a few weeks

[901:12]

back, we experimented with like using

[901:14]

some ASCII art, some very simple text, to

[901:16]

print out something like this pyramid of

[901:18]

height 3. Well, how can we go about

[901:20]

printing something like this? Well, I

[901:22]

would propose that if I go back to VS

[901:23]

Code here, let's close out my integer

[901:25]

examples, code up a new version of Mario

[901:27]

in Mario.py. This one's kind of simple.

[901:30]

I can say something like for I in range

[901:32]

of three, go ahead and print out quote

[901:34]

unquote a hash. down in my terminal

[901:37]

window, Python of Mario.py, and I've got

[901:38]

really the closest analog to three

[901:40]

bricks stacked on top of each other in

[901:43]

this way. But in C, eventually,

[901:48]

our implementation of Mario started to

[901:49]

get a little fancy and we started to

[901:51]

prompt the user for the height

[901:53]

of the wall and therefore we could have

[901:56]

not just three but maybe four or even

[901:58]

more bricks being printed. So, let me

[902:00]

actually open up that version from a few

[902:02]

weeks back whereby from week one we had

[902:05]

a version of Mario that looked like this

[902:10]

whereby we after including some header

[902:13]

files declared in main a variable called

[902:15]

n. Then we saw a new construct at the

[902:18]

time, a do-while loop that just keeps

[902:20]

using get_int, get_int, get_int so long as

[902:22]

n is not one or greater, equivalently

[902:26]

so long as n is less than one and kept

[902:28]

prompting the user again and again. The

[902:30]

reason for having n up here recall was

[902:32]

issues of scope. This way, it's

[902:34]

accessible lower in the function as

[902:36]

opposed to it being confined to those

[902:37]

curly braces. And then down here we used

[902:39]

a for loop to actually print out that

[902:40]

many hashes. So in short, the do-while

[902:43]

loop solves the problem in C whereby you

[902:46]

want to get user input at least once and

[902:48]

maybe again and again and again if they

[902:51]

don't cooperate the first time. And

[902:52]

that's where do-while loops really shine. Do

[902:54]

something at least once and maybe again

[902:56]

and again and again. Otherwise, it's a

[902:58]

little more annoying to do it with while

[903:00]

loops or for loops. Unfortunately,

[903:03]

Python does not offer a do-while loop.

[903:06]

And so here too, we have an opportunity

[903:07]

to introduce you to what the world would

[903:09]

call Pythonic. What is Python's solution

[903:12]

there too? Well, on the right hand side

[903:14]

here in Mario.py, let's change this a

[903:16]

little bit and let's do from uh let's go

[903:19]

ahead and do

[903:22]

while True, capital T. Go

[903:26]

ahead and use a variable n. Set it equal

[903:29]

to int input

[903:32]

height asking the human for the height

[903:34]

of the wall. And I'm going to just cross

[903:36]

my fingers that they're not going to

[903:37]

type in cat or dog or something that's

[903:39]

not an int. In this case, I'm going to

[903:40]

say if n is greater than zero, that is a

[903:44]

positive number. That's useful. We can

[903:46]

proceed. I'm going to now break out of

[903:48]

this loop. And then lower in the file,

[903:50]

I'm going to say for i in range of n, go

[903:53]

ahead and print out the hashes. So we

[903:56]

still have that same lesson as before,

[903:57]

like the Python version seems to be

[903:59]

shorter, more concise, even if you

[904:00]

ignore the comments on the left-hand

[904:02]

side. And I've completely avoided using

[904:04]

a do-while loop. But there are a few

[904:06]

things that are different nonetheless

[904:08]

that feel like, versus C, they shouldn't even

[904:11]

work. Like what's weird about this

[904:13]

solution even though I think it's

[904:15]

actually correct?

[904:18]

Yeah,

[904:21]

>> I have two.

[904:24]

>> Okay, so it's not correct. That's uh one

[904:27]

of the first things to point out. So,

[904:29]

too many prepositions for this was

[904:31]

supposed to say for I in range. Okay.

[904:34]

So, now that this program's correct,

[904:36]

what looks weird to you and probably

[904:38]

could break it. Yeah.

[904:42]

>> Yeah. So, the n variable, it

[904:45]

seems to be scoped to the while loop, at

[904:46]

least in so far as it's indented inside

[904:48]

the while loop, which feels analogous to

[904:50]

being inside of curly braces in C. And

[904:52]

so it seems weird that I'm presuming to

[904:54]

use n on line six even though it was

[904:56]

only defined on line two. It turns out

[904:59]

this is possible in Python. The issue of

[905:02]

scope that we encountered in C is not as

[905:05]

rigorously enforced. We'll say for today

[905:08]

such that when you define n up here, you

[905:11]

can actually use it down here. And you

[905:13]

can think of this as being a little

[905:14]

reasonable because if there's no more

[905:17]

specification of what data type n is and

[905:19]

no more semicolon. Just imagine it would

[905:21]

look kind of stupid if you just put a

[905:23]

blank n there and hit enter just so it

[905:26]

kind of exists. There's no way to

[905:27]

express the idea of create this variable

[905:28]

in advance without actually assigning it

[905:30]

a value. Whereas in C we could do that.
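
A sketch of the while True idiom and of n outliving the loop it was assigned in. Two things are my own assumptions for the sake of a testable example: the loop lives in a function called get_height, and the function that reads a line is passed in as a parameter named read (defaulting to the built-in input), so that something other than the keyboard can supply answers.

```python
def get_height(read=input):
    # Deliberately loop forever; break out once the answer is usable.
    while True:
        n = int(read("Height: "))
        if n > 0:
            break
    # n was first assigned inside the loop, yet it is still usable here.
    return n


def print_wall(n):
    # One hash per row, n rows in total.
    for i in range(n):
        print("#")
```

Running print_wall(get_height()) prompts at least once and keeps prompting until the number is positive. Typing cat would still raise a ValueError here, as in the lecture's version.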

[905:32]

So this is in fact okay and correct. Um

[905:37]

what else is going on here? Well instead

[905:38]

of a do-while, we're kind of just

[905:40]

implementing the idea of it. I'm just

[905:41]

blindly inducing deliberately an

[905:43]

infinite loop like do the following

[905:44]

forever but then as soon as I have the

[905:47]

answer I want like a positive integer

[905:48]

from the human break out of this loop

[905:50]

and this is indeed the Pythonic way to

[905:53]

say get user input because this will

[905:55]

minimally ask the user for a height once

[905:57]

and maybe more and more times. So no do

[906:01]

loops only while loops and for loops and

[906:03]

only while loops are really the same as

[906:05]

in C. Even for loops we've seen are a

[906:07]

bit different. All right. Well, how

[906:09]

about instead of just that Mario uh

[906:11]

example, recall this one where we wanted

[906:13]

to print like four question marks in the

[906:15]

sky side by side. Well, we can do this

[906:17]

in a few different ways. Let me go back

[906:18]

to VS Code, close the C version, and

[906:20]

let's just completely change Mario.py to

[906:22]

implement this. Now, I want four

[906:24]

question marks in the sky. So, I think I

[906:26]

can do something like for I in range of

[906:29]

four, go ahead and just print out quote

[906:32]

unquote question mark. Do you like this?

[906:35]

Python

[906:36]

of Mario.py. Should I run it? No.

[906:40]

Why?

[906:43]

This is how I did it in C. Yeah.

[906:47]

>> Yeah. I got to edit the end value, the

[906:50]

named parameter for the print function

[906:52]

because otherwise if I hit enter,

[906:54]

they're all on different lines, which is

[906:55]

not the effect I want when all four

[906:57]

question marks are meant to be side by

[906:59]

side. All right. Well, that's an easy

[907:00]

fix. I can pass the named parameter

[907:02]

called end into the print function. Set

[907:05]

it equal to quote unquote with double

[907:06]

quotes or with single quotes. As always,

[907:09]

stylistically, I would be consistent.

[907:10]

So, I'm going to use double quotes even

[907:11]

though the documentation is consistent

[907:13]

with its single quotes. Now, I'm going

[907:14]

to rerun Python of Mario.py. And

[907:18]

I'm so close. Now, they're on the same

[907:19]

line, but the stupid cursor didn't move

[907:21]

to the next line. That's fine. How to

[907:23]

fix this? Well, just logically, I can

[907:25]

put a blank print statement below. And

[907:27]

even though I'm not passing anything in,

[907:29]

you get a new line for free when calling

[907:31]

print. So even though I'm not passing in

[907:34]

any arguments, I am getting the

[907:36]

aesthetic effect that I want. So that is

[907:38]

a perfectly reasonable way to do it.
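A minimal sketch of the loop version just described. The function name print_row and the in-memory buffer used to capture the output are assumptions added for illustration; in the lecture the loop simply sits at the top level of Mario.py.

```python
import io
from contextlib import redirect_stdout


def print_row(width=4):
    # end="" overrides print's default newline so the marks stay side by side
    for i in range(width):
        print("?", end="")
    # A bare print() outputs only the newline, moving the cursor down
    print()


# Capture what would appear in the terminal so it can be inspected
buffer = io.StringIO()
with redirect_stdout(buffer):
    print_row()
output = buffer.getvalue()
```

Calling print_row() on its own, without the buffer, prints the four question marks followed by a single newline.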

[907:40]

Now, if you feel yourself becoming a bit

[907:42]

of a geek though in learning about

[907:44]

Python and previously C, you can even

[907:47]

solve this problem even more

[907:48]

Pythonically by saying print quote

[907:51]

unquote question mark * 4 using

[907:55]

multiplication similar in spirit to the

[907:57]

plus operator for concatenation. And now

[907:59]

multiply the question mark by itself

[908:02]

four times. So now if I go down here and

[908:04]

run Python of Mario.py, I get a very

[908:07]

elegant solution to exactly that same

[908:09]

problem, even more concisely than my

[908:12]

previous version. What if I want to do

[908:14]

something in two dimensions? Well,

[908:15]

recall that we moved to the underground

[908:16]

of Mario Brothers here and we had like a

[908:19]

3x3 grid of bricks. How can we do that?

[908:21]

Well, in C, we had nested for loops

[908:23]

using I and J back in the day. And I

[908:26]

could do the same thing in Python. Let

[908:27]

me go back into VS Code here and let me

[908:30]

do one outer loop for I in range of

[908:32]

three. Then let me do an inner loop for

[908:35]

J in range of three. Then let me go

[908:38]

ahead and print out a hash. But let me

[908:41]

learn from my past mistakes. I don't

[908:43]

want to print out a new line every time.

[908:45]

So let's override that default. But

[908:48]

after each row, let's print a new line.

[908:50]

So that down here, I can go in Mario.py,

[908:53]

run it, and I've got my 3x3 grid of

[908:55]

bricks. I could change this a little bit

[908:57]

and call this row and column. Even

[909:01]

though here too, even more so. I'm not

[909:03]

literally using row and column anywhere

[909:05]

explicitly, but semantically it kind of

[909:08]

explains maybe a little clearer to the

[909:10]

reader what's actually going on. So that

[909:13]

might help. But we could tighten this up

[909:15]

too, right? If I just want to print a

[909:16]

3x3 grid, well, I know that the top

[909:19]

thing here will iterate three times. And

[909:21]

I know how to very elegantly print

[909:23]

things out with a one-liner. So I could

[909:24]

just print out a hash times three in

[909:26]

this case. And then down here, I can go

[909:28]

to Python of Mario. And voila, I'm back

[909:31]

in business, too. So it's just sort of

[909:32]

easier to do these kinds of things and

[909:34]

express yourself all the more

[909:35]

succinctly. Well, what else can we do?
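Both versions of the 3x3 grid can be sketched side by side. The function names and the captured helper below are assumptions for illustration only; the lecture writes each version directly in Mario.py.

```python
import io
from contextlib import redirect_stdout


def grid_nested(size=3):
    # Nested loops, as in C: the inner loop prints one row of bricks
    for row in range(size):
        for column in range(size):
            print("#", end="")
        print()  # newline after each row


def grid_repeat(size=3):
    # Tighter version: "#" * size builds the whole row in one expression
    for row in range(size):
        print("#" * size)


def captured(func):
    # Helper (not from the lecture) that returns what func prints
    buffer = io.StringIO()
    with redirect_stdout(buffer):
        func()
    return buffer.getvalue()


nested_output = captured(grid_nested)
repeat_output = captured(grid_repeat)
```

The same string repetition gives the earlier one-liner for the sky: print("?" * 4).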

[909:38]

Well, it turns out in Python that unlike

[909:41]

arrays, you can ask lists how long they

[909:44]

are. So you don't have to keep around a

[909:46]

variable of how large an array is. You

[909:48]

can just add stuff to a list and then

[909:50]

ask Python how long is this list? How

[909:52]

many elements are in it? Case in point,

[909:54]

let me go back to VS Code and clear out

[909:56]

Mario.py and let's reimplement from a

[909:58]

few weeks back the notion of uh

[910:00]

calculating uh like the average uh

[910:02]

quiz score that you might have in a

[910:04]

class. So in score.py, let's go ahead

[910:06]

and create a program that's got a list

[910:08]

called scores of three scores that we've

[910:10]

seen before, 72, 73, and 33. And recall

[910:13]

that we tried a few weeks back in C

[910:15]

to average these together. And to do

[910:16]

that, we had to add them all together.

[910:18]

We had to uh divide by the total number

[910:20]

of elements in the list. Like it wasn't

[910:22]

that hard. It was sort of like grade

[910:23]

school arithmetic to calculate an

[910:25]

average. But Python has more functions

[910:28]

available to us. Not just length, but

[910:30]

even summation. So let me go ahead and

[910:32]

do this. Let me say that my average

[910:34]

variable shall be the sum of those

[910:38]

scores divided by the length of those

[910:40]

scores. And indeed, per the

[910:42]

documentation, Python has a length

[910:44]

function, len for short, and a sum function

[910:47]

which takes the add uh which adds

[910:48]

together all of the elements in that

[910:50]

list. And so down here now I can say

[910:52]

something like print with an f string or

[910:54]

format string that the average is

[910:57]

whatever that value is. And I don't have

[910:59]

to do any loops or math myself. I can

[911:00]

just call the function like I could in

[911:02]

Excel or Google Sheets or Apple numbers.
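As a sketch, the hard-coded version of the program might look like this. The exact wording of the printed message is an assumption.

```python
scores = [72, 73, 33]

# sum adds up the elements; len reports how many there are
average = sum(scores) / len(scores)

print(f"Average: {average}")
```

The / operator performs floating-point division here, which is why the result has a long run of decimal digits rather than being truncated to an integer.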

[911:05]

Python of score.py

[911:07]

enter. And my average is in fact

[911:09]

59.3333. And then some weird imprecision

[911:12]

at the end there. And in fact just for

[911:14]

consistency with our C code, let me

[911:16]

rename this. I'm going to rename score

[911:18]

to scores plural. That's going to close

[911:20]

the window. But now at least you'll see

[911:22]

online that we have a program indeed

[911:23]

called scores. Well, this is not that

[911:25]

interesting because I've just hard-coded

[911:27]

my 72, my 73, and 33. What if we want

[911:29]

the human to be able to type that in?

[911:31]

Well, I think we can do that, too. So,

[911:33]

let me actually open up that version of

[911:35]

the file now pluralized. Let me go ahead

[911:38]

and not initialize the list for the

[911:41]

human, but let me set it equal to an

[911:43]

empty list. Just using an open square

[911:45]

bracket and close square bracket, like

[911:47]

an array that has nothing in it. But

[911:48]

this one is literally of size zero at

[911:50]

the moment. And now let me do for I in

[911:53]

range of let's just for now ask the user

[911:56]

for three scores. Even though we could

[911:57]

certainly ask the user how many scores

[911:59]

do they want to input and then use that

[912:01]

number instead. So in each of these

[912:03]

iterations, let's ask the user for a

[912:05]

score using something like int input

[912:09]

score. I'm going to set aside the

[912:11]

reality that if the user types in cat or

[912:13]

dog, the whole thing's going to break

[912:15]

and therefore I should really add my try

[912:17]

and my except. But I'm going to discard

[912:18]

that error checking and focus only on

[912:20]

the essence of this program for now. Now

[912:23]

after line three, if I have in a score

[912:25]

variable the user's quiz score, how do I

[912:28]

put it into that array? Well, in that

[912:31]

list, well, with an array, I had to use

[912:32]

the square bracket notation, keep track

[912:34]

of how big it is and use like bracket I

[912:36]

or something like that. No longer in

[912:38]

Python because a

[912:41]

uh list is an object that has not only

[912:45]

data but functions aka methods

[912:47]

associated with it. I can just call a

[912:49]

method that comes with every Python list

[912:52]

called append and pass in that score

[912:55]

using that same dot notation as before.

[912:58]

The rest of my code can stay exactly the

[913:00]

same. If I now run Python of scores.py

[913:02]

and I type in 72, 73, 33 manually

[913:08]

though I still get that same average and

[913:10]

notice I did not need to decide in

[913:12]

advance how big that list of scores was

[913:15]

going to be. Questions on what we've just

[913:19]

done with lists.
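A sketch of the append-based version. So that it runs without a keyboard, the list named typed is a hypothetical stand-in for what the human would enter; the lecture's program calls int(input(...)) on each iteration instead.

```python
# Hypothetical keystrokes standing in for three calls to input("Score: ")
typed = ["72", "73", "33"]

scores = []  # an empty list, of size zero at the moment
for i in range(3):
    score = int(typed[i])  # in the lecture: int(input("Score: "))
    scores.append(score)   # the list grows by one element each time

average = sum(scores) / len(scores)
print(f"Average: {average}")
```

Nothing in the code states the final size of the list in advance; append simply extends it.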

[913:23]

No. All right. Even cooler for some

[913:26]

definition of cool is that we can now

[913:28]

implement hash tables or more

[913:30]

generically dictionaries sets of key

[913:32]

value pairs by just using a data type

[913:34]

that comes with Python. I claimed last

[913:36]

week that like Python that dictionaries

[913:38]

are sort of, and hash tables in particular are

[913:40]

sort of the Swiss army knives of data

[913:42]

structures and that they just let you

[913:43]

associate some piece of data with

[913:44]

others. With Python, you do not need to

[913:46]

jump through the hoops that you needed

[913:48]

to with problem set five implementing

[913:49]

your own spell checker and your own

[913:51]

hash table. You just create a dict object in

[913:54]

Python, a dictionary that gives you the

[913:56]

ability to associate keys with values.

[913:58]

So, case in point, let's do this. Let me

[914:00]

go back into VS Code and close out

[914:02]

scores.py and let's create a new and

[914:05]

improved version of our phone book in

[914:07]

phonebook.py. Let's go ahead and come

[914:09]

up with a list of names just to

[914:10]

demonstrate how we could store a bunch

[914:12]

of names in the phone book irrespective

[914:13]

of numbers and set those equal to say uh

[914:16]

Kelly's name and my name and John

[914:20]

Harvard's name just by putting four

[914:23]

quoted strings or strs inside of this

[914:25]

list. Now let's ask the human using the

[914:28]

input function for the name that they

[914:29]

want to search for in this list. And now

[914:32]

let's implement linear search using

[914:33]

Python. I can do this in a bunch of

[914:35]

ways, but one way is to say for each uh

[914:39]

name, we'll call it n in names, go ahead

[914:42]

and ask the question if the name I'm

[914:44]

looking for equals the current name in

[914:47]

the list that I'm iterating over, go

[914:49]

ahead and print out just something

[914:51]

generic like found and then break out of

[914:54]

this loop. And let's see if we can find

[914:57]

Kelly or David or John or someone else.

[914:59]

Python of phonebook.py. Enter. Searching

[915:02]

for the name, say David. Enter. And it

[915:04]

was in fact found. Let me go ahead and

[915:06]

search for someone else's name that's

[915:07]

not in there, Brian. And now it's not in

[915:10]

fact found. Although it's not all that

[915:11]

enlightening to just ignore the question

[915:13]

altogether. It would be nice to say not

[915:16]

found. And here is where in C it

[915:19]

would be kind of nonobvious to do this

[915:21]

in C. If you wanted to print out found

[915:24]

or if you get through the whole list and

[915:26]

you still haven't found the user, print

[915:28]

not found. you'd have to like keep track

[915:30]

with the variable of whether or not you

[915:31]

found the person or you'd have to return

[915:33]

from the code prematurely just to get

[915:35]

out of it logically. Turns out somewhat

[915:37]

weirdly but wonderfully usefully for

[915:40]

loops in Python can have else clauses

[915:44]

associated with them whereby I can say

[915:46]

down here print not found. If I run this

[915:50]

version of the program and search for

[915:51]

someone who's not in the phone book like

[915:53]

Brian now I actually see not found.
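A sketch of linear search with the else clause attached to the for loop. Wrapping it in a function named search that returns the message, rather than printing at the top level as the lecture does, is an assumption made so the behavior is easy to check.

```python
names = ["Kelly", "David", "John"]


def search(name, names):
    for n in names:
        if name == n:
            result = "Found"
            break  # leaving the loop early skips the else clause
    else:
        # Runs only if the loop finished without ever hitting break
        result = "Not found"
    return result


print(search("David", names))
print(search("Brian", names))
```

Note that the else lines up with the for, not with the if; that indentation is what ties it to the loop.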

[915:56]

Semantically, it's a little weird, but

[915:58]

essentially what's happening is if you

[916:00]

get through this whole loop and you

[916:02]

never call break, then you've not

[916:05]

actually broken out of the loop. So,

[916:07]

you're going to hit the else. And in

[916:08]

that case, you're going to print out not

[916:10]

found. And this is such a common thing

[916:12]

to like do this kind of bookkeeping and

[916:14]

keep track of whether or not something

[916:15]

has happened inside of a for loop. And

[916:16]

if so, do this, else do that. Else

[916:19]

literally handles that scenario in

[916:20]

Python. And this is the most C unlike

[916:24]

thing that we've perhaps seen in terms

[916:25]

of features with regard to at least

[916:28]

loops. All right. Well, this is great

[916:31]

that I've kind of implemented linear

[916:32]

search, but like we did that in C and

[916:35]

it's getting a little tedious. Can't we

[916:36]

do better? We actually can. Let me clear

[916:38]

my terminal and tighten this up. Instead

[916:40]

of iterating over every name in names,

[916:43]

just like we keep iterating over

[916:45]

integers in ranges and checking for each

[916:48]

name if it equals the thing we're

[916:49]

looking at, you can actually do

[916:51]

something much more clever. You can just

[916:52]

literally ask Python if the name you're

[916:55]

looking for is in the names list, then

[916:59]

go ahead and print out uh found, else

[917:03]

print not found. And so this is where

[917:05]

Python, too, gets kind of cool. In line

[917:08]

five, you have just a simple if

[917:10]

condition with a boolean expression name

[917:12]

in names. How does Python know if name

[917:14]

is in names? It uses linear search

[917:17]

presumably to search over the whole list

[917:19]

of names looking for what you care about

[917:21]

and then tells you true or false if it

[917:23]

found it. You don't have to write the

[917:25]

code to iterate over it with a while

[917:27]

loop or for loop or whatnot. You just

[917:29]

say what you mean. And so here too, it's

[917:31]

a little more English-like. If name in

[917:33]

names, question mark, then print found,

[917:35]

much more so than it would be

[917:37]

pronounceable in C. So that's one other

[917:41]

cool feature that we now have at our

[917:42]

disposal. What's yet another? Well, when

[917:44]

it comes to dictionary objects in C, or

[917:47]

rather in Python, a dict object really

[917:50]

just gives you a set of key value pairs.

[917:52]

And we've seen this kind of chart before

[917:53]

whereby we might have name and number

[917:55]

and name and number and name and number.

[917:57]

How do we translate this to code?

[917:59]

Because in C, as with problem set 5, it

[918:01]

was going to be quite an undertaking to

[918:03]

be able to store a whole bunch of things

[918:05]

in memory in the form of something like

[918:06]

a hash table. Well, in Python, we can

[918:09]

actually define a dictionary ourselves.

[918:11]

So, these square brackets represent a

[918:15]

list, but I can alternatively use curly

[918:17]

braces for a very new purpose. I'm going

[918:20]

to go ahead and hit enter just to move

[918:21]

the second curly brace to a new line.

[918:23]

And I am going to now enumerate a bunch

[918:25]

of key value pairs. Namely, quote

[918:27]

unquote Kelly for the first key colon.

[918:30]

Then we'll do +1-617-495-

[918:34]

1000 as the number. Then I'm going to

[918:36]

go ahead and do quote unquote David for

[918:38]

the second key. And since we both work

[918:40]

here, I'm going to go ahead and just use

[918:42]

that same number as we've done

[918:43]

before. Then a third key for John

[918:45]

Harvard colon. And for John, we'll use

[918:48]

+1-949-

[918:50]

uh 468-2750,

[918:53]

which is fun to call or text. This now,

[918:55]

even though it's syntactically a little

[918:57]

different, gives me the equivalent of

[918:59]

this chart here, key value pairs, where

[919:02]

the keys are the staff names and the

[919:05]

values are the staff numbers. That

[919:08]

implements all of that, a hash table, if

[919:10]

you will, in Python's own syntax. So,

[919:13]

how do I now use this? Turns out I can

[919:16]

actually use it in exactly the same way.

[919:18]

I'm going to go ahead and generalize

[919:20]

this now to people because it contains

[919:21]

not just names but names and numbers. So

[919:23]

I'm going to change this variable down

[919:24]

here to people too. But notice the

[919:27]

syntax now. I can still ask the human

[919:29]

for a name they want to look up. I can

[919:31]

now still say if the name is in the

[919:33]

people dictionary. And by definition,

[919:36]

Python's going to interpret that

[919:37]

preposition in as meaning is the

[919:40]

following key in the dictionary. And if

[919:43]

so, it's going to return true. But

[919:45]

what's cool about this is that besides

[919:47]

just making this work as follows. Python

[919:50]

phonebook.py. And let's type in David.

[919:52]

And there's my number. Oh, that's not my

[919:53]

number. It just says found. Let's run it

[919:55]

again and type in say Brian. Not found.

[919:58]

Okay, that's as expected. But I'd like

[919:59]

to know what my number is or Kelly's

[920:01]

number or John's number. Well, that's an

[920:02]

easy fix, too. Inside of this

[920:05]

conditional, I can say something like

[920:06]

this. Number equals people bracket name.

[920:12]

And we've not seen this before, but we

[920:14]

have seen square brackets in C when we

[920:17]

had arrays. This square bracket notation

[920:19]

is how you indexed into an array to get

[920:22]

a specific value 0 1 2 3 4. What's

[920:27]

amazing about dictionaries, not just in

[920:29]

Python, but in other languages as well,

[920:31]

you can now index into a dictionary just

[920:34]

as you can index into an array. But

[920:36]

whereas an array you use numeric

[920:38]

indices,

[920:39]

in dictionaries you use string indices.

[920:42]

You can use strings to look up their

[920:44]

corresponding value. So to be clear,

[920:47]

name at this point is given to us by the

[920:51]

human's input. So if I typed in D-A-V-I-D,

[920:53]

name equals David. So this is like

[920:55]

saying people square bracket quote

[920:57]

unquote David. Find David's number. That

[921:00]

stores the answer from this two column

[921:03]

chart in the variable called number. And

[921:05]

all that remains is for me to print it

[921:07]

out, which I can do using an old f-string.

[921:09]

Now, let me go down into my print

[921:10]

statement, change this to an f-string,

[921:12]

add a colon, add the number variable to

[921:14]

be interpolated, rerun this program as

[921:17]

Python of phonebook.py, type in my

[921:19]

name, and there's my number as found.
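The dictionary version can be sketched as follows. The hyphenated formatting of the phone numbers and the lookup function wrapper are assumptions; the lecture uses input, an if/else, and print at the top level of phonebook.py.

```python
# Keys are names, values are numbers
people = {
    "Kelly": "+1-617-495-1000",
    "David": "+1-617-495-1000",
    "John": "+1-949-468-2750",
}


def lookup(name):
    # For a dict, "in" asks whether name is one of the keys
    if name in people:
        number = people[name]  # index with a string instead of an integer
        return f"Found: {number}"
    else:
        return "Not found"


print(lookup("David"))
print(lookup("Brian"))
```

The same name in ... test also works on a plain list of names, where Python checks the elements rather than keys.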

[921:23]

And this is incredibly powerful. And why

[921:25]

again

[921:27]

uh hash tables and in turn more generally

[921:28]

dictionaries are sort of the Swiss army

[921:30]

knife. Being able just to look up data

[921:32]

with such simple syntax is wonderfully

[921:34]

useful and powerful. And in fact we can

[921:36]

even do more than this. For instance,

[921:38]

let me propose that if you think about

[921:40]

other incarnations of um key value

[921:44]

pairs, you see them all the time. For

[921:45]

instance, in like spreadsheets, like

[921:46]

here's a screenshot of Google Sheets

[921:49]

whereby I've got the beginnings of a

[921:50]

spreadsheet with uh names and numbers.

[921:54]

But in this model, I want to actually

[921:56]

associate some metadata with my data. So

[921:58]

the data I care about is the actual

[921:59]

names and numbers. But you could imagine

[922:01]

having a third column like email address

[922:03]

and maybe home address or any number of

[922:05]

other pieces of data associated with

[922:07]

these three people. For now, I've just

[922:09]

got two columns or two attributes, names

[922:11]

and numbers. Each of the rows in a

[922:13]

spreadsheet, as most anyone knows who's

[922:14]

used a spreadsheet before, represents

[922:16]

different records or different pieces of

[922:18]

data, like this is Kelly, this is David,

[922:20]

this is John, and so forth. We can

[922:22]

implement this idea using dictionaries

[922:25]

and lists together. So the syntax is

[922:27]

going to be a little strange at first,

[922:29]

but let me go back to VS Code here and

[922:31]

let me change my people uh dictionary to

[922:35]

be a people list between square

[922:38]

brackets. And the elements of this list

[922:40]

now are going to be uh dictionaries

[922:43]

themselves. I'm going to use some curly

[922:46]

braces inside of these square brackets

[922:48]

and say that the name of one person is

[922:50]

quote unquote Kelly and the number for

[922:52]

that person is quote unquote +1-617-495-

[922:57]

1000 close quote then comma on the

[923:01]

outside of the curly braces then I'm

[923:03]

going to have another quote unquote name

[923:05]

colon D-A-V-I-D comma then another number

[923:09]

colon I'm going to borrow the same phone

[923:11]

number because we both work here then

[923:13]

lastly a comma and finally quote unquote

[923:16]

name colon quote unquote John and then

[923:20]

lastly a quote unquote number for John

[923:24]

colon +1 uh 949-468-

[923:30]

2750.

[923:32]

All right. So what's going on here now?

[923:34]

Our people variable is now not just a

[923:36]

simple dictionary with just individual

[923:38]

key value pairs. Name number name number

[923:40]

name number. We now have a more

[923:42]

generalized way of storing not just a

[923:44]

name or a number but an email address or

[923:47]

a home address or any number of other

[923:49]

values. How? Well, the commas just

[923:51]

separate the key value pairs now. So, if

[923:53]

I do have email addresses for us, I can

[923:55]

put comma quote unquote email colon like

[923:59]

and I can just keep adding these key

[924:01]

value pairs to each of the dictionaries

[924:03]

because a dictionary is a collection of

[924:05]

key value pairs. So it stands to reason

[924:07]

that I can associate name with David,

[924:09]

number with the number, email with

[924:10]

malan@harvard.edu and so forth, effectively

[924:13]

implementing this idea now in the

[924:16]

computer's memory. And at the risk of

[924:18]

significantly oversimplifying, this is

[924:20]

what Google and Microsoft and Apple are

[924:22]

doing with their spreadsheet software.

[924:23]

They have written code that presents to

[924:25]

you a nice table with a graphical user

[924:27]

interface on the screen, but underneath

[924:28]

the hood, what they effectively have is

[924:30]

lists of dictionaries representing each

[924:33]

of those rows. And we're going to come

[924:34]

back to this when we start experimenting

[924:36]

before long with our own databases.

[924:37]

Going to get back rows of data from

[924:39]

databases. We are going to store that

[924:41]

data in lists of dictionaries for the

[924:43]

same reason as well. So, how can we use

[924:45]

this? Well, let me hide my terminal for

[924:47]

a second and tweak the program just a

[924:48]

little bit. I'm still going to get the

[924:50]

name of a person to look up their

[924:52]

number. I'm still going to uh how about

[924:56]

iterate over this because I've lost the

[924:58]

ability at least for now to just ask a

[925:01]

question like is this name in the

[925:02]

structure because it's a list I do now

[925:04]

need to iterate a little bit

[925:06]

differently. So I'm going to do for each

[925:08]

person in the people list go ahead and

[925:11]

check is the current person's name equal

[925:15]

to the name I'm looking for and if so go

[925:18]

ahead and create a variable called

[925:20]

number. set it equal to that person's

[925:23]

number and then go ahead and print out

[925:26]

for instance found colon then in my

[925:29]

curly braces that specific number and

[925:32]

then after all that break out of this.

[925:34]

So this is a mouthful but recall that

[925:37]

it's all the same syntax we've seen

[925:39]

before in smaller parts. Square brackets

[925:41]

and square brackets means here comes a

[925:43]

list. What are the elements of this

[925:44]

list? dict dict three dictionaries back

[925:48]

to back to back each of which has a key

[925:51]

and a value and a key and a value called

[925:53]

name and number respectively. The second

[925:55]

one temporarily has name and number and

[925:58]

email as keys plus three values and the

[926:00]

third one has keys of name and number as

[926:03]

well with their corresponding value. So

[926:06]

when I iterate over each person in the

[926:08]

people list that means on each iteration

[926:11]

person is going to be set to this

[926:13]

dictionary then this dictionary then

[926:16]

this dictionary on each iteration I'm

[926:18]

asking this question is that current

[926:20]

person's name key uh is rather is the

[926:24]

value of that person's name key equal to

[926:27]

the name I'm looking for and if so grab

[926:30]

a variable called number set it equal to

[926:32]

the value of that person's number key

[926:35]

and then just print it out. And if we

[926:37]

wanted email instead, I tweak the word

[926:40]

uh number to email. If I want to look up

[926:42]

anything else, you can tweak that code

[926:43]

there. But being able to index into

[926:46]

dictionaries using strings is sort of

[926:48]

the fundamentally powerful new technique

[926:52]

that we have here.
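A sketch of the list-of-dictionaries version. The find function, its key parameter, and the use of return in place of break are assumptions for illustration; the lecture loops at the top level, prints the number, and then breaks.

```python
# A list (square brackets) whose elements are dicts (curly braces),
# one dict per row of the spreadsheet
people = [
    {"name": "Kelly", "number": "+1-617-495-1000"},
    {"name": "David", "number": "+1-617-495-1000"},
    {"name": "John", "number": "+1-949-468-2750"},
]


def find(name, key="number"):
    # On each iteration, person is one whole dictionary
    for person in people:
        if person["name"] == name:
            return person[key]  # returning ends the loop, like break
    return None  # nobody by that name


print(f"Found: {find('David')}")
```

If each dictionary also had an "email" key, find("David", key="email") would return that value instead.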

[926:56]

Question now on any of this? Yeah.

[926:59]

>> If both

[927:03]

>> Good question. If you wanted both name

[927:05]

and number on the screen, do you

[927:06]

concatenate? Sure, you could do that. Or

[927:08]

print them out by passing a comma into

[927:09]

the print function and printing one out

[927:11]

each way. Absolutely. However you want

[927:13]

to format it. And actually, just as an

[927:15]

aside too, even though this becomes a

[927:16]

little less readable, this is a little

[927:18]

silly that on line 11, I'm declaring a

[927:20]

variable called number only to use it

[927:22]

one line later and then never again.

[927:24]

Technically with those curly braces and

[927:26]

format strings, I could just take this

[927:28]

code on the right, plug it into those

[927:30]

curly braces and get rid of this

[927:32]

variable altogether. Just at some point

[927:34]

though, f-strings start to get a little

[927:36]

too hard to read with quotes inside of

[927:38]

quotes. And so like I kind of prefer

[927:40]

being a little more pedantic about it

[927:42]

and explicitly putting it in a variable

[927:44]

and then interpolating just that

[927:46]

variable. But you could do it in

[927:47]

different ways still.
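The two styles just compared might look like this. The single person dictionary is a made-up stand-in for one element of the people list.

```python
person = {"name": "David", "number": "+1-617-495-1000"}

# Style 1: a temporary variable keeps the f-string easy to read
number = person["number"]
with_variable = f"Found: {number}"

# Style 2: index inside the curly braces. Single quotes are used for the
# key so they do not clash with the double quotes around the f-string.
inline = f"Found: {person['number']}"

print(with_variable)
print(inline)
```

Both lines print the same text; the choice is a matter of readability.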

[927:50]

All right, couple final features of

[927:52]

Python that'll get us on our way with

[927:54]

doing other things. Turns out there's a

[927:56]

whole bunch of libraries that come with

[927:57]

the language itself that you nonetheless

[927:59]

have to import. Even though they're not

[928:01]

third party, you didn't have to install

[928:02]

them. You just need to add them to your

[928:05]

code by importing them. One of them is

[928:06]

sys. And among the things that the sys

[928:08]

library has in Python is the ability to

[928:11]

give you access to command line

[928:13]

arguments. After all, we've lost access

[928:15]

to command line arguments because

[928:16]

there's no more main, at least by

[928:18]

convention. There's no int main void.

[928:19]

There's no int main argc argv stuff going

[928:22]

on in our code. But all of that

[928:24]

functionality is still available in a

[928:26]

library called uh sys. So how do we use

[928:29]

this? Well, let me go back to VS Code

[928:31]

here now. Let me create a relatively

[928:33]

simple program called greet.py. Similar

[928:35]

to a few weeks back that's just going to

[928:37]

greet the user using command line

[928:39]

arguments instead of get string or the

[928:41]

input function. I'm going to do this by

[928:42]

saying from the sys library import argv.

[928:46]

In this case, argv is essentially just a

[928:49]

list. It is a list of the command line

[928:51]

arguments that the human has typed. It's

[928:52]

a list, which means you can just ask the

[928:55]

length function len what its length is.

[928:58]

So, there's no need for argc anymore. You

[928:59]

can just literally ask argv how long it

[929:02]

is, which is kind of nice. So, I'm going

[929:04]

to say this. If the length of argv

[929:07]

uh equals 2, which means the human typed

[929:11]

two words at the prompt. Okay, let's go

[929:13]

ahead and greet them assuming that's

[929:14]

their name and say hello,

[929:17]

and then whatever their name is. Let me

[929:19]

make this a format string. And to be

[929:21]

pedantic, let me create a variable

[929:22]

called name and set it equal to argv

[929:25]

bracket 1, which is going to be the

[929:27]

second word that the human typed in, as

[929:29]

has been our convention in the past.

[929:31]

Else, if they didn't type exactly two

[929:33]

command line arguments, let's just go

[929:34]

ahead and print out something like hello

[929:37]

world as generic. Let me run python of

[929:39]

greet.py. Enter. And you see hello world

[929:43]

because I apparently did not type in

[929:45]

exactly two words and yet I did. So

[929:48]

let's see where this is going. Let me

[929:49]

rerun Python of greet.py but type in my

[929:52]

name David at the command line. Enter.

[929:55]

And huh I screwed up unintentionally.

[929:58]

What did I do wrong? All right. printf

[930:01]

is not a thing. So that's an easy fix.

[930:03]

Let's delete it. Let me clear my

[930:05]

terminal window. Rerun python of

[930:06]

greet.py space David. Enter. And now I

[930:09]

get hello David. The only thing that's

[930:11]

weird here is that I typed in three

[930:14]

words at the prompt and yet I'm checking

[930:16]

for two. And it's a bit subtle, but with

[930:19]

Python and argv, it ignores the Python

[930:22]

interpreter. It goes without saying that

[930:23]

you're using the Python interpreter to

[930:25]

run a Python program. So the only things

[930:26]

that are being counted are the words

[930:28]

after the Python interpreter itself. So

[930:32]

when I type greet.py and David, that's

[930:34]

two. When I only typed greet.py, that's

[930:37]

one instead.
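A sketch of greet.py. The greeting function and its args parameter are assumptions added so the logic can be exercised with made-up argument lists; the lecture checks len(argv) directly at the top level.

```python
from sys import argv


def greeting(args):
    # args[0] is the script name (e.g. greet.py); the word "python"
    # typed before it is not included in the list
    if len(args) == 2:
        name = args[1]
        return f"hello, {name}"
    else:
        return "hello, world"


print(greeting(argv))
```

Running python greet.py David makes argv equal to ["greet.py", "David"], so its length is 2 even though three words were typed at the prompt.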

[930:40]

All right. So now that I've done that, I

[930:42]

have access to my command line

[930:44]

arguments. Again, what about my exit

[930:45]

statuses? This was getting a little low

[930:47]

level, but in recent C programs, we've

[930:48]

had you all returning zero on success,

[930:51]

returning one on error. Can we still do

[930:53]

that? Well, yes. And in fact, the sys

[930:55]

library is used for that as well. So if

[930:57]

I want to actually add some exit

[930:59]

statuses to a program to facilitate

[931:01]

check50 and automated tests in the real

[931:03]

world, I can do that with a program

[931:05]

called let's call this uh exit.py. And

[931:08]

in exit.py, I'm similarly going to

[931:11]

import uh sys but in a different way.

[931:13]

I'm going to give myself access to

[931:17]

well yes let's go ahead and import the

[931:19]

whole library just to demonstrate how

[931:21]

you can access things inside of it

[931:23]

without explicitly saying from sys

[931:25]

import such and such as before if uh the

[931:29]

length of sys.argv. So this is a

[931:34]

little bit different, but I'm asking the

[931:35]

same kind of question. Does not equal

[931:37]

two. I want to go ahead and print out to

[931:39]

the user missing command line argument,

[931:42]

which is something we did a while back

[931:44]

as well. And then I want to exit with

[931:46]

code one. sys.exit

[931:49]

one else. If I don't run into that

[931:52]

issue, I'm going to go ahead. Actually,

[931:53]

let's not even bother with an else.

[931:55]

Let's for parity with our C version,

[931:57]

let's do this. print f quote unquote

[932:00]

hello

[932:02]

uh sys.argv bracket one close quote

[932:07]

sys.exit zero. All right, that's a

[932:09]

whole mouthful, but what's really going

[932:11]

on? So, I could have done from sys

[932:14]

import argv, but I don't need to

[932:17]

enumerate every single variable or every

[932:20]

single function that I want from a

[932:22]

library. I can also just more generally

[932:23]

say import the whole library. Give me

[932:25]

access to everything and then I'll tell

[932:27]

you what I want from it later.

[932:28]

Therefore, on line three, I can still

[932:31]

access argv. I just have to scope it to

[932:33]

the sys library. So that I say sys.argv

[932:36]

not argv means go inside of that library

[932:38]

and find me argv, as opposed to assigning

[932:40]

it to a variable unto itself in my own

[932:42]

code. Why am I saying not equal to two?

[932:44]

Well, if they don't give me two words uh

[932:47]

after the interpreter's name, I want to

[932:49]

yell at them and say missing command

[932:50]

line argument and then exit one. I'm not

[932:52]

going to give them a default hello world

[932:53]

anymore. I want them to give me their

[932:54]

name. Meanwhile, if I get this far and I

[932:57]

haven't exited from the program, I can

[932:58]

print out sys.argv bracket one, which is

[933:01]

going to be David in the example I typed

[933:03]

before. And this means success. So

[933:05]

sys.exit

[933:07]

zero signifies success. It's more syntax

[933:10]

than before uh than it was in C, but we

[933:13]

have the exact same functionality

[933:15]

available to us as we have in the past.
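The exit.py program dictated above corresponds roughly to the sketch below. Wrapping the logic in a function named run that takes the argument list as a parameter is an assumption made here so it can be exercised safely; in the lecture the same statements sit at the top level of the file and use sys.argv directly. The exact message text is also approximate.

```python
import sys


def run(argv):
    """Exit with status 1 if the name is missing, otherwise greet and exit with 0."""
    # Two items are expected: the script name and the user's name.
    if len(argv) != 2:
        print("Missing command-line argument")
        sys.exit(1)  # nonzero status signals an error
    print(f"hello, {argv[1]}")
    sys.exit(0)  # zero status signals success
```

In an actual script the last line would be `run(sys.argv)`, and the status can then be inspected from the shell with `echo $?`.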

[933:18]

How about one other example that we've

[933:20]

had in the past. Let's convert it to

[933:21]

Python as well. So you have a few more

[933:23]

tools in your toolkit. How about

[933:25]

implementing a version of this phone

[933:26]

book that actually persists? So instead

[933:29]

of hard coding into it Kelly and David

[933:31]

and John in this way, let's actually let

[933:33]

the user type in a name and a number

[933:35]

just like on your iPhone or Android

[933:36]

phone and add it to a text file like a

[933:38]

CSV file as we did before uh using

[933:41]

comma-separated values. Well, it turns

[933:43]

out that Python comes with a library to

[933:46]

handle CSV files. We don't need to

[933:48]

hackishly implement our own CSV support

[933:50]

by printing the commas ourselves.

[933:52]

Instead, we can import the CSV library.

[933:55]

We can then create say a variable called

[933:57]

file set it equal to open and open a

[933:59]

file called phonebook.csv

[934:01]

in append mode. So this is almost the

[934:04]

same as C except it's open instead of

[934:06]

fopen which we saw a couple of weeks back.

[934:08]

Now let's ask the user via the input

[934:10]

function for the name they want to add

[934:12]

to their contacts and the number that

[934:14]

they want to add to their contacts. And

[934:16]

then after that, let's go ahead and

[934:19]

do this, which is a bit of uh muscle

[934:22]

memory to remember, but I'm going to

[934:24]

create a variable called writer, but I

[934:26]

could call it anything I want. Set it

[934:28]

equal to csv.writer,

[934:30]

which means there's a function called

[934:32]

writer in the CSV library that I'm

[934:34]

simply accessing it because I didn't

[934:36]

import it explicitly by name. And I'm

[934:38]

going to pass it that file. This tells

[934:40]

Python, turn that file into a CSV that

[934:44]

can be written to. The next line of

[934:46]

code, I'm going to literally say

[934:48]

writer.writerow.

[934:49]

writerow is a method aka function

[934:53]

associated with this writer object. And

[934:56]

I know that only because I did actually

[934:57]

read the documentation uh for the CSV

[935:00]

library. What do I want to write? Well,

[935:03]

I want to write a list of values, namely

[935:06]

a name and a number. And I'm using

[935:08]

square brackets to tell the writerow

[935:10]

function that here you go. Here's a list

[935:12]

of values, two of them, a name and a

[935:15]

number. After all that, I'm going to do

[935:17]

file.close and just close the whole

[935:19]

file. All right, so where does this

[935:21]

actually get me? Well, let me go ahead

[935:23]

and open up phonebook.csv, which is

[935:25]

initially empty. I'll move this over to

[935:27]

the right hand side.
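The first version of phonebook.py dictated above corresponds roughly to this sketch. In the lecture the name and number come from input; here they are parameters of a hypothetical add_contact function so the code can run non-interactively. Passing newline="" to open is an addition recommended by the csv module's documentation, not something stated in the lecture.

```python
import csv


def add_contact(path, name, number):
    """Append one name,number row to the CSV file at `path`."""
    # "a" opens the file in append mode, creating it if necessary.
    file = open(path, "a", newline="")
    # Wrap the open file in a writer that knows how to format CSV rows.
    writer = csv.writer(file)
    # The row is passed as a list: one item per column.
    writer.writerow([name, number])
    # The file must be closed explicitly in this version.
    file.close()
```

Calling add_contact("phonebook.csv", name, number) once per run mirrors what each run of the lecture's program does.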

[935:29]

But when I now run this program with

[935:32]

Python of phonebook.py,

[935:35]

enter. I'll type in, say, Kelly's name.

[935:37]

Enter. + 1 6174951000.

[935:41]

Enter. And voila, it ends up in the CSV

[935:44]

using a little bit less code than we had

[935:46]

to last time with C. Let's run it once

[935:48]

more. And I'll type in my name. And I'll

[935:50]

again use + 1 617495

[935:52]

1000. Enter. It's being appended to that

[935:55]

file as well. And one last time for

[935:56]

John. Plus 1 9494682750.

[936:00]

Enter. Voila. So it's pretty easy. That

[936:03]

is to say in Python to start creating

[936:05]

files like this. But this isn't really

[936:07]

Pythonic. Let me in fact close the CSV

[936:09]

file, hide my terminal, and propose that

[936:12]

we can tighten up this code a bit too. I

[936:14]

don't need to open up the file way up

[936:16]

here. I can go ahead and get my

[936:18]

variables' values uh this way first. And

[936:21]

in fact, I could have done that code a

[936:22]

little later anyway, but I can do this

[936:24]

in Python. I can say with the following

[936:27]

file opened, phonebook.csv in

[936:30]

append mode and refer to it as a

[936:33]

variable called file. Do this stuff and

[936:36]

close the file yourself. So this program

[936:38]

is suddenly significantly shorter

[936:40]

because this one line has the effect of

[936:42]

opening the file for me in append mode,

[936:44]

assign it to a variable, do this stuff,

[936:46]

and then as soon as the program's

[936:48]

indentation ends and there's code over

[936:50]

here or no code whatsoever, the file

[936:52]

gets closed for me automatically. This

[936:54]

just helps us avoid like memory leaks

[936:56]

and like stupid mistakes we've made in C

[936:58]

because you forget to close a file that

[936:59]

you have to open and you don't

[937:00]

necessarily notice unless you run valgrind

[937:02]

or something on it. Python tries to

[937:04]

avoid this by giving you a new keyword

[937:06]

"with" that doesn't really make sense

[937:08]

semantically except with the following

[937:10]

file open and it will close the file for

[937:12]

you. So that's two among the features

[937:14]

that you sort of get with Python. The

[937:17]

catch though is that this CSV is fairly

[937:18]

simplistic. In particular, it's missing

[937:20]

a header row that actually indicates

[937:22]

what is in each of the columns. In fact,

[937:24]

if I go ahead and run code of

[937:26]

phonebook.csv, we'll see again that the

[937:29]

file contains just one row for Kelly,

[937:31]

for me, and for John. Whereas, ideally,

[937:33]

it would look a little something more

[937:34]

like this Google sheet version, which

[937:36]

actually has at the very first row

[937:38]

something say name and number, which

[937:39]

then describes the data therein, after

[937:41]

which are the three actual rows. Now,

[937:43]

the simplest fix here, frankly, would

[937:45]

probably be to just start with name,

[937:48]

comma, number at the top of the file and

[937:50]

then assume that my phonebook.py program

[937:52]

is just going to append, append, append

[937:55]

additional rows to the file containing

[937:57]

the names and numbers respectively. I

[937:59]

could have done that from the get-go.
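The tightened-up version that uses the with keyword can be sketched as follows. As before, add_contact and its parameters are hypothetical stand-ins for the lecture's input calls, and newline="" is an addition from the csv documentation. The sketch only appends, so a header line typed at the top of phonebook.csv beforehand stays the first row.

```python
import csv


def add_contact(path, name, number):
    """Append one row; the file is closed automatically when the block ends."""
    # `with` closes the file once the indented block finishes,
    # even if writerow raises an exception.
    with open(path, "a", newline="") as file:
        writer = csv.writer(file)
        writer.writerow([name, number])
```

No call to file.close appears because leaving the indented block does that work.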

[938:00]

And in fact, that would be better than

[938:02]

putting some code inside of phonebook.py

[938:04]

that writes out that specific row

[938:06]

because after all, if I'm

[938:07]

running this program again and again, I

[938:09]

don't want the header row to appear

[938:11]

again and again and again unless I

[938:13]

complicate the program a little bit to

[938:15]

ensure that I only do that once. But

[938:16]

assuming that I do go into phonebook.csv

[938:19]

and from the get-go do have a file that

[938:21]

contains name and number, we can

[938:23]

actually start to improve upon the

[938:25]

implementation of phonebook.py

[938:27]

because we can take advantage of the

[938:28]

fact that my

[938:32]

writer can actually read that same

[938:34]

header. In fact, let me put these files

[938:36]

side by side here. And then in

[938:38]

phonebook.py, let's go ahead and transition

[938:40]

away from using a writer to using a

[938:42]

so-called dictionary writer or DictWriter

[938:45]

for short. Capital D, capital W.

[938:47]

And then let me go ahead and specify one

[938:49]

additional argument to this particular

[938:52]

function, namely fieldnames, which I

[938:54]

know exists because I looked it up in

[938:55]

the documentation. And the value of this

[938:57]

argument is supposed to be a list of the

[938:59]

fields that are presumed to exist in the

[939:01]

CSV that we're about to write to. So I'm

[939:03]

going to do quote unquote name, quote

[939:05]

unquote number. Line's a bit long, so

[939:07]

it's scrolling there. But if I scroll

[939:09]

back to the left, we'll see that the

[939:10]

line is otherwise unchanged. But when I

[939:12]

go down now to write each respective

[939:14]

row, notice that I don't have to rely on

[939:17]

this list which just assumes somewhat

[939:20]

naively that name will always be in the

[939:22]

first column or column zero and number

[939:24]

will always be in the second or column

[939:25]

one. After all, if someone were to move

[939:27]

that data around, at least in the

[939:29]

spreadsheet using Excel or Google Sheets

[939:31]

or something else, my code would end up

[939:32]

being fairly fragile because at the

[939:34]

moment it's just assuming blindly that

[939:36]

name goes first followed by number. But

[939:39]

once we have that header row in there

[939:40]

and tell DictWriter about it, we can

[939:43]

actually now pass in not a list but an

[939:45]

actual dictionary of key value pairs and

[939:47]

let the dictionary writer figure out

[939:49]

where in the file which column those

[939:52]

values should go in. So inside of this

[939:53]

dictionary, I'm going to have one key

[939:55]

called name, the value of which is

[939:56]

indeed the name the user typed in. The

[939:58]

second key of which is going to be quote

[940:00]

unquote number, the value of which is

[940:01]

the number that the user typed in. And

[940:03]

let me go back actually now and fix a

[940:04]

typo from earlier. We're only asking the

[940:06]

user for one number. So all this time I

[940:08]

should have just requested one number

[940:09]

aesthetically with my input function

[940:11]

there. Now notice I have the file ready

[940:15]

to go. Indeed name and number are there

[940:17]

that matches the field names I've

[940:18]

provided to my code and it matches the

[940:21]

key value pairs that I'm subsequently

[940:23]

passing to writerow. So let's go ahead

[940:24]

and give this a try. Let me go ahead and

[940:26]

run again with this otherwise empty CSV

[940:29]

file, save for the header, uh phonebook.py

[940:32]

with uh Python of phonebook.py. Enter.

[940:36]

I'm going to now go ahead and type in

[940:37]

say the first name which was Kelly

[940:38]

before plus 1 617495

[940:41]

1000 and watch what happens at top

[940:43]

right. Kelly and her number end up in

[940:46]

the file even though I didn't actually

[940:47]

specify explicitly as with a list or

[940:50]

numeric indices which value goes where.
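The DictWriter version just demonstrated can be sketched like this. The add_contact function and its parameters are again hypothetical stand-ins for the lecture's input calls. One detail worth stating as a caveat: csv.DictWriter takes its column order from the fieldnames argument rather than from the header already in the file, so the two need to agree.

```python
import csv


def add_contact(path, name, number):
    """Append one row, letting DictWriter place each value in its column."""
    with open(path, "a", newline="") as file:
        # fieldnames lists the columns in the order they appear in the file.
        writer = csv.DictWriter(file, fieldnames=["name", "number"])
        # Keys are matched to columns by name, so their order here is irrelevant.
        writer.writerow({"number": number, "name": name})
```

Even though the dictionary above lists number before name, the name still lands in the first column.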

[940:52]

Let's run it once more and put in myself

[940:54]

again. Plus 1 617495

[940:57]

1000. Enter. And there again I am. And

[941:00]

lastly, just for good measure, let's go

[941:02]

ahead and put John back in the file with

[941:03]

plus one 949-468-2750,

[941:07]

which if you still haven't called or

[941:09]

texted, do feel free enter. And voila,

[941:11]

in phonebook.csv, we have all of those

[941:13]

same rows and code that's a little more

[941:15]

resilient now against any changes we

[941:17]

might subsequently make there, too. All

[941:19]

right, how about now some final

[941:21]

flourishes using some other features of

[941:23]

Python that we did see a glimpse of some

[941:25]

time ago, namely the ability to install

[941:27]

libraries of our own choice. So, up

[941:29]

until now in CS50.dev, we CS50 have

[941:32]

pre-installed most of what you need,

[941:34]

including back in week uh the earliest

[941:36]

weeks of the class when we had that cowsay

[941:39]

program that I wrote that was using a

[941:40]

third-party library that I had installed

[941:42]

into my code space in advance. Well, you

[941:44]

can use a program called pip to install

[941:48]

Python packages into your own code space

[941:50]

and if using your own Mac and PC onto

[941:52]

your own Macs and PCs as well if those

[941:54]

libraries are freely available as open

[941:56]

source online and in the repository from

[941:59]

which the Python uh pip program actually

[942:02]

draws. Let me go back to VS Code and let

[942:04]

me go ahead and create a new program

[942:06]

called cow.py. And with this program,

[942:08]

I'm going to go ahead and import that

[942:10]

library cowsay. And after that, I'm going

[942:12]

to call cowsay.cow

[942:14]

quote unquote say this is CS50 to have a

[942:17]

cute little cow on the screen say

[942:18]

exactly that. Now, in a previous

[942:20]

lecture, I had pre-installed this

[942:21]

library. But suppose I had forgotten to

[942:23]

do so today. Let's see what other type

[942:26]

of error we'll see on the screen. Well,

[942:27]

let me go ahead and run Python of

[942:29]

cow.py. Enter. And there's another one

[942:31]

of those tracebacks. This one's a

[942:33]

little more straightforward than the

[942:34]

NameError and the ValueError we saw in

[942:36]

the past. This is literally a

[942:38]

ModuleNotFoundError. No module named cowsay. Well,

[942:41]

this is where the pip command comes in.
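The error just described can be reproduced and handled in a short sketch. The cow_says helper and its hint message are assumptions introduced here; the lecture's cow.py simply imports cowsay at the top and calls cowsay.cow directly. The import is placed inside the function only so that the missing-module case can be caught and shown.

```python
def cow_says(message):
    """Print `message` in a cow's speech bubble if cowsay is installed."""
    try:
        # Third-party library; raises ModuleNotFoundError if not installed.
        import cowsay
    except ModuleNotFoundError:
        print("No module named cowsay; try: pip install cowsay")
        return False
    # cowsay.cow prints the ASCII-art cow with the given text.
    cowsay.cow(message)
    return True
```

Calling cow_says("This is CS50") prints the hint before the library is installed and the cow afterward.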

[942:43]

If something hasn't been pre-installed

[942:45]

uh for you in cs50.dev or in the real

[942:47]

world on whatever system you're using,

[942:49]

you can use pip install cowsay and

[942:52]

assuming you've spelled it correctly and

[942:54]

assuming the library is publicly

[942:55]

available, hitting enter will result in

[942:57]

pip automatically downloading the latest

[942:59]

version, installing it in this case into

[943:01]

your code space and solving hopefully

[943:03]

that problem. Let me clear my terminal

[943:05]

window, run python of cow.py again.

[943:08]

Definitely cross my fingers. And there

[943:10]

is the most adorable cow. And if we full

[943:12]

screen the terminal, we'll see that he's

[943:13]

indeed saying this is CS50. Now, that's

[943:16]

just one of the things we can install

[943:18]

with pip. I could also install

[943:20]

libraries onto my own Mac and PC. In

[943:23]

fact, in just a moment, I'm going to

[943:24]

switch over to another computer here

[943:25]

where I have a terminal window open on

[943:27]

my own actual Mac. And I'm doing this

[943:29]

because I'd like to play around with

[943:30]

some speech uh some text-to-speech uh

[943:33]

library functionality which you can't

[943:35]

really do in cs50.dev because it's

[943:37]

browser-based and when you run code in

[943:39]

the cloud it's not going to pass the

[943:40]

audio along to your speakers on your

[943:42]

laptop or desktop. But if I'm running

[943:44]

Python and my own code on my own

[943:47]

computer, a Mac in this case, or a PC in

[943:49]

someone else's case, I can install that

[943:51]

kind of library, text to speech, and

[943:53]

have my own code on my own computer, use

[943:56]

my own speakers to verbalize some string

[943:59]

quite like that. So, how can I go about

[944:01]

doing this? Well, having read some

[944:03]

documentation, I'm going to go ahead and

[944:04]

install with pip a library called pyttsx3,

[944:08]

uh Python text to speech version 3.

[944:11]

hitting enter goes and finds and

[944:13]

downloads as needed the uh the library

[944:15]

if it's not already installed and then

[944:17]

brings me back to my terminal and I'm

[944:18]

going to use an older school program

[944:20]

here called Vim or vi to actually

[944:22]

implement a cow program on this computer

[944:24]

whereby I'm going to go ahead and write

[944:26]

some code using this library without VS

[944:28]

code but with just another text editor

[944:30]

instead to do this at the very top of my

[944:32]

file I'm going to import this library

[944:35]

called Python text-to-speech so pyttsx3

[944:40]

for version three and then I'm going to

[944:41]

use only three lines of code to

[944:43]

synthesize some voice. I'm going to say

[944:45]

a variable called engine. Set it equal

[944:47]

to pyttsx3.init

[944:50]

because the documentation taught me that

[944:51]

I need to initialize the library the

[944:53]

first time I use it. I can then use this

[944:55]

variable called engine to actually say

[944:57]

something quite like Scratch albeit

[944:59]

verbally instead of pictorially like

[945:01]

this is CS50 quote unquote. And then

[945:04]

lastly I can use engine.runAndWait

[945:08]

similar to some Scratch block so that

[945:10]

the whole expression is actually

[945:11]

verbalized before my program actually

[945:13]

quits. Now, the first time I run this,

[945:14]

it might take a moment for the library

[945:16]

indeed to initialize itself. But on my

[945:18]

own Mac here, I'm going to run Python of

[945:19]

cow.py. If we could raise the volume

[945:21]

just a little bit, hopefully we'll not

[945:23]

see but hear this cow's greeting.

[945:27]

>> This is CS50.

[945:29]

It was very much in a rush to say it,

[945:31]

but after initializing for that long.

[945:32]

And if we ran it again and again and

[945:34]

added some optimizations, we could get

[945:35]

it talking much more quickly than that.
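The three lines of speech code described above can be sketched as follows. The speak function is a hypothetical wrapper added here, and the import sits inside it only so that the file still loads on a machine without the library or without speakers; the lecture's version puts the same statements at the top level of cow.py.

```python
def speak(text):
    """Say `text` aloud through the computer's speakers using pyttsx3."""
    # Third-party library, installed with: pip install pyttsx3
    import pyttsx3

    engine = pyttsx3.init()  # initialize the speech engine
    engine.say(text)         # queue the string to be spoken
    engine.runAndWait()      # block until the speech has finished
```

On a computer with audio output, speak("This is CS50") would reproduce the demonstration.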

[945:37]

But we now have a version of the program

[945:38]

that indeed verbalizes what string or

[945:40]

str it is that I've passed into it

[945:43]

here.

[945:43]

>> CS15.

[945:45]

>> It's really in a rush to finish there.

[945:46]

All right. But let's try one final

[945:47]

flourish of another library that's fun

[945:49]

to play around with, if only because

[945:51]

it'll motivate some of the things you

[945:52]

can now do in Python yourself. Let me go

[945:55]

into VS Code in my code space because

[945:57]

this one does not require my speakers.

[945:59]

I'll close that first version of the cow

[946:01]

and I'm going to go ahead and create a

[946:02]

QR code generator after installing with

[946:05]

pip uh a library called qrcode which I

[946:09]

read about online and now it's installed

[946:11]

in my code space. I'm going to now go

[946:12]

ahead and create a file called uh qr.py.

[946:16]

So let's go ahead and code up qr.py and

[946:18]

I want to generate my own QR codes. Most

[946:20]

of you are in the habit if

[946:21]

you've ever generated a QR code before,

[946:23]

you probably just Google around for some

[946:24]

generator online for which someone else

[946:26]

wrote code to generate the QR code. But

[946:28]

I can do that for myself and actually

[946:30]

generate my own images. I'm going to go

[946:32]

ahead and import the library that I just

[946:34]

installed. Import qrcode. And then

[946:37]

below that, I'm going to create a

[946:38]

variable called for instance image and

[946:40]

set that equal to this library's qrcode.make

[946:43]

function. No relation to the make that

[946:46]

we use for C. And I'm going to make a QR

[946:48]

code containing a URL maybe of one of

[946:50]

the lecture videos. So let's do

[946:51]

https://youtu.be,

[946:55]

the short version and then xvfz

[946:58]

j5

[947:00]

p g

[947:03]

uh gg0 if I got that just right. Then

[947:08]

after that I'm going to go ahead and

[947:09]

call image.save to save that URL as a

[947:12]

file called qr.png

[947:14]

quote unquote. And then PNG will be the

[947:16]

format which is portable network graphic

[947:18]

which is akin to a JPEG or a GIF but

[947:20]

with different features. I'm just going

[947:21]

to double check my writing here. So we

[947:23]

go to the right lecture video and I

[947:25]

think we are indeed good. And what that

[947:28]

should do after running my code is leave

[947:30]

me with today's final flourish a PNG

[947:33]

file in my code space that when open is

[947:36]

going to be a QR code that you can scan

[947:37]

with your phone. So if you'd like to get

[947:39]

ready for this final flourish I'm going

[947:40]

to go ahead and run Python of qr.py and

[947:43]

hit enter. Thankfully, it worked. I'm

[947:45]

going to now open up qr.png

[947:49]

and close my terminal window. And for

[947:51]

our final moments together this here in

[947:52]

week six, after which we'll ultimately

[947:54]

transition to yet more languages and

[947:57]

problems to be solved, here is a final

[947:59]

code for you to scan of today's here

[948:02]

lecture.
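The QR code generator built in this segment can be sketched as follows. The make_qr function is a hypothetical wrapper, and the URL in the usage note is a placeholder, since the video address spoken in the lecture is not fully recoverable from the captions. The import sits inside the function so the file still loads where the third-party qrcode library is absent.

```python
def make_qr(url, path="qr.png"):
    """Encode `url` as a QR code and save it as an image file at `path`."""
    # Third-party library, installed with: pip install qrcode
    import qrcode

    # qrcode.make returns an image object for the given data.
    image = qrcode.make(url)
    # Write the image to disk; qr.png is the file name used in the lecture.
    image.save(path)
    return path
```

For example, make_qr("https://example.com/") would leave a scannable qr.png in the current directory.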

[948:08]

All right, that's it for today. We'll

[948:10]

see you next time.

[949:30]

All right. This is CS50 and this is

[949:34]

already week seven wherein we

[949:36]

introduce another programming language

[949:38]

this time known as structured query

[949:39]

language or S-Q-L or "sequel" for short. Now

[949:43]

SQL as we'll see is a different sort of

[949:45]

programming language that allows us to

[949:46]

solve like a lot of the same kinds of

[949:48]

problems that we've been dabbling with

[949:49]

over the past several weeks but arguably

[949:51]

in a lot of contexts it allows us to

[949:53]

solve those problems more easily.

[949:55]

Indeed, among the goals for today are to

[949:57]

demonstrate that sometimes there's

[949:58]

multiple tools that you can use to solve

[950:00]

the same problem, whether it's C or

[950:01]

Python or today's SQL. Um, but we'll

[950:04]

also see that uh SQL allows us a

[950:06]

different sort of approach to solving

[950:08]

problems. Whereas C very much so and

[950:11]

Python to a large extent are very much

[950:13]

procedural programming languages whereby

[950:15]

you have to write these procedures,

[950:16]

functions step by step that tell the

[950:18]

computer what to do including loops and

[950:20]

conditionals and all of that. SQL is

[950:22]

said to be a declarative programming

[950:24]

language which is a different sort of

[950:25]

paradigm whereby when you want to solve

[950:27]

some problem you essentially declare

[950:29]

what problem you want to solve or you

[950:31]

declare what question you have and it's

[950:33]

up to the programming language to figure

[950:34]

out using loops and conditionals and all

[950:36]

of those lower level building blocks how

[950:38]

to get you the answer. So ultimately

[950:40]

today is all about teaching you yet

[950:42]

another language mostly so that you can

[950:44]

learn again to teach yourself new

[950:46]

languages and to appreciate that once

[950:48]

you exit a class like CS50 and are out

[950:49]

there in the real world it really isn't all

[950:51]

that big a deal to pick up new

[950:53]

programming languages especially when in

[950:55]

advance you've seen different

[950:56]

programming paradigms like procedural

[950:59]

like object-oriented like today

[951:01]

declarative as well but today ultimately

[951:03]

is also about data and so to get us

[951:05]

started we thought we'd collect some

[951:06]

real world data by asking all of you a

[951:08]

couple of questions So, if on your

[951:10]

laptop or phone you would like to pull

[951:11]

up this URL here,

[951:14]

it will also exist in just a moment in

[951:16]

QR code form. So, if you'd like to go to

[951:18]

that URL there or simply scan this here

[951:21]

QR code with your phone, that's going to

[951:23]

lead you to a Google form. For those

[951:25]

unfamiliar, Google has lots of tools

[951:26]

among which are uh is a tool via which

[951:29]

you can ask people questions via forms.

[951:31]

Microsoft has something similar as well.

[951:34]

And at that URL, what you'll soon see is

[951:38]

a form that looks a little something

[951:41]

like this. Among those questions are

[951:43]

which is your favorite language, at

[951:45]

least among those we've studied thus

[951:46]

far. So go ahead and anonymously answer

[951:49]

the questions you see on this form.

[951:51]

You'll see which is your favorite

[951:52]

language and also which is your favorite

[951:55]

problem in problem sets thus far. And

[951:58]

meanwhile, as you might know, if you've

[952:00]

used Google forms yourself to collect

[952:02]

data, we can move from questions here to

[952:05]

actual responses. And as people start to

[952:07]

buzz in, we'll see that the data set

[952:08]

here is starting to update in real time.

[952:10]

And Google gives us these nice graphical

[952:12]

user interfaces or GUIs via which we

[952:14]

can analyze the data. And so far, Python

[952:15]

is easily the winner with 70% plus of

[952:18]

you preferring it. 11% of you uh wishing

[952:21]

we were still in Scratch and 18% of

[952:24]

you in C. And you'll see the responses

[952:26]

are coming in here. But for our purposes

[952:28]

today, what's more interesting than the

[952:29]

actual answers to these questions is how

[952:31]

we can get at the raw data. So among the

[952:34]

things you can do in Google Sheets is

[952:36]

quite literally click view in sheets,

[952:38]

which is in Google forms is click on

[952:40]

view in sheets. And what this is going

[952:41]

to allow me to do is access the

[952:43]

underlying raw data. Now, because Google

[952:45]

has forms and spreadsheets, they sort of

[952:46]

tied these two products together. But

[952:48]

what's especially nice about Google

[952:50]

spreadsheets is that I can also download

[952:53]

the raw data as a file. I can download

[952:55]

it as an Excel file, a text file, a PDF.

[952:57]

But for today, we're going to download

[952:58]

it in a very common format known as CSV

[953:01]

for comma-separated values. And indeed,

[953:03]

if I go to the file menu, download

[953:05]

comma-separated values. This is perhaps

[953:07]

the most uh straightforward, easiest way

[953:11]

to get raw data out of any kind of

[953:13]

tabular data like this to load it into

[953:15]

code that we are about to write. So, if

[953:17]

you haven't buzzed in already, that's

[953:18]

fine. But at this point in time, now

[953:20]

that I've clicked the button, I now have

[953:21]

a CSV file in my Mac downloads folder,

[953:24]

which if I go ahead and open up here, I

[953:26]

can see that indeed I've got this long

[953:28]

named file, favor-form responses 1.csv.

[953:31]

I'm going to shorten that file name to

[953:32]

just favorites.csv.

[953:34]

And what I'm going to go ahead and do is

[953:35]

open up VS Code. And in my file

[953:39]

explorer, I'm going to literally just

[953:40]

drag and drop favorites.csv from my Mac.

[953:43]

That's going to have the effect of

[953:44]

uploading the file as it was at that

[953:47]

moment in time so that we can now begin

[953:49]

to write some code using this file. And

[953:52]

VS Code has automatically gone ahead and

[953:54]

opened it up for me. And what you're

[953:55]

looking at here is what we're going to

[953:57]

start to call a flat file database. It's

[953:59]

a very lightweight database in the sense

[954:01]

that it stores a lot of data. And it's a

[954:03]

flat file in the sense that it's

[954:04]

literally just a text file. And by

[954:06]

convention, the way the data is stored

[954:08]

in this file is indeed by separating

[954:10]

values with commas. There are other

[954:12]

conventions as well, but CSV is probably

[954:13]

the de facto standard. But TSV is a

[954:15]

thing for tab separated values, PSV,

[954:18]

which is pipe separated values where you

[954:20]

might have a vertical bar. Essentially,

[954:21]

these file formats try to use a

[954:23]

character that might not appear in the

[954:25]

actual data so as to separate your rows

[954:27]

and columns. So indeed, if I switch back

[954:29]

to VS Code here and we take a look at

[954:31]

the data, you'll see that from Google

[954:33]

Sheets, I've been given three columns.

[954:35]

Timestamp, which was automatically

[954:36]

generated for me, the language, as well

[954:38]

as the problem. And what I see here is

[954:41]

that we had a few respondents buzz in a

[954:43]

little early. Uh very excited for

[954:44]

today's data. But here's the rest of

[954:45]

them from like 1:30 p.m. Eastern onward.

[954:47]

And you'll see separated via

[954:50]

commas are effectively three columns of

[954:52]

data. So everything before the first

[954:54]

comma represents a timestamp.

[954:55]

Everything between the first and second

[954:57]

comma represents the choice of language

[955:00]

that you all buzzed in with. And then

[955:01]

everything after the second comma

[955:03]

represents the problem. Now it's kind of

[955:05]

uh jagged edges. It doesn't line up in

[955:08]

nice rows and columns because some

[955:09]

answers are longer, some answers are

[955:10]

shorter, but the commas are sufficient

[955:12]

to tell the code we write where one

[955:14]

column ends and the next one begins. So,

[955:17]

how do we go about writing code like

[955:19]

this? If we'd now like to ask some

[955:20]

questions about the data, like what is

[955:22]

the most popular language? What is the

[955:24]

most popular problem? Or conversely, the

[955:26]

least of each of those. Well, we could

[955:28]

look at the original data in Google

[955:30]

forms and that's where we got the pie

[955:32]

chart. But how is Google figuring out

[955:34]

what the most popular answers are and

[955:36]

what uh pie charts it wants to depict?

[955:39]

Well, they probably wrote some code not

[955:41]

unlike what we're about to do. Although,

[955:43]

we'll start with just a command line

[955:45]

environment as always. So, within VS

[955:47]

Code, I'm going to go ahead and do this.

[955:49]

I'm going to go ahead and open up a

[955:51]

program called favorites.py. And let's

[955:53]

write a program whose purpose in life is

[955:54]

to open the CSV file, read it top to

[955:57]

bottom, left to right, and then crunch

[955:58]

some numbers, figure out what the most

[955:59]

popular answers are to those questions.
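A first step toward a program with that purpose might look like the sketch below. It assumes favorites.csv has the three columns described earlier (timestamp, language, problem) preceded by a header row. The languages helper and the choice to skip the header with next are assumptions made here, not taken from the lecture.

```python
import csv


def languages(path):
    """Return the language column (index 1) from every data row of the file."""
    # "r" is read mode, which is also the default.
    with open(path, "r", newline="") as file:
        reader = csv.reader(file)
        # Skip the header row; None avoids an error if the file is empty.
        next(reader, None)
        # Each row is a list of strings, one per column.
        return [row[1] for row in reader]
```

Printing each item of languages("favorites.csv") would show one respondent's answer per line.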

[956:02]

So, I'm going to go ahead and import a

[956:04]

package that comes with Python, a

[956:06]

library called the CSV library. And

[956:08]

nicely enough, this is just code that

[956:09]

someone else wrote years ago that

[956:10]

figures out how to read data from a

[956:12]

file, separating it via comma, so that

[956:14]

you and I don't have to write all of

[956:16]

that ourselves. Then, I'm going to use

[956:18]

this Pythonic convention with open quote

[956:21]

unquote favorites.csv

[956:23]

as file. Though, if I want to be super

[956:25]

explicit that I intend only to read this

[956:27]

file, which is the default, I'm going to

[956:29]

go ahead and explicitly say quote

[956:30]

unquote R, just like we did in C when

[956:33]

using fopen to open a file in read mode.

[956:36]

And now I'm going to do this. I'm going

[956:37]

to go ahead and say reader equals

[956:40]

csv.reader of

[956:41]

file. So, this is a Python convention

[956:44]

whereby the CSV library comes with a

[956:47]

function called reader that takes as its

[956:49]

sole argument here a file that has

[956:52]

already been opened. And what that

[956:53]

reader will do is figure out where all

[956:55]

of the commas are so that I can iterate

[956:57]

over this reader in a loop and get back

[957:00]

row after row after row without me

[957:03]

having to write all of the code to

[957:04]

figure out where those commas are. So

[957:06]

what I'm going to do in this loop here

[957:08]

in this block of code is: for each

[957:11]

row in that reader, let's go ahead and

[957:13]

just print out maybe the second column

[957:17]

which was the language column. So I'm

[957:19]

going to go ahead and say print row

[957:21]

bracket one because what we'll see is

[957:24]

that this reader which again comes with

[957:26]

Python hands me a list a list a list for

[957:31]

each of the rows wherein bracket zero

[957:34]

would represent the first column bracket

[957:36]

one would represent the second bracket

[957:37]

two would represent the third because

[957:39]

everything is zero indexed in Python.
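
A minimal sketch of the program as described so far. The sample rows and the exact header spelling are assumptions standing in for the real favorites.csv, and io.StringIO is used only so the sketch runs without a file on disk.

```python
import csv
import io

# Hypothetical sample standing in for favorites.csv; the real rows differ.
SAMPLE = """Timestamp,language,problem
10/20/2025 13:30:01,Python,Hello
10/20/2025 13:30:05,C,Mario
10/20/2025 13:30:09,Scratch,Starting from Scratch
"""

languages = []

# With the real file this would be: with open("favorites.csv", "r") as file:
with io.StringIO(SAMPLE) as file:
    reader = csv.reader(file)
    for row in reader:
        # Each row is a list; index 1 is the second column.
        languages.append(row[1])
        print(row[1])
```

Note that nothing here skips the first line of the data, so the header text is treated like any other row.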

[957:41]

All right so let's see what the effect

[957:43]

is here let me maximize my terminal

[957:44]

window, run python of favorites.py,

[957:47]

cross my fingers that I got this right,

[957:49]

and voila there is every language that

[957:52]

was selected by you all in the form from

[957:54]

top to bottom by default chronologically

[957:56]

but there's a bit of a bug I dare say

[957:59]

let me scroll up and up and up in this

[958:01]

output through all of these answers

[958:05]

until I get to the very top where I ran

[958:07]

the program myself which is here python

[958:10]

of favorites.py. There's a minor bug

[958:12]

here. What's the bug in the output?

[958:15]

Yeah,

[958:17]

>> yeah, it accidentally includes the

[958:18]

header, which is a bug in the sense that

[958:20]

I really just wanted to see the

[958:21]

languages, but the code is doing what I

[958:23]

told it to, which is just print out

[958:25]

every row. So, there's a few ways we

[958:27]

could ignore this. Let me go ahead and

[958:28]

minimize my terminal window and let me

[958:30]

go ahead and say, well, you know what?

[958:31]

after we create this reader, let's just

[958:33]

skip ahead with next(reader). Let's just skip to

[958:37]

the next row and ignore it effectively

[958:39]

and then begin iterating over everything

[958:41]

thereafter. And so what happens now is

[958:44]

if I remaximize my window, rerun python

[958:47]

of favorites.py

[958:49]

enter and now scroll up again to the

[958:52]

beginning of this incarnation of the

[958:54]

program. You'll see that the very first

[958:56]

thing I see after my program was run was

[959:00]

indeed Python, Python, Python, Python,

[959:02]

and so forth. No more quote unquote

[959:04]

language. So, how is that? Well, this is

[959:06]

a feature we haven't quite seen before

[959:10]

or talked about in much detail, but this

[959:12]

reader is stateful in some sense. And

[959:14]

this was actually true of all of the

[959:16]

file IO we did in C whereby when you

[959:18]

were using fread or some other function

[959:20]

to read data from the file something was

[959:22]

remembering where it was in the file so

[959:25]

that you didn't get the same bytes again

[959:27]

and again and again. It was more like

[959:30]

a cassette tape, an old school cassette

[959:31]

tape if you will, or a scrubber along

[959:34]

the bar along the bottom of, like, any

[959:36]

streaming video whereby when you just

[959:38]

read some data, it grabs the next chunk,

[959:39]

the next chunk, the next chunk, the next

[959:41]

chunk, and something inside of the

[959:43]

computer's memory remembers where it is.
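
The "remembers where it is" behavior can be seen in a small sketch. The sample data below is hypothetical; the only calls used are Python's built-in next and csv.reader.

```python
import csv
import io

# Hypothetical two-row sample; t1 and t2 are placeholder timestamps.
SAMPLE = "Timestamp,language,problem\nt1,Python,Hello\nt2,C,Mario\n"

reader = csv.reader(io.StringIO(SAMPLE))

# next() hands back one row and advances the reader past it.
header = next(reader)

# The loop resumes where the reader left off, so the header is not seen again.
remaining = [row[1] for row in reader]

# The reader is now exhausted; iterating again yields nothing.
leftover = list(reader)

print(header)
print(remaining)
print(leftover)
```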

[959:44]

So, this says skip to the next row. And

[959:47]

thus, when you do for row in reader,

[959:49]

you get everything but the first row

[959:51]

because the reader is stateful. It

[959:53]

remembers where it is in memory. All

[959:56]

right. Well, thus far this

[959:57]

isn't all that useful because all I'm

[959:58]

doing is just printing out the data. But

[960:01]

let's take a step toward making this

[960:02]

program a little more useful. In

[960:03]

particular, let's just be a little more

[960:05]

pedantic and specify that what I'm

[960:07]

really doing here inside of this loop is

[960:10]

figuring out what the current row's

[960:12]

favorite is. So, I'm going to create a

[960:14]

variable called favorite and set that

[960:16]

equal to row bracket one. And then even

[960:18]

though this doesn't change the

[960:19]

functionality, I'm going to print that

[960:21]

favorite just because semantically,

[960:23]

stylistically, it's nice to know what

[960:25]

row bracket one is, by defining a

[960:28]

variable that tells me or anyone else

[960:30]

who reads this code in the future what

[960:31]

it's actually doing. All right, but

[960:34]

readers are only so useful. And in fact,

[960:37]

if I were to open up this CSV file,

[960:38]

maybe in Microsoft Excel or Apple

[960:40]

Numbers or Google Sheets, again, you

[960:42]

could imagine someone kind of moving the

[960:44]

data by just dragging one of the columns

[960:45]

to the left or the right such that now

[960:47]

it's no longer timestamp, language,

[960:50]

problem. Maybe it's timestamp, problem,

[960:52]

language, or maybe timestamp is all the

[960:54]

way over to the right. You could imagine

[960:56]

therefore that the indices we're using, 0,

[960:58]

1, and 2, could be a little fragile

[961:00]

because if someone changes the data on

[961:02]

me now my code is just going to break

[961:04]

because I am blindly assuming that the

[961:06]

second column aka bracket 1 is going to

[961:10]

be the language column but that might

[961:13]

not be the case but there's an

[961:14]

alternative to this and you might recall

[961:16]

having seen this before. I'm going to go

[961:17]

into favorites.py and tweak my code a

[961:20]

little bit not just to use a reader but

[961:22]

a dictionary reader. So I'm going to

[961:23]

change this to DictReader instead of

[961:25]

just reader. And then the upside of

[961:27]

using a dictionary reader is that every

[961:30]

time I go through this loop reading row

[961:32]

by row by row, each row that I'm handed

[961:35]

by this reader is not going to be a list

[961:38]

anymore that's numerically indexed with

[961:40]

zeros and ones and twos. Each row is

[961:42]

going to be, as you might guess, a

[961:45]

dictionary, which is a collection of key

[961:47]

value pairs, which means now we can use

[961:49]

words as our indices instead of just

[961:51]

numbers. Which is to say if I switch

[961:53]

from reader, which gives me lists, to

[961:56]

DictReader, which gives me dictionaries, I can

[961:58]

change this line 10 now and say I

[962:01]

specifically want the language column

[962:04]

wherever it is all the way to the left

[962:06]

or the middle or the right. So in

[962:07]

general using a dictionary reader is

[962:09]

probably just going to be more robust

[962:11]

because it's resilient against changes

[962:13]

in that actual numeric ordering. All

[962:16]

right, let me pause here to see first if

[962:18]

there's any questions on this exercise

[962:20]

whose purpose in life is just to

[962:21]

demonstrate how we can download the CSV

[962:23]

data then iterate over it line by line

[962:26]

without actually analyzing it yet.
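
A short sketch of why the dictionary reader is more robust. Both samples are hypothetical, and the helper name languages is an illustration, not something from the lecture's file.

```python
import csv
import io

# Two hypothetical versions of the same data with the columns in a different order.
ORIGINAL = "Timestamp,language,problem\nt1,Python,Hello\nt2,C,Mario\n"
REORDERED = "Timestamp,problem,language\nt1,Hello,Python\nt2,Mario,C\n"


def languages(text):
    """Return the language column by name, wherever it sits in the file."""
    # DictReader consumes the header row itself and uses it for the keys.
    reader = csv.DictReader(io.StringIO(text))
    return [row["language"] for row in reader]


print(languages(ORIGINAL))
print(languages(REORDERED))
```

With a plain csv.reader and row[1], the second call would have returned the problem names instead.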

[962:31]

No. Okay. So let's ask maybe the most

[962:34]

natural question which is like how many

[962:37]

people prefer Python? How many people

[962:38]

prefer C or Scratch in turn? In other

[962:42]

words, how can we recreate in our own

[962:43]

code what Google Forms is doing for us

[962:46]

graphically with those pie charts? Well,

[962:48]

I think what we could do is write some

[962:50]

code logically that essentially relies

[962:52]

on this mental model. What I have here

[962:54]

is an opportunity to use a bunch of key

[962:56]

value pairs because if I want to know

[962:57]

how many instances of Python there are

[962:59]

and C and Scratch, well, those might as

[963:01]

well be three keys, the values of which

[963:03]

are hopefully going to be three numbers

[963:05]

that represent the counts of the

[963:07]

popularity of each of those languages.

[963:09]

So in memory, I essentially want to

[963:10]

construct something that looks like this

[963:12]

and would if I were doing this on a

[963:13]

chalkboard. But recall that this mental

[963:15]

model maps perfectly to the notion of a

[963:18]

Python dictionary because a dictionary

[963:20]

in Python is indeed key value pairs. And

[963:22]

we've seen it already because that's how

[963:24]

the dictionary reader works. But we

[963:26]

could certainly use our own uh

[963:28]

dictionaries to solve this same problem

[963:30]

ourselves. So the goal at hand is to

[963:32]

count the number of people who said

[963:34]

Python and C and Scratch respectively.

[963:37]

So how to do this? Well, I think what I

[963:39]

could do is Oh, and actually let me

[963:41]

delete this line. Because we are using a

[963:43]

dictionary reader, we no longer need to

[963:45]

skip the first row. It is automatically

[963:47]

consumed by the dictionary reader for

[963:49]

us. So, this now would be the better

[963:51]

version of the dictionary reader. Let's

[963:53]

go ahead and do this. Let me declare

[963:54]

some variables first that will store for

[963:57]

me the total number of people who said

[963:59]

Python, Scratch, and C respectively. So,

[964:02]

I could say scratch equals 0, c

[964:04]

equals 0, python equals 0. And I could

[964:07]

just set three variables equal to 0, 0,

[964:10]

and 0. If you haven't seen it before,

[964:11]

there are some Pythonic tricks you

[964:13]

can do here. If you've got three

[964:14]

variables that you want to initialize

[964:16]

all at once because it's that simple,

[964:17]

you could alternatively do scratch, c,

[964:20]

python equals 0, 0, 0. This too would

[964:23]

have the intended effect and it looks a

[964:25]

little better because it's all a simple

[964:26]

one-liner. But what do I want to do now?

[964:29]

Well, down here, let's go ahead and do a

[964:32]

simple conditional before we enhance

[964:34]

this by using an actual dictionary. Let

[964:36]

me go ahead and say if the current

[964:38]

favorite in that reader equals equals

[964:40]

scratch. Well, let's go ahead and

[964:43]

increment the scratch variable by doing

[964:45]

plus equals 1 as we saw last time. Else

[964:48]

if the favorite in the current row

[964:51]

equals equals quote unquote C. Well,

[964:53]

let's go ahead and then increment the C

[964:55]

variable by one. Else if the favorite

[964:59]

equals equals Python, then let's go

[965:01]

ahead and increment, with plus equals,

[965:04]

python by one instead. I could

[965:07]

technically get away with saying else

[965:09]

here, but I'm consciously this time not

[965:12]

trying to overoptimize this because if

[965:14]

someone changes the form maybe next

[965:16]

semester and whatnot and we're asking

[965:18]

about a fourth language, I wouldn't want

[965:19]

my code to assume that anything that

[965:21]

isn't Scratch or C must be Python when

[965:24]

there could be some future fourth

[965:25]

language. So, this is a little more

[965:26]

robust and in this case, we'll just

[965:28]

ignore anything that isn't Scratch or C

[965:30]

or Python. All right, at the end of

[965:32]

this, let's go ahead and not just print

[965:34]

out the favorite, but outside of the for

[965:35]

loop, let's go ahead and print out, for

[965:37]

instance, the Scratch count is this.

[965:41]

Then, let's go ahead and print out the C

[965:44]

count is this. And then let's print out

[965:48]

the Python count is this. But, of

[965:52]

course, there's a subtle bug here. Yeah.

[965:55]

Ah, so I didn't format these things as

[965:57]

f-strings. So I need the little f over here

[965:59]

to the left of each of these strings.
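
The three-variable version described above might look roughly like this. The sample rows, including the "Rust" row used to show an unrecognized answer being ignored, are assumptions made for illustration.

```python
import csv
import io

# Hypothetical sample; "Rust" stands in for some future fourth language.
SAMPLE = """Timestamp,language,problem
t1,Python,Hello
t2,C,Mario
t3,Python,Cash
t4,Scratch,Starting from Scratch
t5,Rust,Hello
"""

# Initialize all three counters in one line.
scratch, c, python = 0, 0, 0

reader = csv.DictReader(io.StringIO(SAMPLE))
for row in reader:
    favorite = row["language"]
    if favorite == "Scratch":
        scratch += 1
    elif favorite == "C":
        c += 1
    elif favorite == "Python":
        # elif rather than else, so unknown answers are not counted as Python.
        python += 1

# The f prefix is what makes the braces interpolate the variables.
print(f"Scratch: {scratch}")
print(f"C: {c}")
print(f"Python: {python}")
```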

[966:00]

All right, so let me go ahead and

[966:02]

maximize my terminal window, run Python

[966:04]

of this version of favorites.py, and

[966:06]

hopefully what we'll see is not every

[966:08]

row again and again and again, but three

[966:10]

lines of output, giving me the total

[966:12]

counts instead. All right, this seems to

[966:15]

line up with the rough percentages that

[966:16]

we saw coming in earlier on Google

[966:18]

Forms. 109 of you like Python, followed

[966:20]

by 58 of you in C, and 24 of you

[966:22]

preferring Scratch instead. All right,

[966:24]

but why does this perhaps rub you the

[966:27]

wrong way? I already alluded to the fact

[966:29]

that we're going to get rid of this, but

[966:31]

why is this not the best design just

[966:33]

using three variables like this? Yeah,

[966:36]

>> different categories.

[966:41]

>> Yeah, exactly. If we were to add a bunch

[966:43]

more languages, a fourth one, a fifth

[966:44]

one, a sixth one, a 10th one, a 20th

[966:46]

one, like having that many variables is

[966:49]

just certainly going to look unwieldy

[966:50]

and it just shouldn't

[966:52]

rub you the right way. At that point, we

[966:53]

should really be graduating to some

[966:55]

proper data structure, whether it was an

[966:57]

array in C or better still in Python, an

[966:59]

actual dictionary. So, let's do that

[967:01]

instead. Let me go ahead and in a newer

[967:03]

version of this file, let's get rid of

[967:05]

these individual variables and let's

[967:06]

just have a generic variable called

[967:09]

counts, for instance, and set it equal

[967:11]

to an empty dictionary. And just using

[967:13]

two curly braces will give me an empty

[967:15]

dictionary. Or if you want to be more

[967:17]

pedantic, you can actually call the dict

[967:19]

function, which will return to you an

[967:21]

empty dictionary. I'd argue though that

[967:23]

most people would probably just use the

[967:24]

double curly braces like this to

[967:26]

indicate that here comes a dictionary

[967:28]

for me. Now, how do I use this? Well, I

[967:31]

don't need to update three separate

[967:33]

variables. I think I could just do

[967:35]

something like this. I could say once

[967:37]

I've determined what the current row's

[967:39]

favorite value is for language, I could

[967:41]

say counts bracket favorite. So, use the

[967:45]

current string as an index into the

[967:47]

dictionary. So, it's going to be quote

[967:48]

unquote Scratch or C or Python, and then

[967:51]

just increment that by one. And then

[967:54]

down here, we don't have these variables

[967:56]

anymore. So, I'm going to go ahead

[967:57]

and instead say, how about this? We'll use

[968:00]

a loop for each favorite in those

[968:03]

counts. Let's go ahead and print out,

[968:07]

how about the favorite value and the

[968:10]

count thereof without any f-string formatting.

[968:14]

Okay. So the only thing that's different

[968:16]

is I'm using a dictionary here which is

[968:19]

essentially the code version of this two

[968:21]

column chart whose keys are going to be

[968:23]

the favorite strings, Scratch or C or

[968:27]

Python the values of which are going to

[968:29]

be the actual counts and I'm just doing

[968:30]

some simple math by plus-plusing or

[968:32]

incrementing the count each time I see a

[968:35]

certain language. Unfortunately this

[968:37]

code is not quite going to work. Let me

[968:39]

go ahead and run Python of favorites.py

[968:40]

and dang it, there's a KeyError. Let

[968:45]

me minimize the terminal window so we

[968:46]

can see both at once. Why is there a key

[968:50]

error apparently on line 11 wherein I'm

[968:54]

indexing into the counts array, or rather,

[968:57]

dictionary?

[968:59]

What's going on? Yeah,

[969:01]

>> the key doesn't already exist.

[969:03]

>> Yeah, it's a little subtle, but if this

[969:05]

is like the very first time through the

[969:06]

file, there is no key Python. There is

[969:09]

no key C or scratch because no one has

[969:12]

put them there. And yet recall that plus

[969:14]

equals means you're going to that

[969:16]

location in the dictionary and just

[969:17]

blindly incrementing it. But what is it?

[969:19]

Well, it's effectively a garbage value.

[969:21]

But it's not even that because there's

[969:22]

no actual key there. So we need to do a

[969:25]

little bit of logic here. And we can

[969:26]

solve this in a couple of ways. Well, I

[969:28]

could say something very pedantically

[969:30]

like this. I could just say, well, if

[969:33]

this favorite is in the counts

[969:36]

dictionary, this is the Pythonic way to

[969:38]

ask that question. Is this key in this

[969:41]

dictionary? If so, well, then it's safe

[969:43]

to go ahead and increment it just as

[969:44]

I've done before. But if it's not, what

[969:46]

I think I want to do is set counts

[969:49]

bracket favorite equal to

[969:55]

one instead because either I want to

[969:59]

increment the current count by one or

[970:02]

this is the first time logically I've

[970:04]

seen this favorite so I want to set it

[970:06]

equal to one instead. We could do this a

[970:09]

different way logically just like we

[970:10]

could in C solve problems differently. I

[970:12]

could instead say something like this. I

[970:15]

could get rid of all this code and just

[970:17]

say if favorite not in counts, then I

[970:22]

could say counts bracket favorite equals

[970:24]

zero. So just always initialize it to

[970:26]

zero if it's not there. Now I can safely

[970:28]

blindly update the count by one because

[970:33]

now I know no matter what once I get to

[970:35]

line 13 that count is actually there.
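
A sketch of the dictionary-based count with the "initialize to zero if missing" check. The sample data is hypothetical; the variable names counts and favorite follow the lecture.

```python
import csv
import io

# Hypothetical sample standing in for favorites.csv.
SAMPLE = """Timestamp,language,problem
t1,Python,Hello
t2,C,Mario
t3,Python,Cash
t4,Scratch,Starting from Scratch
"""

counts = {}  # empty dictionary; dict() would work as well

reader = csv.DictReader(io.StringIO(SAMPLE))
for row in reader:
    favorite = row["language"]
    # Without this check, the first += for a new key raises KeyError.
    if favorite not in counts:
        counts[favorite] = 0
    counts[favorite] += 1

# Iterating over a dictionary yields its keys in insertion order.
for favorite in counts:
    print(favorite, counts[favorite])
```

Because keys come back in the order they were first inserted, the printed order depends on which language happens to appear first in the data.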

[970:39]

All right, so let's see with this

[970:40]

version of the code. Let's go ahead and

[970:43]

clear my terminal window, rerun

[970:45]

python of favorites.py. Cross my

[970:47]

fingers. And there we go. Python and

[970:50]

Scratch and C. Interestingly, the order

[970:53]

switched around this time, based on

[970:55]

the order in which I was inserting

[970:57]

things into the dictionary. But we'll

[970:58]

see how we can exercise a bit more

[971:00]

control over that. But let me propose

[971:01]

an alternative for that KeyError. Recall, we discussed

[971:04]

briefly last week that whenever you have

[971:05]

these kinds of tracebacks that refer to

[971:08]

certain exceptions like exceptionally

[971:10]

bad situations that can happen, you can

[971:12]

also change your code to just try to do

[971:14]

something and then try to catch the

[971:16]

exception instead. So an alternative way

[971:18]

to do what we initially did would be

[971:20]

this. Instead of just blindly saying go

[971:23]

into the counts dictionary, index into

[971:25]

it at the favorite key and increment

[971:28]

it by one, what we could do is try to do

[971:32]

that, please, except if there is a

[971:35]

KeyError, in which case, you know what, go

[971:37]

ahead and just initialize that value to

[971:40]

one instead. So, in short, there's like

[971:42]

four different ways already to solve the

[971:44]

same problem. Whichever way you prefer

[971:45]

is quite reasonable. This is just

[971:47]

another way and arguably another

[971:49]

Pythonic way to do things by trying to

[971:51]

do something but anticipating that

[971:53]

something in fact can go wrong.

[971:56]

>> A while ago you removed

[971:58]

>> A while ago what?

[971:59]

>> You removed next(reader).

[972:02]

>> Correct. A while ago I removed

[972:03]

next(reader) because that was only necessary

[972:05]

for csv.reader, because that was just

[972:08]

reading every row again and again. But

[972:10]

when you use a CSV dictionary reader

[972:12]

that automatically consumes the first

[972:15]

row because that's how the dictionary

[972:16]

reader knows what the columns will be

[972:18]

called and so you don't have to skip

[972:20]

over it instead. A nice enhancement.

[972:22]

Other questions on what we've just done

[972:27]

here.
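
The try/except variant mentioned a moment ago can be sketched as follows, again with hypothetical sample data. It attempts the increment first and only falls back to initializing the key when a KeyError is raised.

```python
import csv
import io

# Hypothetical sample standing in for favorites.csv.
SAMPLE = """Timestamp,language,problem
t1,Python,Hello
t2,C,Mario
t3,Python,Cash
"""

counts = {}

# DictReader consumes the header row, so no next(reader) is needed.
reader = csv.DictReader(io.StringIO(SAMPLE))
for row in reader:
    favorite = row["language"]
    try:
        # Works whenever the key is already present.
        counts[favorite] += 1
    except KeyError:
        # First time this favorite is seen: start its count at one.
        counts[favorite] = 1

for favorite in counts:
    print(favorite, counts[favorite])
```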

[972:29]

All right, so let me propose that like

[972:31]

writing this amount of code is kind of

[972:34]

annoying just to ask a relatively simple

[972:36]

question like what's the most popular

[972:37]

language in this file, right? You it's

[972:39]

been nice. It's sort of a step backwards

[972:41]

from Google spreadsheets and Apple

[972:43]

Numbers and Microsoft Excel where you

[972:45]

could really just like highlight the

[972:47]

column and it would just tell you the

[972:48]

answer, usually in the bottom right-hand

[972:49]

corner or you could use a function in

[972:51]

one of those spreadsheet tools to ask

[972:53]

the same question. So, it's starting to

[972:55]

feel like, with almost 20 lines of

[972:57]

code, like maybe there's a better way.

[972:59]

And I dare say there is. Rather than use

[973:01]

a flat file database, let's graduate

[973:03]

already to what the world calls a

[973:05]

relational database. And a relational

[973:07]

database is simply data in which you

[973:11]

define relations among your data, which

[973:13]

isn't so much relevant now except that

[973:14]

that timestamp is associated with that

[973:17]

language is associated with that

[973:20]

favorite problem as well. But

[973:23]

we'll see that data sets can be much

[973:24]

larger and more

[973:26]

complicated. And it might be valuable if

[973:27]

we can actually express relationships

[973:29]

across multiple pieces of data. In

[973:31]

particular, let's introduce already a

[973:33]

programming language called structured

[973:35]

query language or SQL for short, aka

[973:38]

"sequel." And SQL essentially only has four

[973:40]

fundamental operations. So even though

[973:42]

we're transitioning into a new language,

[973:44]

by the end of today, we're going to

[973:45]

transition out of the new language

[973:46]

because there's only so much you can do.

[973:47]

Now, as with any language, it's going to

[973:49]

take time and practice to sort of get

[973:51]

the hang of it. But take comfort

[973:54]

in knowing that SQL really just supports

[973:56]

four fundamental operations. And the

[973:58]

acronym that the world uses is indeed

[974:00]

CRUD, which stands for create, read,

[974:03]

update, and delete. That is to say, when

[974:06]

using a relational database, you can

[974:07]

create data, read data, update the data,

[974:10]

or delete data. And that's pretty

[974:12]

comprehensive as to what's possible.

[974:14]

Now, what is an actual database? Well,

[974:16]

generally speaking, a database is just a

[974:18]

piece of software that's running on a

[974:19]

computer somewhere inside of which is

[974:21]

stored a whole lot of data. And that

[974:23]

database therefore provides you with

[974:25]

access to that data at any time, whether

[974:27]

it's on your local Mac or PC, somewhere

[974:29]

in the cloud or to a whole cluster of

[974:31]

web servers, which we'll talk about in

[974:33]

the weeks to come as we transition from

[974:34]

command line tools to the web. Now,

[974:37]

technically in SQL, the commands you

[974:39]

actually use to implement this idea of

[974:41]

creating data, reading data, updating,

[974:42]

and deleting data are almost the same.

[974:44]

But for whatever reason, the world

[974:47]

chose the command select which is

[974:49]

equivalent to reading data. So we'll

[974:50]

soon see that there's a command in SQL

[974:52]

that lets us select data which is

[974:53]

equivalent to this idea of reading it

[974:56]

whereas the other three options refer of

[974:57]

course to writing data that is changing

[975:00]

data. Technically speaking, we'll be

[975:02]

able to insert data into a database as

[975:04]

we'll soon see and we'll also be able to

[975:06]

drop data altogether not just delete

[975:08]

individual rows but whole tables so to

[975:10]

speak, of rows instead. So what does

[975:14]

this all mean? Well, let's go ahead and

[975:16]

do, say, an example of using SQL

[975:20]

to ask some relatively simple questions

[975:22]

and begin to develop some muscle memory

[975:24]

for using this new language. If I were

[975:26]

to manually load a bunch of data into a

[975:29]

proper database for SQL, I would

[975:32]

actually use code like this. I would

[975:33]

literally type create table. Then I'd

[975:35]

come up with the name of the table, aka

[975:37]

sheet, and then I would specify every

[975:39]

column that I want to put in that table.

[975:42]

And here's where the vernacular changes.

[975:43]

So whereas in the world of spreadsheets

[975:45]

you have sheets, tabs that contain rows

[975:48]

and columns, in the world of databases,

[975:50]

you have tables which are just rows and

[975:52]

columns. It's different terminology, but

[975:54]

it refers to conceptually the same

[975:56]

thing. In CS50, we're going to use a

[975:58]

specific version of SQL known as

[975:59]

SQLite, which is like a lightweight

[976:01]

version of SQL that's actually very

[976:02]

commonly used in web applications, in

[976:05]

mobile applications, but it doesn't have

[976:07]

all of the bells and whistles or all of

[976:08]

the scalability that your Oracle, SQL

[976:12]

Server, Microsoft Access, Postgres,

[976:14]

MySQL have. Those are just product names,

[976:16]

open source and commercial alike, which

[976:18]

if you've ever heard of just represent

[976:20]

bigger, faster versions of SQL

[976:24]

databases. But we'll indeed use the

[976:25]

lightweight version of it known as

[976:28]

SQLite. And the command we're going to

[976:29]

start to run is quite literally

[976:31]

sqlite3, which is version three of the same

[976:33]

command, which we've pre-installed into

[976:34]

your code spaces for you. So, let's go

[976:37]

ahead and do this. Let me go ahead and

[976:38]

run a command called sqlite3, which is

[976:40]

going to let me create my very first

[976:43]

SQLite database, and I'm going to import

[976:46]

into that database the CSV file that we

[976:49]

downloaded from Google Forms. In other

[976:52]

words, I'm going to load that same data

[976:54]

set into a different program, an actual

[976:56]

database, so that I can use a completely

[976:59]

different programming language to ask

[977:00]

questions about it instead of writing,

[977:02]

as we just did, some Python code. So,

[977:05]

let me go back into VS Code here. Let me

[977:07]

close my CSV file and my Python file.

[977:10]

Let me reopen my terminal window and let

[977:12]

me go ahead and run sqlite3, space, and

[977:16]

then the name I want to give to this

[977:17]

database, which for instance will be

[977:18]

favorites.db, db for database, by

[977:21]

convention. Enter. I'm going to be

[977:23]

prompted to make sure I want to create

[977:25]

this new file. Y for yes. Enter. And now

[977:28]

I'm inside of the database running a

[977:30]

command at a prompt that now says

[977:32]

sqlite and then an angle bracket. I'm not

[977:34]

going to be using any .sql

[977:36]

files for now. Although you can actually

[977:38]

write SQL code in separate text files.

[977:40]

I'm actually going to use the database's

[977:42]

interactive interpreter to just run all

[977:45]

of the commands I want interactively by

[977:47]

just typing them out: semicolon, enter,

[977:48]

type it out, semicolon, enter, back and

[977:51]

forth. But you can save all of these

[977:52]

commands as you'll see in problem set 7

[977:54]

in files as well. Now, how do I go about

[977:57]

actually importing that CSV file into

[978:00]

this lightweight database? Well, for

[978:02]

this, I'm going to execute three

[978:03]

commands. And any command in SQLite that

[978:06]

starts with a dot is specific to

[978:08]

SQLite, this lightweight version of SQL.

[978:10]

Anything that doesn't start with a dot

[978:12]

is generalizable and will work on most

[978:14]

any SQL database anywhere in the world,

[978:16]

no matter the product you're using. So,

[978:19]

I'm going to go ahead and in my SQLite

[978:22]

terminal, I'm going to change my mode to

[978:23]

CSV mode just to tell the database that

[978:25]

I want to load some CSV data. I'm going

[978:27]

to then literally import that data from

[978:30]

a file called favorites.csv, which is

[978:33]

the file we downloaded earlier and then

[978:35]

uploaded to my codespace. And now I have to

[978:37]

specify the name of a table. So, I'm

[978:39]

going to call this table aka sheet

[978:41]

favorites just to keep everything

[978:43]

consistent. And that's it. In the

[978:45]

absence of an error message, everything

[978:46]

probably worked fine. I'm going to do

[978:48]

.quit. That quits out of SQLite. But

[978:51]

what you'll now see if I type ls is that

[978:54]

not only do I have favorites.csv, which

[978:56]

I uploaded, favorites.py, which we wrote

[978:58]

a few minutes ago, but I also now have

[979:00]

favorites.db, which is a database

[979:03]

version of that same file. Now, I can't

[979:06]

actually see what's inside of it because

[979:08]

if I go ahead and run code of

[979:10]

favorites.db, I'm going to see: this file

[979:13]

is not displayed in the text editor

[979:14]

because it is either binary or uses an

[979:16]

unsupported text encoding. This is to be

[979:18]

expected because this database is stored

[979:21]

essentially in the form of zeros and

[979:23]

ones that the sqlite3 program knows how

[979:25]

to read, but is not something that VS

[979:27]

Code can just show me everything

[979:28]

therein. And generally storing data in

[979:30]

binary is going to be more efficient

[979:32]

than storing things purely textually

[979:35]

because we're going to be able to use

[979:37]

various data structures and algorithms

[979:39]

that we've been talking about for weeks

[979:41]

more easily on that binary data. All

[979:44]

right, so let's go ahead now and see

[979:47]

what this import command did. I'm going

[979:48]

to again maximize my terminal window.

[979:51]

I'm going to go ahead and run sqlite3

[979:53]

again, passing in favorites.db. Enter.

[979:56]

This time it already exists so it just

[979:58]

opened it without prompting me. And now

[980:00]

I'm going to go ahead and type another

[980:01]

SQLite-specific command called .schema.

[980:04]

The schema of a database is just the

[980:06]

design of the database. What does it

[980:08]

look like? What are the rows and columns

[980:10]

and tables therein? So if I type

[980:12]

.schema, what I'm going to see is this

[980:15]

SQL command: CREATE TABLE IF NOT EXISTS

[980:18]

quote unquote favorites which is the

[980:20]

name of the table. Then in parentheses

[980:22]

there are going to be apparently three

[980:24]

columns. One of which is called

[980:25]

timestamp. The next of which is called

[980:26]

language. The third of which is called

[980:28]

problem. And each of those columns is

[980:30]

going to be raw text. Now we'll soon see

[980:32]

that it doesn't have to just be text.

[980:33]

But when I use the import command, this

[980:36]

is the default table that SQLite created

[980:40]

for me. Soon we'll see that I can

[980:42]

exercise more control, especially over

[980:44]

the types of data that I'm putting in

[980:46]

this database. But what's really nice

[980:47]

about the import command is it could not

[980:50]

be easier to convert a CSV file to a

[980:52]

SQLite database. So that now as we're

[980:54]

about to see we can use SQL on it

[980:56]

instead of Python or any other language

[980:59]

instead.
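The import step described above can be approximated in Python as a rough sketch. It is not how the sqlite3 program itself implements .import; it only mirrors the observable result, a table whose columns are all TEXT. The sample rows, the exact header spelling `Timestamp`, and the helper name `import_csv` are assumptions made for illustration, and an in-memory database stands in for favorites.db.

```python
import csv
import io
import sqlite3

# Hypothetical stand-in for favorites.csv; these rows are invented for illustration.
SAMPLE_CSV = "\n".join([
    "Timestamp,language,problem",
    "10/20/2025 10:00:00,Python,Mario",
    "10/20/2025 10:00:05,C,Cash",
    "10/20/2025 10:00:09,Scratch,Starting from Scratch",
])


def import_csv(conn, text, table):
    """Load CSV text into a table, treating every column as TEXT."""
    rows = list(csv.reader(io.StringIO(text)))
    header, data = rows[0], rows[1:]
    # The first CSV row supplies the column names.
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({columns})')
    placeholders = ", ".join("?" for _ in header)
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', data)
    conn.commit()


conn = sqlite3.connect(":memory:")  # in-memory database instead of favorites.db
import_csv(conn, SAMPLE_CSV, "favorites")

# sqlite_master holds the stored CREATE TABLE text, which is what .schema prints.
schema = conn.execute(
    "SELECT sql FROM sqlite_master WHERE name = 'favorites'"
).fetchone()[0]
row_count = conn.execute("SELECT COUNT(*) FROM favorites").fetchone()[0]
print(schema)
print(row_count)
```

Typing every column as TEXT keeps the sketch faithful to the default table described in the lecture; choosing more specific types is a separate design step.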

[981:01]

Okay. So how do we go about getting data

[981:04]

from this database? Well, the first of

[981:08]

our commands that we'll explore is that

[981:09]

one called select. So select data means

[981:11]

to read data from the database. And in

[981:13]

this sense, it's going to be a

[981:14]

declarative language because I'm just

[981:16]

going to declare what data I want to

[981:18]

select from the database. And I'm not

[981:20]

going to worry about opening the file

[981:21]

anymore or iterating over it with a for

[981:24]

loop or a while loop or defining

[981:25]

variables or the like. I'm just going to

[981:28]

select syntactically what I want. So let

[981:31]

me go back to SQLite here. Let me clear

[981:34]

my terminal just to get rid of the past

[981:36]

commands. And let's do the first of

[981:37]

these. Select star from favorites. And I

[981:41]

regret to say uh the semicolon is back

[981:43]

for the SQL code we're now writing.

[981:45]

Enter. And we will see a sort of ASCII art

[981:48]

version now. So even better than the raw

[981:51]

CSV file of all of the data that was

[981:53]

imported into this table. So select star

[981:56]

from favorites is apparently selecting

[981:58]

everything. So the star in this context

[982:00]

is a wild card of sorts that represents

[982:02]

all of the columns in the table. The

[982:04]

table itself is called favorites. So I'm

[982:06]

selecting all of the columns from the

[982:08]

table called favorites. And here you

[982:10]

have it with sort of simple ASCII art:

[982:11]

first column, second column, third

[982:13]

column, chronologically listed because

[982:15]

that's exactly how it was loaded into

[982:17]

the database. All right, so if star is

[982:20]

a wildcard, what more can we do? Well, if

[982:22]

you don't care about all of the columns,

[982:24]

you can actually be a little more

[982:25]

specific. So I could say instead, select

[982:27]

just the language column from the

[982:30]

favorites table, semicolon, enter. And

[982:33]

now I have just a single column of data

[982:35]

that shows me one cell for every

[982:38]

submission but not the timestamp or the

[982:40]

favorite problem that that person put

[982:42]

in. Or if I want to declare that I want

[982:44]

a couple of columns. So I can say select

[982:46]

language and problem but I don't care

[982:48]

about the timestamp from favorites as

[982:51]

such and now you get two columns

[982:53]

instead. So in short, rather than write

[982:55]

the dozen or so lines of code that we

[982:57]

earlier did with Python to open the file

[983:00]

and then iterate over it with a reader,

[983:02]

we just select what data we want from

[983:04]

this here database. But even more

[983:06]

powerfully, SQL comes with a whole bunch

[983:08]

of functions built in. Quite like the

[983:09]

spreadsheet software that you and I are

[983:11]

already familiar with in the real world

[983:12]

like Excel and Numbers and Google

[983:14]

Sheets. SQLite comes with an average

[983:17]

function, a count function, distinct,

[983:19]

lower, max, min, upper, and so

[983:22]

forth. There's a whole list of them.

[983:24]

We'll play around with just a couple of

[983:25]

these. If we want to transform some of

[983:27]

this data, let me go back into VS Code,

[983:29]

clear my SQLite terminal, and suppose

[983:31]

I just want to get the total number of

[983:33]

rows in the favorites table, like how

[983:35]

many people at the moment in time I

[983:37]

downloaded the file, even if not

[983:38]

everyone had quite buzzed in yet, did I

[983:41]

end up with in that file? Well, I could

[983:43]

say select the count of all of the rows

[983:46]

from the favorites table semicolon. And

[983:49]

now I'll get back a single cell which

[983:51]

gives me 272 submissions had come in the

[983:54]

moment I downloaded that file. Suppose I

[983:57]

want to see just to confirm that no one

[983:59]

submitted bogus data. Which languages

[984:02]

were actually among those typed in?

[984:04]

Well, I can select only the distinct

[984:06]

languages that were typed in from the

[984:08]

favorites table. And now I get a unique

[984:11]

list of languages that everyone buzzed

[984:13]

in with irrespective of how many times.

[984:16]

If I want to maybe get um how many

[984:18]

distinct languages there are, if it's

[984:20]

not as obvious as three here, I could

[984:22]

select the count of distinct languages

[984:26]

from the favorites table and it would

[984:28]

just tell me the answer. Three is the

[984:30]

total number of languages that are

[984:32]

distinct in that submission. So again,

[984:34]

it's even easy to just eyeball this, but

[984:36]

very quickly with single statements that

[984:38]

are sort of English-like left to right

[984:41]

is enabling me to just select the

[984:42]

answers I want to some of these

[984:44]

problems. Well, what more can SQL do?
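As a minimal sketch of the SELECT statements just walked through, the following uses Python's sqlite3 module with a tiny, invented favorites table (the four rows are assumptions, not the lecture's 272 submissions). Each query is the SQL from the lecture; Python is only the harness for running it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE favorites (Timestamp TEXT, language TEXT, problem TEXT)")
# Invented sample rows, for illustration only.
conn.executemany("INSERT INTO favorites VALUES (?, ?, ?)", [
    ("t1", "Python", "Mario"),
    ("t2", "C", "Hello, World"),
    ("t3", "Python", "Hello, World"),
    ("t4", "Scratch", "Starting from Scratch"),
])

# One column, then two columns, instead of * for everything.
languages = conn.execute("SELECT language FROM favorites").fetchall()
pairs = conn.execute("SELECT language, problem FROM favorites").fetchall()

# Built-in functions: total rows, distinct values, and a count of distinct values.
total = conn.execute("SELECT COUNT(*) FROM favorites").fetchone()[0]
distinct = conn.execute("SELECT DISTINCT language FROM favorites").fetchall()
n_distinct = conn.execute(
    "SELECT COUNT(DISTINCT language) FROM favorites"
).fetchone()[0]

print(total, n_distinct, sorted(distinct))
```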

[984:46]

Well, here is a bunch of other

[984:49]

uh keywords that we can add to our SQL

[984:52]

commands that allow us to control

[984:54]

further what kind of data we're going to

[984:55]

get back. We're going to be able to

[984:56]

group data by similar values. We're

[984:59]

going to check for not just string

[985:00]

equality, but for uh fuzzy matching,

[985:02]

checking if something is close to a

[985:04]

string that we're looking for. We can

[985:05]

limit the total number of rows coming

[985:07]

back. We can order or sort the data by a

[985:09]

certain column. And we can actually have

[985:11]

predicates, so to speak, using a WHERE,

[985:13]

which is similar in spirit to an if

[985:15]

condition, but a little more succinctly

[985:17]

written instead. So, for instance, let

[985:19]

me go back to VS Code here. Let me clear

[985:21]

my terminal again, and let me go ahead

[985:23]

and select how many of you answered C as

[985:26]

your favorite language. Without

[985:28]

selecting all of the counts again, let's

[985:30]

just uh hit the nail on the head. So,

[985:32]

let's select the count of rows from the

[985:36]

favorites table where the language

[985:39]

selected equals quote unquote C

[985:42]

semicolon. And I get back a simple

[985:44]

answer. 58 of you buzzed in with the

[985:46]

answer C. How many of you liked both C

[985:49]

and very specifically the problem called

[985:51]

hello world? If you sort of that was the

[985:53]

extent of your sort of um the passion

[985:55]

for code, let's go ahead and select

[985:57]

the count of star from favorites where

[986:00]

the language you typed in equals quote

[986:02]

unquote C. Uh and the problem you typed

[986:06]

in equals quote unquote hello,

[986:09]

world semicolon. And it looks like five

[986:11]

of you said your favorite language was C

[986:13]

and your favorite program was hello

[986:15]

world. Great. All right, so it's getting

[986:18]

a little more interesting. What about

[986:19]

the other version of hello world where

[986:21]

we called it hello, it's me. Well, that

[986:24]

one's interesting because I think it's

[986:25]

going to break my convention of using

[986:27]

single quotes, which would be convention

[986:28]

here in SQL. Whenever you're using a raw

[986:30]

string, single quotes here would be the

[986:32]

norm. But let's type this out. So,

[986:34]

select count of star uh from favorites

[986:39]

where language equals quote unquote C.

[986:42]

And the problem this time equals quote

[986:44]

unquote hello, it's me. So, at a glance,

[986:48]

this is probably going to confuse SQLite

[986:51]

3 because does that middle apostrophe

[986:53]

belong to the first one or the second

[986:55]

one? This is ambiguous. And this is

[986:57]

weird. In C, we would solve this problem

[986:59]

by putting a backslash in front of it in

[987:01]

a so-called escape character. Different

[987:03]

languages have different conventions.

[987:04]

This one's a little weird, but in

[987:05]

SQLite, what you instead do is doubly

[987:08]

single quote it. So putting two single

[987:11]

quotes is the convention for escaping a

[987:14]

single quote just because you got to

[987:16]

remember or Google these kinds of things

[987:18]

in the real world if you forget. Enter.

[987:20]

Now I get back that. So it was not

[987:22]

the case that any of you liked both C

[987:24]

and that problem specifically. Well,

[987:27]

what if we want to be a little more

[987:28]

inclusive of either hello problem? Well,

[987:30]

I could do this in this way. Uh just

[987:32]

like in my uh Codespaces terminal, I

[987:35]

can go up and down to go back through my

[987:37]

history. Same thing in SQLite. So I can

[987:39]

go back to commands to get up here and

[987:41]

let me go ahead and write something

[987:43]

longer where the problem is hello world

[987:47]

or the problem equals quote unquote

[987:49]

hello it's double apostrophe me single

[987:54]

apostrophe semicolon oh and parenthesis.

[987:56]

So it's wrapped onto two lines here. So,

[987:58]

it's a little messy, but I'm just

[988:00]

logically saying where you buzzed in

[988:02]

with C as your language and a problem of

[988:05]

hello world or a problem of hello, it's

[988:07]

me. Enter. It should be the same answer

[988:09]

as before because none of you liked

[988:11]

hello, it's me. But I chose this syntax

[988:14]

because I can actually make this a

[988:16]

little cleaner. I can go and delete this

[988:18]

whole parenthetical and just say where

[988:21]

language equals C. And the problem is

[988:24]

like quote unquote hello,

[988:27]

percent sign, single quote semicolon. So

[988:30]

this is a little weird too. It's just

[988:32]

how SQL does this instead. But whereas

[988:34]

previously I was using an equal sign to

[988:36]

check for literal string equality like

[988:38]

literally those problem names, like

[988:40]

allows me to use wild cards. And it's

[988:42]

not a wild card quite like the previous

[988:44]

use of the asterisk that we saw. When

[988:47]

you are using a wild card in a string in

[988:50]

SQL, you say percent sign to represent

[988:53]

zero or more characters there. So hello,

[988:56]

space percent is going to hopefully

[988:59]

match this or the other problem that

[989:02]

started with hello. So let me go ahead

[989:04]

now and hit enter. The answer is still

[989:06]

going to be the same, but indeed it's

[989:08]

demonstrative that that is how you could

[989:10]

express yourself a little more generally

[989:11]

if you wanted a pattern match like that.
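The WHERE predicates above, including the doubled single quote and the LIKE wildcard, can be sketched as follows. The four sample rows are invented, so the counts differ from the lecture's (where nobody had picked the second hello problem); the problem titles are written here as assumed spellings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE favorites (language TEXT, problem TEXT)")
# Invented sample rows, for illustration only.
conn.executemany("INSERT INTO favorites VALUES (?, ?)", [
    ("C", "Hello, World"),
    ("C", "Hello, It's Me"),
    ("C", "Mario"),
    ("Python", "Hello, World"),
])


def count(where):
    """Run SELECT COUNT(*) with the given WHERE clause and return the number."""
    return conn.execute(f"SELECT COUNT(*) FROM favorites WHERE {where}").fetchone()[0]


c_only = count("language = 'C'")
c_world = count("language = 'C' AND problem = 'Hello, World'")
# A single quote inside a SQL string literal is escaped by doubling it.
c_its_me = count("language = 'C' AND problem = 'Hello, It''s Me'")
c_either = count(
    "language = 'C' AND (problem = 'Hello, World' OR problem = 'Hello, It''s Me')"
)
# % matches zero or more characters, so this covers both hello problems.
c_like = count("language = 'C' AND problem LIKE 'Hello, %'")

print(c_only, c_world, c_its_me, c_either, c_like)
```

The parentheses around the OR matter: without them, AND binds more tightly and the query would also count non-C rows whose problem matches the second condition.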

[989:14]

Questions now on any of these

[989:16]

techniques? Yeah,

[989:18]

>> [Question about capitalization.]

[989:20]

>> Uh, good question. Does it have to be

[989:22]

capitalized when doing string equality?

[989:24]

Yes, but not with like. Like will

[989:26]

tolerate differences in case. So

[989:28]

uppercase or lower case,

[989:30]

>> but like count and everything.

[989:32]

>> Oh. Oh, I see. Good question. So the

[989:34]

capitalization so stylistically in SQL I

[989:37]

would argue and this is a stylistic

[989:39]

convention in SQL certainly for CS50 and

[989:41]

also for a lot of companies and

[989:42]

communities in the world to uppercase

[989:44]

your SQL keywords just to make them

[989:47]

stand out from words that you and I

[989:49]

chose as like the name of the table or

[989:51]

the name of the columns therein. This is

[989:53]

just a convention. I would propose like

[989:55]

always to be consistent but for CS50 and

[989:57]

for style50's sake I would propose that

[989:59]

you indeed capitalize like this. And

[990:00]

frankly, it just makes it easier to read

[990:02]

to my eye because the SQL stuff jumps

[990:04]

out and then the lowercase stuff is

[990:06]

specific to your data set. A good

[990:08]

question.
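Both halves of that answer can be checked with a short sketch, assuming SQLite's default settings and an invented three-row table: = compares text case-sensitively, LIKE ignores case for ASCII letters, and the SQL keywords themselves may be written in any case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE favorites (language TEXT, problem TEXT)")
# Invented sample rows, for illustration only.
conn.executemany("INSERT INTO favorites VALUES (?, ?)", [
    ("Python", "Mario"),
    ("Python", "Cash"),
    ("C", "Mario"),
])

# String equality is case-sensitive by default: 'python' does not equal 'Python'.
eq_lower = conn.execute(
    "SELECT COUNT(*) FROM favorites WHERE language = 'python'"
).fetchone()[0]

# LIKE is case-insensitive for ASCII letters by default.
like_lower = conn.execute(
    "SELECT COUNT(*) FROM favorites WHERE language LIKE 'python'"
).fetchone()[0]

# Keywords are not case-sensitive; uppercase is only a style convention.
upper_kw = conn.execute("SELECT COUNT(*) FROM favorites").fetchone()[0]
lower_kw = conn.execute("select count(*) from favorites").fetchone()[0]

print(eq_lower, like_lower, upper_kw, lower_kw)
```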

[990:10]

All right. How about another

[990:13]

uh set of keywords that we saw on the

[990:15]

screen earlier, namely grouping by?

[990:18]

Well, suppose we have a data set like

[990:20]

this whereby we suppose we have a data

[990:23]

set like this whereby

[990:26]

how does this go? Happy Halloween.

[990:29]

whereby here's just an excerpt from that

[990:31]

table. So as far as languages go, uh, say one

[990:34]

of you liked C, two of you like or three

[990:37]

of you liked Python and then now that

[990:39]

we're introducing SQL, let's imagine

[990:40]

that two of you now like SQL even

[990:42]

better. So that's the extent of the data

[990:44]

set. Wouldn't it be nice to be able to

[990:46]

figure out how many of you like C or

[990:48]

Python or SQL? Well, I could write some

[990:51]

Python code, open the file, iterate over

[990:53]

it using variables, using a dictionary,

[990:54]

and those what 20 or so lines of code we

[990:57]

wrote earlier to answer this question.

[990:59]

Wouldn't it be nice to just ask the SQL

[991:01]

language to figure out how many of you

[991:03]

like C, how many of you like Python, how

[991:05]

many of you like SQL? We can do this by

[991:07]

grouping these cells by common values.

[991:10]

Let's group all of the Python rows

[991:12]

together and all of the SQL rows

[991:14]

together. And even though there's just

[991:15]

one, all of the C rows as well. So, how

[991:18]

can we do this? Well, let me go back to

[991:20]

VS Code here and clear my terminal. And

[991:22]

let's do this. Let's select every

[991:24]

language but its respective count as

[991:28]

well from the favorites table. But

[991:31]

before you do any of that, group

[991:33]

everything by language. So this one

[991:36]

takes a little more practice and getting

[991:38]

used to, but this is simply saying

[991:40]

select all of the it's saying look at

[991:43]

the languages essentially group all of

[991:46]

the common languages together and then

[991:47]

figure out what count that gives you for

[991:50]

all of the grouped rows. If I hit enter

[991:52]

here, we'll get an answer just like the

[991:54]

Python code that took me 20 lines of

[991:56]

code to write earlier. What's really

[991:58]

happening though in the database is

[992:00]

something a little bit like this.
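A minimal sketch of that grouping, using the six-row excerpt from the example (one C, three Python, two SQL) in an in-memory table. The table setup is an assumption for illustration; the GROUP BY query is the one from the lecture.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE favorites (language TEXT)")
# The excerpt from the example: one C, three Python, two SQL.
conn.executemany("INSERT INTO favorites VALUES (?)", [
    ("C",), ("Python",), ("Python",), ("Python",), ("SQL",), ("SQL",),
])

# Rows with the same language collapse into one group, and COUNT(*) counts each group.
rows = conn.execute(
    "SELECT language, COUNT(*) FROM favorites GROUP BY language"
).fetchall()
counts = dict(rows)
print(counts)
```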

[992:01]

Notice, of course, that there's only one

[992:03]

version of C. There's then three

[992:05]

versions of Python and there's two

[992:06]

examples of SQL. And the table I'm

[992:08]

essentially building is to group all of

[992:10]

those by identical values and then spit

[992:12]

out the total counts here. Now on the

[992:15]

screen, it's just one, three, and two.

[992:16]

In the data set with some 200 plus

[992:18]

responses, we have much larger answers

[992:20]

including Scratch instead of SQL right

[992:22]

here. But this now sort of speaks to

[992:24]

just how much more convenient it is

[992:26]

if you want to ask a question like that,

[992:28]

especially if the data set is more than

[992:29]

a couple of hundred rows. If your boss

[992:31]

for instance in the real world has a CSV

[992:32]

data set and wants you to analyze the

[992:34]

data, well, you can literally download

[992:35]

it, import it into SQLite, run one

[992:37]

command, and boom, like you've got this

[992:39]

analysis done, if the extent of it is

[992:42]

just to group the data and figure out uh

[992:45]

what kinds of uh counts you have in the

[992:48]

data set. All right, what else can we

[992:49]

do? Well, we can play around with this a

[992:51]

bit more. Let me go back here into VS

[992:52]

Code and propose that we could uh order

[992:56]

those results more than in just the uh

[992:59]

the default way. So, let's go ahead and

[993:00]

select the language uh and the count

[993:02]

from the favorites table yet again.

[993:05]

Let's group by language yet again, but

[993:08]

this time let's order by the count

[993:11]

column in descending order. So, it's a

[993:14]

bit more of a mouthful and it takes some

[993:16]

practice to memorize all of the syntax,

[993:17]

but when I hit enter now, I get back the

[993:19]

same answers, but Python is at the very

[993:21]

top of the list. Now, count star isn't

[993:24]

necessarily all that self-explanatory,

[993:27]

and indeed, it's a little annoying that

[993:28]

I have to write out count star here at

[993:30]

top right as well as in the beginning.

[993:32]

So, it turns out SQL also supports

[993:33]

aliases. So if you want to change the

[993:36]

temporary name of the column to be

[993:38]

something else like n for number, well

[993:40]

then I can actually define an alias with

[993:42]

the keyword AS, and order by n at the end of

[993:45]

this statement and then hit enter and

[993:47]

get back the same results too. And so if

[993:50]

it's not sort of implicitly clear

[993:52]

already, each of these SQL select

[993:54]

commands is essentially giving me back a

[993:57]

temporary table. This is not being saved

[993:59]

anywhere. Like now it's gone from the

[994:00]

computer's memory once I've actually

[994:02]

gotten my answer. But it's essentially

[994:04]

returning a subset of the tables that do

[994:07]

exist in the computer's memory because

[994:09]

that's what the import command did for

[994:10]

me. It loaded the whole data set into

[994:12]

memory. And now I have these temporary

[994:15]

tables that are just containing the

[994:17]

answers to questions I care about. And

[994:19]

if you only care about the top one

[994:20]

language, well, there's a limit keyword,

[994:22]

too. I can literally just say limit one

[994:24]

at the end of that exact same statement.

[994:27]

Enter. And now I've got a single answer

[994:29]

to my question. A single row saying

[994:31]

Python was the most popular with 190

[994:33]

people selecting that.
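Ordering, aliasing, and limiting can be sketched together on the small six-row excerpt (an assumption standing in for the real data set, so the top count here is 3 rather than the lecture's figure). The alias n is the one named in the lecture.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE favorites (language TEXT)")
# Small invented sample: one C, three Python, two SQL.
conn.executemany("INSERT INTO favorites VALUES (?)", [
    ("C",), ("Python",), ("Python",), ("Python",), ("SQL",), ("SQL",),
])

# AS gives COUNT(*) the temporary name n, which ORDER BY can then reuse.
ordered = conn.execute(
    "SELECT language, COUNT(*) AS n FROM favorites "
    "GROUP BY language ORDER BY n DESC"
).fetchall()

# LIMIT 1 keeps only the first row of that sorted result.
top = conn.execute(
    "SELECT language, COUNT(*) AS n FROM favorites "
    "GROUP BY language ORDER BY n DESC LIMIT 1"
).fetchone()

print(ordered)
print(top)
```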

[994:36]

All right, for now I think that's enough

[994:38]

on select. There's a few more keywords,

[994:40]

but it really is just a matter of

[994:41]

composing these building blocks.

[994:43]

Questions though on these capabilities

[994:47]

fundamentally.

[994:50]

All right. Well, how about maybe

[994:52]

inserting data instead? So here might be

[994:54]

the canonical way to insert a row into a

[994:57]

table in SQL. You literally say insert

[994:59]

into then the name of the table then in

[995:01]

parentheses the one or more columns for

[995:03]

which you have data and then literally

[995:05]

the word values and then in another set

[995:07]

of parentheses a comma-separated list of

[995:09]

the one or more values that you want to

[995:11]

insert into those there columns. So for

[995:14]

instance let me go back into VS code

[995:16]

here. And of course at the time we

[995:18]

circulated this form a few minutes ago

[995:20]

we had not yet assigned problem set 7.

[995:22]

But in problem set seven is a problem

[995:24]

called 50ville, which let's propose

[995:27]

might very well be someone's favorite in

[995:28]

a week. So let's go ahead and insert

[995:30]

that row now, uh, preemptively. Let's

[995:33]

insert into the favorites table two

[995:36]

columns, language and problem. Why?

[995:39]

Well, I don't really care to figure out

[995:41]

what the timestamp is and the format

[995:42]

thereof. So I'm just going to omit the

[995:44]

timestamp altogether. But the values

[995:46]

I'm going to insert for this new row are

[995:48]

going to be quote

[995:50]

unquote SQL comma quote unquote uh

[995:54]

50ville close quote close parenthesis

[995:56]

semicolon enter. Nothing bad seems to

[995:59]

have happened. Let me go ahead and

[996:00]

select star from favorites just to see

[996:02]

what my data set looks like now. And

[996:04]

indeed at the bottom of the file or the

[996:06]

bottom of the table indeed there is that

[996:08]

new row. But what's sort of noteworthy

[996:10]

is that this isn't just blank. There's

[996:12]

our old friend null, which is not a null

[996:14]

pointer. It's the same word literally,

[996:16]

N-U-L-L, and it refers explicitly to the

[996:19]

absence of data. And this is actually a

[996:21]

nice feature because if any of you have

[996:22]

ever used like Google spreadsheets,

[996:24]

Apple Numbers, Microsoft Excel, and

[996:26]

thought about uh or looked at cells that

[996:28]

are blank, like what does it mean if a

[996:30]

spreadsheet cell is blank? Does it mean

[996:31]

like there's literally no data there?

[996:33]

Does it mean that you just don't have

[996:35]

the data there or it's missing in some

[996:37]

form? Well, how do you address that?

[996:39]

Well, maybe you put like N/A in

[996:41]

English for like not available or

[996:43]

something like that, but that's kind of

[996:44]

hackish. And if you use N/A, that might

[996:46]

mean that no one can actually type N/A as

[996:48]

their answer. And so what's nice about

[996:50]

SQL and database languages more

[996:52]

generally is that null signifies the

[996:55]

conscious omission of data. It's not

[996:57]

just a missing value. It's consciously

[996:59]

not there. It's not just the empty

[997:01]

string, quote unquote, for instance. So

[997:04]

we might see different examples of that.

[997:05]

But what's nice now is that I can

[997:07]

distinguish null from other values. And

[997:09]

in fact, if that is not a good idea to

[997:11]

have any data in my data set that is

[997:14]

null for whatever reason, like it just

[997:15]

looks like bogus data, it would be nice to

[997:17]

know who inserted that when. No problem.

[997:19]

We can also delete data from a table in

[997:21]

SQL. And I can delete from the name of

[997:23]

the table where some condition is true.

[997:26]

So for instance, if I want to delete

[997:28]

that, I can do this in a couple of ways,

[997:30]

but perhaps the simplest is to delete

[997:32]

from

[997:35]

favorites where uh timestamp

[997:39]

is null. Semicolon. So IS, too, is another

[997:43]

SQL keyword here. And that will go ahead

[997:45]

and delete only those rows where the

[997:47]

timestamp is null. Enter. Let's do the

[997:50]

same select command as before. Enter.

[997:53]

And voila, that row is now gone. Be

[997:56]

very, very, very careful with delete

[997:59]

statements. If I had foolishly done

[998:02]

this, want to guess what the results

[998:04]

would be?

[998:07]

It would delete everything. And like you

[998:09]

can Google around and see actual

[998:11]

articles of like interns at companies

[998:13]

who had way too much access to a company

[998:14]

database executing something like delete

[998:16]

from favorites because they forgot the

[998:18]

predicate. They hit enter too soon. and

[998:20]

boom, all of the data is now gone. So

[998:22]

these are very destructive commands and

[998:24]

just like in the real world, if you

[998:25]

don't have backups or versions of these

[998:28]

same tables, the data can indeed be lost

[998:31]

forever. So don't do that. Always have

[998:33]

your WHERE and make sure your WHERE is

[998:36]

correct. All right. Well, let's go ahead

[998:38]

maybe and um suppose let's claim that

[998:41]

maybe 50ville is going to be a really

[998:43]

popular problem among students. So much

[998:45]

so that it becomes overnight everyone's

[998:47]

favorite problem. Well, we can update

[998:48]

the table as is. Here is the general

[998:50]

syntax for updating rows in a table. You

[998:52]

literally say update the name of the

[998:54]

table, the word set, and then a bunch of

[998:57]

key value pairs. The column that you

[999:00]

want to update, setting it equal to the

[999:02]

value that you want to update it to

[999:04]

where some condition is true. So, what

[999:06]

does this mean concretely? Well, let's

[999:08]

say that we want to change everyone's

[999:10]

favorite to SQL and 50ville. I could do

[999:13]

this. update favorites set language

[999:17]

equal to SQL comma problem equal to

[999:21]

50ville

[999:23]

close quote semicolon and this is where

[999:26]

again it can be dangerous but in this

[999:28]

case I'm going to go ahead and hit enter

[999:29]

without any predicate to filter this

[999:32]

nothing bad seems to happen but if I now

[999:34]

do select star from favorites semicolon

[999:36]

all of you would seem to like 50ville

[999:39]

and there is no going back to the

[999:40]

previous version of the table unless I

[999:42]

quit out of this And I import the whole

[999:44]

CSV again, maybe after deleting the data

[999:46]

entirely. All right. So, how do I get

[999:48]

rid of all of the data? Well, if you

[999:50]

want to delete from favorites for real

[999:52]

now, enter. Select star from favorites.

[999:56]

We can confirm that that was a bad idea.
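The difference a WHERE clause makes to UPDATE and DELETE can be sketched as below, assuming a three-row in-memory table with invented contents. The helper `count` is hypothetical and exists only to keep the checks short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE favorites (language TEXT, problem TEXT)")
# Invented sample rows, for illustration only.
conn.executemany("INSERT INTO favorites VALUES (?, ?)", [
    ("C", "Mario"),
    ("Python", "Cash"),
    ("Scratch", "Starting from Scratch"),
])


def count(where="1"):
    """Count rows matching an optional WHERE clause (default matches every row)."""
    return conn.execute(f"SELECT COUNT(*) FROM favorites WHERE {where}").fetchone()[0]


# With a predicate, only the matching rows change.
conn.execute(
    "UPDATE favorites SET language = 'SQL', problem = '50ville' WHERE language = 'C'"
)
after_filtered_update = count("problem = '50ville'")

# Without a predicate, every row is overwritten.
conn.execute("UPDATE favorites SET language = 'SQL', problem = '50ville'")
after_unfiltered_update = count("problem = '50ville'")

# Without a predicate, DELETE removes every row but leaves the empty table in place.
conn.execute("DELETE FROM favorites")
after_delete = count()

print(after_filtered_update, after_unfiltered_update, after_delete)
```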

[999:58]

There's literally no data in the

[1000:00]

database anymore, but we can certainly

[1000:03]

restore from our actual CSV. So in

[1000:05]

short, we've got select, we've got

[1000:06]

insert, we've got update, we've got

[1000:09]

delete, we've seen create, albeit

[1000:12]

automatically generated by sqlite3.

[1000:14]

Maybe we'll see drop. And actually, we

[1000:16]

can see drop now. So recall that if I do

[1000:18]

.schema, I can see all of the tables

[1000:21]

in this here database. If I do drop

[1000:24]

table favorites semicolon, and now again

[1000:27]

.schema, now there is nothing in this

[1000:30]

database at all. So that's an even worse

[1000:31]

command to run unless you know and

[1000:33]

intend what you're doing. Questions then

[1000:36]

on these CRUD operations creating,

[1000:38]

reading, updating, deleting. Yeah, here

[1000:40]

first.

[1000:45]

>> Why do you not do quotation marks around

[1000:47]

null? So null is a special symbol and if

[1000:50]

you put quotation marks around it, you

[1000:51]

would literally be looking for the value

[1000:53]

"NULL" that maybe was the name of a

[1000:55]

language or the name of a problem or

[1000:57]

something literally in the CSV. We are

[1000:59]

looking for the absence of that data

[1001:00]

altogether. Yeah.
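The distinction between NULL and a quoted string can be sketched as follows, assuming an invented two-row table plus the row inserted in the lecture. Omitting the Timestamp column on INSERT leaves it NULL, which Python's sqlite3 module reports as None.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE favorites (Timestamp TEXT, language TEXT, problem TEXT)")
# Invented sample rows, for illustration only.
conn.executemany("INSERT INTO favorites VALUES (?, ?, ?)", [
    ("t1", "Python", "Mario"),
    ("t2", "C", "Cash"),
])

# Only two columns are named, so Timestamp is left as NULL for this row.
conn.execute("INSERT INTO favorites (language, problem) VALUES ('SQL', '50ville')")
new_row = conn.execute(
    "SELECT Timestamp, language, problem FROM favorites WHERE problem = '50ville'"
).fetchone()


def count(where):
    """Count rows matching the given WHERE clause."""
    return conn.execute(f"SELECT COUNT(*) FROM favorites WHERE {where}").fetchone()[0]


is_null = count("Timestamp IS NULL")       # matches the absence of a value
quoted_null = count("Timestamp = 'NULL'")  # looks for the four-letter text NULL
equals_null = count("Timestamp = NULL")    # comparing with = never matches NULL

# Delete only the rows whose timestamp is missing.
conn.execute("DELETE FROM favorites WHERE Timestamp IS NULL")
remaining = conn.execute("SELECT COUNT(*) FROM favorites").fetchone()[0]

print(new_row, is_null, quoted_null, equals_null, remaining)
```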

[1001:09]

>> Really good question. If it's so easy to

[1001:11]

destroy data like this, are people

[1001:12]

actively backing up their data? Short

[1001:13]

answer, yes, absolutely. Like all of

[1001:15]

CS50's web apps and the like are

[1001:17]

automatically backed up on some

[1001:18]

schedule. Even then, we have to decide

[1001:20]

what that schedule is. And if it's

[1001:21]

daily, for instance, nightly, we could

[1001:23]

lose up to like 23 hours 59 minutes of

[1001:26]

data. In some cases maybe companies would

[1001:28]

therefore version their data more

[1001:29]

tightly, like every 5 minutes, every

[1001:31]

minute, although that's going to consume

[1001:32]

a lot more space. But there already is

[1001:34]

this theme of trade-offs, certainly in

[1001:36]

computing. Um, you can also implement

[1001:38]

forms of access control. So SQLite is

[1001:40]

lightweight. It has no notion of

[1001:41]

usernames or passwords. If you have

[1001:43]

access to the data, you can touch

[1001:44]

everything. But in the real world, with uh

[1001:46]

commercial and open source software like

[1001:48]

uh Oracle and SQL Server and Postgres

[1001:50]

and MySQL, you actually have usernames

[1001:53]

and passwords and specific permissions,

[1001:55]

so you can give users, interns, the

[1001:57]

ability to select data but not update or

[1001:59]

delete or insert data, or any combination

[1002:02]

thereof. So there are defenses. Other

[1002:04]

questions on these here CRUD commands?

[1002:11]

Okay, let's go ahead and play with some

[1002:12]

real world data. So many of you might be

[1002:14]

familiar with IMDb, the Internet Movie

[1002:16]

Database, which is a great repository of

[1002:18]

data for movies and also TV shows and

[1002:20]

actors and the like. And within IMDb's

[1002:23]

website, you can actually download uh

[1002:25]

TSV files, tab-separated values files

[1002:28]

that contain a lot of the data from that

[1002:30]

website. So we went ahead and did

[1002:31]

this. We then converted that TSV data

[1002:34]

into a whole bunch of SQL tables so that

[1002:37]

we can begin to play with it uh in the

[1002:39]

context of TV shows. However, let's

[1002:41]

start first with a question about how

[1002:43]

you could go about modeling data for TV

[1002:45]

shows themselves. So for instance in

[1002:48]

advance I also uh created a few

[1002:50]

different spreadsheets that just allowed

[1002:51]

me to play with how I might model data

[1002:54]

real-world data at that. So The Office

[1002:56]

is a very popular uh TV show. The US

[1002:58]

version here

[1003:00]

starred Steve Carell and others. So if I

[1003:03]

think about how IMDb or maybe just even

[1003:05]

little old me with a spreadsheet might

[1003:07]

keep track of who starred in what TV

[1003:10]

show. Well, I might just use a Google

[1003:11]

sheet like this and in the first column

[1003:13]

have a title column where this is the

[1003:15]

title of the show, like The Office. And

[1003:16]

then if it stars one person, I would put

[1003:18]

Steve Carell in the next column. But if

[1003:20]

there was a second star, I might put

[1003:21]

Rainn Wilson or John or Jenna or B.J. Novak

[1003:25]

here, column by column by column. And I

[1003:27]

could just keep adding show after show

[1003:29]

after show after show, one row per show,

[1003:33]

and then however many stars that are in

[1003:36]

there. What might you not like about the

[1003:38]

design of this data, though? or what

[1003:40]

might start to look odd.

[1003:44]

>> Yeah, it's a little weird that we have

[1003:45]

star star star. Just this repetition has

[1003:48]

tended to be bad. Anytime we're copying

[1003:49]

and pasting, it should rub you the wrong

[1003:51]

way. Other observations about it too?

[1003:54]

Yeah.

[1003:56]

>> Yeah. At the moment I've got one, two, three, four,

[1003:58]

five stars and there's certainly TV

[1004:00]

shows with fewer TV stars and more and

[1004:03]

so okay I can add more columns. I can

[1004:05]

just keep saying star, star, star, but

[1004:07]

then it's going to be a very ragged data

[1004:09]

set, very sparse data set where there's

[1004:11]

going to be a lot of blank cells for

[1004:12]

shows that have small casts, but then a

[1004:14]

lot of columns for shows that have large

[1004:16]

casts. So, it just feels like this

[1004:18]

should be rubbing you the wrong way. It

[1004:19]

just feels like it's going to get messy,

[1004:21]

especially as the number of stars, let

[1004:23]

alone shows, gets larger. All right.
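The sparseness concern can be made concrete with a minimal sketch. Only The Office and its five stars come from the lecture; the second show and its cast are invented here purely to show what happens when cast sizes differ.

```python
# One row per show, one column per star, as in the first spreadsheet.
# "Hypothetical Show" and its cast are invented for illustration.
wide_rows = [
    ["The Office", "Steve Carell", "Rainn Wilson", "John Krasinski",
     "Jenna Fischer", "B.J. Novak"],
    ["Hypothetical Show", "Person A", "Person B"],
]

# Every row has to be padded out to the width of the widest row.
width = max(len(row) for row in wide_rows)
padded = [row + [None] * (width - len(row)) for row in wide_rows]

# Count the blank cells that the padding introduces.
blank_cells = sum(cell is None for row in padded for cell in row)
print(width, blank_cells)
```

Each additional show with a small cast adds more blank cells, and a single show with a large cast widens every row.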

[1004:25]

Well, another version of this uh data

[1004:28]

set that I put together is this instead.

[1004:30]

So, I didn't like the fact that I was

[1004:32]

going to have an arbitrary number of

[1004:34]

columns based on the specific show in

[1004:36]

question. So, here I scaled back and I

[1004:39]

just have a single column for title as

[1004:40]

before, but now a single column for

[1004:43]

star. And I decided that if a TV show

[1004:45]

has multiple stars, well, I just put

[1004:47]

each of the stars' names and then to the

[1004:50]

left of them specify the show that

[1004:52]

they're in. It seems to be a little better

[1004:54]

in that I've solved some of the

[1004:55]

redundancy problem, but I've kind of

[1004:57]

just covered up the hole in

[1005:00]

a leaky hose and now another leak sprung

[1005:02]

up here, which is to say there's still a

[1005:03]

bad design. What's bad here?

[1005:06]

Yeah,

[1005:08]

>> Yeah, now I've got The Office, The

[1005:09]

Office, The Office, The Office, The

[1005:11]

Office. And that too feels like I'm

[1005:13]

wasting space. If I manually type this

[1005:14]

in, odds are eventually I'm going to

[1005:16]

screw up and one of these is going to be

[1005:17]

misspelled, which is going to break

[1005:18]

something somehow. So, this too doesn't

[1005:20]

feel quite ideal. So the third and final

[1005:23]

version I whipped up to model this data

[1005:25]

which is going to lead us to a similar

[1005:28]

design in an actual database looks a

[1005:30]

little more arcane but is the right way

[1005:33]

at least academically to do things and

[1005:35]

we'll see technologically too this is

[1005:36]

going to be a big gain. So here I now

[1005:39]

have a spreadsheet with three separate

[1005:41]

sheets. One is called shows which is

[1005:43]

selected at the moment. Another is

[1005:45]

called people which is not selected yet

[1005:47]

and the third of which is called stars.

[1005:50]

What am I doing here? Well, notice that

[1005:51]

in the shows sheet, I've still got the

[1005:54]

title column, but I've decided to give

[1005:55]

The Office a unique ID. Much like a

[1005:57]

Harvard student has a unique ID number,

[1005:59]

much like an employee in a company

[1006:00]

probably has a unique employee ID.

[1006:02]

Similarly, I have given The Office a

[1006:04]

unique identifier that happens to be the

[1006:06]

same as it is in IMDb. Meanwhile, for

[1006:09]

all of the people that exist in the

[1006:11]

world of TV shows, for instance, these

[1006:13]

five folks, I have their names as well

[1006:15]

as unique IDs for them. and those

[1006:17]

integers are unique to the people and have no

[1006:20]

connection per se to the show IDs just

[1006:23]

yet. But the third and final sheet I've

[1006:25]

whipped up is going to be a sort of

[1006:27]

cross-referencing sheet that allows me

[1006:29]

to associate shows with people. And at a

[1006:31]

glance, this looks the most arcane of

[1006:34]

the three because it's just numbers.

[1006:36]

It's just integers. But if you recall

[1006:39]

from a moment ago that The Office's

[1006:41]

unique ID was 386676.
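The cross-referencing idea can be sketched with plain Python structures standing in for the three sheets. The show ID 386676 is the one quoted in the lecture; the person IDs are placeholders invented for this sketch, and `cast_of` is a hypothetical helper name.

```python
# The three sheets, modeled as plain Python structures.
# 386676 is the show ID quoted in the lecture; the person IDs below
# are placeholders invented for this sketch.
shows = {386676: "The Office"}
people = {
    1: "Steve Carell",
    2: "Rainn Wilson",
    3: "John Krasinski",
    4: "Jenna Fischer",
    5: "B.J. Novak",
}
# The cross-referencing sheet holds nothing but pairs of integers.
stars = [(386676, 1), (386676, 2), (386676, 3), (386676, 4), (386676, 5)]


def cast_of(title):
    """Resolve a title to its show ID, then to the names of its stars."""
    show_id = next(sid for sid, t in shows.items() if t == title)
    return [people[pid] for sid, pid in stars if sid == show_id]


print(cast_of("The Office"))
```

Every name and title is stored exactly once; only the integers repeat.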

[1006:43]

Well, that's how we associated that show

[1006:45]

with this person which happens to be

[1006:47]

Steve Carell and so forth. Now, at a

[1006:50]

glance, not very useful to me, the human

[1006:52]

unless I do some fancy spreadsheet stuff

[1006:54]

like VLOOKUPs, if familiar, or the like, but

[1006:56]

this is a stepping stone to how proper

[1006:59]

databases do actually store data. What I

[1007:01]

have done here is normalize the data by

[1007:04]

eliminating all redundancies except for

[1007:07]

maximally some redundant integers. And

[1007:09]

why is that? Well, integers, at least we

[1007:11]

know from our days in C, are going to be

[1007:12]

a finite length. It's going to be 32

[1007:14]

bits, maybe 64 bits, but it's always

[1007:16]

going to be the same number of bits. And

[1007:18]

that's nice because anytime you have a

[1007:19]

fixed number of bits, it lends itself to

[1007:21]

storing things nicely in an array or

[1007:23]

doing binary search because everything

[1007:24]

is a predictable distance apart as

[1007:27]

opposed to strings like Steve Carell or

[1007:29]

John Krasinski, where the names might vary in

[1007:31]

length. These IDs for the title of the

[1007:33]

show and these IDs for the persons are

[1007:36]

not going to vary in length because

[1007:37]

they're all just integers. But of

[1007:39]

course, this spreadsheet is now much less

[1007:41]

useful because if I want to figure out

[1007:42]

who is in The Office, well, first I have

[1007:44]

to figure out what show this is, then I

[1007:46]

have to figure out what uh person this

[1007:48]

is and this is and this is but that's

[1007:50]

where SQL is again going to swoop in and

[1007:53]

allow us to solve this problem. And

[1007:55]

indeed SQL is one of the most common

[1007:58]

ways that web applications today, mobile

[1008:00]

applications today store any amount of

[1008:02]

data at scale. They are most likely not

[1008:04]

using simple CSV files. they are using

[1008:06]

SQLite or MySQL or Postgres or

[1008:08]

Oracle or other commercial and open

[1008:10]

source incarnations of SQL databases and

[1008:14]

odds are IMDb might be using the same as

[1008:17]

well. All right, so let's go ahead and

[1008:19]

do this. I have created in advance a

[1008:22]

file called shows.db that contains

[1008:25]

hundreds of thousands of rows from TV

[1008:28]

shows and TV stars and other data from

[1008:30]

IMDb itself. And in a moment we'll see a

[1008:34]

database that if drawn as a picture

[1008:36]

looks a little something like this.

[1008:37]

There is going to be a people table.

[1008:39]

There's going to be a shows table.

[1008:41]

There's going to be a stars table that

[1008:43]

somehow links the two. There's also

[1008:44]

going to be a writers table and a ratings

[1008:47]

table and a genres table. So overnight

[1008:49]

this sort of escalated quickly from just

[1008:50]

favorites which was a single table to

[1008:52]

now a real world data set that has six

[1008:54]

tables. But here is the relational in

[1008:57]

relational databases as these arrows are

[1008:59]

meant to imply. Right now, there are

[1009:01]

relationships across these several

[1009:04]

tables. Case in point, here is people

[1009:06]

here. And we'll see in a moment that a

[1009:08]

person in the IMDb world has an ID

[1009:10]

number, a name, and a year of birth. A

[1009:13]

show in the IMDb world has a unique ID,

[1009:16]

a title, the year it debuted, and a

[1009:18]

total number of episodes. But there's no

[1009:19]

mention of people in shows. There's no

[1009:21]

mention of shows in people. But per the

[1009:24]

arrows, there's going to be this third

[1009:26]

table here, stars, that somehow links

[1009:29]

show IDs with person IDs. And this is

[1009:31]

where relational databases get really

[1009:33]

powerful because you can solve all of

[1009:35]

those redundancy concerns and actually

[1009:39]

enable yourself to select data much more

[1009:42]

quickly instead. But let's focus on

[1009:43]

something simple first. Let's focus just

[1009:45]

on the shows table, which pictorially

[1009:48]

might look a little something like this.

[1009:50]

So, in just a moment, I'm going to go

[1009:51]

ahead and reopen VS Code, and I'm going

[1009:53]

to open up, instead of favorites.db, I'm

[1009:55]

going to go ahead and open up uh a file

[1009:58]

called shows.db, which again, I arrived

[1010:01]

with in advance. So, if I open up with

[1010:04]

sqlite3 shows.db and hit enter, I'm

[1010:07]

back at a SQL prompt. Let me go ahead

[1010:09]

and type .schema shows just to show you

[1010:13]

what command created this here table.

[1010:16]

And it got a little more interesting

[1010:17]

already. Notice that the table is called

[1010:20]

shows and it's got one, two, three, four columns:

[1010:22]

an ID for each show, a title for

[1010:25]

each show, the year it debuted for each

[1010:27]

show, and the number of episodes.
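The same inspection can be sketched from Python with the standard `sqlite3` module. The `CREATE TABLE` text below is an assumed reconstruction based on how the lecture describes the columns; the exact statement stored in shows.db may differ.

```python
import sqlite3

# In-memory database so the sketch leaves no file behind.
db = sqlite3.connect(":memory:")

# Assumed reconstruction of the shows table from the lecture's description.
db.execute(
    """
    CREATE TABLE shows (
        id INTEGER,
        title TEXT NOT NULL,
        year NUMERIC,
        episodes INTEGER,
        PRIMARY KEY(id)
    )
    """
)

# sqlite_master stores the CREATE statement for each table, which is the
# text that the sqlite3 shell's .schema command displays.
schema = db.execute(
    "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = 'shows'"
).fetchone()[0]
print(schema)

# PRAGMA table_info lists one row per column; index 1 is the column name.
columns = [row[1] for row in db.execute("PRAGMA table_info(shows)")]
print(columns)
```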

[1010:29]

There's also clearly some mention of

[1010:30]

types and some other keywords that we

[1010:32]

haven't yet talked about. But let's

[1010:33]

focus now first on just what the data

[1010:35]

is. The best way to wrap your mind

[1010:37]

around a new data set if someone hands

[1010:38]

you a SQL uh database or you've imported

[1010:40]

a CSV into a SQL database is to just select

[1010:43]

some data. So select star from shows

[1010:45]

semicolon.

[1010:47]

That's a lot of data flying across the

[1010:49]

screen. It's not very easy to see

[1010:50]

because some of the show names are

[1010:52]

apparently crazy long and so it's

[1010:53]

wrapping, but it's still going and going

[1010:55]

and going. I'm going to hit control C to

[1010:56]

interrupt it. Control-C, as with our terminals

[1010:59]

in general is your friend. Let's run

[1011:01]

that same command, but just limit it to

[1011:02]

the first 10 shows. So, there are the

[1011:04]

first 10 shows in the IMDb database of

[1011:07]

TV shows. So, we've got 10 rows in this

[1011:11]

data set going back to it looks like the

[1011:12]

1970s is roughly where their data set

[1011:14]

starts. All right. So here's the data we

[1011:18]

have in here. Well, how much is there?

[1011:21]

Well, let's go ahead and check. So,

[1011:22]

select count star from shows semicolon.
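A minimal sketch of the two queries spoken so far, `LIMIT` and `COUNT(*)`, run against a small stand-in table. The 25 placeholder rows are invented; the real shows table is far larger.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE shows (id INTEGER, title TEXT NOT NULL, PRIMARY KEY(id))")

# 25 placeholder rows standing in for the real data set.
db.executemany(
    "INSERT INTO shows (id, title) VALUES (?, ?)",
    [(i, f"Show {i}") for i in range(1, 26)],
)

# LIMIT caps how many rows come back.
first_ten = db.execute("SELECT * FROM shows LIMIT 10").fetchall()

# COUNT(*) returns a single row holding the number of rows in the table.
total = db.execute("SELECT COUNT(*) FROM shows").fetchone()[0]

print(len(first_ten), total)
```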

[1011:26]

And now we're talking. There's 250,87

[1011:30]

shows in this database. And if I do the

[1011:32]

same for people, select count star from

[1011:34]

people semicolon. Looks like there are

[1011:37]

74,315

[1011:40]

TV stars associated with this here data

[1011:42]

set. So here too the data is much more

[1011:44]

interesting and much more representative

[1011:46]

of real world data. All right. How about

[1011:48]

the ratings? IMDb if unfamiliar is also

[1011:51]

a place where you could go to check the

[1011:52]

ratings from users as to whether

[1011:54]

something is a good show, a

[1011:56]

bad show or anything in between. So

[1011:58]

let's do .schema ratings and I'll

[1012:00]

see that yeah there's this table called

[1012:02]

ratings that as we saw briefly on the

[1012:05]

screen, has a show_id and then a

[1012:07]

rating and then the total number of

[1012:09]

votes that contributed thereto, and

[1012:11]

again some data types and other syntax

[1012:12]

that we'll get to before long but let me

[1012:14]

go ahead and just do select star from

[1012:16]

ratings limit 10 just to get a sense of

[1012:19]

what the data is. That's now what the

[1012:21]

data looks like in that table. So to a

[1012:23]

human at a glance, not that useful

[1012:25]

because you don't know what those show

[1012:26]

IDs are. But in a moment, we're going to

[1012:28]

see how we can reconstitute this data by

[1012:31]

linking these tables together by way of

[1012:34]

those IDs and actually get answers to

[1012:36]

questions. So among other things, a SQL

[1012:38]

database or a relational database more

[1012:40]

generally supports one-to-one

[1012:42]

relationships whereby a row in one table

[1012:45]

can map to one row in another table.

[1012:48]

So this is in contrast to

[1012:50]

one-to-many, for instance. So one-to-one means one

[1012:52]

row over here somehow relates to one row

[1012:54]

over here. Again the relational in

[1012:56]

relational database. Uh how might we go

[1012:59]

about uh seeing this? Well first here's

[1013:03]

a tour of the data types that SQLite

[1013:05]

supports. Uh whereas in C we had a

[1013:08]

somewhat similar list and in Python that

[1013:10]

list went away at least with regard to

[1013:12]

explicit types, in SQL we're back to,

[1013:15]

when creating our tables, explicitly stating

[1013:17]

what the types of those columns are.

[1013:20]

So you have integers, you have numeric,

[1013:22]

which is more of a catch-all for things

[1013:23]

like times and dates and other useful

[1013:25]

real world data. You have real numbers

[1013:27]

which are like floats with decimal

[1013:29]

points. You have text which we've seen

[1013:30]

already. And then you have blobs which

[1013:32]

is a great name which stands for binary

[1013:33]

large objects. You can actually store

[1013:35]

raw zeros and ones like files in the

[1013:37]

database. Generally that's frowned upon

[1013:38]

to store files. But there's certain

[1013:40]

times where you do want to store binary

[1013:42]

data and not pure text. That's it for

[1013:44]

SQLite. There are only these five

[1013:46]

types. In other commercial and

[1013:49]

open-source SQL databases like Oracle and

[1013:51]

MySQL and Postgres and the same names I

[1013:53]

keep rattling off, you have even more

[1013:55]

data types than these. So that's among

[1013:57]

the additional features you get by using

[1013:59]

other databases as well. There's a few

[1014:01]

keywords though that are worth noting in

[1014:03]

SQL. You can specifically say when

[1014:05]

creating a table that this column cannot

[1014:08]

be null. If you don't want timestamp for

[1014:10]

instance to ever allow for null values,

[1014:13]

you can literally specify when creating

[1014:14]

that table, this column cannot be null.

[1014:17]

And if I try to insert data into that

[1014:19]

table with a null value as by not

[1014:22]

providing a timestamp, the insertion

[1014:24]

will fail. And so here's where things

[1014:26]

are different from just writing Python

[1014:28]

code or certainly using a spreadsheet.
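A minimal sketch of that defense, assuming a hypothetical `favorites` table whose `timestamp` column is declared `NOT NULL`. The column names and the sample values are invented for illustration.

```python
import sqlite3

db = sqlite3.connect(":memory:")
# Hypothetical table: a timestamp column that must always be present.
db.execute("CREATE TABLE favorites (timestamp TEXT NOT NULL, language TEXT)")

# A row that supplies a timestamp is accepted.
db.execute(
    "INSERT INTO favorites (timestamp, language) VALUES (?, ?)",
    ("2026-01-01 09:00:00", "Python"),
)

# A row that omits the timestamp would store NULL, so SQLite rejects it.
try:
    db.execute("INSERT INTO favorites (language) VALUES (?)", ("C",))
    rejected = False
except sqlite3.IntegrityError as error:
    rejected = True
    print("Insertion failed:", error)

row_count = db.execute("SELECT COUNT(*) FROM favorites").fetchone()[0]
print(rejected, row_count)
```

The rejected row never reaches the table, so only the valid row is counted.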

[1014:29]

You can actually have built-in defenses

[1014:31]

so that neither you nor anyone else messes up

[1014:33]

your data by inserting bogus or blank

[1014:36]

data accidentally. You can further say

[1014:38]

that things must be unique. So every

[1014:41]

element, every cell in a column must be

[1014:43]

unique to ensure that you can't

[1014:45]

accidentally put two things with the

[1014:46]

same ID. Two Harvard IDs, two employee

[1014:48]

IDs that are duplicates. You can avoid

[1014:50]

that altogether. But more importantly,

[1014:54]

relational databases support these two

[1014:56]

concepts, primary keys and foreign keys.

[1014:59]

And this is where the magic really

[1015:00]

starts to happen. A primary key is the

[1015:03]

unique identifier for a table. It is the

[1015:07]

column of values that uniquely identify

[1015:09]

every row. So it's probably going to be

[1015:11]

the show ID, the person ID, the Harvard

[1015:13]

ID, the employee ID. Anytime you have a

[1015:16]

value, often numeric, often integral,

[1015:19]

that uniquely identifies rows, you

[1015:21]

simply call that a primary key. When

[1015:24]

that same ID appears in another table

[1015:27]

for cross referencing purposes, you

[1015:29]

refer to it instead as a foreign key

[1015:31]

because that same key is over there in

[1015:33]

another table, thus foreign. But they

[1015:35]

refer to one and the same thing. In the

[1015:37]

context of the table in which it's

[1015:39]

defined, it's primary. If it appears in

[1015:41]

some other table, it is now considered

[1015:43]

foreign. All right. So, how can we make

[1015:46]

use of this? Well, let me go ahead and

[1015:49]

propose that we execute a few SQL

[1015:51]

commands as follows. If I wanted to

[1015:54]

start asking questions about ratings, I

[1015:56]

could do something like this. Select

[1015:58]

star from ratings where the rating is

[1016:01]

maybe a good show. So, let's call it 6.0

[1016:03]

or higher. But let's just limit this to

[1016:05]

the top 10 shows that meet that

[1016:07]

threshold. Enter. So here I now have a

[1016:10]

temporary table that gives me three

[1016:12]

columns from the ratings table: show_id,

[1016:15]

which is for the moment a useless

[1016:17]

identifier because I don't know what

[1016:18]

show it corresponds to, but the rating

[1016:20]

value and the number of votes that

[1016:21]

contributed thereto. Well, how might I

[1016:24]

actually get to the shows that are

[1016:26]

actually highly rated at 6.0 or higher?

[1016:29]

Well, I don't need to select star. If

[1016:30]

all I care about is these top 10, I can

[1016:32]

whittle this same command down to just

[1016:34]

selecting the ratings. Or, sorry,

[1016:37]

not the ratings: I can whittle

[1016:40]

this table down to just

[1016:42]

selecting the show IDs. So this is the

[1016:44]

answer to the question. What are the top

[1016:46]

10 TV shows whose ratings are 6.0 or

[1016:49]

higher? Well, from the table, these are

[1016:51]

the first 10 that come back. How do I

[1016:53]

now select the shows that correspond to

[1016:57]

these values? Here's where things can be

[1016:59]

done a few different ways. I could

[1017:01]

select everything I know from the shows

[1017:04]

table where the ID of the show is in the

[1017:09]

following set. I'm going to do a

[1017:11]

parenthesis and then just for

[1017:12]

readability, I'm going to hit enter. The

[1017:14]

dot dot dot and angle bracket just means

[1017:16]

I'm continuing my thought. It's not

[1017:17]

executing the command yet. What is the

[1017:19]

query I now want to run? Well, it's

[1017:20]

going to be a nested query. I can now do

[1017:22]

the same thing as before. Select the

[1017:24]

show id from the ratings table where the

[1017:28]

rating is really good greater than or

[1017:29]

equal to 6.0. But let's then limit the

[1017:32]

total number of results to just 10. So

[1017:37]

here just like in sort of grade school

[1017:38]

math we have parentheses. So the first

[1017:41]

thing that's going to be executed is the

[1017:42]

thing inside the parentheses. So this is

[1017:43]

going to get me every show ID from the

[1017:45]

ratings table that has a really good

[1017:46]

rating of 6.0 or higher. That's going to

[1017:48]

return to me a column of values. I'm

[1017:52]

then going to say select star from the

[1017:53]

shows table where the ID of the show is

[1017:57]

in that list of values but only show me

[1017:59]

10 of those is what I'm asking here. So

[1018:02]

what I should now see is much more

[1018:03]

useful data namely the 10 shows that are

[1018:06]

highly rated. Enter. And indeed I get

[1018:09]

back these 10 shows all of whose ratings

[1018:11]

are indeed quite a bit higher. If I want

[1018:14]

to only care about the title that too I

[1018:16]

can do. So let's do this again. Instead

[1018:18]

of selecting star, let's select title

[1018:21]

from shows where the ID of the show is

[1018:23]

in the following parenthetical. Select

[1018:26]

show ID from ratings where the rating is

[1018:29]

greater than or equal to 6.0. Close my

[1018:32]

parenthesis. Limit to 10. Enter. And I

[1018:35]

see the exact same thing, but just the

[1018:37]

nail being hit on the head. Just give me

[1018:39]

the titles of those top several shows.
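The nested query just described can be sketched against a tiny stand-in database. The three shows, their ratings, and their vote counts are invented; the `ORDER BY title` clause is an addition for this sketch so that the output is deterministic, and is not part of the query spoken in the lecture.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript(
    """
    CREATE TABLE shows (id INTEGER, title TEXT NOT NULL, PRIMARY KEY(id));
    CREATE TABLE ratings (show_id INTEGER NOT NULL, rating REAL NOT NULL,
                          votes INTEGER NOT NULL);
    -- Placeholder rows; titles, ratings, and vote counts are invented.
    INSERT INTO shows (id, title) VALUES (1, 'Show A'), (2, 'Show B'), (3, 'Show C');
    INSERT INTO ratings (show_id, rating, votes)
        VALUES (1, 8.5, 100), (2, 4.0, 50), (3, 6.0, 75);
    """
)

# The inner query yields a column of show IDs; the outer query keeps
# only the shows whose id appears in that column.
titles = [
    row[0]
    for row in db.execute(
        """
        SELECT title FROM shows
        WHERE id IN (SELECT show_id FROM ratings WHERE rating >= 6.0)
        ORDER BY title
        LIMIT 10
        """
    )
]
print(titles)
```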

[1018:42]

Of course, I might want to be able

[1018:45]

to do this differently. In other words,

[1018:47]

here's the top 10 titles. Well, what are

[1018:49]

the ratings? Like, that's why you go to

[1018:50]

IMDb or Rotten Tomatoes or the like. You

[1018:52]

want to see the actual ratings with the titles, not

[1018:54]

just the titles or just the ratings. Well, it turns

[1018:56]

out we're going to need another

[1018:57]

technique to do that. Namely, an ability

[1018:59]

to join two tables. And in fact, just as

[1019:01]

a teaser for this, if we want to start

[1019:03]

playing around with some real data, here

[1019:05]

might be, for instance, excerpts from

[1019:08]

two tables. Here's the shows table at

[1019:10]

left. Here's the ratings table at right

[1019:13]

or a subset thereof. If I want to figure

[1019:15]

out what the rating is for a given show,

[1019:18]

wouldn't it be nice if I could somehow

[1019:19]

like line these two tables up together

[1019:21]

such that just like the tips of my

[1019:23]

finger, I line up this value with its

[1019:25]

corresponding value over here, a cross

[1019:27]

reference of sorts. Well, just for the

[1019:29]

sake of discussion, let me just kind of

[1019:30]

visually flip this around. Though that

[1019:32]

does nothing technically underneath the

[1019:33]

hood. Let me just scooch them together

[1019:35]

now after highlighting the common

[1019:36]

values to demonstrate that. Well, wouldn't

[1019:38]

it be nice to take the shows table and

[1019:40]

join it with the ratings table in such a

[1019:43]

way that those IDs all line up? And

[1019:45]

we're going to have the ability to do

[1019:47]

just this. Um, this is a lot already,

[1019:48]

and this isn't the sort of cliffhanger

[1019:50]

I'd wanted to end on cuz who cares about

[1019:52]

joins, but it's going to be cool. But

[1019:53]

let's take our 10-minute Halloween candy

[1019:54]

break and come back in 10 for the next.

[1019:58]

All right, we are back. So, recall where

[1020:02]

we left off was essentially here. We had

[1020:04]

these two tables. the shows table at

[1020:06]

left and the ratings table at right. And

[1020:08]

the motivation here was like how do we

[1020:10]

actually associate shows with their

[1020:12]

respective ratings because the ratings

[1020:14]

of course are not in the shows table. As

[1020:16]

an aside they could be and in fact

[1020:18]

because this is meant to demonstrate a

[1020:19]

one-to-one relationship whereby every show

[1020:21]

has one rating. We could have just put

[1020:24]

the rating and the number of votes into

[1020:26]

the shows table but we chose not to

[1020:28]

because uh IMDb actually stores their

[1020:31]

ratings as a separate TSV file. And so

[1020:33]

what we tried to do for parity with that is

[1020:35]

only import into a ratings table the

[1020:37]

very TSV file that we had downloaded

[1020:39]

from them. But that too would be a

[1020:40]

solution there too. So at this point in

[1020:42]

the story we've got the shows table

[1020:43]

here. We've got the ratings table over

[1020:46]

here. We've noticed that there are

[1020:48]

commonalities. There are show IDs that

[1020:50]

appear in both tables. And in fact to

[1020:53]

use some of the new vernacular this is

[1020:55]

the primary key. The ID column here.

[1020:58]

This is that same value but in this

[1020:59]

context it's known as a foreign key

[1021:01]

because it's in some other table. But

[1021:02]

that's going to be how we link these two

[1021:04]

things together. So, how do we select

[1021:06]

for not just The Office, but maybe every

[1021:08]

TV show its respective rating? Well,

[1021:10]

let's go back to VS Code and at my SQLite

[1021:12]

prompt, let me go ahead and do

[1021:14]

this. Select star from the shows table.

[1021:18]

But let's go ahead and join the shows

[1021:21]

table with the ratings table. How do I

[1021:24]

want to join these two tables together?

[1021:26]

We'll do so on the shows table's id

[1021:29]

column being equal to the ratings table's

[1021:34]

show_id column and then go ahead and

[1021:36]

filter the results in the following way

[1021:39]

where the rating we care about should

[1021:41]

still be greater than or equal to 6.0

[1021:43]

and let's only limit this to the top 10

[1021:45]

results. So, it's a bit more of a

[1021:47]

mouthful, but what I'm doing is

[1021:49]

selecting everything from the result of

[1021:51]

joining shows and ratings on this column

[1021:55]

with this column. And the rest of the

[1021:58]

predicate is as before. So, join is

[1022:00]

going to do literally that join these

[1022:02]

two tables as I have prescribed. When I

[1022:04]

go ahead here and hit enter, now that I

[1022:05]

have my semicolon, I get back a complete

[1022:08]

table containing everything from the

[1022:10]

shows table, everything from the ratings

[1022:12]

table with those unique identifiers

[1022:15]

lined up. Indeed, if you look at the

[1022:16]

primary key over here, the ID column,

[1022:19]

62614 dot dot dot. Over here, you have

[1022:22]

show ID, which came from the ratings

[1022:24]

table, 62614

[1022:26]

dot dot dot. So, we've taken two tables

[1022:28]

and really joined them together, but

[1022:30]

we're only seeing a subset because I

[1022:32]

limited it to 10 such rows. Now, of

[1022:34]

course, most of this data doesn't seem

[1022:35]

very interesting if my whole goal is

[1022:37]

just to tell me what the ratings are for

[1022:40]

these shows. Well, let's go ahead and in

[1022:42]

code achieve this sort of result. Let's

[1022:44]

literally join these tables together.
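A minimal sketch of that join on a tiny stand-in database, followed by a narrower variant that names only two columns. All rows are invented, and `ORDER BY shows.id` is added here only to make the output deterministic.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript(
    """
    CREATE TABLE shows (id INTEGER, title TEXT NOT NULL, PRIMARY KEY(id));
    CREATE TABLE ratings (show_id INTEGER NOT NULL, rating REAL NOT NULL,
                          votes INTEGER NOT NULL);
    -- Placeholder rows; titles, ratings, and vote counts are invented.
    INSERT INTO shows (id, title) VALUES (1, 'Show A'), (2, 'Show B'), (3, 'Show C');
    INSERT INTO ratings (show_id, rating, votes)
        VALUES (1, 8.5, 100), (2, 4.0, 50), (3, 6.0, 75);
    """
)

# SELECT * returns every column of both tables, so each row carries the
# matching ID twice: once as shows.id and once as ratings.show_id.
joined = db.execute(
    """
    SELECT * FROM shows
    JOIN ratings ON shows.id = ratings.show_id
    WHERE rating >= 6.0
    ORDER BY shows.id
    LIMIT 10
    """
).fetchall()
print(joined)

# Naming the columns keeps only what is wanted from the joined rows.
title_and_rating = db.execute(
    """
    SELECT title, rating FROM shows
    JOIN ratings ON shows.id = ratings.show_id
    WHERE rating >= 6.0
    ORDER BY shows.id
    LIMIT 10
    """
).fetchall()
print(title_and_rating)
```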

[1022:46]

Let's get rid of the redundancy

[1022:47]

altogether. And then really, let's whittle

[1022:49]

it down to just a title column and a

[1022:51]

rating column. So, how do we do that?

[1022:52]

Well, in code, I'm going to go ahead and

[1022:54]

select more specifically the title of

[1022:57]

every show and the rating of every show

[1022:59]

from the shows table, but I'm going to

[1023:01]

join it with the ratings table on

[1023:05]

shows.id equaling ratings.show_id. And as

[1023:09]

before, I'm going to limit it to where

[1023:11]

rating is greater than or equal to 6.0

[1023:14]

and 10 such results. Enter. And now I

[1023:17]

have a nice simple temporary table that

[1023:19]

in one column has the titles of these

[1023:21]

shows and on the right-hand side has the

[1023:23]

ratings of the shows. Even though those

[1023:25]

two data sets were completely separate

[1023:27]

in two separate tables. Indeed, if we

[1023:30]

think back to where this data came from,

[1023:32]

what we've been focusing on is the shows

[1023:34]

table and we've joined it with the

[1023:36]

ratings table. Here's the primary key

[1023:38]

for shows. Here's the foreign key for

[1023:39]

ratings. And by convention, notice that

[1023:42]

we've adopted a certain

[1023:45]

approach. Anything that's called ID here

[1023:47]

implies that it's a primary key.

[1023:50]

Anything that's something underscore ID

[1023:52]

implies that it's a foreign key. And the

[1023:54]

convention we adopted which is actually

[1023:56]

quite common is if the table is called

[1023:58]

shows plural, we call the foreign key

[1024:01]

show singular ID. Different companies,

[1024:04]

different communities will have

[1024:05]

different practices, but we've been

[1024:06]

consistent across all of these tables

[1024:08]

with our underscore and lowercase

[1024:09]

conventions. Yeah. I'm just curious on

[1024:12]

how these IDs all generate and relate to

[1024:15]

each other properly.

[1024:16]

>> Really good question. How do all these

[1024:18]

IDs generate and relate to each other

[1024:19]

properly? Well, in our case, I have no

[1024:22]

idea. The Internet Movie Database people

[1024:24]

came up with these unique identifiers

[1024:25]

somehow and we simply incorporated

[1024:27]

them into our data set. In practice,

[1024:29]

what they probably did and what you will

[1024:31]

do for instance in future problem sets

[1024:33]

when generating data is you just assign

[1024:35]

an arbitrary integer starting at one

[1024:37]

then two then three then four then five

[1024:39]

and you just let it auto increment all

[1024:41]

the way up and you let the database

[1024:42]

ensure that you never have duplicate

[1024:44]

values.
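In SQLite specifically, this behavior can be sketched with a column declared `INTEGER PRIMARY KEY`. The `people` rows are placeholders, and other database systems spell auto-incrementing differently.

```python
import sqlite3

db = sqlite3.connect(":memory:")
# In SQLite, a column declared INTEGER PRIMARY KEY is assigned the next
# unused integer automatically when an insert does not supply a value.
db.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

for name in ["Person A", "Person B", "Person C"]:
    db.execute("INSERT INTO people (name) VALUES (?)", (name,))

ids = [row[0] for row in db.execute("SELECT id FROM people ORDER BY id")]
print(ids)

# Reusing an existing ID violates the primary key, so the insert fails.
try:
    db.execute("INSERT INTO people (id, name) VALUES (?, ?)", (1, "Duplicate"))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)
```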

[1024:45]

>> Yeah.

[1024:46]

>> Just to clarify for the dot dot dot and

[1024:48]

arrow symbol that's only to like make it

[1024:51]

look better, right? like there's no like

[1024:55]

>> Correct. The dot dot dot and angled

[1024:58]

bracket that you keep seeing is just the

[1024:59]

continuation prompt, which means I have

[1025:01]

prematurely hit enter deliberately

[1025:03]

because I want to move everything onto

[1025:04]

the next line so it doesn't wrap in an ugly way

[1025:06]

onto multiple lines. It is not SQL syntax;

[1025:09]

it's specific to sqlite3 and it's

[1025:11]

just a continuation of the thought.

[1025:13]

That's all. Good observation. Yeah?

[1025:17]

>> when you limit it to 10 showing how

[1025:27]

Good question. When you limit something

[1025:29]

to 10, for instance, which ones do you

[1025:30]

get? You just get literally the first 10

[1025:32]

rows from the table. And so it will

[1025:34]

typically be ordered if you don't use

[1025:36]

the ORDER BY keywords, in the same

[1025:40]

order in which it came from those

[1025:42]

tables. And so you're just seeing

[1025:43]

arbitrarily the first 10 that match that

[1025:45]

predicate, which is rating greater than

[1025:47]

or equal to six. We have not ordered it

[1025:50]

by rating. So I'm not getting like the

[1025:51]

10.0 shows necessarily. I'm just getting

[1025:53]

the first 10 shows that are greater than

[1025:55]

six. And the point for that is just I

[1025:56]

want it to fit on the screen rather than

[1025:58]

see hundreds of thousands of answers.
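The difference between an arbitrary ten and a true top ten can be sketched as follows, using five invented ratings and a limit of two so the result is easy to read.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE ratings (show_id INTEGER NOT NULL, rating REAL NOT NULL)")
# Placeholder ratings invented for this sketch.
db.executemany(
    "INSERT INTO ratings (show_id, rating) VALUES (?, ?)",
    [(1, 6.1), (2, 9.4), (3, 7.0), (4, 5.2), (5, 8.8)],
)

# Without ORDER BY, SQL does not promise which matching rows LIMIT keeps.
some_two = db.execute(
    "SELECT show_id FROM ratings WHERE rating >= 6.0 LIMIT 2"
).fetchall()

# With ORDER BY ... DESC, LIMIT keeps the highest-rated matches.
top_two = db.execute(
    "SELECT show_id, rating FROM ratings WHERE rating >= 6.0 "
    "ORDER BY rating DESC LIMIT 2"
).fetchall()
print(len(some_two), top_two)
```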

[1026:01]

Okay. So you might recall now that there

[1026:03]

were certainly other tables besides

[1026:05]

these. So let's see in the broader

[1026:07]

scheme, not just shows and ratings, but

[1026:09]

let's focus on genres. If only because

[1026:11]

genres is interesting because it's no

[1026:12]

longer a one-to-one relationship because

[1026:14]

of course why would a show have multiple

[1026:16]

ratings? It sort of has its own rating.

[1026:18]

But a show could certainly belong to

[1026:19]

multiple genres. You could imagine a

[1026:21]

show being a comedy and a drama or a

[1026:23]

musical and a comedy or any other number

[1026:25]

of combinations of one or more genres.
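One way to picture a one-to-many relationship is a second table in which the same show ID may appear on several rows. The sketch below assumes a two-column `genres` table; the shows and genre assignments are invented.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript(
    """
    CREATE TABLE shows (id INTEGER, title TEXT NOT NULL, PRIMARY KEY(id));
    CREATE TABLE genres (show_id INTEGER NOT NULL, genre TEXT NOT NULL);
    INSERT INTO shows (id, title) VALUES (1, 'Show A'), (2, 'Show B');
    -- One show can appear on several rows, once per genre.
    INSERT INTO genres (show_id, genre)
        VALUES (1, 'Comedy'), (1, 'Drama'), (2, 'Musical');
    """
)

# Look up the show's ID by title, then collect every genre row for it.
genres_of_a = [
    row[0]
    for row in db.execute(
        """
        SELECT genre FROM genres
        WHERE show_id = (SELECT id FROM shows WHERE title = 'Show A')
        ORDER BY genre
        """
    )
]
print(genres_of_a)
```

The genre names themselves still repeat across rows, which is the kind of leftover redundancy such a design accepts.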

[1026:28]

And so the way we've chosen to implement

[1026:30]

that here too is with a separate table

[1026:32]

called genres which is not perfect.

[1026:34]

There's going to be some redundancies

[1026:35]

here that we have not yet eliminated.

[1026:37]

But it does indicate that we can go

[1026:39]

ahead and have multiple such values

[1026:43]

associated with each and every show. So

[1026:46]

how do we get there? Let's focus just on

[1026:48]

this. Let's go back in just a moment to

[1026:49]

VS Code and let's take a look at the

[1026:51]

schema for now genres. In genres, we

[1026:54]

have the following. A table called

[1026:56]

genres, which has two columns: a show

[1026:58]

ID, which is an integer that cannot be

[1027:00]

null, and a genre, which is text and

[1027:02]

also cannot be null. And now for the first

[1027:03]

time, let's actually use some of the

[1027:05]

vernacular we've introduced. Here we

[1027:07]

have an example explicitly in SQL that

[1027:09]

specifies when creating this table that

[1027:12]

the show_id column shall be a

[1027:14]

foreign key that references the shows

[1027:17]

table's id column. And admittedly, I think

[1027:20]

the syntax for creating tables is a bit

[1027:22]

of a mouthful. Even I often have to

[1027:24]

look it up to remember the order

[1027:26]

of everything. But here we have the

[1027:28]

columns listed first and then these key

[1027:31]

constraints: a foreign key referencing

[1027:33]

this primary key over here. And in fact,

[1027:36]

let's rewind to look at the shows table

[1027:37]

now to see from whence we

[1027:40]

came. So if I do .schema shows,

[1027:44]

which we've done before, but waved our

[1027:46]

hand at it, then we'll indeed see that

[1027:48]

shows has a primary key called ID, which

[1027:50]

is an integer. How do I know that?

[1027:52]

Because the very last thing in the

[1027:53]

parentheses says that the id column in

[1027:56]

this table is a primary key. Then we see

[1027:58]

that the title is text and can't be null.

[1028:00]

The year is numeric, which again I

[1028:02]

described as sort of a catchall for

[1028:03]

other real world numeric types that

[1028:05]

aren't purely integers or real

[1028:08]

numbers per se. Episodes is an integer.

[1028:10]

Both of those apparently can be null

[1028:12]

because maybe IMDb just doesn't have

[1028:14]

that data for some older shows, but

[1028:16]

primary key is indeed specified here.

[1028:17]

And just for thoroughness, let me

[1028:19]

distinguish now genres from ratings. If

[1028:22]

I do .schema ratings again, which we

[1028:24]

waved our hand at earlier, very similar

[1028:27]

in spirit to genres in that there's an

[1028:30]

ID column that somehow references the

[1028:32]

shows table and then some other column

[1028:34]

here, genre. In this case, we had

[1028:36]

ratings and votes, which were reals and

[1028:38]

integers respectively. But notice this

[1028:40]

one additional constraint here. I

[1028:43]

deliberately specified that show ID in

[1028:45]

the ratings table must be unique. That

[1028:48]

is to say, you cannot have the same show

[1028:50]

ID more than once in the ratings table.
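The constraints described here can be sketched with Python's built-in sqlite3 module. The CREATE TABLE statements follow the columns named in the lecture, but treat the details as assumptions: the NOT NULL on rating and votes and the sample rating values are made up for illustration.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when this is on
db.executescript("""
CREATE TABLE shows (
    id INTEGER,
    title TEXT NOT NULL,
    year NUMERIC,
    episodes INTEGER,
    PRIMARY KEY(id)
);
CREATE TABLE ratings (
    show_id INTEGER NOT NULL UNIQUE,  -- UNIQUE is what makes this one-to-one
    rating REAL NOT NULL,
    votes INTEGER NOT NULL,
    FOREIGN KEY(show_id) REFERENCES shows(id)
);
CREATE TABLE genres (
    show_id INTEGER NOT NULL,         -- no UNIQUE, so a show may appear many times
    genre TEXT NOT NULL,
    FOREIGN KEY(show_id) REFERENCES shows(id)
);
""")

db.execute(
    "INSERT INTO shows (id, title, year, episodes) VALUES (63881, 'Catweazle', 1970, 26)"
)
# The rating and vote numbers are invented for this sketch.
db.execute("INSERT INTO ratings (show_id, rating, votes) VALUES (63881, 7.5, 1000)")

# A second rating for the same show violates the UNIQUE constraint.
try:
    db.execute("INSERT INTO ratings (show_id, rating, votes) VALUES (63881, 8.0, 5)")
    second_rating_allowed = True
except sqlite3.IntegrityError:
    second_rating_allowed = False

# Several genres for the same show are fine.
db.executemany(
    "INSERT INTO genres (show_id, genre) VALUES (63881, ?)",
    [("Adventure",), ("Comedy",), ("Family",)],
)
genre_count = db.execute(
    "SELECT COUNT(*) FROM genres WHERE show_id = 63881"
).fetchone()[0]

print(second_rating_allowed, genre_count)
```

The only structural difference between ratings and genres in this sketch is the UNIQUE keyword, which is what separates a one-to-one relationship from a one-to-many one.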

[1028:52]

Why? Because I indeed wanted a one-to-one

[1028:55]

relationship. And it would not be

[1028:56]

one-to-one if there were multiple show IDs that

[1028:59]

correspond to one ID in the shows

[1029:02]

table itself. But genres, we're going to

[1029:05]

allow that there can be duplicates.

[1029:08]

And so we don't have mention of unique

[1029:10]

there. All right. So where does this get

[1029:12]

us? Well, let me go back into my

[1029:15]

terminal here after clearing all of

[1029:16]

that. And let's go ahead and just see

[1029:18]

the data to wrap our mind around it a

[1029:20]

little more concretely. So select star from

[1029:23]

genres limit 10, just to see the

[1029:26]

first 10. All right. So it looks like

[1029:27]

there's some comedies, adventures,

[1029:29]

comedies, family, action, sci-fi, and so

[1029:31]

forth. Well, let's go ahead and look up

[1029:33]

just one show's information. In fact, I

[1029:36]

saw this number, this ID before. How

[1029:38]

about let's just look up this show. What

[1029:39]

is this adventure show, 63881? So

[1029:43]

select star from shows where ID equals

[1029:46]

63881 semicolon. Okay. So this is the

[1029:48]

show called Catweazle, from 1970, which had

[1029:51]

26 episodes in total and that was indeed

[1029:54]

its unique identifier. So that's all

[1029:56]

fine and good if I want to see something

[1029:58]

about that specific show. But as before,

[1030:00]

how do I associate Catweazle in this

[1030:02]

case with all of its genres? Well,

[1030:04]

instead of it being a one-to-one

[1030:05]

relationship necessarily, maybe

[1030:07]

Catweazle is not just an adventure. Maybe

[1030:09]

it's also a comedy and a family show.

[1030:11]

And indeed, if I go back to the results

[1030:13]

just now, you'll see that 63881

[1030:15]

indeed lines up with adventure, comedy,

[1030:18]

and family. And then the ID changes to

[1030:20]

be about some other show. So, how do I

[1030:22]

select these three answers to the

[1030:24]

question, what genre is Catweazle?

[1030:27]

Well, for this, we need to talk about

[1030:28]

one-to-many relationships and how we can

[1030:30]

get those back. Well, let's go ahead and

[1030:32]

do this now in my terminal. Let me go

[1030:35]

ahead and say the following. Select

[1030:39]

genre from the genres table where the

[1030:42]

show ID equals just that 63881, which

[1030:46]

I'm now starting to memorize, adventure,

[1030:48]

comedy, and family. So, that's the

[1030:49]

answer to the question, but this

[1030:51]

certainly isn't the best way to do this

[1030:52]

where you have to like look up the

[1030:54]

unique ID for the show you care about,

[1030:56]

then copy paste it or memorize and type

[1030:58]

it out into this query just to get the

[1031:00]

genres. It would be nice to just ask all

[1031:02]

of this in one breath. Well, we can do

[1031:04]

this even though it's a bit more

[1031:05]

verbose. I'm going to instead this time

[1031:07]

say select genre from genres where the

[1031:10]

show id I care about equals and now I'm

[1031:13]

just going to hit enter so as to move

[1031:15]

this nested query inside of parentheses

[1031:18]

and I'm going to say well I don't know

[1031:20]

off the top of my head what the unique

[1031:21]

ID is for Catweazle, but I can ask the

[1031:23]

database select the ID from the shows

[1031:26]

table where the title of the show equals

[1031:29]

Catweazle. And this now obviates the

[1031:32]

need for me to memorize or copy paste

[1031:34]

that unique ID. I'll hit enter and close

[1031:36]

my parenthesis. I'm going to go

[1031:38]

ahead then and say semicolon, enter.

[1031:42]

And now I get back the exact same

[1031:44]

answers, but without having to know or

[1031:46]

care about these numeric values. And

[1031:48]

that's kind of the point here. Even

[1031:49]

though the database itself, the actual

[1031:51]

IMDb website needs to use these unique

[1031:54]

identifiers to store everything in the

[1031:56]

database, we humans, generally speaking,

[1031:58]

should not know or care what these

[1032:00]

identifiers are. They're just meant to

[1032:02]

implement this notion of relationships,

[1032:04]

these cross references. And so here we

[1032:06]

see an example where you can ask the

[1032:07]

question you care about without worrying

[1032:09]

about any of the underlying numbers or

[1032:11]

even seeing them as a result. All right.
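The two approaches just described, looking up the ID by hand versus nesting the lookup, can be sketched side by side with Python's built-in sqlite3 module. The tables are trimmed to the columns this example needs, and the second show and its genre are made up.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE shows (id INTEGER, title TEXT NOT NULL, PRIMARY KEY(id));
CREATE TABLE genres (show_id INTEGER NOT NULL, genre TEXT NOT NULL,
                     FOREIGN KEY(show_id) REFERENCES shows(id));
INSERT INTO shows (id, title) VALUES (63881, 'Catweazle'), (99999, 'Some Other Show');
INSERT INTO genres (show_id, genre) VALUES
    (63881, 'Adventure'), (63881, 'Comedy'), (63881, 'Family'), (99999, 'Drama');
""")

# Step-by-step version: the human has to know (or copy-paste) the numeric ID.
by_id = db.execute("SELECT genre FROM genres WHERE show_id = 63881").fetchall()

# Nested version: the inner query finds the ID, the outer query uses it.
nested = db.execute("""
    SELECT genre FROM genres
    WHERE show_id = (SELECT id FROM shows WHERE title = 'Catweazle')
""").fetchall()

print(sorted(genre for (genre,) in nested))
```

The equals sign works here because the inner query yields a single ID; the numeric identifier never has to appear in the query text or in the output.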

[1032:15]

Well, how else might we go

[1032:17]

about doing this? Well, let me

[1032:19]

propose that we join these two tables

[1032:21]

and ask the question in a slightly

[1032:22]

different way. So, here's an excerpt

[1032:23]

from the shows table. Here's an excerpt

[1032:25]

from the genres table. And clearly we

[1032:27]

could do something like we did before

[1032:28]

for ratings where we could line these

[1032:30]

two up and kind of join them together.

[1032:34]

Just for the sake of discussion, let me

[1032:35]

flip these columns around though that

[1032:36]

has no technical significance. And now

[1032:38]

we can clearly see 63881 appears there

[1032:40]

and here. The difference though because

[1032:42]

now this is a one-to-many relationship

[1032:44]

is that it's not quite as simple as just

[1032:46]

joining the rows together. I need to

[1032:48]

kind of join it here and here and here.

[1032:51]

And the database can do this for you

[1032:53]

albeit at some cost in redundancy. So

[1032:55]

what I'm going to observe is that these

[1032:57]

ids are all the same. Primary key in

[1033:00]

this context, foreign key in this

[1033:01]

context. Well, I'm going to start to

[1033:04]

join them together here, but it's not

[1033:06]

possible to return a temporary table

[1033:09]

that's just outright missing data. You

[1033:11]

have to get the same number of rows and

[1033:13]

columns everywhere in a grid. So what

[1033:15]

the database is going to do if I do join

[1033:16]

these two tables together and they are

[1033:18]

participating in a one-to-many

[1033:20]

relationship with each other, it's going

[1033:22]

to duplicate the data that's necessary

[1033:24]

to sort of make every row look the same.

[1033:26]

Downside is it might indeed be taking up

[1033:28]

some additional space unless the

[1033:29]

database is smart and somehow using

[1033:30]

pointers or something like that

[1033:31]

underneath the hood to avoid the

[1033:33]

redundancy. But for my purposes, this is

[1033:36]

actually quite nice because if I iterate

[1033:37]

over these rows, as I could in Python,

[1033:40]

as we'll eventually see, it's just nice

[1033:42]

to have all the data you care about in

[1033:44]

each and every row, even though it's

[1033:45]

clearly redundant. But the data is not

[1033:48]

being stored redundantly in the database.

[1033:50]

It's just temporarily being presented to

[1033:52]

me with this redundancy here. So, what

[1033:55]

do I really want to have happen? Well, I

[1033:57]

really care about actually joining these

[1033:59]

two tables together and ultimately just

[1034:01]

getting back the title and the genre

[1034:03]

respectively. So, let me go ahead in my

[1034:06]

VS code here and do select title and

[1034:10]

genre from the shows table. But let's

[1034:13]

join it this time on the genres table on

[1034:15]

shows.id equaling genres.show_id. So

[1034:19]

that's quite the same as with ratings

[1034:21]

where the ID equals, just for time's

[1034:25]

sake, 63881, which I know is Catweazle, but

[1034:28]

I could certainly use a nested query if

[1034:30]

I wanted to do this as before. Enter.

[1034:32]

And I get back Catweazle's three genres.

[1034:36]

And if I were to loop over this data in

[1034:38]

some kind of like Python code, I would

[1034:40]

have access to the title and genre with

[1034:42]

each iteration, which I claim is useful.
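That loop can be sketched with Python's built-in sqlite3 module. The tables are cut down to the columns the join touches; the point is only that every joined row repeats the title next to one genre.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE shows (id INTEGER, title TEXT NOT NULL, PRIMARY KEY(id));
CREATE TABLE genres (show_id INTEGER NOT NULL, genre TEXT NOT NULL,
                     FOREIGN KEY(show_id) REFERENCES shows(id));
INSERT INTO shows (id, title) VALUES (63881, 'Catweazle');
INSERT INTO genres (show_id, genre) VALUES
    (63881, 'Adventure'), (63881, 'Comedy'), (63881, 'Family');
""")

# One show row lines up with three genre rows, so the join returns three rows,
# each carrying its own copy of the title.
rows = db.execute("""
    SELECT title, genre FROM shows
    JOIN genres ON shows.id = genres.show_id
    WHERE shows.id = 63881
""").fetchall()

for title, genre in rows:
    print(title, genre)  # both values are available on every iteration
```

The repeated title exists only in the temporary result; the shows table still stores it once.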

[1034:45]

But if I don't care about that and I

[1034:46]

just really want to select the genres, I

[1034:48]

can do this with joins too. Let me just

[1034:50]

select the genre from shows joining it

[1034:54]

on genres on shows.id equaling

[1034:59]

genres.show_id where the ID is Catweazle's, 63881.

[1035:04]

And now I get back just that answer. So

[1035:06]

in short, what have we just seen? One,

[1035:08]

you can join two tables together and

[1035:10]

whittle down the temporary table to just

[1035:12]

the data you care about. Or if you

[1035:14]

prefer, and if I scroll back up in my

[1035:16]

history here, you could take a

[1035:18]

fundamentally different approach but

[1035:20]

still get the same answer of simply

[1035:22]

using a nested query. I would say as you

[1035:24]

learn SQL for the first time, I think

[1035:26]

it's quite often easier to just do

[1035:28]

multiple nested queries because you sort

[1035:31]

of work your way from the inside out,

[1035:33]

taking sort of baby steps to the

[1035:34]

problem. If the problem in question is

[1035:36]

give me all of the genres for a specific

[1035:38]

TV show, well, first,

[1035:41]

because I know how the data is laid out

[1035:43]

in the database, I need to know the

[1035:45]

unique ID of the show I care about.

[1035:47]

Fine, that's pretty straightforward and

[1035:48]

hence this inner query. Once you have

[1035:50]

that, you can parenthesize it and on the

[1035:52]

outside now you can select the question

[1035:54]

to which you really want the answer,

[1035:56]

which is what is the genre that lines up

[1035:58]

with that show ID one or more times. So

[1036:01]

in short, nested queries are probably easier,

[1036:04]

certainly when learning SQL for the

[1036:05]

first time, but quite powerful are these

[1036:08]

join queries where this achieves the

[1036:10]

exact same result. Especially if I were

[1036:12]

to generalize away the 63881 and do a

[1036:14]

nested query here. Sometimes you want

[1036:16]

a join; sometimes nested queries suffice.

[1036:19]

>> How does SQL do all these searches?

[1036:23]

>> Oh my goodness. How does SQL do all of

[1036:25]

these searches? What's its time

[1036:26]

complexity? We'll talk about that toward

[1036:28]

the end of today. In the most naive

[1036:30]

implementation, SQL is essentially just

[1036:32]

doing linear search from the top of the

[1036:34]

table all the way to the bottom.
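One way to glimpse this, offered as an aside rather than something shown in the lecture, is SQLite's EXPLAIN QUERY PLAN statement, which reports how a query would be executed. The sketch below uses Python's built-in sqlite3 module; the helper name plan is made up, and the exact wording of the plan text differs between SQLite versions.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE shows (id INTEGER, title TEXT NOT NULL, PRIMARY KEY(id))")

def plan(query):
    """Return SQLite's description of how it would run the query."""
    rows = db.execute("EXPLAIN QUERY PLAN " + query).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)

# Searching by title: nothing tells the database where a title lives,
# so it reports a scan of the whole table.
by_title = plan("SELECT * FROM shows WHERE title = 'The Office'")

# Searching by the integer primary key: SQLite can jump to the row directly.
by_id = plan("SELECT * FROM shows WHERE id = 386676")

print(by_title)
print(by_id)
```

A plan that mentions SCAN corresponds to the top-to-bottom linear search described above, while SEARCH indicates the database has a faster way to locate matching rows.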

[1036:36]

However, we as the programmers are going

[1036:38]

to have the ability to optimize those

[1036:40]

queries so that the database can

[1036:42]

actually do something closer to binary

[1036:43]

search and in general we'll be able to

[1036:45]

achieve much better performance as a

[1036:47]

result. A really good question. All

[1036:49]

right, let's go back to the big

[1036:51]

flowchart of this data set. We've looked

[1036:53]

now at shows and ratings. We've looked

[1036:55]

at shows and genres. Let's now focus on

[1036:58]

the juiciest part, the part that

[1036:59]

associates shows with people. That is

[1037:01]

who stars in what. Thinking back now to

[1037:04]

what I was mocking up in the Google

[1037:05]

sheet at the very start whereby I wanted

[1037:07]

to somehow be able to associate the

[1037:08]

The Office with Steve Carell and John

[1037:10]

Krasinski and Jenna Fischer and so forth.

[1037:12]

The right way, I claim,

[1037:14]

is going to be like this. Here's my

[1037:16]

people table which has a primary key of

[1037:18]

ID and then the name of each person and

[1037:20]

their birth year if known. Then we have

[1037:22]

the shows table which we keep talking

[1037:23]

about which again has a primary key, a

[1037:25]

title and year and episodes thereof. And

[1037:28]

then the stars table is somewhat new now

[1037:30]

because now when it comes to people

[1037:32]

starring in TV shows we have a third and

[1037:34]

final type of relationship, a

[1037:37]

many-to-many relationship. Why? Because it's

[1037:39]

certainly the case that one person can

[1037:41]

be in multiple shows. And it's certainly

[1037:43]

the case that some shows have multiple

[1037:45]

people, hence many-to-many. So this is

[1037:47]

the third and final relationship where

[1037:49]

just to recap, ratings was one-to-one, genres

[1037:53]

was one-to-many, and now stars is going

[1037:56]

to be many-to-many. All right, let's dive

[1037:59]

in. So these queries will be a bit more

[1038:02]

verbose but again they're going to

[1038:04]

follow this principle of sort of taking

[1038:05]

baby steps to the answer we care about.

[1038:07]

Let me go back into VS Code here and

[1038:09]

suppose I want to find out everything

[1038:11]

about The Office that we know. So,

[1038:13]

select star from shows where title

[1038:15]

equals quote unquote The Office

[1038:18]

semicolon. Well, that's interesting.

[1038:20]

There's a whole bunch of offices. There

[1038:21]

was the UK version. There's a few other

[1038:22]

variants, but the one we're probably

[1038:24]

talking about with these stars is the

[1038:26]

one that started in 2005 with 188

[1038:29]

episodes. That's the US version in fact.

[1038:31]

So, let me be a little more precise. Let

[1038:33]

me select everything I know from the

[1038:34]

shows table where the

[1038:36]

title equals The Office and year equals

[1038:38]

2005, so we don't confuse our answers

[1038:40]

with the other versions of The Office.

[1038:43]

Now, how do I go about selecting all of

[1038:45]

the people who starred in that version

[1038:47]

of The Office? Well, I already have an

[1038:50]

answer to the question of what is the ID

[1038:52]

of that version of The Office because

[1038:54]

it's right there in front of me. And in

[1038:55]

fact, I can narrow my query more

[1038:57]

precisely. Let's just select the ID from

[1039:00]

the shows table where the title is The

[1039:02]

Office and the year is 2005: 386676.

[1039:06]

Now, I could lazily just copy paste that

[1039:08]

or memorize it, but we're going to do

[1039:09]

this query more dynamically. I want to

[1039:12]

next though figure out who is in that

[1039:16]

show. So, if I have a show ID, I want to

[1039:19]

figure out who's in it. But how do I get

[1039:21]

to the people and the names of those

[1039:23]

people? I have to logically go through

[1039:25]

this cross referencing of the stars

[1039:27]

table. So, here's where this query is

[1039:29]

going to be a bit meatier than the past

[1039:31]

ones, in that we need to do a bit more

[1039:33]

work than before. All right. Well,

[1039:34]

what's the work I need to do? Let me go

[1039:36]

ahead now and do the following. Select

[1039:40]

all of the person IDs that are

[1039:42]

associated with this show id. So, how do

[1039:46]

I do that? Select person ID from the

[1039:48]

stars table where the show ID equals and

[1039:53]

I could lazily copy paste this, but

[1039:56]

let's avoid that. Where the show ID

[1039:58]

equals, let me now in parenthesis do

[1040:00]

this. select ID from shows where title

[1040:04]

equals quote unquote The Office and year

[1040:08]

equals 2005 and then close my

[1040:11]

parenthesis semicolon. So what am I

[1040:12]

doing? I'm taking a second baby step if

[1040:14]

you will. The innermost query inside the

[1040:17]

parenthesis is just again dynamically

[1040:19]

figuring out the unique ID of The Office

[1040:21]

I care about. The outer query is now

[1040:24]

figuring out all of the person IDs

[1040:26]

associated with that show as per the

[1040:28]

stars table. And the stars table has

[1040:30]

only two columns. Show id and person ID.

[1040:33]

That's how the linkage is done just with

[1040:35]

those integers. Enter. I now have a

[1040:38]

column of person IDs that are starring

[1040:41]

in that version of The Office. So how do

[1040:43]

I take this one final step if I really

[1040:45]

care about their names and not

[1040:47]

their random person IDs? Well, I could

[1040:49]

go ahead and select the name from the

[1040:52]

people table where that person's ID is

[1040:56]

in the following set. So when I'm

[1040:58]

dealing with a single value, I just use

[1041:00]

equals for equality. But when I'm

[1041:02]

dealing with a whole result set, a whole

[1041:04]

column of answers, I use the preposition

[1041:06]

IN in SQL instead. So where the person's

[1041:10]

ID is in the following data set. Well,

[1041:13]

let's do the same query as before.

[1041:14]

Select all the person IDs from the stars

[1041:17]

table where the show ID I care about

[1041:21]

equals because there's only one show I

[1041:22]

care about. I'm going to further

[1041:24]

parenthesize this. Select ID from shows

[1041:27]

where title equals quote unquote The

[1041:30]

Office and year equals 2005.

[1041:33]

Enter. I'll close my parenthesis.

[1041:35]

Enter. I'll close my parenthesis.

[1041:37]

Semicolon. And now, from the inside out,

[1041:40]

I've taken three baby steps. The

[1041:41]

innermost one just gets me the show ID.

[1041:44]

The second one in the middle gets me all

[1041:46]

of the related person IDs. And the last

[1041:48]

one is really the final flourish. Get me

[1041:50]

all of the names of these people based

[1041:52]

on those IDs. Enter. And now we see all

[1041:56]

of the stars in this show beyond even

[1041:58]

the subset that we've been playing with

[1041:59]

visually on the screen.
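The three baby steps can be sketched as one query with Python's built-in sqlite3 module. The IDs 386676 and 136797 come from the lecture; every other ID, the UK show's row, and the birth-free people table are simplifications made up for this miniature data set.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE shows (id INTEGER, title TEXT NOT NULL, year NUMERIC, PRIMARY KEY(id));
CREATE TABLE people (id INTEGER, name TEXT NOT NULL, PRIMARY KEY(id));
CREATE TABLE stars (show_id INTEGER NOT NULL, person_id INTEGER NOT NULL,
                    FOREIGN KEY(show_id) REFERENCES shows(id),
                    FOREIGN KEY(person_id) REFERENCES people(id));
INSERT INTO shows (id, title, year) VALUES
    (386676, 'The Office', 2005), (1, 'The Office', 2001);
INSERT INTO people (id, name) VALUES
    (136797, 'Steve Carell'), (2, 'John Krasinski'),
    (3, 'Jenna Fischer'), (4, 'Ricky Gervais');
INSERT INTO stars (show_id, person_id) VALUES
    (386676, 136797), (386676, 2), (386676, 3), (1, 4);
""")

names = db.execute("""
    SELECT name FROM people WHERE id IN (          -- step 3: IDs to names
        SELECT person_id FROM stars WHERE show_id = (  -- step 2: show to people
            SELECT id FROM shows                       -- step 1: title to show ID
            WHERE title = 'The Office' AND year = 2005
        )
    )
""").fetchall()

print(sorted(name for (name,) in names))
```

The innermost query uses equals because it produces one ID, while the outer comparison uses IN because the middle query produces a whole column of person IDs.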

[1042:03]

Okay, that's a lot. Let me pause here

[1042:05]

and see if there's any questions. Yeah,

[1042:11]

>> this outermost query is what gives me

[1042:13]

the names. But that query needs to know

[1042:16]

the ID of the person whose name

[1042:18]

you want. So the middle query actually

[1042:21]

gets all of those person IDs. But to get

[1042:23]

those person IDs, I need to know the

[1042:25]

show id. So the innermost query, this

[1042:27]

one, gets me the show ID of The Office

[1042:30]

itself.

[1042:33]

All right. So at the risk of

[1042:35]

overwhelming, here are other ways you

[1042:37]

can solve the same problem. But I do

[1042:39]

claim that nested selects are

[1042:41]

probably conceptually and pragmatically

[1042:43]

the easiest way. But let's also solve

[1042:46]

this problem by doing a few joins just

[1042:47]

so you've seen it. Actually, before we

[1042:49]

do a join, let's flip the

[1042:51]

question around first. How about all of

[1042:52]

the shows that Steve Carell has starred

[1042:54]

in besides The Office? So, let me select

[1042:56]

everything I know from the people table

[1042:58]

where the name of the person equals

[1043:00]

quote unquote Steve Carell semicolon.

[1043:03]

All right, there seems to be only one

[1043:04]

Steve Carell in IMDb born in 1962.

[1043:07]

That's all nice and good. What I really

[1043:08]

care about is his ID. So, I'm going to

[1043:10]

narrow this down to selecting just

[1043:12]

his ID. Now, I could memorize or copy

[1043:15]

paste 136797, but I don't need to do that.

[1043:18]

Let's just use this as part of a nested

[1043:19]

query. Let's now select all of the show

[1043:22]

ids from the stars table that are

[1043:25]

somehow related to Steve Carell's person

[1043:28]

ID. So where person ID equals and I

[1043:32]

could copy paste this but that's

[1043:34]

generally frowned upon. So let's not do

[1043:36]

that. Let's just set it equal to a

[1043:38]

nested query where I do the same thing

[1043:39]

as before. Select ID from people where

[1043:42]

name equals Steve Carell. Then close my

[1043:46]

parenthesis semicolon. All right. He's

[1043:48]

been in a lot of TV shows, but this is

[1043:50]

not useful because I have no idea what

[1043:51]

all of these integers are. So, the final

[1043:53]

flourish, select the title from the

[1043:56]

shows table where the ID of the shows I

[1043:59]

care about is somehow in this

[1044:02]

parenthetical list. Well, what's that

[1044:04]

parenthetical list? Well, select the

[1044:05]

show ID from stars where the person ID

[1044:09]

equals Steve Carell's. What is his ID?

[1044:11]

Well, I didn't memorize it. So, I'm

[1044:13]

going to select ID from people where the

[1044:16]

name of the person I care about is Steve

[1044:19]

Carell, quote unquote. Close

[1044:22]

this parenthesis. Close this

[1044:23]

parenthesis. Semicolon. Enter. And now I

[1044:25]

see all of Steve Carell's shows. And even

[1044:28]

though we're doing this in a black and

[1044:29]

white command line environment, think

[1044:31]

about what the actual IMDb is doing with

[1044:33]

both of these queries. If you go to

[1044:34]

IMDb.com and search for Steve Carell,

[1044:36]

even though there's going to be a lot of

[1044:37]

colors and pretty pictures and whatnot,

[1044:39]

you'll probably get in some form a list

[1044:41]

of all of Steve Carell's shows. Or if you

[1044:43]

search for The Office, you'll get a list

[1044:45]

in some form of all of the stars

[1044:47]

therein. I could claim then that if imdb.com

[1044:50]

is using SQL, which it very likely is,

[1044:53]

but not necessarily, they are executing

[1044:55]

queries just like we did. And when you

[1044:58]

type into the search box something like

[1045:00]

The Office or Steve Carell, they're

[1045:02]

essentially just copy pasting your user

[1045:05]

input into a prefabbed SQL query that

[1045:08]

they wrote in advance so as to get you

[1045:10]

the answers that you actually care

[1045:12]

about. So this is how a lot of today's

[1045:14]

websites and mobile apps are actually

[1045:16]

working. The programmer comes up with

[1045:18]

sort of the template for the queries you

[1045:20]

might ask and then you supply the actual

[1045:22]

data you're searching for. All right,

[1045:25]

how about now as promised a couple of

[1045:27]

other ways to implement these

[1045:29]

many-to-many relationship-based queries, but

[1045:33]

by using joins. If I know I need to

[1045:36]

involve the shows table, the people

[1045:38]

table and the stars table, I can

[1045:39]

actually do this all in one breath

[1045:40]

without any nested queries. Select for

[1045:43]

me the title from the shows table. But

[1045:47]

let's join that on the stars table on

[1045:51]

shows.id equaling stars.show_id.

[1045:58]

but let's additionally join the shows

[1046:00]

table on the following. Let's join it on

[1046:03]

people on stars.person_id

[1046:06]

equaling people.id. In other words,

[1046:08]

if you know conceptually that you've got

[1046:10]

these three tables, you want to somehow

[1046:12]

combine them without using nested

[1046:14]

selects. just figure out how to line

[1046:16]

them all up. So again, I'm selecting

[1046:18]

from the shows table, but I'm joining it

[1046:20]

with the stars table by lining up the

[1046:23]

shows table's primary key with the stars

[1046:26]

table's foreign key. And I'm lining it up

[1046:29]

with the people table by lining up the

[1046:32]

stars table's foreign key with the people

[1046:35]

table's primary key. I'm just kind of

[1046:37]

logically connecting all of the things I

[1046:39]

know to be related. And lastly, let's

[1046:41]

just say where the name I care about

[1046:43]

equals quote unquote Steve

[1046:46]

Carell semicolon. It's a little slower

[1046:49]

for now. And this speaks to the question

[1046:50]

that was asked earlier. How is the

[1046:51]

database doing this? Well, slowly,

[1046:53]

apparently by default, unless we

[1046:54]

optimize it, I got back essentially the

[1046:57]

same results. Although there is some

[1046:59]

duplication as a result, which relates

[1047:01]

to the filling in of blanks

[1047:04]

that I alluded to earlier. But let me

[1047:06]

show you one other technique too. But

[1047:08]

again, I would encourage you certainly

[1047:09]

for problem set seven to focus on nested

[1047:11]

queries when you can because they're a

[1047:12]

little conceptually simpler. If I care

[1047:14]

about the titles of those shows, I could

[1047:16]

select title from the shows table and

[1047:19]

the stars table and the people table all

[1047:21]

at once in one breath. But I want to do

[1047:24]

so where the shows table's primary key

[1047:27]

equals the stars table's foreign key,

[1047:32]

and the people table's primary key equals

[1047:35]

the stars table's foreign key, and the

[1047:39]

name I care about is Steve Carell. In

[1047:41]

other words, this is just a third way to

[1047:43]

express the exact same idea by doing

[1047:46]

implicit joins by selecting data clearly

[1047:48]

from all three tables as per this

[1047:50]

comma-separated list of table names, but

[1047:52]

telling the database with your

[1047:54]

predicate, the WHERE clause, how you want

[1047:56]

to line all of those tables up. If I hit

[1047:58]

enter here, cross my fingers, I should

[1048:01]

get back the same results as well,

[1048:04]

albeit with duplication, which I didn't

[1048:06]

see in the nested queries. Okay, that

[1048:08]

too was a mouthful. Let me pause here

[1048:11]

for questions.
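The three formulations (nested queries, explicit joins, implicit joins) can be compared with Python's built-in sqlite3 module. The IDs 386676 and 136797 come from the lecture; everything else is made up. In particular, the repeated row in stars is an assumption, included as one plausible reason a join might show duplicates that the nested version does not.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE shows (id INTEGER, title TEXT NOT NULL, PRIMARY KEY(id));
CREATE TABLE people (id INTEGER, name TEXT NOT NULL, PRIMARY KEY(id));
CREATE TABLE stars (show_id INTEGER NOT NULL, person_id INTEGER NOT NULL,
                    FOREIGN KEY(show_id) REFERENCES shows(id),
                    FOREIGN KEY(person_id) REFERENCES people(id));
INSERT INTO shows (id, title) VALUES (386676, 'The Office'), (1, 'Hypothetical Show');
INSERT INTO people (id, name) VALUES (136797, 'Steve Carell'), (2, 'Someone Else');
-- The repeated (386676, 136797) row is deliberate.
INSERT INTO stars (show_id, person_id) VALUES
    (386676, 136797), (386676, 136797), (1, 136797), (386676, 2);
""")

# Nested queries: each show is tested once against the set of matching IDs.
nested = db.execute("""
    SELECT title FROM shows WHERE id IN (
        SELECT show_id FROM stars WHERE person_id = (
            SELECT id FROM people WHERE name = 'Steve Carell'))
""").fetchall()

# Explicit joins: one output row per matching combination of rows.
explicit = db.execute("""
    SELECT title FROM shows
    JOIN stars ON shows.id = stars.show_id
    JOIN people ON stars.person_id = people.id
    WHERE name = 'Steve Carell'
""").fetchall()

# Implicit joins: list the tables, then line them up in the WHERE clause.
implicit = db.execute("""
    SELECT title FROM shows, stars, people
    WHERE shows.id = stars.show_id
    AND people.id = stars.person_id
    AND name = 'Steve Carell'
""").fetchall()

print(len(nested), len(explicit), len(implicit))
```

All three name the same shows. With the repeated stars row, the two join forms list The Office twice, while the IN version lists each show once because membership in a set is either true or false for a given show.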

[1048:13]

Yeah,

[1048:15]

>> to do that,

[1048:21]

>> Correct. In order to do this, you as the

[1048:22]

programmer must know the internal

[1048:24]

structure of the database, which is

[1048:25]

quite often the case, whether you

[1048:26]

created the database yourself or you

[1048:28]

work with a colleague who designed the

[1048:30]

schema for the database. That said, I

[1048:32]

think your question is hinting at sort

[1048:33]

of the challenge like I really need to

[1048:35]

know the underlying implementation

[1048:37]

details when really all I care about is

[1048:38]

the answers to my questions. In code

[1048:41]

quite often nowadays, there are

[1048:44]

object-relational mappings,

[1048:46]

ORMs for short, whereby you can

[1048:49]

use libraries that understand the

[1048:52]

underlying database schema. You as the

[1048:54]

programmer do not need to because it

[1048:55]

figures out how to do all of the joins

[1048:58]

for you. So for CS50 we're introducing

[1049:00]

everyone to the bottom up understanding

[1049:01]

of how these joins work. But that too

[1049:03]

can be easily automated because of those

[1049:06]

schemas. Yeah. I just noticed when you're

[1049:08]

typing across, you indent. Is indentation

[1049:11]

important in SQL?

[1049:12]

>> Good question. Is indentation in SQL

[1049:14]

important? Technically no. But like with

[1049:17]

any of the languages we've talked about

[1049:18]

thus far, it is good for the humans and

[1049:20]

certainly good for the students in a

[1049:22]

context like this. Python of the

[1049:23]

languages we looked at is the most

[1049:25]

rigorous whereby indentation very much

[1049:27]

matters, and the consistency thereof. In SQL,

[1049:29]

I'm just trying to pretty print things

[1049:30]

to make it easy to grok visually. All

[1049:33]

right. So those last two queries were

[1049:35]

arguably kind of slow. Whereas with my

[1049:37]

nested queries, I actually got lucky and

[1049:38]

just boom, I got the answer quite

[1049:39]

quickly. Those joins seem to be a step

[1049:42]

backwards, in that it was taking more

[1049:43]

time to get back the same data that I

[1049:45]

actually cared about. But that's

[1049:47]

something we can actually chip away at.

[1049:48]

It turns out that one of the other

[1049:50]

values of a relational database vis-à-vis

[1049:53]

something like a spreadsheet is that you

[1049:55]

can actually tell the database in

[1049:56]

advance how to optimize for certain

[1049:58]

queries. This is not the case for

[1050:00]

spreadsheets. If you have a lot of data

[1050:01]

in Google spreadsheets or Microsoft

[1050:03]

Excel or Apple Numbers, tens of

[1050:05]

thousands of rows, hundreds of thousands

[1050:07]

of rows, millions of rows, your

[1050:09]

computer's going to slow to a crawl. And

[1050:10]

at some point, those software packages

[1050:12]

are just going to say, "Sorry, file is

[1050:13]

too big." And they're certainly not

[1050:15]

going to be terribly fast at searching

[1050:16]

the data. But with a SQL database and

[1050:19]

relational databases more generally, you

[1050:21]

are as much the architect of it as you

[1050:22]

are the user of it in this case. And so

[1050:25]

you can tell the database in advance if

[1050:27]

you want to optimize for certain queries

[1050:30]

like select statements. So for instance,

[1050:32]

let me go back to VS Code here and just

[1050:34]

for the sake of discussion, let's time

[1050:36]

how long it takes to find all of the

[1050:38]

shows whose name is The Office. I'm

[1050:41]

going to use a SQLite command called

[1050:43]

timer. And I'm going to set it to on.

[1050:44]

And this is just now going to tell me

[1050:46]

for every command I run how long it

[1050:48]

took. I'm going to now select everything

[1050:51]

from the shows table where the title of

[1050:53]

the show equals quote unquote The Office

[1050:55]

close quote semicolon enter. And that

[1050:58]

query took let's say in real terms 0.042

[1051:02]

seconds. That's crazy fast. Like it's

[1051:04]

less than a second. I mean it's truly a

[1051:06]

split second. So no big deal. But it's a

[1051:08]

fairly simple query. But I bet we could

[1051:10]

optimize even this. Now why would you

[1051:12]

want to optimize even queries that are

[1051:13]

already pretty fast? Well, if they're

[1051:15]

very commonly being executed, and I dare

[1051:17]

say someone going to imdb.com and

[1051:19]

searching for The Office or any TV show,

[1051:21]

like that's the common case. People are

[1051:22]

looking for TV shows, movies, actors,

[1051:24]

and so forth. It'd be nice to use as

[1051:26]

little amount of time to answer those

[1051:28]

questions as possible. Why? One, it

[1051:30]

makes for happier customers and users

[1051:31]

because you're getting them the answer

[1051:32]

faster. Two, it saves you money because

[1051:35]

presumably if you've spent $1,000 for a

[1051:38]

server and that server has a certain

[1051:40]

amount of RAM, a certain speed CPU or

[1051:42]

brain, it can only do so many searches

[1051:44]

per unit of time, per second, per

[1051:46]

minute, or the like. So, wouldn't it be

[1051:48]

nice if all of those searches were faster

[1051:50]

using less time? So, you can handle not

[1051:52]

a thousand users at once, but 2,000

[1051:54]

users or 5,000 users all with the same

[1051:56]

hardware. So, there's uh certainly

[1051:58]

upsides there. Well, how can I go about

[1052:00]

optimizing a query? Well, I can create

[1052:02]

my own index. Another use of the create

[1052:04]

keyword in SQL where I can tell the

[1052:06]

database to optimize for searches on a

[1052:10]

specific table and specific columns

[1052:12]

therein. I say create index and then I

[1052:15]

come up with a name for the index

[1052:16]

whatever I want on the name of the table

[1052:19]

that I want to index and then in

[1052:20]

parenthesis the columns that I want to

[1052:22]

optimize for. So what does this mean in

[1052:24]

real terms? Well, let's go back to VS

[1052:26]

Code here and let me create an index

[1052:28]

called for instance title index though

[1052:30]

the name doesn't matter on the shows

[1052:32]

table uh using the title column. In

[1052:36]

other words, tell the database please

[1052:38]

expedite searches on the shows table's

[1052:41]

title column. After all, that's what I

[1052:44]

just searched on. Enter. Now, that took

[1052:46]

a moment, almost half a second, but

[1052:48]

that's a table. That's an index that

[1052:49]

only has to be created once. If I do a

[1052:51]

lot of updates and deletes, it might

[1052:53]

actually take a little bit of time over

[1052:55]

the course of using the database to

[1052:57]

maintain that index. But for now, that's

[1052:59]

a one-time operation, creating the

[1053:01]

index. But watch what happens now if I

[1053:03]

scroll up in my history and go to the

[1053:05]

exact same query as before, which

[1053:07]

previously took 0.042

[1053:10]

seconds, which yes, is fast, but not

[1053:12]

nearly as fast as the new version, which

[1053:15]

is 0.001

[1053:18]

seconds instead. Orders of magnitude

[1053:20]

faster. So I can handle 42 times as

[1053:24]

many users on the same database so to

[1053:26]

speak than I could have previously just

[1053:28]

by building this index. So what actually

[1053:31]

is an index? Well, we come full circle

[1053:33]

to discussions in like uh week five of

[1053:35]

the class. So an index in a database is

[1053:38]

very often created using what's called a

[1053:39]

B tree. This is not a binary tree. A B

[1053:42]

tree is its own distinct structure

[1053:44]

that's very similar in spirit in that

[1053:45]

it's fairly shallow because most of the

[1053:48]

nodes have children but it doesn't

[1053:49]

necessarily have two children. It might

[1053:51]

have more children. And in fact, the

[1053:52]

more children the nodes have, the sort

[1053:54]

of higher up you can pull all of the

[1053:56]

leaf nodes and the shorter you can make

[1053:58]

the height of the tree. So this is just

[1054:00]

a generic representation of a B tree.
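
To make the height claim concrete, here is a rough sketch of how many levels a tree needs to reach a given number of entries as each node gets more children. It is only arithmetic, not how SQLite actually lays out its B-trees; the function name `levels_needed` and the fan-out values are illustrative assumptions.

```python
def levels_needed(n_keys: int, fanout: int) -> int:
    """Smallest number of levels such that fanout ** levels >= n_keys."""
    levels = 0
    reach = 1  # how many entries a tree with this many levels can address
    while reach < n_keys:
        reach *= fanout
        levels += 1
    return levels


# Compare a binary tree (2 children per node) with wider nodes.
for fanout in (2, 16, 256):
    print(fanout, levels_needed(1_000_000, fanout))
```

With two children per node, a million entries need 20 levels; with 256 children per node, only 3. That is the sense in which more children pull the leaves higher up.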

[1054:02]

But what this implies is that when I am

[1054:04]

now searching for titles like the

[1054:06]

office, the database doesn't have to do

[1054:08]

the default behavior which is start at

[1054:09]

the top and use linear search all the

[1054:11]

way to the bottom. If it has proactively

[1054:13]

built up an index in memory thanks to my

[1054:16]

command, it now has a tree-like

[1054:18]

structure storing those titles that

[1054:20]

allows it to find in some logarithmic

[1054:22]

time whether it's log base 2 or some

[1054:24]

other base the same data much more

[1054:27]

quickly. And that's how we went from 0.042

[1054:30]

to 0.001

[1054:32]

second instead in this case here.
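
The effect of the index can be checked without a stopwatch. The sketch below uses Python's built-in sqlite3 module with a small in-memory table (the rows are made up, and the table is far smaller than the real shows table) and asks SQLite for its query plan before and after creating the index. The helper name `plan` is an assumption for illustration.

```python
import sqlite3

# A throwaway in-memory database standing in for shows.db.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE shows (id INTEGER PRIMARY KEY, title TEXT)")
db.executemany(
    "INSERT INTO shows (title) VALUES (?)",
    [(f"Show {i}",) for i in range(10_000)] + [("The Office",)],
)

query = "SELECT * FROM shows WHERE title = 'The Office'"


def plan(sql: str) -> str:
    """Return SQLite's description of how it would run the query."""
    rows = db.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " ".join(row[-1] for row in rows)  # last column is the detail text


before = plan(query)  # no index yet, so the table is scanned top to bottom
db.execute("CREATE INDEX title_index ON shows (title)")
after = plan(query)   # now the plan should mention title_index

print(before)
print(after)
```

The exact wording of the plan varies between SQLite versions, but the first plan describes a scan of the table and the second names the index.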

[1054:36]

Questions then on these here indexes?

[1054:42]

No. All right. Well, let's propose that

[1054:45]

we can combine some of today's ideas. It

[1054:47]

turns out that now we're getting to the

[1054:50]

point in the course where you're not

[1054:51]

just choosing between this language and

[1054:52]

another. You're generally using a suite

[1054:54]

of languages to solve problems. And

[1054:55]

indeed, in the coming weeks of the

[1054:57]

class, when we transition to web-based

[1054:58]

applications, you're going to use a bit

[1055:00]

of Python, you're going to use a bit of

[1055:01]

SQL, you're going to use a bit of

[1055:03]

JavaScript and two other languages

[1055:04]

called HTML and CSS. You might be using

[1055:06]

like five different languages at a time

[1055:08]

just to build one application. Why?

[1055:10]

Because some of them are better for the

[1055:12]

job than others. And indeed, that's the

[1055:13]

ecosystem in which real world software

[1055:15]

development is done. Well, to make this

[1055:17]

bridge, we have a version of the CS50

[1055:19]

library, recall, for Python, which has

[1055:21]

functions like get string, even though

[1055:22]

it's not that useful because it's just

[1055:23]

like the input function, but get int uh

[1055:26]

and get float. But also, in the CS50

[1055:28]

library for Python, we have a module

[1055:33]

that specifically makes it easier to use

[1055:35]

SQL from Python code. After all,

[1055:38]

wouldn't it be nice if I could get the

[1055:39]

best of both worlds and implement like

[1055:41]

an interactive program in Python, but

[1055:43]

that uses SQL to actually get back data?

[1055:46]

Or I can build a website that allows

[1055:48]

people to search for TV shows or TV

[1055:50]

stars and actually get that data from a

[1055:52]

database, but use Python to generate the

[1055:55]

web pages themselves. Well, we have some

[1055:57]

documentation for this library here, but

[1055:58]

I'm going to go ahead and use it in real

[1055:59]

time to show you how much more easily

[1056:03]

you can solve certain problems by using

[1056:05]

each tool for what it's good at. So,

[1056:07]

let's go back to VS Code here. Let me

[1056:09]

exit out of SQLite and get back to my

[1056:11]

normal terminal. And let me go ahead and

[1056:14]

let's say minimize

[1056:17]

my terminal here.

[1056:19]

Uh, actually, let's go ahead and open up

[1056:21]

favorites.py, which is where we left off

[1056:24]

before. And recall that in the last

[1056:26]

version of favorites.py, we had simply

[1056:28]

used a dictionary to go about keeping

[1056:29]

track of how many of you said Python or

[1056:31]

C or Scratch. And when I last ran this

[1056:34]

program with python of favorites.py,

[1056:36]

the answer looked like this. Now notice

[1056:38]

that it's not sorted alphabetically,

[1056:40]

otherwise C would be first. And it's

[1056:42]

also not sorted numerically, otherwise C

[1056:46]

would be second. So it would be nice in

[1056:48]

Python to maybe exercise some control

[1056:50]

over this. But I stopped sort of doing

[1056:51]

that before because it gets very

[1056:52]

annoying quickly. And by this I mean the

[1056:55]

following. Let me go back into VS Code

[1056:57]

here uh and into favorites.py. And if I

[1057:00]

wanted to sort by uh the counts here, I

[1057:03]

could do this. Uh, I could change my

[1057:06]

loop from iterating for favorite in

[1057:08]

counts to favorite in sorted counts. So,

[1057:11]

this is actually not too bad thus far. I

[1057:13]

can actually sort dictionaries pretty

[1057:14]

readily. So, now if I run this and let

[1057:16]

me make my terminal a little bit taller

[1057:18]

so we can see both results. If I run the

[1057:20]

program now, you'll see that it's sorted

[1057:23]

alphabetically by key. So apparently

[1057:25]

when you use the sorted function in

[1057:27]

Python and pass it a dictionary, you can

[1057:29]

still iterate over all of the key value

[1057:31]

pairs in that dictionary, but it's been

[1057:33]

sorted now by key. So that's nice if

[1057:35]

that's to be my goal, but maybe that's

[1057:37]

not really my goal. And here's how

[1057:39]

alternatively I could sort by value, the

[1057:41]

190, the 58, and the 24. I can still use

[1057:44]

the sorted function, but I need to tell

[1057:46]

Python to use a key, a sorting key of

[1057:49]

the counts dictionary's get function.
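
As a small sketch of the sorting options being described, here is a stand-in for the counts dictionary using the three totals mentioned in the lecture. The variable names `by_key`, `by_value`, and `by_value_desc` are assumptions for illustration, not the names used in favorites.py.

```python
# Stand-in for the dictionary built in favorites.py.
counts = {"Python": 190, "C": 58, "Scratch": 24}

# Passing a dictionary to sorted() sorts its keys alphabetically.
by_key = sorted(counts)

# key=counts.get tells sorted() to compare each key by its value instead.
by_value = sorted(counts, key=counts.get)

# reverse=True is an extra named parameter that flips the order.
by_value_desc = sorted(counts, key=counts.get, reverse=True)

for favorite in by_value_desc:
    print(f"{favorite}: {counts[favorite]}")
```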

[1057:53]

Uh, and then if I run it again, I now

[1057:56]

see it's sorted by value. But darn it,

[1057:57]

it's now sorted in the opposite order. I

[1057:59]

see Scratch at 24, then 58, then 190. If

[1058:02]

I want to reverse it, well then I have

[1058:03]

to go up here and add another named

[1058:05]

parameter, reverse equals True. I can

[1058:08]

run it another time. And now I get the

[1058:11]

result I care about. Long story short,

[1058:13]

this is just very annoying to have to

[1058:14]

use that amount of code to actually

[1058:17]

answer relatively simple questions. And

[1058:19]

this is why we did transition for much

[1058:20]

of today to a declarative language like

[1058:22]

SQL that just let me select what I care

[1058:25]

about in that data. So if I again go

[1058:28]

back into my database version with

[1058:30]

sqlite3 of favorites.db. I'll maximize

[1058:32]

my terminal window. What did we do

[1058:34]

before? Well, we can select uh from the

[1058:38]

database

[1058:39]

uh select uh let's see favorite comma

[1058:44]

count star from favorites group by uh

[1058:49]

favorite semicolon whoops.

[1058:53]

Oh,

[1058:59]

sorry. What did we do? We do select

[1059:02]

language, comma, count, star from

[1059:04]

favorites, group by favorite. Oh, damn

[1059:06]

it. What happened? Oh, we deleted it.

[1059:10]

See, this is why you don't use the

[1059:11]

delete or drop command. So, I'm not

[1059:13]

going to demonstrate this again, but

[1059:14]

recall uh before break that when we last

[1059:17]

selected this information, we used the

[1059:18]

group by command to actually group by

[1059:21]

the language in question and we got back

[1059:22]

all the counts. But then we were very

[1059:24]

easily able to reorder things by

[1059:26]

actually just using order by and then

[1059:28]

doing something in ascending order or

[1059:30]

for instance descending order instead.

[1059:32]

Well, now let's actually combine these

[1059:34]

worlds of Python and SQL together to

[1059:37]

write first a program that does just

[1059:39]

that. But to do this, we're going to

[1059:41]

need to restore that database. So let's

[1059:43]

go ahead and do this. Let's remove

[1059:45]

favorites.db, which is just a file in

[1059:47]

my account. Let's go ahead and run uh

[1059:50]

sqlite3 of favorites.db to create a

[1059:54]

new version thereof. Let's now go ahead

[1059:56]

and change my mode as we did earlier in

[1059:58]

class to CSV. Let's now do import of

[1060:01]

favorites.csv into a table called

[1060:04]

favorites. And now let's do .quit. And

[1060:07]

when I do ls, okay, now it's back

[1060:09]

favorites.db in addition to today's

[1060:11]

other files. Now let me go ahead and run

[1060:14]

sqlite3 of favorites.db. And just as a

[1060:18]

sanity check, select star from favorites

[1060:20]

semicolon. There's all of the data back,

[1060:22]

minus the addition and subtraction that

[1060:24]

we ourselves made earlier manually. And

[1060:26]

let's go ahead and in SQL go ahead and

[1060:29]

do select language,

[1060:32]

count star from favorites

[1060:36]

and group by language,

[1060:39]

but let's order by count star in

[1060:43]

descending order. And that's one of the

[1060:45]

last commands we ran with this file. And

[1060:46]

there is the answer in a single line of

[1060:48]

code instead of some 17 lines of code

[1060:51]

plus or minus some white space here. Can

[1060:54]

we merge now these two ideas? Well,

[1060:55]

let's see how to do this. Let's go back

[1060:58]

into favorites.py here and make a new

[1061:00]

and improved version of it that actually

[1061:02]

uses SQL and no dictionary, no for loop,

[1061:04]

no try except or any of this. Instead,

[1061:07]

let's go ahead and from CS50's own

[1061:09]

library import a SQL function which will

[1061:12]

give me access to this functionality.

[1061:14]

Let's create a variable called DB by

[1061:16]

convention, but I could call it anything

[1061:17]

I want and set it equal to CS50's SQL

[1061:20]

function and pass to CS50's SQL function

[1061:23]

the path to the database file I want to

[1061:25]

open. This is a little weird, but the

[1061:27]

syntax here is SQLite without the three

[1061:30]

colon slash slash slash

[1061:33]

favorites.db.

[1061:36]

This syntax, otherwise known as a

[1061:38]

URI, is going to allow us to use the

[1061:41]

SQLite protocol in order to

[1061:45]

open up favorites.db, which is the very

[1061:48]

file I was just experimenting with

[1061:49]

manually in my terminal. Here now is how

[1061:52]

I can execute a SQL query in Python

[1061:55]

using CS50's library. Now, as an aside,

[1061:57]

even though this is indeed meant to be a

[1061:59]

training wheel, CS50's library is just

[1062:01]

easier to use than a lot of the real

[1062:03]

world libraries that make this

[1062:04]

possible. So because we spend so

[1062:06]

relatively little time on this, we're

[1062:07]

still using this training wheel for

[1062:09]

this. Give me a variable called rows

[1062:11]

because I want to get back all of the

[1062:13]

rows from this table that contain those

[1062:15]

languages and do db.execute.

[1062:18]

The only function that's useful in the

[1062:20]

CS50 library for SQL is this execute

[1062:22]

function which allows me to write

[1062:24]

literally a line of SQL like select

[1062:27]

language count star uh from favorites

[1062:31]

group by language order by count star uh

[1062:37]

descending order. Just to make my life

[1062:39]

easier, I'm going to add that alias

[1062:41]

trick that we saw before. So as n to

[1062:43]

change the count to the variable n. And

[1062:45]

then here I can just do order by n

[1062:47]

instead. It's a little long, but notice

[1062:50]

that now I'm using SQL as a string that

[1062:54]

I'm passing as an argument to this

[1062:56]

db.execute

[1062:57]

function. So at the very end of this,

[1062:59]

I've got to close my quote, close my

[1063:02]

parenthesis so as to use one language in

[1063:06]

effect inside of another. Now assuming I

[1063:09]

do get back a temporary table's rows with

[1063:11]

that line of code on line five, let's do

[1063:13]

this. For each row in rows, go ahead and

[1063:17]

do the following. Create a variable

[1063:19]

called language and set it equal to row

[1063:21]

quote unquote language. Then create

[1063:24]

another variable called n, for instance,

[1063:26]

and set it equal to row quote unquote n.

[1063:28]

And then let's just go ahead and print

[1063:30]

out language and n respectively. So what

[1063:33]

does CS50's library do? It returns by

[1063:36]

design a list of rows. Each of those

[1063:40]

rows is a dictionary of key value pairs.

[1063:44]

So when I do for row in rows, this is

[1063:46]

just iterating over a list of values.

[1063:48]

And we've done that over the past couple

[1063:50]

of weeks. Inside of this loop, I'm just

[1063:52]

creating temporarily two variables, uh,

[1063:55]

language and n, to show you that each

[1063:57]

row is indeed a dictionary, which means

[1063:59]

I can index into it using strings like

[1064:01]

quote unquote language and quote unquote

[1064:03]

n because those are the columns that I

[1064:07]

selected using this query up above.

[1064:09]

Strictly speaking, I don't even need

[1064:10]

these variables. I can just get rid of

[1064:12]

that and a little more succinctly just

[1064:14]

pass in row bracket language and then

[1064:18]

row bracket uh n instead. So let me go

[1064:21]

down to my terminal window here, exit

[1064:23]

out of SQLite, run Python of

[1064:26]

favorites.py in this form, enter and I

[1064:29]

get back it would seem

[1064:33]

the same exact answer 190 58 and 24 in

[1064:38]

this case. Questions now on this

[1064:40]

co-mingling

[1064:42]

of languages.
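
For readers without CS50's library installed, here is a minimal sketch of the same idea using Python's built-in sqlite3 module instead. This is a swapped-in technique, not the code from the lecture: the in-memory table and its six sample rows are made up, and `sqlite3.Row` is used so that each row can be indexed by column name, much like the dictionaries CS50's library returns.

```python
import sqlite3

# In-memory stand-in for favorites.db with a few made-up rows.
db = sqlite3.connect(":memory:")
db.row_factory = sqlite3.Row  # allow row["language"] style access

db.execute("CREATE TABLE favorites (language TEXT, problem TEXT)")
db.executemany(
    "INSERT INTO favorites (language, problem) VALUES (?, ?)",
    [
        ("Python", "Hello, World"),
        ("Python", "Mario"),
        ("Python", "Hello, It's Me"),
        ("C", "Hello, World"),
        ("C", "Mario"),
        ("Scratch", "Hello, World"),
    ],
)

# SQL does the grouping, counting, and sorting; Python only prints.
rows = db.execute(
    "SELECT language, COUNT(*) AS n FROM favorites "
    "GROUP BY language ORDER BY n DESC"
).fetchall()

for row in rows:
    print(row["language"], row["n"])
```

The shape is the same as in the lecture: one SQL string passed to an execute function, then a loop over rows that are indexed by the selected column names.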

[1064:46]

All right, how about one final thing?

[1064:48]

Once we have the ability to like use

[1064:49]

Python, now we can in fact make things

[1064:50]

interactive. So for instance, let me

[1064:52]

close my terminal temporarily. Let me go

[1064:54]

ahead and now ask for some user input.

[1064:56]

So after opening the database, let's do

[1064:58]

this. Let's ask the human using Python's

[1065:00]

input function or equivalently CS50's

[1065:02]

get string function for their favorite

[1065:03]

TV show and store it in that same

[1065:05]

variable. Then let's do a SQL query that

[1065:08]

selects that data. Rows equals

[1065:10]

db.execute

[1065:12]

select and let's see how many people

[1065:14]

selected uh this favorite problem rather

[1065:17]

not TV show how about favorite problem

[1065:19]

from our favorites data set. So select

[1065:21]

count star as n from the favorites

[1065:25]

database where the problem in question

[1065:28]

equals well now I need to put the user's

[1065:32]

input. I don't know what that is yet

[1065:34]

because they haven't typed it in yet.

[1065:35]

So, what I'm going to go ahead and do is

[1065:37]

a placeholder and say favorite close

[1065:41]

quote and make this whole thing an F

[1065:43]

string. Then I'm going to go down here

[1065:45]

and I don't need to iterate because

[1065:46]

ideally I'm just getting back a single

[1065:48]

answer. How many people chose this

[1065:50]

problem as their favorite? So, I'm going

[1065:51]

to say that uh the row I care about is

[1065:55]

simply the first row. So, rows is a

[1065:58]

list. So, rows bracket zero is the first

[1066:00]

and only row in that list. And then

[1066:02]

let's go ahead and print out row quote

[1066:05]

unquote n. Let's see the result here and

[1066:08]

then see what happens. Let me put some

[1066:12]

single quotes here and single quotes

[1066:14]

here. Let me open my terminal. Let me do

[1066:17]

python of favorites.py

[1066:19]

and I'll say hello, world. Enter. And as

[1066:24]

before at the start of class, 42 of you

[1066:26]

like that. However, this is not not not

[1066:30]

how you should ever write SQL code in

[1066:32]

Python. What could go wrong with this

[1066:35]

code?

[1066:37]

Nothing went wrong a moment ago, but

[1066:39]

what could go wrong?

[1066:42]

Yeah, the user input. How so?

[1066:49]

>> True. I don't know what those are yet,

[1066:50]

but we're about to go there. What even

[1066:52]

more simplistically could go wrong by

[1066:54]

plugging in the user's input here? Yeah,

[1066:58]

>> like hello.

[1067:02]

Exactly. If I inputted the other problem

[1067:04]

we played with, hello, it's me where it

[1067:06]

was it apostrophe s that if interpolated

[1067:09]

right here is clearly going to confuse

[1067:12]

the uh single quotes such that who knows

[1067:14]

what's going to come back. Now, in the

[1067:16]

best case, the code might just not work

[1067:18]

and I'll get some kind of error on

[1067:20]

the screen, which is not great for the

[1067:21]

user because the program is not going to

[1067:22]

be useful. There's no user friendly

[1067:24]

error message. But in the worst case,

[1067:26]

the user could do something incredibly

[1067:28]

malicious if you are simply

[1067:31]

blindly trusting user input and plugging

[1067:34]

their input into a SQL query that you

[1067:36]

yourself constructed. Why? What if the

[1067:39]

user types something crazy like the word

[1067:41]

delete or drop or update or any of those

[1067:45]

destructive commands that we saw earlier

[1067:47]

and somehow tricks your code into

[1067:50]

executing maybe the select but then

[1067:52]

eventually an additional query like a

[1067:55]

delete. Maybe they type in a semicolon

[1067:57]

and then delete or a semicolon and then

[1067:58]

drop or something like that. This is the

[1068:00]

biggest threat to taking user input and

[1068:04]

trusting it in the context of databases.

[1068:06]

And it's called uh as one of your

[1068:09]

classmates knows already, what's known

[1068:11]

as a SQL injection attack. A SQL

[1068:14]

injection attack is the ability for an

[1068:17]

adversary or an unknowing user to

[1068:20]

somehow inject code into your database.

[1068:23]

A SQL injection attack then might look

[1068:25]

something like this in the real world.

[1068:27]

Here, for instance, is like the login

[1068:29]

screen to github.com. Um they do

[1068:31]

actually use SQL among other languages

[1068:33]

underneath the hood I believe not

[1068:34]

necessarily for this but suppose they

[1068:36]

did and when logging into github.com

[1068:38]

you're prompted for your username or

[1068:40]

email address and then of course your

[1068:41]

password. Well, what if I know a little

[1068:43]

something about SQL and suppose for the

[1068:45]

sake of discussion, GitHub is using

[1068:47]

SQLite, which they're not using because

[1068:48]

it's not meant for massive large uh

[1068:50]

massive data sets like this. But suppose

[1068:53]

they are. And just to be malicious, I

[1068:56]

type in my username malan@harvard.edu,

[1068:58]

but then I use a single quote and then

[1069:01]

dash dash. Well, the single quote is

[1069:03]

there, me being an adversary in the

[1069:04]

story, because maybe I can confuse their

[1069:06]

code by closing their quotes sooner than

[1069:09]

they intended. And we haven't talked

[1069:11]

about this yet, but it turns out that

[1069:12]

dash dash in SQL is the comment character. So

[1069:15]

it's like hash in Python or slash slash in C.

[1069:18]

This in SQL means ignore everything to

[1069:20]

the right. That alone can be used fairly

[1069:23]

maliciously as follows. Here, for

[1069:25]

instance, could be the code that GitHub

[1069:27]

is using underneath the hood, whereby

[1069:30]

they might have some Python code, and

[1069:32]

heck, maybe they're using the CS50

[1069:33]

library that executes this pre-made

[1069:35]

query. select star from the users table

[1069:38]

where the username equals this question

[1069:40]

mark and the password equals this

[1069:42]

question mark passing in username and

[1069:44]

password for instance. Uh but if they

[1069:47]

are trusting the username and password I

[1069:50]

typed in and just plugging it right

[1069:51]

there, they could be vulnerable to

[1069:53]

indeed a SQL injection attack. For

[1069:55]

instance, this code we'll soon see is

[1069:57]

actually the right way to do it. But

[1069:58]

suppose they were doing it with f-strings

[1070:01]

like I started to in my version of

[1070:03]

favorites.py. Same thing. Select star

[1070:05]

from users where username equals this

[1070:07]

username and password equals this

[1070:09]

password and the little f here means

[1070:11]

here's a format string. What could go

[1070:13]

wrong? Well, let me actually paste in

[1070:15]

the malan@harvard.edu single quote dash

[1070:18]

dash text here. Notice that this single

[1070:21]

quote and this single quote are meant to

[1070:22]

surround the username. And same thing

[1070:24]

for the password there. But watch what

[1070:26]

happens when I type in my data:

[1070:28]

malan@harvard.edu single quote. So this would

[1070:30]

seem to finish the thought prematurely.

[1070:33]

and then it says dash dash and so that

[1070:36]

just means ignore everything else. And

[1070:38]

so the effect here is essentially to

[1070:39]

gray out all of that stuff because it's

[1070:41]

effectively been commented out. So what

[1070:43]

GitHub ends up doing accidentally in

[1070:45]

this case is selecting star from users

[1070:47]

where username is malan@harvard.edu

[1070:49]

irrespective of what his password

[1070:51]

actually is. And if you assume that down

[1070:53]

here they've got some conditional logic

[1070:55]

like well if we get back some rows that

[1070:57]

means that Malan is in fact a registered

[1070:59]

user. Go ahead and log him in. We don't

[1071:02]

know what the code looks like, so it's

[1071:03]

dot dot dot. You've just enabled anyone

[1071:05]

on the internet to log in as me or

[1071:07]

anyone else just by suffixing their

[1071:10]

input with a single quote and dash dash.
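
The login bypass described here can be reproduced on a throwaway database. This is a minimal sketch, assuming a hypothetical users table with plain-text passwords (itself bad practice, used only to keep the example short) and Python's built-in sqlite3 module; the function name `unsafe_login` is an assumption for illustration, not anyone's real code.

```python
import sqlite3

# Throwaway in-memory database with one hypothetical user.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (username TEXT, password TEXT)")
db.execute("INSERT INTO users VALUES ('malan@harvard.edu', 'correct-password')")


def unsafe_login(username: str, password: str) -> bool:
    # DON'T do this: user input is pasted straight into the SQL text.
    query = (
        f"SELECT * FROM users WHERE username = '{username}' "
        f"AND password = '{password}'"
    )
    rows = db.execute(query).fetchall()
    return len(rows) > 0  # "if we get back some rows, log them in"


# Wrong password, honest input: no rows come back.
print(unsafe_login("malan@harvard.edu", "wrong"))

# Same wrong password, but the username ends with '-- so the quote closes
# early and the password check is commented out.
print(unsafe_login("malan@harvard.edu'--", "wrong"))
```

The second call succeeds even though the password is wrong, because the query SQLite actually sees ends at the comment.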

[1071:13]

And that's the least of our concerns. If

[1071:14]

we additionally went in there and maybe

[1071:16]

instead of dash dash we put a semicolon and

[1071:18]

then delete from users or drop users, we

[1071:21]

could cause massive havoc on their

[1071:23]

database. This happens all the time.

[1071:25]

Even now in the current year, you can

[1071:27]

Google around and see examples of

[1071:29]

companies that have not used proper

[1071:31]

sanitization of user input. And it's not

[1071:34]

just the intern. It's like random people

[1071:35]

on the internet are accessing or

[1071:37]

destroying their data maliciously. So

[1071:40]

what is the solution to a problem like

[1071:42]

this? Well, one, do not use format

[1071:45]

strings in Python to simply plug in user

[1071:47]

input. But the more important lesson is

[1071:50]

never trust users' input. Either they're

[1071:52]

going to do something accidentally or

[1071:54]

they're going to do something

[1071:55]

maliciously and you do not want that to

[1071:57]

happen. So the solution then is to use a

[1071:59]

library. Almost always use a library.

[1072:01]

This is not a wheel you should reinvent

[1072:03]

yourself. And by library I mean

[1072:04]

something like this. If you instead use

[1072:07]

a library like CS50's and you don't just

[1072:10]

use f-strings, you'll see in a moment you

[1072:12]

use question marks. What will happen is

[1072:14]

this. When the user goes and types in

[1072:16]

malan@harvard.edu single quote dash dash,

[1072:18]

that's fine. Let them put weird

[1072:20]

scary characters like single quotes in

[1072:22]

their input. The library will take

[1072:24]

charge of escaping user input. So

[1072:27]

anything dangerous in their input will

[1072:29]

be changed from one single quote to two

[1072:31]

because we saw earlier today that that's

[1072:33]

how you escape a character. And that

[1072:35]

means that now what you have is in

[1072:37]

effect my username is apparently

[1072:39]

malan@harvard.edu

[1072:41]

apostrophe dash dash and that's my username.

[1072:43]

Well that's obviously not a real email

[1072:45]

address. It's not a real username. This

[1072:47]

is just going to return false. No rows

[1072:49]

are actually going to come back. And the

[1072:51]

way to do this now in our favorites

[1072:52]

example analogously is in VS Code here

[1072:56]

to actually go up into this uh execute

[1072:59]

line. Don't use an F string. Change the

[1073:03]

value of problem to be a placeholder

[1073:05]

instead and then pass into this execute

[1073:10]

function one or more arguments that will

[1073:13]

be substituted in for that question

[1073:15]

mark. And this is not a CS50 thing. This

[1073:16]

is a uh industry convention whereby you

[1073:19]

quite often use literally a question

[1073:21]

mark. And that means that whatever this

[1073:23]

variable's value is will get plugged

[1073:25]

into that question mark for you. But the

[1073:28]

single quotes will be added. Any

[1073:29]

dangerous characters will be escaped for

[1073:31]

you. And at that point, you can trust

[1073:33]

that the user can type in anything they

[1073:34]

want. Your code is not going to break.
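
Here is a minimal sketch of the question-mark placeholder, again using Python's built-in sqlite3 module rather than CS50's library, with a made-up in-memory favorites table. One difference worth flagging: sqlite3 hands the value to the database separately from the SQL text instead of rewriting quotes, but the effect for the programmer is the same. The function name `count_problem` is an assumption for illustration.

```python
import sqlite3

# In-memory stand-in for favorites.db with a few made-up rows.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE favorites (language TEXT, problem TEXT)")
db.executemany(
    "INSERT INTO favorites VALUES (?, ?)",
    [
        ("Python", "Hello, World"),
        ("C", "Hello, World"),
        ("C", "Hello, It's Me"),
    ],
)


def count_problem(favorite: str) -> int:
    # The ? is a placeholder; the user's text is passed as a separate
    # argument and is never spliced into the SQL string.
    row = db.execute(
        "SELECT COUNT(*) AS n FROM favorites WHERE problem = ?",
        (favorite,),
    ).fetchone()
    return row[0]


print(count_problem("Hello, World"))
print(count_problem("Hello, It's Me"))                # apostrophe is harmless
print(count_problem("x'; DROP TABLE favorites; --"))  # treated as plain text
```

The apostrophe in the second call no longer confuses the query, and the third input is simply a problem name that nobody chose, so the table survives.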

[1073:37]

You can see hints of this actually in

[1073:39]

the real world. If you've ever gone to a

[1073:40]

website and they tell you like, oh, you

[1073:42]

can't you like for passwords for

[1073:44]

instance, like all of us probably

[1073:45]

intuitively know that you should have

[1073:46]

pretty long uh hard to guess passwords

[1073:49]

with letters and numbers and punctuation

[1073:51]

symbols. Sometimes websites very

[1073:53]

stupidly prohibit you from using certain

[1073:56]

punctuation symbols, which should drive

[1073:58]

you nuts because there's no

[1073:59]

computational reason that you have to

[1074:00]

put the onus on the user to sanitize

[1074:02]

their own input. But quite likely those

[1074:04]

websites have kind of learned part of

[1074:06]

this lesson and they know some

[1074:07]

characters can be dangerous in SQL like

[1074:09]

semicolons or single quotes or the like

[1074:12]

and they just don't want you to ever

[1074:14]

type those in. Even though there are

[1074:15]

solutions to this problem, use a library

[1074:17]

that someone else smarter than you,

[1074:20]

with more history of writing code than

[1074:22]

you has used that's open source so that

[1074:24]

many people have seen it and banged on

[1074:26]

it over the years so that this problem

[1074:28]

is not something you're vulnerable to.

[1074:32]

Questions then on what these here SQL

[1074:34]

injection attacks

[1074:36]

are all about. Yeah,

[1074:38]

>> I guess you're telling the user what not

[1074:40]

to use, you're also telling them what

[1074:42]

system you're using and so maybe that

[1074:46]

>> Good point. So if by also telling people

[1074:49]

what characters they shouldn't use,

[1074:50]

you're leaking information because a

[1074:51]

smart adversary might know, oh well, if

[1074:53]

they don't want me using that symbol,

[1074:54]

they're probably using this language or

[1074:55]

this technology. Yes, no good comes from

[1074:57]

telling the world more information than

[1074:59]

they need to know. So that's another

[1075:00]

good paranoia to have. How about one

[1075:03]

other issue before we come full circle

[1075:04]

to the SQL injection attacks. There's

[1075:07]

another challenge with relational

[1075:08]

databases and with SQL uh itself, namely

[1075:11]

race conditions. This isn't so much a

[1075:13]

problem when I'm writing a little

[1075:14]

program here on my own computer, uh, but

[1075:17]

when you're running SQL code on a

[1075:19]

database in the real world in the cloud

[1075:22]

where you have many different servers

[1075:24]

talking to that database and many

[1075:25]

different users uh talking to those web

[1075:27]

servers as is going to be the case at

[1075:29]

Meta and Google and Microsoft and any

[1075:31]

number of popular companies nowadays and

[1075:33]

even some of CS50's own apps use

[1075:35]

centralized SQL databases where if

[1075:37]

multiple people are trying to do the

[1075:39]

same thing on them at the same time

[1075:40]

submit their homework, run check50, we

[1075:42]

too are vulnerable to what are called

[1075:44]

race conditions. So what is a race

[1075:45]

condition? Well, the way I learned this

[1075:47]

back in the day when taking a course on

[1075:48]

databases and operating systems uh more

[1075:50]

generally was to think of a scenario

[1075:52]

like this. Maybe in your dorm, you and

[1075:54]

your roommates have a little dorm fridge

[1075:55]

and you're both in the habit of really

[1075:57]

liking to drink milk as the story was

[1075:59]

told to us. And so maybe one of you

[1076:00]

comes home from class one day and you

[1076:02]

get to your room, look in the

[1076:04]

fridge, there's no milk in there. And so

[1076:05]

you decide to walk across the street to

[1076:07]

CVS or some other store to get milk.

[1076:09]

Meanwhile, your roommate comes home from

[1076:11]

their class and opens the fridge and

[1076:12]

it's like, "Oh, we're out of milk. Let

[1076:14]

me go to the store, too." And for the

[1076:16]

sake of the story, they go to a

[1076:17]

different store altogether so that you

[1076:18]

don't run into each other and the

[1076:20]

problem solves itself. So now both of

[1076:21]

you are on your way to a store to get

[1076:23]

milk. Time passes. You both come home.

[1076:25]

One of you puts a jug of milk in the

[1076:26]

fridge. The other one gets home and is

[1076:28]

like, "Ah, damn it." Like we already got

[1076:29]

milk. I can't fit this milk in the

[1076:32]

fridge or now it's too much milk. We

[1076:33]

don't really like milk this much. It's

[1076:34]

going to go bad. Like very bad outcome

[1076:36]

here. Having too much milk is the moral

[1076:38]

of the story. But what's the... what a stupid

[1076:42]

story. What's the real

[1076:45]

takeaway? Why did we find ourselves in a

[1076:47]

situation where we ended up with too

[1076:49]

much milk?

[1076:51]

>> We didn't know what the other person

[1076:53]

>> We didn't know what the other person was

[1076:54]

doing. And to really geek out on this,

[1076:56]

we inspected the state of a variable

[1076:58]

that was in the process of being updated

[1077:01]

by someone else. And this is a thing in

[1077:03]

computing as far back as Scratch. Recall

[1077:05]

with Scratch, you could have multiple

[1077:06]

scripts running at the same time for a

[1077:08]

single sprite because Scratch in effect

[1077:09]

is multi-threaded. You can have a single

[1077:12]

sprite doing multiple things in parallel

[1077:15]

by having those multiple scripts.

[1077:16]

Similarly, here your room is sort of

[1077:18]

multi-threaded because you have two

[1077:20]

independent beings who can both go to

[1077:21]

the store, solve the same problem in

[1077:23]

parallel. The problem though is that if

[1077:25]

one is not aware that the other is doing

[1077:26]

that work already, you might make poor

[1077:28]

decisions. So, in the real world, what

[1077:30]

should the first roommate have done

[1077:31]

after inspecting the state of the

[1077:33]

refrigerator and realizing, "Oh, we're

[1077:34]

out of milk." Okay, call the other

[1077:37]

roommate or maybe more simply like put a

[1077:39]

note on the door or like maybe

[1077:41]

dramatically lock the refrigerator

[1077:43]

somehow. And in fact, that's a term of

[1077:44]

art in databases is to actually use a

[1077:46]

database lock so that if you are in the

[1077:49]

process of updating the value in the

[1077:51]

database, lock it so that no one else

[1077:53]

can inspect the value of that database

[1077:55]

and potentially make a poor decision. So

[1077:59]

when might this actually happen in the

[1078:00]

real world rather than the contrived

[1078:02]

milk example. So there are a lot of

[1078:04]

social media posts nowadays that are

[1078:05]

quite popular. To this day, as of today,

[1078:07]

this is still the most popular Instagram

[1078:09]

post for instance. And imagine when this

[1078:10]

was first posted, hundreds, thousands,

[1078:13]

hundreds of thousands of people might

[1078:14]

have all been clicking the heart icon

[1078:16]

essentially at the same time. Now, Meta

[1078:19]

uh the company behind Instagram

[1078:20]

presumably has lots and lots of

[1078:22]

different servers, but let's suppose for

[1078:24]

the sake of discussion they have a

[1078:25]

single database, which is not true, but

[1078:27]

the danger is still there. Even with

[1078:29]

multiple databases, all of these

[1078:31]

different web servers are talking to the

[1078:33]

same database. And suppose those

[1078:35]

servers are using Python code and hey

[1078:37]

the CS50 library that might look a

[1078:39]

little something like this in order to

[1078:41]

decide how to update the total number of

[1078:43]

likes for an Instagram post. The first

[1078:45]

line of code running on Meta's servers

[1078:48]

might say this. Get these rows as

[1078:50]

follows: execute a query like select the

[1078:53]

current number of likes from the posts

[1078:55]

table where the ID of the post is

[1078:57]

whatever it is, 123456, whatever.

[1079:00]

Notice, no SQL injection attacks, uh,

[1079:02]

possible here because I'm using the

[1079:03]

placeholder, not an f-string. Then the

[1079:06]

next line of code running on Meta's server

[1079:08]

maybe just stores in a variable, just to

[1079:09]

make the code more readable, uh, the first

[1079:12]

row's likes column. So again, it's the

[1079:14]

CS50 library in the story: rows is a list

[1079:17]

of dictionaries, so this is the first

[1079:19]

such element in the list, and this is the

[1079:22]

likes column in the table we just

[1079:26]

selected, the temporary table. Lastly,

[1079:28]

what do we want to do? Well, we want to

[1079:29]

plus-plus, essentially, that total. So, we

[1079:32]

update the posts table, setting the number

[1079:34]

of likes equal to this question mark

[1079:36]

where the ID equals this question mark.

[1079:38]

And we didn't see this already, but the

[1079:40]

CS50 library supports indeed multiple

[1079:42]

arguments after the SQL string. I'm

[1079:44]

going to update the number of likes to

[1079:45]

be likes plus one. Plugging in the same

[1079:48]

ID of that post. So in short, take on

[1079:51]

faith that it's quite common that in

[1079:52]

order to achieve one small goal like

[1079:54]

updating the number of likes, it stands to

[1079:56]

reason you might need to do two database

[1079:58]

queries or three lines of code. Now if

[1080:00]

these lines of code are executing on

[1080:02]

multiple web servers, you could

[1080:04]

certainly imagine that if people are

[1080:05]

hitting the like button pretty much

[1080:07]

at the same time, maybe one server is

[1080:10]

going to execute this first line of code

[1080:12]

and it's going to get its answer. Maybe

[1080:14]

there's a hundred likes at this point in

[1080:15]

the story. And then just by chance on

[1080:17]

another server, this line of code is

[1080:19]

also executed, but it too gets the same

[1080:22]

answer. There's currently a hundred

[1080:23]

likes. Meanwhile, the first server in

[1080:26]

the story continues to do its execution

[1080:29]

of code such that it updates the number

[1080:30]

of likes from 100 to 101. But because

[1080:33]

the other server was essentially running

[1080:35]

the same code in parallel, it's going to

[1080:37]

make the same mathematical decision and

[1080:38]

update the number of posts, the number

[1080:40]

of likes from 100 to 101. But at this

[1080:43]

point in the story, the number of likes

[1080:44]

should obviously be 102, so

[1080:47]

we've lost data. And that's one of the

[1080:49]

dangers of a race condition is that

[1080:51]

you'll end up with an inaccurate result.
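
The lost update described above can be reproduced deterministically. The sketch below is an illustration under stated assumptions rather than the code shown in lecture: it uses Python's built-in sqlite3 module instead of the CS50 library, the read_likes and write_likes helpers are hypothetical, and the two "servers" are simulated by interleaving their steps by hand on one connection.

```python
import sqlite3

# Hypothetical posts table with one post that starts at 100 likes.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, likes INTEGER)")
db.execute("INSERT INTO posts (id, likes) VALUES (?, ?)", (1, 100))


def read_likes(post_id):
    """First query: fetch the current like count."""
    rows = db.execute("SELECT likes FROM posts WHERE id = ?", (post_id,)).fetchall()
    return rows[0][0]


def write_likes(post_id, likes):
    """Second query: store a new like count."""
    db.execute("UPDATE posts SET likes = ? WHERE id = ?", (likes, post_id))


# Both simulated servers read before either one writes.
likes_seen_by_a = read_likes(1)
likes_seen_by_b = read_likes(1)

# Each server then adds one to the value it saw and writes it back.
write_likes(1, likes_seen_by_a + 1)
write_likes(1, likes_seen_by_b + 1)

final_likes = read_likes(1)
print(final_likes)  # 101, even though two likes were recorded
```

Both servers based their arithmetic on the same stale value, so the second write silently overwrote the first.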

[1080:53]

And for a company like Meta, they don't

[1080:55]

want to go losing data like likes like

[1080:57]

this. Like that actually drives

[1080:58]

engagement and so forth. And so like

[1081:00]

that's genuinely a technical, if not a

[1081:02]

business problem as well. So it's

[1081:03]

analogous to sort of the milk problem,

[1081:05]

but actually at scale. So what's the

[1081:07]

solution? There's a bunch of different

[1081:09]

ways, but conceptually, we just want to

[1081:11]

lock the database when this logic is

[1081:13]

being executed such that when one server

[1081:14]

is updating the number of likes, no one

[1081:16]

else should be allowed to update the

[1081:18]

like count at the same time. Now, that's

[1081:20]

a little crazy for someone as big as

[1081:22]

Meta because you're really just

[1081:23]

serializing all of these likes and

[1081:25]

slowing things down. So, there's more

[1081:27]

fine-grained control nowadays, namely

[1081:29]

called transactions, where you can

[1081:30]

essentially lock not the whole table and

[1081:32]

certainly not the whole database, but

[1081:34]

just the row in question, for instance.

[1081:36]

And so you would use commands in SQL

[1081:39]

like BEGIN TRANSACTION and then execute

[1081:41]

the lines of code that you want. And

[1081:43]

then when you're ready to commit it,

[1081:44]

that is, save it, you use the COMMIT

[1081:46]

command. But if something goes wrong or

[1081:48]

you get interrupted, you can actually

[1081:49]

roll back the whole thing. And what this

[1081:52]

kind of code does in effect by using

[1081:55]

more verbose uh CS50 and Python code

[1081:57]

like this is you can ensure that those

[1082:00]

three lines of code inside or

[1082:01]

technically the two database queries

[1082:03]

inside will either both be executed or

[1082:05]

not at all. They will not be

[1082:07]

interrupted. And that's the fundamental

[1082:09]

solution to this problem analogous to

[1082:11]

putting a lock on the fridge or by

[1082:14]

leaving a note or calling your roommate

[1082:16]

preventing them from making the same

[1082:18]

decision themselves.
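
A transaction around the read and the update might look roughly like the sketch below. The assumptions are mine, not the lecture's: it uses Python's built-in sqlite3 module rather than the CS50 library, the add_like helper is hypothetical, and SQLite takes its lock on the whole database file rather than on a single row (row-level locking is a feature of other database servers).

```python
import sqlite3

# isolation_level=None turns off the module's implicit transactions, so the
# BEGIN / COMMIT / ROLLBACK statements below are sent exactly as written.
db = sqlite3.connect(":memory:", isolation_level=None)
db.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, likes INTEGER)")
db.execute("INSERT INTO posts (id, likes) VALUES (?, ?)", (1, 100))


def add_like(post_id):
    """Read and update the like count as one all-or-nothing unit."""
    db.execute("BEGIN IMMEDIATE")  # take SQLite's write lock before reading
    try:
        rows = db.execute(
            "SELECT likes FROM posts WHERE id = ?", (post_id,)
        ).fetchall()
        likes = rows[0][0]  # raises IndexError if the post does not exist
        db.execute(
            "UPDATE posts SET likes = ? WHERE id = ?", (likes + 1, post_id)
        )
        db.execute("COMMIT")  # save both steps together
    except Exception:
        db.execute("ROLLBACK")  # undo everything since BEGIN
        raise


add_like(1)
add_like(1)
final_likes = db.execute("SELECT likes FROM posts WHERE id = 1").fetchall()[0][0]
print(final_likes)  # 102
```

Because the lock is taken before the SELECT, a second writer has to wait until the first commits, so it can never read the stale count. For a simple counter, a single statement such as UPDATE posts SET likes = likes + 1 WHERE id = ? would also avoid the separate read.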

[1082:21]

Questions then on these race conditions,

[1082:24]

the solutions again, even though this

[1082:25]

won't be germane for CS50, simply using

[1082:27]

techniques like locks and what we called

[1082:30]

transactions?

[1082:33]

No? All right then, a final moment to end

[1082:36]

on. Uh, we would not be a computer science

[1082:37]

course if we didn't introduce you to a

[1082:39]

few pieces of CS canon. Uh, here is a

[1082:42]

sort of meme that's circulated for years

[1082:44]

when it comes to like optical character

[1082:46]

recognition OCR of like toll booths

[1082:49]

trying to detect your license plate

[1082:50]

automatically.

[1082:51]

This is someone trying to have a funny

[1082:52]

old time tricking the city into deleting

[1082:55]

their database altogether. Because if

[1082:57]

you're just scanning this off of

[1082:58]

someone's license plate or front of the

[1082:59]

car and just blindly plugging it in

[1083:01]

without sanitizing their input, escaping

[1083:03]

their input with something like a good

[1083:04]

library, you might very well drop the

[1083:07]

entire database. As an aside, someone

[1083:09]

did something similar too where I think

[1083:10]

they made their license plate NULL,

[1083:13]

which just confused the heck out of the

[1083:14]

system, too, because the programmers

[1083:16]

didn't understand why null was all over

[1083:17]

the place when lights were being run and

[1083:19]

whatnot. And lastly, a very famed uh

[1083:22]

character in the world of XKCD, as far as

[1083:24]

computer science circles go, is this.

[1083:26]

So we'll end as we've done before on an

[1083:28]

awkward silence as you process this here

[1083:30]

canonical CS joke.

[1083:39]

>> Now you too know who Bobby Tables is.

[1083:42]

All right, that's it for week seven.

[1083:43]

We'll see you next time.

[1085:03]

All right. This is CS50 and this is our

[1085:06]

lecture on artificial intelligence or

[1085:08]

AI. Particularly for all of those family

[1085:10]

members who are here in the audience

[1085:11]

with us for the first time. In fact, uh

[1085:13]

for those students among us, maybe a

[1085:14]

round of applause for all of the family

[1085:16]

members who have come here today to join

[1085:18]

you.

[1085:19]

Nice. So nice to see everyone. And as

[1085:22]

CS50 students already know, it's sort of

[1085:24]

a thing in programming circles to uh

[1085:27]

have a rubber duck on your desk. Indeed,

[1085:28]

a few weeks back, we gave one to all

[1085:30]

CS50 students. And the motivation is to

[1085:32]

have someone, something, to talk to in the

[1085:35]

presence of a bug or mistake in your

[1085:36]

code or confusion you're having when it

[1085:38]

comes to solving some problem. And the

[1085:40]

idea is that in the absence of having a

[1085:42]

friend, family member, TA of whom you

[1085:44]

can ask questions is to literally

[1085:46]

verbalize your confusion, your question

[1085:48]

to this inanimate object on your desk.

[1085:50]

And in that process of verbalizing your

[1085:52]

own confusion and explaining yourself,

[1085:53]

quite often does that proverbial light

[1085:55]

bulb go off over your head and voila,

[1085:57]

problem is solved. Now, as CS50 students

[1086:00]

also know, we sort of virtualized that

[1086:02]

rubber duck over the past few years and

[1086:04]

most recently in a form of uh this guy

[1086:07]

here. So, in students' programming

[1086:09]

environment within CS50, a tool called

[1086:11]

Visual Studio Code at a URL of CS50.dev,

[1086:14]

they have a virtual rubber duck

[1086:16]

available to them at all

[1086:18]

times. And early on in the very first

[1086:20]

version of this rubber duck, it was a

[1086:22]

chat window that looked like this. And

[1086:24]

if students had a question, they could

[1086:25]

simply type into the chat window

[1086:26]

something like, "I'm hoping you can help

[1086:28]

me solve a problem." And for multiple

[1086:30]

years, all the CS50 duck did was respond

[1086:33]

with one, two, or three quacks. Uh we

[1086:35]

have anecdotal evidence to suggest that

[1086:37]

that alone was enough for answering

[1086:39]

students' questions because it was in

[1086:40]

that process of like actually typing out

[1086:42]

the confusion that you realize, oh, I'm

[1086:45]

doing something silly and you figure it

[1086:47]

out on your own. But of course now that

[1086:49]

we live in an age of ChatGPT and Claude

[1086:51]

and Gemini and all of these other

[1086:53]

AI-based tools, it came as no surprise, perhaps,

[1086:55]

when in 2023 this same duck started

[1086:58]

responding to students in English. And

[1086:59]

that now is the tool that they have

[1087:01]

available, which is in effect meant to be

[1087:03]

a less helpful version of ChatGPT, one

[1087:06]

that doesn't just spoil answers outright

[1087:07]

but tries to guide them to solutions

[1087:09]

akin to any good teacher or tutor and so

[1087:12]

today's lecture is indeed on just that

[1087:14]

and the underlying building blocks that

[1087:16]

make possible that there rubber duck and

[1087:18]

all of the AI with which we're all

[1087:20]

increasingly familiar, namely generative

[1087:22]

artificial intelligence using this

[1087:23]

technology known as AI to generate

[1087:26]

something, whether that's images or

[1087:28]

sounds or video or text. And in fact,

[1087:30]

what we thought we'd do to get everyone

[1087:32]

involved early on is if you uh have a

[1087:34]

phone uh by your side, if you'd like to

[1087:36]

go ahead and scan this QR code here,

[1087:40]

and that's going to lead you to a

[1087:41]

polling station where you can buzz in

[1087:43]

with some answers. Um, CS50's preceptor

[1087:45]

Kelly is going to kindly join me here on

[1087:47]

stage to help run the keyboard. And what

[1087:49]

we're about to do is play a little game

[1087:50]

and see just how good we humans are

[1087:53]

right now at distinguishing AI from

[1087:56]

reality. And so we'll borrow some data

[1087:58]

from uh the New York Times, which a

[1088:00]

couple years back actually published

[1088:01]

some examples of AI and not AI, and

[1088:04]

we'll see just how good this

[1088:06]

technology has gotten. So here we have

[1088:07]

two photographs on the screen. In a

[1088:09]

moment, you'll be asked on your phone,

[1088:11]

if you were successful in scanning that

[1088:13]

code, which one of these is AI, left or

[1088:16]

right.

[1088:18]

So hopefully on your phone here, if you

[1088:20]

want to go ahead and swipe to the next

[1088:21]

screen, we'll activate the poll here. In

[1088:23]

a moment, you should see on your phone a

[1088:25]

prompt inviting you to select left or

[1088:31]

right.

[1088:33]

And feel free to raise your hand if

[1088:36]

you're not seeing that. But it looks

[1088:38]

like the responses are coming in. And at

[1088:40]

the risk of spoiling, it looks like 70%

[1088:42]

plus of you think it is the answer on

[1088:44]

the right. And if Kelly, maybe we could

[1088:45]

swipe back to the two photographs. In

[1088:47]

this particular case, yes, it was in

[1088:49]

fact the one on the right. Maybe it

[1088:50]

looked a little too good or maybe a

[1088:52]

little too unreal. Maybe. Let's see

[1088:55]

maybe a couple of other examples. So,

[1088:57]

same QR code. No need to rescan. Let's

[1088:59]

go ahead and pull up these two examples.

[1089:00]

Now, two photographs, same question.

[1089:02]

Which of these is AI? Left or right?

[1089:07]

left

[1089:09]

or right.

[1089:12]

All right, want to take a look at the

[1089:13]

chart, see what the responses are? Coming

[1089:14]

in a little closer in this case, but a

[1089:16]

majority of you think the answer is in

[1089:18]

fact left here, though 5% of you were

[1089:20]

truthfully admitting that you're unsure.

[1089:22]

But Kelly, if you want to swipe back to

[1089:23]

the photos, the answer this time was in

[1089:25]

fact a trick question. They were both in

[1089:27]

fact AI, which perhaps speaks to just

[1089:29]

how good this technology is already

[1089:31]

getting. Neither of these faces exists

[1089:34]

in the real world. It was synthesized

[1089:36]

based on lots of training data. So, two

[1089:38]

photographs that look like humans but do

[1089:41]

not in fact exist. How about one more?

[1089:42]

This time focusing on text, which will

[1089:44]

be uh the focus, of course, underlying

[1089:46]

our duck. Did a fourth grader write this

[1089:48]

or the new chatbot? Here are two final

[1089:50]

examples. Uh same code as before, so no

[1089:52]

need to rescan. And here are the texts.

[1089:54]

Essay one. I like to bring a yummy

[1089:56]

sandwich and a cold juice box for lunch.

[1089:58]

And sometimes I'll even pack a tasty

[1089:59]

piece of fruit or a bag of crunchy

[1090:01]

chips. As we eat, we chat and laugh and

[1090:04]

catch up on each other's day. Dot dot

[1090:05]

dot. Essay two. My mother packs me a

[1090:08]

sandwich, a drink, fruit, and a treat.

[1090:09]

When I get into a lunchroom, I find an

[1090:11]

empty table and sit there and eat my

[1090:13]

lunch. My friends come and sit down with

[1090:15]

me. dot dot dot. The question now,

[1090:17]

lastly, is which of these is AI? One or

[1090:20]

two?

[1090:22]

Essay one or two? The bars here are

[1090:26]

duking themselves out. Looks like a

[1090:28]

majority of you say essay one. Let's go

[1090:29]

back to the text. And one of you who

[1090:31]

says essay one, why, if you

[1090:33]

want to raise a quick hand? Why essay

[1090:35]

one? Yeah.

[1090:38]

>> Okay. And so essay 2 looks more like you

[1090:40]

would write. And can I ask what grade

[1090:41]

you are in?

[1090:43]

>> A fifth grader. So is this a new fifth

[1090:45]

grader or not? The answer here in fact

[1090:49]

is that essay one is the AI because

[1090:52]

indeed essay 2 is more akin to what a

[1090:54]

fourth or if I may a fifth grader would

[1090:55]

write. And I dare say there are maybe

[1090:57]

some telltale signs. I'm not sure a

[1090:59]

typical fourth grader or fifth grader

[1091:01]

would catch up on each other's day in

[1091:03]

the vernacular that we see in essay one.

[1091:05]

But suffice it to say this game is not

[1091:07]

something we can play in the years

[1091:08]

to come because it's just going to get

[1091:10]

too hard to discern something that's AI

[1091:13]

generated or not. And so among our goals

[1091:15]

for today is really to give you a better

[1091:17]

sense of not just how technologies like

[1091:18]

this duck and these games that we've

[1091:19]

played here with images and text work,

[1091:21]

but really what are the underlying

[1091:23]

principles of artificial intelligence

[1091:24]

that frankly have been with us and have

[1091:26]

been developing for decades and

[1091:28]

have really now come to a head in recent

[1091:30]

years thanks to advances in research,

[1091:32]

thanks to all the more cloud computing,

[1091:34]

thanks to all the more uh memory and

[1091:36]

disk space and information sheer volume

[1091:39]

thereof that we have at our disposal

[1091:40]

that can be used to train all of these

[1091:42]

here technologies. So that there

[1091:44]

duck is built on a fairly complicated uh

[1091:47]

architecture that looks a little

[1091:48]

something like this where here's a

[1091:49]

student using one of CS50's tools.

[1091:51]

Here's a website with which CS50

[1091:53]

students are familiar, called CS50.ai,

[1091:55]

where we the staff wrote a bunch of code

[1091:56]

that actually talks to what are called

[1091:58]

APIs, application programming

[1092:00]

interfaces, third-party services by

[1092:02]

companies like Microsoft and OpenAI that

[1092:05]

really have been doing the hard work of

[1092:06]

developing these models as well as some

[1092:08]

local sweet uh some local sauce that we

[1092:10]

CS50 add into the mix to make it

[1092:13]

specific, the duck's answers, to CS50

[1092:15]

itself. But what we've essentially been

[1092:17]

doing is, uh, something with which

[1092:18]

you might be familiar in part prompt

[1092:20]

engineering which has started popping up

[1092:22]

for better or for worse on uh LinkedIn

[1092:24]

profiles everywhere. And prompt

[1092:25]

engineering really it's not so much a

[1092:27]

form of engineering as it is a form of

[1092:29]

asking good questions and being detailed

[1092:32]

in your question giving context to the

[1092:34]

underlying AI so that the answer with

[1092:36]

high probability is what you want back.

[1092:38]

And so there's two terms in this world

[1092:40]

of prompt engineering that are worth

[1092:42]

knowing about, and CS50 has leveraged

[1092:44]

both of these to implement that duck. We

[1092:46]

for instance wrote what's called a

[1092:47]

system prompt which are instructions

[1092:48]

written by us humans often in English

[1092:51]

that sort of nudge the underlying AI

[1092:53]

technology to have a certain personality

[1092:55]

or a specific domain of expertise. For

[1092:58]

instance, we CS50 have written a system

[1093:01]

prompt essentially that looks like this.

[1093:02]

In reality, it's like a lot of lines

[1093:04]

long nowadays, but the essence of it is

[1093:06]

this. You are a friendly and supportive

[1093:08]

teaching assistant for CS50.

[1093:10]

You are also a rubber duck and that is

[1093:13]

sufficient to turn an AI into a rubber

[1093:15]

duck, it turns out. Answer student

[1093:16]

questions only about CS50 and the field

[1093:18]

of computer science. Do not answer

[1093:20]

questions about unrelated topics. Do not

[1093:22]

provide full answers to problem sets as

[1093:23]

this would violate academic honesty.

[1093:25]

Answer this question, colon. And after

[1093:28]

that preamble, if you will, aka system

[1093:30]

prompt, we effectively copy-paste

[1093:32]

whatever question a student has typed in

[1093:34]

otherwise known as a user prompt. And

[1093:36]

that is why the duck behaves like a duck

[1093:38]

in our case and not a cat or a dog or a

[1093:40]

PhD, but rather something that's been

[1093:42]

attenuated to the particular goals we

[1093:44]

have pedagogically in the course. And in

[1093:46]

fact, those of you who are CS50 students

[1093:49]

might recall from quite some weeks ago

[1093:51]

in week zero when we first introduced

[1093:53]

the course uh to the class, we had code

[1093:57]

that we whipped up that day that

[1093:58]

ultimately looked a little something

[1094:00]

like this. And I'll walk through it

[1094:01]

briefly line by line. But now on the

[1094:03]

heels of having studied some Python in

[1094:05]

CS50, this here code that I whipped up

[1094:07]

in the first lecture might make now a

[1094:09]

bit more sense. In that first lecture,

[1094:11]

we imported OpenAI's own library code

[1094:14]

that a third party company wrote to make

[1094:16]

it possible for us to implement code on

[1094:18]

top of theirs. We created a variable

[1094:20]

called client in week zero and this gave

[1094:22]

us access to the OpenAI client. That is

[1094:24]

software that they wrote for us. We then

[1094:26]

defined in week zero a user prompt which

[1094:28]

came from the user using the input

[1094:29]

function with which CS50 students are

[1094:31]

now familiar. And then we defined this

[1094:33]

system prompt that day where I said

[1094:34]

limit your answer to one sentence.

[1094:36]

Pretend you're a dot dot dot cat I think

[1094:38]

was the persona of the day. And then we

[1094:40]

used some bit more arcane code here. But

[1094:42]

in essence we created a variable called

[1094:44]

response which was meant to represent

[1094:46]

the response from OpenAI's server. We used

[1094:48]

client.responses.create, which is

[1094:50]

a function or method that OpenAI gives

[1094:53]

us that allows us to pass in three

[1094:54]

arguments: the input from the user, that

[1094:57]

is, the user prompt; the instructions from

[1094:59]

us, that is, the system prompt; and then

[1095:01]

the specific model or version of AI that

[1095:03]

we wanted to use. And the last thing we

[1095:05]

did that day was print out

[1095:07]

response.output_text,

[1095:09]

and that's how we were able to answer

[1095:10]

questions like what is CS50 or the like.
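
The program walked through above can be sketched as follows. The runnable part only assembles the three arguments the transcript names (input, instructions, model); the build_request helper, the "MODEL_NAME" placeholder, and the exact wording of the cat persona are assumptions, since the transcript does not give the model or the precise prompt. The network call itself is left as a comment because it needs the openai package and an API key.

```python
# System prompt as best the transcript recalls it; the exact wording is assumed.
SYSTEM_PROMPT = "Limit your answer to one sentence. Pretend you're a cat."


def build_request(user_prompt, model="MODEL_NAME"):
    """Collect the three arguments described for client.responses.create."""
    return {
        "input": user_prompt,           # the user prompt
        "instructions": SYSTEM_PROMPT,  # the system prompt
        "model": model,                 # placeholder; the transcript names no model
    }


request = build_request("What is CS50?")
print(request)

# With the openai package installed and an API key configured, the call
# described in the lecture would look roughly like this:
#
#     from openai import OpenAI
#     client = OpenAI()
#     response = client.responses.create(**request)
#     print(response.output_text)
```

Swapping SYSTEM_PROMPT for the rubber-duck instructions quoted earlier is the only change needed to move from a cat persona to the duck's.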

[1095:14]

So, we've seen all of that before, but

[1095:16]

we didn't talk about that week exactly

[1095:18]

how it was working or what more we could

[1095:20]

actually do with it. And so, in fact,

[1095:22]

what I thought we'd do today is peel

[1095:24]

back a layer that we've not allowed into

[1095:26]

the course up until now. And indeed, you

[1095:27]

still cannot use this feature until the

[1095:29]

very end of the class in CS50 when you

[1095:31]

get to your final projects, at which

[1095:33]

point you are welcome and encouraged to

[1095:34]

use VS Code in uh this particular way.

[1095:38]

So, here again is VS Code. For those

[1095:40]

unfamiliar, this is the programming

[1095:41]

environment we use here with students.

[1095:43]

And let me open up some code that was

[1095:44]

assigned to students a couple of weeks

[1095:46]

back, namely a spell checker that they

[1095:49]

had to implement in C. So I came in

[1095:51]

advance with a folder called speller.

[1095:53]

And inside of this folder, I had code

[1095:55]

that day and all students had that week

[1095:57]

called dictionary.c. And in this file,

[1096:00]

which will not look familiar to many of

[1096:01]

you if you've not taken weeks 0 through

[1096:04]

uh seven up until now, we did have some

[1096:06]

placeholders for students. So long story

[1096:08]

short, students had to answer a few

[1096:10]

questions. that is write code to do this

[1096:12]

to-do, this to-do, this to-do, and one

[1096:14]

more. There were four functions or

[1096:16]

blanks that students needed to fill in

[1096:18]

with code. And I dare say it took most

[1096:20]

students 5 hours, 10 hours, 15 hours,

[1096:24]

something in that very broad range. Let

[1096:27]

me show you now how using AI, you soon,

[1096:31]

the aspiring programmers can start to

[1096:32]

write code all the more quickly, not by

[1096:34]

just choosing a different language but

[1096:36]

by using these AI-based

[1096:38]

technologies beyond the duck itself. So

[1096:40]

what I've done here on the right hand

[1096:42]

side of VS code is enabled a feature

[1096:44]

that CS50 disables for all students from

[1096:46]

the start of the course, called Copilot.

[1096:48]

This is very similar in spirit to

[1096:49]

products from Google, um, and Anthropic

[1096:51]

and other companies as well. But this is

[1096:53]

the one that comes from Microsoft and, in

[1096:55]

turn, GitHub here, and it too gives me

[1096:56]

sort of a chat window here and this is

[1096:58]

just one of its features. For instance,

[1097:00]

if I wanted to implement to get started

[1097:03]

the check function, I could just ask it

[1097:04]

to do that. Implement the check function

[1097:09]

and, uh, how about using a hash table in C. I'm

[1097:14]

going to go ahead and click enter. Now

[1097:16]

it's going to work. It's using as

[1097:18]

reference that is context the very file

[1097:20]

that I've opened which is dictionary.c

[1097:22]

here. Um, Copilot in general, as well

[1097:25]

as a lot of AI tools are familiar with

[1097:26]

CS50 itself because it's been freely

[1097:28]

available as open courseware for years.

[1097:30]

What you see it doing here is

[1097:32]

essentially thinking though that's a bit

[1097:33]

of an overstatement. It's not really

[1097:35]

thinking. It's trying to find patterns

[1097:38]

in what the problem is I want to

[1097:40]

solve among all of its training data

[1097:42]

that it's seen before and come up with a

[1097:44]

pretty good answer. So for today's

[1097:46]

purposes, I'm going to wave my hand at

[1097:47]

the ChatGPT-like explanation of what to

[1097:50]

do that has appeared at right. But

[1097:52]

what's juiciest to look at here is on

[1097:54]

the left, if I now scroll down:

[1097:56]

highlighted in green is all of the

[1097:58]

suggested code for implementing this

[1098:01]

here check function. Now it might not be

[1098:03]

the way you implemented it yourself but

[1098:05]

I do dare say this has hints of exactly

[1098:08]

what you probably did when it came to

[1098:10]

implementing a hash table. And in

[1098:13]

fact I can go ahead and keep all of this

[1098:15]

code if I like how it looks. Let's

[1098:16]

assume that's all correct there. It

[1098:18]

might be the case that I want to now

[1098:20]

implement the load function. So how

[1098:21]

about, "now implement load function," Enter,

[1098:26]

as simple as that. And what data is

[1098:28]

being used? Well, a few different

[1098:29]

things. It says one reference. So it's

[1098:31]

indeed using this one file. But there's

[1098:33]

also what are called comments in the

[1098:34]

code with which all students are now

[1098:36]

familiar. These slash-slash comments in gray

[1098:38]

that are giving English hints as to what

[1098:40]

this function is supposed to do. There's

[1098:42]

implicit information as to what the

[1098:43]

inputs to these functions, otherwise

[1098:45]

known as arguments, are meant to be, what

[1098:46]

the outputs are meant to be. So the

[1098:48]

underlying AI, called Copilot here, kind

[1098:51]

of has a decent number of hints, and

[1098:53]

much like a good TA or good software

[1098:55]

engineer that's enough context to figure

[1098:57]

out how to fill in those blanks. And so

[1098:59]

here too if I scroll down now we'll see

[1099:02]

in green some suggested code via which

[1099:06]

it could solve that same problem as

[1099:09]

well, the load function. And I dare say

[1099:12]

I've been talking for far fewer minutes

[1099:14]

than CS50 students spent actually coding

[1099:16]

the solution from scratch to this here

[1099:18]

problem. So I'll go ahead and click

[1099:20]

keep. I'll assume that it's correct. But

[1099:22]

that's actually quite a big assumption.
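To make concrete what is being assumed correct, here is a minimal sketch of a hash-table lookup of the kind described. It is not the code that appeared on screen. The names `LENGTH`, `N`, `node`, `table`, `hash`, and `check` are assumptions modeled on the problem's conventions, the first-letter hash is a deliberately simple choice, and `add_word` is a hypothetical helper standing in for the real load function.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define LENGTH 45 // Assumed maximum word length
#define N 26      // Assumed bucket count: one per first letter

// One entry in a bucket's linked list
typedef struct node
{
    char word[LENGTH + 1];
    struct node *next;
} node;

static node *table[N];

// Hashes on the first letter, ignoring case; non-letters go to bucket 0
static unsigned int hash(const char *word)
{
    unsigned char c = (unsigned char) word[0];
    if (!isalpha(c))
    {
        return 0;
    }
    return (unsigned int) (toupper(c) - 'A') % N;
}

// Case-insensitive string equality (portable stand-in for strcasecmp)
static bool same_word(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0')
    {
        if (tolower((unsigned char) *a) != tolower((unsigned char) *b))
        {
            return false;
        }
        a++;
        b++;
    }
    return *a == *b;
}

// Hypothetical helper: prepends a word to its bucket's list
bool add_word(const char *word)
{
    if (strlen(word) > LENGTH)
    {
        return false;
    }
    node *n = malloc(sizeof(node));
    if (n == NULL)
    {
        return false;
    }
    strcpy(n->word, word);
    unsigned int i = hash(word);
    n->next = table[i];
    table[i] = n;
    return true;
}

// Returns true if word is in the table, ignoring case
bool check(const char *word)
{
    for (node *cur = table[hash(word)]; cur != NULL; cur = cur->next)
    {
        if (same_word(cur->word, word))
        {
            return true;
        }
    }
    return false;
}
```

Even in a sketch this small there are details worth an experienced eye: the length guard before `strcpy`, the `NULL` check after `malloc`, and the fact that nothing here ever frees the nodes.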

[1099:24]

And those of you wondering like why have

[1099:25]

we been learning all this if I

[1099:26]

could just ask it in English to do my

[1099:28]

homework for me? I mean, there's a lot to

[1099:29]

be said for the muscle memory that

[1099:31]

hopefully you feel you've been

[1099:32]

developing over the past several weeks.

[1099:33]

The reality is if you don't have an eye

[1099:35]

for what you're looking at, there's no

[1099:36]

way you're going to be able to

[1099:37]

troubleshoot an issue in here, explain

[1099:39]

it to someone else, make marginal

[1099:41]

changes or the like. And yet, what's

[1099:43]

incredibly exciting even to someone like

[1099:45]

me, all of the staff, friends of mine in

[1099:47]

the industry, is that this kind of

[1099:49]

functionality and AI amplifies your

[1099:51]

capabilities as a programmer sort of

[1099:53]

overnight. Once you have that

[1099:55]

vocabulary, that muscle memory for doing

[1099:57]

it yourself, the AI can just take it

[1099:59]

from there and get rid of all of the

[1100:01]

tedium, allow you to focus at the

[1100:02]

whiteboard with the other humans on sort

[1100:04]

of the overarching problems that you

[1100:06]

want to solve and leave it to this AI to

[1100:08]

actually solve problems for you. A fun

[1100:11]

exercise too might be to go back at

[1100:13]

term's end and try solving any number of

[1100:15]

the course's assignments. For instance,

[1100:16]

let me go ahead and do this. In my

[1100:18]

terminal window here, I'm going to go

[1100:19]

back to my main directory. I'm going to

[1100:21]

create an empty file called mario.c

[1100:24]

that has nothing in it. And I'm going to

[1100:26]

go ahead in my chat window here and say,

[1100:28]

please implement a program in C that

[1100:32]

prints a left-aligned pyramid of bricks

[1100:37]

using hash symbols for bricks and use

[1100:41]

the CS50 library to ask the user for a

[1100:46]

non-negative height as an integer.

[1100:49]

Period. I dare say that's essentially

[1100:51]

the English description of what was for

[1100:53]

CS50 this year problem set one to

[1100:56]

implement a program called mario.c. This

[1100:58]

too is sort of doing its thing. It's

[1101:00]

using one reference. It's working. It

[1101:02]

knows as a hint that this file is called

[1101:04]

mario.c. And it's seen a lot of those in

[1101:06]

its training data over time. There's an

[1101:08]

English explanation of what I should do.

[1101:09]

And those CS50 students in the room

[1101:11]

probably recognize the sort of basic

[1101:13]

structure here of using a do-while loop to

[1101:15]

prompt the user for a height using the

[1101:16]

CS50 library which has been included.

[1101:18]

printing a left-aligned pyramid using

[1101:20]

some kind of loop and boom, we are done.
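For reference, the pyramid-building part of such a program might look like the minimal sketch below. It is not the code Copilot produced. To stay self-contained it leaves out the CS50 library and the prompting loop, and the helper name `build_pyramid` and its buffer-based interface are assumptions made for illustration.

```c
#include <stddef.h>
#include <string.h>

// Writes a left-aligned pyramid of the given height into buf, one row per
// line, using '#' for bricks. Returns the number of characters written
// (not counting the terminating NUL), or -1 if height is negative or buf
// is too small.
int build_pyramid(int height, char *buf, size_t size)
{
    if (height < 0 || size == 0)
    {
        return -1;
    }

    size_t used = 0;
    for (int row = 1; row <= height; row++)
    {
        // This row needs `row` bricks, a newline, and room for the final NUL
        if (used + (size_t) row + 2 > size)
        {
            return -1;
        }
        memset(buf + used, '#', (size_t) row);
        used += (size_t) row;
        buf[used++] = '\n';
    }
    buf[used] = '\0';
    return (int) used;
}
```

In the course's environment, a `main` function would typically call the CS50 library's `get_int` inside a do-while loop until the height is non-negative, then print the buffer with `printf`.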

[1101:22]

And these are fairly bite-sized problems

[1101:24]

as you'll see as you get to term's end

[1101:26]

with your final project, which is a

[1101:28]

fairly open-ended opportunity to apply

[1101:31]

your newfound knowledge and savvy with

[1101:32]

programming itself to a problem of

[1101:34]

interest. It will allow you to implement

[1101:36]

far grander projects, far greater

[1101:39]

projects than has been possible to date,

[1101:41]

certainly in just the few weeks we have

[1101:42]

to do it, because of this

[1101:44]

amplification of your own abilities. So

[1101:47]

with that promise, let's talk about how

[1101:50]

in the heck any of this is actually

[1101:51]

working. I clearly just generated a

[1101:53]

whole lot of stuff and that's how we

[1101:55]

began the story with the generation of

[1101:56]

those images and those two essays by

[1101:58]

kids. But what is generative artificial

[1102:00]

intelligence or really what is AI

[1102:02]

itself? And these are some of the

[1102:04]

underlying building blocks that aren't

[1102:05]

going anywhere anytime soon and indeed

[1102:07]

have led us as a progression to the

[1102:09]

capabilities you just saw. So spam, we

[1102:11]

sort of take for granted now that in our

[1102:12]

Gmail inboxes or Outlook inboxes, most

[1102:15]

of the spam just ends up in a folder.

[1102:17]

Well, there's not some human at

[1102:18]

Microsoft or Google sort of manually

[1102:20]

labeling the messages as they come in,

[1102:22]

deciding spam or not spam. They're

[1102:24]

figuring out using code and nowadays

[1102:26]

using AI that looks like spam and

[1102:28]

therefore I'm going to put it in the

[1102:30]

spam folder, which is probably correct

[1102:31]

99% of the time, but indeed there's

[1102:34]

potentially a failure rate. Other

[1102:36]

applications might include handwriting

[1102:38]

recognition. Certainly Microsoft and

[1102:39]

Google don't know the handwriting

[1102:40]

style of all of us here in this room,

[1102:42]

but it's been trained on enough other

[1102:44]

humans' handwriting styles that odds are

[1102:47]

your handwriting and mine look similar

[1102:49]

to someone else's. And so with very high

[1102:51]

probability, they could recognize

[1102:53]

something like Hello World here as

[1102:55]

indeed that same digital text. All of us

[1102:57]

are into streaming services nowadays,

[1102:59]

Netflix and the like. Well, they're

[1103:01]

getting pretty darn good at knowing if I

[1103:03]

watched X, I might also like Y. Why?

[1103:06]

Well, because of other things I've

[1103:08]

watched before and maybe upvoted and

[1103:10]

downvoted. Maybe because of other things

[1103:12]

people have watched who like similar

[1103:14]

movies or TV shows to me. So that too is

[1103:17]

AI. There's no if, else if, else if,

[1103:20]

else construct for every movie or TV

[1103:22]

show in their database. It's sort of

[1103:24]

figuring out much more organically,

[1103:26]

dynamically what you and I might like.

[1103:28]

And then all these voice assistants

[1103:29]

today, Siri, Alexa, Google Assistant,

[1103:31]

and the like. Those too don't recognize

[1103:33]

your voice or necessarily know what

[1103:35]

questions you're going to ask it.

[1103:36]

There's no massive if, else if that has

[1103:38]

all possible questions in the world just

[1103:40]

waiting for you or me to ask it. That

[1103:42]

too, of course, is dynamically

[1103:43]

generated. But that's getting a bit

[1103:45]

ahead of ourselves. Let's rewind in

[1103:47]

time. And some of the parents in the

[1103:48]

audience might remember this here game,

[1103:51]

among the first arcade games in the

[1103:52]

world, namely Pong. And so this was a

[1103:54]

black and white game whereby there's two

[1103:56]

players, a paddle on the left, a paddle

[1103:58]

on the right, and then using some kind

[1103:59]

of joystick or track ball, they can move

[1104:01]

their paddles up and down, and the goal

[1104:02]

is to bounce the ball back and forth and

[1104:04]

ideally catch it every time. Otherwise,

[1104:06]

you lose a point. This is just an

[1104:09]

animated GIF, so there's nothing really

[1104:11]

dramatic to watch. It's going to stay at

[1104:12]

15 against 12, just looping again and

[1104:15]

again. Nothing interesting is going to

[1104:16]

happen, but this is a nice example of a

[1104:18]

game that lends itself to solving it

[1104:21]

with code. And indeed, it's been in our

[1104:22]

vernacular for years to play against not

[1104:24]

just the computer, but the CPU, the

[1104:27]

central processing unit, or really the

[1104:29]

AI. And yet, AI does not need to be

[1104:31]

nearly as sophisticated as the tools we

[1104:33]

now see. For instance, here's a

[1104:35]

successor to Pong known as Breakout.

[1104:36]

Similar in spirit, but there's just one

[1104:38]

paddle and one ball, and the goal is to

[1104:40]

bounce the ball off of these colorful

[1104:42]

bricks, and you get more and more points

[1104:44]

depending on how high up you can get the

[1104:45]

ball. All of us as humans, even if

[1104:47]

you've never played this old school

[1104:48]

game, probably have an instinct as to

[1104:50]

where we should move the paddle. If the

[1104:52]

ball just left it going this way, which

[1104:54]

direction should I move the paddle? I

[1104:56]

mean, probably to the left. And indeed,

[1104:57]

that'll catch it on the way down. So,

[1104:59]

you and I just made a decision that's

[1105:01]

fairly instinctive, but it's been

[1105:03]

ingrained in us, but we could sort of

[1105:04]

take all the fun out of the game and

[1105:06]

start to quantify it or describe it a

[1105:09]

little more algorithmically, step by

[1105:11]

step. In fact, decision trees are a

[1105:13]

concept from economics, strategic

[1105:15]

thinking, computer science as well.

[1105:16]

That's one way of solving this problem

[1105:18]

in such a way that you will always play

[1105:21]

this game well if you just follow this

[1105:23]

algorithm. So, for instance, how might

[1105:25]

we implement code or a decision-making

[1105:28]

process for something like breakout?

[1105:30]

Well, you ask yourself first, is the

[1105:31]

ball to the left of the paddle? If so,

[1105:33]

you know where we're going, then go

[1105:35]

ahead and move the paddle left. But what

[1105:37]

if the answer were no? In fact, well,

[1105:39]

you don't just blindly move the paddle

[1105:41]

to the right, probably. What should you

[1105:42]

then ask?

[1105:44]

>> Are we right below the ball?

[1105:45]

>> Are you right below the ball? If the

[1105:46]

ball's coming right at you, you don't

[1105:48]

want to just naively go to the right and

[1105:49]

then risk missing it. So, there's

[1105:50]

another question to ask. Is the ball to

[1105:52]

the right of the paddle? And that's a

[1105:53]

yes-no question. If yes, well then okay,

[1105:56]

move it to the right. But if not, you

[1105:58]

should probably stay exactly where you

[1106:00]

are and don't move the paddle. All

[1106:02]

right, so that's fairly deterministic,

[1106:04]

if you will. And we can map it to

[1106:06]

code using pseudocode in, say, a class

[1106:09]

like CS50. We can say in a loop, well,

[1106:11]

while the game is ongoing, if the ball's

[1106:13]

to the left of the paddle, then move the

[1106:14]

paddle left. Else if the ball's to

[1106:16]

the right of the paddle, sorry for the

[1106:18]

typo there, move the paddle right.

[1106:20]

else just don't move the paddle. And so

[1106:22]

these decision trees, as we drew it,

[1106:24]

have a perfect mapping to code or really

[1106:26]

pseudocode in this particular case,

[1106:28]

which is to say that's how people who

[1106:30]

implemented the Breakout game or the

[1106:31]

Pong game, who implemented a computer

[1106:33]

player surely coded it up. It was as

[1106:35]

straightforward as that. But how about

[1106:37]

something like tic-tac-toe, which some

[1106:38]

of you might have played on the way in

[1106:40]

for just a moment on the scraps of paper

[1106:42]

that you might have had. Here we

[1106:45]

have a tic-tac-toe board with two O's

[1106:47]

and two X's. For those unfamiliar, this

[1106:49]

game tic-tac-toe, otherwise known as

[1106:51]

noughts and crosses, is a matter of

[1106:53]

going back and forth, X's and O's

[1106:54]

between two people. And the goal is to

[1106:56]

get three O's in a row or three X's in a

[1106:58]

row, either vertically, horizontally, or

[1107:00]

diagonally. So this is a game here in

[1107:02]

mid-progress. Well, let's consider how

[1107:04]

you could solve the game of tic-tac-toe

[1107:06]

like a computer, like an AI might.

[1107:08]

Well, you could ask yourself, can I get

[1107:09]

three in a row on this turn? Well, if

[1107:11]

yes, well, play in the square to get

[1107:13]

three in a row. It's as straightforward

[1107:14]

as that. If you can't, though, what

[1107:16]

should you ask? Well, can my opponent

[1107:18]

get three in a row on their next turn?

[1107:20]

Because if so, you should probably at

[1107:22]

least block their move next, so at least

[1107:24]

you don't lose now. But this game,

[1107:26]

tic-tac-toe, relatively simple as it

[1107:28]

is, gets a little harder to play when

[1107:31]

it's not obvious where you should go.

[1107:33]

Now, all of us as humans, if you grew up

[1107:34]

playing this game, probably had

[1107:36]

heuristics you used, like you really

[1107:37]

like the middle or you like the top

[1107:39]

corner or something like that. So, we

[1107:40]

probably can make our next move

[1107:43]

quickly, but is it optimal? And I dare

[1107:45]

say if back in childhood or more

[1107:47]

recently you've ever lost a game of

[1107:49]

tic-tac-toe like you're just bad at

[1107:51]

tic-tac-toe because logically there's no

[1107:53]

reason you should ever lose a game of

[1107:55]

tic-tac-toe if you're playing optimally.

[1107:57]

At worst you should force a tie but at

[1108:00]

best you should win the game. So think

[1108:02]

of that the next time you play

[1108:02]

tic-tac-toe and lose like you're doing

[1108:04]

something wrong. But in your defense

[1108:06]

it's because the question mark is sort

[1108:08]

of not obvious. like how do I answer it

[1108:10]

when the answer is not right in front of

[1108:12]

me to move for the win or move for the

[1108:14]

block? Well, one algorithm you could

[1108:16]

have been using all of these years is

[1108:17]

called minimax. And as the name suggests,

[1108:19]

it's all about minimizing something

[1108:21]

and/or maximizing something else. So here

[1108:23]

too, let's take a bit of fun out of the

[1108:25]

game and turn it into some math, but

[1108:26]

relatively simple math. So here we have

[1108:29]

three representative tic-tac-toe boards.

[1108:31]

O has won here, X has won here, and the

[1108:34]

middle is a tie. Doesn't matter how we

[1108:36]

score these boards, but we need a

[1108:37]

consistent system. So I'm going to

[1108:38]

propose that anytime O wins the score of

[1108:41]

the game is negative 1. Anytime X wins,

[1108:43]

the score of the game is a positive one.

[1108:45]

And anytime nobody wins, the score is

[1108:48]

zero. So at this point each of these

[1108:51]

boards has one of these values: negative 1, 0,

[1108:53]

and one. So the goal therefore in this

[1108:56]

game of tic-tac-toe now is for X to

[1108:58]

maximize its score because one is the

[1109:00]

biggest value available and O's goal in

[1109:02]

life is to minimize its score. So that's

[1109:04]

how we take the fun out of the game. We

[1109:05]

turn it into math where one player just

[1109:07]

wants to maximize, one player just wants

[1109:09]

to minimize their score. All right, so a

[1109:12]

quick sanity check here. Here's a

[1109:14]

board. It's not color-coded. What is the

[1109:16]

value of this board?

[1109:18]

>> One, because X has in fact won straight

[1109:22]

there down the middle. So X is one,

[1109:25]

O is negative one, otherwise zero for a tie. So

[1109:28]

now let's see how we go about with those

[1109:31]

principles in place figuring out where

[1109:33]

we should play in tic-tac-toe. Now,

[1109:35]

here's a fairly easy configuration.

[1109:36]

There's only two moves left. It's not

[1109:38]

hard to figure out how to win or tie

[1109:39]

this game. But let's use it

[1109:41]

for simplicity. It's O's turn, for

[1109:43]

instance. So, where can O go? Well, that

[1109:46]

invites the question, well, what is the

[1109:47]

value of the board? Or how do

[1109:49]

we minimize the value of the board for O

[1109:51]

to win? Well, O can go in one of two

[1109:53]

places, top left or bottom middle. Which

[1109:56]

way should O go? Well, if O goes in top

[1109:59]

left, we should consider what's the

[1110:01]

value of this board? Is it minimal?

[1110:03]

Well, let's see. If O goes here, X is

[1110:06]

obviously going to go here. X is

[1110:08]

therefore going to win. So the value of

[1110:09]

this board is going to be a one. Now

[1110:12]

since there's only one way logically to

[1110:14]

get from this configuration to this one,

[1110:16]

we might as well call the value of this

[1110:17]

board by transitivity one. And so O

[1110:19]

probably doesn't want to go there

[1110:21]

because that's a pretty maximal score

[1110:22]

and O wants to minimize. Over here

[1110:24]

though, if O goes bottom middle, well

[1110:26]

then X is going to go top left. And now

[1110:28]

no one has won. So the value of this

[1110:30]

board is thus

[1110:31]

>> Zero. We might as well treat this as

[1110:33]

zero because that's the only way to get

[1110:34]

there logically. So now O more

[1110:37]

mathematically and logically can decide

[1110:39]

do I want an end point of one or an end

[1110:42]

point of zero. Well zero is probably the

[1110:45]

better option because that's less than

[1110:47]

one and thus it's the minimal

[1110:48]

possibility. So O is going to go ahead

[1110:51]

in the bottom middle and at least force

[1110:53]

a tie. And so that's where you see

[1110:54]

evidence where if you humans are ever

[1110:56]

losing the game of tic-tac-toe, you have

[1110:58]

not followed that there logic. But you

[1111:00]

could probably do it if there's just two

[1111:02]

moves left. But the catch is, let's go

[1111:04]

ahead and sort of rewind to three moves

[1111:05]

left here. There are three blanks. And

[1111:07]

I've kind of zoomed out. The catch is

[1111:09]

that the decision tree gets a lot bigger

[1111:12]

the more and more moves that are left.

[1111:14]

It gets sort of bigger and bushier in

[1111:15]

that it's essentially doubling in size

[1111:18]

and width. And that's great if you have

[1111:19]

the luxury of writing it down on a piece

[1111:21]

of paper. But if you're doing this in

[1111:22]

your head while playing against a

[1111:23]

fifth grader, if I may, you're probably

[1111:25]

not drawing out all of the various

[1111:27]

boards and configurations, trying to

[1111:28]

play it optimally. You're going with

[1111:30]

some instinct. And your instincts might

[1111:32]

not be aligned with an algorithm that is

[1111:34]

tried-and-true minimax that will ideally

[1111:37]

get you to win the game, but at least

[1111:38]

will get you to force a tie if you can't

[1111:40]

win. But tic-tac-toe is not that hard. I

[1111:44]

mean, how many different ways are there

[1111:45]

to play tic-tac-toe? We could write a

[1111:47]

computer program to pretty much play

[1111:49]

tic-tac-toe optimally. We could use

[1111:52]

code like this. If the player is X, for

[1111:53]

each possible move, calculate the score

[1111:55]

for the board at that point in time and

[1111:57]

then choose the move with the highest

[1111:58]

score. So, you just try all

[1112:00]

possibilities mathematically and then

[1112:01]

you make the decision. Most of us in our

[1112:03]

heads are not doing that, but we could.
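The maximize-or-minimize rule can be sketched in C as follows. This is one possible implementation rather than the pseudocode shown on the slide. The board encoding (an array of nine cells holding 1 for X, -1 for O, and 0 for empty) and the function names `winner`, `board_full`, `minimax`, and `count_games` are assumptions chosen to match the scoring described above.

```c
#include <stdbool.h>

// The eight ways to get three in a row, as cell indices 0 through 8
static const int WIN_LINES[8][3] = {
    {0, 1, 2}, {3, 4, 5}, {6, 7, 8}, // rows
    {0, 3, 6}, {1, 4, 7}, {2, 5, 8}, // columns
    {0, 4, 8}, {2, 4, 6}             // diagonals
};

// Returns 1 if X has three in a row, -1 if O does, 0 otherwise
int winner(const int b[9])
{
    for (int i = 0; i < 8; i++)
    {
        int a = b[WIN_LINES[i][0]];
        if (a != 0 && a == b[WIN_LINES[i][1]] && a == b[WIN_LINES[i][2]])
        {
            return a;
        }
    }
    return 0;
}

// True when no empty cell remains
bool board_full(const int b[9])
{
    for (int i = 0; i < 9; i++)
    {
        if (b[i] == 0)
        {
            return false;
        }
    }
    return true;
}

// Value of the board with `player` (1 for X, -1 for O) to move,
// assuming both sides play optimally from here on
int minimax(int b[9], int player)
{
    int w = winner(b);
    if (w != 0 || board_full(b))
    {
        return w;
    }
    int best = (player == 1) ? -2 : 2; // worse than any real score
    for (int i = 0; i < 9; i++)
    {
        if (b[i] != 0)
        {
            continue;
        }
        b[i] = player; // try the move
        int score = minimax(b, -player);
        b[i] = 0;      // undo it
        if (player == 1 ? score > best : score < best)
        {
            best = score;
        }
    }
    return best;
}

// Counts every distinct sequence of moves that finishes the game
long count_games(int b[9], int player)
{
    if (winner(b) != 0 || board_full(b))
    {
        return 1;
    }
    long total = 0;
    for (int i = 0; i < 9; i++)
    {
        if (b[i] == 0)
        {
            b[i] = player;
            total += count_games(b, -player);
            b[i] = 0;
        }
    }
    return total;
}
```

Called on an empty board, `minimax` returns 0, which is the claim that optimal play never does worse than a tie, and `count_games` returns 255,168.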

[1112:04]

Else, if the player is O, essentially do

[1112:06]

the same thing, but choose the minimal

[1112:08]

possible score. So, that's the code for

[1112:10]

implementing tic-tac-toe. How many ways

[1112:12]

are there to play tic-tac-toe though?

[1112:14]

Well, 255,168,

[1112:17]

which means if we were to draw that

[1112:18]

tree, it would be pretty darn big and it

[1112:20]

would take you quite a bit of time to

[1112:22]

sort of think through all those

[1112:23]

possibilities. So, in your defense,

[1112:24]

you're maybe not that bad at

[1112:25]

tic-tac-toe. It's just harder than you

[1112:27]

thought as a game. But what about games

[1112:29]

with which we might as adults be more

[1112:31]

familiar? Well, what about the game of

[1112:32]

chess, which is often used as a measure

[1112:34]

of like how smart a computer is, whether

[1112:36]

it's Watson back in the day playing

[1112:37]

against it or something else? Well, if

[1112:39]

we consider even just the first four

[1112:40]

moves of chess, whereby I mean

[1112:43]

black goes and white goes, and then they

[1112:44]

each go three more times. So, four

[1112:46]

pair-wise moves. How many different ways

[1112:48]

are there to play chess? Well, it turns

[1112:51]

out 85 billion just to get the game

[1112:54]

started. And that's a lot of decisions

[1112:57]

to consider and then make. How about the

[1112:59]

game of Go, if familiar? Consider the

[1113:00]

first four moves: 266 quintillion

[1113:03]

possibilities. And this is where we sort

[1113:05]

of as humans and even with our modern

[1113:07]

PCs and Macs and phones kind of have to

[1113:10]

throw up our hands because I don't have

[1113:11]

this many bytes of memory in my

[1113:14]

computer. I don't have this many hours

[1113:16]

in my life left to actually crunch all

[1113:17]

of those numbers and figure out the

[1113:19]

solution. And so where AI comes in is

[1113:22]

where it's no longer as simple as just

[1113:23]

writing if-elses and loops, and no

[1113:26]

longer as simple as just trying all

[1113:27]

possibilities. You instead need to write

[1113:30]

code that doesn't solve the problem

[1113:31]

directly but in some sense indirectly.

[1113:33]

You write code so that the computer

[1113:36]

figures out how to win. Perhaps by

[1113:38]

showing it configurations of the board

[1113:40]

that are a good place to be in, that is,

[1113:42]

promising and maybe showing it boards

[1113:45]

that it doesn't want to find itself in

[1113:46]

the configuration of because that's

[1113:47]

going to lead it to lose. In other

[1113:49]

words, you train it but not necessarily

[1113:51]

as exhaustively. And this is what we mean

[1113:53]

nowadays by machine learning: writing

[1113:55]

code via which machines learn how to

[1113:58]

solve problems generally by being

[1114:00]

trained on massive amounts of data and

[1114:02]

then in new problems looking for

[1114:04]

patterns via which they can apply those

[1114:06]

past training data to the problem at

[1114:08]

hand. And reinforcement learning is one

[1114:10]

way to think about this. In

[1114:11]

fact, we as humans use reinforcement

[1114:14]

learning which is a type of machine

[1114:16]

learning, sort of all of the time. In

[1114:18]

fact, a fun demonstration to watch

[1114:21]

here involves these here pancakes.

[1114:23]

So, in fact, let me go ahead and pull up

[1114:24]

a short recording here of an actual

[1114:26]

researcher in a lab who's trying to

[1114:28]

teach a robot how to flip

[1114:31]

pancakes. So, we'll see here in this

[1114:33]

video that the robot has an arm

[1114:35]

that can go up, down, left, right. This,

[1114:36]

of course, is the human, the researcher,

[1114:38]

and he's just going to show the robot

[1114:40]

one or more times like how to flip a

[1114:41]

pancake

[1114:44]

and crosses his fingers and okay, seems

[1114:46]

to have done it well. Does it again. Not

[1114:48]

quite the same, but pretty good. And now

[1114:51]

he's going to let the robot just try to

[1114:54]

figure out how to flip that pancake

[1114:55]

after having just trained it a few

[1114:57]

different times. The first few times,

[1114:59]

odds are the robot's not going to do

[1115:01]

super well cuz it really doesn't

[1115:03]

understand what the human just did or

[1115:05]

what the whole purpose is. But, and

[1115:06]

here's the key detail with reinforcement

[1115:08]

learning. Behind the scenes, the human

[1115:11]

is probably rewarding the robot when it

[1115:14]

does a good job. Like, the better and better

[1115:15]

it flips, the more it gets rewarded as

[1115:17]

by like hitting a key and giving it a

[1115:18]

point, for instance, or giving it the

[1115:19]

digital equivalent of a cookie. Or

[1115:21]

conversely, every time the robot screws

[1115:23]

up and drops the pancake on the floor,

[1115:24]

sort of a proverbial slap on the wrist,

[1115:26]

a punishment so that it does less of

[1115:28]

that behavior the next time. And any of

[1115:30]

you who are parents, which by definition

[1115:32]

today, many of you are, odds are,

[1115:34]

whether it's not this or maybe just

[1115:36]

verbal approval or reprimands, you have

[1115:38]

probably trained children at some

[1115:40]

point to do more of one thing and less

[1115:42]

of another. And what you're seeing in

[1115:43]

the backdrop there is now just a

[1115:45]

quantization of the movements X, Y, and

[1115:47]

Z coordinates so that it can do more of

[1115:49]

the X's and the Y's and the Z's that led

[1115:51]

it to some kind of reward. And now after

[1115:54]

you're up to some 50 trials, the robot

[1115:56]

seems to be getting better and better

[1115:59]

such that like a good human, we'll see

[1116:00]

if I can do this without embarrassing

[1116:02]

myself, can flip the thing. That's

[1116:04]

pretty good. That was pretty I've been

[1116:05]

doing this a long time. Okay,

[1116:08]

so we've seen then how you might

[1116:12]

reinforce learning through that kind of

[1116:15]

domain. Let's take an example that's

[1116:16]

familiar to those of you who are gamers.

[1116:18]

Anytime you've played a game where

[1116:19]

there's some kind of map or a world that

[1116:21]

you need to explore up, down, left,

[1116:22]

right, maybe you're trying to get to the

[1116:23]

exit. So here simplistically is the

[1116:26]

player at the yellow dot. Here for

[1116:28]

instance in green is the exit of the map

[1116:30]

and you want to get to that point. And

[1116:32]

maybe somewhere else in this world

[1116:33]

there's a lot of lava pits, and you

[1116:35]

don't want to fall into the lava pit

[1116:36]

because you lose a life or you lose a

[1116:37]

point or there's some penalty or

[1116:39]

punishment associated with that. Well,

[1116:41]

we with this bird's eye view can

[1116:42]

obviously see how to get to the green

[1116:44]

dot. But if you're playing a game like

[1116:45]

Zelda or something like that, all you

[1116:47]

can do is move up, down, left, right,

[1116:48]

and sort of hope for the best. So, let's

[1116:50]

do just that. Suppose the yellow dot

[1116:52]

just randomly chooses a direction and

[1116:54]

goes to the right. Well, now we can sort

[1116:56]

of take away a life, take away a point

[1116:58]

or effectively punish it so that it

[1117:00]

knows don't do that. And so long as the

[1117:03]

player has a bit of memory, either

[1117:05]

the human player or the code that's

[1117:06]

implementing this just with a dark red

[1117:08]

line, that means don't do that again

[1117:10]

because that didn't lead to a good

[1117:11]

outcome. So maybe the next time the

[1117:13]

yellow dot goes this way and this way

[1117:14]

and then ah didn't realize that that's

[1117:16]

actually the same lava pit. But that's

[1117:17]

fine. Use a little bit more memory and

[1117:19]

remind me don't do that because I just

[1117:20]

lost a second life in this story and

[1117:22]

maybe it goes this way next time. Ah,

[1117:24]

now I need to remember don't do that.

[1117:26]

But effectively, I'm either being

[1117:28]

punished for doing the wrong thing. Ah,

[1117:30]

or as we'll soon see, being rewarded for

[1117:32]

doing more of the successful thing. And

[1117:34]

just by chance, maybe I finally make my

[1117:37]

way to the exit in this way. And so I

[1117:39]

can be rewarded for that. Now I got 100

[1117:41]

points or whatever it is, the high

[1117:42]

score. So now, as per these green lines,

[1117:44]

I can just follow that path again and

[1117:46]

again, and I can always win this game.

[1117:48]

kind of like me nowadays, like 30 years

[1117:50]

later, playing Super Mario Brothers

[1117:51]

because I can get through all the warp

[1117:52]

levels because I know where everything

[1117:53]

is because for some reason that's still

[1117:54]

stored in my brain. Is this the best way

[1117:57]

to play? Am I as good at Super Mario

[1117:59]

Brothers as I might think?

[1118:02]

What's bad about this solution? Yeah.

[1118:09]

>> Exactly. Yeah. I've moved many more

[1118:10]

times than I need to. And just for fun

[1118:12]

today, what grade are you in?

[1118:13]

>> Uh, seventh.

[1118:14]

>> Seventh grade. Wonderful. So now seventh

[1118:16]

grade observation is like exactly that

[1118:18]

that we could have taken a shorter path

[1118:20]

which is essentially that way, albeit

[1118:23]

making some straight moves. And so we're

[1118:25]

never going to find that shorter path.

[1118:27]

We're never going to get the highest

[1118:28]

score possible if I just keep naively

[1118:30]

following my well-trodden path. And so

[1118:32]

how do we break out of that mold? And

[1118:35]

you can see this even in the real world.

[1118:36]

Another sort of personal example is I'm

[1118:38]

the type of person for some reason where

[1118:39]

if I go to a restaurant for the first

[1118:40]

time, I choose a dish off the menu and I

[1118:42]

really like it. I will never again order

[1118:44]

anything else off that menu other than

[1118:46]

that dish because I know it is good. But

[1118:48]

there could be something even better on

[1118:50]

the menu, but I'm never going to explore

[1118:52]

that because I'm sort of fixed in my

[1118:54]

ways, as some of you from the smiles

[1118:55]

might be too. But what if we took

[1118:58]

advantage of exploring just a little

[1118:59]

bit? And there's this principle of

[1119:01]

exploring versus exploiting when it

[1119:03]

comes to using artificial intelligence

[1119:05]

to solve problems. Up until now, I've

[1119:07]

just been exploiting knowledge I already

[1119:08]

have. Don't go through the red walls. Do

[1119:10]

go through the green walls. Exploit,

[1119:12]

exploit, exploit. And I will get to a

[1119:14]

final solution. But what if I just

[1119:15]

sprinkle in a little bit of randomness

[1119:18]

along the way and maybe 10% of the time

[1119:20]

as represented by this epsilon variable,

[1119:22]

I as the computer in the story generate

[1119:25]

a random number between zero and one.

[1119:26]

And if it's less than that percent,

[1119:28]

which is going to happen 10% of the

[1119:29]

time, I'm going to make a random move

[1119:31]

instead of one that I know will get me

[1119:33]

closer to the exit. Otherwise, I'll

[1119:35]

indeed make the move with the highest

[1119:36]

value. Now, this isn't going to

[1119:37]

necessarily win me the game that first

[1119:39]

time, but if I play it enough and enough

[1119:41]

and enough and insert some of this

[1119:43]

randomness, I might very well find a

[1119:44]

better solution and therefore be a

[1119:47]

better player, a better winner overall.
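
The explore-versus-exploit rule described here can be sketched in a few lines of Python. This is only a minimal illustration, assuming the learned values for each move live in a dictionary; the names choose_move and estimated_values, and the numbers in it, are made up for the example and are not from the lecture's slides.

```python
import random


def choose_move(values, epsilon=0.1, rng=random):
    """Pick a move from a dict mapping move name -> estimated value.

    With probability epsilon, explore by picking a random move.
    Otherwise, exploit by picking the move with the highest value.
    """
    moves = list(values)
    # rng.random() returns a float in [0.0, 1.0), so this branch
    # runs roughly epsilon of the time (10% for epsilon=0.1).
    if rng.random() < epsilon:
        return rng.choice(moves)
    return max(moves, key=lambda move: values[move])


# Hypothetical values an agent might have learned for one maze square.
estimated_values = {"up": 0.2, "down": -1.0, "left": 0.7, "right": 0.1}

print(choose_move(estimated_values))
```

With epsilon set to 0 the function always exploits, which is the "same dish every time" behavior; raising epsilon trades some short-term reward for the chance of finding a better path.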

[1119:49]

If I just 10% of the time ordered

[1119:51]

something else off the menu, I might

[1119:52]

find that there's an amazing dish out

[1119:53]

there that otherwise I wouldn't have

[1119:55]

discovered. And so indeed using that

[1119:58]

approach can we finally find a more

[1120:00]

optimal path through the maze as was

[1120:02]

shorter there presumably therefore

[1120:04]

maximizing our score and doing even

[1120:06]

better than we might have by just

[1120:08]

exploiting the same knowledge. So you

[1120:11]

can see this even in the game of

[1120:12]

Breakout especially if you write a

[1120:14]

solution in code to play this game for

[1120:16]

you. Let me go ahead and pull up another

[1120:18]

video recording of an AI playing

[1120:20]

Breakout. And what this AI is doing is

[1120:22]

essentially figuring out maybe more

[1120:24]

intelligently than you or I could, how

[1120:25]

to play this game optimally. And what

[1120:28]

we'll see here is that just like uh the

[1120:32]

pancake flipping robot, there's some

[1120:34]

notion of scoring and rewards and

[1120:36]

penalties here. So like right now, the

[1120:38]

paddle is just doing random stuff. It

[1120:39]

doesn't really know how to play the game

[1120:41]

yet, but it realizes after 200 episodes

[1120:44]

that, oh, my score goes up if I hit the

[1120:47]

ball and it goes down equivalently if I

[1120:49]

miss it. And it's still a little

[1120:51]

twitchy. It doesn't quite understand

[1120:52]

what it's supposed to do and why. But if

[1120:54]

you do it again and again and again and

[1120:57]

it's rewarded and/or punished enough,

[1120:59]

you'll see that it starts to get pretty

[1121:02]

good and closer to what a good human

[1121:04]

might do. But here's where the algorithm

[1121:06]

gets a little creepy. If you let it play

[1121:08]

long enough, or if you and I, the humans

[1121:10]

play long enough, you might find a

[1121:11]

certain trick to the game. I dare say

[1121:13]

the AI becomes a bit scarily

[1121:15]

sentient in that, it turns out, if you're

[1121:18]

smart enough to break through that top

[1121:20]

row, you can let the game just play

[1121:22]

itself for you and maximize your score

[1121:23]

without even touching the ball.

[1121:25]

Something that I do find a little creepy

[1121:27]

that it just figured out how to do that

[1121:28]

without being told. But it's just a

[1121:30]

logical continuation of rewarding it for

[1121:32]

good behavior and punishing it for bad

[1121:36]

behavior. So that next time you have an

[1121:38]

occasion to play Breakout, consider that

[1121:40]

kind of strategy as opposed to doing

[1121:41]

more of the work yourself, let the

[1121:43]

computer do it for you instead. Well,

[1121:45]

what else is there to consider in this

[1121:47]

world of AI in the context of machine

[1121:49]

learning? Well, there's specifically a

[1121:52]

category of learning that's supervised.

[1121:54]

And we've been using this for years. And

[1121:55]

in fact, our first example of spam early

[1121:57]

on was certainly supervised. Why?

[1121:59]

Because it was you and I who were

[1122:00]

putting the email into the spam

[1122:02]

folder. To this day, maybe once a day, I

[1122:04]

hit the keyboard shortcut in Gmail to

[1122:06]

say, "Ah, this is spam. You should have

[1122:07]

caught this." And that is training

[1122:09]

Google's algorithm further, assuming

[1122:11]

it's not just little old me, but maybe

[1122:12]

thousands of people tagging that same

[1122:14]

kind of email as spam. That's supervised

[1122:16]

learning in that there's a human in the

[1122:18]

loop doing at least something. Um, so

[1122:20]

spam detection might be one of those.

[1122:22]

But the catch is that labeling data in

[1122:24]

that way manually just doesn't scale

[1122:26]

very well. That would be akin to having

[1122:27]

someone at Google or Microsoft labeling

[1122:29]

every email or someone at Netflix doing

[1122:31]

the same for all of the videos out

[1122:32]

there. It's expensive in terms of human

[1122:34]

power. And there's certainly problems

[1122:36]

out there with so much data. It's just

[1122:38]

not realistic for humans to label

[1122:39]

millions of pieces of data, billions of

[1122:41]

pieces of data. We've got to move to an

[1122:43]

unsupervised model. And so this is where

[1122:45]

the world starts to consider deep

[1122:48]

learning, solving problems using code

[1122:51]

whereby you don't even have humans in

[1122:52]

the loop in quite the same way. And

[1122:54]

neural networks inspired by the world of

[1122:56]

biology are sort of the inspiration for

[1122:58]

what is the state-of-the-art even

[1123:00]

underlying today's rubber duck and more

[1123:02]

generally these things called large

[1123:04]

language models like ChatGPT and the

[1123:06]

like. So here pictured somewhat

[1123:07]

abstractly is a neuron and it's

[1123:09]

something in the human body that

[1123:10]

transmits a signal say from left to

[1123:12]

right electrically and if you have

[1123:13]

multiple neurons you can

[1123:14]

intercommunicate among them so that if I

[1123:16]

think a thought uh then I know how to

[1123:18]

raise my hand because some kind of

[1123:19]

message electrically has gone from my

[1123:21]

head to this extremity here. So that's

[1123:23]

in essence what I remember from nth

[1123:25]

grade biology. But as computer

[1123:27]

scientists, we sort of abstract all of

[1123:28]

this away. So instead of calling these

[1123:30]

two neuron, drawing them as neurons,

[1123:32]

let's just start drawing neurons as

[1123:33]

these little circles. And if they have

[1123:35]

connective tissue between them of sorts,

[1123:37]

we'll just draw a straight line, an

[1123:39]

edge between them. So this is what a

[1123:40]

computer scientist would call a graph.

[1123:42]

If you have two such neurons over here

[1123:44]

leading to one neuron here,

[1123:47]

you can think of this as being like

[1123:48]

maybe two inputs to a problem and now

[1123:50]

one output there too. We can represent

[1123:53]

the notion of problem solving, which is

[1123:54]

what CS50 and intro courses more

[1123:56]

generally are all about. So let's solve

[1123:58]

a problem with a neural network without

[1124:00]

necessarily training it in advance, just

[1124:02]

letting it figure out how to answer this

[1124:04]

question. Here's a very simple

[1124:05]

two-dimensional world, XY grid, and here

[1124:07]

are two dots. And the dots in this world

[1124:09]

are either blue or they are red. But I

[1124:11]

have no idea yet what makes a dot blue

[1124:14]

or red. However, if you train me on

[1124:16]

those two dots, I bet I could come up

[1124:18]

with predictions, especially if you let

[1124:20]

me label this world in terms of x

[1124:22]

coordinates on the horizontal,

[1124:23]

y-coordinates on the vertical, and then

[1124:25]

you know what? We can think of this

[1124:26]

neural network very simply as

[1124:28]

representing the x coordinate here, the

[1124:30]

y-coordinate here, and the answer I want

[1124:31]

to get is quote unquote red or blue or

[1124:34]

zero or one or true or false, however

[1124:36]

you want to think of the representation.

[1124:38]

So, how do I get from a specific

[1124:39]

x-y coordinate to a prediction of color if

[1124:42]

I only know the coordinates? Well, up

[1124:44]

from the get-go, maybe the best I can do

[1124:46]

is just divide the world into blue dots

[1124:48]

on the left and red dots on the right. A

[1124:49]

best fit line, if you will, based on

[1124:51]

very minimal data. Of course, if you

[1124:53]

give me a third dot, it's going to be

[1124:54]

pretty easy to realize that I was a

[1124:57]

little too hasty. That line is not

[1124:59]

vertical. So, maybe we pivot the line

[1125:00]

this way. And now I'm back in business.

[1125:02]

Now, I can predict with higher

[1125:03]

probability based on XY what color the

[1125:06]

next dot will be. You give me enough of

[1125:07]

these dots, I can come up with a pretty

[1125:10]

good best fit line. It's not perfect,

[1125:12]

but here's a hint at why AI is not

[1125:14]

perfect, but 99% of the time, maybe I'll

[1125:16]

be able to predict correctly. And I can

[1125:18]

do even better if you let me squiggle

[1125:19]

the line a little bit and maybe make it

[1125:21]

more than just a simple uh slope. So,

[1125:23]

what is it we're really doing with

[1125:26]

implementing this neural network, albeit

[1125:28]

simplistically with just three neurons?

[1125:30]

Well, essentially, we're trying to come

[1125:31]

up with three values, three parameters,

[1125:34]

an A, a B, and a C. And what do those

[1125:36]

represent? Well, really just a solution

[1125:38]

to this formula. That line we drew

[1125:40]

can be represented if you think back to

[1125:42]

like high school math with a formula

[1125:44]

along these lines whereby it's a * x

[1125:47]

plus b * y plus some constant c and we

[1125:49]

can just arbitrarily conclude that if

[1125:51]

that value mathematically gives me a

[1125:53]

number greater than zero predict it's

[1125:54]

going to be blue otherwise predict it's

[1125:56]

going to be red we can sort of map our

[1125:58]

mathematics just like with tic-tac-toe

[1126:00]

to the actual problem we care about by

[1126:02]

defining the world in this way and so if

[1126:04]

you give me enough data points and

[1126:06]

enough data points I can come up with

[1126:08]

answers for that A, that B, that C. The

[1126:10]

so-called parameters in neural networks.
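
The rule just described, predict blue when a * x + b * y + c is greater than zero and red otherwise, can be written out directly. This is a minimal sketch rather than a trained network: the function name predict_color and the specific values chosen for a, b, and c are assumptions picked by hand so that the dividing line is vertical, with blue on the left and red on the right.

```python
def predict_color(x, y, a, b, c):
    """Classify a point using the line a*x + b*y + c = 0.

    Points where the expression is greater than zero are predicted
    blue; everything else is predicted red.
    """
    return "blue" if a * x + b * y + c > 0 else "red"


# Hand-picked parameters (not learned): the line x = 5,
# with blue to its left and red to its right.
a, b, c = -1.0, 0.0, 5.0

print(predict_color(2, 3, a, b, c))  # left of the line
print(predict_color(8, 3, a, b, c))  # right of the line
```

Training, in this picture, amounts to adjusting a, b, and c until the predictions agree with as many labeled dots as possible.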

[1126:12]

Now, in reality, neural networks are not

[1126:13]

composed of like three neurons and a

[1126:15]

couple of edges. They look a little

[1126:16]

something more like this. And in

[1126:18]

practice, they've got billions of these

[1126:20]

things here on the screen. In which

[1126:22]

case, pretty much every one of these

[1126:24]

edges represents some mathematical value

[1126:26]

that was contrived based on lots and

[1126:29]

lots of training data. And whereas I,

[1126:31]

the computer scientist, might know what

[1126:32]

these neurons over here represent

[1126:34]

because those are my inputs, three in

[1126:36]

this case. and I, the computer

[1126:37]

scientist, know what this one represents

[1126:39]

at the end. If you sort of took the hood

[1126:41]

off of this thing and looked inside the

[1126:42]

neural network, even though there'd be

[1126:44]

millions, billions of numbers going on

[1126:46]

there, I can't tell you what this neuron

[1126:48]

represents or why this edge has this uh

[1126:50]

weight. It's because of the massive

[1126:51]

amount of training data that that's just

[1126:53]

how the math works out. And if you feed

[1126:55]

me more data, I might change some of

[1126:57]

those parameters more. So the graph

[1126:59]

ultimately might look quite different,

[1127:00]

but my inputs and my outputs are going

[1127:02]

to be what I use to solve that

[1127:04]

problem. So if you want to predict like

[1127:06]

rainfall from humidity or pressure, you

[1127:07]

can have two inputs giving that one

[1127:09]

output. Uh advertising dollar spent in a

[1127:11]

given month that might predict sales by

[1127:12]

just having trained again on such

[1127:14]

volumes of data. And when we get now

[1127:16]

full circle to something like CS50's

[1127:18]

rubber duck and large language models

[1127:19]

like Claude and Gemini and ChatGPT, what's

[1127:22]

really happening and this is all hot off

[1127:24]

the press in recent years screenshotted

[1127:26]

here are some of the recent research

[1127:27]

papers that have driven a lot of this

[1127:29]

advancement in recent years. You have

[1127:31]

from OpenAI, say, a generative

[1127:32]

pre-trained transformer which is a lot

[1127:35]

to say, but there's the GPT in ChatGPT

[1127:39]

and essentially this is a neural network

[1127:41]

that's been trained on large volumes of

[1127:43]

textual information that gives us the

[1127:46]

interactive chat feature that we have in

[1127:47]

the class and we all have more generally

[1127:49]

in ChatGPT itself. So an example of what

[1127:53]

is actually happening underneath the

[1127:54]

hood of these GPTs. Well, here's a

[1127:57]

paragraph that up until recent years was

[1127:59]

kind of a hard paragraph to end with the

[1128:01]

dot dot dot. Uh, Massachusetts is a

[1128:04]

state in the New England region of the

[1128:05]

northeastern United States. It borders

[1128:06]

on the Atlantic Ocean to the east. The

[1128:08]

state's capital is dot dot dot. Now,

[1128:10]

most anyone living in Massachusetts

[1128:11]

probably knows that answer. But if this

[1128:13]

AI has just been trained on lots and

[1128:15]

lots of data, there's probably a lot of

[1128:17]

people who say Massachusetts in part of

[1128:19]

a sentence and then the answer, which I

[1128:21]

won't say yet, is in uh the other part

[1128:23]

of the sentence. But in this example,

[1128:25]

given that the question we're asking is

[1128:27]

sort of so far from some of the useful

[1128:30]

keywords up until recently, this was a

[1128:32]

hard problem to solve because there was

[1128:33]

so much distance. Moreover, there's

[1128:35]

these nouns that are being used to

[1128:36]

substitute for the proper noun. Like we

[1128:38]

suddenly start calling it a state, we

[1128:39]

call it a state down here. And it wasn't

[1128:41]

necessarily obvious to AIs that we're

[1128:43]

talking about the same thing as if it

[1128:46]

were just city, state, where you'd have

[1128:48]

much more proximity. So in a nutshell,

[1128:50]

what we now do especially to solve

[1128:52]

problems like these is we first break

[1128:54]

down a sentence or the training data or

[1128:56]

input alike into like an array or a list

[1128:59]

of the words themselves. We come up with

[1129:01]

a representation of each of these words.

[1129:04]

For instance, the word Massachusetts if

[1129:06]

you encode it in a certain way uh is

[1129:08]

going to be represented with an array or

[1129:09]

vector of numbers, floating-point values.

[1129:11]

So many so that the word Massachusetts

[1129:13]

in one model would use these 1536

[1129:16]

floating-point numbers to represent

[1129:18]

Massachusetts essentially in an

[1129:20]

n-dimensional space. So not just an XY

[1129:22]

plane but somewhere sort of virtually

[1129:24]

out there and then and this has been the

[1129:26]

key to these GPTs an attention is

[1129:29]

calculated based on all of that data

[1129:31]

whereby in this picture the thicker

[1129:33]

lines imply more of a relationship

[1129:35]

between those two words. So

[1129:36]

Massachusetts and state is inferred as

[1129:38]

having a thicker line, a higher

[1129:40]

attention from one word to the other.

[1129:42]

Whereas our a's and our is's and our

[1129:44]

the's have thinner lines because there's

[1129:45]

just not as much signal to the AI as to

[1129:48]

what the answer to this question is.

[1129:50]

Meanwhile, when you then feed that

[1129:52]

sentence like the state's capital is one

[1129:55]

word per neuron here, the goal is to get

[1129:58]

the answer to that question. And even

[1130:00]

here, this is way smaller of a

[1130:01]

representation than the actual neural

[1130:03]

network would be. But in effect, all

[1130:05]

these LLMs, large language models are

[1130:08]

just statistical models. Like what

[1130:10]

is the highest probability word that it

[1130:12]

should spit out at the end of this

[1130:13]

paragraph based on all of the Reddit

[1130:15]

posts and Google search results and

[1130:17]

encyclopedias and Wikipedias that it's

[1130:19]

found and trained on online? Well, the

[1130:21]

answer hopefully will be Boston. But of

[1130:24]

course, 1% of the time, maybe less than

[1130:26]

that, the answer might not be correct.
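
The idea of spitting out the highest-probability word can be illustrated with a toy example. The scores below are invented purely for illustration and do not come from any real model; softmax is used here as one common way of turning raw scores into probabilities, which is an assumption on top of what the lecture states.

```python
import math


def softmax(scores):
    """Convert a dict of word -> raw score into word -> probability."""
    highest = max(scores.values())
    # Subtracting the maximum keeps the exponentials numerically stable.
    exps = {word: math.exp(score - highest) for word, score in scores.items()}
    total = sum(exps.values())
    return {word: value / total for word, value in exps.items()}


# Made-up scores for candidate words to finish "The state's capital is ...".
made_up_scores = {"Boston": 6.0, "Springfield": 2.0, "Cambridge": 1.5}

probabilities = softmax(made_up_scores)
next_word = max(probabilities, key=probabilities.get)

print(next_word)
```

Because every candidate keeps some nonzero probability, a system that samples from this distribution instead of always taking the maximum will occasionally produce a wrong word.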

[1130:27]

And even CS50's own duck is fallible,

[1130:29]

even though we've written lots of code

[1130:31]

to try to put downward pressure on those

[1130:33]

mistakes. And those mistakes are what

[1130:34]

we'll call lastly hallucinations where

[1130:37]

the AI just makes something up perhaps

[1130:39]

because some crazy human on the internet

[1130:41]

made something up and it was interpreted

[1130:42]

as authoritative or just by bad luck

[1130:45]

because of a bit of that exploration 10%

[1130:47]

of the time 1% of the time the AI sort

[1130:49]

of veered this way in the large language

[1130:51]

model in the neural network and spit out

[1130:53]

an answer that just in fact is not

[1130:55]

correct. And so I thought I'd end for

[1130:57]

today on this final note, a poem with

[1130:58]

which many of us might have grown up

[1131:00]

from Shel Silverstein here about the

[1131:02]

homework machine, which years ago

[1131:03]

somehow sort of predicted the state we

[1131:05]

would be in with these AI machines. He

[1131:08]

said, "The homework machine, oh, the

[1131:09]

homework machine, most perfect

[1131:11]

contraption that's ever been seen. Just

[1131:13]

put in your homework, then drop in a

[1131:15]

dime, snap on the switch, and in 10

[1131:17]

seconds time, your homework comes out

[1131:18]

quick and clean as can be." Here it is.

[1131:21]

9 + 4, and the answer is three. Three.

[1131:25]

Oh, me. I guess it's not as perfect as I

[1131:27]

thought it would be. This then was CS50.

[1131:31]

See you next time.

[1132:51]

All right, this is CS50 and this is

[1132:55]

already week 8. And up until now, of

[1132:57]

course in so many of our problem sets

[1132:59]

like we've been writing command line

[1133:00]

code like a black and white terminal

[1133:02]

window and everything is very keyboard

[1133:03]

based very textual but of course like

[1133:05]

the apps that you and I are using like

[1133:07]

every day are in the form of a web

[1133:09]

browser and on our phone and so today

[1133:11]

and really for the rest of the semester

[1133:13]

we now transition to using all of the

[1133:14]

building blocks that we've been

[1133:15]

accumulating over the past few weeks but

[1133:17]

to redeploy them in the context of web

[1133:20]

apps and for your final project for

[1133:21]

instance if you so choose even mobile

[1133:23]

apps as well. So today we're going to

[1133:25]

understand how the internet that we use

[1133:27]

every day actually works. We're going to

[1133:28]

introduce you to a language called HTML

[1133:30]

which is the language in which web pages

[1133:32]

are written. A language called CSS which

[1133:34]

is the language with which web pages are

[1133:37]

stylized. And then lastly JavaScript

[1133:39]

which of those is the only actual

[1133:40]

programming language but even though

[1133:42]

we'll spend uh quite little time on it

[1133:44]

you'll see syntactically and

[1133:46]

functionally it's very similar to C to

[1133:48]

Python and languages indeed that have

[1133:50]

come before. All right. So we use the

[1133:52]

internet every day. So what exactly is

[1133:54]

it? Well, in the simplest form, like

[1133:56]

we've got networks in the world and

[1133:58]

networks are interconnections of

[1133:59]

computers, whether with wires or

[1134:01]

wirelessly. You have a network at home

[1134:03]

nowadays for the most part. You

[1134:05]

certainly have a network on a campus

[1134:06]

like this. In corporations, you have

[1134:08]

networks. So interconnections of

[1134:09]

computers. As soon as you start

[1134:11]

networking the networks, if not

[1134:13]

networking the networks of networks, you

[1134:15]

have in effect the internet. So this

[1134:17]

global interconnection of computers,

[1134:18]

servers, devices and so many other

[1134:20]

things literally nowadays that we take

[1134:23]

for granted every day. But how does it

[1134:24]

actually work and where did it come

[1134:26]

from? Well, if we rewind to like 1969,

[1134:28]

the internet in its original form was really

[1134:30]

something known as ARPANET, for the

[1134:32]

Advanced Research Projects Agency, a

[1134:34]

project from the Department of Defense

[1134:36]

that was really designed to interconnect

[1134:39]

what limited supercomputers we had back

[1134:41]

then that were otherwise geographically

[1134:43]

inaccessible to so many researchers and

[1134:45]

others. The internet or ARPANET really

[1134:47]

just looked like this with UCLA and just

[1134:49]

a few other nodes so to speak

[1134:51]

interconnected somehow. Uh just a year

[1134:54]

or so later did we have Harvard and MIT

[1134:56]

and others on the east coast. And if we

[1134:58]

fast forward now to today of course we

[1135:00]

can find and route data most anywhere in

[1135:02]

the world. And in fact the world is now

[1135:05]

filled with these things called routers.

[1135:07]

A router is just a computer, a server,

[1135:09]

that routes data up down left right

[1135:12]

geographically. And of course in the

[1135:13]

real world it might go out this wire

[1135:15]

here, out this wire here, out this wire

[1135:16]

or out this wire. And in fact, just to

[1135:18]

make more real what we're about to be

[1135:20]

talking about when we talk about

[1135:21]

networks of computers and eventually the

[1135:23]

internet, um we engaged some of our

[1135:25]

teaching fellows over the past few years

[1135:26]

to perform a little skit of sorts for

[1135:28]

us using uh Zoom, if you will, whereby

[1135:31]

each of the teaching fellows or humans

[1135:33]

you're about to see consider them as

[1135:35]

representing a router, a device on the

[1135:36]

internet whose purpose in life is to

[1135:38]

route data. And what they're routing is

[1135:40]

what we're going to start calling

[1135:41]

packets. packets of information which

[1135:42]

metaphorically you can think of as just

[1135:44]

like a little white envelope like this

[1135:45]

that we use to send things via snail

[1135:47]

mail via the US Postal Service or beyond

[1135:49]

that internationally. So I give you in

[1135:51]

just 60 seconds or so what it means to

[1135:54]

send a packet on the internet for

[1135:56]

instance from Phyllis in the bottom

[1135:58]

right hand corner to a familiar face

[1136:00]

Brian at top left. If we could dim the

[1136:02]

lights if only to be dramatic.

[1136:34]

Thank you. Sure, we can clap for that.

[1136:37]

And we actually should clap for that

[1136:39]

because you're seeing the sort of final

[1136:40]

version which looked kind of perfect,

[1136:42]

but they were all smiling and clapping

[1136:43]

because it took us so many damn takes to

[1136:45]

like actually get the coordination of

[1136:46]

that correct. But for now, assume that

[1136:48]

it was in fact correct. But notice

[1136:50]

what's among the takeaways from even

[1136:51]

that little skit is that the packet, the

[1136:54]

envelope from Phyllis to Brian could

[1136:55]

have taken any number of paths. It could

[1136:57]

have gone up and then to the left. It

[1136:58]

could have gone left and then up. It

[1136:59]

could have zigzagged and the like. And

[1137:01]

that's actually representative of how

[1137:02]

the world now looks because of so many

[1137:05]

wires and so many wireless connections.

[1137:07]

There's actually a lot of ways that data

[1137:08]

can travel from point A to point B. And

[1137:10]

it turns out it's not even necessarily

[1137:12]

going to be the shortest distance. It

[1137:14]

might be the least expensive

[1137:16]

distance, or perhaps just the result

[1137:18]

of how some humans or some

[1137:20]

servers have automatically configured

[1137:22]

the routes to get from point A

[1137:24]

to point B. So let's consider how the

[1137:27]

data is actually getting there. So long

[1137:28]

story short, all of those routers and

[1137:30]

indeed all devices on the internet

[1137:32]

including the ones in your pocket or on

[1137:34]

your laps speak a language, more

[1137:36]

technically a protocol nowadays known as

[1137:39]

TCP/IP. And this is actually a pair of

[1137:41]

protocols which is a set of conventions

[1137:43]

that governs how computers behave on the

[1137:45]

internet. In the human world, we have

[1137:46]

protocols as well. For instance, when I

[1137:48]

meet someone for the first time, I very

[1137:50]

often instinctively sort of extend my

[1137:51]

hand just sort of hoping that they too

[1137:53]

will extend their hand and shake. And

[1137:54]

that's a human protocol in that it

[1137:56]

governs how two people in that case

[1137:58]

intercommunicate. Well, servers have the

[1138:00]

same kinds of protocols, but it's all

[1138:02]

text-based or bit-based instead of, of

[1138:04]

course, physical. But TCP and IP

[1138:07]

are two different protocols that solve

[1138:09]

two different problems. And let's focus

[1138:10]

on the last of them first. So IP short

[1138:13]

for internet protocol is simply a

[1138:15]

protocol that decides to give all of us

[1138:19]

a unique address in the world. In other

[1138:21]

words, there are these things called IP

[1138:23]

addresses. It's a numeric address that

[1138:24]

literally every computer in the world

[1138:26]

has in order to uniquely identify it.

[1138:29]

Case in point, in the real world, we

[1138:30]

have addresses too. For instance, in

[1138:32]

this building here, Memorial Hall, we're

[1138:33]

at 45 Quincy Street, Cambridge,

[1138:35]

Massachusetts 02138 USA. And

[1138:38]

theoretically that unique identifier

[1138:40]

should get an envelope in the physical

[1138:41]

world to this location from any other in

[1138:44]

the real world. IP as applied to the

[1138:47]

internet just means that similarly do

[1138:49]

devices, Macs, PCs, phones, and

[1138:51]

everything else on the internet have a

[1138:53]

unique identifier as well known as an IP

[1138:55]

address. It's a number, but it's

[1138:57]

typically formatted in dotted decimal

[1138:59]

notation, so to speak. So it's something

[1139:01]

dot something dot something dot

[1139:02]

something. And just as a bit of trivia,

[1139:04]

each of these number signs represents a

[1139:07]

value from 0 to 255. So there are four

[1139:10]

such values apparently. And just doing

[1139:13]

some quick week zero math, if each of

[1139:14]

those values can be 0 to 255, how many

[1139:17]

bits is an IP address presumably?

[1139:22]

>> So eight bits per number. And how many

[1139:24]

was this?

[1139:26]

>> So 32 bits because if you're counting

[1139:27]

from 0 to 255, well that's 256 total

[1139:31]

possibilities. That's two to the eighth

[1139:32]

which means 8 bits. 8 bits. 8 bits. 8

[1139:34]

bits. So IP addresses are 32 bits.

[1139:36]

Little trivia that's germane only

[1139:38]

insofar as it does kind of limit how many

[1139:41]

total devices we could seem to have in

[1139:42]

the world. If you've got only 32 bits,

[1139:44]

how high can you count? Roughly

[1139:48]

>> two.

[1139:49]

>> So two to the 32nd power, which we've

[1139:51]

generally ballparked as 4 billion, which

[1139:53]

is to say you can have 4 billion devices

[1139:55]

total, it would seem on the internet,

[1139:57]

which is a big number. But there's also

[1139:58]

a lot of humans nowadays, and odds

[1140:00]

are most everyone in this room has at

[1140:01]

least two devices to their name. Maybe a

[1140:04]

phone and a laptop with which you're

[1140:05]

taking the course. Maybe even more

[1140:07]

devices thanks to the internet of things

[1140:09]

like smart home devices. We have so many

[1140:10]

IP addresses being assigned to things.
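
The arithmetic above can be checked with a short Python sketch. The helper name dotted_to_int and the sample address 1.2.3.4 are assumptions chosen for illustration; the cross-check uses Python's standard ipaddress module.

```python
import ipaddress


def dotted_to_int(address):
    """Pack a dotted-decimal IPv4 address into a single 32-bit integer."""
    parts = [int(part) for part in address.split(".")]
    if len(parts) != 4 or any(not 0 <= part <= 255 for part in parts):
        raise ValueError(f"not a valid IPv4 address: {address!r}")
    value = 0
    for part in parts:
        # Each of the four numbers occupies 8 bits, so shift left by 8
        # before adding the next one: 4 * 8 = 32 bits in total.
        value = (value << 8) | part
    return value


example = "1.2.3.4"
as_int = dotted_to_int(example)

# The standard library arrives at the same integer.
assert as_int == int(ipaddress.IPv4Address(example))

total_addresses = 2 ** 32  # 4,294,967,296, roughly 4 billion
print(as_int, total_addresses)
```

The same calculation with 128 bits, as in IPv6, gives 2 ** 128 possible addresses, which is why address exhaustion stops being a practical concern there.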

[1140:12]

So long story short, the world is

[1140:14]

gradually transitioning from this

[1140:16]

version here, IPv4,

[1140:18]

uh to IPv6, which instead of using 32

[1140:21]

bits is actually using 128 bits, which

[1140:23]

is crazy large and gives us more than

[1140:25]

enough IP addresses for the foreseeable

[1140:27]

future. To be fair, we've been talking

[1140:29]

about this for like 20, 30 years,

[1140:30]

transitioning from V4 to V6, and it's

[1140:32]

still gradually in motion. But for

[1140:34]

simplicity in the class and in general,

[1140:36]

we'll still use IPv4, if only because

[1140:38]

it's a little easier to wrap your mind

[1140:40]

around. Now, this is admittedly a pretty

[1140:42]

arcane diagram. But this is the diagram,

[1140:45]

ASCII art, if you will, that's in the

[1140:47]

official specification of what we mean

[1140:50]

by an IP datagram. More

[1140:52]

colloquially, this is what a packet

[1140:54]

actually looks like. Now, what are we

[1140:55]

looking at? Well, you're just looking at

[1140:57]

like a grid of bits. So this here

[1141:00]

represents 32 bits total where this is

[1141:02]

bit zero and that's bit 31 zero indexed

[1141:05]

all the way over there. And then each

[1141:07]

row represents 32 more bits. 32 more

[1141:09]

bits. 32 more bits. Which is to say

[1141:11]

anytime a computer like Phyllis sends an

[1141:14]

envelope of information on the internet.

[1141:16]

It contains at least this information. A

[1141:18]

whole bunch of bits broken down into

[1141:20]

bytes. Now, the only ones we'll really

[1141:22]

care about today are this one here,

[1141:24]

source address, which is to say when

[1141:25]

Phyllis sends that packet, she writes

[1141:27]

her source address, her IP address,

[1141:30]

something

[1141:31]

on the outside of the envelope, so to

[1141:33]

speak. And she also puts Brian's IP

[1141:35]

address, whatever that is, something

[1141:37]

else something else

[1141:39]

on the outside of the envelope as well.
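
To make those two fields concrete, here is a minimal Python sketch, not anything from the lecture's own materials. It assumes two illustrative addresses (5.6.7.8 for the sender, 1.2.3.4 for the recipient), and the function name pack_addresses is hypothetical. It only shows that each address occupies exactly 32 bits, one row of the grid.

```python
import ipaddress
import struct


def pack_addresses(source: str, destination: str) -> bytes:
    """Pack two dotted-decimal IPv4 addresses into 32 bits each."""
    src = int(ipaddress.IPv4Address(source))
    dst = int(ipaddress.IPv4Address(destination))
    # "!II" means network byte order, two unsigned 32-bit integers.
    return struct.pack("!II", src, dst)


# Illustrative addresses only (assumed for this sketch).
fields = pack_addresses("5.6.7.8", "1.2.3.4")
print(fields.hex())     # 0506070801020304
print(len(fields) * 8)  # 64 bits: 32 for the source, 32 for the destination
```

A real datagram carries many more header fields around these two; the sketch leaves them out on purpose.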

[1141:41]

There's a whole bunch of other bits

[1141:42]

involved which are useful, but we'll

[1141:44]

wave our hands at those for today. But

[1141:46]

that really speaks to what's actually

[1141:48]

happening. And if we do this

[1141:49]

metaphorically in the real world, it's

[1141:51]

kind of like taking out that envelope.

[1141:53]

And for instance, if Brian's IP address

[1141:56]

is 1.2.3.4

[1141:58]

for the sake of discussion, Phyllis in

[1142:00]

advance of our filming that bit would

[1142:02]

have written something like 1.2.3.4

[1142:06]

in the middle of the envelope, just like

[1142:07]

we would in the real world. But

[1142:09]

presumably, she wants Brian to be able

[1142:11]

to reply to acknowledge receipt or send

[1142:13]

his own message. So, she's also going to

[1142:15]

put her IP address, for instance, in the

[1142:17]

top left corner of the envelope,

[1142:20]

5.6.7.8

[1142:22]

for the sake of discussion, so that

[1142:23]

Brian knows when he writes out his own

[1142:25]

packet of information how to actually or

[1142:28]

to whom to reply. But at the end of the

[1142:31]

day, it's all just bits being sent in

[1142:33]

a specific pattern and there is formal

[1142:36]

documentation of the order in which

[1142:38]

all of those bits will actually be sent

[1142:40]

out on the wire or wirelessly. So in

[1142:43]

short, IP ensures that all of us have

[1142:45]

unique IP addresses via which data can

[1142:48]

go from us or to us. But that's only one

[1142:51]

problem. Nowadays, of course, servers

[1142:52]

can do so many other things. They can do

[1142:54]

email and chat and video conferencing,

[1142:57]

game servers, and who knows what. And it

[1142:59]

would be nice if a single server

[1143:01]

certainly could do multiple things. And

[1143:03]

in fact, that's very much the case.

[1143:04]

Single servers nowadays, and a server is

[1143:06]

just a term of art for a computer used

[1143:08]

to serve information to other people. By

[1143:10]

contrast, our laptops, our desktops are

[1143:12]

generally clients because they only

[1143:14]

serve one of us, not multiple people.

[1143:16]

But these are just terms of art.

[1143:18]

We're describing at the end of the day

[1143:19]

still computers. IP only ensures that we

[1143:22]

can uniquely address computers on the

[1143:23]

internet. But there's another protocol

[1143:25]

in TCP/IP, namely the TCP portion, that

[1143:29]

allows computers to uniquely identify

[1143:33]

services that they're offering to the

[1143:36]

rest of the world. So for instance, TCP

[1143:39]

allows a computer to

[1143:41]

distinguish whether it has received a

[1143:42]

packet that's an email or received a

[1143:44]

packet that's a chat message or a piece

[1143:46]

of a video conference or the like, which

[1143:48]

is to say there's more than just IP

[1143:50]

addresses on the outside of these

[1143:52]

envelopes. There are also what are

[1143:53]

called port numbers as well,

[1143:56]

similarly numeric values

[1143:59]

that are usually in the range of, like, zero

[1144:01]

on up into the low

[1144:03]

thousands and they're standardized. For

[1144:05]

instance, if you are requesting a web

[1144:07]

page using http://,

[1144:09]

with which all of us are

[1144:10]

presumably familiar, unbeknownst to you,

[1144:12]

on the outside of the virtual envelope

[1144:15]

that your computer subsequently sends is

[1144:17]

the port number 80. Because when the

[1144:19]

server receives that, it knows, oh, this

[1144:21]

human is requesting a web page and not,

[1144:23]

for instance, their email or something

[1144:25]

else. Or nowadays, if you're using HTTPS,

[1144:28]

where the S denotes secure in the URL

[1144:31]

you're actually using port 443 which is

[1144:33]

just an arbitrary number that a bunch of

[1144:34]

humans in a room decided on years ago to

[1144:37]

standardize what goes on the outside of

[1144:38]

an envelope. So just to be more clear

[1144:40]

then when Phyllis is sending a request

[1144:42]

to Brian and if Phyllis for instance is

[1144:44]

the client just a human using a computer

[1144:46]

and Brian in this story is now a web

[1144:48]

server better yet a secure web server

[1144:51]

that's somehow encrypting or scrambling

[1144:53]

the information to keep it secure well

[1144:55]

on the outside of this envelope after

[1144:57]

Brian's IP address, which was 1.2.3.4,

[1145:00]

Phyllis is also going to write the

[1145:03]

number 443 so that when Brian receives

[1145:06]

and opens this envelope, he knows what

[1145:08]

he's looking at. A request for a web

[1145:09]

page and not an email or a chat message

[1145:11]

or something else. Moreover, we can

[1145:13]

continue the story just a little bit

[1145:15]

further. Phyllis also writes on the

[1145:18]

envelope not only her IP address, 5.6.7.8,

[1145:21]

but some number as well in that top

[1145:24]

lefthand corner, whatever it happens to

[1145:26]

be, which is a port number via which

[1145:28]

Brian can reply to her. In this way,

[1145:31]

Phyllis can in effect have multiple tabs

[1145:33]

open, be using Zoom and some chat

[1145:36]

software or something else, running

[1145:38]

multiple programs on her computer, and

[1145:40]

the internet packets are all coming in,

[1145:42]

but her computer knows to which tabs or

[1145:45]

applications those packets belong. So,

[1145:48]

if you really want to geek out, here's

[1145:50]

what this thing looks like. This is just

[1145:51]

the sequencing of bits for TCP as well,

[1145:55]

which is to say, in addition to the

[1145:56]

dozens of bits we looked at a moment ago

[1145:58]

that standardize what IP is putting on

[1146:00]

the outside of the envelope, TCP is

[1146:03]

adding 16 bits that specify a port

[1146:06]

number, which means you can indeed have

[1146:07]

tens of thousands of possible port

[1146:09]

numbers, a destination port number, and

[1146:11]

a bunch of other stuff, including this

[1146:13]

so-called sequence number, which happens

[1146:15]

to be a 32-bit value, which is actually

[1146:18]

pretty important because quite often

[1146:20]

when sending messages on the internet,

[1146:22]

they're pretty large. And it would be

[1146:24]

nice if one person downloading a big

[1146:26]

image or one person downloading a movie

[1146:28]

or streaming a movie doesn't mean that

[1146:30]

no one else on the internet can do

[1146:32]

something else at that moment in time.

[1146:34]

So for the sake of discussion, suppose

[1146:35]

that this very happy cat here is a very

[1146:37]

large JPEG, for instance, a very large

[1146:39]

graphical file. It would be nice, let's

[1146:41]

say, that if Phyllis is trying to send

[1146:44]

or receive an image as large as this,

[1146:46]

it's not just in one massive envelope

[1146:48]

that's going to prevent a whole bunch of

[1146:49]

other users from similarly using the

[1146:51]

internet at that moment in time. So, at

[1146:53]

the risk of a bit of heresy, we can

[1146:55]

actually tear this cat in half and

[1146:59]

fragment it really. And then inside of

[1147:02]

Phyllis's envelope or equivalently

[1147:04]

Brian's reply depending on where this

[1147:06]

cat is coming from or going to part of

[1147:09]

that cat can go in this envelope. And

[1147:11]

now say in the bottom left hand corner

[1147:13]

of this envelope, Phyllis or Brian could

[1147:16]

write the sequence number in question.

[1147:19]

One out of four, two out of four, three

[1147:21]

out of four, four out of four. So that

[1147:22]

when this and hopefully the other

[1147:24]

packets arrive at their destination, the

[1147:26]

recipient's computer can check, okay,

[1147:28]

this was a really big file in this case.

[1147:30]

Do I have all of the parts? Yes, it can

[1147:32]

be inferred from the so-called sequence

[1147:35]

number which we've represented there in

[1147:36]

that memo field of the envelope. There's

[1147:38]

a bunch of other stuff that can go on

[1147:39]

here too, including prioritization of

[1147:42]

data as well. But ultimately TCP

[1147:45]

just allows servers to handle multiple

[1147:48]

types of services and also allows them to

[1147:51]

receive data reliably because if for

[1147:54]

instance a recipient only gets two out

[1147:57]

of the four packets or three out of the

[1147:59]

four packets, the fact that there's a

[1148:01]

sequence number involved is enough

[1148:02]

information for that recipient to say to

[1148:04]

the sender, hey, I'm missing one or two

[1148:06]

or three or more packets. Please resend

[1148:09]

them. So in short, TCP guarantees

[1148:11]

delivery by just doing some bookkeeping

[1148:14]

on the outside of these envelopes. So in

[1148:16]

short, IP allows us to uniquely identify

[1148:18]

computers and TCP guarantees delivery

[1148:21]

and allows us to multiplex so to speak

[1148:23]

among multiple services on the same

[1148:25]

device. Questions on this jargon

[1148:30]

thus far because today's filled with

[1148:31]

acronyms unfortunately.
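
As a rough illustration of that bookkeeping, here is a small Python sketch under stated assumptions: packets are modeled as hypothetical (number, total, payload) tuples in the "one out of four" style of the envelope demo, whereas real TCP sequence numbers count bytes. The names fragment, missing, reassemble, and service_for are made up for this example.

```python
# Toy model: each packet is a (sequence number, total, payload) tuple.
SERVICES = {80: "HTTP", 443: "HTTPS"}  # the two ports named in the lecture


def service_for(port: int) -> str:
    """Return the service a destination port number stands for."""
    return SERVICES.get(port, "unknown")


def fragment(data: bytes, parts: int) -> list[tuple[int, int, bytes]]:
    """Split data into roughly equal numbered pieces."""
    size = max(1, -(-len(data) // parts))  # ceiling division
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    return [(n, len(chunks), chunk) for n, chunk in enumerate(chunks, start=1)]


def missing(received: list[tuple[int, int, bytes]]) -> list[int]:
    """List the sequence numbers that have not arrived yet."""
    if not received:
        return []
    total = received[0][1]
    seen = {number for number, _, _ in received}
    return [n for n in range(1, total + 1) if n not in seen]


def reassemble(received: list[tuple[int, int, bytes]]) -> bytes:
    """Put the payloads back in order, whatever order they arrived in."""
    return b"".join(payload for _, _, payload in sorted(received))


cat = b"a very happy cat"
packets = fragment(cat, 4)
arrived = [packets[2], packets[0], packets[3]]  # out of order, one lost
print(missing(arrived))            # [2]
arrived.append(packets[1])         # the sender resends packet 2
print(reassemble(arrived) == cat)  # True
print(service_for(443))            # HTTPS
```

The point of the design is only that a number on each envelope is enough for the recipient to notice a gap and ask for a resend.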

[1148:34]

Questions on IP, TCP, or anything

[1148:38]

else? Okay, so seeing none, as

[1148:42]

promised, let's do yet another acronym.

[1148:44]

So, it would be pretty tedious if

[1148:46]

Phyllis and Brian and all of us humans

[1148:48]

had to actually write IP addresses into

[1148:50]

our browsers when visiting websites.

[1148:53]

And in fact, most of us never do that.

[1148:56]

Instead, we go to google.com or

[1148:58]

harvard.edu, actual domain names, so

[1149:01]

to speak, which are so much easier for

[1149:03]

us humans to remember than these

[1149:04]

arbitrary IP addresses that are either

[1149:06]

automatically assigned to computers or

[1149:08]

manually configured by humans

[1149:10]

configuring servers. But there's another

[1149:12]

acronym in the world and there's another

[1149:14]

technology used on the internet namely

[1149:15]

DNS, for Domain Name System, and this is

[1149:18]

just a certain type of server that every

[1149:20]

home has, even if you didn't know it,

[1149:23]

every campus has, every company has.

[1149:26]

There are so many DNS servers around the

[1149:28]

world but their purpose in life quite

[1149:29]

simply is to translate what you and I

[1149:31]

know as domain names like google.com,

[1149:34]

harvard.edu and the like into their

[1149:36]

corresponding IP addresses. And so in

[1149:39]

short, inside of these DNS servers is

[1149:41]

essentially a two-column table or

[1149:43]

spreadsheet, however you want to think

[1149:44]

about it, whereby here's all of the

[1149:46]

domain names in the world. Here are all

[1149:47]

of the corresponding IP addresses in the

[1149:49]

world. And so when your Mac or PC or

[1149:52]

phone being used by you is trying to

[1149:54]

access google.com or harvard.edu,

[1149:57]

that device, certainly when it's first

[1149:58]

booted up, has no idea

[1150:02]

what the IP address is for that server.

[1150:04]

It's not the case that Apple or Google

[1150:05]

are pre-installing billions of IP

[1150:08]

addresses inside of our devices. But

[1150:10]

your device is smart enough to ask the

[1150:12]

local network at home on campus or at

[1150:15]

work. Well, what is the IP address of

[1150:17]

google.com? What is the IP address of

[1150:18]

harvard.edu? Then what your Mac, PC, or

[1150:21]

phone actually does upon getting that

[1150:23]

answer from one of these local DNS

[1150:24]

servers is it writes the corresponding

[1150:27]

IP address on the outside of that

[1150:29]

envelope. So it's a wonderfully useful

[1150:31]

service that just makes the internet

[1150:33]

more useful for you and me to use because

[1150:35]

we can use names instead of IP addresses

[1150:37]

as well. Technically, these things are

[1150:39]

called fully qualified domain names.
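
The two-column table described above can be sketched in a few lines of Python. This is only an illustration: the addresses below are hypothetical, taken from ranges reserved for documentation, and are not the real addresses of google.com or harvard.edu. The names DNS_TABLE and resolve are likewise made up.

```python
from typing import Optional

# Hypothetical records: left column is the domain name, right column the IP.
DNS_TABLE = {
    "google.com": "192.0.2.10",
    "harvard.edu": "198.51.100.20",
}


def resolve(name: str) -> Optional[str]:
    """Look a name up in the table; return None if it is not listed."""
    # Domain names are case-insensitive, and a fully qualified
    # name may be written with a trailing dot.
    return DNS_TABLE.get(name.lower().rstrip("."))


print(resolve("harvard.edu"))  # 198.51.100.20
print(resolve("Google.com."))  # 192.0.2.10
print(resolve("example.org"))  # None
```

On a real machine the same question is put to an actual DNS server, for instance through Python's socket.gethostbyname, rather than to a dictionary in memory.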

[1150:41]

Where do they come from? Well, some of

[1150:42]

you might actually have your own

[1150:43]

personal website. You might have gone

[1150:44]

through this process. It's actually not

[1150:46]

that hard to get your own domain name.

[1150:47]

You can go to any number of what are

[1150:49]

called internet registrars and pay them

[1150:52]

some money, and it's essentially on a

[1150:54]

rental basis. So you rent a domain name

[1150:56]

for a year or maybe three or five years

[1150:58]

at a time and they can automatically

[1150:59]

bill you. The domain name might be as

[1151:01]

little as a dollar per year or thousands

[1151:04]

of dollars per year depending on whether

[1151:05]

someone has scooped it up and is maybe

[1151:07]

squatting or the like. But all you do

[1151:09]

ultimately is pay someone money and they

[1151:11]

give you the rights to use that domain

[1151:13]

name. And then what you do technically

[1151:14]

is you configure some DNS server

[1151:16]

somewhere in the world to know what the

[1151:19]

eventual IP address is for your server

[1151:22]

that's going to serve up your domain

[1151:24]

name's web pages. And long story short

[1151:27]

with DNS, I say that you have one in

[1151:29]

your home and on your work and on your

[1151:30]

campus because it's a very hierarchical

[1151:32]

kind of structure. Like, there are out

[1151:34]

there somewhere these so-called root

[1151:36]

servers that essentially know what all

[1151:38]

the IP addresses are of all of the

[1151:40]

.coms, for instance, or all of the .edus, or

[1151:43]

the like. But my Mac doesn't know that,

[1151:45]

and so my Mac might actually ask that

[1151:47]

root server, what is that IP address? But,

[1151:50]

more efficiently, my Mac is better

[1151:52]

still going to ask the local network

[1151:54]

first. When I'm at home, it asks my home

[1151:56]

DNS server which is built into the

[1151:58]

little home router that you've got

[1151:59]

somewhere in there, or if you're on

[1152:00]

campus, it asks Harvard's DNS server. And

[1152:03]

this whole design is recursive to borrow

[1152:05]

a term from a few weeks ago in that if

[1152:07]

my computer doesn't know the answer,

[1152:09]

what's the IP address for this domain?

[1152:11]

If Harvard doesn't know the answer, it

[1152:12]

eventually gets escalated to those

[1152:14]

so-called root servers, but then cached

[1152:16]

that is remembered by all of these other

[1152:18]

DNS servers along the way. So, it's a

[1152:20]

very elegant hierarchical design, but at

[1152:22]

the end of the day, it's just doing

[1152:23]

this. It's a big cheat sheet of domain

[1152:25]

names to IP addresses, and the server is

[1152:27]

responding for us. All right, one more

[1152:30]

acronym. So, how do I know what my Mac's

[1152:33]

IP address should be? How do I know what

[1152:35]

my phone's IP address should be? How

[1152:38]

do I know what the IP address is of the

[1152:40]

DNS server of whom I should be asking

[1152:42]

any of these questions? How do I know

[1152:43]

the IP address of the router to which to

[1152:45]

hand my data off? Like, there's a lot

[1152:47]

of assumptions built into the story

[1152:49]

we've been telling. And the answer is,

[1152:51]

unfortunately, yet another acronym,

[1152:53]

DHCP, is the solution to all of those

[1152:56]

problems. And it wasn't always. You

[1152:57]

know, back in my day, we used to have to

[1152:59]

manually type in what our computer's IP

[1153:01]

address was based on what some human

[1153:03]

told us it would be. We had to type in

[1153:05]

our DNS server, type in our router

[1153:07]

address. But now DHCP is just

[1153:11]

yet another server running in your home

[1153:13]

network, running on campus, running in

[1153:15]

your corporate network whose purpose in

[1153:16]

life is to answer questions of the form,

[1153:18]

what is my IP address? Which is to say,

[1153:21]

when you boot up your Mac, your PC, your

[1153:22]

phone for the first time, it essentially

[1153:24]

broadcasts a message like hello world,

[1153:26]

what's my IP address? And hopefully

[1153:28]

there's one such DHCP server on that

[1153:31]

local network wired or wirelessly that

[1153:34]

will respond based on how Harvard or

[1153:36]

Comcast or Verizon or someone at home

[1153:38]

has configured it to tell you what your

[1153:40]

device's IP address is, what the IP address is of

[1153:43]

your local router, what the IP address

[1153:44]

is or are of your DNS servers and the

[1153:47]

like. And so this is why things just

[1153:49]

work nowadays once you've connected to

[1153:51]

like a Wi-Fi network or physically

[1153:53]

plugged in. Dynamic Host Configuration

[1153:56]

Protocol didn't always exist. Wonderful

[1153:58]

that it now does. All right, enough sort

[1154:01]

of outside of the envelope stuff.
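
Before leaving the outside of the envelope, the DHCP exchange described a moment ago can be sketched as a toy Python class. Everything here is an assumption made for illustration: the private 192.168.1.0/29 network, the router and DNS addresses, and the names Lease and ToyDhcpServer. Real DHCP involves broadcasts, lease times, and renewals that this sketch ignores.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Lease:
    address: str                  # the device's own IP address
    router: str                   # where to hand packets off
    dns_servers: tuple[str, ...]  # whom to ask about domain names


class ToyDhcpServer:
    """Answers 'what's my IP address?' from a small pool of addresses."""

    def __init__(self, network: str, router: str, dns_servers: list[str]):
        hosts = ipaddress.ip_network(network).hosts()
        # The router's own address is never handed out to a device.
        self._free = [str(host) for host in hosts if str(host) != router]
        self._router = router
        self._dns = tuple(dns_servers)
        self._leases: dict[str, Lease] = {}

    def request(self, device_id: str) -> Lease:
        # A device that asks again gets the same answer as before.
        if device_id not in self._leases:
            if not self._free:
                raise RuntimeError("address pool exhausted")
            address = self._free.pop(0)
            self._leases[device_id] = Lease(address, self._router, self._dns)
        return self._leases[device_id]


server = ToyDhcpServer("192.168.1.0/29", "192.168.1.1", ["192.168.1.1"])
laptop = server.request("laptop")
phone = server.request("phone")
print(laptop.address, laptop.router, laptop.dns_servers)
print(phone.address)
```

One reply carries all three answers at once, which is why a device needs no manual setup after joining a network.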

[1154:03]

Everything else today will be a deeper

[1154:05]

dive into the inside of this envelope

[1154:08]

to look at what actually are the

[1154:10]

messages that we are sending, receiving,

[1154:12]

how are you structuring the web pages

[1154:14]

and designing everything that comes back

[1154:16]

from the server to the client. And let's

[1154:18]

dive in then to this acronym HTTP which

[1154:21]

you've been typing for years or seeing

[1154:23]

for years even though you don't really

[1154:24]

have to type it anymore because browsers

[1154:26]

just assume that this is what you want.

[1154:28]

But HTTP is another protocol, Hypertext

[1154:32]

Transfer Protocol, whose purpose in life

[1154:34]

is to request web pages and receive web

[1154:36]

pages. As a protocol, it just

[1154:38]

standardizes like what goes inside of

[1154:40]

that envelope when you're trying to use

[1154:42]

the web. There are different protocols

[1154:44]

for email, different protocols for Zoom,

[1154:46]

different protocols for Discord, and any

[1154:48]

number of other internet services. We'll

[1154:50]

focus predominantly today on HTTP, which

[1154:52]

happens to use ports 80 and 443, among

[1154:55]

others, as we saw. So let's see what

[1154:57]

HTTP is all about, or HTTPS, the

[1155:02]

corresponding secure version thereof. So

[1155:05]

here is a URL, a canonical URL, in that it

[1155:07]

has a whole bunch of components. Let's

[1155:09]

consider what some of the jargon is that

[1155:11]

we're going to start taking for granted.

[1155:12]

So if you go to https://www.example.com/,

[1155:17]

you are implicitly requesting the root

[1155:20]

of that website. Root just means the

[1155:23]

default directory, the default folder if

[1155:25]

you will. And that's what the yellow

[1155:26]

highlighted slash here just means like

[1155:28]

give me the default web page.

[1155:30]

Technically speaking, what you're going

[1155:31]

to receive in your browser, unbeknownst

[1155:33]

to you, is an actual file. By

[1155:35]

convention, it's a file called

[1155:36]

index.html,

[1155:38]

maybe index.htm, or any number of other

[1155:40]

files. But it would be pretty stupid if

[1155:43]

we as humans all had to type out the

[1155:45]

actual file name that we want. So the

[1155:47]

server by default is just going to

[1155:48]

return you the root of the website. If

[1155:51]

though you're inside of a folder or you

[1155:53]

do actually click on a link that leads

[1155:55]

you to a file, you might very well have

[1155:57]

at the end of this domain name a full

[1155:59]

path as well, which might contain zero

[1156:01]

or more folder names and

[1156:03]

zero or one file names as well.

[1156:06]

In fact, it could be explicitly

[1156:07]

/file.html or /folder/ or /folder/file.html.
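
Python's standard urllib.parse module can pull such a URL apart, which makes the pieces easy to see. The sketch below is illustrative only: the helper name describe is hypothetical, and it assumes that a path without a trailing slash ends in a file name, which a URL alone cannot actually guarantee.

```python
from urllib.parse import urlsplit


def describe(url: str):
    """Return (scheme, host, folder names, file name or None) for a URL."""
    parts = urlsplit(url)
    segments = [segment for segment in parts.path.split("/") if segment]
    if parts.path.endswith("/") or not segments:
        folders, filename = segments, None  # the path names a folder
    else:
        folders, filename = segments[:-1], segments[-1]
    return parts.scheme, parts.hostname, folders, filename


print(describe("https://www.example.com/"))
# ('https', 'www.example.com', [], None)
print(describe("https://www.example.com/folder/file.html"))
# ('https', 'www.example.com', ['folder'], 'file.html')
```

When the file name comes back as None, it is the server that decides which default file, by convention index.html, to send.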

[1156:12]

You've probably seen thousands of these

[1156:14]

over time, even if you haven't really

[1156:15]

given it much thought. So we today

[1156:18]

onward will be creating all of this

[1156:20]

stuff here but we need to understand

[1156:22]

what's going on to the left too. So here

[1156:24]

is the so-called domain name or more

[1156:26]

properly the fully qualified domain name

[1156:28]

and it has a few different parts too. So

[1156:30]

this is technically the domain name as

[1156:32]

we all refer to it: something.com.

[1156:36]

.com means commercial, and that com is more

[1156:38]

specifically known as a top-level domain, or

[1156:41]

TLD. Back in the day, there were only a

[1156:43]

few of these: .gov, .com, .net, .org,

[1156:46]

.edu, and a bunch of others. Now,

[1156:48]

there's hundreds, if not thousands of

[1156:50]

them. Many of them aren't really used

[1156:52]

prominently in the wild, but there are

[1156:53]

some not on that original list, like

[1156:55]

CS50 uses .io a lot, which doesn't mean

[1156:58]

input output. It's actually a two-letter

[1157:01]

country code that has been

[1157:04]

essentially rented to us and anyone else

[1157:06]

using that same TLD, because in the

[1157:08]

English-speaking world, .io actually

[1157:10]

sounds kind of cool. It kind of

[1157:11]

connotes, indeed, input and output. .tv is

[1157:14]

another one that actually belongs to a

[1157:16]

country but in fact also sounds like,

[1157:18]

in English, television, and so that too

[1157:20]

has been used as well but in general

[1157:22]

there are top-level domains like these.

[1157:24]

some of them now are full words some of

[1157:26]

them are two characters denoting they

[1157:27]

belong to a country they are the sort of

[1157:30]

top-level, indeed, categorization of

[1157:33]

all of these websites. Meanwhile, many

[1157:35]

URLs, but not all, also have something to

[1157:37]

the left of the domain name known as a

[1157:38]

host name which technically speaking

[1157:40]

refers to the name of the server that

[1157:43]

you're requesting specifically. It

[1157:44]

doesn't have to be literally one server.

[1157:46]

www can refer to dozens, hundreds,

[1157:49]

thousands of servers. Indeed, if you go

[1157:51]

to any popular website like gmail.com or

[1157:53]

the like. Even though you only have one

[1157:55]

domain name, somehow or other

[1157:57]

technologically it is referring to

[1157:59]

clusters of hundreds or thousands of

[1158:01]

servers that ensure that they can handle

[1158:03]

all of the customers that might visit

[1158:05]

that site. And then lastly, there's this

[1158:07]

the scheme or the protocol in use

[1158:09]

specifically. And for our discussion

[1158:10]

today, it's always going to be HTTPS,

[1158:12]

which is ideal because it's secure and

[1158:14]

encrypted somehow. But it can also

[1158:16]

be, indeed, http://. So that's it. Like,

[1158:20]

that's just the jargon with which you

[1158:22]

should be familiar when it comes to URLs

[1158:24]

like these. And what we'll be doing

[1158:26]

today is actually creating content that

[1158:28]

lives at URLs like that and serving it

[1158:31]

up to us. But what do the messages

[1158:35]

ultimately look like that are going

[1158:36]

inside of these envelopes? What the URLs

[1158:39]

are doing are just getting us to the

[1158:41]

right place. But how do we express in

[1158:44]

some form of code that we want this

[1158:46]

file from this server using encryption

[1158:50]

in this way? Well, inside of the virtual

[1158:52]

envelopes that Phyllis was sending to

[1158:53]

Brian and he would have ultimately sent

[1158:55]

back to her are messages that look like

[1158:58]

this: GET, POST, and a bunch of other

[1159:01]

verbs, if you will. So, HTTP supports a

[1159:04]

bunch of operations or verbs, namely

[1159:07]

GET, POST, and a few others. And it was

[1159:09]

the first of these that Phyllis

[1159:10]

would have put inside of her envelope

[1159:12]

initially in order to get a web page

[1159:14]

like a cat from Brian. Specifically,

[1159:17]

inside of the envelope, she would have

[1159:18]

had a textual message. It's not code per

[1159:20]

se. There's no functions or loops or

[1159:22]

variables or anything like that. It's a

[1159:24]

protocol just in the sense that humans

[1159:26]

years ago standardized what messages

[1159:29]

should appear inside of those envelopes

[1159:31]

if you want to get a web page from a

[1159:33]

server. So for instance, if Brian in

[1159:35]

this story is now suddenly harvard.edu,

[1159:37]

specifically www.harvard.edu,

[1159:39]

Phyllis's envelope would have contained

[1159:41]

a message saying GET in all caps, then /,

[1159:43]

if she just wants the root or the

[1159:45]

default page from Brian's server, the

[1159:47]

version of HTTP that she's using, for

[1159:49]

instance, version two. And she would

[1159:51]

also specify just in case Brian is

[1159:53]

multitasking and serving up websites for

[1159:55]

different domain names on the same

[1159:56]

physical box which actual host that she

[1159:59]

wants and maybe a bunch of other lines

[1160:01]

as well. And hopefully if all goes well,

[1160:04]

Brian would have responded with an

[1160:06]

envelope of his own containing an HTTP

[1160:08]

response in answer to her HTTP request.
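
As a sketch of what sits inside the request envelope, the Python below assembles the text just described: a GET line with a path and a version, then a Host line. It is an illustration under assumptions, not the lecture's code. The helper name build_get_request is hypothetical, and the sketch defaults to HTTP/1.1, where these lines literally travel as plain text, rather than the version two mentioned above, which frames the same information in binary.

```python
def build_get_request(host: str, path: str = "/", version: str = "HTTP/1.1") -> str:
    """Assemble the text of a minimal HTTP GET request."""
    lines = [
        f"GET {path} {version}",  # verb, what is wanted, protocol version
        f"Host: {host}",          # which site on this server is wanted
        "",                       # an empty line ends the headers
        "",
    ]
    # HTTP separates lines with a carriage return plus a line feed.
    return "\r\n".join(lines)


request = build_get_request("www.harvard.edu")
print(request)
```

Real browsers add many more header lines, but these two are enough for a server to know what is being asked of it.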

[1160:12]

And Brian's envelope would have

[1160:13]

contained a textual message that just

[1160:15]

confirms what version of HTTP he's

[1160:17]

using, a status code, which is an arcane

[1160:19]

number that just indicates in this case

[1160:21]

that everything is okay. All is well,

[1160:24]

and he would specify the type of content

[1160:26]

he's sending back to her in his own

[1160:27]

envelope because it could be HTML. More

[1160:30]

on that later today. It could be a

[1160:31]

JPEG, it could be a GIF, it could be any

[1160:33]

number of other file formats. And this

[1160:35]

is just a hint to Phyllis's browser as

[1160:37]

to what's going to be inside of that

[1160:39]

envelope she is getting back within her

[1160:41]

browser. And then maybe a bunch of other

[1160:43]

stuff as well. So even though some of

[1160:46]

these details like these underlying

[1160:48]

implementation details might visually be

[1160:50]

new to you if you've never really

[1160:52]

thought about it, turns out we as

[1160:54]

aspiring programmers can actually see

[1160:56]

and poke around with these building

[1160:58]

blocks and ultimately today take

[1161:00]

advantage of them. So you're about to

[1161:01]

see a program that's called curl which

[1161:03]

stands for Client for URL. It's installed

[1161:05]

in Linux systems like cs50.dev. It

[1161:08]

also comes with Macs and PCs quite

[1161:09]

frequently or you can easily install it.

[1161:11]

And essentially it's a headless browser

[1161:14]

that allows you to pretend to be a

[1161:15]

browser and grab the response from a

[1161:18]

server by pretending to send by actually

[1161:21]

sending the contents of an envelope like

[1161:24]

this. So for instance, if I want to

[1161:26]

pretend to be a browser and request

[1161:28]

harvard.edu, I can type this in my

[1161:31]

cs50.dev terminal window. And let me go

[1161:34]

ahead and maximize its size and do the

[1161:36]

following: curl -I, which specifically

[1161:39]

is only going to show me the headers,

[1161:40]

the text that we were just talking

[1161:42]

about. And it's not going to send any of

[1161:43]

the contents of Harvard's website. curl

[1161:46]

-I (capital I) https://www.harvard.edu/.

[1161:51]

So if I were typing this into a browser,

[1161:53]

I would actually see Harvard's homepage.

[1161:55]

In this case, I'm just going to see the

[1161:57]

contents of the envelope as black and

[1161:59]

white text on the screen. Specifically,

[1162:01]

only the first few lines, the so-called

[1162:03]

headers that the server is responding

[1162:05]

with, just as I claimed Brian would to

[1162:07]

Phyllis. I hit enter, and there's indeed

[1162:09]

more lines than I had in my slide, but

[1162:12]

you can see that everything is in fact

[1162:14]

200 OK. This is a convention: 200

[1162:16]

means all is indeed okay. There's a

[1162:18]

bunch of other information here,

[1162:20]

including the date and time in which

[1162:21]

this response came back. Here's that

[1162:23]

Content-Type, text/html, and then some

[1162:25]

other details and a whole bunch of other

[1162:27]

information as well. So that's one way

[1162:30]

of seeing what's going on underneath the

[1162:32]

hood. Well, what other responses might

[1162:35]

come back? Well, it turns out that 200

[1162:38]

OK is the best possible outcome, but

[1162:40]

there's a bunch of other

[1162:41]

outcomes that are possible as well. For

[1162:43]

instance, sometimes you'll get not 200

[1162:46]

but 301, which means moved permanently,

[1162:50]

colloquially speaking. And what

[1162:53]

does this mean? Well, if a server

[1162:54]

responds to a browser with a numeric

[1162:56]

code of 301, that means that the browser

[1162:59]

is supposed to go to this location

[1163:01]

instead. It's sort of like putting a

[1163:03]

detour sign on the server that says

[1163:05]

there's nothing for you here. Go over

[1163:07]

here to this location instead. And now

[1163:09]

notice in this example, it's telling the

[1163:12]

user to go to https://www.harvard.edu/.

[1163:16]

That's actually what I typed

[1163:18]

before so I would not have seen that

[1163:20]

myself but if I go back to VS Code here

[1163:23]

and let's run the exact same command but

[1163:25]

let's try to visit the insecure version

[1163:27]

of Harvard's website, http://, which

[1163:30]

just means that anyone else on the

[1163:32]

internet can technically see what it is

[1163:34]

I am now doing with my browser which

[1163:36]

might not be desirable. Enter. This time,

[1163:39]

Harvard's server does not just tell me 200

[1163:42]

OK; it actually says 301 Moved

[1163:44]

Permanently, and if I read lower in these

[1163:46]

lines there indeed is the location to

[1163:48]

which I should actually go and it's a

[1163:49]

subtle difference. It's forcing me to go

[1163:51]

to https instead without actually

[1163:54]

showing me the contents of Harvard's

[1163:55]

website. So nowadays you and I don't

[1163:57]

even have to think about this. You and I

[1163:58]

are not even in the habit surely of

[1164:00]

typing http://

[1164:02]

or https://.
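
What a client does with a reply like that 301 can be sketched in Python. The response text below is a hypothetical sample written for this example, not Harvard's actual output, and the function names are made up. The sketch reads the status line, collects the headers, and reports where a redirect points.

```python
from http import HTTPStatus

# Hypothetical sample of the first lines of a redirect response.
SAMPLE = (
    "HTTP/1.1 301 Moved Permanently\r\n"
    "Location: https://www.harvard.edu/\r\n"
    "\r\n"
)


def parse_response_head(raw: str):
    """Return (version, status code, reason, headers) from response text."""
    head = raw.split("\r\n\r\n", 1)[0]
    status_line, *header_lines = head.split("\r\n")
    parts = status_line.split(" ", 2)
    version, code = parts[0], int(parts[1])
    reason = parts[2] if len(parts) > 2 else ""  # newer versions omit it
    headers = {}
    for line in header_lines:
        if line:
            name, _, value = line.partition(":")
            headers[name.strip().lower()] = value.strip()
    return version, code, reason, headers


def redirect_target(code: int, headers: dict):
    """Return the Location to follow for a 3xx status, else None."""
    return headers.get("location") if 300 <= code < 400 else None


version, code, reason, headers = parse_response_head(SAMPLE)
print(code, HTTPStatus(code).phrase)   # 301 Moved Permanently
print(redirect_target(code, headers))  # https://www.harvard.edu/
```

A browser performs this step silently and then sends a second request to the new location.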

[1164:05]

But the browser is ensuring in this case

[1164:07]

that you are redirected so to speak

[1164:09]

automatically to the secure version of

[1164:12]

that site instead. Now there's other

[1164:14]

status codes and in fact even if you

[1164:16]

never realized it before now, what

[1164:17]

numeric code do you

[1164:19]

sometimes see on the internet when

[1164:20]

something goes wrong? 404. So 404 is a

[1164:24]

weirdly public, arcane error

[1164:27]

number or status code that just means

[1164:29]

file not found. And we can simulate this

[1164:32]

as follows. For instance if I in my

[1164:34]

terminal window do curl -I https://www.harvard.edu/cats,

[1164:41]

I'll suppose that Harvard has a whole

[1164:43]

department dedicated to cats, which it

[1164:44]

does not. But if I hit enter here,

[1164:46]

you'll see that I get an HTTP/2 404

[1164:50]

status code, which just means the

[1164:52]

website does not in fact exist. And if I

[1164:54]

visited https://www.harvard.edu/cats

[1164:58]

in my browser, I would presumably see

[1165:00]

some error page that may or may not show

[1165:02]

me visually 404. But many websites, most

[1165:05]

websites, for better or for worse,

[1165:07]

reveal this number. So much so that most

[1165:09]

everyone in this room is probably

[1165:11]

familiar with 404, even though its

[1165:13]

origin is this very low-level arcane

[1165:16]

status code buried in the HTTP headers

[1165:19]

inside of envelopes like these. There's

[1165:22]

a whole bunch of others if you'd like

[1165:24]

some fun facts. Uh 200 is indeed okay.

[1165:26]

301 is moved permanently. There's a

[1165:28]

bunch of other 300 ones that all relate

[1165:30]

to go elsewhere. Uh 400 generally means

[1165:33]

that you as the user have somehow done

[1165:35]

something wrong or next week as we start

[1165:37]

writing code that talks to web servers.

[1165:39]

Maybe your code has done something wrong

[1165:41]

when requesting a website. 500s are

[1165:44]

really bad. It means the server is

[1165:45]

messed up somehow. Either it's not

[1165:47]

available or the programmer made some

[1165:49]

bug in their code such that it's

[1165:51]

crashing with for instance something

[1165:52]

like an internal server error. Uh, we

[1165:54]

included 418, which is not actually a

[1165:56]

thing, but it was a fun uh um sort of

[1165:59]

April Fool's joke years ago where a

[1166:01]

bunch of uh humans thought it would be

[1166:02]

funny to write up a whole specification

[1166:04]

for what it means for a server to

[1166:05]

respond with a number of 418. Inside

[1166:09]

joke, not funny at the moment, but uh it

[1166:11]

is sort of part of internet lore

[1166:13]

nowadays. Um we can have a little bit of

[1166:16]

fun with this, maybe with the at the

[1166:17]

expense of our dear friends down the

[1166:19]

road. Um, for years now, someone has

[1166:22]

been paying for uh the following

[1166:25]

behavior. Let me go back to VS Code

[1166:28]

here in my terminal window. Let me do

[1166:30]

curl -I http://safetyschool.org.

[1166:36]

Have you ever been ever reply perhaps?

[1166:39]

Well, let me actually go to

[1166:40]

http://safetyschool.org

[1166:44]

and just for fun, hit enter. Oh my

[1166:47]

goodness, look at where we are. So, how

[1166:51]

is this implemented? Well, if I finish

[1166:53]

what I began over here by just looking

[1166:54]

at the HTTP headers inside of the

[1166:56]

envelope my actual browser just sent to

[1166:59]

safetyschool.org

[1167:01]

for like 20 years, presumably some

[1167:03]

Harvard alum has been paying the bill to

[1167:05]

rent this domain name just to have this

[1167:08]

trick implemented such that 301 Moved

[1167:10]

Permanently is directing people ever

[1167:12]

since to yale.edu. There's a bunch of

[1167:15]

others if you go down the rabbit hole of

[1167:16]

looking on Reddit and the like Stanford,

[1167:18]

Berkeley, there's a healthy competition

[1167:19]

on East Coast and West Coast, but it all

[1167:21]

boils down to very arcane understanding

[1167:23]

of how HTTP works, the protocol that

[1167:27]

governs how data is sent from web

[1167:30]

browsers to web servers. Now, you can of

[1167:33]

course use curl for connecting to URLs

[1167:35]

in the context of something like CS50.
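
The headers that curl prints are just lines of text, so they are easy to pick apart. What follows is a minimal Python sketch, assuming a hand-written sample response rather than one captured from a real server; the sample text and the helper names `parse_head` and `status_class` are made up for illustration.

```python
from http import HTTPStatus

# Hypothetical response headers, in the shape curl prints them.
SAMPLE = (
    "HTTP/1.1 301 Moved Permanently\r\n"
    "Location: https://www.harvard.edu/\r\n"
    "Content-Length: 0\r\n"
    "\r\n"
)


def parse_head(raw):
    """Split raw response headers into (status code, dict of headers)."""
    lines = raw.split("\r\n")
    # The status line looks like: HTTP/1.1 301 Moved Permanently
    code = int(lines[0].split(" ")[1])
    headers = {}
    for line in lines[1:]:
        if not line:
            break  # a blank line ends the headers
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return code, headers


def status_class(code):
    """Describe a status code by its first digit."""
    classes = {
        2: "success",
        3: "redirection (go elsewhere)",
        4: "client error (the request was wrong)",
        5: "server error (the server is broken)",
    }
    return classes.get(code // 100, "other")


if __name__ == "__main__":
    code, headers = parse_head(SAMPLE)
    print(code, HTTPStatus(code).phrase, "->", status_class(code))
    print("Go to:", headers.get("location"))
    for c in (200, 404, 500):
        print(c, HTTPStatus(c).phrase, "->", status_class(c))
```

The first digit of the code carries the broad meaning, and the Location header is what tells the browser where a 301 is pointing.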

[1167:38]

You could have been doing stuff

[1167:39]

like this all the time though with your

[1167:41]

actual browser. So, I'm using Chrome

[1167:43]

here, but most any browser nowadays has

[1167:45]

the ability to give you developer tools

[1167:49]

uh natively, which is to say somewhere

[1167:51]

there should be a menu option that

[1167:53]

lets you use developer tools that are

[1167:55]

conducive to someone who knows a bit of

[1167:56]

programming to poke around underneath

[1167:58]

the hood of the browser and see what's

[1168:00]

going on. For instance, I'm going to go

[1168:02]

ahead and open up a new window here, and

[1168:05]

I'm going to right-click on the

[1168:06]

background, or I can go to the

[1168:08]

appropriate menu in Chrome's dot dot dot

[1168:10]

menu, and I'm going to go to inspect,

[1168:12]

which pulls up what we're going to call

[1168:14]

developer tools. I'm doing it in incognito

[1168:17]

mode for reasons we'll see next week.

[1168:18]

This has the effect of clearing

[1168:20]

automatically any of my cookies, my

[1168:22]

browser history, because most anytime I

[1168:23]

do something with the web browser today,

[1168:25]

I want to pretend like I'm doing it for

[1168:26]

the very first time so that the behavior

[1168:28]

is exactly as we expect. So

[1168:31]

down here, now that I've opened up the

[1168:34]

so-called developer tools in Chrome, and

[1168:36]

they look almost the same in Safari and

[1168:37]

Edge and a bunch of other browsers as

[1168:39]

well, I will see a tab called elements,

[1168:41]

which shows me all of the elements of

[1168:43]

this web page once it appears, including

[1168:44]

the so-called HTML code we're about to

[1168:46]

write. I can see a console where error

[1168:49]

messages might sometimes appear, similar

[1168:50]

in spirit to the terminal window in VS

[1168:52]

Code. I can also see the network

[1168:54]

connections that the browser is making

[1168:56]

to the server. And that's where I

[1168:57]

thought we'd start our attention here.

[1168:59]

Here I have a brand new browser window.

[1169:01]

I'm clicking on network over here. Um,

[1169:04]

just to make sure we can see everything

[1169:05]

without it getting automatically

[1169:06]

deleted, I've clicked on preserve log

[1169:09]

and disable cache just so that it

[1169:11]

behaves exactly as expected. And now

[1169:13]

let's go up here for the first time in

[1169:15]

this incognito window and go to

[1169:17]

http://safetyschool.org.

[1169:21]

Enter. And you'll see a whole bunch of

[1169:25]

output including this warning in this

[1169:27]

particular mode. This is increasingly

[1169:29]

common nowadays for websites that do not

[1169:31]

support HTTPS, which this alum hasn't

[1169:33]

been paying for. Uh you'll get a warning

[1169:35]

typically that specifies you might not

[1169:37]

want to do this because the whole world,

[1169:39]

at least the whole world between you and

[1169:41]

point B, might know what it is you're uh

[1169:43]

accessing on the web. I can go ahead and

[1169:45]

pass through this. In fact, once I do

[1169:47]

that and click on connect to site, we'll

[1169:51]

see even more output at the bottom and a

[1169:53]

whole bunch of output that's kind of

[1169:54]

overwhelming. Notice at bottom left

[1169:56]

here, just going to safetyschool.org

[1169:59]

resulted in 61 HTTP requests, in effect,

[1170:02]

61 envelopes going back and forth. I'm

[1170:06]

going to focus though on the ones at the

[1170:07]

very top here, whereby when we finally

[1170:10]

click through that warning, and I got

[1170:13]

back a response from the server, having

[1170:15]

visited safetyschool.org, here is

[1170:17]

Chrome's presentation of the same

[1170:19]

information that curl was showing me in

[1170:21]

my terminal window. The message that

[1170:23]

came back was 301 Moved Permanently. The

[1170:25]

protocol or the verb being used was GET.

[1170:28]

There's some uh mentions of the IP

[1170:30]

address in question here and a whole

[1170:31]

bunch of other stuff that we'll wave our

[1170:33]

hands at for today. So all of this time

[1170:35]

you can see the same and let's try this

[1170:37]

with some cats. Let me click on the

[1170:39]

little ghostbuster symbol to clear

[1170:41]

everything uh down in the developer

[1170:43]

tools. Let me zoom out and this time let

[1170:45]

me go to https://www.harvard.edu/cats,

[1170:49]

which, recall, did not exist

[1170:52]

according to curl. If I hit enter, I do

[1170:54]

see a web page. It's interesting that

[1170:57]

Harvard has chosen to fairly arcanely

[1170:59]

reveal to all visitors 404, which means

[1171:02]

nothing except in so far as the status

[1171:04]

code. But if I scrolled through all of

[1171:07]

the 59 requests that were involved in

[1171:10]

just displaying this very graphical page

[1171:12]

and go back to the top, you'll see by

[1171:14]

clicking on the first row for cats

[1171:16]

itself that I used GET to get that

[1171:19]

URL/cats in the end and it was indeed

[1171:22]

404 not found. So you can sort of have

[1171:24]

all this fun on your own by just poking

[1171:26]

underneath the hood of what your browser

[1171:28]

has been hiding from you all of this

[1171:30]

time.

[1171:32]

All right. Any questions now before we

[1171:36]

dive in?

[1171:39]

No. All right. Well, that's the network

[1171:41]

tab. Let's look at some of the others

[1171:44]

and see how we can start writing the

[1171:45]

stuff ourselves. Let me go to stanford.edu.

[1171:48]

Enter. A whole bunch of things will fly

[1171:50]

across the screen, but this time I'm

[1171:51]

going to go to the elements tab. And

[1171:53]

what we're about to dive into is an

[1171:55]

actual language, not a programming

[1171:56]

language, a markup language called HTML,

[1171:59]

hypertext markup language, whose purpose

[1172:00]

in life is just to tell browsers what to

[1172:03]

display on the screen. So here is all of

[1172:05]

the so-called HTML that some human or

[1172:08]

humans or software at Stanford wrote in

[1172:10]

order to create Stanford's homepage,

[1172:12]

which as of today looks lovely like

[1172:14]

this. Uh the interesting thing though

[1172:16]

about the code that Stanford has written

[1172:18]

to generate this website is that it's

[1172:20]

being sent to me as a copy. And this is

[1172:22]

quite unlike the code we've been writing

[1172:24]

thus far. Um when you wrote code in

[1172:26]

Scratch, it was sort of there in the

[1172:28]

browser and stored on MIT's server. When

[1172:30]

you wrote C code and ran it, it was

[1172:32]

inside of the code space and not given

[1172:34]

to any user who might access it. The way

[1172:36]

the web works though is a little bit

[1172:37]

different. Inside of those envelopes are

[1172:40]

literally copies of what's on the server

[1172:42]

being sent to the browser. And so it's

[1172:44]

your browser, the so-called client,

[1172:46]

that's actually reading that code, HTML

[1172:49]

in this case, top to bottom, left to

[1172:50]

right, and figuring out how to display

[1172:52]

it. It's not executed on the server per

[1172:54]

se. Now, that story is going to change a

[1172:56]

bit next week when we start using Python

[1172:58]

to dynamically generate HTML so that

[1173:01]

we're not writing all of this code by

[1173:03]

hand after this week, but for now,

[1173:05]

everything you see was the result of the

[1173:08]

browser executing code that Stanford

[1173:10]

wrote. The implication of that is that

[1173:13]

we can have a bit of fun with these same

[1173:14]

developer tools. For instance, if I

[1173:17]

control-click or right-click on something

[1173:19]

like the word Stanford in the middle

[1173:21]

of their homepage, choose that

[1173:23]

same inspect option. What's nice about

[1173:25]

these developer tools is it's going to

[1173:27]

jump to the very line of code that

[1173:28]

created that Stanford brand name in the

[1173:31]

middle of the web page. And this is a

[1173:32]

wonderful teaching and learning tool

[1173:34]

because in the days to come when you're

[1173:35]

trying to learn more and more HTML, you

[1173:37]

can literally do this for any website on

[1173:39]

the internet and understand how it is

[1173:40]

someone implemented a design for

[1173:42]

instance that you really like and you

[1173:44]

can learn from other websites how

[1173:46]

they've constructed the same. So over

[1173:48]

here you'll see that the word Stanford

[1173:50]

is just in the source code of this page

[1173:52]

in the so-called HTML and you know just

[1173:54]

for fun I can change it to Harvard. Hit

[1173:56]

enter and now Stanford's website looks

[1173:58]

like we've been there um and rather

[1174:00]

hacked it. Of course, it's not that easy

[1174:02]

to hack Stanford's website. What have I

[1174:04]

presumably only done just now?

[1174:08]

I've changed my local copy of that

[1174:10]

particular website. So, if I just click

[1174:12]

on the reload icon, I'll actually see

[1174:15]

that Stanford's website, for better, for

[1174:17]

worse, still looks like that. But this

[1174:19]

speaks to now the control that we have

[1174:21]

within our browser to actually

[1174:22]

manipulate and learn from what it is

[1174:24]

that's going on underneath the hood. So,

[1174:28]

let's dive into this language called

[1174:29]

HTML, hypertext markup language. It's

[1174:31]

not a programming language, which means

[1174:33]

we're going to fly through it even

[1174:34]

quicker than usual because it really

[1174:35]

just contains some basic building blocks

[1174:38]

that do have some interesting

[1174:39]

intellectual design under them, but for

[1174:41]

the most part, it becomes an exercise

[1174:42]

ultimately in just looking up

[1174:44]

other tags that exist, reading the

[1174:45]

documentation, and figuring out how you can

[1174:47]

use them to do other features in

[1174:49]

websites. So, let's take a look at

[1174:50]

perhaps the simplest of web pages and

[1174:52]

specifically glean from them what tags

[1174:54]

are and what attributes are. really the

[1174:56]

only two terms of art that are going to

[1174:58]

be germane for this particular

[1174:59]

language. No loops, no conditionals, no

[1175:01]

variables, no complexity really other

[1175:03]

than basic building blocks like these.

[1175:05]

So here is HTML for the simplest of

[1175:08]

websites. This is like a mini version of

[1175:10]

what Stanford's uh team presumably wrote

[1175:12]

on their server, but it's only like a

[1175:14]

dozen lines of code instead of hundreds

[1175:16]

or thousands, however long that website

[1175:18]

was. Any web page written today,

[1175:20]

assuming it's using the latest version

[1175:21]

of HTML, which happens to be version

[1175:23]

five as of today, uh begins with code

[1175:26]

that looks like this. This kind of code

[1175:29]

will presumably be stored in a file

[1175:31]

called file.html,

[1175:33]

uh index.html, Stanford.html, whatever

[1175:36]

the file is actually named. This is

[1175:38]

simply what's going to be inside of the

[1175:39]

contents. You could save this file on

[1175:42]

your own Mac, open it up, and your

[1175:43]

browser would open it, but you're going

[1175:45]

to be the only one in the world that can

[1175:46]

actually see the contents of that web

[1175:48]

page if it's just on your Mac or just on

[1175:50]

your PC. So, we of course are going to

[1175:52]

be writing HTML on a server so that not

[1175:55]

just you, but in theory, especially for

[1175:57]

your final project, anyone in the world

[1175:59]

with an internet connection can access

[1176:00]

the same. So, we within the context of

[1176:03]

cs50.dev are going to start using

[1176:05]

this new command, http-server, whose

[1176:08]

purpose in life is just to serve up

[1176:10]

files via HTTP. Now, there's kind of an

[1176:13]

interesting design going on here because

[1176:15]

if we use cs50.dev,

[1176:19]

otherwise known as GitHub Codespaces,

[1176:21]

there's already a web server running on

[1176:24]

that website because when you go to

[1176:26]

cs50.dev and log in and get

[1176:28]

redirected to some longer URL. You're using

[1176:30]

a web application aka VS Code that

[1176:33]

allows you to write code in the cloud.

[1176:35]

Now, that application by default is

[1176:37]

running on port 80 and 443. So, it

[1176:40]

doesn't matter if you start at HTTP or

[1176:41]

HTTPS, both will work. But that means

[1176:44]

that your code that we write today and

[1176:47]

you write for the next problem set or

[1176:49]

for your final project can't live at

[1176:51]

port 80 or port 443 because GitHub, the

[1176:54]

company that hosts this, is already

[1176:56]

using those default standard ports. But

[1176:58]

we can use any number of other port

[1176:59]

numbers. I claimed earlier there's tens

[1177:01]

of thousands of numbers that we could

[1177:02]

use. So that's what we're actually going

[1177:04]

to do. So let me go back to VS Code

[1177:06]

here. Let me shrink down my terminal

[1177:09]

window. Let me create a first file today

[1177:12]

called for instance uh hello.html.

[1177:16]

Enter. And now I've got an empty tab as

[1177:18]

usual. I'm going to very quickly whip up

[1177:20]

the exact same contents that we just

[1177:22]

saw. So an angled bracket, an

[1177:23]

exclamation point, DOCTYPE html, then

[1177:26]

open bracket HTML, close bracket, and

[1177:29]

notice the autocomplete kicked in for

[1177:30]

this particular language. So I don't

[1177:32]

have to type everything myself. Inside

[1177:34]

of this tag, so to speak, I'm now going

[1177:36]

to put a head tag inside of which is

[1177:38]

going to be a title tag. I'm going to

[1177:40]

say something like hello title just to

[1177:41]

be quick. And then down here below those

[1177:44]

lines, I'm going to put a so-called body

[1177:46]

tag inside of which is hello body just

[1177:48]

for some quick text. And that's it. This

[1177:51]

is now a file inside of my code space.
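
For reference, here is roughly what that file contains, written as a small Python sketch that produces the page as text. The `lang="en"` attribute and the exact wording "hello, title" and "hello, body" are assumed from the slide and the narration, and `TEMPLATE` and `render_page` are illustrative names, not part of any course tooling.

```python
from html import escape

# The page from the lecture, with placeholders for the two pieces of text.
TEMPLATE = """<!DOCTYPE html>

<html lang="en">
    <head>
        <title>{title}</title>
    </head>
    <body>
        {body}
    </body>
</html>
"""


def render_page(title, body):
    """Return the text of a minimal HTML5 page."""
    # escape() turns characters like < and & into entities so that
    # plain text cannot be mistaken for tags.
    return TEMPLATE.format(title=escape(title), body=escape(body))


if __name__ == "__main__":
    print(render_page("hello, title", "hello, body"))
```

Saving the printed text as hello.html would give a file equivalent to the one typed by hand. Generating the text from code like this is only a hint of the dynamically generated HTML that the lecture says comes next week.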

[1177:54]

And there's no command to just compile

[1177:57]

or run this in the terminal because the

[1177:58]

goal is going to be to open this HTML

[1178:00]

file with a browser. If I want to do

[1178:02]

that in another browser tab, I need to

[1178:04]

tell my codespace to serve that

[1178:08]

file via HTTP. So, the simplest way to

[1178:11]

do this is as follows, http-server

[1178:14]

enter. You're going to see a whole bunch

[1178:15]

of text on the screen. You're going to

[1178:17]

see a green button hopefully pop up that

[1178:19]

says open in browser, which is going to

[1178:21]

allow you to open up, and I'll zoom in

[1178:23]

the contents of the current folder with

[1178:25]

a web browser. My URL has changed to be

[1178:28]

different from what it was a moment ago.

[1178:30]

I came in advance today with my own

[1178:32]

folder of code like we usually do.

[1178:33]

Source 8, which contains all of today's

[1178:35]

pre-made examples. But here is the file

[1178:37]

I just created a moment ago. And if I

[1178:40]

click on that hello.html,

[1178:42]

what we're looking at at the moment is

[1178:43]

just a directory listing, a directory

[1178:45]

index of all of the files in my codespace

[1178:47]

right now, I see the simplest of web

[1178:49]

pages. It's a little underwhelming, but

[1178:51]

clearly here's hello body, which takes

[1178:53]

up like 95% of the screen, the so-called

[1178:56]

viewport, which is just a big

[1178:57]

rectangular region of the screen, but

[1178:59]

there's the title in the tab up there.

[1179:01]

So, if you've ever wondered or cared

[1179:02]

like where does the content in a web

[1179:04]

page come from, well, here's the body

[1179:06]

content. Here's the head or the title

[1179:08]

content. And then everything else is

[1179:10]

just sort of icing on the cake. So, I've

[1179:12]

written at this point a file called

[1179:15]

hello.html.

[1179:16]

it has yielded this effect of having

[1179:18]

something in the

[1179:20]

head of the page and the body. But let's

[1179:22]

actually tease apart what just happened.

[1179:24]

So at the start of any file written in

[1179:26]

this language called HTML, the latest

[1179:27]

version thereof, five, it literally just

[1179:29]

starts with this. And this is just the

[1179:30]

kind of thing you memorize or copy

[1179:32]

paste. Uh open bracket exclamation point

[1179:35]

DOCTYPE html, close bracket, over there.

[1179:38]

It looks a little bit different because

[1179:39]

we're not going to use for the most part

[1179:41]

the exclamation point syntax anywhere

[1179:43]

else unless we're using an HTML comment.

[1179:45]

So HTML has comments just like Python, C

[1179:47]

and other languages. But let's focus

[1179:49]

really on this juicier part. Here we

[1179:52]

have what's known as an element in

[1179:55]

HTML. An element includes a start tag

[1179:58]

and an end tag or equivalently an open

[1180:00]

tag and a close tag. So here for

[1180:03]

instance is syntax that essentially is

[1180:05]

going to tell the browser when my

[1180:06]

browser reads this file top to bottom

[1180:08]

left to right hey browser here comes the

[1180:10]

HTML of my page and the language in

[1180:12]

which the contents of this page are

[1180:14]

written is English. So html, all

[1180:17]

lowercase is the name of the tag so to

[1180:20]

speak and equivalently the name of the

[1180:22]

element. Lang is what's going to be

[1180:23]

called an attribute which just modifies

[1180:25]

the default behavior of the uh element

[1180:28]

and quote unquote en is the value

[1180:30]

thereof which is the shorthand notation

[1180:32]

for English, and there are shorthand

[1180:33]

notations for most every human language

[1180:35]

as well. So you have a tag name and an

[1180:38]

attribute with a value. And we've seen

[1180:40]

these things so many times. These key

[1180:42]

value pairs in the context of

[1180:44]

dictionaries or hash tables or any number of

[1180:47]

other contexts. Key value pairs in HTML

[1180:49]

are separated by an equal sign with the

[1180:51]

value typically quoted in this way.

[1180:53]

Double quotes or single quotes but being

[1180:55]

consistent. Then notice at the end of

[1180:57]

this file as per the indentation,

[1180:59]

there's something symmetrically down

[1181:01]

here that has the effect of closing the

[1181:03]

tag or ending the tag. And this

[1181:05]

effectively tells the browser, "Hey

[1181:06]

browser, that's it for my HTML."
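
To see start tags, attributes, and end tags the way a program sees them, here is a minimal Python sketch using the standard library's `html.parser` module. The class name `TagLogger` and the helper `log_tags` are made up for illustration, and a real browser's parser is far more elaborate than this.

```python
from html.parser import HTMLParser


class TagLogger(HTMLParser):
    """Record every start tag, its attributes, and every end tag."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs, e.g. [("lang", "en")]
        self.events.append(("start", tag, dict(attrs)))

    def handle_endtag(self, tag):
        self.events.append(("end", tag, {}))


def log_tags(source):
    """Return the list of tag events found in an HTML string."""
    parser = TagLogger()
    parser.feed(source)
    parser.close()
    return parser.events


if __name__ == "__main__":
    for event in log_tags('<html lang="en"><head></head><body></body></html>'):
        print(event)
```

The attribute arrives as a key-value pair, much like an entry in a dictionary, and every start event is eventually matched by an end event for the same tag name.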

[1181:08]

Meanwhile, everything else follows the

[1181:10]

similar paradigm inside of those two

[1181:13]

tags. Here is a head tag that says, "Hey

[1181:15]

browser, here comes the head of my page.

[1181:16]

Hey browser, that's it for the head of

[1181:18]

the page. Hey browser, inside of the

[1181:20]

head, here comes the title, that's it

[1181:22]

for the title. Well, what is the title?

[1181:24]

Hello, title." Just as I wrote in my

[1181:26]

code space. Same story for body. Hey

[1181:28]

browser, here comes the body of the

[1181:29]

page. The 95% of the screen, that's it

[1181:32]

for the body. But what's in the body is

[1181:34]

exactly that. The indentation is nice

[1181:36]

and pretty printed. I've used four

[1181:37]

spaces as we commonly do. Not strictly

[1181:40]

necessary. In fact, in my own code

[1181:42]

space, I didn't even bother putting

[1181:44]

these on three separate lines. I just

[1181:45]

did one line. That's fine because as

[1181:47]

we'll see, browsers typically ignore

[1181:50]

whitespace. Uh but I've done it there

[1181:52]

as we often do just to ensure that

[1181:54]

things are pretty printed and therefore

[1181:56]

readable by us humans. Let me call your

[1181:58]

attention to one other thing on the

[1182:00]

screen. Up until now, before every

[1182:02]

lecture, I've been hiding a whole bunch

[1182:04]

of tabs in my terminal window. But

[1182:07]

today, I left enabled one that you've

[1182:08]

probably seen but not cared about

[1182:10]

before, namely ports. And it's under

[1182:12]

this ports tab that you can actually see

[1182:14]

a real incarnation of a TCP port. By

[1182:17]

default, when you run the command

[1182:19]

http-server, it serves up my current folder's

[1182:22]

content on its own web server, its own

[1182:24]

HTTP server, but not using the default

[1182:27]

port 80 or 443 because GitHub is already

[1182:30]

using those on CS50.dev and their

[1182:32]

product. But by default, we've chosen

[1182:35]

another common developer port number

[1182:37]

8080, which is interesting only in so

[1182:39]

far as it's 80 twice, but it's a human

[1182:41]

convention, but it could have been any

[1182:43]

number of thousands of other

[1182:44]

possibilities. But this line here is

[1182:46]

just telling me that I am

[1182:48]

apparently running a server on port

[1182:50]

8080. And if I click on there too, I can

[1182:52]

manually open the same tab. But that's

[1182:54]

what the green button was doing for me.

[1182:56]

It was informing me, hey, you've just

[1182:58]

started a web server on this port. Do

[1182:59]

you want to open a new tab with the

[1183:02]

contents thereof?
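
The idea of serving a folder on a non-default port can be tried with nothing but Python's standard library. This is a sketch that uses Python's own `http.server` module as a stand-in, not the http-server command from the lecture; the helper names `start_server` and `fetch_status` are made up for illustration, and the demo asks for port 0 so the operating system picks any free port instead of assuming 8080 is available.

```python
import http.client
import pathlib
import tempfile
import threading
from functools import partial
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def start_server(directory, port=8080):
    """Serve the files in `directory` over HTTP in a background thread."""
    handler = partial(SimpleHTTPRequestHandler, directory=directory)
    server = ThreadingHTTPServer(("127.0.0.1", port), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch_status(port, path):
    """Send a GET request to the local server and return the status code."""
    connection = http.client.HTTPConnection("127.0.0.1", port)
    connection.request("GET", path)
    status = connection.getresponse().status
    connection.close()
    return status


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        pathlib.Path(folder, "hello.html").write_text("hello, body")
        server = start_server(folder, port=0)  # 0 means any free port
        port = server.server_address[1]
        print(port, fetch_status(port, "/hello.html"))  # file exists
        print(port, fetch_status(port, "/cats"))        # file does not
        server.shutdown()
```

A request for the file that exists should come back with 200, and a request for a path that does not exist should come back with 404, the same codes seen earlier with curl.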

[1183:05]

So this is the picture we're now

[1183:08]

painting. Let me pull back up the code

[1183:10]

that we just wrote and let me propose

[1183:13]

that what we've really done is built a

[1183:15]

tree in the browser's memory. So we kind

[1183:17]

of have come full circle with week five

[1183:18]

when we talked about trees and other

[1183:20]

hierarchical structures. If we assume

[1183:22]

that the document can be represented

[1183:23]

with a node that looks a bit like an

[1183:24]

oval up here that just represents the

[1183:26]

whole contents of the file. Well, it

[1183:28]

starts with a single root element by

[1183:30]

convention, the HTML element. And your

[1183:32]

page can have only one of those

[1183:34]

elements. But the HTML tag inside of it

[1183:36]

can be a head tag and a body tag. And in

[1183:39]

this case, the head tag, recall, had a

[1183:40]

title tag as well as the actual text

[1183:43]

thereof, which was hello title.

[1183:44]

Meanwhile, the body had just the text

[1183:46]

thereof as well. And so when I keep

[1183:48]

saying that the browser is downloading

[1183:49]

the file, for instance, hello.html,

[1183:52]

reading it top to bottom, left to right.

[1183:53]

It's doing literally that, but somehow

[1183:55]

or other, it's using Maloc or whatever

[1183:57]

language it's written in to allocate

[1183:59]

node, node, node, node, node, and

[1184:02]

populating that tree in your browser's

[1184:05]

memory or RAM, a data structure quite

[1184:08]

like that. So, it's all sort of germane

[1184:10]

to where we've been before.
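
Here is one way that tree could be built, as a minimal Python sketch on top of the standard library's `html.parser`. It assumes well-formed input and a page matching the one described above; `Node`, `TreeBuilder`, `build_tree`, and `outline` are illustrative names, and real browsers use a much more forgiving parser.

```python
from html.parser import HTMLParser

HELLO = """<!DOCTYPE html>
<html lang="en">
    <head>
        <title>hello, title</title>
    </head>
    <body>
        hello, body
    </body>
</html>
"""


class Node:
    def __init__(self, name):
        self.name = name
        self.children = []


class TreeBuilder(HTMLParser):
    """Build a tree of Node objects from well-formed HTML."""

    def __init__(self):
        super().__init__()
        self.root = Node("document")
        self.stack = [self.root]  # path from the root to the open element

    def handle_starttag(self, tag, attrs):
        node = Node(tag)
        self.stack[-1].children.append(node)
        self.stack.append(node)

    def handle_endtag(self, tag):
        if len(self.stack) > 1:  # never pop the document node itself
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text:  # ignore whitespace used only for indentation
            self.stack[-1].children.append(Node(repr(text)))


def build_tree(source):
    builder = TreeBuilder()
    builder.feed(source)
    builder.close()
    return builder.root


def outline(node, depth=0):
    """Return the tree as a list of indented lines."""
    lines = ["    " * depth + node.name]
    for child in node.children:
        lines.extend(outline(child, depth + 1))
    return lines


if __name__ == "__main__":
    print("\n".join(outline(build_tree(HELLO))))
```

Each start tag allocates a node and pushes it onto a stack, and each end tag pops it, which is how reading top to bottom, left to right turns into a hierarchy with html at the root and head and body as its children.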

[1184:14]

Now, before we take, I think, a snack, are

[1184:18]

there any questions

[1184:20]

about what we've just seen?

[1184:23]

anything at all. Shouldn't have prefaced

[1184:25]

this with the only thing between us is

[1184:28]

uh these questions and snacks.

[1184:30]

No. All right, snack time. All right,

[1184:31]

see you in 10. Snacks.

[1184:36]

All right,

[1184:39]

so we are back and pretty much

[1184:41]

everything we do here on out will look

[1184:42]

structurally like this. And we're just

[1184:45]

going to introduce a few more tags and a

[1184:46]

few more attributes to give you a sense

[1184:47]

of some of the basic building blocks of

[1184:49]

most any website out there. And you'll

[1184:51]

find pretty quickly that it starts to

[1184:53]

get kind of tedious writing it out. In

[1184:55]

fact, I will resort to some copy paste

[1184:56]

today just to kind of speed things up.

[1184:58]

But this is going to motivate indeed

[1184:59]

next week when we reintroduce Python as

[1185:02]

well as SQL to actually automate

[1185:04]

generation of HTML as well. So all of

[1185:07]

today's websites and many of today's

[1185:08]

mobile apps are written in HTML. But

[1185:11]

people are decreasingly writing this

[1185:14]

kind of stuff by hand. Rather they are

[1185:15]

writing code that generates precisely

[1185:17]

what we're going to learn. So

[1185:18]

understanding the fundamentals will

[1185:20]

still be useful so we know what code to

[1185:21]

write next week and beyond. So let me go

[1185:24]

back into VS Code here. And what I'm

[1185:26]

going to go ahead and do is open up

[1185:27]

another terminal window so that I can

[1185:29]

leave http-server running in this first

[1185:31]

terminal window. And what I'm going to

[1185:33]

go ahead and propose that we do is

[1185:35]

implement a web page that has not just a

[1185:37]

single line of text, but maybe some

[1185:38]

paragraphs. So I'm going to call this

[1185:40]

paragraphs.html.

[1185:42]

That's going to open up a new tab. And

[1185:44]

here's where I'm going to save some

[1185:45]

time. I'm going to go back to hello.html

[1185:47]

and just highlight all and copy

[1185:49]

paste this as the beginning of this

[1185:50]

file. But what I'll start doing is just

[1185:52]

changing the title of each page to match

[1185:54]

the file name. So this is going to be my

[1185:56]

paragraphs example. And instead of

[1185:58]

saying just hello body, let's actually

[1185:59]

have a few paragraphs of text. Um I'd

[1186:01]

rather not waste time writing even full

[1186:03]

paragraphs of text. So let's actually

[1186:04]

open up the duck and let's log in and for

[1186:07]

instance just ask it for a quick

[1186:09]

helping hand here. Write three

[1186:11]

paragraphs about

[1186:14]

uh computer science. don't really care

[1186:17]

what the output is. All I want is some

[1186:18]

dynamically generated text to save me

[1186:20]

some keystrokes. And here we have an

[1186:22]

educational answer there, too. Even

[1186:24]

though all we really care about today is

[1186:26]

the fact that this is three chunks of

[1186:28]

text. Hopefully, that's all quite

[1186:30]

accurate. All right, I'm going to go

[1186:31]

ahead and highlight all of that. Go back

[1186:33]

into my paragraphs.html tab. Paste it

[1186:36]

inside of the body. It's so long, the

[1186:38]

paragraphs, that the text scrolls. I can

[1186:40]

at least clean this up slightly. I'm

[1186:42]

going to go ahead and just indent it

[1186:43]

twice just so that at least it's pretty

[1186:45]

printed inside of the body. And now I'm

[1186:47]

going to go back to my other tab which

[1186:49]

represents the contents of hello.html.

[1186:51]

I'm going to click back which is going

[1186:53]

to show me that same directory listing

[1186:54]

again which now has a new file

[1186:56]

paragraphs.html and I'm going to click

[1186:58]

it so as to see these three paragraphs

[1187:00]

of text.

[1187:03]

What looks wrong? Yeah,

[1187:05]

>> paragraphs.

[1187:06]

>> There's no paragraphs. It's just one big

[1187:07]

blob of text. It's the same text, but

[1187:10]

buried in there is the end of the first

[1187:12]

paragraph and the start of the next, and

[1187:14]

same for the third. So, what's going on?

[1187:16]

Well, apropos of my comment earlier

[1187:18]

about browsers not really caring about

[1187:20]

whitespace, you can put all the white

[1187:22]

space you want there. It's just going to

[1187:23]

ignore it in this particular case. All

[1187:25]

it's going to give me minimally is a

[1187:27]

single space between each of these

[1187:28]

paragraphs of text. So, HTML is very

[1187:30]

pedantic. Like, if you want there to be

[1187:32]

more paragraphs, you need to tell the

[1187:34]

browser, put a paragraph here, put a

[1187:36]

paragraph there. And the way to do this

[1187:38]

thankfully isn't all that hard. I'm

[1187:39]

going to go inside of the body here and

[1187:41]

I'm going to simply open a tag called

[1187:43]

uh P, or paragraph for short.

[1187:46]

Notice that VS Code in this particular

[1187:47]

case is a little annoying because it's

[1187:49]

trying to finish my thought, but it

[1187:50]

doesn't know that I already wrote this

[1187:52]

text. So, I'm just going to delete what

[1187:53]

it automatically generated. And then I'm

[1187:55]

going to manually indent this. And I'm

[1187:57]

going to do the same thing again for the

[1187:58]

other paragraphs. Up here, I'm going to

[1188:00]

open the paragraph tag. I'm going to

[1188:01]

delete temporarily the close tag so that

[1188:04]

I can actually put it below that chunk

[1188:06]

of text here. Indent this and then down

[1188:10]

here. And this would have been easier if

[1188:11]

I just did it right the first time. I'm

[1188:13]

going to do the same thing with the

[1188:14]

third and final paragraph. So now what

[1188:17]

we in effect have three times in a row

[1188:18]

is hey browser here comes a paragraph

[1188:20]

then the first paragraph. Hey browser

[1188:22]

that's it for the paragraph. Hey browser

[1188:24]

here comes a paragraph that's it for the

[1188:25]

paragraph. Hey browser, here comes a

[1188:26]

paragraph. So, three times in total with

[1188:28]

open, close, open, close, open, close.

[1188:31]

Now, if I go back to the browser,

[1188:32]

nothing appears to have changed yet, but

[1188:33]

that's cuz I'm looking at a copy that

[1188:35]

was downloaded a moment ago in that

[1188:37]

virtual envelope. So, this is why, among

[1188:39]

other reasons, we hit reload on web

[1188:41]

pages to get the latest version. And

[1188:43]

voila, now we have three actual

[1188:45]

paragraphs. Um, the white space is

[1188:47]

inserted automatically by the browser,

[1188:48]

but it's at least prettier to the eye

[1188:51]

now. So, that then is the paragraph tag.
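
As a rough sketch of where paragraphs.html ends up: the three placeholder sentences below are assumptions standing in for the generated text, and the surrounding boilerplate (including the lang attribute) is assumed to follow hello.html.

```html
<!DOCTYPE html>
<html lang="en">
    <head>
        <title>paragraphs</title>
    </head>
    <body>
        <!-- Line breaks and indentation in the source are collapsed by the browser; -->
        <!-- only the <p> ... </p> pairs tell it where each paragraph starts and ends. -->
        <p>
            Placeholder text for the first paragraph.
        </p>
        <p>
            Placeholder text for the second paragraph.
        </p>
        <p>
            Placeholder text for the third paragraph.
        </p>
    </body>
</html>
```

Removing the three p tag pairs from this file should reproduce the single blob of text seen before the fix.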

[1188:54]

So, useful, of course, if we have

[1188:55]

paragraphs of text. What are some other

[1188:57]

tags we might introduce? Well, maybe

[1188:58]

you're writing a paper or a blog post or

[1189:01]

the like. It's pretty typical to want

[1189:02]

headings of sections of the page. Maybe

[1189:04]

chapters and then sections and then

[1189:06]

subsections or the like. HTML can help

[1189:08]

with this too. So, let me go into my

[1189:10]

terminal window again, create a file

[1189:13]

called how about uh let's call it

[1189:15]

headings.html.

[1189:17]

And then in this file, let me similarly

[1189:20]

go back to hello.html, copy paste it

[1189:23]

into headings. I'm going to close

[1189:24]

paragraphs because we're done with that.

[1189:25]

And I'm just going to change the title

[1189:27]

now to headings. And inside of the body

[1189:29]

here, what I'm going to go ahead and do

[1189:32]

is uh you know, it would have been nice

[1189:33]

to have some of that same text. So

[1189:34]

let me go back one step. Let me grab the

[1189:37]

paragraphs and paste that into this new

[1189:40]

file. Let me rename it to headings to

[1189:42]

make clear which file we're in. And now

[1189:44]

let me go ahead and propose that

[1189:46]

wouldn't it be nice if I made clear that

[1189:47]

this is the first paragraph. So I'm

[1189:49]

going to use the H1 tag, which is the

[1189:50]

heading one tag. And I'm just going to

[1189:52]

say one for the sake of discussion. And

[1189:54]

down here, I'm going to say H2 and say

[1189:56]

two for the sake of discussion. And down

[1189:58]

here, H3 and three, because I don't really care

[1190:00]

what these things are called. Just want

[1190:01]

to demonstrate the functionality. If I

[1190:03]

go back to my other tab now, back to the

[1190:06]

directory listing, there's my brand new

[1190:07]

file headings.html. And it's the same

[1190:09]

paragraphs, but now you have some big

[1190:11]

bold text that looks reminiscent of the

[1190:13]

chapter heading, the section heading,

[1190:14]

the subsection heading, and the like. Or

[1190:16]

that you might see on a news site or a

[1190:18]

blog site or the like. So you've got H1

[1190:21]

through H6 from biggest and boldest to

[1190:24]

uh smaller but still bold. And the

[1190:26]

browser decides on all of those settings

[1190:27]

for us. But it also makes some semantic

[1190:31]

clarity to me that probably the most

[1190:32]

important thing on the page at least to

[1190:34]

begin with is that H1 tag and then

[1190:36]

everything else is like supporting

[1190:37]

paragraphs or arguments or whatever the

[1190:39]

case might be. There's a hierarchy

[1190:41]

implicit there. All right. What are some

[1190:43]

other things we can do with web pages?
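
For reference, the headings example described above might look roughly like this. The headings "One", "Two", and "Three" come from the demo; the paragraph text is a placeholder assumption.

```html
<!DOCTYPE html>
<html lang="en">
    <head>
        <title>headings</title>
    </head>
    <body>
        <!-- h1 is the top-level heading; h2 through h6 are progressively lower levels -->
        <h1>One</h1>
        <p>
            Placeholder text for the first paragraph.
        </p>
        <h2>Two</h2>
        <p>
            Placeholder text for the second paragraph.
        </p>
        <h3>Three</h3>
        <p>
            Placeholder text for the third paragraph.
        </p>
    </body>
</html>
```

The sizes and boldness are the browser's defaults; the tags themselves only express the hierarchy.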

[1190:44]

Well, let me open my terminal window

[1190:46]

again and why don't we code up how about

[1190:48]

a list of values cuz lists are

[1190:50]

everywhere on the internet. So, let me

[1190:53]

open up list.html and then close my

[1190:55]

terminal. Uh, I'll go ahead and start

[1190:58]

with that same file, headings.html,

[1191:01]

paste it into list, change the name

[1191:03]

here. Let's delete everything I did. And

[1191:05]

again, the only reason I'm copying and

[1191:07]

pasting is just to avoid writing out the

[1191:08]

same boilerplate code again and again

[1191:10]

with the HTML tag, head tag, body tag,

[1191:12]

and so forth. Let's focus on the new

[1191:14]

stuff. The new stuff in this example

[1191:16]

will be a list of values like the words

[1191:18]

foo, bar, and baz, which much like a

[1191:20]

mathematician might go with x, y, z as

[1191:22]

placeholders, computer scientists would

[1191:24]

typically reach for words like foo, bar,

[1191:26]

and baz as nonsensical placeholders.

[1191:29]

And this looks like a list of three

[1191:30]

values, one after the other. Of course,

[1191:33]

if I go back into my directory index,

[1191:36]

click on list, how many list items am I

[1191:39]

going to see per line?

[1191:41]

Yeah. Well, it's going to be just one

[1191:43]

big blob of text here, too. It doesn't

[1191:44]

matter if it looks like a list. It is

[1191:46]

just going to be text after text after

[1191:49]

text separated by a single space, not

[1191:51]

the multiple lines I had. So, here too,

[1191:53]

we've got to be pretty pedantic. If I

[1191:54]

want a list of values, I need to use a

[1191:57]

tag that conveys that. And the tag I'll

[1191:59]

use first is going to be ul for

[1192:01]

unordered list, which gives me a

[1192:03]

bulleted list. And then inside of this

[1192:05]

unordered list, I claim we're going to

[1192:07]

have a whole bunch of list items or li

[1192:09]

for short. uh like fu, like bar, like

[1192:14]

baz or any other things that you want to

[1192:16]

put in your list. If I now go back to my

[1192:18]

other tab, reload, now you get the

[1192:20]

familiar bulleted lists that you might

[1192:22]

see in any number of websites, Google

[1192:24]

Docs or the like. How does Google Docs

[1192:26]

do it underneath the hood? Well, they're

[1192:28]

just using a UL tag and some LI tags

[1192:30]

inside of that to give you the bulleted

[1192:32]

list that's just happening automatically

[1192:33]

when you click the appropriate button in

[1192:35]

something like Google Docs, which at the

[1192:37]

end of the day is just a website. Well,

[1192:39]

what if I want to number these things?

[1192:40]

Well, if I go back to VS Code, I could

[1192:42]

certainly just start numbering them like

[1192:43]

1 2 3, which is fine, but honestly, like

[1192:48]

computers can count and with loops

[1192:50]

pretty quickly. Also, it's a little

[1192:51]

annoying. If I want to go back in later

[1192:53]

and insert something between some of

[1192:54]

those elements, I then have to renumber

[1192:56]

everything manually. I mean, this is one

[1192:57]

of the things computers are good at. So,

[1193:00]

take a guess. If I want not an unordered

[1193:02]

list, but an ordered list that is

[1193:05]

numbered, what might you change? Yes, O

[1193:09]

is a good bet. Let's change both the

[1193:11]

open tag and the close tag. Let me go

[1193:13]

back to this uh my second tab. Reload.
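
A minimal sketch of list.html after that change, assuming the same boilerplate as the earlier files:

```html
<!DOCTYPE html>
<html lang="en">
    <head>
        <title>list</title>
    </head>
    <body>
        <!-- ol = ordered (numbered) list; the browser does the counting. -->
        <!-- Changing both ol tags back to ul gives the bulleted version instead. -->
        <ol>
            <li>foo</li>
            <li>bar</li>
            <li>baz</li>
        </ol>
    </body>
</html>
```

Inserting a new li between two existing ones needs no manual renumbering, which is the point of letting the browser count.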

[1193:16]

And now we have it. Uh one, two, and

[1193:18]

three. And you can actually use a whole

[1193:19]

table of contents. You can use uh sub

[1193:22]

bullets or sub-numbering. Anything you can do

[1193:23]

in like a table of contents, HTML can do

[1193:25]

for you automatically here. Well, what

[1193:27]

about tabular data? Laying out data in

[1193:30]

kind of rows and columns. Well, we can

[1193:32]

do that, too. Let me go ahead and open

[1193:33]

up a new file. Uh how about table.html.

[1193:37]

Let me go ahead then in this file,

[1193:40]

copy paste as before, just so I have

[1193:42]

some boilerplate. Let's get rid of

[1193:43]

everything in the body. And then let's

[1193:45]

just manually whip up a little table

[1193:47]

like this. Open bracket table. Inside of

[1193:50]

the table tags, I'm going to have a TR

[1193:52]

tag for table row. Inside of this table

[1193:55]

row, I'm going to have a table data tag,

[1193:57]

which is going to have the number one.

[1193:59]

I'm going to give myself another two,

[1194:01]

another three. Outside of the table row,

[1194:05]

I'm gonna have another table row. And

[1194:07]

I'm gonna create maybe four. And now I'm

[1194:10]

going to do five. And now I'm gonna do

[1194:12]

six. And you can perhaps see where this

[1194:14]

is going. After this, I'm going to do

[1194:16]

one more table row. How about a little

[1194:18]

tediously? Seven. How about eight? How

[1194:22]

about nine?

[1194:25]

And then lastly, just to make it look a

[1194:27]

little familiar, final table row. How

[1194:29]

about with a TD of an asterisk? And then

[1194:32]

how about a zero? And lastly, how about

[1194:35]

a pound symbol? Maybe. Any guesses as to

[1194:38]

what we're making in HTML here?

[1194:41]

Like a telephone keypad. Yeah. So, let's

[1194:44]

go back over to Let me close the old

[1194:46]

file. Back over to the browser. Click

[1194:48]

back. There's my new file, table.html.
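
The table typed out above would look something like the following sketch; the boilerplate is assumed to match the earlier files.

```html
<!DOCTYPE html>
<html lang="en">
    <head>
        <title>table</title>
    </head>
    <body>
        <!-- tr = table row, td = table data (one cell) -->
        <table>
            <tr>
                <td>1</td>
                <td>2</td>
                <td>3</td>
            </tr>
            <tr>
                <td>4</td>
                <td>5</td>
                <td>6</td>
            </tr>
            <tr>
                <td>7</td>
                <td>8</td>
                <td>9</td>
            </tr>
            <tr>
                <td>*</td>
                <td>0</td>
                <td>#</td>
            </tr>
        </table>
    </body>
</html>
```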

[1194:51]

And it's not going to be very pretty,

[1194:52]

but I dare say that's exactly what you

[1194:54]

see when you pull up the phone app and

[1194:56]

you start dialing a number. It's sort of

[1194:57]

a numeric keypad laid out automatically

[1195:00]

for me in rows and columns. Now, this

[1195:01]

one's a little underwhelming. Let me

[1195:03]

open up a file that I made in advance of

[1195:05]

class today. Um, in my favorites uh file

[1195:08]

here, I'm going to go ahead and copy a

[1195:11]

pre-made example. I'm going to open up

[1195:13]

this file called favorites0.html.

[1195:16]

And what you'll see here is a slightly

[1195:18]

more complicated table, still with a

[1195:19]

table tag, but this time with a thead

[1195:21]

tag for table head and then a tbody tag

[1195:24]

inside of which are all of those rows.

[1195:25]

And I know this just by having read the

[1195:27]

documentation. And then notice this.

[1195:29]

Inside of the first TR in the thead,

[1195:32]

there are three TH's, table headings,

[1195:35]

timestamp, language, and problem, which

[1195:37]

might sound a little familiar when we

[1195:39]

last collected data from everyone via

[1195:41]

that Google form. Well, let's go ahead

[1195:43]

and spoil what this is. Let me go back

[1195:44]

to the directory index. There is this

[1195:46]

pre-made file, favorites.html, and

[1195:48]

arguably a more compelling use of a

[1195:50]

table. Now, we have an HTML table

[1195:52]

containing all of the form submissions

[1195:54]

that you all clicked in with the other

[1195:56]

day when we were asking you your

[1195:57]

favorite language and your favorite

[1195:59]

problem. It's not super pretty, but

[1196:01]

indeed it's in rows and columns. And so,

[1196:03]

it's reminiscent of the HTML that Google

[1196:05]

is using in the actual Google Sheets

[1196:07]

software to lay out a sheet of data for

[1196:10]

you in those same rows and columns. All

[1196:12]

right. Well, let's do something that's a

[1196:14]

little more visually interesting. Let me

[1196:15]

go back to VS Code here. uh close out

[1196:18]

those first uh those last two. And how

[1196:20]

about let's do something with images?

[1196:22]

Well, I brought again uh inside of

[1196:24]

today's code. Uh how about our same

[1196:27]

bridge that we keep opening up in class?

[1196:29]

And this is the Weeks Bridge. Looks a

[1196:31]

little something Whoops. Uh looks a

[1196:34]

little something like this. Here though

[1196:36]

is just the raw image. How could I

[1196:38]

include an image in a web page that I

[1196:41]

serve up on the internet? Well, let's go

[1196:43]

ahead and try this. Let me close the

[1196:44]

PNG itself. Let me copy this and create

[1196:47]

a new file called how about image.html.

[1196:50]

Hide my terminal. Copy paste that. Just

[1196:53]

quickly change the title to image so we

[1196:54]

know where we are. And inside of the

[1196:56]

body of this page, let's go and embed

[1196:58]

that image so that we can include not

[1196:59]

just the image, but if we want

[1197:01]

paragraphs of text around it, headings

[1197:02]

as well. Heck, maybe a table, any other

[1197:04]

features that we've seen already. I'm

[1197:06]

going to say img, which is image for

[1197:08]

short. Source src for short equals quote

[1197:12]

unquote bridge.png. And then I'm going

[1197:14]

to close the tag here. Now I'm going to

[1197:17]

go back to my other tab. Go back into my

[1197:19]

directory index. Here's my brand new

[1197:21]

file, image.html. And this too isn't

[1197:23]

going to look all that different from

[1197:24]

the actual image because I have no other

[1197:26]

content. But when I click on this,

[1197:28]

you'll see that there is the full screen

[1197:30]

image. And it's even a little too big to

[1197:32]

fit in my viewport in the body of the

[1197:34]

page. But we can fix something like that

[1197:36]

later. I've embedded in this website

[1197:38]

precisely that image. But I should do a

[1197:41]

little bit better here. In fact, if the

[1197:43]

image is slow to load or if someone uh

[1197:46]

is visually impaired and doesn't know

[1197:47]

what they're looking at, it would be

[1197:49]

nice to have some alternative text that

[1197:50]

something like screen reader software

[1197:52]

could recite. So, there's another

[1197:53]

attribute for this tag specifically

[1197:55]

called alt for alternative. And I can

[1197:57]

put something like Harvard University to

[1198:00]

at least give the user a textual

[1198:02]

description of what kind of photo

[1198:03]

they're looking at. You'll also see that

[1198:05]

text if indeed the image is slow to load

[1198:06]

or if it's broken, like missing

[1198:08]

altogether, you won't see 404. You'll

[1198:10]

see like a broken image icon, but at

[1198:11]

least with some explanatory text as to

[1198:13]

what the developer intended you to see

[1198:16]

at that point. It's not going to change

[1198:17]

at all if I reload here by going back to

[1198:20]

image.html, but again, a screen reader

[1198:22]

or an astute viewer would see that

[1198:25]

ultimately in the browser. But there's

[1198:27]

something different, and this isn't a

[1198:29]

mistake for once. What have I done

[1198:33]

differently, but apparently not wrong? I

[1198:35]

claim

[1198:38]

something new or noteworthy about this

[1198:40]

particular image tag. Yeah.
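
For reference, a sketch of image.html as described so far, with the file name bridge.png and the alt text "Harvard University" taken from the demo and the rest of the boilerplate assumed:

```html
<!DOCTYPE html>
<html lang="en">
    <head>
        <title>image</title>
    </head>
    <body>
        <!-- src names the file to embed; alt is the text a screen reader recites -->
        <!-- and the text shown if the image is slow to load or missing. -->
        <img alt="Harvard University" src="bridge.png">
    </body>
</html>
```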

[1198:43]

>> Yeah. There's no like close tag. There's

[1198:45]

no like open bracket, slash, img, which is the

[1198:48]

pattern we followed for every other tag

[1198:50]

like closing the HTML tag, the head tag,

[1198:52]

the body tag and so forth. I just don't

[1198:55]

see any end tag here. And it's just not

[1198:57]

necessary. Turns out there are certain

[1198:58]

HTML tags that can be empty elements,

[1199:01]

which is to say it doesn't make semantic

[1199:03]

sense to start and end an image. Like

[1199:04]

it's either there or it's not. And so

[1199:06]

some tags just don't require an end tag

[1199:08]

if it's sort of obvious to the browser

[1199:10]

that the image should go there. So image

[1199:12]

is one such of those tags. And then I

[1199:14]

noticed um I'm missing the lang here,

[1199:16]

which isn't strictly necessary because

[1199:17]

I've got no textual content, but just

[1199:19]

for consistency, let me go back and put

[1199:21]

that in as before. Um, meanwhile, um,

[1199:25]

the image is exactly as it would appear

[1199:27]

in the screen, but it doesn't have to be

[1199:28]

just an image we embed. We can do

[1199:30]

something with like video. So, let me go

[1199:32]

ahead and open up a file called

[1199:34]

video.html.

[1199:36]

Let me copy paste some of that starter

[1199:38]

code. Change this to video. And instead

[1199:40]

of the image tag, as you might imagine,

[1199:42]

there's also a video tag. It's a little

[1199:44]

more involved, but per the

[1199:45]

documentation, I know I can do this

[1199:48]

video. And then inside of the video tag,

[1199:50]

I can actually have multiple sources

[1199:52]

just in case the browser might want

[1199:54]

different versions or different

[1199:55]

resolutions, sort of qualities thereof.

[1199:57]

And this somewhat confusingly is an

[1200:00]

actual tag called source, not shortened,

[1200:03]

but stupidly this tag has an attribute

[1200:05]

called source, which is shortened to src, that

[1200:08]

equals the name of the file you want to

[1200:10]

embed. And I came with today's examples,

[1200:12]

a video file called video.mp4, which is

[1200:14]

a small video that you can embed. And I

[1200:16]

can tell the browser what type of video

[1200:18]

it is to be clear. And the convention

[1200:20]

here or content type is to say the type

[1200:22]

of this video is an MPEG 4 video. There

[1200:25]

are other features though for the video

[1200:27]

tag. In fact, when you see a video on

[1200:29]

a page, you can very often see like a

[1200:31]

play icon, a pause icon, maybe some

[1200:33]

other controls. Well, it turns out you

[1200:35]

can put an HTML attribute on the video

[1200:38]

tag literally called controls that will

[1200:40]

enable those. If you don't turn them on,

[1200:42]

there's no way to like start and stop

[1200:44]

the video, or rather see those

[1200:45]

controls visually. This way, the user

[1200:47]

actually sees them. But this attribute

[1200:49]

is a little bit different from others.
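
A sketch of video.html up to this point. The file name video.mp4 is from the demo; the type value video/mp4 is the standard content type for MPEG-4 video, and the boilerplate is assumed.

```html
<!DOCTYPE html>
<html lang="en">
    <head>
        <title>video</title>
    </head>
    <body>
        <!-- controls has no value: its presence alone turns the play/pause UI on -->
        <video controls>
            <!-- The tag is spelled out as "source"; its attribute is the shortened "src" -->
            <source src="video.mp4" type="video/mp4">
        </video>
    </body>
</html>
```

More source tags could be listed inside the same video element to offer the browser alternative versions of the file.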

[1200:50]

It doesn't actually need a value. It

[1200:52]

just has to be present and the browser

[1200:54]

will know when it sees the word

[1200:55]

controls, oh, I should turn on the

[1200:57]

controls feature. And for good measure,

[1201:00]

especially in today's world of

[1201:01]

advertisements everywhere, if you want

[1201:03]

the video to play automatically

[1201:05]

potentially, uh, or at least not annoy

[1201:07]

the user, you might want to mute it by

[1201:08]

default as well. So another attribute

[1201:10]

per the documentation for the video tag

[1201:12]

is that you can start the video muted as

[1201:14]

well. And only when the user clicks on

[1201:16]

it might you actually start to hear

[1201:18]

something. But of course these are

[1201:20]

fairly basic examples of media inside of

[1201:24]

pages. Let's actually do what the uh H

[1201:27]

is meant to imply in HTML. The hypertext

[1201:30]

the ability to link from one page to

[1201:32]

another. That is a feature we haven't

[1201:34]

yet seen. So let me go ahead and do

[1201:36]

this. And let me just for completeness,

[1201:38]

let me go back into hello.html because I

[1201:39]

completely forgot the language

[1201:41]

attribute, even though that's really

[1201:42]

just there for SEO, search engine

[1201:45]

optimization, or for tools like Google

[1201:46]

Translate or the like that know

[1201:48]

therefore what language they're

[1201:50]

translating from. Um, let me go into my

[1201:52]

terminal window here and let's create

[1201:55]

another file called link.html, which

[1201:57]

demonstrates exactly that, the ability

[1201:59]

to link from one web page to another. Uh

[1202:02]

let's go ahead here and change the title

[1202:04]

to link so I know where I am. And in the

[1202:07]

body of this page, let's go ahead and

[1202:09]

create what's called a hyper reference

[1202:11]

or hyperlink. Uh I'll encourage people

[1202:13]

in this page to visit the actual Harvard

[1202:15]

website. So let's do visit. How about uh

[1202:19]

Harvard period just to demonstrate where

[1202:22]

we're beginning. If I go back into this

[1202:24]

directory index, click on link.html.

[1202:28]

This, of course, is not yet a link, so I

[1202:30]

should probably make it one. Well,

[1202:32]

instead of just saying visit Harvard,

[1202:33]

maybe I should say harvard.edu. Go back

[1202:36]

to the other tab. Reload. And it's

[1202:38]

harvard.edu, but I can click and

[1202:40]

highlight it, but it's not clickable.

[1202:42]

It's not underlined like a link. All

[1202:44]

right. Well, maybe I need to do like

[1202:45]

www.harvard.edu.

[1202:48]

Reload. Still nothing happening. All

[1202:51]

right. Well, maybe I need the full URL

[1202:53]

including the scheme, https,

[1202:55]

and maybe the slash at the end. Reload

[1202:58]

again and nothing's happening. So here

[1203:00]

too, HTML is pedantic. Like it will not

[1203:02]

create a link for you unless you tell it

[1203:04]

to create a link. And the fact that when

[1203:06]

you post on social media nowadays or in

[1203:08]

Google Docs, things are automatically

[1203:10]

hyperlinked for you, like that's a feature

[1203:11]

implemented in code. Very often, Python

[1203:14]

or JavaScript or something else where

[1203:16]

some human wrote code that looks for

[1203:18]

patterns in the uh input you've typed in

[1203:21]

and if it looks like you've typed a URL,

[1203:23]

it will automatically link it for you.

[1203:25]

But what are those websites doing for

[1203:27]

you automatically? Well, they're doing

[1203:29]

this. If you want to have a tag, a link

[1203:32]

here to Harvard's website, you use open

[1203:34]

bracket a for anchor, href for hyper

[1203:37]

reference. Set that equal to the URL to

[1203:40]

which you want to link. Close the tag

[1203:42]

and then in between the open tag and the

[1203:44]

close tag, put the actual word you want

[1203:47]

to link to. So now if I go back to this

[1203:50]

page and reload, now I have what looked

[1203:52]

like my original attempt, just visit

[1203:54]

Harvard, but it's a hyperlink. And this

[1203:56]

is super subtle, but if I hover over

[1203:58]

that underlined word, which is blue by

[1204:00]

default, you'll actually see in the

[1204:01]

browser's bottom lefthand corner where

[1204:03]

you're going to be whisked away to, even

[1204:05]

though that's all too subtle, but this

[1204:07]

now looks like I intended, an actual

[1204:09]

hyperlink to Harvard. In fact, I could

[1204:12]

link it to the full URL, but it would be

[1204:15]

a little redundant. And even though this

[1204:17]

looks like uh you shouldn't have to do

[1204:19]

this, this is indeed how HTML works. The

[1204:21]

href attribute is where you're going to

[1204:23]

go. The text inside of the open and

[1204:25]

close tag is what the user will see. So

[1204:27]

if you want them to see the full URL,

[1204:29]

you got to put it there. And now I can

[1204:31]

see the full URL to where I'm being led.
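
A minimal sketch of link.html with the anchor tag in place, assuming the same boilerplate as before and the full address https://www.harvard.edu/ as the destination:

```html
<!DOCTYPE html>
<html lang="en">
    <head>
        <title>link</title>
    </head>
    <body>
        <!-- href is where the click leads; the text between the tags is what the user sees -->
        Visit <a href="https://www.harvard.edu/">Harvard</a>.
    </body>
</html>
```

Replacing the word Harvard between the tags with the URL itself gives the variant where the visible text and the destination match.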

[1204:34]

But here's where you can actually

[1204:36]

introduce discussions of like cyber

[1204:37]

security. How could this feature be

[1204:39]

abused? Might you think? This stupid

[1204:42]

simple feature. Yeah. have it display

[1204:45]

something but actually

[1204:47]

>> yeah you could have it display one thing

[1204:49]

but lead to somewhere else and it

[1204:50]

wouldn't be that hard for the adversary

[1204:52]

who's maybe tricked you into visiting

[1204:54]

their web page to say you're actually

[1204:56]

going to go to yale.edu instead of

[1204:58]

Harvard. But if I reload the page, it

[1205:00]

doesn't look any different. Unless the

[1205:03]

viewer is astute enough to look at this

[1205:04]

tiny little text in the bottom of the

[1205:06]

screen or just click on the link and be

[1205:08]

whisked away to the wrong destination.
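
The mismatch described above takes only one changed attribute. This sketch reuses the harmless Harvard and Yale example from the demo; the exact URLs are assumptions.

```html
<!DOCTYPE html>
<html lang="en">
    <head>
        <title>link</title>
    </head>
    <body>
        <!-- The visible text claims one destination, but href decides where the click goes. -->
        <!-- Hovering over the link reveals the real target in the browser's status area. -->
        Visit <a href="https://www.yale.edu/">https://www.harvard.edu/</a>.
    </body>
</html>
```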

[1205:10]

That can be problematic. Like this is a

[1205:12]

nice haha sort of prank. But you could

[1205:15]

certainly imagine doing this with like

[1205:16]

paypal.com addresses or any number of

[1205:19]

banks or anything where you're trying to

[1205:20]

collect personal information from

[1205:22]

someone. And if the resulting website

[1205:24]

looks quite like the one you're actually

[1205:27]

creating, uh, it looks quite like the

[1205:29]

website they're expecting, but it's

[1205:31]

actually your copy thereof, it's all too

[1205:33]

easy to wage what are called phishing

[1205:35]

attacks. P H I S H I N G, which means to

[1205:38]

lead someone to what looks like the real

[1205:40]

site, but is not. Typically, to get

[1205:42]

their username, their password, their

[1205:44]

credit card information, or something

[1205:45]

else. But it boils down to just these

[1205:47]

basic building blocks like this.

[1205:50]

Questions then on any of these building

[1205:52]

blocks that we've seen thus far. Yeah.

[1205:56]

>> I think I might have gone lost in the

[1205:58]

earlier portion.

[1205:59]

>> Sure.

[1206:00]

>> How did you um like get get it to open

[1206:03]

up? Like did you run the file in

[1206:06]

>> Oh, good question. How did I get it to

[1206:08]

open up? So, let me rewind. So, the very

[1206:10]

first thing we did after creating

[1206:12]

hello.html HTML was open a terminal

[1206:14]

window and specifically I ran a command

[1206:16]

which was http-server, which

[1206:20]

starts my own web server in my code

[1206:22]

space but not on the default port 80

[1206:24]

and 443 because that's what cs50.dev is

[1206:26]

already using. Instead it chose, by our

[1206:29]

design 8080 which is commonly used by

[1206:31]

developers when making websites. Then I

[1206:34]

just kind of hid my terminal because

[1206:35]

it's not interesting to see constantly

[1206:37]

then. But that web server is still

[1206:39]

running in my code space. And anytime

[1206:42]

I'm saying let's go back to this tab, I

[1206:44]

am now visiting a different URL that was

[1206:46]

the result of my clicking on that green

[1206:48]

button which led me to my own website.

[1206:51]

If you ever get lost or close that tab

[1206:53]

by accident, no big deal. If you go to

[1206:55]

the ports tab of your terminal, you can

[1206:58]

actually hover over this and click on

[1207:00]

that same URL and open up the contents

[1207:03]

of your own site instead.

[1207:07]

>> Fluffy meme. Yes, these are randomly

[1207:09]

generated names by GitHub, which is the

[1207:11]

company that hosts VS Code in this way.

[1207:13]

And they do this to ensure uniqueness

[1207:14]

without it being some arcane sequence of

[1207:16]

random letters and numbers. They

[1207:17]

concatenate random English words

[1207:19]

together. A good question. All right.

[1207:23]

So, what else can we do here? Well, let

[1207:25]

me propose that there's a bit more you

[1207:27]

can do with even these URLs. Here, of

[1207:29]

course, is the scheme and the host name

[1207:30]

and the domain and the TLD. But after

[1207:32]

the URL, things can get a little more

[1207:34]

interesting than just folder names and

[1207:35]

file names. In fact, it's quite common

[1207:37]

to see URLs that have somewhere in them

[1207:39]

a question mark and then a bunch of

[1207:41]

other key value pairs which is this

[1207:43]

omnipresent computer science thing it

[1207:44]

seems including in the context of URLs

[1207:47]

whereby if you want to pass an input to a

[1207:51]

web server one means by which you can do

[1207:53]

that is literally in the URL itself. So

[1207:57]

for instance, if you visit google.com

[1207:59]

and you want to search for something,

[1208:00]

you and I are all in the habit of course

[1208:02]

of just typing into a search box. But

[1208:05]

how is that search box actually getting

[1208:07]

the data into Google's servers? Well,

[1208:09]

it's via these URLs. And if there's not

[1208:11]

one input, but two inputs, the URL might

[1208:14]

be a bit longer and there might be one

[1208:15]

or more ampersands in the URL that just

[1208:17]

separate more key value pairs. And it

[1208:20]

turns out we can see this in the real

[1208:22]

world as follows. Let me go back to VS

[1208:24]

Code here. Let me open up a new tab. Uh,

[1208:27]

and let me open up uh, google.com. And

[1208:30]

I'm just going to hit enter on the

[1208:32]

shortest way of saying it. So, I get to

[1208:33]

Google's home uh, homepage here. Even

[1208:36]

though notice I ended up at some longer

[1208:37]

form of the URL. In fact, I'm going to

[1208:39]

delete everything else from the URL

[1208:40]

that's not relevant to us today. It's

[1208:43]

still forcibly coming back. So, Google

[1208:44]

is somehow trying to track me by putting

[1208:46]

that in there. That's fine. All I'm

[1208:48]

going to do is search for cats. Now,

[1208:50]

there's a whole bunch of other

[1208:51]

functionality that's clearly happening,

[1208:52]

like autocomplete, and it's trying to

[1208:54]

figure out what results or words I might

[1208:56]

want. I'm just going to go ahead and hit

[1208:57]

enter. And this is all to say that

[1209:00]

notice if I zoom in on the URL at the

[1209:02]

top of my screen, it's a crazy long URL

[1209:04]

because Google probably is doing a bunch

[1209:05]

of tracking and advertising and

[1209:07]

analytics technologically, none of which

[1209:09]

is relevant to us today. But notice

[1209:11]

after www.google.com,

[1209:14]

there's /search, which is the path on

[1209:16]

their server, the search program that

[1209:18]

someone there has written. There's a

[1209:19]

question mark and then there is an HTTP

[1209:22]

parameter as these things are called the

[1209:24]

more precise name for key value pairs in

[1209:26]

URLs. This is an HTTP parameter. Its

[1209:29]

value after the equal sign is in fact

[1209:32]

cats. All this other stuff I have no

[1209:34]

idea what it is. I'm going to just

[1209:35]

delete it and hit enter and it stays

[1209:37]

gone. But I still get cats in my search

[1209:39]

results. So this I would argue is sort

[1209:42]

of the canonically shortest form of a

[1209:44]

Google URL that's useful. In fact, if I

[1209:47]

want to search for dogs instead, I don't

[1209:48]

have to use the search box. I can

[1209:50]

literally manually make my own URL, hit

[1209:52]

enter, and if I zoom out, there are

[1209:55]

Google search results about dogs. So,

[1209:57]

this URL, too, is sort of the essence then

[1210:00]

of how URLs work. And specifically, the

[1210:02]

GET verb, which was that keyword in all

[1210:05]

caps that I claimed was inside of the

[1210:06]

envelope, and it's what Phyllis's

[1210:08]

browser was sending, and it's what my

[1210:09]

browser has been sending through all of

[1210:11]

these examples. But here's where things

[1210:13]

now can get interesting. If I know how

[1210:15]

Google's server works, its backend, the

[1210:19]

part that knows all about cats and dogs

[1210:21]

on the internet, I can implement my own

[1210:23]

front end by just knowing a bit of HTML.
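Before the HTML, the URL anatomy just described (a path, a question mark, then key=value) can be sketched in a few lines of Python. This is only an illustration using the standard library's urllib.parse; the path /search and the parameter name q are the ones read off the address bar above, and nothing here actually contacts Google.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Path and parameter name as read off the address bar in the lecture.
base = "https://www.google.com/search"

# urlencode turns a dict of key-value pairs into "key=value&key=value".
url = base + "?" + urlencode({"q": "cats"})
print(url)  # https://www.google.com/search?q=cats

# Going the other way: split a URL and recover its parameters.
parts = urlsplit(url)
print(parts.path)             # /search
print(parse_qs(parts.query))  # {'q': ['cats']}

# Special characters are escaped; a space becomes "+".
print(urlencode({"q": "black cats"}))  # q=black+cats
```

Swapping "cats" for "dogs" in the dict produces the hand-made URL from the demo.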

[1210:25]

So, let me actually go back into VS Code

[1210:28]

here. Let me go uh into my second

[1210:31]

terminal, which is blank, and let me go

[1210:33]

ahead and create something called

[1210:35]

search.html.

[1210:36]

I'm going to go ahead and copy my

[1210:38]

original code, close link, and paste it

[1210:41]

here. Hide my terminal. call this thing

[1210:44]

search and then inside of the body of

[1210:45]

this page I'm going to make my own

[1210:47]

version of Google here. I'm going to use

[1210:49]

a form tag and I'm going to in that form

[1210:52]

specify an input tag whose name is going

[1210:56]

to be exactly equal to what I saw Google

[1210:58]

uses, q, which happens to stand for query.

[1211:01]

Uh I am then going to add another one

[1211:03]

input. Uh the type of this button

[1211:06]

actually let's say the type of this box

[1211:08]

this input is going to be text. The type

[1211:10]

of this next one is going to be a submit

[1211:12]

button. Uh, and then that's it. Let me

[1211:16]

go back into my other tab. Go back into

[1211:19]

my directory listing. Click on

[1211:20]

search.html. And this is not pretty, but

[1211:23]

it is the beginning of my very own

[1211:25]

search engine. Unfortunately, if I type

[1211:26]

in cats, notice what happens. My URL

[1211:30]

changes such that it's search.html

[1211:32]

question mark q equals cats. I know

[1211:35]

nothing about cats. I don't have a

[1211:36]

database of cats. I haven't done any

[1211:37]

backend work, just the front end. The

[1211:39]

front end is what the user sees. The

[1211:41]

back end is what provides data to the

[1211:43]

front end. But why don't I tell this

[1211:46]

form not to submit to me. But let's say

[1211:48]

that its action should actually be go to

[1211:51]

go to https

[1211:53]

www.google.com/search

[1211:56]

which is the URL that I saw in my

[1211:58]

browser. I'm just inferring how Google

[1212:00]

works. I'm going to be pedantic even

[1212:02]

though this is the default. I'm going to

[1212:04]

say the method I want my form to use is

[1212:06]

get. Confusingly, it should be lowercase

[1212:08]

here, even though inside of the envelope

[1212:09]

it will be all caps. And then I'm going

[1212:12]

to go back to this page. Reload after

[1212:14]

going back. And you'll see the same

[1212:17]

exact box, but when I search now for

[1212:19]

cats, submit, notice my URL changes to

[1212:24]

Google's own. It's like voila. Like I

[1212:26]

just implemented my own Google without

[1212:27]

doing the actual hard part. I've

[1212:29]

actually just done the more simple front

[1212:31]

end. And there's a few other things I

[1212:33]

can do here that are sort of nice. I can

[1212:34]

change the type to be a search box. I

[1212:36]

can change the value of my button, not

[1212:38]

to be the default, which notice was

[1212:40]

submit. I can say Google search. And I

[1212:42]

can keep tweaking this to make it even

[1212:44]

prettier and prettier here. Now in my

[1212:47]

version is now a box that has uh cats.

[1212:50]

Notice that it's trying to complete my

[1212:51]

thought. I can actually go back into the

[1212:53]

form. I can say autocomplete equals off

[1212:56]

to turn off that feature. So now if I

[1212:59]

click in this box and type Oh,

[1213:05]

autocomplete equals off. Why is it still

[1213:08]

there?

[1213:12]

>> Did I forget to refresh? Oh, thank you.

[1213:15]

I forgot to refresh. Hence my point. So

[1213:17]

you always have to reload after making a

[1213:19]

change. And now the autocomplete feature

[1213:20]

is off. And this other little thing,

[1213:22]

it's subtle, but this little X that will

[1213:23]

just clear the whole thing. That is

[1213:25]

simply the result of having changed text

[1213:27]

to search for the type of that box. Um,

[1213:30]

there's other things you can do too for

[1213:31]

accessibility or user friendliness. I

[1213:33]

can do auto uh focus here for instance

[1213:37]

without any attribute or without any

[1213:39]

value. If I now reload this page, notice

[1213:42]

that the cursor is automatically

[1213:43]

blinking in the text box, which is a

[1213:45]

marginal change, but much easier for me

[1213:46]

to now type cats without having to

[1213:48]

stupidly click in the box in order to

[1213:50]

actually foreground it so I can type

[1213:52]

input. So, suffice it to say, this is

[1213:55]

not really the business that Google is

[1213:57]

in. They do much more on the back end

[1213:59]

than they do on the front end. But with

[1214:00]

just these basic building blocks, can I

[1214:02]

implement the beginnings of the same

[1214:04]

website? In fact, let me do one other

[1214:06]

flourish. You'll see that that text box

[1214:07]

is blank. Not clear what I might want to

[1214:09]

do. Well, there's another attribute I

[1214:10]

can use. Placeholder equals something

[1214:12]

like query. I can at least tell the user

[1214:14]

what to search for. If I reload again,

[1214:16]

now I see in gray text query

[1214:18]

instructions so that I roughly know what

[1214:20]

now to type. So all these things that

[1214:22]

you see every day on websites are really

[1214:24]

as easy as just coding up some HTML like

[1214:27]

that. But what else can we do with HTML?
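Pulling the search form together, here is a rough Python sketch of what the browser does when that form is submitted. The FORM string is a reconstruction of search.html from the spoken description, so the exact markup on screen may differ, and FormReader and submit_url are hypothetical helper names made up for this illustration. Real browsers handle many more cases (other input types, POST, encodings).

```python
from html.parser import HTMLParser
from urllib.parse import urlencode

# A reconstruction of search.html's form from the spoken description;
# the exact markup on screen may differ.
FORM = """
<form action="https://www.google.com/search" method="get">
    <input autocomplete="off" autofocus name="q" placeholder="Query" type="search">
    <input type="submit" value="Google Search">
</form>
"""

class FormReader(HTMLParser):
    """Collects the form's action, method, and named inputs."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.method = "get"  # HTML's default when method is omitted
        self.names = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self.action = attrs.get("action")
            self.method = attrs.get("method", "get").lower()
        elif tag == "input" and "name" in attrs:
            self.names.append(attrs["name"])

def submit_url(form_html, values):
    """Builds the URL a GET submission would request (simplified)."""
    reader = FormReader()
    reader.feed(form_html)
    pairs = {name: values.get(name, "") for name in reader.names}
    return reader.action + "?" + urlencode(pairs)

print(submit_url(FORM, {"q": "cats"}))
# https://www.google.com/search?q=cats
```

Only the input that has a name contributes a key-value pair, which is why the submit button does not show up in the URL.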

[1214:30]

Well, it turns out this is a topic for

[1214:32]

another longer day too. There exist in

[1214:34]

computing what are called regular

[1214:35]

expressions which is a fancy way of

[1214:37]

describing patterns which are quite

[1214:38]

useful when you want to validate input.

[1214:40]

For instance, if you want the user to

[1214:42]

have to type in an email address with

[1214:43]

the at sign, with the TLD, and so forth,

[1214:46]

it would be nice to make sure that they

[1214:47]

get a warning if they try to skip that

[1214:50]

field or they mistype something in it as

[1214:52]

well. Um, with the world of regular

[1214:54]

expressions, known for short as regexes,

[1214:57]

you have a whole bunch of uh

[1214:59]

documentation here that in a nutshell

[1215:01]

will introduce you to some pretty

[1215:03]

powerful syntax that we won't spend much

[1215:04]

time on at all today, but it's syntax

[1215:06]

that exists not only in uh the world of

[1215:08]

the web, but in Python and so many other

[1215:11]

languages as well. So consider this just

[1215:13]

a quick crash course. If you want to

[1215:15]

define a pattern in say a website that

[1215:17]

ensures that the user types in an email

[1215:19]

address, you can use these textual

[1215:21]

building blocks whereby in the world of

[1215:23]

regular expressions, a single dot

[1215:25]

represents any character. If you don't

[1215:27]

care what the character is, dot

[1215:29]

confusingly doesn't represent a period,

[1215:31]

it represents any character. Star

[1215:33]

represents zero or more times. Uh plus

[1215:36]

means one or more times. Question mark

[1215:38]

means zero or one time if you want

[1215:40]

something to be there or not. curly

[1215:42]

braces with a number means this many

[1215:44]

times n and you can even have a range of

[1215:46]

values instead. And then you can use

[1215:48]

square brackets and some other syntax to

[1215:50]

say I want the user to type in any of

[1215:52]

these characters or digits in this case.

[1215:55]

Or you can do ranges like this. I want

[1215:57]

them to type in any decimal digit

[1215:58]

between 0 and 9. Or backslash d

[1216:00]

represents any digit. Backslash capital

[1216:02]

d means anything that's not a digit.
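Since the same syntax exists in Python, the building blocks above can be tried out with the standard re module. This is a small sketch, not an exhaustive tour; ok is a hypothetical helper name, and note that in Python the dot matches any character except a newline by default.

```python
import re

# fullmatch succeeds only if the entire string fits the pattern.
def ok(pattern, text):
    return re.fullmatch(pattern, text) is not None

print(ok(r".", "x"))             # True: . is any single character
print(ok(r"a*", ""))             # True: * allows zero repetitions
print(ok(r"a+", ""))             # False: + needs at least one
print(ok(r"colou?r", "color"))   # True: ? makes the u optional
print(ok(r"\d{3}", "123"))       # True: exactly three digits
print(ok(r"\d{2,4}", "12345"))   # False: five digits is outside the range
print(ok(r"[0-9]", "7"))         # True: one character from the range 0-9
print(ok(r"\D", "7"))            # False: \D is anything that is not a digit
```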

[1216:04]

Long story short, humans over the years

[1216:06]

have come up with shorthand notation

[1216:08]

known as regular expressions via which

[1216:10]

you can define patterns. This is useful

[1216:13]

because if I wanted to make a web page

[1216:15]

that does in fact require that someone

[1216:17]

type in say an email address, I can

[1216:19]

enforce that to some extent. If I go

[1216:21]

back to my browser here and into VS

[1216:23]

Code, let me go ahead and create a new

[1216:26]

file called say register.html to be

[1216:29]

representative of registering for some

[1216:31]

website. I'll change the title here real

[1216:32]

quick. I'm going to keep the form, but

[1216:35]

in this case, I'm not going to bother

[1216:37]

with Google anymore. So, let's make it a

[1216:39]

bit simpler than before. And let's go

[1216:41]

ahead and do this. Inside of the form,

[1216:44]

I'm going to have an input. Uh, I'm

[1216:47]

going to have the name of this input be

[1216:48]

email because that's what I'm

[1216:49]

collecting. I'm going to have a

[1216:50]

placeholder be quote unquote email so

[1216:52]

the user knows what to type in. Um, and

[1216:54]

I'm going to go ahead here and have

[1216:56]

something like how about

[1217:00]

uh this a pattern as well. So actually

[1217:04]

let's say uh let's say type equals text,

[1217:09]

but I'm going to specify additionally a

[1217:11]

pattern. So the pattern I want the user

[1217:13]

to type in in between these quotes is

[1217:15]

going to be any character one or more

[1217:17]

times. That is to say their username,

[1217:19]

then an at sign. then any character one

[1217:23]

or more times. Uh then literally a

[1217:26]

period and we didn't see this on the

[1217:27]

screen but just like in C when you want

[1217:28]

to escape special characters if you want

[1217:30]

literally a period in their input as the

[1217:33]

like the dot in harvard.edu, you can say

[1217:36]

backslash period to mean a literal

[1217:39]

period, and then the word, or the, uh, TLD,

[1217:41]

edu. So I think now what this means and

[1217:44]

let me go ahead and give myself a button

[1217:46]

and just so you've seen it there's also

[1217:47]

a button element in HTML which is

[1217:49]

similar in spirit to the submit button

[1217:50]

we saw a moment ago. Let me go back to

[1217:52]

my directory listing go into

[1217:55]

register.html

[1217:56]

and let me go ahead and just type in

[1217:58]

like mail as my name register and you'll

[1218:02]

see please match the requested format.

[1218:04]

So I have not satisfied it properly

[1218:06]

until I actually type in something like

[1218:10]

and now it's happy. Alternatively, it's

[1218:12]

a little tedious to actually type in

[1218:14]

these patterns. So, there are some

[1218:15]

shorthands for them. I can actually get

[1218:17]

rid of this pattern. And if I read the

[1218:19]

documentation for HTML, there is

[1218:21]

actually an input of type email which

[1218:23]

just does all of that pattern matching

[1218:25]

for you. But the scary thing is that

[1218:27]

it's actually pretty involved to

[1218:29]

validate email addresses. I did a very

[1218:31]

simplified version of username at

[1218:33]

domain.tld.
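The simplified pattern described a moment ago (any character one or more times, an at sign, any character one or more times, a literal period, then edu) can be checked in Python as well. The pattern string is written out from the spoken description, and the sample addresses are made up. The browser requires the pattern attribute to match the whole value, so re.fullmatch is the closest analogue.

```python
import re

# The pattern as described aloud: one or more characters, an at sign,
# one or more characters, a literal period, then edu.
PATTERN = r".+@.+\.edu"

def matches(value):
    # The HTML pattern attribute must match the whole value, so fullmatch
    # (rather than search) is the closest analogue in Python.
    return re.fullmatch(PATTERN, value) is not None

print(matches("mail"))                 # False
print(matches("someone@example.edu"))  # True
print(matches("someone@example.com"))  # False
print(matches("a@b@c.edu"))            # True: the pattern is very permissive
```

The last case hints at why real email validation is so much more involved than this.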

[1218:35]

This is the regular expression that some

[1218:37]

browsers use to validate email addresses

[1218:40]

because even though mine is relatively

[1218:41]

simple [email protected], turns out

[1218:43]

there's a crazy amount of syntax that is

[1218:45]

valid in email addresses. And this is

[1218:47]

where regular expressions get scary. But

[1218:49]

for our purposes today, they're a thing

[1218:50]

that exists. You might find them useful

[1218:52]

in HTML. You might find them useful in

[1218:54]

Python. They're incredibly useful when

[1218:56]

it comes to extracting information from

[1218:58]

web pages. If you're analytically

[1218:59]

minded, you like the world of data

[1219:01]

science, you like to uh gather and

[1219:04]

analyze data, you can use regular

[1219:06]

expressions not just to validate data

[1219:07]

but to find patterns of data in actual

[1219:10]

websites or documents and extract that

[1219:12]

data so as to perform operations or

[1219:14]

analysis on them. So wonderfully useful

[1219:17]

if complicated tool. The catch though is

[1219:20]

this. Notice that here I'm still

[1219:24]

required to type in a valid email

[1219:26]

address register and I'm getting even

[1219:28]

more explicit information this time

[1219:30]

because I use the type equals email. The

[1219:32]

catch though with web pages is that

[1219:34]

they're not to be trusted in so far as

[1219:36]

this HTML came from the server and is

[1219:38]

downloaded onto the user's Mac or PC or

[1219:40]

phone where they have a copy thereof. I

[1219:43]

can open up developer tools as I did

[1219:45]

before by right-clicking or

[1219:46]

control-clicking and choosing inspect or

[1219:48]

whatever the menu option might be. I can

[1219:50]

go into the elements of this page,

[1219:52]

literally the HTML, and if I don't want

[1219:55]

to type in email, I want to just type in

[1219:57]

any old text and see if I can break your

[1219:59]

site, I can just change it. And now

[1220:02]

there is no such warning. Which is to

[1220:04]

say, even though you will encounter, not

[1220:06]

just today, but over the coming weeks as

[1220:07]

you play with HTML certain features,

[1220:10]

they are not to be trusted in general

[1220:13]

when it comes to security. And just like

[1220:14]

our discussion in the world of SQL and

[1220:16]

SQL injection attacks, this is one of

[1220:18]

the attack vectors. If two people are

[1220:20]

working on a website, one person's

[1220:22]

implementing the database stuff, one

[1220:23]

person's implementing the HTML, and the

[1220:25]

database person's like, "Oh, I don't

[1220:26]

need to worry about escaping characters

[1220:28]

because we're doing you we're using the

[1220:29]

pattern attribute in the HTML." Bad idea

[1220:33]

because it's this easy to hack a

[1220:34]

website, disable features that have been

[1220:37]

written for the site by just literally

[1220:39]

deleting them in your own copy. So,

[1220:41]

we'll see next week how we can defend

[1220:42]

against this on the server side, but the

[1220:44]

point now is just not to trust the

[1220:46]

user's input at all.
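As a preview only, and not necessarily the approach the course takes next week, here is a sketch of what "not trusting the browser" can look like on the server. The register function, the users table, and the in-memory database are all hypothetical. The input is checked again in server code, and the value is passed to SQL through a placeholder instead of being pasted into the query string.

```python
import re
import sqlite3

EMAIL = re.compile(r".+@.+\.edu")  # same simplified pattern as the form

def register(db, email):
    """Re-checks the input on the server; returns True if it was stored."""
    if not isinstance(email, str) or EMAIL.fullmatch(email) is None:
        return False  # reject, no matter what the browser allowed
    # The ? placeholder binds the value safely, so validation is not
    # the only defense against SQL injection.
    db.execute("INSERT INTO users (email) VALUES (?)", (email,))
    return True

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (email TEXT)")

print(register(db, "someone@example.edu"))  # True
print(register(db, "any old text"))         # False
print(db.execute("SELECT COUNT(*) FROM users").fetchone()[0])  # 1
```

Deleting the pattern attribute in developer tools changes nothing here, because this check runs on a machine the user does not control.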

[1220:49]

All right. How can we be sure our HTML

[1220:52]

is right? Well, there's a bunch of ways,

[1220:53]

but one tool that's worth knowing about

[1220:55]

is this one here at validator.w3.org

[1220:59]

is a website uh by the group that

[1221:01]

essentially standardizes this and other

[1221:03]

languages. If I click on their validate

[1221:05]

by direct input tab and I quickly go back

[1221:08]

into VS Code and let me grab the

[1221:10]

simplest of my examples, hello.html, I

[1221:12]

can just copy paste that into their

[1221:14]

website. Click check and they have

[1221:16]

written code to validate that the HTML I

[1221:19]

have written is in fact correct.

[1221:21]

Anything I've opened that needs to be

[1221:22]

closed has been closed. I don't have any

[1221:23]

stupid typos or missing brackets or

[1221:25]

quote marks. This is a wonderfully

[1221:27]

useful tool just to validate that your

[1221:29]

code is syntactically correct. Even

[1221:31]

though it might still look like a mess

[1221:33]

visually on the screen, this will at

[1221:34]

least check for you the underlying HTML.
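To make "anything I've opened that needs to be closed has been closed" concrete, here is a toy checker in Python. It is not how the W3C validator works and it checks far less. It also flags some markup that HTML actually permits, such as an omitted closing p tag. TagChecker and check are hypothetical names, and the sample page is a guess at a minimal hello.html.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}

class TagChecker(HTMLParser):
    """Reports tags that were opened but not closed, or closed out of order."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.errors = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID:
            return  # e.g. <br/> reports both a start and an end
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
        else:
            self.errors.append(f"unexpected </{tag}>")

def check(html):
    checker = TagChecker()
    checker.feed(html)
    checker.close()
    return checker.errors + [f"unclosed <{tag}>" for tag in checker.stack]

PAGE = """<!DOCTYPE html>
<html lang="en">
    <head><title>hello, title</title></head>
    <body>hello, body</body>
</html>"""

print(check(PAGE))        # []
print(check("<p>hello"))  # ['unclosed <p>']
```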

[1221:40]

All right. So, up until now, everything

[1221:41]

I've done has been pretty boring. It's

[1221:44]

black and white. The pages are fairly

[1221:46]

simplistic. Turns out we can take things

[1221:48]

the final mile using another language

[1221:50]

altogether. Namely, something called

[1221:52]

CSS, which is the second of our three

[1221:54]

languages today. This, too, is not a

[1221:56]

programming language, although

[1221:57]

curiously, they keep adding more and

[1221:59]

more features that are making it more

[1222:00]

and more like a programming language,

[1222:02]

but more on that another time. This

[1222:04]

stands for cascading stylesheets. And

[1222:06]

whereas HTML is all about the skeleton

[1222:08]

of a website, the structure thereof, CSS

[1222:11]

is like the the skin, the aesthetics

[1222:13]

thereof, the final mile that actually

[1222:15]

allows you to control the positioning of

[1222:17]

things more precisely, the colors, the

[1222:19]

font sizes, all of the aesthetics. It

[1222:21]

lets you do the finer touches on the

[1222:23]

website. And with CSS, we have slightly

[1222:27]

different syntax, but frankly, it just

[1222:28]

boils down to even more key value pairs.

[1222:31]

And as with HTML, we'll give you a taste

[1222:33]

of the basic structure and principles

[1222:35]

underlying CSS. There's so many uh key

[1222:38]

value pairs that are possible that we

[1222:39]

certainly won't do them justice today,

[1222:41]

but it's the kind of thing where you

[1222:43]

ultimately look it up in a reference, a

[1222:44]

book, um a website, or the like to pick

[1222:47]

up even more than these techniques.

[1222:49]

Well, let's do this. Let me propose that

[1222:50]

in a moment. We're going to see what are

[1222:52]

called properties. This is CSS's jargon

[1222:54]

for key value pairs. Why do we have yet

[1222:57]

another word? because a different group

[1222:58]

of humans in a different room came up

[1222:59]

with this language versus the other

[1223:01]

people. But it's just key value pairs

[1223:03]

known now as properties instead of as

[1223:06]

attributes in HTML itself. There's going

[1223:08]

to be different ways we can define

[1223:10]

properties and this is kind of a laundry

[1223:12]

list of some of them and we'll see them

[1223:13]

in context. But in short, CSS is just

[1223:15]

going to allow us to slap a whole bunch

[1223:16]

of key value pairs on our HTML elements

[1223:19]

to make them hopefully look prettier or

[1223:21]

be more precisely controlled

[1223:22]

aesthetically. So, in my HTML, thus far,

[1223:25]

we've generally had something that looks

[1223:26]

like this. Turns out, if I want to start

[1223:29]

using some CSS, I can introduce, as

[1223:31]

we'll see, a so-called style tag in the

[1223:33]

head of my page. And inside of that

[1223:35]

style tag, I can put these so-called key

[1223:37]

value pairs. Or, as we'll soon see too,

[1223:39]

if I want to factor them out and put

[1223:40]

them into a separate file, I can

[1223:42]

actually use a link tag, which

[1223:45]

confusingly has nothing to do with

[1223:46]

hyperlinks or clickable text, but just

[1223:49]

links in another file. In this case,

[1223:51]

styles.css. the relationship of which

[1223:54]

shall be that of stylesheet. This is the

[1223:56]

sort of copy paste stuff that you do

[1223:57]

where the only thing you really care

[1223:59]

about as the developer is the name of

[1224:00]

the file in which you're putting your

[1224:02]

styles. All right, let's do this. Let me

[1224:04]

go back over to VS Code, close out

[1224:06]

register.html, open up a new file this

[1224:09]

time called home.html, and let me

[1224:12]

purport to make a simple homepage for

[1224:14]

someone like John Harvard. I'll copy

[1224:16]

paste my boilerplate. I'll change the

[1224:18]

title here just to be uh let's say uh

[1224:22]

home. And then inside of the body of

[1224:24]

this page, let's do the simplest web

[1224:26]

page possible for someone called John

[1224:27]

Harvard. I'm going to say here's a

[1224:29]

paragraph of text, uh, wherein John Harvard

[1224:32]

is going to be the person's name. Here's

[1224:33]

another paragraph of text. Welcome to my

[1224:36]

homepage will be in the middle of this

[1224:38]

page. Then a final paragraph of text

[1224:41]

inside of which is like copyright.

[1224:44]

See how about uh John Harvard down here.

[1224:48]

So, it's a basic website. It's just

[1224:50]

three paragraphs. It's not going to be

[1224:51]

pretty, but let's make sure I haven't

[1224:52]

done anything wrong. Let me close my

[1224:54]

developer tools. Click back. Click home.

[1224:57]

And there we have it. The simplest of

[1224:59]

pages for John Harvard. Welcome to my

[1225:00]

homepage. Copyright John Harvard. Let's

[1225:03]

at least start to exercise some control

[1225:04]

over this. Let's change the font size

[1225:06]

and the alignment of the text. So, back

[1225:08]

in VS Code, let's go ahead and add uh

[1225:11]

for now, actually, not even a style tag,

[1225:14]

but a style attribute. I'm going to go

[1225:16]

ahead here and type in style quote

[1225:18]

equals quote unquote font-size

[1225:22]

large and then text-align

[1225:25]

colon center semicolon. And I apologize,

[1225:28]

but semicolons are back in CSS. Then, in

[1225:31]

my next paragraphs, open tag, let's do

[1225:33]

something similar, but different. font

[1225:35]

size colon medium for medium text align

[1225:39]

colon center semicolon. Uh, and then

[1225:41]

lastly down here, let's do style equals

[1225:43]

quote unquote. Font size colon small

[1225:46]

because it's the footer, so who cares?

[1225:48]

Text align colon center semicolon.

[1225:51]

Strictly speaking, at the last key value

[1225:53]

pair, otherwise known as a property, you

[1225:55]

don't need the semicolons, but just for

[1225:57]

consistency, I'll keep them, uh, for

[1226:00]

that. All right, let's go back to this

[1226:02]

page, reload, and watch. All of the text

[1226:05]

a moment ago was left aligned and the

[1226:07]

same size. Now, it's a little subtle,

[1226:09]

but it's clearly centered, but it's

[1226:11]

large, medium, and small, respectively.

[1226:14]

Even if you've never seen CSS before,

[1226:16]

what rubs you wrong about this design,

[1226:17]

though, based on all weeks past?

[1226:22]

Yeah.

[1226:26]

>> Yeah. For every line, I've been

[1226:27]

repeating myself with text align center.

[1226:29]

Text align center. Text align center.

[1226:30]

And if we really want to nitpick, these

[1226:32]

aren't really paragraphs, right? There's

[1226:34]

like no phrases or full sentences, let

[1226:36]

alone paragraphs. So, it turns out

[1226:38]

there's a whole bunch of tags we can use

[1226:40]

to lay out a page. And in fact, I'm

[1226:42]

going to transition to one that's a

[1226:43]

little more generic than paragraphs,

[1226:45]

namely div, which is just going to

[1226:47]

create a division in the page for me.

[1226:49]

And this doesn't have any functional

[1226:50]

impact, but semantically it's a little

[1226:52]

nicer because it means I've got the

[1226:54]

division here for the header, the

[1226:55]

division here for the main part, and the

[1226:56]

division down here for the footer. It's

[1226:58]

just a different way of thinking about

[1226:59]

it. It's just different rectangular swaths

[1227:01]

of the page. But I like your point that

[1227:03]

text align center is kind of stupidly

[1227:05]

duplicated all of these times. Let me

[1227:08]

actually go ahead and first reload this

[1227:10]

change because there is one side effect

[1227:12]

that we might want to get back. When I

[1227:14]

reload now using divs instead of

[1227:15]

paragraphs, well, there goes the nice

[1227:17]

white space in between my text. Divs

[1227:19]

just give me rectangle after rectangle.

[1227:21]

And as an aside, let me control-click or

[1227:24]

right-click, open up developer tools yet

[1227:26]

again, and notice this other trick with

[1227:28]

your elements tab. Whatever you hover

[1227:31]

over at the bottom of your screen will

[1227:33]

be color-coded at the top of the screen.

[1227:35]

So if I dive into the body by clicking

[1227:37]

this little triangle, let me zoom in. At

[1227:40]

bottom left, I can now see my own HTML

[1227:43]

much more uh pretty printed and colorful

[1227:45]

down here. If I click on this one or

[1227:47]

hover over it, you'll see that the first

[1227:49]

div, the rectangular region is

[1227:50]

highlighted. Now the second, now the

[1227:52]

third. That's all we mean by divisions

[1227:54]

of the page. Um, this allows me to see

[1227:56]

my copy of it in the browser as opposed

[1227:59]

to in the original file. So just another

[1228:01]

technique for developer tools. All

[1228:03]

right, but I don't like this

[1228:04]

duplication, but here is now the C in

[1228:07]

CSS. Cascading stylesheets means that if

[1228:10]

you want one property or key value pair

[1228:13]

to sort of cascade down on all of the

[1228:15]

other tags inside of that one, you can

[1228:18]

do that. For instance, in the body tag,

[1228:20]

I can add my own style attribute here

[1228:22]

and put all of that text align center

[1228:24]

there. Why? Because the divs are the three

[1228:27]

children of the body tag to borrow our

[1228:29]

vernacular from family trees and from

[1228:31]

trees more generally. So, this too

[1228:34]

should work because text align center

[1228:35]

should cascade down now on all three of

[1228:38]

those children. And indeed, if I reload

[1228:39]

the page, nothing visually changes, but

[1228:42]

it's arguably now better designed.
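The idea of a property set on the body reaching its three children can be modeled in a few lines of Python. This is a toy, not how a browser is implemented: Element and computed are hypothetical names, only some CSS properties are inherited this way (text-align and font-size are among them), and the sketch ignores the rules for resolving conflicting declarations.

```python
# A toy model of how text-align reaches the three divs: each element
# looks at its own style first, then falls back to its parent's value.
class Element:
    def __init__(self, tag, style=None, parent=None):
        self.tag = tag
        self.style = style or {}
        self.parent = parent

    def computed(self, prop, default):
        node = self
        while node is not None:
            if prop in node.style:
                return node.style[prop]
            node = node.parent
        return default

body = Element("body", {"text-align": "center"})
header = Element("div", {"font-size": "large"}, parent=body)
main = Element("div", {"font-size": "medium"}, parent=body)
footer = Element("div", {"font-size": "small"}, parent=body)

for div in (header, main, footer):
    print(div.computed("text-align", "left"), div.computed("font-size", "medium"))
# center large
# center medium
# center small
```

None of the divs mentions text-align, yet each one resolves to center because its parent supplies the value.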

[1228:46]

All right, what more could we do here?

[1228:48]

Well, how about this? It would be nice

[1228:50]

to make clear to servers out there, like

[1228:53]

search engines, like what's going on in

[1228:54]

the page semantically. And the term of

[1228:56]

art out there nowadays is the semantic

[1228:57]

web, which essentially is about putting

[1228:59]

more hints in your HTML so that servers

[1229:02]

like um search engines kind of know more

[1229:05]

so what they're looking at. This is

[1229:07]

pretty generic right now. Div, div, div.

[1229:09]

But presumably the top of the page is

[1229:11]

among the most important things because

[1229:12]

that's effectively like the header of

[1229:14]

the page. Then the middle div is kind of

[1229:16]

the second most important because it's

[1229:18]

like the main part of the page and the

[1229:19]

footer is like the least important. So

[1229:21]

it turns out there are other tags in

[1229:23]

HTML besides paragraphs and divs. There

[1229:25]

are literally tags like header which

[1229:28]

allows me to define the header of the

[1229:29]

page, main which allows me to define the

[1229:32]

main part of the page and then even

[1229:33]

footer which allows me to define that

[1229:35]

too. So now if Google and Bing and other

[1229:37]

search engines are sort of crawling my

[1229:39]

website once it's public, they know that

[1229:41]

John Harvard's important because it's in

[1229:43]

the header, uh, welcome to my homepage

[1229:45]

is important because it's in the main

[1229:46]

page. They're probably not going to care

[1229:47]

as much about the copyright because it's

[1229:49]

in the footer. So it's just providing

[1229:50]

more hints to these kinds of services.

[1229:53]

Um, moreover, we can do some other

[1229:55]

things here. This is kind of a hackish

[1229:57]

way to implement a copyright symbol.

[1229:59]

HTML also has what are called entities

[1230:01]

where if I can do this magical

[1230:03]

incantation here, uh, ampersand hash

[1230:06]

symbol 169 semicolon. Notice that VS

[1230:09]

code recognizes this as an HTML entity.

[1230:11]

If I go back to this page and notice my

[1230:14]

first approach was just parenthesis C

[1230:16]

parenthesis. If I reload now, having

[1230:18]

used that HTML entity, which I only know

[1230:20]

by having looked it up, now I get the

[1230:22]

copyright symbol that actually comes in

[1230:24]

the font that's being used here.
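The number in that entity is not arbitrary: 169 is the code point of the copyright sign. Python's standard html module can decode entities the same way a browser does, which makes this easy to confirm. The named form, ampersand copy semicolon, is an equivalent spelling that was not shown in the demo.

```python
import html

print(html.unescape("&#169;"))  # ©
print(html.unescape("&copy;"))  # ©
print(ord("©"))                 # 169
print(html.unescape("&#169; John Harvard"))  # © John Harvard
```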

[1230:28]

All right, so let's transition now to

[1230:31]

this approach whereby I claimed before

[1230:33]

that you can actually use a style tag.

[1230:36]

And why might we want to do this? Well,

[1230:37]

looking back at my code here, this is

[1230:39]

sort of hinting at potentially bad

[1230:41]

design. Even though there are different

[1230:43]

arguments for and against this, right

[1230:45]

now I'm sort of co-mingling my data with

[1230:47]

my presentation thereof. Like John

[1230:49]

Harvard, welcome to my homepage and

[1230:50]

copyright such and such is sort of the

[1230:52]

data I care about. Um, but I'm sort of

[1230:55]

mixing in the stylization of all of this

[1230:57]

stuff by putting CSS and HTML in the

[1231:00]

same place. So to be clear, all of the

[1231:02]

green stuff and even well everything

[1231:05]

we've seen thus far, the tags and the

[1231:07]

attributes, that's all HTML syntax.

[1231:10]

Everything between the quotes is now

[1231:12]

CSS. And this is the first we've seen

[1231:15]

this before only in the sense that we've

[1231:16]

used SQL inside of Python code. Here

[1231:19]

we're using CSS inside of HTML code. But

[1231:21]

the CSS syntax is everything thus far

[1231:23]

inside of those quote marks. Wouldn't it

[1231:26]

be nice to kind of factor that out so

[1231:28]

that I can see it all in one place and

[1231:29]

better still factor it out ultimately to

[1231:31]

another file? And I can do this as

[1231:33]

follows. Let me in my home.html HTML get

[1231:37]

rid of all of these style attributes and

[1231:39]

really go whittle the page down to its

[1231:41]

essence whereby I just have the header

[1231:46]

main and footer tags inside of which is

[1231:48]

that content. It's already easier to

[1231:50]

read at least for me the human inside of

[1231:52]

my head tag. Now though let me go up and

[1231:54]

say style and inside of this new style

[1231:56]

tag let me show you another approach for

[1231:58]

stylizing the page. Up here is where we

[1232:01]

can actually select elements to operate

[1232:03]

on using what are called selectors. So

[1232:06]

if I want to modify the style of my

[1232:07]

page's body, I can do that by typing

[1232:10]

body. And then I'm afraid curly braces

[1232:12]

are back in CSS, too. I can put text-align

[1232:15]

center up here. And the fact that I've

[1232:17]

put the word body before those curly

[1232:19]

braces just means all of these key value

[1232:21]

pairs, one in this case, will operate on

[1232:23]

the body. Meanwhile, down here, I can

[1232:25]

say the header is going to have

[1232:28]

font-size: large. The main part of

[1232:31]

the page is going to have

[1232:33]

font-size: medium. And then lastly, the

[1232:35]

footer of the page is going to have

[1232:37]

font-size: small. You know, definitely

[1232:40]

more lines now, which isn't the best,

[1232:42]

but the effect now if I go back to my

[1232:44]

browser and reload, visually it's pretty

[1232:48]

much the same. I've just relocated all

[1232:50]

of those key value pairs elsewhere, but

[1232:52]

as a stepping stone now for doing

[1232:54]

something a little smarter whereby I now

[1232:57]

can lay the foundation for putting

[1233:00]

this in another file altogether. But

[1233:02]

first, let me note this too. The fact

[1233:04]

that I've put all of these key value

[1233:05]

pairs associated with specific HTML tags

[1233:09]

doesn't really make them very usable or

[1233:11]

rather, reusable. And so when I

[1233:13]

alluded to earlier that these properties

[1233:15]

can be applied to different selections

[1233:17]

of HTML: type selectors, class selectors,

[1233:20]

ID selectors, attribute selectors. Let's

[1233:21]

just give you a little taste of this.

[1233:22]

What do we mean? Well, suppose that I

[1233:25]

want to generically be able to use text

[1233:29]

align center without associating it

[1233:31]

only with the body. Maybe I want to use

[1233:33]

this for a larger project where I want

[1233:34]

to uh center many things on the page. I

[1233:37]

can define my own keyword like the word

[1233:39]

centered, which doesn't exist per se. But

[1233:42]

if I prefix it with a dot, what I've just

[1233:44]

created is what's called a CSS class. And

[1233:47]

a class is just a set of key-value pairs,

[1233:51]

or properties, that you can associate with

[1233:52]

any HTML tags. Meanwhile, if I want this

[1233:56]

key-value pair to be associated with the

[1233:58]

notion of large, I can define .large. I can

[1234:00]

define .medium, and I can define .small

[1234:03]

down here, the motivation for which is

[1234:05]

that now in my page, if I want to

[1234:09]

center the body, oops, let me fix my own

[1234:11]

typo. If I want to center the body, I

[1234:14]

can say please use the class known as

[1234:17]

centered on this tag. And then on the

[1234:20]

header, I can say please use the class

[1234:22]

known as large on this tag. And then

[1234:25]

please use the class called medium here.

[1234:27]

And then lastly, use the class called

[1234:29]

small here. So now in the spirit of a

[1234:32]

lot of the modularization we did in

[1234:33]

Scratch and in C and Python of making your

[1234:35]

own functions, classes aren't functions,

[1234:37]

but they are a way to encapsulate one or

[1234:39]

more properties and use or reuse them

[1234:43]

anywhere you want in a web page. It's

[1234:45]

not that impressive

[1234:46]

here in this short one, but it lays the

[1234:48]

foundation for doing much more

[1234:50]

interesting things soon down the road.

[1234:53]

In fact, let's take a step in that same

[1234:55]

direction. Let me go ahead and now

[1234:57]

highlight everything I've put inside of

[1235:00]

this style tag

[1235:02]

um and cut it onto my clipboard. I'm

[1235:04]

going to get rid of the style tag

[1235:06]

altogether. I'm going to create quickly a

[1235:08]

new file, home.css, and I'm just going to

[1235:11]

paste all of that stuff in there. And

[1235:12]

just to be nitpicky, I'm going to

[1235:13]

de-indent it so it's all left aligned.

[1235:16]

So all I've done is just move everything

[1235:18]

I just wrote into a new file called

[1235:20]

home.css.

[1235:21]

I'll close that. Out of sight, out of

[1235:23]

mind. But what I'm going to do now in

[1235:24]

the head instead of a style tag which

[1235:26]

contained all of that clutter, I'm going

[1235:28]

to say link href equals home.css and

[1235:32]

then this rel attribute, which just means the

[1235:34]

relationship of this file to this one

[1235:35]

should be that of a stylesheet. And this

[1235:38]

tag too does not need to be closed. It

[1235:40]

just is. And now if I go back here and

[1235:43]

reload, still no changes, other than that I

[1235:46]

tweaked the font a moment ago. Still no

[1235:48]

changes. But now it's better design with

[1235:52]

that file completely separate. So where

[1235:55]

are we going with this? Well, just to

[1235:57]

kind of circle back to something we did

[1235:58]

earlier, let me open up my terminal

[1236:00]

window. And recall earlier we had this

[1236:02]

file like favorites0.html.

[1236:05]

And this contained all of the data from

[1236:06]

a couple of weeks back that we solicited

[1236:08]

via that Google form. And recall a bit

[1236:10]

ago when we went into favorites0.html,

[1236:13]

I mean, it was just kind of an ugly uh

[1236:15]

table structure. But it turns out in the

[1236:18]

world of HTML and

[1236:20]

CSS, there are also what we're going to

[1236:22]

call frameworks, which is a fancy word

[1236:24]

for library. But a framework is sort of

[1236:27]

a way of doing something by using

[1236:29]

someone else's library. And to do it

[1236:31]

their way, you just read their

[1236:32]

documentation and then you adopt their

[1236:34]

functions in the case of code or you

[1236:36]

adopt their CSS classes in the case of

[1236:38]

this example. So, one of the most

[1236:40]

popular frameworks out there nowadays

[1236:41]

and among the simplest and best

[1236:42]

documented is one called Bootstrap,

[1236:45]

which is a set of CSS classes and

[1236:50]

other features that you can use because

[1236:52]

it's open source in your own code. And

[1236:54]

in fact, all of the documentation is at

[1236:56]

this URL here. I read the documentation

[1236:58]

before class and I copied really the one

[1237:00]

line of code that I need to make

[1237:01]

favorites.html

[1237:03]

even prettier. So, let me go back into

[1237:05]

VS Code and let me copy my pre-made

[1237:07]

example from earlier. And you'll see

[1237:09]

that in favorites, whoops,

[1237:12]

favorites1.html,

[1237:13]

I have all of the same code, all of

[1237:16]

those lines of everyone's submissions.

[1237:18]

But notice I've added now this link tag.

[1237:21]

And it's a little longer than the one I

[1237:22]

wrote. It's referencing a third party

[1237:24]

website, jsDelivr, which is a CDN,

[1237:26]

content delivery network, which is to

[1237:28]

say a server that just serves up content

[1237:30]

for other people to use. But I copied

[1237:32]

that from Bootstrap's own documentation.

[1237:34]

And what I did here is the following. I

[1237:36]

added a class to my table tag

[1237:40]

specifically with a value of table and

[1237:42]

followed by a space, table-striped. Why?

[1237:44]

Well, I read Bootstrap's documentation

[1237:46]

at that previous URL and I liked the

[1237:48]

look of their tables because it lays it

[1237:50]

out with nice stripes like white and

[1237:52]

gray and white and gray and it sort of

[1237:53]

formats everything quite a bit nicer.

[1237:56]

So, if I go into this version in my

[1237:59]

second tab by going back first and now

[1238:01]

opening up favorites1.html, same

[1238:04]

exact data, two lines of change, and

[1238:08]

voila, now we're talking. This looks

[1238:10]

much more like a table that you would

[1238:12]

see on any pretty website like your

[1238:14]

Gmail inbox or the like, all by simply

[1238:17]

changing the CSS and not really the HTML

[1238:19]

at all. So, the motivation for

[1238:21]

introducing those classes a moment ago

[1238:22]

was so that we can have reusability of

[1238:25]

code. And better still, we can start to

[1238:26]

stand on the shoulders of others by

[1238:28]

using code that other people have

[1238:30]

written in order to improve the

[1238:31]

aesthetics of our own websites

[1238:34]

as well. All right, how about a couple

[1238:36]

of final flourishes with some style? Let

[1238:39]

me close out these examples here and let

[1238:41]

me propose to go into how about that

[1238:45]

same link example from earlier. So, let

[1238:47]

me reopen link.html, which recall had

[1238:49]

this phishing attack at the time. I'm

[1238:51]

going to revert this to the safe version

[1238:53]

and just say visit Harvard at Harvard's

[1238:55]

actual URL. Suppose I wanted to stylize

[1238:57]

this link beyond the default. Well,

[1238:59]

let's see what it looks like by default.

[1239:01]

If I go back into link.html, this is

[1239:03]

what it looked like before, blue and

[1239:05]

underlined by default per the browser's

[1239:07]

decision. But I can override that in

[1239:09]

any number of ways. To keep things

[1239:10]

simple, I'm just going to stay in my

[1239:12]

same file now rather than be pedantic

[1239:15]

about moving it to another file. And if

[1239:17]

I want to stylize the anchor tag, just

[1239:19]

as before, I can say a and then in some

[1239:22]

curly braces here, I can do something

[1239:23]

like this: color: red, if I want

[1239:26]

to make it crimson-like instead. Let me

[1239:29]

go back from VS Code to my other tab,

[1239:31]

click reload, and now we have a red link.

[1239:33]

I can really geek out. And if you

[1239:35]

remember your hexadecimal codes from our

[1239:37]

discussion of images a few weeks back, I

[1239:39]

can do #FF0000,

[1239:42]

which is a lot of red, no green, no

[1239:44]

blue. And if I go back to my other tab,

[1239:46]

click reload, same exact thing. You have

[1239:48]

that much control over even the color

[1239:50]

codes that you might use. Maybe you

[1239:52]

don't like the underlining in this

[1239:53]

particular case. Well, that's fine. I

[1239:54]

can do something like text-decoration:

[1239:57]

none per the documentation. I can reload

[1240:01]

and gone is that underline. Maybe it'd

[1240:03]

be nice to hover over the word and then

[1240:04]

see the underline. Well, I can do that,

[1240:06]

too. Turns out I can have these pseudo

[1240:08]

selectors whereby I say the name of the

[1240:09]

tag, then a colon and a keyword like hover, which

[1240:12]

browsers know to recognize. And when I

[1240:14]

hover over an anchor, what I want to do

[1240:16]

is change the text decoration to

[1240:18]

underline temporarily. If I go back to

[1240:21]

this tab now, reload, looks the same,

[1240:24]

but as I move my cursor over, notice

[1240:26]

that it's underlining it for a visual

[1240:28]

effect. Let's see what's going on with

[1240:30]

my developer tools. If I right click

[1240:32]

anywhere and choose inspect, notice a

[1240:34]

detail I haven't shown before is

[1240:37]

under the Elements tab here.

[1240:39]

Notice if I go down to my link here and

[1240:41]

let me just make the right hand pane

[1240:43]

here a bit bigger. All this time, but

[1240:45]

ignored up until now, has been this part

[1240:47]

of developer tools whereby I can

[1240:50]

actually see all of the CSS that applies

[1240:52]

to the element I have just selected,

[1240:54]

namely this link. And I see here in nice

[1240:56]

pretty printed fashion that I'm using

[1240:58]

this color, #FF0000, and text-decoration:

[1241:00]

none. Why is this useful? Well, one, if

[1241:03]

you want to learn from another website

[1241:04]

how it's doing its thing, you can just

[1241:06]

look at the CSS, but also if you want to

[1241:08]

be able to iterate more quickly and just

[1241:10]

kind of tinker with things, I can

[1241:11]

actually turn the color on and off by

[1241:13]

just hovering over the inspector here

[1241:16]

and just turn it on and off by clicking

[1241:18]

and unclicking. And if I want to just

[1241:20]

play around with, oh, maybe

[1241:22]

Harvard should be #00FF00, enter, I

[1241:25]

can make it green instead. So, you can

[1241:27]

temporarily change the browser's copy of

[1241:29]

your own HTML or CSS just to tinker and

[1241:32]

iterate quickly just like I tinkered

[1241:34]

with Stanford's uh own website or at

[1241:36]

least my own copy thereof.
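
As a small aside on the color codes used above: a six-digit code such as #FF0000 or #00FF00 packs three two-digit hexadecimal numbers, one each for red, green, and blue. The sketch below is not from the lecture; `hexToRgb` is a hypothetical helper name, written only to illustrate how such a code breaks down.

```javascript
// Hypothetical helper (not from the lecture): split a #RRGGBB color code
// into its red, green, and blue components, each from 0 to 255.
function hexToRgb(code) {
    // Drop the leading '#' if present.
    const digits = code.startsWith('#') ? code.slice(1) : code;
    if (!/^[0-9a-fA-F]{6}$/.test(digits)) {
        throw new Error('expected six hexadecimal digits, got: ' + code);
    }
    return {
        red: parseInt(digits.slice(0, 2), 16),
        green: parseInt(digits.slice(2, 4), 16),
        blue: parseInt(digits.slice(4, 6), 16),
    };
}

console.log(hexToRgb('#FF0000')); // { red: 255, green: 0, blue: 0 }
console.log(hexToRgb('#00FF00')); // { red: 0, green: 255, blue: 0 }
```

So FF in the first position means "a lot of red", and moving it to the middle position gives green instead.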

[1241:40]

Lastly, how about in terms of these

[1241:42]

selectors? These are using type

[1241:44]

selectors, that is, selecting by the name of

[1241:46]

the tag. If I want to actually affect

[1241:49]

one tag specifically, a very common

[1241:52]

convention is to give an HTML element a

[1241:54]

unique ID. For instance, I'm going to

[1241:56]

call this Harvard. And by the honor

[1241:58]

system, I should not give any other

[1242:00]

element in this page an ID of Harvard.

[1242:03]

The motivation is that I can now

[1242:04]

uniquely identify this tag by for

[1242:07]

instance changing this to hash Harvard,

[1242:10]

which is just the convention for

[1242:11]

specifying that it's not a class now.

[1242:13]

It's instead an ID. You do not put the

[1242:16]

hash though in the actual value down

[1242:18]

here. And what I can even do down here

[1242:20]

is something like hash Harvard to

[1242:25]

scope that as well. If I now reload,

[1242:27]

we're back to the red version and the

[1242:29]

same functionality as before. And it's

[1242:31]

just a more precise way now to target

[1242:33]

your CSS properties to a very specific

[1242:37]

element instead.

[1242:40]

Okay, that was a lot. Any questions on

[1242:43]

any of this thus far?

[1242:48]

No. That clear? All right. Well, one

[1242:52]

last language for the day. And we do

[1242:54]

mean what we say: that is the extent

[1242:56]

to which you will learn formally HTML

[1242:58]

and CSS. Everything else just

[1243:00]

follows those exact same patterns. It's

[1243:02]

different classes. It's different

[1243:04]

attributes. It's different tag names.

[1243:06]

All of which can be picked up through

[1243:07]

practice, through osmosis, through

[1243:09]

references. But that's really it for the

[1243:11]

fundamentals. And so our last focus

[1243:13]

today is on an actual programming

[1243:15]

language that we'll just scratch the

[1243:16]

surface of, if only because it's so darn

[1243:18]

omnipresent nowadays. Most every website

[1243:21]

you use is made from not only HTML and

[1243:23]

CSS, but if it's in any way interactive,

[1243:25]

odds are it's using JavaScript, a

[1243:27]

programming language that is very

[1243:28]

commonly used client side whereby humans

[1243:31]

write the code on the server, but then

[1243:33]

your browser as before downloads it to

[1243:34]

the client and then it runs in your own

[1243:36]

Mac, your PC or your phone. That said,

[1243:38]

JavaScript is also very popular on the

[1243:40]

server nowadays. It's not just a

[1243:42]

browser-based language. In JavaScript,

[1243:45]

what you have most powerfully though is

[1243:47]

the ability in memory to mutate this

[1243:50]

tree in real time. In other words, think

[1243:52]

about even your Gmail inbox or your

[1243:54]

Outlook inbox. Typically, you see email

[1243:56]

after email after email after email.

[1243:58]

Odds are per today, what HTML tag is

[1244:00]

creating that UI of row after row after

[1244:03]

row?

[1244:05]

Which tag?

[1244:07]

Like, the table tag? Like the table tag,

[1244:10]

probably, right? Table row, table row, table

[1244:12]

row. But it wouldn't make... well, actually,

[1244:15]

this is the way things used to work in

[1244:16]

my day. Back in the day, when you visited

[1244:18]

not even Gmail, before it existed, but

[1244:20]

your email inbox, you would download from

[1244:22]

the server a web page containing a table

[1244:24]

tag with table rows and table data

[1244:26]

elements, and that was your inbox. If you

[1244:28]

wanted to see if you got new mail, you

[1244:30]

just reload the whole page, and it would

[1244:32]

download new contents from the server

[1244:34]

and show you the new HTML. With

[1244:36]

JavaScript,

[1244:37]

which has come onto the scene over the

[1244:38]

past 20-plus years, you have the ability

[1244:40]

to download the data once initially,

[1244:43]

then use code to just grab some more

[1244:45]

data every 30 seconds or some more data

[1244:47]

pretty much anytime an email arrives.

[1244:49]

And if this picture here represents not

[1244:52]

our super simple hello title, hello body

[1244:53]

page, but a whole bunch of table rows

[1244:55]

for your existing email. The moment you

[1244:57]

get more email, you can use JavaScript

[1244:59]

code to add another node to this tree,

[1245:02]

another node to this tree representing

[1245:03]

the table row tag, the table row tag

[1245:06]

again and again. So in short, with

[1245:08]

JavaScript, you have the ability to

[1245:10]

change the tree, otherwise known as the

[1245:12]

document object model or DOM for short,

[1245:16]

dynamically in order to evolve the web

[1245:19]

page. So let's take a quick tour of what

[1245:21]

JavaScript does have syntactically and

[1245:22]

then I'll just demonstrate some of the

[1245:24]

capabilities thereof without dwelling

[1245:26]

today on syntax beyond this. So in

[1245:28]

Scratch, which is looking pretty good

[1245:30]

now, you had conditionals which looked

[1245:31]

like this. In JavaScript, it's pretty

[1245:33]

much the same as C. The curly braces are

[1245:35]

back, at least for two or more lines.

[1245:37]

But indentation doesn't matter

[1245:40]

except for the style thereof,

[1245:43]

in contrast with Python. If you have an

[1245:44]

if else, it's going to look the exact

[1245:46]

same as in C. If you have if, else if, else,

[1245:49]

you have the exact same thing as in C.
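
A minimal sketch of the conditional syntax being described, with curly braces and a C-style else if. The values of `x` and `y` are hypothetical, chosen only for illustration.

```javascript
// Hypothetical example values, not taken from the lecture's slides.
let x = 1;
let y = 2;
let result;

if (x < y) {
    result = 'x is less than y';
} else if (x > y) {
    result = 'x is greater than y';
} else {
    result = 'x is equal to y';
}

console.log(result); // x is less than y
```

Apart from `let` in place of a type such as int, this would read the same in C.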

[1245:51]

Different from Python because this was

[1245:52]

elif in Python. Now we're back more

[1245:54]

verbosely to else if, as in C.

[1245:57]

variables in JavaScript. Well, here in

[1245:59]

Scratch is how you set a variable

[1246:00]

counter to zero. In JavaScript, there's

[1246:02]

a few ways to do this, but the most uh

[1246:05]

reasonable for now is to let counter

[1246:07]

equal zero. So, you don't specify the

[1246:09]

type. This is more of a polite way of

[1246:11]

asking the browser, please let a

[1246:13]

variable called counter exist and set it

[1246:15]

equal to zero by default. Semicolons are

[1246:18]

back. However, that's not strictly true.

[1246:20]

Browsers are smart enough to know where

[1246:21]

semicolons actually matter, but for our

[1246:23]

purposes, assume that they're always

[1246:25]

there. How do you change counter by one?
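
A short sketch of the variable syntax in this part of the tour, along with the three-iteration loop the tour turns to next. The `greetings` array is a hypothetical addition so the loop has something visible to do.

```javascript
// Declare a variable without naming its type.
let counter = 0;

counter = counter + 1; // the verbose way
counter += 1;          // the shorthand
counter++;             // the increment operator, as in C

console.log(counter); // 3

// A C-style loop that runs three times; only `let` replaces the type.
const greetings = []; // hypothetical, just to record each iteration
for (let i = 0; i < 3; i++) {
    greetings.push('hello');
}

console.log(greetings.length); // 3
```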

[1246:26]

Well, you can do it the pedantic way,

[1246:28]

which is a little verbose. You can do

[1246:29]

the plus equals trick or nicely back in

[1246:32]

play is the ++ in JavaScript, just

[1246:34]

like in C but not in Python. Loops in

[1246:37]

JavaScript. Well, in Scratch, if you

[1246:39]

want to do things three times, here's

[1246:41]

how you would do it. In JavaScript, it's

[1246:43]

pretty much the same as C except for not

[1246:46]

mentioning the data type. Instead, you

[1246:48]

use the keyword let here. But otherwise,

[1246:50]

this is exactly the same as in C. Uh if

[1246:52]

you want to do something forever for

[1246:54]

whatever reason in JavaScript, you can

[1246:55]

say while true, which is exactly how we

[1246:57]

did it in C. If you have a web page like

[1247:02]

this, meanwhile,

[1247:04]

and you want to insert some JavaScript

[1247:05]

to it, you can do it in a few

[1247:07]

different ways. You can put a script tag

[1247:10]

just like the style tag in the head of

[1247:11]

the web page. This can get you into

[1247:13]

trouble though for reasons you might

[1247:15]

encounter whereby if you put your

[1247:16]

JavaScript code up here and you try to

[1247:18]

use it to modify the web page but the

[1247:20]

web page isn't defined until down here

[1247:23]

you can get into a race

[1247:26]

condition, really, where the data does not

[1247:27]

yet exist. So instead of putting

[1247:30]

it there or even in another file, it's

[1247:32]

actually pretty common too to avoid that

[1247:34]

altogether by putting your script code

[1247:36]

or your script tag at the end of the

[1247:38]

page just before the end of the body to

[1247:40]

ensure that all of the web page exists

[1247:42]

already. This is similar in spirit to

[1247:44]

the def issues we saw in Python or the

[1247:47]

prototype issues we saw in C. There's

[1247:49]

bunches of solutions though to this here

[1247:51]

problem. But let's now take some

[1247:52]

JavaScript for an actual spin and use VS

[1247:56]

Code to write some of it as follows.

[1247:58]

In VS Code, let me go ahead and close

[1248:00]

link.html,

[1248:02]

open up my terminal temporarily, and

[1248:04]

let's improve... actually, let's just

[1248:07]

improve the very file, hello.html, that

[1248:09]

I have here in front of me, and actually

[1248:11]

have it be more interactive and give me

[1248:13]

sort of a popup on the screen when I

[1248:16]

type in my name. So, let's start as

[1248:18]

follows. First, let's go ahead and

[1248:20]

change this just to uh hello, just for

[1248:22]

short. And in the body of this page,

[1248:24]

let's give myself a form. And in this

[1248:26]

form, let's give myself an input. Uh,

[1248:28]

we'll turn off autocomplete just to

[1248:30]

avoid distractions. We'll turn on

[1248:32]

autofocus to save me a click. I'm going

[1248:34]

to give this HTML element an ID uniquely

[1248:37]

of name. A placeholder also of name just

[1248:40]

so the human knows what to do. And the

[1248:42]

type of this field shall be text. In

[1248:43]

other words, I want to create a program

[1248:46]

like in week one and week zero, where I type in

[1248:48]

my name and see hello such and such. I'm

[1248:51]

going to give myself an input a submit

[1248:53]

button with input type equals submit.

[1248:55]

I don't really care what the button says,

[1248:56]

but I do care now when I go back to my

[1248:59]

other tab, close my developer tools, go

[1249:02]

back into hello.html, I now have

[1249:04]

something that looks like this. It looks

[1249:05]

similar to our search example for cats,

[1249:07]

but now I'm asking the user for their

[1249:09]

name along with the submit button. But

[1249:11]

what I want to have happen is when I

[1249:12]

type in David and click submit, I want

[1249:14]

to see hello David somewhere on the

[1249:15]

screen. Well, how can I do this? Well, a

[1249:18]

few different ways, but JavaScript

[1249:19]

allows me to do things like this. And

[1249:21]

for upcoming problem sets, you won't

[1249:23]

necessarily have to write JavaScript

[1249:25]

like this. So consider this a whirlwind

[1249:26]

tour, not so much uh something to

[1249:29]

ingrain. Here I can add a new attribute

[1249:33]

to the form tag called onsubmit, which

[1249:35]

as the name suggests means call the

[1249:37]

following function when this form is

[1249:40]

submitted. Well, what function do I want

[1249:41]

to call? I'm going to call it a greet

[1249:43]

function. And that's it for now. How do

[1249:45]

I define a greet function? Well, I

[1249:48]

could, among other places, put this

[1249:50]

inside of the head of my page in a

[1249:51]

script tag. I can define a function in

[1249:53]

JavaScript by literally saying function

[1249:55]

and then the name of the function and

[1249:56]

then in parentheses any arguments there

[1249:58]

too. I'm not going to have any. And then

[1250:00]

in curly braces, I can actually define

[1250:01]

the meat of that function. And for

[1250:03]

instance, I can do this. Uh, let name

[1250:06]

equal the following:

[1250:10]

document.querySelector. And now what I want to do is

[1250:13]

this. Document is a global variable that

[1250:15]

just comes with JavaScript in the

[1250:17]

browser that allows me to write code

[1250:19]

involving the whole document, the web

[1250:20]

page itself. querySelector is a fancy

[1250:23]

name for a function that lets me select

[1250:25]

specific elements of the page using CSS

[1250:29]

selectors. So the very same syntax we saw

[1250:31]

with names and with dots and with hash

[1250:33]

symbols a moment ago are back in play

[1250:35]

for JavaScript here. So if I want to

[1250:38]

create a variable that stores the name

[1250:41]

that the human typed in, what I can do

[1250:43]

is pass to querySelector a selector for

[1250:47]

that element, which is quote-unquote

[1250:49]

#name, where hash just means ID. But

[1250:51]

the reason I'm using name is because the

[1250:53]

unique identifier I put here is name. If

[1250:56]

I change this to foo nonsensically,

[1250:57]

that's fine. I just have to change this

[1250:59]

to foo up here. So I'm in full control

[1251:01]

over what is called what. But if I want

[1251:03]

to get the value that the user typed

[1251:05]

into that box, I now do .value. And we've

[1251:07]

seen these dots before. In C, they were

[1251:09]

for accessing structs. In Python, they

[1251:11]

were for accessing contents of objects.

[1251:13]

So this just means use the document

[1251:15]

global variable, use the querySelector

[1251:18]

function or method inside of it, get the

[1251:20]

element whose unique ID is name, and

[1251:22]

then go inside of that text box and give

[1251:24]

me its value. So it's a very long-winded

[1251:26]

way of saying store the user's input in

[1251:28]

a variable called name. But what's nice

[1251:30]

now, even though this is going to be a

[1251:31]

bit ugly, is I can then use a built-in

[1251:35]

JavaScript function called alert. And I

[1251:37]

can say something like hello, close

[1251:39]

quote, then plus, which we've seen

[1251:42]

before in Python, and concatenate with

[1251:44]

it that name's value. Now, this isn't

[1251:48]

quite complete and for reasons I'm going

[1251:49]

to wave my hand at. I also need to add

[1251:52]

annoyingly return false down here

[1251:53]

because otherwise if I click submit,

[1251:55]

yes, the greet function will get called,

[1251:57]

but the browser will still try to submit

[1251:58]

the form to a server which is going to

[1252:00]

interrupt my own code. So, long story

[1252:01]

short, this is a bit of a hackish

[1252:03]

approach for now to just making sure

[1252:04]

that the only thing that happens when I

[1252:06]

submit this form is that my function is

[1252:09]

called. Now, if I didn't screw anything

[1252:11]

up, I should now see after reloading

[1252:13]

this page a prompt for my name. I'll

[1252:15]

type it in and when I click submit, I

[1252:18]

should see an ugly but functional alert

[1252:21]

box pop up with dynamically generated

[1252:23]

text, namely hello, David. I say it's

[1252:26]

ugly because by convention, Chrome shows

[1252:28]

you the full URL or the domain name of

[1252:30]

the website in question, which is my

[1252:32]

randomly generated one, which does look

[1252:33]

stupid. So, we can do better than this.
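
A hedged sketch of the greet function as described above. So that it also runs outside a browser, it uses two stand-ins that are assumptions and not lecture code: `page` falls back to a tiny fake of `document`, and `show` falls back to collecting messages in place of `alert`. In a real page containing an input whose id is name, the browser's own `document` and `alert` would be used.

```javascript
// Stand-in for the browser's `document` when none exists (an assumption for
// this sketch). It pretends the user typed "David" into <input id="name">.
const page = typeof document !== 'undefined' ? document : {
    querySelector(selector) {
        return selector === '#name' ? { value: 'David' } : null;
    },
};

// Stand-in for alert: outside a browser, just remember the message.
const messages = [];
const show = typeof alert === 'function' ? alert : (text) => messages.push(text);

function greet() {
    // Select the element whose id is "name" and read what was typed into it.
    let name = page.querySelector('#name').value;
    show('hello, ' + name);
    // Returning false from an onsubmit handler keeps the browser from
    // also submitting the form to a server.
    return false;
}

console.log(greet()); // false
```

In the page itself, the form tag would carry an attribute along the lines of onsubmit="greet(); return false;" so that submitting the form calls this function and nothing else.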

[1252:35]

But the point is now that I have written

[1252:38]

code in JavaScript to listen for the

[1252:42]

submission of this form and when that

[1252:43]

happens, call that function. And

[1252:46]

this is generally the paradigm of

[1252:47]

JavaScript. There exists in the context

[1252:49]

of websites a whole bunch of events that

[1252:51]

can happen. And this is a word we

[1252:52]

haven't used since week zero in Scratch.

[1252:54]

Recall that in Scratch you have events

[1252:56]

like when green flag clicked and when

[1252:57]

the green flag is clicked you can do

[1252:59]

something in response. Same thing in the

[1253:01]

world of web programming. Here are just

[1253:03]

some of the events that can happen in a

[1253:05]

web page. Like the user can change

[1253:07]

something, click on something, drag

[1253:08]

something, key up, put the keyboard up,

[1253:10]

put the mouse down, or other things.

[1253:13]

What I'm listening for is the submission

[1253:14]

of a form, which is cool because in

[1253:16]

JavaScript then you can essentially

[1253:19]

write code that listens for any number

[1253:20]

of these events and then does something

[1253:22]

when it happens. Consider after all in

[1253:24]

Gmail, if you click the little refresh

[1253:26]

icon within Gmail itself to get new

[1253:28]

mail, it runs some JavaScript code, it

[1253:30]

turns out, to talk to Google's servers,

[1253:32]

get more email, and update your site. If

[1253:34]

you click and drag on Google Maps to see

[1253:36]

like higher up geographically, well,

[1253:38]

what's happening? Some JavaScript code

[1253:40]

is listening for your mouse going down

[1253:42]

and dragging so as to go fetch more

[1253:45]

tiles, more rectangular pictures of the

[1253:47]

map wherever you're trying to drag. So,

[1253:49]

anything that's interactive in websites

[1253:51]

nowadays like that is using JavaScript

[1253:53]

by just listening for things that you or

[1253:56]

someone else might actually do. Well,

[1253:58]

let me go ahead and start opening some

[1254:00]

pre-made examples just to give you a

[1254:02]

sense of the other syntax that is in use

[1254:04]

today with JavaScript. I'm going to go

[1254:06]

ahead and open up a version of this

[1254:08]

hello program called hello2.html,

[1254:12]

which is different in that I'm

[1254:14]

practicing what I preached earlier by

[1254:16]

putting the script tag at the bottom of

[1254:18]

the page just to ensure that the form

[1254:20]

and everything inside of it already

[1254:22]

exists for sure by the time this code

[1254:24]

executes. Moreover, what I'm getting out

[1254:26]

of is the business of using the onsubmit

[1254:29]

attribute. So, just as I tried to get my

[1254:32]

CSS out of my HTML and put it elsewhere,

[1254:35]

similarly, I'm trying to get my

[1254:36]

JavaScript code like the greet function

[1254:38]

out of the HTML and putting it down

[1254:40]

here. Now, why is this useful? This is a

[1254:42]

big mouthful, but it just follows a

[1254:43]

general pattern as follows.
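
A minimal sketch of that pattern, under stated assumptions. To keep it runnable outside a browser, `form` here is a plain EventTarget standing in for what document.querySelector('form') would return in a page, and `nameInput` is a hypothetical stand-in for the input whose id is name. The call to event.preventDefault() is also an assumption about how to stop the default submission in this style.

```javascript
// Stand-in for the <form> element. DOM elements are EventTargets, so the
// same addEventListener call works on a real form in a browser.
const form = new EventTarget();

// Hypothetical stand-in for document.querySelector('#name').
const nameInput = { value: 'David' };

// Collect greetings here in place of calling alert.
const greetings = [];

// When 'submit' happens on the form, run this anonymous function.
form.addEventListener('submit', function (event) {
    greetings.push('hello, ' + nameInput.value);
    // Ask the browser not to carry out its default action for the event.
    event.preventDefault();
});

// Simulate the user submitting the form.
form.dispatchEvent(new Event('submit', { cancelable: true }));

console.log(greetings); // [ 'hello, David' ]
```

Nothing in the HTML needs an onsubmit attribute with this approach; the wiring lives entirely in the script.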

[1254:45]

document.querySelector, quote-unquote

[1254:47]

form, is just getting a reference to the

[1254:51]

actual form element in the page. So if

[1254:52]

you imagine in your mind's eye that this

[1254:54]

is drawn out as a tree in the computer's

[1254:55]

memory, this is just getting me a

[1254:58]

pointer to the form node in that tree.

[1255:01]

Haven't seen this before, but it kind of

[1255:02]

does what it says. addEventListener is

[1255:05]

a function or method that you can call

[1255:06]

on any element that just tells it to

[1255:09]

listen subsequently for this event and

[1255:11]

when that event is heard, submit in this

[1255:13]

case, call the following anonymous

[1255:15]

function, otherwise known as a lambda

[1255:17]

function. But long story short, this

[1255:18]

syntax just means when submit happens on

[1255:20]

that element, execute the code between

[1255:23]

these curly braces. What happens? alert,

[1255:26]

hello, quote-unquote, document.querySelector,

[1255:28]

#name, .value. I didn't bother with

[1255:29]

the variable this time. This does

[1255:31]

exactly the same thing, but is a purely

[1255:33]

JavaScript solution without using the

[1255:35]

onsubmit attribute. And we show you this

[1255:38]

only because especially for final

[1255:39]

projects, you might want to do something

[1255:41]

like add event listener to make like

[1255:43]

maybe a drop-down menu or some

[1255:45]

interactive clickable thing in your

[1255:47]

website that just listens for one of

[1255:48]

these events to happen before actually

[1255:50]

executing some code. Um, notice I've

[1255:53]

been conventionally using single quotes

[1255:54]

in JavaScript because that's just a

[1255:56]

thing in the JavaScript community to

[1255:57]

generally prefer single quotes over

[1255:58]

double quotes. Why? Well, it means

[1256:00]

people in JavaScript are hitting the

[1256:02]

shift symbol like much less than the

[1256:03]

rest of the world to get double quotes.
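
The listener pattern described a moment ago (get the form, listen for submit, run an anonymous function) can be sketched roughly as follows. This is only a sketch under assumptions: a plain `EventTarget` stands in for the form node so it runs outside a page, `nameField` is a hypothetical stand-in for the text box, and output is collected in an array rather than shown with `alert`. The `preventDefault` call is an assumption too; the lecture does not mention it.

```javascript
// Stand-in for the <form> node. In a browser this would be
// document.querySelector('form'); EventTarget is used so the sketch
// also runs outside a page.
const form = new EventTarget();

// Hypothetical stand-in for document.querySelector('#name').
const nameField = { value: 'David' };

// Collect output instead of calling alert(), which only exists in browsers.
const messages = [];

// When 'submit' happens on the form, run the anonymous function.
form.addEventListener('submit', (event) => {
    messages.push('hello, ' + nameField.value);
    // Assumption: in a browser this keeps the form from actually submitting.
    event.preventDefault();
});

// Simulate the user submitting the form.
form.dispatchEvent(new Event('submit', { cancelable: true }));
console.log(messages);
```

In a real page, the first line would be replaced by the `querySelector` call and the browser would dispatch the event itself.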

[1256:05]

It's just a convention. So long as

[1256:06]

you're consistent, um either is fine. Um

[1256:10]

conditional on not having actual

[1256:11]

apostrophes in text and such. Let me

[1256:13]

show you one other convention. Instead

[1256:16]

of putting my code at the bottom of my

[1256:19]

page just before the body ends, it is

[1256:21]

also alternatively conventional as in

[1256:24]

hello3.html to do this to still

[1256:27]

maybe put the script tag at the top of

[1256:29]

the page, but to additionally have this

[1256:32]

magical line whereby you add an event

[1256:34]

listener before you do anything else

[1256:36]

that listens for this crazy weirdly

[1256:38]

named event called DOMContentLoaded.

[1256:41]

But now that you've heard DOM briefly,

[1256:42]

DOM is document object model just means

[1256:44]

the tree in memory. This is just the

[1256:46]

fancy way of saying when that tree is

[1256:48]

loaded, go ahead and do the following.

[1256:51]

And this ensures that when a browser

[1256:53]

reads all of this code top to bottom,

[1256:54]

left to right, this code won't actually

[1256:56]

be executed until the whole DOM is

[1256:58]

loaded into the computer's memory. That

[1257:00]

whole tree is built. So that's all

[1257:03]

that's being referred to there. The rest

[1257:04]

of the code is actually exactly the

[1257:07]

same. Um, what more can we do? Well,

[1257:09]

just so you've seen it, I can delete all

[1257:11]

of that code, move it to a file called

[1257:13]

like hello.js. And in the fourth

[1257:15]

version of this example, I'm back to

[1257:17]

just HTML because I can put all of the

[1257:19]

fancy complexity inside of my script uh

[1257:24]

tag here, factoring that code out into

[1257:26]

hello4.js, but the code is otherwise, I

[1257:29]

claim, unchanged.
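
To make the ordering concrete, here is a minimal sketch of the DOMContentLoaded idea, assuming a plain `EventTarget` named `doc` as a stand-in for `document` and a `log` array invented for illustration. In a real page the browser fires the event itself once the tree is built.

```javascript
// Stand-in for `document` so the sketch runs outside a browser.
const doc = new EventTarget();
const log = [];

// Registered first, at the "top of the page", but not run yet.
doc.addEventListener('DOMContentLoaded', () => {
    log.push('listener ran: safe to look up the form now');
});

// The rest of the script keeps going before the listener fires.
log.push('script at top of page finished');

// Simulated here; a browser dispatches this after building the DOM.
doc.dispatchEvent(new Event('DOMContentLoaded'));
console.log(log);
```

The listener's code is deferred until the event fires, which is why the script tag can sit at the top of the page.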

[1257:32]

All right, this is a lot. I know it's

[1257:34]

quick, but do the general principles

[1257:36]

make sense? Like just listening for

[1257:38]

events and running some code in

[1257:40]

response? That's really all we're

[1257:41]

talking about, à la week zero with

[1257:44]

Scratch.

[1257:46]

All right. Well, let me let things

[1257:47]

escalate just a little bit. And this

[1257:49]

time I'll open the demo first. Let me go

[1257:51]

ahead and open up hello5.html which I

[1257:55]

wrote in advance, which okay, this is

[1257:57]

definitely starting to look like a

[1257:58]

mouthful, but in a moment it'll make a

[1258:00]

bit more sense. Let me go ahead into my

[1258:02]

other tab here. Click back. Go into

[1258:04]

source 8, which is all of my pre-made

[1258:06]

examples. And I said we're in hello 5

[1258:09]

now. And in hello 5, there's no submit

[1258:12]

button because watch this fanciness when

[1258:13]

I search for something like C uh or

[1258:17]

David as a full-fledged word there.

[1258:19]

Notice it's just happening inside of the

[1258:21]

web page. Moreover, if you poke around,

[1258:23]

let me right-click on the page. Let me

[1258:25]

inspect to open my developer tools. Let

[1258:27]

me expand the body down here. And

[1258:29]

actually, let me reload the page. So,

[1258:30]

notice by default, this is what my web

[1258:33]

page looks like. It's just got an empty

[1258:34]

paragraph tag for some reason. But watch

[1258:36]

what happens at the bottom of the

[1258:38]

screen. And I'll zoom in a bit more.

[1258:40]

When I start typing my name like D and

[1258:43]

then let me expand this triangle. You

[1258:46]

see it beginning A V I D. When I say

[1258:50]

that JavaScript can mutate the DOM, the

[1258:52]

actual tree in memory. Like that's what

[1258:54]

you're seeing. You're seeing the HTML

[1258:56]

pretty-printed, color-coded version of that

[1258:58]

tree in memory. And how is it working?

[1259:00]

Well, if we go back to the code here,

[1259:01]

well, let me wave my hands at this first

[1259:03]

line. This just means don't do this

[1259:04]

until the whole DOM is loaded. Let's

[1259:06]

look at this line, which means give me a

[1259:08]

variable called input, and set that

[1259:10]

equal to, okay, the input tag on the

[1259:12]

page, the text box, and then do what?

[1259:15]

Well, take that input, add an event

[1259:17]

listener that's forever listening for

[1259:18]

key up, like my finger going up off the

[1259:20]

keyboard, and when that happens, call

[1259:22]

the following function, which has no

[1259:23]

name, but that just means call these

[1259:25]

lines inside of the curly braces. Well,

[1259:28]

what happens inside of those curly

[1259:29]

braces? Well, here's a variable called

[1259:31]

name. And this is just pointing at the

[1259:33]

paragraph tag. Apparently, I'm checking

[1259:36]

this question. If there's hm if input

[1259:38]

value, so this is like saying if input

[1259:41]

value does not equal quote unquote just

[1259:43]

implicitly, go ahead and set the innerHTML

[1259:46]

of that name

[1259:49]

variable equal to hello quote unquote

[1259:54]

input value. Now, this is crazy syntax

[1259:56]

and I'm showing it just because you'll

[1259:57]

see it in documentation online. This is

[1259:59]

similar in spirit to Python's f-strings.

[1260:02]

It's ugly syntax with dollar signs and

[1260:04]

curly braces and worse yet backticks.

[1260:07]

However, this is a manifestation of

[1260:08]

really the JavaScript community

[1260:09]

presumably deciding that if you want the

[1260:11]

language to evolve, you have to make

[1260:13]

sure you're backwards compatible with

[1260:14]

old versions of the language. So, they

[1260:17]

chose characters and syntax that

[1260:19]

probably do not appear already in the

[1260:21]

wild. That's why sometimes things look

[1260:23]

uglier, I would surmise, than otherwise.
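
For reference, the backtick syntax and the implicit check on the input's value might look like the sketch below. The function name `greeting` is made up for illustration, and it returns the string rather than assigning it to a paragraph's `innerHTML`, so it runs anywhere.

```javascript
// Build the greeting from whatever is currently in the text box.
function greeting(value) {
    // An empty string is "falsy", so this is the implicit
    // "does not equal quote unquote" check.
    if (value) {
        // Template literal: backticks, with ${...} interpolating a value,
        // similar in spirit to a Python f-string.
        return `hello, ${value}`;
    }
    return 'hello, whoever you are';
}

console.log(greeting('David'));
console.log(greeting(''));
```

In the page itself, the returned string would be assigned to the paragraph's `innerHTML` inside the keyup listener.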

[1260:25]

But long story short, this just means if

[1260:27]

there's input there, go ahead and say

[1260:29]

hello, input. Otherwise, it says by

[1260:31]

default, hello whoever you are. And in

[1260:33]

fact, if I go back here and delete my

[1260:35]

name, watch what happens. It goes back

[1260:37]

to that default. So, here is just an

[1260:39]

example of listening for keystrokes

[1260:41]

going up and down and making sure that

[1260:43]

the page responds accordingly. How about

[1260:46]

something else? Let me go back into my

[1260:48]

directory listing. Let me open up

[1260:50]

background.html, which I wrote in

[1260:52]

advance. It's super simple, but this is

[1260:53]

the first of like an interactive website

[1260:55]

that has three buttons labeled R, G, and

[1260:57]

B. As you might imagine, clicking on R

[1260:59]

does that. G does this, B does that.

[1261:02]

Well, how is this working? This is the

[1261:04]

first example now where you can use

[1261:05]

JavaScript code to alter CSS

[1261:08]

dynamically. So, let me reload the page.

[1261:10]

So, it's back to white. Let me open

[1261:11]

developer tools and watch what's

[1261:14]

happening now on the body tag

[1261:16]

specifically. Initially, there's no

[1261:18]

stylization on the body other than the

[1261:20]

browser's default margins and whatnot

[1261:22]

over here. But watch what happens at

[1261:24]

bottom right when I click on the R

[1261:26]

button. You see that all of a sudden

[1261:28]

background color red was dynamically

[1261:30]

added. Now it's green, now it's blue.

[1261:33]

And notice the HTML at bottom left is

[1261:35]

changing too. So somehow I am listening

[1261:37]

for clicks and then changing CSS in

[1261:40]

response. So if I go back to VS Code,

[1261:42]

let's close Hello 5. Let's open up

[1261:44]

source 8's uh version of

[1261:46]

background.html. And in here, it's a bit

[1261:48]

of a mouthful, but the HTML is simple.

[1261:50]

Here's three buttons. And because I

[1261:52]

wanted them to be uniquely identifiable,

[1261:55]

I gave them all IDs of red, green, and

[1261:57]

blue, respectively. And then this code

[1261:59]

is a bit of copy paste. And frankly, I

[1262:01]

could probably avoid that if I were more

[1262:02]

elegant. But just to be pedantic, here's

[1262:05]

what's happening. Here's a variable

[1262:06]

called body that's just getting the body

[1262:09]

element, the node in the tree at that

[1262:12]

moment in time. And then these three

[1262:14]

lines of code, their purpose in life is

[1262:15]

to handle the red clicks. How? Well,

[1262:17]

we're telling the document to select the

[1262:20]

element whose ID is red, listen for the

[1262:23]

click event, and whenever that happens,

[1262:26]

do this. Body, which is the same

[1262:28]

variable as before, dot style, which we

[1262:31]

haven't seen before, but any element can

[1262:32]

have a style property associated with it

[1262:35]

in JavaScript. Background color equals

[1262:37]

quote unquote red. And the other blocks

[1262:39]

of code are the exact same thing for green

[1262:41]

and for blue. The whole point here is

[1262:42]

we're now listening for clicks on

[1262:44]

buttons and changing not the contents of

[1262:47]

the button but rather the style thereof

[1262:49]

of the whole page. As an aside, this is

[1262:52]

a curiosity. This is what's

[1262:54]

known as camel case whereby like a

[1262:55]

camel has a hump in the middle. This

[1262:57]

word has a hump in the middle like

[1262:59]

capital C all of a sudden to separate

[1263:01]

the two words in CSS. Recall it was uh a

[1263:04]

moment ago background dash color. Anyone

[1263:07]

want to guess why this is not how you

[1263:10]

write it in JavaScript?

[1263:12]

Anything with hyphens in CSS is changed

[1263:14]

to camel case in JavaScript.
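
That hyphen-to-camel-case rule can be sketched as a small helper. The name `toCamelCase` is hypothetical, not something the browser provides, and the `style` object here is a plain stand-in for an element's style property.

```javascript
// Convert a CSS property name like 'background-color' into the
// camel-case form JavaScript uses, like 'backgroundColor'.
function toCamelCase(property) {
    // Replace each hyphen plus the letter after it with the capital letter.
    return property.replace(/-([a-z])/g, (match, letter) => letter.toUpperCase());
}

// Plain object standing in for body.style.
const style = {};
style[toCamelCase('background-color')] = 'red';

console.log(toCamelCase('background-color'));
console.log(style.backgroundColor);
```

Writing `style.background-color` directly would instead be read as `style.background` minus `color`, which is the conflict the camel-case convention avoids.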

[1263:19]

>> Uh it's not related to comments. It's

[1263:21]

simpler than that. Yeah.

[1263:26]

>> Yeah. Right. Like left hand wasn't

[1263:27]

talking to right hand and people

[1263:28]

realize, oh damn it. Like this now means

[1263:30]

background minus color which is not a

[1263:32]

thing because minus is indeed just like

[1263:34]

in C and in Python a mathematical

[1263:36]

operator. So, the world decided to

[1263:37]

reconcile this problem by just

[1263:39]

capitalizing uh the character that would

[1263:42]

otherwise be where the hyphen is. Well,

[1263:44]

little CSS trivia. All right, what else

[1263:46]

can we do? How about a couple of final

[1263:49]

examples here? So, what more can we do

[1263:51]

with CSS? So, back in my day, too, we

[1263:53]

had a tag called the HTML blink tag,

[1263:56]

which is among the few tags in the world

[1263:58]

of HTML that has actually been

[1263:59]

deprecated, that is removed from

[1264:01]

the language. Like no one removes things

[1264:02]

from languages generally, but the blink

[1264:04]

tag was so hideous, followed only by the

[1264:06]

marquee tag whereby my own homepage

[1264:08]

as a freshman had like welcome to my

[1264:10]

homepage just moving across the screen

[1264:12]

like this from left to right for no good

[1264:14]

reason like an ugly marquee and like uh

[1264:16]

on like a digital signage nowadays. But

[1264:19]

we can bring it back as follows. So if I

[1264:22]

close out my developer tools, go back

[1264:24]

into my source 8 directory and open up

[1264:26]

blink. This is what the blink tag used

[1264:28]

to do back in the day. Now, this version

[1264:31]

is implemented instead in JavaScript

[1264:32]

code as follows. I have a function here

[1264:35]

called blink, which I'm apparently

[1264:36]

calling every once in a while. Uh, how

[1264:39]

is that happening? Well, let's scroll

[1264:41]

down. Here's my HTML, super simple.

[1264:43]

Literally just says hello world. But

[1264:45]

notice this. There's another global

[1264:47]

variable we haven't seen in JavaScript

[1264:49]

called window. That refers to like the

[1264:50]

general window, not necessarily the

[1264:52]

contents of the page, where you can call

[1264:54]

a method called setInterval. And you

[1264:56]

can tell that method setInterval to

[1264:58]

call a specific function every number of

[1265:00]

milliseconds. So if I want to call blink

[1265:02]

every 500 milliseconds, that's the line

[1265:04]

of code that I use. If I scroll up to

[1265:06]

now this function, let's see how blink

[1265:08]

is implemented both now and perhaps back

[1265:10]

in the day. Well, body is a variable

[1265:12]

here that's just pointing to the body

[1265:14]

node in the DOM. And this is a big

[1265:18]

mouthful, but if that body's style's

[1265:21]

visibility property in CSS is quote

[1265:24]

unquote hidden, then change that body's

[1265:27]

style's visibility property to be

[1265:28]

visible. Otherwise, change it to be

[1265:31]

hidden instead. Here too, don't

[1265:34]

understand why left hand and right hand

[1265:35]

weren't talking to one another. You

[1265:36]

would think that the opposite of visible

[1265:38]

would be invisible, but in CSS, the

[1265:40]

opposite of visible is hidden. Just have

[1265:43]

to memorize stupid things like that. But

[1265:44]

what's this really doing? It's just

[1265:46]

changing the CSS from hidden to visible.

[1265:48]

Hidden to visible every 500

[1265:50]

milliseconds. So in fact what you're

[1265:52]

seeing here in the blink is if I inspect

[1265:55]

this page too. And now notice it's kind

[1265:57]

of fun just to watch it. You can see the

[1265:59]

HTML at bottom left and the CSS at

[1266:01]

bottom right just automatically changing

[1266:03]

because I'm doing that every 500

[1266:05]

milliseconds. All right. How about one

[1266:08]

other? Well, autocomplete. Well, we saw

[1266:10]

a step toward this with my hello, David

[1266:12]

example a moment ago. Super common

[1266:14]

though in Google and like every website

[1266:16]

now to automatically try to finish your

[1266:17]

thought. How is that happening? Well,

[1266:19]

that's not just HTML and CSS. That is

[1266:21]

also some JavaScript thrown into the

[1266:24]

mix. So, for instance, let me go into my

[1266:26]

terminal and open up source 8's example

[1266:28]

called autocomplete.html.

[1266:31]

And here I am going to borrow a file

[1266:34]

called large.js which is just a massive

[1266:36]

version. I'll open that too if you're

[1266:38]

curious. Large.js is just a huge

[1266:41]

JavaScript array. Arrays are back,

[1266:44]

containing all of the words from problem

[1266:46]

set five, the spellchecking problem set

[1266:48]

where you had a 100,000 plus words in uh

[1266:52]

C in a file given to you. Now we've

[1266:54]

converted that to JavaScript by using a

[1266:57]

global variable like this in the code

[1266:59]

here. What's happening? Well, apparently

[1267:01]

there's going to be a text box at the

[1267:02]

top of the page that we see.

[1267:04]

Then there's an empty unordered list. So

[1267:07]

an empty bulleted list. And then there's

[1267:09]

this code down here. I'm apparently

[1267:11]

creating a variable called input that's

[1267:13]

referencing that text box. I'm then

[1267:15]

listening for key up just as like we've

[1267:18]

done before. And then I'm doing this.

[1267:20]

I'm setting a variable called HTML equal

[1267:22]

to quote unquote nothing. So an empty

[1267:24]

string. And then I'm checking does the

[1267:26]

input text box have any value

[1267:28]

implicitly. If so, what am I doing? This

[1267:31]

is kind of cool. It's a bit of Python

[1267:32]

and C together syntactically for each

[1267:35]

word in the words array. JavaScript uses

[1267:38]

the keyword of instead of in like

[1267:40]

Python, but so be it. What I'm doing now

[1267:42]

is in JavaScript, I'm saying if that

[1267:44]

current word in that big file of 100,000

[1267:46]

words starts with whatever the user

[1267:48]

typed in, go ahead and add to that HTML

[1267:52]

string using plus equals, which is just

[1267:55]

concatenation. We've seen plus before,

[1267:57]

the following, an LI tag inside of which

[1268:00]

is that specific word. And so in effect,

[1268:03]

what you're seeing now is what every

[1268:05]

almost every website nowadays does.

[1268:07]

They're not manually writing HTML like

[1268:09]

we've been doing much of today. They're

[1268:11]

writing code that dynamically generates

[1268:13]

HTML because the programmers understand

[1268:14]

what HTML is. They understand that

[1268:16]

unordered lists have li children. And so

[1268:20]

using this string that I've highlighted,

[1268:21]

they're creating LI element after LI

[1268:23]

element for the purpose of changing the

[1268:26]

inner HTML of the UL element to be the

[1268:31]

value of that variable. And this is a

[1268:33]

very long way of saying how is

[1268:35]

autocomplete implemented in general.
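
The loop just described can be sketched as a function that builds the string of LI elements. The names `WORDS` and `buildListItems` and the four-word list are made up for illustration; the lecture's version uses the 100,000-plus-word array from large.js and assigns the result to the UL element's `innerHTML`.

```javascript
// Tiny stand-in for the large dictionary array.
const WORDS = ['cat', 'cats', 'catskill', 'dog'];

// Return '<li>...</li>' for every word that starts with what the user typed.
function buildListItems(words, typed) {
    let html = '';
    // An empty text box is falsy, so nothing is listed.
    if (typed) {
        // for...of visits each element, like Python's "for word in words".
        for (const word of words) {
            if (word.startsWith(typed)) {
                html += `<li>${word}</li>`; // += concatenates onto the string
            }
        }
    }
    return html;
}

console.log(buildListItems(WORDS, 'cats'));
```

In the page, a keyup listener on the text box would call something like this and set the unordered list's `innerHTML` to the result.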

[1268:37]

Well, just like this, if I search for

[1268:38]

cats by typing in C, there's every word

[1268:41]

in that 100,000-word dictionary that starts

[1268:43]

with C. A T S. And there's every word

[1268:47]

that starts with C A T S. Meanwhile,

[1268:50]

watch what happens underneath the hood.

[1268:52]

If I open up my inspect tab again and I

[1268:54]

go to my body, inside of this is the

[1268:59]

empty UL, but watch as soon as I start

[1269:01]

typing something like C. Now I can

[1269:03]

expand the triangle because there is an

[1269:05]

LI element that's been created for every

[1269:07]

one of the words that match. As I do

[1269:10]

ATS, now I've got just four of them. And

[1269:13]

there is cats, there is Catskill and

[1269:16]

so forth. So anytime you go to

[1269:18]

google.com like we did earlier and we

[1269:20]

went to google.com and started searching

[1269:22]

for cats, where are all of those search

[1269:24]

results coming from? Someone wrote

[1269:25]

JavaScript that's listening for key up

[1269:27]

or the like and then dynamically

[1269:29]

populating an unordered list or in this

[1269:31]

case a much prettier list of the

[1269:33]

matching results. And the final example

[1269:35]

that we thought we'd leave you with, and

[1269:37]

again the whole purpose of introducing

[1269:38]

JavaScript is to give you a taste of its

[1269:40]

syntax and its relative familiarity, but

[1269:42]

with the power that you can uh the power

[1269:44]

with which you can leverage it to make

[1269:46]

websites so much more interactive. And

[1269:48]

in fact, with Bootstrap, you don't just

[1269:50]

get CSS you can use, you have a whole

[1269:52]

set of JavaScript functionality. So you

[1269:54]

can have drop-down menus and the like.

[1269:56]

For instance, among the

[1269:57]

things you'll use for an upcoming

[1269:58]

problem set and perhaps your final

[1270:00]

project, something that looks a little

[1270:02]

like this, uh, in bootstrap.html,

[1270:05]

here's a whole bunch of code that I

[1270:07]

literally copied and pasted from

[1270:09]

Bootstrap's documentation. And it's just

[1270:11]

like boilerplate code for a corporate

[1270:13]

website that has features with pricing

[1270:15]

and disabled menu options as well, just

[1270:17]

for the sake of discussion. And then

[1270:19]

here, if I go back into this example,

[1270:21]

you'll see fairly simple website that

[1270:25]

looks like this. A so-called navbar with

[1270:27]

all of the main menu options of like a

[1270:29]

corporate website. And notice if you

[1270:31]

start to resize the window, which I'll

[1270:32]

do here, and put it into sort of mobile

[1270:35]

mode because it's so narrow now, thanks

[1270:36]

to JavaScript, it's listening for clicks

[1270:38]

on this hamburger menu and revealing the

[1270:41]

menu options that way. This is quite

[1270:42]

like how CS50's own website works and so

[1270:44]

many other websites out there. But the

[1270:46]

last one we thought we'd use is you're

[1270:48]

so in the habit of using Google Maps or

[1270:50]

Uber Eats or any number of apps that

[1270:52]

need to know your location. That too is

[1270:54]

exposed through JavaScript quite simply.

[1270:57]

Let me go ahead and in geolocation.html

[1271:00]

open up uh the following code

[1271:04]

whereby

[1271:05]

super simple even though some new

[1271:07]

functions there exists another global

[1271:09]

variable in JavaScript in browsers

[1271:11]

called navigator which has a property

[1271:13]

that's an object called geolocation

[1271:15]

which has a function called getCurrentPosition

[1271:15]

that takes an argument which is

[1271:19]

just an anonymous function which means

[1271:21]

call this code when you're ready to know

[1271:22]

the uh coordinates because it might take

[1271:24]

a while to figure out your GPS

[1271:25]

coordinates and once you do this simple

[1271:28]

example is just going to write to the

[1271:30]

document that is the rectangular page

[1271:32]

the position's latitude that comes back

[1271:34]

and the position's longitude that comes

[1271:36]

back. So to see this in action, let me

[1271:38]

go ahead and uh open up that second tab.

[1271:42]

Go back into

[1271:45]

geolocation.

[1271:47]

Notice, for privacy's sake, it's

[1271:49]

asking me to approve this. So I'm going

[1271:50]

to say allow this time. There are

[1271:52]

apparently my laptop's GPS coordinates.
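
A rough sketch of that callback pattern follows. The helper `describePosition` and the fallback coordinates are made up for illustration, and it logs to the console rather than writing to the document, so it also runs where no geolocation support exists.

```javascript
// Format the latitude and longitude that come back in the position object.
function describePosition(position) {
    return position.coords.latitude + ', ' + position.coords.longitude;
}

if (typeof navigator !== 'undefined' && navigator.geolocation) {
    // In a browser: the anonymous function is called later, once the
    // coordinates are known (and the user has allowed access).
    navigator.geolocation.getCurrentPosition((position) => {
        console.log(describePosition(position));
    });
} else {
    // Outside a browser: made-up coordinates, purely for illustration.
    console.log(describePosition({ coords: { latitude: 42.37, longitude: -71.12 } }));
}
```

The callback is needed because finding the coordinates can take a while, so the result cannot simply be returned right away.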

[1271:55]

And if I go to google maps.com, I can

[1271:57]

actually paste this in here. Enter. And

[1272:01]

looks like if we zoom in in in okay, I'm

[1272:04]

not technically outside, so it's only

[1272:06]

close to a degree of precision, but it's

[1272:09]

probably mapping to one of the Wi-Fi

[1272:10]

access points that's on that corner of

[1272:12]

the building. So, we're pretty darn

[1272:13]

close, pretty much close enough to get

[1272:15]

me my my food or my my ride here. And a

[1272:19]

final note, now that you've seen a

[1272:20]

little bit of JavaScript, let me go

[1272:22]

ahead and open up just 60 final seconds

[1272:25]

of uh just how uh how much effort it

[1272:27]

took us to put not only this lecture

[1272:29]

together, but particularly that example

[1272:30]

of the teaching fellows passing packets,

[1272:33]

everything we like to think is very

[1272:34]

finely flourished here. Uh but here's a

[1272:37]

little bit of behind the scenes and

[1272:38]

these final 60 seconds together. If we

[1272:39]

could dim the lights before we adjourn.

[1272:42]

>> Off you go.

[1272:44]

Offering. Okay,

[1272:47]

Josh. Nice.

[1272:50]

Helen. Oh,

[1272:56]

Bentimony. No. Oh, wait.

[1273:04]

That was amazing. Josh

[1273:09]

um Sophie

[1273:14]

Amazing.

[1273:19]

That was perfect.

[1273:25]

>> I think I

[1273:28]

over to you all.

[1273:30]

>> Oh, nice guy.

[1273:36]

That was amazing. Thank you all.

[1273:38]

>> So good.

[1273:40]

>> All right, that's it for CS50. We'll see

[1273:42]

you next time.

[1275:02]

All right, this is CS50. This is already

[1275:06]

week nine. And I dare say this week is

[1275:08]

the most representative of what you'll

[1275:11]

be doing after the class if you so

[1275:12]

choose to program in the future and

[1275:14]

tackle some project that's new to you.

[1275:15]

In fact, the closest to this week was

[1275:17]

perhaps week six wherein we didn't

[1275:19]

really introduce all that many new

[1275:21]

concepts but really translated them from

[1275:22]

C into Python. And so this week in

[1275:25]

particular, the goal is to really

[1275:26]

synthesize the past 10 weeks of class,

[1275:29]

drawing upon a lot of the building

[1275:30]

blocks that are hopefully now uh

[1275:32]

metaphorically in your toolbox and gives

[1275:34]

you an opportunity now to apply those

[1275:36]

ideas to new problems. In particular,

[1275:38]

web programming. So every day you and I

[1275:40]

are using the web in some form. Every

[1275:42]

day you and I are using mobile apps in

[1275:43]

some form. And we said last week that

[1275:45]

the languages underlying a lot of those

[1275:47]

applications are HTML and CSS for

[1275:49]

the layout and aesthetics. and then also

[1275:51]

in part JavaScript for a lot of the

[1275:52]

client side interactivity that you might

[1275:55]

experience nowadays. Well, today we come

[1275:57]

full circle and bring back a server-side

[1276:00]

component whereby we'll again write some

[1276:02]

Python, we'll again write some SQL code

[1276:04]

and use it to make our full-fledged own

[1276:07]

web applications and in turn if you so

[1276:09]

choose mobile applications as for your

[1276:11]

final project as well. So up until now

[1276:13]

when we did anything with the web, you

[1276:15]

ran this command last week, http-server,

[1276:18]

which literally did just that. It

[1276:19]

spawned a so-called HTTP server that is

[1276:21]

a web server whose purpose in life is

[1276:22]

just to serve up content from like your

[1276:24]

current folder, any files therein, any

[1276:26]

folders therein. And so all of the URLs

[1276:28]

generally followed a certain format. So

[1276:30]

if your URL were example.com/, recall that the slash

[1276:34]

just denotes the root of the web server

[1276:37]

and so in there typically by default you

[1276:39]

would see a directory index. We'll see

[1276:41]

today that that goes away because

[1276:42]

generally when you visit something.com/

[1276:45]

you want to see the actual website, not

[1276:46]

the contents of everything in the

[1276:48]

server. So we'll see how to address

[1276:50]

that. But the URLs up until now have

[1276:51]

been of a form like file.html literally

[1276:54]

referencing a file in that folder or

[1276:56]

folder slash which just means whatever

[1276:58]

is inside of that folder or

[1277:00]

folder/file.html

[1277:01]

or dot dot dot. You can nest these

[1277:03]

things however long that you want. And

[1277:05]

recall that more generally we said that

[1277:06]

you're referring to some kind of path on

[1277:08]

the server where the path is a set

[1277:10]

of folders ending in perhaps a file

[1277:13]

name. So today we're going to generalize

[1277:14]

that at least in terms of nomenclature

[1277:16]

and start talking more about routes

[1277:17]

because essentially in web programming

[1277:20]

we are going to exercise a lot more

[1277:21]

control over what is in the URL. So back

[1277:24]

in the day it referred to literally a

[1277:27]

file on the server and as recently as

[1277:28]

last week the URLs referred to literally

[1277:31]

a file on the server. However, we'll see

[1277:33]

in code that we can actually just parse

[1277:36]

this that is analyze what is after the

[1277:39]

domain name in a URL and just use this

[1277:42]

as generic input to the server to figure

[1277:44]

out what kind of output to produce.

[1277:46]

We're going to see the same convention

[1277:47]

though. If you want to pass in specific

[1277:48]

parameters, key value pairs, uh we'll

[1277:51]

use a question mark after our so-called

[1277:52]

route key equals value. And then if

[1277:54]

there's another one or more, we'll just

[1277:56]

separate them by ampersands. And to do

[1277:58]

all of this, we're going to recall the

[1278:01]

inside of those virtual envelopes.
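
As a small illustration of the route-plus-parameters convention, the sketch below pulls the route and the key-value pairs out of a URL. The URL itself is hypothetical, and the built-in `URL` class is used here only to show the parsing idea, not how any particular server does it.

```javascript
// Hypothetical URL: a route, then ?key=value pairs separated by &.
const url = new URL('https://example.com/search?q=cats&lang=en');

// Everything after the domain name and before the question mark.
const route = url.pathname;

// Look up a single parameter by its key.
const q = url.searchParams.get('q');

// Or collect all of the key-value pairs into one object.
const params = Object.fromEntries(url.searchParams);

console.log(route, q, params);
```

A framework does this kind of parsing on the server so that the route and its parameters arrive as ordinary inputs to your code.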

[1278:03]

Recall that if we did something like on

[1278:04]

google.com to search for cats, what was

[1278:06]

really being sent to the server was a

[1278:08]

request for /search, which notice is not

[1278:11]

search.html. There's no folder per se

[1278:13]

there. This is just the name of a

[1278:15]

program really running on Google

[1278:16]

servers. And that's going to be the

[1278:18]

so-called route that we ourselves start

[1278:20]

programming today. question mark Q

[1278:22]

equals cats just meant that the query

[1278:23]

parameter the input from the web form is

[1278:26]

going to contain in this particular

[1278:27]

example the word cats. So how are we

[1278:30]

going to do all this? So we could

[1278:31]

implement our own web server in C. It

[1278:33]

would be a nightmare to like use a

[1278:35]

language as low-level as C and actually deal

[1278:37]

with something as high level as writing

[1278:39]

code for the web. We're instead going to

[1278:41]

use Python for the most part if only

[1278:42]

because it's much higher level. But even

[1278:44]

then, we would probably if we wanted to

[1278:47]

do this thing uh from scratch, we would

[1278:49]

have to write a lot of Python code to

[1278:51]

like analyze the insides of these

[1278:53]

envelopes, figure out what inputs are

[1278:55]

being passed to the server, and then

[1278:57]

figure out how to access that in Python

[1278:59]

code. It's just a lot of work to just

[1279:01]

get a web application up and working.

[1279:02]

And so what the world generally does is

[1279:04]

they don't reinvent the wheel of writing

[1279:06]

their own web server. Rather, they use

[1279:08]

an off-the-shelf fairly generic web

[1279:10]

server or application server as it might

[1279:12]

be called. And we for instance are going

[1279:14]

to use something called Flask. Now Flask

[1279:16]

is a framework as the world would say or

[1279:18]

more specifically a micro framework

[1279:20]

which just means it's a library of code

[1279:21]

that other people wrote to make it

[1279:23]

easier for us to implement web

[1279:25]

applications. So they took the time to

[1279:27]

figure out how to handle get requests on

[1279:29]

a server, post requests on a server,

[1279:32]

figure out how to extract key value

[1279:33]

pairs from URLs, the sort of commodity

[1279:35]

stuff that like literally every web

[1279:37]

application on the internet has to do

[1279:39]

anyway. So we don't have to retrace

[1279:41]

those steps ourselves. What this will

[1279:42]

allow us to do is only implement the

[1279:44]

problems that we care about by using

[1279:46]

this framework. And to be clear, a

[1279:48]

framework much like Bootstrap is not

[1279:50]

only a library that someone else has

[1279:52]

written for you, but it's like a set of

[1279:53]

conventions that you follow in order to

[1279:55]

use the library in their recommended

[1279:58]

way. So it's more of a generic term that

[1279:59]

includes library and a set of

[1280:01]

conventions. And how do you know how to

[1280:02]

use either? You just read the

[1280:03]

documentation or take a class in which

[1280:05]

we're about to give you an introduction

[1280:07]

to some of this right here. So instead

[1280:09]

of running today http-server

[1280:11]

to start a web server that just serves

[1280:13]

up static content files and folders in

[1280:16]

our account we're instead going to run

[1280:17]

the command moving forward flask space

[1280:20]

run and this is going to look for code

[1280:22]

that we've written in our current

[1280:24]

directory and if it is in accordance

[1280:26]

with the conventions to which I'm

[1280:27]

alluding by using the so-called

[1280:29]

framework then it's going to start our

[1280:30]

web application on some TCP port for

[1280:33]

instance 8080 as we discussed last week

[1280:36]

to do this all we have to have in our

[1280:37]

current folder there is minimally a file

[1280:39]

called app.py by default. This is

[1280:42]

hinting at an application in the

[1280:44]

language called Python. And what code we

[1280:45]

put in there we'll soon see. And then

[1280:47]

ideally we would have another text file

[1280:49]

called requirements.txt by convention,

[1280:52]

inside of which is just one per line the

[1280:54]

name of all of the libraries that we

[1280:56]

want this web application to include. In

[1280:58]

other words, if I go over here to VS

[1281:00]

Code, if I don't have such a file,

[1281:02]

that's fine, but I want to use a

[1281:03]

framework like Flask. Recall our pip

[1281:05]

command for installing Python packages.

[1281:07]

I could just say pip install flask,

[1281:10]

enter and that would go ahead and

[1281:12]

install the flask framework or library

[1281:14]

for me just like we did a few weeks ago

[1281:15]

with installing the silly little cowsay

[1281:18]

library as well. I've already done that

[1281:20]

in advance and better still I've

[1281:22]

come with, in my code

[1281:25]

today, both of these files, app.py and

[1281:27]

requirements.txt. And in fact, if I go

[1281:29]

ahead and create one just for fun here

[1281:32]

all you need do in a requirements.txt

[1281:34]

text file is literally put the name of

[1281:35]

the library that you want to include and

[1281:37]

then you run pip in a slightly different

[1281:39]

way to install that library or any other

[1281:42]

libraries that are in that file as well.

[1281:44]

So let me wave my hands at the

[1281:45]

requirements.txt moving forward.

[1281:47]

It just means what libraries do you want

[1281:48]

to use with this web application so you

[1281:50]

don't have to remember or memorize them

[1281:51]

and type them all out manually. All

[1281:54]

right. So what's going to go inside of

[1281:56]

app.py? Well, the minimal amount of code

[1281:58]

that we can write to make our own web

[1282:00]

application that does something like

[1282:01]

print out hello world to my browser

[1282:03]

could look like this. Now, there's a bit

[1282:06]

of new syntax here, but not all that

[1282:08]

much today moving forward. The very

[1282:09]

first line just says from flask import

[1282:11]

Flask, which is a weird way of just

[1282:13]

saying give me access to the flask

[1282:15]

library. Capitalization matters. And

[1282:17]

so, the package that we're using is

[1282:19]

called flask lowercase, but we want to

[1282:21]

have access to a special function in

[1282:23]

there called Flask, capital F. So this is

[1282:25]

sort of a copy paste line. The next

[1282:27]

one's a little weird looking, but it

[1282:28]

essentially says give me a variable

[1282:30]

called app and turn this file into a

[1282:34]

flask application. We haven't seen this

[1282:35]

in a few weeks, but there was that weird

[1282:37]

if conditional that we put at the bottom

[1282:39]

of some of our Python code a few weeks

[1282:41]

back that just said if uh dot dot dot

[1282:44]

and it mentioned in there __name__: if __name__

[1282:46]

equals equals "__main__".

[1282:49]

So we've seen an allusion to __name__. For

[1282:51]

our purposes, __name__ just refers to

[1282:53]

whatever the name of this file here is.

[1282:55]

No matter what I call it, you can sort

[1282:57]

of access the current file by way of

[1282:59]

this special global variable. So this

[1283:01]

line collectively just means turn this

[1283:03]

file into a flask application and store

[1283:06]

the result in a variable called app. So

[1283:08]

I can now do stuff with flask. And what

[1283:10]

am I going to do? Well, down here, let

[1283:13]

me first point out a familiar syntax.

[1283:15]

I'm defining a function that I called

[1283:17]

index by convention, but I could have

[1283:18]

called it anything I want whose sole

[1283:20]

purpose in life is just to return quote

[1283:21]

unquote hello world, which is the super

[1283:23]

simple output this web app is going to

[1283:25]

display. But, and this is the new

[1283:28]

syntax, I'm using here, what's generally

[1283:30]

called a Python decorator, which is a

[1283:32]

type of function that essentially

[1283:35]

affects the behavior of the function

[1283:37]

right after it. So, by saying @app.route

[1283:41]

quote unquote slash, this is

[1283:43]

telling the Flask framework associate

[1283:45]

this index function with this route, the

[1283:48]

single forward slash. And that's how

[1283:51]

we're going to take over the default

[1283:52]

behavior of the slash portion of the URL

[1283:55]

by telling it to return whatever this

[1283:58]

function returns. And we'll see this in

[1284:00]

action now. So let me go over here say

[1284:04]

to VS Code. And within VS Code, I'm

[1284:06]

going to whip up exactly that

[1284:07]

application in a file called uh app.py.
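
To make the decorator idea above a bit more concrete, here is a toy sketch in plain Python. It is not Flask's actual implementation. The names routes, route, and handle are hypothetical and exist only for this illustration. It shows how a decorator can associate a function with a path so that something else can look the function up later.

```python
# Toy model of route registration. NOT Flask's real implementation.
routes = {}  # maps a URL path to the function that handles it


def route(path):
    """Return a decorator that registers the decorated function under path."""
    def decorator(func):
        routes[path] = func
        return func  # hand the function back unchanged
    return decorator


@route("/")
def index():
    return "hello, world"


def handle(path):
    """Look up the function registered for path and return its result."""
    return routes[path]()


print(handle("/"))
```

The line @route("/") runs before index is ever called. Its only job here is bookkeeping: it remembers that "/" should be answered by index.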

[1284:11]

Just so as to combine this and some

[1284:13]

subsequent examples, maybe the same

[1284:14]

folder, I'm going to first create a

[1284:15]

directory or folder called hello. I'm

[1284:18]

going to go into that hello folder. I'm

[1284:20]

going to go ahead and recreate that same

[1284:23]

requirements file just for good measure

[1284:25]

to tell the world that I want to use the

[1284:26]

flask library here. And then I'm

[1284:29]

additionally going to create now app.py.

[1284:31]

And I'll type this fairly quickly, but

[1284:32]

I'm just reciting what we saw a moment

[1284:33]

ago. From the Flask package, import the

[1284:36]

Flask function, lowercase F, capital F,

[1284:39]

respectively. Then give me a variable

[1284:40]

called app. Set it equal to that

[1284:42]

function call passing in the name of

[1284:44]

this file, whatever it actually is. And

[1284:47]

then lastly, let's go ahead and call

[1284:50]

@app.route quote unquote slash, which

[1284:53]

says, hey, Python, whatever the next

[1284:55]

function is, associate it with this

[1284:58]

slash route. And so I'm going to define

[1285:00]

that function. I could call it anything

[1285:01]

I want, foo or bar or baz. But in so far

[1285:05]

as slash represents the index of the

[1285:07]

website, like the default page, I'm just

[1285:09]

going to go ahead and call it by

[1285:10]

convention index and then return for now

[1285:12]

hello, world. And that's it. So whereas

[1285:16]

last week when I was writing code in

[1285:18]

HTML files, I was making web pages, now

[1285:22]

I've created what we'll call a web

[1285:24]

application. And it's an application in

[1285:25]

the sense that there's actually some

[1285:26]

logic going on there. There's some

[1285:28]

functions, there could be some

[1285:29]

conditionals, there's clearly a

[1285:30]

variable, there could be loops, and all

[1285:32]

of the sort of stuff we've seen in

[1285:34]

Scratch, C, and Python as well, we'll

[1285:37]

now see back in this Python file. So,

[1285:40]

how do we now run this? Well, let me go

[1285:41]

back into my terminal window here, and

[1285:43]

I'll clear it just for good measure. I'm

[1285:45]

going to go ahead and run flask run

[1285:47]

enter. I'm going to see some cryptic

[1285:49]

looking output, but there's that

[1285:50]

familiar pop-up with the green button

[1285:52]

that wants to open up this application.

[1285:54]

Whereas http-server uses 8080 by

[1285:57]

default, Flask uses port 5000 by

[1285:59]

default. And here we have it. I've just

[1286:02]

opened up my second tab, and we spent a

[1286:03]

lot of time there last week. This is the

[1286:05]

server I'm running, not on port 8080,

[1286:08]

but on port 5000 today. And there is the

[1286:10]

contents of what was spit out by my very

[1286:12]

first application. Now, even though the

[1286:15]

browser is rendering this like it is a

[1286:17]

web page, notice this. If I uh inspect,

[1286:20]

if I right-click or control-click

[1286:22]

anywhere on the screen and go to view

[1286:24]

page source, you'll see that there's no

[1286:26]

actual HTML on this page. It's literally

[1286:28]

a single line of text, hello, world. If

[1286:31]

I close that and right-click or

[1286:33]

control-click again and go to inspect

[1286:35]

like we did last week to open up

[1286:36]

developer tools, you'll see that the

[1286:38]

browser has actually filled in some

[1286:39]

blanks here for me by just rendering as

[1286:42]

it should the minimal possible web page.

[1286:44]

But the content I actually sent to the

[1286:47]

web browser is only literally hello,

[1286:49]

world. So how can I actually send a web

[1286:52]

page of my own rather than letting the

[1286:54]

browser do something like this? Well, I

[1286:56]

could go ahead and close that and go

[1286:59]

back to my application. I'm going to go

[1287:01]

ahead now and hide the terminal just

[1287:02]

because the server is still running. And

[1287:04]

what I'm going to go ahead and do here

[1287:05]

is well, nothing's really stopping me

[1287:08]

from returning not just a string of

[1287:10]

text, but a string of HTML. And this

[1287:11]

might not look pretty, but let me go

[1287:13]

ahead and do open bracket doc type HTML

[1287:17]

close bracket then HTML then head then

[1287:20]

title. And I'll just title this for

[1287:22]

instance hello to keep it simple. Then

[1287:24]

slash title, slash head, open bracket

[1287:27]

body, hello, world, slash body, slash

[1287:30]

html, close quotes. And I used single

[1287:34]

quotes in this case but I could have

[1287:35]

just as easily used double quotes but

[1287:37]

that's a full-fledged web page like

[1287:38]

that's the minimal amount of content we

[1287:40]

saw last week actually you know what for

[1287:41]

good measure let's actually add lang

[1287:43]

equals quote unquote en so it's actually

[1287:45]

fortuitous that I used single quotes

[1287:46]

because now I have some double quotes

[1287:48]

inside and even though this is not

[1287:49]

pretty printed it's just one massive

[1287:51]

mouthful of HTML all along one line.

[1287:54]

When I now go back to the browser,

[1287:56]

reload the page as by clicking here, and

[1287:59]

then view page source again, here's what

[1288:02]

my browser received this time. Indeed,

[1288:04]

it's the full-fledged HTML. And in fact,

[1288:07]

if I close that tab and reopen developer

[1288:09]

tools via inspect, now we'll see in the

[1288:11]

tab absolutely everything that I sent

[1288:14]

over, including a title, including the

[1288:16]

lang equals en. And had I typed even

[1288:18]

more, we would have seen that, too. All

[1288:21]

right. So, what was the point of this

[1288:24]

exercise? It feels as though I've

[1288:26]

really just taken more time, added more

[1288:27]

complexity to achieve literally what I

[1288:29]

could have done last week by just

[1288:30]

creating index.html

[1288:33]

myself without any Python code. But I

[1288:35]

dare say what we're trying to do is lay

[1288:36]

the foundation for a full-fledged

[1288:38]

interactive website that maybe has forms

[1288:40]

that we can submit to the application

[1288:42]

that allows us to generate not just one

[1288:44]

page, but maybe two or three or any

[1288:46]

number. So what you're seeing here is

[1288:47]

sort of the beginning of google.com's

[1288:50]

search application or gmail.com itself

[1288:53]

or facebook.com or any web application

[1288:55]

you can think of begins with a little

[1288:57]

code that theoretically looks a little

[1289:00]

something like this. But this is kind of

[1289:03]

stupid to put HTML hardcoded no less in

[1289:05]

one long string here inside of my

[1289:08]

application. Let's try to factor this

[1289:09]

out. That was a lesson we preached last

[1289:11]

week about sort of factoring out our

[1289:12]

JavaScript, factoring out our CSS. We

[1289:14]

can do the same thing with our actual

[1289:16]

HTML here. And so what I'm actually

[1289:18]

going to do is import not only the Flask

[1289:21]

function, but also another function that

[1289:23]

per its documentation comes with Flask

[1289:25]

called render template with an

[1289:27]

underscore in between. This is a

[1289:29]

function whose purpose in life is to

[1289:30]

render a template, so to speak, of HTML.

[1289:32]

We'll see what we mean by template in

[1289:34]

just a bit. But down here, what I'm

[1289:35]

going to do is now delete all of that

[1289:38]

code. And let me just assume that I'm

[1289:40]

going to put that same code in a file

[1289:41]

called index.html, just like I did

[1289:44]

last week. So let's instead return the

[1289:46]

return value of render template of quote

[1289:49]

unquote index.html.

[1289:52]

Now that file does not yet exist.

[1289:54]

Indeed, if I go into my terminal window,

[1289:56]

create a second terminal just so I can

[1289:58]

leave the server running but still see

[1290:00]

what's going on. I'm going to CD into

[1290:02]

that same hello directory, type ls to

[1290:04]

list my files, and I only see app.py

[1290:06]

and requirements.txt. But it turns out

[1290:09]

per Flask's documentation, if you want

[1290:12]

to create your own HTML files, you

[1290:14]

simply have to add a directory that by

[1290:16]

convention is called templates. And

[1290:18]

that's it. So in addition to app.py

[1290:20]

and requirements.txt, I need a folder called

[1290:21]

templates. So let's go back into VS

[1290:23]

Code, mkdir templates. Capitalization

[1290:26]

matters, all lowercase. Now, let me go

[1290:28]

ahead and cd into templates and run the

[1290:31]

code command and create a file called

[1290:33]

index.html in the templates folder. And

[1290:36]

then super quickly, let me hide this.

[1290:38]

Let me whip up that same page again. Doc

[1290:40]

type HTML html lang equals quote unquote

[1290:44]

en close bracket uh head close bracket

[1290:47]

title close bracket hello and then down

[1290:50]

here body close bracket hello, world. So

[1290:52]

autocomplete is helping me type quickly.
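
Here is a hedged sketch of the layout just described: an app that calls render_template, plus an index.html inside a templates folder. To keep it self-contained, it creates that folder in a temporary directory and points Flask at it with the template_folder argument. That is an assumption made for this sketch only. In the lecture, the folder simply sits next to app.py, and Flask finds it by convention. The test client at the end is likewise an addition.

```python
import tempfile
from pathlib import Path

from flask import Flask, render_template

# Stand-in for the hello/templates/ folder, created in a temporary directory
templates = Path(tempfile.mkdtemp()) / "templates"
templates.mkdir()
(templates / "index.html").write_text(
    '<!DOCTYPE html>\n'
    '<html lang="en">\n'
    '    <head>\n'
    '        <title>hello</title>\n'
    '    </head>\n'
    '    <body>\n'
    '        hello, world\n'
    '    </body>\n'
    '</html>\n'
)

# By default Flask looks for a folder named "templates" next to app.py;
# template_folder overrides that location for this sketch.
app = Flask(__name__, template_folder=str(templates))


@app.route("/")
def index():
    # Return the contents of templates/index.html
    return render_template("index.html")


html = app.test_client().get("/").get_data(as_text=True)
print(html)
```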

[1290:54]

But now I have a file with my HTML that

[1290:57]

this application I claim is going to

[1290:59]

spit out automatically for me. So let's

[1291:02]

see the effect. Let me go back into my

[1291:04]

other browser tab. Let me close the

[1291:06]

developer tools and let me quite simply

[1291:08]

just click reload. And no apparent

[1291:11]

change. It's working exactly as it did

[1291:13]

before, but I've laid the foundation for

[1291:15]

making a much more useful layout of my

[1291:19]

files so that I can actually keep my

[1291:20]

logic, my Python code, and my HTML a bit

[1291:24]

separate from that. All right. Well, how

[1291:26]

can we make this into something even

[1291:28]

more interesting? Well, let's start to

[1291:29]

take some actual user input for

[1291:31]

instance. So, wouldn't it be nice if I

[1291:34]

could pass in via the URL something like

[1291:37]

Q equals cats, but maybe something like

[1291:39]

name equals David or name equals Kelly

[1291:41]

and actually see the name that's being

[1291:43]

outputted. In other words, let me zoom in

[1291:45]

up here and let me pretend like this

[1291:47]

happened automatically. Let me do

[1291:49]

question mark uh name equals David.

[1291:52]

Enter. Well, it would be nice if I saw

[1291:54]

hello, David. I'll I'll propose rather

[1291:56]

than just hello, world. So, how do I

[1291:59]

actually get access to everything after

[1292:00]

the question mark? Well, here is where a

[1292:03]

framework like Flask and any number of

[1292:04]

alternatives starts to shine. It gives

[1292:07]

me that answer automatically. And

[1292:10]

so it turns out in Flask once you've

[1292:12]

used it, you have access to a special

[1292:13]

global variable as we'll call it called

[1292:16]

request.args

[1292:17]

where args just means the arguments or

[1292:19]

the parameters that were passed in to

[1292:21]

this HTTP request. So how do we use

[1292:24]

this? Well, let me go back to VS Code

[1292:26]

here. And at the very top line, in

[1292:29]

addition to importing Flask, capital F,

[1292:32]

render template, let's also import

[1292:34]

request, which is a global variable that

[1292:36]

comes with the Flask framework. And then

[1292:38]

I'm going to use it as follows. I'm

[1292:41]

going to go ahead and say um a second

[1292:45]

argument to the render template function

[1292:47]

where I'm going to say placeholder

[1292:49]

equals request. Actually, let me not do

[1292:52]

that yet. Let me first create a variable

[1292:54]

name equals request.args. And then let

[1292:58]

me go ahead and get the name key from

[1293:02]

the arguments. And then down here, let's

[1293:04]

go ahead and pass in placeholder equals

[1293:08]

name. So what am I doing here on line 8?

[1293:11]

I'm creating a variable called name. I'm

[1293:13]

storing in that the value that's in the

[1293:15]

request global variable in what's

[1293:17]

apparently a dictionary called args,

[1293:19]

specifically the name key therein. So if

[1293:22]

the thing after the question mark name

[1293:24]

equals is David, this should give me

[1293:25]

David. If it's Kelly, it should give me

[1293:27]

Kelly instead. Then what I'm doing is

[1293:29]

rendering this template called

[1293:31]

index.html, but I'm additionally passing

[1293:33]

in some named parameters. We talked

[1293:35]

briefly about that in week six when we

[1293:37]

introduced the idea that Python can take

[1293:38]

not only a comma-separated list of

[1293:40]

arguments, but some of which can have

[1293:42]

names. So I'm proposing that one such

[1293:44]

name of an argument to this render

[1293:46]

template function can be placeholder for

[1293:49]

instance. Now, at the moment, this code

[1293:50]

isn't going to do anything useful. If I

[1293:53]

go back indeed to the other tab, click

[1293:55]

reload after zooming in, even with my

[1293:58]

name in the URL, you'll see that we

[1293:59]

still see hello, world. But here's where

[1294:02]

things now get interesting. And here too

[1294:04]

is what we mean by template. If I go

[1294:07]

back into VS Code, open up index.html

[1294:11]

again, and instead of putting the word

[1294:14]

world there, what I'd like to see is not

[1294:15]

hello world, but hello, placeholder. But

[1294:18]

of course, if I literally type that, I'm

[1294:20]

going to see literally placeholder

[1294:21]

unless I surround placeholder with pairs

[1294:25]

of curly braces like this. And by using

[1294:28]

these pairs of curly braces, I'm telling

[1294:30]

Flask that I want to interpolate, so to

[1294:33]

speak, that variable. I want to

[1294:34]

substitute in its value. So this is yet

[1294:36]

another syntax. In Python, we saw

[1294:38]

f-strings. In C, we saw percent s when

[1294:41]

using something like printf. In an HTML

[1294:44]

file, when using Flask specifically, we

[1294:47]

use these pairs of curly braces to denote

[1294:50]

this is indeed a placeholder whose value

[1294:52]

should be plugged in. So now let's go

[1294:55]

back over to the second tab. Recall if I

[1294:57]

zoom in that passed in already to this

[1295:00]

URL is question mark name equals David.

[1295:02]

And this time when I click reload,

[1295:04]

voila, now I see my actual name. And

[1295:07]

unlike the JavaScript examples last week

[1295:10]

which were doing everything client side,

[1295:12]

notice here if I go to uh right-click or

[1295:15]

control-click and view page source,

[1295:17]

what's noteworthy today is that David in

[1295:20]

this case literally came from the

[1295:22]

server. This was not rendered client

[1295:23]

side. The server sent this HTML and

[1295:26]

specifically this text. So, if I go back

[1295:30]

to the same tab here, zoom in and change

[1295:34]

David for instance to Kelly, what I

[1295:37]

should see instead when I hit enter is

[1295:39]

hello, Kelly. And indeed, if I go back

[1295:41]

to the source code and reload the page

[1295:44]

there, I should see in the view page

[1295:47]

source that the server sent indeed

[1295:49]

hello, Kelly. So, it's in this sense

[1295:51]

that it's an application. The URL is

[1295:54]

providing input to the application by

[1295:56]

way of this URL format, the so-called

[1295:58]

GET string that's being

[1296:00]

passed in. And if I look at the code

[1296:02]

that I'm running, app.py is the code

[1296:04]

that's running. It is grabbing that name

[1296:07]

from the URL. I am then passing it into

[1296:11]

my index.html file and then my HTML file

[1296:15]

is plugging the actual value in for me.
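
The flow just described can be sketched in one file, with two assumptions made for the sake of a self-contained example. First, the template lives in a Python string named INDEX_HTML (a hypothetical name) and is rendered with Flask's render_template_string, rather than in templates/index.html with render_template as in the lecture. Second, the test client at the end stands in for typing the URL into the browser.

```python
from flask import Flask, render_template_string, request

app = Flask(__name__)

# Stand-in for templates/index.html; {{ placeholder }} is filled in on render
INDEX_HTML = """<!DOCTYPE html>
<html lang="en">
    <head>
        <title>hello</title>
    </head>
    <body>
        hello, {{ placeholder }}
    </body>
</html>"""


@app.route("/")
def index():
    # Grab the value after ?name= in the URL
    name = request.args["name"]
    # Pass it into the template as a named argument
    return render_template_string(INDEX_HTML, placeholder=name)


client = app.test_client()
print(client.get("/?name=David").get_data(as_text=True))
```

Because the substitution happens on the server, the HTML that comes back already contains the name, which is what view page source shows.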

[1296:18]

And so what's going on with for instance

[1296:20]

these curly braces? Well, here too is

[1296:22]

where we're actually using a library.

[1296:24]

And included in Flask is another library

[1296:27]

called Jinja. And Jinja is what's called

[1296:29]

a templating library. And there's so

[1296:30]

many templating libraries in the world.

[1296:32]

Jinja is actually fairly simple, which

[1296:34]

is nice. And which is why Flask uses it.

[1296:36]

And for now, you can just think of Jinja

[1296:37]

as being the library that knows how to

[1296:40]

interpolate variables inside of pairs of

[1296:43]

curly braces. So why are we introducing

[1296:45]

yet another framework, another library? Well,

[1296:47]

the folks who implemented Flask

[1296:49]

decided that it was not worth their time

[1296:50]

reinventing the wheel of a templating

[1296:52]

language, a language via which you can

[1296:54]

figure out what values to plug in where.

[1296:56]

So they just lean on another library

[1296:58]

that someone else wrote years prior so

[1297:00]

as to not reinvent that wheel

[1297:02]

themselves. And that's all that's going

[1297:03]

on with a framework. In this case, it's

[1297:05]

using perhaps multiple libraries

[1297:07]

instead. All right. So what then is a

[1297:12]

template? So this then is a template.

[1297:15]

What you're looking at here, hello,

[1297:16]

placeholder, is a template in the sense

[1297:18]

that it's kind of the blueprint for the

[1297:19]

web page I want the user to see, but

[1297:22]

it's going to be dynamically generated

[1297:24]

using indeed this blueprint by plugging

[1297:27]

in the value of placeholder inside of

[1297:29]

those pairs of curly braces. And so

[1297:32]

that's why index.html starting today is

[1297:35]

in a folder called templates because

[1297:36]

this is not just static HTML like the

[1297:38]

stuff we wrote last week. This is the

[1297:41]

blueprint for the actual

[1297:44]

HTML that we want the browser to spit

[1297:47]

out. But there's a bug here. Notice

[1297:50]

what's going to happen here. If I go up

[1297:51]

to this URL and I get rid of the name

[1297:53]

altogether, for instance, I just visit

[1297:55]

the slash route without any key value

[1297:57]

pairs and hit enter. This is sort of bad:

[1298:00]

Bad Request. It's an HTTP 400. In fact,

[1298:02]

if you look at the tab, here's another

[1298:04]

HTTP status code that we probably

[1298:06]

haven't seen before. But 400 just means

[1298:08]

the user did something wrong by not

[1298:09]

passing in the parameter that was

[1298:11]

expected. Well, that's a little bad

[1298:13]

design if like the user has to manually

[1298:15]

type in things to the URLs. Like no

[1298:16]

human actually does that. That's not

[1298:18]

good for business or customers in

[1298:20]

general. So I can go back into app.py

[1298:23]

and just make a little bit of

[1298:24]

conditional code here. And here's too

[1298:26]

where we see what makes this an

[1298:27]

application and not just a static page.

[1298:30]

Instead of just blindly getting the name

[1298:32]

here, I could instead do something like

[1298:34]

this. Well, if the name parameter is in

[1298:38]

request.args, and this is just Python

[1298:40]

syntax for asking if this key is in this

[1298:43]

dictionary, then I'm going to go ahead

[1298:44]

and define name and set it equal to

[1298:46]

request.args quote unquote name. Else,

[1298:50]

if there is no name in the request,

[1298:52]

well, then I might as well give some

[1298:53]

default value like name equals quote

[1298:55]

unquote world. And that alone logically

[1298:58]

makes sure that I only try to access

[1299:00]

request.args quote unquote name if the key is

[1299:03]

actually there. So, if I go back to the

[1299:05]

browser now, reload without anything

[1299:07]

else in the URL. Now, we're back in

[1299:09]

business and it's saying hello, world.

[1299:11]

But if I go up to the URL bar and add

[1299:13]

name equals David, enter, that too now

[1299:17]

works. So, it's a web application in the

[1299:18]

sense that not only does it have

[1299:20]

function calls as well as a variable,

[1299:22]

but now we've got some conditional logic

[1299:23]

with boolean expressions as well.
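
A minimal sketch of that conditional version follows. As assumptions for brevity, the template is shortened to a single line held in a hypothetical TEMPLATE string and rendered with render_template_string, and Flask's test client stands in for the browser. The if and else are the four lines described above.

```python
from flask import Flask, render_template_string, request

app = Flask(__name__)

# Abbreviated stand-in for templates/index.html
TEMPLATE = "hello, {{ placeholder }}"


@app.route("/")
def index():
    # Only index into request.args if the key is actually there
    if "name" in request.args:
        name = request.args["name"]
    else:
        name = "world"
    return render_template_string(TEMPLATE, placeholder=name)


client = app.test_client()
for url in ["/", "/?name=David"]:
    print(url, "->", client.get(url).get_data(as_text=True))
```

With the check in place, visiting the slash route without any parameters no longer ends in a 400.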

[1299:27]

All right, questions on anything we've

[1299:30]

done thus far because it was a lot all

[1299:34]

at once. Questions thus far? Yeah.

[1299:40]

>> Good question. Let's try that. What if I

[1299:42]

just did question mark name equals

[1299:43]

nothing? Well, let me go back to that

[1299:45]

other tab. Uh, delete the name David and

[1299:48]

hit enter. And I indeed see hello,

[1299:50]

nothing. Why? Because the name key is

[1299:52]

provided now. It just doesn't have a

[1299:54]

value. And so the conditional has the

[1299:56]

same answer. Well, yes, name is in

[1299:58]

request.args, but there's just no value

[1300:01]

associated with it. And here again is

[1300:03]

the value or a hint at the value of

[1300:04]

using a framework like flask. The fact

[1300:06]

that I can just import the request

[1300:08]

global variable and then ask questions

[1300:10]

like is this parameter in this

[1300:12]

dictionary means I don't have to write

[1300:14]

any of the code that like figures out

[1300:16]

what the URL looks like, break it apart

[1300:18]

between the question mark and the equal

[1300:20]

signs and any ampersands therein. That's

[1300:22]

all sort of generic logic that every web

[1300:25]

application has to do. So again, Flask

[1300:27]

is sort of doing that lift for me and I

[1300:29]

can just focus on the logic that I

[1300:31]

actually care about. All right. Well, a

[1300:34]

quick convention here. I've used

[1300:36]

the word placeholder here just to kind

[1300:38]

of hit the nail on the head and make

[1300:39]

clear this is a placeholder, but frankly

[1300:41]

it's a little more readable

[1300:42]

stylistically to not just put hello

[1300:44]

generic placeholder, but to say

[1300:46]

something like hello, name so that a

[1300:48]

colleague or even myself looking at this

[1300:49]

file down the line knows that okay,

[1300:51]

we're trying to print out the user's

[1300:52]

name here. That's fine. You can change

[1300:54]

the name of these variables to be

[1300:56]

anything you want. And even though it

[1300:57]

looks weird, it's conventional in Flask

[1300:59]

to do something like this. Name equals

[1301:01]

name. But each of these names means

[1301:04]

something different. This is the name of

[1301:06]

the placeholder that I'm going to put in

[1301:08]

my actual template. This is the value

[1301:10]

that I actually want to give it. And it

[1301:12]

just keeps me a little saner by just

[1301:14]

reusing the same name instead of calling

[1301:16]

it placeholder or placeholder 1,

[1301:17]

placeholder 2, placeholder 3, or

[1301:19]

something generic like that. Now it's

[1301:21]

just a little clearer, even though it looks

[1301:22]

weird to say name equals name. Again,

[1301:26]

that just allows me to do this in my

[1301:28]

template. All right. Well, what more can

[1301:30]

I do after that? Well, let me propose

[1301:33]

that we can actually go in and simplify

[1301:36]

this code a little bit. It turns out

[1301:38]

this is so common to just ask a question

[1301:40]

as to whether the parameter is there and

[1301:41]

then do something with it or not that

[1301:43]

flask comes with some logic to do this.

[1301:45]

And in fact, I can get rid of all four

[1301:47]

of these lines. Just go ahead and with

[1301:49]

confidence declare a variable called

[1301:51]

name, set it equal to request.args,

[1301:53]

but in the so-called dictionary,

[1301:56]

use a function called get that comes

[1301:58]

with it, which technically doesn't

[1301:59]

relate to the verb that was used by

[1302:02]

HTTP. This just means literally get me

[1302:04]

the following. And if you want to get

[1302:06]

the parameter called name, you literally

[1302:08]

just say quote unquote name. However, in

[1302:11]

case there is no name parameter, you can

[1302:13]

also give this function a default value

[1302:15]

like world. And so now we've collapsed

[1302:17]

from four lines into

[1302:20]

one that exact same logic. So this gets

[1302:22]

me the HTTP parameter called name. But

[1302:25]

if it's not there, it gives me a default

[1302:27]

value of world. So that no matter what,

[1302:29]

this name variable has what I care

[1302:31]

about. Indeed, if I go back over here,

[1302:33]

let's type in how about name equals

[1302:36]

David again. Enter. That's there. If I

[1302:38]

type in uh no name, enter. That too is

[1302:42]

now working as well. All right. Well,

[1302:45]

let's see if we can refine this a bit

[1302:47]

more. Let me propose that in our next

[1302:50]

version of this. Let's introduce a

[1302:52]

second route. So two URLs. Much like uh

[1302:55]

Google has many different URLs as does

[1302:57]

most any web application. At the moment,

[1302:58]

I'm doing everything in my slash route.
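
Before adding that second route, here is a sketch of the one-line version with get described above, still on the single slash route. It also uses the name equals name convention. The one-line template string, the hypothetical TEMPLATE name, and the test client are assumptions of this sketch rather than the lecture's files.

```python
from flask import Flask, render_template_string, request

app = Flask(__name__)

# Abbreviated stand-in for templates/index.html, now using {{ name }}
TEMPLATE = "hello, {{ name }}"


@app.route("/")
def index():
    # Get the name parameter, or fall back to "world" if it's missing
    name = request.args.get("name", "world")
    # Left name: the placeholder in the template. Right name: the variable.
    return render_template_string(TEMPLATE, name=name)


client = app.test_client()
for url in ["/", "/?name=David"]:
    print(url, "->", client.get(url).get_data(as_text=True))
```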

[1303:00]

So how might I move away from this?

[1303:02]

Well, let me go ahead and not only add a

[1303:05]

second route, but an actual form via

[1303:07]

which the user can type in their

[1303:08]

name. So to do this, let me propose that

[1303:11]

in index.html, instead of just

[1303:14]

printing out the user's name and

[1303:15]

trusting that they're going to have

[1303:17]

typed their name in manually to the URL,

[1303:19]

which again is not normal behavior,

[1303:20]

let's actually show the user a form via

[1303:23]

which they can do exactly that. So

[1303:24]

here's my form tag. Uh let's say the

[1303:27]

method I'm going to use is get so that I

[1303:29]

see everything in the URL. Let's give

[1303:31]

myself an input whose name is

[1303:34]

name because this is the human's name.

[1303:36]

And notice somewhat confusingly, this

[1303:39]

name on the left is the HTTP, sorry,

[1303:42]

this name on the left is the HTML

[1303:45]

attribute that we saw last week. So,

[1303:47]

it's different from what we just did in

[1303:49]

Python, even though they're all called

[1303:50]

the same thing. The type of this input

[1303:52]

is going to be text. And let's go ahead

[1303:55]

and make this a little more user

[1303:56]

friendly. Let's put some placeholder

[1303:58]

text called name, so the human knows

[1304:00]

what to type in. Let's go ahead and

[1304:02]

disable autocomplete just so we don't

[1304:04]

see previous input into this text box.

[1304:06]

And let's autofocus it so that the

[1304:08]

cursor is blinking in the text box by

[1304:11]

default. Then lastly, let's go ahead and

[1304:13]

have a button the type of which is

[1304:14]

submit. So that clicking this button

[1304:15]

actually submits the form. And I'm just

[1304:17]

going to call this button like greet

[1304:19]

because I want the user to be able to

[1304:20]

greet themselves by clicking this

[1304:22]

button. Now I should specify action. The

[1304:25]

only other time we used action is when

[1304:27]

we actually went to

[1304:30]

https://www.google.com/search.

[1304:32]

That's not relevant today because I'm

[1304:33]

trying to print hello world not search

[1304:36]

for cats and such but this is where I

[1304:38]

too have control if I want to submit

[1304:40]

this form to a specific location in

[1304:43]

my web application action is where I can

[1304:45]

specify it so why don't I pretend that

[1304:47]

there exists a route in my application

[1304:50]

called /greet and if you go to

[1304:53]

example.com/greet

[1304:55]

question mark name equals David this now

[1304:58]

will greet the user with hello David for

[1305:00]

instance, but /greet does not exist.

[1305:02]

If we go back to app.py, literally the

[1305:04]

only route that currently exists is

[1305:07]

single slash, but I can change that. I

[1305:09]

can go into my uh app.py as I have here

[1305:13]

and below this function, I can go ahead

[1305:15]

and define app.route quote unquote /greet

[1305:18]

and just invent any route that I want. I

[1305:21]

can then define a function that will be

[1305:23]

called whenever that route is visited.

[1305:25]

By convention, to keep myself sane, I'm

[1305:27]

going to call the function the same

[1305:28]

thing as the route, but you don't have

[1305:30]

to do this. It's just to minimize uh

[1305:32]

decisions I have to make. And then in

[1305:34]

this function, what I'm going to do is

[1305:35]

this: return render_template of greet.html,

[1305:39]

which doesn't exist yet, but that's a

[1305:41]

problem to be solved. And then I can

[1305:42]

pass in the name of the user. I'm going

[1305:46]

to go ahead and save myself a line of

[1305:47]

code and just say request.args.get

[1305:50]

quote unquote name, world. In other

[1305:53]

words, strictly speaking, I don't need

[1305:54]

that variable on its own line. This has

[1305:56]

the effect of what we already did in

[1305:58]

index, but I'm doing it all in one

[1306:00]

elegant one-liner. And now in index,

[1306:04]

insofar as I want the index of the site

[1306:06]

to just show the user the form via which

[1306:08]

they can type in their name, this one's

[1306:09]

easy now: render_template quote unquote

[1306:12]

index.html

[1306:13]

and return that template. So to recap,

[1306:17]

here's index.html, which is now a

[1306:19]

form instead of a template for hello,

[1306:21]

such and such. App.py is going to return

[1306:24]

that template whenever I visit the index

[1306:26]

or slash of the page. And then this

[1306:28]

greet route is going to handle the case

[1306:30]

of printing out greet.html passing in

[1306:33]

the user's name. All right, I think I'm

[1306:35]

not quite good to go yet, but let's try

[1306:37]

this out. Let me go back to my browser

[1306:39]

tab, reload, and there we have it. I

[1306:41]

have a web form now instead of the

[1306:44]

hello, so-and-so. I'm going to go ahead

[1306:46]

and type in my name. And notice the URL

[1306:48]

at the moment, even though Chrome is

[1306:49]

hiding it, technically it's there slash,

[1306:52]

but Chrome and most browsers today sort

[1306:54]

of hide as much stuff as they can if

[1306:56]

it's not all that intellectually

[1306:58]

interesting. But watch what happens to

[1306:59]

the URL when I click greet. It

[1307:02]

automatically sends me to /greet

[1307:04]

question mark name equals David. And

[1307:06]

this is just like the way the forms

[1307:08]

worked last week when we recreated our

[1307:09]

own version of Google in search.html

[1307:12]

because the action there was

[1307:13]

google.com/search.
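
A minimal sketch of the form-plus-two-routes flow described so far, assuming Flask is installed. To stay runnable as a single file, it inlines the two templates as strings with render_template_string instead of loading index.html and greet.html from a templates directory as the lecture does. The constant names INDEX_HTML and GREET_HTML are illustrative, not from the lecture.

```python
from flask import Flask, render_template_string, request

app = Flask(__name__)

# Stand-ins for templates/index.html and templates/greet.html (assumption:
# inlined here only so the sketch is self-contained).
INDEX_HTML = """<!DOCTYPE html>
<html lang="en">
    <head>
        <title>hello</title>
    </head>
    <body>
        <form action="/greet" method="get">
            <input autocomplete="off" autofocus name="name" placeholder="Name" type="text">
            <button type="submit">Greet</button>
        </form>
    </body>
</html>"""

GREET_HTML = "hello, {{ name }}"


@app.route("/")
def index():
    # The index route now only shows the form.
    return render_template_string(INDEX_HTML)


@app.route("/greet")
def greet():
    # Read the "name" parameter from the URL, falling back to "world".
    return render_template_string(GREET_HTML, name=request.args.get("name", "world"))


# Flask's built-in test client stands in for the browser.
client = app.test_client()
print(client.get("/greet?name=David").get_data(as_text=True))
print(client.get("/greet").get_data(as_text=True))
```

Submitting the form with method get is equivalent to visiting /greet?name=... directly, which is why the test client can request that URL without ever rendering the form.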

[1307:15]

The user was whisked away to Google's

[1307:17]

server. Today I stay on the same server

[1307:19]

because the action I used was quite

[1307:20]

simply /greet, which is assumed to be

[1307:22]

on my own server. But clearly I screwed

[1307:24]

something up because I have a big

[1307:26]

internal server error in front of me as

[1307:28]

you soon will too, odds are, as you dive

[1307:30]

into this. 500 is the status code that

[1307:33]

means it's your fault somehow. Now why is

[1307:35]

that? Well, it's unclear from this

[1307:37]

generic black and white message.

[1307:39]

However, because I'm the developer, I

[1307:41]

can go back to VS Code, open my terminal

[1307:43]

window, and recall that I have two

[1307:45]

terminals open now. One that I can type

[1307:46]

stuff in, the other of which is still

[1307:49]

running from before. Let me open up that

[1307:51]

one. And you'll see if I maximize my

[1307:54]

terminal window, a whole bunch of scary

[1307:56]

error messages here. But the relevant

[1307:59]

one is probably going to be, let's see,

[1308:02]

down here: raise TemplateNotFound

[1308:05]

error, jinja2.exceptions.TemplateNotFound:

[1308:08]

greet.html. So there's a lot

[1308:10]

of esoteric error messages here, more so

[1308:12]

than usual, but the simple fact is that

[1308:14]

I just screwed up and I did not create

[1308:17]

greet.html. So file not found by the

[1308:20]

server. So the user doesn't see all that

[1308:22]

complexity. That's deliberate by design.
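
A small sketch of that failure mode, assuming Flask is installed and that no templates/greet.html exists next to the script. The client receives only a generic 500, while the specific exception is visible on the server side.

```python
from flask import Flask, render_template
from jinja2.exceptions import TemplateNotFound

app = Flask(__name__)


@app.route("/greet")
def greet():
    # Assumption: templates/greet.html has not been created yet.
    return render_template("greet.html", name="David")


# What the browser sees: just a status code and a generic error page.
client = app.test_client()
response = client.get("/greet")
print(response.status_code)

# What the developer sees: the underlying exception names the missing file.
with app.app_context():
    try:
        render_template("greet.html", name="David")
    except TemplateNotFound as error:
        print(type(error).__name__, error)
```

Running this also writes a traceback to the terminal, much like the one in the lecture, because Flask logs unhandled exceptions there rather than sending them to the client.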

[1308:23]

It's generally not good for

[1308:25]

cybersecurity if you're revealing to the

[1308:27]

user all of the error messages that are

[1308:29]

happening on your server because maybe

[1308:30]

that suggests they can hack in

[1308:32]

some way by taking advantage of those

[1308:34]

error messages and the information

[1308:35]

implicit in them. But they are there in

[1308:37]

your terminal window to actually see and

[1308:39]

diagnose. So how do I fix this? Well,

[1308:41]

not a problem. Let me shrink my terminal

[1308:43]

window back down. Let me code a file

[1308:44]

called greet.html.

[1308:47]

And in greet.html, let's create the

[1308:48]

template via which I'm going to greet

[1308:50]

the user, which ironically is the exact

[1308:52]

same as index.html used to be. So,

[1308:55]

let me recreate that real quick: DOCTYPE

[1308:58]

html. Let me close my terminal.

[1309:00]

html, lang equals en; head; title,

[1309:05]

hello; body; hello comma, and

[1309:10]

here's my placeholder: hello, name. So,

[1309:13]

to be clear, the index.html template

[1309:15]

doesn't have any curly braces or

[1309:17]

anything dynamic. It just spits out the

[1309:19]

HTML for the form. greet.html spits

[1309:22]

out HTML and the actual greeting. And

[1309:24]

it's app.py that decides which of these

[1309:26]

to show the user. Either index.html if

[1309:29]

they visit the slash route or greet.html

[1309:33]

if they somehow find their way to the

[1309:35]

/greet route, which they will

[1309:36]

automatically by simply submitting that

[1309:39]

form. All right, so let's go back into

[1309:41]

this internal server error and go back

[1309:43]

to the form. Nothing has changed with

[1309:45]

the form, but now when I type in David

[1309:47]

click greet, not only will the URL

[1309:48]

change to be /greet question mark

[1309:51]

name equals David, I actually now see

[1309:54]

the content that I expected a moment

[1309:58]

ago. All right. Well, now it's an

[1310:00]

opportunity to critique. I have these

[1310:02]

two templates open, index.html and

[1310:04]

greet.html. And even if you've never

[1310:07]

done web programming before and even if

[1310:08]

you'd never done HTML before last week,

[1310:11]

what is bad about this design

[1310:13]

intuitively?

[1310:15]

>> Say again.

[1310:16]

>> Abstraction.

[1310:17]

>> Abstraction in what sense?

[1310:21]

>> Yes. So that's exactly the hangup I

[1310:23]

have here. There's a lot of duplication.

[1310:25]

And technically I didn't copy paste

[1310:27]

though I might as well have because

[1310:28]

notice as I very hintingly go back and

[1310:31]

forth almost every line of code in these

[1310:33]

files is the same except for the form

[1310:36]

which is there or not there or the hello

[1310:38]

comma like all of the boilerplate HTML

[1310:41]

namely everything I just highlighted

[1310:43]

here lines one through seven in

[1310:44]

greet.html. And this is

[1310:47]

what we really start to mean about a

[1310:49]

template. Like wouldn't it be nice if we

[1310:51]

could factor out all of that HTML that's

[1310:53]

common to both files, put it in

[1310:55]

literally a template that both routes

[1310:58]

can use so that I can write that

[1311:00]

boilerplate code once instead of again

[1311:03]

and again. Cuz imagine in your mind's

[1311:04]

eye, well, if I have three routes or

[1311:06]

four routes or five routes, I'm going to

[1311:07]

be like typing the same darn HTML three,

[1311:09]

four, five times. That's got to be dumb

[1311:11]

and that's got to be solvable as we've

[1311:13]

seen in other languages as well. So, let

[1311:16]

me indeed go ahead and try to improve

[1311:18]

this. And the syntax is a little weird,

[1311:19]

but it's the kind of thing you get used

[1311:21]

to quite quickly. I'm going to go ahead

[1311:23]

and create a third HTML file now by

[1311:26]

going back to my terminal window inside

[1311:29]

still my templates directory. And by

[1311:31]

convention, this file is going to be

[1311:32]

called layout.html. Why this? That's

[1311:34]

what the Flask documentation tells you

[1311:36]

to do. So, in layout.html, I can

[1311:38]

pull all of my boilerplate HTML, the

[1311:41]

stuff that is invariant and doesn't

[1311:42]

change. So, here we go: DOCTYPE html,

[1311:46]

HTML tag lang equals en close bracket

[1311:50]

open bracket head open bracket title.

[1311:53]

We'll call it hello for all of the

[1311:54]

pages. Open bracket body. And here's

[1311:57]

where it gets interesting. The body is

[1311:59]

the only thing that has been changing in

[1312:01]

these two examples. In index.html,

[1312:04]

it was a web form. In greet.html,

[1312:07]

it was just a simple string of hello, so

[1312:09]

and so. So, what I want to tell Flask is

[1312:12]

that everything in the body will just be

[1312:14]

a dynamic block of code. And the syntax

[1312:17]

for that, which takes a little bit of

[1312:18]

getting used to, but is also sort of

[1312:20]

copy-pasteable, is block body, using percent

[1312:24]

signs this time. And because I don't

[1312:26]

want any such body in the template, I'm

[1312:29]

going to literally close this block as

[1312:31]

follows. And here you see another

[1312:33]

example of sort of HTML like syntax but

[1312:36]

instead of using angle brackets, Jinja,

[1312:39]

the templating library that Flask

[1312:41]

uses, uses curly brace and percent sign

[1312:43]

to open the tag and then the opposite to

[1312:45]

close it. So what you really have here

[1312:47]

are two Jinja tags as we'll call them.

[1312:50]

This one is called block and I'm

[1312:51]

defining an arbitrary name here. I could

[1312:52]

have called it foo, bar, or baz, but

[1312:54]

because I want this block to refer to

[1312:56]

the body of the page by convention I'm

[1312:58]

going to call it body. And then this

[1312:59]

weird syntax which is used in some other

[1313:01]

languages too just means end whatever

[1313:04]

block you just began. And so again you

[1313:06]

just see reasonable people disagreeing.

[1313:08]

The people who invented HTML used nice

[1313:10]

angle brackets and words like these.

[1313:12]

The people who came up with Jinja used

[1313:14]

curly braces and percent signs. Why?

[1313:17]

Well, odds are these are not normal

[1313:18]

symbols that a human would type when

[1313:20]

writing uh code, at least in HTML. So

[1313:23]

they just chose something that probably

[1313:24]

wouldn't collide with actual syntax the

[1313:27]

human wants to use. So that's it for the

[1313:29]

template. This is now

[1313:32]

essentially a blueprint that doesn't

[1313:34]

have just a placeholder for a single

[1313:35]

word or value like name. I can put a

[1313:38]

whole chunk of code here now instead.

[1313:40]

And how do I do that? Well, let me go

[1313:42]

into index.html, which at

[1313:44]

the moment is a little duplicative in

[1313:46]

that it's got all of this boilerplate.

[1313:48]

So you know what? I'm going to go ahead

[1313:49]

and delete everything that is already in

[1313:52]

my layout both above and below that web

[1313:55]

form. And now I'm going to use a bit

[1313:57]

more Jinja syntax. This too takes a

[1313:59]

little while to memorize or copy paste.

[1314:01]

But if I want index.html

[1314:04]

to use the layout.html

[1314:07]

blueprint, I can simply say extends

[1314:10]

layout.html

[1314:12]

and then close tag using percent sign

[1314:15]

close bracket here. And then if what I

[1314:18]

want to plug into that layout is the

[1314:20]

following code, I can say as before

[1314:23]

block uh body and then down here I can

[1314:27]

say

[1314:29]

end block. And that's it. And just to be

[1314:32]

a little nitpicky, I'm going to

[1314:33]

de-indent that slightly. And now even

[1314:35]

though it looks like web pages suddenly

[1314:37]

look a lot uglier. Well, they do because

[1314:39]

like this is weird looking syntax, but I

[1314:41]

have now distilled index.html into its

[1314:44]

essence. This is the only thing that

[1314:46]

changes vis-à-vis the greeting page. And so

[1314:50]

I've put my HTML here that I care about.

[1314:52]

I've said to Flask, this is what

[1314:55]

index.html's

[1314:56]

body block shall be. Where to put it?

[1314:59]

Well, put it into that particular

[1315:02]

layout.html file. And so the logic for

[1315:05]

greet.html is the same thing. It's going

[1315:06]

to look just as weird, but again, you

[1315:08]

get used to it. Let's go ahead and

[1315:09]

delete everything that's boilerplate in

[1315:11]

greet.html, both above and below. Up at

[1315:15]

the top, let's tell Flask that

[1315:17]

greet.html, too, extends layout.html.

[1315:23]

And let's go ahead and say to Flask that

[1315:26]

the block uh called body shall be this

[1315:30]

for greet.html.

[1315:32]

And the end of this block is now down

[1315:35]

here. And just to be nitpicky, I'll

[1315:37]

de-indent that too. So again, the pages

[1315:39]

look a little weirder now, but it's

[1315:40]

going to follow a paradigm that we just

[1315:42]

see again and again, such that the only

[1315:43]

juicy stuff is what's inside of that

[1315:45]

body block. So now, if I go back to my

[1315:48]

layout, it looks exactly like this. This

[1315:50]

indeed is a placeholder, not just for a

[1315:52]

single variable like name or the

[1315:54]

placeholder we did before. This is the

[1315:56]

placeholder for a whole block of code

[1315:58]

that came from a file, not from a

[1316:00]

variable. And so if I go back into my

[1316:03]

other tab here, click back to go back

[1316:05]

to the web form and reload. Notice that

[1316:08]

I have the familiar looking form. But if

[1316:10]

I now look at my developer or if I look

[1316:14]

at view page source, notice everything

[1316:16]

that came from the web page from the

[1316:18]

server. Here's that boilerplate up

[1316:20]

here. Here's that boilerplate down

[1316:22]

here. And here's the stuff that's unique

[1316:24]

to this page. And recall too,

[1316:26]

aesthetically I de-indented it, which is

[1316:28]

why it's now no longer pretty printed in

[1316:30]

what the browser sees. Like that's okay.

[1316:31]

There's no reason to obsess over the

[1316:33]

indentation and the pretty printing of

[1316:35]

what the browser sees. Ultimately, the

[1316:38]

reason I did this indentation is because

[1316:41]

arguably when I'm in VS Code here and I

[1316:43]

look at index.html,

[1316:45]

this is clearly indented inside of the

[1316:47]

body block just so I know what's part of

[1316:49]

that block. The browser does not care

[1316:51]

about superfluous whitespace or lack

[1316:54]

thereof.
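
A sketch of the three templates after this refactoring, assuming the jinja2 package (which Flask depends on) is installed. To keep it self-contained, the files are held in an in-memory dictionary and rendered with Jinja directly; in the lecture they live in the templates directory and Flask's render_template does the loading.

```python
from jinja2 import DictLoader, Environment

# In-memory stand-ins for the three files in templates/.
templates = {
    "layout.html": """<!DOCTYPE html>
<html lang="en">
    <head>
        <title>hello</title>
    </head>
    <body>
        {% block body %}{% endblock %}
    </body>
</html>""",
    "index.html": """{% extends "layout.html" %}
{% block body %}
    <form action="/greet" method="get">
        <input autocomplete="off" autofocus name="name" placeholder="Name" type="text">
        <button type="submit">Greet</button>
    </form>
{% endblock %}""",
    "greet.html": """{% extends "layout.html" %}
{% block body %}
    hello, {{ name }}
{% endblock %}""",
}

env = Environment(loader=DictLoader(templates), autoescape=True)

# Each child template fills the layout's body block with its own content.
index_page = env.get_template("index.html").render()
greet_page = env.get_template("greet.html").render(name="David")
print(greet_page)
```

The printed page contains the boilerplate from layout.html above and below the greeting, even though greet.html itself no longer repeats any of it.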

[1316:56]

All right, questions on what we've just

[1317:00]

done here, which is to truly take this

[1317:02]

template out for a spin and now remove

[1317:05]

what redundancies I had accidentally

[1317:07]

introduced.

[1317:10]

Questions?

[1317:13]

No. Okay. Amazing. All right. Well,

[1317:15]

let's go ahead and look at this URL

[1317:17]

again. I'm not liking the fact that

[1317:19]

every example we've done thus far

[1317:21]

involves putting my name or Kelly's name

[1317:23]

right there in the URL bar. Well, why is

[1317:25]

that? Well, if I have like a nosy

[1317:26]

sibling and they sit down at my browser,

[1317:28]

they're going to see like every URL I

[1317:29]

visited, including whose name was

[1317:31]

greeted. Now, that's not all that big a

[1317:33]

deal, but now imagine it's a username

[1317:34]

and a password that the form is

[1317:36]

submitting or a credit card number that

[1317:37]

the form is submitting or just search

[1317:39]

terms that you don't want the world

[1317:40]

knowing you're searching for. They're

[1317:41]

going to end up in the URL bar. Why? If

[1317:43]

you are using method equals get for the

[1317:46]

form, that's how get works. It literally

[1317:48]

puts all of the HTTP parameters in the

[1317:50]

URL, which is wonderfully useful if it's

[1317:53]

sort of low-stakes stuff like the

[1317:55]

Google search box, or if it is, or

[1317:59]

potentially low-stakes stuff like the

[1318:00]

Google search box or if you just want to

[1318:02]

be able to hyperlink directly to a URL

[1318:05]

like this. In other words, if I put this

[1318:06]

into an anchor tag open bracket a href

[1318:10]

and a URL like this, I could deep link a

[1318:12]

user to a web page that just always says

[1318:14]

hello, David. So get strings contain all

[1318:16]

of the requisite information to render a

[1318:18]

page for the user. But this isn't really

[1318:20]

good for privacy. So recall that there's

[1318:23]

not only get, but there's also something

[1318:25]

called post. And post is just a

[1318:27]

different HTTP verb that essentially

[1318:29]

with respect to those virtual envelopes

[1318:31]

from last week, sort of puts the

[1318:33]

information more deeply inside of the

[1318:34]

envelope such that it's not written

[1318:36]

right there in the URL bar, but it's

[1318:38]

still accessible by the server. So if I

[1318:41]

do this, watch what happens. Let me uh

[1318:43]

go back into VS Code. Let me go back

[1318:46]

into index.html which has the form. And

[1318:49]

let me quite simply change the method

[1318:50]

from get to post. And now let me go back

[1318:54]

to my other browser tab. Back to the

[1318:56]

form and reload so that the form knows

[1318:58]

that the method has changed. Now type in

[1319:01]

David and click greet. And before I do

[1319:03]

that, let me zoom in on the URL bar.
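
To make the envelope analogy concrete, here is a sketch using only Python's standard library. It builds, but never sends, a get request and a post request carrying the same parameter; the example.com URL is the lecture's placeholder, and the variable names are illustrative.

```python
from urllib.parse import urlencode
from urllib.request import Request

params = urlencode({"name": "David"})

# GET: the parameters are appended to the URL after a question mark.
get_request = Request("http://example.com/greet?" + params)

# POST: the same parameters travel in the request body instead.
post_request = Request("http://example.com/greet", data=params.encode())

for req in (get_request, post_request):
    # Show the method, the URL the browser would display, and the body.
    print(req.get_method(), req.full_url, req.data)
```

Only the first URL reveals the name; in the second, the key-value pair is still part of the request but sits inside the body.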

[1319:05]

Notice that the URL does change. I'm at

[1319:09]

/greet, but I haven't revealed to

[1319:11]

the world or to anyone with physical

[1319:13]

access to my browser what name I just

[1319:15]

submitted. All they know is that I

[1319:17]

went to /greet, but not the key value

[1319:19]

pair or pairs that were passed in. Of

[1319:21]

course, this clearly hasn't worked. I've

[1319:23]

got an HTTP status code of 405, which

[1319:25]

means method not allowed. That's because

[1319:27]

Flask by default when defining routes

[1319:30]

simply assumes that you want get instead

[1319:33]

of post. Now, get is good for the

[1319:36]

default page. In fact, when I go back

[1319:38]

here, this is equivalent to me visiting

[1319:40]

the slash route just in the browser. So,

[1319:42]

I want my index to generally support

[1319:44]

get, but the greet route should support

[1319:48]

post. And the simplest way to do this is

[1319:50]

to pass in another argument to the route

[1319:52]

function, which we haven't needed before

[1319:53]

because the default is get. And I can

[1319:55]

instead tell Flask a comma-separated list

[1319:59]

of the HTTP methods that I want this

[1320:01]

route to support. So if I wanted to

[1320:02]

support just post, I can pass in a list

[1320:05]

containing just post. And recall that

[1320:08]

Python uses square brackets for lists,

[1320:10]

which are their version of arrays in C.
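
A brief sketch of the methods argument, assuming Flask is installed. The route bodies are placeholders; the point is only which status codes come back for which verbs.

```python
from flask import Flask

app = Flask(__name__)


@app.route("/")
def index():
    # No methods argument, so this route answers get but not post.
    return "form goes here"


@app.route("/greet", methods=["POST"])
def greet():
    # Only post is allowed on this route.
    return "greeting goes here"


client = app.test_client()
print(client.get("/greet").status_code)   # get on a post-only route
print(client.post("/greet").status_code)  # post on a post-only route
print(client.post("/").status_code)       # post on a default route
```

The first and third requests come back as 405, method not allowed, which is the same error the lecture hit before the route was told to accept post.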

[1320:13]

Now by default, this argument is

[1320:16]

methods equals get. And that's why the

[1320:17]

only thing supported a moment ago was

[1320:19]

get. That's why I'm now changing it to

[1320:21]

be post instead. I have to make one

[1320:24]

other change though. It turns out if you

[1320:26]

read the documentation when accessing

[1320:28]

HTTP parameters via post instead of get

[1320:32]

you move from using request.args to

[1320:35]

request.form. This is completely

[1320:37]

unintuitive, that request.args is get and

[1320:40]

request.form is post because they all

[1320:43]

come from forms. So it's bad naming

[1320:46]

admittedly. So you just kind of have to

[1320:47]

remember request.args is used for get.

[1320:50]

request.form is used for post. So all I

[1320:53]

need to do further is change this to be

[1320:55]

request.form

[1320:58]

and that's it. Now my web application

[1321:01]

will support web forms submitting to it

[1321:03]

via post instead of get. Let me go ahead

[1321:05]

and type in my name. Now I'll zoom in.

[1321:07]

Notice that the URL will again change to

[1321:09]

/greet with no parameters evident. But I

[1321:12]

will be greeted this time because the

[1321:14]

server knew to look deeper into that

[1321:15]

envelope for those key value pairs

[1321:18]

instead. And just to be sort of

[1321:22]

diagnostic about this, let me go back

[1321:23]

once more. Let me right-click or

[1321:25]

control-click on my desktop and go to

[1321:27]

inspect. Here's where developer tools

[1321:29]

can be super useful as well. I'm going

[1321:32]

to go in here and I'm going to go ahead

[1321:34]

and clear this. And now I'm going to

[1321:36]

type in David again and I'm going to

[1321:38]

click greet. But because I have the

[1321:40]

network tab open like we played with

[1321:41]

last week, it's going to show me all of

[1321:43]

the requests going from my browser to

[1321:45]

the server, which is going to be useful here

[1321:47]

because not only do I see, okay, it

[1321:48]

obviously worked because I got back a

[1321:50]

200, but if I click on this diagnostic

[1321:52]

output, I can actually go to the payload

[1321:55]

tab here and I'll see that the form data

[1321:58]

that was submitted was name, the value

[1322:00]

of which was David. So you can see what

[1322:02]

you're submitting. So you can do this

[1322:03]

today like if you want to log into some

[1322:05]

website uh Gmail or otherwise you can

[1322:07]

actually see all of the data that your

[1322:09]

own keyboard is submitting to the server

[1322:11]

even if it's using post because the

[1322:13]

browser that you control of course can

[1322:15]

see the same there.
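
A sketch of the server side of that exchange, assuming Flask 1.1 or later (so a returned dictionary is sent as JSON). The response keys "greeting" and "in_query_string" are made up for illustration; they just make it easy to see where the parameter arrived.

```python
from flask import Flask, request

app = Flask(__name__)


@app.route("/greet", methods=["POST"])
def greet():
    # Form fields sent via post arrive in request.form; request.args only
    # holds what was in the URL's query string.
    name = request.form.get("name", "world")
    return {
        "greeting": f"hello, {name}",
        "in_query_string": "name" in request.args,
    }


client = app.test_client()
response = client.post("/greet", data={"name": "David"})
print(response.status_code, response.get_json())
```

The data argument here plays the role of the form data shown in the browser's payload tab: it is submitted in the body of the request, not in the URL.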

[1322:18]

All right, any questions now on this

[1322:21]

transition from get to post?

[1322:26]

Kind of on a roll, or not going so well?

[1322:28]

We'll see. All right, so what more can

[1322:30]

we do with this? Well, let's give

[1322:32]

ourselves a couple more building blocks

[1322:33]

before we transition to actually

[1322:35]

implementing some real world problems as

[1322:37]

I did years ago with one such example.

[1322:39]

Suppose that I don't like this direction

[1322:42]

I'm going in, insofar as every time I

[1322:44]

have a page with a form, it submits to

[1322:46]

another route altogether. Cuz in your

[1322:48]

mind's eye, just kind of extrapolate.

[1322:50]

Well, if I have two forms on my page, I

[1322:52]

now need four routes. If I have three

[1322:54]

forms, I need six routes. It seems a

[1322:56]

little annoying that you use one route

[1322:58]

just to show the form and another route

[1323:00]

to process the form. This is going to

[1323:02]

get annoying over time because it's like

[1323:04]

twice as many routes as might be ideal.

[1323:06]

So, is there a way to get kind of the

[1323:07]

best of both worlds and combine these

[1323:08]

two routes into one so that everything

[1323:11]

related to greeting the user all happens

[1323:13]

in one place? Well, you can as follows.

[1323:16]

What I'm going to go ahead and do is

[1323:18]

delete my greet route altogether and

[1323:20]

most of my index route. But I'm going to

[1323:22]

ask a question. I'm going to first say

[1323:25]

that the methods that the index route

[1323:27]

supports now shall be both get and post

[1323:29]

as a comma-separated list there. And then

[1323:32]

inside of my index route I can simply

[1323:34]

ask a question of the form if the

[1323:36]

request that is submitted to the server

[1323:38]

has a method of post then assume that

[1323:44]

form was submitted. This is just a

[1323:46]

Python comment note to self that I'm

[1323:47]

going to come back to in a moment. Else,

[1323:49]

if the request method is not post. So I

[1323:51]

could technically say elif

[1323:54]

request.method equals equals get, then

[1323:59]

but this is kind of dumb because I only

[1324:00]

support two verbs. So I might as well

[1324:03]

just assume, for efficiency, that else handles

[1324:06]

the get implicitly. Then go ahead and

[1324:09]

assume that no form was submitted. So

[1324:12]

show form. So just notes to self as to

[1324:14]

what I want to do. So how do I show the

[1324:16]

form? Well, this line was easy: return

[1324:18]

render_template of index.html.

[1324:23]

If though the form was submitted, what

[1324:25]

do I want to do? Well, just as before,

[1324:26]

let's return render_template of greet.html,

[1324:32]

passing in a name value of

[1324:34]

request.form.get

[1324:36]

quote unquote name, or else a default value

[1324:38]

of world. So, the exact same logic from

[1324:41]

each of the two functions a moment ago,

[1324:44]

but I've now combined them into one by

[1324:46]

just using some conditional logic and

[1324:47]

just asking the server if the user got

[1324:49]

here via post, well, the only way they

[1324:51]

could have gotten here via post is by

[1324:53]

having clicked that button and submitted

[1324:54]

the form. So, let's just go ahead and

[1324:56]

greet them. Else, if they got here via

[1324:58]

get by just typing in example.com or

[1325:00]

whatever the actual URL is, let's go

[1325:02]

ahead and show them the template. So,

[1325:04]

it's still good design in that I have a

[1325:06]

separate template for each of these

[1325:08]

pieces of functionality that is only

[1325:10]

minimally different, but I'm sort of

[1325:13]

deciding which of those to show based on

[1325:16]

the actual logic in this here app. All

[1325:18]

right, so this is almost perfect except

[1325:21]

for one bug. What else needs to change

[1325:25]

if I've just combined my greet route and

[1325:29]

this default slash route as well?
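
A sketch of the combined route as described, assuming Flask is installed. The two templates are again inlined as strings (FORM_HTML and GREET_HTML are illustrative names) so the example runs on its own; the lecture renders index.html and greet.html instead.

```python
from flask import Flask, render_template_string, request

app = Flask(__name__)

# Inline stand-ins for index.html and greet.html (assumption, for brevity).
FORM_HTML = """<form method="post">
    <input autocomplete="off" autofocus name="name" placeholder="Name" type="text">
    <button type="submit">Greet</button>
</form>"""
GREET_HTML = "hello, {{ name }}"


@app.route("/", methods=["GET", "POST"])
def index():
    if request.method == "POST":
        # Form was submitted
        return render_template_string(GREET_HTML, name=request.form.get("name", "world"))
    else:
        # No form was submitted, so show it
        return render_template_string(FORM_HTML)


client = app.test_client()
print(client.get("/").status_code)
print(client.post("/", data={"name": "David"}).get_data(as_text=True))
```

One route now serves both purposes, and the request's method alone decides which of the two templates is returned.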

[1325:34]

Yeah.

[1325:36]

Yeah. So, in the form that's in

[1325:38]

index.html, recall that there's an

[1325:40]

action line that specifies like to what

[1325:42]

URL do you want to submit this? Well,

[1325:44]

let me go back to index.html. It can't

[1325:46]

be /greet anymore because that doesn't

[1325:48]

exist. So, I'm just going to delete the

[1325:49]

word greet and submit it to slash

[1325:51]

instead, which has the same effect as

[1325:54]

just omitting it entirely. If you

[1325:55]

don't specify an action, it submits to

[1325:57]

the very location that it came from. But

[1325:59]

if you want to be pedantic and even more

[1326:01]

clear, just specifying that the action

[1326:03]

now of this form is just this, then that

[1326:06]

will work here, too. All right, so let's

[1326:09]

test it. Let's go back to the other tab.

[1326:11]

Back to the form, reload. It's blank

[1326:13]

now. I type in David. Click greet. And

[1326:16]

this too is working. But again, if I go

[1326:18]

back and reload, get is working as well.

[1326:20]

But there's nothing ending up in the URL

[1326:22]

because I'm now using post, which again

[1326:24]

tends to be a good thing for privacy

[1326:26]

reasons as well. Let me show one final

[1326:28]

flourish before we transition to

[1326:30]

something real-world motivated. If I go

[1326:32]

into app.py, for a while now, I've been

[1326:36]

passing in this default value of world,

[1326:37]

which is fine, especially if it's

[1326:39]

something short and sweet. That's the

[1326:40]

default value. But I can actually put a

[1326:42]

bit of conditional logic in my template

[1326:44]

as well. So, in fact, let me go into

[1326:46]

greet.html and trust that I will

[1326:49]

now be passed in a name variable. But I

[1326:52]

can decide for myself in the template

[1326:54]

whether I want to say hello name or if

[1326:56]

it's blank hello world instead. And how

[1326:59]

might I do this? Well, I can always say

[1327:01]

hello, but then I'm going to use some

[1327:03]

Jinja syntax that we haven't seen yet.

[1327:05]

But it turns out in Jinja, the

[1327:07]

templating language that Flask uses, you

[1327:09]

can use Python-like syntax too. And you

[1327:11]

can ask questions like well if uh the

[1327:13]

name variable has a value well then go

[1327:16]

ahead and output the value of that name.
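
A sketch of this conditional, including the else branch described next, assuming the jinja2 package is installed. It renders a one-line template string directly with Jinja rather than going through Flask and greet.html.

```python
from jinja2 import Environment

env = Environment(autoescape=True)

# The if/else/endif tags choose between the variable and a literal fallback.
template = env.from_string(
    "hello, {% if name %}{{ name }}{% else %}world{% endif %}"
)

print(template.render(name="David"))
print(template.render(name=""))
print(template.render())
```

An empty string and a missing variable both count as false in the if test, so either one falls through to the literal world.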

[1327:19]

Else if the name variable does not have

[1327:21]

a value go ahead and output a literal

[1327:23]

value like world. Uh and then down here

[1327:25]

endif. So Jinja again is a little

[1327:27]

weird in that it says endblock, endif,

[1327:29]

but that's the way it is. But even

[1327:31]

though this looks a little weird, it's

[1327:32]

just a nice clever way of putting a bit

[1327:34]

of logic into my template. And if the

[1327:37]

name has a value, so it's not empty or

[1327:39]

none, go ahead and display it. Hence the

[1327:42]

curly braces. Else go ahead and

[1327:43]

literally say world. Why is it not

[1327:45]

problematic? And you can see the dots

[1327:46]

here that there's all of this white

[1327:48]

space after the word hello,

[1327:53]

like otherwise this would seem to create

[1327:55]

quite a messy paragraph or phrase of

[1327:58]

text in terms of white space. But

[1328:01]

>> HTML ignores superfluous white

[1328:04]

space. So anything more than a single

[1328:05]

space just gets canonicalized or

[1328:08]

collapsed into a single space. And we

[1328:09]

saw that recall last week accidentally

[1328:11]

when I had those three paragraphs of

[1328:13]

text uh from uh from the duck, but I

[1328:17]

wanted them deliberately to be separate

[1328:19]

paragraphs and they weren't because all

[1328:20]

of that white space was ignored until I

[1328:22]

actually introduced the uh paragraph tag

[1328:26]

instead. So this just moves some of that

[1328:28]

logic now to the templates. So for all

[1328:31]

this logic and more, here's the official

[1328:32]

documentation for Flask and specifically

[1328:35]

Jinja's own documentation, but for the

[1328:38]

most part, we've seen what's possible

[1328:40]

already. And I promised a real-world

[1328:42]

example. So here now it is. So uh back

[1328:45]

when I took CS50 as a sophomore, there

[1328:49]

was no web programming in the class. And

[1328:50]

frankly, there was barely any web

[1328:52]

actually in the world because it was all

[1328:53]

so new HTML and the like. But uh it was

[1328:57]

my sophomore spring maybe or junior

[1328:59]

fall that I also got involved in the

[1329:01]

freshman intramural sports program or

[1329:02]

Frosh IMs for short. And back in the

[1329:04]

day uh we would walk from say Matthews

[1329:07]

Hall to Wigglesworth uh freshman year at

[1329:09]

least to register for sports by filling

[1329:12]

out what was called a sheet of paper and

[1329:14]

then you would go to the proctor's dorm

[1329:15]

room and slide it like under their door

[1329:17]

or through the mail slot and that's how

[1329:18]

we registered for sports. It was sort of

[1329:20]

ripe for disruption before that was even

[1329:21]

a phrase. And so one of the very first

[1329:23]

projects I took on myself personally

[1329:25]

after taking CS50 was to figure out how

[1329:27]

web programming worked. And Python

[1329:29]

wasn't really a thing yet, uh,

[1329:32]

nor were half of the topics we've been

[1329:33]

talking about thus far. But at the time

[1329:35]

I learned a programming language called

[1329:37]

Perl. I learned a little something

[1329:38]

about CSV files which we did a couple of

[1329:41]

weeks back too. And I built this the

[1329:43]

freshman intramural sports website via

[1329:45]

which you could click on a bunch of

[1329:46]

links and get some information. But most

[1329:48]

importantly, you could register for

[1329:49]

sports by typing in your name,

[1329:51]

selecting the sport for which you want

[1329:53]

to register, click submit, and no longer

[1329:55]

walk across Harvard Yard with a piece of

[1329:56]

paper to actually register for sports.

[1329:58]

So, we thought we'd use this as sort of

[1330:00]

the beginning of a motivation for how we

[1330:03]

can now solve problems using web-based

[1330:04]

interfaces using code. Um, and also what

[1330:07]

not to do, like background images that

[1330:09]

repeat like this are not really in

[1330:10]

fashion anymore, nor arguably in 1997.

[1330:13]

Um but let's leave that as a cliffhanger

[1330:14]

and come back in 10 minutes after a

[1330:16]

snack with re-implementing the Frosh

[1330:18]

IMs website. All right, we are back. So

[1330:21]

among the goals now are to recreate the

[1330:24]

beginnings of a site like this for Frosh

[1330:26]

IMs whereby we want to enable students

[1330:28]

to uh visit a form, fill out that form

[1330:31]

and submit it to a server and then

[1330:32]

register. And we'll dispense with all of

[1330:35]

the amazing graphics and such and keep

[1330:37]

it fairly simplistic and core HTML. So

[1330:40]

let's go ahead and do this. Back here in

[1330:41]

VS Code, I've gotten ready now for this

[1330:44]

next set of examples. And in particular,

[1330:45]

I've created in advance a directory

[1330:47]

called froshims containing app.py,

[1330:51]

requirements.txt, and templates, which

[1330:53]

are essentially the same as the ones we

[1330:54]

just created, but I stripped out the

[1330:56]

hello and greeting specific stuff. I'm

[1330:58]

going to go ahead in this terminal and

[1331:00]

do flask run. So, I get the server up

[1331:01]

and running again on port 5000. And then

[1331:04]

I'm going to go ahead and open up

[1331:05]

another terminal here as I did before.

[1331:07]

cd into froshims in that terminal where

[1331:09]

I'll see the exact same files and I'll

[1331:11]

give you a quick tour of what I created

[1331:13]

in advance. So here in app.py is quite

[1331:16]

simply the simplest of applications that

[1331:18]

just renders the index.html template

[1331:21]

with an expectation in a moment that

[1331:22]

we're going to make it more interesting

[1331:24]

than that. Meanwhile, if I open my temp

[1331:26]

uh my terminal again and open up

[1331:27]

requirements.txt, it just mentions

[1331:30]

flask, but it's already installed. So no

[1331:32]

more to say about that for now. Now, let

[1331:34]

me go ahead lastly and open up

[1331:35]

templates, uh, the templates folder. Two

[1331:37]

files there in the first of which is

[1331:39]

layout.html, which looks almost the

[1331:42]

same, except I did add a slightly more

[1331:44]

user-friendly tag to the head of the

[1331:46]

page, which you might not have seen

[1331:48]

before, but this is a tag that

[1331:49]

essentially you can copy and paste into

[1331:50]

templates of your own that help the

[1331:53]

content of a page resize to be mobile

[1331:55]

friendly. In fact, without this line, if

[1331:57]

you were to develop problem set 9 or

[1331:59]

your final project for the web and then

[1332:01]

try to access the site on a phone,

[1332:03]

everything might look quite a bit too

[1332:04]

small, font sizes and more, this line

[1332:07]

tends to help the browsers resize

[1332:08]

dynamically so that it actually matches

[1332:10]

the device's own width. For

[1332:13]

instance, a phone versus a laptop or

[1332:15]

desktop. But otherwise, everything else

[1332:16]

is the same there, including the

[1332:18]

placeholder for the body block that I've

[1332:20]

defined here on line 9. Lastly, there's

[1332:23]

one more file that at the moment doesn't

[1332:25]

do anything all that interesting except

[1332:27]

is ready to contain the contents of the

[1332:30]

registration form for Frosh IMs. So,

[1332:34]

let's go ahead and start with actually

[1332:35]

that. Let me quickly whip up a form that

[1332:38]

minimally gives the user something that

[1332:40]

they can submit to the server to

[1332:41]

register for sports and then we'll

[1332:43]

improve upon it a bit iteratively. So,

[1332:45]

here inside of the body of index.html,

[1332:48]

which is going to extend the actual

[1332:50]

layout, the blueprint we already

[1332:51]

created. I'm going to have a quick title

[1332:53]

for the page like register just to make

[1332:55]

clear to the student what they need to

[1332:56]

do using the H1 which is the big and

[1332:58]

bold tag. Then I'm going to go ahead and

[1333:00]

have a form tag uh whose uh action is

[1333:04]

going to be anything I want, but since I

[1333:06]

want the user to register, I'm going to

[1333:08]

have it go to /register, which makes

[1333:10]

more sense semantically than greet now

[1333:12]

because we're doing something else. The

[1333:14]

method I'm going to have the student use

[1333:16]

is post, if only because they don't want

[1333:18]

their roommates knowing what they

[1333:19]

visited in their browser. So this way it

[1333:20]

will tuck the HTTP parameters deeper in

[1333:22]

that virtual envelope so it's not stored

[1333:24]

in the browser's history. Inside of this

[1333:26]

form, I'm going to have minimally an

[1333:29]

input box for the student's name. So

[1333:31]

I'll call that aptly name and set name

[1333:34]

equal to name in my HTML. The type of

[1333:36]

this text box will be exactly that text.

[1333:38]

And then just to make it a little more

[1333:39]

user friendly, I'm going to add a

[1333:41]

placeholder of name so they know what to

[1333:42]

do. I'm going to go ahead and uh turn

[1333:45]

off autocomplete in case multiple

[1333:46]

roommates want to uh sign in from the

[1333:48]

same computer, register from the same

[1333:50]

computer. And then we'll turn on

[1333:51]

autofocus to put the cursor in that name

[1333:53]

box. And then, and you didn't see this

[1333:55]

last week, but if you've ever wondered

[1333:57]

how drop-down menus are implemented in

[1333:59]

HTML, if you've never done this

[1334:00]

yourself, those drop-down menus on web

[1334:02]

pages are called select menus. And if I

[1334:04]

want the user to select a sport to

[1334:06]

register for, I'm going to call this

[1334:07]

input a uh sport. And this is an

[1334:11]

alternative to just having a generic

[1334:13]

text box where we have the students type

[1334:15]

in the sport they want to register for

[1334:16]

which would be fraught with

[1334:17]

typographical errors and changes in

[1334:19]

capitalization. A drop-down menu of

[1334:21]

course standardizes what the human can

[1334:23]

select. So inside of this dropdown I'm

[1334:26]

going to have a few options. uh the

[1334:27]

first of which uh will be uh basketball

[1334:31]

for instance, the second of which will

[1334:33]

be soccer and the third of which I think

[1334:36]

was the first three with which we

[1334:37]

debuted back in the day was ultimate

[1334:40]

frisbee. Now these option tags can take

[1334:42]

some attributes. Uh by default they will

[1334:45]

take on the value of whatever words are

[1334:47]

typed in between the open and close

[1334:48]

tags. But just to be pedantic I'm going

[1334:50]

to make clear that the value of

[1334:52]

selecting this option shall be

[1334:54]

basketball. But I could change it to be

[1334:56]

something else if I so chose. The value

[1334:58]

of this selection will be soccer and the

[1335:01]

value of this last option will be

[1335:04]

ultimate frisbee just in case I want to

[1335:06]

store something else in my database

[1335:08]

ultimately. Now that is a complete

[1335:10]

index.html I think. So if I go back to

[1335:14]

uh my browser tab which previously was

[1335:16]

showing me the hello program because I

[1335:18]

stopped and restarted Flask and you can

[1335:20]

stop Flask by just hitting Ctrl-C, uh, for

[1335:22]

interrupting it. I'm going to reload the

[1335:24]

page and I should now see okay a

[1335:25]

slightly more interesting form with a

[1335:27]

name box where the cursor is blinking

[1335:29]

there and then a select menu, a dropdown,

[1335:31]

with three options. Now it's a little

[1335:33]

presumptuous of me to select basketball

[1335:35]

by default and in fact this is kind of

[1335:37]

inviting user error if they type in

[1335:39]

their name don't really think about it

[1335:40]

and now register for basketball

[1335:42]

accidentally. So I'm going to make a

[1335:44]

couple of improvements here. I'm

[1335:45]

actually gonna have essentially a blank

[1335:47]

option at the top whose value is nothing

[1335:50]

and I'm gonna have it just labeled

[1335:52]

sport. And just to be super clear, I'm

[1335:55]

going to select this value by default.

[1335:57]

So the option tag in HTML supports not

[1335:59]

only a value attribute, but it turns out

[1336:01]

a selected attribute, which if present

[1336:03]

means that's the option that will be

[1336:05]

selected by default. So if we go back

[1336:08]

now to this page and reload to get a new

[1336:10]

copy of the HTML, looks a little better.

[1336:12]

I still have the name at left, but the

[1336:14]

sport menu now looks like this. So, it's

[1336:16]

a little more clear what I want them to

[1336:18]

do from this dropdown. And sport

[1336:21]

deliberately on the back end won't have

[1336:23]

a value. And theoretically, this will

[1336:24]

help me determine if they actually

[1336:26]

selected a sport or just clicked

[1336:28]

register and ignored the drop down

[1336:30]

still. But I do need a way for them to

[1336:32]

register ideally by clicking a button.

[1336:34]

So, I'm going to add a button, the type

[1336:36]

of which is submit. And then I'm going

[1336:38]

to have this button's label be register.
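
The form built up over the last few steps might look roughly like the markup below. This is a sketch reconstructed from the description, so the attribute order and the capitalization of the labels are assumptions. The markup is held in a Python string, and a small hypothetical helper (`control_names`) uses the standard library's HTML parser to list the named controls, which are the keys the server will later look up.

```python
from html.parser import HTMLParser

# Assumed contents of the form in index.html (it would sit inside the
# template's body block); reconstructed from the spoken description.
FORM_HTML = """
<h1>Register</h1>
<form action="/register" method="post">
    <input autocomplete="off" autofocus name="name" placeholder="Name" type="text">
    <select name="sport">
        <option selected value="">Sport</option>
        <option value="Basketball">Basketball</option>
        <option value="Soccer">Soccer</option>
        <option value="Ultimate Frisbee">Ultimate Frisbee</option>
    </select>
    <button type="submit">Register</button>
</form>
"""


class ControlCollector(HTMLParser):
    """Collect the name attribute of every <input> and <select> element."""

    def __init__(self):
        super().__init__()
        self.names = []

    def handle_starttag(self, tag, attrs):
        if tag in ("input", "select"):
            name = dict(attrs).get("name")
            if name is not None:
                self.names.append(name)


def control_names(markup):
    """Return the names of the form's input and select controls, in order."""
    parser = ControlCollector()
    parser.feed(markup)
    return parser.names


if __name__ == "__main__":
    print(control_names(FORM_HTML))
```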

[1336:40]

So now if I go back to the form once

[1336:42]

more, reload, I now have I think a

[1336:45]

complete form, albeit not very pretty,

[1336:47]

via which David can register, for

[1336:49]

instance, for basketball by clicking

[1336:51]

register. And ah darn it, I have a 404

[1336:55]

not found. But why is that?

[1336:58]

Why is nothing yet found? Why is

[1337:02]

/register not found? Yeah,

[1337:05]

>> what's that?

[1337:07]

>> I haven't Well, I haven't linked the

[1337:08]

option to anything. I think the form has

[1337:11]

been linked. Whoops. The form is telling

[1337:13]

the browser to go to /register. So,

[1337:15]

this is correct behavior. But if we go

[1337:18]

to app.py, like there's no route defined

[1337:20]

for /register. So, of course, it's not

[1337:22]

found because there's an infinite number

[1337:24]

of routes that don't exist and register

[1337:25]

is currently among those. So, I can

[1337:28]

define that myself. I can say app.route

[1337:30]

quote unquote register. Uh, I do want to

[1337:33]

use post. So I need to proactively say

[1337:35]

that the methods this uh function will

[1337:37]

support will be indeed post instead of

[1337:40]

the default of get. I'm going to define

[1337:41]

an actual function to call when this

[1337:43]

route is used. And by convention I'm

[1337:44]

going to call it just register even

[1337:46]

though I could call it anything I want.

[1337:48]

And inside of my register function, well

[1337:50]

for now I'm going to cheat a little bit.

[1337:52]

I'm going to at least just say uh I'm

[1337:55]

going to at least check that the user

[1337:57]

has given me a name and a sport. So how

[1337:59]

can I express this? Well, because I have

[1338:01]

already imported the request global

[1338:03]

variable that comes with flask, I can

[1338:05]

ask questions of it. And I can say

[1338:06]

something like if it is not the case

[1338:08]

that request.form.get("name")

[1338:12]

has a value or if it's the case that or

[1338:15]

if it's not the case that

[1338:16]

request.form.get("sport")

[1338:19]

has a value, then let's go ahead and

[1338:21]

give the user uh a warning of sorts.

[1338:24]

I'll return render template of a file

[1338:26]

called failure.html.

[1338:28]

This doesn't exist yet, but no big deal.
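
A sketch of the route being written here, assuming Flask is installed. To keep the example self-contained it returns plain strings where the lecture calls `render_template` on failure.html and success.html; it also includes the success branch that is added a moment later.

```python
from flask import Flask, request

app = Flask(__name__)


@app.route("/register", methods=["POST"])
def register():
    # Reject the submission if either field is missing or blank.
    if not request.form.get("name") or not request.form.get("sport"):
        # Stand-in for render_template("failure.html")
        return "You are not registered!"
    # Stand-in for render_template("success.html")
    return "You are registered!"


if __name__ == "__main__":
    # Flask's built-in test client submits forms without a running server.
    client = app.test_client()
    reply = client.post("/register", data={"name": "David"})
    print(reply.get_data(as_text=True))
```

Because only POST is listed in `methods`, a plain GET to /register is refused rather than handled by this function.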

[1338:30]

Let me go back into my terminal. Let me

[1338:33]

uh go into templates and create a file

[1338:35]

called failure.html.

[1338:37]

And in this file, I'm going to say that

[1338:39]

it extends

[1338:41]

uh layout.html.

[1338:46]

And then it has a block body inside of

[1338:50]

which is going to be something like

[1338:51]

super trivial for now, just to get us

[1338:54]

going. And this failure page is simply

[1338:56]

going to say you are not registered

[1339:00]

exclamation point and then end block. So

[1339:03]

that's it. Just sort of an error page

[1339:04]

that now exists. I'm going to close it

[1339:06]

out of sight, out of mind. But I think

[1339:08]

this now will work. If it is not the

[1339:10]

case that the user gave us a name or

[1339:12]

it's not the case that the user gave us

[1339:13]

a sport, we will show this error

[1339:15]

message. Otherwise, if all seems to be

[1339:17]

well, for now, we're not going to do

[1339:19]

anything useful with the information,

[1339:20]

but I'm going to go ahead and return

[1339:22]

render template of success.html,

[1339:25]

which is simply going to assume that the

[1339:27]

user was successfully registered. So,

[1339:28]

let's whip that up quickly. Uh, I'm

[1339:30]

going to go ahead and code up

[1339:32]

success.html

[1339:34]

inside of this file, which will

[1339:35]

similarly extend uh layout.html

[1339:39]

inside of which there's a body block

[1339:42]

that quite simply says, "How about you

[1339:44]

are registered?" and we'll just pretend

[1339:45]

that it is so, endblock. So that's it.

[1339:48]

In short, I want the two templates that

[1339:49]

show failure or success respectively. So

[1339:53]

I think now in app.py, we're in better

[1339:55]

shape. I now have a register route that

[1339:58]

will get called if post is used to visit

[1340:00]

it. And I'm going to check request.form,

[1340:02]

which is where you get the post

[1340:03]

variables from. Check whether name or

[1340:05]

sport is provided. And I'm going to

[1340:07]

render a template accordingly. So let's

[1340:09]

try this. Let me go back to my other tab

[1340:11]

and go back to the form. Let me type in

[1340:13]

my name, David, but no sport. Click

[1340:16]

register, and I have an internal server

[1340:19]

error, which was not intended. So, let's

[1340:21]

figure out how to diagnose this. So, it

[1340:22]

seems to be the case that I'm at

[1340:24]

/register. That was intended, but

[1340:26]

something clearly went wrong. So, let's

[1340:27]

go back. Now, I could just kind of stare

[1340:29]

at my code endlessly, but recall that

[1340:31]

there should be some hints in my

[1340:32]

terminal window that's running Flask.

[1340:34]

So, let me go back to my other terminal,

[1340:36]

and there it is. Unexpected char double

[1340:39]

quote at line 11. Well, look, sounds

[1340:42]

like user error. So, that is in

[1340:44]

failure.html.

[1340:45]

And you can kind of see it because Flask

[1340:47]

is like underlining it literally for me.

[1340:48]

What did I do that was stupid?

[1340:52]

Yeah, I just didn't close my quote. So,

[1340:53]

amateur hour here. So, let me go into I

[1340:56]

do need to open it after all,

[1340:57]

ironically. So, let's go ahead in my

[1340:59]

other terminal, open up failure.html.
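
The kind of mistake being tracked down here, an unclosed quote in the extends line, can be reproduced in isolation. This is a sketch assuming the jinja2 package is installed; the `parses` helper is hypothetical, and the broken line is a guess at what the template contained.

```python
from jinja2 import Environment, TemplateSyntaxError

env = Environment()

# Assumed first line of the template, with and without the closing quote.
BROKEN = '{% extends "layout.html %}'
FIXED = '{% extends "layout.html" %}'


def parses(source):
    """Return True if Jinja can parse the template source, else False."""
    try:
        env.parse(source)
    except TemplateSyntaxError:
        return False
    return True


if __name__ == "__main__":
    print(parses(BROKEN), parses(FIXED))
```

Parsing alone does not load layout.html, so the check runs without any template files on disk.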

[1341:03]

And there it is. One stupid character

[1341:05]

away from correctness. All right, let's

[1341:07]

close this again. Go back to the other

[1341:09]

tab. Let's try this again. David as my

[1341:12]

name but no sport. Register. Okay, you

[1341:15]

are not registered. I don't know why,

[1341:17]

but I know I'm not registered. Let's try

[1341:18]

it again with a name. Uh with no name,

[1341:21]

but yes, a sport. Click register. You

[1341:24]

are not registered. All right, just for

[1341:25]

good measure, let's give no name and no

[1341:27]

sport. You are not registered. So, that

[1341:29]

seems to be working. Let's now

[1341:30]

cooperate. Let's go ahead and register

[1341:32]

as David for basketball. Cross my

[1341:34]

fingers. Damn it. An internal server

[1341:38]

error. Let's try to learn from my past

[1341:39]

mistakes. Let's open up this, eyeball it.

[1341:42]

I did it twice even though that was not

[1341:43]

copy paste. So 0 for two. All right,

[1341:46]

let's go back here. Notice now I can

[1341:48]

actually just click reload because the

[1341:49]

browser is smart enough to remember what

[1341:51]

I just posted to the server. So if I

[1341:53]

click reload, you'll be prompted to

[1341:55]

confirm the form submission lest you be

[1341:58]

doing this on a website with your credit

[1341:59]

card or something where you don't want

[1342:01]

to send it twice. But in this case, I'm

[1342:03]

fine with sending my name and basketball

[1342:04]

twice. So I'm going to click continue.

[1342:06]

And this time it worked telling me that

[1342:09]

I'm actually registered. So I'm not

[1342:11]

doing anything with the student's data,

[1342:13]

but at least I am validating that they

[1342:16]

gave me some input. Now there's a catch

[1342:19]

here. The catch of course with HTML is

[1342:22]

that it's all executed client side.

[1342:25]

And so for instance, suppose that a

[1342:27]

student is really upset that we only

[1342:29]

offer basketball, soccer, and ultimate

[1342:31]

frisbee. And maybe they really want to

[1342:33]

register for volleyball even though

[1342:34]

we're not offering volleyball. Well,

[1342:35]

there's arguably like a security

[1342:37]

vulnerability here where technically my

[1342:39]

code right now will tolerate any user

[1342:42]

input even if it's not in that dropdown

[1342:44]

because after all, let me go ahead and

[1342:46]

right-click or control-click on my web

[1342:48]

page and open up the developer tools.

[1342:51]

Let me go into the form as sort of a

[1342:53]

hacker type student. Let me go into the

[1342:55]

select menu and okay, no big deal. If I

[1342:58]

want uh ultimate frisbee to exist, well,

[1343:00]

I just need to know a little HTML. I'm

[1343:02]

going to right-click on that element and

[1343:04]

click edit as HTML. This literally lets

[1343:07]

me start editing the HTML of the page.

[1343:09]

I'm going to give myself my own option.

[1343:11]

Option value equals volleyball. Close

[1343:15]

bracket volleyball. Uh, enter. And now

[1343:19]

when I close developer tools, woohoo, I

[1343:21]

can register for volleyball if I want.

[1343:23]

So let's select volleyball. Type in

[1343:26]

maybe Kelly is hacking the site.

[1343:28]

Register. And she is registered for

[1343:30]

volleyball apparently. All right. So the

[1343:32]

short answer, the takeaway

[1343:34]

here is do not trust user input ever for

[1343:37]

reasons we've already seen when we

[1343:38]

discussed SQL, ever more so now that we're

[1343:40]

dealing with the web because who knows

[1343:42]

what users are going to do accidentally

[1343:44]

foolishly or even in Kelly's case here

[1343:46]

maliciously trying to pass data that we

[1343:49]

did not expect. So what would be the

[1343:50]

defense against this? Like this is just

[1343:53]

how HTML works and assume that I'm

[1343:55]

actually registering Kelly for sports

[1343:57]

now and somehow she's now signed up for

[1343:58]

volleyball in our database. What would a

[1344:00]

solution be logically here?

[1344:03]

Yeah.

[1344:05]

>> Yeah. So maybe do some server side

[1344:07]

validation. So don't just blindly check

[1344:09]

that we have a value from the user.

[1344:12]

Actually check that it's one of those

[1344:13]

sports. So if I go back to app.py, I

[1344:16]

could do this in a few ways. And maybe

[1344:18]

my first instinct would be this. Let's

[1344:20]

check for the name and do this. But

[1344:22]

let's also do this. Like if

[1344:25]

request.form.get quote unquote uh sport. And

[1344:30]

actually, let's put this in a variable

[1344:31]

just to make it even easier to type. So,

[1344:33]

sport equals this. If sport uh how about

[1344:38]

does not equal uh what was it? Basketball

[1344:42]

and sport does not equal uh soccer

[1344:48]

and sport does not equal quote unquote

[1344:52]

ultimate frisbee, then render an error.

[1344:55]

So, uh, return render template quote

[1344:59]

unquote failure.html.
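
The first-instinct check described here, written as a standalone function so it can be tried without a web server. This is a sketch: the `is_offered` name is hypothetical and the capitalization of the sport names is an assumption.

```python
def is_offered(sport):
    """Return True only if sport is one of the three sports on offer."""
    # One comparison per sport, mirroring the chain of "does not equal" checks.
    if sport != "Basketball" and sport != "Soccer" and sport != "Ultimate Frisbee":
        return False
    return True


if __name__ == "__main__":
    # A value injected through the browser's developer tools is rejected.
    print(is_offered("Volleyball"))
    print(is_offered("Soccer"))
```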

[1345:01]

So, now if I go back to this form and

[1345:04]

try to register as Kelly again, you are

[1345:06]

not registered. So, I somehow caught her

[1345:08]

because volleyball of course is not in

[1345:10]

the list of sports that I put there. But

[1345:12]

what might you not like about this

[1345:14]

approach?

[1345:16]

Even if you've never done web stuff

[1345:17]

before, what's bad about this?

[1345:20]

>> Yeah, I have to hardcode every single

[1345:22]

sport now in not only app.py to check

[1345:25]

for the validity on the server of what

[1345:27]

the human has typed in. But recall

[1345:29]

that the drop down itself came from

[1345:30]

index.html. So I now in duplicate have

[1345:33]

to put like all of the sports there too.

[1345:35]

So like this just seems bad to have

[1345:37]

duplication. And so better might be to

[1345:39]

do something more like this at the top

[1345:41]

of my file here. Why don't I go ahead

[1345:42]

and just give myself a global variable

[1345:44]

which in the context of this web app is

[1345:46]

perfectly reasonable. So I can access it

[1345:48]

anywhere. Let's call it SPORTS in all

[1345:50]

caps just to note that this is a global

[1345:52]

variable and constant. Even though Python

[1345:54]

does not have consts in the sense that C

[1345:56]

does, but this is sort of on the honor

[1345:58]

system. If you see a variable in all

[1345:59]

caps like this, just don't mess with it.

[1346:01]

Use it, but don't mess with it. So, uh,

[1346:04]

inside of the square brackets, this is

[1346:06]

going to be a list of the sports that I

[1346:07]

do want to support. So, basketball,

[1346:11]

uh, soccer,

[1346:13]

ultimate frisbee, and that's it. Now,

[1346:16]

instead of doing all of this, what I can

[1346:19]

instead ask is a simpler question like

[1346:21]

this. If sport not in SPORTS, then go

[1346:25]

ahead and return render template quote

[1346:28]

unquote failure.html.
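
The same validation with the sports kept in one place, sketched as plain Python. The `is_valid` helper is hypothetical; it also folds in the name check, which is the combined condition the lecture arrives at next.

```python
# All caps signals "treat this as a constant"; Python does not enforce it.
SPORTS = ["Basketball", "Soccer", "Ultimate Frisbee"]


def is_valid(name, sport):
    """Return True if a name was given and the sport is in the allowed list."""
    return bool(name) and sport in SPORTS


if __name__ == "__main__":
    print(is_valid("Kelly", "Volleyball"))
    print(is_valid("David", "Basketball"))
```

Adding or removing a sport is now a one-line change to the list, with no comparisons to update.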

[1346:30]

And I can actually tighten this up a

[1346:32]

little bit. I don't need two calls to

[1346:34]

failure.html. Why don't I just borrow

[1346:36]

this code and say or uh sport not in

[1346:42]

sports render a failure. And now I've

[1346:44]

tightened this up quite a bit more, but

[1346:47]

I'm essentially using Python to just ask

[1346:48]

is the sport that Oops, sorry, I deleted

[1346:50]

too much. Sport equals actually, let's

[1346:53]

just tighten it up further. Sport does

[1346:54]

not exist. So let's do request.form.get

[1346:57]

quote unquote sport. So if the sport

[1346:59]

that the human typed in or selected from

[1347:02]

the drop down somehow is not in this

[1347:04]

global list of possible sports, well

[1347:07]

then it's a failure. Don't let Kelly or

[1347:08]

whoever register instead. But if I now

[1347:11]

have this global variable, I can be a

[1347:12]

bit smarter in my template. I don't need

[1347:14]

to manually write out all three of these

[1347:17]

sports here. Instead, I think I can be

[1347:19]

smart about this. And when I render

[1347:21]

index.html itself, why don't I just pass

[1347:24]

in a variable called sports for

[1347:26]

instance, set it equal to the value of

[1347:28]

that global array. And then in my

[1347:30]

template, and here's where templating

[1347:32]

again gets interesting and starts to

[1347:34]

save you time. Let me go into

[1347:35]

index.html, delete all but the

[1347:39]

default value the blank one and do

[1347:41]

something like this. Jinja, it turns out,

[1347:43]

also supports loops like Python for

[1347:45]

sport in sports using the curly braces

[1347:48]

and the percent signs. I can now

[1347:50]

dynamically generate options as many as

[1347:53]

I want. So option value equals quote

[1347:55]

unquote the current sport close uh quote

[1347:58]

there close bracket sport. So it's a

[1348:01]

little redundant but again this is just

[1348:03]

how HTML is. This is what the human

[1348:05]

sees. This is the value that gets

[1348:07]

submitted to the server in case you want

[1348:08]

one to differ from the other. And then

[1348:10]

below that option line, I can say endfor,

[1348:13]

which is a bit weird, but that's how

[1348:15]

it works in Jinja to stop that loop. So

[1348:17]

this is kind of powerful. Now if I have

[1348:19]

three sports, 30 sports, all of the

[1348:22]

options will be dynamically generated by

[1348:24]

this template. And so now we're starting

[1348:26]

to save ourselves time and I can

[1348:28]

centrally manage all the sports by just

[1348:30]

updating this global list here in

[1348:32]

app.py. So, let's go back to the

[1348:33]

browser, uh, back to the form, reload,

[1348:36]

and you'll see that the drop-down

[1348:38]

thankfully still works the same way, but

[1348:40]

all of those options were dynamically

[1348:42]

generated. Indeed, if I view page source

[1348:44]

from my browser, you'll see, and there's

[1348:46]

some extra white space there because

[1348:47]

the loop was adding some white space on

[1348:49]

each iteration, I still have the three

[1348:51]

sports, but not volleyball, as was my

[1348:53]

intention. So now if uh if Kelly even

[1348:56]

tries hacking this version of the site

[1348:58]

by going in here and select and typing

[1349:00]

in volleyball manually registering the

[1349:02]

logic will still catch it because only

[1349:04]

those three sports are in that array. So

[1349:06]

it's perfectly fine for me now to

[1349:08]

register for basketball because it's

[1349:10]

among the sports sorry in that list not

[1349:13]

array. Questions on any of these here

[1349:17]

techniques.
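
Putting the pieces of this section together, a sketch of the application at this point, assuming Flask is installed. To stay in one file it uses `render_template_string` with an inline template instead of index.html, and plain strings in place of the failure and success templates; the capitalization of the sport names is an assumption.

```python
from flask import Flask, render_template_string, request

app = Flask(__name__)

# Single source of truth for both the drop-down and the validation.
SPORTS = ["Basketball", "Soccer", "Ultimate Frisbee"]

# Inline stand-in for index.html; the loop emits one option per sport.
INDEX = """
<form action="/register" method="post">
    <input autocomplete="off" autofocus name="name" placeholder="Name" type="text">
    <select name="sport">
        <option selected value="">Sport</option>
        {% for sport in sports %}
            <option value="{{ sport }}">{{ sport }}</option>
        {% endfor %}
    </select>
    <button type="submit">Register</button>
</form>
"""


@app.route("/")
def index():
    return render_template_string(INDEX, sports=SPORTS)


@app.route("/register", methods=["POST"])
def register():
    # Trust nothing from the client: the sport must be in the server's list.
    if not request.form.get("name") or request.form.get("sport") not in SPORTS:
        return "You are not registered!"
    return "You are registered!"
```

Even if a visitor edits the page to add their own option, the server's check is what decides whether the registration goes through.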

[1349:20]

All right how about another type of

[1349:21]

form? So, select menus are nice, but you

[1349:24]

also might see radio buttons on

[1349:25]

websites, which are the mutually

[1349:26]

exclusive little circles that you can

[1349:28]

select to choose one or another option.

[1349:30]

Uh, let me go back to index.html and

[1349:33]

just show you how those can be created

[1349:34]

as well. Instead of using a select menu,

[1349:37]

turns out we can create a whole bunch of

[1349:39]

inputs uh of radio type as follows

[1349:42]

uh, of radio button type, as follows.

[1349:44]

for each sport. So for sport in sports,

[1349:48]

let's go ahead and output

[1349:50]

in between this tag and the endfor the

[1349:53]

following input type equals radio

[1349:59]

uh and let's give it a name. The name of

[1350:01]

this radio box is going radio uh button

[1350:04]

is going to be sport and the value of

[1350:06]

the current input is going to be quote

[1350:08]

unquote sport. And the word that the

[1350:12]

human's going to see is as before sport.
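
A sketch of the radio-button loop just described, assuming the jinja2 package is installed. The template is an inline string rather than index.html, and the `render_radios` helper name is hypothetical.

```python
from jinja2 import Template

SPORTS = ["Basketball", "Soccer", "Ultimate Frisbee"]

# Every input shares name="sport"; the shared name is what groups the
# buttons so that only one can be selected at a time.
RADIOS = Template(
    "{% for sport in sports %}"
    '<input name="sport" type="radio" value="{{ sport }}"> {{ sport }}\n'
    "{% endfor %}"
)


def render_radios(sports):
    """Return one radio input per sport, each followed by its visible label."""
    return RADIOS.render(sports=sports)


if __name__ == "__main__":
    print(render_radios(SPORTS))
```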

[1350:14]

So notice it's just another type of

[1350:16]

input. Previously we've seen text for

[1350:19]

instance two lines above. We also saw

[1350:21]

last time search. We saw email. There's

[1350:23]

a bunch of text input types. This one

[1350:26]

though is going to display as a radio

[1350:28]

button instead. And the human is going

[1350:29]

to see this label here. If I now go back

[1350:31]

to my other browser tab and click back,

[1350:34]

click reload on the form. I should see

[1350:36]

it's not pretty, but it's a radio button

[1350:39]

in the sense that these are mutually

[1350:40]

exclusive. How does the browser know

[1350:42]

that I should only be allowed to select

[1350:44]

one of them? Well, because I use the

[1350:46]

same name for each of those radio

[1350:48]

buttons. It knows that means mutual

[1350:50]

exclusivity. In fact, if I view page

[1350:52]

source in the browser, you'll see that

[1350:54]

all three of the inputs that were

[1350:56]

dynamically generated, type equals

[1350:57]

radio, type equals radio, type equals

[1350:59]

radio, also have identical names. And so

[1351:02]

that's just how that works. And that's

[1351:04]

the only change necessary. If I now go

[1351:06]

ahead and type in my name, David

[1351:07]

Basketball, click register, we're still

[1351:10]

up and running because what the server

[1351:12]

gets is still exactly the same inside of

[1351:15]

request.form.

[1351:17]

You can still access

[1351:19]

name or sport no matter what type it was

[1351:23]

in the user's own browser.
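
The radio-button loop described above can be sketched without Flask as a plain Python function. This is only an illustration under assumptions: the names SPORTS and radio_inputs are hypothetical, and the lecture itself does this with a Jinja for loop inside index.html rather than in Python.

```python
from html import escape

# Assumed list of sports; the lecture keeps a similar global list in app.py.
SPORTS = ["Basketball", "Soccer", "Ultimate Frisbee"]


def radio_inputs(sports, group="sport"):
    """Return one radio input per sport, all sharing the same name."""
    lines = []
    for sport in sports:
        value = escape(sport, quote=True)
        # The shared name is what makes the browser treat the inputs
        # as mutually exclusive.
        lines.append(
            f'<input name="{group}" type="radio" value="{value}"> {value}'
        )
    return "\n".join(lines)


print(radio_inputs(SPORTS))
```

Because every input carries the same name, the server still receives a single sport value, exactly as it did with the select menu.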

[1351:27]

Questions on these techniques?

[1351:31]

All right. Well, it's kind of

[1351:32]

obnoxious that when you don't do

[1351:34]

something right in this website, like

[1351:35]

forget your name, but do select a sport,

[1351:37]

all you are told is generically you are

[1351:39]

not registered. Like, it'd be nice and

[1351:41]

much more user-friendly, better UX, user

[1351:43]

experience, so to speak, to actually

[1351:45]

tell the user what's wrong so they can

[1351:46]

actually fix the problem. Now, there's a

[1351:47]

bunch of ways we can do this, but I'm

[1351:49]

going to propose that we go ahead and do

[1351:51]

this. Let's create a template called

[1351:53]

error.html, whose purpose in life is

[1351:56]

just to tell the user a little something

[1351:57]

more about what they did wrong. So, I'm

[1351:59]

going to go back into my terminal window

[1352:01]

here. I'm going to code up a file called

[1352:03]

error.html.

[1352:05]

Enter. And I'm going to go ahead and

[1352:07]

as before, extend layout.html,

[1352:12]

learning from my past mistakes and

[1352:13]

closing that quote. Then I'm going to go

[1352:15]

ahead and do body block down here. And

[1352:18]

then inside of this block body, I'm

[1352:21]

going to go ahead and have just some

[1352:22]

simple text like an H1 tag that just

[1352:25]

says error to the user. Then a paragraph

[1352:27]

tag that's going to contain some error

[1352:29]

message to be determined. Uh and then uh

[1352:33]

that's it for now. So I've got the

[1352:34]

template for an error message screen.

[1352:36]

Let me go back into app.py now and let

[1352:39]

me add some logic because app.py does

[1352:41]

know what's wrong. It's just at the

[1352:43]

moment we're very generically returning

[1352:44]

a failure template instead of something

[1352:47]

more precise. But if I know that the

[1352:49]

user hasn't given me their name, well

[1352:50]

let me say that error message. So, let's

[1352:53]

actually get rid of these two lines and

[1352:54]

be a little more specific like this. So,

[1352:57]

if or how about let's do it like this.

[1353:00]

How about validate the user's name

[1353:02]

first? So, name equals request.form.get

[1353:05]

quote unquote name. That just gives me a

[1353:07]

variable containing the user's name. If

[1353:09]

they didn't give me a name, which I can

[1353:11]

express with just if not name, like if

[1353:13]

name is blank or none, then let me go

[1353:16]

ahead and return render template of that

[1353:19]

error template. But let's pass in a

[1353:20]

specific message like missing name. And

[1353:24]

so by passing in another argument to

[1353:26]

this template called message, I can

[1353:29]

trust that Flask will dynamically output

[1353:31]

that message where I tell it to using

[1353:33]

the old curly braces. Meanwhile, let's

[1353:36]

go ahead and validate not just the name,

[1353:37]

but validate uh sport. I can do this in

[1353:41]

a couple of ways. Let's do this. So

[1353:42]

sport equals request.form.get quote

[1353:45]

unquote sport. Then in here, let's say

[1353:48]

if there's no sport, go ahead and return

[1353:51]

render template quote unquote

[1353:53]

error.html,

[1353:56]

message equals missing sport. So quite

[1353:58]

like name. But we can be more specific

[1354:00]

now, too. If the sport they did give me

[1354:03]

is not in the global sports list, well

[1354:06]

then it's Kelly trying to register for

[1354:08]

volleyball again. So let's return render

[1354:10]

template of error.html, but this

[1354:13]

time the message shall be invalid sport

[1354:16]

or something like that. So, we're being

[1354:18]

ever more clear. Otherwise, they are

[1354:20]

presumably confirmed because we got this

[1354:23]

far logically. So, if I go back to the

[1354:25]

other browser tab, go back to the form

[1354:27]

and let's go ahead and type in no name

[1354:29]

and just click register.

[1354:32]

Okay, what did I do wrong accidentally?

[1354:34]

So, let's go back to VS Code, open my

[1354:38]

terminal, open the first terminal window

[1354:39]

where flask run is running:

[1354:42]

encountered unknown tag 'body'. So I did

[1354:44]

something stupid in error.html.

[1354:47]

So let's go into error.html

[1354:50]

and uh body block. Oh, that's subtle.

[1354:58]

I just transposed the words. It's

[1354:59]

supposed to be block body. That was

[1355:01]

dumb. All right. Block body. I think

[1355:03]

that's correct. So let's go back to the

[1355:05]

browser. Let's reload. It's prompting me

[1355:07]

to reconfirm that I want to submit the

[1355:10]

exact same form which recall had no name

[1355:11]

and no sport. But now I see an error in

[1355:14]

a good way. This is not a server

[1355:17]

error. This is my error. Missing name.

[1355:19]

Now it's not super user friendly, but

[1355:20]

it's at least more explanatory than you

[1355:22]

are not registered. All right, let's go

[1355:24]

back. Let's give it a name, but no

[1355:25]

sport. Register. Ah, missing sport.

[1355:28]

Let's go back. Uh, let's go ahead and

[1355:30]

give it a sport, but no

[1355:34]

name. Missing name as before. And if I

[1355:37]

took the time to actually hack the HTML

[1355:38]

and do what Kelly did before and add

[1355:40]

volleyball, it would similarly say

[1355:41]

invalid sport in this case, too, because

[1355:44]

it's not in that same list.
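
The validation order walked through above (name first, then sport, then membership in the list of sports) can be sketched as a small function. This is a framework-free sketch: validate and SPORTS are assumed names, and a plain dict stands in for request.form. In the lecture, each message is passed to error.html via render_template instead of being returned.

```python
# Assumed list of valid sports, standing in for the global list in app.py.
SPORTS = ["Basketball", "Soccer", "Ultimate Frisbee"]


def validate(form):
    """Return an error message for a bad submission, or None if it is valid."""
    name = form.get("name")
    if not name:  # true for both None (key absent) and "" (left blank)
        return "Missing name"

    sport = form.get("sport")
    if not sport:
        return "Missing sport"

    # Guards against a value that was never offered in the form,
    # such as someone editing the HTML to add another option.
    if sport not in SPORTS:
        return "Invalid sport"

    return None


print(validate({"name": "Kelly", "sport": "Volleyball"}))
```

Checking the name before the sport is why a submission with neither field reports the missing name first.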

[1355:47]

All right, questions on this technique.

[1355:53]

All right. Well, it's all fine and good

[1355:54]

to have a registration site that does

[1355:55]

this, but it's literally just throwing

[1355:57]

out the information. And what I did like

[1355:58]

years ago was actually even cut a corner

[1356:00]

initially where I think I wrote code

[1356:02]

that just sent an automatic email to the

[1356:04]

proctor running Frosh IMs, containing the

[1356:06]

person's name and the sport for which

[1356:07]

they registered. But that was very

[1356:09]

quickly replaced by a better feature

[1356:11]

which is actually store the data in the

[1356:14]

server itself and keep track of it

[1356:15]

rather than just send it off via email.

[1356:17]

So let's do a first pass at actually

[1356:19]

storing information on everyone who has

[1356:21]

registered for sports. Well, let

[1356:23]

me go up here and let me create another

[1356:25]

global variable to make my life easier

[1356:26]

here called REGISTRANTS and set this

[1356:29]

equal to curly brace close curly brace.

[1356:31]

What do these two characters represent

[1356:34]

if empty especially?

[1356:37]

What data type is this? It's a

[1356:39]

dictionary. So, it's a Python dict. So,

[1356:41]

you could similarly say dict explicitly

[1356:44]

open paren, close paren. But it's more Pythonic

[1356:46]

generally to just use two curly braces.

[1356:48]

This is just giving me an empty

[1356:49]

dictionary. Why? Well, I want to store

[1356:51]

the two things I'm collecting about

[1356:53]

all of the students, their name and the

[1356:55]

sport for which they registered. So, key

[1356:57]

value, name sport. So, how can I go

[1356:59]

about doing this? Well, it's pretty

[1357:01]

trivial. Down here in my register

[1357:03]

function, recall that I'm just kind of

[1357:05]

naively saying you're registered even

[1357:07]

though I'm not doing anything with their

[1357:08]

name or sport. But that's easy. Let's

[1357:11]

remember the student for real now. So in

[1357:14]

that REGISTRANTS dictionary, let's

[1357:18]

go ahead and index into it using the

[1357:20]

student's name, David or Kelly or

[1357:22]

whoever, and set that equal to the sport

[1357:24]

for which they registered. And now

[1357:26]

notice the name is coming as before from

[1357:28]

request.form.get.

[1357:30]

The sport is similarly coming from that

[1357:32]

function. And so this is just

[1357:33]

remembering that key value pair. So

[1357:36]

that's all fine and good. It's in the

[1357:38]

computer's memory. How do we actually

[1357:39]

see it? Well, wouldn't it be nice after

[1357:41]

you register if you could see the actual

[1357:44]

registrants of the website,

[1357:47]

certainly if you're the proctor trying

[1357:48]

to run the sports. Well, yes. So, let's

[1357:50]

go down here and let's create another

[1357:52]

route like /registrants, which is just

[1357:56]

going to give me a list of everyone

[1357:57]

who's registered. Let's define a

[1357:58]

function called registrants, though I

[1358:00]

could call it anything I want. And this

[1358:01]

one's going to be relatively simple.

[1358:03]

Let's render a template called

[1358:06]

registrants which will soon exist and

[1358:08]

pass in all of the registrants that are

[1358:11]

in that global dictionary. And again I

[1358:14]

can call this placeholder anything I

[1358:16]

want, but insofar as it contains the

[1358:17]

registrants, I'm setting registrants

[1358:19]

equal to the REGISTRANTS global

[1358:21]

dictionary. So let's go now into my

[1358:24]

terminal window and create

[1358:25]

registrants.html and create really

[1358:28]

the beginnings of an actual Frosh IMs

[1358:30]

website that's going to show the proctor

[1358:32]

who has now registered. So let me go

[1358:35]

into this terminal and do code of

[1358:37]

registrants.html

[1358:38]

and close the terminal. Let's try to get

[1358:40]

this right. Finally extends layout.html

[1358:44]

close quote, close bracket there. Then

[1358:48]

let's do block body in the right order.

[1358:51]

Then end block down here. And then

[1358:54]

inside of the block here, this is going

[1358:56]

to be a bit more of a mouthful, but

[1358:57]

let's use some of our HTML from last

[1358:58]

week. We'll give an H1 tag that says

[1359:00]

Registrants so the proctor knows what

[1359:02]

they're looking at. Then let's put this

[1359:03]

in a table for instance with two

[1359:05]

columns, names and sports. So table tag

[1359:08]

followed by a thead tag for the table

[1359:10]

heading. Uh then that heading is going

[1359:12]

to contain just a single row for TR. And

[1359:15]

each of those has a th table heading. Uh

[1359:18]

one of which, and actually I'll make it

[1359:19]

tighter is name. The other of which is

[1359:21]

going to be sport. So these are the

[1359:23]

column headings, the table headings, TH

[1359:25]

tags for short. After the head of the

[1359:28]

table, let's go ahead and do a tbody

[1359:30]

for table body. And inside of here, this

[1359:32]

is where Jinja comes into use. I can say

[1359:35]

for each name in the registrants

[1359:38]

placeholder that was plugged in, and endfor

[1359:40]

proactively. What do I want to do on

[1359:42]

each iteration? Well, I think I want to

[1359:44]

output table row, table row, table row.

[1359:46]

And in here I can do TR and then inside

[1359:49]

of that a table data for the cell on the

[1359:51]

left putting in the student's name which

[1359:53]

is coming from this for loop just like

[1359:55]

in Python. And then one more table data

[1359:58]

namely the registrants placeholder

[1360:02]

indexed into at that name which because

[1360:04]

it's a dictionary will give me the sport

[1360:06]

for that student's name. And then I

[1360:09]

think we're good to go. And in fact,

[1360:10]

just to hark back to something I said

[1360:12]

last week when we were imagining,

[1360:14]

actually this is in week five when we

[1360:15]

were talking about stacks and like your

[1360:16]

Gmail or Outlook inbox is essentially a

[1360:18]

stack with the newest emails on top. And

[1360:20]

I hypothesized at the time that it's

[1360:22]

just row after row after row after row

[1360:24]

when we started talking last week about

[1360:25]

HTML. Here is what Google and Microsoft

[1360:28]

and others are probably doing. Anytime

[1360:29]

you have tabular information in a page,

[1360:32]

they've got some data in memory like the

[1360:34]

registrants and they're just using code

[1360:36]

like this in Jinja to output table row,

[1360:38]

table row, table row. Imagine this is

[1360:40]

your email instead. Same exact idea. And

[1360:43]

now we have the ability to express that

[1360:45]

kind of logic. So let's go back now into

[1360:48]

the browser. Click reload on the form.

[1360:51]

Let's register for instance David for

[1360:54]

basketball. Click register. It claims

[1360:56]

I'm registered. But hopefully now I'm

[1360:59]

legitimately registered because that

[1361:01]

variable is storing it in memory. And in

[1361:03]

fact, let's go ahead and go now to not

[1361:06]

/register but, and I'll zoom in at the top,

[1361:09]

/registrants, and hit enter. And we will

[1361:12]

see a very ugly but functional HTML

[1361:15]

table containing two columns name and

[1361:17]

sport, the so-called thead, with which

[1361:19]

David and basketball are present.

[1361:21]

Moreover, if we now go back to that form

[1361:24]

and let's try registering Kelly for

[1361:26]

instance for soccer. Click register. Now

[1361:28]

let's manually go to registrants again.

[1361:31]

Now Kelly and David are in the server's

[1361:35]

memory as well.
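
What the dictionary and the template loop are doing can be sketched in plain Python. This is an illustration under assumptions: the register and table_rows helpers are hypothetical names, and the lecture produces the rows with a Jinja for loop in registrants.html rather than with string formatting.

```python
from html import escape

# Global dictionary mapping each student's name to their sport.
# It lives only in this process's memory.
REGISTRANTS = {}


def register(name, sport):
    """Remember the registrant as a key-value pair."""
    REGISTRANTS[name] = sport


def table_rows(registrants):
    """Build one <tr> per registrant, as the template's for loop would."""
    rows = []
    for name in registrants:  # iterating a dict yields its keys
        sport = registrants[name]  # index by name to get the sport
        rows.append(f"<tr><td>{escape(name)}</td><td>{escape(sport)}</td></tr>")
    return "\n".join(rows)


register("David", "Basketball")
register("Kelly", "Soccer")
print(table_rows(REGISTRANTS))
```

One consequence of keying on the name is that a second registrant with the same name would overwrite the first, since dictionary keys are unique.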

[1361:38]

Questions then on what this example is

[1361:42]

now doing or how it's achieving these

[1361:44]

results? Yeah.

[1361:49]

>> Really good question. If you wanted to

[1361:50]

restrict the registrants page to only

[1361:52]

certain people, ideally you would have a

[1361:53]

password on it. Um, and in fact, one of

[1361:55]

the next examples we'll do in a few

[1361:56]

minutes is a login page for exactly

[1361:59]

that reason. Right now, it's just sort of on

[1362:00]

the honor system that only the proctor

[1362:02]

in question goes to this URL. But just

[1362:05]

for the sake of discussion actually,

[1362:07]

suppose that you did want the

[1362:08]

registration list to be public if only

[1362:10]

to like hype up who has already

[1362:11]

registered. Well, it's not good to

[1362:14]

just tell people go to the /registrants

[1362:16]

URL. We can actually link them to that

[1362:18]

in a few different ways. So for

[1362:19]

instance, I can go down to uh how about

[1362:23]

uh let's say success.html.

[1362:26]

So let me open up success.html.

[1362:29]

It just says you are registered. I can

[1362:31]

do something like this. Um a href equals

[1362:35]

/registrants. So I have control now over

[1362:38]

my HTML and the routes. So /registrants

[1362:40]

will exist. Uh see who

[1362:44]

else registered. Period. So, this will

[1362:47]

create a nice little HTML link that

[1362:49]

links me to that route. So, let's try

[1362:50]

this. So, let's go back to the form over

[1362:53]

here. Uh, let's go ahead and register

[1362:56]

John for ultimate frisbee and register.

[1363:00]

All right. And now we see you are

[1363:01]

registered. See who else registered. And

[1363:02]

if I hover over this, it's super small,

[1363:04]

but it would have showed me in the

[1363:05]

bottom-left corner that link. And

[1363:06]

indeed, here now is John at the bottom

[1363:09]

of this table. And just to be clear, if

[1363:10]

I view page source on the browser, you

[1363:12]

see all of the tr elements that we dynamically

[1363:15]

generated on the server side before they

[1363:18]

were sent as such to the browser. All

[1363:21]

right. What if we wanted to do something

[1363:23]

slightly more elegant here? Well, I

[1363:25]

don't have to just use this HTML hack

[1363:27]

like why don't I just show the user who

[1363:29]

has registered automatically. And this

[1363:30]

is kind of a cool feature of web apps as

[1363:33]

well. In addition to importing Flask,

[1363:36]

render_template, and request, I'm going

[1363:37]

to also import a function called

[1363:39]

redirect that comes with flask. And

[1363:41]

indeed, rather than just show

[1363:44]

success.html,

[1363:45]

I'm going to go ahead and return the

[1363:47]

result of redirecting the user to

[1363:50]

/registrants. So to be clear, I'm in my

[1363:53]

register route, and instead of showing

[1363:55]

them the success page anymore, which I

[1363:56]

might as well delete at this point, just

[1363:58]

going to redirect them to this list of

[1364:00]

everyone who is registered, including

[1364:02]

themselves. So, if I go back over here

[1364:04]

and type in someone like Doug, who maybe

[1364:05]

will play basketball with me, and click

[1364:07]

register, watch what happens to the URL

[1364:10]

at the very top of the screen, I'm

[1364:12]

automatically whisked away to

[1364:15]

/registrants in this case. I made a

[1364:17]

change to the code though, and so the

[1364:18]

server actually was smart enough to

[1364:20]

reload. So, Doug is now uh the only one

[1364:22]

in the database. And this actually hints

[1364:24]

at a problem we should really solve.

[1364:26]

Like, in fact, let's do this real fast.

[1364:28]

Let me go ahead and register myself

[1364:29]

again for basketball. Register. Now,

[1364:30]

it's Doug and David. The catch though is

[1364:33]

if this server ever goes offline, maybe

[1364:35]

because it needs to be updated or it

[1364:37]

crashes or it reboots, when you hit

[1364:41]

control C and get back to your terminal,

[1364:43]

Flask server is no longer running, which

[1364:45]

means that global variable called

[1364:46]

REGISTRANTS, in all caps, is gone. It's

[1364:49]

like free. The memory has been freed.

[1364:51]

So, if I were to rerun Flask now, as

[1364:54]

would happen automatically if the server

[1364:55]

itself rebooted, well, this is not great

[1364:57]

because if I go back to the registrants

[1364:59]

page and click reload, no one has

[1365:02]

registered. And in fact, that's what

[1365:03]

happened with Doug a moment ago because

[1365:04]

I changed my actual app.py, Flask was

[1365:07]

smart enough to realize, oh wait, the

[1365:08]

code has changed. I better reload the

[1365:09]

program, which gave me a brand new

[1365:12]

version of that global

[1365:15]

dictionary. So what would be better

[1365:17]

clearly than storing registrants in

[1365:20]

memory in RAM in a variable in the

[1365:23]

server?

[1365:25]

Yeah. Yeah. So in an actual database and

[1365:29]

so here, too, is where everything kind of

[1365:30]

comes full circle and connects again. So

[1365:32]

let me go back into uh app.py here. And

[1365:36]

I like generally the logic of what I've

[1365:38]

done. I don't like the fact that I'm

[1365:39]

just storing my registrants inside of

[1365:42]

this global variable, which is again

[1365:44]

just in the computer's volatile memory.

[1365:46]

Let's actually put this in a database

[1365:48]

instead. So, let me go up here and get

[1365:50]

rid of this global dictionary and let me

[1365:53]

do something a little smarter up here.

[1365:55]

Let me import from CS50's own library

[1365:57]

the SQL function that we've used before.

[1365:59]

And again, even though we've been taking

[1366:00]

off almost all of CS50's training

[1366:02]

wheels, the reality is using CS50's SQL

[1366:05]

library, even through final projects,

[1366:06]

just makes using SQL in Python so much

[1366:09]

easier. But there are certainly third-party

[1366:10]

libraries you can use. Um, let me go

[1366:13]

down now and in addition to creating my

[1366:15]

app, let's create a database, DB for

[1366:17]

short, setting that equal to SQL, and

[1366:19]

then sqlite:///, which is not a typo. And

[1366:23]

let's assume that the database shall be

[1366:24]

called froshims.db. More on that in a

[1366:27]

moment. And then down here, now that I

[1366:30]

have a database variable, let's not

[1366:32]

remember the student by storing them in

[1366:34]

this dictionary. Let's actually execute

[1366:36]

a line of SQL. So, db.execute

[1366:40]

insert into Well, wait a minute. What am

[1366:43]

I going to insert them into? Not to

[1366:45]

worry. I came prepared for this. So, let

[1366:46]

me go ahead and maximize my terminal

[1366:48]

window and then run sqlite3 on a file

[1366:50]

called froshims.db. And this is a file I

[1366:53]

made in advance, but it's super simple.

[1366:55]

In fact, if I type dot schema just to

[1366:57]

see the design of this database, you'll

[1366:59]

see that in advance I created a table in

[1367:02]

this database called registrants. It has

[1367:04]

a column called ID, a column called

[1367:06]

name, and a column called sport. And the

[1367:08]

primary key of this table is to use the

[1367:11]

ID value which is just an integer. And

[1367:13]

now notice I have some constraints here.

[1367:15]

I want the user to give me a name and a

[1367:17]

sport. So I've specified that it's not

[1367:19]

just text, it's not null. That is null

[1367:22]

values should not be possible to put in

[1367:23]

here. All right. So, let me go ahead and

[1367:25]

exit out of sqlite3. Let me go back

[1367:28]

into uh my code editor here. And now I

[1367:31]

know what to insert into. Insert into

[1367:33]

the table called registrants. What?

[1367:35]

Well, I want to insert how about a name

[1367:38]

of the student and the sport for which

[1367:40]

they registered. And the values

[1367:41]

therefore that I want to insert are

[1367:43]

going to be whatever came from the

[1367:46]

post request. Here's where you do not

[1367:48]

want to make yourself vulnerable to SQL

[1367:50]

injection attacks. No f-strings in

[1367:51]

here, you know, just plugging the

[1367:53]

student's input in blindly. This is where

[1367:55]

and why we use these placeholders in

[1367:57]

both CS50's library and in many

[1367:59]

libraries uh in the real world to

[1368:01]

specify that I want the library to

[1368:04]

properly sanitize the user's input and

[1368:06]

get rid of any scary characters like

[1368:07]

apostrophes or semicolons or the like.
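
The placeholder technique can be sketched with Python's built-in sqlite3 module, which is swapped in here for CS50's SQL library so the example runs on its own. The in-memory database, the register helper, and the hostile-looking name are assumptions for illustration; the table layout follows the schema described above. With placeholders, the driver binds the values as data, so quotes and semicolons in the input are stored as ordinary text rather than being interpreted as SQL.

```python
import sqlite3

# In-memory database so the sketch leaves nothing behind on disk.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE registrants ("
    "id INTEGER, name TEXT NOT NULL, sport TEXT NOT NULL, PRIMARY KEY(id))"
)


def register(name, sport):
    """Insert one registrant using ? placeholders, never string formatting."""
    db.execute(
        "INSERT INTO registrants (name, sport) VALUES (?, ?)",
        (name, sport),
    )
    db.commit()


# Input containing an apostrophe and semicolons is stored verbatim.
register("Robert'); DROP TABLE registrants;--", "Soccer")

count = db.execute("SELECT COUNT(*) FROM registrants").fetchone()[0]
print(count)
```

CS50's library takes the values as separate arguments after the query string instead of as a tuple, but the idea of keeping user input out of the SQL text is the same.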

[1368:10]

So, I'm going to pass in name and sport.

[1368:13]

And this one line has the effect of, as

[1368:15]

you recommended, storing the

[1368:17]

registration in an actual database on

[1368:20]

the server, not just in volatile

[1368:22]

temporary memory. But we do have to

[1368:24]

change one thing. This line here is no

[1368:26]

longer valid because there's no global

[1368:28]

variable there via which we can get all

[1368:30]

of the registrants. But that's no big

[1368:32]

deal. Here's how most web apps would do

[1368:34]

this. I'm going to define a variable

[1368:35]

called registrants and set it equal to

[1368:38]

db.execute of SELECT * FROM

[1368:42]

registrants. It's as easy as that to

[1368:44]

just get all of the registrants from my

[1368:46]

database. And down here, there's no

[1368:48]

longer an all capitalized variable, but

[1368:49]

there is a lowercase one, registrants.

[1368:52]

So, to be clear, in my register route, I

[1368:55]

am inserting the user into the database.

[1368:57]

And in my registrants route, I am

[1368:59]

selecting the users from the database.

[1369:01]

And then the rest of the code, I think,

[1369:02]

can stay the same. So, let's go back to

[1369:04]

Frosh IMs here. Go back to the form. Let's

[1369:07]

register David for basketball. Register.

[1369:11]

Ah, I did screw up. You're seeing some

[1369:14]

weirdness here. What are you actually

[1369:15]

seeing? There's one user registered. Not

[1369:19]

intentional. But what does this syntax

[1369:21]

suggest we're looking at? This is a

[1369:23]

dictionary. Recall that the db.execute

[1369:25]

method that comes with CS50's SQL library

[1369:27]

gives you a list of dictionary objects.

[1369:30]

And so because there's only one

[1369:31]

registrant at the moment, you're seeing

[1369:33]

my dictionary for my registration, which

[1369:35]

is not what I want to show here. And I

[1369:37]

forgot. I need to also go back into the

[1369:39]

registrants

[1369:41]

uh template to tweak my syntax as

[1369:44]

follows. Let me go back into VS Code

[1369:47]

here. Let me go into registrants.html.

[1369:50]

And because I am passing in now not a

[1369:54]

dictionary but a list of dictionaries, I

[1369:57]

just need to think about the problem a

[1369:58]

little bit differently. So my syntax

[1370:00]

here is going to be for each uh let's do

[1370:03]

this as follows.

[1370:06]

For each registrant

[1370:09]

in that registrants list of

[1370:12]

dictionaries, go ahead and display the

[1370:15]

current registrant's name and go ahead

[1370:18]

and display the current registrant's

[1370:22]

sport. In other words, I'm using Python

[1370:24]

syntax which works as well in Jinja

[1370:26]

here. This iterates over the list of

[1370:28]

registrants each of which is a

[1370:30]

dictionary. So I'm using dictionary

[1370:31]

syntax now to index into the name key of

[1370:34]

the registrant dict uh object and the

[1370:36]

sport key of the same. So now let me go

[1370:41]

back to my browser and I'm just going to

[1370:43]

go ahead and reload the registrants page

[1370:45]

without resubmitting the form. Now there

[1370:48]

it is. David and basketball. And now

[1370:49]

let's go back to the form and register a

[1370:51]

couple more people. Kelly for soccer

[1370:53]

register. Notice we're at the

[1370:55]

registrants link. Kelly is indeed still

[1370:57]

registered. Let me go back to this and

[1370:59]

let's register John. Ultimate Frisbee

[1371:01]

register. Let's go ahead and kill the

[1371:04]

Flask server by going to my first

[1371:06]

terminal window. Uh, control C. And now

[1371:09]

let me go ahead and rerun Flask, which

[1371:11]

was bad before. That's how Doug ended up

[1371:13]

the only registrant last time. But this

[1371:15]

time if I go back to the registrants

[1371:17]

page and immediately click reload, even

[1371:19]

though the server is running anew in

[1371:21]

memory, the database is persistent,

[1371:23]

which was the whole point of using SQL

[1371:25]

from week uh seven onward. And let's do

[1371:29]

one more for good measure. If I go back

[1371:30]

to the form, we'll register Doug so he

[1371:33]

can play basketball with me, too. And we

[1371:35]

even have Doug now in the database. It's

[1371:37]

an ugly looking table, but the data is

[1371:39]

in fact all there.
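
The two points made above, that the query now yields a list of dictionaries and that the data survives a restart, can be sketched with the standard sqlite3 module in place of CS50's library. The temporary file path and the connect helper are assumptions; closing and reopening the connection stands in for stopping and rerunning the Flask server.

```python
import os
import sqlite3
import tempfile

# Temporary location for the database file (an assumption for this sketch).
path = os.path.join(tempfile.mkdtemp(), "froshims.db")


def connect():
    """Open the database so that rows can be indexed by column name."""
    conn = sqlite3.connect(path)
    conn.row_factory = sqlite3.Row
    return conn


conn = connect()
conn.execute(
    "CREATE TABLE registrants ("
    "id INTEGER, name TEXT NOT NULL, sport TEXT NOT NULL, PRIMARY KEY(id))"
)
conn.execute(
    "INSERT INTO registrants (name, sport) VALUES (?, ?)",
    ("David", "Basketball"),
)
conn.commit()
conn.close()  # stands in for the server shutting down

conn = connect()  # stands in for the server starting again
registrants = [dict(row) for row in conn.execute("SELECT * FROM registrants")]
conn.close()

# Each element is a dictionary, so index by key rather than by name lookup.
for registrant in registrants:
    print(registrant["name"], registrant["sport"])
```

The loop at the end mirrors the revised template: iterate over the list, then read the name and sport keys of each dictionary.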

[1371:42]

All right, questions now on this

[1371:45]

improvement which is getting closer and

[1371:47]

closer to what the actual Frosh IMs

[1371:49]

website did so many

[1371:53]

years ago.

[1371:57]

All right. Well, let me propose this

[1371:58]

now. We have this table of registrants.

[1372:00]

Suppose that maybe Kelly was not

[1372:05]

very sportsmanlike when she played

[1372:07]

soccer last time. So, we want to

[1372:08]

deregister Kelly from soccer. That is,

[1372:10]

nope, we're going to reject your

[1372:11]

registration. Let's think for a moment

[1372:13]

about the design here. Like, here's an

[1372:15]

HTML table containing names and sports.

[1372:18]

And wouldn't it be nice if we could add

[1372:19]

a button that would let me deregister

[1372:21]

Kelly or anyone for that matter? When I

[1372:24]

click on that button, what information

[1372:26]

should ideally be sent from the browser

[1372:28]

to the server to remove someone like

[1372:30]

Kelly from the database?

[1372:32]

>> ID.

[1372:33]

>> Yeah. The ID of the person. And you're

[1372:35]

proposing ID instead of name. Why?

[1372:38]

>> The ID uniquely identifies in that SQL

[1372:41]

table.

[1372:42]

>> Exactly. The ID uniquely identifies the

[1372:44]

user in the SQL table. So, in fact,

[1372:46]

let's see this real quick. If I go back

[1372:47]

to VS Code and we'll revisit essentially

[1372:49]

a week seven issue here. Let me go back

[1372:51]

into my second terminal where I can

[1372:53]

again run sqlite3 after maximizing my

[1372:56]

terminal. And before I just wrote schema

[1372:58]

to see what the table is. Now I'm going

[1373:01]

to literally run SELECT * FROM

[1373:03]

registrants in sqlite3, and we'll see a

[1373:06]

little ASCII table of all four of us who

[1373:09]

registered but we also see the unique ID

[1373:11]

and the value of the unique ID recall

[1373:12]

from week seven is that it's the

[1373:14]

so-called primary key. It is the value

[1373:16]

that uniquely identifies users as

[1373:18]

minimally as possible and that's a good

[1373:20]

thing because if we have another Kelly

[1373:21]

registering for Frosh IMs, we don't want

[1373:23]

to deregister the wrong Kelly or both

[1373:25]

Kellys. We want only the Kelly with ID

[1373:28]

of two. So somehow the button we add to

[1373:32]

the registrants page should contain in

[1373:34]

it the ID of the person we want to

[1373:37]

delete. Because if you do pass the ID of

[1373:40]

the person that you want to delete to

[1373:41]

the server, the server can do some kind

[1373:43]

of select looking or some kind of delete

[1373:46]

statement using that ID number and

[1373:48]

delete just that row. So there's a few

[1373:51]

ways we can do this, but let me propose

[1373:53]

that we proceed as follows. In our

[1373:56]

registrants route, which is where we can

[1373:58]

currently see all of these users. Let's

[1374:01]

go ahead and output an ugly but

[1374:02]

functional form for each of those users.

[1374:05]

So, let me go ahead and uh minimize this

[1374:08]

and hide my terminal window. And in

[1374:10]

registrants, let's go ahead and just do

[1374:12]

this. In addition to outputting every

[1374:15]

registrant's name and sport, let's also

[1374:17]

output a third column whose purpose in

[1374:19]

life is to contain an HTML form. The

[1374:22]

action of that form will be a route like

[1374:24]

/deregister and the method we're going to

[1374:26]

use is going to be post just so that we

[1374:29]

don't accidentally store uh personally

[1374:31]

identifying information in a URL or

[1374:33]

such. This form is going to have a

[1374:35]

button the type of which is submit and

[1374:38]

the button is going to say Deregister.

[1374:41]

And I could now implement the ID in a

[1374:43]

couple of ways. I could do input name

[1374:46]

equals ID, type equals text. And now if

[1374:51]

I go back to my other browser tab and

[1374:53]

reload, I should see a button for every

[1374:55]

one of these registrants. And I do. But

[1374:58]

this is kind of like the honor system

[1375:00]

where I just let the user type in the ID

[1375:02]

of who they want to delete. And it's

[1375:04]

sort of weird that I have multiple forms

[1375:05]

in that case. But here is where

[1375:07]

dynamically generating HTML can get

[1375:09]

pretty uh useful. Let's change the type

[1375:12]

of this input to hidden and set the

[1375:16]

value of this uh input to be whatever

[1375:20]

the current registrant's ID actually is.

[1375:24]

Uh storing this in here and let's go

[1375:27]

ahead and not confuse this. So we'll use

[1375:29]

single quotes on the outside instead. So

[1375:32]

inside of this value I'm putting the

[1375:35]

current user's ID. So, if I go back now,

[1375:38]

notice that the text boxes are going to

[1375:39]

disappear, but the buttons will not. But

[1375:42]

all of that information is still there.
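
A minimal sketch of the per-row form described here, written with plain Python string formatting instead of the lecture's template file so that it runs on its own. The sample rows and the `render_row` helper are assumptions for illustration, not the lecture's code.

```python
from html import escape

# Hypothetical rows, shaped like the result of SELECT * FROM registrants.
registrants = [
    {"id": 1, "name": "David", "sport": "Soccer"},
    {"id": 2, "name": "Kelly", "sport": "Basketball"},
]


def render_row(registrant):
    """Return one table row whose third column holds a deregister form."""
    return (
        "<tr>"
        f"<td>{escape(registrant['name'])}</td>"
        f"<td>{escape(registrant['sport'])}</td>"
        "<td>"
        '<form action="/deregister" method="post">'
        # The hidden input carries this row's ID; the user never types it.
        f'<input name="id" type="hidden" value="{escape(str(registrant["id"]))}">'
        '<button type="submit">Deregister</button>'
        "</form>"
        "</td>"
        "</tr>"
    )


rows = [render_row(r) for r in registrants]
```

Each row gets its own form, so clicking a button submits only that row's ID.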

[1375:43]

If I right click or control-click and

[1375:45]

open up my developer, uh, let's open up

[1375:47]

view page source because it's just a bit

[1375:48]

bigger. Notice that David and Kelly and

[1375:51]

John and everyone else here has the same

[1375:53]

HTML as before, plus another column

[1375:56]

containing a form that contains a I

[1375:59]

somehow messed up still. Why is this

[1376:02]

blank? So, this is still not good.

[1376:09]

Ah, thank you. I accidentally pluralized

[1376:12]

this, but it should be registrant

[1376:14]

because I'm inside of this for loop and

[1376:16]

each iteration gives me a variable

[1376:18]

called registrant. So, user error on my

[1376:20]

part. So, let's go ahead and

[1376:21]

dramatically do this again. Let me view

[1376:22]

page source of the same page. Scroll

[1376:24]

down a bit. Thankfully, there is now for

[1376:27]

every one of these registrants a hidden

[1376:30]

ID for one for me, two for Kelly, and I

[1376:33]

bet if we keep scrolling, we'll see

[1376:34]

three for John, and four for Doug. So,

[1376:37]

now this form has enough information,

[1376:39]

even though there's no user input other

[1376:40]

than the clicking of the button to tell

[1376:42]

the server whom to delete. So, how do we

[1376:45]

delete the user from that particular

[1376:48]

registration table? Well, I think we

[1376:50]

just need to add a route. So, let me go

[1376:51]

back into VS Code here into app.py and

[1376:56]

let's go ahead and create another route

[1376:58]

for instance uh in here say uh we'll put

[1377:01]

it up here below uh up here below index.

[1377:04]

So, app.route quote unquote slash

[1377:07]

deregister whoops deregister and now

[1377:10]

def deregister

[1377:12]

but I could call it anything I want. And

[1377:14]

how do I do this? Well, let's first get

[1377:15]

the ID from the form. ID equals

[1377:18]

request.form.get quote unquote ID.

[1377:21]

Let's do a bit of a sanity check here.

[1377:22]

So if there is an ID and it's not blank

[1377:25]

for some reason, go ahead and do

[1377:27]

db.execute

[1377:29]

delete from registrants where ID equals

[1377:35]

uh question mark. And now let's pass in

[1377:37]

the user's actual ID. And then no matter

[1377:40]

what, let's go ahead and redirect the

[1377:42]

user back to the registrants page so

[1377:44]

that we can hopefully see the result of

[1377:47]

that change. So again, I'm just using a

[1377:50]

bit of SQL per week 7. I'm using a

[1377:52]

placeholder by using the question mark,

[1377:53]

passing in the actual ID from the form.

[1377:55]

And I'm only doing this if there is an

[1377:57]

ID that was passed in. And I'm letting

[1377:59]

the database actually do the deletion.
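
The deletion logic can be sketched as follows. This uses Python's built-in `sqlite3` module and an in-memory database rather than the lecture's own database library and file, and it takes the form as a plain dictionary, so the table contents and the `deregister` helper here are assumptions for illustration.

```python
import sqlite3


def deregister(db, form):
    """Delete one registrant if the submitted form carries a non-blank id."""
    registrant_id = form.get("id")
    if registrant_id:
        # The ? placeholder binds the value separately from the SQL text.
        db.execute("DELETE FROM registrants WHERE id = ?", (registrant_id,))
        db.commit()
    # In the lecture's route, a redirect back to /registrants follows here.


db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE registrants (id INTEGER PRIMARY KEY, name TEXT, sport TEXT)"
)
db.executemany(
    "INSERT INTO registrants (name, sport) VALUES (?, ?)",
    [
        ("David", "Soccer"),
        ("Kelly", "Basketball"),
        ("John", "Soccer"),
        ("Doug", "Football"),
    ],
)

deregister(db, {"id": "2"})  # the hidden input for Kelly's row submitted "2"
deregister(db, {})           # no id at all: nothing is deleted
remaining = [
    row[0] for row in db.execute("SELECT name FROM registrants ORDER BY id")
]
```

Because the ID is passed as a bound parameter and never pasted into the SQL string, a malicious value cannot change the shape of the query.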

[1378:02]

All right, so let's try to do this.

[1378:03]

Let's go back to the browser here.

[1378:05]

Reload the /registrants page for good

[1378:07]

measure. Let's decree that Kelly is now

[1378:09]

deregistered by clicking this button. And

[1378:12]

oh, so close.

[1378:14]

Method not allowed at the deregister

[1378:18]

route. What did I do wrong?

[1378:23]

Let me go back to the code. What's wrong

[1378:25]

with my deregister route?

[1378:30]

Well, what method is the form using? If

[1378:31]

I go back to registrants.html, the method

[1378:34]

the form is using is post.

[1378:37]

>> Yeah. So, I need to override the

[1378:38]

default, which is get. So, I need to go

[1378:40]

up here again and just change an

[1378:42]

argument to be methods equals and then

[1378:44]

in a list containing only post now

[1378:47]

instead of get. All right, let's go back

[1378:48]

to the form and go back. And now let's

[1378:50]

try to deregister Kelly. She's gone.

[1378:53]

Let's get rid of me now. I'm gone. And

[1378:55]

indeed, if I go back to VS Code, open my

[1378:58]

terminal, maximize it, and select star

[1379:01]

from registrants again, you'll see that

[1379:03]

the two of us are indeed gone in this

[1379:06]

case.
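
The "method not allowed" error comes from the server checking each request's method against what the route accepts. Here is a simplified standard-library sketch of that check, not the framework's real dispatcher. The `ROUTES` table and `status_for` function are hypothetical names, and a real handler for /deregister would go on to redirect rather than answer 200.

```python
# Hypothetical route table: path -> set of HTTP methods the route accepts.
ROUTES = {
    "/registrants": {"GET"},
    "/deregister": {"POST"},  # like passing methods equal to a list with only POST
}


def status_for(method, path):
    """Return the status code a server would answer with for this request."""
    if path not in ROUTES:
        return 404  # Not Found
    if method not in ROUTES[path]:
        return 405  # Method Not Allowed
    return 200  # the route's handler would run


before_fix = 405 if "POST" not in {"GET"} else 200  # route accepted only GET
after_fix = status_for("POST", "/deregister")
```

With only the default GET allowed, the form's POST is rejected. Once the route lists POST, the same request reaches the handler.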

[1379:08]

Questions now on this technique? Because

[1379:11]

now we have most of the plumbing in

[1379:13]

place for adding people to a database,

[1379:15]

deleting people from a database. It's

[1379:16]

very similar in spirit now to most any

[1379:18]

website that has this kind of

[1379:20]

interactivity.

[1379:23]

All right, subtle question. I

[1379:25]

deliberately in my

[1379:28]

registrants.html file uh used post as we

[1379:33]

just discovered instead of get. Why

[1379:36]

though? Because it wasn't that strong an

[1379:38]

argument that I hinted at earlier of

[1379:40]

like, well, I don't want like Kelly's ID

[1379:42]

to end up in my URL bar or mine. Like

[1379:44]

IDs are not really personally

[1379:46]

identifiable. They're just opaque

[1379:48]

integers at the moment. But why would it

[1379:50]

be bad if you could delete people by

[1379:53]

using the get method?

[1379:56]

So this is kind of subtle but the catch

[1379:59]

with using get is that by definition you

[1380:02]

can visit that resource that route by

[1380:04]

just typing in a URL or following a

[1380:07]

hyperlink. So for instance if an

[1380:10]

adversary were to type a URL like

[1380:13]

/deregister question mark id equals oh

[1380:16]

I don't know uh four and then send me

[1380:20]

this URL in an email or send this URL in

[1380:22]

an email to the proctor who's running

[1380:24]

the Frosh IMs program. If that proctor

[1380:26]

simply clicks naively on this link as my

[1380:29]

code is implemented now and I've used

[1380:31]

get instead of post, what's going to

[1380:33]

happen?

[1380:36]

>> Doug gets deregistered just because the

[1380:38]

proctor followed a link in their email.

[1380:40]

And this is hinting at the kinds of

[1380:42]

phishing attacks that are possible too.

[1380:44]

Bad design like generally when you are

[1380:46]

using get requests that is just simple

[1380:49]

URLs that are clickable or typable. They

[1380:51]

should not have the effect of changing

[1380:53]

data on the server. Post is much better

[1380:56]

if only because you can't just click a

[1380:58]

link and post happens. To induce a post

[1381:01]

request, you almost always have to click

[1381:03]

a button. So, at least this case, the

[1381:05]

proctor would receive an email. They

[1381:07]

would have to receive an email, click on

[1381:08]

a link, and then they would see a web

[1381:10]

page like this that clearly has a button

[1381:12]

labeled deregister or the like, which is

[1381:14]

an additional layer of protection. And

[1381:17]

there's even more attacks that you can

[1381:19]

wage by supporting get. So in general,

[1381:21]

post requests are preferred anytime

[1381:23]

there's anything remotely personally

[1381:26]

identifiable or remotely destructive

[1381:28]

like actually changing data on the

[1381:30]

database like this. All right. Well,

[1381:33]

what more can or should we do with Frosh IMs,

[1381:37]

perhaps? Well, let's see. Maybe one or

[1381:39]

so final flourishes here. Um, if I want

[1381:42]

to go ahead and maybe make those error

[1381:45]

messages a little more interesting.

[1381:46]

Let's do that for just a second. Let me

[1381:48]

go back to uh my uh other browser tab

[1381:51]

here. Let's go back to the registration

[1381:53]

page where the form is and let's

[1381:55]

deliberately not cooperate and just

[1381:57]

click register so that I get an error

[1381:58]

about missing name. Well, wouldn't it be

[1382:00]

nice if we made this a little more user

[1382:01]

friendly by including like an image on

[1382:03]

the page as is commonly the case? Well,

[1382:05]

we can certainly include images in

[1382:06]

websites using the image tag, but the

[1382:09]

catch is we actually have to be a little

[1382:11]

more clever about how we store the image

[1382:14]

on the server in order for this to work.

[1382:16]

So for instance, let me go into that

[1382:18]

error page. We don't need success open

[1382:20]

anymore and we don't need layout anymore

[1382:22]

or this index anymore. Let's focus on

[1382:25]

error. And suppose that I did want to

[1382:26]

include an error message containing

[1382:28]

like a grumpy cat on the screen. Well,

[1382:30]

ideally I would just do alt or I would

[1382:32]

do open bracket image uh source equals

[1382:35]

and then something like cat.jpeg where

[1382:38]

cat.jpeg is the name of a cat in this

[1382:41]

current folder. And just to be clear,

[1382:43]

let's have an alternative text of grumpy

[1382:45]

cat for screen readers or slow

[1382:47]

connections.

[1382:49]

Okay, this unfortunately is not going to

[1382:51]

work. Let's go over here and induce the

[1382:53]

same error by just reloading and

[1382:55]

submitting the same form. And you'll see

[1382:56]

indeed a broken image because that image

[1382:59]

that cat.jpeg does not exist, but we do

[1383:02]

at least see the alternative text. Well,

[1383:04]

I did come prepared with a cat already.

[1383:06]

And so, let me go ahead and grab this

[1383:09]

cat from another folder. And this cat is

[1383:11]

going to contain uh is going to exist in

[1383:14]

a file called cat.jpeg. And indeed, if I

[1383:16]

type ls now after having grabbed a copy

[1383:18]

of that cat, it exists alongside app.py.

[1383:21]

Seems good. Let's go back to the browser

[1383:23]

here. Let's reload. And we should see

[1383:27]

ah still no cat. Well, why is this?

[1383:29]

Well, this is a side effect of using the

[1383:30]

framework as well. It turns out for

[1383:32]

organizational sake, any images you want

[1383:35]

to display on a page or any CSS files or

[1383:37]

JavaScript files that you want to embed

[1383:38]

in a page, if they're static assets,

[1383:40]

should actually be in a folder called

[1383:42]

static. And by static, that just means

[1383:44]

unchanging. You or someone else wrote

[1383:46]

them once and they're not dynamic in the

[1383:47]

way that app.py is. So, I'm actually

[1383:49]

going to use my mv command and move

[1383:51]

cat.jpeg into the static folder. Indeed,

[1383:53]

if I type ls now, cat is gone, but it is

[1383:56]

in the static folder. And now if I go

[1383:58]

back over here, I think we'll be good

[1384:00]

except that I do need to go into

[1384:02]

error.html and say that the source of

[1384:04]

this image is actually in

[1384:05]

/static/cat.jpeg

[1384:08]

to make clear it's in that folder. And

[1384:10]

so indeed when I now reload the page

[1384:12]

once more now I see a very grumpy cat at

[1384:15]

least guiding my error message. But

[1384:18]

there is a difference here. Even though

[1384:20]

when accessing the static directory I

[1384:22]

have to be explicit. Notice that this

[1384:24]

whole time we have never once mentioned

[1384:28]

the templates directory. The

[1384:30]

render_template function, to be clear, knows

[1384:33]

automatically to look in the templates

[1384:34]

folder for your template. You do not and

[1384:37]

you should not say something like

[1384:38]

templates here. You simply specify the

[1384:41]

name of the file. But in the uh

[1384:43]

HTML template, you do actually have to

[1384:46]

include as I did /static in the HTML.

[1384:51]

All right, let's do one final flourish

[1384:53]

with the actual code. Suppose that it's

[1384:56]

time to modernize and let people

[1384:58]

register not just for one sport as per

[1385:00]

the radio buttons, but multiple sports.

[1385:01]

It's a little obnoxious to make me go

[1385:03]

back and fill out my name again and

[1385:04]

again and again if I want to register

[1385:06]

once, twice, three times for sports. So,

[1385:08]

why don't we uh go ahead and in terms of

[1385:11]

UI change those radio buttons to

[1385:12]

checkboxes? That's a very easy fix. Let

[1385:15]

me go into uh my templates folder and

[1385:18]

into index.html where this form is.

[1385:20]

And if I want to change radio buttons to

[1385:22]

checkboxes, literally just change radio

[1385:24]

to checkbox. If I go back to the browser

[1385:26]

here and reload, you'll see the familiar

[1385:28]

checkboxes now, which are not mutually

[1385:30]

exclusive. It lets me check multiple

[1385:32]

ones, thereby registering for multiple

[1385:34]

sports at once. But my logic has to

[1385:36]

change a tiny little bit here whereby if

[1385:39]

I want to go ahead and get all of the

[1385:41]

sports for which the user is registered,

[1385:43]

well, that logic has to change in

[1385:44]

app.py. So where is my register route?

[1385:47]

Down here. And we haven't touched this

[1385:49]

in a while, but recall that the register

[1385:51]

route here has uh a validate name chunk

[1385:55]

of code, validate sport chunk of code,

[1385:57]

and we most recently did the insert into

[1386:00]

chunk of code as well. But if the user

[1386:02]

is registering for multiple sports, I'm

[1386:05]

okay with having one row per sport, even

[1386:07]

though I'm sure we could do better than

[1386:09]

that. But how do I iterate over all of

[1386:11]

the sports that the user gave me? Well,

[1386:13]

I need to change my validation code here

[1386:15]

a little bit. If you know the user can

[1386:17]

select multiple values as with

[1386:19]

checkboxes, you're going to use

[1386:21]

request.form.getlist

[1386:23]

and then the name of the uh parameter

[1386:26]

that you want to get the value of. And

[1386:28]

then this is going to give me back a

[1386:29]

list of values. So I'm going to go ahead

[1386:31]

and change semantically my code to say

[1386:33]

sports because I'm expecting zero or

[1386:35]

more sports now instead of one. So if

[1386:37]

there are no sports, we're going to just

[1386:39]

say missing sport. Heck, missing sports.

[1386:41]

Um but then I can't simply do this. I

[1386:44]

can't just say is the sport for which

[1386:46]

the user registered in that array or not

[1386:49]

because they might have given me two

[1386:50]

sports or three. So logically I should

[1386:53]

really check all of the sports that the

[1386:55]

human typed in for me and I should

[1386:57]

probably do something like this instead.

[1387:00]

So for each uh sport

[1387:03]

in the sports that the user typed in, go

[1387:06]

ahead and uh ask the question if that

[1387:10]

sport is not in sports, then go ahead

[1387:12]

and output invalid sport. So it's just a

[1387:16]

bit of tedium here. We're just adding a

[1387:17]

bit of logic, but this way I'm iterating

[1387:19]

over every check box that the user

[1387:21]

checked and making sure they didn't do

[1387:23]

what Kelly did earlier and sort of make

[1387:25]

up her own sport and submit that to me

[1387:28]

among all of the others. But this now

[1387:30]

should let me. Let's try. Let's reload.
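
The validation just described might look like this. It is a sketch using the standard library's `parse_qs`, which, like a get-list call, returns every value submitted under a repeated name. The `SPORTS` list and the `validate_sports` helper are assumptions for illustration.

```python
from urllib.parse import parse_qs

# Assumed list of sports the server is willing to accept.
SPORTS = ["Basketball", "Soccer", "Ultimate Frisbee"]


def validate_sports(body):
    """Return (selected, error) for a URL-encoded form body."""
    # parse_qs maps each name to a list, so repeated "sport" keys are all kept.
    selected = parse_qs(body).get("sport", [])
    if not selected:
        return [], "Missing sport"
    for sport in selected:
        # Check every checked box, not just the first one.
        if sport not in SPORTS:
            return [], "Invalid sport"
    return selected, None


ok = validate_sports("name=David&sport=Basketball&sport=Soccer")
missing = validate_sports("name=David")
made_up = validate_sports("name=Kelly&sport=Soccer&sport=Quidditch")
```

One invalid value among several valid ones is enough to reject the whole submission, which is the point of looping over the list.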

[1387:33]

Oh, and then actually one other line

[1387:34]

here. We also need to do it down here.

[1387:37]

Uh, for each sport in sports, we better

[1387:40]

execute that line of code multiple

[1387:42]

times. So, let's see what happens. Let's

[1387:44]

go ahead and register David for actually

[1387:47]

let's see what who's in the database

[1387:49]

still. So registrants. So we've got John

[1387:51]

and Doug. No David or Kelly. So let's

[1387:52]

reregister David for basketball and

[1387:54]

soccer. Click register. And now I'm

[1387:57]

indeed registered for both. And I

[1387:59]

observe that it's kind of bad design

[1388:03]

that I'm just inserting myself twice

[1388:05]

into the database. So let me go ahead

[1388:07]

and open up the Frosh IMs database one

[1388:09]

last time. Uh let me do a select uh let

[1388:11]

me do a select star from registrants.

[1388:15]

You'll see too that David and David are

[1388:17]

both there. What would be a better

[1388:18]

design here to get rid of the redundancy

[1388:23]

and to know that I'm the same person

[1388:25]

ideally?

[1388:27]

Yeah.

[1388:29]

>> Yeah. I should probably have an ID for

[1388:31]

the the person as well. So this is going

[1388:33]

to complicate it more than we want to

[1388:35]

play with today. Instead of just a

[1388:37]

registrants table, I should probably

[1388:38]

have like a students table that has an

[1388:40]

ID for every student and the name of

[1388:42]

every student and then change this table

[1388:44]

as we've seen with the IMDb database and

[1388:46]

others. I should really be storing the

[1388:48]

IDs of the students, the Harvard IDs if

[1388:50]

you will, and not just their names like

[1388:52]

this. So, there's room for improvement,

[1388:54]

but the point here is just how we can

[1388:55]

actually use checkboxes and get back

[1388:57]

multiple items from folks.
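
One possible shape for the better design mentioned above, sketched with the built-in `sqlite3` module. The table and column names (`students`, `registrations`, `student_id`) are assumptions, since the lecture only describes the idea and does not write the schema.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript(
    """
    -- One row per person, identified by an ID rather than by name.
    CREATE TABLE students (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    -- One row per (student, sport) pair; the name is no longer repeated.
    CREATE TABLE registrations (
        student_id INTEGER NOT NULL REFERENCES students(id),
        sport TEXT NOT NULL,
        UNIQUE (student_id, sport)
    );
    """
)

db.execute("INSERT INTO students (id, name) VALUES (?, ?)", (1, "David"))
for sport in ["Basketball", "Soccer"]:
    db.execute(
        "INSERT INTO registrations (student_id, sport) VALUES (?, ?)",
        (1, sport),
    )

rows = db.execute(
    """
    SELECT students.name, registrations.sport
    FROM registrations
    JOIN students ON students.id = registrations.student_id
    ORDER BY registrations.sport
    """
).fetchall()
student_count = db.execute("SELECT COUNT(*) FROM students").fetchone()[0]
```

David now appears once in `students`, and the two sports are linked to him by ID, so the server can tell both registrations belong to the same person.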

[1389:01]

All right,

[1389:03]

that was a lot. Questions on where we're

[1389:06]

now at.

[1389:09]

All right, to make the coding a little

[1389:11]

less tedious, what we're going to do is

[1389:12]

look at a few final examples that have

[1389:14]

sort of come pre-made, and we'll walk

[1389:16]

through the code, pointing out only

[1389:18]

what's different as opposed to some of

[1389:19]

the boilerplate that we keep seeing. Um,

[1389:21]

where we left off now, recall, is that

[1389:22]

we have app.py, which is all of our

[1389:24]

logic, requirements.txt, which just

[1389:26]

enumerates the libraries that we want to

[1389:28]

use in the project. Static which now

[1389:29]

contains any static files like cats or

[1389:31]

JavaScript or CSS and templates which

[1389:33]

contains our actual templates. It's

[1389:35]

worth noting that we're actually

[1389:36]

following a fairly common paradigm. This

[1389:38]

is not specific to Flask. The model that

[1389:40]

we've essentially the the paradigm that

[1389:43]

we've essentially been implementing is

[1389:44]

this. If this uh shape over here

[1389:46]

represents the human or the user, they

[1389:49]

keep interacting with what the world

[1389:50]

generally calls a view. A view is the

[1389:52]

term of art that just describes like the

[1389:54]

user interface. aka view. But that view

[1389:57]

is generated by a certain type of code,

[1389:59]

namely controller logic. So app.py is

[1390:02]

technically what the world would call

[1390:04]

controller logic or business logic uh to

[1390:06]

use an industry term. And that

[1390:08]

controller code, aka app.py, is

[1390:10]

generating one or more views. So the

[1390:12]

views that we're referring to here is

[1390:14]

like everything in your templates. Those

[1390:16]

are your views. But there's a third

[1390:17]

piece of the puzzle that we just

[1390:18]

introduced which is generally called a

[1390:21]

model. And initially my model was just a

[1390:23]

stupidly simple uh dictionary in memory

[1390:25]

and that evolved eventually into

[1390:27]

froshims.db. So your model is generally

[1390:29]

your persistent data like where you're

[1390:31]

storing data related to the application.

[1390:33]

And even though the picture doesn't lend

[1390:35]

itself to pronouncing it in the right

[1390:36]

order this is what's known as the MVC

[1390:39]

paradigm model view controller. And it's

[1390:42]

a very common way of developing web apps

[1390:44]

by just thinking about the different

[1390:46]

problems you need to solve with this

[1390:48]

kind of nomenclature. Like I've got to

[1390:50]

implement my controller which does all

[1390:51]

of the logic, all of the variables,

[1390:53]

functions, conditionals, loops, and so

[1390:55]

forth. I've got to implement the view

[1390:57]

which contains everything the user sees

[1390:59]

and interacts with like the HTML. And

[1391:01]

I've got to eventually implement the

[1391:02]

model which is like all of the backend

[1391:04]

data space and such. The catch though is

[1391:08]

that this is not a clean line because

[1391:10]

clearly in views we've seen variables,

[1391:11]

we've seen loops, we've seen

[1391:13]

conditionals. So this is just a general

[1391:15]

mindset to have and in the real world if

[1391:17]

you ever uh explore web apps again you

[1391:20]

are henceforth familiar with what's

[1391:22]

known as this MVC model. But now let's

[1391:25]

solve some other real world problem. So

[1391:27]

here's what you see on the occasion that

[1391:29]

you sign into something like Gmail or

[1391:31]

really any other website that asks for a

[1391:32]

username and then eventually a password

[1391:34]

or some such thing. This is just a web

[1391:36]

form. It looks a lot prettier than mine

[1391:37]

because they're using some fancy CSS to

[1391:39]

make things blue and nicely indented and

[1391:41]

so forth, but it's just HTML underneath

[1391:43]

the hood with probably an input type

[1391:45]

equals text to give me this text box. Of

[1391:48]

course, when you log into Gmail after

[1391:50]

providing your password, somehow Gmail

[1391:52]

remembers often for days, weeks even

[1391:55]

that you have logged in already. Now,

[1391:57]

how is that actually working? Well, when

[1391:59]

you first log into a site like Gmail and

[1392:02]

click submit or the next button in this

[1392:03]

case, presumably the browser is

[1392:05]

submitting in a virtual envelope, so to

[1392:07]

speak, a message like this to Google's

[1392:11]

servers. Post slash something to

[1392:13]

accounts.google.com, which happens to be

[1392:15]

the URL that Google uh typically uses

[1392:17]

for this. And inside of this, the dot

[1392:18]

dot dot is your username and password

[1392:21]

and anything else that might be

[1392:22]

submitted to the server. Ideally, the

[1392:24]

server responds to you with 200 OK,

[1392:26]

like here is your inbox. Okay, you

[1392:28]

logged in successfully, but it also

[1392:31]

underneath the hood, every time you've

[1392:34]

been logging into Gmail, has been

[1392:36]

planting a cookie on your computer. And

[1392:38]

you might be generally familiar with

[1392:39]

cookies. They have kind of a bad rap

[1392:41]

because they're often used and are used

[1392:43]

quite frequently for tracking, for

[1392:44]

advertising, um, and really kind of

[1392:47]

keeping eyes on you in some way. But in

[1392:48]

their basic form, they're just a feature

[1392:50]

of HTTP, which is wonderfully useful

[1392:52]

because it solves some typical problems.

[1392:55]

Uh this is another HTTP header that is

[1392:59]

usually inside of those virtual

[1393:00]

envelopes that come back from servers to

[1393:02]

browsers. In addition to telling the

[1393:04]

browser what the type of content is in

[1393:05]

the envelope, it might tell the browser,

[1393:07]

please set the following cookie. A

[1393:09]

cookie is just a key value pair. It

[1393:11]

might be something like session

[1393:13]

literally equals some value. And that

[1393:16]

value is usually a random string that

[1393:18]

might be 1 2 3 4 5 6 or something like

[1393:20]

that, but it's a unique identifier. Or

[1393:23]

naively, if Google implemented cookies

[1393:25]

poorly, they could technically tell your

[1393:28]

browser to store a cookie on your

[1393:29]

computer containing your username and a

[1393:31]

password. Why? So that tomorrow when you

[1393:33]

open up Gmail, you're not prompted again

[1393:35]

with the stupid form to log in. It

[1393:37]

already knows your browser that you're

[1393:39]

logged in. And your browser can do that

[1393:40]

by just sending the same cookie it got

[1393:42]

yesterday to the server. Now, this is

[1393:45]

bad to use cookies to store usernames

[1393:47]

and passwords generally because it's

[1393:49]

putting very precious data in the

[1393:51]

browser's memory and any sibling or

[1393:53]

roommate who walks over to your browser

[1393:54]

can now find your username and password

[1393:56]

by just poking around your cookies. So

[1393:58]

generally what browsers do is more like

[1394:00]

this screenshot here whereby all the

[1394:03]

server does is it puts a big random

[1394:05]

value on your computer somewhere

[1394:08]

essentially a text file containing a big

[1394:10]

random value and that is equivalent

[1394:12]

essentially to sort of a handstamp like

[1394:14]

if you go into a bar or a club or an

[1394:16]

amusement park generally you show your

[1394:17]

ticket once when you go in and then

[1394:19]

thereafter you just show your hand if

[1394:21]

you want to be able to come and go again

[1394:23]

and again. So right now my hand has not

[1394:25]

yet been stamped. We uh have this nice

[1394:27]

here smiley face sticker. I might have a

[1394:29]

smiley face now on my hand anytime I

[1394:31]

want to go back into the bar or club or

[1394:33]

amusement park because they now know,

[1394:34]

oh, we already checked who you are,

[1394:36]

presumably the very first time that you

[1394:38]

came in. That's all cookies are

[1394:40]

effectively doing is it's putting a

[1394:42]

virtual handstamp in your browser

[1394:44]

because the browser the next time you go

[1394:46]

to Gmail and click on a link or click on

[1394:48]

an email. Your browser unbeknownst to

[1394:50]

you will send a get request that looks

[1394:52]

like this but also contains a line like

[1394:55]

cookie colon and then that same key

[1394:57]

value pair. It's like presenting your

[1394:59]

handstamp again and again every time you

[1395:01]

open an email or click on a link in

[1395:03]

Gmail. This cookie header is what the

[1395:05]

browser sends. This set cookie header is

[1395:08]

what the server sends. So this is the

[1395:10]

act of stamping your hand. This is the

[1395:12]

act of presenting your hand. And that

[1395:14]

effectively is how browsers and servers

[1395:17]

remember who you are. This is how

[1395:20]

advertisers generally remember who you

[1395:23]

are because at one point or other they

[1395:24]

put a cookie on your computer and

[1395:26]

unbeknownst to you, you're going to this

[1395:28]

website, this website, this website and

[1395:29]

your browser has been presenting this

[1395:31]

handstamp all this time so advertisers

[1395:34]

know, oh that's David again, that's

[1395:35]

David again. And that's David again

[1395:37]

because they're seeing the same

[1395:38]

handstamp. And so one of the reasons why

[1395:40]

last week for instance I kept opening

[1395:42]

things in incognito mode which you might

[1395:44]

use generally if you want to do

[1395:45]

something private and not have it be

[1395:46]

saved in the computer's memory is also

[1395:48]

because incognito mode gets rid of all

[1395:50]

of your cookies when you close the

[1395:52]

window effectively like wiping off the

[1395:54]

handstamp the next time you go to that

[1395:57]

same website. So that's all a cookie is.
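
The two headers can be sketched with Python's standard `http.cookies` module. This only builds and parses header strings and does not talk to a real browser or server. The cookie name `session` follows the lecture's example, and the random value is generated here for illustration.

```python
import secrets
from http.cookies import SimpleCookie

# Server side: stamp the hand by sending a Set-Cookie header with a random value.
value = secrets.token_hex(16)
outgoing = SimpleCookie()
outgoing["session"] = value
set_cookie_header = outgoing.output()  # a "Set-Cookie: session=..." line

# Browser side: on later requests, present the same pair in a Cookie header.
cookie_header = f"Cookie: session={value}"

# Server side again: parse the header's value to recognize the hand stamp.
incoming = SimpleCookie()
incoming.load(cookie_header.split(": ", 1)[1])
received = incoming["session"].value
```

The server never needs the username or password again. It only needs to see the same random value it handed out earlier.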

[1396:00]

It's a key value pair that can be

[1396:01]

planted on your computer, but it's a

[1396:03]

wonderfully powerful mechanism for

[1396:05]

implementing, and this is the juiciest

[1396:07]

idea for today, I'd argue, what are

[1396:09]

called sessions. Sessions are this

[1396:12]

feature whereby browsers and servers

[1396:14]

have a persistent connection to each

[1396:16]

other, even though HTTP is what we'll

[1396:19]

call stateless. So stateless just means

[1396:21]

that you don't have a constant

[1396:22]

connection to the server when you are

[1396:24]

using a website. And that's not always

[1396:26]

true. And nowadays you sometimes do have

[1396:27]

a persistent connection, but

[1396:30]

cookies allow you to close your laptop

[1396:32]

even shut down your computer come back

[1396:34]

the next day and still have the illusion

[1396:35]

of being connected just as you were the

[1396:37]

previous day because of this virtual

[1396:39]

presentation of handstamps. So a session

[1396:42]

more concretely you can think of in

[1396:43]

Python as a dictionary of key value

[1396:46]

pairs that you can associate with each

[1396:48]

and every user. That is to say, when I

[1396:51]

log into a website that is using

[1396:52]

sessions implemented with cookies, they

[1396:55]

can store any number of key value pairs

[1396:57]

about me in the server's memory. And my

[1397:00]

presentation of the handstamp will

[1397:02]

ensure that they keep uh they know which

[1397:04]

key value pairs to assign to me. Let

[1397:06]

me go back into VS Code here and let me

[1397:08]

CD into a directory with which I came,

[1397:10]

which is called login, which is just

[1397:12]

going to be a relatively simple Flask

[1397:14]

application that demonstrates how you

[1397:15]

can implement the ability to log into a

[1397:17]

website. And we'll keep it super simple

[1397:18]

with just usernames, no passwords. But

[1397:20]

as you'll see in problem set 9, we'll

[1397:22]

add some passwords to the mix as well.
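
Before looking at the actual app, here is a standard-library sketch of the idea that a session is a per-user dictionary found via the cookie's value. It is not how the session library used in lecture is implemented. The `SESSIONS` store and the `login`, `current_user`, and `logout` helpers are hypothetical names.

```python
import secrets

# Server-side store: one dictionary of key-value pairs per session ID.
SESSIONS = {}


def login(username):
    """Create a session for username and return the ID to put in a cookie."""
    session_id = secrets.token_hex(16)
    SESSIONS[session_id] = {"name": username}
    return session_id


def current_user(session_id):
    """Look up who is presenting this hand stamp, or None if unknown."""
    return SESSIONS.get(session_id, {}).get("name")


def logout(session_id):
    """Forget the session so the cookie no longer identifies anyone."""
    SESSIONS.pop(session_id, None)


david = login("David")
kelly = login("Kelly")
```

Each visitor presents a different random ID, so each one sees only their own dictionary even though all of them live in the same server's memory.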

[1397:24]

If I type ls inside of this login

[1397:25]

directory, you'll see some familiar

[1397:27]

friends, app.py, requirements.txt, and

[1397:29]

templates. But let me draw our attention

[1397:31]

to one other library we're going to now

[1397:32]

start using called Flask-Session. So

[1397:35]

Flask-Session is just a third party

[1397:37]

library that gives us the ability to use

[1397:38]

cookies in our application and not have

[1397:41]

to know or understand any of the

[1397:42]

screenshots we just saw of HTTP

[1397:44]

requests. It sort of suffices to

[1397:46]

stipulate, okay, someone figured out how

[1397:48]

cookies work. I just want to use them

[1397:50]

now as a feature so that when a user

[1397:52]

uses my website, I can associate data

[1397:54]

with them like who they are, what their

[1397:56]

username is, and therefore that they've

[1397:58]

logged in. So, let's go ahead and close

[1398:00]

requirements.txt and open up app.py in

[1398:03]

this case. Here is an implementation of

[1398:06]

a program whose purpose in life is to

[1398:08]

enable me to log in. And in fact, before

[1398:10]

we walk through the

[1398:13]

code, let me do this in this uh

[1398:14]

terminal, let's do flask run. And I

[1398:17]

already hit control C on my other

[1398:18]

terminal window a moment ago. Uh let me

[1398:20]

now go into my other tab up here and

[1398:23]

reload the slash route, which is now

[1398:26]

going to be this login route instead of

[1398:27]

Frosh IMs. All this website does by

[1398:29]

default is it tells me first you are not

[1398:31]

logged in, but here's a link to log in.

[1398:33]

It's a little small, but if you look in

[1398:35]

the bottom lefthand corner of my browser

[1398:37]

right now, it's a URL that ends with

[1398:39]

/login. And in fact, I can see that

[1398:41]

more clearly if I view page source in

[1398:43]

the browser. Here is the only thing I'm

[1398:45]

really seeing in this web app so far.

[1398:48]

But notice what happens now. If I click

[1398:50]

on login, the route in my URL just

[1398:53]

changed to /login. I'm again keeping it

[1398:56]

simple with just usernames, no

[1398:57]

passwords, but I'm going to log in as

[1398:59]

David and click login. But first, let me

[1399:01]

show you the code. In view page source,

[1399:04]

I have a form that submits to /login using

[1399:07]

the post method. The only thing about

[1399:09]

this form that's

[1399:11]

interesting is it's got a text box and a

[1399:13]

login button. Same as we've seen before.

[1399:16]

So, let's click it. Now, I click login.

[1399:18]

And notice I get whisked away back to

[1399:20]

the original route, the slash route.

[1399:21]

Even though Chrome is hiding the slash

[1399:23]

from me, but the website somehow knows

[1399:25]

that I'm logged in as David. In fact, if

[1399:27]

I open up my page source in the browser,

[1399:30]

I'll see that now it doesn't say you are

[1399:32]

not logged in. It says I am logged in as

[1399:34]

David. And it's now giving me apparently

[1399:36]

conditionally a logout link. So I argue

[1399:39]

this is representative now of any

[1399:41]

website that lets you log in and out of

[1399:43]

it. So how does this work? Well, in my

[1399:45]

login app here,

[1399:48]

what do we have in app.py? The

[1399:50]

following. I've got from flask import

[1399:52]

Flask, redirect, render_template, request,

[1399:55]

and a new one, session, which you can

[1399:58]

essentially think of as a dictionary

[1400:01]

where you can store key value pairs for

[1400:03]

each and every user and flask will make

[1400:06]

sure that your code has a different copy

[1400:09]

of session for every user that visits.

[1400:11]

You can just treat it as though you only

[1400:13]

have one user, but Flask will ensure

[1400:15]

that when a user visits, they get their

[1400:17]

own copy of session, their own copy of

[1400:19]

session, their own copy of session

[1400:20]

essentially to store whatever you want.

[1400:22]

This next line here, I just need to copy

[1400:24]

paste: from flask_session import capital

[1400:26]

Session. This line is the same. Turn

[1400:28]

this file into a flask app. This stuff

[1400:30]

is new and fine to copy-paste. This just

[1400:33]

says configure this app to use sessions

[1400:36]

by storing the session data on the server as

[1400:39]

files instead of in a database or

[1400:40]

somewhere else. But this is the default

[1400:42]

that we use for our examples. All right,

[1400:45]

what's going on here? Well, in my slash

[1400:47]

route, I've got an index function whose

[1400:49]

purpose in life seems to be to render a

[1400:51]

template called index.html and then pass

[1400:53]

in a name placeholder, which is the

[1400:56]

value of session.get("name").

[1400:58]

So whatever name is stored in the

[1401:00]

session if any that gets passed into the

[1401:03]

template. So let's go down this rabbit

[1401:04]

hole. Let me open up index.html.

[1401:07]

Interesting. So here is the logic that

[1401:09]

implemented those two different versions

[1401:11]

of the homepage that we saw. If the name

[1401:14]

has a value, so if it's not empty, we

[1401:16]

saw you are logged in as such and such.

[1401:18]

Here's a logout link. If though there

[1401:20]

was no name, as happens by default

[1401:22]

before you even log in, you see you are

[1401:24]

not logged in. Here's a link to log in.

[1401:27]

So that's all the homepage is is it's

[1401:29]

conditional logic checking if there is

[1401:30]

in fact a user logged in. All right.

[1401:33]

Well, let's go back to app.py. How does

[1401:35]

the login work? Well, if you find your

[1401:38]

way to the login route, then I'm asking

[1401:40]

a question. If the user got here via

[1401:42]

post, they probably got here by clicking

[1401:45]

the login button that I gave them. So,

[1401:48]

let's store in the session dictionary

[1401:50]

the word name and make the value of that

[1401:53]

key this value here where what I've just

[1401:56]

highlighted is whatever the user typed

[1401:58]

into the form whether it's David, Kelly,

[1401:59]

John or anyone else. That's what comes

[1402:02]

back from the form and I'm just storing

[1402:03]

that in the session which again is like

[1402:05]

this special global variable that you

[1402:07]

get one per user and it's implemented

[1402:10]

underneath the hood by way of cookies or

[1402:11]

these handstamps. Then I'm just

[1402:13]

redirected to the slash route.

[1402:15]

Otherwise, if the request method wasn't

[1402:17]

post, that means the user just newly

[1402:19]

visited example.com or whatever my

[1402:21]

website is. That's why I show them

[1402:22]

login.html. All right, let's go down

[1402:24]

that rabbit hole. Let's open up

[1402:26]

login.html.

[1402:27]

It's pretty simple. It's just a stupid

[1402:29]

form that has a text box and a submit

[1402:31]

button. But the most important part is

[1402:33]

that as we saw in the browser, it

[1402:35]

submits to /login, the route we just saw.

[1402:39]

All right, if I go back to here, how do

[1402:41]

you log out? Well, we didn't actually

[1402:42]

click this, but here is how you can

[1402:44]

delete the contents of the session and

[1402:46]

actually log the user out. You just call

[1402:48]

session.clear(). And so, in fact, if I go

[1402:51]

back over here and click log out, how

[1402:53]

does the server know that I've logged

[1402:54]

out? Well, that route very quickly (you

[1402:56]

didn't even see the URL bar change)

[1402:58]

logged me out by clearing the whole

[1403:00]

session. And so, the cookie that was

[1403:02]

planted on my computer was essentially

[1403:04]

deleted at this point in time. Or

[1403:06]

really, the server side data that's

[1403:08]

associated with that cookie was deleted.
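
The idea of "one session dictionary per visitor, found by way of a cookie" can be sketched without any web framework. This is not the lecture's app.py and not how Flask-Session is implemented; it is a minimal model in plain Python, and the names SESSIONS, get_session, login, index, and logout are made up for illustration.

```python
import secrets

# Hypothetical in-memory store: cookie value -> that visitor's own dictionary.
SESSIONS = {}


def get_session(cookie):
    """Return (cookie, session dict), issuing a new cookie to first-time visitors."""
    if cookie is None or cookie not in SESSIONS:
        cookie = secrets.token_hex(16)  # random, hard-to-guess identifier
        SESSIONS[cookie] = {}
    return cookie, SESSIONS[cookie]


def login(cookie, name):
    """Mimic the POST branch of /login: remember the submitted name."""
    cookie, session = get_session(cookie)
    session["name"] = name
    return cookie


def index(cookie):
    """Mimic the / route: the greeting depends on whether a name is stored."""
    cookie, session = get_session(cookie)
    name = session.get("name")
    if name:
        return f"You are logged in as {name}."
    return "You are not logged in."


def logout(cookie):
    """Mimic /logout: clear everything stored for this visitor."""
    cookie, session = get_session(cookie)
    session.clear()
    return cookie


david = login(None, "David")
kelly = login(None, "Kelly")
print(index(david))  # You are logged in as David.
print(index(kelly))  # You are logged in as Kelly.
logout(david)
print(index(david))  # You are not logged in.
```

Each visitor only ever holds the cookie value; the data itself stays on the server, which is why clearing the server-side dictionary is enough to log someone out.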

[1403:10]

So, I'm no longer seeing it at all. So,

[1403:13]

that's kind of it. Like, if you log into

[1403:15]

a website, whether it's Facebook or

[1403:17]

Gmail or Outlook or anything else, like

[1403:19]

that's effectively how they're logging

[1403:20]

you in, but of course, they're adding

[1403:22]

into the mix some uh passwords and other

[1403:26]

security as well. All right, how about

[1403:28]

one other example? Let me go back into

[1403:30]

VS Code here and let me go into my first

[1403:33]

terminal, hit Control-C to kill this login

[1403:36]

example. Let me hit cd to go back and

[1403:38]

then cd uh store to implement the

[1403:41]

simplest of web stores like some kind of

[1403:43]

e-commerce site that has an actual

[1403:45]

shopping cart implemented. Let me do

[1403:47]

flask run inside of this directory. Open

[1403:50]

up my other terminal window. And in my

[1403:52]

other terminal window, I'm going to go

[1403:53]

cd to go back and then go into store

[1403:55]

here where I'm going to see some

[1403:56]

familiar files, namely app.py

[1403:59]

requirements.txt, but a database file

[1404:02]

this time in addition to my templates.

[1404:03]

Well, let's see what's inside of that

[1404:04]

database. Let me go ahead and run sqlite3

[1404:07]

on store.db, then .schema, to see what's

[1404:09]

in the database. Ah, this is like a

[1404:11]

bookstore like the very first version of

[1404:13]

amazon.com if you will. And the table

[1404:15]

has two columns, an ID column and a

[1404:18]

title column for all of the books that

[1404:19]

this store shall sell. Well, what are

[1404:21]

those books? Select star from books

[1404:24]

semicolon. Okay, so this is a bookstore

[1404:27]

that sells only five books among them

[1404:29]

the Hitchhiker's Guide to the Galaxy and

[1404:31]

sequels. All right. So, wouldn't it be

[1404:33]

nice if we have a website that displays

[1404:35]

everything in this catalog and lets me

[1404:37]

like add things to my cart? And in fact,

[1404:39]

here is maybe the better metaphor for

[1404:41]

what a session is. A session essentially

[1404:43]

gives you the ability to implement a

[1404:44]

shopping cart like this where the

[1404:46]

shopping cart of course in the real

[1404:47]

world is specific to each user. Like if

[1404:49]

I'm on Amazon.com and Kelly's on

[1404:51]

Amazon.com and both logged in, we

[1404:53]

obviously don't see the contents of each

[1404:54]

other's carts. And that's because we

[1404:56]

have separate cookies on our hands. And

[1404:58]

so Flask or whatever Amazon is using

[1405:01]

creates the illusion that we each have

[1405:03]

our own global dictionary called session

[1405:05]

in which Amazon can store any key value

[1405:07]

pairs it wants like what's in our

[1405:09]

shopping cart. So let's try this. Let me

[1405:11]

go back to my other browser and reload.

[1405:13]

So I'll now see not the login example

[1405:15]

but the bookstore example. And it's

[1405:17]

super ugly because I whipped it up using

[1405:19]

the simplest of HTML. But you'll see

[1405:21]

here every one of the books in the

[1405:23]

database plus an add to cart button. And

[1405:26]

even if again you're sort of new to all

[1405:28]

this web programming, there's not all

[1405:30]

that much you can do with HTML except

[1405:33]

use forms maybe with some hidden

[1405:34]

elements to achieve this result. So here

[1405:36]

we have the H1 tag with books. Here's an

[1405:38]

H2 which is big and bold but not quite

[1405:40]

as big. Here's the form. Here's the uh

[1405:42]

here's the button for the Hitchhiker's

[1405:44]

Guide to the Galaxy. As an aside, because

[1405:46]

there's like a curly quote or an

[1405:48]

apostrophe in the book's name. This is

[1405:50]

just an HTML entity that Flask is

[1405:52]

outputting for me, even though it's not

[1405:53]

there uh visually in the database. So,

[1405:56]

what does the button do for Hitchhiker's

[1405:57]

Guide to the Galaxy? Well, it's a form

[1405:59]

whose action is /cart, presumably

[1406:01]

because I want to add it to my cart

[1406:03]

using the post method. I've got an input

[1406:05]

name equals ID, the type of which is

[1406:07]

hidden, the value of which is one. And

[1406:09]

fast forward: 2, 3, 4. So just like the

[1406:11]

deregister example for Kelly, similarly,

[1406:14]

is each book going to be addable to a

[1406:16]

cart instead of removable by using that

[1406:18]

unique ID. And indeed, every form has an

[1406:20]

add to cart button. So what's happening

[1406:23]

then on the server? Well, let's take a

[1406:25]

look at the other tab here. If I go back

[1406:28]

into uh VS Code and if I go into my

[1406:33]

let's say let's minimize the terminal

[1406:36]

window here and let's open up inside of

[1406:38]

store. Let's open up our template for

[1406:42]

index.html which is sort of the entry

[1406:45]

point. Oh, which is not that. Uh let's

[1406:47]

open up app.py first and figure out

[1406:49]

what's going on. So at the top we have

[1406:51]

some imports including our SQL library.

[1406:54]

We have an app variable being created, a

[1406:56]

DB variable being created using that

[1406:58]

same store.db. We've got this

[1407:00]

boilerplate code which just again

[1407:02]

enables cookies and stores the contents

[1407:04]

on the local file system instead of in a

[1407:06]

database. Ah here's the interesting

[1407:07]

beginning point. How did I see that big

[1407:10]

page with all the books and the buttons?

[1407:12]

Well, for the slash route, we've got

[1407:14]

this function that first uses some SQL

[1407:17]

to get all of the books from the

[1407:18]

database. Select star from books. And

[1407:20]

then, ah, there's no index.html because

[1407:22]

I called it books.html in this case just

[1407:24]

because. And I set the books placeholder

[1407:27]

equal to the value of the books

[1407:28]

variable. All right, let's go down this

[1407:30]

rabbit hole now. Let's open up the

[1407:32]

templates folder's books.html file. Okay,

[1407:36]

so here we have that H1 with books and

[1407:38]

then we have a for loop which is going

[1407:40]

to output for every book an H2 tag and a

[1407:43]

form tag a form tag again and again and

[1407:46]

again each of which has a value that

[1407:49]

equals the current book's ID but the

[1407:52]

title in the H2 of course is the title

[1407:54]

of the book which is more human

[1407:56]

friendly. So what happens when I

[1407:57]

actually click on add to cart for the

[1407:59]

Hitchhiker's Guide to the Galaxy? Well,

[1408:00]

I should indeed see that now that one

[1408:02]

book has been added. And if I go back

[1408:04]

and add another like The Restaurant at

[1408:06]

the End of the Universe, I now have two

[1408:08]

books in my cart. So, where is that data

[1408:10]

actually being stored? Well, if we go

[1408:11]

back to VS Code here, uh, hide the

[1408:14]

terminal and focus on the cart route.

[1408:16]

The cart route because it supports post

[1408:18]

in addition to get also is doing this

[1408:21]

for me. Well, first it's checking with

[1408:22]

some logic here. If there is no cart in

[1408:25]

the session, go ahead and create a key

[1408:27]

called cart and set it equal to an empty

[1408:29]

list. In other words, I can put any key

[1408:31]

value pairs into the session that I

[1408:33]

want. So, if I want my shopping cart to

[1408:35]

effectively be a list of all of the

[1408:37]

books that the user has added to their

[1408:38]

cart, it stands to reason that my cart

[1408:41]

by default should just be an empty list

[1408:43]

when they first arrive. However, if the

[1408:45]

user has clicked submit in order to get

[1408:48]

here, well, I'm going to do this. I'm

[1408:50]

going to get the ID of the book that

[1408:51]

they've submitted via that form. And if

[1408:54]

it indeed exists and it's not someone

[1408:55]

like Kelly messing around and sending me

[1408:57]

invalid parameters, I am going to append

[1409:00]

to the cart list in the session the book

[1409:04]

ID. And then I'm just going to redirect

[1409:07]

the user to the cart. And anytime you do

[1409:08]

a redirect that always is using get, not

[1409:12]

post. And so when I come back to this

[1409:14]

cart route later, I'm not going to be

[1409:16]

using post. I'm going to be using get,

[1409:19]

which means this chunk of code here is

[1409:20]

executed. I have a variable called

[1409:22]

books, set equal to the results of

[1409:24]

doing select star from books where id in

[1409:26]

the following parenthesized list of IDs.

[1409:29]

Recall that IN is the keyword that

[1409:30]

lets me match multiple IDs if I so

[1409:32]

choose. And then I'm rendering cart.html

[1409:35]

with those books. And if I go

[1409:39]

back to the application, that's why

[1409:40]

I'm seeing two elements here, and indeed

[1409:42]

if I go to my developer tools or view

[1409:45]

page source rather, I'll see two list

[1409:47]

items inside of an ordered list or a

[1409:50]

numbered list containing the contents

[1409:52]

then of that shopping cart. All right.
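
The cart logic described above can be sketched with Python's built-in sqlite3 module. This is an illustration under assumptions, not the lecture's code: the in-memory table stands in for store.db, a plain dictionary stands in for one visitor's session, and the function names add_to_cart and cart_titles are made up. With sqlite3, an IN query needs one ? placeholder per ID, so the placeholders are generated from the length of the cart.

```python
import sqlite3

# Made-up stand-in for store.db: a books table with id and title columns.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT NOT NULL)")
db.executemany(
    "INSERT INTO books (id, title) VALUES (?, ?)",
    [
        (1, "The Hitchhiker's Guide to the Galaxy"),
        (2, "The Restaurant at the End of the Universe"),
        (3, "Life, the Universe and Everything"),
    ],
)

session = {}  # stands in for one visitor's session dictionary


def add_to_cart(session, book_id):
    """Mimic the POST branch: make sure a cart exists, then append the ID."""
    if "cart" not in session:
        session["cart"] = []
    if book_id:  # ignore a missing or empty ID
        session["cart"].append(int(book_id))


def cart_titles(session):
    """Mimic the GET branch: look up every book whose ID is in the cart."""
    ids = session.get("cart", [])
    if not ids:
        return []
    placeholders = ", ".join("?" for _ in ids)  # one ? per ID
    rows = db.execute(
        f"SELECT title FROM books WHERE id IN ({placeholders}) ORDER BY id", ids
    )
    return [title for (title,) in rows]


add_to_cart(session, "1")  # form values arrive as strings
add_to_cart(session, "2")
print(cart_titles(session))
```

Only the placeholder string is built with an f-string; the IDs themselves are still passed as parameters, so user input never becomes part of the SQL text.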

[1409:55]

So, if we now have the ability to use

[1409:57]

sessions to remember who has logged in

[1409:58]

and we have the ability with sessions to

[1410:00]

remember what someone has added to their

[1410:01]

shopping cart, what else can we do with

[1410:04]

web applications more generally, even if

[1410:07]

not using sessions? Well, let me go

[1410:09]

ahead and close this tab here. Let me go

[1410:11]

back to VS Code here. Close out these

[1410:13]

two examples and let's do a final set of

[1410:15]

examples that demonstrate what we can do

[1410:17]

with some real world data and a web

[1410:19]

application. I have lastly a directory

[1410:21]

called shows which is evocative of our

[1410:23]

use of IMDb in the past. And I'm going

[1410:25]

to go ahead into my first terminal

[1410:27]

window. Hit control C and call your

[1410:28]

attention to one thing before we move

[1410:30]

on. Every time I have executed a SQL

[1410:33]

query inside of my code in my first

[1410:36]

terminal window where Flask is running,

[1410:38]

you'll see either in green for success

[1410:40]

or yellow or red for some issues the

[1410:44]

actual SQL commands that are

[1410:46]

being sent to your database. This is

[1410:48]

useful if you mess something up at some

[1410:50]

point related to a database query. You

[1410:52]

can actually see in your terminal where

[1410:55]

you're running flask run actually what

[1410:57]

SQL command was sent to the server to

[1411:00]

try to troubleshoot errors that way.

[1411:01]

Otherwise, you're just flying blind when

[1411:03]

actually interacting only with the web

[1411:06]

browser. But for now, let me go ahead

[1411:07]

and clear that away and cd back to my

[1411:10]

default directory and cd now into shows

[1411:13]

where if I type ls, we'll see a whole

[1411:15]

bunch of files: app.py, requirements.txt,

[1411:17]

and this time shows.db, which is the

[1411:19]

very same database that we had in past

[1411:22]

weeks when we played with some of the

[1411:24]

very large number of shows in the

[1411:26]

internet movie database. And what does

[1411:28]

app.py do here? Well, it implements the

[1411:30]

simplest of programs. This gives me

[1411:32]

access first to shows.db with some

[1411:34]

boilerplate up top. If I scroll down

[1411:36]

here, you'll see that there's an

[1411:38]

index.html template that's rendered by

[1411:40]

default. And then apparently there's a

[1411:42]

search route which is akin to what

[1411:43]

Google does for us when we searched for

[1411:45]

cats and dogs in the past. But for the

[1411:47]

first time I'm implementing my own

[1411:49]

search engine for TV shows, not for dogs

[1411:52]

and cats. But what does this search

[1411:53]

route do? Well, it uses a shows variable

[1411:56]

and it executes the SQL select star from

[1411:58]

shows where title equals question mark

[1412:00]

and it passes in just like Google does

[1412:03]

the Q parameter for query and then it

[1412:05]

renders a template called search.html

[1412:07]

passing in those shows as a

[1412:09]

placeholder. In other words, what does

[1412:11]

this do? Well, let me go back over to

[1412:13]

the store tab here.

[1412:15]

Change the URL to just slash. And

[1412:17]

because I'm no longer

[1412:19]

running the store, I do want to go ahead

[1412:21]

and run in my first terminal window

[1412:24]

flask run to start off the shows

[1412:27]

application instead. So if I now go back

[1412:29]

to that tab, now that the server is

[1412:31]

running, what I see here now is the

[1412:32]

simplest of search boxes like our Google

[1412:34]

example asking for a query, but this

[1412:36]

time I can search for things with which

[1412:38]

I'm more familiar, like The Office,

[1412:40]

capital T, capital O, search. And what I

[1412:43]

get back, not that enlighteningly, but

[1412:45]

is the title of every show that matches

[1412:47]

exactly that. If I go ahead and view

[1412:48]

page source, you'll see that what was

[1412:50]

generated was an unordered list of

[1412:52]

offices that are in the database. And

[1412:55]

recall there's the British one, the

[1412:56]

American one, and a bunch of others as

[1412:57]

well. However, this form does not work.

[1413:00]

If I type in something like the office in lowercase and

[1413:02]

search, I get no results in that case,

[1413:04]

which isn't so much a bug as it is just

[1413:07]

a lack of features here. And so, let me

[1413:09]

actually go into VS Code here, and let

[1413:12]

me propose that we come up with a better

[1413:14]

version of this code. So, in fact, I'm

[1413:16]

going to go into the pre-made examples

[1413:18]

with which I came today. I'm going to go

[1413:20]

into the next version of shows here. Run

[1413:22]

flask run here. reload the application

[1413:25]

over here and now show you that the

[1413:27]

office in lowercase does actually work.
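
A case-insensitive substring match of this kind is commonly done with SQL's LIKE operator and % wildcards. The sketch below uses Python's built-in sqlite3 module rather than the lecture's code; the table contents and the function names search_exact and search_like are made up for illustration.

```python
import sqlite3

# Made-up stand-in for shows.db with a handful of titles.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE shows (id INTEGER PRIMARY KEY, title TEXT NOT NULL)")
db.executemany(
    "INSERT INTO shows (title) VALUES (?)",
    [
        ("The Office",),
        ("The Office",),
        ("Welcome to the Office",),
        ("Parks and Recreation",),
    ],
)


def search_exact(q):
    """Exact match: only titles equal to q, character for character."""
    rows = db.execute("SELECT title FROM shows WHERE title = ? ORDER BY id", (q,))
    return [title for (title,) in rows]


def search_like(q):
    """Substring match: % on each side allows zero or more characters there."""
    rows = db.execute(
        "SELECT title FROM shows WHERE title LIKE ? ORDER BY id",
        ("%" + q + "%",),
    )
    return [title for (title,) in rows]


print(search_exact("the office"))  # no rows: = compares case-sensitively here
print(search_like("the office"))   # matches regardless of case or position
```

The wildcards are concatenated onto the value that is passed in, not written into the SQL string, so the query still uses a single ? placeholder. In SQLite, LIKE ignores case for ASCII letters by default.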

[1413:29]

Moreover, it searches for anything that

[1413:31]

mentions the office. So if you had to

[1413:33]

guess how might this be implemented

[1413:36]

underneath the hood, well, if I open up

[1413:38]

my other terminal window and go into

[1413:39]

that same directory, shows one and open

[1413:42]

up this version of app.py, you'll see

[1413:45]

that instead of using a simple query

[1413:47]

like before, I'm now using the LIKE

[1413:50]

keyword here because I'm checking that

[1413:52]

it is like the office. And notice this is

[1413:54]

a bit clever here, or a bit confusing at

[1413:56]

first glance: the placeholder I want is

[1413:59]

question mark, but I don't want to just

[1414:01]

search for the user's input. I want to

[1414:03]

tolerate zero or more characters to the

[1414:05]

left via the SQL wildcard, and zero or

[1414:07]

more characters to the right. So I'm

[1414:10]

concatenating onto the user's input a

[1414:12]

percent sign here and a percent sign here,

[1414:14]

because, recall from our week seven with

[1414:15]

SQL, this just means look for anything

[1414:18]

case-insensitively that has the office

[1414:21]

in it, no matter where that string

[1414:25]

is in the text. How did it know to

[1414:27]

render that though as this bulleted list

[1414:29]

of all of these offices? Well, let me go

[1414:31]

into my terminal here and open up uh

[1414:35]

search.html which is the template that

[1414:37]

the search route is using. And you'll

[1414:39]

see that I'm just iterating over with a

[1414:41]

Jinja for loop each of those shows, and

[1414:43]

then outputting a list item for each of

[1414:45]

those matches effectively just as I did

[1414:47]

before. But there's this other technique

[1414:49]

I can use altogether and it's generally

[1414:52]

going to open up more possibilities for

[1414:54]

us in final projects if not beyond of

[1414:56]

creating essentially my own API. Rather

[1414:58]

than to just make a web app that spits

[1415:00]

out the entire HTML page that I want the

[1415:02]

user to see, wouldn't it be nice if I

[1415:04]

could just start to create routes that

[1415:06]

spit out the data that I want and then I

[1415:08]

or even some third party making a

[1415:10]

website with the same data can integrate

[1415:12]

my application into their own. And

[1415:15]

indeed, an API is an application

[1415:17]

programming interface. And it's

[1415:19]

essentially web-based functions you can

[1415:21]

call to get data from someone else's

[1415:23]

services generally using HTTP. And you

[1415:26]

can return the data in any number of

[1415:28]

formats: in text format, in HTML format,

[1415:31]

or in something called JSON format which

[1415:33]

is short for JavaScript object notation

[1415:35]

which looks a little something like this

[1415:37]

which is quite like Python arrays and

[1415:40]

dictionaries combined. But notice here

[1415:42]

with a wave of the hand, there's a whole

[1415:44]

bunch of key value pairs in this

[1415:45]

particular example of all of the offices

[1415:47]

that are in IMDb's database. And so I

[1415:50]

wanted to show us these final versions

[1415:53]

of this same shows application that

[1415:54]

works a little bit differently. If I go

[1415:56]

into say shows 2 example here now run

[1416:00]

whoops and let's go ahead and exit out

[1416:02]

of the previous flask copy and run shows

[1416:05]

two inside of which is flask run. Notice

[1416:08]

here that if I go back to this web form

[1416:10]

now, notice that there is no more search

[1416:13]

button because this is meant to be

[1416:14]

highly interactive and I can search for

[1416:16]

the office.

[1416:19]

And you'll notice that this is

[1416:20]

effectively autocomplete which we saw a

[1416:22]

taste of last week with JavaScript which

[1416:24]

I am in fact using here. But how is this

[1416:26]

working? Well, let me reload and open up

[1416:28]

my developer tools. And in developer

[1416:30]

tools, let's watch the network tab this

[1416:33]

time because when I type in something

[1416:34]

like t, you'll see that my web page

[1416:37]

suddenly made a request to my own

[1416:39]

/search route. And if I click on my

[1416:42]

developer tools and look at the response

[1416:44]

that came back, you'll see that the

[1416:46]

/search route spit out not a full web

[1416:48]

page, but just a whole bunch of LI tags.

[1416:52]

Now, why is that? Well, let me go back

[1416:53]

to VS Code and open up in my other

[1416:56]

terminal uh app.py. And in app.py,

[1417:00]

scrolling down to search, you'll see

[1417:02]

that when I get shows from the database,

[1417:05]

I'm still using search.html, which

[1417:07]

previously extended my layout and

[1417:10]

plugged in that whole unordered

[1417:12]

list. But this time, if I go into this

[1417:15]

version of search.html, you'll see

[1417:17]

that I'm only spitting out raw HTML

[1417:20]

because I'm assuming that maybe someone,

[1417:22]

myself included, wants to use /search

[1417:24]

to just get a whole bunch of list

[1417:27]

items that they can put into their own

[1417:29]

unordered list or UL tag. And so what's

[1417:32]

effectively happening over here is every

[1417:34]

time I type a letter, notice at bottom

[1417:36]

left, another HTTP request goes across

[1417:39]

the internet, another HTTP request, and

[1417:41]

each of those is returning the set of LI

[1417:44]

elements that line up with the query

[1417:47]

that I've typed in. But this is a little

[1417:48]

sloppy, arguably, insofar as I'm

[1417:50]

returning a chunk of HTML, but out of

[1417:53]

context, and I'm dictating to the user

[1417:54]

that they have to use list items.
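
To make the "chunk of HTML out of context" concrete, here is a small sketch of what such a fragment could look like when built by hand. It is an illustration only: the lecture renders this with a template, whereas render_fragment and the sample data below are made up, and html.escape stands in for the escaping a template engine would do.

```python
from html import escape


def render_fragment(shows):
    """Return one <li> per show, with no surrounding <ul> or page layout."""
    return "\n".join(f"<li>{escape(show['title'])}</li>" for show in shows)


shows = [{"title": "The Office"}, {"title": "Tom & Jerry"}]
print(render_fragment(shows))
```

Whoever receives this response has to insert it into a list of their own, which is the coupling the lecture calls sloppy.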

[1417:56]

Wouldn't it be nice to just send the raw

[1417:58]

data? And I can do that, too. Let me go

[1418:00]

back into VS Code here and look at our

[1418:02]

final example, shows three, inside of

[1418:06]

which is a version of this code that now

[1418:08]

returns that so-called JavaScript object

[1418:10]

notation. And if I go into shows three,

[1418:13]

run flask run, go back over now to my

[1418:16]

browser tab, and click reload, I'll see

[1418:19]

now when I search for say T and click on

[1418:23]

that row. Notice now in the response tab

[1418:26]

of my developer tools, I'm getting back

[1418:28]

a whole bunch of juicy information. A

[1418:30]

massive JavaScript object notation chunk

[1418:33]

of data. Notice the square bracket means

[1418:35]

here comes a list or an array. Here

[1418:37]

comes a dictionary or dict. And indeed,

[1418:40]

that's what I'm seeing. This looks like

[1418:41]

Python, but it's technically JavaScript

[1418:43]

and it's technically JavaScript's object

[1418:44]

notation. This just means this is the

[1418:46]

juicy data I'm getting back from the

[1418:48]

server. And if you now think way back to

[1418:50]

week zero and even our family weekend

[1418:53]

lecture on AI, a lecture on AI where I

[1418:55]

was writing code that talked to OpenAI's

[1418:57]

so-called API to get responses from our

[1419:00]

serverside cat. They were sending us

[1419:03]

JavaScript object notation like this and

[1419:05]

I was just grabbing the data that I

[1419:07]

actually cared about, namely the cat's

[1419:09]

actual response. And so in this case, if

[1419:12]

I open up in my other terminal window

[1419:14]

here, app.py, you'll see in my

[1419:17]

search route that instead of returning a

[1419:19]

template, I'm using a crazily named

[1419:21]

function called jsonify, which is just

[1419:24]

another function that comes with Flask

[1419:26]

itself that has the effect of taking the

[1419:29]

list of Python dictionaries that came

[1419:32]

back from my SQL database, JSONifying it

[1419:34]

in such a way that I then can uh serve

[1419:38]

it to anyone on the internet, myself

[1419:40]

included, as a service so that I and

[1419:43]

they can use my own data to implement

[1419:45]

ment their own web web applications. So

[1419:48]

that's sort of it for web programming.
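
The conversion from a list of Python dictionaries to JSON text can be tried with the standard library alone. This sketch uses json.dumps as a stand-in for the serialization step; Flask's jsonify additionally wraps the result in an HTTP response marked as JSON. The rows below are made up for illustration.

```python
import json

# Made-up rows, shaped like the list of dictionaries a query might return.
shows = [
    {"id": 1, "title": "The Office"},
    {"id": 2, "title": "The Office"},
]

body = json.dumps(shows)  # Python list of dicts -> JSON text
print(body)

decoded = json.loads(body)  # what a client would do with the response
print([show["title"] for show in decoded])
```

Because the response is just data, the client decides how to display it, whether as list items, a table, or something else entirely.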

[1419:50]

Ultimately, you now have all of the

[1419:52]

building blocks from week zero onward to

[1419:53]

make your own web applications. And if

[1419:55]

you so choose for final projects, your

[1419:56]

own mobile applications, even if this

[1419:58]

too, like everything else has felt like

[1419:59]

a bit of a fire hose, it is in the

[1420:01]

process of your final project of

[1420:03]

speccing out and proposing and executing

[1420:05]

your own final project that will make

[1420:07]

all of this feel much more comfortable

[1420:09]

and familiar. And you'll look back on so

[1420:11]

many of the past weeks as useful

[1420:12]

building blocks. Uh but this then was

[1420:14]

your CS50 education weeks 0 through

[1420:16]

nine. We have just one more left next

[1420:18]

week. So we'll see you then.

[1421:38]

All right, this is CS50 week 10, the

[1421:42]

very end. And we will end today's class

[1421:44]

just as we ended week zero, with a

[1421:46]

little bit of cake outside in the

[1421:47]

transept. But over these past 10 plus

[1421:50]

weeks, if you've been feeling like it

[1421:51]

was that proverbial fire hose sort of

[1421:53]

hitting you in the face with so much new

[1421:55]

content, so many new skills, so many new

[1421:57]

challenges, um realize that you're in

[1421:58]

very good company. And we can officially

[1422:00]

declare nonetheless that if you started

[1422:02]

the class among those less comfortable,

[1422:04]

you are officially after today no longer

[1422:06]

less comfortable. You're at least

[1422:07]

somewhere in between. And if you were in

[1422:09]

between, you're more comfortable. And if

[1422:10]

you were more comfortable, you're

[1422:11]

perhaps now most comfortable among those

[1422:13]

here. Um, but keep in mind as per CS50

[1422:15]

syllabus, what does ultimately matter in

[1422:17]

this course is not so much where you end

[1422:18]

up relative to your classmates, but

[1422:20]

where you end up relative to

[1422:22]

where you yourself began. And that's

[1422:24]

taken into account come final projects,

[1422:26]

come final grades. But most importantly,

[1422:28]

that's really what's most important

[1422:29]

educationally in general is that delta

[1422:31]

from week zero to in our case here now

[1422:34]

week 10. Uh, so if it's any reassurance,

[1422:36]

something I like to bring up around this

[1422:38]

time is just how badly I did in CS50 and

[1422:40]

like the very first problem set. Like I

[1422:42]

didn't even get hello world right

[1422:43]

somehow in the fall of 1996. So here's a

[1422:46]

photograph of my homework assignment for

[1422:48]

assignment one. It was a program to

[1422:49]

print hello world on the screen. I was

[1422:51]

incredibly detailed with my comments.

[1422:53]

Even commenting that main is main which

[1422:55]

is not the way you're supposed to

[1422:56]

program. Even telling the TF where

[1422:58]

my file ended, which is not really

[1423:00]

necessary. And I got minus two for not

[1423:02]

even following directions uh correctly.
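
As a purely illustrative sketch of that anecdote: the photographed homework itself is not reproduced in this transcript, and Python is used here only for illustration (the transcript does not say which language the 1996 assignment used). The names `greeting`, `main_overcommented`, and `main` are hypothetical. It contrasts comments that merely restate the code with a version that lets simple code speak for itself.

```python
# Hypothetical illustration only; not the assignment shown in lecture.

def greeting():
    """Return the text the program is supposed to print."""
    return "hello, world"


# Over-commented version: every comment restates the obvious.
def main_overcommented():  # main is main
    # print the greeting on the screen
    print(greeting())
    # end of main
# end of file (the grader does not need to be told this)


# Leaner version: the code is simple enough to need no commentary.
def main():
    print(greeting())


if __name__ == "__main__":
    main()
```

Both versions print the same line; only the comments differ, which is presumably what cost the points.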

[1423:04]

So take some comfort in that. Even if by

[1423:07]

problem set 9, you're still getting

[1423:09]

points off, you're hopefully, at least

[1423:11]

in my case, in some very good company.

[1423:13]

It only gets better and easier uh and

[1423:15]

faster in time. But the whole course

[1423:17]

ultimately has really been about this

[1423:18]

picture, right? Problem solving is

[1423:20]

computer science. And you have inputs,

[1423:22]

which is the problem to be solved. You

[1423:23]

have the outputs that you want to get

[1423:24]

to, which is presumably the solutions

[1423:26]

there, too. And inside of that

[1423:27]

proverbial black box are these

[1423:29]

algorithms, step-by-step instructions

[1423:31]

for solving some problem. And I pulled

[1423:33]

up my own notes from CS50's first

[1423:34]

lecture some 25 plus years ago too where

[1423:37]

I wrote down this in my horrible

[1423:40]

handwriting to this day. But I noted

[1423:41]

that what an algorithm is is a precise

[1423:43]

sequence of steps for getting something

[1423:44]

done which is pretty much what we now

[1423:46]

say. Uh I noted that programming itself

[1423:49]

as we have for weeks now is the process

[1423:50]

of taking an algorithm and putting it

[1423:52]

into a language that a computer can

[1423:54]

process and that's what you've done in

[1423:56]

Scratch and C and Python and SQL and

[1423:57]

JavaScript and anything in between. Um,

[1424:00]

and most important, at least my takeaway

[1424:01]

that day when it comes to algorithms is

[1424:04]

precision and correctness. Um, and

[1424:06]

indeed those are points we've made

[1424:07]

perhaps not as emphatically um, over the

[1424:09]

past several weeks as well. But we

[1424:10]

thought we'd see just how much those two

[1424:12]

lessons in particular have sunk in uh,

[1424:15]

by doing a bit of an exercise, some CS50

[1424:17]

Pictionary in this, our last lecture

[1424:19]

altogether this term. Um, for which to

[1424:21]

begin we need one brave volunteer to

[1424:23]

come on up stage.

[1424:26]

Who would like to volunteer?

[1424:29]

Who? How about Okay, over here. We never

[1424:30]

call from the middle of the section.

[1424:32]

Come on up. Come on up. A round of

[1424:34]

applause for being so brave. Nice.

[1424:39]

All right, come on over.

[1424:47]

And in just a moment, let's go ahead and

[1424:50]

do introductions. First, if you want to

[1424:52]

come up over to the middle of the uh

[1424:53]

stage and introduce yourself to the

[1424:55]

world.

[1424:56]

>> Hi, I'm Gia. I'm a freshman.

[1424:59]

>> All right. Nice. Nice to meet you. Thank

[1425:00]

you for joining us. So, what we're about

[1425:01]

to do is Gia is going to look at my screen

[1425:03]

where there's going to be a picture on a

[1425:05]

white screen. All of you presumably have

[1425:07]

a white sheet of paper in front of you

[1425:09]

that you grabbed on the way in. If you

[1425:10]

don't, just grab one from a friend or

[1425:12]

your binder or the like. And if you

[1425:13]

really don't, that's okay, too. But

[1425:15]

hopefully everyone has a pen or pencil

[1425:16]

or someone near you does. And what Gia,

[1425:18]

we're going to ask you to do is program

[1425:20]

the audience to draw what it is you see

[1425:23]

on the screen. You can say anything you

[1425:26]

want, but you may not use any physical

[1425:27]

gestures or the like. Verbal programming

[1425:29]

only.

[1425:30]

>> Okay.

[1425:30]

>> All right. Come on over to the lectern

[1425:33]

and in just a moment Gia, and only Gia, will

[1425:36]

see what is actually here on the screen.

[1425:40]

So,

[1425:42]

step one for your audience.

[1425:45]

>> Okay. So, the first thing that you need

[1425:47]

to do is draw two lines right next to

[1425:51]

each other. Two vertical lines.

[1425:55]

Okay.

[1425:55]

>> Okay.

[1425:56]

>> Step two.

[1425:57]

>> Step two. Once you have done that, you

[1425:59]

need to draw three dots. One on above

[1426:04]

those two vertical lines, one right in

[1426:06]

the middle between those two vertical

[1426:07]

lines, and one at on the bottom of these

[1426:10]

three vertical lines, but beneath those

[1426:13]

two vertical lines. Yeah. So, three

[1426:15]

dots.

[1426:17]

>> Okay. Step three. Step three is on the

[1426:21]

top of the left vertical line, you're

[1426:23]

going to connect a line from that

[1426:27]

position to the top dot that you drew.

[1426:31]

And then on the top of the right

[1426:33]

vertical line, you're going to connect

[1426:36]

that position to the top dot that you

[1426:38]

drew.

[1426:40]

>> All right, step four

[1426:42]

>> is remember that top left position?

[1426:45]

You're going to connect that to the

[1426:49]

middle dot that you drew. And then the

[1426:52]

top right of the vertical line at the

[1426:55]

Yes. You're going to connect that to the

[1426:57]

middle dot of the line that you drew.

[1427:01]

>> Got it?

[1427:01]

>> And then step five, on the bottom left

[1427:05]

of your left vertical line, you're going

[1427:08]

to connect that position to the bottom

[1427:11]

dot that you drew. And then on the

[1427:14]

bottom right of the right vertical line,

[1427:16]

you're going to connect that position to

[1427:18]

the bottom dot that you drew.

[1427:21]

And now from the middle dot to the

[1427:24]

bottom dot, you should have no line in

[1427:26]

between that. And you can now draw a

[1427:29]

line between those two dots.

[1427:31]

>> Step six and the last.

[1427:34]

>> I think you should be done.

[1427:36]

>> All right. A round of applause then for

[1427:38]

our programmer. Let me give you a little

[1427:40]

something

[1427:41]

>> if you want to take a seat. So now what

[1427:43]

Kelly and I are going to do is very

[1427:44]

quickly collect your execution of this

[1427:46]

program and we'll see just how it went

[1427:48]

with Gia as the programmer. If you want

[1427:51]

to just reach out and hand me or Kelly

[1427:53]

over there any of your handwritings. We

[1427:56]

don't need all of them. Just a

[1427:57]

representative sample will suffice. If

[1428:00]

you're proud of your work, extend your

[1428:02]

hand quite a bit. Okay. Very proud.

[1428:05]

Okay.

[1428:06]

>> Okay.

[1428:08]

>> Okay. Okay. One more. One more. That's

[1428:11]

okay. All right. All right, I'm going to

[1428:12]

run back to the stage. Okay, it's okay

[1428:14]

if we didn't grab yours.

[1428:19]

All right.

[1428:28]

All right. Thank you to Kelly for

[1428:29]

grabbing these as well. So, without

[1428:31]

having seen any of these, here is how

[1428:33]

you all interpreted Gia's instructions.

[1428:36]

So, here's one interpretation.

[1428:40]

Okay. Perhaps similar or different from

[1428:42]

your own. Uh here's another several

[1428:48]

vertical vertical line question mark.

[1428:50]

Okay. Uh here is

[1428:54]

very narrow one.

[1428:57]

All right.

[1428:59]

And

[1429:01]

and let's see if we got any other

[1429:03]

variants thereof. Actually, the rest of

[1429:05]

them are pretty consistent. So, Gia, if

[1429:08]

it's any reassurance, I'm seeing a lot

[1429:11]

of ones that look like this. Here's

[1429:14]

another that looks like this. And

[1429:19]

here's yet another that looks like this.

[1429:22]

So, if you're wondering where we're

[1429:23]

going with this, if I go ahead and

[1429:25]

reveal what it was Gia was looking at on

[1429:26]

the screen, she was in fact having you

[1429:28]

draw this here cube. So, some of the

[1429:32]

takeaways here. So, suffice to say, not

[1429:34]

all of that went well. Uh, but why was

[1429:37]

that? Well, I dare say it was very easy

[1429:38]

to get confused, I think, Gia, in some of

[1429:40]

your words because you had in your

[1429:42]

mind's eye exactly what it was you were

[1429:43]

drawing. And of course, it was right

[1429:44]

there on the screen. But we didn't

[1429:46]

leverage, at least in Gia's instructions,

[1429:48]

any abstractions. I dare say it might

[1429:49]

have been a little bit easier for all of

[1429:51]

us if maybe she had just teed things up

[1429:52]

by saying, "All right, everyone, we're

[1429:54]

going to draw a cube," for instance,

[1429:56]

which is indeed an abstraction over

[1429:58]

these lower level details that she was

[1430:00]

focusing on. But perhaps there could

[1430:01]

have been another approach altogether,

[1430:04]

which is even more pedantic. For

[1430:06]

instance, a lot of the earliest drawing

[1430:08]

programs and even worlds like Scratch

[1430:10]

sort of take for granted that you have a

[1430:11]

coordinate system like X's and Y's and

[1430:13]

you can go up, down, left, and right.

[1430:15]

So, an alternative to just saying, "Hey,

[1430:16]

all, draw a cube," which could be subject

[1430:16]

to interpretation, because is the cube like

[1430:19]

this, or is it like this, rotated? So, we

[1430:23]

still would have needed more information

[1430:24]

than just a cube from Gia. But here,

[1430:27]

maybe an alternative approach would have

[1430:28]

been to really get into the weeds and

[1430:30]

say, "Put your pen at the top of the

[1430:31]

page and then draw a straight line to

[1430:34]

the southwest, for instance, and then

[1430:36]

draw another line of the same distance

[1430:38]

to the south and then to the southeast

[1430:40]

or so forth." And it could have been in

[1430:42]

terms of degrees. It could be

[1430:43]

directionally in that way, but it might

[1430:45]

not have been clear to anyone what it

[1430:46]

was we were drawing until enough of the

[1430:48]

lines suddenly appear on the screen and

[1430:50]

then voila, you see that we've been

[1430:51]

drawing a cube this whole time. So the

[1430:54]

degree to which we're precise and the

[1430:56]

level of abstraction that

[1430:58]

we operate in is incredibly important.
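
The two approaches described above can be sketched side by side. This is a minimal illustration under stated assumptions, not anything shown in lecture: the names `HEADINGS`, `trace`, and `cube_outline` are hypothetical, headings are assumed to be measured clockwise from north, and page coordinates are assumed to have y growing downward. Only the first three steps (southwest, south, southeast) come from the lecture; the last three are an assumption added to close the outline, and the cube's inner edges are left out.

```python
import math

# Compass headings in degrees, clockwise from north (assumed convention).
HEADINGS = {"N": 0, "NE": 45, "E": 90, "SE": 135,
            "S": 180, "SW": 225, "W": 270, "NW": 315}


def trace(start, steps):
    """Low level: follow (heading, distance) steps and return every point visited.

    Page coordinates are assumed: x grows to the right, y grows downward.
    """
    x, y = start
    points = [(x, y)]
    for heading, distance in steps:
        angle = math.radians(HEADINGS[heading])
        x += distance * math.sin(angle)
        y -= distance * math.cos(angle)  # north is up the page, so y shrinks
        points.append((round(x, 6), round(y, 6)))
    return points


def cube_outline(top=(0.0, 0.0), edge=100.0):
    """High level: 'draw the outline of a cube', with the pen starting at the top."""
    steps = [
        ("SW", edge), ("S", edge), ("SE", edge),  # the steps named in lecture
        ("NE", edge), ("N", edge), ("NW", edge),  # assumed steps to close the shape
    ]
    return trace(top, steps)


if __name__ == "__main__":
    for point in cube_outline():
        print(point)
```

A caller who only wants a cube calls `cube_outline()` and never sees a heading; a caller given only the list of steps cannot tell what is being drawn until enough lines exist, which mirrors the audience's experience.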

[1431:00]

Whether it's for another human to

[1431:01]

understand us, for an AI to understand

[1431:03]

us nowadays, or anything in between. All

[1431:06]

right, why don't we go ahead and flip

[1431:08]

things around a bit um for this? Why

[1431:10]

don't we go ahead and get one more

[1431:12]

volunteer to do something a little

[1431:13]

different here on stage? One more. Okay,

[1431:17]

how about here on the aisle? Come on

[1431:18]

down. Round of applause for this brave

[1431:20]

volunteer. Come on down.

[1431:26]

All right. So, in this exercise, we're

[1431:28]

going to flip things around. So, you all

[1431:30]

will be giving the instructions verbally

[1431:31]

by just shouting them out. And our

[1431:33]

volunteer, whose name is

[1431:34]

>> Presley.

[1431:35]

>> Preston.

[1431:36]

>> Presley.

[1431:36]

>> Presley. Presley, you want to say a

[1431:38]

quick introduction?

[1431:39]

>> Yeah. Uh, my name is Presley. I'm a

[1431:40]

freshman uh living in Stoughton House.

[1431:42]

>> Nice. Well, welcome. Come on over to the

[1431:44]

uh, easel here. And we have a

[1431:48]

black marker for Presley here. And the

[1431:51]

only thing that we ask is that you not

[1431:53]

look up or behind you because the answer

[1431:56]

is going to be right there on the

[1431:57]

screen. But everyone else is welcome to

[1431:58]

look up or over to the TV screen. And if

[1432:00]

you want to go ahead and face the easel

[1432:02]

here and as you draw, just make sure to

[1432:04]

kind of open up after each uh stroke of

[1432:06]

the pen so that everyone can see what

[1432:07]

you have done. All right. So no looking

[1432:09]

up as of now because what the audience

[1432:11]

is about to do is to program you to draw

[1432:14]

this on the screen. Oh, way to encourage

[1432:18]

him. Okay. So, step one, feel free to

[1432:20]

just raise your hand and we'll shout

[1432:22]

them out.

[1432:23]

>> Oh, I heard draw a circle over here.

[1432:27]

>> But not too big. I heard over here

[1432:30]

a stick figure.

[1432:32]

>> Good abstraction. You're going to end up

[1432:33]

drawing a stick figure.

[1432:36]

But we should probably be a little more

[1432:38]

helpful than that. So, let's do the hand

[1432:39]

thing just so we can be more precise and

[1432:40]

not overwhelm Presley. There was a hand

[1432:42]

over here. Yeah. And back.

[1432:43]

>> Draw a line down.

[1432:46]

>> Draw a line down from the circle.

[1432:47]

Presley

[1432:48]

>> from the bottom

[1432:49]

>> from the bottom of the circle.

[1432:53]

Okay, someone else.

[1432:58]

>> Actually, let me let me rewind. Sorry.

[1433:00]

Say it again.

[1433:05]

>> Draw two diagonal lines from the line

[1433:07]

you just drew.

[1433:09]

>> Well, I don't think the audience likes

[1433:10]

this. Wait, let's Oh,

[1433:13]

>> okay.

[1433:15]

Okay, that's what we were told. Next

[1433:17]

step, someone else.

[1433:26]

>> Good one. Okay. Extend the original

[1433:29]

vertical line to be about the same

[1433:31]

height as the circle.

[1433:36]

>> Okay. Yeah, that's good. Good feedback.

[1433:37]

All right. Someone else. Next step.

[1433:41]

Next step. Yes.

[1433:44]

Draw two diagonal lines from the bottom

[1433:46]

of the line.

[1433:48]

>> Nice.

[1433:50]

Draw two diagonal lines from the bottom

[1433:51]

of that line that look like legs. Good

[1433:54]

use of detail and abstraction.

[1434:00]

Okay, nice. Next step.

[1434:05]

>> Anyone? We're close. Yeah, over here.

[1434:10]

line

[1434:16]

>> on the left. So, you're going to draw a

[1434:17]

speech bubble to the left of the head

[1434:19]

with the word "hi," capital H, with a

[1434:21]

short line.

[1434:24]

>> No bubble, just "hi."

[1434:33]

>> And you wanted to clarify one other

[1434:35]

detail. And then a line from "hi" to the

[1434:38]

face.

[1434:38]

>> A line from "hi" to the face

[1434:41]

>> with space in between.

[1434:44]

Okay. No, you're doing great. It's okay,

[1434:46]

Presley. Okay. Hang in there. Okay.

[1434:48]

Final step or two.

[1434:51]

Next step.

[1434:54]

Anyone at all.

[1434:57]

>> Feel free to shout it out.

[1434:59]

>> Adjust the arms to make them look like

[1435:01]

they're running.

[1435:02]

>> Adjust the arms to make them look like

[1435:03]

they're running.

[1435:09]

Good luck.

[1435:13]

>> Draw a perpendicular line from the left

[1435:15]

arm.

[1435:15]

>> Oh, I like that. Draw a perpendicular

[1435:17]

line from the left arm

[1435:20]

>> to the bottom

[1435:20]

>> to the bottom.

[1435:24]

>> Okay. And lastly, one final step.

[1435:31]

>> Same side as

[1435:34]

Yeah, it's permanent. Uh,

[1435:39]

I think we need a final touch on the

[1435:40]

other arm. Maybe. Yes. One final step.

[1435:46]

>> Anyone?

[1435:52]

>> Draw a perpendicular line per diagonally

[1435:55]

to the left

[1435:56]

>> of the arm

[1435:57]

>> of the right arm.

[1436:05]

Just a little bit.

[1436:06]

>> Just a little bit.

[1436:09]

>> All right. I think I've I think we've

[1436:11]

withheld our applause long enough.

[1436:12]

Presley, if you want to take a step back

[1436:13]

and look at what they were trying to

[1436:15]

get you to draw. A round of applause.

[1436:20]

So, here too. Let me... Here you go. For your

[1436:23]

dorm room if you would like. Okay. And a

[1436:26]

little Super Mario as well. All right.

[1436:29]

So, here too. Um, I think you were the

[1436:31]

problem this time. Round of applause for

[1436:33]

Presley.

[1436:34]

And of course, since it's, you know,

[1436:36]

permanent ink, it's easy to sort of go

[1436:37]

off the rails early on and make a

[1436:39]

mistake. But I think that was actually a

[1436:40]

nice mix of low-level details like the

[1436:43]

directions of the lines and the lengths

[1436:44]

thereof and also some abstractions

[1436:46]

because I do dare say someone shouting

[1436:48]

out that it is to be a stick figure gave

[1436:50]

him a much more helpful mental model. So

[1436:52]

that might be sort of the comments on

[1436:54]

top of the function, but when we really

[1436:55]

got into the weeds of implementing that

[1436:57]

function, it was more akin to step-by-step

[1437:00]

instructions for solving this here

[1437:02]

particular problem. So my thanks to

[1437:04]

Presley for bearing with us with that

[1437:06]

one as well. So beyond this, where have

[1437:11]

we been up until now? So uh if we look

[1437:15]

back at the past several weeks, this is

[1437:17]

sort of the trajectory on which uh we've

[1437:19]

been. So we started with Scratch from

[1437:21]

scratch literally in the very first

[1437:22]

week. The goal of which was to introduce

[1437:24]

you to some of those procedural

[1437:25]

fundamentals like what a loop is and a

[1437:27]

conditional and boolean expressions and

[1437:29]

variables which have pretty much

[1437:30]

recurred in different forms and

[1437:32]

different languages over the weeks since.

[1437:34]

Thereafter, we transitioned to a more

[1437:36]

traditional language, C, which many of you

[1437:38]

will never use again and admittedly even

[1437:40]

I only use it for like a month or two of

[1437:42]

the year during CS50 itself. The intent

[1437:45]

was to be this incredibly foundational

[1437:47]

language that so many other languages

[1437:49]

today are built on top of. Case in

[1437:51]

point, the interpreter that you might

[1437:52]

use for Python itself can be written in

[1437:55]

C. And that speaks to how we sort of

[1437:57]

talked about bootstrapping from one

[1437:59]

language to another, from low-level to

[1438:01]

high level and beyond. Arrays and

[1438:03]

algorithms, all of that and uh memory

[1438:05]

and data structures like all of that is

[1438:06]

sort of omnipresent in computing, in

[1438:08]

programming and the like, even though

[1438:10]

you might not need to in modern

[1438:11]

languages like Python uh worry as much

[1438:14]

about managing your own memory because

[1438:16]

good programmers, better programmers, have

[1438:17]

figured out how to solve those problems

[1438:19]

for you in the language itself or in the

[1438:21]

libraries that you're using. You can

[1438:22]

take for granted now that you at least

[1438:24]

know what a hash table is, what a linked

[1438:26]

list is, what the trade-offs are among

[1438:27]

those, what the running times are. And

[1438:29]

that's what computer scientists and

[1438:30]

software engineers think about and talk

[1438:32]

about and whiteboard about in the real

[1438:34]

world when trying to implement

[1438:36]

algorithms of their own for real-world

[1438:38]

problems or implementing real-world

[1438:40]

products. And then of course over the

[1438:41]

past few weeks we've sort of used that

[1438:43]

as a stepping stone to talk about very

[1438:45]

modern programming paradigms, most

[1438:46]

recently web programming. And even

[1438:48]

though we didn't use it explicitly in

[1438:49]

the class, mobile programming is

[1438:51]

increasingly based on HTML and CSS and

[1438:53]

JavaScript, which might be something

[1438:55]

some of you will tackle for your own

[1438:56]

final projects. And you can't escape now

[1438:58]

using or seeing or leveraging somehow

[1439:01]

artificial intelligence. And among the

[1439:02]

goals for today is to at least point you

[1439:04]

in the direction of tools that now

[1439:06]

having finished problem set 9, you are

[1439:08]

welcome and encouraged to use for your

[1439:10]

final project so that you can build all

[1439:12]

the more um and all the more

[1439:14]

successfully than even some of your

[1439:15]

predecessors just a few years ago could

[1439:17]

have, now that your own work and your own

[1439:18]

know-how can be amplified by the impact

[1439:21]

of AI itself. Um this of course now

[1439:24]

brings us to today, the end, but we wanted

[1439:26]

to give you a sense of where you can go

[1439:28]

from here on out. So with your final project,

[1439:30]

this really is the uh the intent of the

[1439:32]

final project is to be the very first of

[1439:34]

hopefully many projects that you decide

[1439:36]

to spec out for yourself. Like every

[1439:38]

problem set thus far has been written by

[1439:40]

me and the team and you've been

[1439:41]

following our instructions step by step.

[1439:43]

The final project takes all of those

[1439:44]

training wheels off. And even though you

[1439:46]

are welcome and encouraged to borrow

[1439:48]

code from say problem set 9 if you want

[1439:50]

to do something web-based or even

[1439:52]

earlier if you want to do something

[1439:53]

that's more similar to past psets, the goal is to

[1439:55]

make it ultimately your own. And even if

[1439:57]

you want, start with a completely empty

[1439:59]

window and just a blinking prompt and

[1440:01]

build something of your own. Um, setting

[1440:03]

out for yourself, as you've seen in the

[1440:05]

specification, a good goal, which you

[1440:07]

intend to meet no matter what, a better

[1440:10]

goal, which is a bit more of a stretch,

[1440:11]

and a best goal, which in practice

[1440:13]

rarely ever happens with software. To

[1440:15]

this day, 25 years since taking CS50

[1440:17]

myself, um, or plus now, um, even I

[1440:21]

consistently underappreciate just how

[1440:23]

long it takes sometimes to solve

[1440:24]

problems. But that's beginning to go

[1440:26]

away at least to some extent thanks to

[1440:28]

AI where at least now you essentially

[1440:30]

have a junior colleague next to you who

[1440:32]

can help solve bugs for you, point you

[1440:34]

in the right direction, even tackle

[1440:36]

features as well. Um, all that we ask

[1440:38]

for this final project is that you build

[1440:40]

something of interest to you, that you

[1440:41]

solve an actual problem, that you impact

[1440:43]

campus, or that you, as we say in the

[1440:45]

spec, change the world and try to

[1440:47]

achieve something, try to create

[1440:48]

something that outlives the course

[1440:50]

itself over these final few weeks of the

[1440:52]

class and even continue on with it if

[1440:54]

you'd like in January and beyond. Uh,

[1440:56]

for now, this is the so-called CS50

[1440:59]

charades for which we need two teams of

[1441:01]

three. So, if you're sitting there in a

[1441:02]

group of three friends total, or

[1441:05]

we'll form one up here live. So, come on

[1441:06]

up as our first volunteer. Need five

[1441:08]

more volunteers. Feel free to volunteer

[1441:10]

the persons next to you. Three in a

[1441:12]

row. How about two more over here? One.

[1441:14]

And how about two on the end? Come on

[1441:16]

up. All right. And a round of applause

[1441:18]

for these six here volunteers. And

[1441:23]

all right, let me give you one

[1441:24]

microphone.

[1441:26]

Let me give you a second microphone. And

[1441:28]

Kelly, if you want to come on up as

[1441:29]

well. I think these three seem to know

[1441:31]

each other already. So, we'll have them

[1441:32]

be one team. If you guys want to be

[1441:34]

another team as well, come on up. Uh,

[1441:36]

let me take one microphone actually for

[1441:38]

the other team. All right. And how about

[1441:41]

quick introductions to this team here.

[1441:43]

And first, we need a team name from you

[1441:44]

all. You haven't had time to think about

[1441:46]

this.

[1441:49]

>> Team A. Okay. So, team A is who?

[1441:55]

>> Uh, I'm Leah. I'm a first year and I'm

[1441:57]

in Holworthy.

[1441:58]

>> Welcome. Uh,

[1441:59]

>> my name is Stephen. I'm a freshman in

[1442:01]

Canaday F.

[1442:03]

>> I'm Charlotte. I'm a freshman and I'm

[1442:05]

also in Canaday F.

[1442:06]

>> All right, let's do introductions on the

[1442:08]

other team as well. You are going to be

[1442:09]

team

[1442:10]

>> Awesome Sauce.

[1442:12]

>> Awesome sauce. Okay. Versus team A. Uh,

[1442:14]

if you want to go ahead and introduce

[1442:15]

yourselves here.

[1442:17]

>> Hi, my name is Jenny Pan. I'm a freshman

[1442:19]

in Hollis.

[1442:20]

>> Hi, my name is Noah. I'm a freshman in

[1442:23]

Hurlbut.

[1442:24]

>> And hi, my name is Marie and I'm a

[1442:25]

freshman. Sorry, I'm a freshman in

[1442:27]

Canaday.

[1442:28]

>> All right, welcome to both of our teams

[1442:30]

here. And among the goals now, let's

[1442:31]

leave one microphone with each team, uh,

[1442:33]

is to play a bit of charades whereby one

[1442:35]

of you in a moment is going to be

[1442:37]

responsible for acting out a word that

[1442:39]

you see on the screen. So, we're going

[1442:40]

to put on this screen and this screen

[1442:41]

over here some term that relates to CS50

[1442:44]

somehow, and that person's goal over the

[1442:46]

course of 60 seconds is going to be to

[1442:48]

act that out in such a way that their

[1442:49]

teammates can hopefully guess what the

[1442:52]

word is. We'll give you 60 seconds at a

[1442:53]

time. Kelly has kindly offered to keep

[1442:55]

score. Um, and if you solve it in fewer

[1442:58]

than 60 seconds, we got another word for

[1442:59]

you and another word. And we'll see how

[1443:00]

many points you can accrue over the

[1443:02]

course of those 60 seconds. And

[1443:03]

depending on how this goes, we'll do

[1443:04]

maybe one or two rounds in total.

[1443:06]

Questions.

[1443:07]

>> Skips do we get?

[1443:08]

>> How many skips do you get? I guess you

[1443:09]

can skip uh as many as you want until we

[1443:13]

run out of questions.

[1443:14]

>> Oh. Oh,

[1443:15]

>> but try not to run through all of our

[1443:16]

questions. All right. Any questions

[1443:18]

though beyond that? All right. So, if

[1443:20]

you guys want to step off stage over

[1443:21]

there, why don't we have team A begin?

[1443:23]

So, one of you, Leah, if you're holding

[1443:25]

the mic, if you want to be the charader,

[1443:28]

let's go ahead and have you stand here

[1443:30]

so you can see the screen. And we only

[1443:32]

ask that you two not look up because the

[1443:34]

answer is going to be right there.

[1443:36]

>> All right. And you should just shout out

[1443:38]

uh the word that Leah is acting out.

[1443:40]

Question.

[1443:40]

>> Acting only charades.

[1443:42]

>> Speaking.

[1443:43]

>> Yeah. Yeah, I can't speak because that

[1443:44]

would kind of defeat the point. So, yes,

[1443:46]

just acting out. Just acting out

[1443:47]

physically. All right.

[1443:48]

>> I'm going to go over here. Give me just

[1443:50]

a moment to get the slides ready with

[1443:53]

your questions. And Leah, the first

[1443:56]

clue. Oh, and Kelly's going to be timing

[1443:57]

you. 60 seconds to accrue as many points

[1443:59]

as you can. All right, here we go. Go.

[1444:05]

Act that out.

[1444:08]

>> Oh, that was weird. Thank you. Sorry.

[1444:11]

Yes. Act out. This is CS50. All right.

[1444:13]

No. Act this out. Please go.

[1444:18]

>> Loop. calling a recursion.

[1444:20]

>> Yes. One point

[1444:24]

>> coming

[1444:25]

>> uh an array, linked list

[1444:29]

>> abstraction

[1444:32]

>> snake.

[1444:33]

>> Python. Python.

[1444:35]

>> Yes. Python

[1444:39]

>> duck. The duck.

[1444:40]

>> Nice.

[1444:43]

>> Binary.

[1444:45]

Uh

[1444:46]

>> one zero

[1444:47]

>> binary digit bit

[1444:50]

>> byte

[1444:53]

>> one zero. It's definitely binary. ASCII.

[1445:01]

>> Want to pass

[1445:06]

>> linked list, array.

[1445:07]

>> Yes. Array

[1445:11]

>> loop.

[1445:12]

>> Yes. Loop

[1445:13]

>> time. time. All right. Very nicely done.

[1445:19]

All right. Five is the score to beat.

[1445:22]

So, if you guys want to step over here,

[1445:24]

if uh one of you has the mic, go ahead

[1445:25]

and assume the same roles.

[1445:28]

Five is the score to beat.

[1445:33]

All right. Five is the score to beat.

[1445:35]

All right. Here we go. Final round.

[1445:39]

First word. And you guys just make sure

[1445:40]

you don't look up.

[1445:42]

Go. Head

[1445:52]

node

[1445:54]

>> algorithm

[1445:55]

>> input

[1445:57]

algorithm

[1446:06]

>> these are hard

[1446:10]

No.

[1446:13]

>> Sure. You have to act it out. Act it

[1446:14]

out.

[1446:20]

>> Oh, they go.

[1446:23]

Run time. Run time. What's that?

[1446:27]

>> Tree.

[1446:28]

>> Yes. Tree.

[1446:30]

>> Next one.

[1446:30]

>> Oh my god.

[1446:32]

>> Next one.

[1446:35]

>> Binary search.

[1446:38]

>> Binary boolean. No.

[1446:41]

A merge sort? Call, phone call

[1446:43]

>> function.

[1446:44]

>> It was binary search, wasn't it?

[1446:46]

>> What was binary

[1446:47]

>> phone? Oh, that's time. All right, but a

[1446:50]

round of applause for our team awesome

[1446:52]

sauce.

[1446:56]

>> Okay, we have some some parting prizes

[1446:58]

for you, your very own Super Mario Pezes

[1447:01]

for you guys as well. I'm glad we

[1447:03]

squared away the ability to pass

[1447:04]

though on the question, so thank you for

[1447:06]

that. All right, so admittedly pretty

[1447:08]

hard. Our thanks to all of these

[1447:09]

volunteers for playing that out. Allow

[1447:11]

me to turn our attention back, here in

[1447:15]

just a moment, to where else uh we can go

[1447:18]

from here. So up until now

[1447:22]

up until now

[1447:24]

we've been using Visual Studio Code for

[1447:26]

CS50 at the URL cs50.dev. Recall that this

[1447:29]

is just an adaptation of a commercial

[1447:31]

tool called GitHub Codespaces, which is

[1447:33]

like a cloud-based version of Visual

[1447:35]

Studio Code itself, or VS Code, which is

[1447:37]

a largely open-source tool from

[1447:39]

Microsoft that's incredibly popular in

[1447:41]

the industry which is to say even though

[1447:42]

we have the CS50 library in there and we

[1447:45]

turned off by default some of the menu

[1447:46]

options and we disabled AI, it is the

[1447:49]

tool that so many programmers around the

[1447:50]

world do use every day to write code. So

[1447:52]

you have been learning all this time

[1447:54]

sort of industry standards in that

[1447:55]

sense. It is now time if you so choose,

[1447:58]

but you are welcome to keep using this

[1448:00]

for your final project if feeling more

[1448:01]

comfortable with it, uh, to drop the

[1448:03]

"for CS50" and actually install on your own

[1448:06]

Mac or PC if you so choose Visual Studio

[1448:08]

Code itself. You can go to this URL

[1448:10]

here. Um it's fairly straightforward to

[1448:12]

install it. But invariably you'll run

[1448:13]

into probably some technical support

[1448:15]

headaches depending on the language that

[1448:17]

you're trying to use with it. For

[1448:18]

instance, if you're trying to use it

[1448:19]

with Python, you'll probably also have

[1448:20]

to download and install Python onto your

[1448:22]

computer at least if you want the latest

[1448:24]

version. And just know a priori that

[1448:26]

sometimes just stuff happens and it just

[1448:28]

doesn't work and you have to Google or

[1448:30]

ask ChatGPT, and that's fine, and

[1448:32]

honestly that's kind of normal but this

[1448:34]

is also why we don't do any of this in

[1448:36]

week zero of the class so that we can

[1448:37]

focus on hello world and Mario and cash

[1448:40]

and credit and get into the interesting

[1448:43]

parts of computing and programming and

[1448:45]

not frustrate you with

[1448:48]

technical support challenges. But now

[1448:49]

given that all of you are somewhere in

[1448:51]

between or among those more comfortable

[1448:53]

uh you're now ready to sort of uh deal

[1448:55]

with those same technical challenges

[1448:57]

yourself. But who knows maybe it will go

[1448:59]

perfectly smoothly. Um you can go to

[1449:00]

CS50's own documentation because if you

[1449:02]

want to be able to use all of the same

[1449:03]

software that CS50 has pre-installed you

[1449:06]

can use a technology known as

[1449:07]

containerization with a tool called

[1449:09]

Docker and actually run a CS50

[1449:11]

environment on your Mac or PC or even in

[1449:13]

the cloud but still run VS Code on your

[1449:16]

own Mac and PC. Among the upsides of

[1449:18]

which are that you're not dependent

[1449:19]

necessarily on the cloud. You can do

[1449:21]

everything offline. Uh which is useful

[1449:23]

in general. You can do things more

[1449:24]

quickly sometimes if you're using the

[1449:26]

full capabilities of your own computer

[1449:28]

and not just a browser. So this is

[1449:29]

generally how uh programmers approach

[1449:31]

their code using something like VS Code

[1449:33]

or alternative products. And in fact

[1449:35]

there's a bunch of others out there but

[1449:37]

perhaps the trendiest right now are

[1449:39]

these three here. Not just Visual uh

[1449:41]

Studio Code itself um but a tool called

[1449:43]

Cursor, another one called Windsurf.

[1449:45]

There's dozens of other text editors,

[1449:48]

often known as integrated development

[1449:49]

environments, which tend to have even

[1449:51]

more features that you can download for

[1449:52]

free or commercially on your own Macs,

[1449:54]

PCs, and the like. Uh, but you can't go

[1449:56]

wrong transitioning from CS50 to VS Code

[1449:58]

on your own Mac or PC, if only because

[1450:00]

you're already familiar with it. As for

[1450:02]

the command line, so those of you with

[1450:04]

Macs might have found somewhere in your

[1450:06]

utilities folder a program called

[1450:08]

Terminal. Um, if not, poke around there

[1450:10]

later today and you'll see that all this

[1450:12]

time you've had a command line interface

[1450:14]

available to you on macOS. Windows has

[1450:17]

something similar as well. They don't

[1450:18]

necessarily come with all of the same

[1450:19]

tools that we've been using within

[1450:21]

CS50.dev, but if you're a Mac user and

[1450:23]

you go to this URL here, or you're a

[1450:25]

Windows user and you go to this URL

[1450:27]

here, or if you're a Linux user, you

[1450:28]

probably know all of this already, so

[1450:30]

there's no URL for you there. Um you can

[1450:32]

install some of those same tools on your

[1450:34]

Mac and PC and feel all the more at home

[1450:36]

uh doing things in a command line as

[1450:38]

well. Um, Git: this is something that we

[1450:40]

actually in CS50 abstract on top of.

[1450:42]

This is essentially the de facto

[1450:44]

standard nowadays for collaborating with

[1450:46]

other people using a central cloud

[1450:48]

server in order to share your code with

[1450:50]

it and in turn other people uh for

[1450:52]

versioning your code so that you keep

[1450:54]

track of multiple uh versions thereof

[1450:56]

and changes that you've made. um go to

[1450:58]

this URL here if you would like and

[1450:59]

you'll see a tutorial by CS50's own

[1451:01]

Brian Yu introducing you to actual Git

[1451:04]

because we've been sort of abstracting

[1451:05]

away this particular tool by just doing

[1451:08]

it all automatically for you. If you've

[1451:09]

ever gone through your timeline in

[1451:11]

CS50.dev being able to roll back to

[1451:13]

previous versions of your code, we're

[1451:15]

just using Git, but we're automatically

[1451:16]

running this command for you. If you

[1451:18]

want to collaborate with partners for

[1451:20]

your final project, you can use Git.

[1451:22]

However, I will encourage you to

[1451:23]

alternatively use Visual Studio Code's

[1451:26]

Live Share feature, which allows one of

[1451:28]

you to log into your codespace, click

[1451:30]

some buttons, and then share access to

[1451:32]

your codespace with your friend or your

[1451:34]

partner with whom you're working

[1451:35]

on the project, and you can both in real

[1451:37]

time like Google Docs edit the code or

[1451:39]

different files therein uh using that

[1451:41]

one codespace. A little easier than

[1451:43]

getting onboarded at least with Git. um

[1451:46]

hosting a website if this proves of

[1451:47]

interest for your final project or even

[1451:48]

after the course if it's a static

[1451:50]

website. Two popular places to go if

[1451:53]

only because they offer free tiers are

[1451:55]

what's called GitHub Pages, which you can

[1451:56]

use to just host HTML CSS and JavaScript

[1451:59]

with no Python, no Flask, no backend. Um

[1452:01]

or Netlify is a popular company nowadays

[1452:03]

too that has an uh entry-level account

[1452:05]

for which you can sign up for free.

[1452:07]

If you just want to have like a

[1452:08]

portfolio website, if you're an artist

[1452:10]

or a programmer, you just want to have

[1452:11]

static content that you write once and

[1452:13]

deploy, these are good starting points,

[1452:15]

but not all of them. Hosting a web app.

[1452:17]

So, this list gets even

[1452:19]

longer. And all of these recommendations

[1452:21]

are essentially uh curated by the

[1452:23]

teaching staff. So, they're all

[1452:24]

opinionated, but these are perhaps the

[1452:26]

most common places you can go. Um,

[1452:28]

Amazon, Microsoft, Google, Cloudflare,

[1452:31]

they all have student type accounts. So,

[1452:33]

if you use your .edu email address, for

[1452:35]

instance, or some other form of proving

[1452:36]

your status as a current student, you

[1452:38]

can generally sign up for discounts and

[1452:40]

free access to a lot of these same

[1452:41]

services as well without having to pay

[1452:43]

while you're just learning along the

[1452:45]

way. GitHub has something similar called

[1452:47]

the Student Developer Pack. And then a

[1452:49]

couple of other companies for hosting

[1452:50]

web apps that have been popular are

[1452:52]

Heroku, Vercel, and bunches of others.

[1452:54]

So by web app we mean not just HTML, CSS

[1452:57]

and JavaScript but maybe some Python

[1452:59]

maybe some JavaScript on the server

[1453:01]

maybe Ruby yet another language or any

[1453:03]

number of others when you actually need

[1453:05]

a backend in addition to the front end

[1453:07]

maybe you need a database as well this

[1453:09]

would be the place to start whether it's

[1453:11]

at the CS50 hackathon or beyond um and

[1453:14]

nowadays this is a slide that didn't

[1453:15]

even need to exist a couple of years ago

[1453:16]

asking AI again for your final projects

[1453:19]

you are welcome and encouraged to

[1453:21]

amplify your own productivity with AI

[1453:23]

not by having it do for you but moving

[1453:25]

away from the duck which by design has

[1453:27]

been fairly limited and meant to be a

[1453:29]

good teacher but not necessarily one

[1453:31]

that's going to be a good partner when

[1453:32]

it comes to building your final project.

[1453:34]

So ChatGPT, Claude, Gemini, uh, GitHub Copilot,

[1453:37]

OpenAI Codex, v0, um, are all, uh,

[1453:41]

popular tools right now that you might

[1453:43]

want to play around with. The easiest of

[1453:45]

these to use perhaps if not familiar

[1453:47]

with, say, ChatGPT already would be

[1453:49]

GitHub Copilot, only because you can

[1453:51]

enable it within your CS50 codespace by

[1453:54]

following our own documentation at

[1453:56]

cs50.readthedocs.io, where we'll tell

[1453:58]

you the sequence of steps via which you

[1454:00]

can reenable AI now that you're allowed

[1454:02]

to for your final project and turn on

[1454:04]

all of those features that were disabled

[1454:06]

by default. Um and then there's still

[1454:08]

humans out there like it remains to be

[1454:09]

seen just how popular these websites are

[1454:11]

in the years to come for better or for

[1454:12]

worse. Um, but among the places that

[1454:14]

programmers and technophiles have gone

[1454:16]

for years are Reddit, Stack Overflow,

[1454:19]

Server Fault, where there's a rich

[1454:20]

history of questions and answers that

[1454:22]

ironically all of those AIs have been

[1454:24]

trained on, which unfortunately means

[1454:25]

some of these might be driven out of

[1454:27]

business eventually in some sense if

[1454:28]

we're all just turning only to AI. But

[1454:30]

when you actually want that human

[1454:31]

component, these are still good places

[1454:33]

to go. Um, and then news. Two of the

[1454:35]

many places you can go for news in

[1454:37]

technology, computing, computer science

[1454:39]

more broadly, would be TechCrunch is

[1454:41]

still a good one. Hacker News, so to

[1454:42]

speak and then you might have some of

[1454:44]

your own popular choices as well. Um and

[1454:46]

then if uh with some bias um take other

[1454:49]

classes like CS50 besides this

[1454:50]

undergraduate class has a rich history

[1454:52]

now over the past decade of creating all

[1454:54]

the more open courseware. So courses in

[1454:56]

more Python, more SQL, a language called

[1454:58]

R, cybersecurity, uh, game development

[1455:01]

and more. All of those are linked at

[1455:03]

this URL here, edx.org/cs50,

[1455:05]

where you need not pay or sign up beyond

[1455:07]

auditing the course and all of the

[1455:09]

content is freely available. something

[1455:11]

for winter break, for instance, if you

[1455:12]

want to dive a little more deeply into

[1455:13]

some subject for the sake of your final

[1455:15]

project, your professional aspirations,

[1455:16]

or even just to prepare for spring term.

[1455:20]

And then over the coming weeks too, will

[1455:21]

CS50 itself be soliciting interest in

[1455:24]

applications for becoming a teaching

[1455:25]

fellow or TF, a course assistant or CA.

[1455:27]

If you would like to get all the more

[1455:29]

involved as a teacher of CS50 next fall,

[1455:32]

uh do uh follow the application link

[1455:34]

that we will soon circulate uh via

[1455:36]

email. Um, and do stay in touch too if

[1455:38]

you just enjoy answering other people's

[1455:40]

questions or seeing what the pulse of

[1455:43]

sort of computing is. At this URL here

[1455:45]

is a whole bunch of CS50's own

[1455:47]

communities uh in social media largely

[1455:49]

via which you can follow along at home

[1455:51]

in the months and years to come too. So,

[1455:53]

a few thanks before we do one final game

[1455:56]

all together. Um, to all of the people

[1455:57]

who have been making this course

[1455:59]

possible. Um, so our friends at Memorial

[1456:00]

Hall who bring us into this

[1456:02]

beautiful space and make it possible for

[1456:03]

us to have of all things a class in such

[1456:05]

a space. um our friends at ESS who help

[1456:08]

with the audio each and every week in

[1456:09]

CS50. Um, the restaurant Changsho down the

[1456:11]

road, we hope you'll continue to visit

[1456:13]

our friends there. Wesley Chen is a good

[1456:14]

friend of ours and the manager um please

[1456:16]

tell him you're from CS50 and I'm sure

[1456:18]

he'll be delighted to see you. Um and

[1456:19]

then CS50's own team, most of whom were

[1456:21]

in back there or sitting next to you

[1456:22]

with cameras um without whom the course

[1456:24]

wouldn't be possible. And of course

[1456:26]

CS50's own teaching fellows and CAs,

[1456:28]

just a few of whom posed here for this

[1456:29]

photo. If I could invite you to all give

[1456:31]

everyone here a round of applause, my

[1456:33]

thanks to all of them.

[1456:39]

So,

[1456:40]

um, and then of course the CS50 duck

[1456:42]

should be thanked as well. Okay.

[1456:47]

Thanks to CS50's own Rongxin Liu and

[1456:49]

some of our own former teaching fellows

[1456:51]

and students who have been behind the

[1456:53]

development of that duck that

[1456:55]

you've gotten to know over these past

[1456:57]

several months. All right, if Kelly

[1456:58]

could join me again on screen, the only

[1457:00]

thing between us and cake is a final

[1457:02]

game, namely a quiz show in which all of

[1457:04]

you can partake. Here we go. Question

[1457:07]

one. What is the largest number an 8-bit

[1457:09]

unsigned binary digit can represent?

[1457:12]

256, 128, 255, or one?

[1457:16]

Starting strong, and keep in mind all of

[1457:18]

these questions came from you all

[1457:20]

because we asked you recently for review

[1457:21]

questions that are now on the screen.

[1457:26]

Again the timer is clicking and most

[1457:29]

popular answer was 255

[1457:32]

which I think if we click once more

[1457:34]

we'll confirm was in fact the correct

[1457:36]

answer. So why is that and why is it not

[1457:39]

256? Well if we start counting from zero

[1457:41]

as we always have that's consuming one

[1457:44]

of the 256 possibilities. So the largest

[1457:47]

number that we can represent with something that's

[1457:48]

8-bit and unsigned, which means no

[1457:50]

negative numbers involved is indeed

[1457:52]

going to be 255.
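That arithmetic can be checked with a short sketch. This is only an illustration in Python, and the variable names are made up for the example: eight bits give 2 to the 8th, or 256, patterns, and because counting starts at zero the largest value is one less than that.

```python
BITS = 8  # width of the unsigned value from the question

patterns = 2 ** BITS    # number of distinct bit patterns
largest = patterns - 1  # counting starts at 0, so the top value is one less

# Setting all eight bits to 1 should give the same largest value.
all_ones = int("1" * BITS, 2)

print(patterns, largest, all_ones)
```

The same reasoning applies to any width: n unsigned bits can count from 0 up to 2 to the n, minus 1.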

[1457:54]

Treasure that information now, always.

[1457:57]

All right, next question from Kelly.

[1457:59]

Which issue is at the center of the year

[1458:02]

2038 problem, which hopefully you added

[1458:04]

to your Google calendars a few weeks

[1458:06]

back. Integer overflow, malicious

[1458:08]

inputs, SQL injection attacks, or memory

[1458:10]

leak.

[1458:13]

Which of those is at the core of the

[1458:15]

year 2038 problem?

[1458:19]

All right, let's go ahead and reveal the

[1458:24]

number one answer with 92% of you saying

[1458:27]

integer overflow is in fact correct

[1458:31]

because we're still in the habit of

[1458:32]

using 32-bit integers to keep track of

[1458:35]

time from the so-called epoch which was

[1458:36]

January 1st, 1970. And unfortunately, we

[1458:40]

humans aren't great at sort of planning

[1458:41]

ahead. And so we're going to run out of

[1458:43]

permutations of 32 bits by a certain date

[1458:46]

in the year 2038 unless everyone

[1458:48]

upgrades their computers to 64-bit

[1458:51]

counters which thankfully most every

[1458:53]

piece of modern hardware nowadays is

[1458:55]

using already. Your Macs, your PCs, and

[1458:57]

your phones. So hopefully this will be

[1458:59]

really a non-event, but hopefully you'll

[1459:00]

think of us in CS50 in uh you know 10

[1459:03]

plus years when your Google calendar

[1459:05]

reminder goes off. Question three, which

[1459:08]

of the following is not a step of

[1459:10]

compiling? Linking, pre-processing,

[1459:12]

assembling, or interpreting?

[1459:16]

Bit more of a challenge. Which of these

[1459:18]

is not a step of compiling?

[1459:22]

All right, almost 200 responses coming

[1459:24]

in.

[1459:27]

All right, why don't we go ahead and

[1459:28]

reveal the most popular answer with 54%

[1459:31]

of you saying interpreting is in fact

[1459:35]

correct. Recall that we talked about

[1459:37]

compiling. Compiling itself is just one

[1459:39]

of several steps. There is in fact the

[1459:41]

pre-processing step which takes care of

[1459:43]

any of the hash symbols in C that start

[1459:46]

with #include, #define, and the

[1459:48]

like. That's pre-processing. Uh there

[1459:50]

was then assembling or there was then

[1459:52]

compiling which actually compiled your

[1459:54]

code into assembly code. There was then

[1459:57]

the assembler which would actually take

[1460:00]

it down further to machine code and then

[1460:03]

linking

[1460:04]

29. This is for 29% of you. The linking

[1460:07]

step, recall, was taking your zeros and

[1460:09]

ones and combining them with say CS50's

[1460:11]

library's zeros and ones and maybe the

[1460:13]

standard I/O library's zeros and ones,

[1460:15]

linking them all together to give you

[1460:17]

one executable program like hello uh

[1460:20]

itself. All right, next question. What

[1460:23]

does a pointer store? The name of a

[1460:25]

variable, the memory address of a

[1460:27]

value, the size of a value, or the value

[1460:30]

of a variable?

[1460:33]

Think for a moment.

[1460:36]

What does a pointer store?

[1460:41]

All right, about 200 responses in and

[1460:43]

yes, the memory address of a variable

[1460:45]

with 96% of you confirming as much. That

[1460:48]

is correct. Question five.

[1460:54]

What is the running time of linear

[1460:56]

search? Big O of 1, big O of N, big O of

[1460:58]

N squared, or big O of N log N? Linear

[1461:03]

search running time.

[1461:08]

And recall that with something like

[1461:10]

search, you could get lucky. But if big

[1461:13]

O is the upper bound on our running

[1461:15]

time, you might not. You might hit the

[1461:16]

end of the list that you're searching.
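Here is a minimal sketch of that lucky versus unlucky case, written in Python as an illustration. The function name, the list of numbers, and the comparison counter are all assumptions made for the example, not code from the course.

```python
def linear_search(values, target):
    """Return (index, comparisons); index is -1 when target is absent."""
    comparisons = 0
    for index, value in enumerate(values):
        comparisons += 1
        if value == target:
            return index, comparisons
    return -1, comparisons


numbers = [20, 500, 10, 5, 100, 1, 50]

lucky = linear_search(numbers, 20)    # target is first: a single comparison
unlucky = linear_search(numbers, 50)  # target is last: one comparison per element
missing = linear_search(numbers, 7)   # target absent: also one comparison per element

print(lucky, unlucky, missing)
```

With n elements, the unlucky and missing cases both take n comparisons, which is the upper bound the question is asking about.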

[1461:19]

And so the running time of linear search

[1461:21]

is of course big O of N. It might be

[1461:26]

omega of one, but not big O of one. At

[1461:28]

least if we're considering what the

[1461:30]

worst case scenarios might be. All

[1461:31]

right, on to question six. Which

[1461:33]

data structure follows the first in

[1461:35]

first out principle? A queue, a linked list, a

[1461:39]

stack, or a hash table? First in, first

[1461:42]

out, aka FIFO.

[1461:46]

Which of these is FIFO?

[1461:51]

All right. First in, first out is in

[1461:55]

fact a queue, as you would hope if you're

[1461:56]

getting in line for a restaurant, for a

[1461:58]

store. You'd hope that if you're the

[1462:00]

first one in line, you're going to be

[1462:02]

the first one out equitably speaking.
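The line at a restaurant can be sketched in a few lines of Python. This is only an illustration: the names and trays are invented, and the standard library's deque is one reasonable way to model a queue, with a plain list standing in for the stack from the answer choices.

```python
from collections import deque

# A queue: people join at the back and are served from the front.
line = deque()
for person in ["Alice", "Bob", "Carol"]:
    line.append(person)
first_served = line.popleft()  # first in, first out

# A stack, for contrast: the last item added is the first one removed.
trays = []
for tray in ["tray 1", "tray 2", "tray 3"]:
    trays.append(tray)
first_taken = trays.pop()  # last in, first out

print(first_served, first_taken)
```

Using popleft on a deque takes from the front without shifting the remaining elements, which is why it is a common choice for queues in Python.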

[1462:04]

And so it is in fact a queue. The

[1462:05]

opposite of that in some sense then

[1462:07]

would have been a stack whereby when you

[1462:08]

think about the cafeteria trays, the

[1462:10]

sort of first one in is actually the

[1462:12]

last one out. So LIFO instead for a

[1462:14]

stack. All right, question seven. Which

[1462:17]

operator returns the memory address of a

[1462:20]

variable? An asterisk, a dollar sign, an

[1462:23]

ampersand, or a hyphen and a greater than

[1462:26]

sign.

[1462:28]

presumably in C

[1462:32]

which returns the memory address of a

[1462:35]

variable.

[1462:39]

All right, let's see what everyone

[1462:41]

thinks.

[1462:42]

So the most popular and correct answer

[1462:45]

is the ampersand. This is the address-of

[1462:47]

operator. The asterisk recall in most

[1462:49]

contexts is the opposite of that. That's

[1462:51]

the dereference operator. It's actually

[1462:52]

go to an address. Um, this [the dollar sign] is not a thing

[1462:56]

in C. Uh, this [the arrow], though, is similar in

[1462:59]

spirit to a combination of the star

[1463:00]

operator and the dot operator which

[1463:03]

means to dereference and follow a pointer

[1463:05]

to something inside of a struct

[1463:07]

typically. All right, question eight.

[1463:10]

Which SQL command is used to remove

[1463:11]

duplicate rows from a result set?

[1463:14]

Remove, unique, distinct, or clean?

[1463:19]

We didn't spend a huge amount of time on

[1463:21]

these keywords,

[1463:23]

but only one of them applies here. A

[1463:27]

result set is just the answers that you

[1463:28]

get back when doing your select. And if

[1463:31]

you want to filter out duplicates, you

[1463:34]

can in fact say

[1463:37]

distinct is correct. Unique is also a

[1463:39]

keyword in SQL, but that is when you

[1463:41]

want to define in your schema that a

[1463:43]

column's values are going to be unique,

[1463:45]

like an email address column instead.
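The difference between the two keywords can be sketched with Python's built-in sqlite3 module. This is an illustration under assumptions: the tables users and logins, their columns, and the example.com addresses are all hypothetical.

```python
import sqlite3

db = sqlite3.connect(":memory:")  # throwaway in-memory database

# UNIQUE is a schema constraint: the column may not hold repeated values.
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT UNIQUE)")
db.execute("INSERT INTO users (email) VALUES ('a@example.com')")
try:
    db.execute("INSERT INTO users (email) VALUES ('a@example.com')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# DISTINCT is a query keyword: it filters repeats out of a result set.
db.execute("CREATE TABLE logins (email TEXT)")
db.executemany(
    "INSERT INTO logins (email) VALUES (?)",
    [("a@example.com",), ("b@example.com",), ("a@example.com",)],
)
all_rows = db.execute("SELECT email FROM logins ORDER BY email").fetchall()
distinct_rows = db.execute(
    "SELECT DISTINCT email FROM logins ORDER BY email"
).fetchall()

print(duplicate_rejected, len(all_rows), distinct_rows)
```

The logins table still holds all three rows afterward; only the second query's result set has the repeat removed.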

[1463:47]

Distinct is how you filter out

[1463:48]

duplicates in your selects. All right,

[1463:50]

question nine. We're past the halfway

[1463:52]

mark. What does an HTTP code of 418

[1463:56]

signify? Not found. I'm a teapot.

[1463:58]

Forbidden, unauthorized.

[1464:02]

418.

[1464:04]

This too. If you know this one, moving

[1464:06]

forward, you'll be considered among the

[1464:08]

CS

[1464:10]

elite.

[1464:13]

answers are coming in a little slower,

[1464:14]

but I'm a teapot is correct, which is

[1464:16]

not actually a thing or useful

[1464:18]

technology. It was in fact an April

[1464:20]

Fool's joke years ago where a bunch of

[1464:22]

computer scientists got together in a

[1464:23]

room and wrote out an entire

[1464:25]

specification for what it means for a

[1464:27]

server to return 418. I'm a teapot. All

[1464:30]

right, number 10. Where does malloc

[1464:34]

dynamically allocate memory from? The

[1464:36]

heap, the stack, global variables, or

[1464:40]

assembly?

[1464:48]

All right,

[1464:50]

heap is in fact correct. That's the sort

[1464:52]

of top part of the memory. Even though

[1464:53]

top and bottom make no actual technical

[1464:55]

sense. It's just our artist rendition

[1464:56]

thereof. The stack recall is what is

[1464:58]

used when functions are being called.

[1465:00]

Every time a function is called, it gets

[1465:02]

a so-called frame on the stack. That's

[1465:03]

where your local variables and your

[1465:04]

arguments get put. But if in C you use

[1465:06]

malloc, it does in fact end up on the

[1465:08]

heap. In C, if you allocate memory with

[1465:11]

malloc but forget to call free, what

[1465:13]

problem can occur? A memory leak,

[1465:14]

segmentation fault, stack overflow, or

[1465:17]

all of the above

[1465:20]

If you allocate memory with malloc but

[1465:22]

forget to call free. What problem can

[1465:25]

occur?

[1465:31]

All right, most popular answer is in

[1465:34]

fact memory leak, which is correct. Um,

[1465:38]

you could imagine scenarios in which you

[1465:40]

also get a segmentation fault and/or a

[1465:42]

stack overflow, but those aren't direct

[1465:43]

consequences of not calling free. That's

[1465:46]

generally the consequence of using too

[1465:48]

much memory, for instance, or in this

[1465:50]

case doing something wrong with your

[1465:51]

memory. So interrelated, yes, but in

[1465:53]

terms of not calling free for each

[1465:55]

malloc, this is what's going to happen by

[1465:57]

definition. All right, well done there.

[1466:00]

Next question, which is 12.

[1466:03]

What does this domain name give the web

[1466:06]

page of? safetyschool.org. Is it Harvard

[1466:09]

University? Is it Princeton University?

[1466:11]

Is it Yale University? Or Columbia

[1466:14]

University?

[1466:19]

All right. Recall that this was in the

[1466:21]

context of our HTTP redirections.

[1466:24]

Yes. Interesting. Yes. In fact, uh Yale

[1466:27]

University, some alum has been paying

[1466:29]

like $10 a year for like 20 years for

[1466:31]

this joke. safetyschool.org, if you visit

[1466:33]

it returns an HTTP 301 uh HTTP header

[1466:37]

which says the location of it is in fact

[1466:39]

yale.edu.
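To see what a browser does with that kind of response, here is a small sketch in Python that reads the status code and the Location header out of a response. The raw response text below is hypothetical: a real server sends more headers, and the exact Location value is an assumption. The helper name parse_redirect is also made up for the example.

```python
from http import HTTPStatus

# Hypothetical response head; the exact Location value is an assumption.
raw_response = (
    "HTTP/1.1 301 Moved Permanently\r\n"
    "Location: https://www.yale.edu/\r\n"
    "\r\n"
)


def parse_redirect(raw):
    """Return (status_code, location) from a raw HTTP response head."""
    head = raw.split("\r\n\r\n", 1)[0]
    status_line, *header_lines = head.split("\r\n")
    status_code = int(status_line.split(" ")[1])
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_code, headers.get("location")


status, location = parse_redirect(raw_response)
is_permanent_redirect = status == HTTPStatus.MOVED_PERMANENTLY

print(status, location, is_permanent_redirect)
```

A browser that receives a 301 makes a second request to whatever the Location header names, which is how the joke lands you on a different site.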

[1466:41]

All right 13 three to go. What is the

[1466:44]

purpose of DNS? Uh to encrypt data sent

[1466:46]

over the dark web to find the nearest

[1466:50]

coffee shop for you to protect your

[1466:52]

location against hackers or to translate

[1466:54]

domain names into IP addresses.

[1466:58]

What is the purpose of DNS? If helpful,

[1467:01]

domain name system.

[1467:06]

All right, about at the 200 mark and the

[1467:08]

correct answer is indeed domain names

[1467:10]

into IP addresses. That is a server that

[1467:13]

is on your home network, on your ISP's

[1467:15]

network, on your campus's network, your

[1467:17]

corporate network. That just answers

[1467:18]

questions like that for you. All right,

[1467:20]

second to last question. Which of the

[1467:22]

following is not a built-in SQL feature

[1467:24]

to tackle race conditions? Begin

[1467:27]

transaction, commit, roll back, or

[1467:29]

enroll?

[1467:35]

We talked ever so briefly about this in

[1467:37]

the context of ending up with too much

[1467:39]

milk. Recall

[1467:42]

and the correct answer is

[1467:45]

indeed enroll. All three of the others, even

[1467:47]

though you didn't have to use them for

[1467:49]

problem set seven or nine um are indeed

[1467:52]

uh features of SQL. Uh but enroll is not

[1467:56]

a thing. All right. And the very last

[1467:58]

question. and try to answer this as

[1468:00]

quickly as you can. What does Professor

[1468:02]

Malan say at the beginning of every CS50

[1468:05]

lecture? Welcome to Harvard's computer

[1468:07]

science class. Hello everyone. Ready to

[1468:09]

code? All right, this is CS50

[1468:13]

or let's get started with some

[1468:15]

programming.

[1468:19]

All of these questions were in fact

[1468:21]

written by you all.

[1468:25]

All right. And the correct answer, I'm

[1468:27]

pretty sure with 98% of you saying so,

[1468:30]

is all right, this is CS50. And all

[1468:33]

right, this was CS50. Cake is now

[1468:37]

served.
